Detecting Language (Cull)

Language ID attempts to identify the language or languages in documents. Documents with very little text, or with mostly numbers (such as spreadsheets), tend to make poor candidates for language identification. The Language ID tool can be configured to detect only the predominant language or to try to detect multiple languages within a document.

To detect language

  1. On the Cull Document List, click the Action icon and select Language ID.
  2. On the Language ID dialog box, enter the following information on the Language Options tab:
    • Max Text Snippet Size: 10 KB to 20 MB.
    • Language Probability: 0.05 to 0.99.
    • Detection Mode: Single or Multiple.
    • Select to Detect OCRed Documents. Clear to include.
    • Detect Languages in: only undetected documents, or all documents.
    • Min File Size: 10 Bytes to 300 Bytes.
    • Short to Normal Threshold: 50 Bytes to 1000 Bytes.
    • Select to Ignore Spreadsheets. Clear to include.
  3. Click Save.
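
The ranges listed on the Language Options tab can be checked before the values are entered. The following is a minimal sketch only, not part of Cull: the names LanguageIdOptions, RANGES, and validate are hypothetical, and it assumes 1 KB = 1024 bytes, that the listed ranges are inclusive, and that sizes are entered as whole numbers of bytes. It covers the numeric options and Detection Mode only.

```python
from dataclasses import dataclass
from typing import List

# Assumption: binary units (1 KB = 1024 bytes); the source does not say.
KB = 1024
MB = 1024 * KB

# Inclusive (low, high) ranges copied from the Language Options tab.
RANGES = {
    "max_text_snippet_size": (10 * KB, 20 * MB),  # bytes
    "language_probability": (0.05, 0.99),
    "min_file_size": (10, 300),                   # bytes
    "short_to_normal_threshold": (50, 1000),      # bytes
}
DETECTION_MODES = ("Single", "Multiple")


@dataclass
class LanguageIdOptions:
    """Hypothetical container for the values typed into the dialog box."""
    max_text_snippet_size: int
    language_probability: float
    detection_mode: str
    min_file_size: int
    short_to_normal_threshold: int


def validate(options: LanguageIdOptions) -> List[str]:
    """Return one message per out-of-range value; an empty list means all pass."""
    errors = []
    for name, (low, high) in RANGES.items():
        value = getattr(options, name)
        if not low <= value <= high:
            errors.append(f"{name}={value} is outside {low} to {high}")
    if options.detection_mode not in DETECTION_MODES:
        errors.append(f"detection_mode must be one of {DETECTION_MODES}")
    return errors


if __name__ == "__main__":
    candidate = LanguageIdOptions(
        max_text_snippet_size=512 * KB,
        language_probability=0.30,
        detection_mode="Multiple",
        min_file_size=100,
        short_to_normal_threshold=200,
    )
    for message in validate(candidate) or ["all values are within range"]:
        print(message)
```

The example values above are arbitrary illustrations, not recommended or default settings.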