Detecting Language

Language ID is a process that attempts to identify the language or languages in documents. Documents with very little text, or with mostly numbers (such as spreadsheets), tend to make poor candidates for language identification. The Language ID tool can be configured to detect only the predominant language or to try to detect multiple languages within a document.

Detected languages appear in the Languages section of the Document Viewer Coding Pane.

To detect language

  1. On the Review Document List, click the Action icon and select Language ID.
  2. On the Language ID dialog box, enter the following information on the Language Options tab:
    • Max Text Snippet Size: 10 KB to 20 MB.
    • Language Probability: 0.05 to 0.99.
    • Detection Mode: Single or Multiple.
    • Detect OCRed Documents: Select to include OCRed documents in detection. Clear to skip them.
    • Detect Languages in: Choose whether to process only undetected documents or all documents.
    • Min File Size: 10 Bytes to 300 Bytes.
    • Short to Normal Threshold: 50 Bytes to 1000 Bytes.
    • Select to Ignore Spreadsheets. Clear to include.
  3. On the Distribution tab, select one of the following:
    • Default Strategy (Process on any available Workers): Select to distribute documents amongst available workers.
    • Select Workers (Process with specific Workers): Select the workers to receive the distributed documents.
  4. Click Save.
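
The ranges listed on the Language Options tab can be checked before a job is submitted. The following is a minimal sketch only: the class name, field names, and the validate() helper are hypothetical and are not part of the product, and it assumes that 1 KB is 1024 bytes. The checkbox options are omitted because they have no range to check.

```python
from __future__ import annotations

from dataclasses import dataclass

# Assumption: binary units (1 KB = 1024 bytes). The source does not say.
KB = 1024
MB = 1024 * KB


@dataclass
class LanguageIdOptions:
    """Hypothetical container for the ranged Language Options fields."""

    max_text_snippet_size: int      # bytes, 10 KB to 20 MB
    language_probability: float     # 0.05 to 0.99
    detection_mode: str             # "Single" or "Multiple"
    min_file_size: int              # bytes, 10 to 300
    short_to_normal_threshold: int  # bytes, 50 to 1000

    def validate(self) -> list[str]:
        """Return one message per out-of-range field (empty if all are valid)."""
        errors = []
        if not 10 * KB <= self.max_text_snippet_size <= 20 * MB:
            errors.append("Max Text Snippet Size must be 10 KB to 20 MB")
        if not 0.05 <= self.language_probability <= 0.99:
            errors.append("Language Probability must be 0.05 to 0.99")
        if self.detection_mode not in ("Single", "Multiple"):
            errors.append("Detection Mode must be Single or Multiple")
        if not 10 <= self.min_file_size <= 300:
            errors.append("Min File Size must be 10 to 300 bytes")
        if not 50 <= self.short_to_normal_threshold <= 1000:
            errors.append("Short to Normal Threshold must be 50 to 1000 bytes")
        return errors


# Example values chosen for illustration; they are not product defaults.
options = LanguageIdOptions(
    max_text_snippet_size=512 * KB,
    language_probability=0.5,
    detection_mode="Multiple",
    min_file_size=100,
    short_to_normal_threshold=200,
)
problems = options.validate()
print(problems if problems else "All options are within range")
```

Returning a list of messages rather than raising on the first failure lets every out-of-range value be reported at once, which mirrors how a dialog box would flag several fields together.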