Processing Data

Processing data is the act of moving data into Nebula, using the required identification steps to ensure that the data is appropriately linked.

To ingest files into a Nebula repository using the Upload Files page, perform the following steps:

  1. Select the collection of folders and files you want to upload.
  2. Assign custodians to the file collection.
  3. If you choose, you can also:

    (A) Rename the file collection.

    (B) Add notes to the file collection.

    (C) Assign a tracking ID.

    (D) Import the file collection directly into a matter.

    (E) Apply advanced options to filter the selected files that are ingested.

  4. Process the file collection.

Uploading Folders and Files

Files can be uploaded from a local computer or the project's dedicated file share.

To upload documents from a local computer

  1. Click Import > Processing.
  2. In the Upload Files section, do one of the following:
    • Drag and drop the files you want to upload from your computer into the Drop files here box.
    • Click choose file and select the files you want to upload from a local computer.

To upload documents from the project's dedicated file share

  1. Click Import > Processing.
  2. In the Upload Files section, click browse in file share.
  3. Use the File Browser to navigate within the share locations associated with the repository and locate the data to be processed.
  4. Click Select.

Assigning Custodians

After selecting the folders and files you want to upload, select the custodians for the file collection. You can select a single custodian for the entire file collection or individual custodians for each folder and file. You can also add new custodians to the existing drop lists.

Note: "Generate_Based_On_Folder_Name" is available for folders selected from the share browser or for uploaded archives.

To assign a single custodian to all uploaded document sources

  1. Use the Assign Custodian drop list to select the custodian for all items in the Upload Files list.

To assign a custodian to each uploaded document source

  1. Use the Custodian name drop lists to select the custodian for each item in the Uploaded files list.

To add new custodians to drop lists

  1. If the custodian you want to assign does not appear in the Assign Custodian or Custodian name drop lists, click the Add icon.
  2. In the Add Custodian dialog box, type the Name of the new custodian and click Add.

Changing the File Collection Name

The collection name is a unique identifier for the file collection, similar to a batch name. By default, it is set to the matter number and a sequential code; however, you can overwrite this with your own Collection Name.

Note: The Collection Name cannot be changed once the data is processed.

To change the Collection Name

  • In the Collection Name section, click the Update icon and then update the name.

Adding Notes

You have the option to attach notes to the collection of files you want ingested.

To add notes to a file collection

  1. In the Notes section, type the text you want attached to the file collection (optional).

Assigning a Tracking ID

You have the option to assign a tracking ID to the file collection you want ingested.

To assign a tracking ID to the file collection

  1. Select a Tracking ID for the file collection or add a new one (optional).

Selecting Data Provider

You have the option to select the matter for the file collection you want ingested.

Note: Selecting the matter name bypasses the culling process and places documents directly into Review.

To select the matter for the file collection

  1. Select the Data Providers ingesting the file collection (optional).

Applying Advanced Options

Advanced options enable you to apply additional selection criteria to files prior to processing.

To apply advanced options

  1. On the Upload Files page, select the advanced options you want to apply from the following sections:

Auto Promotion Options:

    • Remove Duplicates: Select to eliminate duplicate copies of repeating data within the newly processed batch.
    • Remove Exportable Duplicates: Select to eliminate repeating copies within the newly processed batch that duplicate documents from prior batches that were promoted to Review.
    • Matter: Select the "case" to which the uploaded files will be assigned.
    • Export Type: Select from the following:
        • Total: Export documents only.
        • Related: Export documents and related items, such as attachments.
    • Deduplication Type: Select from the following:
        • Global: Removes duplicate copies of repeating data across all custodians.
        • Custodian: Removes duplicate copies based on individual custodians.
    • Auxiliary Search: Select to restrict the data to documents that match the criteria of a saved search.
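
The difference between the Global and Custodian deduplication types can be illustrated with a short Python sketch. This is not Nebula's implementation: the source does not describe how duplicates are detected, so the SHA-256 content hash, the Doc class, and the dedupe function are all assumptions made for illustration.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    """Hypothetical stand-in for an uploaded file."""
    custodian: str
    name: str
    content: bytes


def dedupe(docs, scope="global"):
    """Keep the first copy of each document.

    scope="global":    one copy is kept across all custodians.
    scope="custodian": one copy is kept per custodian.
    Duplicates are detected by content hash (an assumption).
    """
    if scope not in ("global", "custodian"):
        raise ValueError("scope must be 'global' or 'custodian'")
    seen = set()
    kept = []
    for doc in docs:
        digest = hashlib.sha256(doc.content).hexdigest()
        # The custodian is part of the key only for custodian-level scope.
        key = digest if scope == "global" else (doc.custodian, digest)
        if key not in seen:
            seen.add(key)
            kept.append(doc)
    return kept


docs = [
    Doc("Alice", "memo.docx", b"quarterly memo"),
    Doc("Alice", "memo-copy.docx", b"quarterly memo"),
    Doc("Bob", "memo.docx", b"quarterly memo"),
]
global_kept = dedupe(docs, "global")
custodian_kept = dedupe(docs, "custodian")
```

With this sample data, global scope keeps only Alice's first copy, while custodian scope keeps one copy for Alice and one for Bob.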

Advanced Promotion Options:

  1. Text Email Headers:
  2. Max Spreadsheet Size:
  3. Exclude Attached Images:
  4. Use System Date: Select to take system date into consideration when populating the Default Date attribute. (The logic to populate the attribute becomes Sent Date > Received Date > Last Modified Date > Created Date > System Date.)
  5. De-NIST: Select to exclude NIST documents.
  6. Ignore System Dates after Collection Date: Select to ignore any dates in the data set more recent than the date of processing.
  7. Explode Embedded: Select to save images embedded within files as additional files.
  8. Needs OCR JPG: Select the file size setting identifying JPG image files as OCR candidates.
  9. Needs OCR PDF: Select the character setting identifying PDF files as OCR candidates.
  10. Needs OCR TIFF: Select the file size setting identifying TIFF image files as OCR candidates.
  11. Auto-Run

    • Run Keywords: Select to perform keyword searches. (Keywords will be highlighted in the document during Review.)
    • Run Detect Language: Select to determine which languages are present in the file.
    • Run Sentiment Analysis: Select to enable Sentiment Document Types in the NLP Options section.
    • Run Named Entity Detection: Select to enable Entity Document Types and Entity Categories in the NLP Options section.

    NLP Options

    • Entity Document Types: Select the type of document on which NLP (Natural Language Processing) will be applied.
    • Entity Categories: Select the type of entity on which NLP will be applied.
    • Sentiment Document Types: Select the type of document on which NLP will be applied.

    OCR Options

    Language Options

    • Max Text Snippet Size: Select the increment of the maximum text snippet size from between 10 KB and 20 MB.
    • Language Probability: Select the threshold of certainty that must be reached in order to identify a language.
    • Detection Mode: Select from Single or Multiple.
    • Primary Language: Select the predominant language in use.
    • Min File Size: Select the minimum size of files at which to run Language Identification.

    Container Options

    Applies only to Perceptive processing.

    1. Container Split Chunk Size: Select the method of processing PSTs in parallel chunks (either by percentage or number of messages).
    2. Container Split Threshold: Select the PST size threshold at which containers are split and processed in parallel.
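
The Default Date order given for the Use System Date option (Sent Date > Received Date > Last Modified Date > Created Date > System Date) can be read as a fallback chain, in which the first available date wins. The Python sketch below illustrates that reading only. The field names and the default_date function are hypothetical, and it is an assumption that the first four dates are used in the same order when the option is not selected.

```python
from datetime import datetime
from typing import Optional

# Order taken from the Use System Date description; System Date is
# appended only when the option is selected.
FALLBACK_ORDER = ["sent_date", "received_date", "last_modified_date", "created_date"]


def default_date(metadata: dict, use_system_date: bool = False) -> Optional[datetime]:
    """Return the first date present in the fallback order, or None."""
    fields = list(FALLBACK_ORDER)
    if use_system_date:
        fields.append("system_date")
    for field in fields:
        value = metadata.get(field)
        if value is not None:
            return value
    return None


# Hypothetical metadata for two files.
doc = {
    "last_modified_date": datetime(2021, 3, 4),
    "created_date": datetime(2021, 1, 1),
    "system_date": datetime(2022, 1, 1),
}
only_system = {"system_date": datetime(2022, 1, 1)}
```

Here default_date(doc) returns the last modified date (2021-03-04) because no sent or received date is present. For only_system, the result is None unless use_system_date=True is passed, in which case the system date is returned.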

Processing the Collection

After selecting the files to ingest, specifying custodians and other parameters, and, optionally, configuring any advanced options, you are ready to start processing the collection. After the processing task completes, Nebula displays the Import Details.

To process the collection

  1. After selecting the files to process and configuring the collection, click Start.