Natural Language Processing

Natural language processing (NLP) is a range of computational techniques for the analysis and understanding of human language.

Note: For additional information about NLP, please see Utilizing Nebula NLP to Derive Data Intelligence in the Nebula Tech Blog.

Nebula provides the following NLP technologies:

  • Sentiment Analysis (SA): Analyzes text to determine whether it contains a positive or negative sentiment. Nebula analyzes document sets at the sentence level and aggregates the results. Each document has a maximum high positive score and a maximum low positive score.
    Score    Sentiment
    < 50     Negative
    = 50     Neutral
    > 50     Positive
  • Named Entity Recognition (NER): Analyzes text in document sets to locate and classify entities into pre-defined categories (such as the names of persons, organizations, locations, products, events, laws, and so on).
  • Personally Identifiable Information and Protected Health Information Detection and Extraction (PII/PHI): Analyzes text in document sets to detect PII/PHI (such as driver's license number, IP address, passport number, credit card number, and so on). PII/PHI Detection can be configured at the Repository level to focus only on relevant geographic regions and specified PII/PHI categories. A document's PII/PHI Categories can be displayed on the Document List in Cull and Review.
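
The Sentiment Analysis score ranges listed above can be expressed as a small lookup. This is a minimal sketch for illustration only: the function name `sentiment_label` is hypothetical and is not part of Nebula.

```python
def sentiment_label(score: float) -> str:
    """Map a sentiment score to its label using the documented thresholds.

    Hypothetical helper for illustration; not a Nebula API.
    """
    if score < 50:
        return "Negative"
    if score > 50:
        return "Positive"
    return "Neutral"  # exactly 50


for value in (12, 50, 87):
    print(value, sentiment_label(value))
```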

NLP Setup and Use

The following outline provides the general steps involved with using NLP in Nebula:

A. Set up NLP in Nebula

NLP can be configured either when importing documents or after the document collection has been imported.

To set up NLP when importing documents

When ingesting files with Import, apply the following advanced options:

To set up Sentiment Analysis:

Auto-Run section

  • Run Sentiment Analysis: Select to enable Sentiment Analysis in the NLP Options section.

NLP Options section

  • Sentiment Analysis: Select the type of document on which Sentiment Analysis will be applied.

To set up Named Entity Recognition:

Auto-Run section

  • Run Entity Recognition: Select to enable Entity Recognition in the NLP Options section.

NLP Options section

  • Entity Recognition: Select the type of document on which Named Entity Recognition will be applied.

To set up PII/PHI Detection:

Auto-Run section

  • Run PII/PHI Detection: Select to enable PII/PHI Detection in the NLP Options section.

NLP Options section

  • PII/PHI Detection: Select the type of document on which PII/PHI Detection and Extraction will be applied.

To set up NLP after the document collection has been imported

  1. On the Import History page, locate the document set you want to analyze with NLP, click its Action icon, and select NLP Processing.
  2. In the Natural Language Processing dialog box, select the type of NLP process you want to apply (Entity Recognition, PII/PHI Detection, Sentiment Analysis), then choose its Application Categories.
  3. To include documents added to the collection since the last time it was processed, select Reprocess Existing Data.
  4. Click Start.

B. View and filter the results of the NLP analysis

On the Cull Dashboard, review the results in the NLP tab. A bubble chart displays the recognized named entities, sized to reflect how prevalent each entity is and colored to indicate its category.

Hover over a named entity bubble to view its category and the number of documents that contain that entity.

You can filter the results by sentiment analysis range, category, and named entity.

To filter by Sentiment

  1. Click and drag the upper and lower Sentiment slide bars to create a sentiment range between the maximum high positive score and a maximum low positive score. The selected range appears as a filter chip.
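
Conceptually, the Sentiment sliders select a score range and keep only the documents whose score falls inside it. The sketch below illustrates that idea under stated assumptions: the `sentiment_score` field name and the sample documents are made up, and treating both slider ends as inclusive is a guess, not confirmed Nebula behavior.

```python
def filter_by_sentiment(documents, low, high):
    """Keep documents whose sentiment score lies in [low, high].

    Assumptions (not from Nebula): each document is a dict with a
    hypothetical "sentiment_score" key, and both bounds are inclusive.
    Documents without a score are skipped.
    """
    return [
        doc for doc in documents
        if doc.get("sentiment_score") is not None
        and low <= doc["sentiment_score"] <= high
    ]


# Made-up sample data for illustration only.
sample_docs = [
    {"id": "DOC-1", "sentiment_score": 20},
    {"id": "DOC-2", "sentiment_score": 50},
    {"id": "DOC-3", "sentiment_score": 85},
    {"id": "DOC-4"},  # not yet processed, so no score
]

selected = filter_by_sentiment(sample_docs, 40, 90)
print([doc["id"] for doc in selected])
```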

To filter by Category

  1. Select Categories to display only the corresponding named entities (you can select more than one).

To filter by named entity

  1. Click the named entity bubble you want to use as a filter.

    Note: Selected bubbles are a darker shade of the category color.

    To view the selected named entities, hover over the filter chip.

C. Apply the filters

Searching for NLPs

Method 1:

  1. After selecting the filters to apply using the NLP tab, click Filter.
  2. On the Cull Document List, click the Action icon and select Modify.
  3. View the search criteria for the filtered NLP selections.
  4. Click Search.

Method 2:

  • With the Search Builder, use the following search filters to locate documents by sentiment score, named entity, or PII/PHI detector:

    Field: NLP Sentiment Score
    Operators: Equals, Not Equals, Less Than OR Equal To, Greater Than OR Equal To, From/Through, Not Between, Is Set, Is Not Set
    Value: Enter Sentiment Score

    Field: NLP Entities
    Operators: Equals, Does Not Equal, Is Set, Is Not Set
    Value: Select Choice (Organizations, Products, Facilities, Laws, Events, Locations, People); Select Choices

    Field: PII/PHI Detectors
    Operators: Equals, Does Not Equal, Is Set, Is Not Set
    Value: Select Choice Categories (Bank account number, Contact information, Credit card number, Cryptocurrency wallet address, Driver's license number, National citizen/resident ID number, National medical/insurance ID number, National Tax ID number, Passport number)
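
As a rough illustration of how the NLP Sentiment Score operators listed above might be interpreted, the sketch below evaluates one operator against a single document's score. It is not Nebula's implementation: the function name `matches_score`, the use of `None` for a document with no score, and the inclusive range bounds are all assumptions.

```python
from typing import Optional


def matches_score(score: Optional[float], operator: str, *values: float) -> bool:
    """Evaluate one sentiment-score operator (hypothetical semantics).

    `score` is None when the document has no sentiment score, which is an
    assumption about how unprocessed documents are represented.
    """
    if operator == "Is Set":
        return score is not None
    if operator == "Is Not Set":
        return score is None
    if score is None:
        return False  # assumption: comparisons never match unscored documents
    if operator == "Equals":
        return score == values[0]
    if operator == "Not Equals":
        return score != values[0]
    if operator == "Less Than OR Equal To":
        return score <= values[0]
    if operator == "Greater Than OR Equal To":
        return score >= values[0]
    if operator in ("From/Through", "Not Between"):
        low, high = values
        inside = low <= score <= high  # assumption: bounds are inclusive
        return inside if operator == "From/Through" else not inside
    raise ValueError(f"unknown operator: {operator}")


print(matches_score(42, "From/Through", 30, 50))
print(matches_score(None, "Is Not Set"))
```

Keeping the Is Set and Is Not Set checks ahead of the numeric comparisons means a missing score never reaches a comparison that would fail on `None`.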