Natural Language Processing
Natural language processing (NLP) is a range of computational techniques for the analysis and understanding of human language.
Note: For additional information about NLP, please see Utilizing Nebula NLP to Derive Data Intelligence in the Nebula Tech Blog.
Nebula provides the following NLP technologies:
- Sentiment Analysis (SA): Analyzes text to determine whether it contains a positive or negative sentiment. Nebula analyzes document sets at the sentence level and aggregates the results. Each document has a maximum high positive score and a maximum low positive score.
- Named Entity Recognition (NER): Analyzes text in document sets to locate and classify entities into pre-defined categories (such as the names of persons, organizations, locations, products, events, laws, and so on).
- Personally Identifiable Information and Protected Health Information Detection and Extraction (PII/PHI): Analyzes text in document sets to detect PII/PHI (such as driver's license number, IP address, passport number, credit card number, and so on). PII/PHI Detection can be configured at the Repository level to focus only on relevant geographic regions and specified types of PII/PHI categories. A document's PII/PHI Categories can be displayed on the Document List in Cull and Review.
Score | Sentiment |
---|---|
< 50 | Negative |
= 50 | Neutral |
> 50 | Positive |
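The thresholds in the table above can be expressed as a small function. This is only an illustrative sketch: the function name `sentiment_label` is hypothetical and not part of Nebula, and the overall range of possible scores is not stated here, so the code relies only on the three comparisons in the table.

```python
def sentiment_label(score: float) -> str:
    """Map a sentiment score to the label given in the score table.

    Hypothetical helper for illustration; not a Nebula API.
    """
    if score < 50:
        return "Negative"
    if score > 50:
        return "Positive"
    # Only a score of exactly 50 reaches this point.
    return "Neutral"
```

For example, `sentiment_label(50)` returns `"Neutral"`, while any score above 50 returns `"Positive"`.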
NLP Setup and Use
The following outline provides the general steps involved with using NLP in Nebula:
A. Set up NLP in Nebula
NLP can be configured either when importing documents or after the document collection has been imported.
To set up NLP when importing documents
When ingesting files with Import, apply the following advanced options:
To set up Sentiment Analysis:
Auto-Run section
- Run Sentiment Analysis: Select to enable Sentiment Analysis in the NLP Options section.
NLP Options section
- Sentiment Analysis: Select the type of document on which Sentiment Analysis will be applied.
To set up Named Entity Recognition:
Auto-Run section
- Run Entity Recognition: Select to enable Entity Recognition in the NLP Options section.
NLP Options section
- Entity Recognition: Select the type of document on which Named Entity Recognition will be applied.
To set up PII/PHI Detection:
Auto-Run section
- Run PII/PHI Detection: Select to enable PII/PHI Detection in the NLP Options section.
NLP Options section
- PII/PHI Detection: Select the type of document on which PII/PHI Detection and Extraction will be applied.
To set up NLP after the document collection has been imported
- On the Import History page, locate the document set you want to analyze with NLP, click its Action icon, and select NLP Processing.
- In the Natural Language Processing dialog box, select the type of NLP process you want to apply (Entity Recognition, PII/PHI Detection, Sentiment Analysis), then choose its Application Categories.
- To include documents added to the collection since the last time it was processed, select Reprocess Existing Data.
- Click Start.
B. View and filter the results of the NLP analysis
On the Cull Dashboard, review the results in the NLP tab. A bubble chart displays the recognized named entities: the size of each bubble reflects how prevalent the entity is, and its color indicates the entity's category.
Hover over a named entity bubble to view its category and the number of documents that contain that entity.
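To make the idea of prevalence concrete, the following sketch counts how many documents contain each named entity, which is the number shown when hovering over a bubble. The sample data, the document IDs, and the function name `entity_document_counts` are all hypothetical; how Nebula computes bubble sizes internally is not described here.

```python
from collections import Counter

# Hypothetical sample data: for each document, the (entity, category)
# pairs that Named Entity Recognition found in it.
documents = {
    "DOC-001": [("Acme Corp", "Organizations"), ("Paris", "Locations")],
    "DOC-002": [("Acme Corp", "Organizations"), ("Acme Corp", "Organizations")],
    "DOC-003": [("Paris", "Locations")],
}


def entity_document_counts(docs):
    """Count the number of documents that contain each (entity, category) pair."""
    counts = Counter()
    for pairs in docs.values():
        # set() ensures an entity repeated within one document is counted once.
        counts.update(set(pairs))
    return counts


bubble_sizes = entity_document_counts(documents)
```

In this sample, "Acme Corp" appears twice in DOC-002 but still contributes only one document to its count.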
You can filter the results by sentiment analysis range, category, and named entity.
To filter by Sentiment
- Click and drag the upper and lower Sentiment slide bars to create a sentiment range between the maximum high positive score and the maximum low positive score. The selected range appears as a filter chip.
To filter by Category
- Select Categories to display only the corresponding named entities (you can select more than one).
To filter by named entity
- Click the named entity bubble you want used to filter.
Note: Selected bubbles are a darker shade of the category color.
To view the selected named entities, hover over the chip.
C. Apply the filters
Searching for NLPs
Method 1:
- After selecting the filters to apply using the NLP tab, click Filter.
- On the Cull Document List, click the Action icon and select Modify.
- View the search criteria for the filtered NLP selections.
- Click Search.
Method 2:
- With the Search Builder, use the following search filters to locate NLP documents by sentiment score and/or named entity:
Field | Operator | Value |
---|---|---|
NLP Sentiment Score | Equals, Not Equals, Less Than or Equal To, Greater Than or Equal To, From/Through, Not Between, Is Set, Is Not Set | Enter Sentiment Score |
NLP Entities | Equals, Is Set, Is Not Set | Select Choice: Organizations, Products, Facilities, Laws, Events, Locations, People. Select Choices: |
PII/PHI Detectors | Equals, Does Not Equal, Is Set, Is Not Set | Select Choice Categories: Bank account number, Contact information, Credit card number, Cryptocurrency wallet address, Driver's license number, National citizen/resident ID number, National medical/insurance ID number, National Tax ID number, Passport number |
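As a rough illustration of how the NLP Sentiment Score operators might behave, the sketch below evaluates each operator against a single document's score. Everything here is an assumption made for illustration: the function name `matches` is hypothetical, From/Through and Not Between are treated as inclusive ranges, and a document with no score is assumed to match only Is Not Set. The Search Builder's actual semantics may differ.

```python
from typing import Optional


def matches(score: Optional[float], operator: str, *values: float) -> bool:
    """Evaluate one sentiment-score operator (hypothetical semantics)."""
    if operator == "Is Set":
        return score is not None
    if operator == "Is Not Set":
        return score is None
    if score is None:
        # Assumption: a document without a score fails every comparison.
        return False
    if operator == "Equals":
        return score == values[0]
    if operator == "Not Equals":
        return score != values[0]
    if operator == "Less Than or Equal To":
        return score <= values[0]
    if operator == "Greater Than or Equal To":
        return score >= values[0]
    if operator in ("From/Through", "Not Between"):
        low, high = values
        inside = low <= score <= high  # assumption: both bounds are inclusive
        return inside if operator == "From/Through" else not inside
    raise ValueError(f"Unknown operator: {operator}")
```

For example, `matches(72, "From/Through", 60, 80)` is true under these assumptions, and `matches(None, "Is Not Set")` is true for a document that has not been processed with Sentiment Analysis.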