Searching for Named Entities
Named-entity recognition (NER) is a subtask of information extraction that seeks to locate named entities mentioned in unstructured text and classify them into pre-defined categories such as person names, organizations, locations, time expressions, quantities, and monetary values.
MiaRec voice analytics automatically extracts the following named entity classes from a transcript:
Table 1. Supported named entity classes
Named entity class | Description |
---|---|
#CARDINAL | Numerals that do not fall under another type |
#DATE | Absolute or relative dates or periods |
#EVENT | Named hurricanes, battles, wars, sports events, etc. |
#FAC | Buildings, airports, highways, bridges, etc. |
#GPE | Countries, cities, states |
#LANGUAGE | Any named language |
#LAW | Named documents made into laws |
#LOC | Non-GPE locations, mountain ranges, bodies of water |
#MONEY | Monetary values, including unit |
#NORP | Nationalities or religious or political groups |
#ORDINAL | "first", "second", etc. |
#ORG | Companies, agencies, institutions, etc. |
#PERCENT | Percentage, including "%" |
#PERSON | People, including fictional |
#PRODUCT | Objects, vehicles, foods, etc. (not services) |
#QUANTITY | Measurements, as of weight or distance |
#TIME | Times smaller than a day |
#WORK_OF_ART | Titles of books, songs, etc. |
Using NER classes in MQL expressions
Named entity classes can be included in MQL expressions.
For example, the class #PERSON can be used in a data redaction expression to automatically remove person names from audio recordings and transcripts.
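To make the redaction idea concrete, here is a minimal Python sketch of what removing #PERSON spans from a transcript amounts to. It is an illustration only, not MiaRec's implementation: the `Entity` structure, the `redact` function, and the `[REDACTED]` placeholder are hypothetical names, and the entity spans are written by hand rather than produced by an NER model.

```python
from typing import NamedTuple


class Entity(NamedTuple):
    """A labeled character span in a transcript (hypothetical structure)."""
    start: int  # index of the first character of the entity
    end: int    # index one past the last character of the entity
    label: str  # NER class name, e.g. "PERSON"


def redact(text, entities, label="PERSON", placeholder="[REDACTED]"):
    """Replace every span tagged with `label` by `placeholder`."""
    # Process spans from right to left so that earlier offsets
    # remain valid after each replacement changes the text length.
    for ent in sorted(entities, key=lambda e: e.start, reverse=True):
        if ent.label == label:
            text = text[:ent.start] + placeholder + text[ent.end:]
    return text


transcript = "Hello, this is John Smith calling from Acme."

# Hand-written spans standing in for NER output (assumed, not real output).
entities = [
    Entity(15, 25, "PERSON"),  # "John Smith"
    Entity(39, 43, "ORG"),     # "Acme"
]

redacted = redact(transcript, entities)
print(redacted)
```

Only the span labeled PERSON is replaced; the ORG span is left untouched because the sketch filters on a single class.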
Another sample expression:
R"[0-9]+" NOTIN #MONEY
In the above example, we search for runs of one or more digits (using the regex pattern [0-9]+), but skip any match found inside text labeled with the #MONEY class.
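The effect of this expression can be approximated outside MQL with a short Python sketch. This is a hedged illustration of the filtering logic, not how MiaRec evaluates MQL: the `Entity` structure and the `digits_not_in` function are hypothetical, the MONEY span is supplied by hand, and the sketch assumes that "inside" means the digit match is fully contained in the labeled span.

```python
import re
from typing import NamedTuple


class Entity(NamedTuple):
    """A labeled character span in a transcript (hypothetical structure)."""
    start: int  # index of the first character of the entity
    end: int    # index one past the last character of the entity
    label: str  # NER class name, e.g. "MONEY"


def digits_not_in(text, entities, excluded_label="MONEY"):
    """Return digit runs that are not contained in an `excluded_label` span."""
    excluded = [(e.start, e.end) for e in entities if e.label == excluded_label]
    hits = []
    for match in re.finditer(r"[0-9]+", text):
        # Assumption: a match is "inside" a span only if fully contained in it.
        inside = any(lo <= match.start() and match.end() <= hi
                     for lo, hi in excluded)
        if not inside:
            hits.append(match.group())
    return hits


transcript = "Your order 4521 costs $30 and ships in 2 days."

# Hand-written span standing in for NER output (assumed, not real output).
money_start = transcript.index("$30")
entities = [Entity(money_start, money_start + len("$30"), "MONEY")]

matches = digits_not_in(transcript, entities)
print(matches)
```

Here the digits in "$30" are dropped because they fall within the MONEY span, while the order number and the day count are kept.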