Searching for Phrases and Words
MQL supports searching for a phrase or a word using the following text matchers:
| Matcher | Example | Description |
|---|---|---|
| Word | shipment | Search for the word "shipment" in a transcript |
| Quoted Term | "problem with a shipment" | Search for the exact phrase "problem with a shipment" in a transcript |
| Simple (wildcard) pattern | ship* | Search for words that begin with "ship", such as "shipment" and "shipping", as well as "ship" itself |
| Regex pattern | R"ship(ment\|ping)" | Search for words matching the regular expression. In this example, it matches the words "shipping" and "shipment". Note that with such a regular expression, "ship" doesn't have to be at the beginning of the word, i.e. it will match the word "pre-shipment" as well. To enforce a match on word boundaries, add \b (word boundary) to the regular expression, for example, R"\bship(ment\|ping)\b". |
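The difference between the unanchored and the word-bounded regular expression can be tried outside MiaRec. The sketch below uses Python's re module as a stand-in for the MQL regex engine, which is an assumption: MQL may split a transcript into words differently. The sample words are hypothetical.

```python
import re

# Patterns from the table above, without the MQL R"..." wrapper.
unanchored = re.compile(r"ship(ment|ping)")
bounded = re.compile(r"\bship(ment|ping)\b")

# Hypothetical sample words; "reshipment" contains "shipment" mid-word.
words = ["shipment", "shipping", "reshipment", "ship"]

unanchored_hits = [w for w in words if unanchored.search(w)]
bounded_hits = [w for w in words if bounded.search(w)]

print(unanchored_hits)  # ['shipment', 'shipping', 'reshipment']
print(bounded_hits)     # ['shipment', 'shipping']
```

In Python's re, a hyphen counts as a word boundary, so the bounded pattern would still find "shipment" inside "pre-shipment". Whether MQL treats hyphenated words the same way is not covered here.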
Word and Quoted Term matches
A Quoted Term matcher requires the text to match the search term literally. This is best demonstrated by the following examples.
"cancel order"
This Quoted Term expression matches the phrase "cancel order", but not "cancel my order".
To overcome this limitation, you can use the operator OR to list all the variants of a phrase, like: "cancel order" OR "cancel my order" OR "cancel this order"
cancel ONEAR:5 order
This expression consists of two Word matchers (the words cancel and order), with a proximity operator ONEAR:5 between them.
The ONEAR:5 operator instructs the search engine to find the word "cancel" followed by the word "order", with a distance between these two words of no more than 5.
Such an expression matches the phrases "cancel order", "cancel my order", "cancel my recent order", etc.
Note 1. The operator ONEAR is order-dependent, i.e. the word "cancel" must appear in a transcript before the word "order". There is an alternative operator NEAR that is order-independent.
Note 2. The operator ONEAR:5 can be omitted because it is the default operator in MQL expressions, i.e. the expression cancel order is the same as cancel ONEAR:5 order.
cancel NEAR:5 order
This expression uses the order-independent operator NEAR, which instructs the search engine to find the words "cancel" and "order" appearing in a transcript close to each other (at a distance of up to 5 words). The order in which the searched words appear is not important.
Such an expression matches "cancel my order" as well as "order that I want to cancel", where the words appear in reverse order.
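The ordered and unordered proximity rules can be illustrated with a small sketch. It is not MiaRec's implementation: the function name proximity_match is hypothetical, the text is split naively on whitespace, and the distance is assumed to be the difference between word positions (adjacent words are at distance 1).

```python
def proximity_match(text, first, second, max_distance=5, ordered=True):
    """Return True if `first` and `second` occur within `max_distance` words.

    Assumption: distance is the difference between word positions, so
    adjacent words are at distance 1. MQL's exact definition may differ.
    """
    # Naive tokenization: lowercase and split on whitespace only.
    tokens = text.lower().split()
    first_positions = [i for i, t in enumerate(tokens) if t == first]
    second_positions = [i for i, t in enumerate(tokens) if t == second]
    for i in first_positions:
        for j in second_positions:
            gap = j - i
            if ordered and 0 < gap <= max_distance:
                return True  # ONEAR-like: `first` must come before `second`
            if not ordered and 0 < abs(gap) <= max_distance:
                return True  # NEAR-like: either order is accepted
    return False


phrase = "order that I want to cancel"
print(proximity_match("cancel my recent order", "cancel", "order"))  # True
print(proximity_match(phrase, "cancel", "order"))                    # False
print(proximity_match(phrase, "cancel", "order", ordered=False))     # True
```

With ordered=True the sketch mimics ONEAR, and with ordered=False it mimics NEAR, which is why only the unordered call accepts the reversed phrase.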
Case insensitivity
Both Word and Quoted Term matchers are case-insensitive, e.g. the expression order will match "order", "Order", "ORDER" and "oRDER".
Escaping a quote character
To search for a quote symbol (") literally, write it twice ("").
Example:
"foo "" bar"
This expression matches the phrase foo " bar.
Note that the doubled quote ("") is supported in a Quoted Term only. It is a syntax error to use a quote inside a Word matcher, like foo""bar,
but it is valid in a Quoted Term matcher, like "foo""bar".
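When MQL expressions are generated by a script, the doubling rule is easy to apply programmatically. The helper below is a minimal sketch; the name quoted_term is hypothetical and is not part of any MiaRec API.

```python
def quoted_term(phrase: str) -> str:
    """Wrap a literal phrase in quotes, doubling any embedded quote."""
    return '"' + phrase.replace('"', '""') + '"'


print(quoted_term('foo " bar'))     # "foo "" bar"
print(quoted_term("cancel order"))  # "cancel order"
```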
Simple (wildcard) pattern
A Word matcher supports wildcard pattern matching.
The following table describes wildcard patterns, listing the pattern and its use.
| Pattern | Use | Example |
|---|---|---|
| * | Match zero or more characters | bl* matches bl, black, blue, and blob |
| ? | Match exactly one occurrence of any character | h?t finds hot, hat, and hit |
| [abc] | Match one occurrence of the character a, b, or c | h[oa]t finds hot and hat, but not hit |
| [!az] | Match one occurrence of any character except a or z | h[!oa]t finds hit, but not hot or hat |
| [a-c] | Match one occurrence of a character between a and c | c[a-c]t finds cat and cbt, but not cut |
Note
The wildcard patterns are supported in a Word matcher only. A Quoted Term interprets those symbols literally.
For example, bl* is a pattern match, but "bl*" is an exact match.
This difference between Word and Quoted Term matchers is useful when you need to search for one of the wildcard symbols literally in a text.
For example, to find an exclamation point in a text, use a Quoted Term expression, like "Great!".
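The wildcard syntax in the table above happens to coincide with the shell-style patterns of Python's fnmatch module, so the examples can be checked with it. This is only a stand-in for the MQL engine: fnmatchcase is case-sensitive, so the sketch uses lowercase words throughout.

```python
from fnmatch import fnmatchcase

# Patterns and candidate words taken from the wildcard table.
examples = {
    "bl*": ["bl", "black", "blue", "blob"],
    "h?t": ["hot", "hat", "hit"],
    "h[oa]t": ["hot", "hat", "hit"],
    "h[!oa]t": ["hot", "hat", "hit"],
    "c[a-c]t": ["cat", "cbt", "cut"],
}

# For each pattern, keep only the words it matches.
results = {
    pattern: [w for w in words if fnmatchcase(w, pattern)]
    for pattern, words in examples.items()
}

for pattern, matched in results.items():
    print(pattern, "->", matched)
```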
Regular expression (REGEX) pattern
To match complex text patterns, use regular expressions (REGEX).
A regular expression must be enclosed between R" and " characters. Examples:
| Pattern | Use |
|---|---|
| R"[0-9]+" | Match one or more digits in a text |
| R"ship(ment\|ping)" | Match the words "shipment" and "shipping" |
To match a quote (") character in a regular expression, include it twice, like R"foo""bar".
MiaRec supports standard regular expression patterns.
A regular expression may use any of the following metacharacters:
.
Matches any single character. For example:
a.c... will match "abc", but not "ac" or "abbc"
[]
A bracket expression. Matches a single character that is contained within the brackets. For example:
[abc]... will match "a", "b" or "c".
[hc]at... will match "hat" and "cat".
A - character between two other characters forms a range that matches all characters from the first character to the second. For example:
[0-9]... will match any decimal digit.
[a-z]... will match any lowercase letter from "a" to "z".
These forms can be mixed:
[abcx-z]... will match "a", "b", "c", "x", "y" or "z".
To include a literal - character, it must be written first or last, for example, [abc-], [-abc].
To include a literal ] character, it must immediately follow the opening bracket [, for example, []abc].
[^ ]
Matches a single character that is not contained within the brackets. For example:
[^abc]... will match any character other than "a", "b", or "c".
[^a-z]... will match any single character that is not a lowercase letter from "a" to "z".
As above, literal characters and ranges can be mixed, like [^abcx-z].
*
Matches the preceding element zero or more times. For example:
ab*c... will match "ac", "abc", "abbbc" etc.
[0-9]*... will match "" (empty string), "0", "1", "2", "14", "502", "98541654", and so on (any combination of digits).
( )*
Matches zero or more instances of the character sequence specified inside the parentheses. For example:
(ab)*... will match "", "ab", "abab", "ababab", and so on.
(1234)*... will match "", "1234", "12341234", "123412341234", and so on.
+
Matches the preceding element one or more times. For example:
ba+... will match "ba", "baa", "baaa", and so on.
0[0-9]+... will match "00", "01", "02", "001", "01234", "09876543210", or any other combination of digits with preceding 0 and minimum length equal to two characters.
?
Matches the preceding element zero or one time. For example:
ba?... will match "b", or "ba", but not "baa"
0[0-9]?... will match "0", "01", "02", "03", and so on.
|
The choice (also known as alternation or set union) operator matches either the expression before or the expression after the operator. For example:
abc|def... will match "abc" or "def".
(0|011)[1-9]+... will match a phone number that starts with either 0 or 011.
{n}
Matches the preceding element exactly n times. For example:
a{3}... will match "aaa", but not "a", "aa" or "aaaa"
[0-9]{5}... will match "01234", "56789" or any other combination of digits with a length of 5 characters.
{m,n}
Matches the preceding element at least m and not more than n times. For example:
a{3,5}... will match "aaa", "aaaa", "aaaaa", but not "aa" or "aaaaaaaa".
{m,}
Matches the preceding element at least m times. For example:
a{2,}... will match "aa", "aaa", "aaaa", and so on.
^
Matches the beginning of a string. For example:
^[hc]at... will match "hat" and "cat", but only at the beginning of the string
$
Matches the end of a string. For example:
[hc]at$... will match "hat" and "cat", but only at the end of the string
^[hc]at$... will match "hat" and "cat", but only when the string contains no other characters
\
The backslash (\) character is used for escaping metacharacters. For example:
1+2... will match "12", "112", "11112", but not "1+2", because the "plus" character has a special meaning (see above).
1\+2... will match exactly "1+2". In this example, the "plus" character is escaped with a backslash character (\+).
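Several of the examples above can be verified with any standard regex engine. The sketch below uses Python's re module as a stand-in, on the assumption that MiaRec's engine follows the same conventions for these metacharacters. The helper name full_matches is hypothetical, and re.fullmatch is used so that each candidate must match from start to end.

```python
import re


def full_matches(pattern, candidates):
    """Return the candidates that `pattern` matches from start to end."""
    return [c for c in candidates if re.fullmatch(pattern, c)]


# Patterns from the list above, each paired with a few candidate strings.
checks = {
    r"a.c": ["abc", "ac", "abbc"],
    r"[hc]at": ["hat", "cat", "bat"],
    r"(ab)*": ["", "ab", "abab", "aba"],
    r"ba+": ["b", "ba", "baa"],
    r"ba?": ["b", "ba", "baa"],
    r"a{3,5}": ["aa", "aaa", "aaaaa", "aaaaaaaa"],
    r"1+2": ["12", "112", "1+2"],
    r"1\+2": ["12", "1+2"],
}

results = {p: full_matches(p, words) for p, words in checks.items()}

for pattern, matched in results.items():
    print(pattern, "->", matched)
```

The last two patterns show the effect of escaping: 1+2 accepts "12" and "112", while 1\+2 accepts only the literal string "1+2".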