Searching for Phrases and Words
MQL supports searching for a phrase or a word using the following text matchers:
Matcher | Example | Description |
---|---|---|
Word | shipment | Search for the word "shipment" in a transcript |
Quoted Term | "problem with a shipment" | Search for the exact phrase "problem with a shipment" in a transcript |
Simple (wildcard) pattern | ship* | Search for words that begin with "ship", such as "shipment" and "shipping", as well as "ship" itself |
Regex pattern | R"ship(ment\|ping)" | Search for words matching the regular expression. In this example, it matches the words "shipping" and "shipment". Note that with such a regular expression, "ship" doesn't have to be at the beginning of the word, i.e. it will match the word "pre-shipment" as well. To enforce a match on word boundaries, add \b (word boundary) to the regular expression, for example, R"\bship(ment\|ping)\b". |
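The effect of \b can be illustrated outside MQL. The sketch below uses Python's `re` module as a stand-in; it is an assumption that MiaRec's regex engine treats \b the same way, and the word list is made up for illustration.

```python
import re

# Hypothetical list of transcript words, used only for illustration.
words = ["shipment", "shipping", "reshipment", "ship"]

# Without boundaries, the pattern may match anywhere inside a word.
unanchored = re.compile(r"ship(ment|ping)")
# With \b on both sides, the match must start and end on a word boundary.
anchored = re.compile(r"\bship(ment|ping)\b")

unanchored_hits = [w for w in words if unanchored.search(w)]
anchored_hits = [w for w in words if anchored.search(w)]

print(unanchored_hits)  # ['shipment', 'shipping', 'reshipment']
print(anchored_hits)    # ['shipment', 'shipping']
```

In Python, a hyphen counts as a word boundary, so how a hyphenated word behaves depends on how the engine splits text into words. It is worth verifying such cases against real transcripts.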
Word and Quoted Term matches
A Quoted Term matcher requires the text to match the search term literally. This is best demonstrated in the following examples.
"cancel order"
-
This Quoted Term expression matches the phrase "cancel order", but not "cancel my order".
To work around this limitation, use the OR operator to list all the variants of a phrase, like: "cancel order" OR "cancel my order" OR "cancel this order"
cancel ONEAR:5 order
-
This expression consists of two Word matchers (the words cancel and order) with the proximity operator ONEAR:5 between them.
The ONEAR:5 operator instructs the search engine to find the word "cancel" followed by the word "order", with a distance of no more than 5 words between them. Such an expression matches the phrases "cancel order", "cancel my order", "cancel my recent order", etc.
Note 1. The ONEAR operator is order-dependent, i.e. the word "cancel" must appear in a transcript before the word "order". The alternative operator NEAR is order-independent.
Note 2. The ONEAR:5 operator can be omitted because it is the default operator in MQL expressions, i.e. the expression cancel order is the same as cancel ONEAR:5 order.
cancel NEAR:5 order
-
This expression uses the order-independent operator NEAR, which instructs the search engine to find the words "cancel" and "order" appearing close to each other in a transcript (at a distance of up to 5 words). The order in which the words appear does not matter.
Such an expression matches "cancel my order" as well as "order that I want to cancel", where the words appear in reverse order.
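The difference between ONEAR and NEAR can be sketched in a few lines of Python. This is not MiaRec's implementation: the function names are hypothetical, tokenization is a plain whitespace split, and "distance" is assumed to be the difference between word positions.

```python
def positions(tokens, word):
    """Return every index at which `word` occurs in `tokens`."""
    return [i for i, token in enumerate(tokens) if token == word]


def onear(text, first, second, max_distance=5):
    """Ordered proximity: `first` must appear before `second`.

    Assumption: distance is the difference between word positions.
    """
    tokens = text.lower().split()
    return any(
        0 < j - i <= max_distance
        for i in positions(tokens, first)
        for j in positions(tokens, second)
    )


def near(text, first, second, max_distance=5):
    """Unordered proximity: the two words may appear in either order."""
    return (onear(text, first, second, max_distance)
            or onear(text, second, first, max_distance))


print(onear("cancel my order", "cancel", "order"))              # True
print(onear("order that I want to cancel", "cancel", "order"))  # False
print(near("order that I want to cancel", "cancel", "order"))   # True
```

The sketch ignores punctuation and lowercases the text before comparing, so the search words are expected in lowercase.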
Case insensitivity
Both Word and Quoted Term matchers are case-insensitive, e.g. the expression order will match "order", "Order", "ORDER" and "oRDER".
Escaping a quote character
To search for a quote symbol (") literally, repeat it twice ("").
Example:
"foo "" bar"
-
This expression matches the phrase foo " bar.
Note that the doubled quote "" is supported in a Quoted Term only. It is a syntax error to use a quote inside a Word matcher, like foo""bar, but it is valid in a Quoted Term matcher, like "foo""bar".
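When search expressions are built programmatically, the doubling rule is easy to apply with a small helper. The function name `quote_term` is hypothetical and not part of any MiaRec API; the sketch only shows the string transformation described above.

```python
def quote_term(phrase: str) -> str:
    """Wrap a phrase in quotes, doubling any quote characters inside it."""
    return '"' + phrase.replace('"', '""') + '"'


print(quote_term('foo " bar'))     # "foo "" bar"
print(quote_term("cancel order"))  # "cancel order"
```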
Simple (wildcard) pattern
A Word matcher supports wildcard pattern matching.
The following table describes wildcard patterns, listing the pattern and its use.
Pattern | Use | Example |
---|---|---|
\* | Match zero or more characters | bl* matches bl, black, blue, and blob |
? | Match exactly one occurrence of any character | h?t finds hot, hat, and hit |
[abc] | Match one occurrence of the character a, b, or c | h[oa]t finds hot and hat, but not hit |
[!az] | Match one occurrence of any character except a or z | h[!oa]t finds hit, but not hot or hat |
[a-c] | Match one occurrence of a character between a and c | c[a-c]t finds cat and cbt, but not cut |
Note
Wildcard patterns are supported in a Word matcher only. A Quoted Term interprets those symbols literally.
For example, bl* is a pattern match, but "bl*" is an exact match.
This difference between Word and Quoted Term matchers is useful when you need to search for one of the wildcard symbols literally in a text.
For example, to find an exclamation point in a text, use a Quoted Term expression, like "Great!".
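Python's standard `fnmatch` module happens to use the same wildcard symbols (`*`, `?`, `[abc]`, `[!abc]`, `[a-c]`), so it can be used to try patterns out before putting them into a query. This is only an analogy: it is an assumption that MQL wildcards behave identically in every edge case, and the `matching` helper is made up for this example.

```python
from fnmatch import fnmatchcase


def matching(pattern, words):
    """Return the words that match a shell-style wildcard pattern."""
    return [word for word in words if fnmatchcase(word, pattern)]


print(matching("bl*", ["bl", "black", "blue", "blob", "bold"]))
# ['bl', 'black', 'blue', 'blob']
print(matching("h?t", ["hot", "hat", "hit", "heat"]))   # ['hot', 'hat', 'hit']
print(matching("h[oa]t", ["hot", "hat", "hit"]))        # ['hot', 'hat']
print(matching("h[!oa]t", ["hot", "hat", "hit"]))       # ['hit']
print(matching("c[a-c]t", ["cat", "cbt", "cut"]))       # ['cat', 'cbt']
```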
Regular expression (REGEX) pattern
To match complex text patterns, use regular expressions (REGEX).
A regular expression must be enclosed between R" and " characters. Examples:
Pattern | Use |
---|---|
R"[0-9]+" | Match one or more digits in a text |
R"ship(ment\|ping)" | Match the words "shipment" and "shipping" |
To match a quote (") character in a regular expression, include it twice, like R"foo""bar".
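To experiment with a pattern outside MiaRec, the R"..." wrapper has to be removed and doubled quotes collapsed first. A minimal sketch follows, assuming the literal format described above; `parse_regex_literal` is a hypothetical helper and Python's `re` module stands in for the actual engine.

```python
import re


def parse_regex_literal(literal: str) -> str:
    """Strip the R"..." wrapper and turn each doubled quote into a single one."""
    if len(literal) < 3 or not (literal.startswith('R"') and literal.endswith('"')):
        raise ValueError('expected a literal of the form R"..."')
    return literal[2:-1].replace('""', '"')


digits = parse_regex_literal('R"[0-9]+"')
quoted = parse_regex_literal('R"foo""bar"')

print(digits)                                     # [0-9]+
print(quoted)                                     # foo"bar
print(re.search(digits, "order 12345").group())  # 12345
```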
MiaRec supports standard regular expression patterns.
A regular expression may use any of the following metacharacters:
.
-
Matches any single character. For example:
a.c
... will match "abc", but not "ac" or "abbc"
[]
-
A bracket expression. Matches a single character that is contained within the brackets. For example:
[abc]
... will match "a", "b" or "c".
[hc]at
... will match "hat" and "cat".
A - character between two other characters forms a range that matches all characters from the first character to the second. For example:
[0-9]
... will match any decimal digit.
[a-z]
... will match any lowercase letter from "a" to "z".
These forms can be mixed:
[abcx-z]
... will match "a", "b", "c", "x", "y" or "z".
To include a literal - character, write it first or last, for example, [abc-] or [-abc].
To include a literal ] character, place it immediately after the opening bracket [, for example, []abc].
[^ ]
-
Matches a single character that is not contained within the brackets. For example:
[^abc]
... will match any character other than "a", "b", or "c".
[^a-z]
... will match any single character that is not a lowercase letter from "a" to "z".
As above, literal characters and ranges can be mixed, like [^abcx-z].
*
-
Matches the preceding element zero or more times. For example:
ab*c
... will match "ac", "abc", "abbbc" etc.
[0-9]*
... will match "" (empty string), "0", "1", "2", "14", "502", "98541654", and so on (any combination of digits).
( )*
-
Matches zero or more instances of the character sequence specified inside the parentheses. For example:
(ab)*
... will match "", "ab", "abab", "ababab", and so on.
(1234)*
... will match "", "1234", "12341234", "123412341234", and so on.
+
-
Matches the preceding element one or more times. For example:
ba+
... will match "ba", "baa", "baaa", and so on.
0[0-9]+
... will match "00", "01", "02", "001", "01234", "09876543210", or any other combination of digits with a leading 0 and a minimum length of two characters.
?
-
Matches the preceding element zero or one time. For example:
ba?
... will match "b", or "ba", but not "baa"
0[0-9]?
... will match "0", "01", "02", "03", and so on.
|
-
The choice (aka alternation or set union) operator matches either the expression before or the expression after the operator. For example:
abc|def
... will match "abc" or "def".
(0|011)[1-9]+
... will match a phone number that starts with either 0 or 011, followed by one or more digits from 1 to 9.
{n}
-
Matches the preceding element exactly n times. For example:
a{3}
... will match "aaa", but not "a", "aa" or "aaaa"
[0-9]{5}
... will match "01234", "56789" or any other combination of digits that is exactly 5 characters long.
{m,n}
-
Matches the preceding element at least m and not more than n times. For example:
a{3,5}
... will match "aaa", "aaaa", "aaaaa", but not "aa" or "aaaaaaaa".
{m,}
-
Matches the preceding element at least m times. For example:
a{2,}
... will match "aa", "aaa", "aaaa", and so on.
^
-
Matches the beginning of a string. For example:
^[hc]at
... will match "hat" and "cat", but only at the beginning of the string
$
-
Matches the end of a string. For example:
[hc]at$
... will match "hat" and "cat", but only at the end of the string
^[hc]at$
... will match "hat" and "cat", but only when the string contains no other characters
\
-
The backslash (\) character is used for escaping metacharacters. For example:
1+2
... will match "12", "112", "11112", but not "1+2", because the plus character has a special meaning (see above).
1\+2
... will match exactly "1+2". In this example, the plus character is escaped with a backslash (\+).
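The metacharacters above can be checked quickly with Python's `re` module. This is a sketch under the assumption that MiaRec's engine follows the same standard syntax. Each entry lists a pattern, a sample text, and whether the pattern is expected to match the whole text; the sample texts are illustrative.

```python
import re

# (pattern, text, expected result of matching the whole text)
EXAMPLES = [
    (r"a.c", "abc", True),
    (r"a.c", "abbc", False),
    (r"[hc]at", "cat", True),
    (r"[^abc]", "a", False),
    (r"[^abc]", "d", True),
    (r"ab*c", "abbbc", True),
    (r"(ab)*", "abab", True),
    (r"ba+", "b", False),
    (r"ba?", "baa", False),
    (r"abc|def", "def", True),
    (r"a{3}", "aaaa", False),
    (r"a{3,5}", "aaaaa", True),
    (r"a{2,}", "a", False),
    (r"1+2", "1+2", False),   # unescaped + means "one or more 1s"
    (r"1\+2", "1+2", True),   # escaped + is a literal plus sign
]

# Collect every example whose actual result differs from the expected one.
mismatches = [
    (pattern, text)
    for pattern, text, expected in EXAMPLES
    if (re.fullmatch(pattern, text) is not None) != expected
]
print(mismatches)  # []

# ^ and $ tie a match to the start or end of the searched string.
starts_with_hat = re.search(r"^[hc]at", "hat trick") is not None
hat_in_middle = re.search(r"^[hc]at", "the hat") is not None
ends_with_cat = re.search(r"[hc]at$", "the cat") is not None
print(starts_with_hat, hat_in_middle, ends_with_cat)  # True False True
```

`re.fullmatch` is used so that each check applies to the entire sample text; a search-style match, which is how a pattern is applied to a longer transcript, would also succeed on substrings.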