In Part II of our introduction to VoiceXML 2.0 grammars, we
will learn how to use tokens, rules and operators to create grammars
that match natural utterances.
Grammars match spoken words or touch-tone digits. These words are
referred to as tokens. The simplest grammars are token strings
composed of one or more words. For example, we might create an
inline grammar that matches my first and last name.
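As a rough sketch of what such an inline grammar could look like in the XML form, the fragment below places a grammar directly inside a VoiceXML field. The field name, rule name, and prompt wording are assumptions made for illustration. The rule body uses <token> elements, and the commented alternative wraps the same sequence in an <item> element.

```xml
<!-- Sketch only: the field name "fullName" and the prompt wording are illustrative -->
<field name="fullName">
  <prompt>Please say your full name.</prompt>
  <!-- Inline grammar: defined inside the VoiceXML document itself -->
  <grammar type="application/srgs+xml" version="1.0" root="fullName" xml:lang="en-US">
    <rule id="fullName">
      <!-- Two tokens that must be spoken in this order -->
      <token>Jonathan</token> <token>Eisenzopf</token>
      <!-- Equivalent alternative using item:
           <item>Jonathan Eisenzopf</item> -->
    </rule>
  </grammar>
</field>
```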
Inline grammars are embedded within VoiceXML code instead of being
stored in external files. By default, plain words that aren't marked
with a special grammar symbol (ABNF) or set off by grammar markup
(XML) are treated as tokens. Therefore, we could have excluded the <token>
element in the XML grammar above and it would have performed exactly
the same.
In the example above, my full name is a sequence of tokens
(Jonathan followed by Eisenzopf). The Automatic Speech Recognition
(ASR) engine will only recognize my full name in the specified
order.
I've also included an optional method for encapsulating a
sequence of tokens using the <item> element. The end
result is identical to using the <token> element.
I wanted to show it to you now because we will be using the
<item> element later in the tutorial.
Grammars often consist of sub-grammars. This allows us to
define re-usable grammar components, such as a phone number.
These sub-grammars are included in other grammars via a rule
reference. A rule reference can point to a local rule, to a rule
in an external grammar file, or even to a rule on another
server on the Internet. For example, we may want to create a
sub-grammar that contains all possible first names and include it in
a top-level grammar:
<ruleref uri="#firstName"/> Eisenzopf
The grammar above references the local sub-grammar named firstName.
The sub-grammar is local because it's
contained in the same grammar file. However, we could also have
referenced the sub-grammar if it were in a different file by
specifying the full URI of the grammar file:
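A reference to a rule in another file might look like the following sketch. The host name and file name are placeholders rather than a real grammar location, and the referenced rule would have to be scoped as public in that file.

```xml
<!-- Hypothetical URI: the part before "#" locates the grammar file,
     the fragment after "#" names the rule inside it -->
<rule id="name">
  <ruleref uri="http://www.example.com/grammars/names.grxml#firstName"/> Eisenzopf
</rule>
```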
Grammar files consist of one or more grammar rules. Each rule is
defined by a unique name. Rule names cannot contain a period, colon,
or hyphen character and cannot be named NULL, VOID, or GARBAGE. Rule
names are also case sensitive. To continue expanding on the
name example above, let's create the rule referenced above as firstName.
$firstName = Jonathan;
The unique rule name in ABNF grammars is defined by the character
string to the right of the $ character. This particular rule
is very simple in that it will only match my first name, Jonathan.
XML grammars define rules using the <rule> element.
The unique rule name is contained in the id attribute.
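In the XML form, the same rule might be written as follows. This is a minimal sketch that mirrors the ABNF definition above.

```xml
<!-- The id attribute carries the rule name that other rules reference as #firstName -->
<rule id="firstName">Jonathan</rule>
```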
Grammar rule scope
By default, VoiceXML 2.0 grammar rules are private. This
means that rules can only be referenced within the same grammar
file. If we want a VoiceXML dialog or another grammar to reference
a grammar rule, we need to explicitly scope it as public.
public $firstName = Jonathan;
<rule id="firstName" scope="public">Jonathan</rule>
To scope a grammar rule as public or private in ABNF grammars,
prepend the keyword public or private to the rule definition.
For XML grammars, add the scope attribute to the <rule>
element and set its value to either public or private.
One-of (or lists)
So far, our name grammar is not very useful because it only
matches my first and last name. What we want to do is expand the
list of possible names so that the grammar also matches other first
names and other last names.
We can do this by creating a name rule that includes rule
references to a list of first names and a list of last names.
$name = $firstName $lastName;
$firstName = Jonathan | Jeff;
$lastName = Eisenzopf | Franklin | Smith;
As you can see, the pipe character is the delimiter for
alternative utterances in SRGS ABNF grammars. The lastName
rule will match one of Eisenzopf, Franklin or Smith.
For XML grammars, the <one-of> element may contain
one or more <item> elements, each of which contains a string (or
token sequence) for one alternative utterance.
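A sketch of the same three rules in the XML form is shown below as a complete grammar file. The header attributes (version, language, and root rule) are assumptions about a typical stand-alone SRGS file and are not taken from the original listing.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns="http://www.w3.org/2001/06/grammar"
         version="1.0" xml:lang="en-US" root="name">

  <!-- A first name followed by a last name -->
  <rule id="name">
    <ruleref uri="#firstName"/>
    <ruleref uri="#lastName"/>
  </rule>

  <!-- Each item is one alternative; exactly one of them is matched -->
  <rule id="firstName">
    <one-of>
      <item>Jonathan</item>
      <item>Jeff</item>
    </one-of>
  </rule>

  <rule id="lastName">
    <one-of>
      <item>Eisenzopf</item>
      <item>Franklin</item>
      <item>Smith</item>
    </one-of>
  </rule>
</grammar>
```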
The name grammar combines the firstName and then
the lastName grammar via rule references to create a full
name grammar, which is capable of recognizing a combination of first
and last names.
The list of possible utterances that the name grammar
could match is as follows:
- Jonathan Eisenzopf
- Jonathan Franklin
- Jonathan Smith
- Jeff Eisenzopf
- Jeff Franklin
- Jeff Smith
The Speech Recognition Grammar Specification (SRGS) also includes
some very useful operators that allow us to create complex word
patterns that reflect natural language by defining grammar tokens
and rule references as optional and/or repeatable.
Making Tokens Optional
Since callers may respond to a prompt using different words,
grammars must be able to define optional words. For example, when
asked to say their name, a caller might say any one of the following:
- "Um, my name is Jonathan"
- "My name is Jonathan Eisenzopf"
- "Um, yeah, well, I'm Jonathan Eisenzopf"
We need to be able to identify and capture all of the words that
were uttered in addition to the name. Also, notice that the caller
might only give us their first name, so the last name might also be
optional.
$name = [um [yeah well]] ([my name is] | [I'm]) $firstName [$lastName];
<item repeat="0-1">yeah well</item>
<item repeat="0-1">my name is</item>
For ABNF grammars, optional tokens are defined by surrounding them
with a set of square brackets. Optional tokens can also be grouped
and nested. In the grammar above, the outer set of brackets will
optionally match "um" or "um yeah
well". Following the first set of optional tokens, we have
used parentheses to group a list of optional alternative
utterances. The grouping operator (parentheses) is only used in
ABNF grammars because XML grammars express grouping explicitly
through their elements. This
part of the grammar will optionally match "my name is" or
"I'm" or nothing (because they're optional).
Additionally, this grammar will match a first name or a first and
last name because the $lastName rule reference is surrounded by square
brackets.
Unlike ABNF grammars, XML grammars use the repeat attribute
within the <item> element to define optional tokens and
rule expansions. This is done by setting the value of the repeat
attribute to 0-1, which means "zero or one." As you may
remember from earlier, the <item> element can be used to
encapsulate token sequences. In the example above, we enclose the
item containing the token sequence "yeah well" within the
item containing the token "um". The repeat
attribute in both is set to 0-1 which means that the optional
utterance "um" may be followed by the optional
utterance "yeah well". Next, we match the optional
list of utterances, "my name is" or "I'm"
by enclosing them in a <one-of> element. Lastly, we
include the firstName grammar with <ruleref>
and make the lastName grammar optional by enclosing the
associated <ruleref> in an <item> element
whose repeat attribute is set to 0-1.
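Putting that description together, the XML version of the rule could be sketched roughly as follows. This is a reconstruction from the prose rather than the original listing, and it assumes that the firstName and lastName rules are defined elsewhere in the same grammar.

```xml
<rule id="name">
  <!-- Optional "um", optionally followed by "yeah well" -->
  <item repeat="0-1">um
    <item repeat="0-1">yeah well</item>
  </item>
  <!-- Optional lead-in phrase: either alternative may be skipped -->
  <one-of>
    <item repeat="0-1">my name is</item>
    <item repeat="0-1">I'm</item>
  </one-of>
  <!-- The first name is required, the last name is optional -->
  <ruleref uri="#firstName"/>
  <item repeat="0-1"><ruleref uri="#lastName"/></item>
</rule>
```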
Zero or More
We can match zero or more instances of a token by appending the
repeat operator after an ABNF sequence or by setting the repeat
attribute of the <item> element to 0-.
$mood = I am very <0-> happy;
<rule id="mood">I am <item repeat="0-">very</item> happy</rule>
The repeat operator for ABNF grammars is enclosed in angle brackets (<>).
The syntax for XML grammars is very similar to that for defining optional
tokens.
The example grammar above will match any of the following
utterances:
- "I am happy"
- "I am very happy"
- "I am very very very very very happy"
One or More
This is very similar to zero or more except that the token must
occur at least one time in the utterance.
$mood = I am very <1-> happy;
<rule id="mood">I am <item repeat="1-">very</item> happy</rule>
So, the example above would match any of the following utterances:
- "I am very happy"
- "I am very very very happy"
But it would not match:
- "I am happy"
Token Ranges and Exact Matches
We can also specify a range of instances or an exact number of
instances that a token can occur in an utterance.
$eat = Please <1-5> eat your food;
$eat = Please <5> eat your food;
$eat = Please <5-> eat your food;
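<rule id="eat"><item repeat="1-5">Please</item> eat your food</rule>
<rule id="eat"><item repeat="5">Please</item> eat your food</rule>
<rule id="eat"><item repeat="5-">Please</item> eat your food</rule>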
The first example would match one to five instances of the word please.
The second example will match exactly five instances and the last
example will match at least five instances of the word please.
If you have absorbed the content of this tutorial, then you will
be able to create almost any VoiceXML 2.0 grammar that's required.
In the next tutorial, we will learn some of the finer details of
About Jonathan Eisenzopf
Jonathan is a Senior Partner of The Ferrum Group, LLC
which provides speech IVR consulting, training, and professional
services. Feel free to send an email to firstname.lastname@example.org
regarding questions or comments about this or any article.