VoiceXML 2.0 Grammars, Part I

Monday Oct 7th 2002 by Jonathan Eisenzopf

This technical series will provide programmers with a complete introduction to the VoiceXML 2.0 grammar format. In Part I, we will discuss the XML and ABNF formats, as well as the structure and header elements of a VoiceXML 2.0 grammar document.

Overview

Grammars define the words and sentences (or touch-tone DTMF input) that a VoiceXML application can recognize. One big drawback of VoiceXML 1.0 was that it lacked a standard speech recognition grammar format. To some degree, this reduced the benefits of the specification because it left VoiceXML browser developers to define their own grammar language and format. For example, application grammars written for Nuance Voice Web Server would have to be rewritten to work on IBM Voice Server. This problem was rectified by the Speech Recognition Grammar Specification (SRGS), introduced by the W3C Voice Browser Working Group in conjunction with the VoiceXML 2.0 specification.

XML or ABNF?

The VoiceXML 2.0 grammar specification provides two text formats for writing speech recognition grammars: XML and ABNF. XML is a Web standard for representing structured data. Many programming and editing tools incorporate XML editing and processing capabilities, and these tools can be used to write VoiceXML 2.0 grammars. ABNF stands for Augmented Backus-Naur Form, a format used to specify languages, protocols, and text formats. For example, HTTP, the communications protocol used on the World Wide Web (and for VoiceXML applications), is specified in ABNF.

The ABNF grammar format uses special characters to define grammar expressions in a text string, while XML grammars are composed of text strings enclosed in XML elements. Whether to use the ABNF or XML format is up to you; however, VoiceXML 2.0 requires implementers to support only the XML format. Therefore, you may want to write grammars in the XML format if portability is important to you.

If you're already experienced with the GSL or JSGF grammar formats, then you'll likely prefer the ABNF format because of its similarity. If you decide to use the XML format, you will quickly discover that it is extremely verbose compared to ABNF, making it more difficult to read. On the other hand, using the DTD or XML Schema for the XML grammar format in conjunction with an XML editor makes the task less tedious and reduces syntax errors. The authors of the VoiceXML 2.0 grammar format have also included an XSL style sheet for converting XML grammars to ABNF format, which may aid linguists who prefer to proof grammars in a less verbose text format.
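To make the verbosity trade-off concrete, here is a minimal sketch of a yes/no grammar in the XML format, with a roughly equivalent ABNF rule shown in a comment. The rule name topRule matches the headers used in this article; the yes/no content is an assumption chosen purely for illustration.

```xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<!-- Roughly equivalent ABNF rule (one line):
       public $topRule = yes | no;
-->
<grammar version="1.0"
  xmlns="http://www.w3.org/2001/06/grammar"
  xml:lang="en" mode="voice" root="topRule">
  <!-- Hypothetical rule: matches the single word "yes" or "no" -->
  <rule id="topRule" scope="public">
    <one-of>
      <item>yes</item>
      <item>no</item>
    </one-of>
  </rule>
</grammar>
```

The XML version spends several lines on markup that ABNF expresses with the | operator, which is the readability cost described above.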

Examples will be listed in both ABNF and XML format.

Grammar Headers

ABNF and XML grammar files must contain specific header information; otherwise, the VoiceXML interpreter will fail to recognize the grammar properly. The header elements of a grammar file are:

  • grammar declaration
  • language/locale
  • mode
  • root grammar
ABNF
#ABNF 1.0 ISO-8859-1;
language en;
mode voice;
root $topRule;
XML
<?xml version="1.0" encoding="ISO-8859-1"?>
<grammar version="1.0" 
  xmlns="http://www.w3.org/2001/06/grammar"
  xml:lang="en" mode="voice" root="topRule">
...

Grammar declaration

The grammar declaration specifies the grammar version and, optionally, the character encoding scheme that should be used. The grammar version should always be set to 1.0. The character encoding identifies the character set in which the grammar file is written. For example, ISO-8859-1 is commonly used for English and other Western European languages. Asian languages such as Japanese and Chinese would use a different encoding scheme (for example, Shift_JIS or Big5) or a Unicode encoding such as UTF-8. In ABNF grammars, the declaration is the first line. In XML, the encoding scheme is defined by the encoding attribute of the XML declaration (the first line of any XML file).

ABNF
#ABNF 1.0 ISO-8859-1;
language en;
mode voice;
root $topRule;
XML
<?xml version="1.0" encoding="ISO-8859-1"?>
<grammar version="1.0" 
  xmlns="http://www.w3.org/2001/06/grammar"
  xml:lang="en" mode="voice" root="topRule">
...

The grammar version in an XML grammar is defined by the version attribute of the <grammar> element.
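As a sketch of how the declaration changes for a non-Latin language, the following XML grammar declares UTF-8 as its encoding and Japanese as its language. The rule name and the two words (hai, "yes", and iie, "no") are assumptions for illustration, and the encodings a given platform accepts may differ. The file itself must be saved in the encoding that the XML declaration names.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<grammar version="1.0"
  xmlns="http://www.w3.org/2001/06/grammar"
  xml:lang="ja" mode="voice" root="topRule">
  <!-- Hypothetical rule: matches "hai" (yes) or "iie" (no) -->
  <rule id="topRule" scope="public">
    <one-of>
      <item>はい</item>
      <item>いいえ</item>
    </one-of>
  </rule>
</grammar>
```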

Language

Unless the grammar is a DTMF grammar, a language must be specified in the grammar header. For ABNF grammars, the language parameter defines the language (in this example, English; a tag with a region subtag, such as en-US, would select US English specifically):

ABNF
#ABNF 1.0 ISO-8859-1;
language en;
mode voice;
root $topRule;
XML
<?xml version="1.0" encoding="ISO-8859-1"?>
<grammar version="1.0" 
  xmlns="http://www.w3.org/2001/06/grammar"
  xml:lang="en" mode="voice" root="topRule">
...

The language in an XML grammar is specified by the xml:lang attribute of the <grammar> element.

Mode

Grammars can be scoped for speech input (voice) or touch-tone input (dtmf) based on the value of the mode parameter. The default mode is voice. If the grammar is scoped dtmf, speech input will not be recognized. VoiceXML 2.0 grammars do not allow mixed-mode grammars: a voice-scoped grammar cannot include a dtmf-scoped grammar, or vice versa. When an application needs to accept both voice and DTMF input, two separate grammars can be defined within a given VoiceXML scope, so long as they aren't combined into a single grammar in any way. In the headers that follow, the ABNF example declares voice mode and the XML example declares dtmf mode.

ABNF
#ABNF 1.0 ISO-8859-1;
language en;
mode voice;
root $topRule;
XML
<?xml version="1.0" encoding="ISO-8859-1"?>
<grammar version="1.0" 
  xmlns="http://www.w3.org/2001/06/grammar"
  xml:lang="en" mode="dtmf" root="topRule">
...
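As a minimal sketch of a complete dtmf-scoped grammar, the following XML file accepts a single key press of 1 or 2. The rule name and the accepted keys are assumptions for illustration. Because the mode is dtmf, the xml:lang attribute is left out here, in line with the language rule for DTMF grammars.

```xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<grammar version="1.0"
  xmlns="http://www.w3.org/2001/06/grammar"
  mode="dtmf" root="topRule">
  <!-- Hypothetical rule: one touch-tone key, either 1 or 2 -->
  <rule id="topRule" scope="public">
    <one-of>
      <item>1</item>
      <item>2</item>
    </one-of>
  </rule>
</grammar>
```

A caller who says "one" instead of pressing the key would not match this grammar; spoken input needs its own voice-scoped grammar.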

Root grammar 

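When a grammar file contains many sub-grammar rules, it's important to identify the root grammar, that is, the main rule that is activated when a VoiceXML dialog references the grammar file. In ABNF grammars, the root parameter names this rule; in XML grammars, the root attribute of the <grammar> element does.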

ABNF
#ABNF 1.0 ISO-8859-1;
language en;
mode voice;
root $topRule;
XML
<?xml version="1.0" encoding="ISO-8859-1"?>
<grammar version="1.0" 
  xmlns="http://www.w3.org/2001/06/grammar"
  xml:lang="en" mode="voice" root="topRule">
...
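To show what the root declaration selects, here is a sketch of a file with two rules, where the header names topRule as the entry point and topRule in turn references a sub-rule. The drink rule and its word list are hypothetical and chosen only for illustration.

```xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<grammar version="1.0"
  xmlns="http://www.w3.org/2001/06/grammar"
  xml:lang="en" mode="voice" root="topRule">
  <!-- Root rule: the one named by the root attribute above -->
  <rule id="topRule" scope="public">
    <!-- Optional leading word, then the referenced sub-rule -->
    <item repeat="0-1">please</item>
    <ruleref uri="#drink"/>
  </rule>
  <!-- Hypothetical sub-rule, reached only through topRule's ruleref -->
  <rule id="drink">
    <one-of>
      <item>coffee</item>
      <item>tea</item>
    </one-of>
  </rule>
</grammar>
```

With this grammar, a dialog that references the file without naming a rule would start matching at topRule, so utterances such as "coffee" or "please tea" are in scope.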

Filename Extensions

The recommended (but not required) filename extensions for grammar files are .gram for ABNF grammars and .grxml for XML grammars.
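As a sketch of how these files might be referenced, the following VoiceXML field loads a voice grammar and a separate DTMF grammar, both in the XML format. The file names, the form id, and the field name are hypothetical. The type values are the media types that SRGS associates with the two formats; a platform may also work out the format without them.

```xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="order">
    <field name="choice">
      <prompt>Say yes or no, or press 1 or 2.</prompt>
      <!-- XML-format voice grammar (hypothetical file name) -->
      <grammar src="yesno.grxml" type="application/srgs+xml"/>
      <!-- XML-format DTMF grammar kept in its own file (hypothetical) -->
      <grammar src="yesno-dtmf.grxml" type="application/srgs+xml"/>
      <!-- An ABNF grammar would instead be referenced as, for example,
           src="yesno.gram" with type="application/srgs" -->
    </field>
  </form>
</vxml>
```

Keeping the voice and DTMF grammars in separate files respects the rule that a single grammar cannot mix modes, while still letting the field accept either kind of input.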

About Jonathan Eisenzopf

Jonathan is a member of The Ferrum Group, LLC which specializes in Voice Web consulting and training. Feel free to send an email to eisen@ferrumgroup.com regarding questions or comments about this or any article.
