VoiceXML Conformance Report

Monday Dec 16th 2002 by Jonathan Eisenzopf

In this article, we will test six VoiceXML browsers for VoiceXML 2.0 conformance to determine how compatible today's VoiceXML platforms are with each other.

Defining Conformance

Getting a group of scientists to agree on something is a challenge. Getting a small group of toddlers to play quietly is even more challenging. Getting business people and scientists to agree on anything is nearly impossible. Similarly, getting companies to create conforming VoiceXML browsers that are compatible with each other has so far been impossible.

First, we need to define what conformance means within the context of VoiceXML. An application conforms to a standard when it fully implements the specification, matches its syntax, and follows its rules. For example, many off-the-shelf applications today are able to communicate with other programs on the network. How do they do that? Did all of these software companies work together to enable their programs to talk to one another? Well, no. They are able to communicate because they all use a common network protocol called TCP/IP (actually, that's two protocols, TCP and IP, but they have a close working relationship). The Internet Engineering Task Force (IETF) has been responsible for creating networking protocols for several years now.

Because applications share this common protocol, they can communicate over a network without understanding the communication mechanisms of every other application out there. This is a very powerful concept: programmers can build on existing technologies to create ever more comprehensive and powerful network applications. Imagine having to write your own communications layer every time you wrote a new application.

VoiceXML is in fact a technology that leverages several layers of standardized protocols that are used to transport messages between applications (in our case, a VoiceXML browser and a Web server).

 

The diagram above depicts the standards that a voice browser relies on to communicate.

Why Conformance is Important

I conducted an informal survey of people who are either evaluating VoiceXML platforms or have already implemented a VoiceXML solution.

When asked why they were considering VoiceXML, the most common responses were:

  • New technology
  • Based on open standards
  • Can move to a different platform later
  • Can extend Web applications

Next, I asked participants to rate a list of nine VoiceXML benefits from one to ten, one meaning that the benefit was not important at all and ten meaning that it was very important. The list was created based on a common set of expected benefits that my company, The Ferrum Group, typically gets from customers when they come to us to help them select and implement a speech IVR solution.

The top three VoiceXML benefits important to customers were:

  • Provides a wider variety of platform choices

  • Uses open standards

  • Can port applications to any other VoiceXML platform

Finally, I asked participants, "What is the most important benefit that you want to see from VoiceXML?"

The two most common answers were: 

  • Open Standards

  • Portability

While the survey was not scientific, the results did seem to indicate that customers were most interested in the benefits that come from using an open standard like VoiceXML. 

My conclusion as to why conformance is important is that customers naturally expect it as a byproduct of an open standard. Without conformance, the benefits of using an "open standard" are greatly diminished.

Conformance Test Suite

The next step in my study was to test VoiceXML conformance across a range of VoiceXML browsers, using only the VoiceXML 2.0 specification and the Speech Recognition Grammar Specification (SRGS) as guidelines for creating the test source code. I did not refer to any VoiceXML or SRGS documentation from any of the platform providers.

Purpose

The purpose of this test was to determine how many platform providers that claim VoiceXML 2.0 support are actually able to run compliant code without requiring additional modifications. 

Development Tools

Instead of using a proprietary VoiceXML tool, I decided to use XML Spy and the DTDs provided by the Voice Browser Working Group (which are linked within the VoiceXML 2.0 and SRGS 1.0 specifications). This ensured that:

  • Code that was created was platform independent
  • Code was validated against the official DTDs

Test Source Code

For the test, I developed a minimal VoiceXML application that consisted of:

  • One VoiceXML form to gather a social security number
  • One SRGS XML DTMF grammar

VoiceXML Source Code

<?xml version="1.0"?>
<!DOCTYPE vxml PUBLIC "-//W3C//DTD VOICEXML 2.0//EN"
"http://www.w3.org/TR/voicexml20/vxml.dtd">
<vxml version="2.0" 
      xmlns="http://www.w3.org/2001/vxml">
   <form id="ssn">
      <field name="ssn_number">
         <grammar src="ssn_dtmf.grxml" mode="dtmf" 
                  type="application/srgs+xml"/>
         <prompt bargein="true">Please enter 
         your social security number</prompt>
         <filled>
            <prompt>You entered 
               <value expr="ssn_number"/>
            </prompt>
            <clear namelist="ssn_number"/>
            <goto next="#ssn"/>
         </filled>
      </field>
      <catch event="nomatch noinput">
         <reprompt/>
      </catch>
   </form>
</vxml>

SRGS XML Grammar Source Code

<?xml version="1.0"?>
<!DOCTYPE grammar PUBLIC 
 "-//W3C//DTD GRAMMAR 1.0//EN"
 "http://www.w3.org/TR/speech-grammar/grammar.dtd">
<grammar mode="dtmf" version="1.0" xml:lang="en-US" 
   root="ssn" 
   xmlns="http://www.w3.org/2001/06/grammar">
   <rule id="ssn" scope="public">
      <ruleref uri="#digit"/><ruleref uri="#digit"/>
      <ruleref uri="#digit"/><ruleref uri="#digit"/>
      <ruleref uri="#digit"/><ruleref uri="#digit"/>
      <ruleref uri="#digit"/><ruleref uri="#digit"/>
      <ruleref uri="#digit"/>
   </rule>
   <rule id="digit" scope="private">
      <one-of>
         <item>1</item><item>2</item><item>3</item>
         <item>4</item><item>5</item><item>6</item>
         <item>7</item><item>8</item><item>9</item>
         <item>0</item>
      </one-of>
   </rule>
</grammar>

Conformance Results

The results of the conformance test for the 6 platforms are listed below. The good news is that 3 out of 6 platforms executed the code. The bad news is that 3 of the 6 platforms didn't.

While Browser 4 and Browser 6 didn't execute the code, the changes required to make it work were minimal. However, for the sake of the test, the code either worked or it didn't. To be fair, I did go to the trouble of troubleshooting what needed to change to allow the code to run. This information is detailed below.

Browser 4

To make the code work on Browser 4, I had to change the DTD reference from W3C to one provided by the vendor.

This is a minor change, and it is acceptable when you want to use browser-specific extensions. However, the browser should still be capable of running generic VoiceXML code that uses the default W3C DTD.

The second change that I had to make was to change the MIME type in the type attribute of the <grammar> element to:

application/grammar+xml

This is forgivable because the VoiceXML specification only provides an example of what the MIME type might be rather than stating what it must be.
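For reference, the second change amounts to a one-attribute edit in the test form. The fragment below is only a sketch derived from the test source above. The vendor-specific DOCTYPE is deliberately not shown, because its identifier depends on the vendor and is not given in this report.

```xml
<!-- Browser 4 variant of the <grammar> element from the test form.
     Only the type attribute differs from the original test source. -->
<grammar src="ssn_dtmf.grxml" mode="dtmf"
         type="application/grammar+xml"/>
```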

Browser 5

Browser 5 was more difficult. I gave up troubleshooting the problem after spending an hour trying to figure it out.

Browser 6

Like Browser 4, Browser 6 required a different DTD.

Also, as with Browser 4, the MIME type in the type attribute of the <grammar> element needed to be changed to:

application/grammar+xml

The third and final change was to remove the DOCTYPE declaration that references the SRGS grammar DTD from the grammar file. It took me a while, working by process of elimination, to discover the solution to this particular problem.
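As an illustration of the third change, here is a sketch of the test grammar with the DOCTYPE declaration removed. It assumes nothing else in the grammar file was altered. The rules are copied unchanged from the SRGS test grammar.

```xml
<?xml version="1.0"?>
<!-- Browser 6 variant: identical to the test grammar, minus the DOCTYPE -->
<grammar mode="dtmf" version="1.0" xml:lang="en-US"
   root="ssn"
   xmlns="http://www.w3.org/2001/06/grammar">
   <!-- A social security number is exactly nine DTMF digits -->
   <rule id="ssn" scope="public">
      <ruleref uri="#digit"/><ruleref uri="#digit"/>
      <ruleref uri="#digit"/><ruleref uri="#digit"/>
      <ruleref uri="#digit"/><ruleref uri="#digit"/>
      <ruleref uri="#digit"/><ruleref uri="#digit"/>
      <ruleref uri="#digit"/>
   </rule>
   <!-- A single keypad digit, 0 through 9 -->
   <rule id="digit" scope="private">
      <one-of>
         <item>1</item><item>2</item><item>3</item>
         <item>4</item><item>5</item><item>6</item>
         <item>7</item><item>8</item><item>9</item>
         <item>0</item>
      </one-of>
   </rule>
</grammar>
```

Without a DOCTYPE the file can still be checked for well-formedness, but an XML editor can no longer validate it against the W3C grammar DTD.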

VoiceXML DTD Problems

During the testing process, I noticed that several code checking tools offered by the platform vendors consistently complained about the W3C DTD referenced in the VoiceXML test program. One of the VoiceXML contributors later confirmed that the DTD listed in the specification contained errors, which would be fixed soon. This may or may not have contributed to the fact that Browser 4 and Browser 6 required a different DTD since some XML parsers would not have been able to validate VoiceXML source code using the W3C DTD.

Testing Tool Validation Problems

One thing I noticed as I was testing the various platforms is that the source code validators offered by the vendors often gave false positive results. When I tested a VoiceXML program that I had intentionally broken, the majority of the tools reported the code to be valid even though it would not work when I dialed into the application. This made the troubleshooting process all the more difficult. The validators for Browser 2 and Browser 3 were the only ones that accurately identified problems in the source code.

Conclusions

I spent about 60 minutes troubleshooting each of the three platforms that didn't run the VoiceXML test program, and I was only able to figure out how to fix the problem on two of them. The fact that debugging output was not very helpful most of the time meant that I had to resort to fixing problems through the process of elimination, which is very time consuming. These code validators need to do a better job of inspecting element data and attributes in addition to validating the code against a DTD.

From my perspective, it would be better to use a proprietary standard that was supported by a wide range of vendors whose platforms achieved interoperability and conformance than to use an "open standard" in which implementations were inspired by the standard rather than conforming to it.

Unless ALL VoiceXML platforms are able to run compliant code, VoiceXML will not be portable, will not meet customers' expectations, and will therefore not be very useful. If this test of 6 browsers is a general indication that only 50% of the available platforms are VoiceXML compliant, then customers need to be careful to test platforms for compliance before making a final decision.

In the future, I plan on extending the VoiceXML 2.0 test script to exercise the rest of the specification and also plan to expand the number of platforms that will be tested. If you have ideas or recommendations on what the test script should contain or would like to recommend VoiceXML gateways that you'd like to see tested, please send me an email with that information.

About Jonathan Eisenzopf

Jonathan is a Senior Partner of The Ferrum Group, LLC, which provides speech IVR consulting, training, and voice user interface design. Feel free to send an email to eisen@ferrumgroup.com regarding questions or comments about this or any article.
