Review: IBM WebSphere Voice Toolkit

by Hitesh Seth

This week we review the IBM WebSphere Voice Toolkit.


IBM WebSphere Voice Toolkit is a complete, integrated VoiceXML application development platform. Based on IBM's next-generation open source Eclipse IDE (Integrated Development Environment) platform (http://www.eclipse.org), Voice Toolkit includes traditional rich IDE features such as project/view management and source code management (SCM), with integration with SCM tools such as CVS. It also includes an integrated set of VoiceXML tools: application generation wizards, a VoiceXML editor, grammar development and testing tools, debugging, development of both static VoiceXML content and J2EE (Java 2 Enterprise Edition) based dynamic applications, and a broad set of reusable dialog components. Voice Toolkit supports VoiceXML 1.0 based application development using multiple grammar formats. In the rest of the article, we review the key features of the Toolkit and what it brings to the table for VoiceXML application development.


Installing the WebSphere Voice Toolkit requires three components: the Voice Toolkit itself, the WebSphere Voice Server SDK and the IBM Reusable Dialog Components. Another component, the Voice Application Debugger (reviewed later in the article), is currently in beta and is optional, but it adds an important step-by-step debugging facility. The Voice Server SDK includes desktop versions of the IBM TTS (Text-to-Speech) and IBM ViaVoice ASR (Automatic Speech Recognition) engines. All these components are available for download for Windows 2000 based development environments from the IBM Voice Systems homepage (see the Resources section).

First Looks: The VoiceXML Editor

Perhaps the most basic, common and useful feature of any VoiceXML IDE is a VoiceXML editor. Voice Toolkit's VoiceXML editor is based on a generic XML editor but has features that are useful to a VoiceXML application developer, such as content assist, bookmarks and tasks. Particularly interesting is the content assist feature, which, through either a context-sensitive drop-down menu or a hotkey (Ctrl-Space bar), provides a list of the possible VoiceXML tags and attributes. Content assist is driven by the DTD (document type definition) of the VoiceXML specification (as shown in the figure below; click the figure to see the complete IDE). It is also customizable through macros, which can be created for tags, attributes and attribute values.

Pronunciation Builder

Apart from development tools for the VoiceXML world, IBM's forte in speech systems is the capability to execute and host voice applications (that is, to function as a VoiceXML gateway) with products such as IBM WebSphere Voice Server, and integration with IVR (Interactive Voice Response) platforms such as DirectTalk. VoiceXML currently has no standard for representing pronunciations. However, Pronunciation Builder (screenshots - 1, 2), a component of the Voice Toolkit, allows the developer to compose IPA (International Phonetic Alphabet) based pronunciations of unknown words (such as uncommon names or words typically said in a different fashion). For instance, you could change the default pronunciation of J2EE to be "J 2 double e" (represented in IPA as "ʤeɪ tu ˈdʌ.bəl i") instead of the standard "j 2 e e" (represented in IPA as "ʤeɪ tu i i"). The tool automatically adds a reference to the composed pronunciation into the VoiceXML document using IBM's VoiceXML extension tag "<ibmlexicon>" as shown in the following code snippet. The IBM ViaVoice Text-to-Speech Engine then uses these composed pronunciations to produce correctly pronounced synthesized speech.

Audio Recorder

One of the best practices in early VoiceXML application development is to keep synthesized Text-to-Speech to a minimum. Instead, pre-recorded prompts for dialog introductions give the application personality. For integrated development of audio prompts, Voice Toolkit includes a fairly basic audio recorder (shown below), which allows a developer to record and edit .au/.wav audio prompts that can be used for development and later for deployment.

Voice Application Debugger

As developers of Java/C++/Visual Basic/Web applications, we are all used to debugging applications: the traditional breakpoints, step-by-step walkthroughs, variable watches, interactions, etc. VoiceXML, being a dialog-based system, lends itself to the traditional programming paradigm; the major differences are that inputs and outputs can be voice (pre-recorded or generated) and that subroutines are sub-dialogs. Voice Application Debugger, a utility released by IBM's alphaWorks division, integrates this step-by-step debugging methodology into the Voice Toolkit. The debugger adds a menu item called "Debug VoiceXML", which starts the debugger with the VoiceXML document currently being edited. The debugger (shown below) also supports debugging of remote (URL-based) VoiceXML applications.

Grammar Development

Your VoiceXML application is only as rich as the grammar it supports. Grammar development and testing is perhaps both the most difficult and the most important part of developing VoiceXML applications. A number of grammar formats are used by VoiceXML gateways and hosted voice portals, including JSGF (Java Speech Grammar Format), BNF (Backus-Naur Form) and XML-based grammar formats. IBM WebSphere Voice Toolkit supports development, testing and inter-conversion of JSGF and BNF based grammars. Two important grammar development functions included in the toolkit are a wizard for generating possible utterances (screenshot) and another for testing a grammar (shown below) against any text or speech based utterances. Features that I would like to see in the next version in this area are a visual (graph-like) representation of the grammar and support for the upcoming XML-based grammar specification.
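To make the idea of generating possible utterances concrete, here is a minimal Java sketch. It is not the toolkit's wizard or its algorithm; it assumes a deliberately simplified, flat grammar (a sequence of slots, each with alternative phrases), and the class name UtteranceEnumerator and the sample phrases are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the Voice Toolkit: expands a flat grammar
// (a sequence of slots, each with alternative phrases) into every utterance.
final class UtteranceEnumerator {

    // Each inner list holds the alternatives for one slot; an empty string
    // among the alternatives marks the slot as optional.
    static List<String> enumerate(List<List<String>> slots) {
        List<String> results = new ArrayList<>();
        results.add("");
        for (List<String> alternatives : slots) {
            List<String> next = new ArrayList<>();
            for (String prefix : results) {
                for (String alt : alternatives) {
                    if (alt.isEmpty()) {
                        next.add(prefix);             // optional slot skipped
                    } else if (prefix.isEmpty()) {
                        next.add(alt);                // first word, no leading space
                    } else {
                        next.add(prefix + " " + alt);
                    }
                }
            }
            results = next;
        }
        return results;
    }

    public static void main(String[] args) {
        // Roughly the JSGF expansion: [please] (show | get) (weather | forecast)
        List<List<String>> slots = Arrays.asList(
                Arrays.asList("", "please"),
                Arrays.asList("show", "get"),
                Arrays.asList("weather", "forecast"));
        for (String utterance : enumerate(slots)) {
            System.out.println(utterance);
        }
    }
}
```

Listing every utterance like this is only practical for small grammars, but it is a quick way to spot phrases a grammar accepts that you did not intend.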

View Grammar Test Tool Screen Shot

Reusable Dialog Components

A key highlight of VoiceXML is that it truly integrates the web application development world with interactive speech-based telephony applications. However, speech application development isn't easy. It involves creating complex dialogs for all the possible voice interactions. For instance, even a simple dialog that gets a valid US state as input requires a grammar that lists all the states.
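As a rough illustration of that grammar-writing burden, the Java sketch below assembles a JSGF grammar from a list of state names. The class StateGrammarBuilder, the grammar and rule names, and the abbreviated state list are all hypothetical; this is not how IBM's components are built.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: builds a one-rule JSGF grammar whose public rule
// accepts any one of the given phrases.
final class StateGrammarBuilder {

    static String buildJsgf(String grammarName, String ruleName, List<String> phrases) {
        StringBuilder sb = new StringBuilder();
        sb.append("#JSGF V1.0;\n");                                // JSGF self-identifying header
        sb.append("grammar ").append(grammarName).append(";\n");
        sb.append("public <").append(ruleName).append("> = ");
        sb.append(String.join(" | ", phrases));                    // alternatives separated by |
        sb.append(";\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        // Only three states shown; a real grammar would list all of them.
        List<String> states = Arrays.asList("alabama", "alaska", "new york");
        System.out.print(buildJsgf("states", "state", states));
    }
}
```

Even this toy version hints at the real work involved: every state, plus prompts, confirmation and error handling, has to be written and tested by hand unless a ready-made component supplies them.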

IBM's Reusable Dialog Components are an extensive set of pre-built dialogs that can be used within VoiceXML applications as sub-dialogs or templates. Currently included in the 2.0 release are subdialogs (with their associated grammars and VoiceXML code) for recognizing alphanumeric characters, selecting elements from a list and confirmation, and for processing input for credit card numbers/expiration dates, currency, dates, directions, durations, email addresses, numbers, social security numbers, street types, telephone numbers, time, URLs, major US cities, US states and time zones.

Reusable Dialog Components also include another, smaller set of components, known as VoiceXML code templates, which represent a templated complex dialog flow created by combining multiple reusable subdialogs. For instance, the included address template can be used to get a user's address information. This template uses the Alpha, AlphaNumeric, Confirmation, Number, Street Type, US Major City, US Postal Code and US State subdialog components. Other templates included are for recognizing credit card information, a date range, a name and a time range. Reusability, although a simple concept, isn't easy to implement. Creating a library of reusable components is one thing; using those components easily in an application is another challenge. Voice Toolkit simplifies reuse of its extensive dialog components by providing a wizard-based approach (shown in the figures below) for adding a dialog component to a VoiceXML-based application.

Figure: Select a Reusable Dialog Component

Figure: Customize the parameters of the component

The following code snippets show the code that is generated by the wizard. The example below uses the US Postal Codes dialog component to get a valid postal code as input and pass it on to the rest of the application.

view code example 1

With just a few changes, the generated dialog can be turned into a complete application component. Once you start using the available dialog components, you quickly recognize their value and the time and effort they save in developing fully functional VoiceXML applications through an assembly of components and business logic.

view code example 2

Dynamic Application Development

Apart from the development of static VoiceXML documents in the VoiceXML editor (which can also be used as templates for dynamic content), Voice Toolkit also provides dynamic VoiceXML application generation based on J2EE (Java 2 Enterprise Edition) Web application development. The tool provides two basic wizards for JSP/Servlet based dynamic application generation: Database Web Page and Java Bean Web Page (shown below). Even though these wizards are quite basic, they provide a great deal of help in getting the basic application template ready.

view the Application Wizard

For instance, the Java Bean Web Page wizard generates a starter VoiceXML-based application based on a Java Bean. Let's complete our example from the reusable dialog component section. We create an example/prototype bean called "Weather" (this bean always returns the same weather for all postal codes; an actual bean however would call a web service from weather.com or some such service and get the actual weather for the postal code).
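A prototype bean along these lines might look like the following minimal sketch. It is not the article's original listing: the property names postalCode and forecast and the canned forecast text are assumptions made for illustration.

```java
import java.io.Serializable;

// Hypothetical prototype bean: the property names and the forecast text are
// assumptions, not the original listing. Declared package-private so the
// sketch compiles in any file; a real bean would normally be a public class
// in its own Weather.java.
class Weather implements Serializable {
    private static final long serialVersionUID = 1L;

    private String postalCode = "";

    // JavaBeans need a public no-argument constructor.
    public Weather() {
    }

    public String getPostalCode() {
        return postalCode;
    }

    public void setPostalCode(String postalCode) {
        this.postalCode = postalCode;
    }

    // Prototype behavior: the same forecast for every postal code. A real
    // bean would look up the forecast for postalCode from a weather service.
    public String getForecast() {
        return "Sunny with a high of 75 degrees";
    }

    public static void main(String[] args) {
        Weather weather = new Weather();
        weather.setPostalCode("10001");
        System.out.println(weather.getPostalCode() + ": " + weather.getForecast());
    }
}
```

Following the JavaBeans getter/setter naming convention matters here, because bean-driven tools typically discover properties through those method names.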

view bean example 1

Given a reference to the Weather bean, the Java Bean Web Page wizard generates the following JSP/Servlet based VoiceXML application:


view bean example 2


view bean example 3

Similarly, the Database Web Page wizard generates a starter VoiceXML application from a SQL query. Voice Toolkit generates all applications based on the JavaServer Pages and Java Servlet specifications. Using the toolkit's support for either an embedded Apache Tomcat or IBM WebSphere Application Server, the application can be deployed remotely or locally and tested as well (using the Voice Application Debugger).


In summary, IBM WebSphere Voice Toolkit represents a comprehensive and integrated set of tools for VoiceXML application development. What sets it apart from the competition is the ease of voice application debugging, dynamic application generation from Java Beans and relational databases and, most importantly, the extensive set of reusable dialog components. What I would like to see in coming versions is support for the features of the VoiceXML 2.0 standard (currently in draft stage) and graphical grammar creation tools.


About Hitesh Seth

Hitesh Seth is Chief Technology Evangelist for Silverline Technologies, a global eBusiness and mobile solutions consulting and integration services firm. He is a columnist on VoiceXML technology in XML Journal and regularly writes for other technology publications including Java Developer’s Journal and Web Services Journal on technology topics such as J2EE, Microsoft .NET, XML, Wireless Computing, Speech Applications, Web Services & Integration. Hitesh received his Bachelors Degree from the Indian Institute of Technology Kanpur (IITK), India. Feel free to email any comments or suggestions about the articles featured in this column at hitesh.seth@silverline.com.

This article was originally published on Sunday Oct 20th 2002