VoiceXML Developer Series: A Tour Through VoiceXML, Part IX

by Jonathan Eisenzopf

In all of the examples thus far, we have collected input but stopped short of submitting the form fields to a back end script. In this edition of the VoiceXML Developer, we will complete our pizza ordering application by accepting the order, logging the transaction in an Access database, and playing an order confirmation for the user. We will also utilize many of the techniques that we have learned over the course of this tutorial to create a high quality voice application.


It's time to put all of the skills you've learned to use. We are going to start by designing a dialog flow. Then we will develop an application architecture and data model based upon the dialog flow. From the dialog flow, we will develop our VoiceXML forms and add the back-end scripting that will allow the system to recognize and create customer records as well as add a pizza order to a table in an Access database. We will be using PerlScript for Active Server Pages to perform SQL queries. We could also have used PHP, ColdFusion, or JSP, but I'm most familiar with Perl and prefer its text processing capabilities, which come in handy when we're parsing text representations of user utterances.

Creating a dialog flow

I used Microsoft Visio to create a dialog flow that contains all of the prompts, decision trees, and processing instructions that are required for this application. I've split the whole diagram into two separate pieces. The first piece contains the flow for getting the customer's phone number and address. The flow starts by playing a greeting, followed by a prompt asking the user for their phone number. The system then speaks the number back and asks the customer to confirm that the number that was recognized is the number they uttered. If it's not, the system will prompt the customer for their phone number again until the customer confirms that the correct phone number was recognized.

When the customer does confirm that we have the correct number, we submit the VoiceXML form to a back-end ASP script, which uses the phone number as a key to look up the customer's record in the Access database. If no record exists for that number, we create one and prompt the user for their address. Their address is recorded as a .wav file and submitted to another ASP script, which saves the file and associates it with the customer's record.

If the customer has previously used the system to place an order and they provide the same phone number, the system will ask the user to confirm the address on record or provide a new one. If the customer confirms the address, we move on to the next dialog flow, which takes the customer's order. If the customer does not confirm the address, the system prompts the user to record their address, and the resulting .wav file is saved and associated with their customer record.

View the Dialog Flow image

Now that we have confirmed a valid phone number and address, we move to the second part of the dialog flow, which prompts the customer for their pizza order. Notice that our prompt is, "May I take your order please." This is a rather open-ended question, so we need to be ready to handle many different utterances that the user might provide. If the range of utterances is too wide to cover, we may need to provide a more specific prompt that lists the options that the customer can select from. The slots (or form fields) that we need to fill to place a pizza order are the pizza size, the pizza type, and the pizza toppings. Because this will be a mixed-initiative dialog, the customer could fill all of the slots with a single utterance like, "I would like a small hand tossed pizza with pepperoni and mushrooms". On the other hand, they might just say, "I'd like a small pepperoni pizza". So we need to be able to handle cases where the customer does not fill all of the slots by providing directed prompts for those fields that were not filled. In the last example utterance, we would want to prompt the customer for a pizza type.
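The slot-filling idea described here can be sketched in a few lines of Python. This is only an illustration of the logic, not the VoiceXML we will write for the application: the vocabulary lists, the function names, and the naive substring matching are all assumptions standing in for a real speech grammar.

```python
# Hypothetical vocabularies; a real application would define these in a grammar.
SIZES = ["small", "medium", "large"]
TYPES = ["hand tossed", "thin crust", "deep dish"]
TOPPINGS = ["pepperoni", "mushrooms", "sausage", "onions"]


def fill_slots(utterance):
    """Return the slots that a single utterance fills (naive substring match)."""
    text = utterance.lower()
    slots = {"size": None, "type": None, "toppings": []}
    for size in SIZES:
        if size in text:
            slots["size"] = size
    for pizza_type in TYPES:
        if pizza_type in text:
            slots["type"] = pizza_type
    slots["toppings"] = [topping for topping in TOPPINGS if topping in text]
    return slots


def missing_slots(slots):
    """List the slots that are still empty and need a directed prompt."""
    return [name for name, value in slots.items() if not value]
```

With this sketch, the first example utterance fills every slot, while "I'd like a small pepperoni pizza" leaves only the type slot empty, so that is the one directed prompt the caller would hear.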

Once all of the slots have been filled, we want to play the order back to the customer and have them confirm the order. If the customer says "Yes", then we thank them for the order, record the order in the Access database, and end the call. If the customer says "No", we need to go back and prompt them for their order again until we get it right.

View the 2nd Dialog Flow image

And that's our dialog flow. It's a good idea to map it out on paper or in some kind of diagram so that you or your customer can review the call flow and ensure that it will fully satisfy the business and customer needs.

Designing the application architecture

The next thing we need to do is develop our application architecture, which will identify the dialog flow from a programmatic perspective and specify how the application will be sliced up. First, let's see if we can identify some application components that we can segment off as a VoiceXML document, form, or subdialog. We should also see if we can identify common functions that can be componentized and reused in several places.

  • The first obvious component is the root document, which will contain our global variables and perhaps even play the greeting. All other VoiceXML documents will point to this file as their application root in the application attribute of the <vxml> element.
  • The second component I see is a module that prompts for the customer's telephone number. This module will include the prompts, error handling, and validation for a telephone number.
  • As we look over the dialog flow, it also seems that we are asking the customer for yes or no answers on several different occasions. We can create one subdialog and reuse it for all of these confirmation requests.
  • The next identifiable component accepts a telephone number as a parameter, queries the Access database for a matching address, and returns the address. If the address doesn't exist, the subdialog will prompt the customer for an address, create a new record, and return the results. If the customer does have a record, we want to confirm the address and have the customer record a new one if the one on record is incorrect.
  • The largest and most complex component is of course the part that takes the actual order. It needs to be able to fill all the slots in a single utterance, and also prompt for the slots that didn't get filled in the first utterance.
  • And finally, the last component takes the pizza order, saves it to the Access database, thanks the customer for their order, and ends the call.

What we should do next is create a single diagram that encapsulates the dialog flow and the dialog components. In this diagram, we also want to identify the names of the VoiceXML files, the form and field names, and any subdialogs that will be utilized in the application. The diagram below represents the major dialogs in the application. Each dialog will be contained within a .vxml or .asp file. The .asp files will contain PerlScript code that queries the Access database.

    View the "High Level" Architecture image

    You can see the same general dialog flow above as it exists in our other diagrams, but now we've taken a step towards turning a call flow diagram into a VoiceXML application. We have a lot more to do yet. For each dialog in the application above, we need to specify, in voice dialog and VoiceXML terms, the prompts, forms, and fields that exist in each, and how the dialogs transition from one to another. We can do this by breaking the design down into more detailed dialog flow diagrams.


    The main.vxml dialog is fairly simple. We initialize the phone_number variable, which will be used throughout the application, and play a welcome message. Then we transition to telephone_number.vxml.


    The telephone_number.vxml dialog is responsible for capturing the user's telephone number. The user is prompted for their phone number in the telephone_number form using the phone.grammar grammar file. Once the user provides a valid phone number, the value is saved in the phone_number variable and the dialog transitions to the confirm_phone_number form, which plays the number back to the user and asks them if it is correct. The user's answer is captured by the yes_or_no.vxml subdialog. If the user responds no, they are sent back to the telephone_number form, where they are prompted for their phone number again. Once the user confirms the number by uttering "yes", the dialog transitions to validate_phone_number.asp.
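The prompt-and-confirm loop in this dialog can be modeled as follows. This is a minimal Python sketch of the control flow only, not the VoiceXML form itself; the function name and the idea of feeding it lists of recognition results and answers are assumptions made so the loop can be run without a voice platform.

```python
def collect_phone_number(recognized_numbers, confirmations):
    """Simulate telephone_number -> confirm_phone_number until the caller says yes.

    recognized_numbers: recognition results, one per phone number prompt
    confirmations: "yes" or "no" answers, one per confirmation prompt
    """
    answers = iter(confirmations)
    for phone_number in recognized_numbers:
        # Play the number back and ask whether it is correct.
        if next(answers, "no") == "yes":
            return phone_number  # confirmed: move on to validate_phone_number.asp
        # Otherwise fall through and prompt for the number again.
    return None  # the caller never confirmed a number
```

For example, a caller whose first attempt is misrecognized and rejected, and whose second attempt is confirmed, ends up with the second number.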

    View the "Telephone Number" Architecture image


    The validate_phone_number.asp dialog is different from the others we've looked at so far, because it is an ASP script rather than a static VoiceXML document. It mixes PerlScript code that queries the Access database with VoiceXML content. The first thing we do in this dialog is query the database for the phone number that the user provided in the telephone_number.vxml dialog. If a record doesn't exist, the resulting VoiceXML output will transition the dialog to the record_address form, which records the user's address and submits it along with the phone number to save_address.asp. If we do find an address that goes with the phone number, we transition the dialog to the confirm_address form, which plays the address and asks the user to confirm that it is indeed the correct address. If the user says yes, we proceed to the take_order.vxml dialog. Otherwise, we transition the dialog to the record_address form so that they can record a new address.
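The branching in this dialog is easy to get wrong, so here it is written out as a small Python function. It is a sketch under stated assumptions rather than the PerlScript the application uses: the customers mapping stands in for the Access lookup, and the function and parameter names are hypothetical.

```python
def next_step(customers, phone_number, confirmed=None):
    """Return the form or dialog the caller is sent to next.

    customers: hypothetical mapping of phone number -> recorded address .wav path
    confirmed: the caller's answer to the address playback (True/False),
               or None if the address has not been played back yet
    """
    address_wav = customers.get(phone_number)
    if address_wav is None:
        return "record_address"   # no record yet: record a new address
    if confirmed is None:
        return "confirm_address"  # play the address and ask for confirmation
    if confirmed:
        return "take_order.vxml"  # address is correct: take the order
    return "record_address"       # address is wrong: record a new one
```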

    View the "Validate Phone Number" Architecture image


    In the case where we need to either create a new record with an address or update the address for an existing phone number, we take a detour to the save_address.asp script to submit the changes to the Access database before we proceed to the take_order.vxml dialog. To do this, we grab the phone number and .wav audio file from the ASP Request object, do an insert or update on the proper Access table, and transition on to the next dialog. There isn't any user interaction happening in this dialog at all.
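The insert-or-update step can be sketched like this. The sketch uses Python's sqlite3 module with an in-memory database as a stand-in for Access and ADO, and the customers table, its column names, and both function names are assumptions made for illustration.

```python
import sqlite3


def open_demo_db():
    """Create an in-memory database with a hypothetical customers table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers (phone_number TEXT PRIMARY KEY, address_wav TEXT)"
    )
    return conn


def save_address(conn, phone_number, wav_path):
    """Update the caller's address recording, inserting a record if none exists."""
    cursor = conn.execute(
        "UPDATE customers SET address_wav = ? WHERE phone_number = ?",
        (wav_path, phone_number),
    )
    if cursor.rowcount == 0:  # no existing record for this phone number
        conn.execute(
            "INSERT INTO customers (phone_number, address_wav) VALUES (?, ?)",
            (phone_number, wav_path),
        )
    conn.commit()
    return "take_order.vxml"  # the dialog we transition to next
```

Trying the update first and inserting only when no row was touched keeps a single record per phone number, whether the caller is new or returning.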


    The take_order.vxml dialog is where we take the customer's order. Because it's a mixed-initiative dialog, we include an <initial> element to prompt the user for their order and attempt to fill as many of the fields as we can with a single utterance. The VoiceXML interpreter skips the fields that have already been filled and prompts only for those that have not. Once we've filled all of the fields, we transition to the confirm_pizza_order form, which plays the customer's order back to them and asks them to confirm that it is correct. If they say "No", the dialog transitions back to the pizza_order form and the customer is prompted to say their order again. Once they confirm their order, the dialog transitions to the save_order.asp dialog.

    View the "Take Order" Architecture image


    The save_order.asp dialog is the final one. It is an ASP script that creates a new record in the Access database for the pizza order and plays a thank you message for the user. The form fields are retrieved from the ASP Request object, and a bit of PerlScript code parses the input and saves the record via an ADO transaction. At this point, because there are no other dialogs to transition to, the call ends.
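To show what saving the order inside a transaction looks like, here is a short Python sketch. As before, sqlite3 stands in for Access and ADO, and the orders table, its columns, the function names, and the wording of the thank you message are all assumptions.

```python
import sqlite3


def open_orders_db():
    """Create an in-memory database with a hypothetical orders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders ("
        "id INTEGER PRIMARY KEY, phone_number TEXT, "
        "size TEXT, pizza_type TEXT, toppings TEXT)"
    )
    return conn


def save_order(conn, phone_number, size, pizza_type, toppings):
    """Insert one order row in a transaction and return the closing prompt."""
    with conn:  # commits on success, rolls back if the insert raises
        conn.execute(
            "INSERT INTO orders (phone_number, size, pizza_type, toppings) "
            "VALUES (?, ?, ?, ?)",
            (phone_number, size, pizza_type, ",".join(toppings)),
        )
    return "Thank you for your order."
```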


    The yes_or_no.vxml dialog is the subdialog that several other dialogs use to get confirmation from the user. It's not very fancy, but it works and it's better than having to create three separate confirmation dialogs.

    View the "Yes or No" Architecture image
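Taken together, the transitions between the dialogs described above can be written down as a small table and checked mechanically. The following Python sketch does that. The file names come from the design above, while the dictionary layout and the reachable function are only an illustration. The yes_or_no.vxml subdialog is left out because it returns to its caller rather than transitioning onward.

```python
from collections import deque

# Dialog-to-dialog transitions as described in the design above.
TRANSITIONS = {
    "main.vxml": ["telephone_number.vxml"],
    "telephone_number.vxml": ["validate_phone_number.asp"],
    "validate_phone_number.asp": ["save_address.asp", "take_order.vxml"],
    "save_address.asp": ["take_order.vxml"],
    "take_order.vxml": ["save_order.asp"],
    "save_order.asp": [],  # the call ends here
}


def reachable(start):
    """Return every dialog reachable from start, in breadth-first order."""
    seen = [start]
    queue = deque([start])
    while queue:
        for target in TRANSITIONS[queue.popleft()]:
            if target not in seen:
                seen.append(target)
                queue.append(target)
    return seen
```

A check like this is a quick way to confirm that every dialog can be reached from main.vxml and that every path ends at save_order.asp.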


    Well, so far we've created a dialog flow that provides us with a clear idea of how the dialog is to take place. We took this dialog flow model and solidified it into a high-level application diagram that contained a number of VoiceXML and ASP files. Then we boiled each of those dialogs into specific dialog elements. We are almost ready to write the code for this application. But first, we need to create an Access data model, and record the prompts. We'll tackle that in the next edition of the VoiceXML Developer.

    About Jonathan Eisenzopf

    Jonathan is a member of the Ferrum Group, LLC, a firm based in Reston, Virginia, that specializes in Voice Web consulting and training. He has also written articles for other online and print publications including WebReference.com and WDVL.com. Feel free to send an email to eisen@ferrumgroup.com regarding questions or comments about the VoiceXML Developer series, or for more information about training and consulting services.

    This article was originally published on Wednesday Oct 9th 2002