In all of the examples thus far, we have collected input but stopped
short of submitting the form fields to a back end script. In this edition
of the VoiceXML Developer, we will complete our pizza ordering application
by accepting the order, logging the transaction in an Access database,
and playing an order confirmation for the user. We will also utilize many
of the techniques that we have learned over the course of this tutorial to
create a high quality voice application.
Overview
It's time to put all of the skills you've learned to use. We are going to
start by designing a dialog flow. Then we will develop an application architecture
and data model based upon the dialog flow. From the dialog flow, we will develop
our VoiceXML forms and add the back-end scripting capability that will allow the
system to recognize and create customer records as well as add a pizza order to a table
in an Access database. We will be using PerlScript for Active Server Pages
to perform SQL queries. We could also have used PHP, ColdFusion, or JSP,
but I'm most familiar with Perl and prefer its text processing capabilities,
which come in handy when we're parsing text representations of user utterances.
Creating a dialog flow
I used Microsoft Visio to create a dialog flow that contains all of the prompts,
decision trees, and processing instructions that are required for this application.
I've split the whole diagram into two separate pieces. The first piece contains the
flow for getting the customer's phone number and address. The flow starts by playing
a greeting and is followed by a prompt asking the user for their phone number. The
system confirms the number by speaking it back and asking the customer to
confirm that the number that was recognized is the number they uttered. If it's not,
the system will prompt the customer for their phone number again until the customer
confirms that the correct phone number was recognized.
When the customer does confirm that we have the correct number, we submit the
VoiceXML form to a back-end ASP script, which uses the phone number
as a key to look up the customer's record in the Access database. If no record
exists, we create one and prompt the user for their address. Their address
is recorded as a .wav file and submitted to another ASP script, which saves the file
and associates it with the customer's record.
If the customer has previously
used the system to place an order and they provide the same phone number, the system
will ask the user to confirm the address on record or provide a new one. If the customer confirms
the address, we move on to the next dialog flow, which takes the customer's order.
If the customer does not confirm the address, the system prompts the user to record
their address, and the resulting .wav file is saved and associated with their customer
record.
View the Dialog Flow image
Now that we have confirmed a valid phone number and address, we move to the second
part of the dialog flow, which prompts the customer for their pizza order. Notice
that our prompt is, "May I take your order please?" This is a rather open-ended
question. We need to be ready to handle many different utterances that the
user might provide. If the range of utterances is too wide to cover, we may need
to provide a more specific prompt that provides the options that the customer
can select from. The slots (or form fields) that we need to fill to place a pizza
order are the pizza size, the pizza type,
and the pizza toppings. Because this will be a mixed-initiative
dialog, the customer could fill all of the slots with a single utterance like,
"I would like a small hand tossed pizza with pepperoni and mushrooms". On the
other hand, they might just say, "I'd like a small pepperoni pizza". So we need
to be able to handle cases where the customer does not fill all of the slots by
providing directed prompts for those fields that were not filled. In the last
example utterance, we would want to prompt the customer for a pizza type.
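To make the slot-filling idea concrete, here is a minimal sketch in Python. It is only a model of the logic. In the real application the VoiceXML grammar and the interpreter do this work. The vocabulary lists and the function names fill_slots and missing_slots are assumptions made for illustration, and apart from "hand tossed" the pizza types are invented.

```python
# Hypothetical vocabulary; a real application would define it in a grammar file.
SLOT_VOCAB = {
    "size": ["small", "medium", "large"],
    "type": ["hand tossed", "deep dish", "thin crust"],
    "toppings": ["pepperoni", "mushrooms", "sausage", "onions"],
}


def fill_slots(utterance):
    """Return a dict holding only the slots that the utterance filled."""
    text = utterance.lower()
    slots = {}
    for slot, phrases in SLOT_VOCAB.items():
        found = [phrase for phrase in phrases if phrase in text]
        if not found:
            continue
        # Toppings may hold several values; the other slots hold one.
        slots[slot] = found if slot == "toppings" else found[0]
    return slots


def missing_slots(slots):
    """List the slots that still need a directed prompt, in form order."""
    return [name for name in SLOT_VOCAB if name not in slots]
```

With this sketch, "I'd like a small pepperoni pizza" fills the size and toppings slots, and missing_slots reports that only the pizza type still needs a directed prompt.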
Once all of the slots have been filled, we want to play the order back to the
customer and have them confirm the order. If the customer says "Yes", then we thank
them for the order, record the order in the Access database, and end the call.
If the customer says "No", we need to go back and prompt them for their order
again until we get it right.
View the 2nd Dialog Flow image
And that's our dialog flow. It's a good idea to map it out on paper or in a
diagramming tool so that you or your customer can review the call flow and ensure that
it fully satisfies the business and customer needs.
Designing the application architecture
The next thing we need to do is develop our application architecture, which
will identify the dialog flow from a programmatic perspective and specify how
the application will be sliced up. First, let's see if we can identify some application
components that we can segment off as a VoiceXML document, form, or subdialog.
We should also see if we can identify common functions that can be componentized
and reused in several places.
The first component that seems obvious is the root document, which will
contain our global variables and perhaps even play the greeting. All other
VoiceXML documents will point to this file as their application root in the
application attribute of the <vxml> element.
The second component I see is a module that prompts for the customer's telephone
number. This module will include the prompts, error handling, and validation for
a telephone number.
As we look over the dialog flow, it also seems that we are asking the customer
for yes or no answers on several different occasions. We can create one subdialog
and reuse it for all of these confirmation requests.
The next identifiable component accepts
a telephone number as a parameter, queries the Access database for a matching
address, and returns the address. If the address doesn't exist, the subdialog
will prompt the customer for an address, create a new record, and return
the results. If the customer does have a record, we want to confirm the address
and have the customer record a new one if the one on record is incorrect.
The largest and most complex component is of course the part that takes
the actual order. It needs to be able to fill all the slots in a single utterance,
and also prompt for the slots that didn't get filled in the first utterance.
And finally, the last component takes the pizza order, saves it to the Access
database, thanks the customer for their order, and ends the call.
What we should do is create a single diagram that encapsulates the dialog flow and
the dialog components. In this diagram, we also want to identify the names of
the VoiceXML files, the form and field names, and any subdialogs that will be utilized
in the application. The diagram below represents the major dialogs in the application.
Each dialog will be contained within a .vxml or .asp file. The .asp files will
contain PerlScript code that queries the Access database.
View the "High Level" Architecture image
You can see the same general dialog flow here as in our other
diagrams, but now we've taken a step towards solidifying the call flow diagram
into a VoiceXML application. We have a lot more to do yet. For each dialog
in the application above, we need to specify, in voice dialog and VoiceXML
terms, the prompts, forms, and fields that exist in each, and how the dialogs
transition from one to another. We can do this by breaking the design down into
more detailed dialog flow diagrams.
main.vxml
The main.vxml dialog is fairly simple. We initialize the phone_number
variable, which will be used throughout the application, and play a welcome message.
Then we transition to telephone_number.vxml.

telephone_number.vxml
This dialog is responsible for capturing the user's telephone number. The user
is prompted for their phone number in the telephone_number form using
the phone.grammar grammar file. Once the user provides a valid phone
number, the value is saved in the phone_number variable and the
dialog transitions to the confirm_phone_number form, which plays
the number back to the user and asks them if it is correct. The user's answer
is captured by the yes_or_no.vxml subdialog. If the user responded
no, then they are sent back to the telephone_number form where they
are prompted for their phone number again. Once the user confirms the number by uttering
"yes", the dialog transitions to validate_phone_number.asp.
View the "Telephone Number" Architecture image
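The prompt-and-confirm cycle in this dialog is a simple loop. The Python sketch below models it. The callables ask_number and ask_yes_no are hypothetical stand-ins for the telephone_number form and the yes_or_no.vxml subdialog, and the wording of the confirmation prompt is an assumption.

```python
def collect_phone_number(ask_number, ask_yes_no):
    """Repeat the prompt/confirm cycle until the caller says yes.

    ask_number() returns the recognized digits as a string.
    ask_yes_no(prompt) returns True for "yes" and False for "no".
    Both are hypothetical stand-ins for the VoiceXML form and subdialog.
    """
    while True:
        phone_number = ask_number()
        # Read the digits back one at a time, e.g. "7 0 3 5 5 5 ...".
        spoken = " ".join(phone_number)
        if ask_yes_no("I heard " + spoken + ". Is that correct?"):
            return phone_number
```

Only a confirmed number leaves the loop, which mirrors the rule that we never move on to validate_phone_number.asp with an unconfirmed value.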
validate_phone_number.asp
This VoiceXML dialog is different from the others we've looked at so far,
because it is an ASP script rather than a static VoiceXML document. It mixes
PerlScript code that queries the Access database with VoiceXML content. The
first thing we do in this dialog is to query the database for the phone number
that the user provided in the telephone_number.vxml dialog.
If a record doesn't exist, the resulting VoiceXML output will transition the
dialog to the record_address form, which records the user's
address and submits it along with the phone number to save_address.asp.
If we did find an address that goes with the phone number, we transition the
dialog to the confirm_address form, which plays the address
and asks the user to confirm that it is indeed the correct address. If the user
says yes, we proceed on to the take_order.vxml dialog. Otherwise,
we transition the dialog to the record_address form so that
they can record a new address.
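The routing decision in this script can be sketched as follows. This is a Python model that uses an in-memory SQLite database as a stand-in for Access. The actual script is PerlScript, and the customers table and its column names are assumptions, since we have not designed the data model yet.

```python
import sqlite3

# Hypothetical schema; the real Access data model is designed separately.
CUSTOMER_SCHEMA = """CREATE TABLE IF NOT EXISTS customers (
    phone_number TEXT PRIMARY KEY,
    address_wav  TEXT
)"""


def lookup_address(conn, phone_number):
    """Return the stored address recording for a phone number, or None."""
    row = conn.execute(
        "SELECT address_wav FROM customers WHERE phone_number = ?",
        (phone_number,),
    ).fetchone()
    return row[0] if row else None


def next_dialog(address_wav, caller_confirms=False):
    """Pick the form or document that the caller is routed to next."""
    if address_wav is None:
        return "record_address"    # no address on file: record one
    if caller_confirms:
        return "take_order.vxml"   # address on file is still correct
    return "record_address"        # address on file is out of date
```

Note that two of the three outcomes lead to record_address. Only a caller who has an address on file and confirms it goes straight to take_order.vxml.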
View the "Validate Phone Number" Architecture image
save_address.asp
In the case where we need to either create a new record with an address,
or update an address for an existing phone number, we need to take a detour
to this script to submit the changes to the Access database before we
proceed on to the take_order.vxml dialog. To do this, we
grab the phone number and .wav audio file from the ASP Request object,
do an insert or update to the proper Access table, and transition on to
the next dialog. There isn't any user interaction happening in this dialog
at all.
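The insert-or-update step might look like the sketch below. Again this is Python with SQLite standing in for PerlScript, ADO, and Access. The customers table, its columns, and the idea of storing a file path for the recording are assumptions made for illustration.

```python
import sqlite3

# Hypothetical schema; the real Access data model is designed separately.
CUSTOMER_SCHEMA = """CREATE TABLE IF NOT EXISTS customers (
    phone_number TEXT PRIMARY KEY,
    address_wav  TEXT
)"""


def save_address(conn, phone_number, wav_path):
    """Update the address for a known phone number, else insert a new row."""
    with conn:  # commits on success, rolls back if an exception is raised
        cursor = conn.execute(
            "UPDATE customers SET address_wav = ? WHERE phone_number = ?",
            (wav_path, phone_number),
        )
        if cursor.rowcount == 0:
            # No existing record matched, so this is a new customer.
            conn.execute(
                "INSERT INTO customers (phone_number, address_wav) VALUES (?, ?)",
                (phone_number, wav_path),
            )
```

Trying the update first and inserting only when no row matched keeps a single code path for both new and returning customers.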

take_order.vxml
This is the dialog where we take the customer's order. Because it's
a mixed-initiative dialog, we include an <initial> element to prompt
the user for their order and attempt to fill as many of the fields as
we can with a single utterance. The VoiceXML interpreter then skips
the fields that have already been filled and prompts only for those
that have not. Once we've filled
all of the fields, we transition to the confirm_pizza_order form,
which plays the customer's order back to them and asks them to confirm that
it is correct. If they say "No", then the dialog transitions back to the
pizza_order form and the customer is prompted to say their
order again. Once they confirm their order, the dialog transitions to the
save_order.asp dialog.
View the "Take Order" Architecture image
save_order.asp
This is the final dialog, an ASP script, which creates a new record
in the Access database for the pizza order, and plays a thank you message
for the user. The form fields are retrieved from the ASP Request object,
and a bit of PerlScript code parses the input and saves the record via
an ADO transaction. At this point, because there are no other dialogs to
transition to, the call ends.
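As a rough model of what this script does with the submitted fields, consider the following Python sketch. SQLite stands in for Access and ADO. The orders table, the field names, and the assumption that toppings arrive as a space-separated string of single-word toppings are all hypothetical.

```python
import sqlite3

# Hypothetical schema; the real Access data model is designed separately.
ORDER_SCHEMA = """CREATE TABLE IF NOT EXISTS orders (
    id           INTEGER PRIMARY KEY AUTOINCREMENT,
    phone_number TEXT NOT NULL,
    size         TEXT NOT NULL,
    type         TEXT NOT NULL,
    toppings     TEXT NOT NULL
)"""

REQUIRED_FIELDS = ("phone_number", "size", "type", "toppings")


def save_order(conn, fields):
    """Validate the submitted form fields and store one order row."""
    missing = [name for name in REQUIRED_FIELDS if not fields.get(name)]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))
    # Assumes single-word toppings separated by spaces, e.g. "pepperoni mushrooms".
    toppings = ",".join(fields["toppings"].lower().split())
    with conn:  # one transaction: commit on success, roll back on error
        cursor = conn.execute(
            "INSERT INTO orders (phone_number, size, type, toppings) "
            "VALUES (?, ?, ?, ?)",
            (fields["phone_number"], fields["size"], fields["type"], toppings),
        )
    return cursor.lastrowid
```

Rejecting an incomplete submission before touching the database means a partially filled order never reaches the table.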

yes_or_no.vxml
This is the dialog that is utilized by several other dialogs to get
confirmation from the user. It's not very fancy, but it works and it's better
than having to create three separate confirmation dialogs.
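The value of a shared confirmation component is that every caller gets the same interpretation of "yes" and "no". A minimal Python model of that idea is shown below. The word lists and the function name interpret_yes_no are assumptions, and in the real subdialog a grammar does the matching.

```python
# Hypothetical word lists; the real subdialog would use a grammar instead.
YES_WORDS = {"yes", "yeah", "yep", "correct", "right"}
NO_WORDS = {"no", "nope", "wrong", "incorrect"}


def interpret_yes_no(utterance):
    """Map a recognized utterance to True, False, or None if it is unclear."""
    cleaned = utterance.lower().replace(",", " ").replace(".", " ")
    words = set(cleaned.split())
    said_yes = bool(words & YES_WORDS)
    said_no = bool(words & NO_WORDS)
    if said_yes == said_no:
        # Neither word was heard, or both were: the caller must be asked again.
        return None
    return said_yes
```

Returning None for an unclear answer lets each calling dialog decide how to reprompt, rather than guessing on the caller's behalf.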
View the "Yes or No" Architecture image
Conclusion
Well, so far we've created a dialog flow that provides us with a clear
idea of how the dialog is to take place. We took this dialog flow model
and solidified it into a high-level application diagram that contained
a number of VoiceXML and ASP files. Then we boiled each of those dialogs
into specific dialog elements. We are almost ready to write the code
for this application. But first, we need to create an Access data model,
and record the prompts. We'll tackle that in the next edition of the
VoiceXML Developer.
About Jonathan Eisenzopf
Jonathan is a member of the Ferrum Group, LLC, a firm based in Reston, Virginia,
that specializes in Voice Web consulting and training. He has also written
articles for other online and print publications including WebReference.com
and WDVL.com. Feel free to send an email to eisen@ferrumgroup.com regarding
questions or comments about the VoiceXML Developer series, or for more
information about training and consulting services.