VoiceXML Developer Series: A Tour Through VoiceXML, Part XII

Saturday Oct 12th 2002 by Jonathan Eisenzopf
In this edition of the series, we complete the first version of the 'Frank's Pizza Palace' application by developing the remaining VoiceXML dialogs.

Overview

Last time, we developed the first three dialogs in our application. Now it's time to complete the rest of the dialogs and begin testing our application.

The first three dialogs were main.vxml, telephone_number.vxml, and validate_phone_number.vxml. These dialogs played a greeting for the user, prompted them for their phone number, and looked up their address in the Access database, respectively.

Assuming that the user did indeed have a record in the database and confirmed that it was correct, the dialog transitions to take_order.vxml.

take_order.vxml

Now it's time to take the customer's order (view source). This VoiceXML dialog is similar to the pizza ordering application we developed in an earlier edition of this series, but a few things have changed. The customer's phone number has been stored in the application root as application.phone_number by a previous dialog. I've also added a <property> element on line 4. VoiceXML properties control various aspects of how a dialog behaves. This particular property sets the minimum confidence level that the ASR must achieve to accept an utterance as recognized. Values range from 0.0 to 1.0, corresponding to 0% to 100%. A value of 1 tells the ASR that it must be 100% confident that it has recognized an utterance. Most ASRs are set to 0.5 (or 50%) by default. I have lowered the threshold to 0.3 (30%) so that the ASR will not reject correct recognitions (false negatives). I chose this value after testing the grammar for a while in Nuance V-Builder. I found that while the confidence level fell below 50% when there was background noise, the ASR still produced accurate results the majority of the time.
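To see why lowering the threshold matters, here is a minimal Python sketch of the accept/reject decision. It is not how any particular ASR is implemented; the function name and the 0.42 score are made up for illustration, and it assumes a result is accepted when its confidence is at or above the threshold.

```python
# Illustrative only: mimics how a recognizer applies a confidence threshold.
DEFAULT_THRESHOLD = 0.5   # the usual default mentioned in the text
LOWERED_THRESHOLD = 0.3   # the value chosen for take_order.vxml


def accept_result(confidence: float, threshold: float) -> bool:
    """Return True when a recognition result meets the threshold."""
    if not 0.0 <= confidence <= 1.0 or not 0.0 <= threshold <= 1.0:
        raise ValueError("confidence and threshold must be between 0.0 and 1.0")
    return confidence >= threshold


# A correct recognition that scored low because of background noise
# (hypothetical score).
noisy_score = 0.42
print(accept_result(noisy_score, DEFAULT_THRESHOLD))  # False
print(accept_result(noisy_score, LOWERED_THRESHOLD))  # True
```

With the default threshold the correct but noisy result would be thrown away and the caller re-prompted; with the lowered threshold it is kept, at the cost of letting through more incorrect matches.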

This is a mixed-initiative dialog, meaning that a user can fill in multiple fields with a single utterance. The <initial> element on lines 7 through 11 provides this functionality. This section of the application executes first and tries to match the grammar referenced on line 6. I've made some significant changes to the PIZZA subgrammar in the PIZZA.grammar file since the earlier edition (view source). The reason for the change has to do with the many variations that a customer might use to order a pizza. After coding and testing about 20 additional variations, I realized that the grammar was ripe for consolidation using the positive closure (+) and Kleene closure (*) operators. A positive closure matches one or more occurrences of the phrase located to the right of the operator. A Kleene closure matches zero or more occurrences of the phrase to its right.

Line 6 is listed below:

+([SIZE TYPE TOPPINGS] *[pizza with])

The + (or positive closure) operator enables this subgrammar to match numerous variations of a pizza order. A customer can start with pizza size, type, or toppings, optionally followed by the words pizza and/or with. This grammar will match any of the following utterances:

  • small hand tossed pepperoni pizza
  • deep dish large mushroom and pepperoni pizza
  • small pepperoni
  • pizza with olives and mushrooms

The number of possible utterances that this grammar will match is too high to count (for me at least). One of the side effects of this more open grammar is that ASR confidence for matches went down from 60%-85% to as low as 40%. The rate of incorrect matches also rose in some cases where I was not speaking directly into the microphone or did not speak clearly. After lowering the confidence property and tuning the grammar a bit, I decided that the greater breadth of possibilities was worth the tradeoff. Of course, if the grammar only matches a few of the form fields, the application can prompt for the unfilled values separately. In cases where grammars start becoming more dynamic, it may be necessary to process the matched text to see if the ASR actually provided a false match. This requires some fancy text processing and/or natural language processing techniques, which we'll save for another time.
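To make the closure operators concrete, the grammar line +([SIZE TYPE TOPPINGS] *[pizza with]) can be approximated with a regular expression. This is only a rough Python analogy, not the GSL grammar itself: the word lists below are hypothetical stand-ins for the real SIZE, TYPE, and TOPPINGS subgrammars, which are not reproduced in this article.

```python
import re

# Hypothetical word lists; the real SIZE, TYPE, and TOPPINGS subgrammars
# live in PIZZA.grammar and are not reproduced here.
SIZE = r"(?:small|medium|large)"
TYPE = r"(?:hand tossed|deep dish)"
TOPPING = r"(?:pepperoni|mushrooms?|olives)"
TOPPINGS = rf"{TOPPING}(?: and {TOPPING})*"

# [SIZE TYPE TOPPINGS] -> any one of the three
# *[pizza with]        -> zero or more of the words "pizza" or "with"
UNIT = rf"(?:{SIZE}|{TYPE}|{TOPPINGS})(?: (?:pizza|with))*"

# +( ... ) -> one or more repetitions of the whole unit
ORDER = re.compile(rf"{UNIT}(?: {UNIT})*")


def matches(utterance: str) -> bool:
    """Return True when the whole utterance fits the approximated grammar."""
    return ORDER.fullmatch(utterance.lower().strip()) is not None


for phrase in (
    "small hand tossed pepperoni pizza",
    "deep dish large mushroom and pepperoni pizza",
    "small pepperoni",
):
    print(phrase, "->", matches(phrase))
```

The sketch only covers utterances that begin with a size, type, or topping. How the real grammar handles other phrasings depends on subgrammar definitions that are not shown here.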

Yet another difference is the fact that we are using pre-recorded prompts instead of synthesized speech from the TTS engine. This really enhances the quality and usability of the application.

Once we've filled all the fields in the form, the input is sent to the save_order.asp script.
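The mixed-initiative flow can be pictured as a loop that asks only for whatever is still missing and submits once nothing is. The Python sketch below is a simplified model of that idea rather than the VoiceXML form interpretation algorithm; the field names and prompt wording are hypothetical.

```python
# Hypothetical field names and prompt wording, for illustration only.
PROMPTS = {
    "size": "What size pizza would you like?",
    "type": "What type of crust would you like?",
    "toppings": "What toppings would you like?",
}


def next_step(fields: dict) -> str:
    """Return the prompt for the first unfilled field, or 'submit' when done."""
    for name, prompt in PROMPTS.items():
        if not fields.get(name):
            return prompt
    return "submit"


# "small pepperoni" fills two of the three fields in one utterance.
order = {"size": "small", "toppings": "pepperoni"}
print(next_step(order))  # What type of crust would you like?

order["type"] = "hand tossed"
print(next_step(order))  # submit
```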

save_order.asp

This PerlScript ASP file (view source) is responsible for taking the form field values passed from the take_order.vxml dialog and saving them to the PizzaOrder table in the Access database. If an error occurs while saving the record, we transfer the caller to an operator on line 37. Lines 7 and 8 open a connection to the Access database. Lines 11 through 14 retrieve the form field values from the ASP Request object. Remember, we process VoiceXML forms on the backend the same way we do HTML forms. Lines 15 through 25 convert the phone number from words to numbers and strip out any extraneous text.
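The phone number cleanup is easy to picture in isolation. The following Python sketch shows one way to turn spoken digit words into a numeric string and drop everything else. It is not the PerlScript from save_order.asp; the function name and the word table (including treating "oh" as zero) are assumptions made for illustration.

```python
# Assumed mapping of spoken words to digits; "oh" is treated as zero.
WORD_TO_DIGIT = {
    "zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3",
    "four": "4", "five": "5", "six": "6", "seven": "7", "eight": "8",
    "nine": "9",
}


def words_to_digits(spoken: str) -> str:
    """Convert digit words to digits and discard any other tokens."""
    digits = []
    for token in spoken.lower().replace("-", " ").split():
        if token in WORD_TO_DIGIT:
            digits.append(WORD_TO_DIGIT[token])
        elif token.isdigit():
            digits.append(token)
        # Anything else is extraneous text and is dropped.
    return "".join(digits)


print(words_to_digits("seven zero three five five five one two one two"))
# 7035551212
```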

Now that we have a connection to the database and have retrieved our form data, it's time to build the SQL string that will save the data to the PizzaOrder table. Line 28 builds the INSERT SQL statement that is sent to Access with the ADO Execute method on line 29. Note that the syntax for the SQL statement may differ if you decide to use a database other than Access.

Line 32 tests the results of the Execute command. If an error occurs, we will output an error message and transfer the caller to a live operator. Otherwise, we will thank the customer for their order and end the call.
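The save-and-check flow described above can be sketched in a few lines of Python. Note the substitutions: sqlite3 stands in for Access and ADO, the column names are hypothetical, and the sketch uses query placeholders rather than building the SQL string by hand as the ASP script does.

```python
import sqlite3


def save_order(conn, phone, size, crust, toppings) -> bool:
    """Insert one order; return True on success, False on a database error."""
    try:
        conn.execute(
            "INSERT INTO PizzaOrder (phone_number, size, crust, toppings) "
            "VALUES (?, ?, ?, ?)",
            (phone, size, crust, toppings),
        )
        conn.commit()
        return True
    except sqlite3.Error:
        return False


# An in-memory database with a hypothetical PizzaOrder layout.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE PizzaOrder "
    "(phone_number TEXT, size TEXT, crust TEXT, toppings TEXT)"
)

ok = save_order(conn, "7035551212", "small", "hand tossed", "pepperoni")
print("thank the caller" if ok else "transfer to an operator")
```

Placeholders keep caller-supplied values out of the SQL text, which is worth considering in any port of this script since the values ultimately come from speech recognition results.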

save_address.asp is now upload_audio.pl

Now let's step back to the validate_phone_number.asp script. As you may recall, if we did not find a customer's address for a given phone number, or if the customer rejected the address on file, they are taken to the record_address form to record their address. The recording is saved in a variable named AddressAudio and submitted to save_address.asp.

One of the strange annoyances of ASP is its inability to handle multipart form submissions, which are used to upload files from an HTML form. You would have thought Microsoft would have added this feature after three versions of ASP. They did finally add better support in ASP.NET, but this application is being developed in ASP 3 using PerlScript. I had hoped to find a reliable Perl script to handle binary files in ASP, but gave up after trying a few examples that only seemed to half work. Instead, I decided to fall back to a plain old, but very reliable, Perl CGI script to handle the uploaded file. To do this, I changed the next attribute in the validate_phone_number.asp (updated source) script to point to upload_audio.pl (view source) instead of save_address.asp. This script saves the address audio recording to a file named by the phone number that it's associated with. Once the audio file has been saved, the script transitions to take_order.vxml. The updated validate_phone_number.asp file also creates a new record in the Customers table on lines 39 and 40 if one does not already exist.
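The core of the upload step, writing the recording to a file named after the phone number, might look like the following in Python. This is a sketch of the idea only, not a translation of upload_audio.pl: the function name, the .wav extension, and the choice to strip non-digits from the phone number are all assumptions.

```python
import re
import tempfile
from pathlib import Path


def save_recording(phone_number: str, audio: bytes, directory: str) -> Path:
    """Write the audio bytes to <directory>/<digits>.wav and return the path."""
    # Keep only digits so the caller-supplied value is safe as a file name.
    digits = re.sub(r"\D", "", phone_number)
    if not digits:
        raise ValueError("phone number contains no digits")
    target = Path(directory) / f"{digits}.wav"
    target.write_bytes(audio)
    return target


# Usage with a throwaway directory and placeholder audio bytes.
with tempfile.TemporaryDirectory() as tmp:
    path = save_recording("(703) 555-1212", b"RIFF", tmp)
    print(path.name)  # 7035551212.wav
```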

Conclusion

So now we have a complete application that utilizes many different aspects of the VoiceXML language. I hope you've learned a lot about VoiceXML in this series, and I hope you keep coming back for more as we delve ever deeper into developing VoiceXML applications. If you have followed this series all the way through, please take some time to send me feedback to let me know whether this series was of benefit to you or where you think it can be improved. I'm also working on getting the demo operational so that you can test it over the telephone if you are not able to test it on your own.

About Jonathan Eisenzopf

Jonathan is a member of the Ferrum Group, LLC, based in Reston, Virginia, which specializes in Voice Web consulting and training. He has also written articles for other online and print publications including WebReference.com and WDVL.com. Feel free to send an email to eisen@ferrumgroup.com regarding questions or comments about the VoiceXML Developer series, or for more information about training and consulting services.
