VoiceXML Developer Series: A Tour Through VoiceXML, Part X

Thursday Oct 10th 2002 by Jonathan Eisenzopf

In this edition of the VoiceXML Developer, we will create an Access database for Frank's Pizza Palace and record the audio prompts, based upon the dialog flow that we created in the last article.


Creating a data model

The first thing we need to do is design a data model for our Pizza Palace Access database. We can determine the tables and columns that need to be created by analyzing the call flow diagrams we created in the last article. The requirements are fairly simple. We need to be able to look up a customer's address by phone number, and we need to be able to save pizza orders. There is one additional requirement: if a caller does not have a customer record, we must record their address. This address will be stored as a .wav file on the Web server. New customer records should be marked as new and reviewed on a regular basis so that someone can listen to the recordings and type in the corresponding address. Until a new customer record has a text address, we can play the address recording back to the customer and have them confirm it, rather than synthesizing the text address.
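The address rule in the last few sentences is easy to get wrong, so here is a minimal sketch of it in Python. It is only an illustration and not the code we will use in the application: the function name choose_address_prompt and the "record", "tts", and "audio" labels are hypothetical, and the customer record is assumed to be a simple dictionary keyed by column name.

```python
from typing import Optional, Tuple


def choose_address_prompt(customer: Optional[dict]) -> Tuple[str, str]:
    """Decide how the dialog should present or collect the address.

    Returns a (mode, value) pair:
      ("record", "")       no usable record: ask the caller to speak the address
      ("tts", text)        a typed address exists: synthesize it
      ("audio", filename)  only the recording exists: play it back
    """
    if customer is None:
        return ("record", "")

    # A typed address is preferred once someone has reviewed the record.
    address = (customer.get("Address") or "").strip()
    if address:
        return ("tts", address)

    # Otherwise fall back to the recording made when the record was created.
    audio_file = (customer.get("AddressAudioFile") or "").strip()
    if audio_file:
        return ("audio", audio_file)

    # Neither form of the address is available, so collect it again.
    return ("record", "")
```

A record that has only a recording yields the "audio" mode, which is the case that remains in effect until someone types in the address.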

For our application, we will need a customer table, which will contain the customer's phone number, address, city, state, and zip code. It will also contain the file name of the address recording that the customer spoke when the record was created.

We will also need an order table, which will contain the order date and time, the pizza size, the type of crust, and the toppings. We should also add a boolean flag that indicates whether or not the pizza has been delivered. The diagram below contains a data model representation of our Pizza Palace database.

View Data Model

The Phone column of the Customers table has been set as the primary key of the table. The PizzaOrder table contains a foreign key constraint that references the Phone column of the Customers table. This ensures that a pizza order cannot contain a phone number that does not also exist in the Customers table. The AddressAudioFile field contains the filename of the address utterance, which is used to manually fill in the Address field. The PizzaDeliveredFlag field can be Yes or No. It is set to Yes manually when the order has been delivered to the customer.

Now that we have our design, it's time to create the database. I created an Access database file named pizza_palace.mdb and used the design view to create the tables. In the design view for the Customers table (shown below), I set every field to type Text with a size of 50, except for the Phone field, whose size I set to 15.

The OrderId field of the PizzaOrder table (shown below) is set as the primary key for the table. I've also included the OrderDateTime field, whose type is Date/Time so that we can track when orders were placed to make sure the store is getting orders out in the right order and on time.
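For readers who would rather see the design as table definitions, here is a rough sketch in Python. It uses the standard sqlite3 module as a stand-in for Access, which is an assumption made purely so the example runs anywhere. The City, State, and Zip column names are guessed from the field list, and PizzaSize, CrustType, and Toppings are hypothetical names, since the article does not name those columns. SQLite has no Yes/No type, so the delivered flag is stored as 0 or 1.

```python
import sqlite3

# Sketch of the Pizza Palace data model. SQLite stands in for Access here.
SCHEMA = """
CREATE TABLE Customers (
    Phone            TEXT PRIMARY KEY,
    Address          TEXT,
    City             TEXT,   -- assumed column name
    State            TEXT,   -- assumed column name
    Zip              TEXT,   -- assumed column name
    AddressAudioFile TEXT
);
CREATE TABLE PizzaOrder (
    OrderId            INTEGER PRIMARY KEY,
    CustomerPhone      TEXT NOT NULL REFERENCES Customers(Phone),
    OrderDateTime      TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PizzaSize          TEXT,   -- hypothetical column name
    CrustType          TEXT,   -- hypothetical column name
    Toppings           TEXT,   -- hypothetical column name
    PizzaDeliveredFlag INTEGER NOT NULL DEFAULT 0   -- 0 = No, 1 = Yes
);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create both tables on the given connection."""
    conn.executescript(SCHEMA)


conn = sqlite3.connect(":memory:")
create_schema(conn)
```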

Finally, we create the foreign key relationship between the Customers and PizzaOrder tables with the Relationships tool in Access. Drag the Phone field from the Customers table to the PizzaOrder table. This will pop up the Edit Relationships window. Make sure that the Phone and CustomerPhone fields are selected for the relationship and check the Enforce Referential Integrity box, which will reject a new pizza order unless there is a record with the same phone number in the Customers table.
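To see what enforcing referential integrity buys us, here is a small sketch of the same rule outside Access. It again assumes Python's sqlite3 module as a stand-in, with the tables cut down to the two key columns. The place_order helper and the phone numbers are made up for the illustration. Note that SQLite only enforces foreign keys when the foreign_keys pragma is switched on for the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite ignores foreign keys unless this is enabled on each connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE Customers (Phone TEXT PRIMARY KEY)")
conn.execute(
    "CREATE TABLE PizzaOrder ("
    " OrderId INTEGER PRIMARY KEY,"
    " CustomerPhone TEXT NOT NULL REFERENCES Customers(Phone))"
)


def place_order(conn: sqlite3.Connection, phone: str) -> bool:
    """Try to save an order; return False if the phone has no customer record."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO PizzaOrder (CustomerPhone) VALUES (?)", (phone,)
            )
        return True
    except sqlite3.IntegrityError:
        return False


# One known customer; any other phone number should be rejected.
with conn:
    conn.execute("INSERT INTO Customers (Phone) VALUES (?)", ("7035550100",))
```

Calling place_order with the known number succeeds, while a number that is missing from Customers is refused by the database itself rather than by application code.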

View DB Relationships

Now save the database and you're done.

Recording the voice prompts

Now that we've created the Access database, it's time to record the prompts. You can use any recording tool you want, but I would recommend Cool Edit 2000 for recording and editing your .wav file prompts. The waveform you see below was recorded in CoolEdit and is the welcome prompt that will be contained in main.vxml. The utterance I recorded was, "Thanks for calling Frank's Pizza Palace". The silence you see at the beginning of the waveform is where I intentionally waited for one second before speaking into the microphone.

View CoolEdit Screen Capture 1

While this one-second sample doesn't contain any speech, it does contain the background noise generated by my computer. Highlight this sample and select Transform->Noise Reduction in the menu (shown below). Then click on the button labeled Get Profile From Selection and click OK.

View CoolEdit Screen Capture 2

This sets the pattern for background noise that we will filter out of the rest of the recording. Next, highlight the entire waveform and go to the Noise Reduction menu again. This time, just click on the OK button. This will automatically filter out the background noise for the whole waveform based upon the noise fingerprint you loaded from the silent sample. You'll notice that the waveform looks a little different, and if you play it, you'll hear that a lot of the noise that was originally in the sample has been removed.

The next thing we need to do is adjust the amplitude of the sample. Select Transform->Amplitude->Amplify from the CoolEdit menu. In the menu on the right-hand side (shown below), select 10dB Boost and click the OK button.
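If you are curious what a 10 dB boost means numerically, the usual convention for sample amplitude is a gain factor of 10^(dB/20), which works out to roughly 3.16 for 10 dB. The sketch below applies that factor to 16-bit samples in Python. It illustrates the arithmetic only and makes no claim about how CoolEdit implements its Amplify transform; the function names are hypothetical.

```python
def db_to_gain(db: float) -> float:
    """Convert a decibel change into a linear amplitude multiplier."""
    return 10 ** (db / 20.0)


def amplify(samples, db):
    """Scale signed 16-bit samples by `db` decibels, clipping at the limits."""
    gain = db_to_gain(db)
    boosted = []
    for sample in samples:
        value = int(round(sample * gain))
        # Anything beyond the 16-bit range is clipped, which is audible as
        # distortion, so loud recordings need a smaller boost.
        boosted.append(max(-32768, min(32767, value)))
    return boosted
```

The clipping step is the reason to check the peaks after boosting: a sample that was already near full scale cannot get any louder.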

View CoolEdit Screen Capture 3

You'll notice that the waveform has higher peaks. When you play the new sample, it will be loud and crisp. The last thing we need to do is remove the silence at the start and end of the sample. Highlight the silent portions of the waveform at the beginning and end and press the Delete key. You will want to leave a very small break at the beginning and end, but not quite enough to be audible. Now you should have a waveform that looks like the one below.
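If you have many prompts to clean up, the trimming step can also be scripted. Here is a minimal sketch in Python that works on a list of signed 16-bit sample values. The trim_silence name, the threshold of 500, and the padding of 400 samples (about 50 ms at 8 kHz) are all assumptions chosen for illustration, so expect to tune them against your own recordings.

```python
def trim_silence(samples, threshold=500, padding=400):
    """Drop leading and trailing silence, keeping a short break on each side.

    A sample counts as silence when its absolute value is at or below
    `threshold`. Up to `padding` samples are kept before the first loud
    sample and after the last one.
    """
    loud = [i for i, s in enumerate(samples) if abs(s) > threshold]
    if not loud:
        return []  # the whole recording is silence
    start = max(0, loud[0] - padding)
    end = min(len(samples), loud[-1] + 1 + padding)
    return samples[start:end]
```

Running the noise reduction first helps here, because leftover background noise can sit above the threshold and be mistaken for speech.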

View CoolEdit Screen Capture 4

Now, identify all of the prompts that need to be recorded using the dialog flow. Record and save the prompts using the techniques above. Be sure to speak clearly, but not too slowly or too quickly. Try to maintain the same volume level and speaking rate as you record. If you need to, you can adjust the volume and speed in CoolEdit. When you play one waveform after another, it should almost sound like one recording, because we will be combining these recordings for prompts. Another technique is to identify common phrases that will be used over and over again, such as "Is this correct". You can save time by recording each phrase once and reusing the waveform as many times as possible. Be sure that the inflection in your voice is appropriate for the waveforms, especially where we transition from one to another. For example, take "Thanks for calling Frank's Pizza Palace" followed by "May I take your order". When you play these two recordings together, they should sound natural.
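One quick way to check that two prompts sound natural together is to join them into a single file and listen to the result. The sketch below does this with Python's standard wave module. It is only a testing aid under stated assumptions: concatenate_wavs is a hypothetical helper, and it expects every prompt to share the same channel count, sample width, and sample rate.

```python
import wave


def concatenate_wavs(sources, destination):
    """Join several .wav prompts, in order, into one .wav file.

    `sources` is a list of file names (or open binary file objects) and
    `destination` is the file name (or writable binary file object) to create.
    """
    params = None
    chunks = []
    for source in sources:
        with wave.open(source, "rb") as reader:
            fmt = (
                reader.getnchannels(),
                reader.getsampwidth(),
                reader.getframerate(),
            )
            if params is None:
                params = fmt
            elif fmt != params:
                raise ValueError("all prompts must share one audio format")
            chunks.append(reader.readframes(reader.getnframes()))
    if params is None:
        raise ValueError("no prompts were given")

    with wave.open(destination, "wb") as writer:
        writer.setnchannels(params[0])
        writer.setsampwidth(params[1])
        writer.setframerate(params[2])
        for chunk in chunks:
            writer.writeframes(chunk)
```

For example, joining hypothetical files welcome.wav and take_order.wav into preview.wav lets you hear the transition between the two greetings in one pass.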


Well, we've designed our database and learned how to record the prompts. In the next edition of the VoiceXML Developer, which is also the last article in the "Tour Through VoiceXML", we will conclude by developing the VoiceXML documents and ASP files using the database and prompts that we've created in this article.

About Jonathan Eisenzopf

Jonathan is a member of the Ferrum Group, LLC, based in Reston, Virginia, which specializes in Voice Web consulting and training. He has also written articles for other online and print publications including WebReference.com and WDVL.com. Feel free to send an email to eisen@ferrumgroup.com regarding questions or comments about the VoiceXML Developer series, or for more information about training and consulting services.
