Up 'til now, the VoiceXML examples we've used have been directed dialogs, which prompt users for input in a pre-defined order. In this edition of the VoiceXML Developer, we're going to learn how to develop mixed initiative dialogs, which allow users to fill multiple fields with a single utterance.
When an <initial> element appears in a VoiceXML form, the VoiceXML interpreter executes it before gathering input for any of the form's <field> elements. The <initial> element relies on a <form>-level grammar that is defined elsewhere in the form. It can contain prompts and event handlers, but it cannot contain a <filled> element, nor can it contain its own grammar. Once an utterance matches the form grammar, the VoiceXML interpreter continues with the remainder of the form. Fields that were filled as a result of the initial user utterance will normally be skipped by the VoiceXML interpreter.
This technique enables us to create a grammar that is capable of matching multiple field values in a single utterance and also allows the user to control the order of the input. A great example of this is a bank application where a user wants to transfer $100.00 from their savings to their checking account. In a directed dialog, the dialog progression is controlled by the computer and takes multiple prompts to collect all of the information:
Computer: Please say the type of account you would like to transfer the funds from.
Customer: savings.
Computer: Please say the type of account you would like to transfer the funds to.
Customer: checking.
Computer: Please say the amount that you would like to transfer.
Customer: One hundred dollars.
Computer: Transferring one hundred dollars from your savings to your checking account. Is this correct?
Customer: yes.
In a mixed initiative dialog, the user could simply tell the system what to do in a single natural sentence:
Customer: Transfer one hundred dollars from savings to checking.
Computer: Transferring one hundred dollars from your savings to your checking account. Is this correct?
Customer: yes.
Wow, that's powerful. It means less time per call and, if done right, happier customers too.
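To make the idea concrete, here is a minimal sketch of how the funds transfer dialog might be laid out as a mixed initiative form. It is only an illustration under stated assumptions: the grammar file TRANSFER.grammar, its subgrammar names, the field names, the prompt wording in <initial>, and the application/x-gsl media type are all hypothetical and will vary by platform. The sketch shows the form-level grammar sitting outside <initial>, the <initial> element holding only a prompt and event handlers, and a fallback that hands control back to the directed fields after two failed attempts.

```xml
<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="transfer">
    <!-- Form-level grammar (hypothetical file and subgrammar name).
         It is declared on the form, not inside <initial>, and may
         fill several fields from one utterance. -->
    <grammar src="TRANSFER.grammar#TRANSFER" type="application/x-gsl"/>

    <!-- Visited before the fields. Prompts and event handlers only:
         no <grammar> and no <filled> are allowed in here. -->
    <initial name="start">
      <prompt>How can I help you with your accounts?</prompt>
      <nomatch count="1">
        For example, say transfer one hundred dollars from savings to checking.
        <reprompt/>
      </nomatch>
      <nomatch count="2">
        Sorry, I did not understand. Let's take it one step at a time.
        <!-- Giving the <initial> item variable a value marks it as done,
             so the interpreter moves on to the directed fields below. -->
        <assign name="start" expr="true"/>
      </nomatch>
    </initial>

    <!-- Any field the first utterance did not fill is prompted for here. -->
    <field name="from_account">
      <grammar src="TRANSFER.grammar#FROM" type="application/x-gsl"/>
      <prompt>Please say the type of account you would like to transfer the funds from.</prompt>
    </field>
    <field name="to_account">
      <grammar src="TRANSFER.grammar#TO" type="application/x-gsl"/>
      <prompt>Please say the type of account you would like to transfer the funds to.</prompt>
    </field>
    <field name="amount">
      <grammar src="TRANSFER.grammar#AMOUNT" type="application/x-gsl"/>
      <prompt>Please say the amount that you would like to transfer.</prompt>
    </field>
  </form>
</vxml>
```

If the caller says the whole sentence up front, all three fields are filled and none of the field prompts should play. If the caller says only part of it, the interpreter asks for whatever is still missing.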
To test this application, call VoiceXML Planet at 510-315-6666; press 1 to listen to the demos, then press 5 to hear this example. This example is a variation of the Pizza Palace example that we developed in Part V of this series. This time, we're developing an interface for Frank's Pizza Palace, a fierce competitor of Joe's Pizza Palace. Frank would like to implement a streamlined version of Joe's order application and allow customers to tell the system their order in a more natural way.
The first thing that you should notice is that we've defined a form-level grammar on line 7. The <initial> element on line 8 contains a <prompt>, which plays the initial prompt for the document and waits for the user to speak. The system will attempt to match the utterance against PIZZA.grammar#ORDER, a GSL subgrammar named ORDER contained in the grammar file named PIZZA.grammar. After the form grammar matches an utterance, the interpreter may prompt the user for more information if the initial utterance didn't fill all three form fields. For example, if I were to say, "I'd like a small", then the system would set the pizza_size field value to "small", and then proceed to prompt me for the pizza_type and pizza_toppings fields.
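For readers without the listing at hand, the structure described above might look roughly like the sketch below. This is a reconstruction under assumptions, not the original example: the prompt wording, the form id, the name of the <initial> element, the per-field subgrammar references, and the application/x-gsl media type are guesses, and the line numbers cited in the text refer to the original listing rather than to this sketch. Only PIZZA.grammar#ORDER and the three field names come from the article.

```xml
<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="order">
    <!-- Form-level grammar: the ORDER subgrammar in PIZZA.grammar. -->
    <grammar src="PIZZA.grammar#ORDER" type="application/x-gsl"/>

    <!-- Plays the opening prompt and listens against the form grammar. -->
    <initial name="start">
      <prompt>Welcome to Frank's Pizza Palace. What would you like to order?</prompt>
    </initial>

    <!-- Each field is skipped if the opening utterance already filled it.
         Otherwise the field prompts and listens with its own subgrammar
         (the #SIZE, #TYPE and #TOPPINGS references are assumed). -->
    <field name="pizza_size">
      <grammar src="PIZZA.grammar#SIZE" type="application/x-gsl"/>
      <prompt>What size pizza would you like?</prompt>
    </field>
    <field name="pizza_type">
      <grammar src="PIZZA.grammar#TYPE" type="application/x-gsl"/>
      <prompt>What type of pizza would you like?</prompt>
    </field>
    <field name="pizza_toppings">
      <grammar src="PIZZA.grammar#TOPPINGS" type="application/x-gsl"/>
      <prompt>What toppings would you like?</prompt>
    </field>

    <!-- Reached once every field has a value: read the order back. -->
    <block>
      <prompt>
        You ordered a <value expr="pizza_size"/> <value expr="pizza_type"/>
        pizza with <value expr="pizza_toppings"/>.
      </prompt>
    </block>
  </form>
</vxml>
```

With this layout, saying "I'd like a small" fills only pizza_size, so the interpreter skips that field and visits pizza_type and pizza_toppings in document order before reaching the final block.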
OK, let's take a look at the grammar file. This grammar file is used not only to fill values from the utterance collected by the <initial> element, but also to fill the remaining form fields in the event that the user's first utterance does not match all of them.
Line 1 contains the ORDER subgrammar, which is set as the <form> grammar. A customer could say any one of the following utterances and match all three fields:
There are many more utterances, and many more possibilities that we've left out. The point here is that we can accommodate the many different combinations that customers might provide. The ORDER subgrammar contains the PIZZA subgrammar, which begins on line 5 and continues through line 13. This subgrammar is essentially a listing of possible combinations, one per line, of how a customer might order their pizza. We've only listed a few possibilities. There would likely be many more. The PIZZA subgrammar in turn contains the SIZE, TOPPINGS, and TYPE subgrammars. Let's take a closer look at these three subgrammars. On line 25 of the TYPE subgrammar, you'll notice a set of curly brackets that contain the statement:
The curly brackets contain the value that the subgrammar will return, and the statement above assigns the $string variable, which holds the matched string, to the pizza_type slot. This tells the interpreter to assign the result of the match to the pizza_type form field. This is how a grammar is able to set field values in a mixed initiative VoiceXML dialog. You should see similar statements on lines 19 and 30 that fill the values for the pizza_size and pizza_toppings form fields.
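As a small illustration of slot filling, the sketch below embeds a one-line GSL expression directly in a field. Treat it as an assumption-laden example rather than an excerpt from PIZZA.grammar: the word list is invented, and whether a platform accepts inline GSL or the application/x-gsl media type depends on the VoiceXML gateway. The square brackets list alternatives, and the command in curly brackets copies the matched words ($string) into the pizza_size slot, which the interpreter maps to the field of the same name.

```xml
<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="size_only">
    <field name="pizza_size">
      <prompt>What size pizza would you like?</prompt>
      <!-- Inline GSL (platform-specific). The slot name matches the
           field name, so a match fills the pizza_size field. -->
      <grammar type="application/x-gsl">
        <![CDATA[
          [small medium large] {<pizza_size $string>}
        ]]>
      </grammar>
      <filled>
        <prompt>You chose <value expr="pizza_size"/>.</prompt>
      </filled>
    </field>
  </form>
</vxml>
```

The same kind of command, attached to a subgrammar that is reachable from the form-level grammar, is what lets a single utterance populate more than one field.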
If the initial utterance does not fill all of the form fields, the interpreter prompts for each remaining field in turn, and the subgrammar referenced within that field fills it. Once all fields have been filled, we play the customer's order back to them on lines 23-27 of the VoiceXML document.
Mixed initiative dialogs are the heart and soul of next-generation voice dialogs. We will be covering mixed initiative dialogs in more detail in the future. Thanks again for joining us for another edition of the VoiceXML Developer Tour Through VoiceXML.
About Jonathan Eisenzopf
Jonathan is a member of the Ferrum Group, LLC, a firm based in Reston, Virginia, that specializes in Voice Web consulting and training. He has also written articles for other online and print publications including WebReference.com and WDVL.com. Feel free to send an email to email@example.com regarding questions or comments about the VoiceXML Developer series, or for more information about training and consulting services.