Interview with Xiaofei Tang and Xiaolan Zhu (authors of the upcoming WROX Press book Early Adopter VoiceXML)
(XT) - I got interested in VoiceXML in the fall of 2000, while I was with a company developing a VoiceXML-based voice application platform. What I felt was important about VoiceXML is that it's quickly becoming a de facto standard supported by both big players and exciting new ones, meaning you can actually write new voice applications that run. My real interest in natural language processing started back in the winter of 1986, when I came across the book The Cognitive Computer: On Language, Learning, and Artificial Intelligence by Roger C. Schank. I then spent most of the following seven years learning and developing NLP, AI, and machine translation systems. After that, I spent about another seven years learning and developing enterprise and Internet applications. Then I found VoiceXML. To me, VoiceXML can be a great language for developing intelligent real-world language applications to be used by billions of people on the phone.
(XZ) - I got interested in and started learning VoiceXML in September 2000. We were developing a VoiceXML browser based on Nuance speech recognition technologies. Before that, our voice applications were developed entirely in Java, and the development process was definitely not an easy one! After we developed the VoiceXML browser, new voice applications were developed on it, and the time savings were substantial.
VoiceXMLPlanet - What part of the book did you enjoy writing the most?
(XT) - I really enjoyed writing both the Dynamic VoiceXML chapter and the Nuance SpeechObjects chapter. During the writing, I uncovered many exciting features of VoiceXML, especially the power of <subdialog> and <object>, which made me an even firmer believer in the VoiceXML technology.
(XZ) - I enjoyed writing both of my chapters, Good Application Design and Advanced VoiceXML, but I particularly like the Good Application Design chapter. My background in both computer science and cognitive psychology helps me realize the importance of good usability of the visual interface in developing voice applications. Users don't care what leading-edge technologies you're using. They only believe in what they hear and what they can say. So I was very glad to have the opportunity to pass on my experience and knowledge in that field to the readers.
VoiceXMLPlanet - What did you and your co-authors want to achieve with the book?
(XT) - I wanted to show the readers that VoiceXML is both real and powerful in developing real-world voice applications.
(XZ) - To describe the VoiceXML language and its practical applications in a clear and convincing way.
VoiceXMLPlanet - What is the strangest/most innovative application of VoiceXML that you've come across?
(XT) - This is a strange question. The most useful applications may not be the most innovative ones; for example, VoiceXML applications that voice-enable enterprise systems such as messaging and calendaring systems or Customer Relationship Management systems may not be the most innovative. If I have to answer this question, I'd probably say a VoiceXML application I heard, written for the VoiceGenie VoiceXML platform, called "Who wants to be a Villionaire?"
(XZ) - I haven't come across any really strange application of VoiceXML yet, but an application that can detect the age or sex of the caller might be strange. You would probably have to use <object> to implement it.
VoiceXMLPlanet - Where do you think the future of VoiceXML lies?
(XT) - I think VoiceXML will become more and more important in the future due to at least three factors: the core speech recognition engines will get even better, new versions of VoiceXML will introduce new and powerful features to cover all the important issues related to a phone call, and people will get more and more mobile.
(XZ) - The future of VoiceXML lies in the easy and economical deployment of large-scale VoiceXML applications, in easy integration with backend enterprise systems, and in the design of very user-friendly voice interfaces.
VoiceXMLPlanet - What new innovations need to happen for VoiceXML to succeed in the future?
(XT) - Better usability support in the core speech recognition engines to make voice applications developed in VoiceXML sound more natural and intelligent.
(XZ) - Better speech recognition engines and good development tools.
VoiceXMLPlanet - What other technologies are currently capturing your interest?
(XT) - Statistical language models, which let you say something outside your strictly defined grammar and make the dialog more intelligent. W3C already has a working draft called Stochastic Language Models (N-Gram) Specification, available at http://www.w3.org/TR/ngram-spec/.
(XZ) - The new XML-based speech recognition grammar format, to be standard in VoiceXML 2.0. This will make it easy to port VoiceXML applications between different speech recognition engine vendors.
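For readers unfamiliar with the statistical language models mentioned in (XT)'s answer, here is a minimal sketch of the underlying idea: a toy bigram model in Python that scores an utterance by how plausible its word sequence is, rather than accepting only the phrases listed in a fixed grammar. This is an illustration only and is not taken from the W3C draft or from the book; the function names, the add-one smoothing, and the training phrases are all assumptions made for the example.

```python
from collections import Counter
import math


def train_bigram(sentences):
    """Count contexts and bigrams over whitespace-tokenized sentences (hypothetical helper)."""
    contexts, bigrams = Counter(), Counter()
    vocab = {"<s>", "</s>"}
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        vocab.update(tokens)
        contexts.update(tokens[:-1])             # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))  # adjacent word pairs
    return contexts, bigrams, len(vocab)


def log_prob(sentence, model):
    """Log-probability of a sentence under the bigram model, with add-one smoothing."""
    contexts, bigrams, vocab_size = model
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    total = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        # Add-one smoothing keeps unseen word pairs from getting zero probability.
        total += math.log((bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size))
    return total


# Assumed example phrases, standing in for transcribed caller utterances.
model = train_bigram(["check my calendar", "check my messages", "read my messages"])

# "read my calendar" never appears in the training phrases, yet it scores
# higher than a scrambled word order because each of its word pairs was seen.
print(log_prob("read my calendar", model) > log_prob("calendar my read", model))
```

A recognizer built on a model like this can rank hypotheses for phrasings that nobody wrote into the grammar by hand, which is the flexibility described above. Real systems train on far more data and use more careful smoothing than this sketch does.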
VoiceXMLPlanet - Apart from writing books, what do you like to do in your spare time?
(XT) - Playing basketball to win first, or to have fun if I can't win; reading history and other good books; travelling around physically or mentally; eating out in "a clean, well-lighted place" (used to be "candlelit").
(XZ) - Travelling, or watching good soap operas all day and night on a long holiday weekend.
Stay tuned next week on VoiceXMLPlanet.com for an excerpt from their book, Early Adopter VoiceXML.
This interview comes to us from WROX Press--technical books that you can count on!