After months of hype but no substance, the SALT Forum finally released a draft version of the Speech Application Language Tags (SALT) specification on February 19. My initial impression is that it was worth the wait and that, given follow-through from Microsoft, it will significantly advance the use of speech-driven applications on desktop computers and PDAs.
What makes the release of the specification significant is that SALT was initiated by Microsoft. Given Microsoft's dominance on the desktop and its growing share of the PDA market, the SALT Forum and Microsoft have made the right move at the right time. Speech-based applications will likely grow over time as the quality of speech recognition and synthesis improves, but multi-modal applications that are accessible from multiple devices will likely be the next step in computing's evolution.
Here are some of the features of the SALT draft specification:
Focus on multi-modal development - While VoiceXML could be used on PDAs and the desktop, its focus is on the telephone. SALT was designed to support multiple devices including the telephone.
Supports XML form of SRGS - why re-invent the wheel? What an amazing concept. The SALT specification requires support for the XML form of the Speech Recognition Grammar Specification (SRGS), which was developed by the Voice Browser group of the World Wide Web Consortium (W3C) and is also used in VoiceXML.
Parallel tasks - Users can interact with an application and speak or listen to a SALT application at the same time. For example, a user could browse a list of tasks on their PDA while they also listened to a recorded annotation from their boss.
Applications are DOM based - SALT applications will use the HTML and XML Document Object Model that is already familiar to Web developers.
Uses SSML for speech synthesis - The Speech Synthesis Markup Language (SSML) was also developed by the Voice Browser group at the W3C. SALT utilizes this common format.
Call Control - SALT includes call control features, such as distributing calls based upon the caller's phone number. This is clearly a telephone-based feature, and happens to be a critical piece that the VoiceXML 2.0 specification lacks.
Uses fewer XML elements - In SALT, there are only four top-level elements: <prompt>, <listen>, <dtmf>, and <smex>. There are additional elements such as <record> and <grammar>, but there are only 10 XML elements in total, versus over 30 in VoiceXML. This may be good or bad, depending on how you look at it. From the looks of it, the elements are basically placeholders upon which ECMAScript is hung. This is consistent with the general approach of developing speech applications in a more programmatic way, as opposed to the "document construction methodology" that is typically used with VoiceXML.
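To make the SRGS point above a little more concrete, here is a minimal sketch of what an XML-form SRGS grammar looks like when generated programmatically. It is not taken from the SALT draft. The namespace and the grammar, rule, one-of, and item elements are assumed from the W3C SRGS specification, while the function name build_city_grammar, the rule id "city", and the city names are hypothetical and exist only for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace assumed from the W3C SRGS specification (XML form).
SRGS_NS = "http://www.w3.org/2001/06/grammar"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def build_city_grammar(cities):
    """Return an SRGS XML grammar (as a string) that accepts one city name.

    Hypothetical helper: the rule id "city" and the word list are
    illustrative and do not come from the SALT draft.
    """
    # Serialize SRGS elements without a prefix (default namespace).
    ET.register_namespace("", SRGS_NS)

    grammar = ET.Element(
        f"{{{SRGS_NS}}}grammar",
        {
            "version": "1.0",
            "root": "city",            # the rule the recognizer starts from
            f"{{{XML_NS}}}lang": "en-US",
        },
    )
    rule = ET.SubElement(grammar, f"{{{SRGS_NS}}}rule", {"id": "city"})

    # <one-of> lists the alternatives the caller may say.
    one_of = ET.SubElement(rule, f"{{{SRGS_NS}}}one-of")
    for name in cities:
        item = ET.SubElement(one_of, f"{{{SRGS_NS}}}item")
        item.text = name

    return ET.tostring(grammar, encoding="unicode")


if __name__ == "__main__":
    print(build_city_grammar(["Boston", "Seattle", "Denver"]))
```

Because the grammar is ordinary XML, the same file could in principle be referenced from either a SALT page or a VoiceXML document, which is the practical benefit of both specifications sharing SRGS.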
Microsoft may well come to dominate the multi-modal application market once it provides support for SALT in its development tools, Web browser, and mobile and desktop operating systems. It's still too early to tell whether SALT will join Microsoft Agent as another Microsoft stepchild or whether it will become part of Microsoft's emerging .NET strategy.
As for the future of VoiceXML vs. SALT: yes, the spec is a direct competitor to VoiceXML, even though SALT members have been careful to avoid saying so directly. Will SALT displace VoiceXML? It's too early to answer that question. What I do predict is that SALT will be the de facto standard for integrating speech functionality into desktop, PDA, and Web applications. For now, VoiceXML will likely remain the dominant standard for developing next-generation IVR functionality that integrates with back-end Web applications.
Of course, SALT is just a spec, so you should consider SALT applications purely vaporware until a vendor is able to produce a real demo. I've not been able to get any of the contributors to produce such a demo yet, so it is still unclear when SALT will be a viable technology.
If you'd like to know more about SALT or to download the draft specification, visit http://www.saltforum.org.
About Jonathan Eisenzopf
Jonathan is a member of the Ferrum Group, LLC, which specializes in Voice Web consulting and training. He has also written articles for other online and print publications, including WebReference.com and WDVL.com. Feel free to send an email to firstname.lastname@example.org with questions or comments about the VoiceXML Strategy series, or for more information about training and consulting services.