Speech Markup Language
The IBM WebSphere Voice Server is a VoiceXML 2.0-enabled speech environment. VoiceXML is aimed at developing telephony-based applications and brings the advantages of Web-based application delivery to IVR applications. Microsoft, by contrast, uses SALT 1.0 in MS Speech Server. SALT is a set of lightweight extensions to XML that adds speech-enabled telephony to Web-based applications and brings them into a multimodal model. SALT targets speech-enabled applications across all devices, such as telephones, PDAs, tablet PCs, and desktop PCs. In short, VoiceXML focuses on telephony application development, whereas SALT focuses on multimodal speech applications that can be accessed from the whole range of devices. These points will help you choose which one to use in your own speech-enabled applications. Note that IBM has also begun to provide a multimodal toolkit in related products.
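To make the VoiceXML side of this comparison more concrete, the following Python sketch builds a minimal VoiceXML 2.0 document containing one form that speaks a single prompt. It is only an illustration of what the markup looks like. The function name build_greeting_vxml, the form id, and the prompt text are hypothetical and do not come from either vendor's toolkit.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the W3C VoiceXML 2.0 Recommendation.
VXML_NS = "http://www.w3.org/2001/vxml"


def build_greeting_vxml(prompt_text: str) -> str:
    """Return a minimal VoiceXML 2.0 document that speaks one prompt.

    Hypothetical helper for illustration only.
    """
    # Root element: <vxml version="2.0" xmlns="...">
    vxml = ET.Element("vxml", {"version": "2.0", "xmlns": VXML_NS})
    # A form is the basic dialog unit; the id "greeting" is made up.
    form = ET.SubElement(vxml, "form", {"id": "greeting"})
    # A block holds executable content, here a single prompt.
    block = ET.SubElement(form, "block")
    prompt = ET.SubElement(block, "prompt")
    prompt.text = prompt_text
    return ET.tostring(vxml, encoding="unicode")


document = build_greeting_vxml("Welcome to the demo line.")
print(document)
```

In a deployment, a document like this would normally be served by a Web server and fetched by the VoiceXML voice browser, in the same way an HTML page is fetched by a visual browser.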
Framework and Programming
The Microsoft Speech Server and Speech SDK are based on the MS .NET Framework. You have to install the .NET Framework and the ASP.NET Speech Controls modules on the speech server (SES/TAS) development machines as well as on the Web server. The MS Speech Application SDK integrates seamlessly with MS Visual Studio .NET 2003; when you install the SDK, all of its controls appear on the toolbar of the Visual Studio .NET 2003 development environment. When programming under ASP.NET, you can code in C# or VB.NET on the server side and use JScript or VBScript on the client side. In addition, you can use ADO.NET to implement database access and transactions.
Components of Server and SDK
The main components of MS Speech Server are Speech Engine Services (SES) and Telephony Application Services (TAS). SES includes a Speech Recognition Engine that handles users' spoken input, a Prompt Engine that plays prerecorded prompts back to users, and a Text-to-Speech Engine that synthesizes audio output from a text string. TAS contains three components. The SALT Interpreter handles all the speech interface and presentation logic (input and output) and mediates between the speech application and the telephony components of the architecture. The Media and Speech Manager handles requests made by SALT Interpreters to SES for speech recognition and prompt playback, and manages the interface with the third-party Telephony Interface Manager (TIM) to deliver audio to and from the telephone user. The SALT Interpreter Controller manages the creation, deletion, and resetting of the multiple SALT Interpreter instances that manage dialogs with individual callers.
The MS Speech Application SDK provides ASP.NET Speech Controls, the Speech Control Editor, the Speech Grammar Editor, the Speech Prompt Editor, speech debugging tools (such as the Telephony Application Simulator, the Speech Debugging Console, and the Speech Debugging Console Log Player), the Speech Add-in for Microsoft Internet Explorer, a speech application deployment service, and a broad set of grammar libraries. The IBM WebSphere Voice Server for Multiplatforms V4.2 includes a VoiceXML voice browser, the IBM Speech Recognition Engine, the IBM TTS Engine, a telephony and media component, and so forth. It can connect to many telephony platforms, including WebSphere Voice Response for AIX/Windows, Intel Dialogic, Cisco and Siemens HiPath, and Voice Server Speech Technologies for Windows and Linux.
The IBM WebSphere Voice Toolkit V4.2 integrates seamlessly with the IBM WebSphere Studio visual development environment. Its components include a VoiceXML editor, a grammar editor, a pronunciation builder, a CCXML editor, a large set of grammar libraries, and Natural Language Understanding (NLU) model tools. The NLU tools help developers classify data for the generation of several statistical models and allow multiple developers to work with the same set of data simultaneously. The IBM WebSphere Voice Toolkit V4.2 also provides a telephony simulator for development and testing.
Telephony Interface—Hardware and Software
For connectivity into the enterprise telephony infrastructure and for call-control functionality, both IBM WebSphere Voice Server and MS Speech Server need a telephony interface made up of software and hardware. Intel Corp. and Intervoice Inc. each provide a Telephony Interface Manager (TIM) that supports Microsoft Speech Server. A TIM is a required component of any MS Speech Server voice-only solution; multimodal applications do not require one. The Intel version of the TIM is known as Intel NetMerge Call Manager. The TIM works in conjunction with MS Speech Server, providing management and control of Intel Dialogic telephony resources. With the Call Manager software, developers can focus on speech application design and flow independently of the underlying telephony infrastructure. The TIM is also the software that integrates the speech server quickly and easily with the Intel NetStructure voice boards, enabling the deployment of robust speech processing applications.
Currently, the Intel Call Manager and the Intervoice TIM support the Intel Dialogic D41JCT, DM/V480, and DM/V960 telephony hardware, which ranges from 4 to 96 ports, for use with MS Speech Server.
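As a rough aid to capacity planning, the sketch below estimates how many boards of a given model are needed for a target number of concurrent calls. The figures of 4 ports for the D41JCT and 96 ports for the DM/V960 follow the range stated above. The figure of 48 ports for the DM/V480 is an assumption inferred from the model name and should be checked against the vendor's data sheet. The function name boards_needed is hypothetical.

```python
import math

# Ports per board. The 4 and 96 values follow the range given in the text;
# the 48 for DM/V480 is an ASSUMPTION inferred from the model name.
BOARD_PORTS = {
    "D41JCT": 4,
    "DM/V480": 48,
    "DM/V960": 96,
}


def boards_needed(concurrent_calls: int, board: str) -> int:
    """Return the minimum number of boards to serve the given call volume.

    Hypothetical helper: one port is assumed to carry one call.
    """
    if concurrent_calls < 0:
        raise ValueError("concurrent_calls must be non-negative")
    # Round up, because a partly used board is still a whole board.
    return math.ceil(concurrent_calls / BOARD_PORTS[board])


for model in BOARD_PORTS:
    print(model, boards_needed(100, model))
```

This ignores practical limits such as the number of boards a single chassis or server license can host, so it gives only a lower bound.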
The IBM WebSphere Voice Server provides a software telephony and media component that manages the telephony interface. It also provides a set of C APIs for integrating speech into a telephony platform. The IBM WebSphere Voice Server for Multiplatforms V4.2 can connect to many telephony platforms, including WebSphere Voice Response (formerly IBM DirectTalk), Intel Dialogic voice boards, Cisco and Siemens HiPath VoIP gateways, and Voice Server Speech Technologies for Windows and Linux. For VoIP, you need to install the H.323 telephony component on the voice server.
The IBM WebSphere Voice Server is scalable from basic analog telephony boards to high-density digital solutions with a T1/E1 interface, including the Intel Dialogic D/120JCT, D/240JCT, D/480JCT, D/300JCT, and D/600JCT. When integrated with IBM DirectTalk, it also supports CAS, ISDN, and SS7 signaling connections. The voice server also supports the Cisco 2600 Gateway with 2T1 or E1.