Introduction to SALT (Part 4): Microsoft .NET Speech SDK

Saturday Dec 21st 2002 by Hitesh Seth

In this article, we will look at the new Microsoft .NET Speech SDK which allows for the rapid development of SALT-based multimodal/telephony applications.

So far in our series on SALT (Speech Application Language Tags) we have looked at the basic architecture of SALT-based applications. We have also taken a closer look at some of the "elements" of SALT and their syntax. Now that we are ready to start developing, we will take a look at the new Microsoft .NET Speech SDK, which allows for the rapid development of SALT-based multimodal/telephony applications. We will use the newly released Beta 2 as the basis for this discussion.

Introduction to Microsoft .NET

Let's first take an introductory look at the Microsoft .NET initiative as a whole. From a speech application developer's perspective, Microsoft .NET is an environment for developing, deploying and executing web-based and Windows applications as well as XML/SOAP-based web services. The objective of the framework is to arm the developer with a set of tools, a rich set of class libraries, and the flexibility to work in different programming languages (to suit the developer's expertise). As speech application developers, we can focus on the application's business and presentation logic while leaving the "plumbing" work (e.g., connecting to databases, parsing XML) to the underlying tools and framework.

There are three major components of Microsoft .NET: the Microsoft .NET Framework, Visual Studio.NET and the .NET Enterprise Servers. The .NET Framework itself is composed of a set of components, including:

  • .NET Common Language Runtime (CLR): provides an object and type system common across the .NET-compatible languages and is responsible for the groundwork: memory allocation, thread/process management, security management, etc.
  • .NET Supported Languages: the .NET Framework gives developers the flexibility of using a number of different languages, including the newly developed C# (C++/Java-like), C++, Visual Basic.NET and JScript. Apart from these core languages, a number of third-party vendors have developed support for additional languages as well.
  • .NET Class Libraries: a rich and comprehensive set of classes that provide a lot of pre-built functionality and are available to all .NET-supported programming languages. These include classes built around user interface development, web services, database access, networking, input/output, web application development, XML/XSLT processing, multi-threading, security, etc.
  • ASP.NET: as the name suggests, this is the next generation of the popular Active Server Pages (ASP) web development environment for creating dynamic web applications. ASP.NET-based applications can use any .NET-supported language as a scripting language within a page execution model. ASP.NET makes major advances over ASP, including support for web services, server-based web controls and XML-based configuration of web applications.

For years, Microsoft Visual Studio has been the de facto standard for developing Visual Basic, Visual C++ and ASP-based applications. Visual Studio.NET, the next revision of the Visual Studio toolset, builds on its success and supports .NET programming in C#, Visual Basic.NET, Visual C++ and JScript.NET for web and Windows application development. It provides an integrated development and debugging platform for Windows form-based GUI applications, Windows services, reusable components (or building blocks), web-based applications and web services.

The .NET Enterprise Servers include SQL Server, BizTalk Server, Commerce Server, SharePoint Portal Server, Application Center, Content Management Server, Exchange Server, Host Integration Server, Internet Security & Acceleration Server and Mobile Information Server. These provide the basis for pre-built enterprise-class applications and services.

.NET Speech SDK

So where does the .NET Speech SDK fit in with Microsoft .NET? The Beta 2 release of the SDK makes three main additions to the .NET Framework and toolset.

.NET Speech Add-in for Internet Explorer

This add-in allows IE to be used as a viewer for speech applications. Through a parameter in the URL, an application can be tested and used either in a voice-only environment (simulating the telephony-based application model) or in a multimodal environment (which reflects real-world usage of the application).

ASP.NET-based Speech Controls

The ASP.NET-based Speech Controls allow developers using ASP.NET and Microsoft Visual Studio.NET to create multimodal/telephony applications and/or add speech interactivity to existing web applications. The screenshot below shows these tools being used to develop a speech-based interactive pizza ordering application.

These controls add to the existing .NET class libraries and provide developers with the ability to add speech-based interaction to their existing applications or to build new applications. The table below shows a quick reference to the functionality provided by these controls.

Control                Function
Speech Controls
  QA                   Collects and processes speech/DTMF input from the user
  Command              Collects input such as "help", "repeat" or "cancel" that is not processed by the QA control
  Custom Validator     Validates input data through a script
  Compare Validator    Validates input data by comparing it with another control/value
  Semantic Map         Contains a set of values that hold the semantic state of the input controls and their bindings
  Style Sheet          Contains a set of common speech control properties
Call Control Controls
  Smex Message         Sends a CSTA (Computer-Supported Telecommunications Applications) message
  Transfer Call        Transfers the current call
  Disconnect Call      Disconnects a call
  Make Call            Initiates a telephone call
  Answer Call          Answers a call
  Call Info            Contains basic information about the current call
Application Controls
  Alpha Digits         Collects a string of numbers and letters
  Currency             Collects an amount in US dollars
  Date                 Collects a date
  Natural Number       Collects and validates a natural number
  Navigator            Allows navigation of a list of table-based elements
  Phone                Collects a US telephone number
  Single Item Chooser  Allows a user to select a single item from a list by dynamically creating a grammar
  SSN                  Collects a US Social Security Number
  Yes No               Collects a Yes/No answer
  Zipcode              Collects a US Zip Code
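
To make the division of labor between the QA and Command controls more concrete, here is a minimal conceptual sketch in Python. It is not SDK code: the `Question` class, the `handle_utterance` function and the command phrases are hypothetical names chosen for illustration, and a real application would express this declaratively through the ASP.NET controls listed above. The sketch only assumes what the table states, namely that a Command picks up input the active QA does not process.

```python
from dataclasses import dataclass, field

# Hypothetical global commands, modeled on the "help, repeat, cancel" examples above.
GLOBAL_COMMANDS = {"help", "repeat", "cancel"}


@dataclass
class Question:
    """Stand-in for a QA-style control: one prompt plus the answers it accepts."""
    prompt: str
    accepted: set = field(default_factory=set)


def handle_utterance(question: Question, utterance: str) -> tuple:
    """Offer recognized text to the active question first, then to global commands."""
    text = utterance.strip().lower()
    if text in question.accepted:
        return ("answer", text)
    if text in GLOBAL_COMMANDS:
        return ("command", text)
    return ("no_match", text)


size_question = Question("What size pizza would you like?", {"small", "medium", "large"})
print(handle_utterance(size_question, "Large"))  # ('answer', 'large')
print(handle_utterance(size_question, "help"))   # ('command', 'help')
```

Keeping the global commands in one place, separate from each question, mirrors the idea that phrases like "help" should work at any point in the dialog without being repeated in every question's own grammar.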

Speech Tools

The Speech Tools include a grammar builder, a prompt builder (shown below) and a speech debugger, which aid in constructing and testing different parts of a speech application.

The table below provides a quick reference for the tools provided by .NET Speech SDK.

Grammar Editor
  • Graphical Tool for visual development and testing of speech grammars
  • Supports XML-based grammar files based on W3C Speech Recognition Grammar Specification 1.0.
Prompt Editor
  • Set of editing tools that enable speech application developers to build a database of prompts used by the application.
  • Provides a graphical wave editor to customize .wav files
Speech Debugging Console
  • Provides speech information (including speech data, events and errors)
  • Activated when an ASP.NET-based speech application is run in debug mode
Speech Control Editor
  • Set of extensions to the Visual Studio WebForms designer for assembling a speech application.
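
As a rough illustration of the XML grammar format the Grammar Editor works with, and of the kind of grammar a control such as Single Item Chooser might generate from a list, the following Python sketch builds a small grammar document. The element and attribute names (`grammar`, `rule`, `one-of`, `item`, `root`, `mode`) come from the W3C Speech Recognition Grammar Specification 1.0; the function name `build_choice_grammar` and the pizza toppings are hypothetical, and the grammar the SDK actually generates is not shown in this article and may differ.

```python
import xml.etree.ElementTree as ET

SRGS_NS = "http://www.w3.org/2001/06/grammar"      # SRGS 1.0 namespace
XML_NS = "http://www.w3.org/XML/1998/namespace"    # namespace of xml:lang


def build_choice_grammar(rule_id, items, lang="en-US"):
    """Return an SRGS 1.0 XML grammar (as a string) that accepts exactly one of `items`."""
    # Serialize SRGS elements without a prefix (default namespace).
    ET.register_namespace("", SRGS_NS)
    grammar = ET.Element(
        f"{{{SRGS_NS}}}grammar",
        {
            "version": "1.0",
            "root": rule_id,            # rule the recognizer starts from
            "mode": "voice",
            f"{{{XML_NS}}}lang": lang,
        },
    )
    rule = ET.SubElement(grammar, f"{{{SRGS_NS}}}rule", {"id": rule_id, "scope": "public"})
    one_of = ET.SubElement(rule, f"{{{SRGS_NS}}}one-of")
    for phrase in items:
        # Each <item> is one alternative the caller may say.
        ET.SubElement(one_of, f"{{{SRGS_NS}}}item").text = phrase
    return ET.tostring(grammar, encoding="unicode")


toppings_grammar = build_choice_grammar("topping", ["pepperoni", "mushroom", "olives"])
print(toppings_grammar)
```

Generating the grammar from a plain list is the appeal of the dynamic approach: when the list of choices comes from a database, the grammar can follow it without anyone editing a grammar file by hand.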

To be Continued

We will continue our exploration of SALT in the next article by actually walking step-by-step through what is involved in developing a telephony/multimodal application using SALT and Microsoft .NET Speech SDK.

About Hitesh Seth

A freelance author and known speaker, Hitesh is a columnist on VoiceXML technology in XML Journal and regularly writes for other technology publications on emerging technology topics such as J2EE, Microsoft .NET, XML, Wireless Computing, Speech Applications, Web Services & Enterprise/B2B Integration. Hitesh received his Bachelor's degree from the Indian Institute of Technology Kanpur (IITK), India. Feel free to email any comments or suggestions about the articles featured in this column to hks@hiteshseth.com.
