
White Papers: Voice
- Why Voice?
- What Does the Market Say?
- Enter TECHNOLOGY
- What is Driving?
- What does Technology buy you?
- Voice Application Architecture
Why Voice?
In today’s world, what matters is not just what you know but how fast you know it. Speedy access to information is a fundamental facet of communication, especially business communication. Business stakeholders have many communication channels available to them, but what matters most is using those channels optimally: doing business better, faster, and cheaper, with innovation and even style and novelty. That is the competitive driver.
Even as businesses find their way through this maze, an option is emerging that has the potential to make an audible difference to the way business information is disseminated: voice. It is an obvious option, the most natural one, and one that has always been freely available.
"The ability to add speech to graphical applications will contribute to further growth in the rapidly expanding VoIP market and will enable delivery of innovative and interoperable value-added communications and services over a single packet network."
— Alistair Woodman, Director of Marketing, Voice Technology Center
Cisco Systems
By providing voice access to a web site over the phone, a business gives customers anytime, anywhere access to its services without tying them to a computer or requiring them to log on to the web. With the increasing penetration of landline and mobile telephones, people essentially have a voice-based access device with them at all times.
Speech technology, as it improves, will become a very natural and powerful interface for ubiquitous web devices. Microphones are much smaller than keyboards and keypads; speakers are smaller than screens. So it seems quite likely that many future web devices will have on-board speech recognition (as do some mobile phones today), or perhaps that we'll carry voice-activated universal remotes to talk to the devices in our immediate surroundings. The need of the hour is a language that enables a user to interact orally with the web from any part of the globe.
What Does the Market Say?
The overall market for voice-recognition technology topped $1 billion for the first time in 2006, a 100 percent increase in just two years. Within that broad market, there are numerous subsectors that are likewise surging: The market for server-based voice-recognition technology to power call centers and the like reached nearly $600 million in 2006 and is expected to double by 2009, according to Opus Research.
The market for speech technology embedded in devices such as phones and auto dashboards, worth about $125 million in 2006 according to research firm Datamonitor, is expected to quadruple to $500 million by 2010. That growth is powered by the rapid spread of voice-command features on phones and in cars with increasing levels of "talking electronics," from music players to navigational systems. Ultimately, some experts say, voice-recognition systems are likely to be built into almost every gadget, appliance and machine that people use. (Source: CNNMoney.com)
Enter TECHNOLOGY
“Improvements to voice-recognition algorithms and greater computing power have changed speech technology from an approach with limited uses to an increasingly important part of many applications.
As this process has unfolded, developers have searched for an open standard, rather than proprietary development and runtime environments, that would let them easily and quickly add speech input/output capabilities to applications that function across platforms.”
Inderpal Singh Mumick, founder and CEO of Kirusa, a wireless-platform developer.
Two approaches emerged from this search: VoiceXML and SALT.
VoiceXML - Voice Extensible Markup Language. VoiceXML is based on XML and strongly benefits from the ability to move audio data efficiently across the web. It was designed for the development of speech based telephony applications on the web.
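To make the description above more concrete, here is a minimal sketch of what a small VoiceXML 2.0 dialog looks like, generated from Python's standard library. The form id, field name, prompt wording, grammar file name (`cities.grxml`), and submit URL are all hypothetical and chosen only for illustration; a real application would supply its own.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"  # VoiceXML 2.0 namespace


def build_weather_dialog(submit_url: str) -> str:
    """Return a minimal VoiceXML 2.0 document as a string.

    All names and texts below are illustrative assumptions.
    """
    vxml = ET.Element("vxml", {"version": "2.0", "xmlns": VXML_NS})
    form = ET.SubElement(vxml, "form", {"id": "weather"})

    # A field collects one piece of spoken (or keyed) input from the caller.
    field = ET.SubElement(form, "field", {"name": "city"})
    prompt = ET.SubElement(field, "prompt")
    prompt.text = "Which city would you like the weather for?"

    # The grammar tells the recognizer which answers are acceptable
    # (hypothetical external grammar file).
    ET.SubElement(field, "grammar",
                  {"src": "cities.grxml", "type": "application/srgs+xml"})

    # Once the field is filled, hand the value to the web application.
    filled = ET.SubElement(field, "filled")
    ET.SubElement(filled, "submit", {"next": submit_url, "namelist": "city"})

    return ET.tostring(vxml, encoding="unicode")


document = build_weather_dialog("http://example.com/weather")
print(document)
```

The point of the sketch is that the dialog is ordinary XML served over the web: the voice browser fetches it like a page, speaks the prompt, listens for an answer, and submits the result back to the server.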
SALT - Speech Application Language Tags. It consists of a small set of XML elements, with associated attributes and DOM object properties, events and methods, which apply a speech interface to web pages. SALT can be used with HTML, XHTML and other standards to write speech interfaces for both voice-only (e.g. telephony) and multimodal applications.
VoiceXML (VXML) and SALT are software standards that address the unique user-interface requirements of humans interacting with computers by listening and speaking. As extensions of Web technology, they're designed to fit the common three-tier Web information architecture: the presentation tier, where the end user interacts with a "browser"; the middle tier, consisting of the Web server and the application's business logic; and the back-end data storage tier. The primary difference between purely text/graphical Web applications and those with voice lies in the presentation tier, where the "browsers" employ voice and audio rather than, or in addition to, text and graphics. VXML was developed under the auspices of the VoiceXML Forum, while the more recent SALT standard is supported by the SALT Forum.
Early on there were differences between VoiceXML and SALT, and programming variations remain today. The only major difference is the style of programming: VoiceXML provides a Form Interpretation Algorithm (FIA) for sequencing through the fields of a voice form, while programmers must specify this sequence themselves when programming with SALT. However, even this difference will disappear in Version 3, the follow-on to VoiceXML 2.0.
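The built-in field sequencing described above can be pictured with a heavily simplified sketch. This is not the algorithm as specified for VoiceXML, which also handles guard conditions, events, prompt counters, and mixed-initiative grammars; the function name `run_form`, the `ask` callback, and the `max_turns` limit are assumptions made for this illustration.

```python
from typing import Callable, Dict, List, Optional


def run_form(field_names: List[str],
             ask: Callable[[str], Optional[str]],
             max_turns: int = 20) -> Dict[str, str]:
    """Visit the fields of a voice form in document order until all are filled."""
    values: Dict[str, str] = {}
    for _ in range(max_turns):
        # Select phase: pick the first field that has no value yet.
        pending = next((name for name in field_names if name not in values), None)
        if pending is None:
            break  # every field is filled, so the form is complete
        # Collect phase: prompt the caller and listen for an answer.
        answer = ask(pending)
        # Process phase: keep a recognized answer; otherwise the same
        # field is selected again on the next turn.
        if answer:
            values[pending] = answer
    return values


# Scripted caller: the second prompt is not understood the first time.
replies = iter(["Raleigh", None, "Baltimore", "Friday"])
result = run_form(["origin", "destination", "day"], lambda name: next(replies))
print(result)
```

With VoiceXML the platform supplies this loop, so the author only declares the fields. With SALT the page author would write the equivalent sequencing logic, typically in script.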
What is Driving?
VoiceXML and SALT are best suited for applications that require relatively little input from the user and deliver highly targeted output that generally is (or could be) available from an HTML Web interface. A typical application is a service in which callers dial a phone number to retrieve information from a Web site, such as stock quotes, airline flight information, or weather. Early adopters tend to use the technology in this way, but VXML and SALT will gain ground in more diverse applications, such as voice-enabled intranets and contact centers, notification services, and other innovative telephony services. Some typical applications include the following:
Information retrieval: VXML/SALT is ideal for applications where input requires a few navigational commands and moderate data entry, such as “Dial or say ‘1’ for yesterday’s assembly line performance statistics,” “Say the name of the product” for updated market-development notes, or “Say your department number and password” for company news from the intranet. Voice input can use quite a large vocabulary, such as freeform street addresses for a city, or stock quotes for a specified company and period. Natural language speech recognition makes this interaction easier than ever: “I need flight information from RDU to Baltimore on Friday.” “When is order 54362A for Jean Smith due to arrive in Columbus?”
Electronic commerce: VXML/SALT is naturally well suited for customer service applications (such as tracking parcel shipments, checking account updates, and using call center services), as well as financial applications, such as getting stock quotes or conducting online banking. If the customer has specific ordering information (from a catalog or direct mail flyer, for instance), VXML can be useful for order-taking applications.
Telephony services: Personal name dialing, one-number “follow-me” services, teleconferencing set-up, and other telephony features can be voice-enabled. For example, a company or service provider could place a phone directory of its employees or subscribers on its Web site, which could then be used to voice-dial just by speaking their names.
Directory assistance: Nortel Networks had already automated directory assistance service in the mid-1990s (“What city... what listing...”), but VXML gives this type of automation new power and flexibility through integration with the Web. Corporate name dialing as a packaged application makes it easy to reach colleagues whose numbers you don’t know simply by speaking their name or department.
Internal processes: Because security features that apply to the Web, such as firewalls and encryption, can be applied to voice applications as well, VXML or SALT can be used to create secure intranet applications that voice-enable internal processes, such as supply ordering, HR self-service, and corporate news.
What does Technology buy you?
It brings the advantages of Web-based development and content delivery to telephony services, especially to IVR self-service applications. Everyone can access the Web. For all that the PC has been heralded as the multimedia communications portal of the future, the phone is and will continue to be important. Phones are available just about everywhere in the world, and there are many more of them than Internet-connected computers. Phones are always on; they don’t have to be booted up. Mobile phones are small enough to be carried everywhere, much more portable than the slimmest laptop, at a tiny fraction of the price. And their batteries last longer.
Any telephone, even the most primitive old-style phone, can become a voice portal into the Web. A voice browser running on a telephony server interprets the input (speech or dial-pad tones) and passes it to the application logic running on the Web server. There’s no need for a cumbersome PC with Web browser and special Internet connection. When voice-activated “universal remotes” take hold, they will parse VXML content from all devices in the vicinity.
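The flow just described, in which a voice browser interprets the caller's input and passes it to application logic on the Web server, might be sketched roughly as follows. Everything here is a stand-in: the menu mapping, the `CONTENT` table, its placeholder texts, and the function names are hypothetical, and a real deployment would use a speech recognizer and an HTTP request rather than in-process calls.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class CallerInput:
    kind: str   # "speech" or "dtmf" (dial-pad tones)
    value: str  # recognized utterance, or the key that was pressed


# Presentation tier: hypothetical dial-pad menu offered by the voice browser.
DTMF_MENU: Dict[str, str] = {"1": "stock quotes", "2": "flight information", "3": "weather"}

# Back-end tier stand-in: placeholder texts, not real data.
CONTENT: Dict[str, str] = {
    "stock quotes": "Reading the stock quotes page.",
    "flight information": "Reading the flight information page.",
    "weather": "Reading the weather page.",
}


def interpret(caller_input: CallerInput) -> str:
    """Voice-browser step: turn speech or dial-pad tones into one request string."""
    if caller_input.kind == "dtmf":
        if caller_input.value not in DTMF_MENU:
            raise ValueError(f"unknown menu key: {caller_input.value!r}")
        return DTMF_MENU[caller_input.value]
    if caller_input.kind == "speech":
        return caller_input.value.strip().lower()
    raise ValueError(f"unsupported input kind: {caller_input.kind!r}")


def application_logic(request: str) -> str:
    """Middle-tier step: the same logic that could serve an HTML interface."""
    return CONTENT.get(request, "Sorry, that service is not available.")


def handle_call(caller_input: CallerInput) -> str:
    """Full path: caller input, then voice browser, then application logic."""
    return application_logic(interpret(caller_input))


print(handle_call(CallerInput("dtmf", "3")))
print(handle_call(CallerInput("speech", " Weather ")))
```

Both calls reach the same application logic, which is the design point: only the presentation tier knows whether the caller spoke or pressed a key.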
Self-service applications can be much more sophisticated. By blending the advanced speech processing capabilities of telephony servers with the virtually limitless information repositories on the Web, VXML and SALT make it feasible to implement very powerful, flexible applications.








