FT.com site; May 03 2005
By Andrew Baxter
Speech recognition software is one of the many technologies to have suffered from being oversold in its early years, leading to disappointment and frustration for businesses and users. Its "coming of age" has been heralded regularly - and prematurely - but now it has truly arrived in enterprises around the world.
Banks and other financial services companies, airlines and travel groups are among enterprises waking up to speech recognition as a way to cut costs in their contact centres while improving service and reducing failed transactions.
Telecoms operators, with a natural vested interest in keeping people using the phone, are introducing sophisticated speech recognition - often enhanced with a character or "persona" to humanise the system and emphasise their brand values - for directory and other inquiries.
With their initial, more basic investments behind them, whether in simple touch-tone systems or in Interactive Voice Response (IVR) systems where users say a number instead of pressing a button, companies now want more from speech recognition.
"There is a real trend to continue exploring, and find more to automate," says Elizabeth Herrell, vice-president at Forrester Research.
As has happened elsewhere in the IT world, the arrival of open standards, along with falling prices for the software, has contributed to increased interest in speech recognition.
But the main development over the past two to three years has been, simply, that the technology is getting better.
"There is no question about the ability of speech recognition to handle difficult tasks over the telephone," says Bill Meisel, president of Californian analyst TMA Associates.
Error rates have fallen by 30 per cent a year for well over 10 years, says Mr Meisel, who is chairing the conference programme today and tomorrow at Voice World Europe 2005 at Olympia, London.
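Because the cuts compound, the figure is more striking than it first sounds. The short calculation below is a back-of-envelope sketch, not anything from Mr Meisel or TMA Associates. It assumes, for illustration only, a constant 30 per cent cut every year and takes 10 years as a lower bound for "well over 10 years"; the function name is hypothetical.

```python
# Back-of-envelope sketch (illustrative assumption): a constant 30% annual
# cut in error rate, compounded over 10 years as a lower bound.
ANNUAL_REDUCTION = 0.30  # figure quoted by Mr Meisel
YEARS = 10               # "well over 10 years", so 10 is a lower bound


def remaining_fraction(annual_reduction: float, years: int) -> float:
    """Hypothetical helper: share of the starting error rate left after
    `years` of identical proportional cuts."""
    return (1.0 - annual_reduction) ** years


fraction = remaining_fraction(ANNUAL_REDUCTION, YEARS)
print(f"After {YEARS} years: {fraction:.1%} of the starting error rate")
```

Run as written, it reports about 2.8 per cent: on these assumptions, a decade of such gains would leave fewer than 3 errors for every 100 made at the start.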
Ms Herrell also points to the introduction of refined speech patterns that do not require special training for the end user, enlarged pronunciation dictionaries and better elimination of background noise.
The development of natural language systems, enabling callers to speak more or less normally in something approaching a short conversation - without having to confirm every word - is perhaps the most important recent trend, however.
Mr Meisel gives an example of the type of conversation that can now be avoided: "I'd like to go to Boston" - "What time would you like to go to Austin?" - "No, I said Boston." The system should understand the caller's request correctly the first time, so that the conversation can move on.
The aim is to eliminate situations where every question from the caller leads to two more questions, he says. Steve Cramoysan, a principal analyst at Gartner Research, says there is a lot of magic in this "disambiguation" of questions.
All these developments have taken place in "speech engines" from companies such as Nuance and ScanSoft, and Mr Cramoysan says the engines are now good enough for enterprises, even if there is still room for improvement.
The development of packaged, or at least partially-packaged, applications that sit on top of the speech engines is also spurring the enterprise market for speech recognition, says Ms Herrell, as it can reduce the time a project takes to implement by between 30 and 40 per cent, and save costs.
Even so, she notes that almost all applications still require fine tuning to match the expected utterances of the user population, and that more repeatable solutions are needed that could be applied across a number of industries, for adding an account holder to a policy, say.
In other circumstances, this continuing evolution in the technology could have persuaded enterprises to hold back before investing. However, the payback period for speech applications is normally quite short and easy to measure, especially where they are replacing humans in a call centre or freeing humans to concentrate on less routine, higher-value inquiries. Most companies recoup their investment within six to 24 months, says Ms Herrell.
Inevitably, this has been the primary motivation for companies to introduce speech recognition software, but as the technology improves additional benefits are emerging for enterprises and users. "The secondary motivator, which is less easy to prove, is to improve customer satisfaction," says Mr Cramoysan. "There are segments of the customer base that like the automated process, and people who prefer to check their bank balance anonymously."
Similarly, says Ms Herrell, there is evidence that people like to conduct automated conversations where the voice is "always preppy", or programmed to remain friendly and equable.
For the enterprise, the benefit is improved response rates.
"In our experience, well-designed natural language speech recognition applications are much 'stickier' than touch-tone or even simple (single word/answer) speech-based alternatives, meaning that customers typically exercise more options and are more likely to transact," says Richard Small, senior manager in the customer relationship management practice at Deloitte.
An additional benefit could come from better handling of staff turnover, which can be as high as 50 per cent a year in some call centres. "[With speech recognition] you won't improve customer service when compared with a well-trained operative whom you can get immediately," says Mr Meisel. "But if you get an untrained operative, it takes a long time [to complete the training]."
Beyond the call centre, speech recognition software offers intriguing possibilities for enterprises when used in combination with other technologies.
Mr Small believes the "next big thing" could be "100 per cent ID&V" (identification and verification), where voiceprint verification and unique customer data are used to authenticate customers, the so-called "front-door challenge". "Wide scale roll-out of this technology will have a similar impact on security and customer confidence to that of Chip and Pin in retail banking," he predicts.
Mr Meisel suggests that, as companies upgrade their communications networks to voice over Internet Protocol (VoIP), speech recognition could be added without the need for more hardware.
It could then be used to answer callers automatically and direct calls anywhere in the world, via VoIP, or to agents working at home.
In the longer term, Gartner's Mr Cramoysan sees speech recognition developing into a pervasive "access utility," used in a multi-modal manner (on the web, or via mobile handsets or other devices) to access a range of applications such as Oracle or SAP business software. This, he suggests, will require enterprises to think beyond individual voice applications, which is how they tend to approach the software now.
Voice software suppliers, who already face the challenge of working out which markets warrant the investment to create the applications, will have the added problem of figuring out which application works best on which access mode - and who will need it.
- - - - - - - - - - - - - - - - - - - - - - - - -
FT.com 09-May-2005