In the 1968 film 2001: A Space Odyssey, a sentient (if ultimately haywire) shipboard computer named HAL 9000 converses with astronauts. Computers today aren’t smart enough to second-guess our actions (thankfully) or to carry on long, rambling discussions. But speech recognition software has gotten good enough to be adopted by major companies ranging from Merrill Lynch and T. Rowe Price to British Airways. These days, when you phone your broker or airline, there’s a good chance that the voice at the other end will belong to a computer — and that you’ll be able to command that computer with ordinary sentences, such as “What’s my account balance?” and “Buy 2,000 shares of IBM at $102.”
This technology is having a big financial impact on many companies’ call centers, where the software has replaced human operators in handling repetitive, informational calls asking for things like store locations and hours. It’s more consistent than even the most dedicated employee, it’s cheaper in the long term, and it never takes sick days. Moreover, it’s faster and more flexible than the previous generation of so-called interactive voice response, or IVR, systems — those annoying recordings that ask you to push 1 for sales, 2 for customer service, and so on.
How does it work? Essentially, speech recognition software listens for words rather than complete sentences. It first analyzes a stream of speech to identify phonemes — the sounds that make up words. Next it compares the phonemes to prestored sound patterns in its database to figure out what you actually said (e.g., “account” will likely be in a bank’s database, but “weather” will not). Finally the software assembles those words into a meaningful command or request for information.
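The last two steps described above can be sketched as a toy example. This is only an illustration, not how any vendor's product works: commercial recognizers use statistical acoustic models, while this sketch assumes the phonemes have already been identified and simply looks them up. The phoneme spellings, the `BANK_VOCABULARY` table, and the `phonemes_to_words` function are all hypothetical names made up for the example.

```python
# Hypothetical domain vocabulary for a bank: phoneme sequences -> words.
# The phoneme symbols are illustrative, loosely modeled on ARPAbet.
BANK_VOCABULARY = {
    ("AH", "K", "AW", "N", "T"): "account",
    ("B", "AE", "L", "AH", "N", "S"): "balance",
    ("B", "AY"): "buy",
    ("S", "EH", "L"): "sell",
}


def phonemes_to_words(phonemes, vocabulary):
    """Greedily match the longest known phoneme sequence at each position."""
    words = []
    longest = max(len(key) for key in vocabulary)
    i = 0
    while i < len(phonemes):
        # Try the longest possible match first, then shorter ones.
        for length in range(min(longest, len(phonemes) - i), 0, -1):
            candidate = tuple(phonemes[i:i + length])
            if candidate in vocabulary:
                words.append(vocabulary[candidate])
                i += length
                break
        else:
            # No word in this domain starts here, so skip the sound.
            i += 1
    return words


heard = ["AH", "K", "AW", "N", "T", "B", "AE", "L", "AH", "N", "S"]
print(phonemes_to_words(heard, BANK_VOCABULARY))
```

Because “weather” is not in the bank's vocabulary, its phonemes would match nothing and be skipped, which is the behavior the paragraph above describes.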
For example, on a stock-trading site, if you say “buy” or “sell,” the system will then start scanning your speech for the name of a stock, a quantity, and a price. If something is missing or unclear, the software will ask you to repeat it. These programs are sophisticated enough that, with a bit of fine-tuning by your IT staff, they’re rarely flummoxed by background noise, accents, or poor pronunciation. According to Nuance, one of three major producers of voice recognition software, the technology is about 95 percent accurate at interpreting human speech on the first try, provided the conversation stays within a well-defined subject area (such as the stock market). In practice, when the software can ask speakers for word confirmations and clarifications, the success rate rises to nearly 100 percent. For example, if you say you want to buy 100 shares of Cisco, the system might ask whether you mean tech vendor Cisco Systems or food-services company Sysco Corp. before executing the trade.
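The scanning and clarification behavior described above can be pictured with a short sketch. It assumes the words have already been recognized and arrive as text, and it is not drawn from Nuance's or any other vendor's software. The `SOUND_ALIKES` table and the `fill_order_slots` function are hypothetical names invented for the illustration.

```python
# Hypothetical table of spoken names and the companies they could refer to.
SOUND_ALIKES = {
    "ibm": ["IBM"],
    "cisco": ["Cisco Systems", "Sysco Corp."],
}


def fill_order_slots(words):
    """Scan recognized words for an action, quantity, stock, and price.

    Returns the partially filled order plus the follow-up questions
    the system would need to ask before executing the trade.
    """
    order = {"action": None, "quantity": None, "stock": None, "price": None}
    questions = []
    ambiguous_stock = False

    for position, word in enumerate(words):
        token = word.lower()
        cleaned = token.replace(",", "").lstrip("$")
        if token in ("buy", "sell"):
            order["action"] = token
        elif token in SOUND_ALIKES:
            candidates = SOUND_ALIKES[token]
            if len(candidates) == 1:
                order["stock"] = candidates[0]
            else:
                # Two companies sound the same, so ask instead of guessing.
                ambiguous_stock = True
                questions.append("Did you mean " + " or ".join(candidates) + "?")
        elif cleaned.isdigit():
            follows_at = position > 0 and words[position - 1].lower() == "at"
            if token.startswith("$") or follows_at:
                order["price"] = int(cleaned)
            else:
                order["quantity"] = int(cleaned)

    # Ask the caller to repeat anything that is still missing.
    for slot, value in order.items():
        if value is None and not (slot == "stock" and ambiguous_stock):
            questions.append(f"Please repeat the {slot}.")
    return order, questions


print(fill_order_slots("Buy 2,000 shares of IBM at $102".split()))
print(fill_order_slots("Buy 100 shares of Cisco".split()))
```

The first request fills every slot and needs no follow-up. The second leaves the stock open and produces the Cisco-or-Sysco question, along with a request for the missing price.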
As you might expect, technology like this doesn’t come cheap. A full-scale voice recognition system for a major corporation can cost as much as several million dollars, depending on call volume and what features you want. But because of the speed and efficiency gains, it typically pays for itself in about six months, says William Meisel, president of speech-industry consulting firm TMA Associates. In addition to Nuance (NUAN), both SpeechWorks (SPWX) and IBM (IBM) have systems on the market, but you’ll most likely buy the technology through a telecom or IVR vendor such as Nortel (NT), Edify (SONE), or InterVoice-Brite (INTV). These companies license the underlying speech recognition technologies and can integrate them into your existing telecom setup. You’ll still need human operators to handle more complicated customer requests and to deal with cases where you want to provide white-glove customer service. But for simple transactions, routine account questions, order status inquiries, and the like, speech recognition is a safe bet.
One caveat: Don’t bother putting your money into speech-activated websites just yet. Many telecom carriers, as well as startups like TellMe and BeVocal, are making bold promises about a “voice Web,” where we will be able to surf the Internet by telephone. But these sites are still in their infancy and not yet widespread enough to be useful, says TMA Associates’ Meisel. Besides, who really wants to listen to a computer read a webpage over the phone?
As voice recognition technology matures, new uses for it will continue to emerge. In the meantime, today’s software may not be good enough to carry on breezy discussions à la HAL 9000, but it can help you keep costs down and improve customer service, and that’s an idea whose time has definitely come.
Mea Culpa: In a previous version, we mistakenly called the HAL 9000 the HAL 2000.