Blog: Dialing for Dialog

November 2007

11/25/2007

HAL's dreams versus useful tools


Speech recognition research has always aimed at building machines that can talk and understand speech as humans do. However, that dream is still years away, and speech recognition technology remains severely limited. Yet current speech technology has already enabled the automation of services used effectively by millions of people. Like other machines, such as ATMs, speech recognition should not be regarded as a replacement for human beings, but rather as a tool that lets us control a computer by voice. And like all tools, it requires a little learning before its users can reap the full benefits of automation.

HAL 9000, the 'almost' perfect computer of "2001: A Space Odyssey," with Douglas Rain's soothing voice and a capacity for personal feelings and autonomous decisions, permeated most of my youthful sci-fi dreams and shaped my adult career. For more than 20 years I pursued the goal of natural language communication with machines, fascinated by the power of statistics, huge amounts of data, and automatic learning. Indeed, a machine that talks as humans do has been the holy grail of human-machine spoken communication research for more than half a century.

The dream, HAL's dream, has not faded, but we now understand that there is a marked difference between the dream of building a machine that talks and understands as humans do, and the vision of creating a useful tool. Let's use ATMs, the ubiquitous cash machines, as a reference. ATMs are not mechanical replicas of bank tellers, and they were never meant to be. But we do not dismiss them just because they are not that "human."

We actually like ATMs. Why? Because they are fast, available around every corner at any hour, they speak our language no matter which part of the world we are in, and they never make mistakes. I have never received less (or, alas, more) cash than the amount I withdrew from my account. And when there was a mistake, it was always "my" mistake: I punched the wrong key, or I did not understand what the machine was asking me, or I forgot that I had already taken the cash and put it in my wallet a second before, and then stood there puzzled, staring at the empty cash slot.

Yes, I do like talking to humans, but unless I have to perform an unusual transaction, I always choose an ATM over a human bank teller. ATMs do not fulfill HAL's dream; they are tools, not duplicates of humans. But we know how to use them and what to expect from them. And they make our lives easier.

HAL's dream of building a human-like speaking machine has not faded; academic research is still pursuing it. It is an ambitious goal pursued by many brilliant scientists. But until we reach it (and we are not there yet), we have to understand that a tool is a tool is a tool. And voice recognition technology, today, is a tool. Period!

In the mid-1990s AT&T automated its operator services using a speech recognizer that could understand five, and only five, words: calling-card, collect, third-party, person-to-person, operator. Only five words... that's far from HAL's dream, but so useful to AT&T customers, who rarely complained that they wanted more "natural language," or that they wanted to say "I am traveling in France, I do not have any money, I forgot my credit card at home, can you make a collect call to 555 111 1212?" rather than just "collect!" And so useful to AT&T, which saved hundreds of millions of dollars with only those five words!

So, what is speech technology today? It is a tool that lets you control a remote computer with your voice. Why voice? Because in some situations it is the only way, or the most convenient way, to control a machine or enter data. Do we need natural language? Sometimes we do: when five words are not enough, or when we cannot capture all the possibilities with a small set of keywords.

Voice recognition technology is a tool, but it is our responsibility as users to learn how to use it. We do not walk up to an ATM and push keys at random, without understanding what we are doing and without reading the instructions on the display. We probably don't remember it, but there was a time when we learned how to use ATMs, just as there was a time when we learned how to use answering machines, the Web, and MP3 players. Now that we have learned how to use those tools, we are happy with them. ATMs work only if users know how to use them. The same is true today for voice self-service technology.

Posted by Roberto on Nov 25, 2007 2:07:29 PM