HAL's dreams versus useful tools
Speech recognition research has always aimed at building machines that can talk and understand speech as humans do. However, the realization of this dream is years away, and speech recognition technology is still severely limited. Yet current speech technology has already enabled the automation of services used effectively by millions of people. Like other machines, such as ATMs, speech recognition should not be regarded as a replacement for human beings, but rather as a tool for controlling a computer by voice. And like all tools, it requires a little learning before its users can reap the full benefits of automation.
HAL 9000, the 'almost' perfect computer of "2001: A
Space Odyssey," with Douglas Rain's soothing voice and its capacity for personal
feelings and autonomous decisions, permeated most of the sci-fi dreams of my youth and
shaped my career as an adult. For
more than 20 years I pursued the goal of natural language communication with
machines, fascinated by the power of statistics, huge amounts of data, and
automatic learning. Indeed, learning to talk as humans do has been the holy
grail of human-machine spoken communication research for more than half a
century.
The dream, HAL’s dream, has not faded, but we now understand
that there is a marked difference between the dream of building a machine that
talks and understands as humans do, and the vision of creating a useful tool. Let's use ATMs, the ubiquitous cash machines,
as a reference. ATMs are not mechanical replicas of bank tellers, and they
were never meant to be. But we do not dismiss them just because they are not that
"human."
HAL's dream of building a human-like speaking machine has not
faded; academic research is still pursuing it. It is an ambitious goal pursued
by many brilliant scientists. But until
we reach it--and we are not there yet--we do have to understand that a tool is
a tool is a tool. And voice recognition technology, today, is a tool. Period!
In the mid-1990s AT&T automated its operator service
by using a speech recognizer that could understand five, and only five words:
calling-card, collect, third-party, person-to-person, operator. Only five
words... that’s far from HAL's dream, but so useful to AT&T customers who
rarely complained just because they wanted more "natural language," or
because they wanted to be able to say "I am traveling in France, I do not
have any money, I forgot my credit card at home, can you make a collect call to
555 111 1212?" rather than just
"collect!" And so useful to AT&T, which saved
hundreds of millions of dollars with only those five words!
So, what is speech technology today? It is a tool that lets
you control a remote computer with your voice. Why voice? Because in some
situations that is the only way, or the most convenient way, to control a
machine or enter data. Do we need
natural language? We do sometimes, when five words are not enough or we cannot
summarize all the possibilities with a small set of keywords.
Voice recognition technology is a tool, but it is our
responsibility as users to learn how to use it. We do not go to ATMs and push
keys without reason, without understanding what we are doing and without having
read the instructions on the display. Probably we don't remember it, but there
was a time when we learned how to use ATMs, just as there was a time when we
learned how to use answering machines, the Web, and mp3 players. Now that we
have learned how to use those tools, we are happy with them. ATMs can work
only if users know how to use them. The same is true today for voice self-service technology.
Posted by Roberto on Nov 25, 2007 2:07:29 PM

