Do we need natural language?
In many cases the recognition
of a few keywords is enough to build useful automated systems. However, there
are applications that could not be automated without natural language
understanding. This is especially true when the number of choices is
large or there is a potential mismatch between the mental model of the caller
and the system.
So, do we need natural language? If speech recognition is a tool--like a keyboard--and if we can build useful applications based on the recognition of a few words, why do we need sophisticated natural language understanding? Why don't we code all the possible meanings at each point of the interaction into a bunch of keywords and design prompts that clearly instruct the caller to speak one of them?
The reason we need natural language is that it is not always possible to get away with keywords. Let me give some examples. If I ask you to tell me the toppings you want on your pizza, you can very well express what you want with a set of very predictable words such as mushrooms, ham, or pepperoni. Everyone, or almost everyone, knows what pizza toppings are. The "model" for pizza is so well and widely understood that we can build an effective directed dialog system that takes the caller through a set of very well understood choices: How many pies? Thin or thick crust? Which toppings? Do you want any beverages? And the same is true for other applications such as flight reservation, banking, and stock trading. A menu-based system, with well-defined choices at each point, can get you what you want in an effective way (by the way... ATMs are directed dialog machines).
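To make the pizza example concrete, here is a minimal sketch of a directed dialog driven purely by keyword spotting. Everything in it is an assumption made for illustration: the step list, the keyword sets, and the function names are hypothetical, and the list of utterances stands in for whatever a real speech recognizer would return.

```python
# Hypothetical directed dialog: each step has a slot name, a prompt,
# and a small set of keywords the caller is expected to say.
STEPS = [
    ("count", "How many pies?", {"one": 1, "two": 2, "three": 3}),
    ("crust", "Thin or thick crust?", {"thin": "thin", "thick": "thick"}),
    ("toppings", "Which toppings?",
     {"mushrooms": "mushrooms", "ham": "ham", "pepperoni": "pepperoni"}),
]


def spot_keywords(utterance, keywords):
    """Return the values of all known keywords found in the utterance, in order."""
    words = utterance.lower().replace(",", " ").split()
    return [keywords[w] for w in words if w in keywords]


def run_dialog(utterances):
    """Fill one slot per step from a list of recognized utterances.

    `utterances` is a stand-in for the recognizer output, one string per step.
    """
    order = {}
    for (slot, prompt, keywords), utterance in zip(STEPS, utterances):
        hits = spot_keywords(utterance, keywords)
        if not hits:
            # A real system would re-prompt; the sketch just reports the failure.
            raise ValueError(f"No keyword recognized for prompt: {prompt}")
        # Toppings may be several; every other slot takes the first keyword heard.
        order[slot] = hits if slot == "toppings" else hits[0]
    return order


order = run_dialog(["two please", "thin crust", "ham and mushrooms"])
```

The sketch works only because the caller already knows which words fit each prompt. Anything outside the keyword sets is simply ignored or rejected.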
But now think of a system that helps you troubleshoot your computer, and imagine a directed dialog that asks you to select the problem--or the symptom of the problem--you are experiencing. It could start by giving you a list of possible symptoms to choose from, but the list would most likely be so long that it would not be possible to speak it over the phone. The system could attempt to break the list into high-level categories, like hardware, software, and networking: please tell me if you are experiencing a hardware, software, or networking problem. Except for the computer savvy, very few people would know which category to select. I don't know what type of problem I have... that's why I called you!
In this situation, and many others, we cannot get away with a bunch of keywords. We cannot leave the burden of selecting what to choose to the caller because the caller does not know what to choose. In many situations, such as troubleshooting but also call routing in general, the caller may not share the system's mental model of the world, or may not have a mental model at all (how many people have a mental model of internet provisioning? And among those who do, how many have the correct one?). The solution consists in letting callers describe what they want in their own words, and letting the machine perform the mapping between what they say and a set of predefined categories. This is called Statistical Spoken Language Understanding, or SSLU, but people refer to it in many other ways, such as SLM (Statistical Language Model), How May I Help You (HMIHY) technology, call steering, call routing, etc. But the concept is the same: perform an automatic mapping between all possible natural language expressions and a finite set of categories.
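As a rough illustration of that mapping, here is a toy sketch that assigns a free-form description to one of a few categories. It uses a bag-of-words Naive Bayes classifier, which is my choice for the example and not necessarily what any deployed system uses. The training sentences, the category names, and the function names are all made up, and a real system would be trained on thousands of transcribed caller utterances.

```python
import math
from collections import Counter, defaultdict

# Made-up training utterances, for illustration only.
TRAINING = [
    ("my screen is black and nothing shows up", "hardware"),
    ("the laptop will not turn on", "hardware"),
    ("the program crashes when i open a file", "software"),
    ("i cannot install the update", "software"),
    ("i cannot connect to the internet", "networking"),
    ("my wireless keeps dropping", "networking"),
]


def train(examples):
    """Count words per category; returns (word_counts, class_counts, vocab)."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    vocab = set()
    for text, label in examples:
        words = text.lower().split()
        word_counts[label].update(words)
        class_counts[label] += 1
        vocab.update(words)
    return word_counts, class_counts, vocab


def classify(utterance, model):
    """Return the category with the highest Naive Bayes log-score."""
    word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        score = math.log(class_counts[label] / total)  # category prior
        n_words = sum(word_counts[label].values())
        for word in utterance.lower().split():
            if word not in vocab:
                continue  # words never seen in training carry no evidence
            # Add-one smoothing so an unseen (word, category) pair is not fatal.
            score += math.log((word_counts[label][word] + 1) / (n_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAINING)
category = classify("my wireless internet keeps dropping", model)
```

The caller never has to say "networking"; the category is inferred from the words they happened to use.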
Having said that, the design choice of using natural language in a speech recognition application is not an easy one. One has to consider a lot of factors and balance the delicate trade-offs between coverage and accuracy that are imposed by imperfect speech recognition technology. And in many situations the choice is not so obvious. But this is the subject of a future blog.
Posted by Roberto on Dec 9, 2007 10:46:21 AM

Quite interesting. However, one cannot discount the uses of the directed dialog systems. At the end of the day what will sell is a solution and not the technology behind it.
An example: at many airports you will find ATMs as well as self-service kiosks for purchasing tickets to any destination. The user enters a few letters on the keyboard and the kiosk determines the location and makes the reservation. Even though the ATM can NOT make reservations, and the kiosks can be programmed to dispense cash, the ATMs will be around for a long time to come.
Similarly, the lower costs of directed dialog systems will ensure that they will not be replaced by a superior natural-language successor!