Blog: Dialing for Dialog


01/27/2008

The complexity ceiling

The tools we use determine the complexity we can handle. But tools, in software, are not just the traditional things-that-help-you-do-other-things, but also the abstractions you use. In that sense, the call-flow abstraction is a tool that allows you to build dialog systems up to a certain complexity. Dialog Modules and other abstractions imported from traditional software engineering help push the “complexity ceiling” of the call-flow higher and higher and enable building sophisticated 3rd generation spoken dialog applications.

“One man’s ceiling is another man’s floor” goes Paul Simon’s song. I would also say “One tool’s ceiling is another tool’s floor”. All the complexity that we can handle depends on the tools we have. We can build a cabin using logs, a hammer, nails, a hacksaw and a ladder, but as soon as we start adding rooms and floors these tools become quite ineffective, and we should start considering metal joints and fasteners, power drills, and a crane. That’s what I call the complexity ceiling. A tool, or a set of tools, determines that complexity ceiling: the type of complexity you can handle, the level of complexity above which you cannot go. Trying to go above that ceiling would be extremely hard without shifting to a new set of more sophisticated tools.

Software can be very complex. What’s a tool in the software industry? Tools are not restricted to the category of software that we all explicitly call “tools”, like integrated development environments (IDEs), editors, or debuggers. Programming languages, models, and abstractions are tools as well; they are actually at the basis of the other, more “tangible” tools, like the editors and the debuggers.

Let’s talk about spoken dialog systems; after all, that’s what this blog is about. The main abstraction, or tool, used for commercial dialog systems today is the call-flow. How did we come up with the idea of the “call-flow”? Well… you can imagine that the first time someone with a penchant for programming started building a spoken dialog system (some of us old-timers were there…), he or she probably wrote the whole interaction in C (or maybe in C++). So we can imagine what a spoken dialog system looked like in those ancient times, the mid-1990s: several pages of nested “if-elseif-else” statements, and a few inevitable “goto’s.” After having done that a few times, and since programmers are smart and lazy people, those pioneers realized that… actually… that “if-elseif-else” thingy gets in the way, unless you can comfortably read 25 nested conditional statements with goto’s here and there (I know a few who actually can). They also realized that every time they built a new system they were doing the same things over and over: perform an action (for instance, play a prompt), evaluate a condition (for instance, the return value from a speech recognizer) and, depending on that, select and execute one of a number of possible actions. Blink! It is a graph! Nodes are the actions, and arcs are the conditions! That’s much better than 25 nested “if-elseif-else” statements! And guess what? I can teach that to a VUI designer in no time! And by the way… they are already using it in those horrible touch-tone IVRs…
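
To make the “it is a graph” idea concrete, here is a minimal sketch in Python of a call-flow as nodes (actions) connected by arcs (conditions). Everything in it is hypothetical and chosen for illustration: the `Node` class, the `run_call_flow` function, the node names, and the result labels do not come from any real IVR platform, and the “speech recognizer” is just a scripted list of answers.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Node:
    """One step of a call-flow: an action plus its outgoing arcs (hypothetical)."""
    name: str
    action: Callable[[], str]  # performs the step and returns a result label
    arcs: Dict[str, str] = field(default_factory=dict)  # result label -> next node


def run_call_flow(nodes: Dict[str, Node], start: str, max_steps: int = 50) -> List[str]:
    """Walk the graph from `start`, following the arc that matches each result."""
    visited: List[str] = []
    current: Optional[str] = start
    for _ in range(max_steps):  # guard against call-flows that loop forever
        if current is None:
            break
        node = nodes[current]
        visited.append(node.name)
        result = node.action()
        current = node.arcs.get(result)  # no matching arc ends the call
    return visited


# Scripted stand-in for a recognizer: first a no-match, then "billing".
answers = iter(["no_match", "billing"])

nodes = {
    "ask_topic": Node(
        "ask_topic",
        lambda: next(answers),
        {"billing": "billing_menu", "no_match": "ask_topic"},
    ),
    "billing_menu": Node("billing_menu", lambda: "done", {"done": "goodbye"}),
    "goodbye": Node("goodbye", lambda: "end"),
}

path = run_call_flow(nodes, "ask_topic")
print(path)
```

The nested conditionals are gone: the logic lives in the `arcs` tables, which is roughly what a VUI designer draws as boxes and arrows.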

That’s how the call-flow abstraction was imported into the spoken dialog world from the touch-tone IVR world. But the abstraction-tools did not stop there. A few years later someone else realized that, even using the call-flow abstraction-tool, they were doing the same things over and over again. Any time they had to collect a piece of information from a speech recognizer (at that time speech recognition systems were still making mistakes, unlike today… oh well…) they always had to re-prompt in case of low confidence or timeout, or confirm (the “I think you said…” way of talking) when the speech recognizer wasn’t so sure about the result. So some smart and lazy call-flow programmer thought of creating yet another abstraction: the Dialog Module. The Dialog Module (or DM) abstraction flourished in the late 1990s, and since then call-flows have been built with DMs. No longer did VUI designers and call-flow programmers have to specify the logic of every single collection; they could simply configure DMs with a number of prompts and a bunch of parameters (like the number of retries, whether they wanted confirmation, etc.). All of a sudden, using DMs, call-flows became less complex and more manageable, since they did not have to deal with all those minutiae of timeouts, confirmations, etc. Every time you needed to collect a single piece of information from a caller, rather than re-creating the whole logic, you simply had to put a DM there and configure it. DMs pushed the “complexity ceiling” higher by allowing developers to build more complex applications with the same effort that simpler applications required without DMs. That was smart!
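
As an illustration of what a DM packages up, here is a rough Python sketch of a single-slot collection with re-prompting and confirmation. It is not the API of any actual product: the `DialogModule` class, its parameter names, and the two confidence thresholds are assumptions made for this example, and the recognizer is again a scripted fake.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# (recognized value, or None on timeout; confidence between 0 and 1)
Recognition = Tuple[Optional[str], float]


@dataclass
class DialogModule:
    """Hypothetical DM: collects one piece of information from the caller."""
    prompt: str
    retry_prompt: str
    max_retries: int = 2
    confirm: bool = True
    accept_threshold: float = 0.8  # assumed: at or above, accept without confirming
    reject_threshold: float = 0.4  # assumed: below, treat as not understood

    def collect(
        self,
        recognize: Callable[[str], Recognition],
        ask_yes_no: Callable[[str], bool],
    ) -> Optional[str]:
        prompt = self.prompt
        for _ in range(self.max_retries + 1):
            value, confidence = recognize(prompt)
            prompt = self.retry_prompt  # every later attempt uses the retry prompt
            if value is None or confidence < self.reject_threshold:
                continue  # timeout or low confidence: re-prompt
            if confidence >= self.accept_threshold or not self.confirm:
                return value
            if ask_yes_no(f"I think you said {value}. Is that right?"):
                return value
        return None  # retries exhausted; the call-flow decides what happens next


# Scripted recognizer: a timeout, a low-confidence result, then a usable one.
scripted = iter([(None, 0.0), ("modem", 0.3), ("router", 0.6)])
heard: List[str] = []


def fake_recognize(prompt: str) -> Recognition:
    heard.append(prompt)
    return next(scripted)


dm = DialogModule("Which device is having trouble?", "Sorry, which device?")
result = dm.collect(fake_recognize, lambda question: True)
print(result, heard)
```

The call-flow that uses this DM only sees the final value (or `None`); the retries and the confirmation stay hidden behind a handful of configuration parameters, which is the point of the abstraction.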

Abstractions like DMs are not new in software, au contraire! The whole history of software engineering is a succession of more and more sophisticated abstractions that enabled building more and more complex software. Call-flows and spoken dialog systems are following the same path today. Indeed, we could not possibly build applications like troubleshooting and technical support, described by hundreds of pages of call-flows and thousands of DMs, without importing powerful abstraction-tools from the software world. At SpeechCycle we do use abstraction tools like inheritance, modularity, recursion and other powerful concoctions invented by software engineers, and we do have “tangible” tools that support them and allow a software-averse community, like that of VUI designers, to use the abstractions effectively and build the most sophisticated call-flows for the most complex spoken dialog applications today.

But that’s not all. Creating a call-flow, with logic, prompts and grammars, is just the beginning. Testing complex applications with thousands of DMs requires tools; managing hundreds of grammars and thousands of prompts requires tools; building data-driven statistical grammars (SLMs) with hundreds of semantic categories derived from hundreds of thousands of utterance samples requires tools; integrating call-flows with customer backends like CRM, databases, and diagnostic systems requires tools; analyzing deployed systems requires tools; reporting system performance requires tools. And even the right tools may fail to effectively deliver sophisticated solutions if there is not a sophisticated process (yet another tool) that orchestrates the whole design-development-delivery cycle.

Spoken dialog systems are complex beasts; spoken dialog systems for technical support—what we call 3rd generation, or Speech 3.0—are even more complex beasts. It’s not “just speech technology”, but there is much, much more complexity lurking behind. Taming this complexity requires a high degree of innovation, discipline, and experience. It is not just speech. It is speech, and all the rest!

Posted by Roberto on Jan 27, 2008 6:12:31 PM
