The complexity ceiling
“One man’s ceiling is another man’s floor,” goes Paul Simon’s
song. I would add: “One tool’s ceiling is another tool’s floor.” The
complexity we can handle depends on the tools we have. We can build a
cabin using logs, a hammer, nails, a hacksaw and a ladder, but as soon as we
start adding rooms and floors these tools become quite ineffective, and we
should start considering using metal joints and fasteners, power drills, and a
crane. That’s what I call the complexity ceiling. A tool, or a set of tools,
determines that complexity ceiling, the type of complexity you can handle, the
level of complexity above which you cannot go. Trying to go above that ceiling would
be extremely hard without shifting to a new set of more sophisticated tools.

Software can be very complex. What’s a tool in the software
industry? Tools are not restricted to the category of software that we
all explicitly call “tools,” like integrated development environments (IDEs),
editors, or debuggers. Programming languages, models, and abstractions are tools
as well; they are actually the basis of the other, more “tangible” tools,
like the editors and the debuggers.

Let’s talk about spoken dialog systems; after all, that’s what
this blog is about. The main abstraction (that is, the main tool) used
for commercial dialog systems today is the call-flow. How did we come up with
the idea of the “call-flow”? Well… you can imagine that the first time someone with
a penchant for programming started building a spoken dialog system (some of us old-timers
were there…) he or she probably wrote the whole interaction in C (or maybe in
C++). So we can imagine what a spoken dialog system looked like in those
ancient times, the mid-1990s: several pages of nested “if-elseif-else” statements,
and a few inevitable “goto’s.”

After
having done that a few times (programmers are smart and lazy people), those
pioneers realized that… actually… that “if-elseif-else” thingy gets in the
way, unless you can comfortably read 25 nested conditional statements with
goto’s here and there (I know a few who actually can). They also realized
that every time they built a new system they were doing the same
things over and over: perform an action (for instance, play a prompt),
evaluate a condition (for instance, the return value from a speech recognizer)
and, depending on that, select and execute one of a number of possible actions.
Blink! It is a graph! Nodes are the actions, and arcs are the conditions!
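To make the graph idea concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions, not how any real platform stores a call-flow: the names Node, run_call_flow, play, ask and FLOW are hypothetical, and the "speech recognizer" is just a scripted list of answers.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Condition = Callable[[dict], bool]
Action = Callable[[dict], None]


@dataclass
class Node:
    """One call-flow node: an action plus outgoing arcs guarded by conditions."""
    action: Action
    arcs: List[Tuple[Condition, str]] = field(default_factory=list)


def run_call_flow(nodes: Dict[str, Node], start: str, context: dict,
                  max_steps: int = 100) -> List[str]:
    """Walk the graph: run the node's action, then follow the first arc whose
    condition holds. Stops at a node with no matching arc (or after max_steps)."""
    visited: List[str] = []
    current: Optional[str] = start
    for _ in range(max_steps):
        if current is None:
            break
        visited.append(current)
        node = nodes[current]
        node.action(context)
        current = next((target for cond, target in node.arcs if cond(context)), None)
    return visited


def play(prompt: str) -> Action:
    """Action that only plays (here: logs) a prompt."""
    def action(ctx: dict) -> None:
        ctx["log"].append(prompt)
    return action


def ask(prompt: str) -> Action:
    """Action that plays a prompt and reads the next scripted recognizer result."""
    def action(ctx: dict) -> None:
        ctx["log"].append(prompt)
        ctx["answer"] = ctx["inputs"].pop(0) if ctx["inputs"] else None
    return action


# A tiny, made-up call-flow: nodes are actions, arcs are conditions.
FLOW = {
    "ask_service": Node(ask("Is the problem with internet or TV?"), [
        (lambda c: c["answer"] == "internet", "internet_help"),
        (lambda c: c["answer"] == "tv", "tv_help"),
        (lambda c: True, "ask_service"),  # anything else: ask again
    ]),
    "internet_help": Node(play("Let's check your modem.")),
    "tv_help": Node(play("Let's check your cable box.")),
}

# Scripted caller: first a timeout (None), then "tv".
context = {"inputs": [None, "tv"], "log": [], "answer": None}
path = run_call_flow(FLOW, "ask_service", context)
print(path)
```

The whole dialog logic lives in the FLOW table; the loop that executes it never changes. The nested conditionals did not have that separation.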
That’s much better than 25 nested “if-elseif-else” statements! And guess what?
I can teach that to a VUI designer in no time! And by the way… they are already using it in
those horrible touch-tone IVRs…

That’s how the call-flow abstraction was imported into the
spoken dialog world from the touch-tone IVR world. But the abstraction-tools
did not stop here. A few years later someone else realized that even using the
call-flow abstraction-tool they were doing the same things over and over again.
Any time they had to collect a piece of information from a speech recognizer (at
that time speech recognition systems were still making mistakes, unlike today…
oh well), they always had to re-prompt in case of low confidence or timeout,
or confirm (the “I think you said…” way of talking) when the speech recognizer
wasn’t so sure about the result. So some smart and lazy call-flow programmer
thought of creating yet another abstraction: the Dialog Module. The Dialog Module
(or DM) abstraction flourished in the late 1990s, and since then call-flows
have been built with DMs. No longer did VUI designers and call-flow
programmers have to specify the logic of every single collection; they could
simply configure DMs with a number of prompts and a bunch of parameters (like
the number of retries, whether they wanted confirmation, etc.). All of a
sudden, using DMs, call-flows became less complex and more manageable, since
they did not have to deal with all the minutiae of timeouts, confirmations,
etc. Every time you needed to collect a single piece of information from a
caller, rather than re-creating the whole logic, you simply had to put a DM
there and configure it. DMs pushed the “complexity ceiling” higher by allowing
developers to build more complex applications with the same effort that simpler
applications required without DMs. That was smart!

Abstractions like DMs are not new in software, au contraire! The whole history of
software engineering is a succession of more and more sophisticated
abstractions that enabled building more and more complex software. Call-flows
and spoken dialog systems are following the same path today. Indeed, we could
not possibly build applications like troubleshooting and technical support, described by
hundreds of pages of call-flows and thousands of DMs, without importing
powerful abstraction-tools from the software world. At SpeechCycle we do use abstraction-tools
like inheritance, modularity, recursion, and other powerful concoctions
invented by software engineers, and we do have “tangible” tools that support
them and allow a software-averse community, like that of VUI designers, to use the
abstractions effectively and build the most sophisticated call-flows for the
most complex spoken dialog applications today.

But that’s not all. Creating a call-flow, with logic,
prompts and grammars is just the beginning. Testing complex applications with
thousands of DMs requires tools; managing hundreds of grammars and thousands of
prompts requires tools; building data-driven statistical grammars (SLMs) with
hundreds of semantic categories derived from hundreds of thousands of utterance
samples requires tools; integrating call-flows with customer backends like CRM,
databases, and diagnostic systems requires tools; analyzing deployed systems
requires tools; reporting system performance requires tools. And even the right
tools may fail to effectively deliver sophisticated solutions if there is not a
sophisticated process (yet another tool) that orchestrates the whole
design-development-delivery cycle.

Spoken dialog systems are complex beasts; spoken dialog
systems for technical support (what we call 3rd generation, or Speech
3.0) are even more complex beasts. It’s not “just speech technology”; there
is much, much more complexity lurking behind. Taming this complexity requires a
high degree of innovation, discipline, and experience. It is not just speech.
It is speech, and all the rest!
The tools we use determine the
complexity we can handle. But tools, in software, are not just the traditional
things-that-help-you-do-other-things, but also the abstractions you use. In
that sense, the call-flow abstraction is a tool that allows you to build dialog
systems up to a certain complexity. Dialog Modules and other abstractions
imported from traditional software engineering help push the “complexity ceiling”
of the call-flow higher and higher and enable building sophisticated 3rd
generation spoken dialog applications.
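For readers who think in code, the Dialog Module idea described above can be sketched roughly as follows. This is only a hedged illustration: the class name DialogModule, its parameters (max_retries, confirm_below, reject_below), and the threshold values are my assumptions for the example, not the interface of any actual product. The recognizer and the confirmation step are passed in as plain functions so the sketch can run without any speech technology.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

# recognize(prompt) returns (value, confidence), or None on timeout (assumed shape).
Recognizer = Callable[[str], Optional[Tuple[str, float]]]
# confirm(value) plays "I think you said ..." and returns True if the caller agrees.
Confirmer = Callable[[str], bool]


@dataclass
class DialogModule:
    """Collects one piece of information; retry and confirmation logic is built in."""
    prompt: str
    reprompt: str
    max_retries: int = 2          # hypothetical default
    confirm_below: float = 0.8    # confirm results under this confidence (assumed)
    reject_below: float = 0.4     # treat results under this as no result (assumed)

    def collect(self, recognize: Recognizer, confirm: Confirmer) -> Optional[str]:
        prompt = self.prompt
        for _ in range(self.max_retries + 1):
            result = recognize(prompt)
            prompt = self.reprompt        # every later attempt uses the re-prompt
            if result is None:            # timeout: try again
                continue
            value, confidence = result
            if confidence < self.reject_below:
                continue                  # too unsure: try again
            if confidence >= self.confirm_below or confirm(value):
                return value              # confident enough, or caller said yes
        return None                       # retries exhausted


# Scripted example: a timeout first, then a medium-confidence result.
scripted = iter([None, ("modem", 0.6)])
dm = DialogModule(prompt="Which device has the problem?",
                  reprompt="Sorry, which device was that?")
value = dm.collect(lambda prompt: next(scripted), lambda heard: True)
print(value)
```

The call-flow designer only fills in the prompts and a few parameters; the timeout, re-prompt, and confirmation logic is written once inside the module.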
Posted by Roberto on Jan 27, 2008 6:12:31 PM
