Spoken dialog systems
can help provide better customer care when they are connected and can interact
not only with the caller, but also with other systems with which they exchange knowledge
and perform actions.
Daniel Dennett, a professor of Philosophy at Tufts University,
one of the most eminent contemporary philosophers of the mind, and one of my
heroes, often talks about the “brain in the vat” metaphor. What is it?
Imagine—just imagine, please don’t do it at home—someone’s brain is removed
from the body and immersed in a vat of liquid that keeps it alive. Also imagine
the brain’s terminal neurons are attached to powerful computers which will
provide the exact stimuli that would produce, in the brain, a perception of the
world, just the same as a physical body would. The brain in the vat is a
powerful thought experiment commonly used by the philosophers of the mind to
discuss reality, mind, and consciousness. Here I would like to use it for
a more mundane pursuit—and I humbly apologize for that to all the philosophers of
the mind.
Think for an instant of a brain in a vat; to make the
experiment less grim, do not think of someone’s brain, but of a brain artificially
grown in a lab by a group of bio-computing engineers. A brain with no memory of
a past and no dreams of a future; a brain without any connections to the real
world, except for some wires that get into the terminations of the auditory
nerves, and some other wires that connect the articulatory nerves of the mouth
to some special nerve-to-speech apparatus (NTS: not invented yet). No touch, no smell, no taste, no vision. Not
much fun in the vat … huh? The only thing this brain knows is how to recognize a
bunch of spoken words, and which words to speak in response.
What can that poor “thing” do? Not much, except react with
words to what is spoken to it, as programmed by the bio-computing engineers.
What if things in the world around it change? Can it perceive that? Certainly
not. What if someone asks for help and the brain, after having tried to help
with all the possible instructions for which it is programmed, has to send that
person to a more expert “real” human assistant? Can it send a note that
summarizes what was done so that the human assistant can try something else?
Probably not, because the brain in the vat cannot send “notes” on a different
channel than the one its auditory and speaking nerves are connected to. It
cannot take measurements, do stuff, check things, move objects and verify that
the objects have been moved. All it can do is ask others to do all these
things, and as we know, asking others sometimes does not work. All of this ineffectiveness in dealing with
the real world is because the poor thing is “not connected”.
Non-connected brains in the vat are pretty much what we
build today when we create “non-connected” spoken dialog systems. There is very
little perceived (and actual) intelligence in a non-connected static dialog
system; all it can do is recognize speech and talk following a precise and
pre-established call-flow. But if you start connecting it to the rest of the
world, the system can start “perceiving” the status of what’s happening and it
can act accordingly in a more “intelligent” way. Speech is no longer a
repository of knowledge; it is a means, a channel among many others, used to
communicate with humans through their strange protocol called natural language. Besides
speech and natural language, there are a lot of other communications going on
through myriads of Web services that “connect” the call-flow with the rest of
the world. We know that because at SpeechCycle we build connected spoken dialog
systems, where the “spoken” part is only one part of the equation. And we
believe that connectedness is the future of intelligent systems.
Imagine you call an automated agent because your internet is
down. A connected system can get to your account, check where you live, and then
check the network to see if there are any outages in that area, or maybe
realize you haven’t paid for three months and then … well … “you have to talk to someone
in the billing department who can help you resuscitate your account from the limbo of delinquency” …
and by the way … “I can also connect you right away,” and behind the scenes send
the human billing agent a note on what they should do as soon as they pick up
the phone, so we won’t waste any time.
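
To make the scenario above a little more concrete, here is a minimal Python sketch of that kind of triage. Everything in it is an assumption made for illustration: the `Account` fields, the `Decision` record, the function name `triage_internet_down`, the set of outage ZIP codes, and the order of the checks are all hypothetical and do not describe any real SpeechCycle interface.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class Account:
    """Hypothetical account record; the field names are illustrative."""
    account_id: str
    zip_code: str
    months_unpaid: int


@dataclass
class Decision:
    prompt: str                        # what the system says to the caller
    transfer_to: Optional[str] = None  # agent queue to transfer to, if any
    screen_pop: Optional[str] = None   # note shown to the human agent


def triage_internet_down(account: Account, outage_zips: Set[str]) -> Decision:
    """Pick the next dialog step from backend data, not from speech alone."""
    # Assumed rule: three or more unpaid months sends the caller to billing.
    if account.months_unpaid >= 3:
        return Decision(
            prompt="You have to talk to someone in the billing department. "
                   "I can connect you right away.",
            transfer_to="billing",
            screen_pop=(f"Account {account.account_id}: "
                        f"{account.months_unpaid} months unpaid; "
                        "caller reports internet down."),
        )
    # Assumed rule: a known outage in the caller's area explains the problem.
    if account.zip_code in outage_zips:
        return Decision(prompt="There is a known outage in your area.")
    # Otherwise continue with troubleshooting in the call-flow.
    return Decision(prompt="Let's run some diagnostics on your modem.")
```

The point of the sketch is that the spoken prompt is only the visible output; the decision itself comes from the account and network data the system is connected to.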
Getting account and network information and sending notes to
human agents (typically called screen-pops), in other words managing information
and knowledge, are not the only things that connected dialog systems at
SpeechCycle can do. They can actually “do” stuff. They can get into your modem
and cable-box at home and reset them; they can run a series of diagnostics and
determine the exact cause of the problem you are calling about; they can
determine the level of connectivity to your home by sending a ping signal, and do
many other things. And sometimes the SpeechCycle systems can do several of
those things in parallel—something that humans are not very good at—while at
the same time they are talking to you on the phone.
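
As a rough illustration of doing several of these things in parallel, the following Python sketch runs three checks concurrently with a thread pool from the standard library. The check functions are stand-ins that only sleep and return a fixed value; their names, and the idea that each takes an account ID, are assumptions made for the example rather than a description of a real system.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def ping_home(account_id: str) -> bool:
    time.sleep(0.05)  # stand-in for the round trip to the customer's home
    return True


def check_modem_signal(account_id: str) -> bool:
    time.sleep(0.05)  # stand-in for querying the modem
    return True


def check_area_outage(account_id: str) -> bool:
    time.sleep(0.05)  # stand-in for querying the network status service
    return False


def run_diagnostics(account_id: str) -> Dict[str, bool]:
    """Run all checks at the same time and collect the results by name."""
    checks: Dict[str, Callable[[str], bool]] = {
        "ping_ok": ping_home,
        "modem_ok": check_modem_signal,
        "area_outage": check_area_outage,
    }
    with ThreadPoolExecutor(max_workers=len(checks)) as pool:
        futures = {name: pool.submit(fn, account_id)
                   for name, fn in checks.items()}
        # result() blocks until each check has finished.
        return {name: future.result() for name, future in futures.items()}
```

In a real call-flow the dialog would keep talking to the caller while these checks are pending; here the function simply waits for all of them before returning.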
So, what’s involved in building connected spoken dialog
systems? I would say that the most important thing is abstraction. Abstracting
the functionality of the various connections from the intricate details of each custom
implementation is the key to success. That means creating abstract objects that
reflect the elements common to all applications in a given vertical (for
instance accounts, network status, service, modem, cable box, premium channels,
etc.) and that can be used by the spoken dialog VUI (aka the
call-flow) regardless of the specific implementation. VUI designers and developers can focus on
the interaction without having to fiddle with the backends, knowing that when
they invoke a “reset modem” command, the call-flow will do the right thing:
it will fetch the customer account and the type of modem, get its IP address,
send a reset signal, wait for the response, and return successfully to the main
dialog thread when the operation is complete. All of this because we are all
connected.
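
One way to picture this kind of abstraction is the Python sketch below. It is only an illustration under stated assumptions: the `ModemBackend` interface, its two methods, the `InMemoryBackend` stand-in, and the `reset_modem` function are hypothetical names invented here, and the sketch leaves out the modem-type lookup and any real waiting for a response.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class ModemBackend(ABC):
    """Hypothetical interface hiding one customer's custom backend."""

    @abstractmethod
    def fetch_modem_ip(self, account_id: str) -> str:
        """Look up the account and return the IP address of its modem."""

    @abstractmethod
    def send_reset(self, ip: str) -> bool:
        """Send a reset signal and report whether the modem responded."""


class InMemoryBackend(ModemBackend):
    """Stand-in implementation; a real one would call Web services."""

    def __init__(self, modems: Dict[str, str]) -> None:
        self._modems = modems        # account ID -> modem IP address
        self.resets: List[str] = []  # IPs that were sent a reset signal

    def fetch_modem_ip(self, account_id: str) -> str:
        return self._modems[account_id]

    def send_reset(self, ip: str) -> bool:
        self.resets.append(ip)
        return True


def reset_modem(backend: ModemBackend, account_id: str) -> bool:
    """The one command the call-flow invokes, whatever the backend is."""
    ip = backend.fetch_modem_ip(account_id)
    return backend.send_reset(ip)
```

With this shape, the call-flow only ever calls something like `reset_modem(backend, "A1")`, and swapping one deployment's backend for another does not touch the dialog design.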