Transcript raytheon08

Spoken Dialog Systems
Diane J. Litman
Professor, Computer Science Department
Spoken Dialog Systems
Systems that interact with users via speech
  • Provide automated telephone or microphone access to a back-end
  • Advantages: naturalness, efficiency, eyes and hands free

[Diagram: user; Speech Recognition; TTS or recording; Spoken Dialog System; back-end (DB, web, system)]
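The components named on this slide can be read as one loop per user turn: speech in, recognition, a reply computed against the back-end, speech out. A minimal sketch of that loop follows. Every name in it (`SpokenDialogSystem`, `fake_recognizer`, `lookup`, `fake_tts`, the `BACKEND` table) is a hypothetical stand-in for illustration, not part of any system described in the talk.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SpokenDialogSystem:
    """One dialog turn: audio in -> text -> reply text -> audio out."""
    recognize: Callable[[bytes], str]    # speech recognition: audio -> text
    respond: Callable[[str], str]        # dialog logic plus back-end lookup
    synthesize: Callable[[str], bytes]   # TTS or recording: text -> audio

    def turn(self, audio_in: bytes) -> bytes:
        text = self.recognize(audio_in)
        reply = self.respond(text)
        return self.synthesize(reply)


# Toy stand-ins (assumptions): "audio" is just UTF-8 text here.
def fake_recognizer(audio: bytes) -> str:
    return audio.decode("utf-8").lower()


# Made-up back-end content, standing in for a DB, web service, or system.
BACKEND = {"next bus": "The next bus leaves at noon."}


def lookup(text: str) -> str:
    return BACKEND.get(text, "Sorry, I did not understand.")


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


system = SpokenDialogSystem(fake_recognizer, lookup, fake_tts)
reply_audio = system.turn(b"Next bus")
```

Keeping the three stages as swappable callables mirrors the slide: a real recognizer or TTS engine could replace the stand-ins without changing the turn loop.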
Challenges in Spoken Dialog Systems

Automated speech recognition
  • Sphinx, Microsoft Speech, Dragon NaturallySpeaking
Natural language understanding
Dialog Management
  • How to keep the conversation going? Best strategy?
  • How to detect errors in communication?
  • How to recover from errors?
Spoken language generation
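One common way to approach the error detection and recovery questions above is to act on the recognizer's confidence score: reject very uncertain input, confirm moderately uncertain input, and accept the rest. The sketch below illustrates that idea only. The function name `choose_action`, the thresholds, and the prompt wording are assumptions, and the talk does not say which strategy its systems use.

```python
def choose_action(hypothesis: str, confidence: float,
                  reject_below: float = 0.3,
                  confirm_below: float = 0.7) -> str:
    """Pick a dialog move from an ASR hypothesis and its confidence.

    Thresholds are hypothetical; in practice they would be tuned on data.
    """
    if not hypothesis.strip() or confidence < reject_below:
        # Error detected: too uncertain to use, so ask the user again.
        return "Could you please repeat that?"
    if confidence < confirm_below:
        # Possible error: recover by confirming before acting on it.
        return f'Did you say "{hypothesis}"?'
    # Confident enough to continue the conversation.
    return f"OK: {hypothesis}"
```

For example, `choose_action("moving", 0.5)` yields a confirmation question, while the same hypothesis at confidence 0.9 is accepted.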
Application areas I have worked on

AT&T
  • Phone-based Information Access
  • Call Centers
  • Social Networking Systems
Pitt
  • (Physics) Tutoring
  • Backup for Port Authority human operators
Other Interests
  • Training, Troubleshooting, PDAs
Speech-based Computer Tutors

What are they?
  Intersection of two fields:
    • Spoken Dialog Systems
    • Intelligent Tutoring Systems
Example
  Tutor: Well, if an object has non zero constant velocity, is it moving or staying still?
  Student: Moving
  Tutor: Yep. If it’s moving, then its position is changing. So then what will happen to the packet’s horizontal displacement from the point of its release?
  Student: It will change
Intelligent Tutoring Systems

Education
  • Classroom instruction [most frequent form]
  • Human (one-on-one) tutoring [most effective form]
Computer tutors – Intelligent Tutoring Systems
  • Not as good as human tutors
  • Ways to address the performance gap
      (Spoken) dialog systems
      Affective (dialog) systems
• Back-end is Why2-Atlas system [VanLehn, Jordan, Rose et al. 2002]
• Sphinx2 speech recognition and Cepstral text-to-speech
Current Research Directions

Automatic System Optimization
  • Can a system learn to optimize behavior based on prior data?
Evaluation
  • How can we tell if we are improving a system?
      Speech vs. keyboard, TTS vs. recordings, graphics vs. transcripts
  • Can systems be tested with simulated rather than real users?
Affective Computing
  • How can user emotions be predicted in real time?
  • How can the system exploit such information?
Prosodic and Linguistic Analysis
  • Respond to both what a user says and how it is said
Human-Computer Excerpt

Tutor26: Let me ask my question in a more general way. What happens to the velocity of a body when there is no force acting on it?
Student27: dammit (ASR: it is)
Tutor28: Could you please repeat that?
Student29: same (ASR: i same)
Tutor30: Right. And what happens to the HORIZONTAL velocity of a body when there is no HORIZONTAL force acting on it?
Student31: zero (ASR: the zero)
Tutor32: Well... OK, let's consider how forces in the vertical direction affect the motion of a body in a horizontal direction in a different scenario <…omitted…>
Student33: oh gravity you already said this (ASR: beats gravity you're exert less)
Tutor34: Fine. Are there any other forces acting on the apple as it falls?
Student35: no why are you doing this again (ASR: no y and to it yes)
Tutor36: Correct. The only force acting on the apple as it falls to the earth is the earth's gravitational force. In which direction does gravitational force act?
Student37: downward you computer (ASR: downward you computer)
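The gap between what the student said and what the recognizer heard (the "ASR:" text) can be quantified with word error rate (WER): the word-level edit distance between the two strings divided by the number of words actually spoken. The sketch below is an illustration of that standard metric, not necessarily the measure used in the work presented. The function name `word_error_rate` is an assumption, and the test pairs are copied from the excerpt.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


# (what the student said, what the recognizer output), from the excerpt
wer_zero = word_error_rate("zero", "the zero")
wer_downward = word_error_rate("downward you computer",
                               "downward you computer")
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is longer than the reference, as with "dammit" recognized as "it is".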
Thank You! Questions?

Further Information

http://www.cs.pitt.edu/~litman/itspoke.html