Spoken Dialogue Technology
Achievements and Challenges
Michael McTear
University of Ulster
Overview
Introduction - What is a spoken dialogue system?
Examples of spoken dialogue systems
Technical issues and challenges
Future Prospects
What is a spoken dialogue system?
A spoken dialogue system is an automated system that engages in a dialogue with a human user using spoken language as the medium of interaction.
Types of dialogue system
Two main types of spoken dialogue system:
Task-oriented: uses dialogue to accomplish a task, e.g. making a hotel booking or planning a family holiday
Non-task-oriented: engages in conversational interaction without necessarily being involved in a task that needs to be accomplished, e.g. a conversational companion for the elderly
Application Domains for SDS
Telephone-based services and transactions: call-routing, directory assistance, travel enquiries, bank balance, bank transactions, flight / hotel / car rental reservations
In-car interactive and entertainment systems
Automated trouble-shooting
Smart home applications
Health-care systems, e.g. patient monitoring
Educational, e.g. Intelligent Tutoring Systems, Foreign Language Learning
Computer games
Three generations of task-oriented spoken dialogue system
Informational – to retrieve information, e.g. flight times, football scores, …
Transactional – to assist the user to perform a transaction, e.g. book a flight, pay a bill
Problem-solving – to support the user in solving a problem, e.g. to troubleshoot a PC that is not working
Why is dialogue interesting?
Fundamental aspect of human behaviour
Model human conversational competence
Simulate human conversational behaviour
Provide tool for interacting with data, services, resources on computers
Research challenges
Applications in assistive and educational environments
Commercial opportunities
Commercial Systems
Focus on:
Business opportunities, return on investment (ROI)
Benefits for end users
Benefits for providers
Human factors: performance, usability
Tools and languages for design and maintainability
Application areas: call centre, enquiries, transactions, healthcare, …
Academic Systems
Focus on:
Technologies: speech recognition, spoken language understanding, dialogue management
AI-inspired: planning, reasoning, machine learning
Statistical vs. symbolic approaches
Advanced dialogue control, error handling, adaptivity, context representation
Example 1: Voice Menu
System: Hello and welcome …. Main menu. For customer service, say ‘service’. To enquire about an existing order, say ‘order’ …
User: Service
System: Customer service. Would you like to report a fault or enquire about an extended warranty?
User: Fault
System: Do you have a PC or a laptop?
User: Laptop
System: And the name of the manufacturer?
User: Sony
System: Thank you. Please hold while I transfer you to the Sony …
http://www.speechstorm.com/
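A voice menu like the one in Example 1 can be read as a finite-state dialogue: each prompt is a state and each recognised keyword selects the next state. The following is a minimal sketch of that reading, not the actual implementation of the system shown. The state names, the `MENU` table, the `run_menu` function, and the terminal "transfer" state are all illustrative assumptions, and the "order" and "warranty" branches are deliberately left unmodelled.

```python
# Hypothetical menu graph modelled on the Example 1 dialogue.
# State names and the "transfer" end state are assumptions for illustration.
MENU = {
    "main": {
        "prompt": "Main menu. For customer service, say 'service'. "
                  "To enquire about an existing order, say 'order'.",
        "next": {"service": "service", "order": "order"},
    },
    "service": {
        "prompt": "Customer service. Would you like to report a fault "
                  "or enquire about an extended warranty?",
        "next": {"fault": "device", "warranty": "warranty"},
    },
    "device": {
        "prompt": "Do you have a PC or a laptop?",
        "next": {"pc": "manufacturer", "laptop": "manufacturer"},
    },
    "manufacturer": {
        "prompt": "And the name of the manufacturer?",
        "next": {},  # open question: any answer is accepted
    },
}


def run_menu(user_turns):
    """Walk the menu with a list of recognised words; return the visited states."""
    state = "main"
    visited = [state]
    for word in user_turns:
        node = MENU[state]
        if not node["next"]:
            # Open question answered: hand the caller over.
            visited.append("transfer")
            break
        target = node["next"].get(word.lower())
        if target is None:
            # Out-of-vocabulary input: stay in the same state and re-prompt.
            visited.append(state)
            continue
        visited.append(target)
        if target not in MENU:
            # Branch that this sketch does not model any further.
            break
        state = target
    return visited


print(run_menu(["Service", "Fault", "Laptop", "Sony"]))
```

The design is system-directed: the user can only say what the current state expects, which keeps recognition simple at the cost of flexibility.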
Example 2: Research System (Mercury: MIT)
Open-ended prompt
How may I help you?
Disfluencies in input
August twenty-first no August twelfth
I'd like to fly from Boston to Minneapolis on Tuesday no Wednesday November 21st
Inexact response
Prompt: Can you provide the approximate departure time or airline preference
User: Yeah I'd like to fly United and I'd like to leave in the afternoon
http://groups.csail.mit.edu/sls/research/mercury.shtml
Example 2: continued

Response generation
There are more than 3 flights.
The earliest departure leaves at 1.45 pm.

Mixed initiative: user asks question
Do you have something leaving around 4.45?

Relative date reference
I’d like to return the following Tuesday
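Resolving a relative date reference such as "the following Tuesday" requires the dialogue context, here the outbound date mentioned earlier in the dialogue. The sketch below shows one possible resolution rule and is not how Mercury itself does it. The function name `following_weekday` is hypothetical, the reading "first such weekday strictly after the reference date" is an assumption, and the year 2001 is chosen only because 21 November falls on a Wednesday in that year.

```python
from datetime import date, timedelta

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def following_weekday(reference, weekday_name):
    """Return the first date strictly after `reference` that falls on `weekday_name`."""
    target = WEEKDAYS.index(weekday_name.lower())
    # 1..7 days ahead; the same weekday maps to a full week later.
    days_ahead = (target - reference.weekday() - 1) % 7 + 1
    return reference + timedelta(days=days_ahead)


outbound = date(2001, 11, 21)  # assumed year; a Wednesday
print(following_weekday(outbound, "Tuesday"))
```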
Example 3: Voice Search (GOOG-411)
GOOG-411 (or Google Voice Local Search) is Google's new 411 service.
With GOOG-411, you can find local business information completely free, directly from your phone.
You can access 1-800-GOOG-411 from any phone, anywhere, at any time.
http://www.google.com/goog411/
GOOG411: Prompts
What city and state?
What business name or category?
(Lists services) Number one, …..
Connects to requested service
GOOG411: What can you say?
At any point in the call:
To go back say "go back"
To start over say "start over" or press * (All phones)
When asked for a city and state:
Say the full names, for example "Palo Alto California"
To enter a zip code say it or enter with keypad
When asked for business name or category:
Say the full names, for example "Joe's Pizzaria" or "Pizza"
When given results:
To navigate between results say or press the listing number
To receive an SMS say "text message"
To receive a map say "map it"
To get more details say "details"
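Architecture of a spoken dialogue system
[Figure: pipeline architecture. The user's intended dialogue act a is realised as the acoustic signal xu (a --> xu). The audio xu is passed to Speech Recognition (ASR), which uses an HMM acoustic model and an N-gram language model and outputs words yu with confidence c. Spoken Language Understanding (SLU) maps yu, c to concepts ã, c. The Dialogue Manager (DM) comprises Dialogue Control and a Dialogue Context Model and is connected to the back end. Its output goes to Response Generation and then to Text to Speech Synthesis (TTS).]
Legend:
a: user dialogue act (intended)
xu: user acoustic signal
yu: speech recognition hypothesis (words)
ã: user dialogue act (interpreted)
c: confidence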
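The data flow between the components named in the architecture can be sketched as a chain of functions, one per component. This is a toy illustration only: every function here is a stub invented for the example (the "audio" is already text, the understanding rule knows a single city, and the confidence value 0.9 is made up), so it shows the interfaces between stages rather than any real recogniser or synthesiser.

```python
from dataclasses import dataclass, field


@dataclass
class Context:
    """Stand-in for the dialogue context model: a history of interpreted acts."""
    history: list = field(default_factory=list)


def asr(audio):
    # Stub recogniser: returns a word hypothesis and a made-up confidence.
    return audio.lower(), 0.9


def slu(words, confidence):
    # Stub understanding: maps words to an interpreted user dialogue act.
    if "boston" in words:
        return {"type": "inform", "city": "boston"}, confidence
    return {"type": "unknown"}, confidence


def dialogue_manager(act, confidence, context):
    # Dialogue control: record the act, then choose a system act.
    context.history.append(act)
    if act["type"] == "unknown" or confidence < 0.5:
        return {"type": "request_repeat"}
    return {"type": "confirm", "city": act["city"]}


def response_generation(system_act):
    if system_act["type"] == "confirm":
        return f"Flying from {system_act['city'].title()}, is that right?"
    return "Sorry, could you repeat that?"


def tts(text):
    # Stub synthesiser: wraps the text instead of producing a waveform.
    return f"<audio:{text}>"


def turn(audio, context):
    """One user turn through ASR -> SLU -> DM -> RG -> TTS."""
    words, c = asr(audio)
    act, c = slu(words, c)
    return tts(response_generation(dialogue_manager(act, c, context)))


ctx = Context()
print(turn("I want to fly from Boston", ctx))
```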
Component Technologies
Automatic Speech Recognition (ASR)
Spoken Language Understanding (SLU)
Response Generation (RG)
Text to speech synthesis (TTS)
Dialogue Management (DM)
Issues in ASR for Dialogue
recognising spontaneous speech in noisy environments
word accuracy does not have to be 100%
use of confidence scores in combination with other information to determine DM actions
use of additional information (ASR and parse probabilities, semantic and contextual features) to re-score recognition hypotheses
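The last two points can be illustrated together: an n-best list is first re-scored with extra information, and the confidence of the winning hypothesis then determines the dialogue manager's action. The sketch below is a simplified assumption, not a description of any particular system. The linear combination, the `weight`, and the `accept` and `reject` thresholds are invented values, and real systems typically learn such parameters from data.

```python
def rescore(nbest, context_bonus, weight=0.5):
    """Re-rank (hypothesis, asr_score) pairs using an additive contextual bonus."""
    scored = [(hyp, score + weight * context_bonus.get(hyp, 0.0))
              for hyp, score in nbest]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def choose_action(confidence, accept=0.8, reject=0.3):
    """Map a confidence score to a dialogue manager action (thresholds assumed)."""
    if confidence >= accept:
        return "accept"   # carry on without confirmation
    if confidence >= reject:
        return "confirm"  # ask the user to verify the value
    return "reject"       # discard and re-prompt


# The recogniser slightly prefers "austin", but the context favours "boston".
nbest = [("austin", 0.55), ("boston", 0.50)]
best, score = rescore(nbest, {"boston": 0.4})[0]
print(best, choose_action(score))
```

This also shows why word accuracy does not have to be 100%: a medium-confidence hypothesis can still be used, provided the system confirms it.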
Issues in SLU for Dialogue
grammars and parsers for spontaneous speech (disfluencies, errors)
robust understanding
problems with hand-crafted approaches
use of statistical / data-driven methods
combined approaches, e.g. TINA (MIT):
hand-crafted rules with trained probabilities
robust strategy – if the full sentence cannot be parsed, parse and combine fragments; if that also fails, use word spotting
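The robust strategy described above is a three-level fallback: full parse, then fragments, then word spotting. The following sketch imitates that control flow with regular expressions in place of a real grammar, so it is not TINA and should not be read as such. The city list, the patterns, and the `understand` function are hypothetical.

```python
import re

# Toy domain vocabulary and patterns; all assumed for illustration.
CITIES = {"boston", "minneapolis", "denver"}
FULL_SENTENCE = re.compile(r"^i'd like to fly from (\w+) to (\w+)$")
FRAGMENTS = {
    "origin": re.compile(r"\bfrom (\w+)"),
    "destination": re.compile(r"\bto (\w+)"),
}


def understand(utterance):
    """Return (strategy_used, frame) using full parse, fragments, or word spotting."""
    text = utterance.lower().strip()

    # 1. Try to parse the whole sentence.
    match = FULL_SENTENCE.match(text)
    if match:
        return "full", {"origin": match.group(1), "destination": match.group(2)}

    # 2. Otherwise parse fragments and combine them into one frame.
    frame = {}
    for slot, pattern in FRAGMENTS.items():
        candidates = [w for w in pattern.findall(text) if w in CITIES]
        if candidates:
            frame[slot] = candidates[0]
    if frame:
        return "fragments", frame

    # 3. As a last resort, spot known words without assigning roles.
    spotted = sorted(w for w in re.findall(r"\w+", text) if w in CITIES)
    return "spotting", {"cities": spotted}


print(understand("um from Boston uh to Denver please"))
```

Each level yields less structure than the one before: word spotting finds the cities but can no longer tell origin from destination.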
Issues in Response Generation for Dialogue
Content selection
Discourse planning
Determining what to say, selecting and ranking options
discourse relations, e.g. comparison, contrast
user-adapted information
Presentation ordering
Referring expression generation
Aggregation – grouping propositions into clauses and sentences
Use of discourse cues (e.g. firstly, finally, however, moreover, …)
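Two of these issues, content selection and aggregation, can be shown with the flight example from Example 2, where the system summarises a long result list instead of reading it out. The sketch below is an assumed rule of thumb, not the Mercury generator. The function `describe_flights` and the cut-off `max_listed` are hypothetical, and discourse cues and referring expressions are not covered.

```python
def describe_flights(departures, max_listed=3):
    """Generate one response from a list of departure times in 'HH:MM' format."""
    ordered = sorted(departures)
    if not ordered:
        return "There are no flights."
    if len(ordered) > max_listed:
        # Content selection: too many options, so summarise and give the earliest.
        return (f"There are more than {max_listed} flights. "
                f"The earliest departure leaves at {ordered[0]}.")
    if len(ordered) == 1:
        return f"There is one flight, leaving at {ordered[0]}."
    # Aggregation: fold several propositions into a single sentence.
    listed = ", ".join(ordered[:-1]) + " and " + ordered[-1]
    return f"There are {len(ordered)} flights, leaving at {listed}."


print(describe_flights(["16:45", "13:45", "15:00", "18:10"]))
```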
Issues in Dialogue Management
Dialogue Control
Representations
Scripts, frames, intelligent agents
Information State Theory
Error handling
Dialogue design
Traditional approaches
Statistical approaches
Reinforcement learning
Corpus / example based approaches
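Of the representations listed, the frame is the easiest to illustrate: the dialogue manager keeps a set of slots, fills whatever the user supplies, and asks for the first slot still missing. The sketch below is a minimal hand-crafted (traditional) controller under assumed names. The slot list and the functions `update` and `next_action` are hypothetical, and a statistical approach would instead learn the action choice from data.

```python
# Assumed slots for a flight enquiry; the order defines the prompting order.
SLOTS = ["origin", "destination", "date"]


def update(frame, interpreted):
    """Return a new frame with any recognised slot values filled in."""
    new_frame = dict(frame)
    for slot, value in interpreted.items():
        if slot in SLOTS and value is not None:
            new_frame[slot] = value
    return new_frame


def next_action(frame):
    """Ask for the first empty slot, or query the back end when the frame is full."""
    for slot in SLOTS:
        if frame.get(slot) is None:
            return ("ask", slot)
    return ("query_backend", None)


frame = {}
# The user volunteers two values at once; the frame absorbs both.
frame = update(frame, {"origin": "boston", "date": "tuesday"})
print(next_action(frame))
```

Because the next prompt depends on the frame contents rather than on a fixed script, the user can supply information in any order, which a menu-style script does not allow.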
A vision for the future
Develop systems that can interact intelligently and co-operatively across a range of environments using a range of appropriate modalities to support people in the activities of their daily lives.
Fundamental research topics
Modelling human conversational competence
Dialogue-related issues for ASR, SLU, NLG, TTS
Comparison of methods for dialogue management: rule-based vs. stochastic
Representation and use of contextual information
Integration and usage of modalities to complement and supplement speech
Incremental processing in dialogue
Areas of application
Voice search
Dialogue in vehicles
Mobile speech applications
Multimodal embodied and situated systems
Troubleshooting applications
Dialogue systems for ambient intelligence and as assistive technologies
Concluding remarks
Spoken Dialogue Technology:
embraces a range of speech and language technologies
poses lots of theoretical as well as practical challenges
is interesting for commercial developers as well as academic researchers
has a wide range of potential applications
Recommended reading
McTear, M. (2004) Spoken Dialogue Technology. Springer.
Lopez Cozar, R. & Araki, M. (2005) Spoken, multilingual and multimodal dialogue systems. John Wiley & Sons.
Aghajan, H., Augusto, J.C., Lopez Cozar, R. (2009) Human-Centric Interfaces for Ambient Intelligence. Elsevier.
Jokinen, K. & McTear, M. (2010) Spoken Dialogue Systems. Morgan & Claypool Publishers.
Wilks, Y. (ed.) (2010) Close Engagements with Artificial Companions: Key social, psychological, ethical and design issues. John Benjamins Publishing Company.
Thank you
Questions?