Spoken Dialogue Technology
Achievements and Challenges
Michael McTear
University of Ulster
Overview
Introduction - What is a spoken dialogue system?
Examples of spoken dialogue systems
Technical issues and challenges
Future Prospects
What is a spoken dialogue system?
A spoken dialogue system is an automated system that engages in a dialogue with a human user, using spoken language as the medium of interaction.
Types of dialogue system
Two main types of spoken dialogue system:
Task-oriented: uses dialogue to accomplish a task, e.g. making a hotel booking or planning a family holiday
Non-task-oriented: engages in conversational interaction without necessarily being involved in a task that needs to be accomplished, e.g. a conversational companion for the elderly
Application Domains for SDS
Telephone-based services and transactions: call routing, directory assistance, travel enquiries, bank balance, bank transactions, flight / hotel / car rental reservations
In-car interactive and entertainment systems
Automated trouble-shooting
Smart home applications
Healthcare systems, e.g. patient monitoring
Educational systems, e.g. Intelligent Tutoring Systems, foreign language learning
Computer games
Three generations of task-oriented spoken dialogue system
Informational – to retrieve information, e.g. flight times, football scores, …
Transactional – to assist the user to perform a transaction, e.g. book a flight, pay a bill
Problem-solving – to support the user in solving a problem, e.g. to troubleshoot a PC that is not working
Why is dialogue interesting?
Fundamental aspect of human behaviour
Model human conversational competence
Simulate human conversational behaviour
Provide a tool for interacting with data, services and resources on computers
Research challenges
Applications in assistive and educational environments
Commercial opportunities
Commercial Systems
Focus on
Business opportunities, return on investment (ROI)
Benefits for end users
Benefits for providers
Human factors: performance, usability
Tools and languages for design and maintainability
Application areas: call centre, enquiries, transactions, healthcare, …
Academic Systems
Focus on
Technologies: speech recognition, spoken language understanding, dialogue management
AI-inspired: planning, reasoning, machine learning
Statistical vs. symbolic approaches
Advanced dialogue control, error handling, adaptivity, context representation
Example 1: Voice Menu
System: Hello and welcome ….
Main menu. For customer service, say ‘service’.
To enquire about an existing order, say ‘order’ …
User: Service
System: Customer service. Would you like to report a fault or enquire about an extended warranty?
User: Fault
System: Do you have a PC or a laptop?
User: Laptop
System: And the name of the manufacturer?
User: Sony
System: Thank you. Please hold while I transfer you to the Sony …
http://www.speechstorm.com/
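The voice menu in Example 1 is a system-directed dialogue: each prompt accepts a small set of keywords, and the keyword chosen selects the next prompt. Below is a minimal Python sketch of this as a finite-state machine. The state names, the 'order' and 'warranty' branches, the re-prompt wording and the ("transfer", target) return value are illustrative assumptions, not the SpeechStorm implementation. Typed text stands in for recognised speech.

```python
# Hypothetical menu modelled on Example 1; "END:<target>" marks a transfer.
# A value of None for "next" means the state collects a free-form answer.
MENU = {
    "main": {
        "prompt": "Main menu. For customer service, say 'service'. "
                  "To enquire about an existing order, say 'order'.",
        "next": {"service": "service", "order": "END:orders"},
    },
    "service": {
        "prompt": "Customer service. Would you like to report a fault "
                  "or enquire about an extended warranty?",
        "next": {"fault": "device", "warranty": "END:warranty"},
    },
    "device": {
        "prompt": "Do you have a PC or a laptop?",
        "next": {"pc": "manufacturer", "laptop": "manufacturer"},
    },
    "manufacturer": {
        "prompt": "And the name of the manufacturer?",
        "next": None,
    },
}


def run_menu(user_turns):
    """Drive the menu with scripted user replies.

    Returns (outcome, answers, log), where outcome is ("transfer", target)
    or None if the replies run out before a transfer is reached.
    """
    state, answers, log = "main", {}, []
    for reply in user_turns:
        node = MENU[state]
        log.append("System: " + node["prompt"])
        log.append("User: " + reply)
        word = reply.strip().lower()
        if node["next"] is None:
            # Free-form slot: keep the reply as given and transfer.
            answers[state] = reply.strip()
            return ("transfer", answers[state]), answers, log
        if word not in node["next"]:
            # Unrecognised keyword: stay in the same state and re-prompt.
            log.append("System: Sorry, I didn't catch that.")
            continue
        answers[state] = word
        state = node["next"][word]
        if state.startswith("END:"):
            return ("transfer", state[len("END:"):]), answers, log
    return None, answers, log


outcome, answers, log = run_menu(["Service", "Fault", "Laptop", "Sony"])
print(outcome)   # ('transfer', 'Sony')
```

The design keeps the dialogue logic in data (the MENU table), which mirrors how such menus are usually authored: adding a branch means adding a state, not changing the control loop.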
Example 2: Research System (Mercury, MIT)
Open-ended prompt
How may I help you?
Disfluencies in input
August twenty-first no August twelfth
I'd like to fly from Boston to Minneapolis on Tuesday no Wednesday November 21st
Inexact response
Prompt: Can you provide the approximate departure time or airline preference?
User: Yeah I'd like to fly United and I'd like to leave in the afternoon
http://groups.csail.mit.edu/sls/research/mercury.shtml
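One simple way to handle self-corrections like "August twenty-first no August twelfth" is to treat a word such as "no" as an explicit correction marker and delete the corrected segment (the reparandum) before it. The sketch below is a toy heuristic, not Mercury's method. If the word after the marker also occurs earlier, deletion starts at the nearest earlier occurrence. Otherwise only the single word before the marker is dropped. It will misfire on a genuine "no" in mid-utterance, which is why such cues are normally combined with parsing and context.

```python
def repair(utterance, markers=("no",)):
    """Remove a self-corrected segment before an explicit correction marker.

    Heuristic (an assumption for illustration): if the word after the marker
    also appears earlier in the utterance, the reparandum starts at its
    nearest earlier occurrence; otherwise only the word before the marker
    is removed. The marker itself is always removed.
    """
    words = utterance.split()
    i = 0
    while i < len(words):
        if words[i].lower() in markers and 0 < i < len(words) - 1:
            repair_word = words[i + 1].lower()
            start = i - 1  # default: drop just the preceding word
            for j in range(i - 1, -1, -1):
                if words[j].lower() == repair_word:
                    start = j  # e.g. "August ... no August ..."
                    break
            words = words[:start] + words[i + 1:]
            i = start  # continue scanning from the repair onwards
        else:
            i += 1
    return " ".join(words)


print(repair("August twenty-first no August twelfth"))  # August twelfth
print(repair("I'd like to fly from Boston to Minneapolis on Tuesday no Wednesday November 21st"))
```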
Example 2: continued
Response generation
There are more than 3 flights.
The earliest departure leaves at 1.45 pm.
Mixed initiative: user asks question
Do you have something leaving around 4.45?
Relative date reference
I’d like to return the following Tuesday
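Resolving "the following Tuesday" requires the dialogue context, because the phrase is relative to the departure date the user has already given, not to today's date. Here is a sketch under three assumptions: "the following X" means the first X strictly after the reference date, the departure date is stored under a hypothetical `departure` key, and the year is 2007 (21 November fell on a Wednesday that year).

```python
from datetime import date, timedelta

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]  # same order as date.weekday()


def resolve_following(weekday_name, reference):
    """Return the first date strictly after `reference` falling on `weekday_name`."""
    target = WEEKDAYS.index(weekday_name.lower())
    delta = (target - reference.weekday()) % 7
    if delta == 0:
        delta = 7  # same weekday as the reference: take the one a week later
    return reference + timedelta(days=delta)


# Hypothetical dialogue context holding the already-agreed departure date.
context = {"departure": date(2007, 11, 21)}
context["return"] = resolve_following("Tuesday", context["departure"])
print(context["return"])  # 2007-11-27
```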
Example 3: Voice Search
GOOG411
GOOG-411 (or Google Voice Local Search) is Google's new 411 service.
With GOOG-411, you can find local business information completely free, directly from your phone.
You can access 1-800-GOOG-411 from any phone, anywhere, at any time.
http://www.google.com/goog411/
GOOG411: Prompts
What city and state?
What business name or category?
(Lists services) Number one, …
Connects to requested service
GOOG411: What can you say?
At any point in the call:
To go back, say "go back"
To start over, say "start over" or press * (all phones)
When asked for a city and state:
Say the full name, for example "Palo Alto California"
To enter a zip code, say it or enter it with the keypad
When asked for a business name or category:
Say the full name, for example "Joe's Pizzeria" or "Pizza"
When given results:
To navigate between results, say or press the listing number
To receive an SMS, say "text message"
To receive a map, say "map it"
To get more details, say "details"
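Commands such as "go back" and "start over" are global: they are checked before the input is treated as an answer to the current prompt. The sketch below shows one way to structure this. The step names, the results prompt and the `VoiceSearchDialogue` class are hypothetical, and no speech recognition or business search is performed.

```python
STEPS = ["city_state", "business", "results"]  # hypothetical step names
PROMPTS = {
    "city_state": "What city and state?",
    "business": "What business name or category?",
    "results": "Say a listing number, 'details', 'text message' or 'map it'.",
}


class VoiceSearchDialogue:
    """Toy controller in which 'go back' and 'start over' work at any step."""

    def __init__(self):
        self.start_over()

    def start_over(self):
        self.step = 0
        self.answers = {}

    def go_back(self):
        if self.step > 0:
            self.step -= 1
            # Forget the answer to the step being revisited.
            self.answers.pop(STEPS[self.step], None)

    def handle(self, utterance):
        """Process one user turn and return the next system prompt."""
        text = utterance.strip().lower()
        if text == "start over":          # global commands are checked first
            self.start_over()
        elif text == "go back":
            self.go_back()
        elif self.step < len(STEPS) - 1:  # answer to the current question
            self.answers[STEPS[self.step]] = utterance.strip()
            self.step += 1
        else:                             # commands at the results stage
            self.answers.setdefault("actions", []).append(text)
        return PROMPTS[STEPS[self.step]]


d = VoiceSearchDialogue()
print(d.handle("Palo Alto California"))  # What business name or category?
print(d.handle("go back"))               # What city and state?
```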
Architecture of a spoken dialogue system
Figure: the user's intended dialogue act (a) is spoken as an acoustic signal (xu). Speech Recognition (ASR), using an HMM acoustic model and an N-gram language model, turns the audio into a speech recognition hypothesis (words, yu) with a confidence score (c). Spoken Language Understanding (SLU) maps the words to concepts, giving the interpreted user dialogue act (ã, c). The Dialogue Manager (DM), with Dialogue Control and a Dialogue Context Model and linked to a back end, passes its output to Response Generation and Text to Speech Synthesis (TTS).
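The data flow in the architecture can be made concrete with typed stubs. In the sketch below every component is a placeholder: the "audio" is already text, and understanding is a keyword match. The names `Hypothesis`, `DialogueAct`, `one_turn` and the dictionary used as a back end are assumptions for illustration. The point is the interfaces. ASR passes words with a confidence score, and SLU passes an interpreted dialogue act with a confidence score. The DM then consults the context model and back end before response generation and TTS.

```python
from dataclasses import dataclass, field


@dataclass
class Hypothesis:       # yu, c: recognised words plus confidence
    words: str
    confidence: float


@dataclass
class DialogueAct:      # ã, c: interpreted user dialogue act plus confidence
    act: str
    slots: dict
    confidence: float


@dataclass
class DialogueContext:  # stands in for the Dialogue Context Model
    history: list = field(default_factory=list)


def asr(audio: str) -> Hypothesis:
    # Stub: a real recogniser would decode audio with acoustic and language models.
    return Hypothesis(words=audio.lower(), confidence=0.9)


def slu(hyp: Hypothesis) -> DialogueAct:
    # Stub: crude keyword rule; destination is whatever follows the last " to ".
    if "flight" in hyp.words:
        dest = hyp.words.rsplit(" to ", 1)[-1] if " to " in hyp.words else None
        return DialogueAct("request_flight", {"destination": dest}, hyp.confidence)
    return DialogueAct("unknown", {}, hyp.confidence)


def dialogue_manager(act: DialogueAct, ctx: DialogueContext, back_end: dict) -> dict:
    ctx.history.append(act)  # record the turn in the context model
    if act.act == "request_flight" and act.slots.get("destination"):
        flights = back_end.get(act.slots["destination"], [])
        return {"act": "inform_flights", "flights": flights}
    return {"act": "request_repeat"}


def response_generation(system_act: dict) -> str:
    if system_act["act"] == "inform_flights":
        return f"I found {len(system_act['flights'])} flights."
    return "Sorry, could you say that again?"


def tts(text: str) -> str:
    # Stub: a real synthesiser would return audio samples.
    return f"<speech>{text}</speech>"


def one_turn(audio, ctx, back_end):
    return tts(response_generation(dialogue_manager(slu(asr(audio)), ctx, back_end)))


back_end = {"minneapolis": ["flight A", "flight B"]}  # hypothetical data
ctx = DialogueContext()
print(one_turn("I need a flight to Minneapolis", ctx, back_end))
# <speech>I found 2 flights.</speech>
```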
Component Technologies
Automatic Speech Recognition (ASR)
Spoken Language Understanding (SLU)
Response Generation (RG)
Text to speech synthesis (TTS)
Dialogue Management (DM)
Issues in ASR for Dialogue
recognising spontaneous speech in noisy environments
word accuracy does not have to be 100%: recognising the words that carry the meaning is often enough
use of confidence scores in combination with other information to determine DM actions
use of additional information (ASR and parse probabilities, semantic and contextual features) to re-score recognition hypotheses
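The last two points can be combined: re-rank the N-best recognition hypotheses with an extra feature, then map the resulting confidence to a dialogue-manager action. The sketch below uses one contextual feature (does the hypothesis mention a value the application expects?). That feature, the weights and the accept/confirm thresholds are all assumptions chosen for illustration. Deployed systems learn weights and tune thresholds on data.

```python
import math

# Hypothetical thresholds; real systems tune these on data.
ACCEPT_THRESHOLD = 0.8
CONFIRM_THRESHOLD = 0.5


def rescore(nbest, expected_values, w_asr=1.0, w_ctx=2.0):
    """Re-rank an N-best list of (words, asr_log_prob) pairs.

    Adds one contextual feature (1.0 if the hypothesis contains a value the
    dialogue expects, else 0.0), combines it linearly with the ASR score and
    normalises over the list with a softmax to get a confidence per hypothesis.
    """
    combined = []
    for words, asr_logp in nbest:
        ctx = 1.0 if any(v in words.split() for v in expected_values) else 0.0
        combined.append((words, w_asr * asr_logp + w_ctx * ctx))
    top = max(score for _, score in combined)  # subtract max for numerical stability
    z = sum(math.exp(score - top) for _, score in combined)
    ranked = [(words, math.exp(score - top) / z) for words, score in combined]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


def choose_action(confidence):
    """Map the confidence of the best hypothesis to a DM action."""
    if confidence >= ACCEPT_THRESHOLD:
        return "accept"
    if confidence >= CONFIRM_THRESHOLD:
        return "confirm"  # e.g. "Did you say Boston?"
    return "reprompt"


nbest = [("fly to bostin", -1.0), ("fly to boston", -1.5)]
best_words, best_conf = rescore(nbest, {"boston", "austin"})[0]
print(best_words, choose_action(best_conf))  # fly to boston accept
```

Here the recogniser's first choice is not a valid city, and the contextual feature promotes the second hypothesis enough for it to be accepted without confirmation.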
Issues in SLU for Dialogue
grammars and parsers for spontaneous speech (disfluencies, errors)
robust understanding
problems with hand-crafted approaches
use of statistical / data-driven methods
combined approaches, e.g. TINA (MIT)
hand-crafted rules with trained probabilities
robust strategy – if the full sentence cannot be parsed, parse and combine fragments, else use word spotting
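The robust strategy in the last point can be sketched as a three-stage fallback. This is not TINA. It is a toy illustration in which a regular expression stands in for the full-sentence grammar, "from X" and "to X" are the only fragments, and the city vocabulary is hypothetical.

```python
import re

CITIES = {"boston", "minneapolis", "denver"}  # hypothetical vocabulary

# Hypothetical full-sentence grammar covering one request pattern.
FULL = re.compile(r"(?:i'd like to|i want to) fly from ([a-z]+) to ([a-z]+)")


def fragment(tokens, preposition):
    """Find the first '<preposition> <city>' fragment anywhere in the tokens."""
    for prev, word in zip(tokens, tokens[1:]):
        if prev == preposition and word in CITIES:
            return word
    return None


def understand(utterance):
    """Return (strategy, frame): full parse, else fragments, else word spotting."""
    tokens = re.findall(r"[a-z']+", utterance.lower())
    # 1. Try to parse the whole sentence.
    m = FULL.fullmatch(" ".join(tokens))
    if m and m.group(1) in CITIES and m.group(2) in CITIES:
        return "full", {"origin": m.group(1), "destination": m.group(2)}
    # 2. Parse and combine fragments found anywhere in the input.
    frame = {k: v for k, v in (("origin", fragment(tokens, "from")),
                               ("destination", fragment(tokens, "to"))) if v}
    if frame:
        return "fragments", frame
    # 3. Fall back to spotting known words.
    spotted = [t for t in tokens if t in CITIES]
    if spotted:
        return "word_spotting", {"cities": spotted}
    return "none", {}


print(understand("uh from boston um to to denver please"))
# ('fragments', {'origin': 'boston', 'destination': 'denver'})
```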
Issues in Response Generation for Dialogue
Content selection
Determining what to say, selecting and ranking options
Discourse relations, e.g. comparison, contrast
User-adapted information
Discourse planning
Presentation ordering
Referring expression generation
Aggregation – grouping propositions into clauses and sentences
Use of discourse cues (e.g. firstly, finally, however, moreover, …)
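Here is a small example of content selection, ordering and aggregation for flight responses like those in Example 2. When there are many matches, the system summarises with a count and the earliest departure rather than listing every flight. When there are few, it groups flights by airline into one clause each. The (airline, departure time) input format, the threshold of three and the wording are assumptions for illustration.

```python
from datetime import time


def fmt(t):
    """Format a time in the style of the example, e.g. 13:45 -> '1.45 pm'."""
    hour = t.hour % 12 or 12
    suffix = "am" if t.hour < 12 else "pm"
    return f"{hour}.{t.minute:02d} {suffix}"


def describe_flights(flights, max_listed=3):
    """Select, order and aggregate content about (airline, departure) pairs."""
    if not flights:
        return "I couldn't find any flights."
    ordered = sorted(flights, key=lambda f: f[1])  # presentation ordering
    if len(ordered) > max_listed:
        # Content selection: summarise rather than list every option.
        return (f"There are {len(ordered)} flights. "
                f"The earliest departure leaves at {fmt(ordered[0][1])}.")
    # Aggregation: one clause per airline, airlines in order of first departure.
    by_airline = {}
    for airline, departure in ordered:
        by_airline.setdefault(airline, []).append(fmt(departure))
    clauses = []
    for airline, times in by_airline.items():
        if len(times) == 1:
            clauses.append(f"{airline} has a flight at {times[0]}")
        else:
            clauses.append(f"{airline} has flights at "
                           f"{', '.join(times[:-1])} and {times[-1]}")
    return "; ".join(clauses) + "."


flights = [("United", time(16, 45)), ("United", time(13, 45)), ("Delta", time(15, 0))]
print(describe_flights(flights))
# United has flights at 1.45 pm and 4.45 pm; Delta has a flight at 3.00 pm.
```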
Issues in Dialogue Management
Dialogue Control
Representations
Scripts, frames, intelligent agents
Information State Theory
Error handling
Dialogue design
Traditional approaches
Statistical approaches
Reinforcement learning
Corpus-based / example-based approaches
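A frame is one of the simplest dialogue control representations: the system keeps a set of slots and decides its next move from which slots are filled and how reliably. The sketch below combines this with basic error handling, confirming low-confidence values before using them. The slot names, the 0.6 threshold and the (value, confidence, confirmed) tuple layout are assumptions, not taken from any particular system.

```python
REQUIRED = ["origin", "destination", "date"]  # hypothetical slots
CONFIRM_BELOW = 0.6                           # hypothetical threshold


def next_action(frame):
    """Choose the next system move for a frame-based (form-filling) dialogue.

    frame maps slot name -> (value, confidence, confirmed); unfilled slots
    are simply absent.
    """
    # Error handling first: verify any shaky value before relying on it.
    for slot in REQUIRED:
        if slot in frame:
            value, conf, confirmed = frame[slot]
            if conf < CONFIRM_BELOW and not confirmed:
                return ("confirm", slot, value)
    # Then ask for the first missing slot.
    for slot in REQUIRED:
        if slot not in frame:
            return ("ask", slot)
    # Frame complete: query the back end with the slot values.
    return ("query_back_end", {s: frame[s][0] for s in REQUIRED})


def apply_confirmation(frame, slot, user_said_yes):
    """Update the frame after the user answers a confirmation question."""
    value, conf, _ = frame[slot]
    if user_said_yes:
        frame[slot] = (value, conf, True)
    else:
        del frame[slot]  # discard the value so the slot is asked for again


frame = {"origin": ("boston", 0.9, False), "destination": ("austin", 0.4, False)}
print(next_action(frame))  # ('confirm', 'destination', 'austin')
```

Statistical approaches such as reinforcement learning replace hand-written rules like these with a policy learned from data, but often keep a similar slot-based state.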
A vision for the future
Develop systems that can interact intelligently and co-operatively across a range of environments, using a range of appropriate modalities, to support people in the activities of their daily lives.
Fundamental research topics
Modelling human conversational competence
Dialogue-related issues for ASR, SLU, NLG, TTS
Comparison of methods for dialogue management: rule-based v stochastic
Representation and use of contextual information
Integration and usage of modalities to complement and supplement speech
Incremental processing in dialogue
Areas of application
Voice search
Dialogue in vehicles
Mobile speech applications
Multimodal embodied and situated systems
Troubleshooting applications
Dialogue systems for ambient intelligence and as assistive technologies
Concluding remarks
Spoken Dialogue Technology
embraces a range of speech and language technologies
poses many theoretical as well as practical challenges
is of interest to commercial developers as well as academic researchers
has a wide range of potential applications
Recommended reading
McTear, M. (2004) Spoken Dialogue Technology. Springer.
López-Cózar, R. & Araki, M. (2005) Spoken, Multilingual and Multimodal Dialogue Systems. John Wiley & Sons.
Aghajan, H., Augusto, J.C. & López-Cózar, R. (2009) Human-Centric Interfaces for Ambient Intelligence. Elsevier.
Jokinen, K. & McTear, M. (2010) Spoken Dialogue Systems. Morgan & Claypool Publishers.
Wilks, Y. (ed.) (2010) Close Engagements with Artificial Companions: Key Social, Psychological, Ethical and Design Issues. John Benjamins Publishing Company.
Thank you
Questions?