L2F – Spoken Language Systems Laboratory
L2F - Spoken Language Systems Lab
Genesis
• Created in January 2001
– As a result of a major restructuring of several groups within and outside
INESC ID Lisbon
• Goal
– Bring together several research groups to add relevant contributions to the
area of computational processing of spoken language for European
Portuguese
– United by the problem we want to solve, not by the technology we share
• People
– About 10 PhD researchers, 10 PhD students, 3 MSc students, 12
undergraduate students
– Formal cooperation with CLUL (Center of Linguistics of the Univ. of Lisbon)
Mission
Creating technology to bridge the gap
between natural spoken language and the
underlying semantic information
Lines of Activity
• Priority
– Semantic processing of multimedia contents
– Spoken dialogue systems platforms
• Emerging
– Computer-enhanced human-to-human communication
• Automatic transcription of meetings
• Speech-to-speech translation
• Continuing
– Processing other varieties of Portuguese
– E-inclusion
– E-learning
Core technologies
• Speech Coding
• Speech Synthesis
» DIXI+
• Speech Recognition
» AUDIMUS
• Language / Accent Identification
• Natural Language Processing
• Dialogue Management
DIXI+
• Continuation of the DIXI project (1991)
• Synthesis by concatenation, instead of
by rule
• More elaborate prosodic models
• Developed within the Festival
framework
• Focused on alternative and
augmentative communication
applications
• Currently under development
AUDIMUS
• Continuous speech recognition system for the European
Portuguese language
• Hybrid system combining multilayer perceptrons and
hidden Markov models (MLP/HMM)
• Vocabularies of 5K, 64K, ... words, depending on the task
• Stochastic language model of the N-gram type
• Speaker-independent or speaker-adapted system,
depending on the task
• First application: radiology report
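The N-gram language model bullet can be illustrated with a minimal bigram sketch. This is not AUDIMUS code: the toy corpus, the add-one smoothing, and the helper names train_bigram and bigram_prob are all assumptions made for illustration.

```python
from collections import Counter

def train_bigram(sentences):
    """Count unigrams and bigrams over tokenized sentences (hypothetical helper)."""
    unigrams, bigrams = Counter(), Counter()
    for words in sentences:
        tokens = ["<s>"] + words + ["</s>"]  # sentence boundary markers
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def bigram_prob(prev, word, unigrams, bigrams):
    """P(word | prev) with add-one smoothing (an assumed smoothing choice)."""
    vocab_size = len(unigrams)
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)

# Toy corpus, invented for this example.
corpus = [["play", "cd", "one"], ["play", "cd", "two"], ["turn", "on"]]
unigrams, bigrams = train_bigram(corpus)
p_seen = bigram_prob("play", "cd", unigrams, bigrams)    # bigram seen in the corpus
p_unseen = bigram_prob("play", "on", unigrams, bigrams)  # bigram never seen
```

A seen word pair receives a higher probability than an unseen one, which is the property a recognizer relies on when ranking competing word sequences.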
AUDIMUS results on BN Speech Recognition
Word Error Rate (WER %)
57K
SYSTEM           F0      All
Gender Indep.    30.6    55.4
BN ( 5h)         23.2    42.1
BN (22h)         16.3    32.8
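For readers unfamiliar with the metric in the table, WER is usually computed as the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The sketch below shows that standard definition; the function name and the example strings are assumptions, and it is not the scoring tool behind the figures above.

```python
def word_error_rate(reference, hypothesis):
    """Return WER in percent using word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the ref prefix seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return 100.0 * prev[-1] / len(ref)

# Invented example: one deleted word ("and") and one substitution ("one" -> "won").
wer = word_error_rate("turn on and play cd one", "turn on play cd won")
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.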
Semantic processing of multimedia contents
• ALERT
Selective Dissemination of Multimedia Information
• IPSOM
Indexing, Integration and Sound Retrieval in Multimedia Documents
– Improved access to spoken books by the visually impaired (indexing
words, sentences, topics)
– Development of multimedia interfaces for accessing and retrieving
spoken books (didactic applications, etc.)
Media watch: processing flow (diagram)
• Multimedia document
– If video contained: image / video processing, video based segmentation
– If audio contained: audio based segmentation, speech processing, transcription
– If text contained
• Multimedia document database
• Automatic topic detection: keywords, label database
• Match topics found against user profiles
• Alert specific users
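The last two steps of the flow, matching detected topics against user profiles and alerting the matching users, can be sketched as a set intersection. The profile format, the topic labels, and the function name match_alerts are hypothetical; the slides do not describe how ALERT stores profiles.

```python
def match_alerts(story_topics, user_profiles):
    """Return {user: sorted matching topics} for users with at least one match."""
    alerts = {}
    for user, interests in user_profiles.items():
        hits = set(story_topics) & set(interests)
        if hits:  # only users whose profile overlaps the story are alerted
            alerts[user] = sorted(hits)
    return alerts

# Invented profiles and topics, for illustration only.
profiles = {"user_a": {"sports", "economy"}, "user_b": {"weather"}}
alerts = match_alerts({"economy", "politics"}, profiles)
```

Here only user_a is alerted, since "economy" is the single topic shared between the story and any profile.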
Spoken dialogue systems
Goal: to develop spoken dialogue systems and intelligent
multimodal interfaces:
• a phone-based information system;
• an "intelligent" demo room controllable by voice;
• a storyteller: a fully embodied conversational agent
for reading stories to children.
118 - Telephone number synthesizer
The requested number is xx-xxx-xx-xx, repeat, xx-xxx-xx-xx
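The reply pattern on this slide groups nine digits as xx-xxx-xx-xx and repeats the number once. A minimal sketch of building that prompt text follows; the function names and the sample digits are assumptions, and the real system's text generation is not described in the slides.

```python
def format_number(digits):
    """Group a 9-digit string as xx-xxx-xx-xx, the pattern shown on the slide."""
    if len(digits) != 9 or not digits.isdigit():
        raise ValueError("expected exactly nine digits")
    groups = [digits[0:2], digits[2:5], digits[5:7], digits[7:9]]
    return "-".join(groups)

def build_reply(digits):
    """Build the prompt text that would be handed to the synthesizer."""
    spoken = format_number(digits)
    return f"The requested number is {spoken}, repeat, {spoken}"

# Placeholder digits, not a real directory entry.
reply = build_reply("123456789")
```

Grouping the digits before synthesis gives the synthesizer natural pause points, which is the usual reason for reading numbers in chunks.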
Speech based interface for a dialogue system
Components (from the diagram): Telephone, AUDIMUS (speech to text), Dialogue, SQL, Database, Updater, Internet, DIXI+ (text to speech)
• Telephone speech
• Speech recognition (AUDIMUS) of natural language queries
• Query understanding and information retrieval from the database
• Generation of a natural language reply
• Text-to-speech synthesis (DIXI+) adapted to a limited domain
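The middle of this pipeline (recognized text in, SQL lookup, reply text out) can be sketched with an in-memory SQLite database. Everything specific here is an assumption: the forecast table, its contents, the keyword-spotting "understanding" step, and the function names. Recognition and synthesis are left out; the sketch starts from text and ends with text.

```python
import sqlite3

def build_database():
    """Create a toy in-memory table (hypothetical schema and data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE forecast (city TEXT PRIMARY KEY, summary TEXT)")
    conn.executemany("INSERT INTO forecast VALUES (?, ?)",
                     [("lisbon", "sunny"), ("porto", "rainy")])
    return conn

def understand(text, known_cities):
    """Toy query understanding: return the first known city in the text."""
    for word in text.lower().split():
        if word in known_cities:
            return word
    return None

def answer(text, conn):
    """Map recognized text to a SQL lookup and generate a reply sentence."""
    cities = {row[0] for row in conn.execute("SELECT city FROM forecast")}
    city = understand(text, cities)
    if city is None:
        return "Sorry, I did not understand the request."
    row = conn.execute("SELECT summary FROM forecast WHERE city = ?",
                       (city,)).fetchone()
    return f"The forecast for {city.title()} is {row[0]}."

conn = build_database()
reply = answer("what is the weather in Lisbon", conn)
```

The parameterized query (the `?` placeholder) keeps recognized text out of the SQL string itself, which matters when the input comes from an error-prone recognizer.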
Speech based control system for a Hi-Fi
• The user spoke: "Hi-Fi, turn on and play CD one"
• Speech was recognized: TURN ON, PLAY CD ONE
• The computer interprets the command and sends the IR command
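The interpretation step, from recognized text to a sequence of IR commands, can be sketched as a table lookup. The IR code values, the command table, and the function name interpret are hypothetical; the slides do not give the actual command set or codes.

```python
# Hypothetical IR codes; real values depend on the Hi-Fi's remote protocol.
IR_CODES = {"TURN ON": 0x01, "PLAY CD ONE": 0x11}

def interpret(recognized_text):
    """Split a recognized utterance into commands and look up their IR codes."""
    words = recognized_text.upper().replace(",", " ").split()
    if words and words[0] == "HI-FI":  # drop the device name used as an address
        words = words[1:]
    commands = [c.strip() for c in " ".join(words).split(" AND ")]
    unknown = [c for c in commands if c not in IR_CODES]
    if unknown:
        raise ValueError(f"unknown command(s): {unknown}")
    return [IR_CODES[c] for c in commands]

codes = interpret("Hi-Fi, turn on and play CD one")
```

Rejecting unknown commands outright, rather than guessing, is a deliberate choice in this sketch: a misrecognized utterance should not trigger an arbitrary device action.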
Processing other varieties of Portuguese
• Research Topics:
– Multi-accent corpora
– Multi-accent robust speaker independent ASR
– Language and accent ID
– Computer Aided Language Learning (CALL) / e-learning
E-inclusion: Eugénio - the word genius
• Word prediction tool for people with motor
impairments
• Cooperation with cerebral palsy centers
• Public domain tool
• New version released in 2003
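Word prediction in its simplest form proposes the most frequent known words that start with what the user has typed so far. The sketch below shows that idea only; the training text and the function names are assumptions, and Eugénio's actual prediction method is not described in the slides.

```python
from collections import Counter

def build_model(text):
    """Count word frequencies in a training text (hypothetical training data)."""
    return Counter(text.lower().split())

def predict(prefix, model, k=3):
    """Return up to k words starting with prefix, most frequent first."""
    prefix = prefix.lower()
    candidates = [(w, c) for w, c in model.items() if w.startswith(prefix)]
    # Sort by descending frequency, then alphabetically for stable ties.
    candidates.sort(key=lambda wc: (-wc[1], wc[0]))
    return [w for w, _ in candidates[:k]]

model = build_model("the word genius predicts the word the user wants to write")
suggestions = predict("w", model)
```

Each accepted suggestion saves the keystrokes for the rest of the word, which is where the benefit for users with motor impairments comes from.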
Synergies with other INESC ID Groups
• Agents
• Multimodal HCI
• Info. Retrieval
• Speech Mining
• Spoken Language Systems
• Source separation
• Electronics
• Signal Processing
• Computer Graphics
More information at:
www.l2f.inesc-id.pt
[email protected]