Speech Interface to Virtual Reality Applications - 廖峻鋒(Chun
Speech Interface to Virtual
Reality Applications
Authors
Wauchope, K., S. Everett, D. Tate, T. Maney
M. Cernak, A. Sannier
Reporter
Chun-Feng Liao
References
This report discusses two implementations of a speech interface
to virtual reality applications.
Cernak, M., A. Sannier, "Command Speech Interface to Virtual Reality
Applications," Technical Report, Virtual Reality
Applications Center, Iowa State University of Science and
Technology, June 2002.
Wauchope, K., S. Everett, D. Tate, T. Maney, "Speech-Interactive
Virtual Environments for Ship Familiarization," 2nd
International EuroConference on Computer and IT Applications
in the Maritime Industries (COMPIT '03), Hamburg, Germany,
May 14-17, 2003, pp. 70-83.
Agenda
Introduction
Paper I
Paper II
Conclusion
System Design Discussion
Introduction
Both papers are recent (2002 and 2003).
Both papers address the technical details of
speech-VR integration.
The second paper takes a more modern approach.
Both use a similar architecture (which is
also similar to ours).
Example: a VRML + Java Speech API platform was chosen,
several difficult problems such as Java security
constraints were encountered, and a "browser as an application" design
had to be used instead of "browser as an applet".
Paper I
Cernak, M., A. Sannier,
"Command Speech Interface to Virtual
Reality Applications," Technical Report, Virtual Reality
Applications Center, Iowa State
University of Science and Technology,
June 2002.
Purposes of this paper
Describes an approach to controlling VR
applications through a multimodal
command speech interface (CSI) based
on dialog modeling.
Used to improve the usability of
VRAC's C6.
VRAC: Virtual Reality Applications Center
C6 is a virtual reality system developed by VRAC.
Multimodal Interaction
Command addressing is used to trigger
the system to start recording the user's voice for
recognition.
U: MoleBio
S: Yes
U: (targets atom 512 with the mouse)
U: Go there!
S: OK (go to atom number 512).
U: User, S: System
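The dialog above can be sketched as a small state machine. This is a minimal illustration, not the paper's implementation: the class name, the fallback reply, and the rule that the system stops listening after one command are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CommandAddressingDialog:
    """Toy dialog manager: ignores speech until the application is addressed by name."""

    app_name: str = "molebio"           # keyword taken from the example dialog
    listening: bool = False             # True once the user has addressed the system
    pointed_atom: Optional[int] = None  # last object targeted with the mouse

    def on_pointer(self, atom_id: int) -> None:
        # Pointing only updates state; it never produces a spoken reply.
        self.pointed_atom = atom_id

    def on_speech(self, utterance: str) -> Optional[str]:
        text = utterance.strip().lower().rstrip("!").strip()
        if not self.listening:
            if text == self.app_name:
                self.listening = True
                return "Yes"
            return None  # system was not addressed, so the utterance is ignored
        # Assumption: one command per address, then go back to waiting.
        self.listening = False
        if text == "go there" and self.pointed_atom is not None:
            return f"OK (goto atom number {self.pointed_atom})"
        return "Sorry, I did not understand."  # hypothetical fallback reply
```

Feeding it the slide's sequence ("MoleBio", a pointer event on atom 512, then "Go There !") yields "Yes" followed by the goto confirmation, while the same command spoken without addressing the system first is ignored.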
System Architecture
Dialog Management and Speech facilities
VR System
System Architecture
VR : VRAC’s C6
TTS : Festival
SR : CSLU Toolkit
Platform : Windows OS on PII 400
Three Main Components(1)
Speech Synthesis (TTS) : Festival .
Three Main Components(2)
CSLU Toolkit: dialog modeling,
speech recognition, and natural
language processing.
The CSLU Toolkit is implemented in C and Tcl/Tk and was
developed at OGI (Oregon Graduate
Institute).
CSLU: Center for Spoken Language Understanding
Three Main Components(3)
Communication bridge to the VR
application.
Integrates CSLU (speech) with C6 (VR).
How to Integrate CSLU and C6
Initial attempt: CORBA
• C6 supports CORBA.
• Tried to use "Combat", a Tcl extension, as the
CORBA client, but this failed.
• Tried to use "Tcl Blend":
- Tcl -> Java -> CORBA -> C6 (efficiency problems)
• Result: a TCP socket is used instead.
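A plain TCP bridge of this kind can be sketched as follows. The slides do not give the wire format, so the newline-terminated text command ("GOTO ATOM 512") and the "OK" reply are assumptions for illustration; the server function merely stands in for the VR application.

```python
import socket
import threading


def serve_once(server: socket.socket, received: list) -> None:
    """Stand-in for the VR application: accept one connection, read one command line."""
    conn, _ = server.accept()
    with conn, conn.makefile("r", encoding="utf-8") as reader:
        received.append(reader.readline().strip())
        conn.sendall(b"OK\n")  # hypothetical acknowledgement


def send_command(host: str, port: int, command: str) -> str:
    """Speech side: send one newline-terminated command and return the reply."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall((command + "\n").encode("utf-8"))
        with sock.makefile("r", encoding="utf-8") as reader:
            return reader.readline().strip()


def demo() -> tuple:
    """Run both ends in one process over the loopback interface."""
    received: list = []
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
        server.listen(1)
        port = server.getsockname()[1]
        worker = threading.Thread(target=serve_once, args=(server, received))
        worker.start()
        reply = send_command("127.0.0.1", port, "GOTO ATOM 512")
        worker.join()
    return received, reply
```

A line-based text protocol keeps the two sides independent of each other's implementation language, which is the practical appeal of a socket over a CORBA binding for a Tcl-based speech toolkit.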
Natural Language Processing
Instead of using standard JSGF, the
authors defined a custom grammar and
wrote a dedicated parser to evaluate it.
The custom grammar is very similar to JSGF.
We will not discuss the custom
grammar in detail here.
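For readers unfamiliar with rule-based command grammars, the general idea can be sketched as below. This does not reproduce the authors' custom grammar or its parser; the rule names and phrases are hypothetical, and regular expressions are swapped in for a real grammar parser.

```python
import re
from typing import Optional

# JSGF-style rule that this sketch mimics (illustrative only):
#   public <command> = (go | move) there | select atom <number>;
COMMAND_PATTERNS = {
    "goto_pointed": re.compile(r"^(?:go|move) there$"),
    "select_atom": re.compile(r"^select atom (?P<number>\d+)$"),
}


def parse_command(utterance: str) -> Optional[dict]:
    """Map a recognized utterance to an action, or None if no rule matches."""
    text = " ".join(utterance.lower().split())  # normalize case and whitespace
    for action, pattern in COMMAND_PATTERNS.items():
        match = pattern.match(text)
        if match:
            return {"action": action, **match.groupdict()}
    return None
```

Restricting recognition to a small set of command rules like this is what distinguishes a command speech interface from free dictation.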
CSI Test Environment
A RAD (GUI) tool that helps developers
quickly build the dialog flow.
Paper I Conclusion
The major advantage of this system is quick
deployment.
The problematic area is speech
recognition accuracy (provided by
CSLU), which was poor.
The US Navy has also developed a speech
interface to a VR system; the authors will
improve interaction with VR
based on that method.
Future Work
Switch TTS and SR to IBM ViaVoice.
• It supports JSAPI (Java Speech API).
• Java can communicate with C6 via
CORBA more easily.
Paper II
Wauchope, K., S. Everett, D. Tate, T. Maney,
"Speech-Interactive Virtual Environments for
Ship Familiarization," 2nd International
EuroConference on Computer and IT
Applications in the Maritime Industries
(COMPIT '03), Hamburg, Germany, May 14-17, 2003, pp. 70-83.
Introduction
This paper introduces two systems that
help newly aboard crew members of US Navy
ships become familiar with their
environment quickly.
User: Tell me
where Room
101 is!
Motivation
Architects of US Navy ships make heavy
use of CAD tools to design ship models.
CAD files can be converted to a 3D
model format with little effort.
According to the authors' previous
research, this virtual environment did
shorten crews' learning time.
Systems introduced
2 Systems
• MSFT(Multimodal Ship Familiarization
Tool)
• ISFS(Interactive Ship Familiarization
System)
ISFS is a recent transition of MSFT.
System Architecture:MSFT
Run as different processes
MSFT
The VE viewer component and the speech
interface run as two separate processes.
Speech interface: uses an all-IBM
solution:
• ViaVoice.
• IBM’s SMAPI.
• IBM’s SRCL grammar.
Platform : PIII 500MHz
ISFS
A recent transition of MSFT.
Uses VRML as the 3D modeling language.
Uses JSAPI as the interface to the speech
engine.
• ViaVoice fully supports JSAPI.
• VRML supports Java as a scripting language.
The rest of the structure is identical to the MSFT
system.
Platform: Xeon 2.0GHz -> needs more computing power!
Why Choose a
Standalone VRML Browser?
Security limitations (discussed in detail later).
VM limitations (discussed in detail later).
Provides opportunities to customize
the interface to the VRML browser.
In my personal experience, the system usually becomes
unstable when the speech engine works with a VRML plugin via EAI's Java interface.
Security Limitations
The JRE imposes security limitations on Java
applets.
JSAPI is unable to establish a
connection with the speech engine unless
we explicitly reconfigure the security
settings.
Limited VM
Most VRML browsers' EAI implementations
are built on ActiveX and thus only
support Microsoft's old VM, which
does not support most modern features
of Java.
• Example: this may force us to use Java AWT
instead of Swing, which provides a better GUI.
Providing GUI as VUI
Fallback
The GUI provides a fallback in case the
speech recognizer has trouble
accurately transcribing the user's voice.
The GUI is adjusted dynamically to maintain a
one-to-one correspondence with the VUI.
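One way to keep the GUI in one-to-one correspondence with the VUI is to derive both from the same table of active commands. The sketch below assumes this design; the dialog states and phrases are hypothetical and are not taken from the paper.

```python
from typing import Dict, List

# Hypothetical dialog states and the spoken commands active in each one.
ACTIVE_COMMANDS: Dict[str, List[str]] = {
    "navigation": ["go to room", "turn around", "stop"],
    "query": ["where is room", "what is this", "cancel"],
}


def gui_buttons_for(state: str) -> List[str]:
    """Return one button label per spoken command active in the given state."""
    return [phrase.capitalize() for phrase in ACTIVE_COMMANDS[state]]


def handle_input(state: str, text: str) -> str:
    """Resolve a speech result or a clicked button label to the same command."""
    phrase = text.strip().lower()
    if phrase not in ACTIVE_COMMANDS[state]:
        raise ValueError(f"{text!r} is not active in state {state!r}")
    return phrase
```

Because the button list is regenerated from the active command set whenever the dialog state changes, anything the user can say can also be clicked, and nothing else.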
Paper II Conclusion
The speech interface is needed because
the GUI and VE viewer both rely on direct
manipulation and keep the user's hands too
busy.
As HCI becomes increasingly
multimodal, care must be taken to
integrate the modalities in a natural manner.
Future Work
VRML is closer to an object-oriented,
tree-structured model.
Such models are hard to represent in an RDBMS.
A way must be found to store model
data easily and efficiently.
Personal thought: use an XML database.
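The fit between a tree-structured scene and XML can be illustrated with a small sketch. It uses an in-memory XML document from Python's standard library rather than an actual XML database, and the node names, DEF labels, and coordinates are made up for the example.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def build_scene() -> ET.Element:
    """Build a tiny ship scene graph as nested XML elements (hypothetical data)."""
    ship = ET.Element("Transform", {"DEF": "Ship"})
    deck = ET.SubElement(ship, "Transform", {"DEF": "Deck1"})
    ET.SubElement(deck, "Transform", {"DEF": "Room101", "translation": "4 0 -2"})
    ET.SubElement(deck, "Transform", {"DEF": "Room102", "translation": "8 0 -2"})
    return ship


def find_node(root: ET.Element, name: str) -> Optional[ET.Element]:
    """Look up a node by its DEF name anywhere in the tree."""
    if root.get("DEF") == name:
        return root
    # ElementTree supports this limited XPath form for attribute matching.
    return root.find(f".//Transform[@DEF='{name}']")


def to_xml(root: ET.Element) -> str:
    """Serialize the scene so it could be stored and reloaded later."""
    return ET.tostring(root, encoding="unicode")
```

The nesting of the elements mirrors the nesting of the scene graph directly, so a query such as "where is Room 101" becomes a path lookup instead of a series of relational joins.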
Discussions
Switchable!
Q&A