Integrating Nuance and Trindikit

David Hjelm
2003-03-20
Nuance
• Speech recognition, voice authentication and text-to-speech engines
• APIs to create speech-recognition and text-to-speech
clients in Java, C++ and C
Trindikit
• Framework for building dialogue systems
• Written in SICStus Prolog
• Contains predefined modules for input, output,
interpretation, etc…
Trindikit text input/output modules
• input_simpletext reads input from screen and stores in
input variable.
• output_simpletext reads output from output variable and
prints on screen
• To use Nuance speech recognition and speech synthesis
instead, the input and output modules must communicate
with a Nuance process, since no Nuance APIs for SICStus
exist.
Solution: OAA
• OAA (Open Agent Architecture) enables communication between Java and SICStus
• SICStus and Java processes register as agents with the same
OAA facilitator. Each agent declares a set of solvables to the
facilitator. Solvables are declared using Prolog-like syntax.
• Agents can pose queries to the OAA community by calling
solve(Query). The facilitator tries to find an agent which
has declared a solvable that matches Query. If one is found,
the query is delegated to that agent, which tries to
solve it.
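The delegation step can be illustrated with a toy facilitator. This is only a sketch under stated assumptions: ToyFacilitator, register and the string-based query format are hypothetical and are not the OAA library API. Real OAA matches by unification, whereas this sketch matches only on functor name and arity.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

/** Toy stand-in for an OAA facilitator (hypothetical, not the OAA library API). */
class ToyFacilitator {
    /** One declared solvable: the owning agent and the code that answers it. */
    private static final class Entry {
        final String agent;
        final Function<List<String>, String> handler;

        Entry(String agent, Function<List<String>, String> handler) {
            this.agent = agent;
            this.handler = handler;
        }
    }

    // Declared solvables, keyed by "functor/arity".
    private final Map<String, Entry> solvables = new LinkedHashMap<>();

    /** An agent declares a solvable, e.g. say/1, together with its handler. */
    void register(String agent, String functor, int arity,
                  Function<List<String>, String> handler) {
        solvables.put(functor + "/" + arity, new Entry(agent, handler));
    }

    /** Delegates a flat query such as "say(hello)" to the agent whose solvable matches. */
    Optional<String> solve(String query) {
        int open = query.indexOf('(');
        int close = query.lastIndexOf(')');
        if (open >= 0 && close < open) {
            return Optional.empty(); // malformed query
        }
        String functor = open < 0 ? query.trim() : query.substring(0, open).trim();
        List<String> argList = new ArrayList<>();
        if (open >= 0) {
            String inner = query.substring(open + 1, close).trim();
            if (!inner.isEmpty()) {
                for (String part : inner.split(",")) {
                    argList.add(part.trim());
                }
            }
        }
        Entry entry = solvables.get(functor + "/" + argList.size());
        if (entry == null) {
            return Optional.empty(); // no agent declared a matching solvable
        }
        return Optional.of(entry.agent + ": " + entry.handler.apply(argList));
    }
}
```

A query for which no agent has declared a solvable simply yields an empty result, which corresponds to the facilitator failing to find a matching agent.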
OAA Nuance Agents
• These OAA agents are provided in the latest distribution of
Trindikit:
– OAANuanceSpeechChannel – OAA Java agent which provides
NuanceSpeechChannel (Nuance Java API) functionality to the OAA
community
– oaa_recserver – OAA Prolog agent which can control a Nuance
recognition server
– oaa_vocalizer – OAA Prolog agent which can control a Nuance
TTS server
Trindikit Java OAA agents
• To simplify the writing of new OAA agents, a base class for OAA
agents, OAAAgent, is provided. Concrete agent classes extend
it.
• An OAAAgent has a number of states which it can be in. For each
state a set of solvables is defined. If the facilitator delegates a
solve(Query) request to the agent, the agent iterates through the
solvables defined for its current state to find one that
unifies with Query.
• The code that solves a solve(Query) request is implemented in a
wrapper class OAASolver which defines the method solve. Each
OAASolver defines a specific solvable.
• OAASolvers are added to the agent via the addSolver method which
defines the pre-state(s) and post-state(s) of the OAASolver.
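The state-driven dispatch described above might look roughly like the following. This is a minimal sketch, not the Trindikit classes: SketchAgent and SketchSolver are hypothetical names, the addSolver signature is assumed, matching is done on the functor only instead of by unification, and each solver has a single fixed post-state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical solver: handles exactly one solvable. */
interface SketchSolver {
    /** Functor of the solvable this solver handles, e.g. "open". */
    String solvable();

    /** Tries to solve the query; returns true on success. */
    boolean solve(String query);
}

/** Hypothetical state-driven agent that mirrors the design described above. */
class SketchAgent {
    /** A solver plus the states it may run in and the state it leaves the agent in. */
    private static final class Rule {
        final SketchSolver solver;
        final Set<Integer> preStates;
        final int postState;

        Rule(SketchSolver solver, Set<Integer> preStates, int postState) {
            this.solver = solver;
            this.preStates = preStates;
            this.postState = postState;
        }
    }

    private final List<Rule> rules = new ArrayList<>();
    private int state;

    SketchAgent(int initialState) {
        this.state = initialState;
    }

    /** Registers a solver with its allowed pre-states and its post-state. */
    void addSolver(SketchSolver solver, Set<Integer> preStates, int postState) {
        rules.add(new Rule(solver, preStates, postState));
    }

    int state() {
        return state;
    }

    /** Tries the solvers valid in the current state; a success moves the agent to the post-state. */
    boolean handle(String query) {
        for (Rule rule : rules) {
            if (rule.preStates.contains(state)
                    && matches(rule.solver.solvable(), query)
                    && rule.solver.solve(query)) {
                state = rule.postState;
                return true;
            }
        }
        return false;
    }

    /** Functor-only matching; the real agents unify the query with the solvable. */
    private static boolean matches(String functor, String query) {
        return query.equals(functor) || query.startsWith(functor + "(");
    }
}
```

Keeping the pre-states and post-state outside the solver means the same solver code can be reused under different state constraints.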
OAANuanceSpeechChannel
• OAANuanceSpeechChannel is a Java OAA agent which extends
OAAAgent.
• Another implemented agent is OAAVcr (used in the ILT project),
which functions as a software VCR agent that can record TV
programs (captured using a TV card)
OAANuanceSpeechChannel states
• NuanceSpeechChannel offers different functionality depending on its
configuration. For example, if it uses a telephony-based audio provider, a call
has to be answered before recognition can take place. This is mirrored by the
four states (represented as int constants) of OAANuanceSpeechChannel,
which are:
0 - STOPPED
There is no speech channel yet
1 - TEL_IDLE
A speech channel using a telephony audio provider has been created. Currently not in a call.
2 - TEL_RUNNING
A speech channel using a telephony audio provider has been created. Currently in a call.
3 - NATIVE_RUNNING
A speech channel using the native audio provider has been created.
OAANuanceSpeechChannel solvables
• The solvables of OAANuanceSpeechChannel are:
nscCreate(+Package,+Parameters) (creates a new SpeechChannel)
pre-state STOPPED
post-state TEL_IDLE or NATIVE_RUNNING (depending on Parameters)
nscClose (closes the SpeechChannel)
pre-state TEL_IDLE, TEL_RUNNING or NATIVE_RUNNING
post-state STOPPED
nscPlayAndRecognize(+Grammar,?RecResult)
pre-state TEL_RUNNING or NATIVE_RUNNING
post-state same as pre-state
nscRecognizeFile(+Filename,+Grammar,?RecResult)
pre-state TEL_RUNNING or NATIVE_RUNNING
post-state same as pre-state
nscAppendTTS(+Text)
pre-state TEL_RUNNING or NATIVE_RUNNING
post-state same as pre-state
OAANuanceSpeechChannel solvables
nscPlay(+Bool)
pre-state TEL_RUNNING or NATIVE_RUNNING
post-state same as pre-state
nscStartPlay
pre-state TEL_RUNNING or NATIVE_RUNNING
post-state same as pre-state
nscSetParameter(+Name,+Value)
pre-state TEL_IDLE, TEL_RUNNING or NATIVE_RUNNING
post-state same as pre-state
nscGetParameter(+Name,?Value)
pre-state TEL_IDLE, TEL_RUNNING or NATIVE_RUNNING
post-state same as pre-state
nscGetAllGrammars(?Grammars)
pre-state TEL_IDLE, TEL_RUNNING or NATIVE_RUNNING
post-state same as pre-state
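The pre-states listed above can be collected into a lookup table. The following is an illustrative sketch only: the state values and solvable names come from the lists above, while the class NscPreStates and its allowed method are hypothetical and not part of the agent.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Illustrative table of the pre-states listed above (hypothetical helper class). */
class NscPreStates {
    static final int STOPPED = 0;
    static final int TEL_IDLE = 1;
    static final int TEL_RUNNING = 2;
    static final int NATIVE_RUNNING = 3;

    // Maps each solvable name to the states in which it may be called.
    private static final Map<String, Set<Integer>> PRE = new LinkedHashMap<>();

    static {
        Set<Integer> running = setOf(TEL_RUNNING, NATIVE_RUNNING);
        Set<Integer> created = setOf(TEL_IDLE, TEL_RUNNING, NATIVE_RUNNING);
        put(setOf(STOPPED), "nscCreate");
        put(running, "nscPlayAndRecognize", "nscRecognizeFile",
                "nscAppendTTS", "nscPlay", "nscStartPlay");
        put(created, "nscClose", "nscSetParameter",
                "nscGetParameter", "nscGetAllGrammars");
    }

    private static Set<Integer> setOf(Integer... values) {
        return new HashSet<>(Arrays.asList(values));
    }

    private static void put(Set<Integer> states, String... names) {
        for (String name : names) {
            PRE.put(name, states);
        }
    }

    /** True if the named solvable is listed with the given state as a pre-state. */
    static boolean allowed(String solvable, int state) {
        Set<Integer> states = PRE.get(solvable);
        return states != null && states.contains(state);
    }
}
```

For example, allowed("nscPlayAndRecognize", TEL_IDLE) is false, reflecting that a telephony channel must be in a call before recognition can take place.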
SpeechChannel events
• Some NuanceSpeechChannel methods fire events, e.g. when the user
starts speaking.
• When these events occur, OAANuanceSpeechChannel posts a query to
the OAA community. The query is as close a transcription as possible of the
actual Java event, with an 'nsc' prefix added.
• Other agents can declare these as solvables and implement code that handles
the events.
nscStartOfSpeechEvent(SafeOffsetSecs,ActualOffsetSecs)
nscEndOfSpeechEvent(SafeOffsetSecs,ActualOffsetSecs)
nscPartialResultEvent(RecResult)
nscPlaybackStartedEvent
nscPlaybackStoppedEvent(Reason,Tones)
nscTerminationEvent(Reason)
nscCallConnectedEvent --todo
nscDTMFEvent(Tones) --todo
nscHungupEvent(Side,Reason) --todo
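The naming convention for event queries can be sketched as a small helper. This is an assumption-laden illustration: NscEventNames and toQuery are hypothetical, and the agent's actual argument formatting may differ.

```java
import java.util.StringJoiner;

/** Hypothetical helper that builds event queries using the 'nsc' prefix convention. */
class NscEventNames {
    /** Builds e.g. nscStartOfSpeechEvent(0.5,0.7) from an event name and its arguments. */
    static String toQuery(String eventName, Object... eventArgs) {
        String functor = "nsc" + eventName;
        if (eventArgs.length == 0) {
            return functor; // events without arguments become plain atoms
        }
        StringJoiner joiner = new StringJoiner(",", functor + "(", ")");
        for (Object arg : eventArgs) {
            joiner.add(String.valueOf(arg));
        }
        return joiner.toString();
    }
}
```

An agent interested in start-of-speech would then declare nscStartOfSpeechEvent with two arguments as a solvable and handle the resulting query.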
oaa_recserver
• oaa_recserver is a Prolog OAA agent which controls a Nuance
recognition server process. Solvables are:
nrsStart(+Packages,+Params)
Starts a recserver process using packages Packages and
parameters Params. Format of Packages and Params
is described below.
nrsStop
Stops the recserver process.
nrsGetPackages(?Packages)
Returns the currently loaded recognition packages.
nrsGetState(?State)
Returns current state (stopped or running)
oaa_vocalizer
• oaa_vocalizer is a Prolog OAA agent which controls a
Nuance vocalizer process. Solvables are:
nvocStart(+Params)
Starts a vocalizer process. Params holds any command-line
arguments.
nvocStop
Stops the vocalizer process.
nvocGetState(?State)
Returns current state (stopped or running)
Integrating it into Trindikit
• Trindikit provides a specific OAA resource, oaag, which
can be used to make queries to the OAA community.
• Input and output modules specific for OAA+Nuance have
been written which make use of oaag.
• A speech recognition grammar resource type,
asr_grammar, keeps track of which speech recognition
grammar Nuance should try to load.
input_nuance_basic_oaa
• Calls an OAA agent which performs speech recognition. Also communicates
with a Nuance recserver OAA agent.
• Assumes that if a Nuance grammar contains the top-level symbol '.Top', it has been
compiled into a recognition package named 'top'.
• To perform recognition using package 'top', a Trindikit resource of type
asr_grammar should be selected in the configuration file.
• For all selected resources of type asr_grammar, the corresponding packages
are loaded onto a recserver. The recclients are created at runtime.
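The naming assumption for recognition packages can be written down as a one-line mapping. The only case confirmed above is '.Top' mapping to 'top'; generalising it to "strip the leading dot and lowercase" is an assumption, and AsrPackageNames is a hypothetical helper, not part of the module.

```java
import java.util.Locale;

/** Hypothetical helper for the assumed grammar-symbol-to-package naming convention. */
class AsrPackageNames {
    /** Maps a top-level grammar symbol such as ".Top" to a package name such as "top". */
    static String packageFor(String topLevelSymbol) {
        String name = topLevelSymbol.startsWith(".")
                ? topLevelSymbol.substring(1)
                : topLevelSymbol;
        return name.toLowerCase(Locale.ROOT);
    }
}
```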
output_nuance_basic_oaa
• Calls an OAA agent which performs TTS synthesis. Also communicates with a
vocalizer OAA agent.
Future work
• real ASR-grammars in asr_grammar resources
• Trindikit integration with Regulus for converting feature
structure grammars to Nuance grammars
• Use of dynamic grammar compilation, so that no Nuance
grammars have to be written and compiled in advance.
• Integrate with asynchronous Trindikit
• Intelligent barge-in
• etcetera