Speech recognition grammars
as TRINDIKIT resources
David Hjelm
2003-12-12
TRINDIKIT
• Framework for building dialogue systems
• Written in SICStus Prolog
• Contains predefined modules for input, output,
interpretation, etc…
• Total Information State (TIS) holds information
accessible by modules
• As long as different modules behave similarly with respect
to the TIS, they are interchangeable
Nuance
• Speech recognition, voice authentication and text-to-speech engines
• APIs to create speech-recognition/text-to-speech clients
in Java, C++ and C
• Clients can read and write audio in several ways:
– native sound card
– telephony card
– IP-telephony
– from audio files
Speech recognition basics
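• Pipeline (from the slide diagram):
– speech → feature extraction → acoustic features
– acoustic features → Viterbi search using the acoustic model (N-gram)
→ phoneme or word lattice
– phoneme or word lattice → Viterbi search or parsing using the
language model (N-gram or PCFG) → word lattice or n-best list of sentences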
Nuance SR models
• Acoustic models (master packages)
– One or several for each language + some
multilingual.
• Language models
– written using Nuance’s Grammar Specification
Language (GSL).
– PCFG, but statistical language models (SLMs) can actually be used as categories
– SLMs are trained separately from corpus data
– compiled using a specific master package into a
recognition package (acoustic + language model)
Nuance GSL
• EBNF variant augmented with
– optional probabilities
– optional rudimentary slot-filling semantics
– many other special features, e.g. SLM inclusion, external grammar
references, external rule references, and special words for e.g. pauses
and telephony touch-tones
• Must not be left-recursive
Example Nuance grammars
• Without probabilities or semantics a grammar
can look like this:
.Top [ Cmd Q ]
Cmd ( [ stop play pause ] ?it)
Q ( is [ (the vcr) it ] [stopped playing paused] )
• Start symbol(s) are preceded by ’.’
• Nonterminals are uppercase
• Terminals are lowercase
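The coverage of the small grammar above can be checked from Prolog by writing it as a DCG. This is only a sketch for illustration: the predicate names (top, cmd, opt_it, covered and so on) are made up here and are not part of Nuance or TRINDIKIT. A GSL disjunction [ A B ] becomes several clauses, a sequence ( A B ) becomes a clause body, and an optional ?A becomes a rule with an empty alternative.

```prolog
% DCG rendering of the example GSL grammar (illustrative sketch only).
top --> cmd.
top --> q.

% Cmd ( [ stop play pause ] ?it )
cmd --> verb, opt_it.

verb --> [stop].
verb --> [play].
verb --> [pause].

opt_it --> [it].
opt_it --> [].

% Q ( is [ (the vcr) it ] [ stopped playing paused ] )
q --> [is], subject, state.

subject --> [the, vcr].
subject --> [it].

state --> [stopped].
state --> [playing].
state --> [paused].

% covered(+Words): Words is a sentence of the language of .Top
covered(Words) :- phrase(top, Words), !.
```

For example, covered([is, the, vcr, paused]) succeeds and covered([it, play]) fails. Since the grammar is not recursive, findall(S, phrase(top, S), Ss) enumerates its whole language.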
More example Nuance grammars
• Probabilistic grammar:
.Top [ Cmd~0.6 Q~0.4 ]
Cmd ( [ stop~0.2 play~0.4 pause~0.3 ] ?it~0.3)
Q ( is [ (the vcr)~0.3 it~0.7 ] [stopped playing paused] )
• Slot-filling grammar:
.Top [ Cmd {<cmd $return>} Q {<q $return>} ]
Cmd ( [ stop {return(stop)} play {return(play)}
pause {return(pause)}] ?it)
Q ( is [ (the vcr) it ]
[ stopped {return(stop)} playing {return(play)}
paused {return(pause)} ] )
• Of course they can be combined…
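The slot-filling idea can be mimicked in Prolog by giving each DCG rule an argument that carries the returned value. The following is a sketch under assumptions: the predicate names and the representation of filled slots as a list of Slot=Value pairs are chosen here for illustration and do not reflect how Nuance represents its results.

```prolog
% Slot-filling version of the example grammar as a DCG (illustrative sketch).
% Each rule carries the value that the corresponding GSL rule would return.
top([cmd=V]) --> cmd(V).
top([q=V])   --> q(V).

cmd(V) --> verb(V), opt_it.

verb(stop)  --> [stop].
verb(play)  --> [play].
verb(pause) --> [pause].

opt_it --> [it].
opt_it --> [].

q(V) --> [is], subject, state(V).

subject --> [the, vcr].
subject --> [it].

state(stop)  --> [stopped].
state(play)  --> [playing].
state(pause) --> [paused].

% interpret(+Words, -Slots): Slots is the list of filled slots for Words
interpret(Words, Slots) :- phrase(top(Slots), Words).
```

With these rules, interpret([play, it], S) gives S = [cmd=play], and interpret([is, it, paused], S) gives S = [q=pause].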
Static or dynamic grammar compilation
• Nuance’s recognize function takes one
argument, which is either of the following:
– a start symbol in the current statically compiled
recognition package. In this case recognition is
performed using the grammar specified.
– a GSL expression. In this case the GSL expression is
dynamically compiled on the fly. The GSL expression cannot
contain recursive rules, but it can point to a precompiled
'grammar object' which does.
Current TRINDIKIT – Nuance interface
• TRINDIKIT modules exist for Nuance speech input and
Nuance speech output.
• OAA is used for the communication between TRINDIKIT
(Prolog) and the Nuance client (Java).
• Each OAA agent connects to a facilitator and declares a
set of capabilities. Agents can then pose queries to the
facilitator, which delegates each query to the
appropriate agent(s) and returns an answer to the
requesting agent.
Current TRINDIKIT – Nuance interface
[Architecture diagram. Components shown: OAA facilitator, TRINDIKIT,
OAA gateway, Nuance java client, ASR server, TTS server; audio via
IP telephony, telephony card or native sound card]
Current TRINDIKIT – Nuance interface
• Nuance java client
– provides (partial) access to Nuance java API via OAA
– loads recognition package at startup
– performs SR using one of its top level grammars
• TRINDIKIT input module
– checks name of dummy resource $asr_grammar for name of top
level grammar
– calls OAA solvable nscPlayAndRecognize(+Grammar,?Result)
• Major disadvantages:
– Recognition package must be compiled before using the system and
specified when running the Java application
– Actual ASR grammar is not a part of TRINDIKIT, so it cannot be
modified or checked for coverage by modules
Upcoming TRINDIKIT – Nuance interface
• Nuance java client
– provides (partial) OAA access to Nuance java API
– loads empty recognition package at startup
– can compile GSL into a Nuance Grammar Object (NGO) via
OAA
– performs SR using a GSL expression which points at an NGO
• TRINDIKIT input module
– checks resource $sr_grammar for actual speech recognition
grammar
– makes sure $sr_grammar is compiled into an NGO at start-up
– calls OAA solvable nscPlayAndRecognize(+GSL,?Result) where
GSL = '<file:/path/to/ngo>'
Upcoming TRINDIKIT – Nuance interface
[Architecture diagram. Components shown: OAA facilitator, TRINDIKIT,
OAA gateway, Compilation server, Nuance java client, ASR server,
TTS server; audio via IP telephony, telephony card or native sound card]
Different ways of implementing the sr_grammar resource
1. Keep the GSL expression making up the Nuance
grammar as a Prolog string or atom
• Easy for Nuance input module
• Really hard for other modules trying to reason about the
SR grammar
2. Define the EBNF rules as Prolog terms
• Quite easy for Nuance input module (convert EBNF to
GSL)
• Enables reasoning about rules and categories by other
modules
• Hard to find a working EBNF Prolog notation
Different ways of implementing the sr_grammar resource
3. Define the grammar as a set of context-free grammar
rules (chosen method)
• Some computation by Nuance input module (needs to
convert CFG to BNF to GSL)
• Enables reasoning about rules and categories by other
modules
• Enables efficient parsing (if needed)
• Easy to find a Prolog notation
• Portable: the same grammar can be ported to many different
speech recognizer grammar formats, as long as they are
CFG-equivalent
CFG resource definition
• resource relations:
– start_symbol(S)
where S is a nonterminal
– rule(LHS,RHS)
where LHS is a nonterminal and RHS is a list of nonterminals/terminals
– rules(Rules)
where Rules is the set of rules in the resource
• resource operations (not yet implemented):
– add_rule(rule(LHS,RHS))
– delete_rule(rule(LHS,RHS))
– add_rules(Rules)
– delete_rules(Rules)
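Since the resource operations are listed as not yet implemented, the following is only one possible sketch of them, keeping the rules as dynamic rule/2 facts. It makes some assumptions: rules(Rules) returns a plain list rather than a TRINDIKIT set, and duplicate rules are silently ignored. It is not the actual TRINDIKIT resource interface.

```prolog
% Sketch of the CFG resource operations on top of dynamic rule/2 facts.
:- dynamic rule/2.

% add_rule(+Rule): store the rule unless it is already present
add_rule(rule(LHS, RHS)) :-
    (   rule(LHS, RHS)
    ->  true
    ;   assertz(rule(LHS, RHS))
    ).

% delete_rule(+Rule): remove the rule if present (succeeds either way)
delete_rule(rule(LHS, RHS)) :-
    retractall(rule(LHS, RHS)).

% add_rules(+Rules), delete_rules(+Rules): apply the operation to each rule
add_rules([]).
add_rules([R|Rs]) :- add_rule(R), add_rules(Rs).

delete_rules([]).
delete_rules([R|Rs]) :- delete_rule(R), delete_rules(Rs).

% rules(-Rules): all rules currently in the resource, as a list
rules(Rules) :- findall(rule(L, R), rule(L, R), Rules).
```

Keeping the operations this small means that a module can update the grammar at run time and then ask for rules(Rules) again before the grammar is recompiled.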
CFG rule format
• Example rules:
rule( nonterminal(np),
      [ nonterminal(det),
        nonterminal(n) ] ).
rule( nonterminal(det), [ terminal("a") ] ).
rule( nonterminal(n),
      [ terminal("car") ] ).
• Convenient when reasoning about rules in grammar but
not very convenient when writing grammars…
• Solution:
– write rules in EBNF-ish notation using operators.
– convert EBNF-ish rules to CFG rules.
’blockworld’ - example CFG resource
• ebnf2cfg:assert_rules/0 converts EBNF rules to CFG
rules and asserts them
:- module( blockworld , [rules/1,rule/2,start_symbol/1] ).
:- ensure_loaded( ebnf2cfg ).
top( np ).
np => det, adj* , n, loc? .
adj => colour | size.
colour => "blue" | "red" | "green".
size => "big" | "small".
det => "a".
n => "sphere" | "cube" | "pyramid".
loc => prep , np.
prep => "in" | "on" | "under" | "above".
:- assert_rules.
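The conversion from EBNF-ish rules to plain CFG rules could work roughly as sketched below. This is not the ebnf2cfg module itself. To stay independent of operator declarations, the sketch assumes a different, hypothetical notation: (A , B) for sequence, (A ; B) for alternatives, opt(A) for "A?", star(N) for "N*" (with N a nonterminal name only), t(Word) for terminals, and bare atoms for nonterminals. Terminals are atoms here rather than strings.

```prolog
% Sketch of an EBNF-to-CFG conversion (hypothetical notation, see above).
:- use_module(library(lists)).

% alts(+Body, -Alternatives, -ExtraRules)
% Alternatives is a list of right-hand sides; ExtraRules are helper rules.
alts((A ; B), Alts, Extra) :- !,
    alts(A, As, E1), alts(B, Bs, E2),
    append(As, Bs, Alts), append(E1, E2, Extra).
alts((A , B), Alts, Extra) :- !,
    alts(A, As, E1), alts(B, Bs, E2),
    findall(AB, (member(X, As), member(Y, Bs), append(X, Y, AB)), Alts),
    append(E1, E2, Extra).
alts(opt(A), [[]|As], Extra) :- !,        % either nothing or A
    alts(A, As, Extra).
alts(star(N), [[nonterminal(Star)]], Extra) :- !,
    atom(N),
    atom_concat(N, '_star', Star),        % fresh name derived from N
    Extra = [rule(nonterminal(Star), []),
             rule(nonterminal(Star), [nonterminal(N), nonterminal(Star)])].
alts(t(W), [[terminal(W)]], []) :- !.
alts(N, [[nonterminal(N)]], []) :- atom(N).

% ebnf2cfg(+LHS, +Body, -Rules): duplicate-free list of rule/2 terms
ebnf2cfg(LHS, Body, Rules) :-
    alts(Body, Alts, Extra),
    findall(rule(nonterminal(LHS), RHS), member(RHS, Alts), Main),
    append(Main, Extra, Rules0),
    sort(Rules0, Rules).
```

For the np rule of the example, ebnf2cfg(np, (det, star(adj), n, opt(loc)), Rules) yields four rules: two for np (with and without loc) and two for the helper nonterminal adj_star. The helper is right-recursive, so it does not introduce left-recursion, but it does introduce a rule with an empty right-hand side that a later conversion step would need to handle or eliminate.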
Using CFG resource with Nuance input
module
% init: compile the CFG resource into an NGO, unless this has already been done
input:init :-
    check_condition( $sr_grammar::start_symbol(Start) ),
    check_condition( $sr_grammar::rules(set(Rules)) ),
    cfg2gsl(dynamic,Start,Rules,GSL),
    oaag:solve(nscCurrentMasterPackage(Package)),
    (   oaag:solve(nscGslCompiledToNGO(GSL,Package,Path))
    ->  true
    ;   oaag:solve(nscCompileGslToNGO(GSL,Package,Path))
    ),
    !.

% input: recognize using a GSL expression which points at the compiled NGO
input:input :-
    check_condition( $sr_grammar::start_symbol(Start) ),
    check_condition( $sr_grammar::rules(set(Rules)) ),
    cfg2gsl(dynamic,Start,Rules,GSL),
    oaag:solve(nscCurrentMasterPackage(Package)),
    oaag:solve(nscGslCompiledToNGO(GSL,Package,Path)),
    % the slide passed GSL here; Path (the NGO location) appears to be intended
    join_atoms(['<file:/',Path,'>'],NGOGSL),
    recognize_score(NGOGSL,String,Score),
    apply_update( set( input, String ) ),
    apply_update( score := Score ).
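The cfg2gsl step used above is not shown in the slides. A minimal sketch of what such a conversion might look like is given below, under several assumptions: the predicate name cfg_to_gsl/3 and its helpers are hypothetical, terminals are lowercase atoms rather than strings, the output is a list of GSL rule lines rather than a single expression, and whether Nuance accepts every construct emitted here has not been checked. Nonterminal names are capitalised, since GSL requires nonterminals to be uppercase.

```prolog
% Sketch of a CFG-to-GSL conversion (hypothetical names, see assumptions above).
:- use_module(library(lists)).

% capitalize(+Atom, -Cap): make the first letter upper-case
capitalize(Atom, Cap) :-
    atom_codes(Atom, [C|Cs]),
    (   C >= 0'a, C =< 0'z -> U is C - 32 ; U = C ),
    atom_codes(Cap, [U|Cs]).

symbol_gsl(nonterminal(N), G) :- capitalize(N, G).
symbol_gsl(terminal(W), W).

% join(+Atoms, +Separator, -Atom)
join([], _, '').
join([A], _, A) :- !.
join([A|As], Sep, Out) :-
    join(As, Sep, Rest),
    atom_concat(A, Sep, A1),
    atom_concat(A1, Rest, Out).

% rhs_gsl(+RHS, -G): one right-hand side as a GSL sequence "( ... )"
rhs_gsl(RHS, G) :-
    findall(S, (member(X, RHS), symbol_gsl(X, S)), Syms),
    join(Syms, ' ', Body),
    join(['(', Body, ')'], '', G).

% distinct(+List, +Seen, -Unique): drop repeats, keep first occurrences
distinct([], _, []).
distinct([X|Xs], Seen, Ys) :-
    (   memberchk(X, Seen) -> Ys = Ys1 ; Ys = [X|Ys1] ),
    distinct(Xs, [X|Seen], Ys1).

% rule_gsl(+Start, +Rules, +LHS, -Line): all rules for LHS as one disjunction
rule_gsl(Start, Rules, LHS, Line) :-
    findall(G, (member(rule(nonterminal(LHS), RHS), Rules), rhs_gsl(RHS, G)), Alts),
    join(Alts, ' ', AltText),
    capitalize(LHS, Name0),
    (   LHS == Start -> atom_concat('.', Name0, Name) ; Name = Name0 ),
    join([Name, ' [ ', AltText, ' ]'], '', Line).

% cfg_to_gsl(+Start, +Rules, -Lines): one GSL line per nonterminal
cfg_to_gsl(Start, Rules, Lines) :-
    findall(L, member(rule(nonterminal(L), _), Rules), LHSs0),
    distinct(LHSs0, [], LHSs),
    findall(Line, (member(LHS, LHSs), rule_gsl(Start, Rules, LHS, Line)), Lines).
```

For the rules np → det n, det → a, n → car and n → cube with start symbol np, this produces the lines '.Np [ (Det N) ]', 'Det [ (a) ]' and 'N [ (car) (cube) ]'.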
What must be done before CFG
resource can be used with Nuance?
• Write actual code of input module (some parts are
missing)
• Implement nscGetMasterPackage(?Pkg) solvable
• Make sure that all nonterminals are upper-case and all
terminals are lower-case in GSL
• Write real CFG resource (use existing Nuance grammar)
• testing, testing and testing…
What should be done?
• Documentation of Java and Prolog code
• Trindikit manual
• Eliminate left-recursion
• Convert to Chomsky Normal Form (?)
• Parser/generator for testing CFGs inside of Prolog
• Multilingual Nuance input module
• Batch scripts for running with ease
• Asynchronous input algorithm
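Before left-recursion can be eliminated it has to be found. The sketch below shows one way to detect it on rules in the rule/2 format; the predicate names are hypothetical. It is a simplification: it only follows the first symbol of each right-hand side, so it misses left-recursion that arises through nonterminals which can derive the empty string.

```prolog
% Sketch of left-recursion detection on rule/2 terms (hypothetical names).
:- use_module(library(lists)).

% direct_left(+Rules, ?A, ?B): some rule for A starts with nonterminal B
direct_left(Rules, A, B) :-
    member(rule(nonterminal(A), [nonterminal(B)|_]), Rules).

% reaches(+Rules, +A, +B, +Visited): B is reachable from A by following
% leftmost nonterminals; Visited prevents looping
reaches(Rules, A, B, _) :-
    direct_left(Rules, A, B).
reaches(Rules, A, B, Visited) :-
    direct_left(Rules, A, C),
    \+ memberchk(C, Visited),
    reaches(Rules, C, B, [C|Visited]).

% left_recursive(+Rules, ?N): nonterminal N is (directly or indirectly)
% left-recursive; may succeed more than once for the same N
left_recursive(Rules, N) :-
    member(rule(nonterminal(N), _), Rules),
    reaches(Rules, N, N, [N]).
```

A grammar with the rule np → np pp is reported as left-recursive for np, while a rule such as loc → prep np is not, since the recursion through np is not in leftmost position.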
What can be done?
• PCFG resource
– if EBNF format is used, how to calculate weights when converting
to PCFG? (this has been solved in Nuance though, but is it a
proper solution?)
• SLM resource
– would probably not store entire model in memory
• Nuance semantics + CFG/PCFG
– can GoDiS semantics be expressed?
• Convert typed unification grammars to CFG resources
– DCG with typed features (Regulus), SKVATT(?), HPSG
• Grammatical Framework CFG approximation
– e.g. by limiting sentence length or letting grammar
overgenerate
– problem: any interesting grammar will overgenerate a lot
What can be done?
• Write modules for Java Speech API, ViaVoice, etc. using
the same CFG resource…
• Use several recognition grammars in sequence (one
after the other on the same input)
• Dynamically generate recognition grammar based on IS
contents and/or system expectations
• Letting the system learn new words - ”How do you spell
that?”