Probabilistic Parsing
CSPP 56553
Artificial Intelligence
February 25, 2004
Roadmap
• Probabilistic CFGs
– Handling ambiguity – more likely analyses
– Adding probabilities
• Grammar
• Issues with probabilities
– Resolving issues
• Lexicalized grammars
– Independence assumptions
Representation:
Context-free Grammars
• CFGs: 4-tuple
– A set of terminal symbols: Σ
– A set of non-terminal symbols: N
– A set of productions P: of the form A -> α
• Where A is a non-terminal and α in (Σ U N)*
– A designated start symbol S
• L = {w | w ∈ Σ* and S ⇒* w}
– Where S ⇒* w means S derives w by some sequence
of rule applications
Representation:
Context-free Grammars
• Partial example
– Σ: the, cat, dog, bit, bites, man
– N: NP, VP, AdjP, Nom
– P: S-> NP VP; NP -> Det Nom; Nom-> N Nom|N
– Parse of "The dog bit the man":
[S [NP [Det The] [Nom [N dog]]]
   [VP [V bit] [NP [Det the] [Nom [N man]]]]]
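A minimal sketch of this grammar in Python, using plain dictionaries. The S, NP, and Nom rules come from the slide; the VP rule and the lexical (POS) rules are assumptions added here so the fragment can derive the example sentence:

```python
# CFG 4-tuple as plain Python data. Terminals are the words themselves;
# productions map each non-terminal to a list of right-hand sides.
terminals = {"the", "cat", "dog", "bit", "bites", "man"}
start = "S"
productions = {
    "S":   [["NP", "VP"]],
    "NP":  [["Det", "Nom"]],
    "Nom": [["N", "Nom"], ["N"]],
    "VP":  [["V", "NP"]],            # assumed: not listed on the slide
    "Det": [["the"]],                # assumed lexical (POS) rules
    "N":   [["cat"], ["dog"], ["man"]],
    "V":   [["bit"], ["bites"]],
}
nonterminals = set(productions)      # N = the set of LHS symbols
```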
Parsing Goals
• Accepting:
– Legal string in language?
• Formally: rigid
• Practically: degrees of acceptability
• Analysis
– What structure produced the string?
• Produce one (or all) parse trees for the string
Parsing Search Strategies
• Top-down constraints:
– All analyses must start with start symbol: S
– Successively expand non-terminals with RHS
– Must match surface string
• Bottom-up constraints:
– Analyses start from surface string
– Identify POS
– When a substring of the current ply matches a rule's RHS, reduce it to the LHS
– Must ultimately reach S
Integrating Strategies
• Left-corner parsing:
– Top-down parsing with bottom-up constraints
– Begin at start symbol
– Apply depth-first search strategy
• Expand leftmost non-terminal
• Parser cannot consider a rule if the current input
word cannot appear on the left edge of some
derivation from that rule
• Tabulate all left-corners for a non-terminal
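The left-corner table can be precomputed as a fixpoint over the productions. A sketch, assuming a `productions` dict like the CFG example above and a grammar with no empty rules:

```python
def left_corners(productions):
    """For each non-terminal, the set of symbols (categories or words)
    that can sit on the left edge of some derivation from it.
    Assumes no epsilon productions."""
    lc = {a: {a} for a in productions}        # reflexive base case
    changed = True
    while changed:                            # iterate to a fixpoint
        changed = False
        for lhs, rhss in productions.items():
            for rhs in rhss:
                first = rhs[0]
                new = lc.get(first, {first})  # a terminal is its own corner
                if not new <= lc[lhs]:
                    lc[lhs] |= new
                    changed = True
    return lc

# A top-down parser may expand non-terminal A at the current word w
# only if w (or w's POS category) is in left_corners(productions)["A"].
```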
Issues
• Left recursion
– If the first symbol of the RHS is the LHS non-terminal itself ->
• Infinite expansion that never reaches a terminal node
• Grammar can be rewritten to remove left recursion
• Ambiguity: pervasive (costly)
– Lexical (POS) & structural
• Attachment, coordination, NP bracketing
• Repeated subtree parsing
– Same subtree re-parsed each time a larger analysis fails
Earley Parsing
• Avoid repeated work/recursion problem
– Dynamic programming
• Store partial parses in “chart”
– Compactly encodes ambiguity
• O(N^3)
• Chart entries:
– Subtree for a single grammar rule
– Progress in completing subtree
– Position of subtree wrt input
Earley Algorithm
• Uses dynamic programming to do parallel
top-down search in (worst case) O(N^3) time
• First, left-to-right pass fills out a chart with
N+1 entries
– Think of chart entries as sitting between words in
the input string keeping track of states of the
parse at these positions
– For each word position, chart contains set of
states representing all partial parse trees
generated to date. E.g. chart[0] contains all
partial parse trees generated at the beginning of
the sentence
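A compact Earley recognizer along these lines, as a sketch: it implements predict/scan/complete over states `(lhs, rhs, dot, origin)` but omits the back-pointers needed to recover trees, and assumes a grammar dict like the earlier CFG sketch with no empty productions:

```python
def earley_recognize(words, grammar, start="S"):
    """chart[i] holds states (lhs, rhs, dot, origin): a rule lhs -> rhs
    whose first `dot` symbols derive words[origin:i]."""
    chart = [set() for _ in range(len(words) + 1)]
    chart[0].add(("GAMMA", (start,), 0, 0))             # dummy start state
    for i in range(len(words) + 1):
        agenda = list(chart[i])
        while agenda:
            lhs, rhs, dot, origin = agenda.pop()
            if dot < len(rhs) and rhs[dot] in grammar:  # PREDICT
                for prod in grammar[rhs[dot]]:
                    st = (rhs[dot], tuple(prod), 0, i)
                    if st not in chart[i]:
                        chart[i].add(st); agenda.append(st)
            elif dot < len(rhs):                        # SCAN a terminal
                if i < len(words) and words[i] == rhs[dot]:
                    chart[i + 1].add((lhs, rhs, dot + 1, origin))
            else:                                       # COMPLETE
                for l2, r2, d2, o2 in list(chart[origin]):
                    if d2 < len(r2) and r2[d2] == lhs:
                        st = (l2, r2, d2 + 1, o2)
                        if st not in chart[i]:
                            chart[i].add(st); agenda.append(st)
    return ("GAMMA", (start,), 1, 0) in chart[len(words)]

# e.g. earley_recognize("the dog bit the man".split(), productions) -> True
```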
Representation:
Probabilistic Context-free
Grammars
• PCFGs: 5-tuple
– A set of terminal symbols: Σ
– A set of non-terminal symbols: N
– A set of productions P: of the form A -> α
• Where A is a non-terminal and α in (Σ U N)*
– A designated start symbol S
– A function assigning probabilities to rules: D
• L = {w | w ∈ Σ* and S ⇒* w}
– Where S ⇒* w means S derives w by some sequence
of rule applications
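One way to encode the probability function (called D above): a table from rules to probabilities, plus a sanity check that each non-terminal's rules form a distribution. The numbers follow the small-grammar slide below:

```python
from collections import defaultdict

# D maps each rule (lhs, rhs) to its probability.
D = {
    ("S",  ("NP", "VP")):        0.85,
    ("S",  ("S", "conj", "S")):  0.15,
    ("NP", ("N",)):              0.20,
    ("NP", ("Det", "N")):        0.65,
    ("NP", ("Det", "Adj", "N")): 0.10,
    ("NP", ("NP", "PP")):        0.05,
    ("VP", ("V",)):              0.45,
    ("VP", ("V", "NP")):         0.45,
    ("VP", ("V", "NP", "PP")):   0.10,
    ("PP", ("P", "NP")):         1.00,
}

totals = defaultdict(float)
for (lhs, _), p in D.items():
    totals[lhs] += p
# Probabilities for each LHS must sum to 1 (a proper distribution).
assert all(abs(t - 1.0) < 1e-9 for t in totals.values())
```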
Parse Ambiguity
• Two parse trees for "I saw the man with the duck":
– T1 (PP attached to the VP):
[S [NP [N I]] [VP [V saw] [NP [Det the] [N man]]
   [PP [P with] [NP [Det the] [N duck]]]]]
– T2 (PP attached to the object NP):
[S [NP [N I]] [VP [V saw] [NP [NP [Det the] [N man]]
   [PP [P with] [NP [Det the] [N duck]]]]]]
Small Probabilistic Grammar
1.00 PP -> P NP
0.20 NP -> N
0.65 NP -> Det N
0.10 NP -> Det Adj N
0.05 NP -> NP PP
0.45 VP -> V
0.45 VP -> V NP
0.10 VP -> V NP PP
0.85 S -> NP VP
0.15 S -> S conj S
Parse Probabilities
• $P(T, S) = \prod_{n \in T} p(r(n))$
– T(ree), S(entence), n(ode), r(ule)
– P(T1) = 0.85 * 0.2 * 0.1 * 0.65 * 1 * 0.65 ≈ 0.0072
– P(T2) = 0.85 * 0.2 * 0.45 * 0.05 * 0.65 * 1 * 0.65 ≈ 0.0016
• Select T1
• Best systems achieve 92-93% accuracy
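Computed directly, reusing the hypothetical rule table `D` sketched earlier (POS-level rules are left out of the product, as in the slide's own arithmetic):

```python
import math

def tree_prob(rules, D):
    """P(T, S): product over the tree's nodes of the probability of
    the rule expanding each node."""
    return math.prod(D[r] for r in rules)

# T1: VP attachment -- uses VP -> V NP PP.
t1 = [("S", ("NP", "VP")), ("NP", ("N",)), ("VP", ("V", "NP", "PP")),
      ("NP", ("Det", "N")), ("PP", ("P", "NP")), ("NP", ("Det", "N"))]
# T2: NP attachment -- uses VP -> V NP and NP -> NP PP.
t2 = [("S", ("NP", "VP")), ("NP", ("N",)), ("VP", ("V", "NP")),
      ("NP", ("NP", "PP")), ("NP", ("Det", "N")), ("PP", ("P", "NP")),
      ("NP", ("Det", "N"))]
print(tree_prob(t1, D), tree_prob(t2, D))  # ~0.0072 vs ~0.0016: pick T1
```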
Issues with PCFGs
• Non-local dependencies
– Rules are context-free; language isn’t
• Example:
– Subject vs non-subject NPs
• Subjects: ~90% pronouns (Switchboard)
• NP -> Pron vs NP -> Det Nom: the rule probability doesn't know whether the NP is a subject
• Lexical context:
– Verb subcategorization:
• Send NP PP vs Saw NP PP
– One approach: lexicalization
Probabilistic Lexicalized CFGs
• Key notion: “head”
– Each non-terminal assoc w/lexical head
• E.g. verb with verb phrase, noun with noun phrase
– Each rule must identify RHS element as head
• Heads propagate up tree
– Conceptually like adding 1 rule per head value
• VP(dumped) -> VBD(dumped) NP(sacks) PP(into)
• VP(dumped) -> VBD(dumped) NP(cats) PP(into)
PLCFG with head features
• Head-annotated parse of "Workers dumped sacks into a bin":
[S(dumped) [NP(workers) [NNS Workers]]
   [VP(dumped) [VBD dumped] [NP(sacks) [NNS sacks]]
      [PP(into) [P into] [NP(bin) [DT a] [NN bin]]]]]
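Head propagation can be implemented by marking which RHS child supplies the head for each rule. A sketch with an ad hoc tree encoding; the head positions below are assumptions (the linguistically obvious choices for these rules):

```python
HEAD_CHILD = {                        # which RHS child carries the head
    ("S",  ("NP", "VP")):        1,   # sentence head = head of the VP
    ("VP", ("VBD", "NP", "PP")): 0,   # VP head = the verb
    ("NP", ("NNS",)):            0,
    ("NP", ("DT", "NN")):        1,   # NP head = the noun, not the det
    ("PP", ("P", "NP")):         0,
}

def annotate(tree, out):
    """tree is [tag, word] at preterminals, else [label, child, ...].
    Appends 'label(head)' for every node; returns the head word."""
    label = tree[0]
    if isinstance(tree[1], str):                      # preterminal leaf
        head = tree[1]
    else:
        heads = [annotate(kid, out) for kid in tree[1:]]
        key = (label, tuple(kid[0] for kid in tree[1:]))
        head = heads[HEAD_CHILD[key]]                 # percolate head up
    out.append(f"{label}({head})")
    return head

tree = ["S", ["NP", ["NNS", "workers"]],
             ["VP", ["VBD", "dumped"], ["NP", ["NNS", "sacks"]],
                    ["PP", ["P", "into"],
                           ["NP", ["DT", "a"], ["NN", "bin"]]]]]
out = []
annotate(tree, out)   # out ends with 'S(dumped)'
```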
PLCFGs
• Issue: too many rules
– No corpus is large enough to give reliable counts for every (rule, head) pair
• (Partial) solution: independence assumptions
– Condition the rule on
• Category of LHS and its head
– Condition the head on
• Category of LHS and the parent's head
$P(T, S) = \prod_{n \in T} p(r(n) \mid n, h(n)) \cdot p(h(n) \mid n, h(m(n)))$
– where h(n) is the head of node n and m(n) is n's parent
Disambiguation Example
• The competing NP-attachment analysis ("sacks into a bin" as one NP):
[S(dumped) [NP(workers) [NNS Workers]]
   [VP(dumped) [VBD dumped]
      [NP(sacks) [NP(sacks) [NNS sacks]]
         [PP(into) [P into] [NP(bin) [DT a] [NN bin]]]]]]
Disambiguation Example II
• $p(\text{VP} \to \text{VBD NP PP} \mid \text{VP}, \textit{dumped}) = \frac{C(\text{VP}(\textit{dumped}) \to \text{VBD NP PP})}{\sum_{\beta} C(\text{VP}(\textit{dumped}) \to \beta)} = 6/9 \approx 0.67$
• $p(\text{VP} \to \text{VBD NP} \mid \text{VP}, \textit{dumped}) = \frac{C(\text{VP}(\textit{dumped}) \to \text{VBD NP})}{\sum_{\beta} C(\text{VP}(\textit{dumped}) \to \beta)} = 0/9 = 0$
• $p(\textit{into} \mid \text{PP}, \textit{dumped}) = \frac{C(X(\textit{dumped}) \to \ldots \text{PP}(\textit{into}) \ldots)}{C(X(\textit{dumped}) \to \ldots \text{PP} \ldots)} = 2/9 \approx 0.22$
• $p(\textit{into} \mid \text{PP}, \textit{sacks}) = \frac{C(X(\textit{sacks}) \to \ldots \text{PP}(\textit{into}) \ldots)}{C(X(\textit{sacks}) \to \ldots \text{PP} \ldots)} = 0/0$ (undefined)
Evaluation
• Compare to “Gold Standard” manual parse
– Correct if constituent has same start, end, label
• Three measures:
– Labeled Recall:
• (# correct constituents in candidate parse of s) /
(# constituents in treebank parse of s)
– Labeled Precision:
• (# correct constituents in candidate parse of s) /
(# total constituents in candidate parse of s)
– Cross-brackets: (A (B C)) vs ((A B) C)
• Standard: 90% recall, 90% precision, 1% cross-brackets
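Both measures reduce to set intersection over (start, end, label) triples. A toy sketch with made-up constituents for a hypothetical 5-word sentence:

```python
def labeled_precision_recall(candidate, gold):
    """Constituents are (start, end, label) triples over one sentence."""
    correct = len(set(candidate) & set(gold))
    return correct / len(candidate), correct / len(gold)

# The candidate gets one NP span wrong (ends at 5 instead of 4).
cand = [(0, 5, "S"), (0, 1, "NP"), (1, 5, "VP"), (2, 5, "NP")]
gold = [(0, 5, "S"), (0, 1, "NP"), (1, 5, "VP"), (2, 4, "NP")]
print(labeled_precision_recall(cand, gold))   # (0.75, 0.75)
```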
Dependency Grammars
• Pure lexical dependency
– Relate words via binary syntactic relations
– No constituency, phrase structure rules
• Why?
– Better for languages with flexible word orders
– Abstracts away from surface form/order
• Root node + word nodes
– Link with dependency relations – fixed set
• E.g. subj, obj, dat, loc, attr, mode, comp,…
Dependency Grammar Example
• Dependency analysis of "Workers dumped sacks into a bin":
– <Root> -main-> dumped
– dumped -subj-> workers
– dumped -obj-> sacks
– dumped -loc-> into
– into -pcomp-> bin
– bin -det-> a
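The whole analysis is just a set of labeled head-to-dependent arcs, one arc per word. A sketch:

```python
ROOT = "<Root>"
# (head, relation, dependent) arcs for "Workers dumped sacks into a bin".
arcs = [
    (ROOT,     "main",  "dumped"),
    ("dumped", "subj",  "workers"),
    ("dumped", "obj",   "sacks"),
    ("dumped", "loc",   "into"),
    ("into",   "pcomp", "bin"),
    ("bin",    "det",   "a"),
]
# Well-formedness: each word has exactly one head (the arcs form a tree).
heads = {dep: (head, rel) for head, rel, dep in arcs}
assert len(heads) == len(arcs)
```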