Transcript PPT
CSA3050: Natural Language Algorithms
Probabilistic Phrase Structure Grammars (PCFGs)
December 2004
Handling Ambiguities
• The Earley Algorithm is equipped to represent
ambiguities efficiently but not to resolve them.
• Methods available for resolving ambiguities include:
– Semantics (choose the parse that makes sense).
– Statistics (choose the parse that is most likely).
• Probabilistic context-free grammars (PCFGs) offer
a solution.
PCFG
• A PCFG is a 5-tuple (NT, T, P, S, D) where D is a function that assigns a probability to each rule p ∈ P.
• A PCFG augments each rule with a conditional probability:
A → β [p]
• Formally, this is the probability of a given expansion given the LHS non-terminal.
Example PCFG
[figure: example PCFG not transcribed]
Example PCFG Fragment
• S → NP VP [.80]
S → Aux NP VP [.15]
S → VP [.05]
• The sum of the conditional probabilities for a given LHS non-terminal = 1.
• A PCFG can be used to estimate the probability of each parse tree for a sentence S.
Probability of a Parse Tree
• For sentence S, the probability assigned by a PCFG to a parse tree T is given by
P(T) = ∏_{n ∈ T} P(r(n))
where n is a node of T and r(n) is the production rule that produced n,
• i.e. the product of the probabilities of all the rules r used to expand each node n in T.
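The product above can be sketched in a few lines of Python; the toy grammar, rule probabilities, and tree below are invented for illustration, not taken from the slides.

```python
# Sketch only: a toy PCFG (rules and probabilities invented) mapping
# (LHS, RHS-tuple) to the rule's conditional probability.
pcfg = {
    ("S", ("NP", "VP")): 0.80,
    ("VP", ("V", "NP")): 0.20,
    ("NP", ("he",)): 0.30,
    ("NP", ("Mary",)): 0.01,
    ("V", ("saw",)): 0.05,
}

def tree_prob(tree, pcfg):
    """P(T) = product of P(r(n)) over every internal node n of T.
    A tree is (label, children); a leaf child is the word itself."""
    label, children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = pcfg[(label, rhs)]            # P(r(n)) for this node
    for child in children:
        if not isinstance(child, str):
            p *= tree_prob(child, pcfg)
    return p

tree = ("S", [("NP", ["he"]),
              ("VP", [("V", ["saw"]), ("NP", ["Mary"])])])
# 0.80 * 0.30 * 0.20 * 0.05 * 0.01 = 2.4e-05
```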
Ambiguous Sentence
[figure: two parse trees TL and TR for an ambiguous sentence, not transcribed]
• P(TL) = 1.5 × 10^-6
• P(TR) = 1.7 × 10^-6
• P(S) = P(TL) + P(TR) = 3.2 × 10^-6
The Parsing Problem for PCFGs
• The parsing problem for PCFGs is to produce the most likely parse for a given sentence, i.e. to compute the parse tree T spanning the sentence whose probability is maximal.
• The CKY algorithm assumes that the grammar is in Chomsky Normal Form:
– No ε-productions.
– Rules of the form A → B C or A → a.
CKY Algorithm – Base Case
• Base case: covering input strings of length 1 (i.e. individual words). In CNF, the probability p has to come from that of the corresponding rule
A → w [p]
CKY Algorithm – Recursive Case
• Recursive case: input strings of length > 1:
A ⇒* w_ij if and only if there is a rule A → B C and some i < k < j such that B derives w_ik and C derives w_kj.
• In this case P(w_ij) is obtained by multiplying together P(w_ik) and P(w_kj).
• These probabilities are found in other parts of the table.
• Take the maximum value.
Probabilistic CKY Algorithm
for Sentence of Length n
1. for k := 1 to n do
2.   π[k-1,k,A] := P(A → w_k)
3.   for i := k-2 downto 0 do
4.     for j := k-1 downto i+1 do
5.       π[i,k,A] := max [π[i,j,B] × π[j,k,C] × P(A → BC)]
           for each A → BC ∈ G
6. return π[0,n,S]
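A direct Python transcription of the loop above might look as follows; the toy grammar is invented for illustration, with `lexical` holding the A → w rules and `binary` the A → B C rules.

```python
def pcky(words, lexical, binary):
    """Probabilistic CKY sketch. lexical: {(A, w): p}; binary: {(A, B, C): p}.
    pi[(i, k, A)] is the best probability of A deriving words[i:k]."""
    n = len(words)
    pi = {}
    for k in range(1, n + 1):
        # Base case: spans of length 1 (line 2 of the pseudocode).
        for (A, w), p in lexical.items():
            if w == words[k - 1]:
                pi[(k - 1, k, A)] = max(pi.get((k - 1, k, A), 0.0), p)
        # Recursive case: all longer spans ending at k (lines 3-5).
        for i in range(k - 2, -1, -1):
            for j in range(k - 1, i, -1):
                for (A, B, C), p in binary.items():
                    cand = pi.get((i, j, B), 0.0) * pi.get((j, k, C), 0.0) * p
                    if cand > pi.get((i, k, A), 0.0):
                        pi[(i, k, A)] = cand
    return pi.get((0, n, "S"), 0.0)

lexical = {("NP", "she"): 0.4, ("V", "eats"): 0.6, ("NP", "fish"): 0.3}
binary = {("S", "NP", "VP"): 1.0, ("VP", "V", "NP"): 0.7}
# pcky(["she", "eats", "fish"], lexical, binary) is 0.4 * 0.6 * 0.3 * 0.7 * 1.0
```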
[figure: CKY chart diagram, showing the span w_{i+1} … w_k split at position j into w_{i+1} … w_j and w_{j+1} … w_k]
Probabilistic Earley
Non-Probabilistic Completer
procedure Completer((B → Z •, [j,k]))
  for each (A → X • B Y, [i,j]) in chart[j] do
    enqueue((A → X B • Y, [i,k]), chart[k])
Probabilistic Completer
procedure Completer((B → Z •, [j,k], P_jk))
  for each (A → X • B Y, [i,j], P_ij) in chart[j] do
    enqueue((A → X B • Y, [i,k], P_ij × P_jk), chart[k])
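The probabilistic completer can be sketched in Python; the item format below, a tuple (lhs, rhs, dot, start, end, prob), is invented for illustration, and a full parser would also need the predictor and scanner steps.

```python
def completer(completed, chart):
    """Probabilistic Earley completer sketch. `completed` is an item
    (B, rhs, dot, j, k, P_jk) whose dot has reached the end of rhs."""
    B, _, _, j, k, p_jk = completed
    for (A, rhs, dot, i, _, p_ij) in list(chart[j]):
        if dot < len(rhs) and rhs[dot] == B:
            # Advance the dot over B, multiplying the two probabilities.
            chart[k].append((A, rhs, dot + 1, i, k, p_ij * p_jk))

chart = {1: [("S", ("NP", "VP"), 1, 0, 1, 0.5)], 3: []}
completer(("VP", ("V", "NP"), 2, 1, 3, 0.2), chart)
# chart[3] now holds ("S", ("NP", "VP"), 2, 0, 3, 0.5 * 0.2)
```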
Discovery of Probabilities
Normal Rules
• Use a corpus of already-parsed sentences.
• Example: Penn Treebank (Marcus et al. 1993)
– parse trees for the 1M-word Brown Corpus
– skeleton parsing: partial parse, leaving out the “hard” things (such as PP-attachment)
• Parse the corpus and take statistics. This has to account for ambiguity.
• P(α → β | α) = C(α → β) / Σ_γ C(α → γ) = C(α → β) / C(α)
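This estimate is a straightforward relative frequency over rule counts; a sketch, with invented treebank counts:

```python
from collections import Counter

# Invented treebank counts of (LHS, RHS) rule occurrences.
rule_counts = Counter({
    ("NP", ("Det", "N")): 700,
    ("NP", ("Pron",)): 200,
    ("NP", ("NP", "PP")): 100,
})

def rule_prob(lhs, rhs, counts):
    """P(alpha -> beta | alpha) = C(alpha -> beta) / C(alpha)."""
    c_lhs = sum(c for (a, _), c in counts.items() if a == lhs)
    return counts[(lhs, rhs)] / c_lhs

# rule_prob("NP", ("Det", "N"), rule_counts) == 700 / 1000
```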
Penn Treebank – Example 1
((S (NP
(NP Pierre Vinken),
(ADJP (NP 61 years) old, ))
will
(VP join
(NP the board)
(PP as
(NP a nonexecutive director))
(NP Nov 29)))
.)
Penn Treebank – Example 2
( (S (NP (DT The) (NNP Fulton) (NNP County) (NNP Grand) (NNP Jury) )
(VP (VBD said) (NP (NNP Friday) )
(SBAR (-NONE- 0)
(S (NP (DT an) (NN investigation)
(PP (IN of) (NP (NP (NNP Atlanta) ) (POS 's)
(JJ recent) (JJ primary) (NN election) )))
(VP (VBD produced)
(NP
(OQUOTE OQUOTE) (DT no) (NN evidence)
(CQUOTE CQUOTE)
(SBAR (IN that)
(S (NP (DT any) (NNS irregularities) )
(VP (VBD took) (NP (NN place) ))))))))))
(PERIOD PERIOD) )
Problems with PCFGs
• Fundamental Independence Assumption (FIA): a PCFG assumes that the expansion of any one non-terminal is independent of the expansion of any other non-terminal.
• Hence rule probabilities are always simply multiplied together.
• The FIA is not always realistic, however.
Problems with PCFGs
• Difficulty in representing dependencies
between parse tree nodes.
– Structural Dependencies between the expansion
of a node N and anything above N in the parse
tree, such as M
– Lexical Dependencies between the expansion
of a node N and occurrences of particular
words in text segments dominated by N.
Tree Dependencies
[figure: a node N spanning positions q..r, dominated by a node M spanning p..s]
Structural Dependency
• By examination of text corpora, it has been shown (Kuno 1972) that there is a strong tendency (c. 2:1) in English and other languages for the subject of a sentence to be a pronoun:
– She's able to take her baby to work versus
– Joanna worked until she had a family
• whilst the object tends to be a non-pronoun:
– All the people signed the confessions versus
– Some laws prohibit it
Expansion sometimes depends
on ancestor nodes
[figure: parse tree for "he saw Mr. Bush": the subject NP expands as Pron (he), the object NP as N (Mr. Bush)]
Dependencies cannot be stated
• These dependencies could be captured if it were possible to say that the probabilities associated with, e.g.
NP → Pron
NP → N
depend on whether the NP is subject or object.
• However, this cannot normally be said in a standard PCFG.
Lexical Dependencies
• Consider sentence:
"Moscow sent soldiers into Afghanistan."
• Suppose grammar includes
NP → NP PP
VP → VP PP
• There will typically be two parse trees.
PP Attachment Ambiguity
33% of PPs attach to VPs
67% of PPs attach to NPs
[figure: two parse trees for "Moscow sent soldiers into Afghanistan": in one the PP attaches to the VP, in the other to the NP]
PP Attachment Ambiguity
33% of PPs attach to VPs
67% of PPs attach to NPs
[figure: two parse trees for "Moscow sent soldiers from Afghanistan": in one the PP attaches to the VP, in the other to the NP]
Lexical Properties
• Raw statistics on the use of these two rules suggest that
NP → NP PP (67%)
VP → VP PP (33%)
• In this case the raw statistics are misleading and yield the wrong conclusion.
• The correct parse should be decided on the basis of the lexical properties of the verb "send … into" alone, since we know that the basic pattern for this verb is
(NP) send (NP) (PP_into)
where the PP_into attaches to the VP.
Lexicalised PCFGs
• Basic idea: each syntactic constituent is
associated with a head which is a single
word.
• Each non-terminal in a parse tree is
annotated with that single word.
• Michael Collins (1999), Head-Driven Statistical Models for Natural Language Parsing, PhD thesis (see the author's website).
Lexicalised Tree
[figure: lexicalised parse tree (each non-terminal annotated with its headword) not transcribed]
Generating Lexicalised Parse Trees
• To generate such a tree, each rule must
identify exactly one right hand side
constituent to be the head daughter.
• Then the headword of a node is inherited
from the headword of the head daughter.
• In the case of a lexical item, the head is
clearly itself (though the word might
undergo minor inflectional modification).
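The inheritance described above can be sketched as a recursion over the tree; the head-daughter table below is invented for illustration (real parsers use hand-written head rules).

```python
# Invented head-daughter table: for each rule, the index of the RHS
# constituent that supplies the headword.
head_daughter = {
    ("S", ("NP", "VP")): 1,    # the VP is the head daughter of S
    ("VP", ("V", "NP")): 0,    # the V is the head daughter of VP
    ("NP", ("Det", "N")): 1,   # the N is the head daughter of NP
}

def headword(tree):
    """tree = (label, children); a leaf child is the word itself,
    and a lexical item's head is the word."""
    label, children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return children[0]
    rhs = tuple(c[0] for c in children)
    return headword(children[head_daughter[(label, rhs)]])

tree = ("S", [("NP", [("Det", ["the"]), ("N", ["man"])]),
              ("VP", [("V", ["asked"]),
                      ("NP", [("Det", ["the"]), ("N", ["time"])])])])
# headword(tree) == "asked"  (inherited S <- VP <- V)
```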
Finding the Head Constituent
• In some cases this is very easy, e.g.
NP[N] → Det N (the man)
VP[V] → V NP (... asked John)
• In other cases it isn't:
PP[?] → P NP (to London)
• Many modern linguistic theories include a component that defines what heads are.
Discovery of Probabilities
Lexicalised Rules
• Need to establish individual probabilities of, e.g.
VP(dumped) → V(dumped) NP(sacks) PP(into)
VP(dumped) → V(dumped) NP(cats) PP(into)
VP(dumped) → V(dumped) NP(hats) PP(into)
VP(dumped) → V(dumped) NP(sacks) PP(above)
• Problem: no corpus is big enough to train with this number of rules (nearly all the rules would have zero counts).
• Need to make independence assumptions that allow counts
to be clustered.
• Which independence assumptions?
Charniak’s (1997) Approach
• Normal PCFG: the probability is conditioned only on the syntactic category, i.e. p(r(n) | c(n)).
• Charniak additionally conditioned the probability of a given rule expansion on the head of the non-terminal:
p(r(n) | c(n), h(n))
• N.B. This approach pools the statistics of all the individual rules on the previous slide together, i.e. as
VP(dumped) → V NP PP
Probability of Head
• Now that we have added heads as a conditioning
factor, we must also decide how to compute the
probability of a head.
• The null assumption, that all heads are equally
probable, is unrealistic (different verbs have
different frequencies of occurrence).
• Charniak therefore adopted a better assumption:
probability of a node n having head h depends on
– syntactic category of n and
– head of n’s mother.
Including Head of Mother
• So instead of equal probabilities for all
heads, we have
p(h(n) = word_i | c(n), h(m(n)))
• Relating this to the circled node in our
previous figure, we have
p(h(n)=sacks| c(n)=NP, h(m(n))=dumped)
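One way such a head probability could be estimated is by relative frequency over treebank counts; a sketch, with invented counts:

```python
from collections import Counter

# Invented counts of (category of n, head of n's mother, head of n).
head_counts = Counter({
    ("NP", "dumped", "sacks"): 6,
    ("NP", "dumped", "cats"): 2,
    ("NP", "dumped", "hats"): 2,
})

def head_prob(cat, mother_head, head, counts):
    """p(h(n) = head | c(n) = cat, h(m(n)) = mother_head)."""
    total = sum(v for (c, m, _), v in counts.items()
                if c == cat and m == mother_head)
    return counts[(cat, mother_head, head)] / total

# head_prob("NP", "dumped", "sacks", head_counts) == 6 / 10
```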
Probability of a Complete Parse
STANDARD PCFG
• For sentence S, the probability assigned by a PCFG to a parse tree T was given by
P(T) = ∏_{n ∈ T} P(r(n))
where n is a node of T and r(n) is the production rule that produced n.
HEAD-DRIVEN PCFG
• To include head probabilities, the probability of a complete parse becomes
P(T) = ∏_{n ∈ T} p(r(n) | c(n), h(n)) × p(h(n) | c(n), h(m(n)))
Evaluating Parsers
• Let
A = # correct constituents in candidate parse
B = # total constituents in treebank parse
C = # total constituents in candidate parse
• Labelled Recall = A/B
• Labelled Precision = A/C
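These measures can be sketched directly by treating each labelled constituent as a (label, start, end) triple; the example spans below are invented.

```python
def recall_precision(candidate, treebank):
    """Labelled recall = A/B and labelled precision = A/C, where A is the
    number of candidate constituents also present in the treebank parse."""
    a = len(set(candidate) & set(treebank))
    return a / len(treebank), a / len(candidate)

cand = {("NP", 0, 2), ("VP", 2, 5), ("PP", 3, 5), ("S", 0, 5)}
gold = {("NP", 0, 2), ("VP", 2, 5), ("NP", 3, 5), ("S", 0, 5)}
# recall_precision(cand, gold) == (3/4, 3/4)
```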