Transcript PCFG

CSA3202:
Human Language Technology
Probabilistic
Phrase Structure Grammars
(PCFGs)
December 2011
Handling Ambiguities
• The Earley Algorithm is equipped to represent
ambiguities efficiently but not to resolve them.
• Methods available for resolving ambiguities
include:
– Semantics: choose the parse that makes sense.
– Statistics: choose the parse that is most likely.
• Probabilistic context-free grammars (PCFGs) offer
a solution.
PCFG
• A PCFG is a 5-tuple (NT, T, P, S, D) where D is a function that assigns a probability to each rule p ∈ P.
• A PCFG augments each rule with a conditional probability:
A → β [p]
• Formally, this is the probability of a given expansion given the LHS non-terminal.
Example PCFG
[Figure: an example PCFG, each rule annotated with its probability]
A → β [p]
• p expresses the probability that the non-terminal A is expanded as β.
• This is variously written P(A → β | A), P(A → β), or P(β | A).
Example PCFG Fragment
• S NP VP [.80]
S Aux NP VP [.15]
S VP [.05]
• Sum of conditional probabilities for a given
ANT = 1
• PCFG can be used to estimate probabilities
of each parse-tree for sentence S.
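As an illustration, the fragment could be stored as a mapping from rules to conditional probabilities; a minimal Python sketch (the representation itself is an assumption, not part of the slides) that also checks the sum-to-1 constraint:

```python
# A minimal sketch: the S rules above, keyed by (LHS, RHS) pairs.
pcfg_fragment = {
    ("S", ("NP", "VP")): 0.80,
    ("S", ("Aux", "NP", "VP")): 0.15,
    ("S", ("VP",)): 0.05,
}

# The conditional probabilities for the LHS non-terminal S sum to 1.
assert abs(sum(p for (lhs, _), p in pcfg_fragment.items() if lhs == "S") - 1.0) < 1e-9
```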
Probability of a Parse Tree
• For sentence S, the probability assigned by
a PCFG to a parse tree T is given by
P(T) = ∏_{n ∈ T} P(r(n))
• i.e. the product of the probabilities of all the
rules r used to expand each node n in T.
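As a sketch, P(T) can be computed by walking the tree and multiplying rule probabilities. Here trees are assumed to be nested tuples (label, child, …) with words as strings at the leaves, and rule_prob a rule-to-probability mapping like the fragment above (both encoding choices are assumptions):

```python
def tree_probability(tree, rule_prob):
    """P(T) = product of P(r(n)) over every non-leaf node n of T."""
    if isinstance(tree, str):      # a leaf word contributes no rule
        return 1.0
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = rule_prob[(label, rhs)]    # probability of the rule expanding this node
    for child in children:
        p *= tree_probability(child, rule_prob)
    return p
```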
2 Parses for
“book the dinner flight”
[Figure: the two parse trees]
P(T1) = 2.2 × 10⁻⁶
P(T2) = 6.1 × 10⁻⁷
Ambiguous Sentence
For an ambiguous sentence S, P(S) is the sum of the probabilities of each reading, i.e.
P(S(a)) = 1.5 × 10⁻⁶
P(S(b)) = 1.7 × 10⁻⁶
P(S) = 3.2 × 10⁻⁶
The Parsing Problem for PCFGs
• The parsing problem for PCFGs is to produce the most likely parse for a given sentence, i.e. to compute the tree T spanning the sentence whose probability is maximal.
• The CKY algorithm assumes that the grammar is in Chomsky Normal Form:
– No ε-productions
– Rules of the form A → B C or A → a
CKY Algorithm – Base Case
• Base case: covering input strings of length 1 (i.e. individual words). In CNF, the probability p has to come from that of the corresponding rule
• A → w [p]
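A sketch of just this base case in Python, assuming the lexical rules are stored as a mapping from each word to its (A, p) pairs (that storage format is an assumption):

```python
def fill_base_case(words, lexical_rules):
    table = {}
    for k, word in enumerate(words, start=1):
        # pi[k-1, k, A] := P(A -> wk) for every rule A -> wk
        for lhs, p in lexical_rules.get(word, []):
            table[(k - 1, k, lhs)] = p
    return table
```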
CKY Algorithm
Recursive Case
• Recursive case: input strings of length > 1:
A ⇒* wij if and only if there is a rule
A → B C
and some i < k < j such that B derives wik and C derives wkj.
• In this case P(wij) is obtained by multiplying together P(wik), P(wkj), and P(A → B C).
• These probabilities are stored in other parts of the table.
• Take the maximum value over all rules and split points.
Probabilistic CKY Algorithm
for Sentence of Length n
1. for k := 1 to n do
2.   π[k-1, k, A] := P(A → wk)
3.   for i := k-2 downto 0 do
4.     for j := k-1 downto i+1 do
5.       π[i, k, A] := max[ π[i, j, B] × π[j, k, C] × P(A → B C) ]
         for each A → B C ∈ G
6. return π[0, n, S]
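A runnable Python sketch of the algorithm above. The grammar encoding (lexical maps each word to (A, p) pairs, binary lists (A, B, C, p) tuples for rules A → B C) and the start symbol name "S" are assumptions:

```python
def prob_cky(words, lexical, binary):
    n = len(words)
    pi = {}      # pi[(i, k, A)]: probability of the best A covering words i+1..k
    back = {}    # backpointers, for recovering the best tree
    for k in range(1, n + 1):
        # base case: width-1 spans
        for lhs, p in lexical.get(words[k - 1], []):
            pi[(k - 1, k, lhs)] = p
        # recursive case: wider spans ending at k, split at j
        for i in range(k - 2, -1, -1):
            for j in range(k - 1, i, -1):
                for a, b, c, p in binary:
                    cand = pi.get((i, j, b), 0.0) * pi.get((j, k, c), 0.0) * p
                    if cand > pi.get((i, k, a), 0.0):
                        pi[(i, k, a)] = cand
                        back[(i, k, a)] = (j, b, c)
    return pi.get((0, n, "S"), 0.0), back

# Toy run: P(T) = P(S -> NP VP) * P(NP -> she) * P(VP -> sleeps) = 0.8
prob, _ = prob_cky(
    ["she", "sleeps"],
    {"she": [("NP", 1.0)], "sleeps": [("VP", 1.0)]},
    [("S", "NP", "VP", 0.8)],
)
print(prob)  # 0.8
```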
[Diagram: the input span wi+1 … wk is split at position j into wi+1 … wj (derived from B) and wj+1 … wk (derived from C)]
Probabilistic CKY Table
[Figure: a worked probabilistic CKY table]
Problems with PCFGs
• Fundamental Independence Assumptions: a PCFG assumes that the expansion of any one non-terminal is independent of the expansion of any other non-terminal.
• Hence rule probabilities are multiplied together.
Fundamental Independence
Assumptions
• P(Nk,l → γ | anything above Nk,l in the tree) = P(Nk,l → γ)
• P(Nk,l → γ | anything outside words k through l) = P(Nk,l → γ)
[Figure: a node Nk,l dominating the words wk … wl]
Evidence Against
Independence Assumptions
• By examination of text corpora, it has been shown (Kuno 1972) that there is a strong tendency (c. 2:1) in English and other languages for the subject of a sentence to be a pronoun:
– She's able to take her baby to work versus
– Joanna worked until she had a family
• whilst the object tends to be a non-pronoun
– All the people signed the confessions
– Some laws prohibit it
Dependencies cannot be stated
• These dependencies could be captured if it
were possible to say that the probabilities
associated with, e.g.
NP → Pronoun or
NP → Det Noun
depend on whether NP is subject or object
• However, this cannot normally be expressed in a standard PCFG.
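To make the limitation concrete, the numbers below are invented for illustration (roughly matching the 2:1 subject tendency above): a standard PCFG can store only the single NP distribution, not the function-specific distributions the data actually exhibits:

```python
# Illustrative, invented probabilities: one distribution per non-terminal
# is all a standard PCFG allows.
p_np         = {"NP -> Pronoun": 0.25, "NP -> Det Noun": 0.75}
# What capturing the dependency would require (not expressible in a PCFG):
p_np_subject = {"NP -> Pronoun": 0.66, "NP -> Det Noun": 0.34}
p_np_object  = {"NP -> Pronoun": 0.09, "NP -> Det Noun": 0.91}
```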
Lack of Sensitivity to the Properties of Individual Words
• Lexical information can play an important role in selecting between alternative interpretations. Consider the sentence:
• "Moscow sent soldiers into Afghanistan."
• NP → NP PP
VP → VP PP
• These give rise to 2 parse trees
PP Attachment Ambiguity
33% of PPs attach to VPs
67% of PPs attach to NPs
[Figure: two parse trees for "Moscow sent soldiers into Afghanistan": one attaching the PP "into Afghanistan" to the NP "soldiers", the other attaching it to the VP]
Lexical Properties
• In this case the raw statistics are misleading
and yield the wrong conclusion.
• The correct parse should be decided on the basis of the lexical properties of the verb "send" alone, since we know that the basic pattern for this verb is
(NP) send (NP) (PP[into])
where the PP[into] attaches to the VP.
Lexicalised PCFGs
• Basic idea: each syntactic constituent is
associated with a head which is a single
word.
• Each non-terminal in a parse tree is
annotated with that single word.
Lexicalised Tree
[Figure: a parse tree in which every non-terminal is annotated with its head word]
Generating Lexicalised Parse Trees
• To generate such a tree, each rule must
identify exactly one right hand side
constituent to be the head daughter.
• Then the headword of a node is inherited
from the headword of the head daughter.
• In the case of a lexical item, the head is
clearly itself (though the word might
undergo minor inflectional modification).
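A sketch of this head-percolation scheme, assuming each rule declares which daughter is the head (the head_daughter table and the nested-tuple tree encoding are assumptions):

```python
# Which RHS position is the head daughter, per rule (an assumed table).
head_daughter = {("NP", ("Det", "N")): 1, ("VP", ("V", "NP")): 0}

def percolate(tree):
    """Annotate a nested-tuple tree: returns (label, head_word, children)."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return (label, children[0], children)          # a lexical item heads itself
    annotated = [percolate(c) for c in children]
    rhs = tuple(c[0] for c in annotated)
    head = annotated[head_daughter[(label, rhs)]][1]   # inherit from head daughter
    return (label, head, annotated)

print(percolate(("NP", ("Det", "the"), ("N", "flight"))))
# ('NP', 'flight', [('Det', 'the', ['the']), ('N', 'flight', ['flight'])])
```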
Finding the Head Constituent
• In some cases this is very easy, e.g.
NP[N] → Det N
VP[V] → V NP
• In other cases it isn't
water meter adjustment lever
• Modern linguistic theories include a component that defines what heads are.
Collins (1999)
• To find the head of an NP
• If it is tagged "possessive" then return the
last word (e.g. John's mother's car)
• Else search left to right for the first child
that is a lexical noun
• Else search left to right for the first NP
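A Python sketch of these NP rules, assuming children arrive as (tag, word) pairs with Penn-style tags (both assumptions; Collins' full rules cover more cases than the slide lists):

```python
LEXICAL_NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}   # assumed Penn-style noun tags

def np_head(children, possessive=False):
    """children: list of (tag, word) pairs, left to right."""
    # 1. possessive NP: return the last word (e.g. "John's mother's car" -> "car")
    if possessive:
        return children[-1]
    # 2. else the first lexical noun, searching left to right
    for tag, word in children:
        if tag in LEXICAL_NOUN_TAGS:
            return (tag, word)
    # 3. else the first NP child, searching left to right
    for tag, word in children:
        if tag == "NP":
            return (tag, word)
    return children[-1]   # fallback: not specified on the slide
```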
Discovery of Probabilities
Normal Rules
• Use a corpus of already parsed sentences.
• Example: Penn Treebank (Marcus et al. 1993) – parse trees for the 1M-word Brown Corpus.
P(α → β | α) = C(α → β) / Σγ C(α → γ)
= C(α → β) / C(α)
• Alternatively, parse a raw corpus and take statistics; this has to account for ambiguity.
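A sketch of this maximum-likelihood estimate over a treebank of nested-tuple trees (the same assumed encoding as earlier): count each rule and divide by the count of its LHS:

```python
from collections import Counter

def estimate_probabilities(treebank):
    rule_counts, lhs_counts = Counter(), Counter()
    def visit(tree):
        if isinstance(tree, str):                # leaf word: no rule here
            return
        label, *children = tree
        rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
        rule_counts[(label, rhs)] += 1           # C(alpha -> beta)
        lhs_counts[label] += 1                   # C(alpha)
        for child in children:
            visit(child)
    for tree in treebank:
        visit(tree)
    # P(alpha -> beta | alpha) = C(alpha -> beta) / C(alpha)
    return {rule: n / lhs_counts[rule[0]] for rule, n in rule_counts.items()}
```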
Discovery of Probabilities
Lexicalised Rules
• Need to establish probabilities of, e.g.
VP(dumped) → V(dumped) NP(sacks) PP(into)
VP(dumped) → V(dumped) NP(cats) PP(into)
VP(dumped) → V(dumped) NP(hats) PP(into)
VP(dumped) → V(dumped) NP(sacks) PP(above)
• Problem: no corpus is big enough to train this number of rules.
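A back-of-the-envelope illustration of the sparsity (the 40,000-word vocabulary is an assumed figure, not from the slides):

```python
# For VP(dumped) -> V(dumped) NP(h1) PP(h2), the verb fixes the V and VP
# heads, but h1 and h2 each range over the whole vocabulary.
vocab = 40_000
print(f"{vocab ** 2:,} lexicalised variants of this one rule per verb")
# 1,600,000,000
```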