Parsing with Features
CS60057 Speech & Natural Language Processing
Autumn 2007
Lecture 16
5 September 2007
Parsing with features
We need to constrain the rules in CFGs, for example
to coerce agreement within and between constituents
to pass features around
to enforce subcategorisation constraints
Features can be easily added to our grammars
And later we’ll see that feature bundles can completely replace constituents
Parsing with features
Rules can stipulate values, or placeholders (variables) for values
Features can be used within the rule, or passed up via the mother nodes
Example: subject-verb agreement
S → NP VP [if NP and VP agree in number]
number of NP depends on noun and/or determiner
number of VP depends on verb
S → NP(num=X) VP(num=X)
NP(num=X) → det(num=X) n(num=X)
VP(num=X) → v(num=X) NP(num=?)
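The sharing of the variable X above can be checked mechanically. A minimal Python sketch, assuming categories and num values are passed around as simple tuples (the names agree, rule_rhs and daughters are illustrative, not from any parser library):

def agree(rule_rhs, daughters):
    # rule_rhs: list of (category, num-spec), where num-spec is a constant
    # such as "sg"/"pl", a variable name such as "X", or "?" (unconstrained).
    # daughters: list of (category, num) for the words already recognised.
    # Returns the variable bindings on success, or None on a mismatch.
    bindings = {}
    for (cat, spec), (dcat, dnum) in zip(rule_rhs, daughters):
        if cat != dcat:
            return None
        if spec == "?":                    # unconstrained slot: anything goes
            continue
        if spec.isupper():                 # a variable such as X
            if bindings.setdefault(spec, dnum) != dnum:
                return None                # X already bound to a different value
        elif spec != dnum:                 # a constant such as sg
            return None
    return bindings

# NP(num=X) -> det(num=X) n(num=X)
print(agree([("det", "X"), ("n", "X")], [("det", "sg"), ("n", "sg")]))  # {'X': 'sg'}
print(agree([("det", "X"), ("n", "X")], [("det", "sg"), ("n", "pl")]))  # None

Underspecified daughter values (the "unbound variables" of the next slides) are not handled here; that is where unification comes in.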
Declarative nature of features
The rules can be used in various ways
NP(num=X) → det(num=X) n(num=X)
To build an NP only if det and n agree (bottom-up)
When generating an NP, to choose an n which agrees with the det (if working L-to-R) (top-down)
To show that the num value for an NP comes from its components (percolation)
To ensure that the num value is correctly set when generating an NP (inheritance)
To block ill-formed input
this = det(num=sg)
these = det(num=pl)
the = det(num=?)
man = n(num=sg)
men = n(num=pl)
Example: NP(num=sg) → det(num=sg) n(num=sg) licenses "this man"; "this men" is blocked because n(num=pl) clashes with det(num=sg).
Use of variables
NP(num=X) → det(num=X) n(num=X)
this = det(num=sg)
these = det(num=pl)
the = det(num=?)
man = n(num=sg)
men = n(num=pl)
Unbound (unassigned) variables (i.e. variables with a free value): "the" can combine with any value for num.
Unification means that the num value for "the" is set to sg:
NP(num=sg) → det(num=?) n(num=sg) over "the man".
Parsing with features
Features must be compatible
Formalism should allow features to remain unspecified
Feature mismatch can be used to block false analyses, and disambiguate
e.g. they can fish ~ he can fish ~ he cans fish
Formalism may have attribute-value pairs, or rely on argument position
e.g. NP(_num,_sem) → det(_num) n(_num,_sem)
an = det(sing)
the = det(_num)
man = n(sing,hum)
Parsing with features
Using features to impose subcategorization constraints
VP → v            e.g. dance
VP → v NP         e.g. eat
VP → v NP NP      e.g. give
VP → v PP         e.g. wait (for)
VP(_num) → v(_num,intr)
VP(_num) → v(_num,trans) NP
VP(_num) → v(_num,ditrans) NP NP
VP(_num) → v(_num,prepobj(_case)) PP(_case)
PP(_case) → prep(_case) NP
dance = v(plur,intr)
dances = v(sing,intr)
danced = v(_num,intr)
waits = v(sing,prepobj(for))
for = prep(for)
Parsing with features
(top-down)
Grammar:
S → NP(_num) VP(_num)
NP(_num) → det(_num) n(_num)
VP(_num) → v(_num,intrans)
VP(_num) → v(_num,trans) NP(_1)
Input: the man shot those elephants
Trace:
S → NP(_num) VP(_num)
NP(_num) → det(_num) n(_num)
the = det(_num), man = n(sing), so _num = sing
VP(sing) → v(sing,intrans) fails, since shot = v(sing,trans)
VP(sing) → v(sing,trans) NP(_1)
NP(_1) → det(_1) n(_1)
those = det(pl), elephants = n(pl), so _1 = pl
(Resulting tree: S over NP(sing) = det(sing) "the" + n(sing) "man", and VP(sing) = v(sing,trans) "shot" + NP(pl) = det(pl) "those" + n(pl) "elephants".)
Feature structures
Instead of attaching features to the symbols, we can parse with symbols made up entirely of attribute-value pairs: "feature structures"
Can be used in the same way as seen previously
Values can be atomic …
[ATTR1 VAL1
 ATTR2 VAL2
 ATTR3 VAL3]
e.g.
[CAT NP
 NUMBER SG
 PERSON 3]
… or embedded feature structures:
[CAT NP
 AGR [NUM SG
      PERS 3]]
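For the code sketches in the rest of these notes, feature structures can be modelled as nested Python dictionaries; this representation is a working assumption of the notes, not a standard library:

flat = {"CAT": "NP", "NUMBER": "SG", "PERSON": "3"}          # atomic values only
nested = {"CAT": "NP", "AGR": {"NUM": "SG", "PERS": "3"}}    # an embedded feature structure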
Unification
Feature Structures
A set of feature-value pairs
No feature occurs in more than one feature-value pair (a partial function from features to values)
Circular structures are prohibited.
Structured Feature Structure
Part of a third-person singular NP:
[CAT NP
 AGR [NUM SG
      PERS 3]]
Reentrant Feature Structure
Two features can share a feature structure as value.
Not the same thing as them having equivalent values!
Two distinct feature structure values:
One shared value (reentrant feature structure):
They can be coindexed:
[CAT S
 HEAD [AGR (1) [NUM SG
                PERS 3]
       SUBJ [AGR (1)]]]
Parsing with feature structures
Grammar rules can specify assignments to or equations between feature structures
Expressed as “feature paths”
e.g. HEAD.AGR.NUM = SG
[CAT S
 HEAD [AGR (1) [NUM SG
                PERS 3]
       SUBJ [AGR (1)]]]
Subsumption (⊑)
(Partial) ordering of feature structures
Based on relative specificity
The second structure carries less information, but is more general (it subsumes the first).
Subsumption
A more abstract (less specific) feature structure subsumes an equally or more specific one.
A feature structure F subsumes a feature structure G (F ⊑ G) if and only if:
For every feature x in F, F(x) ⊑ G(x) (where F(x) means the value of the feature x of the feature structure F).
For all paths p and q in F such that F(p) = F(q), it is also the case that G(p) = G(q).
An atomic feature structure neither subsumes nor is subsumed by another atomic feature structure.
Variables subsume all other feature structures.
A feature structure F subsumes a feature structure G (F ⊑ G) if all parts of F subsume the corresponding parts of G.
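A sketch of a subsumption check on the nested-dictionary representation introduced earlier; it implements only the first clause of the definition (reentrancy, i.e. the path-equality clause, is ignored, and equal atomic values are treated as mutually subsuming, a simplification relative to the slide above):

def subsumes(f, g):
    # True if feature structure f subsumes (is at least as general as) g.
    if not isinstance(f, dict) or not isinstance(g, dict):
        return f == g                 # atomic values must be identical
    return all(feat in g and subsumes(val, g[feat])
               for feat, val in f.items())

print(subsumes({"NUMBER": "SG"}, {"NUMBER": "SG", "PERSON": "3"}))   # True
print(subsumes({"NUMBER": "SG"}, {"PERSON": "3"}))                   # False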
Subsumption Example
Consider the following feature structures:
(1) [NUMBER SG]
(2) [PERSON 3]
(3) [NUMBER SG
     PERSON 3]
(1) ⊑ (3)
(2) ⊑ (3)
but there is no subsumption relation between (1) and (2)
Feature Structures in The Grammar
We will incorporate the feature structures and the unification process as follows:
All constituents (non-terminals) will be associated with feature structures.
Sets of unification constraints will be associated with grammar rules, and these constraints must be satisfied for the rule to apply.
These attachments accomplish the following goals:
To associate feature structures with both lexical items and instances of grammatical categories.
To guide the composition of feature structures for larger grammatical constituents based on the feature structures of their component parts.
To enforce compatibility constraints between specified parts of grammatical constructions.
Feature unification
Feature structures can be unified if
They have like-named attributes that have the same value:
[NUM SG] ⊔ [NUM SG] = [NUM SG]
Like-named attributes that are "open" get the value assigned:
[CAT NP, NUMBER ??, PERSON 3] ⊔ [NUMBER SG, PERSON 3] = [CAT NP, NUMBER SG, PERSON 3]
Feature unification
Complementary features are brought together:
[CAT NP, NUMBER SG] ⊔ [PERSON 3] = [CAT NP, NUMBER SG, PERSON 3]
Unification is recursive:
[CAT NP, AGR [NUM SG]] ⊔ [CAT NP, AGR [PERS 3]] = [CAT NP, AGR [NUM SG, PERS 3]]
Coindexed structures are identical (not just copies): an assignment to one affects all.
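A non-destructive unification sketch for the nested-dictionary representation (reentrancy/coindexing is not modelled here; a pointer-based version that supports the destructive merge appears later in these notes):

def unify(f, g):
    # Returns the unification of f and g, or None on failure.
    if not isinstance(f, dict) or not isinstance(g, dict):
        return f if f == g else None          # atomic clash -> failure
    result = dict(f)                          # start from a copy of f
    for feat, gval in g.items():
        if feat in result:
            sub = unify(result[feat], gval)   # recurse on shared features
            if sub is None:
                return None
            result[feat] = sub
        else:
            result[feat] = gval               # complementary feature: just add it
    return result

print(unify({"CAT": "NP", "AGR": {"NUM": "SG"}},
            {"CAT": "NP", "AGR": {"PERS": "3"}}))
# -> {'CAT': 'NP', 'AGR': {'NUM': 'SG', 'PERS': '3'}}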
Example
Grammar rule:
[CAT NP, AGR _1 ⊔ _2, SEM _3] → [CAT DET, AGR _1] [CAT N, AGR _2, SEM _3]
Lexicon:
a = [CAT DET, AGR [VAL INDEF, NUM SG]]
the = [CAT DET, AGR [VAL DEF]]
man = [CAT N, LEX "man", AGR [NUM SG], SEM HUM]
the man
[CAT NP
 AGR (_1 ⊔ _2) [VAL DEF
                NUM SG]
 SEM (_3) HUM]
built from
the = [CAT DET, AGR (_1) [VAL DEF]]
man = [CAT N, LEX "man", AGR (_2) [NUM SG], SEM (_3) HUM]
a man
[CAT NP
 AGR (_1 ⊔ _2) [VAL INDEF
                NUM SG]
 SEM (_3) HUM]
built from
a = [CAT DET, AGR (_1) [VAL INDEF, NUM SG]]
man = [CAT N, LEX "man", AGR (_2) [NUM SG], SEM (_3) HUM]
Types and inheritance
Feature typing allows us to constrain possible values a feature can have
e.g. num = {sing,plur}
Allows grammars to be checked for consistency, and can make parsing easier
We can express general “feature co-occurrence conditions” …
And “feature inheritance rules”
Both these allow us to make the grammar more compact
Co-occurrence conditions and
Inheritance rules
General rules (beyond simple unification) which apply automatically, and so do not need to be stated (and repeated) in each rule or lexical entry
Examples:
[cat=np] → [num=??, gen=??, case=??]
[cat=v, num=sg] → [tns=pres]
[attr1=val1] → [attr2=val2]
Inheritance rules
Inheritance rules can be over-ridden
e.g. [cat=n] → [gen=??, sem=??]
sex = {male,female}
gen = {masc,fem,neut}
[cat=n, gen=fem, sem=hum] → [sex=female]
uxor = [cat=n, gen=fem, sem=hum]
agricola = [cat=n, gen=fem, sem=hum, sex=male]
Unification in Linguistics
Lexical Functional Grammar
If interested, see PARGRAM project
GPSG, HPSG
Construction Grammar
Categorial Grammar
Unification
Joining the contents of two feature structures into one new structure (the union of the two originals).
The union is the most general feature structure subsumed by both.
The union of two contradictory feature structures is undefined (unification fails).
Unification Constraints
Each grammar rule will be associated with a set of
unification constraints.
β0 → β1 … βn
{set of unification constraints}
Each unification constraint will be in one of the following forms:
<βi feature path> = Atomic value
<βi feature path> = <βj feature path>
Unification Constraints -- Example
For example, the following rule
S → NP VP
Only if the number of the NP is equal to the number of the VP.
will be represented as follows:
S → NP VP
<NP NUMBER> = <VP NUMBER>
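One way to represent and test such a path equation in code (follow, check and the tuple encoding of paths are illustrative assumptions; a full implementation would unify the values at the two paths so they come to share one structure, rather than merely test equality):

def follow(fs, path):
    # Walk a feature path (a tuple of feature names) through nested dicts.
    for feat in path:
        fs = fs[feat]
    return fs

def check(constraint, daughters):
    # constraint = ((cat1, path1), (cat2, path2)); daughters maps category -> feature structure.
    (c1, p1), (c2, p2) = constraint
    return follow(daughters[c1], p1) == follow(daughters[c2], p2)

# <NP NUMBER> = <VP NUMBER>
print(check((("NP", ("NUMBER",)), ("VP", ("NUMBER",))),
            {"NP": {"NUMBER": "SG"}, "VP": {"NUMBER": "SG"}}))   # True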
Agreement Constraints
S → NP VP
<NP NUMBER> = <VP NUMBER>
S → Aux NP VP
<Aux AGREEMENT> = <NP AGREEMENT>
NP → Det NOMINAL
<Det AGREEMENT> = <NOMINAL AGREEMENT>
<NP AGREEMENT> = <NOMINAL AGREEMENT>
NOMINAL → Noun
<NOMINAL AGREEMENT> = <Noun AGREEMENT>
VP → Verb NP
<VP AGREEMENT> = <Verb AGREEMENT>
Agreement Constraints -- Lexicon
Entries
Aux → does
<Aux AGREEMENT NUMBER> = SG
<Aux AGREEMENT PERSON> = 3
Aux → do
<Aux AGREEMENT NUMBER> = PL
Det → these
<Det AGREEMENT NUMBER> = PL
Det → this
<Det AGREEMENT NUMBER> = SG
Verb → serves
<Verb AGREEMENT NUMBER> = SG
<Verb AGREEMENT PERSON> = 3
Verb → serve
<Verb AGREEMENT NUMBER> = PL
Noun → flights
<Noun AGREEMENT NUMBER> = PL
Noun → flight
<Noun AGREEMENT NUMBER> = SG
Head Features
Certain features are copied from children to parent in feature structures. Example:
AGREEMENT feature in NOMINAL is copied into NP.
The features for most grammatical categories are copied from one of the children to the parent.
The child that provides the features is called the head of the phrase, and the features copied are referred to as head features.
A verb is the head of a verb phrase, and a nominal is the head of a noun phrase. We may reflect these constructs in feature structures as follows:
NP → Det NOMINAL
<Det HEAD AGREEMENT> = <NOMINAL HEAD AGREEMENT>
<NP HEAD> = <NOMINAL HEAD>
VP → Verb NP
<VP HEAD> = <Verb HEAD>
SubCategorization Constraints
For verb phrases, we can represent subcategorization constraints
using three techniques:
Atomic Subcat Symbols
Encoding Subcat lists as feature structures
Minimal Rule Approach (using lists directly)
We may use any of these representations.
Atomic Subcat Symbols
VP → Verb
<VP HEAD> = <Verb HEAD>
<VP HEAD SUBCAT> = INTRANS
VP → Verb NP
<VP HEAD> = <Verb HEAD>
<VP HEAD SUBCAT> = TRANS
VP → Verb NP NP
<VP HEAD> = <Verb HEAD>
<VP HEAD SUBCAT> = DITRANS
Verb → slept
<Verb HEAD SUBCAT> = INTRANS
Verb → served
<Verb HEAD SUBCAT> = TRANS
Verb → gave
<Verb HEAD SUBCAT> = DITRANS
Encoding Subcat Lists as Features
Verb → gave
<Verb HEAD SUBCAT FIRST CAT> = NP
<Verb HEAD SUBCAT SECOND CAT> = NP
<Verb HEAD SUBCAT THIRD> = END
VP → Verb NP NP
<VP HEAD> = <Verb HEAD>
<VP HEAD SUBCAT FIRST CAT> = <NP CAT>
<VP HEAD SUBCAT SECOND CAT> = <NP CAT>
<VP HEAD SUBCAT THIRD> = END
We are only encoding lists using positional features
Minimal Rule Approach
In fact, we do not use symbols like SECOND, THIRD. They are just used to encode lists. We can use lists directly (similar to LISP).
<SUBCAT FIRST CAT> = NP
<SUBCAT REST FIRST CAT> = NP
<SUBCAT REST REST> = END
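A sketch of how such a FIRST/REST list can be built over the nested-dictionary representation (subcat_list and the dict encoding are assumptions of these notes):

def subcat_list(cats):
    # Encode a list of categories as nested FIRST/REST features, ending in "END".
    fs = "END"
    for cat in reversed(cats):
        fs = {"FIRST": {"CAT": cat}, "REST": fs}
    return fs

gave = {"CAT": "Verb", "HEAD": {"SUBCAT": subcat_list(["NP", "NP"])}}
# gave["HEAD"]["SUBCAT"] ==
# {'FIRST': {'CAT': 'NP'}, 'REST': {'FIRST': {'CAT': 'NP'}, 'REST': 'END'}}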
Subcategorization Frames for Lexical
Entries
We can use two different notations to represent subcategorization frames for lexical entries (verbs).
Verb → want
<Verb HEAD SUBCAT FIRST CAT> = NP
Verb → want
<Verb HEAD SUBCAT FIRST CAT> = VP
<Verb HEAD SUBCAT FIRST FORM> = INFINITIVE
[ORTH WANT
 CAT VERB
 HEAD [SUBCAT < [CAT NP], [CAT VP, HEAD [VFORM INFINITIVE]] >]]
Implementing Unification
The representation we have used cannot facilitate the destructive merger aspect of the unification algorithm.
For this reason, we add additional features (additional edges to DAGs) into our feature structures.
Each feature structure will consist of two fields:
Content Field -- This field can be NULL or may contain an ordinary feature structure.
Pointer Field -- This field can be NULL or may contain a pointer to another feature structure.
If the pointer field of a DAG is NULL, the content field of the DAG contains the actual feature structure to be processed.
If the pointer field of a DAG is not NULL, the destination of that pointer represents the actual feature structure to be processed.
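A sketch of this content/pointer representation in Python (the class and helper names are illustrative, not from any library):

class DagNode:
    def __init__(self, content=None):
        self.content = content   # None, an atomic value, or a dict of feature name -> DagNode
        self.pointer = None      # None, or a forward pointer to another DagNode

def dereference(node):
    # Follow pointer fields until we reach the node that holds the real content.
    while node.pointer is not None:
        node = node.pointer
    return node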
Extended Feature Structures
The feature structure
[NUMBER SG
 PERSON 3]
becomes, in the extended representation:
[CONTENT [NUMBER [CONTENT SG
                  POINTER NULL]
          PERSON [CONTENT 3
                  POINTER NULL]]
 POINTER NULL]
Extended DAG
(Figure: the same feature structure drawn as a DAG, with content (C) and pointer (P) fields at each node; every pointer field is Null.)
Unification of Extended DAGs
[NUMBER SG] ⊔ [PERSON 3] = [NUMBER SG, PERSON 3]
(Figure: the two input feature structures drawn as extended DAGs, with content (C) and pointer (P) fields, before unification.)
Unification of Extended DAGs (cont.)
(Figure: the DAGs after unification; the pointer fields have been updated so that both structures now share the merged content.)
Unification Algorithm
function UNIFY(f1, f2) returns fstructure or failure
 f1real ← real contents of f1 /* dereference f1 */
 f2real ← real contents of f2 /* dereference f2 */
 if f1real is Null then { f1.pointer ← f2; return f2; }
 else if f2real is Null then { f2.pointer ← f1; return f1; }
 else if f1real and f2real are identical then { f1.pointer ← f2; return f2; }
 else if f1real and f2real are complex feature structures then
  { f2.pointer ← f1;
   for each feature in f2real do
    { otherfeature ← find or create a feature corresponding to feature in f1real;
     if UNIFY(feature.value, otherfeature.value) returns failure then return failure; }
   return f1; }
 else return failure
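A runnable Python rendering of the pseudocode above, built on the DagNode/dereference sketch from the earlier slide (FAIL and the dict-based content field are assumptions of these notes):

FAIL = "failure"

def unify_dags(f1, f2):
    f1 = dereference(f1)                                     # dereference f1
    f2 = dereference(f2)                                     # dereference f2
    if f1.content is None:
        f1.pointer = f2
        return f2
    if f2.content is None:
        f2.pointer = f1
        return f1
    if not isinstance(f1.content, dict) and f1.content == f2.content:
        f1.pointer = f2                                      # identical atomic values
        return f2
    if isinstance(f1.content, dict) and isinstance(f2.content, dict):
        f2.pointer = f1                                      # merge f2 into f1
        for feat, value in f2.content.items():
            other = f1.content.setdefault(feat, DagNode())   # find or create the feature in f1
            if unify_dags(value, other) is FAIL:
                return FAIL
        return f1
    return FAIL

a = DagNode({"NUMBER": DagNode("SG")})
b = DagNode({"PERSON": DagNode("3")})
result = dereference(unify_dags(a, b))
print({feat: dereference(node).content for feat, node in result.content.items()})
# -> {'NUMBER': 'SG', 'PERSON': '3'}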
Example - Unification of Complex
Structures
[AGREEMENT (1) [NUMBER SG]
 SUBJECT [AGREEMENT (1)]]
⊔
[SUBJECT [AGREEMENT [PERSON 3]]]
=
[AGREEMENT (1) [NUMBER SG
                PERSON 3]
 SUBJECT [AGREEMENT (1)]]
Example - Unification of Complex Structures (cont.)
(Figure: the same unification carried out on the extended content/pointer DAGs, showing how the content and pointer fields are updated step by step.)
Parsing with Unification Constraints
Let us assume that we have augmented our grammar with sets of
unification constraints.
What changes do we need to make to a parser to make use of them?
Building feature structures and associating them with subtrees.
Unifying feature structures when sub-trees are created.
Blocking ill-formed constituents.
Earley Parsing with Unification
Constraints
What do we have to do to integrate unification constraints with the Earley Parser?
Building feature structures (represented as DAGs) and associating them with states in the chart.
Unifying feature structures as states are advanced in the chart.
Blocking ill-formed states from entering the chart.
The main change will be in the COMPLETER function of the Earley Parser. This routine will invoke the unifier to unify two feature structures.
Building Feature Structures
NP → Det NOMINAL
<Det HEAD AGREEMENT> = <NOMINAL HEAD AGREEMENT>
<NP HEAD> = <NOMINAL HEAD>
corresponds to
[NP [HEAD (1)]
 Det [HEAD [AGREEMENT (2)]]
 NOMINAL [HEAD (1) [AGREEMENT (2)]]]
Augmenting States with DAGs
Each state will have an additional field to contain the DAG
representing the feature structure corresponding to the state.
When a rule is first used by PREDICTOR to create a state,
the DAG associated with the state will simply consist of
the DAG retrieved from the rule.
For example,
S → • NP VP, [0,0], [], Dag1
where Dag1 is the feature structure corresponding to S → NP VP.
NP → • Det NOMINAL, [0,0], [], Dag2
where Dag2 is the feature structure corresponding to NP → Det NOMINAL.
What does COMPLETER do?
When COMPLETER advances the dot in a state, it should unify the feature structure of the newly completed state with the appropriate part of the feature structure being advanced.
If this unification process is successful, the new state gets the result of the unification as its DAG, and this new state is entered into the chart. If it fails, nothing is entered into the chart.
A Completion Example
Parsing the phrase that flight, after that has been processed:
NP → Det • NOMINAL, [0,1], [SDet], Dag1
Dag1 = [NP [HEAD (1)]
        Det [HEAD [AGREEMENT (2) [NUMBER SG]]]
        NOMINAL [HEAD (1) [AGREEMENT (2)]]]
A newly completed state:
NOMINAL → Noun •, [1,2], [SNoun], Dag2
Dag2 = [NOMINAL [HEAD (1) [AGREEMENT [NUMBER SG]]]
        Noun [HEAD (1)]]
To advance in NP, the parser unifies the feature structure found under the NOMINAL feature of Dag2 with the feature structure found under the NOMINAL feature of Dag1.
Earley Parse
function EARLEY-PARSE(words,grammar) returns chart
ENQUEUE((γ → • S, [0,0], dag), chart[0])
for i from 0 to LENGTH(words) do
 for each state in chart[i] do
  if INCOMPLETE?(state) and NEXT-CAT(state) is not a PS then
   PREDICTOR(state)
  elseif INCOMPLETE?(state) and NEXT-CAT(state) is a PS then
   SCANNER(state)
  else
   COMPLETER(state)
 end
end
return(chart)
Predictor and Scanner
procedure PREDICTOR((A → α • B β, [i,j], dagA))
 for each (B → γ) in GRAMMAR-RULES-FOR(B, grammar) do
  ENQUEUE((B → • γ, [j,j], dagB), chart[j])
 end
procedure SCANNER((A → α • B β, [i,j], dagA))
 if B ∈ PARTS-OF-SPEECH(word[j]) then
  ENQUEUE((B → word[j] •, [j,j+1], dagB), chart[j+1])
 end
Completer and UnifyStates
procedure COMPLETER((B → γ •, [j,k], dagB))
 for each (A → α • B β, [i,j], dagA) in chart[j] do
  if newdag ← UNIFY-STATES(dagB, dagA, B) does not fail then
   ENQUEUE((A → α B • β, [i,k], newdag), chart[k])
 end
procedure UNIFY-STATES(dag1, dag2, cat)
 dag1cp ← CopyDag(dag1);
 dag2cp ← CopyDag(dag2);
 UNIFY(FollowPath(cat, dag1cp), FollowPath(cat, dag2cp));
end
Enqueue
procedure ENQUEUE(state,chart-entry)
if state is not subsumed by a state in chart-entry then
Add state at the end of chart-entry
end
Probabilistic Parsing
Slides by Markus Dickinson, Georgetown University
Motivation and Outline
Previously, we used CFGs to parse with, but:
Some ambiguous sentences could not be disambiguated, and we would like to know the most likely parse
How do we get such grammars? Do we write them ourselves? Maybe we could use a corpus …
Where we’re going:
Probabilistic Context-Free Grammars (PCFGs)
Lexicalized PCFGs
Dependency Grammars
Statistical Parsing
Basic idea
Start with a treebank
a collection of sentences with syntactic annotation, i.e., already-parsed sentences
Examine which parse trees occur frequently
Extract grammar rules corresponding to those parse trees, estimating the probability of the grammar rule based on its frequency
That is, we’ll have a CFG augmented with probabilities
Using Probabilities to Parse
P(T): probability of a particular parse tree
P(T) = Π_{n∈T} p(r(n))
i.e., the product of the probabilities of all the rules r used to expand each node n in the parse tree
Example: given the probabilities on p. 449, compute the probability of the parse tree on the right
Computing probabilities
We have the following rules and probabilities (adapted from Figure 12.1):
S → VP           .05
VP → V NP        .40
NP → Det N       .20
V → book         .30
Det → that       .05
N → flight       .25
P(T) = P(S→VP) * P(VP→V NP) * … * P(N→flight)
     = .05 * .40 * .20 * .30 * .05 * .25 = .000015, or 1.5 × 10^-5
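The same product, checked in a few lines of Python (rule probabilities copied from the table above):

from math import prod

rule_probs = {
    ("S", ("VP",)): 0.05,
    ("VP", ("V", "NP")): 0.40,
    ("NP", ("Det", "N")): 0.20,
    ("V", ("book",)): 0.30,
    ("Det", ("that",)): 0.05,
    ("N", ("flight",)): 0.25,
}
# P(T) is the product of the probabilities of every rule used in the tree
print(prod(rule_probs.values()))   # ~1.5e-05 (up to floating-point rounding)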
Using probabilities
So, the probability for that parse is 0.000015. What’s the big deal?
Probabilities are useful for comparing with other
probabilities
Whereas we couldn’t decide between two parses using
a regular CFG, we now can.
For example, TWA flights is ambiguous between being
two separate NPs (cf. I gave [NP John] [NP money]) or
one NP:
A: [book [TWA] [flights]]
B: [book [TWA flights]]
Probabilities allow us to choose B (see Figure 12.2).
Obtaining the best parse
Call the best parse T(S), where S is your sentence
Get the tree which has the highest probability, i.e.
T(S) = argmax_{T ∈ parse-trees(S)} P(T)
Can use the Cocke-Younger-Kasami (CYK) algorithm to calculate the best parse
CYK is a form of dynamic programming
CYK is a chart parser, like the Earley parser
The CYK algorithm
Base case
Add words to the chart
Store P(A → w_i) for every category A in the chart
Recursive case makes this dynamic programming, because we only calculate B and C once
Rules must be of the form A → B C, i.e., exactly two items on the RHS (we call this Chomsky Normal Form (CNF))
Get the probability for A at this node by multiplying the probabilities for B and for C by P(A → B C):
P(B) * P(C) * P(A → B C)
For a given A, only keep the maximum probability (again, this is dynamic programming)
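A compact probabilistic CKY sketch under the assumptions above: the grammar is already in CNF and is given as plain dictionaries of probabilities. The toy grammar here is illustrative (it uses S → V NP to stay binary), not the book's grammar:

from collections import defaultdict

lexical = {("V", "book"): 0.30, ("Det", "that"): 0.05, ("N", "flight"): 0.25}   # P(A -> w)
binary = {("S", "V", "NP"): 0.05, ("NP", "Det", "N"): 0.20}                     # P(A -> B C)

def cky(words):
    n = len(words)
    best = defaultdict(dict)          # best[(i, j)][A] = max probability of A over words[i:j]
    for i, w in enumerate(words):     # base case: add the words to the chart
        for (A, word), p in lexical.items():
            if word == w:
                best[(i, i + 1)][A] = p
    for span in range(2, n + 1):      # recursive case
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):                       # split point
                for (A, B, C), p in binary.items():
                    if B in best[(i, k)] and C in best[(k, j)]:
                        cand = p * best[(i, k)][B] * best[(k, j)][C]
                        if cand > best[(i, j)].get(A, 0.0):
                            best[(i, j)][A] = cand          # keep only the maximum
    return best[(0, n)]

print(cky(["book", "that", "flight"]))   # best probability for each category spanning the whole input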
Problems with PCFGs
It’s still only a CFG, so dependencies on non-CFG info are not captured
e.g., Pronouns are more likely to be subjects than objects:
P[(NP → Pronoun) | NP = subj] >> P[(NP → Pronoun) | NP = obj]
Ignores lexical information (statistics), which is usually crucial for disambiguation
(T1) America sent [[250,000 soldiers] [into Iraq]]
(T2) America sent [250,000 soldiers] [into Iraq]
send with into-PP is always attached high (T2) in the PTB!
To handle lexical information, we’ll turn to lexicalized PCFGs
Lexicalized Grammars
The head information is passed up in a syntactic analysis
e.g., VP[head *1] → V[head *1] NP
Well, if you follow this down all the way to the bottom of a tree, you wind up with a head word
In some sense, we can say that Book that flight is not just an S, but an S rooted in book
Thus, book is the headword of the whole sentence
By adding headword information to nonterminals, we wind up with a lexicalized grammar
Lexicalized PCFGs
Lexicalized Parse Trees
Each PCFG rule in a tree is augmented to identify one RHS constituent to be the head daughter
The headword for a node is set to the head word of its head daughter
(Figure: a lexicalized parse tree in which each node carries its headword, e.g. [book] and [flight].)
Incorporating Head Probabilities: Wrong Way
Simply adding headword w to a node won’t work:
So, the node A becomes A[w]
e.g., P(A[w] → β | A) = Count(A[w] → β) / Count(A)
The probabilities are too small, i.e., we don’t have a big enough corpus to calculate these probabilities
VP(dumped) → VBD(dumped) NP(sacks) PP(into)   3 × 10^-10
VP(dumped) → VBD(dumped) NP(cats) PP(into)    8 × 10^-11
These probabilities are tiny, and others will never occur
Incorporating head probabilities:
Right way
Previously, we conditioned on the mother node (A):
P(A → β | A)
Now, we can condition on the mother node and the headword of A (h(A)):
P(A → β | A, h(A))
We’re no longer conditioning on simply the mother category A, but on the mother category when h(A) is the head
e.g., P(VP → VBD NP PP | VP, dumped)
Calculating rule probabilities
We’ll write the probability more generally as:
P(r(n) | n, h(n))
where n = node, r = rule, and h = headword
We calculate this by comparing how many times the rule occurs with h(n) as the headword versus how many times the mother/headword combination appears in total:
P(VP → VBD NP PP | VP, dumped)
 = C(VP(dumped) → VBD NP PP) / Σ_β C(VP(dumped) → β)
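A sketch of this maximum-likelihood estimate, using made-up counts keyed by (mother, headword, expansion); the counts are hypothetical, not from any treebank:

from collections import Counter

counts = Counter({
    ("VP", "dumped", ("VBD", "NP", "PP")): 6,   # hypothetical treebank counts
    ("VP", "dumped", ("VBD", "PP")): 3,
    ("VP", "dumped", ("VBD", "NP")): 1,
})

def p_rule(mother, head, rhs):
    # P(mother -> rhs | mother, head) = C(mother(head) -> rhs) / sum over beta of C(mother(head) -> beta)
    total = sum(c for (m, h, _), c in counts.items() if m == mother and h == head)
    return counts[(mother, head, rhs)] / total

print(p_rule("VP", "dumped", ("VBD", "NP", "PP")))   # 0.6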
Adding info about word-word
dependencies
We want to take into account one other factor: the probability of being a head word (in a given context)
P(h(n)=word | …)
We condition this probability on two things: 1. the category of the node (n), and 2. the headword of the mother (h(m(n)))
P(h(n)=word | n, h(m(n))), shortened as: P(h(n) | n, h(m(n)))
e.g., P(sacks | NP, dumped)
What we’re really doing is factoring in how words relate to each other
We will call this a dependency relation later: sacks is dependent on dumped, in this case
Putting it all together
See p. 459 for an example lexicalized parse tree for workers dumped sacks into a bin
For rules r, category n, head h, mother m:
P(T) = Π_{n∈T} p(r(n) | n, h(n)) * p(h(n) | n, h(m(n)))
e.g., P(VP → VBD NP PP | VP, dumped)   (subcategorization info)
e.g., P(sacks | NP, dumped)   (dependency info between words)
Dependency Grammar
Capturing relations between words (e.g. dumped and sacks) is moving in the direction of dependency grammar (DG)
In DG, there is no such thing as constituency
The structure of a sentence is purely the binary relations between words
John loves Mary is represented as:
LOVE → JOHN
LOVE → MARY
where A → B means that B depends on A
Evaluating Parser Output
Dependency relations are also useful for comparing parser output to a treebank
Traditional measures of parser accuracy:
Labeled bracketing precision:
# correct constituents in parse / # constituents in parse
Labeled bracketing recall:
# correct constituents in parse / # (correct) constituents in treebank parse
There are known problems with these measures, so people are trying to use dependency-based measures instead
How many dependency relations did the parse get correct?