Parsing with Features
CS60057 Speech & Natural Language Processing
Autumn 2007
Lecture 16
5 September 2007
Parsing with features
We need to constrain the rules in CFGs, for example
to coerce agreement within and between constituents
to pass features around
to enforce subcategorisation constraints
Features can be easily added to our grammars
And later we’ll see that feature bundles can completely replace constituents
Parsing with features
Rules can stipulate values, or placeholders (variables) for values
Features can be used within the rule, or passed up via the mother nodes
Example: subject-verb agreement
S → NP VP [if NP and VP agree in number]
number of NP depends on noun and/or determiner
number of VP depends on verb
S → NP(num=X) VP(num=X)
NP(num=X) → det(num=X) n(num=X)
VP(num=X) → v(num=X) NP(num=?)
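The sharing of the variable X above can be checked mechanically. A minimal Python sketch, assuming categories and num values are passed around as simple tuples (the names agree, rule_rhs and daughters are illustrative, not from any parser library):

def agree(rule_rhs, daughters):
    # rule_rhs: list of (category, num-spec), where num-spec is a constant
    # such as "sg"/"pl", a variable name such as "X", or "?" (unconstrained).
    # daughters: list of (category, num) for the words already recognised.
    # Returns the variable bindings on success, or None on a mismatch.
    bindings = {}
    for (cat, spec), (dcat, dnum) in zip(rule_rhs, daughters):
        if cat != dcat:
            return None
        if spec == "?":                    # unconstrained slot: anything goes
            continue
        if spec.isupper():                 # a variable such as X
            if bindings.setdefault(spec, dnum) != dnum:
                return None                # X already bound to a different value
        elif spec != dnum:                 # a constant such as sg
            return None
    return bindings

# NP(num=X) -> det(num=X) n(num=X)
print(agree([("det", "X"), ("n", "X")], [("det", "sg"), ("n", "sg")]))  # {'X': 'sg'}
print(agree([("det", "X"), ("n", "X")], [("det", "sg"), ("n", "pl")]))  # None

Underspecified daughter values (the "unbound variables" of the next slides) are not handled here; that is where unification comes in.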
Declarative nature of features
The rules can be used in various ways
NP(num=X) → det(num=X) n(num=X)
To build an NP only if det and n agree (bottom-up)
When generating an NP, to choose an n which agrees with the det (if working L-to-R) (top-down)
To show that the num value for an NP comes from its components (percolation)
To ensure that the num value is correctly set when generating an NP (inheritance)
To block ill-formed input
this = det(num=sg)
these = det(num=pl)
the = det(num=?)
man = n(num=sg)
men = n(num=pl)
Example: NP(num=sg) → det(num=sg) n(num=sg) licenses "this man"; "this men" is blocked because n(num=pl) clashes with det(num=sg).
Use of variables
NP(num=X) → det(num=X) n(num=X)
this = det(num=sg)
these = det(num=pl)
the = det(num=?)
man = n(num=sg)
men = n(num=pl)
Unbound (unassigned) variables (i.e. variables with a free value): "the" can combine with any value for num.
Unification means that the num value for "the" is set to sg:
NP(num=sg) → det(num=?) n(num=sg) over "the man".
Parsing with features
Features must be compatible
Formalism should allow features to remain unspecified
Feature mismatch can be used to block false analyses, and disambiguate
e.g. they can fish ~ he can fish ~ he cans fish
Formalism may have attribute-value pairs, or rely on argument position
e.g. NP(_num,_sem) → det(_num) n(_num,_sem)
an = det(sing)
the = det(_num)
man = n(sing,hum)
Parsing with features
Using features to impose subcategorization constraints
VP → v            e.g. dance
VP → v NP         e.g. eat
VP → v NP NP      e.g. give
VP → v PP         e.g. wait (for)
VP(_num) → v(_num,intr)
VP(_num) → v(_num,trans) NP
VP(_num) → v(_num,ditrans) NP NP
VP(_num) → v(_num,prepobj(_case)) PP(_case)
PP(_case) → prep(_case) NP
dance = v(plur,intr)
dances = v(sing,intr)
danced = v(_num,intr)
waits = v(sing,prepobj(for))
for = prep(for)
Parsing with features
(top-down)
Grammar:
S → NP(_num) VP(_num)
NP(_num) → det(_num) n(_num)
VP(_num) → v(_num,intrans)
VP(_num) → v(_num,trans) NP(_1)
Input: the man shot those elephants
Trace:
S → NP(_num) VP(_num)
NP(_num) → det(_num) n(_num)
the = det(_num), man = n(sing), so _num = sing
VP(sing) → v(sing,intrans) fails, since shot = v(sing,trans)
VP(sing) → v(sing,trans) NP(_1)
NP(_1) → det(_1) n(_1)
those = det(pl), elephants = n(pl), so _1 = pl
(Resulting tree: S over NP(sing) = det(sing) "the" + n(sing) "man", and VP(sing) = v(sing,trans) "shot" + NP(pl) = det(pl) "those" + n(pl) "elephants".)
Feature structures
Instead of attaching features to the symbols, we can parse with symbols made up entirely of attribute-value pairs: "feature structures"
Can be used in the same way as seen previously
Values can be atomic …
[ATTR1 VAL1
 ATTR2 VAL2
 ATTR3 VAL3]
e.g.
[CAT NP
 NUMBER SG
 PERSON 3]
… or embedded feature structures:
[CAT NP
 AGR [NUM SG
      PERS 3]]
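For the code sketches in the rest of these notes, feature structures can be modelled as nested Python dictionaries; this representation is a working assumption of the notes, not a standard library:

flat = {"CAT": "NP", "NUMBER": "SG", "PERSON": "3"}          # atomic values only
nested = {"CAT": "NP", "AGR": {"NUM": "SG", "PERS": "3"}}    # an embedded feature structure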
Unification
Feature Structures
A set of feature-value pairs
No feature occurs in more than one feature-value pair (a partial function from features to values)
Circular structures are prohibited.
Structured Feature Structure
Part of a third-person singular NP:
[CAT NP
 AGR [NUM SG
      PERS 3]]
Reentrant Feature Structure
Two features can share a feature structure as value.
Not the same thing as them having equivalent values!
Two distinct feature structure values:
One shared value (reentrant feature structure):
They can be coindexed:
[CAT S
 HEAD [AGR (1) [NUM SG
                PERS 3]
       SUBJ [AGR (1)]]]
Parsing with feature structures
Grammar rules can specify assignments to or equations between feature structures
Expressed as “feature paths”
e.g. HEAD.AGR.NUM = SG
[CAT S
 HEAD [AGR (1) [NUM SG
                PERS 3]
       SUBJ [AGR (1)]]]
Subsumption (⊑)
(Partial) ordering of feature structures
Based on relative specificity
The second structure carries less information, but is more general (it subsumes the first).
Subsumption
A more abstract (less specific) feature structure subsumes an equally or more specific one.
A feature structure F subsumes a feature structure G (F ⊑ G) if and only if:
For every feature x in F, F(x) ⊑ G(x) (where F(x) means the value of the feature x of the feature structure F).
For all paths p and q in F such that F(p) = F(q), it is also the case that G(p) = G(q).
An atomic feature structure neither subsumes nor is subsumed by another atomic feature structure.
Variables subsume all other feature structures.
A feature structure F subsumes a feature structure G (F ⊑ G) if all parts of F subsume the corresponding parts of G.
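A sketch of a subsumption check on the nested-dictionary representation introduced earlier; it implements only the first clause of the definition (reentrancy, i.e. the path-equality clause, is ignored, and equal atomic values are treated as mutually subsuming, a simplification relative to the slide above):

def subsumes(f, g):
    # True if feature structure f subsumes (is at least as general as) g.
    if not isinstance(f, dict) or not isinstance(g, dict):
        return f == g                 # atomic values must be identical
    return all(feat in g and subsumes(val, g[feat])
               for feat, val in f.items())

print(subsumes({"NUMBER": "SG"}, {"NUMBER": "SG", "PERSON": "3"}))   # True
print(subsumes({"NUMBER": "SG"}, {"PERSON": "3"}))                   # False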
Subsumption Example
Consider the following feature structures:
(1) [NUMBER SG]
(2) [PERSON 3]
(3) [NUMBER SG
     PERSON 3]
(1) ⊑ (3)
(2) ⊑ (3)
but there is no subsumption relation between (1) and (2)
Feature Structures in The Grammar
We will incorporate the feature structures and the unification process as follows:
All constituents (non-terminals) will be associated with feature structures.
Sets of unification constraints will be associated with grammar rules, and these constraints must be satisfied for the rule to apply.
These attachments accomplish the following goals:
To associate feature structures with both lexical items and instances of grammatical categories.
To guide the composition of feature structures for larger grammatical constituents based on the feature structures of their component parts.
To enforce compatibility constraints between specified parts of grammatical constructions.
Feature unification
Feature structures can be unified if
They have like-named attributes that have the same value:
[NUM SG] ⊔ [NUM SG] = [NUM SG]
Like-named attributes that are "open" get the value assigned:
[CAT NP, NUMBER ??, PERSON 3] ⊔ [NUMBER SG, PERSON 3] = [CAT NP, NUMBER SG, PERSON 3]
Feature unification
Complementary features are brought together:
[CAT NP, NUMBER SG] ⊔ [PERSON 3] = [CAT NP, NUMBER SG, PERSON 3]
Unification is recursive:
[CAT NP, AGR [NUM SG]] ⊔ [CAT NP, AGR [PERS 3]] = [CAT NP, AGR [NUM SG, PERS 3]]
Coindexed structures are identical (not just copies): an assignment to one affects all.
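A non-destructive unification sketch for the nested-dictionary representation (reentrancy/coindexing is not modelled here; a pointer-based version that supports the destructive merge appears later in these notes):

def unify(f, g):
    # Returns the unification of f and g, or None on failure.
    if not isinstance(f, dict) or not isinstance(g, dict):
        return f if f == g else None          # atomic clash -> failure
    result = dict(f)                          # start from a copy of f
    for feat, gval in g.items():
        if feat in result:
            sub = unify(result[feat], gval)   # recurse on shared features
            if sub is None:
                return None
            result[feat] = sub
        else:
            result[feat] = gval               # complementary feature: just add it
    return result

print(unify({"CAT": "NP", "AGR": {"NUM": "SG"}},
            {"CAT": "NP", "AGR": {"PERS": "3"}}))
# -> {'CAT': 'NP', 'AGR': {'NUM': 'SG', 'PERS': '3'}}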
Example
Grammar rule:
[CAT NP, AGR _1 ⊔ _2, SEM _3] → [CAT DET, AGR _1] [CAT N, AGR _2, SEM _3]
Lexicon:
a = [CAT DET, AGR [VAL INDEF, NUM SG]]
the = [CAT DET, AGR [VAL DEF]]
man = [CAT N, LEX "man", AGR [NUM SG], SEM HUM]
the man
[CAT NP
 AGR (_1 ⊔ _2) [VAL DEF
                NUM SG]
 SEM (_3) HUM]
built from
the = [CAT DET, AGR (_1) [VAL DEF]]
man = [CAT N, LEX "man", AGR (_2) [NUM SG], SEM (_3) HUM]
a man
[CAT NP
 AGR (_1 ⊔ _2) [VAL INDEF
                NUM SG]
 SEM (_3) HUM]
built from
a = [CAT DET, AGR (_1) [VAL INDEF, NUM SG]]
man = [CAT N, LEX "man", AGR (_2) [NUM SG], SEM (_3) HUM]
Types and inheritance
Feature typing allows us to constrain possible values a feature can have
e.g. num = {sing,plur}
Allows grammars to be checked for consistency, and can make parsing easier
We can express general “feature co-occurrence conditions” …
And “feature inheritance rules”
Both these allow us to make the grammar more compact
Co-occurrence conditions and
Inheritance rules
General rules (beyond simple unification) which apply automatically, and so do not need to be stated (and repeated) in each rule or lexical entry
Examples:
[cat=np] → [num=??, gen=??, case=??]
[cat=v, num=sg] → [tns=pres]
[attr1=val1] → [attr2=val2]
Inheritance rules
Inheritance rules can be over-ridden
e.g. [cat=n] → [gen=??, sem=??]
sex = {male,female}
gen = {masc,fem,neut}
[cat=n, gen=fem, sem=hum] → [sex=female]
uxor = [cat=n, gen=fem, sem=hum]
agricola = [cat=n, gen=fem, sem=hum, sex=male]
Unification in Linguistics
Lexical Functional Grammar
If interested, see PARGRAM project
GPSG, HPSG
Construction Grammar
Categorial Grammar
Unification
Joining the contents of two feature structures into one new structure (the union of the two originals).
The union is the most general feature structure subsumed by both.
The union of two contradictory feature structures is undefined (unification fails).
Unification Constraints
Each grammar rule will be associated with a set of
unification constraints.
β0 → β1 … βn
{set of unification constraints}
Each unification constraint will be in one of the following forms:
<βi feature path> = Atomic value
<βi feature path> = <βj feature path>
Unification Constraints -- Example
For example, the following rule
S → NP VP
Only if the number of the NP is equal to the number of the VP.
will be represented as follows:
S → NP VP
<NP NUMBER> = <VP NUMBER>
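One way to represent and test such a path equation in code (follow, check and the tuple encoding of paths are illustrative assumptions; a full implementation would unify the values at the two paths so they come to share one structure, rather than merely test equality):

def follow(fs, path):
    # Walk a feature path (a tuple of feature names) through nested dicts.
    for feat in path:
        fs = fs[feat]
    return fs

def check(constraint, daughters):
    # constraint = ((cat1, path1), (cat2, path2)); daughters maps category -> feature structure.
    (c1, p1), (c2, p2) = constraint
    return follow(daughters[c1], p1) == follow(daughters[c2], p2)

# <NP NUMBER> = <VP NUMBER>
print(check((("NP", ("NUMBER",)), ("VP", ("NUMBER",))),
            {"NP": {"NUMBER": "SG"}, "VP": {"NUMBER": "SG"}}))   # True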
Agreement Constraints
S → NP VP
<NP NUMBER> = <VP NUMBER>
S → Aux NP VP
<Aux AGREEMENT> = <NP AGREEMENT>
NP → Det NOMINAL
<Det AGREEMENT> = <NOMINAL AGREEMENT>
<NP AGREEMENT> = <NOMINAL AGREEMENT>
NOMINAL → Noun
<NOMINAL AGREEMENT> = <Noun AGREEMENT>
VP → Verb NP
<VP AGREEMENT> = <Verb AGREEMENT>
Agreement Constraints -- Lexicon
Entries
Aux → does
<Aux AGREEMENT NUMBER> = SG
<Aux AGREEMENT PERSON> = 3
Aux → do
<Aux AGREEMENT NUMBER> = PL
Det → these
<Det AGREEMENT NUMBER> = PL
Det → this
<Det AGREEMENT NUMBER> = SG
Verb → serves
<Verb AGREEMENT NUMBER> = SG
<Verb AGREEMENT PERSON> = 3
Verb → serve
<Verb AGREEMENT NUMBER> = PL
Noun → flights
<Noun AGREEMENT NUMBER> = PL
Noun → flight
<Noun AGREEMENT NUMBER> = SG
Head Features
Certain features are copied from children to parent in feature structures. Example:
AGREEMENT feature in NOMINAL is copied into NP.
The features for most grammatical categories are copied from one of the children to the parent.
The child that provides the features is called the head of the phrase, and the features copied are referred to as head features.
A verb is the head of a verb phrase, and a nominal is the head of a noun phrase. We may reflect these constructs in feature structures as follows:
NP → Det NOMINAL
<Det HEAD AGREEMENT> = <NOMINAL HEAD AGREEMENT>
<NP HEAD> = <NOMINAL HEAD>
VP → Verb NP
<VP HEAD> = <Verb HEAD>
SubCategorization Constraints
For verb phrases, we can represent subcategorization constraints
using three techniques:
Atomic Subcat Symbols
Encoding Subcat lists as feature structures
Minimal Rule Approach (using lists directly)
We may use any of these representations.
Atomic Subcat Symbols
VP → Verb
<VP HEAD> = <Verb HEAD>
<VP HEAD SUBCAT> = INTRANS
VP → Verb NP
<VP HEAD> = <Verb HEAD>
<VP HEAD SUBCAT> = TRANS
VP → Verb NP NP
<VP HEAD> = <Verb HEAD>
<VP HEAD SUBCAT> = DITRANS
Verb → slept
<Verb HEAD SUBCAT> = INTRANS
Verb → served
<Verb HEAD SUBCAT> = TRANS
Verb → gave
<Verb HEAD SUBCAT> = DITRANS
Encoding Subcat Lists as Features
Verb → gave
<Verb HEAD SUBCAT FIRST CAT> = NP
<Verb HEAD SUBCAT SECOND CAT> = NP
<Verb HEAD SUBCAT THIRD> = END
VP → Verb NP NP
<VP HEAD> = <Verb HEAD>
<VP HEAD SUBCAT FIRST CAT> = <NP CAT>
<VP HEAD SUBCAT SECOND CAT> = <NP CAT>
<VP HEAD SUBCAT THIRD> = END
We are only encoding lists using positional features
Minimal Rule Approach
In fact, we do not use symbols like SECOND, THIRD. They are just used to encode lists. We can use lists directly (similar to LISP).
<SUBCAT FIRST CAT> = NP
<SUBCAT REST FIRST CAT> = NP
<SUBCAT REST REST> = END
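A sketch of how such a FIRST/REST list can be built over the nested-dictionary representation (subcat_list and the dict encoding are assumptions of these notes):

def subcat_list(cats):
    # Encode a list of categories as nested FIRST/REST features, ending in "END".
    fs = "END"
    for cat in reversed(cats):
        fs = {"FIRST": {"CAT": cat}, "REST": fs}
    return fs

gave = {"CAT": "Verb", "HEAD": {"SUBCAT": subcat_list(["NP", "NP"])}}
# gave["HEAD"]["SUBCAT"] ==
# {'FIRST': {'CAT': 'NP'}, 'REST': {'FIRST': {'CAT': 'NP'}, 'REST': 'END'}}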
Subcategorization Frames for Lexical
Entries
We can use two different notations to represent subcategorization frames for lexical entries (verbs).
Verb → want
<Verb HEAD SUBCAT FIRST CAT> = NP
Verb → want
<Verb HEAD SUBCAT FIRST CAT> = VP
<Verb HEAD SUBCAT FIRST FORM> = INFINITIVE
[ORTH WANT
 CAT VERB
 HEAD [SUBCAT < [CAT NP], [CAT VP, HEAD [VFORM INFINITIVE]] >]]
Implementing Unification
The representation we have used cannot facilitate the destructive merger aspect of the unification algorithm.
For this reason, we add additional features (additional edges to DAGs) into our feature structures.
Each feature structure will consist of two fields:
Content Field -- This field can be NULL or may contain an ordinary feature structure.
Pointer Field -- This field can be NULL or may contain a pointer to another feature structure.
If the pointer field of a DAG is NULL, the content field of the DAG contains the actual feature structure to be processed.
If the pointer field of a DAG is not NULL, the destination of that pointer represents the actual feature structure to be processed.
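A sketch of this content/pointer representation in Python (the class and helper names are illustrative, not from any library):

class DagNode:
    def __init__(self, content=None):
        self.content = content   # None, an atomic value, or a dict of feature name -> DagNode
        self.pointer = None      # None, or a forward pointer to another DagNode

def dereference(node):
    # Follow pointer fields until we reach the node that holds the real content.
    while node.pointer is not None:
        node = node.pointer
    return node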
Extended Feature Structures
The feature structure
[NUMBER SG
 PERSON 3]
becomes, in the extended representation:
[CONTENT [NUMBER [CONTENT SG
                  POINTER NULL]
          PERSON [CONTENT 3
                  POINTER NULL]]
 POINTER NULL]
Extended DAG
(Figure: the same feature structure drawn as a DAG, with content (C) and pointer (P) fields at each node; every pointer field is Null.)
Unification of Extended DAGs
[NUMBER SG] ⊔ [PERSON 3] = [NUMBER SG, PERSON 3]
(Figure: the two input feature structures drawn as extended DAGs, with content (C) and pointer (P) fields, before unification.)
Unification of Extended DAGs (cont.)
(Figure: the DAGs after unification; the pointer fields have been updated so that both structures now share the merged content.)
Unification Algorithm
function UNIFY(f1, f2) returns fstructure or failure
 f1real ← real contents of f1 /* dereference f1 */
 f2real ← real contents of f2 /* dereference f2 */
 if f1real is Null then { f1.pointer ← f2; return f2; }
 else if f2real is Null then { f2.pointer ← f1; return f1; }
 else if f1real and f2real are identical then { f1.pointer ← f2; return f2; }
 else if f1real and f2real are complex feature structures then
  { f2.pointer ← f1;
   for each feature in f2real do
    { otherfeature ← find or create a feature corresponding to feature in f1real;
     if UNIFY(feature.value, otherfeature.value) returns failure then return failure; }
   return f1; }
 else return failure
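A runnable Python rendering of the pseudocode above, built on the DagNode/dereference sketch from the earlier slide (FAIL and the dict-based content field are assumptions of these notes):

FAIL = "failure"

def unify_dags(f1, f2):
    f1 = dereference(f1)                                     # dereference f1
    f2 = dereference(f2)                                     # dereference f2
    if f1.content is None:
        f1.pointer = f2
        return f2
    if f2.content is None:
        f2.pointer = f1
        return f1
    if not isinstance(f1.content, dict) and f1.content == f2.content:
        f1.pointer = f2                                      # identical atomic values
        return f2
    if isinstance(f1.content, dict) and isinstance(f2.content, dict):
        f2.pointer = f1                                      # merge f2 into f1
        for feat, value in f2.content.items():
            other = f1.content.setdefault(feat, DagNode())   # find or create the feature in f1
            if unify_dags(value, other) is FAIL:
                return FAIL
        return f1
    return FAIL

a = DagNode({"NUMBER": DagNode("SG")})
b = DagNode({"PERSON": DagNode("3")})
result = dereference(unify_dags(a, b))
print({feat: dereference(node).content for feat, node in result.content.items()})
# -> {'NUMBER': 'SG', 'PERSON': '3'}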
Example - Unification of Complex
Structures
[AGREEMENT (1) [NUMBER SG]
 SUBJECT [AGREEMENT (1)]]
⊔
[SUBJECT [AGREEMENT [PERSON 3]]]
=
[AGREEMENT (1) [NUMBER SG
                PERSON 3]
 SUBJECT [AGREEMENT (1)]]
Example - Unification of Complex Structures (cont.)
(Figure: the same unification carried out on the extended content/pointer DAGs, showing how the content and pointer fields are updated step by step.)
Parsing with Unification Constraints
Let us assume that we have augmented our grammar with sets of
unification constraints.
What changes do we need to make to a parser to make use of them?
Building feature structures and associating them with subtrees.
Unifying feature structures when sub-trees are created.
Blocking ill-formed constituents.
Earley Parsing with Unification
Constraints
What do we have to do to integrate unification constraints with the Earley Parser?
Building feature structures (represented as DAGs) and associating them with states in the chart.
Unifying feature structures as states are advanced in the chart.
Blocking ill-formed states from entering the chart.
The main change will be in the COMPLETER function of the Earley Parser. This routine will invoke the unifier to unify two feature structures.
Building Feature Structures
NP → Det NOMINAL
<Det HEAD AGREEMENT> = <NOMINAL HEAD AGREEMENT>
<NP HEAD> = <NOMINAL HEAD>
corresponds to
[NP [HEAD (1)]
 Det [HEAD [AGREEMENT (2)]]
 NOMINAL [HEAD (1) [AGREEMENT (2)]]]
Augmenting States with DAGs
Each state will have an additional field to contain the DAG
representing the feature structure corresponding to the state.
When a rule is first used by PREDICTOR to create a state,
the DAG associated with the state will simply consist of
the DAG retrieved from the rule.
For example,
S → • NP VP, [0,0], [], Dag1
where Dag1 is the feature structure corresponding to S → NP VP.
NP → • Det NOMINAL, [0,0], [], Dag2
where Dag2 is the feature structure corresponding to NP → Det NOMINAL.
What does COMPLETER do?
When COMPLETER advances the dot in a state, it should unify the feature structure of the newly completed state with the appropriate part of the feature structure being advanced.
If this unification process is successful, the new state gets the result of the unification as its DAG, and this new state is entered into the chart. If it fails, nothing is entered into the chart.
A Completion Example
Parsing the phrase that flight, after that has been processed:
NP → Det • NOMINAL, [0,1], [SDet], Dag1
Dag1 = [NP [HEAD (1)]
        Det [HEAD [AGREEMENT (2) [NUMBER SG]]]
        NOMINAL [HEAD (1) [AGREEMENT (2)]]]
A newly completed state:
NOMINAL → Noun •, [1,2], [SNoun], Dag2
Dag2 = [NOMINAL [HEAD (1) [AGREEMENT [NUMBER SG]]]
        Noun [HEAD (1)]]
To advance in NP, the parser unifies the feature structure found under the NOMINAL feature of Dag2 with the feature structure found under the NOMINAL feature of Dag1.
Earley Parse
function EARLEY-PARSE(words,grammar) returns chart
ENQUEUE((γ → • S, [0,0], dag), chart[0])
for i from 0 to LENGTH(words) do
 for each state in chart[i] do
  if INCOMPLETE?(state) and NEXT-CAT(state) is not a PS then
   PREDICTOR(state)
  elseif INCOMPLETE?(state) and NEXT-CAT(state) is a PS then
   SCANNER(state)
  else
   COMPLETER(state)
 end
end
return(chart)
Predictor and Scanner
procedure PREDICTOR((A → α • B β, [i,j], dagA))
 for each (B → γ) in GRAMMAR-RULES-FOR(B, grammar) do
  ENQUEUE((B → • γ, [j,j], dagB), chart[j])
 end
procedure SCANNER((A → α • B β, [i,j], dagA))
 if B ∈ PARTS-OF-SPEECH(word[j]) then
  ENQUEUE((B → word[j] •, [j,j+1], dagB), chart[j+1])
 end
Completer and UnifyStates
procedure COMPLETER((B → γ •, [j,k], dagB))
 for each (A → α • B β, [i,j], dagA) in chart[j] do
  if newdag ← UNIFY-STATES(dagB, dagA, B) does not fail then
   ENQUEUE((A → α B • β, [i,k], newdag), chart[k])
 end
procedure UNIFY-STATES(dag1, dag2, cat)
 dag1cp ← CopyDag(dag1);
 dag2cp ← CopyDag(dag2);
 UNIFY(FollowPath(cat, dag1cp), FollowPath(cat, dag2cp));
end
Enqueue
procedure ENQUEUE(state,chart-entry)
if state is not subsumed by a state in chart-entry then
Add state at the end of chart-entry
end
Probabilistic Parsing
Slides by Markus Dickinson, Georgetown University
Motivation and Outline
Previously, we used CFGs to parse with, but:
Some ambiguous sentences could not be disambiguated, and we would like to know the most likely parse
How do we get such grammars? Do we write them ourselves? Maybe we could use a corpus …
Where we’re going:
Probabilistic Context-Free Grammars (PCFGs)
Lexicalized PCFGs
Dependency Grammars
Statistical Parsing
Basic idea
Start with a treebank
a collection of sentences with syntactic annotation, i.e., already-parsed sentences
Examine which parse trees occur frequently
Extract grammar rules corresponding to those parse trees, estimating the probability of the grammar rule based on its frequency
That is, we’ll have a CFG augmented with probabilities
Using Probabilities to Parse
P(T): probability of a particular parse tree
P(T) = Π_{n∈T} p(r(n))
i.e., the product of the probabilities of all the rules r used to expand each node n in the parse tree
Example: given the probabilities on p. 449, compute the probability of the parse tree on the right
Computing probabilities
We have the following rules and probabilities (adapted from Figure 12.1):
S → VP           .05
VP → V NP        .40
NP → Det N       .20
V → book         .30
Det → that       .05
N → flight       .25
P(T) = P(S→VP) * P(VP→V NP) * … * P(N→flight)
     = .05 * .40 * .20 * .30 * .05 * .25 = .000015, or 1.5 × 10^-5
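The same product, checked in a few lines of Python (rule probabilities copied from the table above):

from math import prod

rule_probs = {
    ("S", ("VP",)): 0.05,
    ("VP", ("V", "NP")): 0.40,
    ("NP", ("Det", "N")): 0.20,
    ("V", ("book",)): 0.30,
    ("Det", ("that",)): 0.05,
    ("N", ("flight",)): 0.25,
}
# P(T) is the product of the probabilities of every rule used in the tree
print(prod(rule_probs.values()))   # ~1.5e-05 (up to floating-point rounding)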
Using probabilities
So, the probability for that parse is 0.000015. What’s the big deal?
Probabilities are useful for comparing with other
probabilities
Whereas we couldn’t decide between two parses using
a regular CFG, we now can.
For example, TWA flights is ambiguous between being
two separate NPs (cf. I gave [NP John] [NP money]) or
one NP:
A: [book [TWA] [flights]]
B: [book [TWA flights]]
Probabilities allow us to choose B (see Figure 12.2).
Obtaining the best parse
Call the best parse T(S), where S is your sentence
Get the tree which has the highest probability, i.e.
T(S) = argmax_{T ∈ parse-trees(S)} P(T)
Can use the Cocke-Younger-Kasami (CYK) algorithm to calculate the best parse
CYK is a form of dynamic programming
CYK is a chart parser, like the Earley parser
The CYK algorithm
Base case
Add words to the chart
Store P(A → w_i) for every category A in the chart
Recursive case makes this dynamic programming, because we only calculate B and C once
Rules must be of the form A → B C, i.e., exactly two items on the RHS (we call this Chomsky Normal Form (CNF))
Get the probability for A at this node by multiplying the probabilities for B and for C by P(A → B C):
P(B) * P(C) * P(A → B C)
For a given A, only keep the maximum probability (again, this is dynamic programming)
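A compact probabilistic CKY sketch under the assumptions above: the grammar is already in CNF and is given as plain dictionaries of probabilities. The toy grammar here is illustrative (it uses S → V NP to stay binary), not the book's grammar:

from collections import defaultdict

lexical = {("V", "book"): 0.30, ("Det", "that"): 0.05, ("N", "flight"): 0.25}   # P(A -> w)
binary = {("S", "V", "NP"): 0.05, ("NP", "Det", "N"): 0.20}                     # P(A -> B C)

def cky(words):
    n = len(words)
    best = defaultdict(dict)          # best[(i, j)][A] = max probability of A over words[i:j]
    for i, w in enumerate(words):     # base case: add the words to the chart
        for (A, word), p in lexical.items():
            if word == w:
                best[(i, i + 1)][A] = p
    for span in range(2, n + 1):      # recursive case
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):                       # split point
                for (A, B, C), p in binary.items():
                    if B in best[(i, k)] and C in best[(k, j)]:
                        cand = p * best[(i, k)][B] * best[(k, j)][C]
                        if cand > best[(i, j)].get(A, 0.0):
                            best[(i, j)][A] = cand          # keep only the maximum
    return best[(0, n)]

print(cky(["book", "that", "flight"]))   # best probability for each category spanning the whole input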
Problems with PCFGs
It’s still only a CFG, so dependencies on non-CFG info are not captured
e.g., Pronouns are more likely to be subjects than objects:
P[(NP → Pronoun) | NP = subj] >> P[(NP → Pronoun) | NP = obj]
Ignores lexical information (statistics), which is usually crucial for disambiguation
(T1) America sent [[250,000 soldiers] [into Iraq]]
(T2) America sent [250,000 soldiers] [into Iraq]
send with into-PP is always attached high (T2) in the PTB!
To handle lexical information, we’ll turn to lexicalized PCFGs
Lexicalized Grammars
The head information is passed up in a syntactic analysis
e.g., VP[head *1] → V[head *1] NP
Well, if you follow this down all the way to the bottom of a tree, you wind up with a head word
In some sense, we can say that Book that flight is not just an S, but an S rooted in book
Thus, book is the headword of the whole sentence
By adding headword information to nonterminals, we wind up with a lexicalized grammar
Lexicalized PCFGs
Lexicalized Parse Trees
Each PCFG rule in a tree is augmented to identify one RHS constituent to be the head daughter
The headword for a node is set to the head word of its head daughter
(Figure: a lexicalized parse tree in which each node carries its headword, e.g. [book] and [flight].)
Incorporating Head Probabilities: Wrong Way
Simply adding headword w to a node won’t work:
So, the node A becomes A[w]
e.g., P(A[w] → β | A) = Count(A[w] → β) / Count(A)
The probabilities are too small, i.e., we don’t have a big enough corpus to calculate these probabilities
VP(dumped) → VBD(dumped) NP(sacks) PP(into)   3 × 10^-10
VP(dumped) → VBD(dumped) NP(cats) PP(into)    8 × 10^-11
These probabilities are tiny, and others will never occur
Incorporating head probabilities:
Right way
Previously, we conditioned on the mother node (A):
P(A → β | A)
Now, we can condition on the mother node and the headword of A (h(A)):
P(A → β | A, h(A))
We’re no longer conditioning on simply the mother category A, but on the mother category when h(A) is the head
e.g., P(VP → VBD NP PP | VP, dumped)
Calculating rule probabilities
We’ll write the probability more generally as:
P(r(n) | n, h(n))
where n = node, r = rule, and h = headword
We calculate this by comparing how many times the rule occurs with h(n) as the headword versus how many times the mother/headword combination appears in total:
P(VP → VBD NP PP | VP, dumped)
 = C(VP(dumped) → VBD NP PP) / Σ_β C(VP(dumped) → β)
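A sketch of this maximum-likelihood estimate, using made-up counts keyed by (mother, headword, expansion); the counts are hypothetical, not from any treebank:

from collections import Counter

counts = Counter({
    ("VP", "dumped", ("VBD", "NP", "PP")): 6,   # hypothetical treebank counts
    ("VP", "dumped", ("VBD", "PP")): 3,
    ("VP", "dumped", ("VBD", "NP")): 1,
})

def p_rule(mother, head, rhs):
    # P(mother -> rhs | mother, head) = C(mother(head) -> rhs) / sum over beta of C(mother(head) -> beta)
    total = sum(c for (m, h, _), c in counts.items() if m == mother and h == head)
    return counts[(mother, head, rhs)] / total

print(p_rule("VP", "dumped", ("VBD", "NP", "PP")))   # 0.6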
Adding info about word-word
dependencies
We want to take into account one other factor: the probability of being a head word (in a given context)
P(h(n)=word | …)
We condition this probability on two things: 1. the category of the node (n), and 2. the headword of the mother (h(m(n)))
P(h(n)=word | n, h(m(n))), shortened as: P(h(n) | n, h(m(n)))
e.g., P(sacks | NP, dumped)
What we’re really doing is factoring in how words relate to each other
We will call this a dependency relation later: sacks is dependent on dumped, in this case
Putting it all together
See p. 459 for an example lexicalized parse tree for workers dumped sacks into a bin
For rules r, category n, head h, mother m:
P(T) = Π_{n∈T} p(r(n) | n, h(n)) * p(h(n) | n, h(m(n)))
e.g., P(VP → VBD NP PP | VP, dumped)   (subcategorization info)
e.g., P(sacks | NP, dumped)   (dependency info between words)
Dependency Grammar
Capturing relations between words (e.g. dumped and sacks) is moving in the direction of dependency grammar (DG)
In DG, there is no such thing as constituency
The structure of a sentence is purely the binary relations between words
John loves Mary is represented as:
LOVE → JOHN
LOVE → MARY
where A → B means that B depends on A
Evaluating Parser Output
Dependency relations are also useful for comparing parser output to a treebank
Traditional measures of parser accuracy:
Labeled bracketing precision:
# correct constituents in parse / # constituents in parse
Labeled bracketing recall:
# correct constituents in parse / # (correct) constituents in treebank parse
There are known problems with these measures, so people are trying to use dependency-based measures instead
How many dependency relations did the parse get correct?