Transcript Lec 6

LIN6932
Topics in Computational Linguistics
Lecture 6: Grammar and Parsing (I)
February 15, 2007
Hana Filip
LIN6932 Spring 2007
1
Outline for Grammar/Parsing
Context-Free Grammars (CFG) and Constituency
Some common CFG phenomena for English
Sentence-level constructions
NP, PP, VP
Coordination
Subcategorization
Levels of complexity in grammars and automata:
The Chomsky hierarchy
Top-down and Bottom-up Parsing
Earley Parsing
Quick sketch of probabilistic parsing
LIN6932 Spring 2007
2
Review
Parts of Speech
Basic syntactic/morphological categories that words
belong to
Part of Speech tagging
Assigning parts of speech to all the words in a
sentence
LIN6932 Spring 2007
3
Syntax
Syntax: from Greek syntaxis, “setting out together,
arrangement’
Refers to the way words are arranged together, and
their relationships
Distinction:
Prescriptive grammar: how people ought to talk
Descriptive grammar: how they do talk
Goal of syntax is to model the knowledge that people
unconsciously have about the grammar of their native
language
LIN6932 Spring 2007
4
Syntax
Why should you care?
Grammar checkers
Question answering
Information extraction
Machine translation
LIN6932 Spring 2007
5
4 key ideas of syntax
Constituency
Grammatical relations
Subcategorization
Lexical dependencies
Plus one part we won’t have time for:
Movement/long-distance dependency
LIN6932 Spring 2007
6
Context-Free Grammars
Capture constituency and ordering
Ordering:
What are the rules that govern the ordering of words
and bigger units in the language?
Constituency:
How words group into units and how the various
kinds of units behave
LIN6932 Spring 2007
7
Constituency
Noun phrases (NPs)
–
–
–
–
–
–
Three parties from Brooklyn
A high-class spot such as Mindy’s
The Broadway coppers
They
Harry the Horse
The reason he comes into the Hot Box
How do we know these form a constituent?
They can all appear before a verb:
–
–
–
–
Three parties from Brooklyn arrive…
A high-class spot such as Mindy’s attracts…
The Broadway coppers love…
They sit…
LIN6932 Spring 2007
8
Constituency (II)
They can all appear before a verb:
–
–
–
–
Three parties from Brooklyn arrive…
A high-class spot such as Mindy’s attracts…
The Broadway coppers love…
They sit
But individual words can’t always appear before verbs:
–
–
–
–
*from arrive…
*as attracts…
*the is
*spot is…
Must be able to state generalizations like:
– Noun phrases occur before verbs
LIN6932 Spring 2007
9
Constituency (III)
Preposing and postposing:
On September 17th, I’d like to fly from Atlanta to Denver
I’d like to fly on September 17th from Atlanta to Denver
I’d like to fly from Atlanta to Denver on September 17th.
But not:
*On September, I’d like to fly 17th from Atlanta to Denver
*On I’d like to fly September 17th from Atlanta to Denver
LIN6932 Spring 2007
10
CFG Examples
S -> NP VP
NP -> Det NOMINAL
NOMINAL -> Noun
VP -> Verb
Det -> a
Noun -> flight
Verb -> left
LIN6932 Spring 2007
11
CFGs
S -> NP VP
This says that there are units called S, NP, and VP in
this language
That an S consists of an NP followed immediately by
a VP
Doesn’t say that that’s the only kind of S
Nor does it say that this is the only place that NPs
and VPs occur
LIN6932 Spring 2007
12
Generativity
As with FSAs you can view these rules as either
analysis or synthesis machines
Generate strings in the language
Reject strings not in the language
Impose structures (trees) on strings in the language
LIN6932 Spring 2007
13
Derivations
A derivation is a sequence of rules applied to a string
that accounts for that string
Covers all the elements in the string
Covers only the elements in the string
LIN6932 Spring 2007
14
Derivations as Trees
LIN6932 Spring 2007
15
CFGs more formally
A context-free grammar has 4 parameters (“is a 4tuple”)
1) A set of non-terminal symbols (“variables”) N
2) A set of terminal symbols  (disjoint from N)
3) A set of productions P, each of the form
–
–
A -> 
Where A is a non-terminal and  is a string of symbols
from the infinite set of strings (  N)*
4) A designated start symbol S
LIN6932 Spring 2007
16
Defining a CF language via
derivation
A string A derives a string B if
A can be rewritten as B via some series of rule applications
More formally:
If A -> is a production of P
 and  are any strings in the set (  N)*
Then we say that
–
–
A directly derives 
Or A  
Derivation is a generalization of direct derivation
Let 1, 2, … m be strings in (  N)*, m>= 1, s.t.
–
–
1 2, 2 3… m-1 m
We say that 1derives m or 1* m
We then formally define language LG generated by grammar G
–
–
As set of strings composed of terminal symbols derived from S
LG = {w | w is in * and S * w}
LIN6932 Spring 2007
17
Parsing
Parsing is the process of taking a string and a
grammar and assigning correct trees to input strings
Correct tree: a tree that covers all and only the
elements of the input and has an S node at the top
LIN6932 Spring 2007
18
Context?
The notion of context in CFGs has nothing to do with the
ordinary meaning of the word context in language.
All it means is that the non-terminal on the left-hand side of a
rule is out there all by itself (free of context)
A -> B C
Means that I can rewrite an A as a B followed by a C
regardless of the context in which A is found
LIN6932 Spring 2007
19
Key Constituents (English)
Sentences
Noun phrases
Verb phrases
Prepositional phrases
LIN6932 Spring 2007
20
Sentence-Types
Declaratives: A plane left
S -> NP VP
Imperatives: Leave!
S -> VP
Yes-No Questions: Did the plane leave?
S -> Aux NP VP
WH Questions: When did the plane leave?
S -> WH Aux NP VP
LIN6932 Spring 2007
21
NPs
NP -> Pronoun
I came, you saw it, they conquered
NP -> Proper-Noun
Los Angeles is west of Texas
NP -> Det Noun
The president
NP -> Nominal
Nominal -> Noun Noun
A morning flight to Denver
LIN6932 Spring 2007
22
PPs
PP -> Preposition NP
From LA
To Boston
On Tuesday
With lunch
LIN6932 Spring 2007
23
Recursion
We’ll have to deal with rules such as the following
where the non-terminal on the left also appears
somewhere on the right (directly).
NP -> NP PP
VP -> VP PP
[[The flight] [to Boston]]
[[departed Miami] [at noon]]
LIN6932 Spring 2007
24
Recursion
Of course, this is what makes syntax interesting
flights from Denver
Flights from Denver
Flights from Denver
Flights from Denver
Flights from Denver
Flights from Denver
Etc.
to
to
to
to
to
Miami
Miami
Miami
Miami
Miami
in
in
in
in
February
February on a Friday
February on a Friday under $300
February on a Friday under $300 with lunch
LIN6932 Spring 2007
25
Recursion
Of course, this is what makes syntax interesting
[[flights] [from Denver]]
[[[Flights] [from Denver]] [to Miami]]
[[[[Flights] [from Denver]] [to Miami]] [in February]]
[[[[[Flights] [from Denver]] [to Miami]] [in February]] [on a
Friday]]
Etc.
LIN6932 Spring 2007
26
Implications of recursion and
context-freeness
If you have a rule like
VP -> V NP
It only cares that the thing after the verb is an NP. It
doesn’t have to know about the internal affairs of
that NP
LIN6932 Spring 2007
27
The Point
VP -> V NP
I hate
flights from Denver
Flights from Denver
Flights from Denver
Flights from Denver
Flights from Denver
Flights from Denver
lunch
to
to
to
to
to
Miami
Miami
Miami
Miami
Miami
in
in
in
in
February
February on a Friday
February on a Friday under $300
February on a Friday under $300 with
LIN6932 Spring 2007
28
Bracketed Notation
[S [NP [PRO I] [VP [V prefer [NP [NP [Det a] [Nom [N
morning] [N flight]]]]
LIN6932 Spring 2007
29
Coordination Constructions
S -> S and S
John went to NY and Mary followed him
NP -> NP and NP
VP -> VP and VP
…
In fact the right rule for English is
X -> X and X
LIN6932 Spring 2007
30
Other Syntactic stuff
Grammatical Relations
Subject
– I booked a flight to New York
– The flight was booked by my agent.
Object
– I booked a flight to New York
Complement
– I said that I wanted to leave
LIN6932 Spring 2007
31
Problems
Agreement
Subcategorization
Movement (for want of a better term)
LIN6932 Spring 2007
32
Agreement
This dog
Those dogs
*This dogs
*Those dog
This dog eats
Those dogs eat
*This dog eat
*Those dogs eats
LIN6932 Spring 2007
33
Possible CFG Solution
S -> NP VP
NP -> Det Nominal
VP -> V NP
…
SgS -> SgNP SgVP
PlS -> PlNp PlVP
SgNP -> SgDet SgNom
PlNP -> PlDet PlNom
PlVP -> PlV NP
SgVP ->SgV Np
…
LIN6932 Spring 2007
34
CFG Solution for Agreement
It works and stays within the power of CFGs
But its ugly (loss of generalization)
And it doesn’t scale all that well
LIN6932 Spring 2007
35
Subcategorization
Verbs have preferences for the number and syntactic kinds
of constituents they co-occur with
Sneeze: John sneezed
Find: Please find [a flight to NY]NP
Give: Give [me]NP[a cheaper fare]NP
Help: Can you help [me]NP[with a flight]PP
Prefer: I prefer [to leave earlier]TO-VP
Said: You said [United has a flight]S
…
intransitive verb
transitive verb
ditransitive verb
DO IO/Oblique
VP complement
sentential complement
LIN6932 Spring 2007
36
Subcategorization
But not:
*John sneezed the book
*I prefer United has a flight
*Give with a flight
Subcategorization expresses the constraints that a lexical
predicate places on the number and syntactic kinds of
arguments it wants to take (occur with)
So far we have only considerate one type of lexical predicate:
verb
Nouns, adjectives and prepositions also take arguments
LIN6932 Spring 2007
37
So?
So the various CFG rules for VPs overgenerate.
They permit the presence of strings containing verbs and
arguments that don’t go together
For example
VP -> V NP
predicts that
Sneezed the book is a VP since “sneeze” is a verb and
“the book” is a valid NP
LIN6932 Spring 2007
38
Forward Pointer
It turns out that verb subcategorization facts will
provide a key element for semantic analysis
(determining who did what to who in an event).
LIN6932 Spring 2007
39
Possible CFG Solution
VP -> V
VP -> V NP
VP -> V NP PP
…
VP -> IntransV
VP -> TransV NP
VP -> TransVwPP NP PP
…
LIN6932 Spring 2007
40
Movement
Core example
My travel agent booked the flight
LIN6932 Spring 2007
41
Movement
Core example
[[My travel agent]NP [booked [the flight]NP]VP]S
I.e. “book” is a straightforward transitive verb. It expects
a single NP arg as the subject and a single NP arg within
the VP
LIN6932 Spring 2007
42
Movement - Long Distance
Dependencies
What about?
Which flight do you want me to have the travel agent
book __?
The direct object argument to “book” isn’t appearing
in the right place. It is a long way from where it
originally appeared
And note that it is separated from its verb by 2 other
verbs.
LIN6932 Spring 2007
43
CFGs: a summary
CFGs appear to be just about what we need to account for a lot
of basic syntactic structure in English
But there are problems
That can be dealt with adequately, although not elegantly,
by staying within the CFG framework.
There are simpler, more elegant, solutions that take us out of the
CFG framework (beyond its formal power)
Syntactic theories: HPSG, LFG, CCG, Minimalism, etc
LIN6932 Spring 2007
44