
Chapter 9: Parsing with
Context-Free Grammars
Heshaam Faili
[email protected]
University of Tehran
Context-Free Grammars

• Context-Free Grammars are of the form:
    A → β, where A is a single non-terminal and β is a string of terminals and/or non-terminals
• Note that the regular grammars are a proper subset of the context-free grammars.
  • This means that every regular grammar is context-free, but there are context-free grammars that aren't regular
• CFGs only specify what trees look like, not how they should be computationally derived ⇒ we need to parse the sentences
2
Parsing Intro

• Input: a string
• Output: a (single) parse tree
• A useful step in the process of obtaining meaning
• We can view the problem as searching through all possible parses (tree structures) to find the right one

Strategies

• Top-Down (goal-directed) vs. Bottom-Up (data-directed)
• Breadth-First vs. Depth-First
• Adding Bottom-Up to Top-Down: Left-Corner Parsing

Example

• Book that flight!
3
Grammar and Desired Tree

[Figure: the toy grammar and the desired parse tree for "Book that flight!"]
4
Top-Down Parsing

• Expand rules, starting with S and working down to the leaves
  • Replace the left-most non-terminal with each of its possible expansions.
  • The full search is on p. 361, Fig. 10.3
• While we guarantee that any parse in progress will be S-rooted, it will expand non-terminals that can't lead to the existing input
  • e.g., 5 of 6 trees in the third ply (= level of the search space)
  • None of the trees take the properties of the lexical items into account until the last stage
5
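To make the search space concrete, here is a minimal illustrative sketch (not the book's Figure 10.3) of breadth-first top-down expansion over the toy grammar: each ply replaces the left-most non-terminal of every sentential form with all of its possible expansions. Parts of speech (Det, Noun, Verb, Aux, ProperNoun) are treated as leaves here.

# Toy grammar in the style of the chapter (syntactic rules only; parts
# of speech such as Det, Noun, Verb are treated as leaf symbols).
GRAMMAR = {
    "S":       [["NP", "VP"], ["Aux", "NP", "VP"], ["VP"]],
    "NP":      [["Det", "Nominal"], ["ProperNoun"]],
    "Nominal": [["Noun"], ["Noun", "Nominal"]],
    "VP":      [["Verb"], ["Verb", "NP"]],
}

def expand_leftmost(form):
    """Yield every sentential form obtained by expanding the left-most
    non-terminal of `form` with each of its rules."""
    for i, sym in enumerate(form):
        if sym in GRAMMAR:
            for rhs in GRAMMAR[sym]:
                yield form[:i] + rhs + form[i + 1:]
            return          # only the left-most non-terminal is expanded

def plies(start="S", depth=3):
    """Print the top-down search space ply by ply (breadth-first)."""
    ply = [[start]]
    for d in range(depth):
        print(f"ply {d}: {[' '.join(f) for f in ply]}")
        ply = [new for form in ply for new in expand_leftmost(form)]

plies()   # ply 2 holds the six trees of the figure's third ply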
Top-down (breadth-first) parsing

[Figure: the breadth-first top-down search space. Ply 1 is S; ply 2 expands S to NP VP, Aux NP VP, and VP; ply 3 expands the left-most non-terminal of each, e.g. NP to Det Nom or PropN, and VP to V or V NP.]
6
Expansion techniques

• Breadth-First Expansion (shown in figure)
  • All the nodes at each level are expanded once before going to the next (lower) level.
  • This is memory-intensive when many grammar rules are involved
• Depth-First (shown on p. 367, Fig. 10.7)
  • Expand a particular node at a level, only considering an alternate node at that level if the parser fails as a result of the earlier expansion
  • i.e., expand the tree all the way down until you can't expand any more
7
Top-down (depth-first) parsing

[Figure: depth-first search for "Does this flight include a meal?". The parser first tries S → NP VP and expands NP → Det Nom, which fails against "Does"; it backtracks and succeeds with S → Aux NP VP, matching Aux → does and then NP → Det Nom with Det → this.]
8
Top-Down Depth-First Parsing

• There are still some choices that have to be made:
1. Which leaf node should be expanded first?
  • A left-to-right strategy moves through the leaf nodes in a left-to-right fashion
2. Which rule should be applied first?
  • There are multiple NP rules; which should be used first?
  • Can just use the textual order of rules from the grammar
  • There may be reasons to take rules in a particular order (e.g., probabilities)
9
Parsing with an agenda

• Search states are kept in an agenda
  • Search states consist of partial trees and a pointer to the next input word in the sentence
• Based on what we've seen before, apply the next item on the agenda to the current tree
• Add new items to (the front of) the agenda, based on the rules in the grammar which can expand at the (leftmost) node
  • We maintain the depth-first strategy by adding new hypotheses (rules) to the front of the agenda
  • If we added them to the back, we would have a breadth-first strategy
• See figure 10.6, p. 366
10
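The front-versus-back distinction is easy to see in code. Below is a minimal, illustrative agenda loop (not the book's Figure 10.6); `initial_state`, `expand_state`, and `is_goal` are hypothetical stand-ins for the partial-tree machinery. The same search becomes depth-first or breadth-first depending only on where new states are inserted.

from collections import deque

def agenda_search(initial_state, expand_state, is_goal, depth_first=True):
    """Generic agenda-driven search. `expand_state` maps a search state
    (partial tree + input pointer) to its successor states."""
    agenda = deque([initial_state])
    while agenda:
        state = agenda.popleft()          # always take from the front
        if is_goal(state):
            return state
        successors = expand_state(state)
        if depth_first:
            agenda.extendleft(reversed(successors))  # front: depth-first
        else:
            agenda.extend(successors)                # back: breadth-first
    return None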
Bottom-Up Parsing

• Bottom-Up Parsing is input-driven ⇒ start from the words and move up to form a tree
• Here we match one or more nodes on the upper fringe of the parse tree against the right-hand side of a CFG rule, building the left-hand side as a parent node of those nodes.
  • We can also have breadth-first and depth-first approaches
  • The example on the next slide (p. 362, Fig. 10.4) moves in a breadth-first fashion
• While any parse in progress will be tied to the input, many may not lead to an S!
  • e.g., left-most trees in plies 1-4 of Fig. 10.4
11
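As an illustrative counterpart to the top-down sketch (this is not the procedure of Fig. 10.4, just the idea behind it), the fragment below enumerates single bottom-up reductions: any contiguous span of the fringe matching a rule's right-hand side is replaced by the rule's left-hand side. A real parser would keep the trees; this sketch only tracks fringes.

GRAMMAR = [
    ("S", ("NP", "VP")), ("S", ("Aux", "NP", "VP")), ("S", ("VP",)),
    ("NP", ("Det", "Nominal")), ("NP", ("ProperNoun",)),
    ("Nominal", ("Noun",)), ("Nominal", ("Noun", "Nominal")),
    ("VP", ("Verb",)), ("VP", ("Verb", "NP")),
]

def reductions(fringe):
    """Yield every fringe reachable by one bottom-up reduction:
    a span matching some rule's RHS is replaced by its LHS."""
    for lhs, rhs in GRAMMAR:
        n = len(rhs)
        for i in range(len(fringe) - n + 1):
            if tuple(fringe[i:i + n]) == rhs:
                yield fringe[:i] + [lhs] + fringe[i + n:]

# Start from one POS fringe of "Book that flight" ("book" is also a Noun).
for step in reductions(["Verb", "Det", "Noun"]):
    print(step)   # ['VP', 'Det', 'Noun'], then ['Verb', 'Det', 'Nominal']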
Bottom-up parsing

[Figure (p. 362, Fig. 10.4): breadth-first bottom-up search for "Book that flight". Ply 1 tags the words (book as Noun or Verb, that as Det, flight as Noun); later plies build Nominal, NP, and VP nodes over them. Several fringes, e.g. the Noun reading of book, can never be grown into an S.]
12
Comparing Top-Down and Bottom-Up Parsing

• Top-Down:
  • While we guarantee that any parse in progress will be S-rooted, it will expand non-terminals that can't lead to the existing input, e.g., first 4 trees in the third ply.
• Bottom-Up:
  • While any parse in progress will be tied to the input, many may not lead to an S, e.g., left-most trees in plies 1-4 of p. 362, Fig. 10.4.
• So, both pure top-down and pure bottom-up approaches are highly inefficient.
13
Left-Corner Parsing

• Motivation:
  • Both pure top-down and bottom-up approaches are inefficient
  • The correct top-down parse has to be consistent with the left-most word of the input
• Left-corner parsing: a way of using bottom-up constraints as part of a top-down strategy.
  • Left-corner rule: expand a node with a grammar rule only if the current input can serve as the left corner of this rule.
  • Left corner of a rule: the first word along the left edge of a derivation from the rule
• Put the left corners into a table, which can then guide parsing
14
Left-Corner Example

S → NP VP
S → VP
S → Aux NP VP
NP → Det Nominal | ProperNoun
Nominal → Noun Nominal | Noun
VP → Verb | Verb NP
Noun → book | flight | meal | money
Verb → book | include | prefer
Aux → does
ProperNoun → Houston | TWA

Left Corners
S → NP … ⇒ Det, ProperNoun;  S → VP ⇒ Verb;  S → Aux … ⇒ Aux
NP ⇒ Det, ProperNoun
VP ⇒ Verb
Nominal ⇒ Noun
15
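A left-corner table like the one above can be computed mechanically: the direct left corner of a rule is the first symbol of its right-hand side, and the relation is then closed transitively. A small illustrative sketch (category-level, i.e., down to parts of speech rather than words):

GRAMMAR = {
    "S":       [["NP", "VP"], ["VP"], ["Aux", "NP", "VP"]],
    "NP":      [["Det", "Nominal"], ["ProperNoun"]],
    "Nominal": [["Noun", "Nominal"], ["Noun"]],
    "VP":      [["Verb"], ["Verb", "NP"]],
}

def left_corner_table(grammar):
    """Map each non-terminal to the pre-terminal categories that can
    begin one of its derivations (transitive left corners)."""
    # Direct left corners: the first symbol of each right-hand side.
    direct = {A: {rhs[0] for rhs in rhss} for A, rhss in grammar.items()}
    # Close transitively: if B is a left corner of A, so are B's corners.
    changed = True
    while changed:
        changed = False
        for A, corners in direct.items():
            for B in list(corners):
                new = direct.get(B, set()) - corners
                if new:
                    corners |= new
                    changed = True
    # Report only categories that the grammar does not expand further.
    return {A: sorted(c for c in corners if c not in grammar)
            for A, corners in direct.items()}

print(left_corner_table(GRAMMAR))
# {'S': ['Aux', 'Det', 'ProperNoun', 'Verb'], 'NP': ['Det', 'ProperNoun'], ...}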
Other problems: Left Recursion

• Left-corner parsers are still guided by top-down parsing
• Consider rules like:
    S → S and S
    NP → NP PP
  • A top-down left-to-right depth-first parser could apply a rule to expand a node (e.g., S), and then apply that same rule again, and again, ad infinitum.
• Left Recursion: a grammar is left-recursive if a non-terminal leads to a derivation that includes itself as its leftmost immediate or non-immediate child (i.e., along its leftmost branch).
• PROBLEM: Top-Down parsers may not terminate on a left-recursive grammar
16
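One standard remedy is a grammar rewrite that eliminates immediate left recursion: rules A → A α | β become A → β A′ and A′ → α A′ | ε. A minimal sketch (it handles only immediate, not indirect, left recursion):

def remove_immediate_left_recursion(lhs, rhss):
    """Rewrite A -> A a1 | ... | A am | b1 | ... | bn   as
               A  -> b1 A' | ... | bn A'
               A' -> a1 A' | ... | am A' | epsilon."""
    recursive = [rhs[1:] for rhs in rhss if rhs and rhs[0] == lhs]
    other = [rhs for rhs in rhss if not rhs or rhs[0] != lhs]
    if not recursive:
        return {lhs: rhss}
    new = lhs + "'"
    return {
        lhs: [beta + [new] for beta in other],
        new: [alpha + [new] for alpha in recursive] + [[]],  # [] = epsilon
    }

print(remove_immediate_left_recursion("NP", [["NP", "PP"], ["Det", "Nominal"]]))
# {'NP': [['Det', 'Nominal', "NP'"]], "NP'": [['PP', "NP'"], []]}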
Other problems: Repeated Parsing of Subtrees

• When the parser backtracks to an alternative expansion of a non-terminal, it loses all the parses of subconstituents that it built.
  • There is a good chance that it will rebuild the parses of some of those constituents again.
  • This can occur repeatedly.
• a flight from Indianapolis to Houston on TWA
  • NP → Det Nom
    • Will build an NP for a flight, before failing when the parser realizes the input PPs aren't covered
  • NP → NP PP
    • Will again build an NP for a flight, before failing when the parser realizes the two remaining PPs in the input aren't covered
17
Other problems: Ambiguity

• Repeated parsing of subtrees is even more of a problem for ambiguous sentences
• 3 kinds of ambiguities: attachment, coordination, noun-phrase bracketing
  • PP attachment:
    • NP or VP: I shot an elephant in my pajamas.
  • Coordination:
    • [old men] and women vs. old [men and women]
  • NP bracketing: the meal [on flight 286] [from SF] [to Denver]
• Parsers have to disambiguate between lots of valid parses or return all parses
• Will repeat a lot of work parsing the commonalities of each ambiguity
18
Ambiguity

[Figure: alternative parse trees for an ambiguous sentence.]
19
Addressing the problems: Chart Parsing

• More or less a standard method for carrying out parsing; keeps tables of constituents that have been parsed earlier, so it doesn't duplicate the work.
• Each possible sub-tree is represented exactly once.
  • This makes it a form of dynamic programming (which we saw with minimum edit distance and the Viterbi algorithm)
• Combines bottom-up and top-down parsing
• Rather simple and elegant in the way it works!
20
Earley Chart Parsing Representation

• The parser uses a representation for parse state based on dotted rules:  S → NP • VP
• Dotted rules distinguish what has been seen so far from what has not been seen (i.e., the remainder).
  • The constituents seen so far are to the left of the dot in the rule; the remainder is to the right.
• Parse information is stored in a chart, represented as a graph.
  • The nodes represent word positions.
  • The labels represent the portion (using the dot notation) of the grammar rule that spans those word positions.
  • In other words, at each position there is a set of labels (each of which is a dotted rule, also called a state), indicating the partial parse trees produced until then.
21
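Dotted rules map naturally onto a small data structure. A minimal illustrative representation (the class and field names are ours, not the book's):

from dataclasses import dataclass

@dataclass(frozen=True)
class DottedRule:
    lhs: str              # e.g. "S"
    rhs: tuple            # e.g. ("NP", "VP")
    dot: int              # how many RHS symbols have been seen
    start: int            # chart position where the rule began
    end: int              # chart position reached so far

    def is_complete(self):
        return self.dot == len(self.rhs)

    def next_symbol(self):
        return None if self.is_complete() else self.rhs[self.dot]

    def __str__(self):
        seen, rest = self.rhs[:self.dot], self.rhs[self.dot:]
        return (f"[{self.start}] {self.lhs} -> "
                f"{' '.join(seen)} . {' '.join(rest)} [{self.end}]")

print(DottedRule("S", ("NP", "VP"), 1, 0, 2))   # [0] S -> NP . VP [2]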
Example: Chart for "a dog"

• Given a trivial grammar:
    NP → D N
    D → a
    N → dog
• Here's the chart for the complete parse of "a dog":
    [0] D → a •     [1]   (scan)
    [1] N → dog •   [2]   (scan)
    [0] NP → • D N  [0]   (predict)
    [0] NP → D • N  [1]   (complete)
    [0] NP → D N •  [2]   (complete)
22
More Chart Parsing Terminology

• A state is complete if it has the dot at the right-hand end of its rule. Otherwise, it is incomplete.
• At each position, there is a list (actually, a queue) of states.
  • The parser moves through the N+1 sets of states in the chart left-to-right, processing the states in each set in order.
  • States are stored in a FIFO (first-in first-out) queue at each position
• The processing applies one of three operators, each of which takes a state and produces new states added to the chart.
  • Scanner, Predictor, Completer
  • There is no backtracking.
23
Earley Parsing Algorithm

• The parsing algorithm is just a few lines long, as can be seen on p. 381, Figure 10.16
• In the top-level loop, for each position, for each state, it calls the predictor, or else the scanner, or else the completer.
  • The algorithm never backtracks and never removes states, so we don't redo any work
  • The goal is to have S → α • as the last chart entry, i.e. the dot has moved over the entire input to derive an S
24
The Earley Algorithm

[Figure: the Earley algorithm pseudocode (p. 381, Fig. 10.16).]
25
The 3 Operators: Predictor, Scanner, Completer

procedure PREDICTOR((A → α • B β, [i, j]))
    for each (B → γ) in grammar do
        Enqueue((B → • γ, [j, j]), chart[j])
    end

procedure SCANNER((A → α • B β, [i, j]))
    if B is a part-of-speech for word[j] then
        Enqueue((B → word[j] •, [j, j+1]), chart[j+1])

procedure COMPLETER((B → γ •, [j, k]))
    for each (A → α • B β, [i, j]) in chart[j] do
        Enqueue((A → α B • β, [i, k]), chart[k])
    end
26
Prediction

procedure PREDICTOR((A → α • B β, [i, j]))
    for each (B → γ) in grammar do
        Enqueue((B → • γ, [j, j]), chart[j])
    end

• Predicting is the task of saying what kinds of input we expect to see
  • Add a rule to the chart saying that we have not seen γ, but when we do, it will form a B
  • The rule covers no input, so it goes from j to j
• Such rules provide the top-down aspect of the algorithm
27
Scanning

procedure SCANNER((A → α • B β, [i, j]))
    if B is a part-of-speech for word[j] then
        Enqueue((B → word[j] •, [j, j+1]), chart[j+1])

• Scanning reads in lexical items
  • We add a dotted rule indicating that a word has been seen between j and j+1
  • This is then added to the following (j+1) chart
  • Such a completed dotted rule can be used to complete other dotted rules
• These rules also show how the Earley parser has a bottom-up component
28
Completion

procedure COMPLETER((B → γ •, [j, k]))
    for each (A → α • B β, [i, j]) in chart[j] do
        Enqueue((A → α B • β, [i, k]), chart[k])
    end

• Completion combines two rules in order to move the dot, i.e., to indicate that something has been seen
  • A rule covering B has been seen, so any rule A which refers to B in its RHS moves the dot
  • Instead of spanning from i to j, A now spans from i to k, which is where B ended
• Once the dot is moved, the rule will not be created again
29
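As a runnable companion to the pseudocode, here is a compact Earley recognizer sketch. The grammar and lexicon follow the chapter's toy grammar, but the data layout, the dummy GAMMA start rule, and the duplicate-suppressing enqueue are our implementation choices, not the book's Figure 10.16.

GRAMMAR = {
    "S":       [("NP", "VP"), ("Aux", "NP", "VP"), ("VP",)],
    "NP":      [("Det", "Nominal"), ("ProperNoun",)],
    "Nominal": [("Noun",), ("Noun", "Nominal")],
    "VP":      [("Verb",), ("Verb", "NP")],
}
LEXICON = {  # part-of-speech -> the words it can tag
    "Det": {"that", "this", "a"},
    "Noun": {"book", "flight", "meal"},
    "Verb": {"book", "include", "prefer"},
    "Aux": {"does"},
    "ProperNoun": {"houston", "twa"},
}

def earley_recognize(words):
    """A state is (lhs, rhs, dot, start); chart[k] holds states ending at k."""
    n = len(words)
    chart = [[] for _ in range(n + 1)]
    seen = [set() for _ in range(n + 1)]

    def enqueue(state, k):            # never add a duplicate state
        if state not in seen[k]:
            seen[k].add(state)
            chart[k].append(state)

    enqueue(("GAMMA", ("S",), 0, 0), 0)       # dummy start state
    for j in range(n + 1):
        idx = 0
        while idx < len(chart[j]):            # chart[j] grows as we process it
            lhs, rhs, dot, start = chart[j][idx]
            idx += 1
            if dot < len(rhs) and rhs[dot] in GRAMMAR:        # PREDICTOR
                for prod in GRAMMAR[rhs[dot]]:
                    enqueue((rhs[dot], prod, 0, j), j)
            elif dot < len(rhs):                              # SCANNER
                if j < n and words[j] in LEXICON[rhs[dot]]:
                    enqueue((rhs[dot], (words[j],), 1, j), j + 1)
            else:                                             # COMPLETER
                for alhs, arhs, adot, astart in chart[start]:
                    if adot < len(arhs) and arhs[adot] == lhs:
                        enqueue((alhs, arhs, adot + 1, astart), j)

    return ("GAMMA", ("S",), 1, 0) in seen[n]

print(earley_recognize("book that flight".split()))   # -> True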
Example ("Book that flight")

[Figures (slides 30-34): the Earley chart for "Book that flight", filled in operator by operator across five slides.]
30-34
Earley parsing

• The Earley algorithm is efficient, running in polynomial time.
• Technically, however, it is a recognizer, not a parser
  • To make it a parser, each state needs to be augmented with a pointer to the states that its rule covers
  • For example, a VP would point to the state where its V was completed and the state where its NP was completed
35
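In the recognizer sketch above, this augmentation is a small change: give every state a fifth field holding backpointers, extend it in the completer, and read the tree off at the end. Illustrative only:

# Each state gains a fifth field: a tuple of backpointers, one per
# completed RHS symbol. PREDICTOR and SCANNER create states with an
# empty tuple; COMPLETER extends the parent's tuple with the child:
#     enqueue((alhs, arhs, adot + 1, astart, aptrs + (child,)), j)

def build_tree(state):
    """Read a parse tree off the backpointers of a completed state."""
    lhs, rhs, dot, start, backptrs = state
    if not backptrs:              # a scanned word: rhs == (word,)
        return (lhs, rhs[0])
    return (lhs,) + tuple(build_tree(child) for child in backptrs)

# e.g. build_tree(("Det", ("that",), 1, 1, ())) == ("Det", "that")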
Other Dynamic Programming methods

• CYK (Cocke-Kasami-Younger) Parser
  • Uses CNF grammar rules
• Chart Parsing
  • A modified version of Earley parsing with dynamic ordering of states in the algorithm
36
CYK Parsing

• The DP method that uses a grammar in Chomsky Normal Form (CNF):
    A → B C
    A → m
• Any CFG can be converted to CNF, so we don't lose anything
  • A → B : unit productions (can be eliminated by adding A → γ for every rule B → γ)
• Like other DP methods, a simple (n+1)×(n+1) matrix is used to encode the structure of the sentence (n: sentence length)
  • The indices are the gaps between words:
    [0 Book 1 that 2 flight 3]
  • [i, j] is the set of non-terminals that represent all the constituents that span positions i through j of the input
37
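As a small illustration of one conversion step (unit-production elimination only; introducing new symbols for long or mixed right-hand sides is a separate step not shown), using the dictionary grammar format of the earlier sketches:

def remove_unit_productions(grammar):
    """Eliminate rules of the form A -> B (single non-terminal RHS)
    by replacing them with B's non-unit productions."""
    out = {}
    def non_unit_rules(a, visited):
        visited = visited | {a}
        for rhs in grammar[a]:
            if len(rhs) == 1 and rhs[0] in grammar:      # unit production
                if rhs[0] not in visited:                # guard against cycles
                    yield from non_unit_rules(rhs[0], visited)
            else:
                yield rhs
    for a in grammar:
        out[a] = list(non_unit_rules(a, set()))
    return out

print(remove_unit_productions({
    "S": [["VP"]], "VP": [["Verb", "NP"], ["book"]],
}))
# {'S': [['Verb', 'NP'], ['book']], 'VP': [['Verb', 'NP'], ['book']]}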
CYK Parsing, cont'd

• Since our grammar is in CNF, the non-terminal entries in the table have exactly two daughters in the parse.
• For each constituent represented by an entry [i, j] in the table, there must be a position in the input, k, where it can be split into two parts such that i < k < j.
  • Given such a k, the first constituent [i, k] must lie to the left of entry [i, j] somewhere along row i, and the second entry [k, j] must lie beneath it, along column j
38
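The table-filling loop itself is short. A minimal CYK recognizer sketch; the grammar layout and the extra rules (e.g., S → Verb NP and NP → flight, standing in for S → VP and the Nominal unit chain after CNF conversion) are our illustrative choices:

from collections import defaultdict

# CNF grammar as parent sets: binary rules A -> B C and lexical A -> w.
BINARY = defaultdict(set)   # (B, C) -> {A, ...}
LEXICAL = defaultdict(set)  # word   -> {A, ...}
for a, b, c in [("S", "NP", "VP"), ("S", "Verb", "NP"),
                ("NP", "Det", "Nominal"), ("VP", "Verb", "NP")]:
    BINARY[(b, c)].add(a)
for a, w in [("Verb", "book"), ("Det", "that"),
             ("Nominal", "flight"), ("NP", "flight")]:
    LEXICAL[w].add(a)

def cyk_recognize(words):
    n = len(words)
    # table[i][j]: non-terminals spanning positions i..j of the input.
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for j in range(1, n + 1):
        table[j - 1][j] = set(LEXICAL[words[j - 1]])
        for i in range(j - 2, -1, -1):         # widen the span leftwards
            for k in range(i + 1, j):          # try every split point
                for b in table[i][k]:
                    for c in table[k][j]:
                        table[i][j] |= BINARY[(b, c)]
    return "S" in table[0][n]

print(cyk_recognize("book that flight".split()))   # -> True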
CYK Algorithm

[Figure (slides 39-40): pseudocode for the CYK algorithm.]
39-40

CYK example (CNF Grammar)

[Figure: the example grammar converted to CNF.]
41

CYK example (Book the flight through Houston)

[Figure: the filled CYK table for "Book the flight through Houston".]
42
CYK in practice

• Theoretically, CYK has no major problems
• But the resulting parse trees are not what syntacticians expect (because of the CNF form)
  • This complicates the mapping from syntax to semantics
• Post-processing is needed to convert the result back to a more acceptable form
43
Chart Parser

• In both the CKY and Earley algorithms, the order in which events occur (adding entries to the table, reading words, making predictions, etc.) is statically determined by the procedures that make up these algorithms.
• Unfortunately, dynamically determining the order in which events occur based on the current information is often necessary
• Chart Parsing facilitates just such dynamic determination of the order in which chart entries are processed
  • Using an agenda
44
Chart Parser

• Fundamental rule: generalizes the ideas in CYK and Earley:
  • If the chart contains two edges A → α • B β, [i, j] and B → γ •, [j, k], then we should add the new edge A → α B • β, [i, k] to the chart
• Prediction can be top-down or bottom-up
45
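The fundamental rule is the completer step written as a standalone edge combination. A sketch, reusing the DottedRule class from the Earley representation sketch earlier:

def fundamental_rule(active, complete):
    """Combine an active edge A -> alpha . B beta [i, j] with a
    complete edge B -> gamma . [j, k] into A -> alpha B . beta [i, k].
    Returns None when the edges don't meet or don't match."""
    if (active.end == complete.start
            and active.next_symbol() == complete.lhs
            and complete.is_complete()):
        return DottedRule(active.lhs, active.rhs, active.dot + 1,
                          active.start, complete.end)
    return None

np = DottedRule("NP", ("Det", "Nominal"), 1, 0, 1)   # NP -> Det . Nominal [0,1]
nom = DottedRule("Nominal", ("Noun",), 1, 1, 2)      # Nominal -> Noun . [1,2]
print(fundamental_rule(np, nom))                     # [0] NP -> Det Nominal . [2]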
[Figure (slide 46).]
46

Prediction in Chart Parser

[Figure (slide 47).]
47
Inadequacies of parsing with plain CFGs

• While the Earley algorithm works well for CFGs, we have to, at some point, question the validity of using plain CFGs
• We'll show this by looking at two phenomena (although there are many more):
  • Subject-verb agreement
  • Subcategorization frames
48
Modeling Subject-Verb Agreement in CFGs

The flights leave vs. The flight leaves.

S → 3sgNP 3sgVP
S → PluralNP PluralVP
3sgVP → 3sgVerb              # flies
      | 3sgVerb NP           # wants + a flight
      | 3sgVerb NP PP        # leaves + Boston + in the morning
      | 3sgVerb PP           # leaves + on Thursday
3sgNP → Pronoun              # I
      | ProperNoun           # Denver
      | Det 3sgNominal       # a + flight
3sgNominal → Noun 3sgNominal # morning + flight
           | 3sgNoun         # flight
49
Problems with Modeling Agreement in CFGs

• You can see how messy this is, resulting in a massive increase in the size of the grammar.
  • Of course, once we add in determiner-noun agreement (e.g., "a flight" vs. "(the) flights"), it would get even larger.
  • Other languages which have gender agreement (e.g., French) will make it even worse.
• Furthermore, we miss generalizations: all transitive verbs have an NP object, regardless of whether the verb is 3rd singular or not
• We will need to go to feature-based grammars to address these problems.
50
Subcategorization Frames in CFGs

V1  ∅               eat, sleep           I want to eat
V2  NP              prefer, find, leave  Find [NP the flight from Pittsburgh to Boston]
V3  NP NP           show, give, find     Show [NP me] [NP the airlines with flights from Pittsburgh]
V4  PPfrom PPto     fly, travel          I would like to fly [PP from Boston] [PP to Philadelphia]
V5  NP PPwith       help, load           Can you help [NP me] [PP with a flight]
V6  VPto            prefer, want, need   I would prefer [VP to go by United Airlines]
V7  VPbare_stem     can, would, might    I can [VP go from Boston]
V8  S               mean, imply          Does this mean [S American has a hub in Boston]
51
CFG Grammar For Subcategorization

VP → V1
   | V2 NP
   | V3 NP NP
   | V4 PPfrom PPto
   | V5 NP PPwith
   | V6 VPto
   | V7 VPbare_stem
   | V8 S
V1 → eat | sleep, …
52
Problems with Modeling Subcat in CFGs

• Again, this results in an explosion in the number of rules, especially when a full set of subcategorization frames is included.
  • If we combine these rules with the agreement rules, it gets even worse
  • Also, nouns, adjectives, and prepositions can subcategorize for complements.
  • And again, we have no way to state what these rules have in common
• So, we turn to feature-based grammars
53