Transcript ppt

CMSC 723 / LING 645: Intro to
Computational Linguistics
November 10, 2004
Lecture 10 (Dorr):
CFG’s (Finish Chapter 9)
Parsing (Chapter 10)
Prof. Bonnie J. Dorr
Dr. Christof Monz
TA: Adam Lee
What Issues Arise in Creating
Grammars?
Agreement
Subcategorization
Movement
Agreement
 Simple Examples:
–
–
–
–
This dog
Those dogs
*Those dog
*This dogs
 More complicated examples:
–
–
–
–
–
Do any flights stop in Chicago?
Does Delta fly from Atlanta to Boston?
Do I get dinner on this flight?
What flights leave in the morning?
* What flight leave in the morning?
Agreement
 Nouns: Nominal → Noun | Noun Noun
– NominalSg → NounSg | NounSg NounSg
– NominalPl → NounPl | NounPl NounPl
 Subj-Verb Agreement: S → Aux NP VP
– S → Aux3sg NP3sg VP
– S →Auxnon3sg NPnon3sg VP
 Lexical items:
–
–
–
–
NounSg → flight
NounPl → flights
Aux3sg → does | has | can …
Auxnon3sg → do | have | can …
Agreement Features
Propagate Downard …
 NP3sg → (Det) (Card) (Ord) (Quant) (AP) NominalSg
 NPnon3sg → (Det) (Card) (Ord) (Quant) (AP) NominalPl
What’s wrong with this picture?
Combinatorial explosion
Other feature-based expansions?
Loss of Generalization
Subcategorization
Verbs have preferences for the kinds of
constituents they co-occur with
For example:
–
–
–
–
VP → Verb (disappear)
VP → Verb NP (prefer a morning flight)
VP → Verb NP PP (leave Boston in the morning)
VP → Verb PP (leaving on Thursday)
But not: *I disappeared the cat.
Sentential complements
Example:
– You [VP [V said [S there were two flights that
were the cheapest]]]
– You [VP [ V said [S you had a two hundred sixty
six dollar fare]]]
– [VP [ V Tell][NP me ] [S how to get from the
airport in Philadelphia to downtown]]
Rule:
– VP → Verb S
Other VP Constituents
A VP can be the constituent of another VP:
VP → Verb VP
– I want [VP to fly from Milwaukee to Orlando]
– I’m trying [VP to find a flight that goes from
Pittsburgh to Denver next Friday]
Verbs can also be followed by particles to
form a phrasal verb: VP → Verb Particle
– take off
Why do we need
Subcategorization?
 Important observation: while a verb phrase can
have many possible kinds of constituents, not
every verb is compatible with every verb phrase
 Example: verb want can be used
– With a NP complement: I want a flight
– With an infinitive VP complement: I want to fly to
Dallas
 But verb find cannot take such a VP complement
– *I find to fly to Dallas
Subcategorization
Frame
Ø
NP
NP NP
PPfrom PPto
NP PPwith
VPto
VPbrst
S
Verb
eat, sleep, …
prefer, find, leave, ...
show, give, …
fly, travel, …
help, load, …
prefer, want, need, …
can, would, might, …
mean
Example
I want to eat
Find [NP the flight from Pittsburgh to Boston]
Show [NP me] [NP airlines with flights from Pittsburgh]
I would like to fly [pp from Boston] [pp to Philadelphia]
Can you help [NP me] [pp with a flight]
I would prefer [VPto to go by United airlines]
I can [VPbrst go from Boston]
Does this mean [S AA has a hub in Boston]?
Auxiliaries
 Examples:
–
–
–
–
Modals: can, could, may, might
Perfect: have
Progressive: be
Passive: be
 What are their subcategories?
 Ordering: modal < perfect < progressive <
passive
e.g, might have been prevented
Movement
I looked up his grade.
I looked his grade up.
John put the book on the table.
What did John put on the table?
Long distance dependencies.
Grammar Equivalence and
Normal Form
Weak equivalence
Strong equivalence
Chomsky Normal Form (CNF)
A → B C or A → a
CNF Example
Convert to Chomsky-Normal Form
(CNF):
S→aYX
X→aX|b
Y→Ya|b
Why do care about grammar?
We need grammars for parsing!
Note: Parsing is the topic we will cover next.
What Linguistic Representations
are necessary for parsing?
 Words
– Classes: groups of words which behave similarly
– Function/Content
– P.O.S.: Nouns, Verbs, Adjectives, Prepositions
 Constituents:
– Groupings of words into larger units which behavior
similarly and have a particular p.o.s. as their head
– Phrases: NP headed by Noun... VP, PP, ...
Analyzing Language in Terms of
these Representations
Morphological parsing:
– analyze words into morphemes and affixes
– rule-based, FSAs, FSTs
POS Tagging
Syntactic parsing:
– identify component parts and how related
– to see if a sentence is grammatical
– to assign a semantic structure
Syntactic Parsing
 Declarative formalisms like CFGs define the
legal strings of a language but don’t specify how
to recognize or assign structure to them
 Parsing algorithms specify how to recognize the
strings of a language and assign each string one
or more syntactic structures
 Parse trees useful for grammar checking,
semantic analysis, MT, QA, information
extraction, speech recognition…and almost every
task in NLP
CFG for Fragment of English:
G0
Parse Tree for
‘Book that flight’ using G0
Parsing as a Search Problem
 Searching FSAs
– Finding the right path through the automaton
– Search space defined by structure of FSA
 Searching CFGs
– Finding the right parse tree among all possible parse
trees
– Search space defined by the grammar
 Constraints provided by the input sentence and
the automaton or grammar
Two Search Strategies
Top-Down
– Search for tree starting from S until input words
covered.
Bottom-Up
– Start with words and build upwards toward S
Top-Down Parser
 Builds from the root S node to the leaves
 Assuming we build all trees in parallel:
–
–
–
–
Find all trees with root S
Next expand all constituents in these trees/rules
Continue until leaves are pos
Candidate trees failing to match pos of input string are
rejected
 Top-Down: Rationalist Tradition
– Expectation driven
– Goal: Build tree for input starting with S
Top-Down Search Space for G0
Bottom-Up Parsing
 Parser begins with words of input and builds up trees,
applying G0 rules whose right-hand sides match
 Book that flight
N
Det
N
V
Det
N
Book that
flight
Book that
flight
– ‘Book’ ambiguous
– Parse continues until an S root node reached or no further node
expansion possible
 Bottom-Up: Empiricist Tradition
– Data driven
– Primary consideration: Lowest sub-trees of final tree must hook
up with words in input.
Expanding Bottom-Up Search
Space for ‘Book that flight’
Comparing Top-Down
and Bottom-Up
 Top-Down parsers: never explore illegal parses
(e.g. can’t form an S) -- but waste time on trees
that can never match the input
 Bottom-Up parsers: never explore trees
inconsistent with input -- but waste time exploring
illegal parses (no S root)
 For both: how to explore the search space?
– Pursuing all parses in parallel or …?
– Which node to expand next?
– Which rule to apply next?
Search Strategy
and Search Control
 Parallel:
– Explore all possible trees in parallel
 Depth-first search:
– Agenda of search states: expand search space incrementally,
exploring most recently generated state (tree) each time
– When you reach a state (tree) inconsistent with input, backtrack
to most recent unexplored state (tree)
 Which node to expand?
– Leftmost
 Which grammar rule to use?
– Order in the grammar
Basic Algorithm for Top-Down,
Depth-First, Left-Right Strategy
 Initialize agenda with ‘S’ tree and point to first
word and make this current search state (cur)
 Loop until successful parse or empty agenda
– Apply all applicable grammar rules to leftmost
unexpanded node of cur
• If this node is a POS category and matches that of
the current input, push this onto agenda
• Else, push new trees onto agenda
– Pop new cur from agenda
Basic Top-Down Parser
Top-Down Depth-First
Derivation Using G0
Example: Does this flight include a meal?
Example continued …
Augmenting Top-Down Parsing
with Bottom-Up Filtering
 We saw: Top-Down, depth-first, L-to-R parsing
– Expands non-terminals along the tree’s left edge down
to leftmost leaf of tree
– Moves on to expand down to next leftmost leaf…
 In a successful parse, current input word will be
the first word in derivation of the unexpanded
node that the parser is currently processing
 So….lookahead to left-corner of the tree in
– B is a left-corner of A if A =*=> B
– Build table with left-corners of all non-terminals in
grammar and consult before applying rule
Left Corners
Left-Corner Table for G0
Category
S
NP
Nom
VP
Previous
Example:
Left Corners
Det, PropN, Aux, V
Det, PropN
N
V
Summing Up Parsing Strategies
 Parsing is a search problem which may be
implemented with many search strategies
 Top-Down vs. Bottom-Up Parsers
– Both generate too many useless trees
– Combine the two to avoid over-generation: TopDown Parsing with Bottom-Up look-ahead
 Left-corner table provides more efficient look-
ahead
– Pre-compute all POS that can serve as the leftmost
POS in the derivations of each non-terminal category
Three Critical Problems
in Parsing
Left Recursion
Ambiguity
Repeated Parsing of Sub-trees
Left Recursion
Depth-first search will never terminate if
grammar is left recursive: A→A B β
Examples:
NP → NP PP, VP → VP PP, S → S and S, NP → NP and NP

*
Indirect Left Recursion: A→A
Bβ
Example: NP → Det Nominal,
Det → NP ’s
NP
NP
NP
Det
Nominal
’s
Solutions to Left Recursion
Rule ordering
Don't use recursive rules
Limit depth of recursion in parsing to some
analytically or empirically set limit
Don't use top-down parsing
Rule Ordering
Bad:
– NP → NP PP
– NP → Det Nominal
Better alternative:
– First: NP → Det Nominal
– Then: NP → NP PP
Grammar Rewriting
 Rewrite left-recursive grammar as weakly
equivalent non-recursive one.
 Can be done:
– By Hand (ick) or …
– Automatically
 Example
Rewrite: NP → NP PP, NP → Det Nominal
As: NP → Det Nominal Stuff, Stuff → PP Stuff,
Stuff → ε
Problems with
Grammar Rewriting
• Original:
[NP [NP the book]
[PP on [NP [NP the table]
[PP in [NP [NP the yard]
[PP of [NP the house]]]]]]]
• Becomes:
[NP the book
[Stuff
[PP on [NP the table
[Stuff
[PP in [NP the yard
[Stuff
[PP of [NP the house [Stuff]]]
[Stuff]]]]
[Stuff]]]]
[Stuff]]]
Depth Bound
Set an arbitrary bound
Set an analytically derived
bound
Run tests and derive reasonable
bound empirically
Use iterative deepening
Ambiguity
Local Ambiguity
– Leads to hypotheses that are locally reasonable
but eventually lead nowhere
– “Book that flight”
Global Ambiguity
– Leads to multiple parses for the same input
– “I shot an elephant in my pajamas”
More Ambiguity Examples
Multiple legal structures
– Attachment (e.g. I saw a man on a hill with
telescope)
– Coordination (e.g. younger cats and dogs)
– NP bracketing (e.g. Spanish language
teachers)
Two Parse Trees for
Ambiguous Sentence
More Ambiguity: ‘Can you book
TWA flights?’
A Correct Parse for ‘Show me the meal
on Flight UA 386 from San Francisco
to Denver’
Inefficient Re-Parsing of Subtrees
Invariants
Despite ambiguity, there are invariants
Sub-components of final parse tree are re-
analyzed unnecessarily
Except for top-most component, every part
of final tree is derived more than once.
What’s the solution?
 Key to efficient processing is reuse
 Fill table with solutions to sub-problems for later
use.
 We want an algorithm that:
–
–
–
–
Does not do repeated work
Does top-down search with bottom-up filtering
Solves left-recursion problem
Solves an exponential problem
Dynamic Programming
 Create table of solutions to sub-problems (e.g.
subtrees) as parse proceeds
 Look up subtrees for each constituent rather than
re-parsing
 Since all parses implicitly stored, all available
for later disambiguation
 Examples: Cocke-Younger-Kasami (CYK)
(1960), Graham-Harrison-Ruzzo (GHR) (1980)
and Earley (1970) algorithms
Earley Algorithm
 Uses dynamic programming to do parallel topdown search in (worst case) O(N3) time
 First, left-to-right pass fills out a chart with N+1
states
– Think of chart entries as sitting between words in
the input string keeping track of states of the parse at
these positions
– For each word position, chart contains set of states
representing all partial parse trees generated to date.
E.g. chart[0] contains all partial parse trees
generated at the beginning of the sentence
Chart Entries
Represent three types of constituents:
predicted constituents
in-progress constituents
completed constituents
Progress in parse represented
by Dotted Rules
 Position of • indicates type of constituent
 0 Book 1 that 2 flight 3
• S → • VP, [0,0] (predicted)
• NP → Det • Nom, [1,2] (in progress)
• VP →V NP •, [0,3] (completed)
 [x,y] tells us what portion of the input is spanned
so far by this rule
 Each State si:
<dotted rule>, [<back pointer>,<current position>]
0
Book 1 that 2 flight 3
S → • VP, [0,0]
– First 0 means S constituent begins at the start of input
– Second 0 means the dot here too
– So, this is a top-down prediction
NP → Det • Nom, [1,2]
–
–
–
–
the NP begins at position 1
the dot is at position 2
so, Det has been successfully parsed
Nom predicted next
Book 1 that 2 flight 3
(continued)
0
VP → V NP •, [0,3]
– Successful VP parse of entire input
Successful Parse
Final answer found by looking at last entry
in chart
If entry resembles S →  • [nil,N] then
input parsed successfully
Chart will also contain record of all
possible parses of input string, given the
grammar
Parsing Procedure for the
Earley Algorithm
 Move through each set of states in order,
applying one of three operators to each state:
– predictor: add predictions to the chart
– scanner: read input and add corresponding state to
chart
– completer: move dot to right when new constituent
found
 Results (new states) added to current or next set
of states in chart
 No backtracking and no states removed: keep
complete history of parse
States and State Sets
 Dotted Rule si represented as
<dotted rule>, [<back pointer>, <current position>]
 State Set Sj to be a collection of states si with the
same <current position>.
Earley Algorithm from Book
Earley Algorithm (simpler!)
1. Add Start → · S, [nil,0] to state set 0
2. Predict all states you can, adding new predictions to state
set 0. Let i = 1.
3. Scan input word i—add all matched states to state set Si.
Add all new states produced by Complete to state set Si
Add all new states produced by Predict to state set Si
Unless i=n, (a) Let i = i + 1; (b) repeat step 3.
4. At the end, see if state set n contains Start → S ·, [nil,n]
3 Main Sub-Routines of
Earley Algorithm
Predictor: Adds predictions into the chart.
Completer: Moves the dot to the right
when new constituents are found.
Scanner: Reads the input words and enters
states representing those words into the
chart.
Predictor
 Intuition: create new state for top-down
prediction of new phrase.
 Applied when non part-of-speech non-terminals
are to the right of a dot: S → • VP [0,0]
 Adds new states to current chart
– One new state for each expansion of the non-terminal
in the grammar
VP → • V [0,0]
VP → • V NP [0,0]
 Formally:
Sj: A → α · B β, [i,j]
Sj: B → · γ, [j,j]
Scanner
 Intuition: Create new states for rules matching
part of speech of next word.
 Applicable when part of speech is to the right of
a dot: VP → • V NP [0,0] ‘Book…’
 Looks at current word in input
 If match, adds state(s) to next chart
VP → V • NP [0,1]
 Formally:
Sj: A → α · B β, [i,j]
Sj+1: A → α B ·β, [i,j+1]
Completer
 Intuition: parser has finished a new phrase, so
must find and advance states all that were
waiting for this
 Applied when dot has reached right end of rule
NP → Det Nom • [1,3]
 Find all states w/dot at 1 and expecting an NP
VP → V • NP [0,1]
 Adds new (completed) state(s) to current chart
VP → V NP • [0,3]
 Formally: Sk: B → δ ·, [j,k]
Sk: A → α B · β, [i,k],
where: Sj: A → α · B β, [i,j].
Example: State Set S0 for Parsing “Book
that flight” using Grammar G0
nil
Example: State Set S1 for
Parsing “Book that flight”
Scanner
Scanner
VP→  V and VP →  V NP are both passed to
Scanner, which adds them to Chart[1], moving
dots to right
Prediction of Next Rule
When VP → V  is itself processed by
the Completer, S → VP  is added to
Chart[1] since VP is a left corner of S
Last 2 rules in Chart[1] are added by
Predictor when VP → V  NP is
processed
And so on….
Last Two States
Scanner
Scanner
Scanner
γ→S
.
[nil,3] Completer
How do we retrieve the
parses at the end?
Augment the Completer to add pointers to
prior states it advances as a field in the
current state
– i.e. what state did we advance here?
– Read the pointers back from the final state
Error Handling
 What happens when we look at the
contents of the last table column and don't
find a S →  rule?
– Is it a total loss? No...
– Chart contains every constituent and
combination of constituents possible for the
input given the grammar
 Also useful for partial parsing or shallow
parsing used in information extraction
Earley’s Keys to Efficiency
 Left-recursion, Ambiguity and repeated reparsing of subtrees
– Solution: dynamic programming
 Combine top-down predictions with
bottom-up look-ahead to avoid unnecessary
expansions
 Earley is still one of the most efficient
parsers
 All efficient parsers avoid re-computation in
a similar way.
 But Earley doesn’t require grammar
conversion!
Next Time
Read Chapter 14.