Dependency Parsing
Some slides are based on:
1) PPT presentation on dependency parsing by Prashanth Mannem
2) Seven Lectures on Statistical Parsing by Christopher Manning
Constituency parsing
• Breaks a sentence into constituents (phrases), which are then broken into smaller constituents
• Describes phrase structure and clause structure (NP, PP, VP, etc.)
• Structures are often recursive
[Constituency tree: S → NP VP over "mom is an amazing show"]
Dependency parsing
• Syntactic structure consists of lexical items,
linked by binary asymmetric relations called
dependencies
• Interested in grammatical relations between
individual words (governing & dependent
words)
• Does not propose a recursive structure, but rather a network of relations
• These relations can also have labels
Dependency vs. Constituency
• Dependency structures explicitly represent
– Head-dependent relations (directed arcs)
– Functional categories (arc labels)
– Possibly some structural categories (parts-of-speech)
• Constituency structures explicitly represent
– Phrases (non-terminal nodes)
– Structural categories (non-terminal labels)
– Possibly some functional categories (grammatical
functions)
Dependency vs. Constituency
• A dependency grammar has a notion of a head
• Officially, CFGs don’t
• But modern linguistic theory and all modern
statistical parsers (Charniak, Collins, …) do, via handwritten phrasal “head rules”:
– The head of a Noun Phrase is a noun/number/…
– The head of a Verb Phrase is a verb/modal/…
Based on a slide by Chris Manning
Dependency vs. Constituency
• The head rules can be used to extract a
dependency parse from a CFG parse (follow
the heads)
• A phrase structure tree can be obtained from a dependency tree, but the resulting phrases over dependents are flat
Based on a slide by Chris Manning
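To make the "follow the heads" idea concrete, here is a minimal Python sketch. The tree encoding and the HEAD_RULES table are toy stand-ins invented for illustration, not the actual head-percolation tables used by Collins or Charniak:

# Toy head-rule table and tree encoding, invented for illustration.
# Trees are (label, children) pairs; preterminals are (POS, word).
HEAD_RULES = {"S": ["VP"], "VP": ["VBZ", "VB", "VP"], "NP": ["NN", "NNS"]}

def head_word(tree):
    """Follow head rules down to the lexical head of a subtree."""
    label, children = tree
    if isinstance(children, str):            # preterminal: (POS, word)
        return children
    for cat in HEAD_RULES.get(label, []):    # first matching child wins
        for child in children:
            if child[0] == cat:
                return head_word(child)
    return head_word(children[-1])           # fallback: rightmost child

def extract_dependencies(tree, deps=None):
    """Every non-head child's head word depends on the phrase's head."""
    if deps is None:
        deps = []
    label, children = tree
    if isinstance(children, str):
        return deps
    h = head_word(tree)
    for child in children:
        if head_word(child) != h:
            deps.append((h, head_word(child)))   # (head, dependent)
        extract_dependencies(child, deps)
    return deps

tree = ("S", [("NP", [("NN", "mom")]),
              ("VP", [("VBZ", "is"),
                      ("NP", [("DT", "an"), ("JJ", "amazing"),
                              ("NN", "show")])])])
print(extract_dependencies(tree))
# [('is', 'mom'), ('is', 'show'), ('show', 'an'), ('show', 'amazing')]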
Definition: dependency graph
• An input word sequence w1 … wn
• Dependency graph G = (V, E) where
– V is the set of nodes, i.e. the word tokens in the input sequence
– E is the set of unlabeled tree edges (i, j), i, j ∈ V
– (i, j) indicates an edge from i (parent, head, governor) to j (child, dependent)
Definition: dependency graph
• A dependency graph is well-formed iff
– Single head: Each word has only one head
– Acyclic: The graph should be acyclic
– Connected: The graph should be a single tree with
all the words in the sentence
– Projective: If word A depends on word B, then all
words between A and B are also subordinate to B
(i.e. dominated by B)
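A small Python sketch of these four conditions, assuming the parse is given as a head array (heads[i−1] is the head of word i, with 0 standing for the virtual root); the single-head condition is then enforced by the representation itself:

def is_well_formed(heads):
    """heads[i-1] is the head of word i (words are 1..n, 0 = root)."""
    n = len(heads)

    # Single head: guaranteed by the array representation itself.

    # Acyclic + connected: from every word, following heads reaches 0.
    for i in range(1, n + 1):
        seen, j = set(), i
        while j != 0:
            if j in seen:
                return False                 # cycle
            seen.add(j)
            j = heads[j - 1]

    # Projective: every word strictly between a dependent and its head
    # must be dominated by (reachable from) that head.
    def dominated_by(k, h):
        while k != 0:
            if k == h:
                return True
            k = heads[k - 1]
        return False

    for d in range(1, n + 1):
        h = heads[d - 1]
        if h == 0:
            continue
        lo, hi = min(d, h), max(d, h)
        if any(not dominated_by(k, h) for k in range(lo + 1, hi)):
            return False
    return True

# "Ram saw a dog yesterday which was a Yorkshire-Terrier":
# the arc dog -> was crosses "yesterday", so the tree is non-projective.
heads = [2, 0, 4, 2, 2, 7, 4, 9, 7]
print(is_well_formed(heads))                 # False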
Non-projective dependencies
Example: "Ram saw a dog yesterday which was a Yorkshire Terrier" – the arc from dog down to its relative clause (which was a Yorkshire Terrier) crosses over yesterday, so the tree is non-projective.
Parsing algorithms
• Dependency based parsers can be broadly
categorized into
– Grammar driven approaches
• Parsing done using grammars
– Data driven approaches
• Parsing by training on annotated/un-annotated data
Unlabeled graphs
• Dan Klein showed that labeling is relatively easy and that the difficulty of parsing lies in creating the bracketing (Klein, 2004)
• Therefore some parsers run in two steps:
1) bracketing; 2) labeling
Traditions
• Dynamic programming
– e.g., Eisner (1996), McDonald (2006)
• Deterministic search
– e.g., Covington (2001), Yamada and Matsumoto (2003), Nivre (2006)
• Constraint satisfaction
– e.g., Maruyama, Foth et al.
Data driven
• Two main approaches
– Global, Exhaustive, Graph-based parsing
– Local, greedy, transition-based parsing
Graph-based parsing
• Assume there is a scoring function:
s : V × V × L → ℝ
• The score of a graph is the sum of its arc scores:
s(G = (V, E)) = Σ_{(i,j)∈E} s(i, j)
• Parsing for an input string x is
G* = argmax_{G∈D(Gx)} Σ_{(i,j)∈E} s(i, j)
where D(Gx) is the set of all dependency graphs for x (a sketch follows)
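A minimal Python sketch of this formulation, with made-up arc scores; the argmax over D(Gx) is shown only as a comparison of two candidate graphs:

# Arc scores held in a dict keyed by (head, dependent); values made up.
s = {(0, 2): 9.0, (2, 1): 6.5, (2, 3): 4.0,
     (0, 1): 1.0, (1, 2): 3.0}

def graph_score(edges, s):
    """s(G) = sum of s(i, j) over the arcs (i, j) in E."""
    return sum(s[i, j] for (i, j) in edges)

# Two candidate trees over a 3-word sentence (node 0 is the root):
G1 = [(0, 2), (2, 1), (2, 3)]
G2 = [(0, 1), (1, 2), (2, 3)]
print(graph_score(G1, s), graph_score(G2, s))   # 19.5 8.0
# The argmax over all of D(Gx) considers every spanning tree; here G1 wins.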
MST algorithm (McDonald, 2006)
• Scores are based on features, independent of
other dependencies
s(i, j) = w · f(i, j)
• Features can be
– Head and dependent word and POS separately
– Head and dependent word and POS bigram
features
– Words between head and dependent
– Length and direction of dependency
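A toy Python version of f(i, j) using the templates above; the feature-name strings are invented for illustration, and w · f(i, j) is computed sparsely as a sum over the fired features:

def arc_features(words, tags, i, j):
    """Binary features for an arc from head i to dependent j."""
    feats = {
        f"hw={words[i]}", f"hp={tags[i]}",      # head word / POS
        f"dw={words[j]}", f"dp={tags[j]}",      # dependent word / POS
        f"hp,dp={tags[i]},{tags[j]}",           # POS bigram
        f"dist={abs(i - j)}",                   # dependency length
        f"dir={'R' if j > i else 'L'}",         # direction
    }
    for k in range(min(i, j) + 1, max(i, j)):   # POS between head and dep
        feats.add(f"btw={tags[k]}")
    return feats

def arc_score(w, words, tags, i, j):
    """s(i, j) = w . f(i, j), with w a dict of feature weights."""
    return sum(w.get(f, 0.0) for f in arc_features(words, tags, i, j))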
MST algorithm (McDonald, 2006)
• Parsing can be formulated as
maximum spanning tree problem
• Use the Chu-Liu-Edmonds (CLE) algorithm for the MST
(runs in O(n²), considers non-projective arcs; see the sketch below)
• Uses online learning to determine the weight vector w in
s(i, j) = w · f(i, j)
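Below is a compact recursive Chu-Liu-Edmonds sketch. Note this is the naive contract-and-expand formulation, which is O(n³); the O(n²) version McDonald uses needs more careful bookkeeping. The score matrix in the usage example is made up:

NEG = float("-inf")

def _find_cycle(heads):
    """Return one cycle among the current head choices, or None."""
    color = [0] * len(heads)                 # 0 new, 1 on path, 2 done
    for start in range(1, len(heads)):
        path, v = [], start
        while v is not None and color[v] == 0:
            color[v] = 1
            path.append(v)
            v = heads[v]
        if v is not None and color[v] == 1:  # ran into the current path
            return path[path.index(v):]
        for u in path:
            color[u] = 2
    return None

def chu_liu_edmonds(score):
    """score[h][d]: score of arc h -> d; node 0 is the artificial root.
    Returns a head array (heads[0] is None)."""
    n = len(score)
    heads = [None] * n
    for d in range(1, n):                    # greedy best head per word
        heads[d] = max((h for h in range(n) if h != d),
                       key=lambda h: score[h][d])
    cycle = _find_cycle(heads)
    if cycle is None:
        return heads

    # Contract the cycle into a single node c, rescore, and recurse.
    in_cycle = set(cycle)
    rest = [v for v in range(n) if v not in in_cycle]   # root stays at 0
    idx = {v: i for i, v in enumerate(rest)}
    c = len(rest)
    new_score = [[NEG] * (c + 1) for _ in range(c + 1)]
    cyc_weight = sum(score[heads[v]][v] for v in cycle)
    enter, leave = {}, {}
    for u in rest:
        for v in rest:
            if u != v:
                new_score[idx[u]][idx[v]] = score[u][v]
        # entering the cycle at d swaps out the cycle arc into d
        d = max(cycle, key=lambda d: score[u][d] - score[heads[d]][d])
        enter[u] = d
        new_score[idx[u]][c] = cyc_weight + score[u][d] - score[heads[d]][d]
        h = max(cycle, key=lambda h: score[h][u])   # best exit from cycle
        leave[u] = h
        new_score[c][idx[u]] = score[h][u]

    sub = chu_liu_edmonds(new_score)

    # Expand: cycle arcs survive except at the chosen entry point.
    result = heads[:]
    for v in rest:
        hv = sub[idx[v]]
        result[v] = None if hv is None else (leave[v] if hv == c else rest[hv])
    u = rest[sub[c]]
    result[enter[u]] = u
    return result

S = [[NEG, 5, 1, 1],                         # made-up scores, root = 0
     [NEG, NEG, 11, 4],
     [NEG, 10, NEG, 5],
     [NEG, 9, 8, NEG]]
print(chu_liu_edmonds(S))                    # [None, 3, 1, 0]: 0->3->1->2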
Transition-based parsing
• A transition system for dependency parsing
defines:
– a set C of parser configurations, each of which defines a (partially built) dependency graph G
– a set T of transitions, each a function t : C → C
– for every sentence x = w0, w1, …, wn
• a unique initial configuration cx
• a set Qx of terminal configurations
Transition sequence
• A transition sequence Cx,m = (cx, c1, …, cm) for a sentence x is a sequence of configurations such that cm ∈ Qx and, for every ci ≠ cx, there is a transition t ∈ T such that ci = t(ci−1)
• The graph defined by cm ∈ Qx is the dependency graph of x
Transition scoring function
• The score of a transition t in a configuration c, s(c, t), represents the likelihood of taking transition t out of configuration c
s : C × T → ℝ
• Parsing means finding the optimal transition sequence; a greedy parser takes the locally best transition t* = argmax_{t∈T} s(c, t) at each step (see the sketch below)
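A sketch of this setup as a greedy parsing loop in Python; Config, the transition signatures, and the empty-buffer termination test are illustrative choices, and score stands in for any trained classifier s(c, t):

from dataclasses import dataclass, field

@dataclass
class Config:
    stack: list
    buffer: list
    arcs: set = field(default_factory=set)    # (head, dependent) pairs

def greedy_parse(words, transitions, legal, score):
    """Apply the highest-scoring legal transition until terminal."""
    c = Config(stack=[0], buffer=list(range(1, len(words) + 1)))
    while c.buffer:                            # one possible terminal test
        t = max((t for t in transitions if legal(c, t)),
                key=lambda t: score(c, t))
        c = t(c)                               # each t : C -> C
    return c.arcs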
Yamada and Matsumoto (2003)
• A transition-based (shift-reduce) parser
• Considers two adjacent words
• Runs in iterations, continues as long as new dependencies
are created
• In every iteration, considers 3 possible actions and chooses one using an SVM (or another discriminative learning technique)
• Time complexity O(n²)
• Accuracy was shown to be close to the state-of-the-art
algorithms (e.g., Eisner’s)
Y&M (2003) Actions
• Shift
• Left
• Right
Y&M (2003) Learning
• Features (lemma, POS tag) are collected
from the context
Stack-based parsing
• Introducing a stack and a buffer
– The buffer is a queue of all input words (left to
right)
– The stack begins empty; words are pushed to the
stack by the defined actions
• Reduces Y&M complexity to linear time
2 stack-based parsers
• Nivre’s (2003, 2006) arc-standard
[Diagram: transitions operate on the top of the stack (i) and the front of the buffer (j); Left-arc requires that i does not already have a head, Right-arc that j does not]
2 stack-based parsers
• Nivre’s (2003, 2006) arc-eager
Example (arc-eager)
• Sentence: _ROOT_ Red figures on the screen indicated falling stocks
• The four arc-eager transitions: Shift (push the next word onto the stack), Left-arc (the next word becomes the head of the stack top, which is popped), Right-arc (the stack top becomes the head of the next word, which is pushed), Reduce (pop the stack top, which must already have a head)
• The derivation, with the stack S (top at the right) and buffer Q:

0. S = [_ROOT_], Q = [Red figures on the screen indicated falling stocks]
1. Shift: S = [_ROOT_ Red], Q = [figures on the screen indicated falling stocks]
2. Left-arc (figures → Red): S = [_ROOT_], Q = [figures on the screen indicated falling stocks]
3. Shift: S = [_ROOT_ figures], Q = [on the screen indicated falling stocks]
4. Right-arc (figures → on): S = [_ROOT_ figures on], Q = [the screen indicated falling stocks]
5. Shift: S = [_ROOT_ figures on the], Q = [screen indicated falling stocks]
6. Left-arc (screen → the): S = [_ROOT_ figures on], Q = [screen indicated falling stocks]
7. Right-arc (on → screen): S = [_ROOT_ figures on screen], Q = [indicated falling stocks]
8. Reduce: S = [_ROOT_ figures on], Q = [indicated falling stocks]
9. Reduce: S = [_ROOT_ figures], Q = [indicated falling stocks]
10. Left-arc (indicated → figures): S = [_ROOT_], Q = [indicated falling stocks]
11. Right-arc (_ROOT_ → indicated): S = [_ROOT_ indicated], Q = [falling stocks]
12. Shift: S = [_ROOT_ indicated falling], Q = [stocks]
13. Left-arc (stocks → falling): S = [_ROOT_ indicated], Q = [stocks]
14. Right-arc (indicated → stocks): S = [_ROOT_ indicated stocks], Q = []
15. Reduce: S = [_ROOT_ indicated], Q = []
16. Reduce: S = [_ROOT_], Q = []

Borrowed from Dependency Parsing (P. Mannem)
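The derivation above can be replayed with a direct encoding of the four arc-eager transitions. This is a sketch: the oracle action sequence is hard-coded from the slides rather than predicted by a classifier:

# Arc-eager transitions over a stack of indices (0 = _ROOT_) and a
# buffer of remaining indices; arcs are (head, dependent) pairs.
def shift(s, b, arcs):
    s.append(b.pop(0))                       # push next word

def left_arc(s, b, arcs):
    arcs.append((b[0], s.pop()))             # buffer front heads stack top

def right_arc(s, b, arcs):
    arcs.append((s[-1], b[0]))               # stack top heads buffer front
    s.append(b.pop(0))

def reduce_(s, b, arcs):
    s.pop()                                  # stack top already has a head

words = "_ROOT_ Red figures on the screen indicated falling stocks".split()
stack, buf, arcs = [0], list(range(1, len(words))), []

oracle = [shift, left_arc, shift, right_arc, shift, left_arc, right_arc,
          reduce_, reduce_, left_arc, right_arc, shift, left_arc,
          right_arc, reduce_, reduce_]
for action in oracle:
    action(stack, buf, arcs)

for h, d in arcs:
    print(f"{words[h]} -> {words[d]}")
# figures -> Red, figures -> on, screen -> the, on -> screen,
# indicated -> figures, _ROOT_ -> indicated, stocks -> falling,
# indicated -> stocks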
Graph (MSTParser) vs. Transitions (MaltParser)
• [Figures comparing the two parsers: accuracy on different languages, sentence length vs. accuracy, and dependency length vs. precision]
Characterizing the Errors of Data-Driven Dependency Parsing Models, McDonald and Nivre, 2007
Known Parsers
• Stanford (constituency + dependency)
• MaltParser (dependency)
• MSTParser (dependency)
• Hebrew
– Yoav Goldberg’s parser
(http://www.cs.bgu.ac.il/~yoavg/software/hebparsers/hebdepparser/)