State-of-art in Dependency Parsing


Dependency Parsing
Prashanth Mannem
[email protected]
Outline

Introduction
  Phrase Structure Grammar
  Dependency Grammar
  Comparison
Dependency Parsing
  Formal definition
  Parsing Algorithms
    Introduction
    Dynamic programming
    Constraint satisfaction
    Deterministic search

Dependency Parsing (P. Mannem)
July 7, 2015
Introduction

The syntactic parsing of a sentence consists of finding the correct syntactic structure of that sentence in a given formalism/grammar.

Dependency Grammar (DG) and Phrase Structure Grammar (PSG) are two such formalisms.
Phrase Structure Grammar (PSG)

Breaks a sentence into constituents (phrases), which are then broken into smaller constituents.
Describes phrase structure and clause structure, e.g. NP, PP, VP, etc.
Structures are often recursive:

  The clever tall blue-eyed old … man
Draw the phrase structure tree

  Red figures on the screen indicated falling stocks

Phrase Structure Tree
Dependency Grammar

Syntactic structure consists of lexical items, linked by binary asymmetric relations called dependencies.

Interested in grammatical relations between individual words (governing & dependent words).

Does not propose a recursive structure; rather a network of relations. These relations can also have labels.
Draw the dependency tree

  Red figures on the screen indicated falling stocks

Dependency Tree

Dependency Tree Example

Phrasal nodes are missing in the dependency structure when compared to the constituency structure.

Dependency Tree with Labels
Comparison

Dependency structures explicitly represent
  Head-dependent relations (directed arcs)
  Functional categories (arc labels)
  Possibly some structural categories (parts of speech)

Phrase structures explicitly represent
  Phrases (non-terminal nodes)
  Structural categories (non-terminal labels)
  Possibly some functional categories (grammatical functions)
Learning DG over PSG

Dependency parsing is more straightforward
  Parsing can be reduced to labeling each token wi with its head wj
  Direct encoding of predicate-argument structure
  Fragments are directly interpretable

Dependency structure is independent of word order
  Suitable for free word order languages (like Indian languages)
Outline

Introduction
  Phrase Structure Grammar
  Dependency Grammar
  Comparison
Dependency Parsing
  Formal definition
  Parsing Algorithms
    Introduction
    Dynamic programming
    Constraint satisfaction
    Deterministic search
Dependency Tree

Formal definition
  An input word sequence w1…wn
  Dependency graph D = (W, E) where
    W is the set of nodes, i.e. word tokens in the input sequence
    E is the set of unlabeled tree edges (wi, wj), with wi, wj ∈ W
    (wi, wj) indicates an edge from wi (parent) to wj (child)

The task of mapping an input string to a dependency graph satisfying certain conditions is called dependency parsing.
Well-formedness

A dependency graph is well-formed iff

  Single head: each word has exactly one head.

  Acyclic: the graph is acyclic.

  Connected: the graph is a single tree containing all the words in the sentence.

  Projective: if word A depends on word B, then all words between A and B are also subordinate to B (i.e. dominated by B).
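These conditions can be checked mechanically. A minimal sketch (Python; words are indexed 1..n, head[w] gives the head of w, and 0 stands for the artificial root; all names are my own):

```python
def is_well_formed(n, head):
    """Check Single head / Acyclic / Connected / Projective for a
    dependency graph over words 1..n, where head[w] is w's head
    (0 denotes the artificial root)."""
    # Single head: head is a mapping, so each word has exactly one entry
    if set(head) != set(range(1, n + 1)):
        return False
    # Acyclic + Connected: every word must reach the root via head links
    for w in range(1, n + 1):
        seen, cur = set(), w
        while cur != 0:
            if cur in seen:
                return False            # cycle
            seen.add(cur)
            cur = head[cur]
    # Projective: every word strictly between w and head[w] must be
    # dominated by head[w]
    def dominated_by(w, a):
        while w != 0:
            if w == a:
                return True
            w = head[w]
        return a == 0
    for w, h in head.items():
        lo, hi = min(w, h), max(w, h)
        for between in range(lo + 1, hi):
            if not dominated_by(between, h):
                return False
    return True
```

For the example sentence above ("Red figures on the screen indicated falling stocks"), the gold tree passes all four checks, while a crossing arc or a head cycle fails.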
Non-projective dependency tree

  Ram saw a dog yesterday which was a Yorkshire Terrier

* Crossing lines

English has very few non-projective cases.
Outline

Introduction
  Phrase Structure Grammar
  Dependency Grammar
  Comparison and Conversion
Dependency Parsing
  Formal definition
  Parsing Algorithms
    Introduction
    Dynamic programming
    Constraint satisfaction
    Deterministic search
Dependency Parsing

Dependency based parsers can be broadly categorized into
  Grammar driven approaches
    Parsing done using grammars.
  Data driven approaches
    Parsing by training on annotated/un-annotated data.

These approaches are not mutually exclusive.
Covington’s Incremental Algorithm

Incremental parsing in O(n^2) time by trying to link each new word to each preceding one [Covington 2001]:

  PARSE(x = (w1, . . . , wn))
  1. for i = 1 up to n
  2.   for j = i − 1 down to 1
  3.     LINK(wi, wj)

Different conditions, such as Single-Head and Projectivity, can be incorporated into the LINK operation.
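The PARSE skeleton above can be sketched in Python. The `link` decision function is a stand-in of my own for the LINK operation (here it returns which direction to attach, or nothing); the Single-Head condition is folded into the attachment step:

```python
def covington(n, link):
    """Covington 2001 skeleton: for each new word, try to link it to
    every preceding word. `link(head, i, j)` stands in for LINK and
    returns "j->i" (j heads i), "i->j" (i heads j), or None."""
    head = {}
    for i in range(1, n + 1):
        for j in range(i - 1, 0, -1):
            decision = link(head, i, j)
            # Single-Head condition: only attach words without a head
            if decision == "j->i" and i not in head:
                head[i] = j
            elif decision == "i->j" and j not in head:
                head[j] = i
    return head
```

Driving it with an oracle built from a gold tree recovers that tree, since every arc connects a word to some preceding word or vice versa.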
Parsing Methods

Three main traditions
  Dynamic programming
    CYK, Eisner, McDonald
  Constraint satisfaction
    Maruyama, Foth et al., Duchier
  Deterministic search
    Covington, Yamada and Matsumoto, Nivre
Dynamic Programming

Basic idea: treat dependencies as constituents.
Use, e.g., a CYK parser (with minor modifications).
Dependency Chart Parsing

Grammar is regarded as context-free, in which each node is lexicalized.
Chart entries are subtrees, i.e., words with all their left and right dependents.
Problem: different entries for different subtrees spanning a sequence of words with different heads.
Time requirement: O(n^5)
Slide from [Eisner, 1997]

Generic Chart Parsing

for each of the O(n^2) substrings,
  for each of O(n) ways of splitting it,
    for each of S analyses of first half
      for each of S analyses of second half,
        for each of c ways of combining them:
          combine, & add result to chart if best

  [cap spending] + [at $300 million] = [[cap spending] [at $300 million]]
   S analyses       S analyses          cS^2 analyses, of which we keep S
Slide from [Eisner, 1997]

Headed constituents ... have too many signatures.

How bad is Θ(n^3 S^2 c)?
For unheaded constituents, S is constant: NP, VP ... (similarly for dotted trees). So Θ(n^3).
But when different heads mean different signatures, the average substring has Θ(n) possible heads and S = Θ(n) possible signatures. So Θ(n^5).
Dynamic Programming Approaches

Original version [Hays 1964] (grammar driven)
Link grammar [Sleator and Temperley 1991] (grammar driven)
Bilexical grammar [Eisner 1996] (data driven)
Maximum spanning tree [McDonald 2006] (data driven)
Eisner 1996

Two novel aspects:
  Modified parsing algorithm
  Probabilistic dependency parsing

Time requirement: O(n^3)
Modification: instead of storing subtrees, store spans.
Span: a substring such that no interior word links to any word outside the span.
Idea: in a span, only the boundary words are active, i.e. still need a head or a child. One or both of the boundary words can be active.
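The O(n^3) span-based dynamic program can be sketched over precomputed arc scores. This is my own simplified rendering (first-best, tracking only scores, not the trees themselves): spans are either "complete" (the head has collected all dependents on that side) or "incomplete" (an arc links the two boundary words, and more dependents may still attach):

```python
NEG = float("-inf")

def eisner_best_score(score):
    """score[h][m] = score of arc h -> m; token 0 is the root.
    Returns the score of the best projective tree via Eisner-style
    dynamic programming over spans, O(n^3)."""
    n = len(score)
    cr = [[0.0] * n for _ in range(n)]   # complete span, head at left end
    cl = [[0.0] * n for _ in range(n)]   # complete span, head at right end
    ir = [[NEG] * n for _ in range(n)]   # incomplete span, arc i -> j
    il = [[NEG] * n for _ in range(n)]   # incomplete span, arc j -> i
    for length in range(1, n):
        for i in range(n - length):
            j = i + length
            # build an arc between the two boundary words
            best = max(cr[i][k] + cl[k + 1][j] for k in range(i, j))
            ir[i][j] = best + score[i][j]
            il[i][j] = best + score[j][i]
            # absorb the arc into a complete span
            cr[i][j] = max(ir[i][k] + cr[k][j] for k in range(i + 1, j + 1))
            cl[i][j] = max(cl[i][k] + il[k][j] for k in range(i, j))
    return cr[0][n - 1]
```

Only the boundary indices of each span are tracked, which is exactly why the table stays O(n^2) with O(n) split points, instead of the O(n^5) of naive head-annotated chart parsing.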
Example

  _ROOT_ Red figures on the screen indicated falling stocks
Example

  _ROOT_ Red figures on the screen indicated falling stocks

Spans: [Red figures] [indicated falling stocks]
Assembly of correct parse

  _ROOT_ Red figures on the screen indicated falling stocks

Start by combining adjacent words to minimal spans:
  [Red figures]  [figures on]  [on the]
Assembly of correct parse

Combine spans which overlap in one word; this word must be governed by a word in the left or right span.

  [on the] + [the screen] → [on the screen]
  [figures on] + [on the screen] → [figures on the screen]
  [Red figures on the screen]  (invalid span)
  [indicated falling] + [falling stocks] → [indicated falling stocks]
Eisner’s Model

Recursive generation
  Each word generates its actual dependents
  Two Markov chains:
    Left dependents
    Right dependents
Eisner’s Model

(model equations)
where
  tw(i) is the ith tagged word
  lc(i) & rc(i) are the left and right children of the ith word

(left-children chain equation)
where
  lcj(i) is the jth left child of the ith word
  t(lcj−1(i)) is the tag of the preceding left child
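The equations on this slide did not survive the transcript. Given the definitions that remain (each word generates its left and right children as Markov chains), a plausible reconstruction of Eisner's generative model is:

```latex
P(\mathit{tw}(1),\dots,\mathit{tw}(n),\ \text{links})
  \;=\; \prod_{i=1}^{n} P\big(\mathit{lc}(i) \mid \mathit{tw}(i)\big)\,
                         P\big(\mathit{rc}(i) \mid \mathit{tw}(i)\big)

P\big(\mathit{lc}(i) \mid \mathit{tw}(i)\big)
  \;=\; \prod_{j} P\big(\mathit{tw}(\mathit{lc}_j(i)) \mid
                        t(\mathit{lc}_{j-1}(i)),\ \mathit{tw}(i)\big)
```

with the right-children chain defined symmetrically over rc_j(i).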
McDonald’s Maximum Spanning Trees

Score of a dependency tree = sum of scores of dependencies
Scores are independent of the other dependencies
If scores are available, parsing can be formulated as a maximum spanning tree problem
Two cases:
  Projective: use Eisner’s parsing algorithm.
  Non-projective: use the Chu-Liu-Edmonds algorithm [Chu and Liu 1965, Edmonds 1967]
Uses online learning for determining the weight vector w
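With arc scores in hand, the maximum-spanning-tree formulation can be illustrated by brute force over all head assignments. This is exponential, so it only works for tiny sentences; Chu-Liu-Edmonds finds the same optimum in polynomial time (the code and its tree test are my own sketch):

```python
from itertools import product

def best_spanning_tree(score):
    """score[h][m] = score of arc h -> m; node 0 is the root.
    Enumerate all head assignments for words 1..n-1 and keep the
    highest-scoring one that forms a tree (possibly non-projective)."""
    n = len(score)
    best, best_heads = float("-inf"), None
    for heads in product(range(n), repeat=n - 1):
        def reaches_root(m):
            seen = set()
            while m != 0:
                if m in seen:
                    return False        # cycle
                seen.add(m)
                m = heads[m - 1]
            return True
        # a valid tree: every word has one head and no cycles
        if not all(reaches_root(m) for m in range(1, n)):
            continue
        total = sum(score[heads[m - 1]][m] for m in range(1, n))
        if total > best:
            best, best_heads = total, heads
    return best, best_heads
```

Note that nothing here enforces projectivity: the search space is all spanning trees, which is exactly why this formulation handles non-projective sentences naturally.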
Parsing Methods

Three main traditions
  Dynamic programming
    CYK, Eisner, McDonald
  Constraint satisfaction
    Maruyama, Foth et al., Duchier
  Deterministic parsing
    Covington, Yamada and Matsumoto, Nivre
Constraint Satisfaction

Uses Constraint Dependency Grammar
The grammar consists of a set of boolean constraints, i.e. logical formulas that describe well-formed trees
A constraint is a logical formula with variables that range over a set of predefined values
Parsing is defined as a constraint satisfaction problem
Constraint satisfaction removes values that contradict constraints
Constraint Satisfaction

Parsing is an eliminative process rather than a constructive one such as in CFG parsing
Constraint satisfaction in general is NP-complete
Parser design must ensure practical efficiency
Different approaches:
  Constraint propagation techniques which ensure local consistency [Maruyama 1990]
  Weighted CDG [Foth et al. 2000, Menzel and Schroder 1998]
Maruyama’s Constraint Propagation

Three steps:
1) Form an initial constraint network using a “core” grammar
2) Remove local inconsistencies
3) If ambiguity remains, add new constraints and repeat step 2
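The eliminative style can be sketched with unary constraints only; Maruyama's actual propagation also enforces pairwise consistency between words, and the toy grammar and tags below are invented purely for illustration:

```python
def eliminate(tags, constraints):
    """Start with every head candidate for every word (0 = root),
    then remove values that contradict a constraint."""
    n = len(tags)
    domains = {w: {h for h in range(n + 1) if h != w}
               for w in range(1, n + 1)}
    for w, dom in domains.items():
        for h in list(dom):
            if not all(ok(w, h, tags) for ok in constraints):
                dom.discard(h)
    return domains

# toy grammar: a determiner depends on a noun to its right,
# a verb depends on the root, a noun on the root or a verb
def det_rule(w, h, tags):
    return tags[w] != "DT" or (h > w and tags.get(h) == "NN")

def verb_rule(w, h, tags):
    return tags[w] != "VB" or h == 0

def noun_rule(w, h, tags):
    return tags[w] != "NN" or h == 0 or tags.get(h) == "VB"
```

After elimination, any word whose domain still has more than one value is the remaining ambiguity that step 3 addresses with additional constraints.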
Weighted Constraint Parsing

Robust parser which uses soft constraints
Each constraint is assigned a weight between 0.0 and 1.0
Weight 0.0: hard constraint, can only be violated when no other parse is possible
Constraints are assigned manually (or estimated from a treebank)
Efficiency: uses a heuristic transformation-based constraint resolution method
Transformation-Based Constraint Resolution

Heuristic search, very efficient
Idea: first construct an arbitrary dependency structure, then try to correct errors
Error correction by transformations
Selection of transformations based on constraints that cause conflicts
Anytime property: the parser maintains a complete analysis at any time → it can be stopped at any time and return a complete analysis
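A minimal sketch of the idea: greedy hill-climbing over head changes, with a caller-supplied soft-constraint score standing in for the weighted constraints (the actual parser's transformations and conflict-driven selection are more elaborate; all names here are mine):

```python
def would_cycle(head, w, h):
    """Would setting head[w] = h create a cycle?"""
    cur = h
    while cur != 0:
        if cur == w:
            return True
        cur = head[cur]
    return False

def repair(n, arc_score, max_steps=100):
    """Start from an arbitrary complete analysis (everything attached
    to the root), then repeatedly apply the single head change that
    most improves the total score. Anytime: `head` is a full parse
    after every step, so the loop can be stopped at any time."""
    head = {w: 0 for w in range(1, n + 1)}
    for _ in range(max_steps):
        best_gain, best_move = 0, None
        for w in range(1, n + 1):
            for h in range(n + 1):
                if h == w or h == head[w] or would_cycle(head, w, h):
                    continue
                gain = arc_score(h, w) - arc_score(head[w], w)
                if gain > best_gain:
                    best_gain, best_move = gain, (w, h)
        if best_move is None:
            break                      # no transformation improves
        w, h = best_move
        head[w] = h
    return head
```

The anytime property falls out of the representation: since the structure is always a complete tree, interrupting the loop early simply returns a lower-scoring but complete analysis.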
Parsing Methods

Three main traditions
  Dynamic programming
    CYK, Eisner, McDonald
  Constraint satisfaction
    Maruyama, Foth et al., Duchier
  Deterministic parsing
    Covington, Yamada and Matsumoto, Nivre
Deterministic Parsing

Basic idea:
  Derive a single syntactic representation (dependency graph) through a deterministic sequence of elementary parsing actions
  Sometimes combined with backtracking or repair

Motivation:
  Psycholinguistic modeling
  Efficiency
  Simplicity
Shift-Reduce Type Algorithms

Data structures:
  Stack [. . . , wi]S of partially processed tokens
  Queue [wj, . . .]Q of remaining input tokens

Parsing actions built from atomic actions:
  Adding arcs (wi → wj, wi ← wj)
  Stack and queue operations

Left-to-right parsing
Restricted to projective dependency graphs
Yamada’s Algorithm

Three parsing actions:

  Shift   [. . .]S [wi, . . .]Q  ⇒  [. . . , wi]S [. . .]Q

  Left    [. . . , wi, wj]S [. . .]Q  ⇒  [. . . , wi]S [. . .]Q    wi → wj

  Right   [. . . , wi, wj]S [. . .]Q  ⇒  [. . . , wj]S [. . .]Q    wi ← wj

Multiple passes over the input give time complexity O(n^2)
Yamada and Matsumoto

Parsing in several rounds: deterministic bottom-up, O(n^2)
Looks at pairs of words
Three actions: shift, left, right
  Shift: shifts focus to the next word pair
  Left: decides that the left word depends on the right one
  Right: decides that the right word depends on the left word
Parsing Algorithm

Go through each pair of words and decide which action to take
If a relation was detected in a pass, do another pass
  E.g. “the little girl”
    First pass: relation between little and girl
    Second pass: relation between the and girl
The decision on the action depends on the word pair and its context
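With a gold tree standing in for the trained classifier, the pass structure can be sketched as follows (indices 1..n, gold_head[w] = head of w, 0 for the root; the real parser predicts each action from features instead of consulting gold_head):

```python
def yamada_parse(n, gold_head):
    """Deterministic bottom-up parsing in several passes over
    adjacent word pairs (Yamada-and-Matsumoto-style skeleton)."""
    words = list(range(1, n + 1))
    head = {}

    def complete(w):
        # w may only be attached once all its own dependents are in place,
        # otherwise its subtree would be cut off
        return all(head.get(d) == w
                   for d in range(1, n + 1) if gold_head[d] == w)

    changed = True
    while len(words) > 1 and changed:
        changed, i = False, 0
        while i < len(words) - 1:
            l, r = words[i], words[i + 1]
            if gold_head[l] == r and complete(l):     # Left
                head[l] = r; del words[i]; changed = True
            elif gold_head[r] == l and complete(r):   # Right
                head[r] = l; del words[i + 1]; changed = True
            else:                                     # Shift
                i += 1
    return head
```

On "the little girl" this attaches little to girl in the first pass and the to girl in the second, matching the two-pass behavior described above.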
Nivre’s Algorithm

Four parsing actions:

  Shift      [. . .]S [wi, . . .]Q  ⇒  [. . . , wi]S [. . .]Q

  Reduce     [. . . , wi]S [. . .]Q    ∃wk : wk → wi  ⇒  [. . .]S [. . .]Q

  Left-Arc   [. . . , wi]S [wj, . . .]Q    ¬∃wk : wk → wi  ⇒  [. . .]S [wj, . . .]Q    wi ← wj

  Right-Arc  [. . . , wi]S [wj, . . .]Q    ¬∃wk : wk → wj  ⇒  [. . . , wi, wj]S [. . .]Q    wi → wj
Nivre’s Algorithm

Characteristics:
  Arc-eager processing of right-dependents
  A single pass over the input gives worst-case time complexity O(n) (at most 2n transitions)
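The four actions, driven here by a static oracle derived from a projective gold tree instead of a classifier, can be sketched as (names and the oracle conditions are my own rendering):

```python
def arc_eager(n, gold_head):
    """Nivre's arc-eager algorithm with a static oracle derived from
    a projective gold tree (gold_head[w] = head of w; 0 is the root)."""
    stack, queue, head = [0], list(range(1, n + 1)), {}
    while queue:
        s, q = stack[-1], queue[0]
        if gold_head.get(s) == q:                       # Left-Arc: s <- q
            head[s] = q
            stack.pop()
        elif gold_head.get(q) == s:                     # Right-Arc: s -> q
            head[q] = s
            stack.append(queue.pop(0))
        elif s in head and any(gold_head.get(q) == k or gold_head.get(k) == q
                               for k in stack[:-1]):    # Reduce
            stack.pop()                                 # s is done; uncover k
        else:                                           # Shift
            stack.append(queue.pop(0))
    return head
```

Each word is shifted onto the stack once and popped at most once, which is where the linear (at most 2n transitions) bound comes from.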
Example

  _ROOT_ Red figures on the screen indicated falling stocks

The parser runs through the following sequence of transitions:

  Shift, Left-arc, Shift, Right-arc, Shift, Left-arc, Right-arc, Reduce,
  Reduce, Left-arc, Right-arc, Shift, Left-arc, Right-arc, Reduce, Reduce
Classifier-Based Parsing

Data-driven deterministic parsing:
  Deterministic parsing requires an oracle.
  An oracle can be approximated by a classifier.
  A classifier can be trained using treebank data.

Learning algorithms:
  Support vector machines (SVM) [Kudo and Matsumoto 2002, Yamada and Matsumoto 2003, Isozaki et al. 2004, Cheng et al. 2004, Nivre et al. 2006]
  Memory-based learning (MBL) [Nivre et al. 2004, Nivre and Scholz 2004]
  Maximum entropy modeling (MaxEnt) [Cheng et al. 2005]
Feature Models

Learning problem:
  Approximate a function from parser states, represented by feature vectors, to parser actions, given a training set of gold standard derivations.

Typical features:
  Tokens and POS tags of:
    Target words
    Linear context (neighbors in S and Q)
    Structural context (parents, children, siblings in G)
      Cannot be used in dynamic programming algorithms.
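A parser state can be turned into a sparse feature vector along these lines (the feature names are invented for illustration; real feature models use many more templates):

```python
def extract_features(stack, queue, head, words, tags):
    """Map a transition-parser state to a sparse feature dict over
    the target words and their linear/structural context."""
    feats = {}
    if stack:
        s0 = stack[-1]
        feats["s0.word"], feats["s0.tag"] = words[s0], tags[s0]
        if s0 in head:                       # structural context
            feats["s0.head.tag"] = tags[head[s0]]
    if queue:
        q0 = queue[0]
        feats["q0.word"], feats["q0.tag"] = words[q0], tags[q0]
        if len(queue) > 1:                   # linear context
            feats["q1.tag"] = tags[queue[1]]
    return feats
```

The classifier then maps such dicts to one of the parser's actions; note that the structural features only exist because the partial graph is built incrementally, which is why they are unavailable to dynamic programming parsers.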
Dependency Parsers for Download

MST parser by Ryan McDonald
Malt parser by Joakim Nivre
Stanford parser
Summary

Provided an intro to dependency parsing and various dependency parsing algorithms.
Read up on Nivre’s and McDonald’s tutorial on dependency parsing at ESSLLI ’07.