Dependency Parsing - Uppsala University


Dependency Parsing
Joakim Nivre
Dependency Grammar
• Old tradition in descriptive grammar
• Modern theoretical developments:
– Structural syntax (Tesnière)
– Meaning-Text Theory (Mel’čuk)
– Word Grammar (Hudson)
– Functional Generative Description (Prague)
• Basic idea:
– Syntactic structure consists of binary, asymmetrical
relations between the words of a sentence
Dependency Representations
• Directed graphs G = (V, E, L):
– V is a set of nodes (tokens)
– E is a set of arcs (dependency relations)
– L is a labeling function on E (dependency types)
• Example:
– [Figure: labeled dependency graph for the Swedish sentence “På 60-talet målade han djärva tavlor” (word by word: “In the-60’s painted he bold pictures”), with part-of-speech tags PP, NN, VB, PN, JJ, NN and dependency types ADV, PR, SUB, ATT, OBJ]
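A minimal Python sketch of this graph as a set of labeled arcs (the index encoding and the arc list, read off the figure above, are illustrative choices, not part of the original slides):

# Dependency graph G = (V, E, L) for the example sentence:
# nodes are token indices, arcs run from head to dependent,
# and each arc carries a dependency type as its label.
tokens = ["På", "60-talet", "målade", "han", "djärva", "tavlor"]

arcs = [            # (head, dependent, label)
    (2, 0, "ADV"),  # målade -> På
    (0, 1, "PR"),   # På -> 60-talet
    (2, 3, "SUB"),  # målade -> han
    (5, 4, "ATT"),  # tavlor -> djärva
    (2, 5, "OBJ"),  # målade -> tavlor
]

for head, dep, label in arcs:
    print(f"{tokens[head]} -{label}-> {tokens[dep]}")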
Graph Constraints
• Commonly imposed constraints:
– Single-head (at most one head per node)
– Connectedness (no dangling nodes)
– Acyclicity (no cycles in the graph)
– Projectivity:
• An arc i → j is projective iff, for every k occurring between i and j in the input string, i →* k.
• A graph is projective iff every arc in E is projective.
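These constraints are straightforward to check directly; here is a small Python sketch over arcs given as (head, dependent) index pairs (the function names and encoding are mine, and the connectedness check is omitted for brevity):

def has_single_heads(arcs):
    # At most one head per node: no token appears twice as a dependent.
    deps = [d for _, d in arcs]
    return len(deps) == len(set(deps))

def is_acyclic(arcs, n):
    # Depth-first search for a cycle in the head -> dependent relation.
    children = {i: [] for i in range(n)}
    for h, d in arcs:
        children[h].append(d)
    state = {}  # 1 = on the current path, 2 = finished
    def visit(i):
        state[i] = 1
        for j in children[i]:
            if state.get(j) == 1 or (j not in state and not visit(j)):
                return False
        state[i] = 2
        return True
    return all(visit(i) for i in range(n) if i not in state)

def is_projective(arcs):
    # An arc i -> j is projective iff i ->* k for every k strictly
    # between i and j. Assumes single-head and acyclicity hold.
    heads = {d: h for h, d in arcs}
    def dominates(i, k):
        while k in heads:
            k = heads[k]
            if k == i:
                return True
        return False
    return all(dominates(i, k)
               for i, j in arcs
               for k in range(min(i, j) + 1, max(i, j)))

# The example graph from the previous slide:
arcs = [(2, 0), (0, 1), (2, 3), (2, 5), (5, 4)]
print(has_single_heads(arcs), is_acyclic(arcs, 6), is_projective(arcs))
# -> True True True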
Dependency Parsing
• Dependency-based grammar parsing:
– Given a dependency grammar G and an input string x ∈ Σ*, derive some or all of the dependency graphs y assigned to x by G.
• Dependency-based text parsing:
– Given a text T = (x1, …, xn), derive the correct dependency graph yi for every sentence xi ∈ T.
• Text parsing may be grammar-driven or not.
Parsing Methods
• Three main approaches:
– Dynamic programming algorithms applied to
context-free (projective) dependency grammars
– Eliminative parsing techniques applied to
constraint-based formulations of (nonprojective) dependency grammar
– Deterministic parsing algorithms combined
with weak grammars or data-driven methods
Dynamic Programming
• Early formalizations:
– Hays/Gaifman: Equivalent to a subset of
context-free grammars (roughly lexicalized)
– Tabular parsing techniques (cf. CKY parsing)
• Modern developments:
– Link grammar (Sleator and Temperley)
– Bilexical grammar (Eisner):
• Lexicalized parsing in O(n³) time by combining spans instead of constituents.
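A rough Python sketch of the span-combination idea behind first-order Eisner decoding, computing only the best tree score (the function name and score-matrix interface are my assumptions, and the backpointers needed to recover the tree itself are omitted):

import numpy as np

def eisner_score(scores):
    # scores[h, d] = score of the arc h -> d; token 0 is an
    # artificial root. O(n^3) time, O(n^2) space.
    n = scores.shape[0]
    # comp[i, j, d]: best complete span over i..j, head at the
    # left end (d = 1) or right end (d = 0). incomp[i, j, d]:
    # the arc between the endpoints is built, but the dependent
    # end may still collect more dependents.
    comp = np.full((n, n, 2), float("-inf"))
    incomp = np.full((n, n, 2), float("-inf"))
    for i in range(n):
        comp[i, i, 0] = comp[i, i, 1] = 0.0
    for width in range(1, n):
        for i in range(n - width):
            j = i + width
            # Two complete halves meet; add an arc across the gap.
            best = max(comp[i, r, 1] + comp[r + 1, j, 0]
                       for r in range(i, j))
            incomp[i, j, 0] = best + scores[j, i]  # arc j -> i
            incomp[i, j, 1] = best + scores[i, j]  # arc i -> j
            # Extend an incomplete span with a complete one.
            comp[i, j, 0] = max(comp[i, r, 0] + incomp[r, j, 0]
                                for r in range(i, j))
            comp[i, j, 1] = max(incomp[i, r, 1] + comp[r, j, 1]
                                for r in range(i + 1, j + 1))
    return comp[0, n - 1, 1]  # root's span covers the sentence

For example, eisner_score(np.random.rand(5, 5)) returns the best projective tree score for four words plus the artificial root under random arc scores.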
Constraint Satisfaction
• Constraints on dependency graphs (Maruyama):
pos(i) = D ⇒ [dep(i) = DET ∧ pos(head(i)) = N ∧ i < head(i)]
pos(i) = N ⇒ [dep(i) = SUBJ ∧ pos(head(i)) = V ∧ i < head(i)]
pos(i) = V ⇒ [dep(i) = ROOT ∧ head(i) = nil]
[head(i) = head(j) ∧ dep(i) = dep(j)] ⇒ i = j
• Graph satisfying the above constraints:
– [Figure: “a dog runs”, tagged D, N, V, with arcs dog → a (DET) and runs → dog (SUBJ), and runs as ROOT]
Parsing with Constraints
• Eliminative parsing:
– Start with all formally possible analyses (in a
compact representation)
– Eliminate representations that violate
constraints until only valid analyses remain
• Variations:
– Weighted constraints
– Transformational parsing
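To make elimination concrete, here is a small Python sketch that starts from every formally possible head/type assignment for “a dog runs” and eliminates those violating the constraints above (the encoding, with nil represented as None, is my own):

from itertools import product

pos = ["D", "N", "V"]

# All formally possible analyses: each token gets a (head, type)
# pair, with None standing for nil (no head).
choices = []
for i in range(len(pos)):
    choices.append([(h, d) for h in [None, 0, 1, 2] if h != i
                    for d in ["DET", "SUBJ", "ROOT"]])

def satisfies(analysis):
    # Maruyama-style constraints from the previous slide.
    for i, (h, d) in enumerate(analysis):
        if pos[i] == "D" and not (d == "DET" and h is not None
                                  and pos[h] == "N" and i < h):
            return False
        if pos[i] == "N" and not (d == "SUBJ" and h is not None
                                  and pos[h] == "V" and i < h):
            return False
        if pos[i] == "V" and not (d == "ROOT" and h is None):
            return False
    # Same head and same type implies same token.
    return len(set(analysis)) == len(analysis)

valid = [a for a in product(*choices) if satisfies(a)]
print(valid)  # [((1, 'DET'), (2, 'SUBJ'), (None, 'ROOT'))]

Only the intended analysis survives elimination: “a” is a DET dependent of “dog”, which is a SUBJ dependent of the ROOT “runs”.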
Deterministic Parsing
• Covington’s fundamental algorithm (see the sketch below):
– Accept words one by one starting at the
beginning of the sentence, and try linking each
word as head or dependent of every previous
word.
• Variations on shift-reduce parsing:
– Standard (Kudo, Matsumoto, Yamada)
– Arc-eager (Nivre)
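A minimal Python sketch of Covington’s strategy, assuming a user-supplied predicate can_link(h, d) that decides whether word h may govern word d (a stand-in for a weak grammar or learned model); only the single-head constraint is enforced here:

def covington(words, can_link):
    # heads[i] is the head index of word i, or None.
    heads = [None] * len(words)
    for j in range(1, len(words)):
        # Try linking the new word j as head or dependent
        # of every previous word i.
        for i in range(j - 1, -1, -1):
            if heads[j] is None and can_link(i, j):
                heads[j] = i   # j becomes a dependent of i
            elif heads[i] is None and can_link(j, i):
                heads[i] = j   # i becomes a dependent of j
    return heads

# Example: "a dog runs" with permitted links dog -> a, runs -> dog
links = {(1, 0), (2, 1)}
print(covington(["a", "dog", "runs"], lambda h, d: (h, d) in links))
# -> [1, 2, None]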
Arc-Eager Parsing
• Configuration: C = S, I, A
S = Stack
I = Input (remaining)
A = Arc relation (current)
• Initialization:
nil, W, 
• Termination:
S, nil, A
for any S, A
• Acceptance:
S, nil, A
if (W, A) is connected
Transitions
• Left-Arc (LA):
⟨wi|S, wj|I, A⟩ → ⟨S, wj|I, A ∪ {(wj, wi)}⟩
if ¬∃a ∈ A : dep(a) = wi
• Right-Arc (RA):
⟨wi|S, wj|I, A⟩ → ⟨wj|wi|S, I, A ∪ {(wi, wj)}⟩
if ¬∃a ∈ A : dep(a) = wj
• Reduce (RE):
⟨wi|S, I, A⟩ → ⟨S, I, A⟩
if ∃a ∈ A : dep(a) = wi
• Shift (SH):
⟨S, wi|I, A⟩ → ⟨wi|S, I, A⟩
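The transition system translates almost directly into code; here is a minimal Python sketch that runs an arc-eager derivation from a given transition sequence (in practice a classifier picks the next transition, as on the next slide; the function and variable names are mine):

def arc_eager(words, transitions):
    # Configuration: stack S, input buffer I, arc set A.
    stack, buffer, arcs = [], list(range(len(words))), set()
    heads = {}  # dependent -> head, for the preconditions
    for t in transitions:
        if t == "LA" and stack and buffer and stack[-1] not in heads:
            wi, wj = stack.pop(), buffer[0]
            arcs.add((wj, wi)); heads[wi] = wj
        elif t == "RA" and stack and buffer and buffer[0] not in heads:
            wi, wj = stack[-1], buffer.pop(0)
            arcs.add((wi, wj)); heads[wj] = wi
            stack.append(wj)
        elif t == "RE" and stack and stack[-1] in heads:
            stack.pop()
        elif t == "SH" and buffer:
            stack.append(buffer.pop(0))
        else:
            raise ValueError(f"transition {t} not permissible here")
    return arcs

arcs = arc_eager(["a", "dog", "runs"], ["SH", "LA", "SH", "LA", "SH"])
print(sorted(arcs))  # [(1, 0), (2, 1)]: dog -> a, runs -> dog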
MaltParser
• Robust, data-driven dependency parsing
using a combination of:
– Deterministic parsing (e.g. arc-eager)
– Discriminative machine learning (e.g. MBL)
– User-defined feature models:
• Lexical features
• Part-of-speech features
• Dependency type features
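As a hedged illustration of what such a feature model might compute, here is a Python sketch extracting lexical, part-of-speech, and dependency-type features from a parser configuration (the feature names and state layout are illustrative, not MaltParser’s actual specification format):

def extract_features(stack, buffer, words, tags, deptype):
    # deptype maps a token index to the label of its incoming
    # arc, if any. Returns name -> value features for a classifier.
    feats = {}
    if stack:
        top = stack[-1]
        feats["stack.word"] = words[top]       # lexical
        feats["stack.pos"] = tags[top]         # part of speech
        feats["stack.dep"] = deptype.get(top)  # dependency type
    if buffer:
        front = buffer[0]
        feats["input.word"] = words[front]
        feats["input.pos"] = tags[front]
    return feats

print(extract_features([1], [2], ["a", "dog", "runs"],
                       ["D", "N", "V"], {0: "DET"}))
# {'stack.word': 'dog', 'stack.pos': 'N', 'stack.dep': None,
#  'input.word': 'runs', 'input.pos': 'V'}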
Why (Not) Dependency Parsing?
• Potential advantages of dependency parsing:
– Dependency relations are close to semantic relations,
which facilitates semantic interpretation
– Dependency representations are more constrained (less
complex), which facilitates parsing
– Dependency representations are more suitable for
languages with free or flexible word order
• Potential disadvantages:
– Dependency representations are less expressive
– Dependency representations are less well understood
formally and computationally