Supertagging
CMSC 35100
Natural Language Processing
January 31, 2006
Roadmap
Motivation
Supertags
Definition & Examples
Parsing as disambiguation
Tagging, Parsing & Lexicalization
Structural filters
Unigram & N-gram models
Results & Discussion
Motivation: Good, Robust Parsing
Main approaches:
Finite-state parsers:
Shallow parsing, longest match resolves ambiguity
Hand-crafted
Partitioned into domain-independent and domain-dependent components
Statistical parsers:
Assign some structure to any string w/probability
Automatically extracted from manual annotations
Not linguistically transparent, hard to modify
Lexicalization
Lexicalization:
Finite set of elementary structures
Trees, strings, etc.
Anchored on lexical item
Lexical items
Provides integration of syntax and semantics of the lexical item
Enforces subcategorization, semantic constraints
Associated w/at least one elementary grammar structure
Finite set of operations combines structures
Framework
FB-LTAG:
Feature-based, lexicalized TAG
Elementary trees:
Lexical item (anchor) on frontier
Provides complex description of anchor
Specifies domain of locality for syn/sem constraints
Initial (non-recursive), auxiliary (recursive)
Derived by substitution and adjunction
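To make the two combination operations concrete, here is a minimal sketch (illustrative only; the Node class and function names are my own, not XTAG's) of substitution and adjunction over simple tree objects:

# Illustrative LTAG elementary-tree sketch; class and function names are hypothetical.
from dataclasses import dataclass, field
from typing import Optional, List

@dataclass
class Node:
    label: str                        # syntactic category, e.g. "NP", "VP"
    children: List["Node"] = field(default_factory=list)
    subst: bool = False               # substitution site on the frontier
    foot: bool = False                # foot node of an auxiliary tree
    anchor: Optional[str] = None      # lexical anchor, if any

def walk(node: Node):
    yield node
    for child in node.children:
        yield from walk(child)

def substitute(site: Node, initial_root: Node) -> None:
    # Replace a frontier substitution node with an initial (non-recursive) tree
    # of the same category.
    assert site.subst and site.label == initial_root.label
    site.children, site.anchor, site.subst = initial_root.children, initial_root.anchor, False

def adjoin(target: Node, aux_root: Node) -> None:
    # Splice an auxiliary (recursive) tree into an internal node of matching category;
    # the material below the target moves under the auxiliary tree's foot node.
    foot = next(n for n in walk(aux_root) if n.foot)
    assert target.label == aux_root.label == foot.label
    foot.children, foot.foot = target.children, False
    target.children = list(aux_root.children)

# E.g. an initial NP tree for "John" can substitute at the NP node of a verb's
# supertag, and an auxiliary VP tree for "quickly" can adjoin at its VP node.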
Supertags
Inspired by POS taggers
Locally resolve much POS ambiguity before parsing (96-97% accurate)
Limited context, e.g. tri-gram
Elementary trees localize dependencies
All and only dependent elements are in tree
Supertag = elementary structure
Highly ambiguous
One per lexical item per distinct use
Word with one POS has (probably) many supertags
E.g. always a verb, many subcat frames
Extended Domain of Locality
Each supertag must contain all and only the arguments of the anchor in its structure
For each lexical item, the grammar must contain a supertag for each syntactic environment in which the item can appear
Factoring Recursion
Recursive constructs represented as auxiliary trees / supertags
Initial supertags define domains of locality for agreement, subcategorization
Auxiliary trees can capture long distance dependencies by adjunction
Supertags: Ambiguity and Parsing
One supertag per word in complete parse
Must select to parse
Problem: Massive ambiguity
Solution: Manage with local disambiguation
Supertags localize dependencies
Apply local n-gram constraints before parsing
POS disambiguation makes parsing easier
Supertagging disambiguation makes it trivial
Just structure combination
Example
Structural Filtering
Simple local tests for supertag use
Span of supertag:
Left/Right span constraint:
Span to the left or right of the anchor can't be longer than the input
Lexical items:
Minimum number of lexical items covered can't be larger than the input string
Can't be used if its terminals don't appear in the input
Features, etc
Reduces ambiguity by 50% before parsing
Verbs, especially light verbs, are the worst case: even reduced by 50%, still > 250 supertags per POS
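A minimal sketch of these filters, under an assumed supertag record (the field names below are mine, not the XTAG encoding):

# Hypothetical supertag record used only to illustrate the structural filters.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Supertag:
    name: str
    min_left_span: int           # minimum words required left of the anchor
    min_right_span: int          # minimum words required right of the anchor
    terminals: Tuple[str, ...]   # lexical terminals the tree requires (e.g. particles)
    min_items: int               # minimum number of lexical items the tree covers

def passes_filters(tag: Supertag, sentence: List[str], anchor_pos: int) -> bool:
    # Cheap local tests applied before parsing to prune impossible supertags.
    if tag.min_left_span > anchor_pos:                        # left-span constraint
        return False
    if tag.min_right_span > len(sentence) - anchor_pos - 1:   # right-span constraint
        return False
    if tag.min_items > len(sentence):                         # can't cover more than the input
        return False
    if any(t not in sentence for t in tag.terminals):         # required terminals must appear
        return False
    return True

sentence = "John will give the book to Mary".split()
candidates = [Supertag("ditransitive", 1, 4, (), 5),
              Supertag("verb-particle", 1, 2, ("up",), 3)]
kept = [t for t in candidates if passes_filters(t, sentence, anchor_pos=2)]
print([t.name for t in kept])   # ['ditransitive']: the particle tree is pruned, "up" is absent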
N-gram Supertagging
Initially, (POS,supertag) pairs
Tri-gram model, trained on 5K WSJ sentences
68% accuracy – small corpus, little smoothing
Dependency parsing
Avoid fixed context length
Dependency if a supertag substitutes or adjoins into another supertag's tree
Limited by size of LTAG parsed corpus
Too much like regular parsing
Smoothed N-gram Models: Data
Training data:
Two sources:
XTAG parses of WSJ, IBM, ATIS
Small corpus but clean TAG derivations
Converted Penn Treebank WSJ sentences
Associates each lexical item with a supertag using the parse
Requires heuristics using local tree contexts
Labels of dominating nodes, siblings, parent's siblings
Approximate
Smoothed N-gram Models: Unigram
Disambiguation redux:
Assume structural filters applied
Unigram approach:
Select supertag for word by its preference
Pr(t|w) = freq(t,w)/freq(w)
Most frequent supertag for word in training
Missing word:
Backoff to most frequent supertag of POS of word
Results: 73-77% top 1 vs 91% POS
Errors: Verb subcat, PP attachment, NP head/mod
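A minimal sketch of this unigram selection with the POS backoff, assuming simple count tables (variable and function names are mine):

from collections import Counter, defaultdict

word_tag_counts = defaultdict(Counter)   # word -> supertag counts
pos_tag_counts = defaultdict(Counter)    # POS  -> supertag counts

def train(annotated):
    # annotated: iterable of (word, pos, supertag) triples from a supertagged corpus.
    for word, pos, supertag in annotated:
        word_tag_counts[word][supertag] += 1
        pos_tag_counts[pos][supertag] += 1

def unigram_supertag(word, pos):
    # Pr(t|w) = freq(t,w) / freq(w): pick the most frequent supertag for the word;
    # for an unseen word, back off to the most frequent supertag for its POS.
    counts = word_tag_counts.get(word) or pos_tag_counts[pos]
    return counts.most_common(1)[0][0]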
Smoothed N-gram Models: Trigram
Enhances previous models
Trigram: lexicalized, (word, supertag)
Unigram: adds context
Ideally:
T = argmax_T P(T1, T2, ..., Tn) * P(W1, ..., Wn | T1, ..., Tn)
Really (with independence assumptions):
P(T1, T2, ..., Tn) ≈ ∏_{i=1..N} P(Ti | Ti-1, Ti-2)
P(W1, W2, ..., Wn | T1, T2, ..., Tn) ≈ ∏_{i=1..N} P(Wi | Ti)
Good-Turing smoothing, Katz backoff
Results: up to 92%, 1M words training
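A minimal sketch of decoding under these independence assumptions; the trans_p/emit_p functions are assumed to be already smoothed (e.g. Good-Turing counts with Katz backoff), and all names here are mine rather than the paper's implementation:

import math

def viterbi_supertags(words, candidates, trans_p, emit_p):
    # Find the supertag sequence maximizing  prod_i P(Ti | Ti-2, Ti-1) * P(Wi | Ti).
    #   words:      input sentence (list of words)
    #   candidates: dict word -> possible supertags (e.g. after structural filtering)
    #   trans_p(t, t1, t2): smoothed P(t | t2, t1) for the two preceding supertags
    #   emit_p(w, t):       smoothed P(w | t)
    START = "<s>"
    best = {(START, START): (0.0, [])}          # (prev2, prev) -> (log prob, path)
    for w in words:
        new_best = {}
        for (t2, t1), (logp, path) in best.items():
            for t in candidates[w]:
                score = logp + math.log(trans_p(t, t1, t2)) + math.log(emit_p(w, t))
                if (t1, t) not in new_best or score > new_best[(t1, t)][0]:
                    new_best[(t1, t)] = (score, path + [t])
        best = new_best
    return max(best.values(), key=lambda v: v[0])[1] if best else []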
Supertagging & Parsing
Front-end to parser
Analogous to POS tagging
Key: disambiguate supertags
A) Pick the single most probable supertag by the trigram method: 4 vs 120 seconds per sentence to parse, but if the tag is wrong, the parse is wrong
B) Pick the n-best supertags: recovers many more parses, but slower; still fails on ill-formed input
Conclusion
Integrates a light-weight n-gram approach with a rich, lexicalized representation
N-gram supertagging produces an "almost parse"
Good tagging and parsing effectiveness: > 90%
Faster than XTAG by factor of 30
Issues: conversion, ill-formed input, etc
Applications
Lightweight dependency analysis
Enhanced information retrieval
Exploit syntactic structure of query
Can enhance precision from 33% to 79%
Supertagging in other formalisms
CCG
Lightweight Dependency Analysis
Heuristic, linear-time, deterministic
Pass 1: modifier supertags
Pass 2: non-modifier supertags
Compute dependencies for s (w/anchor w)
For each frontier node d in s
Connect word to left or right of w by position
Label arc to d with internal node
Precision = 82.3%, Recall = 93.8%
Robust to fragments
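A much-simplified sketch of the two-pass heuristic; the ChosenTag fields (root category plus required left/right frontier categories) are my own abstraction of a supertag, not the paper's representation:

from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class ChosenTag:
    root: str                     # root category of the word's supertag, e.g. "S", "NP"
    left_slots: Tuple[str, ...]   # frontier categories expected to the left of the anchor
    right_slots: Tuple[str, ...]  # frontier categories expected to the right of the anchor
    modifier: bool                # True for auxiliary (modifier) supertags

def lightweight_deps(tags: List[ChosenTag]):
    # Deterministic two-pass heuristic over the chosen supertags;
    # returns (head, dependent, label) arcs.
    arcs = []
    def link(i, tag):
        for cat in tag.left_slots:          # nearest matching word to the left
            for j in range(i - 1, -1, -1):
                if tags[j].root == cat:
                    arcs.append((i, j, cat)); break
        for cat in tag.right_slots:         # nearest matching word to the right
            for j in range(i + 1, len(tags)):
                if tags[j].root == cat:
                    arcs.append((i, j, cat)); break
    for i, tag in enumerate(tags):          # pass 1: modifier supertags
        if tag.modifier:
            link(i, tag)
    for i, tag in enumerate(tags):          # pass 2: non-modifier supertags
        if not tag.modifier:
            link(i, tag)
    return arcs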