Supertagging
CMSC 35100
Natural Language Processing
January 31, 2006
Roadmap
Motivation
Supertags
Definition & Examples
Parsing as disambiguation
Tagging, Parsing & Lexicalization
Structural filters
Unigram & N-gram models
Results & Discussion
Motivation: Good, Robust Parsing
Main approaches:
Finite-state parsers:
Shallow parsing, longest match resolves ambiguity
Hand-crafted
Partitioned into domain-independent and domain-dependent components
Statistical parsers:
Assign some structure to any string w/probability
Automatically extracted from manual annotations
Not linguistically transparent, hard to modify
Lexicalization
Lexicalization:
Finite set of elementary structures
Trees, strings, etc.
Anchored on lexical item
Lexical items
Provides integration of syntax and semantics of the lexical item
Enforces subcategorization, semantic constraints
Associated w/at least one elementary grammar structure
Finite set of operations combines structures
Framework
FB-LTAG:
Feature-based, lexicalized TAG
Elementary trees:
Lexical item (anchor) on frontier
Provides complex description of anchor
Specifies domain of locality for syn/sem constraints
Initial (non-recursive), auxiliary (recursive)
Derived by substitution and adjunction
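To make the two combination operations concrete, here is a minimal sketch (illustrative only; the Node class and function names are my own, not XTAG's) of substitution and adjunction over simple tree objects:

# Illustrative LTAG elementary-tree sketch; class and function names are hypothetical.
from dataclasses import dataclass, field
from typing import Optional, List

@dataclass
class Node:
    label: str                        # syntactic category, e.g. "NP", "VP"
    children: List["Node"] = field(default_factory=list)
    subst: bool = False               # substitution site on the frontier
    foot: bool = False                # foot node of an auxiliary tree
    anchor: Optional[str] = None      # lexical anchor, if any

def walk(node: Node):
    yield node
    for child in node.children:
        yield from walk(child)

def substitute(site: Node, initial_root: Node) -> None:
    # Replace a frontier substitution node with an initial (non-recursive) tree
    # of the same category.
    assert site.subst and site.label == initial_root.label
    site.children, site.anchor, site.subst = initial_root.children, initial_root.anchor, False

def adjoin(target: Node, aux_root: Node) -> None:
    # Splice an auxiliary (recursive) tree into an internal node of matching category;
    # the material below the target moves under the auxiliary tree's foot node.
    foot = next(n for n in walk(aux_root) if n.foot)
    assert target.label == aux_root.label == foot.label
    foot.children, foot.foot = target.children, False
    target.children = list(aux_root.children)

# E.g. an initial NP tree for "John" can substitute at the NP node of a verb's
# supertag, and an auxiliary VP tree for "quickly" can adjoin at its VP node.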
Supertags
Inspired by POS taggers
Locally resolve much POS ambiguity before parsing (96-97% accurate)
Limited context, e.g. tri-gram
Elementary trees localize dependencies
All and only dependent elements are in tree
Supertag = elementary structure
Highly ambiguous
One per lexical item per distinct use
Word with one POS has (probably) many supertags
E.g. always a verb, many subcat frames
Extended Domain of Locality
Each supertag must contain all and only the arguments of the anchor in its structure
For each lexical item, the grammar must contain a supertag for each syntactic environment in which the item can appear
Factoring Recursion
Recursive constructs represented as auxiliary trees / supertags
Initial supertags define domains of locality for agreement, subcategorization
Auxiliary trees can capture long distance dependencies by adjunction
Supertags: Ambiguity and Parsing
One supertag per word in complete parse
Must select to parse
Problem: Massive ambiguity
Solution: Manage with local disambiguation
Supertags localize dependencies
Apply local n-gram constraints before parsing
POS disambiguation makes parsing easier
Supertagging disambiguation makes it trivial
Just structure combination
Example
Structural Filtering
Simple local tests for supertag use
Span of supertag:
Left/Right span constraint:
Span to the left or right of the anchor can't be longer than the input
Lexical items:
Minimum number of lexical items covered can't be larger than the input string
Can't be used if its terminals don't appear in the input
Features, etc
Reduces ambiguity by 50% before parsing
Verbs, especially light verbs, are the worst case: even reduced by 50%, still > 250 supertags per POS
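A minimal sketch of these filters, under an assumed supertag record (the field names below are mine, not the XTAG encoding):

# Hypothetical supertag record used only to illustrate the structural filters.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Supertag:
    name: str
    min_left_span: int           # minimum words required left of the anchor
    min_right_span: int          # minimum words required right of the anchor
    terminals: Tuple[str, ...]   # lexical terminals the tree requires (e.g. particles)
    min_items: int               # minimum number of lexical items the tree covers

def passes_filters(tag: Supertag, sentence: List[str], anchor_pos: int) -> bool:
    # Cheap local tests applied before parsing to prune impossible supertags.
    if tag.min_left_span > anchor_pos:                        # left-span constraint
        return False
    if tag.min_right_span > len(sentence) - anchor_pos - 1:   # right-span constraint
        return False
    if tag.min_items > len(sentence):                         # can't cover more than the input
        return False
    if any(t not in sentence for t in tag.terminals):         # required terminals must appear
        return False
    return True

sentence = "John will give the book to Mary".split()
candidates = [Supertag("ditransitive", 1, 4, (), 5),
              Supertag("verb-particle", 1, 2, ("up",), 3)]
kept = [t for t in candidates if passes_filters(t, sentence, anchor_pos=2)]
print([t.name for t in kept])   # ['ditransitive']: the particle tree is pruned, "up" is absent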
N-gram Supertagging
Initially, (POS,supertag) pairs
Tri-gram model, trained on 5K WSJ sentences
68% accuracy – small corpus, little smoothing
Dependency parsing
Avoid fixed context length
Dependency if a supertag substitutes or adjoins into another supertag's tree
Limited by size of LTAG parsed corpus
Too much like regular parsing
Smoothed N-gram Models: Data
Training data:
Two sources:
XTAG parses of WSJ, IBM, ATIS
Small corpus but clean TAG derivations
Converted Penn Treebank WSJ sentences
Associates each lexical item with a supertag using the parse
Requires heuristics using local tree contexts
Labels of dominating nodes, siblings, parent's siblings
Approximate
Smoothed N-gram Models: Unigram
Disambiguation redux:
Assume structural filters applied
Unigram approach:
Select supertag for word by its preference
Pr(t|w) = freq(t,w)/freq(w)
Most frequent supertag for word in training
Missing word:
Backoff to most frequent supertag of POS of word
Results: 73-77% top 1 vs 91% POS
Errors: Verb subcat, PP attachment, NP head/mod
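A minimal sketch of this unigram selection with the POS backoff, assuming simple count tables (variable and function names are mine):

from collections import Counter, defaultdict

word_tag_counts = defaultdict(Counter)   # word -> supertag counts
pos_tag_counts = defaultdict(Counter)    # POS  -> supertag counts

def train(annotated):
    # annotated: iterable of (word, pos, supertag) triples from a supertagged corpus.
    for word, pos, supertag in annotated:
        word_tag_counts[word][supertag] += 1
        pos_tag_counts[pos][supertag] += 1

def unigram_supertag(word, pos):
    # Pr(t|w) = freq(t,w) / freq(w): pick the most frequent supertag for the word;
    # for an unseen word, back off to the most frequent supertag for its POS.
    counts = word_tag_counts.get(word) or pos_tag_counts[pos]
    return counts.most_common(1)[0][0]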
Smoothed N-gram Models: Trigram
Enhances previous models
Trigram: lexicalized, (word, supertag)
Unigram: adds context
Ideally:
T = argmax_T P(T1, T2, ..., Tn) * P(W1, ..., Wn | T1, ..., Tn)
Really (with independence assumptions):
P(T1, T2, ..., Tn) ≈ ∏_{i=1..N} P(Ti | Ti-1, Ti-2)
P(W1, W2, ..., Wn | T1, T2, ..., Tn) ≈ ∏_{i=1..N} P(Wi | Ti)
Good-Turing smoothing, Katz backoff
Results: up to 92%, 1M words training
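A minimal sketch of decoding under these independence assumptions; the trans_p/emit_p functions are assumed to be already smoothed (e.g. Good-Turing counts with Katz backoff), and all names here are mine rather than the paper's implementation:

import math

def viterbi_supertags(words, candidates, trans_p, emit_p):
    # Find the supertag sequence maximizing  prod_i P(Ti | Ti-2, Ti-1) * P(Wi | Ti).
    #   words:      input sentence (list of words)
    #   candidates: dict word -> possible supertags (e.g. after structural filtering)
    #   trans_p(t, t1, t2): smoothed P(t | t2, t1) for the two preceding supertags
    #   emit_p(w, t):       smoothed P(w | t)
    START = "<s>"
    best = {(START, START): (0.0, [])}          # (prev2, prev) -> (log prob, path)
    for w in words:
        new_best = {}
        for (t2, t1), (logp, path) in best.items():
            for t in candidates[w]:
                score = logp + math.log(trans_p(t, t1, t2)) + math.log(emit_p(w, t))
                if (t1, t) not in new_best or score > new_best[(t1, t)][0]:
                    new_best[(t1, t)] = (score, path + [t])
        best = new_best
    return max(best.values(), key=lambda v: v[0])[1] if best else []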
Supertagging & Parsing
Front-end to parser
Analogous to POS tagging
Key: disambiguate supertags
A) Pick the single most probable supertag by the trigram method: 4 vs 120 seconds per sentence to parse, but if the tag is wrong, the parse is wrong
B) Pick the n-best supertags: recovers many more parses, but slower; still fails on ill-formed input
Conclusion
Integrates a light-weight n-gram approach with a rich, lexicalized representation
N-gram supertagging produces an "almost parse"
Good tagging and parsing effectiveness: > 90%
Faster than XTAG by factor of 30
Issues: conversion, ill-formed input, etc
Applications
Lightweight dependency analysis
Enhanced information retrieval
Exploit syntactic structure of query
Can enhance precision from 33% to 79%
Supertagging in other formalisms
CCG
Lightweight Dependency Analysis
Heuristic, linear-time, deterministic
Pass 1: modifier supertags
Pass 2: non-modifier supertags
Compute dependencies for s (w/anchor w)
For each frontier node d in s
Connect word to left or right of w by position
Label arc to d with internal node
Precision = 82.3%, Recall = 93.8%
Robust to fragments
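A much-simplified sketch of the two-pass heuristic; the ChosenTag fields (root category plus required left/right frontier categories) are my own abstraction of a supertag, not the paper's representation:

from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class ChosenTag:
    root: str                     # root category of the word's supertag, e.g. "S", "NP"
    left_slots: Tuple[str, ...]   # frontier categories expected to the left of the anchor
    right_slots: Tuple[str, ...]  # frontier categories expected to the right of the anchor
    modifier: bool                # True for auxiliary (modifier) supertags

def lightweight_deps(tags: List[ChosenTag]):
    # Deterministic two-pass heuristic over the chosen supertags;
    # returns (head, dependent, label) arcs.
    arcs = []
    def link(i, tag):
        for cat in tag.left_slots:          # nearest matching word to the left
            for j in range(i - 1, -1, -1):
                if tags[j].root == cat:
                    arcs.append((i, j, cat)); break
        for cat in tag.right_slots:         # nearest matching word to the right
            for j in range(i + 1, len(tags)):
                if tags[j].root == cat:
                    arcs.append((i, j, cat)); break
    for i, tag in enumerate(tags):          # pass 1: modifier supertags
        if tag.modifier:
            link(i, tag)
    for i, tag in enumerate(tags):          # pass 2: non-modifier supertags
        if not tag.modifier:
            link(i, tag)
    return arcs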