Using the Web as an Implicit Training Set:
Application to Structural Ambiguity Resolution
Preslav Nakov and Marti Hearst
Computer Science Division and SIMS
University of California, Berkeley
Supported by NSF DBI-0317510 and a gift from Genentech
Motivation
Huge datasets trump sophisticated algorithms.
“Scaling to Very Very Large Corpora for Natural Language Disambiguation” (Banko & Brill, ACL 2001)
Task: spelling correction
Raw text as “training data”
Log-linear improvement even up to a billion words
Getting more data is better than fine-tuning
algorithms.
How to generalize to other problems?
Web as a Baseline
“Web as a baseline” (Lapata & Keller, 04; 05): applied simple n-gram models to:
machine translation candidate selection
article generation
(significantly better than the best supervised algorithm)
noun compound interpretation
noun compound bracketing
adjective ordering
spelling correction
countability detection
prepositional phrase attachment
(not significantly different from the best supervised algorithm)
All unsupervised
Findings:
Sometimes rival best supervised approaches.
=> Web n-grams should be used as a baseline.
Our Contribution
The potential of these ideas is not yet fully realized
We introduce new features:
paraphrases
surface features
Applied to structural ambiguity problems
Data sparseness: need statistics for every possible word and word combination
Problems (all unsupervised):
Noun compound bracketing: state-of-the-art results (Nakov & Hearst, 2005)
PP attachment (this work)
NP coordination (this work)
Task 1:
Prepositional Phrase
Attachment
PP attachment
(a) Peter spent millions of dollars. (noun: the PP combines with the NP to form another NP)
(b) Peter spent time with his family. (verb: the PP is an indirect object of the verb)
quadruple: (v, n1, p, n2)
(a) (spent, millions, of, dollars)
(b) (spent, time, with, family)
Human performance:
quadruple only: 88%
whole sentence: 93%
Related Work
Supervised:
(Brill & Resnik, 94): transformation-based learning, WordNet classes, P=82%
(Ratnaparkhi & al., 94): ME, word classes (MI), P=81.6%
(Collins & Brooks, 95): back-off, P=84.5%
(Stetina & Nagao, 97): decision trees, WordNet, P=88.1%
(Toutanova & al., 04): morphology, syntax, WordNet, P=87.5%
(Olteanu & Moldovan, 05): in context, parser, FrameNet, Web, SVM, P=92.85%
Unsupervised:
(Hindle & Rooth, 93): partially parsed corpus, lexical associations over subsets of (v,n1,p), P=80%, R=80%
(Ratnaparkhi, 98): POS-tagged corpus, unambiguous cases for (v,n1,p) and (n1,p,n2), classifier: P=81.9%
(Pantel & Lin, 00): collocation database, dependency parser, large corpus (125M words), P=84.3% (unsupervised state-of-the-art)
Most of the above are evaluated on the Ratnaparkhi dataset.
Related Work: Web Unsup.
(Volk, 00): AltaVista, NEAR operator, German, compares Pr(p|n1) vs. Pr(p|v), P=75%, R=58%
(Volk, 01): AltaVista, NEAR operator, German, inflected queries, Pr(p,n2|n1) vs. Pr(p,n2|v), P=75%, R=85%
(Calvo & Gelbukh, 03): exact phrases, Spanish, P=91.97%, R=89.5%
(Lapata & Keller, 05): Web n-grams, English, Ratnaparkhi dataset, P in the low 70s
(Olteanu & Moldovan, 05): supervised, English, in context, parser, FrameNet, Web counts, SVM, P=92.85%
PP-attachment:
Our Approach
Unsupervised
(v,n1,p,n2) quadruples, Ratnaparkhi test set
Google and MSN Search
Exact phrase queries
Inflections: WordNet 2.0
Adding determiners where appropriate
Models:
n-gram association models
Web-derived surface features
paraphrases
Probabilities: Estimation
Using page hits as a proxy for n-gram counts
Pr(w1|w2) = #(w1,w2) / #(w2)
  #(w2): word frequency; query for “w2”
  #(w1,w2): bigram frequency; query for “w1 w2”
Pr(w1,w2|w3) = #(w1,w2,w3) / #(w3)
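A minimal sketch of this estimate in Python, assuming a hypothetical hits() helper that returns the page-hit count of an exact-phrase query (the talk uses Google and MSN Search; any search API wrapper would do):

```python
def hits(phrase):
    """Hypothetical helper: return the number of page hits reported by a
    search engine for the exact-phrase query `phrase` (e.g. via the
    Google or MSN Search APIs)."""
    raise NotImplementedError  # placeholder: issue the query, parse the count

def pr(w1, w2):
    """Estimate Pr(w1 | w2) = #(w1, w2) / #(w2) from page hits."""
    bigram = hits(f'"{w1} {w2}"')  # bigram frequency: query for "w1 w2"
    unigram = hits(f'"{w2}"')      # word frequency: query for "w2"
    return bigram / unigram if unigram else 0.0
```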
N-gram models
(i) Pr(p|n1) vs. Pr(p|v)
(ii) Pr(p,n2|n1) vs. Pr(p,n2|v)
I eat/v spaghetti/n1 with/p a fork/n2. (verb)
I eat/v spaghetti/n1 with/p sauce/n2. (noun)
Variants: probabilities (Pr) or frequencies (#)
smoothing as in (Hindle & Rooth, 93)
back-off from (ii) to (i)
N-grams are unreliable if n1 or n2 is a pronoun.
MSN Search: no rounding of n-gram estimates
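A sketch of the resulting attachment decision, assuming a hypothetical count() lookup for Web frequencies of exact phrases; raw frequencies are used here (the Pr variants would divide by #(n1) and #(v)), and the back-off trigger shown (both trigram counts zero) is an assumption:

```python
def pp_attach(v, n1, p, n2, count):
    """Model (ii) with back-off to model (i); returns 'noun', 'verb',
    or None when undecided."""
    pronouns = {"i", "you", "he", "she", "it", "we", "they",
                "me", "him", "her", "us", "them"}
    if n1.lower() in pronouns or n2.lower() in pronouns:
        return None  # n-gram counts are unreliable for pronouns

    # Model (ii): #(n1, p, n2) vs. #(v, p, n2)
    noun_score, verb_score = count(n1, p, n2), count(v, p, n2)
    if noun_score == 0 and verb_score == 0:
        # Back off to model (i): #(n1, p) vs. #(v, p)
        noun_score, verb_score = count(n1, p), count(v, p)

    if noun_score > verb_score:
        return "noun"
    if verb_score > noun_score:
        return "verb"
    return None
```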
Web-derived Surface Features
Example features (P = precision, R = recall):
open the door / with a key  -> verb  (P=100.00%, R=0.13%)
open the door (with a key)  -> verb  (P=73.58%, R=2.44%)
open the door – with a key  -> verb  (P=68.18%, R=2.03%)
open the door , with a key  -> verb  (P=58.44%, R=7.09%)
eat Spaghetti with sauce    -> noun  (P=100.00%, R=0.14%)
eat ? spaghetti with sauce  -> noun  (P=83.33%, R=0.55%)
eat , spaghetti with sauce  -> noun  (P=65.77%, R=5.11%)
eat : spaghetti with sauce  -> noun  (P=64.71%, R=1.57%)
Feature counts are summed per attachment and the totals are compared.
Summing achieves high precision, low recall.
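A rough sketch of how such surface features can be collected and combined; the marker set below is illustrative rather than the full feature set, and hits() is the same hypothetical exact-phrase counter as before:

```python
def surface_vote(v, n1, p, n2, hits):
    """Sum the Web counts of punctuation-based surface features and
    compare the totals for the two attachments."""
    # A marker separating n1 from "p n2" suggests verb attachment.
    verb_score = sum(hits(f'"{v} {n1} {m} {p} {n2}"') for m in (",", "-", ":", "("))
    # A marker separating v from "n1 p n2" suggests noun attachment.
    noun_score = sum(hits(f'"{v} {m} {n1} {p} {n2}"') for m in (",", "-", ":", "?"))
    if verb_score > noun_score:
        return "verb"
    if noun_score > verb_score:
        return "noun"
    return None  # abstain: high precision, low recall
```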
Paraphrases
v n1 p n2 →
(1) v n2 n1 (noun)
(2) v p n2 n1 (verb)
(3) p n2 * v n1 (verb)
(4) n1 p n2 v (noun)
(5) v PRONOUN p n2 (verb)
(6) BE n1 p n2 (noun)
Paraphrases: pattern (1)
(1) v n1 p n2 → v n2 n1 (noun)
Can we turn “n1 p n2” into a noun compound “n2 n1”?
meet/v demands/n1 from/p customers/n2
meet/v the customer/n2 demands/n1
Problem: ditransitive verbs like give
gave/v an apple/n1 to/p him/n2
gave/v him/n2 an apple/n1
Solution:
no determiner before n1
determiner before n2 is required
the preposition cannot be to
Paraphrases: pattern (2)
(2) v n1 p n2 → v p n2 n1 (verb)
If “p n2” is an indirect object of v, then it
could be switched with the direct object n1.
had/v a program/n1 in/p place/n2
had/v in/p place/n2 a program/n1
Determiner before n1 is required to prevent
“n2 n1” from forming a noun compound.
Paraphrases: pattern (3)
(3) v n1 p n2 → p n2 * v n1 (verb)
“*” indicates a wildcard position (up to three
intervening words are allowed)
Looks for appositions, where the PP has
moved in front of the verb, e.g.
I gave/v an apple/n1 to/p him/n2
to/p him/n2 I gave/v an apple/n1
Paraphrases: pattern (4)
(4) v n1 p n2 → n1 p n2 v (noun)
Looks for appositions, where “n1 p n2” has
moved in front of v
shaken/v confidence/n1 in/p markets/n2
confidence/n1 in/p markets/n2 shaken/v
Paraphrases: pattern (5)
(5) v n1 p n2 → v PRONOUN p n2 (verb)
If n1 is a pronoun, that indicates verb attachment (Hindle & Rooth, 93).
Pattern (5) substitutes n1 with a dative
pronoun (him or her), e.g.
put/v a client/n1 at/p odds/n2
put/v him at/p odds/n2
Paraphrases: pattern (6)
(6) v n1 p n2 → BE n1 p n2 (noun)
BE (a form of to be) is typically used with a noun attachment
Pattern (6) substitutes v with a form of to be
(is or are), e.g.
eat/v spaghetti/n1 with/p sauce/n2
is spaghetti/n1 with/p sauce/n2
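As an illustration, two of the six paraphrase patterns turned into exact-phrase queries (a sketch only; the real system also generates inflected variants via WordNet 2.0, handles determiners, and enforces the restrictions above):

```python
def pp_paraphrase_queries(v, n1, p, n2):
    """Return (query, predicted attachment) pairs for patterns (5) and (6).
    Hit counts for the queries would be summed per attachment and compared."""
    queries = []
    # Pattern (5): v n1 p n2 -> v PRONOUN p n2  (verb attachment)
    for pron in ("him", "her"):
        queries.append((f'"{v} {pron} {p} {n2}"', "verb"))
    # Pattern (6): v n1 p n2 -> BE n1 p n2      (noun attachment)
    for be in ("is", "are"):
        queries.append((f'"{be} {n1} {p} {n2}"', "noun"))
    return queries
```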
Evaluation
Ratnaparkhi dataset
3097 test examples, e.g.
prepare dinner for family V
shipped crabs from province V
n1 or n2 is a bare determiner: 149 examples
problem for unsupervised methods
left chairmanship of the N
is the of kind N
acquire securities for an N
special symbols: %, /, & etc.: 230 examples
problem for Web queries
buy % for 10 V
beat S&P-down from % V
is 43%-owned by firm N
Results
(Results table not shown in this transcript.)
Smoothing is not needed on the Web.
For prepositions other than OF (of → noun attachment).
Models in bold are combined in a majority vote.
Checking directly for...: simpler but not significantly different from 84.3% (Pantel & Lin, 00).
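A sketch of the majority-vote combination; the default and tie-breaking behaviour shown is an assumption, not taken from the talk:

```python
from collections import Counter

def majority_vote(predictions, default="noun"):
    """Combine the individual models' votes ('noun'/'verb'; None means
    the model abstains) by simple majority."""
    votes = Counter(p for p in predictions if p is not None)
    if not votes:
        return default
    ranked = votes.most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return default  # tie between the two attachments
    return ranked[0][0]
```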
Task 2:
Coordination
Coordination & Problems
(Modified) real sentence:
The Department of Chronic Diseases and Health
Promotion leads and strengthens global efforts to
prevent and control chronic diseases or disabilities
and to promote health and quality of life.
Problems:
boundaries: words, constituents, clauses etc.
interactions with PPs: [health and [quality of life]]
vs. [[health and quality] of life]
or meaning and: chronic diseases or disabilities
ellipsis
NC coordination: ellipsis
Ellipsis
car and truck production
means car production and truck production
No ellipsis
president and chief executive
All-way coordination
Securities and Exchange Commission
NC Coordination: ellipsis
Quadruple (n1,c,n2,h)
Penn Treebank annotations
ellipsis:
(NP car/NN and/CC truck/NN production/NN).
no ellipsis:
(NP (NP president/NN) and/CC (NP chief/NN
executive/NN))
all-way: can be annotated either way
This is a problem a parser must deal with.
Collins’ parser always predicts ellipsis,
but other parsers (e.g. Charniak’s) try to solve it.
Related Work
(Resnik, 99): similarity of form and meaning,
conceptual association, decision tree, P=80%,
R=100%
(Rus & al., 02): deterministic, rule-based bracketing
in context, P=87.42%, R=71.05%
(Chantree & al., 05): distributional similarities from
BNC, Sketch Engine (freqs., object/modifier etc.),
P=80.3%, R=53.8%
(Goldberg, 99): different problem (n1,p,n2,c,n3),
adapts Ratnaparkhi (99) algorithm, P=72%, R=100%
N-gram models
Quadruple: (n1,c,n2,h)
(i) #(n1,h) vs. #(n2,h)
(ii) #(n1,h) vs. #(n1,c,n2)
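A sketch of these count comparisons, assuming a hypothetical count() Web-frequency lookup; the decision direction (a strong n1–h association counts as evidence for ellipsis) is the natural reading of the comparison and is an assumption here:

```python
def coord_ngram_decision(n1, c, n2, h, count, model="i"):
    """Return 'ellipsis', 'no ellipsis', or None (undecided)."""
    if model == "i":
        # (i) #(n1, h) vs. #(n2, h)
        ellipsis_score, other_score = count(n1, h), count(n2, h)
    else:
        # (ii) #(n1, h) vs. #(n1, c, n2)
        ellipsis_score, other_score = count(n1, h), count(n1, c, n2)
    if ellipsis_score > other_score:
        return "ellipsis"
    if other_score > ellipsis_score:
        return "no ellipsis"
    return None
```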
Surface Features
Feature counts are summed per prediction and the totals are compared (feature table not shown in this transcript).
Paraphrases
n1 c n2 h →
(1) n2 c n1 h (ellipsis)
(2) n2 h c n1 (NO ellipsis)
(3) n1 h c n2 h (ellipsis)
(4) n2 h c n1 h (ellipsis)
Paraphrases: Pattern (1)
(1) n1 c n2 h → n2 c n1 h (ellipsis)
Switch places of n1 and n2
bar/n1 and/c pie/n2 graph/h
pie/n2 and/c bar/n1 graph/h
Paraphrases: Pattern (2)
(2) n1 c n2 h → n2 h c n1 (NO ellipsis)
Switch places of n1 and “n2 h”
president/n1 and/c chief/n2 executive/h
chief/n2 executive/h and/c president/n1
Paraphrases: Pattern (3)
(3) n1 c n2 h → n1 h c n2 h (ellipsis)
Insert the elided head h
bar/n1 and/c pie/n2 graph/h
bar/n1 graph/h and/c pie/n2 graph/h
Paraphrases: Pattern (4)
(4) n1 c n2 h → n2 h c n1 h (ellipsis)
Insert the elided head h, but also switch
n1 and n2
bar/n1 and/c pie/n2 graph/h
pie/n2 graph/h and/c bar/n1 graph/h
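The four coordination paraphrases as exact-phrase queries, as a sketch (inflections and determiners are again handled separately in the real system):

```python
def coord_paraphrase_queries(n1, c, n2, h):
    """(query, prediction) pairs; hit counts would be summed per prediction."""
    return [
        (f'"{n2} {c} {n1} {h}"', "ellipsis"),       # (1) switch n1 and n2
        (f'"{n2} {h} {c} {n1}"', "no ellipsis"),    # (2) switch n1 and "n2 h"
        (f'"{n1} {h} {c} {n2} {h}"', "ellipsis"),   # (3) insert the elided head
        (f'"{n2} {h} {c} {n1} {h}"', "ellipsis"),   # (4) insert the head and switch
    ]
```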
(Rus & al.,02) Heuristics
Heuristic 1: no ellipsis if n1 = n2
milk/n1 and/c milk/n2 products/h
Heuristic 4: no ellipsis if both n1 and n2 are modified by an adjective
Heuristic 5: ellipsis if only n1 is modified by an adjective
Heuristic 6: no ellipsis if only n2 is modified by an adjective
(We use a determiner.)
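A sketch of heuristics 1, 4, 5, and 6, assuming a hypothetical modified() predicate that says whether the noun is preceded by an adjective (or, per the note above, a determiner) in the original sentence:

```python
def rus_heuristics(n1, n2, modified):
    """Apply the (Rus & al., 02) heuristics; modified(word) -> bool is a
    hypothetical predicate: is the noun preceded by an adjective/determiner?"""
    if n1 == n2:
        return "no ellipsis"          # heuristic 1
    if modified(n1) and modified(n2):
        return "no ellipsis"          # heuristic 4
    if modified(n1) and not modified(n2):
        return "ellipsis"             # heuristic 5
    if modified(n2) and not modified(n1):
        return "no ellipsis"          # heuristic 6
    return None
```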
Number Agreement
Introduced by Resnik (93)
(a) n1 & n2 agree, but n1 & h do not → ellipsis;
(b) n1 & n2 don't agree, but n1 & h do → no ellipsis;
(c) otherwise → leave undecided.
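A sketch of this check, assuming a hypothetical is_plural() predicate (e.g. from a POS tagger or a morphological lexicon):

```python
def number_agreement(n1, n2, h, is_plural):
    """Resnik-style number-agreement heuristic for (n1, c, n2, h)."""
    n1_n2_agree = is_plural(n1) == is_plural(n2)
    n1_h_agree = is_plural(n1) == is_plural(h)
    if n1_n2_agree and not n1_h_agree:
        return "ellipsis"      # (a)
    if not n1_n2_agree and n1_h_agree:
        return "no ellipsis"   # (b)
    return None                # (c) leave undecided
```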
Results
428 examples from the Penn Treebank
(Results table not shown in this transcript.)
Model (ii) is bad: it compares a bigram to a trigram.
Models in bold are combined in a majority vote.
Comparable to other researchers' results (but there is no standard dataset).
Conclusions & Future Work
Tapping the potential of very large corpora for
unsupervised algorithms
Go beyond n-grams
Surface features
Paraphrases
Results are competitive with the best unsupervised approaches
Results can rival those of supervised algorithms
Future Work
other NLP tasks
better evidence combination
There should be even more exciting features on the Web!
The End
Thank you!