Bootstrapping


Bootstrapping
April 3 2007
William Cohen
Prehistory
Karl Friedrich Hieronymus, Freiherr von
Münchhausen (11 May 1720 – 22 February
1797) was a German baron who in his youth
was sent to serve as page to Anthony Ulrich II,
Duke of Brunswick-Lüneburg and later joined
the Russian military. He served until 1750, in
particular taking part in two campaigns against
the Turks. Returning home, Münchhausen
supposedly told a number of outrageous tall
tales about his adventures. The Baron was
born in Bodenwerder and died there as well.
According to the stories, as retold by others, the
Baron's astounding feats included riding
cannonballs, travelling to the Moon, and
escaping from a swamp by pulling himself up
by his own hair. … In later versions he was
using his own boot straps to pull himself
out of the sea. [Wikipedia]
Prehistory
“Bob Wilson is desperately trying to finish his
doctoral thesis and has locked himself in his room
in a marathon attempt to do so. His typewriter jams,
and as he unjams it he hears someone say "Don't
bother, it's hogwash anyway." The thesis, in fact,
deals with time travel. The interloper is a man who
seems strangely familiar, and might be recognizable
without the two-day growth of beard and the black
eye. …”
“In computing, bootstrapping refers to a process
where a simple system activates another more
complicated system that serves the same purpose.
It is a solution to the Chicken-and-egg problem of
starting a certain system without the system already
functioning. The term is most often applied to the
process of starting up a computer, in which a
mechanism is needed to execute the software
program that is responsible for executing software
programs …” [Wikipedia]
Some more recent history - 1
Idea: write some specific patterns that
indicate A is a kind of B:
1. … such NP as NP (“at such schools
as CMU, students rarely need
extensions”)
[Coling 1992]
Results: 8.6M words of Grolier’s
encyclopedia → 7067 pattern
instances → 152 relations
Many were not in WordNet.
2. NP, NP, or other NP (“William,
Carlos or other machine learning
professors”)
3. NP including NP (“struggling teams
including the Pirates”)
4. NP, especially NP (prestigious
conferences, especially NIPS)
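Hearst-style patterns like these can be approximated with regular expressions. A minimal sketch, treating each NP as a single token (Hearst matched real NP chunks from a parser; all names here are illustrative):

```python
import re

# Crude stand-in for an NP: a single token. Hearst matched parser-produced
# NP chunks; this only sketches the pattern idea.
NP = r"[A-Za-z][\w-]*"

HEARST_PATTERNS = [
    # "such NP as NP"  ->  (hypernym, hyponym)
    re.compile(rf"such (?P<hyper>{NP}) as (?P<hypo>{NP})"),
    # "NP including NP"
    re.compile(rf"(?P<hyper>{NP}) including (?P<hypo>{NP})"),
    # "NP, especially NP"
    re.compile(rf"(?P<hyper>{NP}), especially (?P<hypo>{NP})"),
]

def extract_isa(text):
    """Return (hyponym, hypernym) pairs matched by any pattern."""
    return [(m.group("hypo"), m.group("hyper"))
            for pat in HEARST_PATTERNS
            for m in pat.finditer(text)]
```

On the slide's example, `extract_isa("at such schools as CMU, students rarely need extensions")` yields `[("CMU", "schools")]`.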
Some history – 2a
Idea: exploit “pattern/relation duality”:
1. Start with some seed instances of
(author,title) pairs (“Isaac Asimov”,
“The Robots of Dawn”)
2. Look for occurrences of these pairs
on the web.
[some workshop, 1998]
Unlike Hearst, Brin learned the patterns,
and learned very high-precision, easy-to-match patterns.
Result: 24M web pages + 5 books →
199 occurrences → 3 patterns → 4047
occurrences + 5M pages → 3947
occurrences → 105 patterns → …
15,257 books *with some manual tweaks
3. Generate patterns that match the
seeds.
- URLprefix, prefix, middle, suffix
4. Extract new (author, title) pairs that
match the patterns.
5. Go to 2.
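The five-step loop above can be sketched in Python. This is a hedged toy version: patterns are reduced to the middle context between the two strings and promoted once they recur, which is far cruder than Brin's actual (URLprefix, prefix, middle, suffix) patterns; all function names are illustrative.

```python
import re
from collections import Counter

def find_occurrences(corpus, pairs, ctx=10):
    """Step 2: locate known pairs, keeping (prefix, middle, suffix) context."""
    occs = []
    for text in corpus:
        for a, t in pairs:
            for m in re.finditer(re.escape(a) + r"(.{1,30}?)" + re.escape(t), text):
                occs.append((text[max(0, m.start() - ctx):m.start()],
                             m.group(1),
                             text[m.end():m.end() + ctx]))
    return occs

def bootstrap(corpus, seeds, rounds=2):
    """Sketch of the DIPRE-style loop: occurrences -> patterns -> new pairs."""
    pairs = set(seeds)
    for _ in range(rounds):
        occs = find_occurrences(corpus, pairs)                      # step 2
        mids = Counter(mid for _, mid, _ in occs)
        patterns = [mid for mid, n in mids.items() if n >= 2]       # step 3
        for text in corpus:                                         # step 4
            for mid in patterns:
                for m in re.finditer(r"(\w[\w. ]*?)" + re.escape(mid)
                                     + r"(\w[\w' ]*)", text):
                    pairs.add((m.group(1).strip(), m.group(2).strip()))
    return pairs                                                    # step 5: loop
```

With seeds ("Isaac Asimov", "The Robots of Dawn") and ("Frank Herbert", "Dune Messiah") over book-list strings like "Books: Ursula Le Guin, The Dispossessed.", the shared ", " context is promoted to a pattern and the Le Guin pair is extracted.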
Some history – 2b
Instances
Occurrences
Patterns
Idea: exploit “pattern/relation duality”:
1. Start with some seed instances of
(author,title) pairs (“Isaac Asimov”,
“The Robots of Dawn”)
2. Look for occurrences of these pairs
on the web.
3. Generate patterns that match the
seeds.
- URLprefix, prefix, middle, suffix
Result: 24M web pages + 5 books →
199 occurrences → 3 patterns → 4047
occurrences + 5M pages → 3947
occurrences → 105 patterns → …
15,257 books *with some manual tweaks
4. Extract new (author, title) pairs that
match the patterns.
5. Go to 2.
Some history – 3
[COLT 98]
Some history – 3b
Instances
Occurrences
Patterns
How to filter out “bad” instances,
occurrences, patterns?
Instances/Occurrences
Patterns
Bootstrapping
Hearst ‘92
Deeper linguistic features, free text…
BM’98
Brin’98
Learning, semi-supervised learning, dual feature spaces…
Scalability, surface patterns, use of web crawlers…
Bootstrapping
Hearst ‘92
Deeper linguistic features, free text…
Collins & Singer ‘99
BM’98
Brin’98
Boosting-based co-train method using content &
context features; context based on Collins’ parser;
learn to classify three types of NE
Learning, semi-supervised learning, dual feature spaces…
Scalability, surface patterns, use of web crawlers…
Bootstrapping
Hearst ‘92
Deeper linguistic features, free text…
Riloff & Jones ‘99
Collins & Singer ‘99
BM’98
Brin’98
Hearst-like patterns, Brin-like
bootstrapping (+ “meta-level”
bootstrapping) on MUC data
Learning, semi-supervised learning, dual feature spaces…
Scalability, surface patterns, use of web crawlers…
Bootstrapping
Hearst ‘92
Deeper linguistic features, free text…
Riloff & Jones ‘99
Collins & Singer ‘99
BM’98
Learning, semi-supervised learning, dual feature spaces…
Cucerzan & Yarowsky ‘99
Brin’98
EM-like co-train method with
context & content both defined
by character-level tries
Scalability, surface patterns, use of web crawlers…
Bootstrapping
Hearst ‘92
Deeper linguistic features, free text…
Riloff & Jones ‘99
Collins & Singer ‘99
…
Stevenson &
Greenwood
2005
Rosenfeld
and Feldman
BM’98
2006
Learning, semi-supervised learning, dual feature spaces…
Etzioni et
al 2005
…
Cucerzan & Yarowsky ‘99
Brin’98
Scalability, surface patterns, use of web crawlers…
De-emphasize duality, focus on
distance between patterns.
Stevenson & Greenwood
Instances/Occurrences
Patterns
Patterns
Flow from pattern → pattern is based on
semantic similarity (WordNet)
Flow from pattern → pattern depends
on empirical similarity (i.e.,
overlapping occurrences in
corpus)
Bootstrapping
Hearst ‘92
Deeper linguistic features, free text…
Riloff & Jones ‘99
Collins & Singer ‘99
…
Stevenson &
Greenwood
2005
Rosenfeld
and Feldman
BM’98
2006
Learning, semi-supervised learning, dual feature spaces…
Etzioni et
al 2005
…
Cucerzan & Yarowsky ‘99
Brin’98
Scalability, surface patterns, use of web crawlers…
Clever idea for learning
relation patterns & strong
experimental results
Rosenfeld & Feldman
• Instances  Occurrences as
before.
• Vary “positive” occurrences to
get near-miss “negative”
occurrences, using asymmetry,
disjointness, etc.
• Learn patterns in a
(moderately) expressive but
easy-to-match language (NPs
from OpenNLP).
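The near-miss negatives idea can be illustrated for an asymmetric relation: reversing the arguments of a positive occurrence should yield a negative. A hypothetical sketch (the paper also exploits disjointness and other properties):

```python
def negatives_by_asymmetry(positives):
    """For an asymmetric relation, e.g. Acquisition(buyer, bought),
    swapping the arguments of a positive pair yields a near-miss
    negative, unless the swapped pair is itself known positive."""
    pos = set(positives)
    return [(b, a) for (a, b) in positives if (b, a) not in pos]
```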
KnowItAll
Architecture
Set of predicates to consider + two names for each
~= [H92]
Architecture
Bootstrapping - 1
1. Submit the queries & apply the rules → initial seeds.
2. Evaluate each seed with each discriminator U: e.g., compute
PMI stats like: |hits(“city Boston”)| / |hits(“Boston”)|
3. Take the top seeds from each class and call them POSITIVE
and use disjointness, etc to find NEGATIVE seeds.
4. Train a Naive Bayes classifier using thresholded U’s as features.
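Steps 2 and 4 can be sketched as follows, with `hits` standing in for the web search-engine hit counts KnowItAll actually queried; the counts below are made up purely for illustration.

```python
def pmi_score(hits, discriminator, instance):
    """Step 2: |hits(discriminator + instance)| / |hits(instance)|."""
    denom = hits(instance)
    return hits(f"{discriminator} {instance}") / denom if denom else 0.0

# Toy hit counts, purely illustrative stand-ins for search-engine results.
COUNTS = {"Boston": 1_000_000, "city Boston": 40_000,
          "Asimov": 300_000, "city Asimov": 30}

def hits(query):
    return COUNTS.get(query, 0)

def features(instance, discriminators, thresholds):
    """Step 4's inputs: one boolean feature per thresholded
    discriminator, fed to the Naive Bayes classifier."""
    return [pmi_score(hits, d, instance) >= t
            for d, t in zip(discriminators, thresholds)]
```

Here `pmi_score(hits, "city", "Boston")` is 40,000 / 1,000,000 = 0.04, so "Boston" passes a 0.01 threshold on the "city" discriminator while "Asimov" (30 / 300,000 = 0.0001) does not.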
Bootstrapping - 2