Filtered Ranking for Bootstrapping in Event Extraction


FILTERED RANKING FOR BOOTSTRAPPING IN EVENT EXTRACTION
Shasha Liao
Ralph Grishman
New York University
CONTENT
 Introduction
 Related work
 Ranking methods in bootstrapping
 System description
 Experiments
 Conclusions

INTRODUCTION

The goal of event extraction is to identify instances of a
class of events, including their occurrence and arguments.

In this paper, we focus on identifying the occurrence of an
event.

Annotating large corpora to train supervised event
extractors is expensive.

Semi-supervised methods are trained from a small
seed set and an unannotated corpus.

Semi-supervised methods can greatly reduce human
labor.
INTRODUCTION
Most semi-supervised event extractors seek to
learn sets of patterns.

Patterns typically consist of a predicate and some
lexical or semantic constraints on its arguments.

Such patterns indicate that there is an event.
For example: “ORG appointed PER as the vice
president…”

An effective semi-supervised extractor should have
good performance over a range of extraction tasks
and corpora.
FLOW CHART

A typical bootstrapping approach: starting from seed patterns,
match against the untagged corpus, score candidates with a
pattern ranking function, add the top-ranked new patterns to
the seed set, and repeat until a stopping condition is met.
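The loop just described can be sketched in Python; the ranking function, the document representation (each document as a set of patterns), and the acceptance budget per iteration are illustrative placeholders, not the paper's exact settings.

```python
def bootstrap(seed_patterns, corpus, rank_fn, max_iters=20, accept_per_iter=2):
    """Generic pattern-bootstrapping loop: rank candidate patterns
    against the current accepted set, accept the top-ranked ones,
    and repeat until nothing is left or the iteration cap is hit."""
    accepted = set(seed_patterns)
    for _ in range(max_iters):
        # Candidates are patterns seen in the corpus but not yet accepted.
        candidates = {p for doc in corpus for p in doc} - accepted
        if not candidates:
            break  # stop condition: no new patterns to rank
        ranked = sorted(candidates,
                        key=lambda p: rank_fn(p, accepted, corpus),
                        reverse=True)
        accepted.update(ranked[:accept_per_iter])
    return accepted
```

Any of the ranking functions described later (document-centric, similarity-centric, or the filtered combination) can be plugged in as `rank_fn`.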
RELATED WORK

Document-centric methods:
 Riloff (1996)
 Yangarber et al. (2000)
 Surdeanu et al. (2006)
 Patwardhan and Riloff (2007)

Similarity-centric methods:
 Stevenson and Greenwood (2005) (S&G)
 Greenwood and Stevenson (2006)

RANKING METHODS IN BOOTSTRAPPING

Document-centric method:

Finds patterns with high frequency in relevant documents and
low frequency in irrelevant documents.

Good for extracting patterns for a scenario, which involves
related events (hiring and firing, attacks and injuries).

Corpus selection is quite important.

Similarity-centric method:

Finds patterns with high lexical similarity.

Good for extracting patterns of the same event type.

No extra corpus is needed, although one can be used.

Suffers from polysemy when computing lexical similarities.
RANKING METHODS IN BOOTSTRAPPING

Our assumption is more restrictive:

Patterns that appear in relevant documents and are lexically
similar are most likely to be relevant.

We limit the effect of ambiguous patterns by narrowing the
search to relevant documents.

We limit irrelevant patterns in relevant documents by a word
similarity restriction.

Many combinations are possible; we propose one that uses
word similarity as a filter:
Filter(p) = DocumentRanking(p)   if SimilarityRanking(p) ≥ t
          = 0                    otherwise
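A minimal sketch of this filter, assuming `document_ranking` and `similarity_ranking` are scoring functions over patterns and `t` is a tunable threshold:

```python
def filtered_ranking(p, document_ranking, similarity_ranking, t=0.1):
    """Word similarity as a filter: a pattern keeps its
    document-centric score only if it is lexically similar
    enough to the seed set; otherwise it is blocked (score 0)."""
    if similarity_ranking(p) >= t:
        return document_ranking(p)
    return 0.0
```

The default threshold here is illustrative; in practice t would be tuned on held-out data.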

SYSTEM DESCRIPTION

Pre-processing:

Tokenization, stemming, name tagging, semantic labeling.

GLARF – logical grammatical and predicate-argument
representation, with three levels:

SURFACE: from the parse tree.

LOGIC1: grammatical logical roles; regularizes phenomena like
passive, relative clauses, etc.

LOGIC2: predicate-argument roles, corresponding to PropBank &
NomBank; generally “arg0” for SBJ (agent) and “arg1” for OBJ
(patient).

Example: “John is hit by Tom’s brother.”
 <Arg1 hit John>
 <Arg0 hit brother>
 <T-pos brother Tom>

After replacing names by their types:
 <Arg1 hit PER>
 <T-pos brother PER>
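The tuples above can be modeled as simple (role, head, argument) triples; generalizing named arguments to their semantic types then yields the patterns used for matching. The `name_types` lookup below is a toy stand-in for a real name tagger.

```python
# GLARF-style triples for "John is hit by Tom's brother."
triples = [("Arg1", "hit", "John"),
           ("Arg0", "hit", "brother"),
           ("T-pos", "brother", "Tom")]

# Toy name-type lookup standing in for name-tagger output.
name_types = {"John": "PER", "Tom": "PER"}

def generalize(triples, name_types):
    """Replace tagged names by their semantic types (PER, ORG, ...);
    ordinary words like 'brother' are left unchanged."""
    return [(role, head, name_types.get(arg, arg))
            for role, head, arg in triples]
```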
SYSTEM DESCRIPTION

Document-based ranking:

Patterns in the seed set have precision scores of 1;
other patterns have precision scores of 0.

Rel(d) = 1 − Π_{p ∈ K(d)} (1 − Prec(p))

Prec_{i+1}(p) = ( Σ_{d ∈ H(p)} Rel_i(d) ) / |H(p)|

Sup(p) = Σ_{d ∈ H(p)} Rel(d)

RankFun_Yangarber(p) = ( Sup(p) / |H(p)| ) · log Sup(p)

H(p) is the set of documents which contain pattern p.
K(d) is the set of accepted patterns in document d.

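One step of this document-centric scoring can be sketched as follows; the data layout (documents as sets of patterns) and the guard for Sup(p) ≤ 1, where the log term is non-positive, are assumptions of this sketch.

```python
import math

def yangarber_scores(docs, seed_patterns):
    """One step of document-centric ranking (after Yangarber et al. 2000).
    docs maps doc_id -> set of patterns found in that document.
    Seed patterns start with precision 1; all others with 0."""
    prec = {p: 1.0 for p in seed_patterns}

    # Rel(d) = 1 - prod over p in K(d) of (1 - Prec(p));
    # patterns with precision 0 contribute a factor of 1 and drop out.
    rel = {}
    for d, pats in docs.items():
        prod = 1.0
        for p in pats:
            prod *= 1.0 - prec.get(p, 0.0)
        rel[d] = 1.0 - prod

    # Sup(p) = sum of Rel(d) over H(p), the documents containing p;
    # RankFun(p) = (Sup(p) / |H(p)|) * log Sup(p).
    scores = {}
    all_patterns = {p for pats in docs.values() for p in pats}
    for p in all_patterns:
        H = [d for d, pats in docs.items() if p in pats]
        sup = sum(rel[d] for d in H)
        # Guard (an assumption here): score 0 when log Sup(p) would be <= 0.
        scores[p] = (sup / len(H)) * math.log(sup) if sup > 1.0 else 0.0
    return scores
```

In the full procedure, the top-scoring patterns are accepted, Prec and Rel are re-estimated, and the step repeats.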
SYSTEM DESCRIPTION

Pattern similarity:

For two words, we use the Information Content (IC) measure from
WordNet (same as S&G 2005).

S&G focus only on patterns headed by verbs; we include verbs,
nouns, and adjectives.

They record only the subject and object of a verb; we record all
argument relations between verbs, nouns, and adjectives.

We use only a predicate and one constraint (we do not currently
build multi-constraint patterns).

Sim(p1, p2) = α · Sim(f1, f2) + β · Sim(r1, r2) · Sim(a1, a2)

(f: predicate, r: argument relation, a: argument)
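The combination above can be sketched as follows, assuming a word-level `word_sim` function (e.g. a WordNet IC-based measure); the weights and the exact-match relation similarity are illustrative assumptions, not the paper's tuned values.

```python
def pattern_similarity(p1, p2, word_sim, alpha=0.5, beta=0.5):
    """Sim(p1, p2) = alpha * Sim(f1, f2) + beta * Sim(r1, r2) * Sim(a1, a2),
    where each pattern is a (predicate f, relation r, argument a) tuple.
    alpha/beta here are illustrative weights."""
    f1, r1, a1 = p1
    f2, r2, a2 = p2
    # Sketch assumption: relations are compared by exact match.
    rel_sim = 1.0 if r1 == r2 else 0.0
    return alpha * word_sim(f1, f2) + beta * rel_sim * word_sim(a1, a2)
```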
SYSTEM FLOW CHART

Our process follows Yangarber's, while incorporating word
similarity into the pattern ranker: the untagged corpus is
pre-processed; starting from the seeds, a document ranking
function scores documents, and the pattern ranking function,
filtered by word similarity, accepts new patterns; the loop
repeats until a stopping condition is met, then exits.
EXPERIMENTS - DATA

MUC-6 Evaluation

Task: hiring and firing of executives.

Bootstrapping data: Reuters corpus (Rose et al. 2002);
preselected, 6,000 documents, half relevant and half irrelevant.

Evaluation data: MUC-6; 200 documents.

ACE Evaluation

Task: multiple elementary event types, like attack, die, hire.

Bootstrapping data: Agence France Presse (AFP) from the
Gigaword corpus; non-preselected, 14,171 documents.

Evaluation data: ACE 2005; 589 documents.
EXPERIMENTS - MUC EVALUATION

Filtered ranking performs better.

Metric: F-measure of finding relevant sentences.

Our conclusion differs from S&G's experiment. Why?

Does the corpus matter?
 Reuters (6,000)
 WSJ (18,734)
 Gigaword (14,171)

Is this conclusion general?
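The sentence-level F-measure used above can be sketched over sets of relevant sentences; the set-based formulation is an assumption of this sketch.

```python
def f_measure(gold, predicted):
    """F-measure for finding relevant sentences: harmonic mean of
    precision (overlap / predicted) and recall (overlap / gold)."""
    tp = len(set(gold) & set(predicted))
    if tp == 0:
        return 0.0  # avoids division by zero for empty predictions
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)
```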
EXPERIMENTS - ACE 2005 EVALUATION

Three event types tested:
 Die
 Attack
 Start-Position

Two kinds of evaluation:

Sentence level: if a pattern matches in sentence s, tag s as
relevant; otherwise, irrelevant.

Word level: if the pattern matches a trigger word, it is
correct; otherwise, incorrect.

Comparison to a simple supervised method:

For training, for every pattern we count how many times it
contains an event trigger and how many times it does not. If it
contains an event trigger more than 50% of the time, we treat
it as a positive pattern.

We did 5-fold cross-validation on the ACE 2005 data and report
the average results.
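The supervised baseline's selection rule can be sketched directly from the description above; the input layout (per-pattern counts with and without a trigger) is an assumption of this sketch.

```python
def supervised_positive_patterns(pattern_counts):
    """Simple supervised baseline: a pattern is positive if it
    contains an event trigger more than 50% of the times it occurs.
    pattern_counts: pattern -> (count with trigger, count without)."""
    return {p for p, (with_trig, without_trig) in pattern_counts.items()
            if with_trig > 0.5 * (with_trig + without_trig)}
```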
EXPERIMENTS - ACE 2005 EVALUATION

Sentence level evaluation

Word level evaluation
CONCLUSIONS

We propose a new ranking method in bootstrapping for
event extraction.

This new method can block some irrelevant patterns coming
from relevant documents.

This new method, by preferring patterns from relevant
documents, can eliminate some lexical ambiguity.

Experiments show that this new ranking method performs
better than previous ranking methods and is more stable
across different corpora.