Unsupervised Learning of
Narrative Event Chains
Original paper by: Nate Chambers and Dan
Jurafsky
in ACL 2008
This presentation for discussion created by:
Peter Clark (Jan 2009)
Disclaimer: these slides are by Peter Clark, not the original
authors, and thus represent a (possibly flawed) interpretation
of the original work!
Why Scripts?
Essential for making sense of text
We typically match a narrative with expected “scripts”
to make sense of what’s happening
to fill in the gaps, fill in goals and purpose
On November 26, the Japanese attack fleet of 33
warships and auxiliary craft, including 6 aircraft
carriers, sailed from northern Japan for the Hawaiian
Islands. It followed a route that took it far to the north of
the normal shipping lanes. By early morning,
December 7, 1941, the ships had reached their launch
position, 230 miles north of Oahu.
depart → travel → arrive
Scripts
Important/essential for NLP
But: expensive to build
Can we learn them from text?
“John entered the restaurant. He sat down,
and ordered a meal. He ate…”
enter
sit
order
eat
?
Our own (brief) attempt
Look at next events in 1GB corpus:
"shoot“ is followed by:
("say" 121)
("be" 110)
("shoot" 103)
("wound" 58)
("kill" 30)
("die" 27)
("have" 23)
("tell" 23)
("fire" 15)
("refuse" 15)
("go" 13)
("think" 13)
("carry" 12)
("take" 12)
("come" 11)
("help" 10)
("run" 10)
("be arrested" 9)
("find" 9)
"drive" is followed by:
("drive" 364)
("be" 354)
("say" 343)
("have" 71)
("continue" 47)
("see" 40)
("take" 32)
("make" 29)
("expect" 27)
("go" 24)
("show" 22)
("try" 19)
("tell" 18)
("think" 18)
("allow" 16)
("want" 15)
("come" 13)
("look" 13)
("close" 12)
Some glimmers of hope, but not great…
"fly“ is followed by:
("fly" 362)
("say" 223)
("be" 179)
("have" 60)
("expect" 48)
("allow" 40)
("tell" 33)
("see" 30)
("go" 27)
("take" 27)
("make" 26)
("plan" 24)
("derive" 21)
("want" 19)
("schedule" 17)
("report" 16)
("declare" 15)
("give" 15)
("leave on" 15)
Andrew Gordon (2007)
From [email protected] Thu Sep 27 09:33:04 2007
…Recently I tried to apply language modeling
techniques over event sequences in a billion words of narrative
text extracted from Internet weblogs, and barely exceeded chance
performance on some event-ordering evaluations….
Chambers and Jurafsky
Main insight:
Don’t look at all verbs, just look at those mentioning
the “key player” – the protagonist – in the sequence
Capture some role relationships also:
Not just “push” → “fall”, but “push X” → “X fall”
(Figure from the paper: “An automatically learned Prosecution Chain. Arrows indicate the before relation.”)
Approach
Stage 1:
find the likelihood that one event+protagonist co-occurs with one or more other event+protagonist pairs
NOTE: no ordering info
e.g., given:
“X pleaded”, what other event+protagonist occur with
unusually high frequencies?
→ “sentenced X”, “fined X”, “fired X”
Stage 2:
order the set of event+protagonist pairs
The Training Data
Articles in the GigaWord corpus
For each article:
find all pairs of events (verbs) which have a shared
argument
shared argument found by OpenNLP coreference
includes transitivity (X = Y, Y = Z, → X = Z)
add each pair to the database
“John entered the restaurant. The waiter came over.
John sat down, and the waiter greeted him….”
events about John: {X enter, X sat, greet X}
events about the waiter: {X come, X greet}
database of pairs co-occurring in the article:
X enter, X sat
X enter, greet X
X sat, greet X
X come, X greet
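A minimal Python sketch of this pair-extraction step (the article_chains input and its “X verb” / “verb X” notation are hand-coded here for illustration; in the real pipeline they would come from a parser plus OpenNLP coreference):

```python
from itertools import combinations
from collections import Counter

# Hypothetical pre-processed article: for each entity (coreference chain),
# the events it participates in, written "X verb" (entity is subject)
# or "verb X" (entity is object).
article_chains = {
    "John":   ["X enter", "X sit", "greet X"],
    "waiter": ["X come", "X greet"],
}

pair_counts = Counter()
for entity, events in article_chains.items():
    # every unordered pair of events sharing this protagonist
    for e1, e2 in combinations(sorted(events), 2):
        pair_counts[(e1, e2)] += 1

for (e1, e2), n in sorted(pair_counts.items()):
    print(f"{e1}  +  {e2}  ->  {n}")
```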
Stage 1
Given two events with a shared protagonist, do
they occur “unusually often” in a corpus?
“push X” & “X fall”
probability of seeing “push” and “fall” with particular coreferring arguments
= (number of times “push” and “fall” have been seen with these coreferring arguments)
/ (number of times any pair of verbs has been seen with any coreferring arguments)
more generally:
Prob(“X event1” AND “X event2”) = Number(“X event1” AND “X event2”) / Σi,j Number(“X eventi” AND “X eventj”)
PMI (“surprisingness”):
PMI(“X event1”, “X event2”) = log [ Prob(“X event1” AND “X event2”) / ( Prob(“X event1”) × Prob(“X event2”) ) ]
= the “surprisingness” that the args of event1 and event2 are coreferential
Can generalize:
PMI: given an event (+ arg), how “unusual” is it to see
another event (+ same arg)?
Generalization: given N events (+ arg), how “unusual” to
see another event (+ same arg)?
Thus: score a candidate event+arg against a whole set of known events by summing its PMI with each member of the set
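A minimal Python sketch of the PMI computation and the set generalization above (the counts and the simple p_single marginal estimate are illustrative assumptions, not the paper’s exact estimator):

```python
import math
from collections import Counter

# pair_counts[(e1, e2)]: number of times e1 and e2 were seen with
# coreferring arguments (built as in the extraction sketch; keys sorted).
# Counts here are made up purely for illustration.
pair_counts = Counter({
    ("X plead", "sentence X"): 40,
    ("X plead", "fine X"):     25,
    ("X plead", "X say"):      30,
    ("fine X", "sentence X"):  20,
})
total = sum(pair_counts.values())

def p_pair(e1, e2):
    return pair_counts.get(tuple(sorted((e1, e2))), 0) / total

def p_single(e):
    # marginal: fraction of observed pairs that involve e
    return sum(n for pair, n in pair_counts.items() if e in pair) / total

def pmi(e1, e2):
    joint = p_pair(e1, e2)
    if joint == 0:
        return float("-inf")   # unsmoothed: unseen pairs are ruled out
    return math.log(joint / (p_single(e1) * p_single(e2)))

def score(candidate, known_events):
    # generalization to a set: sum the candidate's PMI with every known event
    return sum(pmi(candidate, e) for e in known_events)

print(score("sentence X", ["X plead", "fine X"]))
```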
Evaluation: Cloze test
Fill in the blank…
McCann threw two interceptions early. Toledo pulled McCann
aside and told him he’d start. McCann quickly completed his
first two passes.
Extracted events (note: a set, not a list):
{ X throw, pull X, tell X, X start, X complete }
Cloze task: hide one event and predict “?”:
{ ?, pull X, tell X, X start, X complete }
Results:
69 articles, each with >= 5 protagonist+event pairs in them
System produces ~9000 guesses at each “?”
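A sketch of how the cloze evaluation could be run with the illustrative score() function above (cloze_ranks and vocabulary are assumed names, not from the paper):

```python
def cloze_ranks(doc_events, vocabulary, score):
    """For each held-out event, rank every candidate in `vocabulary` by its
    summed PMI with the remaining events and record the rank (1 = best guess)
    at which the held-out event appears.  Assumes each held-out event
    occurs in `vocabulary`."""
    ranks = []
    for held_out in doc_events:
        known = [e for e in doc_events if e != held_out]
        ranked = sorted(vocabulary, key=lambda c: score(c, known), reverse=True)
        ranks.append(ranked.index(held_out) + 1)
    return ranks

# usage (hypothetical): rank the full training vocabulary of event+arg types
# ranks = cloze_ranks(["X throw", "pull X", "tell X", "X start", "X complete"],
#                     all_event_types, score)
```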
Learning temporal ordering
Stage 1: add labels to corpus
Given: verb features (neighboring POS tags, neighboring
auxiliaries and modals, WordNet synsets, etc.)
Assign: tense, grammatical aspect, aspectual class
[Aside: couldn’t a parser assign this directly?]
Using: SVM, trained on labeled data (TimeBank corpus)
Stage 2: learn before() classifier
Given: 2 events in a document sharing an argument
Assign: before() relation
Using: SVM, trained on labeled data (TimeBank
expanded with transitivity rule
“X before Y and Y before Z → X before Z”)
A variety of features used, including whether e1 grammatically
occurs before e2 in the text
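Purely to illustrate the shape of such a pairwise before() classifier, a scikit-learn sketch with a toy feature set (the original trains an SVM on a much richer set of TimeBank-derived features; pair_features and its keys are invented for illustration):

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.svm import LinearSVC

def pair_features(e1, e2):
    # toy features only; the paper also uses POS context, auxiliaries/modals,
    # WordNet synsets, aspectual class, etc.
    return {
        "tense_e1": e1["tense"],
        "tense_e2": e2["tense"],
        "same_tense": e1["tense"] == e2["tense"],
        "e1_textually_first": e1["position"] < e2["position"],
    }

def train_before_classifier(labeled_pairs):
    """labeled_pairs: [((e1, e2), label), ...] with label = 1 if e1 is before
    e2, taken from TimeBank (plus its transitive closure)."""
    vec = DictVectorizer()
    X = vec.fit_transform([pair_features(e1, e2) for (e1, e2), _ in labeled_pairs])
    y = [label for _, label in labeled_pairs]
    return vec, LinearSVC().fit(X, y)
```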
Learning temporal ordering (cont)
Stage 3:
For all event pairs with shared arg in the main corpus
e.g., “push X”, “X fall”
count the number of before(e1,e2) vs. before(e2,e1)
classifications, to get an overall ordering confidence
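A sketch of this counting step, assuming a classify_before(e1, e2) wrapper around a trained classifier like the one sketched above (names and the "type" field are illustrative):

```python
from collections import Counter

def count_orderings(corpus_pairs, classify_before):
    """corpus_pairs: (e1, e2) event mentions that share an argument, where
    each mention carries a 'type' such as "push X" or "X fall".
    Returns before_counts[(t1, t2)] = #times the classifier said t1 before t2."""
    before_counts = Counter()
    for e1, e2 in corpus_pairs:
        if classify_before(e1, e2):
            before_counts[(e1["type"], e2["type"])] += 1
        else:
            before_counts[(e2["type"], e1["type"])] += 1
    return before_counts
```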
Evaluation
Test set: use same 69 documents
minus 6 which had no ordered events
Task: for each document
a. manually label the before() relations
b. generate a random ordering
Can system distinguish real from random order?
“Coherence” ≈ sum of confidences of before() labels on all
event pairs in document
Confidence(e1→e2) = log( #before(e1,e2) - #before(e2,e1) )
(Results table not reproduced: results broken down by # of event+shared arg pairs per document.)
Not that impressive (?)
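A sketch of the coherence comparison under one reading of the confidence formula above; since the log of a non-positive count difference is undefined, this sketch scores such pairs as zero (the paper’s exact handling may differ):

```python
import math

def confidence(t1, t2, before_counts):
    diff = before_counts.get((t1, t2), 0) - before_counts.get((t2, t1), 0)
    return math.log(diff) if diff > 0 else 0.0

def coherence(ordered_pairs, before_counts):
    # ordered_pairs: every (t1, t2) with a shared argument, in the order
    # asserted by the document (or by a random shuffle, for the baseline)
    return sum(confidence(t1, t2, before_counts) for t1, t2 in ordered_pairs)

# the test: is coherence(real_order_pairs, bc) > coherence(random_order_pairs, bc)?
```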
Agglomeration and scripts
How do we get scripts?
Could take a verb+arg, e.g., “arrest X”
Then look for the most likely 2nd verb+arg, e.g., “charge X”
Then the next most likely verb+arg, given these 2, e.g., “indict X”
etc.
{arrest X}
↓
{arrest X, charge X}
↓
{arrest X, charge X, indict X}
↓
…
Then: use ordering algorithm to produce ordering
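A sketch of this greedy agglomeration loop, reusing the illustrative score() function from the Stage 1 sketch; the max_events / cutoff stopping rule mirrors the description on the next slide:

```python
def grow_chain(seed, vocabulary, score, max_events=10, cutoff=0.0):
    """Greedily grow a narrative chain from a seed event+arg: at each step add
    the candidate whose summed PMI with the current chain is highest, stopping
    after max_events or when the best score falls below cutoff."""
    chain = [seed]
    while len(chain) < max_events:
        candidates = [c for c in vocabulary if c not in chain]
        if not candidates:
            break
        best = max(candidates, key=lambda c: score(c, chain))
        if score(best, chain) < cutoff:
            break
        chain.append(best)
    return chain

# e.g. grow_chain("arrest X", all_event_types, score)
```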
“Good” examples…
“Prosecution”
(This was the initial
seed.
Agglomeration was
stopped arbitrarily
after 10 events, or
when a cutoff for
node inclusion was
reached (whichever
was first)).
Good examples…
“Employment”
(dotted lines are
incorrect “before”
relations)
Nate Chambers’ suggested mode of use:
Given a set of events in a news article
Predict/fill in the missing events
→ Do we really need scripts?
Many ways of referring to the same entity…
Less common style:
John went to a restaurant. John sat down. John ate. He paid…
More common style:
Nagumo's fleet assembled in the remote anchorage
of Tankan Bay in the Kurile Islands and departed in
strictest secrecy for Hawaii on 26 November 1941.
The ships' route crossed the North Pacific and
avoided normal shipping lanes. At dawn 7
December 1941, the Japanese task force had
approached undetected to a point slightly more than
200 miles north of Oahu.
Generally, there
are a lot of
entities doing a lot
of things!
From [email protected] Tue Dec 16 12:48:58 2008
…Even with the protagonist idea, it is still difficult to name the protagonist
himself as many different terms are used. Naming the other non-protagonist
roles is even more sparse. I'm experiencing the same difficulties. My personal
thought is that we should not aim to fill the role with one term, but a set of
weighted terms. This may be a set of related nouns, or even a set of unrelated
nouns with their own preference weights.
Also: many ways of describing the same event!
Different levels of detail, different viewpoints:
The planes destroyed the ships
The planes dropped bombs, which destroyed the ships
The bombs exploded, destroying the ships
The Japanese destroyed the ships
Different granularities:
Planes attacked
Two waves of planes attacked
353 dive-bombers and torpedo planes attacked
Summary
Exciting work!
simple but brilliant insight of “protagonist”
But
is really only a first step towards scripts
mainly learns verb+arg co-associations in a text
temporal ordering and agglomeration is a post-processing step
quality of learned results still questionable
Cloze: needs >1000 guesses before hitting a mentioned, coassociated verb+arg
nice “Prosecution” script: a special case as most verbs in script are
necessarily specific to Prosecution?
fluidity of language use (multiple ways of viewing same scene,
multiple ways of referring to same entity) still a challenge
maybe don’t need to reify scripts (?)
fill in missing (implied) events on the fly in context-sensitive way