Discussion slides

Unsupervised Learning of
Narrative Event Chains
Original paper by: Nate Chambers and Dan Jurafsky
in ACL 2008
This presentation for discussion created by:
Peter Clark (Jan 2009)
Disclaimer: these slides are by Peter Clark, not the original
authors, and thus represent a (possibly flawed) interpretation
of the original work!
Why Scripts?
 Essential for making sense of text
 We typically match a narrative with expected “scripts”
 to make sense of what’s happening
 to fill in the gaps, fill in goals and purpose
On November 26, the Japanese attack fleet of 33
warships and auxiliary craft, including 6 aircraft
carriers, sailed from northern Japan for the Hawaiian
Islands. It followed a route that took it far to the north of
the normal shipping lanes. By early morning,
December 7, 1941, the ships had reached their launch
position, 230 miles north of Oahu.
depart → travel → arrive
 Scripts
 Important/essential for NLP
 But: expensive to build
 Can we learn them from text?
“John entered the restaurant. He sat down,
and ordered a meal. He ate…”
Our own (brief) attempt
 Look at next events in 1GB corpus:
"shoot" is followed by:
("say" 121)
("be" 110)
("shoot" 103)
("wound" 58)
("kill" 30)
("die" 27)
("have" 23)
("tell" 23)
("fire" 15)
("refuse" 15)
("go" 13)
("think" 13)
("carry" 12)
("take" 12)
("come" 11)
("help" 10)
("run" 10)
("be arrested" 9)
("find" 9)
"drive" is followed by:
("drive" 364)
("be" 354)
("say" 343)
("have" 71)
("continue" 47)
("see" 40)
("take" 32)
("make" 29)
("expect" 27)
("go" 24)
("show" 22)
("try" 19)
("tell" 18)
("think" 18)
("allow" 16)
("want" 15)
("come" 13)
("look" 13)
("close" 12)
Some glimmers of hope, but not great…
"fly" is followed by:
("fly" 362)
("say" 223)
("be" 179)
("have" 60)
("expect" 48)
("allow" 40)
("tell" 33)
("see" 30)
("go" 27)
("take" 27)
("make" 26)
("plan" 24)
("derive" 21)
("want" 19)
("schedule" 17)
("report" 16)
("declare" 15)
("give" 15)
("leave on" 15)
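The next-event counts above could be gathered with something like the following. This is only a minimal sketch: it assumes each document has already been reduced to its sequence of verb lemmas (the slides do not say how that was done), and `next_event_counts` and the toy `docs` are made up for illustration.

```python
from collections import Counter, defaultdict


def next_event_counts(documents):
    """For each verb, count which verb comes immediately after it in a document."""
    followers = defaultdict(Counter)
    for verbs in documents:
        # Pair every verb with the one that follows it in the same document.
        for current, following in zip(verbs, verbs[1:]):
            followers[current][following] += 1
    return followers


# Toy input (hypothetical): three tiny "documents" of verb lemmas.
docs = [
    ["shoot", "wound", "say"],
    ["shoot", "kill", "say"],
    ["shoot", "wound", "die"],
]
counts = next_event_counts(docs)
print(counts["shoot"].most_common())
```

Because no notion of a shared participant is used here, frequent verbs such as "say" and "be" dominate on real text, which is the weakness the lists above show.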
Andrew Gordon (2007)
From [email protected] Thu Sep 27 09:33:04 2007
…Recently I tried to apply language modeling
techniques over event sequences in a billion words of narrative
text extracted from Internet weblogs, and barely exceeded chance
performance on some event-ordering evaluations….
Chambers and Jurafsky
 Main insight:
 Don’t look at all verbs, just look at those mentioning
the “key player” – the protagonist – in the sequence
 Capture some role relationships also:
 Not just “push” → “fall”, but “push X” → “X fall”
“An automatically learned Prosecution Chain. Arrows indicate the before relation.”
 Stage 1:
 find likelihood that one event+protagonist goes with
another (or more) event+protagonist
 NOTE: no ordering info
 e.g., given:
 “X pleaded”, what other event+protagonist occur with
unusually high frequencies?
 → “sentenced X”, “fined X”, “fired X”
 Stage 2:
 order the set of event+protagonist
The Training Data
 Articles in the GigaWord corpus
 For each article:
 find all pairs of events (verbs) which have a shared argument
 shared argument found by OpenNLP coreference
 includes transitivity (X = Y, Y = Z, → X = Z)
 add each pair to the database
“John entered the restaurant. The waiter came over.
John sat down, and the waiter greeted him…”
 events about John: {X enter, X sat, greet X}
 events about the waiter: {X come, X greet}
database of pairs co-occurring in the article:
X enter, X sat
X enter, greet X
X sat, greet X
X come, X greet
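The pair extraction for the restaurant example can be sketched as follows. It assumes coreference has already grouped the events by entity (the slides use OpenNLP for that step, which is not reproduced here); `cooccurring_pairs` and `article_chains` are illustrative names, not from the paper.

```python
from collections import Counter
from itertools import combinations


def cooccurring_pairs(chains):
    """Return every unordered pair of events that share a protagonist."""
    pairs = []
    for events in chains.values():
        for pair in combinations(events, 2):
            # Sort each pair so (a, b) and (b, a) are counted as the same entry.
            pairs.append(tuple(sorted(pair)))
    return pairs


# Events grouped by entity, as in the restaurant example (verb forms as on the slide).
article_chains = {
    "John": ["X enter", "X sat", "greet X"],
    "waiter": ["X come", "X greet"],
}

database = Counter()
database.update(cooccurring_pairs(article_chains))
for pair, n in database.items():
    print(pair, n)
```

Running this over every article and accumulating into the same `Counter` gives the pair counts that Stage 1 works from.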
Stage 1
 Given two events with a shared protagonist, do
they occur “unusually often” in a corpus?
“push X” & “X fall”
$$\mathrm{Prob}(\text{“push X” AND “X fall”}) = \frac{\#\,\text{times “push” and “fall” have been seen with these coreferring arguments}}{\#\,\text{times any pair of verbs has been seen with any coreferring arguments}}$$
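more generally:…
$$\mathrm{Prob}(\text{“X event}_1\text{” AND “X event}_2\text{”}) = \frac{\mathrm{Number}(\text{“X event}_1\text{” AND “X event}_2\text{”})}{\sum_{i,j} \mathrm{Number}(\text{“X event}_i\text{” AND “X event}_j\text{”})}$$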
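PMI (“surprisingness”):…
$$\mathrm{PMI}(\text{“X event}_1\text{”}, \text{“X event}_2\text{”}) = \log \frac{\mathrm{Prob}(\text{“X event}_1\text{” AND “X event}_2\text{”})}{\mathrm{Prob}(\text{“X event}_1\text{”})\,\mathrm{Prob}(\text{“X event}_2\text{”})}$$
= the “surprisingness” that the arg of event1 and event2 are coreferential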
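A small worked version of the Prob and PMI definitions above, on made-up counts. The slides do not spell out how the single-event probabilities are estimated; this sketch assumes they are marginals of the pair counts (each pair contributes two event "slots"), and that choice is an assumption, as is the name `pmi_table`.

```python
import math
from collections import Counter


def pmi_table(pair_counts):
    """Compute PMI for each counted pair of events sharing a protagonist."""
    total_pairs = sum(pair_counts.values())
    # Assumed marginal: how often an event appears in any counted pair.
    event_counts = Counter()
    for (a, b), n in pair_counts.items():
        event_counts[a] += n
        event_counts[b] += n
    total_slots = 2 * total_pairs  # every pair fills two event slots
    table = {}
    for (a, b), n in pair_counts.items():
        p_joint = n / total_pairs
        p_a = event_counts[a] / total_slots
        p_b = event_counts[b] / total_slots
        table[(a, b)] = math.log(p_joint / (p_a * p_b))
    return table


# Toy counts (hypothetical, not from the paper).
pair_counts = {
    ("push X", "X fall"): 8,
    ("push X", "X say"): 2,
    ("X fall", "X say"): 2,
    ("X go", "X say"): 8,
}
pmi = pmi_table(pair_counts)
for pair, value in sorted(pmi.items(), key=lambda kv: -kv[1]):
    print(pair, round(value, 3))
```

With these numbers “push X” & “X fall” scores well above “push X” & “X say”, even though "say" is the most frequent event overall.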
 Can generalize:
 PMI: given an event (+ arg), how “unusual” is it to see
another event (+ same arg)?
 Generalization: given N events (+ arg), how “unusual” to
see another event (+ same arg)?
 Thus:
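One plausible reading of this generalization is to score a candidate event by summing its PMI with every event already in the set. The sketch below takes that reading as an assumption; the PMI values are invented, unseen pairs are treated as 0.0, and `chain_score` and `rank_candidates` are hypothetical names.

```python
def chain_score(candidate, chain, pmi):
    """Sum the candidate's PMI with each event already in the chain."""
    # Pairs are stored as frozensets so lookup does not depend on order.
    return sum(pmi.get(frozenset((event, candidate)), 0.0) for event in chain)


def rank_candidates(chain, candidates, pmi):
    """Order candidates from most to least associated with the chain."""
    return sorted(candidates, key=lambda c: chain_score(c, chain, pmi), reverse=True)


# Toy PMI values (hypothetical).
pmi = {
    frozenset(("X plead", "sentence X")): 2.0,
    frozenset(("X plead", "fine X")): 1.5,
    frozenset(("convict X", "sentence X")): 2.5,
    frozenset(("convict X", "fine X")): 0.5,
    frozenset(("X plead", "X fly")): -1.0,
}
chain = ["X plead", "convict X"]
ranking = rank_candidates(chain, ["X fly", "fine X", "sentence X"], pmi)
print(ranking)
```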
Evaluation: Cloze test
 Fill in the blank…
McCann threw two interceptions early. Toledo pulled McCann
aside and told him he’d start. McCann quickly completed his
first two passes.
X throw
pull X
tell X
X start
X complete
(note: a set, not list)
?
pull X
tell X
X start
X complete
Cloze task: predict “?”
 69 articles, with >=5 protagonist+event in them
 System produces ~9000 guesses at each “?”
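The cloze evaluation can be sketched as: remove one event from the set, ask the system for a ranked list of guesses, and record the position of the removed event. The guesser below is a stand-in with a fixed ranking, not the learned model, and scoring a miss as one past the end of the list is an assumption.

```python
def cloze_rank(chain, held_out, ranked_guesses_fn):
    """Return the 1-based rank at which the held-out event is guessed."""
    remaining = [event for event in chain if event != held_out]
    guesses = ranked_guesses_fn(remaining)
    if held_out in guesses:
        return guesses.index(held_out) + 1
    # Assumption: a miss is scored as one past the end of the guess list.
    return len(guesses) + 1


def toy_guesser(remaining):
    """Hypothetical stand-in for the learned model: a fixed ranking."""
    vocabulary_by_score = ["X say", "X throw", "X win", "pull X", "X start"]
    return [g for g in vocabulary_by_score if g not in remaining]


chain = ["X throw", "pull X", "tell X", "X start", "X complete"]
ranks = [cloze_rank(chain, event, toy_guesser) for event in chain]
average_rank = sum(ranks) / len(ranks)
print(ranks, average_rank)
```

A lower average rank is better; with roughly 9000 guesses per blank, the interesting question is how far down the list the true event sits.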
Learning temporal ordering
 Stage 1: add labels to corpus
 Given: verb features (neighboring POS tags, neighboring
auxiliaries and modals, WordNet synsets, etc.)
 Assign: tense, grammatical aspect, aspectual class
 [Aside: couldn’t a parser assign this directly?]
 Using: SVM, trained on labeled data (TimeBank corpus)
 Stage 2: learn before() classifier
 Given: 2 events in a document sharing an argument
 Assign: before() relation
 Using: SVM, trained on labeled data (TimeBank
expanded with transitivity rule
 “X before Y and Y before Z → X before Z”)
 A variety of features used, including whether e1 grammatically
occurs before e2 in the text
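The transitivity rule used to expand the labeled before() relations can be sketched as a simple closure computation. This is an illustration on made-up event names, not the TimeBank data, and `transitive_closure` is a hypothetical helper.

```python
def transitive_closure(before_pairs):
    """Add (x, z) whenever (x, y) and (y, z) are present, until nothing changes."""
    closure = set(before_pairs)
    changed = True
    while changed:
        changed = False
        for x, y in list(closure):
            for y2, z in list(closure):
                if y == y2 and (x, z) not in closure:
                    closure.add((x, z))
                    changed = True
    return closure


# Toy labeled relations (hypothetical): each tuple means "first before second".
labeled = {("arrest", "charge"), ("charge", "convict"), ("convict", "sentence")}
expanded = transitive_closure(labeled)
print(sorted(expanded))
```

Three labeled relations become six training examples here, which is the point of the expansion: more before() pairs for the SVM without extra annotation.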
Learning temporal ordering (cont)
 Stage 3:
 For all event pairs with shared arg in the main corpus
 e.g., “push X”, “X fall”
 count the number of before(e1,e2) vs. before(e2,e1)
classifications, to get an overall ordering confidence
 Test set: use same 69 documents
 minus 6 which had no ordered events
 Task: for each document
a. manually label the before() relations
b. generate a random ordering
 Can system distinguish real from random order?
 “Coherence” ≈ sum of confidences of before() labels on all
event pairs in document
 Confidence(e1→e2) = log(#before(e1,e2) - #before(e2,e1))
[Table: results by # event+shared arg in doc]
Not that impressive (?)
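The confidence and coherence scores described above can be sketched as follows. The before() counts are invented, and giving a pair zero credit when the count difference is not positive is an assumption made so that the log is always defined.

```python
import math
from itertools import combinations


def confidence(e1, e2, before_counts):
    """log(#before(e1,e2) - #before(e2,e1)), or 0.0 if the difference is not positive."""
    diff = before_counts.get((e1, e2), 0) - before_counts.get((e2, e1), 0)
    return math.log(diff) if diff > 0 else 0.0


def coherence(ordered_events, before_counts):
    """Sum confidences over all pairs, taken in the order given."""
    return sum(
        confidence(a, b, before_counts) for a, b in combinations(ordered_events, 2)
    )


# Toy classifier counts (hypothetical): how often each ordering was predicted.
before_counts = {
    ("arrest X", "charge X"): 12,
    ("charge X", "arrest X"): 2,
    ("charge X", "convict X"): 9,
    ("convict X", "charge X"): 1,
    ("arrest X", "convict X"): 6,
}
real_order = ["arrest X", "charge X", "convict X"]
random_order = ["convict X", "arrest X", "charge X"]
print(coherence(real_order, before_counts), coherence(random_order, before_counts))
```

The test in the slides then asks whether the real ordering gets the higher coherence score of the two.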
Agglomeration and scripts
 How do we get scripts?
 Could take a verb+arg, e.g., “arrest X”
 Then look for the most likely 2nd verb+arg, e.g., “charge X”
 Then the next most likely verb+arg, given these 2, e.g.,
“indict X”
 etc.
{arrest X}
{arrest X, charge X}
{arrest X, charge X, indict X}
 Then: use ordering algorithm to produce ordering
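The agglomeration loop above can be sketched as a greedy procedure: start from a seed, repeatedly add the candidate with the best total association to the current set, and stop at a size limit or score cutoff. The PMI values, the `cutoff` value, and the function names are assumptions for illustration only.

```python
def association(candidate, chain, pmi):
    """Total PMI between a candidate and all events already in the chain."""
    return sum(pmi.get(frozenset((event, candidate)), 0.0) for event in chain)


def grow_chain(seed, candidates, pmi, max_events=10, cutoff=0.0):
    """Greedily add the best-scoring candidate until a stopping condition is met."""
    chain = [seed]
    remaining = set(candidates) - {seed}
    while remaining and len(chain) < max_events:
        # sorted() makes tie-breaking deterministic.
        best = max(sorted(remaining), key=lambda c: association(c, chain, pmi))
        if association(best, chain, pmi) <= cutoff:
            break  # nothing left is associated strongly enough
        chain.append(best)
        remaining.remove(best)
    return chain


# Toy PMI values (hypothetical).
pmi = {
    frozenset(("arrest X", "charge X")): 3.0,
    frozenset(("arrest X", "indict X")): 2.0,
    frozenset(("charge X", "indict X")): 2.5,
    frozenset(("arrest X", "X fly")): -1.0,
    frozenset(("charge X", "X fly")): -0.5,
}
script = grow_chain("arrest X", ["charge X", "indict X", "X fly"], pmi)
print(script)
```

The result is still an unordered set in spirit; the before() counts are what turn it into an ordered chain.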
“Good” examples…
 “Prosecution”
(This was the initial
Agglomeration was stopped arbitrarily after 10 events, or when a cutoff for node inclusion was reached (whichever was first)).
Good examples…
 “Employment”
(dotted lines are incorrect “before” relations)
 Nate Chambers’ suggested mode of use:
 Given a set of events in a news article
 Predict/fill in the missing events
 → Do we really need scripts?
Many ways of referring to the same entity…
 Less common style:
John went to a restaurant. John sat down. John ate. He paid…
 More common style:
Nagumo's fleet assembled in the remote anchorage
of Tankan Bay in the Kurile Islands and departed in
strictest secrecy for Hawaii on 26 November 1941.
The ships' route crossed the North Pacific and
avoided normal shipping lanes. At dawn 7
December 1941, the Japanese task force had
approached undetected to a point slightly more than
200 miles north of Oahu.
Generally, there are a lot of entities doing a lot of things!
From [email protected] Tue Dec 16 12:48:58 2008
…Even with the protagonist idea, it is still difficult to name the protagonist
himself as many different terms are used. Naming the other non-protagonist
roles is even more sparse. I'm experiencing the same difficulties. My personal
thought is that we should not aim to fill the role with one term, but a set of
weighted terms. This may be a set of related nouns, or even a set of unrelated
nouns with their own preference weights.
Also: many ways of describing the same event!
 Different levels of detail, different viewpoints:
The planes destroyed the ships
The planes dropped bombs, which destroyed the ships
The bombs exploded, destroying the ships
The Japanese destroyed the ships
 Different granularities:
 Planes attacked
 Two waves of planes attacked
 353 dive-bombers and torpedo planes attacked
 Exciting work!
 simple but brilliant insight of “protagonist”
 But
 is really only a first step towards scripts
 mainly learns verb+arg co-associations in a text
 temporal ordering and agglomeration are post-processing steps
 quality of learned results still questionable
 Cloze: needs >1000 guesses before hitting a mentioned, coassociated verb+arg
 nice “Prosecution” script: a special case as most verbs in script are
necessarily specific to Prosecution?
 fluidity of language use (multiple ways of viewing same scene,
multiple ways of referring to same entity) still a challenge
 maybe don’t need to reify scripts (?)
 fill in missing (implied) events on the fly in context-sensitive way