Transcript document

Proposition Bank:
a resource of
predicate-argument relations
Martha Palmer, Dan Gildea, Paul Kingsbury
University of Pennsylvania
February 26, 2002
ACE PI Meeting, Fairfield Inn, MD
Outline
 Overview
 Status Report
 Outstanding Issues
 Automatic Tagging – Dan Gildea
 Details – Paul Kingsbury
• Frames files
• Annotator issues
• Demo
Proposition Bank:
Generalizing from Sentences to Propositions

Powell met Zhu Rongji
Powell and Zhu Rongji met
Powell met with Zhu Rongji
Powell and Zhu Rongji had a meeting
(cf. battle, wrestle, join, debate, consult, ...)

Proposition: meet(Powell, Zhu Rongji)
meet(Somebody1, Somebody2)
...

When Powell met Zhu Rongji on Thursday they discussed the return of the spy plane.
meet(Powell, Zhu)
discuss([Powell, Zhu], return(X, plane))
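The generalization above can be read as mapping many surface forms to one predicate-argument record. A minimal sketch of such a record in Python (the class and field names are illustrative, not PropBank's data format):

from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Proposition:
    """A predicate plus its ordered arguments, e.g. meet(Powell, Zhu Rongji)."""
    predicate: str
    args: Tuple[str, ...]

    def __str__(self) -> str:
        return f"{self.predicate}({', '.join(self.args)})"

# Every surface variant on this slide maps to the same proposition.
print(Proposition("meet", ("Powell", "Zhu Rongji")))   # meet(Powell, Zhu Rongji)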
Penn English Treebank
 1.3 million words
 Wall Street Journal and other sources
 Tagged with Part-of-Speech
 Syntactically Parsed
 Widely used in NLP community
 Available from Linguistic Data Consortium
A TreeBanked Sentence

Analysts have been expecting a GM-Jaguar pact that would give the U.S. car maker an eventual 30% stake in the British company.

(S (NP-SBJ Analysts)
   (VP have
       (VP been
           (VP expecting
               (NP (NP a GM-Jaguar pact)
                   (SBAR (WHNP-1 that)
                         (S (NP-SBJ *T*-1)
                            (VP would
                                (VP give
                                    (NP the U.S. car maker)
                                    (NP (NP an eventual (ADJP 30 %) stake)
                                        (PP-LOC in (NP the British company))))))))))))
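To work with bracketings like this programmatically, one option is NLTK's Tree class; a minimal sketch (NLTK is an assumption here, not part of the Treebank release itself):

from nltk import Tree

bracketing = """
(S (NP-SBJ Analysts)
   (VP have
       (VP been
           (VP expecting
               (NP (NP a GM-Jaguar pact)
                   (SBAR (WHNP-1 that)
                         (S (NP-SBJ *T*-1)
                            (VP would
                                (VP give
                                    (NP the U.S. car maker)
                                    (NP (NP an eventual (ADJP 30 %) stake)
                                        (PP-LOC in (NP the British company))))))))))))
"""

tree = Tree.fromstring(bracketing)
print(tree.label())               # S
print(" ".join(tree.leaves()))    # the sentence, with the trace *T*-1 included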
The same sentence, PropBanked

(S Arg0 (NP-SBJ Analysts)
   (VP have
       (VP been
           (VP expecting
               Arg1 (NP (NP a GM-Jaguar pact)
                        (SBAR (WHNP-1 that)
                              (S Arg0 (NP-SBJ *T*-1)
                                 (VP would
                                     (VP give
                                         Arg2 (NP the U.S. car maker)
                                         Arg1 (NP (NP an eventual (ADJP 30 %) stake)
                                                  (PP-LOC in (NP the British company))))))))))))

expect(Analysts, GM-J pact)
give(GM-J pact, US car maker, 30% stake)
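A minimal sketch of how the labeled constituents above yield the two propositions (simplified: real PropBank annotation points at Treebank nodes, and the trace *T*-1 is shown here already resolved to its antecedent):

# Simplified records for the two predicates annotated on this slide.
annotations = [
    {"rel": "expect", "Arg0": "Analysts", "Arg1": "a GM-Jaguar pact"},
    {"rel": "give", "Arg0": "a GM-Jaguar pact",   # the trace *T*-1 resolved to its antecedent
     "Arg2": "the U.S. car maker",
     "Arg1": "an eventual 30% stake in the British company"},
]

def proposition(record):
    """Render a record as predicate(arg, arg, ...), in annotation order."""
    args = [text for label, text in record.items() if label != "rel"]
    return f"{record['rel']}({', '.join(args)})"

for r in annotations:
    print(proposition(r))
# expect(Analysts, a GM-Jaguar pact)
# give(a GM-Jaguar pact, the U.S. car maker, an eventual 30% stake in the British company)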
English PropBank
 1M words of Treebank over 2 years, May’01-03
 New semantic augmentations
• Predicate-argument relations for verbs
• label arguments: Arg0, Arg1, Arg2, …
• First subtask, 300K word financial subcorpus
(12K sentences, 29K+ predicates)
 Spin-off: Guidelines (necessary for annotators)
• English lexical resource – FRAMES FILES
• 3500+ verbs with labeled examples, rich semantics
 http://www.cis.upenn.edu/~ace/
English PropBank – Current Status
 Frames files
• 742 verb lemmas (includes phrasal variants – 932)
• 363/899 VerbNet semi-automatic expansions (subtask/PB)
 First subtask: 300K financial subcorpus
— 22,595 unique predicates annotated out of 29K (80%)
– 6K+ remaining (7 weeks, 1000/week, first pass)
— 1005 verb lemmas out of 1700+ (59%)
– 700 remaining (3.5 months, 200/month)
 PropBank (including some of Brown?)
• 34,437 predicates annotated out of 118K (29%)
• 1904 (1005 + 899) verb lemmas out of 3500 (54%)
Projected delivery dates
 Financial subcorpus
— alpha release – December 2001
— beta release – June 2002
— adjudicated release – December 2002
 PropBank
— alpha release – December 2002
— beta release – Spring 2003
English PropBank - Status
 Sense tagging
• 200+ verbs with multiple rolesets
—sense tag this summer with undergrads using NSF funds
 Still need to address
• 3 usages of "have": imperative, possessive, auxiliary
• be, become: predicate adjectives, predicate nominals
Automatic Labeling of Semantic Relations
Features:
 Predicate
 Phrase Type
 Parse Tree Path
 Position (Before/after predicate)
 Voice (active/passive)
 Head Word
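A minimal sketch of how a couple of these features, in particular the parse tree path, can be computed from a Treebank-style parse (NLTK and the toy tree are illustrative assumptions, not the actual system):

from nltk import Tree

def path_feature(tree, arg_pos, pred_pos):
    """Parse tree path from an argument constituent up to the lowest common
    ancestor and down to the predicate, e.g. NP-SBJ^S!VP!VP!VP!VBG
    (here ^ marks an upward step and ! a downward step)."""
    i = 0
    while i < min(len(arg_pos), len(pred_pos)) and arg_pos[i] == pred_pos[i]:
        i += 1
    up = [tree[arg_pos[:j]].label() for j in range(len(arg_pos), i - 1, -1)]
    down = [tree[pred_pos[:j]].label() for j in range(i + 1, len(pred_pos) + 1)]
    return "^".join(up) + "".join("!" + label for label in down)

t = Tree.fromstring(
    "(S (NP-SBJ Analysts) (VP (VBP have) (VP (VBN been)"
    " (VP (VBG expecting) (NP (DT a) (JJ GM-Jaguar) (NN pact))))))")

arg_pos = (0,)           # the NP-SBJ constituent
pred_pos = (1, 1, 1, 0)  # the VBG node dominating the predicate "expecting"

print(path_feature(t, arg_pos, pred_pos))   # NP-SBJ^S!VP!VP!VP!VBG
print(t[arg_pos].label())                   # Phrase Type feature: NP-SBJ
print(arg_pos < pred_pos)                   # Position feature (argument precedes predicate)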
Example with Features
[Figure not reproduced in this transcript]
Labelling Accuracy – Known Boundaries

Parses          Framenet    PropBank    PropBank, >10 instances
Gold Standard      –           79.6            83.1
Automatic         82.0         73.6            77.0

Accuracy of semantic role prediction for known boundaries – the system is given the constituents to classify.
Framenet examples (training/test) are handpicked to be unambiguous.
Labelling Accuracy – Unknown Boundaries

                   Framenet              PropBank
Parses          Precision  Recall    Precision  Recall
Gold Standard       –         –         71.1     64.4
Automatic         64.6      61.2        57.7     50.0

Accuracy of semantic role prediction for unknown boundaries – the system must identify the constituents as arguments and give them the correct roles.
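For reference, the precision and recall above are computed over labeled argument constituents; a minimal sketch with made-up predictions (not data from these experiments):

def precision_recall(predicted, gold):
    """predicted, gold: sets of (predicate, constituent span, role) triples."""
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall

gold = {("give", (0, 1), "Arg0"), ("give", (5, 9), "Arg2"), ("give", (9, 16), "Arg1")}
pred = {("give", (0, 1), "Arg0"), ("give", (5, 8), "Arg2")}
print(precision_recall(pred, gold))   # (0.5, 0.333...)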
Complete Sentence

Analysts have been expecting a GM-Jaguar pact that *T*-1 would give the U.S. car maker an eventual 30% stake in the British company and create joint ventures that *T*-2 would produce an executive-model range of cars.

expect(analysts, pact)
give(pact, car_maker, stake)
create(pact, joint_ventures)
produce(joint_ventures, range_of_cars)
Guidelines: Frames Files
 Created manually - Paul Kingsbury
— new framer: Olga Babko-Malaya (Ph.D., Rutgers, Linguistics)
 Refer to VerbNet, WordNet and Framenet
 Currently in place for 787/986 verbs
 Use "semantic role glosses" unique to each verb
(map to Arg0, Arg1 labels appropriate to class)
Frames Example: expect
Roles:
Arg0: expecter
Arg1: thing expected
Example: Transitive, active:
Portfolio managers expect further declines in interest rates.
Arg0: Portfolio managers
REL: expect
Arg1: further declines in interest rates
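A frames-file entry like this can be viewed as a small structured record; a sketch as a Python dictionary built from the slide above (the layout is illustrative, not the actual frames-file syntax):

expect_frame = {
    "lemma": "expect",
    "roles": {"Arg0": "expecter", "Arg1": "thing expected"},
    "examples": [{
        "name": "Transitive, active",
        "text": "Portfolio managers expect further declines in interest rates.",
        "args": {"Arg0": "Portfolio managers",
                 "REL": "expect",
                 "Arg1": "further declines in interest rates"},
    }],
}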
Frames example: give
Roles:
Arg0: giver
Arg1: thing given
Arg2: entity given to
Example: double object
The executives gave the chefs a standing ovation.
Arg0: The executives
REL: gave
Arg2: the chefs
Arg1: a standing ovation
How are arguments numbered?
 Examination of example sentences
 Determination of required / highly preferred elements
 Sequential numbering, Arg0 is typical first argument, except:
O ergative/unaccusative verbs (shake example)
O Arguments mapped for "synonymous" verbs
Additional tags (arguments or adjuncts?)
 Variety of ArgM's (Arg# > 4):
• TMP – when?
• LOC – where at?
• DIR – where to?
• MNR – how?
• PRP – why?
• REC – himself, themselves, each other
• PRD – this argument refers to or modifies another
• ADV – others
Ergative/Unaccusative Verbs: rise
Roles
Arg1 = Logical subject, patient, thing rising
Arg2 = EXT, amount risen
Arg3* = start point
Arg4 = end point
Sales rose 4% to $3.28 billion from $3.16 billion.
*Note: Have to mention prep explicitly, Arg3-from, Arg4-to, or could have
used ArgM-Source, ArgM-Goal. Arbitrary distinction.
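Following the example format used for expect and give above, the sample sentence would be labeled roughly as:
Arg1: Sales
REL: rose
Arg2-EXT: 4%
Arg4-to: $3.28 billion
Arg3-from: $3.16 billion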
Synonymous Verbs: add in the sense of rise
Roles:
Arg1 = Logical subject, patient, thing
rising/gaining/being added to
Arg2 = EXT, amount risen
Arg4 = end point
The Nasdaq composite index added 1.01 to 456.6 on
paltry volume.
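In the same format, a rough labeling of this example:
Arg1: The Nasdaq composite index
REL: added
Arg2-EXT: 1.01
Arg4-to: 456.6
(with "on paltry volume" as an ArgM adjunct)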
Phrasal Verbs
 Put together
 Put in
 Put off
 Put on
 Put out
 Put up
 ...
Accounts for an additional 200 "verbs"
Frames: Multiple Rolesets
 Rolesets are not necessarily consistent between different senses of the same verb
O A verb with multiple senses can have multiple frames, but not necessarily
 Roles and mappings onto argument labels are consistent between different verbs that share similar argument structures (similar to Framenet)
O Levin / VerbNet classes
O http://www.cis.upenn.edu/~dgildea/Verbs/
 Out of the 787 most frequent verbs:
O 1 roleset – 521
O 2 rolesets – 169
O 3+ rolesets – 97 (includes light verbs)
Semi-automatic expansion of Frames
 Experimenting with semi-automatic expansion
 Find unframed members of a Levin class in VerbNet; "inherit" frames from other members
 787 verbs manually framed
• Can expand to 1200+ using VerbNet
• Will need hand correction
 First experiment: automatic expansion provided 90% coverage of data
More on Automatic Expansion
Destroy:
Arg0: destroyer
Arg1: thing destroyed
Arg2: instrument of destruction
VerbNet class Destroy-44:
annihilate, blitz, decimate, demolish, destroy,
devastate, exterminate, extirpate, obliterate,
ravage, raze, ruin, waste, wreck
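A minimal sketch of the inheritance step using this example (plain dicts stand in for the real frames files and VerbNet records):

# Manually written frame, from the slide above.
frames = {
    "destroy": {"Arg0": "destroyer",
                "Arg1": "thing destroyed",
                "Arg2": "instrument of destruction"},
}

# VerbNet class membership, from the slide above.
verbnet_classes = {
    "Destroy-44": ["annihilate", "blitz", "decimate", "demolish", "destroy",
                   "devastate", "exterminate", "extirpate", "obliterate",
                   "ravage", "raze", "ruin", "waste", "wreck"],
}

# Unframed members inherit the roleset of an already-framed class sibling.
for members in verbnet_classes.values():
    framed = [v for v in members if v in frames]
    if framed:
        template = frames[framed[0]]
        for verb in members:
            frames.setdefault(verb, dict(template))

print(frames["raze"])   # inherits destroyer / thing destroyed / instrument

As the waste example on the next slide shows, frames inherited this way still need hand correction.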
What a Waste
Waste:
Arg0: destroyer
Arg1: thing destroyed
Arg2: instrument of destruction
He didn’t waste any time distancing himself from
his former boss
Arg0: He
Arg1: any time
Arg2 =? distancing himself...
Trends in Argument Numbering
 Arg0 = agent
 Arg1 = direct object / theme / patient
 Arg2 = indirect object / benefactive / instrument /
attribute / end state
 Arg3 = start point / benefactive / instrument /
attribute
 Arg4 = end point
Morphology
 Verbs also marked for tense/aspect/voice:
O Passive/Active
O Perfect/Progressive
O Third singular (is, has, does, was)
O Present/Past/Future
O Infinitives/Participles/Gerunds/Finites
 Modals and negation marked as ArgMs
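A sketch of the per-instance morphological record implied by this list (field names and values are illustrative, not the annotation file format):

from dataclasses import dataclass
from typing import Optional

@dataclass
class VerbMorphology:
    form: str                  # infinitive / participle / gerund / finite
    tense: Optional[str]       # present / past / future
    aspect: Optional[str]      # perfect / progressive / both
    person: Optional[str]      # e.g. third singular
    voice: str                 # active / passive

# "have been expecting" from the earlier example, roughly:
expecting = VerbMorphology(form="participle", tense="present",
                           aspect="perfect-progressive", person=None,
                           voice="active")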
Annotation procedure
 Extraction of all sentences with given verb
 First pass: Automatic tagging (Joseph Rosenzweig)
• http://www.cis.upenn.edu/~josephr/TIDES/index.html#lexicon
 Second pass: Double blind hand correction
• Variety of backgrounds
• Less syntactic training than for treebanking
 Tagging tool highlights discrepancies
 Third pass: Solomonization (adjudication)
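A minimal sketch of the discrepancy check between the two hand-correction passes (simplified: arguments are shown as label-to-string maps rather than pointers into the Treebank):

def discrepancies(ann_a, ann_b):
    """Return the argument labels on which two annotations disagree."""
    labels = set(ann_a) | set(ann_b)
    return {lab: (ann_a.get(lab), ann_b.get(lab))
            for lab in labels if ann_a.get(lab) != ann_b.get(lab)}

# The 'told' example from the Solomonization slides below:
kate = {"arg0": "Intel", "arg2": "analysts",
        "arg1": "the company will resume shipments of the chips "
                "within two to three weeks"}
erwin = {"arg0": "Intel", "arg2": "analysts",
         "arg1": "that the company will resume shipments of the chips "
                 "within two to three weeks"}
print(discrepancies(kate, erwin))   # only arg1 differs (inclusion of "that")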
Inter-Annotator Agreement
[Chart: percentage agreement on the y-axis (0–100); the underlying data points are not recoverable from this transcript.]
Annotator vs. Gold Standard
[Scatter plot: percentage agreement with the gold standard (y-axis, 60–100) against number of tokens per verb (x-axis), with one series per annotator (Kate, Darren, Brian). Labeled verbs include quote, bid, cost, return, hit, result, boost, keep, know, appeal, announce, decline, accept, affect, earn, resign, and advertise, with advertise toward the low end of the agreement range.]
Financial Subcorpus Status
 1005 verbs framed (700+ to go)
O (742 + 363 VerbNet siblings)
 535 verbs first-passed
O 22,595 unique tokens
O Does not include ~3000 tokens tagged for Senseval
 89 verbs second-passed
O 7600+ tokens
 42 verbs solomonized
O 2890 tokens
Throughput
 Framing: approximately 25 verbs/week
• Olga will also start framing; joint up to 50 verbs/wk
 Annotation: approximately 50 predicates/hour
• 20 hours of annotation a week, 1000 predicates/wk
 Solomonization: approximately 1 hour per verb,
but will speed up with lower frequency verbs.
Summary
 Predicate-argument structure labels are arbitrary to a certain degree, but still consistent, and generic enough to be mappable to particular theoretical frameworks
 Automatic tagging as a first pass makes the task feasible
 Agreement and accuracy figures are reassuring
 Financial subcorpus is 80% complete; beta release June 2002
Solomonization
Source tree: Intel told analysts that the company will resume
shipments of the chips within two to three weeks .
*** Kate said:
arg0 : Intel
arg1 : the company will resume shipments of the chips within
two to three weeks
arg2 : analysts
*** Erwin said:
arg0 : Intel
arg1 : that the company will resume shipments of the chips
within two to three weeks
arg2 : analysts
Solomonization
Such loans to Argentina also remain classified as non-accruing,
*TRACE*-1 costing the bank $ 10 million *TRACE*-*U* of
interest income in the third period.
*** Kate said:
arg1 : *TRACE*-1
arg2 : $ 10 million *TRACE*-*U* of interest income
arg3 : the bank
argM-TMP : in the third period
*** Erwin said:
arg1 : *TRACE*-1 -> Such loans to Argentina
arg2 : $ 10 million *TRACE*-*U* of interest income
arg3 : the bank
argM-TMP : in the third period
Solomonization
Also , substantially lower Dutch corporate tax rates helped the
company keep its tax outlay flat relative to earnings growth.
*** Kate said:
arg0 : the company
arg1 : its tax outlay
arg3-PRD : flat
argM-MNR : relative to earnings growth
*** Katherine said:
arg0 : the company
arg1 : its tax outlay
arg3-PRD : flat
argM-ADV : relative to earnings growth