OntoNotes - Verbs Index


NSF-ULA: Sense Tagging and Eventive Nouns
Martha Palmer, Miriam Eckert, Jena D. Hwang, Susan Windisch Brown, Dmitriy Dligach, Jinho Choi, Nianwen Xue
University of Colorado
Departments of Linguistics and Computer Science
Institute for Cognitive Science
OntoNotes
http://www.bbn.com/NLP/OntoNotes
Participating Sites:
BBN
University of Colorado
University of Pennsylvania
USC/ISI
OntoNotes goals
 Develop a skeletal representation of the literal meaning of sentences
– Add to a frame-based (PropBank) representation of predicates and their arguments:
• Referring expressions and the textual phrases they refer to
• Terms disambiguated by coarse-grained word sense in an ontology
– Encodes the core, skeletal meaning
– Moves away from strings to terms that a reasoning system can use
 Find a “sweet spot” in the space of
– Inter-tagger agreement
– Productivity
– Depth of representation
[Diagram: source Text is annotated with four layers (Treebank, Word Sense wrt Ontology, PropBank, Co-reference), which together form the OntoNotes Annotated Text.]
Creating a Sense Inventory that Supports High-Quality Annotation
 A large-scale annotation effort as part of the OntoNotes project
 Two steps:
– Grouping subtle, fine-grained WordNet senses into coherent semantic sense groups based on syntactic and semantic criteria. For example, WordNet sense 1 of “call” (“I called my son David”) and WordNet sense 12 (“You can call me Sir”) are grouped together.
– Annotation
Example Grouping: Order

Group 1: “give a command”
  Syntax: NP1[+human] ORDER NP2[+animate] to V, where NP1 has some authority over NP2
  Example: “The victim says that the owner ordered the dogs to attack.”

Group 2: “request something to be made, supplied, or delivered”
  Syntax: NP1[+human] ORDER NP2
  Example: “I just ordered pizza from panhandle pizza.”

Group 3: “organize”
  Syntax: NP1[+human] ORDER NP2
  Example: “I ordered the papers before the meeting.”
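To make the grouping concrete, here is a minimal sketch of how such a coarse sense inventory could be represented, using the “call” and “order” groups from the slides above. The dict layout, the lookup helper, and the gloss given for the “call” group are illustrative assumptions; the real OntoNotes inventories are distributed as sense files, not Python code.

# Illustrative only: frames and WordNet sense numbers come from the
# slides; the structure and the "call" gloss are assumptions.
SENSE_INVENTORY = {
    "call": {
        "group_1": {"label": "attach a name or title to",
                    # WN 1: "I called my son David"; WN 12: "You can call me Sir"
                    "wn_senses": {1, 12}},
    },
    "order": {
        "group_1": {"label": "give a command",
                    "frame": "NP1[+human] ORDER NP2[+animate] to V"},
        "group_2": {"label": "request something to be made, supplied, or delivered",
                    "frame": "NP1[+human] ORDER NP2"},
        "group_3": {"label": "organize",
                    "frame": "NP1[+human] ORDER NP2"},
    },
}

def coarse_group(lemma, wn_sense):
    """Map a fine-grained WordNet sense number to its coarse group, if any."""
    for name, group in SENSE_INVENTORY.get(lemma, {}).items():
        if wn_sense in group.get("wn_senses", ()):
            return name
    return None

Here coarse_group("call", 1) and coarse_group("call", 12) resolve to the same group: exactly the collapse of fine-grained distinctions that the annotation relies on.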
Annotation Process
 Verb sense groups are created based on WordNet senses and online resources (VerbNet, PropBank, FrameNet, dictionaries).
 Newly created verb sense groups are subject to sample annotation.
 Verbs with ITA higher than 90% (or 85% after regrouping) go to actual annotation.
 Verbs with less than 90% ITA are regrouped and sent back into sample-annotation tasks.
 Regroupings and re-sample annotations are not done by the original grouper and taggers.
 Verbs that complete actual annotation are adjudicated. (A sketch of this routing logic follows below.)
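Stated as code, the routing in these bullets reduces to a threshold check. A minimal sketch, assuming ITA is measured as mean pairwise percent agreement between taggers (the slides do not specify the exact measure):

from itertools import combinations

def pairwise_ita(annotations):
    """Mean pairwise agreement; `annotations` maps each tagger to a
    list of sense labels, one per sampled instance."""
    agree = total = 0
    for t1, t2 in combinations(annotations, 2):
        pairs = list(zip(annotations[t1], annotations[t2]))
        agree += sum(a == b for a, b in pairs)
        total += len(pairs)
    return agree / total

def route_verb(ita, already_regrouped):
    """Thresholds from the slide: 90% ITA (85% after a regrouping)
    sends the verb to actual annotation; otherwise it is regrouped
    and re-enters sample annotation."""
    threshold = 0.85 if already_regrouped else 0.90
    return "actual annotation" if ita >= threshold else "regroup + re-sample"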
The Grouping and Annotation Process
[Flowchart of the grouping and annotation process described above.]
WSD with English OntoNotes Verbs
 Picked 217 sense-group-annotated verbs with 50+ instances each (out of 1300+ verbs)
– 35K instances total (almost half the data)
– WN polysemy range: 59 to 2; coarse polysemy range: 16 to 2
– Test: 5-fold cross-validation
– Automatic performance approaches human performance!

WN Avg. Polysemy   Onto Avg. Polysemy   Baseline   ITA     MaxEnt   SVM
10.4               5.1                  0.68       0.825   0.827    0.822
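A rough sketch of this experimental setup, with scikit-learn stand-ins: LogisticRegression for the MaxEnt model, LinearSVC for the SVM, plain bag-of-words context features in place of the richer lexical, syntactic, and semantic features the actual system used, and the most-frequent-sense baseline:

from collections import Counter
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

def evaluate_verb(contexts, senses):
    """5-fold cross-validation over one verb's instances: `contexts`
    are the sentences containing the verb, `senses` the gold
    coarse-grained sense labels."""
    # Most-frequent-sense baseline (0.68 averaged over the 217 verbs).
    mfs = Counter(senses).most_common(1)[0][1] / len(senses)
    print(f"baseline: {mfs:.3f}")
    for name, clf in [("MaxEnt", LogisticRegression(max_iter=1000)),
                      ("SVM", LinearSVC())]:
        model = make_pipeline(CountVectorizer(), clf)
        scores = cross_val_score(model, contexts, senses, cv=5)
        print(f"{name}: {scores.mean():.3f}")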
ITA > System Performance
[Bar chart: ITA, baseline, and system performance for verbs where ITA exceeds system performance: form, count, deal, order.]
ITA < System Performance
[Bar chart: ITA, baseline, and system performance for verbs where system performance exceeds ITA: decide, keep, throw, mean.]
Discussion
 Coarse-grained sense distinctions improve both ITA and system performance

Data set           Baseline Acc.   System Acc.   ITA
SENSEVAL-2 verbs   0.407           0.646         0.713
OntoNotes verbs    0.680           0.827         0.825

 Linguistically motivated features contributed to high system accuracy

Features       Accuracy
ALL            0.827
w/o SEM        0.816
w/o SEM+SYN    0.789
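The ablation itself is just a loop over feature configurations. In this sketch, score_with_features is a hypothetical callback that trains and cross-validates the classifier on the named feature groups, and the three group names are assumptions standing in for the system's actual feature sets:

# Assumed feature grouping; "SEM" and "SYN" on the slide are taken to
# mean the semantic and syntactic feature groups respectively.
ABLATIONS = {
    "ALL": ["lexical", "syntactic", "semantic"],
    "w/o SEM": ["lexical", "syntactic"],
    "w/o SEM+SYN": ["lexical"],
}

def run_ablation(score_with_features):
    """Report accuracy as feature groups are removed, mirroring the table."""
    for name, features in ABLATIONS.items():
        print(f"{name}: {score_with_features(features):.3f}")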
Eventive nouns
 ISI sense-tags nouns.
 Some nouns have eventive senses:
– party
– development
 Given a list of nouns and tagged instances, we NomBank just those: a few thousand at most.
 At the last meeting we reported very poor ITA with Adam’s NomBank annotation.
Comparison of NomBank and PropBank Frames
 107 frames examined: the ISI eventive nouns that have frame files in NomBank
 47 of those showed differences between the NomBank and the PropBank frames.
Types of differences (a frame-comparison sketch follows the list):
1. No PropBank equivalent
2. No PropBank equivalent for some NomBank senses
3. No NomBank equivalent for some PropBank senses
4. NomBank equivalent has extra Args
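Difference types 1, 2, and 4 can be screened for mechanically, since NomBank and PropBank frame files share the same XML layout (rolesets containing numbered roles). A rough sketch, which assumes a noun roleset matches a verb roleset when their sense-number suffixes agree; as the “plea” example below shows, that heuristic does not always hold:

import xml.etree.ElementTree as ET

def rolesets(frame_file):
    """Extract {roleset id: {arg number: description}} from a frame file."""
    tree = ET.parse(frame_file)
    return {rs.get("id"): {r.get("n"): r.get("descr") for r in rs.iter("role")}
            for rs in tree.iter("roleset")}

def compare_frames(nb_file, pb_file):
    """Flag missing PropBank equivalents and extra NomBank args."""
    nb, pb = rolesets(nb_file), rolesets(pb_file)
    for rs_id, nb_roles in nb.items():
        sense = rs_id.split(".")[-1]  # e.g. "01" in allegation.01
        match = next((p for p in pb if p.endswith("." + sense)), None)
        if match is None:
            print(f"{rs_id}: no PropBank equivalent")  # types 1 and 2
        elif set(nb_roles) - set(pb[match]):
            extra = sorted(set(nb_roles) - set(pb[match]))
            print(f"{rs_id}: extra args vs. {match}: {extra}")  # type 4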
No PropBank equivalent
 15 cases, including: breakdown; downturn; illness; oath; outcome; pain; repercussion; stress; transition; turmoil; unrest
No related PropBank equivalent for some NomBank senses
 7 cases
 No PB equivalent:
start: start.02 “attribute/housing-starts”
 PB equivalent of unrelated name:
appointment: appointment.02 “have a date”
Equivalent: meet.03; no equivalent sense of “to appoint”
 PB equivalent of related name has different sense numbering:
plea: plea.02 “beg”
Equivalent: plead.01; source is listed as appeal.02
solution
solution.02 “mix, combine”; source: mix.01
Arg0: agent, mixer
Arg1: ingredient one
Arg2: ingredient two
Related PB equivalent: dissolve.01 “cause to come apart”??
Arg0: causer, agent
Arg1: thing dissolving
Arg2: medium
“salt water solution”
“rubber solution”
“chemical solution”
No related NomBank equivalent for some PropBank rolesets
 10 cases
harassment
harassment.01 “bother”
Source: harass.01 “bother”
“Police and soldiers continue to harass Americans.”
“the harassment of diplomats and their families”
harass.02 “cause an action”
“John harassed Mary into giving him some ice cream.”
No NB equivalent.
NomBank equivalent has extra Args
 6 cases
allegation
PB: allege.01
Arg0: speaker, alleger
Arg1: utterance, allegation
Arg2: hearer
NB: allegation.01
Arg3: person against whom something is alleged
“Fraud allegation against Wei-Chyung Wang”
“Abuse alleged against accused murderer.”
Conclusion: consider adding Arg3 to the VN frame.
answer
PB: answer.01
Arg0: replier
Arg1: in response to
Arg2: answer
NB: answer.01
Arg3: asker/recipient of answer
“Wang’s marketing department provided the sales force_Arg3 answers to [the] questions”
In PropBank, this role is often filled by the Arg1:
“‘I’ve read Balzac,’ he answers critics_Arg1”
attachment
NB: attachment.01
Arg0: agent
Arg1: theme
Arg2: theme2
Arg3: instrument
PB: attach.01
Arg0: agent
Arg1: thing being tied
Arg2: instrument
attach.01 allows two Arg1s:
“John attached the apology note_Arg1 to his dissertation_Arg1.”
Other issues
 Roles described differently (different label or true difference)
utterance.01: Arg0 “agent”
utter.01: Arg0 “speaker”
score.01 (VN): Arg2 “opponent”
score.01 (NB): Arg2 “test/game”
“scored against the B team_Arg2”; “high test_Arg2 scores”
 Different frame numbers
“attach, as with glue”
bond.02 (NB)
bond.01 (VN)
Conclusion
 We can fix these!