Putting Meaning Into Your Trees
Martha Palmer
University of Pennsylvania
Columbia University
New York City
January 29, 2004
Outline
Introduction
Background: WordNet, Levin classes, VerbNet
Proposition Bank – capturing shallow semantics
Mapping PropBank to VerbNet
Mapping PropBank to WordNet
Ask Jeeves – A Q/A, IR ex.
What do you call a successful movie? Blockbuster
Tips on Being a Successful Movie Vampire ... I shall call the police.
Successful Casting Call & Shoot for ``Clash of Empires'' ... thank everyone for their participation in the making of yesterday's movie.
Demme's casting is also highly entertaining, although I wouldn't go so far as to call it successful. This movie's resemblance to its predecessor is pretty vague...
VHS Movies: Successful Cold Call Selling: Over 100 New Ideas, Scripts, and Examples from the Nation's Foremost Sales Trainer.
Ask Jeeves – filtering w/ POS tag
What do you call a successful movie?
Tips on Being a Successful Movie Vampire ... I shall call the police.
Successful Casting Call & Shoot for ``Clash of Empires'' ... thank everyone for their participation in the making of yesterday's movie.
Demme's casting is also highly entertaining, although I wouldn't go so far as to call it successful. This movie's resemblance to its predecessor is pretty vague...
VHS Movies: Successful Cold Call Selling: Over 100 New Ideas, Scripts, and Examples from the Nation's Foremost Sales Trainer.
Filtering out “call the police”
Syntax
call(you,movie,what)
≠
call(you,police)
English lexical resource is required
That provides sets of possible syntactic frames for verbs.
And provides clear, replicable sense distinctions.
AskJeeves: Who do you call for a good electronic lexical database for English?
WordNet – call, 28 senses
1. name, call -- (assign a specified, proper name to; "They named their son David"; …) -> LABEL
2. call, telephone, call up, phone, ring -- (get or try to get into communication (with someone) by telephone; "I tried to call you all night"; …) -> TELECOMMUNICATE
3. call -- (ascribe a quality to or give a name of a common noun that reflects a quality; "He called me a bastard"; …) -> LABEL
4. call, send for -- (order, request, or command to come; "She was called into the director's office"; "Call the police!") -> ORDER
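(Aside, not from the talk: the same inventory can be inspected programmatically. A minimal sketch, assuming NLTK and its WordNet corpus data are installed; current WordNet releases may no longer list exactly 28 verb senses for "call".)

```python
# List the verb senses of "call" with lemmas, gloss, and hypernym,
# mirroring the entries shown above.
from nltk.corpus import wordnet as wn

for i, synset in enumerate(wn.synsets("call", pos=wn.VERB), start=1):
    lemmas = ", ".join(synset.lemma_names())
    hypernyms = ", ".join(h.name() for h in synset.hypernyms())
    print(f"{i}. {lemmas} -- {synset.definition()} -> {hypernyms or 'none'}")
```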
WordNet – Princeton (Miller 1985, Fellbaum 1998)
On-line lexical reference (dictionary)
Nouns, verbs, adjectives, and adverbs grouped into synonym sets
Other relations include hypernyms (ISA), antonyms, meronyms
Limitations as a computational lexicon:
Contains little syntactic information
No explicit predicate argument structures
No systematic extension of basic senses
Sense distinctions are very fine-grained, inter-tagger agreement (ITA) 73%
No hierarchical entries
Levin classes
(Levin, 1993)
3100 verbs, 47 top-level classes, 193 second- and third-level classes
Each class has a syntactic signature based on alternations.
John broke the jar. / The jar broke. / Jars break easily.
John cut the bread. / *The bread cut. / Bread cuts easily.
John hit the wall. / *The wall hit. / *Walls hit easily.
Levin classes
(Levin, 1993)
Verb class hierarchy: 3100 verbs, 47 top-level classes, 193 second- and third-level classes
Each class has a syntactic signature based on alternations.
John broke the jar. / The jar broke. / Jars break easily.
  change-of-state
John cut the bread. / *The bread cut. / Bread cuts easily.
  change-of-state, recognizable action, sharp instrument
John hit the wall. / *The wall hit. / *Walls hit easily.
  contact, exertion of force
Confusions in Levin classes?
Not semantically homogeneous
  {braid, clip, file, powder, pluck, etc...}
Multiple class listings
  homonymy or polysemy?
Conflicting alternations?
  Carry verbs disallow the Conative (*She carried at the ball), but include {push, pull, shove, kick, draw, yank, tug}, which are also in the Push/Pull class, which does take the Conative (She kicked at the ball).
Intersective Levin Classes
[Diagram: intersective classes defined by shared alternations such as "apart" (CH-STATE), "across the room" (CH-LOC), and "at" (¬CH-LOC)]
Dang, Kipper & Palmer, ACL98
Intersective Levin Classes
More syntactically and semantically coherent
sets of syntactic patterns
explicit semantic components
relations between senses
VERBNET
www.cis.upenn.edu/verbnet
Dang, Kipper & Palmer, IJCAI00, Coling00
VerbNet – Karin Kipper
Class entries:
Capture generalizations about verb behavior
Organized hierarchically
Members have common semantic elements, semantic roles and syntactic frames
Verb entries:
Refer to a set of classes (different senses)
each class member linked to WN synset(s)
(not all WN senses are covered)
Semantic role labels:
Julia broke the LCD projector.
break(agent(Julia), patient(LCD-projector))
cause(agent(Julia), broken(LCD-projector))
agent(A) -> intentional(A), sentient(A), causer(A), affector(A)
patient(P) -> affected(P), change(P), …
Hand-built resources vs. real data
VerbNet is based on linguistic theory – how useful is it?
How well does it correspond to syntactic variations found in naturally occurring text?
PropBank
Proposition Bank: From Sentences to Propositions

Powell met Zhu Rongji
Powell and Zhu Rongji met
Powell met with Zhu Rongji
Powell and Zhu Rongji had a meeting
(cf. battle, wrestle, join, debate, consult)

Proposition: meet(Powell, Zhu Rongji)
meet(Somebody1, Somebody2)
...

When Powell met Zhu Rongji on Thursday they discussed the return of the spy plane.
meet(Powell, Zhu)
discuss([Powell, Zhu], return(X, plane))
Capturing semantic roles*
Owen (SUBJ) broke [ARG1 the laser pointer].
[ARG1 The windows] (SUBJ) were broken by the hurricane.
[ARG1 The vase] (SUBJ) broke into pieces when it toppled over.
*See also FrameNet, http://www.icsi.berkeley.edu/~framenet/
English lexical resource is required
That provides sets of possible syntactic frames for verbs with semantic role labels.
And provides clear, replicable sense distinctions.
A TreeBanked Sentence
Analysts have been expecting a GM-Jaguar pact that would give the U.S. car maker an eventual 30% stake in the British company.

(S (NP-SBJ Analysts)
  (VP have
    (VP been
      (VP expecting
        (NP (NP a GM-Jaguar pact)
          (SBAR (WHNP-1 that)
            (S (NP-SBJ *T*-1)
              (VP would
                (VP give
                  (NP the U.S. car maker)
                  (NP (NP an eventual (ADJP 30 %) stake)
                    (PP-LOC in (NP the British company))))))))))))
The same sentence, PropBanked
(S Arg0 (NP-SBJ Analysts)
  (VP have
    (VP been
      (VP expecting
        Arg1 (NP (NP a GM-Jaguar pact)
          (SBAR (WHNP-1 that)
            (S Arg0 (NP-SBJ *T*-1)
              (VP would
                (VP give
                  Arg2 (NP the U.S. car maker)
                  Arg1 (NP (NP an eventual (ADJP 30 %) stake)
                    (PP-LOC in (NP the British company))))))))))))

expect(Analysts, GM-J pact)
give(GM-J pact, US car maker, 30% stake)
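(A minimal sketch of what the two propositions above amount to as data; the Proposition class is illustrative, not PropBank's release format.)

```python
# Represent the propositions extracted from the PropBanked sentence.
from dataclasses import dataclass, field

@dataclass
class Proposition:
    rel: str                                   # the predicate (REL)
    args: dict = field(default_factory=dict)   # numbered args and ArgMs

propositions = [
    Proposition("expect",
                {"Arg0": "Analysts",
                 "Arg1": "a GM-Jaguar pact that would give ..."}),
    Proposition("give",
                {"Arg0": "a GM-Jaguar pact (via the trace *T*-1)",
                 "Arg2": "the U.S. car maker",
                 "Arg1": "an eventual 30% stake in the British company"}),
]

for p in propositions:
    print(f"{p.rel}({', '.join(p.args.values())})")
```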
Frames File Example: expect
Roles:
  Arg0: expecter
  Arg1: thing expected
Example: Transitive, active:
  Portfolio managers expect further declines in interest rates.
  Arg0: Portfolio managers
  REL: expect
  Arg1: further declines in interest rates
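(The frames files themselves are distributed as XML; the sketch below just holds the same information in plain Python dictionaries to make the role inventory concrete. The structure and helper function are illustrative assumptions.)

```python
# The expect roleset and its example annotation as a Python structure.
expect_frameset = {
    "id": "expect.01",
    "roles": {"Arg0": "expecter", "Arg1": "thing expected"},
    "example": {
        "name": "Transitive, active",
        "text": "Portfolio managers expect further declines in interest rates.",
        "annotation": {"Arg0": "Portfolio managers",
                       "REL": "expect",
                       "Arg1": "further declines in interest rates"},
    },
}

def role_mnemonic(frameset, arg):
    """Look up the mnemonic for a numbered argument, e.g. Arg1 -> 'thing expected'."""
    return frameset["roles"].get(arg, "unknown")

print(role_mnemonic(expect_frameset, "Arg1"))   # thing expected
```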
Frames File example: give
Roles:
  Arg0: giver
  Arg1: thing given
  Arg2: entity given to
Example: double object:
  The executives gave the chefs a standing ovation.
  Arg0: The executives
  REL: gave
  Arg2: the chefs
  Arg1: a standing ovation
Word Senses in PropBank
Orders to ignore word sense not feasible for 700+ verbs:
  Mary left the room
  Mary left her daughter-in-law her pearls in her will
Frameset leave.01 "move away from":
  Arg0: entity leaving
  Arg1: place left
Frameset leave.02 "give":
  Arg0: giver
  Arg1: thing given
  Arg2: beneficiary
How do these relate to traditional word senses in VerbNet and WordNet?
Annotation procedure
PTB II – extraction of all sentences with a given verb
Create Frame File for that verb (Paul Kingsbury)
  3100+ lemmas, 4400 framesets, 118K predicates
  Over 300 created automatically via VerbNet
First pass: automatic tagging (Joseph Rosenzweig)
  http://www.cis.upenn.edu/~josephr/TIDES/index.html#lexicon
Second pass: double blind hand correction (Paul Kingsbury)
  Tagging tool highlights discrepancies (Scott Cotton)
Third pass: Solomonization (adjudication) (Betsy Klipple, Olga Babko-Malaya)
Trends in Argument Numbering
Arg0 = agent
Arg1 = direct object / theme / patient
Arg2 = indirect object / benefactive / instrument / attribute / end state
Arg3 = start point / benefactive / instrument / attribute
Arg4 = end point
Per word vs. frame level – more general?
Additional tags
(arguments or adjuncts?)
Variety of ArgMs (Arg# > 4):
  TMP – when?
  LOC – where at?
  DIR – where to?
  MNR – how?
  PRP – why?
  REC – himself, themselves, each other
  PRD – this argument refers to or modifies another
  ADV – others
Inflection
Verbs also marked for tense/aspect
Passive/Active
Perfect/Progressive
Third person singular (is, has, does, was)
Present/Past/Future
Infinitives/Participles/Gerunds/Finites
Modals and negations marked as ArgMs
Frames: Multiple Framesets
Out of the 787 most frequent verbs:
  1 frameset – 521
  2 framesets – 169
  3+ framesets – 97 (includes light verbs)
  94% ITA
Framesets are not necessarily consistent between different senses of the same verb.
Framesets are consistent between different verbs that share similar argument structures (like FrameNet).
Ergative/Unaccusative Verbs
Roles (no Arg0 for unaccusative verbs):
  Arg1 = logical subject, patient, thing rising
  Arg2 = EXT, amount risen
  Arg3* = start point
  Arg4 = end point
Sales rose 4% to $3.28 billion from $3.16 billion.
The Nasdaq composite index added 1.01 to 456.6 on paltry volume.
Actual data for leave
http://www.cs.rochester.edu/~gildea/PropBank/Sort/
leave.01 "move away from": Arg0 rel Arg1 Arg3
leave.02 "give": Arg0 rel Arg1 Arg2
sub-ARG0 obj-ARG1 44
sub-ARG0 20
sub-ARG0 NP-ARG1-with obj-ARG2 17
sub-ARG0 sub-ARG2 ADJP-ARG3-PRD 10
sub-ARG0 sub-ARG1 ADJP-ARG3-PRD 6
sub-ARG0 sub-ARG1 VP-ARG3-PRD 5
NP-ARG1-with obj-ARG2 4
obj-ARG1 3
sub-ARG0 sub-ARG2 VP-ARG3-PRD 3
PropBank/FrameNet

Buy:            Sell:
Arg0: buyer     Arg0: seller
Arg1: goods     Arg1: goods
Arg2: seller    Arg2: buyer
Arg3: rate      Arg3: rate
Arg4: payment   Arg4: payment

Broader, more neutral, more syntactic – maps readily to VN, TR, FN
Rambow, et al., PMLB03
Annotator accuracy – ITA 84%
[Chart: per-annotator accuracy (primary labels only, accuracy axis 0.86 to 0.96) plotted against number of annotations on a log scale]
English lexical resource is required
That provides sets of possible syntactic frames for verbs with semantic role labels?
And provides clear, replicable sense distinctions.
English lexical resource is required
That provides sets of possible syntactic frames for verbs with semantic role labels that can be automatically assigned accurately to new text?
And provides clear, replicable sense distinctions.
Automatic Labelling of Semantic Relations
• Stochastic Model
• Features:
Predicate
Phrase Type
Parse Tree Path
Position (Before/after predicate)
Voice (active/passive)
Head Word
Gildea & Jurafsky, CL02, Gildea & Palmer, ACL02
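(A sketch, not the authors' system, of computing a few of the features listed above for one candidate constituent. It assumes NLTK's Tree class; the head-word choice and voice handling are deliberately simplified stand-ins.)

```python
from nltk import Tree

def tree_path(tree, from_pos, to_pos):
    """Parse-tree path from a constituent up to the lowest common ancestor
    and down to the predicate node, e.g. 'NP^S!VP!VBD'."""
    i = 0
    while i < min(len(from_pos), len(to_pos)) and from_pos[i] == to_pos[i]:
        i += 1
    up = [tree[from_pos[:j]].label() for j in range(len(from_pos), i - 1, -1)]
    down = [tree[to_pos[:j]].label() for j in range(i + 1, len(to_pos) + 1)]
    return "^".join(up) + "!" + "!".join(down)

def srl_features(tree, constituent_pos, predicate_pos, predicate_lemma, voice):
    node = tree[constituent_pos]
    return {
        "predicate": predicate_lemma,
        "phrase_type": node.label(),
        "path": tree_path(tree, constituent_pos, predicate_pos),
        "position": "before" if constituent_pos < predicate_pos else "after",
        "voice": voice,                      # detected elsewhere in a real system
        "head_word": node.leaves()[-1],      # crude rightmost-leaf approximation
    }

t = Tree.fromstring(
    "(S (NP (NNP Owen)) (VP (VBD broke) (NP (DT the) (NN pointer))))")
print(srl_features(t, (0,), (1, 0), "break", "active"))
# {'predicate': 'break', 'phrase_type': 'NP', 'path': 'NP^S!VP!VBD', ...}
```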
Semantic Role Labelling Accuracy – Known Boundaries

                    FrameNet (≥ 10 inst)   PropBank   PropBank (≥ 10 instances)
Gold St. parses                              77.0           83.1
Automatic parses         82.0                73.6           79.6

Accuracy of semantic role prediction for known boundaries: the system is given the constituents to classify.
FrameNet examples (training/test) are handpicked to be unambiguous.
Lower performance with unknown boundaries.
Higher performance with traces.
Almost evens out.
Additional Automatic Role Labelers
Performance improved from 77% to 88% (Colorado)
(Gold Standard parses, < 10 instances)
Same features plus:
  Named Entity tags
  Head word POS
  For unseen verbs – backoff to automatic verb clusters
SVMs:
  Role or not role
  For each likely role, for each Arg#, Arg# or not
  No overlapping role labels allowed
Pradhan et al., ICDM03; Surdeanu et al., ACL03; Chen & Rambow, EMNLP03; Gildea & Hockenmaier, EMNLP03
Additional Automatic Role Labelers
Performance improved from 77% to 88% (Colorado)
New results, original features and labels: 88%, 93% (Penn)
(Gold Standard parses, < 10 instances)
Same features plus:
  Named Entity tags
  Head word POS
  For unseen verbs – backoff to automatic verb clusters
SVMs:
  Role or not role
  For each likely role, for each Arg#, Arg# or not
  No overlapping role labels allowed
Pradhan et al., ICDM03; Surdeanu et al., ACL03; Chen & Rambow, EMNLP03; Gildea & Hockenmaier, EMNLP03
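(An illustrative sketch of the two-stage idea above: first filter role vs. not-role, then assign an Arg label. scikit-learn's LinearSVC stands in for the SVM packages used in the cited work, the per-Arg binary classifiers are collapsed into one multi-class step for brevity, and the toy feature dictionaries are invented.)

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

train = [
    ({"predicate": "break", "phrase_type": "NP", "path": "NP^S!VP!VBD",
      "position": "before", "voice": "active"}, "Arg0"),
    ({"predicate": "break", "phrase_type": "NP", "path": "NP^VP!VBD",
      "position": "after", "voice": "active"}, "Arg1"),
    ({"predicate": "break", "phrase_type": "PP", "path": "PP^VP!VBD",
      "position": "after", "voice": "active"}, "NONE"),
]
X = [features for features, label in train]
y = [label for features, label in train]

# Stage 1: role vs. not-role.
role_filter = make_pipeline(DictVectorizer(), LinearSVC())
role_filter.fit(X, ["ROLE" if label != "NONE" else "NONE" for label in y])

# Stage 2: which Arg label, trained only on true roles.
arg_X = [f for f, label in train if label != "NONE"]
arg_y = [label for f, label in train if label != "NONE"]
arg_labeler = make_pipeline(DictVectorizer(), LinearSVC())
arg_labeler.fit(arg_X, arg_y)

candidate = {"predicate": "break", "phrase_type": "NP", "path": "NP^S!VP!VBD",
             "position": "before", "voice": "active"}
if role_filter.predict([candidate])[0] == "ROLE":
    print(arg_labeler.predict([candidate])[0])   # -> Arg0
```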
Word Senses in PropBank
Orders to ignore word sense not feasible for 700+ verbs:
  Mary left the room
  Mary left her daughter-in-law her pearls in her will
Frameset leave.01 "move away from":
  Arg0: entity leaving
  Arg1: place left
Frameset leave.02 "give":
  Arg0: giver
  Arg1: thing given
  Arg2: beneficiary
How do these relate to traditional word senses in VerbNet and WordNet?
Mapping from PropBank to VerbNet
Frameset id = leave.02
Sense = give
VerbNet class = future-having 13.3

Arg0   Giver         Agent
Arg1   Thing given   Theme
Arg2   Benefactive   Recipient
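(The same mapping written out as a small Python structure, purely for concreteness; the format is illustrative, not the released mapping files.)

```python
# PropBank frameset leave.02 mapped to its VerbNet class and thematic roles.
pb_to_vn = {
    "leave.02": {
        "sense": "give",
        "verbnet_class": "future_having-13.3",
        "roles": {"Arg0": "Agent",       # giver
                  "Arg1": "Theme",       # thing given
                  "Arg2": "Recipient"},  # benefactive
    }
}

def vn_role(frameset_id, arg):
    return pb_to_vn[frameset_id]["roles"].get(arg)

print(vn_role("leave.02", "Arg2"))   # Recipient
```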
Mapping from PropBank to VerbNet
Overlap with PropBank framesets:
  50,000 PropBank instances
  < 50% of VN entries, > 85% of VN classes
Results:
  MATCH – 78.63% (80.90% relaxed)
  (VerbNet isn't just linguistic theory!)
Benefits:
  Thematic role labels and semantic predicates
  Can extend PropBank coverage with VerbNet classes
  WordNet sense tags
Kingsbury & Kipper, NAACL03 Text Meaning Workshop
http://www.cs.rochester.edu/~gildea/VerbNet/
WordNet as a WSD sense inventory
Senses unnecessarily fine-grained?
Word Sense Disambiguation bakeoffs:
  Senseval-1 – Hector, ITA = 95.5%
  Senseval-2 – WordNet 1.7, ITA for verbs = 71%
  Groupings of Senseval-2 verbs, ITA = 82%
    Used syntactic and semantic criteria
Groupings Methodology
(w/ Dang and Fellbaum)
Double blind groupings, adjudication
Syntactic criteria (VerbNet was useful):
  Distinct subcategorization frames
    call him a bastard
    call him a taxi
  Recognizable alternations – regular sense extensions:
    play an instrument
    play a song
    play a melody on an instrument
SIGLEX01, SIGLEX02, JNLE04
Groupings Methodology (cont.)
Semantic criteria:
  Differences in semantic classes of arguments
    Abstract/concrete, human/animal, animate/inanimate, different instrument types, …
  Differences in entailments
    Change of prior entity or creation of a new entity?
  Differences in types of events
    Abstract/concrete/mental/emotional/…
  Specialized subject domains
Results – averaged over 28 verbs
(Dang and Palmer, SIGLEX02; Dang et al., COLING02)

                 Total
WN polysemy      16.28
Group polysemy    8.07
ITA-fine           71%
ITA-group          82%
MX-fine          60.2%
MX-group           69%

MX = Maximum Entropy WSD, p(sense|context)
Features: topic (+2.5%), syntactic constituents (+1.5 to +5%), semantic classes (+6%)
Grouping improved ITA and Maxent WSD

Call: 31% of errors due to confusion between senses within same group 1:
  name, call -- (assign a specified, proper name to; "They named their son David")
  call -- (ascribe a quality to or give a name of a common noun that reflects a quality; "He called me a bastard")
  call -- (consider or regard as being; "I would not call her beautiful")
75% accuracy with training and testing on grouped senses vs. 43% with training and testing on fine-grained senses
WordNet – call, 28 senses, grouped
[Diagram: the 28 WordNet senses of call clustered into groups labeled Loud cry, Label, Bird or animal cry, Request, Challenge, Phone/radio, Call a loan/bond, Visit, and Bid]
Overlap between Groups and Framesets – 95%
[Diagram: for develop, the WordNet senses are grouped, and each group falls within a single PropBank frameset (Frameset1 or Frameset2)]
Palmer, Dang & Fellbaum, NLE 2004
Sense Hierarchy
PropBank Framesets – coarse-grained distinctions
  20 Senseval-2 verbs with > 1 frameset
  Maxent WSD system: 73.5% baseline, 90% accuracy
Sense Groups (Senseval-2) – intermediate level (includes Levin classes): 95% overlap with framesets, 69% accuracy
WordNet – fine-grained distinctions: 60.2% accuracy
English lexical resource is available
That provides sets of possible syntactic frames for verbs with semantic role labels that can be automatically assigned accurately to new text.
And provides clear, replicable sense distinctions.
A Chinese Treebank Sentence
国会/Congress 最近/recently 通过/pass 了/ASP 银行法/banking law
“The Congress passed the banking law recently.”
(IP (NP-SBJ (NN 国会/Congress))
(VP (ADVP (ADV 最近/recently))
(VP (VV 通过/pass)
(AS 了/ASP)
(NP-OBJ (NN 银行法/banking law)))))
The Same Sentence, PropBanked
(IP (NP-SBJ arg0 (NN 国会))
    (VP argM (ADVP (ADV 最近))
        (VP f2 (VV 通过)
            (AS 了)
            arg1 (NP-OBJ (NN 银行法)))))

通过 (pass), frameset f2:
  arg0: 国会 (Congress)
  argM: 最近 (recently)
  arg1: 银行法 (banking law)
Chinese PropBank Status (w/ Bert Xue and Scott Cotton)
Create Frame File for each verb
  Similar alternations – causative/inchoative, unexpressed object
  5000 lemmas, 3000 DONE (hired Jiang)
First pass: automatic tagging, 2500 DONE
  Subcat frame matcher (Xue & Kulick, MT03)
Second pass: double blind hand correction
  In progress (includes frameset tagging), 1000 DONE
  Ported RATS to CATS, in use since May
Third pass: Solomonization (adjudication)
A Korean Treebank Sentence
그는 르노가 3 월말까지 인수제의 시한을 갖고 있다고 덧붙였다.
"He added that Renault has a deadline until the end of March for a merger proposal."
(S (NP-SBJ 그/NPN+은/PAU)
(VP (S-COMP (NP-SBJ 르노/NPR+이/PCA)
(VP (VP (NP-ADV 3/NNU
월/NNX+말/NNX+까지/PAU)
(VP (NP-OBJ 인수/NNC+제의/NNC
시한/NNC+을/PCA)
갖/VV+고/ECS))
있/VX+다/EFN+고/PAD)
덧붙이/VV+었/EPF+다/EFN)
./SFN)
The same sentence, PropBanked
(S Arg0 (NP-SBJ 그/NPN+은/PAU)
   (VP Arg2 (S-COMP (Arg0 NP-SBJ 르노/NPR+이/PCA)
                    (VP (VP (ArgM NP-ADV 3/NNU 월/NNX+말/NNX+까지/PAU)
                            (VP (Arg1 NP-OBJ 인수/NNC+제의/NNC 시한/NNC+을/PCA)
                                갖/VV+고/ECS))
                        있/VX+다/EFN+고/PAD)
   덧붙이/VV+었/EPF+다/EFN)
   ./SFN)

덧붙이다 (add): (그는, 르노가 3 월말까지 인수제의 시한을 갖고 있다)
  (he) (Renault has a deadline until the end of March for a merger proposal)
갖다 (has): (르노가, 3 월말까지, 인수제의 시한을)
  (Renault) (until the end of March) (a deadline for a merger proposal)
PropBank II
Nominalizations (NYU)
  Lexical frames DONE
Event variables (including temporals and locatives)
More fine-grained sense tagging
  Tagging nominalizations w/ WordNet sense
  Selected verbs and nouns
Nominal coreference
  not names
Clausal discourse connectives – selected subset
PropBank II: event variables; sense tags; nominal reference; discourse connectives

{Also}, [Arg0 substantially lower Dutch corporate tax rates] helped [Arg1 [Arg0 the company] keep [Arg1 its tax outlay] [Arg3-PRD flat] [ArgM-ADV relative to earnings growth]].

ID#   REL              Arg0                     Arg1                                   Arg3-PRD   ArgM-ADV
h23   help (help2,5)   tax rates (tax rate1)    the company keep its tax outlay flat
k16   keep (keep1)     the company (company1)   its tax outlay                         flat       relative to earnings…
Summary
Shallow semantic annotation that captures critical dependencies and semantic role labels
Supports training of supervised automatic taggers
Methodology ports readily to other languages
English PropBank release – spring 2004
Chinese PropBank release – fall 2004
Korean PropBank release – summer 2005
Word sense in Machine Translation
Different syntactic frames:
  John left the room
  Juan saiu do quarto. (Portuguese)
  John left the book on the table.
  Juan deixou o livro na mesa.
Same syntactic frame?
  John left a fortune.
  Juan deixou uma fortuna.
Summary of Multilingual TreeBanks, PropBanks

Parallel Corpora    Text                         Treebank                     PropBank I                   PropBank II
Chinese Treebank    Chinese 500K, English 400K   Chinese 500K, English 100K   Chinese 500K, English 350K   Ch 100K, En 100K
Arabic Treebank     Arabic 500K, English 500K    Arabic 500K, English ?       ?                            ?
Korean Treebank     Korean 180K, English 50K     Korean 180K, English 50K     Korean 180K, English 50K
Levin class: escape-51.1-1
WordNet senses: WN 1, 5, 8
Thematic roles: Location[+concrete], Theme[+concrete]
Frames with semantics:
  Basic Intransitive: "The convict escaped"
    motion(during(E), Theme), direction(during(E), Prep, Theme, ~Location)
  Intransitive (+ path PP): "The convict escaped from the prison"
  Locative Preposition Drop: "The convict escaped the prison"
Levin class: future_having-13.3
WordNet senses: WN 2, 10, 13
Thematic roles: Agent[+animate OR +organization], Recipient[+animate OR +organization], Theme[]
Frames with semantics:
  Dative: "I promised somebody my time"
    Agent V Recipient Theme
    has_possession(start(E), Agent, Theme), future_possession(end(E), Recipient, Theme), cause(Agent, E)
  Transitive (+ Recipient PP): "We offered our paycheck to her"
    Agent V Theme Prep(to) Recipient
  Transitive (Theme Object): "I promised my house (to somebody)"
    Agent V Theme
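(A sketch of the entry above as Python data, to make the pairing of syntactic frames with event-based semantic predicates explicit; the field names are illustrative, not VerbNet's XML schema.)

```python
future_having = {
    "class": "future_having-13.3",
    "wordnet_senses": ["WN 2", "WN 10", "WN 13"],
    "thematic_roles": {
        "Agent": "+animate OR +organization",
        "Recipient": "+animate OR +organization",
        "Theme": "",
    },
    "frames": [
        {"name": "Dative",
         "example": "I promised somebody my time",
         "syntax": "Agent V Recipient Theme",
         "semantics": ["has_possession(start(E), Agent, Theme)",
                       "future_possession(end(E), Recipient, Theme)",
                       "cause(Agent, E)"]},
        {"name": "Transitive (+ Recipient PP)",
         "example": "We offered our paycheck to her",
         "syntax": "Agent V Theme Prep(to) Recipient"},
        {"name": "Transitive (Theme Object)",
         "example": "I promised my house (to somebody)",
         "syntax": "Agent V Theme"},
    ],
}

for frame in future_having["frames"]:
    print(frame["name"], "->", frame["syntax"])
```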
Automatic classification
Merlo & Stevenson automatically classified 59 verbs with 69.8% accuracy
  1. unergative, 2. unaccusative, 3. object-drop
  100M words automatically parsed
  C5.0, using features: transitivity, causativity, animacy, voice, POS
EM clustering – 61%, 2669 instances, 1M words
  Using Gold Standard semantic role labels
  1. float hop/hope jump march leap
  2. change clear collapse cool crack open flood
  3. borrow clean inherit reap organize study
SENSEVAL – Word Sense Disambiguation Evaluation

DARPA-style bakeoff: training data, testing data, scoring algorithm.

                       SENSEVAL-1 (1998)   SENSEVAL-2 (2001)
Languages              3                   12
Systems                24                  90
Eng. Lexical Sample    Yes                 Yes
Verbs/Poly/Instances   13/12/215           29/16/110
Sense Inventory        Hector, 95.5%       WordNet, 73+%

NLE99, CHUM01, NLE02, NLE03
Maximum Entropy WSD
Hoa Dang, best performer on Verbs
Maximum entropy framework, p(sense|context)
Contextual linguistic features:
  Topical feature for W: keywords (determined automatically)
  Local syntactic features for W: presence of subject, complements, passive?; words in subject and complement positions, particles, preps, etc.
  Local semantic features for W: semantic class info from WordNet (synsets, etc.); Named Entity tag (PERSON, LOCATION, ...) for proper Ns; words within +/- 2 word window
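(A minimal sketch of the p(sense|context) setup: a maximum-entropy classifier, here scikit-learn's multinomial logistic regression, over bag-of-feature contexts. The toy training instances and feature names are invented for illustration.)

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train = [
    ({"kw_phone": 1, "has_obj": 1, "obj_is_person": 1}, "call.telephone"),
    ({"kw_night": 1, "has_obj": 1, "obj_is_person": 1}, "call.telephone"),
    ({"kw_name": 1, "has_obj": 1, "second_complement": 1}, "call.label"),
    ({"kw_bastard": 1, "has_obj": 1, "second_complement": 1}, "call.label"),
]
X = [features for features, sense in train]
y = [sense for features, sense in train]

wsd = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000))
wsd.fit(X, y)

context = {"kw_name": 1, "has_obj": 1, "second_complement": 1}
print(wsd.predict([context])[0])      # most probable sense
print(wsd.predict_proba([context]))   # p(sense | context)
```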
Best Verb Performance – Maxent WSD (Hoa Dang)
28 verbs, average:

              Total
WN polysemy   16.28
ITA             71%
MX-WSD        60.2%

MX = Maximum Entropy WSD, p(sense|context)
Features: topic (+2.5%), syntactic constituents (+1.5 to +5%), semantic classes (+6%)
Dang and Palmer, SIGLEX02; Dang et al., COLING02
Role Labels & Framesets
as features for WSD
Preliminary results (Jinying Chen):
  Gold Standard PropBank annotation
  Decision Tree C5.0
  5 verbs, grouped senses
  Features: frameset tags, Arg labels
Comparable results to Maxent with PropBank features
Syntactic frames and sense distinctions are inseparable.
Lexical resources provide concrete
criteria for sense distinctions
PropBank – coarse-grained sense distinctions determined by different subcategorization frames (framesets)
Intersective Levin classes – regular sense extensions through differing syntactic constructions
VerbNet – distinct semantic predicates for each sense (verb class)
Are these the right distinctions?
Results – averaged over 28 verbs
              Total
WN            16.28
Grp            8.07
ITA-fine        71%
ITA-group       82%
MX-fine       60.2%
JHU - MLultra 56.6%, 58.7%
MX-group        69%