Class-based nominal semantic role labeling: a preliminary investigation
Matt Gerber
Michigan State University, Department of Computer Science
Introduction: semantic role labeling

The semantic role

Relation between a constituent and a predication
“[Agent John] presented [Theme his findings] [Experiencer to the committee].”

The task

Automatically identify semantic roles occurring in natural language
Problematic: which roles are the “right” ones?
Introduction: PropBank (Kingsbury and Palmer, 2003)

Annotated corpus of semantic roles
“[Arg0 John] presented [Arg1 his findings] [Arg2 to the committee].”
Base corpus: TreeBank 2 (Marcus et al., 1993)

Evaluation

CoNLL Shared Task (Carreras and Màrquez, 2005)
Implications



QA: Kaisser and Webber (2007), Shen and Lapata (2007)
Coreference: Ponzetto and Strube (2006)
Information extraction: Surdeanu et al. (2003)
Introduction: NomBank (Meyers, 2007)

Verbs are not the only lexical category with shallow semantic structure

Verbal: [Arg0 Judge Curry] [Predicate ordered] [Arg1 Edison] [Arg2 to make average refunds of about $45].
Nominal: Judge Curry ordered [Arg0 Edison] to make average [Predicate refunds] [Arg1 of about $45].

A more complete semantic interpretation of natural language
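The bracketed notation in the two examples above can be read mechanically. A minimal sketch, assuming spans are never nested and each label is a single token without whitespace; the function name `parse_bracketed` is made up for illustration and is not part of NomBank or PropBank tooling:

```python
import re

# One bracketed span such as "[Arg0 Judge Curry]"; nesting is assumed not to occur.
SPAN = re.compile(r"\[(\S+) ([^\[\]]+)\]")

def parse_bracketed(sentence):
    """Return (label, text) pairs for every bracketed span, left to right."""
    return [(label, text) for label, text in SPAN.findall(sentence)]

verbal = ("[Arg0 Judge Curry] [Predicate ordered] [Arg1 Edison] "
          "[Arg2 to make average refunds of about $45].")
nominal = ("Judge Curry ordered [Arg0 Edison] to make average "
           "[Predicate refunds] [Arg1 of about $45].")

verbal_spans = parse_bracketed(verbal)
nominal_spans = parse_bracketed(nominal)
```

Comparing `verbal_spans` with `nominal_spans` shows the same sentence carrying two different predicate-argument structures, one anchored on the verb and one on the noun.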
Introduction: NomBank (Meyers, 2007)

Corpus information




Base corpus: TreeBank 2
Distinct nominalizations: 4704
Total attestations: ~115K
NomLex (Macleod et al., 1998)

Nominalization classes (22)
Nom (deverbals)
Example: Sales departments then urged [Predicate abandonment] [Arg1 of the Pico Project].
Partitive (part-whole)
Example: Hallwood owns about 11 [Predicate %] [Arg1 of Integra].
Research objectives


Investigate the role of NomLex classes in automated NomBank SRL
Hypotheses


(1) Classes may exhibit consistent realizations of their arguments
(2) Modeling each class separately may result in more homogeneous training data and better SRL performance
Outline





Nominalization interpretation: related work
NomBank SRL
Class-based NomBank SRL
Preliminary results and analysis
Conclusions and future work
Nominalization interpretation: early work

Rule-based methods

Associate syntactic configurations with grammatical functions and semantic properties




Dahl et al. (1987)
Hull and Gomez (1996)
Meyers et al. (1998)
Statistical models: Lapata (2000)

Identify underlying subject/object


[subject satellite] observation
[object satellite] observation
Nominalization interpretation: recent work

SemEval (Girju, 2007)

Semantic relations between nominals







Cause-Effect: laugh wrinkles
Instrument-Agency: laser printer
Product-Producer: honey bee
Origin-Entity: message (entity) from outer-space (origin)
Theme-Tool: news conference
Part-Whole: the door of the car
Content-Container: the grocery bag
Nominalization interpretation: recent work

NomBank SRL: Jiang and Ng (2006), Liu and Ng (2007)

Direct application of verbal SRL methods




Standard feature set
Maximum entropy modeling
Best overall f-measure score: 0.7283
NomBank-specific features had little impact
Overview of NomBank SRL

Full syntactic analysis
[Figure: parse tree of the sentence below; node labels shown: S, VP, NP, JJ, PP, NNS]
Judge Curry ordered Edison to make average [Predicate refunds] of about $45.
Overview of NomBank SRL

Argument identification

Binary classification problem: Argument vs. Non-argument
[Figure: parse tree of the sentence below; node labels shown: S, VP, NP, JJ, PP, NNS]
Judge Curry ordered [Edison] to make average [Predicate refunds] [of about $45].
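Argument identification treats every parse-tree constituent as a candidate and labels it Argument or Non-argument. The sketch below only shows that framing, using gold spans to produce the binary labels a classifier would be trained on. The tree shape, the token indexing, and the names `collect` and `label_candidates` are assumptions made for illustration; the slides do not give the exact parse.

```python
# Hypothetical parse of the example sentence as nested tuples: (label, child, ...).
# Leaves are plain token strings. The exact tree shape is an assumption.
TREE = (
    "S",
    ("NP", "Judge", "Curry"),
    ("VP", "ordered",
     ("NP", "Edison"),
     ("S",
      ("VP", "to",
       ("VP", "make",
        ("NP",
         ("NP", "average", "refunds"),
         ("PP", "of", "about", "$45")))))),
)

def collect(tree, start, out):
    """Append (label, start, end) for each constituent; return the end token index."""
    if isinstance(tree, str):      # a token occupies exactly one position
        return start + 1
    label, *children = tree
    end = start
    for child in children:
        end = collect(child, end, out)
    out.append((label, start, end))
    return end

def label_candidates(tree, gold_spans):
    """Pair every constituent with a binary Argument / Non-argument label."""
    candidates = []
    collect(tree, 0, candidates)
    return [
        (label, start, end,
         "Argument" if (start, end) in gold_spans else "Non-argument")
        for label, start, end in candidates
    ]

# Gold argument spans of "refunds": "Edison" = tokens [3, 4), "of about $45" = [8, 11).
GOLD = {(3, 4), (8, 11)}
labeled = label_candidates(TREE, GOLD)
```

Under this assumed tree, most candidates are Non-argument, which is the class imbalance that a binary identification stage has to cope with.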
Overview of NomBank SRL

Argument classification

22-class problem: Arg0-Arg9; Temporal, location, etc.
[Figure: parse tree of the sentence below; node labels shown: S, VP, NP, JJ, PP, NNS]
Judge Curry ordered [Arg0 Edison] to make average [Predicate refunds] [Arg1 of about $45].
NomBank SRL features
Class-based NomBank SRL

Simple method


Cluster nominalizations according to NomLex class membership
Train a logistic regression model for each class




Single-stage, 23-class strategy
Baseline feature set
Heuristic post-processing
Backoff

Trained over all classes
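The training setup on this slide can be sketched as follows: group instances by the NomLex class of their predicate, train one model per class, and train a backoff model over all instances. This is an illustration under stated assumptions, not the system described in the talk. The toy `train_majority` function stands in for the logistic regression trainer, and the instance layout, the feature strings, and the predicate "gizmo" are invented.

```python
from collections import Counter, defaultdict

def train_majority(instances):
    """Toy stand-in for a logistic regression trainer: maps each feature
    value to the label it occurs with most often."""
    counts = defaultdict(Counter)
    for feature, label in instances:
        counts[feature][label] += 1
    return {feature: c.most_common(1)[0][0] for feature, c in counts.items()}

def train_class_models(instances, nomlex_class, trainer=train_majority):
    """Train one model per NomLex class plus a backoff model over all data.

    instances: (predicate, feature, label) triples (hypothetical layout).
    nomlex_class: dict from predicate to NomLex class name.
    """
    by_class = defaultdict(list)
    for predicate, feature, label in instances:
        cls = nomlex_class.get(predicate)
        if cls is not None:
            by_class[cls].append((feature, label))
    models = {cls: trainer(data) for cls, data in by_class.items()}
    backoff = trainer([(feature, label) for _, feature, label in instances])
    return models, backoff

# Invented toy data; labels include "Non-argument" as in a single-stage setup.
instances = [
    ("abandonment", "PP-of", "Arg1"),
    ("abandonment", "PP-of", "Arg1"),
    ("%", "PP-of", "Arg1"),
    ("attorney", "NP-poss", "Arg2"),
    ("gizmo", "PP-of", "Non-argument"),
]
nomlex_class = {"abandonment": "Nom", "%": "Partitive", "attorney": "Relational"}
models, backoff = train_class_models(instances, nomlex_class)
```

Each class model sees only its own instances, which is what makes the training data more homogeneous, at the cost of fewer examples per model.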
Class-based NomBank SRL

Model application
Hallwood owns about 11 [Predicate %] of Integra.
[Figure: the predicate is looked up in NomLex (abandonment: …, abatement: …, abduction: …, aberration: …, ability: …, abolition: …, abomination: …) and routed to one of the models: Nom, Partitive, Attribute, Relational, Backoff]
Hallwood owns about 11 [Predicate %] [Arg1 of Integra].
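Model application amounts to a lookup followed by a dispatch. A minimal sketch, assuming the backoff model is used whenever the predicate has no NomLex class or no model exists for its class (the slides do not spell out the exact condition). The function name `select_model`, the placeholder model strings, and the predicate "gizmo" are invented for illustration.

```python
def select_model(predicate, nomlex_class, class_models, backoff_model):
    """Pick the class-specific model for a predicate, else the backoff model."""
    cls = nomlex_class.get(predicate)           # None when the predicate is unlisted
    return class_models.get(cls, backoff_model)

# Class assignments taken from the examples in these slides.
nomlex_class = {"abandonment": "Nom", "%": "Partitive", "attorney": "Relational"}

# Placeholder strings stand in for trained models.
class_models = {
    "Nom": "nom-model",
    "Partitive": "partitive-model",
    "Relational": "relational-model",
}

chosen = select_model("%", nomlex_class, class_models, "backoff-model")
fallback = select_model("gizmo", nomlex_class, class_models, "backoff-model")
```

For the Hallwood sentence, the predicate "%" resolves to the Partitive model, which then labels "of Integra" as Arg1.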
Preliminary results and analysis

Evaluation configuration




Training instances: WSJ 2-21
Testing instances: WSJ 23
Automatically generated parse trees for training and testing
Key observations



Overall performance
Per-class performance
Class-based gains over baseline
Overall evaluation results
Per-class evaluation results
Per-class evaluation results

General observations



Negligible overall gains compared to Liu and Ng (2007), who reported an overall f-measure of 0.7283
Some NomLex classes perform very well
Classes introduce gains as well as losses
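The f-measure figures quoted here combine precision and recall over labeled arguments. A minimal sketch, assuming an argument counts as correct only when both its span and its label match the gold annotation exactly; the slides do not state the scoring details, and the function name `prf` and the example spans are illustrative.

```python
def prf(gold, predicted):
    """Precision, recall and F1 over sets of (start, end, label) arguments."""
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Invented example: one of two predicted arguments matches the gold annotation.
gold = {(3, 4, "Arg0"), (8, 11, "Arg1")}
predicted = {(3, 4, "Arg0"), (6, 8, "Arg1")}
precision, recall, f1 = prf(gold, predicted)
```

Per-class scores follow by restricting both sets to the predicates of one NomLex class before calling the function.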
Analysis: intra-class regularity


Hypothesis 1: classes may exhibit consistent realizations of their arguments
Relational class (F1=90.94)

Regularity: argument incorporation
[Arg2 Mr. Hunt’s] [Arg0/Predicate attorney] said his client welcomed the gamble.
100% of Relational nominalizations have an incorporated Arg0
Constitutes 38% of test arguments for the class
Analysis: intra-class regularity


Hypothesis 1: classes may exhibit consistent realizations of their arguments
Partitive class (F1=79.85)

Regularity: presence of Arg0
86% of Partitive instances take a single Arg0
Compare: 15% of Nom instances take a single Arg1
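Regularity statistics of this kind can be computed directly from annotated instances. A minimal sketch, assuming "take a single ArgN" means that ArgN is the instance's only argument (the slide does not define it further); the function name `single_role_share` and the toy data are invented.

```python
def single_role_share(instances, role):
    """Fraction of instances whose only argument is `role`.

    instances: one list of role labels per predicate instance.
    """
    if not instances:
        return 0.0
    hits = sum(1 for roles in instances if list(roles) == [role])
    return hits / len(instances)

# Invented toy data: four instances of some class.
toy_instances = [["Arg0"], ["Arg0"], ["Arg0", "Arg1"], ["Arg1"]]
share = single_role_share(toy_instances, "Arg0")
```

Running the same function per class and per role gives a quick way to spot classes whose argument realization is as uniform as the slide reports.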
Analysis: class-based gains


Hypothesis 2: modeling each class separately may result in more homogeneous training data and better SRL performance
Improvements

Class          Test instances   Improvement
Nom-like       798              2.06
Environment    108              3.97
Group          40               5.87
Job            30               6.29
Analysis: class-based gains


Hypothesis 2: modeling each class separately may result in more homogeneous training data and better SRL performance
Losses

Class          Test instances   Loss    Class ambiguity   Training instances
Share          42               20.83   98.53             66 of 5211 total
Nom-adj-like   28               5.93    90.56             400 of 5086 total
Conclusions and future work




NomBank SRL based on classes derived from NomLex
Demonstrates negligible gains over Liu and Ng (2007)
Intra-class regularity leads to modest gains in some classes
NomLex ambiguity causes losses in others
Conclusions and future work

In-depth class modeling



Identification of class-specific regularities not captured by the current feature set
Further partitioning of the Nom class?
NomLex class disambiguation
Thanks!
Any questions?
References








Carreras, X. & Màrquez, L. (2005), 'Introduction to the CoNLL-2005 Shared Task: Semantic Role Labeling'.
Dahl, D. A.; Palmer, M. S. & Passonneau, R. J. (1987), Nominalizations in PUNDIT, in 'Proceedings of the 25th annual meeting on Association for Computational Linguistics', Association for Computational Linguistics, Morristown, NJ, USA, pp. 131-139.
Girju, R.; Nakov, P.; Nastase, V.; Szpakowicz, S.; Turney, P. & Yuret, D. (2007), SemEval-2007 Task 04: Classification of Semantic Relations between Nominals, in 'Proceedings of the 4th International Workshop on Semantic Evaluations'.
Hull, R. & Gomez, F. (1996), Semantic Interpretation of Nominalizations, in 'Proceedings of AAAI'.
Jiang, Z. & Ng, H. (2006), Semantic Role Labeling of NomBank: A Maximum Entropy Approach, in 'Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing'.
Kaisser, M. & Webber, B. (2007), Question Answering based on Semantic Roles, in 'ACL 2007 Workshop on Deep Linguistic Processing', Association for Computational Linguistics, Prague, Czech Republic, pp. 41-48.
Kingsbury, P. & Palmer, M. (2003), Propbank: the next level of treebank, in 'Proceedings of Treebanks and Lexical Theories'.
Lapata, M. (2000), The Automatic Interpretation of Nominalizations, in 'Proceedings of the Seventeenth National Conference on Artificial Intelligence and Twelfth Conference on Innovative Applications of Artificial Intelligence', AAAI Press / The MIT Press, pp. 716-721.
References (cont’d)








Liu, C. & Ng, H. (2007), Learning Predictive Structures for Semantic Role Labeling of NomBank, in 'Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics', Association for Computational Linguistics, Prague, Czech Republic, pp. 208-215.
Macleod, C.; Grishman, R.; Meyers, A.; Barrett, L. & Reeves, R. (1998), Nomlex: A lexicon of nominalizations, in 'Proceedings of the Eighth International Congress of the European Association for Lexicography'.
Marcus, M.; Santorini, B. & Marcinkiewicz, M. A. (1993), 'Building a large annotated corpus of English: the Penn TreeBank', Computational Linguistics 19, 313-330.
Meyers, A. (2007), 'Annotation Guidelines for NomBank - Noun Argument Structure for PropBank', Technical report, New York University.
Meyers, A.; Macleod, C.; Yangarber, R.; Grishman, R.; Barrett, L. & Reeves, R. (1998), Using NOMLEX to produce nominalization patterns for information extraction, in 'Proceedings of the COLING-ACL Workshop on the Computational Treatment of Nominals'.
Ponzetto, S. P. & Strube, M. (2006), Exploiting semantic role labeling, WordNet and Wikipedia for coreference resolution, in 'Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics', Association for Computational Linguistics, Morristown, NJ, USA, pp. 192-199.
Shen, D. & Lapata, M. (2007), Using Semantic Roles to Improve Question Answering, in 'Proceedings of the Conference on Empirical Methods in Natural Language Processing and on Computational Natural Language Learning', pp. 12-21.
Surdeanu, M.; Harabagiu, S.; Williams, J. & Aarseth, P. (2003), Using predicate-argument structures for information extraction, in 'Proceedings of the 41st Annual Meeting on Association for Computational Linguistics'.