Automatic Labeling of Semantic Roles

By Daniel Gildea and Daniel Jurafsky
Presented By Kino Coursey
Outline
• Their Goals
• Semantic Roles
• Related Work
• Methodology
• Results
• Their Conclusions
Their Goals
• To create a system that can identify the semantic relationships, or semantic roles, filled by the syntactic constituents of a sentence and place them into a semantic frame
• Lexical and syntactic features are derived from parse trees and used to build statistical classifiers from hand-annotated training data
Potential Users
• Shallow semantic analysis would be useful in a number of NLP tasks
• Domain-independent starting point for information extraction
• Word sense disambiguation based on the current semantic role
• Intermediate representation for translation and summarization
• Adding semantic roles could improve parser and speech recognition accuracy
Their Approach
• Treat the role assignment problem as being like other tagging problems
• Use recent successful methods from probabilistic parsing and statistical classification
• Use the hand-labeled FrameNet database to provide training data: over 50,000 sentences from the British National Corpus (BNC)
• The FrameNet roles define the tag set
Semantic Roles
• Historically two types of roles
  • Very abstract, like AGENT and PATIENT
  • Verb-specific, like EATER and EATEN for “eat”
• FrameNet defines an intermediate, schematic representation of situations, with participants, props, and conceptual roles
• A frame, being a situation description, can be activated by multiple verbs or other constituents
Frame Advantages
• Avoids the difficulty of trying to find a small set of universal, abstract thematic roles
• Has as many roles as necessary to describe the situation with minimal loss of information or discriminating power
• Abstract roles can be defined as high-level roles of abstract frames such as “action” or “motion” at the top of the frame hierarchy
Example Domains and Frames
Examples of Semantic Roles
Example FrameNet Markup
<CORPUS CORPNAME="bnc" DOMAIN="motion" FRAME="removing"
LEMMA="take.v">
<S TPOS="80499932">
<T TYPE="sense1"></T>
<C FE="Agt" PT="NP" GF="Ext">Pioneer/VVB European/AJ0</C>
settlers/NN2
used/VVD-VVN
several/DT0
methods/NN2
to/TO0
<C TARGET="y"> take/VVI</C>
<C FE="Thm" PT="NP" GF="Obj">land/NN1</C>
<C FE="Src" PT="PP" GF="Comp">from/PRP indigenous/AJ0
people/NN0</C>
./PUN
</S>
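As a rough illustration of what this markup encodes (this is not the authors' tooling; the regular expressions and variable names are my own sketch), the labeled constituents above can be pulled out with a few lines of Python:

import re

# Minimal sketch: extract the target word and frame elements, with their
# phrase type (PT) and grammatical function (GF), from one FrameNet-style
# annotated sentence such as the example above.
sentence = '''<C FE="Agt" PT="NP" GF="Ext">Pioneer/VVB European/AJ0</C>
settlers/NN2 used/VVD-VVN several/DT0 methods/NN2 to/TO0
<C TARGET="y"> take/VVI</C>
<C FE="Thm" PT="NP" GF="Obj">land/NN1</C>
<C FE="Src" PT="PP" GF="Comp">from/PRP indigenous/AJ0 people/NN0</C> ./PUN'''

# Each <C ...>...</C> span is either the target word or a frame element.
for attrs, text in re.findall(r'<C([^>]*)>(.*?)</C>', sentence, re.S):
    attr_dict = dict(re.findall(r'(\w+)="([^"]*)"', attrs))
    words = [tok.split('/')[0] for tok in text.split()]  # strip POS tags
    print(attr_dict, ' '.join(words))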
Related Work
• Traditional parsing and understanding systems rely on hand-developed grammars
  • Must anticipate the way semantic roles are realized through syntax
  • Time-consuming to develop
  • Limited coverage (recall is bounded by the constructions the grammar writers thought to include)
Related Work
• Others have used data-driven approaches for template-based semantic analysis in “shallow” systems
• Miller (1996), Air Travel Information System (ATIS): probability of a constituent filling slots in frames; each node could carry both semantic and syntactic elements
• Data-driven information extraction by Riloff: automatically derived case frames for words in a domain
Related Work
• Blaheta and Charniak used a statistical algorithm for assigning Penn Treebank function tags, reaching an F-measure of 87%, or 99% when ‘no tag’ is a valid choice
Methodology
• Two-part strategy
  • Identify the boundaries of the frame elements in the sentence
  • Given the boundaries, label each with the correct role
• Statistics-based: train a classifier on a labeled training set, then test on an unlabeled test set
Methodology
• Training
  • Parse about 37,000 training sentences with the Collins parser
  • Match annotated frame elements to parse constituents
  • Extract various features from the string of words and the parse tree
• Testing
  • Run the parser on the test sentences and extract the same features
  • The probability of each semantic role r is computed from the features (see the sketch below)
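A minimal sketch of this basic statistical classifier (illustrative Python, not the authors' code): count how often each role co-occurs with a feature combination in the hand-annotated training data, then pick the most probable role at test time.

from collections import Counter, defaultdict

counts = defaultdict(Counter)  # feature combination -> counts of each role

def train(examples):
    # examples: (feature_tuple, role) pairs from the annotated training sentences
    for features, role in examples:
        counts[features][role] += 1

def label(features):
    # Relative-frequency estimate of P(role | features); return the argmax role,
    # or None if this feature combination was never seen in training.
    role_counts = counts[features]
    return role_counts.most_common(1)[0][0] if role_counts else None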
Features used
• Phrase Type: standard syntactic category (NP, VP, S)
• Grammatical Function
  • Relation to the rest of the sentence (subject of the verb, object of the verb, …)
  • Only used for NPs
• Position
  • Before or after the predicate defining the frame
  • Correlated with grammatical function
  • Redundant backup information
• Voice: active/passive classification, using 10 passive-identifying patterns
• Head Word: the head word of each constituent
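For concreteness, here is one way the features above could be bundled into a single record per candidate constituent (the field names and example values are illustrative, not taken from the paper):

from dataclasses import dataclass

@dataclass(frozen=True)
class RoleFeatures:
    phrase_type: str     # e.g. "NP", "PP", "S"
    gram_func: str       # e.g. subject or object of the verb; only used for NPs
    before_target: bool  # position: before or after the frame-defining predicate
    passive: bool        # voice, from the passive-identifying patterns
    head_word: str       # lexical head of the constituent
    target: str          # the predicate evoking the frame

# Example: the object NP "land" of the target verb "take" in an active sentence
example = RoleFeatures("NP", "object", False, False, "land", "take")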
Parsed Sentence with FrameNet role assignments
Testing
• FrameNet corpus test set
  • 10% of the sentences for each target word -> test set
  • 10% of the sentences for each target word -> tuning set
  • Target words with fewer than ten sentences were ignored
  • Average number of sentences per target word = 34 [Too SPARSE!!!]
  • Average number of sentences per frame = 732
Sparseness Problem
• Problem: the data are too sparse to directly estimate probabilities conditioned on the full set of features
• Approach: build classifiers by combining probabilities from distributions conditioned on combinations (subsets) of the features (see the sketch below)
• Additional problem: the FrameNet data were selected to show prototypical examples of semantic frames, not as a random sample for each frame
• Approach: collect more data in the future
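A sketch of the sparseness workaround (the particular feature subsets below are illustrative; the paper defines its own set): estimate several smaller conditional distributions, each over a subset of the features that is dense enough to count reliably.

from collections import Counter, defaultdict

# Feature subsets, ordered roughly from most to least specific (illustrative).
SUBSETS = [
    ("head_word", "target"),
    ("phrase_type", "gram_func", "target"),
    ("phrase_type", "before_target", "passive", "target"),
    ("target",),
]

tables = {subset: defaultdict(Counter) for subset in SUBSETS}

def train(examples):
    # examples: (feature_dict, role) pairs
    for feats, role in examples:
        for subset in SUBSETS:
            key = tuple(feats[f] for f in subset)
            tables[subset][key][role] += 1

def p_role(subset, feats, role):
    counts = tables[subset][tuple(feats[f] for f in subset)]
    total = sum(counts.values())
    return counts[role] / total if total else None  # None = not covered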
Results: Probability Distributions
• Coverage = % of the test data for which the conditioning features were seen in training
• Accuracy = % of the covered test data for which the correct role is predicted (similar to precision)
• Performance = overall % of the test data for which the correct role is predicted (similar to recall)
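The three numbers relate in a simple way: performance is roughly coverage times accuracy, since uncovered items count as errors. A small sketch (names illustrative):

def evaluate(predictions):
    # predictions: list of (covered, predicted_role, gold_role) triples, where
    # covered means the conditioning features were seen in training
    n = len(predictions)
    seen = [p for p in predictions if p[0]]
    coverage = len(seen) / n
    accuracy = sum(pred == gold for _, pred, gold in seen) / len(seen)
    performance = sum(pred == gold for _, pred, gold in predictions) / n
    return coverage, accuracy, performance  # performance ≈ coverage * accuracy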
Results: Simple Probabilities
• Used simple empirical distributions
Results: Linear Interpolation
Results: Geometric mean in the log domain
Results: Combining Data
• Schemes giving more weight to distributions with more data did not have a significant effect
• Role assignments depend only on the relative ranking of the roles, so fine-tuning the weights makes little difference
• Backoff combination: use less specific distributions only when the more specific ones have no data (see the sketch below)
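A sketch of the backoff idea, reusing the subset tables from the sparseness sketch above (again illustrative, not the authors' exact combination scheme):

def backoff_label(feats, roles):
    # Try the most specific distribution first; fall back to a less specific
    # one only when the more specific one has no training data.
    for subset in SUBSETS:
        scores = {r: p_role(subset, feats, r) for r in roles}
        if any(v is not None for v in scores.values()):
            return max(roles, key=lambda r: scores[r] or 0.0)
    return None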
Results: Linear Backoff Was the Best
• Final system performance was 80.4%, up from the 40.9% baseline
• Linear backoff scored 80.4% on the development set and 76.9% on the test set
• The baseline scored 40.9% on the development set and 40.6% on the test set
Results: Their Discussion
• Constituent position relative to the target word plus active/passive information (78.8%) performed about as well as reading grammatical functions off the parse tree (79.2%)
• Using active/passive information improved performance from 78.8% to 80.5%; about 5% of the examples were passives
• Lexicalization via head words, when available, works well
  • P(role|head,target) is available for only 56.0% of the data
  • P(role|head,target) is 86.7% correct without using any syntactic features
Results: Lexical Clustering
• Since head words perform so well but are so sparse, try clustering to improve coverage
• Compute soft clusters for nouns, using only frame elements with noun head words from the BNC
• P(r|h,nt,t) = Σc P(r|c,nt,t) · P(c|h), summed over the clusters c that h belongs to (sketched below)
• Unclustered head-word data are 87.6% correct but cover only 43.7% of the data
• Clustered head words are 79.9% correct for the 97.9% of nominal head words in the vocabulary
• Adding clustering of NP constituents improved overall performance from 80.4% to 81.2%
• (Question: Would other lexical semantic resources help?)
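The clustering formula above can be read as marginalizing over the clusters a head word may belong to. A sketch, assuming a soft clustering model P(c|h) trained separately (e.g. by EM on BNC co-occurrences); every name here is illustrative:

def p_role_clustered(role, head, nonterminal, target,
                     clusters_of, p_role_given_cluster, p_cluster_given_head):
    # P(r|h,nt,t) = sum over clusters c of P(r|c,nt,t) * P(c|h)
    return sum(
        p_role_given_cluster(role, c, nonterminal, target)
        * p_cluster_given_head(c, head)
        for c in clusters_of(head)
    )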
Automatic Identification of Frame Element Boundaries
• The original experiments used hand-annotated frame element boundaries
• Here, features of each constituent in the sentence's parse tree are used to decide whether it is likely to be a frame element
• The system is given the human-annotated target word and frame
• Main feature used: the path from the target word through the parse tree to the constituent, using upward and downward links (see the sketch below)
• Used P(fe|path), P(fe|path,target), and P(fe|head,target)
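A sketch of how such a path feature could be computed, assuming an NLTK-style parse tree as the representation (the paper describes the feature, not this code); here '^' marks upward steps and '!' marks downward steps:

from nltk import Tree

def tree_path(tree, from_pos, to_pos):
    # from_pos: tree position of the target word's preterminal (e.g. the VB node)
    # to_pos:   tree position of the candidate constituent (e.g. an NP node)
    common = 0
    while (common < min(len(from_pos), len(to_pos))
           and from_pos[common] == to_pos[common]):
        common += 1
    # Labels going up from the target to the lowest common ancestor...
    up = [tree[from_pos[:i]].label() for i in range(len(from_pos), common - 1, -1)]
    # ...then down from just below the common ancestor to the constituent.
    down = [tree[to_pos[:i]].label() for i in range(common + 1, len(to_pos) + 1)]
    return "^".join(up) + "!" + "!".join(down)

t = Tree.fromstring("(S (NP (PRP He)) (VP (VB ate) (NP (DT the) (NN apple))))")
print(tree_path(t, (1, 0), (1, 1)))  # direct object: "VB^VP!NP"
print(tree_path(t, (1, 0), (0,)))    # subject:       "VB^VP^S!NP"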
Automatic Identification of Frame Element Boundaries
• P(fe|path,target) performs relatively poorly, since there are only about 30 sentences for each target word
• P(fe|head,target) alone is not a useful classifier, but it helps under linear interpolation
• The system can only identify frame elements that correspond to a constituent in the parse tree, but partial matching helps
• With relaxed matching, 86% agreement with the hand annotations
• When correctly identified frame elements are fed into the previous role labeler, 79.6% are labeled correctly, in the same range as with human-annotated boundaries
• (Question: If a frame element is correctly identified, shouldn’t this be the case?)
Their Conclusions
• Their system can label roles with some accuracy
• Lexical statistics on constituent head words were the most important feature used
• The problem is that, while very accurate, head-word statistics are very sparse
• The key to high overall performance was combining features
• The combined system was more accurate than any single feature alone; the specific combination method was less important