Identifying Expressions of Opinion in Context


Eric Breck and Yejin Choi and Claire Cardie
IJCAI 2007
Introduction
• Traditional information extraction: answer
questions about facts
• Extract answers to subjective questions:
how does X feel about Y?
• Subjective information extraction and
question answering will require techniques
to analyze text below the sentence level
Introduction:
System Requirements
• Is the opinion's polarity positive, negative, or neutral?
• With what strength or intensity is the
opinion expressed: mild, medium, strong or
extreme?
• Who or what is the source, or holder, of the
opinion?
• What is its target, i.e. what is the opinion
about?
Introduction: Examples
• Minister Vedrine criticized the White House
reaction.
– the agent role = “Minister Vedrine”
– the object/theme role = “White House reaction”
• 17 persons were killed by sharpshooters faithful
to the president.
• Tsvangirai said the election result was
“illegitimate” and a clear case of “highway
robbery”.
• Criminals have been preying on Korean travelers
in China.
Introduction
• Direct subjective expressions (DSEs)
– criticized, faithful to
– said (a speech event, if subjective)
• Expressive subjective elements (ESEs)
– illegitimate, highway robbery
– preying on (instead of mugging)
• No previous work has directly tackled the
problem of opinion expression identification.
Subjective Expressions
• The expressions can vary in length from one word
to over twenty words.
• They may be verb phrases, noun phrases, or
strings of words that do not correspond to any
linguistic constituent.
• Subjectivity is a realm of expression where writers
get quite creative, so no short fixed list can capture
all expressions of interest.
• Also, an expression which is subjective in one
context is not always subjective in another context.
Approach
• This task is treated as a tagging problem.
• Conditional random field
• Class variable
– IOB vs IO
• Features
• The model is a linear-chain conditional random
field, trained with the MALLET toolkit.
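The two class-variable encodings can be illustrated with a minimal sketch. The function name `encode_spans`, the half-open `(start, end)` span convention, and the whitespace tokenization are assumptions made for illustration; they are not taken from the slides.

```python
from typing import List, Tuple

def encode_spans(n_tokens: int, spans: List[Tuple[int, int]],
                 scheme: str = "IOB") -> List[str]:
    """Label tokens given half-open [start, end) expression spans.

    IOB marks the first token of each expression with B and the rest with I;
    IO marks every token inside an expression with I.
    """
    if scheme not in ("IOB", "IO"):
        raise ValueError("scheme must be 'IOB' or 'IO'")
    tags = ["O"] * n_tokens
    for start, end in spans:
        for i in range(start, end):
            tags[i] = "B" if (scheme == "IOB" and i == start) else "I"
    return tags

# Example sentence from the slides; "criticized" is the DSE (token index 2).
tokens = "Minister Vedrine criticized the White House reaction .".split()
spans = [(2, 3)]
iob_tags = encode_spans(len(tokens), spans, "IOB")
io_tags = encode_spans(len(tokens), spans, "IO")
```

One consequence of the choice: under IO, two expressions that are directly adjacent collapse into a single run of I tags, while IOB keeps them apart through the B tag.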
Features (1)
• Lexical features
– The word at position i relative to the current token.
– Lex-4 to Lex4: 18,000 binary features per position
(vocabulary size)
• Syntactic features
– POS (45 binary features)
– Constituent type of the previous, current, and next
constituent (from the CASS partial parser), 100 binary
features each.
• Dictionary-based features
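The lexical window features (Lex-4 to Lex4) can be sketched as follows. The function name, the lowercasing, and the `<PAD>` symbol for positions beyond the sentence boundary are assumptions made for illustration; the slides do not say how sentence edges are handled.

```python
from typing import Dict, List

def lexical_window_features(tokens: List[str], i: int,
                            window: int = 4) -> Dict[str, bool]:
    """One binary feature per offset in [-window, window] around token i."""
    feats = {}
    for offset in range(-window, window + 1):
        j = i + offset
        # Assumption: positions outside the sentence map to a padding symbol.
        word = tokens[j].lower() if 0 <= j < len(tokens) else "<PAD>"
        feats[f"Lex{offset}={word}"] = True
    return feats

tokens = "Minister Vedrine criticized the White House reaction .".split()
feats = lexical_window_features(tokens, 2)  # features for "criticized"
```

Each of the nine positions activates exactly one feature out of the vocabulary, which is where the per-position feature count comes from.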
Features (2)
• Dictionary-based features: 4 sources
– WordNet: WordNet hypernyms (29,989 binary
features)
– Levin: Levin’s categorization of English words
– FrameNet: the word's category in FrameNet's
categorization of nouns and verbs
– Wilson clues (subjective): strong or weak (two
binary features)
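A minimal sketch of how dictionary lookups might be turned into binary features. The two toy dictionaries below are invented placeholders, not entries from WordNet or from the Wilson clue list, and the feature names are hypothetical; Levin and FrameNet categories could be added in the same way.

```python
from typing import Dict, List

# Toy stand-ins for the real resources (assumed contents, for illustration).
TOY_HYPERNYMS: Dict[str, List[str]] = {"criticized": ["evaluate", "judge"]}
TOY_CLUES: Dict[str, str] = {"criticized": "strong", "said": "weak"}

def dictionary_features(word: str,
                        hypernyms: Dict[str, List[str]],
                        clues: Dict[str, str]) -> Dict[str, bool]:
    """Binary features: one per hypernym, plus strong/weak clue flags."""
    feats = {f"WN-hyper={h}": True for h in hypernyms.get(word, [])}
    strength = clues.get(word)
    feats["clue-strong"] = strength == "strong"
    feats["clue-weak"] = strength == "weak"
    return feats

criticized_feats = dictionary_features("criticized", TOY_HYPERNYMS, TOY_CLUES)
unknown_feats = dictionary_features("table", TOY_HYPERNYMS, TOY_CLUES)
```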
Statistics of Data
MPQA corpus, 535 documents.
135 for training, 400 for testing.
10-fold cross validation
Evaluation
• Metric: Precision/Recall/F-measure
– Exact
– Overlap
• Baselines: dictionary-based
– two dictionaries of subjectivity clues: Wiebe vs.
Wilson
– Wilson is incorporated in this experiment
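The difference between the exact and overlap metrics can be sketched as below. The matching rules are assumptions: "exact" requires identical boundaries, and "overlap" counts a predicted span as correct if it shares at least one token with some gold span (and a gold span as found if some predicted span overlaps it). Spans are half-open `(start, end)` token offsets, and the example spans are made up.

```python
from typing import Callable, List, Tuple

Span = Tuple[int, int]

def exact(p: Span, g: Span) -> bool:
    return p == g

def overlap(p: Span, g: Span) -> bool:
    # Half-open intervals share a token iff each starts before the other ends.
    return p[0] < g[1] and g[0] < p[1]

def prf(pred: List[Span], gold: List[Span],
        match: Callable[[Span, Span], bool]) -> Tuple[float, float, float]:
    """Precision, recall, and F-measure under the given matching rule."""
    matched_pred = sum(any(match(p, g) for g in gold) for p in pred)
    matched_gold = sum(any(match(p, g) for p in pred) for g in gold)
    precision = matched_pred / len(pred) if pred else 0.0
    recall = matched_gold / len(gold) if gold else 0.0
    total = precision + recall
    f = 2 * precision * recall / total if total else 0.0
    return precision, recall, f

gold = [(2, 3), (8, 10)]
pred = [(2, 3), (9, 12), (15, 16)]
exact_scores = prf(pred, gold, exact)
overlap_scores = prf(pred, gold, overlap)
```

In this toy case the second prediction has the wrong boundaries, so it counts under overlap but not under exact matching, and the overlap scores come out higher.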
Results (DSE/ESE)
Results (DSE and ESE)
Results (Dictionary-based)
• WordNet is the most useful
• The other dictionaries only help a little
Discussion
• Rules for boundary agreement are not defined
for the annotations; the order-1 CRF outperforms
the order-0 CRF
• DSEs include speech events like “said” or
“a statement”, which may be objective.
• Expressions of subjectivity tend to cluster,
therefore density-based features might help.
• Inter-annotator agreement: 0.75 for DSEs,
0.72 for ESEs
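The clustering observation above suggests a density-based feature. One possible form, sketched here purely as an assumption (the slides do not define such a feature), is the fraction of tokens in a window around position i that appear in a subjectivity clue set. The function name, window size, and clue set are hypothetical.

```python
from typing import List, Set

def clue_density(tokens: List[str], i: int, clues: Set[str],
                 window: int = 5) -> float:
    """Fraction of tokens within `window` positions of i that are clue words."""
    lo = max(0, i - window)
    hi = min(len(tokens), i + window + 1)
    span = tokens[lo:hi]
    return sum(t.lower() in clues for t in span) / len(span)

# Toy clue set drawn from the ESE examples in the slides.
toy_clues = {"illegitimate", "robbery"}
sentence = "Tsvangirai said the election result was illegitimate".split()
density = clue_density(sentence, len(sentence) - 1, toy_clues, window=2)
```

A real-valued score like this would need to be binned before use in a model that expects binary features.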