Determining the Sentiment of Opinions


DETERMINING THE SENTIMENT OF OPINIONS
Presentation by Md Mustafizur Rahman (mr4xb)
OUTLINE

What is an Opinion?

Problem definition

Word Sentiment Classifier

Sentence Sentiment Classifier

Experimental Analysis

Shortcomings

Future work
WHAT IS AN OPINION?

An opinion is a quadruple [Topic, Holder, Claim, Sentiment]
  The Holder believes a Claim about the Topic and in many cases associates a Sentiment with that belief.
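The quadruple can be pictured as a small record type. The following is only a sketch: the class name `Opinion`, its field names, and the use of `None` for an absent sentiment are illustrative assumptions, not something the slides specify.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Opinion:
    """One opinion as the quadruple [Topic, Holder, Claim, Sentiment]."""
    topic: str
    holder: str
    claim: str
    # Assumption: None models an opinion that carries no sentiment.
    sentiment: Optional[str] = None


# An opinion without sentiment and one with explicit sentiment.
flat_world = Opinion(topic="the world", holder="I", claim="the world is flat")
likes_apple = Opinion(topic="apple", holder="I", claim="I like apple",
                      sentiment="positive")
```

Making the sentiment field optional mirrors the point that an opinion may or may not contain a sentiment.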


An opinion may or may not contain a sentiment
  e.g. I believe the world is flat. (sentiment absent)
A sentiment can be explicit or implicit
  e.g. I like apple. (explicit)
  e.g. We should decrease our dependence on oil. (implicit)
PROBLEM DEFINITION

Opinion = [Topic, Holder, Claim, Sentiment]

Given
  a Topic
  a set of texts about the topic
Find
  the sentiment (positive or negative only) expressed about the topic in each sentence
  the people who hold that sentiment
AUTHORS' APPROACH

4 basic stages
  Calculation of the polarity of sentiment-bearing words (Word Sentiment Classifier)
  Selection of sentences containing both the topic and a holder
  Holder-based region identification
  Combination of these polarities to produce the sentence sentiment (Sentence Sentiment Classifier)
WORD SENTIMENT CLASSIFIER


To build a classifier we need training data
How to generate training data for the word sentiment classifier?
  Assemble a small number of seed words by hand
  The seed word list contains only words of positive or negative polarity
  Then grow this list by adding synonyms and antonyms from WordNet [1]
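The list-growing step can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `expand_seeds` and the two toy lookup tables are hypothetical stand-ins for WordNet, and the rule that synonyms keep a word's polarity while antonyms flip it is the natural reading of the slide rather than a confirmed detail of the authors' procedure.

```python
def expand_seeds(positive, negative, synonyms, antonyms, rounds=1):
    """Grow two seed lists: synonyms keep the polarity, antonyms flip it."""
    pos, neg = set(positive), set(negative)
    for _ in range(rounds):
        new_pos, new_neg = set(pos), set(neg)
        for word in pos:
            new_pos.update(synonyms.get(word, ()))
            new_neg.update(antonyms.get(word, ()))
        for word in neg:
            new_neg.update(synonyms.get(word, ()))
            new_pos.update(antonyms.get(word, ()))
        pos, neg = new_pos, new_neg
    return pos, neg


# Hypothetical toy lookup tables standing in for WordNet.
TOY_SYNONYMS = {"good": ["fine", "nice"], "bad": ["poor", "awful"]}
TOY_ANTONYMS = {"good": ["bad"], "nice": ["nasty"]}

pos_words, neg_words = expand_seeds(["good"], ["bad"],
                                    TOY_SYNONYMS, TOY_ANTONYMS, rounds=2)
```

Nothing in this sketch prevents a word from landing in both sets, so an ambiguous word can end up labelled both positive and negative.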
WORD SENTIMENT CLASSIFIER
WORDNET
WORD SENTIMENT CLASSIFIER
WORDNET (CONTD.)
Figure: An example of the relationship between hyponyms and hypernyms [source: Wikipedia]
WORD SENTIMENT CLASSIFIER (CONTD.)


Initial seed word list
  Adjectives (15 positive and 19 negative)
  Verbs (23 positive and 21 negative)
Final seed word list
  Adjectives (5880 positive and 6233 negative)
  Verbs (2840 positive and 3239 negative)
Some words, e.g. “great” and “strong”, appear in both the positive and the negative category.
WORD SENTIMENT CLASSIFIER (CONTD.)

Now we have
  a set of words
  a class label (or polarity) for each word, either positive or negative
How to calculate the strength of the sentiment polarity?
  For a new word w, we first compute its synonym set $(syn_1, syn_2, \ldots, syn_n)$ from WordNet.
  Then we compute $\arg\max_c P(c \mid w)$, which is equivalent to $\arg\max_c P(c \mid syn_1, syn_2, \ldots, syn_n)$.
  Here c is the sentiment category (positive or negative).
WORD SENTIMENT CLASSIFIER (CONTD.)

There are two possible ways to calculate $\arg\max_c P(c \mid w)$

Approach 1
$$
\begin{aligned}
\arg\max_c P(c \mid w) &= \arg\max_c P(c)\, P(w \mid c) \\
&= \arg\max_c P(c)\, P(syn_1, syn_2, \ldots, syn_n \mid c) \\
&= \arg\max_c P(c) \prod_{k=1}^{m} P(f_k \mid c)^{\mathrm{count}(f_k,\, \mathrm{synset}(w))}
\end{aligned}
$$
where f_k is the k-th feature of category c, and count(f_k, synset(w)) is the total number of occurrences of f_k in the synonym set of w.
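Approach 1 can be sketched in a few lines, working in log space to avoid underflow. The function name `classify_approach1`, the toy priors, and the toy feature probabilities are all assumptions made for illustration; the slides do not say how P(f_k | c) is estimated, so the sketch simply takes those values as given and assumes they are nonzero.

```python
import math


def classify_approach1(synset, prior, feature_prob):
    """Return argmax_c P(c) * prod_k P(f_k|c)^count(f_k, synset(w)) and all log scores."""
    scores = {}
    for c in prior:
        log_score = math.log(prior[c])
        for feature, prob in feature_prob[c].items():
            # count(f_k, synset(w)): occurrences of the feature in w's synonym set
            count = synset.count(feature)
            log_score += count * math.log(prob)
        scores[c] = log_score
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical toy values; probabilities are assumed to be nonzero (smoothed).
PRIOR = {"positive": 0.5, "negative": 0.5}
FEATURE_PROB = {
    "positive": {"good": 0.6, "bad": 0.1},
    "negative": {"good": 0.1, "bad": 0.6},
}

label, log_scores = classify_approach1(["good", "good", "bad"], PRIOR, FEATURE_PROB)
```

With these toy numbers the positive score is 0.5 × 0.6² × 0.1 = 0.018 and the negative score is 0.5 × 0.1² × 0.6 = 0.003, so the word is labelled positive.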

WORD SENTIMENT CLASSIFIER (CONTD.)

There are two possible ways to calculate $\arg\max_c P(c \mid w)$

Approach 2
$$
\begin{aligned}
\arg\max_c P(c \mid w) &= \arg\max_c P(c)\, P(w \mid c) \\
&= \arg\max_c P(c)\, \frac{\sum_{i=1}^{n} \mathrm{count}(syn_i, c)}{\mathrm{count}(c)}
\end{aligned}
$$
where count(syn_i, c) is the number of occurrences of w’s synonyms in the list of c.
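A minimal sketch of Approach 2 follows. It assumes that count(c) is the total number of words in the list for category c, which the slide does not define; the function name `score_approach2` and the toy word lists are likewise hypothetical.

```python
def score_approach2(synonyms, word_lists, prior):
    """Score each category c as P(c) * sum_i count(syn_i, c) / count(c)."""
    scores = {}
    for c, words in word_lists.items():
        # count(syn_i, c): how often each synonym of w occurs in the list of c
        hits = sum(words.count(syn) for syn in synonyms)
        # Assumption: count(c) is the size of the word list of c.
        scores[c] = prior[c] * hits / len(words)
    return scores


# Hypothetical toy lists; "great" sits in both, as some ambiguous words do.
WORD_LISTS = {
    "positive": ["good", "nice", "fine", "great"],
    "negative": ["bad", "poor", "awful", "great"],
}
PRIOR = {"positive": 0.5, "negative": 0.5}

scores = score_approach2(["nice", "fine", "great"], WORD_LISTS, PRIOR)
best = max(scores, key=scores.get)
```

Here three synonyms hit the positive list and one hits the negative list, giving 0.375 against 0.125.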
WORD SENTIMENT CLASSIFIER (CONTD.)


The word “amusing”, for example, is classified as carrying primarily positive sentiment, and “blame” as primarily negative.
“afraid” with strength -0.99 represents strong negativity, while “abysmal” with strength -0.61 represents weaker negativity.
SENTENCE SENTIMENT CLASSIFIER

Consists of 4 parts:
  Identification of the Topic in the sentence (i.e. direct matching)
  Identification of the opinion holder
  Identification of the region
  Development of a model to combine sentiments
SENTENCE SENTIMENT CLASSIFIER (CONTD.)
HOLDER IDENTIFICATION

Assumption
  Persons and organizations are the only opinion holders
  For a sentence with more than one holder, just pick the one closest to the Topic
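The closest-holder rule can be sketched as below, assuming candidates arrive as (name, token index) pairs from a named entity tagger and that distance is measured in tokens. Both the function name `pick_holder` and that distance measure are illustrative assumptions.

```python
def pick_holder(candidates, topic_index):
    """Pick the candidate holder whose token index is nearest to the topic."""
    # candidates: list of (name, token_index) pairs
    return min(candidates, key=lambda item: abs(item[1] - topic_index))[0]


# Hypothetical sentence with two tagged entities and the topic at token 3.
holder = pick_holder([("Smith", 0), ("the Senate", 9)], topic_index=3)
```

In this example "Smith" is 3 tokens from the topic and "the Senate" is 6 tokens away, so "Smith" is chosen.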


Method
  BBN's named entity tagger IdentiFinder [2]
  A software tool [http://www.bbn.com/technology/speech/identifinder]
SENTENCE SENTIMENT CLASSIFIER (CONTD.)
SENTIMENT REGION IDENTIFICATION
Where to look for the sentiment?
  Several sentiment regions are proposed

Window 1: Full sentence
Window 2: Words between Holder and Topic
Window 3: Window 2 ± 2 words
Window 4: Window 2 to the end of the sentence
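The four windows can be expressed as slices over a tokenized sentence. This sketch assumes the holder and the topic are each a single token, that Window 2 excludes both of them, and that Window 3 extends Window 2 by two tokens on each side; the function name `sentiment_regions` and these boundary choices are assumptions, since the slide gives no exact indexing.

```python
def sentiment_regions(tokens, holder_index, topic_index):
    """Return the four candidate sentiment regions as token lists."""
    first, last = sorted((holder_index, topic_index))
    between_start, between_end = first + 1, last  # slice bounds of Window 2
    return {
        1: tokens[:],                                           # full sentence
        2: tokens[between_start:between_end],                   # between holder and topic
        3: tokens[max(0, between_start - 2):between_end + 2],   # Window 2 +/- 2 words
        4: tokens[between_start:],                              # Window 2 to sentence end
    }


# Hypothetical example sentence: holder at index 0, topic at index 3.
tokens = ["Smith", "strongly", "opposes", "NAFTA", "because", "jobs", "vanish"]
regions = sentiment_regions(tokens, holder_index=0, topic_index=3)
```

With these choices Window 2 is ["strongly", "opposes"], while Windows 3 and 4 also take in the topic token and the words after it.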
SENTENCE SENTIMENT CLASSIFIER (CONTD.)
CLASSIFICATION MODEL

3 different models

Model 0: $\prod (\text{signs in region})$
  Signs can be positive or negative
Model 1: Harmonic mean of the sentiment in the region
$$
p(c \mid s) = \frac{1}{n(c)} \sum_{i=1}^{n} p(c \mid w_i), \quad \text{if } \arg\max_j p(c_j \mid w_i) = c
$$
SENTENCE SENTIMENT CLASSIFIER (CONTD.)
CLASSIFICATION MODEL

Model 1 (Contd.)
  n(c) is the number of words in the region whose sentiment category is c
  s is the sentiment strength

Model 2: Geometric mean of the sentiment in the region
$$
p(c \mid s) = 10^{\,n(c) - 1} \times \prod_{i=1}^{n} p(c \mid w_i), \quad \text{if } \arg\max_j p(c_j \mid w_i) = c
$$
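The three models can be compared on a toy region. This is a sketch under assumptions: each sentiment-bearing word is represented by a hypothetical dictionary of p(c | w) values, a word's category is the argmax over those values, and the function names are illustrative. As written in the formula, Model 1 averages p(c | w_i) over the n(c) words whose category is c, and the sketch follows the formula. How the per-category scores are turned into a final decision is not stated on the slides, so it is left out.

```python
import math


def word_category(probs):
    """Category of a word: the c that maximizes p(c | w)."""
    return max(probs, key=probs.get)


def model0(region):
    """Product of the signs (+1 positive, -1 negative) of the words in the region."""
    sign = 1
    for probs in region:
        sign *= 1 if word_category(probs) == "positive" else -1
    return sign


def model1(region, c):
    """Average of p(c | w_i) over the n(c) words whose category is c."""
    strengths = [probs[c] for probs in region if word_category(probs) == c]
    return sum(strengths) / len(strengths) if strengths else 0.0


def model2(region, c):
    """10^(n(c) - 1) times the product of p(c | w_i) over words of category c."""
    strengths = [probs[c] for probs in region if word_category(probs) == c]
    if not strengths:
        return 0.0
    return 10 ** (len(strengths) - 1) * math.prod(strengths)


# Hypothetical region: one positive word and two negative words.
region = [
    {"positive": 0.9, "negative": 0.1},
    {"positive": 0.2, "negative": 0.8},
    {"positive": 0.3, "negative": 0.7},
]
```

On this region Model 0 gives +1 because the two negative signs cancel, Model 1 gives 0.9 for positive against 0.75 for negative, and Model 2 gives 0.9 for positive against about 5.6 for negative, which shows how Model 2 rewards the number of words in a category.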
SYSTEM ARCHITECTURE
EXPERIMENTAL ANALYSIS

Two sets of experiments for
  Word Sentiment Classifier
  Sentence Sentiment Classifier
EXPERIMENTAL ANALYSIS (CONTD.)
WORD SENTIMENT CLASSIFIER

Dataset
  A word list from the TOEFL exam
  A predefined list containing 19748 English adjectives and 8011 English verbs
  Take the intersection of the above two lists
  Finally, randomly select 462 adjectives and 502 verbs
Classification of dataset
  Human 1 and Human 2 label the adjectives
  Human 2 and Human 3 label the verbs
EXPERIMENTAL ANALYSIS (CONTD.)
WORD SENTIMENT CLASSIFIER
Class Label
  Positive, Negative and Neutral
Measurement Type
  Strict: considers all three class labels
  Lenient: considers two class labels, Negative and Positive merged with Neutral
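The difference between the two measures can be made concrete with a small agreement calculation. The sketch assumes that agreement is the fraction of items on which two annotators give the same label, and that the lenient measure folds Neutral into Positive before comparing; the function name `agreement` and the toy labels are hypothetical.

```python
def agreement(labels_a, labels_b, lenient=False):
    """Fraction of items on which two annotators agree."""
    def merge(label):
        # Lenient measure (assumed reading): neutral is merged with positive.
        return "positive" if lenient and label == "neutral" else label

    matches = sum(merge(a) == merge(b) for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


# Hypothetical labels from two annotators for four words.
human_1 = ["positive", "neutral", "negative", "neutral"]
human_2 = ["neutral", "neutral", "negative", "positive"]

strict_score = agreement(human_1, human_2)
lenient_score = agreement(human_1, human_2, lenient=True)
```

On these toy labels the strict agreement is 0.5 and the lenient agreement is 1.0, since every disagreement is between Positive and Neutral.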

Table: Inter Human Agreement
EXPERIMENTAL ANALYSIS (CONTD.)
WORD SENTIMENT CLASSIFIER
Table: Human-Machine Agreement (Small Seed Set)
Table: Human-Machine Agreement (Larger Seed Set)
EXPERIMENTAL ANALYSIS (CONTD.)
SENTENCE SENTIMENT CLASSIFIER

Dataset
  100 sentences from the DUC 2001 Corpus [3]
  Topics covered: “illegal alien”, “term limit”, “gun control” and “NAFTA”
Classification of Sentence
  Two humans classify each sentence into one of three class labels: positive, negative and N/A
EXPERIMENTAL ANALYSIS (CONTD.)
SENTENCE SENTIMENT CLASSIFIER

Experiment Variants
  Three different models
  Four different windows
  Two different word classifier models
  Manually annotated holder vs. automatic holder
So in total there are 16 variants each for Model 1 and Model 2, and 8 variants for Model 0.
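The variant counts follow from multiplying the options. The sketch below enumerates them; the option labels are hypothetical, and leaving the word classifier out of the Model 0 grid is an assumption made to reproduce the stated count of 8, presumably because Model 0 uses only the signs of the words.

```python
from itertools import product

# Hypothetical labels for the experimental options.
windows = [1, 2, 3, 4]
word_classifiers = ["approach_1", "approach_2"]
holders = ["manual", "automatic"]

# Models 1 and 2: window x word classifier x holder source.
variants_model_1_or_2 = list(product(windows, word_classifiers, holders))

# Model 0 (assumption): window x holder source only.
variants_model_0 = list(product(windows, holders))
```

This gives 4 × 2 × 2 = 16 variants for each of Model 1 and Model 2, and 4 × 2 = 8 for Model 0.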
EXPERIMENTAL ANALYSIS (CONTD.)
SENTENCE SENTIMENT CLASSIFIER
Table: Results with manually annotated Holder
Table: Results with automatic Holder
EXPERIMENTAL ANALYSIS (CONTD.)
SENTENCE SENTIMENT CLASSIFIER

Performance Metric
  Correctness: correct identification of both holder and sentiment
  Best model: Model 0
  Best window: Window 4
Accuracy
  81% accuracy obtained with manually annotated holders
  67% accuracy obtained with automatic holders
SHORTCOMINGS

Considers only a unigram model
  As a result, the model fails on words that can carry both positive and negative sentiment
  E.g.: Term limit really hit at democracy.
The model cannot infer sentiment from facts
  The absence of sentiment-bearing adjectives, verbs and nouns prevents classification
  E.g.: She thinks term limit will give women more opportunities in politics.
FUTURE WORK

One assumption of this work is that the topic is given
  Can we extract the topic automatically?
  E.g.: Twitter hashtags?
Sentiment classes beyond only positive and negative
Context-dependent sentiment (bi-gram or tri-gram analysis)
REFERENCES



[1] Miller, G.A., R. Beckwith, C. Fellbaum, D. Gross, and K. Miller. 1993. Introduction to WordNet: An On-Line Lexical Database. http://www.cogsci.princeton.edu/~wn.
[2] BBN named entity tagger IdentiFinder. http://www.bbn.com/technology/speech/identifinder
[3] DUC 2001 Corpus. http://www-nlpir.nist.gov/projects/duc/data.html