Construction of a Sentimental Word Dictionary
Eduard C. Dragut, Cyber Center, Purdue University
Clement Yu, Computer Science Department, University of Illinois at Chicago
Prasad Sistla, Computer Science Department, University of Illinois at Chicago
Weiyi Meng, Computer Science Department, Binghamton University
Motivation
Sentimental Word Dictionary
Why?
• It facilitates opinion mining and opinion retrieval.
What?
• Reviews of and comments about products, services, and government policies.
Where?
• The Web has plenty of reviews, comments, and reports about products, services, etc.
Bottom line:
• Many approaches rely on lexicons of polar words.
Building the Sentimental Dictionary
Construct the sentimental dictionary on top of the electronic dictionary WordNet.
The idea:
1. Start from a small number of words whose polarities are known.
2. Propagate the polarities to the synsets.
3. Determine the polarities of new words.
   • Use inference rules.
4. Go to 2, until no more polarities are inferred.
Contribution
The proposed dictionary has the majority sentiment property:
• Each word with a given part of speech has polarity p (positive or negative) if the majority sense of the word with that part of speech has polarity p, i.e., if its senses with polarity p have a total relative frequency above 0.5.
  – E.g., the word “bland” has 3 senses, 2 of which have negative polarity and 1 positive polarity.
Why is such a property important?
• Deduction of other sentimental words with the same property.
• Detection of inconsistencies in input dictionaries.
A deduction approach
• A set of about 20 inference rules.
The resulting sentimental word dictionary contains approximately 50% more words than the seed dictionary (4,075 inferred words on top of 7,794 input words).
The accuracy of the deduced polarities is comparable to that of human judgment.
• 3 students were asked to evaluate 100 randomly chosen words.
Synset Polarity Inference
Example of an inference rule with one word
Hypothesis: w is a word with polarity p (positive or negative) and two synsets, s1 and s2, each with relative frequency 0.5.
Conclusion: s1 and s2 both have polarity p.
• Use the majority sentiment definition: neither synset alone exceeds 0.5, so p can be the majority only if both synsets have polarity p.
• Example:
  – “advance” has positive polarity in General Inquirer;
  – it has two senses with identical relative frequencies in WordNet;
  – hence, we deduce that both its synsets have positive polarity.
Synset Polarity Inference
Example of an inference rule
Hypothesis: w is a word with polarity p (positive or negative) whose synsets s1, s2, …, sn are of two kinds:
  – synsets with unknown polarities, whose relative frequencies sum to more than 0.5;
  – synsets with known polarities ≠ p.
Conclusion: the synsets with unknown polarities all have polarity p.
• Example:
  – “consummate” has positive polarity in General Inquirer.
Experiments: Automatic Discovery
Data sets
• WordNet [Fellbaum98] and
• 3 sentimental dictionaries: General Inquirer [Stone96], Appraisal Lexicon [Taboada04], and Opinion Finder [Wilson05].
Take the union of the 3 sentimental dictionaries.
POS        Input Words  Inferred Words  Inferred Synsets
Noun             2,315           1,460             1,683
Verb             1,617             844             1,079
Adjective        2,937           1,407             1,907
Adverb             925             364               430
Total            7,794           4,075             5,099
Experiments: Accuracy
100 words were randomly chosen from the 4,075 inferred words according to their distribution by part of speech:
• nouns (22.2%), adjectives (37.5%), adverbs (11.5%), and verbs (28.8%).
3 humans judged their deduced polarities.
• The agreement among the humans is 62%.
• The agreement between the humans and the automatic deduction is 63.3%.
Reason for the low agreement among humans:
• preconceived notions of the polarities of words/phrases,
  – e.g., “eat at”.
References
[Fellbaum98] C. Fellbaum. WordNet: An on-line lexical database and some of its applications. 1998.
[Stone96] P. Stone, D. Dunphy, M. Smith, and J. Ogilvie. The General Inquirer: A computer approach to content analysis. MIT Press, 1996.
[Taboada04] M. Taboada and J. Grieve. Analyzing appraisal automatically. In AAAI Spring Symposium on Exploring Attitude and Affect in Text, 2004.
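The deduction loop of steps 1 to 4, together with the majority sentiment property, can be illustrated with a minimal Python sketch. It assumes the property means "the senses with polarity p have a total relative frequency strictly above 0.5", which is consistent with the two-synset rule for “advance”. It replaces the paper's roughly 20 inference rules with one general test that is sound under that reading: an unknown synset must have polarity p if, without it, p could no longer reach a majority. The function `deduce`, the synset ids, and the toy frequencies are hypothetical. They are not WordNet data and not the authors' code. Words that cannot satisfy the property are reported, mirroring the inconsistency-detection use of the property.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def deduce(senses, seed_words, seed_synsets=None):
    """Hypothetical sketch: propagate polarities until nothing new is inferred.

    senses: word -> list of (synset_id, relative_frequency); the frequencies of
    one word sum to 1. (A real dictionary would key words by (lemma, POS),
    since the property is stated per part of speech.)
    Assumed property: a word has polarity p iff its senses with polarity p
    have a total relative frequency strictly above 1/2.
    Returns (word_polarity, synset_polarity, conflicting_words).
    """
    word_pol = dict(seed_words)          # step 1: words with known polarity
    syn_pol = dict(seed_synsets or {})
    conflicts = set()
    changed = True
    while changed:                       # step 4: repeat until a fixed point
        changed = False
        for word, sense_list in senses.items():
            if word not in word_pol:
                # Step 3: a new word gets polarity p once its p-senses exceed 1/2.
                mass = {}
                for s, f in sense_list:
                    if s in syn_pol:
                        mass[syn_pol[s]] = mass.get(syn_pol[s], 0) + f
                for p, m in mass.items():
                    if m > HALF:
                        word_pol[word] = p
                        changed = True
                if word not in word_pol:
                    continue
            p = word_pol[word]
            known_p = sum((f for s, f in sense_list if syn_pol.get(s) == p), Fraction(0))
            unknown = [(s, f) for s, f in sense_list if s not in syn_pol]
            unknown_mass = sum((f for _, f in unknown), Fraction(0))
            if known_p + unknown_mass <= HALF:
                # Even if every unknown sense had polarity p, p could not be the
                # majority: the inputs are inconsistent for this word.
                conflicts.add(word)
                continue
            # Step 2: an unknown synset is forced to p if the p-mass still
            # reachable without it would not exceed 1/2.
            for s, f in unknown:
                if known_p + unknown_mass - f <= HALF:
                    syn_pol[s] = p
                    changed = True
    return word_pol, syn_pol, conflicts


# Toy data (hypothetical ids and frequencies, not taken from WordNet).
senses = {
    "advance": [("advance.s1", Fraction(1, 2)), ("advance.s2", Fraction(1, 2))],
    "progress": [("advance.s1", Fraction(3, 4)), ("progress.s2", Fraction(1, 4))],
}
words, synsets, conflicts = deduce(senses, {"advance": "positive"})
print(words)      # {'advance': 'positive', 'progress': 'positive'}
print(synsets)    # {'advance.s1': 'positive', 'advance.s2': 'positive'}
print(conflicts)  # set()
```

Exact fractions keep the comparison with 0.5 free of floating-point error. In the toy run, “progress” becomes positive because the synset it shares with “advance” carries 3/4 of its frequency. Its other synset stays unknown, since polarity p would keep its majority either way.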
Related Work
WordNet-based
• SentiWordNet [Esuli06] assigns degrees of polarity to synsets.
• Q-WordNet [Agerri10] starts from 6 synsets with “known” polarities:
  – “positive”, “negative”, “good”, “bad”, “inferior”, and “superior”;
  – it propagates the polarities using semantic relationships,
  – e.g., antonymy, hypernymy, etc.
• Measuring the relative distance of a term from exemplars [Kamps04],
  – e.g., “good” and “bad”.
Corpora-based
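The exemplar-distance idea attributed to [Kamps04] in the WordNet-based list can be sketched in Python. This is a minimal sketch that assumes the score of a word w is (d(w, bad) − d(w, good)) / d(good, bad), where d is the shortest-path length in a WordNet synonymy graph. The function names and the toy graph are hypothetical stand-ins for real WordNet data.

```python
from collections import deque


def shortest_path(graph, src, dst):
    """Length of the shortest path between two words (breadth-first search), or None."""
    if src == dst:
        return 0
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt == dst:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def relative_distance(graph, word, good="good", bad="bad"):
    """(d(word, bad) - d(word, good)) / d(good, bad): > 0 leans to "good", < 0 to "bad"."""
    d_good = shortest_path(graph, word, good)
    d_bad = shortest_path(graph, word, bad)
    d_ref = shortest_path(graph, good, bad)
    if None in (d_good, d_bad, d_ref) or d_ref == 0:
        return None  # the word (or an exemplar) is not connected
    return (d_bad - d_good) / d_ref


# Hypothetical toy synonymy graph; a real one would be built from WordNet synsets.
EDGES = [("good", "decent"), ("decent", "fair"), ("fair", "mediocre"),
         ("mediocre", "bad"), ("good", "nice")]
GRAPH = {}
for a, b in EDGES:
    GRAPH.setdefault(a, set()).add(b)
    GRAPH.setdefault(b, set()).add(a)

for w in ("nice", "fair", "mediocre"):
    print(w, relative_distance(GRAPH, w))
# nice 1.0
# fair 0.0
# mediocre -0.5
```

Unlike the majority-sense deduction, this score uses only graph distances and ignores sense frequencies. A word that is not connected to both exemplars gets no score.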
References
[Wilson05] T. Wilson, J. Wiebe, and P. Hoffmann. Recognizing contextual polarity in phrase-level sentiment analysis. In HLT/EMNLP, 2005.
[Esuli06] A. Esuli and F. Sebastiani. SentiWordNet: A publicly available lexical resource for opinion mining. In LREC, 2006.
[Agerri10] R. Agerri and A. García-Serrano. Q-WordNet: Extracting polarity from WordNet senses. In LREC, 2010.
[Kamps04] J. Kamps, M. Marx, R. Mokken, and M. de Rijke. Using WordNet to measure semantic orientation of adjectives. In LREC, 2004.