Sentiment Analysis
An Overview of Concepts and
Selected Techniques
Terms
Sentiment
A thought, view, or attitude, especially one
based mainly on emotion instead of reason
Sentiment Analysis
aka opinion mining
use of natural language processing (NLP) and
computational techniques to automate the
extraction or classification of sentiment from
typically unstructured text
Motivation
Consumer information
Product reviews
Marketing
Consumer attitudes
Trends
Politics
Politicians want to know voters’ views
Voters want to know politicians’ stances and who else
supports them
Social
Find like-minded individuals or communities
Problem
Which features to use?
Words (unigrams)
Phrases/n-grams
Sentences
How to interpret features for sentiment
detection?
Bag of words (IR)
Annotated lexicons (WordNet, SentiWordNet)
Syntactic patterns
Paragraph structure
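To make the word and n-gram feature choices concrete, here is a minimal sketch of bag-of-words extraction. The tokenizer, the function names, and the sample review are assumptions made for illustration only; they do not come from any of the systems discussed in these slides.

```python
from collections import Counter

def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (tok.strip(".,!?;:\"'()") for tok in text.lower().split())
    return [tok for tok in tokens if tok]

def ngrams(tokens, n):
    """Return the n-grams (as tuples) in the order they occur."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bag_of_ngrams(text, n=1):
    """Count n-gram occurrences, discarding their positions in the text."""
    return Counter(ngrams(tokenize(text), n))

# Hypothetical review used only to show the difference between feature types.
review = "Not a good movie, not good at all."
unigrams = bag_of_ngrams(review, 1)
bigrams = bag_of_ngrams(review, 2)
```

Unigram counts record "good" twice without its negation, while the bigram ("not", "good") keeps part of that context. This is one reason phrases are considered alongside single words.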
Challenges
Harder than topical classification, for which
bag-of-words features perform well
Must consider other features due to…
Subtlety of sentiment expression
irony
expression of sentiment using neutral words
Domain/context dependence
words/phrases can mean different things in different
contexts and domains
Effect of syntax on semantics
Approaches
Machine learning
Naïve Bayes
Maximum Entropy Classifier
SVM
Assume pairwise independent features
Markov Blanket Classifier
Accounts for conditional feature dependencies
Allowed reduction of discriminating features from
thousands of words to about 20 (movie review
domain)
Unsupervised methods
Use lexicons
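As an illustration of the machine learning approach, the following is a minimal sketch of a multinomial Naïve Bayes classifier over unigram features with add-one smoothing. The training sentences, labels, and function names are invented for this example, and a real system would train on a labeled corpus such as movie reviews.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(labeled_docs):
    """labeled_docs: iterable of (tokens, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in labeled_docs:
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return label_counts, word_counts, vocab

def classify(model, tokens):
    """Return the label with the highest log posterior for the tokens."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        total_words = sum(word_counts[label].values())
        score = math.log(label_counts[label] / total_docs)  # log prior
        for tok in tokens:
            if tok not in vocab:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            count = word_counts[label][tok] + 1
            score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy training data, invented for illustration.
TRAIN = [
    ("a great and moving film".split(), "pos"),
    ("great acting and a wonderful story".split(), "pos"),
    ("a dull and boring film".split(), "neg"),
    ("boring plot and terrible acting".split(), "neg"),
]
model = train_naive_bayes(TRAIN)
```

Each word contributes to the score independently of the others, which is the feature independence assumption in its simplest form.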
LingPipe Polarity Classifier
First eliminate objective sentences, then
use remaining sentences to classify
document polarity (reduce noise)
Uses unigram features extracted from
movie review data
Assumes that adjacent sentences are
likely to have similar subjective-objective
(SO) polarity
Uses a min-cut algorithm to efficiently
extract subjective sentences
Graph for classifying three items.
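The min-cut idea can be sketched as follows, in the spirit of the formulation in Pang and Lee (2004). Each sentence is linked to a "subjective" source with its individual subjectivity score and to an "objective" sink with the complement, and nearby sentences are linked by an association weight. The sentences left on the source side of a minimum cut are kept as subjective. The scores, node names, and function names below are assumptions for illustration; this is not LingPipe's code.

```python
from collections import defaultdict, deque

def min_cut_source_side(capacity, source, sink):
    """Edmonds-Karp max flow. Returns the nodes still reachable from the
    source in the final residual graph (the source side of a min cut)."""
    residual = defaultdict(lambda: defaultdict(float))
    for u, edges in capacity.items():
        for v, cap in edges.items():
            residual[u][v] += cap
            residual[v][u] += 0.0  # make sure the reverse edge exists

    def bfs_parents():
        parents = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 1e-12 and v not in parents:
                    parents[v] = u
                    queue.append(v)
        return parents

    while True:
        parents = bfs_parents()
        if sink not in parents:
            return set(parents)
        # Find the bottleneck capacity along the augmenting path.
        bottleneck, v = float("inf"), sink
        while parents[v] is not None:
            bottleneck = min(bottleneck, residual[parents[v]][v])
            v = parents[v]
        # Push that much flow along the path.
        v = sink
        while parents[v] is not None:
            u = parents[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u

def extract_subjective(ind_sub, assoc):
    """ind_sub[i]: score in [0, 1] that sentence i is subjective.
    assoc[(i, j)]: penalty for separating sentences i and j."""
    capacity = defaultdict(dict)
    for i, p in enumerate(ind_sub):
        capacity["s"][i] = p        # paid if sentence i is cut from "s"
        capacity[i]["t"] = 1.0 - p  # paid if sentence i is cut from "t"
    for (i, j), weight in assoc.items():
        capacity[i][j] = weight
        capacity[j][i] = weight
    source_side = min_cut_source_side(capacity, "s", "t")
    return sorted(n for n in source_side if n != "s")

# Illustrative scores for three sentences (not taken from real data).
subjective = extract_subjective(
    [0.8, 0.5, 0.1],
    {(0, 1): 1.0, (1, 2): 0.1, (0, 2): 0.2},
)
```

Sentence 1 is ambiguous on its own (0.5), but its strong association with sentence 0 pulls it onto the subjective side. This is how the adjacency assumption influences the extraction.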
As accurate as the baseline, but uses only 22% of
the content in the test data (on average)
Metrics suggest properties of movie
review structure
SentiWordNet
Based on WordNet “synsets”
http://wordnet.princeton.edu/
Ternary classifier
Positive, negative, and neutral scores for each
synset
Provides means of gauging sentiment for
a text
SentiWordNet: Construction
Created training sets of synsets, Lp and Ln
Start with small number of synsets with fundamentally
positive or negative semantics, e.g., “nice” and “nasty”
Use WordNet relations, e.g., direct antonymy, similarity,
derived-from, to expand Lp and Ln over K iterations
Lo (objective) is set of synsets not in Lp or Ln
Trained classifiers on training set
Rocchio and SVM
Use four values of K to create eight classifiers with
different precision/recall characteristics
As K increases, precision (P) decreases and recall (R) increases
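The seed expansion step can be sketched as below. This is a simplified illustration that works on plain words instead of synsets and uses a tiny hand-made relation table in place of WordNet. It assumes that similarity-type relations preserve polarity and that direct antonymy flips it. All names and data are hypothetical.

```python
def expand_seeds(pos_seeds, neg_seeds, same_polarity, antonyms, k):
    """Grow Lp and Ln for k iterations by following lexical relations."""
    lp, ln = set(pos_seeds), set(neg_seeds)
    for _ in range(k):
        new_p, new_n = set(), set()
        for term in lp:
            new_p.update(same_polarity.get(term, ()))  # keeps polarity
            new_n.update(antonyms.get(term, ()))       # flips polarity
        for term in ln:
            new_n.update(same_polarity.get(term, ()))
            new_p.update(antonyms.get(term, ()))
        lp |= new_p
        ln |= new_n
    return lp, ln

# Toy relation tables standing in for WordNet (invented for illustration).
SAME = {
    "nice": ["pleasant"],
    "pleasant": ["agreeable"],
    "nasty": ["awful"],
    "awful": ["dreadful"],
}
ANTONYMS = {"nice": ["nasty"], "pleasant": ["unpleasant"]}
ALL_TERMS = {"nice", "pleasant", "agreeable", "nasty", "awful",
             "dreadful", "unpleasant", "table"}

lp, ln = expand_seeds({"nice"}, {"nasty"}, SAME, ANTONYMS, k=2)
lo = ALL_TERMS - lp - ln  # Lo: everything not reached from either seed set
```

Larger values of k reach more terms, at the cost of following longer and less reliable relation chains, which is consistent with the precision/recall trade-off noted above.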
SentiWordNet: Results
24.6% synsets with Objective<1.0
Many terms are classified with some degree of
subjectivity
10.45% with Objective<=0.5
0.56% with Objective<=0.125
Only a few terms are classified as definitively
subjective
Difficult (if not impossible) to accurately
assess performance
SentiWordNet: How to use it
Use score to select features (+/-)
e.g. Zhang and Zhang (2006) used words in
corpus with subjectivity score of 0.5 or greater
Combine pos/neg/objective scores to
calculate document-level score
e.g. Devitt and Ahmad (2007) conflated
polarity scores with a WordNet-based graph
representation of documents to create
predictive metrics
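Both uses can be sketched with a small lookup table. The lexicon below is a hypothetical word-level stand-in with made-up scores (SentiWordNet itself scores synsets, so a real lookup would also need word sense handling), and the averaging rule is just one simple way to combine scores, not the method of either cited paper.

```python
# Hypothetical (positive, negative) scores per word; objective = 1 - pos - neg.
TOY_LEXICON = {
    "excellent": (0.875, 0.0),
    "good": (0.625, 0.0),
    "bad": (0.0, 0.625),
    "awful": (0.0, 0.875),
    "long": (0.125, 0.125),
    "plot": (0.0, 0.0),
}

def subjectivity(word, lexicon):
    """Subjectivity = positive + negative score (0.0 for unknown words)."""
    pos, neg = lexicon.get(word, (0.0, 0.0))
    return pos + neg

def select_features(tokens, lexicon, threshold=0.5):
    """Keep only tokens whose subjectivity meets the threshold."""
    return [t for t in tokens if subjectivity(t, lexicon) >= threshold]

def document_score(tokens, lexicon, threshold=0.5):
    """Average (positive - negative) over the selected tokens.
    Returns 0.0 when no token passes the threshold."""
    selected = select_features(tokens, lexicon, threshold)
    if not selected:
        return 0.0
    total = sum(lexicon[t][0] - lexicon[t][1] for t in selected)
    return total / len(selected)

doc = "the plot was long but the acting was excellent and good".split()
score = document_score(doc, TOY_LEXICON)
```

A positive score suggests positive polarity and a negative score suggests negative polarity. The threshold controls how many weakly subjective words are allowed to influence the result.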
References
1. http://www.answers.com/sentiment, 9/22/08
2. B. Pang, L. Lee, and S. Vaithyanathan, “Thumbs up? Sentiment
classification using machine learning techniques,” in Proc. Conf.
on Empirical Methods in Natural Language Processing (EMNLP),
pp. 79–86, 2002.
3. A. Esuli and F. Sebastiani, “SentiWordNet: A Publicly Available Lexical
Resource for Opinion Mining,” in Proc. of LREC 2006 - 5th Conf.
on Language Resources and Evaluation, 2006.
4. E. Zhang and Y. Zhang, “UCSC on TREC 2006 Blog Opinion Mining,”
TREC 2006 Blog Track, Opinion Retrieval Task.
5. A. Devitt and K. Ahmad, “Sentiment Polarity Identification in Financial
News: A Cohesion-based Approach,” ACL 2007.
6. B. Pang and L. Lee, “A sentimental education: Sentiment
analysis using subjectivity summarization based on minimum
cuts,” in Proceedings of the 42nd Annual Meeting on Association for
Computational Linguistics, p. 271-es, July 21-26, 2004.