Seminar Slides

Sentiment Detection
Rik Sarkar (03305048)
Kedar Godbole (03305805)
Outline
 Sentiment detection: the problem statement
 Difficulties in sentiment detection
 Approaches to sentiment detection
 Conclusion
 Project proposal
Problem Statement
 Detect the polarity about a particular topic in
a document
Polarity:
- Positive
- Negative
- Mixed
- Neutral
Motivation
Reviews on the Web
 Opinions about a product
 Opinions about the individual aspects of a
product
 Movie/book reviews
 Feedback/evaluation forms
Issues
 Reference to multiple objects in the same
document
- The NR70 is trendy. T-Series is fast becoming
obsolete.
 Dependence on the context of the document
- “Unpredictable” plot (positive) vs. “unpredictable” performance (negative)
 Negations have to be captured
- Monochrome display is not what the user wants
Issues (contd.)
 Metaphors/Similes
- The metallic body is solid as a rock
 Part-of and Attribute-of relationships
- The small keypad is inconvenient
 Absence of a polar word
- How can someone sit through this seminar?
Approaches to Sentiment Detection
 Based on pre-selected sets of words
 Naive Bayes
 Support Vector Machines
 Unsupervised learning
 Enhancement by NLP
An Unsupervised Learning Technique
Extract phrases from the review based on patterns of
POS tags
 JJ – Adjective
 RB – Adverb
 NN – Noun

First word    Second word
JJ            NN
RB            JJ
JJ            JJ
NN            JJ
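A minimal sketch of this extraction step, assuming the review has already been split into (word, tag) pairs by some POS tagger. The function name `extract_phrases` and the hand-tagged sample sentence are illustrative and not part of the original slides; only the four first-word/second-word patterns on this slide are used, and plural nouns are tagged NN for simplicity.

```python
# Two-word POS patterns from the slide: (tag of first word, tag of second word).
PATTERNS = {("JJ", "NN"), ("RB", "JJ"), ("JJ", "JJ"), ("NN", "JJ")}


def extract_phrases(tagged):
    """Return every pair of adjacent words whose tags match a pattern.

    `tagged` is a list of (word, tag) pairs, assumed to come from a POS tagger.
    """
    phrases = []
    for (w1, t1), (w2, t2) in zip(tagged, tagged[1:]):
        if (t1, t2) in PATTERNS:
            phrases.append(f"{w1} {w2}")
    return phrases


# Hand-tagged example (the tags are assumed, not produced by a real tagger).
tagged_review = [
    ("The", "DT"), ("bank", "NN"), ("has", "VBZ"), ("low", "JJ"),
    ("fees", "NN"), ("and", "CC"), ("very", "RB"), ("helpful", "JJ"),
    ("staff", "NN"),
]
print(extract_phrases(tagged_review))
# ['low fees', 'very helpful', 'helpful staff']
```

Overlapping matches such as "very helpful" and "helpful staff" are both kept here; whether to keep or merge them is a design choice the slides do not specify.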
Unsupervised Learning
PointWise Mutual Information (PMI)
and Semantic Orientation (SO)
\[
\mathrm{PMI}(word_1, word_2) = \log \frac{p(word_1 \,\&\, word_2)}{p(word_1)\, p(word_2)}
\]
\[
\mathrm{SO}(phrase) = \mathrm{PMI}(phrase, \text{``excellent''}) - \mathrm{PMI}(phrase, \text{``poor''})
\]
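The two definitions above can be sketched directly from co-occurrence counts. This is an illustration only: the function names, the use of base-2 logarithms (the slide just says "log"), and all counts below are assumptions made for the example.

```python
import math


def pmi(count_both, count_1, count_2, total):
    """PMI(word1, word2) = log( p(word1 & word2) / (p(word1) * p(word2)) ).

    Probabilities are estimated as count / total. Base 2 is an assumption.
    """
    p_both = count_both / total
    p_1 = count_1 / total
    p_2 = count_2 / total
    return math.log2(p_both / (p_1 * p_2))


def semantic_orientation(pmi_excellent, pmi_poor):
    """SO(phrase) = PMI(phrase, "excellent") - PMI(phrase, "poor")."""
    return pmi_excellent - pmi_poor


# Hypothetical counts over 1000 text windows: the phrase occurs in 10,
# "excellent" and "poor" in 100 each, the phrase co-occurs with "excellent"
# in 4 windows and with "poor" in 1.
pmi_exc = pmi(count_both=4, count_1=10, count_2=100, total=1000)
pmi_poor = pmi(count_both=1, count_1=10, count_2=100, total=1000)
so = semantic_orientation(pmi_exc, pmi_poor)
print(round(so, 3))  # 2.0
```

A positive value means the phrase is more strongly associated with "excellent" than with "poor". The sketch assumes every count is nonzero; real data would need some form of smoothing.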
Unsupervised Learning
 Determine the Semantic Orientation (SO) of the
phrases
 Search on AltaVista
\[
\mathrm{SO}(phrase) = \log \frac{\mathrm{hits}(phrase\ \mathrm{NEAR}\ \text{``excellent''})\; \mathrm{hits}(\text{``poor''})}{\mathrm{hits}(phrase\ \mathrm{NEAR}\ \text{``poor''})\; \mathrm{hits}(\text{``excellent''})}
\]
Unsupervised Learning
Calculate average semantic orientation of document:
Extracted phrase          POS tags    Semantic Orientation
Low fees                  JJ NN        0.333
Online service            JJ NN        2.780
Inconveniently located    RB VB       -1.541
Average Semantic Orientation = 0.524
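The hit-count ratio and the averaging step might be sketched as follows. The function names, the base-2 logarithm, the hit counts passed to `so_from_hits`, and the rule that a positive average means a positive review are all assumptions for illustration; the three phrase scores are the ones from this slide.

```python
import math


def so_from_hits(near_excellent, near_poor, hits_excellent, hits_poor):
    """SO(phrase) from search-engine hit counts.

    near_excellent: hits for  phrase NEAR "excellent"
    near_poor:      hits for  phrase NEAR "poor"
    Assumes all counts are nonzero; real counts would need smoothing.
    """
    return math.log2((near_excellent * hits_poor) / (near_poor * hits_excellent))


def average_so(scores):
    """Average semantic orientation over all extracted phrases."""
    return sum(scores) / len(scores)


# Hypothetical hit counts for one phrase.
example_so = so_from_hits(near_excellent=8, near_poor=2,
                          hits_excellent=100, hits_poor=100)
print(example_so)  # 2.0

# Phrase scores taken from the slide.
phrase_so = {
    "low fees": 0.333,
    "online service": 2.780,
    "inconveniently located": -1.541,
}
avg = average_so(list(phrase_so.values()))
label = "positive" if avg > 0 else "negative"  # assumed decision rule
print(round(avg, 3), label)  # 0.524 positive
```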
Need for NLP
 Identifying phrases is not enough – need to
identify subject/object
- The NR70 is trendy. T-Series is fast becoming
obsolete.
 Need to identify part-of and attribute-of
relationship
- The battery is long-lasting
Focus of the sentiment
Feature/attribute terms:
 BNP - Base Noun Phrases
- battery, display, keypad
 dBNP - Definite Base Noun Phrases
- “the display”
 bBNP - Beginning Definite Base Noun
Phrases
- “The battery is long-lasting”
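One possible way to pick out the three kinds of feature terms from POS-tagged text is sketched below. The tag patterns are a simplifying assumption (zero or more JJ tokens followed by one or more nouns), as are the rules used for "definite" (preceded by "the") and "beginning" (at the start of the sentence and followed by a verb); the function names are hypothetical.

```python
def base_noun_phrases(tagged):
    """Return (start, end) index pairs of base noun phrases.

    Assumed simplification: a base noun phrase is zero or more JJ tokens
    followed by one or more NN* tokens.
    """
    spans = []
    i, n = 0, len(tagged)
    while i < n:
        j = i
        while j < n and tagged[j][1] == "JJ":
            j += 1
        k = j
        while k < n and tagged[k][1].startswith("NN"):
            k += 1
        if k > j:                 # found at least one noun
            spans.append((i, k))
            i = k
        else:                     # no noun here, move on
            i = max(j, i + 1)
    return spans


def classify_feature_terms(tagged):
    """Label each base noun phrase as BNP, dBNP or bBNP."""
    results = []
    for start, end in base_noun_phrases(tagged):
        phrase = " ".join(word for word, _ in tagged[start:end])
        definite = start > 0 and tagged[start - 1][0].lower() == "the"
        verb_next = end < len(tagged) and tagged[end][1].startswith("VB")
        if definite and start == 1 and verb_next:
            kind = "bBNP"
        elif definite:
            kind = "dBNP"
        else:
            kind = "BNP"
        results.append((phrase, kind))
    return results


# Hand-tagged sentences (tags are assumed, not produced by a real tagger).
s1 = [("The", "DT"), ("battery", "NN"), ("is", "VBZ"), ("long-lasting", "JJ")]
s2 = [("I", "PRP"), ("like", "VBP"), ("the", "DT"), ("display", "NN")]
print(classify_feature_terms(s1))  # [('battery', 'bBNP')]
print(classify_feature_terms(s2))  # [('display', 'dBNP')]
```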
Sentiment Analyzer
 Sentiment lexicon database
- <lexical_entry> <POS> <sent_category>
- “excellent” JJ +
 Sentiment pattern database
- <predicate> <sent_category> <target>
- “I am impressed with the flash capabilities”
- impress + PP(by;with) target
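The two databases could be represented roughly as below. This is a sketch under assumptions: the class and function names are hypothetical, the "poor" lexicon entry is added for illustration, and the sentence is assumed to have been parsed already into a verb lemma, a preposition, and the preposition's object.

```python
from dataclasses import dataclass

# Sentiment lexicon: (lexical_entry, POS) -> sentiment category.
LEXICON = {
    ("excellent", "JJ"): "+",
    ("poor", "JJ"): "-",       # illustrative entry, not from the slide
}


@dataclass(frozen=True)
class SentimentPattern:
    predicate: str        # verb lemma, e.g. "impress"
    category: str         # "+" or "-"
    target_preps: tuple   # prepositions whose object is the target


# Sentiment pattern database: impress + PP(by;with) target
PATTERN_DB = {
    "impress": SentimentPattern("impress", "+", ("by", "with")),
}


def match_pattern(lemma, prep, prep_object):
    """Return (target, category) if the verb and preposition fit a pattern."""
    pattern = PATTERN_DB.get(lemma)
    if pattern is not None and prep in pattern.target_preps:
        return prep_object, pattern.category
    return None


# "I am impressed with the flash capabilities"
print(match_pattern("impress", "with", "the flash capabilities"))
# ('the flash capabilities', '+')
```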
SA (contd.)
 Identify sentences containing feature terms
 Ternary expressions (T-expressions)
- +ve/-ve sentiment verbs
<target, verb, “”>
- trans verbs
<target, verb, source>
 Binary expressions (B-expressions)
- <adjective, target>
SA (contd.)
 Identify sentiment phrases within subject,
object phrases
 Associating sentiment with the target
- Based on sentiment patterns
“I was impressed by the flash capabilities”
“This camera takes excellent pictures”
- Based on B-expressions
“Poor performance in a dark room”
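A rough sketch of how sentiment might be attached to a target from B-expressions and T-expressions. It assumes the expressions have already been extracted by a parser, that verbs are given as lemmas, and that for a verb with no polarity of its own the polarity of the source phrase is passed to the target (one reading of the trans-verb case). The lexicon entries and function names are illustrative.

```python
# Illustrative lexicon: word or verb lemma -> polarity.
LEXICON = {"excellent": "+", "poor": "-", "impress": "+"}


def phrase_polarity(phrase):
    """Return the polarity of the first lexicon word in the phrase, else None."""
    for word in phrase.lower().split():
        if word in LEXICON:
            return LEXICON[word]
    return None


def from_b_expression(adjective, target):
    """<adjective, target>: the adjective's polarity is assigned to the target."""
    return target, phrase_polarity(adjective)


def from_t_expression(target, verb, source):
    """<target, verb, source>: a sentiment verb carries its own polarity;
    otherwise the polarity of the source phrase is transferred to the target."""
    polarity = LEXICON.get(verb) or phrase_polarity(source)
    return target, polarity


# "Poor performance in a dark room"
print(from_b_expression("Poor", "performance"))
# "I was impressed by the flash capabilities"
print(from_t_expression("flash capabilities", "impress", ""))
# "This camera takes excellent pictures"
print(from_t_expression("camera", "take", "excellent pictures"))
```

The three calls yield a negative polarity for "performance" and a positive one for "flash capabilities" and "camera".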
Other issues
 Position of the sentiment words
- Words at the beginning and end of a review
 Sentiment about the characters in the movie
versus Sentiment about the actors in the
movie – abstraction.
“He played the role of a very corrupt politician”
“He played the role brilliantly”
Conclusion
 Sentiment detection can be used in areas
ranging from marketing research to movie
reviews.
 Sentiment Detection is a “hard” problem due
to context-sensitivity, complex sentences, etc.
 Statistical methods should be augmented
with NLP techniques.
Project
 Sentiment analyzer for a specific domain
 Given set of features, initial list of polar words
 Learns new polar words from documents
analyzed