Opinion Analysis 1
Opinion Analysis
Sudeshna Sarkar
IIT Kharagpur
Introduction – facts and opinions
Two main types of information on the Web: facts and opinions.
• Current search engines search for facts (and assume they are true).
  • Facts can be expressed with topic keywords.
• Search engines do not search for opinions.
  • Opinions are hard to express with a few keywords, e.g., “How do people think of Motorola cell phones?”
  • Current search ranking strategies are not appropriate for opinion retrieval/search.
Overview
• Motivation
• Definitions
• Coarse grained vs fine grained opinion analysis
• Opinion lexicons
• Approaches to document level opinion analysis
  • Lexicon based
  • Supervised learning approaches
  • Mixed approaches
• Approaches to fine-grained opinion analysis
  • Rule based
  • Learning
• Opinion mining work at IIT Kharagpur
Opinion Mining
• Search for and aggregate opinions from online sources.
• Many reviews have both positive and negative sentences.
• Many products are liked by some and disliked by others; there must be different reasons.
• Identify the different features/aspects of the target and the opinions on each of them separately.
Why do opinion analysis?
• Opinion search
• Opinion question answering
  • to extract examples of particular types of positive or negative statements on some topic, e.g.:
    What is the reaction to the Left Front’s stand on the nuclear deal?
    Is support diminishing for the UPA government?
• Product review mining
  • What features of “Mr Coffee programmable coffee maker” do users like and what do they dislike? (Microsoft Live)
• Review classification
• Tracking sentiment toward topics over time
  • to track the ups and downs of aggregate attitudes to a brand or product
Introduction – Applications
• Businesses and organizations: product and service benchmarking; market intelligence.
  • Businesses spend a huge amount of money to find consumer sentiments and opinions.
  • Consultants, surveys and focus groups, etc.
• Individuals: interested in others’ opinions when
  • purchasing a product or using a service,
  • finding opinions on political topics,
  • making many other decisions.
• Ads placement: placing ads in user-generated content
  • Place an ad when someone praises a product.
  • Place an ad from a competitor if someone criticizes a product.
• Opinion retrieval/search: providing general search for opinions.
Question Answering
Opinion question answering:
Q: What is the international reaction to the reelection of Robert Mugabe as President of Zimbabwe?
A: African observers generally approved of his victory while Western Governments denounced it.
Opinion search
(Liu, Web Data Mining book, 2007)
Can you search for opinions as conveniently as general Web search?
• Whenever you need to make a decision, you may want some opinions from others.
• Wouldn’t it be nice if you could find them on a search system instantly, by issuing queries such as
  • Opinions: “Motorola cell phones”
  • Comparisons: “Motorola vs. Nokia”
• Cannot be done yet!
Typical opinion search queries
• Find the opinion of a person or organization (opinion holder) on a particular object or a feature of an object.
  • E.g., what is Bill Clinton’s opinion on abortion?
• Find positive and/or negative opinions on a particular object (or some features of the object), e.g.,
  • customer opinions on a digital camera,
  • public opinions on a political topic.
• Find how opinions on an object change with time.
• How does object A compare with object B?
  • Gmail vs. Yahoo mail
Find the opinion of a person on X
• In some cases, the general search engine can handle it, i.e., using suitable keywords.
  • Bill Clinton’s opinion on abortion
• Reason:
  • One person or organization usually has only one opinion on a particular topic.
  • The opinion is likely contained in a single document.
  • Thus, a good keyword query may be sufficient.
Find opinions on an object X
We use product reviews as an example:
• Searching for opinions in product reviews is different from general Web search.
  • E.g., search for opinions on “Motorola RAZR V3”
• General Web search for a fact: rank pages according to some authority and relevance scores.
  • The user views the first page (if the search is perfect).
  • One fact = Multiple facts
• Opinion search: ranking is desirable; however,
  • reading only the review ranked at the top is dangerous because it is only the opinion of one person.
  • One opinion ≠ Multiple opinions
Search opinions (contd)
Ranking:
• Produce two rankings
  • Positive opinions and negative opinions
  • Some kind of summary of both, e.g., # of each
• Or, one ranking but
  • The top (say 30) reviews should reflect the natural distribution of all reviews (assume that there is no spam), i.e., with the right balance of positive and negative reviews.
Questions:
• Should the user read all the top reviews? OR
• Should the system prepare a summary of the reviews?
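The “one ranking” option can be sketched as follows, assuming each review already carries a polarity label (from some classifier) and a relevance score; the tuple layout and the function name `balanced_top_k` are hypothetical. Positive and negative slots are allotted in proportion to their share of all reviews, and each share is filled with the highest-scoring reviews.

```python
from typing import List, Tuple

# Hypothetical layout: (review_id, polarity, relevance_score); polarity is "pos" or "neg".
Review = Tuple[str, str, float]


def balanced_top_k(reviews: List[Review], k: int = 30) -> List[Review]:
    """Pick k reviews whose positive/negative mix mirrors the whole collection."""
    pos = sorted((r for r in reviews if r[1] == "pos"), key=lambda r: r[2], reverse=True)
    neg = sorted((r for r in reviews if r[1] == "neg"), key=lambda r: r[2], reverse=True)
    total = len(pos) + len(neg)
    if total == 0:
        return []
    k = min(k, total)
    # Positive slots are proportional to the share of positive reviews.
    n_pos = round(k * len(pos) / total)
    n_neg = k - n_pos
    chosen = pos[:n_pos] + neg[:n_neg]
    # Show the selected reviews in relevance order.
    return sorted(chosen, key=lambda r: r[2], reverse=True)


# Usage: 8 positive and 2 negative reviews; the top 5 keep the 4:1 balance.
reviews = [(f"p{i}", "pos", float(i)) for i in range(8)] + [("n0", "neg", 0.5), ("n1", "neg", 9.0)]
top = balanced_top_k(reviews, k=5)
print([r[0] for r in top])
```

Reviews with any other polarity value are ignored, and the sketch inherits the slide’s no-spam assumption.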
User generated content
• Word of mouth on the web
  • Review sites
  • Blogs
  • Online forums
  • Shopping comparison sites
  • User reviews
• Mine opinions expressed in the user-generated content
  • Challenging task
  • Useful to individual consumers and companies.
Motivation for Consumer
I want to buy a camera. Which model should I pick?
• Ask my friends
• Use the internet
  • CEA-CNET Study: Tech-Savvy Consumers Use Internet to Research Products Before Buying Them (Wireless News, November 2007)
  • Seventy Percent of Consumers Use Internet to Research Consumer Packaged Goods, According to Prospectiv Survey (Market Wire, January 2008)
Businesses
• Identify opinions about products to help position/adapt products
• Much product feedback is web-based
  • provided by customers/critics online through websites, discussion boards, mailing lists, blogs, and CRM portals.
• Market research is becoming unwieldy
  • Sources are heterogeneous and multilingual in nature
Facts vs Opinions
An opinion is a person's ideas and thoughts towards
something. It is an assessment, judgment or evaluation of
something. An opinion is not a fact, because opinions are either
not falsifiable, or the opinion has not been proven or verified. ...
en.wikipedia.org/wiki/Opinion
Subjectivity: The linguistic expression of somebody’s emotions,
sentiments, evaluations, opinions, beliefs, speculations, etc.
Polarity: positive and negative
• This camera is awesome. (positive)
• The movie is too long and boring. (negative)
Strength of opinion
Levels of opinion analysis
Coarse to fine grained opinion analysis
• Document level: at the document (or review) level
  • Subjective vs objective
  • Sentiment classification: positive, negative or neutral
• Sentence level, expression level
  • Task 1: identifying subjective/opinionated sentences (or clauses/phrases)
    • Classes: objective and subjective (opinionated)
  • Task 2: sentiment classification of sentences
    • Classes: positive, negative and neutral
• But a document/sentence may contain multiple opinions on more than one topic from one or more opinion holders.
Lexicon Development
• Manual
• Semi-automatic
• Fully automatic
• Find relevant words, phrases and patterns that can be used to express subjectivity
• Determine the polarity of subjective expressions
Opinion Words
An opinion lexicon containing lists of positive and negative phrases is very useful for the opinion mining task at different levels.
• Positive: beautiful, wonderful, good, amazing
• Negative: bad, poor, terrible, cost someone an arm and a leg
How to compile such a list?
• Dictionary-based approaches
• Corpus-based approaches
  • Supervised
  • Semi-supervised
BUT
• Some opinion words are context independent (e.g., good).
• Some are context dependent (e.g., long).
Hand created lists
Manually create lists of opinion words appropriate for the domain, recording for each entry its
• Sentiment term
• Polarity
• Strength
These approaches, while interesting, are labor intensive, error prone and costly to maintain.
Dictionary-based approaches
Start from a set of seed opinion words.
Use WordNet’s synsets and hierarchies to acquire opinion words.
• Use the seeds to search for synonyms and antonyms in WordNet (e.g., Hu and Liu, 2004).
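A minimal sketch of seed expansion, assuming a tiny hand-made synonym/antonym table (`SYNONYMS`, `ANTONYMS`, both hypothetical) in place of WordNet: synonyms inherit a seed’s polarity and antonyms receive the opposite one. With NLTK, such links could come from `wordnet.synsets(word)`, `Synset.lemmas()` and `Lemma.antonyms()`; real WordNet data also needs a policy for words reached with conflicting labels, which this sketch resolves as “first label wins”.

```python
from collections import deque
from typing import Dict, Set

# Hypothetical toy thesaurus standing in for WordNet synonym and antonym links.
SYNONYMS: Dict[str, Set[str]] = {
    "good": {"fine", "nice"},
    "nice": {"pleasant"},
    "bad": {"poor"},
    "poor": {"inferior"},
}
ANTONYMS: Dict[str, Set[str]] = {
    "good": {"bad"},
    "pleasant": {"unpleasant"},
}


def expand_seeds(seeds: Dict[str, int]) -> Dict[str, int]:
    """Grow a +1/-1 lexicon breadth-first from seed words."""
    lexicon = dict(seeds)
    queue = deque(seeds)
    while queue:
        word = queue.popleft()
        polarity = lexicon[word]
        # Synonyms keep the polarity, antonyms flip it.
        neighbours = [(w, polarity) for w in SYNONYMS.get(word, ())]
        neighbours += [(w, -polarity) for w in ANTONYMS.get(word, ())]
        for other, pol in neighbours:
            if other not in lexicon:  # first label wins
                lexicon[other] = pol
                queue.append(other)
    return lexicon


lexicon = expand_seeds({"good": 1})
print(sorted(lexicon.items()))
```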
Dictionary-based approaches (contd)
• Use additional information (e.g., glosses) and learning from WordNet (Andreevskaia and Bergler, 2006) (Esuli and Sebastiani, 2005).
Dictionary-based approaches (contd)
Advantage: good at finding a large number of opinion words.
Weakness: does not find context dependent opinion words, e.g., small, long, fast.
Corpus-based approaches
Rely on syntactic rules and co-occurrence patterns to extract opinion words from large corpora, using
• a list of seed words
• a large domain corpus
• machine learning
Advantage: this approach can find domain (corpus) dependent opinion words.
How to identify subjective terms?
• Assume that contexts are coherent
  • Statistical association: if words of the same orientation tend to co-occur, then the presence of one makes the other more probable.
  • Use statistical measures of association to capture this interdependence.
• Assume that alternatives are similarly subjective
Corpus-based approaches (contd)
Conjunctions: conjoined adjectives usually have the same orientation (Hatzivassiloglou and McKeown 1997).
• E.g., “This car is beautiful and spacious.” (conjunction)
1. Start with seed words
2. Use conjunctions to find adjectives with similar orientations
3. Use log-linear regression to aggregate information from various conjunctions
4. Use hierarchical clustering on a graph representation of adjective similarities to find two groups of the same orientation
[Figure: graph of conjoined adjectives (scenic, nice, handsome, fun, comfortable, expensive, painful, terrible, slow) partitioned into two groups of the same orientation]
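The conjunction idea can be illustrated with a simplified stand-in, not the log-linear regression and clustering used by Hatzivassiloglou and McKeown: assuming adjective pairs have already been extracted as (adjective, conjunction, adjective) triples, treat “and” as a same-orientation link and “but” as an opposite-orientation link, and propagate labels outward from seed words. The function name and example triples are hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, List, Tuple


def propagate_orientation(links: List[Tuple[str, str, str]],
                          seeds: Dict[str, int]) -> Dict[str, int]:
    """Spread +1/-1 orientation from seed adjectives over conjunction links.

    "and" links adjectives of the same orientation, "but" links adjectives of
    opposite orientation; other conjunctions are ignored.
    """
    graph: Dict[str, List[Tuple[str, int]]] = defaultdict(list)
    for left, conj, right in links:
        sign = {"and": 1, "but": -1}.get(conj)
        if sign is None:
            continue
        graph[left].append((right, sign))
        graph[right].append((left, sign))

    orientation = dict(seeds)
    queue = deque(seeds)
    while queue:
        word = queue.popleft()
        for other, sign in graph[word]:
            if other not in orientation:  # first label wins; conflicts are not resolved
                orientation[other] = orientation[word] * sign
                queue.append(other)
    return orientation


links = [("beautiful", "and", "spacious"), ("spacious", "but", "expensive"),
         ("expensive", "and", "slow"), ("nice", "and", "comfortable")]
print(propagate_orientation(links, {"beautiful": 1}))
```

Adjectives in a component with no seed (here “nice” and “comfortable”) stay unlabeled; the clustering step in the original method is one way to handle them without seeds.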
Growing contextual opinion words [Ding, Liu, Wu]
• Intra-sentence conjunction rule: opinions on both sides of “and” (or across two consecutive sentences) tend to be the same.
  • E.g., “This camera takes great pictures and has a long battery life”.
• With a “but”-like clause, the opinions tend to be of opposite polarity.
• Context is important
  • Long battery life vs long time to focus
• Growing
  • by applying various conjunctive rules
• Verifying the results with those conjunctive rules as the system sees more reviews
• Only keep those opinions which the system is confident about, controlled by a confidence limit.
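A sketch of the confidence limit for context dependent words, assuming the conjunction rules have already produced one piece of evidence per sentence as a (feature, opinion word, polarity) triple; e.g., “takes great pictures and has a long battery life” would yield ("battery life", "long", +1) because “great” is positive. The thresholds `confidence` and `min_count` are hypothetical parameters, not values from Ding, Liu and Wu.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def infer_context_polarity(evidence: Iterable[Tuple[str, str, int]],
                           confidence: float = 0.8,
                           min_count: int = 2) -> Dict[Tuple[str, str], int]:
    """Aggregate conjunction-rule votes for (feature, opinion word) pairs.

    A pair is kept only if it has at least `min_count` votes and one polarity
    accounts for at least `confidence` of them.
    """
    votes: Dict[Tuple[str, str], Counter] = defaultdict(Counter)
    for feature, word, polarity in evidence:
        votes[(feature, word)][polarity] += 1
    result = {}
    for pair, counter in votes.items():
        total = sum(counter.values())
        polarity, count = counter.most_common(1)[0]
        if total >= min_count and count / total >= confidence:
            result[pair] = polarity
    return result


evidence = [("battery life", "long", 1)] * 3 + [("time to focus", "long", -1)] * 2 \
    + [("screen", "small", 1), ("screen", "small", -1)]
print(infer_context_polarity(evidence))
```

The same word “long” ends up positive for battery life and negative for time to focus, while the evenly split “small screen” is left out.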
Semantic Orientation by Association
Labeled semantic orientation of words
Pwords = {good, nice, excellent, positive, fortunate, correct, superior}
Nwords = {bad, nasty, poor, negative, unfortunate, wrong, inferior}
$$SO(word) = \sum_{pword \in Pwords} A(word, pword) - \sum_{nword \in Nwords} A(word, nword)$$
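A minimal sketch of SO(word) with PMI as the association measure A, assuming PMI is estimated from sentence-level co-occurrence in a tokenized corpus (Turney & Littman used search-engine hit counts instead). The `smoothing` constant added to the joint count is an assumption of this sketch, to avoid log 0 when a seed never co-occurs with the target; seeds absent from the corpus are skipped.

```python
import math
from typing import List, Set

# Seed sets from the slide.
PWORDS = {"good", "nice", "excellent", "positive", "fortunate", "correct", "superior"}
NWORDS = {"bad", "nasty", "poor", "negative", "unfortunate", "wrong", "inferior"}


def so_a(word: str, sentences: List[Set[str]], smoothing: float = 0.01) -> float:
    """Sum of PMI(word, p) over Pwords minus sum of PMI(word, n) over Nwords."""
    n = len(sentences)

    def count(*words: str) -> int:
        # Number of sentences containing all the given words.
        return sum(1 for s in sentences if all(w in s for w in words))

    def pmi(a: str, b: str) -> float:
        # Smooth the joint count so that never co-occurring pairs do not give log2(0).
        joint = (count(a, b) + smoothing) / n
        return math.log2(joint / ((count(a) / n) * (count(b) / n)))

    if count(word) == 0:
        return 0.0  # no evidence for an unseen word
    positive = sum(pmi(word, p) for p in PWORDS if count(p) > 0)
    negative = sum(pmi(word, q) for q in NWORDS if count(q) > 0)
    return positive - negative


sentences = [{"great", "good", "camera"}, {"great", "nice"},
             {"awful", "bad"}, {"awful", "poor", "lens"}, {"camera", "lens"}]
print(so_a("great", sentences), so_a("awful", sentences))
```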
Various approaches to calculate the semantic association of two words:
• Pointwise Mutual Information (PMI) [Church and Hanks 1989]
• Latent Semantic Indexing (LSI) [Dumais et al. 1990]
• Likelihood Ratios [Dunning 1993]
Turney 2002; Turney & Littman 2003
Determine the semantic orientation of each extracted phrase based on its association with seven positive and seven negative seed words.
$$PMI(word_1, word_2) = \log_2 \frac{p(word_1 \,\&\, word_2)}{p(word_1)\, p(word_2)}$$
$$SO\text{-}PMI\text{-}IR(word) = \log_2 \frac{hits(word \text{ NEAR } p\_query) \cdot hits(n\_query)}{hits(word \text{ NEAR } n\_query) \cdot hits(p\_query)}$$
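The SO-PMI-IR formula translates directly into code, assuming some `hits(query)` function that returns search-engine hit counts (hypothetical here, backed by a dictionary of made-up counts). Turney (2002) used p_query = “excellent” and n_query = “poor”; with seven seed words per side, each query would instead be the OR of those words. The `smoothing` constant on the NEAR counts is an assumption to avoid division by zero.

```python
import math
from typing import Callable


def so_pmi_ir(word: str, hits: Callable[[str], float],
              p_query: str, n_query: str, smoothing: float = 0.01) -> float:
    """log2( hits(word NEAR p_query) * hits(n_query)
             / (hits(word NEAR n_query) * hits(p_query)) )"""
    near_p = hits(f"{word} NEAR {p_query}") + smoothing
    near_n = hits(f"{word} NEAR {n_query}") + smoothing
    # hits(p_query) and hits(n_query) are assumed to be non-zero.
    return math.log2((near_p * hits(n_query)) / (near_n * hits(p_query)))


# Hypothetical hit counts standing in for a search engine that supports NEAR.
HITS = {"excellent": 1000, "poor": 1000,
        "low fees NEAR excellent": 30, "low fees NEAR poor": 10}
score = so_pmi_ir("low fees", lambda q: HITS.get(q, 0), "excellent", "poor")
print(round(score, 3))
```

The hits(n_query)/hits(p_query) factor corrects for one seed query simply being more frequent than the other.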
Weakly supervised learning
Gamon & Aue 2005
1. Given a list of seed words (seed words 1)
2. Get more seed words (seed words 2): words with low PMI at the sentence level
3. Get the semantic orientation of seed words 2 by PMI at the document level
4. Get the semantic orientation of all words by PMI with all seed words
Document level opinion analysis
Polarity classification: classify documents (e.g., reviews) based on the overall sentiment expressed by their authors.
Approaches
• Use an opinion lexicon
• Knowledge engineering
• Supervised learning techniques
• Classifying using the Web as a corpus
• Semi-supervised
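The simplest lexicon-based document classifier counts lexicon hits. A minimal sketch, assuming small illustrative word lists (`POSITIVE`, `NEGATIVE`) drawn from the examples earlier; it ignores negation (“not good”), opinion strength, multiword phrases such as “cost someone an arm and a leg”, and context dependent words like “long”.

```python
import re
from typing import Set

# Illustrative lexicon; a real one would come from the methods above.
POSITIVE: Set[str] = {"beautiful", "wonderful", "good", "amazing", "awesome"}
NEGATIVE: Set[str] = {"bad", "poor", "terrible", "boring"}


def lexicon_polarity(document: str) -> str:
    """Label a document by (#positive words - #negative words)."""
    tokens = re.findall(r"[a-z']+", document.lower())
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(lexicon_polarity("This camera is awesome."))
print(lexicon_polarity("The movie is too long and boring."))
```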
Knowledge Engineering
• Make use of lists of sentiment terms
• Manually create analysis components based on cognitive linguistic theory: parser, feature structure representation, etc.
Supervised polarity classifier
Requirements: a labeled database of opinions
• Download ratings from Amazon.com, epinions.com, etc.
Build a binary opinion classifier
• From positive and negative ratings
• Merge 1 and 2 stars into negative and 3, 4 and 5 stars into positive
• Use thresholded SVM, maximum entropy, naïve Bayes, etc.
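A minimal sketch of this pipeline on star ratings: the 1-2 / 3-5 merge from the slide, followed by a multinomial naïve Bayes classifier with add-one smoothing written from scratch rather than with a library. Function names and the regex tokenization are assumptions; the log prior assumes both classes occur in the training data.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def stars_to_label(stars: int) -> str:
    # Merge 1-2 stars into negative and 3-5 stars into positive.
    return "neg" if stars <= 2 else "pos"


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z']+", text.lower())


def train_nb(reviews: List[Tuple[str, int]]):
    """Collect naive Bayes counts from (review text, star rating) pairs."""
    word_counts: Dict[str, Counter] = {"pos": Counter(), "neg": Counter()}
    doc_counts: Counter = Counter()
    for text, stars in reviews:
        label = stars_to_label(stars)
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts["pos"]) | set(word_counts["neg"])
    return word_counts, doc_counts, vocab


def classify_nb(model, text: str) -> str:
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = "pos", -math.inf
    for label in ("pos", "neg"):
        # Log prior plus add-one smoothed log likelihoods of the known words.
        score = math.log(doc_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in tokenize(text):
            if w in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][w] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train_nb([("Great camera, love it", 5), ("Love the lens", 4),
                  ("Okay value, love the price", 3),
                  ("Terrible battery, broke fast", 1), ("Awful and terrible", 2)])
print(classify_nb(model, "love it"), classify_nb(model, "terrible battery"))
```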
Supervised Training
1. Obtain labeled sentences: positive, neutral, negative
2. Extract features: words, n-grams, multi-word expressions, feature generalization [Kim & Hovy 2007]
3. Feature values: binary/frequency
4. Run a training algorithm on the features to obtain a classifier
5. [Optional] Do feature selection (using the log-likelihood ratio)
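Steps 2, 3 and 5 can be sketched as follows, assuming two classes (pos/neg) and lowercase regex tokens: features are unigrams plus bigrams with binary or frequency values, and each feature is scored by Dunning’s log-likelihood ratio (G²) over a 2×2 table of document presence versus class. Function names and `top_n` are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def extract_features(text: str, binary: bool = True) -> Counter:
    """Unigram + bigram features with binary (presence) or frequency values."""
    tokens = re.findall(r"[a-z']+", text.lower())
    grams = tokens + [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]
    counts = Counter(grams)
    return Counter({g: 1 for g in counts}) if binary else counts


def log_likelihood_ratio(k11: int, k12: int, k21: int, k22: int) -> float:
    """G^2 = 2 * sum O * ln(O / E); rows: feature present/absent, cols: pos/neg."""
    table = [[k11, k12], [k21, k22]]
    total = k11 + k12 + k21 + k22
    rows = [k11 + k12, k21 + k22]
    cols = [k11 + k21, k12 + k22]
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            observed = table[i][j]
            if observed > 0:  # 0 * ln(0) is taken as 0
                expected = rows[i] * cols[j] / total
                g2 += observed * math.log(observed / expected)
    return 2 * g2


def select_features(docs: List[Tuple[str, str]], top_n: int) -> List[str]:
    """Rank features by G^2 between document presence and the pos/neg label."""
    n_pos = sum(1 for _, y in docs if y == "pos")
    n_neg = len(docs) - n_pos
    present = {"pos": Counter(), "neg": Counter()}
    for text, y in docs:
        present[y].update(extract_features(text, binary=True))  # document frequencies
    feats = set(present["pos"]) | set(present["neg"])
    scores = {f: log_likelihood_ratio(present["pos"][f], present["neg"][f],
                                      n_pos - present["pos"][f], n_neg - present["neg"][f])
              for f in feats}
    return sorted(feats, key=lambda f: (-scores[f], f))[:top_n]


docs = [("great lens", "pos"), ("great price", "pos"),
        ("poor lens", "neg"), ("poor battery", "neg")]
print(select_features(docs, 2))
```

“lens” occurs equally in both classes and scores G² = 0, so it is ranked below the class-specific words.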
Semi-supervised approaches
Fully supervised techniques require
• a large amount of labeled data for the given domain
Semi-supervised systems
• use a small amount of domain knowledge
  1. From a small set of seed words, use a domain corpus to get domain-relevant opinion words, as discussed earlier
Semi-supervised approach
Gamon & Aue 2005
1. Obtain opinion words by a semi-supervised approach
2. Given a domain corpus, label data using the average semantic orientation
3. Train a classifier on the labeled data
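Step 2 can be sketched as follows, assuming step 1 has already produced a dictionary of word semantic orientations (the values below are made up) and that documents whose average orientation lies within a hypothetical `margin` of zero are too uncertain to label. The resulting (document, label) pairs would then feed any supervised classifier in step 3.

```python
import re
from typing import Dict, List, Tuple


def label_by_average_so(docs: List[str], so: Dict[str, float],
                        margin: float = 0.0) -> List[Tuple[str, str]]:
    """Label unlabeled in-domain documents by the average semantic orientation
    of the lexicon words they contain; uncertain documents are skipped."""
    labeled = []
    for doc in docs:
        scores = [so[t] for t in re.findall(r"[a-z']+", doc.lower()) if t in so]
        if not scores:
            continue  # no opinion words: no evidence either way
        avg = sum(scores) / len(scores)
        if avg > margin:
            labeled.append((doc, "pos"))
        elif avg < -margin:
            labeled.append((doc, "neg"))
    return labeled


so = {"great": 2.0, "sturdy": 0.5, "flimsy": -1.5, "broke": -1.0}  # made-up SO scores
docs = ["Great and sturdy tripod", "Flimsy legs, it broke",
        "Arrived on Tuesday", "great but flimsy"]
print(label_by_average_so(docs, so, margin=0.5))
```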