Opinion Mining Using
Econometrics: A Case Study on
Reputation Systems
Anindya Ghose, Panagiotis G. Ipeirotis, and Arun
Sundararajan
Leonard N. Stern School of Business, New York
University
ACL 2007
Questions/Challenges
• What makes an opinion positive or
negative? Is there an objective measure
for this task?
• How can we rank opinions according to
their strength? Can we define an
objective measure for ranking opinions?
• How does the context change the
polarity and strength of an opinion and
how can we take the context into
consideration?
Introduction
• Reputation systems in electronic markets
• Pricing power of merchants in Amazon.com
• Using 9,500 transactions over 180 days
• Textual feedback and star rank
• Discover polarity and strength without the
need for human annotations or linguistic
resources.
Arguments
• Textual feedback affects the power of
merchants to charge higher prices than the
competition for the same product and still
make a sale.
Reputation Systems
• A reputation profile
– Past transactions for the merchant.
– Numerical ratings from buyers who have
completed transactions.
– Chronological list of textual feedback provided
by these buyers.
Price Premiums
• Price premium/ relative price premium/
relative average price premium
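The formulas on this slide did not survive extraction. The following is a minimal sketch, assuming that a price premium is the seller's sale price minus one competitor's listed price, that the relative premium divides this difference by the seller's price, and that the average variants take the mean over all competitors. The function names and the sample prices are hypothetical.

```python
from statistics import mean


def price_premiums(sale_price, competitor_prices):
    """Assumed definition: seller's price minus each competitor's listed price."""
    return [sale_price - p for p in competitor_prices]


def relative_price_premiums(sale_price, competitor_prices):
    """Assumed definition: each premium divided by the seller's own price."""
    return [(sale_price - p) / sale_price for p in competitor_prices]


def average_price_premium(sale_price, competitor_prices):
    """Mean premium over all competitors for one transaction."""
    return mean(price_premiums(sale_price, competitor_prices))


def average_relative_price_premium(sale_price, competitor_prices):
    """Mean relative premium over all competitors for one transaction."""
    return mean(relative_price_premiums(sale_price, competitor_prices))


# Illustrative numbers only (not taken from the study's data).
premiums = price_premiums(20.0, [18.0, 19.0, 22.0])
avg_premium = average_price_premium(20.0, [18.0, 19.0, 22.0])
avg_relative = average_relative_price_premium(20.0, [18.0, 19.0, 22.0])
```

Under this reading, one transaction yields one premium per competing seller, which would explain why the number of price premiums can be much larger than the number of transactions.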
Data
• Transaction Data:
– 1,078 merchants
– 9,487 unique transactions
– 107,922 price premiums
• Reputation Data:
– 4,932 postings per merchant on average
– Numerical ratings: one to five stars
– Reconstruct each seller’s exact feedback profile
at the time of each transaction
Econometrics-based
Opinion Mining
• Retrieving the dimensions of reputation
– Features are expressed by nouns, noun phrases,
verbs, and verb phrases.
– For example, X1 might be shipping and X2
might be packaging.
Reputation dimension example
• X=(delivery, packaging, service)
• I was impressed by the speedy delivery!
Great service! (post 1)
• The item arrived in awful packaging and the
delivery was slow. (post 2)
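To make the example concrete, here is a toy sketch of how the two posts could be turned into modifier-dimension pairs. It is not the authors' extraction method: it assumes a fixed list of dimensions and a hypothetical list of modifiers chosen to cover these two posts, and it pairs each dimension with the nearest modifier in the same sentence.

```python
import re

DIMENSIONS = ("delivery", "packaging", "service")
# Hypothetical modifier list, chosen only to cover the two example posts.
MODIFIERS = ("speedy", "great", "awful", "slow")


def extract_pairs(post):
    """Return (modifier, dimension) pairs, pairing each dimension with the
    nearest modifier found in the same sentence."""
    pairs = []
    for sentence in re.split(r"[.!?]+", post.lower()):
        tokens = re.findall(r"[a-z]+", sentence)
        mod_positions = [i for i, t in enumerate(tokens) if t in MODIFIERS]
        for i, token in enumerate(tokens):
            if token in DIMENSIONS and mod_positions:
                nearest = min(mod_positions, key=lambda m: abs(m - i))
                pairs.append((tokens[nearest], token))
    return pairs


post1 = "I was impressed by the speedy delivery! Great service!"
post2 = "The item arrived in awful packaging and the delivery was slow."
pairs1 = extract_pairs(post1)  # [("speedy", "delivery"), ("great", "service")]
pairs2 = extract_pairs(post2)  # [("awful", "packaging"), ("slow", "delivery")]
```

A real system would need part-of-speech information rather than fixed word lists, since the slides describe dimensions as nouns, noun phrases, verbs, and verb phrases.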
Scoring the dimension of
reputation
• Construct an n x p matrix M(si)
• A total of 151 unique dimensions, and a
total of 142 modifiers.
• c is the probability of clicking on the “Next” link.
• K is the number of postings that appear on
each page.
• Posting-specific weight
Posting-specific weight example
• The weight drops exponentially as the page
number increases.
• If K = 25 and there are 51 posts in total (numbered
oldest to newest, with the newest on page 1):
weight of post 1 (page 3) = c^2/(25+25c+c^2)
weight of post 26 (page 2) = c/(25+25c+c^2)
weight of post 51 (page 1) = 1/(25+25c+c^2)
• Score of modifier-dimension (feature) pair:
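The weighting in this example can be sketched as follows. This is a minimal illustration, assuming that a posting on page p receives a raw weight of c^(p-1), that weights are normalised to sum to 1, and that the score of a modifier-dimension pair is the weighted count of postings containing it. Postings are indexed here in display order (index 0 is the first posting on page 1); the value c = 0.5 and both function names are assumptions.

```python
def posting_weights(n_posts, k=25, c=0.5):
    """Normalised weight of each posting, in display order.

    A posting on page p (1-based) gets raw weight c ** (p - 1), so the
    weight decays exponentially with the page number.
    """
    raw = [c ** ((rank - 1) // k) for rank in range(1, n_posts + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def pair_score(postings, pair, k=25, c=0.5):
    """Weighted count of postings that contain the given pair.

    `postings` is a list of sets of (modifier, dimension) pairs, in
    display order.
    """
    weights = posting_weights(len(postings), k, c)
    return sum(w for w, pairs in zip(weights, postings) if pair in pairs)


weights = posting_weights(51, k=25, c=0.5)
normaliser = 25 + 25 * 0.5 + 0.5 ** 2  # 25 + 25c + c^2 with c = 0.5
```

With 51 postings and K = 25, the first 25 entries share the weight 1/(25+25c+c^2), the next 25 share c/(25+25c+c^2), and the single posting on page 3 gets c^2/(25+25c+c^2).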
Reputation Score
• Modifier-dimension pair score = strength
• Feature weight = polarity
• A correlation between the appearance of
modifier-dimension opinion phrases in the
merchants' feedback and the price premiums
observed for each transaction.
Scoring by regression
• Regressor coefficients
• Control variables
• Fixed effects
• The error term
• Score: counting appearances and weighting each
appearance using the definition of r_i
• Variations:
Regression Settings
• Predicting
• Control variables:
– The product’s price on Amazon
– The average star rating of the merchant
– The number of the merchant’s past transactions
– The number of sellers for the product
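The regression setup can be illustrated with an ordinary least squares fit. This is a simplified sketch, not the authors' specification: it omits the fixed effects and the error term, uses NumPy's `numpy.linalg.lstsq`, and runs on synthetic data generated from hypothetical coefficients, so none of the numbers correspond to the study.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Synthetic data for illustration only; none of these values come from the study.
# Columns: product price, average star rating, past transactions, number of sellers.
controls = rng.normal(size=(n, 4))
# Scores for two hypothetical modifier-dimension pairs.
text_scores = rng.random(size=(n, 2))

# Hypothetical coefficients used to generate the synthetic price premiums:
# intercept, four controls, two text variables.
true_beta = np.array([1.0, 0.5, -0.2, 0.1, 0.3, 2.0, -1.5])

X = np.column_stack([np.ones(n), controls, text_scores])
price_premium = X @ true_beta  # noiseless, so OLS recovers true_beta

beta_hat, *_ = np.linalg.lstsq(X, price_premium, rcond=None)
text_coefficients = beta_hat[-2:]  # estimated effect of each phrase score
```

In this framing, the coefficient attached to a modifier-dimension score is what gets read as the phrase's polarity and strength, expressed in the units of the price premium.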
Experimental Results
• Human Recall by two annotators, a random
sample of 1,000 posts:
• Computer Recall: average over two annotators
• Precision is not an issue: noise is filtered
out by the regression
Estimating polarity
and strength
• Good packaging = -$0.58
Price Premiums vs. Ratings
• Many studies assume that text feedback
does not influence buyers and use only
star ratings as a summary of opinions.
• Examine the R^2 fit of the regression, with
and without the text variables.
Without: R^2 = 0.35; With: R^2 = 0.63
• Text contains significantly more
information than the ratings.
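The comparison described above, fitting the same regression with and without the text variables and comparing R^2, can be sketched as follows. The data are synthetic and the coefficients hypothetical, so the output will not reproduce the 0.35 and 0.63 reported on the slide; the helper name `r_squared` is an assumption.

```python
import numpy as np


def r_squared(X, y):
    """In-sample R^2 of an OLS fit of y on X (X must include an intercept column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    centered = y - y.mean()
    return 1.0 - (residuals @ residuals) / (centered @ centered)


rng = np.random.default_rng(0)
n = 500

# Synthetic data for illustration only.
ratings = rng.normal(size=(n, 1))  # stands in for the star-rating variable
text = rng.normal(size=(n, 3))     # stands in for three text variables
y = ratings[:, 0] + text @ np.array([0.8, -0.6, 0.4]) + rng.normal(size=n)

intercept = np.ones((n, 1))
r2_without = r_squared(np.hstack([intercept, ratings]), y)
r2_with = r_squared(np.hstack([intercept, ratings, text]), y)
```

Because the second model nests the first, its in-sample R^2 can never be lower; the size of the gap is what indicates how much extra information the text variables carry.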
Prediction
• Predict which merchant will make a sale.
• C4.5 classifier, trained on 4 months of data
and tested on 2 months of data.
Possible application
• Examine the effect of product reviews on product
sales and detect the weight that customers put on
different product features.
• The analysis of the effect of news stories on stock
prices; how opinion holders and their wordings
can cause the market to move up or down.
• Extract the pragmatic effect of news and blogs on
elections or other political events.