ON THE PREDICTABILITY OF THE U.S. ELECTIONS THROUGH



CS 315 – Web Search and Data Mining
Overview
• The power of crowdsourcing
• Predicting flu outbreaks
• Predicting “the present” through Google Insights!
• Predicting movie success!
• Predicting elections!
• Predicting elections?
• What can (and cannot) be predicted
• How (not to) predict
Tracking Seasonal Flu through CDC.gov
• Map taken on April 18
• Based on reports from hospitals
• Takes a couple of weeks to record
google.org/flutrends/us
• Map taken on April 18
• Based on keywords being searched
• It is updated immediately
• Data can be downloaded and studied
Why does it work so well?
• “close relationship between how many people search for flu-related topics and how many people actually have flu symptoms”
Google Trends predicts flu outbreak!
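The “close relationship” quoted above can be made concrete as a simple linear fit. The sketch below follows the general form of the published Google Flu Trends model, a straight-line relationship between the logit of the flu-related query fraction and the logit of the fraction of doctor visits for flu-like illness. The function names and the weekly numbers are hypothetical and serve only as an illustration, not as Google's data or code.

```python
import math


def logit(p):
    """Log-odds of a proportion p, assumed to lie strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def inv_logit(z):
    """Inverse of logit: maps log-odds back to a proportion."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logit_model(query_frac, visit_frac):
    """Ordinary least squares for logit(visit_frac) = b0 + b1 * logit(query_frac)."""
    xs = [logit(q) for q in query_frac]
    ys = [logit(p) for p in visit_frac]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = my - b1 * mx
    return b0, b1


def estimate_visit_fraction(b0, b1, query_fraction):
    """Estimate this week's flu-visit fraction from its flu-query fraction."""
    return inv_logit(b0 + b1 * logit(query_fraction))


# Hypothetical weekly data: share of searches that are flu-related, and
# share of doctor visits for flu-like illness in the same weeks.
queries = [0.0010, 0.0015, 0.0022, 0.0030, 0.0041]
visits = [0.012, 0.018, 0.026, 0.035, 0.047]

b0, b1 = fit_logit_model(queries, visits)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
print(f"estimate for query fraction 0.0025: {estimate_visit_fraction(b0, b1, 0.0025):.4f}")
```

Once the two coefficients are fitted on past seasons, a new week's query fraction is available immediately. That is why the search-based map can be updated right away, while the hospital-report map lags by weeks.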
Observing the crowd
• It makes sense: people search for things they want to be informed about, including flu symptoms
• Another example: on which day of the week are there the most queries containing the term “hangover”?
• “Civil war”: what do you expect to see?
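The “hangover” question comes down to grouping query counts by day of the week. Here is a minimal sketch, assuming daily query counts for the term are already available. The dates and counts are invented, not real Google data, and the helper names are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Fixed English names so the result does not depend on the system locale.
WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday")


def totals_by_weekday(daily_counts):
    """Sum query counts per weekday (date.weekday() returns 0 for Monday)."""
    totals = defaultdict(int)
    for day, count in daily_counts.items():
        totals[WEEKDAYS[day.weekday()]] += count
    return dict(totals)


def busiest_weekday(daily_counts):
    """Return the weekday name with the largest total count."""
    totals = totals_by_weekday(daily_counts)
    return max(totals, key=totals.get)


# Hypothetical daily counts for the term "hangover".
counts = {
    date(2012, 4, 1): 95,   # Sunday
    date(2012, 4, 2): 40,   # Monday
    date(2012, 4, 3): 22,   # Tuesday
    date(2012, 4, 7): 61,   # Saturday
    date(2012, 4, 8): 88,   # Sunday
}
print(totals_by_weekday(counts))
print(busiest_weekday(counts))
```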
Predicting “the future”
• Sampled data
  • Not identical when repeated
  • Preserves privacy
• Normalized data
  • Peak at 100%
• You can disambiguate
  • Apple in computer & electronics
  • Apple in food & drink
• Downloadable
  • Must be logged in
[Screenshot: Google Insights filters for Geography, Time window, and Category]
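“Normalized data, peak at 100%” means the series is rescaled so that its highest point becomes 100, and every other point is expressed relative to that peak. The sketch below shows a minimal version of that rescaling. It assumes the input values are already comparable query shares. Google's own sampling and preprocessing steps are not reproduced here, and `normalize_to_peak` is a hypothetical name.

```python
def normalize_to_peak(series, peak=100.0):
    """Rescale a non-negative series so that its maximum equals `peak`.

    Assumes raw values are already comparable (e.g., weekly query share);
    Google's sampling and other preprocessing are not modeled here.
    """
    top = max(series)
    if top == 0:
        return [0.0] * len(series)
    return [peak * value / top for value in series]


# Hypothetical weekly search shares for one term.
raw = [2, 4, 8, 6]
print(normalize_to_peak(raw))
```

Each series is scaled to its own peak. As a result, a value of 100 in one chart and a value of 100 in another do not correspond to the same absolute search volume.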
Basic Econometrics Forecasting Models
• Autoregressive: value at time t depends on value at time t-1
• Seasonal adjustment: value at time t depends on value at time t-12
• Transfer function: value at time t depends on other contemporaneous or lagging variables
• Seasonal autoregressive transfer model: value at time t depends on
  • Value at time t-12 (seasonality)
  • Value at time t-1 (recent behavior)
  • Other lagging or contemporaneous variables (such as Google Trends data)
• Typical question of interest
  • How much more accurate forecasts can you get from additional variables over and above the accuracy you get with the history of the time series itself?
[Figures: Real Estate Agencies Query Index, 2006 to 2008; Real Estate Agencies YOY Growth Index]
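Written out, the seasonal autoregressive transfer model is a linear regression, y_t = b0 + b1·y_{t-1} + b2·y_{t-12} + b3·x_t + e_t, where x_t is a Trends series. The sketch below fits this model by ordinary least squares using only the standard library. It then compares the fit with the same model without x_t, which is a small-scale version of the "typical question of interest". The function names and the synthetic data are hypothetical. A real analysis would use a statistics package and evaluate forecasts out of sample.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(rows, targets):
    """Least-squares coefficients from the normal equations (X^T X) beta = X^T y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve(xtx, xty)


def design(y, x=None):
    """Regressors [1, y[t-1], y[t-12]] (plus x[t] if given) for t = 12 .. len(y)-1."""
    rows, targets = [], []
    for t in range(12, len(y)):
        row = [1.0, y[t - 1], y[t - 12]]
        if x is not None:
            row.append(x[t])
        rows.append(row)
        targets.append(y[t])
    return rows, targets


def sse(rows, targets, beta):
    """Sum of squared in-sample residuals."""
    return sum((y - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, y in zip(rows, targets))


# Synthetic monthly data (hypothetical): x stands in for a Trends index.
x = [float((7 * t) % 13) for t in range(60)]
y = [10.0 + (3 * t) % 5 for t in range(12)]
for t in range(12, 60):
    y.append(5.0 + 0.5 * y[t - 1] + 0.3 * y[t - 12] + 2.0 * x[t])

base_rows, base_y = design(y)
full_rows, full_y = design(y, x)
beta_base = ols(base_rows, base_y)
beta_full = ols(full_rows, full_y)
print("history only SSE:", round(sse(base_rows, base_y, beta_base), 3))
print("with Trends SSE: ", round(sse(full_rows, full_y, beta_full), 3))
```

The synthetic series is built with no noise, so the full model recovers its coefficients almost exactly. On real data, the gain from adding x_t is usually much smaller, and it should be measured on months the model has not seen.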
Analysis and Forecasting
Method: fit the model to the other data as well as possible, then add Trends data and check whether the prediction improves
Model:
$Y_t = 446.1 + 0.864\,Y_{t-1} - 4.340\,\mathrm{us378.1} + 4.198\,\mathrm{us96.2} - 0.001\,\mathrm{AvgP}_{t-1}$
$Y_t$: new houses sold in month $t$
$\mathrm{AvgP}_{t-1}$: average sales price of new one-family houses sold in month $t-1$
$\mathrm{us378.1}$: Google Trends index for vertical id 378 (Rental Listings & Referrals), 1st week of month $t$
$\mathrm{us96.2}$: Google Trends index for vertical id 96 (Real Estate Agent), 2nd week of month $t$
July 2008: actual = 515K, predicted = 442.98K, z-score = 2.53
August 2008: prediction = 417.52K
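Using the fitted equation is plain arithmetic. The sketch below evaluates the slide's coefficients for a given set of inputs. It also computes a z-score as (actual − predicted) / σ, where σ is the standard deviation of the model's residuals. That definition of the z-score is an assumption, since the slide does not spell it out. The input values in the test are hypothetical placeholders and are not the data behind the July 2008 numbers.

```python
def predict_home_sales(y_prev, us378_1, us96_2, avg_price_prev):
    """Evaluate the slide's fitted model for month t.

    y_prev:          new houses sold in month t-1 (same units as the model)
    us378_1:         Trends index, vertical 378, 1st week of month t
    us96_2:          Trends index, vertical 96, 2nd week of month t
    avg_price_prev:  average sales price of new one-family houses, month t-1
    """
    return (446.1 + 0.864 * y_prev - 4.340 * us378_1
            + 4.198 * us96_2 - 0.001 * avg_price_prev)


def z_score(actual, predicted, residual_sd):
    """Standardized forecast error, assuming z = (actual - predicted) / residual_sd."""
    return (actual - predicted) / residual_sd


# Residual standard deviation implied by the July 2008 figures, *if* the
# slide's z-score follows the assumed definition above.
implied_sd = (515 - 442.98) / 2.53
print(round(implied_sd, 2))
```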
Google Trends “can predict the present”
Predicted with Google Trends
• Home sales
• Movie box-office success
• Product sales (e.g., video games)
• Travel to Hong Kong
• Unemployment rates
• …Consumer behavior, in general? (Goel paper)
• Is there anything that could NOT be predicted with Google Trends?
• Is Twitter chat volume as good?
Twitter Predicts Movie Box-Office Sales!
Movie buzz creates tweets…
• The rate at which movie tweets are generated can be used to build a powerful model for predicting movie box-office revenue (better than the “gold-standard” Hollywood Stock Exchange)
• Tweet-rate(movie) = tweets(movie)/hour
• Predictions (linear regression):
  • Data from the 7 days before release
  • Theater count: #theaters playing
  • HSX (Hollywood Stock Exchange) index
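The tweet-rate formula and a linear regression on it can be sketched as follows. Tweet-rate is computed as on the slide, in tweets per hour. The regression uses tweet-rate and theater count as predictors of revenue and is fitted by ordinary least squares. The choice of predictors is an assumption about how the slide's variables are combined. The function names are hypothetical, and the movie figures are synthetic values that were built to be exactly linear.

```python
def tweet_rate(n_tweets, hours):
    """Tweets per hour about a movie: tweets(movie) / hour."""
    return n_tweets / hours


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_two_predictors(x1, x2, y):
    """OLS for y = b0 + b1*x1 + b2*x2, solving the normal equations by Cramer's rule."""
    rows = [(1.0, a, b) for a, b in zip(x1, x2)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(3)]
    d = det3(xtx)
    coefs = []
    for k in range(3):
        # Cramer's rule: replace column k of X^T X with X^T y.
        mk = [[xty[i] if j == k else xtx[i][j] for j in range(3)] for i in range(3)]
        coefs.append(det3(mk) / d)
    return coefs


# Hypothetical movies: tweets in the 7 days (168 hours) before release.
tweets_week = [16800, 42000, 67200, 134400, 201600]
theaters = [1500, 2500, 2000, 3500, 3000]
rates = [tweet_rate(n, 7 * 24) for n in tweets_week]
# Synthetic opening revenue (in $M), generated from a known linear rule.
revenue = [2.0 + 0.05 * r + 0.003 * t for r, t in zip(rates, theaters)]

b0, b1, b2 = fit_two_predictors(rates, theaters, revenue)
print(f"revenue ~ {b0:.3f} + {b1:.4f} * tweet_rate + {b2:.4f} * theaters")
```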
Twitter monitors Poll Sentiment (!)
For more information, see “oconnor – tweets to polls AAPOR panel.ppt”
Smoothed (15-day) comparisons
SentimentRatio(“jobs”)
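SentimentRatio can be sketched as the daily ratio of positive to negative messages that mention a keyword, followed by a trailing moving average over the previous k days (k = 15 on the slide). The positive and negative labels would come from a sentiment lexicon, which is assumed here. The daily counts and the function names are hypothetical.

```python
def sentiment_ratio(pos_counts, neg_counts):
    """Daily ratio of positive to negative keyword messages.

    Assumes every day has at least one negative message (no zero division).
    """
    return [p / n for p, n in zip(pos_counts, neg_counts)]


def trailing_average(values, k):
    """Mean of the last k values at each day (fewer at the start of the series)."""
    out = []
    for i in range(len(values)):
        window = values[max(0, i - k + 1): i + 1]
        out.append(sum(window) / len(window))
    return out


# Hypothetical daily counts of "jobs" messages labeled positive / negative.
pos = [120, 95, 130, 110, 140, 125, 100]
neg = [80, 90, 85, 100, 70, 95, 90]
smoothed = trailing_average(sentiment_ratio(pos, neg), k=15)
print([round(v, 3) for v in smoothed])
```

Smoothing over a trailing window makes each day's value depend only on past data. This matters if the series is used to anticipate polls rather than to describe them after the fact.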
US Presidential elections not predicted
• 2008 elections
  • SR(“obama”) and SR(“mccain”) sentiment do not correlate with the polls
  • But “obama” and “mccain” volume: r = .79, .74 (!)
  • Simple indicator of election news?
• 2009 job approval
  • SR(“obama”): r = .72
  • Looks easier: simple decline
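The r values above are correlation coefficients between a Twitter-derived series and poll numbers, presumably Pearson's r. For reference, Pearson's r can be computed as follows. The two short series are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical daily message volume vs. poll share for one candidate.
volume = [310, 290, 350, 400, 380, 420]
poll = [44.0, 43.5, 45.0, 46.5, 46.0, 47.0]
print(round(pearson_r(volume, poll), 3))
```

On Python 3.10 or later, `statistics.correlation` computes the same quantity. A high r says only that two series move together. As the slide suggests for volume, both series may simply be tracking the amount of election news.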
In the meantime, in Germany…
Twitter can Predict Elections (?!)
For more info, see “icwsm2010_Tumasjan-Predicting elections with Twitter.pdf”
Not so fast, speedy…
• It seems that they forgot the party with the biggest tweet share…
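The German study takes a party's share of party-mentioning tweets as an estimate of its vote share and scores the estimate with the mean absolute error (MAE) against the actual results. Below is a minimal sketch of that computation. The party names and numbers are placeholders, not the 2009 data. The sketch also makes the criticism concrete: shares depend on which parties are counted, so adding a party with many tweets changes every other party's share.

```python
def shares(counts):
    """Convert raw counts per party into percentage shares of their total."""
    total = sum(counts.values())
    return {party: 100.0 * c / total for party, c in counts.items()}


def mean_absolute_error(predicted, actual):
    """Average absolute gap, in percentage points, over the parties in `actual`."""
    return sum(abs(predicted[p] - actual[p]) for p in actual) / len(actual)


# Hypothetical tweet counts and vote shares (percent).
tweets = {"A": 400, "B": 350, "C": 250}
votes = {"A": 38.0, "B": 36.0, "C": 26.0}
print(mean_absolute_error(shares(tweets), votes))

# Counting one more (hypothetical) party with many tweets shifts every share.
tweets_with_d = dict(tweets, D=500)
print(shares(tweets_with_d))
```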
Maybe Google Trends can predict US Elections…
Can Google Trends predict elections?
• 2008 US Congressional Elections data collection
• 2010 US Congressional Elections data collection
• The competitors for prediction:
US congressional elections 2008 & 2010

                    2008   2010
Total races          413    441
House races          381    408
Senate races          32     33
Highly contested      61    125
Democrats            237    200
Republicans          177    241

“Landslide win”: Democrats in 2008, Republicans in 2010
Prediction of all races
(unfair to Google Trends)
Prediction of races where
one candidate had no G-trends visibility
Prediction of races where
both candidates had G-trends visibility
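The comparisons above can be sketched with the following prediction rule. In each race, predict that the candidate with the higher Google Trends search volume wins, then score that prediction against the actual winner. This sketch assumes that rule. It also shows how races can be split by whether every candidate has some Trends visibility (non-zero volume). The race records and helper names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Race:
    name: str
    volume: dict   # candidate -> Trends search volume (0 = no visibility)
    winner: str


def predict_winner(race):
    """Predict the candidate with the highest search volume (ties: first in dict order)."""
    return max(race.volume, key=race.volume.get)


def accuracy(races):
    """Fraction of races whose predicted winner matches the actual winner."""
    if not races:
        return float("nan")
    hits = sum(predict_winner(r) == r.winner for r in races)
    return hits / len(races)


def split_by_visibility(races):
    """Split races into (every candidate visible, at least one candidate invisible)."""
    both = [r for r in races if all(v > 0 for v in r.volume.values())]
    partial = [r for r in races if not all(v > 0 for v in r.volume.values())]
    return both, partial


# Hypothetical races.
races = [
    Race("R1", {"A": 50, "B": 20}, "A"),
    Race("R2", {"C": 0, "D": 30}, "C"),
    Race("R3", {"E": 10, "F": 40}, "E"),
]
both, partial = split_by_visibility(races)
print(accuracy(races), accuracy(both), accuracy(partial))
```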
What about the one success case?
Conclusions
• Google Trends: bad predictor of election results
• Google Trends: good predictor of election defeat!
• But what about other social media?
• What do YOU think?
High G-trends may be bad news!
• Liberal activists openly collaborate to Google-bomb search results of political opponents in 2006
• Liberal activists try again unsuccessfully in 2010
• Conservative activists launch a Twitter-bomb in Jan. 2010