G18 - Spatial Database Group

DNC: Big Data and Data Mining in the 2012 US Election
Azamat Kamzin
Mandar Bhide
Overview
• Highlights of Narwhal
• System Organization
• Classification
• Association Patterns
• Predictive Models
• References
Highlights
• Codename: Narwhal
• Budget: $100 million
• Lead developer: Scott VanDenPlas
• Chief Analytics Officer: Dan Wagner
• Team: approx. 200 members
• General objective:
  o Bring together information on voters, supporters, and donors in one place (unlike in 2008, when the information was split across 6 different servers/vendors)
• It was among the 20 largest consumer/customer databases ever built
  o Size, per a tweet by VanDenPlas:
    ▪ “4Gb/s, 10k requests per second, 2,000 nodes, 3 datacenters, 180TB and 8.5 billion requests...”
    ▪ (Service provider: Amazon cloud)
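The tweet's figures can be cross-checked with back-of-envelope arithmetic. The sketch below is ours, not the campaign's, and rests on assumptions the tweet does not state: decimal units (1 TB = 1,000 GB), "Gb/s" meaning gigabits per second, the 4Gb/s and 10k requests-per-second figures describing the same peak, and the 180TB being spread evenly over the 2,000 nodes.

```python
# Back-of-envelope checks on the figures quoted in VanDenPlas's tweet.
# Assumptions (ours, not stated in the tweet): decimal units, "Gb" = gigabits,
# the 4 Gb/s and 10k requests/s figures describe the same peak, and the data
# is spread evenly across nodes.
TOTAL_REQUESTS = 8_500_000_000   # "8.5 billion requests"
PEAK_REQUESTS_PER_S = 10_000     # "10k requests per second"
PEAK_BITS_PER_S = 4_000_000_000  # "4Gb/s"
STORAGE_TB = 180                 # "180TB"
NODES = 2_000                    # "2,000 nodes"
SECONDS_PER_DAY = 86_400

# How long the total request count would take at a sustained peak rate.
seconds_at_peak = TOTAL_REQUESTS / PEAK_REQUESTS_PER_S
days_at_peak = seconds_at_peak / SECONDS_PER_DAY

# Average response size implied by the peak bandwidth and request rate.
bytes_per_request = PEAK_BITS_PER_S / 8 / PEAK_REQUESTS_PER_S

# Average storage per node if the 180 TB were spread evenly.
gb_per_node = STORAGE_TB * 1_000 / NODES

print(f"{seconds_at_peak:,.0f} s at peak = {days_at_peak:.1f} days")
print(f"~{bytes_per_request / 1_000:.0f} KB per request")
print(f"~{gb_per_node:.0f} GB per node")
```

Under these assumptions, 8.5 billion requests equal roughly ten days of traffic at the quoted peak rate, each request averages about 50 KB, and each node holds about 90 GB.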
System Organization
(Diagram, reconstructed from the slide labels)
• Data sources (data collection/enrichment):
  o 2008 voter databases
  o Private/public databases
  o Automated survey calls: 1.2 million per day
  o Tracking visitors' online behavior using cookies
• Narwhal: single store combining the collected data
• DreamCatcher: analysis of the combined data
• Outputs and uses:
  o Level of support for Obama
  o Likelihood to vote
  o Estimated donation amount
  o Directing volunteers to the right door
  o Sending the right email/ad to the right person
  o Calls/emails to motivate the voter
  o Best channel and time slot to advertise
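Narwhal's stated goal was to bring these sources together in one place. A minimal sketch of that idea follows, assuming (hypothetically) that every source already shares a common `voter_id`; the function `merge_sources`, the source names, the field names, and the values are all invented for illustration.

```python
from typing import Any, Iterable

Record = dict[str, Any]


def merge_sources(*sources: tuple[str, Iterable[Record]]) -> dict[str, Record]:
    """Fold records from several named sources into one profile per voter.

    Assumes every record carries a shared, hypothetical "voter_id" key.
    When two sources disagree on a field, the first source listed wins.
    """
    profiles: dict[str, Record] = {}
    for source_name, records in sources:
        for record in records:
            voter_id = record["voter_id"]
            profile = profiles.setdefault(voter_id, {"voter_id": voter_id, "sources": []})
            profile["sources"].append(source_name)  # remember where data came from
            for field, value in record.items():
                profile.setdefault(field, value)  # keep the earlier value on conflict
    return profiles


# Hypothetical records from three of the source types on the slide.
voter_file = [{"voter_id": "V1", "voted_2008": True}]
phone_survey = [{"voter_id": "V1", "support": 72}, {"voter_id": "V2", "support": 30}]
web_cookies = [{"voter_id": "V1", "visited_donate_page": True}]

narwhal = merge_sources(
    ("2008_voter_file", voter_file),
    ("phone_survey", phone_survey),
    ("web_cookies", web_cookies),
)
print(narwhal["V1"])
```

Real voter data rarely shares a clean key, so matching records across sources (record linkage) is usually the hard part; it is not shown here.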
DreamCatcher: Voter Classification
• Voters were classified into 4 categories
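The slide does not name the 4 categories. One plausible way to arrive at four, offered purely as an illustrative assumption, is a 2x2 split on two scores listed under System Organization: level of support for Obama and likelihood to vote. The 0-100 scale, the 50-point threshold, the `Voter` type, and the category labels are all hypothetical.

```python
from typing import NamedTuple


class Voter(NamedTuple):
    voter_id: str
    support: float  # hypothetical 0-100 score: likelihood of supporting Obama
    turnout: float  # hypothetical 0-100 score: likelihood of voting


def classify(voter: Voter, threshold: float = 50.0) -> str:
    """Place a voter in one of four hypothetical categories (a 2x2 grid)."""
    supporter = voter.support >= threshold
    likely_voter = voter.turnout >= threshold
    if supporter and likely_voter:
        return "supporter, likely to vote"
    if supporter:
        return "supporter, unlikely to vote"
    if likely_voter:
        return "non-supporter, likely to vote"
    return "non-supporter, unlikely to vote"


voters = [Voter("V1", 82, 91), Voter("V2", 77, 30), Voter("V3", 20, 85), Voter("V4", 35, 15)]
for v in voters:
    print(v.voter_id, classify(v))
```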
DreamCatcher: Association Patterns
• Output: detailed profile of voters
• Inputs: attributes of each individual stored in Narwhal
  o Voting history
  o Social media likes and comments
  o Volunteering
  o Magazine subscriptions
  o Car registrations
  o Insurance data
  o Individual private information from data firms such as Aristotle
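The slide gives the inputs and output but not the method. A standard way to find association patterns among such attributes is to score rules by support and confidence, the measures used in Apriori-style rule mining. The sketch below is limited to attribute pairs and is our illustration, not DreamCatcher's documented algorithm; the function `pairwise_rules` and the attribute labels in each profile are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pairwise_rules(
    profiles: list[set[str]], min_support: float
) -> dict[tuple[str, str], tuple[float, float]]:
    """Return {(lhs, rhs): (support, confidence)} for frequent attribute pairs.

    support(A, B)    = fraction of profiles containing both A and B
    confidence(A->B) = count(A and B) / count(A)
    """
    n = len(profiles)
    item_counts = Counter(item for p in profiles for item in p)
    pair_counts = Counter(pair for p in profiles for pair in combinations(sorted(p), 2))
    rules = {}
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # drop pairs below the minimum support threshold
        for lhs, rhs in ((a, b), (b, a)):  # score the rule in both directions
            rules[(lhs, rhs)] = (support, count / item_counts[lhs])
    return rules


# Hypothetical voter profiles, each a set of attribute labels.
profiles = [
    {"voted_2008", "volunteered", "likes_campaign_page"},
    {"voted_2008", "likes_campaign_page"},
    {"voted_2008", "volunteered"},
    {"owns_hybrid_car", "likes_campaign_page"},
]
rules = pairwise_rules(profiles, min_support=0.5)
for (lhs, rhs), (sup, conf) in sorted(rules.items(), key=lambda kv: -kv[1][1]):
    print(f"{lhs} -> {rhs}: support={sup:.2f}, confidence={conf:.2f}")
```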
Predictive Models
• A/B testing:
  ▪ To find out which image or text draws a higher user response
  ▪ Ex. “Learn More” garnered 18.6 percent more signups per visitor than the default of “Sign Up.”
• Time series analysis:
  ▪ To track approval and disapproval trends
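The 18.6 percent figure is a relative lift in signups per visitor. A minimal sketch of how such a result could be computed and checked for significance with a pooled two-proportion z-test follows. The visitor and signup counts are hypothetical, chosen only so the lift comes out at 18.6 percent; the source gives neither the campaign's actual counts nor its test procedure.

```python
import math
from dataclasses import dataclass


@dataclass
class Variant:
    name: str
    visitors: int
    signups: int

    @property
    def rate(self) -> float:
        """Signups per visitor."""
        return self.signups / self.visitors


def relative_lift(control: Variant, treatment: Variant) -> float:
    """Fractional increase in signups per visitor of treatment over control."""
    return treatment.rate / control.rate - 1.0


def two_proportion_z_test(control: Variant, treatment: Variant) -> tuple[float, float]:
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    pooled = (control.signups + treatment.signups) / (control.visitors + treatment.visitors)
    se = math.sqrt(pooled * (1 - pooled) * (1 / control.visitors + 1 / treatment.visitors))
    z = (treatment.rate - control.rate) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Hypothetical counts for the two button labels.
control = Variant("Sign Up", visitors=10_000, signups=1_000)
treatment = Variant("Learn More", visitors=10_000, signups=1_186)

z, p = two_proportion_z_test(control, treatment)
print(f"lift = {relative_lift(control, treatment):.1%}, z = {z:.2f}, p = {p:.2g}")
```

With samples this large, an 18.6 percent lift is far from chance; with a few hundred visitors per variant, the same lift could easily fail the test, which is why the counts matter as much as the percentage.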
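A simple way to see an approval trend through noisy poll-to-poll swings is a trailing moving average of net approval (approve minus disapprove). The sketch below applies that technique to invented weekly poll numbers; the slide does not say which time-series method the campaign used, and `moving_average` is a hypothetical helper.

```python
def moving_average(values: list[float], window: int) -> list[float]:
    """Trailing simple moving average; one output per complete window."""
    if window < 1:
        raise ValueError("window must be at least 1")
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


# Hypothetical weekly poll results (percent), oldest first.
approve = [48, 50, 49, 51, 52, 50, 53]
disapprove = [47, 46, 48, 45, 44, 46, 43]

net = [a - d for a, d in zip(approve, disapprove)]
smoothed = moving_average(net, window=3)

# Compare the first and last smoothed values to label the overall direction.
trend = "improving" if smoothed[-1] > smoothed[0] else "flat or worsening"
print("net approval:", net)
print("3-week average:", [round(x, 2) for x in smoothed])
print("trend:", trend)
```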
Predictive Models (continued)
• Regression
  o Used to estimate electoral votes (dependent variable) from top issues such as the economy and healthcare
  o Packages used: SAS, R, and MATLAB
• Decision trees
  o We do not believe they used decision trees, given the large number of attributes, which differ from individual to individual
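A minimal ordinary-least-squares sketch of the regression idea: fit a dependent variable (standing in for electoral votes) as a linear function of issue variables by solving the normal equations (X^T X) b = X^T y. The campaign used SAS, R, and MATLAB; this pure-Python version, the helpers `solve` and `ols`, and the data are illustrative only. The data are generated from a known linear rule so the fit can be checked.

```python
def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):  # eliminate entries below the pivot
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(features: list[list[float]], y: list[float]) -> list[float]:
    """Return [intercept, b1, b2, ...] minimizing the sum of squared errors."""
    X = [[1.0, *row] for row in features]  # prepend an intercept column
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical issue scores [economy, healthcare]; outcome = 10 + 2*economy + 3*healthcare.
issues = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
outcome = [18, 17, 28, 27, 35]
coefs = ols(issues, outcome)
print("intercept, economy, healthcare:", [round(c, 3) for c in coefs])
```

Because the made-up outcome is an exact linear function of the issue scores, the fit recovers the generating coefficients; on real polling data the residuals would be nonzero.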
References
• Michael Scherer (November 8, 2012). “How Obama's data crunchers helped him win.” Retrieved from http://www.cnn.com/2012/11/07/tech/web/obamacampaign-tech-team
• Sasha Issenberg (December 19, 2012). “How President Obama’s campaign used big data to rally individual voters.” Retrieved from http://www.technologyreview.com/featuredstory/509026/how-obamas-team-used-bigdata-to-rally-voters/