Transcript Document

A GACP and GTMCP company
How to perform predictive analysis on your
web analytics tool data
January 23rd, 2014
7/18/2015
#tatvicwebinar
3 Types of Analytics…
• Descriptive: What has happened?
• Predictive: Predicts the outcome or future
• Prescriptive: What should happen?
Scope for Today
• Descriptive: What has happened?
• Predictive: Predicts the outcome or future
• Prescriptive: What should happen?
In other words…
Predictive Analytics
“Technology that learns from experience (data) to predict the future behavior of individuals in order to drive better decisions.”
Source: Siegel, E. (2013) “Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die.”
Why it matters
Applications
Generic mental model
• Individual suggestions, preferences, clustering, and reaction of cohorts to stimuli
• Predict failures and errors that cost big
• Economic value of prediction
Myth Busting
• You don’t need to be a PhD statistician to build predictive models
• A predictive model shouldn’t be a black box
• Even if you know your data, modeling can help
• Predictive models can be implemented quickly
• Predictive models enhance human judgment, not replace it
Outline
Predictive Analytics
• Tool: R
• Data: Google Analytics
• Model: Logistic Regression
Outline
Predictive Analytics
• Tool: R
• Data: Google Analytics
• Model: Logistic Regression
• Visualization
Introduction to R
What
• Open source statistical computing language, widely used by organizations to solve business problems.
Applications
• Data Analysis
• Statistical Tests
• Data Visualization
• Predictive Model
• Forecasting
Why
• Easy to integrate
• Data frame
• Pre developed packages
How to get started
• Download and install
• Choose and download a user-friendly GUI (RStudio)
R Packages
Categories of Packages
• Data Extraction
• Data Visualization
• Time Series
• Machine Learning
For this webinar
• RGoogleAnalytics (Data Extraction)
Usage: To extract Google Analytics data into R
Contributors: Michael Pearmain, Nick Mihailovski, Amar Gondaliya and Vignesh Prajapati
• ggplot2 (Data Visualization)
Usage: Build plots and charts
Contributor: Hadley Wickham
Outline of this webinar
Predictive Analytics
• Tool: R
• Data: Google Analytics
• Model: Logistic Regression
• Visualization
Google Analytics data
Extracting your GA data into R
Participants: User performing data extraction, Google OAuth2 Authorization Server, Google Analytics API
• Access Token Request
• Access Token Response
• Call API for list of profiles
• Call API for query
Business Problem
Product return
“Returns are on the rise, up 19% from 2007. For every US$1 spent on merchandise, 9¢ are returned.”
“Average return rate for ecommerce retailers varies from 3-12%.”
Source: Time Magazine, Sept. 04th, 2012
Product Return Impact (per day)
Average Return Rate        9%         7%
Average Order Value        $100       $100
Orders Per Day             500        500
Total Income               $50,000    $50,000
Loss due to returns        $4,500     $3,500
Revenue post loss          $45,500    $46,500
Increase in Revenue/day    -----      $1,000

Increase in Revenue with recovered returns in long run
Month (x30)    $30,000
Year (x365)    $365,000
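The arithmetic behind the table can be reproduced in a few lines of R. This is only a sketch using the slide's figures; the function name `revenue_post_loss` and its argument names are made up for illustration.

```r
# Daily revenue after returns, using the figures from the slide.
# revenue_post_loss() is a hypothetical helper, not part of any package.
revenue_post_loss <- function(return_rate, order_value = 100, orders_per_day = 500) {
  total_income <- order_value * orders_per_day   # $50,000 per day
  loss <- total_income * return_rate             # value of returned orders
  total_income - loss
}

current  <- revenue_post_loss(0.09)  # 9% return rate
improved <- revenue_post_loss(0.07)  # 7% return rate

daily_gain   <- improved - current   # increase in revenue per day
monthly_gain <- daily_gain * 30
yearly_gain  <- daily_gain * 365

print(c(day = daily_gain, month = monthly_gain, year = yearly_gain))
```

Changing `return_rate`, `order_value` or `orders_per_day` shows how sensitive the yearly figure is to each assumption.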
Data Introduction
Transactional Data
• Pre Purchase Data: Browsing behavior up to shopping cart
• In Purchase Data: Purchase behavior from shopping cart to thank you page
• Post Purchase Data: Delivery period, location, amount of time to deliver,
Modeling
Loading Input Data
Introducing Model Variables
Model Creation
Model Performance
Applying Model to Test Data
Machine Learning Tech.
Supervised Learning
Generates a function that maps inputs (labeled data) to desired outputs (e.g. spam detection)
Supervised Learning Model
• Training data (variables and labels) goes into the machine learning algorithm, which produces the predictive model
• Test data (variables only) goes into the predictive model, which produces the predicted outcome labels
• Labels are the right answers from historical data
• E.g. spam detector: the input data contains emails marked Spam/No Spam
Feature engineering
Going beyond algorithms and using domain knowledge to add new variables to the model
• E.g.: Products purchased as gifts are less likely to be returned
• Create a new variable with binary values: 1 – product purchased as gift, 0 – otherwise
• Products purchased in holiday season are more likely to be returned
• Based on purchase date, create a new variable with binary values: 1 – product purchased in the months Nov-Dec, 0 – otherwise
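The two derived variables described above could be built in R roughly as follows. The data frame and its column names (`purchase_date`, `gift_wrap`) are hypothetical; the webinar does not show how gift purchases are identified in the real data, so the gift flag here is an assumption.

```r
# Hypothetical transaction data; all column names and values are
# invented for illustration only.
transactions <- data.frame(
  order_id      = 1:4,
  purchase_date = as.Date(c("2013-11-29", "2013-07-04", "2013-12-20", "2013-03-15")),
  gift_wrap     = c("yes", "no", "no", "yes"),
  stringsAsFactors = FALSE
)

# 1 = product purchased as a gift, 0 = otherwise
# (assumes a gift-wrap flag marks gift purchases)
transactions$is_gift <- as.integer(transactions$gift_wrap == "yes")

# 1 = purchased in November or December, 0 = otherwise
purchase_month <- as.integer(format(transactions$purchase_date, "%m"))
transactions$is_holiday_season <- as.integer(purchase_month %in% c(11, 12))

print(transactions)
```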
Predictor/Response Variables
[Figure: Price of House ($) (response variable) plotted against Size of House (sq ft) (predictor variable)]
Generalized Linear Models
glm(formula, family, data)
Formula
Response ~ Predictor (this argument specifies which variables are independent (predictor) variables and which are dependent (response) variables)
Family
Binomial (since the output variable, product return, is defined as a binary value, 0 or 1, we use the binomial family)
Data
Train data set: this data set contains the values of all 18 variables (i.e. the values of both the dependent and the independent variables are given). This dataset is also called labeled data.
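A minimal sketch of such a `glm()` call is shown below. The training data is simulated, and the two predictors (`is_gift`, `is_holiday_season`) and the response name `returned` are assumptions; the webinar's actual data set has 18 variables that are not listed here.

```r
set.seed(42)  # make the simulated data reproducible

# Simulated labeled training data (NOT the webinar's data set):
# two hypothetical binary predictors and a binary response.
n <- 500
train <- data.frame(
  is_gift           = rbinom(n, 1, 0.3),
  is_holiday_season = rbinom(n, 1, 0.2)
)
# Simulated response: gifts returned less often, holiday purchases more often.
log_odds <- -2 - 1.0 * train$is_gift + 1.2 * train$is_holiday_season
train$returned <- rbinom(n, 1, plogis(log_odds))

# Logistic regression: response ~ predictors, binomial family, training data.
model <- glm(returned ~ is_gift + is_holiday_season,
             family = binomial, data = train)

print(summary(model))  # coefficients are on the log-odds scale

# Predicted probability of return for the first few training rows
print(head(predict(model, type = "response")))
```

With `type = "response"`, `predict()` returns probabilities between 0 and 1 instead of log-odds, which is the scale needed for a probability cut-off.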
Summary
[Figure: Number of Transactions by Probability of Product Returns, split into > 60% and ≤ 60%]
• Call customer before shipping
• Send discount coupon to encourage the customer to make a future purchase
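Splitting transactions at the 60% cut-off might look like the sketch below. The probabilities are made-up values standing in for model output, and treating the "> 60%" group as the one to call before shipping is an assumption about how the summary is meant to be used.

```r
# Hypothetical predicted return probabilities for five new transactions;
# in practice these would come from the fitted model's predictions.
predicted <- data.frame(
  order_id    = 101:105,
  return_prob = c(0.82, 0.35, 0.60, 0.61, 0.10)
)

threshold <- 0.60  # cut-off used on the slide

# Label each transaction by whether its probability exceeds the cut-off.
predicted$segment <- ifelse(predicted$return_prob > threshold, "> 60%", "<= 60%")

# Number of transactions in each segment
print(table(predicted$segment))

# Orders flagged for follow-up (assumed action: call before shipping)
high_risk <- predicted$order_id[predicted$return_prob > threshold]
print(high_risk)
```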
Thank you!