
Consumer Behavior Prediction
using Parametric and
Nonparametric Methods
Elena Eneva
Carnegie Mellon University
25 November 2002
[email protected]
Recent Research Projects

Dimensionality Reduction Methods and Fractal Dimension
(with Christos Faloutsos)

Learning to Change Taxonomies
(with Valery Petrushin, Accenture Technology Labs)

Text Re-Classification Using Existing Schemas
(with Yiming Yang)

Learning Within-Sentence Semantic Coherence
(with Roni Rosenfeld)

Automatic Document Summarization
(with John Lafferty)

Consumer Behavior Prediction
(with Alan Montgomery [Business school] and Rich Caruana [SCS])
Outline
– Introduction & Motivation
– Dataset
– Baseline Models
– New Hybrid Models
– Results
– Summary & Work in Progress

How to increase profits?
– Without raising the overall price level?
– Without more advertising?
– Without attracting new customers?

A: Better Pricing Strategies
Encourage the demand for the products which are most profitable for the store.
– Recent trend: independent stores consolidate into chains.
– Pricing doesn't take into account the variability of demand due to neighborhood differences.
A: Micro-Marketing
– Pricing strategies should adapt to the neighborhood demand.
– The basis: the difference in interbrand competition across stores.
– Stores can increase operating profit margins by 33% to 83% [Montgomery 1997].
Understanding Demand
– Need to understand the relationship between the prices of products in a category and the demand for these products.
– Key concept: Price Elasticity of Demand.
Price Elasticity
The consumer's response to a price change:

$$E = \frac{\%\,\Delta Q}{\%\,\Delta P}$$

where Q is the quantity purchased and P is the price of the product. Demand is inelastic when |E| < 1 and elastic when |E| > 1.
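To make the formula concrete, here is a minimal sketch (not from the talk; the numbers are invented) of computing elasticity from a price change:

```python
def elasticity(q_old, q_new, p_old, p_new):
    """E = (% change in quantity) / (% change in price)."""
    pct_q = (q_new - q_old) / q_old
    pct_p = (p_new - p_old) / p_old
    return pct_q / pct_p

# A 10% price cut that lifts sales by 20% gives E = -2.0 (elastic).
print(elasticity(q_old=100, q_new=120, p_old=1.00, p_new=0.90))
```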
Prices and Quantities
– The quantity demanded of a specific product is a function of the prices of all the products in that category.
– This function is different for every store and for every category.
The Function

$$q \sim f(p) + \epsilon, \qquad \epsilon \sim N(0, \sigma^2)$$

[Diagram: a "Predictor" ("I know your customers") maps the prices of products 1..N in a category to the quantities bought of products 1..N.]

Need to multiply this across many stores, many categories.
How to find this function?
– Traditionally: using parametric models (linear regression)
Data Example
[Scatter plot: quantity (0–100,000) vs. price (0.02–0.06).]
Data Example – Log Space
[Scatter plot: ln(quant) (≈2.75–5.25) vs. ln(price) (≈ −1.58 to −1.28).]
The Function

$$\ln(q) \sim f(\ln(p)) + \epsilon, \qquad \epsilon \sim N(0, \sigma^2)$$

[Diagram: the prices of products 1..N in a category are converted to ln space, fed to the Predictor ("I know your customers"), and the outputs are converted back to the original space as the quantities bought of products 1..N.]

Need to multiply this across many stores, many categories.
How to find this function?
– Traditionally: using parametric models (linear regression)
– Recently: using nonparametric models (neural networks)
Our Goal
– Advantage of LR: known functional form (linear in log space), extrapolation ability
– Advantage of NN: flexibility, accuracy

[Diagram: NN, LR, and the new models positioned along accuracy and robustness axes.]

– Take advantage: use the known functional form to bias the NN
– Build hybrid models from the baseline models
Evaluation Measure

$$RMS = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(q_i - \hat q_i\right)^2}$$

Root Mean Squared Error (RMS): the average deviation between the true quantity and the predicted quantity.
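As a minimal sketch of this measure (the array names are illustrative):

```python
import numpy as np

def rms_error(q_true, q_pred):
    """Root mean squared error between true and predicted quantities."""
    q_true = np.asarray(q_true, dtype=float)
    q_pred = np.asarray(q_pred, dtype=float)
    return np.sqrt(np.mean((q_true - q_pred) ** 2))
```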

Error Measure – Unbiased Model

The model is fit in log space:

$$\ln(q) \sim f(\ln(p)) + \epsilon, \qquad \epsilon \sim N(0, \sigma^2)$$

so, conditional on the model, $\ln(q) \mid f(\ln(p)) \sim N(\mu, \sigma^2)$. Computing the integral over this distribution gives

$$E\!\left[e^{\ln(q)}\right] = e^{\mu + \frac{1}{2}\sigma^2}$$

but the naive back-transform $\hat q = e^{\widehat{\ln(q)}}$ estimates only $e^{\mu}$. Thus $e^{\widehat{\ln(q)}}$ is a biased estimator for q, and we correct the bias by using

$$\hat q = e^{\widehat{\ln(q)} + \frac{1}{2}\sigma^2}$$

which is an unbiased estimator for q.
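A minimal sketch of this correction, assuming $\sigma^2$ has been estimated from the regression residuals (the function name is illustrative):

```python
import numpy as np

def unbias_log_prediction(ln_q_hat, sigma2):
    """Back-transform a log-space prediction to quantities.

    exp(ln_q_hat) alone underestimates E[q] when ln(q) ~ N(mu, sigma2),
    so add sigma2 / 2 before exponentiating (log-normal mean correction).
    """
    return np.exp(np.asarray(ln_q_hat, dtype=float) + 0.5 * sigma2)
```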
Dataset
– Store-level cash register data at the product level for 100 stores
– Store prices updated every week
– Two years of transactions
– Chilled Orange Juice category (12 products)
Models
Baselines
– Linear Regression
– Neural Networks
Hybrids
– Smart Prior
– MultiTask Learning
– Jumping Connections
– Frozen Jumping Connections
Baselines
– Linear Regression
– Neural Networks
Linear Regression

$$\ln(q) = a + \sum_{i=1}^{K} b_i \ln(p_i) + \epsilon, \qquad \epsilon \sim N(0, \sigma^2)$$

– q is the quantity demanded
– $p_i$ is the price of the i-th product
– K products overall
– The coefficients a and $b_i$ are chosen to minimize the sum of squared residuals.
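A minimal least-squares sketch of this model (variable names are assumptions, not the talk's code):

```python
import numpy as np

def fit_loglog_lr(P, q):
    """Fit ln(q) = a + sum_i b_i ln(p_i) by ordinary least squares.

    P: (n_weeks, K) matrix of prices; q: (n_weeks,) quantities sold.
    Returns the intercept a, coefficients b, and the residual variance
    sigma2 used later for the unbiasing correction.
    """
    X = np.column_stack([np.ones(len(q)), np.log(P)])
    coef, *_ = np.linalg.lstsq(X, np.log(q), rcond=None)
    resid = np.log(q) - X @ coef
    sigma2 = resid.var(ddof=X.shape[1])
    return coef[0], coef[1:], sigma2
```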
Linear Regression – Results
[Bar chart: RMS error (scale 0–12,000) across LR, NN, SmPr, MTL, JC, FJC, Vote, WAV.]
Neural Networks
– Generic nonlinear function approximators
– Collection of basic units (neurons), computing a (non)linear function of their input
– Random initialization
– Backpropagation
– Early stopping to prevent overfitting
Neural Networks
[Network diagram: 1 hidden layer, 100 units, sigmoid activation function.]
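A minimal PyTorch sketch matching the stated architecture; the optimizer, epoch count, and patience are assumptions, since the talk gives only the layer sizes and the use of early stopping:

```python
import torch
from torch import nn

def make_nn(K):
    """One hidden layer of 100 sigmoid units: K log-prices in, ln(q) out."""
    return nn.Sequential(nn.Linear(K, 100), nn.Sigmoid(), nn.Linear(100, 1))

def train(model, X, y, X_val, y_val, epochs=500, patience=20):
    """Backpropagation with early stopping on a validation set."""
    opt = torch.optim.Adam(model.parameters())
    loss_fn = nn.MSELoss()
    best, best_state, wait = float("inf"), None, 0
    for _ in range(epochs):
        opt.zero_grad()
        loss_fn(model(X), y).backward()
        opt.step()
        with torch.no_grad():
            val = loss_fn(model(X_val), y_val).item()
        if val < best:
            best, wait = val, 0
            best_state = {k: v.clone() for k, v in model.state_dict().items()}
        else:
            wait += 1
            if wait >= patience:  # stop when validation error stops improving
                break
    model.load_state_dict(best_state)
    return model
```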
Results – RMS Error
[Bar chart: RMS error across LR, NN, SmPr, MTL, JC, FJC, Vote, WAV.]
Hybrid Models
– Smart Prior
– MultiTask Learning
– Jumping Connections
– Frozen Jumping Connections
Smart Prior
Idea: initialize the NN with a "good" set of weights; help it start from a "smart" prior (sketched in code below).
– Start the search in a state which already gives a linear approximation
– NN training in 2 stages:
  – First, on synthetic data (generated by the LR model)
  – Second, on the real data
Smart Prior
[Diagram: the fitted LR model generates the synthetic data used to pretrain the NN.]
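A minimal sketch of the two-stage idea, reusing `make_nn` and `train` from the baseline sketch above; how the synthetic prices are sampled is an assumption:

```python
import torch

def smart_prior(lr_predict, X_real, y_real, X_val, y_val, K, n_synth=5000):
    """Stage 1: pretrain on synthetic data labeled by the fitted LR model,
    so the NN starts from a linear approximation. Stage 2: real data."""
    X_synth = torch.randn(n_synth, K) * X_real.std(0) + X_real.mean(0)
    with torch.no_grad():
        y_synth = lr_predict(X_synth)                  # LR labels in log space
    model = make_nn(K)
    train(model, X_synth, y_synth, X_val, y_val)       # stage 1: synthetic
    return train(model, X_real, y_real, X_val, y_val)  # stage 2: real
```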
Results – RMS Error
[Bar chart: RMS error across LR, NN, SmPr, MTL, JC, FJC, Vote, WAV.]
Multitask Learning [Caruana 1997]
Idea: learn an additional related task in parallel, using a shared representation.
– Add the output of the LR model (built over the same inputs) as an extra output to the NN
– Make the NN share its hidden nodes between both tasks
MultiTask Learning
[Architecture diagram; notes: custom halting function, custom RMS function.]
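A minimal sketch of the shared-representation idea; the auxiliary loss weight is an assumption, and the custom halting/RMS functions from the slide are not reproduced:

```python
import torch
from torch import nn

class MTLNet(nn.Module):
    """One shared hidden layer, two outputs: ln(q) and the LR prediction."""
    def __init__(self, K, hidden=100):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(K, hidden), nn.Sigmoid())
        self.main_head = nn.Linear(hidden, 1)  # real task: ln(q)
        self.aux_head = nn.Linear(hidden, 1)   # extra task: LR's output

    def forward(self, x):
        h = self.shared(x)
        return self.main_head(h), self.aux_head(h)

def mtl_loss(model, x, y, y_lr, aux_weight=0.5):
    """Squared error on both tasks forces the hidden nodes to be shared."""
    main, aux = model(x)
    mse = nn.functional.mse_loss
    return mse(main, y) + aux_weight * mse(aux, y_lr)
```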
Results – RMS Error
[Bar chart: RMS error across LR, NN, SmPr, MTL, JC, FJC, Vote, WAV.]
Jumping Connections
Idea: fuse LR and NN.
– Modify the architecture of the NN
– Add connections which "jump" over the hidden layer
– Gives the effect of simulating an LR and an NN together
Jumping Connections
[Architecture diagram: input-to-output connections that skip the hidden layer.]
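A minimal sketch of the modified architecture; layer sizes follow the baseline NN:

```python
import torch
from torch import nn

class JumpNet(nn.Module):
    """A sigmoid hidden layer plus connections that jump straight from
    the inputs to the output, i.e. a linear (LR-like) path in parallel."""
    def __init__(self, K, hidden=100):
        super().__init__()
        self.hidden = nn.Sequential(nn.Linear(K, hidden), nn.Sigmoid())
        self.out = nn.Linear(hidden, 1)
        self.jump = nn.Linear(K, 1)  # the jumping (linear) connections

    def forward(self, x):
        return self.out(self.hidden(x)) + self.jump(x)
```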
Results – RMS Error
[Bar chart: RMS error across LR, NN, SmPr, MTL, JC, FJC, Vote, WAV.]
Frozen Jumping Connections
Idea: show the model what the "jump" is for.
– Same architecture as Jumping Connections, but two training stages
– Freeze the weights of the jumping layer, so the network can't "forget" about the linearity
Frozen Jumping Connections
[Architecture diagrams for the training stages.]
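A minimal sketch of the freezing step, reusing `JumpNet` from above; initializing the jumping path from the fitted LR coefficients is an assumption about the first training stage:

```python
import torch

def freeze_jump(model, lr_intercept, lr_coefs):
    """Set the jumping path to the LR solution, then freeze it so later
    backpropagation cannot make the network 'forget' the linearity."""
    with torch.no_grad():
        model.jump.weight.copy_(torch.as_tensor(lr_coefs).reshape(1, -1))
        model.jump.bias.fill_(float(lr_intercept))
    for p in model.jump.parameters():
        p.requires_grad_(False)  # frozen: excluded from gradient updates
    return model
```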
Results – RMS Error
[Bar chart: RMS error across LR, NN, SmPr, MTL, JC, FJC, Vote, WAV.]
Models
Baselines:
– Linear Regression
– Neural Networks
Hybrids:
– Smart Prior
– MultiTask Learning
– Jumping Connections
– Frozen Jumping Connections
Combinations:
– Voting
– Weighted Average
Combining Models
Idea: ensemble learning. Use all the models and then combine their predictions.
– Committee Voting
– Weighted Average
Combines the 2 baseline models and 3 of the hybrid models (Smart Prior, MultiTask Learning, Frozen Jumping Connections).
Committee Voting
– Average the predictions of the models
Results – RMS Error
[Bar chart: RMS error across LR, NN, SmPr, MTL, JC, FJC, Vote, WAV.]
Weighted Average – Model Regression
– Optimal weights determined by a linear regression model over the predictions
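A minimal sketch of both combination schemes (the array shapes and the use of a held-out set for fitting the weights are assumptions):

```python
import numpy as np

def vote(preds):
    """Committee voting: plain average over models.
    preds: (n_models, n_samples) array of quantity predictions."""
    return np.mean(preds, axis=0)

def weighted_average(preds_val, q_val, preds_test):
    """Fit least-squares weights on held-out predictions, apply to test."""
    w, *_ = np.linalg.lstsq(preds_val.T, q_val, rcond=None)
    return preds_test.T @ w
```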
Results – RMS Error
[Bar chart: RMS error across LR, NN, SmPr, MTL, JC, FJC, Vote, WAV.]
Normalized RMS Error
– Compare model performance across stores with different sizes, ages, and locations
– Need to normalize
– Compare to baselines: take the error of the LR benchmark as unit error
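A minimal sketch of the normalization (the dictionary keys are illustrative):

```python
def normalized_rms(rms_by_model, baseline="LR"):
    """Divide each model's RMS by the LR benchmark: 1.0 means 'same as
    LR', values below 1.0 mean lower error than the benchmark."""
    base = rms_by_model[baseline]
    return {name: rms / base for name, rms in rms_by_model.items()}

# Example (numbers invented):
print(normalized_rms({"LR": 10000.0, "NN": 9200.0, "FJC": 8100.0}))
```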
Normalized RMS Error
[Bar chart: RMS error normalized by the LR benchmark (scale ≈0.75–1.10) across LR, NN, SmPr, MTL, JC, FJC, Vote, WAV.]


Summary
[Diagram: the Predictor ("I know your customers") maps the prices of products 1..N in a category to the quantities bought of products 1..N.]
– Built new models for better pricing strategies for individual stores and categories
– Hybrid models clearly superior to baselines for customer choice prediction
– Incorporated domain knowledge (linearity) into Neural Networks
– New models allow stores to:
  – price the products more strategically and optimize profits
  – maintain better inventories
  – understand product interaction
www.cs.cmu.edu/~eneva
References
– Montgomery, A. (1997). Creating MicroMarketing Pricing Strategies Using Supermarket Scanner Data.
– West, P., Brockett, P., and Golden, L. (1997). A Comparative Analysis of Neural Networks and Statistical Methods for Predicting Consumer Choice.
– Guadagni, P. and Little, J. (1983). A Logit Model of Brand Choice Calibrated on Scanner Data.
– Rossi, P. and Allenby, G. (1993). A Bayesian Approach to Estimating Household Parameters.
– Caruana, R. (1997). Multitask Learning. Machine Learning, 28(1), 41–75.
Work In Progress
– Analyze the Weighted Average model
– Compare the extrapolation ability of the new models
– Other MTL tasks:
  – shrinkage model: a "super" store model with data pooled across all stores
  – store zones