Consumer Behavior Prediction
using Parametric and
Nonparametric Methods
Elena Eneva
Carnegie Mellon University
25 November 2002
[email protected]
Recent Research Projects
Dimensionality Reduction Methods and Fractal Dimension
(with Christos Faloutsos)
Learning to Change Taxonomies
(with Valery Petrushin, Accenture Technology Labs)
Text Re-Classification Using Existing Schemas
(with Yiming Yang)
Learning Within-Sentence Semantic Coherence
(with Roni Rosenfeld)
Automatic Document Summarization
(with John Lafferty)
Consumer Behavior Prediction
(with Alan Montgomery [Business school] and Rich Caruana [SCS])
Outline
Introduction & Motivation
Dataset
Baseline Models
New Hybrid Models
Results
Summary & Work in Progress
How to increase profits?
Without raising the overall price level?
Without more advertising?
Without attracting new customers?
A: Better Pricing Strategies
Encourage the demand for products
which are most profitable for the store
Recent trend to consolidate
independent stores into chains
Pricing doesn’t take into account the
variability of demand due to
neighborhood differences.
A: Micro-Marketing
Pricing strategies should adapt to the
neighborhood demand
The basis: differences in interbrand
competition across stores
Stores can increase operating profit margins
by 33% to 83% [Montgomery 1997]
Understanding Demand
Need to understand the relationship
between the prices of products in a
category and the demand for these
products
Price Elasticity of Demand
Price Elasticity
consumer’s response to price change
E = (% change in Q) / (% change in P)
Q is the quantity purchased
P is the price of the product
|E| < 1: inelastic demand; |E| > 1: elastic demand
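As a quick numerical illustration (not from the talk), the sketch below computes this elasticity from two hypothetical price/quantity observations; all numbers are made up.

```python
# Illustration only: price elasticity from two hypothetical (price, quantity) observations.
def price_elasticity(p0, q0, p1, q1):
    """Percent change in quantity divided by percent change in price."""
    pct_q = (q1 - q0) / q0
    pct_p = (p1 - p0) / p0
    return pct_q / pct_p

# Made-up numbers: a 10% price increase loses 15% of the quantity sold.
e = price_elasticity(p0=2.00, q0=1000, p1=2.20, q1=850)
print(f"elasticity = {e:.2f}")  # -1.50, |E| > 1, so demand is elastic here
```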
Prices and Quantities
Q demanded of a specific product is a
function of the prices of all the products
in that category
This function is different for every store,
for every category
The Function
q ~ f(p) + ε,   ε ~ N(0, σ²)
[Diagram: within a category, the prices of products 1…N go into the predictor ("I know your customers"), which returns the quantities bought of products 1…N.]
Need to multiply this across many stores, many categories.
How to find this function?
Traditionally – using parametric models
(linear regression)
Data Example
[Scatter plot: quantity (0–100,000) vs. price (0.02–0.06) for one product.]
Data Example – Log Space
[Scatter plot: ln(quant) (2.75–5.25) vs. ln(price) (−1.58 to −1.28); in log space the relationship looks roughly linear.]
The Function
ln(q) ~ f(ln(p)) + ε,   ε ~ N(0, σ²)
[Diagram: the prices are converted to ln space, the predictor ("I know your customers") estimates the log-quantities bought of products 1…N, and the predictions are converted back to the original space.]
Need to multiply this across many stores, many categories.
How to find this function?
Traditionally – using parametric models
(linear regression)
Recently – using non-parametric
models (neural networks)
Our Goal
Advantage of LR: known functional form
(linear in log space), extrapolation ability
Advantage of NN: flexibility, accuracy
[Diagram: accuracy vs. robustness, with NN high on accuracy, LR high on robustness, and the new hybrid models aiming for both.]
Take Advantage: use the
known functional form to
bias the NN
Build hybrid models from
the baseline models
Evaluation Measure
RMS error = sqrt( (1/N) Σ_{i=1..N} (q_i − q̂_i)² )
Root Mean Squared Error (RMS)
the average deviation between the true
quantity and the predicted quantity
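A minimal sketch of this error measure in Python (NumPy assumed; the sample values are made up):

```python
import numpy as np

def rms_error(q_true, q_pred):
    """Root mean squared error between true and predicted quantities."""
    q_true, q_pred = np.asarray(q_true, float), np.asarray(q_pred, float)
    return np.sqrt(np.mean((q_true - q_pred) ** 2))

print(rms_error([120, 80, 200], [100, 90, 210]))  # made-up quantities
```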
Error Measure – Unbiased Model
ln(q) ~ f(ln(p)) + ε,   ε ~ N(0, σ²)
RMS error = sqrt( (1/N) Σ_{i=1..N} (q_i − q̂_i)² )
Since ln(q) | f(ln(p)) ~ N(μ, σ²), computing the integral over this distribution gives
E[ e^ln(q) ] = e^(μ + σ²/2)
so simply exponentiating the predicted log-quantity, q̂ = e^(predicted ln(q)), is a biased estimator for q. We correct the bias by using
q̂ = e^(predicted ln(q) + σ²/2)
which is an unbiased estimator for q.
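A small simulation (my own, with made-up μ and σ) showing why the σ²/2 correction is needed when converting log-space predictions back to quantities; in practice σ² would be estimated from the model's residuals.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 4.0, 0.5                     # assumed mean and std of ln(q) at fixed prices
q = rng.lognormal(mu, sigma, 100_000)    # simulated "true" quantities

naive = np.exp(mu)                       # exponentiating the predicted log-quantity: biased low
corrected = np.exp(mu + sigma**2 / 2)    # bias-corrected estimator from the slide

print(q.mean(), naive, corrected)        # the corrected value matches the empirical mean
```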
Dataset
Store-level cash register data at the
product level for 100 stores
Store prices updated every week
Two years of transactions
Chilled Orange Juice category (12 products)
Models
Baselines
–Linear Regression
–Neural Networks
Hybrids
–Smart Prior
–MultiTask Learning
–Jumping Connections
–Frozen Jumping Connections
Baselines
Linear Regression
Neural Networks
Linear Regression
ln(q) = a + Σ_{i=1..K} b_i ln(p_i) + ε,   ε ~ N(0, σ²)
q is the quantity demanded
p_i is the price of the ith product
K products overall
The coefficients a and b_i are chosen to minimize the sum of squared residuals.
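A minimal sketch of this log-log regression using scikit-learn; the price and quantity arrays are synthetic stand-ins for one store and category, not the actual scanner data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: weekly prices for K = 12 products and the quantity sold of product 1.
rng = np.random.default_rng(0)
prices = rng.uniform(1.0, 3.0, size=(104, 12))
quantity = np.exp(5.0 - 2.0 * np.log(prices[:, 0]) + 0.3 * np.log(prices[:, 1])
                  + rng.normal(0, 0.1, 104))

lr = LinearRegression().fit(np.log(prices), np.log(quantity))  # ln(q) = a + sum_i b_i ln(p_i)
print(lr.intercept_, lr.coef_[:2])  # a, plus b_1 (own-price) and b_2 (cross-price) coefficients
```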
Linear Regression
Results – RMS Error
[Bar chart: RMS error (scale 0–12,000) for LR, NN, SmPr, MTL, JC, FJC, Vote, WAV.]
Neural Networks
Generic nonlinear function
approximators
Collection of basic units (neurons),
computing a (non)linear function of their
input
Random initialization
Backpropagation
Early stopping to prevent overfitting
Neural Networks
1 hidden layer, 100 units, sigmoid activation function
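One way to reproduce this baseline with scikit-learn's MLPRegressor (my choice of library; the data below are synthetic stand-ins):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

# Hypothetical log-price / log-quantity data for one store and category.
rng = np.random.default_rng(0)
log_p = np.log(rng.uniform(1.0, 3.0, size=(104, 12)))
log_q = 5.0 - 2.0 * log_p[:, 0] + 0.3 * log_p[:, 1] + rng.normal(0, 0.1, 104)
X_train, X_test, y_train, y_test = train_test_split(log_p, log_q, random_state=0)

nn = MLPRegressor(hidden_layer_sizes=(100,),  # one hidden layer with 100 units
                  activation='logistic',      # sigmoid activation, as on the slide
                  early_stopping=True,        # hold out part of the data to stop before overfitting
                  max_iter=5000, random_state=0)
nn.fit(X_train, y_train)
print(nn.score(X_test, y_test))               # R^2 on the held-out weeks
```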
Results – RMS Error
[Bar chart: RMS error (scale 0–12,000) for LR, NN, SmPr, MTL, JC, FJC, Vote, WAV.]
Hybrid Models
Smart Prior
MultiTask Learning
Jumping Connections
Frozen Jumping Connections
Smart Prior
Idea: Initialize the NN with a “good” set of
weights; help it start from a “smart” prior.
Start the search in a state which already gives
a linear approximation
NN training in 2 stages
– First, on synthetic data (generated by the LR model)
– Second, on the real data
Smart Prior
[Diagram: the fitted LR model feeds into the NN, providing the synthetic training data for the first stage.]
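A rough sketch of the two training stages with scikit-learn and synthetic data; the talk does not give implementation details, so the specifics here are mine.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
log_p = np.log(rng.uniform(1.0, 3.0, size=(104, 12)))       # hypothetical log-prices
log_q = 5.0 - 2.0 * log_p[:, 0] + rng.normal(0, 0.1, 104)   # hypothetical log-quantities

lr = LinearRegression().fit(log_p, log_q)                   # the LR baseline

# Stage 1: train the NN on synthetic data generated by the LR model, so its
# weights start from a state that already encodes the linear approximation.
synth_p = np.log(rng.uniform(1.0, 3.0, size=(1000, 12)))
nn = MLPRegressor(hidden_layer_sizes=(100,), activation='logistic',
                  warm_start=True, max_iter=500, random_state=0)
nn.fit(synth_p, lr.predict(synth_p))

# Stage 2: warm_start=True makes this call continue from the "smart prior"
# weights rather than reinitializing, now fitting the real data.
nn.fit(log_p, log_q)
```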
Results – RMS Error
[Bar chart: RMS error (scale 0–12,000) for LR, NN, SmPr, MTL, JC, FJC, Vote, WAV.]
Multitask Learning
[Caruana 1997]
Idea: learning an additional related task in
parallel, using a shared representation
Adding the output of the LR model (built
over the same inputs) as an extra
output to the NN
Make the NN share its hidden nodes
between both tasks
MultiTask Learning
•Custom halting function
•Custom RMS function
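A simplified sketch of the idea (my own construction; it omits the custom halting and error functions mentioned on the slide): the network gets a second output that must reproduce the LR model's prediction, sharing the hidden layer with the main task.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor

# Hypothetical log-price / log-quantity data.
rng = np.random.default_rng(0)
log_p = np.log(rng.uniform(1.0, 3.0, size=(104, 12)))
log_q = 5.0 - 2.0 * log_p[:, 0] + rng.normal(0, 0.1, 104)

lr = LinearRegression().fit(log_p, log_q)

# Two targets sharing one hidden layer: the real log-quantity plus the LR
# model's prediction as the auxiliary task.
Y = np.column_stack([log_q, lr.predict(log_p)])
mtl = MLPRegressor(hidden_layer_sizes=(100,), activation='logistic',
                   max_iter=2000, random_state=0).fit(log_p, Y)
main_prediction = mtl.predict(log_p)[:, 0]   # only the first output is used at prediction time
```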
Results – RMS Error
[Bar chart: RMS error (scale 0–12,000) for LR, NN, SmPr, MTL, JC, FJC, Vote, WAV.]
Jumping Connections
Idea: fusing LR and NN
Modify architecture of the NN
Add connections which “jump” over the
hidden layer
Gives the effect of simulating an LR and
an NN together
Jumping Connections
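A minimal sketch of such an architecture in PyTorch (my choice of framework; the layer sizes and names are assumptions, not the talk's code):

```python
import torch
import torch.nn as nn

class JumpingNet(nn.Module):
    """Sigmoid hidden layer plus direct connections that 'jump' from the inputs
    to the output, so a linear-regression path sits alongside the NN path."""
    def __init__(self, n_inputs, n_hidden=100):
        super().__init__()
        self.hidden = nn.Linear(n_inputs, n_hidden)
        self.out_from_hidden = nn.Linear(n_hidden, 1)
        self.jump = nn.Linear(n_inputs, 1)       # the jumping connections

    def forward(self, x):
        h = torch.sigmoid(self.hidden(x))
        return self.out_from_hidden(h) + self.jump(x)

net = JumpingNet(n_inputs=12)                    # 12 products in the category
print(net(torch.randn(8, 12)).shape)             # a batch of 8 hypothetical log-price vectors
```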
Results – RMS Error
[Bar chart: RMS error (scale 0–12,000) for LR, NN, SmPr, MTL, JC, FJC, Vote, WAV.]
Frozen Jumping Connections
Idea: show the model what the “jump” is for
Same architecture as Jumping
Connections, but two training stages
Freeze the weights of the jumping layer,
so the network can’t “forget” about the
linearity
Frozen Jumping Connections
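One way the freezing could look in PyTorch (again my own sketch): the jump weights are set from a previously fitted log-log regression and then excluded from training; the coefficient values below are hypothetical.

```python
import torch
import torch.nn as nn

class FrozenJumpingNet(nn.Module):
    """Jumping-connections architecture whose direct input-to-output weights are
    fixed to a previously fitted linear model and never updated afterwards."""
    def __init__(self, lr_coefs, lr_intercept, n_hidden=100):
        super().__init__()
        self.hidden = nn.Linear(len(lr_coefs), n_hidden)
        self.out_from_hidden = nn.Linear(n_hidden, 1)
        self.jump = nn.Linear(len(lr_coefs), 1)
        with torch.no_grad():
            self.jump.weight.copy_(torch.tensor(lr_coefs).float().view(1, -1))
            self.jump.bias.copy_(torch.tensor([lr_intercept]).float())
        for p in self.jump.parameters():          # freeze: the linear part cannot be "forgotten"
            p.requires_grad = False

    def forward(self, x):
        return self.out_from_hidden(torch.sigmoid(self.hidden(x))) + self.jump(x)

# Hypothetical LR fit: own-price coefficient -2.0, no cross effects, intercept 5.0.
net = FrozenJumpingNet(lr_coefs=[-2.0] + [0.0] * 11, lr_intercept=5.0)
trainable = [p for p in net.parameters() if p.requires_grad]
print(sum(p.numel() for p in trainable))          # only the hidden path remains trainable
```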
Results – RMS Error
[Bar chart: RMS error (scale 0–12,000) for LR, NN, SmPr, MTL, JC, FJC, Vote, WAV.]
Models
Baselines:
–Linear Regression
–Neural Networks
Hybrids
–Smart Prior
–MultiTask Learning
–Jumping Connections
–Frozen Jumping Connections
Combinations
–Voting
–Weighted Average
Combining Models
Idea: Ensemble Learning
Use all models and then combine their
predictions
Committee Voting
Weighted Average
2 baseline and 3 hybrid models
(Smart Prior, MultiTask Learning, Frozen Jumping Connections)
Committee Voting
Average the predictions of the models
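A tiny sketch of the committee vote (the per-model predictions below are made up):

```python
import numpy as np

# Hypothetical predicted quantities for the same three test weeks, one entry per model.
predictions = {
    "LR":   [110.0, 95.0, 130.0],
    "NN":   [120.0, 90.0, 125.0],
    "SmPr": [115.0, 93.0, 128.0],
    "MTL":  [118.0, 91.0, 127.0],
    "FJC":  [116.0, 94.0, 129.0],
}
vote = np.mean(list(predictions.values()), axis=0)  # committee vote = plain average
print(vote)
```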
Results – RMS Error
[Bar chart: RMS error (scale 0–12,000) for LR, NN, SmPr, MTL, JC, FJC, Vote, WAV.]
Weighted Average – Model Regression
Optimal weights determined by a linear
regression model over the predictions
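A sketch of the weighted-average combiner as stacking with scikit-learn; the held-out quantities and per-model predictions are synthetic stand-ins.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
q_true = rng.uniform(80, 150, size=52)                       # hypothetical held-out quantities
# One column of predictions per model; here just noisy copies of the truth for illustration.
P = np.column_stack([q_true + rng.normal(0, s, 52) for s in (5, 8, 10, 12, 15)])

combiner = LinearRegression().fit(P, q_true)                 # learns one weight per model
print(combiner.coef_, combiner.intercept_)
weighted_average_prediction = combiner.predict(P)
```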
Results – RMS Error
[Bar chart: RMS error (scale 0–12,000) for LR, NN, SmPr, MTL, JC, FJC, Vote, WAV.]
Normalized RMS Error
Compare model performance across stores
with different:
– Sizes
– Ages
– Locations
Need to normalize
Compare to baselines
Take the error of the LR benchmark as unit
error
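In code, the normalization is just a ratio of RMS errors (a minimal sketch; the numbers are made up):

```python
import numpy as np

def rms(a, b):
    a, b = np.asarray(a, float), np.asarray(b, float)
    return np.sqrt(np.mean((a - b) ** 2))

def normalized_rms(q_true, q_pred, q_pred_lr):
    """RMS error of a model divided by the RMS error of the LR benchmark;
    values below 1.0 mean the model beats the benchmark."""
    return rms(q_true, q_pred) / rms(q_true, q_pred_lr)

print(normalized_rms([100, 120, 90], [98, 118, 93], [95, 125, 84]))  # made-up quantities
```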
Normalized RMS Error
[Bar chart: RMS error relative to the LR benchmark (scale 0.75–1.10) for LR, NN, SmPr, MTL, JC, FJC, Vote, WAV.]
Summary
[Diagram: within a category, the prices of products 1…N go into the predictor ("I know your customers"), which returns the quantities bought of products 1…N.]
Built new models for better pricing strategies for
individual stores, categories
Hybrid models clearly superior to baselines for
customer choice prediction
Incorporated domain knowledge (linearity) in Neural
Networks
New models allow stores to
– price the products more strategically and optimize profits
– maintain better inventories
– understand product interaction
www.cs.cmu.edu/~eneva
References
Montgomery, A. (1997). Creating MicroMarketing Pricing Strategies Using Supermarket Scanner Data.
West, P., Brockett, P., and Golden, L. (1997). A Comparative Analysis of Neural Networks and Statistical Methods for Predicting Consumer Choice.
Guadagni, P. and Little, J. (1983). A Logit Model of Brand Choice Calibrated on Scanner Data.
Rossi, P. and Allenby, G. (1993). A Bayesian Approach to Estimating Household Parameters.
Work In Progress
analyze Weighted Average model
compare extrapolation ability of new
models
Other MTL tasks:
– shrinkage model – a “super” store model
with data pooled across all stores
– store zones