IBM Research
What’s in your wallet?
Opportunity modeling approaches and applications
Claudia Perlich
Chief Scientist
Formerly: IBM Research
Collaborators: Saharon Rosset, Rick Lawrence, Srujana Merugu, et al.
© 2006 IBM Corporation
Predictive Modeling Group – Mathematical Sciences – IBM Research
Publications & Recognition
2009 Finalist in the INFORMS Edelman competition
2007 Data Mining Practice Prize at KDD 2007, “Predictive modeling for marketing”, Runner Up
2007 IBM Outstanding Technical Award, “Opportunity models and validation for the Market Alignment Program (MAP)”
2005 IBM Research Award for contributions to Market Alignment Program (MAP)
“Operations Research Improves Sales Force Productivity at IBM” R. Lawrence, C.Perlich, S.Rosset, et al.
Forthcoming INFORMS Journal on Computing
“Analytics-driven solutions for customer targeting and sales force allocation”, J. Arroyo, M. Callahan, M. Collins, A.
Ershov, I. Khabibrakhmanov, R. Lawrence, S.Mahatma, M. Niemaszyk, C. Perlich, S. Rosset, S. Weiss. IBM
Systems Journal 46 (4) (2007)
“A Data Mining Case Study: Analytics-driven solutions for customer targeting and sales force allocation” R.
Lawrence, C. Perlich, S. Rosset, I. Khabibrakhmanov, S. Mahatma, S. Weiss. Second Workshop on Data Mining
Case Studies and Practice Prize at SIGKDD 2007
“High Quantile Modeling for Customer Wallet Estimation with Other Applications” Perlich, C., S. Rosset, R.
Lawrence, and B. Zadrozny, 13th SIGKDD International Conference on Knowledge Discovery and Data Mining 2007
“Quantile Modeling for Marketing”, Perlich, C., S. Rosset and B. Zadrozny. Workshop on Data Mining for Business
Applications at 12th SIGKDD International Conference on Knowledge Discovery and Data Mining 2006
“A New Multi-View Regression Approach with an Application to Customer Wallet Estimation” Merugu, S. S.Rosset
and C. Perlich. 12th SIGKDD International Conference on Knowledge Discovery and Data Mining 2006
“Wallet Estimation Models” Rosset, S., C. Perlich, B. Zadrozny, S. Merugu, S. Weiss and R. Lawrence. International
Workshop on Customer Relationship Management: Data Mining Meets Marketing, NYU 2005
“Modeling Quantiles” Perlich, C., S. Rosset and B.Zadrozny. In Encyclopedia of Data Warehousing and Mining,
Second Edition
© Copyright IBM Corporation 2010
Presentation Outline
Wallet Definitions and Business Considerations
Modeling Approaches
Evaluation of Wallet Models
Business Impact – Market Alignment Project (MAP)
© 2011 IBM Corporation
What is Wallet/Opportunity?
Total amount of money that the customer (company)
can spend in a certain product category in a given
period
[Figure: nested regions showing IBM Sales inside IT Wallet inside Company Revenue]
IBM sales ≤ IT wallet ≤ Company revenue
Why Are We Interested in Wallet?
Customer targeting
– Focus on acquiring customers with high wallet
– Evaluate customers’ growth potential by combining
wallet estimates and sales history
– For existing customers, focus on high wallet, low share-of-wallet customers
Sales force management
– Make resource assignment decisions
• Concentrate resources on untapped
– Evaluate success of sales personnel and sales channel
by share-of-wallet they attain
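A minimal sketch of the targeting rule above, assuming share-of-wallet is simply IBM sales divided by the estimated wallet. The customer names, figures, and the two cut-offs (`min_wallet`, `max_share`) are made up for illustration and do not come from the talk.

```python
# Hypothetical customers: (name, IBM sales, estimated wallet), in $K.
customers = [
    ("A", 900.0, 1000.0),
    ("B", 100.0, 2000.0),
    ("C", 20.0, 50.0),
]

def share_of_wallet(sales, wallet):
    """Fraction of the estimated wallet that is already captured."""
    return sales / wallet if wallet > 0 else 0.0

def targets(rows, min_wallet=500.0, max_share=0.25):
    """Existing customers with a high wallet but a low share-of-wallet."""
    return [
        name
        for name, sales, wallet in rows
        if wallet >= min_wallet and share_of_wallet(sales, wallet) <= max_share
    ]

print(targets(customers))
```

Only customer B combines a large wallet with a small captured share, so it is the one flagged.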
Wallet Modeling Challenge
The customer wallet is never observed
– Nothing to “fit a model”
– Even if you have a model, how do you evaluate it?
Need a predictive approach from available data
– Firmographics (Sales, Industry, Employees)
– IBM Sales and transaction history
Existing Approaches to Wallet Modeling
Bottom up: learn a model for individual companies
– Get “true” wallet values through surveys
– Very expensive
– Small, typically not representative sample
– Unreliable because ill-defined
– Coarse level of IT categories
Top down: this approach was used by IBM Market
Intelligence in North America (called ITEM)
– Use econometric models to assign total “opportunity” to
segment (e.g., industry × geography)
– Assign to companies in segment proportional to their size
– Completely ad hoc, without any validation
Multiple Wallet Definitions
TOTAL: Total customer available budget in the
relevant area (e.g., total IT)
– Can we really hope to attain all of it?
SERVED: Total customer spending on IT products
covered by IBM
– Better definition for our marketing purposes
REALISTIC: IBM spending of the “best similar
customers”
[Figure: nested regions inside Company Revenue, from innermost to outermost: IBM Sales, REALISTIC, SERVED, TOTAL]
REALISTIC ≤ SERVED ≤ TOTAL
We formulate the problem as Quantile Estimation
Imagine 1,000 customers with identical customer features
Consider the distribution of the IBM Sales to these
customers:
[Figure: distribution of IBM Sales across these customers; the best customers lie in the upper tail, and opportunity is a high quantile of the distribution]
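The thought experiment can be simulated in a few lines. This is only a sketch: the log-normal sales distribution, its parameters, and the choice of the 0.9 quantile are assumptions made for the illustration.

```python
import math
import random

random.seed(0)
# 1,000 hypothetical customers with identical features; the log-normal
# shape of their IBM sales is an assumption for this illustration only.
sales = sorted(random.lognormvariate(10, 1) for _ in range(1000))

def empirical_quantile(sorted_values, p):
    """Nearest-rank p-th quantile of an already sorted list."""
    rank = max(1, math.ceil(p * len(sorted_values)))
    return sorted_values[rank - 1]

mean_sales = sum(sales) / len(sales)
opportunity = empirical_quantile(sales, 0.9)
print(f"mean sales: {mean_sales:,.0f}")
print(f"0.9 quantile (opportunity): {opportunity:,.0f}")
```

With a skewed distribution like this one, the high quantile sits well above the mean, which is the gap between expected sales and what the best similar customers spend.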
Formally: Percentile of Conditional
Distribution of IBM sales s to the customer given customer attributes x: $s \mid x \sim f_{\theta,x}$
[Figure: conditional distribution of s given x, marking the mean E(s|x) and the REALISTIC wallet]
Two obvious ways to get at the pth percentile:
– Estimate the conditional by integrating over a neighborhood of similar customers
• Take pth percentile of spending in neighborhood
– Create a global model for pth percentile
• Build global regression models, e.g., $s \mid x \sim N(\beta x, \sigma^2)$
Overview of analytical approaches
[Diagram: families of approaches, all feeding a common evaluation step]
– ‘Ad hoc’: Decomposition (Industry, Size)
– kNN: General kNN (k, Distance, Features)
– Optimization: Quantile Regression; Model Form (Linear, Decision Tree, Quanting)
– Further diagram labels: Linear Model, Adjustment
– Evaluation and Validation: Quantile Loss, MAP Feedback
K-Nearest Neighbor
Distance metric:
– Industry match
– Euclidean distance on firmographics and past IBM sales
– Scaling issues
Neighborhood sizes (k):
– Neighborhood size has significant effect on prediction quality
Prediction:
– Quantile of firms in the neighborhood
[Figure: universe of IBM customers with D&B information plotted by Industry and Employees, showing target company i and the neighborhood of the target company; a histogram of Frequency against IBM Sales for the neighborhood, with the Wallet Estimate marked]
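A small sketch of the kNN quantile estimate, under stated assumptions: candidates must match the target's industry, distance is Euclidean on features that are already on a comparable (for example log) scale, and the quantile uses the nearest-rank rule. The function name, data layout, and numbers are hypothetical.

```python
import math

def knn_wallet(target, customers, k=3, p=0.8):
    """p-th quantile of IBM sales among the k nearest same-industry firms.

    target:    (industry, features) of the company to score
    customers: list of (industry, features, ibm_sales)
    """
    industry, feats = target
    # Industry match: only firms in the same industry are candidates.
    peers = [(math.dist(feats, f), s) for ind, f, s in customers if ind == industry]
    if not peers:
        raise ValueError("no customers in the target's industry")
    peers.sort(key=lambda pair: pair[0])
    neighborhood = sorted(s for _, s in peers[:k])
    rank = max(1, math.ceil(p * len(neighborhood)))
    return neighborhood[rank - 1]

# Features: (log10 employees, log10 revenue); sales in $K. All values made up.
customers = [
    ("bank", (2.0, 6.0), 10.0),
    ("bank", (2.1, 6.1), 40.0),
    ("bank", (1.9, 6.2), 25.0),
    ("bank", (4.0, 9.0), 900.0),
    ("retail", (2.0, 6.0), 500.0),
]
print(knn_wallet(("bank", (2.0, 6.1)), customers))
```

The large bank and the retailer are excluded (by distance and by industry respectively), so the estimate comes from the three similar banks. Changing `k` changes which firms count as similar, which is why neighborhood size matters so much.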
Global Estimation: the Quantile Loss Function
The mean minimizes a sum of squared residuals:
$$\min_{\mu} \sum_{i=1}^{n} (y_i - \mu)^2$$
The median minimizes a sum of absolute residuals:
$$\min_{m} \sum_{i=1}^{n} |y_i - m|$$
The p-th quantile minimizes an asymmetrically weighted sum of absolute residuals:
$$\min_{\hat{y}} \sum_{i=1}^{n} L_p(y_i, \hat{y})$$
$$L_p(y, \hat{y}) = \begin{cases} p \, (y - \hat{y}) & \text{if } y \ge \hat{y} \\ (1 - p) \, (\hat{y} - y) & \text{if } \hat{y} > y \end{cases}$$
[Figure: the loss plotted against the residual over the range -3 to 3, for p=0.8 and for p=0.5 (absolute loss)]
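The claim that a quantile minimizes this loss can be checked numerically. The sketch below tries every observed value as a constant prediction and keeps the one with the smallest total loss; the data are arbitrary.

```python
def quantile_loss(y, y_hat, p):
    """L_p: under-prediction is weighted by p, over-prediction by 1 - p."""
    if y >= y_hat:
        return p * (y - y_hat)
    return (1 - p) * (y_hat - y)

def best_constant(ys, p):
    """Observed value that minimizes the summed quantile loss."""
    return min(ys, key=lambda c: sum(quantile_loss(y, c, p) for y in ys))

ys = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(best_constant(ys, 0.5))  # 5, the median
print(best_constant(ys, 0.8))  # 8, the 0.8 quantile
```

Raising p makes under-prediction more expensive, which pushes the best constant toward the upper tail.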
Quantile Regression
Traditional Regression:
– Estimation of conditional expected value by minimizing sum of squares:
$$\min_{\beta} \sum_{i=1}^{n} (y_i - f(x_i, \beta))^2$$
Quantile Regression:
– Minimize Quantile loss:
$$\min_{\beta} \sum_{i=1}^{n} L_p(y_i, f(x_i, \beta))$$
– Quantile regression loss function:
$$L_p(y, \hat{y}) = \begin{cases} p \, (y - \hat{y}) & \text{if } y \ge \hat{y} \\ (1 - p) \, (\hat{y} - y) & \text{if } \hat{y} > y \end{cases}$$
Implementation:
– assume linear function $y = \beta x$, solution using linear programming
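For a single feature the optimization can be illustrated without an LP solver. The sketch below swaps linear programming for brute force: with an intercept and one slope, some optimal line passes through at least two data points (these are the vertices an LP solver would visit), so trying every pair is exact for small data. The intercept, the function name, and the data are assumptions of this sketch, not the production implementation.

```python
from itertools import combinations

def quantile_loss(y, y_hat, p):
    return p * (y - y_hat) if y >= y_hat else (1 - p) * (y_hat - y)

def fit_linear_quantile(xs, ys, p):
    """Return (intercept, slope) minimizing the total quantile loss.

    Brute force over all lines through two data points; O(n^3), so only
    suitable for a handful of points.
    """
    best = None
    for i, j in combinations(range(len(xs)), 2):
        if xs[i] == xs[j]:
            continue  # vertical line, not a valid regression function
        slope = (ys[j] - ys[i]) / (xs[j] - xs[i])
        intercept = ys[i] - slope * xs[i]
        loss = sum(quantile_loss(y, intercept + slope * x, p) for x, y in zip(xs, ys))
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    if best is None:
        raise ValueError("need at least two points with distinct x")
    return best[1], best[2]

# Made-up firm sales (x) and IBM revenue (y).
xs = [10, 20, 30, 40, 50, 60, 70, 80]
ys = [1.0, 3.0, 2.0, 5.0, 3.5, 7.0, 4.0, 8.5]
a, b = fit_linear_quantile(xs, ys, 0.8)
print(f"0.8-quantile line: y = {a:.2f} + {b:.3f} x")
```

A quick sanity check on any fit: roughly a fraction p of the points should lie on or below the line. For realistic data sizes an LP or a dedicated quantile regression routine is the appropriate tool.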
Linear Quantile Regression (Koenker)
[Figure: IBM Revenue plotted against Company (Firm) Sales, with example companies C1 and C2 and the opportunity for each marked]
$$L_p(y, \hat{y}) = \begin{cases} p \, (y - \hat{y}) & \text{if } y \ge \hat{y} \\ (1 - p) \, (\hat{y} - y) & \text{if } \hat{y} > y \end{cases}$$
Quantile Regression Tree
Motivation:
– Identify a locally optimal definition of neighborhood
– Inherently nonlinear
Adjustments of M5/CART for Quantile prediction:
– Predict the percentile rather than the mean of the leaf
– Splitting/pruning criteria: Quantile or squared error loss?
[Figure: example tree with splits Industry = ‘Banking’ (yes/no), Sales<100K (yes/no), and IBM Rev 2003>10K (yes/no); each leaf shows a histogram of Frequency against IBM Sales with the Wallet Estimate marked]
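The two adjustments can be sketched with a one-split tree: the split is chosen by squared error, as in a standard regression tree, while each leaf stores a percentile instead of the mean. This is an illustration only; it is neither M5 nor CART, and the names and data are hypothetical.

```python
import math

def quantile(values, p):
    """Nearest-rank p-th quantile."""
    ordered = sorted(values)
    return ordered[max(1, math.ceil(p * len(ordered))) - 1]

def sse(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def fit_quantile_stump(xs, ys, p):
    """One-split tree: threshold chosen by squared error, leaves predict
    the p-th percentile of their training targets."""
    best = None
    for t in sorted(set(xs))[1:]:  # every candidate leaves both sides non-empty
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        score = sse(left) + sse(right)
        if best is None or score < best[0]:
            best = (score, t, quantile(left, p), quantile(right, p))
    if best is None:
        raise ValueError("need at least two distinct x values")
    _, t, q_left, q_right = best
    return lambda x: q_left if x < t else q_right

# Made-up firm sales ($K) and IBM revenue ($K).
xs = [50, 60, 80, 90, 200, 250, 300, 400]
ys = [1, 2, 3, 4, 20, 25, 30, 40]
predict = fit_quantile_stump(xs, ys, p=0.8)
print(predict(70), predict(500))
```

The split lands between the small and the large firms, and each side reports the upper end of its own sales distribution rather than the average.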
Quanting
Transform the quantile regression into a series of classification problems
– non-linearity, if non-linear classifiers are used
– theoretical guarantee: if the classifiers minimize the expected
classification error, the quanting algorithm minimizes the quantile loss
Training
– Each classifier is trained to decide whether or not the conditional quantile
is above a threshold T
– Original observations are re-labeled and re-weighted to train each
classifier appropriately similar to the quantile loss
Prediction
– Find the threshold where the classifier predictions switch from one to zero
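[Table: outputs of the threshold classifiers C100 to C600 for three example customers, one column per customer, with the resulting prediction where the slide shows one]
Classifier   Customer 1   Customer 2   Customer 3
C100         0            1            1
C200         0            1            1
C300         0            0            1
C400         0            0            0
C500         0            0            0
C600         0            0            0
Prediction                250          350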
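A sketch of the two quanting steps, assuming the classifier for threshold T answers "is the quantile above T?", that training examples are weighted p when they lie above T and 1 - p otherwise, and that the prediction is the midpoint between the last threshold voted 1 and the first voted 0. The midpoint rule reproduces the 250 and 350 in the example; the function names and the weighting details are assumptions of this sketch.

```python
def relabel(y, threshold, p):
    """Turn one regression example into a weighted binary example for the
    classifier at `threshold`; the weights mirror the quantile loss."""
    label = 1 if y > threshold else 0
    weight = p if label == 1 else 1 - p
    return label, weight

def quanting_predict(thresholds, votes):
    """Midpoint between the last threshold voted 1 and the first voted 0.

    Assumes ascending thresholds and votes that switch from 1 to 0 once.
    """
    ones = [t for t, v in zip(thresholds, votes) if v == 1]
    zeros = [t for t, v in zip(thresholds, votes) if v == 0]
    if not ones or not zeros:
        raise ValueError("the switch lies outside the threshold grid")
    return (max(ones) + min(zeros)) / 2

thresholds = [100, 200, 300, 400, 500, 600]
print(quanting_predict(thresholds, [1, 1, 0, 0, 0, 0]))  # 250.0
print(quanting_predict(thresholds, [1, 1, 1, 0, 0, 0]))  # 350.0
```

A customer whose classifiers all vote 0 has no switch point inside the grid, which this sketch reports as an error rather than guessing a value.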
(Graphical model approach to SERVED Wallets)
[Diagram: graphical model with nodes Company firmographics, SERVED Wallet, IT spend with IBM, and Historical relationship with IBM]
Wallet is unobserved, all other variables are observed
Two families of variables (firmographics and IBM relationship) are conditionally independent given wallet
We develop inference procedures and demonstrate them
Theoretically attractive, practically questionable
Empirical Evaluation of Quantile Estimation
Setup
– Four domains with relevant quantile modeling problems
– Performance on test set in terms of 0.9 quantile loss
– Approaches: Linear quantile regression, Q-kNN, Quantile trees,
Bagged quantile trees, Quanting
Baselines
– Best constant
– Traditional regression models for expected values, adjusted under
Gaussian assumption (+1.28)
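The 1.28 in the second baseline is the 0.9 quantile of a standard normal. The sketch below assumes the adjustment means adding 1.28 residual standard deviations to the predicted mean; the function name and the numbers are illustrative.

```python
from statistics import NormalDist, stdev

# z-score of the 0.9 quantile of a standard normal: the source of "+1.28".
z90 = NormalDist().inv_cdf(0.9)
print(round(z90, 2))  # 1.28

def gaussian_quantile_baseline(predicted_mean, residuals, p=0.9):
    """Shift a mean prediction up by z_p residual standard deviations.

    Only valid if the residuals are roughly normal with constant variance.
    """
    return predicted_mean + NormalDist().inv_cdf(p) * stdev(residuals)

print(gaussian_quantile_baseline(100.0, [-1.0, 0.0, 1.0]))
```

If the residuals are skewed, as sales data usually are, this shifted mean can be far from the true 0.9 quantile.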
Performance on Quantile Loss
Best result in BOLD, variance in parentheses
Observations
– Regression + 1.28 is not competitive (because the residuals are not normal)
– Splitting criterion is irrelevant
– Q-kNN is not competitive
– Quanting (using decision trees) and bagged quantile tree perform comparably
Additional Insights
Irrelevance of splitting criterion
– Good news! Because squared error is much more efficient
– Reason:
• SSE measures the decrease of the conditional variance
• SSE measures the ‘goodness’ of the local neighborhood
• Good estimate of the conditional distribution -> good quantile
Linear model does well on IBM and KDD-CUP98
– Match of model bias
• Both domains have strong autocorrelation
• Last year’s donation/revenue is a great predictor of this year’s
• Hard for tree-based models to express linear relationships
Evaluating REALISTIC Wallet
We still don’t know the truth
Quantile loss only evaluates the ability to predict
quantile – but is a quantile a good wallet?
– Which quantile 80%, 90%, 99%?
Distribution is highly skewed
– Most error measures are very sensitive to outliers
– What is the right scale? Log?
Even good survey data is not the truth
– Not available on an IBM product level
– Probably irrelevant for the REALISTIC wallet
MAP: Market Alignment Program
Re-deploying IBM sales resources
Old Sales Process (Focused on Existing Relationship)
– Use prior-year revenue as proxy for future revenue generation
– Assign quota based largely on recent revenue history
New Sales Process Using MAP ... (Focused on Future Opportunity)
– Use OR models to develop forward-looking view of opportunity by client
– Assign quota based on future opportunity and productivity
The MAP process and components
[Diagram components: Integrated Data, Data Model, MAP Models, Modeled Opportunity, Model Estimates, MAP Web Interface, MAP Workshops, IBM Sales Team Interviews, Expert Feedback, Validated Opportunity, Realign Sales Resources]
Explanatory features are extracted from multiple sources
[Diagram: Dun & Bradstreet (D&B) Data and IBM Client Transactions pass through Entity Matching and then Feature Extraction]
D&B Features
– Industry
– Revenue (Rank)
– Employees
– State
– D&B Structure Code
– …
IBM Transactional Features
– Prior-year revenue in other product brands
– Long-term revenue in other product brands
– …
Train model against current year revenue based on previous year
Apply model by rolling forward to current year and predicting future opportunity
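The train-then-roll-forward step can be sketched as follows. Features come from year t-1 and the target from year t during training; at scoring time the same model is fed year-t features. The single feature (prior-year revenue) and the stand-in model `fit_max_growth` are placeholders invented for this sketch, not the MAP models.

```python
def training_pairs(revenue, year):
    """(prior-year revenue, this-year revenue) for customers seen in both years."""
    prior, current = revenue[year - 1], revenue[year]
    return [(prior[c], current[c]) for c in prior if c in current]

def fit_max_growth(pairs):
    """Stand-in model: scale the feature by the largest observed growth ratio."""
    ratio = max(y / x for x, y in pairs if x > 0)
    return lambda x: ratio * x

def roll_forward(revenue, current_year, fit=fit_max_growth):
    """Train on (year-1 -> year), then score with current-year features."""
    model = fit(training_pairs(revenue, current_year))
    return {c: model(x) for c, x in revenue[current_year].items()}

# Made-up revenue by year and customer.
revenue = {
    2004: {"A": 10.0, "B": 20.0},
    2005: {"A": 15.0, "B": 20.0},
}
print(roll_forward(revenue, 2005))
```

Any of the quantile models discussed earlier could be passed in as `fit`, as long as it returns a callable that maps features to an opportunity estimate.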
MAP Validation and Expert Feedback
[Figure: scatter plot of Expert-validated Opportunity (log) against Model (kNN) Opportunity (log), MODEL_OPPTY; both axes run from 0 to 20]
– Experts accept opportunity (45%)
– Experts change opportunity (40%)
• Increase (17%)
• Decrease (23%)
– Experts reduced opportunity to 0 (15%)
Observations
Many accounts are set to zero for external reasons
– Exclude from evaluation since no model can predict the competitive environment
Exponential distribution of opportunities
– Evaluation on the original (non-log) scale suffers from huge outliers
Experts seem to make percentage adjustments
– Consider log scale evaluation in addition to original scale and root as intermediate
Suspect strong “anchoring” bias, 45% of opportunities were not touched
Evaluation Measures
Different scales to avoid outlier artifacts
– Original: e = model - expert
– Root: e = root(model) - root(expert)
– Log: e = log(model) - log(expert)
Statistics on the distribution of the errors
– Mean of e^2
– Mean of |e|
Total of 6 criteria
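The six criteria are the two statistics computed on each of the three scales. A minimal sketch, assuming "root" means square root and that accounts with zero opportunity have already been excluded (the log is undefined there); the sample values are made up.

```python
import math

SCALES = {
    "original": lambda v: v,
    "root": math.sqrt,  # assumes "root" means square root
    "log": math.log,    # requires strictly positive opportunities
}

def evaluation_criteria(model, expert):
    """Mean squared and mean absolute error on each of the three scales."""
    out = {}
    for name, f in SCALES.items():
        errors = [f(m) - f(e) for m, e in zip(model, expert)]
        out[name, "mean_e2"] = sum(e * e for e in errors) / len(errors)
        out[name, "mean_abs_e"] = sum(abs(e) for e in errors) / len(errors)
    return out

criteria = evaluation_criteria(model=[100.0, 400.0], expert=[100.0, 900.0])
for key, value in criteria.items():
    print(key, round(value, 4))
```

The same disagreement of 500 dominates the original scale but shrinks to 10 on the root scale and to less than 1 on the log scale, which is the outlier effect the different scales are meant to control.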
Model Comparison Results
We count how often a model scores within the top 10
and 20 for each of the 6 measures:
Model                    Rational          DB2               Tivoli
                         Top 10   Top 20   Top 10   Top 20   Top 10   Top 20
Displayed Model (kNN)    6        6        4        5        6        6
Max 03-05 Revenue        1        1        0        3        1        4
Linear Quantile 0.8      5        6        2        4        3        5
Regression Tree          1        3        2        4        1        2
Q-kNN 50 + flooring      2        3        6        6        4        6
Decomposition Center     0        0        3        5        0        4
Quantile Tree 0.8        0        1        2        4        1        4
Annotations on the slide: (Anchoring), (Best)
MAP Experiments Conclusions
Q-kNN performs very well after flooring but is typically
inferior prior to flooring
80th percentile Linear quantile regression performs
consistently well (flooring has a minor effect)
Experts are strongly influenced by displayed opportunity
(and displayed revenue of previous years)
Models without last year’s revenue don’t perform well
Use Linear Quantile Regression with q=0.8 in MAP 06
Scope and some of the tedious details
• 3 Million customers
• 20 Brands (Product categories)
• 4 Markets
• Annual model refresh
• The Quantile is chosen for each brand and market separately
based on market insights on IBM market share
• Whitespace models for customers with no prior IBM revenue are
built using the same methodology but only D&B features
• Entity matching between IBM customer records and D&B
hierarchy is HARD
• Evaluation remains somewhat subjective and we collect feedback
In 2008 MAP covered 50+ countries and ~100% of IBM revenue and opportunity
[Figure: chart by year, 2005 to 2008]
Resources shifted to high growth Markets and Accounts
Shifted resources performed >10 pts better
MAP output drives account segmentation and resource allocation decisions
[Figure: accounts segmented along two axes, Validated Revenue Opportunity and Prior Year Actual Revenue]
– Invest: High growth potential
– Opportunistic: Small Accounts
– Core Growth: Modest growth potential
– Core Optimize: Flat or declining
Resource implications
– Shift resources to Core Growth and Invest Accounts
– Reduce resource overlap
– 8,000 sellers shifted (2006 – 2009)
MAP drove significant revenue impact in 2008
[Figure: the segments Invest, Core Growth, Opportunistic, and Core Optimize plotted by Validated Revenue Opportunity and Prior Year Actual Revenue, annotated with $53B of Revenue, $9B of Revenue, 30,000 sellers, and 3,000 sellers shifted (2008)]
[3,000 Sellers] x [$2M Revenue / Seller] x [10% Performance Improvement] = $600M (2008 Revenue Impact)
MAP Take away
Interesting predictive modeling task that calls for an
unorthodox loss function
Combination of data mining AND expert feedback
Integration into the annual sales management cycle
Significant effort on data collection and preparation
Many additional analytical tools were built on top of MAP
– Territory definition and assignment
– Quota assignment
Substantial impact on the bottom line
Questions?