IBM Research
What’s in your wallet?
Opportunity modeling approaches and applications
Claudia Perlich
Chief Scientist
Formerly: IBM Research
Collaborators: Saharon Rosset, Rick Lawrence, Srujana Merugu, et al.
© 2006 IBM Corporation
Predictive Modeling Group – Mathematical Sciences – IBM Research
Publications & Recognition
2009 Finalist in the INFORMS Edelman competition
2007 Data Mining Practice Prize at KDD 2007, “Predictive modeling for marketing”, Runner Up
2007 IBM Outstanding Technical Award, “Opportunity models and validation for the Market Alignment Program (MAP)”
2005 IBM Research Award for contributions to Market Alignment Program (MAP)
 “Operations Research Improves Sales Force Productivity at IBM”, R. Lawrence, C. Perlich, S. Rosset, et al.
Forthcoming, INFORMS Journal on Computing
 “Analytics-driven solutions for customer targeting and sales force allocation”, J. Arroyo, M. Callahan, M. Collins, A.
Ershov, I. Khabibrakhmanov, R. Lawrence, S. Mahatma, M. Niemaszyk, C. Perlich, S. Rosset, S. Weiss. IBM
Systems Journal 46 (4) (2007)
 “A Data Mining Case Study: Analytics-driven solutions for customer targeting and sales force allocation”, R.
Lawrence, C. Perlich, S. Rosset, I. Khabibrakhmanov, S. Mahatma, S. Weiss. Second Workshop on Data Mining
Case Studies and Practice Prize at SIGKDD 2007
 “High Quantile Modeling for Customer Wallet Estimation with Other Applications”, Perlich, C., S. Rosset, R.
Lawrence, and B. Zadrozny. 13th SIGKDD International Conference on Knowledge Discovery and Data Mining 2007
 “Quantile Modeling for Marketing”, Perlich, C., S. Rosset and B. Zadrozny. Workshop on Data Mining for Business
Applications at 12th SIGKDD International Conference on Knowledge Discovery and Data Mining 2006
 “A New Multi-View Regression Approach with an Application to Customer Wallet Estimation”, Merugu, S., S. Rosset
and C. Perlich. 12th SIGKDD International Conference on Knowledge Discovery and Data Mining 2006
 “Wallet Estimation Models”, Rosset, S., C. Perlich, B. Zadrozny, S. Merugu, S. Weiss and R. Lawrence. International
Workshop on Customer Relationship Management: Data Mining Meets Marketing, NYU 2005
 “Modeling Quantiles”, Perlich, C., S. Rosset and B. Zadrozny. In Encyclopedia of Data Warehousing and Mining,
Second Edition
© Copyright IBM Corporation 2010
Presentation Outline
 Wallet Definitions and Business Considerations
 Modeling Approaches
 Evaluation of Wallet Models
 Business Impact – Market Alignment Program (MAP)
© 2011 IBM Corporation
What is Wallet/Opportunity?
 Total amount of money that the customer (company)
can spend in a certain product category in a given
period
[Figure: nested regions labeled Company Revenue, IT Wallet, and IBM Sales]
IBM sales ≤ IT wallet ≤ Company revenue
Why Are We Interested in Wallet?
 Customer targeting
– Focus on acquiring customers with high wallet
– Evaluate customers’ growth potential by combining
wallet estimates and sales history
– For existing customers, focus on high wallet, low share-of-wallet customers
 Sales force management
– Make resource assignment decisions
• Concentrate resources on untapped
– Evaluate success of sales personnel and sales channel
by share-of-wallet they attain
Wallet Modeling Challenge
 The customer wallet is never observed
– Nothing to “fit a model”
– Even if you have a model, how do you evaluate it?
 Need a predictive approach from available data
– Firmographics (Sales, Industry, Employees)
– IBM Sales and transaction history
Existing Approaches to Wallet Modeling
 Bottom up: learn a model for individual companies
– Get “true” wallet values through surveys
– Very expensive
– Small, typically not representative sample
– Unreliable because the wallet is ill-defined
– Coarse level of IT categories
 Top down: this approach was used by IBM Market
Intelligence in North America (called ITEM)
– Use econometric models to assign total “opportunity” to
segment (e.g., industry × geography)
– Assign to companies in segment proportional to their size
– Completely ad hoc, without any validation
Multiple Wallet Definitions
 TOTAL: Total customer available budget in the
relevant area (e.g., total IT)
– Can we really hope to attain all of it?
 SERVED: Total customer spending on IT products
covered by IBM
– Better definition for our marketing purposes
 REALISTIC: Spending with IBM by the “best similar
customers”
[Figure: nested regions labeled Company Revenue, TOTAL, SERVED, REALISTIC, and IBM Sales]
REALISTIC ≤ SERVED ≤ TOTAL
We formulate the problem as Quantile Estimation
 Imagine 1,000 customers with identical customer features
 Consider the distribution of the IBM Sales to these
customers:
[Figure: distribution of IBM Sales across these customers; the opportunity is a high quantile, the level reached by the best customers]
Formally: Percentile of Conditional
 Distribution of IBM sales s to the customer given customer
attributes x: $s \mid x \sim f_{\theta,x}$
[Figure: conditional density of s given x, marking E(s|x) and the REALISTIC wallet]
 Two obvious ways to get at the pth percentile:
– Estimate the conditional distribution by integrating over a neighborhood of
similar customers
 Take the pth percentile of spending in the neighborhood
– Create a global model for the pth percentile
 Build global regression models, e.g., $s \mid x \sim N(\beta x, \sigma^2)$
Overview of analytical approaches
[Diagram: families of approaches and their design choices]
Families: ‘Ad hoc’, kNN, Optimization, Quantile Regression
– Decomposition: Industry, Size
– General kNN: k, Distance, Features
– Model Form: Linear, Decision Tree, Quanting
– Linear Model, Adjustment
– Evaluation and Validation: Quantile Loss, MAP Feedback
K-Nearest Neighbor
 Distance metric:
– Industry match
– Euclidean distance on firmographics
and past IBM sales
– Scaling issues
 Neighborhood sizes (k):
– Neighborhood size has significant
effect on prediction quality
 Prediction:
– Quantile of firms in the neighborhood
[Figure: universe of IBM customers with D&B information, plotted by Industry and Employees, with target company i and its neighborhood highlighted; histogram of IBM Sales (Frequency) in the neighborhood, with the Wallet Estimate marked]
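The neighborhood approach on the K-Nearest Neighbor slide can be sketched as follows. This is a minimal illustration and not the production MAP model: the function name `knn_wallet_estimate`, the standardization step (one possible answer to the scaling issue), the exclusion of the target company from its own neighborhood, and the defaults `k=20` and `p=0.8` are all assumptions.

```python
import numpy as np

def knn_wallet_estimate(features, ibm_sales, industry, target_idx, k=20, p=0.8):
    """Estimate one company's wallet as the p-th quantile of IBM sales
    among its k nearest neighbors in the same industry (hypothetical helper)."""
    features = np.asarray(features, dtype=float)
    ibm_sales = np.asarray(ibm_sales, dtype=float)
    industry = np.asarray(industry)

    # Assumed fix for the scaling issue: standardize every feature column.
    std = features.std(axis=0)
    std[std == 0] = 1.0
    scaled = (features - features.mean(axis=0)) / std

    # Industry match: only companies in the target's industry are candidates.
    candidates = np.flatnonzero(industry == industry[target_idx])
    candidates = candidates[candidates != target_idx]
    if candidates.size == 0:
        raise ValueError("no other company in the target's industry")

    # Euclidean distance on the scaled features, keep the k closest.
    dist = np.linalg.norm(scaled[candidates] - scaled[target_idx], axis=1)
    nearest = candidates[np.argsort(dist)[:k]]

    # Prediction: a quantile of IBM sales within the neighborhood.
    return float(np.quantile(ibm_sales[nearest], p))
```

In this sketch the feature matrix would hold firmographics and past IBM sales, and `k` is the knob the slide flags as having a significant effect on prediction quality.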
Global Estimation: the Quantile Loss Function
 The mean minimizes a sum of squared residuals:
$\min_{\mu} \sum_{i=1}^{n} (y_i - \mu)^2$
 The median minimizes a sum of absolute residuals:
$\min_{m} \sum_{i=1}^{n} |y_i - m|$
 The p-th quantile minimizes an asymmetrically weighted
sum of absolute residuals:
$\min_{\hat{y}} \sum_{i=1}^{n} L_p(y_i, \hat{y})$
$L_p(y, \hat{y}) = \begin{cases} p \cdot (y - \hat{y}) & \text{if } y \ge \hat{y} \\ (1 - p) \cdot (\hat{y} - y) & \text{if } \hat{y} > y \end{cases}$
[Figure: loss as a function of the residual over the range -3 to 3, for p=0.8 and p=0.5 (absolute loss)]
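A small sketch of the quantile loss, assuming NumPy is available. The names `quantile_loss` and `best_constant` are illustrative only. The second function searches the observed values for the constant prediction with the lowest total loss, which illustrates the claim that the p-th quantile is the minimizer.

```python
import numpy as np

def quantile_loss(y, y_hat, p):
    """Total quantile loss: p * (y - y_hat) when y >= y_hat,
    (1 - p) * (y_hat - y) otherwise, summed over all observations."""
    y = np.asarray(y, dtype=float)
    y_hat = np.broadcast_to(np.asarray(y_hat, dtype=float), y.shape)
    diff = y - y_hat
    return float(np.sum(np.where(diff >= 0, p * diff, (1 - p) * -diff)))

def best_constant(y, p):
    """Return the observed value with the smallest total quantile loss
    when used as a constant prediction for every observation."""
    y = np.asarray(y, dtype=float)
    losses = [quantile_loss(y, c, p) for c in y]
    return float(y[int(np.argmin(losses))])
```

For the values 1 through 10 and p=0.85, the constant 9 has a lower total loss than 8 or 10, so under-prediction is penalized more heavily than over-prediction, as intended for a high quantile.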
Quantile Regression
 Traditional Regression:
– Estimation of conditional expected value by minimizing sum of
squares: $\min_{\beta} \sum_{i=1}^{n} (y_i - f(x_i, \beta))^2$
 Quantile Regression:
– Minimize Quantile loss: $\min_{\beta} \sum_{i=1}^{n} L_p(y_i, f(x_i, \beta))$
– Quantile regression loss function:
$L_p(y, \hat{y}) = \begin{cases} p \cdot (y - \hat{y}) & \text{if } y \ge \hat{y} \\ (1 - p) \cdot (\hat{y} - y) & \text{if } \hat{y} > y \end{cases}$
 Implementation:
– assume linear function $y = \beta x$, solution using linear programming
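One way to set up the linear program mentioned above is sketched below, assuming SciPy's `linprog` with the HiGHS solver is available. The function name `linear_quantile_regression` and the split of each residual into non-negative parts `u` and `v` are a standard textbook formulation chosen for illustration; the slides do not say which solver or formulation was actually used.

```python
import numpy as np
from scipy.optimize import linprog

def linear_quantile_regression(X, y, p):
    """Fit beta minimizing the total quantile loss of y - X @ beta.

    Each residual is written as u_i - v_i with u_i, v_i >= 0, so the loss
    becomes the linear objective sum(p * u_i + (1 - p) * v_i).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, d = X.shape

    # Decision variables: [beta (d, unbounded), u (n, >= 0), v (n, >= 0)].
    cost = np.concatenate([np.zeros(d), p * np.ones(n), (1 - p) * np.ones(n)])

    # Equality constraints: X @ beta + u - v = y.
    A_eq = np.hstack([X, np.eye(n), -np.eye(n)])
    bounds = [(None, None)] * d + [(0, None)] * (2 * n)

    result = linprog(cost, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    if not result.success:
        raise RuntimeError(result.message)
    return result.x[:d]
```

To include an intercept, add a column of ones to `X`. The dense identity blocks keep the sketch short but grow quadratically with the number of observations, so a sparse formulation would be needed at realistic scale.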
Linear Quantile Regression (Koenker)
[Figure: IBM Revenue plotted against Company (Firm) Sales, with the opportunity marked for example companies C1 and C2]
$L_p(y, \hat{y}) = \begin{cases} p \cdot (y - \hat{y}) & \text{if } y \ge \hat{y} \\ (1 - p) \cdot (\hat{y} - y) & \text{if } \hat{y} > y \end{cases}$
Quantile Regression Tree
 Motivation:
– Identify a locally optimal definition of neighborhood
– Inherently nonlinear
 Adjustments of M5/CART for Quantile prediction:
– Predict the percentile rather than the mean of the leaf
– Splitting/pruning criteria: Quantile or squared error loss?
[Figure: example tree with splits Industry = ‘Banking’, Sales<100K, and IBM Rev 2003>10K; each leaf shows a histogram of IBM Sales (Frequency) with the Wallet Estimate marked]
Quanting
 Transform the quantile regression into a series of classification problems
– non-linearity, if non-linear classifiers are used
– theoretical guarantee: if the classifiers minimize the expected
classification error, the quanting algorithm minimizes the quantile loss
 Training
– Each classifier is trained to decide whether or not the conditional quantile
is above a threshold T
– Original observations are re-labeled and re-weighted to train each
classifier, with weights that mirror the asymmetry of the quantile loss
 Prediction
– Find the threshold where the classifier predictions switch from one to zero
Example  C100  C200  C300  C400  C500  C600  Prediction
1        0     0     0     0     0     0
2        1     1     0     0     0     0     250
3        1     1     1     0     0     0     350
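The prediction step can be sketched as below. This covers only the final lookup, not classifier training, and the midpoint rule is an assumption chosen because it reproduces the example predictions 250 and 350. The name `quanting_predict` is hypothetical, and returning `None` when no switch point exists is also an assumption, since the slide does not show a prediction for that case.

```python
def quanting_predict(thresholds, votes):
    """Locate where threshold classifiers switch from one to zero.

    thresholds: ascending threshold values, one per classifier.
    votes: 1 if that classifier says the conditional quantile is above
           its threshold, else 0.
    Returns the midpoint between the last threshold voted 1 and the next
    threshold, or None when there is no switch inside the threshold range.
    """
    if len(thresholds) != len(votes):
        raise ValueError("need exactly one vote per threshold")

    last_one = None
    for i, vote in enumerate(votes):
        if vote != 1:
            break  # assumed: stop at the first classifier that votes zero
        last_one = i

    # No classifier voted 1, or every classifier voted 1: no switch point.
    if last_one is None or last_one == len(thresholds) - 1:
        return None
    return (thresholds[last_one] + thresholds[last_one + 1]) / 2
```

With thresholds 100 through 600, votes of 1, 1, 0, 0, 0, 0 place the switch between 200 and 300.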
(Graphical model approach to SERVED Wallets)
[Figure: graphical model linking Company firmographics, SERVED Wallet, IT spend with IBM, and Historical relationship with IBM]
 Wallet is unobserved, all other variables are observed
 Two families of variables, firmographics and IBM
relationship, are conditionally independent given wallet
 We develop inference procedures and demonstrate them
 Theoretically attractive, practically questionable
Empirical Evaluation of Quantile Estimation
 Setup
– Four domains with relevant quantile modeling problems
– Performance on test set in terms of 0.9 quantile loss
– Approaches: Linear quantile regression, Q-kNN, Quantile trees,
Bagged quantile trees, Quanting
 Baselines
– Best constant
– Traditional regression models for expected values, adjusted under a
Gaussian assumption (+1.28 standard deviations, the 0.9 quantile of the standard normal)
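The Gaussian adjustment used for the regression baseline can be sketched with the standard library. The function name `gaussian_quantile_adjustment` and the use of a single residual standard deviation for all customers are assumptions made for illustration.

```python
from statistics import NormalDist

def gaussian_quantile_adjustment(mean_prediction, residual_std, p=0.9):
    """Shift a predicted mean to the p-th quantile, assuming the residuals
    are normally distributed with the given standard deviation."""
    z = NormalDist().inv_cdf(p)  # about 1.28 for p = 0.9
    return mean_prediction + z * residual_std
```

The adjustment is only as good as the normality assumption, which is the limitation this baseline is meant to expose.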
Performance on Quantile Loss
Best result in bold, variance in parentheses
 Observations
– Regression + 1.28 is not competitive (because the residuals are not normal)
– Splitting criterion is irrelevant
– Q-kNN is not competitive
– Quanting (using decision trees) and bagged quantile tree perform comparably
Additional Insights
 Irrelevance of splitting criterion
– Good news! Because squared error is much more efficient
– Reason:
• SSE measures the decrease of the conditional variance
• SSE measures the ‘goodness’ of the local neighborhood
• Good estimate of the conditional distribution -> good quantile
 Linear model does well on IBM and KDD-CUP98
– Match of model bias
• Both domains have strong autocorrelation
• Last year’s donation/revenue is a great predictor of this year’s
• Hard for tree-based models to express linear relationships
Evaluating REALISTIC Wallet
 We still don’t know the truth
 Quantile loss only evaluates the ability to predict
quantile – but is a quantile a good wallet?
– Which quantile: 80%, 90%, or 99%?
 Distribution is highly skewed
– Most error measures are very sensitive to outliers
– What is the right scale? Log?
 Even good survey data is not the truth
– Not available on an IBM product level
– Probably irrelevant for the REALISTIC wallet
MAP: Market Alignment Program
Re-deploying IBM sales resources
Old Sales Process (Focused on Existing Relationship)
 Use prior-year revenue as proxy for future revenue generation
 Assign quota based largely on recent revenue history
New Sales Process Using MAP (Focused on Future Opportunity)
 Use OR models to develop forward-looking view of opportunity by client
 Assign quota based on future opportunity and productivity
The MAP process and components
[Figure: process diagram with the components Integrated Data, Data Model, MAP Models, Modeled Opportunity, Model Estimates, MAP Web Interface, MAP Workshops (IBM Sales Team Interviews), Expert Feedback, Validated Opportunity, and Realign Sales Resources]
Explanatory features are extracted from multiple sources
[Figure: Dun & Bradstreet (D&B) Data and IBM Client Transactions pass through Entity Matching and Feature Extraction]
D&B Features
 Industry
 Revenue (Rank)
 Employees
 State
 D&B Structure Code
 …
IBM Transactional Features
 Prior-year revenue in other product brands
 Long-term revenue in other product brands
 …
 Train model against current-year revenue using features from the previous year
 Apply model by rolling forward to current year and predicting
future opportunity
MAP Validation and Expert Feedback
[Figure: scatter plot of Expert-validated Opportunity (log) against Model (kNN) Opportunity (log), both axes from 0 to 20]
 Experts accept opportunity (45%)
 Experts change opportunity (40%): Increase (17%), Decrease (23%)
 Experts reduced opportunity to 0 (15%)
Observations
 Many accounts are set to zero for external reasons
– Exclude from evaluation since no model can predict the competitive
environment
 Exponential distribution of opportunities
– Evaluation on the original (non-log) scale suffers from huge outliers
 Experts seem to make percentage adjustments
– Consider log scale evaluation in addition to original scale and root
as intermediate
– Suspect strong “anchoring” bias, 45% of opportunities were not
touched
Evaluation Measures
 Different scales to avoid outlier artifacts
– Original: e = model - expert
– Root: e = root(model) - root(expert)
– Log: e = log(model) - log(expert)
 Statistics on the distribution of the errors
– Mean of e^2
– Mean of |e|
 Total of 6 criteria (3 scales × 2 statistics)
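The six criteria can be computed as in the sketch below. The function name `evaluation_criteria` is illustrative, and reading "root" as the square root is an assumption. The log scale requires strictly positive values, which fits the earlier decision to exclude accounts that experts set to zero.

```python
import math

def evaluation_criteria(model, expert):
    """Return the 6 criteria: mean squared and mean absolute error of
    model versus expert opportunity on the original, root, and log scales."""
    scales = {
        "original": lambda v: v,
        "root": math.sqrt,   # assumed: "root" means square root
        "log": math.log,     # raises ValueError for values <= 0
    }
    criteria = {}
    for name, transform in scales.items():
        errors = [transform(m) - transform(x) for m, x in zip(model, expert)]
        n = len(errors)
        criteria[(name, "mean_squared")] = sum(e * e for e in errors) / n
        criteria[(name, "mean_absolute")] = sum(abs(e) for e in errors) / n
    return criteria
```

Each candidate model would be scored on all six criteria against the expert-validated opportunity, and the models can then be ranked per criterion.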
Model Comparison Results
We count how often a model scores within the top 10
and 20 for each of the 6 measures:
Model                   Rational          DB2               Tivoli
                        Top 10  Top 20    Top 10  Top 20    Top 10  Top 20
Displayed Model (kNN)   6       6         4       5         6       6       (Anchoring)
Max 03-05 Revenue       1       1         0       3         1       4
Linear Quantile 0.8     5       6         2       4         3       5       (Best)
Regression Tree         1       3         2       4         1       2
Q-kNN 50 + flooring     2       3         6       6         4       6
Decomposition Center    0       0         3       5         0       4
Quantile Tree 0.8       0       1         2       4         1       4
MAP Experiments Conclusions
 Q-kNN performs very well after flooring but is typically
inferior prior to flooring
 80th percentile Linear quantile regression performs
consistently well (flooring has a minor effect)
 Experts are strongly influenced by displayed opportunity
(and displayed revenue of previous years)
 Models without last year’s revenue don’t perform well
Use Linear Quantile Regression with q=0.8 in MAP 06
Scope and some of the tedious details
• 3 Million customers
• 20 Brands (Product categories)
• 4 Markets
• Annual model refresh
• The Quantile is chosen for each brand and market separately
based on market insights on IBM market share
• Whitespace models for customers with no prior IBM revenue are
built using the same methodology but only D&B features
• Entity matching between IBM customer records and D&B
hierarchy is HARD
• Evaluation remains somewhat subjective and we collect feedback
In 2008 MAP covered 50+ countries and ~100% of IBM revenue and opportunity
[Figure: chart with years 2005, 2006, 2007, and 2008]
 Resources shifted to high growth Markets and Accounts
 Shifted resources performed >10 pts better
MAP output drives account segmentation and resource allocation decisions
[Figure: accounts segmented by Validated Revenue Opportunity against Prior Year Actual Revenue, with sellers shifted between segments]
Segments
– Invest: High growth potential
– Core Growth: Modest growth potential
– Opportunistic: Small Accounts
– Core Optimize: Flat or declining
Resource implications
 Shift resources to Core Growth and Invest Accounts
 Reduce resource overlap
 8,000 sellers shifted (2006 – 2009)
MAP drove significant revenue impact in 2008
[Figure: segments Invest, Core Growth, Opportunistic, and Core Optimize by Validated Revenue Opportunity against Prior Year Actual Revenue, annotated with $53B of Revenue, $9B of Revenue, 30,000 sellers, and 3,000 sellers shifted (2008)]
[3,000 Sellers] x [$2M Revenue / Seller] x [10% Performance Improvement]
= $600M (2008 Revenue Impact)
MAP Take away
 Interesting predictive modeling task that calls for an
unorthodox loss function
 Combination of data mining AND expert feedback
 Integration into the annual sales management cycle
 Significant effort on data collection and preparation
 Many additional analytical tools were built on top of
MAP
 Territory definition and assignment
 Quota assignment
 Substantial impact on the bottom line
Questions?