Jennifer Lewis Priestley


Transcript

Jennifer Lewis Priestley
Presentation of
“Assessment of Evaluation Methods for Prediction and Classification
of Consumer Risk in the Credit Industry”
co-authored with S. Nargundkar.
(Accepted for publication as a chapter in "Neural Networks in
Business Forecasting," Peter Zhang, PhD, Ed.)
Objectives
This paper addresses two important questions:
1. Does the model development technique affect classification accuracy?
2. How does model selection vary with the evaluation method used?
Objectives
• Discussion of Modeling Techniques
• Discussion of Model Evaluation Methods
• Empirical Example
Model Development Techniques
Modeling plays an increasingly important role in CRM
strategies:
[Figure: models across the customer lifecycle - customer acquisition, customer management, collections/recovery, and product planning/creating value - supported by target marketing response models, risk models, customer behavioral models (usage, attrition, and activation models), collections and recovery models, and other models such as segmentation, bankruptcy, and fraud models.]
Model Development Techniques
Given that even minimal improvements in model
classification accuracy can translate into significant
savings or incremental revenue, many different
modeling techniques are used in practice:
Statistical Techniques
• Linear Discriminant Analysis
• Logistic Analysis
• Multiple Regression Analysis
Non-Statistical Techniques
• Neural Networks
• Cluster Analysis
• Decision Trees
Model Evaluation Methods
But, developing the model is really only half the
problem. How do you then determine which model
is best?
Model Evaluation Methods
In the context of binary classification (one of the most
common objectives in CRM modeling), one of four
outcomes is possible:
1. True positive (a “good” credit risk is identified as “good”)
2. False positive (a “bad” credit risk is identified as “good”)
3. True negative (a “bad” credit risk is identified as “bad”)
4. False negative (a “good” credit risk is identified as “bad”)
Model Evaluation Methods
If all of these outcomes, specifically the errors, have
the same associated costs, then a simple global
classification rate is a highly appropriate evaluation
method:
                Predicted Good   Predicted Bad   Total
True Good            650              200         850
True Bad              50              100         150
Total                700              300        1000

Classification Rate = 75% ((650 + 100) / 1000)
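As a minimal illustration, the sketch below (Python) recomputes the global classification rate directly from the four outcome counts in the table above:

# Minimal sketch (Python): the four classification outcomes from the table
# above and the global classification rate they imply.
true_good_pred_good = 650   # true positives: "goods" identified as "good"
true_good_pred_bad = 200    # false negatives: "goods" identified as "bad"
true_bad_pred_good = 50     # false positives: "bads" identified as "good"
true_bad_pred_bad = 100     # true negatives: "bads" identified as "bad"

total = (true_good_pred_good + true_good_pred_bad
         + true_bad_pred_good + true_bad_pred_bad)

# Global classification rate: share of all observations classified correctly.
global_rate = (true_good_pred_good + true_bad_pred_bad) / total
print(f"Global classification rate: {global_rate:.0%}")  # 75%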
Model Evaluation Methods
The global classification rate is the most commonly used evaluation method, but it fails when the costs of the misclassification errors differ (Type I vs. Type II errors):

Model 1 results:
Global Classification Rate = 75%
False Positive Rate = 5%
False Negative Rate = 20%

Model 2 results:
Global Classification Rate = 80%
False Positive Rate = 15%
False Negative Rate = 5%

What if the cost of a false positive were great and the cost of a false negative negligible? What if it were the other way around?
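To make the point concrete, here is a small sketch (Python) that ranks the two models above under two hypothetical cost assumptions; the error rates are the ones quoted above, but the unit costs are illustrative only and do not come from the study:

# Minimal sketch (Python): the same two models ranked under two hypothetical
# cost assumptions. The error rates are the ones quoted above; the unit costs
# are illustrative only.
models = {
    "Model 1": {"fp_rate": 0.05, "fn_rate": 0.20},
    "Model 2": {"fp_rate": 0.15, "fn_rate": 0.05},
}

def expected_cost(rates, cost_fp, cost_fn):
    """Cost-weighted sum of the false positive and false negative rates."""
    return rates["fp_rate"] * cost_fp + rates["fn_rate"] * cost_fn

# Case A: a false positive is assumed 10x as costly as a false negative.
for name, rates in models.items():
    print(name, "case A cost:", expected_cost(rates, cost_fp=10.0, cost_fn=1.0))

# Case B: the cost assumption is reversed.
for name, rates in models.items():
    print(name, "case B cost:", expected_cost(rates, cost_fp=1.0, cost_fn=10.0))

Under case A, Model 1 has the lower expected cost; under case B, Model 2 does, even though Model 2 has the higher global classification rate in both cases.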
Model Evaluation Methods
If the misclassification error costs are understood with
some certainty, a cost function could be used to
evaluate the best model:
Loss = π0f0c0 + π1f1c1

where πi is the prior probability that an element comes from class i, fi is the probability that an element from class i will be misclassified, and ci is the cost associated with that misclassification.
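A minimal sketch (Python) of this loss function follows; the priors, misclassification probabilities, and costs are purely hypothetical and exist only to show the arithmetic:

# Minimal sketch (Python) of the loss function above, with hypothetical inputs.
def loss(pi0, f0, c0, pi1, f1, c1):
    """Loss = pi0*f0*c0 + pi1*f1*c1."""
    return pi0 * f0 * c0 + pi1 * f1 * c1

# Example: 85% of applicants are "good" (class 0) and 15% are "bad" (class 1);
# misclassifying a bad as good is assumed five times as costly as the reverse.
print(loss(pi0=0.85, f0=0.20, c0=1.0, pi1=0.15, f1=0.30, c1=5.0))  # 0.395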
Model Evaluation Methods
An evaluation method that uses the same conceptual foundation as the global classification rate is the Kolmogorov-Smirnov (K-S) Test:
[Figure: K-S chart - cumulative percentage of observations (0% to 100%) plotted against score cut-off (0.00 to 1.00).]
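For illustration, here is a short sketch (Python) of the K-S statistic as the maximum separation between the cumulative score distributions of "goods" and "bads"; the scores are simulated and are not the study's data:

import numpy as np

# Minimal sketch (Python): the K-S statistic as the maximum separation between
# the cumulative score distributions of "goods" and "bads". Scores are simulated.
rng = np.random.default_rng(0)
good_scores = rng.beta(5, 3, size=1000)  # goods tend to score higher
bad_scores = rng.beta(3, 5, size=1000)   # bads tend to score lower

cutoffs = np.linspace(0.0, 1.0, 101)
cum_goods = np.array([(good_scores <= c).mean() for c in cutoffs])
cum_bads = np.array([(bad_scores <= c).mean() for c in cutoffs])

ks = np.abs(cum_bads - cum_goods).max()
print(f"K-S statistic: {ks:.2%}")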
Model Evaluation Methods
What if you don’t have ANY information regarding
misclassification error costs…or…the costs are in the
eye of the beholder?
Model Evaluation Methods
The area under the ROC (Receiver Operating Characteristic) curve is an option:

[Figure: ROC curve - sensitivity (true positive rate) plotted against 1-specificity (false positive rate), both from 0 to 1; θ, the area under the curve, ranges from θ = 0.5 (no discrimination) through 0.5 < θ < 1 to θ = 1 (perfect discrimination).]
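A minimal sketch (Python) of computing θ with scikit-learn's roc_auc_score on simulated scores (again, not the study's data):

import numpy as np
from sklearn.metrics import roc_auc_score

# Minimal sketch (Python): theta, the area under the ROC curve, on simulated
# scores. theta = 0.5 means no discrimination; theta = 1.0 is a perfect model.
rng = np.random.default_rng(0)
y_true = np.concatenate([np.ones(1000), np.zeros(1000)])  # 1 = good, 0 = bad
y_score = np.concatenate([rng.beta(5, 3, 1000), rng.beta(3, 5, 1000)])

theta = roc_auc_score(y_true, y_score)
print(f"Theta (area under the ROC curve): {theta:.2f}")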
Empirical Example
So, given this background, the guiding questions of our
research were –
1. Does model development technique impact
prediction accuracy?
2. How will model selection vary with the evaluation
method used?
Empirical Example
We elected to evaluate these questions using a large
data set from a pool of car loan applicants. The data
set included:
• 14,042 US applicants for car loans between June 1, 1998 and
June 30, 1999.
• Of these applicants, 9,442 were considered "good" and 4,600 were
considered "bad" as of December 31, 1999.
• 65 variables, split into two groups –
• Transaction variables (miles on the vehicle, selling price, age
of vehicle, etc.)
• Applicant variables (bankruptcies, balances on other loans,
number of revolving trades, etc.)
Empirical Example
The LDA and Logistic models were developed using
SAS 8.2, while the Neural Network models were
developed using Backpack® 4.0.
Because there are no accepted guidelines for the number of
hidden nodes in Neural Network development, we tested a range
of 5 to 50 hidden nodes.
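For illustration only, a sketch (Python) of such a sweep using scikit-learn's MLPClassifier on synthetic data, rather than the Backpack® software and loan data used in the study:

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Minimal sketch (Python): sweep over hidden-node counts with scikit-learn's
# MLPClassifier on synthetic data (not the Backpack software or loan data).
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for n_hidden in range(5, 51, 5):
    nn = MLPClassifier(hidden_layer_sizes=(n_hidden,), max_iter=1000,
                       random_state=0)
    nn.fit(X_train, y_train)
    accuracy = nn.score(X_test, y_test)
    print(f"{n_hidden} hidden nodes -> test classification rate: {accuracy:.3f}")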
Empirical Example
Quick Review on Linear Discriminant Analysis:
General Form: Y = W1X1 + W2X2 + W3X3 + … + WnXn
• The dependent variable (Y) is categorical (it can have 2 or more categories); the independent variables (X) are metric;
• The linear variate maximizes the discrimination between two pre-defined groups;
• The primary assumptions include:
  • Normality
  • Linearity
  • Absence of multicollinearity among the independent variables
• The discriminant weights indicate the contribution of each variable;
• Traditionally, a "hit" matrix is the output (a sketch follows below).
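For illustration, a minimal LDA sketch (Python) using scikit-learn and synthetic data; the study itself used SAS 8.2 and the loan data set:

from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Minimal sketch (Python): an LDA classifier and its "hit" (confusion) matrix,
# using scikit-learn and synthetic data rather than SAS 8.2 and the loan data.
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

lda = LinearDiscriminantAnalysis()
lda.fit(X_train, y_train)

print("Discriminant weights:", lda.coef_)  # contribution of each variable
print("Hit matrix:")
print(confusion_matrix(y_test, lda.predict(X_test)))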
Empirical Example
Quick Review on Logistic Analysis:
General Form: P(event)/P(non-event) = e^(B0 + B1X1 + B2X2 + … + BnXn)
• The technique requires a binary dependent variable;
• It is less sensitive to assumptions of normality;
• The function is S-shaped and bounded between 0 and 1;
• Where LDA and Regression use the least squares method of estimation, Logistic Analysis uses a maximum likelihood estimation algorithm;
• The weights are measures of changes in the ratio of the probabilities (odds ratios);
• Proc Logistic in SAS produces a "classification" matrix that provides the sensitivity and specificity information needed to develop an ROC curve (a sketch follows below).
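For illustration, a minimal logistic sketch (Python) using scikit-learn and synthetic data; the study itself used SAS Proc Logistic, and the odds ratios here are simply e^B:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Minimal sketch (Python): a logistic model with odds ratios (e^B) and a
# classification matrix, using scikit-learn and synthetic data rather than
# SAS Proc Logistic and the loan data.
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

logit = LogisticRegression(max_iter=1000)
logit.fit(X_train, y_train)

print("Odds ratios:", np.exp(logit.coef_))  # e^B for each variable
print("Classification matrix:")
print(confusion_matrix(y_test, logit.predict(X_test)))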
Empirical Example
Quick Review on Neural Networks:
[Figure: a feed-forward neural network with an input layer, a hidden layer, and an output layer. Each node applies a combination function, which combines all inputs into a single value (usually as a weighted summation), followed by a transfer function, which calculates the output value from the combination function.]
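A minimal sketch (Python) of a single node implementing these two steps; the weights, bias, and inputs are hypothetical, and the sigmoid transfer function is one common choice, not necessarily the one used in the study:

import numpy as np

# Minimal sketch (Python): a combination function (weighted summation of the
# inputs) followed by a transfer function (here a sigmoid).
def node_output(inputs, weights, bias):
    combination = np.dot(weights, inputs) + bias   # combination function
    return 1.0 / (1.0 + np.exp(-combination))      # transfer function

x = np.array([0.2, 0.7, 0.1])   # hypothetical input values
w = np.array([0.5, -0.3, 0.8])  # hypothetical weights
print(node_output(x, w, bias=0.1))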
Empirical Example - Results
Technique              Class Rate   Class Rate   Class Rate   Theta     K-S Test
                       "Goods"      "Bads"       "Global"
LDA                    73.91%       43.40%       59.74%       68.98%    19%
Logistic               70.54%       59.64%       69.45%       68.00%    24%
NN-5 Hidden Layers     63.50%       56.50%       58.88%       63.59%    38%
NN-10 Hidden Layers    75.40%       44.50%       55.07%       64.46%    11%
NN-15 Hidden Layers    60.10%       62.10%       61.40%       65.89%    24%
NN-20 Hidden Layers    62.70%       59.00%       60.29%       65.27%    24%
NN-25 Hidden Layers    76.60%       41.90%       53.78%       63.55%    16%
NN-30 Hidden Layers    52.70%       68.50%       63.13%       65.74%    22%
NN-35 Hidden Layers    60.30%       59.00%       59.46%       63.30%    22%
NN-40 Hidden Layers    62.40%       58.30%       59.71%       64.47%    17%
NN-45 Hidden Layers    54.10%       65.20%       61.40%       64.50%    31%
NN-50 Hidden Layers    53.20%       68.50%       63.27%       65.15%    37%
Empirical Example - Conclusions
What were we able to demonstrate?
1. The “best” model depends upon the evaluation
method selected;
2. The appropriate evaluation method depends upon
situational and data context;
3. No multivariate technique is “best” under all
circumstances.