Transcript Document
Universidad Complutense de Madrid
Máster en Ingeniería Matemática
Curso 2007-2008
Modelling Week
Second Edition
June 16 – June 24, 2008
Credit Scoring Modelling for
Retail Banking Sector
Problem raised by Accenture.
Coordinators:
Ignacio Villanueva (UCM).
Estela Luna (Accenture).
Team members:
Elena Bartolozzi (Universitá di Firenze)
Matthew Cornford (University of Oxford)
Leticia García-Ergüín (UCM)
Cristina Pascual Deocón (UCM)
Oscar Iván Pascual (UCM)
Francisco Javier Plaza (UCM)
Index
Introduction
Methodology and Data
Univariate and Multivariate Analysis
Model Creation
Validation
Calibration
Introduction
Our problem concerns whom a bank should lend
its money to.
When a client applies for a loan, the bank would like to
be sure that the client will pay back the full amount of
the loan.
We need effective models that allow us to predict if a
client will pay back the loan.
What we have is historical data for several variables.
We are trying to fit a model to this historical data so we
can estimate a probability of default.
Methodology and Data
Our data are provided by Accenture and include details of
completed loan agreements.
The variables included are:
Age
Income
Wealth
Marital Status
Length as a Client
Amount of Loan
Maturity
Default
Sample Selection
We split the sample into two parts
The modelling sample
The validation sample
Modelling Sample
A random sample from the data is selected.
The size of the modelling sample is about 2/3 of the
original data
This new sample is used to create the model.
Validation Sample
The remaining data are used to validate the model.
We compare the defaults the model predicts with the
clients who really did default.
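The sample selection described above can be sketched as follows. This is only an illustration in Python, not the SAS or MATLAB code used by the team; the function name split_sample, the seed and the placeholder records are assumptions.

```python
import random

def split_sample(records, modelling_fraction=2 / 3, seed=None):
    """Randomly split records into a modelling sample and a validation sample."""
    rng = random.Random(seed)
    shuffled = list(records)  # copy, so the caller's data is left untouched
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * modelling_fraction)
    return shuffled[:cut], shuffled[cut:]

# Placeholder identifiers standing in for the loan records (hypothetical data).
loans = list(range(300))
modelling, validation = split_sample(loans, seed=2008)
print(len(modelling), len(validation))
```

Drawing a new split with a different seed gives a different modelling sample, so fitted coefficients can change slightly from run to run.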
Univariate and Multivariate Analysis
We have a dependent variable, which is default, and
some independent variables (age, income,…)
First of all, we do univariate analysis.
For each variable, we calculate some statistics like mean,
standard deviation, skewness…
We plot some histograms…
This information can be used as a first check before
applying the model.
It would be better if the data were homogeneous.
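A minimal sketch of the univariate statistics mentioned above, in Python. It uses the population formulas for standard deviation and skewness, which is an assumption: statistical packages such as SAS may apply sample corrections and report slightly different values. The ages are made-up values.

```python
import statistics

def describe(values):
    """Return basic univariate statistics for one variable."""
    n = len(values)
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    # Population skewness: third central moment divided by std cubed.
    skewness = sum((v - mean) ** 3 for v in values) / (n * std ** 3)
    return {"n": n, "mean": mean, "std": std, "skewness": skewness}

# Made-up ages, for illustration only.
ages = [23, 35, 41, 29, 52, 38, 47, 31, 60, 44]
print(describe(ages))
```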
Univariate Analysis
We’ve used SAS software to generate these statistics:
output.htm
This kind of analysis is very useful to detect outliers or
transcription mistakes.
Multivariate Analysis
Correlations
Chi-squared test
We want to find out which of the variables are explanatory,
i.e. which variables default depends on.
We use the chi-squared test for that:
$$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}$$
where O_i and E_i are the observed and expected counts in cell i.
To begin with, we must discretize the continuous variables using
percentiles.
After running the chi-squared test, we look at the p-value.
If p-value < 0.05, we reject independence.
If p-value > 0.05, we do not reject independence.
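The test can be sketched in Python for the simplest case, a 2x2 table (for example a variable split at its median against default / no default), which has one degree of freedom. The counts are hypothetical, and p_value_one_df is valid only for one degree of freedom; a finer percentile split needs a general chi-squared distribution function.

```python
import math

def chi_squared_statistic(table):
    """Pearson chi-squared statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

def p_value_one_df(stat):
    """Upper-tail probability of a chi-squared variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))

# Hypothetical counts: rows = below / above the median, columns = no default / default.
table = [[10, 20], [20, 10]]
stat = chi_squared_statistic(table)
p = p_value_one_df(stat)
print(stat, p, "reject independence" if p < 0.05 else "do not reject independence")
```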
Model Creation
According to the results of the Univariate and
Multivariate Analysis, the variables we include in our
model are:
Age
Income
Wealth
Marital Status
Maturity
We fit a logit model with proc logistic in SAS and with
glmfit in MATLAB; both give the same results.
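$$\pi(x_0, \ldots, x_k) = \pi(x) = \frac{\exp\left(\sum_{j=0}^{k} \beta_j x_j\right)}{1 + \exp\left(\sum_{j=0}^{k} \beta_j x_j\right)}$$
$$1 - \pi(x) = \frac{1}{1 + \exp\left(\sum_{j=0}^{k} \beta_j x_j\right)}$$
$$\operatorname{Logit}(\pi(x)) = \log\frac{\pi(x)}{1 - \pi(x)} = \beta_0 + x_1 \beta_1 + \cdots + x_k \beta_k$$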
Variable          Coefficient
Intercept         -1.85136
Age               -0.02678
Income             0.10025
Wealth            -0.01761
Marital Status     0.79651
Maturity           0.00892
Some differences between runs are to be expected because
the sample is drawn at random.
So, our model is as follows:
$$P(\text{Default} \mid x) = \frac{1}{1 + e^{-z}}$$
where z = -1.85 - 0.026*Age + 0.1*Inc - 0.017*Wlth + 0.79*Marit + 0.0089*Matur
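As an illustration, the fitted model can be applied to a client as in the Python sketch below. The coefficients are the ones reported in the slides; the dictionary keys, the example client and the units and coding of its values (including the coding of marital status) are assumptions, since the slides do not state them.

```python
import math

# Coefficients as reported in the slides.
COEFFICIENTS = {
    "intercept": -1.85136,
    "age": -0.02678,
    "income": 0.10025,
    "wealth": -0.01761,
    "marital_status": 0.79651,
    "maturity": 0.00892,
}

def probability_of_default(client):
    """Logistic model: P(default | x) = 1 / (1 + exp(-z)), z the linear predictor."""
    z = COEFFICIENTS["intercept"]
    for name, value in client.items():
        z += COEFFICIENTS[name] * value
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical client; the units and the 0/1 coding are assumed, not from the source.
client = {"age": 40, "income": 2.0, "wealth": 10.0, "marital_status": 1, "maturity": 24}
print(probability_of_default(client))
```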
Validation
Model statistics:
Powerstat is a method to measure the discriminatory power of the
model.
The data are sorted from worst to best according to the
probability of default calculated with our model.
A perfect model would place all the defaults at
the beginning.
We plot accumulated defaults against accumulated
observations.
Powerstat compares the area between our model's curve and that of
a random model with the area between the perfect model's curve
and that of a random model.
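A minimal Python sketch of this calculation, assuming Powerstat is the usual accuracy ratio: the area between the model curve and the random diagonal, divided by the area between the perfect curve and the diagonal. Tied scores are not treated specially, and the sample must contain both defaults and non-defaults. The function name and the data are hypothetical.

```python
def powerstat(pd_scores, defaults):
    """Accuracy ratio from the cumulative-defaults curve (defaults are 0/1 flags)."""
    n = len(defaults)
    total_defaults = sum(defaults)
    # Sort from worst (highest probability of default) to best.
    order = sorted(range(n), key=lambda i: pd_scores[i], reverse=True)
    area_model = 0.0
    captured_prev = 0.0
    captured = 0
    for i in order:
        captured += defaults[i]
        captured_now = captured / total_defaults
        # Trapezoid of width 1/n under the accumulated-defaults curve.
        area_model += (captured_prev + captured_now) / 2 / n
        captured_prev = captured_now
    area_random = 0.5
    area_perfect = 1 - total_defaults / (2 * n)
    return (area_model - area_random) / (area_perfect - area_random)

scores = [0.9, 0.7, 0.4, 0.3, 0.1]   # hypothetical model probabilities
flags = [1, 0, 1, 0, 0]              # hypothetical observed defaults
print(powerstat(scores, flags))
```

A value of 1 means the model ranks every default ahead of every non-default; a value near 0 means it ranks no better than a random ordering.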
Powerstat (Gini Index):
Validation
Once the probability of default for each client is found, the
question is how to choose the cut-off level that classifies a client
as a default or not.
We use the validation data to predict with our model which
observations will default and compare them with those that
really did default.
Repeating the process with several random samples, the resulting
probability shows very little variation and is around 0.77.
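One way to carry out this comparison is sketched below in Python: clients at or above a cut-off are classified as defaults and the predictions are counted against the observed outcomes. The cut-off value, the data and the use of the share of correct classifications as the summary figure are assumptions; the slides do not say which measure was used.

```python
def classify(pd_scores, cutoff):
    """Flag a client as a predicted default when the probability reaches the cut-off."""
    return [1 if p >= cutoff else 0 for p in pd_scores]

def confusion_counts(predicted, actual):
    """Count the four combinations of predicted and observed outcome."""
    counts = {"true_default": 0, "false_default": 0, "true_good": 0, "false_good": 0}
    for pred, obs in zip(predicted, actual):
        if pred == 1:
            counts["true_default" if obs == 1 else "false_default"] += 1
        else:
            counts["true_good" if obs == 0 else "false_good"] += 1
    return counts

def hit_rate(counts):
    """Share of validation clients classified correctly."""
    correct = counts["true_default"] + counts["true_good"]
    return correct / sum(counts.values())

scores = [0.9, 0.6, 0.4, 0.2, 0.1]   # hypothetical model probabilities
observed = [1, 0, 1, 0, 0]           # hypothetical observed defaults
counts = confusion_counts(classify(scores, cutoff=0.5), observed)
print(counts, hit_rate(counts))
```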
Calibration
The Expected Loss is defined as:
EL = PD * EAD * LGD
PD is the probability of default,
calibrated to a one-year horizon.
EAD is the exposure at default.
LGD is the loss given default, i.e. the fraction of the exposure that is lost.
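A short worked example of the formula in Python, with made-up figures: a one-year PD of 2%, an exposure of 10,000 and an LGD of 45%.

```python
def expected_loss(pd, ead, lgd):
    """EL = PD * EAD * LGD for a single exposure."""
    return pd * ead * lgd

# Made-up figures for illustration.
el = expected_loss(pd=0.02, ead=10_000, lgd=0.45)
print(el)  # about 90 monetary units

# For several loans, the expected losses simply add up.
portfolio = [(0.02, 10_000, 0.45), (0.05, 4_000, 0.60)]
print(sum(expected_loss(*loan) for loan in portfolio))
```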
Scoring allows us to rank clients by their risk of default.
However, these probabilities do not take into account when the
default happens.
This is the reason for calibration.
We want to obtain the yearly average probability of default.
We need a sample of clients observed over yearly periods.
The model is applied and the sample is sorted by score.
We obtain an observed default rate:
Rate A (C score) B
Minimizing the Least Squares Error with the MATLAB function
fminsearch, we obtain the values:
A=0.0004
B=3.7410
C=2.7870
The Credit Scoring Model was solved quickly
and did not cause much difficulty.
We asked Accenture to bring another, related
problem.
We now introduce the Problem of Capital
Allocation.
The Problem of Capital Allocation
Index
The Problem of Capital Allocation
Implementation
Conclusions
In this problem a lender has a fixed amount of money, EAD, to lend
among n blocks of similar customers.
How should the money be distributed among the blocks to maximise
the profit?
Each block i has an associated interest rate ρi, an a priori
probability of default PDi, a loss given default LGDi and a
number of customers Ni.
If each customer in each block is independent of the rest then
we can easily compute the probability of k defaults.
But the customers are correlated via the economy. We can use a
Gaussian copula to introduce a default random variable for each
customer:
$$Z_i = \sum_{j=1}^{m} a_{ij} Y_j + r_i w_i$$
$$\Phi(Z_i) < PD_i \;\Rightarrow\; \text{Default}$$
Then, for a particular state of the economy, customers default
independently, each with probability:
$$p_i = \Phi\left( \frac{\Phi^{-1}(PD_i) - \sum_{j=1}^{m} a_{ij} Y_j}{r_i} \right)$$
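The conditional probability of default can be sketched in Python as below, reading the formula as p_i = Φ((Φ^{-1}(PD_i) - Σ_j a_ij Y_j) / r_i). The example assumes one economic factor and r_i = sqrt(1 - a_i1^2), a common choice that gives Z_i unit variance but that the slides do not state. All numbers are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1

def conditional_pd(pd, loadings, factors, r):
    """Probability of default given the state of the economy (factor values)."""
    systematic = sum(a * y for a, y in zip(loadings, factors))
    threshold = STD_NORMAL.inv_cdf(pd)  # default when Z_i falls below this
    return STD_NORMAL.cdf((threshold - systematic) / r)

a = 0.3                      # hypothetical loading on a single economic factor
r = math.sqrt(1 - a ** 2)    # assumed so that Z_i has unit variance
bad_economy = conditional_pd(0.02, [a], [-2.0], r)
good_economy = conditional_pd(0.02, [a], [2.0], r)
print(bad_economy, good_economy)
```

With a positive loading, a negative factor value (a bad state of the economy) raises the probability of default above the a priori PD, and a positive value lowers it.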
We use the binomial distribution:
$$P(k \text{ defaults}) = \binom{N_i}{k} p_i^{k} (1 - p_i)^{N_i - k}$$
When N_i is big enough (of the order of 10^3) we can approximate this
binomial with a normal random variable D_i:
$$D_i \sim N\big(N_i p_i,\; N_i p_i (1 - p_i)\big)$$
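A small Python check of this approximation, with hypothetical values N = 1000 and p = 0.05. It compares the exact binomial probability of at most 60 defaults with the normal approximation; the continuity correction of 0.5 is an addition made here and is not mentioned in the slides.

```python
import math
from statistics import NormalDist

def binomial_pmf(k, n, p):
    """Exact probability of k defaults among n independent customers."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def normal_approximation(n, p):
    """Normal distribution with the same mean and variance as the binomial."""
    return NormalDist(mu=n * p, sigma=math.sqrt(n * p * (1 - p)))

n, p = 1000, 0.05  # hypothetical block size and default probability
exact_cdf = sum(binomial_pmf(k, n, p) for k in range(61))   # P(at most 60 defaults)
approx_cdf = normal_approximation(n, p).cdf(60.5)           # with continuity correction
print(exact_cdf, approx_cdf)
```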
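We define the loss distribution as:
$$L = \sum_{i=1}^{n} \frac{\alpha_i \, EAD}{N_i} \big( LGD_i D_i - (N_i - D_i)\,\rho_i \big)$$
where α_i is the fraction of EAD lent to block i.
As L is a sum of independent normal random variables,
$$L \sim N(\mu_L, \sigma_L^2)$$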
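The mean and standard deviation of L follow from the normal approximation of D_i. The Python sketch below assumes the loss is read as L = Σ_i (α_i EAD / N_i) (LGD_i D_i - (N_i - D_i) ρ_i), i.e. losses on defaulted customers minus interest earned on the rest, with D_i ~ N(N_i p_i, N_i p_i (1 - p_i)) independent across blocks for a given state of the economy. The dictionary keys and all figures are hypothetical.

```python
import math

def loss_mean_and_std(alphas, ead, blocks):
    """Mean and standard deviation of L under the normal approximation of D_i."""
    mu = 0.0
    variance = 0.0
    for alpha, b in zip(alphas, blocks):
        per_customer = alpha * ead / b["N"]   # amount lent to each customer
        # L_i = per_customer * ((LGD + rho) * D_i - N * rho) after expanding.
        slope = b["LGD"] + b["rho"]
        mu += per_customer * (slope * b["N"] * b["p"] - b["N"] * b["rho"])
        variance += (per_customer * slope) ** 2 * b["N"] * b["p"] * (1 - b["p"])
    return mu, math.sqrt(variance)

# Hypothetical blocks: size N, conditional default probability p, LGD, interest rate rho.
blocks = [
    {"N": 1000, "p": 0.02, "LGD": 0.5, "rho": 0.05},
    {"N": 2000, "p": 0.05, "LGD": 0.4, "rho": 0.08},
]
mu_L, sigma_L = loss_mean_and_std([0.6, 0.4], ead=1_000_000, blocks=blocks)
print(mu_L, sigma_L)
```

A negative mean loss corresponds to an expected profit.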
To measure risk we use Value at Risk (VaR) with a 99%
confidence level. So the problem becomes:
Minimise
$$f(\alpha) = \mu_L$$
Subject to:
$$\sum_{i=1}^{n} \alpha_i = 1$$
$$-2.3262\,\sigma_L + \mu_L = VaR_{99}$$
Where VaR99 is the fixed level of risk the lender is willing to take.
Implementation
We start with 3 blocks to make the problem easier.
We have to find the α’s that minimise the expected loss.
We have two approaches to solve this problem.
First we fix α’s and find the VaR99 and Expected Loss for
each set of α’s (Black dots).
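The sets of α's for this first approach can be enumerated on a regular grid, as in the Python sketch below. The grid step and the function name simplex_grid are assumptions; each allocation would then be passed to whatever routine computes VaR99 and Expected Loss.

```python
def simplex_grid(n_blocks, steps):
    """Yield every allocation whose entries are multiples of 1/steps and sum to 1."""
    def compositions(remaining, parts):
        # All ways to write `remaining` as an ordered sum of `parts` non-negative integers.
        if parts == 1:
            yield (remaining,)
            return
        for first in range(remaining + 1):
            for rest in compositions(remaining - first, parts - 1):
                yield (first,) + rest
    for counts in compositions(steps, n_blocks):
        yield tuple(c / steps for c in counts)

grid = list(simplex_grid(n_blocks=3, steps=10))
print(len(grid), grid[:3])
```

The number of grid points grows quickly with the number of blocks, so an exhaustive sweep of this kind is practical only for a few blocks.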
Then we find the α’s that minimise the Expected Loss for any fixed
VaR99 (Red Dots) using the MATLAB function fmincon.
As we can see we got very good agreement between the two
approaches, on the order of 10^(−4).
Here we have the results for 5 blocks, which took considerably
longer than with 3 blocks.
Conclusions
The analytical method outperformed the simulation of wi, as
expected.
To optimise for more than 3 blocks, the choice of optimiser
needs to be investigated further.
Another interesting question is the relationship
between the efficient frontier and the interest rates charged
for each block.
Questions?