Transcript Document

© Deloitte Consulting, 2004
Predictive Modeling for Property-Casualty Insurance
James Guszcza, FCAS, MAAA
Peter Wu, FCAS, MAAA
SoCal Actuarial Club
LAX
September 22, 2004
Predictive Modeling: 3 Levels of Discussion

Strategy
- Profitable growth
- Retain most profitable policyholders

Methodology
- Model design (actuarial)
- Modeling process

Technique
- GLM vs. decision trees vs. neural nets…
Methodology vs Technique

How does data mining need actuarial science?
- Variable creation
- Model design
- Model evaluation

How does actuarial science need data mining?
- Advances in computing, modeling techniques
- Ideas from other fields can be applied to insurance problems
Semantics: DM vs PM

One connotation: Data Mining (DM) is about knowledge discovery in large industrial databases
- Data exploration techniques (some brute force)
- e.g. discover strength of credit variables

Predictive Modeling (PM) applies statistical techniques (like regression) after the knowledge discovery phase is completed
- Quantify & synthesize relationships found during knowledge discovery
- e.g. build a credit model
Strategy: Why do Data Mining?
Think Baseball!
Bay Area Baseball

In 1999 Billy Beane (general manager of the Oakland Athletics) found a novel use of data mining.
- Not a wealthy team
- Ranked 12th (out of 14) in payroll
- How to compete with rich teams?

Beane hired a statistics whiz to analyze statistics advocated by baseball guru Bill James.
- Beane was able to hire excellent players undervalued by the market.
- A year after Beane took over, the A’s ranked 2nd!
Implication

Beane quantified how well a player would do.
- Not perfectly, just better than his peers

Implication:
- Be on the lookout for fields where an expert is required to reach a decision based on judgmentally synthesizing quantifiable information across many dimensions.
- (Sound like insurance underwriting?)
- Maybe a predictive model can beat the pro.
Example

Who is worse?... And by how much?
- 20 y.o. driver with 1 minor violation who pays his bills on time and was written by your best agent
- Mature driver with a recent accident who has paid his bills late a few times

Unlike the human, the algorithm knows how much weight to give each dimension…
Classic PM strategy: build underwriting models to achieve profitable growth.
Keeping Score

Baseball                          Insurance
Billy Beane                       CEO who wants to run the next Progressive
Beane’s Scouts                    Underwriter
Potential Team Member             Potential Insured
Bill James’ stats                 Predictive variables – old or new (e.g. credit)
Billy Beane’s number cruncher     You! (or people on your team)
What is Predictive Modeling?
Three Concepts

Scoring engines
- A “predictive model” by any other name…

Lift curves
- How much worse than average are the policies with the worst scores?

Out-of-sample tests
- How well will the model work in the real world?
- Unbiased estimate of predictive power
Classic Application: Scoring Engines

Scoring engine: formula that classifies or separates policies (or risks, accounts, agents…) into
- profitable vs. unprofitable
- retaining vs. non-retaining…

A (non-)linear equation f( ) of several predictive variables produces a continuous range of scores:
score = f(X1, X2, …, XN)
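A minimal sketch of such a scoring engine, assuming a simple linear f( ); the variable names and weights below are illustrative placeholders, not from the talk:

```python
# A minimal sketch of a linear scoring engine: score = f(X1, X2, ..., XN).
# The variables and weights are hypothetical placeholders.

def score(policy: dict) -> float:
    """Combine several predictive variables into one continuous score."""
    weights = {
        "driver_age": -0.02,    # illustrative weight
        "prior_claims": 0.35,   # illustrative weight
        "late_payments": 0.15,  # illustrative weight
    }
    intercept = 1.0
    return intercept + sum(w * policy[x] for x, w in weights.items())

print(score({"driver_age": 20, "prior_claims": 1, "late_payments": 0}))
```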
What “Powers” a Scoring Engine?

Scoring Engine: score = f(X1, X2, …, XN)
- The X1, X2,…, XN are as important as the f( )!
- Why actuarial expertise is necessary
- A large part of the modeling process consists of variable creation and selection
- Usually possible to generate 100’s of variables
- Steepest part of the learning curve
Model Evaluation: Lift Curves

- Sort data by score
- Break the dataset into 10 equal pieces
  - Best “decile”: lowest score → lowest LR
  - Worst “decile”: highest score → highest LR
  - Difference: “Lift”
- Lift = segmentation power
- Lift → ROI of the modeling project
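A minimal sketch of this decile calculation, assuming a pandas DataFrame with hypothetical "score" and "loss_ratio" columns:

```python
# A minimal sketch of decile lift, assuming a pandas DataFrame with
# hypothetical "score" and "loss_ratio" columns.
import pandas as pd

def decile_lift(df: pd.DataFrame) -> float:
    """Cut the book into 10 score deciles and compare loss ratios."""
    deciles = pd.qcut(df["score"], 10, labels=False, duplicates="drop")
    lr_by_decile = df.groupby(deciles)["loss_ratio"].mean()
    # Lift = gap between the worst (highest-score) and best deciles.
    return lr_by_decile.iloc[-1] - lr_by_decile.iloc[0]
```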
Out-of-Sample Testing

- Randomly divide data into 3 pieces
  - Training data, Test data, Validation data
- Use Training data to fit models
- Score the Test data to create a lift curve
  - Perform the train/test steps iteratively until you have a model you’re happy with
  - During this iterative phase, validation data is set aside in a “lock box”
- Once the model has been finalized, score the Validation data and produce a lift curve
  - Unbiased estimate of future performance
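A minimal sketch of the three-way split; the 60/20/20 proportions are an assumption, as the talk does not specify them:

```python
# A minimal sketch of the train/test/validation split described above.
# The 60/20/20 proportions are an assumption for illustration.
import numpy as np
import pandas as pd

def three_way_split(df: pd.DataFrame, seed: int = 0):
    idx = np.random.default_rng(seed).permutation(len(df))
    n_train, n_test = int(0.6 * len(df)), int(0.2 * len(df))
    train = df.iloc[idx[:n_train]]                 # fit candidate models here
    test = df.iloc[idx[n_train:n_train + n_test]]  # iterate lift curves here
    validate = df.iloc[idx[n_train + n_test:]]     # "lock box": score once, at the end
    return train, test, validate
```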
Comparison of Techniques

- Models built to detect whether an email message is really spam
- “Gains charts” from several models
  - Analogous to lift curves
  - Good for binary target
- All techniques work ok!
- Good variable creation at least as important as modeling technique.

[Figure: “Spam Email Detection - Gains Charts.” x-axis: Perc.Total (0.0–1.0); y-axis: Perc.Fraud (0.0–1.0); one curve per model: perfect model, MARS, neural net, decision tree, GLM, regression.]
Credit Scoring is an Example

All of these concepts apply to Credit Scoring
- Knowledge discovery in databases (KDD)
- Scoring engine
- Lift Curve evaluation → translates to LR improvement → ROI
- Blind-test validation

Credit scoring has been the insurance industry’s segue into data mining
Applications Beyond Credit

- The classic: Profitability Scoring Model
  - Underwriting/Pricing applications
- Retention models
- Elasticity models
- Cross-sell models
- Lifetime Value models
- Agent/agency monitoring
- Target marketing
- Fraud detection
- Customer segmentation
  - no target variable (“unsupervised learning”)
Data Sources

Company’s internal data
- Policy-level records
- Loss & premium transactions
- Agent database
- Billing
- VIN…

Externally purchased data
- Credit
- CLUE
- MVR
- Census
- …
The Predictive Modeling Process
Early: Variable Creation
Middle: Data Exploration & Modeling
Late: Analysis & Implementation
Variable Creation

- Research possible data sources
- Extract/purchase data
- Check data for quality (QA)
  - Messy! (still deep in the mines)
- Create Predictive and Target Variables
  - Opportunity to quantify tribal wisdom
  - …and come up with new ideas
  - Can be a very big task!
- Steepest part of the learning curve
Types of Predictive Variables

- Behavioral
  - Historical claim, billing, credit …
- Policyholder
  - Age/Gender, # employees …
- Policy specifics
  - Vehicle age, Construction Type …
- Territorial
  - Census, Weather …
Data Exploration & Variable Transformation

- 1-way analyses of predictive variables
- Exploratory Data Analysis (EDA)
- Data Visualization
- Use EDA to cap / transform predictive variables
  - Extreme values
  - Missing values
  - …etc
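A minimal sketch of this kind of EDA-driven transformation; the 99th-percentile cap and median fill are assumptions for illustration:

```python
# A minimal sketch of capping extreme values and filling missing values.
# The 99th-percentile cap and median fill are illustrative choices.
import pandas as pd

def cap_and_fill(s: pd.Series) -> pd.Series:
    """Transform one predictive variable based on its 1-way distribution."""
    cap = s.quantile(0.99)  # cap picked by inspecting the EDA
    return s.clip(upper=cap).fillna(s.median())
```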
Multivariate Modeling

- Examine correlations among the variables
- Weed out redundant, weak, poorly distributed variables
- Model design
- Build candidate models
  - Regression/GLM
  - Decision Trees/MARS
  - Neural Networks
- Select final model
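A minimal sketch of the correlation-based weeding step; the 0.9 cutoff is an assumption, in practice a judgment call:

```python
# A minimal sketch of weeding out redundant variables by pairwise correlation.
# The 0.9 cutoff is an assumed threshold, not from the talk.
import pandas as pd

def drop_redundant(X: pd.DataFrame, cutoff: float = 0.9) -> pd.DataFrame:
    corr = X.corr().abs()
    kept = []
    for col in X.columns:
        # Keep a variable only if it is not highly correlated with one already kept.
        if all(corr.loc[col, k] < cutoff for k in kept):
            kept.append(col)
    return X[kept]
```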
Building the Model

1. Pare down the collection of predictive variables to a manageable set
2. Iterative process
   - Build candidate models on “training data”
   - Evaluate on “test data”
   - Many things to tweak
     - Different target variables
     - Different predictive variables
     - Different modeling techniques
     - # NN nodes, hidden layers; tree splitting rules…
Considerations

- Do signs/magnitudes of parameters make sense? Statistically significant?
- Is the model biased for/against certain types of policies? States? Policy sizes? ...
- Does predictive power hold up for large policies?
- Continuity
  - Are there small changes in input values that might produce large swings in scores?
  - Make sure that an agent can’t game the system (see the sketch below)
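A minimal sketch of such a continuity check, reusing the hypothetical score() function from the scoring-engine sketch above; the tolerance is an assumption:

```python
# A minimal sketch of a continuity check: bump one input slightly and
# confirm the score does not swing wildly. Reuses the hypothetical
# score() function sketched earlier; the tolerance is an assumption.

def continuity_check(policy: dict, field: str, delta: float, tol: float) -> bool:
    bumped = dict(policy, **{field: policy[field] + delta})
    return abs(score(bumped) - score(policy)) <= tol
```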
Model Analysis & Implementation

- Perform model analytics
  - Necessary for client to gain comfort with the model
- Calibrate models
  - Create user-friendly “scale” – client dictates
- Implement models
  - Programming skills are critical here
- Monitor performance
  - Distribution of scores over time, predictiveness, usage of model...
  - Plan model maintenance
Modeling Techniques
Where Actuarial Science Needs Data Mining
The Greatest Hits

Unsupervised: no target variable
- Clustering
- Principal Components (dimension reduction)

Supervised: predict a target variable
- Regression → GLM
- Neural Networks
- MARS: Multivariate Adaptive Regression Splines
- CART: Classification And Regression Trees
Regression and its Relations

GLM: relax regression’s distributional assumptions
- Logistic regression (binary target)
- Poisson regression (count target)

MARS & NN
- Clever ways of automatically transforming and interacting input variables
- Why: sometimes “true” relationships aren’t linear
- Universal approximators: model any functional form

CART is simplified MARS
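A minimal sketch of the two GLM flavors named above, fit with statsmodels on a tiny synthetic dataset; all column names and data are illustrative:

```python
# A minimal sketch of logistic and Poisson regression via statsmodels.
# The dataset is synthetic and the column names are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "driver_age": rng.integers(18, 80, 500),
    "prior_claims": rng.poisson(0.3, 500),
    "claim_count": rng.poisson(0.2, 500),
})
df["had_claim"] = (df["claim_count"] > 0).astype(int)

# Logistic regression: binary target
logit = smf.glm("had_claim ~ driver_age + prior_claims", data=df,
                family=sm.families.Binomial()).fit()

# Poisson regression: count target
poisson = smf.glm("claim_count ~ driver_age + prior_claims", data=df,
                  family=sm.families.Poisson()).fit()
```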
Neural Net Motivation

Let X1, X2, X3 be three predictive variables
- policy age, historical LR, driver age

Let Y be the target variable
- loss ratio

A NNET model is a complicated, non-linear function φ such that:
φ(X1, X2, X3) ≈ Y
In visual terms…

[Diagram: a network with inputs X1, X2, X3 (plus a bias node 1), two hidden nodes Z1 and Z2 reached via weights a01, a02, a11, a12, a21, a22, a31, a32, and an output Y reached via weights b0, b1, b2.]
NNET lingo

- Green: “input layer” (X1, X2, X3)
- Red: “hidden layer” (Z1, Z2)
- Yellow: “output layer” (Y)
- The {a, b} numbers are “weights” to be estimated.
- The network architecture and the weights constitute the model.

[Same network diagram as before.]
In more detail…

Z1 = 1 / (1 + exp(-(a01 + a11·X1 + a21·X2 + a31·X3)))
Z2 = 1 / (1 + exp(-(a02 + a12·X1 + a22·X2 + a32·X3)))
Y  = 1 / (1 + exp(-(b0 + b1·Z1 + b2·Z2)))

[Same network diagram as before.]
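A minimal sketch of this forward pass in code; the weight values are arbitrary placeholders:

```python
# A minimal sketch of the forward pass written out above: each hidden
# node is a logistic function of the inputs, and Y is a logistic
# function of the hidden nodes. The weight values are placeholders.
import math

def sigmoid(t: float) -> float:
    return 1.0 / (1.0 + math.exp(-t))

def nnet(x1, x2, x3, a, b):
    z1 = sigmoid(a["01"] + a["11"] * x1 + a["21"] * x2 + a["31"] * x3)
    z2 = sigmoid(a["02"] + a["12"] * x1 + a["22"] * x2 + a["32"] * x3)
    return sigmoid(b[0] + b[1] * z1 + b[2] * z2)

# Example call with arbitrary weights:
a = {"01": 0.1, "11": 0.5, "21": -0.3, "31": 0.2,
     "02": -0.2, "12": 0.1, "22": 0.4, "32": -0.1}
print(nnet(1.0, 0.5, 2.0, a, b=[0.0, 1.0, -1.0]))
```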
In more detail…

The NNET model results from substituting the expressions for Z1 and Z2 into the expression for Y.

[Same equations and diagram as before.]
In more detail…

Notice that the expression for Y has the form of a logistic regression. Similarly with Z1 and Z2.

[Same equations and diagram as before.]
In more detail…

You can therefore think of a NNET as a set of logistic regressions embedded in another logistic regression.

[Same equations and diagram as before.]
Universal Approximators

The essential idea: by layering several logistic regressions in this way…
…we can model any functional form
- no matter how many non-linearities or interactions between variables X1, X2,…
- by varying # of nodes and training cycles only

NNETs are sometimes called “universal function approximators”.
MARS / CART Motivation

- NNETs use the logistic function to combine variables and automatically model any functional form
- MARS uses an analogous clever idea to do the same work
  - MARS “basis functions”
- CART can be viewed as simplified MARS
  - Basis functions are horizontal step functions (see the sketch below)
- NNETs, MARS, and CART are all cousins of classic regression analysis
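A minimal sketch contrasting the two basis functions: MARS uses hinge functions, while a CART split acts like a horizontal step. The knot value is an arbitrary placeholder:

```python
# A minimal sketch of the basis functions behind MARS and CART.
# The knot value 40.0 is an arbitrary placeholder.

def mars_hinge(x: float, knot: float = 40.0) -> float:
    """MARS basis function: a hinge, max(0, x - knot)."""
    return max(0.0, x - knot)

def cart_step(x: float, knot: float = 40.0) -> float:
    """CART-style basis function: a horizontal 0/1 step at the knot."""
    return 1.0 if x > knot else 0.0
```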
Reference

For Beginners:
Data Mining Techniques -- Michael Berry & Gordon Linoff

For Mavens:
The Elements of Statistical Learning -- Jerome Friedman, Trevor Hastie, Robert Tibshirani