
Predictive Modeling
Spring 2005 CAMAR meeting
Louise Francis, FCAS, MAAA
Francis Analytics and Actuarial Data Mining, Inc
www.data-mines.com
Objectives
• Introduce predictive modeling
• Why use it?
• Describe some methods in depth
  – Trees
  – Neural networks
  – Clustering
• Apply to fraud data
2
Predictive Modeling Family
[Diagram: Predictive Modeling branches into Classical Linear Models, GLMs, and Data Mining]
3
Why Predictive Modeling?
• Better use of insurance data
• Advanced methods for dealing with messy data now available
4
Major Kinds of Modeling
• Supervised learning
  – Most common situation
  – A dependent variable (e.g., frequency, loss ratio, fraud/no fraud)
  – Some methods: regression, CART, some neural networks
• Unsupervised learning
  – No dependent variable
  – Group like records together (a group of claims with similar characteristics might be more likely to be fraudulent)
  – Some methods: association rules, k-means clustering, Kohonen neural networks
5
Two Big Specialties in Predictive Modeling
• GLMs: regression, logistic regression, Poisson regression
• Data Mining: trees, neural networks, clustering
6
Modeling Process
[Flowchart: Internal Data and External Data → Data Cleaning → Other Preprocessing → Build Model → Validate Model → Test Model → Deploy Model]
7
Data Complexities Affecting Insurance Data
• Nonlinear functions
• Interactions
• Missing data
• Correlations
• Non-normal data
8
Kinds of Applications
• Classification
• Prediction
9
The Fraud Study Data
• 1993 Automobile Insurers Bureau closed Personal Injury Protection claims
• Dependent variables
  – Suspicion score
    • Expert assessment of likelihood of fraud or abuse
    • Number from 0 to 10
    • 5 categories
    • Used to create a binary indicator
• Predictor variables
  – Red flag indicators
  – Claim file variables
10
Introduction of Two Methods
• Trees
  – Sometimes known as CART (Classification and Regression Trees)
• Neural networks
  – Will introduce the backpropagation neural network
11
Decision Trees
• Recursively partitions the data
• Often sequentially bifurcates the data – but can split into more groups
• Applies a goodness-of-fit statistic to select the best partition at each step
• Selects the partition which results in the largest improvement to the goodness-of-fit statistic (see the sketch below)
12
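As a concrete illustration of the recursive partitioning described above, here is a minimal sketch using scikit-learn's DecisionTreeClassifier; the data and column names are synthetic and purely illustrative, not taken from the study.

```python
# Minimal sketch of CART-style recursive partitioning with scikit-learn.
# The data and column names below are synthetic and purely illustrative.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
n = 1000
provider_bill = rng.gamma(shape=2.0, scale=1500.0, size=n)    # hypothetical bill amounts
legal_rep = rng.integers(0, 2, size=n)                        # 1 = attorney involved
# Synthetic fraud indicator loosely related to the two predictors
fraud = (rng.random(n) < 0.10 + 0.30 * legal_rep + 0.10 * (provider_bill > 2000)).astype(int)

X = np.column_stack([provider_bill, legal_rep])
tree = DecisionTreeClassifier(criterion="gini", max_depth=3, min_samples_leaf=50)
tree.fit(X, fraud)

# Each split is the binary partition giving the largest improvement in the criterion
print(export_text(tree, feature_names=["provider_bill", "legal_rep"]))
```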
Goodness of Fit Statistics
• Chi-square (CHAID) (Fish, Gallagher, Monroe – Discussion Paper Program, 1990)

  \chi^2 = \sum_{i,k} \frac{(\mathrm{Observed}_{ik} - \mathrm{Expected}_{ik})^2}{\mathrm{Expected}_{ik}}

• Deviance (CART)

  D_i = -2 \sum_k n_{ik} \log(p_{ik}) \quad \text{(categorical)}

  D = \sum_{\text{cases } j} (y_j - \mu_j)^2 \quad \text{(or RSS for continuous variables)}
13
Goodness of Fit Statistics
• Gini measure (CART)
  – i is the impurity measure

  i = 1 - \sum_k p_k^2

  \Delta i(t, s) = i(t) - p_L \, i(t_L) - p_R \, i(t_R)
14
Goodness of Fit Statistics
• Entropy (C4.5)

  I(E) = -\log_2\!\left(\frac{E}{N}\right) = -\log_2(p_E)

  H = -\sum_k p_k \log_2(p_k)
15
An Illustration from Fraud Data: Gini Measure

Legal Representation   Fraud: No   Fraud: Yes   Total
No                           626           80     706
Yes                          269          425     694
Total                        895          505    1400
Percent                      64%          36%
16
First Split
[Tree diagram: All claims, p(fraud) = 0.36; split on legal representation: Legal Rep = No, p(fraud) = 0.113; Legal Rep = Yes, p(fraud) = 0.612]
17
Example (cont.)

Root node Gini impurity: 0.461199

Legal   Fraud: No   Fraud: Yes   1 - Σ p(i)^2   Row %
No          0.887        0.113          0.201   50.4%
Yes         0.388        0.612          0.475   49.6%

Weighted impurity after the split: 0.337 = 0.201 × 0.504 + 0.475 × 0.496
Improvement = 0.461 - 0.337 = 0.124
18
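The Gini improvement worked out above can be checked with a few lines of Python; this sketch uses only the class counts from the contingency table on the earlier slide.

```python
# Sketch reproducing the Gini improvement calculation from the 2x2 table above.
def gini(counts):
    """Gini impurity 1 - sum(p_k^2) for a list of class counts."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

root = gini([895, 505])      # all claims: about 0.461
no_rep = gini([626, 80])     # legal rep = No: about 0.201
yes_rep = gini([269, 425])   # legal rep = Yes: about 0.475

weighted = (706 / 1400) * no_rep + (694 / 1400) * yes_rep   # about 0.337
improvement = root - weighted                                # about 0.124
print(round(root, 3), round(weighted, 3), round(improvement, 3))
```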
Example of Nonlinear Function
Suspicion Score vs. 1st Provider Bill
[Scatter plot: Neural Network Fit of SUSPICION vs Provider Bill; y-axis: netfraud1 (predicted suspicion, 0.00-4.00), x-axis: Provider Bill (1000-7000)]
19
An Approach to Nonlinear Functions: Fit a Tree
[Tree diagram (R output): splits on mp1.bill at 153, 842.5, 1279.5, and 2389; fitted suspicion scores at the leaves: 0.3387, 1.2850, 2.2550, 3.6430, 4.4270]
20
Fitted Curve From Tree
[Plot: Fraud Score Prediction (y-axis, 0-4) vs Provider Bill (x-axis, 0-15,000), showing the step-shaped fitted curve from the tree]
21
Neural Networks
• Developed by artificial intelligence experts – but now also used by statisticians
• Based on how neurons function in the brain
22
Neural Networks
• Fit by minimizing the squared deviation between fitted and actual values
• Can be viewed as a non-parametric, nonlinear regression
• Often thought of as a “black box”
  – Due to the complexity of the fitted model, it is difficult to understand the relationship between the dependent and predictor variables
23
The Backpropagation Neural Network
[Diagram: three-layer neural network – Input Layer (input data) → Hidden Layer (processes data) → Output Layer (predicted value)]
24
Neural Network
• Fits a nonlinear function at each node of each layer

  h = f(X; w_0, w_1, \ldots, w_n) = f(w_0 + w_1 x_1 + \cdots + w_n x_n) = \frac{1}{1 + e^{-(w_0 + w_1 x_1 + \cdots + w_n x_n)}}
25
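A minimal sketch of the logistic node above, plus a one-hidden-layer network fit with scikit-learn's MLPRegressor (logistic activation). The data are synthetic and only illustrate fitting a nonlinear curve such as suspicion score versus provider bill.

```python
# Sketch: logistic node activation and a one-hidden-layer (backpropagation-style)
# network fit with scikit-learn. Data are synthetic and purely illustrative.
import numpy as np
from sklearn.neural_network import MLPRegressor

def node(x, w0, w):
    """h = 1 / (1 + exp(-(w0 + w.x))) -- the logistic node on the slide."""
    return 1.0 / (1.0 + np.exp(-(w0 + np.dot(w, x))))

rng = np.random.default_rng(1)
bill = rng.uniform(0, 7000, size=(500, 1))
suspicion = 4 / (1 + np.exp(-(bill[:, 0] - 2000) / 800)) + rng.normal(0, 0.3, 500)

net = MLPRegressor(hidden_layer_sizes=(5,), activation="logistic",
                   solver="lbfgs", max_iter=5000, random_state=0)
net.fit(bill / 7000.0, suspicion)        # scale the input to help the fit
print(net.predict(np.array([[1000, 3000, 5000]]).T / 7000.0))
```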
The Logistic Function
[Plot: logistic function for various values of w1 (w1 = -10, -5, -1, 1, 5, 10); y-axis 0.0-1.0, x-axis X from -1.2 to 0.8]
26
Universal Function Approximator
• The backpropagation neural network with one hidden layer is a universal function approximator
• Theoretically, with a sufficient number of nodes in the hidden layer, any continuous nonlinear function can be approximated
27
Nonlinear Function Fit by Neural Network
[Plot: Neural Network Fit of SUSPICION vs Provider Bill; y-axis: netfraud1 (predicted suspicion, 0.00-4.00), x-axis: Provider Bill (1000-7000)]
28
Interactions
• Functional relationship between a predictor variable and a dependent variable depends on the value of another variable(s)
[Panel plot: Neural Network Predicted value vs Provider Bill (3000-18,000) by injury type (inj.type 01-05); y-axis: Neural Net Predicted (0.00-6.00)]
29
Interactions
• Neural networks
  – The hidden nodes play a key role in modeling the interactions
• CART partitions the data
  – Partitions capture the interactions
30
Simple Tree of Injury and Provider Bill
[Tree diagram (R output): splits on mp1.bill (at 153, 1279.5, 2017.5, and 2675.5) and on injtype; fitted suspicion scores at the leaves: 0.14, 0.30, 0.68, 1.00, 2.10, 3.20, 3.70, 4.20, 4.80]
31
[Panel plot: tree-fitted response vs mp1.bill (4000-16,000) by injtype (1, 2, 4, 5, 6, 7, 8, 10, 99); y-axis: response]
32
Missing Data
• Occurs frequently in insurance data
• There are some sophisticated methods for addressing this (e.g., the EM algorithm)
• CART finds surrogates for variables with missing values
• Neural networks have no explicit procedure for missing values
33
More Complex Example
• Dependent variable: expert’s assessment of the likelihood the claim is legitimate
• A classification application
• Predictor variables: a combination of
  – claim file variables (age of claimant, legal representation)
  – red flag variables (injury is strain/sprain only, claimant has a history of previous claims)
• Used an enhancement of CART known as boosting (see the sketch below)
34
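As a loose illustration of boosted trees (scikit-learn's gradient boosting, not necessarily the specific boosting algorithm used in the study), a sketch with made-up claim file and red flag predictors:

```python
# Sketch of boosted classification trees; the algorithm and data here are
# illustrative stand-ins, not the specific boosting method used in the study.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
n = 2000
X = np.column_stack([
    rng.integers(18, 80, n),        # AGE (hypothetical claim file variable)
    rng.integers(0, 2, n),          # LEGALREP (1 = attorney involved)
    rng.integers(0, 2, n),          # INJ01: strain/sprain only
    rng.integers(0, 2, n),          # CLT02: history of previous claims
])
p = 0.10 + 0.25 * X[:, 1] + 0.15 * X[:, 2] + 0.10 * X[:, 3]
y = (rng.random(n) < p).astype(int)     # 1 = suspicious claim

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
boosted = GradientBoostingClassifier(n_estimators=200, max_depth=2, learning_rate=0.05)
boosted.fit(X_train, y_train)
print("holdout accuracy:", boosted.score(X_test, y_test))
```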
Red Flag Predictor Variables

Subject      Indicator   Description
Accident     ACC01       No report by police officer at scene
             ACC04       Single vehicle accident
             ACC09       No plausible explanation for accident
             ACC10       Claimant in old, low valued vehicle
             ACC11       Rental vehicle involved in accident
             ACC14       Property damage was inconsistent with accident
             ACC15       Very minor impact collision
             ACC16       Claimant vehicle stopped short
             ACC19       Insured felt set up, denied fault
Claimant     CLT02       Had a history of previous claims
             CLT04       Was an out of state accident
             CLT07       Was one of three or more claimants in vehicle
Injury       INJ01       Injury consisted of strain or sprain only
             INJ02       No objective evidence of injury
             INJ03       Police report showed no injury or pain
             INJ05       No emergency treatment was given
             INJ06       Non-emergency treatment was delayed
             INJ11       Unusual injury for auto accident
Insured      INS01       Had history of previous claims
             INS03       Readily accepted fault for accident
             INS06       Was difficult to contact/uncooperative
             INS07       Accident occurred soon after effective date
Lost Wages   LW01        Claimant worked for self or a family member
             LW03        Claimant recently started employment
35
Claim File Variables (claim variables available early in the life of the claim)

Variable   Description
AGE        Age of claimant
RPTLAG     Lag from date of accident to date reported
TREATLAG   Lag from date of accident to earliest treatment by service provider
AMBUL      Ambulance charges
PARTDIS    The claimant partially disabled
TOTDIS     The claimant totally disabled
LEGALREP   The claimant represented by an attorney
36
Neural Network Measure of Variable Importance
• Look at the weights to the hidden layer
• Compute sensitivities:
  – a measure of how much the predicted value’s error increases when the variables are excluded from the model one at a time (see the sketch below)
37
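A minimal sketch of this kind of sensitivity calculation, implemented here by dropping one variable at a time and refitting; the data, variable names, and exact error measure are illustrative assumptions.

```python
# Sketch of a drop-one-variable sensitivity measure for a neural network.
# Data, names, and the exact sensitivity definition are illustrative assumptions.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(3)
n, names = 800, ["LEGALREP", "TRTLAG", "AGE", "AMBUL"]
X = rng.random((n, len(names)))
y = 3 * X[:, 0] + 1.5 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(0, 0.2, n)

def fit_error(Xmat):
    net = MLPRegressor(hidden_layer_sizes=(5,), max_iter=3000, random_state=0)
    return mean_squared_error(y, net.fit(Xmat, y).predict(Xmat))

base = fit_error(X)
for j, name in enumerate(names):
    err = fit_error(np.delete(X, j, axis=1))   # exclude one variable and refit
    print(f"{name}: error increase = {err - base:.3f}")
```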
Variable Importance

Rank   Variable   Importance
1      LEGALREP        100.0
2      TRTLAG           69.7
3      AGE              54.5
4      ACC04            44.4
5      INJ01            42.1
6      INJ02            39.4
7      ACC14            35.8
8      RPTLAG           32.4
9      AMBUL            29.3
10     CLT02            23.9
38
Testing: Hold Out Part of Sample
• Fit model on 1/2 to 2/3 of data
• Test fit of model on remaining data
• Need a large sample
39
Testing: Cross-Validation
• Hold out 1/n (say 1/10) of the data
• Fit the model to the remaining data
• Test on the portion of the sample held out
• Do this n (say 10) times and average the results (see the sketch below)
• Used for moderate sample sizes
• Jackknifing is similar to cross-validation
40
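A brief sketch of the procedure with scikit-learn's cross_val_score; the model and data are placeholders.

```python
# Sketch of 10-fold cross-validation; the model and data are placeholders.
import numpy as np
from sklearn.model_selection import cross_val_score, KFold
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(4)
X = rng.random((500, 4))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 0.2, 500) > 0.9).astype(int)

model = DecisionTreeClassifier(max_depth=3)
folds = KFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=folds)   # fit on 9/10, test on 1/10, 10 times
print("average accuracy over 10 folds:", scores.mean().round(3))
```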
Results of Classification on Test Data

Fitted Neural Network
Actual   Fitted 0   Fitted 1
0           81.5%      18.5%
1           26.7%      73.3%

Fitted Tree
Actual   Fitted 0   Fitted 1
0           77.3%      22.7%
1           14.3%      85.7%
41
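Row percentages like those in the tables above come from a confusion matrix of actual versus fitted classes; a brief sketch with made-up labels:

```python
# Sketch: turning a confusion matrix into row percentages like the tables above.
# The actual and fitted labels here are made up for illustration.
import numpy as np
from sklearn.metrics import confusion_matrix

actual = np.array([0, 0, 0, 1, 1, 1, 1, 0, 1, 0])
fitted = np.array([0, 0, 1, 1, 1, 0, 1, 0, 1, 0])

counts = confusion_matrix(actual, fitted)                # rows = actual, cols = fitted
row_pct = counts / counts.sum(axis=1, keepdims=True)     # each row sums to 100%
print(np.round(100 * row_pct, 1))
```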
Unsupervised Learning
• Common method: clustering
• No dependent variable – records are grouped into classes with similar values on the variables
• Start with a measure of similarity or dissimilarity
• Maximize dissimilarity between members of different clusters
42
Dissimilarity (Distance) Measures
• Euclidean distance

  d_{ij} = \left( \sum_{k=1}^{m} (x_{ik} - x_{jk})^2 \right)^{1/2}

• Manhattan distance

  d_{ij} = \sum_{k=1}^{m} |x_{ik} - x_{jk}|

(i, j index the records; k indexes the variables)
43
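A quick sketch of both distance measures in plain NumPy, using two made-up records:

```python
# Sketch: Euclidean and Manhattan distances between two records.
import numpy as np

x_i = np.array([1.0, 0.0, 3.5])   # record i, m = 3 variables
x_j = np.array([2.0, 1.0, 1.5])   # record j

euclidean = np.sqrt(np.sum((x_i - x_j) ** 2))   # (sum of squared differences)^(1/2)
manhattan = np.sum(np.abs(x_i - x_j))           # sum of absolute differences
print(euclidean, manhattan)
```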
Binary Variables

                  Column Variable
Row Variable      1       0       Total
1                 a       b       a+b
0                 c       d       c+d
Total             a+c     b+d
44
Binary Variables
• Simple matching

  d = \frac{b + c}{a + b + c + d}

• Rogers and Tanimoto

  d = \frac{2(b + c)}{(a + d) + 2(b + c)}
45
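A sketch of the two binary dissimilarity measures, computed directly from the 2×2 counts a, b, c, d defined on the previous slide:

```python
# Sketch: binary dissimilarity measures from the 2x2 counts a, b, c, d
# (a = both 1, b = row 1 / column 0, c = row 0 / column 1, d = both 0).
def simple_matching(a, b, c, d):
    return (b + c) / (a + b + c + d)

def rogers_tanimoto(a, b, c, d):
    return 2 * (b + c) / ((a + d) + 2 * (b + c))

# Example with made-up counts for two binary claim indicators
print(simple_matching(30, 10, 5, 55), rogers_tanimoto(30, 10, 5, 55))
```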
Results for 2 Clusters

Cluster   Lawyer   Back Claim or Sprain   Chiro or PT   Prior Claim
1           77%                     73%           56%           26%
2            3%                     29%           14%            1%

Cluster   Suspicious Claim   Average Suspicion Score
1                      56%                      2.99
2                       3%                      0.21
46
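How a two-cluster summary like the one above might be produced is sketched below with k-means on hypothetical binary claim indicators; the column names and data are made up.

```python
# Sketch: k-means with two clusters on hypothetical binary claim indicators,
# then summarizing each cluster by the share of claims with each flag set.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(5)
n = 1000
# Columns: lawyer, back_or_sprain, chiro_or_pt, prior_claim (all made up)
X = (rng.random((n, 4)) < [0.4, 0.5, 0.35, 0.15]).astype(float)

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
for c in (0, 1):
    shares = X[labels == c].mean(axis=0)
    print(f"cluster {c + 1}:", np.round(100 * shares, 0), "% with each flag")
```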
Beginner's Library
• Berry, Michael J. A., and Linoff, Gordon, Data Mining Techniques, John Wiley and Sons, 1997
• Kaufman, Leonard, and Rousseeuw, Peter, Finding Groups in Data, John Wiley and Sons, 1990
• Smith, Murray, Neural Networks for Statistical Modeling, International Thomson Computer Press, 1996
47
Data Mining
CAMAR Spring Meeting
Louise Francis, FCAS, MAAA
[email protected]
www.data-mines.com