Neural Networks Demystified by Louise Francis

Data Mining: Neural Network Applications
by Louise Francis
CAS Convention, Nov 13, 2001
Francis Analytics and Actuarial Data Mining, Inc.
[email protected]
www.francisanalytics.com

Objectives of Presentation
• Introduce insurance professionals to neural networks
• Show that neural networks are a lot like some conventional statistics
• Indicate where use of neural networks might be helpful
• Show practical examples of using neural networks
• Show how to interpret neural network models

Conventional Statistics: Regression
• One of the most common methods is linear regression
• Models a relationship between two variables by fitting a straight line through points

A Common Actuarial Example: Trend Estimation
[Chart: Automobile BI severity by year, 1980-2000]

A Common Actuarial Example: Trend Estimation
[Chart: Automobile BI severity with fitted exponential curve, 1980-2000]

The Severity Trend Model
• $\mathrm{Severity}_t = \mathrm{Constant}\,(1+\mathrm{trend})^{t-t_0}$
• $\log(\mathrm{Severity}) = \mathrm{Constant} + (t-t_0)\log(1+\mathrm{trend}) + \mathrm{error}$

Solution to Regression Problem

$$\min \sum \big(Y - \hat{Y}\big)^2$$

$$B = \frac{\sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{N} (X_i - \bar{X})^2}$$

$$a = \bar{Y} - B\,\bar{X}$$

Neural Networks
• Also minimize the squared deviation between fitted and actual values
• Can be viewed as a non-parametric, non-linear regression

The MLP Neural Network
[Diagram: three-layer neural network: input layer (input data), hidden layer (processes data), output layer (predicted value)]

The Activation Function
• The sigmoid (logistic) function:

$$f(Y) = \frac{1}{1 + e^{-Y}}$$

$$Y = w_0 + w_1 X_1 + w_2 X_2 + \ldots + w_n X_n$$

The Logistic Function
[Chart: logistic function of X for w1 = -10, -5, -1, 1, 5, 10; X from -1.2 to 0.8, f(Y) from 0 to 1]

Simple Trend Example: One Hidden Node
[Diagram: neural network with one hidden node: input layer (input data), hidden layer (processes data), output layer (predicted value)]

Logistic Function of Simple Trend Example

$$h = f(X; w_0, w_1) = f(w_0 + w_1 X) = \frac{1}{1 + e^{-(w_0 + w_1 X)}}$$

$$f\big(f(X; w_0, w_1); w_2, w_3\big) = \frac{1}{1 + e^{-\left(w_2 + w_3 \frac{1}{1 + e^{-(w_0 + w_1 X)}}\right)}}$$

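The nested formula is just a logistic applied to a logistic. A minimal sketch, assuming (as neural network software typically does) that inputs and outputs have been scaled to the unit interval:

```python
import numpy as np

def logistic(y):
    return 1.0 / (1.0 + np.exp(-y))

def one_node_network(x, w0, w1, w2, w3):
    """Prediction of the one-hidden-node network: the output node applies
    the logistic function to a weighted version of the hidden node's output."""
    h = logistic(w0 + w1 * x)      # hidden layer: a single node
    return logistic(w2 + w3 * h)   # output layer
```
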
Fitting the Curve
• Typically use a procedure which minimizes the squared error, like regression does:

$$\min \sum \big(Y - \hat{Y}\big)^2$$

Trend Example: 1 Hidden Node
[Chart: Auto BI severity, actual values vs. exponential and 1-hidden-node neural network fits; time 0.5-15.5]

Trend Example: 2 Hidden Nodes
[Chart: Automobile BI severity, fitted vs. actual values, 2-node network vs. exponential fit; time 0.5-15.5]

Trend Example: 3 Hidden Nodes
[Chart: Auto BI severity, actual vs. fitted values, 3-node network vs. exponential fit; time 0.5-15.5]

Universal Function Approximator
• The backpropagation neural network with one hidden layer is a universal function approximator
• Theoretically, with a sufficient number of nodes in the hidden layer, any nonlinear function can be approximated

How Many Hidden Nodes?
• Too few nodes: don't fit the curve very well
• Too many nodes: overparameterization
  - May fit noise as well as pattern

How Do We Determine the Number of Hidden Nodes?
• Hold out part of the sample
• Cross-validation
• Resampling
  - Bootstrapping
  - Jackknifing
• Algebraic formula

Hold Out Part of Sample
• Fit model on 1/2 to 2/3 of the data (see the sketch below)
• Test fit of model on the remaining data
• Need a large sample

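A minimal sketch of the split; the 2/3 fraction and the random seed are arbitrary choices:

```python
import numpy as np

def holdout_split(n_obs, train_frac=2/3, seed=0):
    """Randomly split n_obs observations into train and test index arrays."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_obs)
    cut = int(train_frac * n_obs)
    return idx[:cut], idx[cut:]
```
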
Cross-Validation
• Hold out 1/n (say 1/10) of the data
• Fit the model to the remaining data
• Test on the portion of the sample held out
• Do this n (say 10) times and average the results (see the sketch below)
• Used for moderate sample sizes
• Jackknifing is similar to cross-validation

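A sketch of the procedure; `fit` and `score` are placeholders for whatever model-fitting and goodness-of-fit routines are in use:

```python
import numpy as np

def cross_validate(x, y, fit, score, n_folds=10, seed=0):
    """Hold out 1/n of the data, fit on the rest, score on the held-out
    fold, and average the n results."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    scores = []
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        model = fit(x[train_idx], y[train_idx])
        scores.append(score(model, x[test_idx], y[test_idx]))
    return np.mean(scores)
```
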
Bootstrapping
• Create many samples by drawing samples, with replacement, from the original data
• Fit the model to each of the samples
• Measure overall goodness of fit and create a distribution of results (see the sketch below)
• Used for small and moderate sample sizes

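A sketch, again with `fit` and `score` as placeholders:

```python
import numpy as np

def bootstrap_scores(x, y, fit, score, n_samples=500, seed=0):
    """Refit the model on bootstrap samples (drawn with replacement) and
    return the resulting distribution of goodness-of-fit scores."""
    rng = np.random.default_rng(seed)
    results = []
    for _ in range(n_samples):
        idx = rng.integers(0, len(y), size=len(y))  # sample with replacement
        model = fit(x[idx], y[idx])
        results.append(score(model, x[idx], y[idx]))
    return np.array(results)
```
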
Jackknife Result
[Chart: 95th percentile of jackknife severity estimates for 1-, 2-, 3-, and 4-node networks; time 0-15]

Result for Sample Hold Out

One-step-ahead results:

Number of Nodes   R²
1                 0.53
2                 0.73
3                 0.74
4                 0.72

Interpreting a Complex Multi-Variable Model
• How many hidden nodes?
• Which variables should the analyst keep?

Measuring Variable Importance
• Look at weights to the hidden layer
• Compute sensitivities:
  - a measure of how much the predicted value's error increases when the variables are excluded from the model one at a time (see the sketch below)

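The paper spells out the exact sensitivity computation; one common proxy for "excluding" a variable, sketched below, is to freeze it at its mean and measure how much the squared error grows:

```python
import numpy as np

def sensitivities(predict, x, y):
    """For each variable, neutralize it (here: replace it by its mean),
    re-predict, and record the increase in squared error. Larger
    increases suggest more important variables."""
    base_sse = np.sum((y - predict(x)) ** 2)
    out = []
    for j in range(x.shape[1]):
        x_frozen = x.copy()
        x_frozen[:, j] = x[:, j].mean()   # one variable at a time
        out.append(np.sum((y - predict(x_frozen)) ** 2) - base_sse)
    return np.array(out)
```
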
Technical Predictors of Stock Price:
A Complex Multivariate Example

Stock Prediction: Which Indicator is Best?
• Moving averages
• Measures of volatility
• Seasonal indicators
  - The January effect
• Oscillators

The Data
• S&P Index since 1930
  - Close only
• S&P 500 since 1962
  - Open
  - High
  - Low
  - Close

Moving Averages
• A very commonly used technical indicator (see the sketch below)
  - 1 week MA of returns
  - 2 week MA of returns
  - 1 month MA of returns
• These are trend-following indicators
• A more complicated time series smoother based on running medians, called T4253H

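A sketch of the simple moving averages; the window lengths in trading days are my assumption, and T4253H (a compound running-median smoother) is omitted here:

```python
import numpy as np

def moving_average(returns, window):
    """Trailing simple moving average of returns over `window` periods."""
    kernel = np.ones(window) / window
    return np.convolve(np.asarray(returns, dtype=float), kernel, mode="valid")

# e.g. 1-week (5 trading days) and 1-month (21 trading days) averages:
# ma_1w = moving_average(daily_returns, 5)
# ma_1m = moving_average(daily_returns, 21)
```
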
Volatility Measures
• Finance literature suggests that market volatility changes over time
• More turbulent market -> higher volatility
• Measures (see the sketch below):
  - Standard deviation of returns
  - Range of returns
  - Moving averages of the above

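Minimal sketches of the two basic measures; either can then be smoothed with the moving average above:

```python
import numpy as np

def rolling_std(returns, window):
    """Trailing standard deviation of returns over `window` periods."""
    r = np.asarray(returns, dtype=float)
    return np.array([r[i - window:i].std() for i in range(window, len(r) + 1)])

def rolling_range(returns, window):
    """Trailing high-low range of returns over `window` periods."""
    r = np.asarray(returns, dtype=float)
    return np.array([np.ptp(r[i - window:i]) for i in range(window, len(r) + 1)])
```
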
Seasonal Effects
[Chart: month effect on stock returns; returns by month (1-12), N per month between 1429 and 1646]

Oscillators
• May indicate that the market is overbought or oversold
• May indicate that a trend is nearing completion
• Some oscillators:
  - Moving average differences
  - Stochastic

Stochastic
• Based on the observation that as prices increase, closing prices tend to be closer to the upper end of the range
• In downtrends, closing prices are near the lower end of the range
• %K = (C - L5)/(H5 - L5)
  - C is the closing price, L5 is the 5-day low, H5 is the 5-day high
• %D = 3-day moving average of %K (see the sketch below)

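These definitions translate directly to code; a sketch assuming numpy arrays of daily closes, lows, and highs:

```python
import numpy as np

def percent_k(close, low, high, period=5):
    """%K = (C - L5)/(H5 - L5): where today's close sits within the
    trailing `period`-day high-low range."""
    k = []
    for i in range(period - 1, len(close)):
        lo = low[i - period + 1:i + 1].min()    # 5-day low
        hi = high[i - period + 1:i + 1].max()   # 5-day high
        k.append((close[i] - lo) / (hi - lo))
    return np.array(k)

def percent_d(k, period=3):
    """%D: 3-day moving average of %K."""
    return np.convolve(k, np.ones(period) / period, mode="valid")
```
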
Neural Network Result
• Variable importance:
  1. Month
  2. %K (from stochastic)
  3. Smoothed standard deviation
  4. Smoothed return
  5. 2-week %D (from stochastic)
  6. 1-week range of returns
  7. Smoothed %K
• R² was .15, or 15% of variance explained

What are the Relationships between the Variables?
[Chart: scatterplot of %K and neural net predicted value; %K from 0.1 to 1.1, predicted from -0.10 to 0.10]

Neural Network Result for Seasonality
[Chart: neural network predicted return by month (2-12); predicted from 0.0 to 0.015]

Neural Network Result for Oscillator
[Chart: neural network predicted value vs. %K (0.1-1.1); predicted from -0.02 to 0.02]

Neural Network Result for Seasonality and Oscillator
[Chart: panel plot of neural network predicted value vs. %K, one panel per month (1-12)]

Neural Network Result for Seasonality and Standard Deviation
[Chart: neural network predicted value vs. smoothed standard deviation (0.01-0.09); predicted from -0.10 to 0.00]

Neural Network Result for Seasonality and Standard Deviation
[Chart: panel plot of neural network predicted value vs. smoothed standard deviation (0.04-0.10), one panel per month (1-12)]

How Many Nodes?

Results of holding out 1/3 of the sample:

Number of Nodes   R²
2                 0.088
3                 0.096
4                 0.114
5                 0.110
6                 0.113

Conclusions
• Neural networks are a lot like conventional statistics
• They address some problems of conventional statistics: nonlinear relationships, correlated variables, and interactions
• Despite their black box aspect, we can now interpret them
• Further information, including the paper, is available at www.casact.org/aboutcas/mdiprize.htm
• The paper and presentation can be found at www.francisanalytics.com