Neural Networks Demystified by Louise Francis
Data Mining: Neural Network Applications
by Louise Francis
CAS Convention, Nov 13, 2001
Francis Analytics and Actuarial Data Mining, Inc.
[email protected]
www.francisanalytics.com
Objectives of Presentation
• Introduce insurance professionals to neural networks
• Show that neural networks are a lot like some conventional statistics
• Indicate where use of neural networks might be helpful
• Show practical examples of using neural networks
• Show how to interpret neural network models
Conventional Statistics: Regression
• One of the most common methods is linear regression
• Models the relationship between two variables by fitting a straight line through the points
A Common Actuarial Example: Trend Estimation
[Figure: Automobile BI Severity by year, 1980-2000]
A Common Actuarial Example: Trend Estimation
[Figure: Automobile BI Severity with fitted exponential curve, 1980-2000]
The Severity Trend Model
• $\mathrm{Severity}_t = \mathrm{Constant} \cdot (1 + \mathrm{trend})^{t - t_0}$
• $\log(\mathrm{Severity}_t) = \mathrm{Constant} + (t - t_0)\log(1 + \mathrm{trend}) + \mathrm{error}$
Solution to Regression Problem
$\min \sum (Y - \hat{Y})^2$
$B = \frac{\sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{N} (X_i - \bar{X})^2}$
$a = \bar{Y} - B\bar{X}$
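Putting the two previous slides together, a minimal Python sketch of the log-linear trend fit, using made-up severity data (the presentation's actual figures are not reproduced here):

    import numpy as np

    # Hypothetical severities growing about 7% a year with noise;
    # illustrative only, not the data behind the charts
    rng = np.random.default_rng(0)
    t = np.arange(1980, 2001)
    severity = 5000 * 1.07 ** (t - 1980) * rng.lognormal(0.0, 0.05, t.size)

    # Log-linear fit: log(severity) = a + B*(t - t0), via the slide's formulas
    x = t - t[0]
    y = np.log(severity)
    B = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
    a = y.mean() - B * x.mean()

    print("implied annual trend:", np.exp(B) - 1)
    print("severity at t0:", np.exp(a))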
Neural Networks
• Also minimize the squared deviation between fitted and actual values
• Can be viewed as a non-parametric, non-linear regression
The MLP Neural Network
Three-layer neural network:
• Input Layer (input data)
• Hidden Layer (processes data)
• Output Layer (predicted value)
The Activation Function
• The sigmoid (logistic) function:
$f(Y) = \frac{1}{1 + e^{-Y}}$
$Y = w_0 + w_1 X_1 + w_2 X_2 + \dots + w_n X_n$
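A minimal Python sketch of the activation, with function and variable names of my own choosing:

    import numpy as np

    def logistic(y):
        # Sigmoid activation: squashes any real-valued input into (0, 1)
        return 1.0 / (1.0 + np.exp(-y))

    def node_output(x, w):
        # Weighted sum w0 + w1*x1 + ... + wn*xn passed through the logistic
        return logistic(w[0] + np.dot(w[1:], x))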
The Logistic Function
[Figure: logistic function of X for various values of w1 (-10, -5, -1, 1, 5, 10)]
Simple Trend Example: One Hidden Node
Simple neural network with one hidden node:
• Input Layer (input data)
• Hidden Layer (processes data)
• Output Layer (predicted value)
Logistic Function of Simple Trend Example
$h = f(X; w_0, w_1) = f(w_0 + w_1 X) = \frac{1}{1 + e^{-(w_0 + w_1 X)}}$
$f(f(X; w_0, w_1); w_2, w_3) = \frac{1}{1 + e^{-\left(w_2 + w_3 \frac{1}{1 + e^{-(w_0 + w_1 X)}}\right)}}$
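The same two-step composition as a Python sketch (the weights would come from fitting; the names here are mine):

    import numpy as np

    def logistic(y):
        return 1.0 / (1.0 + np.exp(-y))

    def one_node_network(x, w0, w1, w2, w3):
        # Hidden node: logistic of the weighted input
        h = logistic(w0 + w1 * x)
        # Output node: logistic of the weighted hidden value
        return logistic(w2 + w3 * h)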
Fitting the Curve
• Typically uses a procedure that minimizes the squared error, as regression does (a sketch follows below):
$\min \sum (Y - \hat{Y})^2$
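A minimal sketch of such a fit, assuming SciPy's general-purpose minimizer rather than any particular training algorithm, on made-up data scaled to the unit interval:

    import numpy as np
    from scipy.optimize import minimize

    def logistic(y):
        return 1.0 / (1.0 + np.exp(-y))

    def predict(w, x):
        h = logistic(w[0] + w[1] * x)
        return logistic(w[2] + w[3] * h)

    # Hypothetical data, rescaled to [0, 1] so the logistic output can match it
    x = np.linspace(0.0, 1.0, 21)
    y = 0.2 + 0.6 * x ** 2

    sse = lambda w: np.sum((y - predict(w, x)) ** 2)
    fit = minimize(sse, x0=np.array([0.0, 1.0, 0.0, 1.0]))
    print(fit.x, sse(fit.x))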
Trend Example: 1 Hidden Node
[Figure: Auto BI Severity, actual vs. fitted values over time (exponential curve and one-node neural network)]
Trend Example: 2 Hidden Nodes
[Figure: Auto BI Severity, actual vs. fitted values over time (exponential curve and two-node neural network)]
Trend Example: 3 Hidden Nodes
[Figure: Auto BI Severity, actual vs. fitted values over time (exponential curve and three-node neural network)]
Universal Function Approximator
• The backpropagation neural network with one hidden layer is a universal function approximator
• Theoretically, with a sufficient number of nodes in the hidden layer, any nonlinear function can be approximated
How Many Hidden Nodes?
• Too few nodes: don't fit the curve very well
• Too many nodes: overparameterization
• May fit noise as well as pattern
How Do We Determine the Number of Hidden Nodes?
• Hold out part of the sample
• Cross-validation
• Resampling
  • Bootstrapping
  • Jackknifing
• Algebraic formula
Hold Out Part of Sample
• Fit the model on 1/2 to 2/3 of the data (a sketch follows below)
• Test the fit of the model on the remaining data
• Needs a large sample
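A minimal sketch of such a split in Python (the helper name and NumPy-based approach are my own):

    import numpy as np

    def holdout_split(X, y, train_frac=2/3, seed=0):
        # Randomly assign train_frac of the rows to a fit set, the rest to a test set
        rng = np.random.default_rng(seed)
        idx = rng.permutation(len(y))
        cut = int(train_frac * len(y))
        tr, te = idx[:cut], idx[cut:]
        return X[tr], y[tr], X[te], y[te]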
Cross-Validation
• Hold out 1/n (say 1/10) of the data
• Fit the model to the remaining data
• Test on the portion of the sample held out
• Do this n (say 10) times and average the results (see the sketch below)
• Used for moderate sample sizes
• Jackknifing is similar to cross-validation
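A sketch of n-fold cross-validation, assuming user-supplied fit and score callables (all names here are mine):

    import numpy as np

    def cross_validate(fit, score, X, y, n_folds=10, seed=0):
        # Fit on n-1 folds, score on the held-out fold, average the scores
        rng = np.random.default_rng(seed)
        folds = np.array_split(rng.permutation(len(y)), n_folds)
        scores = []
        for hold in folds:
            train = np.setdiff1d(np.arange(len(y)), hold)
            model = fit(X[train], y[train])
            scores.append(score(model, X[hold], y[hold]))
        return np.mean(scores)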
Bootstrapping
• Create many samples by drawing samples, with replacement, from the original data (sketched below)
• Fit the model to each of the samples
• Measure overall goodness of fit and create a distribution of results
• Used for small and moderate sample sizes
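A sketch of the resampling loop; the slides don't specify whether goodness of fit is measured on the resample or on the original data, so this version scores each refit on its own resample:

    import numpy as np

    def bootstrap_fit_stats(fit, stat, X, y, n_samples=1000, seed=0):
        # Refit on many with-replacement resamples and collect a fit statistic,
        # giving an empirical distribution of results
        rng = np.random.default_rng(seed)
        n = len(y)
        stats = []
        for _ in range(n_samples):
            idx = rng.integers(0, n, size=n)   # draw n rows with replacement
            model = fit(X[idx], y[idx])
            stats.append(stat(model, X[idx], y[idx]))
        return np.array(stats)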
Jackknife Result
[Figure: 95th percentile of jackknife estimates over time for networks with 1, 2, 3, and 4 hidden nodes]
Result for Sample Hold Out

One-step-ahead results:

    Number of Nodes    R2
    1                  0.53
    2                  0.73
    3                  0.74
    4                  0.72
Interpreting Complex Multi-Variable Model
• How many hidden nodes?
• Which variables should the analyst keep?
Measuring Variable Importance
• Look at the weights to the hidden layer
• Compute sensitivities: a measure of how much the predicted value's error increases when the variables are excluded from the model one at a time (see the sketch below)
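A sketch of the sensitivity idea. The slides don't specify how a variable is "excluded"; a common shortcut, assumed here, is to hold it at its mean rather than refit without it:

    import numpy as np

    def sensitivities(predict, X, y):
        # For each variable, replace its column with the column mean (one way to
        # "exclude" it without refitting) and measure how much squared error grows
        base_err = np.mean((y - predict(X)) ** 2)
        out = []
        for j in range(X.shape[1]):
            Xj = X.copy()
            Xj[:, j] = X[:, j].mean()
            out.append(np.mean((y - predict(Xj)) ** 2) - base_err)
        return np.array(out)   # larger value = more important variable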
Technical Predictors of Stock Price
A Complex Multivariate Example
Stock Prediction: Which Indicator is Best?
• Moving averages
• Measures of volatility
• Seasonal indicators
  • The January effect
• Oscillators
The Data
• S&P Index since 1930: close only
• S&P 500 since 1962: open, high, low, close
Moving Averages
• A very commonly used technical indicator
• 1-week MA of returns
• 2-week MA of returns
• 1-month MA of returns
• These are trend-following indicators
• A more complicated time-series smoother based on running medians, called T4253H (a simple moving average is sketched below)
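A minimal trailing moving average in Python (the T4253H median smoother is more involved and not sketched here); treating one week as 5 trading days is my assumption:

    import numpy as np

    def moving_average(returns, window):
        # Trailing moving average of a return series,
        # e.g. window=5 for a 1-week MA, window=21 for roughly 1 month
        kernel = np.ones(window) / window
        return np.convolve(returns, kernel, mode="valid")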
Volatility Measures
• Finance literature suggests the volatility of the market changes over time
• More turbulent market -> higher volatility
• Measures (sketched below):
  • Standard deviation of returns
  • Range of returns
  • Moving averages of the above
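Trailing versions of the two measures as a sketch (the window length is a free choice):

    import numpy as np

    def rolling_std(returns, window):
        # Trailing standard deviation of returns over a fixed window
        return np.array([returns[i - window:i].std()
                         for i in range(window, len(returns) + 1)])

    def rolling_range(returns, window):
        # Trailing high-low range of returns over a fixed window
        return np.array([returns[i - window:i].max() - returns[i - window:i].min()
                         for i in range(window, len(returns) + 1)])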
Seasonal Effects
[Figure: month effect on stock returns; box plot of returns by month (1-12), with roughly 1,430-1,650 observations per month]
Oscillators
• May indicate that the market is overbought or oversold
• May indicate that a trend is nearing completion
• Some oscillators:
  • Moving average differences
  • Stochastic
Stochastic
• Based on the observation that as prices increase, closing prices tend to be closer to the upper end of the range
• In downtrends, closing prices are near the lower end of the range
• %K = (C - L5)/(H5 - L5), where C is the closing price, L5 is the 5-day low, and H5 is the 5-day high
• %D = 3-day moving average of %K (both are sketched below)
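Both indicators as a direct Python sketch of the formulas above (the array-based helper names are mine):

    import numpy as np

    def stochastic_k(close, low, high, window=5):
        # %K = (C - L5) / (H5 - L5) with a trailing 5-bar low and high
        k = np.full(len(close), np.nan)
        for i in range(window - 1, len(close)):
            lo = low[i - window + 1:i + 1].min()
            hi = high[i - window + 1:i + 1].max()
            k[i] = (close[i] - lo) / (hi - lo)
        return k

    def stochastic_d(k, window=3):
        # %D = 3-day moving average of %K
        d = np.full(len(k), np.nan)
        for i in range(window - 1, len(k)):
            d[i] = k[i - window + 1:i + 1].mean()
        return d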
Neural Network Result: Variable Importance
1. Month
2. %K (from stochastic)
3. Smoothed standard deviation
4. Smoothed return
5. 2-week %D (from stochastic)
6. 1-week range of returns
7. Smoothed %K
R2 was .15, i.e., 15% of variance explained
What are the Relationships between the Variables?
[Figure: scatterplot of %K against the neural network predicted value]
Neural Network Result for Seasonality
[Figure: neural network predicted value plotted by month (1-12)]
Neural Network Result for Oscillator
[Figure: neural network predicted value plotted against %K]
Neural Network Result for Seasonality and Oscillator
[Figure: panel plot of neural network predicted value versus %K, one panel per month (1-12)]
Neural Network Result for Seasonality and Standard Deviation
[Figure: neural network predicted value versus smoothed standard deviation]
Neural Network Result for Seasonality and Standard Deviation
[Figure: panel plot of neural network predicted value versus smoothed standard deviation, one panel per month (1-12)]
How Many Nodes?

Results of holding out 1/3 of the sample:

    Number of Nodes    R2
    2                  0.088
    3                  0.096
    4                  0.114
    5                  0.110
    6                  0.113
Conclusions
• Neural networks are a lot like conventional statistics
• They address some problems of conventional statistics: nonlinear relationships, correlated variables, and interactions
• Despite their black box aspect, we can now interpret them
• Further information, including the paper, can be found at www.casact.org/aboutcas/mdiprize.htm
• The paper and presentation can be found at www.francisanalytics.com