
DOCTORAL SCHOOL OF FINANCE AND BANKING (DOFIN)
ACADEMY OF ECONOMIC STUDIES

Forecasting the BET-C Stock Index with Artificial Neural Networks

MSc Student: Stoica Ioan-Andrei
Supervisor: Professor Moisa Altar
July 2006
Stock Markets and Prediction

- Predicting stock prices: the goal of every investor trying to make a profit on the stock market
- The predictability of the market: an issue debated by many researchers and academics
- Efficient Market Hypothesis (Eugene Fama), three forms:
  - Weak: future stock prices cannot be predicted from past stock prices
  - Semi-strong: even publicly available information cannot be used to predict future prices
  - Strong: the market cannot be predicted no matter what information is available
Stock Markets and Prediction

- Technical Analysis:
  - the 'castles-in-the-air' view
  - investors' behavior and reactions follow these anticipations
- Fundamental Analysis:
  - the 'firm foundations' view
  - stocks have an intrinsic value determined by the present conditions and future prospects of the company
- Traditional Time Series Analysis:
  - uses historical data to approximate future values of a time series as a linear combination
- Machine Learning: Artificial Neural Networks
The Artificial Neural Network

- a computational technique that borrows mechanisms similar to those employed by the human brain
- 1943: W.S. McCulloch and W. Pitts attempted to mimic the ability of the human brain to process data and information and to comprehend patterns and dependencies
- the human brain: a complex, nonlinear and parallel computer
- the neurons:
  - elementary information-processing units
  - the building blocks of a neural network
The Artificial Neural Network

- a semi-parametric approximation method
- Advantages:
  - ability to detect nonlinear dependencies
  - parsimonious compared to polynomial expansions
  - generalization ability and robustness
  - no assumptions about the model have to be made
  - flexibility
- Disadvantages:
  - the 'black box' property
  - training requires an experienced user
  - training takes a lot of time; a fast computer is needed
  - overtraining → overfitting
  - undertraining → underfitting
The Artificial Neural Network
[Figure: target function y = f(x) = sin(x) + ln(x)]

The Artificial Neural Network
[Figure: y = sin(x) + ...]

The Artificial Neural Network
[Figure: overtraining/overfitting]

The Artificial Neural Network
[Figure: undertraining/underfitting]
Architecture of the Neural Network

- Types of layers:
  - input layer: number of neurons = number of inputs
  - output layer: number of neurons = number of outputs
  - hidden layer(s): number of neurons chosen by trial and error
- Connections between neurons:
  - fully connected
  - partially connected
- The activation function (a sketch of the common choices follows below):
  - threshold function
  - piecewise linear function
  - sigmoid functions
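The activation functions listed above can be written in a few lines; the following NumPy sketch (not part of the original slides) uses illustrative functional forms for the threshold and piecewise linear cases, and the logistic sigmoid used by the networks later on.

```python
import numpy as np

def threshold(v):
    """Threshold (step) function: outputs 1 when the net input is non-negative, else 0."""
    return np.where(v >= 0.0, 1.0, 0.0)

def piecewise_linear(v):
    """Piecewise linear function: linear on [-0.5, 0.5], saturated at 0 and 1 outside."""
    return np.clip(v + 0.5, 0.0, 1.0)

def logistic(v):
    """Logistic sigmoid, the activation used in the network equations that follow."""
    return 1.0 / (1.0 + np.exp(-v))
```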
The Feed-Forward Network

  n_{k,t} = \beta_{k,0} + \sum_{i=1}^{n} \beta_{k,i} x_{i,t}

  N_{k,t} = L(n_{k,t}) = \frac{1}{1 + e^{-n_{k,t}}}

  \hat{y}_t = \alpha_0 + \sum_{k=1}^{m} \alpha_k N_{k,t}

where m = number of hidden-layer neurons and n = number of inputs.
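As a hedged illustration of the equations above, the one-step forecast of the feed-forward network translates directly into NumPy; the parameter names mirror the slide's symbols and the shapes are assumptions of the example.

```python
import numpy as np

def logistic(v):
    return 1.0 / (1.0 + np.exp(-v))

def ffn_forecast(x_t, beta0, beta, alpha0, alpha):
    """Single-hidden-layer feed-forward forecast.

    x_t    : (n,)   input vector, e.g. lagged returns and volatility
    beta0  : (m,)   hidden intercepts beta_{k,0}
    beta   : (m, n) hidden weights beta_{k,i}
    alpha0 : float  output intercept alpha_0
    alpha  : (m,)   output weights alpha_k
    """
    n_kt = beta0 + beta @ x_t      # n_{k,t} = beta_{k,0} + sum_i beta_{k,i} x_{i,t}
    N_kt = logistic(n_kt)          # N_{k,t} = L(n_{k,t})
    return alpha0 + alpha @ N_kt   # y_hat_t = alpha_0 + sum_k alpha_k N_{k,t}
```

For instance, with m = 3 hidden neurons and n = 6 inputs (five lagged returns plus volatility), beta has shape (3, 6) and alpha has shape (3,).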
The Feed-Forward Network with Jump Connections

  n_{k,t} = \beta_{k,0} + \sum_{i=1}^{n} \beta_{k,i} x_{i,t}

  N_{k,t} = L(n_{k,t}) = \frac{1}{1 + e^{-n_{k,t}}}

  \hat{y}_t = \alpha_0 + \sum_{k=1}^{m} \alpha_k N_{k,t} + \sum_{i=1}^{n} \gamma_i x_{i,t}
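A sketch of the same forward pass with jump connections; relative to the plain feed-forward net, the only change is the direct linear term from the inputs to the output (illustrative code, not the thesis implementation).

```python
import numpy as np

def ffn_jump_forecast(x_t, beta0, beta, alpha0, alpha, gamma):
    """Feed-forward forecast with jump (direct input-to-output) connections."""
    N_kt = 1.0 / (1.0 + np.exp(-(beta0 + beta @ x_t)))   # logistic hidden units
    # the extra term sum_i gamma_i x_{i,t} lets the network nest a purely linear model
    return alpha0 + alpha @ N_kt + gamma @ x_t
```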
The Recurrent Neural Network (Elman)

  n_{k,t} = \beta_{k,0} + \sum_{i=1}^{n} \beta_{k,i} x_{i,t} + \sum_{j=1}^{m} \phi_j n_{j,t-1}

  N_{k,t} = L(n_{k,t}) = \frac{1}{1 + e^{-n_{k,t}}}

  \hat{y}_t = \alpha_0 + \sum_{k=1}^{m} \alpha_k N_{k,t}

- allows the neurons to depend on their own lagged values → builds 'memory' into their evolution
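A sketch of one step of the recurrent forecast, reading the feedback term, as the slide's description suggests, as each hidden neuron depending on its own lagged net input; names and shapes are assumptions of the example.

```python
import numpy as np

def elman_step(x_t, n_prev, beta0, beta, phi, alpha0, alpha):
    """One step of the Elman-style recurrent forecast.

    n_prev : (m,) lagged hidden net inputs n_{k,t-1}, the network's 'memory'
    phi    : (m,) recurrent weights phi_k
    Returns the forecast y_hat_t and the updated hidden state n_{k,t}.
    """
    n_kt = beta0 + beta @ x_t + phi * n_prev    # feedback from the lagged hidden values
    N_kt = 1.0 / (1.0 + np.exp(-n_kt))
    y_hat = alpha0 + alpha @ N_kt
    return y_hat, n_kt
```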
Training the Neural Network
Objective: minimize the discrepancy between the real data and the output of the network:

  \min_{\Omega} \Psi(\Omega) = \sum_{t=1}^{T} (y_t - \hat{y}_t)^2

  \hat{y}_t = f(x_t; \Omega)

- Ω: the set of parameters
- Ψ: the loss function
- Ψ nonlinear → nonlinear optimization problem, solved by:
  - backpropagation
  - the genetic algorithm
The Backpropagation Algorithm

- a gradient-descent method, an alternative to quasi-Newton algorithms
- Ω0 is randomly generated, then updated along the gradient of the loss:

  \Omega_1 = \Omega_0 - \rho \nabla\Psi(\Omega_0)

- ρ: the learning parameter, in [0.05, 0.5]
- after n iterations a momentum term with μ = 0.9 is added:

  \Omega_n = \Omega_{n-1} - \rho \nabla\Psi(\Omega_{n-1}) + \mu (\Omega_{n-1} - \Omega_{n-2})

- problem: the algorithm can get stuck in local minimum points
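A hedged numerical sketch of the update rule above: the gradient is approximated by finite differences so the snippet stays self-contained, whereas backpropagation proper computes it analytically layer by layer.

```python
import numpy as np

def numerical_grad(loss, omega, h=1e-6):
    """Forward finite-difference gradient of the loss at omega (a stand-in for the
    analytic gradient that backpropagation would compute)."""
    grad = np.zeros_like(omega)
    for i in range(omega.size):
        step = np.zeros_like(omega)
        step[i] = h
        grad[i] = (loss(omega + step) - loss(omega)) / h
    return grad

def train_with_momentum(loss, omega0, rho=0.1, mu=0.9, n_iter=1000):
    """Gradient descent with momentum, mirroring the slide's update rule:
    Omega_n = Omega_{n-1} - rho * grad(Psi) + mu * (Omega_{n-1} - Omega_{n-2})."""
    prev = omega0.copy()
    curr = omega0 - rho * numerical_grad(loss, omega0)    # plain gradient step first
    for _ in range(n_iter):
        nxt = curr - rho * numerical_grad(loss, curr) + mu * (curr - prev)
        prev, curr = curr, nxt
    return curr
```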
The Genetic Algorithm

- based on Darwinian laws
- Population creation: N random vectors of weights
- Selection → (Ωi, Ωj) parent vectors
- Crossover & mutation → C1, C2 children vectors
- Election tournament: the fittest 2 vectors are passed to the next generation
- Convergence: after G* generations, with G* large enough that there are no significant changes in the fitness of the best individual for several generations
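A rough Python sketch of this loop; population size, mutation scale, and the mapping from fitness to selection probabilities are assumptions of the example, not the thesis' actual operator choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def selection_probs(fitness):
    """Lower loss (a fitter individual) -> higher probability of being picked as a parent."""
    inv = 1.0 / (fitness - fitness.min() + 1e-8)
    return inv / inv.sum()

def genetic_train(loss, n_weights, pop_size=50, n_generations=200, sigma=0.1):
    """Genetic training loop: population creation, selection, crossover & mutation,
    election tournament, repeated for a fixed number of generations."""
    pop = rng.normal(size=(pop_size, n_weights))                  # population creation
    for _ in range(n_generations):                                # run for G* generations
        fitness = np.array([loss(w) for w in pop])
        i, j = rng.choice(pop_size, size=2, replace=False,
                          p=selection_probs(fitness))             # selection of parents
        mask = rng.random(n_weights) < 0.5                        # uniform crossover
        c1 = np.where(mask, pop[i], pop[j]) + rng.normal(scale=sigma, size=n_weights)
        c2 = np.where(mask, pop[j], pop[i]) + rng.normal(scale=sigma, size=n_weights)
        # election tournament: the 2 fittest of {parents, children} enter the next generation
        family = np.vstack([pop[i], pop[j], c1, c2])
        order = np.argsort([loss(w) for w in family])
        pop[i], pop[j] = family[order[0]], family[order[1]]
    return min(pop, key=loss)                                     # best weight vector found
```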
Experiments and Results
Data

- BET-C stock index: daily closing prices, 16 April 1998 until 18 May 2006
- daily returns:

  R_t = \ln P_t - \ln P_{t-1} = \ln \frac{P_t}{P_{t-1}}

- conditional volatility, a rolling 20-day standard deviation:

  V_t = \sqrt{\frac{1}{19} \sum_{i=1}^{20} (R_{t-i} - \bar{R}_t)^2}

- BDS test for nonlinear dependencies:
  - H0: the data are i.i.d.
  - BDS_{m,\varepsilon} ~ N(0,1)

BDS statistics:

Series   m=2, ε=1   m=2, ε=1.5   m=3, ε=1   m=3, ε=1.5   m=4, ε=1   m=4, ε=1.5
OD       16.6526    17.6970      18.5436    18.7202      19.7849    19.0588
ARF      16.2626    17.2148      18.3803    18.4839      19.7618    18.9595
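For concreteness, a small pandas sketch of the two transformations defined above; the source of the price series is a placeholder.

```python
import numpy as np
import pandas as pd

def returns_and_volatility(prices: pd.Series):
    """Daily log returns and rolling 20-day volatility as defined on the slide."""
    r = np.log(prices).diff()                     # R_t = ln(P_t) - ln(P_{t-1})
    # sample standard deviation of the previous 20 returns (divisor 19);
    # shift(1) keeps only R_{t-1}, ..., R_{t-20} in V_t, matching the R_{t-i} indexing
    v = r.rolling(window=20).std(ddof=1).shift(1)
    return r, v
```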
Experiments and Results

- 3 types of ANNs:
  - feed-forward network
  - feed-forward network with jump connections
  - recurrent network
- Input: [R_{t-1}, R_{t-2}, R_{t-3}, R_{t-4}, R_{t-5}] and V_t
- Output: the next-day return R_t
- Training: genetic algorithm & backpropagation
- Data divided into:
  - training set: 90%
  - test set: 10%
- one-day-ahead forecasts (static forecasting)
- Network selection:
  - each network is trained 100 times
  - the best 10 runs are kept according to SSE
  - the best 1 is selected according to RMSE
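A hedged sketch of the chronological split and the 100-run selection scheme; train_fn, sse_fn, and rmse_fn are hypothetical callables standing in for whatever training and scoring code is actually used.

```python
def split_train_test(series, train_frac=0.9):
    """Chronological 90%/10% split (no shuffling, since the data form a time series)."""
    cut = int(len(series) * train_frac)
    return series[:cut], series[cut:]

def select_network(train_fn, sse_fn, rmse_fn, n_runs=100):
    """Train the network 100 times, keep the 10 runs with the lowest SSE,
    and return the one with the lowest RMSE among them."""
    nets = [train_fn(seed) for seed in range(n_runs)]
    best_ten = sorted(nets, key=sse_fn)[:10]
    return min(best_ten, key=rmse_fn)
```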
Experiments and Results
Evaluation Criteria

In-sample criteria:

  R^2 = 1 - \frac{\sum_{t=1}^{T} (y_t - \hat{y}_t)^2}{\sum_{t=1}^{T} (y_t - \bar{y})^2}

  SSE = \sum_{t=1}^{T} (y_t - \hat{y}_t)^2 = \sum_{t=1}^{T} \varepsilon_t^2

Out-of-sample criteria:

  HR = \frac{1}{T} \sum_{t=1}^{T} I_t, \quad I_t = \begin{cases} 1, & \text{if } y_t \hat{y}_t > 0 \\ 0, & \text{otherwise} \end{cases}

  RMSE = \sqrt{\frac{1}{T} \sum_{t=1}^{T} (\hat{y}_t - y_t)^2}

  MAE = \frac{1}{T} \sum_{t=1}^{T} |\hat{y}_t - y_t|

  ROI = \sum_{t=1}^{T} y_t \,\mathrm{sign}(\hat{y}_t)

  RP = \frac{\sum_{t=1}^{T} y_t \,\mathrm{sign}(\hat{y}_t)}{\sum_{t=1}^{T} |y_t|}

Pesaran-Timmermann test for directional accuracy:
- H0: the signs of the forecasts and those of the real data are independent
- DA ~ N(0,1)
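The out-of-sample criteria above translate directly into NumPy; this is a minimal sketch operating on placeholder arrays of realized and forecast returns.

```python
import numpy as np

def evaluation_metrics(y, y_hat):
    """Out-of-sample criteria from the slide: HR, RMSE, MAE, ROI and RP.

    y, y_hat: arrays of realized and forecast returns over the test sample.
    """
    hr = np.mean(y * y_hat > 0)                    # HR: share of correctly forecast signs
    rmse = np.sqrt(np.mean((y_hat - y) ** 2))      # RMSE
    mae = np.mean(np.abs(y_hat - y))               # MAE
    roi = np.sum(y * np.sign(y_hat))               # ROI of the sign-based trading strategy
    rp = roi / np.sum(np.abs(y))                   # RP: ROI relative to total absolute return
    return {"HR": hr, "RMSE": rmse, "MAE": mae, "ROI": roi, "RP": rp}
```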
Experiments and Results
ROI: a trading strategy based on the sign forecasts:
- positive forecast → buy signal
- negative forecast → sell signal

Finite differences, used to measure the sensitivity of the output to each input:

  \frac{\partial y}{\partial x_i} \approx \frac{f(x_1, \dots, x_i + h_i, \dots, x_n) - f(x_1, \dots, x_i, \dots, x_n)}{h_i}, \quad h_i = 10^{-6}

Benchmarks:
- naïve model: R_{t+1} = R_t
- buy-and-hold strategy
- AR(1) model estimated by least squares, compared on RMSE and MAE to check for overfitting
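A short sketch of the finite-difference sensitivity measure and the naïve benchmark; f stands for any fitted forecast function and is an assumption of the example.

```python
import numpy as np

def finite_difference_sensitivity(f, x, h=1e-6):
    """Forward finite-difference derivative of the forecast with respect to each input
    (h_i = 10^-6, as on the slide)."""
    x = np.asarray(x, dtype=float)
    base = f(x)
    sens = np.zeros_like(x)
    for i in range(x.size):
        x_step = x.copy()
        x_step[i] += h
        sens[i] = (f(x_step) - base) / h
    return sens

def naive_forecast(returns):
    """Naive benchmark R_{t+1} = R_t: today's return is the forecast for tomorrow."""
    returns = np.asarray(returns)
    return returns[:-1]            # to be compared against the realized returns[1:]
```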
Experiments and Results
Criterion   Naïve          AR(1)          FFN (no vol)   FFN            FFN-jump       RN
R2          -              0.079257       0.083252       0.083755       0.084827       0.091762
SSE         -              0.332702       0.331258       0.331077       0.330689       0.328183
RMSE        0.015100       0.011344       0.011325       0.011304       0.011332       0.011319
MAE         0.011948       0.008932       0.008929       0.008873       0.008867       0.008892
HR          55.77% (111)   56.78% (113)   57.79% (115)   59.79% (119)   59.79% (119)   59.79% (119)
ROI         0.265271       0.255605       0.318374       0.351890       0.331464       0.412183
RP          15.02%         14.47%         18.02%         19.92%         18.77%         23.34%
PT-Test     -              -              14.79          15.01          15.01          14.49
B&H         0.2753         0.2753         0.2753         0.2753         0.2753         0.2753

            Volatility
FFN         -0.1123
FFN-jump    -0.1358
RN          -0.1841
Experiments and Results
[Figure: actual vs. fitted values (training sample)]

Experiments and Results
[Figure: actual vs. fitted values (test sample)]
Conclusions

- RMSE and MAE lower than for the AR(1) model → no signs of overfitting
- R2 < 0.1 → forecasting the magnitude of returns is a failure
- sign forecasting with a hit rate of ~60% → a success
- Volatility:
  - improves the sign forecast
  - finite differences → negative correlation with the forecast
  - perceived as a measure of risk
- trading strategy: outperforms the naïve model and buy-and-hold
- the quality of the sign forecast is confirmed by the Pesaran-Timmermann test
Further Development

- Volatility: other estimates
- a neural classifier specialized in sign forecasting
- using data from outside the Bucharest Stock Exchange:
  - T-Bond yields
  - exchange rates
  - indices of foreign capital markets