
Neural Computation and Applications in Time Series and Signal Processing
Georg Dorffner
Dept. of Medical Cybernetics and Artificial Intelligence, University of Vienna
and
Austrian Research Institute for Artificial Intelligence
June 2003
Neural Computation
• Originally biologically motivated (information processing in the brain)
• Simple mathematical model of the neuron → neural network
• Large number of simple "units"
• Massively parallel (in theory)
• Complexity through the interplay of many simple elements
• Strong relationship to methods from statistics
• Suitable for pattern recognition
A Unit
[Figure: a unit (neuron) with incoming weights w1, w2, …, wi, its (net) input xj, and activation/output yj = f(xj)]
• Propagation rule (computes the (net) input):
  – Weighted sum
  – Euclidean distance
• Transfer function f (computes the activation/output):
  – Threshold fct. (McCulloch & Pitts)
  – Linear fct.
  – Sigmoid fct.
  – Gaussian fct.
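A single unit of this kind can be written down in a few lines. The sketch below (my own illustration, not the author's code) combines the weighted-sum propagation rule with the transfer functions listed above.

```python
import numpy as np

# Minimal sketch of a single unit: weighted-sum propagation rule
# followed by one of the transfer functions listed above.
def unit_output(inputs, weights, f="sigmoid"):
    net = float(np.dot(weights, inputs))   # propagation rule: weighted sum
    if f == "threshold":                   # McCulloch & Pitts
        return 1.0 if net >= 0 else 0.0
    if f == "linear":
        return net
    if f == "sigmoid":
        return 1.0 / (1.0 + np.exp(-net))
    if f == "gaussian":
        return float(np.exp(-net ** 2))
    raise ValueError(f)

print(unit_output(np.array([0.5, -1.0]), np.array([0.8, 0.3])))
```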
Multilayer Perceptron (MLP), Radial Basis Function Network (RBFN)
• 2 (or more) layers (= connections): input units, hidden units (typically nonlinear), output units (typically linear)
• MLP: $x_j^{out} = \sum_{l=1}^{k} v_{lj}\, f\!\left(\sum_{i=1}^{n} w_{il}\, x_i^{in}\right)$ with sigmoid transfer function $f(x) = \frac{1}{1 + e^{-x}}$
• RBFN: $x_j^{out} = \sum_{l=1}^{k} v_{lj}\, f\!\left(\sum_{i=1}^{n} \left(w_{il} - x_i^{in}\right)^2\right)$ with Gaussian transfer function $f(x) = e^{-x^2}$
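The two forward passes differ only in how the hidden layer combines inputs and weights. A minimal NumPy sketch (weight shapes and names are my own assumptions):

```python
import numpy as np

# W: hidden-layer weights (k x n), V: output weights (m x k).
def mlp_forward(x, W, V):
    hidden = 1.0 / (1.0 + np.exp(-(W @ x)))   # sigmoid of weighted sums
    return V @ hidden                          # linear output units

def rbfn_forward(x, W, V):
    dist2 = ((W - x) ** 2).sum(axis=1)         # squared distance to centres w_l
    hidden = np.exp(-dist2)                    # Gaussian transfer function
    return V @ hidden

x = np.array([0.2, -0.5, 1.0])
W = np.random.randn(4, 3)                      # 4 hidden units, 3 inputs
V = np.random.randn(2, 4)                      # 2 outputs
print(mlp_forward(x, W, V), rbfn_forward(x, W, V))
```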
MLP as Universal Function Approximator
• E.g.: 1 input, 1 output, 5 hidden units
• The MLP can approximate arbitrary functions (Hornik et al. 1990)
• through superposition of weighted sigmoids
• Similar is true for the RBFN

$g_k(x) = x_k^{out} = \sum_{j=1}^{n} w_{jk}^{out}\, f\!\left(\sum_{i=1}^{m} w_{ij}^{hid}\, x_i^{in} + w_{j0}^{hid}\right) + w_{k0}^{out}$

The output weights stretch and mirror each sigmoid; the bias terms move it.
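To see what "superposition of weighted sigmoids" buys, here is a small illustration of my own (not from the slides): two shifted sigmoids with opposite output weights already form a localised bump, the basic building block behind the approximation result.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-5, 5, 11)
# weight +1 on a sigmoid rising at -1, weight -1 on one rising at +1
bump = 1.0 * sigmoid(4 * (x + 1)) - 1.0 * sigmoid(4 * (x - 1))
print(np.round(bump, 2))   # close to 1 inside [-1, 1], close to 0 outside
```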
Training (Model Estimation)
• Iterative optimisation based on the gradient (gradient descent, conjugate gradient, quasi-Newton):

  $\frac{\partial E}{\partial w_i} = \frac{\partial E}{\partial x^{out}} \cdot \frac{\partial x^{out}}{\partial w_i}$

  where $\partial E / \partial x^{out}$ is the contribution of the error function and $\partial x^{out} / \partial w_i$ the contribution of the network
• Typical error function (summed squared error over all patterns $i$ and all outputs $k$, with targets $t_k^{(i)}$):

  $E = \sum_{i=1}^{n} \sum_{k=1}^{m} \left(x_k^{out,(i)} - t_k^{(i)}\right)^2$

• "Backpropagation" (application of the chain rule):

  $\delta_j^{out} = f'\!\left(y_j^{out}\right)\left(t_j - x_j^{out}\right), \qquad \delta_j^{hid} = f'\!\left(y_j^{hid}\right) \sum_{k=1}^{n} w_{jk}^{out}\, \delta_k^{out}$
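A minimal training sketch (my own, under the assumptions of a 1-input / 5-hidden / 1-output MLP, linear output, summed squared error and a toy target) shows how these delta rules turn into weight updates:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(scale=0.5, size=(5, 2))     # hidden weights, last column = bias
V = rng.normal(scale=0.5, size=6)          # output weights, last entry = bias
sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

def forward(x):
    h = sigmoid(W @ np.array([x, 1.0]))
    return float(V @ np.append(h, 1.0)), h

X = np.linspace(-2, 2, 50)
T = np.sin(X)                              # illustrative target function
eta = 0.05
for epoch in range(2000):
    for x, t in zip(X, T):
        y, h = forward(x)
        delta_out = t - y                             # f' = 1 for linear output
        delta_hid = h * (1 - h) * V[:5] * delta_out   # chain rule through sigmoid
        V += eta * delta_out * np.append(h, 1.0)      # gradient-descent updates
        W += eta * np.outer(delta_hid, np.array([x, 1.0]))

print(np.mean([(forward(x)[0] - t) ** 2 for x, t in zip(X, T)]))
```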
Recurrent Perceptrons
• Recurrent connection = feedback loop
• From the hidden layer ("Elman") or the output layer ("Jordan")
[Figure: input layer plus a state/context layer holding a copy of the previous hidden or output activations]
• Learning: "backpropagation through time"
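One Elman-style step can be sketched as follows (my own illustration; shapes and names are assumptions): the context is simply a copy of the previous hidden activations, fed back as extra input.

```python
import numpy as np

def elman_step(x, context, W_in, W_ctx, V):
    h = 1.0 / (1.0 + np.exp(-(W_in @ x + W_ctx @ context)))  # hidden state
    y = V @ h                                                 # linear output
    return y, h                                               # h becomes next context

rng = np.random.default_rng(1)
W_in, W_ctx, V = rng.normal(size=(4, 2)), rng.normal(size=(4, 4)), rng.normal(size=(1, 4))
context = np.zeros(4)
for x in [np.array([0.1, 0.2]), np.array([0.3, -0.1])]:
    y, context = elman_step(x, context, W_in, W_ctx, V)
    print(y)
```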
Time series processing
• Given: time-dependent observables $\vec{x}_t,\ t = 0, 1, \ldots$
• Scalar: univariate; vector: multivariate
• Typical tasks: forecasting, noise modeling, pattern recognition, modeling, filtering, source separation
• Time series (minutes to days) vs. signals (milliseconds to seconds)
Examples
• Standard & Poor's, preprocessed as returns: $r_t = x_t - x_{t-1}$
• Sunspots, preprocessed by de-seasoning: $s_t = x_t - x_{t-11}$
[Figure: the two time series]
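Both preprocessing steps are simple differences; a short sketch (illustrative data, my own function names):

```python
import numpy as np

def returns(x):
    return x[1:] - x[:-1]            # r_t = x_t - x_{t-1}

def deseason(x, period=11):
    return x[period:] - x[:-period]  # s_t = x_t - x_{t-period}

prices = np.array([100.0, 101.5, 101.0, 102.3])
print(returns(prices))
```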
Autoregressive models
• Forecasting: making use of past information to predict (estimate) the future
• AR: past information = past observations $X_{t,p} = (x_{t-1}, x_{t-2}, \ldots, x_{t-p})$

  $x_t = F\!\left(x_{t-1}, x_{t-2}, \ldots, x_{t-p}\right) + \epsilon_t$

  where $F(X_{t,p}) = \hat{x}_t$ is the expected value and $\epsilon_t$ the noise ("random shock")
• Best forecast: the expected value
Linear AR models
• Most common case:

  $x_t = \sum_{i=1}^{p} a_i x_{t-i} + \epsilon_t$

• Simplest form: random walk

  $x_t = x_{t-1} + \epsilon_t; \quad \epsilon_t \sim N(0, 1)$

  for which a nontrivial forecast is impossible
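A quick simulation (my own sketch) makes the random-walk point concrete: the best forecast of $x_t$ is just $x_{t-1}$, and the remaining error is the shock variance.

```python
import numpy as np

rng = np.random.default_rng(42)
x = np.cumsum(rng.standard_normal(1000))          # x_t = x_{t-1} + eps_t
naive_forecast_error = np.mean((x[1:] - x[:-1]) ** 2)
print(naive_forecast_error)                       # ~1.0, the shock variance
```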
MLP as NAR
• A neural network can approximate a nonlinear AR model
• Input: a "time window" or "time delay" of past observations
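The "time window" idea amounts to turning the series into input/target pairs $(X_{t,p}, x_t)$, which any regression model such as an MLP can then be trained on as a nonlinear AR(p). A sketch with illustrative names:

```python
import numpy as np

def time_window(x, p):
    X = np.array([x[t - p:t] for t in range(p, len(x))])   # past p values
    y = x[p:]                                               # value to predict
    return X, y

x = np.arange(10.0)
X, y = time_window(x, p=3)
print(X.shape, y.shape)    # (7, 3) (7,)
```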
Noise modeling
• Regression is density estimation of (Bishop 1995):

  $p(x, t) = p(t \mid x)\, p(x)$

  where the target $t$ is the future value and $x$ the past
• Likelihood:

  $L = \prod_{i=1}^{n} p\!\left(t^{(i)} \mid x^{(i)}\right) p\!\left(x^{(i)}\right)$

  with $p(t \mid x)$ a distribution whose expected value is $F(x^{(i)})$
Gaussian noise
• Likelihood:

  $L(W) = \prod_{t=1}^{n} p\!\left(x_t \mid X_{t,p}\right) = \prod_{t=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{\left(F(X_{t,p}; W) - x_t\right)^2}{2\sigma^2}\right)$

• Maximization = minimization of $-\log L$ (constant terms can be deleted, incl. $p(x)$)

  $E = \sum_{t=1}^{n} \left(F(X_{t,p}; W) - x_t\right)^2$

• Corresponds to summed squared error (typical backpropagation)
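Spelling out the step above (my own expansion, assuming a fixed noise variance $\sigma^2$):

$$-\log L(W) = \sum_{t=1}^{n} \left[\frac{\left(F(X_{t,p}; W) - x_t\right)^2}{2\sigma^2} + \log\!\left(\sigma\sqrt{2\pi}\right)\right]$$

so dropping the additive and multiplicative constants leaves exactly the summed squared error $E$.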
Complex noise models
• Assumption: arbitrary distribution

  $\vec{\epsilon} \sim D(\vec{\theta})$

• Parameters are time-dependent (dependent on the past):

  $\vec{\theta} = g\!\left(X_{t,p}\right)$

• Likelihood:

  $L = \prod_{i=1}^{N} d\!\left(\vec{\theta} = g\!\left(X_{t,p}^{(i)}\right)\right)$

  where $d$ is the probability density function of $D$
Heteroskedastic time series
• Assumption: noise is Gaussian with time-dependent variance

  $L = \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi}\,\sigma_t\!\left(X_{t,p}^{(i)}\right)} \exp\!\left(-\frac{\left(x_t^{(i)} - \mu\!\left(X_{t,p}^{(i)}\right)\right)^2}{2\sigma_t^2\!\left(X_{t,p}^{(i)}\right)}\right)$

• ARCH model:

  $\sigma_t^2 = \sum_{i=1}^{p} a_i\, r_{t-i}^2$

• The MLP is a nonlinear ARCH (when applied to returns/residuals):

  $\sigma_t^2 = F\!\left(r_{t-1}, r_{t-2}, \ldots, r_{t-p}\right) = F'\!\left(r_{t-1}^2, r_{t-2}^2, \ldots, r_{t-p}^2\right)$
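The ARCH(p) recursion itself is a one-liner per time step; a sketch with made-up coefficients and returns:

```python
import numpy as np

def arch_variance(r, a):
    p = len(a)
    sigma2 = np.zeros(len(r))
    for t in range(p, len(r)):
        sigma2[t] = np.dot(a, r[t - p:t][::-1] ** 2)   # a_1*r_{t-1}^2 + a_2*r_{t-2}^2 + ...
    return sigma2

r = np.random.default_rng(0).standard_normal(20) * 0.01
print(arch_variance(r, a=np.array([0.5, 0.3])))
```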
Non-Gaussian noise
• Other parametric pdfs (e.g. t-distribution)
• Mixture of Gaussians (mixture density network, Bishop 1994):

  $d\!\left(x; \vec{\mu}, \vec{\sigma}^2, \vec{\alpha}\right) = \sum_{i=1}^{k} \frac{\alpha_i\!\left(X_{t,p}\right)}{\sqrt{2\pi}\,\sigma_i\!\left(X_{t,p}\right)}\, e^{-\frac{\left(x - \mu_i\left(X_{t,p}\right)\right)^2}{2\sigma_i^2\left(X_{t,p}\right)}}$

• Network with 3k outputs (or separate networks)
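A sketch of the mixture density itself, assuming the 3k network outputs are the mixing coefficients, means and widths (the softmax normalisation of the mixing coefficients is a common mixture-density-network choice, not stated on the slide):

```python
import numpy as np

def mixture_density(x, alpha, mu, sigma):
    alpha = np.exp(alpha) / np.exp(alpha).sum()        # softmax -> weights sum to 1
    comps = np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)
    return float(np.dot(alpha, comps))                 # weighted sum of Gaussians

print(mixture_density(0.1, alpha=np.array([0.0, 0.0]),
                      mu=np.array([-0.5, 0.5]), sigma=np.array([1.0, 0.3])))
```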
Identifiability problem
• Mixture models (like neural networks) are not identifiable (parameters cannot be interpreted)
• No distinction between model and noise, e.g. sunspot data:
[Figure: sunspot data example]
• ⇒ Models have to be treated with care
Recurrent networks: Moving Average
• Second model class: Moving Average models
• Past information: random shocks

  $x_t = \sum_{i=0}^{q} b_i\, \epsilon_{t-i}$

• Recurrent (Jordan) network: nonlinear MA, with

  $\epsilon_t = \hat{x}_t - x_t$

• However, convergence is not guaranteed
GARCH
• Extension of ARCH:

  $\sigma_t^2 = \sum_{i=1}^{p} a_i\, r_{t-i}^2 + \sum_{i=1}^{p} b_i\, \sigma_{t-i}^2$

• Explains "volatility clustering"
• A neural network can again be a nonlinear version
• Using past estimates: recurrent network
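The recursion uses the past variance estimates themselves; a minimal GARCH(1,1) sketch with illustrative coefficients:

```python
import numpy as np

def garch_variance(r, a=0.1, b=0.85, sigma2_0=1e-4):
    sigma2 = [sigma2_0]
    for t in range(1, len(r)):
        sigma2.append(a * r[t - 1] ** 2 + b * sigma2[t - 1])  # past return + past estimate
    return np.array(sigma2)

r = np.random.default_rng(3).standard_normal(10) * 0.01
print(garch_variance(r))
```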
State space models
• Observables depend on a (hidden) time-variant state:

  $\vec{x}_t = C \vec{s}_t + \vec{\epsilon}_t, \qquad \vec{s}_t = A \vec{s}_{t-1} + B \vec{\nu}_t$

• Strong relationship to recurrent (Elman) networks
• Nonlinear version only with additional hidden layers
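Simulating a linear state-space model takes only a few lines (matrices and noise scales below are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(7)
A = np.array([[0.9, 0.1], [0.0, 0.8]])   # state transition
B = 0.1 * np.eye(2)                      # state noise loading
C = np.array([[1.0, 0.5]])               # observation matrix
s = np.zeros(2)
xs = []
for t in range(5):
    s = A @ s + B @ rng.standard_normal(2)             # s_t = A s_{t-1} + B v_t
    xs.append(C @ s + 0.05 * rng.standard_normal(1))   # x_t = C s_t + e_t
print(np.array(xs).ravel())
```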
Symbolic time series
• $x_t \in \{s_i\}$ (the alphabet)
• Examples:
  – DNA
  – Text
  – Quantised time series (e.g. "up" and "down")
• Past information: the past p symbols → probability distribution

  $p\!\left(x_t \mid x_{t-1}, x_{t-2}, \ldots, x_{t-p}\right)$

• Markov chains
• Problem: long substrings are rare
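Estimating such a conditional distribution by counting is exactly an order-p Markov chain; the sketch below (my own, on a toy "up"/"down" sequence) also shows why long contexts quickly become rare.

```python
from collections import defaultdict, Counter

def markov_chain(seq, p=1):
    counts = defaultdict(Counter)
    for t in range(p, len(seq)):
        counts[tuple(seq[t - p:t])][seq[t]] += 1      # count symbol after each context
    return {ctx: {sym: n / sum(c.values()) for sym, n in c.items()}
            for ctx, c in counts.items()}

seq = list("uduuduudd")
print(markov_chain(seq, p=2))   # most length-2 contexts are seen only a few times
```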
Fractal prediction machines
• Similar subsequences are mapped to points close in space
• Clustering = extraction of a stochastic automaton
Relationship to recurrent network
• Network of 2nd order
Other topics
• Filtering: corresponds to ARMA models; NN as nonlinear filters
• Source separation: independent component analysis
• Relationship to stochastic automata
Practical considerations
• Stationarity is an important issue
• Preprocessing (trends, seasonalities)
• N-fold cross-validation done time-wise (the validation set must come after the training set)
• Mean and standard deviation → model selection
[Figure: time-wise split of the series into training, validation, and test segments]
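A sketch of one such time-wise split (my own helper; for N-fold validation the same split is rolled forward along the series rather than shuffled):

```python
import numpy as np

def time_wise_split(x, n_train, n_val):
    # validation always follows training, test follows validation
    return x[:n_train], x[n_train:n_train + n_val], x[n_train + n_val:]

x = np.arange(100.0)
train, val, test = time_wise_split(x, n_train=60, n_val=20)
print(len(train), len(val), len(test))
```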
Summary
• Neural networks are powerful semi-parametric models for nonlinear dependencies
• They can be considered nonlinear extensions of classical time series and signal processing techniques
• Applying semi-parametric models to noise modeling adds another interesting facet
• Models must be treated with care, and much data is necessary