
Document Analysis:
Artificial Neural Networks
Prof. Rolf Ingold, University of Fribourg
Master course, spring semester 2008
Outline

- Biological vs. artificial neural networks
- Artificial neuron model
- Artificial neural networks
- Multi-layer perceptron
- Feed-forward activation
- Learning approach
- Back-propagation method
- Optimal learning
- Illustration of JavaNNS
Biological neurons

Artificial neural networks are inspired by the biological neurons of the central nervous system
- each neuron is connected to many other neurons
- information is transmitted via synapses (an electrochemical process)
- a neuron receives input from its dendrites and transmits output via the axon to synapses
Biological vs artificial networks
                      biological neural network     artificial neural network
processing            chemical                      mathematical function
transmission time     relatively slow               very fast
number of neurons     approx. 10^10                 max. 10^4 to 10^6
number of synapses    approx. 10^13                 up to 10^8
Artificial neuron model

- A neuron receives input signals x_1, ..., x_n
- These signals are multiplied by synaptic weights w_1, ..., w_n, which can be positive or negative
- The activation of the neuron

    a = \sum_i w_i x_i

  is transmitted to a non-linear function f with threshold w_0
- The output signal

    y = f(a - w_0)

  is then propagated to other neurons
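As a minimal sketch of this neuron model (my own illustration, not from the slides; the inputs, weights and threshold below are arbitrary example values):

```python
import numpy as np

def step(u):
    """Simple threshold transfer function: +1 if u >= 0, else -1."""
    return 1.0 if u >= 0 else -1.0

def neuron_output(x, w, w0, f=step):
    """Compute y = f(a - w0) with activation a = sum_i w_i x_i."""
    a = np.dot(w, x)        # weighted sum of the input signals
    return f(a - w0)        # non-linear transfer function with threshold w0

# Example: two inputs with hand-picked weights and threshold
x = np.array([0.5, -1.0])
w = np.array([0.8, 0.3])
print(neuron_output(x, w, w0=0.1))   # -> 1.0
```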
Characteristics of artificial neural networks

Artificial neural networks may vary in several aspects:
- the topology of the network, i.e.
  - the number of neurons, possibly organized in layers or classes
  - how each neuron (of a given layer/class) is connected to its neighbors
- the transfer function used in each neuron

The use and the learning strategy have to be adapted accordingly
Topology of the neural network

The synaptic connections have a major influence on the behavior of the neural network

Two main categories can be considered:
- feed-forward networks, where each neuron propagates its output signal only to neurons that have not yet been activated (no cycles)
  - as a special case, the multi-layer perceptron has a sequence of layers such that neurons of one layer are connected only to neurons of the next layer
- dynamic networks, where neurons are connected without restriction, possibly in a cyclic way
Multi-layer perceptron

The multi-layer perceptron (MLP) has 3 (or more) layers:
- an input layer with one input neuron per feature
- one or several hidden layers, each with an arbitrary number of neurons, connected to the previous layer
- an output layer with one neuron per class, each neuron being connected to the previous layer

Hidden and output layers can be completely or only partly connected

The decision is in favor of the class corresponding to the highest output activation
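As an informal sketch of such a layered structure (my own illustration; the layer sizes are arbitrary example values, and the extra bias column anticipates the augmented feature x_0 = 1 introduced on the next slides):

```python
import numpy as np

rng = np.random.default_rng(0)

# Example sizes: 4 input features, one hidden layer of 6 neurons, 3 output classes
layer_sizes = [4, 6, 3]

# One weight matrix per pair of consecutive layers; the extra column holds
# the bias weight w_0, matching the augmented feature x_0 = 1
weights = [rng.normal(scale=0.1, size=(n_out, n_in + 1))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

# Decision rule: the class with the highest output activation wins,
# e.g. predicted_class = np.argmax(output_activations)
```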
Impact of the hidden layer(s)

Networks with hidden layers can generate arbitrary decision boundaries
- however, the number of hidden layers has no impact: a single hidden layer is already sufficient!
Feed-forward activation

As for the single perceptron, the feature space is augmented with a feature x_0 = 1 to take into account the bias w_0.

Each neuron j of a hidden layer computes an activation

    y_j = f(net_j)   with   net_j = \sum_{i=0}^{d} x_i w_{ji} = \mathbf{w}_j^t \mathbf{x}

Each neuron k of the output layer computes an activation

    z_k = f(net_k)   with   net_k = \sum_{j} y_j w_{kj} = \mathbf{w}_k^t \mathbf{y}
Transfer function

The transfer function f is supposed to be
- monotonically increasing, within the range [-1, +1]
- antisymmetric, i.e. f(-net) = -f(net)
- continuous and differentiable (for back-propagation)

Typical functions are
- simple threshold:

    f(a - w_0) = +1 if a - w_0 \ge 0, and -1 otherwise

- piecewise linear (ramp):

    f(a - w_0) = +1 if a - w_0 \ge T,  (a - w_0)/T if -T < a - w_0 < T,  and -1 if a - w_0 \le -T

- sigmoid:

    f(a - w_0) = (e^{(a - w_0)/T} - 1) / (e^{(a - w_0)/T} + 1) = \tanh((a - w_0) / (2T))
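These three functions are easy to write down explicitly; the following sketch is my own illustration (T plays the role of the scaling parameter in the formulas above):

```python
import numpy as np

def threshold(u):
    """Simple threshold: +1 if u >= 0, else -1."""
    return np.where(u >= 0, 1.0, -1.0)

def ramp(u, T=1.0):
    """Piecewise linear: u/T clipped to the range [-1, +1]."""
    return np.clip(u / T, -1.0, 1.0)

def sigmoid(u, T=1.0):
    """Antisymmetric sigmoid: (e^(u/T) - 1) / (e^(u/T) + 1) = tanh(u / (2T))."""
    return np.tanh(u / (2.0 * T))

u = np.linspace(-3.0, 3.0, 7)   # u stands for a - w_0
print(threshold(u))
print(ramp(u))
print(sigmoid(u))
```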
Learning in a multi-layer perceptron

Learning consists of setting the weights w, based on training samples

The method is called back-propagation, because the training error is propagated recursively from the output layer back to the hidden and input layers

The training error on a given pattern is defined as the squared difference between the desired output and the observed output, i.e.

    J(\mathbf{w}) = \frac{1}{2} \| \mathbf{t} - \mathbf{z} \|^2 = \frac{1}{2} \sum_{k=1}^{C} (t_k - z_k)^2

In practice, the desired output is +1 for the correct class and -1 (or sometimes 0) for all other classes
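A small illustration of this per-pattern error (my own; the target and output vectors are made-up example values):

```python
import numpy as np

def pattern_error(t, z):
    """J(w) = 1/2 * sum_k (t_k - z_k)^2 for one training pattern."""
    t, z = np.asarray(t, dtype=float), np.asarray(z, dtype=float)
    return 0.5 * np.sum((t - z) ** 2)

t = np.array([-1.0, +1.0, -1.0])    # desired output: +1 for the correct class, -1 otherwise
z = np.array([-0.8, +0.6, -0.9])    # observed network output
print(pattern_error(t, z))           # 0.5 * (0.04 + 0.16 + 0.01) = 0.105
```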
Back-propagation of errors

The weight vectors are changed in the direction opposite to their gradient

    \Delta \mathbf{w} = -\eta \frac{\partial J}{\partial \mathbf{w}}

where \eta is the learning rate
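In code this is the generic gradient-descent step (a sketch of mine; grad_J stands for whatever routine returns ∂J/∂w for the current pattern):

```python
import numpy as np

def gradient_step(w, grad_J, eta=0.1):
    """Apply Delta w = -eta * dJ/dw and return the updated weights."""
    return w - eta * grad_J

# Hypothetical example: one weight matrix and its gradient
w = np.zeros((3, 5))
grad_J = np.ones((3, 5))
w = gradient_step(w, grad_J, eta=0.05)   # every weight moves by -0.05
```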
Error correction on the output layer

Since the error does not directly depend upon w_kj, we apply the chain rule

    \frac{\partial J}{\partial w_{kj}} = \frac{\partial J}{\partial net_k} \frac{\partial net_k}{\partial w_{kj}} = -\delta_k \frac{\partial net_k}{\partial w_{kj}}

with

    \delta_k = -\frac{\partial J}{\partial net_k} = -\frac{\partial J}{\partial z_k} \frac{\partial z_k}{\partial net_k} = (t_k - z_k) f'(net_k)

and

    \frac{\partial net_k}{\partial w_{kj}} = y_j

Thus the update rule becomes

    \Delta w_{kj} = \eta (t_k - z_k) f'(net_k) y_j = \eta \delta_k y_j
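A sketch of this output-layer update (my own illustration; f_prime is the derivative of the transfer function, e.g. f_prime = lambda net: 1 - np.tanh(net)**2 for tanh units, and bias handling is omitted for brevity):

```python
import numpy as np

def output_layer_update(W_out, y_hidden, z, t, net_k, f_prime, eta=0.1):
    """Delta w_kj = eta * delta_k * y_j  with  delta_k = (t_k - z_k) * f'(net_k)."""
    delta_k = (t - z) * f_prime(net_k)           # sensitivity of each output neuron
    W_out += eta * np.outer(delta_k, y_hidden)   # update all weights w_kj at once
    return W_out, delta_k
```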
Error correction on the hidden layer(s)

Applying the chain rule

    \frac{\partial J}{\partial w_{ji}} = \frac{\partial J}{\partial y_j} \frac{\partial y_j}{\partial w_{ji}}

with

    \frac{\partial y_j}{\partial w_{ji}} = \frac{\partial y_j}{\partial net_j} \frac{\partial net_j}{\partial w_{ji}} = f'(net_j) \, x_i

and

    \frac{\partial J}{\partial y_j} = \frac{\partial}{\partial y_j} \left[ \frac{1}{2} \sum_{k=1}^{c} (t_k - z_k)^2 \right] = -\sum_{k=1}^{c} (t_k - z_k) \frac{\partial z_k}{\partial y_j}

where

    \frac{\partial z_k}{\partial y_j} = \frac{\partial z_k}{\partial net_k} \frac{\partial net_k}{\partial y_j} = f'(net_k) \, w_{kj}

so that

    \frac{\partial J}{\partial y_j} = -\sum_{k=1}^{c} (t_k - z_k) f'(net_k) w_{kj} = -\sum_{k=1}^{c} w_{kj} \delta_k

Finally the update rule becomes

    \Delta w_{ji} = \eta \left[ \sum_{k=1}^{c} w_{kj} \delta_k \right] f'(net_j) \, x_i = \eta \, \delta_j x_i
Learning algorithm

The learning process starts with randomly initialized weights

The weights are adjusted iteratively, pattern by pattern, from the training set:
- the pattern is presented to the network and the feed-forward activation is computed
- the output error is computed
- the error is used to update the weights in reverse order, from the output layer back to the hidden layers

The process is repeated until a quality criterion is reached

[Figure: back-propagation of the deltas δ[q+1] through the weights w[1], ..., w[n] and the derivative f'(z)]
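Putting the pieces together, a compact sketch of this training loop for a one-hidden-layer MLP (my own illustration with tanh units and no bias terms; all names and hyper-parameters are example choices, not from the slides):

```python
import numpy as np

def train_mlp(X, T, n_hidden=6, eta=0.1, epochs=500, seed=0):
    """Back-propagation training of a 1-hidden-layer MLP (tanh units, biases omitted)."""
    rng = np.random.default_rng(seed)
    n_in, n_out = X.shape[1], T.shape[1]
    W1 = rng.normal(scale=0.1, size=(n_hidden, n_in))    # input -> hidden weights w_ji
    W2 = rng.normal(scale=0.1, size=(n_out, n_hidden))   # hidden -> output weights w_kj

    for _ in range(epochs):
        for x, t in zip(X, T):                           # one training pattern at a time
            net_j = W1 @ x;  y = np.tanh(net_j)          # feed-forward, hidden layer
            net_k = W2 @ y;  z = np.tanh(net_k)          # feed-forward, output layer
            delta_k = (t - z) * (1 - z ** 2)             # output deltas, f'(net) = 1 - tanh^2
            delta_j = (1 - y ** 2) * (W2.T @ delta_k)    # hidden deltas (back-propagated)
            W2 += eta * np.outer(delta_k, y)             # Delta w_kj = eta * delta_k * y_j
            W1 += eta * np.outer(delta_j, x)             # Delta w_ji = eta * delta_j * x_i
    return W1, W2

# Toy usage: learn the sign of x1 + x2 on random 2-D data
rng = np.random.default_rng(1)
X = rng.normal(size=(20, 2))
T = np.where(X[:, :1] + X[:, 1:] > 0, 1.0, -1.0)
W1, W2 = train_mlp(X, T, n_hidden=4)
```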
Risk of overfitting

Minimizing the global error over all training samples tends to produce overfitting

To avoid overfitting, the best strategy is to minimize the global error on a validation set which is independent of the training set
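In practice this is typically realized as early stopping on the validation error; a hedged sketch of mine (train_one_epoch and validation_error stand for hypothetical routines built from the training code above):

```python
def train_with_early_stopping(train_one_epoch, validation_error,
                              max_epochs=1000, patience=10):
    """Stop when the validation error has not improved for `patience` epochs."""
    best_err, best_epoch, since_best = float("inf"), 0, 0
    for epoch in range(max_epochs):
        train_one_epoch()            # one back-propagation pass over the training set
        err = validation_error()     # global error on the independent validation set
        if err < best_err:
            best_err, best_epoch, since_best = err, epoch, 0
            # (one would also keep a copy of the current weights here)
        else:
            since_best += 1
            if since_best >= patience:
                break                 # validation error no longer decreasing
    return best_epoch, best_err
```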
JavaNNS

JavaNNS is an interactive software framework for experimenting with artificial neural networks
- it has been developed at the University of Tübingen

It supports the following features:
- multiple topologies (MLP, dynamic networks, ...)
- different transfer functions
- different learning strategies
- network pruning
- ...
Font recognition with JavaNNS

[Figure: original neural network with 9 hidden units]
Pruned neural network for font recognition

[Figure: neural network obtained after pruning]