Document Analysis:
Artificial Neural Networks
Prof. Rolf Ingold, University of Fribourg
Master course, spring semester 2008
Outline
Biological vs. artificial neural networks
Artificial neuron model
Artificial neural networks
Multi-layer perceptron
Feed-forward activation
Learning approach
Back-propagation method
Optimal learning
Illustration of JavaNNS
Biological neurons
Artificial neural networks are inspired by biological neurons of the central nervous system
each neuron is connected to many other neurons
information is transmitted via synapses (an electrochemical process)
a neuron receives input from its dendrites and transmits output via the axon to the synapses
Biological vs artificial networks
                      biological neural network     artificial neural network
processing            chemical                      mathematical function
transmission time     relatively slow               very fast
number of neurons     approx. 10^10                 max. 10^4 to 10^6
number of synapses    approx. 10^13                 up to 10^8
Artificial neuron model
A neuron receives input signals x1, ..., xn
These signals are multiplied by synaptic weights w1, ..., wn, which
can be positive or negative
The activation of the neuron, a = Σ_i w_i x_i, is transmitted to a non-linear function f with threshold w0
The output signal y = f(a - w0) is then propagated to other neurons
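A minimal sketch of this neuron model in Python (not part of the original slides); tanh is assumed as the non-linear function f and all names are illustrative:

```python
import numpy as np

def neuron_output(x, w, w0):
    """y = f(a - w0) with activation a = sum_i w_i * x_i (f assumed to be tanh)."""
    a = np.dot(w, x)           # activation a = sum_i w_i x_i
    return np.tanh(a - w0)     # non-linear transfer function with threshold w0

# tiny usage example
x = np.array([0.5, -1.0, 2.0])   # input signals x_1 .. x_n
w = np.array([0.8, 0.2, -0.4])   # synaptic weights (positive or negative)
print(neuron_output(x, w, w0=0.1))
```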
Characteristics of artificial neural networks
Artificial neural networks may vary in different aspects
the topology of the network, i.e.
the number of neurons, possibly organized in layers or classes
how each neuron (of a given layer/class) is connected to its neighbors
the transfer function used in each neuron
The use and the learning strategy have to be adapted accordingly
Topology of the neural network
The synaptic connections have a major influence on the behavior of
the neural network
Two main categories can be considered
feed-forward networks, where each neuron propagates its output signal only to neurons that have not yet been activated
as a special case, the multi-layer perceptron has a sequence of layers such that neurons from one layer are connected only to neurons of the next layer
dynamic networks, where neurons are connected without restrictions, possibly in a cyclic way
Multi-layer perceptron
The multi-layer perceptron (MLP) has 3 (or more) layers
an input layer with one input neuron per feature
one or several hidden layers, each having an arbitrary number of neurons connected to the previous layer
an output layer with one neuron per class, each neuron being connected to the previous layer
Hidden and output layers can be completely or only partly connected
The decision is in favor of the class corresponding to the highest output activation
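As a small illustration (an assumption, not from the slides), this decision rule can be written as an argmax over the output activations:

```python
import numpy as np

def decide(output_activations, class_labels):
    """Decide in favor of the class whose output neuron has the highest activation."""
    return class_labels[int(np.argmax(output_activations))]

print(decide(np.array([-0.3, 0.9, 0.1]), ["A", "B", "C"]))  # prints B
```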
Impact of the hidden layer(s)
Networks with hidden layers can generate arbitrary decision boundaries (given enough hidden units)
however, the number of hidden layers has no impact on this expressive power!
Feed-forward activation
As for the single perceptron, the feature space is augmented with a feature x0 = 1 to take into account the bias w0.
Each neuron j of a hidden layer computes an activation
    y_j = f(net_j)   with   net_j = Σ_{i=0..d} x_i w_ji = w_j^t x
Each neuron k of the output layer computes an activation
    z_k = f(net_k)   with   net_k = Σ_{j=0..n_H} y_j w_kj = w_k^t y
(where n_H is the number of hidden neurons)
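A sketch of this feed-forward pass in Python/NumPy, with the biases handled through the augmented components x0 = 1 and y0 = 1 and tanh assumed as f; all names and shapes are illustrative:

```python
import numpy as np

def feed_forward(x, W_hidden, W_output, f=np.tanh):
    """Feed-forward activation through one hidden layer and the output layer.

    x        : feature vector with d components (without the bias component)
    W_hidden : (n_H, d + 1) matrix of weights w_ji, column 0 holding the bias
    W_output : (c, n_H + 1) matrix of weights w_kj, column 0 holding the bias
    """
    x_aug = np.concatenate(([1.0], x))    # augmented input, x_0 = 1
    net_j = W_hidden @ x_aug              # net_j = sum_i w_ji x_i
    y = f(net_j)                          # hidden activations y_j = f(net_j)
    y_aug = np.concatenate(([1.0], y))    # augmented hidden vector, y_0 = 1
    net_k = W_output @ y_aug              # net_k = sum_j w_kj y_j
    z = f(net_k)                          # output activations z_k = f(net_k)
    return y, z

# tiny usage example with random weights
rng = np.random.default_rng(0)
W_hidden = rng.normal(scale=0.5, size=(4, 4))   # 4 hidden units, 3 features + bias
W_output = rng.normal(scale=0.5, size=(2, 5))   # 2 classes, 4 hidden units + bias
print(feed_forward(np.array([0.2, -0.5, 1.0]), W_hidden, W_output)[1])
```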
Transfer function
The transfer function f is supposed to be
monotonically increasing, within the range [-1, +1]
antisymmetric, i.e. f(-net) = -f(net)
continuous and differentiable (for back-propagation)
Typical functions are
simple threshold
    f(a - w0) = +1 if a - w0 ≥ 0, -1 otherwise
piecewise linear
    f(a - w0) = +1 if a - w0 > T,  (a - w0)/T if -T ≤ a - w0 ≤ T,  -1 if a - w0 < -T
sigmoid
    f(a - w0) = (e^((a - w0)/T) - 1) / (e^((a - w0)/T) + 1) = tanh((a - w0) / (2T))
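For concreteness, the three functions above written out in Python (a sketch; u stands for a - w0 and T is the parameter from the slide):

```python
import numpy as np

def threshold(u):
    """Simple threshold: +1 if u >= 0, -1 otherwise."""
    return np.where(u >= 0, 1.0, -1.0)

def piecewise_linear(u, T=1.0):
    """+1 above T, -1 below -T, and u / T in between."""
    return np.clip(u / T, -1.0, 1.0)

def sigmoid(u, T=1.0):
    """(e^(u/T) - 1) / (e^(u/T) + 1), which equals tanh(u / (2 T))."""
    return np.tanh(u / (2.0 * T))

u = np.linspace(-3.0, 3.0, 7)
print(threshold(u), piecewise_linear(u), sigmoid(u), sep="\n")
```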
Learning in a multi-layer perceptron
Learning consists of setting the weights w, based on training
samples
The method is called back-propagation, because the training error
is propagated recursively from the output layer back to the hidden
and input layers
The training error on a given pattern is defined as the squared difference between the desired output and the observed output, i.e.
    J(w) = ½ ||t - z||² = ½ Σ_{k=1..C} (t_k - z_k)²
In practice, the desired output is +1 for the correct class and -1 (or sometimes 0) for all other classes
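A small sketch of this per-pattern error in Python, using the +1 / -1 target coding mentioned above (names are illustrative):

```python
import numpy as np

def training_error(z, correct_class):
    """J(w) = 1/2 * sum_k (t_k - z_k)^2, with t_k = +1 for the correct class and -1 otherwise."""
    t = -np.ones_like(z)
    t[correct_class] = 1.0
    return 0.5 * np.sum((t - z) ** 2)

print(training_error(np.array([0.8, -0.6, 0.1]), correct_class=0))
```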
Back-propagation of errors
The weight vectors are changed in the direction of the negative gradient of the error
    Δw = -η ∂J/∂w
where η is the learning rate
Error correction on the output layer
Since the error does not directly depend upon w_kj, we apply the chain rule
    ∂J/∂w_kj = (∂J/∂net_k) (∂net_k/∂w_kj) = -δ_k (∂net_k/∂w_kj)
with
    δ_k = -∂J/∂net_k = -(∂J/∂z_k)(∂z_k/∂net_k) = (t_k - z_k) f'(net_k)
and
    ∂net_k/∂w_kj = y_j
Thus the update rule becomes
    Δw_kj = η (t_k - z_k) f'(net_k) y_j = η δ_k y_j
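A sketch of this output-layer rule in Python, assuming tanh as f (so f'(u) = 1 - tanh(u)²); the function names are illustrative:

```python
import numpy as np

def output_deltas(t, z, net_k):
    """delta_k = (t_k - z_k) * f'(net_k), with f = tanh so f'(u) = 1 - tanh(u)^2."""
    return (t - z) * (1.0 - np.tanh(net_k) ** 2)

def update_output_weights(W_output, delta_k, y_aug, eta=0.1):
    """Apply Delta w_kj = eta * delta_k * y_j (y_aug includes the bias component y_0 = 1)."""
    return W_output + eta * np.outer(delta_k, y_aug)
```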
Error correction on the hidden layer(s)
Applying the following chain rule
    ∂J/∂w_ji = (∂J/∂y_j) (∂y_j/∂w_ji)
with
    ∂y_j/∂w_ji = (∂y_j/∂net_j)(∂net_j/∂w_ji) = f'(net_j) x_i
and
    ∂J/∂y_j = ∂/∂y_j [ ½ Σ_{k=1..c} (t_k - z_k)² ] = -Σ_{k=1..c} (t_k - z_k) ∂z_k/∂y_j
with
    ∂z_k/∂y_j = (∂z_k/∂net_k)(∂net_k/∂y_j) = f'(net_k) w_kj
so that
    ∂J/∂y_j = -Σ_{k=1..c} (t_k - z_k) f'(net_k) w_kj = -Σ_{k=1..c} w_kj δ_k
Finally the update rule becomes
    Δw_ji = η [ Σ_{k=1..c} w_kj δ_k ] f'(net_j) x_i = η δ_j x_i
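Putting both rules together, a sketch of one back-propagation step on a single pattern (tanh transfer function, one hidden layer; everything here is an illustrative assumption, not JavaNNS code):

```python
import numpy as np

def backprop_step(x, t, W_hidden, W_output, eta=0.1):
    """One feed-forward pass followed by the two weight updates derived above."""
    f = np.tanh
    f_prime = lambda u: 1.0 - np.tanh(u) ** 2

    # feed-forward activation (x_0 = 1 and y_0 = 1 carry the biases)
    x_aug = np.concatenate(([1.0], x))
    net_j = W_hidden @ x_aug
    y_aug = np.concatenate(([1.0], f(net_j)))
    net_k = W_output @ y_aug
    z = f(net_k)

    # output layer: delta_k = (t_k - z_k) * f'(net_k)
    delta_k = (t - z) * f_prime(net_k)
    # hidden layer: delta_j = f'(net_j) * sum_k w_kj delta_k (bias column excluded)
    delta_j = f_prime(net_j) * (W_output[:, 1:].T @ delta_k)

    # weight updates Delta w = eta * delta * input
    W_output = W_output + eta * np.outer(delta_k, y_aug)
    W_hidden = W_hidden + eta * np.outer(delta_j, x_aug)
    return W_hidden, W_output, 0.5 * np.sum((t - z) ** 2)
```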
Learning algorithm
The learning process starts with randomly initialized weights
The weights are adjusted iteratively by presenting patterns from the training set
the pattern is presented to the network and the feed-forward activation is computed
the output error is computed
the error is used to update the weights, in reverse order, from the output layer back to the hidden layers
The process is repeated until a quality criterion is reached
[Figure: the error δ[q] of a hidden unit is computed from the next layer's errors δ[q+1][1..n], the weights w[1..n], and f'(z).]
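A sketch of the corresponding training loop, assuming the backprop_step helper sketched above and a simple mean-error stopping criterion (both are assumptions):

```python
import numpy as np

def train(X, T, n_hidden, eta=0.1, max_epochs=1000, target_error=0.05, seed=0):
    """Adjust the weights iteratively until the mean training error is small enough."""
    rng = np.random.default_rng(seed)
    W_hidden = rng.normal(scale=0.1, size=(n_hidden, X.shape[1] + 1))  # random initialization
    W_output = rng.normal(scale=0.1, size=(T.shape[1], n_hidden + 1))

    for epoch in range(max_epochs):
        errors = []
        for x, t in zip(X, T):   # present each training pattern
            W_hidden, W_output, err = backprop_step(x, t, W_hidden, W_output, eta)
            errors.append(err)
        if np.mean(errors) < target_error:   # quality criterion reached
            break
    return W_hidden, W_output
```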
Risk of overfitting
Minimizing the global error over all training samples tends to produce overfitting
To avoid overfitting, the best strategy is to minimize the global error on a validation set which is independent of the training set
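A minimal sketch of this strategy (early stopping on a validation set); train_step and mean_error are hypothetical helpers passed in by the caller:

```python
def train_with_validation(train_step, mean_error, val_set, max_epochs=1000, patience=10):
    """Keep the weights that minimize the error on an independent validation set."""
    best_val, best_weights, since_best = float("inf"), None, 0
    for epoch in range(max_epochs):
        weights = train_step()                 # one pass over the training set
        val = mean_error(weights, val_set)     # global error on the validation set
        if val < best_val:
            best_val, best_weights, since_best = val, weights, 0
        else:
            since_best += 1
            if since_best >= patience:         # validation error stopped improving
                break
    return best_weights
```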
JavaNNS
JavaNNS is an interactive software framework for experimenting with artificial neural networks
it has been developed at the University of Tübingen
It supports the following features
multiple topologies (MLP, dynamic networks, ...)
different transfer functions
different learning strategies
network pruning
...
Font recognition with JavaNNS
Original neural network with 9 hidden units
Pruned neural network for font recognition
Neural network obtained after pruning