Transcript Talk 8

Artificial Neural Network Paradigms
Marc Pomplun
Department of Computer Science
University of Massachusetts at Boston
E-mail: [email protected]
Homepage: http://www.cs.umb.edu/~marc/
Artificial Neural Network Paradigms
Overview:
The Backpropagation Network (BPN)
Supervised Learning in the BPN
The Self-Organizing Map (SOM)
Unsupervised Learning in the SOM
Instantaneous Learning: The Hopfield
Network
The Backpropagation Network
The backpropagation network (BPN) is the most
popular type of ANN for applications such as
classification or function approximation.
Like other networks using supervised learning, the
BPN is not biologically plausible.
The structure of the network is identical to the one we
discussed before:
• Three (sometimes more) layers of neurons,
• Only feedforward processing:
input layer → hidden layer → output layer,
• Sigmoid activation functions
The Backpropagation Network
BPN units and activation functions:
[Network diagram: the input vector x feeds input units I1, I2, …, II; these connect forward to hidden units H1, H2, H3, …, HJ with activation function f(net^h); these connect forward to output units O1, …, OK with activation function f(net^o), producing the output vector y.]
Supervised Learning in the BPN
Before the learning process starts, all weights
(synapses) in the network are initialized with
pseudorandom numbers.
We also have to provide a set of training patterns
(exemplars). They can be described as a set of
ordered vector pairs {(x1, y1), (x2, y2), …, (xP, yP)}.
Then we can start the backpropagation learning
algorithm.
This algorithm iteratively minimizes the network’s
error by finding the gradient of the error surface in
weight-space and adjusting the weights in the
opposite direction (gradient-descent technique).
Supervised Learning in the BPN
Gradient-descent example: Finding the absolute
minimum of a one-dimensional error function f(x):
[Figure: curve of f(x) with its slope f'(x0) at the starting point x0; one gradient-descent step moves to x1 = x0 - f'(x0).]
Repeat this iteratively until for some xi, f’(xi) is
sufficiently close to 0.
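As a concrete illustration of this iteration, here is a minimal Python sketch. The example function f(x) = (x - 3)^2, the step-size (learning-rate) factor eta, and the stopping tolerance are illustrative assumptions, not part of the slide (the figure above takes a full step of size f'(x0)).

```python
# Minimal sketch of 1-D gradient descent as described above.
# The target function, learning rate, and tolerance are illustrative choices.

def gradient_descent_1d(f_prime, x0, eta=0.1, tolerance=1e-6, max_steps=10000):
    """Iterate x <- x - eta * f'(x) until the slope is sufficiently close to 0."""
    x = x0
    for _ in range(max_steps):
        slope = f_prime(x)
        if abs(slope) < tolerance:
            break
        x -= eta * slope
    return x

if __name__ == "__main__":
    # f(x) = (x - 3)^2 has its minimum at x = 3, and f'(x) = 2(x - 3).
    print(gradient_descent_1d(lambda x: 2.0 * (x - 3.0), x0=0.0))  # ~3.0
```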
Supervised Learning in the BPN
Gradients of two-dimensional functions:
The two-dimensional function in the left diagram is represented by contour
lines in the right diagram, where arrows indicate the gradient of the function
at different locations. Obviously, the gradient is always pointing in the
direction of the steepest increase of the function. In order to find the
function’s minimum, we should always move against the gradient.
Supervised Learning in the BPN
In the BPN, learning is performed as follows:
1. Randomly select a vector pair (xp, yp) from the
training set and call it (x, y).
2. Use x as input to the BPN and successively
compute the outputs of all neurons in the network
(bottom-up) until you get the network output o.
3. Compute the error $\delta^o_{pk}$ for pattern p across all
K output-layer units by using the formula:
$$\delta^o_{pk} = (y_k - o_k)\, f'(net^o_k)$$
Supervised Learning in the BPN
4. Compute the error $\delta^h_{pj}$ for all J hidden-layer units
by using the formula:
$$\delta^h_{pj} = f'(net^h_j) \sum_{k=1}^{K} \delta^o_{pk}\, w_{kj}$$
5. Update the connection-weight values to the hidden
layer by using the following equation:
$$w_{ji}(t+1) = w_{ji}(t) + \eta\, \delta^h_{pj}\, x_i$$
Supervised Learning in the BPN
6. Update the connection-weight values to the output
layer by using the following equation:
$$w_{kj}(t+1) = w_{kj}(t) + \eta\, \delta^o_{pk}\, f(net^h_j)$$
Repeat steps 1 to 6 for all vector pairs in the training
set; this is called a training epoch.
Run as many epochs as required until the network
error E falls below a threshold $\varepsilon$:
$$E = \sum_{p=1}^{P} \sum_{k=1}^{K} \left(\delta^o_{pk}\right)^2$$
Supervised Learning in the BPN
The only thing that we need to know before we can
start our network is the derivative of our sigmoid
function, for example, $f'(net_k)$ for the output neurons:
$$f(net_k) = \frac{1}{1 + e^{-net_k}}$$
$$f'(net_k) = \frac{\partial f(net_k)}{\partial net_k} = o_k (1 - o_k)$$
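The following Python sketch puts steps 1 to 6 together for a small three-layer network of sigmoid units. The learning rate, the number of hidden units, the XOR toy training set, and the bias handling (a constant input of 1 appended to the input and hidden vectors) are illustrative assumptions, not details specified on the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))

def train_bpn(X, Y, n_hidden=4, eta=0.5, epochs=5000):
    n_in, n_out = X.shape[1], Y.shape[1]
    W_h = rng.uniform(-0.5, 0.5, (n_hidden, n_in + 1))   # weights w_ji into hidden layer (+ bias)
    W_o = rng.uniform(-0.5, 0.5, (n_out, n_hidden + 1))  # weights w_kj into output layer (+ bias)
    for _ in range(epochs):                               # one pass over all pairs = one epoch
        for p in rng.permutation(len(X)):                 # step 1: pick a training pair (x, y)
            x = np.append(X[p], 1.0)                      # constant bias input (assumption)
            y = Y[p]
            h = np.append(sigmoid(W_h @ x), 1.0)          # step 2: hidden outputs f(net^h_j)
            o = sigmoid(W_o @ h)                          #         network output o_k
            delta_o = (y - o) * o * (1.0 - o)             # step 3: delta^o_pk with f' = o(1 - o)
            delta_h = h * (1.0 - h) * (W_o.T @ delta_o)   # step 4: delta^h_pj
            W_h += eta * np.outer(delta_h[:-1], x)        # step 5: update hidden-layer weights
            W_o += eta * np.outer(delta_o, h)             # step 6: update output-layer weights
    return W_h, W_o

if __name__ == "__main__":
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    Y = np.array([[0], [1], [1], [0]], dtype=float)       # XOR as a toy training set
    W_h, W_o = train_bpn(X, Y)
    for x, y in zip(X, Y):
        h = np.append(sigmoid(W_h @ np.append(x, 1.0)), 1.0)
        print(x, y, sigmoid(W_o @ h).round(2))            # typically close to 0, 1, 1, 0
```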
Supervised Learning in the BPN
Now our BPN is ready to go!
If we choose the type and number of neurons in our
network appropriately, after training the network
should show the following behavior:
• If we input any of the training vectors, the network
should yield the expected output vector (with some
margin of error).
• If we input a vector that the network has never
“seen” before, it should be able to generalize and
yield a plausible output vector based on its
knowledge about similar input vectors.
Self-Organizing Maps (Kohonen Maps)
In the BPN, we used supervised learning.
This is not biologically plausible: In a biological
system, there is no external “teacher” who
manipulates the network’s weights from outside the
network.
Biologically more adequate: unsupervised learning.
We will study Self-Organizing Maps (SOMs) as an
example of unsupervised learning (Kohonen, 1980).
Self-Organizing Maps (Kohonen Maps)
In the human cortex, multi-dimensional sensory input
spaces (e.g., visual input, tactile input) are
represented by two-dimensional maps.
The projection from sensory inputs onto such maps is
topology conserving.
This means that neighboring areas in these maps
represent neighboring areas in the sensory input
space.
For example, neighboring areas in the sensory cortex
are responsible for the arm and hand regions.
Self-Organizing Maps (Kohonen Maps)
Such a topology-conserving mapping can be achieved
by SOMs:
• Two layers: input layer and output (map) layer
• Input and output layers are completely connected.
• Output neurons are interconnected within a defined
neighborhood.
• A topology (neighborhood relation) is defined on
the output layer.
Self-Organizing Maps (Kohonen Maps)
Common output-layer structures:
[Figure: two common output-layer structures. Left: a one-dimensional chain of completely interconnected neurons. Right: a two-dimensional grid (connections omitted, only neighborhood relations shown in green). In both, the neighborhood of a neuron i is highlighted.]
Self-Organizing Maps (Kohonen Maps)
A neighborhood function $\varphi(i, k)$ indicates how closely
neurons i and k in the output layer are connected to
each other.
Usually, a Gaussian function of the distance between
the two neurons in the layer is used:
[Figure: a Gaussian curve centered at the position of neuron i, evaluated at the position of neuron k.]
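The slide shows this Gaussian only as a curve; a common concrete form (an assumption here, not taken from the slide) is
$$\varphi(i, k) = \exp\!\left(-\frac{\lVert r_i - r_k \rVert^2}{2\sigma^2}\right),$$
where $r_i$ and $r_k$ are the positions of neurons i and k in the output layer and the width $\sigma$ is narrowed during training (see step (5) below).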
Unsupervised Learning in SOMs
For n-dimensional input space and m output neurons:
(1) Choose random weight vector wi for neuron i, i = 1, ..., m
(2) Choose random input x
(3) Determine winner neuron k:
$\lVert w_k - x \rVert = \min_i \lVert w_i - x \rVert$ (Euclidean distance)
(4) Update all weight vectors of all neurons i in the
neighborhood of neuron k: $w_i := w_i + \eta \cdot \varphi(i, k) \cdot (x - w_i)$
($w_i$ is shifted towards x)
(5) If the convergence criterion is met, STOP.
Otherwise, narrow the neighborhood function $\varphi$ and the learning
parameter $\eta$ and go to (2).
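As a concrete illustration of steps (1) to (5), here is a minimal Python sketch for a one-dimensional chain of output neurons learning a two-dimensional input space. The uniform (square) input distribution, the Gaussian neighborhood, and the decay schedules for eta and sigma are illustrative assumptions, not details taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_som(n_neurons=50, n_steps=25000, eta0=0.5, sigma0=10.0):
    positions = np.arange(n_neurons)                        # neuron indices along the chain
    weights = rng.random((n_neurons, 2))                    # step 1: random weight vectors w_i
    for t in range(n_steps):
        x = rng.random(2)                                   # step 2: random input x (unit square)
        k = np.argmin(np.linalg.norm(weights - x, axis=1))  # step 3: winner neuron k
        # steps 4/5: Gaussian neighborhood and learning rate, both narrowed over time
        decay = np.exp(-t / (n_steps / 4))
        eta = eta0 * decay
        sigma = max(sigma0 * decay, 0.5)
        phi = np.exp(-((positions - k) ** 2) / (2.0 * sigma ** 2))
        weights += eta * phi[:, None] * (x - weights)       # each w_i is shifted towards x
    return weights

if __name__ == "__main__":
    w = train_som()
    print(w[:5].round(2))  # first few weight vectors after training
```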
Unsupervised Learning in SOMs
Example I: Learning a one-dimensional representation
of a two-dimensional (triangular) input space:
[Figure: snapshots of the map after 0, 20, 100, 1000, 10000, and 25000 learning steps.]
Unsupervised Learning in SOMs
Example II: Learning a two-dimensional representation
of a two-dimensional (square) input space:
Unsupervised Learning in SOMs
Example III: Learning a two-dimensional mapping of
texture images
The Hopfield Network
The Hopfield model is a single-layered recurrent
network.
It is usually initialized with appropriate weights
instead of being trained.
The network structure looks as follows:
[Figure: a single layer of units X1, X2, …, XN with recurrent connections among all units.]
The Hopfield Network
We will focus on the discrete Hopfield model,
because its mathematical description is more
straightforward.
In the discrete model, the output of each neuron is
either 1 or –1.
In its simplest form, the output function is the sign
function, which yields 1 for arguments ≥ 0 and –1
otherwise.
The Hopfield Network
For input-output pairs (x1, y1), (x2, y2), …, (xP, yP), we
can initialize the weights in the following way (like
associative memory):
$$W = \frac{1}{N} \sum_{p=1}^{P} y_p x_p^T$$
This is identical to the following formula:
$$w_{ij} = \frac{1}{N} \sum_{p=1}^{P} y_p^{(i)} x_p^{(j)}$$
where $x_p^{(j)}$ is the j-th component of vector $x_p$, and
$y_p^{(i)}$ is the i-th component of vector $y_p$.
The Hopfield Network
In the discrete version of the model, each component
of an input or output vector can only assume the
values 1 or –1.
The output of a neuron i at time t is then computed
according to the following formula:
$$o_i(t) = \operatorname{sgn}\!\left( \sum_{j=1}^{N} w_{ij}\, o_j(t-1) \right)$$
This recursion can be performed over and over again.
In some network variants, external input is added to
the internal, recurrent one.
The Hopfield Network
Usually, the vectors xp are not orthonormal, so it is
not guaranteed that whenever we input some pattern
xp, the output will be yp, but it will be a pattern similar
to yp.
Since the Hopfield network is recurrent, its behavior
depends on its previous state and in the general
case is difficult to predict.
However, what happens if we initialize the weights
with a set of patterns so that each pattern is
associated with itself, (x1, x1), (x2, x2), …, (xP, xP)?
The Hopfield Network
This initialization is performed according to the
following equation:
$$w_{ij} = \frac{1}{N} \sum_{p=1}^{P} x_p^{(i)} x_p^{(j)}$$
You see that the weight matrix is symmetrical, i.e.,
wij = wji.
We also demand that wii = 0, in which case the
network shows an interesting behavior.
It can be mathematically proven that under these
conditions the network will reach a stable activation
state within a finite number of iterations.
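A minimal Python sketch of this autoassociative setup, combining the outer-product weight initialization (with zero diagonal) from above and the repeated sign updates from the previous slides. The bipolar toy patterns, the noise level, and the synchronous update of all units are illustrative assumptions (the model can also be run with asynchronous updates).

```python
import numpy as np

def train_hopfield(patterns):
    """w_ij = (1/N) * sum_p x_p(i) * x_p(j), with w_ii = 0."""
    n = patterns.shape[1]
    W = (patterns.T @ patterns) / n
    np.fill_diagonal(W, 0.0)
    return W

def recall(W, x, max_iters=20):
    """Iterate o_i(t) = sgn(sum_j w_ij * o_j(t-1)) until the state stops changing."""
    o = x.copy()
    for _ in range(max_iters):
        new_o = np.where(W @ o >= 0.0, 1.0, -1.0)   # sign function with sgn(0) = 1
        if np.array_equal(new_o, o):
            break
        o = new_o
    return o

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    patterns = rng.choice([-1.0, 1.0], size=(3, 100))    # three random bipolar patterns
    W = train_hopfield(patterns)
    noisy = patterns[0].copy()
    noisy[rng.random(100) < 0.2] *= -1.0                 # flip about 20% of the components
    print(np.array_equal(recall(W, noisy), patterns[0])) # usually True for few stored patterns
```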
The Hopfield Network
And what does such a stable state look like?
The network associates input patterns with
themselves, which means that in each iteration, the
activation pattern will be drawn towards one of those
patterns.
After converging, the network will most likely present
one of the patterns that it was initialized with.
Therefore, Hopfield networks can be used to restore
incomplete or noisy input patterns.
The Hopfield Network
Example: Image reconstruction (Ritter, Schulten,
Martinetz 1990)
A 20×20 discrete Hopfield network was trained with 20 input
patterns, including the one shown in the left figure and 19
random patterns such as the one on the right.
The Hopfield Network
After providing only one fourth of the “face” image as
initial input, the network is able to perfectly reconstruct
that image within only two iterations.
The Hopfield Network
Adding noise by changing each pixel with a probability
p = 0.3 does not impair the network’s performance.
After two steps the image is perfectly reconstructed.
The Hopfield Network
However, for noise created with p = 0.4, the network is
unable to restore the original image.
Instead, it converges to one of the 19 random
patterns.
The Hopfield Network
The Hopfield model constitutes an interesting neural
approach to identifying partially occluded objects
and objects in noisy images.
These are among the toughest problems in
computer vision.