Transcript PPT
About Assignment #3
Two approaches to backpropagation learning:
1. “Per-pattern” learning:
Update weights after every exemplar presentation.
2. “Per-epoch” (batch-mode) learning:
Update weights after every epoch. During epoch,
compute the sum of required changes for each weight
across all exemplars. After epoch, update each weight
using the respective sum.
November 16, 2010
Neural Networks
Lecture 17: Self-Organizing Maps
1
About Assignment #3
Per-pattern learning often approaches near-optimal
network error quickly, but may then take longer to
reach the error minimum.
During per-pattern learning, it is important to present
the exemplars in random order.
Reducing the learning rate between epochs usually
leads to better results.
November 16, 2010
Neural Networks
Lecture 17: Self-Organizing Maps
2
About Assignment #3
Per-epoch learning involves less frequent weight
updates, which makes the initial approach to the error
minimum rather slow.
However, per-epoch learning computes the actual
network error and its gradient for each weight so that
the network can make more informed decisions about
weight updates.
Two of the most effective algorithms that exploit this
information are Quickprop and Rprop.
November 16, 2010
Neural Networks
Lecture 17: Self-Organizing Maps
3
The Quickprop Learning Algorithm
The assumption underlying Quickprop is that the
network error as a function of each individual weight
can be approximated by a paraboloid.
Based on this assumption, whenever we find that the
gradient for a given weight switched its sign between
successive epochs, we should fit a paraboloid through
these data points and use its minimum as the next
weight value.
November 16, 2010
Neural Networks
Lecture 17: Self-Organizing Maps
4
The Quickprop Learning Algorithm
Illustration (sorry for the crummy paraboloid):
assumed error function
(paraboloid)
E
slope:
E’(t-1)
E(t-1)
slope:
E’(t)
E(t)
w(t-1)
w(t)
November 16, 2010
w(t+1)
w(t-1)
Neural Networks
Lecture 17: Self-Organizing Maps
w
5
The Quickprop Learning Algorithm
Newton’s method:
E aw bw c
2
E (t )
E ' (t ) 2aw(t ) b
w
E (t 1)
E ' (t 1) 2aw(t 1) b
w
E ' (t ) E ' (t 1) E ' (t ) E ' (t 1)
2a
w(t ) w(t 1)
w(t 1)
( E ' (t ) E ' (t 1)) w(t )
b E ' (t )
w(t 1)
November 16, 2010
Neural Networks
Lecture 17: Self-Organizing Maps
6
The Quickprop Learning Algorithm
For the minimum of E we must have:
E (t 1)
2aw(t 1) b 0
w
b
w(t 1)
2a
w(t 1) [ E ' (t ) E ' (t 1)]w(t ) E ' (t )w(t 1)
w(t 1)
w(t 1)
E ' (t ) E ' (t 1)
E ' (t )w(t 1)
w(t 1) w(t )
E ' (t 1) E ' (t )
November 16, 2010
Neural Networks
Lecture 17: Self-Organizing Maps
7
The Quickprop Learning Algorithm
Notice that this method cannot be applied if the error
gradient has not decreased in magnitude and has not
changed its sign at the preceding time step.
In that case, we would ascent in the error function or
make an infinitely large weight modification.
In most cases, Quickprop converges several times
faster than standard backpropagation learning.
November 16, 2010
Neural Networks
Lecture 17: Self-Organizing Maps
8
Resilient Backpropagation (Rprop)
The Rprop algorithm takes a very different approach
to improving backpropagation as compared to
Quickprop.
Instead of making more use of gradient information for
better weight updates, Rprop only uses the sign of
the gradient, because its size can be a poor and noisy
estimator of required weight updates.
Furthermore, Rprop assumes that different weights
need different step sizes for updates, which vary
throughout the learning process.
November 16, 2010
Neural Networks
Lecture 17: Self-Organizing Maps
9
Resilient Backpropagation (Rprop)
The basic idea is that if the error gradient for a given
weight wij had the same sign in two consecutive epochs,
we increase its step size ij, because the weight’s optimal
value may be far away.
If, on the other hand, the sign switched, we decrease the
step size.
Weights are always changed by adding or subtracting the
current step size, regardless of the absolute value of the
gradient.
This way we do not “get stuck” with extreme weights that
are hard to change because of the shallow slope in the
sigmoid function.
November 16, 2010
Neural Networks
Lecture 17: Self-Organizing Maps
10
Resilient Backpropagation (Rprop)
Formally, the step size update rules are:
( t 1)
( t 1)
E
ij , if
w
ij
( t 1)
E
(t )
( t 1)
ij ij , if
wij
(ijt 1) , otherwise
E
0
wij
E (t )
0
wij
(t )
Empirically, best results were obtained with initial step
sizes of 0.1, +=1.2, -=1.2, max=50, and min=10-6.
November 16, 2010
Neural Networks
Lecture 17: Self-Organizing Maps
11
Resilient Backpropagation (Rprop)
Weight updates are then performed as follows:
(t )
ij
w
E
, if
0
wij
(t )
E
(t )
ij , if
0
wij
0 , otherwise
(t )
(t )
ij
It is important to remember that, like in Quickprop, in
Rprop the gradient needs to be computed across all
samples (per-epoch learning).
November 16, 2010
Neural Networks
Lecture 17: Self-Organizing Maps
12
Resilient Backpropagation (Rprop)
The performance of Rprop is comparable to
Quickprop; it also considerably accelerates
backpropagation learning.
Compared to both the standard backpropagation
algorithm and Quickprop, Rprop has one advantage:
Rprop does not require the user to estimate or
empirically determine a step size parameter and its
change over time.
Rprop will determine appropriate step size values by
itself and can thus be applied “as is” to a variety of
problems without significant loss of efficiency.
November 16, 2010
Neural Networks
Lecture 17: Self-Organizing Maps
13
The Counterpropagation Network
Let us look at the CPN structure again.
How can this network determine its hidden-layer winner unit?
O
11
w
Y1
w13O
O
w21
O
w12
H
12
H1 w
O
w22
H
w21
w11H
Output
layer
O
Y2 w23
H2
H
31
w
H
w22
H3
Hidden
layer
H
w32
X1
November 16, 2010
X2
Additional
connections!
Neural Networks
Lecture 17: Self-Organizing Maps
Input
layer
14
The Solution: Maxnet
A maxnet is a recurrent, one-layer network that uses
competition to determine which of its nodes has the greatest
initial input value.
All pairs of nodes have inhibitory connections with the same
weight -, where typically 1/(# nodes).
In addition, each node has a self-excitatory connection to itself,
whose weight is typically 1.
The nodes update their net input and their output by the
following equations:
net wi xi
i
f (net ) max( 0, net )
November 16, 2010
Neural Networks
Lecture 17: Self-Organizing Maps
15
Maxnet
All nodes update their output simultaneously.
With each iteration, the neurons’ activations will
decrease until only one neuron remains active.
This is the “winner” neuron that had the greatest initial
input.
Maxnet is a biologically plausible implementation of a
maximum-finding function.
In parallel hardware, it can be more efficient than a
corresponding serial function.
We can add maxnet connections to the hidden layer
of a CPN to find the winner neuron.
November 16, 2010
Neural Networks
Lecture 17: Self-Organizing Maps
16
Maxnet Example
Example of a Maxnet with five neurons and = 1, = 0.2:
0.5
0
0.24
0.07
0.9
0
0.24
0.07
0.9
0
0.24
0.07
0.9
0
0.36
0.22
0.17
1
November 16, 2010
Winner!
Neural Networks
Lecture 17: Self-Organizing Maps
17
Self-Organizing Maps (Kohonen Maps)
As you may remember, the counterpropagation
network employs a combination of supervised and
unsupervised learning.
We will now study Self-Organizing Maps (SOMs) as
examples for completely unsupervised learning
(Kohonen, 1980).
This type of artificial neural network is particularly
similar to biological systems (as far as we understand
them).
November 16, 2010
Neural Networks
Lecture 17: Self-Organizing Maps
18
Self-Organizing Maps (Kohonen Maps)
In the human cortex, multi-dimensional sensory input
spaces (e.g., visual input, tactile input) are
represented by two-dimensional maps.
The projection from sensory inputs onto such maps is
topology conserving.
This means that neighboring areas in these maps
represent neighboring areas in the sensory input
space.
For example, neighboring areas in the sensory cortex
are responsible for the arm and hand regions.
November 16, 2010
Neural Networks
Lecture 17: Self-Organizing Maps
19
Self-Organizing Maps (Kohonen Maps)
Such topology-conserving mapping can be achieved
by SOMs:
• Two layers: input layer and output (map) layer
• Input and output layers are completely connected.
• Output neurons are interconnected within a defined
neighborhood.
• A topology (neighborhood relation) is defined on
the output layer.
November 16, 2010
Neural Networks
Lecture 17: Self-Organizing Maps
20
Self-Organizing Maps (Kohonen Maps)
Network structure:
output vector o
O1
O2
x1
O3
x2
… Om
… xn
input vector x
November 16, 2010
Neural Networks
Lecture 17: Self-Organizing Maps
21
Self-Organizing Maps (Kohonen Maps)
Common output-layer structures:
One-dimensional
(completely interconnected)
i
i
Two-dimensional
(connections omitted,
only neighborhood
relations shown [green])
Neighborhood of neuron i
November 16, 2010
Neural Networks
Lecture 17: Self-Organizing Maps
22
Self-Organizing Maps (Kohonen Maps)
A neighborhood function (i, k) indicates how closely
neurons i and k in the output layer are connected to
each other.
Usually, a Gaussian function on the distance between
the two neurons in the layer is used:
position of i
November 16, 2010
position of k
Neural Networks
Lecture 17: Self-Organizing Maps
23
Unsupervised Learning in SOMs
For n-dimensional input space and m output neurons:
(1) Choose random weight vector wi for neuron i, i = 1, ..., m
(2) Choose random input x
(3) Determine winner neuron k:
||wk – x|| = mini ||wi – x|| (Euclidean distance)
(4) Update all weight vectors of all neurons i in the
neighborhood of neuron k: wi := wi + ·(i, k)·(x – wi)
(wi is shifted towards x)
(5) If convergence criterion met, STOP.
Otherwise, narrow neighborhood function and learning
parameter and go to (2).
November 16, 2010
Neural Networks
Lecture 17: Self-Organizing Maps
24