
Computational Intelligence
Self-organization and error correction
Based on a course taught by
Prof. Randall O'Reilly, University of Colorado, and
Prof. Włodzisław Duch, Nicolaus Copernicus University
Janusz A. Starzyk
EE141
Learning: types
1. How should an ideal learning system look?
2. How does a human being learn?
Detectors (neurons) can change local parameters, but we want to achieve a change in the functioning of the entire information-processing network.
We will consider two types of learning, requiring different mechanisms:
 Learning an internal model of the environment (spontaneous).
 Learning a task set for the network (supervised).
 A combination of both.
Learning operations
One output neuron can't learn much.
Operation = sensorimotor transformation, perception-action.
Stimulation and selection of the correct operation, interpretation, expectations, plan…
What type of learning does this allow us to explain? What types of learning require additional mechanisms?
Simulation
Select self_org.proj.gz, in Chapter 4.8.1
20 hidden neurons, kWTA; the network will learn interesting features.
Sensorimotor maps
Self-organization is modeled in many ways; simple models are
helpful in explaining qualitative features of topographic maps.
Fig. from: P.S. Churchland, T.J. Sejnowski, The Computational Brain. MIT Press, 1992.
Motor and somatosensory maps
This is a very simplified picture; in reality most neurons are multimodal, and neurons in the motor cortex react to sensory, auditory, and visual impulses (mirror neurons): many specialized circuits of perception-action-naming.
Finger representation: plasticity
[Figure: cortical representations of the hand and face before and after stimulation.]
Sensory fields in the cortex expand after stimulation: local dominances resulting from activation.
Plasticity of cortical areas for sensory-motor representations.
Simplest models
SOM or SOFM (Self-Organizing Feature Map), one of the most popular models.
How can topographical maps be created in the brain?
Local neural connections create strongly interacting groups, with weaker interactions across greater distances, inhibiting nearby groups.
History:
von der Malsburg and Willshaw (1976): competitive learning, Hebbian learning with a "Mexican hat" potential, mainly the visual system.
Amari (1980): layered models of neural tissue.
Kohonen (1981): simplification without inhibition; only two essential variables: competition and cooperation.
SOM: idea
Data: vectors XT = (X1, ..., Xd) from a d-dimensional space.
A grid of nodes, with a local processor (neuron) in each node.
Local processor #j has d adaptive parameters W(j).
Goal: adjust the W(j) parameters to model the clusters in the space of X.
Training SOM
[Figure: x = data points, o = positions of the neuron weights; the weights point to points in the N-dimensional data space, while the neuron grid is 2-dimensional.]
Fritzke's algorithm Growing Neural Gas (GNG)
Demonstrations of competitive GNG learning in Java:
http://www.neuroinformatik.ruhr-uni-bochum.de/ini/VDM/research/gsn/DemoGNG/GNG.html
SOM algorithm: competition
Nodes should calculate the similarity of input data to their
parameters.
Input vector X is compared to node parameters W.
Similar = minimal distance or maximal scalar product.
Competition: find node j=c with W most similar to X.
XW
( j)

 X
i
 Wi
( j)

2
i
c  arg min X  W ( j )
j
Node number c is most similar to the input vector X
It is a winner, and it will learn to be more similar to X, hence this is
a “competitive learning” procedure.
Brain: those neurons that react to some signals activate and learn.
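As a sketch (not from the course materials), the competition step can be written in a few lines of NumPy; the array names and the 2-D toy data are my assumptions:

```python
import numpy as np

def find_winner(X, W):
    """Competition: return the index c of the node whose weight
    vector W[j] has the minimal squared distance to the input X."""
    d2 = np.sum((W - X) ** 2, axis=1)  # distance of every node to X
    return int(np.argmin(d2))

# Three nodes with 2-D weight vectors; node 2 lies closest to X.
W = np.array([[0.0, 0.0], [1.0, 1.0], [0.9, 0.2]])
X = np.array([0.8, 0.1])
c = find_winner(X, W)  # c == 2
```

Using the maximal scalar product instead of the minimal distance gives the same winner when the weight vectors are normalized.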
SOM algorithm: cooperation
Cooperation: nodes on the grid close to the winner c should behave similarly. Define the neighborhood O(c) of the winner and the neighborhood function:

h(r, rc, t) = h0(t) exp(−||r − rc||² / σc²(t))

t – iteration number (or time);
rc – position of the winning node c (in physical space, usually 2D);
||r − rc|| – distance from the winning node, scaled by σc(t);
h0(t) – slowly decreasing multiplicative factor.
The neighborhood function determines how strongly the parameters of the winning node and of the nodes in its neighborhood will be changed, making them more similar to the data X.
SOM algorithm: dynamics
Adaptation rule: take the winner node c and the nodes in its neighborhood O(rc), and change their parameters to make them more similar to the data X:

For i ∈ O(c):
W(i)(t + 1) = W(i)(t) + h(ri, rc, t) [X(t) − W(i)(t)]

Randomly select a new sample vector X, and repeat.
Decrease h0(t) slowly until there are no more changes.
Result:
 W(i) ≈ the centers of local clusters in the X feature space
 Nodes in the neighborhood point to adjacent areas in X space
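The whole competition-cooperation-adaptation loop can be sketched as follows; the grid size, the toy data, and the annealing schedules are illustrative assumptions, not the course's settings:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed setup: a 10x10 grid of nodes mapping 2-D data in [0, 1]^2.
grid = np.array([(i, j) for i in range(10) for j in range(10)], dtype=float)
W = rng.random((100, 2))              # one weight vector W(i) per node
data = rng.random((500, 2))

t_max = 2000
for t in range(t_max):
    X = data[rng.integers(len(data))]                 # random sample
    c = int(np.argmin(np.sum((W - X) ** 2, axis=1)))  # competition
    # Cooperation: Gaussian neighborhood on the 2-D grid; both the
    # width sigma and the factor h0 decrease slowly with t.
    sigma = 3.0 * (0.3 / 3.0) ** (t / t_max)
    h0 = 0.5 * (0.02 / 0.5) ** (t / t_max)
    h = h0 * np.exp(-np.sum((grid - grid[c]) ** 2, axis=1) / sigma ** 2)
    # Adaptation: move the winner and its neighbors toward X.
    W += h[:, None] * (X - W)
```

After training, each weight vector sits near a local cluster of the data, and grid-adjacent nodes point to adjacent regions of the input space.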
Maps and distortions
Initial distortions may slowly disappear or may get frozen, giving the user a completely distorted view of reality.
Demonstrations with the help of GNG
Growing Self-Organizing Networks demo
http://www.neuroinformatik.ruhr-uni-bochum.de/ini/VDM/research/gsn/DemoGNG/GNG.html
Parameters of the SOM program:
t – iterations
e(t) = ei (ef / ei)^(t/tmax) specifies the learning step
σ(t) = σi (σf / σi)^(t/tmax) specifies the size of the neighborhood

h(r, rc, t, e, σ) = e(t) exp(−||r − rc||² / σ²(t))

Maps 1×30 show the formation of Peano curves.
We can try to reconstruct Penfield's maps.
Mapping kWTA CPCA
Hebbian learning finds relationships between input and output.
Example: pat_assoc.proj.gz in Chapter 5, described in 5.2.
Simulations for 3 tasks, from easy to impossible.
Derivative based Hebbian learning
Hebb's rule:

Δwkj = ε (xk − wkj) yj

will be replaced by derivative-based learning based on time-domain correlation of firing between neurons.
This can be implemented in many ways.
For signal normalization purposes, let us assume that the maximum rate of change between two consecutive time frames is 1.
Let us denote the derivative of the signal change x(t) by dx(t).
Assume that the neuron responds to signal changes instead of signal activation:

yj = Σk dxk wkj
Derivative based Hebbian learning
Define the product of derivatives pdkj(t) = dxk(t)·dyj(t).
Derivative-based weight adjustment is calculated as follows.
Feedforward weights are adjusted as

Δwkj = ε0 (pdkj(t) − wkj) |pdkj(t)|

and feedback weights are adjusted as

Δwjk = ε0 (pdkj(t) − wjk) |pdkj(t)|

This adjustment gives symmetrical feedforward and feedback weights.
[Figure: time courses of xk(t), yj(t), and pdkj(t).]
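A minimal sketch of the symmetric rule in NumPy; the one-step discretization dx(t) = x(t) − x(t−1) and the array shapes are my assumptions:

```python
import numpy as np

def derivative_hebbian_step(x_prev, x, y_prev, y, W, eps0=0.1):
    """Symmetric derivative-based Hebbian update.
    dx, dy approximate signal derivatives over one time frame;
    pd[k, j] = dx[k] * dy[j]; each weight W[k, j] is pulled toward
    pd[k, j], gated by |pd[k, j]| so that unchanging signals
    produce no learning. Feedback weights are the transpose W.T
    in the symmetric case."""
    dx = x - x_prev
    dy = y - y_prev
    pd = np.outer(dx, dy)              # products of derivatives pd_kj
    W += eps0 * (pd - W) * np.abs(pd)
    return W

W = np.zeros((3, 1))
x_prev, x = np.zeros(3), np.array([1.0, 0.0, 1.0])
y_prev, y = np.zeros(1), np.array([1.0])
W = derivative_hebbian_step(x_prev, x, y_prev, y, W)
```

Only the inputs that changed together with the output (here x1 and x3) receive a weight change; the static input x2 is untouched.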
Derivative based Hebbian learning
Asymmetrical weights can be obtained by using products of shifted derivative values:

pdkj(+) = dxk(t)·dyj(t + 1) and
pdkj(−) = dxk(t)·dyj(t − 1).

Derivative-based weight adjustment is then calculated as follows.
Feedforward weights are adjusted as

Δwkj = ε1 (pdkj(+) − wkj) |pdkj(+)|

and feedback weights are adjusted as

Δwjk = ε1 (pdkj(−) − wjk) |pdkj(−)|

[Figure: network with inputs x1, x2, …, xk, output yj, feedforward weights wkj and feedback weights wjk.]
Derivative based Hebbian learning
Feedforward weights are adjusted as

Δwkj = ε1 (pdkj(+) − wkj) |pdkj(+)|

[Figure: time courses of xk(t), yj(t), yj(t + 1), and pdkj(+).]
Derivative based Hebbian learning
Feedback weights are adjusted as

Δwjk = ε1 (pdkj(−) − wjk) |pdkj(−)|

[Figure: time courses of xk(t), yj(t), yj(t − 1), and pdkj(−).]
Task learning
Unfortunately, Hebbian learning won't suffice to learn an arbitrary relationship between input and output. This can be done by learning based on error correction.
Where do goals come from? From the "teacher," or from confronting the predictions of the internal model.
The Delta rule
Idea: weights wik should be revised so that they change strongly for large errors and do not change at all if there is no error, so

Δwik ~ (tk − ok) si

The change is also proportional to the size of the activation of the input si.
Phase + is the presentation of the goal; phase − is the result of the network.
This is the delta rule.
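A sketch of the delta rule on a single linear unit; the toy pattern and learning rate are my assumptions:

```python
import numpy as np

def delta_rule_step(s, t, W, eps=0.5):
    """Delta rule: change weights in proportion to the error (t - o)
    and to the input activation s; no error means no change."""
    o = s @ W                        # network output o_k
    W += eps * np.outer(s, t - o)    # Δw_ik = ε (t_k − o_k) s_i
    return W

W = np.zeros((2, 1))
s = np.array([1.0, 0.0])             # input activations s_i
t = np.array([1.0])                  # target (goal) t_k
for _ in range(20):
    W = delta_rule_step(s, t, W)
# W[0, 0] converges toward 1; W[1, 0] stays 0 because s_2 = 0
```

Note that the inactive input (s = 0) never changes its weight: the error is assigned only to the inputs that contributed to the output.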
Credit Assignment
Credit/blame assignment:

Δwik = ε (tk − ok) si

The error is local, for pattern k.
If a large error forms and output ok is significantly smaller than expected, then input neurons with a large activation will increase it strongly; if output ok is significantly larger than expected, then input neurons with a large activation will decrease it significantly.
E.g., input si is the number of calories in different foods and the output is a moderate body weight; if it is too big, then we must decrease the weights of high-calorie foods, and if it is too small, then we must increase them.
Representations created by an error-minimization process are the result of the best assignment of credit to many units, not of the greatest correlation (as in Hebbian models).
Limiting weights
We don't want the weights to change without limits or to take negative values.
This is consistent with biological constraints, which separate inhibitory and excitatory neurons and impose upper weight limits.
The weight-change mechanism below, based on the delta rule, ensures that both restrictions are fulfilled:

Δwik = Δik (1 − wik)   if Δik > 0
Δwik = Δik wik         if Δik < 0

where Δik is the weight change resulting from error propagation.
This equation limits the weight values to the 0–1 range.
The upper limit is biologically justified by the maximum amount of neurotransmitter which can be emitted and the maximum density of the synapses.
[Figure: Δwik as a function of Δik for weight values between 0 and 1.]
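The two branches can be sketched as one vectorized function (the function name is mine):

```python
import numpy as np

def bounded_update(W, D):
    """Soft weight bounding: positive changes D are scaled by (1 - w),
    negative changes by w, so each weight stays inside [0, 1]."""
    return W + np.where(D > 0, D * (1 - W), D * W)

W = np.array([0.5, 0.9, 0.1])
W_new = bounded_update(W, np.array([1.0, 0.4, -0.8]))
```

As a weight approaches 1, further increases shrink toward zero; as it approaches 0, further decreases shrink toward zero, which is how the limits are enforced without hard clipping.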
Task learning
We want: Hebbian learning and learning using error correction, hidden units, and biologically justified models.
The combination of error correction and correlations can be aligned with what we know about LTP/LTD:

Δwij = ε [⟨xi yj⟩+ − ⟨xi yj⟩−]

Hebbian networks model states of the world, but not perception-action.
Error correction can learn mappings. Unfortunately, the delta rule is only good for output units, and not for hidden units, because it has to be given a goal.
Backpropagation of errors can teach hidden units, but there is no good biological justification for this method…
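The plus-phase/minus-phase rule above can be sketched as follows; the phase activities are illustrative values, not from the course:

```python
import numpy as np

def contrastive_update(x_plus, y_plus, x_minus, y_minus, eps=0.1):
    """Weight change Δw_ij = ε[<x_i y_j>+ − <x_i y_j>−]:
    the Hebbian correlation in the plus (goal) phase minus the
    correlation in the minus (network output) phase."""
    return eps * (np.outer(x_plus, y_plus) - np.outer(x_minus, y_minus))

# Same input in both phases; the output was 0.4 but the goal is 1.0,
# so the weight from the active input is pushed upward.
dW = contrastive_update(np.array([1.0, 0.0]), np.array([1.0]),
                        np.array([1.0, 0.0]), np.array([0.4]))
```

When the two phases agree (the network already produces the goal), the two correlations cancel and the weights stop changing.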
Simulations
Select: pat_assoc.proj.gz, in Chapter 5.
Description: Chapter 5.5.
The delta rule can learn difficult mappings, at least theoretically...