Hebb Rule
• Linear neuron
$v = \mathbf{w}^T \mathbf{u}$
• Hebb rule
$\tau_w \frac{d\mathbf{w}}{dt} = v\mathbf{u}, \qquad \mathbf{w}(t+1) = \mathbf{w}(t) + \epsilon\, v\mathbf{u}$
• Similar to LTP (but not quite…)
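A minimal numerical sketch of the discrete Hebb update for a linear neuron; the learning rate, input statistics, and variable names are illustrative assumptions, not part of the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)
eps = 0.01                      # learning rate (epsilon in the discrete rule)
w = rng.normal(size=2)          # initial weights

for _ in range(1000):
    u = rng.multivariate_normal([0, 0], [[1.0, 0.8], [0.8, 1.0]])  # presynaptic input
    v = w @ u                   # linear neuron: v = w^T u
    w = w + eps * v * u         # Hebb rule: w(t+1) = w(t) + eps * v * u

print(w)  # note: |w| keeps growing (the instability discussed below)
```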
Hebb Rule
• Average Hebb rule = correlation rule
$\tau_w \frac{d\mathbf{w}}{dt} = \langle v\mathbf{u} \rangle = \langle (\mathbf{w}^T\mathbf{u})\,\mathbf{u} \rangle = \langle \mathbf{u}\,\mathbf{u}^T\mathbf{w} \rangle = \langle \mathbf{u}\mathbf{u}^T \rangle \mathbf{w} = Q\mathbf{w}$
• Q: correlation matrix of u
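As a quick check of the correlation-rule identity, the sketch below estimates Q = ⟨uuᵀ⟩ and the averaged update ⟨vu⟩ from samples and compares them; the input ensemble is an assumed toy distribution:

```python
import numpy as np

rng = np.random.default_rng(1)
U = rng.multivariate_normal([0, 0], [[1.0, 0.5], [0.5, 2.0]], size=100000)  # samples of u
w = np.array([0.3, -0.7])

Q = (U.T @ U) / len(U)                               # correlation matrix Q = <u u^T>
avg_update = np.mean((U @ w)[:, None] * U, axis=0)   # <v u> with v = w^T u

print(Q @ w, avg_update)   # the two agree up to sampling noise
```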
Hebb Rule
• Hebb rule with threshold = covariance rule
$\tau_w \frac{d\mathbf{w}}{dt} = \langle v\,(\mathbf{u} - \langle\mathbf{u}\rangle) \rangle = C\mathbf{w}$
$C = \langle (\mathbf{u} - \langle\mathbf{u}\rangle)(\mathbf{u} - \langle\mathbf{u}\rangle)^T \rangle = \langle \mathbf{u}\mathbf{u}^T \rangle - \langle\mathbf{u}\rangle\langle\mathbf{u}\rangle^T$
• C: covariance matrix of u
• Note that $\langle (v - \langle v \rangle)(\mathbf{u} - \langle \mathbf{u} \rangle) \rangle$ would be unrealistic because it predicts LTP when both u and v are low
Hebb Rule
• Main problem with the Hebb rule: it’s unstable… Two solutions:
1. Bounded weights
2. Normalization of either the activity of the
postsynaptic cells or the weights.
BCM rule
• Hebb rule with sliding threshold
$\tau_w \frac{d\mathbf{w}}{dt} = v\,(v - \theta_v)\,\mathbf{u}$
$\tau_\theta \frac{d\theta_v}{dt} = v^2 - \theta_v$
• The BCM rule implements competition because when a synaptic weight grows, it raises $\theta_v$ (which tracks $v^2$), making it more difficult for other weights to grow (see the sketch below).
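A minimal sketch of the BCM rule with a sliding threshold; the time constants and input statistics are illustrative assumptions (the threshold is assumed to adapt faster than the weights, i.e. tau_theta < tau_w):

```python
import numpy as np

rng = np.random.default_rng(2)
dt, tau_w, tau_theta = 0.1, 100.0, 10.0   # tau_theta < tau_w so the threshold tracks v^2 quickly
w = np.array([0.5, 0.5])
theta = 0.0                                # sliding threshold theta_v

for _ in range(20000):
    u = np.abs(rng.multivariate_normal([1, 1], [[1.0, 0.2], [0.2, 1.0]]))  # nonnegative inputs
    v = w @ u
    w += (dt / tau_w) * v * (v - theta) * u        # tau_w dw/dt = v (v - theta_v) u
    theta += (dt / tau_theta) * (v**2 - theta)     # tau_theta dtheta_v/dt = v^2 - theta_v

print(w, theta)
```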
Weight Normalization
• Subtractive Normalization:
$\sum_{i=1}^{N_u} w_i = \mathbf{n} \cdot \mathbf{w} = \text{Const.}, \qquad \mathbf{n} = (1, 1, \ldots, 1)$
$\tau_w \frac{d\mathbf{w}}{dt} = v\mathbf{u} - \frac{v\,(\mathbf{n} \cdot \mathbf{u})\,\mathbf{n}}{N_u}$
$\tau_w \frac{dw_i}{dt} = v u_i - \frac{1}{N_u}\sum_{k=1}^{N_u} v u_k$
$\tau_w \frac{d(\mathbf{n} \cdot \mathbf{w})}{dt} = v\,(\mathbf{n} \cdot \mathbf{u}) - \frac{v\,(\mathbf{n} \cdot \mathbf{u})\,(\mathbf{n} \cdot \mathbf{n})}{N_u} = v\,(\mathbf{n} \cdot \mathbf{u})\left(1 - \frac{\mathbf{n} \cdot \mathbf{n}}{N_u}\right) = 0$
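A sketch of the subtractively normalized Hebb rule; inputs and parameters are assumed toy values. The last line checks that the sum of the weights stays constant:

```python
import numpy as np

rng = np.random.default_rng(3)
N_u = 4
dt, tau_w = 0.1, 100.0
w = rng.uniform(0.4, 0.6, size=N_u)
n = np.ones(N_u)
w_sum0 = n @ w

for _ in range(10000):
    u = np.abs(rng.normal(1.0, 0.5, size=N_u))
    v = w @ u
    w += (dt / tau_w) * (v * u - v * (n @ u) * n / N_u)   # subtract the mean of the update

print(n @ w, w_sum0)   # the sum of the weights is conserved (up to float error)
```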
Weight Normalization
• Multiplicative Normalization:
$\sum_{i=1}^{N_u} w_i^2 = |\mathbf{w}|^2 = \text{Const.}$
$\tau_w \frac{d\mathbf{w}}{dt} = v\mathbf{u} - \alpha v^2 \mathbf{w}$
$\tau_w \frac{d|\mathbf{w}|^2}{dt} = 2\mathbf{w} \cdot \tau_w \frac{d\mathbf{w}}{dt} = 2 v^2 \left(1 - \alpha |\mathbf{w}|^2\right)$
• The squared norm of the weights converges to $1/\alpha$ (i.e., $|\mathbf{w}| \to 1/\sqrt{\alpha}$), as in the sketch below.
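A sketch of multiplicative (Oja-style) normalization under assumed parameters; the squared norm of w should relax toward 1/α:

```python
import numpy as np

rng = np.random.default_rng(4)
dt, tau_w, alpha = 0.1, 100.0, 1.0
w = rng.normal(size=2)

for _ in range(50000):
    u = rng.multivariate_normal([0, 0], [[1.0, 0.8], [0.8, 1.0]])
    v = w @ u
    w += (dt / tau_w) * (v * u - alpha * v**2 * w)   # tau_w dw/dt = v u - alpha v^2 w

print(np.sum(w**2))   # close to 1/alpha: the growth is now stable
```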
Hebb Rule
• Convergence properties:
$\tau_w \frac{d\mathbf{w}}{dt} = Q\mathbf{w}$
• Use an eigenvector decomposition:
$\mathbf{w}(t) = \sum_{m=1}^{N_u} c_m(t)\,\mathbf{e}_m$
where the $\mathbf{e}_m$ are the eigenvectors of Q
Hebb Rule
[Figure: input distribution with eigenvectors e_1 and e_2 of Q, with λ_1 > λ_2]
Hebb Rule
$\tau_w \frac{d\mathbf{w}(t)}{dt} = Q\mathbf{w}(t), \qquad \mathbf{w}(t) = \sum_{m=1}^{N_u} c_m(t)\,\mathbf{e}_m$
$\tau_w \sum_m \frac{dc_m(t)}{dt}\,\mathbf{e}_m = \sum_m c_m(t)\, Q\mathbf{e}_m = \sum_m c_m(t)\, \lambda_m \mathbf{e}_m$
$\tau_w \frac{dc_m(t)}{dt} = \lambda_m c_m(t)$
The equations decouple because the $\mathbf{e}_m$ are the eigenvectors of Q.
Hebb Rule
$\tau_w \frac{dc_m(t)}{dt} = \lambda_m c_m(t)$
$c_m(t) = c_m(0) \exp\!\left(\frac{\lambda_m t}{\tau_w}\right) = (\mathbf{w}(0) \cdot \mathbf{e}_m) \exp\!\left(\frac{\lambda_m t}{\tau_w}\right)$
$\mathbf{w}(t) = \sum_{m=1}^{N_u} \exp\!\left(\frac{\lambda_m t}{\tau_w}\right)(\mathbf{w}(0) \cdot \mathbf{e}_m)\,\mathbf{e}_m$
for large $t$: $\mathbf{w}(t) \propto \mathbf{e}_1, \quad v \propto \mathbf{e}_1 \cdot \mathbf{u}$
Hebb Rule
• The weights line up with the first eigenvector, and the postsynaptic activity, v, converges toward the projection of u onto the first eigenvector (unstable PCA), as in the sketch below.
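A sketch comparing the direction of a plain Hebbian weight vector with the leading eigenvector of an assumed Q; since plain Hebb grows without bound, only the direction is compared:

```python
import numpy as np

rng = np.random.default_rng(5)
Q = np.array([[1.0, 0.8], [0.8, 1.0]])
U = rng.multivariate_normal([0, 0], Q, size=5000)

eps = 0.001
w = rng.normal(size=2)
for u in U:
    w += eps * (w @ u) * u          # plain Hebb: unbounded growth

lam, E = np.linalg.eigh(Q)          # eigenvalues ascending; last column = e_1
e1 = E[:, -1]
print(w / np.linalg.norm(w), e1)    # same direction (up to sign): unstable PCA
```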
Hebb Rule
• Non-zero mean distribution: correlation vs. covariance
Hebb Rule
• Limiting weight growth affects the final state
[Figure: weight trajectories with bounded weights, plotted as w_2/w_max vs. w_1/w_max; first eigenvector: [1, -1]]
Hebb Rule
• Normalization also affects the final state.
• Ex: multiplicative normalization. In this case, the Hebb rule extracts the first eigenvector but keeps the norm constant (stable PCA).
Hebb Rule
• Normalization also affects the final state.
• Ex: subtractive normalization.
$\tau_w \frac{d\mathbf{w}}{dt} = Q\mathbf{w} - \frac{(\mathbf{w} \cdot Q\mathbf{n})\,\mathbf{n}}{N_u}$
if $\mathbf{e}_1 \propto \mathbf{n}$:
$\tau_w \frac{d\mathbf{e}_1}{dt} \propto Q\mathbf{n} - \frac{(\mathbf{n} \cdot Q\mathbf{n})\,\mathbf{n}}{N_u} = \lambda_1 \mathbf{n} - \frac{\lambda_1 (\mathbf{n} \cdot \mathbf{n})\,\mathbf{n}}{N_u} = 0$
(since $\mathbf{n} \cdot \mathbf{n} = N_u$): the component along $\mathbf{e}_1$ does not grow.
Hebb Rule
if $\mathbf{e}_1 \propto \mathbf{n}$, then $\mathbf{e}_m \cdot \mathbf{n} = 0$ for $m > 1$, and
$\tau_w \frac{dc_m(t)}{dt}\,\mathbf{e}_m = Q c_m(t)\,\mathbf{e}_m - \frac{c_m(t)\,(\mathbf{e}_m \cdot Q\mathbf{n})\,\mathbf{n}}{N_u}$
$= Q c_m(t)\,\mathbf{e}_m - \frac{c_m(t)\,\lambda_1 (\mathbf{e}_m \cdot \mathbf{n})\,\mathbf{n}}{N_u}$
$= Q c_m(t)\,\mathbf{e}_m$
Hebb Rule
• The constraint does not affect the other eigenvectors:
$\mathbf{w}(t) = (\mathbf{w}(0) \cdot \mathbf{e}_1)\,\mathbf{e}_1 + \sum_{m=2}^{N_u} \exp\!\left(\frac{\lambda_m t}{\tau_w}\right)(\mathbf{w}(0) \cdot \mathbf{e}_m)\,\mathbf{e}_m$
• The weights converge to the second eigenvector (the weights need to be bounded to guarantee stability…), as in the sketch below.
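A sketch of the averaged dynamics with subtractive normalization for an assumed Q whose leading eigenvector is proportional to n; the weights are clipped (an illustrative choice) to keep the run bounded:

```python
import numpy as np

# Q whose leading eigenvector is proportional to n = (1, 1)
Q = np.array([[1.0, 0.8], [0.8, 1.0]])
N_u = 2
n = np.ones(N_u)
dt, tau_w = 0.1, 10.0

w = np.array([0.6, 0.4])
for _ in range(5000):
    dw = Q @ w - (w @ Q @ n) * n / N_u              # averaged rule with subtractive constraint
    w = np.clip(w + (dt / tau_w) * dw, -1.0, 1.0)   # bound the weights for stability

lam, E = np.linalg.eigh(Q)
print(w, E[:, 0])   # w ends up along e_2 = (1, -1)/sqrt(2) (up to sign and the bounds)
```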
Ocular Dominance Column
• One unit with one input from right and left
eyes
$v = w_R u_R + w_L u_L$
$Q = \langle \mathbf{u}\mathbf{u}^T \rangle = \begin{pmatrix} \langle u_R u_R \rangle & \langle u_R u_L \rangle \\ \langle u_L u_R \rangle & \langle u_L u_L \rangle \end{pmatrix} = \begin{pmatrix} q_s & q_d \\ q_d & q_s \end{pmatrix}$
s: same eye, d: different eyes
Ocular Dominance Column
$Q = \langle \mathbf{u}\mathbf{u}^T \rangle = \begin{pmatrix} \langle u_R u_R \rangle & \langle u_R u_L \rangle \\ \langle u_L u_R \rangle & \langle u_L u_L \rangle \end{pmatrix} = \begin{pmatrix} q_s & q_d \\ q_d & q_s \end{pmatrix}$
• The eigenvectors are:
$\mathbf{e}_1 = (1, 1)/\sqrt{2}, \quad \lambda_1 = q_s + q_d$
$\mathbf{e}_2 = (1, -1)/\sqrt{2}, \quad \lambda_2 = q_s - q_d$
Ocular Dominance Column
• Since $q_d$ is likely to be positive, $q_s + q_d > q_s - q_d$. As a result, the weights will converge toward the first eigenvector, which mixes the right and left eyes equally. No ocular dominance...
$\mathbf{e}_1 = (1, 1)/\sqrt{2}, \quad \lambda_1 = q_s + q_d$
$\mathbf{e}_2 = (1, -1)/\sqrt{2}, \quad \lambda_2 = q_s - q_d$
Ocular Dominance Column
• To get ocular dominance we need
subtractive normalization.
$\mathbf{e}_1 = (1, 1)/\sqrt{2}, \quad \lambda_1 = q_s + q_d$
$\mathbf{e}_2 = (1, -1)/\sqrt{2}, \quad \lambda_2 = q_s - q_d$
Ocular Dominance Column
• Note that the weights will be proportional to $\mathbf{e}_2$ or $-\mathbf{e}_2$ (i.e., the right and left eyes are equally likely to dominate at the end). Which one wins depends on the initial conditions (see the sketch below).
$\mathbf{e}_1 = (1, 1)/\sqrt{2}, \quad \lambda_1 = q_s + q_d$
$\mathbf{e}_2 = (1, -1)/\sqrt{2}, \quad \lambda_2 = q_s - q_d$
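A sketch of the single-unit case: build the 2×2 correlation matrix from assumed values of q_s and q_d and inspect its eigenvectors and eigenvalues:

```python
import numpy as np

q_s, q_d = 1.0, 0.4                      # same-eye and different-eye correlations (assumed)
Q = np.array([[q_s, q_d],
              [q_d, q_s]])

lam, E = np.linalg.eigh(Q)               # eigenvalues in ascending order
print(lam)          # [q_s - q_d, q_s + q_d] -> [0.6, 1.4]
print(E)            # columns proportional to (1,-1)/sqrt(2) and (1,1)/sqrt(2)

# Plain Hebb follows e_1 = (1,1)/sqrt(2): both eyes mixed equally, no ocular dominance.
# With subtractive normalization the (1,1) direction is frozen, so the weights follow
# e_2 = (1,-1)/sqrt(2): one eye wins, and which eye is set by the initial weights.
```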
Ocular Dominance Column
• Ocular dominance column: network with
multiple output units and lateral
connections.
Ocular Dominance Column
• Simplified model
[Figure: simplified model, with cortical units receiving inputs u_L and u_R]
Ocular Dominance Column
• If we use subtractive normalization and no lateral connections, we’re back to the one-cell case. Ocular dominance is determined by the initial weights, i.e., it is purely stochastic. This is not what’s observed in V1.
• Lateral weights could help by making sure that
neighboring cells have similar ocular dominance.
Ocular Dominance Column
• Lateral weights are equivalent to
feedforward weights
$\frac{dv_i}{dt} = -v_i + w_{iR} u_R + w_{iL} u_L + [M\mathbf{v}]_i$
$\frac{d\mathbf{v}}{dt} = -\mathbf{v} + \mathbf{w}_R u_R + \mathbf{w}_L u_L + M\mathbf{v}$
$\frac{d\mathbf{v}}{dt} = 0\,?$
Ocular Dominance Column
• Lateral weights are equivalent to
feedforward weights
$v_i = w_{iR} u_R + w_{iL} u_L + [M\mathbf{v}]_i$
$\mathbf{v} = \mathbf{w}_R u_R + \mathbf{w}_L u_L + M\mathbf{v}$
$\mathbf{v} = (I - M)^{-1}\left(\mathbf{w}_R u_R + \mathbf{w}_L u_L\right)$
$\mathbf{v} = K W \mathbf{u}, \qquad K = (I - M)^{-1}, \quad W = [\mathbf{w}_R\ \mathbf{w}_L], \quad \mathbf{u} = \begin{pmatrix} u_R \\ u_L \end{pmatrix}$
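A sketch of the steady-state relation v = KWu with K = (I − M)⁻¹, for an assumed small network with weak lateral connections (M needs spectral radius below 1 for the steady state to be well defined):

```python
import numpy as np

rng = np.random.default_rng(6)
N_v = 5                                          # number of cortical units
W = rng.uniform(0.0, 1.0, size=(N_v, 2))         # columns: weights from (u_R, u_L)
M = 0.1 * (np.ones((N_v, N_v)) - np.eye(N_v))    # weak excitatory lateral weights
u = np.array([1.0, 0.5])                         # (u_R, u_L)

K = np.linalg.inv(np.eye(N_v) - M)               # K = (I - M)^(-1)
v = K @ W @ u                                    # steady state of dv/dt = -v + W u + M v

print(np.allclose(v, W @ u + M @ v))             # check: v = W u + M v -> True
```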
Ocular Dominance Column
$\tau_w \frac{dW}{dt} = \langle \mathbf{v}\mathbf{u}^T \rangle$
$\mathbf{v} = K W \mathbf{u}$
$\tau_w \frac{dW}{dt} = \langle K W \mathbf{u}\mathbf{u}^T \rangle = K W Q$
Ocular Dominance Column
• We first project the weight vectors of each
cortical unit (wiR,wiL) onto the eigenvectors
of Q.
$\tau_w \frac{dW}{dt} = K W Q$
$\tau_w \frac{dW}{dt} = K W P \Lambda P^{-1}$
$\tau_w \frac{d(WP)}{dt} = K W P \Lambda$
Ocular Dominance Column
• Projecting onto the two eigenvectors gives $\mathbf{w}_+$ and $\mathbf{w}_-$, with eigenvalues $q_s + q_d$ and $q_s - q_d$:
$\mathbf{w}_+ = \mathbf{w}_R + \mathbf{w}_L, \qquad \mathbf{w}_- = \mathbf{w}_R - \mathbf{w}_L$
$W P = [\mathbf{w}_R\ \mathbf{w}_L]\,P = \tfrac{1}{\sqrt{2}}\,[\mathbf{w}_+\ \mathbf{w}_-]$
Ocular Dominance Column
$\tau_w \frac{d(WP)}{dt} = K W P \Lambda$
$\tau_w \frac{d}{dt}[\mathbf{w}_+\ \mathbf{w}_-] = K\,[\mathbf{w}_+\ \mathbf{w}_-]\begin{pmatrix} q_s + q_d & 0 \\ 0 & q_s - q_d \end{pmatrix}$
$\tau_w \frac{d\mathbf{w}_+}{dt} = (q_s + q_d)\,K\mathbf{w}_+$
$\tau_w \frac{d\mathbf{w}_-}{dt} = (q_s - q_d)\,K\mathbf{w}_-$
Ocular Dominance Column
• Ocular dominance column: network with
multiple output units and lateral
connections.
$\tau_w \frac{d\mathbf{w}_+}{dt} = (q_s + q_d)\,K\mathbf{w}_+$
$\tau_w \frac{d\mathbf{w}_-}{dt} = (q_s - q_d)\,K\mathbf{w}_-$
Ocular Dominance Column
• Once again we use a subtractive
normalization, which holds w+ constant.
Consequently, the equation for w- is the
only one we need to worry about.
$\tau_w \frac{d\mathbf{w}_-}{dt} = (q_s - q_d)\,K\mathbf{w}_-$
Ocular Dominance Column
• If the lateral weights are translation
invariant, Kw- is a convolution. This is
easier to solve in the Fourier domain.
$\tau_w \frac{d\mathbf{w}_-}{dt} = (q_s - q_d)\,K\mathbf{w}_- = (q_s - q_d)\,K(x) * \mathbf{w}_-$
$\tau_w \frac{d\tilde{w}_-(k)}{dt} = (q_s - q_d)\,\tilde{K}(k)\,\tilde{w}_-(k)$
Ocular Dominance Column
$\tau_w \frac{d\tilde{w}_-(k)}{dt} = (q_s - q_d)\,\tilde{K}(k)\,\tilde{w}_-(k)$
$\tilde{w}_-(k, t) = \exp\!\left(\frac{(q_s - q_d)\,\tilde{K}(k)\,t}{\tau_w}\right)\tilde{w}_-(k, 0)$
• The sine function with the highest Fourier coefficient (i.e., the fundamental) grows the fastest.
Ocular Dominance Column
• In other words, the eigenvectors of K are
sine functions and the eigenvalues are the
Fourier coefficients for K.
$(\mathbf{e}_m)_a \propto \cos\!\left(\frac{2\pi m a}{N_v}\right)$
Ocular Dominance Column
• The dynamics are dominated by the sine function with the highest Fourier coefficient, i.e., the fundamental of K(x) (note that w- is not normalized along the x dimension).
• This results in an alternation of right and left columns with a periodicity corresponding to the frequency of the fundamental of K(x).
Ocular Dominance Column
• If K is a Gaussian kernel, the fundamental is the DC term and w- ends up being constant, i.e., no ocular dominance columns (one of the eyes dominates all the cells).
• If K is a Mexican hat kernel, w- will show ocular dominance columns with the same frequency as the fundamental of K (see the sketch below).
• Not that intuitive anymore…
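A sketch contrasting an assumed Gaussian kernel with an assumed Mexican hat (difference of Gaussians) kernel: the peak of the Fourier transform picks the fastest-growing spatial frequency, which sets the ocular dominance periodicity. Kernel widths and sizes are illustrative choices:

```python
import numpy as np

N_v = 256
x = np.arange(N_v) - N_v // 2             # cortical positions (arbitrary units)

gaussian = np.exp(-x**2 / (2 * 5.0**2))
mexican_hat = np.exp(-x**2 / (2 * 5.0**2)) - 0.5 * np.exp(-x**2 / (2 * 10.0**2))

for name, K in [("gaussian", gaussian), ("mexican hat", mexican_hat)]:
    K_tilde = np.real(np.fft.rfft(np.fft.ifftshift(K)))   # Fourier coefficients of K
    k_star = np.argmax(K_tilde)                           # fastest-growing frequency index
    print(name, "peak at k =", k_star)
# Gaussian: peak at k = 0 (DC) -> no columns, one eye dominates everywhere.
# Mexican hat: peak at k > 0 -> w_- alternates in sign with that spatial period.
```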
Ocular Dominance Column
• Simplified model
[Figure: simplified model. Panel A: lateral kernel K and eigenvector e vs. cortical distance (mm). Panel B: Fourier transform K̃ vs. spatial frequency k (1/mm).]
Ocular Dominance Column
• Simplified model: weight matrices for right and left eyes
[Figure: weight matrices W_R and W_L, and the difference W_R - W_L]