Learning Rule
[Title image: Purkinje neurons and glia of rat cerebellum; double fluorescent labelled thin section, two-photon microscopy. Thomas J. Deerinck Digital Image Gallery]
Adaptive Filter Model
Model of Marr-Albus type:
• mossy fibre input analysed by granule cell layer
• re-synthesised by Purkinje cells
Output given by weighted sum:
$z(t) = \sum_i w_i\, p_i(t) = \sum_i w_i\, G_i[y(t)]$
PC implements a linear-in-weights adaptive filter:
$C = \sum_i w_i\, G_i$
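To make the filter concrete, here is a minimal Python sketch. The choice of leaky integrators as the basis filters G_i, and all names and time constants, are illustrative assumptions; the model only requires some fixed bank of filters.

```python
import numpy as np

def basis_filter(y, tau, dt=1e-3):
    """Example G_i: a leaky integrator with time constant tau (seconds)."""
    p = np.zeros_like(y, dtype=float)
    for k in range(1, len(y)):
        p[k] = p[k - 1] + dt * (y[k] - p[k - 1]) / tau
    return p

def parallel_fibre_signals(y, taus):
    """Stack the parallel-fibre signals p_i(t) = G_i[y](t), one row per fibre."""
    return np.stack([basis_filter(y, tau) for tau in taus])

def purkinje_output(weights, P):
    """Purkinje cell output z(t) = sum_i w_i p_i(t)."""
    return weights @ P
```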
Learning Rule
Weights are adapted during complex spikes generated by input on the climbing fibre:
$\delta w_i = -\beta\, e(t)\, p_i(t)$
• learning rule models both LTP and LTD at the synapse
• identical to the LMS rule of adaptive control theory
• learning stops when CF and PF inputs are uncorrelated
  - “decorrelation control”
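A sketch of the corresponding weight update, continuing the code above; beta is an illustrative learning rate, and the batch-averaged form is one common way to write the rule.

```python
def decorrelation_update(weights, e, P, beta=1e-4):
    """Decorrelation (LMS-like) update: delta w_i = -beta * <e(t) p_i(t)>.

    e: climbing-fibre error signal over one training batch, shape (T,)
    P: parallel-fibre signals from parallel_fibre_signals(), shape (n_fibres, T)
    """
    return weights - beta * (P @ e) / P.shape[1]
```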
Motor Error Problem: 3D-VOR
An uncalibrated VOR produces retinal slip.
[Figure: the three components of sensory error (e_horizontal, e_vertical, e_torsional); their mapping onto motor error is unknown (?).]
Forward Architecture
Open loop - full connectivity
Schematic VOR: cerebellar output supplies corrections to the 6 motor commands to the eye muscles
– cells controlling e.g. the superior oblique muscle (SO) must know their contribution to the error
– so-called ‘motor error’
– it is a combination of different components of sensory error
– use of motor error requires a complex ‘reference’ structure
Motor Error Problem
The open-loop architecture forms the basis of previous computational accounts:
– learning requires an unavailable motor error signal
– a hypothesised neural ‘reference’ structure recovers motor error from sensory signals
– climbing fibres carry the motor error signal
but
– the reference structure is an approximate plant inverse
– it has similar complexity to the plant compensation problem it solves, and gets worse for non-linear problems
– experimental evidence for a motor error signal on climbing fibres is uncertain
Recurrent Architecture
For mathematical details see: Porrill, Dean & Stone, 2004
Recurrent Architecture: VOR
In the recurrent loop cerebellar output corrects the control input:
– sensory error is a suitable teaching signal
– no need for reference structures to compute motor error
– decorrelates sensory error from motor command
– greatly simplifies modular, task-based control
Dean, Porrill & Stone, Proc Roy Soc B, 2002
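A toy scalar simulation illustrates the point, under the strong simplifying assumption that plant, brainstem and cerebellum are static gains (all values made up for illustration): the recurrent loop converges with raw sensory error as the teaching signal, and no reference structure is needed.

```python
import numpy as np

# Cerebellar correction z = w*m is fed back (with inhibitory sign) onto
# its own input, the motor command m; retinal slip e teaches w directly.
B, PLANT, beta = 0.5, 4.0, 0.2        # brainstem gain, plant gain, learning rate
w = 0.0                               # cerebellar weight
rng = np.random.default_rng(0)
for batch in range(500):
    x = rng.standard_normal(1000)     # head-velocity signal
    m = B * x / (1.0 + w)             # loop m = B*x - w*m, solved for m
    e = x - PLANT * m                 # retinal slip (zero when eye cancels head)
    w -= beta * np.mean(e * m)        # decorrelation rule on sensory error
print(f"learned w = {w:.2f}  (ideal w = PLANT*B - 1 = {PLANT * B - 1:.2f})")
```

With these numbers the weight converges to the ideal value PLANT*B − 1 = 1, at which point slip and motor command are decorrelated.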
Analysis of Recurrent Loop
Sum-square synaptic weight error
$V = \tfrac{1}{2}\sum_i (w_i - w_i^*)^2$
is a Lyapunov function (it decreases during learning):
$\frac{dV}{dt} = -\beta\, e^2 \le 0$
This gives a stability proof.
Near the solution the learning rule is gradient descent on the mean square output error:
$E = \tfrac{1}{2}\langle e^2 \rangle, \qquad \delta w_i \propto -\frac{\partial E}{\partial w_i}$
(a generalisation of the output error learning property of all-pole adaptive filters; e.g. Sastry, Adaptive Control, 1989)
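Filling in the step the slide leaves implicit: if near the solution the error is approximately the weight mismatch passed through the same filters, $e(t) \approx \sum_i (w_i - w_i^*)\, p_i(t)$ (an assumption of the linear analysis), then under the learning rule $\dot{w}_i = -\beta\, e(t)\, p_i(t)$

```latex
\frac{dV}{dt} = \sum_i (w_i - w_i^*)\,\dot{w}_i
             = -\beta\, e(t) \sum_i (w_i - w_i^*)\, p_i(t)
             \approx -\beta\, e(t)^2 \le 0
```

so V decreases whenever the output error is non-zero.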
Connections to Anatomy
Cerebellum is often embedded in a closed loop circuit (Eccles called
this the ‘dynamic loop’).
Recent investigations have confirmed that these loops are ubiquitous
– “multiple closed loop circuits represent a fundamental architectural
feature of cerebrocerebellar interactions”
and highly specific
– “regions of the cerebellar cortex that receive input from M1 are the
same as those that project to M1”
a challenge:
– “A common closed-loop architecture describes the organisation of
cerebrocerebellar interactions ... A challenge for future studies is to
determine the computations that are supported by this architecture”
Kelly RM & Strick PL, Journal of Neuroscience (2003)
Effect of Delay
There is a delay of up to 100 ms in computing retinal slip
– it is well known that this restricts the useful frequency range for error feedback
– delayed feedback causes instability
A neuroscience ‘folk theorem’ says we can sidestep this
– just use retinal slip as a teaching signal for an adaptive controller
It is not so well known that the resulting learning rule can itself be unstable.
No slip delay: plant compensation
[Figure: error decay during learning (RMS retinal slip, deg/s, vs training batch number) and Bode plot of VOR gain (dB) vs frequency (Hz), pre- and post-training.]
• good plant compensation with a single site of plasticity (parallel-fibre/Purkinje cell)
• but no retinal slip delay
Slip Delay 100 ms: Unstable Learning
[Figure: error decay during learning (RMS retinal slip, deg/s, vs training batch number) and Bode plot of VOR gain (dB) vs frequency (Hz); exact, pre-training and post-training.]
Learning becomes unstable because the sign of the correlation with the delayed teaching signal reverses when the frequency reaches 2.5 Hz: a 100 ms delay shifts the phase of the error by more than 90° above 1/(4 × 0.1 s) = 2.5 Hz.
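A quick numerical check of this sign reversal (parameters are illustrative): for a sinusoid of frequency f, the correlation with a copy delayed by d seconds is proportional to cos(2πfd), which crosses zero at f = 1/(4d) = 2.5 Hz for d = 100 ms.

```python
import numpy as np

d = 0.1                          # retinal-slip delay (s)
t = np.arange(0.0, 10.0, 1e-3)   # 10 s sampled at 1 kHz
for f in [1.0, 2.0, 2.5, 3.0, 4.0]:
    p = np.sin(2 * np.pi * f * t)         # parallel-fibre signal
    e = np.sin(2 * np.pi * f * (t - d))   # delayed teaching signal
    print(f"{f:3.1f} Hz: <e*p> = {np.mean(e * p):+.3f}")  # sign flips above 2.5 Hz
```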
Multiple Sites of Plasticity
Experimental data on VOR adaptation suggest two sites of plasticity
– in the cerebellum
– but also in brainstem nuclei to which the cerebellum projects
If the cerebellum is so powerful, why is a second site needed?
– the cerebellum cannot learn high-frequency gain because of the retinal slip delay
– can the output of the cerebellum act as a surrogate training signal for the structures to which it projects?
This hypothesis suggests novel learning rules for the brainstem neurons involved in the VOR.
Proposed Brainstem Plasticity Rule
[Diagram: head velocity x(t) drives both the cerebellar cortex (output z(t)) and the brainstem pathway B with intrinsic gain g; the two signals are combined with opposite signs.]
The correlation between cerebellar output z(t) and the head velocity signal x(t) in the range 1.5 to 2.5 Hz is used to adjust the intrinsic gain g of the brainstem:
$\delta g = -\beta\, \langle x(t)\, z(t) \rangle_{\mathrm{BP}}$
where BP denotes band-passing above 2 Hz
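A hedged sketch of this update in Python; the learning rate, the filter order, and the reading of “bandpassed above 2 Hz” as a second-order high-pass are all assumptions, not values from the model.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def update_brainstem_gain(g, x, z, beta=1e-3, fs=1000.0, f_corner=2.0):
    """Nudge the brainstem gain g by the filtered correlation of head
    velocity x(t) with cerebellar output z(t): g <- g - beta * <x z>_BP."""
    b, a = butter(2, f_corner / (fs / 2), btype="highpass")
    x_bp, z_bp = filtfilt(b, a, x), filtfilt(b, a, z)
    return g - beta * np.mean(x_bp * z_bp)
```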
Brainstem Plasticity Improves Learning
[Figure: error decay during learning (RMS retinal slip, deg/s, vs training batch number) and Bode plot of VOR gain (dB) vs frequency (Hz); exact, pre-training and post-training.]
• cerebellar cortex learns average gain just below cut-off
• learning rule then transfers cerebellar gain to brainstem
– then gain is approx correct at higher frequencies
– can use an ‘eligibility trace’ to improve on this (see the sketch below)
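One way such an eligibility trace might look, as a hedged sketch extending the earlier decorrelation update: the simplest possible trace just delays each parallel-fibre signal by the known teaching-signal delay, so the error is correlated with the activity that actually caused it (other traces, e.g. low-pass filters, serve the same purpose).

```python
import numpy as np

def traced_update(weights, e, P, delay_steps, beta=1e-4):
    """Decorrelation update against delayed copies p_i(t - d) of the
    parallel-fibre signals, compensating the teaching-signal delay."""
    P_trace = np.roll(P, delay_steps, axis=1)  # shift each p_i later by d
    P_trace[:, :delay_steps] = 0.0             # discard wrap-around samples
    return weights - beta * (P_trace @ e) / P.shape[1]
```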
Electrophysiology
Use in vitro recording techniques from neurons in MVN slices to investigate plastic changes induced in FTN neurons by correlated changes in
– inhibitory cerebellar and
– excitatory vestibular inputs.
The experimental programme has three stages:
1. identification of relevant neurons
2. use of conjunctive stimulation to characterise learning rules
3. investigation of the role of PC inputs by mimicking the action of their GABA neurotransmitters.
[Figure: electrode patched onto an MVN neuron. The potential across the membrane can be measured, and the input current can be manipulated to mimic synaptic input.]
Non-conjunctive stimulation
Intrinsic excitability of the MVN cells can be
regulated by patterns of inhibitory inputs
applied to the cells.
A: Intrinsic excitability of MVN cells is increased following intermittent inhibitory hyperpolarising current injections.
The increase in excitability is long-lasting, and can
be mimicked by drugs that block specific potassium
channels (data not shown)
Drugs which open potassium channels reduce the
intrinsic excitability of MVN neurons (data not
shown)
Regulation of potassium channel function is one
important way in which the firing rate gain of the
MVN neurons may be regulated
B: In contrast to the effects of hyperpolarising current inputs, depolarising current pulses applied in the same patterns do not affect the intrinsic excitability of MVN neurons (Fig. 2B).
Robotics
Algorithms are currently being transferred to a neuro-controller implemented in FPGA hardware
– particularly suitable for implementing real-time distributed algorithms
Mechatronic System for Algorithm Evaluation
Flexible Hardware/Software Solution
• gimbal-mounted 3-DOF camera provides near-human performance
• custom-built 3 degree-of-freedom head movement simulator (not shown) allows us to test algorithm performance
• three MEMS gyroscopes emulate the vestibular system
• visual feedback processing performed on the neuro-controller
Top: gimbal-mounted camera with driver boards for roll/pitch/yaw control
Right: BenNUEY stack; on top are two Virtex-II 2V8000 FPGAs with their heat sinks
Left: 7 × 7 mm single-axis gyroscope
Next Steps
• continue theoretical studies of algorithm performance
• extend modelling to other components of VOR circuitry
• conjunctive stimulation experiments
• hardware and software performance evaluation
• algorithm performance evaluation