10. Supervised learning and reward systems
Fundamentals of Computational Neuroscience, T. P. Trappenberg, 2002.
Lecture Notes on Brain and Computation
Byoung-Tak Zhang
Biointelligence Laboratory
School of Computer Science and Engineering
Graduate Programs in Cognitive Science, Brain Science and Bioinformatics
Brain-Mind-Behavior Concentration Program
Seoul National University
E-mail: [email protected]
This material is available online at http://bi.snu.ac.kr/
Outline
10.1 Motor learning and control
10.2 The delta rule
10.3 Generalized delta rules
10.4 Reward learning
10.1 Motor learning and control
Acting on a large number of training examples without the intention of storing all the specific examples
The learning of motor skills and motor control
Important for the survival of a species
e.g. catching a ball, playing the piano
The brain must be able to direct the control system
Visual guidance
Arm movements with visual signals
Commonly able to adapt to the changed environment within
only a few additional trials
10.1.1 Feedback controller
How limb movements could be controlled by the nervous
system
Feedback control
Fig. 10.1 Negative feedback control and the elements of a standard control system.
How to find and implement an appropriate and accurate motor
command generator
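A minimal Python sketch of such a negative feedback loop; the proportional controller and the first-order dynamics of the controlled object are assumed choices, since the figure does not fix them:

```python
import numpy as np

def simulate_feedback(desired=1.0, gain=2.0, dt=0.01, steps=500):
    """Negative feedback control in the spirit of Fig. 10.1: the motor
    command is proportional to the error between desired and sensed state."""
    state = 0.0
    trajectory = []
    for _ in range(steps):
        error = desired - state      # sensory feedback: desired vs. actual state
        command = gain * error       # motor command generator (proportional, assumed)
        state += dt * command        # controlled object: first-order dynamics (assumed)
        trajectory.append(state)
    return np.array(trajectory)

print(simulate_feedback()[-1])       # the state converges towards the desired 1.0
```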
10.1.2 Forward controller
Refined schemes for motor control with slow sensory
feedback
Forward models
Model the dynamics of the controlled object and the behavior of the sensory system
Fig. 10.2 Forward model controller
10.1.3 Inverse model controller
Refined schemes for motor control with slow sensory
feedback
Inverse model controller
Incorporated as a side-loop to the standard feedback controller; it learns to correct the computation of the motor command generator
Fig. 10.3 Inverse model controller
10.1.4 The cerebellum and motor control
Adaptive controllers are realized in the brain and are vital for
our survival
Fig. 10.4 Schematic illustration
of some connectivity patterns in
the cerebellum. Note that the
output of the cerebellum is
provided by Purkinje neurons
that make inhibitory synapses.
Climbing fibers are specific for each Purkinje neuron and are
tightly interwoven with its dendritic tree.
10.2 The delta rule
Forward and inverse models can be implemented by feedforward mapping networks
How such mapping networks can be trained
To minimize the mean difference between the output of a feedforward mapping network and a desired state provided by a
teacher
Objective function or cost function
Measures the distance between the actual output and the
desired output, E
The mean square error (MSE), where $r_i^{\rm out}$ is the actual output and $y_i$ is the desired output:

$E = \frac{1}{2} \sum_i \big(r_i^{\rm out} - y_i\big)^2$    (10.1)
10.2.1 Gradient descent
Minimize the error function of a single-layer mapping network
By changing the weight values
$\Delta w_{ij} = -k \frac{dE}{dw_{ij}}$    (10.2)

k: learning rate

The gradient of the error function:

$\frac{\partial E}{\partial w_{ij}} = \frac{1}{2}\frac{\partial}{\partial w_{ij}} \sum_i \Big(g\big(\textstyle\sum_j w_{ij} r_j^{\rm in}\big) - y_i\Big)^2 = g'(h_i)\Big(\big(\textstyle\sum_j w_{ij} r_j^{\rm in}\big) - y_i\Big)\, r_j^{\rm in}$    (10.3)

using the chain rule

$\frac{\partial f(g(x))}{\partial x} = \frac{\partial f}{\partial g}\,\frac{\partial g}{\partial x}$    (10.4)

Delta rule: $\Delta w_{ij} = k\,(y_i - r_i^{\rm out})\, r_j^{\rm in}$    (10.5)
Fig. 10.5 Illustration of error minimization with a gradient descent method on a one-dimensional error surface E(w).
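A minimal sketch of this weight update (eqn 10.2) on a one-dimensional error surface like the one in Fig. 10.5; the quadratic toy surface E(w) = (w - 1)^2 is an assumed example:

```python
def gradient_descent(dE_dw, w0=3.0, k=0.1, steps=100):
    """Iterate the update w <- w - k * dE/dw of eqn 10.2 with learning rate k."""
    w = w0
    for _ in range(steps):
        w -= k * dE_dw(w)
    return w

# Toy error surface E(w) = (w - 1)^2 with gradient dE/dw = 2(w - 1);
# following the negative gradient finds the minimum at w = 1.
print(gradient_descent(lambda w: 2.0 * (w - 1.0)))
```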
10.2.2 Batch versus online algorithm
Batch algorithm versus Online learning algorithm
1. Initialize weights to random values.
2. Apply a sample pattern to the input nodes: $r_i^0 = r_i^{\rm in} = \xi_i^{\rm in}$
3. Calculate the rates of the output nodes: $r_i^{\rm out} = g(\sum_j w_{ij} r_j^{\rm in})$
4. Compute the delta term for the output layer: $\delta_i = g'(h_i^{\rm out})(\xi_i^{\rm out} - r_i^{\rm out})$
5. Update the weight matrix by adding the term $\Delta w_{ij} = k\,\delta_i\, r_j^{\rm in}$
6. Repeat steps 2-5 until the error is sufficiently small.
Table 10.1 Summary of delta-rule algorithm
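A sketch of the algorithm of Table 10.1 in Python; the logistic activation function, the OR task, the appended bias input, and the fixed random seed are assumed choices for the demonstration:

```python
import numpy as np

def g(h):
    """Logistic activation function (an assumed choice of g)."""
    return 1.0 / (1.0 + np.exp(-h))

def train_delta_rule(patterns, targets, k=0.5, epochs=1000):
    """Online delta-rule training of a single-layer network (Table 10.1)."""
    n_out, n_in = targets.shape[1], patterns.shape[1]
    w = np.random.default_rng(0).normal(0, 0.1, (n_out, n_in))  # step 1
    for _ in range(epochs):                                     # step 6: repeat
        for x, y in zip(patterns, targets):                     # step 2: sample pattern
            r_out = g(w @ x)                                    # step 3: output rates
            delta = r_out * (1 - r_out) * (y - r_out)           # step 4: g'(h) = g(1-g)
            w += k * np.outer(delta, x)                         # step 5: weight update
    return w

# Example: the OR function; a constant bias input of 1 is appended so
# that the (0,0) pattern can be mapped to a low output.
X = np.array([[0., 0., 1.], [0., 1., 1.], [1., 0., 1.], [1., 1., 1.]])
Y = np.array([[0.], [1.], [1.], [1.]])
w = train_delta_rule(X, Y)
print(np.round(g(X @ w.T)))   # approximately [[0], [1], [1], [1]]
```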
10.2.3 Supervised learning
The delta learning rule depends on knowledge of the desired
output
Supervised learning
Supplies the network with the desired response
The training signal
The climbing fiber in the cerebellum could very well supply such an error signal to the Purkinje cells
The weight changes still take the form of a correlation rule between an error factor and the presynaptic activity
The biological mechanisms underlying synaptic plasticity
Unsupervised learning
Hebbian learning
10.2.4 Supervised learning in multilayer networks
Generalize the delta rule to multilayer mapping networks
The error-back-propagation algorithms or generalized delta rule
The application of multilayer feed-forward mapping networks
(multilayer perceptrons)
Discuss difficulties in connecting the computational step with
brain processes
Strongly restricted number of hidden nodes to achieve good
generalization
There might not be the need in the brain to train multilayer
mapping networks with supervised learning algorithms with
the generalized delta rule
Single-layer networks can represent complicated functions
Expansion recoding
10.3 Generalized delta rules (1)
The gradient of the MSE error function with respect to the output weights:

$\frac{\partial E}{\partial w_{ij}^{\rm out}} = \frac{1}{2}\frac{\partial}{\partial w_{ij}^{\rm out}}\sum_i (r_i^{\rm out} - y_i)^2 = \frac{1}{2}\frac{\partial}{\partial w_{ij}^{\rm out}}\sum_i \Big(f^{\rm out}\big(\textstyle\sum_j w_{ij}^{\rm out} r_j^{\rm h}\big) - y_i\Big)^2 = f^{\rm out\,\prime}(h_i^{\rm out})\Big(\textstyle\sum_j w_{ij}^{\rm out} r_j^{\rm h} - y_i\Big)\, r_j^{\rm h} = \delta_i^{\rm out} r_j^{\rm h}$    (10.6)

The delta factor:

$\delta_i^{\rm out} = f^{\rm out\,\prime}(h_i^{\rm out})\Big(\textstyle\sum_j w_{ij}^{\rm out} r_j^{\rm h} - y_i\Big) = f^{\rm out\,\prime}(h_i^{\rm out})\,(r_i^{\rm out} - y_i)$    (10.7)

The calculation of the gradients with respect to the weights to the hidden layer:

$\frac{\partial E}{\partial w_{ij}^{\rm h}} = \frac{1}{2}\frac{\partial}{\partial w_{ij}^{\rm h}}\sum_i (r_i^{\rm out} - y_i)^2 = \frac{1}{2}\frac{\partial}{\partial w_{ij}^{\rm h}}\sum_i \Big(f^{\rm out}\big(\textstyle\sum_j w_{ij}^{\rm out} f^{\rm h}(\textstyle\sum_k w_{jk}^{\rm h} r_k^{\rm in})\big) - y_i\Big)^2$    (10.8)

This gradient can be written with a delta term for the hidden layer:

$\frac{\partial E}{\partial w_{ij}^{\rm h}} = \delta_i^{\rm h}\, r_j^{\rm in}$    (10.9)

$\delta_i^{\rm h} = f^{\rm h\,\prime}(h_i^{\rm h}) \sum_k w_{ki}^{\rm out} \delta_k^{\rm out}$    (10.10)
10.3 Generalized delta rules (2)
1. Initialize weights to random values.
2. Apply a sample pattern to the input nodes: $r_i^0 = r_i^{\rm in} = \xi_i^{\rm in}$
3. Propagate the input through the network by calculating the rates of the nodes in successive layers $l$: $r_i^l = g(h_i^l) = g(\sum_j w_{ij}^l r_j^{l-1})$
4. Compute the delta term for the output layer: $\delta_i^{\rm out} = g'(h_i^{\rm out})(\xi_i^{\rm out} - r_i^{\rm out})$
5. Back-propagate the delta terms through the network: $\delta_i^{l-1} = g'(h_i^{l-1}) \sum_j w_{ji}^l \delta_j^l$
6. Update the weight matrix by adding the term $\Delta w_{ij}^l = k\,\delta_i^l\, r_j^{l-1}$
7. Repeat steps 2-6 until the error is sufficiently small.
Table 10.2 Summary of error-back-propagation algorithm
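A sketch of the algorithm of Table 10.2 for a network with one hidden layer; the logistic activation function, the network size, and the XOR task are assumed choices:

```python
import numpy as np

def g(h):
    """Logistic activation function (an assumed choice of g)."""
    return 1.0 / (1.0 + np.exp(-h))

def backprop_step(w1, w2, x, y, k=0.5):
    """One online step of error back-propagation (Table 10.2)."""
    r1 = g(w1 @ x)                      # step 3: forward propagation, hidden layer
    r2 = g(w2 @ r1)                     # step 3: forward propagation, output layer
    d2 = r2 * (1 - r2) * (y - r2)       # step 4: output delta term; g' = g(1-g)
    d1 = r1 * (1 - r1) * (w2.T @ d2)    # step 5: back-propagate delta terms
    w2 += k * np.outer(d2, r1)          # step 6: weight updates
    w1 += k * np.outer(d1, x)
    return w1, w2

# Example: XOR, which a single-layer network cannot represent.
rng = np.random.default_rng(0)
w1 = rng.normal(0, 1, (4, 3))           # 4 hidden nodes; 2 inputs plus a bias input
w2 = rng.normal(0, 1, (1, 4))
X = np.array([[0., 0., 1.], [0., 1., 1.], [1., 0., 1.], [1., 1., 1.]])
Y = np.array([[0.], [1.], [1.], [0.]])
for _ in range(5000):                   # step 7: repeat until the error is small
    for x, y in zip(X, Y):
        w1, w2 = backprop_step(w1, w2, x, y)
for x in X:
    print(g(w2 @ g(w1 @ x)))            # outputs should approach 0, 1, 1, 0; unlucky
                                        # initial conditions can stall in a local minimum
```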
10.3.1 Biological plausibility
The back-propagation of error signals is probably the most
problematic feature in biological terms
The non-locality of the algorithm in which a neuron has to
gather the back-propagated error from all the other nodes to
which it projects
Synchronization issues
Disadvantages for true parallel processing
The timing of the delta signals is also problematic:
How can a forward-propagating phase of signals be separated effectively from the back-propagation phase of the error signals?
10.3.2 Advanced algorithms
The basic error-back-propagation algorithm
Convergence performance problem
The learning in the form of statistical learning theories
Improvements over the basic algorithm
Initial conditions
Different error functions
Various acceleration techniques
Hybrid methods
The limitation of the basic error-back-propagation algorithm
Alternative learning strategies
10.3.3 Momentum method and adaptive learning rate
The basic gradient descent method
Typically shows a rapid initial phase
Followed by a phase of very slow convergence
In a shallow part of the error function
Momentum term
Remembers the changes of the weight in the previous time step
$\Delta w_{ij}(t+1) = -k \frac{\partial E}{\partial w_{ij}} + \alpha\, \Delta w_{ij}(t)$    (10.11)

with momentum parameter $\alpha$
The momentum term has the effect of biasing the direction of
the new update vector towards the previous direction
Adaptive learning rate: increase the learning rate when the gradient becomes small
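A minimal sketch of the momentum update of eqn 10.11; the quadratic toy surface and the values of k and alpha are assumed:

```python
def momentum_update(w, grad, prev_dw, k=0.1, alpha=0.9):
    """Weight change of eqn 10.11: the previous update biases
    the new update towards the old direction."""
    dw = -k * grad + alpha * prev_dw
    return w + dw, dw

w, prev_dw = 3.0, 0.0
for _ in range(200):
    grad = 2.0 * (w - 1.0)        # toy error surface E(w) = (w - 1)^2
    w, prev_dw = momentum_update(w, grad, prev_dw)
print(w)                          # close to the minimum at w = 1
```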
10.3.4 Different error functions
Shallow areas in the error function depend on the particular
choice of the error function
Entropic error function:

$E = \frac{1}{2} \sum_i \Big[ (1 + y_i)\,\log\frac{1 + y_i}{1 + r_i^{\rm out}} + (1 - y_i)\,\log\frac{1 - y_i}{1 - r_i^{\rm out}} \Big]$    (10.12)
A proper measure for the information content (or entropy) of
the actual output of the multilayer perceptron given the
knowledge of the correct output
It is not always obvious which error function should be used
Unfortunately, a general strategy for choosing the error function cannot be given
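A direct transcription of eqn 10.12 in Python; the small eps guarding against log(0) for targets of exactly ±1 is an added numerical safeguard:

```python
import numpy as np

def entropic_error(r_out, y, eps=1e-12):
    """Entropic error (eqn 10.12) for outputs and targets in (-1, 1),
    e.g. produced by a tanh activation function."""
    return 0.5 * np.sum(
        (1 + y) * np.log((1 + y + eps) / (1 + r_out + eps))
        + (1 - y) * np.log((1 - y + eps) / (1 - r_out + eps)))

print(entropic_error(np.array([0.2]), np.array([0.9])))  # positive error
print(entropic_error(np.array([0.9]), np.array([0.9])))  # zero when output matches target
```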
10.3.5 High-order gradient methods
The basic line search algorithm of gradient descent is known for its poor performance with shallow error functions
The minimization of an error function
Many other advanced minimization techniques
Take high-order gradient terms into account
Curvature terms
The curvature of the error surface in the weight change
calculations
The calculation of the inverse of the Hessian matrix
Natural gradient algorithm
Levenberg-Marquardt method
10.3.6 Local minima and simulated annealing
A general limitation of pure gradient descent methods
A local minimum of the error surface
The system is not able to approach a global minimum of the error
function
Fig. 10.5 Illustration of error minimization with a gradient descent method on a one-dimensional error surface E(w).
Solution
Stochastic processes
Simulated annealing
Add noise to the weight values
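A minimal sketch of gradient descent with annealed noise added to the weight value; the double-well error surface and the noise schedule are assumed toy choices:

```python
import numpy as np

def noisy_descent(dE_dw, w0, k=0.05, noise0=0.3, steps=2000, seed=1):
    """Gradient descent with additive weight noise whose amplitude is
    slowly reduced, in the spirit of simulated annealing."""
    rng = np.random.default_rng(seed)
    w = w0
    for t in range(steps):
        noise = noise0 / (1.0 + t / 200.0)          # annealing schedule (assumed form)
        w += -k * dE_dw(w) + noise * rng.normal()
    return w

# Toy double-well surface E(w) = w^4 - 2w^2 - 0.5w: local minimum near
# w = -0.95, deeper global minimum near w = +1.06.
dE = lambda w: 4 * w**3 - 4 * w - 0.5
print(noisy_descent(dE, w0=-1.0))                # noise can carry the search over the barrier
print(noisy_descent(dE, w0=-1.0, noise0=0.0))    # pure descent stays trapped near -0.95
```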
10.3.7 Hybrid methods
A variety of methods utilize the rapid initial convergence of the gradient descent method and combine it with global search strategies
After the gradient descent method slows down below an
acceptable level, a new starting point is chosen randomly
Hybrid methods combine the efficient local optimization
capabilities of gradient descent method with the global search
abilities of stochastic processes
Genetic algorithms use similar combinations of deterministic
minimization and stochastic components
10.4 Reward learning
10.4.1 Classical conditioning and temporal credit assignment problem
Learning with reward signals
Conditioning
Fig. 10.6 Classical conditioning and temporal credit assignment problem. A
subject is required to associate the ringing of a bell with the pressing of a button
that will open the door to a chamber with some food reward. In the example the
subject has learned to press the left button after the ringing of the bell. This is an
example of a temporal credit assignment problem. It is difficult to devise a
system that is still open to possible other solutions such as a bigger reward
hidden in the right chamber.
10.4.2 Stochastic escape
The experiment is extended with another chamber (with a rodent)
The other chamber holds a larger food reward
Conditioned: a chance to open the left door after the ringing of the bell
If the rodent always stuck to the initially conditioned situation, it would never learn about the existence of the larger food reward
If the rodent is running around randomly in the button chamber before the bell rings, it could still happen that it presses the right button before running to the left button
The opening right door and the larger food reward change the association of the auditory signal to a new motor action
Stochastic escape can balance habit versus novelty
10.4.3 Reinforcement models
The implementation of a system that learns from reward signals within neural architectures
The inputs to this node represent a certain input stimulus, such as the ringing of the bell
The node gets activated under the right conditions and is therefore able to predict the future reward:

$P(t) = \sum_i w_i(t)\, r_i^{\rm in}(t)$    (10.13)

Fig. 10.7 (A) Linear predictor node.
10.4.4 Temporal delta rule
A reward is given at time t + 1 as a scalar value r(t + 1)
A temporal version of the delta rule:

$w_i(t) = w_i(t-1) + \epsilon\, (r(t) - P(t-1))\, \bar{r}_i^{\rm in}(t-1)$    (10.14)

where $\bar{r}_i^{\rm in}$ is an eligibility trace of the input
The node calculates an effective reinforcement signal

$\tilde{r}(t) = r(t) - P(t-1)$    (10.15)

Rescorla-Wagner theory
The model can produce one-step-ahead predictions of a reward signal
Fig. 10.7 (B) Neural implementation of the temporal delta rule.
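A sketch of eqns 10.13-10.15 in Python; here the eligibility trace is simply taken to be the previous input itself, and the stimulus-reward sequence is an assumed toy example:

```python
import numpy as np

def temporal_delta_rule(stimuli, rewards, eps=0.1):
    """Temporal delta rule (eqns 10.13-10.15): the weights learn to
    predict the reward one time step ahead (Rescorla-Wagner style)."""
    w = np.zeros(stimuli.shape[1])
    for t in range(1, len(rewards)):
        P_prev = w @ stimuli[t - 1]        # prediction P(t-1), eqn 10.13
        r_eff = rewards[t] - P_prev        # effective reinforcement, eqn 10.15
        w += eps * r_eff * stimuli[t - 1]  # weight update, eqn 10.14
    return w

# Example: stimulus component 0 is always followed by a reward one
# step later; its weight converges towards 1.
stimuli = np.tile(np.array([[1.0], [0.0]]), (50, 1))  # alternating on/off
rewards = np.roll(stimuli[:, 0], 1)                   # reward follows stimulus
print(temporal_delta_rule(stimuli, rewards))          # close to [1.]
```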
10.4.5 Reward chain
Learning in the previous model is restricted to the prediction
of reward in the next time step
The ability to predict future reward at different time steps or
even whole series of reward
V(t) takes all the future rewards into account: the reinforcement value

$V(t) = \alpha_1 r(t+1) + \alpha_2 r(t+2) + \alpha_3 r(t+3) + \dots$    (10.16)

The coefficients $\alpha_i$ allow us to specify the weights we give to the reward at different times
A simple realization of such a model:

$V(t) = r(t+1) + \gamma\, r(t+2) + \gamma^2 r(t+3) + \dots$    (10.17)

with $0 \le \gamma < 1$, i.e. $\alpha_i = \gamma^{i-1}$
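A direct transcription of eqn 10.17, truncating the infinite sum to a finite reward sequence:

```python
def reinforcement_value(rewards, gamma=0.9):
    """Discounted reinforcement value V(t) of eqn 10.17, where
    rewards[0] = r(t+1), rewards[1] = r(t+2), and so on."""
    return sum(gamma**i * r for i, r in enumerate(rewards))

# A reward arriving three steps ahead counts less than one arriving now:
print(reinforcement_value([0, 0, 1]))   # gamma^2 = 0.81
print(reinforcement_value([1, 0, 0]))   # 1.0
```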
10.4.6 Temporal difference learning
Temporal difference learning (advanced reinforcement learning)
Predict the reinforcement value at time t correctly:

$P(t) = r(t+1) + \gamma\, r(t+2) + \gamma^2 r(t+3) + \dots$    (10.18)

Predict the correct reinforcement value at the previous time step:

$P(t-1) = r(t) + \gamma\, r(t+1) + \gamma^2 r(t+2) + \dots = r(t) + \gamma\,[r(t+1) + \gamma\, r(t+2) + \dots]$    (10.19)

so that

$P(t-1) = r(t) + \gamma P(t)$    (10.20)

Minimize the temporal difference error:

$\tilde{r}(t) = r(t) + \gamma P(t) - P(t-1)$    (10.21)
Fig. 10.7 (C) Neural implementation of temporal difference learning.
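A sketch of temporal difference learning with the linear predictor of eqn 10.13 and the TD error of eqn 10.21; the two-stimulus episode is an assumed toy example:

```python
import numpy as np

def td_learning(stimuli, rewards, eps=0.1, gamma=0.9, epochs=100):
    """Update the weights of a linear predictor P(t) = w . r_in(t)
    with the temporal difference error of eqn 10.21."""
    w = np.zeros(stimuli.shape[1])
    for _ in range(epochs):
        for t in range(1, len(rewards)):
            td_error = (rewards[t] + gamma * (w @ stimuli[t])
                        - w @ stimuli[t - 1])          # eqn 10.21
            w += eps * td_error * stimuli[t - 1]
    return w

# Example: two successive stimuli, reward after the second one; TD
# learning propagates the prediction back to the earlier stimulus.
stimuli = np.array([[1., 0.], [0., 1.], [0., 0.]])
rewards = np.array([0., 0., 1.])
print(td_learning(stimuli, rewards))   # close to [gamma * 1, 1] = [0.9, 1.0]
```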
10.4.7 Adaptive critic controller
Temporal difference learning is a method of learning to predict future reward contingencies
Adaptive critic
Designed to predict the correct motor command for accurate
future actions
Supervise the motor command generator
Fig. 10.8 Adaptive critic controller.
10.4.8 The basal ganglia in the actor-critic scheme
Fig. 10.9 (A) Anatomical overview of the connections within the basal ganglia and the major projections comprising the input and output of the basal ganglia. (B) Organization within the basal ganglia: processing pathways within the striosomal and matrix modules reflect an architecture that could implement an actor-critic control scheme. C, cerebral cortex; F, frontal lobe; TH, thalamus; ST, subthalamic nucleus; PD, pallidus; SPm, spiny neurons in the matrix module; SPs, spiny neurons in the striosomal module; DA, dopaminergic neurons.
10.4.9 Other reward mechanisms in the
brain
The proposed functional role of the basal ganglia is only one of several hypotheses mentioned in the literature
The details of the biochemical nature of an eligibility trace await experimental verification
The origin of reward learning in the brain is still not well understood
It likely involves some association of reward contingencies with specific motor actions in the brain; implicated structures include:
Amygdala
Orbitofrontal cortex
Dopaminergic neurons
Conclusion
Motor learning
Feedback, forward, and inverse model controllers
The delta rule
Gradient descent
Batch algorithm
Online learning
Supervised learning
Generalized delta rule
Acceleration of delta rule
Reward learning
Classical conditioning
Reinforcement learning
Biological mechanisms of reward learning