learning - Ohio University

Cognitive Neuroscience
and Embodied Intelligence
Universal Learning Models
Based on courses taught by
Prof. Randall O'Reilly, University of Colorado,
Prof. Włodzisław Duch, Uniwersytet Mikołaja Kopernika
and http://wikipedia.org/
http://grey.colorado.edu/CompCogNeuro/index.php/CECN_CU_Boulder_OReilly
http://grey.colorado.edu/CompCogNeuro/index.php/Main_Page
Janusz A. Starzyk
EE141
Task learning
We want to combine Hebbian learning and learning using error
correction, hidden units and biologically justified models.
Hebbian networks model states of the world but not perception-action.
Error correction can learn a mapping. Unfortunately, the delta rule works only
for output units and not for hidden units, because it has to be given
a target value.
Backpropagation of errors can teach hidden units.
But there is no good biological justification for this method…
The idea of backpropagation is simple, but the detailed algorithm
requires many calculations.
Main idea: we look for the minimum of an error function that measures the
difference between the desired behavior and the behavior realized by
the network.
Error function
E(w) – the error function, dependent on all network parameters w; it is the
sum of the errors E(X;w) over all patterns X.
o_k(X;w) – the value obtained at network output k for pattern X.
t_k(X) – the value desired at network output k for pattern X.
For one pattern X and one parameter w:
$E(X; w) = \left( t(X) - o(X; w) \right)^2$
An error value of 0 is not always attainable: the network may not have
enough parameters to learn the desired behavior, so we can only aim for
the smallest error.
At a minimum of the error E(X;w) with respect to the parameter w,
the derivative dE(X;w)/dw = 0.
With many parameters we have all the derivatives dE/dw_i, i.e. the gradient.
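A minimal sketch of what searching for the minimum of E(X;w) can look like for one pattern and one parameter. It assumes a linear output o(X;w) = w·x, which the slide does not specify; the names `error`, `d_error_dw`, `gradient_descent` and the learning rate `lrate` are illustrative only.

```python
def error(w, x, t):
    """Squared error E(X; w) = (t - o)^2, assuming a linear output o = w * x."""
    return (t - w * x) ** 2


def d_error_dw(w, x, t):
    """Analytic derivative dE/dw = -2 * (t - w * x) * x for the linear output."""
    return -2.0 * (t - w * x) * x


def gradient_descent(w, x, t, lrate=0.1, steps=100):
    """Repeatedly move w a small step against the derivative of the error."""
    for _ in range(steps):
        w -= lrate * d_error_dw(w, x, t)
    return w


# One pattern (x = 2, target t = 1): the error is zero at w = 0.5
w_final = gradient_descent(w=0.0, x=2.0, t=1.0)
```

At the value reached by the loop the derivative is (numerically) zero, which is the condition dE(X;w)/dw = 0 stated above.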
Error propagation
The delta rule minimizes the error for one neuron, e.g.
the output neuron, which is reached by signals $s_i$:
$\Delta w_{ik} = \epsilon (t_k - o_k) s_i$
What signals should we take for hidden
neurons?
First we feed signals into the network,
calculating the activations h
(the output signals of the hidden neurons) through all
layers, up to the outputs $o_k$ (forward step).
We calculate the errors $\delta_k = (t_k - o_k)$
and the corrections for the output neurons:
$\Delta w_{ik} = \epsilon \delta_k h_i$.
Error for hidden neurons (backward step,
backpropagation of error):
$\delta_j = \epsilon \left( \sum_k w_{jk} \delta_k \right) h_j (1 - h_j)$
The correction is strongest for undecided units, with activation near 0.5.
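The forward and backward steps can be sketched as follows for a network with one hidden layer. This is an illustration under stated assumptions, not the course's implementation: sigmoid units are assumed in both layers, the output error is taken as t_k − o_k as on the slide, and the learning rate is applied once at the weight update rather than folded into the hidden error. The function names and the starting weights are hypothetical.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def forward(x, w_in, w_out):
    """Forward step: inputs -> hidden activations h -> outputs o."""
    h = [sigmoid(sum(x[i] * w_in[i][j] for i in range(len(x))))
         for j in range(len(w_in[0]))]
    o = [sigmoid(sum(h[j] * w_out[j][k] for j in range(len(h))))
         for k in range(len(w_out[0]))]
    return h, o


def backprop_step(x, t, w_in, w_out, lrate=0.5):
    """One forward/backward step; updates both weight matrices in place."""
    h, o = forward(x, w_in, w_out)
    # Output errors: delta_k = t_k - o_k
    d_out = [t[k] - o[k] for k in range(len(o))]
    # Backward step: delta_j = (sum_k w_jk * delta_k) * h_j * (1 - h_j)
    d_hid = [sum(w_out[j][k] * d_out[k] for k in range(len(o)))
             * h[j] * (1.0 - h[j]) for j in range(len(h))]
    # Corrections for hidden-to-output weights: lrate * delta_k * h_j
    for j in range(len(h)):
        for k in range(len(o)):
            w_out[j][k] += lrate * d_out[k] * h[j]
    # Corrections for input-to-hidden weights: lrate * delta_j * x_i
    for i in range(len(x)):
        for j in range(len(h)):
            w_in[i][j] += lrate * d_hid[j] * x[i]
    return sum(d * d for d in d_out)  # squared error before the update


# Two inputs, two hidden units, one output; a single training pattern
w_in = [[0.1, -0.2], [0.3, 0.4]]
w_out = [[0.2], [-0.1]]
errors = [backprop_step([1.0, 0.0], [1.0], w_in, w_out) for _ in range(200)]
```

The factor h_j(1 − h_j) is largest at h_j = 0.5, so hidden units with undecided activations receive the largest corrections.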
GeneRec
Although most models used in psychology train multilayer perceptron
structures with variants of backpropagation (in this way
one can learn any function), the idea of transferring information about
errors backwards has no biological justification.
GeneRec (General Recirculation,
O’Reilly 1996):
bidirectional signal propagation,
asymmetric weights
$w_{kl} \neq w_{jk}$.
First phase (–): the response of the network
to the input activation $x^-$ gives the output $y^-$;
then (phase +) the desired result $y^+$
is observed and propagated to the input, $x^+$. The
change in weights requires information
about the signals from both phases.
GeneRec - learning
The learning rule agrees with the delta rule:
$\Delta w_{ij} = \epsilon (y_j^+ - y_j^-) x_i^-$
In comparison with backpropagation, the difference of signals $[y^+ - y^-]$
replaces the aggregate error: (the difference of signals) ~ (the
difference of activations) * (the derivative of the activation function),
thus it is a gradient rule.
For biases b the input is $x_i = 1$, so:
$\Delta b_j = \epsilon (y_j^+ - y_j^-)$
Bidirectional information transfer is almost simultaneous; it is responsible for
the formation of attractor states, constraint satisfaction and pattern
completion.
The P300 wave, which appears about 300 ms after activation, reflects
expectations resulting from external activation.
Errors are the result of activity in the whole network; we get slightly
better results by taking the average $[x^+ + x^-]/2$ and retaining the weight
symmetry. This gives the CHL rule (Contrastive Hebbian Learning):
$\Delta w_{ij} = \epsilon (x_i^+ y_j^+ - x_i^- y_j^-)$
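The two weight-update rules can be compared side by side in a short sketch. It assumes that the activations of the sending units (x) and receiving units (y) have already been recorded in the minus and plus phases; the function names and the learning rate `lrate` are illustrative, and the GeneRec form uses the minus-phase sending activation.

```python
def generec_dw(x_minus, y_minus, y_plus, lrate=0.1):
    """GeneRec: dw[i][j] = lrate * (y_plus[j] - y_minus[j]) * x_minus[i]."""
    return [[lrate * (yp - ym) * xm for ym, yp in zip(y_minus, y_plus)]
            for xm in x_minus]


def chl_dw(x_minus, y_minus, x_plus, y_plus, lrate=0.1):
    """CHL: dw[i][j] = lrate * (x_plus[i]*y_plus[j] - x_minus[i]*y_minus[j])."""
    return [[lrate * (xp * yp - xm * ym) for ym, yp in zip(y_minus, y_plus)]
            for xm, xp in zip(x_minus, x_plus)]


# Two sending units, one receiving unit
x_minus, x_plus = [1.0, 0.5], [1.0, 0.5]
y_minus, y_plus = [0.2], [0.8]
dw_generec = generec_dw(x_minus, y_minus, y_plus)
dw_chl = chl_dw(x_minus, y_minus, x_plus, y_plus)
```

When the sending activations are the same in both phases, as in this usage example, the two rules give the same weight changes; they differ only when the plus phase also changes x.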
Two phases
Where does the error for correcting synaptic connections come from?
The layer on the right side = the middle layer at time t+1, e.g.:
a) word pronunciation: an external correction of the action;
b) an external expectation and someone's pronunciation;
c) the expected results of an action and their observation;
d) reconstruction (expected input).
GeneRec properties
Hebbian learning creates a model of the world, remembering
correlations, but it is not capable of learning task execution.
Hidden layers allow for the transformation of a problem and error
correction permits learning of difficult task execution, the relationships
of inputs and outputs.
The combination of Hebbian learning (correlations x y) and error-based
learning can learn everything in a biologically plausible manner:
CHL leads to symmetry; an approximate symmetry will suffice, and
connections are generally bidirectional. Err = CHL in the table.
Lack of Ca2+ = no learning; little Ca2+ = LTD; much Ca2+ = LTP.
LTD – unfulfilled expectations: only phase –, lack of + reinforcement.
Combination of Hebb + errors
It's good to combine Hebbian learning
and CHL error correction.
CHL is like socialism:
• tries to correct errors of the whole,
• limits unit motivation,
• common responsibility,
• low effectiveness,
• planned activity.
Hebbian learning is like capitalism:
• based on greed,
• local interests,
• individualism,
• efficacy of activity,
• lack of monitoring of the whole.

                  Advantages                 Disadvantages
Hebb (Local)      autonomous, reliable       narrow, greedy
Error (Remote)    purposeful, cooperative    interdependent, lazy
Combination of Hebb + errors
It's good to combine Hebbian learning and CHL error correction
Correlations and errors:
Combination
Additionally, inhibition within layers is necessary:
• it creates economical internal representations,
• units compete with each other and only the best remain, specialized,
• it makes self-organized learning possible.
Simulation of a difficult problem
Genrec.proj.gz, chapt. 5.9
3 hidden units.
Learning stops after 5 epochs
without error.
Errors during learning show substantial
fluctuations: networks with recurrence
are sensitive to small changes in weights
and explore different solutions.
Compare with learning easy and difficult
tasks using only Hebb.
Inhibitory competition as a constraint
Inhibition
Leads to sparse distributed representations
(many representations, only some are useful in a concrete situation)
Competition and specialization: survival of the best adapted
Self-organized learning
Often more important than Hebbian learning
Inhibition was also used in the
mixture of experts framework:
• gating units are subject to WTA competition
• they control the outputs of the experts
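The effect of inhibitory competition can be illustrated with a hard k-winners-take-all function. This is a deliberately simplified sketch: it zeroes all but the k most active units, whereas a simulator may instead compute a shared inhibition level for the layer. The function name `kwta` is illustrative; k = 1 gives plain WTA competition.

```python
def kwta(activations, k):
    """Keep the k largest activations and set all others to 0."""
    if k <= 0:
        return [0.0] * len(activations)
    # Indices ordered from the most to the least active unit
    ranked = sorted(range(len(activations)),
                    key=lambda i: activations[i], reverse=True)
    winners = set(ranked[:k])
    return [a if i in winners else 0.0 for i, a in enumerate(activations)]


sparse = kwta([0.1, 0.9, 0.4, 0.7], k=2)   # -> [0.0, 0.9, 0.0, 0.7]
single = kwta([0.1, 0.9, 0.4, 0.7], k=1)   # -> [0.0, 0.9, 0.0, 0.0]
```

Only a few units stay active for any input, which is what produces a sparse distributed representation.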
Comparison of weight change in learning
View of hidden layer weights in Hebbian learning:
• Neural weights are introduced in reference to particular inputs
View of hidden layer weights in error correction learning:
• The weights seem fairly random when compared with Hebbian learning
Comparison of weight change in learning
Charts comparing a) training errors and b) number of cycles as functions
of the number of training epochs for three different learning methods:
• Hebbian (Pure Hebb)
• Error correction (Pure Err)
• Combination (Hebb & Err), which attained the best results
Full Leabra model
6 principles of intelligent system construction:
1. Biological realism
2. Distributed representations
3. Inhibitory competition
4. Bidirectional activation propagation
5. Error-driven learning
6. Hebbian learning
Inhibition within layers, Hebbian learning + error correction for weights
between layers.
Generalization in attractor networks
GeneRec by itself does not generalize well.
Simulation: Ch6, model_and_task.proj.
learn_rule = PURE_ERR, PURE_HEBB
or HEBB_AND_ERR
35 training patterns, testing every 5 epochs
on the remaining 10 patterns.
Learning by error correction only does not
give good results. The parameter hebb
controls how much CHL and how much
Hebbian correlation learning is used.
Pure_err implements only CHL. Check the + and – learning phases.
Generalization requires good internal representations = strong
correlations; error correction by itself does not lead to sufficiently
strong internal representations, only Hebb + kWTA will do it.
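How a single parameter can mix the two kinds of learning is sketched below for one weight. The mixing form, the Hebbian term y⁺(x⁺ − w) and the default values are assumptions made for this illustration; the slides only state that the parameter hebb controls the proportion of CHL and Hebbian learning. The function name `combined_dw` is hypothetical.

```python
def combined_dw(w, x_minus, y_minus, x_plus, y_plus, hebb=0.01, lrate=0.01):
    """Weight change mixing a Hebbian term and the CHL error term.

    hebb = 0 gives pure error correction (CHL), hebb = 1 pure Hebbian learning.
    """
    # Hebbian (correlational) term, assumed here: moves w toward x when y is active
    d_hebb = y_plus * (x_plus - w)
    # Error-driven term (CHL): plus-phase product minus minus-phase product
    d_err = x_plus * y_plus - x_minus * y_minus
    return lrate * (hebb * d_hebb + (1.0 - hebb) * d_err)


pure_err = combined_dw(0.5, 1.0, 0.2, 1.0, 0.8, hebb=0.0, lrate=1.0)
pure_hebb = combined_dw(0.5, 1.0, 0.2, 1.0, 0.8, hebb=1.0, lrate=1.0)
mixed = combined_dw(0.5, 1.0, 0.2, 1.0, 0.8, hebb=0.5, lrate=1.0)
```

Intermediate values of `hebb` interpolate linearly between the two pure rules.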
Generalization: plots
Black line = cnt_err, training data error.
Red = unq_pats, counts how many
input lines are uniquely represented by the
hidden layer (max=10).
Blue = gen_Cnt, evaluates generalization
on 10 new lines – 8 errors at the end.
Weights appear random.
Generalization is poor when based only on error
correction, since there are no conditions forcing
good internal representations.
Batch Run repeats 5 times (slow).
Generalization: error correction + Hebb
Fast convergence; good internal representations are formed.
Generalization
How do we deal with things which we've never seen before,
e.g. every time we enter the classroom, every meeting, every sentence
that we hear, etc.?
We always encounter new situations, and we generalize
to them reasonably.
How do we do this?
Good representations
Internal distributed representations.
New concepts are combinations of existing properties.
Hebbian learning + competition based on inhibition constrain error
correction so that good representations are created.
Generalization in attractor networks
The GeneRec rule itself doesn't lead to good generalization.
Simulations: model_and_task.proj.gz, Chapt. 6
The hebb parameter controls how
much CHL and how much Hebb is used.
Pure_err realizes only CHL; check
the – and + phases.
Compare internal representations for
different types of learning.
Deep networks
To learn difficult problems, many transformations are necessary,
strongly changing the representation of the problem.
Error signals become weak and
learning is difficult.
We must add constraints and
self-organizing learning.
Analogy:
Balancing several connected sticks is
difficult, but adding self-organizing
learning between fragments will
simplify this significantly – like adding
a gyroscope to each element.
Sequential learning
Besides object and relationship recognition and task execution,
sequential learning is important, e.g. the sequence of words in
sentences:
The dog bit the man.
The man bit the dog.
The child lifted up the toy.
I drove through the intersection because the car on the right was just
approaching.
The meaning of words, gestures, behaviors, depends on the sequence,
the context.
Time plays a fundamental role: the consequences of the appearance of
pattern X may be visible only with a delay, e.g. the consequences of the
position of pieces during a game are only evident after a few moves.
Network models react immediately – how do brains do this?
Family tree
Example simulation: family_trees.proj.gz, Chapt. 6.4.1
24 people = agents; relations:
husband, wife, son, daughter, father,
mother, brother, sister, aunt, uncle,
cousin.
Generalization requires finding
the relations between people.
What is still missing? Temporal and sequential relationships!
Family tree
How to learn family relations?
Enter all relations according to the tree below. Learning takes 40 epochs,
whereas BP needs 80.
Hebbian learning alone is not sufficient.
Init_cluster + Cluster run.
Sequential learning
Cluster plot showing the representation of hidden layer neurons
• a) before learning
• b) after learning using a combined Hebbian and error-correction method
The trained network has two branches corresponding to two families.
Sequential learning
Categories of temporal relationships:
• Sequences with a given structure
• Delayed in time
• Continuous trajectories
The context is represented in the frontal
lobes of the cortex
• it should affect the hidden layer.
We need recurrent networks, which
can hold onto context information
for a period of time.
Simple Recurrent Network (SRN), the Elman network:
• The context layer is a copy of the hidden layer
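One time step of a simple recurrent network can be sketched as below. Sigmoid units and the layer sizes are assumptions of this illustration, and the names `srn_step`, `w_in`, `w_ctx` and `w_out` are hypothetical; the only property taken from the slide is that the context layer is a copy of the previous hidden layer.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def srn_step(x, context, w_in, w_ctx, w_out):
    """One step of a simple recurrent (Elman) network.

    The hidden layer receives the current input and the context layer;
    the returned context is a copy of the new hidden activations.
    """
    n_hid = len(context)
    hidden = [sigmoid(sum(x[i] * w_in[i][j] for i in range(len(x)))
                      + sum(context[c] * w_ctx[c][j] for c in range(n_hid)))
              for j in range(n_hid)]
    output = [sigmoid(sum(hidden[j] * w_out[j][k] for j in range(n_hid)))
              for k in range(len(w_out[0]))]
    return output, list(hidden)


# One input unit, two hidden/context units, one output unit
w_in = [[0.5, -0.5]]
w_ctx = [[1.0, 0.0], [0.0, 1.0]]
w_out = [[1.0], [1.0]]
out1, ctx1 = srn_step([1.0], [0.0, 0.0], w_in, w_ctx, w_out)
out2, ctx2 = srn_step([1.0], ctx1, w_in, w_ctx, w_out)
```

The same input presented twice gives different outputs, because the second step also sees the context left behind by the first.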
Sequential learning
Biological justification for context representation
Frontal lobes of the cortex
• Responsible for planning and performing temporal activities.
• People with damaged frontal lobes have trouble performing
the sequence of an activity even though they have no
problem with the individual steps of the activity.
• Frontal lobes are responsible for temporal representations.
• For example, words such as “fly” or “pole” acquire
meanings based on the context.
• Context is a function of previously acquired information.
• People with schizophrenia can use context directly before an
ambiguous word but not context from a previous sentence.
Context representations not only lead to sequential
behavior but are also necessary for understanding
sequentially presented information such as speech.
Examples of sequential learning
Can we discover the rules of
sequence creation?
Examples:
BTXSE
BPVPSE
BTSXXTVVE
BPTVPSE
Are these sequences acceptable?
BTXXTTVVE
TSXSE
VVSXE
BSSXSE
A machine with successive
transitions produces these
behaviors.
As studies have shown, people can learn more quickly to
recognize letters produced according to a specific pattern,
even if they don't know the rules being used.
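A machine of this kind can be written down as a transition table. The diagram of the machine is not part of the transcript, so the table below is an assumption: it is the standard Reber grammar, with which the four example strings above are consistent. The names `REBER`, `accepts` and `generate` are illustrative.

```python
import random

# Assumed transition table (standard Reber grammar): state -> {letter: next state}
REBER = {
    0: {"B": 1},
    1: {"T": 2, "P": 3},
    2: {"S": 2, "X": 4},
    3: {"T": 3, "V": 5},
    4: {"X": 3, "S": 6},
    5: {"P": 4, "V": 6},
    6: {"E": 7},
}
FINAL = 7


def accepts(seq):
    """Return True if the letter sequence follows the transitions to the end."""
    state = 0
    for letter in seq:
        nxt = REBER.get(state, {}).get(letter)
        if nxt is None:
            return False
        state = nxt
    return state == FINAL


def generate(rng):
    """Produce one string by choosing randomly among the allowed transitions."""
    state, letters = 0, []
    while state != FINAL:
        letter = rng.choice(sorted(REBER[state]))
        letters.append(letter)
        state = REBER[state][letter]
    return "".join(letters)


sample = generate(random.Random(0))
```

Calling `accepts` on each of the test strings listed above answers the question on the slide under this assumed grammar.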
Network realization
The network randomly chooses one of
two possible states.
Hidden/contextual neurons learn to
recognize machine states, not only
labels.
Behavior modeling:
the same observations but different
internal states => different decisions
and next states.
Project fsa.proj.gz,
chapt. 6.6.3
Temporal delay and reinforcement
The reward (reinforcement) often follows with a delay, e.g. when learning a
game or behavioral strategies.
Idea: we have to foresee sufficiently early
which events lead to a reward.
This is done by the temporal differences
algorithm
(Temporal Differences, TD; Sutton).
Where does reward come from in the brain?
The midbrain dopaminergic system
modulates the activity of the basal ganglia
(BG) through the substantia nigra (SN), and
the frontal cortex through the ventral
tegmental area (VTA). It's a rather
complicated system, whose actions are
related to the evaluation of impulses/actions
from the point of view of value and reward.
Temporal delay and reinforcement
The ventral tegmental area (VTA)
is part of the reward system.
VTA neurons deliver the
neurotransmitter dopamine (DA) to
the frontal lobes and the basal
ganglia modulating learning in this
area responsible for planning and
action.
More advanced regions of the brain are responsible for producing
this global learning signal
Studies of patients with damage in the VTA area indicate its role in
predicting reward and punishment.
Anticipation of reward and result
Anticipation of reward and reaction to the decision (Knutson et al., 2001)
Basal ganglia BG
VTA neurons first learn to react to reward and then to predict ahead of
time the appearance of a reward.
Formulation sketch – TD algorithm
We need to determine a value function, the sum over all future rewards,
where rewards further away in time are less important:
The adaptive critic (AC) learns how to estimate the value function V(t).
At every point in time, the AC tries to predict the value of the reward.
This can be done recursively:
Error of the predicted reward:
The network tries to reduce this error.
The name of the algorithm, TD (temporal difference), refers to
the error in the calculation of the value function over a period of
time.
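The formulas of this slide are not present in the transcript. A sketch of the standard TD formulation that matches the three descriptions above (value function, recursive form, prediction error) is given here. The discount factor γ (0 < γ ≤ 1), the angle brackets for the expected value and the symbol δ for the error are notation assumed for this sketch.

```latex
\begin{align*}
V(t) &= \left\langle \gamma^{0} r(t) + \gamma^{1} r(t+1) + \gamma^{2} r(t+2) + \dots \right\rangle \\
\hat{V}(t) &= r(t) + \gamma \hat{V}(t+1) \\
\delta(t) &= \left( r(t) + \gamma \hat{V}(t+1) \right) - \hat{V}(t)
\end{align*}
```

The first line is the discounted sum of future rewards, the second is its recursive form in terms of the estimate of the adaptive critic, and the third is the difference between the two sides of the recursion, which learning tries to reduce.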
Network implementation
Prediction of activity and error.
Conditioned stimulus CS for t=2
Unconditioned stimulus (reward) US for t=16
Adaptive critic AC
rl_cond.proj.gz
Initially large error for
Time=16 because the
reward r(16) is unexpected.
Two-phase implementation
(Phase +) computes the expected size of the reward at time t+1 (value r).
(Phase –) in step t-k predicts t-k+1, at the end r(tk).
The function value V(t+1) in phase + is carried over to the value V(t) in phase –.


1
Vˆ  (t  1)  Vˆ  (t  1)
CS for t=2
US for t=16
Learning progresses backwards in time, affecting the value of the previous
step.
Two-phase implementation
The system learns that the stimulus (tone) predicts the reward.
Input CSC (Complete Serial Compound) uses unique elements for
each stimulus at each point in time.
Chapt. 6.7.3, proj.
rl_cond.proj.gz
This is not a very realistic model of classical conditioning.
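The conditioning experiment can be imitated with a few lines of tabular TD learning. This is a sketch under stated assumptions and not the rl_cond project itself: the complete serial compound is represented by one weight per time step from CS onset, the TD error is taken as r(t) + γV(t+1) − V(t), and the trial length, learning rate, number of trials and γ = 1 are arbitrary choices. Only the CS time (t=2) and US time (t=16) come from the slides.

```python
def train_td_csc(n_steps=20, cs_time=2, us_time=16,
                 n_trials=200, lrate=0.2, gamma=1.0):
    """Tabular TD learning with a complete-serial-compound input.

    Each time step from CS onset has its own weight, so V(t) = w[t] for
    t >= cs_time and V(t) = 0 before the CS appears.
    Returns the weights and the TD errors of every trial.
    """
    w = [0.0] * n_steps
    history = []
    for _ in range(n_trials):
        deltas = []
        for t in range(n_steps):
            r = 1.0 if t == us_time else 0.0
            v_t = w[t] if t >= cs_time else 0.0
            has_next = t + 1 < n_steps and t + 1 >= cs_time
            v_next = w[t + 1] if has_next else 0.0
            delta = r + gamma * v_next - v_t   # TD error at time t
            if t >= cs_time:
                w[t] += lrate * delta          # reduce the prediction error
            deltas.append(delta)
        history.append(deltas)
    return w, history


weights, history = train_td_csc()
first_trial, last_trial = history[0], history[-1]
```

In the first trial the error is large at t=16, where the reward arrives unexpectedly. Over trials the predicted value spreads backwards in time, the error at t=16 vanishes, and the remaining error sits at the transition into the CS, which itself cannot be predicted.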