Memory - Ohio University


Computational Intelligence
Memory
Based on a course taught by
Prof. Randall O'Reilly
University of Colorado and
Prof. Włodzisław Duch
Uniwersytet Mikołaja Kopernika
Janusz A. Starzyk
EE141
General remarks
Memory is any persistent effect of experience.
Memory is seemingly uniform, but in reality it is very differentiated:
spatial, visual, aural, recognition, declarative, semantic, procedural,
explicit, implicit …
Here we examine mechanisms, so the primary division is:
- Synaptic memory (physical changes in synapses): long-term, and it must be activated to influence functioning.
- Dynamic memory: active, temporary activations that affect current functioning.
- Long-term priming, based on synaptic memory and subject to fast modification; semantic and procedural memory are the result of slow processes.
- Short-term priming, based on active memory.
General remarks
[Figure: Memory Types. Short-term memory (STM, working memory) versus long-term memory (LTM); LTM divides into declarative (facts, events) and nondeclarative memory. Remaining labels: manual skills, conditioning, emotional, priming, motor; parietal cortex, prefrontal cortex, limbic system, nuclei, cerebellum, neocortex.]
3 regions
PC – posterior parietal cortex and motor cortex: distributed representations, spatial memory, long-term priming, associations, inferences, schemas.
FC – prefrontal cortex: isolated representations, control of interference, working memory.
HC – hippocampal formation: episodic memory, spatial memory, declarative memory, sparse representations, good separation of images (patterns).
Slow learning of statistically relevant relationships => procedural and semantic memory, cortical; fast learning => episodic memory, HC.
Retaining active information while simultaneously accepting new information, e.g. multiplying 12*6 in your head, requires FC.
Slow/rapid learning
A neuron learns a conditional probability: the correlation between the desired activity and the input signals. The optimal value of 0.7 is reached stably only with a small learning constant of 0.005; a larger constant reacts faster but keeps fluctuating.
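The effect of the learning constant can be sketched in a few lines. This is only an illustration, not the simulator's code: it assumes a single weight updated by w += lrate * (x - w), where x is 1 with probability 0.7, and the starting weight, step count and seed are arbitrary choices.

```python
import random

def track_probability(lrate, p=0.7, steps=20000, seed=0):
    """Weight trajectory of w += lrate * (x - w), with x ~ Bernoulli(p)."""
    rng = random.Random(seed)
    w = 0.5  # arbitrary starting weight (assumption)
    history = []
    for _ in range(steps):
        x = 1.0 if rng.random() < p else 0.0
        w += lrate * (x - w)
        history.append(w)
    return history

def mean(values):
    return sum(values) / len(values)

def spread(values):
    """Standard deviation of the values around their mean."""
    m = mean(values)
    return (sum((v - m) ** 2 for v in values) / len(values)) ** 0.5

# Look only at the second half of each run, after the initial transient.
slow = track_probability(lrate=0.005)[-10000:]
fast = track_probability(lrate=0.1)[-10000:]
print(f"lrate=0.005: mean {mean(slow):.3f}, spread {spread(slow):.3f}")
print(f"lrate=0.1:   mean {mean(fast):.3f}, spread {spread(fast):.3f}")
```

Both runs see the same event sequence; the small constant averages over many events and stays close to 0.7, while the large one follows each individual event and fluctuates more.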
Every experience is a small fragment of uncertain, potentially useful
knowledge about the world => stability of one's image of the world requires
slow learning, integration leads to forgetting individual events.
Relevant new information is learned after a single exposure.
Lesions of the hippocampal formation cause anterograde amnesia.
The neuromodulation system reaches a compromise between stability and plasticity.
Complementary learning systems
Active memory and priming
Distributed overlapping representations in the PC
can efficiently record information about the world, but
this is not very precise and blurs with the passage of
time.
FC – prefrontal cortex, stores isolated
representations; increases memory stability.
The effects of priming are evident in people with a
damaged hippocampus, cortical priming in the PC is
possible.
We will differentiate many forms of priming by:
- duration (short-term, long-term),
- type of information (visual, lexical),
- similarity (repetition, semantic).
Priming
Standard: stem completion. After reading a list of words we get a word stem and must add the ending, e.g.
rea---
If "reaction" was on the list earlier, then it is usually chosen.
The interval can be about an hour, so active memory can't be responsible for this.
Homophones: read, reed.
Completion: "It was found that the ...eel is on the ...", in which the last word is "orange", "wagon", "shoe" or "table", is heard as:
"peel is on the orange",
"wheel is on the wagon",
"heel is on the shoe",
"meal is on the table".
Priming model
Project wt_priming.proj, Chapter 9 from
(http://grey.colorado.edu/CompCogNeuro/index.php/CECN1_Wt_Priming)
View Events: the first 3 have the same input images, but different output images, in
total 13 pairs x 2 outputs = 26 combinations, IA - IB
Note: we are not yet learning the AB-AC lists, just the effect of learning.
Exploring the model
View TrainLog and evaluation of
the result:
similarity of the output image,
summarized as a yellow line, the
name of the most similar event,
measured by sm_nm = binary
errors in the names of the closest
events, part of the result not very
similar to the given: A  B.
In blue both_err = 1 only if this isn't one of the two acceptable output
images.
Noise helps to break out of impasses, but it also slightly destabilizes already-learned images.
Further tests
Test_logs: first we will check if there are some tendencies, and then if we
can teach a network to change preference after the presentation of IA
and then IB.
wt_update=Test, Test does one epoch, check Trial1_TextLog:
ev_nm is either IA, or IB, and sm_nm is either 0 or 1, randomly.
In Epoch1_TextLog we can see that there is always one of the two
results, in sum 13/26, or half the time: there is no tendency.
We check whether one exposure changes anything.
wt_update => On_Line, learning after every event,
Run Test, the frequency increases significantly to 18 and then 25 times.
Conclusions: just error reduction gives mixed outputs A and B, a network
without kWTA won't learn this task.
The parietal cortex can be responsible for long-term priming.
AB-AC Learning
People are able to learn two lists of word pairs, A-B and then A-C, e.g.
window-mind
bike-trash
....
and then:
window-train
bike-cloud
without greater interference, doing well on tests for AB and AC.
Networks with only error correction forget catastrophically!
Interference results from using the same elements and weights to learn
different associations.
It's necessary to use different units, or to learn with context.
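The difference between shared and separate units can be shown with a toy delta-rule associator. This is a sketch under stated assumptions, not the ab_ac_interference project: the lists, layer sizes, learning rate and the one-hot coding are all hypothetical, and "conjunctive" here simply means that each (cue, list) pair gets its own input unit.

```python
N_ITEMS, N_OUT = 5, 10  # hypothetical list length and number of responses

def train(weights, pairs, lrate=0.5, epochs=50):
    """Delta rule with one-hot inputs: only the active unit's weights change."""
    for _ in range(epochs):
        for unit, target in pairs:
            for o in range(N_OUT):
                desired = 1.0 if o == target else 0.0
                weights[unit][o] += lrate * (desired - weights[unit][o])

def accuracy(weights, pairs):
    """Fraction of cues whose strongest output is the trained response."""
    hits = 0
    for unit, target in pairs:
        row = weights[unit]
        hits += row.index(max(row)) == target
    return hits / len(pairs)

def make_lists(conjunctive):
    # Shared coding: cue i always activates input unit i.
    # Conjunctive coding: cue i in list AC activates a different unit.
    offset = N_ITEMS if conjunctive else 0
    ab = [(i, i) for i in range(N_ITEMS)]                     # A_i -> B_i
    ac = [(i + offset, i + N_ITEMS) for i in range(N_ITEMS)]  # A_i -> C_i
    return ab, ac

def run(conjunctive):
    ab, ac = make_lists(conjunctive)
    weights = [[0.0] * N_OUT for _ in range(2 * N_ITEMS)]
    train(weights, ab)   # learn list AB first
    train(weights, ac)   # then list AC
    return accuracy(weights, ab), accuracy(weights, ac)

shared_ab, shared_ac = run(conjunctive=False)
separate_ab, separate_ac = run(conjunctive=True)
print("shared units:   AB", shared_ab, "AC", shared_ac)
print("separate units: AB", separate_ab, "AC", separate_ac)
```

With shared units the AC training overwrites the same weights that stored AB, so AB recall is lost; with separate units both lists survive.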
AB-AC Model
Project ab_ac_interference.proj
(http://grey.colorado.edu/CompCogNeuro/index.php/CECN1_ABAC_List_Learning)
View Events_AB, Events_AC,
Output: either B or C; the context differentiates.
Replication of catastrophic forgetting:
View: Train_graph_log, red = errors, yellow = tests for AB.
The test shows that after learning AC the network forgets AB; many units in the hidden layer take part in the learning of both lists.
AB-AC Model
hid_kwta 12=>4 to decrease the number of active units.
Test again: no improvement.
Increase the variance of the initial weights: wt_var 0.25=>0.4
Stronger influence of context: fm_context 1=>1.5
Hebbian learning: hebb 0.01=>0.05
Decrease the learning rate: lrate => 0.1, Batch
None of this clearly helps, but the catastrophes are less likely...
Two systems of learning are clearly necessary, a fast one and a
slow one – cortex and hippocampus.
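The idea behind lowering hid_kwta can be illustrated with a hard k-winners-take-all selection. This is a minimal sketch and an assumption about the mechanism: the simulator may implement kWTA differently (for example through a computed inhibition level), and the activation values below are made up.

```python
def kwta(activations, k):
    """Keep the k most active units and silence the rest (ties broken by index)."""
    order = sorted(range(len(activations)),
                   key=lambda i: activations[i], reverse=True)
    winners = set(order[:k])
    return [a if i in winners else 0.0 for i, a in enumerate(activations)]

hidden = [0.9, 0.1, 0.4, 0.8, 0.3, 0.7, 0.2, 0.6]  # hypothetical activations
print(kwta(hidden, 4))  # four units stay active
print(kwta(hidden, 2))  # a sparser code: two units stay active
```

Fewer winners means fewer units shared between two lists, which is why a smaller k is expected to reduce, though not remove, interference.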
Hippocampus
Anatomy and connections of the structures of the hippocampal formation: signals arrive from uni- and multimodal association areas through the entorhinal cortex (EC).
More anatomy
Hippocampus = king of the cortex
Bidirectional connections with the
entorhinal cortex:
olfactory bulb,
cingulate cortex,
superior temporal gyrus (STG),
insula,
orbitofrontal cortex.
More anatomy
Sparse activation
Representations in CA3 and CA1
are focused on specific
stimuli, while in the
subiculum and the entorhinal
cortex they are strongly
distributed.
Hippocampal formation
Model contains structures:
dentate gyrus (DG),
areas CA1 and CA3,
entorhinal cortex (EC).
Pct Act = % of activation.
Separation and conjunction of images
The hippocampus rapidly associates various cortical representations.
It creates episodic memory.
It completes activations recreated from memory and separates them into clearly distinct meanings.
Sparse encoding eases the separation of meanings.
CA1 separates by conjunction of images (representations).
It is also able to recreate the original activation from the EC through invertible connections.
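Why sparse encoding eases separation can be sketched with random projections. This is an illustration under assumptions, not the hippocampus project: the hidden layer size, the 2% activity level, the Gaussian weights and the two input patterns are hypothetical; only the input size of 144 with 1 of 4 units active follows the EC description in these slides.

```python
import random

def top_k(net_inputs, k):
    """Indices of the k units with the largest net input."""
    order = sorted(range(len(net_inputs)),
                   key=lambda i: net_inputs[i], reverse=True)
    return set(order[:k])

def overlap(a, b):
    """Fraction of the units active in a that are also active in b."""
    return len(a & b) / len(a)

rng = random.Random(1)
N_IN, N_HID, K = 144, 2000, 40     # hypothetical: 2% of hidden units active
pattern_a = set(range(0, 36))      # 36 of 144 inputs active (1 of 4)
pattern_b = set(range(9, 45))      # shares 27 of its 36 active inputs with A

# Random Gaussian weights stand in for untrained projections (assumption).
weights = [[rng.gauss(0.0, 1.0) for _ in range(N_IN)] for _ in range(N_HID)]

def encode(pattern):
    """Sparse code: the K hidden units that receive the most input."""
    net = [sum(row[i] for i in pattern) for row in weights]
    return top_k(net, K)

in_overlap = overlap(pattern_a, pattern_b)
out_overlap = overlap(encode(pattern_a), encode(pattern_b))
print(f"input overlap {in_overlap:.2f}, sparse-layer overlap {out_overlap:.2f}")
```

Two inputs that share three quarters of their active units end up sharing a clearly smaller fraction of active units in the sparse layer, because only units in the extreme tail of the input distribution win.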
Model of the hippocampus
Project hip.proj
(http://grey.colorado.edu/CompCogNeuro/index.php/CECN1_Hippocampus)
Input signals enter through the entorhinal
cortex (EC_in), to the dentate gyrus
DG and the CA3 area,
DG also influences CA3, where received
signals can be completed through
associations.
CA3 has strong internal connections. CA1
has more distributed sparse
representations => EC_out.
EC: 144 units = 4*36; 1 of 4 active.
DG: 625 units, CA3: 240 units
CA1: 384 units = 12 columns * 32 units
Exploration of the hippocampal model
Learning of AB – AC associations without interference.
Autoassociations: EC_in = EC_out, reversible transformations.
BuildNet, View_Train_Trial_Log will show the statistics.
The input includes information about the input and output images and
the list.
StepTrain: units chosen in the previous step have white outlines.
Partial overlapping of images in EC_in, DG, CA3, CA1.
Training epoch: 10 list elements + 3 test sets: AB, AC, new
View Test_Logs => text and graph log
train_updt = no_updt to the test log,
Run will do 3 epochs, the results are in Text_log: 70% remembered from the AB list and 100% from the AC list.
Set test_updt = no_updt, the network will finish the 3 training/test epochs more rapidly.
Test analysis: test_updt = Cycle_updt, Clear Trial1_1_Text_log
StepTest: we see only A + context, and we see how the image completes.
Further exploration
Targ in Network shows what image was learned, act  targ
In TextLog,
stim_er_on = proportion of units erroneously activated in EC_out,
stim_er_off = erroneously not activated in EC_out.
In Trial_1_GraphLog we can see these two numbers after every test: for known images they are small (correct memories); for new ones they are large, on ~0.5 and off ~0.8, so the network rarely fails.
To move to list AC we turn off Test_updt = Trial_updt (or no_updt)
and StepTest until in text_log, epc_ctrl changes to 1. These are events
for list AC: the network does not recognize them (rmbr=0) because it
hasn't learned them yet.
Train_Epcs=5, train_env=Train_AC,
Run and check results.
Summary
The hippocampal model can rapidly, sequentially learn associations
AB – AC without excessive interference. For this it was sufficient to
use the Hebbian contrast rule, CPCA and the correct architecture.
Interference results from using the same units; CA3 achieves separation of identical images (representations) learned in another context.
Separation of images doesn't allow associations, inferences based on
similarity, efficient encoding of multidimensional information.
The conjunction of images happens in CA1.
This suggests a complementary role of the hippocampus,
supplementing the slow learning mechanisms of the cortex.
The hippocampus can remember episodes helping in spatial
orientation, create conjunctive representations connecting different
stimuli together quicker than the cortex.
Memory
Memory is not uniform
1. Weights (long-term, require activation) vs activations (short-term, already active, can influence processing)
2. Based on weights
   - The cortex shows priming but suffers from catastrophic interference.
   - The hippocampus can learn fast without interference, using sparse distributed representations of images.
3. Based on activation
   - The cortex shows priming but isn't good for short-term memory.
4. Cooperation of activation and memory based on weights
5. Video
   1. Short-term memory in chimpanzees, 30 sec
   2. Comparison with students, 30 sec
Active short-term memory
Short-term priming: attention and influence on reaction speed.
Besides the duration, memory content and effects resulting from similarity are like
long-term priming.
Project act_priming.proj. (http://grey.colorado.edu/CompCogNeuro/index.php/CECN1_Act_Priming)
Stem completion or homophones, but without learning: only the influence of the residual activation from the previous trial remains.
The network has learned the series IA-IB.
The test has a series of images with results A and B. We present an image with A on the output and the network responds A; then we present the image for B with only the minus phase turned on (no learning), and the network's result is sometimes A, sometimes B.
LoadNet, View TestLogs, Test
The correlation with the previous results A and B depends on the speed at which activation fades; check the effect of act_decay 1 => 0: a tendency to stay with A.
Analyze the influence on the results in test_log.
Active maintenance
Project act_maint.proj (http://grey.colorado.edu/CompCogNeuro/index.php/CECN1_Active_Maintenance):
active maintenance of information in working memory despite
interference, quickly accessible, doesn't require synaptic changes.
Recurrence is necessary: an attractor network with a large basin of attraction, resistant to noise.
Video – remembering with delay – 30 sec
The processes of analysing environmental data don't require such networks,
because they are steered by incoming information.
Activation should spread, enabling associations and inferences; while we have external signals this will suffice, e.g. if we note the results of intermediate operations on paper.
Without external activations, we have to rely on actively maintained representations in working memory, which has serious limits (Miller's famous 7±2, and even 4±2 for complex objects).
First a model without attractors, which requires external signals; then distributed representations with shallow attractors that are not very resistant to noise; finally deep but localised attractors, which prevent associations.
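The role of recurrence in maintenance can be sketched with a single unit that excites itself. This is a toy illustration, not the act_maint project: the sigmoid activation, the gain, the threshold and the step counts are assumptions chosen so that the unit has one low and one high stable state.

```python
import math

def step(act, ext_input, w_rec, gain=10.0, theta=0.5):
    """One update of a unit with self-excitation w_rec (sigmoid activation)."""
    return 1.0 / (1.0 + math.exp(-gain * (w_rec * act + ext_input - theta)))

def run(w_rec, on_steps=5, off_steps=20):
    """Present a stimulus for on_steps, remove it, return the final activity."""
    act = 0.0
    for t in range(on_steps + off_steps):
        ext_input = 1.0 if t < on_steps else 0.0  # stimulus, then delay
        act = step(act, ext_input, w_rec)
    return act

with_recurrence = run(w_rec=1.0)
without_recurrence = run(w_rec=0.0)
print(f"activity after the delay, recurrent unit:     {with_recurrence:.3f}")
print(f"activity after the delay, non-recurrent unit: {without_recurrence:.3f}")
```

The recurrent unit stays in its high state after the input is removed, while the unit without recurrence falls back to baseline as soon as the external signal disappears.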
Maintenance model
Project act_maint.proj.
3 objects, 3 elements (features)
r.wt, View Grid_log, Run: while there is input, activation is maintained, but after the input is removed it disperses (the network state blurs...).
Check the influence of wt_mean = 0.5, wt_var = 0.1, 0.25, 0.4
Net_Type Higher_order: we add combinations of feature pairs.
Defaults, Run, add noise_var=0.01, the network forgets...
Isolated representations
Default to return to initial parameters.
network = IsolatedNet
Lack of connections between hidden units, but
there is recurrence, activation doesn't fade.
Noise = 0.01 doesn't interfere, but with 0.02
sometimes gets ruined.
Is it worth learning to focus in spite of noise?
Different task: does stimulus S(t) = S(t+2)?
Parameters: input_data = MaintUpdateEnv,
network Isolated, noise 0.01
Init, Run: there are two inputs, Input 1 and 2,
wt_scale 1=>2, changes the strength of local connections.
The network can be switched from fast updating to long-lasting maintenance.
How to do this automatically?
Dopamine and dynamic regulation of reward in the PFC.
Working memory
The prefrontal cortex plays the central role in maintaining active working memory and has the desired properties: isolated, self-activating attractor networks with wide basins of attraction.
Neuroanatomy, PFC connections and microcolumns => a specialized area for active memory.
A. PR – spatial.
B. PR - spatial, self-ordered tasks.
C. PR - spatial, object and verbal, self-ordered tasks and analytical
thinking.
D. PR - objects, analytical thinking.
Typical experiments require delayed choice and show the differences
between PC, IT, which have only temporary stimulus representations,
and PFC, which maintains them longer.
Role of dopamine
Blocking dopamine has a negative influence on working memory, and enhancing it has a positive influence.
TD – temporal difference learning in RL.
Dopamine (DA) arrives from the VTA (ventral tegmental area).
DA strengthens internal activations, regulating access to working memory.
The VTA displays such increased activity.
Basal ganglia can also regulate PFC activity.
Working memory
Project pfc_maint_updt.proj (http://grey.colorado.edu/CompCogNeuro/index.php/CECN1_PFC_Maint_Updt)
A dynamic "gate" AC is added to the network with recurrence and learning based on temporal differences (TD).
Inputs: A, B, C, D.
Ignore, Store, Recall decide what to do with them.
PFC is working memory, AC = adaptive critic is a reward system
(dopamine) controlling information renewal in the PFC, hidden layer
represents the parietal cortex, hidden 2 maps to the output (frontal
cortex). AC learns to predict the next reward, modulating the strength of
internal PFC connections.
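How a critic can learn to predict a delayed reward is sketched below with tabular TD(0). This is not the project's AC implementation: the chain of states (think of Store, two delay steps, Recall), the discount gamma, the learning rate alpha and the episode count are hypothetical choices for illustration.

```python
def td_values(n_states=4, gamma=0.9, alpha=0.1, episodes=2000):
    """TD(0) on a chain of states; a reward of 1 arrives after the last state."""
    values = [0.0] * (n_states + 1)  # final entry: terminal state, value 0
    for _ in range(episodes):
        for s in range(n_states):
            reward = 1.0 if s == n_states - 1 else 0.0
            td_error = reward + gamma * values[s + 1] - values[s]
            values[s] += alpha * td_error  # the critic's prediction update
    return values[:n_states]

values = td_values()
print([round(v, 3) for v in values])
```

The predictions propagate backwards from the rewarded step: each earlier state converges to the later state's value multiplied by gamma, so the critic signals the coming reward already at the first state.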
PFC Model
r.wt: one-to-one connections between input, hidden layers and the PFC.
AC has connections with the hidden layer and the PFC, but reverse
connections AC => PFC serve only to modulate.
Act, Step: we observe phases – and +, at first the activation of PFC and
AC is zero, there are two + steps, first to change PFC weights, and then
to set the correct signal propagation.
When signal R (Recall) appears, the network will not act correctly at first; the reward in AC is 0.
At first the network doesn't know what's going on, learning only on Store,
Ignore hidden layer 2, but sometimes noise in the PFC will cause the
correct result and reward to appear.
View Epoch_log, observe the change in weight of unit AC, r.wt
Weights of S => AC should increase and error will decrease, the yellow
line is the number of incorrect predictions of AC.
View, Grid_log, Clear, act, Step. Store introduces data to the PFC, but Ignore doesn't. After Recall, PFC is zeroed.
A- not B
Interactions between active and synaptic memory - weights have already
changed but active memory is in a different state: what wins?
These interactions are visible in the developing brains of children ~ 8
months (Piaget 1954), experiments done also on animals.
A toy (food) is hidden in box A and after a short delay the child (animal)
can remove it from there. After several repetitions in A, the toy is hidden in
box B; the children keep looking in A.
Active memory doesn't work in children
as efficiently as synaptic memory,
lesions in the area of the prefrontal
cortex cause similar effects in adult
and infant rhesus monkeys.
Children make fewer errors looking in
the direction of the place where the toy
was hidden, than reaching for it. There
are many interesting variants of this
type of experiment and explanations
on different levels.
Project A- not B
Decision-making process model: we know that information about place
and objects is divided, so this information is given on input: place A, B, C,
toy T1 or T2 and cover C1 or C2.
Synaptic memory is realized with the help of standard CPCA Hebbian
learning, and active memory as bi-directional connections between
network representations in the hidden layer.
Output layers: decisions about the direction of looking and reaching.
The direction of looking is always activated
during each experience, reaching is
activated less often, only after moving the
whole set-up toward the child, so these
connections will rely on weaker learning.
Initial tendency: agreement of looking and
reaching on A (weight 0.7). All inputs
connected with hidden neurons, weight 0.3.
Project a_not_b.proj.
(http://grey.colorado.edu/CompCogNeuro/index.php/CECN1_A_Not_B)
Experiment 1
rect_ws =0.3 decides on the strength of
recurrent activations in the hidden layer
(working memory), changing this parameter
simulates a child's development.
View Events: 3 types of events, initial showing
4x, then A 2x, then B 1 x. An event has 4
temporal segments:
1) start, pretrial – boxes covered;
2) presentation, toy hidden in A;
3) expectation – toy in A;
4) choice – possible reaching.
Only visible elements are active.
View: Grid_log, Run performs the entire
experiment, turns off display.
ViewPre shows on Grid_log, A is activated
ViewA shows A tests, after learning.
ViewB shows B tests: the network makes an error.
Further experiments
Activation in the hidden layer flows toward the representation associated with A.
rect_ws 0.3 => 0.75 for a more mature child.
Run, ViewB
Although synaptic memory didn't change, more efficient working memory enables the correct action.
Try rect_ws = 0.47 and 0.50
What happens? There is no activity – hesitation?
The results depend on the length of the delay; with a shorter delay there are fewer errors.
Delay 3=>1
Do tests for rect_ws = 0.47 and 0.50
What happens with a very young child?
rect_ws = 0.15, delay = 3;
Weak recurrence, weak learning for A.
Other types of memory
The traditional approach to memory assumes functional, cognitive,
monolithic, canonical representations in memory.
From modeling, it turns out that there are many systems interacting with
each other which are responsible for memory, with different
characteristics, variable representations and types of information.
Recognition memory: was an element of the list seen earlier?
A "recognition" signal is enough, remembering is not necessary.
A hippocampus model is also useful here, it allows for remembering, but
this is too much – in recognition memory the central role seems to be
played by the area of the perirhinal cortex.
Cued recall - completion of missing information.
Free recall – effects of placement on the list (best at the beginning and
the end), as well as grouping (chunking) of information.
Learning categories
Categorization in psychology - many theories. Classic experiments:
Shepard et al. (1961), Nosofsky et al. (1994).
Problems with an increasing degree of complexity: division into categories C1, C2, with 3 binary properties: color (black/white), size (small/large), shape (two shapes).
Type I: one property defines the category.
Type II: two properties, XOR, e.g. Cat A: (black, large) or (white, small), any shape.
Type III-V: one property + increasingly more exceptions.
Type VI: no rule, enumeration.
Difficulty of learning: Type I < II < III ~ IV ~ V < VI
Canonical dynamic
What happens in the brain while learning category definitions based on
examples? Complex neurodynamics <=> the simplest dynamics
(canonical). For all logical rules, we can write corresponding equations.
For type II problems, or XOR:
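$$V(x,y,z) = 3xyz + \frac{1}{4}\left(x^2 + y^2 + z^2\right)^2$$
$$\dot{x} = -\frac{\partial V}{\partial x} = -3yz - \left(x^2 + y^2 + z^2\right)x$$
$$\dot{y} = -\frac{\partial V}{\partial y} = -3xz - \left(x^2 + y^2 + z^2\right)y$$
$$\dot{z} = -\frac{\partial V}{\partial z} = -3xy - \left(x^2 + y^2 + z^2\right)z$$
[Figure: feature area]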
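The canonical dynamics for the XOR case can be checked numerically. The sketch assumes the potential V(x, y, z) = 3xyz + (x^2 + y^2 + z^2)^2 / 4 with gradient dynamics dx/dt = -dV/dx (and likewise for y and z); the Euler step size, iteration count and starting points are arbitrary choices.

```python
def grad_v(x, y, z):
    """Gradient of V(x, y, z) = 3xyz + (x^2 + y^2 + z^2)^2 / 4."""
    r2 = x * x + y * y + z * z
    return (3 * y * z + r2 * x, 3 * x * z + r2 * y, 3 * x * y + r2 * z)

def settle(point, step=0.05, iterations=2000):
    """Follow the negative gradient of V with simple Euler steps."""
    x, y, z = point
    for _ in range(iterations):
        gx, gy, gz = grad_v(x, y, z)
        x, y, z = x - step * gx, y - step * gy, z - step * gz
    return x, y, z

end_1 = settle((0.5, 0.3, -0.2))    # starts where only z is negative
end_2 = settle((-0.2, -0.4, -0.3))  # starts where all three are negative
print([round(v, 3) for v in end_1])
print([round(v, 3) for v in end_2])
```

Starting points whose coordinates have a negative product settle into the corner with the same signs, such as (1, 1, -1) or (-1, -1, -1). Reading the third coordinate as the category, the attractors at these corners encode the XOR relation between the first two.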
Against majority
List: diseases C or R, symptoms PC, PR, I
Disease C is associated with symptoms (PC, I),
disease R with (PR, I); C happens 3 times more often
than R. (PC, I) => C, PC => C, I => C.
Predictions "against majority" (Medin, Edelson 1988).
Although PC + I + PR => C (60%),
PC + PR => R (60%)
Neurodynamic attractor basins?
PDF in areas {C, R, I, PC, PR}.
Psychological interpretation (Kruschke 1996): PR is significant because it is a differentiating symptom, although PC is more common. The activation PR + PC more often leads to result R because the gradient in the direction of R is greater.
Learning
Point of view: Neurodynamics vs. Psychology
Neurodynamics: I+PC is more common => stronger synaptic connections, larger and deeper attractor basins.
Psychology: Symptoms I, PC are typical for C since they happen more often.

Neurodynamics: To avoid attractors around I+PC leading to C, a deeper and more localized attractor around I+PR is created.
Psychology: For rare disease R, symptom I is not distinct, so attention focuses on PR associated with R.
Testing
Point of view: Neurodynamics vs. Psychology
Neurodynamics: Activating only I leads to C since more examples of I+PC create a larger shared attractor basin than I+PR.
Psychology: I => C, in accordance with expectations; the more frequent stimuli I+PC are recalled more often.

Neurodynamics: Activation by I+PC+PR leads frequently to C, because I+PC puts the system in the middle of the large C basin, and even with PR the gradients still lead to C.
Psychology: I+PC+PR => C because all symptoms are present and C is more frequent (base rates again).

Neurodynamics: Activation by PR+PC leads more frequently to R because the attractor basin for R is deeper, and the gradient at (PR,PC) leads to R.
Psychology: PC+PR => R because PR is a distinct symptom, although PC is more common.
Summary
- Knowledge formed in memory is built, dynamic, continuous, emerging.
- Behavior and inhibition of knowledge are the result of dynamic information processing rather than of interaction structures set at the top.
- Recognition is based on the ability to differentiate earlier-learned activations from new, unknown activations.
- The hippocampus ensures high-quality recognition with a high threshold guaranteeing association of earlier-learned activations.
- Priming contributes to the slow building of invariant representations.
- Two learning mechanisms:
  - based on connection weights
  - based on neuron activation
Summary
- The cortex helps recognition by priming.
- The cortex leads to unstimulated associations.
- The cortex is responsible for working memory, cooperating with the hippocampus.
- Sequences of grouped representations are stored in long-term memory.
- Memory based on activation requires combining fast updating with stable representations.
- The hippocampus uses sparse distributed representations for fast learning without mixing ideas.
- Priming memory can be long-term (based on weights) or short-term (based on activation).