Transcript 14/15 April 2008

Bioinspired Computing
Lecture 16
Associative Memories
with
Artificial Neural Networks
Netta Cohen
Last time
Recurrent neural nets:
• Biologically realistic architecture
• Dynamic interactive behaviour

Today
Attractor neural nets:
• Biologically-inspired associative memories
• Also… steps away from the biologically realistic model
• Natural learning protocols
• Unsupervised learning
• Applications
Recurrent Nets: Pros & Cons
Pros
• Biologically-realistic architecture/performance
• Complex self-sustained activity
• Distributed representations
• Dynamic interactions with the environment
• Powerful computation
• Noise tolerance
• Graceful degradation
Cons
• Hard to formalise in information-processing terms
• Hard to visualise activity
• Hard to train, with no guarantee of convergence
• No guaranteed solution
Attractor neural nets are a special case of recurrent nets.
Training Methods for Recurrent Networks
• Genetic algorithms
– Usually in a controller
– Fitness evaluated on the controller, not the network
– Requires a sophisticated genome
• Backpropagation through time (BPTT)
– Understand the principle
[Figures: two diagram-only slides (labels +1 and -1).]
Training Methods for Recurrent Networks
• Jets and Sharks: interesting dynamic behaviour, but weights set by hand
• Looking for an automatic, ‘grounded’ way of training networks
• Hopfield (today)!
• Elman (also today… turn a feed-forward net into something recurrent)
Associative Memory
The imprinting and recollection of memories is an important component of what we do & how we process information.
If we were to model these processes, here are a few conditions we might want to include in our model:
• Store and reliably recall multiple independent memories.
• Given only partial input, recall complete information, or
• Given noisy input, recall noise-free prototype information.
• Learn new memories in a biologically realistic manner.
• Recall memories fast enough (before the next input is received).
• Once recalled, maintain attention or memory long enough (for information processing & transmission elsewhere in the brain).
Attractor Neural Nets
In RNNs, the state of the system is dictated both by internal dynamics & the system’s response to inputs from the environment.
[Figure: a cross-section of an energy landscape over the state space.]
In some cases, trajectories in state space can be guaranteed to lead to one of several stable states (fixed points, cycles or generic attractors).
Attractor Neural Nets (cont.)
Dynamical systems such as RNNs could serve as models of associative memory if it were possible to encode each memory in a specific stable state or attractor.
In 1982, John Hopfield realised that by imposing a couple of restrictions on the architecture of the nets, he could guarantee the existence of attractors, such that every initial condition would necessarily evolve to a stable solution, where it would stay.
This is tantamount to the requirement that the above picture be described in terms of an energy landscape.
Attractor Neural Nets: Architecture
[Figure: two nodes i and j connected by weights wij and wji, with no self-connection (wii = 0).]
• All connections are symmetric: wij = wji
• No self-connectedness: wii = 0
“Gerard Toulouse has called Hopfield’s use of symmetric connections a ‘clever step backwards from biological realism’. The cleverness arises from the existence of an energy function.”*
The existence of an energy function provides us with:
• A formalism of the process of memory storage and recall
• A tool to visualise the activity (both learning and recall)
• A straightforward way to train the net
• Once trained, a guaranteed solution (recall of the correct memory).
* Hertz, Krogh & Palmer, Introduction to the Theory of Neural Computation (1990).
How Does It Work?
Nodes are modelled by conventional binary MP neurons. Each neuron serves both as an input and an output unit. (There are no hidden units.)
States are given by the pattern of activity of the neurons (e.g. 101 for a network with three neurons). The number of neurons sets the maximum length of a memory bit-string.
Different patterns can be simultaneously stored in the network. The number of independent patterns that can be remembered is less than or equal to the number of nodes.
Memory recall corresponds to a trajectory taking the system from some initial state (input) to the local energy minimum (the closest association).
Each step along the (recall) trajectory results in the same or a lower energy. Since the energy is bounded from below, a solution is guaranteed for every input.
A working example
[Worked example (figure): a five-neuron Hopfield net with a given 5×5 weight matrix (columns: neuron i = 1…5; rows: neuron j = 5…1). Starting from the input state [ 0 1 0 0 0 ] at t = 0, every neuron is updated synchronously, and the network states at t = 1, 2, 3, 4, … are traced, together with each neuron’s weighted sum, until the activity settles.]
Update rule: at each time step, neuron j receives the input ∑i xi wij; with a threshold of 0, it switches on (1) if this input exceeds the threshold and off (0) otherwise.
Exercise: repeat this example with an initial input of [ 0 1 0 1 0 ].
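A minimal Python sketch of this synchronous update rule, for readers who want to try the exercise. The weight matrix W below is only a stand-in (the slide’s actual matrix lives in the figure above); substitute the lecture’s 5×5 matrix to reproduce the trace exactly.

```python
# Minimal sketch of the synchronous update used in the worked example.
# NOTE: this W is a made-up symmetric, zero-diagonal matrix; replace it with
# the 5x5 matrix shown on the slide to reproduce the slide's trace.
import numpy as np

W = np.array([[ 0,  3, -1,  2, -3],
              [ 3,  0,  1,  0,  2],
              [-1,  1,  0,  4, -2],
              [ 2,  0,  4,  0,  1],
              [-3,  2, -2,  1,  0]], dtype=float)

def synchronous_step(x, W, threshold=0.0):
    """One synchronous update of all binary (0/1) MP neurons."""
    inputs = W @ x                          # summed input to every neuron
    return (inputs > threshold).astype(float)

def run(x0, W, max_steps=10):
    x = np.array(x0, dtype=float)
    print("t=0:", x.astype(int))
    for t in range(1, max_steps + 1):
        x_new = synchronous_step(x, W)
        print(f"t={t}:", x_new.astype(int))
        if np.array_equal(x_new, x):        # fixed point reached
            break
        x = x_new
    return x

run([0, 1, 0, 0, 0], W)                     # the slide's input
run([0, 1, 0, 1, 0], W)                     # the exercise
```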
More general examples
Stability:
The stable pattern reached in the working example represents a fixed point in the dynamics. While stable solutions are guaranteed, not all stable solutions are fixed-point solutions.
State update rule:
This example used a “synchronous updating” method. Asynchronous (sequential or random) updating methods can also be implemented.
Asynchronous random updating
• Pick any node
• Update it
• Repeat
Example difference between asynchronous and synchronous updating (see the sketch below):
• Hopfield network with two nodes
• W = 1
• Initial state -1, 1
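A small sketch of this two-node example (mine, not the slide’s worked figures), taking “W = 1” to mean a single symmetric connection w12 = w21 = 1: synchronous updating flips both nodes at once and oscillates between (-1, 1) and (1, -1), while asynchronous updating settles onto a fixed point as soon as one node has been updated.

```python
# Two-node Hopfield net: w12 = w21 = 1, no self-connections (an assumption
# about what "W = 1" denotes). Compare synchronous vs asynchronous updating.
import numpy as np

W = np.array([[0.0, 1.0],
              [1.0, 0.0]])

def sgn(h, old):
    """Sign update that keeps the old state when the input is exactly zero."""
    return old if h == 0 else np.sign(h)

# Synchronous: both nodes read the *old* state of the other -> the net oscillates.
x = np.array([-1.0, 1.0])
print("synchronous: ", x)
for _ in range(4):
    x = np.array([sgn(W[0] @ x, x[0]), sgn(W[1] @ x, x[1])])
    print("synchronous: ", x)

# Asynchronous: update one node at a time -> the net reaches a fixed point.
x = np.array([-1.0, 1.0])
print("asynchronous:", x)
for i in (0, 1, 0, 1):          # any order works; random order also converges
    x[i] = sgn(W[i] @ x, x[i])
    print("asynchronous:", x)
```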
0,1 or -1,1
• No essential difference!
• Suppose we use weights wij for nodes that take values -1, 1 (a numerical check follows below)
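One standard way to make the “no essential difference” claim concrete (my sketch, not taken from the slides): a -1/+1 net with weights wij behaves exactly like a 0/1 net using the same weights, provided the 0/1 net replaces the threshold of 0 with per-node thresholds θi = ½ ∑j wij. The check below verifies this numerically on a random symmetric weight matrix.

```python
# Numerical check that a {-1,+1} Hopfield net with weights w_ij behaves
# identically to a {0,1} net with the same weights and thresholds
# theta_i = 0.5 * sum_j w_ij (a sketch; the slide does not spell this mapping out).
import numpy as np

rng = np.random.default_rng(0)
N = 8
W = rng.integers(-3, 4, size=(N, N)).astype(float)
W = (W + W.T) / 2            # symmetric weights
np.fill_diagonal(W, 0.0)     # no self-connections

theta = 0.5 * W.sum(axis=1)  # thresholds for the 0/1 formulation

s = rng.choice([-1.0, 1.0], size=N)    # +/-1 state
x = (s + 1) / 2                        # corresponding 0/1 state

for step in range(20):
    h = W @ s                          # field in the +/-1 net (old state)
    s = np.where(h > 0, 1.0, np.where(h < 0, -1.0, s))   # keep old state on ties

    u = W @ x                          # input in the 0/1 net (old state)
    x = np.where(u > theta, 1.0, np.where(u < theta, 0.0, x))

    assert np.array_equal(x, (s + 1) / 2), "the two formulations diverged"

print("0/1 and -1/+1 dynamics agree at every step")
```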
Trajectories in energy landscape
Where does energy come in? The formalism we need to answer this question comes from physics and requires slight modifications to our notation:
1) Neuron coding: [0, 1] → [-1, 1]
2) The threshold now becomes a “sign” function:
   sign(input): x = 1 if input > 0; x = old x if input = 0; x = -1 if input < 0
3) Asynchronous update
[Figure: a spin glass.]
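For reference, a one-function sketch of this modified sign rule, with the convention that a node keeps its previous state when its input is exactly zero:

```python
# Sketch of the modified sign function used from here on: +/-1 outputs,
# keeping the previous state when the summed input is exactly 0.
def sign_update(node_input: float, old_x: int) -> int:
    """Return the new +/-1 state of a node given its summed input."""
    if node_input > 0:
        return 1
    if node_input < 0:
        return -1
    return old_x   # input == 0: keep the old state
```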
From energies to MP neurons
Define the energy of node i as Ei = - xi inputi, where the input to node i is the weighted sum over all neurons: inputi = ∑j xj wij.
This is called the mean-field approximation: the “magnetic field” at each node (each spin) corresponds to a weighted average over all the fields generated by all other spins.
When a specific node senses this field, it wants to align with the mean field, thus reducing its energy. This is the update rule: xi ← sign(inputi).
We have just re-discovered that the MP update rule exactly corresponds to magnetic field alignment in spin glasses!
Asynchronous Random Updating
• Easy to show that updating minimises energy
• Example given … (a numerical sketch follows below)
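A numerical sketch of that claim (mine, not the lecture’s example): repeatedly pick a node at random, align it with its field, and confirm that the global energy E = -½ ∑ij wij xi xj never increases.

```python
# Sketch: asynchronous random updates of a symmetric, zero-diagonal Hopfield
# net never increase the global energy E = -1/2 * sum_ij w_ij x_i x_j.
import numpy as np

rng = np.random.default_rng(1)
N = 10
W = rng.normal(size=(N, N))
W = (W + W.T) / 2            # symmetric weights
np.fill_diagonal(W, 0.0)     # no self-connections

def energy(x, W):
    return -0.5 * x @ W @ x

x = rng.choice([-1.0, 1.0], size=N)
E = energy(x, W)
for step in range(200):
    i = rng.integers(N)               # pick any node at random
    h = W[i] @ x                      # its summed input (local field)
    if h != 0:
        x[i] = np.sign(h)             # align with the field (keep old state if h == 0)
    E_new = energy(x, W)
    assert E_new <= E + 1e-12, "energy increased - should never happen"
    E = E_new

print("final energy:", E)
```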
Attractor Neural Nets
[Figure: two nodes i and j connected by weights wij and wji, with no self-connection (wii = 0).]
The restrictions imposed on the recurrent net are now:
• All connections are symmetric: wij = wji
• No self-connectedness: wii = 0
“Gerard Toulouse has called Hopfield’s use of symmetric connections a ‘clever step backwards from biological realism’. The cleverness arises from the existence of an energy function.”*
The existence of an energy function provides us with:
• A formalism of the process of memory storage and recall
• A tool to visualise the activity (both learning and recall)
• A straightforward way to train the net
• Once trained, a guaranteed solution (recall of the correct memory).
Training the Net
We need to find the set of weights that encode a single pattern p, of length N bits, as a minimum-energy solution.
The minimum energy is obtained when the output of node i exactly matches the input to that node:
pi = sign( ∑j=1…N wij pj )
Plug in a guess for the weights, wij = (1/N) pi pj, so that
pi = sign( ∑j=1…N (1/N) pi pj pj ) = sign(pi)   OK
(Since pj pj = 1 for ±1 patterns, the sum reduces to pi, so the stored pattern is indeed a fixed point.)
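A short sketch of this single-pattern storage, setting wii = 0 as the architecture requires (the zero diagonal is not written explicitly in the formula on this slide):

```python
# Store a single +/-1 pattern p with the one-shot Hebbian weights
# w_ij = (1/N) p_i p_j (zero diagonal) and check that p is a fixed point
# of the sign update rule.
import numpy as np

rng = np.random.default_rng(2)
N = 16
p = rng.choice([-1.0, 1.0], size=N)        # a random pattern to store

W = np.outer(p, p) / N                     # w_ij = (1/N) p_i p_j
np.fill_diagonal(W, 0.0)                   # no self-connections

recalled = np.sign(W @ p)                  # one synchronous sign update
print("pattern is a fixed point:", np.array_equal(recalled, p))
```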
Training the Net (cont.)
This weight assignment is remarkably reminiscent of Hebbian learning: if two nodes are spiking at the same time, then the weight connecting them is strengthened. Here, anti-correlated nodes result in negative (inhibitory) weights.
For gradual learning, the weight change is Δwij ∝ (1/N) pi pj. Only patterns introduced repeatedly will result in the formation of new memories; noise will be ignored.
Now generalising for M memories or patterns:
wij = (1/N) ∑m=1…M pi(m) pj(m)      (Generalised Hebb Rule)
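A sketch of the generalised Hebb rule in code; the noisy-cue recall at the end is my own illustration, with a zero diagonal assumed as before:

```python
# Generalised Hebb rule: w_ij = (1/N) * sum_m p_i^(m) p_j^(m),
# followed by recall of one stored pattern from a corrupted cue.
import numpy as np

rng = np.random.default_rng(3)
N, M = 100, 5                                   # 100 neurons, 5 random patterns
patterns = rng.choice([-1.0, 1.0], size=(M, N))

W = (patterns.T @ patterns) / N                 # sum of outer products, scaled by 1/N
np.fill_diagonal(W, 0.0)                        # no self-connections

def recall(cue, W, sweeps=20):
    x = cue.copy()
    for _ in range(sweeps):                     # asynchronous sweeps in random order
        for i in rng.permutation(len(x)):
            h = W[i] @ x
            if h != 0:
                x[i] = np.sign(h)
    return x

cue = patterns[0].copy()
flip = rng.choice(N, size=10, replace=False)    # corrupt 10 of the 100 bits
cue[flip] *= -1

out = recall(cue, W)
print("bits recovered correctly:", int((out == patterns[0]).sum()), "of", N)
```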
Does it work?
This applet demonstrates:
• The distributed representation
• The ability to perform powerful computation
• High storage capacity (7 100-bit patterns in 100 neurons)
• High fidelity and noise tolerance
• Graceful degradation (for more memories)
Eliminated features:
• Dynamic inputs in training
• Intrinsic background activity
http://www.cs.tcd.ie/Padraig.Cunningham/applets/Hopfield/Hopfield.htm
(from http://suhep.phy.syr.edu/courses/modules/MM/SIM/Hopfield/, where it is no longer available)
Adding extra training patterns
• For a single training pattern, it is clear that the pattern is an energy minimum (why?)
• For two training patterns, each of them is still an energy minimum, slightly perturbed by the other (explicit example)
• General condition: p << N (see the sketch after this list)
• Consequence: resistance to damage
• No localisation of memory: ‘holographic’
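A rough numerical illustration of the p << N condition (my sketch, using random patterns): as more patterns are stored in a fixed-size net, the crosstalk between them grows and fewer stored bits remain stable under the update rule.

```python
# How recall fidelity degrades as the number of stored random patterns M
# grows relative to the number of neurons N, illustrating "M << N".
import numpy as np

rng = np.random.default_rng(4)
N = 100

def hebb_weights(patterns):
    W = (patterns.T @ patterns) / patterns.shape[1]
    np.fill_diagonal(W, 0.0)
    return W

for M in (2, 5, 10, 20, 40):
    patterns = rng.choice([-1.0, 1.0], size=(M, N))
    W = hebb_weights(patterns)
    # one synchronous sign update starting from each stored pattern:
    updated = np.sign(patterns @ W)
    updated[updated == 0] = 1.0                     # break ties arbitrarily
    stable_fraction = (updated == patterns).mean()  # fraction of stored bits left unchanged
    print(f"M={M:3d}: {stable_fraction:.3f} of stored bits are stable")
```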
Storage Capacity
How many memories can be stored in the network? To store M memories, each of length N bits, in a network of N neurons, we first ask how many stable patterns can be reached.
In 1987, McEliece et al. derived an upper limit for the number of memories that can be stored accurately: M = N/(2 log N) (natural logarithm).
e.g. for N = 100 neurons, M = 11 distinct memories, each 100 bits long, can be faithfully stored and recalled. To write out these 11 distinct memories would take 1100 bits!
In general, the coding efficiency of the network can be summarised as 2 log N neurons per pattern (each N bits long).
This enormous capacity is paid for by a potentially lengthy recall process.
McEliece et al. (1987) IEEE Trans. Inf. Theory IT-33: 461-482.
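A quick check of the numbers quoted above, interpreting the log as a natural logarithm (which is what reproduces M = 11 at N = 100):

```python
# Evaluate the McEliece et al. bound M = N / (2 ln N) for a few network sizes.
import math

for N in (100, 1000, 10000):
    M = N / (2 * math.log(N))        # natural logarithm
    print(f"N = {N:6d} neurons -> roughly {round(M)} reliably storable patterns")
```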
Applications
Hopfield nets have obvious applications for any problem that can be posed in terms of optimisation, in the sense of maximising or minimising some function that can be likened to an energy function.
[Figure: the distance-matching problem (given a set of points, match them in pairs so that the total length is shortest) and the travelling salesman problem (given a set of points, find the shortest path through them).]
What about the brain (pros)?
Hopfield nets maintain some very attractive features from recurrent net architectures. However, the imposition of symmetric weights was a conscious move away from biological realism and toward engineering-like reliability. In contrast, Hopfield nets seem more biologically realistic in disallowing self-connected neurons.
Hebbian-like learning is also a great appeal of Hopfield nets, capturing several important principles:
(1) unsupervised learning
(2) natural synaptic plasticity
(3) no necessary distinction between training & testing
(4) robustness to details of the training procedure
What about the brain (cons)?
While we now have dynamics both in training and in recall, we might still ask: is this dynamics realistic in the brain?
1) In the memory-recall stage, we consider inputs one at a time, waiting for the association to be made before proceeding to the next pattern. Is this how the brain works?
2) The aspiration of every Hopfield net is to arrive at a stable solution. Is this a realistic representation of association, or of cognition in general, in the brain? In other words, do we represent solutions to problems by relaxing to stable states of activity, or does the brain represent solutions according to very different, dynamic paradigms that handle continuous inputs and actively resist falling into the abyss of equilibrium?
What about the brain? (cont.)
How memories are implanted in our brains remains an exciting research question.
While Hopfield nets no longer participate in this discourse, their formative role in shaping our intuition about associative memories remains admirable.
Next time…
• Final lecture about neural networks (for the time being)
Reading
• John Hopfield (1982) “Neural networks and physical systems with emergent collective computational abilities”, Proc. Natl. Acad. Sci. USA 79: 2554-2558.
• A highly accessible introduction to the subject, including both non-technical and technical approaches, can be found at: www.shef.ac.uk/psychology/gurney/notes/contents.html
• Some food for thought, a popular article on CNN: “Researchers: It’s easy to plant false memories”, CNN.com, Feb 16, 2003.