Transcript {-1,+1}.

Recurrent
Network
Inputs
Outputs
Motivation
• Associative Memory Concept
• Time Series Processing
– Forecasting of Time series
– Classification Time series
– Modeling of Time series
– Mapping one Time series onto other
• Signal Processing ( a field very close to TS processing)
• Optimization problems (like Traveling Salesman)
Address – addressable Vs.
content addressable Memory
AM provide an approach of storing and retrieving data
based on content rather than storage address.
Storage in a NN is distributed throughout the system in the
net’s weights, hence a pattern does not have a storage .
Auto associative vs. HeteroAssociative
Memory
• Each association is an i/p o/p vector pair , s:f.
Association (for two patterns s and f)
s=f
auto-associative memory
s = !f
hetero-associative memory
So What’s the difference?
evoke associated
patterns
Associative Recalls
recall a pattern
by part of it
evoke/recall with incomplete/
noisy patterns
• The net not only learns
the specific patterns
pairs that were used for
training, but is also able
to recall the desired
response pattern when
given an I/P stimulus
that is similar, but not
identical, to the training
I/P.
Training an AM NN
• The original patterns must be converted to an
appropriate representation for computation.
“on” →+1 , “off”→0 (binary representation)
“on”→+1, “off”→-1 (bipolar representation).
OR
• Two common training methods for single layer
nets are usually :
– Hebbian learning rule
– Its variations–gradient descent
Hebbian Learning Rule
• ”When one cell repeatedly assists in firing another, the axon of the first cell
• develops synaptic knobs (or enlarges them if they already exist) in contact
• with the soma of the second cell.” (Hebb, 1949)
• In an associative neural net, if we compare two pattern components (e.g. pixels)
within many patterns and find that they are frequently in:
a) the same state, then the arc weight between their NN nodes should be +ve
b) different states, then the arc weight between their NN nodes shoud be -ve
The weights must store the average correlations between all pattern
components across all patterns. A net presented with a partial pattern can then
use the correlations to recreate the entire pattern.
Weights = Average Correlations
Quantative definition of Hebbian Learning Rule
• Auto-Association:
w jk  i pk i pj
* When the two components are the same (different), increase (decrease) the weight
Hetero-Association:
w jk  i pk o pj
i = input component
o = output component
Ideally, the weights will record the average correlations across all patterns:
P
Auto:
w jk   i pk i pj
p 1
P
Hetero:
w jk   i pk o pj
p 1
Hebbian Principle: If all the input patterns are known prior to retrieval time,
then init weights as:
Auto:
1 P
w jk   i pk i pj
P p 1
Hetero:
1 P
w jk   i pk o pj
P p 1
Architectures of AM NN
Associative memory NN
Static / Feed
Forward Systems
Auto
associative
Hetero
associative
Dynamic /Recurrent /
Iterative Systems i.e.
with feed back
Auto
associative
Hetero
associative
Mapping of Inputs to Outputs
Information Recording
Information Retrieval
• M is expressed as the
prototype vectors stored.
• Mapping: x →v
•Linear or nonlinear
• v = M [x]
• Input a key vector x and find a desired vector.
• M is matrix type operator
• v previously stored in the network.
Static vs. Dynamic
Static Memory
Recurrent Autoassociative Memory
The operator M2 operates at present instant k on the present input xk and output
vk to produce the o/p in the next instant k+1 Δ is a unit delay needed for cyclic
operation. The pattern is associated to itself (auto-)
Heteroassociative Memory Net
• Memory Association of pairs (x,v)This AM operates with a cycle of 2Δ.
• It associates pairs of vectors (x(i),v(i)).
Hop Field Model
(A recurrent auto associative network)
The input x0 is used to initialize v0, i.e. x0= v0, and the input is then
removed for the following evolution:
Operator M2 consists of multiplication by a weight matrix followed by the
ensemble of non-linear mapping operations vi = f(neti) performed by the
layer of neurons.
Hopfield Model
Hopfields AutoassociativeMemory
(1982,1984)
Distributed Representation
Info is stored as a pattern of activations/weights
Multiple info is imprinted on the same network
Content-addressable memory
Store patterns in a network by adjusting weights
To retrieve a pattern, specify a portion
Distributed, asynchronous control
Individual processing elements behave independently
Fault tolerance
Few processors can fail, and the network will still work
Active or Inactive
processing units are in one of two states
Units are connected with weighted, symmetric connections
Multiple-loop feedback system
with no self-feedback
X1
weight
1
+1
-1
+1
-1
X2
2
Example of Hopfield NN for
3 dimensional input data
-1
3
X3
-1
neuron
Attribute 3 of input (x1,x2,x3)
Execution:
Input pattern attributes are initial states of neurons
Repeatedly update state of neurons asynchronously until states do not change
Hopfields Autoassociative Memory
• Input vectors values are in {-1,1} (or {0,1}).
• The number of neurons is equal to the input dimension.
• Every neuron has a link from every other neuron (recurrent
• architecture) except itself (no self-feedback).
• The activation function used to update a neuron state is the
• sign function but if the input of the activation function is 0
then the new output (state) of the neuron is equal to the old one.
•
The weights are symmetric
w w
ij
ji
NN Training
1. Storage: . Let f1, f2, … , fM denote a known set of N-dimensional
fundamental memories. The weights of the network are:
w
ji
•
•
f
•
x (n)
i
 ,i

 1

  M


0

M
 f  ,i f  , j
 1
j i
j i
i-th component of the fundamental memory.
State of neuron i at time n.
• wji is the weight from neuron i to neuron j.
• The elements of the vector fμ are in {-1,+1}. Once they are computed, the
synaptic weights are kept fixed.
• N: input dimension ;
• M: number of patterns (called fundamental memories) are used to
compute the weights.
NN Training
• Each stored pattern (fundamental memory) establishes a
correlation between pairs of neurons: neurons tend to be
of the same sign or opposite sign according to their value
in the pattern.
• If wji is large, this expresses an expectation that neurons i
and j are positively correlated. If it is small (negative) this
indicates a negative correlation.
•
 wij xi x j
i, j
will thus be large for a state x equal to a
fundamental memory (since wij will be positive if the
product xi xj > 0 and negative if xi xj < 0).
• The negative of the sum will thus be small.
NN Execution
2. Initialization: Let x denote an input vector(
probe) presented to the network. The
algorithm is initialized by setting:
x (0) x
j
probe, j j = 1……N
x
probe, j
x (0)
j
is the j-th element of the probe vector x probe.
is the state of neuron j at time t =0
NN Execution
3. Iteration Until Convergence
Update the elements of network state vector
x(n) asynchronously (i.e. randomly and one at
the time) according to the rule:


N
x (t 1) sgn  j 1 w x (t )
j
ji i 

Repeat the iteration until the state vector x remains
unchanged.
NN Execution
• Outputting: Let x fixed denote the fixed point
(or stable state, that is such that x(t+1)=x(t))
computed at the end of step 3. The resulting
output y of the network is:
y x
fixed
Pictorial Execution of Hop field Net
3. Retrieval
1. Auto-Associative Patterns to Remember
1
2
1
2
3
4
3
4
Comp/Node value legend:
dark (blue) with x => +1
dark (red) w/o x => -1
light (green) => 0
2. Distributed Storage of All Patterns:
-1
1
2
1
3
4
• No. Of neuron = dimension of pattern
• Fully connected
• Weights = avg correlations across
all patterns of the corresponding units
1
2
3
4
1
2
3
4
1
2
3
4
1
2
3
4
Hopfield Network Example
1. Patterns to Remember
p1
p2
1
2
3
1
4
3
3. Build Network
p3
1
2
4
3
1/3
2
1
2
4
-1/3
-1/3
p1 p2 p3
Avg
1 1 -1
1 -1 -1
1/3
-1/3
1
2
3
4
W14 -1
1
1
1/3
W23 1
-1
1
1/3
W24 -1
1 -1
-1/3
W34 -1
-1 -1
-1
[+]
1/3
4
3
W12
W13
[-]
1/3
-1
4. Enter Test Pattern
1/3
+1
1/3
-1/3
-1/3
1/3
0
-1
-1
Hopfield Network Example (2)
5. Synchronous Iteration (update all nodes at once)
Inputs
Node
1
2
3
4
1
1
1/3
-1/3
1/3
2
0
0
0
0
From discrete output rule: sign(sum)
3
0
0
0
0
4
Output
-1/3
1
1/3
1
1
1
-1
-1
Values from Input Layer
1/3
p1
1/3
-1/3
=
-1/3
1/3
-1
Stable State
1
2
3
4
Matrices Computation
Goal: Set weights such that an input vector Vi, yields itself when
multiplied by the weights, W.
X = V1,V2..Vp,
where p = # input vectors (i.e., patterns)
So Y=X, and the Hebbian weight calculation is:
1
1 1 1 -1
1
X=
1 1 -1 1
XT=
1
-1 1 1 -1
-1
XTX =
3 1 -1 1
1 3 1 -1
-1 1 3 -3
1 -1 -3 3
W = XTY = XTX
1 -1
1 1
-1 1
1 -1
Common index = pattern #, so
this is correlation sum.
w2,4 = w4,2 = xT2,1x1,4 + xT2,2x2,4 + xT2,3x3,4
Matrices computation
• The upper and lower triangles of the product matrix represents the 6
weights wi,j = wj,i
• Scale the weights by dividing by p (i.e., averaging) .
• This produces the same weights as in the non-matrix description.
• Testing with input = ( 1 0 0 -1)
(1 0 0 -1)
3 1 -1 1
1 3 1 -1
-1 1 3 -3
= (2 2 2 -2)
1 -1 -3 3
Scaling by p = 3 and using 0 as a threshold gives:
(2/3 2/3 2/3 -2/3) => (1 1 1 -1)
Associative Retrieval = Search
p3
p1
p2
Back-propagation:
• Search in space of weight vectors to minimize output error
Associative Memory Retrieval:
• Search in space of node values to minimize conflicts between
a) node-value pairs and average correlations (weights), and
b) node values and their initial values.
• Input patterns are local (sometimes global) minima, but many
spurious patterns are also minima.
• High dependence upon initial pattern and update sequence (if asynchronous)
Energy Function
Energy of the associative memory should be low when pairs of node values
mirror the average correlations (i.e. weights) on the arcs that connect the node pair, and
when current node values equal their initial values (from the test pattern).
E  a  wkj x j xk  b I k xk
k
j
k
When pairs match
correlations,
wkjxjxk > 0
When current values match input values,
Ikxk > 0
Gradient Descent
A little math shows that asynchronous updates using the discrete rule:
n
x pk (t  1)  sgn(  wkj x pj (t )  I pk )
j 1
yield a gradient descent search along the energy landscape for the E defined above.
Storage Capacity of Hopfield Networks
Capacity = Relationship between # patterns that can be stored &
retrieved without error to the size of the network.
Capacity = # patterns / # nodes or # patterns / # weights
• If we use the following definition of 100% correct retrieval:
When any of the stored patterns is entered completely (no noise), then that
same pattern is returned by the network; i.e. The pattern is a stable attractor.
• A detailed proof shows that a Hopfield network of N nodes can
achieve 100% correct retrieval on P patterns if: P < N/(4*ln(N))
In general, as more patterns are added to a network,
the avg correlations will be less likely to match the
correlations in any particular pattern. Hence, the
likelihood of retrieval error will increase.
=> The key to perfect recall is selective ignorance!!
N Max P
10
1
100
5
1000
36
10000 271
1011
109
Things to Remember
• Auto-Associative -vs- Hetero-associative
 Wide variety of net topologies
 All use Hebbian Learning => weights ~ avg correlations
• One-shot -vs- Iterative Retrieval
 Iterative gives much better error correction.
• Asynchronous -vs- Synchronous state updates
 Synchronous updates can easily lead to oscillation
 Asynchronous updates can quickly find a local optima (attractor)
 Update order can determine attractor that is reached.
• Pattern Retrieval = Search in node-state space.
 Spurious patterns are hard to avoid, since many are attractors also.
 Stochasticity helps jiggle out of local minima.
 Memory load increase => recall error increase.
• Associative -vs- Feed-Forward Nets
 Assoc: Many - 1 mapping Feed-Forward: many-many mapping
 Backprop is resource-intensive, while Hopfield iterative update is O(n)
 Gradient-Descent on an Error -vs- Energy Landscape:
 Backprop => arc-weight space
Hopfield => node-state space