Intro to NN - NeuralNetworksClusterS12


4/11/2016
Neural Networks
1
Artificial Neural Network (ANN)
o Neural network -- “a machine that is designed to
model the way in which the brain performs a
particular task or function of interest” (Haykin,
1994, pg. 2).
– Uses massive interconnection of simple
computing cells (neurons or processing units).
– Acquires knowledge thru learning.
– Modify synaptic weights of network in orderly
fashion to attain desired design objective.
o Attempts to use ANNs since the 1950s.
– Abandoned by most by the 1970s.
4/11/2016
Neural Networks
2
Artificial Intelligence (AI)
o “A field of study that encompasses computational
techniques for performing tasks that apparently
require intelligence when performed by humans”
(Tanimoto, 1990).
– Goal to increase our understanding of
reasoning, learning, & perceptual processes.
o Knowledge representation.
o Search.
o Perception & inference.
4/11/2016
Neural Networks
Fundamental Issues
3
Traditional AI vs. Neural
Networks
Traditional AI:
o Programs brittle & overly sensitive to noise.
o Programs either right or fail completely.
– Human intelligence much more flexible (guessing).
o http://wwwai.ijs.si/eliza/eliza.html
Neural Networks:
o Capture knowledge in large # of fine-grained units.
o More potential for partially matching noisy & incomplete data.
o Knowledge is distributed uniformly across network.
o Model for parallelism – each neuron is independent unit.
o Similar to human brains?
4/11/2016
Neural Networks
4
Neural Networks
Handwriting Neural Network
o http://www.youtube.com/watch?v=qXoVGxjUTtA
4/11/2016
Neural Networks
6
http://www.manifestation.com/neurotoys/eliza.php3
4/11/2016
Neural Networks
7
NETtalk (Sejnowski & Rosenberg)
o http://cnl.salk.edu/Media/nettalk.mp3.
4/11/2016
Neural Networks
8
Human Brain
o “… a highly complex, nonlinear, and parallel
computer (information-processing system). It has
the capability to organize its structural
constituents, known as neurons, so as to perform
certain computations (e.g., pattern recognition,
perception, and motor control) many times faster
than the fastest digital computer in existence
today.” (Haykin, 1999, Neural Networks: A
Comprehensive Foundation, pg. 1).
4/11/2016
Neural Networks
9
Approaches to Studying Brain
o Know enough neuroscience to understand why
computer models make certain approximations.
– Understand when approximations are good &
when bad.
o Know tools of formal analysis for models.
– Some simple mathematics.
– Access to simulator or ability to program.
o Know enough cognitive science to have some idea
about what the system is supposed to do.
4/11/2016
Neural Networks
10
Why Build Models?
“… a model is simply a detailed theory.”
1. Explicitness – constructing model of theory & implementing
it as computer program requires great level of detail.
2. Prediction – difficult to predict consequences of model due
to interactions between different parts of model.
– Connectionist models are non-linear.
3. Discover & test new experiments & novel situations.
4. Practical reasons why difficult to test theory in real world.
– Systematically vary parameters thru full range of
possible values.
5. Help understand why a behavior might occur.
• Simulations open for direct inspection → explanation of behavior.
4/11/2016
Neural Networks
11
Simulations As Experiments
o Easy to do simulations, but difficult to do them well.
o Running a good simulation like running good
experiment.
1. Clearly articulated problem (goal).
2. Well-defined hypothesis, design for testing
hypothesis, & plan for how to analyze the results.
– Hypothesis from current issues in literature.
– E.g., test predictions, replicate observed
behaviors, test theory of behavior.
3. Task, stimulus representations & network
architectures must be defined.
4/11/2016
Neural Networks
12
What kinds of problems can ANNs
help us understand?
o Brain of newborn child contains billions
of neurons
– But child can’t perform many
cognitive functions.
o After a few years of receiving
continuous streams of signals from
outside world via sensory systems,
– Child can see, understand language
& control movements of body.
o Brain discovers, without being taught,
how to make sense of signals from
world.
o How???
o Where do you start?
4/11/2016
Neural Networks
13
NN Applications
http://www-cs-faculty.stanford.edu/~eroberts/courses/soco/projects/2000-01/neural-networks/Applications/index.html
o Character recognition
o Image compression
o Stock market prediction
o Traveling salesman problem
o Medicine, electronic noses, loan applications
4/11/2016
Neural Networks
14
Neural Networks (ACM)
o Web spam detection by probability mapping graphSOMs and graph
neural networks
o No-reference quality assessment of JPEG images by using CBP
neural networks
o An Embedded Fingerprints Classification System based on
Weightless Neural Networks
o Forecasting Portugal global load with artificial neural networks
o 2006 Special issue: Neural network forecasts of the tropical
Pacific sea surface temperatures
o Developmental learning of complex syntactical song in the
Bengalese finch: A neural network model
o Neural networks in astronomy
4/11/2016
Neural Networks
15
Artificial & Biological
Neural Networks
o Build intelligent programs using models that parallel
structure of neurons in human brain.
o Neurons – cell body with dendrites & axon.
– Dendrites receive signals from other neurons.
– When combined impulses exceed threshold,
neuron fires & impulse passes down axon.
– Branches at end of axon form synapses with
dendrites of other neurons.
• Excitatory or inhibitory.
4/11/2016
Neural Networks
16
Do Neural Networks Mimic
Human Brain?
o “It is not absolutely necessary to believe
that neural network models have anything
to do with the nervous system, …
o … but it helps.
o Because, if they do, we are able to use a
large body of ideas, experiments, and
facts from cognitive science and
neuroscience to design, construct, and
test networks.” (Anderson, 1997, p. 1)
4/11/2016
Neural Networks
17
Neural Networks Abstract From the
Details of Real Neurons
o Conductivity delays are neglected.
o Net input is calculated as weighted sum of input
signals.
o Net input is transformed into an output signal via a
simple function (e.g., a threshold function).
o Output signal is either discrete (e.g., 0 or 1) or it
is a real-valued number (e.g., between 0 and 1).
4/11/2016
Neural Networks
18
4/11/2016
Neural Networks
19
ANN Features
o A series of simple
computational elements, called
neurons (or nodes, units, cells)
o Connections between neurons
that carry signals
o Each link (connection) between
neurons has a weight that can
be modified
o Each neuron sums the
weighted input signals and
applies an activation function
to determine the output signal
(Fausett, 1994).
4/11/2016
Neural Networks
20
Neural Networks Are Composed
of Nodes & Connections
[Figure: a fully recurrent network vs. a 3-layer feedforward network.]
o Nodes – simple processing units.
– Similar to neurons – receive inputs from other sources.
– Excitatory inputs tend to increase neuron’s rate of firing.
– Inhibitory inputs tend to decrease neuron’s rate of firing.
o Firing rate changes via real-valued number (activation).
o Input to node comes from other nodes or from some external source.
4/11/2016
Neural Networks
21
Connections
o Input travels along connection lines.
o Connections between different nodes can
have different potency (connection
strength) in many models.
– Strength represented by real-valued
number (connection weight).
– Input from one node to another is
multiplied by connection weight.
o If connection weight is
– Negative number – input is inhibitory.
– Positive number – input is excitatory.
4/11/2016
Neural Networks
22
Nodes & Connections Form
Various Layers of NN
4/11/2016
Neural Networks
23
A Single Node/Neuron
[Figure: inputs from other nodes → Σ → f(net) → outputs to other nodes.]
o Inputs to node usually summed ( Σ ).
o Net input passed thru activation function ( f(net) ).
o Produces node’s activation which is sent to other nodes.
o Each input line (connection) represents flow of activity from some other neuron or some external source.
4/11/2016
Neural Networks
24
More Complex Model of a Neuron
[Figure: model of neuron k. Input signals x1, x2, …, xp are multiplied by the synaptic weights wk1, wk2, …, wkp of neuron k, combined by a summing function (linear combiner) to give uk; a threshold θ is subtracted and an activation function produces the output yk.]
4/11/2016
Neural Networks
25
Add up Net Inputs to Node
o Each input (from different nodes) is calculated by multiplying activation value of input node by weight on connection (from input node to receiving node).
neti = Σj wij aj   (net input to node i)
o Σ = sigma (summation)
o i = receiving node
o aj = activation on nodes sending to node i
o wij = weights on connection between nodes j & i.
4/11/2016
Neural Networks
26
Sums (weights * activation) For
All Input Nodes
neti = Σj wij aj
[Figure: node 4 receives connections from input nodes 0, 1, and 2.]
o i = 4 (node 4).
o j = 3 (3 input nodes into node 4).
o add up w4j * aj for all 3 input nodes.
4/11/2016
Neural Networks
27
Activation Functions : Node Can Do
Several Things With Net Input
1. Activation (e.g., output) = Input.
• (f(net)) is Identity function.
• Simplest case.
2. Threshold must be achieved before activation occurs.
– Activation function may be non-linear function of input. Resembles sigmoid (like real neurons).
– Activation function may be linear.
4/11/2016
Neural Networks
28
Different Types of NN Possible
1. Single layer or multi-layer architectures (Hopfield, Kohonen).
2. Data processing thru network.
o Feedforward.
o Recurrent.
3. Variations in nodes.
o Number of nodes.
o Types of connections among nodes in network.
4. Learning algorithms.
o Supervised.
o Unsupervised (self-organizing).
o Back propagation learning (training).
5. Implementation.
– Software or hardware.
4/11/2016
Neural Networks
29
4/11/2016
Neural Networks
30
Steps in Designing a Neural
Network
1. Arrange neurons in various layers.
2. Decide type of connections among neurons for
different layers, as well as among neurons within
layer.
3. Decide way a neuron receives input & produces
output.
4. Determine strength of connection within network
by allowing network to learn appropriate values of
connection weights via training data set.
4/11/2016
Neural Networks
31
Activation Functions
1. Identity function: f(x) = x for all x
2. Binary step function: f(x) = 1 if x >= θ; f(x) = 0 if
x<θ
3. Continuous log-sigmoid function (Logistic
function): f(x) = 1/[1 + exp(-σx)]
4/11/2016
Neural Networks
32
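A minimal Python sketch of these three activation functions; the function names and the θ = 0, σ = 1 defaults are illustrative assumptions, not from the slides.

import math

def identity(x):
    # 1. Identity function: output equals input
    return x

def binary_step(x, theta=0.0):
    # 2. Binary step: fires (1) only when input reaches threshold theta
    return 1 if x >= theta else 0

def logistic(x, sigma=1.0):
    # 3. Continuous log-sigmoid (logistic) function
    return 1.0 / (1.0 + math.exp(-sigma * x))

print(identity(0.5), binary_step(0.5, theta=1.0), round(logistic(0.5), 3))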
Sigmoid Activation Function
o ai = 1 / (1 + e^(-neti))   (EQ 1.2)
– ai = activation (output) of node i
– neti = net activation flowing into node i
– e = exponential
o What output of node will be for any given net input.
o Graph of relationship (next slide).
4/11/2016
Neural Networks
33
Sigmoid Activation Function Often Used
for Nodes in NN
o For wide range of inputs (> 4.0 & < -4.0), nodes exhibit all or nothing.
– Output max. value of 1 (on).
– Output min. value of 0 (off).
o Within range of –4.0 to 4.0, nodes show greater sensitivity.
– Output capable of making fine discriminations between different inputs.
o Non-linear response is at heart of what makes these networks interesting.
4/11/2016
34
o What will be the activation of node 2, assuming the input you just calculated?
o If node 2 receives input of 1.25, activation of 0.777.
o Activation function scales from 0.0 to 1.0.
o When net input = 0.0, net output is exact mid-range of possible activation (0.5).
4/11/2016
Neural Networks
35
Example 2-Layered Feedforward
Network : Step Thru Process
o Neural network consists of
collection of nodes.
– Number & arrangement of nodes
defines network architecture.
o Example 2-layered feedforward.
– 2 layers (input, output).
– no intra-level connections.
– no recurrent connections.
– single connection into input
nodes & out of output nodes.
o Very simplified in comparison to
biological neural network!
4/11/2016
Neural Networks
[Figure: 2-layered feedforward network – input nodes a0, a1 connected to output node a2 by weights w20, w21.]
36
[Figure: same network – input nodes a0, a1, weights w20, w21, output node a2. In the notation wij, i is the receiving node and j the sending node, so w20 is the weight into node 2 from node 0.]
4/11/2016
o Each input node has certain level of
activity associated with it.
– 2 input nodes (a0, a1).
– 2 output nodes (a2, a3).
o Look at one output unit (a2).
– Receives input from a0 & a1 via
independent connections.
– Amount depends on activation
values of input nodes (a0 & a1) and
weights (w20, w21).
o For this network, activity flows in 1 direction along connections.
– e.g., w20 exists, but w02 doesn’t exist.
o Total input to node 2 (a2) = w20a0 +
w21a1.
Neural Networks
37
Exercise 1.1
o What is the input received by node 2?
o Net input for node 2 = (1.0 * 0.75) + (1.0 * 0.5) = 1.25
[Figure: output node a2 receives activations of 1 and 1 over connections with weights 0.75 and 0.5.]
o Net input alone doesn’t determine
activity of output node.
o Must know activation function of node.
o Assume nodes have activation
functions shown in EQ 1.2 (& Fig. 1.3).
o Next slide shows sample inputs &
activations produced - assuming
logistic activation function.
Neural Networks
38
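A short Python check of Exercise 1.1, assuming the logistic activation function of EQ 1.2; the variable names are illustrative.

import math

def logistic(net):
    # EQ 1.2: a = 1 / (1 + e^(-net))
    return 1.0 / (1.0 + math.exp(-net))

activations = [1.0, 1.0]   # outputs of the two input nodes
weights = [0.75, 0.5]      # w20, w21

net2 = sum(w * a for w, a in zip(weights, activations))
print(net2)                        # 1.25
print(round(logistic(net2), 3))    # ~0.777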
Bias Node (Default Activation)
o In absence of any input (i.e. input = 0.), nodes have
output of 0.5.
o Useful to allow nodes to have default activation.
– Node is “off” (output 0.0) in absence of input.
– Or can have default state where node is “on”.
o Accomplish this by adding node to network which
receives no inputs, but is always fully activated &
outputs 1.0 (bias node).
– Node can be connected to any node in network.
– Often connected to all nodes except input nodes.
– Allow weights on connections from this node to receiving nodes to be different.
4/11/2016
Neural Networks
39
o Guarantees that all receiving nodes have some input
even if all other nodes are off.
o Since output of bias node is always 1.0, input it
sends to any other node is 1.0 * wij (value of weight
itself).
o Only need one bias node per network.
o Similar to giving each node a variable threshold.
– large negative bias == node is off (activation
close to 0.0) unless gets sufficient positive input
from other sources to compensate.
– large positive bias == receiving node is on &
requires negative input from other nodes to turn
it off.
o Useful to allow individual nodes to have different
defaults.
4/11/2016
Neural Networks
40
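A small Python sketch of the bias-node idea above, assuming a logistic output node; the bias weights (-5.0 and +5.0) are made-up illustrations of a large negative vs. large positive bias.

import math

def logistic(net):
    return 1.0 / (1.0 + math.exp(-net))

def node_output(inputs, weights, bias_weight):
    # the bias node always outputs 1.0, so it contributes 1.0 * bias_weight
    net = sum(w * a for w, a in zip(weights, inputs)) + 1.0 * bias_weight
    return logistic(net)

# With no input activity, the bias alone sets the node's default state.
print(round(node_output([0.0, 0.0], [1.0, 1.0], bias_weight=-5.0), 3))  # ~0.007: default "off"
print(round(node_output([0.0, 0.0], [1.0, 1.0], bias_weight=+5.0), 3))  # ~0.993: default "on"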
Learning From Experience
• Changing a neural network’s connection weights
(training) causes network to learn solution to a
problem.
• Strength of connection between neurons
stored as weight-value for specific connection.
• System learns new knowledge by adjusting
these connection weights.
4/11/2016
Neural Networks
41
Three Training Methods for NN
1. Unsupervised learning – hidden neurons must
find a way to organize themselves without help
from outside.
• No sample outputs provided to network against
which it can measure its predictive
performance for given vector of inputs.
• Learning by doing.
4/11/2016
42
2. Supervised Learning
(Reinforcement)
o works on reinforcement from outside.
• Connections among neurons in hidden layer
randomly arranged, then reshuffled as
network told how close it is to solution.
• Requires teacher -- training set of data or
observer who grades performance of network
results.
• Both unsupervised & supervised suffer from
relative slowness & inefficiency relying on
random shuffling to find proper connection
weights.
4/11/2016
Neural Networks
43
3. Back Propagation
o Network given reinforcement for how it is doing
on task plus information about errors is used to
adjust connections between layers.
– Proven highly successful in training of
multilayered neural nets.
– Form of supervised learning.
4/11/2016
Neural Networks
44
Example Learning Algorithms
1. Hebb’s Rule -- how physical networks might learn.
2. Perceptron Convergence Procedures (PCP).
– Widrow-Hoff Learning Rule (1960s).
3. Hopfield.
4. Backpropagation of Error (Generalized Delta
Rule).
5. Kohonen’s Learning Laws (not covered here).
4/11/2016
Neural Networks
45
McCulloch-Pitts (1943) Neuron
1. Activity of neuron is an “all-or-none” process.
2. Certain fixed number of synapses must be excited within
period of latent addition to excite neuron at any time.
o Number is independent of previous activity & position
of neuron.
3. Only significant delay within nervous system is synaptic
delay.
4. Activity of any inhibitory synapse absolutely prevents
excitation of neuron at that time.
5. Structure of net does not change with time.
4/11/2016
Neural Networks
46
McCulloch-Pitts Neuron
o Firing within a neuron
is controlled by a
fixed threshold (θ).
o binary step function:
f(x) = 1 if x >= θ; f(x) =
0 if x < θ.
o What happens here if
θ = 2?
4/11/2016
Neural Networks
47
McCulloch-Pitts Neuron AND
Threshold = 2
Does a2 fire?
P | Q | P ^ Q (P and Q)
T | T | T
T | F | F
F | T | F
F | F | F
Neural Networks
48
McCulloch-Pitts Neuron OR
Threshold = 2
Does a2 fire?
P | Q | P V Q (P or Q)
T | T | T
T | F | T
F | T | T
F | F | F
Neural Networks
49
McCulloch-Pitts Neuron XOR
Threshold = 2
Does a2 fire?
P | Q | P XOR Q
T | T | F
T | F | T
F | T | T
F | F | F
Neural Networks
50
McCulloch-Pitts Neuron AND NOT
o Did you get weights
of 2 for w20 and -1
for w21?
4/11/2016
Neural Networks
51
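A Python sketch of a McCulloch-Pitts unit with threshold 2. The AND NOT weights (2, -1) are from the slide above; the AND weights (1, 1) and OR weights (2, 2) are common textbook choices assumed here for illustration.

def mcp_fires(inputs, weights, theta=2):
    # binary step: fire (1) iff the weighted sum reaches the threshold
    net = sum(w * x for w, x in zip(weights, inputs))
    return 1 if net >= theta else 0

for p in (1, 0):
    for q in (1, 0):
        print(p, q,
              mcp_fires([p, q], [1, 1]),   # P AND Q
              mcp_fires([p, q], [2, 2]),   # P OR Q
              mcp_fires([p, q], [2, -1]))  # P AND NOT Q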
McCulloch-Pitts Neuron
o http://lcn.epfl.ch/tutorial/english/mcpits/html/index.html
o No learning algorithms
4/11/2016
Neural Networks
52
Hebb : The Organization of
Behavior (1949)
o “When an axon of cell A is near enough to excite a cell B &
repeatedly or persistently takes part in firing it, some
growth process or metabolic change takes place in one or
both cells such that A’s efficiency, as one of the cells firing
B, is increased.”
o If neuron receives input from another neuron & if both
highly active, weight between neurons should be
strengthened.
– Specific synaptic change (Hebb synapse) which underlies
learning.
o Result was interconnections between large, diffuse set of
cells, in different parts of brain called “cell assemblies.”
o Changes suggested by Rochester et al. (1956) make more
practical model.
4/11/2016
Neural Networks
53
Hebb’s Rule: Associative learning
“Cells that fire together, wire together”
o wij = ai aj
– where change in weight = product of activations of nodes
that are connected to it.
o wij = ηai aj
– where η is the learning rate
o Unsupervised learning
o Success at learning some patterns
– it only learns these patterns (e.g., pair-wise correlations).
There will be times when want ANN to learn to associate
a pattern with some desired behaviors even when there
is no pair-wise correlation
4/11/2016
Neural Networks
54
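A minimal Python sketch of the Hebbian update Δwij = η ai aj; the learning rate and activation values are illustrative numbers.

def hebb_update(w, a_i, a_j, eta=0.1):
    # "Cells that fire together, wire together":
    # the weight grows in proportion to the product of the two activations.
    return w + eta * a_i * a_j

w = 0.0
for a_i, a_j in [(1.0, 1.0), (1.0, 0.0), (0.9, 0.8)]:
    w = hebb_update(w, a_i, a_j)
    print(round(w, 3))
# The weight only strengthens when both nodes are active together.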
Pros & Cons of Hebbian Learning
o Known biological
mechanisms that might use
Hebbian Learning.
o Provides reasonable answer
to “where does teacher
info for learning process
come from?”
– Lots of useful info in
correlated activity.
– System just needs to
look for patterns.
4/11/2016
o All it can learn is pair-wise
correlations.
o May need to learn to
associate patterns with
desired behaviors even if
patterns aren’t pair-wise.
– Hebb rule can’t do this.
CS 271 Ch. 4
55
Perceptron Convergence Procedures
(PCP)
o Variations of Hebb’s Rule from 1960s.
– Perceptron (Rosenblatt, 1958).
– Widrow-Hoff rule is similar to PCP (1960).
o Start with network of units with connections initialized with
random weights.
o Take target set of input/output patterns & adjust weights
automatically so at end of training weights yield correct
outputs for any input.
– Network should generalize to produce correct output for
input patterns it hasn’t seen during training.
o Also called the gradient descent rule, Delta rule, or Adaline rule.
4/11/2016
CS 271 Ch. 4
56
http://lcn.epfl.ch/tutorial/english/perceptron/html/index.html
4/11/2016
CS 271 Ch. 4
57
Widrow-Hoff Rule
o starts with connections
initialized with random
weights and one input
pattern is presented to the
network.
o For each input pattern, the
network’s actual output is
compared to the target
output for that pattern.
Figure 18: Supervised (Delta Rule) vs. Unsupervised (Perceptron) Learning (www.willamette.edu/~gorr/classes/cs449/Classification/delta.html)
4/11/2016
Neural Networks
58
o Any discrepancy (error) used as basis for changing weights
on input connections & changing output node’s threshold for
activation.
o How much weights are changed depends on error produced &
activation from given input.
– Correction is proportional to error signal multiplied by
value of activation given by derivative of transfer
function.
– Using derivative allows making finely tuned corrections
when activation is near its extreme values (minimum or
maximum) & larger corrections when activation is in
middle range.
o Goal of the Widrow-Hoff Rule is to minimize error on output
unit by apportioning credit & blame to the input nodes.
o Only works for simple, 2-layer networks (I/O units).
4/11/2016
Neural Networks
59
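A Python sketch of one Widrow-Hoff style update for a single logistic output unit, following the description above (error × derivative of the activation function × input activation, scaled by a learning rate); the pattern, initial weights, and learning rate are illustrative assumptions.

import math

def logistic(net):
    return 1.0 / (1.0 + math.exp(-net))

inputs = [1.0, 0.0]          # one training pattern
weights = [0.2, -0.1]        # randomly initialized weights (illustrative values)
target, eta = 1.0, 0.5

out = logistic(sum(w * a for w, a in zip(weights, inputs)))
error = target - out                  # discrepancy between target and actual output
delta = error * out * (1.0 - out)     # error scaled by the derivative of the logistic
weights = [w + eta * delta * a for w, a in zip(weights, inputs)]
print([round(w, 3) for w in weights])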
Using Similarity
o Basic principle that drives learning
o Allows generalization of behaviors because similar inputs tend
to yield similar outputs.
o 11110000 vs. 11110001
o “make” and “bake”  “made” and “baked”
o Cats and tigers
o Similarity is generally a good rule of thumb, but not in every
case.
o Hebbian networks & basic, 2-layer PCP networks can only learn
to generalize on basis of physical similarity
4/11/2016
Neural Networks
60
2-layer Perceptron Can’t Solve
Problem of Boolean XOR
o If want output to be true (1).
– At least 1 input must be 1 & at
least 1 weight must be large
enough so when multiplied, output
node turns on.
o For patterns (00 & 11) want 0 so set
weights to 0.
o For patterns (01 & 10), need weights
from either input large enough so 1
input alone activates output.
o Contradictory requirements -- no set
of weights allows output to come on
if either input on & keeps it off if
both are on!
4/11/2016
CS 271 Ch. 4
Node 0 | Node 1 | XOR
0 | 0 | 0
1 | 0 | 1
0 | 1 | 1
1 | 1 | 0
[Figure: output node a2 connected to input nodes a0, a1 by weights w20, w21.]
61
Vectors
[Figure: the four input points (0,0), (0,1), (1,0), (1,1) plotted in 2-D space.]
o Vector -- collection of numbers or point in space.
o Can think of inputs in XOR example as 2-D space.
– With each number indicating how far out along
the dimension the point is located.
o Judge similarity of 2 vectors by Euclidean distance
in space.
– Pairs of patterns furthest apart & most dissimilar
(00 & 11) are ones need to group together for
XOR function.
4/11/2016
CS 271 Ch. 4
62
[Figures: 2-D plots of the input points (0,0), (0,1), (1,0), (1,1) with the decision boundaries for AND, OR, and XOR.]
o I/O weights impose linear decision bound on input space.
– Patterns which fall on 1 side of decision line classified
differently than patterns on other side.
o When groups of inputs can’t be separated by line, no way
for unit to discriminate between categories.
– Problems called non-linearly separable.
o What’s needed are hidden units & learning algorithms that
can handle more than one layer.
4/11/2016
CS 271 Ch. 4
63
Solving the XOR Problem : Allow
Internal Representation
o Add extra node(s) between I &  XOR problem
solved.
o “Hidden” units equivalent to internal
representations & aren’t seen by world.
– Very powerful -- networks have internal representations
that capture more abstract, functional relationships.
o Inputs (sensors), outputs (motor effectors) &
hidden (inter-neurons).
o Input similarity still important .
– All things being equal, physical resemblance of inputs
exerts strong pressure to induce similar responses.
CS 271 Ch. 4
64
Hidden Units & XOR Problem
[Figure: (a) Input space; (b) Hidden unit space; (c) Output.]
o (a) what input looks like to network showing intrinsic similarity
structure of inputs.
o Input vectors are passed through weights between inputs & hidden
units (multiplied); transforms (folds) input space to produce (b).
o (b) 2 most distinct patterns (11, 00) are close in hidden space.
o Weights to output unit can impose linear decision bound &
classify output (c).
CS 271 Ch. 4
65
Hidden Units Used to Construct Internal
Representations of External World
o Hidden units make it possible for network to treat
physically similar inputs as different, as needed.
– Transform input representations to more
abstract kinds of representations.
– Solve difficult problems like XOR.
o However, being able to solve problem, just means
that some set of weights exist -- in principle.
– Network must be able to learn these weights!
o Real challenge is how to train networks!
– One solution -- backpropagation of error.
4/11/2016
CS 271 Ch. 4
66
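A Python sketch (with hand-picked, assumed weights and thresholds) showing that some set of weights does solve XOR once a hidden layer is added: one hidden unit acts like OR, the other like AND, and the output turns on only for OR-and-not-AND.

def step(net, theta=0.5):
    return 1 if net >= theta else 0

def xor_net(x0, x1):
    h_or = step(1.0 * x0 + 1.0 * x1)               # on if either input is on
    h_and = step(1.0 * x0 + 1.0 * x1, theta=1.5)   # on only if both inputs are on
    return step(1.0 * h_or - 1.0 * h_and)          # on for OR but not AND

for x0 in (0, 1):
    for x1 in (0, 1):
        print(x0, x1, xor_net(x0, x1))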
Earlier Laws (PCP) Can’t Handle Hidden Layers
Since Don’t Know How to Change Weights To Them
o PCP & others work well for weights leading to
outputs since have target for Output & can
calculate weight changes.
o Problem occurs when have hidden units -- how to
change weights from inputs to hidden units?
– With these algorithms must know how much
error is already apparent at level of Hidden
before Output is activated.
– Don’t have predefined target for H, so can’t
say what their activation levels should be.
– Can’t specify error at this level of network.
4/11/2016
CS 271 Ch. 4
67
Hopfield
o Recurrent ANN
o They are guaranteed
to converge to a local
minimum, but
convergence to one of
the stored patterns is
not guaranteed
o http://www.cbu.edu/~
pong/ai/hopfield/hopfi
eldapplet.html
4/11/2016
Neural Networks
68
Backpropagation of Error
AKA Generalized Delta Rule. (δ)
(Rumelhart, Hinton & Williams, 1986)
o Begin with network which has been assigned initial
weights drawn at random.
– Usually from uniform distribution with mean of 0.0 &
some user-defined upper & lower bounds ( ±1.0).
o User has set of training data in form of
input/output pairs.
o Goal of training -- learn single set of weights such
that any input pattern will produce correct output
pattern.
– Desired if weights allow network to generalize to novel
data not seen during training.
4/11/2016
Neural Networks
69
Backprop
o Extremely powerful learning tool.
– Applied over wide range of domains.
o Provides very general framework for learning.
– Implements gradient descent search in space of
possible network weights to minimize network
error.
o What counts as error is up to modeler.
– Usually squared difference between target &
actual output, but any quantity that is affected
by weights may be minimized.
4/11/2016
Neural Networks
70
Backprop Training Takes 4 Steps
1. Select I/O pattern (usually at random).
2. Compare network’s output with desired output
(teacher pattern) on node-by-node basis &
calculate error for each output node.
3. Propagate error info backwards in network from
output to hidden.
4. Adjust weights on connections to reduce errors.
4/11/2016
CS 271 Ch. 4
71
1. Select I/O pattern
o Pattern usually selected at random.
o Input pattern used to activate network &
activation values for output nodes are calculated.
o Can have additional nodes between I/O (“hidden”).
o Since weights selected at random, outputs
generated at start are typically not those that go
with input pattern.
4/11/2016
CS 271 Ch. 4
72
2: Calculate Delta (δip) Error (EQ 1.3)
δip = (tip - oip) f’(netip) = (tip - oip) oip (1 - oip)
o δip = difference in value between target for node i on training pattern p (tip) and actual output for that node on that pattern (oip),
o multiplied by derivative of output node’s activation function given its input.
– f’(netip) = slope of activation function.
– EQ 1.2, Fig. 1.3 -- steepest around middle of function where net input closest to 0.
4/11/2016
CS 271 Ch. 4
73
o For large values of net input to node (+ & -), derivative is small.
– δip will be small.
– Net input to node tends to be large when connections feeding into it are strong.
o Weak connections tend to yield small input to node.
– Derivative of activation function is large & δip can be large.
4/11/2016
CS 271 Ch. 4
74
Weight Changes in the Delta
Rule
[Figure: error surface over weights x and y – the delta vector moves the current weight vector toward the ideal weight vector, producing the new weight vector.]
4/11/2016
CS 271 Ch. 4
75
Gradient Descent Learning Rule
o Moves weight vector from current position on bowl
to new position closer to minimum error by falling
down the negative gradient of the bowl.
o Not guaranteed to find correct answer.
– Always goes down hill & may get stuck in local
minimum.
o Use momentum to “push” changes in same
direction & possibly keep network from getting
stuck.
4/11/2016
CS 271 Ch. 4
76
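A Python sketch of the momentum idea mentioned above: each weight change mixes in a fraction of the previous change, pushing the weight in a consistent downhill direction. The error function, learning rate η, and momentum α are illustrative assumptions.

def momentum_step(gradient, prev_change, eta=0.1, alpha=0.5):
    # new change = step down the negative gradient + a push from the last change
    return -eta * gradient + alpha * prev_change

w, prev = 2.0, 0.0
for _ in range(5):
    grad = 2 * w          # gradient of a simple bowl-shaped error E(w) = w**2
    prev = momentum_step(grad, prev)
    w += prev
    print(round(w, 3))    # w moves steadily toward the error minimum at 0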
Backprop: Calculate Weight
Adjustments
o Know, for each output node, how far off target
value is.
o Must adjust weights on connections that feed into
it to reduce error.
– Want to change weight on connections from
every nodej coming into current node i so that
can reduce error on pattern.
Δwij = -η ∂E/∂wij   (error E, weights w, learning rate η)
77
o Partial derivative – rate of change.
– May be other variables, but they’re
being held constant.
o Measures how quantity on top changes
when quantity on bottom is changed.
– i.e., how is error (E) affected by
changing weights (w)?
o If know this, know how to change weight to
decrease error.
– i.e., to decrease discrepancy between
what network outputs & what we want it
to output.
4/11/2016
CS 271 Ch. 4
78
o Partial derivative is bell shaped for
sigmoidal curves (threshold function).
– Large values are in the mid-range.
o Contributes to stability of network – as
outputs approach 0 or 1, only small changes
occur.
o Helps compensate for excessive blame
attached to hidden nodes.
o () = Learning Rate.
o Convert partial derivative in EQ 1.4 to EQ
1.5.
4/11/2016
CS 271 Ch. 4
79
Backprop: Delta Rule (EQ 1.5)
Δwij = η δip ojp
o Make changes small -- learning rate (η) set to less than 1.0 so that changes aren’t too drastic.
– Change in weight depends on error we have for unit (δip).
o Take output into account (ojp) since node’s error
is related to how much (mis)information it has
received from another node.
• If node is highly active & contributed lots to
current activation, then responsible for much
of current error.
• If node inactive to unit i, won’t contribute to
i’s error.
4/11/2016
CS 271 Ch. 4
80
Delta Rule continued
o δip reflects error on unit i for input pattern p.
– Difference between target & output.
– Also includes partial derivative (EQ 1.4).
o Calculate errors on all output nodes & weight changes on connections coming into them.
– Don’t yet make any changes.
Δwij = η δip ojp
4/11/2016
CS 271 Ch. 4
81
3. Propagate error info backwards
from output to hidden
o Assume shared blame of hidden unit on basis of:
– What errors are on the O units the H unit is activating, and
– Strength of connection between H & each O it connects to.
o Move to hidden layer(s), if any, & use EQ 1.5 to change weights leading into hidden units from below.
– Can’t use EQ 1.3 to compute H nodes’ errors since no given target to make comparison with.
– H nodes “inherit” errors of all nodes they’ve activated.
– If nodes activated by H unit have large errors, then H unit shares blame.
4/11/2016
CS 271 Ch. 4
82
o Calculate error by summing up errors of nodes it activates multiplied by weight between nodes since it will have effect:
δip = f’(netip) Σk ( δkp wki )
– i = hidden node
– p = current pattern
– k indexes output node feeding back to hidden node.
– derivative of hidden unit’s activation function multiplied in.
o Continues iteratively down thru network
(backpropagation of error)…
4/11/2016
CS 271 Ch. 4
83
4. Adjust weights on connections to
reduce errors
o When reach layer above
input layer (no incoming
weights), actually impose
the weight changes.
[Figure: error flow – errors propagate backwards from the output layer toward the input layer.]
4/11/2016
CS 271 Ch. 4
84
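A compact Python sketch of the four backprop steps above for a 1-hidden-layer logistic network; the architecture, initial random weights, learning rate, and training pattern are all illustrative assumptions.

import math, random

def logistic(net):
    return 1.0 / (1.0 + math.exp(-net))

random.seed(1)
n_in, n_hid, n_out = 2, 2, 1
# initial weights drawn at random from a uniform distribution, as described above
w_ih = [[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hid)]   # input -> hidden
w_ho = [[random.uniform(-1, 1) for _ in range(n_hid)] for _ in range(n_out)]  # hidden -> output
eta = 0.5  # learning rate (illustrative)

def train_one(pattern, target):
    # 1. Select an I/O pattern and propagate activation forward.
    hid = [logistic(sum(w * a for w, a in zip(w_ih[i], pattern))) for i in range(n_hid)]
    out = [logistic(sum(w * h for w, h in zip(w_ho[k], hid))) for k in range(n_out)]
    # 2. Compare output with the teacher pattern, node by node (EQ 1.3).
    d_out = [(t - o) * o * (1 - o) for t, o in zip(target, out)]
    # 3. Propagate error backwards: each hidden node inherits the deltas of nodes it activates.
    d_hid = [h * (1 - h) * sum(d_out[k] * w_ho[k][i] for k in range(n_out))
             for i, h in enumerate(hid)]
    # 4. Adjust weights on the connections to reduce the error (EQ 1.5).
    for k in range(n_out):
        for i in range(n_hid):
            w_ho[k][i] += eta * d_out[k] * hid[i]
    for i in range(n_hid):
        for j in range(n_in):
            w_ih[i][j] += eta * d_hid[i] * pattern[j]
    return out

print(train_one([1.0, 0.0], [1.0]))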
Backprop Pros & Cons
Pros:
o Extremely powerful learning tool that is applied over wide range of domains.
o Provides very general framework for learning.
– Implements gradient descent search.
o What counts as error is up to modeler.
– Usually squared difference between target & actual output.
– Any quantity that is affected by weights may be minimized.
Cons:
o Requires large # presentations of input data to learn.
o Each presentation requires 2 passes thru network (forward & backward).
o Each pass is computationally complex.
4/11/2016
CS 271 Ch. 4
85
4/11/2016
Neural Networks
86
Kohonen
4/11/2016
Neural Networks
87
3 Ways Developmental
Models Handle Change
1. Development results from working out
predetermined behaviors. Change is the
triggering of innate knowledge.
2. Change is inductive learning. Learning involves
copying or internalizing behaviors present in the
environment.
3. Change arises through interaction of maturational
factors, under genetic control, and environment.
• Progress in neurosciences.
• Computational framework good for exploring &
modeling.
4/11/2016
Neural Networks
88
Biologically-Oriented Connectionism
(Elman, et al)
1.
We think it is critical to pay attention to what is known
about genetic basis for behavior & about developmental
neuroscience.
2. At the level of computation & modeling, we believe it is important
to understand sorts of computations that can plausibly be
carried out in neural systems.
3. We take a broad view of biology which includes concern for
evolutionary basis for behavior.
4. A broader biological perspective emphasizes adaptive
aspects of behaviors & recognizes that to understand
adaptation requires attention to environment.
4/11/2016
Neural Networks
89
Connectionist Models
o Cognitive functions performed by system that
computes with simple neuron-like elements, acting
in parallel, on distributed representations.
1. Have precisely matched data from human subject
experiments.
– Measure speed of reading words – depends on
frequency of word & regularity of
pronunciation pattern. (E.g., GAVE, HAVE).
•
•
4/11/2016
Similar pattern (humans – latency, NN – errors).
Fig. P.1 on pg. 3 (McLeod, Plunkett, Rolls)
Neural Networks
90
4/11/2016
Neural Networks
91
2. Connectionist models can predict results.
– Suggest areas of investigation
– E.g., U-shape learning or Over-generalization
problems when kids learn past tense of verbs
(WENT – GOED) suggests linguistic development
occurs in stages.
– NN model produced over-regularization errors.
– Fig. P.2. (McLeod, Plunkett, Rolls)
4/11/2016
Neural Networks
92
4/11/2016
Neural Networks
93
3. Connectionist models have suggested solutions to
some of the oldest problems in cognitive science.
• E.g., face recognition from various angles.
• View invariance – respond to one particular face
(regardless of view) & not the other faces.
• E.g., face 3 in Fig. P.3. (McLeod, Plunkett, Rolls)
4/11/2016
Neural Networks
94
4/11/2016
Neural Networks
95
4/11/2016
Neural Networks
96
4/11/2016
Neural Networks
97
Task
o When train network, want it to produce some
behavior.
o Task – behavior that are training network to do.
– E.g., associate present tense form of verb with
past tense form.
o Task must be precisely defined – for class of networks we’re dealing with – learning correct output for a given input.
o Training environment:
– Set of input stimuli.
– Correct output is paired with each input.
4/11/2016
Neural Networks
98
Implications of Defining the Task
o Must conceptualize behavior in terms of inputs &
outputs.
– May need abstract notion of input & output.
– E.g., associate 2 forms of verb – neither is really
input for other.
o Teach network task by example, not by explicit rule.
– If successful, network learns underlying
relationship between input & output by induction.
– Can’t assume network has learned generalization
we assume underlies behavior – may have learned
some other behavior!
E.g., tanks.
4/11/2016
Neural Networks
99
1980s Pentagon trained NN to
recognize tanks
4/11/2016
Neural Networks
100
Implications - 2
o Nature of training data is extremely important for
learning.
– The more data you give a network, the better.
– With too little data, may make bad generalization.
– Quality counts too!! – structure of environment
influences outcome.
o Some tasks more convincing/more effective/more
informative than others to demonstrate a point.
– Is info represented in teacher (output) plausibly
available to human learners?
– E.g., children? See task on next slide.
4/11/2016
Neural Networks
101
Two Ways to Teach Network to
Segment Sounds into Words
1. Expose network to sequences of sounds (present one at a time, in order, with no breaks between words).
• Train network to produce “yes” when sequence makes word.
• Explicitly learns about words from info about where words start.
2. Train network on different task – given same sequences of
sounds as input, but task is to predict next sound.
• At beginning of word, network makes many mistakes.
• As it hears more of word, prediction error declines until
end of word.
• Learns about words implicitly as indirect consequence of
task.
o First approach -- gives away secret by directly teaching task
(boundary info) which is NOT how children learn.
4/11/2016
Neural Networks
102
Network Architectures : Number &
Arrangement of Nodes in Network
1. Single-layer feedforward networks -- input layer that projects onto output
layer of neurons in one direction.
2. Multilayer feedforward network -- has
1+ hidden layers that intervene between
external input & network output.
4/11/2016
Neural Networks
103
Network Architectures : Number &
Arrangement of Nodes in Network
3. Recurrent network -- has at least 1
feedback loop.
4. Lattice structure -- 1-D, 2-D or
greater arrays of neurons with output
neurons arranged in rows & columns.
4/11/2016
Neural Networks
104
Most Neural Networks
Consist of 3 Layers
4/11/2016
Neural Networks
105
6 Different Types of Connections Used
Between Layers (Inter-layer Connections)
1. Fully connected. Each neuron on first
layer is connected to every neuron on
second layer.
2. Partially connected. Neuron of first
layer does not have to be connected to
all neurons on second layer.
3. Feed forward. Neurons on first layer
send their output to neurons on second
layer, but receive no input back from
neurons on second layer.
4/11/2016
Neural Networks
106
4. Bi-directional (recurrent). Another set of connections carrying output of neurons of second layer into neurons of first layer.
5. Hierarchical. Neurons of lower layer may only communicate with neurons on next level of layer.
6. Resonance. Layers have bi-directional connections.
– Can continue sending messages across
connections number of times until
certain condition is achieved.
4/11/2016
Neural Networks
107
How to Select Correct Network
Architectures
o Any task can be solved by some neural network (in theory) – but not every neural network can solve every task.
o Number & arrangement of nodes defines network
architecture.
o Textbook uses: 1) feedforward.
2) simple recurrent networks.
o # nodes depends on task & how I/O are represented.
– E.g., if images input in 100x100 dot array -- 10,000 I nodes.
o Selection of architecture reflects modeler’s theory
about what info processing is required for task.
4/11/2016
Neural Networks
108
Analysis
1. Train network on task.
2. Evaluate network’s performance & try to
understand basis for performance.
o Need to anticipate kinds of tests before training!
Ways to evaluate network performance:
1. Global error.
2. Individual pattern error.
3. Analyzing weights & internal representations.
4/11/2016
Neural Networks
109
Evaluate Network Performance:
Global Error
o During training, simulator calculates discrepancy
between actual network output activations &
target activations it is being taught to produce.
o Simulator reports this error on-line -- sum it over
number of patterns.
– As learning occurs, error should decline & reach
0.
o If network is trained on task in which same input
can produce different outputs, then network can
learn correct probabilities, but error rate never
reaches 0.
4/11/2016
Neural Networks
110
Evaluate Network Performance:
Individual Pattern Error
o Global error can be misleading.
– If have large # of patterns to learn, global
error may be low even if some patterns are not
learned correctly.
– These may be the interesting patterns.
o Also may want to create special test stimuli not
presented to network during training.
– Generalize to novel cases?
– What has network learned?
o Helps discover what generalizations have been
created from a finite data set.
4/11/2016
Neural Networks
111
Evaluate Network Performance: Analyzing
Weights & Internal Representations
1. Hierarchical clustering of hidden unit activations.
2. Principal component analysis & projection pursuit.
3. Activation patterns in conjunction with actual
weights.
4/11/2016
Neural Networks
112
Hierarchical Clustering of Hidden
Unit Activations
o Present test patterns to network after training.
o Patterns produce activations on hidden units
which record & tag -- vectors in multi-dimensional
space.
o Clustering looks at similarity structure of space.
o Inputs treated as similar by network produce
internal representations that are similar.
o Produces tree format of inter-pattern distance.
o Can’t examine space directly -- difficult to
visualize high-dimensional spaces.
4/11/2016
Neural Networks
113
Principal Component Analysis &
Projection Pursuit
o Used to identify interesting lower-dimensional
slices from hierarchical clustering.
o Move viewing perspective around in this space.
4/11/2016
Neural Networks
114
Activation Patterns in Conjunction
With Actual Weights
o When look at activation patterns, only look at part
of what network “knows.”
o Network manipulates & transforms info via
connections between nodes.
o Examine connections & weights to see how
transformations are being carried out.
o Hinton diagrams can be used -- weights shown as
colored squares with color & size of square
representing magnitude & sign of connection.
4/11/2016
Neural Networks
115
4/11/2016
Neural Networks
116
Hinton Diagram.
White = positive weight. Black = negative weight.
Area of box proportional to absolute value of corresponding
weight.
4/11/2016
Neural Networks
117
What Do We Learn From a
Simulation?
o Are the simulations framed in such way that
clearly address some issue?
o Are the task & stimuli appropriate for points being
made?
o Do you feel you’ve learned something from the
simulation?
4/11/2016
Neural Networks
118
Uses of Neural Networks
o Prediction -- Use input values to predict some output. E.g. pick best stocks, predict weather, identify people at risk of cancer.
o Classification -- Use input values to determine classification.
E.g. is input letter A; is blob of video data a plane & what kind?
o Data association -- Recognize data that contains errors. E.g.
identify characters when scanner is not working properly.
o Data Conceptualization -- Analyze inputs so that grouping
relationships can be inferred. E.g. extract from database
names most likely to buy product.
o Data Filtering -- Smooth an input signal. E.g. take the noise
out of a telephone signal.
4/11/2016
Neural Networks
119
Send In The Robots
http://www.spacedaily.com/news/robot-01b.html
by Annie Strickler and Patrick Barry for NASA Science News
Pasadena - May 29, 2001
o As a project scientist specializing in artificial intelligence at
NASA's Jet Propulsion Laboratory (JPL), Ayanna is part of a team
that applies creative energy to a new generation of space missions - planetary and moon surface explorations led by autonomous
robots capable of "thinking" for themselves.
o Nearly all of today's robotic space probes are inflexible in how
they respond to the challenges they encounter (one notable
exception is Deep Space 1, which employs artificial intelligence
technologies). They can only perform actions that are explicitly
written into their software or radioed from a human controller on
Earth.
o When exploring unfamiliar planets millions of miles from Earth, this
"obedient dog" variety of robot requires constant attention from
humans. In contrast, the ultimate goal for Ayanna and her
colleagues is "putting a robot on Mars and walking away, leaving it
to work without direct human interaction."
4/11/2016
Neural Networks
120
o "We want to tell the robot to think about any obstacle it
encounters just as an astronaut in the same situation would
do," she says. "Our job is to help the robot think in more
logical terms about turning left or right, not just by how
many degrees." …
o To do this, Ayanna relies on 2 concepts in the field of artificial intelligence: "fuzzy logic" & "neural networks." …
o Neural networks also have ability to learn from experience.
This shouldn't be too surprising, since design of neural
networks mimics way brain cells process information.
o "Neural networks allow you to associate general input to a
specific output," Ayanna says. "When someone sees four legs
and hears a bark (the input), their experience lets them
know it is a dog (the output)." This feature of neural
networks will allow a robot pioneer to choose behaviors
based on the general features of its surroundings, much like
humans do."
4/11/2016
Neural Networks
121
o By combining these two technologies, Ayanna and her
colleagues at JPL hope to create a robot "brain" that can
learn on its own how to expertly traverse the alien terrains
of other planets.
o Such a brainy 'bot might sound more like the science fiction
fantasies of children's comics than a real NASA project, but
Ayanna thinks the sci-fi flavor of the project contributes to
its importance for space exploration.
o Ayanna -- who wanted to be television's "Bionic Woman"
when she was young, and later decided she wanted to try to
build her instead -- says she believes that the flights of
imagination common in childhood translate into adult
scientific achievement.
o "I truly believe science fiction drives real science forward,"
she says. "You must have imagination to go to the next level."
4/11/2016
Neural Networks
122
Learning to Use tlearn
o Define task.
o Define architecture.
o Setting up simulator.
– Configuration (.cf)
file.
– Data (.data) file.
– Teach (.teach) file.
o Check architecture.
4/11/2016
o Run simulation.
– Global error.
– Pattern error.
o Examine weights.
– Role of start state.
– Role of learning rate.
o Try:
– Logical Or.
– Exclusive Or.
Neural Networks
123
Define Task
o Train neural network to map Boolean functions AND,
OR, EXCLUSIVE OR (XOR).
o Boolean functions take set of inputs (1, 0) & decide if
given input falls into positive or negative category.
o Input & output activation values of nodes in network
with 2 input units & 1 output unit.
o Networks simple & relatively easy to construct for
task.
o Many of the problems encountered with this task have direct implications for more complex problems.
4/11/2016
Neural Networks
124
Boolean Functions AND, OR, XOR
Input Activations (4 possible input combinations = 2^2) | Output Activations (Node 3)
Node 0 | Node 1 | AND | OR | XOR
0 | 0 | 0 | 0 | 0
0 | 1 | 0 | 1 | 1
1 | 0 | 0 | 1 | 1
1 | 1 | 1 | 1 | 0
4/11/2016
Neural Networks
125
Define Architecture
for AND Function
o 4 input patterns & 2 distinct outputs.
– Each input pattern has 2 activation values.
– Each output has single activation.
– For every input pattern, have well-defined output.
o Use simple feedforward network with 2 I units & 1 O
unit.
Single Layer Perceptron – 1 layer of weights.
[Figure: output node a2 connected to input nodes a0, a1 by weights w20, w21.]
4/11/2016
126
4/11/2016
Neural Networks
127
1. Network menu – New Project option.
2. New project dialogue box appears.
3. Select directory or folder in which to save your
project files. Use N: Drive!
4. Call project and. All files associated with project
should have same name (any name you want).
5. Get 3 windows on screen – each used for entering
info relevant to different aspect of network
architecture.
– and.teach – defines output patterns to network, how many & format.
– and.data – defines input patterns to network, how many & format.
– and.cf – used to define # nodes in network & initial pattern of connectivity between nodes before training.
4/11/2016
Neural Networks
128
Info Stored in .cf, .data &
.teach Files
o Can use editor of tlearn.
o Or text editor or word processor.
– Must save files in ASCII format (text).
o Enter data for and.cf file.
– Follow upper- & lower-case distinctions, spaces
& colons.
– Use delete or backspace keys to correct errors.
o File Save command in tlearn.
4/11/2016
Neural Networks
129
[Figure: the four training facts (1 AND 1 = 1, 0 AND 0 = 0, 0 AND 1 = 0, 1 AND 0 = 0) and the INPUT, CONFIGURATION & OUTPUT files.]
4/11/2016
Neural Networks
130
Key to setting up simulator.
Describes configuration of
network.
Conforms to fairly rigid
format.
3 sections:
NODES:
CONNECTIONS:
SPECIAL
4/11/2016
Neural Networks
131
NODES:
NODES: – beginning of nodes section
nodes = 1 – # units in network (not input)
inputs = 2 – # input units (counted separately)
outputs = 1 – # output units in network
output node is 1 – identifies output unit; only 1 non-input node in network. Start at 1.
o Inputs don’t count as nodes.
o Output nodes are < node-list>.
o Spaces are critical.
4/11/2016
Neural Networks
132
CONNECTIONS:
CONNECTIONS: – beginning of section
groups = 0 – how many groups of connections are constrained to have same value.
1 from i1-i2 – indicates node 1 (output) receives input from 2 input units. Input units given prefix i.
1 from 0 – node 0 is bias unit which is always on. So node 1 has a bias.
o All connections in a group are identical strength.
– groups = 0 is common.
4/11/2016
Neural Networks
133
o <node-list> from <node-list> provides info about
connections
– <node-list> is comma-separated list of node #
with dashes indicating that intermediate node
# are included.
– 1 from i1-i2
– Contains no spaces.
– Nodes numbered counting from 1.
o Inputs are numbered, counting from 1, with i
prefix.
o Node 0 always outputs a 1 & serves as bias node.
– If biases are desired, connections must be
specified from node 0 to specific other nodes.
– 1 from 0
4/11/2016
Neural Networks
134
SPECIAL:
SPECIAL: – beginning of section
selected = 1 – which units selected for special printout. Output node (1) is selected.
weight_limit = 1.00 – sets start weights (from I to O & biases to O) randomly in range of +/- 0.5.
o Optional lines can specify if :
– linear = <node-list>
some nodes linear
– bipolar = <node-list>
values range from –1 to 1
– selected = <node-list> nodes selected for special printout
4/11/2016
Neural Networks
135
Data (.data) File
o Defines input patterns presented to tlearn.
o First line is either:
– distributed (normal) – set of vectors with i
values.
– localist (only few numbers of many input lines
are non-zero).
o Second line is integer specifying number of input
vectors to follow.
o Remainder of file consists of input.
– Integers or floating-point numbers.
4/11/2016
Neural Networks
136
4/11/2016
Neural Networks
137
Teach (.teach) File
o Required whenever learning is to be performed.
o First line: distributed (normal)
localist (only few of many target values nonzero).
o Integer specifying # output vectors to follow.
o Ordering of output pattern matches ordering of
corresponding input patterns in .data file.
o In normal (distributed), each output vector contains o floating point or integer numbers.
– o = number of outputs in network.
– can use * instead of a floating point number to indicate “don’t care”.
4/11/2016
Neural Networks
138
4/11/2016
Neural Networks
139
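Following the format just described, the and.data and and.teach files for the four AND patterns could look like this sketch; the pattern order shown is illustrative, and it only has to match between the two files.

and.data:
distributed
4
1 1
0 0
0 1
1 0

and.teach:
distributed
4
1
0
0
0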
Checking the Architecture
o If typed in info to and.cf, and.data & and.teach
files correctly should have no problems.
o tlearn offers check of and.cf by displaying picture
of network architecture.
– Displays menu, Network Architecture option.
– Can change how see nodes, but doesn’t change
contents of network configuration file.
o Get error message if mistake in syntax of training
files.
o Does not find incorrect entries in data!!
4/11/2016
Neural Networks
140
4/11/2016
Neural Networks
141
Running the Simulation
o Specify 3 input files (.cf, .data, .teach) & save
them.
o Specify parameters for tlearn to determine initial
start state of network, learning rate, & momentum.
o Network menu, training options.
4/11/2016
Neural Networks
142
o # training sweeps before stop
– training sweep is 1 presentation of input
pattern causing activation to propagate thru
network & appropriate weight adjustments to
be carried out.
o Order in which patterns are presented to network
determined by :
– train sequentially – presents patterns in order
they appear in .data & .teach files.
– train randomly – presents patterns in random
order.
o Learning Rate – determines how fast weights are
changed in response to a given error signal.
– set to 0.100
o Momentum –discussed later.
– set to 0.0
4/11/2016
Neural Networks
143
o Initial state of network determined by weight
values assigned to connections before training
starts.
– .cf file specifies weight_limit
o Weights assigned according to random seed
indicated by number next to Seed with: button.
– Select any number you like.
– Simulation can be replicated using the same
random seed – initial start weights of network
are identical & patterns are sampled in same
random order.
o Seed randomly – computer selects random seed.
o Both Seed with & Seed randomly select set of
random start weights within the limits specified
by weight_limit parameter.
4/11/2016
Neural Networks
144
Train the Network
o Once set training options, select Train the
network from Network menu.
o Get tlearn Status display.
– # sweeps
– Abort, dump current state in weights file.
– Iconify – clear screen for other tasks while
tlearn runs in background.
4/11/2016
Neural Networks
145
Has the Network Solved the
Problem?
1. Examine global error produced at output nodes
averaged across patterns.
2. Examine response of network to individual input
patterns.
3. Analyzing weights & internal representations.
4/11/2016
Neural Networks
146
Examine Global Error
o During training, simulator calculates discrepancy between
actual network output activations & target activations it is
being taught to produce.
o Simulator reports this error on-line -- sum it over a number
of patterns.
– As learning occurs, error should decline & reach 0.
o If network is trained on task in which same input can produce
different outputs, then network can learn correct
probabilities, but error rate never reaches 0.
o Error calculated by subtracting actual response from desired
(target) response.
o Value of discrepancy is either:
– Positive if target greater than actual output.
– Negative if actual output is greater than target output.
4/11/2016
Neural Networks
147
Root Mean Square (RMS) Error
o Global error – average error across 4 pairs at a
given point in training.
o tlearn provides Root Mean Square error (RMS)
to prevent cancellation of positive & negative
numbers.
– Average of the squared errors for all patterns.
– Returns square root of average.
4/11/2016
Neural Networks
148
AND
Network
• Tracks RMS error throughout training (every 100 sweeps).
• Error decreases as training continues … after 1000 sweeps RMS
error = 0.35.
– Average output error = 0.35
– Output off target by approx. 0.35 averaged across 4
patterns.
4/11/2016
Neural Networks
149
o Equation 3.1 – with a single output node this reduces to RMS error = sqrt( Σk (tk – ok)^2 / k )
– k indicates number of input patterns (4 for AND)
– ok is vector of output activations produced by input
pattern k
– number of elements in vector corresponds to number of
output nodes.
• e.g., in this case (AND), only one output node so
vector contains only 1 element.
– vector tk specifies desired or target activations for input
pattern k.
o With 1000 sweeps & 4 input patterns, network sees each pattern approximately 250 times.
4/11/2016
Neural Networks
150
o Given RMS error = 0.35, has the network learned the AND
function?
– Depends on how define acceptable level of error.
o Activation function of output unit is sigmoid function (EQ
1.2).
– Activation curve never reaches 1.0 or 0.0
– Net input to node would need to be ± infinity.
– Always some residual finite error.
o So what level of error is acceptable? No right answer.
– Can say all outputs be within 0.1 of target.
– Can round off activation values & ones closest to 1.0 are
correct if target is 1.0.
4/11/2016
Neural Networks
151
Has Network Solved Problem?
o RMS error = 0.35. Solved?
o Depends on how define acceptable level of error.
– Can’t always use just global error.
– Network may have low RMS, but hasn’t solved
all input patterns correctly.
Exercise 3.3
1. How many times has network seen each input
pattern after 1000 sweeps through training set?
2. How small must RMS error be before we can say
network has solved problem?
4/11/2016
Neural Networks
152
Pattern Error – Verify Network
Has Learned
o RMS error is the average error across 4
patterns.
o Is error uniformly distributed across different
patterns or have some patterns been correctly
learned while others are not??
o Verify network has learned from Network menu
– Presents each input pattern to network once &
observes resulting output node activation.
– Compare output activations with teacher signal
in .teach file.
4/11/2016
Neural Networks
153
o Output window indicates file and.1000.wts as
specification of state of network.
o Used and.data training patterns to verify network
performance.
o Compare activation values to target activations in
and.teach file.
o Has the network solved Boolean AND?
4/11/2016
Neural Networks
154
Pattern Error – Node Activities
o Activation levels indicated by squares.
– Large white = high activations.
– Small white = low activations.
– Grey = inactive node.
4/11/2016
Neural Networks
155
Individual Pattern Error
• Global error can be misleading.
– If have large # of patterns to learn, global
error may be low even if some patterns are not
learned correctly.
– These may be the interesting patterns.
• Also may want to create special test stimuli not
presented to network during training.
– Generalize to novel cases?
– What has network learned?
• Helps discover what generalizations have been
created from a finite data set.
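A small illustration (made-up numbers, not tlearn output) of how a respectable-looking global RMS can hide one unlearned pattern:

```python
import numpy as np

# Nine patterns learned well, one still badly wrong.
targets = np.array([0.0] * 9 + [1.0])
outputs = np.array([0.02] * 9 + [0.40])

print(np.abs(targets - outputs))                    # per-pattern errors: nine 0.02s and one 0.60
print(np.sqrt(np.mean((targets - outputs) ** 2)))   # global RMS ~0.19, despite the failed pattern
```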
4/11/2016
Neural Networks
156
Pattern Error: Present each Input
Pattern Just Once
• Select Verify network has learned from Network menu.
– Presents each input pattern to network just once.
– E.g., for AND function, should do 4 sweeps (1 per each training input).
• Observe resulting output node activations.
• Compare output activations with teacher signal in .teach file.
4/11/2016
Neural Networks
157
AND Network
• Output window indicates file and.1000.wts as
specification of state of network.
• Used and.data training patterns to verify network
performance.
• Compare activation values to target activations in
and.teach file.
• Has the network solved Boolean AND?
4/11/2016
Neural Networks
158
Calculate Actual RMS Error Value & Compare it to
Value Plotted (Boolean AND)
Input   Output   Round Off   Target   Squared Error
0 0     0.099    0           0        .0098
1 0     0.294    0           0        .0864
0 1     0.301    0           0        .0906
1 1     0.620    1           1        .1444

RMS Error = Sqrt(.3312/4) = .2877
4/11/2016
Neural Networks
159
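The table's arithmetic can be checked directly; the snippet below (values copied from the table above) reproduces the plotted value:

```python
import numpy as np

outputs = np.array([0.099, 0.294, 0.301, 0.620])
targets = np.array([0.0, 0.0, 0.0, 1.0])
print(np.sqrt(np.mean((targets - outputs) ** 2)))   # ~0.288
```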
Pattern Error – Node Activities
• Activation levels indicated by squares.
– Large white = high activations.
– Small white = low activations.
– Grey = inactive node.
4/11/2016
Neural Networks
160
Examine Weights
o Input activations transmitted to other nodes
along modifiable connections.
o Performance of network determined by strength
of connections (weight values).
1. Display menu, Connection Weights (Hinton diagram).
– white (positive)
– black (negative)
– size reflects absolute size of connection
(Diagram columns: bias node / first input / second input)
4/11/2016
Neural Networks
161
o All rectangles in first column code values of
connection from bias node.
o Rectangles in 2nd column code connections from 1st
input unit.
o Across columns – higher numbered nodes (from
.cf)
o Rows in each column identify destination nodes of
connection.
– higher numbered rows indicate higher
numbered destination nodes.
– Only one node in this example receives inputs
(output node) – only one that receives incoming
connections.
4/11/2016
Neural Networks
162
o Hinton diagram provides clues how network solves
Boolean AND.
– Bias has strong negative connection to output
node.
– 2 input nodes have moderately sized positive
connections to output node.
– One active node by itself can’t provide enough
activation to overcome strong negative bias.
– Two active input nodes together can overcome
negative bias.
– Output node only turns on if both input nodes
are active!
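A small sketch of this explanation in Python, plugging the weight values quoted later from and.1000.wts (-2.204 bias, 1.328 and 1.36 input weights) into a logistic output unit; the exact numbers will differ from run to run:

```python
import numpy as np

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))

bias, w1, w2 = -2.204, 1.328, 1.36   # values quoted from the and.1000.wts example below

for i1, i2 in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    net = bias + w1 * i1 + w2 * i2
    print(i1, i2, round(sigmoid(net), 3))
# 0 0 -> 0.099   off by default (strong negative bias)
# 1 0 -> 0.294   one input alone can't overcome the bias
# 0 1 -> 0.301
# 1 1 -> 0.619   both inputs together turn the output on
```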
4/11/2016
Neural Networks
163
Role of Start State
o Network solved Boolean AND starting with
particular set of random weights & biases.
o Use different random seed (Training options) to
wipe out learning that has occurred …
o Can resume training beyond the specified number
of sweeps using the Resume training option.
o Start states can have dramatic impact on way
network attempts to solve a problem & on final
solution.
– Training networks with different random seeds is like
running subjects on experiments.
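A tiny NumPy sketch (not tlearn) of what the random seed controls: the seed fixes the initial weights, so repeating a seed repeats the start state while a new seed gives a new one:

```python
import numpy as np

for seed in (1, 1, 2):
    rng = np.random.default_rng(seed)
    print(seed, rng.uniform(-0.5, 0.5, size=3))   # bias, w1, w2 start values
# seed 1 prints the same weights twice; seed 2 gives a different start state.
```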
4/11/2016
Neural Networks
164
Role of Learning Rate
o Learning rate determines proportion of error signal which is
used to change weights in network.
– Large learning rates lead to big weight changes.
– Small learning rates lead to small weight changes.
o To examine effect of learning rate on performance, run
simulation so that learning rate is only factor changed.
– Start with same random weights & biases.
o Modelers often use small learning rate to avoid large weight
changes.
– Large weight changes can be disruptive (learning is
undone).
– Large weight changes can be counter-productive when
network is close to a solution!
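A minimal sketch of the role of the learning rate (a plain delta rule on a single sigmoid output unit; tlearn's actual update also involves momentum, which is ignored here):

```python
import numpy as np

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))

def train_and(lr, epochs=250, seed=1):
    rng = np.random.default_rng(seed)
    w = rng.uniform(-0.5, 0.5, size=3)                  # bias, w1, w2 start from same seed
    patterns = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 1)]
    for _ in range(epochs):
        for i1, i2, t in patterns:
            x = np.array([1.0, i1, i2])                 # leading 1.0 feeds the bias
            o = sigmoid(w @ x)
            w += lr * (t - o) * o * (1 - o) * x         # weight change scales with lr
    return w

print(train_and(lr=0.1))    # small, cautious steps
print(train_and(lr=2.0))    # much larger steps: faster early on, but can overshoot
```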
4/11/2016
Neural Networks
165
Steps To Building Neural
Network in tlearn
1. Network menu – New Project option. New project dialogue box appears.
2. Select directory or folder in which to save your project files.
Use N: Drive!
3. Get 3 windows on screen – each used for entering info relevant
to different aspect of network architecture (.teach, .data, &
.cf).
4. Check architecture.
5. Specify training option parameters to determine initial start
state of network, learning rate, & momentum.
6. Train network (from Network menu).
7. Determine if network has learned task by checking error rates,
examine response to individual patterns, etc.
4/11/2016
Neural Networks
166
AND Network: Hinton Diagram
(Hinton diagram figure; columns labeled bias node / first input / second input)
4/11/2016
Neural Networks
167
Hinton Diagram.
White = positive weight. Black = negative weight.
Area of box proportional to absolute value of
corresponding weight.
4/11/2016
Neural Networks
168
Logical AND Network Implemented
With 2 Inputs & 1 Output
o Output unit on (value close to
1.0) when both inputs 1.0.
Otherwise off.
o With large negative weight from bias
unit to output, output is off by default.
o Make weights from input nodes
to output large enough that if
both nodes are present, net
input is great enough to turn
output on.
– Neither input by itself is
large enough to overcome
negative bias.
4/11/2016
Node 0 is bias unit which
is always on. So node 1
has a bias.
Neural Networks
169
Hinton Diagram Example
4/11/2016
Neural Networks
170
Weights File in tlearn
o tlearn keeps up-to-date record of network’s state in weights
file.
o Saved to disk at regular intervals & at end of training.
o Lists all connections in network grouped according to receiving
node.
o In and.cf file only 1 receiving node is specified (output node
1).
4/11/2016
Neural Networks
171
o 1st # represents weight on connections from bias node to
output node (-2.204).
o 2nd # (1.328) shows connection from 1st input node to output.
o 3rd # (1.36) shows connection from 2nd input node to output
node.
o Final number (0.000) shows connection from output node to
itself – non-existent due to the feedforward nature of the network.
4/11/2016
Neural Networks
172
Resume Training
o Can continue network training by Resume training
option on the Network menu.
– Extend training by # sweeps & adjust error
display to accommodate extra training sweeps.
o Does the RMS error decrease significantly?
4/11/2016
Neural Networks
173
Several Different Ways to Analyze
Weights & Examine Internal
Representations
1. Hierarchical clustering of hidden unit activations.
2. Principal component analysis & projection pursuit.
3. Activation patterns in conjunction with actual
weights.
• Examine these methods in detail later in semester!
4/11/2016
Neural Networks
174
1 - Hierarchical Clustering of
Hidden Unit Activations
• Present test patterns to network after training.
• Patterns produce activations on hidden units, which are recorded &
tagged -- vectors in multi-dimensional space.
• Clustering looks at similarity structure of that space.
• Inputs treated as similar by network produce internal
representations that are similar.
• Produces tree format of inter-pattern distances.
• Can't examine space directly -- difficult to visualize high-dimensional spaces.
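A minimal sketch of this analysis (SciPy, with made-up hidden-unit activation vectors rather than real tlearn output):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

# Hypothetical hidden-unit activation vectors, one row per test pattern.
activations = np.array([
    [0.10, 0.90, 0.15],
    [0.12, 0.88, 0.20],
    [0.85, 0.10, 0.80],
    [0.80, 0.05, 0.90],
])
labels = ["pattern 1", "pattern 2", "pattern 3", "pattern 4"]

dendrogram(linkage(activations, method="average"), labels=labels)
plt.show()   # patterns 1/2 and 3/4 group together: similar inputs, similar representations
```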
4/11/2016
Neural Networks
175
2 - Principal Component Analysis & Projection Pursuit
• Used to identify interesting lower-dimensional
slices from hierarchical clustering.
• Move viewing perspective around in this space.
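A comparable sketch for PCA (scikit-learn, made-up activations):

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical hidden-unit activations: 20 test patterns, 10 hidden units.
activations = np.random.default_rng(0).random((20, 10))

projected = PCA(n_components=2).fit_transform(activations)
print(projected.shape)   # (20, 2): a two-dimensional "slice" of the 10-D space to plot and rotate
```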
4/11/2016
Neural Networks
176
3 - Activation Patterns In
Conjunction With Actual Weights
• When look at activation patterns, only look at part
of what network “knows.”
• Network manipulates & transforms info via
connections between nodes.
• Examine connections & weights to see how
transformations are being carried out.
• Hinton diagrams can be used -- weights shown as
colored squares with color & size of square
representing magnitude & sign of connection.
4/11/2016
Neural Networks
177
Has Network Solved AND
Problem? RMS error = 0.35. Solved?
• Depends on how define acceptable level of error.
– Can't always use just global error.
– Network may have low RMS, but hasn't solved all input patterns correctly.
• Exercise 3.3
1. How many times has network seen each input pattern after 1000 sweeps through training set?
2. How small must RMS error be before we can say network has solved problem?
• Exercise 3.4
1. Compare exact value of RMS to plotted value.
4/11/2016
Neural Networks
178
What Do We Learn From a
Simulation?
• Are the simulations framed in such way that
clearly address some issue?
• Are the task & stimuli appropriate for points being
made?
• Do you feel you’ve learned something from the
simulation?
4/11/2016
Neural Networks
179
Logical OR
o What type of
network
architecture?
o 2 input, 1 output
+ bias node
o Try the OR network (pg. 57-62).
Input Activations        Output Activations (Node 3)
Node 0    Node 1         AND    OR    XOR
0         0              0      0     0
0         1              0      1     1
1         0              0      1     1
1         1              1      1     0
4/11/2016
Neural Networks
180
4/11/2016
Neural Networks
181
Exclusive OR
o Create third project called xor and try the
exclusive OR function with input layer and output
layer.
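Before running the simulation, it is worth seeing why a network with only an input layer and an output layer struggles here. The sketch below (plain Python, not tlearn) brute-forces a search for a single linear threshold unit that computes each function; it finds one for AND and OR but none for XOR, because XOR is not linearly separable:

```python
import itertools

def single_unit_exists(truth_table):
    # Search a coarse grid of weights/bias for a line w1*x1 + w2*x2 + b > 0
    # that reproduces the truth table.
    grid = [v / 4 for v in range(-20, 21)]
    for w1, w2, b in itertools.product(grid, repeat=3):
        if all((w1 * x1 + w2 * x2 + b > 0) == bool(t)
               for (x1, x2), t in truth_table.items()):
            return True
    return False

AND = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}
OR  = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 1}
XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

for name, table in (("AND", AND), ("OR", OR), ("XOR", XOR)):
    print(name, single_unit_exists(table))   # AND True, OR True, XOR False
```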
4/11/2016
Neural Networks
182
Neural Network Simulation
Software: tlearn, MemBrain
o Simulations allow examination of how model solved
problem.
o Simulator needs to be told:
– Network architecture.
– Training data.
– Learning rate & other parameters.
o Simulator:
– Creates network.
– Performs training.
– Reports results.
o You can examine results.
4/11/2016
Neural Networks
183
Tlearn Software
1. Copy win_tlearn.exe from disk or R: drive to N:
drive.
2. Double-click on file to begin installation.
3. Executable is called tlearn.
o http://www.columbia.edu/cu/psychology/courses/3205/tlearn/
To download Adobe Acrobat PDF version:
ftp://ftp.crl.ucsd.edu/pub/neuralnets/tlearn/TlearnManual.pdf
4/11/2016
Neural Networks
184