Transcript Document

Neural Networks
• Introduction
– Biological neurons
– Artificial neurons
– Concepts
– Conventions
• Single Layer Perceptron
– Example
– Limitation
Biological Neuron
• Neuron = Cell superclass in the nervous system
• Specs
– Total number = ~10^11 (Size of a hard disk circa '03)
• Maximum number before birth
• ~10^4 lost/day (More if you don't study every day!)
– Connections/neuron = ~10^4
– Signal rate = ~10^3 Hz (CPU = 10^9 Hz circa '03)
– Signal propagation velocity = ~10^-1 to 10^2 m/sec
– Power = 40 W
Biological Neuron
• Connectivity important (Just like human society)
– Connected
• To what and
• To what extent
– Basis of memory and learning (revising opinions;
learning lessons in life)
– Revision important (And why reading for the first time
on eve of exam is a flawed strategy)
– Covering an eye to prevent loss of vision in squint (Why the
advertising industry persists, subliminally or blatantly)
Artificial Neural Networks
• What
– Connected units with inputs and outputs
• Why
– Can “learn” and approximate any function,
including non-linear functions (XOR)
• When
– Basic idea more than 60 years old
– Resurgence of interest once coverage extended to
non-linear problems
Concepts
• Trial
– Output = Verdict = Guilty/Not guilty
– Processing neurons = Jury members
– Output neuron = Jury Foreman
– Inputs = Witnesses/Lawyers
– Weights = Credibility of Witnesses/Lawyers
• Investment
– Output decision = Buy/Sell
– Inputs = Financial advisors
– Weights = Past reliability of advice
– Iterate = Revise weights after results
Concepts
• Types of learning
– Supervised
• NN learns from a series of labeled examples (human
propagation of prejudice)
• Distinction between training and prediction phases
– Unsupervised
• NN discovers clusters and classifies examples
• Also called self-organizing networks (human
tendency)
• Typically, prediction rules cannot be derived
from an NN
Conventions
[Network diagram: inputs p1, p2, p3 … pN connect via weights w1,1, w1,2 … wM,N to hidden-layer neurons 1h1, 1h2 … 1hM and 2h1, 2h2 … 2hP, which connect to output neurons o1, o2 … oK; these form the (Input), (Hidden), and (Output) LAYERS]
Conventions
• Generally, rich connectivity between, but not within
layers
• Output for any neuron = Transfer/Activation function
f(x) = f(WP + b), where
W = Weight matrix = [w1,1 w1,2 w1,3 …. w1,N]
P = Input matrix = column vector [p1 p2 … pN]
WP = Matrix product = [w1,1p1 + w1,2p2 + w1,3p3 + ... + w1,NpN]
b = Bias/Offset
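To make the f(WP + b) computation concrete, here is a minimal Python sketch (not from the lecture; the function names and the sample numbers are mine, and the hard limit activation used here is defined on the next slide):

```python
import numpy as np

def hard_limit(x):
    # Hard limit activation: 0 if x < 0, else 1
    return 0 if x < 0 else 1

def neuron_output(W, P, b, f=hard_limit):
    # W: row vector of weights [w1,1 ... w1,N], P: column of inputs [p1 ... pN]
    n = float(np.dot(W, P)) + b   # net input WP + b
    return f(n)

# Example: two inputs with weights 0.5 and -0.3, bias 0.1
print(neuron_output(np.array([0.5, -0.3]), np.array([1.0, 2.0]), 0.1))  # f(0.5 - 0.6 + 0.1) = f(0.0) = 1
```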
Activation Functions
• Hard limit: f(x) = [0/1]. If x < 0, f(x) = 0, else 1
• Symmetric hard limit: f(x) = [-1/1]. If x < 0, f(x) = -1, else 1
• Linear: f(x) = x
• Positive linear: f(x) = [0,x]. If x < 0, f(x) = 0, else x
• Saturating linear: f(x) = [0,1]. If x < 0, f(x) = 0; if x > 1, then 1, else x
• Symmetric saturating linear: f(x) = [-1,1]. If x < -1, f(x) = -1; if x > 1, then 1, else x
• Log-sigmoid: f(x) = 1/(1+e^-x)
• Competitive (multiple neuron layer; winner takes all):
f(xi) = 1 for the neuron with the largest net input; f(xj) = 0 for all other neurons
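A rough Python sketch of these activation functions (names are mine; the competitive function is written to act on a whole layer's net inputs at once, per the winner-takes-all description):

```python
import numpy as np

def hard_limit(x):              return 0.0 if x < 0 else 1.0
def symmetric_hard_limit(x):    return -1.0 if x < 0 else 1.0
def linear(x):                  return x
def positive_linear(x):         return max(0.0, x)
def saturating_linear(x):       return min(max(x, 0.0), 1.0)
def symmetric_saturating(x):    return min(max(x, -1.0), 1.0)
def log_sigmoid(x):             return 1.0 / (1.0 + np.exp(-x))

def competitive(xs):
    # Winner takes all over a layer: 1 for the largest net input, 0 elsewhere
    out = np.zeros(len(xs))
    out[int(np.argmax(xs))] = 1.0
    return out

print(log_sigmoid(0.0))               # 0.5
print(competitive([0.2, 1.5, -0.3]))  # [0. 1. 0.]
```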
Conventions
• Output for any layer = column matrix =
[ f(W1P + b1)
  f(W2P + b2)
  .
  f(WMP + bM) ]
where
Wi = Weight matrix = [wi,1 wi,2 wi,3 …. wi,N]
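In code, the whole layer's output can be computed in one matrix product, as in this illustrative sketch (W stacks the rows W1 … WM, b stacks the biases, and a hard limit is assumed as f; the numbers are mine):

```python
import numpy as np

def layer_output(W, P, b, f=np.vectorize(lambda n: 0.0 if n < 0 else 1.0)):
    # W: M x N weight matrix (row i = Wi), P: N x 1 input column, b: M x 1 bias column
    # Returns the column matrix [f(W1 P + b1), ..., f(WM P + bM)]
    return f(W @ P + b)

W = np.array([[0.5, -0.3],
              [1.0,  1.0]])     # two neurons, two inputs
P = np.array([[1.0], [2.0]])
b = np.array([[0.1], [-4.0]])
print(layer_output(W, P, b))    # [[1.], [0.]]
```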
Single Layer Perceptron
• Single Layer Single Neuron Perceptron
– Consider multiple inputs (column vector) with respective weights
(row vector) to a neuron that serves as the output neuron
– Assume f(x) is the hard limit function
– Labeled training examples are provided {(P1,t1), (P2,t2) ….
(PZ,tZ)}, where each ti is 0 or 1.
– Learning rule (NOT the same as prediction rule)
• Error e = Target - f(x)
• For each input set
Wcurrent = Wprevious + eP^T (P transposed to match the row-vector W)
bcurrent = bprevious + e
• Iterate until e is zero for all training examples
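As a worked sketch of this learning rule (illustrative only; the AND function is my assumed training set, and 1-D arrays are used so W and P line up without an explicit transpose):

```python
import numpy as np

def hardlim(n):
    return 1 if n >= 0 else 0

def train_perceptron(examples, n_inputs, max_epochs=20):
    W = np.zeros(n_inputs)   # row vector of weights
    b = 0.0                  # scalar bias
    for _ in range(max_epochs):
        all_correct = True
        for P, t in examples:                     # P: inputs, t: 0 or 1 target
            e = t - hardlim(float(W @ P) + b)     # error e = target - f(WP + b)
            if e != 0:
                W = W + e * P                     # Wcurrent = Wprevious + e P^T
                b = b + e                         # bcurrent = bprevious + e
                all_correct = False
        if all_correct:                           # iterate until e is zero for all examples
            break
    return W, b

# AND function: output 1 only when both inputs are 1
data = [(np.array([0, 0]), 0), (np.array([0, 1]), 0),
        (np.array([1, 0]), 0), (np.array([1, 1]), 1)]
W, b = train_perceptron(data, n_inputs=2)
print(W, b)   # a weight vector and bias that separate the AND examples
```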
Single Layer Perceptron
• Single Layer Multiple Neuron Perceptron
– Consider multiple inputs (column vector) with respective weights
(row vector) to a layer of several neurons that serve as the output
– Assume f(x) is the hard limit function
– Labeled training examples are provided {(P1,t1), (P2,t2) ….
(PZ,tZ)}, where each ti is a column vector consisting of 0s and/or
1s.
– Learning rule (NOT the same as prediction rule; use vectors for the
error and bias)
• Error E = Target - f(x)
• For each input set
Wcurrent = Wprevious + EP^T (outer product of the error column and the transposed input)
Bcurrent = Bprevious + E
• Iterate until E is zero for all training examples
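The multiple-neuron version differs only in that E and B are column vectors and the weight update E P^T is an outer product, as in this sketch (again an assumed toy target, not lecture code; here each output neuron simply learns to copy one input):

```python
import numpy as np

def hardlim(n):
    # Elementwise hard limit: 0 if n < 0, else 1
    return np.where(n >= 0, 1, 0)

def train_layer_perceptron(examples, n_inputs, n_neurons, max_epochs=50):
    W = np.zeros((n_neurons, n_inputs))   # one weight row per neuron
    B = np.zeros((n_neurons, 1))          # column of biases
    for _ in range(max_epochs):
        all_correct = True
        for P, T in examples:                       # P: (n_inputs, 1), T: (n_neurons, 1) of 0s/1s
            E = T - hardlim(W @ P + B)              # E = Target - f(WP + B), a column vector
            if np.any(E != 0):
                W = W + E @ P.T                     # Wcurrent = Wprevious + E P^T (outer product)
                B = B + E                           # Bcurrent = Bprevious + E
                all_correct = False
        if all_correct:                             # iterate until E is zero for all examples
            break
    return W, B

# Toy targets: two output neurons, each reproducing one of the two inputs
data = [(np.array([[0], [0]]), np.array([[0], [0]])),
        (np.array([[0], [1]]), np.array([[0], [1]])),
        (np.array([[1], [0]]), np.array([[1], [0]])),
        (np.array([[1], [1]]), np.array([[1], [1]]))]
W, B = train_layer_perceptron(data, n_inputs=2, n_neurons=2)
print(W, B)
```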