Transcript Document

Neural Networks - Adaline
L. Manevitz
Plan of Lecture
• Perceptron: Connected and Convex examples
• Adaline: Square Error
• Gradient
• Calculate for AND, XOR
• Discuss limitations
• LMS algorithm: derivation.
What Are the Best Weights?
• Most examples classified correctly?
• Least square error?
• Least square error before the cut-off!
• To minimize: Σ_k (d(k) – Σ_i w_i x_i(k))**2
  (outer sum over examples k; inner sum over input dimensions i)
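A minimal sketch (Python/NumPy) of the quantity being minimized; the AND data, the 0/1 encoding with a bias input, and the example weight vector are illustrative assumptions:

    import numpy as np

    # AND examples, each row = (bias, x1, x2); targets in {0, 1} (assumed encoding)
    X = np.array([[1, 0, 0],
                  [1, 0, 1],
                  [1, 1, 0],
                  [1, 1, 1]], dtype=float)
    d = np.array([0, 0, 0, 1], dtype=float)

    w = np.array([-0.3, 0.2, 0.2])        # an arbitrary example weight vector

    # Pre-cut-off squared error: outer sum over examples, inner sum over dimensions
    errsq = np.sum((d - X @ w) ** 2)
    print(errsq)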
Least Square Minimization
• Find the gradient of the error over all examples. Either solve for the minimum directly or move opposite to the gradient.
• Widrow-Hoff (LMS): use the instantaneous example as an approximation to the gradient.
  – Advantages: no memory; on-line; serves a similar function to noise in avoiding local problems.
  – Adjust by w(new) = w(old) + a·err·x for each example x.
  – Here err = (desired output – Σ w·x).
LMS Derivation
• Errsq = Σ_k (d(k) – W·x(k))**2
• Grad(Errsq) = –2 Σ_k (d(k) – W·x(k)) x(k)
• W(new) = W(old) – m·Grad(Errsq)
• To ease calculations, use the single-example error Err(k) = d(k) – W·x(k) in place of the full sum Errsq
• W(new) = W(old) + 2m·Err(k)·x(k)
• Continue with the next choice of k
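A quick numerical check of the gradient formula (Python/NumPy; the example values for x(k), d(k), and W are assumptions for illustration): the analytic gradient –2·Err(k)·x(k) of the single-example squared error agrees with a finite-difference estimate.

    import numpy as np

    x = np.array([1.0, 0.5, -0.3])      # one example x(k) (assumed values)
    d = 1.0                              # its desired output d(k)
    w = np.array([0.2, -0.1, 0.4])       # current weights W

    errsq = lambda w: (d - w @ x) ** 2

    analytic = -2 * (d - w @ x) * x      # Grad(Errsq(k)) from the slide

    eps = 1e-6                           # finite-difference estimate, coordinate by coordinate
    numeric = np.array([(errsq(w + eps * e) - errsq(w - eps * e)) / (2 * eps)
                        for e in np.eye(3)])

    print(analytic, numeric)             # the two gradients agree to high precision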
Applications
• Adaline has better convergence properties than the Perceptron
• Useful in noise correction
• An Adaline is in essentially every modem
LMS (Least Mean Square Alg.)
1. Apply an input x(k) to the Adaline.
2. Find the squared error of the current input:
   – Errsq(k) = (d(k) – W·x(k))**2
3. Approximate Grad(Errsq) by
   – differentiating Errsq(k),
   – approximating the average Errsq by the single example Errsq(k),
   – obtaining –2·Err(k)·x(k), where Err(k) = d(k) – W·x(k).
4. Update W: W(new) = W(old) + 2m·Err(k)·x(k)
5. Repeat steps 1 to 4.
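A minimal runnable sketch of steps 1-5 (Python/NumPy); the AND data set, the learning rate m, the fixed number of epochs, and the random initialization are assumptions for illustration:

    import numpy as np

    def lms_train(X, d, m=0.05, epochs=50, seed=0):
        """Train a linear Adaline with the Widrow-Hoff (LMS) rule.
        X: examples as rows (a bias column is assumed to be included);
        d: desired outputs; m: learning rate (mu)."""
        rng = np.random.default_rng(seed)
        w = rng.normal(scale=0.1, size=X.shape[1])
        for _ in range(epochs):
            for k in rng.permutation(len(X)):       # step 1: pick an input x(k)
                err = d[k] - w @ X[k]                # steps 2-3: instantaneous error Err(k)
                w = w + 2 * m * err * X[k]           # step 4: Widrow-Hoff update
        return w

    # Example: learn AND (bias column of 1s; 0/1 targets -- assumed encoding)
    X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]], float)
    d = np.array([0, 0, 0, 1], float)
    w = lms_train(X, d)
    print(w, X @ w)    # pre-cut-off outputs; apply a cut-off to classify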
Comparison with Perceptron
• Both use an updating rule that changes with each input
• One corrects a binary error; the other minimizes a continuous error
• Adaline always converges; see what happens with XOR (sketched below)
• Both can REPRESENT linearly separable functions
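A quick check of the XOR remark (Python/NumPy; the 0/1 encoding, learning rate, and epoch count are assumptions for illustration): the LMS updates still settle on weights, but no cut-off on the resulting linear output classifies all four XOR points correctly.

    import numpy as np

    # XOR with a bias column (assumed 0/1 encoding); plain LMS updates
    X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]], float)
    d = np.array([0, 1, 1, 0], float)

    w = np.zeros(3)
    m = 0.02
    for _ in range(500):
        for k in range(4):
            err = d[k] - w @ X[k]
            w = w + 2 * m * err * X[k]

    print(w, X @ w)   # weights settle, but every output is near 0.5: no cut-off separates XOR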
Convergence Phenomenon
• Whether (and how fast) LMS converges depends on the choice of m.
• How should m be chosen?
Limitations
• Only linearly separable functions
• How can we get around it?
  – Use a network of neurons?
  – Use a transformation of the data so that it becomes linearly separable (see the sketch below)
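A minimal sketch of the second idea (Python/NumPy; the added x1·x2 product feature is a standard illustrative choice, not something specified in the slides): after the transformation XOR is linearly separable, so the same LMS update finds weights that classify it.

    import numpy as np

    # XOR inputs with a bias column and an extra x1*x2 feature (assumed transformation)
    X = np.array([[1, x1, x2, x1 * x2] for x1 in (0, 1) for x2 in (0, 1)], float)
    d = np.array([x1 ^ x2 for x1 in (0, 1) for x2 in (0, 1)], float)

    w = np.zeros(4)
    m = 0.05
    for _ in range(500):
        for k in range(4):
            err = d[k] - w @ X[k]
            w = w + 2 * m * err * X[k]

    print((X @ w > 0.5).astype(int))   # matches XOR: [0, 1, 1, 0] after the cut-off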
Multi-level Neural Networks
• Representability
– Arbitrarily complicated decisions
– Continuous Approximation: Arbitrary Continuous
Functions (and more) (Cybenko Theorem)
• Learnability
– Change Mc-P neurons to Sigmoid etc.
– Derive backprop using the chain rule (like the LMS derivation); see the Sample Feed forward Network (No loops) slide below.
Replacement of Threshold
Neurons with Sigmoid or
Differentiable Neurons
• [Plot: sigmoid activation compared with the threshold (step) activation]
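A minimal sketch of the two activation functions (Python/NumPy; the particular sigmoid, the logistic function, is the usual choice and an assumption here):

    import numpy as np

    def threshold(s):
        """Step (McCulloch-Pitts) activation: not differentiable at 0."""
        return np.where(s >= 0, 1.0, 0.0)

    def sigmoid(s):
        """Logistic sigmoid: a smooth, differentiable replacement."""
        return 1.0 / (1.0 + np.exp(-s))

    def sigmoid_deriv(s):
        """Derivative of the sigmoid, used when applying the chain rule."""
        y = sigmoid(s)
        return y * (1.0 - y)

    s = np.linspace(-4, 4, 9)
    print(threshold(s), sigmoid(s).round(3), sigmoid_deriv(s).round(3))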
Prediction
• [Block diagram: the Input/Output signal passes through a delay into the NN, and the NN output is compared with the actual signal]
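A minimal sketch of this prediction setup (Python/NumPy; the toy signal, window length, and the reuse of the LMS update for the NN are assumptions for illustration): delayed samples form the network's input, and its output is compared with the actual next sample.

    import numpy as np

    signal = np.sin(0.3 * np.arange(1000))   # toy signal to predict (assumed)

    delay = 4           # number of delayed samples fed to the (linear) NN
    m = 0.02            # learning rate
    w = np.zeros(delay + 1)
    errs = []

    for t in range(delay, len(signal)):
        x = np.append(1.0, signal[t - delay:t])   # bias + the delayed samples
        pred = w @ x                               # NN output (a linear Adaline here)
        err = signal[t] - pred                     # Compare: actual value vs. prediction
        errs.append(err)
        w = w + 2 * m * err * x                    # LMS update

    print(abs(errs[0]), abs(errs[-1]))   # the prediction error shrinks as the weights adapt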
Sample Feed forward Network
(No loops)
• [Network diagram: Input layer, layers of Weights (Vik, Wji), Output layer; each unit computes F(Σ_j w_ji x_j)]
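A minimal sketch of a forward pass through such a network (Python/NumPy; the layer sizes, the logistic choice of F, and the random weights are assumptions for illustration):

    import numpy as np

    def F(s):
        """Assumed differentiable activation (logistic sigmoid)."""
        return 1.0 / (1.0 + np.exp(-s))

    rng = np.random.default_rng(0)
    x = np.array([0.2, 0.7, 0.1])          # input layer activations
    V = rng.normal(size=(4, 3))            # weights Vik: input -> hidden
    W = rng.normal(size=(2, 4))            # weights Wji: hidden -> output

    h = F(V @ x)        # each hidden unit i computes F(sum_k V_ik x_k)
    y = F(W @ h)        # each output unit j computes F(sum_i W_ji h_i)
    print(y)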