Neural Nets Using Backpropagation
Neural Nets Using Backpropagation
Chris Marriott
Ryan Shirley
CJ Baker
Thomas Tannahill
Agenda
Review of Neural Nets and Backpropagation
Backpropagation: The Math
Advantages and Disadvantages of Gradient Descent and other algorithms
Enhancements of Gradient Descent
Other ways of minimizing error
Review
Approach that developed from an analysis of the
human brain
Nodes created as an analog to neurons
Mainly used for classification problems (e.g. character recognition, voice recognition, medical applications)
Review
Neurons have weighted inputs, a threshold value, an activation function, and an output
(Diagram: weighted inputs feed the neuron, which produces an output)
Output = f(Σ(inputs × weights))
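As a minimal sketch of this computation (not from the slides; the function name and example values are assumptions), a single threshold neuron can be written as:

    # Minimal sketch of a single threshold neuron (illustrative; not from the slides).
    def neuron_output(inputs, weights, threshold):
        """Fire (output 1) when the weighted sum of the inputs reaches the threshold."""
        weighted_sum = sum(x * w for x, w in zip(inputs, weights))
        return 1 if weighted_sum >= threshold else 0

    # Example: two inputs, both weights 1, threshold 1.5 -> behaves like a 2-input AND.
    print(neuron_output([1, 1], [1, 1], 1.5))  # 1
    print(neuron_output([1, 0], [1, 1], 1.5))  # 0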
Review
4-input AND
(Diagram: the four inputs feed two threshold units, whose outputs feed a third threshold unit; each unit has threshold = 1.5)
All weights = 1 and all outputs = 1 if active, 0 otherwise
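A hedged sketch of the network in the diagram (weights and thresholds as stated above; the function names are illustrative assumptions):

    # 4-input AND built from three 2-input threshold units, each with
    # weights = 1 and threshold = 1.5 (as in the diagram above).
    def unit(a, b, threshold=1.5):
        return 1 if a + b >= threshold else 0

    def and4(x1, x2, x3, x4):
        return unit(unit(x1, x2), unit(x3, x4))

    # Output is 1 only when all four inputs are active.
    for bits in [(1, 1, 1, 1), (1, 1, 1, 0), (0, 1, 1, 1)]:
        print(bits, "->", and4(*bits))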
Review
Output space for AND gate
(Plot: Input 1 vs. Input 2 with the corner points (0,0), (0,1), (1,0), (1,1); the decision boundary 1.5 = w1*I1 + w2*I2 separates (1,1) from the other points)
Review
Output space for XOR gate
Demonstrates need for hidden layer
(Plot: Input 1 vs. Input 2 with the corner points (0,0), (0,1), (1,0), (1,1); no single straight line separates (0,1) and (1,0) from (0,0) and (1,1))
Backpropagation: The Math
General multi-layered neural network
(Diagram: a fully connected network with an input layer of nodes 0–9 carrying activations X0,0 … X9,0, a hidden layer, and an output layer; Wi,j denotes the weight from node i in one layer to node j in the next)
Backpropagation: The Math
Backpropagation
Calculation of hidden layer activation values
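The activation formula itself did not survive the transcript; a standard form, consistent with the notation of the network diagram (input activations x_i, weights W_{i,j}, sigmoid activation f) and offered here only as a reconstruction, would be:

    % Reconstructed standard form (assumption, not copied from the slide)
    h_j = f\Big(\sum_i W_{i,j}\, x_i\Big), \qquad f(s) = \frac{1}{1 + e^{-s}}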
Backpropagation: The Math
Backpropagation
Calculation of output layer activation values
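The output-layer formula is likewise missing from the transcript; under the same assumptions the usual form is:

    % Reconstructed standard form (assumption, not copied from the slide)
    O_k = f\Big(\sum_j W_{j,k}\, h_j\Big)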
Backpropagation: The Math
Backpropagation
Calculation of error
δk = f(Dk) − f(Ok), where Dk is the desired and Ok the computed value at output node k
Backpropagation: The Math
Backpropagation
Gradient Descent objective function
Gradient Descent termination condition
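Both formulas on this slide are missing from the transcript; a common choice, consistent with the per-node error δk defined earlier, is a sum-of-squared-errors objective with a small-error stopping rule (the exact form used in the slides is an assumption here):

    % Reconstructed standard forms (assumptions, not copied from the slide)
    E = \tfrac{1}{2} \sum_k \big(D_k - O_k\big)^2,
    \qquad \text{stop when } E < \varepsilon \text{ or when } |\Delta E| \text{ falls below a tolerance}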
Backpropagation: The Math
Backpropagation
Output layer weight recalculation
(The update uses a learning rate η, e.g. 0.25, and the error δk at output node k; a reconstructed formula follows below)
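The update rule itself is missing from the transcript; the standard gradient-descent form, using the learning rate η and the error δk named above, would be:

    % Reconstructed standard form (assumption, not copied from the slide)
    W_{j,k} \leftarrow W_{j,k} + \eta\, \delta_k\, h_j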
Backpropagation: The Math
Backpropagation
Hidden Layer weight recalculation
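This formula is also missing from the transcript; in the standard derivation the hidden-layer error is the weighted sum of the output-layer errors scaled by the derivative of the activation (again a reconstruction, not the slide's own equation), where S_j is the weighted input to hidden node j:

    % Reconstructed standard form (assumption, not copied from the slide)
    \delta_j = f'(S_j) \sum_k \delta_k\, W_{j,k},
    \qquad W_{i,j} \leftarrow W_{i,j} + \eta\, \delta_j\, x_i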
Backpropagation Using Gradient
Descent
Advantages
Relatively simple implementation
Standard method and generally works well
Disadvantages
Slow and inefficient
Can get stuck in local minima resulting in sub-optimal
solutions
Local Minima
(Plot: an error surface showing a local minimum and a lower global minimum)
Alternatives To Gradient Descent
Simulated Annealing
Advantages
Can guarantee the optimal solution (global minimum), given a sufficiently slow cooling schedule
Disadvantages
May be slower than gradient descent
Much more complicated implementation
Alternatives To Gradient Descent
Genetic Algorithms/Evolutionary Strategies
Advantages
Faster than simulated annealing
Less likely to get stuck in local minima
Disadvantages
Slower than gradient descent
Memory intensive for large nets
Alternatives To Gradient Descent
Simplex Algorithm
Advantages
Similar to gradient descent but faster
Easy to implement
Disadvantages
Does not guarantee a global minimum
Enhancements To Gradient Descent
Momentum
Adds a percentage of the last movement to the current
movement
Enhancements To Gradient Descent
Momentum
Useful to get over small bumps in the error function
Often finds a minimum in fewer steps
Δw(t) = −η·δ·y + α·Δw(t−1)   (see the sketch after this list)
Δw is the change in weight
η is the learning rate
δ is the error
y is the activation feeding the weight (it differs depending on which layer is being updated)
α is the momentum parameter
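A minimal sketch of this update, assuming a NumPy weight vector and a toy quadratic error (function names and the momentum value are illustrative assumptions; the learning rate 0.25 matches the example on the earlier slide):

    import numpy as np

    def momentum_step(weights, grad, delta_prev, learning_rate=0.25, momentum=0.9):
        """One momentum update: add a fraction of the previous change to the new one."""
        delta = -learning_rate * grad + momentum * delta_prev   # delta_w(t)
        return weights + delta, delta

    # Usage: carry the previous delta_w between iterations.
    w = np.zeros(3)
    delta = np.zeros(3)
    target = np.array([1.0, 2.0, 3.0])
    for _ in range(50):
        grad = w - target          # gradient of the toy error 0.5*||w - target||^2
        w, delta = momentum_step(w, grad, delta)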
Enhancements To Gradient Descent
Adaptive Backpropagation Algorithm
Assigns each weight its own learning rate
That learning rate is determined by the sign of the gradient of the error function from the last iteration
If the signs are equal, the slope is likely shallow, so the learning rate is increased
If the signs differ, the slope is likely steep, so the learning rate is decreased
This speeds up progress on gradual slopes (a sketch follows below)
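A loose sketch of the idea, assuming NumPy arrays; the growth and shrink factors (1.2 and 0.5) and the function name are assumptions, not values from the slides:

    import numpy as np

    def adaptive_step(weights, grad, prev_grad, rates, grow=1.2, shrink=0.5):
        """Per-weight learning rates keyed on the sign of the current vs. previous gradient."""
        same_sign = np.sign(grad) == np.sign(prev_grad)
        # Shallow slope (same sign): grow the rate; steep slope (sign change): shrink it.
        rates = np.where(same_sign, rates * grow, rates * shrink)
        return weights - rates * grad, rates

    # Each weight keeps its own rate; the previous gradient is carried between iterations.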
Enhancements To Gradient Descent
Adaptive Backpropagation
Possible Problems:
Since the error is minimized for each weight separately, the overall error may increase
Solution:
Calculate the total output error after each adaptation; if it is greater than the previous error, reject that adaptation and recompute the learning rates
Enhancements To Gradient Descent
SuperSAB (Super Self-Adapting Backpropagation)
Combines the momentum and adaptive methods
Uses the adaptive method together with momentum as long as the sign of the gradient does not change
The two effects add, resulting in faster traversal of gradual slopes
When the sign of the gradient does change, the momentum cancels the drastic drop in the learning rate
This allows the search to roll up the other side of a minimum, possibly escaping local minima (see the sketch below)
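A very loose sketch of the combination described above (the full SuperSAB algorithm has additional bookkeeping; the factors and names here are assumptions, not from the slides):

    import numpy as np

    def supersab_step(weights, grad, prev_grad, rates, delta_prev,
                      grow=1.05, shrink=0.5, momentum=0.9):
        """Per-weight adaptive rates plus a momentum term on the weight change."""
        same_sign = np.sign(grad) == np.sign(prev_grad)
        rates = np.where(same_sign, rates * grow, rates * shrink)
        delta = -rates * grad + momentum * delta_prev   # momentum softens the rate drop on sign changes
        return weights + delta, rates, delta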
Enhancements To Gradient Descent
SuperSAB
Experiments show that SuperSAB converges faster than gradient descent
Overall this algorithm is less sensitive (and so is less likely to
get caught in local minima)
Other Ways To Minimize Error
Varying training data
Add noise to training data
Cycle through input classes
Randomly select from input classes
Randomly change value of input node (with low probability)
Retrain with expected inputs after initial training
E.g. Speech recognition
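An illustrative sketch of two of the ideas above (adding input noise and randomly sampling input classes); the data layout, noise scale, and function names are assumptions:

    import numpy as np

    rng = np.random.default_rng(0)

    def noisy(x, scale=0.05):
        """Add small Gaussian noise to a training input vector."""
        return x + rng.normal(0.0, scale, size=x.shape)

    def sample_training_example(examples_by_class):
        """Randomly pick an input class, then a (noisy) example from it."""
        label = rng.choice(list(examples_by_class))
        pool = examples_by_class[label]
        x = pool[rng.integers(len(pool))]
        return noisy(x), label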
Other Ways To Minimize Error
Adding and removing neurons from layers
Adding neurons speeds up learning but may cause a loss of generalization
Removing neurons has the opposite effect
Resources
Artificial Neural Networks, Backpropagation, J. Henseler
Artificial Intelligence: A Modern Approach, S. Russell
& P. Norvig
501 notes, J.R. Parker
www.dontveter.com/bpr/bpr.html
www.dse.doc.ic.ac.uk/~nd/surprise_96/journal/vl4/cs11/report.html