Fast Learning in Networks of
Locally-Tuned Processing Units
John Moody and Christian J. Darken
Yale Computer Science
Neural Computation 1, 281-294
Network Architecture
• Responses of neurons are “locally-tuned” or
“selective” for some part of the input space.
• Contains a single hidden layer of these
locally-tuned neurons.
• Hidden layer outputs are fed to a layer of
linear neurons, giving network output.
• For mathematical simplicity, we’ll assume
only one neuron in the linear output layer.
Network Architecture (2)
Biological Plausibility
• Cochlear stereocilia cells in human ear
exhibit locally-tuned response to frequency.
• Cells in visual cortex respond selectively to
stimulation that is both local in retinal
position and local in angle of orientation.
• Prof. Wang showed locally-tuned responses
to motion of particular speeds and
Mathematical Definitions
• A network of M locally-tuned units has
overall response function:
$$f(x) = \sum_{\alpha=1}^{M} A^{\alpha} R^{\alpha}(x)$$
$$R^{\alpha}(x) = R\left(\lVert x - x^{\alpha} \rVert / \sigma^{\alpha}\right)$$
• Here, $x$ is a real-valued vector in input
space, $R^{\alpha}$ is the response function of the
$\alpha$-th locally-tuned unit, and $R$ is a
radially-symmetric function with a single
maximum at its center which drops to zero at
large radii.
Mathematical Definitions (2)
• $x^{\alpha}$ and $\sigma^{\alpha}$ are the center and width in the
input space of the $\alpha$-th unit, and $A^{\alpha}$ is the
weight or amplitude of the $\alpha$-th unit.
• A simple $R$ is the unit normalized Gaussian
$$R^{\alpha}(x) = \exp\left(-\lVert x - x^{\alpha} \rVert^{2} / (\sigma^{\alpha})^{2}\right)$$
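The response function and the Gaussian unit defined above can be sketched in a few lines of Python. This is a minimal illustration only: the function names and the two-unit example network are hypothetical and do not come from the slides.

```python
import math

def gaussian_response(x, center, width):
    """R^alpha(x) = exp(-||x - x^alpha||^2 / (sigma^alpha)^2)."""
    sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-sq_dist / width ** 2)

def network_output(x, centers, widths, amplitudes):
    """f(x) = sum over alpha of A^alpha * R^alpha(x), one linear output."""
    return sum(
        a * gaussian_response(x, c, s)
        for c, s, a in zip(centers, widths, amplitudes)
    )

# Hypothetical two-unit network in a 2-D input space (values are made up).
centers = [(0.0, 0.0), (1.0, 1.0)]
widths = [1.0, 0.5]
amplitudes = [2.0, -1.0]
print(network_output((0.0, 0.0), centers, widths, amplitudes))
```

At the first center the first unit responds with exactly 1, while the second unit, two squared-distance units away with width 0.5, contributes only exp(-8) times its amplitude.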
Possible Training Methods
• Fully supervised training to find neuron
centers, widths, and amplitudes.
– Uses error gradient found by varying all
parameters (no restrictions on the parameters).
– In particular, widths can grow large, thereby
losing the local nature of the neurons.
– Compared with backpropagation, achieves
lower error, but like BP, very slow to train.
Possible Training Methods (2)
• Combination of supervised and
unsupervised learning, a better choice?
– Neuron centers and widths are determined
through unsupervised learning.
– Weights or amplitudes for hidden layer outputs
are determined through supervised training.
Unsupervised Learning
• Determination of neuron centers, how?
• k-means clustering
– Find a set of k neuron centers which represent a
local minimum of the total squared Euclidean
distance between the training vectors and their
nearest neuron centers.
• Learning Vector Quantization (LVQ)
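The k-means step can be sketched as follows. This is a batch (Lloyd-style) variant chosen for brevity, which is an assumption: the slides do not say which k-means variant is used. The function names, the `initial_centers` and `seed` parameters, and the sample points are all hypothetical.

```python
import random

def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def k_means(points, k, initial_centers=None, iterations=100, seed=0):
    """Batch k-means: returns k centers at a local minimum of the total
    squared Euclidean distance from each point to its nearest center."""
    if initial_centers is None:
        # Assumption: start from k distinct training vectors chosen at random.
        initial_centers = random.Random(seed).sample(points, k)
    centers = [tuple(c) for c in initial_centers]
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: squared_distance(p, centers[j]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster.
        new_centers = []
        for j, members in enumerate(clusters):
            if members:
                dim = len(members[0])
                new_centers.append(
                    tuple(sum(m[d] for m in members) / len(members) for d in range(dim))
                )
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:  # assignments stopped changing
            break
        centers = new_centers
    return centers

# Two well-separated groups of made-up 2-D training vectors.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
print(k_means(points, 2, initial_centers=[(0.0, 0.0), (0.0, 1.0)]))
```

Because the result is only a local minimum, different starting centers can give different final centers on less cleanly separated data.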
Unsupervised Learning (2)
• Determination of neuron widths, how?
• P nearest-neighbor heuristics
– Vary widths to achieve certain amount of
response overlap between each neuron and its P
nearest neighbors.
• Global first nearest-neighbor, P = 1
– Uses the average distance between each neuron
center and its nearest neighbor, taken over all
neurons, as the net's uniform width.
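A minimal sketch of the global first nearest-neighbor heuristic, assuming the "average" is a plain arithmetic mean of Euclidean distances. The function name and the example centers are hypothetical.

```python
import math

def global_nearest_neighbor_width(centers):
    """Mean distance from each center to its closest other center,
    used as one shared width for every unit."""
    nearest = []
    for i, c in enumerate(centers):
        nearest.append(
            min(math.dist(c, other) for j, other in enumerate(centers) if j != i)
        )
    return sum(nearest) / len(nearest)

# Made-up centers: nearest-neighbor distances are 3, 4 and 3.
centers = [(0.0, 0.0), (3.0, 4.0), (3.0, 0.0)]
print(global_nearest_neighbor_width(centers))
```

The sketch needs at least two centers and Python 3.8 or later for `math.dist`.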
Supervised Learning
• Determination of weights, how?
• Simple case for 1 linear output
– Use Widrow-Hoff learning rule.
• For a layer of linear outputs?
– Simply use Gradient Descent learning rule.
• Reduced to a linear optimization problem.
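The Widrow-Hoff (LMS) rule for a single linear output can be sketched as below, assuming Gaussian hidden units with one shared width and centers that are already fixed. The function names, learning rate, epoch count, and training pairs are illustrative assumptions, not values from the slides.

```python
import math

def hidden_responses(x, centers, width):
    """Gaussian responses of all hidden units for input x."""
    return [
        math.exp(-sum((xi - ci) ** 2 for xi, ci in zip(x, c)) / width ** 2)
        for c in centers
    ]

def train_lms(samples, centers, width, learning_rate=0.1, epochs=200):
    """Widrow-Hoff rule: A^alpha += rate * (target - f(x)) * R^alpha(x)."""
    amplitudes = [0.0] * len(centers)
    for _ in range(epochs):
        for x, target in samples:
            r = hidden_responses(x, centers, width)
            error = target - sum(a * ri for a, ri in zip(amplitudes, r))
            amplitudes = [
                a + learning_rate * error * ri for a, ri in zip(amplitudes, r)
            ]
    return amplitudes

# Made-up 1-D problem: two units, two training pairs.
centers = [(0.0,), (1.0,)]
samples = [((0.0,), 1.0), ((1.0,), -1.0)]
amplitudes = train_lms(samples, centers, width=0.5)
print(amplitudes)
```

Only the amplitudes change during this loop; the centers and width stay fixed, which is what makes the problem linear in the trained parameters.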
Advantages Over Backprop
• Training via a combination of linear
supervised and linear self-organizing
techniques is much faster than backprop.
• For a given input, only a small fraction of
neurons (those with nearby centers) will
give appreciably non-zero responses. Hence we
need not evaluate all neurons to get the overall
output, which reduces computation.
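The locality argument can be illustrated with a small sketch that lists which Gaussian units respond above a cutoff for a given input. The cutoff value, the function name, and the 100-unit grid of centers are assumptions made for illustration only.

```python
import math

def active_units(x, centers, width, threshold=1e-3):
    """Indices of units whose Gaussian response to x exceeds threshold."""
    active = []
    for index, c in enumerate(centers):
        sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
        if math.exp(-sq_dist / width ** 2) > threshold:
            active.append(index)
    return active

# Hypothetical layout: 100 units with centers at 0, 1, ..., 99 on a line.
centers = [(float(i),) for i in range(100)]
print(active_units((50.2,), centers, width=1.0))  # -> [48, 49, 50, 51, 52]
```

Here only 5 of the 100 units contribute noticeably. This sketch still scans every center to find them, so a real saving would need a spatial index or similar structure to locate the nearby centers directly.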
Advantages Over Backprop (2)
• Based on well-developed mathematical
theory (kernel theory) yielding statistical
• Computational simplicity since only one
layer is involved in supervised training.
• With centers and widths fixed, the output
weights have a guaranteed, globally optimal
solution via simple linear optimization.
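As a worked equation for the linear optimization step, assume a sum-of-squares error over N training pairs $(x_i, y_i)$ with the centers and widths held fixed. The symbols $N$, $y_i$, $\Phi$, and $a$ are introduced here for illustration and do not appear in the slides.

```latex
E(a) = \sum_{i=1}^{N} \Bigl( y_i - \sum_{\alpha=1}^{M} A^{\alpha} R^{\alpha}(x_i) \Bigr)^{2}
     = \lVert y - \Phi a \rVert^{2},
\qquad \Phi_{i\alpha} = R^{\alpha}(x_i), \quad a_{\alpha} = A^{\alpha}

\nabla_a E = -2\, \Phi^{\top} (y - \Phi a) = 0
\;\Longrightarrow\;
\Phi^{\top} \Phi \, a = \Phi^{\top} y
```

Since $E$ is a convex quadratic in $a$, any solution of these normal equations is a global minimum of the error with respect to the amplitudes.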
Project Proposal
• Currently debugging C++ RBF network
with n dimensional input and 1 linear output
– Uses k-means clustering, global first nearest
neighbor heuristic, and gradient descent.
– Experimentation with different training algs.
• Try to reproduce results for RBF neural nets
performing face-recognition.