#### Transcript: Fast Learning in Networks of Locally-Tuned Processing Units

##### Fast Learning in Networks of Locally-Tuned Processing Units

John Moody and Christian J. Darken, Yale Computer Science.
Neural Computation 1, 281-294 (1989).

##### Network Architecture

- Responses of neurons are "locally-tuned" or "selective" for some part of the input space.
- Contains a single hidden layer of these locally-tuned neurons.
- Hidden layer outputs are fed to a layer of linear neurons, giving the network output.
- For mathematical simplicity, we'll assume only one neuron in the linear output layer.

##### Network Architecture (2)

##### Biological Plausibility

- Cochlear stereocilia cells in the human ear exhibit locally-tuned responses to frequency.
- Cells in the visual cortex respond selectively to stimulation that is both local in retinal position and local in angle of orientation.
- Prof. Wang showed locally-tuned responses to motion of particular speeds and orientations.

##### Mathematical Definitions

- A network of $M$ locally-tuned units has the overall response function

  $$f(\vec{x}) = \sum_{\alpha=1}^{M} A_\alpha R_\alpha(\vec{x}), \qquad R_\alpha(\vec{x}) = R\!\left(\frac{\|\vec{x} - \vec{x}_\alpha\|}{\sigma_\alpha}\right)$$

- Here, $\vec{x}$ is a real-valued vector in the input space, $R_\alpha$ is the response function of the $\alpha$-th locally-tuned unit, and $R$ is a radially symmetric function with a single maximum at its center which drops to zero at large radii.

##### Mathematical Definitions (2)

- $\vec{x}_\alpha$ and $\sigma_\alpha$ are the center and width in the input space of the $\alpha$-th unit, and $A_\alpha$ is the weight or amplitude of the $\alpha$-th unit.
- A simple choice for $R_\alpha$ is the unit-normalized Gaussian

  $$R_\alpha(\vec{x}) = e^{-\|\vec{x} - \vec{x}_\alpha\|^2 / \sigma_\alpha^2}$$

##### Possible Training Methods

- Fully supervised training to find neuron centers, widths, and amplitudes.
  - Uses the error gradient found by varying all parameters (no restrictions on the parameters).
  - In particular, widths can grow large, thereby losing the local nature of the neurons.
  - Compared with backpropagation, achieves lower error, but like BP, is very slow to train.

##### Possible Training Methods (2)

- Combination of supervised and unsupervised learning, a better choice?
  - Neuron centers and widths are determined through unsupervised learning.
  - Weights or amplitudes for the hidden layer outputs are determined through supervised training.

##### Unsupervised Learning

- Determination of neuron centers, how?
- k-means clustering
  - Find a set of k neuron centers which represents a local minimum of the total squared Euclidean distance between the training vectors and the neuron centers.
- Learning Vector Quantization (LVQ)

##### Unsupervised Learning (2)

- Determination of neuron widths, how?
- P nearest-neighbor heuristics
  - Vary widths to achieve a certain amount of response overlap between each neuron and its P nearest neighbors.
- Global first nearest-neighbor, P = 1
  - Uses the global average distance between each neuron and its nearest neighbor as the net's uniform width.

##### Supervised Learning

- Determination of weights, how?
- Simple case of a single linear output
  - Use the Widrow-Hoff learning rule.
- For a layer of linear outputs?
  - Simply use the gradient descent learning rule.
- Reduced to a linear optimization problem (a sketch follows below).
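Since only the linear output layer is trained supervised in this hybrid scheme, the per-example update is just the Widrow-Hoff (delta) rule applied to the amplitudes. Below is a minimal C++ sketch of the Gaussian unit response, the network output $f(\vec{x})$, and one such update; the type and function names (`RbfUnit`, `RbfNet`, `respond`, `trainStep`) are illustrative assumptions, not code from the paper or the project.

```cpp
// Sketch of the forward pass and Widrow-Hoff weight update for an RBF
// network with one linear output, following the formulas above.
// Names are illustrative, not the authors' code.
#include <cmath>
#include <cstddef>
#include <vector>

struct RbfUnit {
    std::vector<double> center;  // x_alpha
    double width;                // sigma_alpha
};

// Unit-normalized Gaussian: R_alpha(x) = exp(-||x - x_alpha||^2 / sigma_alpha^2)
double respond(const RbfUnit& u, const std::vector<double>& x) {
    double d2 = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        double diff = x[i] - u.center[i];
        d2 += diff * diff;
    }
    return std::exp(-d2 / (u.width * u.width));
}

struct RbfNet {
    std::vector<RbfUnit> units;   // hidden layer; centers/widths fixed by the unsupervised stage
    std::vector<double> weights;  // amplitudes A_alpha of the single linear output

    // f(x) = sum_alpha A_alpha * R_alpha(x)
    double output(const std::vector<double>& x) const {
        double f = 0.0;
        for (std::size_t a = 0; a < units.size(); ++a)
            f += weights[a] * respond(units[a], x);
        return f;
    }

    // One Widrow-Hoff (LMS) step on the output weights for a single
    // training pair (x, target); only the linear layer is adapted.
    void trainStep(const std::vector<double>& x, double target, double rate) {
        std::vector<double> r(units.size());
        double f = 0.0;
        for (std::size_t a = 0; a < units.size(); ++a) {
            r[a] = respond(units[a], x);
            f += weights[a] * r[a];
        }
        double err = target - f;
        for (std::size_t a = 0; a < units.size(); ++a)
            weights[a] += rate * err * r[a];  // delta rule: A_alpha += eta * err * R_alpha(x)
    }
};
```

Holding the centers and widths fixed here mirrors the hybrid scheme: the supervised stage adapts only the amplitudes, which is why the problem stays linear.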
##### Advantages Over Backprop

- Training via a combination of linear supervised and linear self-organizing techniques is much faster than backprop.
- For a given input, only a small fraction of neurons (those with nearby centers) give appreciably non-zero responses, so we do not need to evaluate every neuron to obtain the overall output. This improves performance.

##### Advantages Over Backprop (2)

- Based on well-developed mathematical theory (kernel theory), yielding statistical robustness.
- Computational simplicity, since only one layer is involved in supervised training.
- Provides a guaranteed, globally optimal solution via simple linear optimization.

##### Project Proposal

- Currently debugging a C++ RBF network with n-dimensional input and 1 linear output neuron.
  - Uses k-means clustering, the global first nearest-neighbor heuristic, and gradient descent (a sketch of the unsupervised stage appears below).
  - Experimentation with different training algorithms.
- Try to reproduce results for RBF neural nets performing face recognition.
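For the unsupervised stage named in the proposal, the following is a rough C++ sketch of a Lloyd-style k-means for the centers and the global first nearest-neighbor (P = 1) width heuristic. The helper names (`kMeans`, `globalFirstNnWidth`, `dist2`), the fixed iteration count, and seeding from the first k training vectors are assumptions about one possible implementation, not the actual project code.

```cpp
// k-means center placement and global first nearest-neighbor width,
// as one possible realization of the unsupervised stage described above.
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

using Vec = std::vector<double>;

static double dist2(const Vec& a, const Vec& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        double d = a[i] - b[i];
        s += d * d;
    }
    return s;
}

// Lloyd-style k-means: returns k centers that locally minimize the total
// squared Euclidean distance from the training vectors to their nearest center.
// Assumes data.size() >= k.
std::vector<Vec> kMeans(const std::vector<Vec>& data, std::size_t k, int iters) {
    std::vector<Vec> centers(data.begin(), data.begin() + k);  // seed with first k points
    for (int it = 0; it < iters; ++it) {
        std::vector<Vec> sums(k, Vec(data[0].size(), 0.0));
        std::vector<std::size_t> counts(k, 0);
        for (const Vec& x : data) {
            std::size_t best = 0;
            double bestD = std::numeric_limits<double>::max();
            for (std::size_t c = 0; c < k; ++c) {
                double d = dist2(x, centers[c]);
                if (d < bestD) { bestD = d; best = c; }
            }
            for (std::size_t i = 0; i < x.size(); ++i) sums[best][i] += x[i];
            ++counts[best];
        }
        for (std::size_t c = 0; c < k; ++c)
            if (counts[c] > 0)
                for (std::size_t i = 0; i < sums[c].size(); ++i)
                    centers[c][i] = sums[c][i] / counts[c];
    }
    return centers;
}

// Global first nearest-neighbor heuristic (P = 1): the net's uniform width is
// the average distance from each center to its nearest other center.
double globalFirstNnWidth(const std::vector<Vec>& centers) {
    double total = 0.0;
    for (std::size_t i = 0; i < centers.size(); ++i) {
        double nearest = std::numeric_limits<double>::max();
        for (std::size_t j = 0; j < centers.size(); ++j)
            if (j != i) nearest = std::min(nearest, dist2(centers[i], centers[j]));
        total += std::sqrt(nearest);
    }
    return total / centers.size();
}
```

The returned width would be assigned uniformly to every hidden unit, and the resulting centers and width then feed the supervised stage sketched after the Supervised Learning slide above.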