#### Transcript Artificial Neural Networks - University College London

Artificial Neural Networks: An Introduction

Outline
• Introduction
• Biological and artificial neurons
• Perceptrons (problems)
• Backpropagation network
• Training
• Other ANNs (examples in HEP)

Introduction - What are ANNs?
• Artificial Neural Networks:
– data analysis tools (computational modelling tools)
– model complex real-world problems
– structures comprised of densely interconnected simple processing elements
– each element is linked to its neighbours with varying strengths
– learning is accomplished by adjusting these strengths so that the network outputs appropriate results
– learn from experience (rather than being explicitly programmed with rules)
– inspired by biological neural networks (the idea of ANNs is not to replicate the operation of biological systems, but to use what is known of their functionality to solve complex problems)
• Information processing characteristics:
– nonlinearity (allows better fit to data)
– fault and failure tolerance (for uncertain data and measurement errors)
– learning and adaptivity (allows system to update its internal structure in response to a changing environment)
– generalization (enables application of model to unlearned data)
• Generally ANNs outperform other computational tools in solving a variety of problems:
– Pattern classification: categorizes a set of input patterns in terms of different features
– Clustering: clusters formed by exploring similarities between input patterns based on their inter-correlations
– Function approximation: training the ANN to approximate the underlying rules relating the inputs to the outputs

Biological Neuron
• 3 major functional units:
– Dendrites
– Cell body
– Axon
• Synapse
• Amount of signal passing through a neuron depends on:
– Intensity of signal from feeding neurons
– Their synaptic strengths
– Threshold of the receiving neuron
• Hebb rule (plays key part in learning): a synapse which repeatedly triggers the activation of a postsynaptic neuron will grow in strength; others will gradually weaken.
• Learn by adjusting magnitudes of synapses' strengths
[Figure: inputs x1, x2, ..., xn with weights w1, w2, ..., wn feeding a neuron with output y = g(ξ)]

Artificial Neurons (basic computational entities of an ANN)
• Analogy between artificial and biological neurons (connection weights represent synapses)
• In 1958 Rosenblatt introduced the mechanics (perceptron)
• Input to output: $y = g\left(\sum_i w_i x_i\right)$
• Only when the sum exceeds the threshold limit will the neuron fire
• Weights can enhance or inhibit
• Collective behaviour of neurons is what's interesting for intelligent data processing
[Figure: artificial neuron with inputs x1, x2, x3, weights w1, w2, w3 and output y = g(∑ w·x)]

Perceptrons
• Can be trained on a set of examples using a special learning rule (process)
• Weights are changed in proportion to the difference (error) between target output and perceptron solution for each example.
• Minimize the summed square error function with respect to the weights: $E = \frac{1}{2}\sum_p \sum_i \left(o_i^{(p)} - t_i^{(p)}\right)^2$
• Error is a function of all the weights and forms an irregular multidimensional hypersurface with many peaks, saddle points and minima.
• Error is minimized by finding the set of weights that corresponds to the global minimum.
• Done with the gradient descent method (weights incrementally updated in proportion to $\partial E/\partial w_{ij}$)
• Updating reads: $w_{ij}(t+1) = w_{ij}(t) + \Delta w_{ij}$, with $\Delta w_{ij} = -\eta\,\partial E/\partial w_{ij}$
• Aim is to produce a true mapping for all patterns
[Figure: perceptron with inputs xj, weights wij and outputs oi; response g(ξ) against ξ, with threshold marked]

Summary of Learning for Perceptron
1. Initialize $w_{ij}$ with random values.
2. Repeat until $w_{ij}(t+1) \approx w_{ij}(t)$:
• Pick pattern p from training set.
• Feed input to network and calculate the output.
• Update the weights according to $w_{ij}(t+1) = w_{ij}(t) + \Delta w_{ij}$, where $\Delta w_{ij} = -\eta\,\partial E/\partial w_{ij}$.
• When no change (within some accuracy) occurs, the weights are frozen and the network is ready to use on data it has never seen.

Example

| x1 | x2 | t (AND) | t (OR) |
|----|----|---------|--------|
| 1  | 1  | 1       | 1      |
| 1  | 0  | 0       | 1      |
| 0  | 1  | 0       | 1      |
| 0  | 0  | 0       | 0      |

• Perceptron learns these rules easily (i.e. sets appropriate weights and threshold), for example $w = (w_0, w_1, w_2) = (-1.5, 1.0, 1.0)$ for AND and $(-0.5, 1.0, 1.0)$ for OR, where $w_0$ corresponds to the threshold term.

Problems
• Perceptrons can only perform accurately with linearly separable classes (a linear hyperplane can place one class of objects on one side of the plane and the other class on the other)
• ANN research put on hold for 20 years.
• Solution: additional (hidden) layers of neurons, MLP architecture
• Able to solve non-linear classification problems
[Figure: classes of objects in the (x1, x2) plane]

MLPs
• Learning procedure is an extension of the simple perceptron algorithm
• Response function: $o_i = g\left(\sum_j w_{ij}\, g\left(\sum_k w_{jk} x_k\right)\right)$, which is non-linear, so the network is able to perform non-linear mappings
• (Theory tells us that a neural network with at least 1 hidden layer can approximate any continuous function to arbitrary accuracy, given enough hidden nodes)
• Vast number of ANN types exist
[Figure: MLP with inputs xk, hidden nodes hj and outputs oi, connected by weights wjk and wij]

Backpropagation ANNs
• Most widely used type of network
• Feedforward
• Supervised (learns mapping from one data space to another using examples)
• Error propagated backwards
• Versatile. Used for data modelling, classification, forecasting, data and image compression and pattern recognition.
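
Going back to the perceptron learning summary and the AND/OR example above, the loop can be sketched in a few lines of Python. This is a minimal illustration, not the slides' own implementation. The names `step`, `output` and `train_perceptron` are hypothetical. The weights start at zero rather than at random values and the learning rate is set to 1.0, both assumptions made so that the run is reproducible. The update used is the classic perceptron rule, in which each weight moves in proportion to the error (t - o) times its input, standing in for the gradient step.

```python
def step(xi):
    """Threshold response g(xi): the neuron fires only when the sum exceeds 0."""
    return 1 if xi > 0 else 0


def output(w, x):
    """y = g(w0 + w1*x1 + w2*x2), where w[0] plays the role of the threshold term."""
    return step(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))


def train_perceptron(patterns, eta=1.0, max_epochs=100):
    """Repeat over the training set until a full pass changes no weight."""
    w = [0.0, 0.0, 0.0]  # assumption: zero start instead of random values
    for _ in range(max_epochs):
        changed = False
        for x, t in patterns:
            err = t - output(w, x)  # difference between target and output
            if err != 0:
                w[0] += eta * err  # threshold term sees a constant input of 1
                for i, xi in enumerate(x):
                    w[i + 1] += eta * err * xi
                changed = True
        if not changed:  # weights are frozen: learning is finished
            break
    return w


# Truth tables from the example: ((x1, x2), t)
AND = [((1, 1), 1), ((1, 0), 0), ((0, 1), 0), ((0, 0), 0)]
OR = [((1, 1), 1), ((1, 0), 1), ((0, 1), 1), ((0, 0), 0)]

w_and = train_perceptron(AND)
w_or = train_perceptron(OR)
print("AND weights:", w_and)
print("OR weights:", w_or)
```

The learned weights need not equal (-1.5, 1.0, 1.0) or (-0.5, 1.0, 1.0); any weight vector that puts the two classes on opposite sides of the line is a valid solution.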

BP Learning Algorithm
• Like the perceptron, uses gradient descent to minimize error (generalized to the case with hidden layers)
• Each iteration consists of two sweeps: a forward sweep to calculate the output and a backward sweep to propagate the error
• To minimize the error we need $\partial E/\partial w_{ij}$ but also $\partial E/\partial w_{jk}$ (which we get using the chain rule)
• Training of an MLP using BP can be thought of as a walk in weight space along an energy surface, trying to find the global minimum while avoiding local minima
• Unlike for the perceptron, there is no guarantee that the global minimum will be reached, but in most cases the energy landscape is smooth

Summary of BP learning algorithm
1. Initialize $w_{ij}$ and $w_{jk}$ with random values.
2. Repeat until $w_{ij}$ and $w_{jk}$ have converged or the desired performance level is reached:
• Pick pattern p from training set.
• Present input and calculate the output.
• Update weights according to:
$w_{ij}(t+1) = w_{ij}(t) + \Delta w_{ij}$
$w_{jk}(t+1) = w_{jk}(t) + \Delta w_{jk}$
where $\Delta w = -\eta\,\partial E/\partial w$.
(…etc…for extra hidden layers).

Training
• Generalization: the network's performance on a set of test patterns it has never seen before (lower than on the training set).
• Training set used to let the ANN capture features in data or mapping.
• Initial large drop in error is due to learning, but subsequent slow reduction is due to:
1. Network memorization (too many training cycles used).
2. Overfitting (too many hidden nodes).
(network learns individual training examples and loses generalization ability)
[Figure: error (e.g. SSE) against number of hidden nodes or training cycles, with training and testing curves; the optimum network is marked]

Other Popular ANNs
Some applications may be solved using a variety of ANN types, some only via a specific type (problem logistics).
• Hopfield networks; optimization. Presented with an incomplete/noisy pattern, the network responds by retrieving the internally stored pattern it most closely resembles.
• Kohonen networks; (self-organizing) Trained in an unsupervised manner to form clusters in the data. Used for pattern classification and data compression.
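
Returning to the BP learning algorithm summarized above, the two sweeps and the weight update can be sketched as follows. This is a hedged illustration under several assumptions that the slides do not fix: a sigmoid response function g, one hidden layer, a constant input of 1 standing in for each threshold term, and a fixed number of update steps in place of a convergence test. All function names (`forward`, `error`, `gradients`, `train`) are hypothetical.

```python
import math
import random


def g(xi):
    """Sigmoid response function (an assumption; the slides leave g unspecified)."""
    return 1.0 / (1.0 + math.exp(-xi))


def forward(w_jk, w_ij, x):
    """Forward sweep: h_j = g(sum_k w_jk x_k), then o_i = g(sum_j w_ij h_j)."""
    xb = list(x) + [1.0]  # constant input acts as the threshold term
    hb = [g(sum(w * v for w, v in zip(row, xb))) for row in w_jk] + [1.0]
    o = [g(sum(w * v for w, v in zip(row, hb))) for row in w_ij]
    return xb, hb, o


def error(w_jk, w_ij, x, t):
    """E = 1/2 sum_i (o_i - t_i)^2 for a single pattern."""
    _, _, o = forward(w_jk, w_ij, x)
    return 0.5 * sum((oi - ti) ** 2 for oi, ti in zip(o, t))


def gradients(w_jk, w_ij, x, t):
    """Backward sweep: the chain rule gives dE/dw_ij first, then dE/dw_jk."""
    xb, hb, o = forward(w_jk, w_ij, x)
    # Output deltas: (o_i - t_i) * g'(xi_i), using g' = o (1 - o) for the sigmoid.
    d_out = [(oi - ti) * oi * (1.0 - oi) for oi, ti in zip(o, t)]
    dE_ij = [[d * v for v in hb] for d in d_out]
    # Hidden deltas: output deltas propagated backwards through w_ij.
    d_hid = [
        sum(d_out[i] * w_ij[i][j] for i in range(len(w_ij))) * hb[j] * (1.0 - hb[j])
        for j in range(len(w_jk))
    ]
    dE_jk = [[d * v for v in xb] for d in d_hid]
    return dE_jk, dE_ij


def train(patterns, n_hidden=2, eta=0.5, steps=5000, seed=0):
    """Online gradient descent: w(t+1) = w(t) + dw, with dw = -eta * dE/dw."""
    rng = random.Random(seed)
    n_in, n_out = len(patterns[0][0]), len(patterns[0][1])
    w_jk = [[rng.uniform(-1.0, 1.0) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w_ij = [[rng.uniform(-1.0, 1.0) for _ in range(n_hidden + 1)] for _ in range(n_out)]
    for _ in range(steps):  # simplification: fixed step count, no convergence test
        x, t = rng.choice(patterns)  # pick pattern p from the training set
        dE_jk, dE_ij = gradients(w_jk, w_ij, x, t)
        for weights, grads in ((w_jk, dE_jk), (w_ij, dE_ij)):
            for row, grad_row in zip(weights, grads):
                for n, grad in enumerate(grad_row):
                    row[n] += -eta * grad
    return w_jk, w_ij


# OR truth table as (inputs, targets) pairs, used here as a tiny training set.
OR = [((1.0, 1.0), (1.0,)), ((1.0, 0.0), (1.0,)), ((0.0, 1.0), (1.0,)), ((0.0, 0.0), (0.0,))]
```

Calling `train(OR)` returns the two weight matrices, and `forward` can then be applied to new inputs. Comparing `gradients` against a finite-difference estimate of `error` is a quick way to check the backward sweep.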

HEP Applications
ANNs applied from off-line data analysis to low-level experimental triggers
• Signal-to-background ratios improved (BP…), e.g. in flavour tagging, Higgs detection
• Feature recognition problems in track finding (feed-back)
• Function approximation tasks (feed-back), e.g. reconstructing the mass of a decayed particle from calorimeter information

• http://www.doc.ic.ac.uk/~nd/surprise_96.journal/vol4/cs11/report.html
• http://www.cs.stir.ac.uk/~lss/NNIntro/InvSlides.html
• Carsten Peterson and Thorsteinn Rognvaldsson, An Introduction to Artificial Neural Networks, LU TP 91-23, September 1991 (Lectures given at the 1991 CERN School of Computing, Sweden)