Artificial Neural Networks - University College London


Artificial Neural Networks: An Introduction
Outline
• Introduction
• Biological and artificial neurons
• Perceptrons (problems)
• Backpropagation network
• Training
• Other ANNs (examples in HEP)
Introduction - What are ANNs?
• Artificial Neural Networks:
– data analysis / computational modelling tools
– model complex real-world problems
– structures comprised of densely interconnected simple processing elements
– each element is linked to its neighbours with varying strengths
– learning is accomplished by adjusting these strengths so that the network outputs appropriate results
– learn from experience (rather than being explicitly programmed with rules)
– inspired by biological neural networks (the idea is not to replicate the operation of biological systems, but to use what is known of their functionality to solve complex problems)
• Information processing characteristics:
– nonlinearity (allows better fit to data)
– fault and failure tolerance (for uncertain data and measurement errors)
– learning and adaptivity (allows system to update its internal structure in response to changing environment)
– generalization (enables application of model to unlearned data)
• Generally ANNs outperform other computational tools in solving a variety of problems:
– Pattern classification; categorizes set of input patterns in terms of different features
– Clustering; clusters formed by exploring similarities between input patterns based on their inter-correlations
– Function approximation; training ANN to approx. the underlying rules relating the inputs to the outputs
Biological Neuron
• 3 major functional units
• Dendrites
• Cell body
• Axon
• Synapse
• Amount of signal passing through a neuron depends on:
• Intensity of signal from feeding neurons
• Their synaptic strengths
• Threshold of the receiving neuron
• Hebb rule (plays key part in learning)
• (A synapse which repeatedly triggers the activation of a postsynaptic neuron will grow in strength; others will gradually weaken.)
• Learn by adjusting magnitudes of synapses’
strengths
[Figure: neuron model with inputs x1, x2, …, xn, weights w1, w2, …, wn and output y = g(ξ)]
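The Hebb rule above can be written as a one-line weight update. A minimal sketch, where the learning rate eta and the small decay term (standing in for "others will gradually weaken") are assumptions of this example:

```python
import numpy as np

def hebbian_update(w, x, y, eta=0.1, decay=0.001):
    """One Hebbian step: synapses whose input activity coincides with an active
    output y are strengthened; a small decay weakens the rest over time."""
    w = w + eta * x * y      # grow where pre- and post-synaptic activity coincide
    return w - decay * w     # gradual weakening (illustrative stand-in for "others weaken")
```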
Artificial Neurons
(basic computational entities of an ANN)
• Analogy between artificial and
biological (connection weights
represent synapses)
• In 1958 Rosenblatt introduced
mechanics (perceptron)
• Input to output: y = g(∑i wi xi)
• Only when sum exceeds the
threshold limit will neuron fire
• Weights can enhance or inhibit
• Collective behaviour of neurons is
what’s interesting for intelligent
data processing
[Figure: artificial neuron with inputs x1, x2, x3, weights w1, w2, w3, weighted sum ∑w·x and activation g giving output y]
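The mapping y = g(∑i wi xi) can be written out directly. A minimal sketch, where the step function standing in for the threshold behaviour and the use of NumPy are assumptions of this example:

```python
import numpy as np

def step(xi):
    """Threshold activation g: fire (output 1) only when the summed input exceeds 0."""
    return np.where(xi > 0.0, 1.0, 0.0)

def neuron_output(w, x):
    """y = g(sum_i w_i * x_i): weighted sum of the inputs passed through g."""
    return step(np.dot(w, x))

print(neuron_output(np.array([0.5, -0.2, 0.8]), np.array([1.0, 1.0, 1.0])))  # sum = 1.1 > 0, so prints 1.0
```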
Perceptrons
• Can be trained on a set of examples using a
special learning rule (process)
• Weights are changed in proportion to the
difference (error) between target output and
perceptron solution for each example.
• Minimize summed square error function:
E = ½ ∑p ∑i (oi(p) − ti(p))²
with respect to the weights.
[Figure: single-layer perceptron with inputs xj, weights wij and outputs oi]
• Error is a function of all the weights and forms an irregular multidimensional error surface (hypersurface) with many peaks, saddle points and minima.
• Error minimized by finding set of weights that
correspond to global minimum.
• Done with gradient descent method – (weights
incrementally updated in proportion to δE/δwij)
• Updating reads: wij(t + 1) = wij(t) + Δwij
• Aim is to produce a true mapping for all patterns
[Figure: threshold activation function g(ξ)]
Summary of Learning for Perceptron
1. Initialize wij with random values.
2. Repeat until wij(t + 1) ≈ wij(t):
• Pick pattern p from training set.
• Feed input to network and calculate the output.
• Update the weights according to
wij(t + 1) = wij(t) + Δwij
where Δwij = -η δE/δwij.
• When no change (within some accuracy) occurs, the weights are frozen and the network is ready to use on data it has never seen.
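A minimal sketch of this loop for a single linear output unit, so that the per-pattern gradient δE/δwj is simply (o − t) xj; the learning rate, stopping tolerance and bias-as-weight convention are assumptions of this example:

```python
import numpy as np

def train_perceptron(X, t, eta=0.1, tol=1e-4, max_epochs=1000):
    """Gradient descent on the summed square error E = 1/2 * sum_p (o(p) - t(p))^2
    for a single linear output unit."""
    X = np.hstack([np.ones((len(X), 1)), np.asarray(X, dtype=float)])  # bias input x0 = 1 (threshold term w0)
    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.5, size=X.shape[1])     # 1. initialize weights with random values
    for _ in range(max_epochs):                    # 2. repeat until the weights stop changing
        w_old = w.copy()
        for x, target in zip(X, np.asarray(t, dtype=float)):
            o = np.dot(w, x)                       # feed input to the network, calculate output
            w += -eta * (o - target) * x           # w(t+1) = w(t) + Δw, with Δw = -η δE/δw = -η (o - t) x
        if np.max(np.abs(w - w_old)) < tol:        # no change within some accuracy: weights frozen
            break
    return w
```

Thresholding the trained output at 0.5 should then reproduce simple Boolean targets such as the AND and OR examples on the next slide.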
Example
AND              OR
x1 x2 t          x1 x2 t
1  1  1          1  1  1
1  0  0          1  0  1
0  1  0          0  1  1
0  0  0          0  0  0
• Perceptron learns these rules easily
(ie sets appropriate weights and threshold)
(to w=(w0,w1,w2) = (-1.5,1.0,1.0) and (-0.5,1.0,1.0) where w0 corresponds
to the threshold term)
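The quoted weight vectors can be checked directly. A short sketch, assuming the convention that a constant input x0 = 1 multiplies the threshold weight w0 and that the neuron fires when the weighted sum is positive:

```python
import numpy as np

def fires(w, x1, x2):
    """Neuron output for weights w = (w0, w1, w2) with bias input x0 = 1."""
    return int(np.dot(w, [1.0, x1, x2]) > 0.0)

w_and, w_or = [-1.5, 1.0, 1.0], [-0.5, 1.0, 1.0]
for x1, x2 in [(1, 1), (1, 0), (0, 1), (0, 0)]:
    print(x1, x2, fires(w_and, x1, x2), fires(w_or, x1, x2))
# reproduces the AND and OR truth-table targets shown above
```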
Problems
• Perceptrons can only perform accurately with linearly separable classes (a linear hyperplane can place one class of objects on one side of the plane and the other class on the other)
• ANN research put on hold for 20 yrs.
• Solution: additional (hidden) layers of neurons, MLP architecture
• Able to solve non-linear classification problems
[Figure: linearly separable vs. non-separable classes in the (x1, x2) plane]
MLPs
• Learning procedure is extension of simple perceptron algorithm
• Response function:
oi = g(∑j wij g(∑k wjk xk))
which is non-linear, so the network is able to perform non-linear mappings
• (Theory tells us that a neural network with at least 1 hidden layer can represent any function)
• Vast number of ANN types exist
[Figure: MLP with inputs xk, hidden units hj, weights wjk and wij, and outputs oi]
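A sketch of this response function for one hidden layer; the sigmoid activation, the layer sizes and the random weights are assumptions of this example:

```python
import numpy as np

def sigmoid(xi):
    return 1.0 / (1.0 + np.exp(-xi))

def mlp_forward(W_jk, W_ij, x):
    """o_i = g( sum_j w_ij * g( sum_k w_jk * x_k ) ) for one hidden layer."""
    h = sigmoid(W_jk @ x)   # hidden activations h_j
    return sigmoid(W_ij @ h)  # outputs o_i

# example shapes: 3 inputs x_k, 4 hidden units h_j, 2 outputs o_i
rng = np.random.default_rng(0)
print(mlp_forward(rng.normal(size=(4, 3)), rng.normal(size=(2, 4)), np.array([0.2, -1.0, 0.5])))
```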
Backpropagation ANNs
• Most widely used type of network
• Feedforward
• Supervised (learns mapping from one data
space to another using examples)
• Error propagated backwards
• Versatile. Used for data modelling,
classification, forecasting, data and image
compression and pattern recognition.
BP Learning Algorithm
• Like Perceptron, uses gradient
descent to minimize error (generalized
to case with hidden layers)
• Each iteration constitutes two sweeps: a forward pass to compute the outputs and a backward pass to propagate the errors
• To minimize Error we need δE/δwij but
also need δE/δwjk (which we get
using the chain rule)
• Training of MLP using BP can be
thought of as a walk in weight space
along an energy surface, trying to
find global minimum and avoiding
local minima
• Unlike for the Perceptron, there is no guarantee that the global minimum will be reached, but in most cases the energy landscape is smooth
Summary of BP learning algorithm
1. Initialize wij and wjk with random values.
2. Repeat until wij and wjk have converged or the
desired performance level is reached:
• Pick pattern p from training set.
• Present input and calculate the output.
• Update weights according to:
wij(t + 1) = wij(t) + Δwij
wjk(t + 1) = wjk(t) + Δwjk
where Δw = -η δE/δw
(…etc…for extra hidden layers).
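A minimal sketch of this algorithm for one hidden layer, trained on XOR (a problem a single perceptron cannot solve); the network size, sigmoid activation, learning rate and epoch count are assumptions of this example:

```python
import numpy as np

def sigmoid(xi):
    return 1.0 / (1.0 + np.exp(-xi))

# XOR patterns: not linearly separable, so a hidden layer is required
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(1)
n_hidden = 3
W_jk = rng.normal(size=(n_hidden, 3))        # hidden weights (inputs x_k plus bias)
W_ij = rng.normal(size=(1, n_hidden + 1))    # output weights (hidden h_j plus bias)
eta = 0.5

for epoch in range(20000):                   # repeat until the desired performance is reached
    for x, t in zip(X, T):
        x_b = np.append(x, 1.0)              # forward sweep
        h = sigmoid(W_jk @ x_b)
        h_b = np.append(h, 1.0)
        o = sigmoid(W_ij @ h_b)
        # backward sweep: error terms from the chain rule, E = 1/2 (o - t)^2
        d_o = (o - t) * o * (1.0 - o)
        d_h = (W_ij[:, :n_hidden].T @ d_o) * h * (1.0 - h)
        W_ij += -eta * np.outer(d_o, h_b)    # Δw_ij = -η δE/δw_ij
        W_jk += -eta * np.outer(d_h, x_b)    # Δw_jk = -η δE/δw_jk

for x in X:
    h_b = np.append(sigmoid(W_jk @ np.append(x, 1.0)), 1.0)
    print(x, np.round(sigmoid(W_ij @ h_b), 2))   # outputs should approach the XOR targets 0, 1, 1, 0
```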
Training
• Generalization; network's performance on a set of test patterns it has never seen before (lower than on training set).
• Training set used to let ANN capture features in data or mapping.
• Initial large drop in error is due to learning, but subsequent slow reduction is due to:
1. Network memorization (too many training cycles used).
2. Overfitting (too many hidden nodes).
(network learns individual training examples and loses generalization ability)
[Figure: training and testing error (e.g. SSE) vs. number of hidden nodes or training cycles; the optimum network lies where the testing error is lowest]
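One way to act on this picture is early stopping: train while monitoring the error on unseen test patterns and keep the weights from the point where that error was lowest. A schematic sketch, where train_epoch and test_error are caller-supplied functions (for instance one BP sweep over the training set and the SSE on the test set), and the patience parameter is an assumption of this example:

```python
import copy

def train_with_early_stopping(weights, train_epoch, test_error, max_epochs=1000, patience=20):
    """Generic early-stopping loop: keep the weights from the epoch where the
    error on unseen (test) patterns was lowest, and stop once it stops improving."""
    best_err, best_w, since_best = float("inf"), copy.deepcopy(weights), 0
    for _ in range(max_epochs):
        weights = train_epoch(weights)           # one training sweep (e.g. the BP loop above)
        err = test_error(weights)                # e.g. SSE on the test patterns
        if err < best_err:
            best_err, best_w, since_best = err, copy.deepcopy(weights), 0
        else:
            since_best += 1
        if since_best >= patience:               # test error no longer falling: memorization / overfitting
            break
    return best_w
```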
Other Popular ANNs
Some applications may be solved using a variety of ANN types, others only via a specific type (depending on the logistics of the problem).
• Hopfield networks; optimization.
Presented with an incomplete/noisy pattern, the network responds by retrieving the internally stored pattern it most closely resembles (a toy sketch follows below).
• Kohonen networks; (self-organizing)
Trained in an unsupervised manner to form clusters in the data.
Used for pattern classification and data compression.
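A toy sketch of the Hopfield retrieval described above, with Hebbian (outer-product) weights; the stored patterns, the noisy probe and the synchronous update (real Hopfield networks usually update units asynchronously) are assumptions of this example:

```python
import numpy as np

# Store two ±1 patterns, then retrieve from a corrupted version by iterating sign(W @ s).
patterns = np.array([[1, 1, 1, -1, -1, -1],
                     [1, -1, 1, -1, 1, -1]], dtype=float)
W = sum(np.outer(p, p) for p in patterns) / patterns.shape[1]  # Hebbian weight matrix
np.fill_diagonal(W, 0.0)                      # no self-connections

s = np.array([1, 1, -1, -1, -1, -1], dtype=float)   # noisy version of the first stored pattern
for _ in range(5):                            # iterate until the state settles
    s = np.where(W @ s >= 0, 1.0, -1.0)
print(s)                                      # settles on the stored pattern it most closely resembles
```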
HEP Applications
ANNs applied from off-line data analysis to low-level experimental triggers
• Signal-to-background separation (BP…)
– e.g. in flavour tagging, Higgs detection
• Feature recognition problems in track finding. (feed-back)
• Function approximation tasks (feed-back)
– e.g. reconstructing the mass of a decayed particle from calorimeter information