Artificial Neural Networks - Introduction


Artificial Neural Networks and Applications
Dr. L. Iliadis
Assistant Professor, Democritus University of Thrace, Greece
[email protected]
Overview
1. Definition of a Neural Network
2. The Human Brain
3. Neuron Models
4. Artificial Neural Networks (ANN)
5. Historical Notes
6. ANN Architecture
7. Learning processes – Training and Testing ANN
7.1. Backpropagation Learning
8. Well Known Applications of ANN
What is a Neural Network?
A Neural Network is a collection of units connected in some
pattern to allow communication between them, and it acts as a
massively distributed processor.
These units are also referred to as neurons or nodes. The Neural
Network has a natural propensity for storing experiential knowledge
and making it available for use.
It has two main characteristics:
• Knowledge is acquired by the network from its environment through a learning process.
• Interneuron connection strengths, known as synaptic weights, are used to store the acquired knowledge.
Animals are able to react adaptively to changes in their external and
internal environment, and they use their nervous system to perform these
behaviours. This is called Plasticity.
Zoom on Human Brain Neurons
[Figure: a biological neuron, with the axon, soma (cell body), and dendrites labeled]
Biological Analogy of a Neuron
[Figure: schematic neuron showing the dendrites, soma (cell body), and axon]
The Role of the Synapses
[Figure: two connected neurons, with the axon, dendrites, and synapses labeled]
The synapses are responsible for the information transmission between two connected neurons.
Structural Organization of Levels in the Brain
Functional Areas of the Brain
• Primary motor: voluntary movement
• Primary somatosensory: tactile, pain, pressure, position, temperature, movement
• Motor association: coordination of complex movements
• Sensory association: processing of multisensorial information
• Prefrontal: planning, emotion, judgement
• Speech center (Broca's area): speech production and articulation
• Wernicke's area: comprehension of speech
• Auditory: hearing
• Auditory association: complex auditory processing
• Visual: low-level vision
• Visual association: higher-level vision
HUMAN BRAIN VERSUS SILICON
The human cortex has approximately 10 billion neurons and 60 trillion synapses. The net result is that the brain is an enormously efficient structure.
The energetic efficiency of the brain is approximately 10^-16 joules per operation per second, whereas the corresponding value for the best computers in use today is about 10^-6 joules per operation per second.
Human brain neurons (where events happen in the millisecond range, 10^-3 s) are 5 or 6 orders of magnitude slower than silicon logic gates (where events happen in the nanosecond range, 10^-9 s).
ARTIFICIAL NEURON MODEL
A signal x_i at the input of synapse i, connected to neuron k, is multiplied by the synaptic weight w_i.
An adder sums the input signals, weighted by the respective synapses.
An activation function limits the amplitude of the output of the neuron: it squashes the permissible amplitude range of the output signal to a finite value in the interval [0, 1], or alternatively in [-1, 1].
A bias b_k is added to the weighted sum before the activation function is applied.
[Figure: model of a neuron — inputs x_1, ..., x_n, synaptic weights w_1, ..., w_n, summing function, bias b_k, activation function, output y]

z = Σ_{i=1}^{n} w_i · x_i
y = H(z + b_k)
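As a concrete illustration, here is a minimal sketch of this neuron model in Python with NumPy; the function and variable names (artificial_neuron, sigmoid) are our choices for this example, not part of the original slides.

import numpy as np

def sigmoid(v):
    # A common squashing activation: limits the output to the interval (0, 1)
    return 1.0 / (1.0 + np.exp(-v))

def artificial_neuron(x, w, b_k, activation=sigmoid):
    # Adder: z = sum_i w_i * x_i (input signals weighted by the synaptic weights)
    z = np.dot(w, x)
    # Activation function applied to the weighted sum plus the bias b_k
    return activation(z + b_k)

# Example: three input signals, three synaptic weights, bias 0.1
x = np.array([0.5, -1.0, 2.0])
w = np.array([0.8, 0.2, -0.4])
print(artificial_neuron(x, w, b_k=0.1))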
Artificial Neural Networks
Artificial Neural Networks consist of interconnected elements inspired by studies of biological nervous systems. They are an attempt to create machines that work in a similar way to the human brain, using components that behave like biological neurons.
• Synaptic strengths are translated as synaptic weights;
• Excitation means a positive product between the incoming spike rate and the corresponding synaptic weight;
• Inhibition means a negative product between the incoming spike rate and the corresponding synaptic weight.
Neuron’s Output
Nonlinear generalization of the neuron: sigmoidal or Gaussian functions may be used.

y = H(x, w)

where y is the neuron's output, x is the vector of inputs, and w is the vector of synaptic weights.

Sigmoidal neuron: y = 1 / (1 + e^(-w^T x - a))
Gaussian neuron:  y = e^(-||x - w||^2 / (2a^2))
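A direct transcription of the two neuron types into Python (NumPy) could look as follows; this is a sketch, with the parameter a playing the role of the offset/width term in the formulas above.

import numpy as np

def sigmoidal_neuron(x, w, a):
    # y = 1 / (1 + e^(-w^T x - a))
    return 1.0 / (1.0 + np.exp(-np.dot(w, x) - a))

def gaussian_neuron(x, w, a):
    # y = e^(-||x - w||^2 / (2 a^2))
    return np.exp(-np.sum((x - w) ** 2) / (2.0 * a ** 2))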
Software or Hardware?
Although ANN can be implemented as fast hardware devices, much research has been performed using conventional computers running software simulations.
Software simulations provide a relatively cheap and flexible environment in which to research ideas, and their performance is adequate for many real-world applications. For example, an ANN software package might be used to develop a system for credit scoring of individuals who apply for a bank loan.
Historical Notes
Ramon y Cajal in 1911 introduced the idea of neurons as structural constituents of the brain.
The origins of ANN go back to the 1940s, when McCulloch and Pitts published the first mathematical model of a biological neuron.
Research on ANN then stalled for more than 20 years.
In the mid-1980s a huge interest in ANN emerged, due to the publication of the book "Parallel Distributed Processing" by Rumelhart and McClelland.
ANN made a great comeback in the 1990s, and they are now widely accepted as a tool in the development of Intelligent Systems.
Architecture of Artificial Neural Networks
[Figure: architectural graph showing an input layer, a single hidden layer, and an output layer]
The way that the artificial neurons are linked together to compose an ANN may vary according to its architecture. The above architectural graph illustrates the layout of a multilayer feedforward ANN (data flows in only one direction) in the case of a single hidden layer. The hidden layer is where the processing takes place.
Supervised Learning
Learning = learning by adaptation.
For example: animals learn that green fruits are sour and yellowish/reddish ones are sweet. The learning happens by adapting the fruit-picking behavior.
Learning can be perceived as an optimisation process. When an ANN is in its SUPERVISED training or learning phase, there are three factors to be considered (a skeleton of this loop is sketched below):
• The inputs applied are chosen from a training set, where the desired response of the system to these inputs is known.
• The actual output produced when an input pattern is applied is compared to the desired output, and an error is estimated.
• In ANN the learning occurs by changing the synaptic strengths (change of the weights), eliminating some synapses, and building new ones.
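In code, the supervised learning phase can be summarized by a loop like the following sketch; net, forward and adjust_weights are hypothetical names standing in for whatever network implementation is used.

def supervised_training(net, training_set, learning_rate):
    for x, desired in training_set:
        # 1. Inputs chosen from a training set with known desired responses
        actual = net.forward(x)
        # 2. The actual output is compared to the desired output; an error is estimated
        error = desired - actual
        # 3. Learning occurs by changing the synaptic strengths (the weights)
        net.adjust_weights(x, error, learning_rate)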
PERCEPTRON
The Perceptron is one of the early ANN, built around a nonlinear neuron, namely the McCulloch-Pitts neuron model. It produces an output equal to +1 if the hard limiter input is positive, and -1 if it is negative.
Learning with a Perceptron
The synapse strength modification rules for artificial neural networks
can be derived by applying mathematical optimisation methods
Perceptron output: y_out = w^T x

Input data: (x^1, y_1), (x^2, y_2), ..., (x^N, y_N)

Error: E(t) = (y_out(t) - y_t)^2 = (w(t)^T x^t - y_t)^2

Learning (weight adjustment):
w_i(t+1) = w_i(t) - c · ∂E(t)/∂w_i = w_i(t) - c · ∂(w(t)^T x^t - y_t)^2/∂w_i
w_i(t+1) = w_i(t) - c · (w(t)^T x^t - y_t) · x_i^t
where w(t)^T x^t = Σ_{j=1}^{m} w_j(t) · x_j^t
Learning with MLP ANN
MLP (Multi-Layer Perceptron) ANN with p layers:
[Figure: network diagram — input x feeding layers 1, 2, ..., p-1, p, producing y_out]

y_k^1 = 1 / (1 + e^(-w^{1k,T} x - a_k^1)),   k = 1, ..., M_1;   y^1 = (y_1^1, ..., y_{M_1}^1)^T
y_k^2 = 1 / (1 + e^(-w^{2k,T} y^1 - a_k^2)), k = 1, ..., M_2;   y^2 = (y_1^2, ..., y_{M_2}^2)^T
...
y_out = F(x; W) = w^{p,T} y^{p-1}

Data: (x^1, y_1), (x^2, y_2), ..., (x^N, y_N)
Error: E(t) = (y_out(t) - y_t)^2 = (F(x^t; W) - y_t)^2

Direct calculation of the weight changes is complicated.
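A forward pass through such a network (sigmoidal hidden layers, linear output layer, as in the formulas above) can be sketched as follows; the function names are ours.

import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def mlp_forward(x, weights, offsets):
    # weights: list of weight matrices W^1, ..., W^p; offsets: a^1, ..., a^{p-1}
    y = x
    for W, a in zip(weights[:-1], offsets):
        y = sigmoid(W @ y + a)   # y^j = 1 / (1 + e^(-(W^j y^{j-1} + a^j)))
    return weights[-1] @ y       # y_out = w^{p,T} y^{p-1} (linear output)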
Backpropagation Learning
It was developed by Werbos, but Rumelhart et al. in 1986 gave a new lease of life to ANN. The weight adaptation rule is known as Backpropagation; one step of it is sketched in code below.
• It defines two sweeps of the ANN. First it performs a forward sweep from the input layer to the output layer, computing the activations of the neurons; the changes for the synaptic weights of the output neurons can then be calculated first.
• Then it performs a backward sweep from the output layer to the input. In this way it calculates the changes backward, starting from layer p-1, and propagates the local error terms backward.
• The backward sweep is similar to the forward one, except that error values are propagated back through the ANN to determine how the weights are to be changed during training.
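A minimal sketch of one backpropagation step for a network with a single sigmoidal hidden layer and a linear output; the names are ours, and the factor 2 from differentiating the squared error is absorbed into the learning rate c.

import numpy as np

def backprop_step(x, y_t, W1, a1, w2, c=0.01):
    # Forward sweep: compute the hidden activations and the network output
    h = 1.0 / (1.0 + np.exp(-(W1 @ x + a1)))
    y_out = w2 @ h
    # Backward sweep: propagate the local error terms back
    err = y_out - y_t                    # output-layer error
    delta_h = err * w2 * h * (1.0 - h)   # local error terms of the hidden layer
    # Weight changes (gradient descent on the squared error)
    w2 = w2 - c * err * h
    W1 = W1 - c * np.outer(delta_h, x)
    a1 = a1 - c * delta_h
    return W1, a1, w2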
EXTDBD Learning Rule
The Extended Delta-Bar-Delta is a heuristic technique that has been used successfully in a wide range of applications. Its main characteristic is that it uses a term called momentum: a term is added to the standard weight change which is proportional to the previous weight change. In this way good general trends are reinforced and oscillations are damped.
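The momentum idea can be sketched as follows; this shows only the momentum term, not the per-weight adaptive learning rates that the full EXTDBD heuristic also employs, and mu is a name of our choosing for the momentum coefficient.

def momentum_update(w, gradient, previous_dw, c=0.01, mu=0.9):
    # New change = standard change + a term proportional to the previous change
    dw = -c * gradient + mu * previous_dw
    return w + dw, dw   # updated weight, and the change to remember for next time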
EVALUATION INSTRUMENTS
The RMS error adds up the squares of the errors for each PE (processing element) in the output layer, divides by the number of PEs in the output layer to obtain an average, and then takes the square root of that average; hence the name "root mean square".
Another instrument is the Common Mean Correlation (CMC) coefficient of the desired (d) and the actual (predicted) output (y) across the epoch. The CMC is calculated by
CMC = Σ_i (d_i - d̄)(y_i - ȳ) / √( Σ_i (d_i - d̄)^2 · Σ_i (y_i - ȳ)^2 )

where d̄ = (1/E) Σ_i d_i and ȳ = (1/E) Σ_i y_i.
It should be clarified that d stands for the desired values, y for the predicted values, i ranges from 1 to n (the number of cases in the training data set), and E is the epoch size, i.e. the number of sets of training data presented to the ANN between weight updates.
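Both instruments are easy to express in code; a sketch in Python (NumPy), with function names of our choosing:

import numpy as np

def rms_error(d, y):
    # Square the error of each output PE, average, then take the square root
    return np.sqrt(np.mean((d - y) ** 2))

def cmc(d, y):
    # Correlation of desired (d) and predicted (y) outputs across the epoch
    dc = d - d.mean()
    yc = y - y.mean()
    return np.sum(dc * yc) / np.sqrt(np.sum(dc ** 2) * np.sum(yc ** 2))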
TESTING AND OVERTRAINING
Over-Training is a very serious problem!!!
Testing is the process that actually determines the strength of the ANN
and its ability to generalize.
The performance of an ANN is critically dependent on the training data
that must be representative of the task to learn (Callan, 1999).
For this purpose, in the testing phase we randomly choose a long set of actual cases (records) that were not applied in the training phase, as sketched below.
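A sketch of such a random hold-out split; the function name and the 20% test fraction are our choices for illustration.

import random

def train_test_split(records, test_fraction=0.2, seed=0):
    # Randomly hold out cases that will not be applied in the training phase
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]   # (training set, testing set)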
New methods for learning with neural networks
• Bayesian learning: the distribution of the neural network parameters is learnt.
• Support vector learning: the minimal representative subset of the available data is used to calculate the synaptic weights of the neurons.
Tasks Performed by Artificial Neural Networks
The following tasks are usually performed by ANN:
• Controlling the movements of a robot, based on self-perception and other information (e.g., visual information)
• Decision making
• Pattern recognition (e.g., recognizing a visual object, a familiar face)
ANN tasks
• Control
• Classification
• Prediction
• Approximation
These can be reformulated in general as FUNCTION APPROXIMATION tasks.
With the term Approximation we mean: given a set of values of a function g(x), build a neural network that approximates the g(x) values for any input x.
Learning to approximate
Error measure:

E = (1/N) Σ_{t=1}^{N} (F(x^t; W) - y_t)^2

Rule for changing the synaptic weights:

Δw_i^j = -c · ∂E(W)/∂w_i^j
w_i^{j,new} = w_i^j + Δw_i^j

where c is the learning parameter (usually a constant).
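The rule can be applied even when the gradient is not available analytically; the following sketch estimates ∂E/∂w_i by finite differences (eps and the function names are our choices, and F is any network function F(x; W)).

import numpy as np

def weight_update(F, W, X, Y, c=0.01, eps=1e-6):
    # E(W): mean squared approximation error over the data set
    def E(W):
        return np.mean([(F(x, W) - y) ** 2 for x, y in zip(X, Y)])
    grad = np.zeros_like(W)
    base = E(W)
    for i in range(W.size):
        Wp = W.copy()
        Wp.flat[i] += eps
        grad.flat[i] = (E(Wp) - base) / eps   # dE/dw_i, estimated numerically
    return W - c * grad                       # dw_i = -c * dE/dw_i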
Summary
• Artificial neural networks are inspired by the learning processes that
take place in biological systems.
• Artificial neurons and neural networks try to imitate the working
mechanisms of their biological counterparts.
• Learning can be perceived as an optimisation process.
• Biological neural learning happens by the modification of the
synaptic strength. Artificial neural networks learn in the same way.
• The synapse strength modification rules for artificial neural networks
can be derived by applying mathematical optimisation methods.
Summary
• Learning tasks of artificial neural networks can be reformulated
as function approximation tasks.
• Neural networks can be considered as nonlinear function
approximating tools (i.e., linear combinations of nonlinear basis
functions), where the parameters of the networks should be found
by applying optimisation methods.
• The optimisation is done with respect to the approximation error
measure.
• In general it is enough to have a single hidden layer neural
network (MLP, RBF or other) to learn the approximation of a
nonlinear function. In such cases general optimisation can be
applied to find the change rules for the synaptic weights.
DEVELOPING ANN
DETERMINING ANN’S TOPOLOGY