ICT619 Intelligent Systems
Topic 4: Artificial Neural Networks
Artificial Neural Networks
PART A
Introduction
An overview of the biological neuron
The synthetic neuron
Structure and operation of an ANN
Problem solving by an ANN
Learning in ANNs
ANN models
Applications
PART B
Developing neural network applications
Design of the network
Training issues
A comparison of ANN and ES
Hybrid ANN systems
Case Studies
Introduction
Artificial Neural Networks (ANN)
Also known as
Neural networks
Neural computing (or neuro-computing) systems
Connectionist models
ANNs simulate the biological brain for problem solving
This represents a totally different approach to machine intelligence
from the symbolic logic approach
The biological brain is a massively parallel system of
interconnected processing elements
ANNs simulate a similar network of simple processing elements at
a greatly reduced scale
Introduction
ANNs adapt themselves using data to learn problem
solutions
ANNs can be particularly effective for problems that are
hard to solve using conventional computing methods
First developed in the 1950s, interest slumped in the 1970s
Great upsurge in interest in the mid-1980s
Both ANNs and expert systems are non-algorithmic
tools for problem solving
ES rely on the solution being expressed as a set of
heuristics by an expert
ANNs learn solely from data.
An overview of the biological neuron
An estimated 100 billion (10^11) neurons in the human brain,
with each connected to up to 10,000 others
Electrical impulses produced by a neuron travel along its axon
The axon connects to the dendrites of other neurons through
synaptic junctions
An overview of the biological neuron
[Figure: micrograph of biological neurons. Photo: Osaka University]
An overview of the biological neuron
A neuron collects the excitation of its inputs and "fires"
(produces a burst of activity) when the sum of its inputs
exceeds a certain threshold
The strengths of a neuron’s inputs are modified
(enhanced or inhibited) by the synaptic junctions
Learning in our brains occurs through a continuous
process of new interconnections forming between
neurons, and adjustments at the synaptic junctions
The synthetic neuron
A simple model of the biological neuron, first proposed in 1943
by McCulloch and Pitts
It consists of a summing function with an internal threshold,
and "weighted" inputs, as shown below.
The synthetic neuron (cont’d)
For a neuron receiving n inputs, each input x_i (i ranging
from 1 to n) is weighted by multiplying it by a weight w_i
The sum of the products w_i x_i gives the net activation
value of the neuron:

activation = \sum_{i=1}^{n} w_i x_i

The activation value is passed through a transfer function to
produce the neuron’s output
The weight of the connection carrying signals from
a neuron i to a neuron j is termed w_ij
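To make the weighted-sum-and-threshold operation concrete, here is a minimal Python sketch of a McCulloch-Pitts style neuron; the function name and example values are illustrative, not from the slides.

```python
def neuron_output(inputs, weights, threshold=0.0):
    """A McCulloch-Pitts style neuron: weighted sum plus step threshold."""
    # Net activation: sum of each input multiplied by its weight
    activation = sum(w * x for w, x in zip(weights, inputs))
    # Step transfer function: fire only when activation reaches the threshold
    return 1 if activation >= threshold else 0

# Example: three inputs, one of them inhibitory (negative weight)
print(neuron_output([1, 0, 1], [0.5, 0.2, -0.4]))  # activation 0.1 -> output 1
```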
Transfer functions
These compute the output of a node from its net
activation. Among the popular transfer functions are:
Step function
Signum (or sign) function
Sigmoid function
Hyperbolic tangent function
In the step function, the neuron produces an output
only when its net activation reaches a minimum value –
known as the threshold
For a binary neuron i, whose output is a 0 or 1 value,
the step function can be summarised as:
output_i = \begin{cases} 0 & \text{if } activation_i < T \\ 1 & \text{if } activation_i \ge T \end{cases}
Transfer functions (cont’d)
The sign function returns either -1 or +1. To avoid
confusion with 'sine' it is often called signum.
[Figure: the signum function, a step from -1 to +1 at activation_i = 0]

output_i = \begin{cases} -1 & \text{if } activation_i < 0 \\ +1 & \text{if } activation_i \ge 0 \end{cases}
Transfer functions (cont’d)
The sigmoid
The sigmoid transfer function produces a continuous
value in the range 0 to 1:

output_i = \frac{1}{1 + e^{-gain \cdot activation_i}}

The parameter gain affects the slope of the function
around zero
Transfer functions (cont’d)
The hyperbolic tangent
A variant of the sigmoid transfer function
output_i = \frac{e^{activation_i} - e^{-activation_i}}{e^{activation_i} + e^{-activation_i}}
Has a shape similar to the sigmoid (like an S), with the
difference being that the value of output_i ranges
between -1 and 1.
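The four transfer functions above can be sketched in a few lines of Python (assuming, for the sigmoid, the conventional formula with a gain parameter as described on the previous slide):

```python
import math

def step(activation, threshold=0.0):
    # Output 1 only when net activation reaches the threshold T
    return 1 if activation >= threshold else 0

def signum(activation):
    # Like step, but returns -1/+1 instead of 0/1
    return 1 if activation >= 0 else -1

def sigmoid(activation, gain=1.0):
    # Continuous output in (0, 1); gain controls the slope around zero
    return 1.0 / (1.0 + math.exp(-gain * activation))

def tanh(activation):
    # S-shaped like the sigmoid, but output ranges over (-1, 1)
    return (math.exp(activation) - math.exp(-activation)) / \
           (math.exp(activation) + math.exp(-activation))

for a in (-2.0, 0.0, 2.0):
    print(step(a), signum(a), round(sigmoid(a), 3), round(tanh(a), 3))
```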
Structure and operation of an ANN
The building block of an ANN is the artificial
neuron. It is characterised by
weighted inputs
summing and transfer function
The most common architecture of an ANN consists
of two or more layers of artificial neurons or nodes,
with each node in a layer connected to every node
in the following layer
Signals usually flow from the input layer, which is
directly subjected to an input pattern, across one
or more hidden layers towards the output layer.
Structure and operation of an ANN (cont’d)
The most popular ANN architecture, known as the
multilayer perceptron (shown in diagram above),
follows this model.
In some models of the ANN, such as the self-organising
map (SOM) or Kohonen net, nodes in the same layer may
have interconnections among them
In recurrent networks, connections can even go
backwards, to nodes closer to the input
Problem solving by an ANN
The inputs of an ANN are data values grouped
together to form a pattern
Each data value (component of the pattern vector) is
applied to one neuron in the input layer
The output value(s) of node(s) in the output layer
represent some function of the input pattern
Problem solving by an ANN (cont’d)
In the example above, the ANN maps the input pattern
to one of two classes
The ANN produces an accurate prediction only if it has
“learned” the functional relationship between the relevant
variables, namely the components of the input pattern,
and the corresponding output
Any three-layer ANN can (at least in theory) represent
the functional relationship between an input pattern
and its class
It may be difficult in practice for the ANN to learn a
given relationship
Learning in ANNs
Common human learning behaviour: repeatedly going
through same material, making mistakes and learning
until able to carry out a given task successfully
Learning by most ANNs is modelled after this type of
human learning
Learned knowledge to solve a given problem is stored
in the interconnection weights of an ANN
The process by which an ANN arrives at the right
values of these weights is known as learning or training
Learning in ANNs (cont’d)
Learning in ANNs takes place through an
iterative training process during which node
interconnection weight values are adjusted
Initial weights, usually small random values,
are assigned to the interconnections between
the ANN nodes.
Like knowledge acquisition in ES, learning can be the
most time-consuming phase in an ANN’s development
Learning in ANNs (cont’d)
ANN learning (or training) can be supervised or
unsupervised
In supervised training,
data sets consisting of pairs, each one an input pattern and
its expected correct output value, are used
The weight adjustments during each iteration aim to reduce
the “error” (difference between the ANN’s actual output and
the expected correct output)
E.g., a node producing a small negative output when it is
expected to produce a large positive one has its positive
weight values increased and its negative weight values
decreased
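As a concrete illustration of supervised, error-driven weight adjustment, the sketch below trains a single linear neuron with the classic delta rule on (input pattern, expected output) pairs; the learning rate, epoch count and data are invented for illustration.

```python
import random

def train_neuron(pairs, n_inputs, lr=0.1, epochs=100):
    """Supervised training of a single linear neuron with the delta rule."""
    # Initial weights: small random values
    weights = [random.uniform(-0.5, 0.5) for _ in range(n_inputs)]
    for _ in range(epochs):
        for inputs, expected in pairs:
            actual = sum(w * x for w, x in zip(weights, inputs))
            error = expected - actual  # the difference to be reduced
            # Adjust each weight in proportion to its input and the error
            weights = [w + lr * error * x for w, x in zip(weights, inputs)]
    return weights

# Training pairs sampled from the relationship output = 2*x1 - x2
data = [([1, 0], 2), ([0, 1], -1), ([1, 1], 1), ([2, 1], 3)]
print(train_neuron(data, n_inputs=2))  # converges towards [2.0, -1.0]
```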
Learning in ANNs
In supervised training,
pairs of sample input values and corresponding output
values are used to train the net repeatedly until the
output becomes satisfactorily accurate
In unsupervised training,
there is no known expected output used for guiding the
weight adjustments
The function to be optimised can be any function of the
inputs and outputs, usually set by the application
The net adapts itself to align its weight values with
training patterns
This results in groups of nodes responding strongly to
specific groups of similar input patterns
The two states of an ANN
A neural network can be in one of two
states: training mode or operation mode
Most ANNs learn off-line and do not change their
weights once training is finished and they are in
operation
In an ANN capable of on-line learning, training and
operation continue together
ANN training can be time consuming, but once
trained, the resulting network can be made to run
very efficiently – providing fast responses
ANN models
ANNs are supposed to model the structure and
operation of the biological brain
But there are different types of neural networks
depending on the architecture, learning strategy and
operation
Three of the most well known models are:
1. The multilayer perceptron
2. The Kohonen network (the Self-Organising Map)
3. The Hopfield net
The Multilayer Perceptron (MLP) is the most popular
ANN architecture
The Multilayer Perceptron
Nodes are arranged into an input layer, an output layer
and one or more hidden layers
Also known as the backpropagation network, because error
values from the output layer are propagated back into the
layers before it to calculate weight adjustments during
training.
Another name for the MLP is the feedforward network.
MLP learning algorithm
The learning rule for the multilayer perceptron is known
as "the generalised delta rule" or the "backpropagation
rule"
The generalised delta rule repeatedly calculates an
error value for each input, which is a function of the
squared difference between the expected correct
output and the actual output
The calculated error is backpropagated from one layer
to the previous one, and is used to adjust the weights
between connecting layers
MLP learning algorithm (cont’d)
New weight = old weight + weight change (computed from the gradient of the squared error)
Error = desired output - actual output
Training stops when error becomes acceptable, or
after a predetermined number of iterations
After training, the modified interconnection weights
form a sort of internal representation that enables the
ANN to generate desired outputs when given the
training inputs – or even new inputs that are similar to
training inputs
This generalisation is a very important property
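The generalised delta rule can be sketched end to end for a small MLP. The example below, written with numpy, trains a 2-4-1 network on the XOR problem; the architecture, learning rate and iteration count are illustrative choices rather than values from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# XOR training set: input patterns and their expected correct outputs
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
t = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Small random initial weights; biases kept as separate vectors
W1, b1 = rng.uniform(-0.5, 0.5, (2, 4)), np.zeros(4)   # input -> hidden
W2, b2 = rng.uniform(-0.5, 0.5, (4, 1)), np.zeros(1)   # hidden -> output
lr = 0.5

for _ in range(20000):
    # Forward pass: input layer -> hidden layer -> output layer
    h = sigmoid(X @ W1 + b1)
    y = sigmoid(h @ W2 + b2)
    # Error terms, backpropagated from the output layer to the hidden layer
    delta_out = (t - y) * y * (1 - y)            # from the squared error
    delta_hid = (delta_out @ W2.T) * h * (1 - h)
    # Weight adjustments between connecting layers
    W2 += lr * h.T @ delta_out
    b2 += lr * delta_out.sum(axis=0)
    W1 += lr * X.T @ delta_hid
    b1 += lr * delta_hid.sum(axis=0)

print(np.round(y.ravel(), 2))  # converges towards [0, 1, 1, 0]
```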
The error landscape in a multilayer perceptron
For a given pattern p, the error Ep can be plotted
against the weights to give the so called error surface
The error surface is a landscape of hills and valleys,
with points of minimum error corresponding to wells
and maximum error found on peaks.
The generalised delta rule aims to minimise Ep by
adjusting weights so that they correspond to points of
lowest error
It follows the method of gradient descent where the
changes are made in the steepest downward direction
All possible solutions are depressions in the error
surface, known as basins of attraction
The error landscape in a multilayer perceptron
[Figure: the error E_p plotted as a surface over two weights, w_i and w_j]
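The idea of sliding downhill on the error surface can be seen in a one-weight toy example; the quadratic error function here is hypothetical, chosen only so that the minimum is known in advance.

```python
# Toy error surface with a single weight: E(w) = (w - 3)**2
# Gradient descent steps in the steepest downward direction, dE/dw = 2*(w - 3)
w, lr = 0.0, 0.1
for _ in range(50):
    gradient = 2 * (w - 3)
    w -= lr * gradient   # move downhill on the error surface
print(round(w, 4))       # approaches the minimum at w = 3
```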
Learning difficulties in multilayer perceptrons - local minima
The MLP may fail to settle into the global minimum of
the error surface and instead find itself in one of the
local minima
This is due to the gradient descent strategy followed
A number of alternative approaches can be taken to
reduce this possibility:
Lowering the gain term progressively
Used to influence rate at which weight changes are made
during training
Its value defaults to 1, but it may be gradually reduced to slow
the rate of change as training progresses
Learning difficulties in multilayer perceptrons (cont’d)
Addition of more nodes for better representation of patterns
Too few nodes (and consequently not enough weights) can cause
failure of the ANN to learn a pattern
Introduction of a momentum term
Determines effect of past weight changes on current direction of
movement in weight space
The momentum term is a small numerical value in the range 0 to 1 (see the sketch below)
Addition of random noise to perturb the ANN out of local minima
Usually done by adding small random values to weights.
Takes the net to a different point in the error space – hopefully out
of a local minimum
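As a sketch of the momentum idea from the list above: the current weight change blends the downhill step with the previous change. The learning rate, the momentum term alpha and the gradient sequence are placeholders.

```python
def momentum_update(weight, gradient, prev_change, lr=0.1, alpha=0.9):
    """One weight update where a momentum term alpha (in the range 0-1)
    carries part of the previous change into the current one."""
    change = -lr * gradient + alpha * prev_change
    return weight + change, change

w, prev = 0.0, 0.0
for g in [1.0, 0.8, -0.1, 0.05]:   # illustrative gradient sequence
    w, prev = momentum_update(w, g, prev)
    print(round(w, 3), round(prev, 3))
```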
The Kohonen network (the self-organising map)
Biological systems display both supervised and
unsupervised learning behaviour
A neural network with unsupervised learning
capability is said to be self-organising
During training, the Kohonen net changes its
weights to learn appropriate associations,
without any right answers being provided
The Kohonen network (cont’d)
The Kohonen net consists of an input layer that distributes the
inputs to every node in a second layer, known as the
competitive layer.
The competitive (output) layer is usually organised into
some 2-D or 3-D surface (feature map)
Operation of the Kohonen Net
Each neuron in the competitive layer is connected to other
neurons in its neighbourhood
Neurons in the competitive layer have excitatory (positively
weighted) connections to immediate neighbours and
inhibitory (negatively weighted) connections to more distant
neurons.
As an input pattern is presented, some of the neurons in the
competitive layer are sufficiently activated to produce
outputs, which are fed to other neurons in their
neighbourhoods
The node with the set of input weights closest to the input
pattern component values produces the largest output. This
node is termed the best matching (or winning) node
Operation of the Kohonen Net (cont’d)
During training, input weights of the best matching node and
its neighbours are adjusted to make them resemble the
input pattern even more closely
At the completion of training, the best matching node ends
up with its input weight values aligned with the input pattern
and produces the strongest output whenever that particular
pattern is presented
The nodes in the winning node's neighbourhood also have
their weights modified to settle down to an average
representation of that pattern class
As a result, the net is able to represent clusters of similar
input patterns - a feature found useful for data mining
applications, for example.
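A minimal sketch of one Kohonen training pass: find the best matching node (smallest distance between its input weights and the pattern), then pull it and its neighbours towards the pattern. The grid size, learning rate and neighbourhood radius are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

grid, dim = 5, 3                        # 5x5 competitive layer, 3-D inputs
weights = rng.random((grid, grid, dim))

def train_step(pattern, lr=0.3, radius=1):
    # Best matching node: input weights closest to the pattern
    dists = np.linalg.norm(weights - pattern, axis=2)
    bi, bj = np.unravel_index(np.argmin(dists), dists.shape)
    # Pull the winner and its neighbourhood towards the pattern
    for i in range(max(0, bi - radius), min(grid, bi + radius + 1)):
        for j in range(max(0, bj - radius), min(grid, bj + radius + 1)):
            weights[i, j] += lr * (pattern - weights[i, j])
    return bi, bj

# Unsupervised training: no expected outputs, just input patterns
for pattern in rng.random((100, dim)):
    train_step(pattern)
```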
The Hopfield Model
The Hopfield net is the most widely
known of all the autoassociative
(pattern-completing) ANNs
In autoassociation, a noisy or partially
incomplete input pattern causes the
network to stabilise to a state
corresponding to the original pattern
It is also useful for optimisation tasks.
The Hopfield net is a recurrent ANN in
which the output produced by each
neuron is fed back as input to all other
neurons
Neurons compute a weighted sum
with a step transfer function.
The Hopfield Model (cont’d)
The Hopfield net has no iterative
learning algorithm as such. Patterns
(or facts) are simply stored by
adjusting the weights to lower a term
called network energy
During operation, an input pattern is
applied to all neurons simultaneously
and the network is left to stabilise
Outputs from the neurons in the stable
state form the output of the network.
When presented with an input pattern,
the net outputs a stored pattern
nearest to the presented pattern.
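A minimal Hopfield sketch: weights are set with the Hebbian outer-product rule (one common way of realising the energy-lowering storage described above, not necessarily the slide's exact method), and recall feeds outputs back until the state stabilises. The stored patterns and the synchronous update are illustrative simplifications.

```python
import numpy as np

def store(patterns):
    """Build Hopfield weights from bipolar (+1/-1) patterns
    using the Hebbian outer-product rule; no self-connections."""
    n = patterns.shape[1]
    W = np.zeros((n, n))
    for p in patterns:
        W += np.outer(p, p)
    np.fill_diagonal(W, 0)
    return W

def recall(W, state, max_steps=20):
    """Feed outputs back as inputs (synchronously, for brevity)
    until the network settles into a stable state."""
    for _ in range(max_steps):
        new = np.where(W @ state >= 0, 1, -1)   # step transfer function
        if np.array_equal(new, state):
            break
        state = new
    return state

stored = np.array([[1, -1, 1, -1, 1, -1],
                   [1, 1, 1, -1, -1, -1]])
W = store(stored)
noisy = np.array([1, -1, 1, -1, 1, 1])   # first pattern with one bit flipped
print(recall(W, noisy))                  # -> [ 1 -1  1 -1  1 -1]
```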
When ANNs should be applied
Difficulties with some real-life problems:
Solutions are difficult, if not impossible, to define
algorithmically, due mainly to their unstructured nature
Too many variables, and/or the interactions of relevant
variables are not well understood
Input data may be partially corrupt or missing, making it
difficult for a logical sequence of solution steps to
function effectively
When ANNs should be applied (cont’d)
The typical ANN attempts to arrive at an answer by
learning to identify the right answer through an iterative
process of self-adaptation or training
If there are many factors, with complex interactions
among them, the usual "linear" statistical techniques
may be inappropriate
If sufficient data is available, an ANN can find the
relevant functional relationship by means of an
adaptive learning procedure from the data
Current applications of ANNs
ANNs are good at recognition and classification tasks
Due to their ability to recognise complex patterns,
ANNs have been widely applied in character,
handwritten text and signature recognition, as well as
more complex images such as faces
They have also been used successfully for speech
recognition and synthesis
ANNs are being used in an increasing number of
applications where high-speed computation of
functions is important, eg, in industrial robotics
Current applications of ANNs (cont’d)
One of the more successful applications of ANNs has
been as a decision support tool in the area of finance
and banking
Some examples of commercial applications of ANN
are:
Financial market analysis for investment decision making
Sales support - targeting customers for telemarketing
Bankruptcy prediction
Intelligent flexible manufacturing systems
Stock market prediction
Resource allocation – scheduling and management of
personnel and equipment
ANN applications - broad categories
According to a survey (Quaddus & Khan, 2002)
covering the period 1988 to mid-1998, the
main business application areas of ANNs are:
Production (36%)
Information systems (20%)
Finance (18%)
Marketing & distribution (14.5%)
Accounting/Auditing (5%)
Others (6.5%)
ANN applications - broad categories (cont’d)
Table 1: Distribution of the Articles by Areas and Year

AREA                    1988    89    90    91    92    93    94    95    96    97    98   Total   % of Total
Accounting/Auditing        1     0     1     1     6     3     3     7     7     5     0      34        4.97
Finance                    0     0     4    11    19    28    27    18     5     9     2     123       17.98
Human resources            0     0     0     1     0     1     1     0     0     0     0       3        0.44
Information systems        4     6     9     7    15    24    21    18    13    18     3     138       20.18
Marketing/Distribution     2     2     2     3     8    10    12    17    29    14     0      99       14.47
Production                 2     6     8    21    31    38    24    50    29    31     1     241       35.23
Others                     0     0     1     7     3     8     7     8     7     5     0      46        6.73
Yearly Total               9    14    25    51    82   112    95   118    90    82     6     684      100.00
% of Total              1.32  2.05  3.65  7.46 11.99 16.37 13.89 17.25 13.16 11.99  0.88  100.00
The levelling off of publications on ANN applications
may be attributed to ANNs moving from the research
to the commercial application domain
The emergence of other intelligent system tools may
be another factor
Some advantages of ANNs
Able to take incomplete or corrupt data and provide
approximate results.
Good at generalisation, that is, recognising patterns
similar to those learned during training
Inherent parallelism makes them fault-tolerant – loss of
a few interconnections or nodes leaves the system
relatively unaffected
Parallelism also makes ANNs fast and efficient for
handling large amounts of data.
ANN State-of-the-art overview
Currently neural network systems are available as
Software simulations on conventional computers - the prevalent form
Special-purpose hardware that models the parallelism of
neurons.
ANN-based systems not likely to replace conventional
computing systems, but they are an established
alternative to the symbolic logic approach to
information processing
A new computing paradigm in the form of hybrid
intelligent systems has emerged - often involving ANNs
with other intelligent system tools
REFERENCES
AI Expert (special issue on ANN), June 1990.
BYTE (special issue on ANN), Aug. 1989.
Caudill, M., "The View from Now", AI Expert, June 1992, pp. 27-31.
Dhar, V., & Stein, R., Seven Methods for Transforming Corporate Data into Business Intelligence, Prentice Hall, 1997.
Kirrmann, H., "Neural Computing: The new gold rush in informatics", IEEE Micro, June 1989, pp. 7-9.
Lippman, R.P., "An Introduction to Computing with Neural Nets", IEEE ASSP Magazine, April 1987, pp. 4-21.
Lisboa, P. (Ed.), Neural Networks: Current Applications, Chapman & Hall, 1992.
Negnevitsky, M., Artificial Intelligence: A Guide to Intelligent Systems, Addison-Wesley, 2005.
REFERENCES (cont’d)
Quaddus, M.A., and Khan, M.S., "Evolution of Artificial Neural Networks in Business Applications: An Empirical Investigation Using a Growth Model", International Journal of Management and Decision Making, Vol. 3, No. 1, March 2002, pp. 19-34. (See also the ANN application publications EndNote library files, ICT619 ftp site.)
Wasserman, P.D., Neural Computing: Theory and Practice, Van Nostrand Reinhold, New York, 1989.
Wong, B.K., Bodnovich, T.A., and Selvi, Y., "Neural networks applications in business: A review and analysis of the literature (1988-95)", Decision Support Systems, 19, 1997, pp. 301-320.
Zahedi, F., Intelligent Systems for Business, Wadsworth Publishing, Belmont, California, 1993.
http://www.doc.ic.ac.uk/~nd/surprise_96/journal/vol4/cs11/report.html