
ICT619 Intelligent Systems
Topic 4: Artificial Neural Networks
Artificial Neural Networks
PART A
 Introduction
 An overview of the biological neuron
 The synthetic neuron
 Structure and operation of an ANN
 Problem solving by an ANN
 Learning in ANNs
 ANN models
 Applications
PART B
 Developing neural network applications
 Design of the network
 Training issues
 A comparison of ANN and ES
 Hybrid ANN systems
 Case Studies
Introduction
 Artificial Neural Networks (ANN)
Also known as
 Neural networks
 Neural computing (or neuro-computing) systems
 Connectionist models
 ANNs simulate the biological brain for problem solving
 This represents a totally different approach to machine intelligence
from the symbolic logic approach
 The biological brain is a massively parallel system of
interconnected processing elements
 ANNs simulate a similar network of simple processing elements at
a greatly reduced scale
Introduction
 ANNs adapt themselves using data to learn problem
solutions
 ANNs can be particularly effective for problems that are
hard to solve using conventional computing methods
 First developed in the 1950s; interest slumped in the 1970s
 Great upsurge in interest in the mid-1980s
 Both ANNs and expert systems are non-algorithmic
tools for problem solving
 ES rely on the solution being expressed as a set of
heuristics by an expert
 ANNs learn solely from data.
An overview of the biological
neuron
 An estimated 100 billion neurons in the human brain,
each connected to up to 10,000 others
 Electrical impulses produced by a neuron travel along
the axon
 The axon connects to dendrites through synaptic
junctions
An overview of the biological
neuron
(Photo: Osaka University)
An overview of the biological
neuron
 A neuron collects the excitation of its inputs and "fires"
(produces a burst of activity) when the sum of its inputs
exceeds a certain threshold
 The strengths of a neuron’s inputs are modified
(enhanced or inhibited) by the synaptic junctions
 Learning in our brains occurs through a continuous
process of new interconnections forming between
neurons, and adjustments at the synaptic junctions
The synthetic neuron
 A simple model of the biological neuron, first
proposed in 1943 by McCulloch and Pitts, consists
of a summing function with an internal threshold
and "weighted" inputs, as shown below.
The synthetic neuron (cont’d)
 For a neuron receiving n inputs, each input xi (i ranging
from 1 to n) is weighted by multiplying it by a weight wi
 The sum of the products wixi gives the net activation
value of the neuron
 The activation value is subjected to a transfer function to
produce the neuron's output
 The weight value of the connection carrying signals from
a neuron i to a neuron j is termed wij.
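As a concrete illustration (not part of the original slides), here is a minimal Python sketch of such a neuron: a weighted sum of inputs compared against an internal threshold. All input, weight and threshold values below are made-up examples.

```python
# A minimal McCulloch-Pitts-style neuron: weighted sum plus threshold.
# All numeric values are illustrative, not from the lecture.

def neuron_output(inputs, weights, threshold):
    """Return 1 if the weighted sum of inputs reaches the threshold, else 0."""
    activation = sum(x * w for x, w in zip(inputs, weights))
    return 1 if activation >= threshold else 0

# Example: three inputs with mixed excitatory/inhibitory weights
print(neuron_output([1.0, 0.5, 0.2], [0.8, -0.3, 0.5], threshold=0.5))  # -> 1
```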
Transfer functions
 These compute the output of a node from its net
activation. Among the popular transfer functions are:
 Step function
 Signum (or sign) function
 Sigmoid function
 Hyperbolic tangent function
 In the step function, the neuron produces an output
only when its net activation reaches a minimum value –
known as the threshold
 For a binary neuron i, whose output is a 0 or 1 value,
the step function can be summarised as:
$$\text{output}_i = \begin{cases} 0 & \text{if } \text{activation}_i < T \\ 1 & \text{if } \text{activation}_i \geq T \end{cases}$$
Transfer functions (cont’d)
The sign function returns either -1 or +1. To avoid
confusion with 'sine' it is often called signum.
(Figure: outputi plotted against activationi, jumping from -1 to +1 at zero)
$$\text{output}_i = \begin{cases} +1 & \text{if } \text{activation}_i \geq 0 \\ -1 & \text{if } \text{activation}_i < 0 \end{cases}$$
Transfer functions (cont’d)
The sigmoid
 The sigmoid transfer function produces a continuous
value in the range 0 to 1:
$$\text{output}_i = \frac{1}{1 + e^{-\text{gain} \cdot \text{activation}_i}}$$
 The parameter gain affects the slope of the function
around zero
Transfer functions (cont’d)
The hyperbolic tangent
 A variant of the sigmoid transfer function
$$\text{output}_i = \frac{e^{\text{activation}_i} - e^{-\text{activation}_i}}{e^{\text{activation}_i} + e^{-\text{activation}_i}}$$
Has a shape similar to the sigmoid (like an S), with the
difference being that the value of outputi ranges
between -1 and 1.
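The four transfer functions can be sketched directly in code. A minimal Python version follows; the function names, parameter names and the gain default are our own, chosen to follow the descriptions above.

```python
import math

def step(activation, threshold=0.0):
    """Step: 0 below the threshold T, 1 at or above it."""
    return 1 if activation >= threshold else 0

def signum(activation):
    """Signum: -1 for negative activation, +1 otherwise."""
    return 1 if activation >= 0 else -1

def sigmoid(activation, gain=1.0):
    """Sigmoid: continuous output in (0, 1); gain sets the slope near zero."""
    return 1.0 / (1.0 + math.exp(-gain * activation))

def tanh_transfer(activation):
    """Hyperbolic tangent: S-shaped like the sigmoid, but output in (-1, 1)."""
    return ((math.exp(activation) - math.exp(-activation))
            / (math.exp(activation) + math.exp(-activation)))

for f in (step, signum, sigmoid, tanh_transfer):
    print(f.__name__, f(0.5))
```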
Structure and operation of
an ANN
 The building block of an ANN is the artificial
neuron. It is characterised by
 weighted inputs
 summing and transfer function
 The most common architecture of an ANN consists
of two or more layers of artificial neurons or nodes,
with each node in a layer connected to every node
in the following layer
 Signals usually flow from the input layer, which is
directly subjected to an input pattern, across one
or more hidden layers towards the output layer.
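As a rough sketch of this layered operation (Python with NumPy; the layer sizes and random weights are illustrative assumptions, not from the lecture), a forward pass simply alternates weighted sums and transfer functions, layer by layer:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(pattern, weight_matrices):
    """Propagate an input pattern through fully connected layers."""
    signal = pattern
    for W in weight_matrices:          # one weight matrix per layer-to-layer link
        signal = sigmoid(W @ signal)   # weighted sums, then transfer function
    return signal

rng = np.random.default_rng(0)
# 4 inputs -> 3 hidden nodes -> 2 output nodes (sizes chosen arbitrarily)
weights = [rng.normal(size=(3, 4)), rng.normal(size=(2, 3))]
print(forward(np.array([0.2, 0.7, 0.1, 0.9]), weights))
```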
Structure and operation of an
ANN
 The most popular ANN architecture, known as the
multilayer perceptron (shown in diagram above),
follows this model.
 In some models of the ANN, such as the self-organising
map (SOM) or Kohonen net, nodes in the same layer
may have interconnections among them
 In recurrent networks, connections can even go
backwards to nodes closer to the input
Problem solving by an ANN
 The inputs of an ANN are data values grouped
together to form a pattern
 Each data value (component of the pattern vector) is
applied to one neuron in the input layer
 The output value(s) of node(s) in the output layer
represent some function of the input pattern
Problem solving by an ANN
(cont’d)
 In the example above, the ANN maps the input pattern
to one of two classes
 The ANN produces an accurate prediction only if the
functional relationship between the relevant variables,
namely the components of the input pattern, and the
corresponding output has been "learned" by the ANN
 Any three-layer ANN can (at least in theory) represent
the functional relationship between an input pattern
and its class
 It may be difficult in practice for the ANN to learn a
given relationship
Learning in ANN
Common human learning behaviour: repeatedly going
through the same material, making mistakes and learning,
until able to carry out a given task successfully
 Learning by most ANNs is modelled after this type of
human learning
 Learned knowledge to solve a given problem is stored
in the interconnection weights of an ANN
 The process by which an ANN arrives at the right
values of these weights is known as learning or training
Learning in ANN (cont’d)
 Learning in ANNs takes place through an
iterative training process during which node
interconnection weight values are adjusted
 Initial weights, usually small random values,
are assigned to the interconnections between
the ANN nodes.
 Like knowledge acquisition in ES, learning in
ANNs can be the most time consuming phase
in its development
Learning in ANNs (cont’d)
ANN learning (or training) can be supervised or
unsupervised
In supervised training,
 data sets consisting of pairs, each one an input pattern and
its expected correct output value, are used
 The weight adjustments during each iteration aim to reduce
the “error” (difference between the ANN’s actual output and
the expected correct output)
 E.g., a node producing a small negative output when it is
expected to produce a large positive one has its positive
weight values increased and its negative weight values
decreased, as in the sketch below
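A minimal sketch of this error-driven adjustment for a single linear node (Python; the learning rate, inputs and target are illustrative, and this is the simple delta rule rather than the full backpropagation algorithm described later):

```python
# Error-driven weight adjustment for one linear node (simple delta rule).
# Learning rate and the training pair are illustrative values.

def train_step(weights, inputs, target, learning_rate=0.1):
    actual = sum(w * x for w, x in zip(weights, inputs))
    error = target - actual                      # expected minus actual output
    # Each weight moves in the direction that reduces the error:
    return [w + learning_rate * error * x for w, x in zip(weights, inputs)]

weights = [0.1, -0.4]
for _ in range(20):                              # repeated presentations
    weights = train_step(weights, inputs=[1.0, 0.5], target=1.0)
print(weights)                                   # output drifts toward the target
```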
Learning in ANNs
In supervised training,
 Pairs of sample input value and corresponding output
value are used to train the net repeatedly until the
output becomes satisfactorily accurate
In unsupervised training,
 there is no known expected output used for guiding the
weight adjustments
 The function to be optimised can be any function of the
inputs and outputs, usually set by the application
 the net adapts itself to align its weight values with
training patterns
 This results in groups of nodes responding strongly to
specific groups of similar input patterns
The two states of an ANN
 A neural network can be in one of two
states: training mode or operation mode
 Most ANNs learn off-line and do not change their
weights once training is finished and they are in
operation
 In an ANN capable of on-line learning, training and
operation continue together
 ANN training can be time consuming, but once
trained, the resulting network can be made to run
very efficiently – providing fast responses
ANN models
 ANNs are supposed to model the structure and operation
of the biological brain
 But there are different types of neural networks depending
on the architecture, learning strategy and operation
 Three of the most well known models are:
1. The multilayer perceptron
2. The Kohonen network (the Self-Organising Map)
3. The Hopfield net
 The Multilayer Perceptron (MLP) is the most popular
ANN architecture
The Multilayer Perceptron
 Nodes are arranged into an input layer, an output layer
and one or more hidden layers
 Also known as the backpropagation network, because
error values from the output layer are propagated back
through earlier layers to calculate weight adjustments
during training.
 Another name for the MLP is the feedforward network.
MLP learning algorithm
 The learning rule for the multilayer perceptron is known
as "the generalised delta rule" or the "backpropagation
rule"
 The generalised delta rule repeatedly calculates an
error value for each input, which is a function of the
squared difference between the expected correct
output and the actual output
 The calculated error is backpropagated from one layer
to the previous one, and is used to adjust the weights
between connecting layers
MLP learning algorithm (cont’d)
New weight = old weight + change calculated to reduce the squared error
Error = difference between desired output and actual output
 Training stops when error becomes acceptable, or
after a predetermined number of iterations
 After training, the modified interconnection weights
form a sort of internal representation that enables the
ANN to generate desired outputs when given the
training inputs – or even new inputs that are similar to
training inputs
 This generalisation is a very important property
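A compact sketch of the generalised delta rule for a one-hidden-layer MLP (Python with NumPy; the network size, learning rate, bias handling and XOR-style training data are our illustrative choices, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(1)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

# XOR-style training pairs; a bias input of 1 is appended to each pattern
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.normal(scale=0.5, size=(3, 4))   # input (+bias) -> hidden layer
W2 = rng.normal(scale=0.5, size=(4, 1))   # hidden -> output layer
eta = 0.5                                 # learning rate (the "gain" term)

for epoch in range(10000):
    H = sigmoid(X @ W1)                   # forward pass, hidden layer
    Y = sigmoid(H @ W2)                   # forward pass, output layer
    delta_out = (T - Y) * Y * (1 - Y)     # error term at the output layer
    delta_hid = (delta_out @ W2.T) * H * (1 - H)   # backpropagated error
    W2 += eta * H.T @ delta_out           # adjust weights between layers
    W1 += eta * X.T @ delta_hid

print(Y.round(2))                         # should approach the 0/1 targets
```

Each epoch repeats the cycle the slides describe: forward pass, squared-error-based error terms, then weight adjustments propagated backwards one layer at a time.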
The error landscape in a
multilayer perceptron
 For a given pattern p, the error Ep can be plotted
against the weights to give the so called error surface
 The error surface is a landscape of hills and valleys,
with points of minimum error corresponding to wells
and maximum error found on peaks.
 The generalised delta rule aims to minimise Ep by
adjusting weights so that they correspond to points of
lowest error
 It follows the method of gradient descent where the
changes are made in the steepest downward direction
 All possible solutions are depressions in the error
surface, known as basins of attraction
The error landscape in a
multilayer perceptron
(Figure: the error surface, Ep plotted against weights wi and wj)
Learning difficulties in multilayer perceptrons - local minima
 The MLP may fail to settle into the global minimum of
the error surface and instead find itself in one of the
local minima
 This is due to the gradient descent strategy followed
 A number of alternative approaches can be taken to
reduce this possibility:
 Lowering the gain term progressively
 Used to influence the rate at which weight changes are made
during training
 Value by default is 1, but it may be gradually lowered to slow
the rate of change as training progresses
Learning difficulties in
multilayer perceptrons
(cont’d)
 Addition of more nodes for better representation of patterns
 Too few nodes (and consequently not enough weights) can cause
failure of the ANN to learn a pattern
 Introduction of a momentum term (see the sketch after this list)
 Determines the effect of past weight changes on the current
direction of movement in weight space
 The momentum term is also a small numerical value in the range 0-1
 Addition of random noise to perturb the ANN out of local minima
 Usually done by adding small random values to weights.
 Takes the net to a different point in the error space – hopefully out
of a local minimum
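A small sketch of the momentum update (Python; eta, alpha and the made-up gradient readings are illustrative values):

```python
# Weight update with a momentum term (illustrative values throughout):
#   delta_w(t) = -eta * gradient + alpha * delta_w(t-1)

def momentum_update(weight, gradient, prev_delta, eta=0.1, alpha=0.9):
    delta = -eta * gradient + alpha * prev_delta   # past changes carry over
    return weight + delta, delta

weight, prev_delta = 0.0, 0.0
for grad in [1.0, 0.8, 0.6, 0.4]:                  # pretend gradient readings
    weight, prev_delta = momentum_update(weight, grad, prev_delta)
    print(round(weight, 3))
```

Because alpha is between 0 and 1, each step carries some of the previous step's direction, which can help the net coast through shallow local minima.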
The Kohonen network (the self-organising map)
 Biological systems display both supervised and
unsupervised learning behaviour
 A neural network with unsupervised learning
capability is said to be self-organising
 During training, the Kohonen net changes its
weights to learn appropriate associations,
without any right answers being provided
The Kohonen network (cont’d)
 The Kohonen net consists of an input layer that
distributes the inputs to every node in a second layer,
known as the competitive layer.
 The competitive (output) layer is usually organised into
some 2-D or 3-D surface (feature map)
Operation of the Kohonen Net
 Each neuron in the competitive layer is connected to other
neurons in its neighbourhood
 Neurons in the competitive layer have excitatory (positively
weighted) connections to immediate neighbours and
inhibitory (negatively weighted) connections to more distant
neurons.
 As an input pattern is presented, some of the neurons in the
competitive layer are sufficiently activated to produce
outputs, which are fed to other neurons in their
neighbourhoods
 The node with the set of input weights closest to the input
pattern component values produces the largest output. This
node is termed the best matching (or winning) node
Operation of the Kohonen Net
(cont’d)
 During training, input weights of the best matching node and
its neighbours are adjusted to make them resemble the
input pattern even more closely
 At the completion of training, the best matching node ends
up with its input weight values aligned with the input pattern
and produces the strongest output whenever that particular
pattern is presented
 The nodes in the winning node's neighbourhood also have
their weights modified to settle down to an average
representation of that pattern class
 As a result, the net is able to represent clusters of similar
input patterns - a feature found useful for data mining
applications, for example.
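A rough sketch of Kohonen training (Python with NumPy; the grid size, learning rate and fixed square neighbourhood are simplifying assumptions, since real SOMs usually also shrink the radius and learning rate over time):

```python
import numpy as np

rng = np.random.default_rng(0)
grid = rng.random(size=(5, 5, 3))   # 5x5 competitive layer, 3 input weights per node

def train_step(grid, pattern, lr=0.3, radius=1):
    # Best matching node: the one whose input weights are closest to the pattern
    dists = np.linalg.norm(grid - pattern, axis=2)
    bi, bj = np.unravel_index(np.argmin(dists), dists.shape)
    # Move the winner and its neighbours towards the pattern
    for i in range(grid.shape[0]):
        for j in range(grid.shape[1]):
            if max(abs(i - bi), abs(j - bj)) <= radius:
                grid[i, j] += lr * (pattern - grid[i, j])
    return grid

for _ in range(100):
    grid = train_step(grid, rng.random(3))  # unsupervised: patterns only, no targets
```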
The Hopfield Model
 The Hopfield net is the most widely
known of all the autoassociative
(pattern-completing) ANNs
 In autoassociation, a noisy or partially
incomplete input pattern causes the
network to stabilise to a state
corresponding to the original pattern
 It is also useful for optimisation tasks.
 The Hopfield net is a recurrent ANN in
which the output produced by each
neuron is fed back as input to all other
neurons
 Neurons compute a weighted sum
and apply a step transfer function.
The Hopfield Model (cont’d)
 The Hopfield net has no iterative
learning algorithm as such. Patterns
(or facts) are simply stored by
adjusting the weights to lower a term
called network energy
 During operation, an input pattern is
applied to all neurons simultaneously
and the network is left to stabilise
 Outputs from the neurons in the stable
state form the output of the network.
 When presented with an input pattern,
the net outputs a stored pattern
nearest to the presented pattern.
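A minimal sketch of Hebbian storage and recall in a small Hopfield net (Python with NumPy; the bipolar -1/+1 pattern, network size and synchronous update are our simplifying choices):

```python
import numpy as np

def store(patterns):
    """Set weights by the Hebbian outer-product rule; no iterative training."""
    W = sum(np.outer(p, p) for p in patterns).astype(float)
    np.fill_diagonal(W, 0)                         # no self-connections
    return W

def recall(W, state, steps=10):
    """Repeatedly update all neurons until the network stabilises."""
    for _ in range(steps):
        state = np.where(W @ state >= 0, 1, -1)    # weighted sum + step function
    return state

stored = np.array([[1, -1, 1, -1, 1, -1]])         # one bipolar pattern
W = store(stored)
noisy = np.array([1, -1, 1, -1, -1, -1])           # one bit flipped
print(recall(W, noisy))                            # settles back to the stored pattern
```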
When ANNs should be applied
Difficulties with some real-life problems:
 Solutions are difficult, if not impossible, to define
algorithmically, due mainly to the unstructured nature of the problem
 Too many variables and/or the interactions of relevant
variables not understood well
 Input data may be partially corrupt or missing, making it
difficult for a logical sequence of solution steps to
function effectively
When ANNs should be applied
(cont’d)
 The typical ANN attempts to arrive at an answer by
learning to identify the right answer through an iterative
process of self-adaptation or training
 If there are many factors, with complex interactions
among them, the usual "linear" statistical techniques
may be inappropriate
 If sufficient data is available, an ANN can find the
relevant functional relationship by means of an
adaptive learning procedure from the data
Current applications of ANNs
 ANNs are good at recognition and classification tasks
 Due to their ability to recognise complex patterns,
ANNs have been widely applied in character,
handwritten text and signature recognition, as well as
more complex images such as faces
 They have also been used successfully for speech
recognition and synthesis
 ANNs are being used in an increasing number of
applications where high-speed computation of
functions is important, eg, in industrial robotics
Current applications of ANNs
(cont’d)
 One of the more successful applications of ANNs has
been as a decision support tool in the area of finance
and banking
 Some examples of commercial applications of ANN
are:
 Financial market analysis for investment decision making
 Sales support - targeting customers for telemarketing
 Bankruptcy prediction
 Intelligent flexible manufacturing systems
 Stock market prediction
 Resource allocation - scheduling and management of
personnel and equipment
ANN applications - broad
categories
 According to a survey (Quaddus & Khan, 2002)
covering the period 1988 up to mid 1998, the
main business application areas of ANNs are:
 Production (36%)
 Information systems (20%)
 Finance (18%)
 Marketing & distribution (14.5%)
 Accounting/Auditing (5%)
 Others (6.5%)
ANN applications - broad
categories (cont’d)
Table 1: Distribution of the Articles by Areas and Year

| AREA | 1988 | 89 | 90 | 91 | 92 | 93 | 94 | 95 | 96 | 97 | 98 | Total | % of Total |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Accounting/Auditing | 1 | 0 | 1 | 1 | 6 | 3 | 3 | 7 | 7 | 5 | 0 | 34 | 4.97 |
| Finance | 0 | 0 | 4 | 11 | 19 | 28 | 27 | 18 | 5 | 9 | 2 | 123 | 17.98 |
| Human resources | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 3 | 0.44 |
| Information systems | 4 | 6 | 9 | 7 | 15 | 24 | 21 | 18 | 13 | 18 | 3 | 138 | 20.18 |
| Marketing/Distribution | 2 | 2 | 2 | 3 | 8 | 10 | 12 | 17 | 29 | 14 | 0 | 99 | 14.47 |
| Production | 2 | 6 | 8 | 21 | 31 | 38 | 24 | 50 | 29 | 31 | 1 | 241 | 35.23 |
| Others | 0 | 0 | 1 | 7 | 3 | 8 | 7 | 8 | 7 | 5 | 0 | 46 | 6.73 |
| Yearly Total | 9 | 14 | 25 | 51 | 82 | 112 | 95 | 118 | 90 | 82 | 6 | 684 | 100.00 |
| % of Total | 1.32 | 2.05 | 3.65 | 7.46 | 11.99 | 16.37 | 13.89 | 17.25 | 13.16 | 11.99 | 0.88 | 100.00 | |
 The levelling off of publications on ANN applications
may be attributed to the ANN moving from the research
to the commercial application domain
 The emergence of other intelligent system tools may
be another factor
Some advantages of ANNs
 Able to take incomplete or corrupt data and provide
approximate results.
 Good at generalisation, that is, recognising patterns
similar to those learned during training
 Inherent parallelism makes them fault-tolerant – loss of
a few interconnections or nodes leaves the system
relatively unaffected
 Parallelism also makes ANNs fast and efficient for
handling large amounts of data.
ANN State-of-the-art overview
 Currently neural network systems are available as
 Software simulation on conventional computers - prevalent
 Special purpose hardware that models the parallelism of
neurons.
 ANN-based systems not likely to replace conventional
computing systems, but they are an established
alternative to the symbolic logic approach to
information processing
 A new computing paradigm in the form of hybrid
intelligent systems has emerged - often involving ANNs
with other intelligent system tools
REFERENCES
 AI Expert (special issue on ANN), June 1990.
 BYTE (special issue on ANN), Aug. 1989.
 Caudill, M., "The View from Now", AI Expert, June 1992, pp. 27-31.
 Dhar, V., & Stein, R., Seven Methods for Transforming Corporate
Data into Business Intelligence, Prentice Hall, 1997.
 Kirrmann, H., "Neural Computing: The new gold rush in
informatics", IEEE Micro, June 1989, pp. 7-9.
 Lippmann, R.P., "An Introduction to Computing with Neural Nets",
IEEE ASSP Magazine, April 1987, pp. 4-21.
 Lisboa, P. (Ed.), Neural Networks: Current Applications, Chapman
& Hall, 1992.
 Negnevitsky, M., Artificial Intelligence: A Guide to Intelligent
Systems, Addison-Wesley, 2005.
REFERENCES (cont’d)
 Quaddus, M.A., and Khan, M.S., "Evolution of Artificial Neural
Networks in Business Applications: An Empirical Investigation
Using a Growth Model", International Journal of Management and
Decision Making, Vol. 3, No. 1, March 2002, pp. 19-34. (See also
the ANN application publications EndNote library files, ICT619 ftp site.)
 Wasserman, P.D., Neural Computing: Theory and Practice, Van
Nostrand Reinhold, New York, 1989.
 Wong, B.K., Bodnovich, T.A., and Selvi, Y., "Neural networks
applications in business: A review and analysis of the literature
(1988-95)", Decision Support Systems, 19, 1997, pp. 301-320.
 Zahedi, F., Intelligent Systems for Business, Wadsworth
Publishing, Belmont, California, 1993.
 http://www.doc.ic.ac.uk/~nd/surprise_96/journal/vol4/cs11/report.html