Artificial Intelligence
AI – CS289
Machine Learning
Introduction to Machine Learning
30th October 2006
Dr Bogdan L. Vrusias
[email protected]
Contents
• Learning
• Artificial Neural Networks
• Supervised Learning
• Unsupervised Learning
What is Learning?
• ‘The action of receiving instruction or acquiring knowledge’.
• ‘A process which leads to the modification of behaviour or the
acquisition of new abilities or responses, and which is additional to
natural development by growth or maturation’.
Source: Oxford English Dictionary Online: http://www.oed.com/, accessed October 2003.
Machine Learning
• Negnevitsky: 'In general, machine learning involves adaptive mechanisms that enable computers to learn from experience, learn by example and learn by analogy' (2005:165)
• Callan: 'A machine or software tool would not be viewed as intelligent if it could not adapt to changes in its environment' (2003:225)
• Luger: 'Intelligent agents must be able to change through the course of their interactions with the world' (2002:351)
• Learning capabilities can improve the performance of an intelligent system over time.
• The most popular approaches to machine learning are artificial neural networks and genetic algorithms.
Types of Learning
• Inductive learning
– Learning from examples
• Evolutionary/genetic learning
– Shaping a population of individual solutions through survival of the fittest
• Emergent learning
– Learning through social interaction: game of life
Inductive Learning
• Supervised learning
– Training examples with a known classification from a teacher
• Unsupervised learning
– No pre-classification of training examples
– Competitive learning: learning through competition on training examples
Key Concepts
• Learn from experience
– Through examples, analogy or discovery
• To adapt
– Changes in response to interaction
• Generalisation
– To use experience to form a response to novel situations
Generalisation
Uses of Machine Learning
• Techniques and algorithms that adapt through experience.
• Used for:
– Interpretation / visualisation: summarise data
– Prediction: time series / stock data
– Classification: malignant or benign tumours
– Regression: curve fitting
– Discovery: data mining / pattern discovery
Why Machine Learning?
• Complexity of task / amount of data
– Other techniques fail or are computationally expensive
• Problems that cannot be defined
– Discovery of patterns / data mining
• Knowledge Engineering Bottleneck
– ‘Cost and difficulty of building expert systems using traditional […]
techniques’ (Luger 2002:351)
Common Techniques
• Least squares
• Decision trees
• Support vector machines
• Boosting
• Neural networks
• K-means
• Genetic algorithms
Decision Trees
• A map of the reasoning process, good at solving classification
problems (Negnevitsky, 2005)
• A decision tree represents a number of different attributes and values
– Nodes represent attributes
– Branches represent values of the attributes
• Path through a tree represents a decision
• Tree can be associated with rules
Example 1
• Consider one rule for an ice-cream seller (Callan 2003:241):
IF Outlook = Sunny
AND Temperature = Hot
THEN Sell
Example 1
[Figure: decision tree for the ice-cream rules. The root node tests Outlook and branches on Sunny and Overcast; internal nodes test Temperature (Hot, Mild, Cold) and Holiday Season (Yes, No); branches carry the attribute values and leaf nodes hold the decisions Sell and Don't Sell]
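The tree lends itself directly to code. Below is a minimal Python sketch of the ice-cream tree as a nested dictionary, with a classify function that follows one root-to-leaf path. The exact branch outcomes are read from the partly garbled figure above, so treat the tree contents as an assumption rather than Callan's exact tree.

# The decision tree as nested dicts: each internal node maps one
# attribute to a dict of value -> subtree; each leaf is a decision.
TREE = {
    "Outlook": {
        "Sunny": {
            "Temperature": {
                "Hot": "Sell",
                "Mild": {"Holiday Season": {"Yes": "Sell", "No": "Don't Sell"}},
                "Cold": "Don't Sell",
            }
        },
        "Overcast": {
            "Holiday Season": {
                "Yes": {"Temperature": {"Hot": "Sell", "Mild": "Don't Sell",
                                        "Cold": "Don't Sell"}},
                "No": "Don't Sell",
            }
        },
    }
}

def classify(tree, example):
    """Follow one root-to-leaf path, branching on the example's values."""
    while isinstance(tree, dict):
        attribute, branches = next(iter(tree.items()))
        tree = branches[example[attribute]]
    return tree

print(classify(TREE, {"Outlook": "Sunny", "Temperature": "Hot"}))  # -> Sell

Each lookup in the loop corresponds to testing one attribute at a node and following the branch labelled with the example's value.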
Construction
• Concept learning:
– Inducing concepts from examples
• We can intuitively construct a decision tree for a small set of examples
• Different algorithms are used to construct a tree based upon the examples
– The most popular is ID3 (Quinlan, 1986)
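ID3 selects, at each node, the attribute whose split gives the largest information gain (the reduction in entropy). A minimal sketch of that selection step; the example representation (dictionaries with a "class" key) is an assumption for illustration:

import math
from collections import Counter

def entropy(labels):
    """H = -sum(p * log2 p) over the class proportions in labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(examples, attribute):
    """Entropy of the whole set minus the weighted entropy after splitting."""
    total = entropy([e["class"] for e in examples])
    remainder = 0.0
    for value in set(e[attribute] for e in examples):
        subset = [e["class"] for e in examples if e[attribute] == value]
        remainder += len(subset) / len(examples) * entropy(subset)
    return total - remainder

def best_attribute(examples, attributes):
    """ID3's splitting choice: the attribute with maximum information gain."""
    return max(attributes, key=lambda a: information_gain(examples, a))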
Which Tree?
• Different trees can be constructed from the same set of examples
• Which tree is the best?
– Based upon choice of attributes at each node in the tree
– A split in the tree (branches) should correspond to the predictor with the
maximum separating power
• Examples can be contradictory
– Real-life is noisy
Extracting Rules
• We can extract rules from decision trees
– Create one rule for each root-to-leaf path
– Simplify by combining rules
• Other techniques are not so transparent:
– Neural networks are often described as ‘black boxes’ – it is difficult to
understand what the network is doing
– Extraction of rules from trees can help us to understand the decision
process
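Extracting one rule per root-to-leaf path is a depth-first traversal that accumulates (attribute, value) conditions until it reaches a leaf. A sketch, reusing the nested-dictionary TREE from the Example 1 sketch:

def extract_rules(tree, conditions=()):
    """Yield one (conditions, decision) pair per root-to-leaf path."""
    if not isinstance(tree, dict):                 # reached a leaf
        yield conditions, tree
        return
    attribute, branches = next(iter(tree.items()))
    for value, subtree in branches.items():
        yield from extract_rules(subtree, conditions + ((attribute, value),))

for conds, decision in extract_rules(TREE):
    print("IF " + " AND ".join(f"{a} = {v}" for a, v in conds)
          + f" THEN {decision}")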
Issues
• Use prior knowledge where available
• Not all the examples may be needed to construct a tree
– Test generalisation of tree during training and stop when desired
performance is achieved
– Prune the tree once constructed
• Examples may be noisy
• Examples may contain irrelevant attributes
Artificial Neural Networks
• An artificial neural network (or simply a neural network) can be
defined as a model of reasoning based on the human brain.
• The brain consists of a densely interconnected set of nerve cells, or
basic information-processing units, called neurons.
• The human brain incorporates nearly 10 billion neurons and 60 trillion
connections, synapses, between them.
• By using multiple neurons simultaneously, the brain can perform its
functions much faster than the fastest computers in existence
today.
Artificial Neural Networks
• Each neuron has a very simple structure, but an army of such elements
constitutes a tremendous processing power.
• A neuron consists of a cell body, soma, a number of fibers called
dendrites, and a single long fiber called the axon.
[Figure: two biological neurons, each with a soma, dendrites and an axon, connected by synapses]
Artificial Neural Networks
• Our brain can be considered as a highly complex, non-linear and
parallel information-processing system.
• Learning is a fundamental and essential characteristic of biological
neural networks. The ease with which they can learn led to attempts to
emulate a biological neural network in a computer.
Artificial Neural Networks
• An artificial neural network consists of a number of very simple processors, also called neurons, which are analogous to the biological neurons in the brain.
• The neurons are connected by weighted links passing signals from one neuron to another.
• The output signal is transmitted through the neuron's outgoing connection. The outgoing connection splits into a number of branches that transmit the same signal. The outgoing branches terminate at the incoming connections of other neurons in the network.
[Figure: a feedforward network with an input layer, a middle layer and an output layer; input signals enter on the left, output signals leave on the right]
The Perceptron
• The operation of Rosenblatt’s perceptron is based on the McCulloch
and Pitts neuron model. The model consists of a linear combiner
followed by a hard limiter.
• The weighted sum of the inputs is applied to the hard limiter, which produces an output equal to +1 if its input is positive and −1 if it is negative.
[Figure: single-layer two-input perceptron: inputs x1 and x2 with weights w1 and w2 feed a linear combiner, followed by a hard limiter with a threshold, producing output Y]
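In code, the linear combiner and hard limiter reduce to a dot product and a sign test. A minimal sketch; the weights and threshold below are illustrative values chosen so the neuron behaves like AND, not values from the lecture:

def perceptron_output(x, w, theta):
    """Linear combiner followed by a hard limiter: +1 if the weighted
    sum of the inputs exceeds the threshold, -1 otherwise."""
    weighted_sum = sum(xi * wi for xi, wi in zip(x, w)) - theta
    return 1 if weighted_sum >= 0 else -1

print(perceptron_output((1, 1), (0.5, 0.5), 0.7))  # +1: both inputs active
print(perceptron_output((1, 0), (0.5, 0.5), 0.7))  # -1: weighted sum below threshold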
The Perceptron
• How does the perceptron learn its classification tasks?
• This is done by making small adjustments in the weights to reduce the
difference between the actual and desired outputs of the perceptron.
• The initial weights are randomly assigned, usually in the range [-0.5,
0.5], and then updated to obtain the output consistent with the training
examples.
[Figure: decision boundaries in the (x1, x2) plane: (a) AND (x1 ∧ x2) and (b) OR (x1 ∨ x2) are linearly separable; (c) Exclusive-OR (x1 ⊕ x2) is not – no single straight line separates its classes]
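The adjustment described above is the perceptron learning rule: w ← w + α(d − y)x, with the threshold updated in the opposite direction. A training sketch for the linearly separable AND function, reusing perceptron_output from the previous sketch; the learning rate and epoch count are assumptions:

import random

def train_perceptron(examples, n_inputs, alpha=0.1, epochs=50):
    """Nudge weights by a small step proportional to the output error."""
    w = [random.uniform(-0.5, 0.5) for _ in range(n_inputs)]  # initial weights in [-0.5, 0.5]
    theta = random.uniform(-0.5, 0.5)
    for _ in range(epochs):
        for x, d in examples:
            error = d - perceptron_output(x, w, theta)
            w = [wi + alpha * error * xi for wi, xi in zip(w, x)]
            theta -= alpha * error        # threshold moves opposite to the weights
    return w, theta

# AND is linearly separable, so the rule converges on a separating line.
AND = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), 1)]
w, theta = train_perceptron(AND, n_inputs=2)

The same loop run on the Exclusive-OR examples would never settle, since no single line separates XOR's classes – which is what motivates the multilayer networks that follow.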
Multilayer neural networks
• A multilayer perceptron is a feedforward neural network with one or
more hidden layers.
• The network consists of an input layer of source neurons, at least one
middle or hidden layer of computational neurons, and an output layer
of computational neurons.
• The input signals are propagated in a forward direction on a layer-by-layer basis.
[Figure: multilayer perceptron with an input layer, first and second hidden layers, and an output layer; input signals flow forward to the output signals]
What does the middle layer hide?
• A hidden layer “hides” its desired output.
• Neurons in the hidden layer cannot be observed through the
input/output behaviour of the network.
• There is no obvious way to know what the desired output of the hidden
layer should be.
• Commercial ANNs incorporate three and sometimes four layers,
including one or two hidden layers. Each layer can contain from 10 to
1000 neurons. Experimental neural networks may have five or even
six layers, including three or four hidden layers, and utilise millions of
neurons.
Supervised Learning
• Supervised or active learning is learning with an external “teacher” or
a supervisor who presents a training set to the network.
• The most popular supervised neural network is the back-propagation neural network.
Back-propagation neural network
• Learning in a multilayer network proceeds the same way as for a
perceptron.
• A training set of input patterns is presented to the network.
• The network computes its output pattern, and if there is an error – or in
other words a difference between actual and desired output
patterns – the weights are adjusted to reduce this error.
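A compact sketch of those three steps (forward pass, error computation, weight adjustment) for the Exclusive-OR problem, using sigmoid neurons and NumPy. The network size, learning rate, epoch count and random seed are assumptions, not values from the lecture:

import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# XOR training set: input patterns X and desired outputs D.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
D = np.array([[0], [1], [1], [0]], dtype=float)

# Two inputs, one hidden layer of two neurons, one output neuron.
W1 = rng.uniform(-0.5, 0.5, (2, 2)); b1 = rng.uniform(-0.5, 0.5, 2)
W2 = rng.uniform(-0.5, 0.5, (2, 1)); b2 = rng.uniform(-0.5, 0.5, 1)

alpha = 0.5
for _ in range(20000):
    H = sigmoid(X @ W1 + b1)                 # forward pass: hidden layer
    Y = sigmoid(H @ W2 + b2)                 # forward pass: actual outputs
    E = D - Y                                # error: desired minus actual
    delta2 = E * Y * (1 - Y)                 # output-layer error signal
    delta1 = (delta2 @ W2.T) * H * (1 - H)   # error propagated back to hidden layer
    W2 += alpha * H.T @ delta2; b2 += alpha * delta2.sum(axis=0)
    W1 += alpha * X.T @ delta1; b1 += alpha * delta1.sum(axis=0)

# Y now approaches [0, 1, 1, 0]; a network this small can occasionally
# stall in a local minimum, in which case re-run with another seed.
print(np.round(Y, 4))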
Back-propagation neural network
[Figure: three-layer back-propagation network: input layer x1 … xi … xn, hidden layer connected by weights wij, and output layer connected by weights wjk producing y1 … yk … yl; input signals flow forward while error signals propagate backwards]
Back-propagation neural network
• A network represented by the McCulloch-Pitts model for solving the Exclusive-OR operation.
[Figure: McCulloch-Pitts network for Exclusive-OR: inputs x1 and x2 connect to hidden neurons 3 and 4 with weights +1.0; neuron 3 has threshold +1.5 (it acts as AND) and neuron 4 has threshold +0.5 (it acts as OR); output neuron 5 (threshold +0.5) receives −2.0 from neuron 3 and +1.0 from neuron 4, producing y5]
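The figure's weights can be checked against the XOR truth table. The values below (+1.0 input weights, thresholds +1.5, +0.5 and +0.5, and a −2.0 link from neuron 3 to neuron 5) are my reading of the garbled diagram, so treat them as assumptions:

def step(z):
    """McCulloch-Pitts hard limiter with 0/1 output."""
    return 1 if z >= 0 else 0

for x1, x2 in [(1, 1), (0, 1), (1, 0), (0, 0)]:
    y3 = step(1.0 * x1 + 1.0 * x2 - 1.5)    # neuron 3 fires only when both inputs do (AND)
    y4 = step(1.0 * x1 + 1.0 * x2 - 0.5)    # neuron 4 fires when either input does (OR)
    y5 = step(-2.0 * y3 + 1.0 * y4 - 0.5)   # neuron 5: OR but not AND, i.e. XOR
    print(x1, x2, "->", y5)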
Back-propagation neural network
Inputs         Desired output   Actual output   Error
x1   x2        yd               y5              e
1    1         0                0.0155          -0.0155
0    1         1                0.9849           0.0151
1    0         1                0.9849           0.0151
0    0         0                0.0175          -0.0175
Sum of squared errors: 0.0010

[Figure: (a) decision boundary constructed by hidden neuron 3: x1 + x2 – 1.5 = 0; (b) decision boundary constructed by hidden neuron 4: x1 + x2 – 0.5 = 0; (c) decision boundaries constructed by the complete three-layer network]
Unsupervised Learning
• Unsupervised or self-organised learning does not require an external
teacher.
• During the training session, the neural network receives a number of
different input patterns, discovers significant features in these patterns
and learns how to classify input data into appropriate categories.
• Unsupervised learning tends to follow the neuro-biological
organisation of the brain.
• Most popular unsupervised neural networks are the Hebbian network
and the Self-Organising Feature Map.
Hebbian Network
• Hebb's Law can be represented in the form of two rules:
1. If two neurons on either side of a connection are activated synchronously, then the weight of that connection is increased.
2. If two neurons on either side of a connection are activated asynchronously, then the weight of that connection is decreased.
[Figure: Hebbian network: input signals enter and output signals leave; the rules apply to the connection between neurons i and j]
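Hebb's Law translates into a weight change proportional to the product of the two neurons' activations; a small decay or "forgetting" term is often added so weights stay bounded. A sketch, with illustrative learning and forgetting rates:

def hebbian_update(w, x_i, y_j, alpha=0.1, phi=0.02):
    """Activity-product rule: strengthen the connection on synchronous
    activity, with a small forgetting term to keep the weight bounded."""
    return w + alpha * y_j * x_i - phi * y_j * w

w = 0.0
for x_i, y_j in [(1, 1), (1, 1), (0, 1)]:
    w = hebbian_update(w, x_i, y_j)   # grows while i and j fire together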
Competitive Learning
• In competitive learning, neurons compete among themselves to be
activated.
• While in Hebbian learning, several output neurons can be activated
simultaneously, in competitive learning, only a single output neuron
is active at any time.
• The output neuron that wins the “competition” is called the winner-takes-all neuron.
• Self-organising feature maps are based on competitive learning.
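In code, the competition reduces to finding the single neuron whose weight vector best matches the input (Euclidean distance is a common choice, an assumption here) and updating only that winner; all other neurons are suppressed, i.e. left unchanged. A sketch:

import numpy as np

def competitive_step(W, x, alpha=0.1):
    """One winner-takes-all update. W is an (n_neurons, n_inputs)
    weight matrix; only the best-matching row moves toward x."""
    winner = int(np.argmin(np.linalg.norm(W - x, axis=1)))
    W[winner] += alpha * (x - W[winner])
    return winner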
What is a self-organising feature map?
• Our brain is dominated by the cerebral cortex, a very complex
structure of billions of neurons and hundreds of billions of synapses.
• The cortex includes areas that are responsible for different human
activities (motor, visual, auditory, somatosensory, etc.), and associated
with different sensory inputs. We can say that each sensory input is
mapped into a corresponding area of the cerebral cortex. The cortex is
a self-organising computational map in the human brain.
• The self-organising feature map was introduced by Teuvo Kohonen and is therefore also called the Kohonen network.
What is a self-organising feature map?
[Figure: (a) and (b) two feature maps formed between the input layer and the Kohonen layer]
The Kohonen network
• The Kohonen model provides a topological mapping. It places a fixed
number of input patterns from the input layer into a higher-dimensional
output or Kohonen layer.
• Training in the Kohonen network begins with the winner's neighbourhood of a fairly large size. Then, as training proceeds, the neighbourhood size gradually decreases.
[Figure: Kohonen network: inputs x1 and x2 fully connected to output neurons y1, y2 and y3]
The Kohonen network
• The lateral connections are used to create a competition between
neurons.
• The neuron with the largest activation level among all neurons in the
output layer becomes the winner.
• This neuron is the only neuron that produces an output signal. The
activity of all other neurons is suppressed in the competition.
• The lateral feedback connections produce excitatory or inhibitory
effects, depending on the distance from the winning neuron.
Competitive learning in the Kohonen network
• To illustrate competitive learning, consider the Kohonen network with
100 neurons arranged in the form of a two-dimensional lattice with 10
rows and 10 columns.
• The network is required to classify two-dimensional input vectors – each neuron in the network should respond only to the input vectors occurring in its region.
• The network is trained with 1000 two-dimensional input vectors
generated randomly in a square region in the interval between –1 and
+1.
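The experiment can be sketched directly: a 10 × 10 lattice, 1000 random two-dimensional inputs in [−1, 1], and a winner neighbourhood that starts large and shrinks as training proceeds. The Gaussian neighbourhood function and the decay schedules below are common choices, not values given in the lecture:

import numpy as np

rng = np.random.default_rng(1)
rows, cols = 10, 10
W = rng.uniform(-1, 1, (rows * cols, 2))      # one 2-D weight vector per neuron
grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)

inputs = rng.uniform(-1, 1, (1000, 2))        # 1000 random 2-D input vectors
epochs = 20
sigma0, alpha0 = 5.0, 0.5                     # initial neighbourhood radius and learning rate

for epoch in range(epochs):
    sigma = sigma0 * (0.05 / sigma0) ** (epoch / epochs)   # neighbourhood shrinks
    alpha = alpha0 * (0.01 / alpha0) ** (epoch / epochs)   # learning rate decays
    for x in inputs:
        winner = np.argmin(np.linalg.norm(W - x, axis=1))  # competition
        d2 = ((grid - grid[winner]) ** 2).sum(axis=1)      # lattice distance to winner
        h = np.exp(-d2 / (2 * sigma ** 2))                 # Gaussian neighbourhood
        W += alpha * h[:, None] * (x - W)                  # winner and neighbours move toward x

# The weight vectors start random and gradually unfold into an
# ordered map covering the [-1, 1] x [-1, 1] input square.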
Competitive learning in the Kohonen network
[Figure: four scatter plots of the network's weights W(1,j) versus W(2,j) at successive stages of training: from random initial values, the weight vectors gradually spread out to cover the [−1, 1] square of the input space]