Artificial Intelligence Methods

Neural Networks
Lecture 3
Rakesh K. Bissoondeeal
Supervised learning in single layer networks
• Learning in perceptron
  - Perceptron learning rule
• Learning in Adaline
  - Widrow-Hoff learning rule (delta rule, least mean square)
• Issue common to single layer networks


• Single layer networks can solve only linearly separable problems
• Linear separability
  - Two categories of patterns are linearly separable if their members can be separated by a single straight line (a hyperplane, in more than two dimensions)
Linearly separable
• Consider a system like AND
x1    x2    x1 AND x2
1     1     1
0     1     0
1     0     0
0     0     0

[Figure: the four input points plotted in the (x1, x2) plane, with a single decision boundary separating (1,1) from the other three points]
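
As a quick check (not part of the original slides), the sketch below uses an assumed separating line with weights w1 = 1, w2 = 1 and bias b = -1.5 and shows that it classifies all four AND patterns correctly; these weights are only one of many valid choices.

# Quick check that AND is linearly separable: the (assumed) line
# x1 + x2 - 1.5 = 0 puts (1,1) on one side and the other three points on the other.
patterns = [((1, 1), 1), ((0, 1), 0), ((1, 0), 0), ((0, 0), 0)]
w1, w2, b = 1.0, 1.0, -1.5           # one possible separating line (assumption)

for (x1, x2), target in patterns:
    net = w1 * x1 + w2 * x2 + b      # net input
    output = 1 if net >= 0 else 0    # step transfer function
    print((x1, x2), "->", output, "target", target)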
Linearly inseparable - XOR
• Consider a system like XOR
x1    x2    x1 XOR x2
1     1     0
0     1     1
1     0     1
0     0     0
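
To illustrate (not prove) that XOR is linearly inseparable, the short sketch below tries every weight and bias combination on a coarse grid and finds none that classifies all four XOR patterns correctly; the grid range and step size are arbitrary assumptions.

# Brute-force illustration: no single line on this coarse grid of
# (w1, w2, b) values classifies all four XOR patterns correctly.
xor_patterns = [((1, 1), 0), ((0, 1), 1), ((1, 0), 1), ((0, 0), 0)]
grid = [i / 2 for i in range(-8, 9)]   # -4.0 .. 4.0 in steps of 0.5 (assumed range)

solutions = 0
for w1 in grid:
    for w2 in grid:
        for b in grid:
            ok = all((1 if w1 * x1 + w2 * x2 + b >= 0 else 0) == t
                     for (x1, x2), t in xor_patterns)
            solutions += ok
print("separating lines found on grid:", solutions)   # prints 0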
Single layer perceptron
• A perceptron neuron has the step function as its transfer function
• Output is either 1 or 0
  - 1 when the net input into the transfer function is 0 or greater than 0
  - 0 otherwise, i.e., when the net input is less than 0
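
A minimal Python sketch of such a step-function neuron (the names step and perceptron_output are illustrative, not from the slides):

def step(net):
    # Step (threshold) transfer function: 1 if net >= 0, else 0
    return 1 if net >= 0 else 0

def perceptron_output(weights, bias, inputs):
    # Single perceptron neuron: weighted sum of inputs plus bias, then step
    net = sum(w * p for w, p in zip(weights, inputs)) + bias
    return step(net)

# Example: with weights [1, 1] and bias -1.5 this neuron computes AND
print(perceptron_output([1, 1], -1.5, [1, 1]))  # 1
print(perceptron_output([1, 1], -1.5, [0, 1]))  # 0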
Single layer perceptron
[Figure: a perceptron with inputs x1 and x2, weights w1 and w2, a bias b on a connection from a unit whose value is always 1, and transfer function f]
• A bias acts as a weight on a connection from a unit whose value is always one
• The bias shifts the function f by b units to the left
• If the bias were not included, the decision boundary would be forced to go through the origin, and many linearly separable functions would become linearly inseparable
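
A small illustration of the last point, using the step function from the earlier sketch: with no bias the net input at the origin is 0 for any weights, so the output at (0, 0) is forced to 1, and AND (which needs (0, 0) -> 0) cannot be represented.

def step(net):
    return 1 if net >= 0 else 0

# Without a bias the net input at the origin is 0 for *any* weights,
# so the step unit always outputs 1 at (0, 0): the decision boundary
# passes through the origin and AND's (0,0) -> 0 case can never be met.
for w1, w2 in [(1, 1), (5, -3), (-2, 7)]:      # arbitrary example weights
    net_at_origin = w1 * 0 + w2 * 0            # always 0 with no bias
    print((w1, w2), "output at (0,0):", step(net_at_origin))   # always 1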
Perceptron learning rule
• Supervised learning
  - We have both inputs and target outputs
• Let pi = input i
      a = output of network
      t = target
• E.g. the AND function:

x1    x2    x1 AND x2
1     1     1
0     1     0
1     0     0
0     0     0

We train the network with the aim that a new (unseen) input similar to an old (seen) pattern will be classified correctly.
Perceptron learning rule
• 3 cases to consider
• Case 1: an input vector is presented to the network and the output of the network is correct
  - a = t and e = t - a = 0
  - the weights are not changed
Perceptron learning rule
• Case 2: if the neuron output is 0 and should have been 1, then
  - a = 0 and t = 1, so e = t - a = 1 - 0 = 1
  - the inputs are added to their corresponding weights
• Case 3: if the neuron output is 1 and should have been 0, then
  - a = 1 and t = 0, so e = t - a = 0 - 1 = -1
  - the inputs are subtracted from their corresponding weights
Perceptron learning rule
• The perceptron learning rule can be more conveniently represented as:
  wnew = wold + LR * e * p   (LR = learning rate)
  bnew = bold + LR * e
• Convergence
  - The perceptron learning rule will converge to a solution in a finite number of steps if a solution exists. This includes all classification problems that are linearly separable.
Perceptron Learning Algorithm
While epoch produces an error
    Present network with the next input pattern from the epoch
    e = t - a
    If e <> 0 then
        wj = wj + LR * pj * e
        b = b + LR * e
    End If
End While

Epoch: presentation of the entire training set to the neural network.
In the case of the AND function an epoch consists of four sets of inputs being presented to the network (i.e. [0,0], [0,1], [1,0], [1,1]).
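
The algorithm above, written out as a runnable Python sketch for the AND function; the learning rate of 1 and the zero initial weights and bias are assumptions made for illustration.

# Perceptron learning rule applied to the AND function
training_set = [([1, 1], 1), ([0, 1], 0), ([1, 0], 0), ([0, 0], 0)]
w = [0.0, 0.0]      # initial weights (assumed)
b = 0.0             # initial bias (assumed)
LR = 1.0            # learning rate (assumed)

epoch_has_error = True
while epoch_has_error:                     # while the epoch produces an error
    epoch_has_error = False
    for p, t in training_set:              # one epoch = all four patterns
        net = sum(wj * pj for wj, pj in zip(w, p)) + b
        a = 1 if net >= 0 else 0           # step transfer function
        e = t - a
        if e != 0:
            w = [wj + LR * e * pj for wj, pj in zip(w, p)]
            b = b + LR * e
            epoch_has_error = True

print("weights:", w, "bias:", b)           # a separating line for AND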
Example
x1    x2    t
2     2     0
1     -2    1
-2    2     0
-1    1     1

Learning rate = 1
Initial weights = 0, 0
Bias = 0
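
A sketch of the same training loop applied to this example's data, using the learning rate, initial weights and bias given above:

# Perceptron learning rule on the example data above
training_set = [([2, 2], 0), ([1, -2], 1), ([-2, 2], 0), ([-1, 1], 1)]
w, b, LR = [0.0, 0.0], 0.0, 1.0    # initial weights, bias and learning rate from the slide

epoch_has_error = True
while epoch_has_error:
    epoch_has_error = False
    for p, t in training_set:
        a = 1 if sum(wj * pj for wj, pj in zip(w, p)) + b >= 0 else 0
        e = t - a
        if e != 0:
            w = [wj + LR * e * pj for wj, pj in zip(w, p)]
            b = b + LR * e
            epoch_has_error = True

# With this presentation order the rule settles after a few epochs
# (weights [-2, -3], bias 1), which classifies all four patterns correctly.
print("final weights:", w, "final bias:", b)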
Adaline
• Adaline - Adaptive Linear Neuron (adaptive linear element)
• Similar to the perceptron but has the identity function (f(x) = x) as its transfer function instead of the step function
• Uses the Widrow-Hoff learning rule (delta rule, least mean square - LMS)
• More powerful than the perceptron learning rule
• The rule provides the basis for the backpropagation algorithm, which can learn with many interconnected neurons and layers
Adaline
• The LMS learning rule adjusts the weights and biases so as to minimise the mean squared error for each pattern
• It is based on the gradient descent algorithm

Gradient Descent
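
A brief sketch of where the update rule in Step 2 below comes from, assuming the per-pattern squared error E = e^2 and the identity transfer function described above:

E = e^2 = (t - a)^2,  where a = sum_i(wi * pi) + b
dE/dwi = -2 * e * pi        (since da/dwi = pi)
dE/db  = -2 * e
Gradient descent steps against the gradient:
wi(new) = wi(old) - LR * dE/dwi = wi(old) + 2 * LR * e * pi
b(new)  = b(old)  - LR * dE/db  = b(old)  + 2 * LR * e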
The ADALINE
• The training algorithm goes through all the training examples a number of times, until a stopping criterion is reached

Step 1: Initialise all weights and set the learning rate
    wi = small random values
    LR = 0.2 (for example)
Step 2: While the stopping condition is false (for example, error > 0.01)
    Update the bias and weights:
    b(new) = b(old) + 2 * LR * e
    wi(new) = wi(old) + 2 * LR * e * pi
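
The two steps above as a runnable Python sketch. The target function, the cap on the number of epochs and the random initialisation range are illustrative assumptions; the learning rate and stopping threshold follow the slide.

import random

# ADALINE / LMS (Widrow-Hoff) learning on a small linear example.
# The target t = 0.5*x1 - 0.3*x2 + 0.2 is an assumed example, not from the slides.
training_set = [([x1, x2], 0.5 * x1 - 0.3 * x2 + 0.2)
                for x1 in (-1.0, 0.0, 1.0) for x2 in (-1.0, 0.0, 1.0)]

random.seed(0)
w = [random.uniform(-0.1, 0.1) for _ in range(2)]   # Step 1: small random weights
b = random.uniform(-0.1, 0.1)
LR = 0.2                                            # learning rate (as on the slide)

for epoch in range(1000):                           # Step 2: repeat until the error is small
    squared_errors = []
    for p, t in training_set:
        a = sum(wi * pi for wi, pi in zip(w, p)) + b    # identity transfer function
        e = t - a
        b = b + 2 * LR * e                              # b(new) = b(old) + 2*LR*e
        w = [wi + 2 * LR * e * pi for wi, pi in zip(w, p)]
        squared_errors.append(e * e)
    if sum(squared_errors) / len(squared_errors) < 0.01:   # stopping condition
        break

print("epochs used:", epoch + 1)
print("weights:", w, "bias:", b)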
Comparison of the Perceptron and Adaline learning rules
• The perceptron rule corrects a binary error; the LMS rule minimises a continuous error
• The perceptron rule converges after a finite number of iterations if the problem is linearly separable; LMS converges asymptotically towards the minimum error, possibly requiring unbounded time
Recommended Reading
• Fundamentals of Neural Networks: Architectures, Algorithms, and Applications, L. Fausett, 1994.
• Artificial Intelligence: A Modern Approach, S. Russell and P. Norvig, 1995.
• An Introduction to Neural Networks, 2nd Edition, Morton, IM.