
Artificial Intelligence
CIS 342
The College of Saint Rose
David Goldschmidt, Ph.D.
Machine Learning

Machine learning involves adaptive mechanisms
that enable computers to:
– Learn from experience
– Learn by example
– Learn by analogy

Learning capabilities improve the performance
of intelligent systems over time
The Brain

How do brains work?
– How do human brains differ from those of other animals?

Can we base models of artificial intelligence
on the structure and inner workings of the brain?
The Brain

The human brain consists of:
– Approximately 10 billion neurons
– …and 60 trillion connections

The brain is a highly complex, nonlinear,
parallel information-processing system
– By firing many neurons simultaneously, the brain performs
  certain tasks faster than the fastest computers in existence today
The Brain

Building blocks of the human brain:

[Figure: two connected biological neurons, each with a soma, dendrites, and an axon, joined at synapses]
The Brain

An individual neuron has a very simple structure
– The cell body is called a soma
– Small connective fibers are called dendrites
– Single long fibers are called axons

An army of such elements constitutes
tremendous processing power
Artificial Neural Networks

An artificial neural network consists of a number
of very simple processors called neurons
– Neurons are connected by weighted links
– The links pass signals from one neuron to another
  based on predefined thresholds
Artificial Neural Networks

An individual neuron (McCulloch & Pitts, 1943):
– Computes the weighted sum of the input signals
– Compares the result with a threshold value, θ
– If the net input is less than the threshold,
  the neuron output is –1 (or 0)
– Otherwise, the neuron becomes activated
  and its output is +1
Artificial Neural Networks
[Figure: a single neuron receiving input signals x1, x2, ..., xn through weights w1, w2, ..., wn; the neuron applies threshold θ and produces output signal Y]

X = x1w1 + x2w2 + ... + xnwn
Activation Functions

Individual neurons adhere to an activation function,
which determines whether they propagate their
signal (i.e. activate) or not:

X = x1w1 + x2w2 + ... + xnwn

Y = +1, if X >= θ
Y = –1, if X < θ      (the sign function)
Activation Functions

[Figure: plots of the step, sign, sigmoid, and linear activation functions; the step and sign functions are hard limit functions]

Ystep    = 1, if X >= 0;   0, if X < 0
Ysign    = +1, if X >= 0;  –1, if X < 0
Ysigmoid = 1 / (1 + e^–X)
Ylinear  = X
Write functions or methods for the
activation functions on the previous slide
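A minimal Python sketch of these four activation functions (the function names and the choice of Python are mine; the slides leave the language open):

import math

def step(x):
    # Step function: 1 if x >= 0, otherwise 0 (a hard limiter)
    return 1 if x >= 0 else 0

def sign(x):
    # Sign function: +1 if x >= 0, otherwise -1 (a hard limiter)
    return 1 if x >= 0 else -1

def sigmoid(x):
    # Sigmoid function: squashes x into the open interval (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def linear(x):
    # Linear function: the output simply equals the input
    return x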
Activation Functions

The step and sign activation functions
are often called hard limit functions

We use such functions in
decision-making neural networks
– They support classification and
  other pattern recognition tasks
Perceptrons

Can an individual neuron learn?
– In 1958, Frank Rosenblatt introduced a training algorithm
  that provided the first procedure for training a
  single-node neural network
– Rosenblatt’s perceptron model consists of a single neuron
  with adjustable synaptic weights, followed by a hard limiter

Write code for a single two-input neuron
(see the diagram on the next slide; a code sketch follows it)
Perceptrons
Set w1, w2, and θ through trial and error
to obtain a logical AND of inputs x1 and x2

[Figure: a two-input perceptron: inputs x1 and x2 feed weights w1 and w2 into a linear combiner, followed by a hard limiter with threshold θ that produces output Y]

X = x1w1 + x2w2
Y = Ystep
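One possible Python sketch of the two-input neuron above: a linear combiner followed by a step hard limiter. The weights and threshold in the example call are one trial-and-error choice that happens to realize logical AND; they are not the only valid values:

def neuron(x1, x2, w1, w2, theta):
    # Linear combiner: X = x1*w1 + x2*w2
    X = x1 * w1 + x2 * w2
    # Hard limiter: fire (output 1) only when X reaches the threshold
    return 1 if X >= theta else 0

# With w1 = w2 = 0.1 and theta = 0.2, the neuron reproduces the AND truth table
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, neuron(x1, x2, 0.1, 0.1, 0.2))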
Perceptrons

A perceptron:
– Classifies inputs x1, x2, ..., xn into one of
  two distinct classes A1 and A2
– Forms a linearly separable function defined by:

  x1w1 + x2w2 + ... + xnwn – θ = 0

[Figure: a two-dimensional plot in which the line x1w1 + x2w2 – θ = 0 separates class A1 from class A2]
Perceptrons

A perceptron with three inputs x1, x2, and x3
classifies its inputs into two distinct sets A1 and A2

x1w1 + x2w2 + ... + xnwn – θ = 0

[Figure: with two inputs the decision boundary x1w1 + x2w2 – θ = 0 is a line; with three inputs the boundary x1w1 + x2w2 + x3w3 – θ = 0 is a plane]
Perceptrons

How does a perceptron learn?
– A perceptron has initial (often random) weights,
  typically in the range [-0.5, 0.5]
– Apply an established training dataset
– Calculate the error as expected output minus actual output:
  error e = Yexpected – Yactual
– Adjust the weights to reduce the error
Perceptrons

How do we adjust a perceptron’s
weights to produce Yexpected?
– If e is positive, we need to increase Yactual (and vice versa)
– Use this formula:
  wi = wi + Δwi , where Δwi = α × xi × e and
  – α is the learning rate (between 0 and 1)
  – e is the calculated error

Use threshold θ = 0.2 and learning rate α = 0.1
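A sketch of a single training step built from this rule (the names are mine); it computes the actual output with a step hard limiter, then nudges each weight by α · xi · e:

def train_step(inputs, y_expected, weights, theta, alpha):
    # Actual output: weighted sum compared against the threshold
    X = sum(x * w for x, w in zip(inputs, weights))
    y_actual = 1 if X >= theta else 0
    # Error e = Yexpected - Yactual
    e = y_expected - y_actual
    # Weight update: wi = wi + alpha * xi * e
    new_weights = [w + alpha * x * e for x, w in zip(inputs, weights)]
    return new_weights, y_actual, e

For example, train_step((1, 0), 0, (0.3, -0.1), 0.2, 0.1) fires the neuron (X = 0.3 >= 0.2), yields error e = -1, and lowers w1 to 0.2, matching the third row of epoch 1 in the table on the next slide.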
Perceptron Example – AND

Train a perceptron to recognize logical AND
– Use threshold θ = 0.2 and learning rate α = 0.1

        Inputs    Desired   Initial      Actual           Final
Epoch   x1   x2   output    weights      output   Error   weights
                    Yd      w1    w2       Y        e     w1    w2
  1      0    0      0      0.3  -0.1      0        0     0.3  -0.1
         0    1      0      0.3  -0.1      0        0     0.3  -0.1
         1    0      0      0.3  -0.1      1       -1     0.2  -0.1
         1    1      1      0.2  -0.1      0        1     0.3   0.0
  2      0    0      0      0.3   0.0      0        0     0.3   0.0
         0    1      0      0.3   0.0      0        0     0.3   0.0
         1    0      0      0.3   0.0      1       -1     0.2   0.0
         1    1      1      0.2   0.0      1        0     0.2   0.0
  3      0    0      0      0.2   0.0      0        0     0.2   0.0
         0    1      0      0.2   0.0      0        0     0.2   0.0
         1    0      0      0.2   0.0      1       -1     0.1   0.0
         1    1      1      0.1   0.0      0        1     0.2   0.1
  4      0    0      0      0.2   0.1      0        0     0.2   0.1
         0    1      0      0.2   0.1      0        0     0.2   0.1
         1    0      0      0.2   0.1      1       -1     0.1   0.1
         1    1      1      0.1   0.1      1        0     0.1   0.1
  5      0    0      0      0.1   0.1      0        0     0.1   0.1
         0    1      0      0.1   0.1      0        0     0.1   0.1
         1    0      0      0.1   0.1      0        0     0.1   0.1
         1    1      1      0.1   0.1      1        0     0.1   0.1

Perceptron Example – AND

Repeat until convergence
– i.e. final weights do not change and no error occurs for a full epoch
– Here the perceptron converges after epoch 5 with weights w1 = 0.1 and w2 = 0.1
  (threshold θ = 0.2, learning rate α = 0.1)
Perceptron Example – AND


Two-dimensional plot of the logical AND operation:

[Figure: the four input points of AND in the x1–x2 plane; a single straight line separates the point (1,1), where AND is 1, from the other three points]

A single perceptron can be trained to recognize
any linearly separable function
– Can we train a perceptron to recognize logical OR?
– How about logical exclusive-OR (i.e. XOR)?
Perceptron – OR and XOR

Two-dimensional plots of logical OR and XOR:

[Figure: (b) OR (x1 ∨ x2) – a single line separates the 1-outputs from the 0-output; (c) Exclusive-OR (x1 ⊕ x2) – no single line can separate the two classes]
Perceptron Coding Exercise

Modify your code to:
– Calculate the error at each step
– Modify weights, if necessary
  – i.e. if the error is non-zero
– Loop until all error values are zero for a full epoch

Modify your code to learn to recognize
the logical OR operation
– Try to recognize the XOR operation....
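A possible sketch of this full training loop, reusing the step-function neuron and the update rule from the earlier slides; the random initial weights in [-0.5, 0.5] follow the earlier slide, and the epoch cap is my own safeguard:

import random

def train_perceptron(dataset, theta=0.2, alpha=0.1, max_epochs=1000):
    # dataset is a list of ((x1, x2), y_expected) pairs
    w = [random.uniform(-0.5, 0.5), random.uniform(-0.5, 0.5)]
    for epoch in range(max_epochs):
        total_error = 0
        for (x1, x2), y_expected in dataset:
            X = x1 * w[0] + x2 * w[1]
            y_actual = 1 if X >= theta else 0
            e = y_expected - y_actual
            total_error += abs(e)
            # Adjust the weights whenever the error is non-zero
            w[0] += alpha * x1 * e
            w[1] += alpha * x2 * e
        if total_error == 0:       # a full epoch with zero error: converged
            return w
    return None                    # never converged within max_epochs

OR_DATA  = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
XOR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

print("OR: ", train_perceptron(OR_DATA))    # converges to working weights
print("XOR:", train_perceptron(XOR_DATA))   # expected to print None

Because XOR is not linearly separable, the XOR call should fail to converge no matter how many epochs are allowed, which motivates the multilayer networks that follow.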
Multilayer Neural Networks

Multilayer neural networks consist of:
– An input layer of source neurons
– One or more hidden layers of computational neurons
– An output layer of more computational neurons

Input signals are propagated in a
layer-by-layer feedforward manner

[Figure: input signals enter the input layer, pass through the middle layer, and leave the output layer as output signals]
Multilayer Neural Networks
[Figure: a network with an input layer, a first hidden layer, a second hidden layer, and an output layer]

XOUTPUT = yH1w11 + yH2w21 + ... + yHjwj1 + ... + yHmwm1
Multilayer Neural Networks
XINPUT = x1
XH1 = x1w11 + x2w21 + ... + xiwi1 + ... + xnwn1

[Figure: a multilayer network with input neurons 1..n carrying signals x1..xn, hidden neurons 1..m connected by weights wij, and output neurons 1..l producing y1..yl through weights wjk; input signals flow forward and error signals flow backward]
Multilayer Neural Networks

Three-layer network:

[Figure: inputs x1 and x2 (input layer neurons 1 and 2) feed hidden neurons 3 and 4 through weights w13, w23, w14, and w24; hidden neurons 3 and 4 (thresholds θ3 and θ4) feed output neuron 5 (threshold θ5) through weights w35 and w45, producing output y5]
Multilayer Neural Networks

Commercial-quality neural networks often
incorporate 4 or more layers
– Each layer consists of about 10-1000 individual neurons

Experimental and research-based neural networks
often use 5 or 6 (or more) layers
– Overall, millions of individual neurons may be used
Back-Propagation NNs

A back-propagation neural network is a multilayer
neural network that propagates error backwards
through the network as it learns
– Weights are modified based on the calculated error
– Training is complete when the error is
  below a specified threshold
  – e.g. less than 0.001
Back-Propagation NNs
[Figure: the same multilayer network as before; input signals x1..xn propagate forward through weights wij and wjk to outputs y1..yl, while error signals propagate backward from the output layer]
Write code for the three-layer neural network below
– Use the sigmoid activation function
– Apply θ by connecting a fixed input of -1 to weight θ
(a code sketch follows the diagram)
Back-Propagation NNs
1
q3
x1
1
w13
3
1
w35
w23
q5
5
w
w24
14
x2
2
w45
4
w24
Input
layer
q4
1
Hiddenlayer
Output
layer
y5
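One possible Python sketch of this exercise: a 2-2-1 network trained with standard back-propagation. The thresholds θ3, θ4, and θ5 are handled as weights on a fixed input of -1, as the slide suggests, and the error gradients use the sigmoid derivative y(1 - y). The class name, learning rate, and epoch cap are illustrative choices of mine:

import math, random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class ThreeLayerNet:
    # Inputs x1, x2 -> hidden neurons 3 and 4 -> output neuron 5
    def __init__(self, alpha=0.5):
        self.alpha = alpha
        rnd = lambda: random.uniform(-0.5, 0.5)
        self.hidden = [[rnd(), rnd(), rnd()] for _ in range(2)]  # [w_x1, w_x2, theta]
        self.output = [rnd(), rnd(), rnd()]                      # [w_3,  w_4,  theta]

    def forward(self, x1, x2):
        # Each neuron: sigmoid of (weighted inputs plus (-1) * theta)
        y_h = [sigmoid(x1 * w[0] + x2 * w[1] - w[2]) for w in self.hidden]
        y5 = sigmoid(y_h[0] * self.output[0] + y_h[1] * self.output[1] - self.output[2])
        return y_h, y5

    def train_step(self, x1, x2, y_desired):
        y_h, y5 = self.forward(x1, x2)
        # Error gradient at the output neuron (sigmoid derivative is y * (1 - y))
        delta5 = y5 * (1.0 - y5) * (y_desired - y5)
        # Error gradients at the hidden neurons, propagated backwards
        delta_h = [y_h[j] * (1.0 - y_h[j]) * delta5 * self.output[j] for j in range(2)]
        # Adjust hidden-to-output weights and threshold
        for j in range(2):
            self.output[j] += self.alpha * y_h[j] * delta5
        self.output[2] += self.alpha * (-1.0) * delta5
        # Adjust input-to-hidden weights and thresholds
        for j in range(2):
            self.hidden[j][0] += self.alpha * x1 * delta_h[j]
            self.hidden[j][1] += self.alpha * x2 * delta_h[j]
            self.hidden[j][2] += self.alpha * (-1.0) * delta_h[j]
        return (y_desired - y5) ** 2

# Train on XOR until the sum of squared errors drops below 0.001
net = ThreeLayerNet()
xor_data = [((1, 1), 0), ((0, 1), 1), ((1, 0), 1), ((0, 0), 0)]
for epoch in range(100000):                 # cap in case of an unlucky start
    sse = sum(net.train_step(x1, x2, yd) for (x1, x2), yd in xor_data)
    if sse < 0.001:
        break
print(epoch + 1, "epochs; sum of squared errors =", sse)

As the next slides note, the number of epochs needed (and occasionally whether training converges at all) depends on the random initial weights.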
Back-Propagation NNs
Start with random weights
– Repeat until the sum of the
  squared errors is below 0.001
– Depending on initial weights,
  final converged results may vary

[Figure: sum-squared network error for 224 epochs, falling on a logarithmic scale from roughly 10^1 to below 10^-3 as the epochs progress from 0 to about 224]
Back-Propagation NNs

After 224 epochs (896 individual iterations),
the neural network has been trained successfully:

Inputs       Desired output   Actual output   Sum of squared
x1    x2          yd               y5          errors
1     1            0             0.0155
0     1            1             0.9849          0.0010
1     0            1             0.9849
0     0            0             0.0175
Back-Propagation NNs


No longer limited to linearly separable functions

Another solution:
– Isolate neuron 3, then neuron 4....

[Figure: a hand-built three-layer network for XOR: neurons 3 and 4 each receive x1 and x2 with weights +1.0 and +1.0; neuron 3 has threshold +1.5 and neuron 4 has threshold +0.5; neuron 5 receives y3 with weight -2.0 and y4 with weight +1.0, has threshold +0.5, and produces output y5]
Back-Propagation NNs

Combine linearly separable functions of neurons 3 and 4
(verified in the code sketch below):

[Figure: (a) the line x1 + x2 – 1.5 = 0 drawn by neuron 3, (b) the line x1 + x2 – 0.5 = 0 drawn by neuron 4, and (c) the two lines combined, isolating the region where exactly one input is 1]
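A quick Python check (mine, not part of the slides) that the hand-built weights above do realize XOR when each neuron is a step hard limiter; the -2.0 weight from neuron 3 to neuron 5 is read from the diagram (without the minus sign the network would not produce XOR):

def step_neuron(inputs, weights, theta):
    # Fires (outputs 1) when the weighted sum reaches the threshold
    return 1 if sum(x * w for x, w in zip(inputs, weights)) >= theta else 0

def xor_net(x1, x2):
    y3 = step_neuron((x1, x2), (1.0, 1.0), 1.5)     # neuron 3: fires only for (1, 1), i.e. AND
    y4 = step_neuron((x1, x2), (1.0, 1.0), 0.5)     # neuron 4: fires for any 1 input, i.e. OR
    return step_neuron((y3, y4), (-2.0, 1.0), 0.5)  # neuron 5: OR but not AND, i.e. XOR

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, xor_net(x1, x2))   # prints the XOR truth table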
Using Neural Networks

Handwriting recognition

[Figure: a handwritten character (e.g. the digit 4 or the letter A) is fed into a network with an input layer, two hidden layers, and an output layer; the output layer produces a binary code for the character]

0100 => 4
0101 => 5
0110 => 6
0111 => 7
etc.
Using Neural Networks

Advantages of neural networks:
– Given a training dataset, neural networks learn
– Powerful for classification and pattern-matching applications

Drawbacks of neural networks:
– The solution is a “black box”
– Training is computationally intensive