Artificial Intelligence
Neural Networks
History
• Roots of work on NN are in:
– Neurobiological studies (more than a century ago): How do nerves behave when stimulated by different magnitudes of electric current? Is there a minimal threshold needed for nerves to be activated? Given that no single nerve cell is long enough, how do different nerve cells communicate with each other?
– Psychological studies: How do animals learn, forget, recognize and perform other types of tasks?
• Psycho-physical experiments helped to understand how individual neurons and groups of neurons work.
• McCulloch and Pitts introduced the first mathematical model of a single neuron, widely applied in subsequent work.
History
• Widrow and Hoff (1960): Adaline
• Minsky and Papert (1969): limitations of single-layer perceptrons (and they erroneously claimed that the limitations hold for multi-layer perceptrons)
• Stagnation in the 70's:
– Individual researchers continued laying foundations
– von der Malsburg (1973): competitive learning and self-organization
• Big neural-nets boom in the 80's:
– Grossberg: adaptive resonance theory (ART)
– Hopfield: Hopfield network
– Kohonen: self-organising map (SOM)
Applications
• Classification:
– Image recognition
– Speech recognition
– Diagnostics
– Fraud detection
– …
• Regression:
– Forecasting (prediction based on past history)
– …
• Pattern association:
– Retrieving an image from a corrupted one
– …
• Clustering:
– client profiles
– disease subtypes
– …
Real Neurons
• Cell structures
– Cell body
– Dendrites
– Axon
– Synaptic terminals
Non Symbolic Representations
• Decision trees can be easily read
– A disjunction of conjunctions (logic)
– We call this a symbolic representation
• Non-symbolic representations
– More numerical in nature, more difficult to read
• Artificial Neural Networks (ANNs)
– A Non-symbolic representation scheme
– They embed a giant mathematical function
• To take inputs and compute an output which is interpreted as
a categorisation
– Often shortened to “Neural Networks”
• Don’t confuse them with real neural networks (the ones in our heads)
Complicated Example:
Categorising Vehicles
• Input to function: pixel data from vehicle images
– Output: numbers: 1 for a car; 2 for a bus; 3 for a tank
[Figure: four vehicle images with the network's outputs: OUTPUT = 3 (tank), OUTPUT = 2 (bus), OUTPUT = 1 (car), OUTPUT = 1 (car)]
Real Neural Learning
• Synapses change size and strength with
experience.
• Hebbian learning: When two connected neurons
are firing at the same time, the strength of the
synapse between them increases.
• “Neurons that fire together, wire together.”
Neural Network
[Figure: a feed-forward network with an input layer, two hidden layers (Hidden 1, Hidden 2), and an output layer]
Simple Neuron
[Figure: a single neuron: inputs x1, x2, …, xn with weights w1, w2, …, wn feed a summation node Σ and a transfer function f, which produces the output]
Neuron Model
• A neuron has more than one input x1, x2, …, xm
• Each input is associated with a weight w1, w2, …, wm
• The neuron has a bias b
• The net input of the neuron is
n = w1x1 + w2x2 + … + wmxm + b = Σi wixi + b
Neuron output
• The neuron output is
y = f (n)
• f is called the transfer function
Transfer Function
• We have 3 common transfer functions
– Hard limit transfer function
– Linear transfer function
– Sigmoid transfer function
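A minimal Python sketch of the neuron model above; the function names (net_input, hardlim, linear, sigmoid) are illustrative, not from the slides:

import math

def net_input(weights, inputs, bias):
    # n = w1*x1 + w2*x2 + ... + wm*xm + b
    return sum(w * x for w, x in zip(weights, inputs)) + bias

def hardlim(n):
    # hard limit: 1 if n >= 0, else 0
    return 1 if n >= 0 else 0

def linear(n):
    # linear: the output equals the net input
    return n

def sigmoid(n):
    # log-sigmoid: squashes n into (0, 1)
    return 1.0 / (1.0 + math.exp(-n))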
Exercises
• The input to a single-input neuron is 2.0, its weight is
2.3 and the bias is –3.
• What is the output of the neuron if its transfer function is:
– Hard limit
– Linear
– Sigmoid
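Using the sketch above, the exercise works out as follows (a worked check, assuming the hard limit fires at n >= 0):

n = net_input([2.3], [2.0], -3)   # n = 2.3*2.0 - 3 = 1.6
print(hardlim(n))                 # 1 (since 1.6 >= 0)
print(linear(n))                  # 1.6
print(sigmoid(n))                 # 1/(1 + e^-1.6), about 0.832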
Architecture of ANN
• Feed-Forward networks
Allow the signals to travel one way only, from input to output.
• Feed-Back networks
The signals can travel in loops in the network; the output is connected back to the input of the network.
Learning Rule
• The learning rule modifies the weights of the
connections.
• The learning process is divided into Supervised
and Unsupervised learning
Perceptron
• It is a network of one neuron with a hard-limit transfer function
[Figure: a perceptron: inputs x1, x2, …, xn with weights w1, w2, …, wn feed a summation node Σ and a hard-limit transfer function f, which produces the output]
Perceptron
• The perceptron is first given a random weight vector
• The perceptron is given chosen data pairs (input and desired output)
• The perceptron learning rule changes the weights according to the error in the output
Perceptron
• The weight-adapting procedure is an iterative method and should reduce the error to zero
• The output of the perceptron is
y = f(n) = f(w1x1 + w2x2 + … + wnxn) = f(Σi wixi) = f(WᵀX)
Perceptron Learning Rule
Wnew = Wold + (t − a)·X
where Wnew is the new weight vector
Wold is the old weight vector
X is the input vector
t is the desired output value
a is the actual output value
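A minimal sketch of this rule in Python for a hard-limit perceptron; the function name and the treatment of the bias as a weight on a constant input of 1 are my own conventions, not from the slides:

def hardlim(n):
    return 1 if n >= 0 else 0

def perceptron_train(samples, weights, bias, max_epochs=100):
    # repeat W_new = W_old + (t - a) * X until every sample is correct
    for _ in range(max_epochs):
        errors = 0
        for x, t in samples:          # x: input vector, t: desired output
            a = hardlim(sum(w * xi for w, xi in zip(weights, x)) + bias)
            if a != t:
                weights = [w + (t - a) * xi for w, xi in zip(weights, x)]
                bias += (t - a)       # bias update: its "input" is always 1
                errors += 1
        if errors == 0:               # converged
            break
    return weights, bias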
Example
• Consider a perceptron that has two real-valued
inputs and an output unit. All the initial weights
and the bias equal 0.1. Assume the teacher has
said that the output should be 0 for the input:
x1 = 5 and x2 = - 3. Find the optimum weights
for this problem.
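A hand-worked pass (the slides do not give the solution): with w = [0.1, 0.1] and b = 0.1, the net input for x = [5, −3] is n = 0.1·5 + 0.1·(−3) + 0.1 = 0.3, so a = 1 while t = 0. The rule gives w1 = 0.1 − 5 = −4.9, w2 = 0.1 + 3 = 3.1, b = −0.9, after which n = −34.7 < 0 and the sample is classified correctly. With the sketch above:

w, b = perceptron_train([([5, -3], 0)], [0.1, 0.1], 0.1)
print(w, b)   # [-4.9, 3.1] -0.9 after one corrective update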
Example
• Convert the classification problem into a perceptron neural network model (start with w1 = 1, b = 3 and w2 = 2, or any other values).
• x1 = [0 2], t1 = 1; x2 = [1 0], t2 = 1; x3 = [0 -2], t3 = 0; x4 = [2 0], t4 = 0
Example Perceptron
• Example calculation: x1=-1, x2=1, x3=1, x4=-1
– S = 0.25*(-1) + 0.25*(1) + 0.25*(1) + 0.25*(-1) = 0
• 0 > -0.1, so the output from the ANN is +1
– So the image is categorised as “bright”
The First Neural Networks
[Figure: a McCulloch-Pitts unit for the AND function: inputs X1 and X2, each with weight 1, feed Y with Threshold(Y) = 2]
AND
X1  X2  Y
1   1   1
1   0   0
0   1   0
0   0   0
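A sketch of this threshold unit in Python (the function name is illustrative); it reproduces the truth table:

def mp_and(x1, x2, threshold=2):
    # both inputs have weight 1; the unit fires when the weighted sum reaches the threshold
    return 1 if (1 * x1 + 1 * x2) >= threshold else 0

for x1 in (1, 0):
    for x2 in (1, 0):
        print(x1, x2, mp_and(x1, x2))   # reproduces the AND table above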
Simple Networks
[Figure: a simple network: a bias input of −1 with weight W = 1.5 and an input x with weight W = 1 feed a unit with threshold t = 0.0, producing output y]
Exercises
• Design a neural network to recognize the problem of:
– x1 = [2 2], t1 = 0
– x2 = [1 -2], t2 = 1
– x3 = [-2 2], t3 = 0
– x4 = [-1 1], t4 = 1
• Start with initial weights w = [0 0] and bias = 0
Problems
• Four one-dimensional data points belonging to two classes are:
X = [1 -0.5 3 -2]
T = [1 -1 1 -1]
W = [-2.5 1.75]
Example
[Figure: a grid of 64 ±1 pixel values forming the input pattern]
Example
[Figure: a second grid of 64 ±1 pixel values forming the input pattern]
AND Network
• In this example we constructed a network for the AND operation. The network draws a line that separates the two classes; this is called classification.
Perceptron Geometric View
The equation below describes a (hyper-)plane in the input space consisting of real-valued m-dimensional vectors. The plane splits the input space into two regions, each of them describing one class:
Σ(i=1..m) wixi + w0 ≥ 0 (decision region for C1)
[Figure: the (x1, x2) plane showing the decision boundary w1x1 + w2x2 + w0 = 0, with the decision region for C1 (where w1x1 + w2x2 + w0 ≥ 0) on one side and C2 on the other]
Perceptron: Limitations
• The perceptron can only model linearly separable
classes, like (those described by) the following
Boolean functions:
• AND
• OR
• COMPLEMENT
• It cannot model the XOR.
• You can experiment with these functions in the
Matlab practical lessons.
Multi-layers Network
• Consider a network of 3 layers:
– Input layer
– Hidden layer(s)
– Output layer
• Each layer can have a different number of neurons
Multi layer feed-forward NN
FFNNs overcome the limitation of single-layer NNs: they can handle non-linearly separable learning tasks.
[Figure: a multi-layer feed-forward network with an input layer, a hidden layer, and an output layer]
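To make the claim concrete, here is a sketch of a hand-wired 2-2-1 feed-forward network that computes XOR, the task a single-layer perceptron cannot model; the particular weights and thresholds are one standard choice, not taken from the slides:

def step(n):
    return 1 if n >= 0 else 0

def xor_net(x1, x2):
    h1 = step(x1 + x2 - 0.5)     # hidden unit 1 computes OR
    h2 = step(x1 + x2 - 1.5)     # hidden unit 2 computes AND
    return step(h1 - h2 - 0.5)   # output: OR and not AND = XOR

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, xor_net(x1, x2))   # 0 0 0 / 0 1 1 / 1 0 1 / 1 1 0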
Types of decision regions
[Figure: (a) a network with a single node realizes a half-plane decision region: the region w0 + w1x1 + w2x2 ≥ 0, bounded by the line w0 + w1x1 + w2x2 = 0; (b) a one-hidden-layer network whose four hidden units define lines L1, L2, L3, L4, each feeding the output unit with weight 1 against an output bias of −3.5, realizes a convex region: the intersection of the four half-planes in the (x1, x2) plane]
Learning rule
• The perceptron learning rule cannot be applied to multi-layer networks
• We use the backpropagation algorithm in the learning process
Backprop
• The back-propagation training algorithm, illustrated:
– Forward step: network activation, then error computation
– Backward step: error propagation
• Backprop adjusts the weights of the NN in order to minimize the network's total mean squared error.
BP Algorithm
• The weight change rule is
wij(new) = wij(old) + η · error · f′(inputi)
• where η is the learning factor (η < 1)
• error is the error between the actual and the desired value
• f′ is the derivative of the sigmoid function: f′ = f(1 − f)
Delta Rule
• Each observation contributes a variable amount to the output
• The scale of the contribution depends on the input
• Output errors can be blamed on the weights
• A least mean square (LMS) error function can be defined (ideally it should be zero):
E = ½ (t − y)²
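A minimal sketch of this update for a single sigmoid neuron, combining E = ½(t − y)² with the weight change rule and the sigmoid derivative f′ = f(1 − f) from the previous slide; the names and the learning-rate default are illustrative:

import math

def sigmoid(n):
    return 1.0 / (1.0 + math.exp(-n))

def train_step(weights, bias, x, t, eta=0.5):
    # forward pass: y = f(n), n = sum(w_i * x_i) + b
    n = sum(w * xi for w, xi in zip(weights, x)) + bias
    y = sigmoid(n)
    # delta = (t - y) * f'(n), with f' = f * (1 - f) for the sigmoid
    delta = (t - y) * y * (1.0 - y)
    # gradient descent on E = 0.5 * (t - y)^2
    weights = [w + eta * delta * xi for w, xi in zip(weights, x)]
    bias += eta * delta
    return weights, bias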
Calculation of Network Error
• We could calculate the network error as
– the proportion of mis-categorised examples
• But there are multiple output units with numerical outputs
– So we use a more sophisticated measure (not as complicated as it looks):
• Square the difference between target and observed output
– Squaring ensures we get a positive number
• Add up all the squared differences
– for every output unit and every example in the training set
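A small sketch of this measure (assuming targets and outputs are given as one list of output-unit values per training example):

def sum_squared_error(targets, outputs):
    # targets/outputs: one list of output-unit values per training example
    return sum((t - y) ** 2
               for t_vec, y_vec in zip(targets, outputs)
               for t, y in zip(t_vec, y_vec))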
Example
• For a network with one neuron in the input layer and one neuron in the hidden layer, the following values are given:
x = 1, w1 = 1, b1 = −2, w2 = 1, b2 = 1, η = 1 and t = 1
where x is the input value,
w1 is the weight connecting input to hidden,
w2 is the weight connecting hidden to output,
b1 and b2 are the biases, and
t is the target value.
Momentum in Backpropagation
• For each weight
– Remember what was added in the previous epoch
• In the current epoch
– Add on a small amount of the previous Δ
• The amount is determined by
– The momentum parameter, denoted α
– α is taken to be between 0 and 1
How Momentum Works
• If the direction of the weight change doesn't change
– then the movement of the search gets bigger
– the extra amount compounds in each epoch
– this may mean that narrow local minima are avoided
– it may also mean that the convergence rate speeds up
• Caution:
– the search may not have enough momentum to get out of local minima
– also, too much momentum might carry the search back out of the global minimum, into a local minimum
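A sketch of the momentum update (the variable names and default values are illustrative): the new step adds a fraction α of the previous epoch's step to the plain gradient step.

def momentum_update(weight, gradient, prev_delta, eta=0.1, alpha=0.9):
    # delta(t) = -eta * dE/dw + alpha * delta(t-1), with 0 <= alpha <= 1
    delta = -eta * gradient + alpha * prev_delta
    return weight + delta, delta   # keep delta: it is "what was added" this epoch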
Building Neural Networks
• Define the problem in terms of neurons
– think in terms of layers
• Represent information as neurons
– operationalize neurons
– select their data type
– locate data for testing and training
• Define the network
• Train the network
• Test the network
Application: FACE RECOGNITION
• The problem:
– Face recognition of persons of a known group in
an indoor environment.
• The approach:
Learn face classes over a wide range of poses using a neural network.
Navigation of a car
• Done by Pomerleau. The network takes inputs from a 34×36 video image and a 7×36 range finder. Output units represent "drive straight", "turn left" or "turn right". After about 40 training passes over 1200 road images, the car drove around the CMU campus at 5 km/h (using a small workstation on the car). This was almost twice the speed of any other non-NN algorithm at the time.
Automated driving at 70 mph on a
public highway
[Figure: ALVINN-style network: a 30×32 pixel camera image as input, 4 hidden units, and 30 output units for steering; the diagram shows the 30×32 weights into one of the four hidden units]
Exercises
• Perform one iteration of backpropagation on a network of two layers. The first layer has one neuron with weight 1 and bias −2. The transfer function in the first layer is f(n) = n².
• The second layer has only one neuron with weight 1 and bias 1. The f in the second layer is f(n) = 1/n.
• The input to the network is x = 1 and t = 1.
1
n
1 e
W 11
X1
1
( 2t  2 y ) 2
2
W13
W 12
b1
X2
W21
W23
W22
b3
b2
using the initial weights [b1= - 0.5, w11=2, w12=2, w13=0.5, b2= 0.5, w21=
1, w22 = 2, w23 = 0.25, and b3= 0.5] and input vector [2, 2.5] and t = 8.
Process one iteration of backpropagation algorithm.
Consider a transfer function f(n) = n². Perform one iteration of backpropagation with a = 0.9 for a neural network with two neurons in the input layer and one neuron in the output layer. The input values are X = [1 -1] and t = 8; the weight values between the input and hidden layer are w11 = 1, w12 = -2, w21 = 0.2, and w22 = 0.1. The weights between the hidden and output layers are w1 = 2 and w2 = -2. The biases in the hidden layer are b1 = -1 and b2 = 3.
[Figure: the 2-2-1 network for this exercise: inputs X1 and X2 connect to two hidden neurons via w11, w12, w21, w22; the hidden neurons connect to the output neuron via w1 and w2]
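A sketch of one such iteration in Python, under the assumption that the network is the 2-2-1 arrangement in the figure (two inputs into two hidden neurons, then one output neuron with no bias) and that a = 0.9 is the learning factor; the slides do not pin all of these details down:

def f(n):
    return n * n        # transfer function f(n) = n^2

def fp(n):
    return 2 * n        # its derivative f'(n) = 2n

# values from the exercise (assumed 2-2-1 arrangement, no output bias)
x1, x2, t, a = 1, -1, 8, 0.9
w11, w12, w21, w22 = 1, -2, 0.2, 0.1    # input -> hidden
w1, w2 = 2, -2                          # hidden -> output
b1, b2 = -1, 3                          # hidden biases

# forward pass
n1 = w11 * x1 + w21 * x2 + b1; y1 = f(n1)   # n1 = -0.2,  y1 = 0.04
n2 = w12 * x1 + w22 * x2 + b2; y2 = f(n2)   # n2 = 0.9,   y2 = 0.81
n3 = w1 * y1 + w2 * y2;        y  = f(n3)   # n3 = -1.54, y = 2.3716

# backward pass: propagate (t - y) through f'
d3 = (t - y) * fp(n3)
d1 = d3 * w1 * fp(n1)
d2 = d3 * w2 * fp(n2)

# weight updates with learning factor a
w1 += a * d3 * y1; w2 += a * d3 * y2
w11 += a * d1 * x1; w21 += a * d1 * x2; b1 += a * d1
w12 += a * d2 * x1; w22 += a * d2 * x2; b2 += a * d2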
Quiz
• Briefly describe the Turing Test
• Do you agree that a computer passing the Turing Test does not prove that the computer is intelligent? State your reasons.
[Figure: a search problem state showing the numbers 2, 8, 7, 3, 1, 4, 5, 6, rendered as an image in the original slides; it is the starting point for the search questions below]
1. Using breadth-first search, show the search tree that would be built down to level 2 (assume level zero is the root of the tree).
2. Using depth-first search, show the state of the search tree down to level 3 (stop once you have expanded one node that goes to level 3).
3. Implement the search algorithms using appropriate data structure methods.
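For question 3, a minimal breadth-first search sketch in Python; the graph argument is a placeholder adjacency list to be filled in from the figure:

from collections import deque

def bfs(graph, start):
    # graph: dict mapping a node to its list of neighbours
    visited, order = {start}, []
    queue = deque([start])
    while queue:
        node = queue.popleft()      # FIFO queue gives level-by-level expansion
        order.append(node)
        for nxt in graph.get(node, []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(nxt)
    return order

Depth-first search is the same skeleton with the FIFO queue replaced by a LIFO stack (pop from the same end you push).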