Neural Networks
What are they?
• Models of the human brain used for computational purposes
• The brain is made up of many interconnected neurons
What is a neuron?
Components of a biological neuron:
• Dendrites – serve as inputs
• Soma – the cell body of the neuron, which contains the nucleus
• Nucleus – the processing component of the neuron
• Axon – the fibre along which the output travels
• Synapses – the terminals across whose gaps connections are made to other neurons
How does it work?
• Signals move from neuron to neuron via
electrochemical reactions. The synapses release
a chemical transmitter which enters the dendrite.
This raises or lowers the electrical potential of
the cell body.
• The soma sums the inputs it receives and once
a threshold level is reached an electrical impulse
is sent down the axon (often known as firing).
• These impulses eventually reach synapses and
the cycle continues.
Synapses
• Synapses which raise the potential within a cell
body are called excitatory. Synapses which
lower the potential are called inhibitory.
• It has been found that synapses exhibit
plasticity. This means that long-term changes in
the strengths of the connections can be formed
depending on the firing patterns of other
neurons. This is thought to be the basis for
learning in our brains.
Artificial model of neuron
[Diagram: an artificial neuron]
• aj : activation value of unit j
• wj,i : weight on the link from unit j to unit i
• ini : weighted sum of inputs to unit i
• ai : activation value of unit i (also known as the output value)
• g : activation function

That is, ini = Σj wj,i aj and ai = g(ini).
How does this work?
• A neuron is connected to other neurons via its
input and output links. Each incoming neuron
has an activation value and each connection has
a weight associated with it.
• The neuron sums the incoming weighted values
and this value is input to an activation function.
The output of the activation function is the output
from the neuron.
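This computation can be sketched in a few lines of Python (the sigmoid activation and the example weights are illustrative assumptions, not values from the slides):

```python
import math

def neuron_output(activations, weights):
    """Weighted sum of incoming activations, passed through a sigmoid activation."""
    in_i = sum(a * w for a, w in zip(activations, weights))  # weighted sum in_i
    return 1.0 / (1.0 + math.exp(-in_i))                     # a_i = g(in_i)

# Hypothetical example: two incoming units with activations 1 and 0.
print(neuron_output([1.0, 0.0], [0.5, -0.5]))  # sigmoid(0.5) ≈ 0.62
```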
Common Activation Functions
Some common activation functions in more detail.

These functions can be defined as follows:
• Stept(x) = 1 if x >= t, else 0
• Sign(x) = +1 if x >= 0, else -1
• Sigmoid(x) = 1/(1 + e^(-x))

On occasion an identity function is also used (i.e. where the input to the neuron becomes the output). This function is normally used in the input layer, where the inputs to the neural network are passed into the network unchanged.
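A minimal Python rendering of these definitions (function names are illustrative):

```python
import math

def step(x, t=0.0):
    """Step function with threshold t: 1 if x >= t, else 0."""
    return 1 if x >= t else 0

def sign(x):
    """Sign function: +1 if x >= 0, else -1."""
    return 1 if x >= 0 else -1

def sigmoid(x):
    """Sigmoid function: 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def identity(x):
    """Identity function, typically used in the input layer."""
    return x
```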
A brief history of Neural Networks
• In 1943 two scientists, Warren McCulloch and
Walter Pitts, proposed the first artificial model of
a biological neuron [McC]. This synthetic neuron
is still the basis for most of today’s neural
networks.
• Rosenblatt came up with his two-layered perceptron, which was subsequently shown to have fundamental limitations by Minsky and Papert. This led to a huge decline in funding and interest in neural networks.
The bleak years
• During this period, even though there was a lack of funding and
interest in neural networks, a small number of researchers continued
to investigate the potential of neural models.
• A number of papers were published, but none had any great impact.
Many of these reports concentrated on the potential of neural
networks for aiding in the explanation of biological behaviour (e.g.
[Mal], [Bro], [Mar], [Bie], [Coo]).
• Others focused on real-world implementations. In 1972 Teuvo Kohonen and James A. Anderson independently proposed the same model for associative memory [Koh], [An1].
• In 1976 Marr and Poggio applied a neural network to a realistic
problem in computational vision, stereopsis [Mar]. Other projects
included [Lit], [Gr1], [Gr2], [Ama], [An2], [McC].
The Discovery of Backpropagation
• The backpropagation learning algorithm was developed
independently by Rumelhart [Ru1], [Ru2], Le Cun [Cun]
and Parker [Par] in 1986.
• It was subsequently discovered that the algorithm had also been described by Paul Werbos in his 1974 Harvard Ph.D. thesis [Wer].
• Error backpropagation networks are the most widely
used neural network model as they can be applied to
almost any problem that requires pattern mapping.
• It was the discovery of this paradigm that brought neural
networks out of the research area and into real world
implementation.
Interests in neural networks differ according to profession.
• Neurobiologists and psychologists – understanding our brain
• Engineers and physicists – a tool to recognise patterns in noisy data
• Business analysts and engineers – a tool for modelling data
• Computer scientists and mathematicians – networks offer an alternative model of computing: machines that may be taught rather than programmed
• The artificial intelligentsia, cognitive scientists and philosophers – subsymbolic processing (reasoning with patterns, not symbols)
Backpropagation Network
Architecture
• A backpropagation network typically consists of three or more layers of nodes.
• The first layer is known as the input layer and the last layer is known as the output layer.
• Any layers of nodes between the input and output layers are known as hidden layers.
• Each unit in a layer is connected to every unit in the next layer. There are no connections between units within the same layer.
Backpropagation
[Diagram: a layered network with an input layer, hidden layer, and output layer. Input flows forward through the layers; the error flows backward.]
Operation of the network
• The operation of the network consists of a
forward pass of the input through the
network (forward propagation) and then a
backward pass of an error value which is
used in the weight modification (Backward
Propagation)
Forward Propagation
• A forward propagation step is initiated when an
input pattern is presented to the network.
• No processing is performed at the input layer.
The pattern is propagated forward to the next
layer, and each node in this layer performs a
weighted sum on all its inputs.
• After this sum has been calculated, a function is
used to compute the unit’s output.
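A sketch of this forward pass in Python (the sigmoid activation and the weight layout — weights[j][i] on the link from unit j to unit i — are illustrative assumptions):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(pattern, weight_layers):
    """Propagate a pattern forward through successive layers of weights."""
    activations = pattern  # no processing at the input layer
    for weights in weight_layers:  # weights[j][i]: link from unit j to unit i
        sums = [sum(a * w_row[i] for a, w_row in zip(activations, weights))
                for i in range(len(weights[0]))]          # weighted sums
        activations = [sigmoid(s) for s in sums]          # unit outputs
    return activations

# Hypothetical single hidden layer with weights [[1, -1], [-1, 1]]:
print(forward([1.0, 0.0], [[[1.0, -1.0], [-1.0, 1.0]]]))
```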
Example XOR
Layers of the Network
• The Input Layer
• The input layer of a backpropagation
network acts solely as a buffer to hold the
patterns being presented to the network.
Each node in the input layer corresponds
to one entry in the pattern. No processing
is done at the input layer. The pattern is
fed forward from the input layer to the next
layer.
The Hidden Layers
• It is the hidden layers which give the
backpropagation network its exceptional
computational abilities.
• The units in the hidden layers act as
“feature detectors”. They extract
information from the input patterns which
can be used to distinguish between
particular classes. The network creates its
own internal representation of the data.
The Output Layer
• The output layer of a network uses the
response of the feature detectors in the
hidden layer. Each unit in the output layer
emphasises each feature according to the
values of the connecting weights. The
pattern of activation at this layer is taken
as the network’s response.
The sigmoid function
• The function used to perform this operation is the sigmoid function, Sigmoid(x) = 1/(1 + e^(-x)).
• The main reason why this particular function is chosen is that its derivative, which is used in the learning law, is easily computed: Sigmoid'(x) = Sigmoid(x)(1 - Sigmoid(x)).
• The result obtained after applying this function to the net
input is taken to be the node’s output value.
• This process is continued until the pattern has been
propagated through the entire network and reaches the
output layer.
• The activation pattern at the output layer is taken as the
network’s result.
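The claim that the derivative is easily computed can be checked numerically; this sketch compares Sigmoid'(x) = Sigmoid(x)(1 - Sigmoid(x)) against a finite-difference estimate:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_deriv(x):
    # The derivative is expressed entirely in terms of the function's own value.
    s = sigmoid(x)
    return s * (1.0 - s)

# Numerical check against a central finite difference at an arbitrary point.
x, h = 0.7, 1e-6
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
print(abs(numeric - sigmoid_deriv(x)))  # tiny discrepancy
```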
Linear Separability and the XOR
Problem
• Consider two-input patterns being classified into two classes.
• Each point, marked with one of two symbols, represents a pattern with a pair of values (x1, x2). Each pattern is classified into one of two classes.
• Notice that these classes can be separated with a single line; they are known as linearly separable patterns.
• Linear separability refers to the fact that classes of patterns with n-dimensional vectors can be separated with a single decision surface. In the case above, the line represents the decision surface.
[Diagram: two linearly separable classes divided by a single line]
Xor
• The classic example of a linearly
inseparable pattern is the logical
exclusive-OR (XOR) function. The next
figure illustrates the XOR function: the two
classes, 0 (black dots) and 1 (white dots),
cannot be separated with a single line.
XOR linearly inseparable
The significance of this
• XOR is separable in 3 dimensions but obviously
not in 2.
• So many classifiers will need more than 2 layers
to classify such patterns.
• Minsky and Papert pointed out that, as far as
they could see, perceptrons in 2 layers could not
learn such problems in 3 dimensions or more.
• Because so many problems are like XOR,
according to these stars of AI, neural networks
had limited applicability.
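The inseparability of XOR can be demonstrated with a brute-force search over candidate separating lines (the grid of weights and thresholds is an illustrative choice; a single line w1·x1 + w2·x2 = t plays the role of the decision surface):

```python
from itertools import product

# XOR truth table: ((x1, x2), class)
xor = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def separates(w1, w2, t, patterns):
    """True if the line w1*x1 + w2*x2 = t puts class 1 on one side, class 0 on the other."""
    return all((w1 * x1 + w2 * x2 >= t) == (cls == 1) for (x1, x2), cls in patterns)

# Search a coarse grid of candidate lines: none separates XOR.
grid = [i / 2 for i in range(-8, 9)]  # -4.0 .. 4.0 in steps of 0.5
found = any(separates(w1, w2, t, xor) for w1, w2, t in product(grid, grid, grid))
print(found)  # False: no line in the grid classifies all four XOR points

# By contrast, AND is linearly separable:
and_gate = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
print(separates(1, 1, 1.5, and_gate))  # True
```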
But they were wrong
• Backpropagation showed that neural networks
could learn in 3 and more dimensions.
• However, such was the stature of this pair that
their critique impacted negatively on research in
neural networks for two decades.
• The work of Werbos, Parker and Rumelhart
proved them wrong, and by 1987 working
multilayer networks were learning successfully.
Neural networks have since become a huge industry.
Backward Propagation
• The first step in the backpropagation stage is the calculation of the
error between the network’s result and the desired response. This
occurs when the forward propagation phase is completed.
• Each processing unit in the output layer is compared to its
corresponding entry in the desired pattern and an error is calculated
for each node in the output layer.
• The weights are then modified for all of the connections going into
the output layer.
• Next, the error is backpropagated to the hidden layers and by using
the generalised delta rule, the weights are adjusted for all
connections going into the hidden layer.
• The procedure is continued until the last layer of weights has been
modified. The forward and backward propagation phases are
repeated until the network’s output is sufficiently close to the desired result.
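The forward and backward passes described above can be sketched end-to-end in plain Python; here a small sigmoid network is trained on XOR (the layer sizes, bias units, learning rate, and epoch count are illustrative assumptions, not from the slides):

```python
import math, random

random.seed(0)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# A 2-4-1 network for XOR; +1 row in each weight matrix for a bias unit.
n_in, n_hid, n_out = 2, 4, 1
W1 = [[random.uniform(-1, 1) for _ in range(n_hid)] for _ in range(n_in + 1)]
W2 = [[random.uniform(-1, 1) for _ in range(n_out)] for _ in range(n_hid + 1)]

data = [([0, 0], [0]), ([0, 1], [1]), ([1, 0], [1]), ([1, 1], [0])]

def forward(x):
    """Forward propagation: weighted sums, then sigmoid, layer by layer."""
    xb = x + [1.0]  # append bias input
    h = [sigmoid(sum(xb[j] * W1[j][i] for j in range(n_in + 1))) for i in range(n_hid)]
    hb = h + [1.0]
    o = [sigmoid(sum(hb[j] * W2[j][k] for j in range(n_hid + 1))) for k in range(n_out)]
    return hb, o

def total_error():
    return sum((t[0] - forward(x)[1][0]) ** 2 for x, t in data)

err_before = total_error()
alpha = 0.5  # learning rate (momentum omitted for brevity)
for _ in range(5000):
    for x, t in data:
        hb, o = forward(x)
        # Output deltas, generalised delta rule: d = o(1 - o)(t - o)
        d = [o[k] * (1 - o[k]) * (t[k] - o[k]) for k in range(n_out)]
        # Hidden deltas: e_j = h_j(1 - h_j) * sum_k W2[j][k] * d_k
        e = [hb[j] * (1 - hb[j]) * sum(W2[j][k] * d[k] for k in range(n_out))
             for j in range(n_hid)]
        # Weight updates, output layer first, then hidden layer
        xb = x + [1.0]
        for j in range(n_hid + 1):
            for k in range(n_out):
                W2[j][k] += alpha * hb[j] * d[k]
        for j in range(n_in + 1):
            for i in range(n_hid):
                W1[j][i] += alpha * xb[j] * e[i]

print(err_before, total_error())  # the error should drop substantially
```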
The Backpropagation Learning Law
• The Learning Law used is known as the Generalised
Delta Rule.
• It allows for the adjustment of the weights in the hidden
layer, a feat deemed impossible by Minsky and Papert.
• It uses the derivative of the activation function of nodes
(which in most cases is the sigmoid function) to
determine the extent of the adjustment to the weights
connecting to the hidden layers.
• In other words, the network learns from its errors and
uses the difference between expected and actual
results (the error) to make adjustments.
Example
• Calculate the weight adjustments in the following network for expected outputs of {1, 1} and a learning rate of 1.
Sample Neural Network
Hidden Layer Computation
• Xi = iW1:
  Xi1 = 1 × 1 + 0 × (−1) = 1
  Xi2 = 1 × (−1) + 0 × 1 = −1
  so Xi = {Xi1, Xi2} = {1, −1}
• h = F(Xi), where F is the sigmoid:
  h1 = F(Xi1) = F(1) = 0.73
  h2 = F(Xi2) = F(−1) = 0.27
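The hidden-layer arithmetic can be checked directly (W1 is read off the sums above as [[1, −1], [−1, 1]], with input pattern {1, 0}):

```python
import math

def F(x):  # sigmoid activation
    return 1.0 / (1.0 + math.exp(-x))

inp = [1, 0]                   # input pattern
W1 = [[1, -1], [-1, 1]]        # W1[j][k]: weight from input j to hidden unit k
Xi = [sum(inp[j] * W1[j][k] for j in range(2)) for k in range(2)]  # {1, -1}
h = [F(x) for x in Xi]
print([round(v, 2) for v in h])  # [0.73, 0.27]
```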
Output Layer Computation
• X = hW2:
  X1 = 0.73 × (−1) + 0.27 × 0 = −0.73
  X2 = 0.73 × 0 + 0.27 × (−1) = −0.27
  so X = {X1, X2} = {−0.73, −0.27}
• O = F(X):
  O1 = F(X1) = 0.68
  O2 = F(X2) = 0.58
  (The slides evaluate F at the magnitudes 0.73 and 0.27; F(0.73) ≈ 0.68 and F(0.27) ≈ 0.57.)
Error
• Using the rounded outputs O1 ≈ 0.7 and O2 ≈ 0.6, targets t1 = t2 = 1, and d = O(1 − O)(O − t):
• d1 = 0.7(1 − 0.7)(0.7 − 1) = 0.7(0.3)(−0.3) = −0.063
• d2 = 0.6(1 − 0.6)(0.6 − 1) = 0.6(0.4)(−0.4) = −0.096
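These output deltas follow directly from the rounded outputs:

```python
# Output deltas d = O(1 - O)(O - t), with rounded outputs 0.7 and 0.6, targets 1
d1 = 0.7 * (1 - 0.7) * (0.7 - 1)
d2 = 0.6 * (1 - 0.6) * (0.6 - 1)
print(round(d1, 3), round(d2, 3))  # -0.063 -0.096
```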
Error Calculation
e = h(1 − h)W2d

In matrix form:

[e1]   [h1(1 − h1)]   [W11  W12] [d1]
[e2] = [h2(1 − h2)] ∘ [W21  W22] [d2]
Another Way to write the error
eh = hh(1 − hh) Σ(k ∈ outputs) Whk dk

• e1 = h1(1 − h1)(W11 d1 + W12 d2)
• e2 = h2(1 − h2)(W21 d1 + W22 d2)

• e1 = 0.73(1 − 0.73)((−1 × −0.063) + (0 × −0.096))
• e2 = 0.27(1 − 0.27)((0 × −0.063) + (−1 × −0.096))

• e1 = 0.73(0.27)(0.063) = 0.197 × 0.063 ≈ 0.012
• e2 = 0.27(0.73)(0.096) = 0.197 × 0.096 ≈ 0.019
Weight Adjustment
• ΔW2(t) = α hd + Θ ΔW2(t−1)
• where α = 1 (Θ is a momentum coefficient; the momentum term is zero on the first pass)

hd = [h1] [d1  d2] = [h1d1  h1d2]
     [h2]            [h2d1  h2d2]

hd = [0.73] [0.063  0.096] = [(0.73 × 0.063)  (0.73 × 0.096)]
     [0.27]                  [(0.27 × 0.063)  (0.27 × 0.096)]

Weight Change
= [0.046  0.070]
  [0.017  0.026]
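The outer product hd can be verified in a few lines of Python (using the magnitudes of h and d as the slides do):

```python
# Outer product h·d from the worked example
h = [0.73, 0.27]
d = [0.063, 0.096]
dW2 = [[round(hj * dk, 3) for dk in d] for hj in h]
print(dW2)  # [[0.046, 0.07], [0.017, 0.026]]
```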