Neural Networks
Kostas Kontogiannis
E&CE
General Concepts
• Neuron: a cell that performs information processing in
the brain. It is the fundamental functional unit of all
nervous system tissue, including the brain.
• Soma: the neuron’s cell body.
• Dendrites: a collection of fibers branching out of the soma.
• Axon: a single long fiber extending from the soma.
Eventually, the axon also branches into strands and substrands
that connect to the dendrites and cell bodies of other neurons.
• Synapse: the point where strands from two neurons
connect.
Neural Networks
• A neural network is composed of a number of nodes, or
units, connected by links. Each link has a numeric weight
associated with it.
• Weights are the primary means of long-term storage in
neural networks, and learning usually takes place by
updating the weights.
• Each unit has a set of input links from other units, a set of
output links to other units, a current activation level, and a
means of computing the activation level at the next step in
time, given its inputs and weights.
Neural Networks
• To build a neural network to perform some task, one must
first decide how many units are to be used, what kind of
units, and how the units are connected to form a network.
• One then initializes the weights of the network, and “trains”
the weights using a learning algorithm applied to a set of
training examples for the task.
• The use of examples also implies that one must decide how
to encode the examples in terms of inputs and outputs of the
network.
Simple Computing Elements
• Each unit performs a simple computation: It receives signals
from its input links and computes a new activation level that
it sends along each of its output links.
• The computation of the activation level is based on the
values of each input signal received from a neighboring
node, and the weights of each input link.
• The computation is split into two components. First is a
linear function $in_i$ that computes the weighted sum of the unit’s
input values. Second is a nonlinear component called the
activation function $g$, which transforms the weighted sum into
the final value that serves as the unit’s activation value $a_i$.
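The two-part computation described above can be sketched in a few lines of Python. This is a minimal illustration, not code from the slides: the function names are hypothetical, and the sigmoid is assumed as the activation function $g$ only for the sake of the example.

```python
import math

def weighted_sum(inputs, weights):
    """Linear component in_i: the weighted sum of the unit's input values."""
    return sum(w * a for w, a in zip(weights, inputs))

def sigmoid(x):
    """One possible activation function g (an assumption for this sketch)."""
    return 1.0 / (1.0 + math.exp(-x))

def unit_activation(inputs, weights, g=sigmoid):
    """Nonlinear component: the activation a_i = g(in_i)."""
    return g(weighted_sum(inputs, weights))

# Two input links with weights 0.4 and -0.8 (illustrative values).
in_i = weighted_sum([1.0, 0.5], [0.4, -0.8])    # 0.4*1.0 + (-0.8)*0.5 = 0.0
a_i = unit_activation([1.0, 0.5], [0.4, -0.8])  # sigmoid(0.0) = 0.5
```

Any other activation function can be passed in through the `g` parameter without changing the linear part.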
Models for Activation Functions
• Different models are obtained by using different mathematical
functions for g. Three common choices are the step, sign, and
sigmoid functions.
[Figure: plots of the step function (with threshold t), the sign function, and the sigmoid function, each as a function of $in_i$.]
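The three activation functions can be written out as follows. This is a sketch using the usual textbook definitions; the behaviour exactly at the threshold (output 1 when the input equals `t`, output +1 when the input equals 0) is an assumption, since the slides do not specify it.

```python
import math

def step(x, t=0.0):
    """Step function: 1 if x reaches the threshold t, otherwise 0."""
    return 1.0 if x >= t else 0.0

def sign(x):
    """Sign function: +1 for non-negative x, otherwise -1."""
    return 1.0 if x >= 0 else -1.0

def sigmoid(x):
    """Sigmoid function: 1 / (1 + e^-x), a smooth curve between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))

# Evaluate each function on a few sample values of in_i.
samples = [-2.0, 0.0, 2.0]
step_values = [step(x, t=0.5) for x in samples]
sign_values = [sign(x) for x in samples]
sigmoid_values = [sigmoid(x) for x in samples]
```

Unlike the step and sign functions, the sigmoid is differentiable everywhere, which matters for gradient-based learning rules.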
Network Structures
• There are a variety of kinds of network structure, each of
which results in very different computational properties.
• The main distinction is between feed-forward and recurrent
networks.
• In a feed-forward network, links are unidirectional and there
are no cycles. In essence these networks are DAGs. In a
recurrent network, the links can form arbitrary topologies.
• Usually we deal with networks that are arranged in layers. In
a layered feed-forward network, each unit is linked only to
the units in the next layer; there are no links between units in
the same layer, no links backward to a previous layer, and no
links that skip a layer.
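A layered feed-forward network can be evaluated one layer at a time, since each unit depends only on the layer before it. The sketch below assumes a hypothetical representation in which each layer is a list of weight rows (one row per unit in the next layer) and uses the sigmoid as the activation function; neither choice comes from the slides.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def feed_forward(layers, inputs, g=sigmoid):
    """Propagate inputs through the layers in order.

    layers[l][i][j] is the weight on the link from unit j of layer l
    to unit i of layer l+1. There are no links within a layer,
    no backward links, and no links that skip a layer.
    """
    activations = list(inputs)
    for weight_rows in layers:
        activations = [g(sum(w * a for w, a in zip(row, activations)))
                       for row in weight_rows]
    return activations

# 2 input units -> 2 hidden units -> 1 output unit (illustrative weights).
layers = [
    [[0.5, -0.3], [0.8, 0.2]],  # input layer -> hidden layer
    [[1.0, -1.0]],              # hidden layer -> output layer
]
output = feed_forward(layers, [1.0, 0.0])
```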
Fundamental Network Types
• Hopfield Networks: use bi-directional connections with symmetric
weights; all of the units are both input and output units; the activation function g is
the sign function; and the activation levels can only be +1 or -1.
• Boltzmann Machines: also use symmetric weights, but include units that are
neither input nor output units. They also use a stochastic activation function,
such that the probability of the output being 1 is some function of the total
weighted input.
• Networks with no hidden units are called perceptrons.
• Input units are directly connected to the external input sources. Output units are
connected to the observed output. Hidden units are neither connected to input
sources nor the observed output.
• Networks with one or more layers of hidden units are called multi-layer
networks.
Perceptron Neural Network Learning
function NEURAL-NETWORK-LEARNING(examples) returns network
    network = a network with randomly assigned weights
    repeat
        for each e in examples do
            O = NEURAL-NETWORK-OUTPUT(network, e)
            T = the observed output values from e
            update the weights in network based on e, O, T
        end
    until all examples are correctly predicted or a stopping criterion is reached
    return network
Essentially, for a perceptron the update is
$$Err = T - O$$
$$W_j \leftarrow W_j + \alpha \cdot I_j \cdot Err$$
where $\alpha$ is the learning rate and $I_j$ is the value of the $j$-th input.
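The learning loop and update rule above can be made concrete with a small runnable sketch. Several details are assumptions not found in the slides: a single output unit with a step activation at threshold 0, a constant first input of 1 acting as a bias, the initial weight range, the default learning rate, and the epoch limit used as the stopping criterion.

```python
import random

def step(x):
    return 1 if x >= 0 else 0

def perceptron_output(weights, inputs):
    """O = step(sum_j W_j * I_j) for a single output unit."""
    return step(sum(w * i for w, i in zip(weights, inputs)))

def perceptron_learning(examples, alpha=0.1, max_epochs=1000, seed=0):
    rng = random.Random(seed)
    n_inputs = len(examples[0][0])
    # Start from randomly assigned weights.
    weights = [rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
    for _ in range(max_epochs):  # stopping criterion (assumed)
        all_correct = True
        for inputs, target in examples:
            err = target - perceptron_output(weights, inputs)  # Err = T - O
            if err != 0:
                all_correct = False
                # W_j <- W_j + alpha * I_j * Err
                weights = [w + alpha * i * err
                           for w, i in zip(weights, inputs)]
        if all_correct:
            break
    return weights

# Logical OR; the first input is a constant 1 used as a bias (assumption).
examples = [([1, 0, 0], 0), ([1, 0, 1], 1), ([1, 1, 0], 1), ([1, 1, 1], 1)]
weights = perceptron_learning(examples)
predictions = [perceptron_output(weights, x) for x, _ in examples]
```

OR is linearly separable, so the loop stops once every example is predicted correctly. For a function that is not linearly separable, only the epoch limit would end the loop.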
Multi-Layer Feed-Forward Networks
• Initial work in the 1950’s.
• Learning algorithms for multi-layer networks are neither efficient nor
guaranteed to converge to a global optimum.
• On the other hand, learning general functions from examples is an intractable
problem in the worst case.
• The most popular method for learning in multi-layer networks is called back-propagation.
Back Propagation Learning
• Learning in multi-layer feed-forward networks using back-propagation
proceeds the same way as for perceptrons: example inputs are presented to the
network, and if the network computes an output vector that matches the target,
nothing is done. If there is an error, then the weights are adjusted to reduce the
error.
• The trick is to assess the blame for an error and divide it among the
contributing weights. In perceptrons, this is easy because there is only one
weight between each input and the output. But in multilayer networks, there
are many weights connecting each input to an output, and each of these
weights contributes to more than one output.
• The back-propagation algorithm is a sensible approach to dividing the
blame among the contributing weights.
Back Propagation Learning
• As in the perceptron learning algorithm, we try to minimize the error between
each target output and the output value computed by the network.
• At the output layer, the weight update rule is very similar to the rule for
perceptrons. However, there are two differences: the activation of the hidden
unit $a_j$ is used instead of the input value, and the rule contains a term for the
gradient of the activation function.
• If $Err_i$ is the error $T_i - O_i$ at output node $i$, then the weight update rule for the
link from unit $j$ to unit $i$ is
$$W_{j,i} \leftarrow W_{j,i} + \alpha \cdot a_j \cdot Err_i \cdot g'(in_i)$$
• where $g'$ is the derivative of the activation function $g$. Defining
$\Delta_i = Err_i \cdot g'(in_i)$, the above can be rewritten as:
$$W_{j,i} \leftarrow W_{j,i} + \alpha \cdot a_j \cdot \Delta_i$$
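A worked instance of the output-layer rule may help. The sketch below assumes a sigmoid activation, for which $g'(x) = g(x)(1 - g(x))$, and a single output unit; the function names and numeric values are illustrative only.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_prime(x):
    """Derivative of the sigmoid: g'(x) = g(x) * (1 - g(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)

def update_output_weights(weights, hidden_activations, target, alpha):
    """Update the weights W_{j,i} feeding a single output unit i.

    weights[j] is W_{j,i}; hidden_activations[j] is a_j.
    Returns the new weights and Delta_i.
    """
    in_i = sum(w * a for w, a in zip(weights, hidden_activations))
    err_i = target - sigmoid(in_i)          # Err_i = T_i - O_i
    delta_i = err_i * sigmoid_prime(in_i)   # Delta_i = Err_i * g'(in_i)
    new_weights = [w + alpha * a_j * delta_i
                   for w, a_j in zip(weights, hidden_activations)]
    return new_weights, delta_i

# Zero weights give in_i = 0, O_i = 0.5, Err_i = 0.5, g'(0) = 0.25.
new_weights, delta_i = update_output_weights(
    weights=[0.0, 0.0], hidden_activations=[1.0, 0.5], target=1.0, alpha=1.0)
# delta_i = 0.5 * 0.25 = 0.125; new_weights = [0.125, 0.0625]
```

The hidden unit with the larger activation receives the larger weight change, since it contributed more to the output.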
Back Propagation Learning
• For updating the connections between the input units
and the hidden units, we need to define a quantity analogous to the error term
for output nodes.
• The idea is that hidden node $j$ is “responsible” for some fraction of the error
$\Delta_i$ in each of the output nodes to which it connects. Thus, the $\Delta_i$ values
are divided according to the strength of the connection between the hidden
node and the output node, and propagated back to provide the $\Delta_j$ values for
the hidden layer. The propagation rule for the $\Delta$ values is the following:
$$\Delta_j = g'(in_j) \sum_i W_{j,i} \, \Delta_i$$
• Now the update rule for the weights between the inputs and the hidden layer is
almost identical to the update rule for the output layer:
$$W_{k,j} \leftarrow W_{k,j} + \alpha \cdot I_k \cdot \Delta_j$$
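The propagation rule and the hidden-layer update can be illustrated with a short sketch. It assumes a sigmoid activation and a hypothetical nested-list layout for the weights (`w_out[j][i]` for $W_{j,i}$ and `w_in[k][j]` for $W_{k,j}$); the numbers are chosen only to make the arithmetic easy to follow.

```python
import math

def sigmoid_prime(x):
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)

def hidden_deltas(in_hidden, w_out, output_deltas, g_prime=sigmoid_prime):
    """Delta_j = g'(in_j) * sum_i W_{j,i} * Delta_i for each hidden unit j."""
    return [g_prime(in_j) * sum(w_ji * d_i
                                for w_ji, d_i in zip(w_out[j], output_deltas))
            for j, in_j in enumerate(in_hidden)]

def update_input_weights(w_in, inputs, deltas, alpha):
    """W_{k,j} <- W_{k,j} + alpha * I_k * Delta_j for every input-hidden link."""
    return [[w_kj + alpha * i_k * d_j for w_kj, d_j in zip(row, deltas)]
            for row, i_k in zip(w_in, inputs)]

# Two hidden units feeding one output unit whose Delta_i is 0.125.
deltas = hidden_deltas(in_hidden=[0.0, 0.0],
                       w_out=[[1.0], [-2.0]],
                       output_deltas=[0.125])
# g'(0) = 0.25, so deltas = [0.25 * 0.125, 0.25 * -0.25] = [0.03125, -0.0625]
new_w_in = update_input_weights([[0.0, 0.0], [0.0, 0.0]],
                                inputs=[1.0, 0.0], deltas=deltas, alpha=1.0)
```

The second hidden unit is connected to the output by a stronger weight, so it receives a larger share of the blame. Weights on links from the inactive input (value 0) do not change.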
Back Propagation Learning
• The learning algorithm can be summarized as follows:
– Compute the $\Delta$ values for the output units using the
observed error
– Starting with the output layer, repeat the following for each
layer in the network, until the earliest (closest to input) hidden
layer is reached:
  • Propagate the $\Delta$ values back to the previous layer
  • Update the weights between the two layers
Back Propagation Learning Algorithm
Algorithm Back-Prop-Update(network, examples, alpha) : new network weights
    repeat
        for each e in examples do
            O = Run-Network(network, I_e)
            Err_e = T_e - O
            Delta_i = Err_e,i * g’(in_i)
            W_j,i = W_j,i + (alpha * a_j * Delta_i)
            for each subsequent layer in network do
                Delta_j = g’(in_j) * Sum_i (W_j,i * Delta_i)
                W_k,j = W_k,j + (alpha * I_k * Delta_j)
            end
        end
    until network has converged
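The algorithm can be sketched end to end for a network with one hidden layer. Many details here are assumptions rather than part of the slides: the sigmoid activation, the nested-list weight layout, a constant first input of 1 acting as a bias, a fixed number of epochs standing in for "until network has converged", and computing all $\Delta$ values before any weight is changed.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_prime(x):
    s = sigmoid(x)
    return s * (1.0 - s)

def init_network(n_in, n_hidden, n_out, seed=0):
    """Random weights: w_in[k][j] is W_{k,j}, w_out[j][i] is W_{j,i}."""
    rng = random.Random(seed)
    w_in = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
            for _ in range(n_in)]
    w_out = [[rng.uniform(-0.5, 0.5) for _ in range(n_out)]
             for _ in range(n_hidden)]
    return w_in, w_out

def run_network(network, inputs):
    """Forward pass; returns weighted sums and activations of both layers."""
    w_in, w_out = network
    in_hidden = [sum(w_in[k][j] * inputs[k] for k in range(len(inputs)))
                 for j in range(len(w_out))]
    a_hidden = [sigmoid(x) for x in in_hidden]
    in_out = [sum(w_out[j][i] * a_hidden[j] for j in range(len(a_hidden)))
              for i in range(len(w_out[0]))]
    return in_hidden, a_hidden, in_out, [sigmoid(x) for x in in_out]

def back_prop_update(network, examples, alpha=0.5, epochs=2000):
    """Train in place for a fixed number of epochs (assumed stopping rule)."""
    w_in, w_out = network
    for _ in range(epochs):
        for inputs, targets in examples:
            in_hidden, a_hidden, in_out, outputs = run_network(network, inputs)
            # Delta_i = Err_i * g'(in_i) at the output layer
            delta_out = [(t - o) * sigmoid_prime(x)
                         for t, o, x in zip(targets, outputs, in_out)]
            # Delta_j = g'(in_j) * sum_i W_{j,i} * Delta_i at the hidden layer
            delta_hidden = [sigmoid_prime(in_hidden[j]) *
                            sum(w_out[j][i] * delta_out[i]
                                for i in range(len(delta_out)))
                            for j in range(len(a_hidden))]
            for j in range(len(a_hidden)):
                for i in range(len(delta_out)):
                    w_out[j][i] += alpha * a_hidden[j] * delta_out[i]
            for k in range(len(inputs)):
                for j in range(len(delta_hidden)):
                    w_in[k][j] += alpha * inputs[k] * delta_hidden[j]
    return network

def total_error(network, examples):
    """Sum of squared errors over all examples and output units."""
    return sum((t - o) ** 2
               for inputs, targets in examples
               for t, o in zip(targets, run_network(network, inputs)[3]))

# XOR with a constant bias input of 1 (assumption); 3 hidden units.
xor_examples = [([1, 0, 0], [0]), ([1, 0, 1], [1]),
                ([1, 1, 0], [1]), ([1, 1, 1], [0])]
network = init_network(n_in=3, n_hidden=3, n_out=1)
error_before = total_error(network, xor_examples)
back_prop_update(network, xor_examples)
error_after = total_error(network, xor_examples)
```

Comparing `error_before` with `error_after` shows how far training got. Whether the network actually reaches a low error on XOR depends on the initial weights and the learning rate, since there is no guarantee of convergence to a global optimum.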
Discussion
• Expressiveness: Well suited for continuous inputs and outputs, but do not have the expressive
power of general logical representations
• Computational Efficiency: For m examples and |W| weights, each epoch takes O(m|W|)
time. The worst-case number of epochs is exponential in the number of inputs
• Generalization: Good at generalizing on continuous functions that vary smoothly with
the input
• Sensitivity to noise: Very tolerant of noise in the input data, since they essentially perform non-linear regression
• Transparency: Neural networks are essentially black boxes
• Prior knowledge: Difficult to choose good training examples and the best network
topology