Synaptic Dynamics II: Supervised Learning


Synaptic Dynamics II: Supervised Learning
The Backpropagation Algorithm and Support Vector Machines
Jing LIU
2004.11.3
History of BP Algorithm





Rumelhart [1986] popularized the BP algorithm in the Parallel Distributed Processing edited volume in the late 1980s.
The BP algorithm overcame the limitations of the perceptron, limitations that Minsky and Papert [1969] had carefully enumerated.
BP's popularity begot waves of criticism: the algorithm often failed to converge, and at best converged to local error minima.
The criticism also challenged BP's historical priority and questioned whether BP learning was new, arguing that the algorithm did not offer a new kind of learning.
Multilayer feedforward NNs
Feedforward Sigmoidal
Representation Theorems


Feedforward sigmoidal architectures can in principle represent any Borel-measurable function to any desired accuracy, provided the network contains enough "hidden" neurons between the input and output neuronal fields.
So the MLP can solve nonlinearly separable classification problems and approximate functions.
Feedforward Sigmoidal
Representation Theorems



We can interpret the theorem from the following two aspects:
To improve the NN's classification ability, we must use multilayer networks with at least one hidden layer.
On the other hand, when feedforward-sigmoidal neural networks finely approximate complicated functions, the networks may suffer an "explosion" of hidden neurons and interconnecting synapses.
How can an MLP learn?

We know only the random sample $(X_i, Y_i)$ and the vector error $Y_i - N(X_i)$.

The ideal gradient-descent learning law would update each synaptic weight $m_{ij}$ using the mean squared error:

$\Delta m_{ij}(k) = -c_k \dfrac{\partial \, \mathrm{MSE}}{\partial m_{ij}}$

Since the MSE is unknown, we use the observed instantaneous squared error $\mathrm{SE}(k)$, computed from the sample $X_i$ and the error $Y_i - N(X_i)$:

$\Delta m_{ij}(k) = -c_k \dfrac{\partial \, \mathrm{SE}(k)}{\partial m_{ij}}$
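A minimal sketch of this stochastic update, assuming a user-supplied forward map `forward(weights, x)` and estimating the SE gradient by finite differences (the names `forward`, `step_size`, and `eps` are illustrative, not from the slides):

```python
import numpy as np

def sgd_step(weights, x, y, forward, step_size=0.01, eps=1e-6):
    """One stochastic gradient step on the instantaneous squared error SE(k).

    forward(weights, x) returns the network output N(x); the gradient of
    SE = 1/2 * ||y - N(x)||^2 is estimated by finite differences for clarity.
    """
    base = 0.5 * np.sum((y - forward(weights, x)) ** 2)
    grad = np.zeros_like(weights)
    for i in range(weights.size):
        w = weights.copy()
        w.flat[i] += eps
        grad.flat[i] = (0.5 * np.sum((y - forward(w, x)) ** 2) - base) / eps
    return weights - step_size * grad  # m(k+1) = m(k) - c_k * dSE/dm
```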
BP Algorithm



The Principle of BP algorithm
Working signals are propagated forward to the output neurons.
Error signals are propagated backward to the input field.
BP Algorithm

The BP algorithm for the learning process of multilayer networks
The error signal of the jth output is:

$e_j(n) = d_j(n) - y_j(n)$

The instantaneous summed squared error of the output is:

$E(n) = \dfrac{1}{2} \sum_j e_j^2(n)$

The objective function of the learning process is:

$E_{AV} = \dfrac{1}{N} \sum_{n=1}^{N} E(n)$
BP Algorithm

In the case of learning sample by sample:

[Signal-flow graph of output neuron j: the input signal $y_i(n)$, weighted by $w_{ji}(n)$, produces the induced local field $v_j(n)$ and the output $y_j(n)$; comparison with the desired response $d_j(n)$ gives the error $e_j(n)$ and the local gradient $\delta_j(n)$.]

The gradient of $E(n)$ at the nth iteration is:

$\dfrac{\partial E(n)}{\partial w_{ji}(n)} = -e_j(n)\, \varphi_j'(v_j(n))\, y_i(n)$
BP Algorithm

The modification quantity of weight $w_{ji}$ is:

$\Delta w_{ji}(n) = -\eta \dfrac{\partial E(n)}{\partial w_{ji}(n)} = \eta\, \delta_j(n)\, y_i(n)$

where the local gradient is

$\delta_j(n) = e_j(n)\, \varphi_j'(v_j(n))$

(1) If j is an output neuron:

$\delta_j(n) = (d_j(n) - y_j(n))\, \varphi_j'(v_j(n))$

(2) If j is a hidden neuron:

$\delta_j(n) = -\dfrac{\partial E(n)}{\partial y_j(n)}\, \varphi_j'(v_j(n)) = \varphi_j'(v_j(n)) \sum_k \delta_k(n)\, w_{kj}(n)$

In words: [weight correction $\Delta w_{ji}(n)$] = [learning rate $\eta$] $\times$ [local gradient $\delta_j(n)$] $\times$ [input signal $y_i(n)$]
BP Algorithm
Flow of the BP algorithm:
(1) Initialize the weights.
(2) Calculate the output of each layer.
(3) Calculate the error of the output layer.
(4) Calculate $\delta_j$ for all neurons.
(5) Modify the weights of the output layer.
(6) Modify the weights of the hidden layer.
(7) Return to step (2) for the next sample, as in the sketch below.
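Putting the flow above together, here is a minimal, self-contained sketch of sample-by-sample BP for one hidden layer of sigmoid neurons; the XOR data, layer sizes, and learning rate are illustrative assumptions, and a given run may still land in a local minimum, as noted on the next slide.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

# Illustrative data: XOR, a classic nonlinearly separable problem
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
D = np.array([[0], [1], [1], [0]], dtype=float)

eta = 0.5                                            # learning rate
W1 = rng.normal(0, 0.5, (2, 4)); b1 = np.zeros(4)    # input  -> hidden
W2 = rng.normal(0, 0.5, (4, 1)); b2 = np.zeros(1)    # hidden -> output

for epoch in range(10000):
    for x, d in zip(X, D):                           # learning sample by sample
        # forward pass: working signals propagate toward the output
        v1 = x @ W1 + b1;  y1 = sigmoid(v1)
        v2 = y1 @ W2 + b2; y2 = sigmoid(v2)
        # backward pass: error signals propagate toward the input
        e = d - y2
        delta2 = e * y2 * (1 - y2)                   # output-layer delta
        delta1 = (W2 @ delta2) * y1 * (1 - y1)       # hidden-layer delta
        # weight modification: eta * delta * input signal
        W2 += eta * np.outer(y1, delta2); b2 += eta * delta2
        W1 += eta * np.outer(x, delta1);  b1 += eta * delta1

# outputs after training (should approach [[0], [1], [1], [0]])
print(sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2).round(2))
```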
Some improvements for BP
algorithm
The problems of the BP algorithm:
(1) It converges slowly.
(2) There are local minima in the objective function.
Some methods to improve the BP algorithm: conjugate gradient descent, using higher-order derivatives, adding a momentum term, etc.
Adding momentum term for
BP Algorithm

Here we modify the weights of the neurons with:

$\Delta w_{ji}(n) = \alpha\, \Delta w_{ji}(n-1) + \eta\, \delta_j(n)\, y_i(n)$

Unrolling this recursion over the past iterations gives:

$\Delta w_{ji}(n) = \eta \sum_{t=0}^{n} \alpha^{\,n-t}\, \delta_j(t)\, y_i(t) = -\eta \sum_{t=0}^{n} \alpha^{\,n-t}\, \dfrac{\partial E(t)}{\partial w_{ji}(t)}$
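A minimal sketch of this momentum update for one weight matrix (the names `dW_prev`, `eta`, and `alpha` are illustrative assumptions):

```python
import numpy as np

def momentum_update(W, dW_prev, delta, y_in, eta=0.5, alpha=0.9):
    """dW(n) = alpha * dW(n-1) + eta * outer(y_i(n), delta_j(n)); returns (W + dW, dW)."""
    dW = alpha * dW_prev + eta * np.outer(y_in, delta)
    return W + dW, dW
```

The previous update `dW_prev` must be carried between iterations (initialized to zeros); successive gradients pointing in the same direction then accumulate, which speeds up convergence and damps oscillations.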
Support Vector Machines

SVM was proposed for binary classification problems based on the principle of Structural Risk Minimization.

Given the training set $(x_i, y_i),\ i = 1, 2, \ldots, n$, with $x \in R^d$ and $y \in \{-1, +1\}$, the discriminant function is $g(x) = w \cdot x + b$ and the separating hyperplane is $w \cdot x + b = 0$.
Support Vector Machines
To get the largest margin, we deduce the optimal classification function:

$f(x) = \mathrm{sign}\{(w^* \cdot x) + b^*\} = \mathrm{sign}\{\sum_{i=1}^{n} \alpha_i^* y_i (x_i \cdot x) + b^*\}$

Using a kernel function $K(x_i, x_j)$ (an inner product in a feature space) in place of the dot product, the dual optimization problem becomes:

$\max_{\alpha}\ Q(\alpha) = \sum_{i=1}^{n} \alpha_i - \dfrac{1}{2} \sum_{i,j=1}^{n} \alpha_i \alpha_j y_i y_j K(x_i, x_j)$

$\mathrm{s.t.}\ \sum_{i=1}^{n} y_i \alpha_i = 0, \quad \alpha_i \geq 0,\ i = 1, 2, \ldots, n$

and the resulting classification function is:

$f(x) = \mathrm{sign}\{\sum_{i=1}^{n} \alpha_i^* y_i K(x_i, x) + b^*\}$
Support Vector Machines
[Figure slides, panels (i)-(l)]