
Matrix Pseudoinversion for
Image Neural Processing
Rossella Cancelliere*
University of Turin
Turin, Italy
Thierry Artières
LIP6, P. et M. Curie University
Paris, France
Mario Gai
National Institute of Astrophysics
Turin, Italy
Patrick Gallinari
LIP6, P. et M. Curie University
Paris, France
Summary
- Introduction
- How to use pseudoinversion for neural training
- How to evaluate pseudoinverse matrices
- The application: an astronomical problem
- Results and discussion
Introduction 1
Our work builds on some recent ideas concerning the use of matrix pseudoinversion to train Single Hidden Layer Feedforward Networks (SLFN).
Many widely used training techniques randomly assign initial weight values, which are then iteratively modified (e.g. gradient descent methods)
This makes it necessary to deal with the usual issues of slowness, local minima, and optimal learning step determination
Introduction 2
Some procedures based on the evaluation of the generalized inverse matrix (or Moore-Penrose pseudoinverse) have recently been proposed, such as the extreme learning machine (ELM, Huang et al., 2006)
Their main feature is that input weights are randomly chosen and never modified, while output weights are analytically determined by MP pseudoinversion
These non-iterative procedures make training very fast, but some care is required because of the known numerical instability of pseudoinversion
Summary
- Introduction
- How to use Pseudoinversion for neural training
- How to evaluate pseudoinverse matrices
- The application: an astronomical problem
- Results and discussion
Notation

Training set: N distinct pairs (x_j, t_j)

Network output:
o_kj = Σ_{i=1}^{M} w_ki φ(c_i · x_j + b_i)
with M hidden neurons, input weights c_i, biases b_i, output weights w_ki and activation function φ

Training aim: t_kj = o_kj
In matrix notation: T = Hw

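To make the notation concrete, here is a minimal sketch of how the hidden layer output matrix H can be built. NumPy is used here as an assumption (the slides reference MATLAB), and the function name build_hidden_matrix and the toy sizes are illustrative only.

```python
import numpy as np

def build_hidden_matrix(X, C, b, activation=np.tanh):
    """Hidden layer output matrix H of an SLFN.

    X : (N, d)  training inputs, one sample per row
    C : (M, d)  input weights, one row per hidden neuron
    b : (M,)    hidden biases
    Returns H : (N, M), with H[j, i] = activation(c_i . x_j + b_i)
    """
    return activation(X @ C.T + b)

# Toy example: N = 6 samples, d = 11 inputs, M = 4 hidden neurons
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 11))
C = rng.uniform(-1.0, 1.0, size=(4, 11))
b = rng.uniform(-1.0, 1.0, size=4)
H = build_hidden_matrix(X, C, b)
print(H.shape)  # (6, 4)
```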
Least-squares solution
The number of hidden nodes is much lower than the number
of distinct training samples so H is a non-square matrix
One of the least-squares solutions w of the linear system T = Hw is w* = H⁺T, where H⁺ is the Moore-Penrose pseudoinverse of H
Main properties:
• it has the smallest norm among all least-squares solutions
• it reaches the smallest training error!
Potentially dangerous for generalization: in case of many free
parameters it can cause overfitting
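A minimal numerical sketch of this step, assuming NumPy (the slides mention MATLAB): the output weights are obtained in one shot with np.linalg.pinv, and this is the same minimum-norm least-squares solution returned by np.linalg.lstsq. The toy H and T below are placeholders, not data from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
H = np.tanh(rng.normal(size=(6, 11)) @ rng.uniform(-1, 1, size=(4, 11)).T)  # toy H: N=6, M=4
T = rng.normal(size=(6, 1))                                                  # toy targets

# Output weights via Moore-Penrose pseudoinversion: w* = H^+ T
w = np.linalg.pinv(H) @ T

# lstsq returns the same minimum-norm least-squares solution
w_lstsq, *_ = np.linalg.lstsq(H, T, rcond=None)
print(np.allclose(w, w_lstsq))    # True
print(np.linalg.norm(H @ w - T))  # training error of the least-squares fit
```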
Summary
- Introduction
- How to use Pseudoinversion for neural training
- How to evaluate pseudoinverse matrices
- The application: an astronomical problem
- Results and discussion
8
Pseudoinverse computation
Several methods are available to evaluate the MP matrix:
• the orthogonal projection (OP) method:
  H⁺ = (Hᵀ H)⁻¹ Hᵀ

• the regularized OP (ROP) method:
  H⁺ = (Hᵀ H + λI)⁻¹ Hᵀ

• the singular value decomposition (SVD) method:
  H⁺ = V Σ⁺ Uᵀ
  V, U: unitary matrices
  Σ⁺: diagonal matrix whose entries are the inverses of the singular values of H, i.e. Σ⁺ = diag(1/σ₁, 1/σ₂, …, 1/σ_N)

Potentially sensitive to numerical instability!

(Numerical sketches of the three computations follow below.)
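A minimal sketch of the three computations, again assuming NumPy rather than the MATLAB environment mentioned later in the slides; lam is an illustrative regularization parameter, not a value from the presentation.

```python
import numpy as np

rng = np.random.default_rng(0)
H = rng.normal(size=(200, 50))   # toy matrix: N=200 samples, M=50 hidden neurons
lam = 1e-3                       # illustrative ROP regularization parameter

# Orthogonal projection (OP): H+ = (H^T H)^-1 H^T  (requires H^T H to be well conditioned)
H_op = np.linalg.inv(H.T @ H) @ H.T

# Regularized OP (ROP): H+ = (H^T H + lam*I)^-1 H^T
H_rop = np.linalg.inv(H.T @ H + lam * np.eye(H.shape[1])) @ H.T

# SVD: H+ = V S+ U^T, singular values below a small cutoff are dropped
U, s, Vt = np.linalg.svd(H, full_matrices=False)
s_inv = np.where(s > 1e-12 * s.max(), 1.0 / s, 0.0)
H_svd = Vt.T @ np.diag(s_inv) @ U.T

print(np.allclose(H_op, H_svd, atol=1e-8))   # OP and SVD agree when H has full column rank
```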
Summary
- Introduction
- How to use pseudoinversion for neural training
- How to evaluate pseudoinverse matrices
- The application: an astronomical problem
- Results and discussion
Chromaticity diagnosis
The measured image profile of a star depends on its spectral type; the resulting error on the measured position is called chromaticity
Its correction is a major issue of the European Space Agency (ESA)
mission Gaia for global astrometry, approved for launch in 2013
NN inputs: the first 5 statistical moments (K = 1, …, 5) of each simulated image, where:
s(x_n) = signal detected on pixel n, s_A(x_n) = ideal signal, x_COG = signal barycenter,
evaluated both for ‘blue’ and ‘red’ stars, plus the ‘red’ barycenter.
The NN models therefore have 11 input neurons and 1 output neuron to detect chromaticity
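The exact moment formula is not reproduced on the slide, so the sketch below shows only one plausible reading, stated as an assumption: weighted moments of the detected signal about the ideal-signal barycenter, for orders K = 1, …, 5. The function names, the reference-point choice, and the toy Gaussian profiles are all illustrative.

```python
import numpy as np

def profile_moments(s, x, x_ref, order_max=5):
    """Statistical moments of a detected 1-D profile s(x_n) about a reference abscissa x_ref.

    Illustrative form only (the slide does not reproduce the definition):
    M_K = sum_n s(x_n) (x_n - x_ref)^K / sum_n s(x_n),  K = 1..order_max.
    """
    s, x = np.asarray(s, float), np.asarray(x, float)
    return np.array([(s * (x - x_ref) ** k).sum() / s.sum()
                     for k in range(1, order_max + 1)])

def barycenter(s, x):
    s, x = np.asarray(s, float), np.asarray(x, float)
    return (s * x).sum() / s.sum()

# Toy detected 'blue' and 'red' profiles and an ideal (chromaticity-free) profile s_A on 12 pixels
x = np.arange(12.0)
s_A = np.exp(-0.5 * ((x - 5.4) / 1.2) ** 2)
blue = np.exp(-0.5 * ((x - 5.3) / 1.2) ** 2)
red = np.exp(-0.5 * ((x - 5.5) / 1.3) ** 2)

x_cog_A = barycenter(s_A, x)      # ideal-signal barycenter, used as reference (assumption)
x_cog_red = barycenter(red, x)    # 'red' barycenter, used as an extra input
inputs = np.concatenate([profile_moments(blue, x, x_cog_A),
                         profile_moments(red, x, x_cog_A),
                         [x_cog_red]])
print(inputs.shape)   # (11,) -> the 11 NN inputs of the slide
```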
Summary
- Introduction
- How to use Pseudoinversion for neural training
- How to evaluate pseudoinverse matrices
- The application: an astronomical problem
- Results and discussion
Reference result
SLFN with 11 input neurons and 1 output neuron, trained with
backpropagation algorithm
Activation functions: hyperbolic tangent (fewer saturation problems thanks to its zero-mean output)
Training set size: 10000 instances, test size: 3000 instances
We look for the minimum RMSE as the hidden layer size increases from 10 to 200, with learning rate η in the range (0.1, 0.9)
Best RMSE: 3.81, with 90 hidden neurons
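A minimal sketch of such a baseline sweep, under the assumption that an off-the-shelf backpropagation trainer (scikit-learn's MLPRegressor with tanh units and SGD) is an acceptable stand-in for the authors' implementation; X_train, y_train, X_test, y_test are placeholders for the real 10000/3000 chromaticity instances.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error

# Placeholder data with the right shapes (replace with the real 11-input chromaticity sets)
rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(10000, 11)), rng.normal(size=10000)
X_test, y_test = rng.normal(size=(3000, 11)), rng.normal(size=3000)

best = (np.inf, None, None)
for n_hidden in range(10, 201, 10):          # hidden layer size swept from 10 to 200
    for eta in np.arange(0.1, 1.0, 0.2):     # learning rate eta in (0.1, 0.9)
        net = MLPRegressor(hidden_layer_sizes=(n_hidden,), activation='tanh',
                           solver='sgd', learning_rate_init=eta,
                           max_iter=200, random_state=0)
        net.fit(X_train, y_train)
        rmse = mean_squared_error(y_test, net.predict(X_test)) ** 0.5
        if rmse < best[0]:
            best = (rmse, n_hidden, eta)

print("best RMSE %.3g with %d hidden neurons, eta=%.1f" % best)
```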
Pseudoinversion results (1)
Hidden Space Related Pseudoinversion (HSR-Pinv), sketched below:
- Input weights: randomly chosen from a uniform distribution in the interval [-1/M, 1/M]
  This controls saturation issues, forcing the use of the central part of the hyperbolic activation function
- Output weights: evaluated by pseudoinversion via SVD
- Hidden layer size is gradually increased from 50 to 600
- 10 simulation trials are performed for each selected size

σ-SVD (state of the art): sigmoid activation functions and random weights uniformly distributed in (-1, 1)
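A minimal sketch of one HSR-Pinv trial under the assumptions above (NumPy, the [-1/M, 1/M] reading of the weight interval, and biases scaled the same way, which the slide does not specify); the function names and the placeholder arrays are illustrative, the real experiments used the 11-input chromaticity data.

```python
import numpy as np

def hsr_pinv_train(X, T, M, rng):
    """One HSR-Pinv trial: random input weights in [-1/M, 1/M], output weights by SVD pseudoinversion."""
    d = X.shape[1]
    C = rng.uniform(-1.0 / M, 1.0 / M, size=(M, d))   # input weights, kept in the central part of tanh
    b = rng.uniform(-1.0 / M, 1.0 / M, size=M)        # hidden biases, same scaling (assumption)
    H = np.tanh(X @ C.T + b)                          # hidden layer output matrix
    w = np.linalg.pinv(H) @ T                         # output weights via SVD-based pseudoinversion
    return C, b, w

def predict(X, C, b, w):
    return np.tanh(X @ C.T + b) @ w

# Placeholder data; hidden layer size swept from 50 to 600, 10 trials per size
rng = np.random.default_rng(0)
X, T = rng.normal(size=(10000, 11)), rng.normal(size=(10000, 1))
X_te, T_te = rng.normal(size=(3000, 11)), rng.normal(size=(3000, 1))
for M in range(50, 601, 50):
    rmses = []
    for _ in range(10):
        C, b, w = hsr_pinv_train(X, T, M, rng)
        rmses.append(np.sqrt(np.mean((predict(X_te, C, b, w) - T_te) ** 2)))
    print(M, np.mean(rmses))
```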
Pseudoinversion results (2)
Best results are achieved with
the proposed HSR method (blue
curve)
The same method used with
sigmoid functions performs
slightly worse (green curve)
‘Constant weights size +
pseudoinversion’ approach clearly
shows worse performance
(red and pale blue curves)
Hypothesis: saturation control prevents specialization on particular training instances, thus avoiding overfitting
Pseudoinversion results (3)
Error peak: the ratio of the minimum singular value to the MATLAB default threshold approaches unity in the peak region (logarithmic units)
Results better than BP are nevertheless obtained with fewer neurons (roughly 150)
Solution: Threshold tuning
The new threshold is a function of the size of the singular values near the peak region (180 hidden neurons)
Greater robustness, slight RMSE
increase
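A minimal sketch of what such a diagnosis and tuning could look like with NumPy's pinv, where the rcond cutoff (singular values below rcond · σ_max are discarded) plays the role of the MATLAB threshold; the specific cutoff value is illustrative, not the one chosen by the authors.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10000, 11))
C = rng.uniform(-1 / 180, 1 / 180, size=(180, 11))   # 180 hidden neurons, near the peak region
H = np.tanh(X @ C.T)
T = rng.normal(size=(10000, 1))

# Diagnostic: ratio of the smallest singular value to a MATLAB-style default tolerance
# (approximately max(size) * eps * sigma_max; NumPy's own pinv default is a fixed rcond = 1e-15)
s = np.linalg.svd(H, compute_uv=False)
default_cutoff = max(H.shape) * np.finfo(H.dtype).eps * s.max()
print("min(sigma) / default threshold =", s.min() / default_cutoff)

# Tuned threshold: raising rcond discards the near-null directions,
# typically giving a smaller-norm, more robust solution (cutoff value is illustrative)
w_default = np.linalg.pinv(H) @ T
w_tuned = np.linalg.pinv(H, rcond=1e-6) @ T
print(np.linalg.norm(w_default), np.linalg.norm(w_tuned))
```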
Further developments
The issues of overfitting and numerical instability seem to have a dramatic impact on performance
Regularization (Tikhonov 1963, 1977) is an established method for dealing with ill-posed problems: thanks to the introduction of a penalty term, it seems promising for avoiding overfitting
Possible effects on instability control also have to be investigated