Ph.D. Final Exam
Neural Network Ensonification Emulation:
Training And Application
JAE-BYUNG JUNG
Department of Electrical Engineering
University of Washington
August 8, 2001
Overview
Review of adaptive sonar
Neural network training for varying output nodes
– On-line training
– Batch-mode training
Neural network inversion
Sensitivity analysis
Maximal area coverage problem
Conclusions and ideas for future work
2
INTRODUCTION - Sonar Surveillance
[Block diagram: Environment and Control inputs drive the Sonar Surveillance System, which produces a Sonar Performance Map]
Software model emulating acoustic propagation
Computationally intensive
Not suitable for real-time control
3
Sonar Performance Map
4
Sonar Data 1
The physical range-depth output surveillance area is sampled at 30 ranges from 0 to 6 km in steps of 0.2 km and 13 depths from 0 to 200 m in steps of 15 m.
Data size: 2,500 pattern vectors (2,000 patterns are used for training the neural network; 500 patterns are withheld from training and reserved for testing).
Input parameters:
1. Sonar depth [m]
2. Wind speed [m/s]
3. Surface sound speed [m/s]
4. Bottom sound speed [m/s]
5. Bottom type, grain size
Output parameters:
1~390. 13x30 range-depth SE values [dB]
5
Sonar Data 2
The wider surveillance area is considered, including 75 sampled ranges from 0 to 15 km in steps of 0.2 km and 20 sampled depths from 0 to 400 m in steps of 20 m.
The shape of the SE map varies with the bathymetry.
Data size: 8,000 pattern vectors (5,000 patterns are used for training the neural network; 3,000 patterns are withheld from training and reserved for testing).
6
Neural Network Replacement
Fast Reproduction of SE Map
Inversion (Derivative Existence)
Real-time Control (Optimization)
7
Training NN
High dimensionality of output space
– Multi-Layered Perceptrons
– Neural Smithing (e.g., input data jittering, pattern clipping, and weight decay)
Widely varying bathymetry
– Adaptive training strategy for flexible output dimensionality
8
MLP Training
Multilayer perceptrons (MLPs) typically use a fixed network topology for all training patterns in a data set.
[Diagram: an MLP maps the input vector i to an output vector, which is compared against the target vector t]
9
MLP Training with varying output
We consider the case where the dimension of the output nodes can vary from training pattern to training pattern.
[Diagram: an MLP with characteristic inputs i and supervisory "don't care" inputs i_DC; its output is compared against the target t and the don't-care target t_DC]
10
Flexible Dimensionality
Generally, an MLP must have a fixed network topology
– A single neural network cannot handle a flexible network topology
A modular neural network structure can provide local experts for dimension-specific training patterns
– It becomes increasingly difficult, however, to implement a large number of neural networks as the number of local experts increases
11
Flexible Dimensionality
Let's define a new output vector, O'(n), for the transformation to a fixed output dimensionality as
$O'(n) = \{\, O(n),\; O_A(n) \,\}$
where O(n) is the nth actual output training pattern vector and O_A(n) is an arbitrary output vector appended to O(n) and filled with arbitrary "don't care" constant values.
The dimensionality is enlarged to span all patterns and is fixed as
$D(O') = \min D\!\left(\mathrm{Span}_{n=1}^{N}\, O(n)\right)$
where N is the number of training pattern vectors, Span(·) represents the dimensional span from each output vector to the maximally expandable dimensions over the N pattern vectors, and D(·) represents the dimensionality of the output vector.
12
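To make the idea concrete, here is a minimal NumPy sketch of the transformation to a fixed output dimensionality: a variable-length output pattern is embedded into a fixed-size vector padded with a "don't care" constant, together with a mask marking the active part. The fill value, function name, and dimensions are illustrative assumptions, not the implementation used in the thesis.

```python
import numpy as np

DONT_CARE = 0.0  # arbitrary "don't care" fill value (assumption)

def pad_to_fixed(o_n, active_idx, fixed_dim):
    """Embed a variable-length output pattern O(n) into a fixed-dimensional
    vector O'(n) = {O(n), O_A(n)} and return a mask marking the active part."""
    o_prime = np.full(fixed_dim, DONT_CARE)
    mask = np.zeros(fixed_dim, dtype=bool)
    o_prime[active_idx] = o_n          # actual outputs O(n)
    mask[active_idx] = True            # True where outputs are real, False for O_A(n)
    return o_prime, mask

# usage: a pattern with only 3 valid output pixels inside a 6-dimensional grid
o_prime, mask = pad_to_fixed(np.array([0.2, 0.5, 0.9]), active_idx=[0, 1, 4], fixed_dim=6)
```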
Flexible Dimensionality
Train a single neural network using the fixed-dimensional output vector O'(n) by
1) Filling an arbitrary constant value into O_A(n)
– High spatial-frequency components are washed out
2) Smearing neighborhood pixels into O_A(n)
– The unnecessary part still needs to be trained (longer training time)
O_A(n) can be ignored when O'(n) is projected onto O(n) in the testing phase.
13
Don’t Care Training
Inputs: I(n) = {I_C(n), I_P(n)}
• I_P(n): profile inputs; they describe the output profile and assign each output neuron to either O(n) or O_A(n)
• I_C(n): characteristic inputs; they contain the other input parameters
Outputs: O'(n) = {O(n), O_A(n)}
• O_A(n): "don't care" category; the weights associated with these neurons are not updated for the nth pattern vector
• O(n): normal weight correction; the other weights, associated with O(n), are updated with step-size modification
[Diagram: input layer (I_C(n), I_P(n)), hidden layer, and output layer partitioned into O(n) and O_A(n)]
14
Don’t Care Training
Advantages
– Significantly reduced training time by not correcting weights in the don't-care category
– The boundary problem is solved
– Fewer training vectors required
– Focus on active nodes only
Drawbacks
– Rough weight space due to irregular weight correction
– Possibly leading to local minima
15
Step Size Modification
Give every output neuron an equal opportunity for weight correction.
Statistical information: from the training data set, the frequency of weight correction $f(o_j)$ associated with each output neuron is accumulated:
$P_W(o_j) = \dfrac{f(o_j)}{N}$
16
Step Size Modification
On-line training
$\eta_j^O = \dfrac{\eta}{P_W(o_j)}$, for output neuron $o_j \in O(n)$
$\eta_j^O = 0$, for output neuron $o_j \in O_A(n)$
$\eta^H = \eta$, for all hidden neurons
Batch-mode training
$\hat{\eta}_j^O = \dfrac{\hat{\eta}}{P_W(o_j)}$, for output neuron $o_j$
$\hat{\eta}^H = \hat{\eta}$, for all hidden neurons
17
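As a sketch of how the don't-care update and step-size modification fit together, the following NumPy fragment performs one on-line update for a single-hidden-layer sigmoid MLP: output errors in the don't-care category are zeroed so their weights stay untouched, active output neurons use the modified step size $\eta / P_W(o_j)$, and hidden weights use the plain step size. The one-hidden-layer architecture, sigmoid activations, and variable names are assumptions for illustration, not the thesis code.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dont_care_update(W1, W2, x, target, mask, p_w, eta=0.01):
    """One on-line "don't care" update for a 1-hidden-layer sigmoid MLP.
    mask[j] is True for o_j in O(n); p_w[j] = P_W(o_j) estimated from the data set."""
    h = sigmoid(W1 @ x)                       # hidden activations
    o = sigmoid(W2 @ h)                       # output activations
    delta_o = np.zeros_like(o)
    delta_o[mask] = (target[mask] - o[mask]) * o[mask] * (1.0 - o[mask])
    eta_o = np.zeros_like(o)
    eta_o[mask] = eta / p_w[mask]             # step-size modification: eta / P_W(o_j)
    delta_h = (W2.T @ delta_o) * h * (1.0 - h)
    W2 += (eta_o * delta_o)[:, None] * h[None, :]   # don't-care rows stay untouched
    W1 += eta * delta_h[:, None] * x[None, :]       # hidden weights use plain eta
    return o
```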
Performance Comparison
MSE (Mean Squared Error)
$e_j(n) = d_j(n) - o_j(n)$, for $o_j \in O(n)$
$E(n) = \dfrac{1}{2} \sum_{\{j \mid o_j \in O(n)\}} e_j^2(n)$
$E_{MSE} = \dfrac{1}{N} \sum_{n}^{N} E(n)$
MSE cannot represent the training performance well because the vector size (dimensionality) differs from pattern to pattern.
Average MSE: a pixel-wise representation of MSE
$E_{AMSE} = \dfrac{1}{N} \sum_{n}^{N} \dfrac{E(n)}{D(O(n))}$
18
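A small helper makes the difference between the two measures explicit: $E_{MSE}$ averages the per-pattern errors directly, while $E_{AMSE}$ first normalizes each $E(n)$ by that pattern's own output dimensionality $D(O(n))$. The function name and example values below are illustrative.

```python
import numpy as np

def mse_and_amse(errors, dims):
    """errors[n] = E(n) = 0.5 * sum of squared errors over the active pixels of
    pattern n; dims[n] = D(O(n)), the number of active output pixels."""
    errors = np.asarray(errors, dtype=float)
    dims = np.asarray(dims, dtype=float)
    e_mse = errors.mean()                # (1/N) * sum_n E(n)
    e_amse = (errors / dims).mean()      # (1/N) * sum_n E(n) / D(O(n))
    return e_mse, e_amse

# usage: three patterns with different output dimensionalities
print(mse_and_amse(errors=[3.2, 1.1, 8.4], dims=[390, 40, 390]))
```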
Performance Comparison: Training of neural networks
19
Training Sample
20
Performance Comparison: Generalization performance from testing error
21
Testing Sample
22
Contributions: Training
A novel neural network learning algorithm for
data sets with varying output node dimension is
proposed.
– Selective weight update
• Fast convergence
• Improved accuracy
– Step size modification
• Good generalization
• Improved accuracy
23
Inversion of neural network
NN Training: finding the weights W from a given input-output relationship.
NN Inversion: finding the input I from a given target output T.
[Diagram legend: W = NN weights, I = input, O = output, T = target output]
24
Inversion of neural network
We want to find a subset of the input vector, i, that minimizes the objective function
$E(i) = \tfrac{1}{2}\,(t_i - o_i)^2$
where $o_i$ is the neural network output for input i and $t_i$ is the desired output for input i.
If $i_k^t$ is the kth component of the vector $i^t$, then gradient descent suggests the recursion
$i_k^{t+1} = i_k^t - \eta \dfrac{\partial E}{\partial i_k^t}$
where $\eta$ is the step size and t is the iteration index.
The iteration for inversion can be evaluated as
$\dfrac{\partial E}{\partial i_k} = \delta_k, \quad k \in I$
$\delta_j = \varphi_j'(net_j)\,(o_j - t_j), \quad j \in O$
$\delta_j = \varphi_j'(net_j) \sum_{m \in H,\,O} \delta_m w_{jm}, \quad j \in I, H$
where, for any neuron,
I, H, O are the sets of input, hidden, and output neurons respectively,
$w_{jm}$ is the weight value connecting neuron j to neuron m,
$\varphi_j'$ is the derivative of the jth neuron's squashing function,
$o_j$ is the activation of the jth neuron,
$t_j$ is the desired output of the jth neuron, and
$net_j$ is the weighted sum of the incoming signals to the jth neuron.
25
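The recursion above can be sketched directly for a trained single-hidden-layer sigmoid MLP: back-propagate the output error to the input and take gradient steps on the input itself, optionally clamping some components (the environmental parameters) and updating only the free ones (the control parameters). The network shape and names are assumptions for illustration, not the thesis code.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def invert_input(W1, W2, target, i0, eta=0.1, iters=500,
                 free_in=None, active_out=None):
    """Gradient-descent inversion: adjust the free input components so that the
    selected outputs approach `target`. Clamped inputs keep their i0 values;
    outputs outside `active_out` are floated (ignored by the error)."""
    i = i0.astype(float).copy()
    free_in = np.ones_like(i, dtype=bool) if free_in is None else free_in
    for _ in range(iters):
        h = sigmoid(W1 @ i)
        o = sigmoid(W2 @ h)
        err = o - target
        if active_out is not None:
            err = np.where(active_out, err, 0.0)        # float the non-target outputs
        delta_o = err * o * (1.0 - o)                   # output-layer deltas
        delta_h = (W2.T @ delta_o) * h * (1.0 - h)      # hidden-layer deltas
        grad_i = W1.T @ delta_h                         # dE/di_k via the chain rule
        i[free_in] -= eta * grad_i[free_in]             # update only the free inputs
    return i
```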
Single element inversion
The subset of outputs to be inverted is confined to one output pixel at a time during the inversion process, while the other outputs are floated.
A single input control parameter is solved for during the iterative inversion process while the 4 environmental parameters are clamped (fixed) to specific values [wind speed = 7 m/s, sound speed at surface = 1500 m/s, sound speed at bottom = 1500 m/s, and bottom type = 9 (soft mud)].
26
Multiple parameter inversion
and maximizing the target area
Multiple output SE values can be inverted at a time.
The output target area is tiled with 2x2 pixel regions. Thus, 2x2 output pixel groups are inverted one at a time to find the best combination of the 5 input parameters that satisfies the corresponding SE values.
27
Pre-Clustering for NN Inversion
Pre-clustering of data set from the output space
Separate training of partitioned data sets
(Local Experts)
Training Improvement
Inversion Improvement
28
Unsupervised Clustering
Partitioning a collection of data points into a
number of subgroups
When we know the number of prototypes
– K-NN, Fuzzy C-Means, …
When no a priori information is available
– ART, Kohonen SOFM, …
29
Adaptive Resonance Theory
Unsupervised learning network developed by Carpenter and Grossberg in 1987.
ART1 is designed for clustering binary vectors; ART2 accepts continuous-valued vectors.
– The F1 layer is an input processing field comprising an input portion and an interface portion.
– The F2 layer is a cluster layer; it is competitive in that its units compete in a winner-take-all mode for the right to learn each input pattern.
– The third component is a reset mechanism that controls the degree of similarity of patterns placed on the same cluster.
[Diagram: ART2 architecture with input vector S, F1 sublayers W, X, U, V, P, Q, F2 layer units Y, and reset unit R]
30
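For intuition, the sketch below implements a much-simplified ART-style clustering loop: normalize each input, let the closest prototype win, accept the winner only if the match passes a vigilance test, and otherwise commit a new cluster. It omits the F1/F2 sublayer dynamics and reset machinery of real ART2; the vigilance value and learning rate are illustrative.

```python
import numpy as np

def art_like_cluster(patterns, vigilance=0.9, lr=0.2):
    """Simplified ART-style clustering: assign each pattern to the closest
    existing prototype if it passes the vigilance test, otherwise create a
    new cluster. A sketch, not the full F1/F2/reset dynamics of ART2."""
    prototypes, labels = [], []
    for p in patterns:
        p = p / (np.linalg.norm(p) + 1e-12)          # ART2-style normalization
        if prototypes:
            sims = [float(p @ w) for w in prototypes]
            j = int(np.argmax(sims))                 # F2 winner-take-all
            if sims[j] >= vigilance:                 # vigilance test passed
                prototypes[j] = (1 - lr) * prototypes[j] + lr * p
                prototypes[j] /= np.linalg.norm(prototypes[j])
                labels.append(j)
                continue
        prototypes.append(p.copy())                  # reset: commit a new cluster
        labels.append(len(prototypes) - 1)
    return labels, prototypes
```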
Training Phase
The entire data set is partitioned by ART2 (unsupervised learning) into K clusters; each sub-data set (cluster 1, ..., cluster K) then trains its own local expert ANN 1, ..., ANN K (supervised learning).
[Bar chart of RMS training errors: No Clustering 2.19, Cluster 1 1.95, Cluster 2 1.89, Cluster 3 1.57, Cluster 4 1.69, Cluster 5 1.58]
31
Testing Comparison
32
Inversion Phase
The desired output (M-dimension) is projected onto the ART2 cluster prototypes (N-dimension) for cluster selection; the selected cluster's local expert (ANN 1, ..., ANN K) is then inverted to produce the optimal input parameters.
[Flow diagram: Desired Output → Cluster Selection (Projection) → ANN k Inversion → Optimal Input Parameters]
33
Inversion
from ART2 Modular Local Experts
Multiple parameter inversion and maximizing the target area.
The output target area is tiled with 2x2 pixel regions. Thus, 2x2 output pixel groups are inverted one at a time to find the best combination of the 5 input parameters that satisfies the corresponding SE values.
34
Contributions: Inversion
A new neural network inversion algorithm was proposed whereby several neural networks are inverted in parallel.
Advantages include the ability to segment the problem into multiple sub-problems, each of which can be independently modified as changes to the system occur over time.
The concept is similar to the mixture-of-experts approach applied to neural network inversion.
35
Sensitivity Analysis
Feature selection as the neural network is being trained or after training; useful for eliminating superfluous input parameters [Rambhia]:
– reducing the dimension of the decision space, and
– increasing the speed and accuracy of the system.
When implemented in hardware, the non-linearities occurring in the operation of various network components may make a network practically impossible to train [Jiao]:
– very important in the investigation of non-ideal effects (an important issue from the engineering point of view).
Once the neural network is trained, it is very important to determine which of the control parameters are critical to decision making at a certain operating point.
36
NN Sensitivity
The neural network sensitivity shows how sensitively the target output $O_T$ responds to a change of the input parameter $i_k$.
[Diagram: input $i_k$ feeding the target surveillance area $O_T$ and the don't care area $O_{DC}$]
37
NN Sensitivity
The chain rule is used to derive $\dfrac{\partial O_T}{\partial i_k}$:
$\dfrac{\partial O_T}{\partial i_k} = \dfrac{\partial O_T}{\partial h} \cdot \dfrac{\partial h}{\partial i_k}$
where h represents the hidden neurons.
Generally, for n hidden layers,
$\dfrac{\partial O_T}{\partial i_k} = \dfrac{\partial O_T}{\partial h_n} \cdot \dfrac{\partial h_n}{\partial h_{n-1}} \cdots \dfrac{\partial h_2}{\partial h_1} \cdot \dfrac{\partial h_1}{\partial i_k}$
where $h_n$ represents the neurons in the nth hidden layer.
38
Inversion vs. Sensitivity
Inversion finds $\dfrac{\partial E}{\partial i_k^t}$ from the output error to the input using the chain rule, and uses it to update the input:
$i_k^{t+1} = i_k^t - \eta \dfrac{\partial E}{\partial i_k^t}$
Sensitivity finds $\dfrac{\partial O}{\partial i_k}$ from the output to the input using the chain rule.
[Diagram: inversion back-propagates the output error $e_i = t_i - o_i$ (through a factor of -1) from $net_i$ to the input; sensitivity back-propagates the output activation $o_i$ itself]
39
NN Sensitivity – output neuron
1. Output layer: find the local gradient of $o_i$ at output neuron i
$\delta_i = \dfrac{\partial o_i}{\partial net_i} = \varphi'(net_i)$
where $net_i = \sum_j h_j w_{ij}$, $\varphi(net_i) = 1/(1 + \exp(-net_i))$, and $\varphi'(net_i) = \varphi(net_i)\,(1 - \varphi(net_i))$.
[Diagram: hidden activations $h_j$ feed output neuron i through weights $w_{ij}$, giving $net_i$ and activation $o_i$]
40
NN Sensitivity – hidden neuron
2. Hidden layers: find the gradient at hidden neuron j with respect to $O_T$
$\delta_j = \varphi_j'(net_j) \sum_{i \in O_T} \delta_i w_{ji}$
[Diagram: output deltas $\delta_i$ propagate back to hidden neuron j through weights $w_{ji}$]
41
NN Sensitivity – input neuron
3. Input layer: find the gradient of $O_T$ at input neuron k
$\delta_k = \sum_{j \in H} \delta_j w_{kj}$
$\dfrac{\partial O_T}{\partial i_k} = \delta_k$
42
Nonlinear Sensitivity
Absolute Sensitivity
$\dfrac{\partial O_T}{\partial i_k} = \delta_k$
Relative (Logarithmic) Sensitivity
$\dfrac{\partial \ln(O_T)}{\partial \ln(i_k)} = \dfrac{\partial O_T / O_T}{\partial i_k / i_k} = \dfrac{\partial O_T}{\partial i_k} \cdot \dfrac{i_k}{O_T} = \delta_k\, \dfrac{i_k}{O_T}$
43
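Following the three-step recursion above, the sketch below computes the absolute sensitivity $\partial O_T / \partial i_k$ and the relative (logarithmic) sensitivity for every input of a single-hidden-layer sigmoid MLP, where $O_T$ is taken as the sum of the outputs in the target surveillance area. The architecture and names are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sensitivities(W1, W2, x, target_idx):
    """Absolute and relative sensitivity of a summed target-area output O_T
    with respect to each input, for a 1-hidden-layer sigmoid MLP."""
    h = sigmoid(W1 @ x)
    o = sigmoid(W2 @ h)
    O_T = o[target_idx].sum()                       # scalar target-area output
    delta_o = np.zeros_like(o)
    delta_o[target_idx] = o[target_idx] * (1 - o[target_idx])   # phi'(net_i) at the output
    delta_h = (W2.T @ delta_o) * h * (1 - h)        # hidden-layer deltas
    delta_in = W1.T @ delta_h                       # dO_T/di_k for every input k
    relative = delta_in * x / O_T                   # logarithmic sensitivity
    return delta_in, relative
```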
Sonar Sensitivity
44
Contributions: Sensitivity Analysis
Once the neural network is trained, it is very important to determine which of the control parameters are critical to decision making at a certain operating point, i.e., for a given environmental situation and/or control criteria.
45
Multiple Objects Optimization
Optimization of multiple objects in order to maximize the system's cooperative performance.
The composite effort of the system team is significantly more important than any single system's individual performance.
46
Target Covering Problem
Multiple rectangular boxes move to cover the circular target area.
Each box is specified by 4 parameters: 2 position variables (x0, y0), an orientation θ, and an aspect ratio r.
The area of each box is fixed.
47
Box Parameters
[Diagram: an L x W box centered at (x0, y0) and rotated by θ in the x-y plane]
$f_{xy}(x_0, y_0, \theta, r) = \Pi\!\left(\dfrac{(x - x_0)\cos\theta + (y - y_0)\sin\theta}{L}\right) \cdot \Pi\!\left(\dfrac{(y - y_0)\cos\theta - (x - x_0)\sin\theta}{W}\right)$
where $L = \sqrt{A/r}$, $W = \sqrt{rA}$, and $\Pi(x) = \begin{cases} 1, & \text{if } |x| \le \tfrac{1}{2} \\ 0, & \text{else} \end{cases}$
48
Target Parameters
[Diagram: circular target of radius R centered at (c1, c2) in the x-y plane]
$t(x, y) = \exp\!\left(-\dfrac{(x - c_1)^2 + (y - c_2)^2}{2R^2}\right)$
where (c1, c2) is the center of gravity and R is the radius of the circular target.
49
Aggregation & Evaluation
Aggregation of N boxes:
$g(x, y) = \sum_{i}^{N} f_{xy}(x_{i0}, y_{i0}, \theta_i, r_i)$
Evaluation of coverage:
$\mathrm{eval}(x, y) = g(x, y) \cdot t(x, y)$
50
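Under the reconstructed definitions above (rectangular box indicator, Gaussian target weighting, and a pointwise product for the coverage evaluation), a coverage fitness can be sketched as follows; the grid size, box parameter tuples, and function names are illustrative assumptions.

```python
import numpy as np

def box_coverage(X, Y, x0, y0, theta, r, A):
    """f_xy: 1 inside a rotated L x W box of fixed area A centered at (x0, y0)."""
    L, W = np.sqrt(A / r), np.sqrt(r * A)
    u = (X - x0) * np.cos(theta) + (Y - y0) * np.sin(theta)   # along-length coordinate
    v = (Y - y0) * np.cos(theta) - (X - x0) * np.sin(theta)   # along-width coordinate
    return ((np.abs(u / L) <= 0.5) & (np.abs(v / W) <= 0.5)).astype(float)

def target(X, Y, c1, c2, R):
    """t(x, y): Gaussian weighting of the circular target."""
    return np.exp(-((X - c1) ** 2 + (Y - c2) ** 2) / (2 * R ** 2))

def fitness(boxes, A=1000, c=(128, 128), R=64, grid=256):
    """Aggregate N boxes and score the coverage of the target, summed over the grid."""
    xs = np.arange(grid)
    X, Y = np.meshgrid(xs, xs)
    g = sum(box_coverage(X, Y, *b, A) for b in boxes)   # g(x, y)
    return float(np.sum(g * target(X, Y, *c, R)))       # sum of eval(x, y)

# usage: two boxes, each given as (x0, y0, theta, r)
print(fitness([(100, 128, 0.0, 2.0), (156, 128, np.pi / 2, 2.0)]))
```

Note that, under this definition, g(x, y) exceeds 1 where boxes overlap, so overlap is double-counted; an optimizer maximizing this score is therefore rewarded for spreading the boxes over the target.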
Genetic Algorithm (Optimization)
Optimization deals with problems of function maximization or minimization in several variables, usually subject to certain constraints.
While traditional search algorithms commonly impose severe constraints on the functions to be minimized (or maximized), such as continuity, existence of derivatives, or unimodality, genetic algorithms work in a different way, acting as a global probabilistic search method.
Chromosomal representation:
Box 1: x1, y1, θ1, r1 | ... | Box N: xN, yN, θN, rN
51
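A compact, real-coded GA sketch built on the fitness function above is shown next; the slides use binary chromosomes with specific bit resolutions and crossover/mutation probabilities, so this is only a simplified stand-in with illustrative bounds, selection, and operator choices.

```python
import numpy as np

rng = np.random.default_rng(0)
# assumed gene bounds per box: x0, y0, theta, aspect ratio r (illustrative)
BOUNDS = np.array([(0.0, 256.0), (0.0, 256.0), (0.0, np.pi), (0.2, 5.0)])

def random_pop(pop_size, n_boxes):
    return rng.uniform(BOUNDS[:, 0], BOUNDS[:, 1], size=(pop_size, n_boxes, 4))

def evolve(pop, fitness_fn, gens=100, pc=0.7, pm=0.03):
    for _ in range(gens):
        scores = np.array([fitness_fn([tuple(b) for b in ind]) for ind in pop])
        pop = pop[np.argsort(scores)[::-1]]   # rank by fitness, best first
        parents = pop[: len(pop) // 2]        # truncation selection
        children = parents.copy()
        for child in children:
            if rng.random() < pc:             # crossover: take one box from a random parent
                other = parents[rng.integers(len(parents))]
                j = rng.integers(child.shape[0])
                child[j] = other[j]
            mask = rng.random(child.shape) < pm                      # mutation mask
            child[mask] = rng.uniform(BOUNDS[:, 0], BOUNDS[:, 1],
                                      size=child.shape)[mask]        # re-draw mutated genes
        pop = np.concatenate([parents, children])
    scores = np.array([fitness_fn([tuple(b) for b in ind]) for ind in pop])
    return pop[int(np.argmax(scores))]        # best individual found

# usage, roughly mirroring Experiment 1: 20 chromosomes, 2 boxes
best = evolve(random_pop(20, 2), fitness)
```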
Experiment 1 : 2-Box Problem
Box and Target Parameters
– A = 2000
– R = 64
– (C1, C2) = (128, 128)
GA Parameters
– Population Size = 20 chromosomes
– Bit Resolution = 8 bits
– Probability of Crossover = 0.7
– Probability of Mutation = 0.03
52
Experiment 2 : 4-Box Problem
Box and Target Parameters
– A = 1000
– R = 64
– (C1, C2) = (128, 128)
GA Parameters
– Population Size = 30 chromosomes
– Bit Resolution = 8 bits
– Probability of Crossover = 0.8
– Probability of Mutation = 0.02
53
MULTIPLE SONAR PING OPTIMIZATION
Sonar Coverage: when only the output pixels above a certain threshold value are meaningful, those pixels in the output surveillance area are considered "covered".
Maximizing the aggregated sonar coverage from a given number of pings allows minimization of counter-detection by the object we are looking for.
A genetic-algorithm-based approach is used to find the best combination of control parameters for a given environment.
54
Multiple Sonar Ping Coverage Optimization
55
Aggregation
Maximization: given the SE maps $O^1, O^2, \ldots, O^K$ from K pings, take the pixel-wise maximum
$O_{\max,j} = \max_{m=1}^{K} O_{m,j}$
Sigmoid squashing function (soft threshold $\theta$ on coverage)
$\tilde{O}_{\max,j} = \dfrac{1}{1 + \exp\!\big(-(O_{\max,j} - \theta)\big)}$
Summation
$A = \sum_{j=1}^{N} \tilde{O}_{\max,j}$
[Block diagram: MAX over the K ping outputs, sigmoid squashing, then SUM over the N output pixels to give the aggregate coverage A]
56
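A direct transcription of the reconstructed aggregation is sketched below; the threshold value and the extra slope factor are illustrative assumptions rather than the parameters used in the thesis.

```python
import numpy as np

def aggregate_coverage(ping_outputs, threshold=60.0, sharpness=1.0):
    """Aggregate coverage A for K ping SE maps: pixel-wise max over pings,
    sigmoid soft-threshold, then sum over all output pixels.
    `threshold` plays the role of the coverage threshold in the slides;
    `sharpness` is an extra slope factor (illustrative assumption)."""
    o_max = np.max(np.stack(ping_outputs), axis=0)            # O_max,j = max_m O_m,j
    squashed = 1.0 / (1.0 + np.exp(-sharpness * (o_max - threshold)))
    return float(np.sum(squashed))                            # A = sum_j squashed_j

# usage: two hypothetical 13 x 30 SE maps
maps = [np.random.uniform(0, 100, (13, 30)) for _ in range(2)]
print(aggregate_coverage(maps))
```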
Genetic Algorithm: Population
The sonar control parameter (depth) is in the range [10 m, 130 m] and the required precision is 3 places after the decimal point, so the required number of bits per depth is 17 (the range spans 120,000 steps of 0.001 m, and 2^17 = 131,072 is the smallest power of two that covers them). The population consists of 40 chromosomes, each a 68 (= 4 x 17) bit string:
Chromosome 1: depth 1 | depth 2 | depth 3 | depth 4 (17 bits each)
Chromosome 2: depth 1 | depth 2 | depth 3 | depth 4 (17 bits each)
...
Chromosome P: depth 1 | depth 2 | depth 3 | depth 4 (17 bits each)
57
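A hypothetical decoding of one 68-bit chromosome into its 4 depths, assuming a linear mapping of each 17-bit gene over the [10 m, 130 m] range, might look as follows; the function name and mapping convention are illustrative.

```python
import numpy as np

LOW, HIGH, BITS = 10.0, 130.0, 17          # depth range [10 m, 130 m], 17 bits per depth

def decode_depths(chromosome):
    """Map a 68-bit chromosome (4 x 17 bits) to 4 sonar depths in metres."""
    assert len(chromosome) == 4 * BITS
    depths = []
    for g in range(4):
        gene = chromosome[g * BITS:(g + 1) * BITS]
        value = int("".join(str(b) for b in gene), 2)        # 0 .. 2^17 - 1
        depths.append(LOW + value * (HIGH - LOW) / (2 ** BITS - 1))
    return depths

# usage: a random chromosome from a population of 40
chromosome = np.random.randint(0, 2, 4 * BITS)
print(decode_depths(chromosome))
```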
4 Sonar Ping Coverage Problem
58
2 Sonar Ping Problem
59
Contributions: Maximal Area Coverage
The systems need not be replications of each other but can, for example, specialize in different aspects of appeasing the fitness function.
The search can be constrained; for example, a constraint-imposing module can be inserted.
60
Conclusions
A novel neural network learning algorithm (don't-care training with step-size modification) for data sets with varying output dimension is proposed.
A new neural network inversion algorithm was proposed whereby several neural networks are inverted in parallel (with the ability to segment the problem into multiple sub-problems).
The sensitivity of the neural network is investigated. Once the neural network is trained, it is important to determine which control parameters are critical to decision making at a certain operating point; this can be done through input-parameter sensitivity analysis.
There exist numerous generalizations of the fundamental architecture of the maximal area coverage problem that allow application to a larger scope of problems.
61
Ideas for Future Work
More work could be done on more accurate training of the sonar data. In particular, multi-resolution neural networks could help extract discrete detection maps.
Data pruning using nearest-neighbor analysis before training, or query-based learning using sensitivity analysis during training, could improve the training time and/or accuracy.
Extensive research into the use of evolutionary algorithms could improve the inversion speed and precision. Particle swarm optimization or genetic algorithms could be considered for more flexibility in imposing feasibility constraints.
62