Study of Topographic and Equiprobable Mapping with Clustering for Fault Classification
Ashish Babbar
EE645 Final Project
1
Introduction
A typical control system consists of four basic elements:
Dynamic plant
Controllers
Actuators
Sensors
Any kind of malfunction in these components can result in an
unacceptable anomaly in overall system performance.
Such malfunctions are referred to as faults in a control system. The objective of
fault detection and identification is to detect, isolate, and identify
these faults so that system performance can be recovered.
Condition Based Maintenance (CBM) is the process of executing
repairs when objective evidence indicates a need for such actions,
in other words when anomalies or faults are detected in a control
system.
2
Motivation
Model-based CBM can be applied when we have a mathematical
model of the system to be monitored.
When CBM needs to be performed based only on the data available
from sensors, data-driven methodologies are utilized for this
purpose.
The SOM is widely used in data mining as a tool for exploration and
analysis of large amounts of data.
It can be used for data reduction or vector quantization, so that the
system data can be analyzed for anomalies using only the data
clusters formed from the trained map instead of the large initial data
sets.
3
Competitive Learning
Assume a sequence of input
samples v(t) in a d-dimensional input
space and a lattice of N neurons,
labeled i = 1, 2, ..., N, with
corresponding weight vectors
wi(t) = [wij(t)].
If v(t) can be compared simultaneously
with each weight vector
of the lattice, then the best matching
weight, say wi*, can be
determined and updated to match
the current input even better.
As a result of the competitive
learning, different weights
become tuned to different regions of
the input space.
[Figure: input layer presenting sample v to a lattice of output-layer neurons, with winning unit i*]
4
Self Organizing Maps
The SOM is an unsupervised neural network technique that finds
application in:
Density estimation, e.g. for clustering or classification purposes
Blind Source Separation
Visualization of data sets
It projects the input space onto the prototypes of a low-dimensional regular
grid that can be effectively utilized to visualize and explore
properties of the data.
The SOM consists of a regular, two-dimensional grid of map units
(neurons).
Each unit i is represented by a prototype vector
wi(t) = [wi1(t), ..., wid(t)],
where d is the input vector dimension.
5
Self Organizing Maps (Algorithm)
Given a data set, the number of map units (neurons) is first chosen.
The number of map units can be selected to be approximately between √N and
5√N, where N is the number of data samples in the given data set.
The SOM is trained iteratively. At each training step, a sample vector
v is randomly chosen from the input data set.
Distances between v and all prototype vectors are computed. The
Best Matching Unit (BMU), or winner, denoted here by b,
is the map unit whose prototype is closest to v:
||v – wb|| = min_i {||v – wi||}
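A minimal sketch of the BMU search in Python (illustrative only; the NumPy array names v and W are assumptions, not the project's code):

```python
import numpy as np

def best_matching_unit(v, W):
    """Return index b of the unit whose prototype in W (M x d) is closest to sample v (d,)."""
    dists = np.linalg.norm(W - v, axis=1)  # ||v - w_i|| for every unit i
    return int(np.argmin(dists))
```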
6
Self Organizing Maps
The BMU or winner and its topological neighbors are moved closer to
the input vector in the input space:
wi(t+1) = wi(t) + α(t)·hbi(t)·[v – wi(t)]
where t is the time step, α(t) is the adaptation coefficient, and hbi(t) is the
neighborhood kernel centered on the winner unit:
hbi(t) = exp( –||rb – ri||² / 2σ²(t) )
where rb and ri are the positions of neurons b and i on the SOM grid. Both
α(t) and σ(t) decrease monotonically with time.
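A minimal sketch of one training step with a Gaussian neighborhood on a rectangular grid (the grid-position array R and the scheduling of alpha and sigma outside this function are assumptions; this is not the project's implementation):

```python
import numpy as np

def som_step(v, W, R, alpha, sigma):
    """One SOM step. v: sample (d,), W: prototypes (M x d), R: grid positions (M x 2)."""
    b = np.argmin(np.linalg.norm(W - v, axis=1))                     # winner / BMU b
    h = np.exp(-np.sum((R - R[b]) ** 2, axis=1) / (2 * sigma ** 2))  # h_bi(t)
    W += alpha * h[:, None] * (v - W)                                # move winner and neighbors toward v
    return W
```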
7
Clustering- The Two Level Approach
Group the input data into clusters, where data points are placed in the
same cluster if they are similar to one another.
A widely adopted definition of clustering is a partitioning that
minimizes the distances within clusters and maximizes the distances between
clusters.
Once the neurons are trained, the next step is clustering of the SOM. For
clustering of the SOM the two-level approach is followed (a sketch appears below).
First, a set of neurons much larger than the expected number of
clusters is formed using the SOM.
In the next step, the neurons are combined to form the actual clusters
using the k-means clustering technique.
The number of clusters is K and the number of neurons is M, with
K << M << N
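A minimal sketch of level 2, assuming the M prototype vectors from a trained SOM are already available as a NumPy array and using scikit-learn's KMeans as one possible k-means implementation:

```python
import numpy as np
from sklearn.cluster import KMeans

def two_level_clustering(prototypes, n_clusters):
    """Level 2: group the M SOM prototypes (M x d) into K clusters with k-means (K << M << N)."""
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(prototypes)
    return km.cluster_centers_, km.labels_
```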
8
Two-Level Approach
[Diagram: Level 1 – the N data samples are mapped onto the M prototypes of the SOM; Level 2 – the M prototypes are grouped into K clusters; K < M < N]
9
Advantage of using the two level approach
The primary benefit is the reduction of computational cost. Even with a
relatively small number of data samples, many clustering algorithms
become computationally heavy.
For example, with the two-level approach the computational load is
reduced by a factor of about √N/15 relative to direct k-means clustering
of the data, i.e. roughly six-fold for N = 10,000.
Another benefit is noise reduction. The prototypes are local
averages of the data and are therefore less sensitive to random variations
than the original data.
10
K-means decision on number of clusters
The k-means algorithm was used at level 2 for clustering of the trained
SOM neurons.
The k-means algorithm clusters the given data into k clusters, where k is
user-defined.
One method to decide the value of k is to run the algorithm from k = 2
to k = √N, where N is the number of data samples.
The k-means algorithm minimizes the error function
E = Σ (k = 1 to C) Σ (x ∈ Qk) ||x – ck||²
where C is the number of clusters, Qk is the set of points assigned to cluster k,
and ck is the center of cluster k.
The approach followed in this project was to pick the number of
clusters as the smallest value of k for which the error E falls to 0.10·E′ to 0.15·E′,
i.e. 10 to 15% of E′, where E′ is the error for k = 2.
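A minimal sketch of this selection rule, using scikit-learn's KMeans inertia as the error E (the threshold fraction and variable names are illustrative assumptions):

```python
import numpy as np
from sklearn.cluster import KMeans

def select_n_clusters(prototypes, n_samples, frac=0.15):
    """Pick the smallest k for which the k-means error E(k) drops to frac * E', with E' = E(k=2)."""
    k_max = int(np.sqrt(n_samples))
    e_ref = KMeans(n_clusters=2, n_init=10).fit(prototypes).inertia_   # E' at k = 2
    for k in range(3, k_max + 1):
        e = KMeans(n_clusters=k, n_init=10).fit(prototypes).inertia_   # E = sum of squared distances
        if e <= frac * e_ref:
            return k
    return k_max
```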
11
Selection of number of clusters
12
Reference Distance Calculation
The clusters thus formed from the training/nominal data sets are used
to calculate the reference distance (dRef).
Knowing the cluster centers calculated by the k-means algorithm
and the prototypes/neurons that formed a particular cluster, we
calculate the reference distance for each cluster.
The reference distance specific to a particular cluster is the
distance between the cluster center and the prototype/neuron
belonging to this cluster that is farthest from the
cluster center.
In this way the reference distance for each of the clusters formed from
the nominal data set is calculated; it serves as a baseline for fault
detection.
To classify a given data cluster as nominal or faulty, this underlying
structure of the initially known nominal data set is used.
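A minimal sketch of the reference-distance calculation, assuming the k-means labels and centers from the nominal clustering are available (names are illustrative):

```python
import numpy as np

def reference_distances(prototypes, labels, centers):
    """d_ref[k] = distance from cluster k's center to its farthest member prototype."""
    d_ref = np.zeros(len(centers))
    for k, c in enumerate(centers):
        members = prototypes[labels == k]                      # prototypes assigned to cluster k
        d_ref[k] = np.linalg.norm(members - c, axis=1).max()   # maximum center-to-member distance
    return d_ref
```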
13
Fault Identification
The assumption made in this case is that the available nominal data sets,
which are used to form the underlying cluster structure,
span the space of all behavior that is not faulty.
The same procedure is then repeated for the unknown data sets (not
known to be nominal or faulty): first the N data points are reduced to a
mapping of M neurons, which are then clustered using the k-means algorithm.
Now, taking the training data clusters as centers and knowing the
reference distance for each cluster, we check whether the clusters from the
unknown data set fall inside the region spanned by a radius
equal to the reference distance for that training cluster.
Any unknown-data cluster that does not fall inside the region
spanned by taking a training data cluster as center and a radius
equal to the reference distance for that cluster is termed faulty.
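A minimal sketch of this membership test: an unknown-data cluster center is flagged as faulty if it lies outside every training cluster's reference-distance ball (illustrative, not the project's code):

```python
import numpy as np

def classify_unknown_clusters(unknown_centers, train_centers, d_ref):
    """Return True (faulty) for each unknown cluster center outside all reference-distance balls."""
    faulty = []
    for u in unknown_centers:
        dists = np.linalg.norm(train_centers - u, axis=1)   # distance to each nominal cluster center
        faulty.append(bool(np.all(dists > d_ref)))          # outside every nominal region -> fault
    return faulty
```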
14
Block Diagram
[Diagram: Training data (N samples) → mapping algorithm (M neurons) → clustering (K clusters) → reference distance; Unknown data (N samples) → mapping algorithm (M neurons) → clustering (K clusters) → distance deviation → fault identification; K << M << N]
15
SOM Using Nominal data
16
Clustering of nominal data
17
SOM using Unknown data
18
Clustering of unknown data
19
Fault Identification
20
SOM Performance
21
Equiprobable Maps
For self-organizing feature maps the weight density at convergence
is not a linear function of the input density p(v), and hence the
neurons of the map will not be active with equal probabilities (i.e. the
map is not equiprobabilistic).
For a discrete lattice of neurons, the weight density will be
proportional to:
p(wi) ∝ p(v)^(1/(1 + 2/d))
Regardless of the type of neighborhood function used, the SOM tends to
undersample the high-probability regions and oversample the low-probability
regions.
22
Avoiding Dead Units
The SOM algorithm can converge to a mapping that contains neurons that
are never active (“dead units”).
These units do not contribute to the minimization of the overall
(MSE) distortion of the map.
To produce maps in which the neurons have an equal
probability of being active (equiprobabilistic maps), the idea of adding a
conscience to the winning neuron was introduced.
The techniques for generating equiprobabilistic maps discussed here are:
Conscience Learning
Frequency Sensitive Competitive Learning (FSCL)
23
Conscience Learning
When a neural network is trained with unsupervised competitive
learning on a set of input vectors that are clustered into K
groups, a given input vector v will activate the neuron i* that
has been sensitized to the cluster containing that input vector.
However, if some region of the input space is sampled more
frequently than the others, a single unit begins to win all
competitions for this region.
To counter this defect, one records for each neuron i the frequency ci
with which it has won the competition in the past, and adds a scaled
version of this quantity to the Euclidean distance between the weight vector wi and
the current input v.
24
Conscience Learning
In conscience learning two stages are distinguished.
First, the winning neuron is determined out of the N units:
i* = arg min (i = 1, ..., N) ||wi – v||²
Second, the winning neuron i* is not necessarily the one that will
have its weight vector updated.
Which neuron is updated depends on an additional term
for each unit, related to the number of times the unit has
won the competition in the recent past.
The rule is that for each neuron the number of times it has
won the competition is recorded, and a scaled version of this quantity
is added to the distance metric used in the minimum Euclidean distance
rule.
25
Conscience Learning
Update rule:
||wi* – v|| + C·ci* ≤ ||wi – v|| + C·ci   for all i
with ci the number of times neuron i has won the competition, and
C the scaling factor (“conscience factor”).
After the winning neuron i* is determined, its conscience is
incremented:
ci* ← ci* + 1
The weight of the winning neuron is updated using
Δwi* = η·(v – wi*)
where η is the learning rate, a small positive
constant.
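A minimal sketch of one conscience-learning step under this reading of the rule (the win-count array c, conscience factor C_f, and learning rate eta are assumed parameters, not the project's code):

```python
import numpy as np

def conscience_step(v, W, c, C_f, eta):
    """Winner = unit minimizing distance plus scaled win count; only the winner is updated."""
    scores = np.linalg.norm(W - v, axis=1) + C_f * c   # ||w_i - v|| + C * c_i
    i_star = np.argmin(scores)
    c[i_star] += 1                                     # increment the winner's conscience
    W[i_star] += eta * (v - W[i_star])                 # UCL update of the winner only
    return W, c
```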
26
Conscience learning using nominal data
27
Clusters shown with neurons
28
Clustering of nominal data
29
Conscience learning on new data set
30
Clustering of unknown data set
31
Clusters represented on data set
32
Fault Identification
33
Conscience learning Performance
34
Frequency Sensitive Competitive Learning
Another competitive learning scheme used here is
Frequency Sensitive Competitive Learning (FSCL).
This learning scheme keeps a record of the total number of times
each neuron has won the competition during training.
The distance metric in the Euclidean distance rule is then scaled as
follows:
||wi* – v||·ci* ≤ ||wi – v||·ci   for all i
After selection of the winning neuron, its conscience is incremented
and the weight vector is updated using the UCL rule:
Δwi* = η·(v – wi*)
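A minimal sketch of one FSCL step, differing from conscience learning only in that the win count scales the distance multiplicatively (win counts assumed initialized to ones; names illustrative):

```python
import numpy as np

def fscl_step(v, W, c, eta):
    """Winner = unit minimizing win-count-scaled distance; winner's count and weight are updated."""
    scores = np.linalg.norm(W - v, axis=1) * c   # ||w_i - v|| * c_i
    i_star = np.argmin(scores)
    c[i_star] += 1
    W[i_star] += eta * (v - W[i_star])           # UCL update of the winner
    return W, c
```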
35
FSCL using Nominal data
36
Clustering of nominal data
37
Clusters shown with neurons
38
FSCL using the unknown data set
39
Clustering using unknown data
40
Clusters of unknown data
41
Fault Identification
42
FSCL performance
43
Conclusions
As shown in the results, the performance of the SOM algorithm was not
as good as that of the CLT and FSCL approaches.
Because the SOM produces dead units even when the neighborhood function
shrinks slowly, it was not able to train the neurons well
on the available data sets.
Due to undersampling of the high-probability regions, the SOM was able to
detect only two of the four faulty clusters, so its
performance was poor.
Using the CLT and FSCL approaches, all four faulty clusters were detected
using the reference distance as the distance measure.
Thus the equiprobable maps perform much better than the SOM by
avoiding dead units and training the neurons by assigning a
conscience to the winning neuron.
44
References
Marc M. Van Hulle, Faithful Representations and Topographic Maps: From Distortion- to Information-Based Self-Organization, John Wiley & Sons, 2000.
T. Kohonen, Self-Organizing Maps, Springer, 1997.
Anil K. Jain and Richard C. Dubes, Algorithms for Clustering Data, Prentice Hall, 1988.
Juha Vesanto and Esa Alhoniemi, “Clustering of the Self-Organizing Map,” IEEE Trans. Neural Networks, vol. 11, no. 3, May 2000.
45
Questions/Comments ?
46