Adaptive Fuzzy Clustering of Data With Gaps


Adaptive Clustering of Incomplete Data Using a Neuro-Fuzzy Kohonen Network
Outline
- Data with gaps clustering on the basis of neuro-fuzzy Kohonen network
- Adaptive algorithm for probabilistic fuzzy clustering
- Adaptive probabilistic fuzzy clustering algorithm for data with missing values
- Adaptive algorithm for possibilistic fuzzy clustering
- Adaptive algorithm for possibilistic fuzzy clustering of data with missing values
Introduction

The clustering problem for multivariate observations is often encountered in applications connected with Data Mining and Exploratory Data Analysis. The conventional approach to solving it requires that each observation belong to exactly one cluster. There are, however, many situations in which a feature vector can belong to several classes with different levels of probability or possibility. This situation is the subject of fuzzy cluster analysis, which is being intensively developed today.

In many practical Data Mining tasks, including clustering, data sets may contain gaps: entries whose values, for whatever reason, are missing. In this situation the more effective approaches are those based on the mathematical apparatus of Computational Intelligence, first of all artificial neural networks and various modifications of the classical fuzzy c-means (FCM) method.
Processing of data with gaps
[Flowchart: a data set with gaps is completed by filling in the missing values with a specialized algorithm, and the resulting data set is then clustered.]
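As a minimal illustration of this two-stage scheme, the sketch below (in Python, assuming gaps are encoded as NaN) fills the gaps with per-feature means before handing the completed array to any clustering routine; mean imputation here is only a stand-in for the specialized filling algorithm.

```python
import numpy as np

def fill_then_cluster(X, cluster_fn):
    """Two-stage scheme: impute the gaps (NaN entries) first, then cluster.

    Mean imputation is only a placeholder for a specialized filling algorithm;
    cluster_fn is any clustering routine applied to the completed data set.
    """
    X = np.asarray(X, dtype=float)
    col_means = np.nanmean(X, axis=0)               # per-feature means over observed values
    X_filled = np.where(np.isnan(X), col_means, X)  # replace every gap by its feature mean
    return cluster_fn(X_filled)
```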
Algorithms for filling the missing values
Data with gaps clustering on the basis of neuro-fuzzy Kohonen network
The problem of clustering multivariate observations often occurs in many applications associated with Data Mining. The traditional approach to solving these tasks requires that each observation relate to only one cluster, although a more realistic situation is that the processed feature vector may belong to more than one class, with different levels of probability or possibility [Bezdek, 1981; Hoeppner, 1999; Xu, 2009].

Notable approaches and solutions are efficient only in cases when the original data set has batch form and does not change during the analysis. However, there is a rather wide class of problems in which the data are fed for processing sequentially, in on-line mode, as happens when training Kohonen self-organizing maps [Kohonen, 1995]. In this case it is not known beforehand which of the processed vectors contain missing values.

This work is devoted to solving the problem of on-line clustering of data based on the Kohonen neural network, adapted for operation in the presence of overlapping classes.
Adaptive algorithm for probabilistic fuzzy clustering
The "object–property" table:

| feature \ object | 1 | … | i | … | k | … | N |
|---|---|---|---|---|---|---|---|
| 1 | $x_{11}$ | … | $x_{i1}$ | … | $x_{k1}$ | … | $x_{N1}$ |
| … | … | … | … | … | … | … | … |
| p | $x_{1p}$ | … | $x_{ip}$ | … | $x_{kp}$ | … | $x_{Np}$ |
| … | … | … | … | … | … | … | … |
| j | $x_{1j}$ | … | $x_{ij}$ | … | $x_{kj}$ | … | $x_{Nj}$ |
| … | … | … | … | … | … | … | … |
| n | $x_{1n}$ | … | $x_{in}$ | … | $x_{kn}$ | … | $x_{Nn}$ |

Given the data set

$$X = \{x_1, x_2, \dots, x_N\} \subset R^n, \quad x_k \in X, \quad k = 1, 2, \dots, N, \qquad (1)$$

the problem is to find the membership levels $U_q(k)$ of each observation $x_k$ in each of the $m$ clusters $(1 < m < N)$.
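In the code sketches below, the "object–property" table is stored as an N × n NumPy array with NaN marking the gaps; this encoding is a convention of these examples, not something fixed by the slides.

```python
import numpy as np

# N = 4 objects (rows), n = 3 features (columns); np.nan marks a missing value x_ki
X = np.array([
    [5.1, 3.5, 1.4],
    [4.9, np.nan, 1.4],
    [6.3, 3.3, np.nan],
    [5.8, 2.7, 5.1],
])
delta = ~np.isnan(X)   # indicator delta_ki: True where x_ki is available
```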
Steps of the algorithm

Introducing the objective function of clustering of the observations $x_k$ with cluster prototypes $w_q$:

$$E(U_q(k), w_q) = \sum_{k=1}^{N} \sum_{q=1}^{m} U_q^{\beta}(k)\, D^2(x_k, w_q) \qquad (3)$$

subject to the constraints

$$\sum_{q=1}^{m} U_q(k) = 1, \qquad 0 < \sum_{k=1}^{N} U_q(k) < N,$$

where $\beta$ is the fuzzifier. Minimization leads to the batch update formulas

$$U_q^{(\tau+1)}(k) = \frac{\bigl(D^2(x_k, w_q^{(\tau)})\bigr)^{\frac{1}{1-\beta}}}{\sum_{l=1}^{m} \bigl(D^2(x_k, w_l^{(\tau)})\bigr)^{\frac{1}{1-\beta}}}, \qquad w_q^{(\tau+1)} = \frac{\sum_{k=1}^{N} \bigl(U_q^{(\tau+1)}(k)\bigr)^{\beta} x_k}{\sum_{k=1}^{N} \bigl(U_q^{(\tau+1)}(k)\bigr)^{\beta}}, \qquad (4)$$

and, for data processed sequentially, to the adaptive (on-line) version

$$U_q(k+1) = \frac{\bigl(D^2(x_{k+1}, w_q(k))\bigr)^{\frac{1}{1-\beta}}}{\sum_{l=1}^{m} \bigl(D^2(x_{k+1}, w_l(k))\bigr)^{\frac{1}{1-\beta}}}, \qquad w_q(k+1) = w_q(k) + \eta(k+1)\, U_q^{\beta}(k+1)\,(x_{k+1} - w_q(k)), \qquad (5)$$

where $\eta(k+1)$ is the learning-rate parameter.
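A minimal sketch of one recurrent step (5), assuming the Euclidean metric, fuzzifier β = 2 and a harmonic learning-rate schedule η(k) = 1/k (the schedule itself is not fixed by the slides):

```python
import numpy as np

def online_fcm_step(x, W, k, beta=2.0):
    """One step of the adaptive probabilistic fuzzy clustering rule (5).

    x : new observation, shape (n,), no gaps
    W : current prototypes, shape (m, n), float array updated in place
    k : 1-based step index, used for the assumed learning rate eta(k) = 1/k
    """
    d2 = np.maximum(np.sum((W - x) ** 2, axis=1), 1e-12)  # D^2(x, w_q)
    u = d2 ** (1.0 / (1.0 - beta))                        # (D^2)^(1/(1-beta))
    u /= u.sum()                                          # normalize over the m clusters
    W += (1.0 / k) * (u ** beta)[:, None] * (x - W)       # prototype update from (5)
    return u                                              # membership levels U_q(k+1)
```

For β = 2 the membership formula reduces to the familiar FCM weights 1/D² normalized over the clusters.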
Adaptive probabilistic fuzzy clustering algorithm for data with gaps
If the data in the array contain gaps, the approach discussed above should be modified accordingly. For example, in [Hathaway, 2001] a modification of the FCM procedure based on the partial distance strategy (PDS FCM) was proposed.
[Diagram: the data set X is partitioned into the subsets X_F, X_P and X_G defined below.]
1. Partition of the original data set into classes.

Data set: $X = \{x_1, \dots, x_k, \dots, x_N\} \subset R^n$

$$X_F = \{x_k \in X \mid x_k \text{ is a vector containing no gaps}\},$$
$$X_P = \{x_{ki},\; 1 \le i \le n,\; 1 \le k \le N \mid \text{values of } x_k \text{ available in } X\},$$
$$X_G = \{x_{ki} = ?,\; 1 \le i \le n,\; 1 \le k \le N \mid \text{values of } x_k \text{ absent in } X\}.$$

2. Centering of the data.

3. Standardization of the data.
To take the gaps into account, the Euclidean metric is replaced by the partial distance

$$D_P^2(x_k, w_q) = \frac{n}{n_k} \sum_{i=1}^{n} (x_{ki} - w_{qi})^2\, \delta_{ki}, \qquad (6)$$

here

$$\delta_{ki} = \begin{cases} 1, & x_{ki} \in X_F, \\ 0, & \tilde{x}_{ki} \in X_G, \end{cases} \qquad n_k = \sum_{i=1}^{n} \delta_{ki}$$

(the number of components of $x_k$ available in $X$). The objective function of clustering then takes the form

$$E(U_q(k), w_q) = \sum_{k=1}^{N} \sum_{q=1}^{m} U_q^{\beta}(k)\, \frac{n}{n_k} \sum_{i=1}^{n} (x_{ki} - w_{qi})^2\, \delta_{ki}. \qquad (7)$$
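With the NaN convention above, the partial distance (6) can be sketched as:

```python
import numpy as np

def partial_distance_sq(x, w):
    """Partial distance D_P^2(x, w) from (6): the squared deviation over the
    available components of x, rescaled by n / n_k."""
    delta = ~np.isnan(x)                  # delta_ki: available components of x
    n, n_k = x.size, delta.sum()
    if n_k == 0:
        raise ValueError("x has no observed components")
    diff = np.where(delta, x - w, 0.0)    # gaps contribute nothing to the sum
    return (n / n_k) * np.sum(diff ** 2)
```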
Adaptive probabilistic fuzzy clustering algorithm for data with gaps

Minimizing (7) yields the batch update formulas

$$U_q^{(\tau+1)}(k) = \frac{\bigl(D_P^2(x_k, w_q^{(\tau)})\bigr)^{\frac{1}{1-\beta}}}{\sum_{l=1}^{m} \bigl(D_P^2(x_k, w_l^{(\tau)})\bigr)^{\frac{1}{1-\beta}}}, \qquad w_{qi}^{(\tau+1)} = \frac{\sum_{k=1}^{N} \bigl(U_q^{(\tau+1)}(k)\bigr)^{\beta}\, \delta_{ki}\, x_{ki}}{\sum_{k=1}^{N} \bigl(U_q^{(\tau+1)}(k)\bigr)^{\beta}\, \delta_{ki}}, \qquad (8)$$

computed in the alternating sequence $w_q^{(0)} \to U_q^{(1)} \to w_q^{(1)} \to U_q^{(2)} \to \dots$ until the stopping condition

$$\lVert w_q^{(\tau+1)} - w_q^{(\tau)} \rVert \le \varepsilon, \qquad 1 \le q \le m, \qquad (9)$$

is satisfied.
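A compact batch iteration implementing (8) together with the stopping rule (9) might look as follows (β = 2 and the NaN gap convention assumed; every object is assumed to have at least one observed component):

```python
import numpy as np

def batch_fcm_with_gaps(X, W, beta=2.0, eps=1e-4, max_iter=100):
    """Batch alternation (8) with stopping rule (9); X may contain NaN gaps."""
    delta = ~np.isnan(X)                        # indicator matrix delta_ki
    Xz = np.where(delta, X, 0.0)                # gaps replaced by zeros for the sums
    n = X.shape[1]
    n_k = delta.sum(axis=1)                     # available components per object
    for _ in range(max_iter):
        # partial distances D_P^2(x_k, w_q) for all pairs (k, q)
        diff = np.where(delta[:, None, :], X[:, None, :] - W[None, :, :], 0.0)
        d2 = (n / n_k)[:, None] * np.sum(diff ** 2, axis=2)
        U = np.maximum(d2, 1e-12) ** (1.0 / (1.0 - beta))
        U /= U.sum(axis=1, keepdims=True)       # membership update from (8)
        Ub = U ** beta
        W_new = (Ub.T @ Xz) / np.maximum(Ub.T @ delta, 1e-12)  # prototype update from (8)
        done = np.linalg.norm(W_new - W, axis=1).max() <= eps  # stopping rule (9)
        W = W_new
        if done:
            break
    return U, W
```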
In on-line mode, when the observations arrive sequentially, the recurrent form is

$$U_q(k+1) = \frac{\bigl(D_P^2(x_{k+1}, w_q(k))\bigr)^{\frac{1}{1-\beta}}}{\sum_{l=1}^{m} \bigl(D_P^2(x_{k+1}, w_l(k))\bigr)^{\frac{1}{1-\beta}}}, \qquad (10)$$

$$w_{qi}(k+1) = w_{qi}(k) + \eta(k+1)\, U_q^{\beta}(k+1)\, (x_{k+1,i} - w_{qi}(k))\, \delta_{ki}, \qquad (11)$$

or, in vector form,

$$w_q(k+1) = w_q(k) + \eta(k+1)\, \psi_q(k+1)\, (x_{k+1} - w_q(k)) \odot \delta_k,$$

which coincides in structure with Kohonen's "winner takes more" rule, with the membership level playing the role of the neighborhood function.
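In the same spirit as the earlier sketch for (5), one on-line step of (10)-(11) could be written as (β = 2, η(k) = 1/k and NaN gaps assumed):

```python
import numpy as np

def online_fcm_gap_step(x, W, k, beta=2.0):
    """One on-line step of (10)-(11); x may contain NaN gaps
    (but is assumed to have at least one observed component)."""
    delta = ~np.isnan(x)                               # delta_ki for this observation
    n, n_k = x.size, delta.sum()
    diff = np.where(delta, x - W, 0.0)                 # masked differences, shape (m, n)
    d2 = np.maximum((n / n_k) * np.sum(diff ** 2, axis=1), 1e-12)  # D_P^2, eq. (6)
    u = d2 ** (1.0 / (1.0 - beta))
    u /= u.sum()                                       # memberships, eq. (10)
    W += (1.0 / k) * (u ** beta)[:, None] * diff       # only observed coordinates move, eq. (11)
    return u
```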
Adaptive algorithm for possibilistic fuzzy clustering

The main disadvantage of probabilistic algorithms is connected with the constraint on the membership levels, whose sum has to equal unity. This has led to the creation of possibilistic fuzzy clustering algorithms [Krishnapuram, 1993].

In possibilistic clustering algorithms the objective function has the form

$$E(U_q(k), w_q, \mu_q) = \sum_{k=1}^{N} \sum_{q=1}^{m} U_q^{\beta}(k)\, D^2(\tilde{x}_k, w_q) + \sum_{q=1}^{m} \mu_q \sum_{k=1}^{N} (1 - U_q(k))^{\beta}, \qquad (12)$$

where the scalar parameter $\mu_q > 0$ determines the distance at which the level of membership equals 0.5.

Minimization in on-line mode leads to the recurrent procedure

$$U_q(k+1) = \frac{1}{1 + \left(\dfrac{\lVert \tilde{x}_{k+1} - w_q(k) \rVert^2}{\mu_q(k)}\right)^{\frac{1}{\beta-1}}},$$

$$w_q(k+1) = w_q(k) + \eta(k+1)\, U_q^{\beta}(k+1)\, (\tilde{x}_{k+1} - w_q(k)),$$

$$\mu_q(k+1) = \frac{\sum_{p=1}^{k+1} U_q^{\beta}(p)\, \lVert \tilde{x}_p - w_q(k+1) \rVert^2}{\sum_{p=1}^{k+1} U_q^{\beta}(p)}. \qquad (13)$$
Adaptive algorithm for possibilistic fuzzy clustering of data with gaps

Adopting the partial distance (PD) instead of the Euclidean metric, we can write the objective function as

$$E(U_q(k), w_q, \mu_q) = \sum_{k=1}^{N} \sum_{q=1}^{m} U_q^{\beta}(k)\, \frac{n}{n_k} \sum_{i=1}^{n} (\tilde{x}_{ki} - w_{qi})^2\, \delta_{ki} + \sum_{q=1}^{m} \mu_q \sum_{k=1}^{N} (1 - U_q(k))^{\beta} \qquad (14)$$

and then solve the system of equations

$$\begin{cases} \dfrac{\partial E(U_q(k), w_q, \mu_q)}{\partial U_q(k)} = 0, \\[2mm] \dfrac{\partial E(U_q(k), w_q, \mu_q)}{\partial \mu_q} = 0, \\[2mm] \nabla_{w_q} E(U_q(k), w_q, \mu_q) = 0, \end{cases} \qquad (15)$$

which leads to the recurrent procedure

$$U_q(k+1) = \frac{1}{1 + \left(\dfrac{D_P^2(\tilde{x}_{k+1}, w_q(k))}{\mu_q(k)}\right)^{\frac{1}{\beta-1}}},$$

$$w_{qi}(k+1) = w_{qi}(k) + \eta(k+1)\, U_q^{\beta}(k+1)\, (\tilde{x}_{k+1,i} - w_{qi}(k))\, \delta_{ki},$$

$$\mu_q(k+1) = \frac{\sum_{p=1}^{k+1} U_q^{\beta}(p)\, D_P^2(\tilde{x}_p, w_q(k+1))}{\sum_{p=1}^{k+1} U_q^{\beta}(p)}. \qquad (16)$$

Thus, the process of possibilistic fuzzy clustering of data with gaps can also be realized using the neuro-fuzzy Kohonen network.
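A sketch of one on-line step of (16) under the same assumptions (β = 2, η(k) = 1/k, NaN gaps). Note that the μ_q update in (16) formally re-evaluates all past partial distances at the new prototypes, so the running sums below are an on-line approximation that accumulates each distance once, when it is first seen:

```python
import numpy as np

def online_pcm_gap_step(x, W, mu, num, den, k, beta=2.0):
    """One on-line step of the possibilistic rule (16); x may contain NaN gaps.

    mu  : current distance parameters mu_q, shape (m,), updated in place
    num : running sums approximating sum_p U_q^beta(p) * D_P^2(x_p, w_q), shape (m,)
    den : running sums of U_q^beta(p), shape (m,)
    """
    delta = ~np.isnan(x)                                 # assumes >= 1 observed component
    n, n_k = x.size, delta.sum()
    diff = np.where(delta, x - W, 0.0)
    d2 = (n / n_k) * np.sum(diff ** 2, axis=1)           # D_P^2(x, w_q)
    u = 1.0 / (1.0 + (d2 / mu) ** (1.0 / (beta - 1.0)))  # no sum-to-one constraint
    W += (1.0 / k) * (u ** beta)[:, None] * diff         # masked prototype update
    diff_new = np.where(delta, x - W, 0.0)               # distance at updated prototypes
    d2_new = (n / n_k) * np.sum(diff_new ** 2, axis=1)
    num += (u ** beta) * d2_new
    den += u ** beta
    mu[:] = num / den                                    # recurrent estimate of mu_q
    return u
```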
Results
Iris data set: a benchmark data set in pattern recognition analysis, freely available at the UCI Machine Learning Repository.

- It contains three clusters (types of Iris plants: Iris Setosa, Iris Versicolor and Iris Virginica) of 50 data points each, with 4 features: sepal length, sepal width, petal length and petal width.
- The class Iris Setosa is linearly separable from the other two, which are not linearly separable from each other.
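One way to reproduce such an experiment (the slides do not specify how the gaps were generated; random deletion is assumed here) is to knock a random fraction of entries out of Iris:

```python
import numpy as np
from sklearn.datasets import load_iris

rng = np.random.default_rng(0)
X = load_iris().data                  # 150 observations, 4 features
mask = rng.random(X.shape) < 0.1      # delete roughly 10% of the entries
X_gaps = np.where(mask, np.nan, X)    # NaN marks the artificial gaps
```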
Validation

Cluster validity refers to the question of whether a given fuzzy partition fits the data at all. The clustering algorithm always tries to find the best fit for a fixed number of clusters and the parameterized cluster shapes. However, this does not mean that even the best fit is meaningful. Either the number of clusters might be wrong or the cluster shapes might not correspond to the groups in the data, if the data can be grouped in a meaningful way at all. Two main approaches to determining the appropriate number of clusters in data can be distinguished:
Validity measures:

- Partition Coefficient (PC): measures the amount of "overlapping" between clusters. The optimal number of clusters is at the maximum value.
- Classification Entropy (CE): measures only the fuzziness of the cluster partition, and is similar to the Partition Coefficient.
- Partition Index (SC): the ratio of the sum of compactness and separation of the clusters. It is a sum of individual cluster validity measures normalized through division by the fuzzy cardinality of each cluster. A lower value of SC indicates a better partition.
- Separation Index (S): in contrast to the Partition Index (SC), the Separation Index uses a minimum-distance separation for partition validity.
- Xie and Beni's Index (XB): aims to quantify the ratio of the total variation within clusters to the separation of clusters. The optimal number of clusters should minimize the value of the index.
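For reference, the first and last of these measures are easy to compute from a membership matrix; a sketch using their standard definitions (assumed, not stated on the slides, to match the variants used for the table below):

```python
import numpy as np

def partition_coefficient(U):
    """PC = (1/N) * sum_k sum_q U_q(k)^2; larger values indicate a crisper partition."""
    return float(np.mean(np.sum(U ** 2, axis=1)))

def xie_beni(U, X, W, beta=2.0):
    """XB: total within-cluster variation over N times the minimal prototype separation."""
    d2 = np.sum((X[:, None, :] - W[None, :, :]) ** 2, axis=2)   # ||x_k - w_q||^2
    within = np.sum((U ** beta) * d2)
    pw = np.sum((W[:, None, :] - W[None, :, :]) ** 2, axis=2)   # prototype separations
    np.fill_diagonal(pw, np.inf)                                # exclude q == l
    return float(within / (len(X) * pw.min()))
```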
Clustering results on the Iris data set:

| Clustering algorithm | PC | SC | XB |
|---|---|---|---|
| Adaptive algorithm for probabilistic fuzzy clustering | 0.2547 | 2.7039e-004 | 0.0169 |
| Adaptive probabilistic fuzzy clustering algorithm for data with gaps | 0.3755 | 6.4e-003 | 0.1842 |
| Adaptive algorithm for possibilistic fuzzy clustering | 0.2691 | 4.4064e-005 | 0.0122 |
| Adaptive algorithm for possibilistic fuzzy clustering of data with gaps | 0.2854 | 5.103e-003 | 0.4571 |
| Fuzzy c-means | 0.5036 | 1.6255 | 0.191 |
| Gustafson-Kessel | 0.2755 | 4.7481e-002 | 0.5717 |
| Gath-Geva | 0.2542 | 4.941e-005 | 3.3548 |