A methodology for dynamic data mining based on fuzzy clustering
Source: Fuzzy Sets and Systems, Volume 150, Issue 2, March 1, 2005, pp. 267-284
Authors: Fernando Crespo, Richard Weber
Speaker: 黃琬淑(Wan-Shu Huang)
Date: 2005/12/22
Outline

Introduction
Dynamic data mining using fuzzy clustering
Application
Conclusions and comment
Introduction(1/3)

Clustering groups similar objects into the same classes.

Three strategies for keeping a data mining system in use in a changing environment:
1. Neglect the changes and keep applying the system without any update
2. Develop a new system
3. Update the existing classifier
Introduction(2/3)

The paper proposes a methodology that follows strategy 3.

First, identify the need for a system’s update
by applying it to new data.

Second, perform the update, making efficient
use of the previous system.
Introduction(3/3)

Hierarchical clustering
e.g. CHAMELEON

Partitional clustering
e.g. c-means and fuzzy c-means

Taxonomy of dynamic data mining for clustering
Dynamic data mining
using fuzzy clustering(1/11)

Possible changes of the classifier’s structure:

Creation of new classes
Elimination of classes
Movement of classes
Dynamic data mining
using fuzzy clustering(2/11)
Dynamic data mining
using fuzzy clustering(3/11)

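Step 1: Identify objects that represent changes

Given $c$ existing classes with centers $v_i$, $n$ previous objects, and $m$ new objects $x_k$, compute:

Distances between class centers: $d(v_i, v_j)$, $i \neq j$, $i, j = 1, \dots, c$.
Distances between new objects and class centers: $\hat{d}_{ik} = \hat{d}(x_k, v_i)$, $i = 1, \dots, c$, $k = n+1, \dots, n+m$.
Memberships of new objects in the existing classes: $\hat{u}_{ik}$, $i = 1, \dots, c$, $k = n+1, \dots, n+m$.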
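The three quantities of Step 1 could be computed roughly as follows. This is a minimal sketch, not the authors' implementation: it assumes Euclidean distance and the standard fuzzy c-means membership formula with fuzzifier `m` (the slides do not state either), and the function names are hypothetical. Results are indexed object-first, i.e. `d_hat[k][i]` and `u_hat[k][i]`.

```python
import math

def fcm_memberships(x, centers, m=2.0):
    """Standard fuzzy c-means membership of point x in each class (assumed formula)."""
    d = [math.dist(x, v) for v in centers]
    # A point that coincides with a center belongs fully to that class.
    if any(di == 0.0 for di in d):
        hits = sum(1 for di in d if di == 0.0)
        return [1.0 / hits if di == 0.0 else 0.0 for di in d]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dj) ** power for dj in d) for di in d]

def step1_quantities(centers, new_objects, m=2.0):
    """Return center distances, object-to-center distances, and memberships."""
    # center_dist[i][j] = d(v_i, v_j); only the entries with i != j are used.
    center_dist = [[math.dist(vi, vj) for vj in centers] for vi in centers]
    # d_hat[k][i] = distance between new object k and center i.
    d_hat = [[math.dist(x, v) for v in centers] for x in new_objects]
    # u_hat[k][i] = membership of new object k in class i.
    u_hat = [fcm_memberships(x, centers, m) for x in new_objects]
    return center_dist, d_hat, u_hat
```

A new object halfway between two centers gets membership 0.5 in each class, which is exactly the kind of object Step 1 is looking for.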
Dynamic data mining
using fuzzy clustering(4/11)

Condition 1: the object is not classified well by the existing classifier
$$\left| \hat{u}_{ik} - \frac{1}{c} \right| \le \alpha, \quad k = n+1, \dots, n+m, \quad i = 1, \dots, c.$$

Condition 2: the object is far away from the current classes
$$\hat{d}_{ik} > \frac{1}{2} \min d(v_i, v_j), \quad k = n+1, \dots, n+m, \quad i \neq j, \quad i, j = 1, \dots, c.$$
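The two conditions can be checked per new object as sketched below. The sketch reads Condition 1 as "every membership lies within a tolerance `alpha` of 1/c" and Condition 2 as "every distance exceeds half the smallest distance between two class centers". Taking the minimum over all pairs of centers is an assumption, and the function name is hypothetical.

```python
def represents_change(u_hat_k, d_hat_k, center_dist, alpha):
    """Return True if a new object fulfils Conditions 1 and 2.

    u_hat_k[i], d_hat_k[i]: membership / distance of the object w.r.t. class i.
    center_dist[i][j]: distance between the centers of classes i and j.
    alpha: user-chosen tolerance of Condition 1.
    """
    c = len(u_hat_k)
    # Condition 1: all memberships are close to 1/c, so no class clearly fits.
    cond1 = all(abs(u - 1.0 / c) <= alpha for u in u_hat_k)
    # Condition 2: the object is farther from every center than half the
    # smallest distance between two distinct centers (assumed reading of "min").
    min_center_dist = min(center_dist[i][j]
                          for i in range(c) for j in range(c) if i != j)
    cond2 = all(d > 0.5 * min_center_dist for d in d_hat_k)
    return cond1 and cond2
```

An object with memberships (0.5, 0.5) that lies far from both centers is flagged; one with memberships (0.9, 0.1) is not, however far away it is.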
Dynamic data mining
using fuzzy clustering(5/11)

Based on these two conditions, define the indicator
$$1_{IC}(x_k) = \begin{cases} 1 & x_k \text{ fulfills Conditions 1 and 2,} \\ 0 & \text{else.} \end{cases}$$

If
$$\sum_{k=n+1}^{n+m} 1_{IC}(x_k) = 0,$$
proceed with step 3.1; else go to step 2.
Dynamic data mining
using fuzzy clustering(6/11)

Step 2: Determine changes of class structure
$$\frac{\sum_{k=n+1}^{n+m} 1_{IC}(x_k)}{m} \ge \beta$$
with a parameter $\beta$, $0 \le \beta \le 1$.
If the inequality holds, create new classes (step 3.2);
else just move the existing classes (step 3.1).
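The branching at the end of Step 1 and in Step 2 amounts to a small decision function. The sketch below assumes the change indicator has already been evaluated for each of the m new objects and treats "above β" as "greater than or equal to β". The function name and the string return values are hypothetical.

```python
def next_step(change_flags, beta):
    """Decide which sub-step of Step 3 to run.

    change_flags[k] is 1 if new object k represents a change, else 0.
    beta: threshold in [0, 1] on the share of such objects.
    """
    m = len(change_flags)
    n_changes = sum(change_flags)
    if n_changes == 0:
        # Step 1: no object represents a change, so only move the classes.
        return "3.1"
    # Step 2: compare the share of change objects with the threshold.
    share = n_changes / m
    return "3.2" if share >= beta else "3.1"
```

For example, with `beta = 0.2`, one change object among four new objects (share 0.25) leads to step 3.2, while one among ten (share 0.1) leads to step 3.1.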
Dynamic data mining
using fuzzy clustering(7/11)

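Step 3.1: Move classes
$$1_{C_i}(x_k) = \begin{cases} 1 & \text{object } k \text{ is assigned to class } i, \\ 0 & \text{else.} \end{cases}$$
$$\hat{v}_i = (1 - \lambda_i)\, v_i + \lambda_i\, v_i^{*},$$
$$\lambda_i = \frac{\sum_{k=n+1}^{n+m} 1_{C_i}(x_k)\,(1 - 1_{IC}(x_k))\,\hat{u}_{ik}}{\sum_{j=1}^{n} 1_{C_i}(x_j)\, u_{ij} + \sum_{k=n+1}^{n+m} 1_{C_i}(x_k)\,(1 - 1_{IC}(x_k))\,\hat{u}_{ik}},$$
where $v_i^{*}$ is the center of class $i$ computed from the new objects and $\hat{v}_i$ is the moved center.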
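The moved center is a convex combination of the previous center and a center representing the new objects, weighted by the membership mass that the new, non-change objects contribute to the class. The sketch below assumes that second center (`v_new`) has already been computed from the new objects. The function name and argument layout are hypothetical.

```python
def move_center(v_old, v_new, old_terms, new_terms):
    """Combine the previous center of class i with the center of its new objects.

    old_terms: (assigned, u) for each of the n previous objects, where
               assigned is 1 if the object is assigned to class i, else 0.
    new_terms: (assigned, is_change, u_hat) for each of the m new objects,
               where is_change is the change indicator of that object.
    """
    # Membership mass of new objects assigned to class i that do not represent changes.
    new_weight = sum(a * (1 - ic) * u for a, ic, u in new_terms)
    # Membership mass of the previous objects assigned to class i.
    old_weight = sum(a * u for a, u in old_terms)
    total = old_weight + new_weight
    lam = new_weight / total if total > 0 else 0.0
    moved = tuple((1 - lam) * o + lam * w for o, w in zip(v_old, v_new))
    return lam, moved
```

When old and new objects carry equal membership mass, the weight is 0.5 and the class moves halfway toward the center of the new objects. Objects flagged as changes do not pull the class at all.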



Dynamic data mining
using fuzzy clustering(8/11)

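Step 3.2: Create classes
$$L(c) = \sum_{k=1}^{N} \sum_{i=1}^{c} u_{ik}\, d_{ik}^{2},$$
$$S(c) = \text{structure strength} = \alpha\,(\text{effectiveness of classification}) + (1 - \alpha)\,(\text{accuracy of classification}) = \alpha \log(N/c) + (1 - \alpha) \log(L(1)/L(c)).$$

The larger $c$ is, the smaller the effectiveness term E and the smaller $L(c)$, so the larger the accuracy term L.
The smaller $c$ is, the larger E and the larger $L(c)$, so the smaller L.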
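The structure strength can be evaluated for a candidate number of classes as sketched below, reading it as α·log(N/c) + (1 − α)·log(L(1)/L(c)) with L(c) the membership-weighted sum of squared distances. The natural logarithm is an assumption, as is passing L(1) in as a precomputed value. The function name is hypothetical.

```python
import math

def structure_strength(u, d, alpha, loss_one_class):
    """S(c) = alpha * log(N / c) + (1 - alpha) * log(L(1) / L(c)).

    u[i][k], d[i][k]: membership and distance of object k w.r.t. class i.
    loss_one_class: L(1), the loss when all objects form a single class.
    """
    c, n_objects = len(u), len(u[0])
    # L(c): membership-weighted sum of squared distances.
    loss = sum(u[i][k] * d[i][k] ** 2
               for i in range(c) for k in range(n_objects))
    effectiveness = math.log(n_objects / c)
    accuracy = math.log(loss_one_class / loss)
    return alpha * effectiveness + (1 - alpha) * accuracy
```

Evaluating this for several values of c and keeping the one with the largest S(c) reflects the trade-off noted on the slide: more classes lower the effectiveness term but raise the accuracy term.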
Dynamic data mining
using fuzzy clustering(9/11)
Dynamic data mining
using fuzzy clustering(10/11)

Step 4: Identify trajectories of classes

Each class $i$ has a counter $c_i^t$ in cycle $t$.
If class $i$ was created in cycle $t-1$, set the counter: $c_i^t = 1$.
If class $i$ is the result of moving a class $j$ in cycle $t-1$, set the counter: $c_i^t = c_j^{t-1} + 1$.
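The counter rule can be written as a small update over the classes present in the current cycle. In this sketch the bookkeeping structure (`origin`, mapping each class to the class it was moved from, or to `None` if it was newly created) is a hypothetical choice, as is the function name.

```python
def update_counters(prev_counters, origin):
    """Return the counters of cycle t for all current classes.

    prev_counters: {class_id: counter} from cycle t-1.
    origin: {class_id: parent_id or None}. None marks a class created in
            cycle t-1; otherwise parent_id is the class j that was moved.
    """
    counters = {}
    for class_id, parent in origin.items():
        if parent is None:
            counters[class_id] = 1                          # newly created class
        else:
            counters[class_id] = prev_counters[parent] + 1  # moved class
    return counters
```

A counter therefore records for how many cycles a class has existed, which is what makes its trajectory traceable across cycles.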
Dynamic data mining
using fuzzy clustering(11/11)

Step 5 Eliminate unchanged classes

A class has to be eliminated if it did not receive
new objects for a long period.
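One possible way to operationalize "a long period" is a limit on the number of consecutive cycles in which a class received no new objects. The slide gives no concrete rule, so the threshold `max_idle_cycles` and the bookkeeping dictionary in this sketch are assumptions.

```python
def classes_to_eliminate(idle_cycles, max_idle_cycles):
    """Return the ids of classes that received no new objects for too long.

    idle_cycles: {class_id: consecutive cycles without new objects}.
    max_idle_cycles: hypothetical threshold defining "a long period".
    """
    return [class_id for class_id, idle in idle_cycles.items()
            if idle >= max_idle_cycles]
```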
Application of the proposed
methodology(1/8)


The figure shows the initial data set: 500 objects for each of the four classes.
(0,15), (8,35), (15,0), (15,20)
Application of the proposed
methodology(2/8)

Apply fuzzy c-means with c=4 and m=2.

The figure presents the respective cluster solution.
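For reference, a bare-bones fuzzy c-means with c classes and fuzzifier m can be sketched as follows. It uses the standard alternating updates with Euclidean distance. The random initialization, the fixed iteration count, and all function names are assumptions for illustration, not details taken from the paper. Here m is the fuzzifier, not the number of new objects.

```python
import math
import random

def memberships(points, centers, m):
    """Standard fuzzy c-means membership update; returns u[k][i]."""
    power = 2.0 / (m - 1.0)
    u = []
    for x in points:
        d = [math.dist(x, v) for v in centers]
        zeros = [di == 0.0 for di in d]
        if any(zeros):
            # The point sits on one or more centers: share membership among them.
            row = [1.0 / sum(zeros) if z else 0.0 for z in zeros]
        else:
            row = [1.0 / sum((di / dj) ** power for dj in d) for di in d]
        u.append(row)
    return u

def fuzzy_c_means(points, c=4, m=2.0, iterations=50, seed=0):
    """Alternate membership and center updates for a fixed number of iterations."""
    rng = random.Random(seed)
    centers = rng.sample(points, c)  # initial centers: c randomly chosen points
    dims = len(points[0])
    for _ in range(iterations):
        u = memberships(points, centers, m)
        centers = []
        for i in range(c):
            # Each center is the mean of all points weighted by u_ik ** m.
            weights = [row[i] ** m for row in u]
            total = sum(weights)
            centers.append(tuple(
                sum(w * x[j] for w, x in zip(weights, points)) / total
                for j in range(dims)))
    return centers, memberships(points, centers, m)
```

Calling `fuzzy_c_means(points, c=4, m=2.0)` on a list of 2-D tuples returns the class centers and, for each object, its membership in every class.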
Application of the proposed
methodology(3/8)
In the first cycle, 600 new objects arrive.
Application of the proposed
methodology(4/8)

Results after first cycle
Application of the proposed
methodology(5/8)
In the second cycle, 500 new objects arrive.
Application of the proposed
methodology(6/8)

Results after second cycle
Application of the proposed
methodology(7/8)

In the third cycle 600 new objects arrive
Application of the proposed
methodology (8/8)

Results after third cycle
Conclusions and comment

Presented a methodology for dynamic data mining
Used fuzzy c-means
Provides updated class structures
Helps analyze changes in the application domain

Comment: how to set the parameters remains an open question


