Hard and fuzzy k-Modes algorithms

Download Report

Transcript Hard and fuzzy k-Modes algorithms

國立雲林科技大學
National Yunlin University of Science and Technology
A Fuzzy k-Modes Algorithm for
Clustering Categorical Data
Advisor:Dr. Hsu
Graduate:Chien-Ming Hsiao
Author:Zhexue Huang and Michael K. Ng
Outline









Motivation
Objective
Introduction
Notation
Hard and fuzzy k-means algorithms
Hard and fuzzy k-Modes algorithms
Experimental Results
Conclusions
Personal Opinion
Intelligent Database Systems Lab
Motivation

Working only on numeric data limits the use of
these k-means-type algorithms in data mining.

Most algorithms for clustering categorical data
suffer from a common efficiency problem when
applied to massive categorical-only data sets.
Intelligent Database Systems Lab
Objective

To tackle the problem of clustering large
categorical data sets in data mining
Intelligent Database Systems Lab
Introduction

Fuzzy versions of k-means algorithm

Each pattern is allowed to have membership functions
to all clusters.

Working only on numeric data limits the use of these kmeans-type algorithms in such areas data mining.
Intelligent Database Systems Lab
Introduction

To cluster categorical data methods





the k-means algorithm [Ralambondrainy, 1995]
hierarchical clustering methods [Gower, 1991]
the PAM algorithm [Kaufman et al, 1990]
the fuzzy-statistical algorithms [Woodbury, 1974]
The conceptual clustering methods [Michalski, 1983]
Intelligent Database Systems Lab
Notation

The set of objects to be clustered is stored in a database
table T defined by a set of attributes A1, A2,…, Am.

Let X  X 1,X 2 , ,X n  be a set of n objects.

Object X i is represente d as xi,1 , xi,2 ,, xi,m .
Intelligent Database Systems Lab
Hard and fuzzy k-means algorithms

Let X be a set of n objects described by m numeric
attributes.
Intelligent Database Systems Lab
Hard and fuzzy k-means algorithms

The usual method toward optimization of F is to
use partial optimization for Z and W


fix Z and find necessary conditions on W to minimize F
Fix W and minimize F with respect to Z
Intelligent Database Systems Lab
Hard and fuzzy k-means algorithms

Theorem 1

Let
be fixed and consider Problem (P1)
Intelligent Database Systems Lab
Hard and fuzzy k-means algorithms

Theorem 2

Let
be fixed and consider Problem (P2)
Intelligent Database Systems Lab
Hard and fuzzy k-means algorithms




The complexity of the algorithm
O(tkmn)
The space of the algorithm
O(n(m+k) + km)
Intelligent Database Systems Lab
Hard and fuzzy k-Modes algorithms

Using a simple matching dissimilarity measure for
categorical objects

Replacing the means of clusters with the modes

Using a frequency-based method to find the
modes
Intelligent Database Systems Lab
Hard and fuzzy k-Modes algorithms

Let X and Y be two categorical objects



X=
Y=
The simple matching dissimilarity measure
between X and Y is defined as follows:
Intelligent Database Systems Lab
Hard and fuzzy k-Modes algorithms

Using a frequency-based method to update Z

The Hard k-modes Update Method

The Fuzzy k-modes Update Method
Intelligent Database Systems Lab
Hard and fuzzy k-Modes algorithms

Theorem 3 : The Hard k-modes Update Method

The category of attribute Aj of the cluster mode Zl is
determined by the mode of categories of attribute Aj in
the set of objects belonging to cluster l

the quantity
Intelligent Database Systems Lab
Hard and fuzzy k-Modes algorithms

Theorem 4 : The Fuzzy k-modes Update Method


The category of attribute Aj of the cluster mode Zl is
given by the category that achieves the maximum of the
summation of wli to cluster l over all categories.
the quantity
Intelligent Database Systems Lab
Hard and fuzzy k-Modes algorithms

Theorem 5
Let   1. The fuzzy k - modes algorithm
converges in a finite number of iterations .
Intelligent Database Systems Lab
Hard and fuzzy k-Modes algorithms
Intelligent Database Systems Lab
Experimental Results

To evaluate the performance and efficiency of the
fuzzy k-modes algorithm

To compare the fuzzy k-modes algorithm with the
conceptual k-means algorithm and the hard kmodes algorithm

Use real and artificial data

Soybean disease data set.
Intelligent Database Systems Lab
Experimental Results
Intelligent Database Systems Lab
Experimental Results
Intelligent Database Systems Lab
Experimental Results
Intelligent Database Systems Lab
Experimental Results
Intelligent Database Systems Lab
Experimental Results
Intelligent Database Systems Lab
Conclusions

Introduced the fuzzy k-modes algorithm for
clustering categorical objects based on extensions
to the fuzzy k-means algorithm.

The consequence of Theorem 4 that allows the kmeans paradigm to be used in generating the fuzzy
partition matrix from categorical data
Intelligent Database Systems Lab
Personal Opinion

The fuzzy partition matrix provides more
information to help the user to determine the final
clustering and to identify the boundary objects
Intelligent Database Systems Lab