Multiobjective Clustering with Automatic k

國立雲林科技大學
National Yunlin University of Science and Technology
Multiobjective Clustering with Automatic
k-determination for Large-scale Data
Presenter : Shao-Wei Cheng
Authors : Nobukazu Matake, Tomoyuki Hiroyasu,
Mitsunori Miki, Tomoharu Senda
GECCO 2007
Intelligent Database Systems Lab
Outline

Motivation

Objective

Methodology

Original MOCK

New scalable k-determination scheme

Experiments and Results

Conclusion

Personal Comments
N.Y.U.S.T.
I. M.
Motivation

Web behavior mining has attracted a great deal of
attention in recent years.

MOCK is powerful and strict, but its computational cost
is too high when it is applied to clustering very large data sets.
Too much data!
Objectives

Apply MOCK to web data clustering with a scalable
automatic k-determination scheme.
Determine the appropriate k at low cost.


It contains two complementary objectives.

Determine the appropriate k.

Partition the data into k clusters.
Methodology

Original MOCK
Figure: the four steps of the original MOCK (First Step, Second Step, Third Step, Fourth Step), with the Gap statistic labeled.
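
The figure names the Gap statistic without stating it. For reference, the standard definition (from Tibshirani et al., not spelled out on the slide) compares the within-cluster dispersion of the data with its expectation under a reference distribution of structureless data. The symbols below are assumptions of this note: C_r is cluster r with n_r members, d_ij is the squared Euclidean distance between points i and j, and E* denotes the expectation over the reference data.

```latex
\mathrm{Gap}(k) = \mathbb{E}^{*}\!\left[\log W_k\right] - \log W_k,
\qquad
W_k = \sum_{r=1}^{k} \frac{1}{2 n_r} \sum_{i,\, j \in C_r} d_{ij}
```

A large gap at some k means the data are clustered much more tightly at that k than structureless data would be. Estimating the expectation requires clustering reference data sets as well, which is a likely reading of the "random data clustering" cost mentioned in the conclusion.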
Methodology

New scalable k-determination scheme
Figure: the new scheme in two steps (First Step, Second Step).
First scheme: calculate adjacent angles (plot axes labeled x and y).
Second scheme (plot axes labeled x).
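
The slide names the first scheme only as "calculate adjacent angles" and gives no formula. The following is a minimal sketch of one plausible reading, not the authors' implementation: it assumes the Pareto front is a list of two-objective points, measures the angle at each interior point between the segments to its two neighbours, and picks the point where the front bends most sharply. The function names `adjacent_angles` and `sharpest_bend`, the selection rule, and the sample front are all hypothetical.

```python
import math


def adjacent_angles(front):
    """Return the sorted points and the angle (radians) at each interior point.

    The angle at a point is measured between the two segments that join it
    to its left and right neighbours along the front.
    """
    pts = sorted(set(front))  # order along the first axis, drop duplicates
    angles = []
    for prev, cur, nxt in zip(pts, pts[1:], pts[2:]):
        ax, ay = prev[0] - cur[0], prev[1] - cur[1]
        bx, by = nxt[0] - cur[0], nxt[1] - cur[1]
        cos = (ax * bx + ay * by) / (math.hypot(ax, ay) * math.hypot(bx, by))
        # Clamp to [-1, 1] to guard against floating-point overshoot.
        angles.append(math.acos(max(-1.0, min(1.0, cos))))
    return pts, angles


def sharpest_bend(front):
    """Return the interior point with the smallest angle (assumed selection rule)."""
    pts, angles = adjacent_angles(front)
    if not angles:
        raise ValueError("need at least three distinct points")
    i = min(range(len(angles)), key=angles.__getitem__)
    return pts[i + 1]  # angles[i] belongs to the interior point pts[i + 1]


# Hypothetical front: steep improvement up to (2, 1), then almost flat.
front = [(0, 10), (1, 4), (2, 1), (3, 0.9), (4, 0.8)]
print(sharpest_bend(front))
```

Only neighbouring points on the front are compared, so a rule of this kind needs no additional clustering runs. Angles depend on how the two axes are scaled, so in practice the objectives would probably need to be normalized first.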
Experiments
Conclusion

The new scheme is able to determine the appropriate k at
low cost, although its performance is poorer than that of the
original algorithm.

It reduces the size of the Pareto set by about 50-70%.

It does not need to cluster random data.
Personal Comments

Advantage

MOCK can be applied to large-scale data.

Drawback

Application

Web data.