Multiobjective Clustering with Automatic k
National Yunlin University of Science and Technology (國立雲林科技大學)
Multiobjective Clustering with Automatic
k-determination for Large-scale Data
Presenter : Shao-Wei Cheng
Authors : Nobukazu Matake, Tomoyuki Hiroyasu,
Mitsunori Miki, Tomoharu Senda
GECCO 2007
Intelligent Database Systems Lab
Outline
Motivation
Objective
Methodology
Original MOCK
New scalable k-determination scheme
Experiments and Results
Conclusion
Personal Comments
N.Y.U.S.T.
I. M.
Motivation
Web behavior mining has attracted a great deal of
attention in recent years.
MOCK is powerful and rigorous, but its computational cost
is too high when it is applied to clustering very large data sets.
Too much data!
Objectives
Apply MOCK to web data clustering with a scalable
automatic k-determination scheme.
Determine the appropriate k at low cost.
It comprises two complementary objectives:
Determining the appropriate k.
Finding the partition of the data into k clusters.
Methodology
Original MOCK
[Figure: flow of the original MOCK, labeled First Step, Second Step, Third Step, and Fourth Step, with a "Gap statistic" label]
Methodology
New scalable k-determination scheme
First Step
Second Step
First scheme: Calculate adjacent angles
Second scheme
[Figure: plots for the two schemes, with axis labels x and y]
Experiments
Conclusion
The new scheme determines the appropriate k at low cost,
although its performance is poorer than that of the
original algorithm.
It reduces the Pareto size by about 50-70%.
It doesn’t need random data clustering.
Personal Comments
Advantage
MOCK can be applied to large-scale data.
Drawback
Application
Web data.