Clustering on Wavelet and Meta

Clustering using
Wavelets and Meta-Ptrees
Anne Denton, Fang Zhang
What do we want to do?
Cluster huge amounts of spatial data
accumulated from satellite images, GIS
systems, etc.
Compare the wavelet transform and
Meta-Ptree methods, and try to combine
them.
Try to find an efficient method for
accurate clustering.
What is a good clustering
method?
Ability to identify clusters of arbitrary
shape, including clusters that are
nested within one another
or have holes, etc.
Good time efficiency
High quality on accuracy
Why do we use wavelets?
Insensitive to the ordering of input data
Do not make any assumption about the
number of clusters present
Ability to classify or cluster objects at
different levels of accuracy
Handling noise and outliers
Special characteristics (1)
It is a high dimensional basis for some
high dimensional data.
For the two-dimensional case, if the wavelet set is
given by $\psi_{j,k}(t)$ for indices $j, k = 1, 2, \ldots$,
a linear expansion would be
$$f(t) = \sum_{k} \sum_{j} a_{j,k}\, \psi_{j,k}(t)$$
for some set of coefficients $a_{j,k}$.
Special characteristics (2)
Most of the energy of the data is well
represented by a few expansion coefficients $a_{j,k}$
(the set of expansion coefficients $a_{j,k}$ is called the
discrete wavelet transform)
The number of operations in a wavelet transform
increases linearly with the length of the data.
The clustering of the coefficients from the
data can be done efficiently.
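The two claims above (few coefficients carry most of the energy, and cost grows linearly with the data length) can be illustrated with a minimal sketch. It assumes the orthonormal Haar wavelet and a made-up piecewise-constant signal; the names `haar_dwt` and `energy_fraction` are illustrative and not from the slides.

```python
import numpy as np

def haar_dwt(signal):
    """Full orthonormal Haar decomposition of a signal whose length is a power of two.

    Each pass halves the working length, so the total work is n + n/2 + n/4 + ... < 2n,
    i.e. linear in the length of the data.
    """
    coeffs = np.asarray(signal, dtype=float).copy()
    n = len(coeffs)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    length = n
    while length > 1:
        pairs = coeffs[:length].reshape(-1, 2)
        approx = (pairs[:, 0] + pairs[:, 1]) / np.sqrt(2.0)  # scaled pairwise sums
        detail = (pairs[:, 0] - pairs[:, 1]) / np.sqrt(2.0)  # scaled pairwise differences
        coeffs[:length // 2] = approx
        coeffs[length // 2:length] = detail
        length //= 2
    return coeffs

def energy_fraction(coeffs, keep):
    """Fraction of the total energy held by the `keep` largest coefficients."""
    energy = np.sort(coeffs ** 2)[::-1]
    return energy[:keep].sum() / energy.sum()

# Hypothetical piecewise-constant signal of length 16 (assumed for illustration).
signal = np.repeat([2.0, 8.0], 8)
coeffs = haar_dwt(signal)

print(np.count_nonzero(np.abs(coeffs) > 1e-9), "non-zero coefficients out of", len(coeffs))
print("energy held by the 2 largest:", energy_fraction(coeffs, 2))
```

Real satellite or GIS data is not exactly piecewise constant, so the energy would be concentrated in a few coefficients rather than confined to them.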
The data I got
Steps
Data from Ag maps
Clustering the data
by DWT coefficients
Mix with Meta-Ptree
Calculate the sum
of each cluster
Visualization
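The steps above can be sketched roughly as follows. This is only one possible reading: the slides do not say how the DWT coefficients are clustered, so the sketch assumes a single-level 2D Haar approximation, a fixed threshold, and 4-connected components. The 8x8 grid stands in for an Ag map, and the threshold value is hypothetical. The Meta-Ptree and visualization steps are left out.

```python
import numpy as np

def haar_approx_2d(grid):
    """One-level 2D orthonormal Haar approximation: each 2x2 block sum divided by 2."""
    g = np.asarray(grid, dtype=float)
    return (g[0::2, 0::2] + g[0::2, 1::2] + g[1::2, 0::2] + g[1::2, 1::2]) / 2.0

def label_components(mask):
    """Label 4-connected regions of True cells as 1, 2, ... (0 means background)."""
    labels = np.zeros(mask.shape, dtype=int)
    current = 0
    for start in zip(*np.nonzero(mask)):
        if labels[start]:
            continue
        current += 1
        labels[start] = current
        stack = [start]
        while stack:
            r, c = stack.pop()
            for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if (0 <= nr < mask.shape[0] and 0 <= nc < mask.shape[1]
                        and mask[nr, nc] and not labels[nr, nc]):
                    labels[nr, nc] = current
                    stack.append((nr, nc))
    return labels

# Step 1: hypothetical data standing in for an Ag map (values are made up).
grid = np.zeros((8, 8))
grid[0:4, 0:4] = 5.0
grid[6:8, 4:8] = 3.0

# Step 2: cluster on the approximation coefficients (threshold is an assumption).
approx = haar_approx_2d(grid)
labels = label_components(approx > 1.0)

# Step 3: map the coarse labels back to the original cells and sum each cluster.
full_labels = np.repeat(np.repeat(labels, 2, axis=0), 2, axis=1)
cluster_sums = {int(k): float(grid[full_labels == k].sum())
                for k in range(1, labels.max() + 1)}
print(cluster_sums)
```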
Are Wavelets and P-trees
related?
Both operate on multiple scales
Same quadrant-based structure
Same problems with quadrant boundaries
(i.e., if wavelets work so do P-trees!)
Technical similarity
Moving averages of Haar Wavelets can be
efficiently computed from P-trees
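A small sketch of this technical similarity, under simplifying assumptions: the P-tree is modeled here as an uncompressed pyramid of quadrant 1-bit counts over a single bit plane (a real P-tree would not subdivide pure quadrants), and the Haar transform is taken in its orthonormal 2D form. With that normalization, the Haar approximation coefficient of a quadrant at level l equals its 1-bit count divided by 2^l, and the moving average equals the count divided by 4^l. The function names are illustrative.

```python
import numpy as np

def quadrant_counts(bits):
    """Counts of 1-bits per quadrant at every level, finest level first."""
    levels = [np.asarray(bits, dtype=int)]
    while levels[-1].shape[0] > 1:
        c = levels[-1]
        levels.append(c[0::2, 0::2] + c[0::2, 1::2] + c[1::2, 0::2] + c[1::2, 1::2])
    return levels

def haar_approx_levels(bits):
    """Orthonormal 2D Haar approximation coefficients at every level, finest first."""
    levels = [np.asarray(bits, dtype=float)]
    while levels[-1].shape[0] > 1:
        a = levels[-1]
        levels.append((a[0::2, 0::2] + a[0::2, 1::2] + a[1::2, 0::2] + a[1::2, 1::2]) / 2.0)
    return levels

# Hypothetical 4x4 bit plane (side length must be a power of two).
bits = np.array([[1, 1, 0, 0],
                 [1, 1, 0, 1],
                 [0, 0, 1, 1],
                 [0, 1, 1, 1]])

counts = quadrant_counts(bits)
approx = haar_approx_levels(bits)

for level, (cnt, app) in enumerate(zip(counts, approx)):
    # Haar approximation = count / 2^level; quadrant average = count / 4^level.
    assert np.allclose(app, cnt / 2 ** level)
    print("level", level, "quadrant averages:", (cnt / 4 ** level).tolist())
```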
So are P-trees and Wavelets
the same thing?
Wavelets are transformations in an
orthogonal space
P-trees are not, and should not be:
“Signal” approach cannot cover all data
mining issues
P-trees naturally represent concept
hierarchies
P-trees keep count information directly
Can we use P-trees for
Clustering just like Wavelets?
P-trees defined in structure space
Clustering is done in attribute space
(Wavelet clustering has same problem!)
P-trees in attribute space?
Counts other than 0 and 1 at leaf level
Store results of ANDing basic P-trees
What will Meta P-trees look
like?
Design decisions
Break up into count bit planes?
Counts as attributes (special normalization)
Keep one big Meta P-tree?
Plan
Compare approaches in practice
Potential for Meta P-trees
Attribute space central to data mining
Attribute space is huge, but sparse
(maximum one point per data item)
Compression essential
Mixed quadrants similar to detail
coefficients for wavelets
Naturally suggests a variant of density-based clustering
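One way to picture this idea is the sketch below. It is not the Meta P-tree design itself, which the slides leave open. It assumes a 2D attribute space quantized to an 8x8 grid, marks each cell as occupied or not, and splits only mixed quadrants, so empty and fully occupied regions are stored as single leaves. Treating fully occupied quadrants of a minimum size as dense regions is an assumed stand-in for a density criterion; `quadtree_leaves` and the sample points are hypothetical.

```python
import numpy as np

def quadtree_leaves(occupied, r=0, c=0, size=None):
    """Return (row, col, size, kind) leaves with kind 'empty' or 'full'.

    Mixed quadrants (partly occupied) are split into four sub-quadrants;
    pure quadrants are kept as one leaf, which is where the compression comes from.
    """
    if size is None:
        size = occupied.shape[0]
    count = int(occupied[r:r + size, c:c + size].sum())
    if count == 0:
        return [(r, c, size, "empty")]
    if count == size * size:
        return [(r, c, size, "full")]
    half = size // 2
    leaves = []
    for dr in (0, half):
        for dc in (0, half):
            leaves += quadtree_leaves(occupied, r + dr, c + dc, half)
    return leaves

# Hypothetical data items with two quantized attributes in the range 0..7:
# one compact group of 16 items plus a single isolated item.
cluster = [(a, b) for a in range(4) for b in range(4)]
points = np.array(cluster + [(6, 5)])

occupied = np.zeros((8, 8), dtype=bool)
occupied[points[:, 0], points[:, 1]] = True

leaves = quadtree_leaves(occupied)
# Assumed density rule: a fully occupied quadrant of side >= 2 counts as dense.
dense = [leaf for leaf in leaves if leaf[3] == "full" and leaf[2] >= 2]

print(len(leaves), "leaves describe", occupied.size, "cells")
print("dense quadrants:", dense)
```

The isolated item ends up as a size-1 leaf and is ignored by the density rule, which is how outliers would be handled under this assumption.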