Spatial Clustering

Name: Sujing Wang
Advisor: Dr. Christoph F. Eick
Data Mining & Machine Learning Group
Outline
1. Introduction
2. Framework Architecture
3. Methodology
4. Case Study
5. Conclusion and Future Work
Introduction
• Spatial Data Mining (SDM):
  - The process of analyzing and discovering interesting and useful patterns, associations, or relationships from large spatial datasets.
• Spatial object structure:
  (<spatial attributes>; <non-spatial attributes>)
• Example:
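As a rough illustration of the (<spatial attributes>; <non-spatial attributes>) structure, the following Python sketch pairs a list of coordinates with a dictionary of attributes. The class name `SpatialObject`, the `kind` helper, and all sample values are hypothetical and chosen only for illustration; they do not come from the original slides.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class SpatialObject:
    """Hypothetical container: spatial attributes plus non-spatial attributes."""
    spatial: List[Tuple[float, float]]            # (longitude, latitude) pairs
    non_spatial: Dict[str, Any] = field(default_factory=dict)

    def kind(self) -> str:
        """Classify the object as a point, polygon (closed ring), or trajectory."""
        if len(self.spatial) == 1:
            return "point"
        if len(self.spatial) >= 4 and self.spatial[0] == self.spatial[-1]:
            return "polygon"
        return "trajectory"


# Made-up sample objects; the coordinates and attribute values are illustrative only.
station = SpatialObject(spatial=[(-95.4, 29.8)],
                        non_spatial={"temperature_f": 86.0})
region = SpatialObject(
    spatial=[(-95.6, 29.6), (-95.2, 29.6), (-95.2, 30.0), (-95.6, 30.0), (-95.6, 29.6)],
    non_spatial={"label": "region A"},
)
path = SpatialObject(spatial=[(-95.6, 29.6), (-95.4, 29.7), (-95.2, 29.9)])
```

Treating a closed coordinate ring as a polygon is one common convention; other encodings (for example, an explicit type tag) would work equally well.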
Introduction
• Spatial objects: point, trajectory (line), polygon (region)
Introduction
• Challenges:
  - Complexity of spatial data types
  - Spatial relationships
  - Spatial autocorrelation
• Motivation:
  - Polygons, especially overlapping polygons, are very important for mining spatial datasets.
  - Traditional clustering algorithms do not work for spatial polygons.
• Research goals:
  - Develop new distance functions and new spatial clustering algorithms for polygon clustering.
  - Implement novel post-clustering techniques with plug-in reward functions to capture domain experts' notion of interestingness.
A Polygon-based Clustering and Analysis Framework for Mining Spatial Datasets
[Figure: framework architecture. Labeled components: Geospatial Datasets, DCONTOUR, Spatial Clusters, Poly_SNN, Meta Clusters, Post-processing, Domain Experts, Notion of Interestingness, Reward Functions, Summaries and Interesting Patterns]
Methodology
1. Domain-driven final clustering generation methodology
Inputs:
• A meta-clustering $M = \{X_1, \ldots, X_k\}$; at most one object will be selected from each meta-cluster $X_i$ ($i = 1, \ldots, k$).
• A user-provided individual cluster reward function $\mathit{Reward}_U$ whose values are in $[0, \infty)$.
• A reward threshold $\theta_U$; clusters with low rewards are not included in the final clustering.
• A cluster distance threshold $\theta_d$, which expresses to what extent the user would like to tolerate cluster overlap.
• A cluster distance function $\mathit{dist}$.
Find $Z \subseteq X_1 \cup \ldots \cup X_k$ that maximizes:
$$q(Z) = \sum_{c \in Z} \mathit{Reward}_U(c)$$
subject to:
$$\forall x \in Z \; \forall x' \in Z \; \big(x \neq x' \Rightarrow \mathit{dist}(x, x') > \theta_d\big)$$
$$\forall x \in Z \; \big(\mathit{Reward}_U(x) > \theta_U\big)$$
$$\forall x \in Z \; \forall x' \in Z \; \big((x \in X_i \wedge x' \in X_j \wedge x \neq x') \Rightarrow i \neq j\big)$$
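The constrained selection problem above can be approximated in several ways. The following Python sketch uses a simple greedy heuristic (pick clusters in order of decreasing reward, skipping any that violate a constraint). It is only an illustration under that assumption, not the algorithm used in the framework, and it does not guarantee the optimal $q(Z)$. The function name `select_final_clustering`, the tuple encoding of clusters, and the toy data are all hypothetical.

```python
from typing import Callable, List, Sequence, TypeVar

C = TypeVar("C")


def select_final_clustering(
    meta_clusters: Sequence[Sequence[C]],
    reward: Callable[[C], float],
    dist: Callable[[C, C], float],
    reward_threshold: float,
    dist_threshold: float,
) -> List[C]:
    """Greedy heuristic: highest reward first, subject to the three constraints."""
    # Constraint 2: keep only clusters whose reward exceeds the reward threshold.
    candidates = []
    for i, meta in enumerate(meta_clusters):
        for c in meta:
            r = reward(c)
            if r > reward_threshold:
                candidates.append((r, i, c))
    candidates.sort(key=lambda t: t[0], reverse=True)

    selected: List[C] = []
    used_meta = set()
    for r, i, c in candidates:
        # Constraint 3: at most one cluster per meta-cluster.
        if i in used_meta:
            continue
        # Constraint 1: pairwise distance must exceed the distance threshold.
        if all(dist(c, z) > dist_threshold for z in selected):
            selected.append(c)
            used_meta.add(i)
    return selected


# Toy data (made up): each cluster is (name, 1-D position, reward).
meta = [
    [("a1", 0.0, 5.0), ("a2", 1.0, 3.0)],
    [("b1", 0.5, 4.0), ("b2", 5.0, 2.0)],
    [("c1", 9.0, 0.5)],
]
final = select_final_clustering(
    meta,
    reward=lambda c: c[2],
    dist=lambda a, b: abs(a[1] - b[1]),
    reward_threshold=1.0,
    dist_threshold=1.0,
)
total_reward = sum(c[2] for c in final)
```

In this toy run, `a1` is chosen first, `b1` is rejected because it lies too close to `a1`, `a2` is rejected because its meta-cluster is already represented, and `c1` never qualifies because its reward is below the threshold, leaving `a1` and `b2`.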
Methodology
2. Finding interesting clusters with respect to a continuous non-spatial variable $v$:
Let
• $X_i \in 2^A$ be a cluster in the A-space,
• $\sigma^2$ be the variance of $v$ in dataset $D$,
• $\sigma^2(X_i)$ be the variance of variable $v$ in cluster $X_i$,
• $m_v(X_i)$ be the mean value of variable $v$ in cluster $X_i$,
• $t_1 \geq 0$ be a mean value reward threshold and $t_2 \geq 1$ be a variance reward threshold.
Interestingness function $\varphi$ for each cluster:
$$\varphi(X_i) = \max\big(0, |m_v(X_i)| - t_1\big) \times \max\big(0, \sigma^2 - (\sigma^2(X_i) \times t_2)\big)$$
Case Study
1. Meta-clusters generated from multiple spatial datasets:
[Figure: map of the meta-clusters; x-axis: Longitude (-95.8 to -94.8), y-axis: Latitude (29.0 to 30.4)]
Case Study
2. Final Clusters with area of polygons as plug-in reward function
[Figure: map of the final clusters, labeled with polygon IDs 13, 21, 80, 125, and 150; x-axis: Longitude (-95.8 to -94.8), y-axis: Latitude (29.0 to 30.4)]
Polygon ID | Temperature (°F) | Solar Radiation (Langleys per minute) | Wind Speed (miles per hour) | Time of Day
13         | 79.0             | N/A                                   | 4.50                        | 6 p.m.
21         | 86.35            | 1.33                                  | 6.10                        | 1 p.m.
80         | 89.10            | 1.17                                  | 6.20                        | 2 p.m.
125        | 84.10            | 0.13                                  | 4.90                        | 2 p.m.
150        | 88.87            | 1.10                                  | 5.39                        | 12 p.m.
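A plug-in reward function based on polygon area could look like the Python sketch below. It uses the shoelace formula on planar coordinates, which is an assumption: the slides do not state how area is computed, and for longitude/latitude data a projection or geodesic area would be more accurate. The function name `polygon_area` and the sample polygons are hypothetical.

```python
from typing import Sequence, Tuple


def polygon_area(vertices: Sequence[Tuple[float, float]]) -> float:
    """Area of a simple polygon via the shoelace formula (planar coordinates).

    vertices: ring of (x, y) pairs; the last vertex is implicitly joined to the first.
    """
    n = len(vertices)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


# Made-up polygons: a unit square and a 0.4 x 0.2 degree rectangle.
unit_square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
rectangle = [(-95.6, 29.6), (-95.2, 29.6), (-95.2, 29.8), (-95.6, 29.8)]

square_reward = polygon_area(unit_square)
rectangle_reward = polygon_area(rectangle)
```

Used as the reward function, this favors large polygons, so the final clustering keeps the largest non-conflicting polygon from each meta-cluster.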
Case Study
3. Finding interesting meta-clusters with respect to solar radiation:
[Figure: map of the interesting meta-clusters, labeled with cluster IDs 5, 15, and 21; x-axis: Longitude (-95.8 to -94.8), y-axis: Latitude (29.0 to 30.2)]
Cluster ID | Mean    | Variance | Number of Polygons
5          | -0.9144 | 0.1981   | 5
15         | 1.1218  | 0.1334   | 5
21         | 1.0184  | 0.0350   | 3
Conclusion and Future Work
• Conclusions:
  - Our framework can effectively cluster spatially overlapping polygons that are similar in size, shape, and location.
  - Our post-clustering techniques with different plug-in reward functions can guide the extraction of interesting patterns and generate summaries from large spatial datasets.
• Future work:
  - Develop novel spatio-temporal clustering techniques and embed them in our framework.
  - Investigate novel change analysis techniques to identify spatial and temporal changes in spatial data.
  - Evaluate our framework in challenging case studies.
Publications:
• S. Wang, C.S. Chen, V. Rinsurongkawong, F. Akdag, C.F. Eick, “Polygon-based Methodology for Mining Related Spatial Datasets”, ACM SIGSPATIAL GIS Workshop on Data Mining for Geoinformatics (DMG), in conjunction with ACM SIGSPATIAL GIS 2010, San Jose, CA, Nov. 2010. (NSF travel award for ACM GIS 2010)
• S. Wang, C. Eick, Q. Xu, “A Space-Time Analysis Framework for Mining Geospatial Datasets”, CyberGIS’12, the First International Conference on Space, Time, and CyberGIS, University of Illinois at Urbana-Champaign, Champaign, IL, Aug. 6-9, 2012. (NSF travel award for CyberGIS 2012)
• C. Eick, G. Forestier, S. Wang, Z. Cao, S. Goyal, “A Methodology for Finding Uniform Regions in Spatial Data”, CyberGIS’12, the First International Conference on Space, Time, and CyberGIS, University of Illinois at Urbana-Champaign, Champaign, IL, Aug. 6-9, 2012.
• S. Wang, C.F. Eick, “A Polygon-based Clustering and Analysis Framework for Mining Spatial Datasets”, Geoinformatica (under review).
Thank you!