eneralized Partial Global Planning

Download Report

Transcript eneralized Partial Global Planning

CS599 Spatial & Temporal Database
Spatial Data Mining: Progress and Challenges
Survey Paper
appeared in DMKD96
by Koperski, K., Adhikary, J. and Han, J.
Simon Fraser University, Canada
represented by Chung-hao Tan
Nov.16.2000
1
Outlines
•
•
•
•
•
•
•
What is data mining?
What is spatial data mining?
Generalization-based knowledge discovery.
Clustering-based analysis.
Exploring spatial association rules.
Mining in image database.
Future direction & conclusion.
2
What Is Data Mining?
• A short definition:
“extracting implicit knowledge from large amount of data.”
• The form of discovered knowledge:
– Regression and classification.
– Association rules.
– Clustering.
• What can be contributed by database research?
– Efficient data access method (indexing).
– Query optimizer.
– Data integration.
– …
=> Data Warehousing research provides a convenient platform for
data mining.
3
An Example of Data Mining Technique
• Example:
– Data:
Stock trading data (price, size, number of trades, etc.).
– Query:
Given the current and past trading information, can you tell me
whether it will go up or go down in the next minute?
– Method:
Bayesian CART model search (Chipman, 1997).
=> try to find a classification or regression tree to model the data.
– Result:
1. Reduce the misclassification rate from 53% to 30%.
2. Identify those important classification rules.
3. Identify those important variables (predictors).
4
An Example of Data Mining Technique (Cont.)
5
An Example of Data Mining Technique (Cont.)
6
What Is Spatial Data Mining?
• A short definition:
Extraction of implicit knowledge, spatial relations, or other patterns
not explicitly stored in spatial database.
• Benefits:
– Understand spatial data; query optimization.
– Discover relationships between spatial data and non-spatial data.
– Construction of spatial knowledge base (e.g. associations).
• Application:
–
–
–
–
GIS.
Image database exploration.
Robot navigation.
… (any applications which use spatial data).
7
Primitives of Spatial Data Mining
• Spatial characteristic rules:
– A general description of spatial data.
– E.g. price range of houses in various regions.
• Spatial discriminating rules:
– A general description of comparison among spatial data.
– E.g. a comparison of price ranges of houses in various regions.
• Spatial association rules:
– Implication of one or a set of features by another set of features.
– E.g. house near beach -> is expensive.
8
Primitives of Spatial Data Mining (Cont.)
• Thematic maps:
– Present the spatial distribution of a single or a few attributes.
– E.g. Temperature thematic map.
– Data stored by raster image or vector image.
• Image database:
– A special kind of spatial database where data almost entirely
consists of image or pictures (e.g. satellite image or medical
image).
– These images have coordination properties.
9
Data Mining Architecture
• An example: (by Matheus, 1993)
10
Mining By Statistic Methods
• Methods:
– Regression model.
• Disadvantage.
– Assumption of statistical independence among the spatially
distributed data.
– Need experts’ domain knowledge (in spatial data).
– Cannot model non-linear rules or symbolic values very well.
– Do not work well with incomplete or inconclusive data.
11
Generalization-based Method
• Ideas:
– Learning from examples.
– Combined with
generalization.
• Concept hierarchy.
– Explicitly given by the
domain experts.
– Higher levels are more
general terms.
• Attributed-oriented induction:
– Performed by climbing the generalization hierarchies and
summarizing the general relationships between spatial and nonspatial data at higher concept levels.
– Until reaching a generalization threshold.
12
Spatial-data-dominant Generalization
• Ideas:
– First step: Spatial-oriented induction.
Merging spatial regions according to the spatial concept hierarchy.
– Second step: Attribute-oriented induction.
Non-spatial data at each merged regions are generalized at a
given level by the threshold.
13
Non-spatial-data-dominant Generalization
• Ideas:
– First step: Attribute-oriented induction.
Non-spatial data are generalized at a given level by the threshold.
– Second step: Spatial-oriented induction.
Merging spatial regions which have the same non-spatial
description. Ignore those small regions with different non-spatial
descriptions but inside a large merged region.
14
Generalization-based Method (Cont.)
15
Clustering-based Method
• Ideas:
– Clusters can be found without using any background knowledge.
– Unsupervised learning.
– Methods:
PAM – Repeat to find a better k representatives by trying all
possible pairs of combinations.
CLARA – Same as PAM, but using a subset of data as samples.
CLARANS – Same as PAM, but randomly changing the samples
at each iteration.
16
SD-CLARANS
• Ideas:
– First step: Spatial-oriented induction.
Spatial-relevant data are collected and clustered.
– Second step: Attributed-oriented induction.
Find out the non-spatial description of objects in each cluster.
17
NSD-CLARANS
• Ideas:
– First step: Attributed-oriented induction.
Produce a number of generalized tulples.
– Second step: Spatial-oriented induction.
For each such generalized tuple, all spatial components are
collected and clustered.
18
Other Issues In Clustering
• Need a fast access method to the spatial data (e.g. R*tree).
• Focus on relevant data only.
• Using CF tree (for example) to store clustered results:
– A tuple of data is incrementally inserted into the closet leaf node
(a sub-cluster).
– If the diameter of the sub-cluster exceeds a threshold after
insertion, split that leaf node.
– Each internal node contains a Clustering Feature (CF).
CF = (N, LS, SS)
N: #points in the sub-cluster.
LS: linear sum of the N points.
SS: square sum of the N points.
– Linear scalability; insensibility to the input order; good quality of
clustering.
19
Exploring Spatial Associations
• Example:
–
–
–
–
Is_a(x, school) -> close_to(x, park) 80%.
Topological relations: intersect, overlap, disjoint…
Spatial orientation: left_of, west_of…
Distance information: close_to, far_away…
• Minimum Support:
– Ignore those rules with small number of evidences.
– E.g. Ignore the relation associating only 5% house in that area
and a single school.
– Strong rule: A rule with large support (exceeds the minimum
support threshold).
• Minimum Confidence:
– Filter out those rules with low confidence.
– E.g. Ignore the relations X->Y with only 5% confidence.
20
Multi-level Spatial Associations Rules
• Using tree to explore:
–
–
–
–
Collect task-relevant data.
Computation starts at high level of spatial predicates like close_to.
Utilize spatial indexing methods.
For those pattern that pass the filtering at the high levels, do
further refinements at the lower levels, like adjacent_to, intersects,
distance_less_than_x, etc.
– Filter out those patterns that do not exceed Minimum Support
Threshold or Minimum Confidence Threshold.
– Derive the strong association rules!
21
Using Approximation and Aggregation
• Ideas:
– Instead of asking “where the clusters in the spatial database?”, we
want to know “what are the characteristics of the clusters in terms
of the features that are close to them?”
– E.g. “90% of the expensive house in a cluster are close to a lake”.
– Using computational geometry concept.
– First step: Eliminate unnecessary features.
– Second step: Calculate the aggregate proximity of points in the
cluster to the convex boundary of each features.
– Experiment result: processing 50,000 features within 2 seconds.
22
Mining In Image Database
• Ideas:
– Mining useful information in image database.
– Example: Automatically identify volcano on the surface of Venus
from images transmitted by the spacecraft.
– Question: Is the above example related to spatial data mining
research?
23
Future Directions
•
•
•
•
•
•
•
•
•
Data mining in spatial object-oriented database.
Mining under uncertainty.
Alternative Clustering Techniques.
Mining spatial data deviation and evolution rules.
Using multiple thematic maps.
Interleaved generalization.
Generalization using temporal spatial data.
Spatial Data Mining Query Language.
Multidimensional rule visualization.
24
Conclusion
•
•
•
•
•
•
What is spatial data mining?
(Non-)Spatial-data-dominant generalization
(Non-)Spatial-data-dominant clustering
Spatial association rules
Using approximation and aggregation
Mining in image database
25