CSE591 Data Mining

Download Report

Transcript CSE591 Data Mining

6. Spatial Mining
Spatial Data and Structures
Images
Spatial Mining Algorithms
Spring 2003
Data Mining by H. Liu, ASU
1
Definitions
• Spatial data is about instances located in a
physical space
• Spatial data has location or geo-referenced
features
• Some of these features are:
– Address, latitude/longitude (explicit)
– Location-based partitions in databases (implicit)
Spring 2003
Data Mining by H. Liu, ASU
2
Applications and Problems
• Geographic information systems (GIS) store
information related to geographic locations on Earth
– Weather, climate monitoring, community infrastructure
needs, disaster management, and hazardous waste
• Homeland security issues such as prediction of
unexpected events and planning of evacuation
• Remote sensing and image classification
• Biomedical applications include medical imaging
and illness diagnosis
Spring 2003
Data Mining by H. Liu, ASU
3
Use of Spatial Data
• Map overlay – merging disparate data
– Different views of the same area: (Level 1) streets, power
lines, phone lines, sewer lines, (Level 2) actual elevations,
building locations, and rivers
• Spatial selection – find all houses near ASU
• Spatial join – nearest for points, intersection for areas
• Other basic spatial operations
– Region/range query for objects intersecting a region
– Nearest neighbor query for objects closest to a given place
– Distance scan asking for objects within a certain radius
Spring 2003
Data Mining by H. Liu, ASU
4
Spatial Data Structures
• Minimum bounding rectangles (MBR)
• Different tree structures
– Quad tree
– R-Tree
– kd-Tree
• Image databases
Spring 2003
Data Mining by H. Liu, ASU
5
MBR
• Representing a spatial object by the smallest
rectangle [(x1,y1), (x2,y2)] or rectangles
(x2,y2)
(x1,y1)
Spring 2003
Data Mining by H. Liu, ASU
6
Tree Structures
• Quad Tree: every four quadrants in one layer
forms a parent quadrant in an upper layer
– An example
Spring 2003
Data Mining by H. Liu, ASU
7
R-Tree
• Indexing MBRs in a tree
– An R-tree order of m has at most m entries in one node
– An example (order of 3)
R6
R8
R8
R1
R6
R7
R7
R2
R3
Spring 2003
R4
R5
R1
R2
Data Mining by H. Liu, ASU
R3
R4
R5
8
kd-Tree
• Indexing multi-dimensional data, one dimension
for a level in a tree
– An example
Spring 2003
Data Mining by H. Liu, ASU
9
Common Tasks dealing with Spatial Data
• Data focusing
– Spatial queries
– Identifying interesting parts in spatial data
– Progress refinement can be applied in a tree structure
• Feature extraction
– Extracting important/relevant features for an
application
• Classification or others
– Using training data to create classifiers
– Many mining algorithms can be used
• Classification, clustering, associations
Spring 2003
Data Mining by H. Liu, ASU
10
Spatial Mining Tasks
• Spatial classification
• Spatial clustering
• Spatial association rules
Spring 2003
Data Mining by H. Liu, ASU
11
Spatial Classification
• Use spatial information at different (coarse/fine)
levels (in different indexing trees) for data
focusing
• Determine relevant spatial or non-spatial features
• Perform normal supervised learning algorithms
– e.g., Decision trees, NBC, etc.
Spring 2003
Data Mining by H. Liu, ASU
12
Spatial Clustering
• Use tree structures to index spatial data
• Cluster locally
– DBSCAN: R-tree
– CLIQUE: Grid or Quad tree
Spring 2003
Data Mining by H. Liu, ASU
13
Spatial Association Rules
• Spatial objects are of major interest, not transactions
• AB
– A, B can be either spatial or non-spatial (3 combinations)
– What is the fourth combination?
• Association rules can be found w.r.t. the 3 types
Spring 2003
Data Mining by H. Liu, ASU
14
Summary
• Spatial data can contain both spatial and nonspatial features.
• When spatial information becomes dominant
interest, spatial data mining should be applied.
• Spatial data structures can facilitate spatial mining.
• Standard data mining algorithms can be modified
for spatial data mining, with a substantial part of
preprocessing to take into account of spatial
information.
Spring 2003
Data Mining by H. Liu, ASU
15
Bibliography
• M. H. Dunham. Data Mining – Introductory and
Advanced Topics. Prentice Hall. 2003.
• R.O. Duda, P.E. Hart, D.G. Stork. Pattern
Classification, 2nd edition. Wiley-Interscience.
• J. Han and M. Kamber. Data Mining – Concepts
and Techniques. 2001. Morgan Kaufmann.
Spring 2003
Data Mining by H. Liu, ASU
16