Spatial Mining

Download Report

Transcript Spatial Mining

Spatial Mining
Introduction
• Spatial Mining is a specialised domain of data
mining whose goal is to find implicit
knowledge in spatial data.
• Spatial data may be viewed as objects with
some location in physical space.
• Spatial data also contain non-spatial attributes
that may be either dependent or independent
on location.
Spatial Data
• Also known as geospatial data or geographic
information it is the data or information that
identifies the geographic location of features
and boundaries on Earth, such as natural or
constructed features, oceans, and more.
Spatial data is usually stored as coordinates
and topology, and is data that can be mapped.
Spatial Data
• Spatial data has location or geo-referenced
features.
• Some of these features are:
– Address, latitude/longitude (explicit)
– Location-based partitions in databases (implicit)
Spatial Object
• Contains both spatial and nonspatial
attributes.
• Must have a location type attributes:
– Latitude/longitude
– Zip code
– Street address
• May retrieve object using either (or both)
spatial or nonspatial attributes.
5
Applications and Problems
• Geographic information systems (GIS) store
information related to geographic locations on
Earth
– Weather, community infrastructure needs, disaster
management, and hazardous waste
• Homeland security issues such as prediction of
unexpected events and planning of evacuation.
• Remote sensing and image classification.
• Biomedical applications include medical imaging
and illness diagnosis.
Spatial Queries
• Spatial selection may involve specialized selection
comparison operations:
–
–
–
–
Near
North, South, East, West
Contained in
Overlap/intersect
• Region (Range) Query – find objects that intersect a
given region.
• Nearest Neighbor Query – find object close to
identified object.
• Distance Scan – find object within a certain distance of
an identified object where distance is made
increasingly larger.
7
Spatial Data Structures
• Data structures designed specifically to store
or index spatial data
• These are:• Minimum bounding rectangles (MBR)
• Different tree structures
– Quad tree
– R-Tree
– kd-Tree
• Image databases
8
MBR
• Minimum Bounding Rectangle
• Smallest rectangle that completely contains
the object
9
MBR Examples
10
Quad Tree
• Hierarchical decomposition of the space into
quadrants (MBRs)
• Each level in the tree represents the object as
the set of quadrants which contain any
portion of the object.
• Each level is a more exact representation of
the object.
• The number of levels is determined by the
degree of accuracy desired.
© Prentice Hall
11
Tree Structures
• Quad Tree: every four quadrants in one
layer forms a parent quadrant in an upper
layer
– An example
R-Tree
• As with Quad Tree the region is divided into
successively smaller rectangles (MBRs).
• Rectangles need not be of the same size or
number at each level.
• Rectangles may actually overlap.
• Lowest level cell has only one object.
• Tree maintenance algorithms similar to those
for B-trees.
13
R-Tree
• Indexing MBRs in a tree
– An R-tree of order m has at most m entries in
one node
– An example (order of 3)
R6
R8
R8
R1
R6
R7
R7
R2
R3
R4
R5
R1
R2
R3
R4
R5
14
R-Tree Example
15
K-D Tree
• Designed for multi-attribute data, not
necessarily spatial
• Variation of binary search tree
• Each level is used to index one of the
dimensions of the spatial object.
• Lowest level cell has only one object
• Divisions not based on MBRs but successive
divisions of the dimension range.
16
k-D Tree Example
17
kd-Tree
• Indexing multi-dimensional data, one
dimension for a level in a tree
– An example
18
Common Tasks dealing with Spatial Data
• Data focusing
– Spatial queries
– Identifying interesting parts in spatial data
– Progress refinement can be applied in a tree
structure
• Feature extraction
– Extracting important/relevant features for an
application
• Classification or others
– Using training data to create classifiers
– Many mining algorithms can be used
• Classification, clustering, associations
19
Spatial Mining Tasks
• Spatial classification
• Spatial clustering
• Spatial association rules
Spatial Classification
• Use spatial information at different
(coarse/fine) levels (different indexing trees)
for data focusing
• Determine relevant spatial or non-spatial
features
• Perform normal supervised learning
algorithms
– e.g., Decision trees,
21
Spatial Clustering
•
•
•
•
Use tree structures to index spatial data
DBSCAN: R-tree
CLIQUE: Grid or Quad tree
Clustering with spatial constraints (obstacles
 need to adjust notion of distance)
22
Spatial Association Rules
• Spatial objects are of major interest, not
transactions
• AB
– A, B can be either spatial or non-spatial (3
combinations)
– What is the fourth combination?
23