Spatial Data Mining

Download Report

Transcript Spatial Data Mining

Spatial Data Mining
Introduction
• Spatial data mining is the process of discovering
interesting, useful, non-trivial patterns from large
spatial datasets
– E.g. co-location patterns of water pumps and cholera
– Determining hotspots: unusual locations
• Spatial Data Mining Tasks
– Classification/Prediction
– Co-location Mining
– Clustering
• Recap of special properties of Spatial Data
– Spatial autocorrelation
– Spatial heterogeneity
– Implicit Spatial Relations
2
Spatial Relations
• Spatial databases do not store spatial
relations explicitly
– Additional functionality required to compute them
• Three types of spatial relations specified by
the OGC reference model
– Distance relations
• Euclidean distance between two spatial features
– Direction relations
• Ordering of spatial features in space
– Topological relations
• Characterise the type of intersection between spatial
features
3
Distance relations
•
If dist is a distance
function and c is
some real number
1. dist(A,B)>c,
2. dist(A,B)<c and
3. dist(A,B)=c
A
A
A
B
B
B
4
Direction relations
• If directions of B and C
are required with
respect to A
• Define a representative
point, rep(A)
• rep(A) defines the
origin of a virtual
coordinate system
• The quadrants and half
planes define the
direction relations
• B can have two values
{northeast, east}
• Exact direction relation
is northeast
C north A
C
B northeast A
A
B
rep(A)
5
Topological Relations
• Topological relations describe how geometries
intersect spatially
• Simple geometry types
– Point, 0-dimension
– Line, 1-dimension
– Polygon, 2-dimension
• Each geometry represented in terms of
•
– boundary (B) – geometry of the lower dimension
– interior (I) – points of the geometry when boundary is
removed
– exterior (E) – points not in the interior or boundary
Examples for simple geometries
– For a point, I = {point}, B={} and E={Points not in I and B}
– For a line, I={points except boundary points}, B={two end
points} and E={Points not in I and B}
– For a polygon, I={points within the boundary}, B={the
boundary} and E={points not in I and B}
6
DE-9IM
• Topological relations are defined using any
one of the following models
– 4IM, four intersection model (only B and E
considered)
– 9IM, nine intersection models (B, I, and E)
– DE-9IM, dimensionally extended 9 intersection
model
• DE-9IM is an OGC complaint model
• Dim is the dimension function
7
Example
• Consider two
polygons
– A - POLYGON ((10
10, 15 0, 25 0, 30 10,
25 20, 15 20, 10 10))
– B - POLYGON ((20
10, 30 0, 40 10, 30
20, 20 10))
8
9-Intersection Matrix of example
geometries
I(B)
B(B)
E(B)
I(A)
B(A)
E(A)
9
DE-9IM for the example
geometries
I(B)
B(B)
E(B)
I(A)
2
1
2
B(A)
1
0
1
E(A)
2
1
2
10
Relationships using DE-9IM
•
•
•
Different geometries may give
rise to different numbers in the
DE-9IM
For a specific type of
relationship we are only
interested in certain values in
certain positions
– That is, we are interested in
patterns in the matrix than
actual values
Actual values are replaced by
wild cards
– T: value is "true" - non empty any dimension >= 0
– F: value is "false" - empty dimension < 0
– *: Don't care what the value is
– 0: value is exactly zero
– 1: value is exactly one
– 2: value is exactly two
A
I(B) B(B) E(B)
over
laps
B
I(A) T
*
T
B(A) *
*
*
E(A) T
*
*
11
Topological Relations
•
x.Disjoint(y)
•
x.Touches(y)
•
x.Crosses(y)
•
x.Within(y)
•
x.Overlaps(y)
•
DE-9IM string for example geometries was ‘212101212’ (from earlier
slide)
– FF*FF****
– FT******* Area/Area, Line/Line, Line/Area, Point/Area
– F**T***** Not Point/Point
– F***T****
– T*T****** Point/Line, Point/Area, Line/Area
– 0******** Line/Line
– TF*F*****
– T*T***T** Point/Point, Area/Area
– 1*T***T** Line/Line
– A crosses B
– A overlaps B
12
Approaches to Spatial Data
Mining
• Materialize spatial features and use
Weka
– Required features are added as additional
attributes to the main feature
– To create a flat file of data
• Use special data mining techniques that
take spatial dependency into account
13
Materializing features- Example
14
Materializing features- Example
(2)
15
Spatial Data Mining Architecture
• Retrieve data belonging
to multiple themes
• Preprocess spatial data
to materialize spatial
features
– Select the required
features
– Use the methods to
compute spatial relations
to create a flat file of
data
• Use Weka like tool to
perform data mining
Weka
Flat File
Feature Selection &
OGC complaint methods
to compute relations
Multiple Themes
OGC Complaint Spatial DBMS
16
Spatial Clustering
•
•
Also called spatial segmentation
Input
•
Output
•
Census Website has plenty of examples
– a table of area names and their corresponding attributes such as population
density, number of adult illiterates etc.
– Information about the neighbourhood relationships among the areas
– A list of categories/classes of the attributes
– Grouped (segmented) areas where each group has areas with similar
attribute values
– http://www.statistics.gov.uk/census2001/censusma
ps/index.html
17
Similarity with image
segmentation
• Spatial segmentation is
performed in image
processing
– Identify regions (areas)
of an image that have
similar colour (or other
image attributes).
– Many image segmentation
techniques are available
• E.g. region-growing
technique
1
1
1
1
1
1
1
1
1
1
1
1
2
2
2
1 2
1
1
1
2
2
2
2
2
2
2
2
2
2
2
2
18
Region Growing Technique
• There are many flavours of
this technique
• One of them is described
below:
– Assign seed areas to each of
the segments (classes of the
attribute)
– Add neighbouring areas to
these segments if the
incoming areas have similar
values of attributes
– Repeat the above step until
all the regions are allocated
to one of the segments
1
1
1
1
1
1
2
2
2
2
2
2
2
• Functionality to compute
spatial relations (neighbours)
assumed
19
Summary
• Spatial data storage available as extensions of
RDBMS
• Visualization of Spatial data available in GIS
• Spatial Data Mining requires functionality to compute
spatial relations
• OGC specifications provide the standards for all the
above resources
• MYSQL provides data spatial data storage
– But only partially provides the functionality for computing
relations
• Several OpenSource systems provide all the above
resources for spatial data
– OpenJump, GeoTools
20