Title: Spatial Data Mining in Geo-Business
Overview
Paper available online at www.innovativegis.com/basis/present/GeoTec08/
Twisting the Perspective of Map Surfaces: describes the character of spatial distributions through the generation of a customer density surface
Linking Numeric and Geographic Distributions: investigates the link between numeric and geographic distributions of mapped data
Interpolating Spatial Distributions: discusses the basic concepts underlying spatial interpolation
Interpreting Interpolation Results: describes the use of “residual analysis” for evaluating spatial interpolation performance
Characterizing Data Groups: describes the use of “data distance” to derive similarity among the data patterns in a set of map layers
Identifying Data Zones: describes the use of “level-slicing” for classifying locations with a specified data pattern (data zones)
Mapping Data Clusters: describes the use of “clustering” to identify inherent groupings of similar data patterns
Mapping the Future: describes the use of “linear regression” to develop prediction equations relating dependent and independent map variables
Mapping Potential Sales: describes an extensive geo-business application that combines retail competition analysis and product sales prediction
Density Surface Analysis
Processing flow: Customer Street Address -> (Geo-Coding) -> Customer GIS Location -> (Vector to Raster) -> Customer Counts (# per cell) -> (Roving Window) -> Density Surface Totals -> (Classify) -> Classified Density Levels
Customer Counts: counts the number of customers (points) within each grid cell
Density Surface Totals: calculates the total number of customers within a roving window (customer density)
[Figure panels: 2D grid display of customer counts; 3D surface plot, with the value 91 labeled; 2D perspective display of density contours; Density Map]
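The two processing steps above (counting customers per cell, then totaling the counts in a roving window) can be sketched in a few lines of Python. This is only an illustration: the square window shape, the `radius` parameter, the function names and the sample coordinates are assumptions, not details taken from the slides.

```python
import numpy as np

def count_points_per_cell(cols, rows, n_cols, n_rows):
    """Vector-to-raster step: count customers (points) in each grid cell."""
    counts = np.zeros((n_rows, n_cols), dtype=int)
    # np.add.at accumulates correctly when several customers share a cell
    np.add.at(counts, (rows, cols), 1)
    return counts

def roving_window_total(counts, radius=1):
    """Total the counts inside a square window centered on every cell."""
    n_rows, n_cols = counts.shape
    # Pad with zeros so windows at the map edge simply see fewer cells
    padded = np.pad(counts, radius, mode="constant", constant_values=0)
    density = np.zeros_like(counts)
    size = 2 * radius + 1
    for dr in range(size):
        for dc in range(size):
            density += padded[dr:dr + n_rows, dc:dc + n_cols]
    return density

# Hypothetical customer locations as (column, row) grid indices
cols = np.array([0, 1, 1, 3, 3, 3])
rows = np.array([0, 1, 1, 2, 3, 3])
counts = count_points_per_cell(cols, rows, n_cols=4, n_rows=4)
density = roving_window_total(counts, radius=1)
```

A larger `radius` smooths the surface more strongly; classifying `density` into levels would be a separate step.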
Identifying Pockets of High Density
Unusually High = Mean + 1 Standard Deviation
[Figure panels: Customer Density (Non-spatial Statistics); Customer Density (Map Surface)]
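The "Mean + 1 Standard Deviation" rule can be applied directly to a density grid. The sketch below assumes the population standard deviation (NumPy's default); the slides do not say which form was used, and the sample grid is made up.

```python
import numpy as np

def unusually_high(surface):
    """Flag cells whose value exceeds mean + 1 standard deviation."""
    threshold = surface.mean() + surface.std()
    return surface > threshold, threshold

# Hypothetical customer density surface
density = np.array([[1, 2, 2, 1],
                    [2, 3, 9, 2],
                    [1, 2, 3, 1],
                    [1, 1, 2, 1]], dtype=float)
mask, threshold = unusually_high(density)
```

Here `mask` is a binary map of the high-density pockets and `threshold` is the cut-off value used to draw it.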
Grid-based Analysis Frame (Keystone Concept)
Vector (point) to Raster (cell) Analysis Frame: Latitude, Longitude, Column (C), Row (R)
Customer Database (non-spatial) -> Customer Database (spatial)
…GeoCoding plots customers' addresses on the streets map
…appends Lat, Lon, Column, Row location to customer records
…V to R Conversion plots customers' locations in the analysis frame (grid)
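Appending a Column, Row location to a geocoded customer record amounts to a small coordinate conversion. The sketch below assumes a north-up grid defined by its upper-left corner and a square cell size in degrees; the function name, parameters and coordinates are hypothetical.

```python
import math

def latlon_to_cell(lat, lon, north, west, cell_size):
    """Return the (column, row) of the grid cell containing a point.

    north, west: coordinates of the grid's upper-left corner (degrees).
    cell_size:   cell width and height (degrees).
    """
    col = math.floor((lon - west) / cell_size)
    row = math.floor((north - lat) / cell_size)  # rows count down from the top
    return col, row

# Hypothetical customer record after geocoding
customer = {"id": 1, "lat": 40.575, "lon": -105.085}
col, row = latlon_to_cell(customer["lat"], customer["lon"],
                          north=40.60, west=-105.10, cell_size=0.01)
customer.update({"col": col, "row": row})
```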
Surface Modeling (Spatial Interpolation)
…“maps the variance” by using geographic position to help explain the differences in the sample values.
[Figure panels: Point Samples (“Spikes”); Surface Map (“Spikes ‘n Blanket”). Labeled values: Avg = 42.9; 66.3]
IDW Interpolation (Inverse Distance Weighted)
Sampled Data: sample points numbered 1 to 16; points #11, #14, #15 and #16 fall in the window around grid location X
1) Identify data points in window: #11 value = 56.9; #14 value = 22.5; #15 value = 52.3; #16 value = 66.3
2) Calculate distance from location to data points (Pythagorean Theorem): #11 distance = 22.80; #14 distance = 26.08; #15 distance = 6.32; #16 distance = 14.14
3) Weight-average values in the window based on distance to grid location: (1/Distance)^2 * Value (“closer has more influence”)
4) Assign weight-averaged value: 53.35
5) Move window to next grid location and repeat
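Steps 1 to 4 can be checked with a short calculation. The sketch below assumes the weighted values are divided by the sum of the weights, which is the usual form of an inverse-distance weighted average; the function name is illustrative.

```python
def idw_estimate(samples, power=2):
    """Inverse-distance-weighted average of (distance, value) pairs."""
    weights = [(1.0 / distance) ** power for distance, _ in samples]
    weighted_sum = sum(w * value for w, (_, value) in zip(weights, samples))
    return weighted_sum / sum(weights)

# (distance to grid location, sample value) for points #11, #14, #15, #16
window = [(22.80, 56.9), (26.08, 22.5), (6.32, 52.3), (14.14, 66.3)]
estimate = idw_estimate(window)
```

With the distances and values listed in steps 1 and 2, `estimate` rounds to the 53.35 shown in step 4. Point #15 dominates the result because it is by far the closest sample.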
Average vs. IDW Interpolated Surface
[Figure panels: Average; IDW Surface; Difference Surface (IDW - Average)]
Difference Surface (IDW - Average): Min = -26.1, Max = 29.5
Reds: Avg > IDW; Greens: Avg < IDW
IDW vs. Krig Interpolated Surfaces
[Figure panels: Krig Surface; IDW Surface; Difference Surface (IDW - Krig)]
Difference Surface (IDW - Krig): Min = -14.8, Max = 5.0
Reds: Krig > IDW; Greens: Krig < IDW
Assessing Relationships Among Maps
Housing Density (Units/ac): South has Lower Density
Home Value ($K): South has Higher Values
Home Age (Years): South has Newer Homes
Geographic Space and Data Space
Geographic Space: relative spatial position of measurements (Point #1, Point #2)
Data Space: relative numerical magnitude of measurements (axes: Density, Value, Age)
Data Similarity is inversely proportional to Data Distance
…as data distance increases, the map values for two locations are less similar
Comparison Point #1: D = Low (2.4 units/ac), V = High ($407,000), A = Low (18.3 years)
Least Similar Point #2: D = High (4.8 units/ac), V = Low ($190,000), A = High (51.2 years)
Assessing Map Similarity
“Data Distance” determines similarity among data patterns
Data Space: Comparison Point = 2.4, 407, 18.3; Least Similar Point = 4.8, 190, 51.2
…the farthest away point in data space (least similar) is set to 0 and the comparison point is set to 100
…all other Data Distances are scaled in terms of their relative similarity as “percent similar” to the comparison point (0 to 100)
Geographic Space: map of Percent Similar (legend from 0 to 100 in steps of 5)
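The percent-similar scaling can be sketched as follows. Two assumptions are made that the slides do not confirm: data distance is taken as straight-line (Euclidean) distance in data space, and the layers are used in their raw units, so Home Value in $K dominates the distance. Rescaling each layer first would be a reasonable variation. The third location is hypothetical.

```python
import numpy as np

def percent_similar(layers, comparison):
    """Scale data distance to a 0-100 similarity score.

    layers:     array of shape (n_locations, n_variables)
    comparison: data pattern of the comparison point
    """
    distance = np.sqrt(((layers - comparison) ** 2).sum(axis=1))
    # Farthest point in data space -> 0, comparison point itself -> 100
    return 100.0 * (1.0 - distance / distance.max())

# Columns: density (units/ac), value ($K), age (years)
locations = np.array([
    [2.4, 407.0, 18.3],   # comparison point #1
    [4.8, 190.0, 51.2],   # least similar point #2
    [3.0, 350.0, 25.0],   # hypothetical intermediate location
])
similarity = percent_similar(locations, comparison=locations[0])
```

Reshaping `similarity` back to the grid's rows and columns gives the percent-similar map in geographic space.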
Identifying Data Patterns of Interest
Housing Density, Unusually High: Mean = 3.56, +StDev = 0.80, LevelMin = 4.36
Home Value, Unusually Low: Mean = 257.0, -StDev = 67.2, LevelMax = 189.8
[Figure panels: Geographic Space; Data Space]
Level-Slicing Classifier (two variables)
Unusually High Housing Density combined with Unusually Low Home Value locates areas of Unusually High Density and Low Value
[Figure panels: Data Space; Geographic Space]
Level-Slicing Classifier (three variables)
…common “data zones” can be mapped by identifying specific levels of each mapped variable then adding the binary maps
Data Space: identifies combinations of selected measurements
(high D, low V, high A): 1+2+4=7
(high D, low V but not high A): 1+2+0=3
Geographic Space: locates combinations of selected measurements (high D, low V, high A)
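Adding binary maps coded 1, 2 and 4 gives every combination of conditions its own sum. A minimal sketch, assuming each level is cut at one standard deviation from the layer mean (as in the two-variable slide) and using made-up grid values:

```python
import numpy as np

def level_slice(density, value, age):
    """Combine three binary maps into one map of data-zone codes (0 to 7)."""
    high_d = density > density.mean() + density.std()   # coded 1
    low_v = value < value.mean() - value.std()          # coded 2
    high_a = age > age.mean() + age.std()               # coded 4
    return 1 * high_d + 2 * low_v + 4 * high_a

# Hypothetical 2 x 3 map layers
density = np.array([[3.0, 3.0, 3.0], [3.0, 6.0, 6.0]])
value = np.array([[250.0, 250.0, 250.0], [250.0, 100.0, 100.0]])
age = np.array([[20.0, 20.0, 20.0], [20.0, 20.0, 60.0]])
zones = level_slice(density, value, age)
```

A cell coded 7 meets all three conditions, while a cell coded 3 has high density and low value but not high age.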
Spatial Data Clustering
…“data clusters” are identified as groups of neighboring data points in Data Space, and then mapped as corresponding grid cells in Geographic Space
Data Space: plots and identifies groups of similar data values, for example relatively high D, low V and high A versus relatively low D, high V and low A
Geographic Space: maps common data patterns (clusters) for Two Clusters, Three Clusters and Four Clusters
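The slides do not name the clustering algorithm. As one possible stand-in, the sketch below uses plain k-means on standardized layers and then reshapes the labels back to the grid. The function, the z-score scaling and the sample values (other than the two patterns quoted earlier) are assumptions.

```python
import numpy as np

def kmeans(points, k, n_iter=50, seed=0):
    """Plain k-means: returns a cluster label for every data point."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Distance from every point to every cluster center in data space
        dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):          # leave empty clusters unchanged
                centers[j] = points[labels == j].mean(axis=0)
    return labels

# One row per grid cell: density (units/ac), value ($K), age (years)
layers = np.array([
    [4.8, 190.0, 51.2], [4.5, 200.0, 48.0], [4.6, 185.0, 50.0],
    [2.4, 407.0, 18.3], [2.6, 390.0, 20.0], [2.5, 400.0, 19.0],
])
# Standardize so that no single layer dominates the data distance
z = (layers - layers.mean(axis=0)) / layers.std(axis=0)
labels = kmeans(z, k=2)
cluster_map = labels.reshape(2, 3)   # back to a 2 x 3 grid in geographic space
```

Changing `k` to 3 or 4 would produce the finer groupings shown in the other panels.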
Spatial Regression (prediction equation)
…relationship between Loan Concentration and independent variables housing Density, Value and Age
Loan Concentration vs. Housing Density: Y = 26 - 5.7 * Xdensity [R^2 = 40%]
Loan Concentration vs. Home Value: Y = -13 + 0.074 * Xvalue [R^2 = 46%]
Loan Concentration vs. Home Age: Y = 17 - 0.074 * Xage [R^2 = 23%]
[Figure: map panels for Loan Concentration, Housing Density, Home Value and Home Age, with Low and High legend labels]
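A single-variable prediction equation of this kind can be fitted by ordinary least squares on the cell values of two map layers. The sketch below uses a synthetic, noise-free layer generated from the density equation above, so the fit recovers it exactly; the real data behind the reported R^2 values are not reproduced here, and the function names are illustrative.

```python
import numpy as np

def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x, with R-squared."""
    slope, intercept = np.polyfit(x, y, deg=1)
    predicted = intercept + slope * x
    ss_res = ((y - predicted) ** 2).sum()
    ss_tot = ((y - y.mean()) ** 2).sum()
    return intercept, slope, 1.0 - ss_res / ss_tot

def predict_loans(density):
    """Apply the density equation to a map layer, cell by cell."""
    return 26 - 5.7 * density

# Hypothetical housing density layer (units/ac)
density = np.array([[2.0, 2.5, 3.0],
                    [3.5, 4.0, 4.5]])
loans = predict_loans(density)

# Flatten the grids so each cell becomes one (x, y) observation
intercept, slope, r2 = fit_line(density.ravel(), loans.ravel())
```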
Competition Analysis (Spatial Analysis Steps)
Step 1
Build travel time maps for entire market area
• Compute travel time from every location to our store
• This requires grid-based map analysis software
• Update customer record with travel time to our store
• Add this to every non-customer record in trading area
Step 2
Repeat for every competitor
• Update every customer record with travel time to competitor store
• Add to every non-customer record in trading area
Step 3
Compute Travel Time Gain for travel to main store
• Every customer and non-customer record is updated
• The greater gain indicates lower travel effort to visit our store
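Steps 1 to 3 can be illustrated on a small grid. The slides do not say how the map analysis software computes travel time, so this sketch substitutes Dijkstra's shortest-path algorithm with four-neighbor moves, and it assumes the gain is competitor travel time minus travel time to our store. The grid, the store locations and all names are hypothetical.

```python
import heapq

def travel_time_map(minutes_per_cell, store):
    """Dijkstra over a grid: least travel time from every cell to the store."""
    n_rows, n_cols = len(minutes_per_cell), len(minutes_per_cell[0])
    best = [[float("inf")] * n_cols for _ in range(n_rows)]
    best[store[0]][store[1]] = 0.0
    queue = [(0.0, store)]
    while queue:
        time, (r, c) = heapq.heappop(queue)
        if time > best[r][c]:
            continue  # stale queue entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < n_rows and 0 <= nc < n_cols:
                # Cost of a step: average of the two cells' crossing times
                step = (minutes_per_cell[r][c] + minutes_per_cell[nr][nc]) / 2
                if time + step < best[nr][nc]:
                    best[nr][nc] = time + step
                    heapq.heappush(queue, (time + step, (nr, nc)))
    return best

# Hypothetical 3 x 3 market area, one minute to cross each cell
minutes = [[1.0] * 3 for _ in range(3)]
ours = travel_time_map(minutes, store=(0, 0))
competitor = travel_time_map(minutes, store=(2, 2))

# Positive gain: our store is quicker to reach than the competitor
gain = [[competitor[r][c] - ours[r][c] for c in range(3)] for r in range(3)]
```

Each customer or non-customer record would then pick up the `ours`, `competitor` and `gain` values of the cell it falls in.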
Predictive Modeling (Spatial Statistics Steps)
Step 4
Build analytic dataset from customer data
• Geocoding information
• Transactions, sales, product category purchases
• Visitation frequency, recency, spend
• Customer Segment, travel times, demographics
Step 5
Build predictive models
• Probability of Visitation (not possible for this demo)
• Probability of Purchase by Product Category
• Expected Sales and Transactions
• Use store travel time and all competitive differences
Step 6
Map the scores
• The distribution of the scores provides visual evidence of the effects of travel time and competitive pressure
• Spatial hypotheses can be tested and evaluated
Map Analysis Framework
While discrete sets of points, lines and polygons have served our mapping demands for over 8,000 years and keep us from getting lost (Mapping and Geo-query), the expression of mapped data as continuous spatial distributions (surfaces) provides a new foothold for the contextual and numerical analysis of mapped data (“Thinking with Maps”).
See www.innovativegis.com/basis/present/GeoTec08/ to download this PowerPoint slide set
Weighted Average Calculations
for Inverse Distance Weighting (IDW) Spatial Interpolation Technique
Evaluating Interpolation Performance
…Residual Analysis is used to evaluate interpolation performance (Krig at .03 Normalized Error is best)
[Figure panels: Average; IDW; Krig]
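Residual analysis compares interpolated estimates with measured values at test locations. The slides do not define "Normalized Error", so the sketch below assumes it is the average absolute residual divided by the mean of the measured values. All numbers are hypothetical and do not reproduce the .03 result.

```python
import numpy as np

def residual_summary(actual, estimated):
    """Residuals plus two summary measures of interpolation error."""
    actual = np.asarray(actual, dtype=float)
    estimated = np.asarray(estimated, dtype=float)
    residuals = estimated - actual
    avg_abs_error = np.abs(residuals).mean()
    # Assumed normalization: divide by the mean of the measured values
    return residuals, avg_abs_error, avg_abs_error / actual.mean()

# Hypothetical measurements at four test locations
actual = [50.0, 40.0, 60.0, 50.0]
# Hypothetical estimates from two techniques at the same locations
technique_a = [45.0, 45.0, 45.0, 45.0]   # e.g. a single map-wide average
technique_b = [48.0, 41.0, 57.0, 52.0]   # e.g. an interpolated surface

_, error_a, normalized_a = residual_summary(actual, technique_a)
_, error_b, normalized_b = residual_summary(actual, technique_b)
```

The technique with the smaller normalized error tracks the measured values more closely; mapping the residuals themselves shows where each surface over- or under-estimates.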