15minuteOverview

Download Report

Transcript 15minuteOverview

Spatial Database
&
Spatial Data Mining
Shashi Shekhar
Dept. of Computer Sc. and Eng.
University of Minnesota
[email protected],
www.cs.umn.edu/~shekhar
www.spatial.cs.umn.edu
Spatial Data
• Location-based Services
– E.g.: MapPoint, MapQuest, Yahoo/Google Maps, …
Courtesy: Microsoft Live Search (http://maps.live.com)
Spatial Data
• In-car Navigation Device
Emerson In-Car Navigation System (Courtesy: Amazon.com)
Book
http://www.spatial.cs.umn.edu
Outline
• Spatial Databases
– Conceptual Modeling
• Pictograms enhanced Entity Relationship Model
– Logical Data Model
• Direction predicates and queries
– Physical Data Model
• Query Processing – Shortest Paths, Evacuation Routes,
– Correlated time-series
• Storage – Connectivity Clustered Access Method
• Spatial Data Mining
–
–
–
–
Location Prediction – fast algorithms
Co-location patterns – definition, algorithms
Spatial outliers – algorithms
Hot-spots – new work on “mean streets”
Geo-Spatial Databases: Management and Mining
1. Recent book from our group!
Nest locations
Vegetation durability
2. Parallelize Range Queries
3. Shortest Path Queries
4. Storing roadmaps in disk blocks
Distance to open water
Water depth
5. Location prediction to characterize nesting grounds.
6. Spatial outlier detect bad sensor (#9) on Highway I-35
Spatial Data Mining (SDM)
• The process of discovering
– interesting, useful, non-trivial patterns
• patterns: non-specialist
• exception to patterns: specialist
– from large spatial datasets
• Spatial pattern families
–
–
–
–
Spatial outlier, discontinuities
Location prediction models
Spatial clusters
Co-location patterns
Spatial Data Mining - Example
Nest locations
Vegetation durability
Distance to open water
Water depth
Spatial Autocorrelation (SA)
• First Law of Geography
– “All things are related, but nearby things are more
related than distant things. [Tobler, 1970]”
Pixel property with independent identical
distribution
Vegetation Durability with SA
Implication of Auto-correlation
Name
Model
Classical Linear Regression
Spatial Auto-Regression
Classification
Accuracy
y  xβ  ε
y  ρWy  xβ  ε
Low
High
 : the spatial auto - regression (auto - correlation) parameter
W : n - by - n neighborho od matrix over spatial framework
Computational Challenge:
Computing determinant of a very large matrix
in the Maximum Likelihood Function:
n ln(2 ) n ln( 2 )
ln(L)  ln I  W 

 SSE
2
2
Outline
• Spatial Databases
– Conceptual Modeling
• Pictograms enhanced Entity Relationship Model
– Logical Data Model
• Direction predicates and queries
– Physical Data Model
• Query Processing – Shortest Paths, Evacuation Routes,
– Correlated time-series
• Storage – Connectivity Clustered Access Method
• Spatial Data Mining
–
–
–
–
Location Prediction – fast algorithms
Co-location patterns – definition, algorithms
Spatial outliers – algorithms
Hot-spots – new work on “mean streets”
Spatio-temporal Query Processing
• Teleconnection
– Find (land location, ocean location) pairs with
correlated climate changes
• Ex. El Nino affects climate at many land
locations
Average Monthly Temperature
(Courtsey: NASA, Prof. V. Kumar)
Global Influence of El Nino during
the Northern Hemisphere Winter
(D: Dry, W: Warm, R: Rainfall)
Auto-correlation saves computation cost
• Challenge
– high dimensional (e.g., 600) feature space
– 67k land locations and 100k ocean locations (degree by degree
grid)
– 50-year monthly data
• Computational Efficiency
– Spatial autocorrelation
• Reduce Computational Complexity
– Spatial indexing to organize locations
• Top-down tree traversal is a strong filter
• Spatial join query: filter-and-refine
– save 40% to 98% computational cost at θ = 0.3 to 0.9
Evacuation Route Planning - Motivation

No coordination among local plans means
 Traffic congestions on all highways


Florida, Lousiana
(Andrew, 1992)
Houston
(Rita, 2005)
e.g. 60 mile congestion in Texas (2005)
Great confusions and chaos
( National Weather Services)
"We packed up Morgan City residents to
evacuate in the a.m. on the day that
Andrew hit coastal Louisiana, but in early
afternoon the majority came back home.
The traffic was so bad that they
couldn't get through Lafayette."
Mayor Tim Mott, Morgan City, Louisiana
( http://i49south.com/hurricane.htm )
( www.washingtonpost.com)
( National Weather Services)
I-45 out of Houston
( FEMA.gov)
A Real Scenario
Nuclear Power Plants in Minnesota
Twin Cities
Monticello Emergency Planning Zone
Emergency Planning Zone (EPZ) is a 10-mile radius
around the plant divided into sub areas.
Monticello EPZ
Subarea Population
2
5N
5E
5S
5W
10N
10E
10SE
10S
10SW
10W
10NW
Total
4,675
3,994
9,645
6,749
2,236
391
1,785
1,390
4,616
3,408
2,354
707
41,950
Estimate EPZ evacuation time:
Summer/Winter (good weather):
3 hours, 30 minutes
Winter (adverse weather):
5 hours, 40 minutes
Data source: Minnesota DPS & DHS
Web site: http://www.dps.state.mn.us
http://www.dhs.state.mn.us
A Real World Testcase
Experiment Result
Total evacuation time:
- Existing Plan: 268 min.
- New Plan: 162 min.
Monticello Power Plant
Source cities
Destination
Routes used only by old plan
Routes used only by result plan of
capacity constrained routing
Routes used by both plans
Congestion is likely in old plan near evacuation
destination due to capacity constraints. Our plan
has richer routes near destination to reduce
congestion and total evacuation time.
Twin Cities
Outline
• Spatial Databases
– Conceptual Modeling
• Pictograms enhanced Entity Relationship Model
– Logical Data Model
• Direction predicates and queries
– Physical Data Model
• Query Processing – Shortest Paths, Evacuation Routes,
– Correlated time-series
• Storage – Connectivity Clustered Access Method
• Spatial Data Mining
–
–
–
–
Location Prediction – fast algorithms
Co-location patterns – definition, algorithms
Spatial outliers – algorithms
Hot-spots – new work on “mean streets”
Resource Description Framework (RDF)
Physical model
 Representation
 Directed Acyclic Graph, TAGs
 Storage method
 Connectivity-Clustered Access Method (CCAM)
 Frequent Operations
 Breadth First Search
 Path Computation
Semantics in Databases
• Ontology
- Shared Conceptualization of knowledge in a specific domain.
• Resource Description Framework (RDF)
- Representation of resource information in World Wide Web.
• Patterns
Ontology based Semantic Computing
 Example Query
SELECT * FROM travelmode
WHERE ONT_RELATED (transport,
Transport
‘IS_A’,
‘Road’,
‘Transport_Ontology’,
123) = 1;
Walk
Road
Commuter Rail
Drive
Result: All walk and drive modes.
 Applications
Homeland Security, Life Sciences, Web Services
Bus
…
Resource Description Framework (RDF)
Multimodal Transportation System
Commonwealth Ave. and Subway (Green Line), Boston (between BU Central and Blandford St)
[source: http://maps.google.com/]
N1
N2
N3
N4
N5
Road Intersections
R1
R2
R3
Subway Stations
Transition Edge
Graph Representation
Resource Description Framework (RDF)
Multimodal Transportation System
: RailRoute
: RailRoute
: bus
crosscuts used_by
parallel
: Street
has
halts
: Trains
: busStops
: Stations
serves
Start/end
start/end
:busTerminals
: TrafficLight
: TrafficLight
has
parallel : Streets
: Rail_line
used_by
crosscuts
: Terminals
Light Rail System
: Streets
Road System
Transit Edges(*)
*Note: A subset of possible transition edges is shown.
Find all routes from the Commonwealth Avenue to the Logan Airport using bus and subway systems.
SELECT S.street, S.busStop, R.Stations, R.RailRoute,R.Terminal
FROM TABLE(SDO_RDF_MATCH(
‘(?x : halts ?b)
‘(?rr :serves ? z),
‘(?rr :start/end ?tr),
SDO_RDF_Models(‘rail_line R’,’street S’)),
WHERE S.b hasTransitTo R.z and S.Street = ‘Commonwealth’
and R.terminal = ‘Logan airport’;
: Streets
Geo-Spatial Databases: Management and Mining
1. Recent book from our group!
Nest locations
Vegetation durability
2. Parallelize Range Queries
3. Shortest Path Queries
4. Storing roadmaps in disk blocks
Distance to open water
Water depth
5. Location prediction to characterize nesting grounds.
6. Spatial outlier detect bad sensor (#9) on Highway I-35