vis - CSE User Home Pages

Download Report

Transcript vis - CSE User Home Pages

Spatial DBMS and Intelligent
Transportation System
Shashi Shekhar
Intelligent Transportation Institute
and Computer Science Department
University of Minnesota
[email protected]
(612) 624-8307
http://www.cs.umn.edu/~shekhar
http://www.cs.umn.edu/research/shashi-group/
Biography Highlights










7/01-now : Professor, Dept. of CS, U. of MN
12/89-6/01 : Asst./Asso. Prof. of CS, U of MN
Ph.D. (CS), M.B.A., U of California, Berkeley (1989)
Member: CTS(since 1990),Army Center, CURA
Author: “A Tour of Spatial Database” (Prentice Hall,
2002) and 100+ papers in Journals, Conferences
Editor: Geo-Information(2002-onwards), IEEE
Transactions on Knowledge and Data Eng.(96-00)
Program chair: ACM Intl Conf. on GIS (1996)
Tech. Advisor: UNDP(1997-98), ESRI(1995), MNDOT
GuideStar(1993-95 on Genesis Travlink)
Grants: FHWA, MNDOT, NASA, ARMY, NSF, ...
Supervised 7+ Ph.D Thesis (placed at Oracle, IBM TJ
Watson Research Center etc.), 30+ MS. Thesis
Research Interests





Knowledge and Data Engineering
Spatial Database Management
Spatial Data Mining(SDM) and Visualization
Geographic Information System
Application Domains : Transportation,
Climatology, Defence Computations
Spatial Data Mining, SDBMS

Historical Examples



London Cholera (1854)
Dental health in Colorado
Current Examples





Environmental justice
Crime mapping - hot spots (NIJ)
Cancer clusters (CDC)
Habitat location prediction (Ecology)
Site selection, assest tracking, spatial outliers
ITS Database Systems
Highway
Based
Sensor
Drivers
ITS
Database
Home, office
Shopping mall
Information
center, PCS
Traffic Reports
Road Maps
City Maps
Construction Schedule
Business Directory
Transportation
Planners,
Policy Maker
SDBMS & SDM in ITS

Operational






Tactical





Routing, Guidance, Navigation for travelers and Commuters
Asset tracking in APTS, CVO for security, and customer service
Emergency services
Ramp meter control (freeway operation)
Incident management
Event planning (maintenance, sports connection)
Infrastructure security - patrol routes
Snow cleaning routes and schedules
Impact analysis (e.g. Mall of America)
Strategic




Travel demand forecasting for capacity planning
Public transportation route selection
Policy decision(e.g. HOV lanes, ramp meter study)
Research: Driving Simulation and Safety
SDBMS and SDM in ITS

Transportation Manager



Traffic Engineering





Where are the congestion (in time and space)?
Which of these recurrent congestion?
Which loop detection are not working properly?
How congestion start and spread?
Traveler, Commuter




How the freeway system performed yesterday?
Which locations are worst performers?
What is the travel time on a route?
Will I make to destination in time for a meeting?
Where are the incident and events?
Planner and Research


How much can information technique to reduce congestion?
What is an appropriate ramp meter strategy given specific evolution of
congestion phenomenon?
Transportation Projects






Traffic Database System
Traffic Data Visualization
Spatial Outlier Detection
Roadmap storage and Routing Algorithms
Road Map Accuracy Assessment
Other:


Driving Simulation
In-vehicle headup display evaluation
Project:
Traffic Database System



Sponsor and time-period: MNDOT, 1998-1999
Students: Xinhong Tan, Anuradha Thota
Contributions to Transportation Domain
Reduce response of queries from hours to minutes
Performance tuning (table design, index selection)


Contributions to Computer Science
GUI design for extracting relevant summaries
 Evaluate technologies with large dataset

Map of Station in Mpls
Gui Design

http://www.cs.umn.edu/research/shashi-group/TMC/html/gui.html
Flow of Data From TMC
TMC
Server
FTP link
Storage at University of
Minnesota
Binary
Data made
available for
researchers
ASCII
FTP
link
FTP
link
PC
FTP link
Convert binary to
5min data
Conversion
programs
Existing Table
Fivemin
Detector
ReadDate
Time
Dayofweek
Volume
Occupancy
Validity
Speed
Table Designs
Current
Fivemin
Proposed-1
Fivemn_day
Proposed-2
Detector ReadDate
Time
Volume
occupancy
ReadDate
Detector
Vol_Occ_Validty
Five_min
Detector
Time_id
Volume
DateTime
Time_id
ReadDate
MN/Dot
Five_min
Detector ReadDate
Hour Day_week
time
Binary
Five_min
Detector ReadDate
Time
occupancy
occupancy
validity
speed
Day_week
validity
Time
Volume
Vol_5_ Occl_5_ Validity_5 15mn 1hr
min
min
_ min
validity
Benchmark Queries
1. Get 5-min Volume, occupancy for detector ID = 10 on
Oct. 1st, 1997 from 7am to 8am
2. Get 5-min volume, Occupancy for detector ‘5’ on Aug1
1997.
3. Get 5-min volume, Occupancy for detector ‘5’ on Aug1
1997 from 6.30am to 7.30am.
4. Get average 5-min volume, occupancy, for Monday in
Aug1997 between 8.00 - 8.05,8.05-8.10 …… 9.00
5. Get maximum volume, Occupancy for detector ‘5’ on
Aug1 1997 from 6am to 7am
6. Get the average of AM rushhour hourly volume for a set
of stations on highway I35W-NB with milepoint between
0.0 and 4.0 from Oct. 1st, 1997 to Oct. 5th , 1997
Conclusion
Examples of the Query

Example1:

Query description:


Get 5-min Volume, occupancy for detector ID = 10 on Oct. 1st,
1997 from 7am to 8am
SQL statement:






SELECT readdate, time, xtan.fivemin.detector, occupancy, volume
FROM xtan.fivemin, xtan.datetime
WHERE ReadDate = to_date('01-OCT-97', 'DD-MON-YYYY')
AND time BETWEEN '0705' AND '0800'
AND xtan.fivemin.Detector = '10'
AND xtan.fivemin.
Examples of the Query

Query result 1:
Examples of the Query

Example2:

Query description:


Get the average of AM rushhour hourly volume for a set of stations
on highway I35W-NB with milepoint between 0.0 and 4.0 from Oct.
1st, 1997 to Oct. 5th , 1997
SQL statement:
 SELECT hour, xtan.v_stat_hour.station, avg(volume)
 FROM tan.v_stat_hour, xtan.statrdwy
 WHERE ReadDate BETWEEN to_date('01-OCT-97','DD-MONYYYY') AND to_date('05-OCT-97','DD-MON-YYYY')
 AND hour BETWEEN '06' AND '09'
 AND statrdwy.route = 'I35W-I'
 AND statrdwy.mp >= 0.0
 AND statrdwy.mp <= 4.0
 AND xtan.v_stat_hour.station = statrdwy.station
 GROUP BY xtan.v_stat_hour.station, hour
Examples of the Query

Query result 2:
Conclusions

MN/Dot model and Proposed-II(Normalized) are the two
recommended models for the final structure
Proposed-II
Conversion
effort
MN/Dot
Little modification on
existing loading process
Needs new loading
program
Future
Compatibility
Same format remains
Number of
columns increases
Derived data
Effort needed for
derived data
Query
More flexible
Fifteenmin & hourly
data exist, station data
needs to be derived
Less flexible
Project:
Traffic Data Visualization



Sponsor and time-period: USDOT/ITS Inst., 2000-2001
Students: Alan Liu, CT Lu
Contributions to Transportation Domain



Allow intuitive browsing of loop detector data
Highlight patterns in data for further study
Contributions to Computer Science


Mapcube - Organize visualization using a dimension lattice
Visual data mining, e.g. for clustering
Motivation for Traffic Visualization

Transportation Manager



Traffic Engineering





Where are the congestion (in time and space)?
Which of these recurrent congestion?
Which loop detection are not working properly?
How congestion start and spread?
Traveler, Commuter




How the freeway system performed yesterday?
Which locations are worst performers?
What is the travel time on a route?
Will I make to destination in time for a meeting?
Where are the incident and events?
Planner and Research


How much can information technique to reduce congestion?
What is an appropriate ramp meter strategy given specific evolution of
congestion phenomenon?
Dimensions

Available
•
•
•
•

TTD : Time of Day
TDW : Day of Week
TMY : Month of Year
S : Station, Highway, All Stations
Others
• Scale, Weather, Seasons, Event types, …
Mapcube :
Which Subset of Dimensions ?
TTDTDWTMYS
TTDTDWS
TTDTDW
TDW
S
STTD
TTD
TDW
S
Next Project
Singleton Subset : TTD
Configuration:  X-axis: time of day; Y-axis: Volume
 For station sid 138, sid 139, sid 140, on 1/12/1997
Trends:
 Station sid 139: rush hour all day long
 Station sid 139 is an S-outlier
Singleton Subset: TDW

Configuration:

X axis: Day of week; Y axis: Avg. volume.
For stations 4, 8, 577
Avg. volume for Jan 1997

Trends:
Friday is the busiest day of week
Tuesday is the second busiest day of week
Singleton Subset: S
Configuration:  X-axis: I-35W South; Y-axis: Avg. traffic volume
 Avg. traffic volume for January 1997
Trends?:

High avg. traffic volume from Franklin Ave to Nicollet Ave

Two outliers: 35W/26S(sid 576) and 35W/TH55S(sid 585)
Dimension Pair: TTD-TDW
Configuration:
Trends:



X-axis: time of date; Y-axis: day of Week

f(x,y): Avg. volume over all stations for Jan 1997, except
Jan 1, 1997

Evening rush hour broader than morning rush hour
Rush hour starts early on Friday.
Wednesday - narrower evening rush hour
Dimension Pair: S-TTD
Configuration:



X-axis: Time of Day
Y-axis: Highway
f(x,y): Avg. volume over all stations for
1/15, 1997
Trends:



3-Cluster
• North section:Evening rush hour
• Downtown area: All day rush hour
• South section:Morning rush hour
S-Outliers
• station ranked 9th
• Time: 2:35pm
Missing Data
Dimension Pair: TDW-S
Configuration:
Trends:




X-axis: stations; Y-axis: day of week

f(x,y): Avg. volume over all stations for Jan-Mar 1997
Busiest segment of I-35 SW is b/w Downtown MPLS & I-62
Saturday has more traffic than Sunday
Outliers – highway branch
Post Processing of cluster patterns
Clustering Based Classification:
 Class 1: Stations with Morning Rush Hour
 Class 2: Stations Evening Rush Hour
 Class 3: Stations with Morning + Evening Rush Hour
Triplet: TTDTDWS: Compare Traffic Videos
Configuration: Traffic volume on Jan 9 (Th) and 10 (F), 1997
Trends:
 Evening rush hour starts earlier on Friday
 Congested segments: I-35W (downtown Mpls – I-62);
I-94 (Mpls – St. Paul); I-494 ( intersection I-35W)
Size 4 Subset: TTDTDWTMYS(Album)
Configuration:
Trends:
 Outer: X-axis (month of year); Y-axis (highway)
 Inner: X-axis (time of day); Y-axis (day of week)
 Morning rush hour: I-94 East longer than I-35 W North
 Evening rush hour: I-35W North longer than I-94 East
 Evening rush hour on I-94 East: Jan longer than Feb
Project:
Spatial Outlier Detection



Sponsor and time-period: USDOT/ITS Inst. (2000-2002)
Students: C T Lu, Pusheng Zhang
Contributions to Transportation Domain

Filter/reduce data for manual browsing



Identify days with spatial outliers
Identify sensors with anamolous behaviour
Contributions to Computer Science


Unified definition of spatial outliers using algebraic aggregates
Spatial outlier detection algorithm = scan spatial join
Algorithms for Spatial outlier detection
Spatial outlier
A data point that is extreme relative to
it neighbors
Given
A spatial graph G={V,E}
A neighbor relationship (K neighbors)
An attribute function f: V -> R
Test T for spatial outliers
Find
O = {vi | vi V, vi is a spatial outlier}
Objective
Correctness, Computational efficiency
Constraints
Computation cost dominated by I/O op.
Test T is an algebraic aggregate function
Spatial outlier detection
Example Outlier Detection Test
1. Choice of Spatial Statistic
S(x) = [f(x)–E y N(x)(f(y))]
Theorem: S(x) is normally distributed
if f(x) is normally
distributed
2. Test for Outlier Detection
| (S(x) - s) / s | > 
Hypothesis
I/O cost = f( clustering efficiency )
f(x)
Spatial outlier and its neighbors
S(x)
Spatial outlier detection
Results
1. CCAM achieves higher
clustering efficiency (CE)
2. CCAM has lower I/O cost
3. Higher CE leads to lower
I/O cost
4. Page size improves CE for
all methods
Cell-Tree
CE value
I/O cost
CCAM
Z-order
Project:
Roadmap storage and
Routing Algorithms


Sponsor and time-period: FHWA/MNDOT, 1993-1997
Students: Prof. Du-Ren Liu, Dr. Mark Coyle,


Contributions to Transportation Domain



Andrew Fetterer, Ashim Kohli, Brajesh Goyal
CRR = measure of storage methods for roadmaps
In-vehicle navigation devices, routing servers on web
Contributions to Computer Science


CCAM - Better storage method for roadmaps
Hierarchical routing - optimal routes
even when map-size > memory size

Road Map Storage - Problem Statement



Given roadmaps
Find efficient data-structure to store
roadmap on disk blocks
Goal - Minimize I/O-cost of operations



Find(), Insert(), Delete(), Create()
Get-A-Successor(), Get-Successors()
Constraint

Roadmaps larger than main memories
Mpls map partitioning 1
Another way that we may partition the street network for Minneapolis
among disk blocks for improving performance of network computations.
Mpls map partitioning:CCAM
This is one way that we may partition the street network for Minneapolis
among disk blocks for improving performance of network computations.
Road Map Storage

Insight: I/O cost of network operations is
minimized by maximizing



CRR = Pr. ( road-intersection nodes connected by a
road-segment edge are together in a disk page)
WCRR = weighted CRR (edges have weights)
Commercial database support geometric storage
methods even though CRR is a graph property
Measurements of CRR
Shortest Path Problem

Route computation



Find a rout from current location to destination
Criteria: Shortest travel distance or smallest travel
time
Useful for



Travel during rush hour
Travel in an unfamiliar area
Travel to an unfamiliar destination
Problem definition

Given





Graph G=(N,E,C)
Each edge (u,v) in E has a cost C(u,v)
Path from source to destination is a
sequence of nodes
Cost of path=C(vi-1,vi)
A path cost estimation is a function f(u,v)
that computes estimated cost of an
optional path between the two nodes
Smallest Paths
Blue: Smallest travel
time path between
two points.
It follows a freeway
(I-94) which is faster
but not shorter in
distance.
Red: Shortest
distance path
between the same
two points
Routing around incidents
Algorithm for Single pair Path Computation

Road Map Size<<Main Memory Size



Iterative Algorithm
Dijkstra’s Algorithm
A* algorithm



A* with euclidean distance heuristic
A* with manhattan distance heuristic
Road Map Size >> Main Memory Size


Traditional algorithm run into difficulties!
Hierarchical Algorithm
Motivation for Hierarchical Algorithms

Road Map Size >> Main Memory Size



Traditional algorithms yield sub-optimal path
Heuristics - bounding box (source,
destination) or Freeway first then sideroads
Example: Microsoft Expedia


route(Tampa FL to Miami, FL via Canada)
Need an algorithm to give optimal route


A piece of roadmap in memory at a time
Intuition - travelling from island to island
Hierarchical Routing : Step 1

Step 1: Choose Boundary Node Pair


Minimize COST(S,Ba)+COST(Ba,Bd)+COST(Bd,D)
Determining Cost May Be Non-Travial
Hierarchical Routing : Step 2

Step 2: Examine Alternative Boundary Paths

Between Chosen Pair (Ba,Bd)
Hierarchical Routing : Step 2 result

Step 2 Result: Shortest Boundary Path
Hierarchical Routing : Step 3

Step 3: Expand Boundary Path: (Ba1,Bd) -> Ba1 Bda2 Ba3 Bda4…Bd

Boundary Edge (Bij,Bj) ->fragment path (Bi1,N1N2N3…….Nk,Bj)
Project:
Road Map Accuracy
Assessment
Sponsor and time-period: 10+ State DOTs, 2001-2003
 Co-investigators: Prof. Max Donath, Dr. Pi-Ming Chen
 Students: Weili Wu, Hui Xiong, Zhihong Yao
Contributions to Transportation Domain




Defining map accuracy for navigable roadmaps
Site selection for evaluating GPS and roadmap accuracy
Contributions to Computer Science


Definition of Co-location patterns with linear features
Efficient algorithms for finding those
Motivation: Identify road given GPS

GPS accuracy and roadmap accuracy


Garmin error circle
USA topo
Road Map Accuracy

Evaluation of digital road map databases


Goals



road user charge system needs: accuracy, coverage
Recommend a cost-effective approach
Develop the content and quality requirements
Rationale

Each GIS dataset can contain various errors



From different sources
E.g. Map Scale, Area Cover, Density of Observations
Failure to control and manage error

Limit or invalidate GIS applications
Map analysis questions


Site Selection:
 Which road segments are vulnerable for mis-classification given
GPS accuracy?
Feasibility Issue:
 What fraction of highway miles are vulnerable?
Problem Definition

Given:
A digital roadmap and a Gold standard

Find:
Spatial Accuracy of the given GIS dataset

Objective:
Fair, reliable

Constrains:
Gold-standard accuracy is better than GIS
dataset accuracy
Framework to test positional accuracy

Compare with a reference of higher accuracy source




Use internal evidence



find a larger scale map
use the Global Positioning System (GPS)
use raw survey data
Indications of inaccuracy:

Unclosed polygons, lines which overshoot or undershoot junctions
A measure of positional accuracy:

The sizes of gaps, overshoots and undershoots
Compute accuracy from knowledge of the errors




By different sources, e.g
1 mm in source document
0.5 mm in map registration for digitizing
0.2 mm in digitizing
Approach 1 : Visual

Overlay of GPS Tracks Vs. Road Maps
Tiger-based Map
USGS Digital Map
2: National Map Accuracy Standard

Pr. [ distance( P on map, real P) < D ] > 0.9

Tiger file in Windham County, VT (50025)
Limitations of Related Work,
Our Approach

Natl. map accuracy standard




Based on land survey of a sample of points
Not aware of GPS accuracy
Mixes lateral error and longitudal error
Our Approach



Lateral vs. longitudal positional accuracy
Road classification accuracy
Attribute accuracy
Positional Accuracy


Lateral accuracy
 Perpendicular (RMS) distance from GPS reading to center line of
road in road map.
Longitudinal accuracy
 Definition: horizontal distance from GPS reading to
corresponding Geodetic point.
Comment: Lateral error is more important when closest road is paralled
Longitudinal error is important for other case
Road Classification Accuracy


Probability of correctly classifying road for a given GPS
Fraction of miles of roads correctly classified

at given confidence level (e.g. 90%)
Attribute Accuracy & Completeness

Interesting Attributes:





Definition of Attribute Accuracy:


Pr[Value of an attribute for given road segment is correct]
Definition of Completeness:



Economic attributes - administration zone(s), congestion zones
Route attribute - name, type, time restrictions
Route segment - direction, type (e.g. bridge), restrictions
Routing attributes - intersections, turn restrictions
Pr[a road’s segment is in digital map]
Pr[attribute value is not defined for a road segment]
Scope:

Small sample
Core Activities
1.
2.
3.
4.
5.
6.
7.
Acquire digital road maps
Select test sites
Gather gold standard data for test site
 GPS tracks, Surveys, etc.
Complete subsets of road maps for test sites
Compute accuracy measures
Statistical analysis
Visualization
Map Acquisition

Etak/Tele Atlas map data for 7 counties of
metropolitan Twincities
Site Selection


Red : another road within digen distance threshold (e.g. 30m)
Blue: no other road withindistance threshold
Site Selection - Zoom in

Around Hwy 100, 169,7 in SW metro
Comparing GPS tracks and maps

Overlay of GPS tracks and digital road map (Hwy 7)
Comparing GPS tracks and maps

Overlay of GPS tracks and digital road map (Hwy 7)
Other Challenges
1.
2.
3.
Center-line representation of roads
Two-dimensional maps
 Multi-level roads
 Altitude issues
Map matching
Conclusions

Spatial databases, data mining and visualization



Are useful for many ITS problems
We have only scratched the surface so far
Many new exciting opportunities



ATMS : visualize freeway operations for operations,
and planning, communicate impact of policies on
freeway operations to public and lawmakers, new
insights into congestion patterns,
APTS : track buses for customer service, sercurity;
communicate impact of APTS in reducing congestion.
ATIS : understand traffic behaviour for route and
transportation mode selection
Motivation for Traffic Visualization

Transportation Manager



Traffic Engineering





Where are the congestion (in time and space)?
Which of these recurrent congestion?
Which loop detection are not working properly?
How congestion start and spread?
Traveler, Commuter




How the freeway system performed yesterday?
Which locations are worst performers?
What is the travel time on a route?
Will I make to destination in time for a meeting?
Where are the incident and events?
Planner and Research


How much can information technique to reduce congestion?
What is an appropriate ramp meter strategy given specific evolution of
congestion phenomenon?