Slides - Microsoft

Download Report

Transcript Slides - Microsoft

Trajectory Data Mining
Dr. Yu Zheng
Lead Researcher, Microsoft Research
Chair Professor at Shanghai Jiao Tong University
Editor-in-Chief of ACM Trans. Intelligent Systems and Technology
http://research.microsoft.com/en-us/people/yuzheng/
Paradigm of Trajectory Data Mining
Uncertainty
Privacy
Preserving
Reducing
Uncertainty
Traj. Pattern Mining
Moving
Freq. Seq.
Together
Patterns
Patterns
Periodic
Clustering
Patterns
Trajectory Indexing and Retrieval
Distance of
Query Historical
Trajectory
Trajectories
Trajectory
Classification
Trajectory
Outlier/Anomaly
Detection
Managing Recent
Trajectories
Trajectory Preprocessing Map-Matching
Stay Point Detection
Noise Filtering
Graph
Mining
Routing
Matrix
Analysis
TD
Compression
MF
Segmentation
CF
Matrix
Spatial
Trajectories
Spatial
Trajectories
Spatial
Trajectories
Tensor
Graph
Yu Zheng. Trajectory Data Mining: An Overview. ACM Transactions on Intelligent Systems and Technology. 2015, vol. 6, issue 3.
Anomaly Detection from Trajectories
• Detect outlier trajectories
– Taxi drivers’ malicious detour
– Unexpected road change (caused by accident and construction)
• Detect anomalous events based on trajectories
– Traffic accidents
– Disasters
– …….
regions
Detect Anomalies Based on Trajectories
• Categorized by the form of locations
– Link-based: Between regions or on a road network
– Region-based: in a region/grid or a set of regions/grids
• Categorized by methods
– Distance-based outlier detection methods
– Probabilistic distribution-based methods
Link-based
Region-based
a
b
b
4
a
3
5
c
2
6
c
r5 r6
r4
r3
(c)Example of traffic among regions
(d) A graph of regions
r1
r2
A) Taxi flow
B) Social me
Traffic Anomalies Between Regions
Map Segmentation.
Segmentation of
Urban Areas Using
Road Networks. MSRTR-2012-65. 2012.
c
(a) Road network of Beijing
(b) Partitioned regions
a
b
b
4
(b) Partitioned regions
(c)Example of traffic among regions
2
a
3
5
c
(c)Example of traffic am
6
c
(d) A graph of regions
Wei Liu, Yu Zheng, et al. Discovering Spatio-Temporal Causal Interactions in Traffic Data Streams. KDD 2011.
a
b
4
b
2
3
5
6
c
mong regions
a
For each link
(d) A graph of regions
For each property
Using Euclidian distance
Using Mahalanobis distance
Check spatial neighborhoods
Find the minimum distort
Check temporal neighborhoods
Wei Liu, Yu Zheng, et al. Discovering Spatio-Temporal Causal Interactions in Traffic Data Streams. KDD 2011.
Traffic Anomalies Between Regions
• Associate individual anomalies based on temporal adjacency
Wei Liu, Yu Zheng, et al. Discovering Spatio-Temporal Causal Interactions in Traffic Data Streams. KDD 2011.
Diagnosing the Road Traffic Anomalies
40
20
18:00
16:30
15:00
13:30
r1
12:00
0
6:00
L1
Yuyuantan park
Houhai Scenic
spot
Time of day
B) Traffic flows on L1
15, 21.4%
15, 15.8%
30
0
A) Visualization of the anomaly and the root cause paths
18:00
39, 61%
5
16:30
r5
Beijing
10
6:00
Beijing West railway
Station
15
15:00
r4
20
13:30
L3
Anomaly
12:00
The Forbidden
City
25
10:30
L2
r2
9:00
10, 14.3%
7:30
r3
Number of routes passing L2
The 4th ring road
The 3rd ring road
Anomaly
10:30
Anomalous links
Root cause flows
60
9:00
Destinations
7:30
Sources
Number of routes passing L1
April 2, 2011 9am-11am
Time of day
C) Traffic flows on L2
Sanjay Chawla, Yu Zheng, and Jiafeng Hu. Inferring the root cause in road traffic anomalies, ICDM 2012.
Diagnosing the Road Traffic Anomalies
• Link-traffic matrix 𝐿:
•
•
a row is a link and a column corresponds to a time interval
An entry of 𝐿 denotes the number of vehicles traversing a particular link at a time interval
• Link-path matrix 𝐴
•
•
a row standing for a link and column denoting a path.
An entry of 𝐴 is set to 1 if a particular link is contained in a particular path
p1
r1
p2
p3
r2
p5
p4
r1
r3 p6
l3
r4
A) From trajectories to path
l1
l4
r2
l5
t1 t2 t3 t4
r3
l2
r4
l1
l2
l3
l4
l5
1 0 0 0
1 1 1 0
0 0 1 0
0 1 1 0
0 0 0 1
B) Links C) Link traffic matrix
p1 p2 p3 p4 p5 p6
l1
l2
l3
l4
l5
1 0 0 0 0 0
1 1 1 0 0 1
0 0 1 0 1 0
0 1 1 0 0 0
0 0 0 1 1 0
D) Link-path matrix
Sanjay Chawla, Yu Zheng, and Jiafeng Hu. Inferring the root cause in road traffic anomalies, ICDM 2012.
Diagnosing the Road Traffic Anomalies
• PCA-based anomaly detection
• Let 𝐿 = 𝐿 − 𝜇, where 𝜇 is the column sample mean
• Form 𝐶 = 𝐿𝑇 𝐿, t×t matrix, t is the number of time intervals
• Compute the eigen-decomposition of C,
• eigenvalue-eigenvector pairs (𝜆𝑖 , 𝑣𝑖 )
• 𝐶𝑣𝑖 = 𝜆𝑖 𝑣𝑖
• Order the pairs (𝜆𝑖 , 𝑣𝑖 ) in decreasing order of eigenvalues 𝜆𝑖
• Find subspaces
•
•
Let 𝑃𝑛 be the subspace [𝑣1 , . . . , 𝑣𝑟 ] of 𝑅 𝑡 spanned by the first r eigenvectors
Similarly 𝑃𝑎 be the subspace spanned by [𝑣𝑟+1 , . . . , 𝑣𝑡 ]
• Project all data points onto 𝑃𝑎 : 𝑥 → 𝑥𝑎
• Select all points which | 𝑥 − 𝑥𝑎 | > 𝜃
𝑇 = 𝐴𝑉 = 𝑈∑𝑉 𝑇 𝑉 = 𝑈∑
Diagnosing the Road Traffic Anomalies
𝑏 = (0,1,0,1,0)𝑇
Using L1 norm and linear programing
CVX tool in Matlab
L1 norm is more meaningful than L2
l1
p1
r1
p2
p3
r2
p5
p4
r1
r3 p6
l3 l4
r4
A) From trajectories to path
r2
l5
t1 t2 t3 t4
r3
l2
r4
l1
l2
l3
l4
l5
1 0 0 0
1 1 1 0
0 0 1 0
0 1 1 0
0 0 0 1
B) Links C) Link traffic matrix
p1 p2 p3 p4 p5 p6
l1
l2
l3
l4
l5
1 0 0 0 0 0
1 1 1 0 0 1
0 0 1 0 1 0
0 1 1 0 0 0
0 0 0 1 1 0
D) Link-path matrix
Sanjay Chawla, Yu Zheng, and Jiafeng Hu. Inferring the root cause in road traffic anomalies, ICDM 2012.
Crowd Sensing of Traffic Anomalies based
on Human Mobility and Social Media
Bei Pan, Yu Zheng, et al. Crowd Sensing of Traffic Anomalies based on Human Mobility and Social Media. ACM SIGSPATIAL GIS 2013.
13
Detect Anomalies Based on Routing Patterns
During
regular
times
Anomalous
graph
During
anomalous
event
Increase of
routing
behavior
Decrease of
routing
behavior
14
Understand Anomalies Using Social Media
• Understand the traffic anomalies
– Describe the anomaly using social media
– Impact analysis on travel time delay
Detected anomalous graph
Describing Anomalies with Social Media
Anomalous Events
Event Imp
Stat.
Bei Pan, Yu Zheng, et al. Crowd Sensing of Traffic Anomalies based on Human Mobility and Social Media. ACM SIGSPATIAL GIS 2013.
Bei Pan, Yu Zheng, et al. Crowd Sensing of Traffic Anomalies based on Human Mobility and Social Media. ACM SIGSPATIAL GIS 2013.
System Overview
Bei Pan, Yu Zheng, et al. Crowd Sensing of Traffic Anomalies based on Human Mobility and Social Media. ACM SIGSPATIAL GIS 2013.
System Overview
Bei Pan, Yu Zheng, et al. Crowd Sensing of Traffic Anomalies based on Human Mobility and Social Media. ACM SIGSPATIAL GIS 2013.
Routing Behavior Analysis
• Routing Behavior:
– RPOD =< f1 , p1 , f2 , p2 , ... , fn , pn >
– f : traffic flow / p: percentage
– e.g., RPOD =<160, 0.8, 20, 0.1, 20, 0.1>
• Anomaly Detection Problem Definition:
– Given a complete road network, trajectory set in [t0, t1], find graphs
• For each O, at least one D, that the RPOD at time t1 is anomalous compared with
regular RPOD at time [t0, t1):
Bei Pan, Yu Zheng, et al. Crowd Sensing of Traffic Anomalies based on Human Mobility and Social Media. ACM SIGSPATIAL GIS 2013.
Anomaly Detection
• Our solution:
Index:
– Priority Breadth Graph Expansion
– Verifications of anomalous RP on all OD pairs
e1
Index Update: one edge at a time
System Overview
Bei Pan, Yu Zheng, et al. Crowd Sensing of Traffic Anomalies based on Human Mobility and Social Media. ACM SIGSPATIAL GIS 2013.
Term Mining
(TH)
(TC)
Bei Pan, Yu Zheng, et al. Crowd Sensing of Traffic Anomalies based on Human Mobility and Social Media. ACM SIGSPATIAL GIS 2013.
Detecting Anomalies Based on
Log Likelihood Ratio Test
L. X. Pang, Sanjay Chawla, Wei Liu, Yu Zheng. On Detection of Emerging Anomalous Traffic Patterns Using GPS Data. Data & Knowledge Engineering, 2013
Yu Zheng, Huichu Zhang, Yong Yu. Detecting Collective Anomalies from Multiple Spatio-Temporal Datasets across Different Domains. ACM SIGSPATIAL 2015
Partition a City into Regions
• Project trajectories of moving objects (e.g. vehicles) into regions
• Count the number of moving objects in each region
• Apply log likelihood ratio test (LRT) to regions
L. X. Pang, Sanjay Chawla, Wei Liu, Yu Zheng. On Detection of Emerging Anomalous Traffic Patterns Using GPS Data. Data & Knowledge Engineering, 2013
Yu Zheng, Huichu Zhang, Yong Yu. Detecting Collective Anomalies from Multiple Spatio-Temporal Datasets across Different Domains. ACM SIGSPATIAL 2015
Apply LRT to a Spatio-Temporal Setting
• LRT
– testing whether a simplifying assumption for a model is valid
– Λ = −2log
𝑙𝑖𝑘𝑒𝑙𝑖ℎ𝑜𝑜𝑑 𝑓𝑜𝑟 𝑛𝑢𝑙𝑙 𝑚𝑜𝑑𝑒𝑙
𝑙𝑖𝑘𝑒𝑙𝑖ℎ𝑜𝑜𝑑 𝑓𝑜𝑟 𝑎𝑙𝑡𝑒𝑟𝑛𝑎𝑡𝑖𝑣𝑒 𝑚𝑜𝑑𝑒𝑙
– Λ can be approximated by a chi-square distribution χ2 (Λ, 𝑑𝑓)
Region r
An example for a single region and a single dataset
1) 𝐿𝑛𝑢𝑙𝑙 = 𝐺𝑎𝑢𝑠𝑠𝑖𝑎𝑛(70|𝑚𝑒𝑎𝑛 = 200, 𝑣𝑎𝑟 = 1300)
2) The maximum likelihood for the alternative model (mean to 70)
𝑝=
PDF
Gaussian
(200,1300)
xt=70
>0
˄=3
B)
12:00-14:00
A)
70
= 0.35
200
2
χ_cdf
𝑚𝑒𝑎𝑛= 200×0.35=70; 𝑣𝑎𝑟 =1300×0.35=455
3) 𝛬 𝑠 = −2 log
𝐿𝑛𝑢𝑙𝑙
𝐿𝑎𝑙𝑡𝑒𝑟
= 14.05
𝑜𝑑 = χ2 _cdf(14.05, 𝑓𝑑 = 1)=0.999
70 200
Region r
Gaussian
(200,1300)
xt=70
PDF
𝐿𝑎𝑙𝑡𝑒𝑟 = 𝐺𝑎𝑢𝑠𝑠𝑖𝑎𝑛(70|𝑚𝑒𝑎𝑛 = 70, 𝑣𝑎𝑟 = 455);
2
1)
χ_cdf(˄,
>0.95
˄
Poisson(
x1=14
12:00-14:00
12:00-14:0
˄=3.84
A)across Different Domains.
B) ACM SIGSPATIAL 2015
Yu Zheng, Huichu Zhang, Yong Yu. Detecting Collective Anomalies from Multiple Spatio-Temporal Datasets
Apply LRT to a Spatio-Temporal Setting
• Apply LRT to multiple regions (or time slots)
r5 r6
r4
A dataset varies in different regions (or time slots) consistently
r3
r1
r2
1) 𝐿𝑛𝑢𝑙𝑙 = 𝑃𝑜𝑖 14 𝜆1 = 8 × 𝑃𝑜𝑖 14 𝜆2 = 10 × 𝑃𝑜𝑖𝑠(8|𝜆3 = 6);
A) Taxi flow
′
′
B) Social media
′
2) Calculate 𝛩 = {𝜆 1 , 𝜆 2 , 𝜆′3 }: To maximize the likelihood of the alternative model (𝑓𝑑=1)
𝑝=
14+14+8
8+10+6
= 1.5
Region r
Region r
PDF
2
1)
𝜆′1 =8×1.5=12, 𝜆′2 =10×1.5=15,
Gaussian𝜆′3 =6×1.5=9;
χ_cdf(˄,
Poisson(8)
>0.95
(200,1300)
x1=14
𝐿𝑎𝑙𝑡𝑒𝑟 = 𝑃𝑜𝑖 14 𝜆′1 × 𝑃𝑜𝑖xt14
=70𝜆′2 × 𝑃𝑜𝑖𝑠(8|𝜆′3 );
˄
𝐿
12:00-14:00
12:00-14:00
˄=3.84
3) 𝛬 𝑠 = −2 log 𝑛𝑢𝑙𝑙 =5.19
𝐿𝑎𝑙𝑡𝑒𝑟
A)
B)
Poisson(6)
x3=8
Poisson(10)
x2=14
14:00-16:00 16:00-18:00
C)
𝑜𝑑 = χ2 _cdf 5.19, 𝑓𝑑 = 1 = 0.978
A dataset changes differently in different regions (or slots).
𝑜𝑑 𝑠 =
∑𝑖
𝑜𝑑 2 (
< 𝑟𝑖 , 𝑡𝑖 > )
𝑚
r5 r6
r4
r3
r1
r2
A) Taxi flow
Yu Zheng, Huichu Zhang, Yong Yu. Detecting Collective Anomalies from Multiple Spatio-Temporal Datasets across Different Domains. ACM SIGSPATIAL 2015
B
Thanks!
Yu Zheng
[email protected]
Homepage
Yu Zheng. Trajectory Data Mining: An Overview.
ACM Transactions on Intelligent Systems and Technology. 2015, vol. 6, issue 3.