Transcript R*-tree

Index for city
trajectories
Mobile Group, IDKE
Renmin University of China
Caifeng Lai
城市路网的特点:


城市路网中的路线一般较为平直,且地面一般
较为平坦,现代城市的路网规划也较为合理,
因此,我们考虑可以用一种很简单的方法对其
降维。
城市路网中的车流量较大,在同一路线上的距
离相近的移动对象的运动状态基本相似,移动
对象也比较密集,因此我们可以考虑将这类的
移动对象封装起来,在其之上建立索引,然后
进行相应的操作。
两个解决的方法

降维 分为两种情况:



如果路线的斜率小于1,
则将路线上的点映射到x
轴上,求出其相对该路线
的相对位置
反之,则映射到y轴上,
求出其相对该路线的相对
位置
0
1
分组移动对象

将在同一路线上的、距离
相近、移动方向一致、速
度接近的移动对象组合成
一组,将其称为AU。
0
1
Group moving objects
Previous approach –TPR-tree ( t interval )
Our approach –AU ( t interval )
Previous approach –TPR-tree ( t+1 interval ) Our approach –AU ( t+1 interval )
AU index Architecture

The AU index scheme consists of four
components:




The top R*-tree captures the network connectivity.
Each leaf node contains several road segments.
The bottom R*-tree contains current-AU index and
past-AU index.
The current-AU is a main memory index which
represents the current state of each AU.
The past-AU represents the past state of each AU
and was left in a container on disk.
AU index Architecture (cont.)
root
R* tree
current
AU
leaf
T
past
AU
R-tree(A. Guttman
SIGMOD’1984)
R-tree的特点




R-tree是B-Tree对多维对象(点和区域)的扩展
R-tree是一棵平衡树
一个多维对象只能被分到一个子空间中去
若用动态插入算法构建R-tree,在树的结点中会引起
过多的空间重叠和死区(dead-space),使算法性能
降低
R-tree的典型算法




查找
插入
 选择叶子结点
 分裂结点(有多种算法)
 调整树
 必要时增加树的高度
删除
 找到包含要删除记录的叶子结点
 删除
 压缩树
 必要时减小树的高度
更新
 先删除老的记录索引,在插入新的记录索引
R-tree的典型算法




查找
插入
 选择叶子结点
 分裂结点(有多种算法)
 调整树
 必要时增加树的高度
删除
 找到包含要删除记录的叶子结点
 删除
 压缩树
 必要时减小树的高度
更新
 先删除老的记录索引,在插入新的记录索引
R*-Tree(N. Beckmann



SIGMOD’1990)
R*-Tree通过修改插入、分裂算法,并通过引入强制
重插机制对R-Tree的性能进行改进。
R*-Tree和R-Tree一样允许矩形的重叠,
R*-Tree在选择插入路径时同时考虑矩形的面积、空
白区域和重叠的大小,而R-Tree只考虑面积的大小。
R*-tree


Recursively cluster objects into minimum
bounding rectangles (MBR).
Organize the MBRs into a dynamic, diskbased, balanced tree structure, similar to the
B+-tree.
The R*-tree (cont.)
P7
R4
R6
R5
R6
P8
P6
P1
R3
R1
P4
P3
R2
R2
R3
R4
P5
P2
R5
R1
P1
P2
P3
P4
P5
P6
P7
P8
current-AU


At a certain time, in each road segments, several current-AU exits,
the movement of current-AU is performed as function of time.
current-AU is created by relatively strict rules according to the
features of traffic system, so they scarcely expend too much and will
not cause significant overlaps.
MAP
past-AU



Past-AU is designed for indexing vehicle’s
historical movements.
An interesting property about the history
records is that the tuples are sorted by their
end time stamps, so we can store these
history records in the end time order.
We save the current-AU as past-AU, to meet
the need of history query, we store these
history records in the form of data stream.
Current-AU & Past-AU


Current-AU is a finite sequences of tuples.
Past-AU is a possibly infinite sequences of
tuples, so it can be represented to be a
stream.

Streams come with the natural (partial) order
implied by their time attribute.
Indexing Moving Objects

Index on moving objects can be classified into two
different types:


one is indexing current and anticipated future positions of
moving object
 TPR-tree (time-parameterized R-tree) [11] is a balanced,
multi-way tree with the structure of an R-tree that can
answer prediction queries on moving objects.
the other is indexing histories of the positions of moving
objects
 mainly focus on summarized data (or aggregate information)

Aggregate R-tree
What is data stream ?

A data stream is a continuous, time-varying,
unbounded sequence of data-items. Stream items
usually take the form of relational tuples that are
disposed after they get processed, implying that
online stream algorithms are restricted to only one
pass over the data. Data arrival rates are arbitrary
and cannot be controlled by the processing system.
They are unpredictable and fluctuate vastly over
time. Moreover, explicitly storing a stream in its
entirety is impossible, due to its unbounded nature.
数据流



数据流处理方法比传统的处理方法要快很多.
尽管该方法得到的结果并不精确,但是往往并
不影响用户的最终决策.
数据流的处理思路就是设计高效的单遍数据集
扫描算法,在一个远小于数据规模的内存空间
里不断更新一个代表数据集的结构——概要数
据结构,从而实时、高效地获得近似查询结果.
我将考虑采用一种比较简单的数据流处理方法
来获得移动对象的聚集信息。
Type Definition







City trajectory network
Regions: regionid, MBR
roads: rid, ps,pe // start point,end point
lines: lid, lps,lpe // linear start point, linear end point.
Current-AU: auid, ts,pos, length, v, m// start time,
end time, the relative position of the polyline,v is a
vector, the total moving object of the AU
Past-AU: auid, ts,te, plid, pos, length, v, m
Mo: moid, (x,y), v // v is a vector
current-AU operations

Current-AU supports the following operations:
 Create_AU() returns an AU at the entrance of a road segment.


Drop_AU() is specific in AU index.


the algorithm selects appropriate moving objects which have the
same direction, nearly velocity and distance, to buildup an AU, and
take the AU as one moving objects.
When vehicles goes out the road segment, their direction may
change thus can not fit the roles of AU. So we drop it to disk for the
AU records all details of the movements in this segment.
Delete_obj() and Insert_obj() is invoked when the position of the
vehicle is outside of the AU at a certain time.

It finds the nearby AU in a greedy manner (including the original AU)
to check yes or not it fit the rules for create AU. If an AU is fined, an
Insertion will be operated, otherwise, a new AU is created for this
moving object.
Query





Window query
Range query
NN queries: “find which object became the closest to a
given point s during time interval T,”
Aggregate queries: “find how many objects passed
through area Q during time interval T,” or, “find the
fastest object that will pass through area Q in the next
5 minutes from now”
similarity queries: “find objects that moved similarly to
the movement of a given object o over an interval T”
Related work
MON-Tree index
Trajectory Bundle tree
(TB-tree)





Strictly preserves
trajectories : Each leaf
node only contains line
segments belonging to the
same trajectory.
Strength of TB-tree is the
efficient retrieval of
trajectory for a small
number of moving objects.
SEB-tree
Number of moving
objects
Processing strategy
Small
TB-tree
Large
R-tree


Range query
Time slice query
MON-Tree index

The index structure consists of three components:

a 2D R-Tree (the top R-Tree)


a set of 2D R-Trees (the bottom R-Trees)


indexing objects’ movements along the polylines.
a hash structure in the top level containing entries of
the form hpolyid, bottreepti, The hash structure is
organized by polyid.



indexing polyline bounding boxes
polyidis the polyline identification
bottreept is a pointer to the corresponding bottom R-Tree.
Hence, we have two top level index structures: an
R-Tree and a hash structure
MON-Tree index (cont.)
SEB-tree (Start/End time stamp
B-tree) (MDM2003)

Suppose n is the population of objects. In zone z,
after a period of time, N tuples, which include both
the current status records and the history records,
are generated. There is at most one current status
record for each object, therefore, at most n among N
records could be the current status record, where n
N.The format of each tuple is (id, z, ts, te). Since all
records have identical zoneids, we omit field z. Now
each tuple has three fields (id, ts, te), where ts <
te.The maximum number of records that can be
stored in one disk page is B.

An interesting property about the history
records is that the tuples are sorted by their
end time stamps, To index the history records,
the first step is to construct a 2-dimensional
space where the start time stamp and the
end time stamp are the horizontal and the
vertical axes. Each record are mapped to a
point in the space. Since in each record ts <
te, the points are all in upper-left half of the
plane.

Three steps are executed when we insert a new point: 1)
check the start stamp list and find the column where the point
is inserted; 2) check the end stamp list and find the last node
in the column. If the node is full, create a new node in the
column; 3) insert the point into the node. The cost for each
step is O(logB N), O(1) and O(1), and the overall cost for one
insertion is O(logB N).

The query becomes to retrieve all points in the polygon. Observe that if a leaf
node is inside the polygon, the points in the node must be in the query results;
and if a leaf node is intersected with the polygon, the points in it require further
check. The total query cost is O((logB N) + ((n logB n)/B) + ((n + B)/B) + (t/B)),
where t is the size of the results. The first part logB N is for searching the
columns that contains t1 and t2; the second part (n logB n)/B is for searching
one node of each n/B 4 columns before column i; the third part (n+B)/B is for
searching column j and the fourth part t/B is for retrieving the nodes that fall into
the query range.
TPR-Tree


Entries of TPR-Tree in leaf nodes are pairs of
the position of a moving point which is
represented by a reference position and a
corresponding velocity – (x, v). Reference
position is the location at time tref which
usually takes the index creating time tl.
A internal node of TPR-Tree is represented
with


(1) minimum bounding rectangle (MBR)
(2) a velocity vector
Query future location




When answering query about future location, TPR-tree employs
a simple linear function x(t) =x(tref ) + v(t − tref ) to predict object
position at future time t.
The MBRs at the future query time will also be computed based
on the current extents and velocity vectors.
So the queries at future time can be processed in exactly the
same way as in the tradition R-Tree, except that the extents of
the MBRs are computed dynamically and then compared with the
query window.
TPR-Tree is a successful index method in answering queries on
moving objects. But due to its simple prediction method, it is
unavoidably inaccuracy on prediction.