CSIS7101 – Advanced Database Technologies

Download Report

Transcript CSIS7101 – Advanced Database Technologies

CSIS7101 – Advanced Database Technologies
Spatio-Temporal Data (Part 3)
The MV3-Tree:
A Spatio-Temporal Access Method for
Timestamp and Interval Queries
Kwong Chi Ho Leo
Wong Chi Kwong Simon
Lui, Tak Sing Arthur
CSIS7101 - Advanced Database Technologies
Introduction

Spatial-Temporal Database Management
Systems (STDBMS)

Mobile Phone Systmes



Traffic Supervision Systems



Track users efficiently
Provide better communication services
Monitor vehicle locations
Motion patterns
Urban Planning


Record the development of landscapes over the years
Retrieve urban situations at any given time in the past
Spatio-Temporal Data (Part 3)
2
CSIS7101 - Advanced Database Technologies
Spatio-Temporal Queries

Traditional STDBMS

Focus on static objects



Attempting to update the database whenever the objects change
their positions which will cause the STDBMS to spend most of
the time just handling the updates
It would result in huge space requirement
To deal with objects that have dynamic behavior, new
querying languages, modeling methods, novel attribute
representation and specialized access methods are needed

STR-trees and TB-trees


Focus on efficient trajectory retrieval
TPR-trees

Focus on predicting objects’ future locations by storing their current
positions and velocities
Spatio-Temporal Data (Part 3)
3
CSIS7101 - Advanced Database Technologies
Spatio-Temporal Queries (Continue)

To deal with historical information retrieval,
windows queries about objects that move in
discrete time is commonly used.

Timeslice or Timestamp Queries


Retrieve all objects that intersect a windows at a specific
timestamp
Interval Queries

Include several consecutive timestamps
Spatio-Temporal Data (Part 3)
4
CSIS7101 - Advanced Database Technologies
Historical Information Retrieval

Types of Indexing

MR-trees and HR-trees


Maintain a separate R-tree for each timestamp, but allow
consecutive trees to share branches
Advantages


Efficient for timestamp queries, as search degenerates into
a static spatial window query for which R-trees are very
efficient
Disadvantages

Extensive duplication of objects (even if they do not move)
which lead to huge space requirements for most typical
application
 Poor performance on interval queries
Spatio-Temporal Data (Part 3)
5
CSIS7101 - Advanced Database Technologies
Historical Information Retrieval (Continue)

3-Dimensional R-trees (the 3rd dimension
corresponding to time)



An object which does not change its position during a
certain period of time is modeled as a 3D box, bounding
both its spatial and temporal attributes
A moving object is modeled by multiple boxes, each
corresponding to a different version.
Advantages

The temporal attribute is integrated tightly with the spatial
attributes thus interval queries can be answered efficiently
 Redundant duplication is avoided thus space usage can be
reduced.
Spatio-Temporal Data (Part 3)
6
CSIS7101 - Advanced Database Technologies
Historical Information Retrieval (Continue)

Disadvantages

Poor performance on timestamp queries as the query time
depends on the total number of entries in history rather than
the live entries at the query timestamp.
Object states at the
same location from
t1 to t2
time
At location B
t3
t2
At location A
x
t1
time
t2
y
Spatio-Temporal Data (Part 3)
t1
x
Object moves
from location A
to B at time t2
y
7
CSIS7101 - Advanced Database Technologies
Induction of MV3R-tree

MV3R-tree utilizes the concepts of multiversion R-tree (MVR-tree) and a small
auxiliary 3D R-tree




The auxiliary 3D R-tree builts on the leave of the
MVR-tree
Aims at timestamp and interval window queries
For retrieving the past locations of discretely
moving objects
Enhancing the performance of multi-version
framework for multi-dimensional access methods
Spatio-Temporal Data (Part 3)
8
CSIS7101 - Advanced Database Technologies
Induction of MV3R-tree (Continue)
 MVR-tree

involves several heuristics that take into
account the features of R-trees to improve
performance significantly
 The

auxiliary 3D R-trees
outperform traditional 3D R-trees for most
queries
 MV3R-tree

space consumption
up to an order of magnitude smaller than that of
an HR-tree, while maintaining comparable
timestamp query performance
Spatio-Temporal Data (Part 3)
9
CSIS7101 - Advanced Database Technologies
Overview of MVB-trees, HR-trees & 3D R-trees

Multi-version B-trees




Extensions of B-trees
Index the evolution of one-dimensional data in
transaction time temporal databases
Insertions and deletions can only happen at current
time
Entry in the form of <key, tstart, tend, pointer>



For root and intermediate entries, pointers points to a next
level node
For leaf entries, pointers points to the actual record with
the corresponding key value.
An object is said to be alive at time t if : tstart ≤ t < tend
Spatio-Temporal Data (Part 3)
10
CSIS7101 - Advanced Database Technologies
Overview of MVB-trees (Continue)





Temporal attributes tstart and tend denotes the time that the
record was inserted and deleted in the database
respectively
Deletions are logical, i.e. actual records are not physically
removed from database
For currently live entries, tend would be denotes as * where
* means “NOWTIME”
Can have multiple roots and each root has a jurisdiction
interval
For each timestamp t, each node, except the roots, is
required that either none or at least b.Pversion entries are
alive at t

Pversion = tree parameter
 b = node capacity
Spatio-Temporal Data (Part 3)
11
CSIS7101 - Advanced Database Technologies
Overview of MVB-trees (Continue)
Examples: Pversion = 1/3, b = 6, minimum entries = 1/3 * 6 = 2
Root
Point to leaf node A.
Created at time 1
and is alive
Key 43 in leaf node
B is created at time
1 is still alive
A
< 5,
< 8,
<13,
<25,
<27,
<29,
< 5, 1, *, A>
<43, 1, *, B>
<72, 1, *, C>
Point to leaf node B.
Created at time 1
and is alive
Key 95 in leaf node C
is created at time 1and
is deleted at time 3
B
1,
1,
1,
1,
1,
1,
*>
*>
*>
3>
3>
3>
Spatio-Temporal Data (Part 3)
<43,
<48,
<52,
<59,
<68,
1,
1,
1,
1,
1,
Point to leaf node C.
Created at time 1
and is alive
*>
*>
2>
3>
3>
C
< 72,
< 78,
< 83,
< 95,
< 99,
<102,
1,
1,
1,
1,
1,
1,
*>
*>
*>
3>
*>
*>
12
CSIS7101 - Advanced Database Technologies
Overview of MVB-trees (Continue)

Insertions
Root
Root
< 5, 1, *, A>
<43, 1, *, B>
<72, 1, *, C>
< 5,
<43,
<72,
< 5,
A
< 5,
< 8,
<13,
<25,
<27,
<29,
B
1,
1,
1,
1,
1,
1,
*>
*>
*>
3>
3>
3>
A
< 5,
< 8,
<13,
<25,
<27,
<29,
<43,
<48,
<52,
<59,
<68,
4>
4>
4>
3>
3>
3>
<43,
<48,
<52,
<59,
<68,
4,
*,
*,
*,
A>
B>
C>
D>
A new node D is
created to store live
entries of A
C
1,
1,
1,
1,
1,
*>
*>
2>
3>
3>
B
1,
1,
1,
1,
1,
1,
1,
1,
1,
4,
A “dies” meaning that
it will not be modified
in the future
< 72,
< 78,
< 83,
< 95,
< 99,
<102,
1,
1,
1,
1,
1,
1,
*>
*>
*>
3>
*>
*>
C
1,
1,
1,
1,
1,
Spatio-Temporal Data (Part 3)
*>
*>
2>
3>
3>
< 72,
< 78,
< 83,
< 95,
< 99,
<102,
Insertion of <28, 4, *>
at timestamp 4 cause
node A overflow
D
1,
1,
1,
1,
1,
1,
*>
*>
*>
3>
*>
*>
< 5,
< 8,
<13,
<28,
4,
4,
4,
4,
*>
*>
*>
*>
13
CSIS7101 - Advanced Database Technologies
Overview of MVB-trees (Continue)

Deletion
Root
Root
< 5, 1, *, A>
<43, 1, *, B>
<72, 1, *, C>
< 5,
<43,
<72,
<43,
<83,
A
< 5,
< 8,
<13,
<25,
<27,
<29,
B
1,
1,
1,
1,
1,
1,
*>
*>
*>
3>
3>
3>
A
< 5,
< 8,
<13,
<25,
<27,
<29,
<43,
<48,
<52,
<59,
<68,
*>
*>
*>
3>
3>
3>
<43,
<48,
<52,
<59,
<68,
*,
4,
4,
*,
*,
A>
B>
C>
D>
E>
Deletion of <48, 1, *> at
timestamp 4 causes node B
underflow. Key split is
performed.
C
1,
1,
1,
1,
1,
*>
*>
2>
3>
3>
B
1,
1,
1,
1,
1,
1,
1,
1,
1,
4,
4,
Nodes B and C died.
Node D and E created.
< 72,
< 78,
< 83,
< 95,
< 99,
<102,
1,
1,
1,
1,
1,
1,
C
1,
1,
1,
1,
1,
Spatio-Temporal Data (Part 3)
4>
4>
2>
3>
3>
< 72,
< 78,
< 83,
< 95,
< 99,
<102,
1,
1,
1,
1,
1,
1,
A sibling is chosen, say node
C. Live entries are copied to
new nodes (D & E).
*>
*>
*>
3>
*>
*>
4>
4>
4>
3>
4>
4>
D
E
<43, 4, *>
<72, 4, *>
<78, 4, *>
< 83, 4, *>
< 99, 4, *>
<102, 4, *>
14
CSIS7101 - Advanced Database Technologies
Overview of MVB-trees (Continue)

Summary



Insertions and deletions may cause overflow and
underflow respectively, thus create version splits.
Version splits create data redundancy for those
entries duplicated. Such redundancy harms
interval query performance as both the original and
duplicated versions may need to be retrieved.
Underflow – Strong and Weak Versions


Strong version underflow happens after a version split.
Weak version underflow occurs when the weak version
condition is violated.
Spatio-Temporal Data (Part 3)
15
CSIS7101 - Advanced Database Technologies
Overview of MVB-trees (Continue)


MVB-trees require O(N/b) space, where N is the
number of updates ever made to the database and
b is the block capacity.
Answering a timestamp range query requires
O(logbM + r/b) I/O’s, where M is the number of live
object at the queried timestamp, and r is the
number of output objects.
Spatio-Temporal Data (Part 3)
16
CSIS7101 - Advanced Database Technologies
Overview of MVB-trees, HR-trees & 3D R-trees

Historical R-trees



Based on the overlapping technique, another
framework for transforming a single version data
structure into a transaction time access method
The structure maintains an R-tree for each
timestamp, but common branches of consecutive
trees are stored only once in order to save space
A timestamp query is directed to the corresponding
R-tree and search is performed inside the tree body.
Thus, the query degenerates into an ordinary
window query and is handled very efficiently
Spatio-Temporal Data (Part 3)
17
CSIS7101 - Advanced Database Technologies
Overview of Historical R-trees (Continue)


An interval query should search the corresponding
trees of all the timestamps involved.
Example: object e changes position at timestamp 1
Timstamp 0
Timstamp 1
As no object in node A changes
from timestamp 0 to 1, it can be
shared by other trees.
R0
A0
B0
C0
a0 b0 a0
D0
R1
B1
c0 d0 e0
E0
Object e deleted from
node E at timestamp 1
C1
a0 b0 e1
D1
c0 d0 e0
E1
Object e added to
node D at timestamp 1
Spatio-Temporal Data (Part 3)
18
CSIS7101 - Advanced Database Technologies
Overview of MVB-trees, HR-trees & 3D R-trees

3D R-trees




They view time as just another dimension and
integrate it in the tree construction
The movements of 2D objects can be modeled as
distinct boxes in three dimensional space
Temporal projection denotes the period when the
corresponding object remains static
Spatial projections of the box correspond to the
object’s position and extents during the period
Spatio-Temporal Data (Part 3)
19
CSIS7101 - Advanced Database Technologies
Overview of 3D R-trees (Continue)


No mechanism to ensure that each node has a
minimum number of live entries at a given shortinterval queries
Poor in timestamp and short-interval query
performance




A single tree for the whole history
A node may have a lot of dead space at a timestamp t,
meaning that there is a high chance that the query window
intersects the bounding box but no object inside it
Where there are many objects with long lifespans, the
problem becomes more serious because these objects will
force the node that contain them to have long lifespans as
well
It depends on the total number of records, rather than on
the number of records alive at the queried timestamps
Spatio-Temporal Data (Part 3)
20
CSIS7101 - Advanced Database Technologies
Overview of 3D R-trees (Continue)

Good performance in long interval queries


No redundancy
R-trees optimize queries with similar extents along all
dimensions
Different objects at
different locations at
different time interval
time
Only one live
entry may be
retrieved at a
timestamp t.
Ttimestamp t
x
y
Spatio-Temporal Data (Part 3)
A high chance that
the query window
intersects the
bounding box but no
object inside
21
CSIS7101 - Advanced Database Technologies
MV3R-Trees

Why needs MV3R-Trees


Currently, there are no such structure that can
effectively handle both timestamp and interval
queries.
Why is MV3R-Trees good


Reduce the structure size but improve query
performance
Are applicable to other multi-dimensional access
methods when they are converted to corresponding
multi-version structures
Spatio-Temporal Data (Part 3)
22
CSIS7101 - Advanced Database Technologies
MV3R-Trees (Continue)

How MV3R-Trees work

Contain a multi-version R-Tree (MVR-tree) and a
small auxiliary 3D R-tree built on the leaf nodes of
the MVR-tree


MVR-tree can contain multiple R-trees, which refer to as
“logical trees”. Each entry has the form as with MVB-trees:
<S, tstart, tend, pointer>. S denotes the spatial minimum
bounding rectangle (MBR) as defined in R-trees.
MVR-tree inherit the concept of weak version condition.
Spatio-Temporal Data (Part 3)
23
CSIS7101 - Advanced Database Technologies
MV3R-Trees (Insertion and overflow)

Intermediate nodes insertion


Allow redundancy in order to maintain good performance
for timestamp and short-interval queries
Process of insertion for intermediate nodes:
Block
Overflow
Insertion
A
The lifespan of A1 does not include
timestamp 10, its MBR does not cover
C6. The MBR of A1 may be tightedned
Version
Split
. . .
Version Split at
timestamp 10
MBR of B1 is small than A1
because only C5 and C6 is
bounded by B1 at timestamp 10
Spatio-Temporal Data (Part 3)
Key
Split
A
. . .
. . .
<A1, 1, *, C>
Strong
Version
Overflow
C
<A1, 1, 10, C>
<C1, 1, 3>
. . .
<C2, 1, 3>
Insertion of
object C6 at
timestamp 10
<C3, 2, 8>
B
. . .
<B1, 10, *, C>
. . .
<C4, 2, 8>
<C5, 5, *>
<C6, 10, *>
24
CSIS7101 - Advanced Database Technologies
MV3R-Trees (Insertion and overflow)

Do not consider at all strong version underflows because



Underflow in MVR-trees happens much less frequently than
overflows
Handle underflows by entry re-insertion, which may trigger
block overflows in several other nodes
Version splits need to take into account the spatial extents
of the nodes.
Spatio-Temporal Data (Part 3)
25
CSIS7101 - Advanced Database Technologies
MV3R-Trees (Insertion and overflow)

Leaf nodes insertion



Try to avert version splits thus reduce redundancy to
reduce storage space but maintain timestamp query
performance
Small number of leaf nodes will facilitate interval query
processing using the auxiliary 3D R-tree
Process of insertion for leaf nodes:

To avoid version splits, try the following alternatives in order:
1.
2.
3.
4.
General Key Split
Insert in node after reinserting one of its entries
Insert in another node
Version split
Spatio-Temporal Data (Part 3)
26
CSIS7101 - Advanced Database Technologies
MV3R-Trees (Insertion and overflow)
Insertion
1.
Block
Overflow
1. General Key Split
2. Insert in node after
reinserting one of its entries
3. Insert in another node
4. Version Split
Strong
Version
Overflow
Key
Split
General Key Split
 If a new entry is to be inserted, the entries can be distributed to two nodes so that for
each timestamp in a time range, there exist at least b.Pversion entries alive. Thus
version split, which generate version redundancy for entries, can be avoided.
However, it may be difficult or impossible.
 Two new nodes should have small overlap.
 General key split is different from ordinary key split because ordinary key split is
applied when all the entries in the node to be split are alive and their tstart equal
current time.
2.
Reinsert an Existing Entry of the Node
 Any leaf node can store a re-inserted entry provided that:
 Its lifespan must cover that of the entry
 It should be dead if the entry is dead
 Its area should not be enlarged much, in order to ensure good performance for
timestamp queries
Spatio-Temporal Data (Part 3)
27
CSIS7101 - Advanced Database Technologies
MV3R-Trees (Insertion and overflow)
 Dead entries always come before live ones as live entries will be reinserted only into
live nodes, which may also overflow and induce the same problem in the future
 Among the dead and live entries, sorting is based on the area decrease of the nod
MBR caused by the entry deletion
 Reinsert a single entry even if it is possible to reinsert more because
 Reinsertion saves space but does not achieve structure improvement
 Reinsertion of a single entry already achieves the objective of averting the
version split
3.
Insert in Another Node




Tries to insert the new entry into another node that is not full
Backtrack to the upper level and try to insert the entry into another branch
Only consider branches that will incur small area enlargement
The area enlargement of candidate branches can only exceed that of the best branch
by a certain percentage
Spatio-Temporal Data (Part 3)
28
CSIS7101 - Advanced Database Technologies
MV3R-Trees (Insertion and overflow)

Conclusion

General Key Split

Does not require reading any more pages. It is the most
efficient method in terms of update cost
 It reduces the number of entries in the new nodes so that
they will not overflow again in the near future

Reinsert an Existing Entry of the Node

It can search multiple branches
 Update costs are compensated by the space savings

Insert in Another Node


Requires backtracking up to level 2
Update costs are compensated by the space savings
Spatio-Temporal Data (Part 3)
29
CSIS7101 - Advanced Database Technologies
MV3R-Trees (Deletion and Underflow)




It is handled in a way similar to R*-trees if a
deletion does not incur structural changes
An entry is physically deleted, only if its tstart is equal
to the current time (multiple updates may happen at
the same timestamp)
Intermediate and leaf nodes deletions are handled
in the similar way as of insertion.
Intermediate node deletion


Suppose an underflow occurs at the current timestamp t,
the live entries of tend to be set to t. Then these entries are
re-inserted into the most recent logical R-tree after setting
tstart = t.
Apply the R*-tree algorithms
Spatio-Temporal Data (Part 3)
30
CSIS7101 - Advanced Database Technologies
MV3R-Trees (Deletion and Underflow)

Leaf node deletion


To avoid redundancy caused by reinsertion, a live entry
from a sibling node will be borrowed
The borrowed (moved) entry should have the properties:

It must be alive and its lifespan must be covered by the
original sibling node, say node A
 After the removal of the entry, the version condition of the
borrowed sibling node, say node B, must still be satisfied
 Inserting this entry to the original sibling node, node A, will
not cause its MBR to increase above a threshold
Spatio-Temporal Data (Part 3)
31
CSIS7101 - Advanced Database Technologies
MV3R-Trees (Deletion and Underflow)

Example:
Timestamp 1
S (root)
. .
<S1, 1,
<S2, 1,
. .
A
.
*, A>
*, B>
.
Deletion of object A1
at timestamp 2 which
cause underflow of
node A
B
<A1, 1, *>
<A2, 1, *>
<A3, 1, *>
<B1, 1, *>
<B2, 1, *>
<B3, 1, *>
Objects B2 and B3 cannot be
moved because their deletion
from B will cause weak version
underflow for timestamp 1
Timestamp 2
A
<A1, 1, 2>
<A2, 1, *>
<A3, 1, *>
Insertion of
objects B4,
B5 and B6 at
timestamp 2
Spatio-Temporal Data (Part 3)
A
B
<B1,
<B2,
<B3,
<B4,
<B5,
<B6,
1,
1,
1,
2,
2,
2,
2>
*>
*>
*>
*>
*>
B
<A1,
<A2,
<A3,
<A4,
1,
1,
1,
2,
2>
*>
*>
*>
<B1,
<B2,
<B3,
<B4,
<B5,
<B6,
1,
1,
1,
2,
2,
2,
2>
*>
*>
*>
*>
*>
Object B4 has
been borrowed
by A
32
CSIS7101 - Advanced Database Technologies
MV3R-Trees (Auxiliary 3D R-tree)





Built on the leaves of the MVR-tree in order to
process interval queries
For a given moderate node capacity, the number of
leaf nodes in an MVR-tree is much lower than the
actual number of objects.
Smaller in size as compare to a complete 3D R-tree
Adding auxiliary 3D R-tree not only improves
interval query performance, but may also provides
flexibility in other scenarios
Construction of auxiliary 3D R-tree

Whenever a leaf node of the MVR-tree is updated, the
change is propagated to its entry in the 3D R-tree
Spatio-Temporal Data (Part 3)
33
CSIS7101 - Advanced Database Technologies
Query Processing with MV3R-trees

MVR-tree


Timestamp query involves retrieval of the root whose jurisdiction
interval covers the queried timestamp, and then search is
performed similarly to R-trees
3D R-tree

For interval queries, multiple trees may need to be searched

Should avoid duplicate visits to the same node via different parents,
otherwise, result in severe IO cost
 Duplicate pointers to a node are created in version splits or entry
reinsertions
 In both cases, the two entries pointing to the same node have
disjoint lifespans


For short interval, it will be used whenever the temporal query
length exceeds a certain threshold
Its performance deteriorates gradually as the tree grows
Spatio-Temporal Data (Part 3)
34
CSIS7101 - Advanced Database Technologies
Query Processing with MV3R-trees

Example
A
Cube of B1
B
. . .
. . .
<A1, 1, 10, C>
<B1, 10, 20, C>
. . .
. . .
C <C1,
<C2,
1,
time
8, D>
1, 12, E>
Cube of C4
<C3, 10, 20, F>
<C4, 10, 20, G>
Cube of C1
Query
Cube
1. A1 and B1 are temporally adjacent
Cube of C3
2. A1 spatially covers C1 and C2
Cube of C2
3. B1 spatially covers C2, C3 and C4
4. C2 and C3 intersect the query box so their
subtrees (node E and F) should be search
5. Since node C may be reached twice (by
following A1 and B1), we may attemp
redundant visits to E and F
Spatio-Temporal Data (Part 3)
x
Cube of A1
y
35
CSIS7101 - Advanced Database Technologies
Conclusion of MV3R-trees



A structure that combines the concepts of MVBtrees and 3D R-trees
MV3R-trees can handle timestamp and interval
queries efficiently with relatively small space
requirements
MV3R-trees could be further improved by:


Analytical cost models for determining the optimal tree to
answer short interval queries
Overflow and underflow handling heuristics that are more
efficient in terms of update cost, and can avert more
version splits
Spatio-Temporal Data (Part 3)
36
CSIS7101 - Advanced Database Technologies
References

Tao, Y., Papadias, D. The MV3R-Tree: A
Spatio-Temporal Access Method for
Timestamp and Interval Queries, 2000
Spatio-Temporal Data (Part 3)
37