Access methods for time
Download
Report
Transcript Access methods for time
Presenters : Virag Kothari ,Vandana Ayyalasomayajula
Date: 04/21/2010
Outline
Introduction to temporal databases
Goal of the paper
Access method costs
Queries
Index pagination & Data Clustering
Efficient method design for transaction data
References
Introduction
Based on time dimension
Transaction time database
Valid time database
Bitemporal database
Goal
Attempt to identify the implications for access
method design from support of each time
dimension
In this presentation, transaction time
databases are considered.
Access Method Costs
Performance of an access method depends on
storage space to physically store the data
records and the structures of the access method
update processing time (the time to update the
method’s data structures as a result of a change)
the query time for each of the basic queries
( discussed in the next slide !)
Queries
Given a contiguous interval T, find all objects
alive during this interval.
Given a key range and a contiguous time
interval T, find the objects with keys in the
given range that are alive during interval T.
Given a key range, find the history of the
objects in this range.
Queries
- Special cases !
“transaction pure-timeslice”
A special case of class (I) occurs when interval T is
reduced to a single transaction time instant t.
“transaction range-timeslice”
representative case of class (II) where the time interval
is reduced to a single transaction time instant.
“transaction pure-key query”
representative case of class (III), key range is reduced
to a single key
Cost parameters
In the case of transaction , Bitemporal
databases,
n - > summation of insertions, deletions, and
modification updates.
For Valid time databases,
L - > the number of interval objects currently stored in the
method, i.e., the size of the collection
a -> to denote the answer size of a query in
general.
Index pagination & Data
Clustering
Cost depends on IO cost !
Performance of an index depends on how
well it is ‘Paginated’
Example: B+ trees.
Data Clustering improves performance by
storing logically near data , physically close
on the disk.
pure-timeslice query takes O(logBn + a/B )
page accesses. This method is more I/O
efficient than another method that solves the
same query in O(logBn + a) page accesses.
Efficient Method Design for
Transaction
Transaction Pure-Timeslice Query
‘copy’ approach
○ Stores a copy of the transaction database state s(t)
(timeslice) for each transaction time that at least one
change occurred
○ Copies are indexed by time t.
‘log’ approach
○ Stores only the changes that occur in the database
timestamped by the time instant on which they
occurred.
○ Copies indexed by time t.
Comparison
Space
Copy approach
Log approach
O( n^2/ B)
O(n/B)
– Transaction pure timeslice
Update
O(n/B)
O(1)
Query
O(logB n) +
O(a/B)
O(n/B)
Transaction Pure Key
“copy” and “log” solutions could be used for the
pure-key query. But they are both very inefficient
!!.
A better solution is to store the history of each
key separately, i.e., cluster data by key only.
Access to a key’s (transaction time) history can
be implemented by a hashing function or B tree.
The list of versions of each key can be further
organized in a separate array indexed by
transaction time to answer a pure-key query with
time predicate .
Costs
– Transaction pure key
Cost to index into hash table or B tree +
cost of searching in the array.
Array length can be n/B , so cost would
O( log B n) .
Transaction Range-Timeslice
To answer a range query efficiently, it is best to
cluster by transaction time and key within
pages.
Very similar to spatial indexing concept.
Two dimensions , time & key need to be
considered.
Data bounding technique
Another possibility data mapping, maps a
record to three (or more) coordinates –
transaction start_time, end-_time, and key(s)—
and then uses a multiattribute point index.
References
Betty Salzberg, Vassilis J. Tsotras:
Comparison of Access Methods for
Time-Evolving Data. ACM Comput. Surv.
(CSUR) 31(2):158-221 (1999)
Thank you!!