Temporal Databases
Download
Report
Transcript Temporal Databases
Temporal Databases
S. Srinivasa Rao
April 12, 2007
[Part 1 based on Ch23 of C.J. Date (slides by Prof. Ghafoor, EE 562)]
[Part 2 based on slides by Prof. Arge, I/O-algorithms]
Outline
• Part 1: Introduction to temporal databases
• Part 2: Temporal index: Persistent B-tree and its applications
2
Introduction
• Temporal database: a database that contains historical data as well
as current data.
– Note: ‘historical’ is a misleading term – temporal databases may contain
data regarding the future as well as the past.
• Extreme case: data is only inserted, never deleted from a temporal
database (eg. vehicle position data in the ‘project’).
• So far, we have studied the other extreme - i.e. ‘snapshot’ databases.
• Distinguishing feature: the element of time.
3
Introduction
• Temporal data: encoded representation of timestamped facts.
– Each tuple must include at least one timestamp.
– Problem:What about queries that produce results that are not
temporal? i.e. result of query is outside the domain of (temporal)
database.
– eg. Get names of all people who have supplied something in the
past.
• Redefine temporal database: database that includes, but is not
limited to, temporal data.
4
Motivation
• Queries on time-varying data are difficult to express in SQL.
• Temporal databases provide build-in support for recording and
querying such information.
• It is possible to use SQL to evaluate these queries, but performance
is poor.
5
Motivation
• Most applications manage temporal data.
• If a temporal database is used for such data:
– Schemas, including integrity constraints are simpler.
– Queries are simpler
• Application code is less complex
– easier to understand
– easier to produce
– easier to maintain
6
Applications
Most applications of database technology are temporal in nature:
• Financial apps.: portfolio management, accounting & banking, stock
market analysis, audit analysis
• Record-keeping apps.: personnel, medical records, inventory management,
legal records (commercial laws change frequently)
• Data Warehousing: historical trends for analysis
• Scheduling apps.: airline, car, hotel reservations and project management
• Scientific apps.: weather monitoring, chemical process monitoring
7
Intervals
• An interval [s,e] is a set of times from time s to time e.
– Does interval [s,e] represent an infinite set?
– Assumption: Timeline is a finite sequence of discrete, indivisible
time quanta.
• Time Quanta: smallest unit of time system can represent.
• Timepoints/point: time unit considered indivisible for our purpose.
• An interval is treated as a single type, not as pair of separate values.
• Interval can be open/closed w.r.t. start point/end point.
– eg. [d04,d10],[d04,d11),(d03,d10],(d03,d11)
all represent the sequence of days from day4 to day10 inclusive.
8
Operators on Intervals
• Temporal predicate operators:
i1 = [s1,e1]; i2 = [s2,e2]
i1
– i1 BEFORE i2
(e1<s2)
– i1 MEETS i2
i1
(s2 = e1)
– i1 EQUALS i2
(s1 = s2 AND e1 = e2)
– i1 OVERLAPS i2
(s2 < s1 < e2 OR s1 < s2 < e1)
i2
i2
i1
i2
i1
i2
9
Operators on Intervals
i1
– i1 DURING i2
(s2 < s1 AND e2 > e1 )
– i1 STARTS i2
(s1 = s2 AND e1 < e2)
– i1 FINISHES i2
(e1 = e2 AND s1 > s2)
i2
i1
i2
i1
i2
• Additional operators:
– i1 MERGES i2:
– i1 CONTAINS i2:
(i1 MEETS i2 OR i1 OVERLAPS i2)
(i2 DURING i1)
10
Scalar and Relational Operators
• DURATION(i) - returns the number of time points in i
– eg. DURATION ([d03,d07]) returns 5
• i1 UNION i2
– returns [MIN(s1,s2),MAX(e1,e2) ]
if (i1 MERGES i2)
otherwise undefined
• i1 INTERSECT i2
– returns [MAX(s1,s2),MIN(e1,e2)]
if (i1 OVERLAPS i2)
otherwise undefined
11
Aggregate Operators
• EXPAND(X):
Where X is a set. The output is also a set.
Used to generate time quantum intervals.
– The expanded form of X is the set of all intervals of the form [p,p]
where p is a time point in some interval in X.
• e.g.:
– X1 = { [d01,d01],[d03,d05],[d04,d06] }
– X2 = { [d01,dp1],[d03,d04],[d05,d05],[d05,d06] }
– X3 = { [d01,d01],[d03,d03],[d04,d04],[d05,d05],[d06,d06] }
– Then
EXPAND(X1) = EXPAND(X2) = X3
12
Aggregate Operators
• COLLAPSE(X):
The collapsed form of X is the set Y of intervals of the same type
such that
– (a) X & Y have the same unfolded form.
– (b) no two distinct members i1 and i2 of Y are such that
(i1
MERGES i2) is true.
• e.g.:
– X1 = { [d01,d01],[d03,d05],[d04,d06] }
– X2 = { [d01,d01],[d03,d04],[d05,d05],[d05,d06] }
– X3 = { [d01,d01],[d03,d06] }
– Then
COLLAPSE (X1) = COLLAPSE (X2) = X3
13
Relation Operators Involving
Intervals
• PACK r on A: groups the relation r by all its attributes apart from A
This is equivalent to
WITH ( r GROUP {A} AS X ) AS R1
( EXTEND R1 ADD COLLAPSE (X) AS Y )
{ALL BUT X } AS R2 :
R2 UNGROUP Y
• UNPACK r on A:
Replace COLLAPSE with EXPAND in PACK.
14
Example
Given two temporal relations:
S: Supplier S# was under contract
during the interval During
SP: Supplier S# was able to supply
part P# during the interval During
SP S# P# During
S1 P1 [d04,d10]
S1 P7 [d05,d10]
S1 P3 [d09,d10]
S1 P5 [d06,d10]
S S# During
S1
[d04,d10]
S2 P1 [d02,d04]
S2 P9 [d03,d03]
S2 P1 [d08,d10]
S2
[d02,d04]
S2
[d07,d10]
S3
[d03,d10]
S4 P2 [d06,d09]
S4
[d04,d10]
S4 P5 [d04,d08]
S5
[d02,d10]
S4 P7 [d05,d10]
S2 P5 [d09,d10]
S3 P1 [d08,d10]
15
Example 1
• Active supplier intervals: Get S#-DURING pairs for
suppliers who have been able to supply at least one
part during at least one interval of time, where
DURING designates such an interval.
• PACK SP {S#,DURING} ON DURING
RESULT S# During
SP S# P# During
S1 P1 [d04,d10]
S1 P7 [d05,d10]
S1 P3 [d09,d10]
S1 P5 [d06,d10]
S2 P1 [d02,d04]
S2 P9 [d03,d03]
S2 P1 [d08,d10]
S1
[d04,d10]
S2
[d02,d04]
S3 P1 [d08,d10]
S2
[d08,d10]
S4 P2 [d06,d09]
S3
[d08,d10]
S4 P5 [d04,d08]
S4
[d04,d10]
S4 P7 [d05,d10]
S2 P5 [d09,d10]
16
Example 2
• Inactive (passive) supplier intervals: Get S#-DURING pairs for
suppliers who have been unable to supply any parts at all during at
least one interval of time, where DURING designates such an
interval.
• PACK
( ( UNPACK S {S#,DURING} ON DURING )
MINUS
( UNPACK SP {S#,DURING} ON DURING ) )
ON DURING
RESULT S# During
S2
[d07,d07]
S3
[d03,d07]
S5
[d02,d10]
• Shorthand: U_MINUS
17
More Relational Operators
• USING ( AList ) ◄ r1 op r2 ► is a shorthand for:
PACK
( ( UNPACK r1 on (AList) ) op ( UNPACK r1 on (AList) ) )
ON (AList)
Where op is either UNION, INTERSECT, MINUS or JOIN
• Various comparison operators on relations are defined similarly.
USING ( AList ) ◄ r1 rel-op r2 ► is equivalent to
( ( UNPACK r1 on (AList) ) rel-op ( UNPACK r1 on (AList) ) )
18
Part 2
Persistent B-trees
and applications
19
Persistent B-tree
• In some applications we are interested in being able to access
previous versions of data structure
– Databases
– Geometric data structures
• Partial persistence:
– Update the current version (getting a new version)
– Query all versions
• We would like to have partial persistent B-tree with
– O(N/B) space – N is number of updates performed
– O(log B N ) update
– O(log B N T B) query in any version
20
Persistent B-tree
• East way to make B-tree partial persistent
– Copy structure at each operation
– Maintain “version-access” structure (B-tree)
update
i
i+1
i+2
i
i+1
i+2
i+3
• Good O(log B N T B) query in any version, but
– O(N/B) I/O update
– O(N2/B) space
21
Persistent B-tree
• Idea: Elements augmented with “existence interval” and stored in
one structure
• Persistent B-tree with parameter b:
– Directed graph
* Nodes contain elements augmented with existence interval
* At any time t, nodes with elements alive at time t form B-tree
with leaf and branching parameter b (i.e., each node/leaf has
at least b/4 and at most b children/keys in them)
– B-tree with leaf and branching parameter b on indegree 0 nodes
If b=B: Query at any time t in O(log B N T B) I/Os
22
Persistent B-tree: Updates
• Updates performed as in B-tree
• To obtain linear space we maintain new-node invariant:
– New node contains between 3 8 B and 7 8 B alive elements and no
dead elements
1
4
1
8
B
B
3
8
1
2
B
1
8
B
7
8
B
B
B
23
Persistent B-tree Insert
• Search for relevant leaf u and insert new element
• If u contains B+1 elements: Block overflow
– Version split:
Mark u dead and create new node u’ with x alive element
– If x 7 8 B: Strong overflow
– If x 3 8 B: Strong underflow
– If 3 8 B x 7 8 B then recursively update parent(u):
Delete (persistently) reference to u and insert reference to u’
1
4
B
3
8
B
7
8
B
B
1
4
B
3
8
B
7
8
B
B
24
Persistent B-tree Insert
• Strong overflow ( x 7 8 B)
– Split u into u’ and u’’ with x 2 elements each ( 3 8 B x 2
– Recursively update parent(u):
Delete reference to u and insert reference to v’ and v’’
1
4
B
3
8
B
171 B 33B
B BB
484 B 88
771
BB
884 B
B83BB
1
2 B)
7
8
B
B
• Strong underflow ( x 3 8 B)
– Merge x elements with y live elements obtained by version split on
sibling ( 1 2 B x y 118 B)
– If x y 7 8 B then (strong overflow) perform split into nodes
with (x+y)/2 elements each ( 7 16 B ( x y) / 2 1116 B )
– Recursively update parent(u): Delete two insert one/two references
25
Persistent B-tree Delete
• Search for relevant leaf u and mark element dead
• If u contains x 1 4 B alive elements: Block underflow
– Version split:
Mark u dead and create new node u’ with x alive element
– Strong underflow ( x 3 8 B ):
Merge (version split) and possibly split (strong overflow)
– Recursively update parent(u):
Delete two references insert one or two references
1
4
1
8
B
B
3
8
1
2
B
1
8
B
7
8
B
B
B
26
Persistent B-tree
Insert
Delete
done
Block underflow
0,0
Block overflow
Version split
done
Version split
Strong overflow
Strong underflow
Split
Merge
-1,+1
Strong overflow
done -1,+2
1
4
1
8
B
B
3
8
1
2
B
1
8
B
7
8
B
B
B
done
-2,+1
Split
done
-2,+2
27
Persistent B-tree Analysis
• Update: O(log B N )
– Search and “rebalance” on one root-leaf path
• Space: O(N/B)
– At least 18 B updates in leaf in existence interval
– When leaf u dies
* At most two other nodes are created
* At most one block over/underflow one level up (in parent(u))
1
1
1
B
B
B
– During N updates we create:
8
8
2
* O( N B) leaves
7
* O( N Bi ) nodes i levels up
1
B B
B 83 B
8
4
O( N B i ) O( N B ) blocks
i
28
Summary/Conclusion: Persistent B-tree
• Persistent B-tree
– Update current version
– Query all versions
• Efficient implementation obtained using existence intervals
– Standard technique
• During N operations
– O(N/B) space
– O(log B N ) update
– O(log B N T B) query
29
Interval Management
• Problem:
– Maintain N intervals with unique endpoints dynamically such
that stabbing query with point x can be answered efficiently
x
• As in (one-dimensional) B-tree case we are interested in
– O( N B) space
– O(log B N ) update
– O(log B N T B) query
30
Interval Management: Static Solution
• Sweep from left to right maintaining persistent B-tree
– Insert interval when left endpoint is reached
– Delete interval when right endpoint is reached
x
• Query x answered by reporting all intervals in B-tree at “time” x
– O( N B) space
– O(log B N T B) query
– O( NB log B N ) construction using buffer technique
• Dynamic with O(log 2B N ) insert bound using logarithmic method
31
Internal Memory Logarithmic Method Idea
• Given (semi-dynamic) structure D on set V
– O(log N) query, O(log N) delete, O(N log N) construction
• Logarithmic method:
– Partition V into subsets V0, V1, … Vlog N, |Vi| = 2i or |Vi| = 0
– Build Di on Vi
..................................
* Delete: O(log N)
2
2
2
* Query: Query each Di O(log2 N)
* Insert: Find first empty Di and construct Di out of
0
1
2
2
log N
1 ij10 2 j 2i elements in V0,V1, … Vi-1
– O(2i log 2i) construction O(log N) per moved element
– Element moved O(log N) times O(log2 N ) amortized
32
External Logarithmic Method Idea
• Decrease number of subsets Vi
to logB N to get O(log 2B N ) query
..................................
B
0
B
1
B
2
B
log B N
• Problem: Since 1 ij10 B j Bi there are not enough elements in
V0,V1, … Vi-1 to build Vi
• Solution: We allow Vi to contain any number of elements Bi
i
i
– Insert: Find first Di such that j 0 V j B and construct new
Di from elements in V0,V1, … Vi
* We move
i 1
j 0 V j B i 1elements
* If Di constructed in O((|Vi|/B)logB |Vi|) = O(Bi-1logB N) I/Os
every moved element charged O(logB N) I/Os
* Element moved O(logB N) times O(log 2B N ) amortized
33
External Logarithmic Method Idea
• Given (semi-dynamic) linear space external data structure with
– O(log B N T B) I/O query
– O( NB log B N ) I/O construction
(– O(log B N ) I/O delete)
• Linear space dynamic data structure with
– O(log2B N T B) I/O query
– O(log 2B N ) I/O insert amortized
(– O(log B N ) I/O delete)
• Dynamic interval management
– O(log2B N T B) I/O query
– O(log 2B N ) I/O insert amortized
x
34
Planar Point Location
• Static problem:
– Store planar subdivision with N segments on disk such that
region containing query point q can be found I/O-efficiently
• We concentrate on vertical ray shooting query
– Segments can store regions it bounds
– Segments do not have to form subdivision
q
• Dynamic problem:
– Insert/delete segments
(we will not discuss this)
35
Static Solution
• Vertical line imposes above-below order on intersected segments
• Sweep from left to right maintaining
persistent B-tree on above-below order
– Left endpoint: Insert segment
– Right endpoint: Delete segment
q
• Query q answered by successor query on B-tree at time qx
– O( N B) space
– O(log B N T B) query
36
Static Solution
• Note: Not all segments comparable!
– Have to be careful about what we compare
q
• Problem: Routing elements in internal nodes of leaf oriented B-trees
– Luckily we can modify persistent B-tree to use regular (live)
elements as routing elements
• However, buffer technique construction cannot be used
• Only O( N log B N ) I/O construction algorithm
• Cannot be made dynamic using logarithmic method
37
References
• External Memory Geometric Data Structures
Lecture notes by Lars Arge.
– Section 1-4
• I/O-efficient Point Location using Persistent B-trees
– Lars Arge, Andrew Danner and Sha-Mayn Teh
38