Time in Databases

Download Report

Transcript Time in Databases

Time in Databases
CSCI 6442
With thanks to Richard Snodgrass, 1985 ACM 0-89791-160-1/85/005/0236
Agenda
• Taxonomy of time in databases
• Point-in-time databases
2
The Problem
• Suppose you are away from your company for 6 months
assigned to work at an outpost in Australia, in the opal mines
in Coober Pedy, underground, out of communication.
• You are eligible for a raise in the midst of this. When you
return they notice this.
• Now it is approved and entered into the database within a few
days, but you get the extra money from 3 months ago
• Which times get recorded in the database?
3
Types of Time
• Transaction time—time when the information was stored in
the database
• Valid time—the time that the stored information models
reality
• User-defined time—a time that is important to and is defined
by the application
4
Static Databases
• A conventional database models the current state of a
dynamically changing world
• When an update is made, the past values are forgotten
completely
• This sort of database is called a static database
• It can’t tell you what Jones’s salary was two years ago or how
many raises Jones has had in the past four years
5
Static Rollback Databases
• We can provide the capability to roll back a database to a
previous state, such as by using a log
• The effect of recent transactions is removed, until the desired
state is reached
• Then the intended query can be posed
• Past values cannot be changed; only operations on current
values are permitted, because this preserves the series of past
operations
• Transaction time must be recorded for every change, to
support rollback
• Current operations must be suspended while rollback
operations are under way
6
Historical Databases
• Historical databases record the history of the values of data as
it is best known
• As errors are discovered, they are corrected by changing the
database
• Previous states are not retained; at any time, the database
reflects the best-known history of the data
• Time represented in historical database is valid time, the time
that the stored information models
7
Temporal Databases
• A temporal database allows one to view tuples valid at one
moment as seen from another moment
• Such a database completely captures the history of current
and retroactive changes
8
Snodgrass
• Wrote “A Taxonomy of Time in Databases” (1985)
• Regarded as the authoritative work
• Defines three types of time
9
Types of Time
• There are three times:
• Transaction time—when the change is made to the database
• Valid time—when the change was approved
• User-defined time—when the raise is effective (applicationspecific)
10
Static Database
• Database that reflects changing values with a snapshot at any
particular time
• Does not necessarily reflect the state of the real world
• We discard past states of the database
• Can’t answer questions about what past values were
• Can’t show trends in change over time
11
Static Rollback Database
•
•
•
•
Adds some time sequence
Ability to back out transactions is provided
Can back out one at a time to previous states of the database
Such a database shows the history of transactions rather than
the history of the actual data
12
Snodgrass’s Taxonomy
• Is a useful way to talk about time in databases
• His levels of databases are theoretical, a basis for research
• What if we want to model current and past states of a single
table in a SQL database system?
13
Point in Time Databases
• We can model current and former values of data in a single
relational table
• We need to add time to the data model so that changes are
captured
• It is possible to use a single table to capture current and
former values of data in a single relation
• As performance limitations of databases disappear because of
increasing machine power, this becomes practical
• In fact, this approach is used in some advanced applications
• However, it is not widely known
14
Point In Time Databases
• A single database models the present and all past states of the
data as it matters to the organization
• Enough information about past states is stored so that the
past state can be obtained using a SQL query
• Instead of deleting rows they are marked as no longer valid
• When values are updated, the old values are kept as well
• A point in time table stores present values as well as all of the
past values—ever
• Is that practical? In 1970 it mostly wasn’t.
• Is it practical today?
15
What We Store
• With each change to a row, we insert a new row, instead of
changing the previous row
• Consider EMP(EMPNO, ENAME, SAL)
• We add the attributes START and END
• EMP(EMPNO, ENAME, SAL, START, END)
• New attributes are start and end times for validity
• When we change a row, that time is the END time for validity of the
previous row, START time of validity for the new row
• These times are usually what Snodgrass calls “user-defined” time
because that’s the time that matters to the organization
• His “valid time” is just a date of transaction approval
• Transaction time is when entered into the database
• Let’s call this time “effective time” because it’s the time when
the row’s values are effective
16
Point In Time Operations
• Insert:
• New row is added
• Start time is set
• Stop time is left null
• Delete:
• Stop time is set
• Update:
• New row added with new values, start time
• Old row is kept and stop time is set
• Read:
• Current values: STOP ISNULL
• As of values: ASOF >= START AND (ASOF < STOP OR STOP ISNULL)
17
Example: Retroactive Pay
Increase
•
•
•
•
Chen is given a salary increase
The salary increase is approved at date t1
The increase is effective at date t2.
However, the increase does not get entered into the database
until some other time, t3>t2 (called a retroactive salary
increase).
• In this case, t1 is an attribute of the approval process, t2 is the
start of effective time and t3 is the transaction time.
• Most important to the organization is the time when Chen is
getting the higher pay
18
An Example: EMP
Schema Definition:
EMP(EMPNO,ENAME,JOB,SAL,DEPTNO)
Now we add START and STOP for validity dates:
EMP(EMPNO,ENAME,JOB,SAL,DEPTNO,START,STOP)
• START gives the first time of validity for the values in the row
• STOP gives the last time of validity for the values in the row
• A row with no value for STOP is the current value
EMP
ENAME JOB
SAL
DEPT#
START
30
LIU
PRES
200
1
1/1/2016
40
CHEN
VP
175
1
2/1/2016
STOP
19
Now Let’s Give the Pres a Raise
EMP
ENAME
JOB
SAL
DEPT#
START
STOP
30
LIU
PRES
200
1
1/1/2016
today
40
CHEN
VP
175
1
2/1/2016
30
LIU
PRES
250
1
today
An update now requires an update of the old row—to mark it obsolete—
and insertion of a new row with the new value.
How do we do a delete from a point in time database?
20
Example
To find the status of EMP on 3/15/2016
SELECT * FROM EMP
WHERE
START <= to_date(‘15-MAR-2016','dd-mm-yyyy')
AND
(STOP >= to_date(‘15-MAR-2016','dd-mmm-yyyy') OR STOP ISNULL);
21
Complexities
• Not every column value needs to be tracked
• Can simplify by not tracking some column values
• This is a semantic issue
• Change in primary key
• If primary key changes, hard to relate changed row to original row
• Potential case for use of generated key
• Cascading changes
• Other tables may change because of one change
• Use trigger instead of automatic foreign key update
• Cascading deletes
• Other tables may have rows deleted because of one change
• Use trigger instead of automatic foreign key delete
22
Capabilities
• These techniques allow you to implement a database that can
show its state as of any time in the past
• This can be done while the database is actively being used and
even updated by multiple processes
• One process can be looking at the database as of two years
ago while another is doing current processing
• You can record retroactive changes
• You can implement applications to process retroactive
changes
23
Bottom Line
• This is a practical approach that potentially simplifies data
models and processing
• One table replaces duplication of data and completely
different functions
• One somewhat more complex function replaces two (or more)
separate functions
• Use can be presented a unifying interface that simplifies use
of the application
• It’s a tradeoff—sometimes use a separate history table
24
Homework
• You have the opportunity to explore this
• You get to build a point-in-time EMP table
• Then you make a few changes to it
25