Time in Databases

Download Report

Transcript Time in Databases

Time in Databases
CSCI 6442
With thanks to Richard Snodgrass, 1985 ACM 0-89791-160-1/85/005/0236
Agenda
• Taxonomy of time in databases
• Point-in-time databases
2
The Problem
• Suppose you are away from your company for 6 months
assigned to work at an outpost in Australia, in the opal mines
in Coober Pedy, underground, out of communication.
• You are eligible for a raise in the midst of this. When you
return they notice this.
• Now it is approved and entered into the database within a few
days, but you get the extra money from 3 months ago
• Which times get recorded in the database?
3
Types of Time
• Transaction time—time when the information was stored in
the database
• Valid time—the time that the stored information models
reality
• User-defined time—a time that is important to and is defined
by the application
4
Static Databases
• A conventional database models the current state of a
dynamically changing world
• When an update is made, the past values are forgotten
completely
• This sort of database is called a static database
• It can’t tell you what Jones’s salary was two years ago or how
many raises Jones has had in the past four years
5
Static Rollback Databases
• We can provide the capability to roll back a database to a
previous state, such as by using a log
• The effect of recent transactions is removed, until the desired
state is reached
• Then the intended query can be posed
• Past values cannot be changed; only operations on current
values are permitted, because this preserves the series of past
operations
• Transaction time must be recorded for every change, to
support rollback
• Current operations must be suspended while rollback
operations are under way
6
Historical Databases
• Historical databases record the history of the values of data as
it is best known
• As errors are discovered, they are corrected by changing the
database
• Previous states are not retained; at any time, the database
reflects the best-known history of the data
• Time represented in historical database is valid time, the time
that the stored information models
7
Temporal Databases
• A temporal database allows one to view tuples valid at one
moment as seen from another moment
• Such a database completely captures the history of current
and retroactive changes
8
Snodgrass
• Wrote “A Taxonomy of Time in Databases” (1985)
• Regarded as the authoritative work
• Defines three types of time
9
Types of Time
• There are three times:
• Transaction time—when the change is made to the database
• Valid time—when the change was approved
• User-defined time—when the raise is effective (applicationspecific)
10
Static Database
• Database that reflects changing values with a snapshot at any
particular time
• Does not necessarily reflect the state of the real world
• We discard past states of the database
• Can’t answer questions about what past values were
• Can’t show trends in change over time
11
Static Rollback Database
• Adds some time sequence
• Ability to back out transactions is provided
• Can back out one at a time to previous states of the database
• Such a database shows the history of transactions rather than
the history of the actual data
12
Snodgrass’s Taxonomy
• Is a useful way to talk about time in databases
• His levels of databases are theoretical, a basis for research
• What if we want to model current and past states of a single
table in a SQL database system?
13
Point in Time Databases
• We can model current and former values of data in a single
relational table
• We need to add time to the data model so that changes are
captured
• It is possible to use a single table to capture current and
former values of data in a single relation
• As performance limitations of databases disappear because of
increasing machine power, this becomes practical
• In fact, this approach is used in some advanced applications
• However, it is not widely known
14
Point In Time Databases
• A single database models the present and all past states of the
data as it matters to the organization
• Enough information about past states is stored so that the
past state can be obtained using a SQL query
• Instead of deleting rows they are marked as no longer valid
• When values are updated, the old values are kept as well
• This was impractical in 1970 because of cost and efficiency
issues
• Is it practical today?
15
What We Store
• With each change to a row, we insert a new row, instead of
changing the previous row
• Consider EMP(EMPNO, ENAME, SAL)
• We add the attributes START and END
• EMP(EMPNO, ENAME, SAL, START, END)
• New attributes are start and end times for validity
• When we change a row, that time is the END time for validity of the
previous row, START time of validity for the new row
• These times are usually what Snodgrass calls “user-defined” time
because that’s the time that matters to the organization
• His “valid time” is just a date of transaction approval
• Transaction time is when entered into the database
• Let’s call this time “effective time” because it’s the time when
the row’s values are effective
16
Point In Time Operations
• Insert:
• New row is added
• Start time is set
• Stop time is left null
• Delete:
• Stop time is set
• Update:
• New row added with new values, start time
• Old row is kept and stop time is set
• Read:
• Current values: STOP ISNULL
• As of values: ASOF >= START AND (ASOF < STOP OR STOP ISNULL)
17
Example: Retroactive Pay
Increase
•
•
•
•
Chen is given a salary increase
The salary increase is approved at date t1
The increase is effective at date t2.
However, the increase does not get entered into the database
until some other time, t3>t2 (called a retroactive salary
increase).
• In this case, t1 is an attribute of the approval process, t2 is the
start of effective time and t3 is the transaction time.
• Most important to the organization is the time when Chen is
getting the higher pay
18
An Example: EMP
Schema Definition:
EMP(EMPNO,ENAME,JOB,SAL,DEPTNO)
Now we add START and STOP for validity dates:
EMP(EMPNO,ENAME,JOB,SAL,DEPTNO,START,STOP)
• START gives the first time of validity for the values in the row
• STOP gives the last time of validity for the values in the row
• A row with no value for STOP is the current value
19
Example
To find the status of EMP on 1/1/2008
SELECT *
FROM EMP
WHERE START <= to_date( ‘01-JAN2008','dd-mmm-yyyy') “1/1/2008” AND (STOP
>= to_date( '01-JAN-2008','dd-mmm-yyyy') OR
STOP ISNULL);
DELETE statements will now be UPDATEs instead
20
Complexities
• Not every column value needs to be tracked
• Can simplify by not tracking some column values
• This is a semantic issue
• Change in primary key
• If primary key changes, hard to relate changed row to original row
• Potential case for use of generated key
• Cascading changes
• Other tables may change because of one change
• Use trigger instead of automatic foreign key update
• Cascading deletes
• Other tables may have rows deleted because of one change
• Use trigger instead of automatic foreign key delete
21
Capabilities
• These techniques allow you to implement a database that can
show its state as of any time in the past
• This can be done while the database is actively being used and
even updated by multiple processes
• One process can be looking at the database as of two years
ago while another is doing current processing
• You can record retroactive changes
• You can implement applications to process retroactive
changes
22
Bottom Line
• This is a practical approach that potentially simplifies data
models and processing
• One table replaces duplication of data and completely
different functions
• One somewhat more complex function replaces two (or more)
separate functions
• Use can be presented a unifying interface that simplifies use
of the application
• It’s a tradeoff—sometimes use a separate history table
23
Homework
•
•
•
•
You have the opportunity to explore this
You get to build a point-in-time EMP table
Then you use it to give a retroactive raise
And then compute how much we owe to the person with the
retroactive raise
24