Bitemporal Student Information System

Download Report

Transcript Bitemporal Student Information System

Bitemporal Student
Information System
Chris Hance
Advisor: Dr. Farimah Fleschute
About Chris
• Sr. Programmer for Polk County School
Board, since April 2006
• Microsoft Certified Database Administrator
• Former USF IT Program Tutor
• Absolutely no blog, MySpace, or
Facebook
What is this?
• The School Board maintains a Student
Information System.
• Need student data for multiple school
years, and records of who made changes
for auditing.
• Dr. Richard Snodgrass’ Developing TimeOriented Database Applications in SQL
appeared to be a solution.
What is this?
• Project: Implement a Student Information
System database, using Snodgrass’
Bitemporal model.
• Data elements intentionally reduced.
Entities need to exist, but can be sparse.
• Notes and code available at:
http://incautio.us/seniorproject/
Why might I care?
• “Increasingly, businesses need real-time
access to historical data.”
• “For those IT managers who think that
modeling and database development are
commodity skills, the management of
temporal data is the strongest disproof
possible.”
• -Tom Johnston, Randall Weis, DM Review
Magazine
Topics
• Bitemporal Definition
• Domain Background (Student Data)
• Bitemporal Implementation
– Primary Key Constraints
– Foreign Key Constraints
• Conclusions
Bitemporal Definition
• Most databases are “Snapshot”
• Values right now
ID
First
100 John
101 Jane
Last
DOB
Smith 1/9/2003
Doe 7/1/1990
102 Mary Sue
103 Arthur Dent
School
Floral Ave
Bartow
High
6/5/1995 Westwood
11/1/1994 Kathleen
Grade
KG
12
7
8
“Actual” Time
• When a change occurred
• A student’s grade and/or school may change
every year:
ID
ActualBegin ActualEnd School
100 8/1/2004
6/5/2005
100 8/1/2005
100 8/1/2006
100 8/1/2007
6/5/2006
6/5/2007
6/5/2008
Kathleen
Middle
Bartow High
Bartow High
Bartow High
Grade
08
09
10
11
“Actual” Time
• Snapshots don’t include history.
• Can’t ask questions like:
“What schools did this student attend?”
“Were there any gaps in enrollment?”
“Was the student retained (held back)?”
• Actual-time history answers these questions.
ID
ActualBegin ActualEnd School
100 8/1/2004
6/5/2005
100 8/1/2007
6/5/2008
Grade
Kathleen
08
Middle
Bartow High 11
“Record” Time
• When a change occurred
• A person cannot have multiple dates of birth,
but the system may have the wrong date.
• What if we made a decision, or printed a report
based on invalid data?
ID
RecordBegin
RecordEnd
DOB
100
100
8/1/2004
3/21/2008
3/21/2008
“forever”
1/9/2003
9/1/2003
“Record” Time
• Allows auditing:
Q: “Why did we enroll a four-year-old in
Kindergarten?”
A: “Because we had the month & day of birth
switched.”
ID
RecordBegin
RecordEnd
DOB
100
100
8/1/2004
3/21/2008
3/21/2008
“forever”
1/9/2003
9/1/2003
“Record” Time
• What’s “forever”?
• don’t use today for RecordEnd.
– have to update it every day.
• don’t pick a value in the near future.
• Nulls make comparisons difficult.
• Most DBs support 12/31/9999; close enough.
ID
RecordBegin
RecordEnd
DOB
100
100
8/1/2004
3/21/2008
3/21/2008
12/31/9999
1/9/2003
9/1/2003
Bitemporal Tables
Two “dimensions” of time:
• When did it really change?
• When did we record the change?
– Mistyped grade level on 8/1/2005
– Corrected on 9/15/2005
– Two Record time rows for the same Actual time
ID
RecordBegin RecordEnd ActualBegin ActualEnd Grade
100 8/1/2004
12/31/9999 8/1/2004
6/5/2005
08
100 8/1/2005
9/15/2005
8/1/2005
6/5/2006
10
100 9/15/2005
12/31/9999 8/1/2005
6/5/2006
09
100 8/1/2006
12/31/9999 8/1/2006
6/5/2006
10
Bitemporal Database
• Every entity models Actual and/or Record
time.
• Some entities may not seem to require
both.
– even name and SSN can change.
– Marriage, legal name change, citizenship.
• Assumed all entities used both temporal
dimensions for this project.
Topics
• Bitemporal Definition
• Domain Background (Student Data)
• Bitemporal Implementation
– Primary Key Constraints
– Foreign Key Constraints
• Conclusions
Domain Background
• Tables to represent
– student enrollment
– schedules
– attendance
• Grades were omitted to reduce complexity.
School
• Names change occasionally, and schools
open and close.
– Identifier
– Current Name
– Date Opened
– Date Closed
Calendar / Term
• Determine days and hours of operation.
Some schools have multiple calendars
(e.g. year-round). Calendars may be used
by multiple schools.
– Identifier
– School Year
– Start Date
– End Date
SchoolCalendar
• Many-to-many relationship between
school and calendar.
– Identifier
– School Calendar ID
– School ID
Student
• A person enrolled in a school and/or
scheduled course.
– Student ID
– First Name
– Last Name
– Date of Birth
SchoolEnrollment
• A student registers at a particular school,
and must be assigned to a calendar.
– Identifier
– School Calendar ID
– Student ID
– Start Date
– End Date
Course
• State defines a list of courses and titles.
– Identifier (Local)
– Name
– Description
– DOE Course Number
AdoptedCourse
• A school includes a state-defined course in
the school’s course catalog.
• Elementary schools shouldn’t see
Precalculus when selecting courses to
schedule.
– Identifier
– School ID
– Course ID
ScheduledCourse
• One section of a course.
– Identifier
– School Calendar (or Term) ID
– Adopted Course ID
– Start Date / Time
– End Date / Time
– Seats
CourseEnrollment
• Student enrollment in a Scheduled Course
(section). A student may enroll late or drop
early.
– Identifier
– Scheduled Course ID
– Student ID
– Start Date
– End Date
CourseAttendance
• Records of student absences and tardies
in a scheduled course section.
– Course Enrollment ID
– Date
– Attendance Code
– Note
Domain Summary
• Representative of a Student Information
System (SIS).
• Often used synthetic keys for convenience
in joins.
– Bitemporal joins are difficult enough
– a common pattern reduces error
• Not a perfect model.
– would be a project in itself.
Topics
• Bitemporal Definition
• Domain Background (Student Data)
• Bitemporal Implementation
– Primary Key Constraints
– Foreign Key Constraints
• Conclusions
Bitemporal Implementation
• First, add 4 columns to every table:
– RecordBegin
– RecordEnd
– ActualBegin
– ActualEnd
• ActualBegin & ActualEnd may replace an
existing StartDate & EndDate pair.
Record Begin / End
• RecordBegin: date on which a bitemporal
row was entered in the database.
– Usually GetDate() / CURRENT_DATE.
• RecordEnd: date on which a bitemporal
row is corrected or deleted.
– Starts at 12/31/9999.
– Replaced with GetDate() on correction /
deletion.
Actual Begin / End
• ActualBegin: date on which a bitemporal
row is effective.
– Date of birth, date a school opened, date of
legal name change for new name.
• ActualEnd: date on which a bitemporal row
is no longer effective.
– Death, date school closed, date of legal name
change for old name.
– Still correct for the ActualBegin to ActualEnd
range.
Bitemporal Implementation
• Either Record or Actual time may overlap
• If both overlap, we don’t know which record is
correct and effective.
– Was this student in 10th or 9th grade from 8/1/2005 to
6/5/2006?
– If you ask after 9/15/2005, it’s not clear.
ID
RecordBegin RecordEnd ActualBegin ActualEnd Grade
100 8/1/2004
12/31/9999 8/1/2004
6/5/2005
08
100 8/1/2005
12/31/9999 8/1/2005
6/5/2006
10
100 9/15/2005
12/31/9999 8/1/2005
6/5/2006
09
100 8/1/2006
12/31/9999 8/1/2006
6/5/2006
10
Bitemporal Implementation
Also:
• A student cannot be enrolled before Date of
Birth or after Death.
• A course cannot be scheduled after a school is
closed, or without a calendar.
• These are Integrity Constraints.
– Primary Key
– Foreign Key
Topics
• Bitemporal Definition
• Domain Background (Student Data)
• Bitemporal Implementation
– Primary Key Constraints
– Foreign Key Constraints
• Conclusions
Bitemporal Primary Keys
• Built-in primary keys only check if each
column/attribute matches another row.
• Can’t use StudentID, because there can be
multiple records for different Record time
ranges.
• Can’t use StudentID and all dates, because we
could add 9/1/2008 to 12/31/9999, then add
9/2/2008 to 12/31/9999 for the same ID.
– Per-column uniqueness not violated
– Temporal uniqueness definitely violated
Overlapping Time
s1.Begin
s1.End
s1
Overlap
s2
s2.Begin
s1.Begin
s2.End
s1.End
s1
Overlap
s2
s2.Begin
s1.Begin < s2.Begin
s1.Begin < s2.End
s1.End > s2.Begin
s1.End < s2.End
s2.End
s1.Begin > s2.Begin
s1.Begin < s2.End
s1.End > s2.Begin
s1.End > s2.End
Overlapping Time (2)
s1.Begin
s1.End
s1
Overlap
s2
s2.Begin
s1.Begin
s2.End
s1.End
s1
Overlap
s2
s2.Begin
s1.Begin < s2.Begin
s1.Begin < s2.End
s1.End > s2.Begin
s1.End > s2.End
s2.End
s1.Begin > s2.Begin
s1.Begin < s2.End
s1.End > s2.Begin
s1.End < s2.End
Overlapping Time (3)
s1.Begin < s2.Begin
s1.Begin < s2.End
s1.End > s2.Begin
s1.End < s2.End
s1.Begin < s2.Begin
s1.Begin < s2.End
s1.End > s2.Begin
s1.End > s2.End
s1.Begin > s2.Begin
s1.Begin < s2.End
s1.End > s2.Begin
s1.End > s2.End
s1.Begin > s2.Begin
s1.Begin < s2.End
s1.End > s2.Begin
s1.End < s2.End
• Overlap occurs when:
– s1.Begin < s2.End
– s1.End > s2.Begin, rewrite as s2.Begin < s1.End
• Comparing s1.Begin to s2.Begin, or s1.End to
s2.End is irrelevant.
Bitemporal Primary Keys
• Violated if two rows in the same table match on
Identifier, and overlap on both time dimensions:
Select null
From Student s1
Join Student s2
On s1.StudentID = s2.StudentID
And s1.RecordBegin < s2.RecordEnd
And s2.RecordBegin < s1.RecordEnd
And s1.ActualBegin < s2.ActualEnd
And s2.ActualBegin < s1.ActualEnd
• Exception: the “two” records are really the same
record, since we’re joining the table to itself.
Bitemporal Primary Keys
If Exists (
Select null
From Student s1
Join Student s2
On s1.StudentID = s2.StudentID
And s1.RecordBegin < s2.RecordEnd
And s2.RecordBegin < s1.RecordEnd
And s1.ActualBegin < s2.ActualEnd
And s2.ActualBegin < s1.ActualEnd
And Not (
s1.RecordBegin = s2.RecordBegin
And s1.RecordEnd = s2.RecordEnd
And s1.ActualBegin = s2.ActualBegin
And s1.ActualEnd = s2.ActualEnd
)
)
Triggers as Primary Keys
• If a match exists to the query, then the data is
already in the database.
• How do we know which one is the original, and
which to delete?
• Much easier to use a database trigger.
• Triggers are queries that run before, after, or
instead of an insert, update, and/or delete.
• After is easiest to implement, and can roll back
(undo) the change.
Triggers as Primary Keys
CREATE TRIGGER [dbo].[trStudentTemporalUniqueStudentID]
ON [dbo].[Student]
AFTER INSERT, UPDATE
AS
Begin
If Exists (
<Query goes here>
)
Begin
RAISERROR('Row overlaps an existing Student row for the same
Record and Actual time period.',16,1);
/*Note: this rolls back ALL open transactions. If stored procs are the only
access mechanism, we may want to raise the error and let the stored
proc manage transaction state*/
Rollback Transaction
End
End
What about equality?
• Trigger doesn’t stop duplicate inserts.
• Can’t tell whether it’s a duplicate or the same
record.
• Just use a normal unique index.
• The database can prevent exact duplicates.
Alter Table Student
Add Constraint ixStudentTemporalUnique
Unique Clustered (RecordEnd Desc, ActualEnd Desc, StudentID,
RecordBegin, ActualBegin, Sequence)
Topics
• Bitemporal Definition
• Domain Background (Student Data)
• Bitemporal Implementation
– Primary Key Constraints
– Foreign Key Constraints
• Conclusions
Bitemporal Foreign Keys
Nontemporal Foreign Keys
• Every row in the child table must have exactly
one corresponding record in the parent table.
Bitemporal Foreign Keys
• For each instant in Record and Actual time of
the child table, exactly one corresponding row
must exist in the parent table.
Bitemporal Foreign Keys
Nontemporal:
• An enrollment requires a student.
Bitemporal:
• A “current” Record-time enrollment (now to
12/31/9999) for a given Actual-time range
(8/1/2008 to 6/5/2009) must have student:
– Record-time rows from Now to 12/31/9999.
– Actual-time rows for every record time in that range,
covering the range from 8/1/2008 to 6/5/2009,
Nontemporal Foreign Key Example
Student
StudentID FirstName LastName
100
101
John
Jane
Smith
Doe
Enrollment
StudentID
100
100
102
SchoolYear
20072008
20082009
20072008
StartDate
8/1/2007
8/1/2008
8/1/2008
EndDate
6/5/2008
6/5/2009
6/5/2009
Valid?
Yes
Yes
No.
StudentID
102 does
not exist.
Bitemporal Foreign Key Example
Student
StudentID Record
Actual
FirstName LastName
100
9/1/2007
6/1/2005
John
12/31/9999 12/31/9999
Smith
101
8/1/2004
11/15/2008 Jane
12/31/9999 12/31/9999
Doe
Enrollment
StudentID Record
Actual
Valid?
100
8/1/2007
8/1/2007
12/31/9999 6/5/2008
No. Student was not recorded
in database until 9/1/2007.
100
9/1/2007
8/1/2004
12/31/9999 6/5/2005
No. Student ActualBegin is
DOB. Student not born on
enrollment date.
100
9/1/2007
8/1/2006
12/31/9999 6/5/2007
Yes. Student rows exist for
Record and Valid time range.
Bitemporal Foreign Keys
• Now we see what they prevent.
• Like Primary Keys, we can’t use the built-in
version.
• What if we try triggers again?
• Let’s use school as the example…
Bitemporal Foreign Key Example
Student
StudentID Record
100
Actual
FirstName LastName
9/1/2007
6/1/2005
John
12/31/9999 12/31/9999
Smith
Enrollment
StudentID Record
100
Actual
9/1/2007
8/1/2006
12/31/9999 6/5/2007
Valid?
Yes. Student rows exist for
Record and Valid time range.
• We find out the user mistyped the date of birth.
• Student.ActualBegin should be 1/6/2005.
Bitemporal Foreign Key Example
Student
StudentID Record
Actual
FirstName LastName
100
9/1/2007
9/7/2007
6/1/2005
John
12/31/9999
Smith
100
9/7/2007
1/6/2005
John
12/31/9999 12/31/9999
Smith
Enrollment
StudentID Record
100
Actual
9/1/2007
8/1/2006
12/31/9999 6/5/2007
Valid?
Yes. Student rows exist for
Record and Valid time range.
This is what we want:
• First, update the existing RecordEnd.
• Then, insert the second row.
Bitemporal Foreign Key Example
Student
StudentID Record
Actual
100
6/1/2005
John
12/31/9999
9/1/2007
9/7/2007
FirstName LastName
Smith
Enrollment
StudentID Record
100
Actual
9/1/2007
8/1/2006
12/31/9999 6/5/2007
Valid?
Yes. Student rows exist for
Record and Valid time range.
Problem:
• The update breaks the bitemporal foreign key.
• If we try insert first, the insert breaks the
bitemporal primary key.
Bitemporal Foreign Keys
Workarounds:
• “Deferred” Constraint Checking: only run
the trigger on transaction commit.
– not supported in MS SQL Server.
• No Constraint Checking
– Data WILL be entered incorrectly.
• Control all data entry.
– Stored Procedures
Foreign Key Management
in Stored Procedures
• Restrict table access to db_owner role.
• Create Add, Update, and Delete Stored
Procedures.
• Give stored procedures permissions of
db_owner role.
• Give users permissions to run stored
procedures.
• Users edit data through procedures.
Foreign Key Management
in Stored Procedures
• Restrict table access to db_owner role.
• Create Add, Update, and Delete Stored
Procedures.
• Give stored procedures permissions of
db_owner role.
• Give users permissions to run stored
procedures.
• Users edit data through procedures.
Foreign Key Checks
1. a parent record exists with an ActualBegin
before or on the child record's ActualBegin,
and an ActualEnd after the child record's
ActualBegin.
2. a parent record exists with an ActualEnd on or
after the child record's ActualEnd, and an
ActualBegin before the child record's
ActualEnd.
3. no parent record with an ActualEnd between
the child record's ActualBegin and ActualEnd,
exists without another parent record having an
ActualBegin matching its ActualEnd.
Foreign Key Checks
1.
a parent record exists with an ActualBegin before or on
the child record's ActualBegin, and an ActualEnd after
the child record's ActualBegin.
Select null
From child
Where child.RecordEnd = '9999-12-31'
And Not Exists (
Select Null
From parent
Where parent.key = child.key
And parent.RecordEnd = '9999-12-31'
And parent.ActualBegin <= child.ActualBegin
And parent.ActualEnd > child.ActualBegin
)
Foreign Key Checks
2.
a parent record exists with an ActualEnd on or after the
child record's ActualEnd, and an ActualBegin before
the child record's ActualEnd.
Select null
From child
Where child.RecordEnd = '9999-12-31'
And Not Exists (
Select Null
From parent
Where parent.key = child.key
And parent.RecordEnd = '9999-12-31'
And parent.ActualBegin < child.ActualEnd
And parent.ActualEnd >= child.ActualEnd
)
Foreign Key Checks
3. no parent
record with an
ActualEnd
between the
child record's
ActualBegin and
ActualEnd,
exists without
another parent
record having an
ActualBegin
matching its
ActualEnd.
Select null
From child
Where child.RecordEnd = '9999-12-31'
And Exists (
Select Null
From parent
Where parent.key = child.key
And parent.RecordEnd = '9999-12-31'
And parent.ActualEnd > child.ActualBegin
And parent.ActualEnd < child.ActualEnd
And Not Exists (
Select null
From parent parent2
Where parent2.key = parent.key
And parent2.RecordEnd = '9999-12-31'
And parent2.ActualBegin = parent.ActualEnd
)
)
Foreign Key Management
in Stored Procedures
• Each procedure on a parent table checks
all 3 conditions on its child tables:
– After inserts and updates
– Rolls back transaction if any check is violated
• Each procedure on a child table checks all
3 conditions on its parent table.
– Slightly simpler, since there is only one
ActualBegin and ActualEnd.
– Before update, and abort if violated.
Foreign Key Management
in Stored Procedures
• 4000 lines of SQL for the database
structure of this project.
• Still does not cover every unique
constraint that should be in the database.
• Requires huge maintenance effort for any
schema changes.
• Can be bypassed by a DBA, so not 100%
safe for audit purposes.
Topics
• Bitemporal Definition
• Domain Background (Student Data)
• Bitemporal Implementation
– Primary Key Constraints
– Foreign Key Constraints
• Conclusions
Conclusions
• Wait until the database vendors build it
into the engine, or use a programming
framework.
– Oracle Flashback
– Hibernate Envers
– CLR Code in SQL 2008 (includes interval data
types, and separate DATE/TIME types)