Transcript Slides

Dependency Problems for
Future Database Systems:
Ingres Project D
Adrian Hudnott
Warwick Postgraduate Colloquium in Computer
Science 2008
Research Problems
•
The Third Manifesto is the subtitle of a book written by Chris Date and
Hugh Darwen that defines what a relational database system should and
should not do
•
The Third Manifesto and related publications do not include any
implementation techniques
•
There is no scalable implementation of any part of The Manifesto
currently in existence
•
There are concerns that some of the demands of The Manifesto cannot
be met using extensions of techniques in use by SQL DBMS
•
When these were catalogued all of the difficult ones had a common
theme: constraints
•
All the problems involve the DBMS efficiently deducing a fact using
known properties about the data source
•
Little or no access to the actual data
Research Problems
• Constraint Enforcement
• Semantic Query Optimization
• Multiple Simultaneous Assignment
• Updates to Views
• Interface with SQL Databases
• Determination of Type (compile- and runtime)
Database Representation
• Use Datalog (like Prolog)
• E.g.
supplier(s1,
supplier(s2,
supplier(s3,
supplier(s4,
supplier(s5,
smith, 20, london).
jones, 10, paris).
blake, 30, paris).
clarke, 20, london).
adams, 30, athens).
Views
• Expressed using Datalog rules
• E.g.
supplier_city(C) :supplier(SN, N, ST, C).
Notation
• Extensional Database: EDB
• Intensional Database: IDB
• Statistics: Stat
• Constraints (invariant): Inv
• Assignments: Ri := Ui
• Queries: Qi(t)
• Boolean expressions: Bi(t)
• Cost function: c(exp)
Constraint Enforcement


 



 
Inv  IDB U / R  Stat  EDB
 
(Prove Safety)
 Inv U / R
Or, alternatively:
 
Inv  IDB U / R  Stat  EDB
 
 ¬Inv U / R (Prove Failure)

N.B. Minimize access to the database (EDB)
Updates in the Presence of Views
 
IDB U / R


• Example insertion:
supplier(s6, carter, 30, oxford).
• R is “supplier”
• U is <previous definition>
OR <tuple is the one above>
• supplier_city(C) :- supplier(SN, N, ST,
C); (SN=s6, N=carter, ST=30, C=oxford).
Materialized Views
• Materialized View: contents of a view cached
to avoid re-calculation.
• Can be modelled using constraints
CREATE TABLE supplier (City PRIMARY KEY
REFERENCES supplier_city,...
CREATE TABLE supplier_city (City PRIMARY KEY
REFERENCES supplier);
• And vice versa: A constraint is a view that
is always empty.
Multiple Assignment: Naive
Implementation
• Example: X := Y, Y := X;
• Internal representation:
atomic {
Z := X;
X := Y;
Y := Z;
}
Multiple Assignment
.... R’:= U’, R1 := U1, …, Rn := Un;
Stat  EDB  IDB U / R U ' U / Rt 
 Stat  EDB  IDB  U ' t 
IIF U’ is the first assignment:
Inv (
Stat  EDB  IDB U / R U ' U / Rt 
 Stat  EDB  IDB  U ' t )
Multiple Assignment Idea
1)
Semantically optimize RHS expressions
2)
Place RHS expressions into canonical form and find
subexpressions equal to other RHS expressions
3)
Build a dependency graph
4)
Depth first search to find cycles
5)
Break cycles by creating simulated copies
6)
Schedule and execute assignments
Transaction Repair
• Transaction repair is modifying an update
request so that it conforms with the
constraints
• Compensating actions
– E.g. ON DELETE CASCADE
• Research on more advanced repairs
– Generates more than one repair
View Update =
Transaction Repair + Disambiguation
DELETE FROM supplier_city WHERE City = ‘London’;
Transaction repair needed:
DELETE FROM supplier WHERE city=‘London’;
For more complex cases, disambiguation is needed
– Default values, rules, etc.
Semantic Optimization
Inv  IDB  Stat
 Q1 t   Q2 t 
(Equivalence)
and:
CQ1  < CQ2 
No access to the data
(Lower Cost)
Missing Information
• NULLs are flawed for many reasons
– Simple example: B OR NOT B is NULL if B is NULL
• Can consider missing data as free variables
• Problem is to efficiently determine if a formula is true
for all possible values of the free variable
Inv  IDB  Stat  EDB Q(t )  B1 (t )  B2 (t ) 
and x  dom( a ) : B2 (t )  B2 (t[ x / a ])
Current Progress
• Catalogued all known issues requiring investigation
• Outlined solutions or literature for the lesser problems
– E.g. comparison of large values
• Devised formal semantics for type checking Tutorial
D programs
• Elaborated on the problems requiring original
research in the context of dependency
• Resolved perceived ambiguities in The Manifesto
• Detailed ideas for future development into a solution
to the multiple assignment problem
• Completed a literature review on constraint checking
techniques
Project D Roadmap
Further literature review & code
familiarization
Address multiple assignment problem
Summer 2008
Oct 2008 – Jan 2009
Development Phase 1
Feb – Aug 2009
Development Phase 2
Aug 2009 – Mar 2010
Thesis write-up
Apr – Sep 2010
First complete build
Mid – Late 2011
Product released
Early 2012
Summing up Project D
• Essentially about relieving the industry of the burdens
that have been imposed by SQL implementations in
the past 30 years
• We think that the relational model can be properly
implemented without SQL’s restrictions and ad hoc
additions
– … but requires innovative research
• Project D is still in very early development
• However, some small successes already