Transcript Containment
Containment
CSE 590 DB
Rachel Pottinger
Outline
Introduction
Motivation
Formal definition
Algorithms for different complexities
An application: rewriting queries using
views
Containment, what is it?
For two queries, Q1 and Q2, if all of the
answers to Q1 are a subset of those for Q2
for all databases, then Q1 is contained in
Q2.
Denoted as Q1 Q2.
For general datalog, this is undecidable
(by reduction from decision problems for
context free languages)
Why should I care?
Containment is useful in a number of
situations, including:
Query minimization
Independence of queries using updates
Rewriting queries using views
Interesting logic problem
More definitions
Equivalence of queries: Q1Q2 if they
return the same answers for all
databases. This is the same as Q1 Q2
and Q2 Q1
Conjunctive query - a query that is
formed only of conjunctions of predicates.
Q(X,Y):- e(X,Z),e(Z,Y)
Containment Mapping
Let Q1 and Q2 be two conjunctive queries
Q1: I :- J1, …, Jl
Q2: H :- G1, …, Gk
A symbol mapping h is said to be a
containment mapping if h turns Q2 into
Q1; that is, h(H)= I, and for each i =
1,2,…,k, there is some j such that
h(Gi)=Jj. There is no requirement that
each Jj be the target of some Gi
Proof Sketch
If there’s a containment mapping from Q2
to Q1, then Q1 Q2
Suppose maps Vars(Q2)Vars(Q1)
Let D be a database and be an answer
is a mapping from Vars(Q1) D
• Vars(Q2) D
The rest of the proof follows later
Example of homomorphism
rules
Q1: fp(X,Y) :- e(Y,X), e(X,Z)
Q2: fp(A,B) :- e(B,A), e(C,A),e(A,D)
For Q1 Q2, map from Q2 to Q1
Test for containment of a
conjunctive query (Q1 Q2)
Freeze the body of Q1, and put this into a
canonical database
Apply Q2 to the canonical database
If Q1 can be derived from Q2 on the
canonical database, then Q1 Q2,
otherwise not
A chilling example
Q1: p(X,Z) :- a(X,Y), a(Y,Z)
Q2: p(X,Z) :- a(X,U), a(V,Z)
Canonical Database of Q1
Proof continued
If Q1 Q2,then there is a containment
mapping
Since Q1 Q2, we know that if we apply Q2
to the canonical database formed from Q1,
we’ll get back the same fact we got from
applying it to Q1, which makes a mapping
from Q2 to Q1.
Conjunctive queries with
negation
Negation in the heads of the subgoals, ie: Q(X,Y):e(X,Z),e(Z,Y)
The Levy and Sagiv test looks at an exponential number
of canonical databases, thus is P2 complete
Consider all partitions of Q1; form canonical databases for all
of them, D1, … Dk
For each database Di, see if the database makes all subgoals
of Q1 true.
For all Di’s passing step 2, see if it the head of Q1 can be
derived by applying Q2
If so, then Q1 Q2, else not
A negative example
Q1: p(X,Z):-a(X,Y), a(Y,Z), a(X,Z)
Q2: p(A,C):-a(A,B),a(B,C), a(A,D)
Conjunctive Queries with
Arithmetic Comparisons
Q(X,Y):-e(X,Z),e(Z,Y), Z < Y
Treat the same as the negated subgoals,
only a check must be made for each
ordering of each partition
Also P2 complete for dense domain such
as reals
Example with arithmetic
comparisons
Q1:p(X,Z):-a(X,Y), a(Y,Z), X < Y
Q2:p(A,C):-A(A,B),A(B,C), A < C
false, see x = z = 0, y = 1
Other complexity results
queries restricted to queries Q1 and Q2 such that all
database predicates have arity at most 2 and every
database predicate occurs at most three times in the
body of Q1 - P2
Conjunctive queries where Q1 is fixed- NP complete
Conjunctive queries where Q2 is fixed - polynomial
Conjunctive query containment where Q2 is an acyclic
query - polynomial time
Conjunctive queries where every database predicate
occurs at most twice in the body of Q1 - linear time
Rewriting Queries Using
Views
Useful in query optimization
Good for query minimization
Needed to make the best use of cached
information
Necessary in data integration
Views
A view is a relation that is not part of the
conceptual model, but is visible to the
user.
Useful for common expressions, or
protecting data
Example: If you had faculty(name, office,
ssn) you may want students to access
faculty_office(name, office)
Views (con’t.)
Views can be either materialized or virtual
In data integration, data sources can be
thought of as views
An example of rewriting
queries using views
Suppose you had two databases:
One has famous people and whether they
are right or left handed
One has the birthdays of famous people
You want the birthdays of all of the lefties
Containment in rewriting
Query of q(X):-e(X,Y), e(Y,X)
View of v(A,B):- e(A,C),e(C,B)
A more complicated
example
Q(x,u):-p(x,y),p0(y,z),p1(x,w),p2(w,u)
V1(a,b):-p(a,c),p0(c,b),p1(a,d)
V2(a,b):-p1(a,b)
V3(a,b):-p2(a,b)