Transcript Containment

Containment
CSE 590 DB
Rachel Pottinger
Outline
Introduction
Motivation
Formal definition
Algorithms for different complexities
An application: rewriting queries using
views
Containment, what is it?
For two queries, Q1 and Q2, if all of the
answers to Q1 are a subset of those for Q2
for all databases, then Q1 is contained in
Q2.
Denoted as Q1  Q2.
For general datalog, this is undecidable
(by reduction from decision problems for
context free languages)
Why should I care?
Containment is useful in a number of
situations, including:
Query minimization
Independence of queries using updates
Rewriting queries using views
Interesting logic problem
More definitions
Equivalence of queries: Q1Q2 if they
return the same answers for all
databases. This is the same as Q1  Q2
and Q2  Q1
Conjunctive query - a query that is
formed only of conjunctions of predicates.
Q(X,Y):- e(X,Z),e(Z,Y)
Containment Mapping
Let Q1 and Q2 be two conjunctive queries
Q1: I :- J1, …, Jl
Q2: H :- G1, …, Gk
A symbol mapping h is said to be a
containment mapping if h turns Q2 into
Q1; that is, h(H)= I, and for each i =
1,2,…,k, there is some j such that
h(Gi)=Jj. There is no requirement that
each Jj be the target of some Gi
Proof Sketch
If there’s a containment mapping from Q2
to Q1, then Q1  Q2
Suppose  maps Vars(Q2)Vars(Q1)
Let D be a database and  be an answer
 is a mapping from Vars(Q1) D
•  Vars(Q2) D
The rest of the proof follows later
Example of homomorphism
rules
Q1: fp(X,Y) :- e(Y,X), e(X,Z)
Q2: fp(A,B) :- e(B,A), e(C,A),e(A,D)
For Q1  Q2, map from Q2 to Q1
Test for containment of a
conjunctive query (Q1 Q2)
 Freeze the body of Q1, and put this into a
canonical database
 Apply Q2 to the canonical database
 If Q1 can be derived from Q2 on the
canonical database, then Q1  Q2,
otherwise not
A chilling example
Q1: p(X,Z) :- a(X,Y), a(Y,Z)
Q2: p(X,Z) :- a(X,U), a(V,Z)
Canonical Database of Q1
Proof continued
If Q1  Q2,then there is a containment
mapping
Since Q1  Q2, we know that if we apply Q2
to the canonical database formed from Q1,
we’ll get back the same fact we got from
applying it to Q1, which makes a mapping
from Q2 to Q1.
Conjunctive queries with
negation
 Negation in the heads of the subgoals, ie: Q(X,Y):e(X,Z),e(Z,Y)
 The Levy and Sagiv test looks at an exponential number
of canonical databases, thus is P2 complete
 Consider all partitions of Q1; form canonical databases for all
of them, D1, … Dk
 For each database Di, see if the database makes all subgoals
of Q1 true.
 For all Di’s passing step 2, see if it the head of Q1 can be
derived by applying Q2
 If so, then Q1  Q2, else not
A negative example
Q1: p(X,Z):-a(X,Y), a(Y,Z), a(X,Z)
Q2: p(A,C):-a(A,B),a(B,C), a(A,D)
Conjunctive Queries with
Arithmetic Comparisons
Q(X,Y):-e(X,Z),e(Z,Y), Z < Y
Treat the same as the negated subgoals,
only a check must be made for each
ordering of each partition
Also P2 complete for dense domain such
as reals
Example with arithmetic
comparisons
Q1:p(X,Z):-a(X,Y), a(Y,Z), X < Y
Q2:p(A,C):-A(A,B),A(B,C), A < C
false, see x = z = 0, y = 1
Other complexity results
  queries restricted to queries Q1 and Q2 such that all
database predicates have arity at most 2 and every
database predicate occurs at most three times in the
body of Q1 - P2
 Conjunctive queries where Q1 is fixed- NP complete
 Conjunctive queries where Q2 is fixed - polynomial
 Conjunctive query containment where Q2 is an acyclic
query - polynomial time
 Conjunctive queries where every database predicate
occurs at most twice in the body of Q1 - linear time
Rewriting Queries Using
Views
Useful in query optimization
Good for query minimization
Needed to make the best use of cached
information
Necessary in data integration
Views
A view is a relation that is not part of the
conceptual model, but is visible to the
user.
Useful for common expressions, or
protecting data
Example: If you had faculty(name, office,
ssn) you may want students to access
faculty_office(name, office)
Views (con’t.)
Views can be either materialized or virtual
In data integration, data sources can be
thought of as views
An example of rewriting
queries using views
Suppose you had two databases:
One has famous people and whether they
are right or left handed
One has the birthdays of famous people
You want the birthdays of all of the lefties
Containment in rewriting
Query of q(X):-e(X,Y), e(Y,X)
View of v(A,B):- e(A,C),e(C,B)
A more complicated
example
Q(x,u):-p(x,y),p0(y,z),p1(x,w),p2(w,u)
V1(a,b):-p(a,c),p0(c,b),p1(a,d)
V2(a,b):-p1(a,b)
V3(a,b):-p2(a,b)