cikm2004.heqi - School of Computing

Download Report

Transcript cikm2004.heqi - School of Computing

Extending and Inferring
Functional Dependencies in
Schema Transformation
Qi He
Tok Wang Ling
Dept. of Computer Science
School of Computing
National Univ. of Singapore
1
Agenda







Background
Schematic discrepancy and schema
transformation
Qualified FD
Inference rules of qualified FD
Propagation rules of qualified FD during schema
transformation
Related work
Conclusion
2
Motivation

Functional dependencies (FDs) are useful





Ensure the integrity of data in insertion, deletion and
update
Check lossless join decomposition
Normalize relations
Optimize queries
…
3
Issues on FDs in Individual DB


Inference rules of FDs in fixed relations
Derivation of dependencies on views from those
on original relations
How about in multidatabase systems?
4
FDs in Multidatabase


Multidatabase is distributed (i.e., data may be divided and
stored in several databases) and heterogeneous (i.e., similar
data may be represented in quite different forms in component
databases)
The distribution and heterogeneity of multidatabase require:



Complex schema transformations rather than the relational algebra
to integrate database schemas
A flexible expression of FDs to represent dependencies in component
databases
Consequently, the expression and derivation of FDs in
multidatabase are more difficult than in individual databases
5
Agenda







Background
Schematic discrepancy and schema
transformation
Qualified FD
Inference rules of qualified FD
Propagation rules of qualified FD during schema
transformation
Related work
Conclusion
6
Schematic discrepancy



A semantic heterogeneity among component
schemas
Schematic discrepancy occurs when one DB’s
attribute values correspond to schema labels
(attribute names or relation names) in others
See next page for an example
7
DB1:
Supply
product
supplier
month
price
p1
s1
Jan
100
p1
s1
Feb
105
…
…
…
…
unite({s1,…,sn},supplier)
fold(Supply,
month,price)
DB2:
Supply
product
supplier
Jan
Feb
…
Dec
p1
s1
100
105
…
97
p1
s2
99
107
…
110
…
…
…
…
…
…
unfold(Supply,
month,price)
split(Supply,supplier)
split(Supply,supplier)
unite({s1,…,sn},supplier)
DB3:
s1
DB4:
s1
product
month
price
p1
Jan
100
product
Jan
Feb
…
Dec
p1
Feb
105
p1
100
105
…
97
…
…
…
…
…
…
…
…
product
Jan
Feb
…
Dec
p1
99
107
…
110
…
…
…
…
unfold(si,month,price)
s2
s2
fold(si,month,price)
(i =1,2,… )
product
month
price
p1
Jan
99
p1
Feb
107
…
…
…
…
…
…
 Supplier names modeled as attribute values in DB1 and DB2, but relation
names in DB3 and DB4;
 Months modeled as attribute values in DB1 and DB3, but attribute names
in DB2 and DB4
8
Schematic Discrepant
Transformation


We call a transformation between schematic discrepant
schemas schematic discrepant transformation
A schematic discrepant transformation can be
implemented by a sequence of restructuring operators
(unfold, fold, split and unite [5]).


Unfold transforms attribute values to attribute names, and fold
does the converse.
Split transforms attribute values to relation names, and unite does
the converse.
[5] L. V. S. Lakshmanan, F. Sadri, and S. N. Subramanian. On efficiently implementing
schemaSQL on SQL database system. VLDB, 1999, 471-482.
9
Agenda







Background
Schematic discrepancy and schema
transformation
Qualified FD
Inference rules of qualified FD
Propagation rules of qualified FD during schema
transformation
Related work
Conclusion
10
Jan
product
Qualified FD (E.g.)
supplier
price
supplier
price
Feb
product
…



Constraint: in the 1st quarter, the products’ prices supplied by
supplier s1 are uniquely determined by the products.
That is, in the union R = Mi{Jan, Feb, Mar}(supplier = s1Mi), the FD
productprice holds.
We represent this constraint as a qualified FD as follows:
the dependency holds in the union
of the relations Jan, Feb and Mar
the dependency holds
when supplier is s1
{Jan, Feb, Mar}(product, supplierσ={s1}  price)
called qualification attribute
11
Qualified FD



FDs holding over a set of relations or a set of horizontal
partitions of relations
If a qualified FD only contains regular attributes and
holds in one relation, then it is just a conventional FD.
Flexible to represent dependencies in distributed and
heterogeneous schemas
12
Inference rules of qualified FDs



Let F be a set of qualified FDs and f be a qualified
FD for the schemas of a set of relations R . We say
F implies f, if every instance of R that satisfies F
also satisfies f.
We define F+ to be set of qualified FDs implied
by F.
Inference rules will allow us to deduce all the true
qualified FDs for R , i.e., those in F+
13
Inference Rules
Given X a mixed set of regular and qualification attributes, Y and Z sets of
regular attributes, A an attribute, R a set of relations, R1  R, D1 D dom(A),
we have:
(A1) Partition on relation set. If R (XY) holds, then R1 (XY) holds.
(A2) Composition on relation set. If {Ri, Rj}(XY) holds for any Ri, Rj  R, then
R (XY) holds.
(A3) Partition on qualification. If R (Aσ=D, XY) holds, then R (Aσ=D1, XY)
holds.
(A4) Composition on qualification. If R (A={ai,aj}, XY) holds for any ai, ajD,
then R (A=D, XY) holds.
(A5) Single-valued qualification. If a  dom(A), then R (Aσ={a}A) holds.
(A6) Assembly. If R (Aσ={a}, XY) holds for each a  D, then R (Aσ=D, A, XY)
holds.
14
Inference Rules (cont.)
(A7) Reflexivity. If Y  X, then R (XY) holds.
(A8) Augmentation. If R (XY) holds, then R (X, ZY, Z).
(A9) Transitivity. If R (XY) and R (X1, YZ) hold, where X1 is a set (possibly
an empty set) of some qualification attributes in X, then R (XZ) holds.
(A10) Dummy qualification. R (XY) iff R (Aσ=dom(A), XY) holds.

A useful rule derived from the above inference rules:
(A11) Disassembly. R (A, XY) holds iff R (Aσ={a}, XY) holds for each a 
dom(A)
15
Inference Rules
Rule A5 and A7 give trivial dependencies
 Rule A7, A8 and A9 extend Armstrong's
Axiom, the inference rules for FDs
 The rules are sound and complete to infer
qualified FDs in fixed relations

16
Inference Rule of Disassembly
(E.g.)
Supply
product supplier
month
price
Original FD: {product, supplier, month}price
equivalent to: {product, supplierσ={si}, month}price for each sidom(supplier)
equivalent to: {product, supplierσ={si}, monthσ={mj}}price for each
sidom(supplier) and mj dom(month)
17
Agenda







Background
Schematic discrepancy and schema
transformation
Qualified FD
Inference rules of qualified FD
Propagation rules of qualified FD during schema
transformation
Related work
Conclusion
18
Propagation Rules



Let R be a set of original relations, and S be a set
of relations transformed from R by applying
some restructuring operator (i.e., unfold, fold, split
or unite).
From a given set of qualified FDs F for R ,
propagation rules allow us to deduce the qualified
FDs for S.
We give the rules for split/unite and unfold/fold in
a pairwise way.
19
Propagation Rules for split/unite
split(R, B)
…
R(A1, …, An, B)
b1(A1, …, An)
unite({b1,…, bm}, B).
where dom(B)
= {b1, …, bm}
bm(A1, …, An)
Let X be a mixed set of regular and qualification attributes from {A1, …, An},
Y  {A1, …, An} be a set of regular attributes, and R  {b1, …, bm} be a set of
relation names. We have the following rule:
(P1) R(Bσ=R , XY) holds iff R (XY) holds.
20
Propagation Rules for split/unite


The qualification on the values of attribute B in a
qualified FD becomes the qualification on the
relation set over which the inferred qualified FD
holds, as B values become relation names in the
transformed schemas.
unite is a qualified FD preserving transformation,
but split is not

The FD R(X  B) will not be transformed to any
dependency in application of split
21
Propagation Rules for unfold/fold
R1(A1, …, An, B, C)
Rk(A1, …, An, B, C)
fold(Si, B, C) for
each i = 1, …, k
Sk(A1, …, An, b1, …, bm)
…
S1(A1, …, An, b1, …, bm)
…
unfold(Ri, B, C) for
each i = 1, …, k
where dom(B) = {b1, …, bm}, and the
values of attributes bi are from dom(C)
Let X be a mixed set of regular and qualification attributes from {A1, …, An},
Y  {A1, …, An} be a set of regular attributes. Let R {R1, …, Rk} and
S  {S1, …, Sk}, such that the relations of S are transformed from R. We have
the following rules:
(P2) R (Bσ={bi}, XC) holds iff S(Xbi) holds.
(P3) R (Bσ={bi}, X, CY) holds iff S(X, biY) holds.
(P4) R (XY) holds iff S(XY) holds.
22
Propagation Rules for unfold/fold



The rules are based on a set of unfold/fold
operators because some qualified FDs may hold
over a set of relations which are transformed
together by unfold/fold operations.
Rule P2 and P3 indicate that the qualification on
the value of attribute B in a qualified FD becomes
the qualification on the attribute name in the
inferred qualified FD.
Both fold and unfold are not qualified FD
preserving transformations.
23
Derive Qualified FDs in Schema
Transformation


Let R be a set of original relations, and S be a set
of relations transformed from R by applying a
sequence of restructuring operators
From a given set of qualified FDs F for R , we
deduce the qualified FDs for S, using both
inference rules and propagation rules
24
Derive Qualified FDs in Schema
Transformation (E.g.)
fold(si,month,price)
DB3:
(i=1,2,…)
s1
DB4:
s1
unite({s1,…,sn},supplier)
product Jan Feb … Dec
product month price
s2
s2
product Jan Feb … Dec
product month price
…
DB1:
Supply
product supplier month price
…
Given qualified FDs in DB4:
si(product  Jan,…,Dec) for each si = s1, s2,…,sn
After fold, in DB3, we get (by propagation rule P2)
si(product, monthσ={mj}  price) for each si = s1, s2,…,sn and mj = Jan, …, Dec
then (by disassembly rule A11):
si(product, month  price)
After unite, in DB1, we get (by propagation rule P1)
Supply(product, month, supplierσ={si}  price) for each si = s1, s2,…,sn
then (by disassembly rule A11):
Supply(product, month, supplier  price)
25
Derive Qualified FDs in Schema
Transformation (E.g.)



In the above example, the qualified FD on the
original relations are transformed to an equivalent
qualified FD on the transformed relation.
Directly applying the inference rules and
propagation rules need the computation of
qualified FD closure, which take much time.
Instead, in the paper, we gave some derived rules
to propagate qualified FDs in schema
transformation.
26
Related work (1)

An extension to FDs in the database design world
are FDs partially holding in a relation, in the sense
that only some tuples, called exceptions, break the
dependencies. These dependencies include weak
FDs [7], afunctional dependencies [3] and partial
FDs [2].
[2] F. Berzal, J. C. Cubero, F. Cuenca, J. M. Medina. Relational decomposition through partial functional
dependencies. Data & Knowledge Engineering 43(2), 2002, 207-234.
[3] P. De Bra and J. Paredaens. Conditional dependencies for horizontal decompositions. ICALP, 1983.
[7] Tok Wang Ling. Extending classical functional dependencies for physical database design (lecture notes),
2001. http://www.comp.nus.edu.sg/~lingtw/cs4221/extended.fds.pdf
27
Related work (1)


Differences between weak FD (or some other similar
dependencies) and qualified FD:
 A weak FD predicates that some tuples (but don’t know
which tuples) in a relation would violate the dependency,
while a qualified FD indicates exactly what kind of tuples
satisfy the dependency.
 Those dependencies are specified on individual relations,
while qualified FDs may hold over a set of relations.
We are not aware of any axiomatization for weak FD,
afunctional dependency or partial FD.
28
Related work (2)



In individual databases, the issue of inferring FDs
on views from FDs on original relations has been
introduced in [1, 4].
They did not consider schematic discrepancy in
schema transformation
Our work complements existing work based on
the relational algebra
[1] S. Abiteboul, R. Hull, and V. Vianu. Foundations of Databases. Addison-Wesley, 1995, 173-187, 216-235.
[4] G. Gottlob. Computing covers for embedded functional dependencies. SIGMOD, 1987.
29
Related work (3)



Some work [6, 8, 9] has been done on the
derivation of constraints in schema integration
based on ER or OO model.
They did not consider schematic discrepancy in
schema integration
They did not prove the completeness of their
methods, while we did.
[6] Mong Li Lee and Tok Wang Ling. Resolving constraint conflicts in the integration of ER schemas. ER,
1997, 394-407.
[8] M. P. Reddy, B. E. Prasad, and A. Gupta. Formulating global integrity constraints during derivation of
global schema. Data & Knowledge Engineering, 1995, 241-268
[9] M. W. W. Vermeer and P. M. G. Apers. The role of integrity constraints in database interoperation. VLDB,
1996, 425-435
30
Conclusion



Proposed qualified FD to represent dependencies
for distributed and heterogeneous schemas in a
multidatabase system
Found a set of inference rules and propagation
rules to derive qualified FDs in schematic
discrepant transformation
The rules are sound and complete to derive some
classes (not all) of qualified FDs in schema
transformation
31
Reference
[1] S. Abiteboul, R. Hull, and V. Vianu. Foundations of Databases. Addison-Wesley, 1995,
173-187, 216-235.
[2] F. Berzal, J. C. Cubero, F. Cuenca, J. M. Medina. Relational decomposition through
partial functional dependencies. Data & Knowledge Engineering 43(2), 2002, 207-234.
[3] P. De Bra and J. Paredaens. Conditional dependencies for horizontal decompositions.
ICALP, 1983.
[4] G. Gottlob. Computing covers for embedded functional dependencies. SIGMOD, 1987.
[5] L. V. S. Lakshmanan, F. Sadri, and S. N. Subramanian. On efficiently implementing
schemaSQL on SQL database system. VLDB, 1999, 471-482.
[6] Mong Li Lee and Tok Wang Ling. Resolving constraint conflicts in the integration of ER
schemas. ER, 1997, 394-407.
[7] Tok Wang Ling. Extending classical functional dependencies for physical database
design (lecture notes), 2001.
http://www.comp.nus.edu.sg/~lingtw/cs4221/extended.fds.pdf
[8] M. P. Reddy, B. E. Prasad, and A. Gupta. Formulating global integrity constraints during
derivation of global schema. Data & Knowledge Engineering, 1995, 241-268
[9] M. W. W. Vermeer and P. M. G. Apers. The role of integrity constraints in database
interoperation. VLDB, 1996, 425-435
32