Multivalued Dependencies

Download Report

Transcript Multivalued Dependencies

Multivalued Dependencies
By David Wortham
Problem Introduction
Carrie Fisher = Princess Leia Organa

Assume a relation R (from the book):
(credit Ullman and Widom)
Name
AddrStreet
AddrCity
FilmName
FilmYear
C. Fisher
123 Maple Dr.
Hollywood
Star Wars
1977
C. Fisher
5 Locust Ln.
Malibu
Star Wars
1977
C. Fisher
123 Maple Dr.
Hollywood
The Empire
Strikes Back
1980
C. Fisher
5 Locust Ln.
Malibu
The Empire
Strikes Back
1980
C. Fisher
123 Maple Dr.
Hollywood
Return of the Jedi
1983
C. Fisher
5 Locust Ln.
Malibu
Return of the Jedi
1983
Problem Introduction

What is the highest Normal Form R complies with?
Name
AddrStreet
AddrCity
FilmName
FilmYear
C. Fisher
123 Maple Dr.
Hollywood
Star Wars
1977
C. Fisher
5 Locust Ln.
Malibu
Star Wars
1977
C. Fisher
123 Maple Dr.
Hollywood
The Empire
Strikes Back
1980
C. Fisher
5 Locust Ln.
Malibu
The Empire
Strikes Back
1980
C. Fisher
123 Maple Dr.
Hollywood
Return of the Jedi
1983
C. Fisher
5 Locust Ln.
Malibu
Return of the Jedi
1983
Recall 1NF


1NF eliminates wasted space due to
duplicate attributes (columns)
Example (before 1NF normalization):
Manager
Subordinate
1
Subordinate
2
Subordinate
3
Bob
Jim
Mary
Beth
Mary
Mike
Jason
Carol
Jim
Alan
Subordinate
4
Mark
Recall 1NF (con’t)

After 1NF normalization:
Manager
Subordinate
Bob
Jim
Bob
Mary
Bob
Beth
Mary
Mike
Mary
Jason
Mary
Carol
Mary
Mark
Jim
Alan
Identify the FDs


Functional Dependencies in R:
Since every tuple in R is unique, the
attrA
attrB
attrC
attrD
attrE
1
2
3
6
7
1
4
5
6
7
1
2
3
8
9
1
4
5
8
9
1
2
3
10
11
1
4
5
10
11
Identify the FDs


Functional Dependencies in R:
The only FD is: {attrB, attrC, attrD, attrE }  attrA
attrA
attrB
attrC
attrD
attrE
1
2
3
6
7
1
4
5
6
7
1
2
3
8
9
1
4
5
8
9
1
2
3
10
11
1
4
5
10
11
Normal Form of R




1NF (no multivalues) [check]
2NF (no FDs where a subset of the key to
the relation is on the left) [check]
3NF (no non-trivial FDs: either the
determinant is a superkey or the RHS of
the FD is a member of some key) [check]
BCNF (the determnant of any non-trivial
FD is a superkey for the relation) [check]
Problem Intro. (con’t)

Notice:
Name
AddrStreet
AddrCity
FilmName
FilmYear
C. Fisher
123 Maple Dr.
Hollywood
Star Wars
1977
C. Fisher
5 Locust Ln.
Malibu
Star Wars
1977
C. Fisher
123 Maple Dr.
Hollywood
The Empire
Strikes Back
1980
C. Fisher
5 Locust Ln.
Malibu
The Empire
Strikes Back
1980
C. Fisher
123 Maple Dr.
Hollywood
Return of the Jedi
1983
C. Fisher
5 Locust Ln.
Malibu
Return of the Jedi
1983
Problem Intro. (con’t)

Also Notice:
Name
AddrStreet
AddrCity
FilmName
FilmYear
C. Fisher
123 Maple Dr.
Hollywood
Star Wars
1977
C. Fisher
5 Locust Ln.
Malibu
Star Wars
1977
C. Fisher
123 Maple Dr.
Hollywood
The Empire
Strikes Back
1980
C. Fisher
5 Locust Ln.
Malibu
The Empire
Strikes Back
1980
C. Fisher
123 Maple Dr.
Hollywood
Return of the Jedi
1983
C. Fisher
5 Locust Ln.
Malibu
Return of the Jedi
1983
Observe the Pattern

R ~= TxUxV
(R is similar to the Cartesian product of relations
T, U, and V)
Relation T
Relation U
Relation V
Name
AddrStreetAndCity
FilmNameAndYear
C. Fisher
123 Maple Dr. | Hollywood
Star Wars | 1977
5 Locust Ln. | Malibu
The Empire Strikes Back | 1980
Return of the Jedi | 1983
Problem Definition




The Relation R contains unnecessary
duplication of data
R is valid 1NF, 2NF, 3NF, and BCNF (and
there are no exact duplicate tuples)
R has common data on AddrStreet and
AddrCity of all tuples
R has common data on FilmName and
FilmLocat of all tuples
Solution



Introduction of a 4NF
Eliminate “non-trivial” MDs
Eliminate additional FDs that violate BCNF
Definitions


Fourth Normal Form
- if R is valid BCNF and…
- given the “non-trivial” MVD: A1A2…An  B1B2…Bn
{A1A2…An} is a superkey
A MVD:
if:
1.
2.


A1A2…An  B1B2…Bn
for a Relation R is “non-trivial”
none of the Bs are among the As
Not all of the attributes of R are among the As and Bs
A MVD is “trivial” if it contains all the variations of
A1A2…An x B1B2…Bn.
A relation cannot be decomposed any further (under
4NF rules) if it has a trivial MVD
Note about FDs and MVDs

Every FD is a MVD
(if



A1A2…An  B1B2…Bn
, then
A1A2…An  B1B2…Bn
)
The converse is not always true
FDs rule out certain tuples (i.e. if A  B then two tuples
will not have the same value for A and different values
for B)
MVDs do not rule out tuples. They guarantee that
certain tuples must exist. (we will see this later)
Another Example

Another MVD example
(credit www.cs.jcu.edu.au)
emp_name
qualifications
languages
SMITH
B. Sc
NULL
SMITH
Dip. CS
NULL
SMITH
NULL
FORTRAN
SMITH
NULL
COBOL
SMITH
NULL
PASCAL
Another Example (con’t)

Our second example 4NF decomposed:
Relation L
Relation M
emp_name
qualifications
emp_name
qualifications
languages
SMITH
B. Sc
SMITH
B. Sc
NULL
SMITH
Dip. CS
SMITH
Dip. CS
NULL
SMITH
SMITH
SMITH
NULL
NULL
NULL
emp_name
languages
SMITH
FORTRAN
SMITH
COBOL
SMITH
PASCAL
FORTRAN
COBOL
PASCAL
Relation O
Explanation of second example


Unnecessary tuples are eliminated (those
with NULL values)… saving space
Note: for this example, MxO is not similar
to L: this is because the MVD in L is nontrivial
Second example (modified)


If we were to modify the second example
L to be 4NF, we would need to combine
every possible value of M with every one
of O, changing the MVD from non-trivial to
trivial
This relation would look similar to the
product of M and O
Second example (modified)

If we were to modify the second example L, we
would need to combine every possible value of
Relation L
M with every one of O
emp_name
qualifications
languages
SMITH
B. Sc
NULL
SMITH
Dip. CS
NULL
SMITH
NULL
FORTRAN
SMITH
NULL
COBOL
SMITH
NULL
PASCAL
Second example (modified)


If we were to modify the second example L, we
would need to combine every possible value of
Relation L w/ Trivial MVD
M with every one of O
emp_name
qualifications
languages
This relation is 4NF b/c
SMITH
B. Sc
FORTRAN
the only MVD is trivial
SMITH
Dip. CS
FORTRAN
(the original L was not
SMITH
B. Sc
COBOL
4NF)
SMITH
Dip. CS
COBOL
SMITH
B. Sc
PASCAL
SMITH
Dip. CS
PASCAL
Formal Definition of a MVD

Using the following relation,
(credit Ullman & Widom)

Note that A, B, and “others” are not actual attributes,
but rather sets of attributes
tuple t
tuple u
As
Bs
others
a1
b1
c1
=
=
a1
b1
=
tuple v
a1
c2
=
b2
c2
Formal Definition of a MVD (con’t)

Suppose A  B holds, then:





t[A] = u[A] = v[A]
t[B] = u[B]
u[C] = v[C]
If a MVD is to exist, u must exist.
For the MVD to be non-trivial,
every tuple must be in the form
of a u tuple
tuple t
tuple u
As
Bs
others
a1
b1
c1
=
=
a1
b1
=
tuple v
a1
c2
=
b2
c2
Formal Definition of a MVD (con’t)

For the MVD A  B to be trivial, either:
 B  A or
 B  A = R
Must be true
tuple t
tuple u
As
Bs
others
a1
b1
c1
=
=
a1
b1
=
tuple v
a1
c2
=
b2
c2
Armstrong’s Axioms WRT MVDs

Many of Armstrong’s Axioms work with MVDs
including:












Reflexivity rule
Augmentation rule
Transitivity rule
Complementation rule
Multivalued augmentation rule
Multivalued transitivity rule
Replication rule
Coalescence rule
Multivalued union rule
Intersection rule
Difference rule
See below for specifics on these rules
http://www.cs.sfu.ca/CC/354/zaiane/material/notes/Chapter7/node15.html
References

Course textbook:
a First Course in Database Systems (Jeffrey D. Ullman and Jennifer Widom)

Normalizing your database: First Normal Form (1NF):
http://webc.nicc.edu/~weedd/sysanaly/Chapter%206%20Database%20normalization/1NF.html

Multivalued Dependencies (Ozmar Zaine):
http://www.cs.sfu.ca/CC/354/zaiane/material/notes/Chapter7/node13.html

Multivalued Dependencies:
http://www.cs.jcu.edu.au/Subjects/cp1500/1998/Lecture_Notes/normalisation/mvd.html