Functional Dependencies - Department of Computer Science

Download Report

Transcript Functional Dependencies - Department of Computer Science

Design of Relational Database Schemas
• The principal problem that we encounter is redundancy,
where a fact is repeated in more than one tuple.
• Most common cause: attempts to group into one relation both
single valued and multi-valued properties of an object.
• Now we will tackle the problem of designing good relational
schemas.
Anomalies
title
year
length
filmTyp
studioNam
starName
Star Wars
Star Wars
Star Wars
Mighty Ducks
Wayne’s World
Wayne’s World
1977
1977
1977
1991
1992
1992
124
124
124
104
95
95
color
color
color
color
color
color
Fox
Fox
Fox
Disney
Paramount
Paramount
Carrie Fisher
Mark Hamill
Harrison Ford
Emilio Estevez
Dana Carvey
Mike Meyers
• Redundancy.
– Information may be repeated unnecessarily in several tuples.
– E.g. length and filmType.
• Update anomalies.
– We may change information in one tuple but leave it unchanged in other tuples.
– E.g. we could change the length of Star Wars to 125, in the first tuple, and forget
to do the same in the second and third tuple.
• Deletion anomalies.
– If a set of values becomes empty, we may lose other information as a side effect.
– E.g. if we delete Emilio Estevez we will lose all the information about Mighty
Ducks.
Decomposing Relations - Example
title
year
length
filmTyp
studioNam
Star Wars
Mighty Ducks
Wayne’s World
1977
1991
1992
124
104
95
color
color
color
Fox
Disney
Paramount
Movie1 relation
title
year
starName
Star Wars
Star Wars
Star Wars
Mighty Ducks
Wayne’s World
Wayne’s World
1977
1977
1977
1991
1992
1992
Carrie Fisher
Mark Hamill
Harrison Ford
Emilio Estevez
Dana Carvey
Mike Meyers
Movie2 relation
• No true redundancy!
• The update anomaly disappeared. If
we change the length of a movie, it is
done only once.
• The deletion anomaly disappeared. If
we delete all the stars from Movie2 we
still will have the other info for a movie.
Boyce-Codd Normal Form
• The goal of decomposition is to replace a relation by
several that do not exhibit anomalies.
• There is a simple condition under which the anomalies
can be guaranteed not to exist.
• This condition is called Boyce-Codd Normal Form, or
BCNF.
• A relation is in BCNF if:
– Whenever there is a nontrivial dependency
A1A2…AnB1B2…Bm
for R, it must be the case that
{A1 , A2 , … , An} is a superkey for R.
Boyce-Codd Normal Form - Example
• Relation Movie in the previous figure is not in BCNF.
Violating
BCNF
– Consider the FD: title yearlength filmType studioName
– Unfortunately, the left side of the above dependency is not a superkey.
• In particular we know that the title and the year does not functionally
determine starName.
• On the other hand, Movie1 is in BCNF.
– The only key is {title, year} and
– title year  length filmType studioName holds in the relation
Decomposition into BCNF
• The decomposition strategy is:
– Find a non-trivial FD A1A2…AnB1B2…Bm that violates BCNF, i.e. A1A2…An
is not a superkey.
– Decompose the relation schema into two overlapping relation schemas:
• One is all the attributes involved in the violating dependency and
• the other is the left side and all the other attributes not involved in the dependency.
• By repeatedly, choosing suitable decompositions, we can break any
relation schema into a collection of smaller schemas in BCNF.
• The data in the original relation is represented faithfully by the data in the
relations that are the result of the decomposition.
– i.e. we can reconstruct the original relation exactly from the decomposed
relations.
Boyce-Codd Normal Form - Example
Consider relation schema:
Movies(title, year, studioName, president, presAddr)
and functional dependencies:
title year  studioName
studioName  president
president  presAddr
Last two violate BCNF. Why?
Compute {title, year}+, {studioName}+, {president}+ and see if
you get all the attributes of the relation.
If not, you got a BCNF violation, and need to break relation.
Boyce-Codd Normal Form – Example
Let’s decompose
starting with:
studioName  president
Let’s add to the right-hand side any other attributes in the closure of
studioName (optional “rule of thumb”).
1. X={studioName}
studioNamepresident
2. X={studioName, president} presidentpresAddr
3. X={studioName}+={studioName, president, presAddr}
Boyce-Codd Normal Form – Example
From the closure we get:
studioNamepresident presAddr
We decompose the relation schema into the following two schemas:
Movies1(studioName, president, presAddr)
Movies2(title, year, studioName)
The second schema is in BCNF.
What about the first schema?
The following dependency violates BCNF.
presidentpresAddr
Why it’s bad to leave Movies1 table as is?
If many studios share the same president than we would have
redundancy when repeating the presAddr in all those studios.
Boyce-Codd Normal Form – Example
We must decompose Movies1, using the FD:
presidentpresAddr
The resulting relation schemas, both in BCNF, are:
Movies11(title, year, studioName)
Movies12(studioName, president)
In general, we must keep applying the decomposition rule as many
times as needed, until all our relations are in BCNF.
So, finally we got Movies11, Movies12, and Movies2.
Finding FDs for the decomposed
relations
•When we decompose a relation, we need to check that the
resulting schemas are in BCNF.
•We can’t tell a relation is in BCNF, unless we can determine
the FDs that hold for that relation.
Finding FDs for the decomposed
relations
•
Suppose S is one of the resulting relations in a decomposition
of R.
For this:
•
Consider each subset X of attributes of S.
•
Compute X+ using the FD on R.
•
At the end throw out the attributes of R, which aren’t in S.
•
Then, for each attribute B such that:
•
B is an attribute of S,
•
B is in X+
we have that the functional dependency XB holds in S.
Example:
Consider R(A, B, C, D, E) decomposed into S(A, B, C) and another
relation. Let FDs of R be:
AD, BE, DEC
First, consider {A}+={A,D}.
Since D is not in the schema of S, we get no dependency here.
Similarly, {B}+={B,E} and {C}+={C}, yielding no FDs for S.
Now consider pairs.
{A,B}+={A, B, C, D, E}.
Thus, we deduce ABC for S.
Neither of the other pairs give us any FD for S.
Of course the set of all three attributes of S, {A, B, C}, cannot yield any
nontrivial dependencies for S.
Thus, the only dependency we need assert for S is ABC.
A Few Tricks
• Never need to compute the closure of the
empty set or of the set of all attributes.
• If we find X + = all attributes, don’t bother
computing the closure of any supersets of X.
Another Example
• R(A,B,C)
• with FD’s A B and B C.
• Project onto S(A,C).
– A +=ABC ; yields A B, A C.
• We do not need to compute AB + or AC +.
– B +=BC ; yields B C.
– C +=C ; yields nothing.
– BC +=BC ; yields nothing.
• Resulting FD’s: A B, A C, and B C.
• Projection onto AC: A C.
– This is the only FD that involves a subset of {A,C }.