CS206 --- Electronic Commerce

Download Report

Transcript CS206 --- Electronic Commerce

IT 244
Database Management System
Topic 7
The Relational Data Model
Functional Dependencies
Ref : -
A First Course in Database System
(Jeffrey D Ullman, Jennifer Widom) + online.
1
Functional Dependencies
Science is the knowledge of consequences,
and dependence of one fact upon another.
Thomas Hobbes
(1588-1679)
2
Functional Dependencies
• Functional Dependence often abbreviated
FD is a unique-value constrain and the
most important type of constraint we use for
relational schema design.
• The requirement that a data item be
dependent on the entire primary key and
not just a part of the primary key.
3
How?
• It is also possible for database designers
to produce relational schemas directly
from application requirements although
doing so can be very difficult.
• Regardless of how relational designer are
produced, we shall see that frequently it is
possible to improve designs systematically
based on certain types of constraints.
4
It is vital?
• Knowledge of this type of constraint is vital
for the redesigning of database schemas
• What it does? - it eliminate redundancy.
• There are also some other kinds of
constraints that help us design good
databases schemas such as multivalued
dependencies and referential integrity
constraints and others.
5
The Evils of Redundancy
• Redundancy is at the root of several
problems associated with relational schemas:
– redundant storage, insert/delete/update
anomalies
• Integrity constraints, in particular functional
dependencies, can be used to identify schemas with
such problems and to suggest refinements.
• Main refinement technique: decomposition
– replacing ABCD with, say, AB and BCD, or ACD and
ABD.
• Decomposition should be used judiciously:
– Is there reason to decompose a relation?
– What problems (if any) does the decomposition cause?
6
Functional Dependency
• A functional dependency (FD) on a relation R
is a statement of the form “If two tuples of R
agree on attributes A1 , A2 , …. An ( ie the
tuples have the same values in their respective
components for each of these attributes) then
they must agree on another attribute B”.
(Book definition)
7
In other words:
• X  Y means
Given any two tuples in r, if the X values
are the same, then the Y values must
also be the same. (but not vice versa)
• Can read “” as “determines
8
In other words cont’
• An FD is a statement about all allowable relations.
– Must be identified based on semantics of application.
– Given some instance r1 of R, we can check if r1 violates
some FD f, but we cannot determine if f holds over R.
• Question: How related to keys?
• if “K  all attributes of R” then K is a superkey for R
(does not require K to be minimal.)
• FDs are a generalization of keys.
9
Expression
•
•
•
•
R represent Relation
A1, A2, A3 … An represent attributes
X, Y, Z represent sets of attributes
Letters at the beginning of the alphabet
represent single attributes, and letters at
the end of the alphabet represent a set of
attributes.
10
• Formally we write FD as A1 , A2 , … An -> B and say
that A1 , A2 , … , An functionally determine B.
• A set of attributes A1 , A2 , … , An functionally
determines more than one attributes
A1 , A2 , … , An -> B1
A1 , A2 , … , An -> B2
……..
A1 , A2 , … , An -> Bm
For short we write this set of FD’s as
A1 , A2 , … , An -> B1B2 … Bm
11
In other words
• X -> A is an assertion about a relation R that
whenever two tuples of R agree on all the
attributes of X, then they must also agree on
the attribute A.
– Say “X -> A holds in R.”
– Notice convention: …,X, Y, Z represent sets of
attributes; A, B, C,… represent single attributes.
– Convention: no set formers in sets of attributes,
just ABC, rather than {A,B,C }.
12
What this FD’s suggest
13
Example 1
•
•
Movie (title, year, length, filmType, studioName, starName)
Instance of what we reproduce is
Title
Year
Length
film Type
studio Name
star Name
Star wars
1977
124
Color
Fox
Carrie Fisher
Star wars
1977
124
Color
Fox
Mark Hamili
Star Wars
1977
124
Color
Fox
Harrison Ford
Mighty Ducks
1991
104
Color
Disney
Emilio Estevez
Wayne’s World
1992
95
Color
Paramount
Dana Carvey
Wayne’s World
1992
95
Color
Paramount
Mike Mayers
Several FD’s that we can reasonably assert about this Movie Relations
title year -> length
title year -> filmType
title year -> studioName
14
What does this FD’s Says?
• Informally it says that if two tuples have the same value in
their title components, and they also have the same value in
their year components then these two tuples must have the
same values in their length component, the same value in
their filmtype component and the same values in their
studioName component. This assertion makes sense if we
remember the original design from which this relation
schema was developed.
• Attributes title and year form a key for the Movie entity set.
Thus we expect that given a title and year, there is a unique
movie, therefore there is a unique length and a unique film
type for the movie. There is a many to one relationship from
Movies to Studios. Consequently, we expect that given a
movie there is only one owing studio.
15
What does this FD’s Says?
• On the other hand, we observe that the
statement
title year -> starsName is false, it is not
functionally dependency. Given a movie, it
is entirely possible that there is more than
one star for the movie listed in our
database.
16
Example 2
•
•
Drinkers(name, addr, beersLiked,
manf, favBeer).
Think of a:
Reasonable FD’s to assert:
17
• Drinkers(name, addr, beersLiked, manf,
favBeer).
• Reasonable FD’s to assert:
1.name -> addr
2.name -> favBeer
3.beersLiked -> manf
18
Example Data
name
Vili
Vili
Tolu
addr
Kalau
Kalau
Manima
Because name -> addr
beersLiked
VB
Ikale
VB
manf
A.B
R.B
A.B
favBeer
Ikale
Ikale
VB
Because name -> favBeer
Because beersLiked -> manf
19
Exercise
• Look at your own database, see if you can
find examples of FD’s from it.
20
Functional Dependencies Tell us
about the Schema
• Recall that FD like any constraint is an
assertion about the schema of a relation, not
about a particular instance.
• If we look at an instance we cannot tell for
certain that a FD holds
we might suppose that a FD like title -> filmtype holds
because for every tuple in this particular instance of the
relation Movie it happens that any two tuples agreeing on title
also agree on filmtype
- two version of King Kong –one is color and one is black and
white
21
FD’s With Multiple Attributes
• No need for FD’s with > 1 attribute on
right.
– But sometimes convenient to combine FD’s
as a shorthand.
– Example: name -> addr and name -> favBeer
become name -> addr favBeer
• > 1 attribute on left may be essential.
– Example: bar beer -> price
22
Another Example
• Attributes {title, year, starName} form a key
for the relation Movies. But first we must
show that they functionally determine all the
other attributes. That is suppose two tuples
agree on these three attributes: title, year,
and starName. Because they agree on title
and year, they must agree on the other
attributes – length, filmType and
studioName. Thus, two different tuples can
not agree on all of title, year and starName
they would in fact be the same tuple,
23
• Now we must agree that no proper subset of {title,
year, starName} functionally determines all other
attributes.
• Why?
- begin by observing that title and year do
not determine starName, because many movies
have more than one star. Thus {title and year} is not
a key – {year, starName} is not a key because we
could have a star in two moives in the same year:
therefor year starName -> title is not a FD. Also we
claim that {title, starName} is not a key because two
movies with the same title made in different years,
occasionally have a star in common.
24
Therefore…
Several FD’s that we can reasonably assert
about the Movie Relations
title year -> length
title year -> filmType
title year -> studioName
Since each have the same left side title and year
we summarize them like
Title year -> length filmType studioName.
25
Keys of Relations
• We say a set of one or more attributes {A1
, A2 , … An } is a key for a relation R if:
•
1. Those attributes functionally
determine all other attributes of
the relation. That is because
relations are sets, it is impossible
for two distinct tuples of R to
agree on all of A1 , A2 , … An
26
•
2.
No proper subset of {A1 , A2 , …
An } functionally determines all
other attributes of R; ie. A key
must be minimal
• *
When a key consists of a single
attribute A, we often say that A (rather
than {A}) is a key.
27
Short form…
•
K is a key for relation R if:
– Set K functionally determines all attributes
of R
– For no proper subset of K is (1) true.
•
•
If K satisfies (1), but perhaps not (2),
then K is a superkey.
Note E/R keys have no requirement for
minimality, as in (2) for relational keys.
28
Superkeys
• Superkeys
A set of attributes that contain a key is called a
superkey, short for superset of a key. Thus
every key is a superkey.
Note that every superkey satisfies the first
condition of a key; it functionally determines all
other attributes of the relation. However, a
superkey need not satisfy the second condition:
minimality.
29
Example
• Consider relation Drinkers(name, addr,
beersLiked, manf, favBeer).
• {name, beersLiked} is a superkey
because together these attributes
determine all the other attributes.
– name -> addr favBeer
– beersLiked -> manf
30
Example, Cont.
• {name, beersLiked} is a key because
neither {name} nor {beersLiked} is a
superkey.
– name doesn’t -> manf; beersLiked doesn’t ->
addr.
• In this example, there are no other keys,
but lots of superkeys.
– Any superset of {name, beersLiked}.
31
Discovering Keys for Relations
• When a relation schema was developed by
converting an E/R design to relations, we can often
predict the key of the relation.
• Rule 1
If the relation comes from an entity set then the key
for the relation is the key attributes of this entity set.
Eg.
Entity set Movies and Stars could be
converted to relations. The keys for these
entity sets were {title, year} and {name}
Movies (title, year, length, filmtype)
Stars (name, address)
32
• Second Rule
concerns binary relationship – if a relation R is
constructed form a relationship, then the multiplicity of
the relationship affects the key for R. There are three
cases
- If the relationship is many-many, then the keys of both
-
-
connected entity sets are the key attributes for R
If the relationship is a many-one from entity set E1 to entity
set E2, then the key attributes of E1 are key attributes of R,
but those of E2 are not.
If the relationship is one-one, then the key attributes for
either of the connected entity sets are key attributes of R,
Thus there is not a unique key for R.
33
E/R and Relational Keys
• Keys in E/R are properties of entities
• Keys in relations are properties of tuples.
• Usually, one tuple corresponds to one entity, so
the ideas are the same.
• But --- in poor relational designs, one entity can
become several tuples, so E/R keys and
Relational keys are different.
34
Example Data
name
Vili
Vili
Tolu
addr
Kalau
Kalau
Manima
beersLiked
VB
Ikale
VB
manf
A.B
R.B
A.B
favBeer
Ikale
Ikale
VB
Relational key = name beersLiked
But in E/R, name is a key for Drinkers, and beersLiked is a key
for Beers.
Note: 2 tuples for Vili entity and 2 tuples for VB entity.
35
A continuous Example!!!
Constraints on Entity Set
• Consider relation obtained from Hourly_Emps:
Hourly_Emps (ssn, name, lot, rating,
wage_per_hr, hrs_per_wk)
• We sometimes denote a relation schema by
listing the attributes: e.g., SNLRWH
• This is really the set of attributes {S,N,L,R,W,H}.
• Sometimes, we refer to the set of all attributes of
a relation by using the relation name. e.g.,
“Hourly_Emps” for SNLRWH
36
A continuous Example!!! count’
What are some FDs on Hourly_Emps?
– ssn is the key: S  SNLRWH
– rating determines wage_per_hr: R  W
– lot determines lot: L  L (“trivial”
dependnency)
Know Trivial Dependency? Find!!!
37
A continuous Example!!!
Problems Due to R  W
S
N
L
R W H
123-22-3666 Attishoo
48 8
10 40
231-31-5368 Smiley
22 8
10 30
131-24-3650 Smethurst 35 5
7
30
434-26-3751 Guldu
35 5
7
32
612-67-4134 Madayan
35 8
10 40
Hourly_Emps
• Update anomaly: Can we modify W in only the 1st tuple of
SNLRWH?
• Insertion anomaly: What if we want to insert an employee
and don’t know the hourly wage for his or her rating? (or we
get it wrong?)
• Deletion anomaly: If we delete all employees with rating 5,
38
we lose the information about the wage for rating 5!
A continuous Example!!!
Detecting Redundancy
S
N
L
R W H
123-22-3666 Attishoo
48 8
10 40
231-31-5368 Smiley
22 8
10 30
131-24-3650 Smethurst 35 5
7
30
434-26-3751 Guldu
35 5
7
32
612-67-4134 Madayan
35 8
10 40
Hourly_Emps
Q: Why was R  W problematic, but S W not?
39
A continuous Example!!!
Decomposing a Relation
• Redundancy can be removed by “chopping” the
relation into pieces.
• FD’s are used to drive this process.
R  W is causing the problems, so decompose SNLRWH
into what relations?
S
N
L
R H
123-22-3666 Attishoo
48 8
40
231-31-5368 Smiley
22 8
30
131-24-3650 Smethurst 35 5
30
434-26-3751 Guldu
35 5
32
612-67-4134 Madayan
35 8
40
Hourly_Emps2
R W
8
10
5
7
Wages
40
Refining an ER Diagram
• 1st diagram becomes:
Workers(S,N,L,D,Si)
Departments(D,M,B)
– Lots associated with
workers.
• Suppose all workers in
a dept are assigned the
same lot: D  L
• Redundancy; fixed by:
Workers2(S,N,D,Si)
Dept_Lots(D,L)
Departments(D,M,B)
• Can fine-tune this:
Workers2(S,N,D,Si)
Departments(D,M,B,L)
Before:
since
name
ssn
dname
lot
Employees
did
Works_In
budget
Departments
After:
budget
since
name
dname
ssn
Employees
did
Works_In
lot
Departments
41
Where Do Keys Come From?
1. We could simply assert a key K. Then
the only FD’s are K -> A for all attributes
A, and K turns out to be the only key
obtainable from the FD’s.
2. We could assert FD’s and deduce the
keys by systematic exploration.
 E/R gives us FD’s from entity-set keys and
many-one relationships.
42
FD’s From “Physics”
• While most FD’s come from E/R Keynes
and many-one relationships, some are
really physical laws.
• Example: “no two courses can meet in the
same room at the same time” tells us:
hour room -> course.
43
Inferring FD’s: Motivation
• In order to design relation schemas well,
we often need to tell what FD’s hold in a
relation.
• We are given FD’s X1 -> A1, X2 -> A2,…,
Xn -> An , and we want to know whether an
FD Y -> B must hold in any relation that
satisfies the given FD’s.
– Example: If A -> B and B -> C hold, surely A > C holds, even if we don’t say so.
44
Reasoning About FDs
• Given some FDs, we can usually infer additional
FDs:
title  studio, star implies title  studio and title  star
title  studio and title  star implies title  studio, star
title  studio, studio  star implies title  star
But,
title, star  studio does NOT necessarily imply that
title  studio or that star  studio
• An FD f is implied by a set of FDs F if f holds
whenever all FDs in F hold.
• F+ = closure of F is the set of all FDs that are
implied by F. (includes “trivial dependencies”)
45
Finding All Implied FD’s
• Motivation: “normalization,” the process
where we break a relation schema into
two or more schemas.
• Example: ABCD with FD’s AB ->C,
C ->D, and D ->A.
– Decompose into ABC, AD. What FD’s hold
in ABC ?
– Not only AB ->C, but also C ->A !
46
Basic Idea
• To know what FD’s hold in a projection,
we start with given FD’s and find all FD’s
that follow from given ones.
• Then, restrict to those FD’s that involve
only attributes of the projected schema.
47
Rules of Inference
• Armstrong’s Axioms (X, Y, Z are sets of attributes):
– Reflexivity: If X  Y, then X  Y
– Augmentation: If X  Y, then XZ  YZ for any Z
– Transitivity: If X  Y and Y  Z, then X  Z
• These are sound and complete inference rules for
FDs!
– i.e., using AA you can compute all the FDs in F+ and only
these FDs.
• Some additional rules (that follow from AA):
– Union: If X  Y and X  Z, then X  YZ
– Decomposition: If X  YZ, then X  Y and X  Z
48
Inference Test
• To test if Y -> B, start assuming two tuples agree
in all attributes of Y.
• Use the given FD’s to infer that these tuples
must also agree in certain other attributes.
• If B is eventually found to be one of these
attributes, then Y -> B is true; otherwise, the two
tuples, with any forced equalities form a twotuple relation that proves Y -> B does not follow
from the given FD’s.
49
Example
• Contracts(cid,sid,jid,did,pid,qty,value), and:
– C is the key: C  CSJDPQV
– Proj purchases each part using single contract: JP  C
– Dept purchases at most 1 part from a supplier: SD  P
• Problem: Prove that SDJ is a key for Contracts
• JP  C, C  CSJDPQV imply JP  CSJDPQV
(by transitivity) (shows that JP is a key)
• SD  P implies SDJ  JP (by augmentation)
• SDJ  JP, JP  CSJDPQV imply SDJ  CSJDPQV
(by transitivity) thus SDJ is a key.
Q: can you now infer that SD  CSDPQV (i.e., drop J on both
sides)?
No! FD inference is not like arithmetic multiplication.
50
Attribute Closure
• Computing the closure of a set of FDs can be
expensive. (Size of closure is exponential in # attrs!)
• Typically, we just want to check if a given FD X  Y
is in the closure of a set of FDs F. An efficient check:
– Compute attribute closure of X (denoted X+) wrt F.
= Set of all attributes A such that X  A is in F+
• X+ := X
• Repeat until no change: if there is an fd U  V in F such that U
is in X+, then add V to X+
– Check if Y is in X+
– Approach can also be used to find the keys of a relation.
• If all attributes of R are in the closure of X then X is a
superkey for R.
• Q: How to check if X is a “candidate key”?
51
X+
Closure Test
• An easier way to test is to compute the
closure of Y, denoted Y +.
• Basis: Y + = Y.
• Induction: Look for an FD’s left side X that
is a subset of the current Y +. If the FD is
X -> A, add A to Y +.
52
Attribute Closure (example)
• R = {A, B, C, D, E}
• F = { B CD, D  E, B  A, E  C, AD B }
• Is B  E in F+ ?
•
B+ = B
– B+ = BCD
– B+ = BCDA
– B+ = BCDAE … Yes!
B is a key for R too!
• Is D a key for R?
– D+ = D
– D+ = DE
– D+ = DEC
– … Nope!
• Is AD a key for R?
AD+ =
AD
AD+ = ABD and B is a key, so
Yes!
• Is AD a candidate key for
and
R?
A+ = A, D+ = DEC
… A,D not keys, so Yes!
• Is ADE a candidate key for
R?
… No! AD is a key, so ADE is a
superkey, but not a cand. key
53
X
Y+
A
new Y+
54
Simple, Exponential Algorithm
1. For each set of attributes X, compute
X +.
2. Add X ->A for all A in X + - X.
3. However, drop XY ->A whenever we
discover X ->A.
 Because XY ->A follows from X ->A.
4. Finally, use only FD’s involving
projected attributes.
55
A Few Tricks
• Never need to compute the closure of the
empty set or of the set of all attributes.
• If we find X + = all attributes, don’t bother
computing the closure of any supersets of
X.
56
Example
• ABC with FD’s A ->B and B ->C.
Project onto AC.
– A +=ABC ; yields A ->B, A ->C.
• We do not need to compute AB + or AC +.
– B +=BC ; yields B ->C.
– C +=C ; yields nothing.
– BC +=BC ; yields nothing.
57
Example, Continued
• Resulting FD’s: A ->B, A ->C, and B ->C.
• Projection onto AC : A ->C.
– Only FD that involves a subset of {A,C }.
58
A Geometric View of FD’s
• Imagine the set of all instances of a
particular relation.
• That is, all finite sets of tuples that have
the proper number of components.
• Each instance is a point in this space.
59
Example: R(A,B)
{(1,2), (3,4)}
{}
{(5,1)}
{(1,2), (3,4), (1,3)}
60
An FD is a Subset of Instances
•
•
•
For each FD X -> A there is a subset of
all instances that satisfy the FD.
We can represent an FD by a region in
the space.
Trivial FD : an FD that is represented by
the entire space.
– Example: A -> A.
61
Example: A -> B for R(A,B)
{(1,2), (3,4)}
A -> B
{}
{(5,1)}
{(1,2), (3,4), (1,3)}
62
Representing Sets of FD’s
• If each FD is a set of relation instances,
then a collection of FD’s corresponds to
the intersection of those sets.
– Intersection = all instances that satisfy all of
the FD’s.
63
Example
Instances satisfying
A->B, B->C, and
CD->A
A->B
B->C
CD->A
64
Implication of FD’s
• If an FD Y -> B follows from FD’s
X1
-> A1,…, Xn -> An , then the region in the
space of instances for Y -> B must
include the intersection of the regions
for the FD’s Xi -> Ai .
– That is, every instance satisfying all the
FD’s Xi -> Ai surely satisfies Y -> B.
– But an instance could satisfy Y -> B, yet
not be in this intersection.
65
Example
A->B A->C B->C
66
Summary
• Functional Dependencies –definition,
-expressions, -applications
• FD’s with multiple attributes
• Keys of Relations
• Superkeys
• Discovering Keys of a Relations
• Inferring FD’s
• Geometric View of FD’s
67