Class Slides

Download Report

Transcript Class Slides

Relational Data Model





Defined by Edgar Codd in 1970
Considered ingenious but impractical
Conceptually simple
Relational DB is percieved as a collection of tables
Provides SQL, a 4GL
Relational Data Model

Terminology
 Relation
 Tuple
 Attribute

Attribute values
 Domains
 Simple (value is atomic) and complex (value is a relation per se)

Some properties of relations
 Order of attributes and tuples has no significance
 There are no duplicate tuples
Relation Keys


The key is the set of attributes that uniquely
indentifies tuples in a relation
Properties of a (candidate) key:
– Uniqueness
– Nonredundancy



Relation may have more than one candidate key
Primary key is the chosen candidate key
Key attribute, non-key attribute
Primary and Foreign Key
EMPLOYEE
FName
MInit LName
SSN
BDate Address
Sex Salary SuperSSN*
DNo*
DEPARTMENT
DName
DNumber MgrSSN*
MgrStartDate
DEPT_LOCATIONS
DNumber*
DLocation
Legend:
PROJECT
PName
PNumber PLocation DNum*
WORKS_ON
ESSN*
PNo*
Hours
DEPENDENT
ESSN*
DependentName
Sex BDate
Relationship
XXXXXX
primary key
*
foreign key
Database Design Goals

Controlled redundancy
– Search for the ideal, nonredundant model
– Performance considerations

Minimal number of tables
Data Consistency
StudentSSNo
123456789
123456789
234567890
234567890
StudentName
Frank Ullman
Frank Ullman
Ann Freeman
Ann Freeman


CourseNo
MIS333
MIS334
MIS333
MIS334
CourseName
CreditHours
Data Structures
3
Database Systems
3
Data Structures
3
Database Systems
3
A data structure can be “good” or “bad”
Goals:
– Preservation of consistency following updates
– Storing data in a nonredundant way
Update Anomalies
ASSIGN
Person
Id
Project Project TimeSpentBy
Budget Id
PersonOnProject
S75
S75
S79
S79
S80
32
40
32
27
40
17
P1
P2
P1
P3
P2
P4
7
3
4
1
5
Anomalies:
• budgets 32 and 40 appear twice
• udesired null values
Add a new person to P1:
Add Tupple (ASSIGN, S85, 35, P1, 9)
Update Anomalies (cont.)
ASSIGN
Person
Id
Project Project TimeSpentBy
Budget Id
PersonOnProject
S75
S75
S79
S79
S80
32
40
32
27
40
17
35
S85
P1
P2
P1
P3
P2
P4
P1
7
3
4
1
5
9
Anomaly:
• conflicting budgets for P1
Remove a person from P3:
Delete Tupple (ASSIGN, S79, 27, P3, 1)
Update Anomalies (cont.)
ASSIGN
Person
Id
Project Project TimeSpentBy
Budget Id
PersonOnProject
S75
S75
S79
S80
32
40
32
40
17
35
S85
P1
P2
P1
P2
P4
P1
7
3
4
5
9
Anomaly:
• deletes also project budget
Change P1’s budget:
Update Tupple (ASSIGN, S75, 32, P1, 7; S75, 35, P1, 7)
Update Anomalies (cont.)
ASSIGN
Person
Id
Project Project TimeSpentBy
Budget Id
PersonOnProject
S75
S75
S79
S79
S80
35
40
32
27
40
17
35
S85
P1
P2
P1
P3
P2
P4
P1
Anomaly:
• different values for P1’s budget
7
3
4
1
5
9
Updating Anomalies Avoided
ASSIGNMENT
Person
Id
Project TimeSpentBy
Id
PersonOnProject
S75
S75
S79
S79
S80
P1
P2
P1
P3
P2
• No null values
• No anomalies after adding person’s assignment:
Add Tuple (ASSIGNMENT, S85, P1, 3)
• No anomalies after deleting person’s assignment:
Delete Tuple (ASSIGNMENT, S79, P3, 1)
• No anomalies after updating budget:
Update Tuple (PROJECT, P1, 32; P1, 35)
7
3
4
1
5
PROJECT
Project Project
Id
Budget
P1
P2
P3
P4
32
40
27
17
Functional Dependency




Definition:
Attribute B is functionally
dependent on attribute A if each
value in column A determines one
and only one value in column B.
Functional dependency is a property
of the meaning of attributes
A relation is made up of attributes
and functional dependencies among
them
Notation:
A
B
A
12
23
34
45
56
67
78
89
90
B
Peter
Judith
Mary
Peter
Jack
Ann
Mary
John
Judith
Let’s Exercise
Multivalued Dependency

Definition
There is a multivalued
dependency of field B
on field A if a value
for A is associated
with a collection of
values for B,
independent of values
for C.
Notation:
A
B
A
B
C
123 Italian
3/12/57
French
German
234 Hebrew 7/14/42
345 Spanish 12/23/68
Italian
French
Relational Database Design




Where do we get the relation, or relations, to start
the proces?
How do we know which relations need to be
restructured?
How do we go about such restructuring?
How do we know when we are done?
Formal criteria for data structuring: normal forms
First Normal Form (1NF)
A relation is in the 1NF if the domains of all
attributes are simple.
EMPL_PROJ = (SSN, EName, {PNumber, Hours})
Second Normal Form (2NF)
A relation is in 2NF if every non-key attribute is
functionally dependent on the key as a whole.
EMPL_PROJ = (SSN, PNumber, Hours, EName, PName)
Third Normal Form (3NF)
A relation is in 3NF if it is in 2NF and no non-key
attribute is functionally dependent on another nonkey attribute.
EMPL_DEPT = (SSN, EName, BDate, DeptNo, DName)
Boyce-Codd Normal Form (BCNF)
A relation is in BCNF if every determinant is a
candidate key.
STUDENT = ([StudentId, PhoneNo, Course, Grade];
[StudentId
PhoneNo; StudentId, Course
Grade])
Fourth Normal Form (4NF)

Multivalued Dependency (X
Y)
A relation is in 4NF if every multivalued
determinant is a key.
EMPLOYEE = ([SSN, FName, Init, LName, Skill]; [SSN
SSN
Skill])
FName, Init, Lname;
Note: the key in the above relation is SSN, Skill and not just SSN.
Normalization Procedure I




Determine functional and multivalued
dependencies
Determine candidate keys, then key and non-key
attributes
If needed, decompose the relation to conform with
the 1NF
Decompose the relation as needed to conform with
the 2NF, 3NF, BCNF, and 4NF
Determining Candidate Keys


Perhaps the most difficult problem is to determine
the candidate keys
Procedure:
– Start with a key made of all the determinants
– Eliminate attributes that are determined by other attributes

Alternate elimination sequences lead to alternate
candidate keys
Normalization Procedure II




Determine functional and multivalued
dependencies
Determine candidate keys, then key and
non-key attributes
If needed, decompose the relation to
conform with the 1NF and 4NF
Decompose the relation to conform with the
BCNF
Decomposition Procedure
1
2
3
Take a relation, R = (A, B, C, D, E, ....), that is not
in the BCNF. Find an FD, C D, that is causing R
to not be in BCNF
Form two new relations: R1 = (A, B, C, E, ....) and
R2 = (C, D), where the dependency part of the FD
has been used to form R2
R1 and R2 must now be checked to see if they are
in BCNF
Note: Every effort should be made to avoid taking out an FD, when the
dependency part of that FD is itself, either all or part of it, a determinant for
another FD. In other words, start at the end of depencencies chain.
Let’s exercise some more!