relation - Csmaster

Download Report

Transcript relation - Csmaster

David M. Kroenke’s
Database Processing:
Fundamentals, Design, and Implementation
Chapter Three:
The Relational Model
and Normalization
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition
© 2006 Pearson Prentice Hall
3-1
Chapter Premise
• We have received one or more tables of
existing data
• The data is to be stored in a NEW
database
• Should the table of data
– be stored as is, or
– restructured?
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition
© 2006 Pearson Prentice Hall
3-2
Example 1
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition
© 2006 Pearson Prentice Hall
3-3
Example 2
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition
© 2006 Pearson Prentice Hall
3-4
Data Redundancy
• Data redundancy results in data inconsistency
– Different and conflicting versions of the same data
appear in different places
– Errors more likely to occur when the same data
must be entered in several different places
• Data anomalies develop when required
changes in redundant data are not made
successfully
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition
© 2006 Pearson Prentice Hall
3-5
Modification Anomalies
• Update anomalies
– Occur when changes must be made to
existing records
• Insertion anomalies
– Occur when entering new records
• Deletion anomalies
– Occur when deleting records
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition
© 2006 Pearson Prentice Hall
3-6
Modification Anomalies
• The EQUIPMENT_REPAIR table before and after an
incorrect update operation on AcquisitionCost for Type =
Drill Press:
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition
© 2006 Pearson Prentice Hall
3-7
The Relational Model
• Introduced in 1970
• Created by E.F. Codd
– IBM engineer
– The model used the mathematical system
known as “relational algebra”
• Today it is the standard for commercial
DBMS products
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition
© 2006 Pearson Prentice Hall
3-8
Important Relational Model Terms
•
•
•
•
•
•
•
•
•
•
•
•
Entity
Relation (table)
Functional Dependency
Determinant
Candidate Key
Composite Key
Primary Key
Surrogate Key
Foreign Key
Referential integrity constraint
Normal Form
Multivalued Dependency
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition
© 2006 Pearson Prentice Hall
3-9
Entity
• An entity is some identifiable thing that users
want to track:
–
–
–
–
–
–
–
–
Customer
Computer
Sale
Student
Invoice
Department
Course
Policy
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition
© 2006 Pearson Prentice Hall
3-10
Relation
• Data about entities is stored in relations
• A relation is a two-dimensional table with these
characteristics:
– Rows contain data about an entity
– No two rows may contain identical data
– Columns contain data about attributes of the
entity
– All entries in a column are the same data type
– Each column has a unique name
– Cells of the table hold a single value
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition
© 2006 Pearson Prentice Hall
3-11
Employee Relation
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition
© 2006 Pearson Prentice Hall
3-12
Normalization
• This is the process of organizing data into
relations (tables) that are structurally
sound from the perspective of the
relational model
– SQL works well on data organized this way
– Data quality is higher; i.e., the data is more
reliable
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition
© 2006 Pearson Prentice Hall
3-13
Objectives of Normalization
• Develop a good description of the data, its
relationships and constraints
• Produce a stable set of relations that
– Is a faithful model of the enterprise
– Is highly flexible
– Reduces redundancy
• saves space
• reduces data inconsistency
– Is free of all anomalies
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition
© 2006 Pearson Prentice Hall
3-14
Anomalies are very bad!
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition
© 2006 Pearson Prentice Hall
3-15
Anomalies
• An anomaly is an inconsistent, incomplete,
or contradictory state of the database
– Insertion anomaly – user cannot insert a new
record when it should be possible to do so
– Deletion anomaly – when a record is deleted,
other information that is tied to it is also
deleted (not by design)
– Update anomaly – a record is updated, but
other appearances of the same data are not
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition
© 2006 Pearson Prentice Hall
3-16
Data redundancy leads to
anomalies
Find examples of insertion, deletion & update anomalies
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition
© 2006 Pearson Prentice Hall
3-17
Normalization
• Normalization plays an important role in
database design.
• Through it, we decompose relations (tables)
in stages from lower to higher normal forms
– 1NF, 2NF, 3NF, BCNF
– Other normal forms are 4NF, 5NF, DKNF
• We use normalization and E-R modeling
together for good database design
• It all starts with identifying functional
dependencies (FD’s)
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition
© 2006 Pearson Prentice Hall
3-18
Functional Dependency
• If A and B are (sets of) attributes of relation R, B is
functionally dependent on A if…
– a particular value of A determines a unique
value of B.
• Emp_Name is functionally dependent on Emp_Num
• with a particular value for Emp_Num, I can find the
name of the employee (Emp_Name) with that
Emp_Num
• A→B says A determines B
– or B is dependent on A
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition
© 2006 Pearson Prentice Hall
3-19
Example of FDs
• R = NewStudent(stuId, lastName, major, credits,
status, ssn)
• Some FDs in R:
stuId → lastName
stuId → (lastName, major, credits, status,
ssn, stuId)
ssn → (stuId, lastName, major,
credits, status, ssn)
credits → status
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition
© 2006 Pearson Prentice Hall
3-20
Functional Dependency
• A functional dependency occurs when the value of one
(a set of) attribute(s) determines the value of a second
(set of) attribute(s):
StudentID  StudentName
StudentID  (DormName, DormRoom, Fee)
• The attribute on the left side of the functional
dependency is called the determinant
• Functional dependencies may be based on equations:
ExtendedPrice = Quantity X UnitPrice
(Quantity, UnitPrice)  ExtendedPrice
• But, functional dependencies are not equations!
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition
© 2006 Pearson Prentice Hall
3-21
Composite Determinants
• A determinant of a functional dependency
may itself consist of more than one
attribute:
(StudentName, ClassName)  (Grade)
Note that StudentName  Grade
ClassName  Grade
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition
© 2006 Pearson Prentice Hall
3-22
Functional Dependencies in the
SKU_DATA Table
Can you find three FDs?
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition
© 2006 Pearson Prentice Hall
3-23
Functional Dependencies in the
ORDER_ITEM Table
Can you find two?
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition
© 2006 Pearson Prentice Hall
3-24
Keys are determinants
• A key is a combination of one or more
attributes that is used to identify rows in a
relation
• A composite key is a key that consists of
two or more attributes
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition
© 2006 Pearson Prentice Hall
3-25
Keys & FD’s
• Superkey – functionally determines all
attributes in a relation
• Candidate key – a superkey that is a
minimal identifier
• Primary key - chosen candidate key
– Must always be filled (non-null) Entity integrity
– Must be unique
– May be composite
– Ideally, it is short, numeric and never changes
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition
© 2006 Pearson Prentice Hall
3-26
Surrogate Keys
• A surrogate key is an artificial column
added to a relation to serve as a primary
key:
– DBMS supplied
– Short, numeric, never changes, never reused
– Has artificial values that may be meaningless
to users
– See next slide for example
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition
© 2006 Pearson Prentice Hall
3-27
Utility of Surrogate Keys
• RENTAL_PROPERTY with no surrogate key:
RENTAL_PROPERTY (Street, City,
State/Province, Zip/PostalCode, Country, Rental_Rate)
• RENTAL_PROPERTY with a surrogate key:
RENTAL_PROPERTY (PropertyID, Street, City,
State/Province, Zip/PostalCode, Country, Rental_Rate)
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition
© 2006 Pearson Prentice Hall
3-28
Foreign Keys
• A foreign key is the primary key of one
relation that is placed in another relation
– It forms a link between the two relations
– A foreign key can be atomic (single column) or
composite
DEPARTMENT (DepartmentName, BudgetCode, ManagerName)
EMPLOYEE
(EmployeeNumber, EmployeeName,DepartmentName)
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition
© 2006 Pearson Prentice Hall
3-29
Referential Integrity
• Relations must exhibit integrity in their links to
other relations
– A foreign key field must contain a value that
• equals a primary key value in the corresponding relation, or
• is NULL
SKU_DATA (SKU, SKU_Description, Department, Buyer)
ORDER_ITEM (OrderNumber, SKU, Quantity, Price,
ExtendedPrice)
Where ORDER_ITEM.SKU must exist in
SKU_DATA.SKU
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition
© 2006 Pearson Prentice Hall
3-30
When might a foreign key value be
NULL?
• A CUSTOMER may have no AGENT
• An EMPLOYEE may be assigned to no
DEPARTMENT (yet)
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition
© 2006 Pearson Prentice Hall
3-31
Normal Forms
• 1NF – A table that qualifies as a relation is in 1NF
• 2NF – A relation is in 2NF if all of its nonkey attributes
are dependent on every attribute in the primary key
• 3NF – A relation is in 3NF if it is in 2NF and has no
determinants except the primary key
• Boyce-Codd Normal Form (BCNF) – A relation is in
BCNF if every determinant is a candidate key
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition
© 2006 Pearson Prentice Hall
3-32
First Normal Form (1NF)
• A relation is in 1NF if every attribute is
single-valued for each tuple
– each cell of the table contains only one value
• Domains of attributes are atomic
– No sets
– No lists
– No repeating fields or groups allowed
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition
© 2006 Pearson Prentice Hall
3-33
This relation is not in 1NF
stuid lastName
S1001 Smith
S1003 Jones
S1006 Lee
S1010 Burns
S1060 Jones
major
History
Math
CSC
Math
Art
English
CSC
credits
status
ssn
90
95
15
Sr
Sr
Fr
100429500
010124567
088520876
63
Jr
099320985
25
Fr
064624738
(Assume students can have double majors)
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition
© 2006 Pearson Prentice Hall
3-34
Decomposing into 1NF
• Create a new table for each multi-valued
attribute
– Put the PK of the original table and the multi-valued
attribute in this table
– The PK of this new table is composite
– Will have additional rows for each value of the
attribute
• Remove the multi-valued attribute from the
original table
NewStu2(stuId, lastName, credits,status, ssn)
Majors(stuId, major)
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition
© 2006 Pearson Prentice Hall
3-35
Two new tables
stuId
lastName credits
status
S1001
S1003
S1006
S1010
S1060
Smith
Jones
Lee
Burns
Jones
Sr
Sr
Fr
Jr
Fr
90
95
15
63
25
ssn
100429500
010124567
088520876
099320985
064624738
NewStu2
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition
© 2006 Pearson Prentice Hall
stuId
S1001
major
History
S1003
Math
S1006
CSC
S1006
Math
S1010
Art
S1010
English
S1060
CSC
Majors
3-36
Another method for 1NF
• “Flatten” the original table by making
the multi-valued attribute part of a
new composite key
• Student(stuId, lastName, major,
credits, status, ssn)
– See next slide…
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition
© 2006 Pearson Prentice Hall
3-37
Flattened and in 1NF
stuid lastName
S1001
S1003
S1006
S1006
S1010
S1010
S1060
Smith
Jones
Lee
Lee
Burns
Burns
Jones
major
History
Math
CSC
Math
Art
English
CSC
credits
status
90
95
15
15
63
63
25
Sr
Sr
Fr
Fr
Jr
Jr
Fr
ssn
100429500
010124567
088520876
088520876
099320985
099320985
064624738
NewStu Table with PK (stuID, major)
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition
© 2006 Pearson Prentice Hall
3-38
Another (generally not-so-good)
method for 1NF
• If the number of repeats is
specified or limited, can make
additional columns for multiple
values
• Student(stuId, lastName, major1,
major2, credits, status, ssn)
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition
© 2006 Pearson Prentice Hall
3-39
This relation is also not in 1NF
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition
© 2006 Pearson Prentice Hall
3-40
The “flattening” approach works
best here
But we are still prone to all forms of anomalies, and so we
must go on to transform this into 2NF and above.
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition
© 2006 Pearson Prentice Hall
3-41
Second Normal Form (2NF)
• A relation is in second normal form (2NF)
if it is in first normal form and all the nonkey attributes are fully functionally
dependent on the key.
– No non-key attribute is FD on just part of
the key
– If R’s key has only one attribute (ie, is not
composite), and R is 1NF, R is automatically
in 2NF
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition
© 2006 Pearson Prentice Hall
3-42
Full Functional Dependency
• Transforming to 2NF requires that all
dependencies within a relation are full functional
dependencies
– ie., no partial dependencies on the key
• In relation R, a set of attributes B is fully
functionally dependent on set of attributes A if
B is functionally dependent on A…
– but not functionally dependent on any proper
subset of A
• This means every attribute in A is needed to
functionally determine B
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition
© 2006 Pearson Prentice Hall
3-43
NewClass is not in 2NF
NewClass(courseNo, stuId, lastName, facId, schedule, room, grade)
FDs:
(courseNo,stuId) → (lastName)
(courseNo,stuId) →(facId)
(courseNo,stuId) →(schedule)
(courseNo,stuId) →(room)
(courseNo,stuId) →(grade)
courseNo → facId
courseNo → schedule
courseNo → room
stuId → lastName
Looks like we’ve found
a primary key
But…these are all
partially dependent on
the key
…plus trivial FDs that are partial…
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition
© 2006 Pearson Prentice Hall
3-44
Decomposing into 2NF
• Identify each partial FD.
• Create a new relation for each part of the PK that
determines other attributes.
– Remove attributes that depend on each of these determinants
from the original relation & put them in the new relation
– I.e., place all determinants in separate relations along with their
dependent attributes
• In the original relation keep the composite key and any
attributes that are fully functionally dependent on all of it.
• Even if the composite key has no dependent attributes,
keep that relation to connect logically to the other
relations.
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition
© 2006 Pearson Prentice Hall
3-45
Putting NewClass into 2NF
• NewClass(courseNo, stuId, lastName, facId,
schedule, room, grade )
• FDs grouped by determinant:
courseNo → (courseNo, facId, schedule, room)
stuId → (stuId, lastName)
(courseNo,stuId) → (courseNo, stuId, facId, schedule,
room, lastName, grade)
• Create tables grouped by determinants:
Course(courseNo, facId, schedule, room)
Stu(stuId, lastName)
• Keep relation with original composite key, with
attributes FD on it:
NewStu2( courseNo, stuId, grade)
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition
© 2006 Pearson Prentice Hall
3-46
2NF - Putting it all together
• We started with:
– NewClass(courseNo, stuId, lastName,
facId, schedule, room, grade)
• It was already in 1NF
– We decomposed it into 2NF:
• Course(courseNo, facId, schedule, room)
• Stu(stuId, lastName)
• NewStu2( courseNo, stuId, grade)
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition
© 2006 Pearson Prentice Hall
3-47
Third Normal Form
• A relation is in 3NF if whenever a nontrivial functional dependency X→A exists,
– either X is a superkey or
– A is a member of some candidate key
• To be in 3NF, a relation must be in 2NF
and have no transitive dependencies
– I.e., no non-key attribute may determine
another non-key attribute.
– Here key includes “candidate key”
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition
© 2006 Pearson Prentice Hall
3-48
Transitive Dependency
• If A, B, and C are attributes of relation R, such
that A → B, and B → C, then C is transitively
dependent on A.
• NewStudent (stuId, lastName, major, credits, status)
FD:
credits→status …but credits is not a key
By transitivity:
stuId→credits AND credits→status implies stuId→status
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition
© 2006 Pearson Prentice Hall
3-49
Decomposing into 3NF
• NewStudent (stuId, lastName, major, credits, status)
• FD
credits→status
• Remove the dependent attribute, status, from the
relation
• Create a new table with the dependent attribute and its
determinant, credits
• Keep the determinant in the original table
NewStu2 (stuId, lastName, major, credits)
Status (credits, status)
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition
© 2006 Pearson Prentice Hall
In 3NF
3-50
1NF/2NF/3NF Process
• Before moving onto BCNF…
– Put relation in 1NF
• Remove all multi-valued
attributes/repeating groups/etc.
– List all FD’s / find a key
– Put relation in 2NF
• Remove all partial dependencies on key
– Put relation in 3NF
• Remove all transitive dependencies
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition
© 2006 Pearson Prentice Hall
3-51
Comprehensive example
• Work(projName, projMgr, empID, hours,
empName, budget, startDate, salary, empMgr,
empDept, rating)
– If not in 1NF, fix it
– List all FD’s
– Find a key
– Remove all partial dependencies (2NF)
– Remove all transitive dependencies (3NF)
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition
© 2006 Pearson Prentice Hall
3-52