46-942 - Andrew.cmu.edu
Download
Report
Transcript 46-942 - Andrew.cmu.edu
Information Resources
Management
March 13, 2001
Agenda
Administrivia
Normalization
Homework #7
Mid-Term #2
Administrivia
Homework #4
Homework #5
Homework #6
Quiz 2
Mid-Term #1 Keys
Mid-Term Grades
Regrade Requests HW 5 & 6
Create Database
Enter query(s) as submitted
Submit to me
Database (electronic)
Graded homework (paper)
Reserve the right to change test data
and reexecute query
Normalization
Why & What
1st Normal Form
2nd Normal Form
3rd Normal Form
Boyce-Codd Normal Form
4th Normal Form
Normalization - Why
Eliminate anomalies
Avoid duplication
Increase flexibility and stability
Reduce maintenance
Normalization - What?!?
Analysis of functional dependencies
between attributes
Building several smaller tables from
larger ones
Decomposing relations with anomalies
to produce smaller, well-structured
relations
Reducing complexity & increasing
stability
Normalization - What (2)
Series of Steps
Recipe for constructing a “good”
physical model of a database from a
logical model
Applied
to all existing tables, including
ones produced by earlier
normalization steps
Example
Sales
(Order#, Date, CustID, Name, Address,
City, State, Zip, {Product#,
ProductDesc, Price, QuantityOrdered},
Subtotal, Tax, S&H, Total)
What are the problems with using a
single table for all order information?
Problems
Implementing Repeating Groups
Duplication of Data (customer name &
address)
Unnecessary Data (subtotal, total, tax)
Others
Normalization is a process to eliminate
these problems.
1st Normal Form
Eliminate Repeating Groups
1st Normal Form has no repeating
groups
Create definition with all other attributes,
remove the repeat {}, and change the
primary key to include the “key” for the
repeating group.
Example
Sales
(Order#, Date, CustID, Name, Address,
City, State, Zip, Product#, ProductDesc,
Price, QuantityOrdered, Subtotal, Tax,
S&H, Total)
Why is this better?
1st NF Improvements
Implementation is possible
Querying is possible
2nd Normal Form
Remove all partial functional
dependencies
2nd Normal Form has no partial
functional dependencies and is in 1st
Normal Form
Partial dependencies get their own
tables -- original table gets a foreign key
Partial Functional
Dependencies
An attribute is only dependent on part of
the primary key
must be composite key
single attribute key is 2nd NF
Functional dependencies can be specified
explicitly but usually come from the E-R
model, user specifications, and common
sense
key non-key attributes
Example - Functional
Dependencies
Order# Date, CustID, Name, Address, City, State,
Zip, Subtotal, Tax, S&H, Total
Order#, Product# ProductDesc, Price,
QuantityOrdered
CustID Name, Address, City, State, Zip
Product# ProductDesc, Price
Which are partial functional
dependencies?
Example
Sales (Order#, Date, CustID, Name,
Address, City, State, Zip, Subtotal, Tax,
S&H, Total)
OrderLine (Order#, Product#,
ProductDesc, Price, QuantityOrdered)
Is this 2nd NF?
Example
Sales (Order#, Date, CustID, Name, Address,
City, State, Zip, Subtotal, Tax, S&H, Total)
OrderLine (Order#, Product#, QuantityOrdered)
Product (Product#, ProductDesc, Price)
Is this 2nd NF? Why is this better than 1st
NF?
2nd NF Improvements
Elimination of Duplicate Data
No Loss
3rd Normal Form
Eliminate transitive functional
dependencies
3rd Normal Form has no transitive
depencencies and is in 2nd Normal
Form
Transitive dependencies get their own
tables -- original table gets a foreign key
Transitive Functional
Dependencies
Attribute is dependent on another, nonkey attribute or attributes
Attribute is the result of a calculation
CustID Name, Address, City, State, Zip
Example
Sales (Order#, Date, CustID, Subtotal, Tax, S&H, Total)
OrderLine (Order#, Product#, QuantityOrdered)
Product (Product#, ProductDesc, Price)
Customer (CustID, Name, Address, City, State, Zip)
Is this 3rd NF? Why is this better than 2nd
NF?
Example
Sales (Order#, Date, CustID)
OrderLine (Order#, Product#, QuantityOrdered)
Product (Product#, ProductDesc, Price)
Customer (CustID, Name, Address, City, State, Zip)
Is this 3rd NF? Why is this better than 2nd
NF?
3rd NF Improvements
Elimination of Duplicate Data
No Loss
Data is Well-grouped
Beyond 3rd Normal Form
Assume we also want to track
information about products, builders,
and finishes
The following are the functional
dependencies:
Product, Finish Builder
Builder Finish
Beyond 3rd Normal Form
ProdFinish (Product#, {Finish, Builder})
becomes
ProdFinish (Product#, Finish, Builder)
Is this 3rd NF?
What’s wrong with 3rd NF?
Product
Finish
Builder
1
Cherry
Stan
1
Oak
Sue
2
Oak
Vera
3
Pine
Marv
3
Cherry
Stan
5
Oak
Vera
Product, Finish Builder
Builder Finish
What’s Wrong with 3rd NF?
Product
1
1
2
3
3
5
Finish
Cherry
Oak
Oak
Pine
Cherry
Oak
What happens when:
1. Vera is replaced by Vern?
Builder
Stan
Sue
Vera
Marv
Stan
Vera
2. Vera is rehired to work with Oak?
3. Product #3 in pine is discontinued?
What’s Wrong with 3rd NF?
Problems
1. Multiple changes need to be made
2. Can’t assign a builder without a
product
3. Lose information that Marv works in
Pine
Problem & Solution
Problem:
Builder Finish
Builder is not a key
Solution:
Boyce-Codd Normal Form
Boyce-Codd Normal Form
(BCNF)
Every determinant in a relation (LHS of
the FD’s) is a candidate key and 3rd NF
Make determinant part of the key and
that which is dependent on it an
attribute and renormalize
Example
ProductFinish (Product#, Builder, Finish)
Is this BCNF?
Hint: Is it 3rd NF?
Example
ProductFinish (Product#, Builder)
Builder (Builder, Finish)
Is there anything wrong with this?
Example
ProductBuilder (Product#, Builder)
Builder (Builder, Finish)
Normalization often results in the need
to rename tables so the table name
matches the actual contents.
Beyond BCNF
Normalization with separate repeating
groups can result in other anomalies
CustService (State, {SalesPerson},
{Delivery})
Beyond BCNF
CustService (State, SalesPerson, Delivery)
State
Sales
Delivery
State
Sales
Delivery
PA
George
UPS
NJ
Mike
UPS
PA
George
RPS
NJ
Mike
Truck
PA
Sue
UPS
NJ
Valerie
UPS
PA
Sue
RPS
NJ
Valerie
Truck
Is this BCNF?
Beyond BCNF
Everything is in the key -- must be
BCNF
Still problems with duplication
Multivalued Dependencies
Multivalued Dependency
At least three attributes (A, B, C)
A B and A C
B and C are independent of each other
(they really shouldn’t be in the same
table)
4th Normal Form
No multivalued dependencies and
BCNF
Create separate tables for each
separate functional dependency
Example
SalesForce (State, SalesPerson)
Delivery (State, Delivery)
State
Sales
State
Delivery
PA
George
PA
UPS
PA
Sue
PA
RPS
NJ
Mike
NJ
UPS
NJ
Valerie
NJ
Truck
Beyond 4th Normal Form
5th Normal Form
Project-Join Normal Form
Domain Key Normal Form (DKNF)
User View
Remove partial
functional dependencies
1st NF
Remove
repeating
groups
2nd NF
Remove remaining
functional dependency
anomalies
3rd NF
Remove transitive
functional dependencies
BCNF
Remove multivalued
dependencies
4th NF
In-Class Exercises
Identify the current normal form
If not 4th NF, transform to 4th NF
Homework #7
Normalization
Database schema from HW #3
Earlier due date - post key?
Mid-Term #2
Next week, 3/20
Topics
Converting an E-R Diagram to a
physical database schema
Normalizing that schema (3NF)
SQL
Identification of BCNF, 4NF problems