- Courses - University of California, Berkeley

Download Report

Transcript - Courses - University of California, Berkeley

Database Design: Logical
Models: Normalization and The
Relational Model
University of California, Berkeley
School of Information
IS 257: Database Management
IS 257 – Fall 2014
2014.09.18 - SLIDE 1
Announcements
• Assignment 1 due Today
• Assignment 2 (Personal database
conceptual model) – due Oct. 2
IS 257 – Fall 2014
2014.09.18 - SLIDE 2
Lecture Outline
• Review
– Conceptual Model and UML
– Logical Model for the Diveshop database
• Normalization
• Relational Advantages and
Disadvantages
IS 257 – Fall 2014
2014.09.18 - SLIDE 3
Lecture Outline
• Review
– Conceptual Model and UML
– Logical Model for the Diveshop database
• Normalization
• Relational Advantages and
Disadvantages
IS 257 – Fall 2014
2014.09.18 - SLIDE 4
Class Diagrams
• A class diagram is a diagram that shows a
set of classes, interfaces, and/or
collaborations and the relationships
among these elements.
IS 257 – Fall 2014
2014.09.18 - SLIDE 5
UML Class Diagram
DIVEORDS
Order No
Customer No
Sale Date
Shipvia
PaymentMethod
CCNumber
No of People
Depart Date
Return Date
Destination
Vacation Cost
CalcTotalInvoice()
CalcEquipment()
IS 257 – Fall 2014
Class Name
List of Attributes
List of operations
2014.09.18 - SLIDE 6
Object Diagrams
307:DIVORDS
Order No = 307
Customer No = 1480
Sale Date = 9/1/99
Ship Via = UPS
PaymentMethod = Visa
CCNumber = 12345 678 90
CCExpDate = 1/1/01
No of People = 2
Depart Date = 11/8/00
Return Date = 11/15/00
Destination = Fiji
Vacation Cost = 10000
IS 257 – Fall 2014
2014.09.18 - SLIDE 7
Associations
• An association is a relationship that
describes a set of links between or among
objects.
• An association can have a name that
describes the nature of this relationship.
You can put a triangle next to this name to
indicate the direction in which the name
should be read.
IS 257 – Fall 2014
2014.09.18 - SLIDE 8
Associations: Unary relationships
*
0..1
Person
0..1
Is-married-to
manages
Employee
0..1
manager
IS 257 – Fall 2014
2014.09.18 - SLIDE 9
Associations: Binary Relationship
Employee
0..1
Is-assigned
Parking
Place
0..1
One-to-one
Product
Line
1
contains
*
Product
One-to-many
Student
*
Registers-for
*
Course
Many-to-many
IS 257 – Fall 2014
2014.09.18 - SLIDE 10
Associations: Ternary Relationships
Part
*
Vendor
IS 257 – Fall 2014
*
Supplies
* Warehouse
2014.09.18 - SLIDE 11
Association Classes
Registers-for
Student
*
Course
*
Computer Account
Registration
_________________
________________
acctID
Term
issues
Password
*
0..1
Grade
ServerSpace
________________
CheckEligibility()
IS 257 – Fall 2014
2014.09.18 - SLIDE 12
Derived Attributes, Associations, and
Roles
Course
Student
Course
Offering
_________
____________
____________ Scheduled-for
name
Registers-for
crseCode
term
ssn
*
crseTitle
*
*
1
section
dateOfBirth
creditHrs
time
Derived
/age
location
attribute
*
*
/participant Derived role
{age = currentDate – dateOfBirth}
/Takes
Derived association
IS 257 – Fall 2014
2014.09.18 - SLIDE 13
Generalization
employee
type
Employee
____________
empName
empNumber
address
dateHired
____________
printLabel()
employee
type
Hourly Employee
_______________
HourlyRate
_______________
computeWages()
IS 257 – Fall 2014
Salaried Employee
_______________
AnnualSalary
stockoption
_______________
Contributepension()
employee
type
Consultant
_______________
contractNumber
billingRate
_______________
computeFees()
2014.09.18 - SLIDE 14
Lecture Outline
• Review
– Conceptual Model and UML
– Logical Model for the Diveshop database
• Normalization
• Relational Advantages and
Disadvantages
IS 257 – Fall 2014
2014.09.18 - SLIDE 15
Database Design Process
Application 1
External
Model
Application 2
Application 3
Application 4
External
Model
External
Model
External
Model
Application 1
Conceptual
requirements
Application 2
Conceptual
requirements
Application 3
Conceptual
requirements
Conceptual
Model
Logical
Model
Internal
Model
Application 4
Conceptual
requirements
IS 257 – Fall 2014
2014.09.18 - SLIDE 16
Logical Model: Mapping to a Relational
Model
• Each entity in the ER Diagram becomes a
relation.
• A properly normalized ER diagram will indicate
where intersection relations for many-to-many
mappings are needed.
• Relationships are indicated by common columns
(or domains) in tables that are related.
• We will examine the tables for the Diveshop
derived from the ER diagram
IS 257 – Fall 2014
2014.09.18 - SLIDE 17
DiveShop ER Diagram
Customer
No
DiveCust
1
Destination
Name
Destination
no
Customer
No
ShipVia
n
Dest
n
1
DiveOrds
n
1
ShipVia
ShipVia
1
Destination
no
Site No
1
n
Site No
BioSite
Species
No
1
Destination
n
Sites
Order
No
n
1
1/n
ShipWrck
Order
No
DiveItem
n
Item
No
n
Site No
1
Species
No
BioLife
IS 257 – Fall 2014
1
DiveStok
Item
No
2014.09.18 - SLIDE 18
Customer = DIVECUST
Customer No
Name
Street
City
State/Prov Zip/Postal Code
Country
1480 Louis Jazdzewski
2501 O'Connor
New Orleans
LA
60332
U.S.A.
1481 Barbara Wright
6344 W. Freeway
San Francisco
CA
95031
U.S.A.
1909 Stephen Bredenburg
559 N.E. 167
Indianapolis
Place IN
46241
U.S.A.
1913 Phillip Davoust
123 First Street
Berkeley CA
94704
U.S.A.
1969 David Burgett
320 Montgomery
SeattleStreet
WA
98105
U.S.A.
2001 Mary Rioux1701 Gateway
Pueblo
Blvd. #385
CO
81002
U.S.A.
2306 Kim Lopez 14134 Nottingham
HonoluluLane
HI
96826
U.S.A.
2589 Hiram Marley
7233 Mill Run
SanDrive
Francisco
CA
94123
U.S.A.
3154 Tanya Kulesa
505 S. Flower,
NewMail
YorkStop
NY 48943 10032
U.S.A.
3333 Charles Sekaron
110 East Park
Miller
Avenue,SD
Box 8
57362
U.S.A.
3684 Lowell Lutz915 E. Fesler
Dallas
TX
75043
U.S.A.
4158 Keith Lucas56 South Euclid
Chicago IL
60542
U.S.A.
4175 Karen Ng 2134 ElmhillKlamath
Pike Falls
OR
97603
U.S.A.
5510 Ken Soule 58 Sansome
Aurora
Street CO
89022
U.S.A.
IS 257 – Fall 2014
Phone
First Contact
(902) 555-88881/29/95
(415) 555-43212/2/93
(317) 555-36441/5/93
(415) 555-91843/9/98
(206) 555-75803/12/99
(719) 555-20103/15/97
(808) 555-50501/29/99
(415) 555-64302/18/99
(212) 555-67501/30/99
(613) 555-43333/16/98
(214) 555-27222/15/99
(312) 555-43103/17/98
(503) 555-47003/20/99
(303) 555-66952/5/99
2014.09.18 - SLIDE 19
Dive Order = DIVEORDS
Order No
Customer NoSale Date
Ship Via
PaymentMethod
CcNumber CcExpDate No Of PeopleDepart Date Return DateDestination VacationCost
307
310
1480
1481
9/1/99 UPS
9/1/99 FedEx
Visa
Check
313
314
317
320
321
325
1909
1913
1969
2001
2306
2589
9/1/99
9/1/99
9/1/99
9/1/99
9/1/99
9/1/99
Visa
456456456
Check
AmEx
432432432
Cash
Master Card1112223334
AmEx
332332332
326
327
3333
3684
9/1/99 FedEx
9/1/99 DHL
Money Order
Master Card122122321
329
330
4158
4175
9/1/99 Walk In
9/1/99 FedEx
331
333
5510
5926
336
5719
IS 257 – Fall 2014
1/1/01
2
1
11/8/00
4/4/00
9/11/00
8/12/00
12/10/99
4
3
4
1
1
1
6/27/00
2/7/00
5/9/00
10/10/00
3/15/00
3/15/00
11/9/99
2
4
2/10/00
3/10/00
2/17/00 Monterey
3/23/00 Florida
4000
24000
Cash
Check
1
2
5/4/00
7/3/00
5/15/00 Cozumel
7/10/00 Florida
1571
6000
9/1/99 FedEx
9/1/99 DHL
Money Order
Discover
123123123
6
2
6/20/00
6/10/00
9/1/99 FedEx
Cash
10
4/2/00
Walk In
FedEx
FedEx
Walk In
Emery
Emery
12345 678 90
12/31/02
12/21/00
11/15/00 Fiji
4/18/00 Santa Barbara
10000
6000
7/11/00
2/14/00
5/16/00
10/17/00
4/12/00
4/12/00
8000
6000
20000
3000
8000
8000
Cozumel
Monterey
Fiji
Santa Barbara
New Jersey
New Jersey
6/30/00 Santa Barbara
6/17/00 Fiji
36000
10000
4/24/00 Great Barrier Reef
200000
2014.09.18 - SLIDE 20
Line item = DIVEITEM
Order No Item No
307
90010
307
90020
307
90021
307
90030
307
90051
310
90011
310
90045
310
90059
310
90074
310
90078
313
90127
314
90072
314
90094
314
90100
317
90012
IS 257 – Fall 2014
Rental/SaleQty
Rental
Rental
Rental
Rental
Rental
Rental
Rental
Rental
Rental
Rental
Sale
Rental
Rental
Rental
Sale
Line Note
4
1
1
2
2
1
1
1
1
1
1
3
3
3
2
This is our most popular mask.
These are our best selling fins.
A good weight belt for beginners
Holds 10 cubic feet of cargo.
2014.09.18 - SLIDE 21
Shipping information = SHIPVIA
Ship Via
DHL
Emery
FedEx
UPS
US Mail
IS 257 – Fall 2014
Ship Cost
8
11
12
10
6
2014.09.18 - SLIDE 22
Dive Equipment Stock= DIVESTOK
Item No
90010
90011
90012
90020
90021
90022
90023
90024
90025
90030
90031
90032
90033
90040
90041
90042
IS 257 – Fall 2014
DescriptionEquipment On
Class
Hand Reorder Point
Cost
Sale Price Rental Price
Shotgun 2 Snorkel - Clear
12
2 $18.00
$30.00
$2.00
Shotgun 2 Snorkel - Red
12
2 $18.00
$30.00
$2.00
Shotgun 2 Snorkel - Teal
11
2 $18.00
$30.00
$2.00
Tri-Vent Mask
Mask
- Clear
14
2 $62.50 $100.00
$5.00
Tri-Vent Mask
Mask
- Red
10
2 $62.50 $100.00
$5.00
Tri-Vent Mask
Mask
- Teal
14
2 $62.50 $100.00
$7.00
Quad Vision
Mask
Mask - Clear
11
2 $48.25
$80.00
$7.00
Quad Vision
Mask
Mask - Red
13
2 $48.25
$80.00
$7.00
Quad Vision
Mask
Mask - Teal
10
2 $48.25
$80.00
$10.00
Sea Wing Fins
Fins - Clear
12
2 $60.00 $100.00
$12.00
Sea Wing Fins
Fins - Red
11
2 $60.00 $100.00
$12.00
Sea Wing Fins
Fins - Teal
12
2 $60.00 $100.00
$12.00
Jet Fin - Black
Fins
14
2 $30.00
$60.00
$10.00
D350 Second
Regulator
Stage
11
1 $162.50 $270.00
$20.00
G250 Second
Regulator
Stage
13
1 $144.50 $240.00
$20.00
G200 Second
Regulator
Stage
12
1 $105.25 $175.00
$20.00
2014.09.18 - SLIDE 23
Dive Locations = DEST
IS 257 – Fall 2014
2014.09.18 - SLIDE 24
Dive Sites = SITE
IS 257 – Fall 2014
2014.09.18 - SLIDE 25
Sea Life = BIOLIFE
IS 257 – Fall 2014
2014.09.18 - SLIDE 26
BIOSITE -- linking relation
IS 257 – Fall 2014
2014.09.18 - SLIDE 27
Shipwrecks = SHIPWRK
IS 257 – Fall 2014
2014.09.18 - SLIDE 28
Mapping to Other Models
• Hierarchical
– Need to make decisions about access paths
• Network
– Need to pre-specify all of the links and sets
• Object-Oriented
– What are the objects, datatypes, their
methods and the access points for them
• Object-Relational
– Same as relational, but what new datatypes
might be needed or useful (more on OR later)
IS 257 – Fall 2014
2014.09.18 - SLIDE 29
Lecture Outline
• Review
– Logical Model for the Diveshop database
• Normalization
• Relational Advantages and
Disadvantages
IS 257 – Fall 2014
2014.09.18 - SLIDE 30
Normalization
• Normalization theory is based on the
observation that relations with certain
properties are more effective in inserting,
updating and deleting data than other sets
of relations containing the same data
• Normalization is a multi-step process
beginning with an “unnormalized” relation
– Hospital example from Atre, S. Data Base:
Structured Techniques for Design,
Performance, and Management.
IS 257 – Fall 2014
2014.09.18 - SLIDE 31
Normal Forms
•
•
•
•
•
•
First Normal Form (1NF)
Second Normal Form (2NF)
Third Normal Form (3NF)
Boyce-Codd Normal Form (BCNF)
Fourth Normal Form (4NF)
Fifth Normal Form (5NF)
IS 257 – Fall 2014
2014.09.18 - SLIDE 32
Normalization
No transitive
dependency
between
nonkey
attributes
All
determinants
are candidate
keys - Single
multivalued
dependency
IS 257 – Fall 2014
BoyceCodd and
Higher
Functional
dependency
of nonkey
attributes on
the primary
key - Atomic
values only
Full
Functional
dependency
of nonkey
attributes on
the primary
key
2014.09.18 - SLIDE 33
Unnormalized Relations
• First step in normalization is to convert the
data into a two-dimensional table
• In unnormalized relations data may repeat
within a column
IS 257 – Fall 2014
2014.09.18 - SLIDE 34
Unnormalized Relation
Patient #
Surgeon #
145
1111 311
Surg. date
Jan 1,
1995; June
12, 1995
Patient Name
John White
Patient Addr
15 New St.
New York,
NY
Surgeon
Surgery
Beth Little
Michael
Diamond
Gallstone
s removal;
Kidney
stones
removal
Postop drug
Drug side effects
Penicillin,
none-
rash
none
Tetracyclin
e none
Fever
none
Cephalosp
orin
none
Demicillin
none
none
none
Eye
Apr 5,
243
1234 467
1994 May
10, 1995
Mary Jones
10 Main St.
Rye, NY
Charles
Field
Cataract
removal
Patricia
Gold
Thrombos
is removal
Dogwood
Lane
2345 189
Jan 8,
1996
Charles Brown
Harrison,
NY
Open
David
Rosen
Heart
Surgery
55 Boston
Post Road,
4876 145
Nov 5,
1995
Hal Kane
5123 145
May 10,
1995
Paul Kosher
Apr 5,
1994 Dec
6845 243
IS 257 – Fall 2014
15, 1984
Ann Hood
Chester,
CN
Blind Brook
Mamaronec
k, NY
Hilton Road
Larchmont,
NY
Beth Little
Beth Little
Cholecyst
ectomy
Gallstone
s
Removal
Eye
Cornea
Replacem
Charles
ent Eye
cataract
Tetracyclin
Field
removal
e
Fever
2014.09.18 - SLIDE 35
First Normal Form
• To move to First Normal Form a relation
must contain only atomic values at each
row and column.
– No repeating groups
– A column or set of columns is called a
Candidate Key when its values can uniquely
identify the row in the relation.
IS 257 – Fall 2014
2014.09.18 - SLIDE 36
First Normal Form
Pat ient #
Surgeon #
Surgery Date Pat ient Name
Patient Addr
Surgeon Name
15 New St.
New York,
1111
145
01-Jan-95
John White
1111
311
12-Jun-95
John White
1234
243
05-Apr-94
Mary Jones
1234
467
10-May-95
Mary Jones
2345
4876
5123
189
145
145
08-Jan-96
05-Nov-95
10-May-95
NY
15 New St.
New York,
NY
10 Main St.
Rye, NY
10 Main St.
Rye, NY
Dogwood
Lane
6845
IS 257 – Fall 2014
243
243
05-Apr-94
15-Dec-84
Drug admin
Side Effects
G allstone
Beth Litt le
Michael
Diamond
s removal
Kidney
stones
removal
none
none
Charles Field
Eye
Cataract
removal
Tet racyclin
e
Fever
Patricia Gold
Thrombos
is removal
none
none
Penicillin
rash
O pen
Charles
Brown
Harrison,
NY
David Rosen
Heart
Surgery
Cephalosp
orin
none
Hal Kane
55 Boston
Post Road,
Chester,
CN
Beth Litt le
Cholecyst
ectomy
Demicillin
none
Paul Kosher
Blind Brook
Mamaronec
k, NY
Beth Litt le
G allstone
s
Removal
none
none
Eye
Cornea
Hilton Road
6845
Surgery
Ann Hood
Larchmont,
NY
Ann Hood
Hilton Road
Larchmont,
NY
Charles Field
Replacem
ent
Tet racyclin
e
Fever
Charles Field
Eye
cataract
removal
none
none
2014.09.18 - SLIDE 37
1NF Storage Anomalies
• Insertion: A new patient has not yet undergone
surgery -- hence no surgeon # -- Since surgeon
# is part of the key we can’t insert.
• Insertion: If a surgeon is newly hired and hasn’t
operated yet -- there will be no way to include
that person in the database.
• Update: If a patient comes in for a new
procedure, and has moved, we need to change
multiple address entries.
• Deletion (type 1): Deleting a patient record may
also delete all info about a surgeon.
• Deletion (type 2): When there are functional
dependencies (like side effects and drug)
changing one item eliminates other information.
IS 257 – Fall 2014
2014.09.18 - SLIDE 38
Second Normal Form
• A relation is said to be in Second Normal
Form when every nonkey attribute is fully
functionally dependent on the primary
key.
– That is, every nonkey attribute needs the full
primary key for unique identification
• This is typically accomplished by
projecting (think splitting) the relations into
simpler relations with simpler keys
IS 257 – Fall 2014
2014.09.18 - SLIDE 39
Second Normal Form
Patient #
1111
1234
2345
4876
5123
6845
IS 257 – Fall 2014
Patient Name Patient Address
15 New St. New
John White York, NY
10 Main St. Rye,
Mary Jones NY
Charles
Dogwood Lane
Brown
Harrison, NY
55 Boston Post
Hal Kane
Road, Chester,
Blind Brook
Paul Kosher Mamaroneck, NY
Hilton Road
Ann Hood
Larchmont, NY
2014.09.18 - SLIDE 40
Second Normal Form
Surgeon #
Surgeon Name
145 Beth Little
189 David Rosen
243 Charles Field
311 Michael Diamond
467 Patricia Gold
IS 257 – Fall 2014
2014.09.18 - SLIDE 41
Second Normal Form
Patient # Surgeon # Surgery Date
1111
1111
1234
1234
2345
4876
Drug Admin Side Effects
145
Gallstones
01-Jan-95 removal
Kidney
Penicillin
rash
311
stones
12-Jun-95 removal
none
none
243
Eye Cataract
05-Apr-94 removal
Tetracycline Fever
467
Thrombosis
10-May-95 removal
189
Open Heart
08-Jan-96 Surgery
Cephalospori
n
none
145
Cholecystect
05-Nov-95 omy
Demicillin
none
none
none
none
none
5123
145
6845
243
6845
243
IS 257 – Fall 2014
Surgery
Gallstones
10-May-95 Removal
Eye cataract
15-Dec-84 removal
Eye Cornea
05-Apr-94 Replacement
none
none
Tetracycline Fever
2014.09.18 - SLIDE 42
1NF Storage Anomalies Removed
• Insertion: Can now enter new patients without
surgery.
• Insertion: Can now enter Surgeons who haven’t
operated.
• Deletion (type 1): If Charles Brown dies the
corresponding tuples from Patient and Surgery
tables can be deleted without losing information
on David Rosen.
• Update: If John White comes in for third time,
and has moved, we only need to change the
Patient table
IS 257 – Fall 2014
2014.09.18 - SLIDE 43
2NF Storage Anomalies
• Insertion: Cannot enter the fact that a particular
drug has a particular side effect unless it is given
to a patient.
• Deletion: If John White receives some other drug
because of the penicillin rash, and a new drug
and side effect are entered, we lose the
information that penicillin can cause a rash
• Update: If drug side effects change (a new
formula) we have to update multiple occurrences
of side effects.
IS 257 – Fall 2014
2014.09.18 - SLIDE 44
Third Normal Form
• A relation is said to be in Third Normal Form if
there is no transitive functional dependency
between nonkey attributes
– When one nonkey attribute can be determined with
one or more nonkey attributes there is said to be a
transitive functional dependency.
• The side effect column in the Surgery table is
determined by the drug administered
– Side effect is transitively functionally dependent on
drug so Surgery is not 3NF
IS 257 – Fall 2014
2014.09.18 - SLIDE 45
Third Normal Form
Patient # Surgeon # Surgery Date
IS 257 – Fall 2014
Surgery
Drug Admin
1111
145
1111
311
01-Jan-95 Gallstones removal
Kidney stones
12-Jun-95 removal
1234
243
05-Apr-94 Eye Cataract removal Tetracycline
1234
467
10-May-95 Thrombosis removal
2345
189
08-Jan-96 Open Heart Surgery
Cephalosporin
4876
145
05-Nov-95 Cholecystectomy
Demicillin
5123
145
10-May-95 Gallstones Removal
none
6845
243
none
6845
243
15-Dec-84 Eye cataract removal
Eye Cornea
05-Apr-94 Replacement
Penicillin
none
none
Tetracycline
2014.09.18 - SLIDE 46
Third Normal Form
Drug Admin
IS 257 – Fall 2014
Side Effects
Cephalosporin
none
Demicillin
none
none
none
Penicillin
rash
Tetracycline
Fever
2014.09.18 - SLIDE 47
2NF Storage Anomalies Removed
• Insertion: We can now enter the fact that a
particular drug has a particular side effect
in the Drug relation.
• Deletion: If John White recieves some
other drug as a result of the rash from
penicillin, but the information on penicillin
and rash is maintained.
• Update: The side effects for each drug
appear only once.
IS 257 – Fall 2014
2014.09.18 - SLIDE 48
Boyce-Codd Normal Form
• Most 3NF relations are also BCNF
relations.
• A 3NF relation is NOT in BCNF if:
– Candidate keys in the relation are composite
keys (they are not single attributes)
– There is more than one candidate key in the
relation, and
– The keys are not disjoint, that is, some
attributes in the keys are common
IS 257 – Fall 2014
2014.09.18 - SLIDE 49
Most 3NF Relations are also BCNF – Is
this one?
Patient # Patient Name Patient Address
15 New St. New
1111 John White York, NY
10 Main St. Rye,
1234 Mary Jones NY
Charles
Dogwood Lane
2345 Brown
Harrison, NY
55 Boston Post
4876 Hal Kane
Road, Chester,
Blind Brook
5123 Paul Kosher Mamaroneck, NY
Hilton Road
6845 Ann Hood
Larchmont, NY
IS 257 – Fall 2014
2014.09.18 - SLIDE 50
BCNF Relations
Patient #
IS 257 – Fall 2014
Patient Name
Patient #
1111 John White
1111
1234 Mary Jones
Charles
2345 Brown
1234
4876 Hal Kane
4876
5123 Paul Kosher
5123
6845 Ann Hood
6845
2345
Patient Address
15 New St. New
York, NY
10 Main St. Rye,
NY
Dogwood Lane
Harrison, NY
55 Boston Post
Road, Chester,
Blind Brook
Mamaroneck, NY
Hilton Road
Larchmont, NY
2014.09.18 - SLIDE 51
Fourth Normal Form
• Any relation is in Fourth Normal Form if it
is BCNF and any multivalued
dependencies are trivial
• Eliminate non-trivial multivalued
dependencies by projecting into simpler
tables
IS 257 – Fall 2014
2014.09.18 - SLIDE 52
Fifth Normal Form
• A relation is in 5NF if every join
dependency in the relation is implied by
the keys of the relation
• Implies that relations that have been
decomposed in previous NF can be
recombined via natural joins to recreate
the original 1NF relation
IS 257 – Fall 2014
2014.09.18 - SLIDE 53
Effectiveness and Efficiency Issues for
DBMS
• Our primary focus will be on the relational
model
• Any column in a relational database can
be searched for values
• To improve efficiency indexes using
storage structures such as BTrees and
Hashing are used
• But many useful functions are not
indexable and require complete scans of
the the database
IS 257 – Fall 2014
2014.09.18 - SLIDE 54
Example: Text Fields
• In conventional RDBMS, when a text field
is indexed, only exact matching of the text
field contents (or Greater-than and Lessthan).
– Can search for individual words using pattern
matching, but a full scan is required.
• Text searching is still done best (and
fastest) by specialized text search
programs (Search Engines) that we will
look at more later
IS 257 – Fall 2014
2014.09.18 - SLIDE 55
Normalization
• Normalization is performed to reduce or
eliminate Insertion, Deletion or Update
anomalies.
• However, a completely normalized
database may not be the most efficient or
effective implementation.
• “Denormalization” is sometimes used to
improve efficiency.
IS 257 – Fall 2014
2014.09.18 - SLIDE 56
Normalizing to death
• Normalization splits database information
across multiple tables.
• To retrieve complete information from a
normalized database, the JOIN operation
must be used.
• JOIN tends to be expensive in terms of
processing time, and very large joins are
very expensive.
IS 257 – Fall 2014
2014.09.18 - SLIDE 57
Denormalization
• Usually driven by the need to improve
query speed
• Query speed is improved at the expense
of more complex or problematic DML
(Data manipulation language) for updates,
deletions and insertions.
IS 257 – Fall 2014
2014.09.18 - SLIDE 58
Downward Denormalization
Customer
ID
Address
Name
Telephone
Before:
Order
Order No
Date Taken
Date Dispatched
Date Invoiced
Cust ID
IS 257 – Fall 2014
After:
Customer
ID
Address
Name
Telephone
Order
Order No
Date Taken
Date Dispatched
Date Invoiced
Cust ID
Cust Name
2014.09.18 - SLIDE 59
Upward Denormalization
Order
Order No
Date Taken
Date Dispatched
Date Invoiced
Cust ID
Cust Name
Order Item
Order No
Item No
Item Price
Num Ordered
IS 257 – Fall 2014
Order
Order No
Date Taken
Date Dispatched
Date Invoiced
Cust ID
Cust Name
Order Price
Order Item
Order No
Item No
Item Price
Num Ordered
2014.09.18 - SLIDE 60
Using RDBMS to help normalize
• Example database: Cookie
• Database of books, libraries, publisher and
holding information for a shared (union)
catalog
IS 257 – Fall 2014
2014.09.18 - SLIDE 61
Cookie relationships
IS 257 – Fall 2014
2014.09.18 - SLIDE 62
Cookie BIBFILE relation
ACCNO
A003
T082
C024
B006
B007
B005
B008
B010
B009
B012
B011
B014
B013
B016
B017
F047
B116
S102
B118
B018
C031
C032
C034
AUTHOR
TITLE
LOC
PUBID DATE
AMERICAN LIBRARY ASSOCIATION
ALA BULLETIN
CHICAGO
04
ANDERSON, THEODORE THE TEACHING OF MODERN
PARIS LANGUAGES53
1955
AXT, RICHARD G.
COLLEGE SELF STUDYBOULDER,
: LECTURES
CO.ON INSTITU
51
1960
BALDERSTON, FREDERICKMANAGING
E.
TODAYS UNIVERSITY
SAN FRANCISCO 27
1975
BARZUN, JACQUES
TEACHER IN AMERICA GARDEN CITY
18
1954
BARZUN, JACQUES
THE AMERICAN UNIVERSITY
NEW YORK
: HOW IT RUNS,
24 W 1970
BARZUN, JACQUES
THE HOUSE OF INTELLECT
NEW YORK
24
1961
BELL, DANIEL
THE COMING OF POST-INDUSTRIAL
NEW YORK SOCIETY
09 :
1976
BENSON, CHARLES S. IMPLEMENTING THE LEARNING
SAN FRANCISCO
SOCIETY 27
1974
BERG, IVAR
EDUCATION AND JOBSBOSTON
: THE GREAT TRAINING
10
1971
BERSI, ROBERT M.
RESTRUCTURING THE BACCALAUREATE
WASHINGTON, D.C.03
1973
BEVERIDGE, WILLIAM I. THE ART OF SCIENTIFIC
NEW
INVESTIGATION
YORK
58
1957
BIRD, CAROLINE
THE CASE AGAINST COLLEGE
NEW YORK
08
1975
BISSELL, CLAUDE T.
THE STRENGTH OF THE
TORONTO
UNIVERSITY
57
1968
BLAIR, GLENN MYERS
EDUCATIONAL PSYCHOLOGY
NEW YORK
30
1962
BLAKE, ELIAS, JR.
THE FUTURE OF THE BLACK
CAMBRIDGE,
COLLEGES
MA. 02
1971
BOLAND, R.J.
CRITICAL ISSUES IN INFORMATION
CHICHESTER,
SYSTEMS
ENG.63 R 1987
BROWN, SANBORN C., ED.
SCIENTIFIC MANPOWER
CAMBRIDGE, MASS.
29
1971
BUCKLAND, MICHAEL K. LIBRARY SERVICES IN ELMSFORD,
THEORY ANDNY
CONTEXT
70
1983
BUDIG, GENE A.
ACADEMIC QUICKSANDLINCOLN,
: SOME NEBRASKA
TRENDS AND
37 ISS 1973
CALIFORNIA. DEPT. OF JUSTICE
LAW IN THE SCHOOL MONTCLAIR, N.J. 35
1974
CAMPBELL, MARGARET A.WHY WOULD A GIRL GO
OLD
INTO
WESTBURY,
MEDICINE?
N.Y.
48
1973
CARNEGIE COMMISSION ON
A DIGEST
HIGHER
OF REPORTS
NEW
OF YORK
THE CARNEGIE
30 COMM
1974
IS 257 – Fall 2014
PRICE
$3.00
$10.95
$7.00
$6.00
$7.00
$5.00
$8.00
$10.00
$9.00
$12.00
$11.00
$14.00
$13.00
$14.00
$11.00
$14.25
$30.95
$4.00
$12.00
$13.00
$0.50
$1.50
$3.50
PAGINATION
ILL
63 V.
ILL.
294 P.
X, 300 P. GRAPHS
XVI, 307 P.
280 P.
XII, 319 P.
VIII, 271 P.
XXVII, 507 P.
XVII, 147 P.
XX, 200 P.
IV, 160P.
XIV, 239 P.
XII, 308 P.
VII, 251 P.
678 P.
VIII, PP. 539
XV, 394 P. ILL.
X, 180 P.
XII, 201 P. ILL.
74 P.
IV, 87 P.
V, 114 P.
399 P.
HEIGHT
26
22
28
24
18
20
21
21
24
21
23
18
18
21
24
23
24
26
23
23
21
24
24
2014.09.18 - SLIDE 63
How to Normalize?
• Currently no way to have multiple authors
for a given book, and there is duplicate
data spread over the BIBFILE table
• Can we use the DBMS to help us
normalize?
• It is possible (but takes a bit more SQL
knowledge than has been hinted at so far)
– We will return to this problem later
– But CONCEPTUALLY…
IS 257 – Fall 2014
2014.09.18 - SLIDE 64
Using RDBMS to Normalize
Create a new table for Authors that includes author name
and an automatically incrementing id number (for
primary key)
Populate the table using the unique author names (which
get assigned id numbers) by extracting them from the
BIBFILE…
CREATE TABLE AUTHORS (AU_ID INT AUTO_INCREMENT PRIMARY KEY)
;
Create a new table containing a author_id and an ACCNO
Populate the new table by matching the Authors and
BIBFILE names…
AS SELECT DISTINCT (Author) from BIBFILE
CREATE TABLE AU_BIB (AU_ID INT, ACCNO INT) AS SELECT AUTHORS.AU_ID,
BIBFILE.ACCNO FROM AUTHORS, BIBFILE WHERE AUTHORS.Author = BIBFILE.Author;
Drop the Author name column from BIBFILE
ALTER TABLE BIBFILE DROP COLUMN Author
IS 257 – Fall 2014
2014.09.18 - SLIDE 65
Lecture Outline
• Review
– Logical Model for the Diveshop database
• Normalization
• Relational Advantages and
Disadvantages
IS 257 – Fall 2014
2014.09.18 - SLIDE 66
Advantages of RDBMS
• Relational Database Management
Systems (RDBMS)
• Possible to design complex data storage
and retrieval systems with ease (and
without conventional programming).
• Support for ACID transactions
– Atomic
– Consistent
– Independent
– Durable
IS 257 – Fall 2014
2014.09.18 - SLIDE 67
Advantages of RDBMS
• Support for very large databases
• Automatic optimization of searching (when
possible)
• RDBMS have a simple view of the
database that conforms to much of the
data used in business
• Standard query language (SQL)
IS 257 – Fall 2014
2014.09.18 - SLIDE 68
Disadvantages of RDBMS
• Until recently, no real support for complex
objects such as documents, video, images,
spatial or time-series data. (ORDBMS add -- or
make available support for these)
• Often poor support for storage of complex
objects from OOP languages (Disassembling the
car to park it in the garage)
• Usually no efficient and effective integrated
support for things like text searching within fields
(MySQL now does have simple keyword
searching with index support, but no ranking)
IS 257 – Fall 2014
2014.09.18 - SLIDE 69
Next Week
• (Re)Introduction to SQL
• More on Logical Design/Normalization
• Physical Design
IS 257 – Fall 2014
2014.09.18 - SLIDE 70