- Courses - University of California, Berkeley

Download Report

Transcript - Courses - University of California, Berkeley

Physical Database Design
University of California, Berkeley
School of Information
IS 257: Database Management
IS 257 – Fall 2008
2008-09-23 SLIDE 1
Lecture Outline
• Review
– Normalization
• Physical Database Design
• Access Methods
IS 257 – Fall 2008
2008-09-23 SLIDE 2
Lecture Outline
• Review
– Normalization
• Physical Database Design
• Access Methods
IS 257 – Fall 2008
2008-09-23 SLIDE 3
Database Design Process
Application 1
External
Model
Application 2
Application 3
Application 4
External
Model
External
Model
External
Model
Application 1
Conceptual
requirements
Application 2
Conceptual
requirements
Application 3
Conceptual
requirements
Conceptual
Model
Logical
Model
Internal
Model
Application 4
Conceptual
requirements
IS 257 – Fall 2008
2008-09-23 SLIDE 4
Normalization
• Normalization theory is based on the
observation that relations with certain
properties are more effective in inserting,
updating and deleting data than other sets
of relations containing the same data
• Normalization is a multi-step process
beginning with an “unnormalized” relation
– Hospital example from Atre, S. Data Base:
Structured Techniques for Design,
Performance, and Management.
IS 257 – Fall 2008
2008-09-23 SLIDE 5
Normal Forms
•
•
•
•
•
•
First Normal Form (1NF)
Second Normal Form (2NF)
Third Normal Form (3NF)
Boyce-Codd Normal Form (BCNF)
Fourth Normal Form (4NF)
Fifth Normal Form (5NF)
IS 257 – Fall 2008
2008-09-23 SLIDE 6
Normalization
No transitive
dependency
between
nonkey
attributes
All
determinants
are candidate
keys - Single
multivalued
dependency
IS 257 – Fall 2008
BoyceCodd and
Higher
Functional
dependency
of nonkey
attributes on
the primary
key - Atomic
values only
Full
Functional
dependency
of nonkey
attributes on
the primary
key
2008-09-23 SLIDE 7
Unnormalized Relations
• First step in normalization is to convert the
data into a two-dimensional table
• In unnormalized relations data can repeat
within a column
IS 257 – Fall 2008
2008-09-23 SLIDE 8
Unnormalized Relation
Patient #
Surgeon #
145
1111 311
Surg. date
Patient Name
Jan 1,
1995; June
12, 1995
John White
Patient Addr Surgeon
15 New St.
New York,
NY
243
1234 467
2345 189
Jan 8,
1996
Charles Brown
4876 145
Nov 5,
1995
Hal Kane
5123 145
May 10,
1995
Paul Kosher
Charles
Field
10 Main St. Patricia
Rye, NY
Gold
Dogwood
Lane
Harrison,
David
NY
Rosen
55 Boston
Post Road,
Chester,
CN
Beth Little
Blind Brook
Mamaronec
k, NY
Beth Little
6845 243
Apr 5,
1994 Dec
15, 1984
Ann Hood
Hilton Road
Larchmont, Charles
NY
Field
IS 257 – Fall 2008
Postop drug
Drug side effects
Gallstone
s removal;
Beth Little Kidney
Michael
stones
Penicillin,
Diamond removal
none-
Apr 5,
1994 May
10, 1995
Mary Jones
Surgery
rash
none
Eye
Cataract
removal
Thrombos Tetracyclin Fever
is removal e none
none
Open
Heart
Surgery
Cholecyst
ectomy
Gallstone
s
Removal
Eye
Cornea
Replacem
ent Eye
cataract
removal
Cephalosp
orin
none
Demicillin
none
none
none
Tetracyclin
e
Fever
2008-09-23 SLIDE 9
First Normal Form
• To move to First Normal Form a relation
must contain only atomic values at each
row and column.
– No repeating groups
– A column or set of columns is called a
Candidate Key when its values can uniquely
identify the row in the relation.
IS 257 – Fall 2008
2008-09-23 SLIDE 10
First Normal Form
Patient #
Surgeon # Surgery DatePatient Name Patient Addr Surgeon Name
1111
145
01-Jan-95 John White
1111
311
12-Jun-95 John White
15 New St.
New York,
NY
15 New St.
New York,
NY
1234
243
05-Apr-94 Mary Jones
10 Main St.
Rye, NY
1234
467
10-May-95 Mary Jones
2345
4876
5123
6845
6845
IS 257 – Fall 2008
189
145
145
243
243
Charles
08-Jan-96 Brown
10 Main St.
Rye, NY
Dogwood
Lane
Harrison,
NY
05-Nov-95 Hal Kane
55 Boston
Post Road,
Chester,
CN
05-Apr-94 Ann Hood
15-Dec-84 Ann Hood
Hilton Road
Larchmont,
NY
Drug adminSide Effects
Charles Field
Gallstone
s removal
Kidney
stones
removal
Eye
Cataract
removal
Patricia Gold
Thrombos
is removal none
none
David Rosen
Open
Heart
Surgery
none
Beth Little
Cholecyst
ectomy
Demicillin
Beth Little
Michael
Diamond
Blind Brook
Mamaronec
10-May-95 Paul Kosher k, NY
Beth Little
Hilton Road
Larchmont,
NY
Surgery
Penicillin
rash
none
none
Tetracyclin
e
Fever
Cephalosp
orin
Charles Field
Gallstone
s
Removal
none
Eye
Cornea
Replacem Tetracyclin
ent
e
Charles Field
Eye
cataract
removal
none
none
none
Fever
none
2008-09-23 SLIDE 11
Second Normal Form
• A relation is said to be in Second Normal
Form when every nonkey attribute is fully
functionally dependent on the primary key.
– That is, every nonkey attribute needs the full
primary key for unique identification
IS 257 – Fall 2008
2008-09-23 SLIDE 12
Second Normal Form
Patient #
1111
1234
2345
4876
5123
6845
IS 257 – Fall 2008
Patient Name Patient Address
15 New St. New
John White York, NY
10 Main St. Rye,
Mary Jones NY
Charles
Dogwood Lane
Brown
Harrison, NY
55 Boston Post
Hal Kane
Road, Chester,
Blind Brook
Paul Kosher Mamaroneck, NY
Hilton Road
Ann Hood
Larchmont, NY
2008-09-23 SLIDE 13
Second Normal Form
Surgeon #
Surgeon Name
145 Beth Little
189 David Rosen
243 Charles Field
311 Michael Diamond
467 Patricia Gold
IS 257 – Fall 2008
2008-09-23 SLIDE 14
Second Normal Form
Patient # Surgeon # Surgery Date
1111
1111
1234
1234
2345
4876
Drug Admin Side Effects
145
Gallstones
01-Jan-95 removal
Kidney
Penicillin
rash
311
stones
12-Jun-95 removal
none
none
243
Eye Cataract
05-Apr-94 removal
Tetracycline Fever
467
Thrombosis
10-May-95 removal
189
Open Heart
08-Jan-96 Surgery
Cephalospori
n
none
145
Cholecystect
05-Nov-95 omy
Demicillin
none
none
none
none
none
5123
145
6845
243
6845
243
IS 257 – Fall 2008
Surgery
Gallstones
10-May-95 Removal
Eye cataract
15-Dec-84 removal
Eye Cornea
05-Apr-94 Replacement
none
none
Tetracycline Fever
2008-09-23 SLIDE 15
Third Normal Form
• A relation is said to be in Third Normal Form if
there is no transitive functional dependency
between nonkey attributes
– When one nonkey attribute can be determined with
one or more nonkey attributes there is said to be a
transitive functional dependency.
• The side effect column in the Surgery table is
determined by the drug administered
– Side effect is transitively functionally dependent on
drug so Surgery is not 3NF
IS 257 – Fall 2008
2008-09-23 SLIDE 16
Third Normal Form
Patient # Surgeon # Surgery Date
IS 257 – Fall 2008
Surgery
Drug Admin
1111
145
1111
311
01-Jan-95 Gallstones removal
Kidney stones
12-Jun-95 removal
Penicillin
1234
243
05-Apr-94 Eye Cataract removal Tetracycline
1234
467
10-May-95 Thrombosis removal
2345
189
08-Jan-96 Open Heart Surgery
Cephalosporin
4876
145
05-Nov-95 Cholecystectomy
Demicillin
5123
145
10-May-95 Gallstones Removal
none
6845
243
none
6845
243
15-Dec-84 Eye cataract removal
Eye Cornea
05-Apr-94 Replacement
none
none
Tetracycline
2008-09-23 SLIDE 17
Third Normal Form
Drug Admin
IS 257 – Fall 2008
Side Effects
Cephalosporin
none
Demicillin
none
none
none
Penicillin
rash
Tetracycline
Fever
2008-09-23 SLIDE 18
Boyce-Codd Normal Form
• Most 3NF relations are also BCNF
relations.
• A 3NF relation is NOT in BCNF if:
– Candidate keys in the relation are composite
keys (they are not single attributes)
– There is more than one candidate key in the
relation, and
– The keys are not disjoint, that is, some
attributes in the keys are common
IS 257 – Fall 2008
2008-09-23 SLIDE 19
BCNF Relations
Patient # Patient Name
IS 257 – Fall 2008
Patient #
1111 John White
1111
1234 Mary Jones
Charles
2345 Brown
1234
4876 Hal Kane
4876
5123 Paul Kosher
5123
6845 Ann Hood
6845
2345
Patient Address
15 New St. New
York, NY
10 Main St. Rye,
NY
Dogwood Lane
Harrison, NY
55 Boston Post
Road, Chester,
Blind Brook
Mamaroneck, NY
Hilton Road
Larchmont, NY
2008-09-23 SLIDE 20
Fourth Normal Form
• Any relation is in Fourth Normal Form if it
is BCNF and any multivalued
dependencies are trivial
• Eliminate non-trivial multivalued
dependencies by projecting into simpler
tables
IS 257 – Fall 2008
2008-09-23 SLIDE 21
Fifth Normal Form
• A relation is in 5NF if every join
dependency in the relation is implied by
the keys of the relation
• Implies that relations that have been
decomposed in previous NF can be
recombined via natural joins to recreate
the original relation.
IS 257 – Fall 2008
2008-09-23 SLIDE 22
Normalization
• Normalization is performed to reduce or
eliminate Insertion, Deletion or Update
anomalies.
• However, a completely normalized
database may not be the most efficient or
effective implementation.
• “Denormalization” is sometimes used to
improve efficiency.
IS 257 – Fall 2008
2008-09-23 SLIDE 23
Denormalization
• Usually driven by the need to improve
query speed
• Query speed is improved at the expense
of more complex or problematic DML
(Data manipulation language) for updates,
deletions and insertions.
IS 257 – Fall 2008
2008-09-23 SLIDE 24
Downward Denormalization
Customer
ID
Address
Name
Telephone
Before:
Order
Order No
Date Taken
Date Dispatched
Date Invoiced
Cust ID
IS 257 – Fall 2008
After:
Customer
ID
Address
Name
Telephone
Order
Order No
Date Taken
Date Dispatched
Date Invoiced
Cust ID
Cust Name
2008-09-23 SLIDE 25
Upward Denormalization
Order
Order No
Date Taken
Date Dispatched
Date Invoiced
Cust ID
Cust Name
Order Item
Order No
Item No
Item Price
Num Ordered
IS 257 – Fall 2008
Order
Order No
Date Taken
Date Dispatched
Date Invoiced
Cust ID
Cust Name
Order Price
Order Item
Order No
Item No
Item Price
Num Ordered
2008-09-23 SLIDE 26
Using RDBMS to help normalize
• Example database: Cookie
• Database of books, libraries, publisher and
holding information for a shared (union)
catalog
IS 257 – Fall 2008
2008-09-23 SLIDE 27
Cookie relationships
IS 257 – Fall 2008
2008-09-23 SLIDE 28
Cookie BIBFILE relation
ACCNO
A003
T082
C024
B006
B007
B005
B008
B010
B009
B012
B011
B014
B013
B016
B017
F047
B116
S102
B118
B018
C031
C032
C034
AUTHOR
TITLE
LOC
PUBID DATE PRICE
AMERICAN LIBRARY ASSOCIATION
ALA BULLETIN
CHICAGO
04
$3.00
ANDERSON, THEODORE
THE TEACHING OF MODERN
PARIS LANGUAGES
53
1955
$10.95
AXT, RICHARD G.
COLLEGE SELF STUDY
BOULDER,
: LECTURES
CO. ON
51INSTITU
1960
$7.00
BALDERSTON, FREDERICK
MANAGING
E.
TODAYS SAN
UNIVERSITY
FRANCISCO 27
1975
$6.00
BARZUN, JACQUES
TEACHER IN AMERICA
GARDEN CITY
18
1954
$7.00
BARZUN, JACQUES
THE AMERICAN UNIVERSITY
NEW YORK
: HOW IT 24
RUNS, W
1970
$5.00
BARZUN, JACQUES
THE HOUSE OF INTELLECT
NEW YORK
24
1961
$8.00
BELL, DANIEL
THE COMING OF POST-INDUSTRIAL
NEW YORK
SOCIETY
09
1976
:
$10.00
BENSON, CHARLES S. IMPLEMENTING THE SAN
LEARNING
FRANCISCO
SOCIETY
27
1974
$9.00
BERG, IVAR
EDUCATION AND JOBS
BOSTON
: THE GREAT TRAINING
10
1971
$12.00
BERSI, ROBERT M.
RESTRUCTURING THE
WASHINGTON,
BACCALAUREATE
D.C.
03
1973
$11.00
BEVERIDGE, WILLIAM I.THE ART OF SCIENTIFIC
NEWINVESTIGATION
YORK
58
1957
$14.00
BIRD, CAROLINE
THE CASE AGAINST NEW
COLLEGE
YORK
08
1975
$13.00
BISSELL, CLAUDE T. THE STRENGTH OF THE
TORONTO
UNIVERSITY 57
1968
$14.00
BLAIR, GLENN MYERS EDUCATIONAL PSYCHOLOGY
NEW YORK
30
1962
$11.00
BLAKE, ELIAS, JR.
THE FUTURE OF THECAMBRIDGE,
BLACK COLLEGES
MA.02
1971
$14.25
BOLAND, R.J.
CRITICAL ISSUES IN INFORMATION
CHICHESTER, ENG.
SYSTEMS
63
1987
R
$30.95
BROWN, SANBORN C., SCIENTIFIC
ED.
MANPOWER
CAMBRIDGE, MASS.
29
1971
$4.00
BUCKLAND, MICHAEL K.LIBRARY SERVICES ELMSFORD,
IN THEORY AND
NY CONTEXT
70
1983
$12.00
BUDIG, GENE A.
ACADEMIC QUICKSAND
LINCOLN,
: SOME
NEBRASKA
TRENDS
37 AND1973
ISS
$13.00
CALIFORNIA. DEPT. OFLAW
JUSTICE
IN THE SCHOOLMONTCLAIR, N.J. 35
1974
$0.50
CAMPBELL, MARGARET
WHY
A. WOULD A GIRLOLD
GO INTO
WESTBURY,
MEDICINE?
48
N.Y. 1973
$1.50
CARNEGIE COMMISSION
A DIGEST
ON HIGHER
OF REPORTS
NEW
OF
YORK
THE CARNEGIE
30
COMM
1974
$3.50
IS 257 – Fall 2008
PAGINATION
ILL
HEIGHT
63 V.
ILL.
26
294 P.
22
X, 300 P. GRAPHS
28
XVI, 307 P.
24
280 P.
18
XII, 319 P.
20
VIII, 271 P.
21
XXVII, 507 P.
21
XVII, 147 P.
24
XX, 200 P.
21
IV, 160P.
23
XIV, 239 P.
18
XII, 308 P.
18
VII, 251 P.
21
678 P.
24
VIII, PP. 539
23
XV, 394 P. ILL.
24
X, 180 P.
26
XII, 201 P. ILL.
23
74 P.
23
IV, 87 P.
21
V, 114 P.
24
399 P.
24
2008-09-23 SLIDE 29
How to Normalize?
• Currently no way to have multiple authors
for a given book, and there is duplicate
data spread over the BIBFILE table
• Can we use the DBMS to help us
normalize?
• Access example…
IS 257 – Fall 2008
2008-09-23 SLIDE 30
Lecture Outline
• Review
– Normalization
– Using Relational DBs in normalization
• Physical Database Design
• Access Methods
IS 257 – Fall 2008
2008-09-23 SLIDE 31
Database Design Process
Application 1
External
Model
Application 2
Application 3
Application 4
External
Model
External
Model
External
Model
Application 1
Conceptual
requirements
Application 2
Conceptual
requirements
Application 3
Conceptual
requirements
Conceptual
Model
Logical
Model
Internal
Model
Application 4
Conceptual
requirements
IS 257 – Fall 2008
Physical
Design
2008-09-23 SLIDE 32
Physical Database Design
• Many physical database design decisions
are implicit in the technology adopted
– Also, organizations may have standards or an
“information architecture” that specifies
operating systems, DBMS, and data access
languages -- thus constraining the range of
possible physical implementations.
• We will be concerned with some of the
possible physical implementation issues
IS 257 – Fall 2008
2008-09-23 SLIDE 33
Physical Database Design
• The primary goal of physical database
design is data processing efficiency
• We will concentrate on choices often
available to optimize performance of
database services
• Physical Database Design requires
information gathered during earlier stages
of the design process
IS 257 – Fall 2008
2008-09-23 SLIDE 34
Physical Design Information
• Information needed for physical file and
database design includes:
– Normalized relations plus size estimates for them
– Definitions of each attribute
– Descriptions of where and when data are used
• entered, retrieved, deleted, updated, and how often
– Expectations and requirements for response time,
and data security, backup, recovery, retention and
integrity
– Descriptions of the technologies used to implement
the database
IS 257 – Fall 2008
2008-09-23 SLIDE 35
Physical Design Decisions
• There are several critical decisions that
will affect the integrity and performance of
the system
– Storage Format
– Physical record composition
– Data arrangement
– Indexes
– Query optimization and performance tuning
IS 257 – Fall 2008
2008-09-23 SLIDE 36
Storage Format
• Choosing the storage format of each field
(attribute). The DBMS provides some set
of data types that can be used for the
physical storage of fields in the database
• Data Type (format) is chosen to minimize
storage space and maximize data integrity
IS 257 – Fall 2008
2008-09-23 SLIDE 37
Objectives of data type selection
•
•
•
•
•
Minimize storage space
Represent all possible values
Improve data integrity
Support all data manipulations
The correct data type should, in minimal
space, represent every possible value (but
eliminate illegal values) for the associated
attribute and can support the required data
manipulations (e.g. numerical or string
operations)
IS 257 – Fall 2008
2008-09-23 SLIDE 38
Access Data Types
•
•
•
•
•
•
•
•
•
Numeric (1, 2, 4, 8 bytes, fixed or float)
Text (255 max)
Memo (64000 max)
Date/Time (8 bytes)
Currency (8 bytes, 15 digits + 4 digits decimal)
Autonumber (4 bytes)
Yes/No (1 bit)
OLE (limited only by disk space)
Hyperlinks (up to 64000 chars)
IS 257 – Fall 2008
2008-09-23 SLIDE 39
Access Numeric types
• Byte
– Stores numbers from 0 to 255 (no fractions). 1 byte
• Integer
– Stores numbers from –32,768 to 32,767 (no fractions) 2 bytes
• Long Integer
(Default)
– Stores numbers from –2,147,483,648 to 2,147,483,647 (no fractions). 4
bytes
• Single
– Stores numbers from -3.402823E38 to –1.401298E–45 for negative
values and from 1.401298E–45 to 3.402823E38 for positive values.
4 bytes
• Double
– Stores numbers from –1.79769313486231E308 to –
4.94065645841247E–324 for negative values and from
1.79769313486231E308 to 4.94065645841247E–324 for positive
values.
15
8 bytes
• Replication ID
– Globally unique identifier (GUID) N/A
IS 257 – Fall 2008
16 bytes
2008-09-23 SLIDE 40
Controlling Data Integrity
•
•
•
•
•
Default values
Range control
Null value control
Referential integrity (next time)
Handling missing data
IS 257 – Fall 2008
2008-09-23 SLIDE 41
Designing Physical Records
• A physical record is a group of fields
stored in adjacent memory locations and
retrieved together as a unit
• Fixed Length and variable fields
IS 257 – Fall 2008
2008-09-23 SLIDE 42
Designing Physical/Internal Model
• Overview
• terminology
• Access methods
IS 257 – Fall 2008
2008-09-23 SLIDE 43
Physical Design
• Internal Model/Physical Model
User request
Interface 1
External Model
DBMS
Internal Model
Access Methods
Interface 2
Operating
System
Access Methods
Interface 3
Data
Base
IS 257 – Fall 2008
2008-09-23 SLIDE 44
Physical Design
• Interface 1: User request to the DBMS.
The user presents a query, the DBMS
determines which physical DBs are
needed to resolve the query
• Interface 2: The DBMS uses an internal
model access method to access the data
stored in a logical database.
• Interface 3: The internal model access
methods and OS access methods access
the physical records of the database.
IS 257 – Fall 2008
2008-09-23 SLIDE 45
Physical File Design
• A Physical file is a portion of secondary storage
(disk space) allocated for the purpose of storing
physical records
• Pointers - a field of data that can be used to
locate a related field or record of data
• Access Methods - An operating system
algorithm for storing and locating data in
secondary storage
• Pages - The amount of data read or written in
one disk input or output operation
IS 257 – Fall 2008
2008-09-23 SLIDE 46
Internal Model Access Methods
• Many types of access methods:
– Physical Sequential
– Indexed Sequential
– Indexed Random
– Inverted
– Direct
– Hashed
• Differences in
– Access Efficiency
– Storage Efficiency
IS 257 – Fall 2008
2008-09-23 SLIDE 47
Physical Sequential
• Key values of the physical records are in
logical sequence
• Main use is for “dump” and “restore”
• Access method may be used for storage
as well as retrieval
• Storage Efficiency is near 100%
• Access Efficiency is poor (unless fixed
size physical records)
IS 257 – Fall 2008
2008-09-23 SLIDE 48
Indexed Sequential
• Key values of the physical records are in logical
sequence
• Access method may be used for storage and
retrieval
• Index of key values is maintained with entries for
the highest key values per block(s)
• Access Efficiency depends on the levels of
index, storage allocated for index, number of
database records, and amount of overflow
• Storage Efficiency depends on size of index and
volatility of database
IS 257 – Fall 2008
2008-09-23 SLIDE 49
Index Sequential
Data File
Actual
Value
IS 257 – Fall 2008
Address
Block
Number
Dumpling
1
Harty
2
Texaci
3
...
…
Adams
Becker
Dumpling
Block 1
Getta
Harty
Block 2
Mobile
Sunoci
Texaci
Block 3
2008-09-23 SLIDE 50
Indexed Sequential: Two Levels
Key
Value
Key
Value
Address
150
1
385
2
001
003
.
.
150
Address
385
7
678
8
805
9
…
Key
Value
Address
536
3
678
4
Key
Value
251
.
.
385
455
480
.
.
536
605
610
.
.
678
Address
785
5
805
6
791
.
.
805
IS 257 – Fall 2008
705
710
.
.
785
2008-09-23 SLIDE 51
Indexed Random
• Key values of the physical records are not
necessarily in logical sequence
• Index may be stored and accessed with Indexed
Sequential Access Method
• Index has an entry for every data base record.
These are in ascending order. The index keys
are in logical sequence. Database records are
not necessarily in ascending sequence.
• Access method may be used for storage and
retrieval
IS 257 – Fall 2008
2008-09-23 SLIDE 52
Indexed Random
Becker
Harty
Actual
Value
Address
Block
Number
Adams
2
Becker
1
Dumpling
3
Getta
2
Harty
1
Adams
Getta
Dumpling
IS 257 – Fall 2008
2008-09-23 SLIDE 53
Btree
F
B
||
D || F|
||
P || Z|
H ||
L || P|
R ||
S || Z|
Devils
Aces
Boilers
Cars
IS 257 – Fall 2008
Flyers
Hawkeyes
Hoosiers
Minors
Panthers
Seminoles
2008-09-23 SLIDE 54
Inverted
• Key values of the physical records are not
necessarily in logical sequence
• Access Method is better used for retrieval
• An index for every field to be inverted may
be built
• Access efficiency depends on number of
database records, levels of index, and
storage allocated for index
IS 257 – Fall 2008
2008-09-23 SLIDE 55
Inverted
Student
name
Course
Number
CH 145
101, 103,104
Actual
Value
Address
Block
Number
CH 145
1
CS 201
2
CS 623
3
PH 345
…
Adams
CH145
Becker
cs201
Dumpling ch145
CS 201
102
Getta
ch145
Harty
cs623
Mobile
cs623
CS 623
105, 106
IS 257 – Fall 2008
2008-09-23 SLIDE 56
Direct
• Key values of the physical records are not
necessarily in logical sequence
• There is a one-to-one correspondence
between a record key and the physical
address of the record
• May be used for storage and retrieval
• Access efficiency always 1
• Storage efficiency depends on density of
keys
• No duplicate keys permitted
IS 257 – Fall 2008
2008-09-23 SLIDE 57
Hashing
• Key values of the physical records are not
necessarily in logical sequence
• Many key values may share the same physical
address (block)
• May be used for storage and retrieval
• Access efficiency depends on distribution of
keys, algorithm for key transformation and space
allocated
• Storage efficiency depends on distibution of
keys and algorithm used for key transformation
IS 257 – Fall 2008
2008-09-23 SLIDE 58
Comparative Access Methods
Factor
Storage space
Sequential
retrieval on
primary key
Random Retr.
Multiple Key
Retr.
Deleting records
Sequential
No wasted space
Indexed
Hashed
No wasted
space for data
but extra space for index
more space needed for
addition and deletion of
records after initial load
Very fast
Moderately Fast
Impractical
Moderately Fast
Very fast with
multiple indexes
OK if dynamic
Very fast
OK if dynamic
very easy
Easy but requires
Maintenance of
indexes
very easy
Impractical
Possible but needs
a full scan
can create wasted
space
Adding records requires rewriting
file
Updating records usually requires
rewriting file
IS 257 – Fall 2008
Not possible
very easy
2008-09-23 SLIDE 59
Database Creation in Access
• Simplest to use a design view
– wizards are available, but less flexible
• Need to watch the default values
• Helps to know what the primary key is, or
if one is to be created automatically
– Automatic creation is more complex in other
RDBMS and ORDBMS
• Need to make decision about the physical
storage of the data
IS 257 – Fall 2008
2008-09-23 SLIDE 60
Database Creation in Access
• Some Simple Examples
IS 257 – Fall 2008
2008-09-23 SLIDE 61
Next Time
• Indexes and when to index
• Integrity Constraints
• Referential Integrity
IS 257 – Fall 2008
2008-09-23 SLIDE 62