11_dbimpl - University of Central Florida

Download Report

Transcript 11_dbimpl - University of Central Florida

IMS 4212: Database Implementation
Physical Database Implementation—Topics
• Denormalization
• Partitioning Tables (relations)
• Parallel Processing & RAID
Dr. Lawrence West, Management Dept., University of Central Florida
[email protected]
1
IMS 4212: Database Implementation
Denormalization
• Denormalizing is the process of reshuffling attributes
and sometimes entities to create entities that violate the
rules of normalization
• We are trading off (again) storage efficiency and
anomaly avoidance for better retrieval efficiency
• Denormalizing includes:
– Storing derived attributes explicitly
– Allowing transitive dependencies (violating second,
third, or Boyce-Codd normal form)
– Merging entities in 1:1 relationships
Dr. Lawrence West, Management Dept., University of Central Florida
[email protected]
2
IMS 4212: Database Implementation
Denormalization (cont.)
STUDENT
StudentID
LastName
FirstName
TotCrHr
TotGP
• Derived attributes
– Storing derived attributes is one of the most
common means of improving processing
efficiency
– How many tables/row examinations are avoided by
storing total grade points and total credit hours with the
STUDENT entity?
– What new operations must be introduced to keep the
data current?
– Explicitly storing derived attributes gives rise to new
operational business rules to enforce accuracy
Dr. Lawrence West, Management Dept., University of Central Florida
[email protected]
3
IMS 4212: Database Implementation
Denormalization (cont.)
• 1:1 Relationships
– It may be possible to collapse data from one entity in a
1:1 relationship into the other.
OFFICE
– Usually the pervasive entity survives
Building
Room
Window
– Alternately, both entities may be retained
Length
Width
but the data from one may be copied into
LastPaint
FacultyID <FK>
the other to avoid a table look-up
LastName
PROFESSOR
FacultyID
LastName
FirstName
Phone
Dept. <FK>
OFFICE
Has
Building
Room
Window
Length
Width
LastPaint
FacultyID <FK>
Dr. Lawrence West, Management Dept., University of Central Florida
[email protected]
FirstName
Phone
Dept
4
IMS 4212: Database Implementation
Denormalizing (cont.)
• 1:M Relationships
– You may consider moving or duplicating attributes
from the “one” side of a 1:M relationship into the
“many” side
– This will result in considerable data duplication
– Considerations
• There should be many records on the “one” side
• Frequent access should be directly into the “many”
side
Dr. Lawrence West, Management Dept., University of Central Florida
[email protected]
5
IMS 4212: Database Implementation
Denormalization (cont.)
• 1:M Relationships (cont.)
ENROLLMENT
SectionID <FK1>
StudentID <FK2>
Grade
<FK3>
LastName
FirstName
STUDENT
Has
StudentID
LastName
FirstName
:
• Similar technique may be used by collapsing or copying
attributes into the associative entity between two
entities in a M:M relationship
Dr. Lawrence West, Management Dept., University of Central Florida
[email protected]
6
IMS 4212: Database Implementation
Denormalization (cont.)
• The goal of denormalizing is to avoid accessing a
(large) table for high frequency critical transactions
• Denormalizing usually requires additional business
rules to guarantee that data remains accurate in the
face of updates
Dr. Lawrence West, Management Dept., University of Central Florida
[email protected]
7
IMS 4212: Database Implementation
Partitioning
• Partitioning entities divides one table into many
– Horizontal partitioning
• Each table has all fields from the original table
• Each table has a subset of records
– Vertical partitioning
• Each table has the PK of the original table
• Each table has all records
• Each table has a subset of fields
– May partition both vertically and horizontally
• Very powerful technique with historical data
Dr. Lawrence West, Management Dept., University of Central Florida
[email protected]
8
IMS 4212: Database Implementation
Partitioning (cont.)
• Horizontal Partitioning
– How many records in the STUDENT table?
– How many of them are currently enrolled?
– How frequently do we need to access both current and
former students in the same query or operation?
– It may make sense to partition tables based on a
historical context
• Active records vs. archived records
– May also partition based on geographic considerations
– Whole table can be reconstructed using UNION query
Dr. Lawrence West, Management Dept., University of Central Florida
[email protected]
9
IMS 4212: Database Implementation
Partitioning (cont.)
STUDENT
E_MAIL
StudentID
E_Mail
STUDENT
HEALTHREC
STUDENT
StudentID
LastName
FirstName
MiddleInitial
StudentID
BloodType
Height
Weight
• Vertical Partitioning
– Librarian, Registrar, Athletic Department, and
Health Center may all need a different subset
of fields from the STUDENT entity
– It may make sense to create separate tables
containing the necessary attributes for each view
– Common PK creates 1:1 cardinality between all tables
– Whole logical record can be assembled using SQL
when needed
– We are actually backing into a supertype/subtype
relationship
STUDENT
ADDRESS
StudentID
HomeStreet
HomeCity
HomeState
HomeCountry
HomeZip
LocalStreet
LocalCity
LocalState
LocalCountry
LocalZip
LocalPhone
E_Mail
Dr. Lawrence West, Management Dept., University of Central Florida
[email protected]
10
IMS 4212: Database Implementation
RAID Storage Devices
• In conventional drives data is laid down sequentially
along a track in the disk
– Read/Write head must move along the track to read the
data
– Each read/write operation must finish before the next
can begin
– A drive failure can result in loss of all data
Dr. Lawrence West, Management Dept., University of Central Florida
[email protected]
11
IMS 4212: Database Implementation
RAID Storage Devices
• RAID is for Redundant Array of Inexpensive Disks
– Multiple disks appear as a single logical drive to the
computer
– May be implemented in hardware or software (OS)
• Various RAID levels provide for different levels of
performance and redundancy
• Most RAID levels enable the rebuilding of entire lost
physical drives through parity storage
Dr. Lawrence West, Management Dept., University of Central Florida
[email protected]
12
IMS 4212: Database Implementation
*
RAID Storage Devices—Raid 3
• Records are striped across
multiple physical devices
– Part of each record is laid down
across multiple physical drives
– Much faster Read/Write time since disk rotation needed
to read whole record/block is much shorter
– However only one request can be serviced concurrently
– Not commonly used in practice
• A single parity disk allows reconstruction of data on
damaged drives
Dr. Lawrence West, Management Dept., University of Central Florida
[email protected]
* Image source: Wikipedia
13
IMS 4212: Database Implementation
*
RAID Storage Devices—Raid 4
• Blocks are stored independently
on the drives
– Block A1 can be serviced just
by Drive 0
– Simultaneous requests for Blocks B2 or D3 can also be
serviced
• A single parity drive enables recovery of lost data
• Write operations may be slower—simultaneous write
operations to Drives 0-2 must wait on the parity
calculation and writing on Drive 3
Dr. Lawrence West, Management Dept., University of Central Florida
[email protected]
* Image source: Wikipedia
14
IMS 4212: Database Implementation
*
RAID Storage Devices—Raid 5
• Similar to Raid 4 except that
parity storage is distributed
across multiple drives
– Rotating allocation
– Lessens the chance that writes on two drives will wait
on parity updates on a single parity drive
Dr. Lawrence West, Management Dept., University of Central Florida
[email protected]
* Image source: Wikipedia
15
IMS 4212: Database Implementation
Dr. Lawrence West, Management Dept., University of Central Florida
[email protected]
16
P2
P3
One Processor
• More and more computers support parallel processing
(multiple CPUs on the same computer)
1
...
• Some tasks can be split among multiple
2
...
3
...
processors
4
...
• In an SQL SELECT query the usual
5
...
method requires the RDBMS to scan
6
...
each record to determine if it matches
7
...
the WHERE clause or JOIN criteria
8
...
9
...
• In parallel processing part of the
whole table is passed to each processor
• Availability depends on hardware, OS, and RDBMS
P1
Parallel Processing