Physical Design Performance

Download Report

Transcript Physical Design Performance

Physical Database Design
Barry Floyd
BUS 498
Advanced Database Management Systems
Introduction
The Physical Database Design Process
Goal is to translate our conceptual designs
into physical reality
Draw on requirements analysis and our
conceptual data model
Agenda
Data Volume and Usage Analysis
Data Distribution Strategy
discuss this later in the quarter
Indexes
Denormalization
Overview
Important step in the database design
process (also the last step)
Decisions made here impact ...
data accessibility
response times
usability
Vocabulary
Data volume - how many records
Data usage - how often and in what
manner are the records used
Data Volume Analysis
Use volume analysis to
select physical storage devices
estimate costs of storage
Data Volume Analysis
LOCATION
100
TREATMENT
PATIENT
GIVEN
PHYSICIAN
50
GIVEN
CHARGE
GIVEN
ITEM
500
Data Volume Analysis
LOCATION
100
DERIVE
(10)
TREATMENT
PATIENT
1000
* Keep patient record active
for 30 days
* Average length of stay CHARGE
for a patient is 3 days
100 X 30 / 3 => 1000
(20)
PHYSICIAN
50
ITEM
500
Data Volume Analysis
LOCATION
100
(10)
TREATMENT
4000
(4)
DERIVE
PATIENT
1000
(20)
PHYSICIAN
50
* Each patient has 4 treatments
ITEM
CHARGE
on average.
500
1000 X 4 => 4000
Data Volume Analysis
* Each patient has 10 charges
LOCATION
on average.
100
1000 X 10 => 10,000
TREATMENT
4000
(4)
PATIENT
1000
(20)
PHYSICIAN
50
(20)
ITEM
500
(10)
CHARGE
10,000
DERIVE
Data Volume Analysis
LOCATION
100
(10)
TREATMENT
4000
(4)
KNOW ...
Number of
records and
relationships
PATIENT
1000
(20)
PHYSICIAN
50
(20)
ITEM
500
(10)
CHARGE
10,000
Data Usage Analysis
Want to identify major transactions and
processes which hit on the database
Analyze each transaction and process to
determine access paths used and
frequency of use
Create composite map from individual
analyses
Transaction Analysis Form
TRANSACTION NUMBER MVCH-4
TRANSACTION NAME: CREATE PATIENT BILL
TRANSACTION VOLUME:
AVERAGE 2/HR
PEAK: 10/HR
(1)
PATIENT
1000
(2)
CHARGE
10,000
NO. NAME
(3)
ITEM
500
ACCESS TRAN PERIOD
TYPE
REF
REF
(1) ENTRY-PATIENT READ
1
10
Transaction Analysis Form
(1)
PATIENT
1000
(2)
CHARGE
10,000
NO. NAME
(3)
ITEM
500
ACCESS TRAN PERIOD
TYPE
REF
REF
(1) ENTRY-PATIENT READ
1
10
(2) PATIENT-CHARGE READ
10
100
(3) CHARGE-ITEM
READ
10
100
Composite Usage Map
Determine how the data structures are
accessed for each transaction and process
include programs
standard queries
programmed
ad hoc
Composite Usage Map
LOCATION
100
TREATMENT
4000
(50)
(50)
(25)
PATIENT
1000
PHYSICIAN
50
(50)
NUMBER IS
PER HOUR
AT PEAK
VOLUME
CHARGE
10,000
ITEM
500
Composite Usage Map
LOCATION
100
(75)
TREATMENT
4000
(25)
(50)
PATIENT
1000
(20)
(50)
(200)
CHARGE
10,000
(30)
PHYSICIAN
50
(100)
ITEM
500
Composite Usage Map
(50)
LOCATION
100
(75)
TREATMENT
4000
(50)
(25)
PATIENT
1000
(20)
(200)
(50)
CHARGE
10,000
(25)
(50)
(30)
PHYSICIAN
50
(100)
(50)
ITEM
500
Summary
Given volume and usage knowledge we
can consider different physical
implementation strategies, including ...
INDEXES
DENORMALIZATION
CLUSTERING
Indexes
Purpose: To speed up access to a
particular row or a group of rows in a
table.
Also used to enforce uniqueness
Eliminates the necessity of re-sorting the
table each time we need to create a
sequenced list
Indexes
Allen 3
Brian 6
Carole 7
John 2
Karen 5
Marvin1
Sharon 8
Sue 4
1
2
3
4
5
6
7
8
Marvin …
John ...
Allen ...
Sue ...
Karen ...
Brian ...
Carole ...
Sharon ...
Example
SELECT NAME, DEPT, RATING FROM EMP
WHERE RATING = 10;
Indexing on RATING improves performance.
Without an index, must do a full table
scan.
Costs of an index?
Storage space
Maintenance
Indexed must be changed for each
add/delete or change in value on indexed
field.
One benchmark ... insert into table w/o
indexes, 0.11 seconds, w/ 8 indexes, 0.94
seconds.
Access Indexes
Automatically created on primary key.
You must create other indexes as needed.
Note, creating a unique index on a foreign
key turns the relationship into a 1 - 1
relationship rather than a 1 - m
relationship.
Let’s consider Oracle indexes and performance ...
Oracle Indexes
SELECT COUNT(*)
FROM EMP
WHERE EMP_NO>0
INDEX
+ TABLE
INDEX
ONLY
% OF
FILE
READ
BREAKEVEN
SELECT EMP_NAME
FROM EMP
WHERE EMP_NO>0
%
8.5
15.5
25.2
50.7
100
FULL TABLE
SCAN
Seconds
0.66 12.03
1.04 16.21
1.54 25.45
2.80 33.89
5.72 87.23
35.70
35.70
35.70
35.70
35.70
26,000 Rows, 7 Rows per Block
Oracle Indexes
SELECT COUNT(*)
FROM EMP
WHERE EMP_NO>0
INDEX
+ TABLE
INDEX
ONLY
% OF
FILE
READ
BREAKEVEN
SELECT EMP_NAME
FROM EMP
WHERE EMP_NO>0
%
8.5
15.5
25.2
50.7
100
Seconds
0.66 2.31
1.05 4.01
1.59 6.37
2.91 12.69
6.01 25.37
FULL TABLE
SCAN
4.52
4.52
4.52
4.52
4.52
26,000 Rows, 258 Rows per Block
Rules of thumb
Use indexes generously for applications
which are decision support/retrieval
based.
Use indexes judiciously for transaction
processing applications.
Places to use indexes
PRIMARY KEY
FOREIGN KEYS
Non Key attributes that are referred to in
qualification, sorting, and grouping
(WHERE, ORDER BY, GROUP BY)
Denormalization
Goal is to reduce the number of physicals
reads to the storage devices by reducing
the number of joins.
Costs of Denormalization
Makes coding more complex
Often sacrifices flexibility
Will speed up retrieval but slow updates
Including children
in the parent record
Multiple addresses in the personnel record
Absolute number of children for a parent is
known (e.g., 2 addresses)
The number won’t change over time
The number is not very large
Clusters in Oracle
Clustering stores records from two tables
into the same physical storage space
Only useful for EQUI-JOINS
Improves performance by 2-3 times
Storing most recent child
data in the parent record
Multiple children, but children have an
ordering (e.g., date of order)
For example, perhaps storing amount of last
order.
Amount of last dividend paid to a particular
account
Store running totals /
Create extract tables
Store summary data from a child record
Year to date sales
Create a summary table which contains
aggregate values over some period (say,
one month)
Duplicating a key beyond
an immediate child record
CLASS
CLASS_ID
PARTS
PART_ID,CLASS_ID
ORDERS
ADD THIS KEY
ORDER_ID,
PART_ID,
CLASS_ID
Consider SQL statement
for previous example
SELECT PART_NO, ORDER_NO, CLASS, CLASS_DESC
FROM CLASS C, PART P, ORDER O
WHERE O.PART_NO = P.PART_NO
AND P.CLASS = C.CLASS;
SELECT PART_NO, ORDER_NO, CLASS, CLASS_DESC
FROM CLASS C,
ORDER O
WHERE O.CLASS = C.CLASS;
Record Partitioning
Breaking up a record into two parts
A,B,C,D,E,F,G
E,F,G
A,B,C,D
Summary
Logical design gives you information
about the ‘how’ to build the system.
Good physical design takes into account
the performance of the final design … to
know how best to do this task, you must
understand how the system is being used!