4/3 Physical Database Design
Download
Report
Transcript 4/3 Physical Database Design
Information Resources
Management
April 3, 2001
Agenda
Administrivia
Physical Database Design
Database Integrity
Performance
Administrivia
Exam 2
Regrade Requests Exam SQL
Create Database
Enter query(s) as submitted
Submit to me
Database (electronic)
Graded homework (paper)
Reserve the right to change test data
and reexecute query
Foreign Keys
Inserts require all FK values be the
value of a primary key in the reference
table
Update and delete constraints are also
possible
Referential Integrity
ON DELETE CASCADE/RESTRICT/SET
NULL
ON UPDATE CASCADE/RESTRICT/SET
NULL
Default
ON DELETE RESTRICT
ON UPDATE CASCADE
Example
CREATE TABLE PCAccess
(PC#
INTEGER,
EmpID
CHAR(9),
AccessType
CHAR(15),
PRIMARY KEY (PC#, EmpID),
FOREIGN KEY (EmpID) REFERENCES (Employee),
FOREIGN KEY (PC#) REFERENCES (PC))
Example - PCAccess Table
PC#
EmpID
AccessType
1
1
Full
1
2
Restricted
2
1
Full
3
4
Full
3
2
Semi-Restricted
Example #1
PC#
1
1
2
3
3
EmpID
1
2
1
4
2
INSERT INTO PCAccess (PC#)
VALUES (4)
AccessType
Full
Restricted
Full
Full
Semi-Restricted
Example #2
PC#
1
1
2
3
3
EmpID
1
2
1
4
2
INSERT INTO PCAccess (PC#, EmpID)
VALUES (4,5)
AccessType
Full
Restricted
Full
Full
Semi-Restricted
Example #3
PC#
1
1
2
3
3
EmpID
1
2
1
4
2
UPDATE Employee
SET EmpID = 10
WHERE EmpID = 1
AccessType
Full
Restricted
Full
Full
Semi-Restricted
Example #4
PC#
1
1
2
3
3
EmpID
1
2
1
4
2
UPDATE PCAccess
SET EmpID = 10
WHERE EmpID = 1
AccessType
Full
Restricted
Full
Full
Semi-Restricted
Example #5
PC#
1
1
2
3
3
EmpID
1
2
1
4
2
DELETE FROM Employee
WHERE EmpID = 2
AccessType
Full
Restricted
Full
Full
Semi-Restricted
Example #6
PC#
1
1
2
3
3
EmpID
1
2
1
4
2
DELETE FROM PCAccess
WHERE EmpID = 2
AccessType
Full
Restricted
Full
Full
Semi-Restricted
Example #7
PC#
1
1
2
3
3
DELETE FROM PC
WHERE PC# = 3
EmpID
1
2
1
4
2
AccessType
Full
Restricted
Full
Full
Semi-Restricted
Cascading
Chain followed until the end
Especially for deletes
If mix of CASCADE, RESTRICT, SET
NULL
Will get all or nothing
Update & Delete Constraints
CREATE TABLE T1
(A
CHAR(5)
B
CHAR(5)
PRIMARY KEY (A,B)
FOREIGN KEY (A) REFERENCES (T2)
ON DELETE RESTRICT
ON UPDATE RESTRICT)
CREATE TABLE T2
(C
CHAR(5)
D
VARCHAR(30) PRIMARY KEY (C))
Constraints
T1
T2
A
X1
X1
X1
X1
X2
X3
X3
B
Y1
Y2
A1
A4
A4
Y5
Y1
C
X1
X2
X3
D
X1 Desc
X2 Desc
X3 Desc
Want to update the value of
X1 to be X11.
What has to happen?
Performance
Requires Knowledge of
DBMS
Applications
Data
Users & Expectations
Environment
Performance Classes
OLTP
On-Line Transaction Processing
OLAP
On-Line Analytic Processing
Mix of OLTP and OLAP
OLTP
Throughput Driven
Throughput - number of transactions
per unit of time
Lots of Transactions
Mix of Update and Query
High Concurrency
OLAP
Response Time Driven
Response Time - single transaction
Very Large, Possibly Complex,
Transactions
Query Evaluation and Optimization
Performance Tuning
Consider the Mix of OLTP & OLAP
Interference Between Types
Example:
Single daily large analytic transaction, rest
simple transactions, locking could prevent
others from running.
Tuning Levels
DBMS
Hardware
Design
Interactions Between Levels
DBMS Parameter Tuning
Specific to DBMS
Buffers - Buffer Pool
Logging - Checkpoints
Lock Management
Space Allocation - Log, Data,
Freespace
Thread Management
Operating System Tuning
Hardware Tuning
Memory
CPU
Disk
RAID
Number of Drives
Partitioning
Architecture -- Parallel Systems?
RAID
Redundant Array of Inexpensive Disks
Appears
as single disk
Physical storage difference - no
database differences
Increase performance
Provide recovery from disk failure
Negative Effect of RAID
MTBF (mean time between failures)
Increase by factor = # of drives used
# Drives
1
2
4
8
MTBF
730 days
365
182 ½
91 ¼
How RAID Works
Striping - dividing equally across all disks
1
2
3
4
1
2
3
4
Stripe 1
5
6
7
8
Stripe 2
9
10
11
12
Stripe 3
Stripe n
RAID Levels
RAID-0
RAID-1
RAID-2
RAID-3
RAID-4
RAID-5
RAID-6
RAID-0
All disks store unique data
Very fast
No fault tolerance or recovery
1
1
2
2
3
3
4
4
5
6
7
8
9
10
11
12
RAID-1
Fully Redundant
Faster Reads/Slower Writes
High fault tolerance -- easy recovery
1
1
2
2
3
1
4
2
3
4
3
4
5
6
5
6
RAID-2
Each record spans all drives
Some disks store ECC (error correction codes)
Parity checks allow error detection and correction
1
1a
2
1b
3
4
ECC
ECC
2a
2b
ECC
ECC
3a
3b
ECC
ECC
RAID-3
Each record spans all drives
One disk stores ECC
Single-User
1
1a
2
1b
3
1c
4
ECC
2a
2b
2c
ECC
3a
3b
3c
ECC
RAID-4
Each record stored on a single disk
One drive for ECC
Multi-user reads; Single-User writes
1
1
2
2
3
3
4
ECC
4
5
6
ECC
7
8
9
ECC
RAID-5 (Rotating Parity Array)
Drive has both data and ECC
ECCs rotate to different drive
Multi-user reads and writes
1
1
2
2
3
3
4
ECC
ECC
4
5
6
6
ECC
7
8
9
10
ECC
11
12
13
14
ECC
RAID-6 P+Q Redundancy
P - “parity”
Q - “extra parity”
2 bits of ECC per 4 bits of data
Handles multiple disk failures
Reed-Solomon codes
Introduction to the Theory of ErrorCorrecting Codes, Pless (1989)
Your Mileage May Vary
“We note that numerous
improvements have been proposed to
the basic RAID schemes described
here. As a result, sometimes there is
confusion about the exact definitions
of the different RAID levels.”
RAID Usage
1, 3, and 5 outperform others
RAID-1 - fastest, no storage cost, but
not fault tolerant
RAID-3 - single-user only
RAID-5 - higher speed than single disk,
fault tolerant, multi-user, but some
storage cost and slower write times
Design Tuning
Transactions
Physical Database
Transaction Tuning
The DBMS optimizes so why worry?
An optimized poorly written transaction
can always be outperformed by a wellwritten nonoptimized one.
EXPLAIN (DB2)
What did the optimizer come up with?
Transaction Tuning
Distributed Databases
Client-Server
Network performance becomes an
additional concern
Transaction Tuning
DBA participation in program reviews
and walkthroughs
Continuous Monitoring
Transaction Tuning Heuristics
Single query instead of multiple queries
“multiple” includes sub-queries
Avoid long-running transactions
Avoid large quantities of updates
Locking and logging
Reduce number of tables joined
Transaction Tuning Heuristics
Reduce sorting
Return less data rather than more
Don’t shift logic from query to program
Optimizer is likely to be faster
Less data is returned
Transaction Tuning Example
Get the names of all managers whose
offices have property listed in Pgh.
SELECT *
FROM Property as P, Office as O, Manager as M,
Employee as E
WHERE P.OfficeNbr = O.OfficeNbr AND O.OfficeNbr
= M.OfficeNbr AND M.EmpID = E.EmpID AND
PropertyID IN (SELECT PropertyID FROM Property
as P2 WHERE P2.City = ‘Pgh’ AND P2.OfficeNbr =
M.OfficeNbr)
Transaction Tuning Example
SELECT *
FROM Property as P, Office as O, Manager as M, Employee as
E
WHERE P.OfficeNbr = O.OfficeNbr AND O.OfficeNbr =
M.OfficeNbr AND M.EmpID = E.EmpID AND
PropertyID IN (SELECT PropertyID FROM Property as P2
WHERE P2.City = ‘Pgh’ AND P2.OfficeNbr = M.OfficeNbr)
More is selected than is needed.
Transaction Tuning Example
SELECT *
FROM Property as P, Office as O, Manager as M, Employee as
E
WHERE P.OfficeNbr = O.OfficeNbr AND O.OfficeNbr =
M.OfficeNbr AND M.EmpID = E.EmpID AND
PropertyID IN (SELECT PropertyID FROM Property as P2
WHERE P2.City = ‘Pgh’ AND P2.OfficeNbr = M.OfficeNbr)
Some joined tables can be eliminated.
Transaction Tuning Example
SELECT *
FROM Property as P, Office as O, Manager as M, Employee as
E
WHERE P.OfficeNbr = O.OfficeNbr AND O.OfficeNbr =
M.OfficeNbr AND M.EmpID = E.EmpID AND
PropertyID IN (SELECT PropertyID FROM Property as P2
WHERE P2.City = ‘Pgh’ AND P2.OfficeNbr = M.OfficeNbr)
Subquery is executed once per office.
Transaction Tuning Example
SELECT E.Name
FROM Employee as E
WHERE E.MgrFlag = 1 AND OfficeNbr IN
(SELECT OfficeNbr FROM Property as P
WHERE P.City = ‘Pgh’)
Version without any joins - 2 single table
queries only.
Transaction Tuning Example
SELECT DISTINCT E.Name
FROM Property as P, Employee as E
WHERE P.OfficeNbr = E.OfficeNbr AND E.MgrFlag = 1 AND
P.City = ‘Pgh’
Single query with join.
Transaction Tuning
Explain (or similar tool) can help to
identify how each transaction will
access the data and what temporary
tables will have to be created to execute
the query
With multiple options, test them
Order of conditions in WHERE can
affect the optimization and performance
I.E., put MgrFlag = 1 first
Physical Database Tuning
Indices
Schema Tuning
Retaining Normalization
Denormalization
Indices
Unique
Nonunique
Single Attribute
Multiple Attributes
(concatenated or
composite key)
Primary Key
Secondary Index
Additional Indices
Index decreases read time but
increases update time
Based on queries - even single query
(EXPLAIN)
Indices need reorganization
Inserts, Updates, Deletes
Specify freespace
Reduce frequency of reorganizations
Schema Tuning Staying Normal
Split Tables - Vertical Partitioning
Highly used vs. infrequently used
columns
Don’t partition if result will be more joins
Keys are duplicated
Schema Tuning Staying Normal
Variable length fields (VARCHAR,
others)
Indeterminant record lengths
Row locations vary
Vertically partition row into two tables,
one with fixed and one with variable
columns
Schema Tuning Leaving Normal
Normalization
Eliminates duplication
Reduces anomalies
Does
not result in efficiency
Denormalize for performance
Denormalization Warnings
Increases chance of errors or inconsistencies
May result in reprogramming if business rules
change
Optimizes based on current transaction mix
Increases duplication and space required
Increases programming complexity
Always normalize first then denormalize
Denormalization
Partition Rows
Combine Tables
Combine and Partition
Replicate Data
Combining Opportunities
One-to-one (optional)
allow nulls
Many-to-many (assoc. entity)
2 tables instead of 3
Reference data (one-to-many)
“one” not use elsewhere
few of “many”
Combining Examples
Employee-Spouse (name and SSN
only)
Owner-PctOwned - Property
few owners with multiple properties
Property-Type (description)
one type per property
Partitioning
Horizontal
By row type
Separate processing by type
Supertype/subtype decision
Vertical (already seen)
Both
Replication
Intentionally repeating data
Example: Owner-PctOwned-Property
Owner includes PctOwned &
PropertyID
Property includes majority OwnerSSN
and PctOwned
Performance Tuning
Not a one-time event
Monitoring probably more important
Things change
applications, database (table) sizes,
data characteristics
hardware, operating system, DBMS