4/3 Physical Database Design

Download Report

Transcript 4/3 Physical Database Design

Information Resources
Management
April 3, 2001
Agenda
Administrivia
 Physical Database Design
 Database Integrity
 Performance

Administrivia

Exam 2
Regrade Requests Exam SQL
Create Database
 Enter query(s) as submitted
 Submit to me
 Database (electronic)
 Graded homework (paper)
 Reserve the right to change test data
and reexecute query

Foreign Keys

Inserts require all FK values be the
value of a primary key in the reference
table

Update and delete constraints are also
possible
Referential Integrity



ON DELETE CASCADE/RESTRICT/SET
NULL
ON UPDATE CASCADE/RESTRICT/SET
NULL
Default
 ON DELETE RESTRICT
 ON UPDATE CASCADE
Example
CREATE TABLE PCAccess
(PC#
INTEGER,
EmpID
CHAR(9),
AccessType
CHAR(15),
PRIMARY KEY (PC#, EmpID),
FOREIGN KEY (EmpID) REFERENCES (Employee),
FOREIGN KEY (PC#) REFERENCES (PC))
Example - PCAccess Table
PC#
EmpID
AccessType
1
1
Full
1
2
Restricted
2
1
Full
3
4
Full
3
2
Semi-Restricted
Example #1
PC#
1
1
2
3
3
EmpID
1
2
1
4
2
INSERT INTO PCAccess (PC#)
VALUES (4)
AccessType
Full
Restricted
Full
Full
Semi-Restricted
Example #2
PC#
1
1
2
3
3
EmpID
1
2
1
4
2
INSERT INTO PCAccess (PC#, EmpID)
VALUES (4,5)
AccessType
Full
Restricted
Full
Full
Semi-Restricted
Example #3
PC#
1
1
2
3
3
EmpID
1
2
1
4
2
UPDATE Employee
SET EmpID = 10
WHERE EmpID = 1
AccessType
Full
Restricted
Full
Full
Semi-Restricted
Example #4
PC#
1
1
2
3
3
EmpID
1
2
1
4
2
UPDATE PCAccess
SET EmpID = 10
WHERE EmpID = 1
AccessType
Full
Restricted
Full
Full
Semi-Restricted
Example #5
PC#
1
1
2
3
3
EmpID
1
2
1
4
2
DELETE FROM Employee
WHERE EmpID = 2
AccessType
Full
Restricted
Full
Full
Semi-Restricted
Example #6
PC#
1
1
2
3
3
EmpID
1
2
1
4
2
DELETE FROM PCAccess
WHERE EmpID = 2
AccessType
Full
Restricted
Full
Full
Semi-Restricted
Example #7
PC#
1
1
2
3
3
DELETE FROM PC
WHERE PC# = 3
EmpID
1
2
1
4
2
AccessType
Full
Restricted
Full
Full
Semi-Restricted
Cascading
Chain followed until the end
 Especially for deletes


If mix of CASCADE, RESTRICT, SET
NULL
 Will get all or nothing
Update & Delete Constraints
CREATE TABLE T1
(A
CHAR(5)
B
CHAR(5)
PRIMARY KEY (A,B)
FOREIGN KEY (A) REFERENCES (T2)
ON DELETE RESTRICT
ON UPDATE RESTRICT)
CREATE TABLE T2
(C
CHAR(5)
D
VARCHAR(30) PRIMARY KEY (C))
Constraints
T1
T2
A
X1
X1
X1
X1
X2
X3
X3
B
Y1
Y2
A1
A4
A4
Y5
Y1
C
X1
X2
X3
D
X1 Desc
X2 Desc
X3 Desc
Want to update the value of
X1 to be X11.
What has to happen?
Performance

Requires Knowledge of
 DBMS
 Applications
 Data
 Users & Expectations
 Environment
Performance Classes
OLTP
 On-Line Transaction Processing
 OLAP
 On-Line Analytic Processing


Mix of OLTP and OLAP
OLTP
Throughput Driven
 Throughput - number of transactions
per unit of time
 Lots of Transactions
 Mix of Update and Query
 High Concurrency

OLAP
Response Time Driven
 Response Time - single transaction
 Very Large, Possibly Complex,
Transactions
 Query Evaluation and Optimization

Performance Tuning
Consider the Mix of OLTP & OLAP
 Interference Between Types

Example:
Single daily large analytic transaction, rest
simple transactions, locking could prevent
others from running.
Tuning Levels
DBMS
 Hardware
 Design


Interactions Between Levels
DBMS Parameter Tuning
Specific to DBMS
 Buffers - Buffer Pool
 Logging - Checkpoints
 Lock Management
 Space Allocation - Log, Data,
Freespace
 Thread Management
 Operating System Tuning

Hardware Tuning
Memory
 CPU
 Disk
 RAID
 Number of Drives
 Partitioning
 Architecture -- Parallel Systems?

RAID

Redundant Array of Inexpensive Disks
 Appears
as single disk
 Physical storage difference - no
database differences
 Increase performance
 Provide recovery from disk failure
Negative Effect of RAID
MTBF (mean time between failures)
 Increase by factor = # of drives used

# Drives
1
2
4
8
MTBF
730 days
365
182 ½
91 ¼
How RAID Works
Striping - dividing equally across all disks
1
2
3
4
1
2
3
4
Stripe 1
5
6
7
8
Stripe 2
9
10
11
12
Stripe 3
Stripe n
RAID Levels
RAID-0
 RAID-1
 RAID-2
 RAID-3
 RAID-4
 RAID-5
 RAID-6

RAID-0



All disks store unique data
Very fast
No fault tolerance or recovery
1
1
2
2
3
3
4
4
5
6
7
8
9
10
11
12
RAID-1



Fully Redundant
Faster Reads/Slower Writes
High fault tolerance -- easy recovery
1
1
2
2
3
1
4
2
3
4
3
4
5
6
5
6
RAID-2



Each record spans all drives
Some disks store ECC (error correction codes)
Parity checks allow error detection and correction
1
1a
2
1b
3
4
ECC
ECC
2a
2b
ECC
ECC
3a
3b
ECC
ECC
RAID-3



Each record spans all drives
One disk stores ECC
Single-User
1
1a
2
1b
3
1c
4
ECC
2a
2b
2c
ECC
3a
3b
3c
ECC
RAID-4



Each record stored on a single disk
One drive for ECC
Multi-user reads; Single-User writes
1
1
2
2
3
3
4
ECC
4
5
6
ECC
7
8
9
ECC
RAID-5 (Rotating Parity Array)



Drive has both data and ECC
ECCs rotate to different drive
Multi-user reads and writes
1
1
2
2
3
3
4
ECC
ECC
4
5
6
6
ECC
7
8
9
10
ECC
11
12
13
14
ECC
RAID-6 P+Q Redundancy
P - “parity”
 Q - “extra parity”
 2 bits of ECC per 4 bits of data
 Handles multiple disk failures
 Reed-Solomon codes
 Introduction to the Theory of ErrorCorrecting Codes, Pless (1989)

Your Mileage May Vary
“We note that numerous
improvements have been proposed to
the basic RAID schemes described
here. As a result, sometimes there is
confusion about the exact definitions
of the different RAID levels.”
RAID Usage

1, 3, and 5 outperform others
RAID-1 - fastest, no storage cost, but
not fault tolerant
 RAID-3 - single-user only
 RAID-5 - higher speed than single disk,
fault tolerant, multi-user, but some
storage cost and slower write times

Design Tuning
Transactions
 Physical Database

Transaction Tuning
The DBMS optimizes so why worry?
An optimized poorly written transaction
can always be outperformed by a wellwritten nonoptimized one.
 EXPLAIN (DB2)
 What did the optimizer come up with?

Transaction Tuning
Distributed Databases
 Client-Server


Network performance becomes an
additional concern
Transaction Tuning
DBA participation in program reviews
and walkthroughs
 Continuous Monitoring

Transaction Tuning Heuristics
Single query instead of multiple queries
 “multiple” includes sub-queries
 Avoid long-running transactions
 Avoid large quantities of updates
 Locking and logging
 Reduce number of tables joined

Transaction Tuning Heuristics
Reduce sorting
 Return less data rather than more


Don’t shift logic from query to program
 Optimizer is likely to be faster
 Less data is returned
Transaction Tuning Example
Get the names of all managers whose
offices have property listed in Pgh.
SELECT *
FROM Property as P, Office as O, Manager as M,
Employee as E
WHERE P.OfficeNbr = O.OfficeNbr AND O.OfficeNbr
= M.OfficeNbr AND M.EmpID = E.EmpID AND
PropertyID IN (SELECT PropertyID FROM Property
as P2 WHERE P2.City = ‘Pgh’ AND P2.OfficeNbr =
M.OfficeNbr)
Transaction Tuning Example
SELECT *
FROM Property as P, Office as O, Manager as M, Employee as
E
WHERE P.OfficeNbr = O.OfficeNbr AND O.OfficeNbr =
M.OfficeNbr AND M.EmpID = E.EmpID AND
PropertyID IN (SELECT PropertyID FROM Property as P2
WHERE P2.City = ‘Pgh’ AND P2.OfficeNbr = M.OfficeNbr)
More is selected than is needed.
Transaction Tuning Example
SELECT *
FROM Property as P, Office as O, Manager as M, Employee as
E
WHERE P.OfficeNbr = O.OfficeNbr AND O.OfficeNbr =
M.OfficeNbr AND M.EmpID = E.EmpID AND
PropertyID IN (SELECT PropertyID FROM Property as P2
WHERE P2.City = ‘Pgh’ AND P2.OfficeNbr = M.OfficeNbr)
Some joined tables can be eliminated.
Transaction Tuning Example
SELECT *
FROM Property as P, Office as O, Manager as M, Employee as
E
WHERE P.OfficeNbr = O.OfficeNbr AND O.OfficeNbr =
M.OfficeNbr AND M.EmpID = E.EmpID AND
PropertyID IN (SELECT PropertyID FROM Property as P2
WHERE P2.City = ‘Pgh’ AND P2.OfficeNbr = M.OfficeNbr)
Subquery is executed once per office.
Transaction Tuning Example
SELECT E.Name
FROM Employee as E
WHERE E.MgrFlag = 1 AND OfficeNbr IN
(SELECT OfficeNbr FROM Property as P
WHERE P.City = ‘Pgh’)
Version without any joins - 2 single table
queries only.
Transaction Tuning Example
SELECT DISTINCT E.Name
FROM Property as P, Employee as E
WHERE P.OfficeNbr = E.OfficeNbr AND E.MgrFlag = 1 AND
P.City = ‘Pgh’
Single query with join.
Transaction Tuning
Explain (or similar tool) can help to
identify how each transaction will
access the data and what temporary
tables will have to be created to execute
the query
 With multiple options, test them
 Order of conditions in WHERE can
affect the optimization and performance
 I.E., put MgrFlag = 1 first

Physical Database Tuning
Indices
 Schema Tuning
 Retaining Normalization
 Denormalization

Indices




Unique
Nonunique
Single Attribute
Multiple Attributes
(concatenated or
composite key)


Primary Key
Secondary Index
Additional Indices
Index decreases read time but
increases update time
 Based on queries - even single query
 (EXPLAIN)
 Indices need reorganization
 Inserts, Updates, Deletes
 Specify freespace
 Reduce frequency of reorganizations

Schema Tuning Staying Normal
Split Tables - Vertical Partitioning
 Highly used vs. infrequently used
columns


Don’t partition if result will be more joins

Keys are duplicated
Schema Tuning Staying Normal
Variable length fields (VARCHAR,
others)
 Indeterminant record lengths
 Row locations vary


Vertically partition row into two tables,
one with fixed and one with variable
columns
Schema Tuning Leaving Normal

Normalization
 Eliminates duplication
 Reduces anomalies
 Does

not result in efficiency
Denormalize for performance
Denormalization Warnings

Increases chance of errors or inconsistencies
May result in reprogramming if business rules
change
Optimizes based on current transaction mix
Increases duplication and space required
Increases programming complexity

Always normalize first then denormalize




Denormalization
Partition Rows
 Combine Tables
 Combine and Partition
 Replicate Data

Combining Opportunities
One-to-one (optional)
 allow nulls
 Many-to-many (assoc. entity)
 2 tables instead of 3
 Reference data (one-to-many)
 “one” not use elsewhere
 few of “many”

Combining Examples
Employee-Spouse (name and SSN
only)
 Owner-PctOwned - Property
 few owners with multiple properties
 Property-Type (description)
 one type per property

Partitioning
Horizontal
 By row type
 Separate processing by type
 Supertype/subtype decision
 Vertical (already seen)
 Both

Replication

Intentionally repeating data

Example: Owner-PctOwned-Property
 Owner includes PctOwned &
PropertyID
 Property includes majority OwnerSSN
and PctOwned
Performance Tuning
Not a one-time event
 Monitoring probably more important
 Things change
 applications, database (table) sizes,
data characteristics
 hardware, operating system, DBMS
