Database Tuning Principles, Experiments and Troubleshooting


Database Tuning
Principles, Experiments and
Troubleshooting Techniques
http://www.mkp.com/dbtune
Dennis Shasha ([email protected])
Philippe Bonnet ([email protected])
1
Availability of Materials
1. Power Point presentation is available on
my web site (and Philippe Bonnet’s). Just
type our names into google
2. Experiments available from Denmark site
maintained by Philippe (site in a few
slides)
3. Book with same title available from
Morgan Kaufmann.
2
Database Tuning
Database Tuning is the activity of making a
database application run more quickly.
“More quickly” usually means higher
throughput, though it may mean lower
response time for time-critical applications.
3
[Diagram: the layers of a database system and who tunes them. The application programmer (e.g., business analyst, data architect) works at the application level; the sophisticated application programmer (e.g., SAP admin) works with the query processor and indexes; the DBA/tuner works with the storage subsystem, concurrency control, recovery, the operating system and the hardware (processor(s), disk(s), memory).]
4
Outline of Tutorial
1. Basic Principles
2. Tuning the guts
3. Indexes
4. Relational Systems
5. Application Interface
6. Ecommerce Applications
7. Data warehouse Applications
8. Distributed Applications
9. Troubleshooting
5
Goal of the Tutorial
• To show:
– Tuning principles that port from one system to
the other and to new technologies
– Experimental results to show the effect of these
tuning principles.
– Troubleshooting techniques for chasing down
performance problems.
6
Tuning Principles Leitmotifs
• Think globally, fix locally (does it matter?)
• Partitioning breaks bottlenecks (temporal
and spatial)
• Start-up costs are high; running costs are
low (disk transfer, cursors)
• Be prepared for trade-offs (indexes and
inserts)
7
Experiments -- why and where
• Simple experiments to illustrate the performance impact of tuning principles.
• See http://www.diku.dk/dbtune/experiments to get the SQL scripts, the data and a tool to run the experiments.
8
Experimental DBMS and
Hardware
• Results presented throughout this tutorial were obtained with:
– SQL Server 7, SQL Server 2000, Oracle 8i, Oracle 9i, DB2 UDB 7.1
– Three configurations:
1. Dual Xeon (550MHz, 512Kb), 1Gb RAM, Internal RAID controller from Adaptec (80Mb), 2 Ultra 160 channels, 4x18Gb drives (10000RPM), Windows 2000.
2. Dual Pentium II (450MHz, 512Kb), 512Mb RAM, 3x18Gb drives (10000RPM), Windows 2000.
3. Pentium III (1GHz, 256Kb), 1Gb RAM, Adaptec 39160 with 2 channels, 3x18Gb drives (10000RPM), Linux Debian 2.4.
9
Tuning the Guts
• Concurrency Control
– How to minimize lock contention?
• Recovery
– How to manage the writes to the log (to dumps)?
• OS
– How to optimize buffer size, process scheduling, …
• Hardware
– How to allocate CPU, RAM and disk subsystem
resources?
10
Isolation
• Correctness vs. Performance
– Number of locks held by each transaction
– Kind of locks
– Length of time a transaction holds locks
11
Isolation Levels
• Read Uncommitted (no lost updates)
– Exclusive locks for write operations are held for the duration of the transaction
– No locks for reads
• Read Committed (no dirty retrieval)
– Shared locks are released as soon as the read operation terminates.
• Repeatable Read (no unrepeatable reads for read/write)
– Two-phase locking
• Serializable (read/write/insert/delete model)
– Table locking or index locking to avoid phantoms
12
Snapshot isolation
• Each transaction executes against the version of the data items that was committed when the transaction started:
– No locks for reads
– Costs space (old copies of data must be kept)
• Almost a serializable level:
– T1: x:=y ; T2: y:=x
– Initially x=3 and y=17
– Serial execution: x,y=17 or x,y=3
– Snapshot isolation: x=17, y=3 if both transactions start at the same time.
13
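The anomaly on this slide is easy to reproduce in a few lines of Python. This sketch is not tied to any particular DBMS; it simply models each transaction reading from the snapshot taken when it started:

```python
# Model snapshot isolation: each transaction reads the committed
# snapshot taken when it started, then installs its writes at commit.
db = {"x": 3, "y": 17}

# Both transactions start "at the same time": same snapshot.
snap_t1 = dict(db)
snap_t2 = dict(db)

# T1: x := y  (reads y from its snapshot)
t1_writes = {"x": snap_t1["y"]}
# T2: y := x  (reads x from its snapshot)
t2_writes = {"y": snap_t2["x"]}

# Both commit: the write sets are disjoint, so snapshot isolation
# raises no conflict, yet no serial order produces this result.
db.update(t1_writes)
db.update(t2_writes)

print(db)  # {'x': 17, 'y': 3}
```

Under either serial order both variables would end up equal (both 17 or both 3); snapshot isolation lets the swap succeed because neither transaction's write set overlaps the other's.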
Value of Serializability -- Data
Settings:
accounts( number, branchnum, balance);
create clustered index c on accounts(number);
– 100000 rows
– Cold buffer; same buffer size on all systems.
– Row level locking
– Isolation level (SERIALIZABLE or READ COMMITTED)
– SQL Server 7, DB2 v7.1 and Oracle 8i on Windows 2000
– Dual Xeon (550MHz,512Kb), 1Gb RAM, Internal RAID controller from Adaptec (80Mb), 4x18Gb drives (10000RPM), Windows 2000.
14
Value of Serializability -transactions
Concurrent Transactions:
– T1: summation query [1 thread]
select sum(balance) from accounts;
– T2: swap balance between two account numbers (in order of scan to avoid deadlocks) [N threads]
valX := select balance from accounts where number=X;
valY := select balance from accounts where number=Y;
update accounts set balance=valX where number=Y;
update accounts set balance=valY where number=X;
15
Value of Serializability -- results
[Graphs: ratio of correct answers (0 to 1) vs. number of concurrent update threads (0 to 10), for SQL Server and Oracle, at read committed and serializable isolation levels.]
• With SQL Server and DB2 the scan returns incorrect answers if the read committed isolation level is used (default setting)
• With Oracle correct answers are returned (snapshot isolation).
16
Cost of Serializability
[Graphs: throughput (trans/sec) vs. number of concurrent update threads (0 to 10), for SQL Server and Oracle, at read committed and serializable isolation levels.]
Because the update conflicts with the scan, correct answers are obtained at the cost of decreased concurrency and thus decreased throughput.
17
Locking Overhead -- data
Settings:
accounts( number, branchnum, balance );
create clustered index c on accounts(number);
– 100000 rows
– Cold buffer
– SQL Server 7, DB2 v7.1 and Oracle 8i on Windows 2000
– No lock escalation on Oracle; parameter set so that there is no lock escalation on DB2; no control on SQL Server.
– Dual Xeon (550MHz,512Kb), 1Gb RAM, Internal RAID controller from Adaptec (80Mb), 4x18Gb drives (10000RPM), Windows 2000.
18
Locking Overhead -- transactions
No Concurrent Transactions:
– Update [10 000 updates]
update accounts set balance = Val;
– Insert [10 000 transactions], e.g. typical one:
insert into accounts values(664366,72255,2296.12);
19
Locking Overhead
[Graph: throughput ratio (row locking/table locking), 0 to 1, for update and insert workloads on DB2, SQL Server and Oracle.]
• Row locking is barely more expensive than table locking because recovery overhead is higher than row locking overhead
– Exception: updates on DB2, where table locking is distinctly less expensive than row locking.
20
Logical Bottleneck: Sequential
Key generation
• Consider an application in which one needs
a sequential number to act as a key in a
table, e.g. invoice numbers for bills.
• Ad hoc approach: a separate table holding
the last invoice number. Fetch and update
that number on each insert transaction.
• Counter approach: use facility such as
Sequence (Oracle)/Identity(MSSQL).
21
Counter Facility -- data
Settings:
accounts( number, branchnum, balance);
create clustered index c on accounts(number);
counter ( nextkey );
insert into counter values (1);
– default isolation level: READ COMMITTED; Empty tables
– Dual Xeon (550MHz,512Kb), 1Gb RAM, Internal RAID
controller from Adaptec (80Mb), 4x18Gb drives (10000RPM),
Windows 2000.
22
Counter Facility -- transactions
No Concurrent Transactions:
– System [100 000 inserts, N threads]
• SQL Server 7 (uses Identity column)
insert into accounts values (94496,2789);
• Oracle 8i
insert into accounts values (seq.nextval,94496,2789);
– Ad-hoc [100 000 inserts, N threads]
begin transaction
NextKey:=select nextkey from counter;
update counter set nextkey = NextKey+1;
insert into accounts values(NextKey,?,?);
commit transaction
23
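The ad-hoc pattern above can be sketched with SQLite (illustrative only; the experiments used SQL Server and Oracle, and the `counter`/`accounts` schemas follow the slide). The fetch-and-update of `nextkey` is exactly the bottleneck the slide describes: every inserter must serialize on the single counter row.

```python
import sqlite3

# Autocommit mode so we can manage transactions explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
cur = conn.cursor()
cur.execute("create table accounts (number int, branchnum int, balance float)")
cur.execute("create table counter (nextkey int)")
cur.execute("insert into counter values (1)")

def insert_with_adhoc_counter(branchnum, balance):
    # The read and update of the single counter row serialize all
    # inserting transactions -- the logical bottleneck of the slide.
    cur.execute("begin")
    (nextkey,) = cur.execute("select nextkey from counter").fetchone()
    cur.execute("update counter set nextkey = nextkey + 1")
    cur.execute("insert into accounts values (?, ?, ?)",
                (nextkey, branchnum, balance))
    cur.execute("commit")
    return nextkey

keys = [insert_with_adhoc_counter(94496, 2789.0) for _ in range(5)]
print(keys)  # [1, 2, 3, 4, 5]
```

A system counter (Oracle Sequence, SQL Server Identity) avoids this because it is maintained outside the transactional table and can hand out cached ranges, at the price of possible gaps in the ids.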
Avoid Bottlenecks: Counters
[Graphs: throughput (statements/sec) vs. number of concurrent insertion threads (0 to 50), system counter vs. ad-hoc counter, for SQL Server and Oracle.]
• A system generated counter (system) is much better than a counter managed as an attribute value within a table (ad hoc). The counter is a separate transaction.
• The Oracle counter can become a bottleneck if every update is logged to disk, but caching many counter numbers is possible.
• Counters may miss ids.
24
Insertion Points -- transactions
No Concurrent Transactions:
– Sequential [100 000 inserts, N threads]
Insertions into account table with clustered index on ssnum
Data is sorted on ssnum
Single insertion point
– Non Sequential [100 000 inserts, N threads]
Insertions into account table with clustered index on ssnum
Data is not sorted (uniform distribution)
100 000 insertion points
– Hashing Key [100 000 inserts, N threads]
Insertions into account table with extra attribute att with clustered index
on (ssnum, att)
Extra attribute att contains hash key (1021 possible values)
1021 insertion points
25
Insertion Points
[Graphs: throughput (tuples/sec) vs. number of concurrent insertion threads (0 to 60), for sequential, non-sequential and hashing-key insertion points, under row locking (0 to 800) and page locking (0 to 140).]
• Page locking: a single insertion point is a source of contention (sequential key with clustered index, or heap)
• Row locking: no contention between successive insertions.
• DB2 v7.1 and Oracle 8i do not support page locking.
26
Atomicity and Durability
[State diagram: BEGIN TRANS leads to ACTIVE (running, waiting); COMMIT leads to COMMITTED; ROLLBACK leads to ABORTED.]
• Every transaction either commits or aborts. It cannot change its mind.
• Even in the face of failures:
– Effects of committed transactions should be permanent;
– Effects of aborted transactions should leave no trace.
27
[Diagram: in unstable storage, the database buffer holds modified pages (Pi, Pj) and the log buffer holds log records (lri, lrj). Log records are written to the log on stable storage before commit; modified pages are written to the data files after commit. Recovery uses the log and the data files on stable storage.]
28
Log IO -- data
Settings:
lineitem ( L_ORDERKEY, L_PARTKEY, L_SUPPKEY, L_LINENUMBER, L_QUANTITY, L_EXTENDEDPRICE, L_DISCOUNT, L_TAX, L_RETURNFLAG, L_LINESTATUS, L_SHIPDATE, L_COMMITDATE, L_RECEIPTDATE, L_SHIPINSTRUCT, L_SHIPMODE, L_COMMENT );
– READ COMMITTED isolation level
– Empty table
– Dual Xeon (550MHz,512Kb), 1Gb RAM, Internal RAID controller
from Adaptec (80Mb), 4x18Gb drives (10000RPM), Windows 2000.
29
Log IO -- transactions
No Concurrent Transactions:
Insertions [300 000 inserts, 10 threads], e.g.,
insert into lineitem values
(1,7760,401,1,17,28351.92,0.04,0.02,'N','O',
'1996-03-13','1996-02-12','1996-03-22',
'DELIVER IN PERSON','TRUCK','blithely regular ideas caj');
30
Group Commits
• DB2 UDB v7.1 on Windows 2000
• Log records of many transactions are written together
– Increases throughput by reducing the number of writes
– At the cost of increased minimum response time.
[Graph: throughput (tuples/sec, 0 to 350) for group commit sizes 1 and 25.]
31
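A toy model of why group commit helps: when a log flush costs far more than formatting a record, batching N commits per flush divides the dominant cost by N. The function below (invented for illustration, not a DB2 API) just counts the flushes a workload needs:

```python
# Toy model of group commit: transactions append their log records to
# a shared buffer; the log is flushed (one expensive I/O) per group.
def flushes_needed(num_transactions, group_size):
    # Each flush hardens up to group_size commits at once.
    return -(-num_transactions // group_size)  # ceiling division

# The slide's workload is 300,000 inserts; compare group sizes 1 and 25.
for group in (1, 5, 25):
    print(group, flushes_needed(300_000, group))
```

The flip side, as the slide notes, is latency: an individual transaction may wait for its group to fill (or for a timeout) before its commit is durable.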
Put the Log on a Separate Disk
[Graph: throughput (tuples/sec, 0 to 350), log on same disk vs. log on separate disk, with and without controller cache.]
• DB2 UDB v7.1 on Windows 2000
• 5% performance improvement if the log is located on a different disk
• Controller cache hides the negative impact
– mid-range server, with Adaptec RAID controller (80Mb RAM) and 2x18Gb disk drives.
32
Tuning Database Writes
• Dirty data is written to disk
– When the number of dirty pages is greater than
a given parameter (Oracle 8)
– When the number of dirty pages crosses a given
threshold (less than 3% of free pages in the
database buffer for SQL Server 7)
– When the log is full, a checkpoint is forced.
This can have a significant impact on
performance.
33
Tune Checkpoint Intervals
• Oracle 8i on Windows 2000
• A checkpoint (partial flush of dirty pages to disk) occurs at regular intervals or when the log is full:
– Impacts the performance of on-line processing
+ Reduces the size of the log
+ Reduces the time to recover from a crash
[Graph: throughput ratio (0 to 1.2), 0 checkpoints vs. 4 checkpoints.]
34
Database Buffer Size
[Diagram: database processes read and write the database buffer in RAM; the buffer exchanges pages with the log and data files on disk; an oversized buffer causes OS paging.]
• Buffer too small, then hit ratio too small
hit ratio = (logical acc. - physical acc.) / (logical acc.)
• Buffer too large, paging
• Recommended strategy: monitor the hit ratio and increase the buffer size until the hit ratio flattens out. If there is still paging, then buy memory.
35
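The hit-ratio formula on this slide as a one-liner. The counter names are illustrative; real systems expose them as logical/physical read counters in their monitoring views:

```python
def buffer_hit_ratio(logical_accesses, physical_accesses):
    # Fraction of page requests served from the buffer rather than disk.
    return (logical_accesses - physical_accesses) / logical_accesses

# 10,000 page requests, 400 of which had to go to disk:
print(buffer_hit_ratio(10_000, 400))  # 0.96
```

The tuning loop the slide recommends is to re-sample these counters after each buffer-size increase and stop growing the buffer once this ratio stops improving.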
Buffer Size -- data
Settings:
employees(ssnum, name, lat, long, hundreds1,
hundreds2);
clustered index c on employees(lat); (unused)
– 10 distinct values of lat and long, 100 distinct values of hundreds1
and hundreds2
– 20000000 rows (630 Mb);
– Warm Buffer
– Dual Xeon (550MHz,512Kb), 1Gb RAM, Internal RAID
controller from Adaptec (80Mb), 4x18Gb drives (10000 RPM),
Windows 2000.
36
Buffer Size -- queries
Queries:
– Scan Query
select sum(long) from employees;
– Multipoint query
select * from employees where lat = ?;
37
Database Buffer Size
[Graphs: throughput (queries/sec) vs. buffer size (0 to 1000 Mb), for the scan query (0 to 0.1) and the multipoint query (0 to 160).]
• SQL Server 7 on Windows 2000
• Scan query:
– LRU (least recently used) does badly when the table spills to disk, as Stonebraker observed 20 years ago.
• Multipoint query:
– Throughput increases with buffer size until all data is accessed from RAM.
38
Scan Performance -- data
Settings:
lineitem ( L_ORDERKEY, L_PARTKEY, L_SUPPKEY, L_LINENUMBER, L_QUANTITY, L_EXTENDEDPRICE, L_DISCOUNT, L_TAX, L_RETURNFLAG, L_LINESTATUS, L_SHIPDATE, L_COMMITDATE, L_RECEIPTDATE, L_SHIPINSTRUCT, L_SHIPMODE, L_COMMENT );
– 600 000 rows
– Lineitem tuples are ~160 bytes long
– Cold Buffer
– Dual Xeon (550MHz,512Kb), 1Gb RAM, Internal RAID controller from Adaptec (80Mb), 4x18Gb drives (10000RPM), Windows 2000.
39
Scan Performance -- queries
Queries:
select avg(l_discount) from lineitem;
40
Prefetching
• DB2 UDB v7.1 on Windows 2000
• Throughput increases up to a certain point when the prefetching size increases.
[Graph: scan throughput (trans/sec, 0 to 0.2) for prefetch sizes 32Kb, 64Kb, 128Kb, 256Kb.]
41
Usage Factor
[Graph: scan throughput (trans/sec, 0 to 0.2) vs. usage factor (70 to 100%).]
• DB2 UDB v7.1 on Windows 2000
• Usage factor is the percentage of the page used by tuples and auxiliary data structures (the rest is reserved for future insertions)
• Scan throughput increases with the usage factor.
42
Tuning the Storage Subsystem
43
Outline
• Storage Subsystem Components
– Moore’s law and consequences
– Magnetic disk performances
• From SCSI to SAN, NAS and Beyond
– Storage virtualization
• Tuning the Storage Subsystem
– RAID levels
– RAID controller cache
44
Exponential Growth
Moore’s law
– Every 18 months:
• New processing = sum of all existing processing
• New storage = sum of all existing storage
– 2x / 18 months ~ 100x / 10 years
http://www.intel.com/research/silicon/moorespaper.pdf
45
Consequences of “Moore’s law”
• Over the last decade:
– 10x better access time
– 10x more bandwidth
– 100x more capacity
– 4000x lower media price
– Scan takes 10x longer (3 min vs 45 min)
– Data on disk is accessed 25x less often (on average)
46
Data Flood
[Graph: petabytes shipped per year, 1988 to 2000 (0 to 3500), vs. Moore's law; disk sales grow much faster.]
• Disk sales double every nine months
– Because the volume of stored data increases: data warehouses, internet logs, web archives, sky surveys
– Because the media price drops much faster than areal density.
Graph courtesy of Joe Hellerstein. Source: J. Porter, Disk/Trend, Inc. http://www.disktrend.com/pdf/portrpkg.pdf
47
Memory Hierarchy
[Diagram: access time and price per Mb at each level of the hierarchy.]
– Processor cache: 1 ns, ~$100/Mb
– RAM: x10, ~$10/Mb
– Disks: x10^6, ~$0.2/Mb
– Tapes / Optical Disks: x10^10, ~$0.2/Mb (nearline)
48
Magnetic Disks
[Diagram: platters on a spindle; tracks; read/write heads on disk arms moved by an actuator; controller and disk interface.]
• 1956: IBM RAMAC, the first disk drive: 5 Mb, 0.002 Mb/in2, $35000/year, 9 Kb/sec
• 1980: Seagate, the first 5.25'' disk drive: 5 Mb, 1.96 Mb/in2, 625 Kb/sec
• 1999: IBM Microdrive, the first 1'' disk drive: 340 Mb, 6.1 MB/sec
49
Magnetic Disks
• Access Time (2001)
– Controller overhead (0.2 ms)
– Seek Time (4 to 9 ms)
– Rotational Delay (2 to 6 ms)
– Read/Write Time (10 to 500 KB/ms)
• Disk Interface
– IDE (16 bits, Ultra DMA, 25 MHz)
– SCSI: width (narrow 8 bits vs. wide 16 bits), frequency (Ultra3, 80 MHz).
http://www.pcguide.com/ref/hdd/
50
RAID Levels
• RAID 0: striping (no redundancy)
• RAID 1: mirroring (2 disks)
• RAID 5: parity checking
– Read: stripes read from multiple disks (in parallel)
– Write: 2 reads + 2 writes
• RAID 10: striping and mirroring
• Software vs. Hardware RAID:
– Software RAID: run on the server’s CPU
– Hardware RAID: run on the RAID controller’s CPU
51
Why 4 read/writes when updating
a single stripe using RAID 5?
• Read old data stripe; read parity stripe (2
reads)
• XOR old data stripe with replacing one.
• Take result of XOR and XOR with parity
stripe.
• Write new data stripe and new parity stripe
(2 writes).
52
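The read-modify-write sequence above can be checked with XOR arithmetic. This sketch (a 3-data + 1-parity layout, values invented) updates one data stripe and folds the change into the parity stripe:

```python
# RAID-5 small write: update one data stripe plus the parity stripe.
old_data = [0b1010, 0b0111, 0b1100]           # three data stripes
parity = old_data[0] ^ old_data[1] ^ old_data[2]

new_value = 0b0001                             # replacement for stripe 0

# 2 reads: the old data stripe and the old parity stripe.
delta = old_data[0] ^ new_value                # what changed
new_parity = parity ^ delta                    # fold the change into parity

# 2 writes: the new data stripe and the new parity stripe.
old_data[0] = new_value
parity = new_parity

# The invariant survives: parity is still the XOR of all data stripes,
# so any single lost stripe can be reconstructed from the others.
assert parity == old_data[0] ^ old_data[1] ^ old_data[2]
```

This is why RAID-5 writes cost four I/Os while reads stripe cheaply across all disks, and why a battery-backed controller cache (next slides) matters so much for write-intensive workloads.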
RAID Levels -- data
Settings:
accounts( number, branchnum, balance);
create clustered index c on accounts(number);
– 100000 rows
– Cold Buffer
– Dual Xeon (550MHz,512Kb), 1Gb RAM, Internal
RAID controller from Adaptec (80Mb), 4x18Gb drives
(10000RPM), Windows 2000.
53
RAID Levels -- transactions
No Concurrent Transactions:
– Read Intensive:
select avg(balance) from accounts;
– Write Intensive, e.g. typical insert:
insert into accounts values (690466,6840,2272.76);
Writes are uniformly distributed.
54
RAID Levels
[Graphs: throughput (tuples/sec) for SoftRAID5, RAID5, RAID0, RAID10, RAID1 and a single disk; read-intensive workload (0 to 80000) and write-intensive workload (0 to 160).]
• SQL Server 7 on Windows 2000 (SoftRAID means striping/parity at host)
• Read-Intensive:
– Using multiple disks (RAID0, RAID10, RAID5) increases throughput significantly.
• Write-Intensive:
– Without cache, RAID 5 suffers. With cache, it is ok.
55
RAID Levels
• Log File
– RAID 1 is appropriate
• Fault tolerance with high write throughput. Writes are
synchronous and sequential. No benefits in striping.
• Temporary Files
– RAID 0 is appropriate.
• No fault tolerance. High throughput.
• Data and Index Files
– RAID 5 is best suited for read intensive apps or if the
RAID controller cache is effective enough.
– RAID 10 is best suited for write intensive apps.
56
Controller Prefetching no,
Write-back yes.
• Read-ahead:
– Prefetching at the disk controller level.
– No information on access pattern.
– Better to let database management system do it.
• Write-back vs. write through:
– Write back: transfer terminated as soon as data is
written to cache.
• Batteries to guarantee write back in case of power failure
– Write through: transfer terminated as soon as data is
written to disk.
57
SCSI Controller Cache -- data
Settings:
employees(ssnum, name, lat, long, hundreds1,
hundreds2);
create clustered index c on
employees(hundreds2);
– Employees table partitioned over two disks; Log on a separate
disk; same controller (same channel).
– 200 000 rows per table
– Database buffer size limited to 400 Mb.
– Dual Xeon (550MHz,512Kb), 1Gb RAM, Internal RAID
controller from Adaptec (80Mb), 4x18Gb drives (10000RPM),
Windows 2000.
58
SCSI (not disk) Controller Cache
-- transactions
No Concurrent Transactions:
update employees set lat = long, long = lat
where hundreds2 = ?;
– cache friendly: update of 20,000 rows (~90Mb)
– cache unfriendly: update of 200,000 rows (~900Mb)
59
SCSI Controller Cache
[Graph: throughput (tuples/sec, 0 to 2000), with and without controller cache, on 2 disks with an 80Mb cache, for cache friendly (90Mb) and cache unfriendly (900Mb) updates.]
• SQL Server 7 on Windows 2000.
• Adaptec ServerRaid controller: 80 Mb RAM, write-back mode
• Updates
• Controller cache increases throughput whether the operation is cache friendly or not.
– Efficient replacement policy!
60
Index Tuning
• Index issues
– Indexes may be better or worse than scans
– Multi-table joins that run on for hours, because
the wrong indexes are defined
– Concurrency control bottlenecks
– Indexes that are maintained and never used
61
Index
An index is a data structure that supports
efficient access to data
[Diagram: a condition on attribute values is looked up in the index (search key), which selects the matching records from the set of records.]
62
Search Keys
• A (search) key is a sequence of attributes.
create index i1 on accounts(branchnum, balance);
• Types of keys
– Sequential: the value of the key is monotonic
with the insertion order (e.g., counter or
timestamp)
– Non sequential: the value of the key is
unrelated to the insertion order (e.g., social
security number)
64
Data Structures
• Most index data structures can be viewed as trees.
• In general, the root of this tree will always be in
main memory, while the leaves will be located on
disk.
– The performance of a data structure depends on the
number of nodes in the average path from the root to
the leaf.
– Data structure with high fan-out (maximum number of
children of an internal node) are thus preferred.
65
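The fan-out argument on this slide can be made concrete: the number of levels on the root-to-leaf path grows with the logarithm, base fan-out, of the record count, so high fan-out keeps paths short. A small illustrative calculation:

```python
import math

def tree_levels(num_records, fanout):
    # Levels on the root-to-leaf path of a balanced tree indexing
    # num_records entries with the given fan-out per internal node.
    return max(1, math.ceil(math.log(num_records, fanout)))

# A B+-tree with fan-out 100 reaches a billion records in 5 levels;
# a binary structure (fan-out 2) needs about 30.
print(tree_levels(10**9, 100), tree_levels(10**9, 2))
```

With the root (and usually the next level) pinned in main memory, a lookup in the fan-out-100 tree costs only a few disk accesses even at a billion records.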
B+-Tree
• A B+-Tree is a balanced tree whose leaves
contain a sequence of key-pointer pairs.
[Figure: a three-level B+-Tree; root keys 96 and 107; internal keys 75 and 83; leaves hold sorted key sequences 33 48 69 | 75 80 81 | 83 92 95 | 96 98 103 | 107 110 120.]
66
B+-Tree Performance
• Tree levels
– Tree Fanout
• Size of key
• Page utilization
• Tree maintenance
– Online
• On inserts
• On deletes
– Offline
• Tree locking
• Tree root in main memory
67
B+-Tree Performance
• Key length influences fanout
– Choose small keys when creating an index
– Key compression
• Prefix compression (Oracle 8, MySQL): only store the part of the key needed to distinguish it from its neighbors: Smi, Smo, Smy for Smith, Smoot, Smythe.
• Front compression (Oracle 5): adjacent keys have their front portion factored out: Smi, (2)o, (2)y. There are problems with this approach:
– Processor overhead for maintenance
– Locking Smoot requires locking Smith too.
68
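The prefix-compression idea (keep just enough of each key to tell it apart from its neighbors) can be sketched like this; the three names come straight from the slide, and the exact rule real systems use may differ in detail:

```python
def prefix_compress(sorted_keys):
    # For each key, keep the shortest prefix that distinguishes it
    # from both neighbors: one character past the longest common
    # prefix it shares with either of them.
    def common(a, b):
        n = 0
        while n < min(len(a), len(b)) and a[n] == b[n]:
            n += 1
        return n

    out = []
    for i, key in enumerate(sorted_keys):
        prev = sorted_keys[i - 1] if i > 0 else ""
        nxt = sorted_keys[i + 1] if i + 1 < len(sorted_keys) else ""
        keep = max(common(key, prev), common(key, nxt)) + 1
        out.append(key[:keep])
    return out

print(prefix_compress(["Smith", "Smoot", "Smythe"]))  # ['Smi', 'Smo', 'Smy']
```

Shorter separator keys mean more of them per page, hence higher fan-out and a shallower tree, which is exactly why compression pays despite the extra CPU cost noted above.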
Hash Index
• A hash index stores key-value pairs based on a pseudo-randomizing function called a hash function.
[Figure: a key (e.g., 2341) is mapped by the hash function to one of buckets 0..n; each bucket points to a chain of records (e.g., R1 R5; R3 R6 R9 R14 R17 R21 R25). The length of these chains impacts performance.]
69
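A minimal dict-of-lists model of the bucket chains in the figure, showing why chain length drives lookup cost (bucket count and data are invented for illustration):

```python
# Minimal hash index: an array of buckets, each holding a chain of
# (key, record) pairs. Lookup cost is proportional to chain length.
NUM_BUCKETS = 8
buckets = [[] for _ in range(NUM_BUCKETS)]

def insert(key, record):
    buckets[hash(key) % NUM_BUCKETS].append((key, record))

def lookup(key):
    # Walk the chain of the single bucket the key hashes to.
    return [rec for k, rec in buckets[hash(key) % NUM_BUCKETS] if k == key]

# Only 5 distinct key values among 100 records -> long overflow chains,
# as in the hundreds2 experiment a few slides later.
for i in range(100):
    insert(i % 5, f"R{i}")

print(max(len(b) for b in buckets))
```

A skewed or low-cardinality key concentrates records in a few chains, so each point lookup degenerates toward a list scan; that is the overflow-chain effect measured on the bitmap/hash/B-tree slides.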
Clustered / Non clustered index
• Clustered index (primary index)
– A clustered index on attribute X co-locates records whose X values are near to one another.
• Non-clustered index (secondary index)
– A non-clustered index does not constrain table organization.
– There might be several non-clustered indexes per table.
70
Dense / Sparse Index
• Sparse index
– Pointers are associated to pages (P1, P2, ..., Pi)
• Dense index
– Pointers are associated to records
– Non-clustered indexes are dense
71
Index Implementations in some
major DBMS
• SQL Server
– B+-Tree data structure
– Clustered indexes are sparse
– Indexes maintained as
updates/insertions/deletes
are performed
• DB2
– B+-Tree data structure,
spatial extender for R-tree
– Clustered indexes are dense
– Explicit command for index
reorganization
• Oracle
– B+-tree, hash, bitmap,
spatial extender for R-Tree
– clustered index
• Index organized table
(unique/clustered)
• Clusters used when
creating tables.
• TimesTen (Main-memory
DBMS)
– T-tree
72
Types of Queries
• Point Query
SELECT balance
FROM accounts
WHERE number = 1023;
• Multipoint Query
SELECT balance
FROM accounts
WHERE branchnum = 100;
• Range Query
SELECT number
FROM accounts
WHERE balance > 10000 and balance <= 20000;
• Prefix Match Query
SELECT *
FROM employees
WHERE name = 'J*';
73
More Types of Queries
• Extremal Query
SELECT *
FROM accounts
WHERE balance = max(select balance from accounts);
• Grouping Query
SELECT branchnum, avg(balance)
FROM accounts
GROUP BY branchnum;
• Ordering Query
SELECT *
FROM accounts
ORDER BY balance;
• Join Query
SELECT distinct branch.adresse
FROM accounts, branch
WHERE accounts.branchnum = branch.number
and accounts.balance > 10000;
74
Index Tuning -- data
Settings:
employees(ssnum, name, lat, long, hundreds1,
hundreds2);
clustered index c on employees(hundreds1)
with fillfactor = 100;
nonclustered index nc on employees (hundreds2);
index nc3 on employees (ssnum, name, hundreds2);
index nc4 on employees (lat, ssnum, name);
– 1000000 rows ; Cold buffer
– Dual Xeon (550MHz,512Kb), 1Gb RAM, Internal RAID controller from Adaptec
(80Mb), 4x18Gb drives (10000RPM), Windows 2000.
75
Index Tuning -- operations
Operations:
– Update:
update employees set name = 'XXX' where ssnum = ?;
– Insert:
insert into employees values
(1003505,'polo94064',97.48,84.03,4700.55,3987.2);
– Multipoint query:
select * from employees where hundreds1= ?;
select * from employees where hundreds2= ?;
– Covered query:
select ssnum, name, lat from employees;
– Range Query:
select * from employees where long between ? and ?;
– Point Query:
select * from employees where ssnum = ?;
76
Clustered Index
[Graph: throughput ratio (0 to 1) for clustered index, non-clustered index and no index, on SQL Server, Oracle and DB2.]
• Multipoint query that returns 100 records out of 1000000.
• Cold buffer
• The clustered index is twice as fast as the non-clustered index and orders of magnitude faster than a scan.
77
Index "Face Lifts"
[Graph: SQL Server throughput (queries/sec, 0 to 100) vs. % increase in table size (0 to 100), with and without maintenance.]
• Index is created with fillfactor = 100.
• Insertions cause page splits and extra I/O for each query
• Maintenance consists in dropping and recreating the index
• With maintenance, performance is constant, while performance degrades significantly if no maintenance is performed.
78
Index Maintenance
[Graph: Oracle throughput (queries/sec, 0 to 20) vs. % increase in table size (0 to 100), no maintenance.]
• In Oracle, clustered indexes are approximated by an index defined on a clustered table
• No automatic physical reorganization
• Index defined with pctfree = 0
• Overflow pages cause performance degradation
79
Covering Index - defined
• Select name from employee where department = 'marketing'
• A good covering index would be on (department, name)
• An index on (name, department) is less useful.
• An index on department alone is moderately useful.
80
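The point is easy to see in SQLite (illustrative only; the tutorial's measurements used SQL Server). With an index on (department, name), the query below is answered from the index alone, which the query plan reports as a covering index:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("create table employee (ssnum int, name text, department text)")
cur.executemany("insert into employee values (?, ?, ?)",
                [(1, "Ann", "marketing"), (2, "Bob", "sales"),
                 (3, "Carol", "marketing")])
# First attribute matches the where clause, second the select list.
cur.execute("create index cover on employee(department, name)")

plan = cur.execute(
    "explain query plan "
    "select name from employee where department = 'marketing'").fetchall()
print(plan)  # the plan detail mentions COVERING INDEX cover
```

Because both the filter column and the output column live in the index entries, the table itself is never touched; an index on (name, department) could not be searched by department and would lose that benefit.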
Covering Index - impact
[Graph: SQL Server throughput (queries/sec, 0 to 70) for covering, covering not ordered, non-clustering and clustering indexes.]
• A covering index performs better than a clustering index when the first attributes of the index are in the where clause and the last attributes in the select.
• When the attributes are not in order, performance is much worse.
81
Scan Can Sometimes Win
[Graph: throughput (queries/sec) vs. % of selected records (0 to 25), scan vs. non-clustering index.]
• IBM DB2 v7.1 on Windows 2000
• Range Query
• If a query retrieves 10% of the records or more, scanning is often better than using a non-clustering, non-covering index. The crossover is above 10% when records are large or the table is fragmented on disk, since scan cost increases.
82
Index on Small Tables
[Graph: throughput (updates/sec, 0 to 18), no index vs. index.]
• Small table: 100 records, i.e., a few pages.
• Two concurrent processes perform updates (each process works for 10ms before it commits)
• No index: the table is scanned for each update. No concurrent updates.
• A clustered index makes it possible to take advantage of row locking.
83
Bitmap vs. Hash vs. B+-Tree
Settings:
employees(ssnum, name, lat, long, hundreds1,
hundreds2);
create cluster c_hundreds (hundreds2 number(8)) PCTFREE 0;
create cluster c_ssnum(ssnum integer) PCTFREE 0 size 60;
create cluster c_hundreds(hundreds2 number(8)) PCTFREE 0
HASHKEYS 1000 size 600;
create cluster c_ssnum(ssnum integer) PCTFREE 0 HASHKEYS
1000000 SIZE 60;
create bitmap index b on employees (hundreds2);
create bitmap index b2 on employees (ssnum);
– 1000000 rows ; Cold buffer
– Dual Xeon (550MHz,512Kb), 1Gb RAM, Internal RAID controller from Adaptec (80Mb), 4x18Gb drives (10000RPM), Windows 2000.
84
Multipoint Query: B-Tree, Hash Tree, Bitmap
[Graph: multipoint-query throughput (queries/sec, 0 to 25) for B-Tree, hash index and bitmap index.]
• There is an overflow chain in a hash index because hundreds2 has few values
• In a clustered B-Tree index records are on contiguous pages.
• The bitmap is proportional to the size of the table, and record access through it is non-clustered.
85
B-Tree, Hash Tree, Bitmap
[Graphs: range-query throughput (queries/sec, 0 to 0.5) and point-query throughput (queries/sec, 0 to 60) for B-Tree, hash index and bitmap index.]
• Hash indexes don't help when evaluating range queries
• The hash index outperforms the B-tree on point queries
86
Tuning Relational Systems
• Schema Tuning
– Denormalization, Vertical Partitioning
• Query Tuning
– Query rewriting
– Materialized views
87
Denormalizing -- data
Settings:
lineitem ( L_ORDERKEY, L_PARTKEY, L_SUPPKEY, L_LINENUMBER, L_QUANTITY, L_EXTENDEDPRICE, L_DISCOUNT, L_TAX, L_RETURNFLAG, L_LINESTATUS, L_SHIPDATE, L_COMMITDATE, L_RECEIPTDATE, L_SHIPINSTRUCT, L_SHIPMODE, L_COMMENT );
region( R_REGIONKEY, R_NAME, R_COMMENT );
nation( N_NATIONKEY, N_NAME, N_REGIONKEY, N_COMMENT );
supplier( S_SUPPKEY, S_NAME, S_ADDRESS, S_NATIONKEY, S_PHONE, S_ACCTBAL, S_COMMENT );
– 600000 rows in lineitem, 25 nations, 5 regions, 500 suppliers
88
Denormalizing -- transactions
lineitemdenormalized ( L_ORDERKEY, L_PARTKEY, L_SUPPKEY, L_LINENUMBER, L_QUANTITY, L_EXTENDEDPRICE, L_DISCOUNT, L_TAX, L_RETURNFLAG, L_LINESTATUS, L_SHIPDATE, L_COMMITDATE, L_RECEIPTDATE, L_SHIPINSTRUCT, L_SHIPMODE, L_COMMENT, L_REGIONNAME );
– 600000 rows in lineitemdenormalized
– Cold Buffer
– Dual Pentium II (450MHz, 512Kb), 512 Mb RAM,
3x18Gb drives (10000RPM), Windows 2000.
89
Queries on Normalized vs.
Denormalized Schemas
Queries:
select L_ORDERKEY, L_PARTKEY, L_SUPPKEY, L_LINENUMBER, L_QUANTITY,
L_EXTENDEDPRICE, L_DISCOUNT, L_TAX, L_RETURNFLAG, L_LINESTATUS,
L_SHIPDATE, L_COMMITDATE, L_RECEIPTDATE, L_SHIPINSTRUCT, L_SHIPMODE,
L_COMMENT, R_NAME
from LINEITEM, REGION, SUPPLIER, NATION
where
L_SUPPKEY = S_SUPPKEY
and S_NATIONKEY = N_NATIONKEY
and N_REGIONKEY = R_REGIONKEY
and R_NAME = 'EUROPE';
select L_ORDERKEY, L_PARTKEY, L_SUPPKEY, L_LINENUMBER, L_QUANTITY,
L_EXTENDEDPRICE, L_DISCOUNT, L_TAX, L_RETURNFLAG, L_LINESTATUS,
L_SHIPDATE, L_COMMITDATE, L_RECEIPTDATE, L_SHIPINSTRUCT, L_SHIPMODE,
L_COMMENT, L_REGIONNAME
from LINEITEMDENORMALIZED
where L_REGIONNAME = 'EUROPE';
90
Denormalization
[Graph: throughput (queries/sec, 0 to 0.002), normalized vs. denormalized.]
• TPC-H schema
• Query: find all lineitems whose supplier is in Europe.
• With a normalized schema this query is a 4-way join.
• If we denormalize lineitem and add the name of the region to each lineitem (foreign key denormalization), throughput improves 30%.
91
Vertical Partitioning
• Consider account(id, balance, homeaddress)
• When might it be a good idea to do a
“vertical partitioning” into
account1(id,balance) and
account2(id,homeaddress)?
• Join vs. size.
92
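A hedged sketch of the split in SQLite (table and column names come from the slide; the data is invented): queries on the hot path touch only the narrow (id, balance) table, while the address is recovered with a join when the monthly statement needs it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Vertical partitioning of account(id, balance, homeaddress):
cur.execute("create table account1 (id integer primary key, balance float)")
cur.execute("create table account2 (id integer primary key, homeaddress text)")
cur.execute("insert into account1 values (1, 250.0), (2, 80.0)")
cur.execute("insert into account2 values (1, '12 Main St'), (2, '9 Oak Ave')")

# Frequent balance lookups read only the narrow table:
balance = cur.execute(
    "select balance from account1 where id = 1").fetchone()

# The monthly statement reassembles the full row with a join on id:
row = cur.execute(
    "select a1.id, a1.balance, a2.homeaddress "
    "from account1 a1 join account2 a2 on a1.id = a2.id "
    "where a1.id = 2").fetchone()
print(balance, row)
```

The trade-off is exactly the slide's "join vs. size": the narrow table packs more (id, balance) pairs per page, but any query needing all three attributes now pays for the join.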
Vertical Partitioning
• Which design is better
depends on the query
pattern:
– The application that sends a
monthly statement is the
principal user of the address
of the owner of an account
– The balance is updated or
examined several times a
day.
• The second schema might
be better because the
relation (account_ID,
balance) can be made
smaller:
– More account_ID, balance
pairs fit in memory, thus
increasing the hit ratio
– A scan performs better
because there are fewer
pages.
93
Tuning Normalization
• A single normalized relation XYZ is better than
two normalized relations XY and XZ if the single
relation design allows queries to access X, Y and
Z together without requiring a join.
• The two-relation design is better iff:
– User accesses tend to partition between the two sets Y and Z most of the time
– Attributes Y or Z have large values
94
Vertical Partitioning
and Scan
• R (X,Y,Z)
– X is an integer
– YZ are large strings
Througput (queries/sec)
0.02
0.015
0.01
0.005
0
No Partitioning Vertical
No Partitioning Vertical
XYZ
Partitioning - XYZ
XY
Partitioning - XY
• Scan Query
• Vertical partitioning
exhibits poor performance
when all attributes are
accessed.
• Vertical partitioning
provides a speed-up if only
two of the attributes are
accessed.
95
Vertical Partitioning
and Point Queries
• R (X,Y,Z)
– X is an integer
– YZ are large strings
[Chart: throughput (queries/sec) vs. % of accesses that concern only XY, with and without vertical partitioning]
• A mix of point queries
access either XYZ or XY.
• Vertical partitioning gives
a performance advantage
if the proportion of queries
accessing only XY is
greater than 20%.
• The join is not expensive
compared to a simple
look-up.
96
Queries
Settings:
employee(ssnum, name, dept, salary, numfriends);
student(ssnum, name, course, grade);
techdept(dept, manager, location);
clustered index i1 on employee (ssnum);
nonclustered index i2 on employee (name);
nonclustered index i3 on employee (dept);
clustered index i4 on student (ssnum);
nonclustered index i5 on student (name);
clustered index i6 on techdept (dept);
– 100000 rows in employee, 100000 students, 10 departments; Cold buffer
– Dual Pentium II (450MHz, 512Kb), 512 Mb RAM, 3x18Gb drives
(10000RPM), Windows 2000.
97
Queries – View on Join
View Techlocation:
create view techlocation as
select ssnum, techdept.dept, location
from employee, techdept
where employee.dept = techdept.dept;
Queries:
– Original:
select dept from techlocation where ssnum = ?;
– Rewritten:
select dept from employee where ssnum = ?;
98
Query Rewriting - Views
[Chart: throughput improvement percent of the rewritten query over the view query, for SQLServer 2000, Oracle 8i, and DB2 V7.1]
• All systems expand the
selection on a view into a
join
• The difference between a
plain selection and a join
(on a primary key-foreign
key) followed by a
projection is greater on
SQL Server than on
Oracle and DB2 v7.1.
99
Queries – Correlated Subqueries
Queries:
– Original:
select ssnum
from employee e1
where salary =
(select max(salary)
from employee e2
where e2.dept = e1.dept);
– Rewritten:
select max(salary) as bigsalary, dept
into TEMP
from employee group by dept;
select ssnum
from employee, TEMP
where salary = bigsalary
and employee.dept = temp.dept;
100
Query Rewriting –
Correlated Subqueries
[Chart: throughput improvement percent of the rewritten query over the correlated subquery, for SQLServer 2000, Oracle 8i, and DB2 V7.1; two of the improvements exceed 1000% and 10000%]
• SQL Server 2000 does a
good job at handling the
correlated subqueries (a
hash join is used as
opposed to a nested loop
between query blocks)
– The techniques
implemented in SQL Server
2000 are described in
“Orthogonal Optimization
of Subqueries and
Aggregates” by C. Galindo-Legaria and M. Joshi,
SIGMOD 2001.
101
Eliminate unneeded DISTINCTs
• Query: Find employees who work in the
information systems department. There should be
no duplicates.
SELECT distinct ssnum
FROM employee
WHERE dept = 'information systems'
• DISTINCT is unnecessary, since ssnum is a key of
employee, so it is certainly a key of any subset of
employee.
102
Eliminate unneeded DISTINCTs
• Query: Find social security numbers of
employees in the technical departments.
There should be no duplicates.
SELECT DISTINCT ssnum
FROM employee, tech
WHERE employee.dept = tech.dept
• Is DISTINCT needed?
103
Distinct Unnecessary Here Too
• Since dept is a key of the tech table, each
employee record will join with at most one
record in tech.
• Because ssnum is a key for employee,
distinct is unnecessary.
104
Reaching
• The relationship among DISTINCT, keys and
joins can be generalized:
– Call a table T privileged if the fields returned by the
select contain a key of T
– Let R be an unprivileged table. Suppose that R is joined
on equality by its key field to some other table S, then
we say R reaches S.
– Now, define reaches to be transitive. So, if R1 reaches
R2 and R2 reaches R3 then say that R1 reaches R3.
105
Reaches: Main Theorem
• There will be no duplicates among the
records returned by a selection, even in the
absence of DISTINCT if one of the two
following conditions hold:
– Every table mentioned in the FROM clause is
privileged
– Every unprivileged table reaches at least one
privileged table.
106
Reaches: Proof Sketch
• If every relation is privileged then there are no
duplicates
– The keys of those relations appear in the select clause
• Suppose some relation T is not privileged but
reaches at least one privileged one, say R. Then
the qualifications linking T with R ensure that
each distinct combination of privileged records is
joined with at most one record of T.
107
Reaches: Example 1
SELECT ssnum
FROM employee, tech
WHERE employee.manager = tech.manager
• The same employee record may match several
tech records (because manager is not a key of
tech), so the ssnum of that employee record may
appear several times.
• Tech does not reach the privileged relation
employee.
108
Reaches: Example 2
SELECT ssnum, tech.dept
FROM employee, tech
WHERE employee.manager = tech.manager
• Each repetition of a given ssnum value
would be accompanied by a new tech.dept
since tech.dept is a key of tech
• Both relations are privileged.
109
Reaches: Example 3
SELECT student.ssnum
FROM student, employee, tech
WHERE student.name = employee.name
AND employee.dept = tech.dept;
• Student is privileged
• Employee does not reach student (name is not a
key of employee)
• DISTINCT is needed to avoid duplicates.
110
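The reaching rule above is mechanical enough to check programmatically. Below is a minimal sketch in Python (not from the book; the helper `no_duplicates` and its input encoding are illustrative) that tests the two conditions of the main theorem against the three examples.

```python
def no_duplicates(keys, select_fields, tables, joins):
    """Return True if DISTINCT is provably unnecessary.

    keys: table -> key field; select_fields: set of (table, field)
    returned by the select; joins: list of (t1, f1, t2, f2) equality
    predicates (t1.f1 = t2.f2).
    """
    # A table is privileged if the select returns its key.
    privileged = {t for t in tables if (t, keys[t]) in select_fields}
    # R reaches S if R is joined on equality by its key field to S.
    reaches = {t: set() for t in tables}
    for t1, f1, t2, f2 in joins:
        if f1 == keys[t1]:
            reaches[t1].add(t2)
        if f2 == keys[t2]:
            reaches[t2].add(t1)
    # Make "reaches" transitive.
    changed = True
    while changed:
        changed = False
        for t in tables:
            for s in list(reaches[t]):
                new = reaches[s] - reaches[t]
                if new:
                    reaches[t] |= new
                    changed = True
    # No duplicates iff every unprivileged table reaches a privileged one.
    return all(t in privileged or bool(reaches[t] & privileged)
               for t in tables)

keys = {"employee": "ssnum", "tech": "dept", "student": "ssnum"}

# Example 1: join on manager (not a key of tech), duplicates possible.
ex1 = no_duplicates(keys, {("employee", "ssnum")}, ["employee", "tech"],
                    [("employee", "manager", "tech", "manager")])
# Example 2: both tables privileged, no duplicates.
ex2 = no_duplicates(keys, {("employee", "ssnum"), ("tech", "dept")},
                    ["employee", "tech"],
                    [("employee", "manager", "tech", "manager")])
# Example 3: employee is neither privileged nor reaching, duplicates possible.
ex3 = no_duplicates(keys, {("student", "ssnum")},
                    ["student", "employee", "tech"],
                    [("student", "name", "employee", "name"),
                     ("employee", "dept", "tech", "dept")])
print(ex1, ex2, ex3)
```

The checker reproduces the three verdicts of slides 108-110: duplicates possible, no duplicates, duplicates possible.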
Aggregate Maintenance -- data
Settings:
orders( ordernum, itemnum, quantity, storeid, vendorid );
create clustered index i_order on orders(itemnum);
store( storeid, name );
item(itemnum, price);
create clustered index i_item on item(itemnum);
vendorOutstanding( vendorid, amount);
storeOutstanding( storeid, amount);
– 1000000 orders, 10000 stores, 400000 items; Cold buffer
– Dual Pentium II (450MHz, 512Kb), 512 Mb RAM, 3x18Gb drives
(10000RPM), Windows 2000.
111
Aggregate Maintenance -- triggers
Triggers for Aggregate Maintenance
create trigger updateVendorOutstanding on orders for insert as
update vendorOutstanding
set amount =
(select vendorOutstanding.amount+sum(inserted.quantity*item.price)
from inserted,item
where inserted.itemnum = item.itemnum
)
where vendorid = (select vendorid from inserted) ;
create trigger updateStoreOutstanding on orders for insert as
update storeOutstanding
set amount =
(select storeOutstanding.amount+sum(inserted.quantity*item.price)
from inserted,item
where inserted.itemnum = item.itemnum
)
where storeid = (select storeid from inserted)
112
Aggregate Maintenance -- transactions
Concurrent Transactions:
– Insertions
insert into orders values
(1000350,7825,562,'xxxxxx6944','vendor4');
– Queries (first without, then with redundant tables)
select orders.vendorid, sum(orders.quantity*item.price)
from orders,item
where orders.itemnum = item.itemnum
group by orders.vendorid;
vs. select * from vendorOutstanding;
select store.storeid, sum(orders.quantity*item.price)
from orders,item, store
where orders.itemnum = item.itemnum
and orders.storeid = store.storeid
group by store.storeid;
vs. select * from storeOutstanding;
113
Aggregate Maintenance
[Chart: percent of gain with aggregate maintenance; the vendor total and store total queries gain roughly 31900% and 21900%, while inserts slow down by 62.2%]
• SQLServer 2000 on
Windows 2000
• Using triggers for
view maintenance
• If queries frequent or
important, then
aggregate maintenance
is good.
114
Superlinearity -- data
Settings:
sales( id, itemid, customerid, storeid, amount, quantity);
item (itemid);
customer (customerid);
store (storeid);
A sale is successful if all foreign keys are present.
successfulsales(id, itemid, customerid, storeid, amount,
quantity);
unsuccessfulsales(id, itemid, customerid, storeid, amount,
quantity);
tempsales( id, itemid, customerid, storeid,
amount,quantity);
115
Superlinearity -- indexes
Settings (non-clustering, dense indexes):
index s1 on item(itemid);
index s2 on customer(customerid);
index s3 on store(storeid);
index succ on successfulsales(id);
– 1000000 sales, 400000 customers, 40000 items, 1000 stores
– Cold buffer
– Dual Pentium II (450MHz, 512Kb), 512 Mb RAM, 3x18Gb drives
(10000RPM), Windows 2000.
116
Superlinearity -- queries
Queries:
– Insert/delete
insert into successfulsales
select sales.id, sales.itemid, sales.customerid,
sales.storeid, sales.amount, sales.quantity
from sales, item, customer, store
where sales.itemid = item.itemid
and sales.customerid = customer.customerid
and sales.storeid = store.storeid;
insert into unsuccessfulsales
select * from sales;
go
delete from unsuccessfulsales
where id in (select id from successfulsales)
117
Superlinearity -- batch queries
Queries:
– Small batches
DECLARE @Nlow INT;
DECLARE @Nhigh INT;
DECLARE @INCR INT;
set @INCR = 100000
set @NLow = 0
set @Nhigh = @INCR
WHILE (@NLow <= 500000)
BEGIN
insert into tempsales
select * from sales
where id between @NLow and @Nhigh
set @Nlow = @Nlow + @INCR
set @Nhigh = @Nhigh + @INCR
delete from tempsales
where id in (select id from successfulsales);
insert into unsuccessfulsales
select * from tempsales;
delete from tempsales;
END
118
Superlinearity -- outer join
Queries:
– outerjoin
insert into successfulsales
select sales.id, item.itemid, customer.customerid, store.storeid,
sales.amount, sales.quantity
from
((sales left outer join item on sales.itemid = item.itemid)
left outer join customer on sales.customerid = customer.customerid)
left outer join store on sales.storeid = store.storeid;
insert into unsuccessfulsales
select *
from successfulsales
where itemid is null
or customerid is null
or storeid is null;
go
delete from successfulsales
where itemid is null
or customerid is null
or storeid is null
119
Circumventing Superlinearity
[Chart: response time (sec) for small and large data sets, comparing insert/delete without index, insert/delete with index, small batches, and outer join]
• SQL Server 2000
• Outer join achieves the
best response time.
• Small batches do not
help because overhead
of crossing the
application interface is
higher than the benefit
of joining with smaller
tables.
120
Tuning the Application Interface
• 4GL
– Power++, Visual basic
• Programming language +
Call Level Interface
– ODBC: Open DataBase Connectivity
– JDBC: Java based API
– OCI (C++/Oracle), CLI (C++/DB2), Perl/DBI
• In the following experiments, the client program is
located on the database server site. Overhead is
due to crossing the application interface.
121
Looping can hurt -- data
Settings:
lineitem ( L_ORDERKEY, L_PARTKEY , L_SUPPKEY,
L_LINENUMBER, L_QUANTITY, L_EXTENDEDPRICE ,
L_DISCOUNT, L_TAX , L_RETURNFLAG, L_LINESTATUS ,
L_SHIPDATE, L_COMMITDATE,
L_RECEIPTDATE, L_SHIPINSTRUCT ,
L_SHIPMODE , L_COMMENT );
– 600 000 rows; warm buffer.
– Dual Pentium II (450MHz, 512Kb), 512 Mb RAM, 3x18Gb drives
(10000RPM), Windows 2000.
122
Looping can hurt -- queries
• Queries:
– No loop:
sqlStmt = “select * from lineitem where l_partkey <= 200;”
odbc->prepareStmt(sqlStmt);
odbc->execPrepared(sqlStmt);
– Loop:
sqlStmt = “select * from lineitem where l_partkey = ?;”
odbc->prepareStmt(sqlStmt);
for (int i=1; i<100; i++)
{
odbc->bindParameter(1, SQL_INTEGER, i);
odbc->execPrepared(sqlStmt);
}
123
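The effect is easy to reproduce with any client API. Here is a minimal sketch using Python's built-in sqlite3 standing in for the ODBC calls above (table and sizes are scaled down): one set-oriented statement versus a loop of parameterized point queries that together return the same rows, with the loop paying the per-call overhead once per key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table lineitem (l_partkey int, l_quantity int)")
# 3000 rows spread over 300 part keys (10 rows per key).
conn.executemany("insert into lineitem values (?, ?)",
                 [(i % 300, i) for i in range(3000)])

# No loop: one set-oriented query fetches everything at once.
no_loop = conn.execute(
    "select * from lineitem where l_partkey <= 200 "
    "order by l_partkey, l_quantity").fetchall()

# Loop: one point query per key value crosses the API 201 times.
loop = []
for i in range(201):
    loop.extend(conn.execute(
        "select * from lineitem where l_partkey = ?", (i,)).fetchall())
loop.sort()

# Same result set; the loop version pays the crossing overhead repeatedly.
print(len(no_loop), no_loop == loop)
```

With a real client/server interface the gap is much larger than in-process sqlite3 shows, since each crossing adds network and marshalling cost.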
Looping can Hurt
[Chart: throughput (records/sec), loop vs. no loop]
• SQL Server 2000 on
Windows 2000
• Crossing the application
interface has a significant
impact on performance.
• Why would a programmer
use a loop instead of
relying on set-oriented
operations: object-orientation?
124
Cursors are Death -- data
Settings:
employees(ssnum, name, lat, long, hundreds1,
hundreds2);
– 100000 rows ; Cold buffer
– Dual Pentium II (450MHz, 512Kb), 512 Mb RAM, 3x18Gb drives
(10000RPM), Windows 2000.
125
Cursors are Death -- queries
Queries:
– No cursor
select * from employees;
– Cursor
DECLARE d_cursor CURSOR FOR select * from employees;
OPEN d_cursor
while (@@FETCH_STATUS = 0)
BEGIN
FETCH NEXT from d_cursor
END
CLOSE d_cursor
go
126
Cursors are Death
[Chart: throughput (records/sec), cursor vs. SQL]
SQL
• SQL Server 2000 on
Windows 2000
• Response time is a few
seconds with a SQL
query and more than
an hour iterating over
a cursor.
127
Retrieve Needed Columns Only data
Settings:
lineitem ( L_ORDERKEY, L_PARTKEY , L_SUPPKEY,
L_LINENUMBER, L_QUANTITY, L_EXTENDEDPRICE ,
L_DISCOUNT, L_TAX , L_RETURNFLAG, L_LINESTATUS ,
L_SHIPDATE, L_COMMITDATE,
L_RECEIPTDATE, L_SHIPINSTRUCT ,
L_SHIPMODE , L_COMMENT );
create index i_nc_lineitem on lineitem (l_orderkey,
l_partkey, l_suppkey, l_shipdate, l_commitdate);
– 600 000 rows; warm buffer.
– Lineitem records are ~ 10 bytes long
– Dual Pentium II (450MHz, 512Kb), 512 Mb RAM, 3x18Gb drives
(10000RPM), Windows 2000.
128
Retrieve Needed Columns Only queries
Queries:
– All
Select * from lineitem;
– Covered subset
Select l_orderkey, l_partkey, l_suppkey, l_shipdate,
l_commitdate from lineitem;
129
Retrieve Needed Columns Only
• Avoid transferring
unnecessary data
• May enable use of a
covering index.
• In the experiment the
subset contains ¼ of the
attributes.
[Chart: throughput (queries/msec), all columns vs. covered subset, with and without index]
Experiment performed on
Oracle8iEE on Windows 2000.
– Reducing the amount of
data that crosses the
application interface yields
significant performance
improvement.
130
Bulk Loading Data
Settings:
lineitem ( L_ORDERKEY, L_PARTKEY , L_SUPPKEY,
L_LINENUMBER, L_QUANTITY, L_EXTENDEDPRICE ,
L_DISCOUNT, L_TAX , L_RETURNFLAG, L_LINESTATUS ,
L_SHIPDATE, L_COMMITDATE,
L_RECEIPTDATE, L_SHIPINSTRUCT ,
L_SHIPMODE , L_COMMENT );
– Initially the table is empty; 600 000 rows to be inserted (138Mb)
– Table sits on one disk. No constraint or index is defined.
– Dual Pentium II (450MHz, 512Kb), 512 Mb RAM, 3x18Gb drives
(10000RPM), Windows 2000.
131
Bulk Loading Queries
Oracle 8i
sqlldr direct=true control=load_lineitem.ctl data=E:\Data\lineitem.tbl
load data
infile "lineitem.tbl"
into table LINEITEM append
fields terminated by '|'
(
L_ORDERKEY, L_PARTKEY, L_SUPPKEY, L_LINENUMBER,
L_QUANTITY, L_EXTENDEDPRICE, L_DISCOUNT, L_TAX,
L_RETURNFLAG, L_LINESTATUS, L_SHIPDATE DATE "YYYY-MM-DD", L_COMMITDATE DATE "YYYY-MM-DD", L_RECEIPTDATE
DATE "YYYY-MM-DD", L_SHIPINSTRUCT, L_SHIPMODE,
L_COMMENT
)
132
Direct Path
[Chart: throughput (rec/sec) for conventional bulk load, direct path load, and record-at-a-time inserts (65 rec/sec)]
Experiment performed on
Oracle8iEE on Windows 2000.
• Direct path loading
bypasses the query
engine and the storage
manager. It is orders
of magnitude faster
than conventional
bulk load (commit
every 100 records) and
inserts (commit for
each record).
133
Batch Size
[Chart: throughput (records/sec) vs. batch size, from 0 to 600000 records]
Experiment performed on
SQL Server 2000
on Windows 2000.
• Throughput increases
steadily when the batch
size increases to 100000
records. Throughput
remains constant
afterwards.
• Trade-off between
performance and amount
of data that has to be
reloaded in case of
problem.
134
Tuning
E-Commerce Applications
Database-backed web-sites:
– Online shops
– Shop comparison portals
– MS TerraServer
135
E-commerce Application
Architecture
DB cache
Application servers Database server
DB cache
Web cache
Web cache
Web servers
Web cache
Clients
136
E-commerce Application
Workload
• Touristic searching (frequent, cached)
– Access the top few pages. Pages may be personalized.
Data may be out-of-date.
• Category searching (frequent, partly cached and
need for timeliness guarantees)
– Down some hierarchy, e.g., men’s clothing.
• Keyword searching (frequent, uncached, need for
timeliness guarantees)
• Shopping cart interactions (rare, but transactional)
• Electronic purchasing (rare, but transactional)
137
Design Issues
• Need to keep historic information
– Electronic payment acknowledgements get lost.
• Preparation for variable load
– Regular patterns of web site accesses during the day,
and within a week.
• Possibility of disconnections
– State information transmitted to the client (cookies)
• Special consideration for low bandwidth
• Schema evolution
– Representing e-commerce data as attribute-value pairs
(IBM Websphere)
138
Caching
• Web cache:
– Static web pages
– Caching fragments of dynamically created web pages
• Database cache (Oracle9iAS, TimesTen’s
FrontTier)
– Materialized views to represent cached data.
– Queries are executed either using the database cache or
the database server. Updates are propagated to keep the
cache(s) consistent.
– Note to vendors: It would be good to have queries
distributed between cache and server.
139
Ecommerce -- setting
Settings:
shoppingcart( shopperid, itemid, price, qty);
– 500000 rows; warm buffer
– Dual Pentium II (450MHz, 512Kb), 512 Mb RAM, 3x18Gb drives
(10000RPM), Windows 2000.
140
Ecommerce -- transactions
Concurrent Transactions:
– Mix
insert into shoppingcart values (107999,914,870,214);
update shoppingcart set Qty = 10 where shopperid =
95047 and itemid = 88636;
delete from shoppingcart where shopperid = 86123 and
itemid = 8321;
select shopperid, itemid, qty, price from shoppingcart
where shopperid = ?;
– Queries Only
select shopperid, itemid, qty, price from shoppingcart
where shopperid = ?;
141
Connection Pooling (no refusals)
[Chart: response time vs. number of client threads (20 to 140), connection pooling vs. simple connections]
Experiment performed on
Oracle8i on Windows 2000
• Each thread establishes a
connection and performs 5
insert statements.
• If a connection cannot be
established the thread waits 15
secs before trying again.
• The number of connections is
limited to 60 on the database
server.
• Using connection pooling, the
requests are queued and
serviced when possible. There
are no refused connections.
142
Indexing
[Chart: throughput (queries/sec) with no index vs. a clustered index, for the Mix and Queries Only workloads]
• Using a clustered
index on shopperid in
the shopping cart
provides:
Experiment performed on
SQL Server 2000
on Windows 2000
– Query speed-up
– Update/Deletion speedup
143
Capacity Planning
[Diagram: Entry (S1) feeds Search (S2) with probability 0.4; Search re-enters itself with probability 0.5 and feeds Checkout (S3) with probability 0.1]
• Arrival Rate
– A1 is given as an assumption
– A2 = (0.4 A1) + (0.5 A2)
– A3 = 0.1 A2
• Service Time (S)
– S1, S2, S3 are measured
• Utilization
Getting the demand assumptions right
is what makes capacity planning hard.
– U = A x S
• Response Time
– R = U/(A(1-U)) = S/(1-U)
(assuming Poisson arrivals)
144
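The arithmetic on this slide can be written out directly. A small sketch (the entry rate A1 and service time S are assumed numbers for illustration); note that A2 = 0.4*A1 + 0.5*A2 solves to A2 = 0.8*A1.

```python
def utilization(A, S):
    # U = A x S: arrival rate times service time.
    return A * S

def response_time(A, S):
    # R = U/(A(1-U)) = S/(1-U), assuming Poisson arrivals; needs U < 1.
    U = utilization(A, S)
    assert U < 1, "server saturated"
    return S / (1 - U)

# Derived arrival rates from the branching probabilities:
# A2 = 0.4*A1 + 0.5*A2  =>  A2 = 0.4*A1 / (1 - 0.5);  A3 = 0.1*A2
A1 = 10.0                  # assumed entry arrival rate (per second)
A2 = 0.4 * A1 / (1 - 0.5)  # solve the self-loop on Search
A3 = 0.1 * A2

S = 0.1                    # assumed measured service time (seconds)
print(A2, A3, response_time(8, S))
```

With A = 8/sec and S = 0.1 sec this reproduces the numbers of the later example slide: U = 0.8 and R = 0.5 seconds.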
How to Handle Multiple Servers
• Suppose one has n servers for some task
that requires S time for a single server to
perform.
• The perfect parallelism model is that it is as
if one has a single server that is n times as
fast.
• However, this overstates the advantage of
parallelism, because even if there were no
waiting, single tasks require S time.
145
Rough Estimate for Multiple Servers
• There are two components to response time:
waiting time + service time.
• In the parallel setting, the service time is
still S.
• The waiting time however can be well
estimated by a server that is n times as fast.
146
Approximating waiting time for n
parallel servers.
• Recall: R = U/(A(1-U)) = S/(1-U)
• On an n-times faster server, service time is
divided by n, so the single processor
utilization U is also divided by n. So we
would get: R_ideal = (S/n)/(1 – (U/n)).
• Then R_ideal = service_ideal + wait_ideal.
• So wait_ideal = R_ideal – S/n
• Our assumption: wait_ideal ~ wait for n
processors.
147
Approximating response time for
n parallel servers
• Waiting time for n parallel processors ~
(S/n)/(1 – (U/n)) – S/n =
(S/n) ( 1/(1-(U/n)) – 1) =
(S/(n(1 – U/n)))(U/n) =
(S/(n – U))(U/n)
• So, response time for n parallel processors
is above waiting time + S.
148
Example
• A = 8 per second.
• S = 0.1 second.
• U = 0.8.
• Single server response time = S/(1-U) = 0.1/0.2 =
0.5 seconds.
• If we have 2 servers, then we estimate waiting
time to be (0.1/(2-0.8))(0.4) = 0.04/1.2 = 0.033. So
the response time is 0.133.
• For a 2-times faster server, S = 0.05, U = 0.4, so
response time is 0.05/0.6 = 0.0833
149
Example -- continued
• A = 8 per second.
• S = 0.1 second.
• U = 0.8.
• If we have 4 servers, then we estimate waiting
time to be (S/(n – U))(U/n) = 0.1/(3.2) * (0.8/4) =
0.02/3.2 = 0.00625. So response time is 0.10625.
150
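The multi-server approximation of slides 147-148 and the worked numbers above fit in a few lines (a sketch of the formulas, not vendor code):

```python
def n_server_response(A, S, n):
    """Approximate response time for n parallel servers.

    wait ~ (S/(n-U)) * (U/n): the waiting time of a single server
    that is n times as fast. Response = wait + S, because each task
    still needs the full service time S on one server.
    """
    U = A * S           # single-server utilization
    assert U < n, "system saturated"
    wait = (S / (n - U)) * (U / n)
    return wait + S

A, S = 8.0, 0.1         # arrival rate and service time from the example
r1 = S / (1 - A * S)    # single server
r2 = n_server_response(A, S, 2)
r4 = n_server_response(A, S, 4)
print(r1, r2, r4)
```

This reproduces the example: 0.5 seconds for one server, about 0.133 for two, and about 0.106 for four.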
Datawarehouse Tuning
• Aggregate (strategic) targeting:
– Aggregates flow up from a wide selection of
data, and then
– Targeted decisions flow down
• Examples:
– Riding the wave of clothing fads
– Tracking delays for frequent-flyer customers
151
Data Warehouse
Workload
• Broad
– Aggregate queries over ranges of values, e.g., find
the total sales by region and quarter.
• Deep
– Queries that require precise individualized
information, e.g., which frequent flyers have been
delayed several times in the last month?
• Dynamic (vs. Static)
– Queries that require up-to-date information, e.g.
which nodes have the highest traffic now?
152
Tuning Knobs
• Indexes
• Materialized views
• Approximation
153
Bitmaps -- data
Settings:
lineitem ( L_ORDERKEY, L_PARTKEY , L_SUPPKEY, L_LINENUMBER,
L_QUANTITY, L_EXTENDEDPRICE ,
L_DISCOUNT, L_TAX , L_RETURNFLAG, L_LINESTATUS ,
L_SHIPDATE, L_COMMITDATE,
L_RECEIPTDATE, L_SHIPINSTRUCT ,
L_SHIPMODE , L_COMMENT );
create bitmap index b_lin_2 on lineitem(l_returnflag);
create bitmap index b_lin_3 on lineitem(l_linestatus);
create bitmap index b_lin_4 on lineitem(l_linenumber);
– 100000 rows ; cold buffer
– Dual Pentium II (450MHz, 512Kb), 512 Mb RAM, 3x18Gb drives
(10000RPM), Windows 2000.
154
Bitmaps -- queries
Queries:
– 1 attribute
select count(*) from lineitem where l_returnflag =
'N';
– 2 attributes
select count(*) from lineitem where l_returnflag = 'N'
and l_linenumber > 3;
– 3 attributes
select count(*) from lineitem where l_returnflag =
'N' and l_linenumber > 3 and l_linestatus = 'F';
155
Bitmaps
[Chart: throughput (queries/sec) for conditions on 1, 2, and 3 attributes, linear scan vs. bitmap; insets show the value distributions of l_returnflag (A, N, R) and l_linestatus (O, F)]
• Order of magnitude
improvement compared to
scan.
• Bitmaps are best suited for
multiple conditions on
several attributes, each
having a low selectivity.
156
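The way a bitmap index answers a multi-attribute count is just bitwise AND. A toy sketch (Python ints as bit vectors; the rows and the helper `build_bitmap_index` are invented for illustration) mirroring the 3-attribute query above:

```python
from collections import defaultdict

# (l_returnflag, l_linenumber, l_linestatus) for six toy rows.
rows = [("N", 1, "F"), ("N", 4, "F"), ("R", 4, "O"),
        ("N", 5, "O"), ("A", 2, "F"), ("N", 4, "F")]

def build_bitmap_index(rows, col):
    """One bit vector per distinct value: bit i is set iff row i has it."""
    index = defaultdict(int)
    for i, row in enumerate(rows):
        index[row[col]] |= 1 << i
    return index

returnflag = build_bitmap_index(rows, 0)
linenumber = build_bitmap_index(rows, 1)
linestatus = build_bitmap_index(rows, 2)

# l_returnflag = 'N' and l_linenumber > 3 and l_linestatus = 'F':
# OR the bitmaps satisfying the range, then AND across attributes.
gt3 = 0
for v, bits in linenumber.items():
    if v > 3:
        gt3 |= bits
matches = returnflag["N"] & gt3 & linestatus["F"]
print(bin(matches), bin(matches).count("1"))
```

Each extra condition costs one more AND over compact bit vectors, which is why bitmaps shine when several low-selectivity conditions are combined.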
Multidimensional Indexes -- data
Settings:
create table spatial_facts
( a1 int, a2 int, a3 int, a4 int, a5 int, a6 int, a7
int, a8 int, a9 int, a10 int, geom_a3_a7
mdsys.sdo_geometry );
create index r_spatialfacts on
spatial_facts(geom_a3_a7) indextype is
mdsys.spatial_index;
create bitmap index b2_spatialfacts on
spatial_facts(a3,a7);
– 500000 rows ; cold buffer
– Dual Pentium II (450MHz, 512Kb), 512 Mb RAM, 3x18Gb drives
(10000RPM), Windows 2000.
157
Multidimensional Indexes -- queries
Queries:
– Point Queries
select count(*) from spatial_facts where a3 = 694014 and a7 = 928878;
select count(*) from spatial_facts where
SDO_RELATE(geom_a3_a7, MDSYS.SDO_GEOMETRY(2001, NULL,
MDSYS.SDO_POINT_TYPE(694014,928878, NULL), NULL, NULL),
'mask=equal querytype=WINDOW') = 'TRUE';
– Range Queries
select count(*) from spatial_facts where
SDO_RELATE(geom_a3_a7, mdsys.sdo_geometry(2003,NULL,NULL,
mdsys.sdo_elem_info_array(1,1003,3),mdsys.sdo_ordinate_array
(10,800000,1000000,1000000)), 'mask=inside
querytype=WINDOW') = 'TRUE';
select count(*) from spatial_facts where a3 > 10 and a3 <
1000000 and a7 > 800000 and a7 < 1000000;
158
Multidimensional Indexes
• Oracle 8i on Windows
2000
• Spatial Extension:
[Chart: response time (sec) for the point query and the range query, bitmap on two attributes vs. r-tree]
– 2-dimensional data
– Spatial functions used
in the query
• R-tree does not
perform well because
of the overhead of
spatial extension.
159
Multidimensional Indexes
R-Tree:
SELECT STATEMENT
  SORT AGGREGATE
    TABLE ACCESS BY INDEX ROWID SPATIAL_FACTS
      DOMAIN INDEX R_SPATIALFACTS
Bitmaps:
SELECT STATEMENT
  SORT AGGREGATE
    BITMAP CONVERSION COUNT
      BITMAP AND
        BITMAP INDEX SINGLE VALUE B_FACT7
        BITMAP INDEX SINGLE VALUE B_FACT3
160
Materialized Views -- data
Settings:
orders( ordernum, itemnum, quantity, storeid, vendor );
create clustered index i_order on orders(itemnum);
store( storeid, name );
item(itemnum, price);
create clustered index i_item on item(itemnum);
– 1000000 orders, 10000 stores, 400000 items; Cold buffer
– Oracle 9i
– Pentium III (1 GHz, 256 Kb), 1Gb RAM, Adapter 39160 with 2
channels, 3x18Gb drives (10000RPM), Linux Debian 2.4.
161
Materialized Views -- data
Settings:
create materialized view vendorOutstanding
build immediate
refresh complete
enable query rewrite
as
select orders.vendor, sum(orders.quantity*item.price)
from orders,item
where orders.itemnum = item.itemnum
group by orders.vendor;
162
Materialized Views -- transactions
Concurrent Transactions:
– Insertions
insert into orders values
(1000350,7825,562,'xxxxxx6944','vendor4');
– Queries
select orders.vendor, sum(orders.quantity*item.price)
from orders,item
where orders.itemnum = item.itemnum
group by orders.vendor;
select * from vendorOutstanding;
163
Materialized Views
• Graph:
– Oracle9i on Linux
– Total sale by vendor is materialized
[Chart 1: response time (sec), materialized view vs. no materialized view]
[Chart 2: throughput (statements/sec) for materialized view (fast on commit), materialized view (complete on demand), and no materialized view]
• Trade-off between query speed-up
and view maintenance:
– The impact of incremental
maintenance on performance is
significant.
– Rebuild maintenance achieves a
good throughput.
– A static data warehouse offers a
good trade-off.
164
Materialized View
Maintenance
• Problem when large
number of views to
maintain.
• The order in which views
are maintained is
important:
– A view can be computed
from an existing view
instead of being recomputed
from the base relations
(total per region can be
computed from total per
nation).
• Let the views and base tables be
nodes v_i.
• Let there be an edge from v_1 to
v_2 if it is possible to compute the
view v_2 from v_1. Associate the
cost of computing v_2 from v_1
with this edge.
• Compute all-pairs shortest paths
where the start nodes are the set
of base tables. The result is an
acyclic graph A.
• Take a topological sort of A and let
that be the order of view
construction.
165
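The ordering algorithm in the right-hand column can be sketched directly: Dijkstra from the base tables over the derivability edges, then order views by increasing shortest-path cost, which is a topological order of the shortest-path tree. Node names follow the example on the next slide; the edge costs are invented for illustration.

```python
import heapq

# edges[u] = list of (v, cost): view v can be computed from u at this cost.
edges = {
    "Detail": [("view1", 10.0)],
    "view1": [("view2", 2.0), ("view3", 3.0)],
    "view2": [], "view3": [],
}
base_tables = ["Detail"]

# Dijkstra from all base tables at once; keep the cheapest parent of each view.
dist = {t: 0.0 for t in base_tables}
parent = {}
heap = [(0.0, t) for t in base_tables]
while heap:
    d, u = heapq.heappop(heap)
    if d > dist.get(u, float("inf")):
        continue
    for v, cost in edges[u]:
        if d + cost < dist.get(v, float("inf")):
            dist[v] = d + cost
            parent[v] = u
            heapq.heappush(heap, (dist[v], v))

# With positive costs, every parent has a smaller dist than its child,
# so sorting by dist yields a valid maintenance order: each view is
# refreshed from an already-refreshed parent.
order = sorted(parent, key=lambda v: dist[v])
print(order, {v: parent[v] for v in order})
```

Here view2 and view3 are rebuilt from view1 rather than from the base table, exactly the "total per region from total per nation" idea.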
Materialized View Example
• Detail(storeid, item, qty, price, date)
• Materialized view1(storeid, category, qty,
month)
• Materializedview2(city, category, qty,
month)
• Materializedview3(storeid, category, qty,
year)
166
Approximations -- data
Settings:
– TPC-H schema
– Approximations
insert into approxlineitem
select top 6000 *
from lineitem
where l_linenumber = 4;
insert into approxorders
select O_ORDERKEY, O_CUSTKEY, O_ORDERSTATUS,
O_TOTALPRICE, O_ORDERDATE, O_ORDERPRIORITY, O_CLERK,
O_SHIPPRIORITY, O_COMMENT
from orders, approxlineitem
where o_orderkey = l_orderkey;
167
Approximations -- queries
insert into approxsupplier
select distinct S_SUPPKEY, S_NAME, S_ADDRESS, S_NATIONKEY,
S_PHONE, S_ACCTBAL, S_COMMENT
from approxlineitem, supplier
where s_suppkey = l_suppkey;
insert into approxpart
select distinct P_PARTKEY, P_NAME, P_MFGR, P_BRAND, P_TYPE,
P_SIZE, P_CONTAINER, P_RETAILPRICE, P_COMMENT
from approxlineitem, part
where p_partkey = l_partkey;
insert into approxpartsupp
select distinct PS_PARTKEY, PS_SUPPKEY, PS_AVAILQTY,
PS_SUPPLYCOST, PS_COMMENT
from partsupp, approxpart, approxsupplier
where ps_partkey = p_partkey and ps_suppkey = s_suppkey;
insert into approxcustomer
select distinct C_CUSTKEY, C_NAME, C_ADDRESS, C_NATIONKEY,
C_PHONE, C_ACCTBAL, C_MKTSEGMENT, C_COMMENT
from customer, approxorders
where o_custkey = c_custkey;
insert into approxregion select * from region;
insert into approxnation select * from nation;
168
Approximations -- more queries
Queries:
– Single table query on lineitem
select l_returnflag, l_linestatus, sum(l_quantity) as sum_qty,
sum(l_extendedprice) as sum_base_price,
sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as
sum_charge,
avg(l_quantity) as avg_qty, avg(l_extendedprice) as avg_price,
avg(l_discount) as avg_disc, count(*) as count_order
from lineitem
where datediff(day, l_shipdate, '1998-12-01') <= '120'
group by l_returnflag, l_linestatus
order by l_returnflag, l_linestatus;
169
Approximations -- still more
Queries:
– 6-way join
select n_name, avg(l_extendedprice * (1 - l_discount)) as
revenue
from customer, orders, lineitem, supplier, nation, region
where c_custkey = o_custkey
and l_orderkey = o_orderkey
and l_suppkey = s_suppkey
and c_nationkey = s_nationkey
and s_nationkey = n_nationkey
and n_regionkey = r_regionkey
and r_name = 'AFRICA'
and o_orderdate >= '1993-01-01'
and datediff(year, o_orderdate,'1993-01-01') < 1
group by n_name
order by revenue desc;
170
Approximation accuracy
[Chart 1: approximated aggregate values as % of base aggregate values, TPC-H Query Q1 on lineitem, 1% and 10% samples, per aggregate value]
[Chart 2: approximated aggregate values as % of base aggregate values, TPC-H Query Q5 (6-way join), 1% and 10% samples, per group returned in the query]
• Good approximation for
query Q1 on lineitem
• The aggregated values
obtained on a query with a
6-way join are
significantly different
from the actual values -- for some applications they may
still be good enough.
171
Approximation Speedup
• Aqua approximation
on the TPC-H schema
– 1% and 10% lineitem
sample propagated.
[Chart: response time (sec) for TPC-H queries Q1 and Q5, base relations vs. 1% sample vs. 10% sample]
• The query speed-up
obtained with
approximated relations
is significant.
172
Approximating Count Distinct
• Suppose you have one or more columns with
duplicates. You want to know how many distinct
values there are.
• Imagine that you could hash into (0,1). (Simulate
this by a very large integer range).
• Hashing will put duplicates in the same location
and, if done right, will distribute values uniformly.
• How can we use this?
173
Approximating Count Distinct
• Suppose you took the minimum 1000 hash
values.
• Suppose that highest value of the 1000 is f.
• What is a good approximation of the
number of distinct values?
174
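The answer the slide is driving at: if the 1000 smallest hash values fill the interval (0, f), the hashes of distinct values are spread with average gap about f/1000, so the number of distinct values is approximately 1000/f (more precisely (k-1)/f for k kept values, the k-minimum-values estimator). A sketch, using md5 to simulate a uniform hash into (0,1); the function name and test data are illustrative.

```python
import hashlib

def estimate_distinct(values, k=1000):
    """K-minimum-values estimate of the number of distinct values."""
    def h(v):
        # Deterministic hash into (0,1); duplicates collide by design.
        return int(hashlib.md5(str(v).encode()).hexdigest(), 16) / 2**128
    smallest = sorted({h(v) for v in values})[:k]
    f = smallest[-1]       # highest of the k minimum hash values
    return (k - 1) / f     # k points in (0, f) => spacing ~ f/(k-1)

# 50000 distinct values, each appearing 3 times.
data = [i % 50000 for i in range(150000)]
est = estimate_distinct(data)
print(est)
```

The typical relative error is about 1/sqrt(k), so k = 1000 keeps the estimate within a few percent while storing only 1000 hash values.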
Tuning Distributed Applications
• Queries across multiple databases
– Federated Datawarehouse
– IBM’s DataJoiner now integrated at DB2 v7.2
• The source should perform as much work as
possible and return as few data as possible for
processing at the federated server
• Processing data across multiple databases
175
A puzzle
• Two databases X and Y
– X records inventory data to be used for restocking
– Y contains delivery data about shipments
• Improve the speed of shipping by sharing data
– Certain data of X should be postprocessed on Y shortly
after it enters X
– You want to avoid losing data from X and you want to
avoid double-processing data on Y, even in the face of
failures.
176
Two-Phase Commit
[Diagram: two-phase commit between Source X (coordinator & participant) and Destination Y (participant); both sides commit]
+ Commits are
coordinated between
the source and the
destination
- If one participant fails
then blocking can
occur
- Not all db systems
support a prepare-to-commit interface
177
Replication Server
[Diagram: replication from Source X to Destination Y]
+ Destination within a
few seconds of being
up-to-date
+ Decision support
queries can be asked
on destination db
- Administrator is
needed when network
connection breaks!
178
Staging Tables
[Diagram: Source X writes to a staging table that Destination Y reads]
+ No specific
mechanism is
necessary at source or
destination
- Coordination of
transactions on X and
Y
179
Issues in Solution
• A single thread can invoke a transaction on
X, Y or both, but the thread may fail in the
middle. A new thread will regain state by
looking at the database.
• We want to delete data from the staging
tables when we are finished.
• The tuples in the staging tables will
represent an operation as well as data.
180
States
• A tuple (operation plus data) will start in
state unprocessed, then state processed, and
then deleted.
• The same transaction that processes the
tuple on the destination site also changes its
state. This is important so each tuple’s
operation is done exactly once.
181
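The state machine above can be sketched with sqlite3 standing in for the destination database (table and column names are invented): the operation and the state change happen in one transaction, so a replacement thread that rereads the database can tell exactly which tuples still need work, and no tuple is processed twice.

```python
import sqlite3

dst = sqlite3.connect(":memory:")   # destination database Y
# Y holds the staged tuples (operation plus data) and their state.
dst.execute("create table staging (id int primary key, op text, "
            "amount int, state text default 'unprocessed')")
dst.execute("create table inventory (id int primary key, amount int)")
dst.executemany("insert into staging (id, op, amount) values (?, 'add', ?)",
                [(1, 10), (2, 20), (3, 30)])
dst.commit()

def process():
    """Apply every unprocessed tuple. The apply and the state change are
    one transaction, so each tuple's operation is done exactly once even
    if a thread dies and a replacement rereads the database."""
    rows = dst.execute(
        "select id, amount from staging where state = 'unprocessed'"
    ).fetchall()
    for id_, amount in rows:
        with dst:  # sqlite3 connection context: commit on success
            dst.execute("insert into inventory values (?, ?)", (id_, amount))
            dst.execute("update staging set state = 'processed' "
                        "where id = ?", (id_,))

process()
process()  # a replacement thread finds nothing left to do
print(dst.execute("select count(*) from inventory").fetchone()[0])
```

Running `process()` a second time is a no-op: the state column, updated atomically with the work itself, is what makes the scheme idempotent.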
Staging Tables
[Diagram, steps 1 and 2: thread M1 reads unprocessed tuples from staging table S on database X; thread M2 writes them, still marked unprocessed, into table I on database Y]
182
Staging Tables
[Diagram, steps 3 and 4: thread M3 runs a transaction on database Y that applies each tuple's update on the destination site and marks it processed; thread M4 then deletes the tuple from the source staging table and afterwards from the destination]
183
What to Look For
• Transactions are atomic but threads,
remember, are not. So a replacement thread
will look to the database for information.
• In which order should the final step do its
transactions? Should it delete the
unprocessed tuple from X as the first
transaction or as the second?
184
Troubleshooting Techniques
(*)
• Extraction and Analysis of Performance
Indicators
– Consumer-producer chain framework
– Tools
• Query plan monitors
• Performance monitors
• Event monitors
(*) From Alberto Lerner’s chapter
185
A Consumer-Producer Chain of
a DBMS’s Resources
[Diagram: SQL commands flow from high-level consumers through the PARSER, OPTIMIZER, EXECUTION SUBSYSTEM, and DISK SUBSYSTEM; intermediate resources/consumers include the CACHE MANAGER, LOCKING SUBSYSTEM, and LOGGING SUBSYSTEM; primary resources are MEMORY, CPU, DISK/CONTROLLER, and NETWORK; probing spots for indicators sit between the layers]
186
Recurrent Patterns of Problems
Effects are not always felt first where the cause is!
• An overloading high-level
consumer
• A poorly parameterized
subsystem
• An overloaded primary
resource
187
A Systematic Approach to Monitoring
Extract indicators to answer the following questions
• Question 1: Are critical
queries being served in
the most efficient manner?
• Question 2: Are
subsystems making
optimal use of resources?
• Question 3: Are there
enough primary resources
available?
188
Investigating High Level Consumers
• Answer question 1:
“Are critical queries being served in the most efficient manner?”
1. Identify the critical queries
2. Analyze their access plans
3. Profile their execution
189
Identifying Critical Queries
Critical queries are usually those that:
• Take a long time
• Are frequently executed
Often, a user complaint will tip us off.
190
Event Monitors to Identify
Critical Queries
• If no user complains...
• Capture usage measurements at specific
events (e.g., end of each query) and then sort
by usage
• Less overhead than other types of tools
because indicators are usually a by-product
of the events monitored
• Typical measures include CPU used, IO
used, locks obtained, etc.
191
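What such an event monitor does can be reduced to a toy sketch: collect one usage record per query-completion event, then sort by usage. The record fields and sample numbers here are invented for illustration:

```python
# Each record is a by-product of a query-completion event.
events = [
    {"sql": "SELECT o_orderkey FROM orders",     "cpu_ms": 120,  "logical_reads": 900},
    {"sql": "SELECT c_name, n_name FROM ...",    "cpu_ms": 6670, "logical_reads": 272618},
    {"sql": "UPDATE inventory SET total = ...",  "cpu_ms": 40,   "logical_reads": 55},
]

# Sort by a usage measure to surface the critical queries.
by_cpu = sorted(events, key=lambda e: e["cpu_ms"], reverse=True)
for e in by_cpu[:2]:
    print(e["cpu_ms"], e["sql"])
```

Sorting by `logical_reads` instead would surface IO-heavy queries; real tools let you pick the measure.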
An example Event Monitor
• CPU indicators
sorted by Oracle’s
Trace Data Viewer
• Similar tools: DB2’s
Event Monitor and
MSSQL’s Server
Profiler
192
An example Plan Explainer
• Access plan according
to MSSQL’s Query
Analyzer
• Similar tools: DB2’s
Visual Explain and
Oracle’s SQL Analyze
Tool
193
Finding Strangeness in Access Plans
What to pay attention to in a plan
• Access paths for each table
• Sorts or intermediary results (join,
group by, distinct, order by)
• Order of operations
• Algorithms used in the operators
194
To Index or not to index?
select c_name, n_name from CUSTOMER join NATION
on c_nationkey=n_nationkey where c_acctbal > 0
Which plan performs best?
(nation_pk is a non-clustered index over n_nationkey,
and similarly acctbal_ix over c_acctbal)
195
Non-clustering indexes can be trouble
For a low-selectivity predicate, each index entry
generates a random access to the table, possibly
to a page already fetched. As a result, the number
of pages read from the table exceeds the table's
size, so a table scan is far better
                       Table Scan      Index Scan
CPU time               5 sec           76 sec
data logical reads     143,075 pages   272,618 pages
data physical reads    6,777 pages     131,425 pages
index logical reads    136,319 pages   273,173 pages
index physical reads   7 pages         552 pages
196
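A back-of-envelope model of why the index plan loses, under the simplifying (hypothetical) assumption that each matching row costs one random table-page read through the non-clustered index, while a table scan reads every page exactly once. The constants come from the measured runs above:

```python
def index_scan_reads(matching_rows):
    # Worst case: one random table-page access per matching row,
    # with duplicate visits to the same page not cached away.
    return matching_rows

def table_scan_reads(table_pages):
    # One sequential pass over the whole table.
    return table_pages

table_pages = 6_777       # data physical reads in the table-scan run
matching_rows = 136_308   # rows retrieved by the query

print(index_scan_reads(matching_rows) > table_scan_reads(table_pages))  # True
```

With ~136K matching rows against a ~6.8K-page table, the index plan touches the table roughly 20 times over, which is the effect the measurements show.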
An example
Performance Monitor (query level)
• Buffer and CPU
consumption for a query
according to DB2’s
Benchmark tool
• Similar tools: MSSQL’s
SET STATISTICS
switch and Oracle’s
SQL Analyze Tool
Statement number: 1
select C_NAME, N_NAME
from DBA.CUSTOMER join DBA.NATION on C_NATIONKEY = N_NATIONKEY
where C_ACCTBAL > 0
Number of rows retrieved is: 136308
Number of rows sent to output is: 0
Elapsed Time is: 76.349 seconds
…
Buffer pool data logical reads    = 272618
Buffer pool data physical reads   = 131425
Buffer pool data writes           = 0
Buffer pool index logical reads   = 273173
Buffer pool index physical reads  = 552
Buffer pool index writes          = 0
Total buffer pool read time (ms)  = 71352
Total buffer pool write time (ms) = 0
…
Summary of Results
==================
Statement #   Elapsed Time (s)   Agent CPU Time (s)   Rows Fetched   Rows Printed
1             76.349             6.670                136308         0
197
An example
Performance Monitor (system level)
• An IO indicator’s
consumption evolution
(qualitative and
quantitative) according
to DB2’s System
Monitor
• Similar tools: Window’s
Performance Monitor
and Oracle’s
Performance Manager
198
Investigating High Level Consumers:
Summary
[Flowchart: find critical queries; if none are found, investigate lower levels; otherwise answer Q1 over them; if there is overconsumption, tune the problematic queries, else investigate lower levels]
199
Investigating Primary Resources
• Answer question 3:
“Are there enough primary resources
available for a DBMS to consume?”
• Primary resources are: CPU, disk &
controllers, memory, and network
• Analyze specific OS-level indicators to
discover bottlenecks.
• A system-level Performance Monitor is the
right tool here
200
CPU Consumption Indicators
at the OS Level
[Chart: CPU % of utilization over time, with total usage and system usage curves against 70% and 100% reference lines]
• Sustained utilization over 70% should
trigger the alert.
• System utilization shouldn’t be more
than 40%.
• The DBMS (on a non-dedicated machine)
should be getting a decent time share.
201
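The thresholds above can be turned into a trivial alert check. This is only a sketch; the function name and sample values are invented:

```python
def cpu_alerts(total_pct, system_pct):
    """Flag sustained CPU conditions per the rules of thumb above."""
    alerts = []
    if total_pct > 70:
        alerts.append("total CPU utilization over 70%")
    if system_pct > 40:
        alerts.append("system (kernel) utilization over 40%")
    return alerts

print(cpu_alerts(85, 20))  # ['total CPU utilization over 70%']
print(cpu_alerts(60, 10))  # []
```

In practice the inputs would be averages over a window, since the rule is about *sustained* utilization, not momentary spikes.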
Disk Performance Indicators
at the OS Level
[Diagram: new requests enter a wait queue before the disk; the indicators are average queue size and disk transfers/second]
• Average queue size should be close to zero
• Wait times should also be close to zero
• Idle disk with pending requests? Check
controller contention.
• Also, transfers should be balanced among
disks/controllers
202
Memory Consumption Indicators
at the OS Level
• Page faults/time should be close to zero.
If paging happens, at least not DB cache pages.
• % of pagefile in use will tell you how much
memory is “needed.”
[Diagram: virtual memory maps onto real memory and the pagefile]
203
Investigating Intermediate
Resources/Consumers
• Answer question 2:
“Are subsystems making optimal use of
resources?”
• Main subsystems: Cache Manager, Disk
subsystem, Lock subsystem, and Log/Recovery
subsystem
• Similarly to Q3, extract and analyze relevant
Performance Indicators
204
Cache Manager Performance Indicators
[Diagram: a table scan issues readpage() calls to the Cache Manager, which holds data pages and free page slots, picks victims for eviction, and issues page reads/writes to disk]
• If a page is not in the cache, readpage
(logical) generates an actual IO (physical).
The fraction of readpages that do not generate
physical IO should be 90% or more (hit ratio)
• Pages are regularly saved to disk to make
free space. The # of free slots should
always be > 0
205
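The hit-ratio rule of thumb is easy to compute from buffer-pool counters; here it is applied, as a sketch, to the data-page counters from the measured index-scan query shown earlier:

```python
def hit_ratio(logical_reads, physical_reads):
    """Fraction of readpage() calls served from the cache."""
    if logical_reads == 0:
        return 1.0
    return 1 - physical_reads / logical_reads

# Data-page counters from the DB2 output for the index-scan query:
ratio = hit_ratio(272_618, 131_425)
print(round(ratio, 2))  # 0.52 -- well below the 90% rule of thumb
```

A sustained ratio below 90% suggests the cache is too small for the working set, or that scans are flushing it.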
Disk Manager Performance Indicators
[Diagram: simplified storage hierarchy of rows, pages, extents, files, and disks]
• Displaced rows (moved to other pages) should
be kept under 5% of rows
• Free space fragmentation: pages with little
space should not be in the free list
• Data fragmentation: ideally files that store
DB objects (table, index) should be in one
or few (<5) contiguous extents
• File position: should balance workload
evenly among all disks
206
Lock Manager Performance Indicators
[Diagram: a lock request enters the lock list, which records object, lock type, and TXN ID, with a pending list for waiting locks]
• Lock wait time for a transaction should be a
small fraction of the whole transaction time.
• Number of lock waits should be a small
fraction of the number of locks on the
lock list.
• Deadlocks and timeouts should seldom happen
(no more than 1% of the transactions)
207
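These rules of thumb as a sketch; the parameter names, the 10% "small fraction" cutoff, and the sample numbers are invented for illustration, while the 1% deadlock/timeout threshold comes from the slide:

```python
def lock_health(lock_waits, locks_held, deadlocks, timeouts, transactions):
    """Check the lock-manager rules of thumb above."""
    problems = []
    if locks_held and lock_waits / locks_held > 0.10:  # "small fraction"
        problems.append("too many lock waits relative to locks held")
    if transactions and (deadlocks + timeouts) / transactions > 0.01:
        problems.append("deadlocks/timeouts exceed 1% of transactions")
    return problems

print(lock_health(lock_waits=5, locks_held=10_000,
                  deadlocks=0, timeouts=1, transactions=50_000))  # []
```

A real monitor would also compare per-transaction lock wait time against total transaction time, which needs timing data not modeled here.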
Investigating Intermediate and Primary
Resources: Summary
[Flowchart: answer Q3; if there are problems at the OS level, tune low-level resources; otherwise answer Q2; if subsystems are problematic, tune them, else investigate the upper level]
208
Troubleshooting Techniques
• Monitoring a DBMS’s performance should
be based on queries and resources.
– The consumption chain helps distinguish
problems’ causes from their symptoms
– Existing tools help extract relevant
performance indicators
209
Recall Tuning Principles
• Think globally, fix locally (troubleshoot to see
what matters)
• Partitioning breaks bottlenecks (find parallelism in
processors, controllers, caches, and disks)
• Start-up costs are high; running costs are low
(batch size, cursors)
• Be prepared for trade-offs (unless you can rethink
the queries)
210