Queries - School of Computing

Download Report

Transcript Queries - School of Computing

CS5226 2002
Operating System &
Database Performance Tuning
Xiaofang Zhou
School of Computing, NUS
Office: S16-08-20
Email: [email protected]
URL: www.itee.uq.edu.au/~zxf
Outline

Part 1: Operating systems and DBMS

Part 2: OS-related tuning
2
Operating System
• Operating system is an
interface between hardware
and other software,
supporting:
• Processes and threads;
hardware
• Paging, buffering and IO
scheduling
• Multi-tasking
• File system
• Other utilities such as timing,
networking and performing
monitoring
3
Scheduling

Process vs thread

Scheduling based on time-slicing, IO, priority etc


The cost of content switching


Different from transaction scheduling
When switch is desirable? And when is not?
The administrator can set priorities to
processes/threads



Case 1: the DBMS runs at a lower priority
Case 2: different transactions run at different priority
Case 3: online transactions with higher priority than offline
transactions
4
Priority Inversion

Let priorities T1 > T2s > T3
Request X
T1
T2s
T3
Lock x
… a solution: priority inheritance
5
Database Buffers
•An application can have its own inmemory buffers (e.g., variables in the
program; cursors);
•A logical read/write will be issued to
the DBMS if the data needs to be
read/written to the DBMS;
Application buffers
DBMS buffers
•A physical read/write is issued by the
DBMS using its systematic page
replacement algorithm. And such a
request is passed to the OS.
•OS may initiate IO operations to
support the virtual memory the DBMS
buffer is built on.
OS buffers
6
Database Buffer Size
DATABASE PROCESSES

Buffer too small, then hit
ratio too small
hit ratio =
(logical acc. - physical acc.) /
(logical acc.)
DATABASE
BUFFER
RAM


Paging
Disk
LOG
DATA
DATA
Buffer too large, paging
Recommended strategy:
monitor hit ratio and
increase buffer size until
hit ratio flattens out. If
there is still paging, then
buy memory.
7
Buffer Size - Data
Settings:
employees(ssnum, name, lat, long, hundreds1,
hundreds2);
clustered index c on employees(lat); (unused)




10 distinct values of lat and long, 100 distinct values of
hundreds1 and hundreds2
20000000 rows (630 Mb);
Warm Buffer
Dual Xeon (550MHz,512Kb), 1Gb RAM, Internal RAID
controller from Adaptec (80Mb), 4x18Gb drives (10000
RPM), Windows 2000.
8
Buffer Size - Queries
Queries:
 Scan Query
select sum(long) from employees;

Multipoint query
select * from employees where lat = ?;
9
Database Buffer Size
Scan Query

Throughput
(Queries/sec)
0.1
0.08
0.06
0.04

0.02
SQL Server 7 on
Windows 2000
Scan query:

0
0
200
400
600
800
1000
Buffer Size (Mb)
Multipoint Query

Throughput
(Queries/sec)
160
120
Multipoint query:

80
40
LRU (least recently used)
does badly when table spills
to disk as Stonebraker
observed 20 years ago.
Throughput increases with
buffer size until all data is
accessed from RAM.
0
0
200
400
600
800
1000
Buffer Size (Mb)
10
It’s All About $$$

Buffering is about a trade-off between speed
and cost





A (18 GB) disk offers 170 random access for $300
 the access cost A=$1.76 per access per second
RAM  C=$0.5/MB
Page size B = 8 KB
Page p is accessed every I=200 s
Keep page p in memory?



Yes: cost C/1024*B = $0.0039 for 8KB RAM
No: cost A/I = $0.0088
So, p is in memory until its access interval reaches ??? s
11
Multiprogramming Levels

More concurrent users




Better utilization of CPU cycles (and other system
resources)
Risk of excessive page swapping
More lock conflicts
So how many exactly

Depends on transaction profiles


Experiments to find the best value
And this parameter may change when application
patterns change
12
Disk Layout and Access

Larger disk allocation chunks improves
write performance


Setting disk usage factor



At the cost of disk utilisation
Low when expecting updates/inserts
Higher for scan-type of queries
Using prefetching

For non-random accesses
13
Scan Performance - Data
Settings:
lineitem ( L_ORDERKEY, L_PARTKEY , L_SUPPKEY,
L_LINENUMBER
, L_QUANTITY, L_EXTENDEDPRICE ,
L_DISCOUNT, L_TAX , L_RETURNFLAG, L_LINESTATUS ,
L_SHIPDATE, L_COMMITDATE,
L_RECEIPTDATE, L_SHIPINSTRUCT ,
L_SHIPMODE , L_COMMENT );




600 000 rows
Lineitem tuples are ~ 160 bytes long
Cold Buffer
Dual Xeon (550MHz,512Kb), 1Gb RAM, Internal RAID
controller from Adaptec (80Mb), 4x18Gb drives (10000RPM),
Windows 2000.
14
Scan Performance - Queries
Queries:
select avg(l_discount) from lineitem;
15
Usage Factor
Throughput (Trans/sec)

0.2

0.15
0.1
scan
0.05
0
70
80
90
Usage Factor (%)
100

DB2 UDB v7.1 on
Windows 2000
Usage factor is the
percentage of the page
used by tuples and
auxiliary data structures
(the rest is reserved for
future)
Scan throughput
increases with usage
factor.
16
Prefetching
Throughput (Trans/sec)

0.2
DB2 UDB v7.1 on
Windows 2000
0.15

0.1
scan
0.05
0
32Kb
64Kb
128Kb
Prefetching
256Kb
Throughput
increases up to a
certain point when
prefetching size
increases.
17
Summary

In this module, we have covered:


A review of OS from the DBMS perspective
How to optimise OS-related parameters
and options




Thread
Buffer, and
File system
Next: tuning the hardware
18