Parallel Database Systems 101 Jim Gray & Gordon Bell Microsoft
Parallel Database Systems
Instructor: Dr. Yingshu Li
Student: Chunyu Ai
Main Message
Technology trends give
– many processors and storage units
– inexpensively
To analyze large quantities of data
– sequential (regular) access patterns are 100x faster than random access
– parallelism is 1000x faster (trades time for money)
– relational systems offer many parallel algorithms.
Implications of Hardware Trends
Large disc farms will be inexpensive (10 k$/TB)
Large RAM databases will be inexpensive (1 k$/GB)
Processors will be inexpensive
So the building block will be a processor with
– large RAM (2 GB)
– lots of disc (500 GB)
– lots of network bandwidth
[Figure: the building block — a 1k SPECint CPU with 2 GB RAM and 500 GB disc]
Why Parallel Access To Data?
At 10 MB/s, one disc takes 1.2 days to scan 1 Terabyte.
With 1,000-way parallelism, the same Terabyte is scanned in about 1.5 minutes.
Parallelism: divide a big problem into many smaller ones to be solved in parallel.
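A quick back-of-envelope check of the slide's numbers (the exact arithmetic gives about 1.7 minutes; the slide rounds down):

```python
TB = 10**12            # 1 Terabyte, in bytes
RATE = 10 * 10**6      # 10 MB/s per disc, in bytes/s

serial = TB / RATE             # one disc scanning alone: 100,000 s
parallel = serial / 1000       # 1,000 discs scanning in parallel

print(f"serial scan:   {serial / 86400:.1f} days")    # ~1.2 days
print(f"parallel scan: {parallel / 60:.1f} minutes")  # ~1.7 minutes
```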
Implication of Hardware Trends: Clusters
Future servers are CLUSTERS of processors and discs.
[Figure: a cluster of building blocks, each a CPU with 5 GB RAM and 50 GB disc]
Thesis
Many Little will Win over Few Big
Parallel Database Architectures
Parallelism: Performance is the Goal
Goal is to get 'good' performance.
Law 1: a parallel system should be faster than the serial system.
Law 2: a parallel system should give near-linear scaleup, near-linear speedup, or both.
Speed-Up and Scale-Up
Speedup: a fixed-size problem executing on a small system is given to a system which is N times larger.
– Measured by: speedup = (small system elapsed time) / (large system elapsed time)
– Speedup is linear if the ratio equals N.
Scaleup: increase the size of both the problem and the system.
– An N-times larger system performs an N-times larger job.
– Measured by: scaleup = (small system, small problem elapsed time) / (big system, big problem elapsed time)
– Scaleup is linear if the ratio equals 1.
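A minimal sketch of the two metrics as code (the function names are mine, not from the slides):

```python
def speedup(small_sys_time, large_sys_time):
    """Same problem on a system N times larger; linear if result == N."""
    return small_sys_time / large_sys_time

def scaleup(small_sys_small_job, big_sys_big_job):
    """N-times larger job on an N-times larger system; linear if result == 1."""
    return small_sys_small_job / big_sys_big_job

# Linear speedup: 8x hardware cuts a 400 s job to 50 s.
assert speedup(400.0, 50.0) == 8.0
# Linear scaleup: the 8x job on 8x hardware still takes 400 s.
assert scaleup(400.0, 400.0) == 1.0
```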
Kinds of Parallel Execution
Pipeline parallelism: one sequential program feeds its output directly into the next sequential program.
Partition parallelism: many copies of a sequential program run, one per partition of the data; inputs merge M ways and outputs split N ways.
[Figure: pipeline shown as two "any sequential program" boxes chained; partition shown as the boxes replicated side by side, with merging inputs and splitting outputs]
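To make the two kinds concrete, here is a minimal Python sketch (my own illustration, not from the slides): generators model the record-at-a-time pipeline, and a process pool models partitioned execution of the same sequential program.

```python
from concurrent.futures import ProcessPoolExecutor

# Pipeline parallelism: each stage consumes its predecessor's output
# as it is produced (generators model the record-at-a-time flow).
def scan(rows):
    yield from rows

def select(rows):
    for r in rows:
        if r % 2 == 0:
            yield r

def project(rows):
    for r in rows:
        yield r * 10

# Partition parallelism: the same sequential program runs on every
# partition of the data, and the results are merged at the end.
def count_partition(part):
    return sum(1 for _ in part)

if __name__ == "__main__":
    rows = range(1_000)
    print(sum(1 for _ in project(select(scan(rows)))))  # pipelined: 500

    parts = [range(i, 1_000, 4) for i in range(4)]      # 4-way partition
    with ProcessPoolExecutor(max_workers=4) as pool:
        print(sum(pool.map(count_partition, parts)))    # merged counts: 1000
```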
The Drawbacks of Parallelism
Speedup = OldTime / NewTime
[Figure: speedup vs. processors & discs — the good speedup curve tracks linearity; bad speedup curves show no parallelism benefit, caused by three factors: startup, interference, and skew]
Startup: creating processes, opening files, optimization
Interference: device (CPU, disc, bus) and logical (lock, hotspot, server, log, ...)
Communication: sending data among nodes
Skew: if tasks get very small, variance exceeds service time
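A toy cost model (my own construction, not from the slides) shows how these factors bend the speedup curve:

```python
# elapsed(n) = startup cost + skewed per-node work + interference,
# each term a crude stand-in for the factor named above.
def elapsed(n, work=1000.0, startup=0.5, interfere=0.2, skew=1.1):
    return (startup * n              # creating n processes, opening files
            + (work / n) * skew      # the slowest (skewed) task finishes last
            + interfere * n)         # contention grows with node count

one = elapsed(1)
for n in (1, 10, 100, 1000):
    print(n, round(one / elapsed(n), 1))  # speedup rises, flattens, then falls
```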
Parallelism: Speedup & Scaleup
Speedup: same job (100 GB), more hardware, less time.
Scaleup: bigger job (100 GB → 1 TB), more hardware, same time.
Transaction scaleup: more clients and servers (1 k clients → 10 k clients, 100 GB server → 1 TB server), same response time.
Database Systems “Hide” Parallelism
Automate system management via tools
• data placement
• data organization (indexing)
• periodic tasks (dump / recover / reorganize)
Automatic fault tolerance
• duplex & failover
• transactions
Automatic parallelism
• among transactions (locking)
• within a transaction (parallel execution)
Automatic Data Partitioning
Split a SQL table across a subset of nodes & disks.
Partition within the set by:
Range (A...E | F...J | K...N | O...S | T...Z): good for equi-joins, range queries, group-by
Hash: good for equi-joins
Round Robin: good to spread load
Shared-disk and shared-memory systems are less sensitive to partitioning;
shared-nothing benefits from "good" partitioning.
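A minimal sketch of the three placement strategies (the node count and five-way A...Z split mirror the figure; everything else is my own illustration):

```python
import hashlib
import itertools

NODES = 5
BOUNDS = ["E", "J", "N", "S", "Z"]   # A...E | F...J | K...N | O...S | T...Z

def range_node(key):
    """Range: keeps key order, so good for range queries and group-by."""
    return next(i for i, hi in enumerate(BOUNDS) if key[0].upper() <= hi)

def hash_node(key):
    """Hash: scatters keys evenly, good for equi-joins."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % NODES

rr = itertools.cycle(range(NODES))
def round_robin_node(_key):
    """Round robin: spreads load, but lookups must ask every node."""
    return next(rr)

for name in ("Adams", "Gray", "Smith"):
    print(name, range_node(name), hash_node(name), round_robin_node(name))
```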
Index Partitioning
Hash indices partition by hash (buckets 0...9 | 10..19 | 20..29 | 30..39 | 40...).
B-tree indices partition as a forest of trees, one tree per range (A..C | D..F | G..M | N..R | S..Z).
A primary index clusters the data.
Partitioned Execution
Spreads computation and IO among processors.
Partitioned data gives NATURAL parallelism.
[Figure: one Count operator running on each partition of a table (A...E, F...J, K...N, O...S, T...Z)]
N x M way Parallelism
[Figure: Join operators on the N partitions (A...E ... T...Z) feed M Sort operators through a partitioned river; Merge operators combine the sorted streams]
N inputs, M outputs, no bottlenecks:
partitioned data with partitioned and pipelined data flows.
Blocking Operators = Short Pipelines
An operator is blocking if it does not produce any output until it has consumed all its input.
Examples: Sort, Aggregates, Hash-Join (reads all of one operand before producing output).
[Figure: a Scan process reading a Tape, File, or SQL Table]
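A small generator sketch (mine, not from the slides) of why a blocking operator stalls the pipeline:

```python
def running_filter(rows):       # streaming: emits rows as it reads them
    for r in rows:
        if r > 0:
            yield r

def sort_op(rows):              # blocking: sorted() must drain ALL input
    yield from sorted(rows)     # before the first output row can appear

src = iter([3, -1, 2, -5, 1])
# Anything downstream of sort_op waits for the whole input to arrive,
# which is why blocking operators make pipelines short.
print(list(sort_op(running_filter(src))))   # [1, 2, 3]
```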
Database Load Template has three blocked phases
[Figure: a scan feeds Sort Runs → Merge Runs → Table Insert into the SQL Table; in parallel, Sort Runs → Merge Runs → Index Insert builds each of Index 1, Index 2, Index 3]
Blocking operators kill pipeline parallelism
and make partition parallelism all the more important.
Parallel Aggregates
For each aggregate function, we need a decomposition strategy:
count(S) = Σ count(s(i)), ditto for sum()
avg(S) = ( Σ sum(s(i)) ) / Σ count(s(i))
and so on...
For groups,
– sub-aggregate groups close to the source
– drop sub-aggregates into a hash river.
[Figure: one Count operator per partition (A...E ... T...Z), as in the partitioned-execution figure]
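A minimal sketch of the decomposed average (the partition layout and names are mine): each node returns a (sum, count) sub-aggregate, and the coordinator combines them exactly as the formulas above state.

```python
from concurrent.futures import ProcessPoolExecutor

def sub_aggregate(partition):
    """Computed close to the source: one (sum, count) pair per node."""
    return sum(partition), len(partition)

if __name__ == "__main__":
    partitions = [[1, 2], [3, 4, 5], [6]]      # one list per node
    with ProcessPoolExecutor() as pool:
        subs = list(pool.map(sub_aggregate, partitions))
    total = sum(s for s, _ in subs)            # sum(S)   = Σ sum(s(i))
    count = sum(c for _, c in subs)            # count(S) = Σ count(s(i))
    print(total / count)                       # avg(S) = 3.5
```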
Parallel Sort
The river is range or hash partitioned.
[Figure: a Scan (or other source) provides M inputs to the partitioned river; sub-sorts generate runs, and merges combine the runs into N sorted outputs]
Disk and merge are not needed if the sort fits in memory.
Sort is the benchmark from hell for shared-nothing machines:
net traffic = disk bandwidth, with no data filtering at the source.
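A sketch of the range-partitioned data flow (boundaries and names are my own): each sub-sort handles one key range, and because the ranges abut, the outputs concatenate without a final merge.

```python
from concurrent.futures import ProcessPoolExecutor

def sub_sort(run):
    return sorted(run)                    # each node sorts its range locally

if __name__ == "__main__":
    data = [17, 3, 99, 42, 8, 64, 25, 71]
    runs = [[x for x in data if x < 32],            # range-partitioned river
            [x for x in data if 32 <= x < 64],
            [x for x in data if x >= 64]]
    with ProcessPoolExecutor() as pool:
        parts = pool.map(sub_sort, runs)
    print([x for part in parts for x in part])      # globally sorted output
```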
Hash Join: Combining Two Tables
Hash the smaller table into N buckets (hope N = 1).
If N = 1: read the larger table and probe the in-memory hash of the smaller.
Else: hash the outer to disk, then join bucket-by-bucket.
Purely sequential data behavior.
Always beats sort-merge and nested-loops joins unless the data is clustered.
Good for equi-, outer-, and exclusion joins.
Lots of papers; hashing reduces skew.
[Figure: Left and Right tables hashed into matching buckets]
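A minimal in-memory sketch of the N = 1 case (the table and key names are hypothetical): hash the smaller table, then probe it with one sequential scan of the larger.

```python
from collections import defaultdict

def hash_join(small, large, key):
    buckets = defaultdict(list)
    for row in small:                        # build: hash the smaller table
        buckets[row[key]].append(row)
    for row in large:                        # probe: one sequential scan
        for match in buckets.get(row[key], ()):
            yield {**match, **row}           # emit the equi-joined row

dept = [{"dno": 1, "dname": "Sales"}, {"dno": 2, "dname": "Ops"}]
emp  = [{"dno": 1, "ename": "Ann"}, {"dno": 2, "ename": "Bob"}]
print(list(hash_join(dept, emp, "dno")))
```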
Q&A
Thank you!