VLDB98 - Microsoft Research

Download Report

Transcript VLDB98 - Microsoft Research

The Rebirth of
Database Machines
Dina Bitton
Jim Gray
1
Bitton & Gray: The Rebirth of Database Machines, http://research.microsoft.com/~Gray/talks/DB_Machine_Rebirth.ppt
Outline
•
•
•
•
Active Disks are coming
Disk Tutorial (not presented, but slides in deck)
Disk Arms are important (optimize them)
The Rebirth of Database Machines
2
Bitton & Gray: The Rebirth of Database Machines, http://research.microsoft.com/~Gray/talks/DB_Machine_Rebirth.ppt
Disks of 30 Years Ago
• 10 MB
• Failed every few weeks
• Cost more than 400$
3
Bitton & Gray: The Rebirth of Database Machines, http://research.microsoft.com/~Gray/talks/DB_Machine_Rebirth.ppt
Disk Arrays
• 24 cpus
• 384 disks
• More mips
in the disks
than in the
cpus
4
Bitton & Gray: The Rebirth of Database Machines, http://research.microsoft.com/~Gray/talks/DB_Machine_Rebirth.ppt
Year 2003 Disks
• Big disk (10 $/GB)
–
–
–
–
3”
200 GB
150 kaps (k accesses per second)
30 MBps sequential
• Small disk (20 $/GB)
–
–
–
–
2”
40 GB
100 kaps
20 MBps sequential
• Both running DBMS, Mail, Web,
and OS
Bitton & Gray: The Rebirth of Database Machines, http://research.microsoft.com/~Gray/talks/DB_Machine_Rebirth.ppt
5
• From CMU
Active Disk
web site
http://www.pdl.cs.cmu.edu/Active/
6
Bitton & Gray: The Rebirth of Database Machines, http://research.microsoft.com/~Gray/talks/DB_Machine_Rebirth.ppt
Research Problem:
When every disk is a super-computer…
And there are thousands of them...
• Who manages data placement?
• Query plans among 1,000 severs?
• How does
– mirroring work?
– backup work?
• Where does my program run?
7
Bitton & Gray: The Rebirth of Database Machines, http://research.microsoft.com/~Gray/talks/DB_Machine_Rebirth.ppt
Relevant University Research
on Active Disks
• Kim Keeton & Dave Patterson @ UC Berkeley
http://www.cs.berkeley.edu/~pattrsn/talks/sigmod98-keynote.ppt
• Erik Riedel & Garth Gibson @ CMU
http://www.pdl.cs.cmu.edu/Active/
• Mike Franklin @ U Maryland
http://www.cs.umd.edu/projects/bdisk
• Anurag Acharya, Mustafa Uysal @ UC SB
http://www.cs.ucsb.edu/TRs/TRCS98-06.html
8
Bitton & Gray: The Rebirth of Database Machines, http://research.microsoft.com/~Gray/talks/DB_Machine_Rebirth.ppt
Outline
•
•
•
•
Active Disks are coming
Disk Tutorial (not presented, but slides in deck)
Disk Arms are important
The Rebirth of Database Machines
9
Bitton & Gray: The Rebirth of Database Machines, http://research.microsoft.com/~Gray/talks/DB_Machine_Rebirth.ppt
Disk Access Time
• Access time = SeekTime
+ RotateTime
+ ReadTime
• Rotate time:
6 ms
3 ms
1 ms
– 5,000 to 10,000 rpm
• ~ 12 to 6 milliseconds per rotation
• ~ 6 to 3 ms rotational latency
12
Bitton & Gray: The Rebirth of Database Machines, http://research.microsoft.com/~Gray/talks/DB_Machine_Rebirth.ppt
Disk Access Time Improves Slowly
• Access time = SeekTime
+ RotateTime
+ ReadTime
• Other useful facts:
6 ms
3 ms
1 ms
8%/y
8%/y
40%/y
– Power rises more than size3 (small is indeed beautiful)
– Small devices are more rugged
– Small devices can use plastics (forces are much smaller)
e.g. bugs fall without breaking anything
13
Bitton & Gray: The Rebirth of Database Machines, http://research.microsoft.com/~Gray/talks/DB_Machine_Rebirth.ppt
Disk Seek Time
• Seek time is ~ Sqrt(distance)
(distance = 1/2 acceleration x time2)
• Specs assume seek is
1/3 of disk
• Short seeks are common.
(over 50% are zero length)
• Typical 1/3 seek time: 6 ms
• 4x improvement in 20 years.
time
14
Bitton & Gray: The Rebirth of Database Machines, http://research.microsoft.com/~Gray/talks/DB_Machine_Rebirth.ppt
Disk Access Ratios Have Changed
• Key metrics:
$/GB
Kaps/GB (KB accesses per second per GB)
SCAN: time to scan the disk
• Scan going from minutes to days
• Disk arms are precious resource
(disk capacity is no longer the precious resource)
Kaps/GB went from 500 to 7 and going to 1
year
1988
1998
2003
Capacity
kaps/
Scan
Scan
GB
$/GB kaps GB Sequential
Random
0.25 20,000 30 1200 2 minutes 20 minutes
18
50
120
7 20 minutes
5 hrs
200
5
200
1
2 hrs
1.2 days
Bitton & Gray: The Rebirth of Database Machines, http://research.microsoft.com/~Gray/talks/DB_Machine_Rebirth.ppt
20
Stripe For More Bandwidth
• N-stores have N-times the bandwidth
• Works great!
• Supported by most file systems
21
Bitton & Gray: The Rebirth of Database Machines, http://research.microsoft.com/~Gray/talks/DB_Machine_Rebirth.ppt
Mirrors: Replicate Stores for Availability
•
•
•
•
Read one, write all
If one fails, rebuild from survivor
Run scrubber in background to fix faults
N-replicas can give N-times the bandwidth
• UnAvailabity ~
MTTF
MTTF N
1day
50 years
2
 1,000 ,000 years
A Million Years!!!
22
Bitton & Gray: The Rebirth of Database Machines, http://research.microsoft.com/~Gray/talks/DB_Machine_Rebirth.ppt
RAID5: Parity Saves Storage Space
• Mirrors: 50% storage overhead
– read one, write both
• RAID5: 12% Storage overhead:
– read one, write one plus parity
PARITY
23
Bitton & Gray: The Rebirth of Database Machines, http://research.microsoft.com/~Gray/talks/DB_Machine_Rebirth.ppt
Interesting Fact:
Mirrored Disks Optimize Disk Arms
seek is min seek.
• Doubles write cost
(write both)
– Write time increases because
seek is max seek.
Seek Distance vs Disks
0.8
Fraction of disk surface for seek
• Doubles read bandwidth
Sequential: Read
stagger reads from
each drive (stripe)
Random: Read closest arm
0.7
0.6
0.5
Write Seek
0.4
0.3
Read Seek
0.2
0.1
0
1
2
3
4
5
6
7
8
Number of disks
Bitton & Gray: The Rebirth of Database Machines, http://research.microsoft.com/~Gray/talks/DB_Machine_Rebirth.ppt
24
If Mix Reads & Writes
Mirror is Better Than Partition
• 2 servers are better than one
• Benefit is better than 2x write cost if reads  writes
Fraction of disk
surface for seek
Normalized Seek Time
Write
2.0
25% Read
1.5
50% read
1.0
75% read
0.5
Read
0.0
1 2 3 4 5 6 7 8
Number of disks
25
Bitton & Gray: The Rebirth of Database Machines, http://research.microsoft.com/~Gray/talks/DB_Machine_Rebirth.ppt
What if you have LOTS of Disks
• When you have BIG disks (200 GB),
arms are precious, space is cheap.
• If you replicate 1000x
– write seek time asymptotically approaches 1.7x
– read seek time asymptotically approaches zero.
Distance to Seek
1000
Time to Seek
Time to Seek
900
Write
700
600
2
2
Write
1.8
1.8
1.6
1.6
1.4
400
1.2
time
500
300
1
0.8
200
100
Read
0
1
10
100
Write
1.4
time
800
1.2
1
0.8
0.6
0.6
0.4
0.4
0.2
0.2
0
1000
0
1
2
3
4
5
6
disks
7
Read 0
8
Read
9
10
200
400
600
disks
Bitton & Gray: The Rebirth of Database Machines, http://research.microsoft.com/~Gray/talks/DB_Machine_Rebirth.ppt
800
1000
26
Outline
•
•
•
•
Active Disks are coming
Disk Tutorial (not presented, but slides in deck)
Disk Arms are important
The Rebirth of Database Machines
27
Bitton & Gray: The Rebirth of Database Machines, http://research.microsoft.com/~Gray/talks/DB_Machine_Rebirth.ppt
The Rebirth of Database Machines
Dina Bitton
IDS
Jim Gray
Microsoft
28
Bitton & Gray: The Rebirth of Database Machines, http://research.microsoft.com/~Gray/talks/DB_Machine_Rebirth.ppt
Outline
• Performance hungry databases
• History: life and death of database
machines
• What has changed that can make
database machines work today
• Shared-Nothing Database Machine
• Where is the required bandwidth
• DMP : Shared-Nothing & SharedEverything
29
Bitton & Gray: The Rebirth of Database Machines, http://research.microsoft.com/~Gray/talks/DB_Machine_Rebirth.ppt
Demand for Database
Performance
• Larger Databases:
– marketing data warehouses: TB of historical
data
– daily news broadcasts: 1 TB of searchable
video/audio data
• Large Scans:
Searches require access to large fraction of
database
• Repeated Scans:
DSS queries, Data mining algorithms
30
Bitton & Gray: The Rebirth of Database Machines, http://research.microsoft.com/~Gray/talks/DB_Machine_Rebirth.ppt
Life, Death & Reincarnation
• Database Machines are coming, Database Machines
are coming ... (Hsiao 1979)
• Then there was Britton-Lee, Direct, ICL …
– Teradata builds highly-parallel shared-nothing SQL
server
– many university “paper” designs
• “Database Machines, An Idea whose time has
Passed?” (Boral- DeWitt 1983)
• Then there was MMDBs, Grace, Gamma and more
Teradata
• Then there was Software (Parallel Database Query)
• Next: PDQ + lots of disks with power controllers
31
Bitton & Gray: The Rebirth of Database Machines, http://research.microsoft.com/~Gray/talks/DB_Machine_Rebirth.ppt
And All Along
Stonebraker’s Opinion:
“The history of DBMS research is littered with innumerable
proposals to construct hardware database machines to
provide high performance operations. In general these have
been proposed by hardware types with a clever solution in
searchof a problem on which it might work.”
Readings in Database Systems, Morgan-Kaufmann
32
Bitton & Gray: The Rebirth of Database Machines, http://research.microsoft.com/~Gray/talks/DB_Machine_Rebirth.ppt
Why Not then, but Yes now
• Too early: small databases on 1 disk
TB databases span thousands disks, need partitioning
• Disk filter designs: addressed only small part of
DBMS requirements
disk controllers are fast computers
• Exotic technologies (bubbles, CCD…) went away
• Special purpose hardware increased design time
and cost
Higher level of integration, VLSI design tools better
• Parallel query processing was not well-understood
Large body of research, successful commercial
implementations
Bitton & Gray: The Rebirth of Database Machines, http://research.microsoft.com/~Gray/talks/DB_Machine_Rebirth.ppt
33
Parallel Query Processing
[DeWitt-Gray CACM91]
Merge
Sort
Sort
Sort
Sort
Sort
Scan
Scan
Scan
Scan
Scan
Source
Data
Source
Data
Source
Data
Source
Data
Source
Data
Pipelining
Partitioning
data streams flow
from one operator to
the next
tables are partitioned to allow
concurrent processing on
partitions
34
Bitton & Gray: The Rebirth of Database Machines, http://research.microsoft.com/~Gray/talks/DB_Machine_Rebirth.ppt
Data Pathway Contention
[Patterson Sigmod 1998]
• Disk
external I/O bus bottleneck to transfer rate,
cost
• Network
internal I/O bus interface is bottleneck to
delivered bandwidth
• Memory-Processor
processor-memory interface (cache+memory
bus) is bottleneck to delivered bandwidth
35
Bitton & Gray: The Rebirth of Database Machines, http://research.microsoft.com/~Gray/talks/DB_Machine_Rebirth.ppt
A Shared-Nothing Database Machine
Scalable Interconnect
Processor
&
Memory
Processor
&
Memory
. . .
Processor
&
Memory
Processor
&
Memory
No contention in memory access or parallel disk access
=> “Embarrassingly Parallel” Scan [Patterson]
But: how fast need Interconnect be?
Each processor has own OS, communication protocols,DB instance
Exchange data streams for pipelining ops, for sort, merge
Can’t support M:N mapping between disks & threads
Bitton & Gray: The Rebirth of Database Machines, http://research.microsoft.com/~Gray/talks/DB_Machine_Rebirth.ppt
36
Share-Everything?
• Need more bandwidth for shipping data
streams than network can provide
• Need M:N mapping from disks to
processors for sort/merge
• Control & synchronization: Data-flow
best to synchronize processors
37
Bitton & Gray: The Rebirth of Database Machines, http://research.microsoft.com/~Gray/talks/DB_Machine_Rebirth.ppt
Where to Get the Bandwidth?
Level of
Components
Integration
Chip
Board
System
Transistor /
Gate
Chips / Discrete
components
Board /
Interface
Links
Throughput
Connection
lines
30 GB/Sec
(16 64-bit registers 1-8 internal clocks
at 200 MHz)
1. Point-topoint
connections
2. Buses
1. Crossbars
2. Buses
1. 800 MB/Sec
2. 150 MB/Sec
1. 200-500
MB/Sec
Latency
1.Half of transaction
(10 clocks of the
slowest device )
1. 10-50 bus clocks
1. 10 crossbar clocks
2. 10-50 bus clocks
2. 80 MB/Sec
Network
Node (Systems) Fibre Channel
/ Bridges
Ethernet / VIA
Fibre Channel :
100–200 MB/Sec
Sender overhead +
Receiver overhead +
transmission latency +
link availability
38
Bitton & Gray: The Rebirth of Database Machines, http://research.microsoft.com/~Gray/talks/DB_Machine_Rebirth.ppt
The Data Manipulation Platform
To Host Computer
To other DMP Boards via high-speed switch
I/O interface
adapter
NP 1
NP 2
Bus adapter
...
RFM
NP 16
...
P4
P1
BAM
• Massive Parallel
Operation
data-flow control
• M:N thread-to-disk
RAM
Direct processor to disk access
Direct disk to memory connect
Direct connection
...
1
80
DMP BOARD
39
Bitton & Gray: The Rebirth of Database Machines, http://research.microsoft.com/~Gray/talks/DB_Machine_Rebirth.ppt
A DSS Query Execution
Plan
Exchange 5
Sort 1
Exchange 4
Select sum(tabX.amount*.08), tabY.region
from tabX,tabY
where tabX.key=tabY.region
Sort 2
group by tabY.region,
order by tabY.region;
Group 1
Group 2
1/10 grouped
Exchange 3
Temp Disks
HJoin
HJoin
Exchange 1
Scan tabX
1
Scan tabX
2
...
Exchange 2
Scan tabX
32
1/3 selected
Scan tabY
1
...
1
2
1/10 joined
HJoin
...
Scan tabY
3
...
32
1
Database Disks
Bitton & Gray: The Rebirth of Database Machines, http://research.microsoft.com/~Gray/talks/DB_Machine_Rebirth.ppt
3
40
Bandwidth
Requirements
Exchange 5
Sort 1
Sort 2
2.1 MB/s
Exchange 4
Group 1
Group 2
21 MB/s
Exchange 3
Temp Disk
Contentio
n
HJoin
HJoin
HJoin
Exchange 1
Scan tabX
1
Scan tabX
2
...
Exchange 2
Scan tabX
32
Scan tabY
1
...
Database Disks
1
2
210 MB/s
Scan tabY
3
...
32
1
32*20MB/s= 640 MB/s
Bitton & Gray: The Rebirth of Database Machines, http://research.microsoft.com/~Gray/talks/DB_Machine_Rebirth.ppt
3
41
Conclusion
DMP: shared-nothing and shared-everything
IT ISN’T THAT YOU CAN’T SHARE
IT IS WHERE YOU SHARE
ON A CHIP
ON A BOARD
ON A NETWORK
42
Bitton & Gray: The Rebirth of Database Machines, http://research.microsoft.com/~Gray/talks/DB_Machine_Rebirth.ppt