A Case for Flash Memory SSD in Enterprise Database Applications

Authors: Sang-Won Lee, Bongki Moon, Chanik Park, Jae-Myung Kim, Sang-Woo Kim
Published at SIGMOD 2008
Presented by Jin Xiong
11/4/2008
Outline
• Flash memory SSD
• DB storage and workload
• Experimental settings
• Transaction log
• MVCC rollback segment
• Temporary table spaces
• Conclusions
Flash memory SSD (1)
• Flash memory SSD used in this work
– NAND-type flash memory
– Manufactured by Samsung
– Interface: IDE
Flash memory SSD (2)
• Characteristics
– Uniform random access speed
• Purely electronic device, no mechanically moving parts
• Access latency is almost linearly proportional to the
amount of data accessed, irrespective of its physical
location in flash memory
• One of the key characteristics we can take advantage of
– Erase before overwriting
• Data on SSD cannot be updated in place; writes go to a
clean page instead (see the sketch at the end of this slide)
• Erase unit is much larger than a sector: 128 KB vs. 1 KB
• Erase is time-consuming, typically 1-2 ms
– Asymmetry of read and write speed
• A write is much slower than a read on SSD: 0.4 ms vs.
0.2 ms in this paper
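The erase-before-write constraint is easiest to see in code. Below is a toy Python model (my illustration, not the paper's or Samsung's firmware; the page and block sizes are assumed): each page in a block can be programmed once, and the only way to make pages writable again is a slow whole-block erase.

```python
PAGES_PER_BLOCK = 64          # assumed: 128 KB erase unit / 2 KB page

class FlashBlock:
    """Toy erase unit: pages are programmed once, freed only by erase."""
    def __init__(self):
        self.pages = [None] * PAGES_PER_BLOCK
        self.next_free = 0                      # next clean page in the block

    def program(self, data):
        # Overwriting an already-programmed page is impossible; writes must
        # go to a clean page (out-of-place update).
        if self.next_free == PAGES_PER_BLOCK:
            raise RuntimeError("block full: erase (typically 1-2 ms) required")
        self.pages[self.next_free] = data
        self.next_free += 1
        return self.next_free - 1               # page index the data landed on

    def erase(self):
        # Whole-block erase: slow, and each block survives only a limited
        # number of erase cycles (hence wear leveling, next slides).
        self.pages = [None] * PAGES_PER_BLOCK
        self.next_free = 0
```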
Flash memory SSD (3)
• Hardware logic
– Dual-channel architecture, 4-way interleaving
– Hides flash programming latency and increases bandwidth
– 128 KB SRAM for program code, data, and buffer memory
Flash memory SSD (4)
• Firmware: Flash translation layer (FTL)
– Address mapping and wear leveling (a sketch follows below)
• Addresses the limited number of write cycles of each sector
• Based on super-blocks: 1 MB each, 8 erase units, 2 on each
flash chip
• Limits the amount of information required for mapping
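As a rough illustration of the mapping the FTL maintains, here is a minimal Python sketch (an assumption-level model, not Samsung's actual firmware; the pool of 1024 physical super-blocks is made up). Mapping at super-block granularity keeps the table small, and reusing the least-erased free super-block spreads wear evenly:

```python
SUPER_BLOCK = 1 << 20    # 1 MB: 8 erase units, 2 on each of 4 flash chips

class SimpleFTL:
    """Toy super-block FTL: coarse address mapping plus wear leveling."""
    def __init__(self, n_physical=1024):          # assumed pool size
        self.mapping = {}                          # logical sb -> physical sb
        self.erases = {}                           # physical sb -> erase count
        self.free = set(range(n_physical))

    def remap_for_write(self, logical_addr):
        lsb = logical_addr // SUPER_BLOCK
        # Wear leveling: pick the least-erased free super-block so the
        # limited write/erase cycles are consumed evenly across the device.
        target = min(self.free, key=lambda b: self.erases.get(b, 0))
        self.free.remove(target)
        old = self.mapping.get(lsb)
        if old is not None:                        # old copy is erased and
            self.erases[old] = self.erases.get(old, 0) + 1
            self.free.add(old)                     # recycled into the pool
        self.mapping[lsb] = target
        return target, logical_addr % SUPER_BLOCK  # physical sb and offset
```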
• Trends
– Two-fold annual increase in density
– Originally used in mobile computing devices
• PDAs, MP3 players, mobile phones, digital cameras
– Recently used more and more in portable computers and
the enterprise server market
– Tremendous potential as a new storage medium that can
replace magnetic disks and achieve much higher
performance for enterprise database servers
DB Storage
• Data structures in DB systems
– Database tables and indexes
• Not within the scope of this paper
– Transaction log
• Whenever a transaction updates a data object, a log
record is created
• Must be kept on stable storage for recoverability and
durability
– Temporary tables
• Used to store temporary data required for performing
operations such as sorts or joins
– Rollback segments
• Used in multiversion concurrency control (MVCC)
DB Workload
• Typical transactional database workloads, e.g.
TPC-C
– Little locality and sequentiality
– Many synchronous writes
• Forced writes of log records at commit time
• Must wait until data are written on disk
– Prefetching and write buffering are less effective
– Performance is limited by disk latency rather than disk
bandwidth and capacity
– The latency-bandwidth imbalance of disks is expected to
become more serious in the future
– Low latency of SSD
• Can improve performance significantly
Experimental Settings
• Two machines with identical hardware except
disk
– 1.86 GHz Intel Pentium dual-core processor
– 2GB RAM
• OS: Linux-2.6.22
• Disk
– SSD: Samsung Standard Type, 32 GB, PATA (IDE), SLC
NAND
– HDD: Seagate Barracuda, 250 GB, 7200 rpm, SATA
• DB
– A commercial database server
– Used HDD/SSD as a raw device, not through a file system
(see the sketch at the end of this slide)
– Database tables were cached in memory
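As an aside on the setup, accessing a disk as a raw device means bypassing the file system and the OS page cache. A minimal Linux sketch in Python (the device path /dev/sdb is hypothetical; the paper does not show its actual configuration):

```python
import os, mmap

fd = os.open("/dev/sdb", os.O_RDONLY | os.O_DIRECT)  # raw device, no FS
buf = mmap.mmap(-1, 4096)     # O_DIRECT needs an aligned buffer; mmap
os.readv(fd, [buf])           # memory is page-aligned; read one 4 KB block
os.close(fd)                  # every I/O hits the device, never the page cache
```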
Transaction log
• Synchronous writes
– When a transaction commits, it appends a commit type
log record to the log, and force-writes the log tail to
stable storage
• Response time
– T_response = T_cpu + T_read + T_write + T_commit
– T_commit is a significant overhead: the transaction must
wait for disk I/O (a sketch follows at the end of this slide)
– Commit-time delay is a serious bottleneck
• Append-only sequential writes
– HDD: no seek delay; avg. rotational latency 4.17 ms (7200 rpm)
– SSD: append-only writes do not cause expensive merge or
erase operations as long as clean blocks are available
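A minimal sketch of the forced log write at commit time (assumed behavior of a generic write-ahead log, not the commercial DBMS's actual code; the file name and record format are made up):

```python
import os

log_fd = os.open("redo.log", os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)

def commit(log_records):
    for rec in log_records:
        os.write(log_fd, rec)             # append records to the log tail
    os.write(log_fd, b"COMMIT\n")         # commit-type log record
    os.fsync(log_fd)                      # force-write: block until the log
                                          # tail reaches stable storage
    # Only now may the transaction be acknowledged. On HDD this fsync costs
    # roughly the rotational latency; that wait is what T_commit measures.
```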
Transaction log
• Simple SQL transactions
– Multi-threaded concurrent transactions
– TPS on SSD is much higher than on HDD (12x down to 4x)
– The gap shrinks as the number of concurrent
transactions increases
– HDD: disk access latency is the bottleneck, resulting in low
CPU utilization
– SSD
• Limited by CPU rather than I/O
• Once CPU utilization saturates, TPS stops increasing
Transaction log
• TPC-B benchmark performance
– A stress test: the transaction commit rate is higher than
that of TPC-C
– Suitable for testing the log storage: a large number of
small transactions causing significant forced-write
activities
– The number of concurrent users: 20
– TPS on SSD is 3.5x that on HDD
– Considerably lower log write latency on SSD
– CPU is the bottleneck for SSD
Transaction log
• I/O-bound vs CPU-bound
– SSD: faster CPU improves TPS
• Dual-core: saturated at about 3000 TPS
• Quad-core: saturated at about 4300 TPS
– HDD: almost no difference
MVCC rollback segment
• MVCC — Multiversion concurrency control
– An alternative to traditional lock-based concurrency
control
– When updating a data object, its before-image is written
to a rollback segment, then the new value is applied to
the object
– When reading a data object, search for the correct
version in the rollback segment
– Two advantages
• Minimizes the performance penalty of concurrent updates,
because read consistency is supported without any locks
• Support snapshot isolation and time travel queries
– Cost
• Reads can be costly: a long list of versions of a data object
must be searched if it has been updated many times (see
the sketch at the end of this slide)
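To make the rollback-segment mechanics concrete, here is a toy Python version (illustrative only; the commercial DBMS differs in detail). An update first appends the before-image to the rollback segment, then installs the new value; a reader walks the version chain until it finds a version old enough for its snapshot:

```python
import itertools

_ts = itertools.count(1)
rollback_segment = []    # append-only before-images: (version, value, undo)
table = {}               # key -> (version, value, undo index or None)

def update(key, value):
    undo = None
    if key in table:
        rollback_segment.append(table[key])   # write before-image first
        undo = len(rollback_segment) - 1
    table[key] = (next(_ts), value, undo)     # then apply the new value

def read(key, snapshot_ts):
    entry = table.get(key)
    while entry is not None:
        version, value, undo = entry
        if version <= snapshot_ts:            # visible to this snapshot
            return value
        # Too new: follow the chain into the rollback segment. This walk is
        # the costly read above when an object has been updated many times.
        entry = rollback_segment[undo] if undo is not None else None
    return None
```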
MVCC rollback segment
• Write pattern 1
– Append-only, sequential writes
– Multiple streams in parallel
– 1 MB extents
• Write pattern 2
– In-place writes to a small logical
region
• HDD is expected to perform poorly
– Disk arm movement for every 1 MB extent
– Excessive disk seeks
• SSD is expected to perform well
– No additional cost when clean blocks are available
– Reclamation cost can be amortized
• Infrequent: once per 1 MB extent
• Slight performance difference
– SSD: avg. 6.8 ms/block
MVCC rollback segment
• Read pattern
– Clustered reads, randomly scattered across quite a large
logical address space (1 GB)
• Performance
– SSD: 16x faster than HDD
Temporary table spaces
• External sort
– Typical algorithm (a generic sketch follows at the end of
this slide)
• Partitions the input data set into smaller chunks
• Sorts the chunks separately
• Merges them into a single sorted file
– I/O pattern
• Sequential write followed by
random read
– Performance
• Sequential write: small difference
• Random read: SSD almost 10
times faster
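A generic external sort in Python, to pin down the I/O pattern above (a textbook version, not the DBMS's implementation): run formation writes each run sequentially, while the merge reads from all runs at once, which becomes random I/O on disk:

```python
import heapq
import tempfile
from itertools import islice

def external_sort(src, dst, chunk_lines=100_000):
    runs = []
    with open(src) as f:
        while True:
            chunk = list(islice(f, chunk_lines))  # one chunk of the input
            if not chunk:
                break
            chunk.sort()                          # sort the chunk in memory
            run = tempfile.TemporaryFile("w+")
            run.writelines(chunk)                 # sequential write of a run
            run.seek(0)
            runs.append(run)
    with open(dst, "w") as out:
        # Merging interleaves reads across all runs: random reads on HDD,
        # nearly free on SSD, hence the ~10x gap reported above.
        out.writelines(heapq.merge(*runs))
```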
Temporary table spaces
• External sort
– Effect of cluster size on sort performance
• HDD: sort performance improves with a larger cluster size
• SSD: sort performance deteriorates
• Reasons (see the back-of-envelope sketch below):
– A larger cluster is good for the first stage, but not for
merging
– The second (merge) stage dominates overall performance
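A back-of-envelope calculation of why larger clusters hurt the merge stage (the buffer and data sizes below are made up, not the paper's): with a fixed sort buffer, a larger cluster means fewer merge streams (lower fan-in) and therefore more merge passes:

```python
import math

buffer_mb, data_mb = 16, 2048                  # assumed sizes
runs = math.ceil(data_mb / buffer_mb)          # initial sorted runs
for cluster_kb in (64, 256, 1024):
    fan_in = (buffer_mb * 1024) // cluster_kb - 1   # one cluster per stream
    passes = math.ceil(math.log(runs, fan_in))      # merge passes needed
    print(f"cluster={cluster_kb} KB  fan-in={fan_in}  passes={passes}")
```

On HDD the larger cluster at least saves seeks; on SSD there is no seek to save, so only the extra merge work remains.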
– Effect of buffer cache size on sort performance
• Performance improves with a larger buffer in both cases
Temporary table spaces
• Hash join
– Similar to the sort algorithm
• Partitions the input data set into smaller chunks and
processes each chunk separately
– Opposite I/O pattern
• Random writes followed by sequential reads
– Performance
• SSD was expected to perform poorly in the first stage
• The actual result is unexpected: writes in the first stage
turn out to be sequential and append-only (a sketch of the
two stages follows at the end of this slide)
• SSD is 3 times faster than HDD
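A sketch of the partition-based hash join this slide assumes (a generic Grace-style version in Python, not the DBMS's actual operator). Stage 1 scatters rows of both inputs into partitions (the write-heavy phase); stage 2 joins each partition pair, reading each partition back sequentially:

```python
from collections import defaultdict

def hash_join(r_rows, s_rows, key_r, key_s, n_parts=16):
    parts_r = defaultdict(list)
    parts_s = defaultdict(list)
    for row in r_rows:                  # stage 1: partition both inputs
        parts_r[hash(key_r(row)) % n_parts].append(row)
    for row in s_rows:
        parts_s[hash(key_s(row)) % n_parts].append(row)
    for p in range(n_parts):            # stage 2: join each partition pair
        build = defaultdict(list)
        for row in parts_r[p]:          # build a hash table on one side
            build[key_r(row)].append(row)
        for row in parts_s[p]:          # probe with the other side
            for match in build[key_s(row)]:
                yield match, row
```

In a real DBMS the partitions would be spilled to the temporary table space rather than held in memory, which is where the write pattern on this slide comes from.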
Temporary table spaces
• Sort-merge join
– SSD is 7 times faster
– HDD: sort-merge join is two times slower than hash join
– SSD: sort-merge join is as fast as hash join
Conclusions
• Demonstrated that processing I/O requests for
transaction log, rollback and temporary data can
become a serious bottleneck for transaction
processing
• Showed that flash memory SSD can alleviate this
bottleneck drastically
• Due attention should be paid to SSD in all aspects
of DB system design to maximize the benefit of
this new technology