Transcript ppt

Hash in a Flash:
Hash Tables for Solid State Devices
Tyler Clemons*
Charu Aggarwal†
S M Faisal*
Shirish Tatikonda ‡
Srinivasan Parthasarathy*
*The Ohio State University, Columbus, Ohio
‡IBM Almaden Research Center, San Jose, California
†IBM T.J. Watson Research Center, Yorktown Heights, New York
Motivation and Introduction
2

- Data is growing at a fast pace
  - Scientific data, Twitter, Facebook, Wikipedia, the WWW
- Traditional data mining and IR algorithms require random out-of-core data access
  - Data is often too large to fit in memory, so frequent random disk access is expected
Motivation and Introduction (2)
3

- Traditional hard disk drives can keep pace with storage requirements but NOT with random access workloads
  - Moving parts are a physical limitation
  - They also contribute to rising energy consumption
- Flash Devices have emerged as an alternative
  - No moving parts
  - Faster random access
  - Lower energy usage
  - But they have several drawbacks...
Flash Devices
4

- Limited lifetime
  - Supports a limited number of rewrites, also known as erasures or cleans
  - Erasures impact response time
  - They are incurred at the block level
  - Blocks consist of pages; pages (4-8 KB) are the smallest I/O unit
- Poor random write performance
  - Incurs many erasures and lowers lifetime
- Efficient sequential write performance
  - Lowers erasures and increases lifetime
On Flash Devices, DM, and IR
5

- Flash Devices provide fast random read access
  - Common to many IR and DM algorithms and data structures
- Hash tables are common in both DM and IR
  - Useful for associating keys and values
  - Counting hash tables associate keys with a frequency (a minimal sketch follows this slide)
    - Found in many algorithms that track word frequency
    - We will examine one such algorithm common in both DM and IR (TF-IDF)
- Hash tables exhibit random access for both writes and reads
  - Random writes are an issue for Flash Devices
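A counting hash table is simply a map from key to running frequency. Here is a minimal in-memory sketch; the names `counts` and `record` are illustrative, not from the paper:

```python
from collections import defaultdict

# Minimal counting hash table: each key maps to a running frequency.
counts = defaultdict(int)

def record(key, delta=1):
    """Accumulate an update for `key`; a (key, delta) pair is all that
    later needs to be pushed out to an on-flash table."""
    counts[key] += delta

for word in ["flash", "hash", "flash"]:
    record(word)

print(counts["flash"])  # -> 2
```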
Hash Tables for Flash Devices must:
6


- Reduce erasures/cleans and reduce random writes to the SSD
  - Batch updates
- Maintain reasonable query times
  - The data structure must not incur unreasonable disk overhead
  - Nor should it require unreasonable memory constraints
Our approach
7

Our approach makes two key contributions:
- We optimize our designs for a counting hash table
  - This has not been done by previous approaches: (A. Anand '10), (D. Andersen '09), (B. Debnath '10), (D. Zeinalipour-Yazti '05)
- The primary hash table resides on the Flash Device
  - Many designs use the SSD as a cache for the HDD: (D. Andersen '09), (B. Debnath '10)
  - We anticipate data sets with high random access and throughput requirements
Hash Tables for Flash Devices must:
8

- Reduce erasures/cleans and reduce random writes to the SSD
  - Batch updates
  - Create an in-memory structure
  - Target semi-random updates or block-level updates
- Maintain reasonable query times
  - The data structure must not incur unreasonable disk overhead
  - Carefully index keys on disk
  - Nor should it require unreasonable memory constraints
  - The memory requirement is at most a fixed parameter
Memory Bounded (MB) Buffering
9
[Figure: example updates such as (64,2) and (12,7) hashed into RAM buffer buckets and flushed to disk buckets]
- Updates are hashed into a bucket in RAM and quickly combined in memory
- When the RAM buffer is full, updates are batched to the corresponding disk buckets
- If the disk buckets are full, the overflow region is invoked
Memory Bounded (MB) Buffering
10
- Two-way hash (a sketch of this flow follows this slide)
- On-disk closed hash table
  - Hashed at the page level
  - Updated at the block level
  - Linear probing for collisions
- In-memory open hash table
  - Hashed at the block level
  - Combines updates
  - Flushed with the merge() operation
- Overflow segment
  - Holds closed hash table excess
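A minimal sketch of the MB flow as I read it from this slide: an in-memory open hash table combines (key, count) updates grouped by destination block, and merge() flushes them into an on-disk closed (linear-probing) table with an overflow segment. The sizes, names, and list-based "disk" are illustrative assumptions, not the paper's implementation.

```python
TABLE_SLOTS = 32            # assumed size of the on-disk closed hash table
SLOTS_PER_BLOCK = 4         # assumed slots per flash block
MAX_PROBE = 8               # probe bound before spilling to the overflow segment

disk = [None] * TABLE_SLOTS # closed hash table: each slot holds (key, count) or None
overflow = {}               # overflow segment for closed-table excess
ram = {}                    # open hash table: block id -> {key: pending delta}

def home_slot(key):
    return hash(key) % TABLE_SLOTS

def buffer_update(key, delta=1):
    """Combine updates in memory, grouped by the block they will land in."""
    block = home_slot(key) // SLOTS_PER_BLOCK
    ram.setdefault(block, {})
    ram[block][key] = ram[block].get(key, 0) + delta

def merge(block):
    """Flush one block's buffered updates into the on-disk closed table."""
    for key, delta in ram.pop(block, {}).items():
        for probe in range(MAX_PROBE):                      # linear probing for collisions
            idx = (home_slot(key) + probe) % TABLE_SLOTS
            if disk[idx] is None or disk[idx][0] == key:
                disk[idx] = (key, (disk[idx][1] if disk[idx] else 0) + delta)
                break
        else:
            overflow[key] = overflow.get(key, 0) + delta     # closed-table excess

for word in ["cat", "dog", "cat"]:
    buffer_update(word)
for block in list(ram):      # in the real design merge() fires when the RAM buffer fills
    merge(block)
```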
Can we improve MB?
11

- Reduces the number of write operations to the flash device
  - Batch updates only when the memory buffer is full
  - Updates are semi-random
  - (Key, Value) changes are maintained in memory
- Query times are reasonable
  - Memory buffer search is fast
  - Relatively fast SSD random access and linear probing (see paper)
  - Prefetch pages
- MB has disadvantages
  - Sequential page-level operations are preferred
    - Fewer block updates
  - Limited by the amount of available memory
    - Think large disk datasets; updates may be numerous

Introduce an On Disk Buffer
12

- Batch updates from memory to disk are page level
  - Reduces expensive block-level writes (time and cleans)
  - Increases sequential writes
- Increases buffering capability
  - Reduces expensive non-semi-random block updates
  - May decrease cleans
- Search space increases during queries
  - Incurred only when inserting and reading concurrently
  - However, less erasure time will decrease latency
On Disk Buffering
13

- Change Segment (CS)
  - Sequential log structure
  - Sequential writes
- stage() operation
  - Flushes memory to the CS
  - Fast page-level operations
- merge() operation (a rough sketch of stage()/merge() follows this slide)
  - Invoked when the CS is full
  - Combines the CS with the Data Segment
  - Less frequent than stage()
- What is the structure of the CS?
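Here is a rough control-flow skeleton of how I read stage() and merge(), with the CS kept abstract since its layout is the subject of the next slides; the capacities and the list/dict-based segments are assumptions for illustration only.

```python
# stage() moves the RAM buffer into the Change Segment with cheap page-level
# appends; merge() folds the CS into the Data Segment only when the CS fills.

RAM_CAPACITY = 4      # buffered (key, delta) updates before stage() fires
CS_CAPACITY = 8       # staged updates before merge() fires

ram_buffer = {}       # key -> pending delta
change_segment = []   # sequential log of staged (key, delta) entries
data_segment = {}     # stands in for the on-flash primary hash table

def stage():
    """Append the RAM buffer to the CS as sequential page-level writes."""
    change_segment.extend(ram_buffer.items())
    ram_buffer.clear()
    if len(change_segment) >= CS_CAPACITY:
        merge()

def merge():
    """Combine the CS with the Data Segment (the expensive, rarer step)."""
    for key, delta in change_segment:
        data_segment[key] = data_segment.get(key, 0) + delta
    change_segment.clear()

def update(key, delta=1):
    ram_buffer[key] = ram_buffer.get(key, 0) + delta
    if len(ram_buffer) >= RAM_CAPACITY:
        stage()

for key in ["a", "b", "a", "c", "d", "e"]:
    update(key)
```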
Change Segment Structure v1
14
Buckets are assigned specific Change Segment Buckets.
Change Segment Buckets are shared by multiple RAM buffer buckets.
Memory Disk Bounded Buffer (MDB)
15



- Associate a CS block with k data blocks (toy sketch after this slide)
  - Semi-random writes
  - Only merge() full CS blocks
- Frequently updated blocks may incur numerous (k-1) merge() operations
- Query times incur an additional block read
  - The CS block may be packed with unwanted data
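A toy sketch of the MDB layout as I read it: the CS is partitioned so that each CS block buffers flushed pages for a fixed group of k data blocks, and only a full CS block triggers a merge of that group. The value of k, the capacities, and the dict-based "blocks" are illustrative assumptions.

```python
K = 4                      # data blocks served by one CS block (assumed)
CS_BLOCK_CAPACITY = 3      # pages a CS block holds before it must merge (assumed)

cs_blocks = {}             # CS block id -> list of (data_block, page_updates)
data_blocks = {}           # data block id -> {key: count}

def cs_block_for(data_block):
    return data_block // K

def stage(data_block, page_updates):
    """Write one flushed RAM page into the CS block covering this data block."""
    cs_id = cs_block_for(data_block)
    cs_blocks.setdefault(cs_id, []).append((data_block, page_updates))
    if len(cs_blocks[cs_id]) >= CS_BLOCK_CAPACITY:
        merge(cs_id)       # a hot group of data blocks can trigger this often

def merge(cs_id):
    """Fold a full CS block into its k data blocks (block-level work)."""
    for data_block, page_updates in cs_blocks.pop(cs_id):
        bucket = data_blocks.setdefault(data_block, {})
        for key, delta in page_updates.items():
            bucket[key] = bucket.get(key, 0) + delta

stage(5, {"cat": 2})
stage(6, {"dog": 1})       # blocks 5 and 6 share CS block 1 when K = 4
```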
Change Segment Structure v2
16
As buckets are flushed, they are written sequentially to the change segment, one page at a time.
MDB-L
17



- No partitions in the CS
  - Allows frequently updated blocks to have maximum space
- merge() all blocks when the CS is full
  - Potentially expensive
  - Very infrequent
- Queries are supported by pointers (sketch after this slide)
  - As blocks are staged onto the CS, their pages are recorded for later retrieval
  - Prefetch
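A sketch of the MDB-L idea as I read it: the CS is a single append-only log, and a small pointer table records which log pages hold updates for each data block so a query can fetch (or prefetch) only those pages. All names, capacities, and the dict-based storage are illustrative assumptions.

```python
CS_CAPACITY = 6            # log pages before a full merge of all blocks (assumed)

cs_log = []                # sequential log: each entry is (data_block, {key: delta})
page_pointers = {}         # data block -> list of log page indices
data_blocks = {}           # data block -> {key: count}

def stage(data_block, page_updates):
    """Append a flushed page to the log and record its position for queries."""
    cs_log.append((data_block, page_updates))
    page_pointers.setdefault(data_block, []).append(len(cs_log) - 1)
    if len(cs_log) >= CS_CAPACITY:
        merge_all()        # potentially expensive but very infrequent

def merge_all():
    """Fold the whole log into the data segment, then reset the pointers."""
    for data_block, page_updates in cs_log:
        bucket = data_blocks.setdefault(data_block, {})
        for key, delta in page_updates.items():
            bucket[key] = bucket.get(key, 0) + delta
    cs_log.clear()
    page_pointers.clear()

def query(data_block, key):
    """Combine the data-segment value with any pending log pages (prefetchable)."""
    total = data_blocks.get(data_block, {}).get(key, 0)
    for page in page_pointers.get(data_block, []):
        total += cs_log[page][1].get(key, 0)
    return total

stage(3, {"cat": 2})
print(query(3, "cat"))     # -> 2, served from the log via page_pointers
```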
Expectations
18

- MB will incur more cleans than MDB or MDB-L
  - Its frequent merge() operation incurs block erasures
- MDB and MDB-L will incur slightly higher query times
  - Due to the addition of the CS
- MDB and MDB-L will have superior I/O performance
  - Most operations are page level
  - Fewer erasures → lower latency
Experimental Setup (Application)
19

- TF-IDF
  - Term Frequency-Inverse Document Frequency
  - Word importance is highest for infrequent words
  - Requires a counting hash table (see the sketch below)
  - Useful in many data mining and IR applications (document classification and search)
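For concreteness, a toy TF-IDF pass built on counting hash tables (plain Python dicts standing in for the on-flash table). The log-based weighting and the tiny corpus are illustrative assumptions; the slides only state that TF-IDF needs key-to-frequency counts.

```python
import math
from collections import defaultdict

docs = {
    "d1": "flash devices lack moving parts".split(),
    "d2": "hash tables hash keys to values".split(),
}

doc_freq = defaultdict(int)    # term -> number of documents containing it
term_freq = defaultdict(int)   # (doc, term) -> occurrences in that document

for doc_id, words in docs.items():
    for w in set(words):
        doc_freq[w] += 1
    for w in words:
        term_freq[(doc_id, w)] += 1

def tf_idf(doc_id, term):
    tf = term_freq[(doc_id, term)]
    idf = math.log(len(docs) / doc_freq[term]) if doc_freq[term] else 0.0
    return tf * idf

print(round(tf_idf("d2", "hash"), 3))  # "hash" appears twice in d2 and only in d2
```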
Experimental Setup (Datasets)
20

- 100,000 random Wikipedia articles
  - 136M keywords
  - 9.7M entries
- MemeTracker (Aug 2009 dump)
  - 402M total entries
  - 17M unique entries
Experimental Setup (Method)
21

- 1M random queries were issued during the insertion phase
  - 10 random workloads; queries need not be in the table
  - We measure query performance, I/O time, and cleans
- We used three SSD configurations
  - One Single-Level Cell (SLC) vs. two Multi-Level Cell (MLC) configurations
  - MLC is more popular: cheaper per GB but shorter lifetime
  - SLC has a lower internal error rate and faster response times (see paper for specific configurations)
- DiskSim and the Microsoft SSD plugin
  - Used for benchmarking and fine-tuning our SSD
Results (AVERAGE Query Time)
22
By varying the in-memory buffer size, as a percentage of the data segment, the average query time is reduced only by fractions of a second. This suggests that the majority of the query time is incurred by the disk.
Results (AVERAGE Query Time)
23
By varying the on-disk buffer size, as a percentage of the data segment, the average query time decreases substantially for MDB-L. This reduction is seen in both datasets. MDB requires block reads in the CS.
Results (AVERAGE Query Time)
24
Using the Wiki dataset, we compared SLC with MLC configurations. We observe consistent performance.
Results (AVERAGE I/O)
25
In this experiment, we set the in-memory buffer to 5% and the CS to 12.5% of the primary hash table size.
Simulation time is highest for MB because of its block erasures (next slide).
MDB-L is faster than MDB because of the increased page-level operations.
Results (Cleans/Erasures)
26
Cleans are extremely low for both MDB and MDB-L relative to MB.
This is caused by the page-level sequential operations.
Queries are affected by cleans because the SSD must allocate resources to cleaning and moving data.
Discussion and Conclusion
27

- Flash Devices are gaining popularity
  - Low latency, high random read performance, low energy usage
  - Limited lifetime, poor random write performance
- Hash tables are useful data structures in many data mining and IR algorithms
  - They exhibit random write patterns
  - Challenging for Flash Devices
- We have demonstrated that a proper hash table for Flash Devices should have
  - An in-memory buffer for batched memory-to-disk updates
  - An on-disk buffer with page-level operations
Future work
28

- Our current designs rely on hash functions that use the mod operator
  - Extendible hashing
- Checkpoint methods for crash recovery
- Examine on a real SSD
  - DiskSim is great for fine-tuning and examining statistics
Questions?