Load Balancing in File Systems
Nadine Amsel
Dr. Carlos Maltzahn
Storage Systems Research Center (SSRC) at UCSC
http://ssrc.cse.ucsc.edu
Introduction
• A new breed of distributed, petabyte-scale file systems uses many Object Storage Devices (OSDs)
• Search in such file systems requires OSDs to store large indices and cope with ever-changing hot spots due to a diverse query stream
• What is the extent of query hot spots? How long do they persist?
Methods
• Time-stamped queries by 500,000 AOL users over 3 months used to determine overload patterns
• Each term in a query maps to one OSD (i.e. assuming a term-distributed index)
• Two questions to answer:
  1. How many OSDs are overloaded?
  2. How long does an OSD stay overloaded?
• OSD address determined by taking the hash of the term and extracting the last n bits (where n is determined by the number of OSDs)
• An OSD’s load is determined by the number of queries it receives per minute
• Query traces analyzed using different numbers of OSDs and overload thresholds:
  • 128, 1K, and 64K OSDs
  • 10, 30, and 50 queries/minute overload thresholds
Results
What is the length of each period of overload time?
Most overload periods last only a few minutes. The distribution of period lengths follows a heavy-tailed power law so the variance is infinite (there is no stable average).
The median overload length is ~4 minutes for 128 OSDs and ~2 minutes for 1K OSDs. In 99% of all cases, the overload period lasts no longer than an hour.
Will more hardware prevent overload?
Overload occurs all the time. Just one overloaded OSD can slow down the whole storage system.
OSDs      128    1K   64K
Median     18     3     2
Mean       15     2     1
The query workload leads to overload even if distributed
over a large number of nodes. Increasing the number of
nodes is not a solution.
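The term-to-OSD mapping and the per-minute load definition from Methods can be sketched as follows. This is a minimal illustration, not the code used for the poster: the hash function (MD5 here), the whitespace tokenization, counting a query once per distinct OSD it touches, and treating a load at or above the threshold as overload are all assumptions, and the function names are hypothetical.

```python
import hashlib
from collections import Counter


def osd_for_term(term: str, num_osds: int) -> int:
    """Map a term to an OSD address by keeping the last n bits of its hash."""
    if num_osds < 1 or num_osds & (num_osds - 1) != 0:
        raise ValueError("num_osds must be a power of two")
    n_bits = num_osds.bit_length() - 1  # n = log2(num_osds)
    # Assumption: MD5 stands in for the unspecified hash function.
    digest = hashlib.md5(term.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") & ((1 << n_bits) - 1)


def loads_per_minute(queries, num_osds):
    """Count queries per (minute, OSD).

    queries: iterable of (timestamp_in_seconds, query_text) pairs.
    Assumption: a query counts once for each distinct OSD its terms map to.
    """
    loads = Counter()
    for timestamp, text in queries:
        minute = int(timestamp // 60)
        osds = {osd_for_term(term, num_osds) for term in text.lower().split()}
        for osd in osds:
            loads[(minute, osd)] += 1
    return loads


def overloaded(loads, threshold):
    """Return {minute: set of OSDs whose load is at or above the threshold}."""
    result = {}
    for (minute, osd), count in loads.items():
        if count >= threshold:
            result.setdefault(minute, set()).add(osd)
    return result


if __name__ == "__main__":
    # Made-up sample queries, for illustration only.
    sample = [(0, "load balancing"), (15, "load"), (75, "file systems")]
    loads = loads_per_minute(sample, num_osds=128)
    for minute, osds in sorted(overloaded(loads, threshold=2).items()):
        print(minute, sorted(osds))
```

Because the address is the low n bits of one hash value, a term's address among 128 OSDs is the low 7 bits of its address among 1K OSDs, so adding OSDs splits each OSD's term set but cannot split the load of a single popular term.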
Conclusion
• Index query workloads cannot be effectively addressed
by increasing the number of OSDs.
• A load-balancing mechanism needs to adapt on a minute-by-minute basis; any mechanism that takes longer than an hour to adapt will not be able to keep up with 99% of the workload changes.
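The period-length statistics behind these conclusions (a median of a few minutes, 99% of periods within an hour) can be computed from the minutes in which one OSD is overloaded. The sketch below assumes an overload period is a run of consecutive overloaded minutes; the function names and the sample data are hypothetical.

```python
from statistics import median


def overload_periods(overloaded_minutes):
    """Return the lengths (in minutes) of runs of consecutive overloaded minutes."""
    lengths = []
    run = 0
    prev = None
    for minute in sorted(set(overloaded_minutes)):
        if prev is not None and minute == prev + 1:
            run += 1  # the current period continues
        else:
            if run:
                lengths.append(run)  # close the previous period
            run = 1  # start a new period
        prev = minute
    if run:
        lengths.append(run)
    return lengths


def fraction_within(lengths, limit):
    """Fraction of periods that last no longer than `limit` minutes."""
    if not lengths:
        raise ValueError("no overload periods to summarize")
    return sum(1 for length in lengths if length <= limit) / len(lengths)


if __name__ == "__main__":
    # Made-up minute indices at which a single OSD was overloaded.
    minutes = [1, 2, 3, 7, 10, 11]
    lengths = overload_periods(minutes)
    print(lengths, median(lengths), fraction_within(lengths, 60))
```

The median and the fraction within a limit are used here instead of the mean because, for a heavy-tailed distribution, a few very long periods can dominate the mean.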
This work was completed as part of UCSC's SURF-IT summer undergraduate research program, an NSF CISE REU Site. This material is based upon work supported by the National Science Foundation under Grant No. CCF-0552688.