Transcript: EuroSys 2009
Memory Resource Allocation for File
System Prefetching
-- From a Supply Chain Management Perspective
Zhe Zhang (NCSU), Amit Kulkarni (NCSU)
Xiaosong Ma (NCSU/ORNL), Yuanyuan Zhou (UIUC)
7/18/2015
Aggressive prefetching: an idea
whose time has come*
The widening processor-I/O gap
Processing power doubling every 18 to 24 months
Disparity between growth of disk latency and throughput
Latency improving 10% per year while throughput improving
40% per year [Hennessy 03]
Large memory cache sizes
Usually 0.05% ~ 0.2% of storage capacity [Hsu 04]
* [Papathanasiou 05]
… and whose challenges follow
Systems facing a large number of concurrent requests (e.g., Facebook)
Servers handling a large number of clients (e.g., Jaguar @ Oak Ridge National Lab: 11,000 compute nodes, Lustre, 72 I/O nodes, 18 DDN S2A9500 couplets)
How to manage file systems’ memory resource for aggressive prefetching?
All streams are not created equal
MP3 : 128 kbps
Youtube : 200 kbps
Youtube HQ : 900 kbps
Allocating memory resource according to access rate?
Related work
Access pattern detection: rate not detected [Lee 87, Li 04, Soundararajan 08]
Aggressiveness control: based on sequentiality [Patterson 95, Kaplan 02, Li 05]
Multi-stream prefetching: rate not sufficiently utilized [Cao 96, Tomkins 97, Gill 07]
Similar story in grocery stores!
……
Milk : 200 per day
……
Beer : 80 per day
$300 Wine : 1 per year
Allocating storage resource according to consumption rate?
Studied in Supply Chain Management (SCM)
Demand rate measurement/analysis/prediction
Dates back to the earliest wars
Yet still an active field
Wal-Mart: $24M on satellite network for instant inventory control
Dell: aiming at “zero inventory”
Our contributions
A mapping between data prefetching and SCM problems
Novel rate-aware multi-stream prefetching techniques
based on SCM heuristics
Implementation and performance evaluation
Modified Linux 2.6.18 kernel
Extensive experiments with a modern server and multiple workloads
Coordinated multi-level prefetching
Based on multi-echelon inventory control
Extending application access pattern to lower level
Evaluation with combinations of state-of-the-art single level
algorithms
Outline
Motivation
Background and problem mapping
Algorithms
Performance evaluation
Conclusions
Background – Inventory cycles
Inventory theory
Task: manage inventory for goods
Goal: satisfy customer demands
[Figure: inventory level over time. The cycle inventory, replenished by a fixed order quantity, is drawn down at the demand rate; when the level falls to the reorder point, a new order is placed and arrives after the lead time. Safety inventory absorbs faster-than-average demand; slow demand leaves inventory idle.]
Background – Prefetching basics
[Figure: blocks are prefetched from disk into the memory cache; the trigger distance and the prefetch degree control when, and how much, to prefetch.]
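As a minimal sketch of the interplay between the two knobs (hypothetical names, not the actual kernel readahead code):

```python
def maybe_prefetch(blocks_ahead, trigger_distance, prefetch_degree):
    """Return how many blocks to prefetch from disk.

    A new prefetch of `prefetch_degree` blocks is issued once the
    number of not-yet-consumed prefetched blocks drops to the
    trigger distance.
    """
    if blocks_ahead <= trigger_distance:
        return prefetch_degree
    return 0
```

With trigger distance 4 and degree 8, a refill of 8 blocks is issued as soon as 4 or fewer prefetched blocks remain unread.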
Background – Prefetching cycles
Prefetching techniques:
Task: manage the cache for data blocks
Goal: satisfy application requests
[Figure: the inventory cycle recast for prefetching. Order quantity ↔ prefetch degree; reorder point ↔ trigger distance; lead time ↔ disk access time. Prefetched blocks consist of cycle inventory Tc and safety inventory Ts, consumed at the application’s demand rate (fast, average, or slow).]
Challenges in mapping
Data requests ↔ customer demands
Data blocks are unique, but detected streams give a “linear sequence of blocks”:
GroceryStore::getMilk();
FileSystem::getNextBlock();
FileSystem::getBlock(Position p);
Prefetched data blocks ↔ inventory
Accessed data blocks remain in the cache, but as “second class citizens” [Gill 05, Li 05]
Outline
Motivation
Background and problem mapping
Algorithms
Performance evaluation
Conclusions
Performance metrics and objectives
Prefetching optimization objective: improve cache hit rate
Dynamically adjust
Trigger distance
Prefetch degree
SCM optimization objective: improve fill rate
Fraction of demand satisfied from inventory
FR = 1 - ESC / Q
ESC: Expected Shortage per Cycle
Q: order quantity
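The fill-rate formula translates directly into code (a sketch with hypothetical names):

```python
def fill_rate(esc, order_quantity):
    """Fill rate FR = 1 - ESC / Q: the fraction of demand satisfied
    from inventory, where ESC is the expected shortage per cycle
    and Q is the order quantity."""
    return 1.0 - esc / order_quantity
```

For example, an expected shortage of 2 items per cycle with an order quantity of 100 gives a fill rate of 0.98.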
Rate aware prefetching algorithms
[Figure: the prefetching cycle annotated with Tc (cycle inventory) and Ts (safety inventory), with the trigger distance, reorder point, and fast, average, and slow demand slopes.]
Task: calculating Tc and Ts
Tc: lead time × average consumption rate
Ts: based on an estimate of demand uncertainty
Algorithm1: Equal Time Supplies (ETS)
Safety inventory for all goods set to the same time supply
(e.g., amount of goods consumed in 5 days)
With “standard” distribution shapes, uncertainty is
proportional to the mean value
Ts: set to be proportional to average data access rate
Ti = Ttotal × Ri / Σj Rj (summing over all n streams)
Ti: trigger distance of stream i
Ri: average rate of stream i
Ttotal: total allowed trigger distance
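ETS as a sketch (hypothetical names; per-stream trigger distances set proportional to average access rates, per the formula above):

```python
def ets_trigger_distances(rates, total_trigger):
    """Equal Time Supplies: split the total allowed trigger distance
    among streams in proportion to each stream's average access
    rate R_i."""
    total_rate = sum(rates)
    return [total_trigger * r / total_rate for r in rates]
```

For instance, a 1000-pages/s stream and a 3000-pages/s stream sharing a budget of 100 pages get trigger distances of 25 and 75.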
Algorithm2: Equal Safety Factors (ESF)
Safety inventory set to maintain the same safety factor
across all goods
safety_factor = safety_inventory / standard_deviation
Ts: set to be proportional to standard deviation of access
rate
Ti = TiC + TiS
TiC = Ri × lead_time (cycle inventory)
TiS = (Ttotal − Σj TjC) × σi / Σj σj (safety inventory, summing over all n streams)
σi: standard deviation of stream i’s access rate
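One plausible reading of the ESF allocation as code (a sketch, not the paper’s exact implementation): each stream first gets its cycle inventory Ri × lead_time, and the leftover budget is split in proportion to the streams’ standard deviations, so that every stream ends up with the same safety factor.

```python
def esf_trigger_distances(rates, stddevs, lead_time, total_trigger):
    """Equal Safety Factors: cycle part covers demand during the
    lead time; the remaining budget is shared proportionally to
    each stream's standard deviation."""
    cycle = [r * lead_time for r in rates]
    remaining = total_trigger - sum(cycle)
    total_sd = sum(stddevs)
    return [c + remaining * sd / total_sd
            for c, sd in zip(cycle, stddevs)]
```

Two streams at 10 pages/s with standard deviations 1 and 3, a lead time of 2 s, and a 60-page budget get trigger distances of 25 and 35: the less predictable stream receives the larger safety share.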
Implementation challenges
Measurement and calculation overhead
Limited floating-point support in the kernel
Outline
Motivation
Background and problem mapping
Algorithms
Performance evaluation
Conclusions
Comparing with Linux native prefetching
Linux prefetching algorithm (kernel 2.6.18)
Trigger distance (T) = Prefetch degree (P)
Doubling T and P for each sequential hit
Upper bounds:
T = P = 32 (pages)
32-32
Implementation of SCM-based algorithms
Principle: maintaining the same memory consumption as the original algorithm
memory_consumption = [T + (T + P)] / 2 = T + P / 2
Default parameters (denoted 24-48): Tdefault = 24, Pdefault = 48
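The native ramp-up and the memory-consumption argument can be sketched as follows (the initial window of 4 pages is an assumption, not stated in the slides):

```python
def linux_ramp_up(sequential_hits, cap=32, initial=4):
    """Double the trigger distance (= prefetch degree) on every
    sequential hit, up to the 32-page upper bound."""
    t = initial
    for _ in range(sequential_hits):
        t = min(t * 2, cap)
    return t

def avg_memory_consumption(t, p):
    """The cache footprint varies between T and T + P pages over a
    cycle, so the average is [T + (T + P)] / 2 = T + P / 2."""
    return t + p / 2
```

Both configurations consume the same memory on average: avg_memory_consumption(32, 32) and avg_memory_consumption(24, 48) both equal 48 pages.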
Experimental setup
Platform
Linux server
2.33GHz quad-core CPU, 16GB memory
Comparing 32-32, 24-48, ETS and ESF algorithms
Workloads
Synthetic benchmarks
Linux file transfer applications
HTTP web server workload
Server benchmarks
SPC2-VOD-like (sequential)
TPC-H (random)
Two streams with different rates
Rate of stream 1 fixed at 1000 pages / second
Rate of stream 2 varied between 3000 and 7000 pages / second
[Figure: average response time (μs) and number of cache misses per prefetch cycle (ESC) for 32-32, 24-48, and ETS, at fast-stream rates of 3000, 5000, and 7000 pages/second.]
Average response time: ETS achieves a 19%~25% improvement over 32-32
Misses per cycle: ETS runs the same number of cycles as 24-48 with ESC similar to 32-32
Two streams with different deviations
SD of stream 1 fixed at square root of rate
SD of stream 2 varied between 3 and 7 times the average rate
[Figure: average response time (μs) for ETS vs. ESF, and per-stream response times (stable and unstable streams under ETS and ESF), at SDs of 3x, 5x, and 7x the mean.]
Average response time: ESF achieves a 20%~35% improvement over ETS
Per-stream response time: ESF yields a large improvement for the unstable stream at a small degradation for the stable stream
Throughput of server benchmarks
SPC2-VOD-like (sequential streams)
TPC-H (random accesses)
[Figure: aggregate throughput (pages/second) and memory usage (unaccessed prefetched data, in pages) for 32-32 vs. ETS with 8 to 32 sequential streams, plus TPC-H throughput for the sequential + random mix.]
Sequential throughput and memory consumption: ETS achieves a 6%~53% improvement over 32-32
Random application (TPC-H) throughput: ETS is never worse than 32-32, with a 2.5% average improvement
Conclusions and future work
Observations
File blocks can be managed as apples!
Simple approaches such as ETS seem to perform well
Future work
Awareness of both access rate and delivery time
Adjusting the prefetch degree
Acknowledgements
Anonymous reviewers
Our shepherd: George Candea
Our sponsors: NSF and DOE Office of Science