SATNAC presentation - University of Cape Town

ANALYZING STORAGE SYSTEM WORKLOADS
Paul G. Sikalinda, Pieter S. Kritzinger
{psikalin, psk}@cs.uct.ac.za,
DNA Research Group
Computer Science Department
University of Cape Town,
and Lourens O. Walters.
Mosaic Software
Rondebosch
Cape Town
Republic of South Africa.
Presentation Outline
Introduction
Motivation and Objectives
Storage Systems
Storage System Workloads
The Storage System Workload Analyzed
Statistical Methodology
Workload Analysis Results
Conclusions
Future Work
Introduction
The DNA Group specializes, among other things, in using
theory, formal methods and software tools in the:
– specification of …
– design of …
– modelling of …
– building of …
– security of …
– *workload analysis of …
– correctness analysis of …
– performance analysis of …
concurrent computing systems (CCS).
Introduction (cont’d)
ANALYZING STORAGE SYSTEM
WORKLOADS
Introduction (cont’d)
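[Figure: PROCESSOR sending requests (RQ) to, and receiving replies (RP) from, the storage system whose workload is analyzed]
Each I/O request is characterized by:
•Start Address
•Operation Type
•Request Size
•Timestamps
•Etc.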
Motivation and Objectives
Much effort is being spent on improving the I/O subsystem because it is a bottleneck in current computer systems.
-Workload modelling is an important component in the design, performance evaluation and correctness evaluation of storage systems.
Common assumptions that are not correct:
-Uniform distribution of start addresses,
-Exponential inter-arrival times.
Storage system workloads should therefore be analyzed to derive correct models.
Motivation and Objectives (cont’d)
-Designing storage systems.
-Designing I/O optimization techniques (read
caching, write caching, pre-fetching, I/O
parallelism, I/O rescheduling) to improve
performance.
-Understanding application behavior and
requirements.
-Deciding to pool storage system resources
(SSPs).
-Implementing intelligent storage systems.
etc.
Motivation and Objectives (cont’d)
Our aim was to analyze storage
system workloads in terms of
(a) inter-arrival times,
(b) sizes and
(c) “seek distances” of I/O requests
and provide statistics for these
parameters to be used to:
(a) derive models for storage system
evaluation and
(b) design optimization techniques
(read caching, I/O parallelism, etc.).
Storage Systems
Enterprise Storage System (ESS)
[Figure: ESS data path: Path to host → Host/Bus adapter → Path to cache → Cache → Path to controller → Array controller → Path to disks → Disk drives]
Storage Systems (cont’d)
ESS are powerful disk storage systems with the
following capabilities:
-High performance*,
-Large capacity and high availability,
-Protection against physical drive failure, which can be
provided using RAID methods.
*However, they still cannot match processor speeds
because of the mechanical processes in the disk
drives.
Storage System Workloads
I/O Request Servicing and workload
classification:
-Logical Workloads (File System Workloads)
-Storage System Workloads (Physical I/O Traffic)
[Figure: Application Software → I/O request → Operating System / File System → I/O request → Disk System]
Storage System Workloads (cont’d)
Workload Parameters:
-Logical Volume Number
-*Start Address (seek distances)
-*Request Size
-Operation Type (i.e., read or write)
-*Time Stamp (inter-arrival times)
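The starred parameters are all derived from a per-request record. The sketch below is a minimal illustration that assumes, hypothetically, a comma-separated trace line of the form `ASU,LBA,Size,Opcode,Timestamp` (logical volume number, start address in blocks, size in bytes, `r`/`w`, timestamp in seconds). The slides do not describe the trace layout, so this must be checked against the actual SPC files. The helpers `IORequest`, `parse_line`, `inter_arrival_times_us` and `seek_distances` are illustrative names, and "seek distance" is taken, as an assumption, to be the signed difference between the start addresses of consecutive requests.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class IORequest:
    """One trace record (the field layout is an assumption, see text)."""
    volume: int        # logical volume number (ASU)
    start: int         # start address, in blocks (assumed unit)
    size: int          # request size, in bytes
    op: str            # operation type: "r" (read) or "w" (write)
    timestamp: float   # arrival time, in seconds (assumed unit)


def parse_line(line: str) -> IORequest:
    """Parse a hypothetical 'ASU,LBA,Size,Opcode,Timestamp' line."""
    volume, start, size, op, ts = (field.strip() for field in line.split(","))
    return IORequest(int(volume), int(start), int(size), op.lower(), float(ts))


def inter_arrival_times_us(reqs: list[IORequest]) -> list[float]:
    """Gaps between consecutive arrivals, converted to microseconds."""
    return [(b.timestamp - a.timestamp) * 1e6 for a, b in zip(reqs, reqs[1:])]


def seek_distances(reqs: list[IORequest]) -> list[int]:
    """Signed difference between consecutive start addresses (assumed definition)."""
    return [b.start - a.start for a, b in zip(reqs, reqs[1:])]


if __name__ == "__main__":
    trace = [
        "0,1000,8192,r,0.000000",
        "0,1016,8192,r,0.000250",
        "1,500,16384,w,0.002000",
    ]
    reqs = [parse_line(t) for t in trace]
    print(seek_distances(reqs))                               # [16, -516]
    print([round(x) for x in inter_arrival_times_us(reqs)])   # [250, 1750]
```

With n requests, both derived series have n - 1 entries, which is consistent with the sample sizes in the results (1055449 request sizes, 1055448 inter-arrival times and seek distances).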
The Storage System Workload Analyzed
We analyzed the inter-arrival times, request
sizes and “seek distances” of I/O requests
from a system running a web search
engine.
The I/O trace files were obtained from the Storage Performance
Council (SPC).
(http://www.storageperformance.org)
Statistical Methodology
-Visual Techniques:
-Histogram and
-ECDF graphs.
-Key Data Statistics
-Sample mean,
-Variance and standard deviation,
-Coefficient of skew, kurtosis, and variation,
-Five number data summaries (minimum, lower
quartile, median, upper quartile, maximum).
-Lower and upper outlier limits
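The key data statistics listed above can be computed in a few lines. The sketch below is a minimal illustration using Python's standard `statistics` module. `key_statistics` is a hypothetical helper, and two choices in it are assumptions because the slides name neither: moment-based skew and kurtosis estimators, and outlier fences of the form Q1 - k·IQR and Q3 + k·IQR with a default multiplier k = 1.5 (Tukey's convention).

```python
import math
import statistics


def key_statistics(data, k=1.5):
    """Summary statistics of the kind listed on the methodology slide.

    Skew and kurtosis use moment estimators (m3 / m2**1.5 and m4 / m2**2);
    the outlier limits are Q1 - k*IQR and Q3 + k*IQR. Both are assumptions.
    """
    n = len(data)
    mean = statistics.fmean(data)
    var = statistics.variance(data)               # sample variance (divides by n - 1)
    sd = math.sqrt(var)
    m2 = sum((x - mean) ** 2 for x in data) / n   # central moments (divide by n)
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    return {
        "n": n,
        "five_number": (min(data), q1, median, q3, max(data)),
        "mean": mean,
        "variance": var,
        "sd": sd,
        "cv": sd / mean,                    # coefficient of variation
        "skew": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,           # plain (not excess) kurtosis
        "lower_outlier": q1 - k * iqr,
        "upper_outlier": q3 + k * iqr,
    }
```

For the full traces (about a million values per parameter) this pure-Python version is slow but adequate; the point is only to make the definitions concrete.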
Results 1: inter-arrival times (µs)
Sample Size: 1055448
Five Number Summary: (126, 242, 1695, 4487, 100100)
Sample Mean: 2985.761
Sample Variance: 12508927
Standard Deviation: 3536.796
Coefficient of Variation: 1.184554
Coefficient of Skew: 2.142186
Coefficient of Kurtosis: 8.884555
Upper Outlier Limit: 26142
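Two rows of this table follow from the others: the standard deviation is the square root of the sample variance, and the coefficient of variation is the standard deviation divided by the mean. A small arithmetic check against the printed values (which are rounded, so only approximate agreement is expected):

```python
import math

# Values as printed on the inter-arrival slide (microseconds).
mean = 2985.761
variance = 12508927
sd_reported = 3536.796
cv_reported = 1.184554

sd = math.sqrt(variance)   # standard deviation from the sample variance
cv = sd / mean             # coefficient of variation = sd / mean

print(f"sd = {sd:.3f}, cv = {cv:.6f}")
```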
Results 1: inter-arrival times
-Highly variable data: values range from 126 to 100100
microseconds.
-The coefficient of kurtosis shows that the
distribution is heavy-tailed.
Results 2: Request sizes (bytes)
Sample Size: 1055449
Five Number Summary: (512, 8192, 8192, 24580, 1138000)
Sample Mean: 15510
Sample Variance: 102017528
Standard Deviation: 10100.37
Coefficient of Variation: 0.6512577
Coefficient of Skew: 3.441212
Coefficient of Kurtosis: 287.6503
Upper Outlier Limit: 106520
Results 2: Request sizes
Distribution peaks (bytes): 8192 (60%), 16384 (10%), 24576
(9%) and 32768 (20%).
Reason: the OS file system block size is 8192 bytes,
and every peak is a multiple of it (1, 2, 3 and 4 blocks).
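Because request sizes cluster on a few discrete values, an empirical probability mass function is a natural way to summarize them. The sketch below tallies sizes and expresses each one in 8192-byte file system blocks. `empirical_pmf` and `blocks_per_request` are hypothetical helpers, and the sample list is invented for illustration; it is not trace data.

```python
from collections import Counter

FS_BLOCK = 8192  # OS file system block size reported on the slide (bytes)


def empirical_pmf(sizes):
    """Map each distinct request size to its relative frequency."""
    counts = Counter(sizes)
    total = len(sizes)
    return {size: counts[size] / total for size in sorted(counts)}


def blocks_per_request(size, block=FS_BLOCK):
    """Number of whole file system blocks needed to hold a request."""
    return -(-size // block)  # ceiling division on integers


if __name__ == "__main__":
    sample = [8192] * 6 + [16384] + [24576] + [32768] * 2  # illustrative only
    for size, p in empirical_pmf(sample).items():
        print(size, p, blocks_per_request(size))
```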
Results 3: Seek distances (blocks)
Sample Size: 1055448
Five Number Summary: (-34926160, -8581248, 6.4, 8580496, 34910700)
Sample Mean: 27.95
Sample Variance: 170691900000000
Standard Deviation: 13064910
Coefficient of Skew: 0
Coefficient of Variation: 467398.8
Upper Outlier Limit: 51482656
Lower Outlier Limit: -51482528
Results 3: Seek distances
-The distribution of seek
distances is symmetrical (coefficient of skew 0,
and the quartiles lie almost equally far below
and above the median).
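A zero coefficient of skew is one indication of symmetry. Another, which does not depend on the third moment, is to compare how far matching lower and upper quantiles lie from the median. The sketch below is a generic check, not the authors' procedure; `quantile_symmetry` is a hypothetical helper.

```python
import statistics


def quantile_symmetry(data, n=20):
    """Pair distances below and above the median at matching quantiles.

    Uses n-quantiles (n - 1 cut points). For a symmetric distribution each
    (lower, upper) pair should be roughly equal; a skewed one makes them differ.
    """
    cuts = statistics.quantiles(data, n=n, method="inclusive")
    med = statistics.median(data)
    pairs = []
    for i in range(len(cuts) // 2):
        lower = med - cuts[i]        # distance from the (i+1)-th cut up to the median
        upper = cuts[-1 - i] - med   # distance from the median up to the mirror cut
        pairs.append((lower, upper))
    return pairs
```

Applied by hand to the quartiles on the slide, the distances from the median are 6.4 - (-8581248) = 8581254.4 and 8580496 - 6.4 = 8580489.6 blocks, which agree to within about 0.01%.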
Conclusions
(1) Analyzing storage system workloads is
necessary to properly model the workloads:
-To model Web inter-arrival times, the Weibull, lognormal,
beta, gamma and exponential probability density functions
should be considered.
-To model Web data sizes and seek distances, a
probability mass function is more appropriate.
*We intend to use the models in simulations of ESS.
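As a first step toward comparing these candidates, each can be given parameters that reproduce the sample mean and variance (the method of moments). This is a standard textbook technique, not necessarily the one the authors used, and matching two moments is no substitute for a goodness-of-fit test. Weibull and beta are omitted here: the Weibull shape needs a numerical solve, and the beta needs a bounded support. The sketch uses the inter-arrival statistics reported in Results 1.

```python
import math

mean = 2985.761        # sample mean of inter-arrival times (microseconds)
var = 12508927         # sample variance (microseconds squared)

# Exponential: one rate parameter, mean = 1 / rate.
exp_rate = 1 / mean

# Gamma(shape k, scale theta): mean = k * theta, variance = k * theta**2.
gamma_shape = mean ** 2 / var
gamma_scale = var / mean

# Lognormal(mu, sigma): mean = exp(mu + sigma**2 / 2),
# variance = (exp(sigma**2) - 1) * exp(2 * mu + sigma**2).
sigma2 = math.log(1 + var / mean ** 2)
mu = math.log(mean) - sigma2 / 2

print(f"exponential rate = {exp_rate:.3e} per microsecond")
print(f"gamma shape = {gamma_shape:.4f}, scale = {gamma_scale:.1f}")
print(f"lognormal mu = {mu:.4f}, sigma = {math.sqrt(sigma2):.4f}")
```

An exponential fit forces the coefficient of variation to 1, whereas the sample value is about 1.18; the fitted gamma shape below 1 reflects the same extra variability.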
Conclusions (cont’d)
(2) The analysis results are useful when designing optimization
techniques for storage systems, e.g.:
-Cache management block size: 8192 bytes.
-I/O rescheduling and background tasking would be ideal for the
workload.
-The storage system handling the workload we analyzed can be
optimized for its symmetrical seek-distance behavior*.
*The results are not broadly applicable.
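With an 8192-byte cache management block, a request occupies every cache block that overlaps its byte range, so a misaligned request can touch one more block than its size alone suggests. The sketch below assumes, hypothetically, that start addresses count 512-byte sectors (the slides do not state the address unit); `cache_blocks` is an illustrative helper, not part of any storage system API.

```python
SECTOR = 512        # assumed size of one address unit (bytes); not stated on the slides
CACHE_BLOCK = 8192  # cache management block size suggested by the analysis (bytes)


def cache_blocks(start_sector, size, sector=SECTOR, block=CACHE_BLOCK):
    """Return the indices of the cache blocks overlapped by one request."""
    if size <= 0:
        return range(0)
    first_byte = start_sector * sector
    last_byte = first_byte + size - 1
    return range(first_byte // block, last_byte // block + 1)


if __name__ == "__main__":
    print(list(cache_blocks(0, 8192)))   # aligned 8 KiB request: one block
    print(list(cache_blocks(8, 8192)))   # starts mid-block: two blocks
```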
Conclusions (cont’d)
(3) Other conclusions:
-Request sizes are influenced by the file system in use.
-Seek distances are not always uniformly distributed.
*In summary, we have provided statistics about the
parameters of the storage system workload that we
analyzed and have shown how they can be used to derive
models and design I/O optimization techniques.
Future Work
-Rigorously find a probability density function
matching a given data set of inter-arrival times.
-Analyze the storage system workloads in terms
of other parameters (e.g., logical volume
numbers and operation types).
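One common way to make the first item rigorous is a goodness-of-fit statistic comparing the ECDF (already used in the analysis) with a candidate distribution function. The sketch below computes the Kolmogorov-Smirnov distance D against an exponential CDF whose rate is the maximum-likelihood estimate (one over the sample mean). It is a generic illustration, not the authors' planned method; `ks_distance` and `exponential_cdf` are hypothetical helpers, the sample values are invented, and critical values for D when parameters are estimated from the data (for example the Lilliefors correction) are not covered.

```python
import math


def ks_distance(data, cdf):
    """Kolmogorov-Smirnov statistic: the largest gap between the ECDF and cdf.

    The ECDF jumps by 1/n at each sorted observation, so the gap is checked
    just before and just after every jump.
    """
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def exponential_cdf(rate):
    """CDF of an exponential distribution with the given rate."""
    return lambda x: 1 - math.exp(-rate * x) if x > 0 else 0.0


if __name__ == "__main__":
    sample = [120.0, 250.0, 900.0, 1700.0, 4400.0]   # illustrative values only
    rate = len(sample) / sum(sample)                  # maximum-likelihood rate
    print(round(ks_distance(sample, exponential_cdf(rate)), 4))
```

The same function works for any candidate from the conclusions, given its CDF; the smallest D indicates the closest fit, subject to the caveat about estimated parameters.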
THANK YOU FOR YOUR ATTENTION!
?