Profiling Grid Data Transfer Protocols and Servers
George Kola, Tevfik Kosar and Miron Livny
University of Wisconsin-Madison, USA
Motivation
Scientific experiments are generating large amounts of data
Education research and commercial videos are not far behind
Data may be generated and stored at multiple sites
How to efficiently store and process this data?
Application   First Data   Data Volume (TB/yr)   Users
SDSS          1999         10                    100s
LIGO          2002         250                   100s
ATLAS/CMS     2005         5,000                 1000s
WCER          2004         500+                  100s
Source: GriPhyN Proposal, 2000
Motivation
The Grid enables large-scale computation
Problems:
Data-intensive applications have suboptimal performance
Scaling up creates problems: storage servers thrash and crash
Users want to reduce the failure rate and improve throughput
Profiling Protocols and Servers
Profiling is a first step
Enables us to understand how time is spent
Gives valuable insights
Helps:
computer architects add processor features
OS designers add OS features
middleware developers optimize the middleware
application designers design adaptive applications
Profiling
We (middleware designers) are aiming for automated tuning
Tune protocol parameters and concurrency level
The right settings depend on the dynamic state of the network and the storage server
We are developing low-overhead online analysis
Detailed offline + online analysis would enable automated tuning
Profiling
Requirements
Should not alter system characteristics
Full system profile
Low overhead
Used OProfile
Based on the Digital Continuous Profiling Infrastructure (DCPI)
Kernel profiling
No instrumentation
Low, tunable overhead
Profiling Setup
Two server machines
Moderate server: 1660 MHz Athlon XP CPU with 512 MB RAM
Powerful server: dual 2.4 GHz Pentium 4 Xeon CPUs with 1 GB RAM
Client machines were more powerful (dual Xeons) to isolate server performance
100 Mbps network connectivity
Linux kernel 2.4.20, GridFTP server 2.4.3, NeST prerelease
GridFTP Profile
[Chart: percentage of CPU time (0–45%) spent in Idle, Ethernet Driver, Interrupt Handling, Libc, Globus, OProfile, IDE, File I/O, and Rest of Kernel, for reads from and writes to the GridFTP server]
Read Rate = 6.45 MBPS, Write Rate = 7.83 MBPS => writes to the server are faster than reads from it
Writes to the network are more expensive than reads => interrupt coalescing
IDE reads are more expensive than writes
File system writes are costlier than reads => disk blocks need to be allocated
Writes incur more overhead because of the higher transfer rate
GridFTP Profile Summary
Writes to the network are more expensive than reads: interrupt coalescing and DMA would help
IDE reads are more expensive than writes: tuning the disk elevator algorithm would help
Writing to the file system is costlier than reading, since disk blocks must be allocated: a larger block size would help (see the preallocation sketch below)
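The block-allocation cost suggests reserving blocks before the transfer starts. A minimal C sketch (our illustration, not from the talk; the file name and size are placeholders) of a receiving server preallocating disk blocks so that write() calls do not pay for allocation inside the transfer path:

```c
/* Minimal sketch (not from the talk): reserve a file's disk blocks
 * up front so allocation is not paid for inside the write path.
 * "incoming.dat" and the size are illustrative placeholders. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    const off_t expected_size = 256L * 1024 * 1024; /* assumed transfer size */
    int fd = open("incoming.dat", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    /* posix_fallocate() returns an error number, not -1/errno. */
    int err = posix_fallocate(fd, 0, expected_size);
    if (err != 0) {
        fprintf(stderr, "posix_fallocate: %s\n", strerror(err));
        close(fd);
        return 1;
    }

    /* ... receive data from the network and write() it here ... */
    close(fd);
    return 0;
}
```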
NeST Profile
[Chart: percentage of CPU time (0–60%) spent in Idle, Ethernet Driver, Interrupt Handling, Libc, NeST, OProfile, IDE, File I/O, and Rest of Kernel, for reads from and writes to the NeST server]
Read Rate = 7.69 MBPS, Write Rate = 5.5 MBPS
Similar trend to GridFTP
Reads incur more overhead because of the higher transfer rate
Metadata updates (space allocation) make NeST writes more expensive
GridFTP versus NeST
GridFTP: Read Rate = 6.45 MBPS, Write Rate = 7.83 MBPS
NeST: Read Rate = 7.69 MBPS, Write Rate = 5.5 MBPS
GridFTP is 16% slower on reads
Its I/O block size is 1 MB (NeST's is 64 KB)
Disk I/O and network I/O do not overlap (a sketch of overlapping the two follows below)
NeST is 30% slower on writes
Lots (NeST's space reservation/allocation mechanism) add overhead
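A minimal sketch (our illustration under stated assumptions, not the GridFTP implementation) of overlapping disk and network I/O with two buffers and POSIX AIO, so the disk read of the next block proceeds while the current block is on the wire (link with -lrt on older glibc; a real implementation would also loop on short socket writes):

```c
/* Sketch (assumption, not GridFTP source): double-buffered overlap of
 * disk reads and socket writes using POSIX AIO. While buf[sending] is
 * going out on the network, the other buffer is filling from disk. */
#include <aio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define BLOCK (1024 * 1024)
static char buf[2][BLOCK];

int send_file(int file_fd, int sock_fd, off_t file_size)
{
    struct aiocb cb;
    off_t off = 0;
    int cur = 0;

    /* Prime the pipeline with the first disk read. */
    memset(&cb, 0, sizeof cb);
    cb.aio_fildes = file_fd;
    cb.aio_buf    = buf[cur];
    cb.aio_nbytes = BLOCK;
    cb.aio_offset = 0;
    if (aio_read(&cb) < 0)
        return -1;

    while (off < file_size) {
        /* Wait for the in-flight disk read to complete. */
        const struct aiocb *list[1] = { &cb };
        if (aio_suspend(list, 1, NULL) < 0)
            return -1;
        ssize_t got = aio_return(&cb);
        if (got <= 0)
            return -1;
        off += got;
        int sending = cur;
        cur = 1 - cur;

        /* Start reading the next block into the other buffer... */
        if (off < file_size) {
            memset(&cb, 0, sizeof cb);
            cb.aio_fildes = file_fd;
            cb.aio_buf    = buf[cur];
            cb.aio_nbytes = BLOCK;
            cb.aio_offset = off;
            if (aio_read(&cb) < 0)
                return -1;
        }
        /* ...while this block goes out over the network. */
        if (write(sock_fd, buf[sending], (size_t)got) != got)
            return -1;
    }
    return 0;
}
```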
Effect of Protocol Parameters
Different tunable parameters (see the sketch after this list):
I/O block size
TCP buffer size
Number of parallel streams
Number of concurrent transfers
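A hedged C sketch (our illustration, not GridFTP or NeST source code) of how a transfer tool might apply the first two tunables, the I/O block size and the TCP buffer size:

```c
/* Illustrative sketch (not GridFTP or NeST code): the I/O block size is
 * simply the buffer size the application reads and writes with; the TCP
 * buffer size caps the offered window, so it bounds throughput per RTT. */
#include <sys/socket.h>
#include <unistd.h>

enum { IO_BLOCK_SIZE = 64 * 1024 };  /* e.g. NeST's 64 KB block size */

/* Set the TCP send/receive buffer size on one data connection. */
int tune_socket(int sock, int tcp_buf_bytes)
{
    if (setsockopt(sock, SOL_SOCKET, SO_SNDBUF,
                   &tcp_buf_bytes, sizeof tcp_buf_bytes) < 0)
        return -1;
    if (setsockopt(sock, SOL_SOCKET, SO_RCVBUF,
                   &tcp_buf_bytes, sizeof tcp_buf_bytes) < 0)
        return -1;
    return 0;
}

/* Move one I/O block from disk to the network (or vice versa). */
ssize_t relay_block(int in_fd, int out_fd)
{
    char block[IO_BLOCK_SIZE];
    ssize_t got = read(in_fd, block, sizeof block);
    if (got > 0 && write(out_fd, block, (size_t)got) != got)
        return -1;
    return got;
}
```

Parallel streams then amount to opening several such sockets for one file, while concurrency means running several independent transfers at once.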
Read Transfer Rate
Server CPU Load on Read
Write Transfer Rate
Server CPU Load on Write
Transfer Rate and CPU Load
Server CPU Load and L2 DTLB misses
L2 DTLB Misses
Parallelism triggers the kernel to use a larger page size => lower DTLB miss rate
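The study's 2.4.20 kernel chose the larger page size on its own; on a modern Linux kernel an application can also request larger pages explicitly. A hedged sketch, assuming transparent huge page support (madvise(MADV_HUGEPAGE), which postdates the kernel used here):

```c
/* Hedged sketch (modern Linux, not the 2.4.20 kernel of the study):
 * hint that a large transfer buffer should be backed by huge pages,
 * lowering DTLB misses the way this slide describes. */
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

void *alloc_transfer_buffer(size_t len)
{
    void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return NULL;

    /* Advisory only: the kernel may or may not use huge pages. */
    madvise(buf, len, MADV_HUGEPAGE);
    return buf;
}
```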
Profiles on the Powerful Server
The next set of graphs was obtained using the powerful server
Parallel Streams versus Concurrency
Effect of File Size (Local Area)
Transfer Rate versus Parallelism in Short-Latency (10 ms) Wide Area
Server CPU Utilization
Conclusion
A full-system profile gives valuable insights
A larger I/O block size may lower the transfer rate
Parallelism may reduce CPU load
Network and disk I/O are not overlapped
Parallelism may cause the kernel to use a larger page size
Processor features for variable-sized pages would be useful
Operating system support for variable page sizes would be useful
Concurrency improves throughput at the cost of increased server load
Questions
Contact
[email protected]
www.cs.wisc.edu/condor/publications.html