Black Box Methods for Inferring Parallel Applications` Properties in

Download Report

Transcript Black Box Methods for Inferring Parallel Applications` Properties in

PhD Final Talk
Black Box Methods for Inferring Parallel
Applications' Properties in Virtual
Environments
Committee:
Prof. Peter Dinda
Prof. Fabian Bustamante
Prof. Yan Chen
Prof. Dongyan Xu (Purdue University)
Ashish Gupta
March 2008
Introduction
Background
• Virtual Machine Distributed Computing
• Virtuoso
– Middleware for autonomic Virtual Machine distributed
computing
– Presents a simple abstraction for distributed computing,
insulating from underlying computational, networking and
middleware complexities
– VMM abstracts computational resources
– VNET abstracts different networking domains into one –
also ideal point for monitoring
– Autonomic Resource Management
3
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Problem Introduction
• Adaptation
– Resources can be heterogeneous (CPU, memory etc)
– If shared, then resources availability can also highly dynamic
– Application demands also change !
• Autonomic computing:
–
–
–
–
What is available ?
What is required ?
How can we effectively match the two ?
One of the major components  What does the
distributed application want ?
– Should work with existing unmodified applications and OS
4
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Thesis Statement
My thesis is that it is feasible to infer various useful demands and behavior of a parallel
application running inside a collection of VMs to a significant degree using a black box
model. To evaluate this thesis, I enumerate and define various demands and types of
behavior that can be inferred, and also design, implement and evaluate ideas and
approaches towards inferring these.
Chapter 2
One of the demands I infer is the communication behavior and the runtime topology of
a parallel application. I also show how to infer some very useful runtime properties of
a parallel application like its runtime performance, its slowdown under external load
Chapter 3
Chapter 4
and its global bottlenecks.
Significantly all of this is done using black-box assumptions and without specific
assumptions about the application or the operating system. I also give evidence of how
automatic black box inference can assist in adapting the application and its resource
usage resulting in improved performance.
Appendices A, B
6
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Black box assumption and impact
Black Box – Can make no assumptions about the implementation,
behavior or internal state of the Guest OS/module beyond its external
interface
• Lowers barrier to adoption of the new inference techniques
• helps deploy my work to legacy applications
• Mainly accomplished by looking at external signals:
•
traffic, host load etc.
• Not tied to Virtuoso
7
•
Other systems like softUDC [91], XenoServer [52], SODA [87], Violin [88], In-VIGO
[13], VioCluster [136] can also benefit from black box application inference
•
Potentially any virtual distributed system
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Components of my dissertation
1
3
5
8
Inference
Virtual Topology and Traffic
Inference Framework
Ball in The Court principles
to compute application
slowdown
2
Black box metrics for
Absolute Performance
4
Global Bottlenecks using
time decomposition
Adaptation
Increasing Application
Performance In Virtual
Environments through Runtime Inference and Adaptation
6
Free Network Measurement
For Adaptive Virtualized
Distributed Computing
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Topics I cover
2
4
A
9
Inference
Virtual Topology and Traffic
Inference Framework
Ball in The Court principles
to compute application
slowdown
3
Black box metrics for
Absolute Performance
5
Global Bottlenecks using
time decomposition
Adaptation
Increasing Application
Performance In Virtual
Environments through Runtime Inference and Adaptation
B
Free Network Measurement
For Adaptive Virtualized
Distributed Computing
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
BSP application model
• Multiple processes executing a common kernel
• Execution alternates between one or more computing phases and
one or more communication phases
• Combination of a schedule of computation and communication
phases  Super-step
• A very popular model for implementing
a large variety of scientific applications
and parallel algorithms
• Original paper: Valiant [167]
10
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Patterns
• Synthetic workload generator developed before. Models a BSP application
• Can execute many different types of topologies common in BSP programs
Some parameters
Topologies
Topology
N-dimensional mesh
Number of processors
N-dimensional torus
Message size
N-dimensional hypercube
# of iterations
Binary reduction tree
Flops per element
All to All
Memory Reads/Writes
per element
11
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Patterns application’s capabilities
3-D Toroid
2-D Mesh
Reduction Tree
12
3-D Hypercube
All-to-All
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
NAS parallel benchmarks
• Developed by NASA [17, 172]
• Set of programs to evaluate performance of
parallel supercomputers
• Representative of CFD applications
• To generate realistic parallel application workloads
• 5 kernel benchmarks: EP, MG, FT, IS, CG
– Five very frequently used numeric methods
13
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Different contributions of my dissertation
2
4
A
14
Inference
Virtual Topology and Traffic
Inference Framework
Ball in The Court principles
to compute application
slowdown
3
Black box metrics for
Absolute Performance
5
Global Bottlenecks using
time decomposition
Adaptation
Increasing Application
Performance In Virtual
Environments through Runtime Inference and Adaptation
B
Free Network Measurement
For Adaptive Virtualized
Distributed Computing
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Collective Inference for Topology:
VTTIF
Goal of VTTIF
?
Low Level Traffic
Monitoring
Application Topology
An online topology inference framework
for a VM environment
16
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Physical Host
VM
VNET daemon
Traffic Analyzer
Rate based
Change detection
VNET
overlay
network
Traffic Matrix
Query Agent
To other VNET
daemons
VM Network Scheduling Agent
VTTIF Architecture
17
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Inferred topology
Parallel Integer Sort
18
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Black Box Measures of performance
The problem
• Performance of BSP applications – an important goal
• Lot of work dedicated to improving performance of parallel
applications, e.g.
– Virtuoso
– VioCluster
• How do we measure performance in a black box fashion ?
– The current way in Virtuoso is manual (e.g. Lin et.al. [106])
• Impact
– Would enable superior adaptation algorithms
– Automated evaluation and adaptation cycle
– Generate reports on effectiveness of different
adaptation/scheduling methods
20
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Cost model for BSP applications
• Popular strategy: break super-step into its components:
– computational cost
– Communication cost of the global exchange of the data
– Synchronization cost
Computation
cost
Communication
cost
Number of
super steps
Static model of performance
+
requires detailed application
profiling and access to source code
21
Sync latency
cost
Speed of
computation in
FLOPS
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Super-step approach
• Super-step structure an invariant
– No. of steps depend on parameters and data
Another possible measure of
performance: number of super-steps
executed per second, or the
iteration rate  dynamic metric
Multiple super-steps for dynamic
applications  iteration rate is not
constant
22
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
A new black box metric: Round Trip Iteration Rate (RIR)
• Based solely on communication behavior of the application
• Correlated to the iteration rate
• Indirectly measures number of process interactions  this indicates
progress as synchronization happens at end of a super-step
• Approach: Examined various properties of the traffic trace
• Inter send-packet delay exhibits interesting properties
23
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Inter send-packet delay
176 Receive from 175
176 Send to 175
176 Send to 175
176 Send to 175
176 Receive from 175
176 Send to 175
24
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Inter send-packet delay
Traffic trace for Patterns
Message size = 4000 bytes
Computation per iteration: 100 MFlops
Clustering based
on inter-send
delays
Count in cluster
matches actual
iteration rate
output by
application (325)
25
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Patterns: clusters without load
Reported: 569
Actual: 568
26
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Patterns: clusters with external CPU load
Reported: 153
Actual: 142
Actual execution
time ratio: 3.922
Ratio from
reported iteration
rate: 3.72
Within 5%
27
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Why Does Circled Bin Correspond To Iteration Rate?
• Each iteration consists of send, receive and compute phases
Send packets
time
• Number of items in large inter send-packet delay cluster
represent the group that represent an inter-process interaction
• Each inter-process interaction represents progress in
the super-step
28
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Plotting inter send-packet delay for MG
1. No clean clusters for a more complex
application
2. Delays shift towards right on greater
load
29
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Computing the RIR metric
• Previous examples were for Patterns –
•
static performance case, easy clustering
• Applications like MultiGrid from NAS benchmarks  changing
iteration rate
For a given packet time series,
1. Count send pairs whose
inter-packet delay exceeds by
c * RTT
2. Send pair must be
interleaved by one receive
Based on BSP principles
30
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
RIR time series
For dynamic applications, RIR changes
with time
Need a time series for RIR over the
trace
31
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Outputting RIR time series - Workflow
Sniff Packets
Using tcpdump/libpcap for the VM traffic
Send packets satisfying
the two conditions
Send packets that
obey the conditions
Slide a 1 second window
over these send packets
Get a new time series
denoting RIR for each
1 second sliding window
Sliding interval = t
1 sec
Sampling duration = T
Slide by t
i1, i2, i3, i4, i5, …. in
Each number represents number of
iterations for a particular 1 sec window
instance
Average RIR
Derivative Metrics from
the above time series
CDF
Power Spectrum
Super-phase period
32
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Representing Dynamic Performance
• Define a spectrum of metrics for dynamic performance
33
RIRavg
Long term stationary average of RIR time series
RIR-CDF
CDF of RIR time series indicating spread of
iteration rates
RIR-PS
Phase structure, periodicities, application
fingerprinting, statistical scheduling
RIR-PSE
Summary of the periodic behavior for multiple
supersteps
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Computing the stationary Average – sampling issues
• Sliding window resolution (sampling rate)
– Needs to be high enough to capture the any important high frequency
behavior
• Capture duration
– Enough to capture the stationarity of the signal (repetition of all supersteps)
• Assumption:
– iteration dynamics of the application are indeed empirically stationary for the long
run.
– For a dynamic application that consist of repetitive phases, it means capturing
enough of its performance behavior to capture this repetitive element.
•
34
Capturing stationarity
•
Power Spectrum based techniques help us determine the right sampling
rate and duration
•
Details in dissertation
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Effectiveness of RIRavg
For MG application, running under different load
conditions (100% and 60% CPU load), predicted
execution time error rates were 13% and 7% respectively
(completely black box)
Value of c = 1.1 here
35
(c*RTT factor)
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
RIR time series graph
A super-step
36
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Can we predict slowdown of application if we put it under load ?
For the IS and MG applications, which application may be hurt
more if one of the processes from each application shares the
physical host with an external computational load?
The impact: we can now determine in advance, the
impact of external load if we must choose one of
these applications to be influenced by the load.
• Scheduling fact: Depending the scheduling algorithm, affect of
load on different RIR regions can be different (Govindan et.al. [60])
• Reason: Scheduling handled differently for CPU bound processes vs
I/O bound processes
• “Providing enough CPU is not enough, an equally important
consideration is to provide the CPU at the right time”
Role of CDF
• RIR-CDF can be used to predict which application will be more affected
by external load (for dynamic applications)
• Very useful when extra load needs to be introduced over existing
applications due to demand
• Scheduling using the CDF:
– From the normal CDF, we predict a slowdown CDF based on a slowdown
mapping
– Slowdown CDF  How will the RIR-CDF of the application look like after
load ?
– Slowdown mapping  What is the slowdown for a particular RIR under a
particular load ?
Slowdown
mapping
38
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Other metrics
• Power spectrum of RIR
– Gives idea about the super-step structure of the application
– Length of Super-step
– A summary of the power spectrum and the significant frequencies serves
as a fingerprint/super-step snapshot
39
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
A sample power spectrum for 4 processes
Consistency
across
processes
40
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Example for MG
Significant
frequency
separation
Exec time =
19.44 seconds
Super-phase
period = 4.1
seconds
(1/0.244)
41
Number of
super-phases
~4
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Recap
• We can deduce performance for a BSP application using black
box means.
• We can predict performance for an application, when imposed with
load
• Entirely based on Based on packet analysis
• New metric called RIR : Round trip Iteration Rate
• More complex metrics for dynamic applications
– RIRavg
– RIR-CDF
– RIR Power Spectrum
42
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Ball In the Court Methods
The problem
• Last chapter: Predicting performance under load (Slowdown CDF)
• This chapter: Can we predict performance of application if an
existing external load was removed ?
• I.e. What is the slowdown of the application under the current load
conditions ?
44
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Ball in the Court
• BSP super-step  computation, communication, sync
• Communication according to a certain schedule and then
computation in between
• Each process acts (computes) on a message before sending out the
next one
• This acting on a message is called “Ball in the Court” (BIC)
• Ball is the responsibility of the process to do some local processing
and then interact with other processes , court is the local host.
BIC delay
45
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Developing a strategy
• Focus on one process
• If just one process is loaded, entire application slowed down because
of one process
• All processes operate in sync, and iterations can only proceed if the
loaded process does its duties
stretched
Under load
46
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Why BIC delay?
• The traffic trace captures the behavior of the process for the
entire duration
• If the process slows down, the trace time length will increase.
• There will be corresponding changes in the trace as well.
What are those changes ?  Seed of the idea
47
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Approach to computing the BIC delay
Let’s compute the time differential for event pairs, for *.176
Some
computation
(6 ms)
Some
computation
(6ms)
48
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
With Load, Some BIC Delays Get Larger
Unloaded process
Loaded process
1. Sending an ack  process’s
responsibility
2. This responsibility was hugely inflated
in the loaded case ( 64 us to 23822 us)
3. BIC delays for other receives are
similar (receive  BIC for other
processes)
49
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
An Algorithm for BIC Delay?
• Can we estimate the total BIC delay from the traffic trace alone ?
• Investigated this question with different approaches.
• Each packet can be of the following types:
–
–
–
–
50
1.
2.
3.
4.
Send Packet (SP)
Send Ack (SA)
Receive Packet (SP)
Receive Ack (RA)
•
We can pair up consecutive packets to form event pairs  SA
followed by SP  SA-SP
•
For 4 event possibilities  we have 16 event pairs
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
BIC events
• * - S* event pairs  can be classified as BIC events
• Intuitively,
– process has either received a packet and is responsible to send the next
one
– Just sent a packet and is responsible to send the next one as well
51
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Computing the BIC delay
• 1. Count all BIC event pairs and sum up their time differentials 
total BIC delay
• 2. BIC delays partitioned by the recipient IP address
– E.g.: RP (175) – SP (173)  delay goes into the bucket of 173
•
52
Some special cases and concerns for counting BIC delays are
explained in the dissertation (Page 140)
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Using BIC Delays to Measure Application Imbalance
• Get a measure of the local computing/processing time for
which its responsible
• Comparing this local time with another unloaded
process  gives an idea of imbalance for the loaded
process
• Gives an idea of how much extra time the loaded process is
taking
•The way we compare  produces different strategies for computing
slowdown
53
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Using BIC Delays to Measure Application Imbalance
Compute BIC delay
Imbalance Algorithm
Slowdown
54
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Computing the no-load runtime using BIC delays
•Global BIC imbalance algorithm:
– Assumes all processes are executing the same BSP kernel
– 1. Compute the BIC delay for the loaded process
– 2. Select a partner process for comparison
• What if the loaded process was put in the shoes of the partner process ?
• Choice of partner process can affect results. Select least loaded for most optimistic
results. Or select one that reflects possible migration conditions
– 3. Compute the “imbalance” by finding the difference in the BIC delays of
these two processes
55
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Global BIC algorithm
56
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Evaluation with the IS application
Balanced case
Total runtime = 14.36
seconds
Loaded case
Total runtime = 152.67
seconds
Imbalance between 176 and 175 = 124.49 seconds
Actual difference = 138.41 seconds
Using completely black box means, we could determine,
how slow is the application running with load,
without knowing about the unloaded case
57
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
More sophisticated Imbalance Algorithms
• Process-level BIC imbalance algorithm
•
•
•
A better fit for more dynamic applications like the NAS
benchmarks
Sometimes work done by all processes is not the same
Algorithm takers into account inter-process level interactions
and BIC delays
• Multi-iteration BIC-delay Bias-Aware Imbalance Algorithm
•
BIC algorithm for multi-load situations
• Covered in detail in the dissertation
• Gives a range of slowdown (optimistic and pessimistic values)
• Evaluated with MG and IS benchmarks
58
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Other contributions in the Chapter
• Metrics for computing imbalance amongst processes
– Gives idea of heterogeneity experienced by different processes
– Draws attention to applications that can use lot of help
– Three imbalance metrics
• Two global: Standard Deviation from average, Squared distance
• Process Level imbalance metric
59
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Global bottlenecks and Time
Decomposition
Global Bottleneck
• In the BIC discussion, we could identify the slow processes using
their BIC delays
• This chapter extends the argument to network and I/O resources
• Looks at the problem of Global Time Decomposition
– Where does the application spend its time ?
– At network resource and process level
•
Time decomposition can point out potential sources of imbalance
•
Some metrics output are:
–
–
–
–
62
Cumulative Message Latency
Average Latencies observed
Cumulative message transfer time
Average Bandwidth observed
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Flowchart for diagnosis
Tools
• Traffic based iteration rate
metrics
• RIR time series
• Dynamic Performance
Metrics
• Global BIC imbalance algorithm
• Process-level Bias Aware BIC
imbalance algorithm
• Multi-load BIC imbalance
algorithm
• Imbalance metrics
• Global Time decomposition
techniques for various
network metrics
• Unix tools to find root case
Questions/problems
Chap 3
Measure current
performance
Chap 4
NO
BIC analysis for
local imbalance
• Is application running at expected
speed compared to previous known
instances ?
• Multiple application case: which
application will be slowed down
more from load ?
• Indicates if application is highly
imbalanced,
• Indicates amount of slowdown, and
• Which host to focus on?
Chap 5
Global
Time Decomposition
• Shows time allocation for network
components, and
• Which non-host resources to focus
on ?
Appendices A,B
• Adaptation Mechanisms
• Migration
• Network overlay links
• Routing rules
• Network reservation
63
Eliminate a bottleneck using
adaptation mechanisms
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Different contributions of dissertation
2
4
Inference
Virtual Topology and Traffic
Inference Framework
Ball in The Court principles
to compute application
slowdown
3
Black box metrics for
Absolute Performance
5
Global Bottlenecks using
time decomposition
* An API also defined in the dissertation
A
64
Adaptation
Increasing Application
Performance In Virtual
Environments through Runtime Inference and Adaptation
B
Free Network Measurement
For Adaptive Virtualized
Distributed Computing
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Adaptation
1.
BSP
2.
Transactional ecommerce
1.
Application throughput
1.
VTTIF
Monitoring and inference
2.
Network monitoring
Optimization metric
1.
Application Throughput
1.
Single metric
2.
Combined metric
1.
Overlay topology
2.
Forwarding rules
3.
VM migration
Applications
Application performance
measure
Adaptation algorithm
Adaptation mechanisms
65
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Effect on BSP Application Throughput
Adapting to External Load Imbalance
External load
removed, but I/O still
dominates, so
topology helps
External load
removed, can drive
higher I/O
For high Compute/Communicate Ratio,
Migration + Topology dramatically improves performance
66
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Free Network Measurement For Adaptive Virtualized Distributed Computing
67
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Closest Related Work
• Wood et. al. [178] - Black box and gray box strategies for
Virtual Machine Migration, NSDI 2007
• Goal: To improve performance of Stand-alone applications running
inside VMs
• Black box and gray box strategies
• Black box  externally visible parameters like CPU load, disk
swapping activity and network usage
• Focuses on stand-alone applications
• I focus on distributed parallel applications  collective properties
Closest Related Work
• Aguilera et. al. [14] – Performance debugging for distributed
systems of blackboxes, OSP 2003
• Goal: To find components that contribute to high latency amongst critical
message paths in a distributed application
• Somewhat analagous to BIC chapter
• Assume constant delay per component
• Unclear how their techniques translate to parallel applications (cyclic
computation)
•No estimate of actual slowdown
• Reynolds et. al. [133] – WAP5: black-box performance debugging
for wide-area systems, WWW 2006
• Focus on wide area systems now, more focus on the network delay aspects
• Major work focuses on identifying causal relationships between messages
• Assume a one to one message casualty  not so in Parallel applications
• Acknowledge that will not work for barrier type communications
Related Publications
•
Gupta, A., Dinda, P. A. Inferring the topology and traffic load of parallel
programs running in a virtual machine environment. In Proceedings of the 10th
Workshop on Job Scheduling Strategies for Parallel Processing (JSPPS 2004 (June
2004).
•
A. Sundararaj, A. Gupta, P. Dinda, Dynamic Topology Adaptation of Virtual
Networks of Virtual Machines, Proceedings of the Seventh Workshop on Languages,
Compilers and Run-time Support for Scalable Systems (LCR 2004)
•
Sundararaj, A. Gupta, P. Dinda, Increasing Distributed Application Performance
in Virtual Environments through Run-time Inference and Adaptation, In
Proceedings of the 14th IEEE International Symposium on High Performance
Distributed Computing (HPDC 2005)
•
Ashish Gupta, Ananth Sundararaj, Marcia Zangrilli, Peter Dinda, Bruce B. Lowekamp,
Free Network Measurement For Adaptive Virtualized Distributed Computing, In
Proceedings of 20th IEEE International Parallel & Distributed Processing Symposium,
2006.
70
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Other Work
•
Robert Schweller et al., Reversible Sketches: Enabling Monitoring and Analysis over High-speed
Data Streams, Journal paper in IEEE/ACM Transactions on Networking, 2006.
•
Robert Schweller et al., Monitoring Flow-level High-speed Data Streams with Reversible Sketches, In
Proceedings of IEEE INFOCOM 2006.
•
Robert Schweller, Ashish Gupta, Elliot Parsons, Yan Chen, Reverse Hashing Algorithms for Sketchbased Change Detection on Highspeed Networks:, In Proceedings of ACM SIGCOMM Internet
Measurement Conference, October 2004, Taormina, Sicily
•
P. Dinda, G. Memik, R. Dick, B. Lin, A. Mallik, A. Gupta, S. Rossoff, The User In Experimental Computer
Systems Research, Proceedings of the Workshop on Experimental Computer Science (ExpCS 2007)
•
A. Gupta, B. Lin, P. Dinda, Measuring And Understanding User Comfort With Resource Borrowing, In
Proceedings of the 13th IEEE International Symposium on High Performance Distributed Computing
(HPDC 2004), Honolulu, Hawaii
•
Bin Lin, A. Gupta, Peter Dinda, Measuring, Understanding, and Exploiting Direct User Input In
Resource Scheduling, Journal paper in submission
Sketchbased
reverse
hashing
work
User
feedback
work
Accepted Posters
•
Ashish Gupta, Peter Dinda, Fabian Bustamante, Distributed Popularity Indices, Poster Presentation at
ACM SIGCOMM 2005, Philadelphia
•
Ashish Gupta, Manan Sanghi, Peter Dinda, Fabian Bustamante, Magnolia: a novel DHT architecture for
Keyword based search, Poster Presentation at Network System Design and Implementation (NSDI 2005),
Boston
•
Ashish Gupta et al., Free Network Measurement For Adaptive Virtualized Distributed Computing,
Poster Presentation at Supercomputing 2005, Seattle
71
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Components of my dissertation
1
3
5
72
Inference
Virtual Topology and Traffic
Inference Framework
Ball in The Court principles
to compute application
slowdown
2
Black box metrics for
Absolute Performance
4
Global Bottlenecks using
time decomposition
Adaptation
Increasing Application
Performance In Virtual
Environments through Runtime Inference and Adaptation
6
Free Network Measurement
For Adaptive Virtualized
Distributed Computing
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Thank you !
Backup slides
Infrastructure
• Virtuoso cluster, which is an IBM e1350 with 32 compute nodes, each of
which is a dual 2.2 GHz Intel HT Xeon Processors (except the
management node which is faster), 1.5 GB RAM, and 40 GB of disk.
• The machines also have 1 Gbit interfaces and in some evaluation
scenarios in later chapters(Chapters 3, 4, 5), I use 4 of these machines
which form a mini-cluster of their own. These machines are then
connected via a 100 Mbit switch.
• Virtual Machine Monitor Used:
75
•
I use both VMWare and the popular open source Xen VMM in my
evaluations.
•
•
VMWare GSX Server 2.5 (VTTIF evaluation)
Xen version 3.0.3-rc3-1.2798.f in my evaluations in Linux Kernel
release 2.6.18-1.2798.fc6xen. (other chapters)
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Increasing Application Performance In Virtual Environments Through Run-time Inference
and Adaptation
Dynamically adapt unmodified applications on unmodified
operating systems in virtual environments to available resources
The adaptation mechanisms are application independent and
controlled automatically without user or developer help
Demonstrate the feasibility of adaptation at the level of collection
of VMs connected by Virtual Networks
Show that its benefits can be significant for two classes of
applications
Published in LCR 2004, HPDC 2005
76
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Optimization Problem (2/2) Topology + Migration
Informally stated:
Input
•
•
Network traffic load matrix of application
Topology of the network
Output
• Mapping of VMs to hosts
• Overlay topology connecting hosts
• Forwarding rules on the topology
 Such that the application throughput is maximized
The algorithm is described in detail in the paper
77
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Evaluation
Applications
• Patterns: A synthetic BSP benchmark
• TPC-W: Transactional web ecommerce
benchmark
Benefits of adaptation (performance speedup)
• Adapting to compute/communicate ratio
• Adapting to external load imbalance
78
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Effect on BSP Application Throughput
Adapting to Compute/Communicate Ratio
Since I/O dominates,
drop in latency
improves
performance
Less time spent
in I/O, so
migration alone
is enough
Even for small
amount of I/O,
it takes up
significant time
For high Compute/Communicate Ratio,
Migration + Topology dramatically improves performance
79
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
TPCW
Throughput (WIPS) With Image Server Facing External Load
80
No Topology
Topology
No Migration
1.216
1.76
Migration
1.4
2.52
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Free Network Measurement for Adaptive Virtualized Distributed Computing
ADAPTATION : A FOUR STEP PROCESS
1. Automatically infer application demands
(network/CPU)
2. Monitor resource availability
(bw/latency/CPU)
3. Adapt distributed application for better
performance/cost effectiveness
4. Reserve Resources when possible
WREN PAPER, Published in IPDPS 2006
81
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
What is WREN ?
Developed by my colleague Marcia Zangrilli from WM College
What does it do ?
1. Observes incoming/outgoing packets
2. Online analysis to derive latency/bandwidth
information for all host pair connections
3. Answers network queries for any pair of
hosts
82
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
An important Contribution: Problem Formalization
83
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Two approaches to adaptation
84
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
VTTIF in a dynamic topology environment
Parameters:
Smoothing Window  sliding window
duration over which updates
are aggregated
Detection Threshold
Update Rate
85
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Questions
• Are these techniques applicable to other applications ?
• Why do you think they are generic enough even for BSP applications
? How did you test their generality ?
– Because the reasoning for their derivation was based on the BSP model
not on any particular application
– Tested on Patterns and then tried on other more complex applications.
– VTTIF was used by other components like adaptation
•
What modifications would you suggest to the
– Network hardware
– VMM
– Router level ?
•
•
How could you improve time synchronization issues ?
What is the closed work to you and how is it different from whats
out there
– Black box inference NSDI 2007
– Black box debugging from HP labs
86
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
• How about applying the techniques beyond BSP applications ?
– Distributed apps
– P2P
– Web apps
87
•
What are the limitations of these techniques ? When will they not
work ? What are the assumptions ?
•
What is the main research contribution to your work..?
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Bin
Evidence of Adaptation Driven by
Inference
Evidence of Adaptation
How can automated inference assist
automated adaptation without user
intervention and application knowledge ?
Significant work already done in this area by me
• Using multiple mechanisms for adaptation
• Combining resource availability with application demands to match
needs
• Non-scientific apps also investigated (multi-tier web sites)
• More in following slides…
90
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Optimization Problem (2/2) Topology + Migration
Informally stated:
Input
•
•
Network traffic load matrix of application
Topology of the network
Output
•
•
•
Mapping of VMs to hosts
Overlay topology connecting hosts
Forwarding rules on the topology
Such that the application throughput is maximized
The algorithm is described in detail in the paper
91
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Effect on BSP Application Throughput
Adapting to External Load Imbalance
External load
removed, but I/O still
dominates, so
topology helps
External load
removed, can drive
higher I/O
For high Compute/Communicate Ratio,
Migration + Topology dramatically improves performance
92
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Evaluation
Scenario 2 : Large 256 host topology. 32 potential hosts, 8 Virtual Machines
Results for Multi Constraint Cost Function : Bandwidth and Latency
Annealing easy to adapt and finds good mappings compared to heuristic
93
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
The New Adaptation Process
94
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Two approaches to adaptation
95
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Evaluation
Scenario 2 : Large 256 host topology. 32 potential hosts, 8 Virtual Machines
Results for Multi Constraint Cost Function : Bandwidth and Latency
Annealing easy to adapt and finds good mappings compared to heuristic
96
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Summary of contributions
Virtual Topology and Traffic
Inference Framework
1. Infer the traffic matrix for a
BSP application
Black box metrics for Absolute
Performance
2. Spatial Communication
Topology of the application
Ball in The Court principles to
compute application slowdown
3. Online implementation and low
performance overhead
Global Bottlenecks using time
decomposition
4. Handling dynamic situations
and avoiding oscillations
Increasing Application
Performance In Virtual
Environments through Run-time
Inference and Adaptation
5. Integration with adaptation
system
Free Network Measurement For
Adaptive Virtualized Distributed
Computing
97
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Summary of contributions
Virtual Topology and Traffic
Inference Framework
1.
How to evaluate BSP application
performance under black box
conditions ?
Black box metrics for Absolute
Performance
2.
New set of metrics based on traffic
3.
Ball in The Court principles to
compute application slowdown
RIR (Round trip Iteration Rate) 
correlated to the iteration rate
4.
Complex metrics for dynamic
applications
Global Bottlenecks using time
decomposition
Increasing Application
Performance In Virtual
Environments through Run-time
Inference and Adaptation
Free Network Measurement For
Adaptive Virtualized Distributed
Computing
98
1.
RIRavg
2.
RIR-CDF
3.
RIR-PS
5.
Scheduling using these advanced
metrics (Slowdown CDF)
6.
Other applications beyond performance
1.
Fingerprinting
2.
Statistical Scheduling
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Summary of contributions
Virtual Topology and Traffic
Inference Framework
Black box metrics for Absolute
Performance
Ball in The Court principles to
compute application slowdown
Global Bottlenecks using time
decomposition
1.
Quantitative estimate of the application
slowdown under external load
2.
Amount of speedup achievable
3.
Can greatly help in
adaptation/scheduling decisions 
predicts the benefit of migration in
advance
4.
Concept of Ball In the Court delays
5.
Using BIC delays to compute imbalance
in application
6.
Algorithms to compute slowdown
Increasing Application
Performance In Virtual
Environments through Run-time
Inference and Adaptation
Free Network Measurement For
Adaptive Virtualized Distributed
Computing
99
7.
1.
Global Imbalance Algorithm
2.
Process Level BIC imbalance algo
3.
Multi-load BIC imbalance algo
Metrics to indicate amount of
imbalance in application as a whole
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Summary of contributions
Virtual Topology and Traffic
Inference Framework
Black box metrics for Absolute
Performance
Ball in The Court principles to
compute application slowdown
Global Bottlenecks using time
decomposition
1.
How can we find the global bottleneck
for the application ?
2.
Time Decomposition of execution time
 Helps identify regions of great
imbalance
3.
Includes the network aspects
4.
Many new metrics for latency and
bandwidth time components
5.
Evaluation with various load scenarios
Increasing Application
Performance In Virtual
Environments through Run-time
Inference and Adaptation
Free Network Measurement For
Adaptive Virtualized Distributed
Computing
10
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Summary of contributions
Virtual Topology and Traffic
Inference Framework
Black box metrics for Absolute
Performance
Ball in The Court principles to
compute application slowdown
Global Bottlenecks using time
decomposition
Increasing Application
Performance In Virtual
Environments through Run-time
Inference and Adaptation
Free Network Measurement For
Adaptive Virtualized Distributed
Computing
10
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Summary of contributions
Virtual Topology and Traffic
Inference Framework
Black box metrics for Absolute
Performance
Ball in The Court principles to
compute application slowdown
Global Bottlenecks using time
decomposition
Increasing Application
Performance In Virtual
Environments through Run-time
Inference and Adaptation
Free Network Measurement For
Adaptive Virtualized Distributed
Computing
10
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Impact
• Further the goal of autonomic computing
• Black box approach  impacts huge set of applications
• Help in lower the entry barriers for distributed and parallel autonomic
computing
• Shared resources  no new tweaks needed to existing applications
•
10
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Application Model
• BSP Parallel applications
• Two benchmarks being considered
– Patterns
– NAS Benchmarks
10
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Citations
[167] VALIANT, L. A bridging model for parallel computation.
Communications of the ACM 33, 8 (Aug. 1990), 103–111.
Contributions
• Understanding the correlation between network traffic and interprocess interactions
• Defining a black box measure of application performance (RIR)
– Techniques to derive it
11
•
Advanced metrics for Dynamic Applications and their performance
•
Frequency Domain based metrics for greater insight about
application
•
Applications to advanced scheduling and application fingerprinting
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Solving the issues
• Sampling rate:
– Compute at a “high enough” sample rate to capture enough high
frequency dynamics
– Compute power spectrum to calculate total energy in the signal (power
spectrum is magnitude of Fourier Transform)  Integrate the power
spectrum
– Sample at a slightly lower rate and see reduction in energy.
– Repeat until we start losing energy beyond a certain threshold (5%) 
this indicates at this rate we start losing significant part of signal
Gives us 95% cutoff sampling rate
– If actual sampling rate is much higher (5 to 10 times) than this cutoff
point, we are doing alright
11
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Solving the issues
11
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Capturing the right time duration
• Again a Power Spectrum based approach
• Helps us determine if our sample size captures most of the energy in
the signal
• Ideally we will capture as much as resources allow us
•The process:
–
After capturing, we test the amount of energy loss, if we shorten the
sample size.
– Find the cut-off point as last time
– Compare the cut-off sample size and our measured sample size
•
After sampling rate and sampling duration is satisfactory,
RIRavg computed by averaging the time series
12
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Effect of load on the RIR time series
1. Expansion effect
2. Basic structure of time series is same
3. Each phase more irregular (more spikes)
4. Captures the time dynamics of the application
12
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
The RIR-CDF metric
• More exhaustive than the RIRavg
• Statistical scheduling
• Application fingerprint
• Performance comparison of runs at more detailed level than RIRavg
Global metric
12
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
The RIR-CDF metric
CDF indicates the RIR region
application spends most of its time
in
RIR-CDF can be used to predict
which application will be more
affected by external load (for
dynamic applications)
Slowdown
mapping
I show how we can predict slowdown for MG vs IS
12
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Taking guidance from RIR-CDF : a proof of concept
• CDF indicates the RIR region application spends most of its time in
• Scheduling fact: Depending the scheduling algorithm, affect of load on
different RIR regions is different (Govindan et.al.)
• Reason: Scheduling handled differently for CPU bound processes vs
I/O bound processes
• “Providing enough CPU is not enough, an equally important
consideration is to provide the CPU at the right time”
• Higher RIR  more inter-process interaction  can use more efficient
communication
• Lower RIR  more compute intensive or large messages
• How do applications with different RIRs suffer from external load ?
12
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Effect of load on different RIR areas
Difference of
performance over
12.46 times at the
extremes under full
load
Applications with
higher iteration
rates always
seem to be more
drastically
affected by
external load
then those with
lower iteration
rates.
12
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Scheduling using the CDF
The slowdown mapping can be complicated and built over a long history. We
use results from Patterns benchmark to provide a simpler mapping
Input: RIR value
Output: Fractional slowdown
Assumed load = 100%
More realistic
mappings may
take bandwidth
and CPU
utilization as
input
12
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Scheduling using the CDF
For the IS and MG applications we have seen till now, which
application may be hurt more if one of the processes from
each application shares the physical host with an external
computational load?
The impact: we can now determine in advance, the
impact of external load if we must choose one of
these applications to be influenced by the load.
Map each RIR value to
its slowdown RIR value
from the mapping
Slowdown CDF
12
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Slowdown CDF
Avg =
0.103
For MG
More time
spent in low
regions
Slowdown CDF
Avg =
0.065
For IS
12
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Actual slowdown
• MG  2.4 times
• IS  15.52 times
• IS is affected much more from the load as predicted.
• A better mapping function can yield more powerful slowdown
estimates for complex applications.
• Shows how we can make complex decisions for dynamic applications
using RIR – CDF.
• Other applications  Statistical scheduling ideas
– examples
12
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Power Spectrum
•
Other applications of the Power Spectrum
– Statistical Scheduling decisions
– Application categorization
– Component separation
•
More discussion in dissertation
•
Steps in computing the Power Spectrum – numerous measures
taken to get a representative power spectrum (Windowing functions,
zero padding, eliminating the DC bias)
• Evaluation with another application: IS (Integer Sort) from the NAS
benchmarks  similar results
• Implementation details and making it work online
13
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments
Process Level BIC imbalance Algorithm
• A better fit for more dynamic applications like the NAS benchmarks
• Sometimes work done by all processes is not the same
• Algorithm takers into account inter-process level interactions and
BIC delays
• Phenomenon of Load Bias also observed  A heavily loaded process
affects the BIC delay of other unloaded processes too (shifting of
work)
• Covered in detail in the dissertation
• Gives a range of slowdown (optimistic and pessimistic values)
• Evaluation with MG
– Range of slowdown = [28.67s, 50.78s ]
– Actual slowdown = 42.67s
13
Black Box Methods for Inferring Parallel Applications' properties in Virtual Environments