MATE - Paradyn
Paradyn/Condor Week 2004
April 2004
MATE: Monitoring, Analysis and Tuning Environment
Anna Morajko, Tomàs Margalef and Emilio Luque
Universitat Autònoma de Barcelona
Content
1. Introduction
2. Dynamic Performance Tuning
3. MATE
4. Tuning Techniques
5. Conclusions and future work
Introduction
Application performance
• Demand for high-performance computation
• The main goal of parallel/distributed applications: solve a given problem as fast as possible
• Performance is one of the most important issues
• Developers must optimize application performance to provide efficient and useful applications
Introduction
Application performance optimization
Steps:
• monitoring,
• analysis,
• tuning
[Figure: application development cycle. Elements: Source, Application, Monitored execution, Performance data, Monitoring, Performance analysis, Tuning. Edge labels: Instrumentation, Measurements, Bottlenecks, Source code relation, Solutions, Changes, Modifications.]
Introduction
Application performance optimization
• Difficulties in finding bottlenecks and determining their solutions for parallel/distributed applications
– Many tasks that cooperate with each other
• A high degree of expertise is required
• Application behavior may change depending on input data or environment
• A difficult task, especially for non-expert users
Introduction
Our goals
• Investigate whether it is possible to optimize the performance of parallel/distributed applications dynamically without user intervention
• Investigate the applicability of dynamic tuning
• Create a tool that is able to dynamically optimize applications:
– automatically improve application performance
– improve the application execution during run time
– tune without recompiling and rerunning
– adapt application to existing conditions
• Evaluate the profitability of dynamic tuning in practice
Introduction
Dynamic automatic tuning
[Figure: dynamic automatic tuning. Elements: Application development, Source, Application, Execution, Monitoring, Performance analysis, Tuning. Edge labels: Instrumentation, Events, Performance data, Problem / Solution, Modifications.]
Dynamic Performance Tuning
Requirements
• No user intervention
• No source recompilation
• Performance analysis on the fly
– Global analysis
– Decisions taken in a short time
– Not complex analysis and modifications
• Run time monitoring
• Run time tuning
– Modifications performed carefully
• Parallel/distributed application control
• Low intrusion
Dynamic Performance Tuning
Key question
What can be tuned in an application?
Application knowledge
Limited information about the application
Tuning layers
Approaches to tuning
Dynamic Performance Tuning
Tuning layers
• Application specific code
• Standard and custom libraries (API+code)
• Operating system libraries (API+code)
• Hardware
[Figure: layer stack, top to bottom: Application (code), Libraries (API, code), Operating System (OS API, kernel), Hardware.]
Dynamic Performance Tuning
Application
• Application code changes
– Different bottlenecks that depend on the application implementation
Libraries
• Library code changes
• API usage
– Standard
• C/C++ library -> memory management, dynamic containers
– Custom
• PVM, MPI -> communication
OS
• Kernel code changes
• API usage
– Adjustment of options (e.g. TCP/IP socket), I/O request grouping
[Figure: layer stack, top to bottom: Application (code), Libraries (API, code), Operating System (OS API, kernel), Hardware. Annotation: "More bottlenecks common for wider group of applications".]
Dynamic Performance Tuning
Approaches to tuning
• Cooperative
– Application must be prepared for tuning
– Application-specific knowledge is provided
• Automatic - black-box
– Tuning of any application
– No application-specific knowledge is required
– Knowledge about bottleneck is required
– No changes are introduced into the application source code
[Figure: layer stack, top to bottom: Application (code), Libraries (API, code), Operating System (OS API, kernel), Hardware. Scale alongside the stack, from "More cooperative, more application-specific" to "More automatic, more generic information available".]
Dynamic Performance Tuning
Knowledge representation
• Measure points
– Where the instrumentation must be inserted to provide measurements
• Performance model
– Determines minimal execution time of the entire application
• Tuning points/actions/synchronization
– What and when can be changed in the application
• point – element that may be changed
• action – what to invoke on a point
• synchronization – when a tuning action can be invoked to ensure application correctness
[Figure: performance model box labelled "Formulas and conditions for optimal behavior", with "measurements" as input and "optimal values" as output.]
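As an illustration only, the three knowledge elements above could be written down as plain C++ data structures. This is a sketch under stated assumptions: every type and field name below (`MeasurePoint`, `TuningPoint`, `TuningAction`, `TuningKnowledge`), the function name `distribute_work`, the variable name `theFactor`, and the placeholder model are hypothetical and are not taken from MATE's actual code.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical types: MATE's real classes are not shown in the slides.

// Place inside a function where instrumentation is inserted.
enum class InstrPlace { Entry, Exit };

// Measure point: where to insert instrumentation and what to record there.
struct MeasurePoint {
    std::string function;            // instrumented function
    InstrPlace place;                // entry or exit of that function
    std::vector<std::string> attrs;  // requested attributes, e.g. parameters
};

// Kinds of change a tuning action may apply to a point.
enum class TuningAction {
    ReplaceFunction, InsertFunctionCall, OneTimeFunctionCall,
    RemoveFunctionCall, ChangeFunctionParam, SetVariable
};

// Tuning point + action + synchronization.
struct TuningPoint {
    std::string element;  // point: function or variable that may be changed
    TuningAction action;  // action: what to invoke on the point
    std::string syncAt;   // synchronization: function whose entry is a safe moment
};

// Performance model: maps measurements to the value a tuning point should take.
using PerformanceModel = std::function<double(const std::vector<double>&)>;

struct TuningKnowledge {
    std::vector<MeasurePoint> measurePoints;
    PerformanceModel model;
    std::vector<TuningPoint> tuningPoints;
};

// Example instance with made-up names and a placeholder model (the mean).
inline TuningKnowledge exampleKnowledge() {
    TuningKnowledge k;
    k.measurePoints.push_back(
        {"distribute_work", InstrPlace::Entry, {std::string("tupleSize")}});
    k.model = [](const std::vector<double>& m) {
        double sum = 0.0;
        for (double v : m) sum += v;
        return m.empty() ? 0.0 : sum / static_cast<double>(m.size());
    };
    k.tuningPoints.push_back(
        {"theFactor", TuningAction::SetVariable, "distribute_work"});
    return k;
}
```

Keeping the model as a callable separates "what to measure" and "what to change" (pure data) from "how to decide" (code), which matches the split between measure points, performance model and tuning points described above.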
Dynamic Performance Tuning
Application knowledge
[Figure: layer stack, top to bottom: Application (code), Libraries (API, code), Operating System (OS API, kernel), Hardware. The stack is annotated with who supplies the measure points, the performance model and the tuning point/action/sync: "Provided by the user" or "Provided automatically by a tuning system".]
Dynamic Performance Tuning
Manipulation of a running application
• monitoring – collect information about the behavior of a running application
• tuning – insert code into a running application to improve its performance
Dynamic instrumentation – DynInst
Dynamic Performance Tuning
Dynamic modifications of a running application with DynInst
• Function replacement
• Function invocation
• One-time function invocation
• Function call elimination
• Function parameter changes
• Variable changes
MATE
MATE – Monitoring, Analysis and Tuning Environment
• prototype implementation in C++
• for PVM based applications
• Sun Solaris 2.x / SPARC
MATE
• Application Controller - AC
• Dynamic Monitoring Library - DMLib
• Analyzer
[Figure: MATE architecture. Machine 1 and Machine 2 run pvmd, ACs and the application tasks (Task1, Task2, Task3), each task with DMLib; the ACs apply instrumentation (instr.) and modifications (modif.) to the tasks; events flow to the Analyzer on Machine 3.]
MATE: Application Controller
Services
• Distributed application control
– Startup/exit of tasks (Tasker)
– Startup/exit of PVM daemons, slave ACs (Hoster)
– Clock synchronization
• Application model management (Task Manager)
• Performance monitoring (Monitors)
– Manage monitoring instrumentation
– Provide monitoring API for Analyzer
• Performance tuning (Tuners)
– Manage tuning instrumentation
– Provide tuning API for Analyzer
MATE: Application Controller
Monitors
• Instrumentation management via DynInst
– Dynamically load DMLib
– Generate monitoring snippets that call appropriate library functions
– Insert/remove snippets in/from requested points
• API
– AddEventTrace(tid, eventId, funcName, instrPlace, attrs)
– RemoveEventTrace(tid, eventId)
[Figure: on Machine 1, the Monitor inside the AC instruments Task1 and Task2 (each with DMLib) via DynInst; the Analyzer on Machine 2 sends add event/remove event requests to the Monitor.]
MATE: Application Controller
Tuners
• Tuning via DynInst
– Generate tuning snippet according to the request
– Insert tuning snippet
• API
– LoadLibrary(tid,path)
– SetVariableValue(tid,params,brkpt)
– ReplaceFunction(…)
– InsertFunctionCall(…)
– OneTimeFunctionCall(…)
– RemoveFunctionCall(…)
– FunctionParamChange(…)
[Figure: on Machine 1, the Tuner inside the AC tunes Task1 and Task2 via DynInst; the Analyzer on Machine 2 sends "apply tuning" requests to the Tuner.]
MATE: Dynamic Monitoring Library
Services
• Register event
– What – event type (id, place)
– When – global timestamp
– Where – task identifier
– Requested attributes – e.g. function call parameters, return value
• Deliver event to the Analyzer
• API
– DMLib_InitLogger(tid, analyzerHost, port, clockDiff)
– DMLib_OpenEvent(id, nAttrs)
– DMLib_AddIntAttr(value)
– DMLib_AddFloatAttr(value)
– DMLib_AddCharAttr(value)
– DMLib_AddStringAttr(value)
– DMLib_CloseEvent()
– DMLib_DoneLogger()
[Figure: on Machine 1, a snippet inserted at the entry of pvm_send(p1, p2) in Task1 calls DMLib_OpenEvent(), DMLib_AddIntAttr() twice and DMLib_CloseEvent(); the DMLib API implementation sends the event (values shown: 1, 0, 64884, 524247, 262149, 1) to the Analyzer via TCP/IP.]
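The open/add/close call sequence above can be sketched with an in-memory stand-in. This is not DMLib itself: `MockLogger`, `Event` and `onPvmSendEntry` are hypothetical names, the record layout is invented, events are stored in a vector instead of being sent over TCP/IP, and the assumption that the global timestamp equals the local time plus `clockDiff` is a guess at what the `clockDiff` parameter is for.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical event record: what, when, where, plus requested attributes.
struct Event {
    int id = 0;                          // what: event type
    std::int64_t timestamp = 0;          // when: assumed local time + clockDiff
    int tid = 0;                         // where: task identifier
    std::vector<std::int64_t> intAttrs;  // attributes (integers only in this sketch)
};

// In-memory stand-in for the logger; the real library delivers events
// to the Analyzer over TCP/IP.
class MockLogger {
public:
    MockLogger(int tid, std::int64_t clockDiff) : tid_(tid), clockDiff_(clockDiff) {}

    // Mirrors DMLib_OpenEvent(id, nAttrs): start a new event record.
    void openEvent(int id, int nAttrs, std::int64_t localTime) {
        current_ = Event{};
        current_.id = id;
        current_.timestamp = localTime + clockDiff_;
        current_.tid = tid_;
        expectedAttrs_ = nAttrs;
        open_ = true;
    }

    // Mirrors DMLib_AddIntAttr(value): append one attribute to the open event.
    void addIntAttr(std::int64_t value) {
        assert(open_);
        current_.intAttrs.push_back(value);
    }

    // Mirrors DMLib_CloseEvent(): deliver the record if it is complete.
    bool closeEvent() {
        if (!open_) return false;
        open_ = false;
        if (static_cast<int>(current_.intAttrs.size()) != expectedAttrs_) return false;
        delivered_.push_back(current_);
        return true;
    }

    const std::vector<Event>& delivered() const { return delivered_; }

private:
    int tid_;
    std::int64_t clockDiff_;
    Event current_;
    int expectedAttrs_ = 0;
    bool open_ = false;
    std::vector<Event> delivered_;
};

// What a snippet inserted at the entry of pvm_send(p1, p2) might do.
inline bool onPvmSendEntry(MockLogger& log, int p1, int p2, std::int64_t now) {
    log.openEvent(/*id=*/1, /*nAttrs=*/2, now);
    log.addIntAttr(p1);
    log.addIntAttr(p2);
    return log.closeEvent();
}
```

Declaring the attribute count up front lets the receiver detect an incomplete record, which is one plausible reason for the `nAttrs` parameter in `DMLib_OpenEvent`.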
MATE: Analyzer
Services
• Automatic performance analysis on the fly
– Request for events
– Collect incoming events
– Find bottlenecks among events by applying the performance model
– Find solutions that overcome bottlenecks
– Send tuning request
• The Analyzer is provided with application knowledge about performance problems
• We call the information related to one problem a tuning technique
• A tuning technique describes a complete performance optimization scenario
MATE: Analyzer
Tunlets
• Each technique is implemented in MATE as a tunlet
• A tunlet contains specific code (analysis logic) related to one concrete performance problem
– measure points – what events are needed
– performance model – how to determine bottlenecks and solutions
– tuning actions/points/synchronization - what to change, where, when
• A tunlet is a C/C++ library dynamically loaded into the Analyzer process
[Figure: a Tunlet inside the Analyzer, containing measure points, a performance model and tuning point/action/sync.]
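A tunlet's analysis logic could look roughly like the sketch below: it consumes events, applies a model, and produces a tuning request when a bottleneck is found. The interface (`Tunlet`, `TaskEvent`, `TuningRequest`), the variable name `theFactor` and the max/min imbalance rule are all hypothetical. The slides do not show MATE's tunlet API or its performance models, so this only illustrates the shape of the idea.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical measurement: how long a task was busy in its last iteration.
struct TaskEvent {
    int tid;
    double busySeconds;
};

// Hypothetical tuning request, e.g. "set variable X to value V".
struct TuningRequest {
    bool needed = false;
    std::string variable;
    double newValue = 0.0;
};

// Hypothetical tunlet interface: collect events, then analyze them.
class Tunlet {
public:
    virtual ~Tunlet() = default;
    virtual void onEvent(const TaskEvent& e) = 0;
    virtual TuningRequest analyze() const = 0;
};

// Toy tunlet: if the slowest task was busy more than `threshold` times longer
// than the fastest one, ask for `variable` to be set to `newValue`.
class ImbalanceTunlet : public Tunlet {
public:
    ImbalanceTunlet(std::string variable, double threshold, double newValue)
        : variable_(std::move(variable)), threshold_(threshold), newValue_(newValue) {
        assert(threshold_ >= 1.0);
    }

    // Keep only the most recent measurement per task.
    void onEvent(const TaskEvent& e) override { lastBusy_[e.tid] = e.busySeconds; }

    TuningRequest analyze() const override {
        TuningRequest req;
        if (lastBusy_.size() < 2) return req;  // need at least two tasks to compare
        double lo = lastBusy_.begin()->second;
        double hi = lo;
        for (const auto& kv : lastBusy_) {
            lo = std::min(lo, kv.second);
            hi = std::max(hi, kv.second);
        }
        if (lo > 0.0 && hi / lo > threshold_) {
            req.needed = true;
            req.variable = variable_;
            req.newValue = newValue_;
        }
        return req;
    }

private:
    std::string variable_;
    double threshold_;
    double newValue_;
    std::map<int, double> lastBusy_;
};
```

In this sketch the host process would call `onEvent` for each incoming event and `analyze` periodically, forwarding any request with `needed == true` to the tuner.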
MATE: Analyzer
[Figure: Analyzer internals. Components: Event Collector, Event Repository, Controller, Application model, DTAPI, AC Proxy, Tunlets (thread). Inputs: Events (from DMLibs) via TCP/IP, MetaData (from ACs) via TCP/IP. Outputs: Instrumentation request (to monitor) via TCP/IP, Tuning request (to tuner) via TCP/IP.]
Tuning Example
Workload balancing (App layer)
• Imbalance problem:
– Heterogeneous computing and communication powers
– Varying amount of distributed work
• Goal:
– minimize the idle time by balancing the work among the processes, taking the efficiency of the machines into account
• Balancing -> faster machines process more work than slower ones
• The work cannot be balanced statically before program execution (it depends on input data, network load, machine power and load)
Tuning Example
Workload balancing (App layer)
• Many scheduling methods -> Factoring Scheduling method
– Work is divided into different-size tuples according to the factor
• Application must be tunable:
– a well-known variable that represents the factor
– the factor must be checked before each iteration of the work distribution
– the work tuples are calculated using the factoring scheduling method and according to the current factor value
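The requirements above can be sketched as follows. The slides do not give the exact formula, so this is one common reading of factoring, stated as an assumption: in every round a fraction `factor` (between 0 and 1) of the remaining work is split into equal tuples, one per worker, so tuples shrink as the work runs out. The function names `nextRound` and `distributeAll` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// One distribution round: split a `factor` share of the remaining work into
// equal tuples, one per worker (assumed interpretation of factoring).
std::vector<std::size_t> nextRound(std::size_t remaining, std::size_t workers,
                                   double factor) {
    std::vector<std::size_t> tuples;
    if (remaining == 0 || workers == 0) return tuples;
    const double f = std::min(std::max(factor, 0.0), 1.0);  // clamp to [0, 1]
    std::size_t size = static_cast<std::size_t>(
        static_cast<double>(remaining) * f / static_cast<double>(workers));
    size = std::max<std::size_t>(size, 1);  // always make progress
    for (std::size_t w = 0; w < workers && remaining > 0; ++w) {
        const std::size_t t = std::min(size, remaining);
        tuples.push_back(t);
        remaining -= t;
    }
    return tuples;
}

// Distribute all work. `factor` is read through a reference before every
// round, so a change made to the variable from outside (for example by a
// tuner) takes effect in the next round.
std::vector<std::size_t> distributeAll(std::size_t total, std::size_t workers,
                                       const double& factor) {
    assert(workers > 0);
    std::vector<std::size_t> all;
    std::size_t remaining = total;
    while (remaining > 0) {
        for (std::size_t t : nextRound(remaining, workers, factor)) {
            all.push_back(t);
            remaining -= t;
        }
    }
    return all;
}
```

With 100 work units, 2 workers and a factor of 0.5, the first round hands out two tuples of 25, the next two of 12, and so on down to single units. A larger factor yields fewer, larger tuples (less distribution overhead); a smaller factor keeps more work in reserve for faster workers.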
Tuning Example
Example application
• Forest Fire propagation – Xfire
• High computation cost
[Figure: bar chart of application execution time [sec] for scenarios 1 to 3, "No tuning" vs. "Tuning"; bar labels: 3768, 3919, 1967, 1885, 1953, 2071.]
Scenarios:
1) homogeneous and dedicated
2) heterogeneous and dedicated
3) heterogeneous and non-dedicated
Benefits:
1) Up to 2%
2) Up to 49%
3) Up to 48%
Conclusions
• Principal conclusion: dynamic tuning works and is applicable, effective and useful under certain conditions
• Limits of such tuning -> incomplete application information
• Classification of layers where tuning can be performed (OS, libraries, apps)
• Approaches to tuning: automatic and cooperative
• Application knowledge representation:
– measure points, performance model, tuning point/action/sync
Conclusions
• Working prototype environment – MATE – that automatically monitors, analyses and tunes running applications
• Practical experiments conducted with MATE and parallel/distributed applications prove that it automatically adapts application behavior to existing conditions during run time!
Future work
• Global and local analysis
– Scalability (problems with global analysis)
– Some problems can be treated locally
• Performance analysis
– How tuning techniques influence other techniques
– Other approaches than performance model
• Metrics
– Complementary information provided by metrics
• Provision of the application knowledge
– Tunlet provided externally in a declarative manner
• Instrumentation evaluation
– Prediction of monitoring and tuning instrumentation cost
Future work
• Tuning techniques
– OS layer
• TCP/IP options (e.g. sending without delay – Nagle’s algorithm)
• I/O operations (e.g. read/write operations, I/O buffer size)
– Library layer
• Investigation of problems in MPI, numerical libraries
– Application layer
• Automatic selection of algorithm (e.g. sorting algorithm)
• Recommendations
– Provision of good explanation to the user
• Towards grid
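For the OS-layer TCP/IP option mentioned above (sending without delay), the underlying call on a POSIX system is `setsockopt` with `TCP_NODELAY`, which disables Nagle's algorithm on a socket. The sketch below shows only that call; the helper names `setNoDelay` and `getNoDelay` are hypothetical, and how a tuning tool would decide when to apply the option, or inject the call into a running process, is not covered by the slides.

```cpp
#include <cassert>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

// Disable (on = true) or re-enable (on = false) Nagle's algorithm on a TCP
// socket. Returns true on success.
bool setNoDelay(int fd, bool on) {
    assert(fd >= 0);
    int flag = on ? 1 : 0;
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &flag, sizeof(flag)) == 0;
}

// Read the current TCP_NODELAY setting into `on`. Returns true on success.
bool getNoDelay(int fd, bool& on) {
    assert(fd >= 0);
    int flag = 0;
    socklen_t len = sizeof(flag);
    if (getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &flag, &len) != 0) return false;
    on = (flag != 0);
    return true;
}
```

Disabling Nagle's algorithm typically helps applications that exchange many small messages and are latency-sensitive, at the cost of more packets on the wire; whether it pays off depends on the communication pattern, which is why it is a candidate for run-time tuning.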
Thesis
March, 2004
Thank you very much