Performance Validation - Extreme Scale Research


Workflow Validation
Kerstin Kleese van Dam
Michela Taufer
Discussion Topics
• Identify State of the Art, Gaps, and Future Research Topics for:
• Validating Performance
• Validation and Reproducibility
• Validating Accuracy
Performance Validation (1)
State of the Art:
•No dedicated workflow tools exist, only one-off studies. Single-application performance tools are available, but not necessarily applicable to workflows as-is.
•Cloud computing has a different vision of performance, i.e., latency-driven: deliver a sufficiently accurate service within a certain time or amount of resources.
Gaps:
•Tools to model and predict workflow performance, and tools to monitor performance to validate and improve future predictions and models (importance of factors)
•Information flow between system, application, and workflow
•How to express and achieve performance goals - for a facility versus for a specific workflow
•Do we know what metrics we need to capture for different performance goals? Who is capturing them, and at which granularity?
Performance Validation (2)
Future Research Topics:
• Tools to monitor, model and predict performance
• Performance information capture at runtime - workflow and system, capturing information about events at different levels as provenance (see the sketch after this list)
• In-situ analysis of performance data
• Intelligent schedulers at different levels to optimize performance
according to set goals
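
A minimal sketch of what runtime performance capture as provenance might look like, assuming a Python workflow layer; the names PerfEvent and capture are hypothetical illustrations, not an existing workflow system's API:

```python
import json
import time
from contextlib import contextmanager
from dataclasses import dataclass, asdict, field

@dataclass
class PerfEvent:
    """One provenance-style record for a workflow task execution."""
    task: str
    level: str          # e.g. "workflow", "task", "system"
    start: float = 0.0
    end: float = 0.0
    attrs: dict = field(default_factory=dict)

@contextmanager
def capture(task: str, level: str = "task", **attrs):
    """Record wall-clock start/end of a task and emit a JSON record."""
    event = PerfEvent(task=task, level=level, attrs=attrs)
    event.start = time.time()
    try:
        yield event
    finally:
        event.end = time.time()
        # A real system would stream this to an in-situ analysis service;
        # here the record is simply printed.
        print(json.dumps(asdict(event)))

with capture("domain_decomposition", nodes=64):
    time.sleep(0.1)  # stand-in for real work
```

Records like these could feed the in-situ analysis and intelligent-scheduling topics above.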
Reproducibility and Validation (1)
State of the Art:
•The definition of “reproducibility” depends on the community and the individual scientist; it ranges from stringent bit-for-bit reproducibility to reproducibility of the science with different methods.
•There is some disagreement on who is responsible for reproducibility: the application, the workflow, or the system? In the end, in complex applications the workflow plays a major role in domain decomposition.
Gaps:
•The role of the workflow in tracking aspects related to reproducibility is unclear.
•Hidden information that, if revealed, can be overwhelming. How can you track everything at exascale, especially when unexpected events occur? What is the responsibility of the workflow system, and what does it need to track?
•Provenance should be viewed from the point of view of the consumer of the information.
•The level of resolution at which to keep provenance needs to be defined.
•The lifespan of validation/provenance data needs to be defined (see the sketch after this list).
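
One way to read the last two gaps is as a retention policy over provenance records. A minimal sketch, assuming records tagged with a resolution level and a timestamp; the prune function and its parameters are hypothetical:

```python
import time

def prune(records, keep_levels=frozenset({"workflow", "task"}),
          max_age_s=86400.0):
    """Drop records that are too fine-grained or past their lifespan."""
    now = time.time()
    return [r for r in records
            if r["level"] in keep_levels and now - r["time"] <= max_age_s]

records = [
    {"level": "workflow", "time": time.time(), "note": "run started"},
    {"level": "instruction", "time": time.time(), "note": "too fine-grained"},
    {"level": "task", "time": time.time() - 2 * 86400, "note": "past lifespan"},
]
print(prune(records))  # only the "workflow" record survives
```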
Reproducibility and Validation (2)
Future Research Topics:
• Integration of application, programming models, and run-time systems to
pursue reproducibility
• Pursue reproducibility of results to identify incorrect science. When
publishing results, we need to provide provenance, so that others can
reproduce the results
• Automatic annotation embedded in multi-layer or modular workflows (a sketch follows)
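
A minimal sketch of automatic annotation for modular workflows, assuming tasks are Python functions: a decorator records each call's layer, inputs, output, and timestamp without the task author doing anything. The names annotated and PROVENANCE_LOG are hypothetical:

```python
import functools
import time

PROVENANCE_LOG = []

def annotated(layer: str):
    """Attach an automatic provenance annotation to a workflow task."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            record = {
                "layer": layer,
                "task": fn.__name__,
                "inputs": {"args": args, "kwargs": kwargs},
                "time": time.time(),
            }
            result = fn(*args, **kwargs)
            record["output_repr"] = repr(result)
            PROVENANCE_LOG.append(record)
            return result
        return wrapper
    return decorator

@annotated(layer="analysis")
def mean(values):
    return sum(values) / len(values)

mean([1.0, 2.0, 3.0])
print(PROVENANCE_LOG)  # the annotation was captured automatically
```

Publishing a log like PROVENANCE_LOG alongside results is one way to support the "provide provenance when publishing" point above.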
Validating Accuracy (1)
State of the Art:
•Who is responsible for which part of accuracy? The application is responsible for guaranteeing its own accuracy.
•No test suite exists to check workflow and system accuracy regularly.
•No one is watching the correctness of the workflow manager.
•Establish the accuracy of components a priori; the workflow system can then monitor specific variables to see whether the application/workflow is progressing to plan and present this to the user (see the sketch after this list). First research exists on using data mining and machine learning to identify deviations and correct for them - the challenge is that such models are black boxes, so how do we convey their decisions to the user, and what are the trade-offs between workflow completion and observed accuracy?
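
A minimal sketch of monitoring a variable against a plan, assuming the workflow system can observe a scalar progress indicator (e.g., a solver residual) at each step; the function name, thresholds, and data are illustrative only:

```python
def check_progress(step: int, observed: float, expected: float,
                   tolerance: float = 0.25) -> bool:
    """Return True if the observed value is within tolerance of the plan."""
    deviation = abs(observed - expected) / max(abs(expected), 1e-12)
    if deviation > tolerance:
        # Flag the deviation to the user rather than silently correcting it.
        print(f"step {step}: deviation {deviation:.0%} exceeds "
              f"{tolerance:.0%}; alerting user")
        return False
    return True

plan = [1.0 / 2**k for k in range(5)]      # expected: residual halves each step
observed = [1.0, 0.52, 0.26, 0.40, 0.07]   # step 3 deviates from the plan
for k, (obs, exp) in enumerate(zip(observed, plan)):
    check_progress(k, obs, exp)
```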
Gaps:
•Link to reproducibility - can we reproduce results to validate accuracy?
•Can we use accuracy measures to determine what data to keep or throw away? Can accuracy metrics trigger the capture of more system information to inform later debugging?
•Investigate the human-in-the-loop role in determining accuracy (what deviation is acceptable) and deciding on suitable actions.
Validating Accuracy (2)
Future Research Topics:
• We cannot always get exact numbers, but the numbers need to be within a certain range (see the sketch below)
• For a given problem, can you capture what the best settings are?
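
A minimal sketch of range-based validation: rather than demanding bit-for-bit equality, accept results that agree with a reference to a stated tolerance. The reference value and tolerance are illustrative:

```python
import math

def within_range(result: float, reference: float,
                 rel_tol: float = 1e-6) -> bool:
    """Accept a result that agrees with the reference to rel_tol."""
    return math.isclose(result, reference, rel_tol=rel_tol)

print(within_range(3.14159265, 3.14159266))  # True: inside tolerance
print(within_range(3.14159265, 3.2))         # False: outside tolerance
```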