Kepler
Scientific workflow management system
based on Ptolemy II
Allows scientists to visually design and
execute scientific workflows
Actor-oriented model with directors
acting as the main workflow engine
Enables different models of computation
Models the flow of data from one step to
another in a series of computations to
achieve some scientific goal
Ptolemy II: software system for modeling, simulation, and
design of concurrent, real-time, embedded
systems, developed at UC Berkeley
Objective:
“The focus is on assembly of concurrent
components. The key underlying principle in the
project is the use of well-defined models of
computation that govern the interaction between
components. A major problem area being
addressed is the use of heterogeneous mixtures
of models of computation.”
[Figure: structure of a workflow, showing directors, actors, ports, relations, links, connections, and attributes]
Directors control execution of a workflow
(scheduling, dispatching threads, etc.)
Actors are executable components of a workflow
Directors govern execution of actors
Actor-/Dataflow Orientation vs. Object-/Control flow Orientation
Every Kepler workflow needs a director
Execute networks of components
under multiple execution models
› Synchronous vs. Parallel vs. Dataflow vs.
time-based vs. event-based vs. all
combined
Computation model dictates
semantics for component interaction
Make use of separation of concerns
› e.g., component execution, workflow
execution and provenance tracking
Managers act like a “common execution
environment”
› governing different concerns related to
execution of network and services
CT – continuous time modeling
DE – discrete event systems
FSM – finite state machines
PN – process networks
SDF – synchronous dataflow
DDF – dynamic dataflow
SR – synchronous/reactive systems
Reusable components that execute
a variety of functions
Communicate with other actors in
workflow through ports
Composite actor – aggregation of actors
Composite actor may have a local
director
Top level workflows can be conceptual
representation of science process
Drilling down reveals increasing levels of
detail
Composing models using hierarchy
promotes development of re-usable
components
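The idea of a composite actor can be sketched in plain Python. This is an illustration only, not Kepler or Ptolemy II code: `Scale`, `Offset`, and `Composite` are hypothetical names, and the loop inside `Composite.fire` stands in for a very simple local director that runs the inner actors in sequence.

```python
class Scale:
    """Hypothetical atomic actor: multiplies its input by a factor."""

    def __init__(self, factor):
        self.factor = factor

    def fire(self, token):
        return token * self.factor


class Offset:
    """Hypothetical atomic actor: adds a constant to its input."""

    def __init__(self, amount):
        self.amount = amount

    def fire(self, token):
        return token + self.amount


class Composite:
    """Hypothetical composite actor: an aggregation of inner actors.

    From the outside it looks like a single actor with one fire() method.
    """

    def __init__(self, *actors):
        self.actors = actors

    def fire(self, token):
        # Stand-in for a local director: run inner actors one after another.
        for actor in self.actors:
            token = actor.fire(token)
        return token


# A reusable sub-workflow, then a top-level workflow that nests it.
scale_then_offset = Composite(Scale(2), Offset(3))
top_level = Composite(scale_then_offset, Scale(10))

print(scale_then_offset.fire(5))
print(top_level.fire(5))
```

Because `Composite` exposes the same `fire` method as the atomic actors, it can be nested to any depth, which is what makes the sub-workflow reusable.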
Each actor implements several methods
› initialize() – initializes state variables
› prefire() – indicates if actor wants to fire
› fire() – main point of execution
Read inputs, produce outputs, read
parameter values
› postfire() – update persistent state, see if
execution complete
› wrapup()
Each director calls these methods
according to its model
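The lifecycle above can be sketched in plain Python. This is a minimal illustration, not Kepler or Ptolemy II code: the `Ramp` actor and the `run_director` function are hypothetical names, and the "director" here is just a loop that calls the methods in the order listed.

```python
class Ramp:
    """Hypothetical actor that emits 0, 1, 2, ... until a limit is reached."""

    def __init__(self, limit):
        self.limit = limit   # parameter value
        self.count = None    # state variable, set in initialize()
        self.output = []     # stand-in for an output port

    def initialize(self):
        self.count = 0       # initialize state variables

    def prefire(self):
        # Indicates whether the actor wants to fire.
        return self.count < self.limit

    def fire(self):
        # Main point of execution: produce an output token.
        self.output.append(self.count)

    def postfire(self):
        # Update persistent state; False means execution is complete.
        self.count += 1
        return self.count < self.limit

    def wrapup(self):
        self.count = None    # clean up at the end of execution


def run_director(actor):
    """Hypothetical single-actor director: drives the lifecycle in order."""
    actor.initialize()
    while actor.prefire():
        actor.fire()
        if not actor.postfire():
            break
    actor.wrapup()
    return actor.output


tokens = run_director(Ramp(limit=3))
print(tokens)
```

A real director decides when and how often to call these methods according to its model of computation; the fixed loop here is only one possible ordering.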
Copy actor – copy files from one resource to another
during execution
› Stage actor – local to remote host
› Fetch actor – remote to local host
Job execution actor – submit and run a remote job
Monitoring actor – notify user of failures
Service discovery actor – import web services from a
service repository or web site
RExpression actors
MatlabExpression actors
Web services actors – Given WSDL and name of an
operation of a web service, dynamically customizes itself to
implement and execute that method
Database connection and query actors
Ports used to produce and consume
data and communicate with other
actors in workflow
› Input port – data consumed by actor
› Output port – data produced by actor
› Input/output port – data both produced and
consumed
Direct the same input or output to more than
one port
Example: direct output to
1. display actor to show intermediate results,
and
2. operational actor for further processing
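The fan-out in this example can be sketched in plain Python. The `OutputPort` class and its `connect` and `send` methods are hypothetical names chosen for illustration, not Kepler's port API; two lists stand in for the display actor and the operational actor.

```python
class OutputPort:
    """Hypothetical output port that can be linked to several receivers."""

    def __init__(self):
        self.receivers = []

    def connect(self, receiver):
        # A receiver is any callable that accepts one token.
        self.receivers.append(receiver)

    def send(self, token):
        # Every connected receiver gets the same token.
        for receiver in self.receivers:
            receiver(token)


displayed = []   # stands in for a display actor showing intermediate results
squared = []     # stands in for an operational actor doing further processing

port = OutputPort()
port.connect(displayed.append)
port.connect(lambda token: squared.append(token * token))

for token in [1, 2, 3]:
    port.send(token)

print(displayed)
print(squared)
```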
Execution Options:
› inside GUI
› at command-line
› distributed computing
Kepler components can be shared by
exporting workflow or component into a
Kepler Archive (KAR) file (extension of JAR file
format)
Component Repository is centralized
system for sharing Kepler workflows
Users can search the repository for
components from within Vergil
Kepler provides direct access to
scientific data archived in many of
the commonly used data archives.
› Ex. access to data stored in Knowledge
Network for Biocomplexity (KNB) Metacat
server and described using Ecological
Metadata Language.
Additional supported data sources
› DiGIR protocol, OPeNDAP protocol, GridFTP,
JDBC, SRB, and others.
Kepler ships by default with:
› Globus actors
› GridFTP actors
No BES implementation*
Job submission to OpenPBS, gLite
Kepler actors capable of using UNICORE by
EUFORIA (Poznań SC)
TeraGrid gateways exist that use Kepler
Actor Data Polymorphism:
› Add numbers (int, float, double, complex)
› Add strings (concatenation)
› Add complex types (arrays, records,
matrices)
› Add user-defined types
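Data polymorphism for an Add actor can be sketched in plain Python. This is an illustration under stated assumptions, not Kepler's type system: the `add` function and the `Vec2` type are hypothetical, arrays are assumed to add element-wise, and records are assumed to add field by field over their shared fields.

```python
from dataclasses import dataclass


@dataclass
class Vec2:
    """Hypothetical user-defined type that defines its own addition."""
    x: float
    y: float

    def __add__(self, other):
        return Vec2(self.x + other.x, self.y + other.y)


def add(a, b):
    """Hypothetical polymorphic Add: behavior depends on the token types."""
    if isinstance(a, dict) and isinstance(b, dict):
        # Records: add field by field over the shared keys (assumption).
        return {key: add(a[key], b[key]) for key in a.keys() & b.keys()}
    if isinstance(a, list) and isinstance(b, list):
        # Arrays: element-wise addition (assumption).
        if len(a) != len(b):
            raise ValueError("arrays must have the same length")
        return [add(x, y) for x, y in zip(a, b)]
    # Numbers, strings, and user-defined types use their own "+" operator.
    return a + b


print(add(1, 2.5))
print(add(1 + 2j, 3))
print(add("ab", "cd"))
print(add([1, 2], [3, 4]))
print(add({"x": 1, "y": "a"}, {"x": 2, "y": "b"}))
print(add(Vec2(1, 2), Vec2(3, 4)))
```

The same actor works on every type above without modification, which is the point of the slide: one component, many data types.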
Distributed execution of workflow parts (peer to peer)
Efficient data transfer
Provenance tracking of data and processes
Tracking workflow evolution
Streaming data analysis
Easy-to-deploy batch interfaces
Intuitive workflow design
Customizable semantic typing
Interoperability with other workflow and analytical
environments (at exec level)
Ecology
› SEEK: Ecological Niche Modeling and climate change
› REAP: Modeling parasite invasions in grasslands using sensor networks
› NEON: Ecological sensor networks; COMET: Environmental science
Geosciences
› GEON: LiDAR data processing, Geological data integration
› NEESit: Earthquake engineering
Conservation biology
› SanParks: Thresholds of Potential Concerns
Library science
› DIGARCH: Digital preservation; UK Text Mining Center: Cheshire feature and archival
Chemistry
› Resurgence: Computational chemistry; DART/ARCHER: X-Ray crystallography
Oceanography
› REAP: SST data processing; LOOKING/OOI CI: ocean observing CI
› ROADNet: real-time data modeling and analysis
Molecular biology
› ATOL: Processing Phylodata; CiPRES: Phylogenetic tools
› SDM: Gene promoter identification and ScalaBLAST
› ChIP-chip: Genome-scale research; CAMERA: Metagenomics
Physics
› SDM: astrophysics TSI-1 and TSI-2; CPES: Plasma fusion simulation; ITER-EU: ITM fusion workflows