A Hybrid Decomposition Scheme for Building Scientific Workflows
Wei Lu
Indiana University
Application Decomposition
• Large scientific applications require
– Decomposing the problem into manageable units
– Units need to be
• Self-described
• Self-encapsulated
• Independently developed and deployed
• Composable
• Two decomposition dimensions
– Functional Decomposition (a.k.a. Spatial Decomposition)
• C/C++, Java
• Component
– Temporal Decomposition
• Unix Pipe
• Workflow
– However, most PSEs (problem-solving environments) provide only one approach to the exclusion of the other
• Our work: a hybrid of both
Common Component Architecture (CCA)
• Scientific computing imposes special requirements
– Support for legacy software
– Performance is crucial
– Support for multiple languages and data types
• Fortran, C/C++, Python, Java, etc.
• Complex numbers and arrays (as first-class objects)
– Support for the various parallel run-time platforms
• CCA
– Component framework specification
– Designed for scientific high-performance computing
– Aims to improve scientific software reuse
CCA Component
• Each component describes
– What functionality it fulfills
• Provide port
– What functionality it needs to fulfill its task
• Use port
• Use-Provide pattern
– Plug-and-play
• The port is described in SIDL
– Scientific Interface Definition Language
– Partially derived from CORBA IDL
– With constructs to describe complex numbers, arrays, etc.
– Babel: language interoperability tool
[Figure: components MidpointIntegrator (C), NonlinearFunction (Fortran), and LinearFunction (Python), connected through IntegratorPort and FunctionPort]
Example of the CCA Composition
interface IntegratorPort extends gov.cca.Port
{
  double integrate(in double lowBound, in double upBound, in int count);
}
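The use-provide wiring behind this interface can be illustrated with a small Python sketch. It is not CCA or Babel code: the class names mirror the components named in the slides, while the `evaluate` method, the `connect` call, the coefficients of the linear function, and the midpoint-rule body are assumptions made for illustration.

```python
from typing import Optional, Protocol


class FunctionPort(Protocol):
    """Port provided by function components (the method name is an assumption)."""

    def evaluate(self, x: float) -> float: ...


class LinearFunction:
    """Provides FunctionPort: f(x) = 2x + 1 (coefficients chosen for illustration)."""

    def evaluate(self, x: float) -> float:
        return 2.0 * x + 1.0


class MidpointIntegrator:
    """Provides IntegratorPort and uses a FunctionPort."""

    def __init__(self) -> None:
        # The "use" port starts unwired until the framework connects it.
        self._function: Optional[FunctionPort] = None

    def connect(self, function: FunctionPort) -> None:
        # Stand-in for the framework wiring a use port to a provide port.
        self._function = function

    def integrate(self, low_bound: float, up_bound: float, count: int) -> float:
        if self._function is None:
            raise RuntimeError("FunctionPort is not connected")
        width = (up_bound - low_bound) / count
        total = 0.0
        for i in range(count):
            midpoint = low_bound + (i + 0.5) * width
            total += self._function.evaluate(midpoint)
        return total * width


integrator = MidpointIntegrator()
integrator.connect(LinearFunction())
result = integrator.integrate(0.0, 1.0, 100)
```

Swapping `LinearFunction` for another class with an `evaluate` method changes the computation without touching the integrator, which is the plug-and-play property the Use-Provide pattern is meant to give.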
Ccaffeine
• Parallel implementation of the CCA framework
• SCMD (Single Component Multiple Data)
– Inter-component communication
• Virtual function call in the same address space
– Intra-component communication
• Could be MPI, PVM, etc.
Kepler
• Scientific workflow environment
– Data-flow oriented
• Basic unit: Actor
– Input, Output
– Typed dataflow structure
– Lots of domain-specific actors supporting
• biology, ecology, astronomy
– General facility actors
• Grid service actor
• Web service actor
• Wire the actors by piping
[Figure: example actors GridFtp (port labels URL, Credential, localFilePath) and Classifier]
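The actor-and-pipe model can be illustrated with a short Python sketch. This is not Kepler code: the `actor` and `pipe` helpers and the two example actors are hypothetical, and they only show how single-function units are wired producer to consumer.

```python
from typing import Callable, Iterable, Iterator


def actor(function: Callable[[float], float]) -> Callable[[Iterable[float]], Iterator[float]]:
    """Wrap a single function as an actor that maps an input stream to an output stream."""

    def run(tokens: Iterable[float]) -> Iterator[float]:
        for token in tokens:
            yield function(token)

    return run


def pipe(source: Iterable[float], *actors):
    """Wire actors producer-to-consumer, in the order given."""
    stream = source
    for stage in actors:
        stream = stage(stream)
    return stream


# Two hypothetical actors: a unit conversion followed by a threshold classifier.
to_celsius = actor(lambda kelvin: kelvin - 273.15)
classify = actor(lambda celsius: 1.0 if celsius > 0.0 else 0.0)

labels = list(pipe([263.15, 283.15, 300.0], to_celsius, classify))
```

Each actor knows only the type of data on its ports, not which actor sits upstream or downstream, which is what keeps the composition loosely coupled.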
Compare Side by Side
• Workflow (Kepler)
– Actor: stands for one function
– Port: input/output; a data-structure definition
– Connection: producer to consumer
– Composition defines “How”
– Advantages: loosely coupled; supports distributed resource sharing
• Component (CCA)
– Component: stands for one class
– Port: provide/use; an interface signature
– Connection: caller to callee
– Composition defines “What”
– Advantages: good performance; supports the parallel programming model
A Hybrid solution
• Typical scientific applications
– Involve multiple distributed data-processing phases
– Among those phases there are a number of computationally intensive cores, which
• are often classical numerical algorithms
• need a high-performance execution environment
• The hybrid scheme
– First use the workflow scheme to decompose the application according to the distribution of resources
– Then use the component scheme to further decompose the computationally intensive subproblems into a parallel solution
• Benefits from both schemes
Service over Components
• Building web services over the CCA
– Web service = good interoperability
– Kepler supports web services as actors
– More resources and protocols (e.g., WS-BPEL)
• Façade pattern
– External view: the coarse-grained web service
– Internal functionality: the fine-grained components
• Factory pattern
– A workflow needs a task-specific service rather than a meta-level service
– The task-specific service should be created dynamically and on demand
– But a service is not instantiable!
[Figure: a service creates the task-specific service]
Architecture
• Job
– A specific task performed by a group of wired components
• Two-phase execution
– Compose the job
– Run the job
• Two explicitly separated web services (CCA-Services)
– Factory Service
– Job Proxy
[Figure: Composer (job description), User (invocation), Factory Service, Job Proxy, IPC, Ccaffeine framework]
Job Factory Service
• A Façade for the Ccaffeine framework
– Connects to the Ccaffeine muxer via a socket
– Maintains the job table and the job lifecycle
• Create: parameters
– Gateway port: the task-specific interface
– Composition description: how the components are wired to support the Gateway port
• Create: steps
– Convert the SIDL Gateway port definition to the equivalent WSDL
– Forward the composition commands to the Ccaffeine muxer, where they will be executed in parallel
– Maintain job records internally
– Create the Job Proxy service and return its WSDL URL
• Modify
– Change the composition without impacting the service interface
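The Create and Modify operations can be sketched as an in-memory Python stand-in. Everything here is hypothetical: the class and field names, the base URL, and the command strings are illustrative only (they are not verified Ccaffeine script syntax), and a plain list takes the place of the socket to the muxer.

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class JobRecord:
    gateway_port: str        # SIDL name of the task-specific interface
    composition: List[str]   # composition commands sent to the framework
    wsdl_url: str            # where the Job Proxy's WSDL can be fetched


@dataclass
class JobFactory:
    """In-memory stand-in for the Job Factory Service (no sockets, no SOAP)."""

    base_url: str = "http://localhost:8080/jobs"               # hypothetical endpoint
    sent_commands: List[str] = field(default_factory=list)     # stands in for the muxer socket
    jobs: Dict[int, JobRecord] = field(default_factory=dict)   # the job table
    _ids: itertools.count = field(default_factory=lambda: itertools.count(1))

    def create(self, gateway_port: str, composition: List[str]) -> str:
        job_id = next(self._ids)
        self.sent_commands.extend(composition)   # forward the composition commands
        wsdl_url = f"{self.base_url}/{job_id}?wsdl"
        self.jobs[job_id] = JobRecord(gateway_port, list(composition), wsdl_url)
        return wsdl_url                           # URL of the new Job Proxy's WSDL

    def modify(self, job_id: int, composition: List[str]) -> str:
        record = self.jobs[job_id]
        self.sent_commands.extend(composition)
        record.composition = list(composition)
        return record.wsdl_url                    # interface and URL stay unchanged


factory = JobFactory()
url = factory.create(
    "integrator.IntegratorPort",
    ["connect MidpointIntegrator FunctionPort LinearFunction FunctionPort"],
)
same_url = factory.modify(
    1, ["connect MidpointIntegrator FunctionPort NonlinearFunction FunctionPort"]
)
```

Because `modify` returns the same WSDL URL, a workflow that already holds the Job Proxy's address keeps working after the composition behind it changes.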
Job Proxy Service
• Façade for the wired components
• Has a task-specific WSDL interface
• On receiving a SOAP message
– Extract the arguments from the message
– Pass the arguments to Ccaffeine
– Invoke Ccaffeine
– Get the result from the Driver and send the SOAP response
[Figure: User, SOAP request, Job Proxy, arguments, Driver]
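The request-handling steps of the Job Proxy can be sketched with Python's standard `xml.etree.ElementTree`. This is a minimal sketch, not the actual service: the `handle_request` function, the `integrateResponse` element name, and the `driver` callable (standing in for the call into Ccaffeine) are assumptions, and the argument names are taken from the `integrate` method used elsewhere in the slides.

```python
import xml.etree.ElementTree as ET
from typing import Callable

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace


def handle_request(soap_request: str, driver: Callable[[float, float, int], float]) -> str:
    """Extract the arguments from a SOAP request, invoke the driver, wrap the result."""
    envelope = ET.fromstring(soap_request)
    body = envelope.find(f"{{{SOAP_NS}}}Body")
    if body is None or len(body) == 0:
        raise ValueError("SOAP Body with an operation element is required")
    operation = body[0]  # first child of Body is the operation element
    args = {child.tag: child.text for child in operation}
    result = driver(float(args["lowBound"]), float(args["upBound"]), int(args["count"]))

    response = ET.Element(f"{{{SOAP_NS}}}Envelope")
    response_body = ET.SubElement(response, f"{{{SOAP_NS}}}Body")
    reply = ET.SubElement(response_body, "integrateResponse")
    ET.SubElement(reply, "return").text = repr(result)
    return ET.tostring(response, encoding="unicode")


request = """<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
  <soapenv:Body>
    <integrate>
      <lowBound>0.0</lowBound>
      <upBound>1.0</upBound>
      <count>4</count>
    </integrate>
  </soapenv:Body>
</soapenv:Envelope>"""

# A toy driver; the real one would hand the arguments to the wired components.
response = handle_request(request, lambda low, up, count: (up - low) * count)
```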
Example
[Figure: job table, Composer, Gateway port, composition, Factory Service, socket, Go, User, SOAP, Job Proxy, job WSDL]
Convert SIDL to WSDL
• SIDL
– Describes a port interface (methods); object oriented
– Port interface: a virtual interface; supports inheritance and polymorphism; can be used as a function parameter type
– No data structure so far
• WSDL
– Describes a PortType (operations); a wire-format description
– PortType: a group of message exchanges; no inheritance, no polymorphism; cannot be used as a method parameter type
– Essentially any type is a data structure (by XML Schema)
• Introducing structure in SIDL will alleviate the problem reasonably
Challenge
No way to figure out the structural information from a SIDL port interface!
Current workaround:
Only allow methods with primitive argument types
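Under this workaround the conversion is mechanical, as the following Python sketch suggests for a method such as `double integrate(in double lowBound, in double upBound, in int count);`. It is not the authors' converter: the function name, the regular expression, and the `bool` and `string` entries of the type map are assumptions. Only the `double` and `int` mappings and the `Input`/`Output`/`_PortType` naming follow the WSDL shown in the slides.

```python
import re

# Primitive type mapping. Only double -> xsd:double and int -> xsd:integer
# appear in the slides; the remaining entries are assumptions.
XSD_TYPES = {
    "double": "xsd:double",
    "int": "xsd:integer",
    "bool": "xsd:boolean",
    "string": "xsd:string",
}

# Matches "<return type> <name>(<arguments>);"
METHOD = re.compile(r"(\w+)\s+(\w+)\s*\(([^)]*)\)\s*;")


def sidl_method_to_wsdl(port_name: str, sidl_method: str) -> str:
    """Translate one SIDL method with primitive 'in' arguments into WSDL 1.1 fragments."""
    match = METHOD.search(sidl_method)
    if match is None:
        raise ValueError("not a SIDL method declaration")
    return_type, name, arg_list = match.groups()
    if return_type not in XSD_TYPES:
        raise ValueError(f"unsupported return type: {return_type}")

    parts = []
    for arg in filter(None, (a.strip() for a in arg_list.split(","))):
        fields = arg.split()
        # Reject anything that is not "in <primitive type> <name>".
        if len(fields) != 3 or fields[0] != "in" or fields[1] not in XSD_TYPES:
            raise ValueError(f"unsupported argument: {arg}")
        parts.append(f'  <wsdl:part name="{fields[2]}" type="{XSD_TYPES[fields[1]]}"/>')

    lines = [
        f'<wsdl:message name="{name}Input">',
        *parts,
        "</wsdl:message>",
        f'<wsdl:message name="{name}Output">',
        f'  <wsdl:part name="return" type="{XSD_TYPES[return_type]}"/>',
        "</wsdl:message>",
        f'<wsdl:portType name="{port_name}_PortType">',
        f'  <wsdl:operation name="{name}">',
        f'    <wsdl:input message="{name}Input"/>',
        f'    <wsdl:output message="{name}Output"/>',
        "  </wsdl:operation>",
        "</wsdl:portType>",
    ]
    return "\n".join(lines)


wsdl = sidl_method_to_wsdl(
    "integrator.IntegratorPort",
    "double integrate(in double lowBound, in double upBound, in int count);",
)
```

A method that takes an array or another port as an argument raises `ValueError` here, which mirrors the restriction the workaround imposes.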
Example
interface IntegratorPort extends gov.cca.Port
{
  double integrate(in double lowBound, in double upBound, in int count);
}
<wsdl:message name="integrateInput">
  <wsdl:part name="lowBound" type="xsd:double"/>
  <wsdl:part name="upBound" type="xsd:double"/>
  <wsdl:part name="count" type="xsd:integer"/>
</wsdl:message>
<wsdl:message name="integrateOutput">
  <wsdl:part name="return" type="xsd:double"/>
</wsdl:message>
<wsdl:portType name="integrator.IntegratorPort_PortType">
  <wsdl:operation name="integrate">
    <wsdl:input message="integrateInput"/>
    <wsdl:output message="integrateOutput"/>
  </wsdl:operation>
</wsdl:portType>
Kepler Web Service Actor
• Kepler provides a general web service actor
• For a method defined in the WSDL
– The actor dynamically adjusts its input/output settings
Kepler CCA-Service Actor
• For CCA-Service
– Recall that we have two explicit steps
– The JobProxy service is created dynamically
– We need to hide the procedure of creating the JobProxy service from the user
• CCA-Service Actor
– Extended from the web service actor
– First calls the JobFactory service to create the JobProxy service
– With the WSDL of the JobProxy, it does the same thing as a general web service actor
• Change the GUI from socket-stream based to SOAP-message based
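The two-step behaviour of the CCA-Service actor can be sketched in Python as a subclass that creates the JobProxy first and then falls back on the general actor's logic. These classes are not Kepler's: `WebServiceActor`, `CCAServiceActor`, `configure_job`, and the two injected callables (a WSDL reader and a JobFactory client) are hypothetical stand-ins.

```python
from typing import Callable, Dict, List


class WebServiceActor:
    """Minimal stand-in for a general web service actor (not Kepler's real class)."""

    def __init__(self, fetch_operations: Callable[[str], Dict[str, List[str]]]) -> None:
        self._fetch_operations = fetch_operations  # hypothetical WSDL reader
        self.input_ports: List[str] = []

    def configure(self, wsdl_url: str, method: str) -> None:
        # Adjust the input ports to match the chosen WSDL operation.
        self.input_ports = list(self._fetch_operations(wsdl_url)[method])


class CCAServiceActor(WebServiceActor):
    """Hides the creation of the JobProxy service from the user."""

    def __init__(
        self,
        fetch_operations: Callable[[str], Dict[str, List[str]]],
        create_job: Callable[[str, List[str]], str],
    ) -> None:
        super().__init__(fetch_operations)
        self._create_job = create_job  # hypothetical JobFactory client

    def configure_job(self, gateway_port: str, composition: List[str], method: str) -> None:
        # Step 1: ask the JobFactory to create the JobProxy and return its WSDL URL.
        wsdl_url = self._create_job(gateway_port, composition)
        # Step 2: behave exactly like a general web service actor.
        self.configure(wsdl_url, method)


def fake_create_job(gateway_port: str, composition: List[str]) -> str:
    return "http://localhost:8080/jobs/1?wsdl"  # hypothetical URL


def fake_fetch_operations(wsdl_url: str) -> Dict[str, List[str]]:
    return {"integrate": ["lowBound", "upBound", "count"]}


cca_actor = CCAServiceActor(fake_fetch_operations, fake_create_job)
cca_actor.configure_job("integrator.IntegratorPort", [], "integrate")
```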
Conclusion
• A hybrid decomposition scheme for scientific applications
• The workflow scheme is used first, based on the resource distribution
• The component scheme is used to further decompose the core parts
• The web service interface is the key to the integration
• CCA integrates into Kepler as a special actor, with a GUI supporting a unified visual environment
• Converting SIDL to WSDL is inherently challenging. Structure is useful for distributed systems, so we need to introduce structure into SIDL
Thanks
• Thanks to the reviewers for their valuable comments