
CIPRes in Kepler:
An integrative workflow package for
streamlining phylogenetic data analyses
Zhijie Guan1, Alex Borchers1, Timothy McPhillips2,
Shirley Cohen3, Mark A. Miller1, Ilkay Altintas1
1San Diego Supercomputer Center, UCSD
2University of California, Davis
3University of Pennsylvania
biology.sdsc.edu
What is a Scientific Workflow?

Combination of data integration, analysis, and visualization steps into a larger, automated "scientific process"
Mission of scientific workflow systems
• Promote “scientific discovery” by providing tools and methods to generate scientific workflows
• Create an extensible and customizable graphical user interface for scientists from different scientific domains
• Support computational experiment creation, execution, sharing, reuse, and provenance
• Design frameworks that define efficient ways to connect to existing data and integrate heterogeneous data from multiple resources
• Make technology useful through the user’s monitor!
Promoter Identification Workflow
Source: Matt Coleman (LLNL)
A Workflow for Phylogeny Analysis
Kepler is a Scientific Workflow System
www.kepler-project.org



… and a cross-project collaboration
• June 2, 2006 Beta release
• Builds upon the open-source Ptolemy II framework
Ptolemy II: A software system used for prototyping engineering systems
KEPLER: A platform to design and execute Scientific Workflows
KEPLER = “Ptolemy II + X” for Scientific Workflows
Some Kepler Contributors
Ptolemy II
Griddles
SKIDL
Resurgence
SRB
NLADR
LOOKING
Other contributors:
- Cheshire (UK Text Mining Center)
- DART (Great Barrier Reef, Australia)
- National Digital Archives + UCSD-TV (US)
- …
Contributor names and funding info are at the Kepler website!
A co-development in KEPLER: GEON
Dataset Generation & Registration
% Makefile
$> ant run
SQL database access (JDBC)
Phylogeny Analysis Workflows
Local Disk
Multiple Sequence Alignment
Phylogeny Analysis
Tree Visualization
Kepler Workflow: Actors

Actor-Oriented Design
Actor
• Encapsulation of parameterized actions
• Interface defined by ports and parameters
Port
• Communication between input and output data
• The place where data gets in and out
Model of computation
• Flow of control
• Sequential / parallel execution
• Implementation is a framework
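The three ideas on this slide (actor, port, model of computation) can be illustrated with a small sketch. This is not Kepler's or Ptolemy II's API, which is written in Java; the names `Channel`, `Actor`, `connect`, and `SequentialDirector` are hypothetical and chosen only for illustration. Each actor wraps a parameterized action, ports are the places where tokens enter and leave, and a separate director object decides the flow of control.

```python
from collections import deque


class Channel:
    """Hypothetical FIFO link carrying tokens from an output port to an input port."""

    def __init__(self):
        self._queue = deque()

    def put(self, token):
        self._queue.append(token)

    def get(self):
        return self._queue.popleft()


class Actor:
    """Encapsulates a parameterized action; its interface is its ports and parameters."""

    def __init__(self, name, action, **parameters):
        self.name = name
        self.action = action
        self.parameters = parameters
        self.input = None   # input port (a Channel), or None for a source actor
        self.output = None  # output port (a Channel), or None if unconnected

    def fire(self):
        # Read one token from the input port (if any), apply the action,
        # and write the result to the output port.
        if self.input is None:
            result = self.action(**self.parameters)
        else:
            result = self.action(self.input.get(), **self.parameters)
        if self.output is not None:
            self.output.put(result)


def connect(upstream, downstream):
    """Join an upstream output port to a downstream input port with a channel."""
    channel = Channel()
    upstream.output = channel
    downstream.input = channel


class SequentialDirector:
    """Model of computation for this sketch: fire each actor once, in order."""

    def run(self, actors):
        for actor in actors:
            actor.fire()


# Three toy actors: a source, an upper-casing step, and a repeating step.
source = Actor("source", lambda text: text, text="acgt")
upper = Actor("upper", lambda token: token.upper())
repeat = Actor("repeat", lambda token, times: token * times, times=2)

connect(source, upper)
connect(upper, repeat)
sink = Channel()
repeat.output = sink

SequentialDirector().run([source, upper, repeat])
result = sink.get()
print(result)
```

Keeping the director separate from the actors is the point of the design: the same actors could in principle be run under a different scheduling policy (for example, a parallel one) without changing their code.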
CIPRes Workflow: Actors
Input Port:
• Nexus File Content
Output Ports:
• Data Matrix
• Tree
• Taxa Info
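To make the one-input, three-output shape of such an actor concrete, here is a minimal sketch that splits NEXUS text into a data matrix, trees, and taxa labels. It is an assumption-laden toy, not the CIPRes actor: the function name `split_nexus` is hypothetical, and the regular expressions handle only a simple, non-interleaved file with one row per taxon and no comments or quoted labels.

```python
import re

SAMPLE_NEXUS = """#NEXUS
BEGIN TAXA;
  DIMENSIONS NTAX=3;
  TAXLABELS A B C;
END;
BEGIN CHARACTERS;
  DIMENSIONS NCHAR=4;
  FORMAT DATATYPE=DNA;
  MATRIX
    A ACGT
    B ACGA
    C ACTT
  ;
END;
BEGIN TREES;
  TREE t1 = ((A,B),C);
END;
"""


def split_nexus(content):
    """Return (data_matrix, trees, taxa) extracted from simple NEXUS text."""
    flags = re.IGNORECASE | re.DOTALL

    # Taxa info: the labels listed after TAXLABELS, up to the next semicolon.
    taxa_match = re.search(r"\bTAXLABELS\s+(.*?);", content, flags)
    taxa = taxa_match.group(1).split() if taxa_match else []

    # Data matrix: "name sequence" rows between MATRIX and the next semicolon.
    matrix = {}
    matrix_match = re.search(r"\bMATRIX\s+(.*?);", content, flags)
    if matrix_match:
        for row in matrix_match.group(1).strip().splitlines():
            parts = row.split()
            if len(parts) == 2:
                matrix[parts[0]] = parts[1]

    # Trees: "TREE name = newick;" statements, stored as name -> Newick string.
    trees = {
        name: newick.strip() + ";"
        for name, newick in re.findall(r"\bTREE\s+(\w+)\s*=\s*(.*?);", content, flags)
    }
    return matrix, trees, taxa


matrix, trees, taxa = split_nexus(SAMPLE_NEXUS)
print(taxa)
print(matrix)
print(trees)
```

Each of the three return values corresponds to one output port, so downstream steps can consume only the part of the file they need.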
Some actors in place for…
• Generic Web Service Client and Web Service Harvester
• Customizable RDBMS query and update
• Command Line wrapper tools (local, ssh, scp, ftp, etc.)
• Some Grid actors: Globus Job Runner, GridFTP-based file access, Proxy Certificate Generator
• SRB support
• Native R and Matlab support
• Interaction with Nimrod and APST
• Communication with ORBs through actors and services
• Imaging, Gridding, Vis Support
• Textual and Graphical Output
• …more generic and domain-oriented actors…
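As an illustration of the command-line wrapper idea in the list above, the following sketch runs an external program, feeds it the input token on standard input, and returns its standard output as the output token. The function name `run_command_actor` is hypothetical and this is not Kepler's implementation. To keep the example runnable anywhere, the external program is a stand-in (the Python interpreter itself upper-casing its input) rather than a real analysis tool.

```python
import subprocess
import sys


def run_command_actor(argv, stdin_text=""):
    """Run an external program and return its standard output as a string.

    check=True raises CalledProcessError if the program exits with a
    non-zero status, so a failed step stops the workflow instead of
    silently passing bad data downstream.
    """
    completed = subprocess.run(
        argv,
        input=stdin_text,
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout


# Stand-in "external tool": a child Python process that upper-cases stdin.
argv = [sys.executable, "-c", "import sys; print(sys.stdin.read().upper(), end='')"]
output = run_command_actor(argv, "acgt\n")
print(output)
```

Passing the command as a list of arguments, rather than a single shell string, avoids shell quoting problems when file names contain spaces.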
CIPRes Workflow
Actor:
GUIGen: Parameter Setting
Choose the input file
Run ClustalW
Channel: Convey the data
Get the subset of the aligned sequences
Read the tree
Run PAUP for Tree Inference
Parse the tree
Display the tree
Results:
CIPRes Workflows: Demo


Read Sequences → Multiple Sequence Alignment → Display the Alignment
Matrix Alignment → Tree Inference → Consensus Tree → Tree Visualization
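The first demo chain can be sketched as a sequence of composed steps. Everything here is a stand-in under stated assumptions: `run_workflow` and the step functions are hypothetical names, and `pad_alignment` only pads sequences with gaps to equal length. It is not a real multiple sequence alignment and does not call ClustalW. The sketch only shows how tokens pass from step to step and how a simple record of the executed steps can be kept.

```python
def run_workflow(steps, token):
    """Pass a token through each (name, function) step in order.

    Returns the final token and the list of step names that ran,
    a very small stand-in for execution provenance.
    """
    provenance = []
    for name, step in steps:
        token = step(token)
        provenance.append(name)
    return token, provenance


def read_sequences(text):
    # Parse "name sequence" lines into a dictionary.
    return dict(line.split() for line in text.strip().splitlines())


def pad_alignment(sequences):
    # Stand-in for alignment: right-pad with gaps so all rows have equal length.
    width = max(len(seq) for seq in sequences.values())
    return {name: seq.ljust(width, "-") for name, seq in sequences.items()}


def display_alignment(alignment):
    # Render the alignment as one "name sequence" line per taxon.
    return "\n".join(f"{name} {seq}" for name, seq in alignment.items())


steps = [
    ("Read Sequences", read_sequences),
    ("Multiple Sequence Alignment", pad_alignment),
    ("Display the Alignment", display_alignment),
]
shown, provenance = run_workflow(steps, "A ACGT\nB AC\n")
print(shown)
print(provenance)
```

In a real workflow each stand-in function would be replaced by an actor wrapping the corresponding tool, while the chaining logic stays the same.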
Summary

Kepler is good at:
• Integrating data, programs, and computing resources
• Capturing your ideas and realizing them
• Supporting computational experiment creation, execution, sharing, and reuse
• Quickly prototyping scientific workflows
• Building streamlined applications
Visual programming language
• Don’t write your application, “draw”/compose it
The Cipres-Kepler package can be used to build scientific workflows for phylogenetic data analyses
Future Work


Cipres-Kepler can help you
There is (always) a lot more to work on:
• More actors for phylogeny analyses
• Automatically generating actors based on CORBA services
• Database (TreeBase) support to store large amounts of data
• More computing power for large dataset processing
Need your collaboration:
• Sharing experiences
• Teaching each other the domain knowledge
• Locating a specific problem and solving it
Questions?
Zhijie Guan
[email protected]
1-858-822-3620
www.sdsc.edu
Cipres-Kepler Release:
ftp://ftp.sdsc.edu/outgoing/borchers/cipresReleases/20060621/cipresKepler_Dist.tgz