Java API - seqware


Design Principles
Separation of components into a modular system

Independent standalone modules that are also runnable programs
– Collaborator wants to run srf2FastQ at home, without a MetaDB
– Researcher tries custom parameters, but still tracks his run in the MetaDB
XML workflows that define jobs and data dependencies
– Parameterized to reuse workflows on different
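Parameterizing a workflow can be pictured as filling named placeholders in a template. The slides name Java Freemarker for this step; the sketch below deliberately uses plain java.util.regex substitution instead, so that it runs without extra libraries. It assumes the %{name} placeholder style that appears in the workflow XML later in these slides. The class WorkflowTemplateSketch and the method fill are hypothetical names, not part of SeqWare.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WorkflowTemplateSketch {

    // Matches placeholders such as %{experiment} (assumed syntax).
    private static final Pattern PLACEHOLDER = Pattern.compile("%\\{(\\w+)\\}");

    // Replace every %{name} with its value from params.
    // Placeholders with no matching parameter are left untouched.
    static String fill(String template, Map<String, String> params) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String value = params.getOrDefault(m.group(1), m.group());
            // quoteReplacement keeps '$' and '\' in values literal.
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String template = "<uses file=\"%{experiment}.fastq\" link=\"input\"/>";
        // The same template is reused for two hypothetical experiments.
        for (String experiment : new String[] {"exp01", "exp02"}) {
            System.out.println(fill(template, Map.of("experiment", experiment)));
        }
    }
}
```

Reusing one template with a different parameter map per experiment is the point of the design: the workflow structure stays fixed while only the inputs change.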
Application Wrapper Interface

Application conforms to a standard interface

Developers and users do not have to understand the rest of the pipeline

Forces users to adhere to best practices
– Syntax, --help option
– Required test harness
– Verification of input, output, parameters
Wrapped applications must be runnable both
Java API:
public interface WrapperInterface {
    int init();                  // Optional
    int get_syntax();            // Reported by --help
    int do_test();               // Required test harness
    int do_verify_input();
    int do_verify_parameters();
    int do_run();
    int do_verify_output();
    int clean_up();              // Optional
}
Local Execution:
$ java SeqWareRunner bpostprocess --help
→ Reports get_syntax()
$ java SeqWareRunner bpostprocess input
→ Run bpostprocess on the command line
$ java SeqWareRunner bpostprocess --db input
→ Same as above, but with MetaDB feedback
$ java SeqWareRunner bpostprocess --db input --config=config.txt
$ java SeqWareRunner bpostprocess --db input -A 0 -n 8
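The interface and the runner invocations above suggest a lifecycle in which a runner calls the wrapper's steps in order. The following is a minimal sketch of that idea, not SeqWareRunner's actual implementation. It assumes that a return code of 0 means success, that the steps run in declaration order, that the runner stops at the first failing step, and that clean_up is always called; none of this is confirmed by the slides. EchoWrapper and runLifecycle are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntSupplier;

class WrapperLifecycleSketch {

    // Same method set as the WrapperInterface shown on the slide.
    interface WrapperInterface {
        int init();
        int get_syntax();
        int do_test();
        int do_verify_input();
        int do_verify_parameters();
        int do_run();
        int do_verify_output();
        int clean_up();
    }

    // Hypothetical wrapper: records each call instead of launching a real tool.
    static class EchoWrapper implements WrapperInterface {
        final List<String> calls = new ArrayList<>();
        private final String input;

        EchoWrapper(String input) { this.input = input; }

        private int record(String step, boolean ok) {
            calls.add(step);
            return ok ? 0 : 1; // assumption: 0 = success, nonzero = failure
        }

        public int init() { return record("init", true); }
        public int get_syntax() { return record("get_syntax", true); }
        public int do_test() { return record("do_test", true); }
        public int do_verify_input() {
            return record("do_verify_input", input != null && !input.isEmpty());
        }
        public int do_verify_parameters() { return record("do_verify_parameters", true); }
        public int do_run() { return record("do_run", true); }
        public int do_verify_output() { return record("do_verify_output", true); }
        public int clean_up() { return record("clean_up", true); }
    }

    // Assumed runner logic: run the steps in order, stop at the first
    // nonzero status, and call clean_up regardless of the outcome.
    static int runLifecycle(WrapperInterface w) {
        IntSupplier[] steps = {
            w::init, w::do_verify_input, w::do_verify_parameters,
            w::do_run, w::do_verify_output
        };
        int status = 0;
        for (IntSupplier step : steps) {
            status = step.getAsInt();
            if (status != 0) break;
        }
        int cleanup = w.clean_up();
        return status != 0 ? status : cleanup;
    }

    public static void main(String[] args) {
        EchoWrapper ok = new EchoWrapper("input");
        System.out.println(runLifecycle(ok) + " " + ok.calls);

        EchoWrapper bad = new EchoWrapper(""); // empty input fails verification
        System.out.println(runLifecycle(bad) + " " + bad.calls);
    }
}
```

Because the runner only sees the interface, the wrapped tool can be swapped without changing the runner, which is what lets users wrap an application without understanding the rest of the pipeline.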
XML Workflow

Follows DAX Standard, which is input to Pegasus

Defines jobs, arguments, configuration, and data dependencies

Defines dependencies between jobs
Use Java Freemarker to populate the XML template for each experiment
<?xml version="1.0" encoding="UTF-8"?>
<adag xmlns="http://pegasus.isi.edu/schema/DAX"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://pegasus.isi.edu/schema/DAX
      http://pegasus.isi.edu/schema/dax-2.1.xsd" version="2.1" count="1" index="0"
      name="bfast" jobCount="3" fileCount="0" childCount="2">
  <!-- jobs -->
  <job id="ID0000001" namespace="seqware" name="runner" version="0.0.1">
    <argument>bfast matches %{reference_file} %{experiment}.fastq...</argument>
    <profile namespace="globus" key="max_memory">24576</profile>
    <profile namespace="globus" key="count">8</profile>
    <uses file="%{experiment}.fastq" link="input"/>
    <uses file="%{experiment}.bmf" link="output" transfer="false" register="false"/>
  </job>
  <job id="ID0000002" namespace="seqware" name="runner" version="0.0.1">
    <argument>bfast localalign ...</argument>
    <uses file="%{experiment}.bmf" link="input"/>
    <uses file="%{experiment}.baf" link="output" transfer="false" register="false"/>
  </job>
  <job id="ID0000003" namespace="seqware" name="runner" version="0.0.1">
    <argument>bfast postprocess ...</argument>
    <uses file="%{experiment}.bmf" link="input"/>
    <uses file="%{experiment}.bam" link="output" transfer="true" register="true"/>
  </job>
  .....
  <!-- Dependencies -->
  <child ref="ID0000002">
    <parent ref="ID0000001"/>
  </child>
  <child ref="ID0000003">
    <parent ref="ID0000001"/>
    <parent ref="ID0000002"/>
  </child>
</adag>
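The child/parent entries in the workflow above form a directed acyclic graph, and a planner has to start each job only after all of its parents have finished. As a rough illustration of that constraint, and not of how Pegasus itself schedules jobs, the sketch below derives one valid execution order with a simple topological sort (Kahn's algorithm). JobOrderSketch and executionOrder are hypothetical names; the job IDs are taken from the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class JobOrderSketch {

    // parents maps a child job id to the ids of the jobs it depends on,
    // mirroring the <child ref=...><parent ref=.../></child> entries.
    static List<String> executionOrder(List<String> jobs, Map<String, List<String>> parents) {
        // Number of unfinished parents per job.
        Map<String, Integer> remaining = new HashMap<>();
        for (String job : jobs) {
            remaining.put(job, parents.getOrDefault(job, List.of()).size());
        }

        // Jobs with no parents can start immediately.
        Deque<String> ready = new ArrayDeque<>();
        for (String job : jobs) {
            if (remaining.get(job) == 0) ready.add(job);
        }

        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String done = ready.remove();
            order.add(done);
            // Finishing a job releases every child that was waiting on it.
            for (String job : jobs) {
                if (parents.getOrDefault(job, List.of()).contains(done)) {
                    int left = remaining.merge(job, -1, Integer::sum);
                    if (left == 0) ready.add(job);
                }
            }
        }

        if (order.size() != jobs.size()) {
            throw new IllegalStateException("cycle in job dependencies");
        }
        return order;
    }

    public static void main(String[] args) {
        List<String> jobs = List.of("ID0000003", "ID0000001", "ID0000002");
        Map<String, List<String>> parents = Map.of(
            "ID0000002", List.of("ID0000001"),
            "ID0000003", List.of("ID0000001", "ID0000002"));
        System.out.println(executionOrder(jobs, parents));
    }
}
```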
Pegasus

Each task is a standalone application,
independently runnable

Scientist asks 'how do I run Bfast?'
Collaborator wants to run srf2FastQ at home, but
does not have a pipeline or Metadata DB
Researcher wants to try some custom parameters,
but we still want to track his run in the Metadata DB
Each application conforms to a standard, well-defined interface
The interface is abstract enough for users to
wrap their applications without knowing