Workflow Strategy

Visualization and Analysis Workflow
December 14, 2009 Draft
/ber
The concept of a Workflow
• Express the analysis of plant systems in terms
of the data and operations on those data
– Multiple types of data (e.g., experimental,
computed, archival)
– Multiple types of operations (e.g., analytical,
visualization, search)
• Treat the data and operations as components,
which can be re-used, replaced, augmented,
and extended.
Workflow
• A pathway of operations
• Entities:
– Operation
– Data
– Flow
• The flow through the operations is managed by the
workflow software (e.g., VisTrails)
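The three entities above (operation, data, flow) can be sketched in a few lines of Python. This is a minimal illustration only, not the VisTrails API: the `Operation` and `Workflow` classes, the step names, and the gene IDs are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional


@dataclass
class Operation:
    """A named step that turns input data into output data."""
    name: str
    func: Callable[..., Any]
    inputs: List[str] = field(default_factory=list)  # names of upstream operations


class Workflow:
    """Holds operations and runs them in dependency order (the 'flow')."""

    def __init__(self) -> None:
        self.operations: Dict[str, Operation] = {}

    def add(self, op: Operation) -> None:
        self.operations[op.name] = op

    def run(self, target: str, cache: Optional[Dict[str, Any]] = None) -> Any:
        # Resolve upstream operations first, caching each result (the data).
        cache = {} if cache is None else cache
        if target not in cache:
            op = self.operations[target]
            args = [self.run(dep, cache) for dep in op.inputs]
            cache[target] = op.func(*args)
        return cache[target]


def load_genes() -> List[str]:
    return ["g1", "g2", "g3"]  # hypothetical gene IDs


def all_pairs(genes: List[str]) -> List[tuple]:
    return [(a, b) for i, a in enumerate(genes) for b in genes[i + 1:]]


def build_network(pairs: List[tuple]) -> Dict[str, Any]:
    nodes = sorted({gene for pair in pairs for gene in pair})
    return {"nodes": nodes, "edges": pairs}


wf = Workflow()
wf.add(Operation("gene_list", load_genes))
wf.add(Operation("pairs", all_pairs, inputs=["gene_list"]))
wf.add(Operation("network", build_network, inputs=["pairs"]))
result = wf.run("network")
```

Because each operation only names its inputs, a step such as `all_pairs` can be replaced by another tool without touching the rest of the pathway.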
Multi-layer workflows
Conceptual Level: High-level representation for casual users, with lots of
defaults pre-selected
[Diagram: List of genes → Co-expression analysis → Network]
Professional Level: Visibility into underlying workflows, with freedom to select
tools and parameters
[Diagram: Analysis of omics data. Boxes: Pathways; List of genes; Statistical analysis tool; Network; Interactive Visual Analysis; Metabolites]
Infrastructure Level: The explicit treatment of underlying data, databases, data
integration, tools, operations, parameters, defaults, wrappers, provenance,
interconnectivity, access, etc.
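One way to picture the relationship between the conceptual and professional levels is a single set of parameters with pre-selected defaults that a professional user may override. The sketch below is an assumption for illustration; the parameter names and values in `DEFAULTS` are hypothetical and not taken from any iPLANT tool.

```python
from typing import Any, Dict, Optional

# Hypothetical defaults, pre-selected for the conceptual level.
DEFAULTS: Dict[str, Any] = {"method": "pearson", "threshold": 0.8}


def resolve_parameters(overrides: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Return the defaults, updated with any professional-level overrides."""
    overrides = overrides or {}
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        # Reject parameters the underlying tool does not expose.
        raise KeyError(f"unknown parameters: {sorted(unknown)}")
    params = dict(DEFAULTS)  # copy so the shared defaults are never mutated
    params.update(overrides)
    return params


conceptual = resolve_parameters()                      # casual user: all defaults
professional = resolve_parameters({"threshold": 0.6})  # expert: one override
```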
VisTrails: a candidate workflow architecture
[Diagram labels: Professional workflow; Provenance and metadata; Conceptual workflow; Interactive visualizations]
• Visual programming interface for representing data and
operations as workflows
• Loose coupling, using parameterizable Python wrappers
• Extensible, flexible, re-usable components and workflows
• Coupled with an attractive, flexible User Interface (to be
developed)
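The idea of loose coupling through parameterizable Python wrappers, with provenance recorded alongside, might look roughly like the following. This is a generic sketch and does not use the VisTrails module API; `ToolWrapper`, `top_n`, and the provenance record layout are hypothetical.

```python
from typing import Any, Callable, Dict, List


class ToolWrapper:
    """Wraps any callable tool behind one uniform, parameterizable interface."""

    def __init__(self, name: str, tool: Callable[..., Any], **defaults: Any) -> None:
        self.name = name
        self.tool = tool
        self.defaults = defaults

    def __call__(self, data: Any, provenance: List[Dict[str, Any]], **overrides: Any) -> Any:
        # Merge defaults with per-call overrides, then log what was run.
        params = {**self.defaults, **overrides}
        provenance.append({"tool": self.name, "parameters": params})
        return self.tool(data, **params)


def top_n(values: List[float], n: int = 2) -> List[float]:
    """Stand-in analysis tool: return the n largest values."""
    return sorted(values, reverse=True)[:n]


log: List[Dict[str, Any]] = []
ranker = ToolWrapper("top_n", top_n, n=2)
default_run = ranker([3, 9, 1, 7], log)        # uses the wrapper's default n
custom_run = ranker([3, 9, 1, 7], log, n=1)    # overrides n for this call
```

The workflow only ever sees the wrapper, so the underlying tool can be swapped without changing the pathway, and `log` keeps a record of which tool ran with which parameters.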
Example Workflows from iPLANT team
• Goals:
– Demonstrate the use of a workflow model for representing
the data and processes in plant genomic research
exploration
– Provide a common structure for iPLANT use cases
– Help define requirements for data integration
– Motivate discussions about analysis that join multiple
types of data, allow users to interact dynamically, and
provide interactive painting across visual representations
(e.g., painting a metabolic pathway with gene expression
magnitude)
Workflow for Maize Gene Analysis
[Diagram boxes]
• Modeling and Statistical Inference
• Candidate maize gene
• Homolog Finder (e.g., CoGe)
• Literature search
• List of homologous Arabidopsis gene IDs
• 5 genes of interest
• Examine clusters that can handle maize data (e.g., eNorthern, MapMan). Note: very limited data for maize, so may need to go to rice
• For each, examine structure of transcripts and expression over time (e.g., eFP Maize Genome Browser)
• Expression data for 20 maize genes
• Co-Expression Analysis (e.g., ATTED2)
• Expression Network of 10 Arabidopsis Genes
• Homolog Finder (e.g., CoGe)
• Find expression values for these genes (e.g., Next Gen)
• List of 20 homologous maize gene IDs
/tb/ber
Workflow for Analysis of Omics Data in a Model Species
[Diagram boxes]
• Gene expression data
• Expression Analysis
• Identify sub-cellular locations of gene products (e.g., Interactome)
• Metabolite Data
• Interactive Visual & Statistical Analysis (e.g., ViVA, Co-expression analysis, PlantMetGenMap, GeneMANIA)
• Visually identified, cell-based network regions of interest
• Visually identified genes and metabolites to map onto functional pathways
• Inferred Protein-Protein interactions
• Visualize
• Testable Hypotheses
• Visualize
• Visually identified enriched pathways
[Arrow labels: iterate, iterate]
[Callouts]
• Interactive visual and statistical analysis
• Explicit support for iterative what-if analysis
• Integrated gene expression and metabolomic data
/rg/ber
Other Data Sources to be Incorporated
1. Motifs from Regulatory Regions in Model Species
2. Cell-specific Expression
3. Pathways Wiki, place gene(s) of interest in established pathways.
4. Metabolites, incorporate information from Reactome
5. Literature, PubMed Assistant???
Depiction Needed
Displays of inferred regulatory networks, as in GeneMANIA.
Analysis of Gene Expression from A Partially Sequenced Species
[Diagram steps]
1. Experimental exposure of plants to stress
2. Paint identified genes onto pathways (e.g., MapMan)
3. Identification of homologs in reference species (e.g., CoGe)
4. Identification of candidate homologs that have been reported as co-expressed (e.g., statistical correlation)
5. Compare magnitude of activity across reference pathways (e.g., PageMan, KEGG, GO, MapMan)
6. Meta Annotator: Explore known features of these genes (e.g., signaling pathways, eFP, literature)
7. Formulate mechanistic models
[Diagram data boxes]
• Highly expressed genes
• Ecophysiological data
• Co-expressed genes for reference species
• Visualization of enriched pathways
/rg/ber
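Several of the workflows above rely on co-expression analysis, described in the slides as statistical correlation. A minimal sketch of that idea, assuming Pearson correlation between expression profiles and a hypothetical cutoff of 0.9, is shown here. The gene names and expression values are made up for illustration, and real tools such as ATTED2 may use different measures.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return cov / (sd_x * sd_y)


def coexpressed_pairs(
    profiles: Dict[str, List[float]], threshold: float = 0.9
) -> List[Tuple[str, str]]:
    """Return gene pairs whose correlation meets the (assumed) threshold."""
    return [
        (g1, g2)
        for g1, g2 in combinations(sorted(profiles), 2)
        if pearson(profiles[g1], profiles[g2]) >= threshold
    ]


# Made-up expression values over four time points.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.1, 6.0, 8.2],
    "geneC": [4.0, 3.0, 2.0, 1.0],
}
pairs = coexpressed_pairs(profiles)
```

The resulting pairs can serve as the edges of an expression network like the one in the maize workflow.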