ODD-Genes:
Accelerating data-driven
scientific discovery
NeSC Review 2003
NeSC
2003-09-30
Introduction
ODD-Genes Background
Science enabled by ODD-Genes
Automating routine statistical conditioning of highly
variable microarray results.
Discovering related data sources
Querying discovered data sources for relevant data
Identifying significant targets for focussed
investigation
Caveats & further work
ODD-Genes Background
ODD-Genes is a demonstrator
Demonstrates how Grid technologies enable e-Science, accelerating
scientific discovery
SunDCG’s TOG software allows for job submission on remote compute
resources
OGSA-DAI provides access, control and discovery of data resources
ODD-Genes used to investigate Wilms Tumour
Routine statistical conditioning of microarray results
Data-driven discovery of novel targets for investigation and potential
therapy
Collaborative project
NeSC/EPCC, Edinburgh, UK
Scottish Centre for Genomic Technology and Informatics, Edinburgh,
UK (GTI)
Human Genetics Unit at MRC, Western General Hospital, Edinburgh,
UK (HGU)
SunDCG – Enabling Routine
Statistical Conditioning
Choose analysis to perform
Automates analysis process
Provides predetermined
workflow
Can run more than one
analysis at a time
Multiple reproducible avenues
for investigation
Reduces cost (human,
machine), increases availability
TOG enables this by allowing
access to HPC resources
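
The idea of a predetermined workflow applied to several analyses at once can be sketched roughly as follows. This is an illustration only, not the TOG or SunDCG interface: the step functions (`log2_transform`, `median_centre`), the `WORKFLOW` list and the sample intensities are all hypothetical, and a local thread pool stands in for the remote HPC resources that TOG would reach.

```python
import math
import statistics
from concurrent.futures import ThreadPoolExecutor


def log2_transform(values):
    """Log2-transform raw intensities (assumed to be positive)."""
    return [math.log2(v) for v in values]


def median_centre(values):
    """Subtract the median so that each array is centred on zero."""
    m = statistics.median(values)
    return [v - m for v in values]


# Hypothetical predetermined workflow: the same ordered steps for every run,
# which is what makes each analysis reproducible.
WORKFLOW = [log2_transform, median_centre]


def run_workflow(values):
    """Apply every workflow step in order to one array of intensities."""
    for step in WORKFLOW:
        values = step(values)
    return values


def condition_all(arrays):
    """Run the workflow on every named array concurrently."""
    names = list(arrays)
    with ThreadPoolExecutor() as pool:
        results = pool.map(run_workflow, (arrays[n] for n in names))
        return dict(zip(names, results))


if __name__ == "__main__":
    # Made-up intensities, for illustration only.
    arrays = {"array_A": [1.0, 2.0, 4.0], "array_B": [8.0, 8.0, 32.0]}
    print(condition_all(arrays))
```

Because the steps are fixed in one list, rerunning `condition_all` on the same input reproduces the same conditioned starting point for repeated analyses.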
SunDCG - Conditioning Results
Results of conditioning can
be analysed and investigated
Researcher has potentially several
views of data to explore, all
presented simultaneously in
parallel (cf. the traditional serialised,
manual process)
Researcher can reproduce this
initial condition for repeated
analyses
Researcher need not perform each
step manually and serially, or ask
a dedicated statistician to do so.
OGSA-DAI - Results Investigation
Multiple views of data
Raw
Heat Map
Cluster Map
Wilms Tumour study
takes a new direction
two genes appear
significant in early
development
Researchers would like
more info on these
genes…
OGSA-DAI - Data Resource
Discovery
OGSA-DAI uses keywords to locate
relevant data resources
May return data resources previously
unknown to researcher
Researcher selects most interesting
data resource to query for information
about gene
Researcher selects Mouse atlas –
narrow, deep database of spatial gene
expression in mouse embryonic
development
Contrast with GTI database of broad,
shallow genome-wide gene expression
across multiple organisms, stages &
conditions
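
Keyword-based discovery of this kind can be sketched as a lookup over a registry of described data resources. The sketch below is an assumption-laden illustration, not the OGSA-DAI API: the `DataResource` class, the `REGISTRY` contents and the keyword sets are hypothetical, loosely based on the two databases named on this slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataResource:
    name: str
    keywords: frozenset


# Hypothetical registry entries; the keywords are illustrative assumptions.
REGISTRY = [
    DataResource(
        "Mouse atlas",
        frozenset({"mouse", "embryo", "spatial", "gene expression"}),
    ),
    DataResource(
        "GTI database",
        frozenset({"genome-wide", "gene expression", "multiple organisms"}),
    ),
]


def discover(registry, query_keywords):
    """Return names of resources sharing a keyword, best match first."""
    wanted = {k.lower() for k in query_keywords}
    scored = [(len(wanted & r.keywords), r) for r in registry]
    matches = [(score, r) for score, r in scored if score > 0]
    # Stable sort: resources with equal scores keep their registry order.
    matches.sort(key=lambda pair: pair[0], reverse=True)
    return [r.name for _, r in matches]


if __name__ == "__main__":
    print(discover(REGISTRY, ["gene expression", "embryo"]))
```

A resource only shows up in the results if its host has registered it with suitable keywords, which is why discovery can surface sources the researcher did not previously know about.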
OGSA-DAI - Data Resource Query
OGSA-DAI returns data from
query
Data and annotation displayed
Data contains references to
related images
Researcher rapidly moves from
numeric and textual description
to spatial representation of
relevant gene expression
These show that the genes
are stem cell markers
Targets for focussed
investigation, potential therapy
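
The step from a query result to its related images can be sketched as below. This is a minimal stand-in under stated assumptions, not the OGSA-DAI query interface or the real Mouse atlas schema: an in-memory SQLite database plays the role of the remote data resource, and the table layout, gene names (`GENE_X`, `GENE_Y`), annotations and image references are all placeholders.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with placeholder expression records."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE expression "
        "(gene TEXT, stage TEXT, annotation TEXT, image_ref TEXT)"
    )
    conn.executemany(
        "INSERT INTO expression VALUES (?, ?, ?, ?)",
        [
            ("GENE_X", "stage_1", "placeholder annotation", "img_0001"),
            ("GENE_X", "stage_2", "placeholder annotation", "img_0002"),
            ("GENE_Y", "stage_1", "placeholder annotation", "img_0003"),
        ],
    )
    return conn


def query_gene(conn, gene):
    """Return (stage, annotation, image_ref) rows for one gene."""
    cur = conn.execute(
        "SELECT stage, annotation, image_ref FROM expression "
        "WHERE gene = ? ORDER BY stage",
        (gene,),
    )
    return cur.fetchall()


def image_refs(rows):
    """Extract the image references carried by the query results."""
    return [ref for _, _, ref in rows]


if __name__ == "__main__":
    conn = build_demo_db()
    rows = query_gene(conn, "GENE_X")
    print(rows)
    print(image_refs(rows))
```

The references returned with each record are what let a front-end move directly from the numeric and textual results to the corresponding images.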
ODD-Genes Caveats & Further
Work
ODD-Genes is a demonstrator
Need to develop production applications for both routine
statistical processing and data resource discovery and query
Need to parameterise routine conditioning appropriately to
complete automation
ODD-Genes requires Grid infrastructure
Participating researchers need to partner with centres that host
application front-ends (or host the infrastructure themselves)
However, alternatives are often proprietary, expensive and less flexible
ODD-Genes requires registration by data-hosts
A critical mass of registered data sources is needed.