Slide - Indico
ACT 119153 (NISR+Τ) 3rd Phase "Open Access Repositories and Electronic Journals", supervised by the Greek Information Society / FP6
Subtask 6: Development of an integrated DNA microarray data processing and meta-analysis platform, plus a microarray experimental data repository, on the Grid.
Overview
• Introduction
• MicroArray Experiments
• Problems
• The GRISSOM Portal
• System Architecture Overview
• Case Study
• Technical Issues
• GRISSOM Platform Benefits
MicroArray Experiments
Gene expression microarrays are a promising high-throughput methodology for measuring the simultaneous expression of an organism's whole genome at a specific instant.
In practice they can be used to compare the level of transcription among different conditions in order to:
i) Understand the mechanisms implicated in various stages of the biological system investigated
ii) Classify diseases, or pathologies in general, e.g. tumours with different prognosis status that are indistinguishable by microscopic histological examination
iii) Monitor the response to therapy
iv) Identify and categorize diagnostic or prognostic biomarkers
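As a minimal illustration of comparing transcription levels between two conditions, the sketch below computes a log2 ratio per gene. The gene names, the intensity values and the `log2_ratio` helper are invented for illustration; they are not taken from the GRISSOM code.

```python
import math

def log2_ratio(treated, control):
    """Return log2(treated / control) for one gene's intensities."""
    if treated <= 0 or control <= 0:
        raise ValueError("intensities must be positive")
    return math.log2(treated / control)

# Hypothetical intensities: gene -> (treated, control)
intensities = {
    "geneA": (800.0, 200.0),   # higher in the treated condition
    "geneB": (150.0, 600.0),   # lower in the treated condition
    "geneC": (300.0, 300.0),   # unchanged
}

ratios = {gene: log2_ratio(t, c) for gene, (t, c) in intensities.items()}

for gene, ratio in ratios.items():
    print(f"{gene}: log2 ratio = {ratio:+.2f}")
```

A positive ratio indicates higher expression in the treated condition, a negative ratio lower expression, and zero no change.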
MicroArray Experiment Workflow
Problems
• The computational processing steps for microarray experiment data are laborious and represent by far the most significant bottleneck in the successful exploitation of the technology.
• Consequently, there is an imperative requirement for large storage and computing facilities.
• This compounds the costs of an already expensive technology, setting back research progress in the field.
• Technical setbacks: array artifacts, scratches, scanner sensitivity and settings.
• The curse of dimensionality: tens of thousands of genes (variables) measured on a small number of samples pose major challenges for statistical inference.
• Noise: non-specific hybridization, as well as the difference between the actual amount of mRNA per cell and the relative differential expression measured by microarrays, introduces variance and noise into experiments.
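To make the dimensionality problem concrete: when tens of thousands of genes are each tested at a fixed significance level, many are expected to pass by chance alone. The sketch below works through that arithmetic under assumed numbers (20,000 genes, a significance level of 0.05, and no gene truly differentially expressed). The Bonferroni threshold is shown only as one common remedy; the slides do not state which correction, if any, GRISSOM applies.

```python
def expected_false_positives(num_genes, alpha):
    """Expected number of genes called significant by chance,
    assuming no gene is truly differentially expressed."""
    return num_genes * alpha

def bonferroni_threshold(num_genes, alpha):
    """Per-gene threshold that keeps the family-wise error rate at alpha."""
    return alpha / num_genes

num_genes = 20_000   # assumed array size
alpha = 0.05         # assumed per-gene significance level

false_positives = expected_false_positives(num_genes, alpha)
per_gene_alpha = bonferroni_threshold(num_genes, alpha)

print(f"Expected chance findings: {false_positives:.0f}")
print(f"Bonferroni per-gene threshold: {per_gene_alpha:.1e}")
```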
The GRISSOM Portal
• http://www.grissom.gr
• Access
– Restricted Web Access
– Registered Users
– Special Security Mechanism
The GRISSOM Portal
• http://www.grissom.gr
• Web Portal Access (SSL)
– Two Access Modes:
• HellasGrid Certificates Validation
• Custom Certificates Validation (signed by NHRF)
The GRISSOM Portal
• HellasGrid Authentication & Access
– MyProxy
• MyProxy Certification Authority
[Diagram: User, MyProxy-logon, MyProxy Server, Grid.Auth.GR]
The GRISSOM Portal
http://www.grissom.gr
• Features
– Experimental data upload
– Versatile Data Processing:
• Normalization, Filtering, Statistical Selection, Clustering, Gene Annotation
– Automated experiment submission to the HellasGrid infrastructure, with monitoring
– Biological Experiment Repository
– Meta-Analysis Methods, including gene annotation and GO Analysis
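The processing features listed above are typically chained, each step consuming the previous step's output. The sketch below illustrates only the first two steps (normalization, then filtering) on made-up log ratios. Median centring and an absolute-value cut-off are assumed stand-ins chosen for brevity; the slides do not say which methods GRISSOM implements, and all function names here are hypothetical.

```python
from statistics import median

def normalize(log_ratios):
    """Median-centre one array's log ratios (a simple global normalization)."""
    centre = median(log_ratios.values())
    return {gene: value - centre for gene, value in log_ratios.items()}

def filter_genes(log_ratios, min_abs):
    """Keep genes whose absolute log ratio reaches the cut-off."""
    return {g: v for g, v in log_ratios.items() if abs(v) >= min_abs}

def run_pipeline(log_ratios, min_abs=1.0):
    """Chain the steps: normalization first, then filtering."""
    return filter_genes(normalize(log_ratios), min_abs)

# Hypothetical raw log ratios for one array
raw = {"geneA": 2.5, "geneB": 0.5, "geneC": 0.4, "geneD": -1.5, "geneE": 0.6}

selected = run_pipeline(raw)
print(selected)
```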
The GRISSOM Portal
http://www.grissom.gr
• Input:
– Raw Dataset Files (various image formats, for cDNA/Affy)
– Analysis Parameters
• Output:
– Expressed Gene Lists
– Interactive Graphs
– Annotated Genes
– References to similar Experiments
The GRISSOM Portal
http://www.grissom.gr
• Distributed Database:
– Data instantiation through PHP calls to a distributed MySQL database, while the actual data reside on Storage Elements (SEs)
– Interconnection with other open biological databases (EBI ArrayExpress, NCBI GEO) to find related experiments
– Gene annotation performed using specialized databases (BioMart)
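The split between database records and bulk data on Storage Elements can be pictured as a metadata row that holds only a pointer to the stored file. The sketch below uses an in-memory SQLite database as a stand-in for the distributed MySQL database. The table name, the column names and the storage path are hypothetical and do not reflect the actual GRISSOM schema.

```python
import sqlite3

# In-memory SQLite stands in for the distributed MySQL database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE experiment ("
    " id INTEGER PRIMARY KEY,"
    " title TEXT NOT NULL,"
    " geo_id TEXT,"                 # optional link to a public GEO record
    " storage_path TEXT NOT NULL)"  # pointer to the raw data on a Storage Element
)

# Hypothetical record: only metadata and a pointer are stored in the database.
conn.execute(
    "INSERT INTO experiment (title, geo_id, storage_path) VALUES (?, ?, ?)",
    ("Example time course", "GDS1823", "lfn:/grid/example-vo/experiments/1/raw.tar"),
)
conn.commit()

row = conn.execute(
    "SELECT geo_id, storage_path FROM experiment WHERE id = 1"
).fetchone()
print(row)
```

Keeping the bulk files out of the database keeps queries over experiment metadata light, while the pointer lets a job fetch the data from the Storage Element when it runs.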
System Architecture Overview
• Main Components:
– Web Portal (User Interface)
– Local DB
– Grid Middleware
• PHP + Java
• gLite 3.1
– Parallel Execution Code (MPI + Octave); job submission through gLite DAG for fully distributed code execution is under development
– Grid Storage Elements
System Architecture Overview
Analysis steps are executed with MPI over multiple nodes.
The number of nodes is equal to the number of experimental conditions in each experiment.
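The one-node-per-condition rule can be sketched as follows. A thread pool stands in for the MPI ranks purely so the example runs anywhere; the platform itself uses MPI with Octave. The experiment data and the `analyse_condition` placeholder are invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def analyse_condition(condition, replicates):
    """Placeholder per-condition step: average the replicate values."""
    return condition, sum(replicates) / len(replicates)

# Hypothetical experiment: condition -> replicate measurements
experiment = {
    "control": [1.0, 1.5, 0.5],
    "treated_2h": [2.0, 2.5, 1.5],
    "treated_6h": [3.0, 3.5, 2.5],
}

# One worker per experimental condition, mirroring the node-count rule.
num_nodes = len(experiment)

with ThreadPoolExecutor(max_workers=num_nodes) as pool:
    results = dict(
        pool.map(analyse_condition, experiment.keys(), experiment.values())
    )

print(num_nodes, results)
```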
Case Study – Test Scenarios
The system was tested using multiple datasets that differ in size and architecture:

GEO ID    Num. of nodes required   Num. of Conditions   Num. of Replicates   Description
GSE6820   18                       18                   36                   Transcriptional signature of wounded keratinocytes reveals selective roles for ERK1/2, P38 and PI3K signaling pathways
GSE6978   6                        6                    34                   Gene expression variation in lymphocyte subpopulations in response to low dose of ionizing radiations
GDS1823   3 or 9                   3 or 9               18                   Embryonic stem cell differentiation induced by various chemicals: time course
GDS1928   3                        3                    27                   Compensated and decompensated right ventricular hypertrophy at onset: time course
GDS865    5                        5                    15                   Spatial gene expression in flowers (cDNA array)
-         5                        5                    15                   NHRF provided Dataset
Case Study – Performance Measures
System used: Intel Core 2 Duo E4300 1.8 GHz processor with 2.0 GB RAM, running GNU Octave 2.1.73 on Ubuntu Linux 7.10.
Scenario              HellasGrid nodes used for GRISSOM run   GRISSOM Run (minutes)   Single Node Run (minutes)   Speed-up Ratio
GEO-GSE6820 - 18/36   18                                      80                      400                         5
GEO-GSE6978 - 6/34    6                                       79                      340                         4.30
GEO-GDS1823 - 9/18    9                                       38                      186                         4.89
GEO-GDS1928 - 3/27    3                                       35                      63                          1.83
GEO-GDS865 - 5/15     5                                       16                      47                          2.94
NHRF - 5/15           5                                       17                      54                          3.18
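The speed-up ratio in the table is the single-node run time divided by the GRISSOM run time. The sketch below recomputes it from the reported minutes and adds parallel efficiency (speed-up divided by node count), a standard derived metric that does not appear on the slide. The function names are illustrative, and recomputed values need not match the slide's figures to the last digit.

```python
def speed_up(single_node_minutes, grid_minutes):
    """Speed-up ratio: single-node time divided by grid run time."""
    return single_node_minutes / grid_minutes

def efficiency(single_node_minutes, grid_minutes, nodes):
    """Parallel efficiency: speed-up per node used (not reported on the slide)."""
    return speed_up(single_node_minutes, grid_minutes) / nodes

# (scenario, nodes, GRISSOM run minutes, single-node run minutes), from the table
runs = [
    ("GEO-GSE6820", 18, 80, 400),
    ("GEO-GSE6978", 6, 79, 340),
    ("GEO-GDS1823", 9, 38, 186),
    ("GEO-GDS1928", 3, 35, 63),
    ("GEO-GDS865", 5, 16, 47),
    ("NHRF", 5, 17, 54),
]

for name, nodes, grid, single in runs:
    print(
        f"{name}: speed-up {speed_up(single, grid):.2f}, "
        f"efficiency {efficiency(single, grid, nodes):.2f}"
    )
```

Efficiency well below 1 means the run gained less than one node's worth of speed per node added.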
Case Study – Performance Measures
Analysis run using the same dataset with different parallelization levels.
First run: 3 nodes. Second run: 9 nodes.
Grid-related Performance Limitations
• Different node H/W generations
• Heterogeneity of node-installed S/W (esp. regarding biocomputing packages like Bioconductor)
• Maintenance Issues
GRISSOM Platform Screenshots
GRISSOM Platform Benefits
• Parallelization
• Time optimization
• User Transparency
• Automated Job Submission + Monitoring
• Open Access Biological Experiment Repository
• Shell fully concealing the Grid
• GRISSOM Development Team
Aristotle Chatziioannou
Ilias Maglogiannis
Ioannis Kanaris
Charalambos Doukas
Eleftherios Pilalis
Panagiotis Moulos
• Under the supervision of the Institute of Biological Research & Biotechnology, National Hellenic Research Foundation
Fragiskos Kolisis
in collaboration with the National Documentation Center, National Hellenic Research Foundation
• Funded by ACT 119153 (NISR+Τ) 3rd Phase "Open Access Repositories and Electronic Journals", supervised by the Greek Information Society / FP6
• http://www.grissom.gr
Thank you
Questions?