Transcript Slide 1

WG5: e-Technologies
Martin Dugas, Jaakko Hollmen
EuGESMA
European Genomics and Epigenomics Study on MDS and AML
Goals of WG5
• Central support for data analysis, management and interpretation
• Research into novel methods for the integration of clinical and molecular data, initially with respect to the analysis of microarray data
• Integrative data analysis will be performed in collaboration with expert biostatisticians in the field
• Development of data management and analysis systems for various chip platforms and data types, such as gene expression profiling (Affymetrix), SNP arrays, array CGH, ChIP-on-chip, microRNA data, epigenetic profiling, proteomic data and high-throughput sequencing
• Application of standard biometric procedures (e.g. survival analysis) to data from AML and MDS trials
Expected outcomes of WG5
• Harmonization of data from multiple centres that may be using different chip array types and platforms
• Integration of molecular data from mRNA, miRNA, epigenetic, SNP and CGH studies via an interactive and dynamic interface driven by mutation, cytogenetic and outcome parameters
• Identification of target genes and pathways for the development and testing of novel therapeutic drugs, molecules and agents
• Generation and frequent updating of the Action-specific website (via the website coordinator)
Agenda
• Javier de las Rivas
Tools for integrative analyses of Affymetrix microarray data (expression
and copy number) and a method to build a leukemia multiclass predictor
based on transcriptomic profiling
• Jaakko Hollmen
Modeling DNA copy number amplification pattern in human cancers
• Cesare Furlanello
Recent material on biomarker stability from predictive classifiers
• Silvio Bicciato
Genomic data integration with specific application to myelopoiesis
• Lara Nonell
Microarray data analysis and integration approach, with an overview of
ongoing and upcoming leukemia/MDS projects
• Andrea Zangrando
MLL rearrangements in pediatric acute lymphoblastic and myeloblastic
leukemias: MLL specific and lineage specific signatures
• Lucjan Wyrwicz
Functional annotation of gene lists in microarray studies
Relevance of e-Technologies:
The data explosion continues
• Affymetrix GeneChip 2.0plus: ~40,000 probe sets
• Affymetrix SNP-Chip 6.0: ~1 million SNPs
• ChIP-Seq: ~5 million sequence reads
• Whole-genome sequencing: 3 billion base pairs
[Ley TJ, et al. DNA sequencing of a cytogenetically normal acute myeloid leukaemia genome. Nature 2008; 456(7218):66-72]
What is the challenge?
Tools for integrative analyses of microarray data: integrative genomics
Huge amounts of data per array + hundreds of samples
[Javier de las Rivas]
We address the challenge
Open methods for integrated/combined data mining and data analysis
1. Expression arrays: Affymetrix Human_Exon_1.0, measuring genes, exons, miRNAs and ncRNAs at once on an “omic” scale
GATE (Genomic and Transcriptomic Explorer, http://bioinfow.dep.usal.es/xgate/) includes the mapping of probes to loci: de novo mapping of all oligo probes from Affymetrix expression microarrays
[Javier de las Rivas]
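The slide does not describe how GATE maps probes. As a minimal sketch under stated assumptions, de novo probe mapping can be pictured as exact matching of each probe sequence against both strands of a reference sequence. The function names, probe IDs and the toy "chromosome" below are hypothetical; a real pipeline would align against a full genome assembly with a dedicated aligner and would also handle mismatches.

```python
def revcomp(seq):
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def find_all(text, pattern):
    """All 0-based start positions of pattern in text, overlapping hits included."""
    hits, start = [], text.find(pattern)
    while start != -1:
        hits.append(start)
        start = text.find(pattern, start + 1)
    return hits


def map_probes(probes, chromosomes):
    """Exact-match each probe to both strands of every chromosome.

    probes: {probe_id: sequence}; chromosomes: {name: sequence}.
    Returns {probe_id: [(chrom, 1-based start, strand), ...]}.
    An empty list means the probe does not map; several hits mean it is not unique.
    """
    result = {}
    for probe_id, seq in probes.items():
        fwd = seq.upper()
        rev = revcomp(fwd)  # a '-' strand hit is the reverse complement on the '+' sequence
        hits = []
        for chrom, ref in chromosomes.items():
            ref = ref.upper()
            hits += [(chrom, pos + 1, "+") for pos in find_all(ref, fwd)]
            hits += [(chrom, pos + 1, "-") for pos in find_all(ref, rev)]
        result[probe_id] = hits
    return result


# Hypothetical toy example
example = map_probes({"p1": "TTTG", "p2": "AAAC", "p3": "TTTT"},
                     {"chrT": "AAACGTTTGGGCCC"})
```

In the toy example `p1` maps uniquely, `p2` hits both strands (so it would usually be flagged as non-unique), and `p3` does not map at all.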
A stable snapshot
of a highly unstable condition
[Lucjan Wyrwicz]
COST – WG5
Antwerp, March 2009
Approach to microarray data analysis
• Introduction
• Aims
• M&M
• Results
• Conclusions
• Acknowledgements
University of Padova
[Andrea Zangrando]
SAM results
SAM comparisons between subgroups (comparison ID: comparison; constant part / variable part; UP / down deregulated probe sets; total):
• L1: ALL/MLL(-) vs ALL/MLL(+); constant: phenotype, variable: translocation; UP / down 1013 / 740; total 1753
• L3: AML/MLL(-) vs AML/MLL(+); constant: phenotype, variable: translocation; UP / down 155 / 555; total 710
• L2: ALL/MLL(-) vs AML/MLL(-); constant: translocation, variable: phenotype; UP / down 1378 / 754; total 2132
• L4: ALL/MLL(+) vs AML/MLL(+); constant: translocation, variable: phenotype; UP / down 379 / 601; total 980
Each UP / down pair is also listed reversed from the viewpoint of the other subgroup.
(Common) signature: translocation-specific (L1 and L3): 379 probe sets; phenotype-specific (L2 and L4): 622 probe sets
SAM results for paired comparisons between considered subgroups. The translocation-specific signature was obtained by matching deregulated probe sets from the L1 and L3 comparisons, the phenotype-specific signature from the L2 and L4 comparisons.
University of Padova
[Andrea Zangrando]
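The slide reports SAM comparisons and signatures obtained by "matching" deregulated probe sets from two comparisons. A minimal sketch, assuming SAM's relative difference d = (mean1 - mean2) / (s + s0) (Tusher et al., 2001) and reading "matching" as a plain set intersection: the fixed `s0` and `cutoff` values are hypothetical simplifications (SAM estimates s0 from the data and selects probe sets via a permutation-based FDR), and the probe IDs and expression values are invented.

```python
import numpy as np


def sam_d(group1, group2, s0=0.1):
    """SAM relative difference per probe set.

    group1, group2: arrays of shape (n_probe_sets, n_samples).
    s0 is the 'fudge factor'; here a fixed, hypothetical value.
    """
    g1 = np.asarray(group1, dtype=float)
    g2 = np.asarray(group2, dtype=float)
    n1, n2 = g1.shape[1], g2.shape[1]
    m1, m2 = g1.mean(axis=1), g2.mean(axis=1)
    # Pooled within-group scatter, scaled as in the SAM statistic
    a = (1.0 / n1 + 1.0 / n2) / (n1 + n2 - 2)
    scatter = ((g1 - m1[:, None]) ** 2).sum(axis=1) + ((g2 - m2[:, None]) ** 2).sum(axis=1)
    s = np.sqrt(a * scatter)
    return (m1 - m2) / (s + s0)


def deregulated(probe_ids, d, cutoff):
    """Probe sets with |d| >= cutoff (hypothetical fixed threshold instead of SAM's FDR)."""
    return {pid for pid, value in zip(probe_ids, d) if abs(value) >= cutoff}


def common_signature(set_a, set_b):
    """Signature = probe sets deregulated in both comparisons."""
    return set_a & set_b


# Hypothetical example: two translocation comparisons (in the spirit of L1 and L3)
ids = ["ps1", "ps2", "ps3"]
l1 = deregulated(ids, sam_d([[2, 2], [0, 0], [5, 5]], [[0, 0], [0, 0], [5, 5]]), cutoff=2.0)
l3 = deregulated(ids, sam_d([[3, 3], [1, 1], [0, 0]], [[0, 0], [0, 0], [0, 0]]), cutoff=2.0)
translocation_specific = common_signature(l1, l3)
```

Here `ps1` is deregulated in both comparisons and forms the toy "translocation-specific" signature; whether the original analysis also required a consistent direction of change is not stated on the slide.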
Data analysis
Genotyping
• One sample
• Population studies
Typical workflow
• Quality analysis
• Normalization
• CNP (copy number polymorphism), copy number/LOH and SNP genotyping
• Interpretation
[Lara Nonell]
[Jaakko Hollmen]
Profiles of DNA copy number amplification
[Jaakko Hollmen]
SODEGIR: single-sample analysis
Defines regions with concomitant alterations of gene copy number (CN) and gene expression (GE) in single samples (SODEGIR).
• SODEGIR deleted: CN loss (q-value = 0, score ≤ quantile(d_CNgj, 0.1)) and GE down (q-value ≤ 0.05, score ≤ quantile(d_GEgj, 0.1))
• SODEGIR amplified: CN gain (q-value = 0, score ≥ quantile(d_CNgj, 0.9)) and GE up (q-value ≤ 0.05, score ≥ quantile(d_GEgj, 0.9))
[Silvio Bicciato]
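The threshold rules above can be sketched as a per-gene classifier for one sample j. This is a minimal illustration, not the SODEGIR implementation: it assumes that q-values and scores d_CNgj / d_GEgj are already computed, that the 0.1/0.9 quantiles are taken over all genes g of that sample, and it labels individual genes rather than building contiguous regions. The function name and example data are hypothetical.

```python
import numpy as np


def sodegir_status(q_cn, d_cn, q_ge, d_ge):
    """Classify the genes of ONE sample j (arrays indexed by gene g).

    Assumption: quantiles are computed across genes within this sample.
    Returns an array of 'deleted', 'amplified' or 'none'.
    """
    q_cn, d_cn, q_ge, d_ge = (np.asarray(x, dtype=float) for x in (q_cn, d_cn, q_ge, d_ge))
    cn_lo, cn_hi = np.quantile(d_cn, [0.1, 0.9])
    ge_lo, ge_hi = np.quantile(d_ge, [0.1, 0.9])

    cn_loss = (q_cn == 0) & (d_cn <= cn_lo)
    cn_gain = (q_cn == 0) & (d_cn >= cn_hi)
    ge_down = (q_ge <= 0.05) & (d_ge <= ge_lo)
    ge_up = (q_ge <= 0.05) & (d_ge >= ge_hi)

    status = np.full(d_cn.shape, "none", dtype=object)
    status[cn_loss & ge_down] = "deleted"   # concomitant CN loss and GE down
    status[cn_gain & ge_up] = "amplified"   # concomitant CN gain and GE up
    return status


# Hypothetical sample with 10 genes whose CN and GE scores rise together
d = np.arange(10.0)
q_cn = np.zeros(10)
q_ge = np.full(10, 0.01)
status = sodegir_status(q_cn, d, q_ge, d)
```

Requiring both the CN and the GE condition is what makes an alteration "concomitant": a gene with a CN gain but a non-significant expression change (q-value > 0.05) stays unclassified.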
GeneAnnot custom-CDFs
www-
[Silvio Bicciato]
Concern about the reproducibility of scientific results and the need for replication
Repeatability: Nature Genetics study
• Reproducibility of scientific results and the need for replication: a multi-institutional study of papers on gene expression profiling published in a leading journal found
– inability to reproduce the analysis in > 50%
– partial reproduction in 1/3
– perfect reproduction in 11%
1. Editorial, Nature Genetics, Feb 2009: “four teams of analysts treated the findings of a number of microarray papers published in the journal in 2005–2006 as their gold standard and attempted to replicate a sample of the analyses conducted on each of them, with frankly dismal results.”
[Cesare Furlanello]
List Stability Indicator
The List Stability Indicator [Jurman, 2008]
• Based on the algebraic theory of metrics on symmetric groups: a ranked list of p features can be represented as an element of the permutation group S_p
• Key concept: the Canberra distance between two ranked lists (of equal length)
• Given a set of ranked gene lists (with rankings produced by the classifier), the indicator is defined as the mean of all pairwise distances
• Hoeffding's theorem: the distances are (asymptotically) normally distributed
• For this study: study of the variability of endpoints and of the effect of swapping; theory extended to handle gene lists of different lengths
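The indicator described above can be sketched directly from its definition: the Canberra distance between two rankings σ, τ of the same p features is the sum over features of |σ(i) − τ(i)| / (σ(i) + τ(i)), and the indicator is the mean over all pairs of lists. This is a minimal, unnormalized sketch; the published indicator may apply further normalization, and the gene names are hypothetical.

```python
from itertools import combinations


def ranks(ranked_list):
    """Map each feature to its 1-based position in a ranked list."""
    return {feature: pos for pos, feature in enumerate(ranked_list, start=1)}


def canberra(list_a, list_b):
    """Canberra distance between two rankings of the same feature set."""
    if set(list_a) != set(list_b) or len(list_a) != len(list_b):
        raise ValueError("lists must rank the same features")
    ra, rb = ranks(list_a), ranks(list_b)
    return sum(abs(ra[f] - rb[f]) / (ra[f] + rb[f]) for f in ra)


def list_stability_indicator(lists):
    """Mean of all pairwise Canberra distances (unnormalized sketch)."""
    pairs = list(combinations(lists, 2))
    if not pairs:
        raise ValueError("need at least two lists")
    return sum(canberra(a, b) for a, b in pairs) / len(pairs)


# Hypothetical rankings produced by three classifier runs
runs = [["g1", "g2", "g3"], ["g1", "g2", "g3"], ["g2", "g1", "g3"]]
lsi = list_stability_indicator(runs)
```

Swapping the top two genes costs 1/3 + 1/3 = 2/3, so the three runs give a mean pairwise distance of (0 + 2/3 + 2/3) / 3 = 4/9; identical lists give 0, and larger values indicate less stable rankings. Because differences between high positions are divided by small sums, disagreements near the top of the lists weigh more than those near the bottom.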
Canberra distance: two views
• Complete: the Canberra distance is measured over the set of features given for the endpoint (all probes on the platform)
• Core: only distances between features in the candidate gene lists are considered
[Cesare Furlanello]
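The two views can be contrasted in a small sketch. It assumes that each classifier run provides a full ranking of all platform probes and that the candidate gene list is its top k. "Complete" then sums the Canberra terms over every probe, while "core" sums only over probes that appear in at least one of the two candidate lists. This reading of "core" is an assumption, and the probe names are hypothetical.

```python
def rank_map(ranking):
    """Map each probe to its 1-based position in a full ranking."""
    return {probe: pos for pos, probe in enumerate(ranking, start=1)}


def _terms(ra, rb, probes):
    """Sum of Canberra terms |ra - rb| / (ra + rb) over the given probes."""
    return sum(abs(ra[p] - rb[p]) / (ra[p] + rb[p]) for p in probes)


def complete_canberra(full_a, full_b):
    """Canberra distance over all probes on the platform."""
    if set(full_a) != set(full_b):
        raise ValueError("full rankings must cover the same probes")
    ra, rb = rank_map(full_a), rank_map(full_b)
    return _terms(ra, rb, ra)


def core_canberra(full_a, full_b, k):
    """Canberra distance restricted to probes in either top-k candidate list."""
    if set(full_a) != set(full_b):
        raise ValueError("full rankings must cover the same probes")
    ra, rb = rank_map(full_a), rank_map(full_b)
    core = set(full_a[:k]) | set(full_b[:k])
    return _terms(ra, rb, core)


# Hypothetical full rankings of a 4-probe "platform"
run_a = ["p1", "p2", "p3", "p4"]
run_b = ["p2", "p1", "p4", "p3"]
complete = complete_canberra(run_a, run_b)
core = core_canberra(run_a, run_b, k=2)
```

In this example the complete view also counts the reordering of the low-ranked probes p3 and p4 (2/3 + 2/7 = 20/21), whereas the core view with k = 2 only sees the swap inside the candidate lists (2/3).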
The FDA MAQC-II project
Reaching consensus on the “best practices” (Data Analysis Protocol,
DAP) in developing and validating microarray-based predictive models
(classifiers) for clinical and preclinical applications. Reliable and robust
predictive models are essential to realize the promises of personalized
medicine. Recommendations on the development and validation of classifiers
are put forward through the MAQC-II.
Synergy with the FDA Voluntary eXploratory Data Submission (VXDS)
program: regulatory review of microarray pharmacogenomic data to develop
a biomarker qualification process (guidance for industry)
• September 2008: 60 organizations, 36 data analysis teams,
18,200 models generated on 13 endpoints
(1) Understand the behavior of various prediction rules and gene selection methods
that may be applied to microarray data sets to generate predictors of clinical
outcomes;
(2) Identify and characterize sources of variability in multi-gene prediction
Participant organizations: government agencies, manufacturers of microarray
platforms, microarray service providers, academic laboratories
[Cesare Furlanello]
WG5 - topics
• Quality
– experimental design
– data (batch effects!)
– analysis, prediction
• Data analysis plans
=> reproducibility of data analysis
=> Best practice in genomic and
epigenomic data analysis
• Stability of gene signatures
WG5 - topics
• Data integration in translational research => need for data analysis platforms, especially CN + GEP
• (Semi-)Automated gene signature analysis
• Multiclass predictive models
• Mapping of microarray data
• Informatics for next-generation sequencing
• Training in bioinformatics