Minimum Information about a Microarray
European Bioinformatics Institute
Minimum Information About a Microarray Experiment - MIAME
Alvis Brazma
European Bioinformatics Institute
European Molecular Biology Laboratory
What is MIAME?
A document, the goal of which is to specify the
minimum information that must be reported
about a microarray experiment in order to ensure
its interpretability, as well as potential
verification of the results
Underlying motivation
– to enable the establishment of public repositories for microarray data
– to serve as a basis for designing a microarray data exchange format
Acknowledgements
MIAME working group
MAML working group
MGED steering committee
John Aach, Wilhelm Ansorge, Pascal
Hingamp, Frank Holstege, Alex Lash, John
Quackenbush, Alan Robinson, Paul
Spellman, Criss Stoeckert, Martin Vingron
MIAME history
A need to establish a public repository or
repositories for microarray gene expression data
became apparent in 1998
That requires data standards
MGED 1 meeting in Cambridge in November,
1999 establishes five working groups, including
the microarray data annotation group (MIAME)
Several MIAME drafts produced by the group
MGED steering committee meeting in November
2000 in Bethesda endorses a MIAME draft
Last revision yesterday in MIAME working
group meeting
Outline of this talk
Considerations behind the MIAME design – why
The MIAME details – what
Future developments and use of MIAME – how
How to think about MIAME
What minimum information about a microarray gene expression measuring experiment should be recorded in a database for the database entries to be usable on a stand-alone basis:
– the users may not know any background information that is not recorded
– the database should be usable for automated data analysis and mining, i.e. not only on a record-by-record basis
– the data may be coming from different laboratories and different technology platforms
Gene expression database – a conceptual view:
[Figure: a gene expression matrix of gene expression levels (genes by samples), together with gene annotations and sample annotations]
Three parts of a gene expression
database
Gene annotation – might be given by links to
gene sequence databases and GO – not a perfect
state of the art, but let's not worry about it
Sample annotation – we do not have any
external databases for sample description (except
species taxonomy) – problem 1
Gene expression matrix – what are the
measurement units for gene expression levels? –
problem 2
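The three parts above can be pictured as one small data structure. The following is only a minimal Python sketch: the class name `ExpressionDatabase`, its field names and the example values are hypothetical illustrations and are not defined by MIAME.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ExpressionDatabase:
    """Hypothetical container for the three parts of a gene expression database."""

    gene_annotations: List[Dict[str, str]]    # one entry per matrix row
    sample_annotations: List[Dict[str, str]]  # one entry per matrix column
    matrix: List[List[float]]                 # expression levels, genes x samples

    def __post_init__(self) -> None:
        # Every row must belong to an annotated gene,
        # every column to an annotated sample.
        if len(self.matrix) != len(self.gene_annotations):
            raise ValueError("one matrix row is needed per annotated gene")
        for row in self.matrix:
            if len(row) != len(self.sample_annotations):
                raise ValueError("one matrix column is needed per annotated sample")

    def level(self, gene: int, sample: int) -> float:
        """Expression level of one gene in one sample (units are experiment-specific)."""
        return self.matrix[gene][sample]


# Made-up example values, for illustration only.
db = ExpressionDatabase(
    gene_annotations=[{"id": "gene_1"}, {"id": "gene_2"}],
    sample_annotations=[
        {"organism": "Homo sapiens"},
        {"organism": "Homo sapiens"},
        {"organism": "Homo sapiens"},
    ],
    matrix=[[1.0, 2.5, 0.7], [3.2, 0.1, 1.9]],
)
```

The sketch deliberately leaves the matrix values without units, which is exactly problem 2.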
Problem/consideration 1 –
sample annotation
Gene expression data have meaning only in
the context of a detailed description of the sample
If the data is going to be interpreted by
independent parties, the information about the
sample has to be in the database
Controlled vocabularies and ontologies (species,
cell types, compound nomenclature, treatments,
etc) are needed for unambiguous sample
description, if it has to be queried
Sample annotation – what can
be done
Some use of free-text descriptions is
unavoidable
Controlled vocabularies and ontologies
should be used wherever available
Externally defined controlled vocabularies
and ontologies should be used whenever
they exist
Problem/consideration 2 – the lack of
gene expression measurement units
What we would like to have
– gene expression levels expressed in some standard units (e.g. molecules per cell)
– reliability measure associated with each value (e.g. standard deviation)
What we do have
– each experiment using different units
– no reliability information
Comparing expression data
[Figure: measurements in different units (cm, inch)]
From microarray images to gene expression data
[Figure: from raw data (array scans, images) through intermediate data (spots, spot/image quantitations) to final data (gene expression levels per gene and sample)]
What to do in the absence of
standard measurement units?
Record raw, intermediate and final analysis
data together with a detailed annotation of
how the analysis has been performed
This effectively passes on the
responsibility for interpreting the final
analysis data to the user
Measurement units
In perspective:
– standard controls for experiments (on chips and in the samples) should be introduced
– replicate measurements will become the norm
Temporary solution:
– storing intermediate analysis results (including the images) and annotations of how they were obtained - i.e., the evidence
Problem/consideration 3
We need to find a compromise between the burden on the data producers to annotate and provide the data and the need for the data to be sufficiently annotated for the database users
– Too much detail may turn away the potential data providers and complicate the data submission and storage
– Too little detail may limit the usability of the data
The current draft is a compromise between these two
Some more general principles
MIAME is aimed at a cooperative data provider; it is not a legal document designed to close all loopholes
MIAME is an informal specification
The concept of ‘qualifier, value, source’ triplets, e.g.,
– qualifier – cell type
– value – epithelial
– source – Human Anatomy (author, edition)
The concept of ‘experimental protocol’
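A ‘qualifier, value, source’ triplet maps naturally onto a small record type. The sketch below is only one possible representation in Python; the class name `Triplet` and the helper `values_for` are hypothetical, and the example entries reuse the triplets shown on the slides.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Triplet:
    """One 'qualifier, value, source' annotation (hypothetical representation)."""

    qualifier: str  # what is being described, e.g. "cell type"
    value: str      # the description itself, e.g. "epithelial"
    source: str     # where the term comes from, e.g. a vocabulary or a book


def values_for(triplets: List[Triplet], qualifier: str) -> List[str]:
    """Return all values recorded for the given qualifier."""
    return [t.value for t in triplets if t.qualifier == qualifier]


annotations = [
    Triplet("cell type", "epithelial", "Human Anatomy (author, edition)"),
    Triplet("survival data", "given", "user defined"),
]

print(values_for(annotations, "cell type"))  # ['epithelial']
```

Keeping the source next to every value is what lets an independent reader tell a term from an external controlled vocabulary apart from a user-defined one.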
General principles - continued
MIAME is not designed as a ‘questionnaire’ that
can be filled in, but only as an informal
specification on which such a questionnaire
(in fact, an annotation tool) can be based
Although MIAME is conceptually independent
of databases, the aim of establishing a
microarray database should be kept in mind when
reading MIAME
A microarray experiment
[Figure: "Annotation of an experiment - a major challenge". Components in ArrayExpress: Experiment, Sample, Hybridisation, Array, Normalisation, Analysis. External links: Publication (e.g., PubMedCentral), Source (e.g., Taxonomy), Gene (e.g., EMBL)]
MIAME six parts:
1. Experimental design: the set of the
hybridisation experiments as a whole
2. Array design: each array used and each element
(spot) on the array
3. Samples: samples used, the extract preparation
and labeling
4. Hybridizations: procedures and parameters
5. Measurements: images, quantitation,
specifications
6. Controls: types, values, specifications
www.mged.org
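An annotation tool built on this outline could start by checking that all six parts are present. The following is a rough Python sketch under stated assumptions: the key names in `MIAME_PARTS`, the function `missing_parts` and the example submission are hypothetical, and MIAME itself does not prescribe any such format.

```python
# Hypothetical keys, one per MIAME part listed above.
MIAME_PARTS = (
    "experimental_design",
    "array_design",
    "samples",
    "hybridizations",
    "measurements",
    "controls",
)


def missing_parts(submission):
    """Return the MIAME parts that are absent or empty in a submission dict."""
    return [part for part in MIAME_PARTS if not submission.get(part)]


# Made-up, incomplete submission for illustration.
submission = {
    "experimental_design": {"type": "time course"},
    "samples": ["S1", "S2", "S3"],
    "hybridizations": [],
}

print(missing_parts(submission))
# ['array_design', 'hybridizations', 'measurements', 'controls']
```

A check like this only establishes that a part was supplied, not that its content is adequate.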
Part 1 - Experimental design: the set of
the hybridisation experiments as a whole
Normally ‘an experiment’ should consist of one or more hybridisations that are in some way related and performed within a limited period of time, e.g. all related to the same publication
– Author, contact information, citations
– Type of experiment (e.g., time course, normal vs diseased comparison)
– Experimental factors – i.e. tested parameters in the experiment (e.g. time, dose, genetic variation, response to a compound)
– List of organisms used in the experiment
– List of platforms used
Experimental design - continued
List of samples, arrays and hybridisations and their relationships, e.g.:
– Samples: S1, S2, S3
– Arrays: A1, A2, A3
– Hybridisations:
  • H1 is S1 and S2 on A1
  • H2 is S2 and S3 on A2
  • H3 is S1 and S2 on A3
Which hybridisations are replicates,
– e.g. H1 and H3 are replicates
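The example relationships can be written down as a small mapping. The Python sketch below is illustrative only: the dictionary layout and the function `replicate_groups` are hypothetical, and it assumes that hybridisations using the same set of samples are replicates. In MIAME the submitter states which hybridisations are replicates; the grouping rule here is just a way to reproduce the H1/H3 example.

```python
from collections import defaultdict

# The example from the slide: which samples were hybridised on which array.
hybridisations = {
    "H1": {"samples": ("S1", "S2"), "array": "A1"},
    "H2": {"samples": ("S2", "S3"), "array": "A2"},
    "H3": {"samples": ("S1", "S2"), "array": "A3"},
}


def replicate_groups(hybs):
    """Group hybridisations that use the same set of samples (assumed replicates)."""
    groups = defaultdict(list)
    for name, hyb in sorted(hybs.items()):
        groups[frozenset(hyb["samples"])].append(name)
    # Only groups with more than one hybridisation count as replicates.
    return [names for names in groups.values() if len(names) > 1]


print(replicate_groups(hybridisations))  # [['H1', 'H3']]
```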
Experimental design – continued 2
Quality related indicators
Optional user defined ‘qualifier, value, source’ triplet – e.g.:
– qualifier – survival data
– value – given
– source – user defined
Description of the experiment or link to a
publication
Part 2 - Array design: each array used
and each element (spot) on the array
This part is separate for each type of array
used in the experiment
For the database, the array description
should normally be submitted only once
For each physical array used in the
experiment a unique ID and the array type
are given
Array design – continued
Array design related information (e.g. platform
type – in situ synthesized or spotted, array
provider, surface type – glass, membrane, other,
etc)
Properties of each type of elements on the array,
that are generated by similar protocols (e.g.
synthesized oligos, PCR products, plasmids,
colonies, others) – may be simple or composite
(Affymetrix)
Each element (spot) on the array
Array design – continued
Each element (spot) on the array
– Elements may be simple or composite
– Each element must be identified by either the sequence, clone ID, PCR primer pair, or in any other unambiguous way
– Composite elements may be identified by a reference sequence
– May be linked to genes (preferably)
– Will normally be provided in a separate file (e.g. spreadsheet)
Part 3 - Samples: samples used,
the extract preparation and labeling
Sample source and treatment
– Organism (NCBI taxonomy)
– Additional ‘qualifier, value, source’ list
  • cell source and type
  • developmental stage
  • organism part (tissue)
  • animal/plant strain or line
  • genetic variation
  • disease state or normal
  • …
Typically only some of these qualifiers are relevant – an ontology tree is needed to implement the annotation tool for sample source and treatment
Sample - continued
Hybridisation extract preparation
– Laboratory protocol, including extraction method, whether RNA, mRNA, or genomic DNA is extracted, amplification method
Labelling
– Laboratory protocol, including amount of nucleic acids labelled, label used (e.g. Cy3, Cy5, 33P, etc)
Part 4 - Hybridizations:
procedures and parameters
Laboratory protocol including
– The solution (e.g. concentration of solutes)
– Blocking agent
– Wash procedure
– Quantity of labelled target used
– Time, concentration, volume, temperature
– Description of the hybridisation instruments
– Optional additional ‘qualifier, value, source’ list
Raw, intermediate and final data
[Figure: from raw data (array scans, images) through intermediate data (spots, spot/image quantitations) to final data (gene expression levels per gene and sample)]
Part 5 - Measurements: images,
quantitation, specifications
Hybridisation scan raw data – image
Intermediate data – image analysis and
quantitation
Final data – summarised information from
possible replicates
Measurements continued
Image data
– The scanner image file (e.g. TIFF, DAT)
– Scanning information
  • Scan parameters, including laser power, spatial resolution, pixel space, PMT voltage
  • Laboratory protocol for scanning, including scanning hardware and software used
Measurements continued
Image analysis and quantitation
– Complete image analysis output (of the particular image analysis software) for each element – normally given as a separate file (e.g. spreadsheet)
– Image analysis information
  • Image analysis software specification
  • All parameters
Measurements continued
Summarised information from possible replicates
– Derived measurement values summarising related elements as used by the author
– Reliability information for these values, as used by the author (may be ‘unknown’)
– Specifications of these two (e.g., median value of the replicates, standard deviation)
(these will typically be given in a spreadsheet)
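As an illustration of the example specification (median of the replicates as the derived value, standard deviation as the reliability measure), here is a minimal Python sketch. The function name `summarise`, the returned field names and the input numbers are hypothetical; authors may use entirely different summaries.

```python
from statistics import median, stdev


def summarise(replicate_values):
    """Summarise replicate measurements of one element.

    Returns the median as the derived value and the sample standard
    deviation as the reliability; with fewer than two replicates the
    reliability is reported as 'unknown'.
    """
    if not replicate_values:
        return {"value": None, "reliability": "unknown"}
    if len(replicate_values) < 2:
        return {"value": replicate_values[0], "reliability": "unknown"}
    return {"value": median(replicate_values), "reliability": stdev(replicate_values)}


# Made-up replicate measurements, for illustration only.
print(summarise([2.0, 4.0, 6.0]))
print(summarise([5.0]))
```

Recording which summary and which reliability measure were used is the "specification" part: without it, the numbers in the spreadsheet cannot be interpreted by anyone else.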
Part 6 - Controls: types, values,
specifications
Normalisation strategy (spiking,
housekeeping genes, total array, other)
Normalisation algorithm
Control array elements
Hybridisation extract preparation
How to use MIAME
A data exchange format (MAML) for
communicating MIAME information
Establishing MIAME compliant databases
(e.g. ArrayExpress)
Developing annotation tools for generating
MIAME compliant information
Journals and public funding agencies may
establish MIAME related policies
Some questions
Is the current MIAME draft
– sufficiently detailed? (if not, what are the queries it should support?)
– or is it already excessive for data submitters?
Images – if they are required, what is the mechanism for their communication?
– storage in the public repositories?
– responsibility of the data submitters?
– maybe in the future they won't be necessary?
A proposal for MIAME future work
To prepare the current MIAME draft for publishing
To write up and post the MIAME glossary
To collaborate with the data normalisation group to
finalize MIAME part 6 and develop a quality control
section
To collaborate with the Ontology group in developing the
sample specification
To collaborate with the MAML group on the data exchange
format
To guide the development of MIAME compliant data
annotation tools
To update MIAME as the technology and our
understanding of it develops
To formulate a superset of recommended additional
information
A proposal to MGED 3
To endorse the current MIAME draft
To recommend that journals and funding
agencies adopt policies promoting the
publishing of MIAME compliant information
To facilitate the adoption of MIAME by
the array technology providers and
software companies
Outstanding questions
Does a natural, standard gene expression level measurement unit (or units) exist?
If yes
– What is it (or they)?
– Is it (are they) achievable by microarrays?
If not
– How much will we be able to get out of gene expression databases?