Some issues in microarray experimental design

Some views on microarray
experimental design
Rainer Breitling
Molecular Plant Science Group &
Bioinformatics Research Centre
University of Glasgow, Scotland, UK
Personal Background
• University of Glasgow, Scotland, UK
• Molecular Plant Sciences Group
• Bioinformatics Research Centre
• Functional Genomics Facility
Some common questions in
microarray experimental design
• How many arrays will I need?
• Should I pool my samples?
• Which arrays should I choose?
• Which samples should I put together on
one array?
Why are microarrays special?
• produce large amounts of data
instantaneously
• can look for unexpected effects
• are still quite expensive
Experiments are therefore almost never repeated:
careful design is necessary before you start
How many replicates?
• as many as possible
Statistics says: The more replicates, the
better your estimate of expression (the
improvement is asymptotic, so the first
few replicates you add have by far the
strongest effect)
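The diminishing return can be illustrated with the standard error of a mean, sigma / sqrt(n). This is only a sketch: it assumes independent replicates with a common standard deviation sigma, and the value sigma = 1.0 is an arbitrary illustration, not a number from the talk.

```python
import math


def standard_error(sigma: float, n: int) -> float:
    """Standard error of the mean of n independent replicates."""
    if n < 1:
        raise ValueError("need at least one replicate")
    return sigma / math.sqrt(n)


# sigma = 1.0 is an assumed, illustrative standard deviation.
for n in (1, 2, 3, 5, 10, 20):
    print(f"{n:2d} replicates: standard error = {standard_error(1.0, n):.3f}")
```

Going from 1 to 2 replicates shrinks the standard error far more than going from 10 to 11.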
How many replicates?
$$ n = \frac{4\,(z_{1-\alpha/2} + z_{1-\beta})^2}{(\delta/\sigma)^2} $$
• α significance level (probability of detecting FP)
• 1-β power to detect differences (probability of detecting TP)
• σ standard deviation of the log-ratios
• δ detectable difference between class mean log-ratios
• z percentile of standard normal distribution
• n required number of arrays (reference design)
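As a rough sketch, the sample-size formula n = 4 (z_{1-α/2} + z_{1-β})² / (δ/σ)² can be evaluated with Python's standard library. The function name `required_arrays` and the example numbers (α = 0.001, power 0.95, σ = 0.5, δ = 1 on the log2 scale) are illustrative assumptions, not values from the talk.

```python
import math
from statistics import NormalDist


def required_arrays(alpha: float, power: float, sigma: float, delta: float) -> int:
    """Number of arrays n = 4 (z_{1-alpha/2} + z_{1-beta})^2 / (delta/sigma)^2.

    alpha: significance level; power: 1 - beta;
    sigma: standard deviation of the log-ratios;
    delta: detectable difference between class mean log-ratios.
    """
    z = NormalDist().inv_cdf           # percentile of the standard normal
    z_alpha = z(1 - alpha / 2)         # z_{1-alpha/2}
    z_beta = z(power)                  # z_{1-beta}, since power = 1 - beta
    n = 4 * (z_alpha + z_beta) ** 2 / (delta / sigma) ** 2
    return math.ceil(n)                # round up to whole arrays


# Assumed example: detect a two-fold change (delta = 1 in log2 units).
print(required_arrays(alpha=0.001, power=0.95, sigma=0.5, delta=1.0))
```

Halving σ or doubling δ cuts the required number of arrays by a factor of four, which is why sample quality and effect size matter so much.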
How many replicates?
• Five
Experience shows: For most common
experiments you get a reasonable list of
differentially expressed genes with 5
replicates
How many replicates?
• Three
One to convince yourself, one to convince
your boss, one just in case...
How many replicates?
• It depends on
– the quality of the sample
– the magnitude of the expected effect
– the experimental design
– the method of analysis
The quality of the sample
• smaller samples (single cells) are more
noisy than large samples (tissue
homogenates)
• cell cultures are less noisy than patient
biopsies
• sample pooling can decrease noise – if
individual variation is not of interest
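Why pooling can decrease noise is easiest to see in a simple variance model. The sketch below assumes that each array measures a pool of equally contributing individuals, so that biological variance is divided by the pool size while technical (array) variance is unaffected. The model, the function name `pooled_variance` and the numbers are all assumptions made for illustration.

```python
def pooled_variance(bio_var: float, tech_var: float, pool_size: int) -> float:
    """Variance of one array measurement under the assumed model:
    biological variance averages out across the pool,
    technical variance does not."""
    if pool_size < 1:
        raise ValueError("pool_size must be at least 1")
    return bio_var / pool_size + tech_var


# Assumed variances on the log-ratio scale.
for k in (1, 2, 4, 8):
    print(f"pool of {k}: variance = {pooled_variance(0.4, 0.1, k):.3f}")
```

Under this model the benefit levels off at the technical variance, and the individual-to-individual variation is no longer observable, which is why pooling only makes sense when that variation is not of interest.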
The magnitude of the effect
• Microarrays are very sensitive
• To keep effects small:
– use early time points, gentle stimuli
– never compare dogs and donuts
• if you get a list of 2000 genes that are
significantly changed, your experiment
failed!
The magnitude of the effect
• some problematic cases
– stably transfected cell lines (are they still the
same cells?)
– knock-out organisms (even the same tissue
can be different)
– local changes may be diluted, but cell
isolation will increase noise
The experimental design
• Three major options:
– reference design (flexible)
– balanced block design (efficient)
– loop design (elegant)
The experimental design
• loop designs can save samples...
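[Figure: reference design (samples A, B, C and D each hybridized against a reference R) next to a loop design (samples A, B, C and D hybridized against each other in a loop)]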
• ...but they can cause interpretation
nightmares in less simple cases (use for
large studies, if you have a full-time
statistician in the team)
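The difference between the two layouts can be sketched by listing which two samples share each two-colour array. The function names and the sample labels A to D and R are illustrative; real designs also have to settle dye assignment and replication, which this sketch ignores.

```python
from collections import Counter


def reference_design(samples, reference="R"):
    """Each sample is co-hybridized with the common reference."""
    return [(s, reference) for s in samples]


def loop_design(samples):
    """Each sample is co-hybridized with the next one, closing the loop."""
    n = len(samples)
    return [(samples[i], samples[(i + 1) % n]) for i in range(n)]


samples = ["A", "B", "C", "D"]
for name, arrays in (("reference", reference_design(samples)),
                     ("loop", loop_design(samples))):
    # Count how many arrays each label appears on.
    usage = Counter(label for pair in arrays for label in pair)
    print(name, arrays, dict(usage))
```

With four arrays, the loop measures every sample of interest twice and spends no channel on the reference, but comparing A with C then has to go through B or D, which is where the interpretation becomes harder.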
The method of analysis
• Golub et al. (1999) data
set
• 38 leukemia patient bone
marrow samples,
hybridized individually to
Affymetrix microarrays
• Differential expression
between two leukemia
types was examined,
using random subsets of
the complete dataset
The method of analysis
[Figure: iterative GroupAnalysis (iGA) of a time course (0h, 9.5h, 11.5h, 13.5h, 15.5h, 18.5h, 20.5h), listing Gene Ontology classes per time point, including purine base metabolism, tricarboxylic acid cycle, heat shock protein activity, response to stress, glycogen metabolism, polyamine transport, oxidative phosphorylation, aerobic respiration and respiratory chain complexes II, III and IV]
[Figure: Graph-based iterative GroupAnalysis (GiGA), with labels glyoxylate cycle, citrate (TCA) cycle, respiratory chain complex II, respiratory chain complex III, (complex V) and oxidative phosphorylation]
What is a good replicate?
The experiment your competitor on the other
side of the globe would do to see if your
results are reproducible
Vary “all” parameters – challenge your
results
Prepare new samples, from new cultures,
using new buffers and new graduate
students
Remember to produce matched controls
What is a “bad” replicate?
• technical replicates (i.e. hybridizing the
same sample repeatedly)
• dye-swapping experiments (usually gene-specific
dye bias is not a big issue, and
dye balancing is more efficient anyway)
• pooled samples, hybridized repeatedly
• the same preparation, only labelled twice
Should samples be pooled?
• most samples are already pooled – they
come from multiple cells
• pool to increase amount of mRNA, but
only as much as necessary
• prepare independent pools to assess
variation
• problems: bias, “contamination”, outliers,
information loss...
Which arrays are the best?
• Standard arrays
– compare and exchange data easily
• Whole-genome arrays
– detect unexpected effects, increase confidence
• Single-color arrays (Affymetrix GeneChip)
– for more complex comparisons
• Annotated arrays
Further reading
• Dobbin, Shih & Simon (2003) J. Natl.
Cancer Inst. 95: 1362.
• Yang & Speed (2002) Nature Rev. Genet.
3: 579.
• Breitling (2004)
http://www.brc.dcs.gla.ac.uk/~rb106x/microarray_tips.htm
Contact
Rainer Breitling
Bioinformatics Research Centre
Davidson Building A416
http://www.brc.dcs.gla.ac.uk/~rb106x