Some aspects of the design and analysis of gene expression

Some thoughts on the design of
cDNA microarray experiments
Terry Speed & Yee Hwa Yang,
Department of Statistics
UC Berkeley
MGED IV Boston, February 14, 2002
Some aspects of design
Layout of the array
– Which cDNA sequence to print?
• Library
• Controls
– Spatial position
Allocation of samples to the slides
– Different design layouts
• A vs B: Treatment vs control
• Multiple treatments
• Factorial
• Time series
– Other considerations
• Replication
• Physical limitations: the number of slides and the amount of material
• Extensibility - linking
Some issues to consider before designing
cDNA microarray experiments
Scientific
Aims of the experiment
Specific questions and priorities between them.
How will the experiments answer the questions posed?
Practical (Logistic)
Types of mRNA samples: reference, control, treatment, mutant, etc
Amount of material. Count the amount of mRNA involved in one channel
of hybridization as one unit.
The number of slides available for the experiment.
Other Information
The experimental process prior to hybridization: sample isolation, mRNA
extraction, amplification, labelling,…
Controls planned: positive, negative, ratio, etc.
Verification method: Northern, RT-PCR, in situ hybridization, etc.
Natural design choice
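[Figure: Case 1, treatments T1 to T4 each hybridized against the control C. Case 2, samples T1, T2, ..., Tn-1, Tn each hybridized against a common reference Ref.]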
Case 1: Meaningful biological control (C)
Samples: Liver tissue from four mice treated with cholesterol-modifying drugs.
Question 1: Genes that respond differently between T and C.
Question 2: Genes that respond similarly across two or more treatments
relative to control.
Case 2: Use of universal reference.
Samples: Different tumor samples.
Question: To discover tumor subtypes.
Treatment vs Control
Two samples
e.g. KO vs. WT or mutant vs. WT
Direct: T vs C on two slides
Estimate: average(log(T/C))
Variance: σ²/2
Indirect: T vs Ref and C vs Ref
Estimate: log(T/Ref) - log(C/Ref)
Variance: 2σ²
Caveat
The advantage of direct over indirect comparisons was
first pointed out by Churchill & Kerr, and in general, we
agree with the conclusion. However, you can see in the
last M vs A plot that the difference is not a factor of 2, as
theory predicts. Why?
A likely explanation is that the assumption that log(T/Ref)
and log(C/Ref) are uncorrelated is not valid, and so the
gains are less than predicted. The reason for the
correlation is less obvious, but there are a number of
possibilities.
One is that we used mRNA from the same extraction;
another is that we didn't dye-swap the two indirect
comparisons, but did when we replicated the direct
comparison. The answer is not yet clear.
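The size of this effect can be sketched with a little arithmetic. The snippet below is an illustration only: it assumes each log ratio has variance sigma2 and that log(T/Ref) and log(C/Ref) have correlation rho (a symbol introduced here, not in the slides). Under those assumptions the indirect estimate has variance 2 * sigma2 * (1 - rho), against sigma2 / 2 for a direct comparison averaged over two slides.

```python
def indirect_variance(sigma2, rho):
    """Var[log(T/Ref) - log(C/Ref)] when both log ratios have variance
    sigma2 and correlation rho (rho is an assumed, illustrative parameter)."""
    return 2 * sigma2 * (1 - rho)


def direct_variance(sigma2, n_slides=2):
    """Variance of the average of n_slides uncorrelated log(T/C) values."""
    return sigma2 / n_slides


# Hypothetical correlations, chosen only to show the trend.
for rho in (0.0, 0.25, 0.5):
    ratio = indirect_variance(1.0, rho) / direct_variance(1.0)
    print(f"rho = {rho:.2f}: variance ratio indirect/direct = {ratio:.1f}")
```

With rho = 0 the variance ratio is 4, i.e. a factor of 2 in standard deviation; any positive correlation shrinks the predicted gain.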
Labeling
• 3 sets of self-self hybridization (cerebellum vs cerebellum)
• Data 1 and Data 2 were labeled together and hybridized on two
separate slides.
• Data 3 were labeled separately.
Extraction
• Olfactory bulb experiment:
• 3 sets of Anterior vs Dorsal performed on different days
• #10 and #12 were from the same RNA isolation and
amplification
• #12 and #18 were from different dissections and amplifications
• All 3 data sets were labeled separately before hybridization
One-way layout: one factor, k levels
[Figure: Designs I and II hybridize A, B and C against ref; Design III compares A, B and C directly.]
                      I) Common      II) Common     III) Direct
                      reference      reference      comparison
Number of slides      N=3            N=6            N=3
Ave. variance         2                             0.67
Units of material     A=B=C=1        A=B=C=2        A=B=C=2
Ave. variance                        1              0.67
For k = 3, efficiency ratio (Design I / Design III) = 3. In general,
efficiency ratio = 2k / (k-1). However, remember the assumption
that log ratios from different slides are uncorrelated!
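These average variances can be checked with a small least-squares calculation. The sketch below is not from the slides: it assumes each slide yields one log ratio with variance 1 (in units of σ²), that slides are uncorrelated, and that Design III is the loop A-B, B-C, C-A. The helper names are illustrative.

```python
from fractions import Fraction
from itertools import combinations


def invert(m):
    """Invert a square matrix exactly by Gauss-Jordan elimination."""
    n = len(m)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        # Swap a row with a nonzero pivot into position, then normalise it.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def average_variance(nodes, slides, targets):
    """Average variance of all pairwise contrasts among `targets`.

    nodes:  all mRNA samples; the last one serves as the baseline.
    slides: (sample, sample) pairs, one per hybridization, unit variance each.
    """
    idx = {name: i for i, name in enumerate(nodes[:-1])}
    p = len(idx)

    def vec(u, v):
        # Coefficients of (u - v) on the non-baseline parameters.
        row = [0] * p
        if u in idx:
            row[idx[u]] += 1
        if v in idx:
            row[idx[v]] -= 1
        return row

    x = [vec(u, v) for u, v in slides]
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(p)] for i in range(p)]
    inv = invert(xtx)
    variances = []
    for u, v in combinations(targets, 2):
        c = vec(u, v)
        variances.append(sum(c[i] * inv[i][j] * c[j]
                             for i in range(p) for j in range(p)))
    return sum(variances) / len(variances)


abc = ["A", "B", "C"]
ref_slides = [("A", "ref"), ("B", "ref"), ("C", "ref")]
design_1 = average_variance(abc + ["ref"], ref_slides, abc)
design_2 = average_variance(abc + ["ref"], ref_slides * 2, abc)
design_3 = average_variance(abc, [("A", "B"), ("B", "C"), ("C", "A")], abc)
print(design_1, design_2, design_3)  # average variances in units of sigma^2
print(design_1 / design_3)           # efficiency ratio, Design I / Design III
```

Exact fractions are used so the results can be compared directly with the tabulated 2, 1 and 0.67.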
Illustration from one experiment
[Figure: box plots of log ratios under Design I (A, B, C vs Ref) and Design III (direct comparison).]
Box plots of log ratios: we are still ahead!
Factorial experiments
• Treated cell lines: CTL, OSM, EGF, OSM & EGF
• Possible experiments
Here we are interested not in genes for which there is an O
or an E effect, but in genes for which there is an OE interaction, i.e.
genes for which log(O&E/O) - log(E/C) is large or small.
Other examples of factorial experiments
Suppose we have tumor T and standard cells S from the same
tissue, and are interested in the impact of radiation R on gene
expression. In general, genes for which log(RT/T) and log(RS/S)
are large or small, will be less interesting to us than those for
which log(RT/T) - log(RS/S) are large or small, i.e. those with
large interactions.
Next, suppose that our interest is in comparing gene expression
in two mutants , say M and M’, at two developmental stages, E
and P say. Then we are probably more interested in those
genes for which the temporal pattern in the two mutants differ,
than in the patterns themselves, i.e. interest focusses on genes
for which log(ME/MP)-log(M’E/M’P) is large or small, again the
ones with large interactions.
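All of these interaction contrasts have the same form: a difference of two log ratios. A minimal sketch for the radiation example follows. The intensities are made up for illustration, and base-2 logs are an assumption (the slides do not state the base).

```python
import math


def interaction(rt, t, rs, s):
    """Interaction contrast log2(RT/T) - log2(RS/S) for one gene."""
    return math.log2(rt / t) - math.log2(rs / s)


# Hypothetical expression levels for two genes (not real data).
# Gene 1: radiation quadruples expression in both tumor and standard cells.
gene_1 = interaction(rt=400.0, t=100.0, rs=400.0, s=100.0)
# Gene 2: radiation has a much larger effect in tumor than in standard cells.
gene_2 = interaction(rt=800.0, t=100.0, rs=200.0, s=100.0)
print(gene_1, gene_2)
```

Gene 1 has large radiation effects but no interaction; gene 2 is the kind of gene these designs aim to detect.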
2 x 2 factorial: some design options
Design options I) to IV) on the samples C, A, B and A.B: indirect, and a balance of direct and indirect
[Figure: layout diagrams for designs I) to IV)]
                   I)      II)     III)    IV)
# Slides           N=6
Main effect A      0.5     0.67    0.5     NA
Main effect B      0.5     0.43    0.5     0.3
Interaction A.B    1.5     0.67    1       0.67
Table entry: variance (assuming all log ratios uncorrelated)
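Entries of this kind follow from ordinary least squares. The layout diagrams did not survive in this transcript, so the sketch below relies on two assumed layouts that reproduce the variances listed for designs I) and III): each of A, B and A.B hybridized against C twice, and all six pairwise comparisons once. It also assumes the parametrization C = 0, A = a, B = b, A.B = a + b + ab, with unit variance per slide. The helper names are illustrative.

```python
from fractions import Fraction


def invert(m):
    """Invert a square matrix exactly by Gauss-Jordan elimination."""
    n = len(m)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        # Swap a row with a nonzero pivot into position, then normalise it.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


# Expected log expression of each sample relative to C, as coefficients
# of (a, b, ab). This parametrization is an assumption.
EFFECTS = {"C": (0, 0, 0), "A": (1, 0, 0), "B": (0, 1, 0), "A.B": (1, 1, 1)}


def effect_variances(slides):
    """Variances of the estimates of (main effect A, main effect B,
    interaction A.B), in units of sigma^2, for a list of slide pairs."""
    x = [[s - t for s, t in zip(EFFECTS[u], EFFECTS[v])] for u, v in slides]
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(3)] for i in range(3)]
    inv = invert(xtx)
    return tuple(inv[i][i] for i in range(3))


# Assumed layouts (see the note above); both use N=6 slides.
indirect = [("A", "C"), ("B", "C"), ("A.B", "C")] * 2
all_pairs = [("A", "C"), ("B", "C"), ("A.B", "C"),
             ("A.B", "A"), ("A.B", "B"), ("A", "B")]

var_indirect = effect_variances(indirect)
var_all_pairs = effect_variances(all_pairs)
print([float(v) for v in var_indirect])
print([float(v) for v in var_all_pairs])
```

Under these assumptions, adding direct comparisons leaves the main-effect variances unchanged and lowers the interaction variance.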
Design choices in time series. Entry: variance
                                         t vs t+1           t vs t+2     t vs t+3
Design                              N    T1T2  T2T3  T3T4   T1T3  T2T4   T1T4      Ave
A) T1 as common reference           3    1     2     2      1     2      1         1.5
B) Direct hybridization             3    1     1     1      2     2      3         1.67
C) Common reference                 4    2     2     2      2     2      2         2
D) T1 as common ref + more          4    .67   .67   1.67   .67   1.67   1         1.06
E) Direct hybridization choice 1    4    .75   .75   .75    1     1      .75       .83
F) Direct hybridization choice 2    4    1     .75   1      .75   .75    .75       .83
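The variances for the time-series designs can be reproduced with a short least-squares calculation. The design diagrams are not recoverable here, so the slide lists below are assumptions inferred from the design names and the listed variances, for example E) as the loop T1-T2-T3-T4-T1 and F) as the loop T1-T3-T2-T4-T1. Each slide is taken to have unit variance and all slides to be uncorrelated. Helper names are illustrative.

```python
from fractions import Fraction
from itertools import combinations


def invert(m):
    """Invert a square matrix exactly by Gauss-Jordan elimination."""
    n = len(m)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


TIMES = ["T1", "T2", "T3", "T4"]


def pairwise_variances(slides, extra=()):
    """Variance of every Ti - Tj estimate; the last node is the baseline."""
    nodes = TIMES + list(extra)
    idx = {name: i for i, name in enumerate(nodes[:-1])}
    p = len(idx)

    def vec(u, v):
        # Coefficients of (u - v) on the non-baseline parameters.
        row = [0] * p
        if u in idx:
            row[idx[u]] += 1
        if v in idx:
            row[idx[v]] -= 1
        return row

    x = [vec(u, v) for u, v in slides]
    inv = invert([[sum(r[i] * r[j] for r in x) for j in range(p)]
                  for i in range(p)])
    out = {}
    for u, v in combinations(TIMES, 2):
        c = vec(u, v)
        out[u + v] = sum(c[i] * inv[i][j] * c[j]
                         for i in range(p) for j in range(p))
    return out


# Assumed slide lists for designs A) to F).
designs = {
    "A": pairwise_variances([("T2", "T1"), ("T3", "T1"), ("T4", "T1")]),
    "B": pairwise_variances([("T1", "T2"), ("T2", "T3"), ("T3", "T4")]),
    "C": pairwise_variances([(t, "Ref") for t in TIMES], extra=["Ref"]),
    "D": pairwise_variances([("T2", "T1"), ("T3", "T1"), ("T4", "T1"), ("T2", "T3")]),
    "E": pairwise_variances([("T1", "T2"), ("T2", "T3"), ("T3", "T4"), ("T4", "T1")]),
    "F": pairwise_variances([("T1", "T3"), ("T3", "T2"), ("T2", "T4"), ("T4", "T1")]),
}
averages = {name: sum(v.values()) / len(v) for name, v in designs.items()}
for name, v in designs.items():
    print(name, {k: round(float(x), 2) for k, x in v.items()},
          round(float(averages[name]), 2))
```

Under these assumptions the four-slide loops E) and F) have the smallest average variance, but they distribute it differently across the t vs t+1, t+2 and t+3 comparisons.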
A recently designed factorial experiment
Mutant 1 (M1)
M1.WT.P1   M1.WT.P11   M1.WT.P21
M1.MT.P1   M1.MT.P11   M1.MT.P21
Mutant 2 (M2)
M2.WT.P1   M2.WT.P11   M2.WT.P21
M2.MT.P1   M2.MT.P11   M2.MT.P21
Question: Seek genes that are changing over time and are different
in MT vs WT.
Analysis: Looking at the interaction effect between time and type.
Summary
The balance of direct and indirect comparisons
in a given context should be determined by
optimizing the precision of the estimates among
comparisons of interest, subject to the scientific
and physical constraints of the experiment.
Acknowledgments
Jean Yee Hwa Yang
Sandrine Dudoit
John Ngai’s Lab (Berkeley)
Jonathan Scolnick
Cynthia Duggan
Vivian Peng
Moriah Szpara
Percy Luu
Elva Diaz
Gary Glonek (Adelaide)
Dave Lin (Cornell)
Ingrid Lönnstedt (Uppsala)
Some web sites:
Technical reports, talks, software etc.
http://www.stat.berkeley.edu/users/terry/zarray/Html/
Statistical software R (“GNU’s S”)
http://www.R-project.org/
Packages within R environment:
-- SMA (statistics for microarray analysis)
http://www.stat.berkeley.edu/users/terry/zarray/Software/smacode.html
-- Spot
http://www.cmis.csiro.au/iap/spot.htm