Design of Experiments
Panu Somervuo, March 20, 2007
• Problem formulation
• Setting up the experiment
• Analysis of data
Problem formulation
• what is the biological question?
• how can it be answered?
• what is already known?
• what information is missing?
• problem formulation → model of the biological system
Setting up an experiment
• what kind of data is needed to answer the question?
• how to collect the data?
• how much data is needed?
• biological and technical replicates
• pooling
• how to carry out the experiment (sample preparation, measurements)?
[figure: control and test samples]
Analysis of data
• preprocessing
• filtering & outlier removal
• normalization
• statistical model fitting
• hypothesis testing
• reporting the results, documentation
Everything depends on everything
[diagram: problem formulation / model of the system, setting up the experiment / number of samples, and analysis of data / statistical tests all depend on one another]
Practical guidelines
• blocking unwanted effects (e.g. dye effect)
  [diagram: samples from group1 and group2 each hybridized with both cy3 and cy5 labels (dye swap)]
• randomization (avoid systematic bias by randomizing e.g. the order of sample preparations)
• replication (replicate measurements can be averaged to reduce the effect of random errors)
[figure: control and test samples]
y = µ + F1 + F2 + ... + error
Pairwise sample comparison vs modeling
• pairwise sample comparison is easy and straightforward
  [figure: control and test samples compared directly]
• instead of comparing samples as such, we can construct a model for the measurements and then perform comparisons
Mathematical model of data
• try to capture the essence of a (biological) phenomenon in
mathematical terms
• here we concentrate on linear models: observation consists of
effects of one or more factors and random error
• factor may have several levels (e.g. factor sex has two levels,
male and female)
Examples of models
• single factor:
y = µ + gene + error
• two factors:
y = µ + treatment + gene + error
• two factors including interaction term:
y = µ + treatment + gene + treatment.gene + error
• four factors:
y = µ + treatment + gene + dye + array + error
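In R these models correspond directly to model formulas (a minimal sketch; the column names treatment, gene, dye and array are assumed here for illustration, not taken from a real data set):

# single factor
fit1 <- lm(y ~ gene, data=data)
# two factors
fit2 <- lm(y ~ treatment + gene, data=data)
# two factors including the interaction term treatment.gene
fit3 <- lm(y ~ treatment + gene + treatment:gene, data=data)   # equivalently y ~ treatment*gene
# four factors
fit4 <- lm(y ~ treatment + gene + dye + array, data=data)

The intercept of the fitted model corresponds to µ and the residuals to the error term.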
From model to experimental design
y = µ + drug + sex + drug.sex + error
factor 1, drug: 3 levels
factor 2, sex: 2 levels
3x2 factorial design:
              M                        F
no treatment  y111, y112, y113, y114   y121, y122, y123, y124
treatment A   y211, y212, y213, y214   y221, y222, y223, y224
treatment B   y311, y312, y313, y314   y321, y322, y323, y324
Analysis of variance
• ANOVA can be used to analyse factorial designs
y = µ + drug + sex + drug.sex + error
              M                     F
no treatment  1.0, 1.1, 0.9, 1.3    0.7, 0.5, 0.6, 0.8
treatment A   1.1, 1.2, 0.8, 1.3    0.7, 0.8, 0.6, 0.9
treatment B   2.1, 1.9, 1.7, 2.0    1.5, 1.3, 1.4, 1.1
summary(aov(y~drug*sex,data=data))

            Df  Sum Sq Mean Sq F value    Pr(>F)
drug         2 2.86750 1.43375 51.3582 3.644e-08 ***
sex          1 1.26042 1.26042 45.1493 2.673e-06 ***
drug:sex     2 0.06583 0.03292  1.1791    0.3302
Residuals   18 0.50250 0.02792
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
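The analysis can be reproduced by entering the table above as a data frame (a sketch; the object and column names are chosen here only for illustration):

# 3x2 factorial design with 4 replicates per cell: 24 observations in total,
# matching the degrees of freedom above (2 + 1 + 2 + 18 = 23 = 24 - 1)
data <- data.frame(
  drug = rep(c("0","A","B"), each=8),
  sex  = rep(rep(c("M","F"), each=4), times=3),
  y    = c(1.0,1.1,0.9,1.3, 0.7,0.5,0.6,0.8,   # no treatment, M then F
           1.1,1.2,0.8,1.3, 0.7,0.8,0.6,0.9,   # treatment A,  M then F
           2.1,1.9,1.7,2.0, 1.5,1.3,1.4,1.1))  # treatment B,  M then F
summary(aov(y ~ drug*sex, data=data))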
Multiple pairwise comparisons
• ANOVA tells us that at least one drug treatment has an effect, but in order to find out which one we perform all pairwise comparisons:
              M                     F
no treatment  1.0, 1.1, 0.9, 1.3    0.7, 0.5, 0.6, 0.8
treatment A   1.1, 1.2, 0.8, 1.3    0.7, 0.8, 0.6, 0.9
treatment B   2.1, 1.9, 1.7, 2.0    1.5, 1.3, 1.4, 1.1
TukeyHSD(aov(y~drug*sex,data=data),"drug")

  Tukey multiple comparisons of means
    95% family-wise confidence level
    factor levels have been ordered

Fit: aov(formula = y ~ drug * sex, data = data)

$drug
      diff        lwr       upr
A-0 0.0625 -0.1507113 0.2757113
B-0 0.7625  0.5492887 0.9757113
B-A 0.7000  0.4867887 0.9132113
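Here the confidence interval for A-0 contains zero, while the intervals for B-0 and B-A do not: treatment B differs significantly from both no treatment and treatment A, whereas treatment A does not differ from no treatment.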
Benefits of (good) models
• after fitting the model to data, the model can be used to answer questions such as:
– is there dye effect?
– is the difference of gene expression levels in two conditions
statistically significant?
– is there interaction between gene and another factor?
• simple pairwise sample comparisons cannot give answers to all of
these questions simultaneously
[figure: control and test samples, modeled as y = µ + F1 + F2 + ... + error]
What is a good model?
• a good model allows us to get more detailed results
• the best model and parametrization are application specific
• simple vs complex model:
  y = µ + F1 + F2 + F3 + ... + error
• there should be a balance between model complexity and the amount of data
              dye1               dye2
control       y111, y112, y113   y121, y122, y123
treatment A   y211, y212, y213   y221, y222, y223
treatment B   y311, y312, y313   y321, y322, y323
How does the number of samples affect the confidence of our results?
• measurement error is always present; see for example a self-self hybridization:
  [figure: self-self hybridization]
How does the number of samples affect the confidence of our results?
• let's compute the mean expression level of a gene
• how accurate is this value?
• variance(mean) = variance(error) / number of samples
• samples from a normal distribution (mean 0, sd 1):
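The 1/n behaviour of the variance of the mean is easy to check with a small simulation (a sketch; the sample sizes 4 and 16 and the number of repetitions are chosen only for illustration):

# draw many samples of size n from N(0,1) and look at the spread of the sample means
means4  <- replicate(10000, mean(rnorm(4)))
means16 <- replicate(10000, mean(rnorm(16)))
var(means4)    # close to 1/4  = 0.25
var(means16)   # close to 1/16 = 0.0625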
Theoretical sample size calculations
• for each statistical test, there is a (test-specific) relation between:
  – power of a test: 1 - probability(type II error)
  – significance level: probability(type I error)
  – error variance
  – mean difference needed to be detected
  – number of samples
                         actual situation:        actual situation:
                         drug has effect          drug has no effect
our conclusion:          correct conclusion       type I error
drug has effect          true positive            false positive
                         probability 1-β          probability α
our conclusion:          type II error            correct conclusion
drug has no effect       false negative           true negative
                         probability β            probability 1-α
How many samples are needed to detect a sample mean difference of 1 unit?
R function power.t.test:

> power.t.test(delta=1,power=0.95,sd=1,sig.level=0.05)

     Two-sample t test power calculation

              n = 26.98922
          delta = 1
             sd = 1
      sig.level = 0.05
          power = 0.95
    alternative = two.sided

NOTE: n is number in *each* group
What is the power of the test when using 10 samples?
R function power.t.test:

> power.t.test(n=10,delta=1,sd=1,sig.level=0.05)

     Two-sample t test power calculation

              n = 10
          delta = 1
             sd = 1
      sig.level = 0.05
          power = 0.5619846
    alternative = two.sided

NOTE: n is number in *each* group
How small a difference between sample means are we able to detect using 10 samples?
R function power.t.test:

> power.t.test(n=10,power=0.95,sd=1,sig.level=0.05)

     Two-sample t test power calculation

              n = 10
          delta = 1.706224
             sd = 1
      sig.level = 0.05
          power = 0.95
    alternative = two.sided

NOTE: n is number in *each* group
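In each of the three calls above, power.t.test solves for whichever one of n, delta, and power is left out of the call: here n, power, and delta, respectively.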
Two kinds of replicates
• biological replicates: biological variability
• technical replicates: measurement accuracy
• most statistical programs assume independent samples
[diagram: replicates 1-3 of each sample A, B, C, D]
Pooling
[diagram: individual samples A1-A3 and B1-B3 combined into pooled samples]
Pooling
• ok when the interest is not in the individual but in common patterns across individuals (population characteristics)
• results in averaging → reduces variability → substantive features are easier to find
• recommended when fewer than 3 arrays are used in each condition
• beneficial when many subjects are pooled
• one pool vs independent samples in multiple pools: inference for most genes was not affected by pooling
  C. Kendziorski, R. A. Irizarry, K.-S. Chen, J. D. Haag, and M. N. Gould, "On the utility of pooling biological samples in microarray experiments", PNAS, March 2005, 102(12):4252-4257
How to allocate the samples to microarrays?
[diagram: samples A, B, C, D to be allocated to two-color arrays]
• which samples should be hybridized on the same slide?
• different experimental designs
• reference design, loop design
• what is the optimal design?
Example of four-array experiment
[diagram: samples A and B hybridized on arrays 1-4 with dye swaps]

array   cy3   cy5   log(cy5/cy3)
1       A     B     log(B) - log(A)
2       A     B     log(B) - log(A)
3       B     A     log(A) - log(B)
4       B     A     log(A) - log(B)
Reference design
[diagram: samples A, B, C, D each hybridized against a common reference Ref on arrays 1-4]

array   cy3   cy5   log(cy5/cy3)
1       Ref   A     log(A) - log(Ref)
2       Ref   B     log(B) - log(Ref)
3       Ref   C     log(C) - log(Ref)
4       Ref   D     log(D) - log(Ref)

indirect comparison of C and A:
log(C/A) = log(C) - log(A)
         = log(C) - log(Ref) + log(Ref) - log(A)
         = log(C) - log(Ref) - (log(A) - log(Ref))
         = logratio(array3) - logratio(array1)
Loop design
[diagram: samples A, B, C, D hybridized in a loop A-B-C-D-A on arrays 1-4]

array   cy3   cy5   log(cy5/cy3)
1       A     B     log(B) - log(A)
2       B     C     log(C) - log(B)
3       C     D     log(D) - log(C)
4       D     A     log(A) - log(D)

indirect comparison of C and A along the two paths of the loop:
log(C/A) = log(C) - log(B) + log(B) - log(A)
         = logratio(array2) + logratio(array1)
log(C/A) = log(C) - log(D) + log(D) - log(A)
         = -logratio(array3) - logratio(array4)
averaging the two estimates:
log(C/A) = (estimate from arrays 1 and 2 + estimate from arrays 3 and 4) / 2
Comparing the designs
[diagrams: reference design, reference design with replicates, and loop design for samples A, B, C]

                        reference   reference design   loop
                        design      with replicates    design
number of arrays        3           6                  3
amount of RNA
required per sample     1+Ref       2+Ref              2
error                   2.0         1.0                0.67
Design with all direct pairwise comparisons
[diagram: samples connected pairwise by arrays 1-6, covering all direct comparisons]
Example: examining genotype, phenotype, and environment
[diagram: parental-stressed, derived-stressed, parental-unstressed, and derived-unstressed samples hybridized against a reference sample; the axes of variation are genotype, environment, and assay variation]
Optimal design
• maximize the accuracy of parameters of interest
• procedure: enumerate all possible designs, calculate
the parameter accuracy for each of them and select
the best design
• optimal design is model specific
About the nature of microarray data
• microarray data can give hypotheses to be tested further
• results from microarray analysis should be verified by other means (qPCR, ...)
• quality of microarray data depends on samples, probes, hybridization, and lab work
• data pre-processing, normalization, and outlier detection are as important as good experimental design
More about statistics
• M.J. Crawley: "Statistics – An Introduction Using R", John Wiley & Sons, 2005
• S.A. Glantz: "Primer of Biostatistics", McGraw-Hill, 5th ed., 2002
• D.C. Montgomery: "Design and Analysis of Experiments", John Wiley & Sons, 5th ed., 2001
• Google