Fei Ye_08.07.2014x - Vanderbilt University School of Medicine


Statistical Considerations in High-Throughput RNAi Screens
for Identifying Genes Mediating Sensitivity to
Chemotherapeutic Drugs
Fei Ye, PhD
Vanderbilt Center for Quantitative Sciences
Department of Biostatistics, Vanderbilt School of Medicine
CQS Summer Institute 2014
Background
• RNA interference (RNAi): a valuable tool for modulating gene
expression through the introduction of short RNA molecules, including
small interfering RNAs (siRNAs) and short hairpin RNAs (shRNAs).
• RNAi has emerged as a powerful technology to knock down specific
genes for functional analysis and for therapeutic purposes, particularly
because we now know much more about specific genes involved in
disease processes.
• RNAi studies conducted with human tumor cell lines using synthetic
siRNAs/shRNAs targeting defined gene families or genome-wide
libraries have identified modulators of drug sensitivity (hits).
• Large-scale systematic RNAi screens aim to test hundreds, or even
thousands, of siRNAs/shRNAs to identify hits.
What is RNA interference (RNAi)?
Challenges in analyzing high-throughput RNAi data
• Efficiently and accurately identify genes whose loss significantly
reduces or increases cell viability in response to chemotherapy.
• Reduce false positives and false negatives:
- in the wet lab: (1) technical and procedural improvements;
(2) more replicate measurements;
- in the statistical analysis: (1) control the type I error; (2) ensure
sufficient power; (3) account for the combined effects of RNAi and the
chemotherapeutic drug.
RNAi screening approach identifies genes that increase sensitivity to paclitaxel in breast cancer cells
Real data analysis - background
• Paclitaxel is a potent anti-microtubule agent used to treat
patients with locally advanced and metastatic breast cancer.
Despite its wide use, paclitaxel-based chemotherapy produces a
complete response in only a small proportion of patients; many
patients respond incompletely or are resistant to treatment.
Real data analysis – materials and methods
• Performed a loss-of-function RNAi screen to identify genes
that modulate paclitaxel sensitivity.
• Targeted a subset of genes (n=428) frequently found to be
"deregulated" in breast cancers and known to be associated with a
targeted pharmacological agent.
• Selected 36 candidate genes by screening, and designed two
independent siRNAs for each of these genes to validate in two
breast cancer cell lines, MDA-MB-231 and MDA-MB-468.
shRNA screen: transfect with a subset of the human genome (pGIPZ shRNAmir plasmid library, n=1078); divide into vehicle control (DMSO) and 5 nM paclitaxel; ~20,000 cells; incubate 72-96 h.
siRNA screen: reverse-transfect breast cancer cell lines MDA-MB-231 and MDA-MB-468; divide into vehicle control (DMSO) and 5 nM paclitaxel.
The entire experiment was performed in triplicate.
Real data analysis – preprocessing and normalization
• Sources of noise.
• Ideally, mixed-effects models should be used to estimate both fixed effects
(RNAi and treatment) and random effects (batch, plate, day, etc.).
• Baseline correction.
• Within-plate normalization: data from each plate were normalized to a
non-silencing (NS) control, to give a relative measurement of target-gene
knockdown effect and to control for the effects of siRNA transfection.
Other approaches: ‘%control’, ‘normalized %inhibition’, Z score, B score...
• Across-plate normalization: make measurements comparable across
culture plates by removing systematic plate-to-plate variation.
Approaches: median/mean centering, standardization methods, etc.
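The two normalization steps can be sketched in Python as follows. This is a minimal illustration, not the study's pipeline: it assumes each plate is stored as a dict from a well ID to a raw viability reading, uses the mean of the NS-control wells as the reference, and implements median centering multiplicatively (dividing by the plate median). The well IDs and function names are hypothetical.

```python
from statistics import mean, median

def normalize_to_ns(plate, ns_wells):
    """Within-plate step: express every well relative to the plate's NS-control mean."""
    ns_mean = mean(plate[w] for w in ns_wells)
    return {well: value / ns_mean for well, value in plate.items()}

def median_center(plates):
    """Across-plate step: rescale each plate so that its median becomes 1.

    This is a multiplicative form of median centering (an assumption);
    subtracting the median of log-transformed values is a common alternative.
    """
    centered = []
    for plate in plates:
        m = median(plate.values())
        centered.append({well: value / m for well, value in plate.items()})
    return centered

# Hypothetical plate with two NS control wells and two targeting wells.
plate1 = {"NS1": 2.0, "NS2": 2.2, "A1": 1.05, "A2": 2.6}
normalized = normalize_to_ns(plate1, ["NS1", "NS2"])
```

Here well A1 ends up at half the NS-control level, i.e. a normalized value of 0.5.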
• Robustness? Well-to-well variation (replicate plates, controls, shRNA/siRNA): within-plate normalization?
• Replicability? Plate-to-plate variation (replicate plates, controls, shRNA/siRNA): global normalization?
• Reproducibility? Experimental variation.
*Non-targeting siRNA control: does not target any mRNA in the genome.
$$\%\ \text{cell growth} = \frac{\text{siRNA for gene } x}{\text{non-targeting siRNA}}$$
Spearman's ρ = 0.79-0.89
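Reproducibility between replicate experiments can be summarized with Spearman's rank correlation, i.e. the Pearson correlation of the ranks. A minimal stdlib sketch follows, assuming one normalized value per siRNA in each replicate; tied values receive the average of their ranks. The function names are illustrative.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)
```

Because only ranks enter the calculation, the statistic is insensitive to monotone differences in scale between replicate runs.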
Sensitivity Index
• To identify genes that, when targeted, promote paclitaxel sensitivity or
resistance, we calculated a sensitivity index (SI) score for each shRNA. The
SI score accounts for the individual effects of the shRNA and the drug on
cell viability, as well as their combined effect.
Effect of the siRNA or shRNA: Rc/Cc
Effect of the drug: Cd/Cc
Expected combined effect (ECE): (Rc/Cc) × (Cd/Cc)
Observed combined effect (OCE): Rd/Cc
$$\mathrm{SI} = \mathrm{ECE} - \mathrm{OCE} = \frac{R_c}{C_c}\times\frac{C_d}{C_c} - \frac{R_d}{C_c}$$
SI > 0: sensitive; SI < 0: resistant. Range: from -1 to 1.
Rc: untreated RNA; Cc: untreated control; Cd: drugged control; Rd: drugged RNA.
Swanton et al., Cancer Cell 11, 498-512 (2007)
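A direct transcription of the SI formula into Python, assuming each argument is a single viability summary for that condition (for example, the mean over replicate wells). The input values in the example are made up for illustration.

```python
def sensitivity_index(rc, cc, cd, rd):
    """SI = ECE - OCE, with every effect expressed relative to the untreated control Cc.

    rc: untreated RNAi; cc: untreated control; cd: drugged control; rd: drugged RNAi.
    """
    rnai_effect = rc / cc            # effect of the siRNA/shRNA alone
    drug_effect = cd / cc            # effect of the drug alone
    ece = rnai_effect * drug_effect  # expected combined effect if the two act independently
    oce = rd / cc                    # observed combined effect
    return ece - oce                 # > 0: sensitizing, < 0: resistance

# Hypothetical readings: ECE = 0.9 * 0.8 = 0.72, OCE = 0.5, so SI is about 0.22.
si = sensitivity_index(rc=0.9, cc=1.0, cd=0.8, rd=0.5)
```

Note that the SI is computed from single summary values, so replicate-to-replicate variation does not enter the score.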
Selection of hits from shRNA screen
• Each gene has 2-11 shRNAs/clones.
• A bootstrap algorithm was used to estimate the variability of
the mean SI level for genes with > 3 shRNAs by randomly
sampling from all shRNAs of that gene with replacement. The
corresponding 95% bootstrap C.I. was calculated for each of
these genes.
• The mean SI value was calculated for the genes with ≤ 3
shRNAs. Hits represented by these genes were selected with a
more stringent cutoff.
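The bootstrap step described above can be sketched as a percentile bootstrap of the mean SI over a gene's shRNAs. This is a minimal illustration under assumptions: the number of resamples, the percentile method, and the seed are choices made here, and the rule used to call a hit from the interval is not specified in the slides, so none is implemented.

```python
import random

def bootstrap_mean_ci(si_scores, n_boot=10000, level=0.95, seed=0):
    """Percentile bootstrap CI for the mean SI of one gene's shRNAs."""
    rng = random.Random(seed)
    k = len(si_scores)
    # Resample the gene's shRNAs with replacement; record each resample's mean SI.
    boot_means = sorted(sum(rng.choices(si_scores, k=k)) / k for _ in range(n_boot))
    tail = (1 - level) / 2
    lower = boot_means[int(round(tail * n_boot))]
    upper = boot_means[int(round((1 - tail) * n_boot)) - 1]
    return sum(si_scores) / k, (lower, upper)

# Hypothetical SI scores for the five shRNAs of one gene.
mean_si, (ci_low, ci_high) = bootstrap_mean_ci([0.20, 0.25, 0.30, 0.15, 0.22])
```

With only a few shRNAs per gene the resamples take few distinct values, which is one reason genes with ≤ 3 shRNAs are handled separately.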
Plot of SI scores for all shRNAs
Paclitaxel sensitivity index for indicated
genes from shRNA screen
Top sensitizing gene targets from siRNA screen and
the corresponding chemical inhibitors
Statistical approaches
• Methods used to identify genes whose knockdown sensitizes cells to the drug or confers resistance include:
- fold change
- t test, Z-factor, and their variants
- Wilcoxon rank-sum test
- sensitivity index (SI)
• For these methods, it is unclear whether:
(1) the drug effect, the RNAi effect, and their interaction are all considered;
(2) variation among replicates is taken into account in the estimation;
(3) decision error rates (false-positive and false-negative) are appropriately
controlled.
A Linear Model with an Interaction Term
• Disadvantage of SI: it ignores the variation among replicates.
• Assuming a normal distribution, we can model cell viability (Y) for each
siRNA with the explanatory variables treatment (x1, yes/no), RNAi (x2,
yes/no), and their interaction term (x1x2):
D1: Y = a + b1*x1 + b2*x2 + b3*x1x2 + err
D2: Y = a’ + b’1*x1 + err
• A chi-square statistic with 2 degrees of freedom is then calculated from
the difference between the residual deviance of the reduced model D2
(drug effect only) and that of the full model D1. This deviance
(likelihood-ratio) test evaluates the RNAi effect and the interaction
jointly (H0: b2 = b3 = 0).
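Because x1 and x2 are both binary, D1 has one parameter per drug-by-RNAi cell and fits the four cell means exactly, while D2 fits only the drugged and undrugged means. The sketch below uses this to compute the 2-df test with the standard library, assuming Gaussian errors and estimating the dispersion from D1 (the usual convention, e.g. in R's anova for Gaussian GLMs). The argument names are hypothetical.

```python
import math
from statistics import mean

def rss(groups):
    """Residual sum of squares when each group is fitted by its own mean."""
    return sum(sum((y - mean(g)) ** 2 for y in g) for g in groups)

def interaction_lrt(cc, cd, rc, rd):
    """Deviance test of D1 (drug + RNAi + drug:RNAi) against D2 (drug only).

    cc, cd, rc, rd: replicate viability values for untreated control,
    drugged control, untreated RNAi and drugged RNAi wells.
    """
    n = len(cc) + len(cd) + len(rc) + len(rd)
    rss_full = rss([cc, cd, rc, rd])        # D1: one mean per cell of the 2x2 design
    rss_reduced = rss([cc + rc, cd + rd])   # D2: one mean per drug condition
    if rss_full == 0:
        raise ValueError("zero residual variance; dispersion cannot be estimated")
    dispersion = rss_full / (n - 4)         # residual variance estimated from D1
    chisq = (rss_reduced - rss_full) / dispersion
    p_value = min(1.0, math.exp(-chisq / 2))  # chi-square(2 df) upper tail is exp(-x/2)
    return chisq, p_value
```

Unlike the SI, this statistic is scaled by the replicate variance, so noisy siRNAs need a larger effect to reach the same p-value.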
A simulation study
• Number of true hits: Uniform{10, 11, …, 60} out of 900+ genes.
• Viability measurements of non-hits: N(µNH, σ²), with σ ∈ {0.2, 0.4, 0.6, 0.8}.
• True hits follow a distribution whose mean is shifted relative to the
non-hits: N(µNH·C, σ²), with C > 1 for an antagonizing effect and C < 1
for a sensitizing effect.
• The parameter D was used to tune the strength of the treatment effect.
• Parameter K (> 1) was defined such that non-silencing control wells have a
distribution with mean µctl = µrna·K, where µrna = µNH (untreated) or
µNH·D (drug-treated).
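One simulated screen under these settings might be generated as below. This is a hedged sketch, not the study's simulation code: the default values of µNH (0.8), C, D, K and the number of replicates are hypothetical, the hit shift C is assumed to act only on drug-treated RNAi wells, and each gene is assumed to get its own set of control wells.

```python
import random

def simulate_screen(n_genes=900, mu_nh=0.8, sigma=0.2, C=0.6, D=0.7, K=1.2,
                    n_rep=3, seed=1):
    """Simulate one screen; returns (data, true_hits).

    Hypothetical assumptions: C shifts only drug-treated RNAi wells of true hits,
    and every gene has its own control wells. data[g] maps condition -> replicates.
    """
    rng = random.Random(seed)
    n_hits = rng.randint(10, 60)                 # Uniform{10, ..., 60}, inclusive
    true_hits = set(rng.sample(range(n_genes), n_hits))

    def draw(mu):
        return [rng.gauss(mu, sigma) for _ in range(n_rep)]

    data = {}
    for g in range(n_genes):
        mu_rc = mu_nh                            # untreated RNAi wells
        mu_rd = mu_nh * D                        # drug-treated RNAi wells (non-hit)
        if g in true_hits:
            mu_rd *= C                           # shifted mean for a true hit
        data[g] = {
            "Rc": draw(mu_rc),
            "Rd": draw(mu_rd),
            "Cc": draw(mu_nh * K),               # untreated NS control: mu_ctl = mu_rna * K
            "Cd": draw(mu_nh * D * K),           # drug-treated NS control
        }
    return data, true_hits
```

Each simulated dataset can then be scored by every candidate method and compared against `true_hits`.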
FPR & FNR
| | Truth + | Truth – | Total |
| Test (SI) + | TP = #Agreed | FP = #Claimed Pos - #Agreed | # claimed positives |
| Test (SI) – | FN = #True Hits - #Agreed | TN = N - TP - FP - FN | |
| Total | # true hits (10~60) | | N (900+) |
FNR = FN/(TP+FN) = 1 - sensitivity
FPR = FP/(FP+TN) = 1 - specificity
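The table's bookkeeping translates directly into code. A minimal sketch, with hypothetical argument names for the four counts the table is built from:

```python
def error_rates(n_agreed, n_claimed, n_true_hits, n_total):
    """FPR and FNR from the counts in the confusion table."""
    tp = n_agreed                   # claimed hits that are true hits
    fp = n_claimed - n_agreed       # claimed hits that are not true hits
    fn = n_true_hits - n_agreed     # true hits that were missed
    tn = n_total - tp - fp - fn     # everything else
    fpr = fp / (fp + tn)            # 1 - specificity
    fnr = fn / (tp + fn)            # 1 - sensitivity
    return fpr, fnr

# Hypothetical run: 40 claimed hits, 30 of them among 50 true hits, N = 900.
fpr, fnr = error_rates(n_agreed=30, n_claimed=40, n_true_hits=50, n_total=900)
```

In this example FNR = 20/50 = 0.4 and FPR = 10/850, roughly 0.012; with few true hits among many genes, FPR stays small even when a sizable fraction of claimed hits are false.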
Power analysis: weak drug effect (low concentration)
Panels: low σ, low D, high C; moderate σ, low D, high C; high σ, low D, high C
Power analysis: strong drug effect (high concentration)
Panels: low σ, high D, high C; moderate σ, high D, high C; high σ, high D, high C
WHY?
In the Case of Skewed Data
• Gamma distributions Ga(r, λ) were used instead of normal distributions.
• The shape (r) and scale (λ) parameters of the gamma distributions
were calculated by solving µ = rλ and σ² = rλ².
• The skewness (2/√r) was set to 0.5, 1, 1.5, or 2.
Panels (unskewed vs. skewed), parameters for untreated non-hits:
Skewness = 1: scale = 0.2, shape = 4
Skewness = 1.5: scale = 0.45, shape = 1.78
Skewness = 2: scale = 0.8, shape = 1
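Solving µ = rλ and σ² = rλ² gives λ = σ²/µ and r = µ²/σ². The three panel settings are consistent with µ = 0.8 and σ = 0.4, 0.6 and 0.8, which the sketch below uses as a check; the function names are illustrative. Python's `random.gammavariate(alpha, beta)` takes the shape and the scale, so the parameters can be used directly for sampling.

```python
import math
import random

def gamma_params(mu, sigma):
    """Moment matching: solve mu = r*lam and sigma**2 = r*lam**2 for (r, lam)."""
    lam = sigma ** 2 / mu        # scale
    r = mu / lam                 # shape, equal to (mu / sigma) ** 2
    return r, lam

def gamma_skewness(r):
    """Skewness of a gamma distribution depends only on its shape: 2 / sqrt(r)."""
    return 2 / math.sqrt(r)

# Untreated non-hits in the skewness = 1.5 panel: shape about 1.78, scale 0.45.
r, lam = gamma_params(0.8, 0.6)
rng = random.Random(1)
draws = [rng.gammavariate(r, lam) for _ in range(5)]
```

Since skewness depends only on r, raising σ at fixed µ lowers the shape parameter and makes the simulated data more skewed.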
Summary points of skewed data
• Ratio-based methods can be unstable: the SI method is the most
affected by skewness.
• The t-test is more affected by skewness when the number of
replicates is small.
• LM is quite stable.
• Fold change (FC) is very unstable.
• Transform the data if they are heavily skewed.
Recommendations
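| Number of replicates | Noise (a) | Drug effect (b) | siRNA effect (c) | Recommended method(s) |
| 3 | Low | High | High | LM |
| 3 | Moderate | High | High | LM |
| 3 | High | High | High | LM |
| 3 | Low | Moderate | High | LM |
| 3 | Low | Low | High | LM |
| 3 | Moderate | Low | High | LM |
| 3 | High | Low | High | LM |
| 3 | Low | High | Moderate | LM |
| 3 | Low | High | Low | LM |
| 6 | Low | High | High | LM |
| 6 | Moderate | High | High | LM |
| 6 | High | High | High | LM |
| 6 | Low | Moderate | High | LM |
| 6 | Low | Low | High | LM |
| 6 | Moderate | Low | High | LM |
| 6 | High | Low | High | LM |
| 6 | Low | High | Moderate | LM |
| 6 | Low | High | Low | LM |
| 9 | Low | High | High | LM |
| 9 | Moderate | High | High | LM, SI |
| 9 | High | High | High | LM, SI |
| 9 | Low | Moderate | High | LM |
| 9 | Low | Low | High | LM |
| 9 | Moderate | Low | High | LM |
| 9 | High | Low | High | LM |
| 9 | Low | High | Moderate | LM |
| 9 | Low | High | Low | LM |
| 12 | Low | High | High | SI |
| 12 | Moderate | High | High | SI |
| 12 | High | High | High | SI |
| 12 | Low | Moderate | High | LM |
| 12 | Low | Low | High | LM |
| 12 | Moderate | Low | High | LM |
| 12 | High | Low | High | LM |
| 12 | Low | High | Moderate | LM |
| 12 | Low | High | Low | LM |
(a) Noise can be measured by the coefficient of variation (CV) or the variance-to-mean ratio (VMR): VMR < 0.2, low noise; 0.2 ≤ VMR < 0.5, moderate noise; VMR ≥ 0.5, high noise.
(b) Drug effect can be estimated by Cd/Cc.
(c) RNAi effect can be estimated by Rc/Cc.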
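The quantities in the footnotes can be computed from replicate data before choosing a method. A minimal sketch, assuming normalized viability values and the sample variance (n - 1 denominator); the thresholds are those of footnote (a), while the function names are hypothetical. The slides give no numeric cutoffs for "high", "moderate" or "low" drug and RNAi effects, so the effect ratios are returned without classification.

```python
from statistics import mean, stdev, variance

def noise_level(replicates):
    """Classify noise by the variance-to-mean ratio (VMR) using footnote (a)."""
    m = mean(replicates)
    vmr = variance(replicates) / m     # sample variance assumed
    cv = stdev(replicates) / m         # coefficient of variation, the alternative measure
    if vmr < 0.2:
        label = "Low"
    elif vmr < 0.5:
        label = "Moderate"
    else:
        label = "High"
    return label, vmr, cv

def effect_ratios(cc, cd, rc):
    """Drug effect Cd/Cc and RNAi effect Rc/Cc from (mean) viability readings."""
    return cd / cc, rc / cc
```

Because the VMR is not scale-free, these thresholds only make sense for data on a comparable scale, such as values normalized to the NS control.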
Increase the power of your study
Possible ways to improve the power of your study:
• Choose a proper statistical method that is most powerful for
your study.
• Reduce the variability.
• Increase #replicates.
• Use a relatively lower concentration of the drug if appropriate.
“…Here we show that several of these targets sensitize lung
cancer cells to paclitaxel concentrations 1,000-fold lower than
otherwise required for a significant response, and we identify
mechanistic relationships …” (Whitehurst et al., Nature 2007)