Dynamic Treatment Regimes, STAR*D & Voting
Revisiting an Old Topic:
Probability of Replication
D. Lizotte, E. Laber & S. Murphy
Johns Hopkins Biostatistics
September 23, 2009
Outline
• Scientific Background
• Our Estimand: Probability of Selection
• Estimators
• STAR*D
• Where to go from here?
Scientific Background
First experiment results in [formula missing] or [formula missing];
what is the chance that we will replicate this result
in a subsequent experiment?
Prob. of Concurrence or Prob. of Replication
Killeen (2005), followed by great controversy in
psychology (Cumming (2005, 2006, 2008); MacDonald
(2005); Doros & Geier (2005); Iverson (2008); Iverson, Wagenmakers
& Lee (2008); Ashby & O’Brien (2008); Iverson, Lee & Wagenmakers
(2009); …)
Scientific Background
Similar problem but discredited:
• Post-hoc power / observed power: assuming
the observed standardized effect size is the
truth, calculate the probability of rejecting
the null hypothesis. Hoenig & Heisey (2001)
Scientific Background
• First experiment results in [formula missing] or [formula missing];
what is the chance that we will replicate this result
in a subsequent experiment?
• Why is this question so attractive?
• Scientists (including statisticians!) often want to
answer this question with 1 – p-value
Scientific Background
• First experiment results in [formula missing] or [formula missing];
what is the chance that we will replicate this result
in a subsequent experiment?
• 1 – p-value does not address this question.
– Goodman (1992), Cumming (2008)
– 1 – p-value is not an estimator.
Scientific Background
• Much confusion about estimand:
– [formula missing], what is the chance that we will replicate this
result in a subsequent experiment?
• Do we want to “estimate”
1) [formula missing] or
2) [formula missing] or
3) [formula missing] or
4) [formula missing]?
• Good frequentist properties are desired.
Our Estimand
• Probabilities of Selection
2) [formula missing]
• The probability of selection is a composite measure
of signal, noise, and sample size
Our Estimand
• Advantages (The Hope) over the concept of p-value
– Close to what many scientists want.
– The intuitive interpretation is correct.
– Does not rely on the correctness of a data generating
model for meaning.
– Less ambitious than 3)
• Disadvantages
– We changed the question.
– Some may think that there is no need for a confidence
interval; this is wrong.
– Non-regular
Estimators
• Why is this a hard problem?
– The desire for good frequentist properties
– The fact that effect sizes tend to be small relative to
the noise.
– This is a non-regular problem—bias is of the same
order as variance.
• Back of the envelope calculations:
Estimators
• [formula missing]
• Use plug-in estimator [formula missing]
• Plug-in estimator is 1 – p-value (Goodman, 1992)!
– Nonregular
• Near a uniform distribution if [condition missing]
• If n is large, close to 0 or 1 otherwise
– We can expect [quantity missing] to be small.
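The non-regular behaviour described above can be seen in a small simulation. This is a minimal sketch, not the authors' code: it assumes the estimand is the chance that the estimated difference between two group means is positive, and that the plug-in estimator is the standard normal CDF evaluated at the two-sample z-statistic (that is, 1 – one-sided p-value). The names `plug_in_estimate` and `simulate`, the effect sizes, and the sample sizes are all illustrative choices.

```python
import random
from statistics import NormalDist, fmean, stdev

STD_NORMAL = NormalDist()  # standard normal, used for Phi


def plug_in_estimate(treated, control):
    """Phi(z) for the two-sample z-statistic, i.e. 1 - one-sided p-value."""
    diff = fmean(treated) - fmean(control)
    se = (stdev(treated) ** 2 / len(treated)
          + stdev(control) ** 2 / len(control)) ** 0.5
    return STD_NORMAL.cdf(diff / se)


def simulate(effect, n, reps, seed=0):
    """Repeat the experiment `reps` times; return the plug-in estimates."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        treated = [rng.gauss(effect, 1.0) for _ in range(n)]
        control = [rng.gauss(0.0, 1.0) for _ in range(n)]
        estimates.append(plug_in_estimate(treated, control))
    return estimates


# Assumed scenario 1: no true effect, 25 subjects per group.
null_draws = simulate(effect=0.0, n=25, reps=2000)
# Assumed scenario 2: nonzero effect, 400 subjects per group.
large_n_draws = simulate(effect=0.5, n=400, reps=300)

# With no effect the estimates scatter over (0, 1) instead of settling down.
share_in_middle = sum(0.25 < p < 0.75 for p in null_draws) / len(null_draws)
# With a nonzero effect and large n the estimates pile up near 1.
share_near_one = sum(p > 0.99 for p in large_n_draws) / len(large_n_draws)
print(share_in_middle, share_near_one)
```

Under the no-effect scenario roughly half of the estimates fall in the middle half of the unit interval, which is what a near-uniform distribution implies; no amount of additional data concentrates them.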
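Estimators
• Try a Bayesian approach.
– Random sample [formula missing] from a [distribution missing]
– Flat prior on [parameter missing], known [parameter missing]
– Use [formula missing] as an estimator of [formula missing]
– [formula missing]
• Bayesian methods do not eliminate non-regularity.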
Estimators
Focus on MSE in formulating estimators for [estimand missing]
1) Assume [estimator missing] is approximately normal with mean
[formula missing] and variance [formula missing].
– Flat prior (e.g. Killeen’s prep)
– Normal prior: [formula missing]
– Prior is a mixture of N(0,1), with probability w, and a point
mass on [value missing], with probability 1 – w
Estimators
Focus on MSE in formulating estimators for [estimand missing]
2) Single bootstrap (Efron & Tibshirani, 1989): [formula missing]
• This is 1 – p-value. No assumption of approximate
normality. If [estimator missing] is approximately normal then this is
approximately the plug-in estimator: [formula missing]
3) Double bootstrap: [formula missing]
• This is a bagged plug-in estimator. This bags 1 – bootstrap
p-value. No assumption of approximate normality.
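The two bootstrap estimators can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the statistic is taken to be a difference in two group means, the single bootstrap is the share of resamples in which the treated mean is larger, and the double bootstrap averages that share over an outer layer of resamples. The function names and the resample counts are hypothetical.

```python
import random
from statistics import fmean


def resample(values, rng):
    """Draw a bootstrap resample (same size, with replacement)."""
    return [rng.choice(values) for _ in values]


def vote_share(treated, control, rng, n_boot):
    """Single bootstrap: share of resamples in which the treated mean wins."""
    votes = 0
    for _ in range(n_boot):
        if fmean(resample(treated, rng)) > fmean(resample(control, rng)):
            votes += 1
    return votes / n_boot


def double_bootstrap(treated, control, rng, n_outer, n_inner):
    """Double bootstrap: average the inner vote share over outer resamples."""
    shares = []
    for _ in range(n_outer):
        outer_treated = resample(treated, rng)
        outer_control = resample(control, rng)
        shares.append(vote_share(outer_treated, outer_control, rng, n_inner))
    return fmean(shares)


# Simulated data, two groups of 25 (an assumed example, not study data).
rng = random.Random(1)
treated = [rng.gauss(0.6, 1.0) for _ in range(25)]
control = [rng.gauss(0.0, 1.0) for _ in range(25)]

single = vote_share(treated, control, rng, n_boot=500)
double = double_bootstrap(treated, control, rng, n_outer=100, n_inner=100)
print(single, double)
```

Averaging over the outer resamples is the bagging step: it smooths the single-bootstrap share, which is what makes the estimator less sensitive to small changes in the data.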
Why a double bootstrap?
Double bootstrap estimator for [estimand missing].
• Bagging is used to trade variance for bias when
estimators are unstable (Bühlmann & Yu, 2002).
• The bootstrap estimator of [estimand missing] is
unstable; if [condition missing], it does not converge
as the sample size increases.
• Under local alternatives such as [formula missing], the
bootstrap estimator is inconsistent as well.
Double Bootstrap
Double bootstrap estimator for [estimand missing].
If [estimator missing] has an approximate normal distribution then the
double bootstrap estimator is [formula missing].
That is, the double bootstrap reduces to prep in this
case.
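A short derivation suggests why the reduction to prep holds. The notation is assumed, not taken from the slides: z denotes the observed standardized statistic, z* its bootstrap replicate (approximately N(z, 1) under normality), Φ the standard normal CDF, and Z, Z' independent standard normal variables. Bagging the plug-in estimator Φ(z*) then gives:

```latex
\begin{aligned}
\mathbb{E}\left[\Phi(z^{*})\right]
  &= \mathbb{E}\left[\Phi(z + Z)\right]
   = \Pr\left(Z' \le z + Z\right) \\
  &= \Pr\left(Z' - Z \le z\right)
   = \Phi\!\left(\frac{z}{\sqrt{2}}\right),
  \qquad \text{since } Z' - Z \sim N(0, 2).
\end{aligned}
```

The right-hand side is the usual closed form given for Killeen's prep; the shrinkage by the factor 1/√2 pulls the estimate toward 1/2 relative to the plug-in value Φ(z).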
MSE Plots
• Two groups, each of size 25
• Two distributions (normal, bimodal)
• Two definitions of [quantity missing]
– [formula missing]
– [formula missing]
• Compare
– prep, pnorm, pmix, single bootstrap, double
bootstrap
Estimators
Instead of a point estimator, consider a confidence
interval for [estimand missing].
Assume [estimator missing] has an approximate normal distribution;
then [formula missing].
In this case a confidence interval for [estimand missing]
can be found from a confidence interval for the
standardized effect size: [formula missing]
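One way to read this construction: if the estimand is a monotone function (the normal CDF) of the standardized effect, then the endpoints of an interval for the standardized effect map directly to endpoints for the estimand. The sketch below assumes a two-group difference in means, treats the standard error as known, and uses a hypothetical function name; it is an illustration of the idea rather than the interval used in the talk.

```python
from statistics import NormalDist, fmean, stdev

STD_NORMAL = NormalDist()  # standard normal, used for Phi and its inverse


def selection_probability_interval(treated, control, level=0.95):
    """Map a z-interval for the standardized effect through Phi."""
    diff = fmean(treated) - fmean(control)
    se = (stdev(treated) ** 2 / len(treated)
          + stdev(control) ** 2 / len(control)) ** 0.5
    z = diff / se
    half_width = STD_NORMAL.inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    # Phi is increasing, so the interval endpoints carry over directly.
    return STD_NORMAL.cdf(z - half_width), STD_NORMAL.cdf(z + half_width)


# Identical groups give z = 0, the least informative case.
lower, upper = selection_probability_interval(
    [1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]
)
print(lower, upper)
```

When the observed difference is zero the interval covers almost the whole unit interval, which illustrates why a point estimate alone can be misleading here.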
STAR*D
• Sequenced Treatment Alternatives to Relieve
Depression
• Large multi-site study focused on individuals
whose depression did not remit with citalopram
• In this trial each individual can proceed through
up to 4 stages of treatment. The individual
moves to a next stage if the individual is not
responding to present treatment.
• Each stage involves a randomization.
STAR*D
• These are data from 683 individuals who did not
respond to citalopram and preferred a switch in
treatment.
• These individuals were randomized between
Venlafaxine, Bupropion, Sertraline
• Outcome: Time until remission.
• We model the area under the survival curve
from entry into this stage of treatment until 30
months (i.e., min(T, 30)).
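The link between min(T, 30) and the area under the survival curve can be written out explicitly. In this sketch T is the time until remission, τ = 30 is the truncation time, and S(t) = Pr(T > t) is the survival function; τ and S are notation introduced here for illustration.

```latex
\min(T, \tau) = \int_{0}^{\tau} \mathbf{1}\{T > t\}\, dt
\quad\Longrightarrow\quad
\mathbb{E}\left[\min(T, \tau)\right]
  = \int_{0}^{\tau} \Pr(T > t)\, dt
  = \int_{0}^{\tau} S(t)\, dt .
```

So modeling the mean of the truncated time is the same as modeling the area under the survival curve up to the truncation time.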
STAR*D
Regression formula at level 2: [formula missing]
STAR*D
• For each s, [formula missing]
• Double Bootstrap
– Inner-most bootstrap counts proportion of “votes”
in which [condition missing]
– Outer-most bootstrap averages this proportion
across the bootstrap samples
Discussion
• Definition of the probability of selection when
there are more than two treatments.
• Confidence intervals for comparisons between
more than two treatments.
• Is there a minimax estimator of the selection
probability?
• Is there hope for the replication probability?
Truth in Advertising:
STAR*D
Missing Data + Study Drop-Out
• 1200 subjects begin level 2 (i.e., stage 1)
• 42% study dropout during level 2
• 62% study dropout by 30 weeks.
• Approximately 13% item missingness for
important variables observed after the start
of the study but prior to dropout.
This seminar can be found at:
http://www.stat.lsa.umich.edu/~samurphy/seminars/HopkinsBiostat09.23.09.ppt
Email me with questions or if you would like a
copy!
[email protected]
Our Estimand
• The probability of selection is a composite measure
of signal, noise and sample size
• The p-value is a composite measure of estimated
signal, estimated noise and sample size.