SMART Experimental Designs for Developing Adaptive Treatment


Sequential, Multiple Assignment, Randomized Trials and Treatment Policies
S.A. Murphy
MUCMD, 08/10/12
Outline
• Treatment Policies
• Sequential Multiple Assignment Randomized Trials, "SMART Studies"
• Q-Learning / Fitted Q Iteration
• Where we are going…
Treatment Policies are individually tailored treatments, with treatment type and dosage changing according to patient outcomes. They operationalize sequential decision making in clinical practice.
• $k$ stages for each individual
• $X_j$: observation available at the $j$th stage
• $A_j$: action at the $j$th stage (usually a treatment)
Example of a Treatment Policy
• Adaptive Drug Court Program for drug-abusing offenders.
• Goal is to minimize recidivism and drug use.
• Marlowe et al. (2008, 2009, 2011)
Adaptive Drug Court Program (design schematic):
• Low risk: as-needed court hearings + standard counseling
  - if non-responsive: as-needed court hearings + ICM
  - if non-compliant: bi-weekly court hearings + standard counseling
• High risk: bi-weekly court hearings + standard counseling
  - if non-responsive: bi-weekly court hearings + ICM
  - if non-compliant: court-determined disposition
Usually k=2 Stages (Finite Horizon = 2)
Goal: Use a training set of n trajectories, each of the form
$(X_1, A_1, X_2, A_2, X_3)$
(a trajectory per subject) to construct a treatment policy that outputs the actions, $a_j$. The treatment policy should maximize total reward.
The treatment policy is a sequence of two decision rules:
$d_1(X_1),\; d_2(X_1, A_1, X_2)$
Randomized Trials
What is a sequential, multiple assignment, randomized trial (SMART)?
Each subject proceeds through multiple stages of treatment; randomization takes place at each stage.
Exploration, no exploitation.
Usually 2-3 treatment stages.
Pelham's ADHD Study (SMART design schematic; initial random assignment to A or B):
• A. Begin low-intensity behavior modification; assess adequate response after 8 weeks.
  - Yes: A1. Continue, reassess monthly; randomize if deteriorate.
  - No, random assignment: A2. Augment with other treatment, or A3. Increase intensity of present treatment.
• B. Begin low-dose medication; assess adequate response after 8 weeks.
  - Yes: B1. Continue, reassess monthly; randomize if deteriorate.
  - No, random assignment: B2. Increase intensity of present treatment, or B3. Augment with other treatment.
Oslin's ExTENd Study (SMART design schematic):
Initial random assignment: early trigger for nonresponse vs. late trigger for nonresponse; all begin Naltrexone (8 weeks).
In each arm:
• Response: random assignment to Naltrexone or TDM + Naltrexone.
• Nonresponse: random assignment to CBI or CBI + Naltrexone.
Jones' Study for Drug-Addicted Pregnant Women (SMART design schematic):
Initial random assignment: tRBT or rRBT (2 weeks).
• tRBT arm:
  - Response: random assignment to rRBT or tRBT.
  - Nonresponse: random assignment to eRBT or tRBT.
• rRBT arm:
  - Response: random assignment to aRBT or rRBT.
  - Nonresponse: random assignment to tRBT or rRBT.
Usually 2 Stages (Finite Horizon = 2)
Goal: Use a training set of n trajectories, each of the form
$(X_1, A_1, X_2, A_2, X_3)$
(a trajectory per subject) to construct a treatment policy. The treatment policy should maximize total reward.
$A_j$ is a randomized action with known randomization probability. Here actions are binary with $P[A_j = 1] = P[A_j = -1] = 0.5$.
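A minimal sketch of what such a training set looks like, with an invented generative model; only the structure (binary actions in {-1, +1}, randomized with probability .5, as in a SMART) comes from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_smart_trajectories(n=138):
    """Simulate n trajectories (X1, A1, X2, A2, Y) from a SMART.

    The outcome model below is hypothetical, purely for illustration.
    """
    X1 = rng.normal(size=n)                         # baseline observation
    A1 = rng.choice([-1, 1], size=n)                # stage-1 randomization, P = .5
    X2 = 0.5 * X1 + 0.3 * A1 + rng.normal(size=n)   # intermediate observation
    A2 = rng.choice([-1, 1], size=n)                # stage-2 randomization, P = .5
    Y = X2 + A2 * (0.4 + 0.2 * A1) + rng.normal(size=n)  # terminal reward
    return X1, A1, X2, A2, Y
```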
Secondary Data Analysis: Q-Learning
• Q-Learning, Fitted Q Iteration, Approximate Dynamic Programming (Watkins, 1989; Ernst et al., 2005; Murphy, 2003; Robins, 2004)
• This results in a proposal for an optimal treatment policy.
• A subsequent randomized trial would evaluate the proposed treatment policy.
2 Stages: Terminal Reward Y
Goal: Use the training set to construct
$d_1(X_1),\; d_2(X_1, A_1, X_2)$
for which the average value, $E^{d_1, d_2}[Y]$, is maximal.
The maximal average value is
$V^{opt} = \max_{d_1, d_2} E^{d_1, d_2}[Y]$
Idea behind Q-Learning / Fitted Q

$V^{opt} = E\left[\max_{a_1} E\left[\max_{a_2} E[Y \mid X_1, A_1 = a_1, X_2, A_2 = a_2] \,\middle|\, X_1, A_1 = a_1\right]\right]$

• Stage 2 Q-function: $Q_2(X_1, A_1, X_2, A_2) = E[Y \mid X_1, A_1, X_2, A_2]$

$V^{opt} = E\left[\max_{a_1} E\left[\max_{a_2} Q_2(X_1, A_1, X_2, a_2) \,\middle|\, X_1, A_1 = a_1\right]\right]$

• Stage 1 Q-function: $Q_1(X_1, A_1) = E\left[\max_{a_2} Q_2(X_1, A_1, X_2, a_2) \,\middle|\, X_1, A_1\right]$

$V^{opt} = E\left[\max_{a_1} Q_1(X_1, a_1)\right]$
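A toy numeric check of this backward recursion; all numbers below are invented, and $X_1$ is held fixed so the outer expectation is trivial:

```python
# Hypothetical toy problem: X2 is binary, actions in {-1, +1}.
p_x2 = {-1: 0.4, 1: 0.7}                  # P(X2 = 1 | A1 = a1), invented

def reward(a1, x2, a2):                   # Y as a deterministic function here,
    return x2 + a2 * (0.4 + 0.2 * a1)     # so Q2 = E[Y | ...] = reward itself

actions = (-1, 1)

def Q1(a1):
    """Q1(a1) = E[max_{a2} Q2(a1, X2, a2) | A1 = a1], averaging over X2."""
    p = p_x2[a1]
    return sum(prob * max(reward(a1, x2, a2) for a2 in actions)
               for x2, prob in ((1, p), (0, 1 - p)))

V_opt = max(Q1(a1) for a1 in actions)     # V^opt = max_{a1} Q1(a1)
print({a1: round(Q1(a1), 2) for a1 in actions}, V_opt)   # {-1: 0.6, 1: 1.3} 1.3
```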
Simple Version of Fitted Q-Iteration
Use regression at each stage to approximate the Q-function.
• Stage 2 regression: Regress Y on stage-2 features $(S_{20}, S_2 a_2)$, summaries constructed from $(X_1, A_1, X_2)$, to obtain
$\hat{Q}_2 = \hat{\alpha}_2^T S_{20} + \hat{\beta}_2^T S_2 a_2$
• Arg-max over $a_2$ (binary, in $\{-1, 1\}$) yields
$\hat{d}_2(X_1, A_1, X_2) = \mathrm{sign}(\hat{\beta}_2^T S_2)$
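A sketch of this stage 2 regression, continuing the simulated data above; the particular feature vectors $S_{20}$ and $S_2$ are illustrative choices:

```python
import numpy as np

def stage2_fit(X1, A1, X2, A2, Y):
    """OLS fit of Q2 = alpha2' S20 + (beta2' S2) * a2."""
    n = len(Y)
    S20 = np.column_stack([np.ones(n), X1, A1, X2])   # main-effect features
    S2 = np.column_stack([np.ones(n), A1])            # treatment-effect features
    design = np.column_stack([S20, S2 * A2[:, None]])
    coef, *_ = np.linalg.lstsq(design, Y, rcond=None)
    alpha2, beta2 = coef[:S20.shape[1]], coef[S20.shape[1]:]
    d2 = np.sign(S2 @ beta2)                 # arg-max over a2 in {-1, +1}
    Y_hat = S20 @ alpha2 + np.abs(S2 @ beta2)  # predicted value of acting
    return alpha2, beta2, d2, Y_hat            # optimally at stage 2
```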
Value for subjects entering stage 2:
$\hat{Y} = \hat{\alpha}_2^T S_{20} + |\hat{\beta}_2^T S_2|$
• $\hat{Y}$ is a predictor of $\max_{a_2} Q_2(X_1, A_1, X_2, a_2)$
• $\hat{Y}$ is the dependent variable in the stage 1 regression for patients moving to stage 2
Simple Version of Fitted Q-Iteration
• Stage 1 regression: Regress $\hat{Y}$ on stage-1 features $(S_{10}, S_1 a_1)$, summaries constructed from $X_1$, to obtain
$\hat{Q}_1 = \hat{\alpha}_1^T S_{10} + \hat{\beta}_1^T S_1 a_1$
• Arg-max over $a_1$ yields $\hat{d}_1(X_1) = \mathrm{sign}(\hat{\beta}_1^T S_1)$
Decision Rules:
$\hat{d}_1(X_1) = \mathrm{sign}(\hat{\beta}_1^T S_1), \qquad \hat{d}_2(X_1, A_1, X_2) = \mathrm{sign}(\hat{\beta}_2^T S_2)$
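A matching sketch for the stage 1 regression, using the stage 2 predicted value $\hat{Y}$ as the outcome and reusing the functions from the sketches above (features again illustrative):

```python
import numpy as np

def stage1_fit(X1, A1, Y_hat):
    """OLS fit of Q1 = alpha1' S10 + (beta1' S1) * a1 on the pseudo-outcome."""
    n = len(Y_hat)
    S10 = np.column_stack([np.ones(n), X1])   # main-effect features
    S1 = np.column_stack([np.ones(n), X1])    # treatment-effect features
    design = np.column_stack([S10, S1 * A1[:, None]])
    coef, *_ = np.linalg.lstsq(design, Y_hat, rcond=None)
    alpha1, beta1 = coef[:S10.shape[1]], coef[S10.shape[1]:]
    return alpha1, beta1, np.sign(S1 @ beta1)   # rule: d1 = sign(beta1' S1)

# Chain the two stages together on the simulated training set:
X1, A1, X2, A2, Y = simulate_smart_trajectories()
alpha2, beta2, d2, Y_hat = stage2_fit(X1, A1, X2, A2, Y)
alpha1, beta1, d1 = stage1_fit(X1, A1, Y_hat)
```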
Pelham's ADHD Study (design schematic repeated from above).
ADHD
138 trajectories of the form $(X_1, A_1, R_1, X_2, A_2, Y)$:
• Y = end-of-year school performance
• $R_1$ = 1 if responder; = 0 if non-responder
• $X_2$ includes the month of non-response, $M_2$, and a measure of adherence in stage 1 ($S_2$)
  - $S_2$ = 1 if adherent in stage 1; = 0 if non-adherent
• $X_1$ includes baseline school performance, $Y_0$; whether medicated in the prior year ($S_1$); and ODD ($O_1$)
  - $S_1$ = 1 if medicated in prior year; = 0 otherwise
Q-Learning using data on children with ADHD
• Stage 2 regression model for Y:
$(1, Y_0, S_1, O_1, A_1, M_2, S_2)\,\alpha_2 + A_2(\beta_{21} + A_1\beta_{22} + S_2\beta_{23})$
• Decision rule is: "If child is non-responding, then intensify initial treatment if $-0.72 + 0.05 A_1 + 0.97 S_2 > 0$; otherwise augment."
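Plugging the four (A1, S2) combinations into this fitted rule reproduces the table on the next slide. The coefficients are from the slide; the coding A1 = 1 for BMOD and A1 = -1 for MED is an assumption:

```python
def intensify_nonresponder(a1, s2):
    """Fitted stage-2 rule: intensify if -0.72 + 0.05*a1 + 0.97*s2 > 0."""
    return -0.72 + 0.05 * a1 + 0.97 * s2 > 0

for a1, arm in ((1, "BMOD"), (-1, "MED")):      # assumed action coding
    for s2 in (1, 0):
        action = "Intensify" if intensify_nonresponder(a1, s2) else "Augment"
        print(f"{arm}, adherent={s2}: {action}")
# Adherent (s2=1): Intensify in both arms; non-adherent (s2=0): Augment.
```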
Q-Learning using data on children with ADHD
• Decision rule is: "If child is non-responding, then intensify initial treatment if $-0.72 + 0.05 A_1 + 0.97 S_2 > 0$; otherwise augment."

Decision rule for non-responding children:
                 Initial treatment = BMOD    Initial treatment = MED
  Adherent       Intensify                   Intensify
  Not adherent   Augment                     Augment
ADHD Example
• Stage 1 regression model for $\hat{Y}$:
$(1, Y_0, S_1, O_1)\,\alpha_1 + A_1(\beta_{11} + S_1\beta_{12})$
• Decision rule is: "Begin with BMOD if $0.17 - 0.32 S_1 > 0$; otherwise begin with MED."
Q-Learning using data on children with ADHD
• Decision rule is: "Begin with BMOD if $0.17 - 0.32 S_1 > 0$; otherwise begin with MED."

Initial decision rule:
                  Initial treatment
  Prior MEDS      MEDS
  No prior MEDS   BMOD
ADHD Example
• The treatment policy is quite decisive. We developed this treatment policy using a trial on only 138 children. Is there sufficient evidence in the data to warrant this level of decisiveness?
• Would a similar trial obtain similar results?
• There are strong opinions regarding how to treat ADHD.
• One solution: use confidence intervals.
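One way to get such intervals is to bootstrap the fitted treatment-effect contrasts; a naive sketch follows, reusing the stage2_fit function above. Note that the intervals on the next slides come from the authors' own analysis, and inference after Q-learning is non-regular, so a plain percentile bootstrap is only a rough illustration:

```python
import numpy as np

def bootstrap_ci(X1, A1, X2, A2, Y, contrast, n_boot=1000, level=0.90):
    """Naive percentile bootstrap CI for beta2' contrast (a stage-2 effect).

    `contrast` picks out a treatment-effect combination in the S2 features,
    e.g. those of an adherent child who began a given initial treatment.
    """
    rng = np.random.default_rng(1)
    n, stats = len(Y), []
    for _ in range(n_boot):
        i = rng.integers(n, size=n)                       # resample subjects
        _, beta2, _, _ = stage2_fit(X1[i], A1[i], X2[i], A2[i], Y[i])
        stats.append(beta2 @ contrast)
    return tuple(np.quantile(stats, [(1 - level) / 2, (1 + level) / 2]))

# e.g., effect contrast for A1 = +1 subjects under the simulated model:
# bootstrap_ci(X1, A1, X2, A2, Y, contrast=np.array([1.0, 1.0]))
```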
ADHD Example
Treatment decision for non-responders; positive treatment effect → Intensify.

                          90% Confidence Interval
  Adherent to BMOD        (-0.08, 0.69)
  Adherent to MED         (-0.18, 0.62)
  Non-adherent to BMOD    (-1.10, -0.28)
  Non-adherent to MED     (-1.25, -0.29)
ADHD Example
Initial treatment decision; positive treatment effect → BMOD.

                  90% Confidence Interval
  Prior MEDS      (-0.48, 0.16)
  No prior MEDS   (-0.05, 0.39)
Proposal for Treatment Policy
IF medication was not used in the prior year, THEN begin with BMOD;
ELSE select either BMOD or MED.
IF the child is non-responsive and was non-adherent, THEN augment the present treatment;
ELSE IF the child is non-responsive and was adherent, THEN select either intensification or augmentation of the current treatment.
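This proposal translates directly into code; a sketch in which the places where the policy says "select either" are left to the clinician, represented here as an explicit argument:

```python
def initial_treatment(prior_meds, clinician_choice="BMOD"):
    """Stage 1 of the proposed policy."""
    if not prior_meds:
        return "BMOD"
    return clinician_choice   # evidence does not distinguish BMOD vs MED

def second_treatment(responsive, adherent, clinician_choice="Intensify"):
    """Stage 2 of the proposed policy."""
    if responsive:
        return "Continue"     # responders continue initial treatment
    if not adherent:
        return "Augment"
    return clinician_choice   # Intensify or Augment; either is supported
```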
Where are we going?
• Increasing use of wearable computers (e.g., smartphones) to both collect real-time data and provide real-time treatment.
• We are working on the design of studies involving randomization (soft-max or epsilon-greedy choice of actions) to develop and continually improve treatment policies; see the sketch after this list.
• Need confidence measures for infinite-horizon problems.
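For context, a minimal sketch of the epsilon-greedy action choice mentioned above; the Q-value estimates handed to it are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(2)

def epsilon_greedy(q_values, epsilon=0.1):
    """Pick the estimated-best action with prob 1 - epsilon, else explore.

    q_values: dict mapping each candidate action to its current estimate.
    Retaining some exploration lets the policy keep improving as new
    real-time data arrive.
    """
    if rng.random() < epsilon:
        return rng.choice(list(q_values))    # explore: uniform over actions
    return max(q_values, key=q_values.get)   # exploit: current best estimate

# Usage with made-up estimates:
# epsilon_greedy({"Intensify": 0.3, "Augment": -0.7})
```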
This seminar can be found at:
http://www.stat.lsa.umich.edu/~samurphy/seminars/MUCMD.08.10.12.pdf
This seminar is based on work with many collaborators, some of whom are: L. Collins, E. Laber, M. Qian, D. Almirall, K. Lynch, J. McKay, D. Oslin, T. Ten Have, I. Nahum-Shani & B. Pelham. Email with questions or if you would like a copy:
[email protected]