Assessing the Total Effect of Time


Dynamic Treatment Regimes:
Challenges in Data Analysis
S.A. Murphy
Survey Research Center
January, 2009
Outline
• What are Dynamic Treatment Regimes?
• Myopic Decision Making
• Constructing Regimes
• Q-Learning
• Example using CATIE
Dynamic Treatment Regimes operationalize multi-stage
decision making.
These are individually tailored sequences of
interventions, with intervention type and dosage
adapted to the individual.
• Generalization from a one-time decision to a
sequence of decisions concerning interventions
• Operationalize clinical practice.
Each decision corresponds to a stage of intervention.
Dynamic Treatment Regime
“Jobs First” Welfare Program
• At each stage of intervention
– Use individual characteristics (assets, income, age,
health, employment), characteristics of the environment
(domestic violence, incapacitated family member, #
children, living arrangements…),
– To select actions/interventions such as child care, job
search skills training, amount of cash benefit, medical
assistance, education,
– In order to maximize long term rewards (maximize
employment/independence over longer term).
Why use a Dynamic Treatment
Regime?
– High heterogeneity in response to any one
intervention
• What works for one person may not work for
another
• What works now for a person may not work later
– Improvement often marred by relapse
• Remitted or few current symptoms is not the same
as cured.
– Co-occurring disorders/adherence problems are
common
Myopic Decision Making
• In myopic decision making, decision makers use regimes
that seek to maximize immediate rewards.
Problems:
– Ignore longer term consequences of present actions.
– Ignore the range of feasible future actions/interventions
– Ignore the fact that immediate responses to present actions
may yield information that pinpoints best future actions
(A dynamic treatment regime tells us how to use the
observations to choose the actions/interventions.)
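A small numeric illustration of the first problem (ignoring longer term consequences) may help. This is a minimal sketch with invented rewards; the action names, reward values, and function names are all hypothetical and do not come from the talk.

```python
# Hypothetical two-stage problem; every number here is invented for illustration.
# immediate[a1] is the stage 1 reward; later[a1][a2] is the stage 2 reward
# reachable after taking a1 and then a2.
immediate = {"A": 5.0, "B": 3.0}
later = {
    "A": {"x": 1.0, "y": 2.0},  # A pays well now but limits future options
    "B": {"x": 6.0, "y": 4.0},  # B pays less now but keeps better options open
}


def total_reward(a1):
    """Total reward when the best available stage 2 action follows a1."""
    return immediate[a1] + max(later[a1].values())


def myopic_choice():
    """Pick the stage 1 action with the largest immediate reward."""
    return max(immediate, key=immediate.get)


def non_myopic_choice():
    """Pick the stage 1 action with the largest total (two-stage) reward."""
    return max(immediate, key=total_reward)


print(myopic_choice(), total_reward(myopic_choice()))          # A 7.0
print(non_myopic_choice(), total_reward(non_myopic_choice()))  # B 9.0
```

In this toy setting the myopic choice wins at stage 1 but ends with a lower total, because it narrows the set of good stage 2 actions.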
Treatment of Schizophrenia
• Myopic action: Offer patients a treatment that reduces schizophrenia symptoms for as many people as possible.
• The result: Some patients are not helped and/or experience abnormal movements of the voluntary muscles (TDs). The class of subsequent medications is greatly reduced.
• The mistake: We should have taken into account the variety of treatments available to those for whom the first treatment is ineffective.
• The message: Use an initial medication that may not have as large a success rate but that will be less likely to cause TDs.
Treatment of Opioid Dependence
• Myopic action: Choose an intensive multi-component treatment (methadone + counseling + behavioral contingencies) that immediately reduces opioid use for as many people as possible.
• The result: Behavioral contingencies are burdensome/expensive to implement and many people may not need the contingencies to improve.
• The mistake: We should allow the patient to exhibit poor adherence prior to implementing the behavioral contingencies.
• The message: Use an initial treatment that may not have as large an immediate success rate but carefully monitor patient adherence to ascertain if behavioral contingencies are required.
Basic Idea for Constructing a Regime:
Move Backwards Through Stages.
[Diagram: timeline of Stage 1 observations, Stage 1 action, Stage 2 observations, Stage 2 action, then reward.]
(Pretend you are “All-Knowing”)
2 Stages for each individual
Observations available at jth stage
Action at jth stage: Aj
History available at each stage: Hj
Primary Outcome/Reward: Y
A dynamic treatment regime is the sequence of decision
rules, one per stage.
A simple decision rule is: given weights β, switch
treatment at stage j if the β-weighted sum of Sj exceeds a
threshold; otherwise maintain on current treatment. Sj is a
vector summary of the history, Hj.
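The simple rule can be sketched in code as follows. This is an illustration only: it assumes the switching condition is that the weighted sum of the components of Sj exceeds a threshold, with 0 as an assumed default, and the function name, weights, and summary components are hypothetical.

```python
from typing import Sequence


def simple_decision_rule(beta: Sequence[float], s_j: Sequence[float],
                         threshold: float = 0.0) -> str:
    """Return 'switch' if the beta-weighted summary exceeds the threshold.

    beta: weights, one per component of the summary.
    s_j: vector summary S_j of the history H_j at stage j.
    threshold: cut-off for the weighted sum; 0.0 is an assumption.
    """
    if len(beta) != len(s_j):
        raise ValueError("beta and s_j must have the same length")
    score = sum(b * s for b, s in zip(beta, s_j))
    return "switch" if score > threshold else "maintain"


# Hypothetical summary: (intercept, symptom score, side-effect indicator).
beta = [-2.0, 0.1, 1.5]
print(simple_decision_rule(beta, [1.0, 10.0, 0.0]))  # score -1.0 -> maintain
print(simple_decision_rule(beta, [1.0, 10.0, 1.0]))  # score  0.5 -> switch
```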
Goal:
Use data to construct decision rules that input
information in the history at each stage and output a
recommended decision; these decision rules should lead
to a maximal mean Y.
In the future we employ the actions recommended by
the decision rules.
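In symbols, one conventional way to write this two-stage setup is sketched below. The notation is partly assumed: O_j for the observations at stage j and d_j for the stage j decision rule are labels chosen here, while A_j, H_j, and Y follow the text.

```latex
\begin{align*}
H_1 &= O_1, \qquad H_2 = (O_1, A_1, O_2), \\
\text{regime: } & (d_1, d_2), \qquad A_1 = d_1(H_1), \quad A_2 = d_2(H_2), \\
\text{goal: } & \text{choose } (d_1, d_2) \text{ to maximize } E_{d_1, d_2}[Y].
\end{align*}
```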
Example of Decision Rules
Treatment of depression. Goal is to achieve and
maintain remission.
Provide Citalopram for up to 12 weeks, gradually increasing dose
as required.
If either the maximum dose has been provided for two weeks, or
12 weeks have elapsed, yet there is no remission, then
– if there has been a 50% improvement in symptoms,
augment with Mirtazapine;
– else switch treatment to Bupropion.
Else (remission is achieved) maintain on Citalopram and provide
web-based disease management.
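The depression rule above translates almost directly into a function. This is a minimal sketch: the function and argument names are hypothetical, and reading "a 50% improvement" as "at least 50%" is an assumption.

```python
def depression_decision(remission: bool, weeks_on_citalopram: int,
                        weeks_at_max_dose: int,
                        symptom_improvement: float) -> str:
    """Recommend the next step for a patient started on Citalopram.

    symptom_improvement is the fractional reduction in symptoms (0.5 = 50%).
    Treating "a 50% improvement" as "at least 50%" is an assumption.
    """
    if remission:
        return "maintain Citalopram + web-based disease management"
    # Decision point: two weeks at the maximum dose, or 12 weeks of treatment.
    at_decision_point = weeks_at_max_dose >= 2 or weeks_on_citalopram >= 12
    if not at_decision_point:
        return "continue Citalopram, increasing dose as required"
    if symptom_improvement >= 0.5:
        return "augment with Mirtazapine"
    return "switch to Bupropion"


print(depression_decision(False, 12, 0, 0.6))  # augment with Mirtazapine
print(depression_decision(False, 8, 2, 0.2))   # switch to Bupropion
print(depression_decision(True, 6, 0, 0.9))    # maintain Citalopram + ...
```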
Idealized Data for Constructing the Dynamic Treatment
Regime:
Data from sequential, multiple assignment, randomized
trials in which at each stage subjects are randomized
among alternative options.
That is, Aj is a randomized action with known
randomization probability.
Binary actions with P[Aj=1]=P[Aj=-1]=.5
Regression-based methods for
constructing decision rules
• Q-Learning (Watkins, 1989), a popular method from
computer science
• A-Learning or optimal nested structural mean model
(Murphy, 2003; Robins, 2004)
• The first method is an inefficient version of the second
method when each stage’s covariates include the prior stages’
covariates and the actions are centered to have conditional
mean zero.
Dynamic Programming (k=2)
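For two stages, the backward recursion is conventionally written as below. This is the textbook form, offered as a sketch; the symbols Q_j and d_j^* are assumed notation, with H_j, A_j, and Y as used elsewhere in the talk.

```latex
\begin{align*}
Q_2(h_2, a_2) &= E[\,Y \mid H_2 = h_2,\ A_2 = a_2\,], \\
d_2^{*}(h_2) &= \arg\max_{a_2} Q_2(h_2, a_2), \\
Q_1(h_1, a_1) &= E\big[\,\max_{a_2} Q_2(H_2, a_2) \mid H_1 = h_1,\ A_1 = a_1\,\big], \\
d_1^{*}(h_1) &= \arg\max_{a_1} Q_1(h_1, a_1).
\end{align*}
```

The stage 2 rule is found first; the stage 1 rule is then chosen assuming the best stage 2 action will follow.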
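A Simple Version of Q-Learning (binary actions)
Approximate the Q-function at each stage j by a linear model in S'j and Sj, where S'j and Sj are vector summaries of the history, Hj.
• Stage 2 regression: Use least squares with outcome, Y, and covariates built from the stage 2 summaries and the stage 2 action to obtain the stage 2 coefficient estimates.
• Set the stage 1 outcome to the largest predicted stage 2 value over the two possible stage 2 actions.
• Stage 1 regression: Use least squares with this outcome and covariates built from the stage 1 summaries and the stage 1 action to obtain the stage 1 coefficient estimates.
Stage j decision rule:
Select treatment = 1 if the estimated treatment term at stage j is positive.
Otherwise select treatment = -1.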
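The two regressions and the decision rule can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the analysis code behind the talk: it assumes the working model Q_j = alpha_j' S'_j + (beta_j' S_j) A_j with actions coded +1 and -1, and the data, variable names, and generating coefficients are simulated and hypothetical.

```python
import numpy as np


def fit_stage(s_main, s_tailor, action, outcome):
    """Least squares fit of outcome ~ s_main @ alpha + (s_tailor @ beta) * action."""
    design = np.hstack([s_main, s_tailor * action[:, None]])
    coef, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    k = s_main.shape[1]
    return coef[:k], coef[k:]  # alpha_hat, beta_hat


def q_learning_two_stage(s1_main, s1_tailor, a1, s2_main, s2_tailor, a2, y):
    """Two-stage Q-learning with actions coded +1 / -1."""
    # Stage 2 regression on the observed outcome Y.
    alpha2, beta2 = fit_stage(s2_main, s2_tailor, a2, y)
    # Pseudo-outcome: predicted Y under the best stage 2 action. With a in
    # {+1, -1}, max over a of (s2_tailor @ beta2) * a equals |s2_tailor @ beta2|.
    y_tilde = s2_main @ alpha2 + np.abs(s2_tailor @ beta2)
    # Stage 1 regression on the pseudo-outcome.
    alpha1, beta1 = fit_stage(s1_main, s1_tailor, a1, y_tilde)
    return (alpha1, beta1), (alpha2, beta2)


def decision_rule(beta_hat, s_tailor):
    """Select treatment +1 where s_tailor @ beta_hat > 0, otherwise -1."""
    return np.where(s_tailor @ beta_hat > 0, 1, -1)


# Simulated trial (entirely hypothetical), P[A_j = 1] = P[A_j = -1] = 0.5.
rng = np.random.default_rng(0)
n = 5000
o1 = rng.normal(size=n)
a1 = rng.choice([-1, 1], size=n)
o2 = 0.5 * o1 + 0.3 * a1 + rng.normal(size=n)
a2 = rng.choice([-1, 1], size=n)
y = o1 + o2 + 0.2 * a1 * o1 + a2 * (0.5 - 1.0 * o2) + rng.normal(size=n)

ones = np.ones(n)
s1_main = np.column_stack([ones, o1])
s1_tailor = np.column_stack([ones, o1])
# Stage 2 covariates include the stage 1 covariates and action.
s2_main = np.column_stack([ones, o1, a1, a1 * o1, o2])
s2_tailor = np.column_stack([ones, o2])

(alpha1, beta1), (alpha2, beta2) = q_learning_two_stage(
    s1_main, s1_tailor, a1, s2_main, s2_tailor, a2, y)
print("stage 2 beta_hat:", beta2)  # generating values are (0.5, -1.0)
new_patients = np.array([[1.0, -1.0], [1.0, 2.0]])  # (intercept, o2)
print("stage 2 recommendations:", decision_rule(beta2, new_patients))
```

The absolute value in the pseudo-outcome is what makes the stage 1 fit non-myopic: it credits each stage 1 history with the value of acting optimally at stage 2.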
Clinical Antipsychotic Trials of Intervention Effectiveness (Schizophrenia)
• Multi-stage trial of 18 months duration
• Relaxed entry criteria
• A large number of sites representing a broad array of clinical settings (state mental health, academic, Veterans’ Affairs, HMOs, managed care)
• Approximately 1500 patients
CATIE Randomizations (simplified)
Phase 1
– Randomized treatments: OLAN, QUET, RISP, ZIPR, PERP
Phase 2 (by treatment preference)
– Efficacy, randomized treatments: CLOZ, OLAN, QUET, RISP
– Tolerability, randomized treatments: OLAN, QUET, RISP, ZIPR
Phase 3
– Treatments selected by preference; many options
Constructing Dynamic Treatment
Regimes using CATIE
• Reward: Time to Treatment Dropout
• Phase 1 analysis:
– Controls: TD, recent exacerbation, site
– Tailoring variable: pretreatment PANSS
• Phase 2 analysis:
– Controls: TD, recent exacerbation, site
– Tailoring variables: “treatment preference,” phase 1
treatment, end of phase 1 PANSS
Myopic versus Non-myopic
Analyses
• Reward: Integrated Quality of Life (QoL)
• Phase 1 analysis:
– Controls: TD, recent exacerbation, site
– Tailoring variable: pretreatment QoL
• Phase 2 analysis:
– Controls: TD, recent exacerbation, site
– Tailoring variables: “treatment preference,” phase 1
treatment, end of phase 1 QoL
Challenges
• It is extremely challenging to provide measures of
confidence that possess “good frequentist properties.”
• Clinical Decision Support Systems
– We need to be able to construct dynamic treatment regimes
that recommend a group of treatment actions when there is
no evidence that a particular treatment action is best.
• Even in this randomized trial setting, the most
straightforward analyses are subject to confounding
bias. Some methods to avoid confounding bias are
available.
Acknowledgements: This presentation is based on
work with many individuals including Eric Laber,
Dan Lizotte, John Rush, Scott Stroup, Joelle
Pineau, Daniel Almirall and Bibhas Chakraborty.
Slides with notes at:
http://www.stat.lsa.umich.edu/~samurphy/
Click on seminars > health science seminars
Causal Inference Challenges
Behavioral/Social/Medical Sciences
• Incomplete mechanistic models
– Unknown causes
• Use data on individuals to combat the
dearth of mechanistic models.
– Drawback: non-causal “associations” occur due
to the unknown causes of the observations.
Unknown, Unobserved Causes
(Incomplete Mechanistic Models)
[Diagram: unknown causes shown alongside the sequence Stage 1 observations, Stage 1 treatment, Stage 2 observations, Stage 2 treatment, reward.]
Unknown, Unobserved Causes
(Incomplete Mechanistic Models)
[Diagram, binge drinking example. Labels: Unknown Causes; Maturity/Decision to join "Adult" Society; Binge Drinking (Yes/No); Counseling on Health Consequences (Yes/No); Binge Drinking (Yes/No), Time 2; Sanctions + counseling (Yes/No); Functionality, Time 3. Edge signs shown: +, -, -.]
Unknown, Unobserved Causes
(Incomplete Mechanistic Models)
[Diagram, admissions example. Node labels: Unknown Causes; High SAT Scores; Student is a superior athlete; Student admitted to University; Grades. Role labels: Observations, Treatment, Time 2. Edge signs shown: +, +, +.]
Unknown, Unobserved Causes
(Incomplete Mechanistic Models)
• The problem: Even when treatments are
randomized, non-causal associations occur
in the data.
• The solution: Statistical methods should
appropriately “average” over the non-causal
associations between treatment and reward.
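A small simulation shows how such an association can arise under randomization. This is a sketch with invented probabilities and effect sizes: the stage 1 treatment has no effect on the reward here, yet once the data are restricted to people who responded at the intermediate stage, treatment and reward become associated through the unobserved cause.

```python
import random
from statistics import fmean

random.seed(1)
n = 40000
rows = []
for _ in range(n):
    u = random.random() < 0.5    # unknown, unobserved cause (True = present)
    a1 = random.choice([-1, 1])  # randomized stage 1 treatment
    # Intermediate observation: response is more likely with a1 = 1 and with u.
    p_respond = 0.2 + 0.3 * (a1 == 1) + 0.4 * u
    o2 = random.random() < p_respond
    y = 2.0 * u + random.gauss(0.0, 1.0)  # reward depends on u only, not on a1
    rows.append((a1, o2, y))


def mean_difference(data):
    """Mean reward under a1 = +1 minus mean reward under a1 = -1."""
    treated = [y for a1, _, y in data if a1 == 1]
    control = [y for a1, _, y in data if a1 == -1]
    return fmean(treated) - fmean(control)


overall = mean_difference(rows)
among_responders = mean_difference([r for r in rows if r[1]])
print(f"overall difference:          {overall:+.3f}")
print(f"difference among responders: {among_responders:+.3f}")
```

The overall difference is close to zero, as randomization guarantees, while the difference among responders is clearly negative: responders who did not receive the treatment are more likely to carry the unobserved cause that also raises the reward.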
Unknown, Unobserved Causes
(Incomplete Mechanistic Models)
[Diagram, student drinking example. Labels: Unknown Causes; Maturity of Student; Binge Drinking; Treatment, Time 1; Frequent Drinking; Binge Drinking, Time 2; Treatment, Time 2; Functionality. Edge signs shown: +, -.]
Unknown, Unobserved Causes
• Problem: We recruit students via flyers
posted in dormitories. Associations between
observations and rewards are highly likely
to be (due to the unknown causes) nonrepresentative.
• Solution: Sample a representative group of
college students.
Summary of Solutions To Causal
Problems
• If possible randomize treatments (e.g. actions).
• Develop methods that avoid being influenced by
non-causal associations yet help you construct the
policy.
• Subjects in your data should be representative of
the population of subjects.