comparison of 3 learning methods

Methods for Estimating the
Decision Rules in Dynamic
Treatment Regimes
S.A. Murphy
Univ. of Michigan
IBC/ASC: July, 2004
Dynamic Treatment Regimes
Dynamic Treatment Regimes are individually tailored
treatments, with treatment type and dosage changing
with ongoing subject information. They mimic clinical
practice.
•Brooner et al. (2002) Treatment of Opioid Addiction
•Breslin et al. (1999) Treatment of Alcohol Addiction
•Prochaska et al. (2001) Treatment of Tobacco Addiction
•Rush et al. (2003) Treatment of Depression
EXAMPLE: Treatment of alcohol dependency.
Primary outcome is a summary of heavy drinking scores
over time.
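Treatment of Alcohol Dependency
[Figure: treatment decision tree with three columns: Initial Txt, Intermediate Outcome, Secondary Txt. An Intensive Outpatient Program node also appears in the diagram.]
Initial Txt: Med A
  Responder -> Secondary Txt: Monitor + counseling, or Monitor
  Nonresponder -> Secondary Txt: Med B, or EM + Med B + Psychosocial
Initial Txt: Med B
  Responder -> Secondary Txt: Monitor + counseling, or Monitor
  Nonresponder -> Secondary Txt: Med A + Psychosocial, or EM + Med B + Psychosocial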
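Sequential Multiple Assignments
[Figure: treatment decision tree with a randomization node (R) at each branch point. Columns: Initial Txt, Intermediate Outcome, Secondary Txt.]
R -> Initial Txt: Med A or Med B
Med A, Responder: R -> Monitor + counseling, or Monitor
Med A, Nonresponder: R -> Med B, or EM + Med B + Psychosocial
Med B, Responder: R -> Monitor + counseling, or Monitor
Med B, Nonresponder: R -> Med A + Psychosocial, or EM + Med B + Psychosocial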
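The sequential randomization in the diagram can be sketched in a few lines of Python. This is only an illustration: the option sets are an assumed reading of the flattened diagram, `assign_subject` and `respond_prob` are hypothetical names, and in a real trial the response is observed rather than simulated.

```python
import random

# Assumed reading of the diagram: secondary options available for each
# (initial treatment, intermediate outcome) pair.
SECONDARY_OPTIONS = {
    ("Med A", "Responder"): ["Monitor + counseling", "Monitor"],
    ("Med A", "Nonresponder"): ["Med B", "EM + Med B + Psychosocial"],
    ("Med B", "Responder"): ["Monitor + counseling", "Monitor"],
    ("Med B", "Nonresponder"): ["Med A + Psychosocial", "EM + Med B + Psychosocial"],
}

def assign_subject(rng, respond_prob=0.5):
    """Randomize one subject at the initial and at the secondary decision."""
    initial = rng.choice(["Med A", "Med B"])            # first R node
    # Hypothetical stand-in for the observed intermediate outcome.
    response = "Responder" if rng.random() < respond_prob else "Nonresponder"
    secondary = rng.choice(SECONDARY_OPTIONS[(initial, response)])  # second R node
    return initial, response, secondary

rng = random.Random(2004)
trial = [assign_subject(rng) for _ in range(8)]
for row in trial:
    print(row)
```

Each subject is randomized twice, and the second randomization is restricted to the options that match the subject's initial treatment and intermediate outcome.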
Examples of sequential multiple assignment randomized
trials:
•CATIE (2001) Treatment of Psychosis in Alzheimer’s
Patients
•CATIE (2001) Treatment of Psychosis in
Schizophrenia
•STAR*D (2003) Treatment of Depression
•Thall et al. (2000) Treatment of Prostate Cancer
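k Decisions
• Observations made prior to the jth decision
• Action at the jth decision
• Primary outcome: a known function f of the observations and actions
A dynamic treatment regime is a vector of decision
rules, one per decision.
If the regime is implemented, then the action at each decision is the one chosen by that decision's rule.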
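A regime as "a vector of decision rules, one per decision" maps naturally onto a list of functions. The sketch below assumes two decisions and borrows treatment names from the alcohol dependency example; `rule_1`, `rule_2`, and `follow_regime` are hypothetical names, and the particular rules are made up for illustration.

```python
from typing import Callable, List, Sequence

# A decision rule maps the history available at a decision
# (past observations and past actions) to an action.
History = Sequence[object]
DecisionRule = Callable[[History], str]

def rule_1(history: History) -> str:
    # Hypothetical first rule: everyone starts on Med A.
    return "Med A"

def rule_2(history: History) -> str:
    # Hypothetical second rule: history is (baseline, first action, responder flag).
    _, _, responder = history
    return "Monitor" if responder else "Med B"

regime: List[DecisionRule] = [rule_1, rule_2]

def follow_regime(regime, observations):
    """Implement the regime: observations[j] is what is seen before decision j."""
    history = []
    actions = []
    for rule, obs in zip(regime, observations):
        history.append(obs)
        action = rule(tuple(history))
        actions.append(action)
        history.append(action)
    return actions

print(follow_regime(regime, ["baseline", False]))
print(follow_regime(regime, ["baseline", True]))
```

Implementing the regime means every action is produced by the corresponding rule applied to the history so far, which is what `follow_regime` does.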
Methods for Estimating Decision
Rules
Three Methods for Estimating Decision
Rules
• Q-Learning (Watkins, 1989)
---regression
• A-Learning (Murphy, Robins, 2003)
---regression on a mean zero space.
• Weighting (Murphy, van der Laan & Robins, 2002)
---weighted mean
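One decision only!
Data: for each subject, the observations, the action, and the primary outcome.
The action is randomized with a known probability.
Goal
Choose the decision rule to maximize the mean primary outcome.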
Q-Learning
Minimize
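For one decision, the regression step of Q-Learning might look like the following Python sketch. It is not the formula from the slide: the simulated trial, the linear working model Q(x, a) = b0 + b1*x + a*(b2 + b3*x), and the names `simulate`, `fit_line`, and `q_learning_rule` are all assumptions made for illustration.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def simulate(n, seed=0):
    """Hypothetical one-decision trial: action randomized with probability 1/2."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        a = 1 if rng.random() < 0.5 else 0
        y = x + a * (-2.0 * x) + rng.gauss(0.0, 0.5)  # treating helps when x < 0
        data.append((x, a, y))
    return data

def q_learning_rule(data):
    # Least squares for Q(x, a) = b0 + b1*x + a*(b2 + b3*x). With a binary
    # action this is the same as fitting a separate line within each arm.
    fits = {}
    for arm in (0, 1):
        xs = [x for x, a, _ in data if a == arm]
        ys = [y for x, a, y in data if a == arm]
        fits[arm] = fit_line(xs, ys)

    def q(x, a):
        intercept, slope = fits[a]
        return intercept + slope * x

    def rule(x):
        # Estimated decision rule: pick the action with the larger fitted Q.
        return 1 if q(x, 1) > q(x, 0) else 0

    return rule

rule = q_learning_rule(simulate(2000))
print(rule(-0.5), rule(0.5))
```

The estimated rule is read off the fitted regression, so its quality depends on how well the working model for Q matches the truth.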
A-Learning
Minimize
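"Regression on a mean zero space" can be illustrated by regressing the outcome on the centered action. The sketch below is a simplified form, not the slide's formula: it assumes a binary action randomized with known probability p, a linear contrast model psi0 + psi1*x, and it omits any main-effect model (including one would typically reduce variance). The names `simulate` and `a_learning_contrast` are hypothetical.

```python
import random

def simulate(n, seed=1):
    """Hypothetical trial with a nonlinear main effect and a linear contrast."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        a = 1 if rng.random() < 0.5 else 0
        # Main effect x**2 is never modeled below; the true contrast is -2*x.
        y = x * x + a * (-2.0 * x) + rng.gauss(0.0, 0.5)
        data.append((x, a, y))
    return data

def a_learning_contrast(data, p=0.5):
    """Minimize sum (y - (a - p) * (psi0 + psi1 * x))**2 over (psi0, psi1)."""
    s11 = s12 = s22 = t1 = t2 = 0.0
    for x, a, y in data:
        z1 = a - p          # centered action: mean zero given x
        z2 = (a - p) * x    # centered action times covariate
        s11 += z1 * z1
        s12 += z1 * z2
        s22 += z2 * z2
        t1 += z1 * y
        t2 += z2 * y
    # Solve the 2x2 normal equations in closed form.
    det = s11 * s22 - s12 * s12
    psi0 = (s22 * t1 - s12 * t2) / det
    psi1 = (s11 * t2 - s12 * t1) / det
    return psi0, psi1

psi0, psi1 = a_learning_contrast(simulate(5000))

def rule(x):
    # Treat when the estimated contrast is positive.
    return 1 if psi0 + psi1 * x > 0 else 0

print(round(psi0, 2), round(psi1, 2))
```

Because the centered action has mean zero given x under randomization, the contrast estimate does not rely on modeling the main effect correctly, which is the point of this simulated example.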
Weighting
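The weighted-mean idea can be sketched as follows: estimate the mean outcome under a candidate rule from the subjects whose randomized action agrees with the rule, weighting by the inverse randomization probability, then pick the best rule in a restricted class. The normalized form of the weighted mean, the threshold class of rules, and the names `weighted_value` and `threshold_rule` are assumptions for illustration.

```python
import random

def simulate(n, seed=2):
    """Hypothetical one-decision trial: action randomized with probability 1/2."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        a = 1 if rng.random() < 0.5 else 0
        y = x + a * (-2.0 * x) + rng.gauss(0.0, 0.5)  # treating helps when x < 0
        data.append((x, a, y))
    return data

def weighted_value(data, rule, p=0.5):
    """Normalized inverse-probability-weighted mean outcome under `rule`."""
    num = den = 0.0
    for x, a, y in data:
        if a == rule(x):  # subject happened to receive what the rule recommends
            prob = p if a == 1 else 1.0 - p
            num += y / prob
            den += 1.0 / prob
    return num / den

def threshold_rule(c):
    # Restricted class of rules: treat when x is below the threshold c.
    return lambda x: 1 if x < c else 0

data = simulate(4000)
candidates = [k / 10 for k in range(-10, 11)]
best_c = max(candidates, key=lambda c: weighted_value(data, threshold_rule(c)))
print(best_c, round(weighted_value(data, threshold_rule(best_c)), 2))
```

Here the rule is chosen by maximizing the estimated mean outcome directly, without a regression model for the outcome.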
Discussion
• Consistency of Parameterization
---problems for Q-Learning
• Model Space
---bias
---variance
Q-Learning
Minimize
Minimize
Points to keep in mind
• The sequential multiple assignment randomized trial
is a trial for developing powerful dynamic treatment
regimes; it is not a confirmatory trial.
• Focus on MSE, recognizing that, due to the high
dimensionality of X, the model parameterization is
likely incorrect.
Goal
Given a restricted set of functional forms for the
decision rules, find the rule in that set that maximizes the mean primary outcome.
Discussion
• Mismatch in Goals
---problems for Q-Learning & A-Learning
Suppose our sample is infinite. Then in general
neither the rule found by Q-Learning nor the rule found by A-Learning
is close to the best rule in the restricted set.
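A small numerical illustration of this mismatch, under assumptions that are not from the talk: X is uniform on (-1, 1), the action is randomized with probability 1/2, the true treatment contrast is the hypothetical nonlinear function `contrast`, and both the regression model and the restricted rule set use a single threshold on x. A fine grid stands in for an infinite sample.

```python
def contrast(x):
    # Hypothetical true benefit of treating: positive exactly when x < 0.
    return 1.0 - (x + 1.0) ** 2

n = 4000
grid = [-1.0 + (i + 0.5) * 2.0 / n for i in range(n)]  # stands in for X ~ Uniform(-1, 1)

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    m = len(xs)
    mx = sum(xs) / m
    my = sum(ys) / m
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

# Large-sample limit of a linear contrast model psi0 + psi1*x fitted by least squares.
psi0, psi1 = fit_line(grid, [contrast(x) for x in grid])
# The fitted model recommends treating when psi0 + psi1*x > 0; psi1 is negative
# in this example, so that means treating when x < -psi0 / psi1.
q_threshold = -psi0 / psi1

def value_gain(t):
    """Mean gain over treating no one from the rule: treat when x < t."""
    return sum(contrast(x) for x in grid if x < t) / n

candidates = [k / 100 for k in range(-100, 101)]
best_threshold = max(candidates, key=value_gain)

print(round(q_threshold, 3), best_threshold)
print(round(value_gain(q_threshold), 4), round(value_gain(best_threshold), 4))
```

In this example the least-squares threshold settles near -1/6 while the best threshold rule treats whenever x < 0, so the regression-based rule gives up some mean outcome even with unlimited data. With a constant randomization probability, a centered-action regression with the same linear contrast model targets the same projection.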
Open Problems
• How might we “guide” Q-Learning or A-Learning so
as to more closely achieve our goal?
• Dealing with high dimensional X: feature extraction, feature selection.
This seminar can be found at:
http://www.stat.lsa.umich.edu/~samurphy/seminars/ibc_asc_0704.ppt
My email address:
[email protected]