Meeting the Future in Managing Chronic Disorders

Possible Roles for
Reinforcement Learning in
Clinical Research
S.A. Murphy
November 14, 2007
1
Outline
Goal: Improving Clinical Decision Support Systems Using Data
– Clinical Decision Support Systems
– Critical Decisions
– Types of Data
– Challenges
• Incomplete, primitive, mechanistic models
• Measures of Confidence
– Clinical Trials
2
Patient Evaluation Screen with MSE
Questions
5
Outline
– Clinical Decision Support Systems
– Critical Decisions
– Types of Data
– Challenges
• Incomplete mechanistic models
• Measures of Confidence
– Clinical Trials
7
Critical Decisions
• Which treatments should be offered first?
• How long should we wait for these
treatments to work?
• How long should we wait before offering a
transition to a maintenance stage?
• Which treatments should be offered next?
8
Critical Decisions
• All of these questions relate to the formulation of
a policy.
• Actions include medications, behavioral therapies,
delivery mechanisms, monitoring
• Observations include biological measures, family
history, severity, side effects, functionality,
symptoms
• Rewards include functionality, side effects,
symptoms
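One way to make the policy formulation above concrete is to write down the data types involved. The sketch below is illustrative only: the field names, the threshold, and the treatment labels are hypothetical and do not come from the slides, which name only the categories (actions, observations, rewards).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical representations; the slides only name the categories.
Observation = Dict[str, float]   # e.g. severity, side effects, symptoms
Action = str                     # e.g. a medication or a behavioral therapy


@dataclass
class Stage:
    """One decision point: what was observed, what was done, what resulted."""
    observation: Observation
    action: Action
    reward: float


# A policy maps the history so far plus the current observation to an action.
Policy = Callable[[List[Stage], Observation], Action]


def example_policy(history: List[Stage], obs: Observation) -> Action:
    """Illustrative rule only: step up treatment when symptoms stay high."""
    if not history:
        return "medication"                       # which treatment first?
    if obs["symptoms"] >= 5.0:                    # made-up threshold
        return "medication + behavioral therapy"  # which treatment next?
    return "monitoring"                           # transition to maintenance


def total_reward(trajectory: List[Stage]) -> float:
    """Sum of the stage rewards along one patient's trajectory."""
    return sum(stage.reward for stage in trajectory)
```

Each of the critical decisions (what to offer first, how long to wait, what to offer next) then corresponds to one branch of such a function.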
9
Outline
– Clinical Decision Support Systems
– Critical Decisions
– Types of Data
– Challenges
• Incomplete mechanistic models
• Measures of Confidence
– Clinical Trials
10
Types of Data
• Large Observational Data Sets
– Actions are not manipulated by scientist
• Clinical Trial Data
– Actions are manipulated by scientist
• Bench research on cells/animals/humans
11
Clinical Trial Data Sets
• Experimental trial data collected for research
purposes
– Scientists decide proactively which data to collect and
how to collect this data
– Use scientific knowledge to enhance the quality of the
proxies for observation, reward
– Actions are manipulated (randomized) by scientist
– Short Horizon (less than 5)
– Hundreds of subjects.
12
Observational Data Sets
• Observational data collected for research
purposes
– Scientists decide proactively which data to
collect and how to collect this data
– Use scientific knowledge to enhance the quality
of the proxies for observation, action, reward
– Actions are not manipulated by scientist
– Moderate Horizon
– Hundreds to thousands of subjects.
13
Observational Data Sets
• Clinical databases or registries (for example,
the VA registries in the US)
– Data was not collected for research purposes
– Use gross proxies to define observation, action,
reward
– Moderate to Long Horizon
– Thousands to Millions of subjects
14
Outline
– Clinical Decision Support Systems
– Critical Decisions
– Types of Data
– Challenges
• Incomplete mechanistic models
• Measures of Confidence
– Clinical Trials
15
Availability of Mechanistic Models
• In many areas of RL, scientists can use
mechanistic theory, e.g., physical laws, to model
or simulate the interrelationships between
observations and how the actions might impact the
observations.
• Scientists know many (the most important) of the
causes of the observations and know a model for
how the observations relate to one another.
16
Incomplete Mechanistic Models in
Medical Sciences
• Scientists who want to use data on
individuals to construct policies must
confront the fact that non-causal
“associations” occur due to the unknown
causes of the observations.
17
Conceptual Structure in the Medical Sciences (observational data)
[Diagram: Unknown Causes (two nodes); Observations and Action at Time 1; Observations and Action at Time 2; Reward at Time 3]
18
Unknown, Unobserved Causes (Incomplete Mechanistic Models)
[Diagram: Maturity/Decision to join "Adult" Society; Unknown Causes; two edges marked "+"; Binge Drinking, Treatment (Time 1); Binge Drinking (Time 2); Counseling (Time 2); Functionality (Time 3)]
19
Unknown, Unobserved Causes
(Incomplete Mechanistic Models)
• Problem: Non-causal associations between
treatment (here counseling) and rewards are likely.
• Solutions:
– Collect clinical trial data in which treatments are
randomized. This breaks the non-causal associations
yet permits causal associations.
– Prior to applying methods to observational data,
proactively brainstorm with domain experts to ascertain
and measure the main determinants of treatment
selection. Then take advantage of causal inference
methods designed to minimize assumptions on the data.
20
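The randomization solution described above can be illustrated with a small simulation. This is a sketch under made-up assumptions, not an analysis from the talk: an unobserved cause (called `maturity` here, after the binge-drinking example) raises functionality and, in the observational setting, also makes counseling more likely. Counseling itself is given no causal effect, and all probabilities and effect sizes are invented.

```python
import random


def simulate(n, randomized, seed=0):
    """Return (counseling, functionality) pairs for n simulated students."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        maturity = rng.random() < 0.5          # unobserved cause
        if randomized:
            counseling = rng.random() < 0.5    # assigned by coin flip
        else:
            # Assumption: mature students are more likely to get counseling.
            counseling = rng.random() < (0.8 if maturity else 0.2)
        # Counseling has NO causal effect here; only maturity matters.
        functionality = (2.0 if maturity else 0.0) + rng.gauss(0.0, 1.0)
        rows.append((counseling, functionality))
    return rows


def mean_difference(rows):
    """Mean functionality of counseled minus non-counseled students."""
    treated = [y for a, y in rows if a]
    control = [y for a, y in rows if not a]
    return sum(treated) / len(treated) - sum(control) / len(control)


observational = mean_difference(simulate(20000, randomized=False, seed=1))
trial = mean_difference(simulate(20000, randomized=True, seed=2))
print(f"observational difference: {observational:.2f}")
print(f"randomized difference:    {trial:.2f}")
```

Under these assumptions the observational difference has expectation 2 × (0.8 − 0.2) = 1.2 even though counseling does nothing, while the randomized difference has expectation zero.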
Unknown, Unobserved Causes (Incomplete Mechanistic Models)
[Diagram: Maturity/Decision to join "Adult" Society; Unknown Causes; one edge marked "+"; Observations, Treatment (Time 1); Binge Drinking (Time 2); Counseling (Time 2); Functionality (Time 3)]
21
Unknown, Unobserved Causes (Incomplete Mechanistic Models)
[Diagram: Maturity/Decision to join "Adult" Society; Unknown Causes; edges marked "+", "-", "-"; Binge Drinking (Yes); Counseling on Health Consequences (Yes/No); Binge Drinking (Yes/No), Time 2; Sanctions + counseling (Yes/No); Functionality, Time 3]
22
Unknown, Unobserved Causes
(Incomplete Mechanistic Models)
• The problem: Even when treatments are randomized, non-causal associations occur in the data.
• Solutions:
– Recognize that parts of the Q-function/transition probabilities cannot
be informed by domain expertise, as these parts reflect non-causal associations.
– Or use methods for constructing policies that “average” over the
non-causal associations between action and reward.
• I think that the importance of this second causal inference
problem depends on the kind of data and how you use it.
23
Measures of Confidence
– Measures of confidence are essential
• Noisy data
• Need to know when any one of a subset of actions
will yield the best rewards, that is, when there is little
or no evidence otherwise.
• It is important to minimize the number of
observations that must be collected in the clinical
setting
24
Measures of Confidence
• We would like measures of confidence for
the following:
– To compare the value of two estimated policies
(both estimated using the training data).
– To assess if there is sufficient evidence that a
particular observation (e.g. output of a
biological test) should be part of the policy.
– To assess if there is sufficient evidence that a
subset of the actions lead to better rewards for a
given observation than the remaining actions.
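As a rough illustration of the first item, the sketch below computes a percentile-bootstrap interval for the difference in value between two fixed policies, using an inverse-probability-weighted value estimate on simulated one-stage randomized data. The data-generating model, the policy names, and the function names are all hypothetical. It deliberately sidesteps the harder case named above, in which both policies are themselves estimated from the same data.

```python
import random


def simulate_trial(n, seed=0):
    """Simulated one-stage randomized trial (all numbers are made up)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        obs = rng.random()                        # e.g. a scaled severity score
        action = 1 if rng.random() < 0.5 else 0   # randomized with probability 0.5
        best = 1 if obs > 0.5 else 0              # assumed best action
        reward = 1.0 if action == best else 0.0
        data.append((obs, action, reward))
    return data


def ipw_value(data, policy, p_treat=0.5):
    """Inverse-probability-weighted estimate of mean reward under `policy`."""
    total = 0.0
    for obs, action, reward in data:
        if policy(obs) == action:
            prob = p_treat if action == 1 else 1.0 - p_treat
            total += reward / prob
    return total / len(data)


def bootstrap_ci_value_difference(data, policy_a, policy_b,
                                  n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for value(policy_a) - value(policy_b)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        sample = [rng.choice(data) for _ in data]  # resample subjects
        diffs.append(ipw_value(sample, policy_a) - ipw_value(sample, policy_b))
    diffs.sort()
    lower = diffs[int((alpha / 2) * n_boot)]
    upper = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


def tailored(obs):
    """Hypothetical policy: treat only when the observation is high."""
    return 1 if obs > 0.5 else 0


def treat_all(obs):
    """Hypothetical policy: always treat."""
    return 1


data = simulate_trial(400, seed=1)
print(bootstrap_ci_value_difference(data, tailored, treat_all, n_boot=500, seed=2))
```

In this simulated setting the true difference in value is 0.5, so an interval that excludes zero is the expected behavior.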
25
Measures of Confidence
• I must both learn the policy and provide an
evaluation of the policy using one data set.
• The data set is small
26
Measures of Confidence
• Traditional methods for constructing
measures of confidence require
differentiability (if frequentist properties are
desired).
• Q-functions are constructed via non-differentiable operations (e.g.
maximization).
• The value of a policy is a non-differentiable
function of the policy.
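The effect of the maximization step can be seen in a toy simulation, sketched here under simple assumptions (two actions, normally distributed rewards with unit variance, made-up sample sizes). The plug-in estimate max(sample mean A, sample mean B) is nearly unbiased when the two true means are well separated, but is biased upward when they are equal, which is exactly the point where the max function is not differentiable.

```python
import random
import statistics


def plug_in_max(n, mean_a, mean_b, rng):
    """Max of two sample means, each computed from n simulated rewards."""
    xa = statistics.fmean(rng.gauss(mean_a, 1.0) for _ in range(n))
    xb = statistics.fmean(rng.gauss(mean_b, 1.0) for _ in range(n))
    return max(xa, xb)


def average_error(n, mean_a, mean_b, reps=2000, seed=0):
    """Monte Carlo estimate of the bias of the plug-in max estimator."""
    rng = random.Random(seed)
    truth = max(mean_a, mean_b)
    return statistics.fmean(
        plug_in_max(n, mean_a, mean_b, rng) - truth for _ in range(reps)
    )


print("bias when actions are equally good:", average_error(25, 0.0, 0.0))
print("bias when one action is clearly better:", average_error(25, 0.0, 2.0))
```

The first bias is clearly positive and the second is close to zero, so an interval procedure that assumes smooth behavior can be miscalibrated precisely when treatments are about equally effective.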
27
Outline
– Clinical Decision Support Systems
– Critical Decisions
– Types of Data
– Challenges
• Causal: Unknown, unobserved causes
• Measures of Confidence
– Clinical Trials
28
Clinical Trials
• Data from the (short-horizon) clinical trials make
excellent test beds for combinations of
supervised/unsupervised and reinforcement
learning methods.
– Developing methods for variable selection in decision
making (in addition to variable selection for prediction)
– Model selection when goal is learning good policies.
– Confidence intervals for the difference in value
between two policies.
– Feature Construction
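To show what a reinforcement learning method looks like on short-horizon trial data, here is a minimal sketch of tabular two-stage Q-learning (backward induction on cell means) applied to simulated data from a sequentially randomized trial. The data-generating model, the binary coding of observations and actions, and all function names are assumptions made for illustration, and the sketch is not a method taken from the talk.

```python
import random
from collections import defaultdict


def simulate_smart(n, seed=0):
    """Simulate (o1, a1, o2, a2, y) for n subjects; all numbers are made up."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        o1 = rng.randint(0, 1)                 # baseline observation
        a1 = rng.randint(0, 1)                 # first randomization
        # Assumption: response (o2 = 1) is likelier when a1 matches o1.
        o2 = 1 if rng.random() < (0.6 if a1 == o1 else 0.3) else 0
        a2 = rng.randint(0, 1)                 # second randomization
        # Assumption: nonresponders benefit from a2 = 1, responders from a2 = 0.
        y = o2 + (1.0 if a2 == 1 - o2 else 0.0) + rng.gauss(0.0, 0.5)
        data.append((o1, a1, o2, a2, y))
    return data


def cell_means(pairs):
    """Average the values that share a key."""
    sums, counts = defaultdict(float), defaultdict(int)
    for key, value in pairs:
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


def best_action(q, state, actions):
    """Action with the largest estimated Q-value among those observed."""
    seen = [a for a in actions if (state, a) in q]
    return max(seen, key=lambda a: q[(state, a)])


def two_stage_q_learning(data, actions=(0, 1)):
    # Stage 2: Q2(history, a2) is the mean final reward in each cell.
    q2 = cell_means((((o1, a1, o2), a2), y) for o1, a1, o2, a2, y in data)

    def v2(h2):
        # Value of acting optimally at stage 2 (the non-smooth max step).
        return q2[(h2, best_action(q2, h2, actions))]

    # Stage 1: regress the stage-2 value on (o1, a1), again by cell means.
    q1 = cell_means(((o1, a1), v2((o1, a1, o2))) for o1, a1, o2, a2, y in data)
    policy2 = {h2: best_action(q2, h2, actions) for h2, _ in q2}
    policy1 = {o1: best_action(q1, o1, actions) for o1, _ in q1}
    return policy1, policy2


policy1, policy2 = two_stage_q_learning(simulate_smart(4000, seed=1))
print(policy1)
print(policy2)
```

With real trial data the cell means would typically be replaced by regression models, which is where variable selection, model selection, and feature construction enter.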
29
ExTENd
• Ongoing study at U. Pennsylvania (D.
Oslin)
• Goal is to learn how best to help alcohol-dependent
individuals reduce alcohol
consumption.
30
Oslin ExTENd
• Random assignment: Early Trigger for Nonresponse or Late Trigger for Nonresponse
• In each arm, 8 wks Response: random assignment to Naltrexone or TDM + Naltrexone
• In each arm, Nonresponse: random assignment to CBI or CBI + Naltrexone
31
Adaptive Treatment for ADHD
• Ongoing study at the State U. of NY at
Buffalo (B. Pelham)
• Goal is to learn how best to help children
with ADHD improve functioning at home
and school.
32
ADHD Study
• Random assignment:
– A. Begin low-intensity behavior modification
– B. Begin low dose medication
• At 8 weeks: Assess. Adequate response?
• Arm A, Yes: A1. Continue, reassess monthly; randomize if deteriorate
• Arm A, No: Random assignment:
– A2. Add medication; bemod remains stable but medication dose may vary
– A3. Increase intensity of bemod with adaptive modifications based on impairment
• Arm B, Yes: B1. Continue, reassess monthly; randomize if deteriorate
• Arm B, No: Random assignment:
– B2. Increase dose of medication with monthly changes as needed
– B3. Add behavioral treatment; medication dose remains stable but intensity of bemod may increase with adaptive modifications based on impairment
33
Studies under review
• H. Jones study of drug-addicted pregnant
women (goal is to reduce cocaine/heroin
use during pregnancy and thereby improve
neonatal outcomes)
• J. Sacks study of parolees with substance
abuse disorders (goal is to reduce recidivism
and substance use)
34
Jones’ Study for Drug-Addicted Pregnant Women
[Design diagram: sequential random assignments among rRBT, tRBT, eRBT, and aRBT; response is assessed at 2 wks, and both the Response and Nonresponse groups are randomly assigned again]
35
Sacks’ Study of Adaptive Transitional Case Management
[Design diagram: random assignments among Standard TCM, Augmented TCM, and Standard Services; response is assessed at 4 wks, with separate Response and Nonresponse branches]
36
Discussion
• Methods for updating the policy online as
data accumulate.
• Methods for producing composite rewards.
– High quality elicitation of functionality
• Human-Computer interface
• Improving tactics
37
This seminar can be found at:
http://www.stat.lsa.umich.edu/~samurphy/seminars/UAlberta07.ppt
Email me with questions or if you would like a
copy:
[email protected]
38
Unknown, Unobserved Causes
• Problem: We recruit students via flyers
posted in dormitories. Because of the unknown causes,
associations between observations and rewards are
highly likely to be non-representative.
• Solution: Sample a representative group of
college students.
39
STAR*D
• This trial is over and the data are being
analyzed (PI: J. Rush).
• One goal of the trial is to construct good
treatment sequences for patients suffering
from treatment-resistant depression.
www.star-d.org
40