Bias Correction in Pharmaceutical Risk Assessment


Bias Correction in Pharmaceutical
Risk-Benefit Assessment
Bob Obenchain, PhD, FASA
Risk-Benefit Statistics LLC
Yin = Dark = Evil = Risk
Yang = Light = Good = Benefit
Outline:
• Covariate Adjustment (Simplistic,
Global Modeling) is Inadequate
• Local Control methods take BIG
Steps in “Right Directions.”
• Emerging Credibility Crisis in
Pharmaceutical Safety
With titles like these, do you
really need to read the paper?
Shah BR, Laupacis A, Hux JE, Austin PC. Propensity score
methods gave similar results to traditional regression
modeling in observational studies: a systematic review.
J Clin Epidemiol 2005; 58: 550–559.
Stürmer T, Joshi M, Glynn RJ, Avorn J, Rothman KJ,
Schneeweiss S. A review of the application of propensity
score methods yielded increasing use, advantages in
specific settings, but not substantially different estimates
compared with conventional multivariable methods.
[REVIEW ARTICLE] J Clin Epidemiol 2006; 59: 437–447.
Early CA Modeling Efforts
Heckman JJ. Sample selection bias as a specification error.
Econometrica 1979; 47: 153–161.
Crown WE, Obenchain RL, Engelhart L, Lair TJ,
Buesching DP, Croghan TW. The application of sample
selection models in evaluating treatment effects: the case
for examining the effects of antidepressant medication.
Stat Med 1998; 17: 1943–1958.
Obenchain RL, Melfi CA. Propensity score and Heckman
adjustments for treatment selection bias in database
studies. 1997 Proceedings of the Biopharmaceutical
Section. Alexandria, VA: American Statistical
Association. 1998; 297–306.
Highly Influential ???
D’Agostino RB Jr. Propensity score methods
for bias reduction in the comparison of a
treatment to a non-randomized control
group. [TEACHER’S CORNER] Stat Med
1998; 17: 2265–2281.
Claimed that 3rd form of PS Adjustment
(after matching and sub-grouping) was to
simply use some function of PS estimates as
an additional X in Covariate Adjustment.
History of Local Control
Methods for Human Studies
• Epidemiology (case-control & cohort) studies
• Post-stratification and re-weighting in surveys
• Stratified, dynamic randomization to improve balance
on predictors of outcome
• Matching and Sub-grouping using Propensity Scores
• Econometric Instrumental Variables (LATEs)
• Marginal Structural Models (IPW ∝ 1/PS)
• Unsupervised Propensity Scoring: Nested Treatment-within-Cluster ANOVA model …with LATE, LTD
and Error sources of variation
“Local” Terminology:
• Subgroups of Patients
• Subclasses…
• Strata…
• Clusters… (natural or forced)
Notation for Variables
y = observed outcome variable(s)
x = observed baseline covariate(s)
t = observed treatment assignment
(usually non-random)
z = unobserved explanatory variable(s)
Fundamental PS Theorem
Joint distribution of x and t given p:
Pr( x, t | p ) = Pr( x | p ) Pr( t | x, p )
= Pr( x | p ) Pr( t | x )
= Pr( x | p ) times p or (1 − p)
= Pr( x | p ) Pr( t | p )
…i.e., x and t are conditionally independent given the
propensity for the new treatment, p = Pr( t = 1 | x ).
Conditioning (patient matching) on
Propensity Scores implies both…
Balance: local X-covariate distributions must
be the same for both treatments
and
Imbalance: Unequal local treatment fractions
unless Pr( t | p ) = p = 1 − p = 0.5
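Both properties can be checked on simulated data. The sketch below is illustrative only: the covariates, the propensity values, and the function names `simulate` and `summarize` are assumptions, not taken from the slides. Patients who share the same true propensity are grouped, and within each group the code compares the x1 distribution across treatment arms (balance) and reports the treated fraction (imbalance whenever p differs from 0.5).

```python
import random
from statistics import mean

def simulate(n=60000, seed=1):
    """Simulate (x1, x2, p, t) rows with a hypothetical propensity p(x)."""
    rng = random.Random(seed)
    # Assumed propensity: depends on x only through the sum x1 + x2.
    p_of_sum = {0: 0.10, 1: 0.25, 2: 0.50, 3: 0.75, 4: 0.90}
    rows = []
    for _ in range(n):
        x1 = rng.choice([0, 1, 2])
        x2 = rng.choice([0, 1, 2])
        p = p_of_sum[x1 + x2]
        t = 1 if rng.random() < p else 0   # non-random treatment choice
        rows.append((x1, x2, p, t))
    return rows

def summarize(rows):
    """Per propensity stratum: treated fraction and mean of x1 in each arm."""
    out = {}
    for p in sorted({r[2] for r in rows}):
        stratum = [r for r in rows if r[2] == p]
        treated = [r[0] for r in stratum if r[3] == 1]
        control = [r[0] for r in stratum if r[3] == 0]
        out[p] = {
            "frac_treated": len(treated) / len(stratum),
            "mean_x1_treated": mean(treated),
            "mean_x1_control": mean(control),
        }
    return out

if __name__ == "__main__":
    for p, s in summarize(simulate()).items():
        print(p, s)
```

In the p = 0.25 stratum, for example, the mean of x1 should be nearly the same for treated and untreated patients, while only about a quarter of those patients are treated.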
Constant PS Estimate Calipers
from Discrete Choice (Logit or Probit) Model
[Figure: axes x1, x2, x3. Holding the linear functional x′β̂ (nearly) constant defines an infinite 3-D slab.]
Pr( x, t | p ) = Pr( x | p ) Pr( t | p )
The unknown true propensity
score is the “most coarse”
possible balancing score.
The known x-vector itself is the
“most detailed” balancing score…
Pr( x, t ) = Pr( x ) Pr( t | x )
What is LESS “coarse” than
Pr( x, t | p ) = Pr( x | p ) Pr( t | p ) ?
Conditioning upon Cluster Membership is intuitively
somewhere between the two PS extremes in the limit as
individual clusters become numerous, small and compact…
Pr( x, t | C ) = Pr( x | C ) Pr( t | x, C )
= Pr( x | C ) Pr( t | x ) for x ∈ C
≈ constant × Pr( t | C )
But LESS “detailed” than
Pr( x, t ) = Pr( x ) Pr( t | x ) ?
Unsupervised No PS Estimates Needed
x2
x1
x3
3-D Clusters
(Informative or
Uninformative)
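As a rough illustration of forming clusters from baseline covariates alone, the sketch below runs a small k-means routine on x-vectors without looking at treatment or outcome. The slides do not say which clustering algorithm is intended; k-means with farthest-first initialization is swapped in here purely for illustration, and the function names are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((u - v) ** 2 for u, v in zip(a, b))

def farthest_first(points, k):
    """Deterministic start: first point, then repeatedly the farthest point."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points,
                           key=lambda pt: min(sq_dist(pt, c) for c in centers)))
    return centers

def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda c: sq_dist(pt, centers[c]))
            for pt in points]

def kmeans(points, k, iters=20):
    """Cluster x-vectors only; treatment t and outcome y are never used."""
    centers = farthest_first(points, k)
    for _ in range(iters):
        labels = assign(points, centers)
        for c in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster empties out
                centers[c] = tuple(sum(col) / len(members)
                                   for col in zip(*members))
    return assign(points, centers), centers

if __name__ == "__main__":
    xs = [(0.0, 0.1, 0.0), (0.2, 0.0, 0.1), (9.9, 10.0, 10.1), (10.1, 9.8, 10.0)]
    print(kmeans(xs, k=2)[0])
```

A cluster found this way is "informative" only if it happens to contain both treated and untreated patients, which can be checked after the fact.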
Nested ANOVA

Source                     Degrees-of-Freedom                      Interpretation
Clusters (Subgroups)       C = Number of Clusters                  Local Average Treatment Effects (LATEs) are Cluster Means
Treatment within Cluster   Number of “Informative” Clusters ≤ C    Local Treatment Differences (LTDs)
Error                      ≥ Number of Patients − 2C               Uncertainty

Although a NESTED model can be (technically)
WRONG, it is sufficiently versatile to almost always be
USEFUL as the number of “clusters” increases.
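The degrees-of-freedom bookkeeping can be sketched in a few lines. This assumes that the error degrees of freedom are whatever remains after fitting one mean per cluster and one treatment difference per informative cluster, so that they are at least the number of patients minus 2C. That reading of the table, and the function name `nested_anova_df`, are assumptions.

```python
def nested_anova_df(cluster_counts):
    """cluster_counts: list of (n_treated, n_untreated), one pair per cluster."""
    n_clusters = len(cluster_counts)
    # A cluster is informative only if both treatments are present in it.
    informative = sum(1 for n1, n0 in cluster_counts if n1 > 0 and n0 > 0)
    n_patients = sum(n1 + n0 for n1, n0 in cluster_counts)
    return {
        "clusters": n_clusters,
        "treatment_within_cluster": informative,
        # Assumed: error df is the remainder, hence >= n_patients - 2C.
        "error": n_patients - n_clusters - informative,
    }

if __name__ == "__main__":
    print(nested_anova_df([(3, 2), (4, 0), (1, 5)]))
```

With three clusters of which one contains only treated patients, the example has 2 informative clusters and 15 − 3 − 2 = 10 error degrees of freedom.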
Multiplicative “Shrinkage” Model
$\delta_i = 1$ if the "treated" outcome $Y_{1i}$ is observed; $\delta_i = 0$ otherwise.
Propensity Score $= \Pr(\delta_i = 1) = p_i$, with $p_i > 0$ and $p_i < 1$,
and $\delta_i$ is statistically independent of $Y_i$.
$$E(\text{observed } Y_{1i}) = E\left[\frac{\delta_i Y_i}{p_i}\right]$$
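A small simulation can illustrate why dividing by the propensity undoes the multiplicative shrinkage. The sketch assumes an outcome drawn independently of the observation indicator, with made-up propensities and a made-up outcome distribution; the function name `ipw_demo` is hypothetical. The unweighted mean of the product of indicator and outcome is shrunk toward zero by roughly the average propensity, while the propensity-weighted mean recovers the mean of the outcome.

```python
import random
from statistics import mean

def ipw_demo(n=100000, seed=2):
    """Return (mean of Y, mean of delta*Y, mean of delta*Y/p)."""
    rng = random.Random(seed)
    ys, shrunk, weighted = [], [], []
    for _ in range(n):
        p = rng.choice([0.2, 0.5, 0.8])       # assumed propensities, 0 < p < 1
        y = rng.gauss(10.0, 2.0)              # outcome, independent of delta
        delta = 1 if rng.random() < p else 0  # 1 if the outcome is observed
        ys.append(y)
        shrunk.append(delta * y)              # expectation is p * E(Y)
        weighted.append(delta * y / p)        # expectation is E(Y)
    return mean(ys), mean(shrunk), mean(weighted)

if __name__ == "__main__":
    print(ipw_demo())
```

Small propensities make the weighted terms highly variable, which is one practical reason to look at local (within-cluster) comparisons as well.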
Nested ANOVA Treatment Difference within the $i$-th Cluster:
$n_{1i}$ = number of treated patients in the $i$-th cluster $> 0$
$n_{0i}$ = number of untreated patients in the $i$-th cluster $> 0$
$$\hat{p}_i = \frac{n_{1i}}{n_{0i} + n_{1i}} \propto n_{1i}$$
$$\frac{1}{n_{1i}} \sum (y \text{ for treated patient}) \; - \; \frac{1}{n_{0i}} \sum (y \text{ for untreated patient})$$
Local Treatment Imbalance! $n_{1i} \propto \hat{p}_i$ and $n_{0i} \propto (1 - \hat{p}_i)$
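The within-cluster treatment difference is just a difference of two arm means, computed only where both arms are present. A minimal sketch, assuming records arrive as (cluster label, treatment indicator, outcome) triples; the function name and record layout are illustrative, not from the slides:

```python
from collections import defaultdict

def local_treatment_differences(records):
    """records: iterable of (cluster, t, y) with t = 1 treated, 0 untreated."""
    # Per cluster and arm: [patient count, sum of outcomes].
    tallies = defaultdict(lambda: {0: [0, 0.0], 1: [0, 0.0]})
    for cluster, t, y in records:
        tallies[cluster][t][0] += 1
        tallies[cluster][t][1] += y
    result = {}
    for cluster, arms in tallies.items():
        n1, sum1 = arms[1]
        n0, sum0 = arms[0]
        if n1 > 0 and n0 > 0:  # informative clusters only
            result[cluster] = {
                "n1": n1,
                "n0": n0,
                "p_hat": n1 / (n0 + n1),           # local treatment fraction
                "ltd": sum1 / n1 - sum0 / n0,      # treated mean minus untreated mean
            }
    return result

if __name__ == "__main__":
    data = [("a", 1, 5.0), ("a", 1, 7.0), ("a", 0, 4.0), ("b", 1, 3.0)]
    print(local_treatment_differences(data))
```

In the example, cluster "a" yields a difference of 6.0 − 4.0 = 2.0 with a local treatment fraction of 2/3, and cluster "b" is dropped because it has no untreated patients.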
The “statistical methodology”
engine ideal for making fair
treatment comparisons is:
Cluster Analysis
(Unsupervised Learning)
plus Nested ANOVA
i.e. not Generalized Linear Models
and their Nonlinear extensions.
Inverse Probability Weighting
(IPW) for CA models:
$$E(y_i \mid x_i) = x_i'\beta$$
$$V(y_i \mid x_i) = \sigma^2 p_i^{k}$$
$$\hat{\beta} = \left[\sum_i \frac{x_i x_i'}{p_i^{k}}\right]^{-1} \sum_i \frac{x_i y_i}{p_i^{k}}$$
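For a single covariate with no intercept, the weighted estimator reduces to a ratio of two weighted sums. The sketch below assumes the estimator on this slide is weighted least squares with weights 1/p_i^k, where k = 1 corresponds to plain inverse probability weighting and k = 0 to ordinary least squares. The function name and the scalar simplification are assumptions made for illustration.

```python
def ipw_wls_slope(x, y, p, k=1.0):
    """Weighted least-squares slope through the origin with weights 1 / p**k."""
    if not all(0.0 < pi < 1.0 for pi in p):
        raise ValueError("propensities must lie strictly between 0 and 1")
    numerator = sum(xi * yi / pi ** k for xi, yi, pi in zip(x, y, p))
    denominator = sum(xi * xi / pi ** k for xi, pi in zip(x, p))
    return numerator / denominator

if __name__ == "__main__":
    x = [1.0, 2.0]
    y = [1.0, 3.0]
    p = [0.25, 0.5]
    print(ipw_wls_slope(x, y, p, k=0.0))  # unweighted fit
    print(ipw_wls_slope(x, y, p, k=1.0))  # inverse probability weighted fit
```

The two calls differ because the patient with the smaller propensity receives more weight when k = 1.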
The “Local Control” Philosophy:
• y = Outcome comparisons among patients with the
most similar X characteristics are most relevant
• Robust, Nested Treatment-within-Cluster ANOVA
• Systematically form, compare, subdivide & recombine
subgroups (clusters) …built-in sensitivity
• Non-parametric Distribution of Observed Local
Treatment Differences (LTDs) …no prior distribution!
• Main Effect of Treatment is Mean of CDF formed by
combining LTD estimates weighted ∝ Cluster Size
• Only when Combined CDF suggests Differential
Response: Which patient characteristics predict What?
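The combination step in the list above can be sketched as follows: given one LTD estimate per informative cluster and that cluster's size, build the size-weighted empirical CDF and take its mean as the main effect. The input layout and the function name `ltd_distribution` are assumptions made for illustration.

```python
def ltd_distribution(ltds):
    """ltds: list of (ltd_estimate, cluster_size) for informative clusters.

    Returns (main_effect, cdf) where cdf is a list of
    (ltd_value, cumulative weight fraction) sorted by ltd_value.
    """
    total = sum(size for _, size in ltds)
    # Main effect: mean of the LTD distribution, weighted by cluster size.
    main_effect = sum(ltd * size for ltd, size in ltds) / total
    cdf, cumulative = [], 0
    for ltd, size in sorted(ltds):
        cumulative += size
        cdf.append((ltd, cumulative / total))
    return main_effect, cdf

if __name__ == "__main__":
    effect, cdf = ltd_distribution([(2.0, 30), (-1.0, 10), (0.5, 60)])
    print(effect)
    print(cdf)
```

A wide or multi-modal CDF would be the cue to ask which patient characteristics predict the differing local responses; a tight one suggests the main effect is an adequate summary.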
Credibility…
• Conflicts of Interest between Pharmaceutical
Industry, Regulators and Data Custodians /
Analysts
• Why should industry pay BIG $$$ for
observational studies when poor / naïve
analyses of biased data can create perceived
needs for even more expensive RCTs?
The pieces don’t fit together very well in the USA!
[Figure: Pharma Industry and FDA]
Aprotinin Case Study…
• Attack in early 2006 by a US MD who got
some very sloppy analyses of international
patient registry data published in NEJM
• Bayer (Germany) commissioned gigantic
admin claims analysis by the research
arm of their major US payer in mid 2006
• MC researcher emailed a flawed, highly
unfavorable analysis to Germany 8 days
before 2006 US advisory board meeting
Drug warnings fall flat
Bayer hides bad news; a researcher doesn't, and takes heat.
KRIS HUNDLEY, St. Petersburg Times, August 5, 2007
Dr. Thomas Kelly, a heart surgeon for 30 years, …routinely
uses Trasylol on repeat open-heart patients or people on blood
thinners.
"Bleeding is a tremendous problem," Kelly said. "In certain
populations, there is much less need for transfusions with
Trasylol. The alternatives are not nearly as effective."
"This drug is used on high-risk people; that's why there's
a higher incidence of death," the surgeon said. "I think a
terrible disservice has been done to a very helpful drug."
Though he thinks the recent studies "unfairly impugned"
Trasylol, Kelly said he is using the drug more selectively and
reading all the research available on the topic.
Why should Pharma TRUST the other Players?
[Figure: Pharma Industry and FDA]
What constitutes a BENEFIT ???
When a treatment is approved only for patients with
high disease severity or clear vulnerability / frailty,
there appear to be two possible “standards.”
Treated patients have better outcomes
than untreated patients with same risk
or
Treated patients have better outcomes
than untreated patients with high risk
References
Bang H, Robins JM. Doubly Robust Estimation in Missing Data
and Causal Inference Models. Biometrics 2005; 61: 962-972.
Fraley C, Raftery AE. Model based clustering, discriminant
analysis and density estimation. JASA 2002; 97: 611-631.
Imbens GW, Angrist JD. Identification and Estimation of Local
Average Treatment Effects. Econometrica 1994; 62: 467-475.
McClellan M, McNeil BJ, Newhouse JP. Does More Intensive
Treatment of Myocardial Infarction in the Elderly Reduce
Mortality?: Analysis Using Instrumental Variables. JAMA
1994; 272: 859-866.
McEntegart D. The Pursuit of Balance Using Stratified and
Dynamic Randomization Techniques: An Overview. Drug
Information Journal 2003; 37: 293-308.
References …concluded
Obenchain RL. USPS package: Unsupervised and Supervised
Propensity Scoring in R. Version 1.1-0. www.r-project.org
August 2007.
Obenchain RL. Unsupervised Propensity Scoring: NN and IV
Plots. 2004 Proceedings of the JSM.
Robins JM, Hernan MA, Brumback B. Marginal Structural
Models and Causal Inference in Epidemiology. Epidemiology
2000; 11: 550-560.
Rosenbaum PR, Rubin DB. The Central Role of the Propensity
Score in Observational Studies for Causal Effects. Biometrika
1983; 70: 41-55.
Rosenbaum PR. Observational Studies, Second Edition. 2002. New
York: Springer-Verlag.