Lecture 18 Matched Case Control Studies

BMTRY 701
Biostatistical Methods II
Matched case control studies
• References:
  - Hosmer and Lemeshow, Applied Logistic Regression
  - http://staff.pubhealth.ku.dk/~bxc/SPE.2002/Slides/mcc.pdf
  - http://staff.pubhealth.ku.dk/~bxc/Talks/NestedMatched-CC.pdf
  - http://www.tau.ac.il/cc/pages/docs/sas8/stat/chap49/sect35.htm
  - http://www.ats.ucla.edu/stat/sas/library/logistic.pdf (beginning page 5)
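Matched design
• Matching on important factors is common
• OP (oropharyngeal) cancer study, matched on:
  - age
  - gender
• Why?
  - forces the distribution of those variables to be the same in cases and controls
  - removes the effects of those variables from the case-control comparison (so their own effects on the outcome can no longer be estimated)
  - eliminates confounding by the matched variables, provided the analysis accounts for the matching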
1-to-M matching
• For each ‘case’, there is a matched ‘control’
• In practice, the case is usually enrolled first, and then a matching control is identified
• For particularly rare diseases, or when a large N is required, more than one control per case is often used
Logistic regression for matched case control studies
• Recall the independence assumption:
$$ y_i \overset{\text{iid}}{\sim} \text{Bern}(p_i) = \text{Bern}\left(\frac{e^{\beta_0 + \beta_1 x_i}}{1 + e^{\beta_0 + \beta_1 x_i}}\right) $$
• But if cases and controls are matched, are they still independent?
Solution: treat each matched set as a stratum
• one-to-one matching: 1 case and 1 control per stratum
• one-to-M matching: 1 case and M controls per stratum
• Logistic model per stratum (within a stratum, independence holds):
$$ p_k(x_i) = \frac{e^{\alpha_k + \beta x_i}}{1 + e^{\alpha_k + \beta x_i}} $$
• We assume that the odds ratio relating x to y is constant across strata (one common β; only the intercept α_k varies)
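To see what a constant odds ratio across strata means, here is a minimal Python sketch (the function names, the common β = 0.7, and the intercepts are hypothetical): the baseline risk changes from stratum to stratum, but the odds ratio for x = 1 versus x = 0 is e^β in every stratum.

```python
import math

def p_stratum(alpha_k, beta, x):
    """Stratum-specific logistic probability e^(alpha_k + beta*x) / (1 + e^(alpha_k + beta*x))."""
    eta = alpha_k + beta * x
    return math.exp(eta) / (1.0 + math.exp(eta))

def stratum_or(alpha_k, beta):
    """Odds ratio for x = 1 vs x = 0 within one stratum."""
    p1, p0 = p_stratum(alpha_k, beta, 1.0), p_stratum(alpha_k, beta, 0.0)
    return (p1 / (1.0 - p1)) / (p0 / (1.0 - p0))

beta = 0.7  # hypothetical common log odds ratio
for alpha_k in (-3.0, -1.0, 0.5):  # hypothetical stratum intercepts
    print(f"alpha_k={alpha_k:+.1f}  baseline risk={p_stratum(alpha_k, beta, 0.0):.3f}  "
          f"OR={stratum_or(alpha_k, beta):.4f}")
```

Each line shows a different baseline risk but the same OR, e^0.7 ≈ 2.0138.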
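How many parameters is that?
• Assume the sample size is 2n with 1-to-1 matching:
  - n strata intercepts + p covariate coefficients = n + p parameters
• This is problematic:
  - as n gets large, so does the number of parameters
  - with this many parameters, the usual large-sample ML theory breaks down: precision is poor and the estimate of β is biased (for matched pairs with a binary exposure, the unconditional OR estimate is the square of the conditional one)
• But do we really care about the strata-specific intercepts?
• They are “NUISANCE PARAMETERS”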
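The bias from estimating one intercept per pair can be seen on a small example. This Python sketch (hypothetical toy data and helper names) fits the unconditional model with a separate intercept for every pair: each pair's intercept is maximized by bisection, then β by golden-section search on the resulting profile likelihood. For 1-to-1 matching with a binary exposure, the conditional estimate is log(n10/n01), where n10 and n01 count the two kinds of discordant pairs; the unconditional estimate lands at twice that, which is the squared odds ratio.

```python
import math

def expit(eta):
    return 1.0 / (1.0 + math.exp(-eta))

def best_alpha(beta, x_case, x_ctrl):
    """MLE of one pair's own intercept, found by bisection on its score.

    The score (1 - p_case) - p_ctrl is strictly decreasing in alpha.
    """
    lo, hi = -50.0, 50.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        score = (1.0 - expit(mid + beta * x_case)) - expit(mid + beta * x_ctrl)
        if score > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def profile_loglik(beta, pairs):
    """Unconditional log-likelihood with each pair intercept replaced by its MLE."""
    total = 0.0
    for x_case, x_ctrl in pairs:
        a = best_alpha(beta, x_case, x_ctrl)
        total += math.log(expit(a + beta * x_case)) + math.log(1.0 - expit(a + beta * x_ctrl))
    return total

def golden_max(f, lo, hi, tol=1e-6):
    """Golden-section search for the maximizer of a unimodal f on [lo, hi]."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = hi - g * (hi - lo), lo + g * (hi - lo)
    while hi - lo > tol:
        if f(c) > f(d):
            hi, d = d, c
            c = hi - g * (hi - lo)
        else:
            lo, c = c, d
            d = lo + g * (hi - lo)
    return 0.5 * (lo + hi)

# Hypothetical toy pairs (x_case, x_control) with a binary exposure:
# n10 = 6 pairs with only the case exposed, n01 = 2 with only the control exposed,
# plus 5 concordant exposed and 7 concordant unexposed pairs.
pairs = [(1, 0)] * 6 + [(0, 1)] * 2 + [(1, 1)] * 5 + [(0, 0)] * 7
beta_uncond = golden_max(lambda b: profile_loglik(b, pairs), -5.0, 8.0)
beta_cond = math.log(6 / 2)  # conditional MLE for matched pairs: log(n10 / n01)
print(f"unconditional beta = {beta_uncond:.4f}, conditional beta = {beta_cond:.4f}")
print(f"unconditional OR = {math.exp(beta_uncond):.3f}, conditional OR = {math.exp(beta_cond):.3f}")
```

The concordant pairs contribute a constant to the profile likelihood, so only the discordant pairs drive either estimate.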
Conditional logistic regression
• To avoid estimating the intercepts, we can condition on the study design (each stratum contains exactly one case)
• Huh?
• Think about each stratum:
  - how many cases and controls are there?
  - what is the probability that the case is the case and the control is the control?
  - what is the probability that the control is the case and the case is the control?
• For each stratum, the likelihood contribution is based on this conditional probability: the probability of the observed assignment, given that exactly one member is the case
Conditioning
• For 1-to-1 matching: two individuals in stratum k, where y indicates case status (1 = case, 0 = control):
$$ P(y_{1k}=1, y_{2k}=0 \mid y_{1k}+y_{2k}=1) = \frac{P(y_{1k}=1, y_{2k}=0)}{P(y_{1k}=1, y_{2k}=0) + P(y_{1k}=0, y_{2k}=1)} $$
• Written as a likelihood contribution for stratum k:
$$ L_k = \frac{P(y_{1k}=1 \mid x_{1k})\, P(y_{2k}=0 \mid x_{2k})}{P(y_{1k}=1 \mid x_{1k})\, P(y_{2k}=0 \mid x_{2k}) + P(y_{1k}=0 \mid x_{1k})\, P(y_{2k}=1 \mid x_{2k})} $$
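The conditioning step can be checked by brute force. This Python sketch (hypothetical function name and probabilities) enumerates the four joint outcomes for two independent members of a stratum, keeps the two outcomes with exactly one case, and renormalizes.

```python
from itertools import product

def conditional_pair_prob(p1, p2):
    """P(y1 = 1, y2 = 0 | y1 + y2 = 1) for independent y1 ~ Bern(p1), y2 ~ Bern(p2)."""
    joint = {}
    for y1, y2 in product((0, 1), repeat=2):
        joint[(y1, y2)] = (p1 if y1 else 1 - p1) * (p2 if y2 else 1 - p2)
    # Condition on the design: exactly one case in the stratum.
    one_case = joint[(1, 0)] + joint[(0, 1)]
    return joint[(1, 0)] / one_case

# Hypothetical probabilities of being a case for members 1 and 2.
print(conditional_pair_prob(0.3, 0.1))
```

When both members have the same probability, the conditional probability is 0.5: the pair carries no information about which member is the case.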
Likelihood function for CLR
Substitute in our logistic representation of p and simplify:
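$$ L_k = \frac{P(y_{1k}=1 \mid x_{1k})\, P(y_{2k}=0 \mid x_{2k})}{P(y_{1k}=1 \mid x_{1k})\, P(y_{2k}=0 \mid x_{2k}) + P(y_{1k}=0 \mid x_{1k})\, P(y_{2k}=1 \mid x_{2k})} $$
$$ = \frac{\dfrac{e^{\alpha_k + \beta x_{1k}}}{1 + e^{\alpha_k + \beta x_{1k}}} \cdot \dfrac{1}{1 + e^{\alpha_k + \beta x_{2k}}}}{\dfrac{e^{\alpha_k + \beta x_{1k}}}{1 + e^{\alpha_k + \beta x_{1k}}} \cdot \dfrac{1}{1 + e^{\alpha_k + \beta x_{2k}}} + \dfrac{1}{1 + e^{\alpha_k + \beta x_{1k}}} \cdot \dfrac{e^{\alpha_k + \beta x_{2k}}}{1 + e^{\alpha_k + \beta x_{2k}}}} $$
Multiplying numerator and denominator by $(1 + e^{\alpha_k + \beta x_{1k}})(1 + e^{\alpha_k + \beta x_{2k}})$ and then cancelling the common factor $e^{\alpha_k}$:
$$ = \frac{e^{\alpha_k + \beta x_{1k}}}{e^{\alpha_k + \beta x_{1k}} + e^{\alpha_k + \beta x_{2k}}} = \frac{e^{\beta x_{1k}}}{e^{\beta x_{1k}} + e^{\beta x_{2k}}} $$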
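A quick numerical check of this derivation, in Python (the helper names expit, lk_full, lk_conditional and the values of β, x and α_k are hypothetical): building L_k from the stratum-specific probabilities gives the same number whatever α_k is, matching the α-free form.

```python
import math

def expit(eta):
    return 1.0 / (1.0 + math.exp(-eta))

def lk_full(alpha_k, beta, x1, x2):
    """L_k from the stratum-specific logistic model (member 1 = case, member 2 = control)."""
    p1, p2 = expit(alpha_k + beta * x1), expit(alpha_k + beta * x2)
    num = p1 * (1.0 - p2)
    return num / (num + (1.0 - p1) * p2)

def lk_conditional(beta, x1, x2):
    """Simplified form e^(beta*x1) / (e^(beta*x1) + e^(beta*x2)); alpha_k has cancelled."""
    return math.exp(beta * x1) / (math.exp(beta * x1) + math.exp(beta * x2))

for alpha_k in (-4.0, 0.0, 2.5):  # hypothetical stratum intercepts
    print(alpha_k, lk_full(alpha_k, 1.2, 1.0, 0.0), lk_conditional(1.2, 1.0, 0.0))
```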



Likelihood function for CLR
• Now, take the product over all the strata for the full likelihood:
$$ L(\beta) = \prod_{k=1}^{n} L_k = \prod_{k=1}^{n} \frac{e^{\beta x_{1k}}}{e^{\beta x_{1k}} + e^{\beta x_{2k}}} $$
• This is the likelihood for the 1-to-1 matched case-control design
• Notice:
  - there are no strata-specific parameters
  - cases are denoted by subscript ‘1’ and controls by subscript ‘2’
• The theory for 1-to-M matching follows similarly (not shown here)
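To make the likelihood concrete, here is a minimal Newton-Raphson fit of the conditional likelihood for a single scalar covariate, in Python (function names and toy data are hypothetical). It also accepts more than one control per stratum, assuming the standard 1-to-M form L_k = e^{βx_case} / Σ_j e^{βx_j}, with the sum over all members of stratum k; the slide does not derive that form.

```python
import math

def _score_info(beta, strata):
    """Score and observed information of the conditional log-likelihood at beta."""
    score, info = 0.0, 0.0
    for x_case, x_controls in strata:
        xs = [x_case] + list(x_controls)
        w = [math.exp(beta * x) for x in xs]
        total = sum(w)
        mean = sum(wi * x for wi, x in zip(w, xs)) / total
        var = sum(wi * (x - mean) ** 2 for wi, x in zip(w, xs)) / total
        score += x_case - mean
        info += var
    return score, info

def cond_loglik(beta, strata):
    """Sum over strata of log( e^(beta*x_case) / sum_j e^(beta*x_j) )."""
    ll = 0.0
    for x_case, x_controls in strata:
        xs = [x_case] + list(x_controls)
        ll += beta * x_case - math.log(sum(math.exp(beta * x) for x in xs))
    return ll

def fit_clr(strata, beta=0.0, tol=1e-10, max_iter=50):
    """Newton-Raphson for beta; returns (estimate, SE from the observed information)."""
    for _ in range(max_iter):
        score, info = _score_info(beta, strata)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    _, info = _score_info(beta, strata)
    return beta, 1.0 / math.sqrt(info)

# Hypothetical 1-to-1 strata (x_case, [x_control]) with a binary exposure.
strata = [(1, [0])] * 6 + [(0, [1])] * 2 + [(1, [1])] * 5 + [(0, [0])] * 7
beta_hat, se = fit_clr(strata)
print(f"beta = {beta_hat:.4f} (log 3 = {math.log(3):.4f}), SE = {se:.4f}")
```

For a binary exposure with 1-to-1 matching only the discordant pairs contribute, and the estimate is log(n10/n01) = log(6/2) ≈ 1.0986 with SE √(1/n10 + 1/n01). A stratum with two controls would be entered as (x_case, [x_c1, x_c2]).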
Interpretation of β
• Same as in ‘standard’ logistic regression
• β is the log odds ratio of disease for a one-unit difference in x; e^β is the corresponding odds ratio
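Because the coefficient is on the log-odds scale, a difference of Δ units in x corresponds to an odds ratio of e^{βΔ}. A short Python sketch (the function name and the β chosen so that the odds double per unit are hypothetical):

```python
import math

def odds_ratio(beta, delta=1.0):
    """Odds ratio for a delta-unit difference in x, given log odds ratio beta per unit."""
    return math.exp(beta * delta)

beta = math.log(2.0)  # hypothetical: odds double per one-unit difference in x
print(odds_ratio(beta), odds_ratio(beta, 3.0))
```

A three-unit difference therefore multiplies the odds by 2³ = 8, not by 3 × 2.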
When to use matched vs. unmatched analysis?
• Some papers report both analyses for a matched design
• Tradeoffs:
  - bias
  - precision
• Sometimes a matched design is used to ensure balance, but the analysis is then unmatched
• The two analyses WILL give you different answers
• Example: Gillison paper
Another approach to matched data
• Use random effects models
• CLR is elegant and simple
• The CLR estimates can be obtained through a ‘transformation’ of logistic regression results
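One well-known instance of such a transformation, offered as an illustration and not necessarily the one meant on the slide: for 1-to-1 matching, e^{βx_1k} / (e^{βx_1k} + e^{βx_2k}) = 1 / (1 + e^{-β(x_1k - x_2k)}), so each pair's contribution equals a no-intercept logistic model with response 1 and covariate d_k = x_1k - x_2k. The Python sketch below (hypothetical names and values) checks the identity numerically.

```python
import math

def expit(eta):
    return 1.0 / (1.0 + math.exp(-eta))

def clr_pair_lik(beta, x_case, x_ctrl):
    """1-to-1 conditional likelihood contribution e^(b*x1) / (e^(b*x1) + e^(b*x2))."""
    return math.exp(beta * x_case) / (math.exp(beta * x_case) + math.exp(beta * x_ctrl))

def no_intercept_logit_lik(beta, d):
    """Probability of response 1 under a no-intercept logistic model with covariate d."""
    return expit(beta * d)

pairs = [(1.0, 0.0), (0.2, 1.5), (2.0, 2.0)]  # hypothetical (x_case, x_control) values
for x1, x2 in pairs:
    print(clr_pair_lik(0.8, x1, x2), no_intercept_logit_lik(0.8, x1 - x2))
```

Under this identity, the 1-to-1 conditional likelihood can be maximized by any logistic regression routine that allows dropping the intercept, using within-pair differences as the covariate.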
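• But with modern computing, other approaches are available
• Random effects models:
  - allow strata-specific intercepts
  - the estimation process is not problematic (the intercepts are treated as random, so the number of parameters does not grow with the number of strata)
  - additional assumption: the intercepts follow a normal distribution
  - will NOT give results identical to CLR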
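A sketch of the random-intercept model described above, assuming the usual normal parameterization (μ, σ² and the normal density φ are our notation, not the slide's; in the gllamm output below, _cons plays the role of μ and var(1) the role of σ²). The α_k are integrated out rather than estimated, so only β, μ and σ² are fitted, however many strata there are.

```latex
\operatorname{logit} p_k(x_i) = \alpha_k + \beta x_i,
\qquad \alpha_k \overset{\text{iid}}{\sim} N(\mu, \sigma^2)

L(\beta, \mu, \sigma^2) = \prod_{k=1}^{n} \int
  \prod_{i \in k} p_k(x_i)^{y_i} \{1 - p_k(x_i)\}^{1 - y_i}
  \, \phi(\alpha_k; \mu, \sigma^2) \, d\alpha_k
```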
. xi: clogit control hpv16ser, group(strata) or

Iteration 0:   log likelihood = -72.072957
Iteration 1:   log likelihood = -71.803221
Iteration 2:   log likelihood = -71.798737
Iteration 3:   log likelihood = -71.798736

Conditional (fixed-effects) logistic regression   Number of obs   =        300
                                                  LR chi2(1)      =      76.12
                                                  Prob > chi2     =     0.0000
Log likelihood = -71.798736                       Pseudo R2       =     0.3465

------------------------------------------------------------------------------
     control | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    hpv16ser |   13.16616   4.988492     6.80   0.000      6.26541    27.66742
------------------------------------------------------------------------------
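With the `or` option, Stata reports the odds ratio together with a delta-method standard error, SE(OR) = OR × SE(β), while the confidence interval is formed for β and then exponentiated. Assuming that convention, this Python sketch (hypothetical function name) rebuilds the interval above from the printed OR and Std. Err.

```python
import math

Z_975 = 1.959964  # 97.5th percentile of the standard normal

def ci_from_or_output(odds_ratio, se_or, z=Z_975):
    """Rebuild a Wald CI from a reported odds ratio and its delta-method SE.

    Assumes SE(OR) = OR * SE(beta), so SE(beta) = SE(OR) / OR, and that the
    interval is computed for beta and then exponentiated.
    """
    beta = math.log(odds_ratio)
    se_beta = se_or / odds_ratio
    return beta, se_beta, math.exp(beta - z * se_beta), math.exp(beta + z * se_beta)

# OR and Std. Err. for hpv16ser, copied from the clogit output above.
beta, se_beta, lo, hi = ci_from_or_output(13.16616, 4.988492)
print(f"beta = {beta:.4f}, SE(beta) = {se_beta:.4f}, 95% CI for OR = ({lo:.4f}, {hi:.4f})")
```

The interval is asymmetric around 13.17 because it is symmetric on the log scale.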
. xi: logistic control hpv16ser

Logistic regression                               Number of obs   =        300
                                                  LR chi2(1)      =      90.21
                                                  Prob > chi2     =     0.0000
Log likelihood = -145.8514                        Pseudo R2       =     0.2362

------------------------------------------------------------------------------
     control | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    hpv16ser |    17.6113   6.039532     8.36   0.000     8.992582     34.4904
------------------------------------------------------------------------------
. xi: gllamm control hpv16ser, i(strata) family(binomial)

number of level 1 units = 300
number of level 2 units = 100

Condition Number = 2.4968508
OR = exp(2.868541) ≈ 17.61
gllamm model

log likelihood = -145.8514

------------------------------------------------------------------------------
     control |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    hpv16ser |   2.868541   .3429353     8.36   0.000       2.1964    3.540681
       _cons |  -1.464547   .1692104    -8.66   0.000    -1.796193     -1.1329
------------------------------------------------------------------------------

Variances and covariances of random effects
------------------------------------------------------------------------------

***level 2 (strata)

    var(1): 4.210e-21 (2.231e-11)
------------------------------------------------------------------------------
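In this fit the estimated variance of the strata intercepts, var(1) = 4.210e-21, is effectively zero. The random-intercept model therefore collapses to ordinary (unmatched) logistic regression: its log likelihood (-145.8514) equals that of the logistic fit, and exponentiating the gllamm coefficient and confidence limits reproduces the logistic odds ratio and interval. A quick Python check, using values copied from the output above:

```python
import math

# (gllamm log-odds value, logistic odds-ratio value), copied from the output above.
checks = [
    (2.868541, 17.6113),   # hpv16ser coefficient vs odds ratio
    (2.1964, 8.992582),    # lower 95% limit
    (3.540681, 34.4904),   # upper 95% limit
]
for log_value, reported_or in checks:
    print(f"exp({log_value}) = {math.exp(log_value):.4f}   logistic reported {reported_or}")
```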