Lecture 18 Matched Case Control Studies
BMTRY 701
Biostatistical Methods II
Matched case control studies
References:
• Hosmer and Lemeshow, Applied Logistic Regression
• http://staff.pubhealth.ku.dk/~bxc/SPE.2002/Slides/mcc.pdf
• http://staff.pubhealth.ku.dk/~bxc/Talks/NestedMatched-CC.pdf
• http://www.tau.ac.il/cc/pages/docs/sas8/stat/chap49/sect35.htm
• http://www.ats.ucla.edu/stat/sas/library/logistic.pdf (beginning page 5)
Matched design
Matching on important factors is common
OP cancer:
• age
• gender
Why?
• forces the distribution to be the same on those variables
• removes any effects of those variables on the outcome
• eliminates confounding
1-to-M matching
For each ‘case’, there is a matched ‘control’
The process usually dictates that the case is enrolled, then a control is identified
For particularly rare diseases, or when a large N is required, more than one control per case is often used
Logistic regression for matched case control studies
Recall independence
$$
y_i \overset{iid}{\sim} \mathrm{Bern}(p_i) = \mathrm{Bern}\!\left( \frac{e^{\beta_0 + \beta_1 x_i}}{1 + e^{\beta_0 + \beta_1 x_i}} \right)
$$
But, if cases and controls are matched, are they
still independent?
Solution: treat each matched set as a stratum
one-to-one matching: 1 case and 1 control per stratum
one-to-M matching: 1 case and M controls per stratum
Logistic model per stratum: within stratum, independence holds.
$$
p_k(x_i) = \frac{e^{\alpha_k + \beta x_i}}{1 + e^{\alpha_k + \beta x_i}}
$$
We assume that the OR for x and y is constant across
strata
How many parameters is that?
Assume the sample size is 2n and we have 1-to-1 matching:
n stratum intercepts + p covariate coefficients = n + p parameters
This is problematic:
• as n gets large, so does the number of parameters
• too many parameters to estimate and a problem of
precision
but, do we really care about the strata-specific
intercepts?
“NUISANCE PARAMETERS”
Conditional logistic regression
To avoid estimation of the intercepts, we can
condition on the study design.
Huh?
Think about each stratum:
• how many cases and controls?
• what is the probability that the case is the case and
the control is the control?
• what is the probability that the control is the case and
the case the control?
For each stratum, the likelihood contribution is
based on this conditional probability
Conditioning
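For 1-to-1 matching: with two individuals in stratum k, where y indicates case status (1 = case, 0 = control), condition on the stratum containing exactly one case:
$$
P(y_{1k}=1,\, y_{2k}=0 \mid y_{1k}+y_{2k}=1) = \frac{P(y_{1k}=1,\, y_{2k}=0)}{P(y_{1k}=1,\, y_{2k}=0) + P(y_{1k}=0,\, y_{2k}=1)}
$$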
Write as a likelihood contribution for stratum k:
$$
L_k = \frac{P(y_{1k}=1 \mid x_{1k})\, P(y_{2k}=0 \mid x_{2k})}{P(y_{1k}=1 \mid x_{1k})\, P(y_{2k}=0 \mid x_{2k}) + P(y_{1k}=0 \mid x_{1k})\, P(y_{2k}=1 \mid x_{2k})}
$$
Likelihood function for CLR
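Substitute in our logistic representation of p, with stratum intercept $\alpha_k$ and slope $\beta$, and simplify:
$$
\begin{aligned}
L_k &= \frac{P(y_{1k}=1 \mid x_{1k})\, P(y_{2k}=0 \mid x_{2k})}{P(y_{1k}=1 \mid x_{1k})\, P(y_{2k}=0 \mid x_{2k}) + P(y_{1k}=0 \mid x_{1k})\, P(y_{2k}=1 \mid x_{2k})} \\[6pt]
&= \frac{\dfrac{e^{\alpha_k + \beta x_{1k}}}{1 + e^{\alpha_k + \beta x_{1k}}} \cdot \dfrac{1}{1 + e^{\alpha_k + \beta x_{2k}}}}
{\dfrac{e^{\alpha_k + \beta x_{1k}}}{1 + e^{\alpha_k + \beta x_{1k}}} \cdot \dfrac{1}{1 + e^{\alpha_k + \beta x_{2k}}} + \dfrac{1}{1 + e^{\alpha_k + \beta x_{1k}}} \cdot \dfrac{e^{\alpha_k + \beta x_{2k}}}{1 + e^{\alpha_k + \beta x_{2k}}}} \\[6pt]
&= \frac{e^{\alpha_k + \beta x_{1k}}}{e^{\alpha_k + \beta x_{1k}} + e^{\alpha_k + \beta x_{2k}}} \\[6pt]
&= \frac{e^{\beta x_{1k}}}{e^{\beta x_{1k}} + e^{\beta x_{2k}}}
\end{aligned}
$$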
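A quick numerical check of the simplification can make the cancellation concrete. The sketch below evaluates the stratum contribution twice: once from the full stratum-specific probabilities and once from the simplified ratio. The function names and the values of the slope, the covariates, and the stratum intercepts are illustrative assumptions, not taken from the lecture.

```python
import math

def p_case(alpha_k, beta, x):
    """Stratum-specific logistic probability of being a case."""
    eta = alpha_k + beta * x
    return math.exp(eta) / (1.0 + math.exp(eta))

def lk_full(alpha_k, beta, x1, x2):
    """Conditional contribution built from the full probabilities."""
    p1 = p_case(alpha_k, beta, x1)
    p2 = p_case(alpha_k, beta, x2)
    numerator = p1 * (1.0 - p2)                  # subject 1 case, subject 2 control
    return numerator / (numerator + (1.0 - p1) * p2)

def lk_simplified(beta, x1, x2):
    """Simplified contribution: the stratum intercept no longer appears."""
    return math.exp(beta * x1) / (math.exp(beta * x1) + math.exp(beta * x2))

# Illustrative values only (assumed for this sketch).
beta, x1, x2 = 0.8, 1.0, 0.0
values = [lk_full(alpha_k, beta, x1, x2) for alpha_k in (-3.0, 0.0, 2.5)]
print(values, lk_simplified(beta, x1, x2))
```

Whatever intercept is plugged in, the full expression should agree with the simplified one up to floating-point error, which is why the intercepts never need to be estimated.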
Likelihood function for CLR
Now, take the product over all the strata for the full likelihood:
$$
L(\beta) = \prod_{k=1}^{n} L_k = \prod_{k=1}^{n} \frac{e^{\beta x_{1k}}}{e^{\beta x_{1k}} + e^{\beta x_{2k}}}
$$
This is the likelihood for the matched case-control design
Notice:
• there are no strata-specific parameters
• cases are defined by subscript ‘1’ and controls by subscript ‘2’
Theory for 1-to-M follows similarly (but not shown here)
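As a rough illustration of how this likelihood could be maximized, the sketch below fits a single-covariate conditional logistic model by Newton-Raphson. It uses the standard 1-to-M form of the stratum contribution, in which the denominator sums exp(beta * x) over the case and all of its controls; with one control it reduces to the 1-to-1 likelihood. The function names and the toy data are assumptions made for this example, not the lecture's data or software.

```python
import math

def cond_loglik_derivs(beta, strata):
    """Conditional log-likelihood, score, and information for one covariate.

    strata: list of (x_case, [x_control, ...]) tuples, one per matched set.
    """
    loglik = score = info = 0.0
    for x_case, x_controls in strata:
        xs = [x_case] + list(x_controls)
        weights = [math.exp(beta * x) for x in xs]
        total = sum(weights)
        mean = sum(w * x for w, x in zip(weights, xs)) / total
        var = sum(w * x * x for w, x in zip(weights, xs)) / total - mean ** 2
        loglik += beta * x_case - math.log(total)
        score += x_case - mean      # derivative of the log-likelihood
        info += var                 # minus the second derivative
    return loglik, score, info

def fit_clr(strata, beta=0.0, tol=1e-10, max_iter=50):
    """Newton-Raphson iterations for the conditional MLE of beta."""
    for _ in range(max_iter):
        _, score, info = cond_loglik_derivs(beta, strata)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta

# Toy 1-to-1 data with a binary exposure (hypothetical counts):
# 12 pairs with only the case exposed, 4 with only the control exposed,
# 5 with both exposed, 9 with neither exposed.
pairs = [(1, [0])] * 12 + [(0, [1])] * 4 + [(1, [1])] * 5 + [(0, [0])] * 9
beta_hat = fit_clr(pairs)
print(beta_hat, math.exp(beta_hat))
```

For 1-to-1 matching with a binary exposure, the conditional estimate of the odds ratio is the ratio of the two kinds of discordant pairs, so the toy data should give an odds ratio of 12/4 = 3. Concordant pairs contribute nothing to the score, which the code reflects through a zero within-set variance.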
Interpretation of β
Same as in ‘standard’ logistic regression
β is the log odds ratio of disease associated with a one-unit difference in x
When to use matched vs. unmatched?
Some papers use both for a matched design
Tradeoffs:
• bias
• precision
Sometimes a matched design is used to ensure balance, but is then followed by an unmatched analysis
They WILL give you different answers
Gillison paper
Another approach to matched data
use random effects models
CLR is elegant and simple
can identify the estimates using a ‘transformation’ of
logistic regression results
But, with new age of computing, we have other
approaches
Random effects models:
• allow strata specific intercepts
• not problematic estimation process
• additional assumptions: intercepts follow normal distribution
• Will NOT give identical results
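One common way to write such a random-intercept logistic model is sketched below. The symbols $a_k$, $\alpha$, and $\sigma^2$ are notation assumed for this sketch, and the exact parameterization used by any particular software may differ.

```latex
\begin{aligned}
\operatorname{logit} P(y_{ik} = 1 \mid x_{ik}, a_k) &= a_k + \beta x_{ik}, \\
a_k &\sim N(\alpha, \sigma^2), \qquad k = 1, \dots, n
\end{aligned}
```

Instead of n separate intercepts, only the mean $\alpha$ and variance $\sigma^2$ of the intercept distribution are estimated alongside $\beta$, with the $a_k$ integrated out of the likelihood.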
. xi: clogit control hpv16ser, group(strata) or

Iteration 0:   log likelihood = -72.072957
Iteration 1:   log likelihood = -71.803221
Iteration 2:   log likelihood = -71.798737
Iteration 3:   log likelihood = -71.798736

Conditional (fixed-effects) logistic regression   Number of obs   =        300
                                                  LR chi2(1)      =      76.12
                                                  Prob > chi2     =     0.0000
Log likelihood = -71.798736                       Pseudo R2       =     0.3465

------------------------------------------------------------------------------
     control | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    hpv16ser |   13.16616   4.988492     6.80   0.000      6.26541    27.66742
------------------------------------------------------------------------------
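The LR chi2 and pseudo R2 in the clogit output can be reproduced by hand under one assumption: that the 300 observations form 100 matched sets of one case and two controls each (the gllamm output reports 100 level 2 units). With beta = 0, every member of a set is equally likely to be the case, so each set contributes log(1/3) to the null conditional log likelihood. The variable names below are chosen for this sketch.

```python
import math

n_strata = 100           # assumed: 100 matched sets (300 observations / 3)
set_size = 3             # assumed: 1 case + 2 controls per set
ll_fitted = -71.798736   # log likelihood reported by clogit

# Null model (beta = 0): each set member is the case with probability 1/3.
ll_null = -n_strata * math.log(set_size)

lr_chi2 = 2.0 * (ll_fitted - ll_null)
pseudo_r2 = 1.0 - ll_fitted / ll_null
print(round(lr_chi2, 2), round(pseudo_r2, 4))
```

Under that assumption the two quantities round to the reported 76.12 and 0.3465.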
. xi: logistic control hpv16ser

Logistic regression                               Number of obs   =        300
                                                  LR chi2(1)      =      90.21
                                                  Prob > chi2     =     0.0000
Log likelihood =  -145.8514                       Pseudo R2       =     0.2362

------------------------------------------------------------------------------
     control | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    hpv16ser |    17.6113   6.039532     8.36   0.000     8.992582     34.4904
------------------------------------------------------------------------------
. xi: gllamm control hpv16ser, i(strata) family(binomial)

number of level 1 units = 300
number of level 2 units = 100

Condition Number = 2.4968508

OR = 17.63

gllamm model

log likelihood = -145.8514

------------------------------------------------------------------------------
     control |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    hpv16ser |   2.868541   .3429353     8.36   0.000       2.1964    3.540681
       _cons |  -1.464547   .1692104    -8.66   0.000    -1.796193     -1.1329
------------------------------------------------------------------------------

Variances and covariances of random effects
------------------------------------------------------------------------------

***level 2 (strata)

    var(1): 4.210e-21 (2.231e-11)
------------------------------------------------------------------------------
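The gllamm table reports the hpv16ser effect on the log odds scale. A small sketch of the conversion to an odds ratio with a Wald 95% interval follows; the coefficient and standard error are copied from the output above, while the variable names and the use of the normal quantile 1.959964 are choices made for this example.

```python
import math

coef = 2.868541   # hpv16ser coefficient from the gllamm output
se = 0.3429353    # its standard error
z = 1.959964      # 97.5th percentile of the standard normal

odds_ratio = math.exp(coef)
ci_low = math.exp(coef - z * se)    # exponentiate the Wald limits
ci_high = math.exp(coef + z * se)
print(odds_ratio, ci_low, ci_high)
```

The result is about 17.61 with limits near 8.99 and 34.49, matching the unmatched logistic output. That agreement is expected here: the estimated random-intercept variance is essentially zero, and both models report a log likelihood of -145.8514.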