M2 Medical Epidemiology

Multivariable Modeling
Adjustment by statistical model for the relationships of
predictors to the outcome.
Represents the frequency or magnitude of one phenomenon
as a mathematical function of “predictors” and random
variation.
–The phenomenon modeled may be continuous, e.g. HDL
cholesterol, or categorical, e.g. survival or death.
–The model consists of:
  – an equation (a mathematical form) to predict some
    aspect of the distribution of the predicted phenomenon,
    e.g. mean cholesterol or probability of death;
  – a probability law that describes, on a group basis, how
    individuals vary from what the equation predicts.

Multivariable Modeling




The prediction equation typically includes:
– The exposure of interest.
– Other exposures of potential importance.
– Potential confounders.
The predictors may be:
– Continuous variables.
– Categorical variables with two or more categories.
– Combinations of these.
The data are used to estimate coefficients of the prediction
equation, and the magnitude of random variation.
The coefficients represent the statistical effects of the various
predictors, assuming that the other predictors are adjusted for by
holding them constant.
– The prediction equation relating the outcome to any single
predictor, holding the others constant, may be linear,
quadratic, cyclical, or of many other forms.
Multivariable Modeling: Multiple Linear Regression





Models the mean of a quantitative outcome as a function of the
values of predictor variables.
Assumes independent observations whose random variation around
the predicted mean is approximately Gaussian (normal).
Contrary to what the name suggests, these models need not be
linear in the predictor variables. They are always, however,
linear in the coefficients by which the values of predictor
variables are multiplied.
Example:
$E(y) = \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \gamma_1 z_1 + \gamma_2 z_2$
where $x_1$ is the value of the exposure of interest, $x_2$ and $x_3$ are
values of other variables that may biologically affect $y$, and $z_1$
and $z_2$ are possible confounders. The x’s and z’s may be
continuous, or values of 0-1 “dummy variables” representing
categories of qualitative variables.
The Greek coefficients (the $\beta$'s and $\gamma$'s) are estimated from the observed data.
Example









Outcome is systolic BP (SBP)
Intercept = 30
$x_1$ is age
$\beta_1 = 1.5$
$x_2$ is a dummy variable for gender (male = 1, female = 0)
$\beta_2 = 10$
Equation is: Mean SBP = 30 + 1.5 $x_1$ + 10 $x_2$ + … + …
So mean systolic blood pressure =
30 + (1.5 × age) + (10 × 0) for women
and 30 + (1.5 × age) + (10 × 1) for men
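The arithmetic of this example can be checked with a short Python sketch. The function name `mean_sbp` and the age of 50 are made up for illustration; the numbers 30, 1.5 and 10 are the ones on the slide, and any further terms elided in the slide's equation are left out.

```python
def mean_sbp(age, male):
    """Predicted mean systolic BP (mmHg) from the example equation.

    age  : age in years
    male : dummy variable, 1 for men and 0 for women
    Terms elided on the slide ("+ ...") are not part of this sketch.
    """
    return 30 + 1.5 * age + 10 * male


# Hypothetical illustration: a 50-year-old woman and a 50-year-old man.
sbp_woman = mean_sbp(50, 0)
sbp_man = mean_sbp(50, 1)
print(sbp_woman, sbp_man, sbp_man - sbp_woman)  # 105.0 115.0 10.0
```

At any age the two predictions differ by the same 10 mmHg, which is what the coefficient of the gender dummy variable means.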
Interpretation




The most important item here is $\beta_1$ = 1.5 mmHg.
This will be reported as follows:
We found an association between age and
SBP, with a mean increase of 1.5 mmHg for
every increase in age of 1 year after
adjustment for ….
This applies equally to men and women with
the mean SBP being 10 mmHg higher in men
at every age.




Note that gender is not an effect modifier here,
because the 1.5 mmHg per year association (slope)
is the same for men and women.
Gender here is independently associated with
the outcome.
It could be a confounder in your crude
calculations if it is associated with age in your
sample.
But even if it is a confounder in the crude
calculation, the 1.5 mmHg slope is
already adjusted for gender. (It is adjusted for
all the other variables in the equation.)
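A minimal sketch of the point about crude versus adjusted estimates, using made-up, noise-free data generated from the example equation (30 + 1.5 × age + 10 × male). The ages and the helper `slope` are assumptions for illustration only. Because the toy data contain no noise and no interaction, the slope within each gender stands in here for the gender-adjusted coefficient.

```python
def slope(xs, ys):
    """Least-squares slope of y on x (simple linear regression)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Made-up sample in which the men are older than the women,
# so gender is associated with age.
ages_women = [30, 40, 50]
ages_men = [50, 60, 70]

# Noise-free SBP values from the example equation.
sbp_women = [30 + 1.5 * a for a in ages_women]
sbp_men = [30 + 1.5 * a + 10 for a in ages_men]

crude = slope(ages_women + ages_men, sbp_women + sbp_men)  # gender ignored
within_women = slope(ages_women, sbp_women)                # gender held fixed
within_men = slope(ages_men, sbp_men)                      # gender held fixed

print(crude, within_women, within_men)
```

In this toy sample the crude slope comes out at 1.8 mmHg per year, while the slope with gender held fixed is 1.5 in both groups: the crude estimate is confounded by gender, and the adjusted one is not.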
Multiple Linear Regression




When a confounder is added into the
equation, the beta of the exposure you are
interested in becomes “adjusted” for the
confounder.
That is to say “This is the correct association
after a confounder is taken into account”.
This is how confounders are searched for in
regression: each candidate is added to the equation
to find out whether the $\beta$ for the
exposure of interest changes.
Whenever a variable's $\beta$ gets close to zero, that variable
is taken out.
Regression

Think of it this way: each risk factor is adjusted
for all the other risk factors in the model.
Interpretation





“There were 10 factors significantly
associated with the outcome in univariate
analysis. In multivariate analysis only factors
1-5 remained significant.”
Factors 1-5 are truly associated with the
outcome.
Factors 6-10 are not independently
associated with the outcome.
Factors 6-10 were confounded by factors 1-5.
The $\beta$ for factors 6-10 became close to 0 after adjustment
for factors 1-5.
Multivariable Modeling: Multiple linear regression


Thus, $\beta_1$ represents the predicted change in the
mean value of $y$ associated with an increase of one
unit in the variable represented by $x_1$, with the
variables represented by the other x’s and z’s held
fixed.
This type of model accommodates effect modification
through the use of interaction terms, e.g., $x_1 z_2$,
which allows the effect of a change of one unit in $x_1$ to
vary with the value of $z_2$.
$E(y) = \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \gamma_1 z_1 + \gamma_2 z_2 + \delta x_1 z_2$
$\delta$ is thus a difference of differences: how the effect on
$y$ of a one unit increase in $x_1$ is itself modified by a
one unit increase in $z_2$.
Multivariable Modeling: Multiple linear regression
$E(y) = \ldots + \delta x_1 z_2$
Example
Mean SBP = … + 0.5 × age × race
Mean SBP = 30
+ 1.5 × age
+ 10 × (1 for men and 0 for women)
+ 0.5 × age × (0 for white and 1 for black)
That is to say, for every increase of 1 year in age,
mean SBP goes up 1.5 mmHg in whites but 1.5 + 0.5 =
2 mmHg in blacks.
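The interaction example can be worked through in a few lines of Python. This is only a sketch of the slide's arithmetic; the function names and the reference age of 50 are assumptions made for illustration.

```python
def mean_sbp(age, male, black):
    """Predicted mean SBP (mmHg) from the example with an age x race term.

    male  : 1 for men, 0 for women
    black : 1 for black, 0 for white
    """
    return 30 + 1.5 * age + 10 * male + 0.5 * age * black


def change_per_year(male, black, age=50):
    """Change in predicted mean SBP for a one-year increase in age."""
    return mean_sbp(age + 1, male, black) - mean_sbp(age, male, black)


slope_white = change_per_year(male=0, black=0)
slope_black = change_per_year(male=0, black=1)

# The interaction coefficient is the difference between the two age effects.
print(slope_white, slope_black, slope_black - slope_white)  # 1.5 2.0 0.5
```

The last number, 0.5, is the "difference of differences": the amount by which the effect of one extra year of age differs between the two groups.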

Multivariable Modeling: Multiple logistic regression

Models the probability of a dichotomous outcome as
a function of the values of predictor variables.

Assumes independent observations with binomial
distributions.

The right side of the equation is the same as in linear
regression. The left side (the outcome) is different:
$\ln(\text{odds of outcome}) = \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \gamma_1 z_1 + \gamma_2 z_2$
where the odds are, e.g., the odds of response to cancer chemotherapy,
and the other symbols are all as defined above for
multiple linear regression.

Multivariable Modeling: Multiple logistic regression
More specifically:
When $x_1$ represents levels of a dichotomous predictor by the
values 0 (absent) and 1 (present), then
$\exp(\beta_1)$ is the predicted odds ratio relating predictor to outcome,
e.g., smoking to lung cancer,
adjusted for other possible predictors and confounders.

When $x_1$ represents values of a quantitative predictor, then
$\exp(\beta_1)$ is the odds ratio between predictor and outcome,
e.g., diastolic blood pressure and stroke,
associated with a one unit increase in the predictor,
and adjusted for other possible predictors and confounders.
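A small sketch of how a logistic regression coefficient turns into an odds ratio. The coefficient values below are hypothetical and chosen only to make the arithmetic easy to follow; they are not results from any study, and the function names are made up for this illustration.

```python
import math


def odds_ratio(beta, units=1):
    """Odds ratio for an increase of `units` in a predictor with coefficient beta."""
    return math.exp(beta * units)


def probability(log_odds):
    """Convert a log-odds value (the left side of the model) to a probability."""
    return 1 / (1 + math.exp(-log_odds))


# Dichotomous predictor (0 = absent, 1 = present), hypothetical coefficient.
beta_smoking = math.log(2)          # assumed value, i.e. an odds ratio of about 2
print(odds_ratio(beta_smoking))

# Quantitative predictor, hypothetical coefficient per 1 mmHg of diastolic BP.
beta_dbp = 0.05                     # assumed value
print(odds_ratio(beta_dbp))         # odds ratio per 1 mmHg
print(odds_ratio(beta_dbp, 10))     # odds ratio per 10 mmHg

# Log-odds of 0 correspond to odds of 1, i.e. a probability of one half.
print(probability(0))
```

Note that for a quantitative predictor the odds ratio depends on the size of the increase: a 10-unit increase multiplies the odds by $\exp(10\beta_1)$, not by 10 times $\exp(\beta_1)$.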
Multivariable Modeling: Multiple Logistic Regression


Again, multiple variables are introduced to
see whether the ORs for other variables move to 1 (or
close to 1), or whether their p-values
become non-significant (NS).
These variables are then dropped from the
equation because they were not truly
associated with the outcome; their apparent
association was due to confounding by the other variables.
Multivariable Modeling: Multiple Logistic Regression


At the end, the ORs of the relevant variables
are reported, along with any
interactions.
Each OR is reported as the adjusted
OR for that association (adjusted for all
the other variables in the model).
Interpretation




Moderate alcohol consumption protects from
coronary disease (RR = 0.56).
It is well established that moderate alcohol
consumption CAUSES an increase in HDL.
It has been postulated that alcohol’s coronary
protective effect is mediated by raising HDL.
“When HDL level was introduced into the
model the RR for moderate drinking
increased from 0.56 to 0.77 but remained
significant.”
Interpretation





HDL explains some but not all of alcohol’s
coronary protective effect.
The RR for alcohol after adjustment for HDL (0.77)
reflects the part of its effect that is independent of
its effect on HDL.
Alcohol offers more protection overall (RR 0.56)
because part of its effect works through raising HDL.
Some of the protective effect is mediated (not
confounded) by HDL.
Both HDL and alcohol are truly and
independently associated with decreased
coronary events.
Not a confounder




HDL is not a confounder
Why did we adjust for it?
When should you do that?
When should you not?
Propensity Scores





If there aren’t enough outcome events, you can’t
use logistic regression to adjust for many factors.
Example: rifampin and pyrazinamide (R/P) versus
isoniazid (INH) for latent TB.
411 patients. 18 cases of hepatotoxicity.
Not randomized.
Patients at higher risk for hepatotoxicity
received R/P.
Propensity Scores




A crude comparison would be unfair to
R/P.
Need some “adjustment”.
Typically we use logistic regression to
look for any and all factors associated
with hepatotoxicity and adjust for those.
When the outcome is rare this cannot
be done.
Propensity Scores




We can look for factors associated with
the treatment choice.
Certain variables (e.g. alcohol use)
make a patient more likely to receive
R/P.
These factors are given numeric scores.
The higher the score the higher the
propensity to be treated with R/P
Propensity Scores





You calculate a propensity score for every
patient.
Then compare the incidence of the outcome among
patients with equal propensity scores.
There might be 90 patients with the same
propensity score.
They are all moderate alcohol drinkers, they
all have a remote history of hepatitis, and so on.
This can accommodate a great many
variables.
Those 90 patients



With identical high propensity scores
had a high likelihood of receiving R/P.
Guess what: 75 of them received R/P
and only 15 received INH.
But NOW we can compare the
incidence of the outcome in these 75 to
these 15.
Typically

5 groups using the quintiles of the score
are used.
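The quintile approach can be sketched as follows. This is a minimal illustration, not the analysis used in any of the studies mentioned: it assumes the propensity scores have already been estimated (for example by modeling treatment choice on patient characteristics), and the toy patients, field names and function names are all invented.

```python
def quintile_strata(patients):
    """Sort patients by propensity score and split them into 5 near-equal groups."""
    ordered = sorted(patients, key=lambda p: p["score"])
    n = len(ordered)
    return [ordered[i * n // 5:(i + 1) * n // 5] for i in range(5)]


def rate(events):
    """Proportion of patients with the outcome; None if the group is empty."""
    return sum(events) / len(events) if events else None


def event_rates(stratum):
    """Outcome incidence in treated and untreated patients of one stratum."""
    treated = [p["event"] for p in stratum if p["treated"]]
    untreated = [p["event"] for p in stratum if not p["treated"]]
    return rate(treated), rate(untreated)


# Invented toy data: 20 patients with made-up scores, treatments and outcomes.
patients = [
    {"score": (i + 1) / 21, "treated": i % 2 == 0, "event": i % 5 == 0}
    for i in range(20)
]

strata = quintile_strata(patients)
for number, stratum in enumerate(strata, start=1):
    treated_rate, untreated_rate = event_rates(stratum)
    print(number, len(stratum), treated_rate, untreated_rate)
```

The comparison of treated with untreated patients is made within each quintile, where the patients had a similar likelihood of receiving the treatment, rather than across the whole sample.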
In a Clinical Trial of Platelet Inhibitor




Data were collected regarding
outcomes (death etc.)
Also we have information about who
received early statin therapy.
But receiving early statin was not a
random process.
Totally up to clinical discretion
We want to study



The association between early statin
therapy and outcome.
BUT
The patients who received statins are
very different than those who didn’t.
Crude event rates




Would be an unfair comparison.
PROPENSITY SCORES
We find out what factors were
associated with statin use.
For example younger patients were
more likely to receive statin.
Propensity Scores


Are then used to classify patients by
quintile of increasing probability of early
statin initiation. (The 1st quintile least
likely, the 5th most likely).
Patients within each quintile were
similar in their likelihood to receive a
statin.
Patients in 1st quintile

Were least likely to receive a statin.
Of 2391 patients, 144 received a statin and
2247 did not.
All these 2391 patients are very similar
in all measured confounding factors and can be
compared.