Uses of Probabilities in Epidemiology Research

Transcript
C. Murray Ardies, Ph.D.
Professor of Health Sciences
Northeastern Illinois University
Textbook Definition of Epidemiology
… combination of knowledge and research methods concerned
with the distribution of determinants of health and illness in
populations and with contributors to health and control of
health problems … comprises an analytic, descriptive
component termed classical epidemiology and a component
concerned with critical appraisal of the research literature and
diagnosis and management of illness, which is termed clinical
epidemiology.
- PDQ EPIDEMIOLOGY by David Streiner and Geoffrey Norman, 1996, Mosby
My Definition of Epidemiology
(which is actually a composite definition from a dozen or more texts and websites on epidemiology):
Classical Epidemiology
Study of the incidence and distribution of determinants and
deterrents of morbidity and mortality in human populations
Modern Epidemiology
Study of the incidence and distribution of determinants and
deterrents of morbidity and mortality in manipulated & non-manipulated human populations
(may not be a better definition, but it is shorter!!)
The research processes often (but not necessarily) start with an
epidemiology approach, either to “determine cause” of
something new that has appeared, or to figure out cause of
something that has been well characterized by symptoms but
not yet understood by etiology.
A change in incidence of “something” can very easily indicate
that “something” is going on and recognition of that change will
initiate a series of investigations (how’s that for vague).
The job of the CDC (USA) is to monitor the incidence of diseases in the
US and to investigate any sudden changes, such as an increase in the
number of cases of an existing disease or simply an increase in
mortality due to unknown causes.
Reports of Cancer in Plumcoulee, MB (Population 200, 1987)

[Timeline chart: 12 patients tracked month by month (Jan through Dec 1987). A leading D marks a diagnosis during the year, X marks a death, and C appears to mark a cure or remission; dotted lines trace each patient's course through the year.]
4 Cases to start
12 Cases Total
8 New Cases
6 Deaths
Incidence – how many people get the disease:

incidence = number of new cases / number of people at risk

Annual Incidence in Plumcoulee, MB, 1987:

8 new cases / 196 at risk = 0.0408 cases / person / year
= 40.8 cases / 1000 people / year
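These numbers are easy to check with a short calculation; the following Python sketch (not part of the original slides) simply restates the incidence formula:

```python
# Annual incidence = number of new cases / number of people at risk.
# Plumcoulee, MB, 1987: 8 new cases among 196 people at risk
# (200 residents minus the 4 existing cases at the start of the year).
new_cases = 8
at_risk = 200 - 4

incidence = new_cases / at_risk
print(round(incidence, 4))         # 0.0408 cases / person / year
print(round(incidence * 1000, 1))  # 40.8 cases / 1000 people / year
```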
Some Incidence Data for Cancer (USA)
404.9 cancers / 100,000 females in 2001* (437**)
544.8 cancers / 100,000 males in 2001 (482)
0.3 lip cancers / 100,000 females in 2001 (0.4)
1.4 lip cancers / 100,000 males in 2001 (1.2)
127.2 breast cancers / 100,000 females in 2001 (135.1)
1.4 breast cancers / 100,000 males in 2001 (1.2)
45.8 colon & rectum cancers / 100,000 females in 2001 (51.4)
62.7 colon & rectum cancers / 100,000 males in 2001 (54.3)
53.2 lung & bronchus cancers / 100,000 females in 2001 (57.9)
87.7 lung & bronchus cancers / 100,000 males in 2001 (76.7)
*Age adjusted, CDC
**(crude)
Some more entertaining ways to play with the incidence data …
Relative Risk and Odds Ratio
Relative Risk
Use data from the Cholestyramine Study (Coronary Primary Prevention
Trial)
                 Cardiac deaths   Still alive   Total
Cholestyramine   30 (A)           1870 (B)      1900
Placebo          38 (C)           1868 (D)      1906
RR = [A / (A + B)] / [C / (C + D)] = (30 / 1900) / (38 / 1906) = 0.792

i.e. the relative risk of cardiac death while on the drug is 79.2% of the risk when
not on the drug, or a risk reduction of ~ 21% by taking the drug
Odds Ratio (also called relative odds) is an approximation of RR and
is often used when disease incidence is very low and there is a long
latency period; it is commonly used in case-control studies
Using data from Wynder & Graham, JAMA, 1950:

            Cases     Controls   Total
Smoker      659 (A)   984 (B)    1643
Nonsmoker   25 (C)    348 (D)    373

OR = (A / C) / (B / D) = (659 / 25) / (984 / 348) = 26.4 / 2.83 = 9.33
Another way to look at this is:
Odds of a lung cancer subject being exposed to smoke are 659/25 = 26.4
Odds of a non-cancer subject being exposed to smoke are 984/348 = 2.83
Relative Odds of lung cancer from smoke exposure are therefore 26.4 / 2.83 = 9.33
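Both calculations follow directly from the 2×2 counts. A small Python sketch (the function names are mine, not the slides') reproduces them:

```python
def relative_risk(a, b, c, d):
    """RR = [A/(A+B)] / [C/(C+D)] for a 2x2 table of
    exposed (A, B) vs. unexposed (C, D) outcome counts."""
    return (a / (a + b)) / (c / (c + d))

def odds_ratio(a, b, c, d):
    """OR = (A/C) / (B/D): odds of exposure in cases over controls."""
    return (a / c) / (b / d)

# Cholestyramine trial: 30 deaths / 1870 alive on drug; 38 / 1868 on placebo
print(round(relative_risk(30, 1870, 38, 1868), 3))  # 0.792

# Wynder & Graham: A = 659 smoking cases, B = 984 smoking controls,
# C = 25 nonsmoking cases, D = 348 nonsmoking controls
print(round(odds_ratio(659, 984, 25, 348), 2))  # 9.32 (the slide's 9.33 uses rounded intermediates)
```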
Some Probability stuff that is directly related to
the incidence stuff …
All of the prior examples looked at the incidence of something from a total
population (ok, except for the Wynder data); i.e.:
Incidence of new cancer cases in the population of Plumcoulee …
Incidence of death due to heart disease in the USA in 2001 …
Incidence of death due to heart disease in a population of people who
participated in an experiment …
Appropriate standardized incidence rates or relative risks were then calculated
Another way to look at the same incidence (or counting) data is to consider the
numbers as illustrating probabilities rather than as fractions or ratios – of
course this means the same thing, just another and more useful term to get
used to.
Let's use the data (ok, a very tiny selected piece of data) from the 1991 USA
census …
Infant Mortality in the USA (1991)

            Deaths    Alive        Total
Unmarried   16,712    1,197,142    1,213,854
Married     18,784    2,878,421    2,897,205
Total       35,496    4,075,563    4,111,059
The probability (or the Marginal Probability) of having a particular disease or
condition can be illustrated using the following formula:

P (D)    (remember this one for later)

Using the above data and the condition of infant mortality:

P (D) = P (infant death) = 35,496 total deaths / 4,111,059 total live births
= 0.0086 (8.6 infant deaths / 1000 live births)
Notice that this result is the total incidence of infant mortality for the
population; just presented in terms of a probability (0.0086) rather than an
incidence …
The probability of NOT having a particular disease or condition can be
illustrated using the following formula:

P (D̄)    (remember this one for later)

Using the above data and the condition of "infant alive":

P (D̄) = P (infant alive) = 4,075,563 still alive @ 1 yr. / 4,111,059 total live births
= 0.9914 (991.4 still alive @ 1 yr. / 1000 live births)

Notice that this result is the total incidence of infant non-mortality for the
population; just presented in terms of a probability rather than an incidence …
Sample-Based Epidemiology Concepts
We rarely have the luxury of having the entire population at our disposal so we
usually take a small (or large, if you have the money and time and even larger
if you also have lots of post-docs to collate data) random sample from our
selected population and estimate the population incidence (probabilities) based
on the sample. This means that we will have errors in estimation; with big
errors if we use small numbers of people in our samples and smaller errors if we
use bigger numbers of people in our samples.
Because of the error in estimating the population parameter, we have to calculate
confidence limits for our estimate; our sample predicts a parameter but the
parameter could be smaller or larger than the predicted value – so we need to
know the range of possible values for the predicted parameter. To see how this
works we have to delve into the incredibly cool
Universe of Statistical Analysis.
The terms confidence limits and estimate of population parameters are
highly relevant to research in the health sciences because they are
statistical concepts.
Statistics and statistical analysis are nothing more than calculating
measures of probability, association, central tendency, and variance of
sample data (statistics), and the probabilities that the calculated statistics
relate to the target population (statistical analysis).
Of course statistical probabilities are not exactly the same as the
actual population probabilities of infant mortality (0.0086) and infant
non-mortality (0.9914) for the USA in 1991; two separate population
parameters.
A parameter is any measure from a population while a statistic is any
measure from a sample.
If we test entire populations then we do not need statistical analysis. For
example:
If another population (let's say, another country) was measured in its
entirety and the other country's infant mortality and infant non-mortality
were calculated as 0.0085 and 0.9915, respectively [compared to infant
mortality (0.0086) and infant non-mortality (0.9914) for the USA in 1991]
we could conclude with absolute certainty (100% confidence) that the two
populations were completely different with regard to these two parameters
because we would be absolutely certain that the calculated numbers are
exactly descriptive of the respective populations (even though there is just a
tiny difference between the two populations). Different numbers means
different!
However, because samples are not necessarily exactly representative of the
population from which they came, differing numbers from two (or more)
different samples do not necessarily guarantee that the samples came from
two (or more) different populations.
As previously mentioned, we simply NEVER (well, not very often anyway)
have the luxury of being able to measure the entire population so we have to
suffer with a (usually) small sample that was selected from the population.
We then measure whatever it is we are interested in; let's say "Infant
Mortality" or "Height", and then assume that our sample represents our
population and that whatever the sample number is, that same number applies
to the entire population from which the sample was selected.
Because such an assumption may not be absolutely true (i.e. the sample doesn't
perfectly represent the population), we need to have some idea of what the
probability is that the sample does represent the population, in other words …
If there is a low probability the sample is like the population, then we won’t have
much confidence in our numbers ...
If there is a high probability our sample represents our population, then we can
have a higher level of confidence (but never 100% certain) in our numbers.
To understand how these statistical calculations are made we need to start
with a frequency distribution of the entire set of population data:
An extremely accurate, but rather cumbersome way to describe data; especially if
there were hundreds or thousands of people in the population . . . . .
A little less accurate of a description, but a whole lot easier, because only
the shape of the line is being described, not each of the individual data
points. Note that the shape of the line still accurately describes how the
data is distributed on the number line; we just need a more accurate way to
describe the line …

And there even is a way to calculate those two parts of the curve. (If you
look at the right and left halves of the curve separately, you may recognize
them as sigmoid curves.)
The measure of central tendency most often used to describe
the peak of the data curve is called mu (µ) and the measure of
variability most often used to describe the dispersion of the
data along the number line is called the standard deviation (σ);
which is equal to the square root of the variance (σ2).
µ = ∑x / n

(commonly called the average – add up all the scores and divide by the total
number of scores)

σ² = ∑(x − µ)² / n

(subtract the mean from each score, square each result, add up all
the squares, and then divide by n; then take the square root to get σ)
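The two formulas translate line for line into code; a minimal Python sketch with made-up scores:

```python
import math

def population_mean(xs):
    """mu = sum(x) / n"""
    return sum(xs) / len(xs)

def population_sd(xs):
    """sigma = sqrt( sum((x - mu)^2) / n )"""
    mu = population_mean(xs)
    variance = sum((x - mu) ** 2 for x in xs) / len(xs)  # sigma^2
    return math.sqrt(variance)

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up data for illustration
print(population_mean(scores))     # 5.0
print(population_sd(scores))       # 2.0
```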
The µ corresponds to the exact point on the number line where
the central peak of the frequency distribution curve sits and the σ
corresponds to the exact point on the number line where the
data starts to spread out faster away from the mid-point.
An advantage of describing your population in terms of how the data is
distributed on a number line using µ and σ is that any population can be
represented by this exact same kind of a curved line; a line often called a
normal curve.
An important property of these curves is that they are very easy to describe in
terms of mathematical probabilities. For example, we know that 50% of all
the heights (data points) in the population are greater than the center
point (µ = 5' 6.25"), which means there is a 0.50 probability that a randomly
selected individual is taller than 5' 6.25". We also know that 68.26% of all
the data points are between the two 1-σ limits (4' 1.75" to 6' 10.75"), which
means there is a 0.6826 probability that a randomly selected individual will
be between 4' 1.75" tall and 6' 10.75" tall.
This graph simply illustrates more "percentages of the data distributed along the
number line" in different sections of the curve, based on how far along the number line
you go in σ units. Again, using percents as probabilities, there is a 0.3413 probability that
a randomly selected individual would be between the mean and one standard deviation
above the mean; or, to put it a different way, we would be 34.13% confident that a
randomly selected individual would be somewhere between the mean and +1 sd, or 2.28%
confident that a randomly selected individual would be more than +2 sd above the mean . . .

Note that the z-score number corresponds to the sd unit.
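These tail and interval areas can be recovered from the standard normal cumulative distribution; a Python sketch (not on the slides) using the error function:

```python
import math

def phi(z):
    """Cumulative standard-normal probability up to z-score z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Fraction of the data within +/- 1 sd of the mean:
print(round(phi(1) - phi(-1), 4))  # 0.6827 (the 68.26% on the slide)

# Fraction more than 2 sd above the mean:
print(round(1 - phi(2), 4))        # 0.0228 (the 2.28%)
```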
Now . . .to figure out where the confidence
limits actually come from in all those
epidemiology papers . . .
The “baby” data illustrates this fairly well . . .
            Sample1   Sample2   Sample3   Sample4
            Births    Births    Births    Births
Unmarried   35        29        33        41
Married     65        71        67        59
Total       100       100       100       100
If we randomly sampled 100 live births from all of the 4,111,059 live births in
the USA in 1991 we might find that 35 births were associated with unmarried
mothers. This would give a sample probability (statistic) of 35 unwed mothers /
100 live births = 0.35 - an estimate of the population probability (parameter)
that a birth is associated with an unmarried mother.
The sample probability (statistic) is not the correct probability for the entire
population, just the correct probability for the sample.
If we took 3 more (different) random samples from the same population, each of
100 live births, we would probably find a different probability that the birth is
associated with unwed mothers for each sample that was randomly selected; we
might get 29 / 100 = 0.29; 33 / 100 = 0.33; 41 / 100 = 0.41; and so on . . . and we
would never be 100% certain (confident) that any one sample probability would
exactly represent the population parameter.
We need some way to deal with this uncertainty so we construct confidence limits
or a confidence interval.
Marital status of samples of new mothers in the USA (1991)

            Sample1   Sample2   Sample3   Sample4
            Births    Births    Births    Births
Unmarried   35        29        33        41
Married     65        71        67        59
Total       100       100       100       100 …
If we could keep taking samples (of n = 100) and calculating probabilities
forever, we would end up with an infinite number of sample probabilities.
Sample probabilities close to the true population probability would appear
numerous times while those far away would appear less frequently; the most
frequently occurring sample probability (from the infinite number of samples)
would correspond to the population probability while the least frequent
probabilities would correspond to the extreme values (again, from the infinite
number of samples).
This infinite number of theoretical sample probabilities would obviously fit into
some kind of frequency distribution curve that is normally distributed. From this
theoretical Normal Distribution we can construct a confidence interval using
standard percentile scores (actually the same sd units called z-scores illustrated in
previous slides) which will then be related to just how confident we want to be;
95% confident? 90% confident? 99% confident? Just plug the sample values
you are interested in, and the z-score value that corresponds to your
chosen %-confidence level, into the formula and voilà: Confidence Intervals.
This is another figure of that same normal curve with z-scores and percentages; the
actual z-scores that correspond to 95% and 90% of the data have been added …
Just imagine that this curve illustrates the distribution of an infinite number of
probabilities calculated from the infinite number of samples (n = 100) randomly
selected from the same population.
We already have some idea where the middle of this “population curve” fits on a
number line because we have the (ONE) sample estimate of that point; we are just
not 100% confident that the sample statistic is exactly the same as the population
parameter. What we need to know is the range of possible values that the actual
population center-point might be within – so we calculate that range using the
above theoretical curve …
Marital status of a sample of new mothers in the USA (1991)

            Births   Probability
Unmarried   35       0.35
Married     65       0.65
Total       100

Confidence Interval - 95% (use z-score of 1.96):

0.35 ± 1.96 √(0.35 × 0.65 / 100) = 0.35 ± (1.96 √0.002275) = (0.257, 0.443)

Confidence Interval - 90% (use z-score of 1.645):

0.35 ± (1.645 √0.002275) = (0.272, 0.428)

*True population probability = 0.295 (1,213,854 / 4,111,059)
The confidence interval is simply the range of values, in a frequency distribution
of values from all possible samples of the same size, between which you might
expect to find the true population value (parameter); i.e. the sample statistic
predicts that the parameter is 0.35, but it is 90% probable the true parameter is
somewhere between 0.272 and 0.428, and 95% probable the parameter is between
0.257 and 0.443.
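The two intervals above come from the normal-approximation ("Wald") formula; a Python sketch that reproduces them (the function name is mine, not the slides'):

```python
import math

def wald_interval(p_hat, n, z):
    """Normal-approximation (Wald) confidence interval for a proportion:
    p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    se = math.sqrt(p_hat * (1 - p_hat) / n)  # standard error of the proportion
    return p_hat - z * se, p_hat + z * se

lo, hi = wald_interval(0.35, 100, 1.96)   # 95% CI
print(round(lo, 3), round(hi, 3))         # 0.257 0.443

lo, hi = wald_interval(0.35, 100, 1.645)  # 90% CI
print(round(lo, 3), round(hi, 3))         # 0.272 0.428
```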
These two graphs illustrate the previous calculations as well as the effect of
sample size on the "accuracy" of using the sample statistics to predict the
population parameter.

From the previous formula, the z-score values (1.96 or 1.645) describe the
confidence limits between which we will look for our predicted population
"value".

The term √((0.35 × 0.65) / 100) is the standard error of the sample
proportion – note that the sample n is part of the equation. The larger the
n, the narrower the interval (n = 1000 gives ~ 0.285 - 0.305 vs.
n = 100 gives ~ 0.3 - 0.4) in predicting the population value.
With smaller sample sizes, or with highly variable data, or with p ~ 0 or 1, it is
problematic to accurately predict the population value from the sample, so this
next formula (the Wilson score interval) is actually used a lot more:

[(2 × 100 × 0.35) + 1.96² ± 1.96 √(1.96² + (4 × 100 × 0.35 × 0.65))] / [2 (100 + 1.96²)]
= (0.264, 0.447)

[previous calculation: 0.35 ± 1.96 √0.002275 = (0.257, 0.443)]

True population probability = 0.295
*You will notice that all epidemiology publications give the confidence
intervals associated with each variable measured.

**And since computers do all the work nowadays, and can calculate exact
intervals based on the sampling distribution of P (the binomial
distribution), we don't have to bother with memorizing any of these formulas;
we just need an idea of what the formulas are actually calculating …
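The longer formula is the Wilson score interval; a Python sketch of the same arithmetic:

```python
import math

def wilson_interval(p_hat, n, z):
    """Wilson score interval for a proportion; better behaved than the
    Wald interval when n is small or p_hat is near 0 or 1."""
    x = n * p_hat                   # observed number of "successes"
    centre = 2 * x + z ** 2
    spread = z * math.sqrt(z ** 2 + 4 * x * (1 - p_hat))
    denom = 2 * (n + z ** 2)
    return (centre - spread) / denom, (centre + spread) / denom

lo, hi = wilson_interval(0.35, 100, 1.96)
print(round(lo, 3), round(hi, 3))   # 0.264 0.447
```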
Infant Mortality in the USA (1991)

            Deaths    Alive        Total
Unmarried   16,712    1,197,142    1,213,854
Married     18,784    2,878,421    2,897,205
Total       35,496    4,075,563    4,111,059
Getting back to the original population data, we can now extend the
probability concept. The previous examples used total population data without
looking at any of the sub-categories of data.
Remember:
P “probability” (D) “infant death” = 35,496 total deaths / 4,111,059 total live births
= 0.0086 (8.6 infant deaths / 1000 live births)
If we are interested in the probability of infant death for a birth associated with
the unmarried status of the mother, then we have to look at the conditional
probability of the outcome, producing a new formula:

P (A | B) = P ("infant death" A, conditional on "unmarried mother" B)

P (A | B) = f (A & B) / f (B)
= 16,712 / (16,712 + 1,197,142) = 16,712 / 1,213,854
= 0.014, or 14 deaths / 1000 births
An equivalent formula would be:

P (A | B) = P (A & B) / P (B) = [f (A & B) / total f] / [f (B) / total f]

P (A & B) = 16,712 / 4,111,059 = 0.0041
P (B) = 1,213,854 / 4,111,059 = 0.295
P (A | B) = 0.0041 / 0.295 = 0.014, or 14 deaths / 1000 births

Notice that the P (A | B) is simply the joint probability A & B (which =
incidence of unmarried-mother births that do not survive), divided by the
marginal probability B (which = incidence of unmarried-mother births).
Now, all we have to do is take this probability approach to the concepts
of relative risk, odds ratio, attributable risk, excess risk, and linear
regression of P relative to levels of exposure …

To do this we need to go back to those probability expressions:

P (D)    P (D̄)    P (E)    P (Ē)

and a brief review of the concepts of Joint Probabilities, Marginal
Probabilities, and Conditional Probabilities.
Let's now use a sample (n = 200) to illustrate these associations between
low-birthweight infants and marital status of the mother:

Birthweight   Unmarried   Married   Total
Low           7           7         14
Normal        52          134       186
Total         59          141       200

Joint Probability (P within population):
P (unmarried mother AND low birthweight) = 7 / 200 = 0.035
P (not unmarried AND not low birthweight) = 134 / 200 = 0.67

Marginal Probability (P within population):
P (low birthweight infant) = 14 / 200 = 0.07
P (not low birthweight infant) = 186 / 200 = 0.93

Conditional Probability (P within conditioned variable):
P (low birthweight | unmarried) = 7 / 59 = 0.119
P (low birthweight | not unmarried) = 7 / 141 = 0.050
Relative Risk is the ratio of 2 conditional probabilities; you simply take the
probability of the disease in question conditional on the presence of the risk
factor and divide that probability by the probability of disease conditional on
the absence of the risk factor.

RR = P (D | E) / P (D | Ē)

Birthweight   Unmarried   Married   Total
Low           7           7         14
Normal        52          134       186
Total         59          141       200

Using the infant data with low birthweight as the "disease" and unmarried
mother as the "risk":

RR = P (D | E) / P (D | Ē) = (7 / 59) / (7 / 141) = 0.118644 / 0.049645 = 2.39

Indicating there is a 2.39-fold increase in risk for low birthweight if the mother is
unmarried, relative to the risk of low birthweight if the mother is married.
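In code, the conditional-probability form of RR is a one-liner (a sketch, using the birthweight counts above):

```python
p_d_given_e = 7 / 59       # P(D | E): low birthweight given unmarried
p_d_given_not_e = 7 / 141  # P(D | E-bar): low birthweight given married

rr = p_d_given_e / p_d_given_not_e
print(round(rr, 2))        # 2.39
```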
Thune & Eiliv Lund, The Influence of Physical Activity on Lung-Cancer Risk:
A Prospective Study of 81,516 Men & Women. Int. J. Cancer 70: 57-62, 1997

RELATIVE RISK* FOR LUNG CANCER IN MALES

Occupational Physical Activity
              RR     95% CI
Sedentary     1.00   reference
Walk          1.15   0.90 - 1.47
Lifting       1.13   0.87 - 1.47
Heavy Labor   0.99   0.70 - 1.41
p (trend) = 0.71

Recreational Physical Activity
                   RR     95% CI
Sedentary          1.00   reference
Moderate           0.75   0.60 - 0.94
Regular Training   0.71   0.52 - 0.97
p (trend) = 0.01

*Adjusted for age, BMI, region, smoking habits (amount and duration)
RELATIVE RISK FOR LUNG CANCER IN MALE SMOKERS (>15 cig/day)

Recreational Physical Activity
                   RR     95% CI
Sedentary          1.00   reference
Moderate           0.77   0.52 - 0.96
Regular Exercise   0.59   0.35 - 0.97
p (trend) = 0.01
Odds Ratio is a slightly different concept; it compares the odds of D in the
exposed and unexposed subgroups.

OR = [P (D | E) / P (D̄ | E)] ÷ [P (D | Ē) / P (D̄ | Ē)]

Using the same data:

Birthweight   Unmarried   Married   Total
Low           7           7         14
Normal        52          134       186
Total         59          141       200

OR = [(7 / 59) / (52 / 59)] ÷ [(7 / 141) / (134 / 141)] = 0.13461 / 0.05223 = 2.58

Indicating that the odds of a low-birthweight infant for an unmarried mother are
2.58-fold greater than the odds of a low-birthweight infant if the mother is married.
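The ratio-of-odds form can be sketched the same way:

```python
# Odds of low birthweight (D) within each marital-status group
odds_given_e = (7 / 59) / (52 / 59)         # = 7/52, odds of D given E
odds_given_not_e = (7 / 141) / (134 / 141)  # = 7/134, odds of D given E-bar

odds_ratio = odds_given_e / odds_given_not_e
print(round(odds_ratio, 2))                 # 2.58
```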
Friedenreich, CM, Bryant, HE, and Courneya, KS, Case-Control Study of Lifetime
Physical Activity and Breast Cancer Risk. Am. J. Epidemiol. 154 (4): 336-347, 2001.

ODDS RATIOS FOR BREAST CANCER IN POSTMENOPAUSAL WOMEN

Lifetime Total Physical Activity   OR     95% CI
0 - 104 MET hours/week/year        1.00   reference
104 - 128                          0.74   0.56 - 0.98
128 - 160                          0.82   0.62 - 1.08
160+                               0.80   0.61 - 1.06
p (trend) = 0.04

ODDS RATIOS FOR BREAST CANCER RISK IN NULLIPAROUS WOMEN

Lifetime Total Physical Activity   OR     95% CI
0 - 104 MET hours/week/year        1.00   reference
104 - 128                          0.36   0.15 - 0.85
128 - 160                          0.88   0.36 - 2.13
160+                               0.34   0.12 - 0.94
p (trend) = 0.02
Attributable Risk is a different concept altogether; it relates to absolute
differences in risk rather than relative (or ratios of) risks.

AR = [P (D) − P (D | Ē)] / P (D)

Birthweight   Unmarried   Married   Total
Low           7           7         14
Normal        52          134       186
Total         59          141       200

AR = [(14 / 200) − (7 / 141)] / (14 / 200) = 0.29

Indicating that 29% of low birthweights in the population (ok, really in the
sample) are attributable to the marital status of the mother. Of course, being
unmarried does not cause low birthweights; rather, there are many factors
associated with being an unmarried mother that may be causal.
Excess Risk also is a concept relating to absolute differences in risk rather than
relative (or ratios of) risks.

ER = P (D | E) − P (D | Ē)

Birthweight   Unmarried   Married   Total
Low           7           7         14
Normal        52          134       186
Total         59          141       200

ER = P (D | E) − P (D | Ē) = (7 / 59) − (7 / 141) = 0.069

Indicating that there would be an increase of ~ 7% (in absolute terms) in low
birthweights in the population (ok, really in the sample) if the marital status of
the mother changed from married to unmarried. Of course, being unmarried does
not cause low birthweights; rather, there are many factors associated with being
an unmarried mother that may be causal.
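Attributable and excess risk follow the same pattern; a sketch using the same counts:

```python
p_d = 14 / 200             # P(D): overall probability of low birthweight
p_d_given_e = 7 / 59       # P(D | E): risk among unmarried mothers
p_d_given_not_e = 7 / 141  # P(D | E-bar): risk among married mothers

ar = (p_d - p_d_given_not_e) / p_d  # fraction of P(D) attributable to E
er = p_d_given_e - p_d_given_not_e  # absolute difference in risk

print(round(ar, 2))                 # 0.29
print(round(er, 3))                 # 0.069
```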
Regression Analysis (several different variations, but only two will be
illustrated) is used when one or more of the variables are stratified or
continuous in nature. These analyses can illustrate how risk (or
probability) for disease may change when the degree of exposure changes.
Linear model
Px
=
P (D | X = x) = a + bx
Plotting the probabilities of D conditional on exposure level X (at each
level x determined) on a graph produces a straight line with intercept = a
and slope = b (changes in P for each unit of x).
The intercept (a) illustrates the risk of D, as a probability, when exposure = 0.
The slope (b) illustrates the excess risk for each increase of one x unit in E
(whatever unit E was measured in) . . .
Logistic Regression Analysis (and multiple logistic regression analysis) is
used extensively in epidemiology research because associations between the
calculated probabilities and exposure variables as measured are rarely
perfectly linear.

log ( Px / (1 − Px) ) = log (odds for D | X = x) = a + bx

Plotting the log of the odds of D conditional on exposure level X (at each
level x determined) on a graph often produces a curved line with intercept =
a and slope = b (changes in P for each unit of x). Again, the intercept
illustrates risk with exposure = 0 and the slope is the change in the log of the
OR (for each change in the level of E).

Multiple logistic regression is used when there is more than one exposure
variable measured, and the log OR is a function that takes into account all
of the measured variables associated with risk.

(notice there were no formulas presented with which a and b are calculated)
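The slides give no coefficients, so the following Python sketch uses hypothetical values of a and b purely to show how the log-odds model converts an exposure level into a probability of disease:

```python
import math

a, b = -2.0, 0.5  # hypothetical intercept and slope, not from the slides

def p_of_disease(x):
    """Invert log(Px / (1 - Px)) = a + b*x to recover Px = P(D | X = x)."""
    log_odds = a + b * x
    return 1 / (1 + math.exp(-log_odds))

for x in (0, 2, 4):
    print(x, round(p_of_disease(x), 3))
# x = 0 gives the baseline risk, exp(a) / (1 + exp(a)) ~ 0.119;
# each additional unit of x raises the log odds by b = 0.5
```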
RR, OR, AR, ER, and Logistic Regression Analysis are all used in epidemiology
research to locate and characterize possible D:E associations. Confidence
limits are always calculated for each risk relationship to illustrate the potential
error in predicting population parameters from sample statistics.
OR, RR, ER, and AR are commonly used with binary data while logistic
regression analysis is used when one or more of the E variables are either
stratified or continuous in nature . . .