Transcript Lecture5

Biostatistics
Lecture 5 (3/21 & 3/22/2016)
Chapter 6 Probability and
Diagnostic Tests - II
Outline
• 6.1 Operations on Events and Probability
• 6.2 Conditional Probability
• 6.3 Bayes’ Theorem
• 6.4 Diagnostic Tests
– 6.4.1 Sensitivity and Specificity
– 6.4.2 Application of Bayes’ Theorem
– 6.4.3 ROC Curves
– 6.4.4 Prevalence evaluation
• 6.5 The Relative Risk (RR) and the Odds Ratio (OR)
6.4.4 Prevalence
evaluation
• Prevalence (盛行率) or prevalence
proportion, in epidemiology, is the
proportion of a population found to have a
condition (typically a disease or a risk
factor such as smoking or seat-belt use).
• It is usually expressed as a fraction, as a
percentage or as the number of cases per
10,000 or 100,000 people.
Example #1
• A program was conducted to screen for HIV infection in mothers. (The purpose is to know whether a mother is infected or not.)
• One cannot simply test the mothers; they may not want to be tested at all.
• Instead, one can test their newborn babies to learn about infection in the mothers.
Example #1 – cont’d
• Since maternal antibodies cross the
placenta, the presence of antibodies in an
infant signals infection in the mother.
• No verification of the results is possible.
(No mother is tested!!!)
• Can the result from testing newborns really represent HIV prevalence in mothers?
Defining various events
• H : the event that a mother is infected
with HIV.
• H^C : the event that a mother is NOT infected with HIV.
• n : total number of infants tested
• n+ : number of infants with positive results
• T+ : the event for a positive test result for
an infant
• T- : the event for a negative test result for
an infant
Taking Manhattan as an example
• n = 50,364 infants were tested and n+ = 799 were positive, that is, n+/n = 799/50,364 ≈ 0.0159.
• In other words, P(T+) = 0.0159 (from infants).
• We want to know P(H) : the prevalence of infection in mothers.
• Is it true that P(H) = P(T+) = 0.0159?
A test is not perfect…
• If the screening test were perfect, then P(H) = P(T+) = 0.0159.
• We may have both true positive and false positive cases.
• Similarly, we may have both true negative and false negative cases.
Infants who tested positive came from two sources:
• Mother infected, and infant tested positive (true positive)
• Mother not infected, and infant tested positive (false positive)
\[
\begin{aligned}
P(T^+) &= P(T^+ \cap H) + P(T^+ \cap H^C) \\
&= P(T^+ \mid H)\,P(H) + P(T^+ \mid H^C)\,P(H^C) \\
&= P(T^+ \mid H)\,P(H) + P(T^+ \mid H^C)\,[1 - P(H)] \\
&= P(T^+ \mid H)\,P(H) + P(T^+ \mid H^C) - P(T^+ \mid H^C)\,P(H) \\
&= P(H)\,[P(T^+ \mid H) - P(T^+ \mid H^C)] + P(T^+ \mid H^C)
\end{aligned}
\]
\[
P(T^+) - P(T^+ \mid H^C) = P(H)\,[P(T^+ \mid H) - P(T^+ \mid H^C)]
\]
• Solving the previous equation for P(H) leads to:
\[
P(H) = \frac{P(T^+) - P(T^+ \mid H^C)}{P(T^+ \mid H) - P(T^+ \mid H^C)}
\]
• P(T+ | H) : the probability that an infant tests positive given that the mother is infected. This is the sensitivity of the test.
• P(T+ | H^C) = 1 - P(T- | H^C). The last term is the probability that an infant tests negative given that the mother is not infected. This is the specificity of the test.
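• Assuming that this test has 0.99 sensitivity and 0.998 specificity, P(T+ | H^C) = 1 - 0.998 = 0.002 and
\[
P(H) = \frac{0.0159 - 0.002}{0.99 - 0.002} = \frac{0.0139}{0.988} \approx 0.0141
\]
• The estimate is scaled down from 0.0159 to 0.0141.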
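The correction can be checked numerically. Below is a minimal Python sketch, assuming the rounded value P(T+) = 0.0159 used on the slide; the function name estimate_prevalence is hypothetical and not part of the lecture.

```python
def estimate_prevalence(p_test_positive, sensitivity, specificity):
    """Solve P(T+) = sens*P(H) + (1 - spec)*(1 - P(H)) for P(H)."""
    false_positive_rate = 1.0 - specificity  # P(T+ | H^C)
    return (p_test_positive - false_positive_rate) / (sensitivity - false_positive_rate)


# Manhattan figures from the slide: P(T+) rounded to 0.0159,
# sensitivity 0.99, specificity 0.998.
prevalence_manhattan = estimate_prevalence(0.0159, 0.99, 0.998)
print(round(prevalence_manhattan, 4))  # about 0.0141
```

Using the unrounded ratio 799/50,364 instead of 0.0159 gives a slightly smaller estimate, so the last digit depends on where the rounding is done.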
Consider a different region…
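• In the upstate urban region of New York, P(T+) = 0.0014.
• By using the same formula, we have:
\[
P(H) = \frac{0.0014 - 0.002}{0.99 - 0.002} = \frac{-0.0006}{0.988} \approx -0.0006
\]
• This, however, gives a negative prevalence, which makes no sense.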
A brief summary
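• Note that P(T+) = 0.0014 in the second case, which is very small compared with 0.0159 in the first case.
• It is even smaller than the false positive rate of the test, 1 - 0.998 = 0.002, so the formula returns a negative value.
• The testing procedure is not accurate enough to measure the very low prevalence of HIV in the second case.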
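One way to guard against the nonsensical result is to check, before applying the formula, that P(T+) lies between the false positive rate and the sensitivity. The sketch below is only an illustration; the function name prevalence_or_none and the choice to return None are assumptions, not part of the lecture.

```python
def prevalence_or_none(p_test_positive, sensitivity, specificity):
    """Return the corrected prevalence, or None when the observed P(T+)
    is outside the range the test can explain (assumes sensitivity > 1 - specificity)."""
    false_positive_rate = 1.0 - specificity
    if not (false_positive_rate <= p_test_positive <= sensitivity):
        return None  # the formula would give a value below 0 or above 1
    return (p_test_positive - false_positive_rate) / (sensitivity - false_positive_rate)


first_case = prevalence_or_none(0.0159, 0.99, 0.998)   # first region: a usable estimate
second_case = prevalence_or_none(0.0014, 0.99, 0.998)  # second region: None
print(first_case, second_case)
```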
6.5.1 The Relative Risk
• Relative risk (RR) is the risk of an event (or risk of developing a disease) relative to exposure (for example, exposure to second-hand smoke).
Cont’d
• Often useful to compare the
probabilities of disease in two
different groups or situations.
• Relative risk is the ratio of the probability of the event occurring in the exposed group to that in a non-exposed (often called a control) group.
The Relative Risk (RR)

                   Exposure (E)   No Exposure (~E)
Disease (D)        a              b
No Disease (~D)    c              d
Total              a+c            b+d

\[
RR = \frac{P(D \mid E)}{P(D \mid {\sim}E)} = \frac{a/(a+c)}{b/(b+d)}
\]
The numerator is the risk to the exposed; the denominator is the risk to the unexposed.
Example #2
• It has been proposed that women who first gave birth at an older age are more susceptible to breast cancer.
Example #2 – cont’d
• Two groups are considered:
– A woman is “exposed” (to danger) if she first gave birth at 25 or older. Out of 1,628 women in this group, 31 were diagnosed with cancer.
– A woman is “unexposed” if she first gave birth younger than 25. Out of 4,540 women in this group, 65 developed cancer.
                   Exposure (E)   No Exposure (~E)
Disease (D)        a = 31         b = 65
No Disease (~D)    c = 1,597      d = 4,475
Total              a+c = 1,628    b+d = 4,540

\[
RR = \frac{a/(a+c)}{b/(b+d)} = \frac{31/1628}{65/4540} \approx 1.33
\]
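The same calculation can be written as a short Python sketch. The function name relative_risk is hypothetical; the cell labels a, b, c, d follow the 2x2 table above.

```python
def relative_risk(a, b, c, d):
    """RR from a 2x2 table: a, c = diseased / non-diseased among the exposed;
    b, d = diseased / non-diseased among the unexposed."""
    risk_exposed = a / (a + c)
    risk_unexposed = b / (b + d)
    return risk_exposed / risk_unexposed


# Example #2: first birth at 25 or older (exposed) vs. younger than 25 (unexposed).
rr = relative_risk(a=31, b=65, c=1597, d=4475)
print(round(rr, 2))  # 1.33
```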
6.5.2 The Odds
(wiki)
• The odds in favor of an event is the ratio of the probability that the event will happen to the probability that the event will not happen.
• Often 'odds' are quoted as odds against, rather than as odds in favor.
The odds ‘in favor of’
• If an event takes place with probability p, the odds in favor of the event are p/(1-p) to 1. If p = 0.5, for example, the odds are 0.5/0.5 = 1 to 1. (Or we call it 50/50.)
• For example, if you chose a random day of the week (7 days), then the odds that you would choose a Sunday would be
\[
\frac{1/7}{1 - 1/7} = \frac{1/7}{6/7} = \frac{1}{6}
\]
Note that the probability of picking Sunday would be 1/7.
The odds ‘against’
• The odds against you choosing Sunday
are 6/1=6, meaning that it's 6 times more
likely that you don't choose Sunday.
• These 'odds' are actually relative
probabilities.
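The conversions between probability and odds can be sketched in Python with exact fractions. The function names below are hypothetical and only illustrate the definitions given above.

```python
from fractions import Fraction


def odds_in_favor(p):
    """Odds in favor of an event with probability p: p / (1 - p)."""
    return p / (1 - p)


def odds_against(p):
    """Odds against an event with probability p: (1 - p) / p."""
    return (1 - p) / p


def probability_from_odds(odds):
    """Invert odds in favor back to a probability: odds / (1 + odds)."""
    return odds / (1 + odds)


p_sunday = Fraction(1, 7)
favor = odds_in_favor(p_sunday)        # 1/6
against = odds_against(p_sunday)       # 6
back = probability_from_odds(favor)    # 1/7 again
print(favor, against, back)
```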
The Odds Ratio (OR)
• As the name suggests, the odds ratio (OR) is the ratio between two odds: the odds of a disease occurring in an exposed group divided by the odds of the disease occurring in an unexposed (control) group:
\[
OR = \frac{P(\text{disease} \mid \text{exposed}) / [1 - P(\text{disease} \mid \text{exposed})]}{P(\text{disease} \mid \text{unexposed}) / [1 - P(\text{disease} \mid \text{unexposed})]}
\]
Cont’d
• OR can also be defined as the odds of
exposure among diseased individuals
divided by the odds of exposure among
those who are not diseased, as
\[
OR = \frac{P(\text{exposure} \mid \text{diseased}) / [1 - P(\text{exposure} \mid \text{diseased})]}{P(\text{exposure} \mid \text{nondiseased}) / [1 - P(\text{exposure} \mid \text{nondiseased})]}
\]
• Odds ratio is also known as “relative
odds”.
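A short derivation suggests why the two definitions agree. It uses the 2x2 cell labels from the relative-risk table (a = diseased and exposed, b = diseased and unexposed, c = non-diseased and exposed, d = non-diseased and unexposed); the shorthand D, E, ~D, ~E is the same as in that table.

```latex
\[
\frac{P(D \mid E)/[1 - P(D \mid E)]}{P(D \mid {\sim}E)/[1 - P(D \mid {\sim}E)]}
= \frac{a/c}{b/d} = \frac{ad}{bc},
\qquad
\frac{P(E \mid D)/[1 - P(E \mid D)]}{P(E \mid {\sim}D)/[1 - P(E \mid {\sim}D)]}
= \frac{a/b}{c/d} = \frac{ad}{bc}.
\]
```

Both forms reduce to the cross-product ratio ad/(bc), so either conditioning gives the same OR.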
Example #3
• Among 989 women who had breast cancer,
273 had previously used oral contraceptives
(口服避孕藥) and 716 had not.
• Of 9,901 women who did not have breast
cancer, 2,641 had previously used oral
contraceptives and 7,260 had not.
• We’d like to know the OR of breast cancer for women who previously used oral contraceptives.
• In this case, ‘exposure’ represents previously
using oral contraceptives, ‘diseased’ means
having breast cancer and ‘nondiseased’ means
not having breast cancer.
\[
OR = \frac{P(\text{exposure} \mid \text{diseased}) / [1 - P(\text{exposure} \mid \text{diseased})]}{P(\text{exposure} \mid \text{nondiseased}) / [1 - P(\text{exposure} \mid \text{nondiseased})]}
= \frac{(273/989) / [1 - (273/989)]}{(2641/9901) / [1 - (2641/9901)]} \approx 1.0481
\]
Building a frequency table for these statistics

                         Cancer   No Cancer   Total
Oral contraceptives      273      2641        2914
No oral contraceptives   716      7260        7976
Total                    989      9901
• Rephrase the statistics:
– Among 2914 women who had previously used oral contraceptives, 273 had breast cancer.
– Among 7976 women who had not previously used oral contraceptives, 716 had breast cancer.
\[
OR = \frac{P(\text{disease} \mid \text{exposed}) / [1 - P(\text{disease} \mid \text{exposed})]}{P(\text{disease} \mid \text{unexposed}) / [1 - P(\text{disease} \mid \text{unexposed})]}
= \frac{(273/2914) / [1 - (273/2914)]}{(716/7976) / [1 - (716/7976)]} \approx 1.0481
\]
Conclusion
• Women who have used oral contraceptives have odds of developing breast cancer that are only 1.0481 times the odds of non-users.
• An OR this close to 1 is not a meaningful difference, so one cannot conclude that women using oral contraceptives are more susceptible to breast cancer.
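As a check on Example #3, the sketch below computes the OR three ways in Python: from the odds of exposure, from the odds of disease, and from the cross-product of the four cell counts. The variable and function names are hypothetical.

```python
def odds(p):
    """Odds in favor of an event with probability p."""
    return p / (1 - p)


# Cell counts from Example #3.
cancer_users, cancer_nonusers = 273, 716
healthy_users, healthy_nonusers = 2641, 7260

# Odds of exposure among diseased vs. non-diseased women.
or_exposure = odds(cancer_users / 989) / odds(healthy_users / 9901)

# Odds of disease among exposed vs. unexposed women.
or_disease = odds(cancer_users / 2914) / odds(cancer_nonusers / 7976)

# Cross-product form using the four cell counts directly.
or_cross = (cancer_users * healthy_nonusers) / (cancer_nonusers * healthy_users)

print(round(or_exposure, 4), round(or_disease, 4), round(or_cross, 4))
```

All three values agree up to floating-point rounding, which matches the 1.0481 obtained on both slides.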
Summary
• Some studies use relative risks (RRs) to
describe results; others use odds ratios
(ORs). Both are calculated from simple 2x2
tables. The question of which statistic to use
is subtle but very important.
• OR and RR are usually comparable in
magnitude when the disease studied is rare
(e.g., most cancers). However, an OR can
overestimate and magnify risk, especially
when the disease is more common (e.g.,
hypertension) and should be avoided in
such cases if RR can be used.
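The difference between a rare and a common disease can be illustrated with a small Python sketch. The counts below are hypothetical, chosen only so that RR = 2 in both tables; they do not come from the lecture or from any study.

```python
def relative_risk(a, b, c, d):
    """RR = [a/(a+c)] / [b/(b+d)], with a, c exposed and b, d unexposed."""
    return (a / (a + c)) / (b / (b + d))


def odds_ratio(a, b, c, d):
    """OR as the cross-product ratio ad / (bc)."""
    return (a * d) / (b * c)


# Hypothetical rare disease: 20 of 1,000 exposed vs. 10 of 1,000 unexposed.
rare = dict(a=20, b=10, c=980, d=990)
# Hypothetical common disease: 600 of 1,000 exposed vs. 300 of 1,000 unexposed.
common = dict(a=600, b=300, c=400, d=700)

rr_rare, or_rare = relative_risk(**rare), odds_ratio(**rare)
rr_common, or_common = relative_risk(**common), odds_ratio(**common)

print(rr_rare, round(or_rare, 4))      # RR 2.0, OR close to 2
print(rr_common, round(or_common, 4))  # RR 2.0, OR 3.5
```

With these made-up counts the OR stays near the RR when the disease is rare but is much larger than the RR when the disease is common.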
Reminder
• Next Tuesday (3/29/2016) we will
have the first mid-term exam.
• This test covers up to Chapter 6
(inclusive).
• A closed-book test (using hand calculator, Excel or MATLAB only).
Exercise problem - 1
• Suppose that you just came back from a winter vacation in Fukushima (日本福島縣). Because of the threat of a radiation leak (輻射外洩) from the nuclear power plant damaged by the recent tsunami (海嘯), you thought it would be safer to have a medical exam in a hospital.
• After the check, the doctor told you that your test for radiation is positive.
Exercise problem - 1
• The radiation test you received has the following properties:
– (1) Among every 1,000 people who were irradiated, 983 test positive and 17 test negative.
– (2) Among every 1,000 healthy people (who were not irradiated), 975 test negative and 25 test positive.
– (3) Among every 1,000 tourists who visited the same area during the same period of time, only one would be irradiated, from a statistical point of view.
Cont’d
• Based on the provided information, estimate the probability that you were actually irradiated. [Write down your variable definitions, formula, and computation steps in detail.]
Solution
• Let “D+” denote people who were irradiated (diseased) and “D-” those who were not (non-diseased).
• Let “+” denote a positive test result and “-” a negative one.
• The 1st statement says “Among every 1,000 people who were irradiated, 983 test positive and 17 test negative”.
• This means P(+|D+) = 0.983 and P(-|D+) = 0.017.
• The 2nd statement says “Among every 1,000 healthy people (who were not irradiated), 975 test negative and 25 test positive.”
• This means P(-|D-) = 0.975 and P(+|D-) = 0.025.
• The 3rd statement says “Among every 1,000 tourists who visited the same area during the same period of time, only one would be irradiated”.
• This means P(D+) = 0.001 and P(D-) = 0.999.
(1) P(+|D+) = 0.983 and P(-|D+) = 0.017.
(2) P(-|D-) = 0.975 and P(+|D-) = 0.025.
(3) P(D+) = 0.001 and P(D-) = 0.999.
• We wish to know P(D+|+), which by Bayes’ theorem is given by the formula below (note that D- and D+ are mutually exclusive).
\[
P(D^+ \mid +) = \frac{P(+ \mid D^+)\,P(D^+)}{P(+ \mid D^+)\,P(D^+) + P(+ \mid D^-)\,P(D^-)}
\]
\[
= \frac{0.983 \times 0.001}{0.983 \times 0.001 + 0.025 \times 0.999}
= \frac{983}{983 + 25 \times 999}
= \frac{983}{983 + 24975}
= \frac{983}{25958}
\approx 0.03787
\]
Exercise problem - 2
• Continuing from the previous question, assume that you were worried about the test accuracy and decided to take the same test a second time (following your first positive test).
• The result, unfortunately, was positive again. Now, please estimate the probability that you were actually irradiated, given that you have tested positive 2 consecutive times.
Solution
• Let “++” denote testing positive on both the first and the second test. We wish to know P(D+|++), which is given by the formula below (note that D- and D+ are still mutually exclusive). Assuming the two test results are independent given the true status, P(++|D+) = 0.983^2 and P(++|D-) = 0.025^2.
\[
P(D^+ \mid ++) = \frac{P(++ \mid D^+)\,P(D^+)}{P(++ \mid D^+)\,P(D^+) + P(++ \mid D^-)\,P(D^-)}
\]
\[
= \frac{(0.983)^2 \times 0.001}{(0.983)^2 \times 0.001 + (0.025)^2 \times 0.999}
= \frac{983^2}{983^2 + 25^2 \times 999}
= \frac{966289}{966289 + 624375}
= \frac{966289}{1590664}
\approx 0.60748
\]
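An equivalent way to reach the same number is to apply Bayes’ theorem twice, using the posterior after the first test as the prior for the second. The sketch below assumes, as the formula with squared terms does, that the two test results are independent given the true status; the function name update_on_positive is hypothetical.

```python
def update_on_positive(prior, sensitivity, specificity):
    """One Bayes update of P(D+) after a positive test result."""
    true_positive = sensitivity * prior
    false_positive = (1.0 - specificity) * (1.0 - prior)
    return true_positive / (true_positive + false_positive)


sens, spec, prior = 0.983, 0.975, 0.001

after_first = update_on_positive(prior, sens, spec)         # about 0.03787
after_second = update_on_positive(after_first, sens, spec)  # about 0.60748

# Direct formula with squared likelihoods, for comparison.
direct = (sens**2 * prior) / (sens**2 * prior + (1 - spec)**2 * (1 - prior))

print(round(after_first, 5), round(after_second, 5), round(direct, 5))
```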
Comments
• Recall that the test has a very good sensitivity of 0.983 and a very good specificity of 0.975.
• The second positive test result greatly raised the probability of having been irradiated, from 0.03787 to 0.60748.
• The increase would not be as large if the sensitivity or specificity were not that good (as in our quiz #2 problem).
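To see how much the accuracy of the test matters, the sketch below repeats the two-test calculation with a weaker test. The values 0.90 and 0.90 are hypothetical and chosen only for illustration; they are not the figures from the quiz mentioned above.

```python
def update_on_positive(prior, sensitivity, specificity):
    """One Bayes update of P(D+) after a positive test result."""
    true_positive = sensitivity * prior
    false_positive = (1.0 - specificity) * (1.0 - prior)
    return true_positive / (true_positive + false_positive)


def after_two_positives(prior, sensitivity, specificity):
    """Posterior after two positive results, assuming the results are
    independent given the true status."""
    first = update_on_positive(prior, sensitivity, specificity)
    return update_on_positive(first, sensitivity, specificity)


prior = 0.001
good_test = after_two_positives(prior, 0.983, 0.975)   # the test in the exercise
weaker_test = after_two_positives(prior, 0.90, 0.90)   # hypothetical weaker test

print(round(good_test, 5), round(weaker_test, 5))
```

With the hypothetical weaker test, two positive results still leave the probability well below one half, while the test from the exercise raises it above 0.6.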