Introduction to Biostatistics
for Clinical and Translational
Researchers
KUMC Departments of Biostatistics & Internal Medicine
University of Kansas Cancer Center
Course Information
 Jo A. Wick, PhD
 Office Location: 5028 Robinson
 Email: [email protected]
 Lectures are recorded and posted at
http://biostatistics.kumc.edu under ‘Events and
Lectures’
Inferences:
Hypothesis Testing
Last Week
Continuous outcome, compared between groups:

# Groups   Distribution            Independent Samples   Dependent Samples
2          Normal or large n       2-sample t            Paired t
2          Non-normal or small n   Wilcoxon Rank-Sum     Wilcoxon Signed-Rank
>2         Normal or large n       ANOVA                 2-way ANOVA
>2         Non-normal or small n   Kruskal-Wallis        Friedman’s
Today
 Yes/No or categorical outcome compared between
groups? Chi-square tests
 Time-to-event compared between groups?
Survival Analysis
 Association between two continuous outcomes?
Correlation
 What if we want to ‘adjust’ any of these for
additional factors? Regression Methods
Chi-Square Tests
Inferences on Proportions
 What do we do when we have nominal (categorical)
data on more than one factor?
 Gender and hair color
 Menopausal status and disease stage at diagnosis
 ‘Handedness’ and gender
 Tumor response and treatment
 Presence/absence of disease and exposure
 These types of tests are looking at whether two
categorical variables are independent of one another
(versus associated)—thus, tests of this type are often
referred to as chi-square tests of independence.
Inferences on Proportions
 Remember, this is essentially looking at the
association between two outcomes, where both
are categorical (nominal or ordinal).
 Assumptions?
 Rule of thumb: no expected cell frequency should be less than 5 (i.e.,
every expected count nπ ≥ 5)
 If this is not met, Fisher’s Exact test is appropriate.
Inferences on Proportions
 Example: Hair color and Gender
 Gender: x1 = {M, F}
 Hair Color: x2 = {Black, Brown, Blonde, Red}
          Black        Brown        Blonde      Red       Total
Male      32 (32%)     43 (43%)     16 (16%)    9 (9%)    100
Female    55 (27.5%)   65 (32.5%)   64 (32%)    16 (8%)   200
Total     87           108          80          25        N = 300
What the data should look like in the actual dataset:

Gender    Hair Color
Male      Black
Female    Red
Female    Blonde
Hair Color and Gender
 The researcher hypothesizes that hair color is not
independent of sex.
 H0: Hair color is independent of gender (i.e., the
phenotypic ratio is the same within each gender).
 H1: Hair color is not independent of gender (i.e.,
the phenotypic ratio is different between genders).
Hair Color and Gender
 Chi-square statistics compute deviations between
what is expected (under H0) and what is
actually observed in the data:
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
where O is the observed count and E is the expected count in each cell.
 DF = (r – 1)(c – 1), where r is the number of rows and c is the number of columns
Hair Color and Gender
 Does it appear that this type of sample could have
come from a population where the different hair
colors occur with the same frequency within each
gender?
 OR does it appear that the distribution of hair color
is different between men and women?
Hair Color and Gender
Critical value (DF = 3, α = 0.05): $\chi^2_3 = 7.815$
 Conclusion: Reject H0: Gender and Hair Color are
independent. It appears that the researcher’s
hypothesis that the population phenotypic ratio is
different between genders is correct (p = 0.029).
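The calculation behind this conclusion can be sketched in a few lines of standard-library Python. This is an illustrative sketch, not the software used in the lecture: the function names are made up for the example, and the p-value helper uses a closed form that is valid only for exactly 3 degrees of freedom.

```python
import math

# Observed counts from the hair color table
# (rows: Male, Female; columns: Black, Brown, Blonde, Red).
observed = [
    [32, 43, 16, 9],
    [55, 65, 64, 16],
]

def chi_square_independence(table):
    """Return (statistic, degrees of freedom) for a test of independence."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # E under H0
            stat += (obs - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

def chi_square_sf_df3(x):
    """Upper-tail probability of a chi-square variable with exactly 3 DF."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

stat, df = chi_square_independence(observed)
p_value = chi_square_sf_df3(stat)
print(f"chi-square = {stat:.3f}, DF = {df}, p = {p_value:.3f}")
```

The statistic exceeds the critical value of 7.815 for 3 degrees of freedom, which is why H0 is rejected at the 0.05 level.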
Inferences on Proportions
 Special case: when you have a 2×2 contingency
table, you are actually testing a hypothesis
concerning two population proportions: H0: π1 = π2
(i.e., the proportion of males who are blonde is the
same as the proportion of females who are
blonde).
          Blonde       Non-blonde    Total
Male      16 (16%)     84 (84%)      100
Female    64 (32%)     136 (68%)     200
Total     80 (26.7%)   220 (73.3%)   N = 300
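The equivalence between the 2×2 chi-square test and a test of two proportions can be checked numerically. The sketch below assumes a pooled two-proportion z statistic and a Pearson chi-square without continuity correction; the function names are illustrative only.

```python
import math

# 2x2 table from the slide: rows Male, Female; columns Blonde, Non-blonde.
table = [[16, 84], [64, 136]]

def two_proportion_z(x1, n1, x2, n2):
    """Pooled z statistic for H0: pi1 = pi2."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

def chi_square_2x2(t):
    """Pearson chi-square statistic for a 2x2 table (no continuity correction)."""
    (a, b), (c, d) = t
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

z = two_proportion_z(16, 100, 64, 200)
chi2 = chi_square_2x2(table)
print(f"z = {z:.3f}, z^2 = {z * z:.3f}, chi-square = {chi2:.3f}")
```

The squared z statistic and the chi-square statistic agree, so the two tests lead to the same two-sided conclusion.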
Inferences on Proportions
 When you have a single proportion and a
small sample, substitute the Binomial test, which
provides exact results.
 The nonparametric Fisher Exact test can
always be used in place of the chi-square test
when you have contingency table-like data (i.e.,
two categorical factors whose association is of
interest). It should be substituted for the chi-square
test of independence when ‘cell’ sizes are small.
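For a 2×2 table, Fisher's exact test can be sketched directly from the hypergeometric distribution. The example below is a minimal illustration with a hypothetical small table (not lecture data). It uses the common two-sided rule of summing the probabilities of all tables that are no more likely than the observed one.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(k):
        # Hypergeometric probability of k in the top-left cell, margins fixed.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    p_observed = prob(a)
    k_min, k_max = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(prob(k) for k in range(k_min, k_max + 1)
               if prob(k) <= p_observed * (1 + 1e-9))

# Hypothetical small table with several expected counts below 5.
p = fisher_exact_two_sided(3, 1, 1, 3)
print(f"Fisher exact two-sided p = {p:.4f}")
```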
Survival Analysis
Inferences on Time-to-Event
 Survival Analysis is the class of statistical
methods for studying the occurrence (categorical)
and timing (continuous) of events.
 The event could be
 development of a disease
 response to treatment
 relapse
 death
 Survival analysis methods are most often applied
to the study of deaths.
Inferences on Time-to-Event
 Survival Time: the time from a well-defined point
in time (time origin) to the occurrence of a given
event.
 Survival data includes:
 a time
 an event ‘status’
 any other relevant subject characteristics
Inferences on Time-to-Event
 In most clinical studies the length of study period is
fixed and the patients enter the study at different
times.
 Lost-to-follow-up patients’ survival times are measured
from the study entry until last contact (censored
observations).
 Patients still alive at the termination date will have
survival times equal to the time from the study entry until
study termination (censored observations).
 When there are no censored survival times, the set
is said to be complete.
Functions of Survival Time
 Let T = the length of time until a subject
experiences the event.
 The distribution of T can be described by several
functions:
 Survival Function: the probability that an individual
survives longer than some time, t:
S(t) = P(an individual survives longer than t)
= P(T > t)
Functions of Survival Time
 If there are no censored observations, the survival
function is estimated as the proportion of patients
surviving longer than time t:
$$\hat{S}(t) = \frac{\#\text{ of patients surviving longer than } t}{\text{total } \#\text{ of patients}}$$
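With complete (uncensored) data this estimate is a simple proportion. A minimal sketch, using hypothetical survival times in months:

```python
def empirical_survival(times, t):
    """Proportion of subjects whose survival time exceeds t (no censoring)."""
    return sum(1 for x in times if x > t) / len(times)

# Hypothetical complete survival times in months.
times = [3, 5, 8, 12, 12, 20, 24, 30]
for t in (0, 12, 24):
    print(f"S({t}) = {empirical_survival(times, t):.3f}")
```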
Functions of Survival Time
 Density Function: The survival time T has a
probability density function defined as the limit of
the probability that an individual experiences the
event in the short interval (t, t + Δt) per unit width
Δt:
$$f(t) = \lim_{\Delta t \to 0} \frac{P\left[\text{an individual dying in the interval } (t, t + \Delta t)\right]}{\Delta t}$$
Functions of Survival Time
 Hazard Function: The hazard function h(t) of
survival time T gives the conditional failure rate.
It is defined as the probability of failure during a
very small time interval, assuming the individual
has survived to the beginning of the interval:
$$h(t) = \lim_{\Delta t \to 0} \frac{P\left[\text{an individual of age } t \text{ fails in the time interval } (t, t + \Delta t)\right]}{\Delta t}$$
Functions of Survival Time
 The hazard is also known as the instantaneous
failure rate, force of mortality, conditional mortality
rate, or age-specific failure rate.
 The hazard at any time t corresponds to the risk of
event occurrence at time t:
 For example, a patient’s hazard for contracting influenza
is 0.015 with time measured in months.
 What does this mean? This patient would expect to
contract influenza 0.015 times over the course of a month
assuming the hazard stays constant.
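Under the constant-hazard assumption in this example, the survival function is exponential, S(t) = exp(−ht). The sketch below applies that relationship to the influenza hazard; the 12-month horizon is chosen only for illustration.

```python
import math

def survival_constant_hazard(hazard, t):
    """S(t) = exp(-hazard * t) when the hazard does not change over time."""
    return math.exp(-hazard * t)

hazard_per_month = 0.015  # influenza hazard from the example
expected_events_per_year = hazard_per_month * 12
prob_flu_free_one_year = survival_constant_hazard(hazard_per_month, 12)
print(f"expected events in 12 months: {expected_events_per_year:.2f}")
print(f"P(no influenza in 12 months): {prob_flu_free_one_year:.3f}")
```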
Functions of Survival Time
 If there are no censored observations, the hazard
function is estimated as the proportion of
patients dying in an interval per unit time, given
that they have survived to the beginning of the
interval:
$$\hat{h}(t) = \frac{\#\text{ of patients dying in the interval beginning at time } t}{(\#\text{ of patients surviving at } t) \times (\text{interval width})} = \frac{\#\text{ of patients dying per unit time in the interval}}{\#\text{ of patients surviving at } t}$$
Estimation of S(t)
 Product-Limit Estimates (Kaplan-Meier): most
widely used in biological and medical applications
 Life Table Analysis (actuarial method): appropriate
for large number of observations or if there are
many unique event times
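The product-limit idea can be sketched as follows: at each observed event time, multiply the running survival estimate by the fraction of at-risk subjects who do not have the event. The data below are hypothetical and the function name is illustrative. In practice a statistical package would be used.

```python
def kaplan_meier(times, events):
    """Return [(event_time, S_hat)] using the product-limit estimator.

    events[i] is 1 if the event was observed at times[i], 0 if censored.
    """
    survival = 1.0
    curve = []
    for t in sorted({tm for tm, ev in zip(times, events) if ev == 1}):
        at_risk = sum(1 for tm in times if tm >= t)
        deaths = sum(1 for tm, ev in zip(times, events) if tm == t and ev == 1)
        survival *= 1 - deaths / at_risk
        curve.append((t, survival))
    return curve

# Hypothetical follow-up times in months; 0 marks a censored observation.
times = [6, 7, 10, 15, 19, 25]
events = [1, 0, 1, 1, 0, 1]
for t, s in kaplan_meier(times, events):
    print(f"t = {t:>2}: S(t) = {s:.3f}")
```

Censored subjects contribute to the at-risk count up to their censoring time and then drop out without lowering the curve.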
Methods for Comparing S(t)
 If your question looks like: “Is the time-to-event
different in group A than in group B (or C . . . )?”
then you have several options, including:
 Log-rank Test: weights effects over the entire
observation equally—best when difference is constant
over time
 Weighted log-rank tests:
• Wilcoxon Test: gives higher weights to earlier effects—better for
detecting short-term differences in survival
• Tarone-Ware: a compromise between log-rank and Wilcoxon
• Peto-Prentice: gives higher weights to earlier events
• Fleming-Harrington: flexible weighting method
 Early? Late? Proportional?
[Figure: three panels of paired survival curves: difference is early and maintained; early difference that fades; difference appears late]
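A minimal sketch of the unweighted two-group log-rank statistic follows, assuming the usual hypergeometric expected counts and variance at each event time. The function name and the tiny data set are hypothetical. The weighted tests listed above would multiply each event time's contribution by a weight before summing.

```python
import math

def log_rank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank statistic and p-value (chi-square, 1 DF)."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    observed_a = expected_a = variance = 0.0
    for t in sorted({tm for tm, ev, _ in data if ev == 1}):
        n = sum(1 for tm, _, _ in data if tm >= t)               # at risk, both groups
        n_a = sum(1 for tm, _, g in data if tm >= t and g == 0)  # at risk, group A
        d = sum(1 for tm, ev, _ in data if tm == t and ev == 1)  # events at t
        d_a = sum(1 for tm, ev, g in data if tm == t and ev == 1 and g == 0)
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    stat = (observed_a - expected_a) ** 2 / variance
    p_value = math.erfc(math.sqrt(stat / 2))  # upper tail of chi-square with 1 DF
    return stat, p_value

# Hypothetical data: (times, event indicators) for each group, no censoring.
group_a = ([1, 3], [1, 1])
group_b = ([2, 4], [1, 1])
stat, p_value = log_rank_test(*group_a, *group_b)
print(f"log-rank chi-square = {stat:.4f}, p = {p_value:.4f}")
```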
Inferences for Time-to-Event
 Example: survival in squamous cell carcinoma
 A pilot study was conducted to compare
Accelerated Fractionation Radiation Therapy
versus Standard Fractionation Radiation Therapy
for patients with advanced unresectable squamous
cell carcinoma of the head and neck.
 The researchers are interested in exploring any
differences in survival between the patients
treated with Accelerated FRT and the patients
treated with Standard FRT.
Squamous Cell Carcinoma
                      AFRT         SFRT
Gender
  Male                28 (97%)     16 (100%)
  Female              1 (3%)       0
Age
  Median              61           65
  Range               30-71        43-78
Primary Site
  Larynx              3 (10%)      4 (25%)
  Oral Cavity         6 (21%)      1 (6%)
  Pharynx             20 (69%)     10 (63%)
  Salivary Glands     0            1 (6%)
Stage
  III                 4 (14%)      8 (50%)
  IV                  25 (86%)     8 (50%)
Tumor Stage
  T2                  3 (10%)      2 (12%)
  T3                  8 (28%)      7 (44%)
  T4                  18 (62%)     7 (44%)
Inferences for Time-to-Event
 H0: S1(t) = S2(t) for all t
 H1: S1(t) ≠ S2(t) for at least one t
[Figure: Overall Survival by Treatment. Survival probability (0.00 to 1.00) versus survival time (0 to 120 months) for AFRT and SFRT]
Squamous Cell Carcinoma
Median Survival Time:
AFRT: 18.38 months (2 censored)
SFRT: 13.19 months (5 censored)
Squamous Cell Carcinoma
Overall Survival by Treatment
Log-Rank test p-value = 0.5421
Squamous Cell Carcinoma
 Staging of disease is also prognostic for survival.
 Shouldn’t we consider the analysis of the survival
of these patients by stage as well as by treatment?
Squamous Cell Carcinoma
Overall Survival by Treatment and Stage
Median Survival Time:
AFRT Stage 3: 77.98 mo.
AFRT Stage 4: 16.21 mo.
SFRT Stage 3: 19.34 mo.
SFRT Stage 4: 8.82 mo.
Log-Rank test p-value = 0.0792
[Figure: survival probability (0.00 to 1.00) versus survival time (0 to 120 months) for AFRT/Stage 3, AFRT/Stage 4, SFRT/Stage 3, and SFRT/Stage 4]
Inferences on Time-to-Event
 Concerns a response that is both categorical
(event?) and continuous (time)
 There are several nonparametric methods that can
be used—choice should be based on whether you
anticipate a short-term or long-term benefit.
 Log-rank test is optimal when the survival curves
are approximately parallel.
 Weight functions should be chosen based on
clinical knowledge and should be pre-specified.
Publication Bias
From: Publication bias: evidence of delayed publication in a cohort study of clinical research projects BMJ 1997;315:640-645 (13 September)
Publication Bias
Table 4 Risk factors for time to publication using univariate Cox regression analysis
Characteristic           # not published   # published   Hazard ratio (95% CI)
Null                     29                23            1.00
Non-significant trend    16                4             0.39 (0.13 to 1.12)
Significant              47                99            2.32 (1.47 to 3.66)
From: Publication bias: evidence of delayed publication in a cohort study of clinical research projects BMJ 1997;315:640-645 (13 September)
Interpretation: Studies with significant results were published at roughly twice
the rate of studies with null results (hazard ratio 2.32).
Correlation
Linear Correlation
 Linear regression assumes the linear dependence
of one variable y (dependent) on a second variable
x (independent).
 Linear correlation also considers the linear
relationship between two continuous outcomes but
neither is assumed to be functionally dependent
upon the other.
 Interest is primarily in the strength of association, not in
describing the actual relationship.
Scatterplot
Correlation
 Pearson’s Correlation Coefficient is used to
quantify the strength.
$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}$$
 Note: If the sample size is small or the data are non-normal,
use the non-parametric Spearman’s coefficient.
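Pearson's coefficient can be computed directly from its definition. The sketch below uses hypothetical paired measurements and only the standard library.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient from sums of squares and cross-products."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired measurements.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(f"r = {r:.4f}")
```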
Correlation
[Figure: three scatterplots illustrating r > 0, r < 0, and r = 0]
Business Statistics, 4e, by Ken Black. © 2003 John Wiley & Sons.
Inferences on Correlation
 H0: ρ = 0 (no linear association) versus
 H1: ρ > 0 (positive linear relationship)
 or H1: ρ < 0 (negative linear relationship)
 or H1: ρ ≠ 0 (linear relationship)
 Test statistic: t (df = n – 2)
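The usual form of this test statistic is t = r·sqrt(n − 2)/sqrt(1 − r²), compared against a t distribution with n − 2 degrees of freedom. A short sketch with hypothetical values of r and n:

```python
import math

def correlation_t_statistic(r, n):
    """t statistic for H0: rho = 0, referred to a t distribution with n - 2 DF."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

# Hypothetical sample correlation and sample size.
r, n = 0.60, 27
t_stat = correlation_t_statistic(r, n)
print(f"t = {t_stat:.2f} on {n - 2} degrees of freedom")
```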
Correlation
Correlation
* Excluding France
Regression Methods
What about adjustments?
 There may be other predictors or explanatory
variables that you believe are related to the
response other than the actual factor (treatment) of
interest.
 Regression methods will allow you to incorporate
these factors into the test of a treatment effect:
 Logistic regression: when y is categorical and nominal
binary
 Multinomial logistic regression: when y is categorical
with more than 2 nominal categories
 Ordinal logistic regression: when y is categorical and
ordinal
What about adjustments?
 Regression methods will allow you to incorporate
these factors into the test of a treatment effect:
 Linear regression: when y is continuous and the factors
are a combination of categorical and continuous (or just
continuous)
 Two- and three-way ANOVA: when y is continuous and
the factors are all categorical
What about adjustments?
 Regression methods will allow you to incorporate
these factors into the test of a treatment effect:
 Cox regression: when y is a time-to-event outcome
Linear Regression
 The relationship between two variables may be
one of functional dependence—that is, the
magnitude of one of the variables (dependent) is
assumed to be determined by (dependent on)
the magnitude of the second (independent),
whereas the reverse is not true.
 Blood pressure and age
 Dependent does not equate to ‘caused by’
Linear Regression
 In its most basic form, linear regression is a
probabilistic model that accounts for unexplained
variation in the relationship between two variables:
y = Deterministic Component + Random Error
= (mx + b) + ε
= β0 + β1x + ε
 This model is referred to as simple linear
regression.
Simple Linear Regression
y = β0 + β1x + ε
y = response variable
x = explanatory variable
β0 = intercept
β1 = slope
ε = 'error'
[Figure: scatterplot of y versus x (both axes 0 to 10) with an example line labeled y = 0.78 + 0.89x + ε]
Arm Circumference and Height
 Data on anthropometric measures from a
random sample of 150 Nepali children up to 12
months old
 What is the relationship between average arm
circumference and height?
 Data:
 Arm circumference: x̄ = 12.4 cm, s = 1.5 cm, R = (7.3 cm, 15.6 cm)
 Height: x̄ = 61.6 cm, s = 6.3 cm, R = (40.9 cm, 73.3 cm)
Arm Circumference and Height
 Treat height as continuous when estimating the
relationship
 Linear regression is a potential option--it allows us
to associate a continuous outcome with a
continuous predictor via a linear relationship
 The line estimates the mean value of the outcome for
each continuous value of height in the sample used
 Makes a lot of sense, but only if a line reasonably
describes the relationship
Visualizing the Relationship
 Scatterplot
Visualizing the Relationship
 Does a line reasonably describe the general shape
of the relationship?
 We can estimate a line using a statistical software
package
 The line we estimate will be of the form:
ŷ = β0 + β1x
 Here, ŷ is the average arm circumference for a
group of children all of the same height, x
Arm Circumference and Height
 How do we interpret the estimated slope?
 The average change in arm circumference for a one-unit
(1 cm) increase in height
 The mean difference in arm circumference for two
groups of children who differ by one unit (1 cm) in height
 These results estimate that the mean difference in
arm circumferences for a one centimeter difference
in height is 0.16 cm, with taller children having
greater average arm circumference
Arm Circumference and Height
 What is the estimated mean difference in arm
circumference for children 60 cm versus 50 cm
tall?
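Because the slope is the mean difference per 1 cm of height, the answer is the slope multiplied by the height difference. A short sketch using the estimated slope of 0.16 cm reported above (the function name is illustrative):

```python
slope_cm_per_cm = 0.16  # estimated slope reported in the lecture

def mean_difference(slope, x_high, x_low):
    """Estimated difference in mean outcome between two predictor values."""
    return slope * (x_high - x_low)

diff = mean_difference(slope_cm_per_cm, 60, 50)
print(f"Estimated mean difference: {diff:.1f} cm")
```

Both heights lie inside the observed range of 40.9 cm to 73.3 cm, so the comparison does not extrapolate beyond the data.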
Arm Circumference and Height
 Our regression results only apply to the range of
observed data
Arm Circumference and Height
 How do we interpret the estimated intercept?
 The estimated y when x = 0--the estimated mean arm
circumference for children 0 cm tall.
 Does this make sense given our sample?
 Frequently, the scientific interpretation of the
intercept is meaningless.
 It is necessary for fully specifying the equation of a
line.
Arm Circumference and Height
 X = 0 isn’t even on the graph
Inferences using Linear
Regression
 H0: β1 = 0 (no relationship) versus
 H1: β1 > 0 (positive linear relationship)
 or H1: β1 < 0 (negative linear relationship)
 or H1: β1 ≠ 0 (linear relationship)
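 Test statistic: t (df = n – 2)
$$t = \frac{\hat{\beta}_1}{s_{\hat{\beta}_1}} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y}) \,/\, \sum (x_i - \bar{x})^2}{s \,/\, \sqrt{\sum (x_i - \bar{x})^2}}$$
where s is the residual standard deviation.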
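The slope estimate, its standard error, and the t statistic can be sketched from these sums. The data below are hypothetical and the function name is illustrative. A statistical package would report the same quantities along with the p-value.

```python
import math

def simple_linear_regression(x, y):
    """Least-squares intercept, slope, slope standard error, and t statistic."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    s = math.sqrt(sse / (n - 2))   # residual standard deviation
    se_slope = s / math.sqrt(sxx)
    return intercept, slope, se_slope, slope / se_slope

# Hypothetical data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1, se_b1, t_stat = simple_linear_regression(x, y)
print(f"intercept = {b0:.2f}, slope = {b1:.2f}, SE = {se_b1:.4f}, t = {t_stat:.4f}")
```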
Notes
 Linear regression performed with a single predictor
(one x) is called simple linear regression.
 Correlation is a measure of the strength of the linear
relationship between two continuous outcomes.
 Linear regression with more than one predictor is
called multiple linear regression.
y = β0 + β1x1 + β2x2 + … + βkxk + ε
Logistic Regression
 When you are interested in describing the relationship
between a dichotomous (categorical, nominal)
outcome and a predictor x, logistic regression is
appropriate.
$$\ln\left(\frac{\pi}{1 - \pi}\right) = \beta_0 + \beta_1 x + \varepsilon, \qquad \pi = \Pr(y = 1)$$
 Conceptually, the method is the same as linear
regression MINUS the assumption of y being
continuous.
Logistic Regression
 Interpretation of regression coefficients is not
straight-forward since they describe the
relationship between x and the log-odds of y = 1.
 We often use odds ratios to determine the
relationship between x and y.
Odds of Death
 A logistic regression model was used to describe
the relationship between treatment and death:
 Y = {died, alive}
 X = {intervention, standard of care}
$$\ln\left(\frac{\pi}{1 - \pi}\right) = \beta_0 + \beta_1 x + \varepsilon, \qquad \pi = \Pr(y = \text{death})$$
$$x = \begin{cases} 1 & \text{if intervention} \\ 2 & \text{if standard of care} \end{cases}$$
Odds of Death
 β1 was estimated to be 0.69. What does this
mean?
 If you exponentiate the estimate, you get the odds ratio
relating treatment to the probability of death!
 exp(0.69) = 0.5—when treatment involves the
intervention, the odds of dying decrease by 50% (relative
to standard of care).
 Notice the negative sign—also indicates a decrease in
the chances of death, but difficult to interpret without
transformation.
Death
 β1 was estimated to be 0.41. What does this
mean?
 If you exponentiate the estimate, you get the odds ratio
relating treatment to the probability of death!
 exp(0.41) = 1.5—when treatment involves the
intervention, the odds of dying increase by 50% (relative
to standard of care).
 Notice the positive sign—also indicates an increase in
the chances of death, but difficult to interpret without
transformation.
Logistic Regression
 What about when x is continuous?
 Suppose x is age and y is still representative of
death during the study period.
$$\ln\left(\frac{\pi}{1 - \pi}\right) = \beta_0 + \beta_1 x + \varepsilon, \qquad \pi = \Pr(y = \text{death})$$
x = baseline age in years
Death
 β1 was estimated to be 0.095. What does this
mean?
 If you exponentiate the estimate, you get the odds ratio
relating age to the probability of death!
 exp(0.095) = 1.1—for every one-year increase in age,
the odds of dying increase by 10%.
 Notice the positive sign, which also indicates an increase in the
chances of death but is difficult to interpret without
transformation.
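The three interpretations above all come from exponentiating the estimated coefficient. A short sketch, taking the first treatment coefficient as −0.69 (the value consistent with an odds ratio of 0.5); the helper names are illustrative:

```python
import math

def odds_ratio(beta):
    """Odds ratio for a one-unit increase in x."""
    return math.exp(beta)

def percent_change_in_odds(beta):
    """Percent change in the odds for a one-unit increase in x."""
    return (math.exp(beta) - 1) * 100

examples = [
    ("treatment, first example", -0.69),
    ("treatment, second example", 0.41),
    ("age, per year", 0.095),
]
for label, beta in examples:
    print(f"{label}: OR = {odds_ratio(beta):.2f}, "
          f"change in odds = {percent_change_in_odds(beta):+.0f}%")
```

Odds ratios multiply across units, so a 10-year age difference corresponds to exp(10 × 0.095) ≈ 2.6 times the odds, not a 100% increase.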
Multiple Logistic Regression
 In the same way that linear regression can
incorporate multiple x’s, logistic regression can
relate a categorical y response to several
independent variables.
 Interpretation of partial regression coefficients is
the same.
Cox Regression
 Cox regression and logistic regression are very
similar
 Both are trying to describe a yes/no outcome
 Cox regression also attempts to incorporate the timing
of the outcome in the modeling
Cox vs Logistic Regression
 Distinction between rate and proportion:
 Incidence (hazard) rate: number of “events” per
population at-risk per unit time (or mortality rate, if
outcome is death)
 Cumulative incidence: proportion of “events” that occur
in a given time period
Cox vs Logistic Regression
 Distinction between hazard ratio and odds ratio:
 Hazard ratio: ratio of incidence rates
 Odds ratio: ratio of odds, where the odds are p/(1 − p) for a proportion p
 Logistic regression aims to estimate the odds ratio
 Cox regression aims to estimate the hazard ratio
 By taking into account the timing of events, more
information is collected than just the binary yes/no.
Proportional Hazards
Assumption
 Early? Late? Proportional?
Treatment interacts with time!
[Figure: three panels of paired survival curves: difference is early and maintained; early difference that fades; difference appears late]
Cox Regression
 Cox Regression is what we call semiparametric
 Kaplan-Meier is nonparametric
 There are also parametric methods which assume the
distribution of survival times follows some type of
probability model (e.g., exponential)
 Can accommodate both discrete and continuous
measures of event times.
 Can accommodate multiple x’s.
 Easy to incorporate time-dependent covariates—
covariates that may change in value over the
course of the observation period
Time Dependent Covariates
 For example, evaluating the effect of taking oral
contraceptives (OCs) on stress fracture risk in
women athletes over two years—many women
switch on or off OCs .
 If you just examine risk by a woman’s OC-status at
baseline, can’t see much effect for OCs. But, you
can incorporate times of starting and stopping
OCs.
Incidence and
Prevalence
Incidence and Prevalence
 An incidence rate of a disease is a rate that is
measured over a period of time; e.g., 1/100
person-years.
 For a given time period, incidence is defined as:
Incidence = (# of newly diagnosed cases of disease) / (# of individuals at risk)
 Only those free of the disease at time t = 0 can be
included in numerator or denominator.
Incidence and Prevalence
 Prevalence is a proportion that is measured at a
snapshot in time (cross-sectional).
 At any given point, the prevalence is defined as
Prevalence = (# with the illness) / (# of individuals in population)
 The prevalence of a disease includes both new
incident cases and survivors with the illness.
Incidence and Prevalence
 Prevalence is equivalent to incidence multiplied by
the average duration of the disease.
 Hence, prevalence is greater than incidence if the
disease is long-lasting.
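The relationship between prevalence, incidence, and duration can be illustrated numerically. The sketch below uses the incidence rate of 1 per 100 person-years mentioned earlier and two hypothetical disease durations. It is a steady-state approximation that works best when the disease is uncommon.

```python
def approximate_prevalence(incidence_rate, mean_duration):
    """Steady-state approximation: prevalence ~ incidence rate x mean duration."""
    return incidence_rate * mean_duration

incidence = 1 / 100  # cases per person-year
for duration_years in (0.5, 5):  # hypothetical average disease durations
    prevalence = approximate_prevalence(incidence, duration_years)
    print(f"duration {duration_years} years: prevalence ~ {prevalence:.3f}")
```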
Measurement Error
 To this point, we have assumed that the outcome
of interest, x, can be measured perfectly.
 However, mismeasurement of outcomes is
common in the medical field due to fallible tests
and imprecise measurement tools.
Diagnostic Testing
Sensitivity and Specificity
 Sensitivity of a diagnostic test is the probability that
the test will be positive among people that have the
disease.
P(T+| D+) = TP/(TP + FN)
 Sensitivity provides no information about people that
do not have the disease.
 Specificity is the probability that the test will be
negative among people that are free of the disease.
Pr(T−|D−) = TN/(TN + FP)
 Specificity provides no information about people that
have the disease.
Prevalence = 30/100 = 0.30
SN = 24/30 = 0.80
SP = 56/70 = 0.80
[Figure: diseased and non-diseased individuals classified by positive and negative diagnosis]
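The sensitivity, specificity, and prevalence in this illustration can be recomputed from the four cell counts. The counts below are the ones implied by the fractions 24/30 and 56/70 in a population of 100; the function names are illustrative.

```python
def sensitivity(tp, fn):
    """P(T+ | D+) = TP / (TP + FN)."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """P(T- | D-) = TN / (TN + FP)."""
    return tn / (tn + fp)

# 30 diseased (24 test positive) and 70 non-diseased (56 test negative).
tp, fn = 24, 6
tn, fp = 56, 14
total = tp + fn + tn + fp

sn = sensitivity(tp, fn)
sp = specificity(tn, fp)
prevalence = (tp + fn) / total
print(f"SN = {sn:.2f}, SP = {sp:.2f}, prevalence = {prevalence:.2f}")
```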
A perfect diagnostic test has
SN = SP = 1
A 100% inaccurate diagnostic
test has SN = SP = 0
Sensitivity and Specificity
 Example: 100 HIV+ patients are given a new
diagnostic test for rapid diagnosis of HIV, and 80 of
these patients are correctly identified as HIV+.
What is the sensitivity of this new diagnostic test?
 Example: 500 HIV− patients are given a new
diagnostic test for rapid diagnosis of HIV, and 50 of
these patients are incorrectly specified as HIV+.
What is the specificity of this new diagnostic test?
(Hint: How many of these 500 patients are correctly
specified as HIV−?)
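Applying the definitions of sensitivity and specificity to the two examples gives the following worked answers:

```latex
\text{Sensitivity} = \frac{TP}{TP + FN} = \frac{80}{100} = 0.80
\qquad
\text{Specificity} = \frac{TN}{TN + FP} = \frac{500 - 50}{500} = \frac{450}{500} = 0.90
```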
Positive and Negative Predictive
Value
 Positive predictive value is the probability that a person
with a positive diagnosis actually has the disease.
Pr(D+|T+) = TP/(TP + FP)
 This is often what physicians want to know: a patient tests
positive for the disease; does this patient actually have
the disease?
 Negative predictive value is the probability that a
person with a negative test does not have the disease.
Pr(D-|T-) = TN/(TN + FN)
 This is often what physicians want to know: a patient tests
negative for the disease; is this patient truly disease
free?
PPV = 24/38 = 0.63
NPV = 56/62 = 0.90
[Figure: diseased and non-diseased individuals classified by positive and negative diagnosis]
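Unlike sensitivity and specificity, the predictive values depend on prevalence. The sketch below uses Bayes' rule to recover the PPV and NPV in this illustration from SN = SP = 0.80 and prevalence 0.30, then repeats the calculation at a hypothetical prevalence of 1% to show how the PPV changes.

```python
def predictive_values(sn, sp, prevalence):
    """PPV and NPV from sensitivity, specificity, and prevalence (Bayes' rule)."""
    ppv = sn * prevalence / (sn * prevalence + (1 - sp) * (1 - prevalence))
    npv = sp * (1 - prevalence) / (sp * (1 - prevalence) + (1 - sn) * prevalence)
    return ppv, npv

ppv, npv = predictive_values(0.80, 0.80, 0.30)
print(f"prevalence 0.30: PPV = {ppv:.2f}, NPV = {npv:.2f}")

# Hypothetical low-prevalence setting with the same test.
low_ppv, low_npv = predictive_values(0.80, 0.80, 0.01)
print(f"prevalence 0.01: PPV = {low_ppv:.3f}, NPV = {low_npv:.3f}")
```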
A perfect diagnostic test has
PPV = NPV = 1
A 100% inaccurate diagnostic
test has PPV = NPV = 0
PPV and NPV
 Example: 50 patients given a new diagnostic test
for rapid diagnosis of HIV test positive, and 25 of
these patients are actually HIV+.
What is the PPV of this new diagnostic test?
 Example: 200 patients given a new diagnostic test
for rapid diagnosis of HIV test negative, but 2 of
these patients are actually HIV+.
What is the NPV of this new diagnostic test? (Hint:
How many of these 200 patients testing negative for
HIV are truly HIV−?)
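Applying the definitions of positive and negative predictive value to the two examples gives the following worked answers:

```latex
\text{PPV} = \frac{TP}{TP + FP} = \frac{25}{50} = 0.50
\qquad
\text{NPV} = \frac{TN}{TN + FN} = \frac{200 - 2}{200} = \frac{198}{200} = 0.99
```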