Evidence-Based Evaluation of Screening and Diagnostic Tests

Thomas B. Newman, MD, MPH
Andi Marmor, MD, MSEd
Outline
Overview and definitions
 Observational studies of screening
 Randomized trials of screening
 Conclusion – ecologic view

What is screening?

Common definition:
 “Testing to detect asymptomatic disease”

Better definition*:
 “Application of a test to detect a potential disease
or condition in people with no known signs or
symptoms of that disease or condition”
*Common screening tests. David M. Eddy, editor. Philadelphia, PA: American
College of Physicians, 1991
“Condition” includes a risk factor for a disease…
Screening Spectrum
Risk factor → Presymptomatic disease → Unrecognized symptomatic disease → Recognized symptomatic disease
Moving along the spectrum from risk factor toward recognized symptomatic disease:
 Fewer people recognized and treated
 Easier to demonstrate benefit
 Less potential for harm
Examples of Screening Along the
Spectrum

Risk factor for disease:
 Hypercholesterolemia, hypertension

Presymptomatic disease:
 Neonatal hypothyroidism, syphilis, HIV

Unrecognized symptomatic disease:
 Vision and hearing problems in young children;
iron deficiency anemia, depression

Somewhere in between?:
 Prostate cancer, breast carcinoma in situ, more
severe hypertension
Screening for risk factors

Relationship between risk factor, disease and
treatment difficult to establish
 Does test predict disease?
 Does treatment of risk factor reduce disease?
 Does treatment reduce risk factor? (eg: CAST)
Measures of test accuracy apply to disease that
is prevalent at the time the test is done
 With risk factors, trying to measure incidence of
disease over time
 Potential for harm greatest when screening for
risk factors!

Goals of Screening for
Presymptomatic Disease

Detect disease in earlier stage than would
be detected by symptoms
 Only possible if an early detectable phase is
present
 Only beneficial if earlier treatment is more
effective than later treatment

Do this without incurring harm to the
patient
 Net benefit must exceed net harm
 Long follow up and randomized trial may be
needed to prove this
Screening for Cancer

Natural history heterogeneous
 Screening test may pick up slower growing
or less aggressive cancers
 Not all patients diagnosed with cancer will
become symptomatic

Diagnosis is subjective
 There is no gold standard
“It’s just a simple blood test.”
How can screening
be bad???
Possible harms from screening
To all
 To those with negative results
 To those with positive results
 To those not tested

Public Health Threats from
Excessive Screening

“When your only tool is a hammer, you
tend to see every problem as a nail.”
Abraham Maslow
Interventions aimed at individuals are
overemphasized
 Biggest threats are public health threats
 Biggest gains in longevity have been
PUBLIC HEALTH interventions

Top Ten Countries’ Per Capita Healthcare Spending, 1997 ($)
[Bar chart, axis 0 to 5000: United States, Switzerland, Luxembourg, Germany, Canada, France, Iceland, Denmark, Netherlands, Norway]
Anderson GF and Poullier JP. Health Affairs 18;178-88 May/June 1999
Potential Years of Life Lost*/100,000 population, top 10 spending countries, 1995
[Bar chart, axis 0 to 10000, separate bars for male and female: United States, Switzerland, Luxembourg, Germany, Canada, France, Iceland, Denmark, Netherlands, Norway]
*Before age 70. From Anderson GF and Poullier JP. Health Affairs 18;178-88 May/June 1999
Economic and Political Forces
behind excessive screening
Companies selling machines to do the
test
 Companies selling the test itself
 Companies selling products to treat the
condition
 Managed care organizations
 Politicians who are (or want to appear)
sympathetic

[Image: ad by company that makes the machines]
[Image: ad for Frosted Flakes! (no cholesterol)]
[Image: ad sponsored by the company that makes interferon]
Screening as an Obligation
Schwartz, L. M. et al. JAMA 2004;291:71-78.
Cultural characteristics
 "We live in a wasteful, technology-driven, individualistic and death-denying culture." George Annas, New Engl J Med, 1995
E-mail Excerpt
PLEASE, PLEASE, PLEASE TELL ALL
YOUR FEMALE FRIENDS AND
RELATIVES TO INSIST ON A CA-125
BLOOD TEST EVERY YEAR AS PART OF
THEIR ANNUAL PHYSICAL EXAMS. Be
forewarned that their doctors might try to
talk them out of it, saying, "IT ISN'T
NECESSARY."
…Insist on the CA-125 BLOOD TEST; DO
NOT take "NO" for an answer!
Source: Funny Times. (1-888-Funnytimes x 476)
Evaluating Studies of Screening
Screening test → Detect disease early → Treat disease → Patient outcome
Evaluating Studies of Screening

Ideal Study:
 Randomized to screen/control
 Compares outcomes in ENTIRE screened group to
ENTIRE unscreened group

Observational studies
 Compare outcomes in screened patients vs
unscreened (not randomized)
 Among patients with disease, compare outcomes
among those dx by screening vs those dx by
symptoms
[Figure: study designs for evaluating screening]
 Randomized trial (R): screened vs not screened groups, each containing people with (D+) and without (D-) disease; compare survival from randomization
 Observational study: screened vs not screened groups (not randomized); compare survival from enrollment
 Patients with disease: diagnosed by screening vs diagnosed by symptoms; compare survival after diagnosis
Biases in Observational Studies
of Screening Tests
Volunteer bias
 Lead time bias
 Length bias
 Stage migration bias
 Pseudodisease

Volunteer Bias
People who volunteer for studies differ from
those who do not
 Examples

 HIP Mammography study:
○ Women who volunteered for mammography had lower
heart disease death rates
 Coronary drug project:
○ RCT of medications for secondary prevention of CAD
○ Men who took their medicine (drug or placebo!) had
half the mortality of men who didn't

Can occur in any non-randomized trial of
screening
Avoiding Volunteer Bias
Randomize patients to screened and
unscreened groups
 Control for factors which might be
associated with both receiving screening
AND the outcome

 eg: family history, level of health concern,
other health behaviors
Lead Time Bias (zero-time bias)
Screening identifies disease during a
latent period before it becomes
symptomatic
 If survival is measured from time of
diagnosis, screening will always appear to
improve survival even if treatment is ineffective
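The point can be checked with simple arithmetic. Below is a minimal Python sketch, assuming one hypothetical patient whose age at death is the same with or without screening (that is, treatment is ineffective). The ages and the function name are illustrative assumptions, not data from the lecture.

```python
def survival_from_diagnosis(age_at_diagnosis, age_at_death):
    """Years lived after diagnosis (the usual 'survival' statistic)."""
    return age_at_death - age_at_diagnosis


# Hypothetical natural history (assumed values, for illustration only).
AGE_DETECTABLE = 60  # disease first detectable by screening
AGE_SYMPTOMS = 64    # disease would otherwise be diagnosed by symptoms
AGE_DEATH = 67       # treatment assumed ineffective, so this never changes

unscreened = survival_from_diagnosis(AGE_SYMPTOMS, AGE_DEATH)
screened = survival_from_diagnosis(AGE_DETECTABLE, AGE_DEATH)
lead_time = AGE_SYMPTOMS - AGE_DETECTABLE

print("Survival after diagnosis, unscreened:", unscreened)
print("Survival after diagnosis, screened:", screened)
print("Apparent gain (all of it lead time):", screened - unscreened)
```

The patient dies at the same age in both scenarios; the only thing that moves is the starting point of the clock.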

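Lead Time Bias
[Figure: Contribution of lead time to survival measured from diagnosis. Timeline running from biological onset, to detectable by screening, to onset of symptoms, to death. The latent phase lies between "detectable by screening" and "onset of symptoms". Without screening, survival after diagnosis runs from onset of symptoms to death. When disease is detected by screening, survival after diagnosis also includes the lead time between detection and onset of symptoms.]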
Avoiding Lead Time Bias

Only present when survival from diagnosis
is compared between diseased persons
 Screened vs not screened
 Diagnosed by screening vs by symptoms

Avoiding lead time bias
 Measure survival from time of randomization
How Much Lead Time is Present?
Depends on relative lengths of latent phase
(LP) and screening interval (S)
 Screening interval shorter than LP:

 Maximum false increase in survival = LP
 Minimum = LP – S

Screening interval longer than LP:
 Max = LP
 Proportion of disease dx by screening = LP/S
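These bounds can be written as a small function. The following is a sketch only, using the slide's symbols LP and S and assuming a perfectly sensitive test, a fixed latent phase, and disease onsets spread evenly across the screening interval. The function name is hypothetical. Two returned values are not stated on the slide and follow from the same assumptions: a detected fraction of 1.0 when S is no longer than LP, and a minimum lead time of 0 when S is longer than LP.

```python
def lead_time_bounds(latent_phase, screening_interval):
    """Return (max_lead_time, min_lead_time, fraction_detected_by_screening).

    Assumes a perfectly sensitive test and a fixed latent phase (LP).
    """
    if latent_phase <= 0 or screening_interval <= 0:
        raise ValueError("durations must be positive")

    max_lead = latent_phase  # screen falls just as disease becomes detectable
    if screening_interval <= latent_phase:
        # Every case is caught by some screen; the worst case waits a full
        # interval S after becoming detectable.
        min_lead = latent_phase - screening_interval
        fraction_detected = 1.0
    else:
        # Some cases become symptomatic between screens (assumed result:
        # lead time among detected cases can be arbitrarily close to 0).
        min_lead = 0.0
        fraction_detected = latent_phase / screening_interval
    return max_lead, min_lead, fraction_detected


print(lead_time_bounds(4, 1))  # S shorter than LP
print(lead_time_bounds(2, 5))  # S longer than LP
```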
[Figure 1: Maximum and minimum lead time bias possible when screening interval (S) is shorter than latent phase (LP). Timeline from detectable by screening, to onset of symptoms, to death, with repeated screens at interval S. Max = LP; Min = LP – S]
[Figure 2: Maximum lead time bias possible when screening interval is longer than latent phase. Max = LP; proportion of disease diagnosed by screening: P = LP/S]
Length Bias (Different Natural
History Bias)

If disease is heterogeneous:
 Slowly progressive disease spends more time in
the presymptomatic phase
 Cases picked up by screening are disproportionately
those that are slowly developing

Higher proportion of less aggressive disease
in group detected by screening creates
appearance of reduced mortality even if
treatment is ineffective
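A rough way to see the size of this effect: a minimal Python sketch, assuming a single screen and assuming that the chance a case is in its presymptomatic phase on the day of the screen is proportional to how long that phase lasts. The two subtypes, their shares, and their durations are hypothetical values chosen for illustration.

```python
def mix_among_screen_detected(subtypes):
    """Share of each subtype among screen-detected cases.

    subtypes maps a name to (share of all cases, presymptomatic duration).
    Detection probability is assumed proportional to presymptomatic duration.
    """
    weights = {
        name: share * duration for name, (share, duration) in subtypes.items()
    }
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


# Hypothetical disease: half the cases are slow, half are fast.
subtypes = {
    "slow": (0.5, 4.0),  # 4 years presymptomatic
    "fast": (0.5, 1.0),  # 1 year presymptomatic
}
mix = mix_among_screen_detected(subtypes)

for name, share in mix.items():
    print(f"{name}: {share:.0%} of screen-detected cases")
```

Under these assumptions, slow cases are half of all disease but most of what the screen finds, so the screen-detected group has a better prognosis before any treatment is given.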
[Figure: cases plotted over TIME with two screening rounds (Screen 1, Screen 2), contrasting mortality when cancer is detected by screening with mortality when cancer is detected by symptoms]
Avoiding Length Bias
Only present when survival from diagnosis is
compared between diseased persons
 AND disease is heterogeneous
 Lead time bias usually present as well
 Avoiding length bias:

 Compare mortality in the ENTIRE screened group
to the ENTIRE unscreened group
Stage Migration Bias

Also called the "Will Rogers Phenomenon"
 "When the Okies left Oklahoma and moved to
California, they raised the average intelligence
level in both states."
Described by Feinstein and colleagues
(1985) as an explanation for lower stage-specific
survival in a 1954 cohort of patients
with lung cancer in comparison to a 1977
cohort
 New technologies resulted in more patients in
the 1977 group being diagnosed at a more
advanced stage of lung cancer
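The Will Rogers effect is easy to reproduce with a handful of numbers. The sketch below uses hypothetical survival times (in years) for seven patients. The only change between "old" and "new" staging is that one patient with intermediate prognosis is reclassified from early to late stage; nobody's actual survival changes.

```python
from statistics import mean

# Hypothetical survival times in years under the old staging test.
early_old = [10, 9, 8, 4]
late_old = [3, 2, 1]

# A more sensitive test finds occult spread in the patient who survives
# 4 years, so that patient migrates from early stage to late stage.
early_new = [10, 9, 8]
late_new = [4, 3, 2, 1]

print("Early stage mean survival:", mean(early_old), "->", mean(early_new))
print("Late stage mean survival: ", mean(late_old), "->", mean(late_new))
print(
    "Overall mean survival:    ",
    round(mean(early_old + late_old), 2),
    "->",
    round(mean(early_new + late_new), 2),
)
```

Stage-specific survival improves in both stages while overall survival is identical, which is why the overall comparison is the one to check.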

Stage Migration Bias
[Figure: assignment of cases to Stage 0 through Stage 4 under the old test and under the new test]
A Non-Cancer Example
“Infants in each of 3 birthweight strata
(VLBW, LBW and NBW) who are exposed to
Factor X have decreased mortality compared
with unexposed weight-matched infants”
 Is factor X beneficial?
 Maybe not! Factor X could be cigarette
smoking!

 Smoking moves otherwise healthy babies to
lower birthweight group, improving mortality in
each group
Other Examples Abound…

The more you look for disease, and the
more advanced the technology
 the higher the prevalence, the higher the
stage, and the better the (apparent)
outcome for the stage

Beware of stage migration in any
stratified analysis
 Check OVERALL survival in screened vs
unscreened group
Pseudodisease

A condition that looks just like the disease,
but never would have bothered the patient
 Type I: Indolent forms of disease which would
never cause symptoms
 Type II: Preclinical disease in people who will
die from another cause before disease presents

The Problem:
 Treating pseudodisease can only cause harm
Analogy to Double Gold Standard
Bias

Screening (test) result negative
 Clinical FU (first gold standard)

Screening (test) result positive
 Biopsy (2nd gold standard)

If pseudodisease exists
 Sensitivity (true positive rate) of screening
falsely increased
 Screening will also appear to prolong survival
among diseased individuals
Example: Mayo Lung Project
RCT of lung cancer screening
 9,211 male smokers randomized to two
study arms

 Intervention: CXR and sputum cytology every
4 months for 6 years (75% compliance)
 Usual care: recommendation to receive
same tests annually
*Marcus et al., JNCI 2000;92:1308-16
MLP Extended Follow-up Results

Among those with lung cancer, intervention
group had more cancers diagnosed at early
stage and better survival
Marcus et al., JNCI 2000;92:1308-16
MLP Extended Follow-up Results

Intervention group: slight increase in lung-cancer
mortality (P=0.09 by 1996)
Marcus et al., JNCI 2000;92:1308-16
What happened?

After 20 years of follow up, there was a
significant increase (29%) in the total
number of lung cancers in the screened
group
 Excess of tumors in early stage
 No decrease in late stage tumors

Overdiagnosis (pseudodisease)
Black WC. Overdiagnosis: an underrecognized cause of confusion and harm
in cancer screening. JNCI 2000;92:1280-1
Looking for Pseudodisease

Impossible to distinguish from successfully
treated asymptomatic disease in individual
patient
 Very few compelling stories describe patients' or
physicians' victories over pseudodisease…
Appreciate the varying natural history of
disease, and limits of diagnosis
 Clues to pseudodisease:

 Higher cumulative incidence of disease in screened
group
 No difference in overall mortality between screened
and unscreened groups
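Both clues are comparisons between the ENTIRE screened and unscreened groups, so they can be checked with two risk ratios. A minimal sketch follows. The counts are invented for illustration (the incidence counts are chosen only so that the ratio matches a 29% excess) and are not data from any trial; the function name is hypothetical.

```python
def risk_ratio(events_a, n_a, events_b, n_b):
    """Cumulative risk in group A divided by cumulative risk in group B."""
    return (events_a / n_a) / (events_b / n_b)


# Hypothetical trial arms (invented counts, not real data).
N_SCREENED = 10_000
N_CONTROL = 10_000
cases_screened, cases_control = 258, 200    # cancers diagnosed
deaths_screened, deaths_control = 150, 150  # deaths from any cause

incidence_rr = risk_ratio(cases_screened, N_SCREENED, cases_control, N_CONTROL)
mortality_rr = risk_ratio(deaths_screened, N_SCREENED, deaths_control, N_CONTROL)

print(f"Cumulative incidence ratio: {incidence_rr:.2f}")
print(f"Overall mortality ratio:    {mortality_rr:.2f}")

# More disease found with no change in deaths is the pattern that
# suggests pseudodisease rather than early cure.
suggests_pseudodisease = incidence_rr > 1 and mortality_rr >= 1
print("Pattern suggests pseudodisease:", suggests_pseudodisease)
```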
[Figure: how each bias makes the screened group appear to do better]
 Volunteer bias: screened group → better health behaviors → prolonged survival
 Lead time bias: early detection → earlier “zero time” → prolonged survival
 Length bias: early detection → slower growing tumor with better prognosis → higher cure rate
 Stage migration bias: early detection → lower stage assignment → higher cure rate
 Overdiagnosis: early detection → pseudodisease → higher cure rate
[Figure, repeated from earlier: study designs for evaluating screening]
 Observational study: screened vs not screened groups, each containing D+ and D-; compare survival from enrollment
 Patients with disease: diagnosed by screening vs diagnosed by symptoms; compare survival after diagnosis
 Randomized trial (R): screened vs not screened groups, each containing D+ and D-; compare survival from randomization
Issues with RCTs of Cancer
Screening
Quality of randomization
Cause-specific vs total mortality
Poor Quality Randomization
Edinburgh mammography trial
 Randomization by healthcare practice
 7 practices changed allocation status
 Highest SES

 26% of women in control group
 53% of women in screening group

26% reduction in cardiovascular
mortality in mammography group
Cause-Specific Mortality

Problems:
 Assignment of cause of death is subjective
 Screening or treatment may have important
effects on other causes of death

Bias introduced can make screening
appear better or worse!
Example

Meta-analysis of 40 RCTs of radiation
therapy for early breast cancer (N =
20,000)*
 Breast cancer mortality reduced (20-yr ARR
4.8%; P = .0001)
 BUT mortality from “other causes” increased
(20-yr ARR -4.3%; P = 0.003)

Were these additional deaths actually
due to screening?
*Early Breast Cancer Trialists Collaborative Group. Lancet
2000;355:1757
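Putting the two figures side by side shows why cause-specific mortality alone can mislead. The sketch below is a back-of-envelope calculation only: it assumes the two 20-year absolute risk reductions (ARRs) quoted above can simply be added to approximate the effect on total mortality, which ignores competing risks. The variable names are illustrative.

```python
# 20-year absolute risk reductions quoted on the slide, in percentage points.
ARR_BREAST_CANCER_PCT = 4.8   # fewer breast cancer deaths
ARR_OTHER_CAUSES_PCT = -4.3   # negative ARR = more deaths from other causes

# Assumption: effects are additive across causes of death.
net_arr_pct = ARR_BREAST_CANCER_PCT + ARR_OTHER_CAUSES_PCT
fraction_offset = -ARR_OTHER_CAUSES_PCT / ARR_BREAST_CANCER_PCT

print(f"Approximate net ARR in total mortality: {net_arr_pct:.1f} points")
print(f"Share of the benefit offset by other deaths: {fraction_offset:.0%}")
```

On these numbers, most of the cause-specific benefit disappears once deaths from other causes are counted.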
Biases in Cause-Specific Mortality

“Sticky diagnosis” bias:
 If cancer diagnosis made, deaths of unclear
cause more often attributed to cancer
 Effect: overestimates cancer mortality in
screened group

“Slippery linkage” bias:
 Linkage lost between death and cancer
diagnosis (eg: due to screening or treatment)
 Death less likely counted in cause specific
mortality
 Effect: underestimates cancer mortality in
screened group
The truth about total mortality
 Mortality from other causes generally exceeds screening or cancer-related mortality
 Effect on condition of interest more difficult to detect
 Total mortality more important for some screening tests than others…
Conclusions -1
Promotion of screening by entities with a
vested interest and public enthusiasm
for screening are challenges to EBM
 High-quality RCTs are needed
 Attention to study design, size of effect
and unmeasured costs

Conclusions - 2

Dysfunctional metaphors for health care *
 Military metaphor – battle disease, no cost too
high for victory, no room for uncertainty
 Market metaphor -- medicine as a business;
health care as a product; success measured
economically

Reframing of priorities is needed
*Annas G. Reframing the debate on health care reform by replacing our
metaphors. NEJM 1995;332:744-7
Reframing Priorities:
Ecology Metaphor
Sustainability
 Limited resources
 Interconnectedness
 More critical of technology
 Move away from domination, buying,
selling, exploiting
 Focus on the big picture

 Populations rather than individuals
 Causes rather than symptoms