Evidence-Based Medicine:
Effective Use of the Medical
Literature
Edward G. Hamaty Jr., D.O. FACCP, FACOI
Appraising Diagnosis Articles
Diagnosis
•
A diagnosis study is a prospective study with independent, blind comparison.
•
Diagnosis research design is different from the other types of research design
discussed in this module and is not represented in the levels of evidence pyramid.
Diagnosis research design involves the comparison of two or more diagnostic tests
that are both applied to the same study population. One of the diagnostic tests
applied to the study population is the reference standard, or “gold” standard; this
tool acts as the standard of test sensitivity and specificity against which the other
test is compared. Sensitivity and specificity are two measures that describe the
efficacy of a diagnostic tool in comparison to the reference standard diagnostic
tool.
•
Sensitivity is the proportion of people with the target disorder who have a positive
test result.
•
Specificity is the proportion of people without the target disorder who have a
negative test result.
•
To alleviate bias in diagnosis research, the reference standard and the test in
question are applied independently and the researchers interpreting the results
are blinded to the results of the other diagnostic test.
•
Diagnosis research design is also used to evaluate screening tools.
Diagnosis
Is the Study Valid?
• 1) Was there a clearly defined question?
• What question has the research been
designed to answer? Was the question
focused in terms of the population group
studied, the target disorder and the test(s)
considered?
Is the Study Valid?
• 2) Was the presence or absence of the target disorder confirmed
with a validated test ('gold' or reference standard)?
• How did the investigators know whether or not a patient in the
study really had the disease?
• To do this, they will have needed some reference standard test (or
series of tests) which they know 'always' tells the truth. You need to
consider whether the reference standard used is sufficiently
accurate.
• Were the reference standard and the diagnostic test interpreted
blind and independently of each other?
• If the study investigators know the result of the reference standard
test, this might influence their interpretation of the diagnostic test
and vice versa.
Is the Study Valid?
• 3) Was the test evaluated on an appropriate spectrum of patients?
• A test may perform differently depending upon the sort of patients
on whom it is carried out. A test is going to perform better in terms
of detecting people with disease if it is used on people in whom the
disease is more severe or advanced.
• Similarly, the test will produce more false positive results if it is
carried out on patients with other diseases that might mimic the
disease that is being tested for.
• The issue to consider when appraising a paper is whether the test
was evaluated on the typical sort of patients on whom the test
would be carried out in real life.
Is the Study Valid?
• 4) Was the reference standard applied to all patients?
• Ideally, both the test being evaluated and the reference
standard should be carried out on all patients in the
study. For example, if the test under investigation
proves positive, there may be a temptation not to
bother administering the reference standard test.
• Therefore, when reading the paper you need to find
out whether the reference standard was applied to all
patients. If it wasn't, look at what steps the
investigators took to find out what the 'truth' was in
patients who did not have the reference test.
Is the Study Valid?
• Is it clear how the test was carried out?
• To be able to apply the results of the study to
your own clinical practice, you need to be
confident that the test is performed in the
same way in your setting as it was in the
study.
Is the Study Valid?
• Is the test result reproducible?
• This is essentially asking whether you get the same
result if different people carry out the test, or if the
test is carried out at different times on the same
person.
• Many studies will assess this by having different
observers perform the test, and measuring the
agreement between them by means of a kappa
statistic. The kappa statistic takes into account the
amount of agreement that you would expect by
chance. If agreement between observers is poor, then
the test is not useful.
Is the Study Valid?
• Κ=1 implies perfect agreement and Κ=0 suggests that
the agreement is no better than that which would be
obtained by chance.
• There are no objective criteria for judging intermediate
values.
• However, kappa is often judged as providing
agreement which is:
• Poor if k ≤ 0.2
• Fair if 0.21 ≤ k ≤ 0.40
• Moderate if 0.41 ≤ k ≤ 0.60
• Substantial if 0.61 ≤ k ≤ 0.80
• Good if k > 0.80
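The chance correction described above can be sketched in a few lines. This is a minimal illustration for two observers and a positive/negative reading, not a validated implementation; the function names and the patient counts are assumptions made up for the example, and the labels simply mirror the conventional cut-offs listed above.

```python
def cohen_kappa(both_pos, a_only, b_only, both_neg):
    """Cohen's kappa for two observers who each rate every patient + or -.

    both_pos / both_neg: patients on whom the observers agree.
    a_only / b_only: patients called positive by only observer A or only B.
    """
    n = both_pos + a_only + b_only + both_neg
    observed = (both_pos + both_neg) / n
    # Agreement expected by chance, from each observer's rate of positive calls.
    a_rate = (both_pos + a_only) / n
    b_rate = (both_pos + b_only) / n
    expected = a_rate * b_rate + (1 - a_rate) * (1 - b_rate)
    return (observed - expected) / (1 - expected)


def agreement_label(kappa):
    """Conventional labels quoted in the slides (not objective criteria)."""
    if kappa <= 0.20:
        return "poor"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "good"


# Hypothetical counts: 100 patients read by two observers.
kappa = cohen_kappa(both_pos=35, a_only=10, b_only=5, both_neg=50)
print(f"kappa = {kappa:.2f} ({agreement_label(kappa)})")
```

Here the observers agree on 85% of patients, but 51% agreement would be expected by chance alone, so kappa is noticeably lower than the raw agreement.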
Is the Study Valid?
• The extent to which the test result is
reproducible may depend upon how explicit
the guidance is for how the test should be
carried out.
• It may also depend upon the experience and
expertise of the observer.
Appraising diagnostic tests
1. Are the results valid?
2. What are the results?
3. Will they help me look
after my patients?
Basic design of diagnostic accuracy study
Series of patients
Index test
Reference (“gold”) standard
Blinded cross-classification
Validity of diagnostic studies
1. Was an appropriate spectrum of patients included?
2. Were all patients subjected to the gold standard?
3. Was there an independent, blind or objective
comparison with the gold standard?
1. Was an appropriate spectrum of patients
included? Spectrum bias
Selected Patients
Index test
Reference standard
Blinded cross-classification
1. Was an appropriate spectrum of
patients included? Spectrum bias
• You want to find out how good chest X rays
are for diagnosing pneumonia in the
Emergency Department
• Best = all patients presenting with difficulty
breathing get a chest X-ray
• Spectrum bias = only those patients in whom
you really suspect pneumonia get a chest X
ray
2. Were all patients subjected to the gold
standard? Verification (work-up) bias
Series of patients
Index test
Reference standard
Blinded cross-classification
2. Were all patients subjected to the gold
standard? Verification (work-up) bias
• You want to find out how good is exercise ECG
(“treadmill test”) for identifying patients with
angina
• The gold standard is angiography
• Best = all patients get angiography
• Verification (work-up bias) = only patients who
have a positive exercise ECG get angiography
3. Was there an independent, blind or objective
comparison with the gold standard? Observer bias
Series of patients
Index test
Reference standard
Unblinded cross-classification
3. Was there an independent, blind or objective
comparison with the gold standard? Observer bias
• You want to find out how good is exercise ECG
(“treadmill test”) for identifying patients with
angina
• All patients get the gold standard
(angiography)
• Observer bias = the Cardiologist who does
the angiography knows what the exercise ECG
showed (not blinded)
Incorporation Bias
Series of patients
Index test
Reference standard….. includes parts of Index test
Unblinded cross-classification
Differential Reference Bias
Series of patients
Index test
Ref. Std A
Ref. Std. B
Blinded cross-classification
Validity of diagnostic studies
1. Was an appropriate spectrum of patients included?
2. Were all patients subjected to the Gold Standard?
3. Was there an independent, blind or objective
comparison with the Gold Standard?
DOR (Diagnostic Odds Ratio)
Another measure for the diagnostic accuracy of a test
is the diagnostic odds ratio (DOR), the odds for a
positive test result in diseased persons relative to the
odds of a positive result in non-diseased persons.
The DOR is a single statistic of the results in a 2 x 2
table, incorporating sensitivity as well as specificity.
Expressed in terms of sensitivity
and specificity the formula is:
DOR = [Sensitivity/(1 – Sensitivity)] / [(1 – Specificity)/Specificity]
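As a rough check of the definition, the DOR can be computed either from sensitivity and specificity or directly from the four cells of a 2 x 2 table. The sketch below assumes the usual cell labels (a = true positives, b = false positives, c = false negatives, d = true negatives); the function names and the counts are illustrative assumptions.

```python
def diagnostic_odds_ratio(sensitivity, specificity):
    """Odds of a positive test in diseased people divided by the odds of a
    positive test in non-diseased people."""
    odds_positive_if_diseased = sensitivity / (1 - sensitivity)
    odds_positive_if_healthy = (1 - specificity) / specificity
    return odds_positive_if_diseased / odds_positive_if_healthy


def diagnostic_odds_ratio_from_counts(a, b, c, d):
    """Same quantity from 2 x 2 counts: (a / c) / (b / d) = a*d / (b*c)."""
    return (a * d) / (b * c)


# Illustrative counts (assumed): 99 true positives, 10 false positives,
# 1 false negative, 90 true negatives.
from_counts = diagnostic_odds_ratio_from_counts(99, 10, 1, 90)
from_rates = diagnostic_odds_ratio(99 / 100, 90 / 100)
print(round(from_counts), round(from_rates))
```

Both routes give the same value, which is why the DOR is described as a single statistic of the 2 x 2 table. It is undefined when any of the divisors is zero (for example, a test with no false negatives).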
Are the Results Important?
• What is meant by test accuracy?
• a The test can correctly detect disease that is present
(a true positive result).
• b The test can detect disease when it is really absent (a
false positive result).
• c The test can incorrectly identify someone as being
free of a disease when it is present (a false negative
result).
• d The test can correctly identify that someone does
not have a disease (a true negative result).
• Ideally, we would like a test which produces a high
proportion of a and d and a low proportion of b and c.
Are the Results Important?
Sensitivity and specificity
• Sensitivity is the proportion of people with
disease who have a positive test. (True
positive rate)
• Specificity is the proportion of people free of
a disease who have a negative test. (True
negative rate)
Are the Results Important?
Appraising diagnostic tests
1. Are the results valid?
2. What are the results?
3. Will they help me look
after my patients?
Sensitivity, specificity, positive
& negative predictive values,
likelihood ratios
…aaarrrg
gh!!
2 by 2 table
              Disease +               Disease -
Test +        a (true positives)      b (false positives)
Test -        c (false negatives)     d (true negatives)
2 by 2 table: sensitivity
              Disease +
Test +        a
Test -        c
Sensitivity = a / (a + c)
Proportion of people with the disease who have a positive test result.
…a highly sensitive test will not miss many people
2 by 2 table: sensitivity
              Disease +
Test +        99
Test -        1
Sensitivity = a / (a + c)
Sensitivity = 99/100 = 99%
2 by 2 table: specificity
              Disease -
Test +        b
Test -        d
Specificity = d / (b + d)
Proportion of people without the disease who have a negative test result.
….a highly specific test will not falsely identify people as having the disease.
Tip…..
• Sensitivity is useful to me
• Specificity isn’t….I want to know about the
false positives
…so……use 1-specificity which is the false
positive rate
2 by 2 table:
              Disease +     Disease -
Test +        a             b
Test -        c             d
Sensitivity = a / (a + c)
False positive rate = b / (b + d)
(same as 1-specificity)
2 by 2 table:
              Disease +     Disease -
Test +        99            10
Test -        1             90
Sensitivity = 99/100 = 99%
False positive rate = 10/100 = 10%
(same as 1-specificity)
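The same arithmetic can be written as a small helper. This is a sketch only; the class name `TwoByTwo` and its property names are assumptions introduced for illustration, with cells labelled a to d as in the tables above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    a: int  # test +, disease + (true positives)
    b: int  # test +, disease - (false positives)
    c: int  # test -, disease + (false negatives)
    d: int  # test -, disease - (true negatives)

    @property
    def sensitivity(self):
        return self.a / (self.a + self.c)

    @property
    def specificity(self):
        return self.d / (self.b + self.d)

    @property
    def false_positive_rate(self):
        # Identical to 1 - specificity.
        return self.b / (self.b + self.d)


table = TwoByTwo(a=99, b=10, c=1, d=90)
print(f"sensitivity         = {table.sensitivity:.0%}")
print(f"specificity         = {table.specificity:.0%}")
print(f"false positive rate = {table.false_positive_rate:.0%}")
```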
Example
Your father went to his doctor and was told that his test
for a disease was positive. He is really worried, and
comes to ask you for help!
• After doing some reading, you find that for men of
his age:
–The prevalence of the disease is 30%
–The test has sensitivity of 50% and specificity of 90%
• “Son/Daughter, tell me what’s the
chance I have this disease?”
A disease with a prevalence of 30%.
The test has sensitivity of 50% and specificity of 90%.
[Answer scale: 0% = never, 50% = maybe, 100% = always]
Prevalence of 30%, Sensitivity of 50%, Specificity of 90%
Of 100 people:
Disease +ve: 30. Sensitivity = 50%, so 15 test +ve.
Disease -ve: 70. False positive rate = 10% (1-Sp), so 7 test +ve.
22 people test positive………. of whom 15 have the disease
So, chance of disease is 15/22, about 70%
Try it again
A disease with a prevalence of 4% must be
diagnosed.
The diagnostic test has a sensitivity of 50%
and a specificity of 90%.
If the patient tests positive, what is the
chance they have the disease?
Prevalence of 4%, Sensitivity of 50%, Specificity of 90%
(Same Positive Test – Lower Prevalence)
Of 100 people:
Disease +ve: 4. Sensitivity = 50%, so 2 test +ve.
Disease -ve: 96. False positive rate = 10% (1-Sp), so 9.6 test +ve.
11.6 people test positive………. of whom 2 have the disease
So, chance of disease is 2/11.6, about 17%
(vs 70% in prior example where prevalence was 30%)
Doctors with an average of 14 yrs experience
….answers ranged from 1% to 99%
….half of them estimating the probability as 50%
Gigerenzer G BMJ 2003;327:741-744
Hi Sensitivity D-Dimer
– High negative predictive value for PE (based on
pulmonary angiography)
– For D-dimer <500ng/mL, negative predictive
value (NPV) 91-99%
– For D-dimer >500ng/mL, sens=93%, spec=25%,
and positive predictive value (PPV) = 30%
– PPV and NPV are affected by prevalence.
– Test is also useful for DVT rule out (<500ng/mL):
NPV 92%
– If pretest probability is intermediate (27.8%) you
are supposed to image, but if you order a D-dimer what do the results mean?
Prevalence of 27.8% (Wells intermediate), Sensitivity of 93%, Specificity of 25%
Of 100 people:
Disease +ve: 28. Sensitivity = 93%, so 26 test +ve.
Disease -ve: 72. False positive rate = 75% (1-Sp), so 54 test +ve.
80 people test positive………. of whom 26 have the disease
So, chance of disease is 26/80, about 32.5%
(PPV quoted at 30%)
Incidence of PE in the General
Population
• 650,000 to 900,000/year
• Current US Population = 307,085,301
Incidence Ranges :
– From 650,000/307,085,301 = 0.0021 or 2/1000
– To 900,000/307,085,301 = 0.00293 or 3/1000
– Admittedly this includes newborns, children, etc.
but is being used for illustrative purposes.
Prevalence of 0.3% (3 per 1000), Sensitivity of 93%, Specificity of 25%
Of 1000 people:
Disease +ve: 3. Sensitivity = 93%, so about 3 test +ve.
Disease -ve: 997. False positive rate = 75% (1-Sp), so 748 test +ve.
751 people test positive………. of whom 3 have the disease
So, chance of disease with a + test is 3/751, or about 0.4%
(The lower the prevalence, the larger the share of positive results that are false positives)
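The natural-frequency reasoning used in the four examples above can be sketched as one function. The function name and dictionary keys are assumptions for illustration. Unlike the slides, the code does not round to whole people, so the D-dimer result comes out slightly different from the hand-rounded figure.

```python
def natural_frequencies(population, prevalence, sensitivity, specificity):
    """Split a population into diseased/healthy, then count positive tests."""
    diseased = population * prevalence
    healthy = population - diseased
    true_positives = diseased * sensitivity
    false_positives = healthy * (1 - specificity)
    test_positives = true_positives + false_positives
    return {
        "true_positives": true_positives,
        "false_positives": false_positives,
        "test_positives": test_positives,
        # Chance of disease given a positive test (the PPV).
        "chance_of_disease": true_positives / test_positives,
    }


examples = [
    ("prevalence 30%, sens 50%, spec 90%", 100, 0.30, 0.50, 0.90),
    ("prevalence 4%, sens 50%, spec 90%", 100, 0.04, 0.50, 0.90),
    ("D-dimer, prevalence 27.8%", 100, 0.278, 0.93, 0.25),
    ("D-dimer, prevalence 0.3%", 1000, 0.003, 0.93, 0.25),
]

for label, population, prevalence, sens, spec in examples:
    result = natural_frequencies(population, prevalence, sens, spec)
    print(f"{label}: {result['chance_of_disease']:.1%} chance of disease "
          f"after a positive test")
```

Sensitivity and specificity are identical within each pair of examples; only the prevalence changes, and with it the meaning of a positive result.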
Sensitivity and specificity don’t vary with
prevalence
• Test performance can vary in different settings/ patient
groups, etc.
• Occasionally attributed to differences in disease
prevalence, but more likely is due to differences in
diseased and non-diseased spectrums
2 x 2 table: positive predictive value
              Disease +     Disease -
Test +        a             b
Test -        c             d
PPV = a / (a + b)
Proportion of people with a positive test who have the disease
2 x 2 table: negative predictive value
              Disease +     Disease -
Test +        a             b
Test -        c             d
NPV = d / (c + d)
Proportion of people with a negative test who do not have the disease
What’s wrong with PPV and NPV?
• Depend on accuracy of the test and
prevalence of the disease
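A short sketch makes the dependence on prevalence concrete. Both tables below are hypothetical and describe the same test (99% sensitivity, 10% false positive rate); only the mix of diseased and non-diseased people differs. The function name is an assumption for illustration.

```python
def predictive_values(a, b, c, d):
    """Return (PPV, NPV) from 2 x 2 counts.

    a = true positives, b = false positives,
    c = false negatives, d = true negatives.
    """
    ppv = a / (a + b)
    npv = d / (c + d)
    return ppv, npv


# Hypothetical population 1: 100 diseased, 100 non-diseased (prevalence 50%).
ppv_high, npv_high = predictive_values(a=99, b=10, c=1, d=90)

# Hypothetical population 2: 100 diseased, 9900 non-diseased (prevalence 1%).
ppv_low, npv_low = predictive_values(a=99, b=990, c=1, d=8910)

print(f"prevalence 50%: PPV = {ppv_high:.1%}, NPV = {npv_high:.1%}")
print(f"prevalence  1%: PPV = {ppv_low:.1%}, NPV = {npv_low:.1%}")
```

The test itself has not changed between the two populations, yet the PPV falls from roughly 91% to roughly 9%.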
Are the Results Important?
• Using sensitivity and specificity: SpPin and SnNout
• Sometimes it can be helpful just knowing the sensitivity
and specificity of a test, if they are very high.
• If a test has high specificity, i.e. if a high proportion of
patients without the disorder actually test negative, it is
unlikely to produce false positive results. Therefore, if the
test is positive it makes the diagnosis very likely.
• This can be remembered by the mnemonic SpPin: for a test
with high specificity (Sp), if the test is positive, then it rules
the diagnosis 'in'.
• Similarly, with high sensitivity a test is unlikely to produce
false negative results. This can be remembered by the
mnemonic SnNout: for a test with high sensitivity (Sn), if
the test is negative, then it rules 'out' the diagnosis.
Are the Results Important?
SnNOut
SpPIn
Sen/Spec/PPV/LR of WBC Count>20
Are the Results Important?
• These measures are combined into an overall
measure of the efficiency of a diagnostic test
called the likelihood ratio: the likelihood that a
given test result would be expected in a patient
with the target disorder compared to the
likelihood that the same result would be
expected in a patient without the disorder
(With/Without).
– These possible outcomes of a diagnostic test are
illustrated below (sample data from Andriole 1988)
Are the Results Important?
• Positive Predictive Value = the proportion of
people with a positive test who have disease.
• True+/(True+ plus False+)
• Negative Predictive Value = the proportion of
people with a negative test who are free of
disease.
• True-/(True- plus False-)
Likelihood ratios are extremely valuable !
Likelihood ratios
• Can be used in situations with more than 2
test outcomes
• Allow a direct link from pre-test probabilities
to post-test probabilities
2 x 2 table: positive likelihood ratio
              Disease +     Disease -
Test +        a             b
Test -        c             d
LR+ = [a/(a+c)] / [b/(b+d)]
or
LR+ = sens/(1-spec)
How much more often a positive test occurs in people with compared to those without the
disease
2 x 2 table: negative likelihood ratio
              Disease +     Disease -
Test +        a             b
Test -        c             d
LR- = [c/(a+c)] / [d/(b+d)]
or
LR- = (1-sens)/(spec)
How much less likely a negative test result is in people with the disease compared to those
without the disease
LR<0.1 = strong
negative test
result
LR=1
No diagnostic
value
LR>10 = strong
positive test
result
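The two formulas can be sketched directly from sensitivity and specificity. The function name is an assumption; the inputs reuse the test characteristics quoted earlier in the deck (sensitivity 50% / specificity 90%, and the D-dimer figures of 93% / 25%).

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a test with a positive/negative result."""
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    return lr_positive, lr_negative


for name, sens, spec in [
    ("sens 50%, spec 90%", 0.50, 0.90),
    ("D-dimer (sens 93%, spec 25%)", 0.93, 0.25),
]:
    lr_pos, lr_neg = likelihood_ratios(sens, spec)
    print(f"{name}: LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
```

Neither test reaches the LR>10 or LR<0.1 marks above: the first has a moderate LR+ of about 5, while the D-dimer LR+ of about 1.24 is close to 1, so a positive result carries little diagnostic value. LR+ is undefined when specificity is exactly 100%.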
Likelihood Ratios
[Figure: Fagan nomogram illustrating Bayesian reasoning. ? Appendicitis: McBurney tenderness, LR+ = 3.4. Pre test 5%, post test 20%.]
McGee: Evidence based Physical Diagnosis (Saunders Elsevier)
Are the Results Important?
• What Likelihood Ratios Were Associated With the Range of
Possible Test Results?
• The starting point of any diagnostic process is the patient
presenting with a constellation of symptoms and signs.
• Consider two patients with nonspecific chest pain and shortness of
breath without findings suggesting diagnoses such as pneumonia,
airflow obstruction, or heart failure, in whom the clinician suspects
pulmonary embolism.
• One is a 78-year-old woman 10 days after surgery and the other is a
28-year-old man experiencing a high level of anxiety.
• Our clinical hunches about the probability of pulmonary embolism
as the explanation for these two patients' complaints -- that is, their
pretest probabilities -- are very different.
• In the older woman, the probability is high; in the young man, it is
low. As a result, even if both patients have intermediate-probability
ventilation-perfusion scans, subsequent management is likely to
differ in each. One might well treat the elderly woman immediately
with heparin but order additional investigations in the young man.
Are the Results Important?
• Two conclusions emerge from this line of reasoning.
• First, regardless of the results of the ventilation-perfusion scan, they do not tell us whether pulmonary
embolism is present. What they do accomplish is to
modify the pretest probability of that condition,
yielding a new posttest probability.
• The direction and magnitude of this change from
pretest to posttest probability are determined by the
test's properties, and the property of most value is the
likelihood ratio.
As depicted in Table 1C-3, constructed
from the results of the PIOPED study,
there were 251 people with
angiographically proven pulmonary
embolism and 630 people whose
angiograms or follow-up excluded that
diagnosis.
For all patients, ventilation-perfusion
scans were classified into four levels:
high probability, intermediate
probability, low probability, and
normal or near-normal.
How likely is a high-probability scan
among people who do have
pulmonary embolism? Table 1C-3
shows that 102 of 251 (or 0.406)
people with the condition had high-probability scans. How often is the
same test result, a high-probability
scan, found among people in whom
pulmonary embolism was suspected
but has been ruled out? The answer is
14 of 630 (or 0.022) of them.
The ratio of these two likelihoods is
called the likelihood ratio (LR); for a
high-probability scan, it equals 0.406 ÷
0.022 (or 18.3). In other words, a high-probability ventilation-perfusion scan
is 18.3 times as likely to occur in a
patient with--as opposed to without--a
pulmonary embolism.
In a similar fashion, we can calculate the
likelihood ratio for each level of the
diagnostic test results.
Each calculation involves answering two
questions: First, how likely it is to obtain a
given test result (say, a low-probability
ventilation-perfusion scan) among people
with the target disorder (pulmonary
embolism)?
Second, how likely it is to obtain the same
test result (again, a low-probability scan)
among people without the target disorder?
For a low-probability ventilation-perfusion
scan, these likelihoods are 39/251 (0.155)
and 273/630 (0.433), respectively, and their
ratio (the likelihood ratio for low-probability scan) is 0.36. Table 1C-3
provides the results of the calculations for
the other scan results.
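The two questions translate directly into a ratio of two proportions, one per result level. The sketch below uses only the counts quoted in the text (high-probability and low-probability scans); the counts for the other two levels are in Table 1C-3 and are not reproduced here. The names are assumptions for illustration.

```python
DISEASED = 251      # angiographically proven pulmonary embolism
NOT_DISEASED = 630  # embolism excluded by angiogram or follow-up

# Result level -> (number with PE, number without PE) showing that result.
scan_counts = {
    "high probability": (102, 14),
    "low probability": (39, 273),
}


def level_likelihood_ratio(with_disease, without_disease,
                           total_with=DISEASED, total_without=NOT_DISEASED):
    """Likelihood of this result in diseased / likelihood in non-diseased."""
    return (with_disease / total_with) / (without_disease / total_without)


for level, (with_pe, without_pe) in scan_counts.items():
    lr = level_likelihood_ratio(with_pe, without_pe)
    print(f"{level}: LR = {lr:.2f}")
```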
Likelihood Ratios
• What do all these numbers mean? The Likelihood ratios indicate by how
much a given diagnostic test result will raise or lower the pretest
probability of the target disorder. A likelihood ratio of 1 means that the
posttest probability is exactly the same as the pretest probability.
Likelihood ratios >1.0 increase the probability that the target disorder is
present, and the higher the likelihood ratio, the greater is this increase.
Conversely, likelihood ratios <1.0 decrease the probability of the target
disorder, and the smaller the likelihood ratio, the greater is the decrease in
probability and the smaller is its final value.
• How big is a "big" likelihood ratio, and how small is a "small" one? Using
likelihood ratios in your day-to-day practice will lead to your own sense of
their interpretation, but consider the following a rough guide:
• Likelihood ratios of >10 or < 0.1 generate large and often conclusive
changes from pre- to posttest probability;
• Likelihood ratios of 5-10 and 0.1-0.2 generate moderate shifts in pre- to
posttest probability;
• Likelihood ratios of 2-5 and 0.5-0.2 generate small (but sometimes
important) changes in probability; and
• Likelihood ratios of 1-2 and 0.5-1 alter probability to a small (and rarely
important) degree.
Likelihood Ratios
• Having determined the magnitude and significance of the likelihood
ratios, how do we use them to go from pretest to posttest
probability? We cannot combine likelihoods directly, the way we
can combine probabilities or percentages; their formal use requires
converting pretest probability to odds, multiplying the result by
the Likelihood ratio, and converting the consequent posttest odds
into a posttest probability. Although it is not too difficult, this
calculation can be tedious and off-putting; fortunately, there is an
easier way.
• A nomogram proposed by Fagan (Figure 1C-2) does all the
conversions and allows an easy transition from pre- to posttest
probability. The left-hand column of this nomogram represents the
pretest probability, the middle column represents the likelihood
ratio, and the right-hand column shows the posttest probability. You
obtain the posttest probability by anchoring a ruler at the pretest
probability and rotating it until it lines up with the likelihood ratio
for the observed test result.
Recall the elderly woman mentioned earlier with suspected
pulmonary embolism after abdominal surgery.
Most clinicians would agree that the probability of this patient
having the condition is quite high-- about 70%. This value then
represents the pretest probability. Suppose that her
ventilation-perfusion scan was reported as being within the
realm of high probability. Figure 1C-2 shows how you can
anchor a ruler at her pretest probability of 70% and align it
with the likelihood ratio of 18.3 associated with a high-probability scan. The result: her posttest probability is >97%.
If, by contrast, her ventilation-perfusion scan result is reported
as intermediate (likelihood ratio, 1.2), the probability of
pulmonary embolism hardly changes (it increases to 74%),
whereas a near-normal result (likelihood ratio, 0.1) yields a posttest probability of
19%.
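The nomogram is a graphical shortcut for the odds conversion described earlier. A minimal sketch of that calculation, using the 70% pretest probability and the three likelihood ratios from this example (the function name is an assumption):

```python
def post_test_probability(pretest_probability, likelihood_ratio):
    """Probability -> odds, multiply by the LR, then odds -> probability."""
    pretest_odds = pretest_probability / (1 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1 + posttest_odds)


pretest = 0.70  # elderly woman, 10 days after surgery
for scan_result, lr in [
    ("high probability", 18.3),
    ("intermediate probability", 1.2),
    ("near-normal", 0.1),
]:
    posttest = post_test_probability(pretest, lr)
    print(f"{scan_result}: posttest probability = {posttest:.0%}")
```

The computed values agree with the figures read off the nomogram (above 97%, 74%, and 19%).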
Likelihood Ratios
• The pretest probability is an estimate.
• Clinicians can deal with residual uncertainty
by examining the implications of a plausible
range of pretest probabilities. Let us assume
the pretest probability in this case is as low as
60%, or as high as 80%. The posttest
probabilities that would follow from these
different pretest probabilities appear in Table
1C-4.
Likelihood Ratios
• We can repeat this exercise for our second patient, the 28-year-old
man. Let us consider that his presentation is compatible with a 20%
probability of pulmonary embolism. Using our nomogram (see
Figure 1C-2), the posttest probability with a high-probability scan
result is 82%; with an intermediate-probability result, it is 23%; and
with a near-normal result, it is 2%. The pretest probability (with a
range of possible pretest probabilities from 10% to 30%), likelihood
ratios, and posttest probabilities associated with each of the four
possible scan results also appear in Table 1C-4.
• The investigation of women with possible appendicitis showed that
the CT scan was positive in all 32 in whom that diagnosis was
ultimately confirmed. Of the 68 who did not have appendicitis, 66
had negative scan results. These data translate into a Likelihood
ratio of 0 associated with a negative test and a Likelihood ratio of 34
for a positive test. These numbers effectively mean that the test is
extremely powerful. A negative result excludes appendicitis, and a
positive test makes appendicitis highly likely.
Wells Clinical Prediction Rule for PE
Clinical Prediction Rule for PE
V/Q Scan AND Clinical Probability
Likelihood Ratios
• Having learned to use likelihood ratios, you may be
curious about where to find easy access to the
Likelihood ratios of the tests you use regularly in your
own practice. The Rational Clinical Examination is a
series of systematic reviews of the diagnostic
properties of the history and physical examination that
have been published in JAMA. Black and colleagues
have summarized much of the available information
about diagnostic test properties in the form of a
medical text.
• Black ER, Bordley DR, Tape TG, Panzer RJ. Diagnostic
strategies for common medical problems. In: Black ER,
et al, eds. Philadelphia:American College of Physicians;
1999.
Likelihood Ratios
• Effect of prevalence
• Positive predictive value is the percentage of patients
who test positive who actually have the disease.
Predictive values are affected by the prevalence of the
disease: if a disease is rarer, the positive predictive
value will be lower, while sensitivity and specificity are
constant.
• Since we know that prevalence changes in different
health care settings, predictive values are not generally
very useful in characterizing the accuracy of tests.
• The measure of test accuracy that is most useful when
it comes to interpreting test results for individual
patients is the likelihood ratio (LR).
Likelihood Ratios
• There are two major advantages to using likelihood
ratios.
• The first is that they enable us to take into account the
exact value of a test result, rather than simply
classifying it as positive or negative. For example, a
patient with chest pain and a troponin I of 20 ng/ml
will certainly get our attention more than one with a
troponin of 0.6 ng/ml, despite the fact that both would
be reported as positive.
• The second is that they can be used to calculate a post-test
probability, even when multiple tests are used in
sequence.
Likelihood Ratios
• Pre-test odds x LR (test result) = post-test odds
• To use this formula, one must convert
between probability and odds using:
– Odds = Probability/(1-Probability)
– Odds of rolling a 6 with one die: 1/5 = 0.2
– Probability = Odds/(1 + Odds)
– Probability of a 6 = 1/6 = 0.167
Likelihood Ratios
• Assume you are faced with a patient with
abdominal discomfort and are considering
spontaneous bacterial peritonitis (SBP). After
exam you estimate the chances of SBP to be 20%.
You perform a paracentesis, and find 600
PMNs/uL. How does this affect the likelihood that
she has SBP?
Number of PMNs in Ascitic Fluid (per uL)    Likelihood Ratio
>1000                                       22.3
501-1000                                    2.78
251-500                                     1.14
0-250                                       0.08
Likelihood Ratios
• First convert pretest probability to odds:
• Pre-test odds = 0.2/(1 – 0.2) = 0.25
• Post-test odds = 0.25 x 2.78 = 0.695
• Convert post-test odds back to probabilities:
• Post-test probability = 0.695/(1 + 0.695) = 0.41
• Thus our new estimate is that she has about a
40% probability of SBP. Had the fluid shown
>1000 PMNs, the post-test probability would
have been 85%.
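This example shows the first advantage of likelihood ratios: the LR depends on the band the result falls in, not just on "positive" or "negative". A sketch of the lookup plus the odds calculation, using the band limits and LRs from the table above (the names `PMN_BANDS` and `lr_for_pmn_count` are assumptions for illustration):

```python
# (upper limit of band in PMNs/uL, likelihood ratio for that band)
PMN_BANDS = [
    (250, 0.08),
    (500, 1.14),
    (1000, 2.78),
    (float("inf"), 22.3),
]


def lr_for_pmn_count(pmn_per_ul):
    """Pick the likelihood ratio for the band containing the PMN count."""
    if pmn_per_ul < 0:
        raise ValueError("PMN count cannot be negative")
    for upper_limit, lr in PMN_BANDS:
        if pmn_per_ul <= upper_limit:
            return lr
    raise ValueError("PMN count is not a number")


def post_test_probability(pretest_probability, likelihood_ratio):
    pretest_odds = pretest_probability / (1 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1 + posttest_odds)


pretest = 0.20  # clinical estimate of SBP after the examination
for pmn in (600, 1500):
    lr = lr_for_pmn_count(pmn)
    print(f"{pmn} PMNs/uL: LR = {lr}, "
          f"post-test probability = {post_test_probability(pretest, lr):.0%}")
```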
Do doctors use quantitative methods
of test accuracy?
• Survey of 300 US physicians
– 8 used Bayesian methods, 3 used ROC
curves, 2 used LRs
– Why?
…indices unavailable…
…lack of training…
…not relevant to setting/population.
…other factors more important…
(Reid et al. Academic calculations versus clinical
judgements: practicing physicians’ use of quantitative
measures of test accuracy. Am J Med 1998)
Appraising diagnostic tests
1. Are the results valid?
2. What are the results?
3. Will they help me look
after my patients?
Will the test apply in my setting?
• Reproducibility of the test and interpretation in my
setting
• Do results apply to the mix of patients I see?
• Will the results change my management?
• Impact on outcomes that are important to patients?
• Where does the test fit into the diagnostic strategy?
• Costs to patient/health service?
Reliability – how reproducible is the
test?
• Kappa = measure of intraobserver reliability
Test                        Kappa value
Tachypnoea                  0.25
Crackles on auscultation    0.41
Pleural rub                 0.52
CXR for cardiomegaly        0.48
MRI spine for disc          0.59

Value of Kappa    Strength of Agreement
<0.20             Poor
0.21-0.40         Fair
0.41-0.60         Moderate
0.61-0.80         Good
0.81-1.00         Very Good
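Will the result change management?
[Figure: probability of disease on a scale from 0% to 100%. Below the testing threshold: no action. Between the testing threshold and the action threshold: test. Above the action threshold: action (e.g. treat).]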
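The threshold idea can be sketched as a simple decision rule. The threshold values used below (5% and 80%) are purely hypothetical; in practice they depend on the disease, the harms of testing and treatment, and the patient's preferences. The function name is also an assumption.

```python
def management(probability, testing_threshold, action_threshold):
    """Map a probability of disease to one of the three zones."""
    if not 0 <= testing_threshold <= action_threshold <= 1:
        raise ValueError("thresholds must satisfy 0 <= testing <= action <= 1")
    if probability < testing_threshold:
        return "no action"
    if probability >= action_threshold:
        return "action (e.g. treat)"
    return "test"


# Hypothetical thresholds for illustration only.
TESTING_THRESHOLD = 0.05
ACTION_THRESHOLD = 0.80

for p in (0.02, 0.20, 0.41, 0.85):
    print(f"probability {p:.0%}: "
          f"{management(p, TESTING_THRESHOLD, ACTION_THRESHOLD)}")
```

A test is worth ordering only if some plausible result would move the probability across one of the two thresholds.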
Summary
• 1 Frame the clinical question.
• 2 Search for evidence concerning the accuracy of the test.
• 3 Assess the methods used to determine the accuracy of
the test.
• 4 Find out the likelihood ratios for the test.
• 5 Estimate the pre-test probability of disease in your
patient.
• 6 Apply the likelihood ratios to this pre-test probability
using the nomogram to determine what the post-test
probability would be for different possible test results.
• 7 Decide whether or not to perform the test on the basis
of your assessment of whether it will influence the care of
the patient, and the patient's attitude to different possible
outcomes.
Appraising Articles on
Harm/Etiology
Cohort Study
• A cohort study is an analytical study in which
individuals with differing exposures to a suspected
factor are identified and then observed for the
occurrence of certain health effects over some period,
commonly years rather than weeks or months.
• The occurrence rates of the disease of interest are
measured and related to estimated exposure levels.
• Cohort studies can either be performed prospectively
or retrospectively from historical records.
Case Control Study
• Patients who have developed a
disorder are identified and their
exposure to suspected causative
factors is compared with that of
controls who do not have the
disorder.
• This permits estimation of odds
ratios (but not of absolute risks).
• The advantages of case-control
studies are that they are quick,
cheap, and are the only way of
studying very rare disorders or
those with a long time lag
between exposure and outcome.
• Disadvantages include the
reliance on records to determine
exposure, difficulty in selecting
control groups, and difficulty in
eliminating confounding
variables.
Case Control Study
• A case-control study is an observational,
retrospective study which "involves
identifying patients who have the outcome of
interest (cases) and control patients without
the same outcome, and looking back to see if
they had the exposure of interest."
Cohort Study
• Patients with and without the
exposure of interest are
identified and followed over
time to see if they develop the
outcome of interest, allowing
comparison of risk.
• Cohort studies are cheaper
and simpler than RCTs, can be
more rigorous than case-control studies in eligibility
and assessment, can establish
the timing and sequence of
events, and are ethically safe.
• However, they cannot exclude
unknown confounders,
blinding is difficult, and
identifying a matched control
group may also be difficult.
Case Control - Retrospective
• Retrospective case-control studies rely on
people’s memories, making them prone to
error. Also, it may be difficult to measure the
exact amount of an exposure in the past.
Among people with bladder cancer, how
might researchers determine the amount of
artificial sweeteners used? Researchers might
ask patients to self-report their estimated
consumption. This method is inexact at best.
Randomized Controlled Trial
• A randomized controlled trial is an experimental, prospective study
in which "participants are randomly allocated into an experimental
group or a control group and followed over time for the
variables/outcomes of interest."
• Study participants are randomly assigned to ensure that each
participant has an equal chance of being assigned to an
experimental or control group, thereby reducing potential bias.
Outcomes of interest may be death (mortality), a specific disease
state (morbidity), or even a numerical measurement such as blood
chemistry level.
• Now let’s look at a diagram of a typical RCT that represents the flow
of participants from the start of the study through the study
outcome. In each diagram, note where the study starts; studies that
progress from left to right are prospective studies, “collecting data
about a population whose outcome lies in the future.”
Randomized Controlled Trial
• Similar subjects are randomly
assigned to a treatment group
and followed to see if they
develop the outcome of
interest.
• RCTs are the most powerful
method of eliminating
(known and unknown)
confounding variables and
permit the most powerful
statistical analysis (including
subsequent meta-analysis).
• However, they are expensive,
sometimes ethically
problematic, and may still be
subject to selection and
observer biases.
Study Design
Is The Study Valid?
• In assessing an intervention's potential for
harm, we are usually looking at prospective
cohort studies or retrospective case–control
studies. This is because RCTs may have to be
very large indeed to pick up small adverse
reactions to treatment.
Is The Study Valid?
• 1) Was there a clearly defined question?
• What question has the research been
designed to answer? Was the question
focused in terms of the population group
studied, the exposure received, and the
outcomes considered?
Is The Study Valid?
• 2) Were there clearly defined, similar groups of
patients?
• Studies looking at harm must be able to demonstrate
that the two groups of patients are clearly defined and
sufficiently similar so as to be comparable.
• For example, in a cohort study, patients are either
exposed to the treatment or not according to a
decision (rather than by randomization). This might
mean that sicker patients, who are perhaps more likely
to have adverse outcomes, are more likely to be offered
(or to demand) potentially helpful treatment.
• There may be some statistical adjustment to the results
to take these potential confounders into account.
Is The Study Valid?
• 3) Were treatment exposures and clinical
outcomes measured the same ways in both
groups?
• You would not want one group to be studied
more exhaustively than the other, because
this might lead to reporting a greater
occurrence of exposure or outcome in the
more intensively studied group.
Is The Study Valid?
• 4) Was the follow up complete and long
enough?
• Follow up has to be long enough for the
harmful effects to reveal themselves, and
complete enough for the results to be
trustworthy (lost patients may have very
different outcomes from those who remain in
the study).
Is The Study Valid?
• 5) Does the suggested causative link make
sense?
• You can apply the following rationale to help
decide if the results make sense.
• Is it clear the exposure preceded the onset of
the outcome? It must be clear that the
exposure wasn't just a 'marker' of another
disease.
Is The Study Valid?
• Is there a dose-response gradient? If the exposure was
causing the outcome, you might expect to see
increased harmful effects as a result of increased
exposure: a dose-response effect.
• Is there evidence from a 'dechallenge-rechallenge'
study? Does the adverse effect decrease when the
treatment is withdrawn ('dechallenge') and worsen or
reappear when the treatment is restarted
('rechallenge')?
• Is the association consistent from study to study? Try
finding other studies, or, ideally, a systematic review of
the question.
• Does the association make biological sense? If it does,
a causal association is more likely.
Are the Results Important?
• This means looking at the risk or odds of the
adverse effect with (as opposed to without)
exposure to the treatment; the higher the risk
or odds, the stronger the association and the
more we should be impressed by it.
• We can use the single table to determine if
the valid results of the study are important.
Are the Results Important?
• A cohort study compares the risk of an adverse event
amongst patients who received the exposure of interest
with the risk in a similar group who did not receive it.
• Therefore, we are able to calculate a relative risk (or risk
ratio). In case-control studies, we are presented with the
outcomes, and work backwards looking at exposures. Here,
we can only compare the two groups in terms of their
relative odds (odds ratio).
• Statistical significance
• As with other measures of efficacy, we would be concerned
if the 95% CI around the results, whether relative risk or
odds ratio, crossed the value of 1, meaning that there may
be no effect (or the opposite).
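The relative risk, the odds ratio, and the check for whether a 95% CI crosses 1 can be sketched as follows. This is a minimal sketch, not a method taken from these slides: the counts are hypothetical, and the confidence intervals use the common log-scale normal approximation, which is an assumption here.

```python
import math

def ratio_with_ci(a, b, c, d, kind="rr", z=1.96):
    """Ratio estimate with an approximate 95% CI from a 2x2 table.

    a, b: exposed patients with / without the adverse event
    c, d: unexposed patients with / without the adverse event
    kind: "rr" for relative risk (cohort), "or" for odds ratio (case-control)
    """
    if kind == "rr":
        estimate = (a / (a + b)) / (c / (c + d))
        se_log = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    else:
        estimate = (a * d) / (b * c)
        se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return estimate, lower, upper

# Hypothetical cohort: 30 of 1000 exposed and 20 of 1000 unexposed have the event.
rr, rr_low, rr_high = ratio_with_ci(30, 970, 20, 980, kind="rr")
odds_ratio, or_low, or_high = ratio_with_ci(30, 970, 20, 980, kind="or")

# If the interval contains 1, "no effect" cannot be excluded.
rr_crosses_one = rr_low < 1 < rr_high
print(f"RR {rr:.2f} (95% CI {rr_low:.2f} to {rr_high:.2f}), crosses 1: {rr_crosses_one}")
print(f"OR {odds_ratio:.2f} (95% CI {or_low:.2f} to {or_high:.2f})")
```

In this made-up example the point estimate suggests a 50% higher risk, but the interval includes 1, so the data are also compatible with no effect.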
Interpretation: RISK
• There are a number of ways of summarizing
the outcome from binary data:
• Absolute Risk Reduction
• Relative Risk
• Relative Risk Reduction
• Odds Ratio
• Numbers Needed to Treat (or Harm)
Interpretation: RISK
Kennedy et al. report a study of acetazolamide and furosemide versus standard
therapy for the treatment of posthemorrhagic ventricular dilation (PHVD) in
premature babies. The outcome was death or placement of a shunt by 1 year of age.
The standard method of summarizing binary outcomes is to use percentages or
proportions. Thus 35 out of 76 children died or had a shunt under standard therapy.
This is expressed as 35/76, or 0.46, or as a percentage, 46%. For a prospective study
such as this, the proportion can be thought of as the probability of an event
happening, or a risk.
Interpretation: RISK
Thus, under the standard therapy there was a risk of 35/76 = 0.46 (46%) of dying or
getting a shunt by 1 year of age. With the drug plus standard therapy, the risk was
49/75 = 0.65 (65%).
In clinical trials what we really want is to look at the contrast between differing
therapies.
We do this by looking at the difference in risks, or alternatively the ratio of risks.
The difference is usually expressed as the control risk minus the experimental risk and
is known as the absolute risk reduction (ARR).
The difference in risks in this case is 0.46 - 0.65 = -0.19, or -19%. The negative sign
indicates that the experimental treatment in this case appears to be doing harm.
One way of thinking about this is if 100 patients were treated under standard therapy
and 100 treated under drug therapy, we would expect 46 to have died or have had a
shunt in standard therapy and 65 in the experimental therapy.
Another way of looking at this is to ask: how many patients would be treated for one
extra person to be harmed by the drug therapy? Treating 100 patients resulted in
19 (65 - 46) extra adverse events, so 100/19 = 5.26 patients would be treated for 1
adverse event. Thus roughly if 6 patients were treated with standard therapy and 6
with drug (experimental) therapy, we would expect 1 extra patient to die or require a
shunt in the drug therapy group.
This is known as the NNH (number needed to harm) and is simply expressed as the
inverse of the absolute risk reduction, with the sign ignored.
When the treatment is beneficial, it is known as the NNT (number needed to treat).
For screening studies it is known as the NNS (number needed to screen).
Absolute risk increase = 0.65 - 0.46 = 0.19; NNH = 1/0.19 = 5.26
However, it is important to realize that comparison between NNTs can only be made if
the baseline risks are similar.
Thus, suppose a new therapy managed to reduce the 5-year mortality of Creutzfeldt-Jakob
disease from 100% on standard therapy to 90% on the new treatment. This would be a
major breakthrough and has an NNT of 1/(1.0 - 0.9) = 10.
In contrast, a drug that reduced mortality from 50% to 40% would also have an
NNT of 10, but would have much less impact.
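The NNH arithmetic for the PHVD trial and the baseline-risk caveat can be reproduced with a few lines of Python. This is a minimal sketch using only the counts and percentages quoted above; the function names are illustrative.

```python
import math

def risk(events, total):
    """Proportion of patients who experienced the event."""
    return events / total

def number_needed(control_risk, experimental_risk):
    """NNT or NNH: inverse of the absolute risk difference, sign ignored."""
    return 1 / abs(control_risk - experimental_risk)

# PHVD trial: death or shunt by 1 year of age.
standard = risk(35, 76)   # standard therapy
drug = risk(49, 75)       # drug plus standard therapy

arr = standard - drug     # control minus experimental; negative means harm
nnh = number_needed(standard, drug)
nnh_whole_patients = math.ceil(nnh)  # round up to whole patients

# Same NNT, very different baseline risks.
nnt_cjd = number_needed(1.00, 0.90)
nnt_other = number_needed(0.50, 0.40)

print(f"ARR = {arr:.3f}, NNH = {nnh:.2f} (about {nnh_whole_patients} patients)")
print(f"NNT 100% to 90%: {nnt_cjd:.1f}; NNT 50% to 40%: {nnt_other:.1f}")
```

With unrounded proportions the NNH comes out near 5.2 rather than the 5.26 obtained from the rounded figure 0.19; either way it rounds up to 6 patients.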
We can express the outcome as a risk ratio or relative risk (RR), which is the ratio of the
two risks, experimental divided by control risk, namely 0.65/0.46 = 1.41. With a relative
risk less than 1, the risk of an event is greater in the control group; here the RR is
greater than 1, so the risk is greater in the experimental group. RR is often used in
cohort studies.
It is important to consider the absolute risk. The risk of DVT in women on a new type
of oral contraceptive is 30 per 100,000 women years, compared to 15 per 100,000 on the
old treatment. Thus the RR is 2 (200%), which shows that the new type of contraceptive
carries quite a high relative risk of DVT. However, a woman need not be unduly concerned,
since she has a probability of 0.0003 of getting a DVT in 1 year on the new drug, which is
much less than if she were pregnant!
We can also consider the relative risk reduction (RRR), which is (control risk –
experimental risk)/control risk; this is easily shown to be 1-RR, often expressed as a
percentage. In the PHVD trial the RRR is negative, (0.46-0.65)/0.46 = -0.41: in the
drug arm there is a 41% higher risk of experiencing an adverse event relative to the
risk of a patient on standard therapy.
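The relative measures above, and the contrast between relative and absolute risk in the DVT example, can be checked with a short sketch. It uses only the figures quoted in the text; the function names are illustrative.

```python
def relative_risk(experimental_risk, control_risk):
    """RR: experimental risk divided by control risk."""
    return experimental_risk / control_risk

def relative_risk_reduction(experimental_risk, control_risk):
    """RRR: (control - experimental) / control, which equals 1 - RR."""
    return (control_risk - experimental_risk) / control_risk

# PHVD trial: standard therapy 35/76, drug plus standard therapy 49/75.
standard, drug = 35 / 76, 49 / 75
rr = relative_risk(drug, standard)             # about 1.42
rrr = relative_risk_reduction(drug, standard)  # about -0.42, i.e. an increase

# DVT example: the relative risk doubles, but the absolute excess is tiny.
dvt_new, dvt_old = 30 / 100_000, 15 / 100_000
dvt_rr = relative_risk(dvt_new, dvt_old)
dvt_excess = dvt_new - dvt_old  # extra DVTs per woman-year

print(f"PHVD: RR = {rr:.2f}, RRR = {rrr:.2f}")
print(f"DVT: RR = {dvt_rr:.1f}, absolute excess = {dvt_excess:.5f} per woman-year")
```

The unrounded PHVD figures give an RR of about 1.42 rather than 1.41; the small difference comes from rounding the risks to two decimals before dividing.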
When the data come from a case-control study or a cross-sectional study, rather than
risks, we often use odds. The odds of an event happening are the ratio of the probability
that it happens to the probability that it does not.
Thus the probability of throwing a 6 on a die is 1/6 = 16.67%. (Notice the denominator is
the total of all outcomes: Happening + Not Happening.) The probability of throwing any
other number is 5/6 = 83.33%. If P is the probability of an event we have:
Odds (event) = P/(1-P) = Happening/Not Happening. For a 6, the odds are 1:5, that is,
0.1667/0.8333 = 0.2
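The conversion between probability and odds can be sketched as below. The die example is from the text; applying the same conversion to the PHVD counts to obtain an odds ratio is an added illustration, not a figure quoted in the slides.

```python
def odds_from_probability(p):
    """Odds = P / (1 - P): happening versus not happening."""
    return p / (1 - p)

def probability_from_odds(odds):
    """Inverse conversion: P = odds / (1 + odds)."""
    return odds / (1 + odds)

# Throwing a 6 on a die: probability 1/6, odds 1:5.
die_odds = odds_from_probability(1 / 6)
die_probability = probability_from_odds(die_odds)

# PHVD trial (illustration): odds of death or shunt in each arm.
odds_standard = odds_from_probability(35 / 76)  # 35 events : 41 non-events
odds_drug = odds_from_probability(49 / 75)      # 49 events : 26 non-events
phvd_odds_ratio = odds_drug / odds_standard

print(f"Odds of a 6: {die_odds:.2f}; back to probability: {die_probability:.4f}")
print(f"PHVD odds ratio: {phvd_odds_ratio:.2f}")
```

Because the outcome is common in this trial, the odds ratio (about 2.2) is noticeably larger than the relative risk (about 1.4).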
PHVD trial risks: standard therapy 35/76 = 0.46; drug therapy 49/75 = 0.65.
Calculate the absolute risk of a coronary event in the E+P vs placebo group:
E+P = 164/8506 = 0.0193 = 1.93%; Placebo = 122/8102 = 0.0151 = 1.51%
Absolute Risk INCREASE = 1.93% - 1.51% = 0.42% (0.0042)
Risk Ratio = 0.0193/0.0151 = 1.28 (28% increase)
Relative Risk Increase = (0.0193-0.0151)/0.0151 = 0.278 = 27.8% increase
Odds Ratio = (164/8342)/(122/7980) = 1.29 (29% increase in odds)
NNH (number needed to harm) = 1/|0.0151-0.0193| = 238
For PE:
E+P = 70/8506 = 0.0082; Placebo = 31/8102 = 0.0038
Absolute Risk INCREASE = 0.0082-0.0038 = 0.0044 = 0.44%
Risk Ratio = 0.0082/0.0038 = 2.16 (116% increase)
Relative Risk Increase = (0.0082-0.0038)/0.0038 = 1.16 = 116% increase
Odds Ratio = (70/8436)/(31/8071) = 2.16 (116% increase in odds)
NNH = 1/|0.0082-0.0038| = 227
Adverse risk ratios include those for CHD, stroke, breast cancer, and PE. Only the risk
ratios for hip fracture and colorectal cancer are less than unity and suggest benefit.
Global Index
E + P = 751/8506 = 0.0883; Placebo = 623/8102 = 0.0769
NNH = 1/|0.0883-0.0769| = 87.7
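The E+P versus placebo calculations above all follow the same pattern, so they can be sketched as one small helper applied to each outcome. The event counts are those quoted above; the class and field names are illustrative.

```python
from dataclasses import dataclass

@dataclass
class TwoArmOutcome:
    events_treated: int
    n_treated: int
    events_control: int
    n_control: int

    def summary(self):
        """Risk-based summary measures for one binary outcome."""
        risk_t = self.events_treated / self.n_treated
        risk_c = self.events_control / self.n_control
        odds_t = self.events_treated / (self.n_treated - self.events_treated)
        odds_c = self.events_control / (self.n_control - self.events_control)
        difference = risk_t - risk_c  # positive means more events on treatment
        return {
            "risk_treated": risk_t,
            "risk_control": risk_c,
            "absolute_risk_increase": difference,
            "risk_ratio": risk_t / risk_c,
            "relative_risk_increase": difference / risk_c,
            "odds_ratio": odds_t / odds_c,
            "nnh": 1 / abs(difference),
        }

outcomes = {
    "Coronary event": TwoArmOutcome(164, 8506, 122, 8102),
    "PE": TwoArmOutcome(70, 8506, 31, 8102),
    "Global index": TwoArmOutcome(751, 8506, 623, 8102),
}
results = {name: outcome.summary() for name, outcome in outcomes.items()}

for name, s in results.items():
    print(f"{name}: RR {s['risk_ratio']:.2f}, OR {s['odds_ratio']:.2f}, "
          f"ARI {s['absolute_risk_increase']:.4f}, NNH {s['nnh']:.0f}")
```

Working from unrounded counts gives values that differ slightly from hand calculations on rounded risks; for example, the NNH for coronary events comes out near 237 rather than 238.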
Clinical Scenario
Do SSRIs Cause Gastrointestinal Bleeding?
•
You are a general practitioner considering the optimal choice of antidepressant medication. Your patient is a 55-year-old previously cheerful and well-adjusted individual who, during the past 2 months, has become sad and
distressed for the first time in his life. He has developed difficulty concentrating and experiences early morning
wakening, but lacks thoughts of self-harm. The patient has attended your practice for the past 20 years and you
know him well. You believe he is suffering from a major depressive episode and that he might benefit from
antidepressant medication.
During recent years, you have been administering a selective serotonin reuptake inhibitor (SSRI), paroxetine, as
your first-line antidepressant agent. However, recent reviews suggesting that the SSRIs are no more effective and
do not have lower discontinuation rates than tricyclic antidepressants (TCAs) have led you to revert to your
previous first choice, nortriptyline, in some patients. Patients in your practice usually consider the adverse effects
in some depth before agreeing to any treatment decisions and many choose SSRIs on the basis of a preferable
side-effect profile.
However, for the past 5 years the patient you are seeing today has been taking ketoprofen (a nonsteroidal anti-inflammatory drug, or NSAID), 50 mg three times per day, which has controlled the pain from his hip
osteoarthritis. Your mind jumps to a review article suggesting that SSRIs may be associated with an increased risk
of bleeding, and you become concerned about the risk of gastrointestinal bleeding when you consider that the
patient is also receiving an NSAID. Unfortunately, an abstract from Evidence Based Mental Health, which you have
used to obtain a summary of side effects of antidepressant medications, provides no information regarding this
issue.
You remember the review article and locate a copy in your files, but at a glance you realize that it will not help
answer your question for three reasons: It did not use explicit inclusion and exclusion criteria, it failed to conduct a
systematic and comprehensive search, and it did not evaluate the methodologic quality of the original research it
summarized. In addition, it did not cite any original studies specific to an association between SSRI treatment and
gastrointestinal bleeding.
You consider that it is worth following up this issue before you make a final recommendation to the patient. You
inform him that he will need antidepressant medication, but you explain your concern about the possible bleeding
risk and your need to acquire more definitive information before making a final recommendation. You schedule a
follow-up visit two days later and you commit to presenting a strategy at that time.
Clinical Scenario
Do SSRIs Cause Gastrointestinal Bleeding?
• You formulate the following focused question:
• Do adults suffering from depression and taking SSRI medications,
compared to patients not taking antidepressants, suffer an increased risk
of serious upper gastrointestinal bleeding?
• Later that day, you begin your search using prefiltered evidence-based
medicine resources: the journal Evidence Based Mental Health, Best
Evidence, Clinical Evidence, and the Cochrane Library.
• For each database, you enter the term "serotonin reuptake inhibitor."
• Search of Evidence Based Mental Health yields eight reviews in volumes 1
(1998) and 2 (1999). Four of these deal with adverse effects associated
with SSRI use, but none addresses gastrointestinal bleeding.
• Searching Best Evidence yields 17 equally unhelpful articles. A Clinical
Evidence search identifies only a review on treatment of depressive
disorders in adults.
• The Cochrane Library search locates four complete reviews and two
abstracts of systematic reviews, but none addresses the issue of
gastrointestinal bleeding in SSRI users.
Clinical Scenario
Do SSRIs Cause Gastrointestinal Bleeding?
• You now turn to the PubMed version of MEDLINE and PreMEDLINE
searching system (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi). For
optimum search efficiency, you click on "Clinical queries" under "PubMed
Services" to access systematically tested search strategies, or you go to
"Search hedges," which will help you identify methodologically sound
studies pertaining to your question on harm.
• You enter the following: "selective serotonin reuptake inhibitor" AND
"bleeding" for the subject search term; and you click on "Etiology" for
study category and "Specificity" for emphasis. Your MEDLINE search (from
1966 through 2000) identifies one citation, an epidemiologic study
assessing the association between SSRIs and upper gastrointestinal
bleeding. This study describes a threefold increased risk of upper
gastrointestinal bleeding associated with the use of SSRIs.
• Thinking that this article may answer your question, you download the full
text free of charge from the British Medical Journal (BMJ) Web site
(http://www.bmj.com/) as a portable document format (PDF) file, an
electronic version of a printed page or pages.
Clinical Scenario
• Are the Results Valid?
• Clinicians often encounter patients who are facing
potentially harmful exposures, either to medical
interventions or environmental agents, and important
questions arise. Are pregnant women at increased risk
of miscarriage if they work in front of video display
terminals? Do vasectomies increase the risk of prostate
cancer? Do hypertension management programs at
work lead to increased absenteeism? When examining
these questions, physicians must evaluate the validity
of the data, the strength of the association between
the assumed cause and the adverse outcome, and the
relevance to patients in their practice.
Clinical Scenario
•
Using the Guide: Returning to our earlier discussion, the study that we retrieved
investigating the association between SSRIs and risk of upper gastrointestinal bleeding
used a case-control design. Data came from a general practitioner electronic medical
record database in the United Kingdom, which included data from more than 3 million
people, most of whom had been entered prospectively during a 5-year period. The
investigators identified cases of upper gastrointestinal bleeding (n=1651) and ulcer
perforation (n=248) among patients aged 40 to 79 years between 1993 and 1997. They
then randomly selected 10,000 controls from the at-risk source population that gave
rise to cases, choosing their sample so that age, sex, and the year patients were
identified were similar among the cases and control groups.
•
The analysis controlled for a number of possible prognostic factors: previous dyspepsia,
gastritis, peptic ulcer and upper gastrointestinal bleeding or perforation, smoking
status, and current use of NSAIDs, anticoagulants, corticosteroids, and aspirin. The
database included prescription drugs only. The investigators examined the relative
frequency of SSRI prescription use in the 30 days before the index date (that is, the date
of the reported bleeding or perforation) in patients with and without bleeding and
perforation after controlling for the prognostic variables. Control patients received a
random date as their index date.
•
Although the investigators controlled for a number of prognostic factors, there are
other potential important determinants of bleeding for which they did not control. For
example, more patients being treated for depression or anxiety suffer from painful
medical conditions than those without depression and anxiety. Patients may have been
using over-the-counter NSAIDs for these problems. The database the investigators used
does not capture the use of self-medication with over-the-counter analgesics.
• Alcohol use is another potential confounder. Although the investigators
excluded patients with known alcoholism, many persons afflicted with
alcoholism remain unrevealed to their primary care physician, and
alcoholism is associated with an increased prevalence of depression and
anxiety that could lead to the prescription of SSRIs. Since alcoholism is
associated with increased bleeding risk, this prognostic variable fulfils all
the criteria for a confounding variable that could bias the results of the
study. Finally, it is possible that patients returning for prescription of SSRIs
would be more likely to have their bleeding diagnosed in comparison to
patients under less intense surveillance (a state of affairs known as
detection bias).
•
These biases should apply to all three classes of antidepressants (ie, SSRIs,
nonselective serotonin reuptake inhibitors, and a miscellaneous group of
others) that the investigators considered. The results of the study, which
we will discuss later in this section, showed an association only between
gastrointestinal bleeding and SSRIs, rather than between gastrointestinal
bleeding and other antidepressant medications. One would expect all
these biases to influence the association between any antidepressant
agent and bleeding. Thus, the fact that the investigators found the
association only with SSRIs decreases our concern about the threats to
validity from possible differences in prognostic factors in those receiving, and not receiving, SSRIs.
•
At the same time, most physicians make decisions regarding the prescription of SSRIs
or tricyclic antidepressant agents based on particular patient characteristics. Thus, it
remains possible that these characteristics include some that are associated with the
incidence of gastrointestinal bleeding. This would be true, for instance, if clinicians
differentially used SSRI rather than other antidepressant medications in patients in
whom they suspected alcohol abuse.
•
The major strength of the use of a large database for this study is that it eliminates
the possibility of biased assessment of exposure (or recall bias) to SSRIs in the
patients who suffered the outcomes as well as in those who did not. The outcomes
and exposures were probably measured in the same way in both groups, as most
clinicians are unaware that UGI bleeding may be associated with SSRI use. We have
no idea, however, about the number of patients lost to follow-up. Although the
investigators included only those patients who stayed in the practices of the
participating primary care physicians from the beginning to the end of the study, we
do not know, for instance, how many people in the database began to receive SSRIs
but subsequently left those practices.
•
In summary, the study suffers from the limitation inherent in any observational
study: that exposed and unexposed patients may differ in prognosis at baseline. In
this case, at least two unmeasured variables, over-the-counter NSAID use and
alcohol consumption, might create a spurious association between SSRIs and
gastrointestinal bleeding. The other major limitation of the study is the lack of
information regarding completeness of follow-up. That said, although these
limitations weaken any inferences we might make, we are likely to conclude that
the study is strong enough to warrant a review of the results.
How Strong Is the Association Between Exposure and Outcome?
• In addition to showing a large magnitude of relative risk or
odds ratio, a second finding will strengthen an inference
that we are dealing with a true harmful effect.
• If, as the quantity or the duration of exposure to the
putative harmful agent increases, the risk of the adverse
outcome also increases (that is, the data suggest a
dose-response gradient), we are more likely to be dealing with a
causal relationship between exposure and outcome.
• The fact that the risk of dying from lung cancer in male
physician smokers increases by 50%, 132%, and 220% for 1
to 14, 15 to 24, and 25 or more cigarettes smoked per day,
respectively, strengthens our inference that cigarette
smoking causes lung cancer.
How Precise Is the Estimate of the Risk?
• Using the Guide: Returning to our earlier discussion, the investigators
calculated odds ratios (ORs) of the risk of bleeding in those exposed to
SSRIs vs those not exposed, but they reported the results as relative risks
(RR). Unfortunately, this practice is not unusual. Fortunately, when event
rates are low, relative risks and odds ratios closely approximate one
another.
• The investigators found an association between current use of SSRIs and
upper gastrointestinal bleeding (adjusted odds ratio [OR], 3.0; 95% CI,
2.1-4.4). They noted a weak association with nonselective serotonin
reuptake inhibitors (adjusted OR, 1.4; 95% CI, 1.1-1.9), but found no
association with antidepressant medications that had no action on the
serotonin reuptake mechanism.
• The investigators found that the association between NSAID use and
bleeding (adjusted OR, 3.7; 95% CI, 3.2-4.4) was of similar magnitude to
the association between bleeding and SSRIs. The current use of SSRIs with
prescription NSAID drugs further increased the risk of upper
gastrointestinal bleeding (adjusted OR, 15.6; 95% CI, 6.6-36.6). The dose
and duration of SSRI use had little influence on the risk of this adverse
outcome.
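The remark above that relative risks and odds ratios closely approximate one another when event rates are low can be illustrated numerically. This is a minimal sketch with hypothetical baseline risks and a fixed true relative risk of 3; none of these numbers come from the SSRI study.

```python
def odds_ratio(p_exposed, p_unexposed):
    """Odds ratio computed from the two underlying risks."""
    odds_exposed = p_exposed / (1 - p_exposed)
    odds_unexposed = p_unexposed / (1 - p_unexposed)
    return odds_exposed / odds_unexposed

TRUE_RR = 3.0                               # hypothetical relative risk
baseline_risks = [0.001, 0.01, 0.05, 0.20]  # hypothetical unexposed risks

# How far the odds ratio overstates the relative risk at each baseline risk.
overstatement = {}
for p0 in baseline_risks:
    p1 = TRUE_RR * p0
    overstatement[p0] = odds_ratio(p1, p0) - TRUE_RR
    print(f"baseline {p0:.3f}: RR = {TRUE_RR:.1f}, OR = {odds_ratio(p1, p0):.2f}")
```

At a baseline risk of 1 in 1000 the two measures are nearly identical, whereas at a baseline risk of 20% the odds ratio is about twice the relative risk, so reporting an odds ratio as a relative risk matters only when the outcome is common.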
Clinical Resolution
• Turning to the results, you note the very strong association
between the combined use of SSRIs and NSAIDs. Despite the
methodologic limitations of this single study, you believe the
association is too strong to ignore. You therefore proceed to the
third step and consider the implications of the results for the
patient before you.
• The primary care database from which the investigators drew their
sample suggests that the results are readily applicable to the
patient before you. You consider the magnitude of the risk to which
you would be exposing this patient if you prescribed an SSRI and it
actually did cause bleeding. Using the baseline risk reported by
Carson et al in a similar population, you calculate that you would
need to treat about 625 patients not using NSAIDs with an SSRI for
a year to cause a single bleeding episode, and about 55 patients
taking NSAIDs along with an SSRI for a year to cause a single
bleeding episode.
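Turning an odds ratio into a number needed to harm requires a baseline risk, as the calculation above notes. The following is a minimal sketch of that conversion. The baseline risk used here is purely illustrative (1 bleed per 1000 patient-years) and is not the figure from Carson et al., so the outputs will not reproduce the 625 and 55 quoted above.

```python
def exposed_risk_from_or(odds_ratio, baseline_risk):
    """Risk in exposed patients implied by an odds ratio and a baseline risk."""
    baseline_odds = baseline_risk / (1 - baseline_risk)
    exposed_odds = odds_ratio * baseline_odds
    return exposed_odds / (1 + exposed_odds)

def nnh_from_or(odds_ratio, baseline_risk):
    """Number needed to harm: inverse of the absolute risk increase."""
    increase = exposed_risk_from_or(odds_ratio, baseline_risk) - baseline_risk
    return 1 / increase

# ASSUMPTION: illustrative baseline risk, not taken from Carson et al.
ILLUSTRATIVE_BASELINE = 0.001

nnh_ssri_alone = nnh_from_or(3.0, ILLUSTRATIVE_BASELINE)        # adjusted OR 3.0
nnh_ssri_plus_nsaid = nnh_from_or(15.6, ILLUSTRATIVE_BASELINE)  # adjusted OR 15.6

print(f"NNH, SSRI alone: {nnh_ssri_alone:.0f}")
print(f"NNH, SSRI plus NSAID: {nnh_ssri_plus_nsaid:.0f}")
```

The design point is that the same odds ratio yields a very different NNH depending on the baseline risk, which is why a baseline from a comparable population is needed before the result can be applied to an individual patient.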
Clinical Resolution
• From previous experience with the patient before
you, you know that he is risk averse. When he
returns to your office, you note the equal
effectiveness of the SSRIs and tricyclic
antidepressants that you can offer him, and you
describe the side-effect profile of the alternative
agents. You note, among the other
considerations, the possible increased risk of
gastrointestinal bleeding with the SSRIs. The
patient decides that, on balance, he would prefer
a tricyclic antidepressant and leaves your office
with a prescription for nortriptyline.
Carson JL, Strom BL, Soper KA, West SL, Morse ML
• Carson JL, Strom BL, Soper KA, West SL, Morse ML. The association of nonsteroidal
anti-inflammatory drugs with upper gastrointestinal tract bleeding. Arch Intern
Med 1987;147:85-8.
• To evaluate the risk of developing upper gastrointestinal (UGI) bleeding from
nonsteroidal anti-inflammatory drugs (NSAIDs), a retrospective (historical) cohort
study was performed, using a computerized data base including 1980 billing data
from all Medicaid patients in the states of Michigan and Minnesota.
• Comparing 47,136 exposed patients to 44,634 unexposed patients, the unadjusted
relative risk for developing UGI bleeding 30 days after exposure to a NSAID was
1.5 (95% confidence interval 1.2 to 2.0).
• Univariate analyses demonstrated associations between UGI bleeding and age,
sex, state, alcohol-related diagnoses, preexisting abdominal conditions, and use of
anticoagulants. This association between NSAIDs and UGI bleeding was unchanged
after adjusting for these potential confounding variables using logistic regression.
• A linear dose-response relationship and a quadratic duration-response relationship
were demonstrated. Non-steroidal anti-inflammatory drugs are associated with
UGI bleeding, although the magnitude of the increased risk is reassuringly small.
Clinical Resolution
• NNT for SSRI without NSAID:
• Odds Ratio = 3.0
• Relative Risk = 1.0024
• NNT = 624.5
• NNT for SSRI with NSAID:
• Odds Ratio = 15.6
• Relative Risk = 1.019
• NNT = 56.17