analysis_259_2007

Doing Analyses on Binary Outcomes
From November 14th
• Dr Sainani talked about how the math
works for binomial data.
Binomial Code
• There is some SAS code on the website that shows
how to play around with binomial probabilities.
Run the macro definition first. That is the part from
%macro binom(events, trials, prob);
down to
%mend;
• Then plug in the number of events, the number of
trials and the probability, and run this line:
%binom(3, 5, .5);
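The body of the SAS macro is not reproduced in these slides, so the following is only a minimal Python sketch of the same kind of calculation for the %binom(3, 5, .5) example. The function names and the n*p >= 5 rule of thumb for the Z approximation are assumptions for illustration, not the macro's actual checks.

```python
from math import comb

def binom_pmf(events, trials, prob):
    """P(exactly `events` successes in `trials` independent tries)."""
    return comb(trials, events) * prob**events * (1 - prob)**(trials - events)

def binom_summary(events, trials, prob):
    exact = binom_pmf(events, trials, prob)
    # Cumulative probability of seeing `events` or fewer successes.
    at_most = sum(binom_pmf(k, trials, prob) for k in range(events + 1))
    # Assumed rule of thumb: Z approximation is reasonable when
    # n*p and n*(1-p) are both at least 5.
    z_ok = trials * prob >= 5 and trials * (1 - prob) >= 5
    return {"exact": exact, "at_most": at_most, "z_ok": z_ok}

summary = binom_summary(3, 5, 0.5)
print(summary)
```

With only 5 trials the rule of thumb fails, which is the kind of case where exact binomial limits are preferable to a Z approximation.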
Results
• The code kicks out two summary tables.
One has descriptive statistics and the
binomial probabilities plus a couple checks
on whether or not you can use Z
approximations of the confidence limits.
• Plug in several examples from the lectures
on the 12th and 14th to validate the code.
Results(2)
• The code also gives you the confidence limits
around the probability:
• Search the SAS online documentation for proc
reliability to see the details on the CLs and on
setting your own limits if you want, say, 90% CLs.
support.sas.com/onlinedoc/913/docMainpage.jsp
The Easy Answer
• Once the macro has been run once you
just need to run this line:
%binom(2, 6, .551);
Math Stuff
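\[ \text{Difference between proportions} = P_1 - P_2 = \frac{A}{A+B} - \frac{C}{C+D} \]
\[ \text{Relative Risk} = \frac{P_1}{P_2} = \frac{A/(A+B)}{C/(C+D)} \]
\[ \text{Odds of disease in exposed} = \frac{A/(A+B)}{B/(A+B)} = \frac{A}{B} \]
\[ \text{Odds of disease in unexposed} = \frac{C/(C+D)}{D/(C+D)} = \frac{C}{D} \]
\[ \frac{\text{Odds of disease in exposed}}{\text{Odds of disease in unexposed}} = \frac{A/B}{C/D} \]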
Difference in Proportion Math
\[ \Delta = \text{Difference between proportions} = P_1 - P_2 = \frac{A}{A+B} - \frac{C}{C+D} \]
\[ \text{SE of } \Delta = \sqrt{\frac{P_1(1-P_1)}{N_1} + \frac{P_2(1-P_2)}{N_2}} \]
95% CI on the Difference = (Δ – 1.96 * SE) up to (Δ + 1.96 * SE)
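As a worked illustration of the difference formula and its 95% CI, here is a small Python sketch. It uses the smoker counts that appear later in these slides (21 of 100 smokers and 13 of 100 non-smokers with heart disease); the function name is made up for this example.

```python
from math import sqrt

def risk_difference_ci(a, b, c, d, z=1.96):
    """a, b = diseased / not diseased among the exposed;
    c, d = diseased / not diseased among the unexposed."""
    n1, n2 = a + b, c + d
    p1, p2 = a / n1, c / n2
    diff = p1 - p2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return diff, diff - z * se, diff + z * se

diff, low, high = risk_difference_ci(21, 79, 13, 87)
print(f"difference = {diff:.3f}, 95% CI = ({low:.3f}, {high:.3f})")
```

The interval includes 0, so with these counts the 8 point difference is not distinguishable from no difference at the 95% level.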
RR Math
\[ \text{Relative Risk} = \frac{P_1}{P_2} = \frac{A/(A+B)}{C/(C+D)} \]
\[ \text{95\% CI of } \ln(RR) = \ln(RR) \pm 1.96 \sqrt{\frac{B/A}{A+B} + \frac{D/C}{C+D}} \]
OR Math
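\[ \text{Odds of disease in exposed} = \frac{A/(A+B)}{B/(A+B)} = \frac{A}{B} \]
\[ \text{Odds of disease in unexposed} = \frac{C/(C+D)}{D/(C+D)} = \frac{C}{D} \]
\[ OR = \frac{\text{Odds of disease in exposed}}{\text{Odds of disease in unexposed}} = \frac{A/B}{C/D} \]
\[ \text{95\% CI of } \ln(OR) = \ln(OR) \pm 1.96 \sqrt{\frac{1}{A} + \frac{1}{B} + \frac{1}{C} + \frac{1}{D}} \]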
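The odds ratio interval follows the same log-scale pattern as the relative risk. Here is a minimal Python sketch with the smoker counts used elsewhere in these slides (A=21, B=79, C=13, D=87); the function name is hypothetical.

```python
from math import exp, log, sqrt

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and its 95% CI, computed on the log scale."""
    odds_ratio = (a / b) / (c / d)
    se_log = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return (odds_ratio,
            exp(log(odds_ratio) - z * se_log),
            exp(log(odds_ratio) + z * se_log))

or_est, or_low, or_high = odds_ratio_ci(21, 79, 13, 87)
print(f"OR = {or_est:.2f}, 95% CI = ({or_low:.2f}, {or_high:.2f})")
```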
Ummm… No Thanks
• If you don’t want to do the algebra by hand
you don’t have to. SAS can do all this
work for you easily.
Relative Risks
• If you use SAS for analyses be sure to set
your tables up correctly. You will recall that
Dr. Sainani showed this table with a RR of
1.61:

                    Smoker (E)   Non-smoker (~E)
Heart disease (D)       21             13
No Disease (~D)         79             87
Total                  100            100
Making the table
• There is an EG project showing how to
make the data:
A Contingency Table
• Contingency table values have to be
mutually exclusive counts.
Swap these for real life!!!!!
Getting a Relative Risk (sort of)
• It does not give you the RR!
Rotate the table
• The risk factor was in the first column.
• Notice it is the same odds ratio.
Risk Differences
                    Smoker (E)   Non-smoker (~E)
Heart disease (D)       21             13
No Disease (~D)         79             87
Total                  100            100

\[ \text{Risk Difference} = P(D \mid E) - P(D \mid \sim E) = 21\% - 13\% = 8\% \]
Association
Expected values in Chi Square
• Which cells are above or below the
expected values?
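To see which cells sit above or below expectation, the expected count for each cell is (row total × column total) / grand total. The following Python sketch applies that to the smoker table from these slides; it is an illustration only, not the SAS or EG output, and the function name is hypothetical.

```python
def expected_counts(table):
    """Expected cell counts under independence of rows and columns."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

observed = [[21, 13],   # heart disease: smokers, non-smokers
            [79, 87]]   # no disease:    smokers, non-smokers
expected = expected_counts(observed)

# Pearson chi-square: sum of (observed - expected)^2 / expected.
chi_square = sum((o - e) ** 2 / e
                 for obs_row, exp_row in zip(observed, expected)
                 for o, e in zip(obs_row, exp_row))
print(expected, round(chi_square, 3))
```

Here the smokers with heart disease (21) are above their expected count and the non-smokers with heart disease (13) are below it.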
R Version
• This does a quick analysis then deletes
the table. Rerun the code to keep the
table.
Rerun these
Another Example
• This example is taken from Motulsky’s Intuitive
Biostatistics (a book I highly recommend when
you encounter people who HATE math).
• The data is from Cooper et al’s zidovudine (AZT)
trial for people who are HIV+.
– 76 of 475 on AZT had disease progress
– 129 of 461 on placebo had disease progress
\[ \frac{16\%}{27.98\%} = .5718 \text{ relative risk of progression} \]
The effect
• 76/475 or 16% progression vs. 129/461 or
28%. Is the 12 percentage point reduction a
significant difference?
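One way to check the AZT question by hand is a Pearson chi-square test on the 2×2 table. The sketch below is in Python rather than the SAS used in the course, applies no continuity correction, and gets the p-value from the standard identity for one degree of freedom; the function name is an assumption.

```python
from math import erfc, sqrt

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for a 2x2 table."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of a chi-square with 1 df equals erfc(sqrt(chi2 / 2)).
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value

progressed_azt, total_azt = 76, 475
progressed_placebo, total_placebo = 129, 461

risk_azt = progressed_azt / total_azt
risk_placebo = progressed_placebo / total_placebo
chi2, p_value = chi_square_2x2(progressed_azt, total_azt - progressed_azt,
                               progressed_placebo,
                               total_placebo - progressed_placebo)
print(f"{risk_azt:.1%} vs {risk_placebo:.1%}, chi2 = {chi2:.2f}, p = {p_value:.2g}")
```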
Grouped Data
\[ \frac{16\%}{27.98\%} = .5718 \text{ relative risk of progression} \]
The percents you care about
The difference you care about
The bad thing is in column 1
Important stuff
• Your subjects have to be randomly
selected (independent from each other)
from the population you wish to generalize
to and the only differences between the
two groups should be exposure to the risk
factor (in a cohort study) or treatment (in a
trial).
Beyond the Relative Risk
• Epidemiologists get very excited about
relative risks, but you also need to look at
the overall prevalence.
– A risk factor that changes the risk from 1 in
1,000,000 to 2 in 1,000,000 is not too
important compared to a risk factor that
changes the risk from 1 in 10 to 2 in 10, but
the relative risk is the same for both risk
factors: 2 (or .5 if you take the ratio in the
other direction).
NNT
• The number needed to treat (NNT) is the
reciprocal of the difference in the
probabilities between the two groups. It
gives you a metric to judge the relative
importance of the effects. In this case you
need to treat a million people to make a
difference in the rare disease vs. 10 in the
common one.
Risk of 1 in 1,000,000 = 0.000001
Risk of 2 in 1,000,000 = 0.000002
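The NNT arithmetic can be checked with exact fractions. This Python sketch uses the two risk pairs from the slides and, as an extra illustration, the AZT trial counts quoted earlier; the function name is hypothetical.

```python
from fractions import Fraction

def number_needed_to_treat(risk_a, risk_b):
    """Reciprocal of the absolute difference between two risks."""
    return 1 / abs(risk_a - risk_b)

rare = number_needed_to_treat(Fraction(1, 1_000_000), Fraction(2, 1_000_000))
common = number_needed_to_treat(Fraction(1, 10), Fraction(2, 10))
azt = number_needed_to_treat(Fraction(76, 475), Fraction(129, 461))

print(rare, common, round(float(azt), 2))
```

Using Fraction keeps the tiny risks exact, so the rare-disease NNT comes out as a whole number instead of a floating-point approximation.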
Probability vs Odds
• So far I have talked about probabilities, using
the number of people progressing while on AZT
relative to all the people on it:
\[ \text{probability} = \frac{\#\text{ people progressing on AZT}}{\#\text{ people on AZT}} \]
• You can also look at odds by making a ratio of
people progressing while on AZT to those who
are not progressing on AZT:
\[ \text{odds} = \frac{\#\text{ people progressing on AZT}}{\#\text{ people not progressing on AZT}} \]
• Odds are not easy to think about and because
of this are not ideal for this data.
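To make the distinction concrete, here is a small Python sketch that computes both quantities for the AZT arm (76 of 475 progressed) and converts between them. The helper names are made up for this example.

```python
def odds_from_probability(p):
    """Odds = p / (1 - p)."""
    return p / (1 - p)

def probability_from_odds(odds):
    """Probability = odds / (1 + odds)."""
    return odds / (1 + odds)

progressed = 76
not_progressed = 475 - 76

probability = progressed / (progressed + not_progressed)  # 76 / 475
odds = progressed / not_progressed                         # 76 / 399

print(f"probability = {probability:.3f}, odds = {odds:.3f}")
```

The two numbers are close here only because progression is fairly uncommon in this arm; they drift apart as the probability grows.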
Relative Risk vs. Odds Ratio
• Relative Risk
\[ \frac{\#\text{ people progressing on AZT} \,/\, \#\text{ people on AZT}}{\#\text{ people progressing on Placebo} \,/\, \#\text{ people on Placebo}} \]
• Odds Ratio
\[ \frac{\#\text{ people progressing on AZT} \,/\, \#\text{ people not progressing on AZT}}{\#\text{ people progressing on Placebo} \,/\, \#\text{ people not progressing on Placebo}} \]
Why mess around with odds?
• If your disease/outcome of interest is rare
you will not want to study hundreds of
thousands of exposed people to find the
few who get disease.
• You will want to find people with disease
and match them to controls and then look
to see if they were exposed.
– This is a retrospective case-control study.
Dangers of RR
• You can get any relative risk you want by
sampling different numbers of cases and
controls in the case-control study!
The original study of
cat scratch fever
Get 100 times the cases
Get 100 times the controls
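The cat scratch fever counts are not reproduced in this transcript, so the Python sketch below uses made-up case-control counts purely to illustrate the point: recruiting 100 times as many controls changes the naively computed "relative risk" but leaves the odds ratio untouched.

```python
def naive_relative_risk(a, b, c, d):
    """RR formula applied (inappropriately) to case-control counts."""
    return (a / (a + b)) / (c / (c + d))

def odds_ratio(a, b, c, d):
    return (a * d) / (b * c)

# Hypothetical counts, not from the original study:
# a, c = exposed / unexposed cases; b, d = exposed / unexposed controls.
a, b, c, d = 40, 20, 10, 30

rr_original = naive_relative_risk(a, b, c, d)
rr_more_controls = naive_relative_risk(a, b * 100, c, d * 100)
or_original = odds_ratio(a, b, c, d)
or_more_controls = odds_ratio(a, b * 100, c, d * 100)

print(rr_original, rr_more_controls)   # these differ
print(or_original, or_more_controls)   # these match
```

The investigator chooses how many controls to recruit, so the row totals (and anything built on them, like the RR) are an artifact of the design.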
Rare diseases
• If the disease is rare then the odds ratio
approximates the relative risk.
\[ \text{Relative Risk} = \frac{P_1}{P_2} = \frac{A/(A+B)}{C/(C+D)} \]
\[ \text{Odds Ratio} = \frac{A/B}{C/D} \]
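The approximation works because when disease is rare, A is tiny compared to B and C is tiny compared to D, so A+B is nearly B and C+D is nearly D. A minimal Python sketch with illustrative cohort counts (the 1-in-a-million and 1-in-10 risks echo the earlier slide; the cohort sizes are assumptions):

```python
def relative_risk(a, b, c, d):
    return (a / (a + b)) / (c / (c + d))

def odds_ratio(a, b, c, d):
    return (a / b) / (c / d)

# Rare disease: 2 vs 1 cases per 1,000,000 people (assumed cohort sizes).
rare = (2, 999_998, 1, 999_999)
# Common disease: 2 vs 1 cases per 10 people.
common = (2, 8, 1, 9)

rare_rr, rare_or = relative_risk(*rare), odds_ratio(*rare)
common_rr, common_or = relative_risk(*common), odds_ratio(*common)

print(rare_rr, rare_or)      # nearly identical
print(common_rr, common_or)  # OR overstates the RR
```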
Case Control
• Finding the right controls is VERY tricky.
Take a class in epidemiology from Dr. Rita
Popat to learn about the problems
associated with the different types of
controls.
Contingency Table Analyses
• You have seen contingency tables used to
describe many kinds of studies.
– Experiments
– Cohorts
– Case-Control studies
• Contingency tables are also used to describe
lab test results.
– You will see a new test which calls people sick or not
sick and you will have a gold standard. You want
statistics to describe how well the new test does.
Screening and Diagnostic
• Sensitivity - correctly calling people sick when
they are.
• Specificity - correctly calling people healthy
when they are.
• Predictive value of a positive test - the
percentage of people who really are sick given a
positive test result.
• Predictive value of a negative test - the
percentage of people who really are healthy given a
negative test result.
Testing Formulas

                         Reality
Test result    Disease (+)      Not Diseased (-)
    +          a  (true +)      b  (false +)
    -          c  (false -)     d  (true -)

sensitivity = 100 * a /(a+c)
specificity = 100 * d / (b+d)
predictive value of positive = 100 * a/(a+b)
predictive value of negative = 100 * d/(c+d)
• Given half a chance I will mess up the
algebra so I wrote code to do it.
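The author's own code is not shown in this transcript. As an independent illustration, the four formulas above can be sketched in Python like this; the function name and the cell counts are hypothetical.

```python
def diagnostic_summary(a, b, c, d):
    """a = true +, b = false +, c = false -, d = true - (all as counts)."""
    return {
        "sensitivity": 100 * a / (a + c),
        "specificity": 100 * d / (b + d),
        "predictive value of positive": 100 * a / (a + b),
        "predictive value of negative": 100 * d / (c + d),
    }

# Hypothetical counts: 100 truly diseased and 900 truly healthy people.
result = diagnostic_summary(a=90, b=30, c=10, d=870)
for name, value in result.items():
    print(f"{name}: {value:.1f}%")
```

Sensitivity and specificity use the column totals (reality), while the predictive values use the row totals (test result), which is why the predictive values shift with how common the disease is.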
Reading from the Bible
• Fleiss wrote the authoritative book on
categorical data analysis (Statistical
Methods for Rates and Proportions, 3rd
Edition 2003 Fleiss, Levin, Paik). Get a
copy if you are going to deal with categorical
data in real life. I have coded up a lot of the
book so you don’t need to think how to code
up what goes with the brilliant prose.
www.stanford.edu/class/hrp223/2003/Fleiss/
Using the Code
McNemar’s Test
• If you have pairs of matched data points
(husband and wife saying yes/no, right eye
vs. left eye vision good yes/no) you will
want to measure the association
considering that the pairs of data points
are related.
• You can do McNemar’s test, which compares the
two kinds of disagreeing pairs (yes/no vs.
no/yes), to see if the paired responses differ.
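A minimal Python sketch of the usual large-sample McNemar statistic, which depends only on the discordant pairs. The counts are hypothetical, no continuity correction is applied, and the p-value uses the one-degree-of-freedom chi-square identity; this is an illustration, not the SAS output.

```python
from math import erfc, sqrt

def mcnemar_test(b, c):
    """b = pairs answering yes/no, c = pairs answering no/yes.
    Concordant pairs (yes/yes, no/no) do not enter the statistic."""
    chi2 = (b - c) ** 2 / (b + c)
    p_value = erfc(sqrt(chi2 / 2))  # upper tail, chi-square with 1 df
    return chi2, p_value

# Hypothetical discordant counts: 15 husband-yes/wife-no couples,
# 5 husband-no/wife-yes couples.
mcnemar_chi2, mcnemar_p = mcnemar_test(15, 5)
print(f"chi2 = {mcnemar_chi2:.2f}, p = {mcnemar_p:.3f}")
```

With few discordant pairs an exact binomial version of the test is generally preferred over this approximation.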
Agreement
• If you want to look at the degree of
agreement between two raters you need a
statistic that considers how frequently the
people would agree by chance alone. You
use the same SAS or EG code.
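The slide does not name the statistic; a common chance-corrected choice is Cohen's kappa, which is assumed here. The Python sketch below uses hypothetical ratings from two raters, and the function name is made up for illustration.

```python
def cohens_kappa(table):
    """Kappa for a square agreement table (rows = rater 1, cols = rater 2)."""
    n = sum(sum(row) for row in table)
    observed = sum(table[i][i] for i in range(len(table))) / n
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    # Agreement expected by chance alone, from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n**2
    return (observed - expected) / (1 - expected)

# Hypothetical ratings of 100 subjects by two raters.
ratings = [[40, 10],   # rater 1 yes: rater 2 yes, rater 2 no
           [5, 45]]    # rater 1 no:  rater 2 yes, rater 2 no
kappa = cohens_kappa(ratings)
print(round(kappa, 2))
```

The raters agree on 85% of subjects, but half of that agreement would be expected by chance, so kappa is well below 0.85.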
