Randomization

To Add


– Make it clearer that excluding a variable from a model because it is not “predictive” removes all meaning from a CI, since a CI is defined over infinite repetitions
– Talk about statistics being based on sampling variability: this assumes the sample is a random sample of some superpopulation (even if narrowly defined), but subjects are not a random sample, they self-select, so we couldn’t have infinite samples
Random Error I: p-values, confidence intervals, hypothesis testing, etc.
Matthew Fox
Advanced Epidemiology
Do you like/use p-values?
What is a relative risk?
What is a p-value?
Table 1 of a randomized trial of asbestos and LC
Factor        Asbestos   No Asbestos   p-value
Female        10%        25%           0.032
Smoking       60%        40%           0.351
>60 yrs       5%         7%            0.832
HBP           25%        24%           0.765
Alcohol use   37%        45%           0.152
Which result is more precise?
RR 2.0 (95% CI: 1.0 – 4.0)
RR 5.0 (95% CI: 2.5 – 10.0)
RR 2.0 (95% CI: 1.0 – 4.0)
What are the chances the true result is between 1.0 and 4.0?
In a randomized trial, could the finding be by chance?
If yes, what does it mean to be “by chance?” What is it that is caused by chance?
This Morning
Randomization
– Why do we do it?
P-values
– What are they?
– How do we calculate them?
– What do they mean?
Confidence Intervals
Last Session
Selection bias
– Results from selection into or out of the study related to both exposure and outcome
– Structural: conditioning on common effects
– Adjustment for selection proportions
– Weighting for LTFU
Matching
– In a case control study, creates selection bias by design, must be controlled in analysis
“There’s a certain feeling of ease and pleasure for
me as a scientist that any way you slice the data, it’s
statistically significant,” said Dr. Anthony S. Fauci,
a top AIDS expert in the United States government,
which paid most of the trial’s costs.
Randomization
Randomization lends meaning to likelihoods, p-values and confidence intervals
It can reduce the probability of severe confounding to an acceptable level
But randomization does not prevent confounding

Greenland: Randomization, statistics and causal inference
Objective:
– Clarify the meaning and limitations of inferential statistics in the absence of randomization
Example: lidocaine therapy after acute MI
– Patient 1: doomed
– Patient 2: immune
– lidocaine therapy assigned at random
– two results are equally likely
Greenland: Randomization, statistics and causal inference
            Result 1     Result 2
lidocaine   Patient 1    Patient 2
placebo     Patient 2    Patient 1
RD          1 - 0 = 1    0 - 1 = -1
True RD = 0, so both possible results are confounded
Expectation = 0 = (1 + -1)/2
– Statistically unbiased (expectation equals truth)
Conclusions
– Randomization does not prevent confounding
– Randomization does provide a known probability distribution for the possible results under a specified hypothesis about the effect
– Statistical unbiasedness of randomized exposure corresponds to an average confounding of zero over the distribution of results
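The two-patient lidocaine example can be enumerated directly. The sketch below is only an illustration: the patient labels and variable names are assumptions, and it encodes the null case from the slide (one doomed patient, one immune patient, two equally likely assignments).

```python
from fractions import Fraction

# Outcome of each patient regardless of treatment (the null is true):
# patient 1 is "doomed" (dies either way), patient 2 is "immune" (survives either way).
outcome = {"patient 1": 1, "patient 2": 0}

# The two equally likely randomizations: (gets lidocaine, gets placebo).
assignments = [("patient 1", "patient 2"), ("patient 2", "patient 1")]

# Observed risk difference for each assignment: risk(lidocaine) - risk(placebo).
observed_rd = [outcome[treated] - outcome[control] for treated, control in assignments]

# Expectation over the randomization distribution (each result has probability 1/2).
expected_rd = sum(Fraction(rd, len(observed_rd)) for rd in observed_rd)

print(observed_rd)  # [1, -1]
print(expected_rd)  # 0
```

Each single result is confounded (1 or -1 when the true RD is 0), yet the average over the two possible randomizations equals the truth.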
Probability Theory
With an assigned probability distribution, can calculate expectation
The expectation does not have to be in the set of possible outcomes
– Here, the expectation equals zero
Probability
If we randomize and assume the null is true (as we do when calculating p-values)
– We expect half of the subjects to be exposed and half the events to be among the exposed
If truly no effect of exposure, all data combinations, permutations are possible
– Everyone was either type 1 or 4 (doomed or immune)
– All the events (deaths) would occur regardless of whether assigned the exposure or not
Probability Theory
The probability of each possible data result in a 2x2 table is:
– A function of the number of combinations (permutations implies order matters)
– Probability of each event is number of ways to assign X subjects to exposure out of Y and A events out of a total of B total events
– Assumes the margins are fixed
Fixed margins, how many parameters (cells) do I need to estimate to fill in the entire table?
        D+    D-    Total
E+      ?     ?     500
E-      ?     ?     500
Total   100   900   1000
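With all four margins fixed, a single cell determines the other three. A minimal sketch of that arithmetic follows; the function name `fill_table` and its argument names are hypothetical, and the value 30 for the exposed cases is simply the count used in the later slides.

```python
def fill_table(exposed_cases, exposed_total, unexposed_total, cases_total):
    """Fill a 2x2 table from one cell (exposed cases) and the fixed margins."""
    exposed_noncases = exposed_total - exposed_cases
    unexposed_cases = cases_total - exposed_cases
    unexposed_noncases = unexposed_total - unexposed_cases
    cells = (exposed_cases, exposed_noncases, unexposed_cases, unexposed_noncases)
    if min(cells) < 0:
        raise ValueError("this cell count is not possible with these margins")
    # Rows are exposure groups; each tuple is (D+, D-).
    return {"E+": (exposed_cases, exposed_noncases),
            "E-": (unexposed_cases, unexposed_noncases)}

table = fill_table(exposed_cases=30, exposed_total=500,
                   unexposed_total=500, cases_total=100)
print(table)  # {'E+': (30, 470), 'E-': (70, 430)}
```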
Greenland: Randomization, statistics and causal inference
What comfort does this provide scientists trying to interpret a single result?
Can make probability of severe confounding small by increasing the sample size
        E+     E-     Total
D+      30     70     100
D-      470    430    900
Total   500    500    1000
Risk    0.06   0.14
RR      0.43
RD      -0.08
Greenland: Randomization, statistics and causal inference
Given there were 100 cases and an even distribution of exposed and unexposed, how many cases would we expect to be exposed?
        E+     E-     Total
D+      30     70     100
D-      470    430    900
Total   500    500    1000
Risk    0.06   0.14
RR      0.43
RD      -0.08
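The question on this slide, and the risk, RR and RD rows of the table, reduce to a few lines of arithmetic. This is a sketch with illustrative variable names, using only the counts shown in the table.

```python
exposed_cases, unexposed_cases = 30, 70
exposed_total, unexposed_total = 500, 500
total_cases = exposed_cases + unexposed_cases
total = exposed_total + unexposed_total

# Under the null with fixed margins, cases split in proportion to group size.
expected_exposed_cases = total_cases * exposed_total / total

risk_exposed = exposed_cases / exposed_total
risk_unexposed = unexposed_cases / unexposed_total
rr = risk_exposed / risk_unexposed
rd = risk_exposed - risk_unexposed

print(expected_exposed_cases)      # 50.0
print(round(rr, 2), round(rd, 2))  # 0.43 -0.08
```

The trial observed 30 exposed cases where 50 would be expected under the null.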
Greenland: Randomization, statistics and causal inference
What comfort does this provide scientists trying to interpret a single result?
Can make probability of severe confounding small by increasing the sample size
Probability under the null that randomization would yield a result with at least as much downward confounding as the observed result:
$$P(X \le 30 \mid 100 \text{ deaths in } 1000 \text{ cases}) = \sum_{i=0}^{30} \frac{\binom{100}{i}\binom{900}{500-i}}{\binom{1000}{500}} < 0.0001$$
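The lower-tail probability for the 30/70 table can be checked with exact integer arithmetic. The sketch below assumes the fixed-margin setup from the slides (100 cases, 500 exposed, 1000 subjects); the function name is hypothetical.

```python
from math import comb

def lower_tail(x_obs, total_cases, n_exposed, n_total):
    """P(X <= x_obs) for the number of exposed cases X when all margins are fixed."""
    non_cases = n_total - total_cases
    # Count the assignments that put i cases among the exposed, for i = 0..x_obs.
    favourable = sum(comb(total_cases, i) * comb(non_cases, n_exposed - i)
                     for i in range(x_obs + 1))
    return favourable / comb(n_total, n_exposed)

p_lower = lower_tail(x_obs=30, total_cases=100, n_exposed=500, n_total=1000)
print(p_lower < 0.0001)  # True
```

Python integers are unbounded, so the large binomial coefficients are exact and only the final division is rounded.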


Back to the counterfactual
If the association we measure differs from the truth, even if by chance, what explains it?
This is confounding
– Unexposed can’t stand in for what would have happened to exposed had they been unexposed
– But on average, zero confounding
This gives us a probability distribution to calculate the probability of confounding explaining the results
– This is a p-value
Randomized trial of E on D in 4 patients
We find:
      D+   D-   Total
E+    2    0    2
E-    0    2    2
Randomized trial of E on D in 4 patients
If the null is true, what CST types must they be?
      D+   D-   Total
E+    2    0    2
E-    0    2    2
Hypergeometric distribution
The hypergeometric distribution:
$$P(X = x) = \frac{\binom{M}{x}\binom{N-M}{n-x}}{\binom{N}{n}}$$
Where X = random variable, x = exposed cases, n = exposed population, M = total cases, and N = total population.
$$\binom{M}{x} = \frac{M!}{x!\,(M-x)!}$$
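For the 4-patient trial, the hypergeometric probabilities can be confirmed by listing every possible randomization. This is a sketch under the null: the patient labels A to D are hypothetical, and the two patients who develop D are assumed to do so whichever arm they are assigned to.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations
from math import comb

patients = ["A", "B", "C", "D"]   # hypothetical labels
cases = {"A", "B"}                # under the null, these two develop D in either arm
n_exposed = 2

# Enumerate every way to choose which 2 of the 4 patients are exposed.
counts = Counter(len(cases & set(exposed))
                 for exposed in combinations(patients, n_exposed))
n_assignments = sum(counts.values())
enumerated = {x: Fraction(k, n_assignments) for x, k in counts.items()}

def hypergeom_pmf(x, M, n, N):
    """P(X = x): x exposed cases, M total cases, n exposed, N total population."""
    return Fraction(comb(M, x) * comb(N - M, n - x), comb(N, n))

formula = {x: hypergeom_pmf(x, M=2, n=2, N=4) for x in range(3)}

print(enumerated[2])          # 1/6
print(enumerated == formula)  # True
```

The observed table (both cases exposed) is one of the six equally likely assignments.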
Spreadsheet
Greenland: Randomization, statistics and causal inference
            Result 1     Result 2
lidocaine   Patient 1    Patient 2
placebo     Patient 2    Patient 1
RD          1 - 0 = 1    0 - 1 = -1
When treatment is assigned by the physician,
– Expectation depends on physician behavior
– Expectation does not necessarily equal truth
For observational data we DON’T have a probability distribution for confounding
– When E isn’t randomized, statistics don’t provide valid probability statements about exposure effects because p-values, CIs, & likelihoods are calculated with the assumption that all data interchanges are equally likely
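To see how the expectation depends on physician behavior, the two-patient example can be rerun with an arbitrary probability that the doomed patient receives lidocaine. The 9/10 figure below is a made-up illustration, not a value from the source.

```python
from fractions import Fraction

def expected_rd(p_doomed_treated):
    """Expected observed RD when the doomed patient gets lidocaine with this probability."""
    rd_result_1 = 1 - 0   # doomed patient on lidocaine, immune patient on placebo
    rd_result_2 = 0 - 1   # immune patient on lidocaine, doomed patient on placebo
    return p_doomed_treated * rd_result_1 + (1 - p_doomed_treated) * rd_result_2

print(expected_rd(Fraction(1, 2)))   # 0   (randomization: expectation equals the true RD)
print(expected_rd(Fraction(9, 10)))  # 4/5 (hypothetical physician who usually treats the sicker patient)
```

Without randomization the assignment probability is unknown, so the expectation is unknown too.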
Greenland: Randomization, statistics and causal inference
Alternatives
– Limit statistics to data description (e.g., visual summaries, tables of risks or rates, etc.)
– Influence analysis: explore degree to which effect estimates would change under small perturbations of the data, such as interchanging a few subjects
– Employ more elaborate statistical models
– Sensitivity analysis
– At the very least, interpret conventional statistics as minimum estimates of the error
(1) The p-value is:
Probability under the test hypothesis (usually the null) that a test statistic would be ≥ its observed value, assuming no bias in data collection or analysis
– Why the null? Our job is to measure
– 1-sided upper p-value is test stat ≥ observed value
– 1-sided lower p-value is test stat ≤ observed value
– Mid-p assigns only half probability of the observation to the 1-sided upper p-value
– 2-sided p-value is twice the smaller of the 1-sided p-values
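The four definitions can be written out for the fixed-margin (hypergeometric) case. A minimal sketch, assuming the test statistic is the number of exposed cases; the function names and dictionary keys are hypothetical.

```python
from fractions import Fraction
from math import comb

def hypergeom_pmf(x, M, n, N):
    """Exact P(X = x) with fixed margins: M cases, n exposed, N subjects."""
    return Fraction(comb(M, x) * comb(N - M, n - x), comb(N, n))

def p_values(x_obs, M, n, N):
    lowest, highest = max(0, n - (N - M)), min(M, n)   # possible values of X
    pmf = {x: hypergeom_pmf(x, M, n, N) for x in range(lowest, highest + 1)}
    upper = sum(p for x, p in pmf.items() if x >= x_obs)   # 1-sided upper
    lower = sum(p for x, p in pmf.items() if x <= x_obs)   # 1-sided lower
    mid_p_upper = upper - pmf[x_obs] / 2                   # half weight on the observation
    two_sided = 2 * min(upper, lower)                      # twice the smaller 1-sided value
    return {"upper": upper, "lower": lower,
            "mid_p_upper": mid_p_upper, "two_sided": two_sided}

# 4-patient trial: 2 cases, 2 exposed, both cases observed among the exposed.
result = p_values(x_obs=2, M=2, n=2, N=4)
print(result["upper"], result["lower"], result["mid_p_upper"], result["two_sided"])
# 1/6 1 1/12 1/3
```

Calling `p_values(x_obs=1, M=2, n=2, N=4)` gives a doubled value of 5/3, showing that "twice the smaller 1-sided p-value" can exceed 1 and so is not itself a probability.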
(2) The p-value is not:
Probability that a test hypothesis (null hypothesis) is true
– Calculated assuming that test hypothesis is true.
– Cannot calculate probability of an event that is assumed in the calculation
Probability of observing the result under the test hypothesis (null) [likelihood]
– Also includes probability of results more extreme
(3) The p-value is not:
An α-level (the Type 1 error rate)
– More on that later
A significance level
– Used to refer to both p-values and Type 1 error rates
– Should be avoided to prevent confusion
(4) The 2-sided p-value is not:
Because the 2-sided p-value is twice the smaller of the lower and upper 1-sided p-values, which may not be the same, and may be > 1, it is not the:
– Probability that the data would show as strong an association as observed or stronger if the null hypothesis were true;
– Probability that a point estimate would be as far or further from the test value as observed
Significance testing:
Compares p-value to an arbitrary or conventional Type 1 error rate
– α = 0.05
Emphasizes decision making, not measurement
– Derives from agricultural and industrial applications of statistics
– Reflects the roots of epidemiology as the union of statistics and medicine
JNCI announces materials to “help
journalists get it right”
Response


They acknowledge the definitions were incorrect, however:
“We were not convinced that working journalists would find these definitions user-friendly, so we sacrificed precision for utility. We will add references to standard textbooks for journalists who want to learn more.”
Frequentist Statistics
Alternatives to p-values
Two studies: which is more precise?
– RR 10.0, p = 0.039
– RR 1.3, p = 0.062
The p-value conflates the size of the effect and its precision
– RR 10.0, p = 0.039, 95% CI: 1.5-66.7
– RR 1.3, p = 0.062, 95% CI: 0.99-1.7
Frequentist intervals (1)
Definition:
– If the statistical model is correct and no bias, a confidence interval derived from a valid test will, over unlimited repetitions of the study, contain the true parameter with a frequency no less than its confidence level (e.g. 95%).
But the statistical model is only correct under randomization
CAN’T say that the probability the interval includes the truth equals the interval’s coverage probability (e.g., 95%).
Confidence Interval Simulation
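One way to make "over unlimited repetitions" concrete is to repeat a hypothetical two-arm study many times and count how often the interval contains the true value. The sketch below is not the simulation used in the lecture. It assumes independent binomial outcomes in each arm, a Wald interval for the risk difference with the usual binomial standard error (neither is given in the slides), and made-up true risks.

```python
import math
import random

def wald_rd_interval(cases1, n1, cases0, n0, z=1.96):
    """Wald interval for a risk difference (binomial standard error assumed)."""
    r1, r0 = cases1 / n1, cases0 / n0
    se = math.sqrt(r1 * (1 - r1) / n1 + r0 * (1 - r0) / n0)
    rd = r1 - r0
    return rd - z * se, rd + z * se

def coverage(true_r1, true_r0, n_per_arm=500, repetitions=2000, seed=1):
    """Fraction of repeated studies whose 95% interval contains the true RD."""
    rng = random.Random(seed)
    true_rd = true_r1 - true_r0
    hits = 0
    for _ in range(repetitions):
        # Draw each arm's case count as a sum of independent Bernoulli outcomes.
        cases1 = sum(rng.random() < true_r1 for _ in range(n_per_arm))
        cases0 = sum(rng.random() < true_r0 for _ in range(n_per_arm))
        low, high = wald_rd_interval(cases1, n_per_arm, cases0, n_per_arm)
        hits += low <= true_rd <= high
    return hits / repetitions

estimated_coverage = coverage(true_r1=0.14, true_r0=0.14)
print(estimated_coverage)  # close to 0.95 when the assumed model holds
```

The 95% describes the long-run behavior of the procedure under the assumed model; any single interval either contains the true value or does not.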
Frequentist intervals (2)
Advantages
– Provides more information than significance tests or p-values: direction, magnitude, and variability
– Economical compared with p-value function
Disadvantages
– Less information than the p-value function
– Underlying assumptions (valid statistical model, no bias, repeated experiments)
Approximations: Test-based
$$\chi = \frac{\widehat{RD}}{SE(\widehat{RD})} \qquad \chi = \frac{\ln(\widehat{RR})}{SE[\ln(\widehat{RR})]}$$
$$CI = \widehat{RR}^{\,(1 \pm z/\chi)}$$
$$CI = \widehat{RD}\,(1 \pm z/\chi)$$
Approximations: Wald
$$CI = e^{\ln(\widehat{RR}) \pm z \cdot SE[\ln(\widehat{RR})]}$$
$$CI = \widehat{RD} \pm z \cdot SE(\widehat{RD})$$
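Standard Errors (basic over i strata):
$$\mathrm{var}[\ln(\widehat{OR})] = \frac{1}{a_i} + \frac{1}{b_i} + \frac{1}{c_i} + \frac{1}{d_i}$$
$$\mathrm{var}[\ln(\widehat{RR})] = \frac{c_i}{a_i N_{1i}} + \frac{d_i}{b_i N_{0i}}$$
$$\mathrm{var}[\ln(\widehat{IRR})] = \frac{a_i + b_i}{a_i b_i}$$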
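As a worked instance, the Wald limits for the RR can be computed for the 30/70 table from the earlier slides. The sketch assumes a and b are the exposed and unexposed cases, c and d the exposed and unexposed non-cases, N1 and N0 the exposed and unexposed totals, a single stratum, and z = 1.96 for a 95% interval; that cell notation is inferred from the variance formula rather than stated in the source.

```python
import math

def wald_rr_interval(a, b, n1, n0, z=1.96):
    """Wald CI for a risk ratio.

    a, b: exposed and unexposed cases; n1, n0: exposed and unexposed totals.
    """
    c, d = n1 - a, n0 - b                       # exposed and unexposed non-cases
    rr = (a / n1) / (b / n0)
    var_ln_rr = c / (a * n1) + d / (b * n0)     # variance of ln(RR)
    half_width = z * math.sqrt(var_ln_rr)
    lower = math.exp(math.log(rr) - half_width)
    upper = math.exp(math.log(rr) + half_width)
    return rr, lower, upper

rr, lower, upper = wald_rr_interval(a=30, b=70, n1=500, n0=500)
print(round(rr, 2), round(lower, 2), round(upper, 2))  # 0.43 0.28 0.65
```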

How do we measure precision?
Width of the confidence interval
Measured how?
– If I tell you the 95% CI for an RR is 2 to 8, can you tell me the point estimate?
– Sqrt(U*L)
– Difference measures, just subtract
Remember relative measures are on the log scale, so width of a CI is measured by the RATIO of the upper to the lower CI
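Both rules are one-liners. The sketch below assumes the interval is symmetric on the log scale (true of Wald and test-based limits for ratio measures); the function names are hypothetical.

```python
import math

def point_estimate_from_ratio_ci(lower, upper):
    """Geometric mean of the limits, i.e. the midpoint on the log scale."""
    return math.sqrt(lower * upper)

def ci_ratio(lower, upper):
    """Width of a ratio-measure CI: upper limit divided by lower limit."""
    return upper / lower

print(point_estimate_from_ratio_ci(2, 8))       # 4.0
print(ci_ratio(1.0, 4.0), ci_ratio(2.5, 10.0))  # 4.0 4.0
```

By this measure, RR 2.0 (1.0 to 4.0) and RR 5.0 (2.5 to 10.0) are equally precise even though the second interval spans more units on the arithmetic scale.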
Frequentist Intervals (4): Interpretation
Conclusion about confidence intervals
A CI used for hypothesis testing is an abuse of the CI
– The goal is precision, not significance
The goal of epi is precision, not significance
– A precise null estimate is just as important as a precise significant estimate
– An imprecise, statistically significant estimate is as useless as a non-statistically significant, imprecise estimate
What results are published or highlighted in publications?
Find a publication with multiple results
Rank them in order of precision
Then see what is highlighted in the abstract
Maternal age <35
Parity   OR     LCL95   UCL95   width   rank
0        1
1        1.08   0.86    1.35    1.57    1
2        1.14   0.87    1.49    1.71    2
3        1.65   1.13    2.4     2.12    3
>=4      1.45   0.93    2.26    2.43    4
Maternal age >= 35
Parity   OR     LCL95   UCL95   width   rank
0        1
1        1.11   0.65    1.88    2.89    6
2        1.67   0.99    2.82    2.85    5
3        1.24   0.69    2.23    3.23    8
>=4      2.41   1.41    4.12    2.92    7
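The width and rank columns of the parity table can be reproduced from the confidence limits alone. A minimal sketch: the limits are copied from the table, width is taken as UCL95 divided by LCL95, and the tuple layout is only an illustration.

```python
# (maternal age stratum, parity, LCL95, UCL95), copied from the parity table
estimates = [
    ("<35", "1", 0.86, 1.35), ("<35", "2", 0.87, 1.49),
    ("<35", "3", 1.13, 2.40), ("<35", ">=4", 0.93, 2.26),
    (">=35", "1", 0.65, 1.88), (">=35", "2", 0.99, 2.82),
    (">=35", "3", 0.69, 2.23), (">=35", ">=4", 1.41, 4.12),
]

# Width of a ratio-measure CI is the ratio of the upper to the lower limit.
widths = {(age, parity): ucl / lcl for age, parity, lcl, ucl in estimates}

# Rank 1 is the most precise (narrowest) estimate.
ordered = sorted(widths, key=widths.get)
rank = {key: position + 1 for position, key in enumerate(ordered)}

for age, parity, _, _ in estimates:
    print(age, parity, round(widths[(age, parity)], 2), rank[(age, parity)])
```

The ranks come out as 1, 2, 3, 4 for maternal age <35 and 6, 5, 8, 7 for maternal age >= 35, matching the table.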
CIPRA Trial
Trial of nurse vs. doctor managed HIV care
For primary results, co-investigators wanted p-values and confidence intervals
Didn’t want hypothesis testing even though was aware people would do it anyway
I fought initially, and lost the debate
Put in both
CIPRA Trial

Reviewer comment:
Table 3: Column with p-values can be dropped
given that 95% confidence intervals are
presented;
perhaps mark significance as * (e.g. for
p<0.025) and ** (e.g. for p<0.005) after the
95% CI's.
Summary
Randomization gives meaning to statistics
– Gives a probability distribution for confounding
When randomization doesn’t hold, we have no probability distribution
P-values aren’t the probability of chance, the null, etc.
CIs allow us to assess precision
– But are based on infinite repetitions
– Do not contain the true value with 95% probability