Why you don’t do multiple t tests. Or any other test


Why you don’t do multiple t tests. Or any other test, unless you have your eyes open…
Take random data and assemble it into 2 piles, then test H0: no difference
between them. Using p = 0.05, you know that you will reject this H0 1 time
in 20, even though it is true. That is what p = 0.05 means.
Now assemble the data into 3 piles, then test H0: no difference between
each pair: P1-P2, P1-P3, P2-P3.
1 time in 20, P1-P2 is * (flagged as significant)
1 time in 20, P1-P3 is *
1 time in 20, P2-P3 is *
Now we ask what the probability is that we will end up accepting H0.
This involves accepting H0 in test 1 (P1-P2), AND in P1-P3, AND in
P2-P3. In each case the probability of accepting H0 is 0.95 (= 1-p), but,
treating the three tests as independent, the probability of accepting all 3
together is 0.95*0.95*0.95 = 0.857375 (nearly, but not quite, 1-3*p = 0.85).
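The arithmetic above can be checked with a few lines of Python. This is only a sketch: the function name `prob_accept_all` is made up for illustration, and it assumes, as the slide does, that the individual tests are independent.

```python
def prob_accept_all(p: float, n_tests: int) -> float:
    """Chance that every one of n_tests independent tests accepts a true H0."""
    return (1.0 - p) ** n_tests


p = 0.05
for n_tests in (1, 3, 10):
    exact = prob_accept_all(p, n_tests)
    shortcut = 1.0 - n_tests * p  # the "1 - N*p" approximation
    print(f"{n_tests} tests: exact {exact:.6f}, shortcut {shortcut:.2f}")
```

For 3 tests the exact value is 0.857375 against a shortcut of 0.85, and the gap between the two widens as the number of tests grows.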
But if p(accepting H0) = 0.86, then p(rejecting H0) = 0.14. So in random
data you will reject H0 1 time in 7, not 1 in 20. So if you claim in your
write-up that you used p=0.05 you are lying, albeit probably unwittingly.
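The claim about random data can also be tried out by simulation. The sketch below rests on some assumptions that are not in the slides: it draws standard normal data, uses 20 values per pile, and swaps the t test for a two-sided z test with known unit variance so that it runs on the Python standard library alone. The names `pairwise_rejections` and `simulate` are hypothetical.

```python
import random
from itertools import combinations
from statistics import NormalDist, fmean


def pairwise_rejections(piles, alpha=0.05):
    """Return the index pairs whose two-sided z-test p-value is below alpha."""
    flagged = []
    for (i, a), (j, b) in combinations(enumerate(piles), 2):
        # Assumed known unit variance: SD of the difference of the two means.
        sd_diff = (1 / len(a) + 1 / len(b)) ** 0.5
        z = (fmean(a) - fmean(b)) / sd_diff
        p_value = 2 * (1 - NormalDist().cdf(abs(z)))
        if p_value < alpha:
            flagged.append((i, j))
    return flagged


def simulate(trials=5000, pile_size=20, n_piles=3, seed=1):
    """Estimate how often P1-P2 alone, and any pair at all, is flagged."""
    rng = random.Random(seed)
    first_pair_hits = 0
    any_pair_hits = 0
    for _ in range(trials):
        piles = [[rng.gauss(0, 1) for _ in range(pile_size)]
                 for _ in range(n_piles)]
        flagged = pairwise_rejections(piles)
        first_pair_hits += (0, 1) in flagged
        any_pair_hits += bool(flagged)
    return first_pair_hits / trials, any_pair_hits / trials


single_rate, family_rate = simulate()
print(f"P1-P2 flagged: {single_rate:.3f}")
print(f"at least one pair flagged: {family_rate:.3f}")
```

The single-pair rate should sit near 0.05, while the rate for "at least one pair" comes out well above it. Because the three comparisons share piles they are not fully independent, so the simulated figure can land a little below the 0.14 given by the product rule.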
It is OK to do this PROVIDING you know what you are doing, and you
apply a more stringent criterion to each individual test. If you are doing
N different tests on subsets of the same data, each one should run at a
significance level of
P = 1 - (1 - α)^(1/N), i.e. 1 minus the Nth root of (1 - α),
where α is the final (overall) significance level.
For example, with 3 tests and α = 0.05, the adjusted p = 1 - 0.95^(1/3) = 0.017.
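The adjustment can be written as a small helper. This is a minimal sketch of the formula above, which is commonly known as the Šidák correction; the function names `per_test_level` and `overall_level` are made up for illustration, and independence of the N tests is assumed.

```python
def per_test_level(alpha: float, n_tests: int) -> float:
    """Significance level for each single test so the overall level is alpha."""
    return 1.0 - (1.0 - alpha) ** (1.0 / n_tests)


def overall_level(per_test: float, n_tests: int) -> float:
    """Chance of at least one false rejection across n_tests independent tests."""
    return 1.0 - (1.0 - per_test) ** n_tests


adjusted = per_test_level(0.05, 3)
print(f"per-test level: {adjusted:.4f}")
print(f"overall level recovered: {overall_level(adjusted, 3):.4f}")
```

Feeding the adjusted level back through `overall_level` returns the intended 0.05, which confirms that the two formulas are inverses of each other.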