Biostatistics, statistical software V.
Statistical errors, one- and two-sided tests.
One-way and multifactor analysis of variance.
Krisztina Boda PhD
Department of Medical Informatics,
University of Szeged
One- and two-tailed (sided) tests
Two-tailed test
H0: there is no change
Ha: there is a change (in either direction)
One-tailed test
H0: the change is negative or zero
Ha: the change is positive
p-values: p(one-tailed) = p(two-tailed)/2, provided the observed change is in the direction stated by Ha
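The relation between the two p-values can be checked numerically. The sketch below is only an illustration: it assumes a test statistic that is standard normal under H0 (a z test), and the value 1.8 is a made-up statistic, not data from the lecture.

```python
from statistics import NormalDist

def z_test_p_values(z):
    """Return (two-tailed, upper one-tailed) p-values for a z statistic."""
    std_normal = NormalDist()                       # standard normal: mean 0, sd 1
    p_one = 1.0 - std_normal.cdf(z)                 # Ha: the change is positive
    p_two = 2.0 * (1.0 - std_normal.cdf(abs(z)))    # Ha: change in either direction
    return p_two, p_one

p_two, p_one = z_test_p_values(1.8)                 # hypothetical test statistic
print(f"two-tailed p = {p_two:.4f}, one-tailed p = {p_one:.4f}")
```

With this hypothetical statistic the one-tailed p-value falls below 0.05 while the two-tailed one does not, which is why the direction of Ha has to be fixed before looking at the data.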
Krisztina Boda
INTERREG
Significance
Significant difference: if we claim that there is a difference (effect), the probability of being mistaken is small (at most α, the probability of a Type I error).
Not significant difference: we say that there is not enough information to show a difference. Perhaps
there is no difference
there is a difference, but the sample size is small
the dispersion is large
the method was wrong
Even in the case of a statistically significant difference, one has to think about its biological meaning.
Statistical errors
Truth         Decision: do not reject H0            Decision: reject H0 (significance)
H0 is true    correct                               Type I error (its probability: α)
Ha is true    Type II error (its probability: β)    correct
Error probabilities
The probability of a Type I error is known (α).
The probability of a Type II error (β) is not known.
It depends on
the significance level (α),
the sample size,
the standard deviation(s),
the true difference between the populations,
others (type of test, assumptions, design, ...)
The power of a test: 1-β
the ability to detect a real effect; the probability of obtaining a significant p-value when the effect exists
The power of a test in the case of fixed sample size and α, with two alternative hypotheses
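As a rough numerical counterpart, the sketch below computes the power for two alternatives while the sample size and α stay fixed. It assumes a one-sided, one-sample z test with known standard deviation; the function name and all numbers are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist

def z_test_power(true_diff, sd, n, alpha=0.05):
    """Power of a one-sided, one-sample z test of H0: mean <= 0 (known sd)."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1.0 - alpha)       # rejection threshold under H0
    shift = true_diff * sqrt(n) / sd               # mean of the z statistic under Ha
    return 1.0 - std_normal.cdf(z_crit - shift)    # P(reject H0 | Ha true) = 1 - beta

# Hypothetical numbers: same n and alpha, two different true differences.
for true_diff in (0.5, 1.0):
    power = z_test_power(true_diff, sd=2.0, n=25)
    print(f"true difference {true_diff}: power = {power:.3f}, beta = {1 - power:.3f}")
```

The alternative that lies further from H0 gives the larger power; when the true difference is zero, the "power" reduces to α.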
ANOVA
Analysis of Variance
Comparison of the means of several (>2) normally distributed samples
Types:
One-way:
Control, treatment I, treatment II.
Two-way (treatment + sex)
Any "way" (factor) can be
"independent" ("between-subjects"): sex, treatments
"repeated measures" ("within-subjects"): data measured on the same patient
Why not t-tests (pairwise)?
Even when there is no real difference, about every 20th test gives a significant result purely by chance.
CSOP (group)        R1        R2        R3        R4        R5        R6        R7        R8
1.00              -.84      1.73      2.36      -.30      -.31      -.31      -.56      1.58
1.00               .59       .44       .60      -.75      -.28     -1.51      -.81      -.12
1.00               .19      -.73     -1.04      1.27       .69      -.21      -.52     -1.34
1.00             -1.05       .88      1.27      1.05      -.87       .68      -.17      -.15
1.00               .12      -.75      -.05     -1.13      2.21       .74      -.90      -.45
1.00              1.10      -.20      -.78      1.02       .67       .18      -.52      -.34
1.00              -.19      -.57      -.41      2.25     -1.26      -.27       .44     -2.52
1.00               .45      1.20      2.77      -.17      -.68       .60       .54      -.37
1.00              -.58      -.01       .60      1.66      2.14      2.31      -.90     -1.75
1.00              -.39       .93      -.51       .31      -.60      -.21       .55       .57
1.00              -.23     -1.21     -1.08       .02       .31     -1.28      1.20      1.62
1.00               .87       .97     -1.04       .60      -.29       .86      1.09      -.68
2.00               .42     -1.18      -.64      -.08      1.10       .39      -.66      2.12
2.00              1.26     -2.13     -1.78      -.60     -1.25     -1.10       .19     -1.54
2.00              -.60      -.83      -.94      1.61       .95      1.37       .10      -.97
2.00             -1.75       .63       .16       .24      -.25      1.49       .42     -2.01
2.00               .07      -.33      -.56       .36       .12      -.48       .78     -1.29
2.00               .15       .85       .10     -2.07       .18      2.14      1.71       .62
2.00               .98     -1.20      -.46      -.92       .08     -1.37       .80      -.67
2.00              -.42      1.05      -.29       .73       .10      1.42       .79      1.67
2.00              2.00       .06      2.24      -.31      -.13      -.01       .04      -.45
2.00             -1.85     -1.83      3.35      1.83      -.12      -.30     -1.68       .57
2.00              1.06      -.55      -.36      -.80     -1.41     -1.49       .89       .82
2.00              -.57     -2.15      2.15      -.99     -1.63       .00      -.41      1.42
t-pr.         0.882846  0.053926   0.96894  0.205339  0.418212  0.928912  0.391001  0.508963
sign.
4 Type I error
The increase of type I error
It can be shown that when t tests are used to test for differences between multiple groups, the chance of mistakenly declaring significance (Type I error) increases. For example, in the case of 5 groups, if no differences exist between any of the groups, using two-sample t tests pairwise, we would have about a 30% chance of declaring at least one difference significant, instead of a 5% chance.
In general, the t test can be used to test the hypothesis that two group means are not different. To test the hypothesis that three or more group means are not different, analysis of variance should be used.
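The inflation of the Type I error can be illustrated by simulation. The sketch below is a minimal illustration under stated assumptions: all groups are drawn from the same N(0, 1) population (so every significant result is a false positive), each group has 12 observations (a hypothetical choice), and each pair is compared with an equal-variance two-sample t test using the two-sided 5% critical value of Student's t with 22 degrees of freedom, 2.074.

```python
import random
from itertools import combinations
from math import sqrt

T_CRIT_DF22 = 2.074  # two-sided 5% critical value of Student's t, 22 df (n = 12 per group)

def mean_and_variance(sample):
    """Sample mean and unbiased sample variance."""
    n = len(sample)
    m = sum(sample) / n
    return m, sum((x - m) ** 2 for x in sample) / (n - 1)

def any_pair_significant(groups):
    """True if at least one pairwise t test has |t| above the critical value."""
    n = len(groups[0])                               # equal group sizes assumed
    stats = [mean_and_variance(g) for g in groups]
    for (m1, v1), (m2, v2) in combinations(stats, 2):
        t = (m1 - m2) / sqrt((v1 + v2) / n)          # pooled variance (v1+v2)/2 times 2/n
        if abs(t) > T_CRIT_DF22:
            return True
    return False

def familywise_error_rate(n_groups, n_per_group=12, n_sim=4000, seed=1):
    """Share of simulated experiments with >= 1 false positive when H0 is true."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        groups = [[rng.gauss(0, 1) for _ in range(n_per_group)]
                  for _ in range(n_groups)]
        hits += any_pair_significant(groups)
    return hits / n_sim

for k in (2, 5):
    print(f"{k} groups: P(at least one significant pair) ~ {familywise_error_rate(k):.3f}")
```

With two groups the estimated rate should stay close to 5%; with five groups (ten pairwise tests) it should land near the 30% quoted above.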
Each statistical test produces a ‘p’ value
If the significance level is set at 0.05 (false
positive rate) and we do multiple
significance testing on the data from a
single clinical trial,
then the overall false positive rate for the
trial will increase with each significance
test.
False positive rate for each test = 0.05
Probability of incorrectly rejecting ≥ 1 hypothesis out of N tests (assuming the tests are independent)
$= 1 - (1 - 0.05)^N = 1 - (1 - \alpha)^N$
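A short sketch evaluating this formula for a few values of N; the function name is hypothetical and the tests are assumed to be independent.

```python
def familywise_rate(n_tests, alpha=0.05):
    """P(at least one false positive) among n independent tests at level alpha."""
    return 1.0 - (1.0 - alpha) ** n_tests

for n_tests in (1, 2, 5, 10, 20):
    print(f"N = {n_tests:2d}: {familywise_rate(n_tests):.3f}")
```

For N = 10 the overall false positive rate is already about 40%.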
Compound hypotheses
(H01 and H02 and ... H0n) null hypotheses, the significance levels are α1, α2, …, αn
How to choose the αi-s so that the level of the compound hypothesis (H01 and H02 and ... H0n) is no greater than α?
α ∈ (0,1)
Bonferroni correction
α is divided by the number of comparisons. (H01 and H02 and ... H0n) is rejected if at least one $p_i < \alpha/n$.
In the case of many comparisons, this is too conservative (it may fail to show real differences).
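A minimal sketch of the Bonferroni rule; the function name and the raw p-values are hypothetical.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Flag each hypothesis whose p-value is below alpha / (number of tests)."""
    threshold = alpha / len(p_values)
    return [p < threshold for p in p_values]

p_values = [0.300, 0.012, 0.004, 0.700, 0.028]   # hypothetical raw p-values
flags = bonferroni_reject(p_values)
print(flags)
print("compound null hypothesis rejected:", any(flags))
```

With five tests the threshold is 0.01, so only the p-value 0.004 is declared significant here, although three of the raw p-values are below 0.05.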
Holm-modification
(SAS: step-down Bonferroni)
The pi-s are sorted: $p_1 \le p_2 \le \dots \le p_n$
H0i is tested at level $\alpha/(n - i + 1)$
If any of them is significant, then reject (H01 and H02 and ... H0n).
E.g. n=5
p1    α/5 = 0.01      if p1 is not smaller, then finish
p2    α/4 = 0.0125    if p2 is not smaller, then finish
p3    α/3 = 0.0167    if p3 is not smaller, then finish
p4    α/2 = 0.025     ….
p5    α/1 = 0.05
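The step-down procedure can be sketched as follows. This is an illustrative implementation, not SAS code; the function name and the raw p-values are hypothetical.

```python
def holm_reject(p_values, alpha=0.05):
    """Step-down Holm procedure; returns one reject/keep flag per input p-value."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])   # indices, smallest p first
    reject = [False] * n
    for step, idx in enumerate(order):                    # step = 0, 1, ..., n-1
        if p_values[idx] < alpha / (n - step):            # levels alpha/n, alpha/(n-1), ...
            reject[idx] = True
        else:
            break                                         # first failure: stop testing
    return reject

p_values = [0.300, 0.012, 0.004, 0.700, 0.028]            # hypothetical raw p-values
print(holm_reject(p_values))
```

Here 0.004 passes at level 0.01 and 0.012 passes at level 0.0125, but 0.028 fails at level 0.0167, so testing stops after two rejections.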
FDR (false discovery rate)
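$p_1 \le p_2 \le \dots \le p_n$
Begin with the greatest p-value; it remains the same (it is tested at level α).
The i-th next one is tested at level $\alpha (n - i)/n$ (equivalently, its p-value is multiplied by $n/(n - i)$)
E.g. n=5
p5    α·5/5 = 0.05
p4    α·4/5 = 0.04
p3    α·3/5 = 0.03
p2    α·2/5 = 0.02
p1    α·1/5 = 0.01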
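A minimal sketch of a step-up FDR procedure, assuming the slide refers to the Benjamini-Hochberg method (the i-th smallest p-value is compared with α·i/n, starting from the greatest one). The function name and the raw p-values are hypothetical.

```python
def benjamini_hochberg_reject(p_values, alpha=0.05):
    """Step-up Benjamini-Hochberg procedure; one reject/keep flag per p-value."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])    # indices, smallest p first
    n_reject = 0
    for rank in range(n, 0, -1):                           # start from the greatest p-value
        if p_values[order[rank - 1]] <= alpha * rank / n:  # levels alpha, alpha*(n-1)/n, ...
            n_reject = rank                                # reject this and all smaller p-values
            break
    reject = [False] * n
    for idx in order[:n_reject]:
        reject[idx] = True
    return reject

p_values = [0.300, 0.012, 0.004, 0.700, 0.028]             # hypothetical raw p-values
print(benjamini_hochberg_reject(p_values))
```

With these values the third smallest p-value (0.028) is below its level of 0.03, so it and the two smaller p-values are all declared significant.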
Correction of individual p-values
The SAS System
The Multtest Procedure
p-Values
Test    Raw       Stepdown Bonferroni    Hochberg    False Discovery Rate
1       0.9999    1.0000                 0.9999      0.9999
2       0.2318    0.9272                 0.9272      0.5795
3       0.3771    1.0000                 0.9999      0.6285
4       0.8231    1.0000                 0.9999      0.9999
5       0.0141    0.0705                 0.0705      0.0705
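The adjusted p-values in this output can be recomputed from the raw ones. The sketch below is an independent illustration, not the SAS implementation: it assumes the usual definitions of the Holm (step-down Bonferroni), Hochberg and Benjamini-Hochberg (FDR) adjustments, and the function name is hypothetical.

```python
def adjusted_p_values(raw):
    """Return Holm (step-down Bonferroni), Hochberg and FDR (BH) adjusted p-values."""
    n = len(raw)
    order = sorted(range(n), key=lambda i: raw[i])        # indices, smallest p first
    holm, hochberg, fdr = [0.0] * n, [0.0] * n, [0.0] * n

    running_max = 0.0                                     # step-down: keep values non-decreasing
    for rank, idx in enumerate(order, start=1):
        running_max = max(running_max, min(1.0, (n - rank + 1) * raw[idx]))
        holm[idx] = running_max

    min_hoch = min_fdr = 1.0                              # step-up: start from the greatest p-value
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        min_hoch = min(min_hoch, (n - rank + 1) * raw[idx])
        min_fdr = min(min_fdr, n / rank * raw[idx])
        hochberg[idx], fdr[idx] = min_hoch, min_fdr
    return holm, hochberg, fdr

raw = [0.9999, 0.2318, 0.3771, 0.8231, 0.0141]            # raw p-values from the table
for name, column in zip(("Holm", "Hochberg", "FDR"), adjusted_p_values(raw)):
    print(name, [f"{p:.4f}" for p in column])
```

An adjusted p-value is compared directly with α = 0.05, so none of the five tests remains significant after any of the three corrections.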
One-Way ANOVA
Let us suppose that we have t independent samples (t "treatment" groups) drawn from normal populations with equal variances, N(µi, σ).
Assumptions:
Independent samples
Normality
Equal variances
Null hypothesis: population means are equal, µ1=µ2=...=µt
http://lib.stat.cmu.edu/DASL/Stories/CancerSurvival.html.
Cameron, E. and Pauling, L. (1978) Supplemental ascorbate in the supportive treatment of cancer: re-evaluation of prolongation of survival times in terminal human cancer. Proceedings of the National Academy of Sciences USA, 75, 4538-4542.
[Figure: box plots of SURVIVAL (original) and SQSURV (square root transformed) by GROUP: Stomach (N=13), Bronchus (N=17), Colon (N=17), Ovary (N=6), Breast (N=11).]
Method
If the null hypothesis is true, then the populations are the same: they are normal, and they have the same mean and the same variance. This common variance is estimated in two distinct ways:
between-groups variance
within-groups variance
If the null hypothesis is true, then these two distinct estimates of the variance should be approximately equal
'New' (and equivalent) null hypothesis: $\sigma^2_{between} = \sigma^2_{within}$
their equality can be tested by an F ratio test
The p-value of this test:
if p>0.05, then we do not reject H0. The analysis is complete.
if p<0.05, then we reject H0 at the 0.05 level. At least one group mean differs from the others.
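The two variance estimates and their ratio can be sketched as follows. The data are hypothetical and the function name is an assumption for illustration; the p-value itself requires the F distribution with the returned degrees of freedom, which is not part of the Python standard library and is therefore left out.

```python
from statistics import mean

def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    n_total = sum(len(g) for g in groups)
    grand_mean = mean([x for g in groups for x in g])
    group_means = [mean(g) for g in groups]
    # Between-groups sum of squares: group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Within-groups sum of squares: observations around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within

# Hypothetical measurements for a control and two treatment groups.
groups = [[4.0, 5.0, 6.0], [6.0, 7.0, 8.0], [8.0, 9.0, 10.0]]
f_ratio, df_b, df_w = one_way_anova(groups)
print(f"F({df_b}, {df_w}) = {f_ratio:.2f}")
```

An F ratio close to 1 is what the null hypothesis predicts; a large ratio means the group means vary more than the within-group scatter can explain.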
[Figure, panels a) and b): random samples drawn from normal distributions with equal (a) and unequal (b) means and unique dispersion.]
The ANOVA table
Source of variation    Sum of squares                                                     Degrees of freedom    Variance               F                    p
Between groups         $Q_k = \sum_{i=1}^{t} n_i (\bar{x}_i - \bar{x})^2$                 t-1                   $s_k^2 = Q_k/(t-1)$    $F = s_k^2/s_b^2$    p
Within groups          $Q_b = \sum_{i=1}^{t} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2$     N-t                   $s_b^2 = Q_b/(N-t)$
Total                  $Q = \sum_{i=1}^{t} \sum_{j=1}^{n_i} (x_{ij} - \bar{x})^2$         N-1
ANOVA
SQSURV
                  Sum of Squares    df    Mean Square    F        Sig.
Between Groups    3295.038          4     823.759        6.484    .000
Within Groups     7495.266          59    127.038
Total             10790.304         63
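The mean squares and the F ratio of the SQSURV output follow from the sums of squares and the degrees of freedom. The sketch below redoes that arithmetic; the function name is hypothetical, and the group count (5) and total sample size (64) are taken from the degrees of freedom in the table.

```python
def anova_from_sums_of_squares(ss_between, ss_within, n_groups, n_total):
    """Mean squares and F ratio from the two sums of squares of a one-way ANOVA."""
    ms_between = ss_between / (n_groups - 1)       # between-groups variance, Qk / (t - 1)
    ms_within = ss_within / (n_total - n_groups)   # within-groups variance, Qb / (N - t)
    return ms_between, ms_within, ms_between / ms_within

# Sums of squares from the SQSURV table: 5 groups, 64 patients in total.
ms_b, ms_w, f_ratio = anova_from_sums_of_squares(3295.038, 7495.266, 5, 64)
print(f"MS between = {ms_b:.3f}, MS within = {ms_w:.3f}, F = {f_ratio:.3f}")
```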
Pairwise comparisons
As repeated two-sample t-tests are inappropriate for this, there are special tests for multiple comparisons that keep the probability of a Type I error at α. The most often used multiple comparisons are the modified t-tests.
Modified t-tests (LSD)
Bonferroni: α/(number of comparisons)
Scheffé
Tukey
Dunnett: a test comparing a given group (control) with the others
Multiple Comparisons
Dependent Variable: SQSURV
Dunnett t (2-sided) (a)
(I) GROUP    (J) GROUP    Mean Difference (I-J)    Std. Error    Sig.    95% CI Lower Bound    95% CI Upper Bound
Stomach      Breast       -18.8090*                4.61748       .001    -30.3632              -7.2547
Bronchus     Breast       -19.9927*                4.36140       .000    -30.9062              -9.0793
Colon        Breast       -13.5661*                4.36140       .010    -24.4796              -2.6526
Ovary        Breast       -7.6217                  5.72032       .474    -21.9355              6.6922
*. The mean difference is significant at the .05 level.
a. Dunnett t-tests treat one group as a control, and compare all other groups against it.
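The standard errors in this output can be reproduced from the within-groups mean square of the SQSURV ANOVA (127.038) and the group sizes shown with the box plots. The sketch assumes the usual pooled formula sqrt(MS_within · (1/n_i + 1/n_j)); the constant and function names are hypothetical.

```python
from math import sqrt

MS_WITHIN = 127.038                       # within-groups mean square from the SQSURV ANOVA
GROUP_SIZES = {"Stomach": 13, "Bronchus": 17, "Colon": 17, "Ovary": 6, "Breast": 11}

def std_error_of_difference(group, control="Breast"):
    """Pooled standard error of the difference between a group mean and the control mean."""
    return sqrt(MS_WITHIN * (1 / GROUP_SIZES[group] + 1 / GROUP_SIZES[control]))

for group in ("Stomach", "Bronchus", "Colon", "Ovary"):
    print(f"{group:9s} vs Breast: SE = {std_error_of_difference(group):.4f}")
```

The smallest group (Ovary, N=6) has the largest standard error, which helps explain why its difference from the control is not significant.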
Example
http://lib.stat.cmu.edu/DASL/Stories/ReadingComprehension.html
Researchers at Purdue University conducted an
experiment to compare three methods of teaching
reading.
Students were randomly assigned to one of the three
teaching methods, and their reading comprehension was
tested before and after they received the instruction.
Several different measures of reading comprehension,
from both the pre- and posttests are included in the
dataset.
Reference: Moore, David S., and George P. McCabe
(1989). Introduction to the Practice of Statistics. Original
source: study conducted by Jim Baumann and Leah
Jones of the Purdue University Education Department.
ANOVA
POST2 Posttest score on second reading comprehension measure
                  Sum of Squares    df    Mean Square    F        Sig.
Between Groups    95.121            2     47.561         8.407    .001
Within Groups     356.409           63    5.657
Total             451.530           65
Multiple Comparisons
Dependent Variable: POST2 Posttest score on second reading comprehension measure
(I), (J) groupcode: Type of instruction that student received
Method        (I) group    (J) group    Mean Difference (I-J)    Std. Error    Sig.     95% CI Lower Bound    95% CI Upper Bound
LSD           1 Basal      2 DRTA       -.682                    .717          .345     -2.11                 .75
LSD           1 Basal      3 Strat      -2.818*                  .717          .000     -4.25                 -1.39
LSD           2 DRTA       1 Basal      .682                     .717          .345     -.75                  2.11
LSD           2 DRTA       3 Strat      -2.136*                  .717          .004     -3.57                 -.70
LSD           3 Strat      1 Basal      2.818*                   .717          .000     1.39                  4.25
LSD           3 Strat      2 DRTA       2.136*                   .717          .004     .70                   3.57
Bonferroni    1 Basal      2 DRTA       -.682                    .717          1.000    -2.45                 1.08
Bonferroni    1 Basal      3 Strat      -2.818*                  .717          .001     -4.58                 -1.05
Bonferroni    2 DRTA       1 Basal      .682                     .717          1.000    -1.08                 2.45
Bonferroni    2 DRTA       3 Strat      -2.136*                  .717          .012     -3.90                 -.37
Bonferroni    3 Strat      1 Basal      2.818*                   .717          .001     1.05                  4.58
Bonferroni    3 Strat      2 DRTA       2.136*                   .717          .012     .37                   3.90
*. The mean difference is significant at the .05 level.
Nonparametric one-way ANOVA
Kruskal-Wallis test.
As a result, it gives one p-value. If it is not significant, the null hypothesis is not rejected.
If the null hypothesis is rejected, further tests are required to make pairwise comparisons. These pairwise comparisons are generally not available in standard statistical packages. Pairwise comparisons can be performed by Mann-Whitney U tests, and the p-values can be corrected by Bonferroni correction.
Test Statistics (a, b)
               SURVIVAL
Chi-Square     14.954
df             4
Asymp. Sig.    .005
a. Kruskal Wallis Test
b. Grouping Variable: GROUP
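The test statistic reported as "Chi-Square" can be sketched as follows. This is a minimal illustration with hypothetical data: it computes the Kruskal-Wallis H statistic from the ranks of the pooled observations, gives tied values their average rank, and omits the tie correction that statistical packages usually apply. H is then compared with the chi-square distribution with (number of groups - 1) degrees of freedom.

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (no tie correction); df = number of groups - 1."""
    pooled = sorted(x for g in groups for x in g)
    # Average rank for each distinct value, so tied observations share a rank.
    rank_of = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2       # mean of ranks i+1 .. j
        i = j
    n_total = len(pooled)
    rank_term = sum(sum(rank_of[x] for x in g) ** 2 / len(g) for g in groups)
    return 12.0 / (n_total * (n_total + 1)) * rank_term - 3 * (n_total + 1)

# Hypothetical survival times for three groups.
groups = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
print(f"H = {kruskal_wallis_h(groups):.2f}, df = {len(groups) - 1}")
```

Because only ranks enter the calculation, the statistic does not change under a monotone transformation such as the square root.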
Two-way ANOVA, example
Does systolic blood pressure depend on
diabetes (yes or no)
sex (male or female)
Both are independent (between-subjects) factors
Two-way repeated measurements ANOVA
Does QT widening in the Langendorff-perfused rat heart represent the effect of repolarization delay or conduction slowing? J Cardiovasc Pharmacol. 42 (2003) 612-21
Effect of regional ischemia and K+ content of the perfusion solution on the QT90 interval (A) and heart rate (B) in drug-free isolated rat hearts (n = 12 hearts per group; mean ± SEM).
[Figure: panel A, QT90 (ms) against time (min); panel B, heart rate (beats/min) against time (min); series: 3 mM K+ and 5 mM K+.]
Frequently, separate univariate analyses are used for every time point; these take no account of the fact that the data are related in time. Moreover, repeated testing is taking place, and therefore a significant effect is more likely to occur at some time point by chance. A second problem is the frequent occurrence of missing values in the data. A repeated measurement ANOVA model is more appropriate (Brown and Prescott).
Repeated measurement ANOVA model
We can examine:
The treatment effect (K+)
Time-effect
Their interaction
[Figure: panel B, heart rate (beats/min) against time (min) for 3 mM K+ and 5 mM K+, with asterisks at individual time points.]
Type 3 Tests of Fixed Effects
Effect         Num DF    Den DF    F Value    Pr > F
KALIUM         1         22        9.14       0.0063
time           9         198       21.70      <.0001
KALIUM*time    9         198       0.54       0.8465
In high potassium concentration the heart rate is significantly higher, independently of the time at which it was measured
Review questions and exercises
Problems to be solved by hand calculations
..\Handouts\Problems hand V.doc
Solutions
..\Handouts\Problems hand V solutions.doc
Problems to be solved using computer
..\Handouts\Problems comp V.doc,
..\Handouts\Problems comp V solutions.doc
Useful WEB pages
http://www-stat.stanford.edu/~naras/jsm
http://www.ruf.rice.edu/~lane/rvls.html
http://my.execpc.com/~helberg/statistics.html
http://www.math.csusb.edu/faculty/stanton/m262/index.html