Paired t-test, non

Download Report

Transcript Paired t-test, non

The paired t-test,
non-parametric tests, and ANOVA
July 13, 2004
Review: the Experiment
(note: exact numbers have been altered)

Grade 3 at Oak School were given an IQ test at the
beginning of the academic year (n=90).
 Classroom teachers were given a list of names of
students in their classes who had supposedly
scored in the top 20 percent; these students were
identified as “academic bloomers” (n=18).
 BUT: the children on the teachers lists had
actually been randomly assigned to the list.
 At the end of the year, the same I.Q. test was readministered.
The results
Children who had been randomly assigned
to the “top-20 percent” list had mean I.Q.
increase of 12.2 points (sd=2.0) vs. children
in the control group only had an increase of
8.2 points (sd=2.5)
Confidence interval (more
information!!)
95% CI for the difference: 4.0±1.99(.64) =
(2.7 – 5.3)
t-curve with 88 df’s
has slightly wider cutoff’s for 95% area
(t=1.99) than a normal
curve (Z=1.96)
The Paired T-test
The Paired T-test

Paired data means you’ve measured the same
person at different time points or measured pairs
of people who are related (husbands and wives,
siblings, controls pair-matched to cases, etc.

For example, to evaluate whether an observed
change in mean (before vs. after) represents a true
improvement (or decrease):
Null hypothesis: difference (after-before)=0
The differences are treated
like a single random variable
n
Xi
Yi
Xi - Yi
X1
Y1
D1
X2
Y2
D2
X3
Y3
D3
X4
Y4
D4
…
…
…
Xn
Yn
Dn
D
i 1
Dn 
n
n
D 
2

( Di  Dn ) 2
i 1
n 1
Dn  0
T=
SD
n
 SD
2
Example Data
baseline
Test2
improvement
10
10
9
8
12
11
11
7
6
9
9
10
9
9
12
13
8
11
12
13
11
8
9
8
9
9
-1
+2
+4
0
-1
+1
+2
+4
+2
0
-1
-1
0
Is there a significant increase in scores in
this group?
Average of differences = +1
Sample Variance = 3.3; sample SD = 1.82
T 12 = 1/(1.82/3.6) = 1.98
data _null_;
pval= 1-probt(1.98, 12);
put pval;
run;
0.0355517436
Significant for a one-sided
test; borderline for twosided test
Example 2: Did the control
group in the Oak School
experiment improve
at all during the year?
t71 
8.2
8.2

 28
2
.29
2.5
72
p-value <.0001
Confidence interval for annual
change in IQ test score
95% CI for the increase: 8.2±2.0(.29) = (7.6 –
8.8)
t-curve with 71 df’s
has slightly wider cutoff’s for 95% area
(t=2.0) than a normal
curve (Z=1.96)
Summary: parametric tests
True standard
deviation is known
One sample (or
paired sample)
Two samples
One-sample Z-test
Two-sample Z-test
Two-sample t-test
Standard deviation
is estimated by the
sample
One-sample t-test
Equal
variances
are pooled
Unequal
variances
(unpooled)
Non-parametric tests
Non-parametric tests

t-tests require your outcome variable to be
normally distributed (or close enough).
 Non-parametric tests are based on RANKS
instead of means and standard deviations
(=“population parameters”).
Example: non-parametric tests
10 dieters following Atkin’s diet vs. 10 dieters following
Jenny Craig
Hypothetical RESULTS:
Atkin’s group loses an average of 34.5 lbs.
J. Craig group loses an average of 18.5 lbs.
Conclusion: Atkin’s is better?
Example: non-parametric tests
BUT, take a closer look at the individual data…
Atkin’s, change in weight (lbs):
+4, +3, 0, -3, -4, -5, -11, -14, -15, -300
J. Craig, change in weight (lbs)
-8, -10, -12, -16, -18, -20, -21, -24, -26, -30
Enter data in SAS…
data nonparametric;
input loss diet $;
datalines ;
+4 atkins
+3 atkins
0
atkins
-3 atkins
-4 atkins
-5
atkins
-11 atkins
-14 atkins
-15 atkins
-300 atkins
-8 jenny
-10 jenny
-12 jenny
-16 jenny
-18 jenny
-20 jenny
-21 jenny
-24 jenny
-26 jenny
-30 jenny
;
run;
Jenny Craig
30
25
20
P
e
r
c 15
e
n
t
10
5
0
-30
-25
-20
-15
-10
-5
0
5
Weight Change
10
15
20
Atkin’s
30
25
20
P
e
r
c 15
e
n
t
10
5
0
-300
-280
-260
-240
-220
-200
-180
-160
-140
-120
-100
-80
Weight Change
-60
-40
-20
0
20
t-test doesn’t work…

Comparing the mean weight loss of the two
groups is not appropriate here.
 The distributions do not appear to be
normally distributed.
 Moreover, there is an extreme outlier (this
outlier influences the mean a great deal).
Statistical tests to compare
ranks:

Wilcoxon rank-sum test (equivalent to MannWhitney U test) is analogue of two-sample ttest.

Wilcoxon signed-rank test is analogue of onesample t-test, usually used for paired data
Wilcoxon rank-sum test







RANK the values, 1 being the least weight loss
and 20 being the most weight loss.
Atkin’s
+4, +3, 0, -3, -4, -5, -11, -14, -15, -300
1, 2, 3, 4, 5, 6, 9, 11, 12, 20
J. Craig
-8, -10, -12, -16, -18, -20, -21, -24, -26, -30
7, 8, 10, 13, 14, 15, 16, 17, 18, 19
Wilcoxon “rank-sum” test
Sum of Atkin’s ranks:
 1+ 2 + 3 + 4 + 5 + 6 + 9 + 11+ 12 + 20=73
 Sum of Jenny Craig’s ranks:
7 + 8 +10+ 13+ 14+ 15+16+ 17+ 18+19=137


Jenny Craig clearly ranked higher!
 P-value *(from computer) = .017
– from ttest, p-value=.60
*Tests in SAS…
/*to get wilcoxon rank-sum test*/
proc npar1way wilcoxon
data=nonparametric;
class diet;
var loss;
run;
/*To get ttest*/
proc ttest data=nonparametric;
class diet;
var loss;
run;
Wilcoxon “signed-rank” test
H0: median weight loss in Atkin’s group = 0
Ha:median weight loss in Atkin’s not 0
Atkin’s
 +4, +3, 0, -3, -4, -5, -11, -14, -15, -300
Rank absolute values of differences (ignore zeroes):
Ordered values: 300, 15, 14, 11, 5, 4, 4, 3, 3, 0
Ranks:
1 2 3 4 5 6-7 8-9 Sum of negative ranks: 1+2+3+4+5+6.5+8.5=30
Sum of positive ranks: 6.5+8.5=15
P-value*(from computer)=.043; from paired t-test=.27
*Tests in SAS…
/*to get one-sample tests (both
student’s t and signed-rank*/
proc univariate
data=nonparametric;
var loss;
where diet="atkins";
run;
What if data were paired?
e.g., one-to-one matching; find pairs of study
participants who have same age, gender,
socioeconomic status, degree of overweight,
etc.
Atkin’s
 +4, +3, 0, -3, -4, -5, -11, -14, -15, -300
J. Craig
 -8, -10, -12, -16, -18, -20, -21, -24, -26, -30
Enter data differently in SAS…
10 pairs, rather than 20
individual observations
data piared;
input lossa lossj;
diff=lossa-lossj;
datalines ;
+4 -8
+3 -10
0 -12
-3 -16
-4 -18
-5 -20
-11 -21
-14 -24
-15 -26
-300 -30
;
run;
*Tests in SAS…
/*to get all paired tests*/
proc univariate data=paired;
var diff;
run;
/*To get just paired ttest*/
proc ttest data=paired;
var diff;
run;
/*To get paired ttest, alternatively*/
proc ttest data=paired;
paired lossa*lossj;
run;
ANOVA
for comparing means between
more than 2 groups
ANOVA
(ANalysis Of VAriance)

Idea: For two or more groups, test difference
between means, for quantitative normally
distributed variables.
 Just an extension of the t-test (an ANOVA with
only two groups is mathematically equivalent to a
t-test).
 Like the t-test, ANOVA is “parametric” test—
assumes that the outcome variable is roughly
normally distributed
The “F-test”
Is the difference in the means of the groups more
than background noise (=variability within groups)?
Variabilit y between groups
F
Variabilit y within groups
Spine bone density vs.
menstrual regularity
1.2
1.1
1.0
S
P
I
N
E
0.9
Within group
variability
Between
group
variation
Within group
variability
Within group
variability
0.8
0.7
amenorrheic
oligomenorrheic
eumenorrheic
Group means and standard
deviations

Amenorrheic group (n=11):
– Mean spine BMD = .92 g/cm2
– standard deviation = .10 g/cm2

Oligomenorrheic group (n=11)
– Mean spine BMD = .94 g/cm2
– standard deviation = .08 g/cm2

Eumenrroheic group (n=11)
– Mean spine BMD =1.06 g/cm2
– standard deviation = .11 g/cm2
The size of the
groups.
Between-group
variation.
The F-Test
2
sbetween
The difference of
each group’s
mean from the
overall mean.
2
2
2
(.
92

.
97
)

(.
94

.
97
)

(
1
.
06

.
97
)
 ns x2  11* (
)  .063
3 1
2
swithin
 avg s 2  1 (.102  .082  .112 )  .0095
3
F2,30
The average
amount of
variation within
groups.
2
between
2
within
s

s
.063

 6.6
.0095
Large F value indicates
Each group’s variance.
that the between group
variation exceeds the
within group variation
(=the background
noise).
The F-distribution

The F-distribution is a continuous probability
distribution that depends on two parameters n
and m (numerator and denominator degrees
of freedom, respectively):
The F-distribution

A ratio of sample variances follows an Fdistribution:


2
between
2
within
The
F
~ Fn ,m
F-test tests the hypothesis that two sample
variances are equal.
will be close to 1 if sample variances are equal.
2
2
H 0 :  between
  within
H a :
2
between

2
within
ANOVA Table
Source of
variation
d.f.
Between k-1
(k groups)
Sum of
squares
Mean
Sum of
Squares
SSB
SSB/k-1
(sum of squared
deviations of
group means from
F-statistic
SSB
SSW
p-value
Go to
k 1
nk  k
Fk-1,nk-k
chart
grand mean)
Within
nk-k
(n individuals
per group)
Total
nk-1
variation
SSW
(sum of squared
deviations of
observations
from their
group mean)
s2=SSW/nk-k
TSS
(sum of squared
deviations of observations
from grand mean)
TSS=SSB + SSW
ANOVA=t-test
Source of
variation
Between
(2 groups)
Within
d.f.
1
2n-2
Sum of
squares
SSB
Squared
(squared difference
difference in means
in means)
SSW
equivalent to
numerator of
pooled
variance
Total
2n-1
variation
Mean
Sum of
Squares
TSS
Pooled
variance
F-statistic
p-value
Go to
(X  Y )
sp
2
2
(
X Y 2
)  (t 2 n  2 ) 2
sp
F1, 2n-2
Chart
notice
values
are just (t
2
2n-2)
ANOVA summary

A statistically significant ANOVA (F-test)
only tells you that at least two of the groups
differ, but not which ones differ.

Determining which groups differ (when it’s
unclear) requires more sophisticated
analyses to correct for the problem of
multiple comparisons…
Question: Why not just do 3
pairwise ttests?
Answer: because, at an error rate of 5% each test,
this means you have an overall chance of up to 1(.95)3= 14% of making a type-I error (if all 3
comparisons were independent)
 If you wanted to compare 6 groups, you’d have to
do 6C2 = 15 pairwise ttests; which would give you
a high chance of finding something significant just
by chance (if all tests were independent with a
type-I error rate of 5% each); probability of at
least one type-I error = 1-(.95)15=54%.

Multiple comparisons
With 18 independent
comparisons, we have
60% chance of at least 1
false positive.
Multiple comparisons
With 18 independent
comparisons, we expect
about 1 false positive.
Correction for multiple
comparisons





How to correct for multiple comparisons posthoc…
Bonferroni’s correction (adjusts p by most
conservative amount, assuming all tests
independent)
 Holm/Hochberg (gives p-cutoff beyond
which not significant)
 Tukey’s (adjusts p)
 Scheffe’s (adjusts p)
Non-parametric ANOVA
Kruskal-Wallis one-way ANOVA
Extension of the Wilcoxon Sign-Rank test
for 2 groups; based on ranks
Proc NPAR1WAY in SAS
Reading for this week

Chapters 4-5, 12-13 (last week)
 Chapters 6-8, 10, 14 (this week)