2 sample tests


Testing for differences between 2 means
Does the mean weight of cats in Toledo differ from the
mean weight of cats in Cleveland?
Do the mean quiz scores of female and male
Biostatistics students differ?
Does the mean growth rate of plants given P fertilizer
differ from that of plants not given P fertilizer?
More from your own research ……..
Two tailed test
Null, H0: μ1 = μ2
Alternative, HA: μ1 ≠ μ2
(hypotheses are about the population means μ1 and μ2; the sample means estimate them)
Similar to one-sample, but….
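$$ t = \frac{\bar{X}_1 - \bar{X}_2}{s_{\bar{X}_1 - \bar{X}_2}} $$
t = t-statistic
\bar{X}_1 = Sample1 mean, \bar{X}_2 = Sample2 mean
s_{\bar{X}_1 - \bar{X}_2} = SE of diff between sample means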
The variance of the difference between two
(independent) variables equals the sum of the
variances of the two variables
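A quick way to convince yourself of this rule is to enumerate a small case. The sketch below uses two made-up finite "populations" (the values are assumptions chosen only for illustration) and treats every (x, y) pair as equally likely, which is what independence means here.

```python
from itertools import product
from statistics import pvariance

# Hypothetical finite populations; the numbers are made up for illustration.
x_values = [1.0, 2.0, 3.0, 4.0, 5.0]
y_values = [10.0, 20.0, 30.0]

# Independence: every (x, y) combination is equally likely,
# so the population of differences is the full cross product.
differences = [x - y for x, y in product(x_values, y_values)]

var_x = pvariance(x_values)        # population variance of X
var_y = pvariance(y_values)        # population variance of Y
var_diff = pvariance(differences)  # population variance of X - Y

print(var_x, var_y, var_x + var_y, var_diff)
```

The last two printed numbers should agree: subtracting the variables still adds their variances.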
Calculate from your sample data
Calculate pooled variance of the sample as best
estimate of the true population variance
$$ s_p^2 = \text{pooled variance} = \frac{SS_1 + SS_2}{df_1 + df_2} $$
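The pooled variance can be sketched in a few lines. The function names and the plant heights below are hypothetical, chosen only so the arithmetic is easy to follow by hand.

```python
def sum_of_squares(sample):
    """SS: sum of squared deviations from the sample mean."""
    mean = sum(sample) / len(sample)
    return sum((x - mean) ** 2 for x in sample)


def pooled_variance(sample1, sample2):
    """(SS1 + SS2) / (df1 + df2), with df = n - 1 for each sample."""
    ss1 = sum_of_squares(sample1)
    ss2 = sum_of_squares(sample2)
    df1 = len(sample1) - 1
    df2 = len(sample2) - 1
    return (ss1 + ss2) / (df1 + df2)


# Made-up plant heights (cm), not real data.
present_fert = [48.0, 50.0, 52.0]      # SS = 8, df = 2
new_fert = [53.0, 55.0, 57.0, 59.0]    # SS = 20, df = 3

sp2 = pooled_variance(present_fert, new_fert)
print(sp2)  # (8 + 20) / (2 + 3)
```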
SE of diff between sample means (s_p^2 = pooled var):
$$ s_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}} $$
Putting it together (s_p^2 = pooled var):
$$ t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}}} $$
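The whole calculation, from raw samples to t, might look like the sketch below. It is a minimal version assuming equal variances; `two_sample_t` is a hypothetical helper name and the heights are made up.

```python
from math import sqrt
from statistics import mean, variance


def two_sample_t(sample1, sample2):
    """Pooled-variance two-sample t statistic and its degrees of freedom."""
    n1, n2 = len(sample1), len(sample2)
    df = (n1 - 1) + (n2 - 1)
    # (n - 1) * sample variance is the sum of squares SS for that sample.
    pooled_var = ((n1 - 1) * variance(sample1)
                  + (n2 - 1) * variance(sample2)) / df
    # SE of the difference between the two sample means.
    se_diff = sqrt(pooled_var / n1 + pooled_var / n2)
    t = (mean(sample1) - mean(sample2)) / se_diff
    return t, df


# Made-up plant heights (cm), not real data.
present_fert = [48.0, 50.0, 52.0]
new_fert = [53.0, 55.0, 57.0, 59.0]

t, df = two_sample_t(present_fert, new_fert)
print(t, df)
```

Compare |t| to the two-tailed critical value for df = n1 + n2 - 2 from a t table; for df = 5 and alpha = 0.05 that value is 2.571.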
Excel demo
Ex. Test a new fertilizer (independent/predictor variable)
and measure plant height (dependent/response variable)
Diff between means
[Figure: plant height (cm), axis 40 to 58, for present fert vs. new fert]
Now imagine that the variance was higher
Ratio of means and SE of difference
Ratio gets smaller -dividing by larger number
[Figures (two charts): plant height (cm), axis 40 to 58, for present fert vs. new fert]
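To see how higher variance shrinks the ratio, hold the difference between means fixed and change only the pooled variance. This is a sketch from summary statistics; the means, variances and sample sizes are assumptions for illustration (only the 4.5 cm difference echoes the example), and `t_from_summary` is a hypothetical helper.

```python
from math import sqrt


def t_from_summary(mean1, mean2, pooled_var, n1, n2):
    """Two-sample t from summary statistics (equal variances assumed)."""
    se_diff = sqrt(pooled_var / n1 + pooled_var / n2)
    return (mean1 - mean2) / se_diff


# Same 4.5 cm difference between means, n = 10 per group (made-up values).
t_low_var = t_from_summary(54.5, 50.0, pooled_var=4.0, n1=10, n2=10)
t_high_var = t_from_summary(54.5, 50.0, pooled_var=64.0, n1=10, n2=10)

print(t_low_var, t_high_var)
```

The pooled variance is 16 times larger in the second call, so the SE of the difference is 4 times larger and t is 4 times smaller, even though the means have not moved.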
Diff is significant (because diff large relative to
variance), but is this difference biologically important?
[Figure: plant height (cm), axis 0 to 100, for present fert vs. new fert; annotations: 4.5 cm (last example), 47 cm]
Compare to very large diff between means, but large SE
Violations of the 2 sample t-test assumptions
1) Both samples come from normal populations with
equal variance
2) Samples collected randomly
T-test is robust to moderate departures from normality, especially at large sample size
How-to, from SAS HELP: The underlying assumption of
the t test in all three cases is that the observations are
random samples drawn from normally distributed
populations. This assumption can be checked using the
UNIVARIATE procedure; if the normality assumptions for
the t test are not satisfied, you should analyze your data
using the NPAR1WAY procedure.
Violations of the 2 sample t-test assumptions
3) Populations have equal variance
If population variances are unequal (heteroscedastic), the Type I error
rate can be higher than stated; this is called the Behrens-Fisher problem
Corrected t available when equal variance cannot be
assumed
How-to, from SAS HELP: PROC TTEST computes the group comparison t
statistic based on the assumption that the variances of the two groups are
equal. It also computes an approximate t based on the assumption that the
variances are unequal (the Behrens-Fisher problem). The degrees of freedom
and probability level are given for each; Satterthwaite's (1946) approximation is
used to compute the degrees of freedom associated with the approximate t. In
addition, you can request the Cochran and Cox (1950) approximation of the
probability level for the approximate t. The folded form of the F statistic is
computed to test for equality of the two variances (Steel and Torrie 1980).
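For readers without SAS, the unequal-variance t, the Satterthwaite degrees of freedom and the folded F can be sketched by hand. This is a minimal illustration, not a reproduction of PROC TTEST output; the function names and the data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def unequal_variance_t(sample1, sample2):
    """Approximate t with Satterthwaite's degrees of freedom."""
    n1, n2 = len(sample1), len(sample2)
    a = variance(sample1) / n1   # squared SE of mean 1
    b = variance(sample2) / n2   # squared SE of mean 2
    t = (mean(sample1) - mean(sample2)) / sqrt(a + b)
    # Satterthwaite approximation; usually not a whole number.
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t, df


def folded_f(sample1, sample2):
    """Folded F: larger sample variance divided by the smaller."""
    v1, v2 = variance(sample1), variance(sample2)
    return max(v1, v2) / min(v1, v2)


# Made-up plant heights (cm), not real data.
present_fert = [48.0, 50.0, 52.0]
new_fert = [53.0, 55.0, 57.0, 59.0]

t_approx, df_approx = unequal_variance_t(present_fert, new_fert)
f_folded = folded_f(present_fert, new_fert)
print(t_approx, df_approx, f_folded)
```

The approximate df never exceeds the pooled df of n1 + n2 - 2, so the corrected test is the more conservative of the two when variances differ.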
4) The two samples must also be independent of each other
Dealing with dependent samples on Friday