Comparing Two Means
For 95 out of 100 (large) samples, the interval
$\bar{x} \pm 1.96\,\sigma/\sqrt{n}$
will contain the true population mean.
But we don’t know $\sigma$?!
Inference for the Mean of a Population
To estimate $\mu$, we use a confidence interval around $\bar{x}$:
$\bar{x} \pm 1.96\,\sigma/\sqrt{n}$
The confidence interval is built with $\sigma$, which we replace with $s$ (the sample standard deviation) if $\sigma$ is not known.
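The deck itself uses STATA; as an illustration only, here is a minimal Python sketch of the known-$\sigma$ interval, showing that 1.96 is the 97.5th percentile of the standard Normal. The function name `z_interval` and the inputs in the usage line are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, conf=0.95):
    """Confidence interval xbar ± z* sigma / sqrt(n), valid only when sigma is known."""
    z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # about 1.96 when conf = 0.95
    margin = z_star * sigma / sqrt(n)
    return xbar - margin, xbar + margin

# Hypothetical inputs: sample mean 50, known sigma 10, n = 100
low, high = z_interval(50, 10, 100)
print(f"({low:.2f}, {high:.2f})")  # (48.04, 51.96)
```

When $\sigma$ is unknown, plugging in $s$ here is not enough: the multiplier must also change, which is what the t-distribution handles.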
t-distributions
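The “standard error” of $\bar{x}$ is $s/\sqrt{n}$. Standardizing $\bar{x}$ with it gives the one-sample t-statistic
$t = \dfrac{\bar{x} - \mu}{s/\sqrt{n}}$
For an SRS of size $n$ from a Normal population, the one-sample t-statistic has the t-distribution with $n-1$ degrees of freedom (see Table D).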
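A minimal Python sketch of the one-sample t-statistic above (Python is used here only for illustration; `one_sample_t` is a hypothetical helper, and `mu0` stands for the hypothesized value of $\mu$):

```python
from math import sqrt
from statistics import fmean, stdev

def one_sample_t(data, mu0):
    """Return (t, df) for t = (xbar - mu0) / (s / sqrt(n)) with df = n - 1."""
    n = len(data)
    se = stdev(data) / sqrt(n)  # standard error of xbar; stdev divides by n - 1
    return (fmean(data) - mu0) / se, n - 1

# Hypothetical data: xbar = 4, s = 2, n = 3, tested against mu0 = 2
t, df = one_sample_t([2, 4, 6], mu0=2)
print(round(t, 3), df)  # 1.732 2
```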
t-distributions
t-distributions with $k$ ($= n-1$) degrees of freedom
– are labeled $t(k)$,
– are symmetric around 0,
– and are bell-shaped
– … but have more variability than Normal distributions, because $s$ is substituted in place of $\sigma$.
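One way to see the extra variability is a small simulation, sketched here in Python under stated assumptions: samples of size 5 are drawn from N(0, 1), and we count how often $|z|$ (using the true $\sigma = 1$) and $|t|$ (using $s$) exceed 1.96. The helper name `tail_rates` and the seed are arbitrary choices for illustration.

```python
import random
from math import sqrt
from statistics import fmean, stdev

def tail_rates(n, reps=10000, cutoff=1.96, seed=1):
    """Fraction of simulated |z| and |t| beyond cutoff for samples of size n from N(0, 1).

    z divides by the true sigma (= 1); t divides by the sample standard deviation s.
    """
    rng = random.Random(seed)
    z_hits = t_hits = 0
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        xbar = fmean(sample)
        z = xbar / (1.0 / sqrt(n))
        t = xbar / (stdev(sample) / sqrt(n))
        z_hits += abs(z) > cutoff
        t_hits += abs(t) > cutoff
    return z_hits / reps, t_hits / reps

z_rate, t_rate = tail_rates(n=5)
print(f"|z| > 1.96: {z_rate:.3f}   |t| > 1.96: {t_rate:.3f}")
```

The $z$ rate stays near 5%, while the $t$ rate is noticeably higher, which is why $t(4)$ needs a larger $t^*$ than 1.96.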
Example: Estimating the level of vitamin C
Data:
26 31 23 22 11
22 14 31
Find a 95% confidence interval for $\mu$.
A: ( , )
Write it as “estimate plus margin of error”
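A Python sketch of the computation, as a check rather than the course’s method. It assumes $t^* = 2.365$, the Table D value for $7$ degrees of freedom at 95% confidence; the constant name is hypothetical.

```python
from math import sqrt
from statistics import fmean, stdev

vitamin_c = [26, 31, 23, 22, 11, 22, 14, 31]
T_STAR_DF7_95 = 2.365  # t* for 7 df, 95% confidence, read from a t table (Table D)

n = len(vitamin_c)
xbar = fmean(vitamin_c)
se = stdev(vitamin_c) / sqrt(n)   # s / sqrt(n)
margin = T_STAR_DF7_95 * se       # margin of error
print(f"{xbar:.2f} ± {margin:.2f} -> ({xbar - margin:.2f}, {xbar + margin:.2f})")
# prints 22.50 ± 6.01 -> (16.49, 28.51)
```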
STATA Exercise 1
STATA Exercise 2
STATA Exercises 3 and 4
STATA Exercise 5
Paired, unpaired tests
“Paired” tests compare the two measurements on each individual and ask whether the mean difference (“gain” in this example) is zero.
H0: mean(pretest - posttest) = mean(diff) = 0
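A paired test is a one-sample t-test on the per-individual differences. A minimal Python sketch, where `paired_t` is a hypothetical helper and the scores are made up for illustration:

```python
from math import sqrt
from statistics import fmean, stdev

def paired_t(pre, post):
    """Paired t statistic for H0: mean(pre - post) = 0; returns (t, df)."""
    if len(pre) != len(post):
        raise ValueError("paired data need the same number of observations")
    diffs = [a - b for a, b in zip(pre, post)]  # one difference per individual
    n = len(diffs)
    return fmean(diffs) / (stdev(diffs) / sqrt(n)), n - 1

# Hypothetical scores for five individuals (made up for illustration)
pretest = [10, 12, 9, 14, 11]
posttest = [12, 15, 10, 16, 14]
t, df = paired_t(pretest, posttest)
print(f"t = {t:.2f} on {df} df")  # t = -5.88 on 4 df
```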
STATA Exercise 6
Robustness of t procedures
t-tests are appropriate for testing a hypothesis about a single mean only in these cases:
– If n < 15: only if the data are Normally distributed (with no outliers or strong skewness)
– If n ≥ 15: only if there are no outliers or strong skewness
– If n ≥ 40: even if the data are clearly skewed (because of the Central Limit Theorem)
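The rule of thumb above can be written as a small decision helper. This is a hypothetical Python sketch that simply encodes the three cases; the judgments about Normality, outliers and skewness still have to come from looking at the data.

```python
def t_procedure_ok(n, near_normal, outliers_or_strong_skew):
    """Rule of thumb for one-sample t procedures (a judgment aid, not a formal test)."""
    if n >= 40:
        return True  # Central Limit Theorem: acceptable even for clearly skewed data
    if n >= 15:
        return not outliers_or_strong_skew
    # n < 15: data must look Normal, with no outliers or strong skewness
    return near_normal and not outliers_or_strong_skew

print(t_procedure_ok(8, near_normal=True, outliers_or_strong_skew=False))  # True
```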
Comparing Two Means
Suppose we make a change to the
registration procedure. Does this reduce the
number of mistakes?
Basically, we’re looking at two populations:
– the before-change population (population 1)
– the after-change population (population 2)
Is the mean number of mistakes (per student) different? Is $\mu_1 - \mu_2 = 0$ or $\neq 0$?
Comparing Two Means
Notice that we are not matching pairs. We
compare two groups.
Comparing Two Means
Population   Variable   Mean       Standard deviation
1            $x_1$      $\mu_1$    $\sigma_1$
2            $x_2$      $\mu_2$    $\sigma_2$
Comparing Two Means
Population   Sample size   Sample mean    Sample standard deviation
1            $n_1$         $\bar{x}_1$    $s_1$
2            $n_2$         $\bar{x}_2$    $s_2$
Comparing Two Means
The population, really, is every single student using each registration procedure, an infinite number of times.
– Suppose we get a “good” result today: how do we know it will be repeated tomorrow?
We can’t repeat the procedure an infinite number of times; we only have a “sample”: numbers from one year.
We estimate $\mu_1 - \mu_2$ with $\bar{x}_1 - \bar{x}_2$.
Comparing Two Means
Remember that $\bar{x}$ is a random variable. To estimate $\mu$ we need both $\bar{x}$ and the margin of error around $\bar{x}$, which is $t^* \sigma_{\bar{x}} = t^* \sigma/\sqrt{n}$.
So we need to know $\sigma/\sqrt{n}$, or rather the appropriate standard error for this estimation.
Because we are estimating a difference, we need the standard error of a difference.
Comparing Two Means
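If the two samples are independent (correlation $r = 0$), and the standard error of $\bar{x}_1$ is $\sigma_1/\sqrt{n_1}$ and that of $\bar{x}_2$ is $\sigma_2/\sqrt{n_2}$, then the standard error of $\bar{x}_1 - \bar{x}_2$ is
$\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$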
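A Python sketch of the standard error of a difference (the helper name `se_difference` is hypothetical). The point it illustrates: for independent samples the variances add, not the standard errors.

```python
from math import sqrt

def se_difference(sigma1, n1, sigma2, n2):
    """Standard error of xbar1 - xbar2 for independent samples (r = 0)."""
    return sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)

# Hypothetical case where each mean has standard error 1: sigma1 = 3, n1 = 9; sigma2 = 4, n2 = 16
se = se_difference(3, 9, 4, 16)
print(round(se, 4))  # 1.4142, i.e. sqrt(2), not 1 + 1
```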
Two-sample significance test
$t = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$
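When the unequal option is specified, STATA uses the Satterthwaite approximation for the degrees of freedom by default.
This t statistic does not have exactly a t-distribution because we are replacing two standard deviations by their sample equivalents; the Satterthwaite degrees of freedom give an approximating t-distribution.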
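A Python sketch of the unequal-variance two-sample t statistic with Satterthwaite’s degrees of freedom, $df = \dfrac{(v_1 + v_2)^2}{v_1^2/(n_1-1) + v_2^2/(n_2-1)}$ with $v_i = s_i^2/n_i$. This is an illustration of the formula, not STATA’s own code; `welch_t` and the sample values are hypothetical.

```python
from math import sqrt
from statistics import fmean, stdev

def welch_t(x1, x2):
    """Unequal-variance two-sample t statistic (H0: mu1 = mu2) and Satterthwaite df."""
    n1, n2 = len(x1), len(x2)
    v1 = stdev(x1) ** 2 / n1  # squared standard error of xbar1
    v2 = stdev(x2) ** 2 / n2  # squared standard error of xbar2
    t = (fmean(x1) - fmean(x2)) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Hypothetical samples: xbar1 = 2, s1^2 = 1, n1 = 3; xbar2 = 6, s2^2 = 10, n2 = 5
t, df = welch_t([1, 2, 3], [2, 4, 6, 8, 10])
print(f"t = {t:.3f}, df = {df:.2f}")
```

The degrees of freedom are usually not a whole number; here they fall between $\min(n_1, n_2) - 1$ and $n_1 + n_2 - 2$.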
STATA Exercise 7
STATA Exercise 5
Paired, unpaired tests
“Paired” tests compare the two measurements on each individual and ask whether the mean difference (“gain” in this example) is zero.
H0: mean(pretest - posttest) = mean(diff) = 0
“Unpaired” tests take the mean of each variable and test whether the difference of the means is zero.
H0: mean(pretest) - mean(posttest) = diff = 0
ttest ego, by(group) unequal
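To see why the choice matters, here is a Python sketch applying both tests to the same made-up pretest/posttest scores (hypothetical data, for illustration only). The unpaired version keeps the person-to-person variation in its standard error, so its t statistic is much smaller.

```python
from math import sqrt
from statistics import fmean, stdev

# Hypothetical pretest/posttest scores for the same five individuals (made up for illustration)
pretest = [10, 12, 9, 14, 11]
posttest = [12, 15, 10, 16, 14]

# Paired: one difference per individual, then a one-sample t on the differences
diffs = [a - b for a, b in zip(pretest, posttest)]
t_paired = fmean(diffs) / (stdev(diffs) / sqrt(len(diffs)))

# Unpaired (unequal variances): difference of the means over the standard error of a difference
se_unpaired = sqrt(stdev(pretest) ** 2 / len(pretest) + stdev(posttest) ** 2 / len(posttest))
t_unpaired = (fmean(pretest) - fmean(posttest)) / se_unpaired

print(f"paired t = {t_paired:.2f}, unpaired t = {t_unpaired:.2f}")
# prints paired t = -5.88, unpaired t = -1.60
```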
STATA Exercise 8
Robustness and Small Samples
Two-sample methods are more robust than one-sample methods.
– More so if the two samples have similar shapes and sample sizes.
STATA assumes that the variances are the same (what the book calls “pooled t procedures”) unless you specify the unequal option.
Small samples, as always, make the test less
robust.
Pooled two-sample t procedures
Suppose the two Normal population distributions have the same standard deviation.
Then the pooled t-statistic that compares the means of samples from those two populations has exactly a t-distribution, with $n_1 + n_2 - 2$ degrees of freedom.
Pooled two-sample t procedures
The common but unknown standard deviation of both populations is $\sigma$. The sample standard deviations $s_1$ and $s_2$ both estimate $\sigma$.
The best way to combine these estimates is to take a “weighted average” of the two sample variances, using their degrees of freedom ($n_1 - 1$ and $n_2 - 1$) as the weights:
$s_p^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$
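A Python sketch of the pooled variance (the helper name `pooled_variance` and the samples are hypothetical):

```python
from statistics import variance

def pooled_variance(x1, x2):
    """s_p^2 = ((n1 - 1) s1^2 + (n2 - 1) s2^2) / (n1 + n2 - 2)."""
    n1, n2 = len(x1), len(x2)
    return ((n1 - 1) * variance(x1) + (n2 - 1) * variance(x2)) / (n1 + n2 - 2)

# Hypothetical samples with s1^2 = 1 (n1 = 3) and s2^2 = 10 (n2 = 5)
sp2 = pooled_variance([1, 2, 3], [2, 4, 6, 8, 10])
print(sp2)  # 7.0
```

The result (7) lies closer to 10 than to 1 because the larger sample carries more degrees of freedom, and therefore more weight.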
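THE POOLED TWO-SAMPLE T PROCEDURES
(assuming $\sigma$ is the same for both populations)
A level C confidence interval for $\mu_1 - \mu_2$ is
$(\bar{x}_1 - \bar{x}_2) \pm t^* s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}$
Here, $t^*$ is the value for the $t(n_1 + n_2 - 2)$ density curve with area C between $-t^*$ and $t^*$.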
To test the hypothesis $H_0: \mu_1 = \mu_2$, compute the pooled two-sample t statistic
$t = \dfrac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$
and use P-values from the $t(n_1 + n_2 - 2)$ distribution.
ttest ego, by(group)
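A Python sketch of the pooled procedures, for illustration only (the helper names and samples are hypothetical). The standard library has no t quantile function, so $t^*$ is passed in; 2.447 is the Table D value for 6 degrees of freedom at 95% confidence.

```python
from math import sqrt
from statistics import fmean, variance

def pooled_t(x1, x2):
    """Pooled two-sample t statistic for H0: mu1 = mu2; returns (t, df, standard error)."""
    n1, n2 = len(x1), len(x2)
    df = n1 + n2 - 2
    sp = sqrt(((n1 - 1) * variance(x1) + (n2 - 1) * variance(x2)) / df)  # pooled sd
    se = sp * sqrt(1 / n1 + 1 / n2)
    return (fmean(x1) - fmean(x2)) / se, df, se

def pooled_ci(x1, x2, t_star):
    """(xbar1 - xbar2) ± t* s_p sqrt(1/n1 + 1/n2); t_star must come from t(n1 + n2 - 2)."""
    _, _, se = pooled_t(x1, x2)
    diff = fmean(x1) - fmean(x2)
    return diff - t_star * se, diff + t_star * se

# Hypothetical samples; 2.447 is the 95% t* for 6 degrees of freedom (Table D)
group1 = [1, 2, 3]
group2 = [2, 4, 6, 8, 10]
t, df, _ = pooled_t(group1, group2)
low, high = pooled_ci(group1, group2, t_star=2.447)
print(f"t = {t:.3f} on {df} df; 95% CI ({low:.2f}, {high:.2f})")
```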