Comparing Two Means



For 95 out of 100 (large) samples, the interval

$\bar{x} \pm 1.96\,\dfrac{\sigma}{\sqrt{n}}$

will contain the true population mean. But we don’t know σ?!
Inference for the Mean of a Population

To estimate μ, we use a confidence interval around x̄:

$\bar{x} \pm 1.96\,\dfrac{\sigma}{\sqrt{n}}$

The confidence interval is built with σ, which we replace with s (the sample std. dev.) if σ
is not known.
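When σ is replaced by s, the value 1.96 is replaced by a critical value t* from the t(n−1) distribution (Table D); the resulting standard one-sample t interval is

$\bar{x} \pm t^{*}\,\dfrac{s}{\sqrt{n}}$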
t-distributions
The one-sample t statistic is

$t = \dfrac{\bar{x} - \mu}{s/\sqrt{n}}$

where s/√n is the “standard error” of x̄.
For an SRS, the one-sample t statistic has the t-distribution with n−1 degrees of freedom.
(see Table D)
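As a quick illustration (the variable name and null value below are hypothetical, not from the slides), a one-sample t test in Stata looks like:

* One-sample t test of Ho: μ = 20 for a hypothetical variable
ttest score == 20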
t-distributions

t-distributions with k (= n−1) degrees of freedom
– are labeled t(k),
– are symmetric around 0,
– are bell-shaped,
– … but have more variability than Normal distributions, due to the substitution of s in the place of σ.
Example: Estimating the level of vitamin C

Data: 26 31 23 22 11 22 14 31

Find a 95% confidence interval for μ.
A: ( , )
Write it as “estimate plus margin of error”
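One way to get this interval in Stata (a minimal sketch; the variable name vitc is made up, and ci means assumes a reasonably recent Stata version):

* Enter the eight vitamin C values and compute a 95% t-based confidence interval
clear
input vitc
26
31
23
22
11
22
14
31
end
ci means vitc          // 95% CI for the mean, using the t(n-1) distribution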
STATA Exercise 1
STATA Exercise 2
STATA Exercises 3 and 4
STATA Exercise 5
Paired, unpaired tests

“Paired” tests compare two variables within each individual and ask whether the mean
difference (“gain” in this example) is zero.
Ho: mean(pretest - posttest) = mean(diff) = 0
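A minimal Stata sketch of this paired test, assuming the dataset has variables named pretest and posttest as in the hypothesis above:

* Paired t test: Ho: mean(pretest - posttest) = 0
ttest pretest == posttest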
STATA Exercise 6
Robustness of t procedures

t-tests are only appropriate for testing a hypothesis on a single mean in these cases:
– If n < 15: only if the data is Normally distributed (with no outliers or strong skewness)
– If n ≥ 15: only if there are no outliers or strong skewness
– If n ≥ 40: even if clearly skewed (because of the Central Limit Theorem)
Comparing Two Means


Suppose we make a change to the registration procedure. Does this reduce the number of mistakes?
Basically, we’re looking at two populations:
– the before-change population (population 1)
– the after-change population (population 2)
Is the mean number of mistakes (per student) different? Is μ1 − μ2 = 0 or ≠ 0?
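Stated as hypotheses (standard notation, added here for clarity):

$H_0: \mu_1 - \mu_2 = 0 \qquad H_a: \mu_1 - \mu_2 \neq 0$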
Comparing Two Means

Notice that we are not matching pairs. We
compare two groups.
Comparing Two Means
Population   Variable   Mean   Standard Deviation
1            x1         μ1     σ1
2            x2         μ2     σ2
Comparing Two Means
Population   Sample Size   Sample Mean   Sample Standard Deviation
1            n1            x̄1            s1
2            n2            x̄2            s2
Comparing Two Means

The population, really, is every single student using each registration procedure, an infinite number of times.
– Suppose we get a “good” result today: how do we know it will be repeated tomorrow?
We can’t repeat the procedure an infinite number of times; we only have a “sample”: numbers from one year.
We estimate (μ1 − μ2) with (x̄1 − x̄2).
Comparing Two Means

Remember x̄ is a Random Variable. To estimate μ we need both x̄ and the margin of error around x̄, which is $t^{*}\,\sigma_{\bar{x}}$, where

$\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$

So we need to know σ/√n, or rather, the appropriate standard error for this estimation.
Because we are estimating a difference, we need the standard error of a difference.
Comparing Two Means
Assume the two samples are independent, so r = 0.
– If the standard error for x̄1 is σ1/√n1,
– then the standard error for (x̄1 − x̄2) is

$\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$
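The step behind this formula is standard variance algebra; because the samples are independent (r = 0), the covariance term drops out:

$\mathrm{Var}(\bar{x}_1 - \bar{x}_2) = \mathrm{Var}(\bar{x}_1) + \mathrm{Var}(\bar{x}_2) = \dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}$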
Two-sample significance test

$t = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$

This statistic does not have an exact t-distribution, because we are replacing two
standard deviations by their sample equivalents. When you specify the unequal option,
STATA uses the Satterthwaite approximation for the degrees of freedom as a default.
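A sketch of the corresponding Stata command for the registration example (mistakes and procedure are hypothetical variable names; unequal requests the Satterthwaite degrees of freedom):

* Two-sample t test without assuming equal variances
ttest mistakes, by(procedure) unequal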
STATA Exercise 7
STATA Exercise 5
Paired, unpaired tests


“Paired” tests compare two variables within each individual and ask whether the mean
difference (“gain” in this example) is zero.
Ho: mean(pretest - posttest) = mean(diff) = 0
“Unpaired” tests take the mean of each variable and test
whether the difference of the means is zero.
Ho: mean(pretest) - mean(posttest) = diff = 0
ttest ego, by(group) unequal   // unpaired two-sample test of ego by group, not assuming equal variances
STATA Exercise 8
Robustness and Small Samples

Two-sample methods are more robust than one-sample methods.
– More so if the two samples have similar shapes and sample sizes.
STATA assumes that the variances are the same (what the book calls “pooled t procedures”), unless you tell it the opposite, using the unequal option.
Small samples, as always, make the test less robust.
Pooled two-sample t procedures


Suppose the two Normal population
distributions have the same standard
deviation.
Then the t-statistic that compares the means
of samples from those two populations has
exactly a t-distribution.
Pooled two-sample t procedures


The common, but unknown, standard deviation of both populations is σ. The sample standard deviations s1 and s2 estimate σ.
The best way to combine these estimates is to take a “weighted average” of the two, using the dfs as the weights:

$s_p^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$
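For concreteness, a minimal Stata sketch of this weighted average (the sample sizes and standard deviations below are illustrative placeholders, not values from the slides):

* Pooled standard deviation from two sample standard deviations (hypothetical inputs)
scalar n1 = 10
scalar n2 = 12
scalar s1 = 4.2
scalar s2 = 5.0
scalar sp = sqrt(((n1 - 1)*s1^2 + (n2 - 1)*s2^2) / (n1 + n2 - 2))
display "pooled sd = " sp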
THE POOLED TWO-SAMPLE T PROCEDURES
(assuming σ is the same for both populations)

A level C confidence interval for μ1 − μ2 is

$(\bar{x}_1 - \bar{x}_2) \pm t^{*}\, s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}$

Here, t* is the value for the t(n1 + n2 − 2) density curve with area C between −t* and t*.

To test the hypothesis Ho: μ1 = μ2, compute the pooled two-sample t statistic

$t = \dfrac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$

and use P-values from the t(n1 + n2 − 2) distribution.
ttest ego, by(group)   // pooled two-sample test (equal variances assumed by default)