Lecture 16 Inferences Involving Two samples
Inferences based on TWO samples
• New concept: Independent versus dependent samples
• Comparing two population means: Independent sampling
• Comparing two population means: Dependent sampling
Inferences About Two Means
• In the previous chapter we used one sample
to make inferences about a single population.
Very often we are interested in comparing
two populations.
– 1) Is the average midterm grade in Stat 201.11
higher than the average midterm grade in Stat
201.12?
– 2) Is the average grade in Quiz #1 higher than
Quiz #2 in this section of Introductory Statistics?
Inferences about Two Means
• Each question is an example of testing a claim about two populations. However, there is a fundamental difference between 1) and 2).
• In 2) the samples are not independent, whereas in 1) they are.
• Why?
1. Different people in each class.
2. Same people writing different tests.
Recognizing independent
versus dependent samples
1. Is the average midterm grade in
Stat 201.11 higher than the average
midterm grade in Stat 201.12?
Independent samples
2. Is the average grade in Quiz #1
higher than Quiz #2 in this section
of Introductory Statistics?
Dependent samples
Definition. Independent and
Dependent Samples
Two samples are independent if the sample
selected from one population is not related
to the sample selected from the other
population.
If one sample is related to the other, the
samples are dependent. With dependent
samples we get two values for each
person, sometimes called paired-samples.
We consider first the case of
two dependent (or paired)
samples
•Calculations are very similar to
those in the previous chapter for a
CI or Test of Hypothesis involving
one sample
Organize work using a table
Pair   Sample 1   Sample 2   Difference
1      x1         y1         d1 = x1 - y1
2      x2         y2         d2 = x2 - y2
3      x3         y3         d3 = x3 - y3
...    ...        ...        ...
n      xn         yn         dn = xn - yn
We can now use the methods of the previous chapter to find a confidence interval for the population mean $\mu_d$ of the differences d between Sample 1 and Sample 2.
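As a minimal sketch of the table above, the differences and their summary statistics can be computed as follows. The quiz scores are hypothetical values invented for illustration; they do not come from the lecture.

```python
from statistics import mean, stdev

# Hypothetical quiz scores for the same five students (illustrative data only).
quiz1 = [7.0, 8.5, 6.0, 9.0, 7.5]
quiz2 = [6.5, 8.0, 6.5, 8.0, 7.0]

# One difference per pair: d_i = x_i - y_i
d = [x - y for x, y in zip(quiz1, quiz2)]

n = len(d)          # number of pairs
d_bar = mean(d)     # sample mean of the differences
s_d = stdev(d)      # sample standard deviation (n - 1 in the denominator)

print(f"n = {n}, d_bar = {d_bar:.2f}, s_d = {s_d:.3f}")
```

Once the differences are formed, the two paired samples reduce to a single sample of n values, which is why the one-sample methods apply.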
Notation for Two Dependent Samples
$\mu_d$ = mean value of the differences d for the population of paired data
$\bar{d}$ = mean value of the differences d for the paired sample data
$s_d$ = standard deviation of the differences d for the paired sample data
n = number of pairs of data
Confidence Interval for the Mean Difference
(Dependent Samples: Paired Data)
The $(1-\alpha) \cdot 100\%$ confidence interval for the mean difference $\mu_d$ is
$$\bar{d} - t_{n-1,\,\alpha/2}\,\frac{s_d}{\sqrt{n}} < \mu_d < \bar{d} + t_{n-1,\,\alpha/2}\,\frac{s_d}{\sqrt{n}}$$
where $\bar{d}$ and $s_d$ are the mean and the standard deviation of the differences in the paired sample data.
- If n < 30, we assume that the population of difference scores is normal.
- If n ≥ 30, z and $\sigma_d$ are used instead of t and $s_d$.
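The small-sample interval can be sketched in a few lines of Python. The function name `paired_ci` and the quiz scores are assumptions made for illustration, and the critical value is passed in by hand (read from a t-table) because the standard library has no t-distribution.

```python
from math import sqrt
from statistics import mean, stdev

def paired_ci(x, y, t_crit):
    """Confidence interval for mu_d from paired data.

    t_crit is the table value t_{n-1, alpha/2}, supplied by the caller.
    """
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    d_bar = mean(d)
    margin = t_crit * stdev(d) / sqrt(n)
    return d_bar - margin, d_bar + margin

# Hypothetical paired scores for five students (illustrative data only).
quiz1 = [7.0, 8.5, 6.0, 9.0, 7.5]
quiz2 = [6.5, 8.0, 6.5, 8.0, 7.0]

# t_{4, 0.025} = 2.776 from a standard t-table (95% confidence, n - 1 = 4).
low, high = paired_ci(quiz1, quiz2, t_crit=2.776)
print(f"95% CI for mu_d: ({low:.2f}, {high:.2f})")
```

For this made-up data the interval contains 0, so it would not support a claim that the two quiz means differ.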
Test Statistic for the Mean Difference (Dependent Samples)
For n < 30 the appropriate test statistic for testing the mean difference between paired samples is
$$t = \frac{\bar{d} - \mu_d}{s_d / \sqrt{n}}$$
with n-1 degrees of freedom.
For n ≥ 30 we use z:
$$z = \frac{\bar{d} - \mu_d}{\sigma_d / \sqrt{n}}$$
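The t statistic for paired data could be computed along these lines. This is a sketch only: `paired_t_statistic` is a hypothetical helper name, `mu_d0` stands for the value of $\mu_d$ claimed under the null hypothesis, and the scores are invented.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t_statistic(x, y, mu_d0=0.0):
    """Return (t, degrees of freedom) for H0: mu_d = mu_d0."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    t = (mean(d) - mu_d0) / (stdev(d) / sqrt(n))
    return t, n - 1

# Hypothetical paired scores for five students (illustrative data only).
quiz1 = [7.0, 8.5, 6.0, 9.0, 7.5]
quiz2 = [6.5, 8.0, 6.5, 8.0, 7.0]

t, df = paired_t_statistic(quiz1, quiz2)
print(f"t = {t:.3f} with {df} degrees of freedom")
```

The resulting t is then compared with the table value for n-1 degrees of freedom at the chosen level of significance.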
We now turn to the more
challenging case of independent
samples
Testing Claims about the Mean
Difference (Independent Samples)
• When making claims about the mean difference
between independent samples a different
procedure is used than that for
dependent/paired samples.
• Again there are different procedures for large samples (n ≥ 30) and small samples (n < 30).
• In the small sample case, we must assume that
both populations are normal and have equal
variances.
Example
Suppose we wish to compare two brands of 9-volt batteries, Brand 1 and Brand 2. Specifically, we would like to compare the mean life for the population of batteries of Brand 1, $\mu_1$, and the mean life for the population of batteries of Brand 2, $\mu_2$. To obtain a meaningful comparison we shall estimate the difference of the two population means by picking samples from the two populations.
For Brand 1 a sample of size 64 was chosen.
$\bar{x}_1 = 7.13$
$s_1 = 1.4$
For Brand 2 a sample of size 49 was chosen.
$\bar{x}_2 = 7.78$
$s_2 = 1.2$
From the data, a point estimate for $\mu_1$ would be 7.13, and a point estimate for $\mu_2$ would be 7.78.
It is therefore natural to take -0.65 hours as a point estimate for $(\mu_1 - \mu_2)$.
Point Estimator
(Independent Samples)
The estimate $\bar{x}_1 - \bar{x}_2$ is the best point estimator of $(\mu_1 - \mu_2)$.
Having found a point estimate, our next goal is to determine a confidence interval for $(\mu_1 - \mu_2)$.
Point Estimator
(Independent Samples)
To construct a confidence interval for $(\mu_1 - \mu_2)$ we need to know the distribution of its point estimator.
The distribution of $\bar{x}_1 - \bar{x}_2$ is normal with mean $(\mu_1 - \mu_2)$ and standard deviation
$$\sigma_{(\bar{x}_1 - \bar{x}_2)} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
where $n_1$ is the size of sample 1 and $n_2$ is the size of sample 2.
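For the battery samples, this standard deviation can be evaluated directly. The sketch below assumes, as is usual for large samples, that the sample standard deviations 1.4 and 1.2 stand in for the unknown $\sigma_1$ and $\sigma_2$; the function name `std_error_diff` is made up for illustration.

```python
from math import sqrt

def std_error_diff(sd1, n1, sd2, n2):
    """Standard deviation of (x1_bar - x2_bar) for two independent samples."""
    return sqrt(sd1**2 / n1 + sd2**2 / n2)

# Battery example: s1 = 1.4 with n1 = 64, s2 = 1.2 with n2 = 49.
se = std_error_diff(1.4, 64, 1.2, 49)
print(round(se, 3))  # 0.245
```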
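Confidence Interval for Difference in two Means
(Large samples or known variance)
$$(\bar{x}_1 - \bar{x}_2) - z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} < \mu_1 - \mu_2 < (\bar{x}_1 - \bar{x}_2) + z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$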
Example: Life span of Batteries
Let α = .05, so we are looking for the 95% confidence interval for the mean difference.
$$-0.65 \pm 1.96\sqrt{\frac{1.4^2}{64} + \frac{1.2^2}{49}} = -0.65 \pm 0.48$$
$$-1.13 < \mu_1 - \mu_2 < -0.17$$
What conclusion can you draw from the above?
We are 95% confident that the difference is negative. Thus, we are 95% confident that $\mu_1 - \mu_2 < 0$, that is, $\mu_1 < \mu_2$.
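The battery interval can be checked numerically. This is a minimal sketch, assuming the sample standard deviations are used in place of $\sigma_1$ and $\sigma_2$; `two_mean_ci` is a hypothetical helper, and the critical value comes from `statistics.NormalDist` in the Python standard library. Evaluating the formula with these inputs gives a margin of about 0.48.

```python
from math import sqrt
from statistics import NormalDist

def two_mean_ci(x1_bar, sd1, n1, x2_bar, sd2, n2, alpha=0.05):
    """Large-sample confidence interval for mu_1 - mu_2 (independent samples)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}, about 1.96 for alpha = .05
    margin = z * sqrt(sd1**2 / n1 + sd2**2 / n2)
    diff = x1_bar - x2_bar
    return diff - margin, diff + margin

low, high = two_mean_ci(7.13, 1.4, 64, 7.78, 1.2, 49)
print(f"({low:.2f}, {high:.2f})")  # (-1.13, -0.17)
```

Both endpoints are below zero, which is what supports the conclusion that Brand 1 has the shorter mean life.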
Test Statistic for Two Means:
Independent and large samples
$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$$
Example: Life span of Batteries
• Hypothesis Testing. I claim that the two
brands of batteries do not have the same
life span. Using a 5% level of significance,
test this claim.
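Example: Life span of Batteries
• Hypothesis
$H_0: \mu_1 = \mu_2$
$H_A: \mu_1 \neq \mu_2$
• Sample Data
$\bar{x}_1 = 7.13$, $s_1 = 1.4$, $n_1 = 64$
$\bar{x}_2 = 7.78$, $s_2 = 1.2$, $n_2 = 49$
• Test Statistic
$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} = \frac{(7.13 - 7.78) - 0}{\sqrt{\frac{1.4^2}{64} + \frac{1.2^2}{49}}} = -2.65$$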
Example: Life span of Batteries
• Critical Region: z < -1.96 or z > 1.96 (two-tailed test at the 5% level of significance)
• Decision: The test statistic lies in the critical region, therefore we reject H0. The samples provide sufficient evidence to claim that the batteries do indeed have different life spans.
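The whole test can be reproduced in a short script. This is a sketch under the same large-sample assumption (sample standard deviations used for $\sigma_1$ and $\sigma_2$); `two_mean_z` is a hypothetical helper name, and `diff0` is the difference $\mu_1 - \mu_2$ stated by the null hypothesis.

```python
from math import sqrt
from statistics import NormalDist

def two_mean_z(x1_bar, sd1, n1, x2_bar, sd2, n2, diff0=0.0):
    """z statistic for H0: mu_1 - mu_2 = diff0 (independent, large samples)."""
    return ((x1_bar - x2_bar) - diff0) / sqrt(sd1**2 / n1 + sd2**2 / n2)

alpha = 0.05
z = two_mean_z(7.13, 1.4, 64, 7.78, 1.2, 49)
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed critical value

# Reject H0 when the statistic falls in either tail of the critical region.
reject_h0 = abs(z) > z_crit
print(f"z = {z:.2f}, critical value = {z_crit:.2f}, reject H0: {reject_h0}")
```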
Exercise
Show that we would have rejected
the null hypothesis even if we had
used level of significance .008
(instead of .05). Thus…
We conclude that the mean
battery lives ARE different (p =
.008)
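One way to approach the exercise is to compute the two-tailed p-value for the observed statistic and compare it with the level of significance. The sketch below uses only the standard library and assumes the same battery summary statistics as in the example.

```python
from math import sqrt
from statistics import NormalDist

# Test statistic from the battery example.
z = (7.13 - 7.78) / sqrt(1.4**2 / 64 + 1.2**2 / 49)

# Two-tailed p-value: probability of a statistic at least this extreme
# in either direction when H0 is true.
p_value = 2 * NormalDist().cdf(-abs(z))

print(f"z = {z:.3f}, two-tailed p = {p_value:.4f}")
```

The null hypothesis is rejected at any level of significance that is at least as large as this p-value.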
Overview
• Comparing Two Populations:
• Mean (Small Dependent (paired)
Samples)
– Assumptions: Samples are random, plus either n>=30 or the population of differences is approximately normal
• Mean (Large Independent Samples)
– Assumptions: Both samples are randomly chosen, plus both sample sizes >= 30.
NOTE
In the case of SMALL independent
samples, one must use the t-distribution, additional
conditions must be satisfied, AND
one must use what is called a
pooled estimate of the variance.
You are not responsible for this
material