10.1 Inference Concerning the Difference Between Two Means


Essentials of Business Statistics: Communicating
with Numbers
By Sanjiv Jaggia and Alison Kelly
Copyright © 2014 by McGraw-Hill Higher Education. All rights reserved.
Chapter 10 Learning Objectives
LO 10.1 Make inferences about the difference between two
population means based on independent sampling.
LO 10.2 Make inferences about the mean difference based on
matched-pairs sampling.
LO 10.3 Discuss features of the F distribution.
LO 10.4 Make inferences about the difference between three or
more population means using an analysis of variance
(ANOVA) test.
Statistical Inference Concerning Two Populations
10-2
10.1 Inference Concerning the Difference
Between Two Means
LO 10.1 Make inferences about the difference between two
population means based on independent sampling.
Independent Random Samples
Two (or more) random samples are considered
independent if the process that generates one
sample is completely separate from the process that
generates the other sample.
The samples are clearly delineated.
µ1 is the mean of the first population.
µ2 is the mean of the second population.
Confidence Interval for µ1 − µ2
X̄1 − X̄2 is a point estimator of µ1 − µ2.
The values of the sample means x̄1 and x̄2 are computed from
two independent random samples with n1 and n2 observations,
respectively.
The sampling distribution of X̄1 − X̄2 is assumed to be
normal.
A linear combination of normally distributed random variables
is also normally distributed.
If the underlying distribution is not normal, then by the central limit
theorem, the sampling distribution of X̄1 − X̄2 is approximately
normal only if both n1 > 30 and n2 > 30.
1. If σ1² and σ2² are known, a 100(1 − α)%
confidence interval of the difference between
two population means µ1 − µ2 is given by
$$(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
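The known-variance interval can be sketched in a few lines of Python. This is only an illustration: the function name and all input numbers are hypothetical, and the z value comes from the standard library's NormalDist rather than a z table.

```python
from math import sqrt
from statistics import NormalDist

def z_interval_diff_means(xbar1, xbar2, var1, var2, n1, n2, alpha=0.05):
    """CI for mu1 - mu2 when both population variances are known."""
    # z_{alpha/2}: the value with area alpha/2 in the right tail
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * sqrt(var1 / n1 + var2 / n2)
    diff = xbar1 - xbar2
    return diff - margin, diff + margin

# Hypothetical sample means, known variances, and sample sizes
low, high = z_interval_diff_means(78.5, 74.0, 36.0, 49.0, 40, 50, alpha=0.05)
print(round(low, 3), round(high, 3))
```

If the resulting interval excludes zero, the data suggest that the two population means differ at the chosen confidence level.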
2. If σ1² and σ2² are unknown but assumed equal, a
100(1 − α)% confidence interval of the difference
between two population means µ1 − µ2 is given by
$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2,\,df}\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
where the pooled sample variance is
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2},$$
s1² and s2² are the corresponding sample variances, and
df = n1 + n2 − 2.
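A minimal sketch of the pooled-variance interval follows. The function names and summary statistics are hypothetical. The critical value is passed in as an argument, on the assumption that it is read from a t table for df = n1 + n2 − 2.

```python
from math import sqrt

def pooled_variance(s1_sq, s2_sq, n1, n2):
    """Pooled sample variance s_p^2 under the equal-variance assumption."""
    return ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)

def pooled_t_interval(xbar1, xbar2, s1_sq, s2_sq, n1, n2, t_crit):
    """CI for mu1 - mu2; t_crit is t_{alpha/2, df} with df = n1 + n2 - 2."""
    sp_sq = pooled_variance(s1_sq, s2_sq, n1, n2)
    margin = t_crit * sqrt(sp_sq * (1 / n1 + 1 / n2))
    diff = xbar1 - xbar2
    return diff - margin, diff + margin

# Hypothetical summary statistics; 2.086 is t_{0.025, 20} from a t table
sp_sq = pooled_variance(4.0, 6.0, 10, 12)
low, high = pooled_t_interval(20.0, 18.0, 4.0, 6.0, 10, 12, t_crit=2.086)
print(sp_sq, round(low, 3), round(high, 3))
```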
3. If σ1² and σ2² are unknown and cannot be assumed to
be equal, a 100(1 − α)% confidence interval of the
difference between two population means
µ1 − µ2 is given by
$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2,\,df}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
where
$$df = \frac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}{\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}$$
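The degrees-of-freedom formula for the unequal-variance case is easy to get wrong by hand, so a small sketch may help. The function names and numbers are hypothetical. The result is rounded down to an integer, following the convention this section states for the corresponding test statistic, and the critical value is assumed to come from a t table.

```python
from math import floor, sqrt

def welch_df(s1_sq, s2_sq, n1, n2):
    """Degrees of freedom when variances are unknown and not assumed equal."""
    a = s1_sq / n1
    b = s2_sq / n2
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return floor(df)  # rounded down to the nearest integer

def unequal_variance_interval(xbar1, xbar2, s1_sq, s2_sq, n1, n2, t_crit):
    """CI for mu1 - mu2; t_crit is t_{alpha/2, df} for the df above."""
    margin = t_crit * sqrt(s1_sq / n1 + s2_sq / n2)
    diff = xbar1 - xbar2
    return diff - margin, diff + margin

# Hypothetical summary statistics; 2.074 is t_{0.025, 22} from a t table
df = welch_df(4.0, 9.0, 10, 15)
low, high = unequal_variance_interval(20.0, 18.0, 4.0, 9.0, 10, 15, t_crit=2.074)
print(df, round(low, 3), round(high, 3))
```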
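Hypothesis Test for µ1 − µ2
When conducting hypothesis tests concerning
µ1 − µ2, the competing hypotheses will take one
of the following forms:
Two-tailed test: H0: µ1 − µ2 = d0, HA: µ1 − µ2 ≠ d0
Right-tailed test: H0: µ1 − µ2 ≤ d0, HA: µ1 − µ2 > d0
Left-tailed test: H0: µ1 − µ2 ≥ d0, HA: µ1 − µ2 < d0
where d0 is the hypothesized difference between
µ1 and µ2.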
Test Statistic for Testing µ1 − µ2 when the
sampling distribution of X̄1 − X̄2 is normal
1. If σ1² and σ2² are known, then the test statistic is
assumed to follow the z distribution and its value is
calculated as
$$z = \frac{(\bar{x}_1 - \bar{x}_2) - d_0}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$
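As an illustration, the z statistic for the known-variance case can be computed as below. The function name and inputs are hypothetical. The two-tailed p-value is an added convenience that assumes the two-tailed form of the hypotheses.

```python
from math import sqrt
from statistics import NormalDist

def z_test_diff_means(xbar1, xbar2, var1, var2, n1, n2, d0=0.0):
    """z statistic and two-tailed p-value for H0: mu1 - mu2 = d0."""
    z = ((xbar1 - xbar2) - d0) / sqrt(var1 / n1 + var2 / n2)
    p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_tailed

# Hypothetical sample means, known variances, and sample sizes
z, p_value = z_test_diff_means(78.5, 74.0, 36.0, 49.0, 40, 50)
print(round(z, 3), round(p_value, 4))
```

For a one-tailed test, only the tail in the direction of HA would be used instead of doubling.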

2. If σ1² and σ2² are unknown but assumed equal, then
the test statistic is assumed to follow the
tdf distribution and its value is calculated as
$$t_{df} = \frac{(\bar{x}_1 - \bar{x}_2) - d_0}{\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$
where
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$
and df = n1 + n2 − 2.
3. If σ1² and σ2² are unknown and cannot be assumed
equal, then the test statistic is assumed to
follow the tdf distribution and its value is calculated as
$$t_{df} = \frac{(\bar{x}_1 - \bar{x}_2) - d_0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$
where
$$df = \frac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}{\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}$$
is rounded down to the nearest integer.
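The two t statistics for unknown variances differ only in the standard error and the degrees of freedom. A rough sketch with hypothetical function names and summary statistics shows them side by side. Each function returns the statistic together with its df, so the result can be compared against a critical value from a t table.

```python
from math import floor, sqrt

def pooled_t_statistic(xbar1, xbar2, s1_sq, s2_sq, n1, n2, d0=0.0):
    """t statistic when the unknown variances are assumed equal."""
    df = n1 + n2 - 2
    sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / df
    t = ((xbar1 - xbar2) - d0) / sqrt(sp_sq * (1 / n1 + 1 / n2))
    return t, df

def unequal_variance_t_statistic(xbar1, xbar2, s1_sq, s2_sq, n1, n2, d0=0.0):
    """t statistic when the unknown variances cannot be assumed equal."""
    a, b = s1_sq / n1, s2_sq / n2
    t = ((xbar1 - xbar2) - d0) / sqrt(a + b)
    # df is rounded down to the nearest integer
    df = floor((a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1)))
    return t, df

# Hypothetical summary statistics
t_pooled, df_pooled = pooled_t_statistic(20.0, 18.0, 4.0, 6.0, 10, 12)
t_unequal, df_unequal = unequal_variance_t_statistic(20.0, 18.0, 4.0, 9.0, 10, 15)
print(round(t_pooled, 3), df_pooled, round(t_unequal, 3), df_unequal)
```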
10.2 Inference Concerning Mean Differences
LO 10.2 Make inferences about the mean difference based on
matched-pairs sampling.
Matched-Pairs Sampling
The parameter of interest is the mean difference µD, where
D = X1 − X2 and the random variables
X1 and X2 are matched in a pair.
Both X1 and X2 are normally distributed or
n > 30.
For example, assess the benefits of a new medical
treatment by evaluating the same patients before
(X1) and after (X2) the treatment.
Confidence Interval for µD
A 100(1 − α)% confidence interval of the mean
difference µD is given by
$$\bar{d} \pm t_{\alpha/2,\,df}\,\frac{s_D}{\sqrt{n}}$$
where d̄ and sD are the mean and the standard
deviation, respectively, of the n sample differences,
and df = n − 1.
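A short sketch of the matched-pairs interval, using made-up before and after measurements for five subjects. The function name and data are hypothetical, and the critical value is assumed to be read from a t table with df = n − 1.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t_interval(x1, x2, t_crit):
    """CI for the mean difference mu_D based on matched pairs."""
    diffs = [a - b for a, b in zip(x1, x2)]  # D = X1 - X2 for each pair
    n = len(diffs)
    d_bar = mean(diffs)
    s_d = stdev(diffs)  # sample standard deviation of the differences
    margin = t_crit * s_d / sqrt(n)
    return d_bar - margin, d_bar + margin

# Hypothetical paired measurements; 2.776 is t_{0.025, 4} from a t table
before = [150, 142, 160, 155, 148]
after = [144, 140, 151, 150, 147]
low, high = paired_t_interval(before, after, t_crit=2.776)
print(round(low, 3), round(high, 3))
```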
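Hypothesis Test for µD
When conducting hypothesis tests concerning
µD, the competing hypotheses will take one
of the following forms:
Two-tailed test: H0: µD = d0, HA: µD ≠ d0
Right-tailed test: H0: µD ≤ d0, HA: µD > d0
Left-tailed test: H0: µD ≥ d0, HA: µD < d0
where d0 typically is equal to 0.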
Test Statistic for Hypothesis Tests About µD
The test statistic for hypothesis tests about µD
is assumed to follow the tdf distribution with
df = n − 1, and its value is
$$t_{df} = \frac{\bar{d} - d_0}{s_D/\sqrt{n}}$$
where d̄ and sD are the mean and standard
deviation, respectively, of the n sample differences,
and d0 is a given hypothesized mean difference.
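The matched-pairs test statistic can be sketched in the same spirit. The function name and the paired data are hypothetical. The returned df tells which row of a t table to consult.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t_statistic(x1, x2, d0=0.0):
    """t statistic and df for a test about the mean difference mu_D."""
    diffs = [a - b for a, b in zip(x1, x2)]
    n = len(diffs)
    t = (mean(diffs) - d0) / (stdev(diffs) / sqrt(n))
    return t, n - 1

# Hypothetical before/after measurements on the same five subjects
t_stat, df = paired_t_statistic([150, 142, 160, 155, 148],
                                [144, 140, 151, 150, 147])
print(round(t_stat, 3), df)
```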
10.3 Inference Concerning Differences
among Many Means
LO 10.3 Discuss features of the F distribution.
Inferences about the ratio of two population variances
are based on the ratio of the corresponding sample
variances, s1²/s2².
These inferences are based on a new distribution: the
F distribution.
It is common to use the notation F(df1,df2) when
referring to the F distribution.
The distribution of the ratio of the sample variances is
the F(df1,df2) distribution.
The F(df1,df2) distribution is a family of
distributions, each one defined by two degrees of
freedom parameters, one for the numerator and one
for the denominator.
Here df1 = n1 − 1 and df2 = n2 − 1.
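To make the two degrees-of-freedom parameters concrete, here is a small sketch that computes the ratio of two sample variances along with df1 and df2. The function name and both samples are hypothetical.

```python
from statistics import variance

def variance_ratio(sample1, sample2):
    """Ratio s1^2 / s2^2 with numerator and denominator degrees of freedom."""
    s1_sq = variance(sample1)  # sample variance, divisor n1 - 1
    s2_sq = variance(sample2)  # sample variance, divisor n2 - 1
    return s1_sq / s2_sq, len(sample1) - 1, len(sample2) - 1

# Hypothetical samples
ratio, df1, df2 = variance_ratio([12, 15, 11, 18, 14], [10, 11, 9, 12, 10, 8])
print(ratio, df1, df2)
```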
Fα,(df1,df2) represents a value such that the area in the
right tail of the distribution is α.
With two df parameters, F tables occupy several
pages.
LO 10.4 Make inferences about the difference between three or
more population means using an analysis of variance (ANOVA) test.



Analysis of Variance (ANOVA) is used to determine if
there are differences among three or more
population means.
One-way ANOVA compares population means based
on one categorical variable.
We utilize a completely randomized design,
comparing sample means computed for each
treatment to test whether the population means
differ.
The competing hypotheses for the one-way ANOVA:
H0: µ1 = µ2 = … = µc
HA: Not all population means are equal
We first compute the amount of variability between
the sample means.
Then we measure how much variability there is within
each sample.
A ratio of the first quantity to the second forms our
test statistic, which follows the F(df1,df2) distribution.
Between-sample variability is measured with the Mean
Square for Treatments (MSTR).
Within-sample variability is measured with the Mean Square
for Error (MSE).
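Calculating MSTR
1. Compute the grand mean:
$$\bar{\bar{x}} = \frac{\sum_{i=1}^{c}\sum_{j=1}^{n_i} x_{ij}}{n_T}$$
2. Compute the sum of squares due to treatments:
$$SSTR = \sum_{i=1}^{c} n_i\left(\bar{x}_i - \bar{\bar{x}}\right)^2$$
3. Compute the mean square for treatments:
$$MSTR = \frac{SSTR}{c - 1}$$
Calculating MSE
1. Compute the error sum of squares:
$$SSE = \sum_{i=1}^{c} (n_i - 1)s_i^2$$
2. Compute the mean square for error:
$$MSE = \frac{SSE}{n_T - c}$$
Here c is the number of treatments, ni, x̄i and si² are the size, mean
and variance of the ith sample, and nT is the total number of observations.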
Given H0 and HA (and α), the test statistic is calculated
as
$$F_{(df_1,\,df_2)} = \frac{MSTR}{MSE}$$
where df1 = c − 1 and df2 = nT − c.
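The whole one-way ANOVA calculation, from the grand mean through the F statistic, fits in a short function. The sketch below uses a hypothetical function name and three tiny made-up samples. It returns the statistic and both df values, which would then be compared against a critical value from an F table.

```python
from statistics import mean, variance

def one_way_anova(samples):
    """Return (F, df1, df2) for a list of independent samples."""
    c = len(samples)                        # number of treatments
    n_total = sum(len(s) for s in samples)  # n_T
    grand_mean = sum(sum(s) for s in samples) / n_total
    # Between-sample variability
    sstr = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in samples)
    mstr = sstr / (c - 1)
    # Within-sample variability
    sse = sum((len(s) - 1) * variance(s) for s in samples)
    mse = sse / (n_total - c)
    return mstr / mse, c - 1, n_total - c

# Hypothetical data: three treatments with three observations each
f_stat, df1, df2 = one_way_anova([[4, 5, 6], [6, 7, 8], [8, 9, 10]])
print(f_stat, df1, df2)
```

A large F value means the variability between sample means is large relative to the variability within samples, which is evidence against H0.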