Transcript Chapter 6
Chapter 6
Introduction to Formal
Statistical Inference
Inferential Statistics
Two areas of statistics:
Descriptive Statistics
Inferential Statistics
Some Terminology
Quantities of a population are called
parameters and are typically denoted by
Greek letters
Quantities obtained from a sample are called
statistics and are typically denoted by
Roman letters
µ is a parameter, x̄ is a statistic
Example
As a means of trying to estimate the mean
GPA of Bucknell students, a sample of 100
students yielded an average of 3.12.
The parameter of interest is the population
mean GPA (µ)
The statistic is the sample mean GPA (x̄) of 3.12.
Parameters
For every parameter of interest, there are
typically a number of statistics that can be
used for estimation purposes
If one is interested in the population mean,
the sample mean or sample median can be
used
If one is interested in the population variance,
the sample variance, sample range, or
sample IQR can be used
Sampling Distributions
The sampling distribution for a sample
statistic is the probability distribution of the
statistic
Sampling distributions are just like the
probability distributions discussed earlier (i.e.,
sampling distributions have a mean and
variance, usually dependent upon the sample
size)
Central Limit Theorem
If X1, X2, …, Xn are iid random variables (with mean µ
and variance σ²), then for large n, the sample mean X̄
is approximately normally distributed.
That is, approximate probabilities can be calculated
using the normal distribution with mean µ and variance
σ²/n.
Z value for sample mean
$$ z = \frac{\bar{x} - E(X)}{\sqrt{\mathrm{Var}(X)/n}} = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} $$
Properties of
Sampling Distributions
A sample statistic used to estimate a population
parameter is called a point estimate (or point
estimator)
There are 2 properties that are desired for point
estimators:
The mean of the sampling distribution of the point
estimator is equal to the population parameter that it is
intending to estimate (i.e., the point estimator is an
unbiased estimator)
The point estimator has the smallest variance among all
unbiased point estimators
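The two properties can be checked by simulation. The sketch below is only an illustration: the Normal population (mean 3.0, standard deviation 0.5), the sample size of 25, and the function name compare_estimators are all assumptions made for the example, not values from the slides. It draws many samples and compares the sample mean and sample median as estimators of the population mean.

```python
import random
import statistics

def compare_estimators(mu=3.0, sigma=0.5, n=25, reps=4000, seed=1):
    """Simulate sampling distributions of the sample mean and sample median.

    mu, sigma, n, reps and seed are hypothetical values chosen for illustration.
    """
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    return {
        # Unbiasedness: both averages should sit close to mu.
        "mean_of_means": statistics.fmean(means),
        "mean_of_medians": statistics.fmean(medians),
        # Minimum variance: the smaller value marks the more precise estimator.
        "var_of_means": statistics.variance(means),
        "var_of_medians": statistics.variance(medians),
    }

result = compare_estimators()
print(result)
```

Under these assumptions both estimators center near the population mean, and the sample mean shows the smaller variance, which is why it is usually preferred for a Normal population.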
Sampling Distribution of Mean
x̄ is always an unbiased point estimator of µ
There are 2 things that are always true about the
sampling distribution of x̄:
$\mu_{\bar{x}} = \mu$
$\sigma_{\bar{x}} = \sigma / \sqrt{n}$
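A short derivation of the two facts (the mean of the sampling distribution of x̄ equals µ, and its standard deviation equals σ/√n), assuming X1, …, Xn are independent draws from a population with mean µ and variance σ². Independence is what allows the variances to add in the second line.

```latex
\begin{aligned}
E(\bar{X}) &= E\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
            = \frac{1}{n}\sum_{i=1}^{n} E(X_i)
            = \frac{1}{n}\, n\mu = \mu \\[4pt]
\mathrm{Var}(\bar{X}) &= \frac{1}{n^{2}}\sum_{i=1}^{n} \mathrm{Var}(X_i)
            = \frac{n\sigma^{2}}{n^{2}} = \frac{\sigma^{2}}{n}
\quad\Longrightarrow\quad
\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}
\end{aligned}
```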
Applications
If the population is Normally distributed, then
$\bar{X}_n \sim N(\mu_{\bar{x}}, \sigma_{\bar{x}})$
Example
The weights of the jars of baby food are
Normally distributed with a mean of 137.2 g
and a standard deviation of 1.6 g.
What is the probability that if one jar was
selected at random, its weight would be more
than 140 grams?
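One way to work this out is sketched below using Python's standard-library NormalDist. The variable names are chosen for illustration; the mean and standard deviation are the ones stated in the example.

```python
from statistics import NormalDist

# Population of jar weights as stated in the example: mean 137.2 g, sd 1.6 g.
jar_weight = NormalDist(mu=137.2, sigma=1.6)

# Standardize 140 g, then take the upper-tail area.
z_single = (140 - jar_weight.mean) / jar_weight.stdev
p_single = 1 - jar_weight.cdf(140)

print(f"z = {z_single:.2f}, P(X > 140) = {p_single:.4f}")
```

The z value is 1.75, and the upper-tail probability comes out to roughly 0.04, so about 4 jars in 100 would weigh more than 140 g.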
Example
What is the probability that if nine jars were
selected at random, their average weight
would be more than 140 grams?
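For the average of nine jars, the same calculation applies with the standard deviation replaced by σ/√n. A minimal sketch, with variable names that are illustrative only:

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 137.2, 1.6, 9

# Standard deviation of the sample mean (the standard error).
standard_error = sigma / sqrt(n)

z_mean = (140 - mu) / standard_error
# P(Z > z) equals P(Z < -z) by symmetry; the lower tail is numerically safer
# for probabilities this small.
p_mean = NormalDist().cdf(-z_mean)

print(f"standard error = {standard_error:.4f}, z = {z_mean:.2f}, P = {p_mean:.2e}")
```

The z value rises to 5.25, so the probability is far below one in a million. An average of nine jars varies much less than a single jar does.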
What if it’s not Normal?
If we don’t know the shape of the distribution or if
we know that it is not Normal, we can apply the
Central Limit Theorem to find out something about
the distribution.
For sufficiently large samples, the sampling
distribution of x̄ will be approximately Normal.
Typically, a sample size of 25 or 30 is “sufficiently
large”
The necessary sample size depends on the
skewness of the distribution of the population
The larger the sample size, the better the Normal approximation
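The effect of sample size on a skewed population can be seen in a small simulation. This is a sketch under assumptions: the exponential population, the sample sizes, and the helper names skewness and sample_mean_skewness are all chosen for illustration. It measures how skewed the simulated sampling distribution of the sample mean is for each n; a value near 0 indicates a roughly symmetric, Normal-like shape.

```python
import random
import statistics

def skewness(values):
    """Average cubed z-score; close to 0 for a symmetric distribution."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])

def sample_mean_skewness(n, reps=5000, seed=0):
    """Skewness of `reps` simulated sample means, each from n exponential draws."""
    rng = random.Random(seed)
    means = [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(reps)
    ]
    return skewness(means)

# Hypothetical sample sizes: very small, moderate, and the "25 or 30" rule of thumb.
skew_by_n = {n: sample_mean_skewness(n) for n in (2, 10, 30)}
for n, skew in skew_by_n.items():
    print(f"n = {n:2d}: skewness of sample means = {skew:.2f}")
```

For this population the theoretical skewness of the sample mean is 2/√n, so the simulated values should shrink toward 0 as n grows. A more heavily skewed population would need a larger n to reach the same level.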
Example
A soft-drink bottler purchases glass bottles from a vendor. The
bottles are required to have an internal pressure strength of at
least 150 pounds per square inch (psi). A prospective bottle
vendor claims that its production process yields bottles with a
mean internal strength of 157 psi and a standard deviation of 3
psi. The bottler strikes an agreement with the vendor that
permits the bottler to sample from the vendor’s production
process to verify the vendor’s claim. The bottler randomly
selects 40 bottles from the last 10,000 produced, measures the
internal pressure of each, and finds the mean pressure for the
sample to be 1.3 psi below the process mean cited by the
vendor.
Assuming the vendor’s claim to be true, what is the probability of
obtaining a sample mean this far or farther below the process
mean? What does your answer suggest about the validity of the
vendor’s claim?
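A sketch of the calculation, assuming the vendor's figures (mean 157 psi, standard deviation 3 psi) are correct and that n = 40 is large enough for the Central Limit Theorem to apply. Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

claimed_mean, claimed_sd, n = 157.0, 3.0, 40
observed_mean = claimed_mean - 1.3   # sample mean was 1.3 psi below the claim

# Standard deviation of the sample mean under the vendor's claim.
standard_error = claimed_sd / sqrt(n)

z = (observed_mean - claimed_mean) / standard_error
p_this_low = NormalDist().cdf(z)     # lower-tail area: this far or farther below

print(f"z = {z:.2f}, probability = {p_this_low:.4f}")
```

The z value is about -2.74 and the probability is roughly 0.003. A sample mean this low would occur only about 3 times in 1,000 if the claim were true, which casts doubt on the vendor's claim.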
Estimation Continued
Goals of Confidence Interval Estimation
Identify an interval of values likely to contain an
unknown parameter
Quantify how likely the interval is to contain the
correct value
Confidence Interval
A confidence interval for a parameter is a
data-based interval of numbers thought likely
to contain the parameter, with a stated
probability-based confidence or reliability
A Large-n Confidence Interval
for µ Involving σ
point estimate ± margin of error
Gallup Poll
www.gallup.com
Results are based on telephone interviews with 825 likely voters,
aged 18 and older, conducted Oct. 10-12, 2008. For results based
on the total sample of likely voters, one can say with 95%
confidence that the maximum margin of sampling error is ±4
percentage points.
Interviews are conducted with respondents on land-line telephones
(for respondents with a land-line telephone) and cellular phones (for
respondents who are cell-phone only).
In addition to sampling error, question wording and practical
difficulties in conducting surveys can introduce error or bias into the
findings of public opinion polls.
Back to Baby Food Jars
Suppose we want to estimate the actual
mean weight of all baby food jars produced
at the plant
How can we do this?
What do we know?
Given that σ = 1.6 grams
Suppose we take a sample of 50 jars and
find that their average weight is 142.7
grams.
Formula
Point estimate ± margin of error
$\bar{x} \pm z \cdot \frac{\sigma}{\sqrt{n}}$
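Applying the formula to the baby food sample (sample mean 142.7 g, σ = 1.6 g, n = 50) can be sketched as follows. The slides do not state a confidence level for this example, so 95% (z = 1.96) is an assumption here, and the function name z_interval is hypothetical.

```python
from math import sqrt

def z_interval(sample_mean, sigma, n, z):
    """Point estimate plus or minus margin of error, with sigma known."""
    margin = z * sigma / sqrt(n)
    return sample_mean - margin, sample_mean + margin

# Assumed 95% confidence, so z = 1.96.
low, high = z_interval(sample_mean=142.7, sigma=1.6, n=50, z=1.96)
print(f"95% confidence interval: ({low:.2f}, {high:.2f}) grams")
```

The margin of error is about 0.44 g, giving an interval of roughly 142.26 g to 143.14 g. A higher confidence level uses a larger z and gives a wider interval.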
Z’s for Confidence Intervals
Desired Confidence    z
80%                   1.28
90%                   1.645
95%                   1.96
98%                   2.33
99%                   2.58
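The z values in the table can be reproduced from the standard Normal distribution. For a confidence level C, the z value leaves (1 − C)/2 in each tail. A minimal sketch, where the function name z_critical is an illustrative choice:

```python
from statistics import NormalDist

def z_critical(confidence):
    """z value with the given central area under the standard Normal curve."""
    upper_tail = (1 - confidence) / 2
    return NormalDist().inv_cdf(1 - upper_tail)

for level in (0.80, 0.90, 0.95, 0.98, 0.99):
    print(f"{level:.0%}  z = {z_critical(level):.3f}")
```

Rounded to the precision used in the table, the computed values match the tabulated ones.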