The Statistical Imagination

• Chapter 7. Using Probability
Theory to Produce Sampling
Distributions
© 2008 McGraw-Hill Higher Education
Sampling Error for a
Particular Sample
• Sampling error is the difference
between a calculated value of a sample
statistic and the true value of a
population parameter
• E.g., suppose the true mean GPA on
campus is 2.60. A sample reveals a
mean of 2.80. The 0.20 difference is
the sampling error for that sample
Estimating the Parameters of
a Population
• Point estimate – a statistic provided
without indicating a range of error
• Point estimates are limited because a
calculation made for sample data is
only an estimate of a population
parameter. This is apparent when
different results are found with
repeated sampling
Repeated Sampling
• Repeated sampling refers to the
procedure of drawing a sample and
computing its statistic, and then drawing a
second sample, a third, a fourth, and so on
• Repeated sampling reveals the nature of
sampling error
• An illustration of repeated sampling is
presented in Figure 7-1 in the text
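Repeated sampling can also be tried out in a few lines of code. The sketch below is only an illustration: the population of 5,000 GPAs, the sample size of 50, and the helper name repeated_sample_means are all assumptions made up for the example, not values from the text.

```python
import random
import statistics

def repeated_sample_means(population, n, repetitions, seed=0):
    """Draw `repetitions` random samples of size n; return each sample's mean."""
    rng = random.Random(seed)
    return [statistics.mean(rng.sample(population, n)) for _ in range(repetitions)]

# Hypothetical campus: 5,000 GPAs centered near 2.60, clipped to the 0.0-4.0 scale
rng = random.Random(42)
population = [min(4.0, max(0.0, rng.gauss(2.6, 0.5))) for _ in range(5000)]
mu = statistics.mean(population)  # the (normally unknown) population parameter

# Draw a first sample, a second, a third, and so on
means = repeated_sample_means(population, n=50, repetitions=5)
for i, xbar in enumerate(means, start=1):
    print(f"sample {i}: mean = {xbar:.2f}, sampling error = {xbar - mu:+.2f}")
```

Each pass prints a slightly different sample mean, and the difference from the population mean is that sample's sampling error.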
Symbols
• Sample statistics are usually
denoted with English letters (e.g., X-bar, s)
• Population parameters are
usually denoted with Greek letters (e.g., μ, σ)
What Repeated
Sampling Reveals
1. A given sample’s statistic will be slightly off from the
true value of its population’s parameter due to
sampling error
2. Sampling error is patterned, systematic, and
predictable
3. Sampling variability is mathematically predictable
from probability curves called sampling distributions
4. The larger the sample size, the smaller the range of
error
A Sampling Distribution
• A mathematical description of all possible
sampling event outcomes and the
probability of each one
• Sampling distributions are obtained from
repeated sampling
• Many sampling distributions can be
displayed as probability curves; partitioning
(Chapter 6) tells us the probability of
occurrence of any sample outcome
A Sampling Distribution
of Means
• A sampling distribution of means describes all
possible sampling event outcomes and the
probability of each outcome when means are
repeatedly calculated on an infinite number of
samples
• It answers the question: What would happen if we
repeatedly sampled a population using a sample
size of n, calculated each sample mean, and
plotted it on a histogram?
Features of a Sampling
Distribution of Means
• A sampling distribution of means is
illustrated in the text in Figure 7-3. It
reveals that for an interval/ratio variable,
means calculated from repeated samples
of a population take similar values,
which cluster around the value of
the population mean
• Simply put: Sample means center on the
value of the population parameter
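The clustering of sample means around the population mean can be seen by plotting many sample means as a rough text histogram. This is a minimal sketch under assumed values: the population of 10,000 scores (mean near 70), the sample size of 36, and the 2,000 repetitions are hypothetical choices for illustration.

```python
import random
import statistics
from collections import Counter

rng = random.Random(7)
# Hypothetical population of 10,000 interval/ratio scores
population = [rng.gauss(70, 12) for _ in range(10_000)]
mu = statistics.mean(population)

n = 36  # cases per sample
sample_means = [statistics.mean(rng.sample(population, n)) for _ in range(2000)]

# Group the sample means by nearest whole number and draw one '*' per 10 samples
bins = Counter(round(m) for m in sample_means)
for value in sorted(bins):
    print(f"{value:>3} | {'*' * (bins[value] // 10)}")

print(f"population mean      = {mu:.2f}")
print(f"mean of sample means = {statistics.mean(sample_means):.2f}")
```

The bars pile up around the population mean and thin out on either side, which is the pattern the slide describes.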
The Normal Curve as a
Sampling Distribution
• When repeatedly sampling means for sample
sizes greater than 121 cases, a histogram plot
of the resulting means will fit the normal curve
• The X-axis of a sampling distribution of
means consists of values of X-bar
• As with any normal curve, probabilities may
be calculated for specific values on the X-axis
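As a sketch of such a probability calculation, the code below treats the sampling distribution as a normal curve centered on the population mean GPA of 2.60 from the earlier example. The standard error of 0.05 is a hypothetical value chosen so that a sample mean of 2.70 sits exactly two standard errors above the center.

```python
from statistics import NormalDist

mu = 2.60              # population mean GPA (from the earlier example)
standard_error = 0.05  # hypothetical value, chosen for illustration

# The sampling distribution of means, modeled as a normal curve
sampling_dist = NormalDist(mu=mu, sigma=standard_error)

# Probability of drawing a sample whose mean is 2.70 or higher (z = 2)
p_at_least_270 = 1 - sampling_dist.cdf(2.70)

# Probability of a sample mean within 0.10 of the population mean
p_within_010 = sampling_dist.cdf(2.70) - sampling_dist.cdf(2.50)

print(f"P(X-bar >= 2.70)         = {p_at_least_270:.4f}")  # about 0.0228
print(f"P(2.50 <= X-bar <= 2.70) = {p_within_010:.4f}")    # about 0.9545
```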
The Standard Error
• The standard error is the standard
deviation of a sampling distribution
• It measures the spread of sampling error
that occurs when a population is sampled
repeatedly
• Rather than repeatedly sample, we
estimate standard errors using the sample
standard deviation of a single sample
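A minimal sketch of that single-sample estimate, assuming the usual formula of the sample standard deviation divided by the square root of n. The nine GPA values and the function name standard_error_of_mean are hypothetical, for illustration only.

```python
import math
import statistics

def standard_error_of_mean(sample):
    """Estimate the standard error of the mean as s / sqrt(n)."""
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 in the denominator)
    return s / math.sqrt(len(sample))

sample = [2.9, 3.1, 2.4, 2.7, 3.5, 2.2, 2.8, 3.0, 2.6]  # hypothetical GPAs
se = standard_error_of_mean(sample)

print(f"sample mean    = {statistics.mean(sample):.2f}")
print(f"standard error = {se:.3f}")  # about 0.129
```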
The Law of Large Numbers
• The law of large numbers states that the
larger the sample size, the smaller the
standard error of the sampling distribution
• The relationship between sample size and
sampling error is apparent in the formula
for the standard error of the mean; a large
n in the denominator produces a small
quotient
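The effect of n in the denominator is easy to check numerically. The sketch below assumes the usual estimate of the standard error of the mean, s divided by the square root of n, and holds a hypothetical sample standard deviation of 10 fixed while the sample size grows.

```python
import math

s = 10.0  # hypothetical sample standard deviation, held fixed

# Standard error of the mean, s / sqrt(n), at three sample sizes
standard_errors = {n: s / math.sqrt(n) for n in (25, 100, 400)}

for n, se in standard_errors.items():
    print(f"n = {n:>3}: standard error = {se:.2f}")
# n =  25: standard error = 2.00
# n = 100: standard error = 1.00
# n = 400: standard error = 0.50
```

Note that quadrupling the sample size only halves the standard error, because n enters the formula under a square root.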
The Central Limit Theorem
• The central limit theorem states that
regardless of the shape of the raw score
distribution of an interval/ratio variable, the
sampling distribution of its means:
1. will be normal when the sample size, n,
is greater than 121 cases and
2. will center on the true population mean
• This is illustrated in the text in Figure 7-8
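Both claims can be checked by simulation. The sketch below starts from a deliberately skewed raw score distribution (a hypothetical exponential population, an assumption made for this example) and samples it repeatedly with n = 122. If the sample means are roughly normal, about 68% of them should fall within one standard error of the population mean and about 95% within two.

```python
import random
import statistics

rng = random.Random(2008)
# Hypothetical, strongly right-skewed population (exponential, mean near 1)
population = [rng.expovariate(1.0) for _ in range(20_000)]
mu = statistics.mean(population)
sigma = statistics.pstdev(population)

n = 122  # just above the 121-case threshold
means = [statistics.mean(rng.sample(population, n)) for _ in range(3000)]

se = sigma / n ** 0.5  # standard error of the mean
within_1 = sum(abs(m - mu) <= se for m in means) / len(means)
within_2 = sum(abs(m - mu) <= 2 * se for m in means) / len(means)

print(f"population mean      = {mu:.3f}")
print(f"mean of sample means = {statistics.mean(means):.3f}")
print(f"share within 1 SE    = {within_1:.3f}")  # normal curve predicts about 0.68
print(f"share within 2 SE    = {within_2:.3f}")  # normal curve predicts about 0.95
```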
Sampling Distributions for
Nominal Variables
• A sampling distribution of proportions is
normal in shape when n times the smaller of
P and Q (where Q = 1 − P) is greater than 5
• The larger the sample size, the smaller the
range of error
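The size check for proportions can be written as a one-line test. This is a sketch of the rule of thumb stated above; the function name and the example values of P and n are hypothetical.

```python
def proportion_sampling_dist_is_normal(p, n):
    """Rule of thumb: n times the smaller of P and Q must be greater than 5."""
    q = 1 - p
    return min(p, q) * n > 5

print(proportion_sampling_dist_is_normal(0.5, 20))   # True:  0.5 * 20 = 10
print(proportion_sampling_dist_is_normal(0.1, 40))   # False: 0.1 * 40 = 4
print(proportion_sampling_dist_is_normal(0.9, 100))  # True:  0.1 * 100 = 10
```

The last case shows why the smaller of P and Q is used: a very common outcome (P = 0.9) is limited by its rare complement (Q = 0.1).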
Features of a Sampling Distribution for Nominal Variables
• The mean of a sampling distribution
of proportions is equal to the
probability of success ( P ) in the
population
• The standard error is estimated using
the probabilities of success and
failure in a sample
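A minimal sketch of that estimate, assuming the usual formula: the square root of the sample proportion of successes times the sample proportion of failures, divided by n. The counts (120 successes in a sample of 300) and the function name are hypothetical.

```python
import math

def standard_error_of_proportion(successes, n):
    """Estimate the standard error of a proportion as sqrt(Ps * Qs / n)."""
    p_s = successes / n  # sample proportion of successes
    q_s = 1 - p_s        # sample proportion of failures
    return math.sqrt(p_s * q_s / n)

# Hypothetical sample: 120 of 300 respondents fall in the "success" category
se = standard_error_of_proportion(120, 300)
print(f"sample proportion = {120 / 300:.2f}")
print(f"standard error    = {se:.4f}")  # about 0.0283
```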
Demystifying “Sampling
Distribution”
• Although we represent a sampling
distribution using formulas and a
probability curve, its occurrence is real
• To truly grasp how down to earth sampling
distributions are, generate them by
repeatedly sampling means and
proportions
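One way to try this for proportions is sketched below. The population proportion of 0.30, the sample size of 100, and the 2,000 repetitions are hypothetical values chosen for illustration. The simulated proportions should center on P, and their standard deviation should land close to the formula-based standard error.

```python
import random
import statistics

rng = random.Random(11)
P = 0.30  # hypothetical population proportion of "successes"
n = 100   # cases per sample; n times the smaller of P and Q is 30, above 5

def sample_proportion(rng, p, n):
    """Draw one sample of n cases and return the proportion of successes."""
    return sum(rng.random() < p for _ in range(n)) / n

# Repeatedly sample and record each sample's proportion
proportions = [sample_proportion(rng, P, n) for _ in range(2000)]

center = statistics.mean(proportions)
spread = statistics.stdev(proportions)
theoretical_se = (P * (1 - P) / n) ** 0.5

print(f"mean of sample proportions = {center:.3f} (P = {P})")
print(f"observed spread            = {spread:.4f}")
print(f"formula standard error     = {theoretical_se:.4f}")
```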
Keep Straight the
Assorted Symbols
• Take care to distinguish population
from sample from sampling
distribution
• Keep straight the symbols for each
of these entities
• See Figure 7-8 in the text
Statistical Follies
• An appreciation of sampling distributions is a
key part of understanding statistics
• Poor understanding of sampling distributions
leads the statistically unimaginative person to
treat point estimates as though they are true
values of a population’s parameters
• Remember: A second sample will produce a
different point estimate