Transcript pptx file

Sampling and Sampling
Distributions
Sampling Distribution Basics
• Sample statistics (the mean and standard
deviation are examples) vary from sample to
sample.
• Sample statistics are computed from random
variables from a population and, as such, are
random variables themselves.
• A sampling distribution is simply a probability
distribution of a sample statistic.
Sampling Distributions
• Generally we do not know the mean or variance
of a random variable; and
• Often the purpose of sampling is to estimate
parameters (mean, variance, etc.) of a
population. We use samples because:
– The population is too large for a census;
– It is too expensive to conduct a census; and/or
– The units must be destroyed in order to test the
variable(s) of interest, i.e. destructive testing.
Definitions
• A parameter is a numerical descriptive
measure of a population. It is calculated
from the observations in the population.
• A sample statistic is a numerical
descriptive measure of a sample. It is
calculated from the observations in the
sample.
Sample Statistics
• Sample mean (used to estimate the population
mean - a parameter);
• Sample median;
• Sample variance (used to estimate the
population variance - another parameter);
• Sample standard deviation (derived from the
sample variance and used to estimate the
population standard deviation - another parameter).
Example
• We want to estimate the population mean:
– Two possible sample statistics
• Sample mean, $\bar{x}$
• Sample median, $m$
– Which one should be used? For example, toss a die
three times and let x be the number of dots showing
on the up face. Suppose we have 2, 2, and 6 come
up:
• Expected value (of the population) is: $\mu = 3.5$
• Mean of x is: $\bar{x} = 10/3 = 3.33$
• While median is: $m = 2$
• Which is closer to the true mean (expected value)?
Example, cont.
– What if we had sample measurements of 3, 4,
and 6?
• Expected value (of the population) is still:
$\mu = 3.5$
• Mean of x is: $\bar{x} = 13/3 = 4.33$
• While median is: $m = 4$
• Now which is closer to the true mean (expected
value)?
Sampling Statistics
• Since sample statistics are random
variables, they must be compared on the
basis of their probability distributions - the
collection of values and associated
probabilities of each statistic that would
be obtained if the sampling experiment
were repeated a very large number of
times.
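The die example can be settled exactly, because three tosses have only 6^3 = 216 equally likely outcomes. The sketch below is a minimal illustration, not part of the original slides (the helper names `sampling_distribution`, `expected_value`, and `variance` are introduced here). It enumerates every outcome, builds the sampling distribution of the sample mean and of the sample median, and compares their centers and spreads.

```python
from fractions import Fraction
from itertools import product
from statistics import median

# Every possible sample of three tosses of a fair die (216 equally likely outcomes).
outcomes = list(product(range(1, 7), repeat=3))

def sampling_distribution(statistic):
    """Return {value: probability} for a statistic over all possible samples."""
    dist = {}
    for sample in outcomes:
        value = Fraction(statistic(sample))
        dist[value] = dist.get(value, Fraction(0)) + Fraction(1, len(outcomes))
    return dist

def expected_value(dist):
    """Mean of a discrete distribution given as {value: probability}."""
    return sum(value * prob for value, prob in dist.items())

def variance(dist):
    """Variance of a discrete distribution given as {value: probability}."""
    mu = expected_value(dist)
    return sum((value - mu) ** 2 * prob for value, prob in dist.items())

mean_dist = sampling_distribution(lambda s: Fraction(sum(s), len(s)))
median_dist = sampling_distribution(median)

for name, dist in (("mean", mean_dist), ("median", median_dist)):
    print(f"sample {name}: expected value = {float(expected_value(dist)):.3f}, "
          f"variance = {float(variance(dist)):.3f}")
```

For this symmetric population both sampling distributions are centered on 3.5, but the sample mean has the smaller variance, so on average it lands closer to the true mean than the median does.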
Definitions
• The sampling distribution for a sample
statistic (calculated from a sample of n
measurements) is the probability
distribution for the statistic; or
• The sampling distribution is a function that
gives the probability of every possible
value of a sample statistic for a specified
population and sample size.
More Definitions
• A point estimator of a population parameter is a
rule or formula that tells us how to use the
sample data to create a single number that can
be used as an estimate of the population
parameter.
• If a sample statistic has a sampling distribution
with a mean equal to the population parameter
the statistic is intended to estimate, the statistic
is said to be an unbiased estimator of the
parameter.
And More Definitions
• If the mean of the sampling distribution is not
equal to the parameter, the statistic is said to be
a biased estimator of the parameter.
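A common illustration of bias is the choice of divisor in the sample variance. The sketch below is an example added here, not taken from the slides. It reuses the three-toss die setting, enumerates all 216 samples, and computes the exact mean of the sampling distribution for two candidate estimators of the population variance: the sum of squared deviations divided by n - 1, and the same sum divided by n.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
n = 3
samples = list(product(faces, repeat=n))  # all 216 equally likely samples

# Population mean and variance of a single fair-die toss.
mu = Fraction(sum(faces), 6)
sigma2 = sum((Fraction(x) - mu) ** 2 for x in faces) / 6

def sum_of_squares(sample):
    """Sum of squared deviations from the sample mean."""
    xbar = Fraction(sum(sample), len(sample))
    return sum((x - xbar) ** 2 for x in sample)

# Mean of each estimator's sampling distribution (exact, by enumeration).
mean_divide_by_n_minus_1 = sum(sum_of_squares(s) / (n - 1) for s in samples) / len(samples)
mean_divide_by_n = sum(sum_of_squares(s) / n for s in samples) / len(samples)

print("population variance:   ", sigma2)
print("mean of SS/(n-1):      ", mean_divide_by_n_minus_1)
print("mean of SS/n:          ", mean_divide_by_n)
```

The n - 1 version has a sampling distribution whose mean equals the population variance, so it is unbiased. The n version averages to a smaller value, so it is a biased estimator of the variance.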
Sampling Distribution of the
Sample Mean
• Often we are interested in making an
inference about the mean of some
population, $\mu$. The sample mean $\bar{x}$ is a
good choice as the estimator for $\mu$.

Point Estimates
• $\bar{x}$ estimates $\mu$
• $s$ estimates $\sigma$
Variability among Samples
[Figure: axis running from 23 to 29 mpg, with two values marked at 23.5 mpg and 27.5 mpg.]
Normal Distribution for the Mean
[Figure: Normal Distribution Revisited. Useful Probabilities for Normal Distributions, with regions labeled 68%, 95%, and 99%.]
• Confidence intervals assume that the sample means
are normally distributed.
The Mean and Standard Deviation of the
Sampling Distribution of $\bar{x}$
• Regardless of the shape of the population relative
frequency distribution:
– The mean of the sampling distribution of $\bar{x}$ will equal
$\mu$, the mean of the sampled population.
– The standard deviation of the sampling distribution of $\bar{x}$
will equal $\sigma$, the standard deviation of the sampled
population, divided by the square root of the sample
size n:
$\sigma_{\bar{x}} = \sigma / \sqrt{n}$
(often referred to as the standard error of the mean)
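Both claims can be checked exactly on a small, deliberately skewed population. The sketch below is an illustration under stated assumptions: the three-value population and its probabilities are hypothetical, chosen only so that every possible sample can be enumerated, and draws are taken independently (with replacement).

```python
from fractions import Fraction
from itertools import product
from math import prod, sqrt

# Hypothetical skewed population: value -> probability (made up for illustration).
population = {0: Fraction(1, 2), 1: Fraction(1, 3), 10: Fraction(1, 6)}

def moments(dist):
    """Return (mean, variance) of a discrete distribution {value: probability}."""
    mean = sum(v * p for v, p in dist.items())
    var = sum((v - mean) ** 2 * p for v, p in dist.items())
    return mean, var

def sample_mean_distribution(n):
    """Exact sampling distribution of the mean of n independent draws."""
    dist = {}
    for draws in product(population.items(), repeat=n):
        xbar = Fraction(sum(v for v, _ in draws), n)
        prob = prod(p for _, p in draws)
        dist[xbar] = dist.get(xbar, 0) + prob
    return dist

mu, sigma2 = moments(population)
for n in (1, 2, 3, 4):
    mean_n, var_n = moments(sample_mean_distribution(n))
    print(f"n={n}: mean of x-bar = {mean_n}, "
          f"sd of x-bar = {sqrt(var_n):.4f}, sigma/sqrt(n) = {sqrt(sigma2 / n):.4f}")
```

For every sample size the mean of the sampling distribution matches the population mean, and its standard deviation matches sigma divided by the square root of n, even though the population is far from normal.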
Standard Error of the Mean
• A statistic that measures the variability of your
estimate is the standard error of the mean.
• It differs from the sample standard deviation
because:
– the sample standard deviation is a measure
of the variability of data;
– the standard error of the mean is a measure
of the variability of sample means.
• Standard error of the mean: $s_{\bar{X}} = s / \sqrt{n}$
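As a small illustration of the difference between the two quantities, the sketch below computes both from one sample. The mpg readings are hypothetical values made up for this example.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical sample of fuel-economy readings (mpg); values are made up for illustration.
sample = [23.5, 27.5, 25.0, 26.1, 24.4, 25.8, 24.9, 26.6]

n = len(sample)
s = stdev(sample)             # sample standard deviation (divides by n - 1)
standard_error = s / sqrt(n)  # estimated variability of the sample mean

print(f"mean = {mean(sample):.2f}, s = {s:.2f}, standard error = {standard_error:.2f}")
```

The standard error is always smaller than s by a factor of the square root of n, which reflects that a mean of several readings varies less than a single reading.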
Example
• Let x be a normally distributed random
variable with a mean of 89 and a standard
deviation of 12:
– What is the probability that the mean of a
sample of size n=19 will be between 85 and
93?
– What is the probability that the mean of a
sample of size n=40 will exceed 91?
Answer to First Part
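$\sigma_{\bar{x}} = \sigma / \sqrt{n}$
So, $\sigma_{\bar{x}} = 12 / \sqrt{19} = 2.753$
$z = (\bar{x} - \mu_{\bar{x}}) / \sigma_{\bar{x}}$
So, $z = (85 - 89) / 2.753 = -1.45$
And, $z = (93 - 89) / 2.753 = 1.45$
$p(-1.45 \le z \le 1.45) = 0.4265 + 0.4265 = 0.8530$
For $n = 29$: $p(-1.8 \le z \le 1.8) = 0.9266$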
Answer to Second Part
$\sigma_{\bar{x}} = \sigma / \sqrt{n}$
So, $\sigma_{\bar{x}} = 12 / \sqrt{40} = 1.897$
$z = (91 - 89) / 1.897 = 1.05$
$p(z > 1.05) = 0.500 - 0.3531 = 0.1469$
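The two table lookups can be cross-checked in code. This is a minimal sketch using Python's standard-library `statistics.NormalDist`; the helper name `sample_mean_dist` is introduced here for illustration.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 89, 12

def sample_mean_dist(n):
    """Sampling distribution of the mean for samples of size n."""
    return NormalDist(mu, sigma / sqrt(n))

# Part 1: P(85 <= x-bar <= 93) with n = 19
d19 = sample_mean_dist(19)
p_between = d19.cdf(93) - d19.cdf(85)

# Part 2: P(x-bar > 91) with n = 40
d40 = sample_mean_dist(40)
p_exceeds = 1 - d40.cdf(91)

print(f"standard errors: {d19.stdev:.3f}, {d40.stdev:.3f}")
print(f"part 1: {p_between:.4f}   part 2: {p_exceeds:.4f}")
```

The results can differ from the slide values in the third decimal place, because the slides round z to two decimals before reading the normal table.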
Example
• The population of orders for printing jobs at a
print shop is approximately normal with a mean
of 200 pages and a standard deviation of 40
pages. The shop is almost out of paper and it
has five orders that must be finished before a
shipment of paper can be expected. If the shop
has 1,200 sheets of paper left, what is the
probability that the five orders will not exhaust
the stock of paper?
• Hint: Find $P(\bar{x} \le 240)$, since 1,200 sheets / 5 orders = 240 pages per order on average.
Answer

$\sigma_{\bar{x}} = \sigma / \sqrt{n}$
So, $\sigma_{\bar{x}} = 40 / \sqrt{5} = 17.889$
$z = (240 - 200) / 17.889 = 2.236$
$p(z \le 2.236) = 0.500 + 0.4875 = 0.9875$
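The hint works because the stock lasts exactly when the total of the five orders is at most 1,200 pages. A short derivation, assuming the five orders are independent draws from the stated normal population ($T$ is a label introduced here for the total):

```latex
\begin{align*}
T &= \sum_{i=1}^{5} x_i, \qquad \bar{x} = T / 5 \\
P(T \le 1200) &= P(\bar{x} \le 240) \\
\mu_T &= 5 \cdot 200 = 1000, \qquad \sigma_T = 40\sqrt{5} = 89.44 \\
z &= \frac{1200 - 1000}{89.44} = 2.236
\end{align*}
```

Working with the total gives the same z-value as working with the sample mean, so either route leads to the same probability.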
Example
• Let x be a random variable with a mean of 1,200
and a standard deviation of 20:
– What is the probability that the mean of a sample of
size 80 will exceed 1,202?
– What is the probability that the mean of a sample of
size 50 will be less than 1,202?
– If the probability that the mean of a sample of size n
will exceed 1,201 is 0.25, what must n equal?
Answers
• Part 1 - 0.1867
• Part 2 - 0.7611
• Part 3 - 180
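These answers can be reproduced with a short script. The sketch below assumes the sample mean is approximately normal, which is reasonable for samples of size 50 and 80 even though x itself is not stated to be normal. Part 3 is computed twice: once with z rounded to 0.67, as a printed table would give, and once with the unrounded percentile.

```python
from math import ceil, sqrt
from statistics import NormalDist

mu, sigma = 1200, 20
std_normal = NormalDist()

# Part 1: P(x-bar > 1202), n = 80
p1 = 1 - NormalDist(mu, sigma / sqrt(80)).cdf(1202)

# Part 2: P(x-bar < 1202), n = 50
p2 = NormalDist(mu, sigma / sqrt(50)).cdf(1202)

# Part 3: find n with P(x-bar > 1201) = 0.25.
# Setting z = (1201 - mu) / (sigma / sqrt(n)) gives n = (z * sigma / (1201 - mu)) ** 2,
# where z is the 75th percentile of the standard normal.
z_exact = std_normal.inv_cdf(0.75)
n_exact = (z_exact * sigma / (1201 - mu)) ** 2
n_table = (0.67 * sigma / (1201 - mu)) ** 2  # z rounded to two decimals

print(f"part 1: {p1:.4f}   part 2: {p2:.4f}")
print(f"part 3: n = {ceil(n_table)} with z = 0.67, n = {ceil(n_exact)} with unrounded z")
```

The slide answer of 180 for Part 3 corresponds to the rounded z of 0.67; carrying more decimals in z gives a slightly larger n.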
Central Limit Theorem
• If a random sample of n observations is
selected from a population, when n is
sufficiently large, the sampling distribution
of $\bar{x}$ will be approximately a normal
distribution. Typically, a sample size of $n \ge 30$
is considered large enough. The larger the
sample size n, the better the normal
approximation.
Normality and the Central Limit Theorem
• To satisfy the assumption of normality, you can
do one of the following:
– verify that the population distribution is
approximately normal
– apply the central limit theorem
• The central limit theorem states that the distribution of
sample means is approximately normal, regardless of
the population distribution’s shape, if the sample size is
large enough.
• “Large enough” is usually approximately 30
observations. It is more if the data are heavily skewed,
and fewer if the data are symmetric.
Central Limit Theorem, Illustrated
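One way to see the theorem at work is a small simulation. The sketch below is an illustration under assumptions chosen here, not taken from the slides: the population is exponential with mean 1 (strongly right-skewed), the sample sizes are 1, 5, and 30, and skewness is used as a rough measure of how far the distribution of sample means is from a symmetric, normal-like shape.

```python
import random
from statistics import fmean, pstdev

rng = random.Random(42)  # fixed seed so the run is repeatable

def skewness(values):
    """Average cubed z-score; near 0 for a symmetric distribution."""
    m, s = fmean(values), pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

def simulate_sample_means(n, replications=10000):
    """Means of many samples of size n from an exponential population with mean 1."""
    return [fmean([rng.expovariate(1.0) for _ in range(n)])
            for _ in range(replications)]

results = {}
for n in (1, 5, 30):
    means = simulate_sample_means(n)
    results[n] = (fmean(means), pstdev(means), skewness(means))
    print(f"n={n:2d}: mean = {results[n][0]:.3f}, "
          f"sd = {results[n][1]:.3f}, skewness = {results[n][2]:.3f}")
```

As n grows, the mean of the sample means stays near the population mean, their standard deviation shrinks roughly like 1 over the square root of n, and the skewness moves toward 0, which is the pattern the central limit theorem describes.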
Sampling Distribution of the
Proportion
• We are often interested in making an inference
about the proportion of some population, p.
• Examples:
– Proportion of freshmen who graduate from Virginia
Tech in four years.
– Proportion of defective items in a lot.
– Proportion of a set of loans that will become
nonperforming.
The Sample Proportion and Standard
Deviation of the Number of Successes
• The sample proportion $\hat{p}$ is the value of the
random variable X divided by the sample
size: $\hat{p} = X / n$
• The standard deviation of the sampling
distribution is:
$\sigma_{\hat{p}} = \sqrt{p(1 - p) / n}$
Normal Approximation to the Sampling
Distribution of the Proportion
• Rules:
$np \ge 5$
$n(1 - p) \ge 5$
• Z-value for sampling distribution for $\hat{p}$:
$Z = (\hat{p} - p) / \sigma_{\hat{p}}$
Example
• If a sample of size 100 is taken from a
population of size 1000 and the population
contains 300 successes:
– What is the probability that the sample
proportion of successes will be 0.35 or more?
– What is the probability that the sample
proportion of successes will be between 0.25
and 0.45?
Answers
• Part a:
$\sigma_{\hat{p}} = \sqrt{p(1 - p) / n} = \sqrt{0.3(1 - 0.3) / 100} = 0.0458$
$z = (0.35 - 0.30) / 0.0458 = 1.09$
$p(\hat{p} \ge 0.35) = p(z \ge 1.09) = 0.5 - 0.3621 = 0.1379$
• Part b:
$p(0.25 \le \hat{p} \le 0.45) = p(-1.09 \le z \le 3.28) = 0.3621 + 0.5 = 0.8621$
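The same calculation can be wrapped in a small helper that also checks the rule of thumb for the normal approximation. This is a sketch, not a definitive implementation: the function name `sample_proportion_dist` is hypothetical, and, like the slides, it ignores any finite-population correction.

```python
from math import sqrt
from statistics import NormalDist

def sample_proportion_dist(p, n):
    """Normal approximation to the sampling distribution of the sample proportion."""
    if n * p < 5 or n * (1 - p) < 5:
        raise ValueError("rule of thumb not met: need np >= 5 and n(1 - p) >= 5")
    return NormalDist(p, sqrt(p * (1 - p) / n))

p = 300 / 1000  # population proportion of successes
dist = sample_proportion_dist(p, n=100)

part_a = 1 - dist.cdf(0.35)               # P(p-hat >= 0.35)
part_b = dist.cdf(0.45) - dist.cdf(0.25)  # P(0.25 <= p-hat <= 0.45)

print(f"standard deviation of p-hat: {dist.stdev:.4f}")
print(f"part a: {part_a:.4f}   part b: {part_b:.4f}")
```

Small differences from the table-based answers come from rounding z to two decimals before the lookup.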
Example
• An advertising campaign for a new perfume has
a goal of reaching 50% of the women in the
target group. Suppose a national sample of 300
women from the target group is drawn to see
how the campaign is working. 129 women in
the sample can recall seeing an ad or commercial
for the new perfume. If the population
proportion was 0.50, what is the probability of
observing a sample proportion of 0.43 or less in
a sample of 300?
Answer
$\sigma_{\hat{p}} = \sqrt{p(1 - p) / n} = \sqrt{0.5(1 - 0.5) / 300} = 0.0289$
$Z = (\hat{p} - p) / \sigma_{\hat{p}} = (0.43 - 0.5) / 0.0289 = -2.42$
$p(\hat{p} \le 0.43) = p(z \le -2.42) = 0.5 - 0.4922 = 0.0078$
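To get a feel for how good the normal approximation is here, it can be compared with an exact count. The sketch below assumes the 300 responses are independent with success probability 0.5, so the number of women who recall the ad is binomial. That modelling assumption is added here and is not stated on the slides.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p, observed = 300, 0.5, 129

# Normal approximation, as in the worked answer: P(p-hat <= 0.43)
sigma_p_hat = sqrt(p * (1 - p) / n)
normal_approx = NormalDist(p, sigma_p_hat).cdf(observed / n)

# Exact binomial probability P(X <= 129); with p = 0.5 every outcome
# sequence has probability 1 / 2**n, so only the counts are needed.
exact = sum(comb(n, k) for k in range(observed + 1)) / 2 ** n

print(f"sigma of p-hat:       {sigma_p_hat:.4f}")
print(f"normal approximation: {normal_approx:.4f}")
print(f"exact binomial:       {exact:.4f}")
```

Both routes give a probability below about 1%, so a sample proportion this low would be unusual if the campaign had really reached half of the target group.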
From Here To Inference
• The primary purpose of deriving a sampling
distribution is to support statistical inference.
• Probability distributions allow us to make
probability statements about values of a random
variable. Thus, knowledge of the population and
its parameters allows us to use the probability
distribution to make probability statements about
individual members of the population.
From Here To Inference (cont.)
• With sampling distributions, knowledge of the
parameters and some information about the
distribution allow us to make probability statements
about a sample statistic.
• In applying both probability distributions and
sampling distributions, we must know the value of
relevant parameters, a highly unlikely circumstance.
In the real world, parameters are almost always
unknown because they represent descriptive
measurements about extremely large populations.
• Statistical inference addresses this problem—now
we will assume that most population parameters are
unknown.