Transcript Lecture09
Lecture 9
Sampling Distributions
1
Background
We want to learn about a feature of a population (a parameter).
In many situations, it is impossible to examine all elements of a
population because the elements are physically inaccessible, it is too costly
to do so, or the examination would destroy the item.
A sample is a relatively small subset of the total population.
We study a random sample to draw conclusions about a population;
this is where statistics come into the picture.
Statistics, such as the sample mean and sample variance, computed
from sample measurements, vary from sample to sample.
Therefore, they are random variables.
The probability distribution of a statistic is called a sampling
distribution.
2
Sampling Distributions
A sampling distribution is the distribution of all possible values of a
statistic for samples of a given size selected from a population.
[Diagram: Sampling Distributions, branching into the Sampling Distribution of the Mean and the Sampling Distribution of the Proportion]
3
Developing a
Sampling Distribution
Assume there is a population of size N = 4, with individuals A, B, C, and D.
The random variable X is the age of an individual.
Values of X: 18, 20, 22, 24 (years)
4
Developing a
Sampling Distribution
(continued)
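Summary Measures for the Population Distribution:
$\mu = \frac{\sum X_i}{N} = \frac{18 + 20 + 22 + 24}{4} = 21$
$\sigma = \sqrt{\frac{\sum (X_i - \mu)^2}{N}} = 2.236$
[Figure: population distribution P(x); the x-axis shows the ages 18, 20, 22, 24 for A, B, C, D; the y-axis runs from 0 to .3]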
5
Sampling with replacement
Samples   Age      Sample mean
A, A      18, 18   18
A, B      18, 20   19
A, C      18, 22   20
A, D      18, 24   21
B, A      20, 18   19
B, B      20, 20   20
B, C      20, 22   21
B, D      20, 24   22
C, A      22, 18   20
C, B      22, 20   21
C, C      22, 22   22
C, D      22, 24   23
D, A      24, 18   21
D, B      24, 20   22
D, C      24, 22   23
D, D      24, 24   24
6
Developing a
Sampling Distribution
(continued)
Sampling Distribution of All Sample Means

16 Sample Means
1st Obs   2nd Observation
          18   20   22   24
18        18   19   20   21
20        19   20   21   22
22        20   21   22   23
24        21   22   23   24

[Figure: Sample Means Distribution, $P(\bar{X})$ plotted against $\bar{X}$ = 18, 19, ..., 24; the y-axis runs from 0 to .3]
7
Developing a
Sampling Distribution
(continued)
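Summary Measures of this Sampling Distribution (note
that N=16 for the population of sample means):
$\mu_{\bar{X}} = \frac{\sum \bar{X}_i}{N} = \frac{18 + 19 + \cdots + 24}{16} = 21$
$\sigma_{\bar{X}} = \sqrt{\frac{\sum (\bar{X}_i - \mu_{\bar{X}})^2}{N}} = \sqrt{\frac{(18 - 21)^2 + (19 - 21)^2 + \cdots + (24 - 21)^2}{16}} = 1.58$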
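The enumeration on these slides can be reproduced mechanically. The following is a minimal Python sketch, assuming the four ages from the example and ordered samples of size 2 drawn with replacement; the variable names are illustrative and not part of the lecture.

```python
from collections import Counter
from itertools import product
from math import sqrt

ages = [18, 20, 22, 24]  # population values of X from the slides

# All ordered samples of size n = 2 drawn with replacement (4 * 4 = 16 samples)
sample_means = [(a + b) / 2 for a, b in product(ages, repeat=2)]

# Sampling distribution: relative frequency of each possible sample mean
counts = Counter(sample_means)
distribution = {m: counts[m] / len(sample_means) for m in sorted(counts)}

# Mean and standard deviation of the 16 sample means (dividing by N = 16)
mu_xbar = sum(sample_means) / len(sample_means)
sigma_xbar = sqrt(sum((m - mu_xbar) ** 2 for m in sample_means) / len(sample_means))

print(distribution)
print(mu_xbar, round(sigma_xbar, 2))
```

The mean of the sample means equals the population mean of 21, and the standard deviation of the sample means rounds to 1.58, which is smaller than the population standard deviation of 2.236.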
8
Comparing the Population with its
Sampling Distribution (with replacement)
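Population (N = 4): $\mu = 21$, $\sigma = 2.236$
Sample Means Distribution (n = 2): $\mu_{\bar{X}} = 21$, $\sigma_{\bar{X}} = 1.58$
[Figure: side-by-side plots of the population distribution P(X) over 18, 20, 22, 24 (A, B, C, D) and the sample means distribution $P(\bar{X})$ over 18, 19, ..., 24]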
9
Mean and standard error of the sample
Mean (sample with replacement)
The mean of the distribution of the sample mean:
$\mu_{\bar{X}} = \mu$
A measure of the variability in the mean from sample to sample is
given by the Standard Error of the Mean:
$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$
(This assumes that sampling is with replacement, or that
sampling is without replacement from an infinite population.)
Note that the standard error of the mean decreases as the sample
size increases.
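To see how quickly the standard error shrinks, here is a small Python sketch. It assumes the population standard deviation 2.236 from the age example; the helper name `standard_error` and the chosen sample sizes are illustrative.

```python
from math import sqrt

def standard_error(sigma: float, n: int) -> float:
    """Standard error of the mean, sigma / sqrt(n)."""
    return sigma / sqrt(n)

sigma = 2.236  # population standard deviation from the age example
for n in (2, 4, 16, 64):
    # Quadrupling n halves the standard error
    print(n, round(standard_error(sigma, n), 3))
```

With n = 2 this reproduces the 1.58 found by enumerating all 16 sample means.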
10
If the Population is Normal
If a population is normal with mean μ and standard
deviation σ,
the sampling distribution of $\bar{X}$ is also normally distributed
with
$\mu_{\bar{X}} = \mu$ and $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$
Or, equivalently, the sampling distribution of $\sum_{i=1}^{n} X_i$ is
normally distributed with
$\mu_{\sum X_i} = n\mu$ and $\sigma_{\sum X_i} = \sqrt{n}\,\sigma$
11
Sampling Distribution Properties
(continued)
As n increases, $\sigma_{\bar{x}}$ decreases.
[Figure: sampling distributions of $\bar{x}$ centered at μ, one for a smaller sample size and one for a larger sample size]
12
If the Population is not normal
The central limit theorem states that when the number of
observations in each sample (called the sample size) gets large enough,
the sampling distribution of $\bar{X}$ is approximately normally
distributed with
$\mu_{\bar{X}} = \mu$ and $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$
Or, equivalently, the sampling distribution of $\sum_{i=1}^{n} X_i$ is also
approximately normally distributed with
$\mu_{\sum X_i} = n\mu$ and $\sigma_{\sum X_i} = \sqrt{n}\,\sigma$
13
Z value for means
Standardize the sample mean:
$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$
14
Visualizing the Central Limit
Theorem
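Sampling distribution properties:
Central Tendency: $\mu_{\bar{x}} = \mu$
Variation: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
[Figure: a population distribution with mean μ and, below it, the sampling distribution of $\bar{x}$ (becomes normal as n increases), drawn for a smaller and a larger sample size]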
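The two properties above can be checked by simulation. The sketch below is only an illustration: it assumes an Exponential(1) population (mean 1, standard deviation 1), which is not from the lecture, and the function name `simulate_sample_means` is hypothetical.

```python
import random
from statistics import mean, pstdev

random.seed(0)  # fixed seed so the run is repeatable

def simulate_sample_means(n, reps=2000):
    """Draw `reps` samples of size n from an Exponential(1) population
    and return the list of their sample means."""
    return [sum(random.expovariate(1.0) for _ in range(n)) / n for _ in range(reps)]

for n in (2, 30, 100):
    means = simulate_sample_means(n)
    # Columns: n, mean of the sample means, their standard deviation, sigma / sqrt(n)
    print(n, round(mean(means), 3), round(pstdev(means), 3), round(1 / n ** 0.5, 3))
```

The simulated means should stay close to μ = 1 for every n, while their spread should track σ/√n. A histogram of `means` for each n would show the shape moving from skewed toward bell-shaped.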
15
How Large is Large Enough?
For most distributions, n > 30 will give a
sampling distribution that is nearly normal.
For fairly symmetric distributions, n > 15 is usually enough.
Recall that, for normal population distributions,
the sampling distribution of the mean is always
normally distributed, regardless of the sample size n.
16
Calculating probabilities
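Suppose we want to find
$P(a \le \bar{X} \le b)$
If the population is normal, then regardless of the value of
n:
$P(a \le \bar{X} \le b) = P\left(\frac{a - \mu}{\sigma/\sqrt{n}} \le Z \le \frac{b - \mu}{\sigma/\sqrt{n}}\right)$
If the population is not normal, then, when n is large
enough (n > 30):
$P(a \le \bar{X} \le b) \approx P\left(\frac{a - \mu}{\sigma/\sqrt{n}} \le Z \le \frac{b - \mu}{\sigma/\sqrt{n}}\right)$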
17
Example
Suppose a population has mean μ = 10 and
standard deviation σ = 3. Suppose a random
sample of size n = 36 is selected.
What is the probability that the sample mean is
between 9.7 and 10.3?
18
Example
(continued)
Solution:
Even if the population is not normally
distributed, the central limit theorem can be
used (n > 30)
… so the sampling distribution of $\bar{x}$ is
approximately normal
… with mean $\mu_{\bar{x}} = 10$
… and standard deviation $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{3}{\sqrt{36}} = 0.5$
19
Example
(continued)
Solution (continued):
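$P(9.7 \le \bar{X} \le 10.3) = P\left(\frac{9.7 - 10}{3/\sqrt{36}} \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le \frac{10.3 - 10}{3/\sqrt{36}}\right)$
$= P(-0.6 \le Z \le 0.6) = 0.4514$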
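[Figure: a population distribution of unknown shape with μ = 10; a sample is drawn from it; the sampling distribution of $\bar{X}$ is centered at $\mu_{\bar{X}} = 10$, with 9.7 and 10.3 marked on the $\bar{x}$ axis]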
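The calculation in this example can be checked numerically. The following sketch uses `statistics.NormalDist` from the Python standard library; the helper name `prob_mean_between` is illustrative, and it assumes the normal approximation for the sample mean holds.

```python
from math import sqrt
from statistics import NormalDist

def prob_mean_between(a, b, mu, sigma, n):
    """P(a <= X-bar <= b) when X-bar is (approximately) N(mu, sigma^2 / n)."""
    sampling_dist = NormalDist(mu, sigma / sqrt(n))
    return sampling_dist.cdf(b) - sampling_dist.cdf(a)

# Values from the example: mu = 10, sigma = 3, n = 36
p = prob_mean_between(9.7, 10.3, mu=10, sigma=3, n=36)
print(round(p, 4))
```

The standard error here is 3/6 = 0.5, so the interval corresponds to −0.6 ≤ Z ≤ 0.6 and the probability comes out at roughly 0.45.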
20
One more example
Time spent using e-mail per session is normally
distributed with μ = 8 minutes and σ = 2 minutes.
1. If a random sample of 25 sessions were selected, what
proportion of the sample means would be between 7.8
and 8.2 minutes?
21
Example (Cont’d)
2. If a random sample of 100 sessions were selected, what proportion
of the sample means would be between 7.8 and 8.2 minutes?
3. What sample size would you suggest if it is desired to have at least
0.90 probability that the sample mean is within 0.2 of the population
mean?
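One way to check answers to the three questions is sketched below. It assumes session time is normal with mean 8 and standard deviation 2 minutes, so the sample mean is exactly normal for any n; the helper name `prob_within` is illustrative and this is not presented as the lecture's own solution.

```python
from math import ceil, sqrt
from statistics import NormalDist

mu, sigma = 8.0, 2.0  # minutes, assumed from the problem statement
z = NormalDist()      # standard normal distribution

def prob_within(delta, n):
    """P(|X-bar - mu| <= delta) when X-bar ~ N(mu, sigma^2 / n)."""
    se = sigma / sqrt(n)
    return z.cdf(delta / se) - z.cdf(-delta / se)

print(round(prob_within(0.2, 25), 4))   # question 1: n = 25
print(round(prob_within(0.2, 100), 4))  # question 2: n = 100

# Question 3: smallest n with P(|X-bar - mu| <= 0.2) >= 0.90,
# which requires z_{0.95} * sigma / sqrt(n) <= 0.2
n_required = ceil((z.inv_cdf(0.95) * sigma / 0.2) ** 2)
print(n_required)
```

Questions 1 and 2 differ only in n: the interval is the same, but the larger sample has a smaller standard error, so more of the sampling distribution falls inside it.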
22
Sampling Distribution
of the Proportion
23
Population Proportions
In Bernoulli trials, let
π = the proportion of successes
Recall that Y = the number of successes in n Bernoulli trials follows
Bin(n, π)
For the ith Bernoulli trial, define
$X_i = 1$ if the ith outcome is a "success"
$X_i = 0$ if the ith outcome is a "failure"
Then, obviously,
$E(X_i) = \pi$ and $\sigma(X_i) = \sqrt{\pi(1 - \pi)}$
24
Population proportions (Cont’d)
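For large n, apply the CLT to the sample mean and sum:
$p = \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ is approximately distributed as $N\left(\pi, \left(\sqrt{\frac{\pi(1 - \pi)}{n}}\right)^2\right)$
$Y = \sum_{i=1}^{n} X_i$ is approximately distributed as $N\left(n\pi, \left(\sqrt{n\pi(1 - \pi)}\right)^2\right)$
How large is large?
$n\pi \ge 5$ and $n(1 - \pi) \ge 5$
Or
$np \ge 5$ and $n(1 - p) \ge 5$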
25
Z-Value for Proportions
Standardize p to a Z value with the formula:
$Z = \frac{p - \pi}{\sigma_p} = \frac{p - \pi}{\sqrt{\frac{\pi(1 - \pi)}{n}}}$
26
Example
If the true proportion of voters who support
Proposition A is π = 0.4, what is the probability
that a sample of size 200 yields a sample
proportion between 0.40 and 0.45?
i.e.: if π = 0.4 and n = 200, what is
P(0.40 ≤ p ≤ 0.45) ?
27
Example
(continued)
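Find $\sigma_p$: $\sigma_p = \sqrt{\frac{\pi(1 - \pi)}{n}} = \sqrt{\frac{0.4(1 - 0.4)}{200}} = 0.03464$
Convert to standard normal:
$P(0.40 \le p \le 0.45) = P\left(\frac{0.40 - 0.40}{0.03464} \le Z \le \frac{0.45 - 0.40}{0.03464}\right)$
$= P(0 \le Z \le 1.44) = 0.4251$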
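The same steps can be sketched in Python. This assumes the normal approximation to the sample proportion and uses the rule of thumb from the earlier slide as a sanity check; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

pi_true, n = 0.4, 200  # true proportion and sample size from the example

# Rule of thumb for using the normal approximation
assert n * pi_true >= 5 and n * (1 - pi_true) >= 5

sigma_p = sqrt(pi_true * (1 - pi_true) / n)  # standard error of p
sampling_dist = NormalDist(pi_true, sigma_p)
prob = sampling_dist.cdf(0.45) - sampling_dist.cdf(0.40)

print(round(sigma_p, 5), round(prob, 4))
```

The code keeps the unrounded Z value, so its result can differ from the table-based answer in the fourth decimal place, since the table lookup rounds Z to 1.44 first.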
28
Review example
The number of claims received by an automobile insurance company
on collision insurance in one day follows this probability
distribution:
x      0     1     2     3     4
p(x)   0.65  0.20  0.10  0.03  0.02
with
$\mu = \sum_{x=0}^{4} x\,p(x) = 0.57$
$\sigma = \sqrt{\sum_{x=0}^{4} x^2 p(x) - \mu^2} = 0.93$
Suppose the numbers of claims received are independent from day to
day.
29
Review example (cont’d)
For a 50-day period, find the probability of the following
events:
1) The total number of claims exceeds 20
2) On more than 20 days, at least one claim is received
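A possible way to approach both parts is sketched below. It assumes the CLT applies to the 50-day total in part 1 and to the binomial count of days in part 2, and it uses no continuity correction; that is a simplifying assumption, not something the slides specify. Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

x_values = [0, 1, 2, 3, 4]
probs = [0.65, 0.20, 0.10, 0.03, 0.02]
days = 50

# Daily mean and standard deviation of the number of claims
mu = sum(x * p for x, p in zip(x_values, probs))
sigma = sqrt(sum(x ** 2 * p for x, p in zip(x_values, probs)) - mu ** 2)

# 1) The total over 50 days is approximately N(50 * mu, (sqrt(50) * sigma)^2)
total = NormalDist(days * mu, sqrt(days) * sigma)
p_total_gt_20 = 1 - total.cdf(20)

# 2) Days with at least one claim: Y ~ Bin(50, pi) with pi = 1 - p(0),
#    approximated by N(50 * pi, 50 * pi * (1 - pi))
pi_claim = 1 - probs[0]
busy_days = NormalDist(days * pi_claim, sqrt(days * pi_claim * (1 - pi_claim)))
p_days_gt_20 = 1 - busy_days.cdf(20)

print(round(mu, 2), round(sigma, 2))
print(round(p_total_gt_20, 4), round(p_days_gt_20, 4))
```

Part 1 uses the sum form of the CLT, while part 2 first reduces each day to a Bernoulli trial ("at least one claim" or not) and then uses the normal approximation for the count of successes.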
30
Sampling distribution of difference
of two independent populations
An important estimation problem involves the
comparison of the means of two populations.
For example, you may want to make
comparisons like these:
The average scores on GRE for students who
majored in mathematics versus chemistry
The average income for male and female college
graduates
The proportion of patients receiving different
medications who recovered from a certain disease
31
Sample distributions of difference
of two independent sample means
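Suppose there are two populations:
Population   Mean       S.d.
I            $\mu_1$    $\sigma_1$
II           $\mu_2$    $\sigma_2$
Independent random samples of n₁ and n₂ observations have
been selected from the two populations, with sample means $\bar{X}_1$
and $\bar{X}_2$ respectively.
Recall that when n₁ and n₂ are large, $\bar{X}_1$ and $\bar{X}_2$ are approximately
normally distributed with
$E(\bar{X}_1) = \mu_1$, $\sigma_{\bar{X}_1} = \frac{\sigma_1}{\sqrt{n_1}}$
$E(\bar{X}_2) = \mu_2$, $\sigma_{\bar{X}_2} = \frac{\sigma_2}{\sqrt{n_2}}$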
32
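Since the two samples are independent,
$\mu_{\bar{X}_1 - \bar{X}_2} = \mu_1 - \mu_2$
$\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$
Standardize:
$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$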
33
Example
A light bulb factory operates two different
types of machines. The mean life expectancy
is 385 hours from machine I and 365 hours
from machine II. The process standard
deviation of life expectancy of machine I is 110
hours and of machine II is 120 hours.
What is the probability that the average life
expectancy of a random sample of 100 light
bulbs from Machine I is shorter than the
average life expectancy of 100 light bulbs from
Machine II?
34
Example (Cont’d)
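Note that
$n_1 = 100$, $n_2 = 100$
$\mu_1 = 385$, $\mu_2 = 365$
$\sigma_1 = 110$, $\sigma_2 = 120$
Therefore
$P(\bar{X}_1 - \bar{X}_2 < 0) = P\left(Z < \frac{0 - (385 - 365)}{\sqrt{\frac{110^2}{100} + \frac{120^2}{100}}}\right)$
$= P\left(Z < \frac{-20}{16.28}\right) = P(Z < -1.23) = 0.1093$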
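As a check on the arithmetic, the difference of the two sample means can be modelled directly as a normal distribution. This is a minimal sketch using the example's numbers; the function name `diff_of_means_dist` is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def diff_of_means_dist(mu1, sigma1, n1, mu2, sigma2, n2):
    """Approximate distribution of X-bar1 - X-bar2 for independent samples."""
    return NormalDist(mu1 - mu2, sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2))

# Machine I: mean 385, s.d. 110; Machine II: mean 365, s.d. 120; 100 bulbs each
diff = diff_of_means_dist(385, 110, 100, 365, 120, 100)

# P(X-bar1 - X-bar2 < 0): Machine I's sample average is the shorter one
p_shorter = diff.cdf(0)
print(round(diff.stdev, 2), round(p_shorter, 4))
```

The code keeps the unrounded Z value of about −1.2286, so the printed probability may differ slightly from the table value obtained after rounding Z to −1.23.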
35
Sampling distribution of difference of
two independent sample proportions
Assume that independent random samples of n₁ and n₂
observations have been selected from binomial populations with
parameters $\pi_1$ and $\pi_2$, respectively.
The sampling distribution of the difference in sample proportions
(p₁-p₂) can be approximated by a normal distribution with mean
$\mu_{p_1 - p_2} = \pi_1 - \pi_2$
and standard deviation
$\sigma_{p_1 - p_2} = \sqrt{\frac{\pi_1(1 - \pi_1)}{n_1} + \frac{\pi_2(1 - \pi_2)}{n_2}}$
The Z statistic is
$Z = \frac{(p_1 - p_2) - (\pi_1 - \pi_2)}{\sqrt{\frac{\pi_1(1 - \pi_1)}{n_1} + \frac{\pi_2(1 - \pi_2)}{n_2}}}$
36
Example
From a study by the Charles Schwab
Corporation, 74% of African Americans and
84% of Whites with an annual income above
$50,000 owned stocks.
For a random sample of 500 African Americans
and a random sample of 500 Whites with
income above $50,000, what is the probability
that the sample proportion of stock owners is higher among Whites?
37
Example (Cont’d)
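Summary data:
$n_1 = 500$, $n_2 = 500$
$\pi_1 = 0.74$, $\pi_2 = 0.84$
It follows that
$P(p_2 - p_1 > 0) = P\left(Z > \frac{0 - (0.84 - 0.74)}{\sqrt{\frac{0.74 \times 0.26}{500} + \frac{0.84 \times 0.16}{500}}}\right)$
$= P(Z > -3.91) = 0.99995$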
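The same probability can be computed directly. The sketch below assumes the normal approximation for the difference of two independent sample proportions and uses the example's values; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

pi1, n1 = 0.74, 500  # population 1 proportion and sample size
pi2, n2 = 0.84, 500  # population 2 proportion and sample size

# Standard error of p2 - p1 for independent samples
se_diff = sqrt(pi1 * (1 - pi1) / n1 + pi2 * (1 - pi2) / n2)
diff = NormalDist(pi2 - pi1, se_diff)

# P(p2 - p1 > 0)
p_more = 1 - diff.cdf(0)
z_value = (0 - (pi2 - pi1)) / se_diff
print(round(z_value, 2), round(p_more, 5))
```

Because the true difference of 0.10 is almost four standard errors above zero, samples of this size will almost always show the higher proportion in the second group.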
38
Important Summary of sampling
distributions
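Parameter: $\mu$
Point estimate: $\bar{X}$
Sampling distribution: $N\left(\mu, \frac{\sigma^2}{n}\right)$
Standardized: $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$

Parameter: $\pi$
Point estimate: $p$
Sampling distribution: $N\left(\pi, \frac{\pi(1 - \pi)}{n}\right)$
Standardized: $Z = \frac{p - \pi}{\sqrt{\frac{\pi(1 - \pi)}{n}}}$

Parameter: $\mu_1 - \mu_2$
Point estimate: $\bar{X}_1 - \bar{X}_2$
Sampling distribution: $N\left(\mu_1 - \mu_2, \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right)$
Standardized: $Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$

Parameter: $\pi_1 - \pi_2$
Point estimate: $p_1 - p_2$
Sampling distribution: $N\left(\pi_1 - \pi_2, \frac{\pi_1(1 - \pi_1)}{n_1} + \frac{\pi_2(1 - \pi_2)}{n_2}\right)$
Standardized: $Z = \frac{(p_1 - p_2) - (\pi_1 - \pi_2)}{\sqrt{\frac{\pi_1(1 - \pi_1)}{n_1} + \frac{\pi_2(1 - \pi_2)}{n_2}}}$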
39
Sampling methods
Simple random samples
Stratified samples
40
Simple Random Samples
Every individual or item from the frame has an equal
chance of being selected
Selection may be with replacement or without
replacement
Samples obtained from table of random numbers or
computer random number generators
Simple to use
May not be a good representation of the population’s
underlying characteristics
41
Stratified Samples
Divide population into two or more subgroups (called strata)
according to some common characteristic
A simple random sample is selected from each subgroup, with
sample sizes proportional to strata sizes
Samples from subgroups are combined into one
Ensures representation of individuals across the entire population
[Figure: a population divided into 4 strata, and the resulting sample]
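The stratified procedure described above might be sketched as follows. The population, stratum names, and the function `stratified_sample` are hypothetical, and proportional allocation is done by simple rounding, so the combined sample size can differ slightly from the target when the proportions do not divide evenly.

```python
import random

def stratified_sample(strata, total_n, seed=None):
    """Proportional stratified sample: a simple random sample (without
    replacement) from each stratum, sized in proportion to the stratum."""
    rng = random.Random(seed)
    population_size = sum(len(units) for units in strata.values())
    sample = []
    for units in strata.values():
        k = round(total_n * len(units) / population_size)  # proportional allocation
        sample.extend(rng.sample(units, k))                 # SRS within the stratum
    return sample

# Hypothetical population of 100 units divided into 4 strata
strata = {
    "A": [f"A{i}" for i in range(40)],
    "B": [f"B{i}" for i in range(30)],
    "C": [f"C{i}" for i in range(20)],
    "D": [f"D{i}" for i in range(10)],
}
sample = stratified_sample(strata, total_n=20, seed=1)
print(sample)
```

Every stratum contributes to the combined sample in proportion to its size, which is what guarantees representation across the whole population.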
42
Types of Survey Errors
(continued)
Coverage error: excluded from frame
Non-response error: follow up on nonresponses
Sampling error: random differences from sample to sample
Measurement error: bad or leading question
43