Transcript Lecture09

Lecture 9
Sampling Distributions
1
Background
 We want to learn about a feature of a population (a parameter).
 In many situations, it is impossible to examine all elements of a
population because the elements are physically inaccessible, examining
them is too costly, or the examination may destroy the item.
 A sample is a relatively small subset of the total population.
 We study a random sample to draw conclusions about a population.
This is where statistics come into the picture.
 Statistics, such as the sample mean and sample variance, computed
from sample measurements, vary from sample to sample.
Therefore, they are random variables.
 The probability distribution of a statistic is called a sampling
distribution.
2
Sampling Distributions
A sampling distribution is the distribution of all of the possible values of a
statistic for samples of a given size selected from a population.
[Diagram: Sampling Distributions → Sampling Distribution of the Mean; Sampling Distribution of the Proportion]
3
Developing a
Sampling Distribution
 Assume there is a population …
 Population size N = 4 (individuals A, B, C, D)
 Random variable, X, is the age of individuals
 Values of X: 18, 20, 22, 24 (years)
4
Developing a
Sampling Distribution
(continued)
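Summary Measures for the Population Distribution:
$$\mu = \frac{\sum X_i}{N} = \frac{18 + 20 + 22 + 24}{4} = 21$$
$$\sigma = \sqrt{\frac{\sum (X_i - \mu)^2}{N}} = 2.236$$
[Figure: population distribution, P(x) against x = 18, 20, 22, 24 (individuals A, B, C, D)]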
5
Sampling with replacement
Samples   Age      Sample means
A, A      18, 18   18
A, B      18, 20   19
A, C      18, 22   20
A, D      18, 24   21
B, A      20, 18   19
B, B      20, 20   20
B, C      20, 22   21
B, D      20, 24   22
C, A      22, 18   20
C, B      22, 20   21
C, C      22, 22   22
C, D      22, 24   23
D, A      24, 18   21
D, B      24, 20   22
D, C      24, 22   23
D, D      24, 24   24
6
Developing a
Sampling Distribution
(continued)
Sampling Distribution of All Sample Means
16 Sample Means
1st Obs   2nd Observation
          18   20   22   24
18        18   19   20   21
20        19   20   21   22
22        20   21   22   23
24        21   22   23   24
[Figure: Sample Means Distribution, $P(\bar{X})$ against $\bar{X}$ = 18, 19, ..., 24]
7
Developing a
Sampling Distribution
(continued)
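Summary Measures of this Sampling Distribution (note
that N = 16 for the population of sample means):
$$\mu_{\bar{X}} = \frac{\sum \bar{X}_i}{N} = \frac{18 + 19 + \cdots + 24}{16} = 21$$
$$\sigma_{\bar{X}} = \sqrt{\frac{\sum (\bar{X}_i - \mu_{\bar{X}})^2}{N}} = \sqrt{\frac{(18 - 21)^2 + (19 - 21)^2 + \cdots + (24 - 21)^2}{16}} = 1.58$$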
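The enumeration above can be checked by brute force. The following is a minimal Python sketch, not part of the original slides; the variable names are chosen here for illustration. It lists every ordered sample of size 2 drawn with replacement from the four ages and computes the mean and standard deviation of the 16 sample means.

```python
from itertools import product
from math import sqrt

ages = [18, 20, 22, 24]  # population values of X from the slides

# All ordered samples of size 2 drawn with replacement (4 * 4 = 16 samples).
sample_means = [(a + b) / 2 for a, b in product(ages, repeat=2)]

# Mean and standard deviation of the 16 sample means (divide by N = 16).
mean_of_means = sum(sample_means) / len(sample_means)
sd_of_means = sqrt(
    sum((m - mean_of_means) ** 2 for m in sample_means) / len(sample_means)
)

# Population mean and standard deviation (divide by N = 4) for comparison.
mu = sum(ages) / len(ages)
sigma = sqrt(sum((x - mu) ** 2 for x in ages) / len(ages))

print(mean_of_means, round(sd_of_means, 2), round(sigma, 3))  # 21.0 1.58 2.236
```

The standard deviation of the sample means equals the population standard deviation divided by the square root of 2, the sample size.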
8
Comparing the Population with its
Sampling Distribution (with replacement)
Population (N = 4): $\mu = 21$, $\sigma = 2.236$
Sample Means Distribution (n = 2): $\mu_{\bar{X}} = 21$, $\sigma_{\bar{X}} = 1.58$
[Figure: P(X) against X = 18, 20, 22, 24 (A, B, C, D), beside $P(\bar{X})$ against $\bar{X}$ = 18, 19, ..., 24]
9
Mean and standard error of the sample
mean (sampling with replacement)
 The mean of the distribution of the sample mean:
$$\mu_{\bar{X}} = \mu$$
 A measure of the variability in the mean from sample to sample is
given by the Standard Error of the Mean:
$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$
(This assumes that sampling is with replacement, or that sampling is
without replacement from an infinite population.)
 Note that the standard error of the mean decreases as the sample
size increases
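As a small illustration of how the standard error shrinks with n, here is a Python sketch. The helper name `standard_error` and the sample sizes 2, 8, and 32 are chosen here for illustration; the value of σ is taken from the age example.

```python
from math import sqrt


def standard_error(sigma: float, n: int) -> float:
    """Standard error of the sample mean: sigma / sqrt(n)."""
    return sigma / sqrt(n)


sigma = 2.236  # population standard deviation from the age example
for n in (2, 8, 32):
    # Each fourfold increase in n halves the standard error.
    print(n, round(standard_error(sigma, n), 3))
```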
10
If the Population is Normal
 If a population is normal with mean μ and standard
deviation σ,
 The sampling distribution of $\bar{X}$ is also normally distributed
with
$$\mu_{\bar{X}} = \mu \quad\text{and}\quad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$
 Or, equivalently, the sampling distribution of $\sum_{i=1}^{n} X_i$ is
normally distributed with
$$\mu_{\sum X_i} = n\mu \quad\text{and}\quad \sigma_{\sum X_i} = \sqrt{n}\,\sigma$$
11
Sampling Distribution Properties
(continued)
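As n increases, $\sigma_{\bar{x}}$ decreases.
[Figure: sampling distributions of $\bar{x}$ centered at μ, one curve for a larger sample size and one for a smaller sample size]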
12
If the Population is not normal
 The central limit theorem states that when the number of
observations in each sample (called the sample size) gets large enough,
 The sampling distribution of $\bar{X}$ is approximately normally
distributed with
$$\mu_{\bar{X}} = \mu \quad\text{and}\quad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$
 Or, equivalently, the sampling distribution of $\sum_{i=1}^{n} X_i$ is also
approximately normally distributed with
$$\mu_{\sum X_i} = n\mu \quad\text{and}\quad \sigma_{\sum X_i} = \sqrt{n}\,\sigma$$
13
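The central limit theorem can be seen in a quick simulation. The sketch below is an illustration only: the exponential population, the sample size of 40, the number of repetitions, and the seed are all assumptions made here, not values from the lecture. It draws many samples from a skewed population and compares the mean and spread of the sample means with μ and σ/√n.

```python
import random
from math import sqrt
from statistics import mean, pstdev

random.seed(0)  # fixed seed so the run is repeatable

n = 40              # sample size (assumed for illustration)
num_samples = 5000  # number of repeated samples (assumed)
mu = sigma = 1.0    # an exponential population with rate 1 has mean 1 and s.d. 1

# One sample mean per repeated sample from the skewed population.
sample_means = [
    mean(random.expovariate(1.0) for _ in range(n))
    for _ in range(num_samples)
]

se = sigma / sqrt(n)  # theoretical standard error
print("mean of sample means:", round(mean(sample_means), 3), "vs mu =", mu)
print("s.d. of sample means:", round(pstdev(sample_means), 3), "vs sigma/sqrt(n) =", round(se, 3))

# Share of sample means within one standard error of mu;
# roughly 0.68 is expected if the sampling distribution is close to normal.
within_one_se = sum(abs(m - mu) <= se for m in sample_means) / num_samples
print("share within one standard error:", round(within_one_se, 3))
```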
Z value for means
Standardize the sample mean:
$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$
14
Visualizing the Central Limit
Theorem
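Sampling distribution properties:
Central Tendency: $\mu_{\bar{x}} = \mu$
Variation: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
[Figure: Population Distribution with mean μ, and Sampling Distribution of $\bar{x}$ with mean $\mu_{\bar{x}}$ (becomes normal as n increases), with curves for a smaller and a larger sample size]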
15
How Large is Large Enough?
 For most distributions, n > 30 will give a
sampling distribution that is nearly normal
 For fairly symmetric distributions, n > 15 is usually enough
 Recall that, for normal population distributions,
the sampling distribution of the mean is always
normally distributed regardless of sample size n
16
Calculating probabilities
 Suppose we want to find out $P(a \le \bar{X} \le b)$
 If the population is normal, then regardless of the value of n:
$$P(a \le \bar{X} \le b) = P\left(\frac{a - \mu}{\sigma/\sqrt{n}} \le Z \le \frac{b - \mu}{\sigma/\sqrt{n}}\right)$$
 If the population is not normal, then, when n is large
enough (n > 30):
$$P(a \le \bar{X} \le b) \approx P\left(\frac{a - \mu}{\sigma/\sqrt{n}} \le Z \le \frac{b - \mu}{\sigma/\sqrt{n}}\right)$$
17
Example
 Suppose a population has mean μ = 10 and
standard deviation σ = 3. Suppose a random
sample of size n = 36 is selected.
 What is the probability that the sample mean is
between 9.7 and 10.3?
18
Example
(continued)
Solution:
 Even if the population is not normally
distributed, the central limit theorem can be
used (n > 30)
 … so the sampling distribution of $\bar{x}$ is
approximately normal
 … with mean $\mu_{\bar{x}} = 10$
 … and standard deviation $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{3}{\sqrt{36}} = 0.5$
19
Example
(continued)
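Solution (continued):
$$P(9.7 \le \bar{X} \le 10.3) = P\left(\frac{9.7 - 10}{3/\sqrt{36}} \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le \frac{10.3 - 10}{3/\sqrt{36}}\right) = P(-0.6 \le Z \le 0.6) = 0.4515$$
[Figure: Population Distribution (shape unknown, μ = 10) → Sample → Sampling Distribution of $\bar{X}$ ($\mu_{\bar{X}} = 10$), with 9.7, 10, and 10.3 marked on the axis]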
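The probability in this example can be checked numerically. The following is a minimal sketch using the standard library's `NormalDist`; the helper name `prob_mean_between` is made up here for illustration.

```python
from math import sqrt
from statistics import NormalDist


def prob_mean_between(a, b, mu, sigma, n):
    """P(a <= sample mean <= b) under the normal (or CLT) approximation."""
    se = sigma / sqrt(n)  # standard error of the mean
    z = NormalDist()      # standard normal distribution
    return z.cdf((b - mu) / se) - z.cdf((a - mu) / se)


p = prob_mean_between(9.7, 10.3, mu=10, sigma=3, n=36)
print(round(p, 4))  # 0.4515, i.e. P(-0.6 <= Z <= 0.6)
```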
20
One more example
 Time spent using e-mail per session is normally
distributed with μ = 8 minutes and σ = 2 minutes.
1. If a random sample of 25 sessions were selected, what
proportion of the sample means would be between 7.8
and 8.2 minutes?
21
Example (Cont’d)
2. If a random sample of 100 sessions were selected, what proportion
of the sample means would be between 7.8 and 8.2 minutes?
3. What sample size would you suggest if it is desired to have at least
0.90 probability that the sample mean is within 0.2 of the population
mean?
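One way to work the three parts is sketched below in Python, taking the population mean as 8 minutes and the standard deviation as 2 minutes. The helper name `prob_within` is made up here, and the slides do not give a worked solution, so treat this as a check under those assumptions rather than the official answer.

```python
from math import ceil, sqrt
from statistics import NormalDist

mu, sigma = 8.0, 2.0  # minutes (population is normal, so any n works)
z = NormalDist()      # standard normal distribution


def prob_within(delta, n):
    """P(|sample mean - mu| <= delta) for a sample of size n."""
    se = sigma / sqrt(n)
    return z.cdf(delta / se) - z.cdf(-delta / se)


p25 = prob_within(0.2, 25)    # part 1: standard error 0.4, Z between -0.5 and 0.5
p100 = prob_within(0.2, 100)  # part 2: standard error 0.2, Z between -1 and 1

# Part 3: need z_crit * sigma / sqrt(n) <= 0.2, where z_crit leaves 5% in each tail.
z_crit = z.inv_cdf(0.95)
n_needed = ceil((z_crit * sigma / 0.2) ** 2)

print(round(p25, 4), round(p100, 4), n_needed)
```

Quadrupling the sample size from 25 to 100 halves the standard error, which is why the second probability is noticeably larger than the first.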
22
Sampling Distribution
of the Proportion
[Diagram: Sampling Distributions → Sampling Distribution of the Mean; Sampling Distribution of the Proportion]
23
Population Proportions
In Bernoulli trials, let
π = the proportion of successes
 Recall that Y = the number of successes in n Bernoulli trials follows
Bin(n, π)
 For the ith Bernoulli trial, define
$$X_i = \begin{cases} 1 & \text{if the } i\text{th outcome is a "success"} \\ 0 & \text{if the } i\text{th outcome is a "failure"} \end{cases}$$
 Then, obviously
$$E(X_i) = \pi \quad\text{and}\quad \sigma(X_i) = \sqrt{\pi(1 - \pi)}$$
24
Population proportions (Cont’d)
 For large n, apply the CLT to the sample mean and sum:
$$p = \bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} \text{ is approximately distributed as } N\left(\pi, \left(\sqrt{\frac{\pi(1 - \pi)}{n}}\right)^2\right)$$
$$Y = \sum_{i=1}^{n} X_i \text{ is approximately distributed as } N\left(n\pi, \left(\sqrt{n\pi(1 - \pi)}\right)^2\right)$$
 How large is large?
$n\pi \ge 5$ and $n(1 - \pi) \ge 5$
Or
$np \ge 5$ and $n(1 - p) \ge 5$
25
Z-Value for Proportions
Standardize p to a Z value with the formula:
$$Z = \frac{p - \pi}{\sigma_p} = \frac{p - \pi}{\sqrt{\frac{\pi(1 - \pi)}{n}}}$$
26
Example
 If the true proportion of voters who support
Proposition A is π = 0.4, what is the probability
that a sample of size 200 yields a sample
proportion between 0.40 and 0.45?
 i.e.: if π = 0.4 and n = 200, what is
P(0.40 ≤ p ≤ 0.45) ?
27
Example
(continued)
Find $\sigma_p$:
$$\sigma_p = \sqrt{\frac{\pi(1 - \pi)}{n}} = \sqrt{\frac{0.4(1 - 0.4)}{200}} = 0.03464$$
Convert to standard normal:
$$P(0.40 \le p \le 0.45) = P\left(\frac{0.40 - 0.40}{0.03464} \le Z \le \frac{0.45 - 0.40}{0.03464}\right) = P(0 \le Z \le 1.44) = 0.4251$$
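For comparison, the sketch below computes the same probability twice: once with the normal approximation and once exactly from the binomial distribution of the number of successes. This comparison is not in the slides, and the variable names are chosen here for illustration. The normal figure uses the unrounded Z value, so it differs slightly from the table-based 0.4251; the exact figure differs more because the approximation treats the discrete count as continuous.

```python
from math import comb, sqrt
from statistics import NormalDist

n, pi = 200, 0.4
sigma_p = sqrt(pi * (1 - pi) / n)  # standard error of the sample proportion

# Normal approximation to P(0.40 <= p <= 0.45).
z = NormalDist()
approx = z.cdf((0.45 - pi) / sigma_p) - z.cdf((0.40 - pi) / sigma_p)

# Exact value: 0.40 <= p <= 0.45 means 80 <= Y <= 90 successes, Y ~ Bin(200, 0.4).
exact = sum(comb(n, y) * pi**y * (1 - pi) ** (n - y) for y in range(80, 91))

print("normal approximation:", round(approx, 4))
print("exact binomial:      ", round(exact, 4))
```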
28
Review example
 The number of claims received in one day by an automobile insurance
company on collision insurance has the following probability
distribution:
x      0      1     2     3      4
p(x)   0.65   0.2   0.1   0.03   0.02
With
$$\mu = \sum_{x=0}^{4} x\,p(x) = 0.57$$
$$\sigma = \sqrt{\sum_{x=0}^{4} x^2 p(x) - \mu^2} = 0.93$$
 Suppose the numbers of claims received are independent from day to
day.
29
Review example (cont’d)

For a 50-day period, find the probability of the following
events:
1) The total number of claims exceeds 20
2) On more than 20 days, at least one claim is received
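A possible approach to both parts is sketched below in Python. It is an assumption-laden check, not the official solution: part 1 applies the CLT to the 50-day total, part 2 applies the normal approximation to a binomial count of days with at least one claim, and neither uses a continuity correction. Variable names are chosen here for illustration.

```python
from math import sqrt
from statistics import NormalDist

pmf = {0: 0.65, 1: 0.20, 2: 0.10, 3: 0.03, 4: 0.02}  # daily claim distribution
days = 50
z = NormalDist()

# Daily mean and variance from the probability distribution.
mu = sum(x * p for x, p in pmf.items())
var = sum(x**2 * p for x, p in pmf.items()) - mu**2

# 1) Total claims over 50 days is approximately N(50 * mu, 50 * var).
p_total_gt_20 = 1 - z.cdf((20 - days * mu) / sqrt(days * var))

# 2) Days with at least one claim: Y ~ Bin(50, pi) with pi = 1 - p(0) = 0.35.
#    Here n*pi = 17.5 and n*(1 - pi) = 32.5, so both exceed 5.
pi = 1 - pmf[0]
p_days_gt_20 = 1 - z.cdf((20 - days * pi) / sqrt(days * pi * (1 - pi)))

print(round(mu, 2), round(sqrt(var), 2))
print(round(p_total_gt_20, 3), round(p_days_gt_20, 3))
```

Adding a continuity correction (using 20.5 in place of 20) would shift both answers slightly, since the underlying counts are integers.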
30
Sampling distribution of the difference
of two independent populations
 An important estimation problem involves comparing
the means or proportions of two populations.
For example, you may want to make
comparisons like these:
 The average scores on GRE for students who
majored in mathematics versus chemistry
 The average income for male and female college
graduates
 The proportion of patients receiving different
medications who recovered from a certain disease
31
Sampling distribution of the difference
of two independent sample means
 Suppose there are two populations
Population   Mean   S.d.
I            μ₁     σ₁
II           μ₂     σ₂
 Independent random samples of n₁ and n₂ observations have
been selected from the two populations, with sample means $\bar{X}_1$
and $\bar{X}_2$ respectively
 Recall that when n₁ and n₂ are large, $\bar{X}_1$ and $\bar{X}_2$ are approximately
normally distributed with
$$E(\bar{X}_1) = \mu_1, \quad \sigma_{\bar{X}_1} = \frac{\sigma_1}{\sqrt{n_1}}$$
$$E(\bar{X}_2) = \mu_2, \quad \sigma_{\bar{X}_2} = \frac{\sigma_2}{\sqrt{n_2}}$$
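32
 Since the two samples are independent,
$$\mu_{\bar{X}_1 - \bar{X}_2} = \mu_1 - \mu_2$$
$$\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
 Standardize:
$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$$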
33
Example
 A light bulb factory operates two different
types of machines. The mean life expectancy
is 385 hours from machine I and 365 hours
from machine II. The process standard
deviation of life expectancy of machine I is 110
hours and of machine II is 120 hours.
 What is the probability that the average life
expectancy of a random sample of 100 light
bulbs from Machine I is shorter than the
average life expectancy of 100 light bulbs from
Machine II?
34
Example (Cont’d)
 Note that
$$n_1 = 100, \quad n_2 = 100$$
$$\mu_1 = 385, \quad \mu_2 = 365$$
$$\sigma_1 = 110, \quad \sigma_2 = 120$$
 Therefore
$$P(\bar{X}_1 - \bar{X}_2 < 0) = P\left(Z < \frac{0 - (385 - 365)}{\sqrt{\frac{110^2}{100} + \frac{120^2}{100}}}\right) = P\left(Z < -\frac{20}{16.28}\right) = P(Z < -1.23) = 0.1093$$
35
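The light bulb calculation can be reproduced with a short Python sketch. The function name `prob_diff_means_below` is made up here for illustration. Because the code keeps the unrounded Z value, its result is close to, but not identical with, the table-based 0.1093.

```python
from math import sqrt
from statistics import NormalDist


def prob_diff_means_below(c, mu1, mu2, sigma1, sigma2, n1, n2):
    """P(X1bar - X2bar < c) using the normal approximation for large samples."""
    se = sqrt(sigma1**2 / n1 + sigma2**2 / n2)  # s.d. of the difference
    return NormalDist().cdf((c - (mu1 - mu2)) / se)


p = prob_diff_means_below(0, mu1=385, mu2=365, sigma1=110, sigma2=120, n1=100, n2=100)
print(round(p, 3))  # about 0.11
```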
Sampling distribution of the difference of
two independent sample proportions
 Assume that independent random samples of n₁ and n₂
observations have been selected from binomial populations with
parameters π₁ and π₂, respectively.
 The sampling distribution of the difference in sample proportions
(p₁ − p₂) can be approximated by a normal distribution with mean
$$\mu_{p_1 - p_2} = \pi_1 - \pi_2$$
and standard deviation
$$\sigma_{p_1 - p_2} = \sqrt{\frac{\pi_1(1 - \pi_1)}{n_1} + \frac{\pi_2(1 - \pi_2)}{n_2}}$$
 The Z statistic is
$$Z = \frac{(p_1 - p_2) - (\pi_1 - \pi_2)}{\sqrt{\frac{\pi_1(1 - \pi_1)}{n_1} + \frac{\pi_2(1 - \pi_2)}{n_2}}}$$
36
Example
 From a study by the Charles Schwab
Corporation, 74% of African Americans and
84% of Whites with an annual income above
$50,000 owned stocks.
 For a random sample of 500 African Americans
and a random sample of 500 Whites with
income above $50,000, what is the probability
that the sample proportion of Whites who own stocks
exceeds the sample proportion of African Americans who own stocks?
37
Example (Cont’d)
 Summary data:
$$n_1 = 500, \quad n_2 = 500$$
$$\pi_1 = 0.74, \quad \pi_2 = 0.84$$
 It follows that
$$P(p_2 - p_1 > 0) = P\left(Z > \frac{0 - (0.84 - 0.74)}{\sqrt{\frac{0.74 \times 0.26}{500} + \frac{0.84 \times 0.16}{500}}}\right) = P(Z > -3.91) = 0.99995$$
38
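The stock ownership example can be checked the same way. The sketch below is an illustration only; the function name `prob_diff_props_above` and its argument order are made up here.

```python
from math import sqrt
from statistics import NormalDist


def prob_diff_props_above(c, pi_a, pi_b, n_a, n_b):
    """P(p_a - p_b > c) using the normal approximation for two sample proportions."""
    se = sqrt(pi_a * (1 - pi_a) / n_a + pi_b * (1 - pi_b) / n_b)
    return 1 - NormalDist().cdf((c - (pi_a - pi_b)) / se)


# Probability that the sample proportion for Whites (0.84) exceeds
# the sample proportion for African Americans (0.74), with n = 500 each.
p = prob_diff_props_above(0, pi_a=0.84, pi_b=0.74, n_a=500, n_b=500)
print(round(p, 5))
```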
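Important Summary of sampling
distributions
Param. | Point estimate | Sampling distribution | Standardized
μ | $\bar{X}$ | $N\left(\mu, \frac{\sigma^2}{n}\right)$ | $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$
π | $p$ | $N\left(\pi, \frac{\pi(1 - \pi)}{n}\right)$ | $Z = \frac{p - \pi}{\sqrt{\pi(1 - \pi)/n}}$
μ₁ − μ₂ | $\bar{X}_1 - \bar{X}_2$ | $N\left(\mu_1 - \mu_2, \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right)$ | $Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$
π₁ − π₂ | $p_1 - p_2$ | $N\left(\pi_1 - \pi_2, \frac{\pi_1(1 - \pi_1)}{n_1} + \frac{\pi_2(1 - \pi_2)}{n_2}\right)$ | $Z = \frac{(p_1 - p_2) - (\pi_1 - \pi_2)}{\sqrt{\frac{\pi_1(1 - \pi_1)}{n_1} + \frac{\pi_2(1 - \pi_2)}{n_2}}}$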
39
Sampling methods
 Simple random samples
 Stratified samples
40
Simple Random Samples
 Every individual or item from the frame has an equal
chance of being selected
 Selection may be with replacement or without
replacement
 Samples obtained from table of random numbers or
computer random number generators
 Simple to use
 May not be a good representation of the population’s
underlying characteristics
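Drawing a simple random sample with a computer random number generator might look like the following Python sketch. The frame of 100 numbered items, the sample size of 10, and the seed are hypothetical values chosen here for illustration.

```python
import random

random.seed(1)  # fixed seed so the run is repeatable

frame = list(range(1, 101))  # hypothetical frame of 100 item IDs

# Without replacement: each item can be selected at most once.
without_replacement = random.sample(frame, k=10)

# With replacement: the same item may be selected more than once.
with_replacement = random.choices(frame, k=10)

print(without_replacement)
print(with_replacement)
```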
41
Stratified Samples
 Divide population into two or more subgroups (called strata)
according to some common characteristic
 A simple random sample is selected from each subgroup, with
sample sizes proportional to strata sizes
 Samples from subgroups are combined into one
 Ensures representation of individuals across the entire population
[Figure: Population divided into 4 strata, with a Sample drawn from the strata]
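A stratified sample with sample sizes proportional to strata sizes can be sketched in Python as follows. The function name `stratified_sample`, the four strata, and their sizes are hypothetical, chosen here only to illustrate the idea.

```python
import random


def stratified_sample(strata, total_n, seed=None):
    """Draw a simple random sample from each stratum, sized in proportion to the stratum."""
    rng = random.Random(seed)
    population_size = sum(len(items) for items in strata.values())
    sample = []
    for items in strata.values():
        # Proportional allocation; rounding can make the total differ
        # slightly from total_n when the proportions are not exact.
        k = round(total_n * len(items) / population_size)
        sample.extend(rng.sample(items, k))  # without replacement within the stratum
    return sample


# Hypothetical population of 1000 items split into 4 strata.
strata = {
    "A": [f"A{i}" for i in range(400)],
    "B": [f"B{i}" for i in range(300)],
    "C": [f"C{i}" for i in range(200)],
    "D": [f"D{i}" for i in range(100)],
}
sample = stratified_sample(strata, total_n=50, seed=0)
print(len(sample))  # 50: 20 from A, 15 from B, 10 from C, 5 from D
```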
42
Types of Survey Errors
(continued)
 Coverage error: excluded from frame
 Non response error: follow up on nonresponses
 Sampling error: random differences from sample to sample
 Measurement error: bad or leading question
43