Sampling Distributions
Introduction
Point Estimation
Sampling Distribution of x̄
Sampling Distribution for the Difference between Two Means
Sampling Distribution of p̂
Sampling Distribution for the Difference between Two Proportions
Introduction
A sampling distribution is the distribution of all possible values of a sample statistic for samples of a given size selected from a population. A sample is a portion of the population.
For example, suppose you sample 50 students from your college and record their GPAs. If you obtained many different samples of 50 students, you would compute a different mean GPA for each sample. We are interested in the distribution of all the mean GPAs we might calculate for a sample of 50 students.
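To make the GPA example concrete, the following is a minimal Python sketch that repeats the "sample 50 students, compute the mean" step many times. The population of 5,000 GPAs is hypothetical (uniformly spread between 2.0 and 4.0) and `simulate_sample_means` is an illustrative helper name; only the procedure comes from the text.

```python
import random
import statistics

def simulate_sample_means(population, n, repetitions, seed=1):
    """Draw `repetitions` random samples of size n (without replacement)
    and return their means: an empirical sampling distribution of x̄."""
    rng = random.Random(seed)
    return [statistics.mean(rng.sample(population, n)) for _ in range(repetitions)]

# Hypothetical population of 5,000 student GPAs (illustrative values only).
rng = random.Random(0)
gpas = [round(rng.uniform(2.0, 4.0), 2) for _ in range(5000)]

means = simulate_sample_means(gpas, n=50, repetitions=2000)
print(f"population mean GPA      : {statistics.mean(gpas):.3f}")
print(f"mean of the sample means : {statistics.mean(means):.3f}")
print(f"smallest / largest mean  : {min(means):.3f} / {max(means):.3f}")
```

Each run of `rng.sample` gives a different mean, and the list `means` is an approximation of the sampling distribution described above.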
The reason we select a sample is to collect data to answer a research question about a population. The sample results provide only estimates of the values of the population characteristics, simply because the sample contains only a portion of the population. With proper sampling methods, the sample results can provide “good” estimates of the population characteristics.
Point Estimation
Point estimation is a form of statistical inference. In point estimation we use the data from the sample to compute a value of a sample statistic that serves as an estimate of a population parameter.
x̄ is the point estimator of the population mean μ.
s is the point estimator of the population standard deviation σ.
p̂ is the point estimator of the population proportion p.
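As a small illustration of the three point estimators, the sketch below computes x̄, s and p̂ from one sample. The eight data values and the characteristic "value above 3.2" are hypothetical, chosen only to show the arithmetic. Note that `statistics.stdev` uses the n − 1 divisor, which is the usual sample standard deviation s.

```python
import statistics

# Hypothetical sample of 8 observations (illustrative data, not from the text).
sample = [3.1, 2.8, 3.6, 3.9, 2.5, 3.3, 3.0, 3.7]

x_bar = statistics.mean(sample)   # point estimate of the population mean μ
s = statistics.stdev(sample)      # sample standard deviation (n - 1 divisor), estimates σ

# p̂ = x / n, where x counts sample elements with a (hypothetical) characteristic: value > 3.2
x = sum(1 for value in sample if value > 3.2)
p_hat = x / len(sample)

print(f"x̄ = {x_bar:.4f}, s = {s:.4f}, p̂ = {p_hat:.3f}")
```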
Relationships between the population distribution and the sampling distribution of the sample mean:
The mean of the sampling distribution of x̄ (the mean of all the sample means) is exactly equal to the population mean μ.
The sampling distribution of x̄ is less dispersed (narrower) than the population distribution.
As the sample size increases, the sampling distribution of x̄ tends to become bell-shaped and to approximate a normal distribution.
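The first two relationships can be checked exactly on a tiny population by listing every possible sample. This sketch assumes a hypothetical population {2, 4, 6, 8} and samples of size n = 2 drawn with replacement (so that the variance of x̄ is exactly σ²/n); `Fraction` keeps the arithmetic exact.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

population = [2, 4, 6, 8]   # hypothetical population
n = 2                       # sample size

# Every ordered sample of size n drawn with replacement is equally likely.
sample_means = [Fraction(sum(s), n) for s in product(population, repeat=n)]
sampling_dist = Counter(sample_means)   # value of x̄ -> number of samples giving it

mu = Fraction(sum(population), len(population))
sigma2 = sum((x - mu) ** 2 for x in population) / len(population)   # population variance
mean_of_means = sum(sample_means) / len(sample_means)
var_of_means = sum((m - mean_of_means) ** 2 for m in sample_means) / len(sample_means)

print({str(k): v for k, v in sorted(sampling_dist.items())})
print("mean of x̄ =", mean_of_means, "  μ =", mu)
print("var of x̄ =", var_of_means, "  σ²/n =", sigma2 / n, "  σ² =", sigma2)
```

Even with n = 2 the counts for x̄ = 2, 3, ..., 8 are 1, 2, 3, 4, 3, 2, 1: peaked in the middle, although the population itself is flat.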
Sampling Distribution of the Sample Mean
The probability distribution of x̄ is called its sampling distribution. It lists the various values that x̄ can assume and the probability of each value of x̄. In general, the probability distribution of a sample statistic is called its sampling distribution.
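If a population is normal with mean μ and standard deviation σ, the sampling distribution of X̄ is also normally distributed, with
$$\mu_{\bar X} = \mu \quad\text{and}\quad \sigma_{\bar X} = \frac{\sigma}{\sqrt{n}}$$
Z-value for the sampling distribution of X̄:
$$Z = \frac{\bar X - \mu_{\bar X}}{\sigma_{\bar X}} = \frac{\bar X - \mu}{\sigma/\sqrt{n}}$$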
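The two formulas above translate directly into code. This is a minimal sketch using Python's `statistics.NormalDist` (Python 3.8+) in place of a Z table; the helper names and the numbers μ = 100, σ = 15, n = 25, x̄ = 106 are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def sampling_dist_of_mean(mu, sigma, n):
    """Sampling distribution of X̄ for a normal population: N(mu, sigma²/n).
    NormalDist is parameterised by the standard deviation, so we pass the
    standard error sigma / sqrt(n) rather than the variance."""
    return NormalDist(mu, sigma / sqrt(n))

def z_value(x_bar, mu, sigma, n):
    """Standardise a sample mean: Z = (x̄ - μ) / (σ / √n)."""
    return (x_bar - mu) / (sigma / sqrt(n))

# Hypothetical numbers (not from the text): μ = 100, σ = 15, n = 25, x̄ = 106
dist = sampling_dist_of_mean(100, 15, 25)
print(dist.stdev, z_value(106, 100, 15, 25))   # 3.0 2.0
print(dist.cdf(106))                           # same as Φ(2.0) from a Z table
```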
The Central Limit Theorem
If all possible samples of a particular size n are selected from any population with mean μ and finite standard deviation σ, the sampling distribution of the sample mean is approximately a normal distribution when n is sufficiently large.
This approximation improves as the sample size increases.
Sample Mean Sampling Distribution: If the Population is not Normal
We can apply the Central Limit Theorem: even if the population is not normal, the sampling distribution of the sample mean will be approximately normal as long as the sample size is large enough.
[Figure: as the sample size n gets large enough, the sampling distribution of x̄ becomes almost normal regardless of the shape of the population.]
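The Central Limit Theorem can be watched in action with a simulation. The sketch below assumes a right-skewed exponential population (mean 1, skewness 2), chosen only as an example of a non-normal population, and measures the skewness of the simulated sample means; a normal distribution has skewness 0, so values shrinking toward 0 indicate the sampling distribution is becoming more symmetric and bell-shaped.

```python
import random
import statistics

def skewness(values):
    """Sample skewness: the mean cubed standardised deviation (population form)."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean(((v - m) / sd) ** 3 for v in values)

def sample_means(n, repetitions, rng):
    """Means of `repetitions` samples of size n from an exponential population
    with mean 1 (a hypothetical, strongly right-skewed population)."""
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
            for _ in range(repetitions)]

rng = random.Random(42)
for n in (1, 5, 30):
    means = sample_means(n, 20000, rng)
    print(f"n = {n:2d}: skewness of x̄ ≈ {skewness(means):.3f}")
```

For this population the theoretical skewness of x̄ is 2/√n, so the printed values should fall from about 2 toward about 0.37 at n = 30.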
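Properties and Shape of the Sampling Distribution of the Sample Mean
If n ≥ 30, X̄ is approximately normally distributed:
$$\bar X \sim N\left(\mu, \frac{\sigma^2}{n}\right)$$
Note: If σ² is unknown, it is estimated by s².
If n < 30 and the variance σ² is known, X̄ is normally distributed (provided the population itself is normal):
$$\bar X \sim N\left(\mu, \frac{\sigma^2}{n}\right)$$
If n < 30 and the variance is unknown, the t distribution with n − 1 degrees of freedom is used (again assuming a normal population):
$$T = \frac{\bar X - \mu}{\sqrt{s^2/n}} \sim t_{n-1}$$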
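The three cases above amount to a small decision rule. The following is a hypothetical helper (`standardize_mean` is not from the text) that picks z or t and returns the standardised statistic; it assumes a normal population whenever n < 30. Turning a t statistic into a probability needs a t table or a library such as `scipy.stats.t`, which the standard library does not provide.

```python
from math import sqrt

def standardize_mean(x_bar, mu, n, sigma=None, s=None):
    """Return (distribution, statistic, degrees_of_freedom) following the rules above:
    - σ known: Z = (x̄ - μ) / (σ / √n)
    - σ unknown, n >= 30: s² replaces σ², still treated as Z
    - σ unknown, n < 30: T = (x̄ - μ) / √(s²/n) with n - 1 degrees of freedom."""
    if sigma is not None:
        return "z", (x_bar - mu) / (sigma / sqrt(n)), None
    if s is None:
        raise ValueError("need either sigma or s")
    stat = (x_bar - mu) / sqrt(s ** 2 / n)
    if n >= 30:
        return "z", stat, None
    return "t", stat, n - 1

# Hypothetical inputs: x̄ = 52, μ = 50
print(standardize_mean(52, 50, 16, sigma=8))   # ('z', 1.0, None)
print(standardize_mean(52, 50, 16, s=8))       # ('t', 1.0, 15)
```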
Example 1:
Suppose a population has mean μ = 8 and standard deviation σ = 3. Suppose a
random sample of size n = 36 is selected. What is the probability that the
sample mean is between 7.8 and 8.2?
Solution:
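Even if the population is not normally distributed, the central limit theorem can be used (n = 36 ≥ 30), so the sampling distribution of x̄ is approximately normal with
$$\mu_{\bar x} = \mu = 8 \quad\text{and}\quad \sigma_{\bar x} = \frac{\sigma}{\sqrt{n}} = \frac{3}{\sqrt{36}} = 0.5$$
$$P(7.8 < \bar X < 8.2) = P\left(\frac{7.8 - 8}{3/\sqrt{36}} < \frac{\bar X - \mu}{\sigma/\sqrt{n}} < \frac{8.2 - 8}{3/\sqrt{36}}\right) = P(-0.4 < Z < 0.4) = 0.3108$$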
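The same calculation can be checked numerically. This sketch uses `statistics.NormalDist` as a stand-in for the Z table; it relies on the CLT normal approximation exactly as the worked solution does.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 8, 3, 36
x_bar_dist = NormalDist(mu, sigma / sqrt(n))      # CLT: X̄ ≈ N(8, 0.5²)
prob = x_bar_dist.cdf(8.2) - x_bar_dist.cdf(7.8)  # P(7.8 < X̄ < 8.2)
print(round(prob, 4))                             # 0.3108
```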
Example 2:
The amount of time required to change the oil and filter of any vehicle is normally distributed with a mean of 45 minutes and a standard deviation of 10 minutes. A random sample of 16 cars is selected.
a) What is the standard error of the sample mean?
b) What is the probability that the sample mean is between 45 and 52 minutes?
c) What is the probability that the sample mean is between 39 and 48 minutes?
d) Find the two values between which the middle 95% of all sample means lie.
Solution:
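X: the amount of time required to change the oil and filter of a vehicle
$$X \sim N(45, 10^2), \qquad n = 16$$
X̄: the mean amount of time required to change the oil and filter of the 16 sampled vehicles
$$\bar X \sim N\left(45, \frac{10^2}{16}\right)$$
a) Standard error = standard deviation of X̄:
$$\sigma_{\bar X} = \frac{\sigma}{\sqrt{n}} = \frac{10}{\sqrt{16}} = 2.5$$
b)
$$P(45 < \bar X < 52) = P\left(\frac{45 - 45}{2.5} < Z < \frac{52 - 45}{2.5}\right) = P(0 < Z < 2.8) = 0.4974$$
c)
$$P(39 < \bar X < 48) = P\left(\frac{39 - 45}{2.5} < Z < \frac{48 - 45}{2.5}\right) = P(-2.4 < Z < 1.2) = 0.4918 + 0.3849 = 0.8767$$
d)
$$P(a < \bar X < b) = 0.95 \;\Rightarrow\; P\left(\frac{a - 45}{2.5} < Z < \frac{b - 45}{2.5}\right) = 0.95 \;\Rightarrow\; P(z_a < Z < z_b) = 0.95$$
From the table: z_a = −1.96 and z_b = 1.96, so
$$\frac{a - 45}{2.5} = -1.96 \;\Rightarrow\; a = 40.1, \qquad \frac{b - 45}{2.5} = 1.96 \;\Rightarrow\; b = 49.9$$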
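All four parts of Example 2 can be reproduced in a few lines. This sketch uses `statistics.NormalDist`; for part d) it calls `inv_cdf` at 0.025 and 0.975, which uses z ≈ 1.959964 instead of the table's 1.96, so a and b agree with the solution only after rounding to one decimal.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 45, 10, 16
se = sigma / sqrt(n)                        # a) standard error σ/√n
x_bar = NormalDist(mu, se)                  # X̄ ~ N(45, 2.5²)

p_b = x_bar.cdf(52) - x_bar.cdf(45)         # b) P(45 < X̄ < 52)
p_c = x_bar.cdf(48) - x_bar.cdf(39)         # c) P(39 < X̄ < 48)
a, b = x_bar.inv_cdf(0.025), x_bar.inv_cdf(0.975)   # d) bounds of the middle 95%

print(se, round(p_b, 4), round(p_c, 4), round(a, 1), round(b, 1))
```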
Sampling Distribution for the Difference between Two Means
Suppose we have two populations, X₁ and X₂, which are normally distributed. X₁ has mean μ₁ and variance σ₁², while X₂ has mean μ₂ and variance σ₂². These two distributions can be written as:
$$X_1 \sim N(\mu_1, \sigma_1^2) \quad\text{and}\quad X_2 \sim N(\mu_2, \sigma_2^2)$$
Now we are interested in the sampling distribution of the difference between two sample means, X̄₁ − X̄₂. For independent random samples of sizes n₁ and n₂:
$$\bar X_1 - \bar X_2 \sim N\left(\mu_1 - \mu_2,\ \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right)$$
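The variance formula σ₁²/n₁ + σ₂²/n₂ can be sanity-checked by simulation. The sketch below assumes hypothetical populations N(10, 2²) and N(7, 3²) with n₁ = 8 and n₂ = 12, for which the formula gives mean 3 and variance 4/8 + 9/12 = 1.25; `simulate_mean_differences` is an illustrative helper name.

```python
import random
import statistics

def simulate_mean_differences(mu1, sigma1, n1, mu2, sigma2, n2, reps, seed=0):
    """Draw independent samples from N(mu1, sigma1²) and N(mu2, sigma2²)
    and return the list of differences x̄1 - x̄2."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        x1 = statistics.fmean(rng.gauss(mu1, sigma1) for _ in range(n1))
        x2 = statistics.fmean(rng.gauss(mu2, sigma2) for _ in range(n2))
        diffs.append(x1 - x2)
    return diffs

diffs = simulate_mean_differences(10, 2, 8, 7, 3, 12, reps=20000)
print("simulated mean, variance:", round(statistics.fmean(diffs), 3),
      round(statistics.pvariance(diffs), 3))
print("formula mean, variance  :", 10 - 7, 2**2 / 8 + 3**2 / 12)
```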
Example 3:
A taxi company purchased two brands of tires, brand A and brand B. It is known that the mean distance travelled before the tires wear out is 36300 km for brand A, with a standard deviation of 200 km, while the mean distance travelled before the tires wear out is 36100 km for brand B, with a standard deviation of 300 km. Random samples of 36 brand A tires and 49 brand B tires are taken. What is the probability that the
a) difference between the mean distances travelled before the tires of brand A and brand B wear out is at most 300 km (in absolute value)?
b) mean distance travelled by brand A tires before they wear out is larger than the mean distance travelled by brand B tires?
Solution:
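$\bar X_1$: the mean distance travelled before the tires of brand A wear out
$\bar X_2$: the mean distance travelled before the tires of brand B wear out
$$\bar X_1 - \bar X_2 \sim N\left(36300 - 36100,\ \frac{200^2}{36} + \frac{300^2}{49}\right) = N(200,\ 2947.846)$$
a)
$$P(|\bar X_1 - \bar X_2| \le 300) = P(-300 \le \bar X_1 - \bar X_2 \le 300) = P\left(\frac{-300 - 200}{\sqrt{2947.846}} \le Z \le \frac{300 - 200}{\sqrt{2947.846}}\right) = P(-9.21 \le Z \le 1.84) = 0.9671$$
b)
$$P(\bar X_1 > \bar X_2) = P(\bar X_1 - \bar X_2 > 0) = P\left(Z > \frac{0 - 200}{\sqrt{2947.846}}\right) = P(Z > -3.68) = 0.9999$$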
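Example 3 can be checked the same way. This sketch builds the normal distribution of X̄₁ − X̄₂ with `statistics.NormalDist`; because it does not round z to two decimals, part a) comes out near 0.9672 rather than the table value 0.9671.

```python
from math import sqrt
from statistics import NormalDist

mean_diff = 36300 - 36100                      # μ1 - μ2
var_diff = 200**2 / 36 + 300**2 / 49           # σ1²/n1 + σ2²/n2
diff = NormalDist(mean_diff, sqrt(var_diff))   # sampling distribution of X̄1 - X̄2

p_a = diff.cdf(300) - diff.cdf(-300)           # a) P(|X̄1 - X̄2| <= 300)
p_b = 1 - diff.cdf(0)                          # b) P(X̄1 - X̄2 > 0)
print(round(var_diff, 3), round(p_a, 4), round(p_b, 4))
```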
Sampling Distribution of the Sample Proportion
The population and sample proportions, denoted by p and p̂ respectively, are calculated as
$$p = \frac{X}{N} \quad\text{and}\quad \hat p = \frac{x}{n}$$
where
N = total number of elements in the population;
X = number of elements in the population that possess a specific characteristic;
n = total number of elements in the sample; and
x = number of elements in the sample that possess a specific characteristic.
For large values of n (n ≥ 30; many texts additionally require np ≥ 5 and nq ≥ 5), the sampling distribution of p̂ is very closely normal:
$$\hat p \sim N\left(p, \frac{pq}{n}\right), \qquad q = 1 - p$$
Mean and Standard Deviation of the Sample Proportion
$$\mu_{\hat p} = p \quad\text{and}\quad \sigma_{\hat p} = \sqrt{\frac{pq}{n}}$$
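The formulas μ_p̂ = p and σ_p̂ = √(pq/n) can be checked by simulation. This sketch assumes each sampled element independently has the characteristic with probability p (sampling with replacement, or from a very large population); p = 0.3 and n = 100 are hypothetical values, and `simulate_p_hat` is an illustrative helper name.

```python
import random
import statistics
from math import sqrt

def simulate_p_hat(p, n, reps, seed=0):
    """Return `reps` sample proportions p̂ = x / n, where x counts
    successes in n independent trials with success probability p."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]

p, n = 0.3, 100                      # hypothetical population proportion and sample size
p_hats = simulate_p_hat(p, n, reps=10000)
print("mean of p̂:", round(statistics.fmean(p_hats), 4), " vs p =", p)
print("sd of p̂  :", round(statistics.pstdev(p_hats), 4),
      " vs sqrt(pq/n) =", round(sqrt(p * (1 - p) / n), 4))
```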
Example 4:
If the true proportion of voters who support Proposition A is p = 0.40, what is the probability that a sample of size 200 yields a sample proportion between 0.40 and 0.45? That is, if p = 0.40 and n = 200, what is P(0.40 ≤ p̂ ≤ 0.45)?
$$\sigma_{\hat p} = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{0.4(1-0.4)}{200}} = 0.03464$$
$$P(0.40 \le \hat p \le 0.45) = P\left(\frac{0.40 - 0.40}{0.03464} \le Z \le \frac{0.45 - 0.40}{0.03464}\right) = P(0 \le Z \le 1.44) = 0.4251$$
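A quick numerical check of Example 4, using the same normal approximation to p̂ with `statistics.NormalDist`. Because z is not rounded to 1.44 here, the result is about 0.4255 instead of the table value 0.4251.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.40, 200
se = sqrt(p * (1 - p) / n)              # σ_p̂ ≈ 0.03464
p_hat = NormalDist(p, se)               # normal approximation to the distribution of p̂
prob = p_hat.cdf(0.45) - p_hat.cdf(0.40)
print(round(se, 5), round(prob, 4))
```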
Example 5:
The National Survey of Engagement shows about 87% of freshmen
and seniors rate their college experience as “good” or “excellent”.
Assume this result is true for the current population of freshmen
and seniors. Let p̂ be the proportion of freshmen and seniors in a
random sample of 900 who hold this view. Find the mean and standard deviation of the sampling distribution of p̂.
Solution:
Let p be the proportion of all freshmen and seniors who rate their college experience as “good” or “excellent”. Then
p = 0.87 and q = 1 − p = 1 − 0.87 = 0.13
The mean of the sampling distribution of p̂ is
$$\mu_{\hat p} = p = 0.87$$
The standard deviation of p̂ is
$$\sigma_{\hat p} = \sqrt{\frac{pq}{n}} = \sqrt{\frac{0.87(0.13)}{900}} = 0.011$$
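The arithmetic of Example 5 is a direct application of μ_p̂ = p and σ_p̂ = √(pq/n); a minimal sketch:

```python
from math import sqrt

p, n = 0.87, 900
q = 1 - p
mean_p_hat = p                       # μ_p̂ = p
sd_p_hat = sqrt(p * q / n)           # σ_p̂ = √(pq/n)
print(mean_p_hat, round(sd_p_hat, 3))   # 0.87 0.011
```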
Sampling Distribution for the Difference between Two Proportions
Now say we have two binomial populations with proportions of successes p₁ and p₂, respectively. Samples of size n₁ are taken from population 1 and samples of size n₂ from population 2. Then p̂₁ and p̂₂ are the proportions from those samples. For large, independent samples, approximately:
$$\hat P_1 \sim N\left(p_1, \frac{p_1(1-p_1)}{n_1}\right) \quad\text{and}\quad \hat P_2 \sim N\left(p_2, \frac{p_2(1-p_2)}{n_2}\right)$$
$$\hat P_1 - \hat P_2 \sim N\left(p_1 - p_2,\ \frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}\right)$$
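The last formula can be wrapped in a small reusable function. This is a sketch: `diff_of_proportions` is a hypothetical helper that returns the normal approximation as a `statistics.NormalDist`, and the inputs p₁ = 0.5, n₁ = 100, p₂ = 0.4, n₂ = 150 are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

def diff_of_proportions(p1, n1, p2, n2):
    """Normal approximation to the sampling distribution of p̂1 - p̂2 for
    independent samples: mean p1 - p2, variance p1q1/n1 + p2q2/n2."""
    variance = p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2
    return NormalDist(p1 - p2, sqrt(variance))

# Hypothetical illustration: p1 = 0.5, n1 = 100 and p2 = 0.4, n2 = 150
d = diff_of_proportions(0.5, 100, 0.4, 150)
print(round(d.mean, 3), round(d.stdev, 4), round(1 - d.cdf(0), 4))   # last value: P(p̂1 > p̂2)
```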
Example 6:
A certain change in a process for manufacturing component parts was considered. It was found that 75 out of 1500 items from the existing procedure were defective and 80 out of 2000 items from the new procedure were defective. If a random sample of 49 items is taken from the existing procedure and a random sample of 64 items is taken from the new procedure, what is the probability that
a) the proportion of defective items from the new procedure exceeds the proportion of defective items from the existing procedure?
b) the two proportions differ by at most 0.015?
c) the proportion of defective items from the new procedure exceeds the proportion of defective items from the existing procedure by at least 0.02?
Solution:
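$\hat P_N$: the proportion of defective items from the new procedure
$\hat P_E$: the proportion of defective items from the existing procedure
$$p_N = \frac{80}{2000} = 0.04, \qquad \hat P_N \sim N\left(0.04, \frac{0.04(0.96)}{64}\right)$$
$$p_E = \frac{75}{1500} = 0.05, \qquad \hat P_E \sim N\left(0.05, \frac{0.05(0.95)}{49}\right)$$
$$\hat P_N - \hat P_E \sim N\left(0.04 - 0.05,\ \frac{0.05(0.95)}{49} + \frac{0.04(0.96)}{64}\right) = N(-0.01,\ 0.0016)$$
a)
$$P(\hat P_N > \hat P_E) = P(\hat P_N - \hat P_E > 0) = P\left(Z > \frac{0 - (-0.01)}{\sqrt{0.0016}}\right) = P(Z > 0.25) = 0.4013$$
b)
$$P(|\hat P_N - \hat P_E| \le 0.015) = P(-0.015 \le \hat P_N - \hat P_E \le 0.015) = P\left(\frac{-0.015 - (-0.01)}{\sqrt{0.0016}} \le Z \le \frac{0.015 - (-0.01)}{\sqrt{0.0016}}\right) = P(-0.125 \le Z \le 0.625) = 0.2838$$
c)
$$P(\hat P_N - \hat P_E \ge 0.02) = P\left(Z \ge \frac{0.02 - (-0.01)}{\sqrt{0.0016}}\right) = P(Z \ge 0.75) = 0.2266$$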
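The worked solution rounds the variance of P̂_N − P̂_E (about 0.00157) to 0.0016, i.e. a standard deviation of 0.04. The sketch below computes parts a) to c) both ways with `statistics.NormalDist`. The rounded variance reproduces 0.4013, 0.2838 and 0.2266, while the unrounded variance gives roughly 0.400, 0.286 and 0.224.

```python
from math import sqrt
from statistics import NormalDist

p_new, n_new = 80 / 2000, 64     # p_N = 0.04, sample from the new procedure
p_old, n_old = 75 / 1500, 49     # p_E = 0.05, sample from the existing procedure
exact_var = p_new * (1 - p_new) / n_new + p_old * (1 - p_old) / n_old   # ≈ 0.00157

def answers(var):
    """Probabilities for parts a)-c), with P̂_N - P̂_E ~ N(p_N - p_E, var)."""
    diff = NormalDist(p_new - p_old, sqrt(var))
    p_a = 1 - diff.cdf(0)                          # a) P(P̂_N > P̂_E)
    p_b = diff.cdf(0.015) - diff.cdf(-0.015)       # b) P(|P̂_N - P̂_E| <= 0.015)
    p_c = 1 - diff.cdf(0.02)                       # c) P(P̂_N - P̂_E >= 0.02)
    return p_a, p_b, p_c

print([round(v, 4) for v in answers(0.0016)])      # variance rounded as in the solution
print([round(v, 4) for v in answers(exact_var)])   # unrounded variance
```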
Exercises:
1. Assume that the weights of all packages of a certain brand
of cookies are normally distributed with a mean of 32
ounces and a standard deviation of 0.3 ounce. Find the
probability that the mean weight of a random sample of 20
packages of this brand of cookies will be between 31.8 and
31.9 ounces.
Answer: 0.0667
Institut Matematik Kejuruteraan, UniMAP
BQT 173
2. According to the BBMG Conscious Consumer Report, 51% of the adults surveyed said that they are willing to pay more for products with social and environmental benefits despite the current tough economic times (USA TODAY, June 8, 2009). Suppose this result is true for the current population of adult Americans. Let p̂ be the proportion of adult Americans in a random sample of 1050 who hold this opinion. Find the probability that p̂ is between 0.53 and 0.55.
Answer: 0.0921
3. It is known that 30% and 35% of the residents in Taman Sutera and Bandar Mas, respectively, subscribe to the New Straits Times newspaper. If random samples of 50 newspaper readers from Taman Sutera and 50 readers from Bandar Mas are taken, what is the probability that the proportion of New Straits Times subscribers in Taman Sutera is larger than in Bandar Mas?
Answer: 0.2981
End of Chapter 1