Engineering Statistics Chapter 3 Distribution of Samples

Engineering Statistics
Chapter 3
Distribution of Samples
Distribution of sample statistics
3C - Proportions and Difference between Proportions
Proportion of a property
• When a sample is collected to study a
property, it is important to know whether the proportion
with that property is reasonable. For example, when we
interview a group of job candidates, we would like to know
whether the make-up of the group by gender, age, race etc.
is in line with what is normal.
• The proportion of a property is highly dependent
on the size of samples. In small samples, it is not
surprising if the proportion of a sample is unusual.
When the size increases, we expect the proportion
to be closer to that of the population.
Distribution of proportions
• If the proportion of a property in a population is π,
and we take samples of size n, then the sample proportion
p is expected to follow the normal distribution,
with a mean π, and a variance π(1 – π)/n.
• As can be seen, the variance decreases as the
sample size increases. When n is large, we would
expect the proportion of the property in the sample
to be very close to that of the population.
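As an illustration (not part of the original slides), the short Python sketch below evaluates the standard deviation √(π(1–π)/n) of the sample proportion for a few sample sizes. The helper name proportion_sd and the value π = 0.3 are assumptions chosen only for the demonstration.

```python
import math

def proportion_sd(pi, n):
    """Standard deviation of the sample proportion: sqrt(pi * (1 - pi) / n)."""
    return math.sqrt(pi * (1 - pi) / n)

# Hypothetical population proportion of 0.3, increasing sample sizes.
for n in (10, 40, 160, 640):
    print(n, round(proportion_sd(0.3, n), 4))
```

Quadrupling the sample size halves the standard deviation, because n appears under the square root.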
Need for Continuity Adjustment
• Since the proportion is based on a ratio m/n, the
value of m will be an integer. In order to avoid
bias in obtaining the correct proportion, it is
necessary to introduce a correction of ½ unit. This
is the same as the continuity correction in discrete-to-continuous approximation.
• Thus we shall treat p > m/n as p > (m+½)/n, p ≥
m/n as p ≥ (m – ½)/n, p < m/n as p < (m – ½)/n,
and p ≤ m/n as p ≤ (m+½)/n.
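The four rules above can be collected into one small helper. This is only a sketch: the function name adjusted_bound and the sample values (10 out of 40) are assumptions made for illustration.

```python
def adjusted_bound(m, n, relation):
    """Return the continuity-adjusted bound that replaces m/n.

    relation is one of ">", ">=", "<", "<=" and describes the event
    "p relation m/n" for a sample proportion p.
    """
    # ">" and "<=" move the bound up by half a unit; ">=" and "<" move it down.
    shift = {">": 0.5, "<=": 0.5, ">=": -0.5, "<": -0.5}[relation]
    return (m + shift) / n

# Hypothetical case: 10 items with the property in a sample of 40.
for relation in (">", ">=", "<", "<="):
    print(f"p {relation} 10/40 becomes p {relation} {adjusted_bound(10, 40, relation)}")
```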
Example 1
• 30% of customers to a fast-food restaurant are old
folks who are given discounts. During a short
period, the restaurant serves 40 customers. What is
the probability the percentage of old folks is not
more than 25%?
Solution: p ~ N(0.3, 0.3×(1–0.3)/40).
P(p ≤ 0.25)
≈ P(p ≤ 0.25 + 0.5/40) [Continuity adjustment]
= P(z ≤ [0.2625–0.3]/√0.00525)
= P(z ≤ –0.52) = 0.5–0.1985 = 0.3015.
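To check the working in Example 1 numerically, the sketch below repeats the normal approximation with the continuity adjustment and, for comparison, sums the exact binomial probabilities. It uses only the Python standard library; the variable names are assumptions, and the exact binomial comparison is an addition not found in the slides.

```python
import math
from statistics import NormalDist

pi, n, m = 0.30, 40, 10   # 25% of 40 customers corresponds to m = 10

# Normal approximation with continuity adjustment: P(p <= (m + 0.5)/n).
sd = math.sqrt(pi * (1 - pi) / n)
approx = NormalDist(mu=pi, sigma=sd).cdf((m + 0.5) / n)

# Exact binomial probability P(X <= m), for comparison only.
exact = sum(math.comb(n, k) * pi**k * (1 - pi)**(n - k) for k in range(m + 1))

print(round(approx, 4), round(exact, 4))
```

The approximate value differs slightly from the 0.3015 in the solution because the table lookup rounds z to two decimal places.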
Example 2
• A furniture factory claims that less than 12% of its
executive chairs have defects. An office just
ordered 25 such chairs. What is the probability that the
percentage of defects exceeds 15%?
Solution: p ~ N(0.12, 0.12×(1–0.12)/25).
P(p > 0.15)
≈ P(p > 0.15 + 0.5/25) [Continuity adjustment]
= P(z > [0.17–0.12]/√0.004224)
= P(z > 0.77) = 0.5–0.2794 = 0.2206.
Example 3
• It is estimated that 65% of students in the Faculty
of Education are ladies. A class in FoE has 120
students. What is the probability the proportion of
ladies in the class exceeds 70%?
Solution: Let p represent the proportion of ladies,
then p ~ N(0.65, [0.65×(1-0.65)]/120). After
continuity correction, P(p > 0.70)
≈ P(p > 0.70 + 0.5/120)
= P(z > [0.7042 – 0.65]/√(0.65×0.35/120))
= P(z > 1.24) = 0.5–0.3925 = 0.1075.
Alternative: Binomial distribution.
We note that the same question can be solved using
binomial distribution as follows:
Let X represent number of ladies. X~Bin(120, 0.65).
As n>30, X is approximated by the normal
distribution ⇒ X~N(120×0.65, 120×0.65×0.35).
70% of 120 is 84. We are looking for P(X>84).
By continuity adjustment, we have
P(X>84.5) = P(z>[84.5-78]/√27.3)
= P(z > 1.24) = 0.1075, as we obtained earlier.
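The agreement between the two routes in Example 3 is not a coincidence: dividing the count, its mean and its standard deviation by n gives exactly the proportion form. The sketch below, with variable names that are assumptions, computes the z value both ways.

```python
import math
from statistics import NormalDist

n, pi, m = 120, 0.65, 84   # 70% of 120 students is m = 84

# Proportion scale: p ~ N(pi, pi(1 - pi)/n), bound (m + 0.5)/n.
z_prop = ((m + 0.5) / n - pi) / math.sqrt(pi * (1 - pi) / n)

# Count scale: X ~ N(n*pi, n*pi(1 - pi)), bound m + 0.5.
z_count = ((m + 0.5) - n * pi) / math.sqrt(n * pi * (1 - pi))

prob = 1 - NormalDist().cdf(z_count)
print(round(z_prop, 4), round(z_count, 4), round(prob, 4))
```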
Example 4
• 18% of students withdraw half-way through a
course. In a class with 45 students, what is the
probability less than 15% will withdraw?
Solution: p ~ N(0.18, 0.18×(1–0.18)/45)
After continuity adjustment, the event
p < 0.15 ⇒ p < 0.15–0.5/45
P(p < 0.1389) = P(z < [0.1389–0.18]/√0.00328)
= P(z < –0.72) = 0.5 – 0.2642 = 0.2358.
Binomial Alternative:
Let W represent the number of students who
withdraw. Then W~Bin(45, 0.18).
15% of 45 is 6.75. So the event is W<6.75. Even
though the number here is a decimal, we still need
to make the same continuity adjustment. Thus we
look for W < 6.75–0.5.
As n>30, we use the approximation W~N(45×0.18,
45×0.18×0.82).
P(W < 6.25) = P(z < [6.25 – 8.1]/√6.642)
= P(z < –0.72) = 0.2358, as found above.
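Because 6.75 is not an attainable count, it may help to see the numbers side by side. The sketch below reproduces the adjustment used above (6.75 – 0.5) and also evaluates another common convention, which is an assumption not taken from the slides: since W < 6.75 means W ≤ 6, the half unit is added to the largest attainable count instead. The exact binomial probability is printed alongside so the two can be compared.

```python
import math
from statistics import NormalDist

n, pi = 45, 0.18
w = NormalDist(mu=n * pi, sigma=math.sqrt(n * pi * (1 - pi)))

# Adjustment used in the text: shift the decimal threshold 6.75 down by 0.5.
text_approx = w.cdf(6.75 - 0.5)

# Alternative convention (assumption, not from the text): W < 6.75 means
# W <= 6, so shift the largest attainable count up by 0.5.
integer_approx = w.cdf(6 + 0.5)

# Exact binomial probability P(W <= 6).
exact = sum(math.comb(n, k) * pi**k * (1 - pi)**(n - k) for k in range(7))

print(round(text_approx, 4), round(integer_approx, 4), round(exact, 4))
```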
Difference between proportions
• The same rules on the distribution of the
difference between means will apply to the
difference between proportions. Thus if π1 and π2
are proportions of the same property for two
populations, and we take samples of sizes n1 and
n2 from those two populations respectively, then we
expect the difference of proportions p1–p2 of the
samples to satisfy
p1–p2 ~ N(π1–π2, π1(1–π1)/n1 + π2(1–π2)/n2).
Example 5
• In the 1985 cohort, it is known that 20% of
non-graduates and 14% of graduates remain
unemployed 6 months after coming on to
the market. A survey tracks 80 non-graduates
and 50 graduates of the cohort.
Find the probability that the percentage of
non-graduates who remain unemployed exceeds
that of graduates by at least 10%.
Solution:
Let pn represent the proportion of unemployed
non-graduates and pg that of unemployed
graduates.
pn – pg ~ N(0.20–0.14, 0.2×0.8/80+0.14×0.86/50)
P(pn – pg > 0.1) = P(z > [0.1 – 0.06]/√(0.2×0.8/80+0.14×0.86/50))
= P(z > 0.60) = 0.5 – 0.2257 = 0.2743.
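The calculation in Example 5 can be reproduced with a small helper that builds the normal model for p1 – p2. This is a sketch under the independence assumption already used above; the function name diff_distribution is hypothetical.

```python
import math
from statistics import NormalDist

def diff_distribution(pi1, n1, pi2, n2):
    """Normal model for p1 - p2 from two independent samples."""
    mean = pi1 - pi2
    var = pi1 * (1 - pi1) / n1 + pi2 * (1 - pi2) / n2
    return NormalDist(mu=mean, sigma=math.sqrt(var))

# Example 5: non-graduates (20%, n = 80) versus graduates (14%, n = 50).
d = diff_distribution(0.20, 80, 0.14, 50)
prob = 1 - d.cdf(0.10)   # P(pn - pg > 0.10)
print(round(prob, 4))
```

The result is close to, but not identical with, 0.2743, since the solution rounds z to 0.60 before using the table.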
Example 6
• The Transport Ministry believes that 35% of express
buses exceed speed limits on the highway. On a
certain day, two teams track express buses going in
opposite directions. The team for north-bound traffic
monitors 60 buses, while the south-bound team has 75
buses on record. What is the probability that the
percentage of speeding buses for north-bound
exceeds that of south-bound by at least 4%?
Solution: Let pn represent the proportion of north-bound
buses which speed, and ps the same proportion for
south-bound buses.
pn–ps ~ N(0.35–0.35, 0.35×0.65/60+0.35×0.65/75)
P(pn–ps > 0.04) = P(z > [0.04 – 0]/√(0.35×0.65/60+0.35×0.65/75))
= P(z > 0.48) = 0.5 – 0.1844 = 0.3156.
So there is a probability of 0.3156 that the northbound speeding percentage might exceed that of
south-bound by 4% or more.
Note that in this case, we also have the same
probability 0.3156 that the proportion of southbound speeders exceeds that of north-bound by
4%!
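The symmetry noted in Example 6 follows from the difference having mean zero. A minimal sketch (variable names are assumptions) evaluates both tails:

```python
import math
from statistics import NormalDist

pi, n_north, n_south = 0.35, 60, 75
sd = math.sqrt(pi * (1 - pi) / n_north + pi * (1 - pi) / n_south)
diff = NormalDist(mu=0.0, sigma=sd)   # both directions share the same 35%

north_ahead = 1 - diff.cdf(0.04)   # P(pn - ps > 0.04)
south_ahead = diff.cdf(-0.04)      # P(pn - ps < -0.04)
print(round(north_ahead, 4), round(south_ahead, 4))
```

The two tail probabilities coincide because the normal curve is symmetric about its mean of 0.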
Confidence Interval for Proportion
• When we have the proportion of a property
from the population, we expect the
proportion for a sample to follow the
normal distribution.
• Hence, we may apply the same procedure to
estimate the (1–α)100% confidence interval
as for the mean. We shall use two examples
to illustrate the method.
Example 7
• The Tourism Department report says that 32% of tourists
are foreigners. A group of 150 tourists is visiting the
Royal Museum. What is the 95% confidence interval for the
percentage of foreign tourists?
Solution:
p~N(0.32, 0.32×0.68/150); p~N(0.32, 0.001451)
At 95% confidence, α=0.05, α/2=0.025. z0.025 = 1.96.
Hence the 95% confidence interval for the proportion of
foreign tourists is
0.32–1.96×√0.001451 ≤ p ≤ 0.32+1.96×√0.001451
⇒ 0.2453 ≤ p ≤ 0.3947 ⇒ 24.53% to 39.47% of the
tourists are foreigners.
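The 95% interval worked out in the solution can be checked with a short helper. This is a sketch only: proportion_interval is a hypothetical name, and the critical value is taken from the standard library's inverse normal CDF rather than from a table.

```python
import math
from statistics import NormalDist

def proportion_interval(p, n, confidence):
    """Interval p +/- z * sqrt(p(1 - p)/n) at the given confidence level."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width

low, high = proportion_interval(0.32, 150, 0.95)
print(round(low, 4), round(high, 4))
```

Calling the same helper with a different confidence value, for example 0.90 or 0.98, gives the interval for that level.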
Example 8
• The records of a bank show that 17% of its
customers are business customers, but the
transactions for this group make up 75% of all transactions.
During a certain hour, there were 50 customers
and 400 transactions. Find the 90% confidence
interval for the percentage of
(i) Business customers;
(ii) Business transactions.
Solution:
p1 = proportion of business customers;
p2 = proportion of business transactions.
p1~N(0.17, 0.17×0.83/50);
p2~N(0.75, 0.75×0.25/400).
At 90% confidence, α=0.1, α/2=0.05. z0.05 = 1.6449.
The confidence intervals are:
0.17 – 1.6449×√0.002822 ≤ p1 ≤ 0.17 + 1.6449×√0.002822
⇒ 0.0826 ≤ p1 ≤ 0.2574; and
0.75 – 1.6449×√0.00046875 ≤ p2 ≤ 0.75 + 1.6449×√0.00046875
⇒ 0.7144 ≤ p2 ≤ 0.7856.
Hence the range is 8.26% to 25.74% for business customers,
and 71.44% to 78.56% for business transactions.
Confidence Interval From Sample
• When the proportions are derived from sample data,
we expect that the same normal distribution
can be used to model the population proportion,
using the sample proportion as the estimator.
• For such purposes, we expect the result will be
good only if the sample size is reasonably large.
For small samples, it is not reliable to use the
proportion obtained to obtain a general picture of
the population proportion.
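To see why small samples give an unreliable picture, the sketch below prints the half-width of a 95% interval for the same sample proportion at three sample sizes. The sample proportion 0.4 and the helper name half_width are assumptions made for illustration.

```python
import math
from statistics import NormalDist

def half_width(p_hat, n, confidence=0.95):
    """Half-width of the normal-approximation interval around p_hat."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

p_hat = 0.4   # hypothetical sample proportion
for n in (10, 100, 1000):
    print(n, round(half_width(p_hat, n), 3))
```

With n = 10 the half-width is roughly 0.3, so the interval covers most of the plausible range and says little about the population.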
Example 9
• In a survey on cleanliness of eating stalls, it
was found that only 55 out of 140 stalls
checked follow proper procedures to
maintain hygienic environments. Based on
this, estimate the 95% confidence interval
for the percentage of clean eating stalls
nationwide.
Solution:
Even though only the sample data are available, we can safely
assume that the proportion from such a big sample is a
good estimator for the wider proportion. Hence we shall
use the normal distribution to estimate the proportion for
the nation:
p~N(55/140, [55/140×85/140]/140)
At 95%, α=0.05, α/2=0.025. z0.025 = 1.96.
So the 95% interval for the population proportion of clean
eateries is 55/140 – 1.96×√([55/140×85/140]/140) to 55/140
+ 1.96×√([55/140×85/140]/140)
⇒ 0.3120 ≤ p ≤ 0.4738 or 31.20% to 47.38%.
Example 10
• During a screening process, it was found that 20
out of 80 boys 15-18 years old and 30 out of 100
girls of the same age group are fat. Based on this
study, find the probability the proportion of fat
girls exceeds that of boys by 2% or more.
NOTE: In this case, we only have the sample
proportions. However, as the sample sizes are
large enough, we can use these data to project the
likely distribution of the difference of proportions.
Solution: Note: 20/80 = 0.25, 30/100 = 0.3.
pb~N(0.25, 0.25×0.75/80);
pg~N(0.3, 0.3×0.7/100);
pg – pb ~ N(0.3-0.25, 0.25×0.75/80 +
0.3×0.7/100)
P(pg – pb > 0.02)
= P(z > [0.02-0.05]/√(0.25×0.75/80 + 0.3×0.7/100))
= P(z > -0.45)
= 0.5 + 0.1736
= 0.6736.
Difference of proportions
• Using the distribution for difference between
proportions, we can find the probability for the
difference between proportions (Exs 11 & 12).
• When the sample sizes are large, we can also use
the sample proportions to estimate the interval for
the difference between population proportions.
The same procedure is used to determine the
confidence interval for the difference in
proportions (Ex 13).
Example 11
• On the average, 37% of men and 18% of women
in the country smoke. A survey is taken for 50
men and 60 women. What is the probability the
proportion of men who smoke exceeds that of
women by at least 20%?
Solution: Let pm and pw represent the proportion of
men and women who smoke. Then
pm ~ N(0.37, 0.37×0.63/50);
pw ~ N(0.18, 0.18×0.82/60).
Example 11 (Solution)
This means that
pm – pw ~N(0.37 – 0.18, 0.37×0.63/50+
0.18×0.82/60).
So P(pm ≥ pw + 0.20) = P(pm – pw ≥ 0.20)
= P(z ≥ [0.20 – 0.19]/√(0.37×0.63/50 +
0.18×0.82/60))
= P(z ≥ 0.12)
= 0.5 – 0.0478 = 0.4522
Example 12
• 65% of those achieving good results at STPM
exam and 55% of those for Matriculation exam get
admitted to universities of their choice. A check is
made on 72 students successful at STPM and 45
of those at Matriculation. What is the probability
the success rate in university admission for those
through Matriculation is at least as good as those
through STPM?
• Solution: Let ps be the proportion of STPM
candidates who are successful and pm that of
Matriculation candidates.
Example 12 (Solution)
Then we have:
ps ~ N(0.65, 0.65×0.35/72);
pm ~ N(0.55, 0.55×0.45/45). And so
pm – ps ~N(0.55 – 0.65, 0.65×0.35/72 +
0.55×0.45/45).
Hence P(pm ≥ ps) = P(pm – ps ≥ 0.0)
= P(z ≥ [0.0 – (–0.10)]/√(0.65×0.35/72 +
0.55×0.45/45))
= P(z ≥ 1.07)
= 0.5 – 0.3577 = 0.1423.
Example 13
• Out of 75 sticks of LajuMaut cigarettes, 20
are found to have nicotine exceeding danger
levels. For 60 sticks of CepatMaut cigarettes,
15 are also found to have nicotine
exceeding danger levels. What is the 90%
confidence interval of pL –pC, where pL and
pC represent the proportions of LajuMaut
and CepatMaut cigarettes with excessive
levels of nicotine?
• From the data given, pL=20/75 = 0.2667, and
pC=15/60 = 0.25. By theory, pL –pC ~N(0.2667–
0.25, 0.2667×0.7333÷75 + 0.25×0.75÷60).
• At 90% confidence, α=0.1, α/2=0.05. And
z0.05=1.6449. Hence the confidence interval for the
difference in proportions is from 0.0167 –
1.6449×√(0.2667×0.7333÷75 + 0.25×0.75÷60) to
0.0167 + 1.6449×√(0.2667×0.7333÷75 +
0.25×0.75÷60), i.e. –0.1078 to 0.1412.
• NOTE: The left boundary –0.1078 indicates that
pL may actually be less than pC.
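The interval in Example 13 can be reproduced from the raw counts. The sketch below is one way to do it, assuming independent samples as in the theory above; difference_interval is a hypothetical helper name.

```python
import math
from statistics import NormalDist

def difference_interval(x1, n1, x2, n2, confidence):
    """Interval for the difference of two proportions from sample counts."""
    p1, p2 = x1 / n1, x2 / n2
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * se, diff + z * se

# 20 of 75 LajuMaut sticks and 15 of 60 CepatMaut sticks exceed the level.
low, high = difference_interval(20, 75, 15, 60, 0.90)
print(round(low, 4), round(high, 4))
```

An interval that contains 0 is consistent with either brand having the larger proportion, which is the point made in the note above.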
Multiple Groups
• When we want to compare the proportions of multiple
(3 or more) groups in a population, the method using
normal distribution becomes ineffective.
• An alternative is to use the differences between what
are expected and what are obtained and treat them as
variations.
• The sum of squares of the differences can be modeled
using the χ² distribution. However, as χ² distribution
tables do not provide for probabilities, we shall only
look at these cases in hypothesis testing. (See 4C).
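As a rough sketch of the idea (the full treatment is deferred to 4C), the snippet below computes the usual Pearson form of this statistic, in which each squared difference is divided by the expected count before summing. The counts are hypothetical and the function name chi_square_statistic is an assumption.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all groups."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts: three groups of 40 expected, with 30, 50 and 40 observed.
observed = [30, 50, 40]
expected = [40, 40, 40]
print(chi_square_statistic(observed, expected))   # prints 5.0
```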