Engineering Statistics

Chapter 2
Special Variables
2C Normal Distribution
Continuous distributions
• Recall that for a continuous random variable X
with probability density function f(x), the
probability of an event is defined using an integral:
$$P(a \le X \le b) = \int_a^b f(x)\,dx$$
We are reminded that in this case P(X = k) = 0 for
all k. In addition P(X > a) = P(X ≥ a) and P(X < a) =
P(X ≤ a). Also P(X > a) = 1 – P(X ≤ a).
Normality
• Among the continuous distributions, one
is particularly useful. Its pdf is
shown below.
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
This is the pdf of a normally distributed variable X.
The mean of X is μ and its variance is σ².
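The pdf can be written out term by term in code. The following is a minimal sketch, not part of the original slides: the function name normal_pdf and the sample values (mean 2, standard deviation 1) are chosen here for illustration, and the result is cross-checked against Python's standard-library NormalDist.

```python
import math
from statistics import NormalDist


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x, written out term by term."""
    coefficient = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coefficient * math.exp(exponent)


# Illustrative values only: mean 2, variance 1 (so sigma = 1).
peak = normal_pdf(2.0, mu=2.0, sigma=1.0)
print(round(peak, 4))  # 0.3989, which is 1/sqrt(2*pi)

# Cross-check against the standard library at an arbitrary point.
reference = NormalDist(mu=2.0, sigma=1.0).pdf(3.5)
agrees = math.isclose(normal_pdf(3.5, 2.0, 1.0), reference)
print(agrees)
```

The density is largest at x = μ and falls off symmetrically on both sides.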
The normal distribution
• Normal distributions are used to model variables
which are usually obtained from measurements of
naturally occurring quantities, e.g. height of trees,
weight of people, length of daylight hours at a
certain place.
• For man-made things produced in large quantities, and other
human activities, we also expect the measurements
to follow the normal distribution: mass of bricks,
length of time to complete a fixed job, etc.
Properties of normal distributions
• If X is normally distributed with mean μ and
variance σ², then we write
X~N(μ, σ²)
• The pdf f(x) of a normal distribution X is
symmetric about the mean. The variance
determines its shape. A small variance means the
values are concentrated close to the mean, and a large
variance means a flat, spread-out distribution.
Graphs of Normal Distributions
• Small variance:
Mean=2, variance 1
• Large variance:
Mean=2, variance 16.
Probabilities of normal distributions
• As X is a continuous variable, we obtain
the probabilities by using integration. For
the event a < X < b, the probability is
$$P(a < X < b) = \frac{1}{\sigma\sqrt{2\pi}} \int_a^b e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx$$
This integral has no closed form in terms of
elementary functions, so there is no direct method
of calculation.
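Although the integral cannot be evaluated directly, it can be approximated numerically. The sketch below uses composite Simpson's rule, which is a technique chosen here for illustration and not one the slides use; the helper names and the values (mean 2, SD 1, interval 1 to 3) are assumptions.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    coefficient = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coefficient * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))


def simpson(f, a, b, n=200):
    """Composite Simpson's rule on [a, b] using an even number n of strips."""
    if n % 2:
        n += 1  # Simpson's rule needs an even number of strips
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        weight = 4 if i % 2 else 2
        total += weight * f(a + i * h)
    return total * h / 3.0


# Illustrative values: X ~ N(2, 1), probability of 1 < X < 3.
mu, sigma = 2.0, 1.0
prob = simpson(lambda x: normal_pdf(x, mu, sigma), 1.0, 3.0)
print(round(prob, 4))  # 0.6827
```

The interval 1 to 3 is one standard deviation either side of the mean, which is why the result is the familiar 68% figure.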
Comparing normal distributions
• However, mathematicians found that, by using
transforms, we can convert the integral of the pdf
into simpler forms.
• Most importantly, the probability of an event in a
normal distribution X can be transformed into the
probability of an event in another normal distribution
Y.
• This means that, if we can find the probabilities
of events in a fixed normal distribution, we can use
this as a reference for events in other normal
distributions.
Comparing events
• If X~N(μ, σ²), then the probability P(X>x)
depends only on how far x is from μ in units of σ.
This means that we can compare the probabilities of
events in normal distributions by comparing the
measure (x – μ)/σ.
• If X1~N(5, 4), X2~N(10, 4), X3~N(5,9), and X4~N(10,
100), then P(X1>7) = P(X2>12) = P(X3>8) = P(X4>20)
because in each case, the event corresponds to one
standard deviation above the mean.
• This fact means that we should look for a basic
distribution as reference.
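The claim that the four events have the same probability can be checked numerically. This is a small sketch using Python's standard-library NormalDist; the list name cases and the loop are illustrative, while the means, variances and thresholds are those given for X1 to X4.

```python
import math
from statistics import NormalDist

# (mean, variance, threshold x) for X1, X2, X3, X4
cases = [(5, 4, 7), (10, 4, 12), (5, 9, 8), (10, 100, 20)]

tails = []
for mean, variance, x in cases:
    sigma = math.sqrt(variance)  # NormalDist expects the SD, not the variance
    z = (x - mean) / sigma
    tail = 1.0 - NormalDist(mean, sigma).cdf(x)
    tails.append(tail)
    print(f"z = {z:.2f}, P(X > {x}) = {tail:.4f}")
```

Every line reports z = 1.00 and a tail probability of 0.1587, since each threshold sits one standard deviation above its mean.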
Standard Normal Distribution
• The normal distribution with mean 0 and
variance 1 is called the Standard Normal
Distribution (SND). We designate it as Z,
and its pdf is usually represented as φ(z):
$$\phi(z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{z^2}{2}}$$
Properties of Z
• The standard normal
distribution has its mean at
0 and standard deviation
1. Its distribution is
therefore symmetric about 0.
• Since the total
probability is 1, we
deduce that
P(Z>0) = P(Z<0) = 0.5.
Table for SND
• The table for SND is constructed for z from 0 to
about 4. The probability P(Z>4) is about
0.00003 and is seldom used except in some critical
work where extreme accuracy is required.
• The UTM table shows the value of P(0<Z≤z). Thus
when you read the table for 0.4 and see 0.1554,
what is meant is P(0<Z≤0.4) = 0.1554. Similarly,
we see from the table P(0<Z≤1.33) = 0.4082 and
P(0<Z≤2.17) = 0.4850.
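The table entries can be reproduced from the cumulative distribution function, since P(0 < Z ≤ z) = P(Z ≤ z) – 0.5 for z ≥ 0. The sketch below assumes Python's standard-library NormalDist; the function name table_value is made up for this illustration.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def table_value(z):
    """P(0 < Z <= z) for z >= 0, the quantity the table lists."""
    return STANDARD_NORMAL.cdf(z) - 0.5


for z in (0.4, 1.33, 2.17):
    print(f"{z:.2f} {table_value(z):.4f}")
# 0.40 0.1554
# 1.33 0.4082
# 2.17 0.4850
```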
Transforming variables
• If X~N(μ, σ²), and we wish to transform a value
of X = x into z, the formula is
z = (x – μ)/σ.
Ex: X~N(50, 15²).
x = 65 ⇒ z = (65 – 50)/15 = 1
x = 80 ⇒ z = (80 – 50)/15 = 2
x = 45 ⇒ z = (45 – 50)/15 = –0.33
x = 30 ⇒ z = (30 – 50)/15 = –1.33
Finding probabilities
• To calculate the probabilities for events
with a normal distribution X with mean μ and
variance σ², we go through the following
steps:
I. Convert x into z (SND).
II. Look up the table for z.
III. Interpret the results.
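The three steps can be sketched in code, with the standard normal cdf standing in for the printed table. The function names to_z and prob_between, and the sample values (mean 50, SD 15), are assumptions made for this illustration.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # plays the role of the printed table


def to_z(x, mu, sigma):
    """Step I: convert a value x of X into the standard score z."""
    return (x - mu) / sigma


def prob_between(a, b, mu, sigma):
    """Steps II and III: look up both z values and combine them."""
    z_a = to_z(a, mu, sigma)
    z_b = to_z(b, mu, sigma)
    return STANDARD_NORMAL.cdf(z_b) - STANDARD_NORMAL.cdf(z_a)


# Illustrative values: mean 50, SD 15, so 65 and 80 map to z = 1 and z = 2.
p = prob_between(65, 80, mu=50, sigma=15)
print(round(p, 4))  # 0.1359
```

The cdf gives P(Z ≤ z) directly, so the "interpret" step reduces to a subtraction instead of adding or subtracting 0.5 as with the table.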
First example
• The mean of a normal distribution X is 10
and its variance is 16. Find the values of
the following probabilities:
(i) P(10<X≤14);
(ii) P(X>12);
(iii) P(12<X<16).
Solution to First Example
Solution: X~N(10, 4²), mean = 10, SD = 4.
(i) x = 10 ⇒ z = 0; x = 14 ⇒ z = 1,
so P(10<X≤14) = P(0<z≤1) = 0.3413.
(ii) x = 12 ⇒ z = 0.5,
so P(X > 12) = P(z>0.5) = 0.5 – P(0<z<0.5)
= 0.5 – 0.1915 = 0.3085.
(iii) x = 16 ⇒ z = 1.5,
so P(12<X<16) = P(0.5<z<1.5) = P(0<z<1.5) –
P(0<z<0.5) = 0.4332 – 0.1915 = 0.2417.
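As a cross-check, the three answers can be recomputed without a table. This is a sketch using Python's standard-library NormalDist, with variable names chosen here; note that it takes the standard deviation (4), not the variance (16).

```python
from statistics import NormalDist

X = NormalDist(mu=10, sigma=4)  # variance 16 means SD 4

p1 = X.cdf(14) - X.cdf(10)   # (i)   P(10 < X <= 14)
p2 = 1.0 - X.cdf(12)         # (ii)  P(X > 12)
p3 = X.cdf(16) - X.cdf(12)   # (iii) P(12 < X < 16)

print(round(p1, 4), round(p2, 4), round(p3, 4))  # 0.3413 0.3085 0.2417
```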
Negative z
• As the pdf of Z is symmetric about 0, we
have P(–k<z<0) = P(0<z<k) for any k > 0.
• Similarly, for 0 < k < h, we have
P(z < –k) = P(z > k)
P(–h < z < –k) = P(k < z < h)
Thus we can interpret probabilities of events
with negative z values by relating them to
corresponding positive z values.
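These symmetry relations can be verified numerically. The sketch below picks arbitrary values k = 0.5 and h = 1.5 (assumptions for illustration only) and compares both sides of each identity.

```python
import math
from statistics import NormalDist

Z = NormalDist()
k, h = 0.5, 1.5  # arbitrary values with 0 < k < h

# P(z < -k) versus P(z > k)
lower_tail = Z.cdf(-k)
upper_tail = 1.0 - Z.cdf(k)

# P(-h < z < -k) versus P(k < z < h)
left_band = Z.cdf(-k) - Z.cdf(-h)
right_band = Z.cdf(h) - Z.cdf(k)

tails_match = math.isclose(lower_tail, upper_tail)
bands_match = math.isclose(left_band, right_band)
print(tails_match, bands_match)
```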
Graph showing symmetry
Example 2
• The height of men in a recruitment
exercise has a mean of 177.2 cm and
standard deviation 9.4 cm. Assume the
height follows the normal distribution.
What is the probability a recruit is
(i) Between 177.2 and 185.8 cm?
(ii) Taller than 190 cm?
(iii) Taller than 175 cm?
Example 2 solution
Let H represent their height. Then
H~N(177.2, 9.4²).
(i) x = 177.2 ⇒ z = 0
x = 185.8 ⇒ z = (185.8 – 177.2)/9.4 = 0.91.
From the table P(0≤z≤0.91) = 0.3186.
So P(177.2≤H≤185.8) = P(0≤z≤0.91) = 0.3186.
Example 2 Solution (contd)
(ii) x = 190 ⇒ z = (190 – 177.2)/9.4 = 1.36.
From the table, P(0<z<1.36) = 0.4131.
So P(H>190) = P(z>1.36)
= 0.5 – P(0≤z≤1.36)
= 0.0869.
Example 2 Solution (contd)
(iii) x = 175 ⇒ z = (175 – 177.2)/9.4 = –0.23.
By symmetry, P(–0.23≤z≤0) = P(0≤z≤0.23) = 0.0910.
So P(H > 175) = P(z > –0.23)
= P(–0.23≤z≤0) + 0.5
= 0.5910.
Example 3
• The mean and standard deviation of
the weight of expectant mothers at a
clinic are 65.5 kg and 6.9 kg respectively.
What percentage of them will have weight
below 70 kg?
Solution:
X~N(65.5, 6.9²). Hence
P(X<70) = P(z<0.65)
= 0.5 + P(0<z<0.65)
= 0.5 + 0.2422 = 0.7422.
So about 74% of them are
less than 70 kg in weight.
Example 4
• The daily traffic volume on PLUS is expected to
follow the normal distribution but is different
for weekdays (Mon-Fri) and weekends. The
mean for a weekday is 365000, with SD 78000,
and for a weekend day, 680000 and 102000
respectively.
(i) What is the probability the volume on a
weekday is between 300000 and 400000?
(ii) On a day when the traffic exceeds 850000, one
lane on the opposite side is used as a contra
lane. How frequently does this happen on a
weekend?
Example 4 - Solution
Solution:
(i) Let X be the volume on a weekday. Then
X~N(365000, 78000²).
P(300000<X<400000)
= P(–0.83<z<0.45)
= 0.2967 + 0.1736
= 0.4703
• Graph showing (i).
Example 4 – Solution (contd)
(ii) Let Y represent the volume on a weekend.
Y~N(680000, 102000²).
So P(Y>850000)
= P(z > 1.67)
= 0.5 – 0.4525
= 0.0475.
So the contra lane is opened on about
4.75% of weekend days.
Example 5 – Negative mean
• The temperature at a town has a mean of
–11.2°C and SD of 5.5°C. What is the
probability
(i) The temperature drops below –15°C?
(ii) The temperature ranges between –5°C and
5°C?
Solution to Ex 5
Solution: T~N(–11.2, 5.5²)
(i) P(T < –15) = P(z < [–15 – (–11.2)]/5.5)
= P(z < –0.69)
= 0.5 – 0.2549 = 0.2451.
(ii) P(–5<T<5) = P([–5 – (–11.2)]/5.5 < z < [5 – (–11.2)]/5.5)
= P(1.13 < z < 2.95)
= 0.4984 – 0.3708 = 0.1276.
Finding x
• For a variable following the normal distribution, it is
also possible to estimate the percentiles based on
the mean and SD.
• The procedure to find a percentile is the same as
calculating a probability (or percentage). The only
difference is that the transform formula is used
in reverse: x = μ + zσ.
• Unlike the table for probabilities (Table 5), the
percentage points table of UTM (Table 6) shows
the value of z for a given upper-tail probability P(Z>z).
• Let’s look at an example.
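A percentage points table can be mimicked with the inverse cdf. The sketch below assumes Python's standard-library NormalDist.inv_cdf; the helper name upper_percentage_point is hypothetical and simply converts an upper-tail probability p into the z with P(Z > z) = p.

```python
from statistics import NormalDist


def upper_percentage_point(p):
    """Return z such that P(Z > z) = p, for 0 < p < 1."""
    # inv_cdf works with lower-tail probabilities, hence 1 - p.
    return NormalDist().inv_cdf(1.0 - p)


for p in (0.20, 0.15, 0.10):
    print(f"{p:.2f} {upper_percentage_point(p):.4f}")
# 0.20 0.8416
# 0.15 1.0364
# 0.10 1.2816
```

For a lower-tail probability such as P(Z < z) = 0.10, symmetry gives the same magnitude with a negative sign.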
Percentage points on normal
graph
Example 6
• The mean monthly salary of a company’s
workers is RM 1500 and its SD is RM 260.
Assume this distributes normally.
(i) Ahmad’s salary is higher than 85% of his
co-workers. What is his salary?
(ii) Mubin’s pay is at the 10th percentile. How
much does he make?
• X~N(1500, 260²).
Percentage point graph
(i) P(X>a) = 0.15
P(Z>[a – 1500]/260) = 0.15.
From Table 6, z = 1.0364. So [a – 1500]/260 = 1.0364.
a = 1500 + 1.0364 × 260 = 1769.
Ahmad’s pay is about RM 1770.
Ex 6 (ii)
• Graph of normal
distribution, P(Z < -z)
(ii) P(X < m) = 0.10.
P(Z < [m – 1500]/260) = 0.1.
From the table, z = 1.2816 for a tail probability of 0.1.
Since m lies to the left of the mean,
[m – 1500]/260 = –1.2816.
m = 1500 – 1.2816 × 260 = 1167.
So Mubin gets about RM 1170.
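Both salaries can be obtained in one step with an inverse cdf, which combines the table lookup and the reverse transform. This is a sketch assuming Python's standard-library NormalDist; the variable names are chosen for illustration.

```python
from statistics import NormalDist

salary = NormalDist(mu=1500, sigma=260)

# Higher than 85% of co-workers means the 85th percentile.
ahmad = salary.inv_cdf(0.85)
# The 10th percentile has 10% of salaries below it.
mubin = salary.inv_cdf(0.10)

print(round(ahmad), round(mubin))  # 1769 1167
```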
Example 7
• A supermarket sells chicken in 4 grades.
20% are grade A chicken. They are the
heaviest. On the other hand, grade D, 12%,
are the lightest. If the mean weight of the
chicken is 2.1 kg and the SD is 0.22 kg, find
the range of weight for grade A and D
chicken. (Assume the weights distribute
normally)
W = weight of chicken. W~N(2.1, 0.22²)
Grade A chicken must be the heaviest. Let a be the
minimum weight of a grade A chicken, then
P(W>a) = 0.2 ⇒ P(z>[a – 2.1]/0.22) = 0.2.
From the table, z0.2 = 0.8416.
So [a – 2.1]/0.22 = 0.8416 ⇒ a = 2.29. So
grade A chicken are 2.29 kg or more in weight.
Negative z
Grade D chicken should be lighter than d kg, where
P(W < d) = 0.12. This means
P(z < [d – 2.1]/0.22) = 0.12.
From Table 6, we have z0.12 = 1.175. Since this is on the
left of the mean, we put
[d – 2.1]/0.22 = –1.175 ⇒ d = 1.84.
So grade D chicken are less than 1.84 kg.
Example 8
• A class teacher decides to send the top 3 of
his 40 students to a competition. The
selection is based on the average scores of
some tests. The mean score of the tests is
65.4, and the SD is 16.5. What is the minimum
score of the students selected? (Assume the
scores follow the normal distribution.)
Example 8: Solution
S~N(65.4, 16.5²). Since only
3 out of 40 are selected,
P(S>m) = 3/40 = 0.075.
This is equivalent to
P(z>[m – 65.4]/16.5) = 0.075.
From Table 6, z0.075 = 1.4395, so
m = 65.4 + 16.5 × 1.4395 = 89.2.
So a student should score
89.2 or more to be
selected.
Example 9
• A forestry officer records the monthly
production of logs in his area. He finds that the
mean monthly yield is 2200 tonnes, with SD
420 tonnes.
(i) What is the probability the production of a
certain month is between 2100 and 2400
tonnes?
(ii) During a wet month, because of transport
problems, the production is among the lowest
15%. How much would the yield be?
Example 9: Solution
Let L represent the monthly yield of logs. Then L~N(2200,
420²).
(i) P(2100<L<2400)
= P([2100 – 2200]/420<z<[2400 – 2200]/420)
= P(–0.24<z<0.48)
= 0.0948 + 0.1844 = 0.2792
(ii) Let k represent the maximum production for that month.
Then P(L<k) = 0.15 ⇒ P(z<[k – 2200]/420) = 0.15.
From Table 6, z0.15 = 1.0364.
So [k – 2200]/420 = –1.0364 ⇒ k = 1765.
This means the production for that month is 1765 tonnes or
less.
Sum of, and Difference between, two variables
• If X1~N(μ1, σ1²) and X2~N(μ2, σ2²) are independent, then
X1 + X2~N(μ1 + μ2, σ1² + σ2²) and
X1 – X2~N(μ1 – μ2, σ1² + σ2²).
Note that the variances add in both cases.
Combining two variables is used only when
the two variables use the same units, such as
weight and weight, height and height and so
on.
Example 10
• The weights of cabbages from two farms are
expected to follow the normal distribution. At
Farm A, the mean is 625 g with SD 86 g; at
Farm B the mean is 708 g and SD 92 g. A
customer selects a cabbage from each of the
farms. What is the probability
(i) The total weighs less than 1.4 kg?
(ii) The cabbage from A is at least 100 g lighter than
that from B?
Example 10 - solution
(i) A~N(625, 86²)
B~N(708, 92²)
So A+B~N(625+708, 86²+92²).
P(A+B<1400) = P(z < [1400 – 1333]/√(86²+92²))
= P(z < 0.53) = 0.5 + 0.2019 = 0.7019.
(ii) A – B~N(625 – 708, 86²+92²).
P(A – B < –100) = P(z < [–100 – (–83)]/√(86²+92²))
= P(z < –0.13) = 0.5 – 0.0517 = 0.4483
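The sum and difference rules can also be tried in code. In Python's standard library, adding or subtracting two NormalDist objects treats them as independent and combines the variances, so the sketch below mirrors the working above; the variable names are illustrative. Because the code does not round z to two decimals, its results differ slightly from the table-based answers.

```python
import math
from statistics import NormalDist

farm_a = NormalDist(mu=625, sigma=86)
farm_b = NormalDist(mu=708, sigma=92)

# For independent variables the variances add, for both sum and difference.
total = farm_a + farm_b        # mean 1333, SD sqrt(86^2 + 92^2)
difference = farm_a - farm_b   # mean -83, same SD as the sum

combined_sd = math.sqrt(86 ** 2 + 92 ** 2)

p_total = total.cdf(1400)      # (i)  P(A + B < 1400)
p_diff = difference.cdf(-100)  # (ii) P(A - B < -100)
print(f"{p_total:.3f} {p_diff:.3f}")
```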
Example 11
• A company maintains two sets of records
for different types of client. For type X, the
mean size of file for records is 5.8 MB, with
SD 0.82 MB. For type Y, the mean size of
file for records is 6.2 MB, with SD 0.76
MB. A customer from X and another from Y
get married, and their files are to be combined.
What is the probability the new file will be
between 10 and 13 MB?
Ex 11 Solution
X~N(5.8, 0.82²),
Y~N(6.2, 0.76²)
X+Y~N(5.8+6.2, 0.82²+0.76²)
P(10<X+Y<13)
= P([10 – 12]/√(0.82²+0.76²) < z < [13 – 12]/√(0.82²+0.76²))
= P(–1.79 < z < 0.89)
= 0.4633 + 0.3133 = 0.7766.
Multiples of a variable
• If X~N(μ, σ²), and we select 2 independent elements of
X, then, representing the sum of the two
items as Y, we have
Y~N(μ + μ, σ² + σ²) or N(2μ, 2σ²).
Generalising, if we sum up n items from X,
calling it S, we should have
S~N(nμ, nσ²).
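A point that is easy to miss is that the sum of n separate items is not the same as n times a single item: the sum has variance nσ², whereas multiplying one item by n gives variance n²σ². The sketch below contrasts the two; the helper name sum_of_n and the values (mean 10, SD 3, n = 4) are hypothetical.

```python
import math
from statistics import NormalDist


def sum_of_n(mu, sigma, n):
    """Distribution of the sum of n independent items from N(mu, sigma^2)."""
    return NormalDist(n * mu, sigma * math.sqrt(n))


single = NormalDist(mu=10, sigma=3)  # hypothetical single item

total_of_4 = sum_of_n(10, 3, 4)  # four separate items added together
scaled_by_4 = single * 4         # one item multiplied by 4

print(total_of_4.mean, total_of_4.stdev)    # 40.0 6.0
print(scaled_by_4.mean, scaled_by_4.stdev)  # 40.0 12.0
```

Both have the same mean, but the sum of separate items is less spread out because the individual deviations partly cancel.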
Ex 12
On average, a student in UAB spends RM
12.50 per day on food, with SD RM 2.20.
What is the probability
(i) A group of 5 UAB students spend a total of
RM 70.00 or more today?
(ii) A group of 15 UAB students spend a total of
less than RM 200.00 today?
S~N(12.50, 2.20²).
S5 = food expenditure for 5 students;
S15 = food expenditure for 15 students.
(i) S5 ~ N(12.50 × 5, 2.20² × 5)
P(S5 ≥ 70) = P(z ≥ [70 – 62.5]/√(2.2² × 5))
= P(z ≥ 1.52) = 0.5 – 0.4357 = 0.0643.
(ii) S15 ~ N(12.50 × 15, 2.20² × 15)
P(S15 < 200) = P(z < [200 – 187.5]/√(2.2² × 15))
= P(z < 1.47) = 0.5 + 0.4292 = 0.9292.
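The group totals can be recomputed directly from S~N(nμ, nσ²). This is a sketch under the assumptions that daily spending is normally distributed and that students spend independently; the helper name group_total is made up for the illustration. The unrounded z values give roughly 0.064 for (i) and 0.929 for (ii).

```python
import math
from statistics import NormalDist

MU, SIGMA = 12.50, 2.20  # daily food spending per student, in RM


def group_total(n):
    """Total spending of n students, assuming independent normal spending."""
    return NormalDist(n * MU, SIGMA * math.sqrt(n))


p_five = 1.0 - group_total(5).cdf(70.0)  # (i)  P(S5 >= 70)
p_fifteen = group_total(15).cdf(200.0)   # (ii) P(S15 < 200)
print(f"{p_five:.3f} {p_fifteen:.3f}")
```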