Engineering Statistics
SSE 2193
Engineering Statistics
Chapter 2
Special Variables
2A Binomial Distribution
Events from experiments
• While it is educational to learn about
general discrete and continuous random
variables, such variables are not much use
in describing real life events.
• The result of experiments usually can be
expected to mimic some distributions with
special properties.
Bernoulli trials
• In experiments where a certain process is carried
out repeatedly, with each run independent of
the others, it is possible to work out the probability
of an event.
• In particular, if we can separate the outcomes of an
experiment into two groups, one which is
desirable, and we call it success, while the other a
failure, then we have the case of a Bernoulli trial.
Success and failure
• Take the weather. If we wish for a sunny day, then
we call a sunny day a success, and a rainy day a
failure.
• If a student counts a grade of B+ or better as a
success, then anything less will be a failure.
• If you are expected to reach an office before 10.00
a.m. then arriving before then is a success, and
being late will be a failure.
Binomial Distribution
• If we repeat a Bernoulli trial many times, and
count the number of successes using X, then we
shall have a binomial distribution if
1. Each trial is independent of the others;
2. The probability of success remains the
same for each trial.
• If the probability of success is p, and we
run the experiment n times, then, using
X to represent the number of successes, we
write X ~ Bin(n, p).
The values of X
• When a trial is done n times, the number of successes
must be one of 0 (all failures), 1, 2, …, n (all successes).
Hence, unlike a general discrete random variable (DRV), a
binomial variable X~Bin(n, p) takes the values 0 to n only.
For 0 < p < 1, every r in the range 0 … n has a non-zero
probability. In short:
1. $X \in \{0, 1, \ldots, n\}$
2. $P(X=r) > 0$ for all $r \in \{0, 1, \ldots, n\}$.
Of course, it is possible that P(X=r) is so small that in
practice, we will just put it as 0. But technically, there is no
value of X with zero probability.
P(X=r)
• If X~Bin(n, p), then
1. P(X=0) designates the probability of no success at
all. This is equal to $(1-p)^{n}$.
2. P(X=n) means the probability of getting all n
successes. This is equal to $p^{n}$.
3. For any other r, $P(X=r) = {}^{n}C_{r}\,p^{r}(1-p)^{n-r}$.
NOTE: For brevity, we usually use the symbol q to
represent 1-p. Hence the formula is frequently
written as $P(X=r) = {}^{n}C_{r}\,p^{r}q^{n-r}$.
nCr
• ${}^{n}C_{r}$ designates the number of ways to choose r
items out of n items. Its value is $n(n-1)(n-2)\cdots(n-r+1)/r!$.
• $r! = 1 \times 2 \times 3 \times \cdots \times r$
• For example, ${}^{8}C_{3} = (8 \times 7 \times 6)/(1 \times 2 \times 3) = 56$. This
means, given 8 items, you have 56 different ways
of selecting 3 items out of the lot.
• Note that ${}^{n}C_{n} = {}^{n}C_{0} = 1$.
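The product formula for nCr can be checked with a short Python sketch. The function name n_choose_r is our own; math.comb is the standard-library routine for the same quantity and is used here only as a cross-check.

```python
from math import comb, factorial

def n_choose_r(n: int, r: int) -> int:
    """Number of ways to choose r items from n: n(n-1)...(n-r+1) / r!."""
    numerator = 1
    for k in range(n, n - r, -1):   # k runs over n, n-1, ..., n-r+1
        numerator *= k
    return numerator // factorial(r)

print(n_choose_r(8, 3))                 # 56
print(n_choose_r(8, 3) == comb(8, 3))   # True
```

When r = 0 the loop does not run, so the function returns 1, in line with the note that nC0 = 1.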
Bin(n, p) – Example 1
1: If a Bernoulli trial has a success rate of 0.2, and it
is carried out 7 times, what is the probability we
get
(i) 4 successes?
(ii) No success?
Solution: Let X represent the number of successes.
Then X ~ Bin(7, 0.2).
(i) So $P(X=4) = {}^{7}C_{4}\,(0.2)^{4}(0.8)^{3} = 0.0287$.
(ii) $P(X=0) = (0.8)^{7} = 0.2097$.
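As a check on Example 1, here is a minimal Python sketch of the formula P(X=r) = nCr p^r q^(n-r). The helper name binom_pmf is our own choice, not part of the course material.

```python
from math import comb

def binom_pmf(r: int, n: int, p: float) -> float:
    """P(X = r) for X ~ Bin(n, p)."""
    q = 1.0 - p
    return comb(n, r) * p**r * q**(n - r)

n, p = 7, 0.2
p_four = binom_pmf(4, n, p)   # part (i): exactly 4 successes
p_none = binom_pmf(0, n, p)   # part (ii): no success, i.e. q**7
print(round(p_four, 4), round(p_none, 4))   # 0.0287 0.2097
```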
Bin(n, p) – Example 2
2: It is known that in 35% of accidents involving motorcycles,
the rider dies. On a day when 10 such accidents are
reported, what is the probability
(i) 2 riders die?
(ii) At least 3 riders die?
Solution: Let D represent the number of deaths. Then D ~
Bin(10, 0.35).
(i) So $P(D=2) = {}^{10}C_{2}\,(0.35)^{2}(0.65)^{8} = 0.17565$.
(ii) In this case, we want $P(D \ge 3)$. This means we need to add
up P(D=3), P(D=4), …, P(D=10). However, we note that
the sum of all probabilities is 1. So, we may also obtain
$P(D \ge 3)$ as $1 - [P(D=0) + P(D=1) + P(D=2)]$ = 1 -
[0.01346+0.07249+0.17565] = 0.7384 (correct to 4
decimal places)
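The two routes to P(D ≥ 3) in Example 2 (adding eight terms, or subtracting three terms from 1) can be compared in a few lines of Python. This is only a sketch; binom_pmf and binom_cdf are names we introduce here.

```python
from math import comb

def binom_pmf(r, n, p):
    """P(X = r) for X ~ Bin(n, p)."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

def binom_cdf(k, n, p):
    """P(X <= k): the sum of P(X = r) for r = 0, ..., k."""
    return sum(binom_pmf(r, n, p) for r in range(k + 1))

n, p = 10, 0.35
direct = sum(binom_pmf(r, n, p) for r in range(3, n + 1))   # P(D=3)+...+P(D=10)
via_complement = 1 - binom_cdf(2, n, p)                      # 1 - P(D<=2)
print(round(direct, 4), round(via_complement, 4))            # 0.7384 0.7384
```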
Tables of binomial distributions
• In practice, it is not desirable to carry out
calculations such as P(D=3), P(D=4) … P(D=10)
as for the last example. Apart from time, this
tedious work may lead to errors.
• Particularly when calculators are not available,
statisticians find it more convenient to use ready-made tables of binomial distributions.
• The UTM table provides probabilities cumulative
from below, i.e. the table shows $P(X \le k)$ for each
k, not P(X=k).
UTM Table for Binomial
Distribution
• The binomial distribution table shows cumulative
probabilities [$P(X \le k)$] for k=0, …, n. The table lists
the probabilities separately for n=1, 2, …, 20, 23, 25,
27, and 30.
• The table shows p=0.01, …, 0.09 (first part) and 0.10,
0.15, …, 0.50 (second part).
• For values of p beyond 0.5, we have to use
complementary procedures. For values between
the given p, and n=21, 22, 26, 28 and 29, we may
use linear interpolation for approximate answers.
Use and Limitations of Table
• Going back to Example 2, we could have referred to the
table, and read off $P(D \le 2)$, which is 0.2616, and hence
obtain $P(D \ge 3)$ as 1 - 0.2616 = 0.7384.
• However, space constraints mean that only $n \le 30$ can be
accommodated in a small handbook, and we only have
values of p=0.01, …, 0.09, 0.1, 0.15, …, 0.5. We have to deal
with cases of other p values ourselves.
• In the following examples, you will learn to deal with each
of the cases when the table can be used, and how to
overcome the limitations when they occur.
Example 3
• 25 trainees undergo a perseverance test. Based on
records, it is known that 40% of them will drop out
before completion. What is the probability that
(i) Up to 6 of them will drop out?
(ii) 4 to 8 of them will drop out?
Solution: Let X represent the number of trainees who drop out.
Then X~Bin(25, 0.4)
(i) $P(X \le 6) = 0.0736$ [Read from table]
(ii) $P(4 \le X \le 8) = P(X \le 8) - P(X \le 3)$
= 0.2735 - 0.0024 = 0.2711.
Note that in (ii), we have to express the value as the
difference between two cumulative probabilities.
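The table lookups in Example 3 can be imitated in Python by building the cumulative probability ourselves. The sketch below assumes nothing beyond the binomial formula; binom_between is a hypothetical helper name for the "difference of two cumulative values" step.

```python
from math import comb

def binom_pmf(r, n, p):
    """P(X = r) for X ~ Bin(n, p)."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

def binom_cdf(k, n, p):
    """P(X <= k), as listed in a cumulative table (0 when k < 0)."""
    return sum(binom_pmf(r, n, p) for r in range(k + 1))

def binom_between(a, b, n, p):
    """P(a <= X <= b) = P(X <= b) - P(X <= a - 1)."""
    return binom_cdf(b, n, p) - binom_cdf(a - 1, n, p)

n, p = 25, 0.4
up_to_6 = binom_cdf(6, n, p)            # part (i)
four_to_8 = binom_between(4, 8, n, p)   # part (ii)
print(round(up_to_6, 4), round(four_to_8, 4))
```

Because the code carries full precision while the table is rounded to 4 decimal places, the last digit of a difference may occasionally differ by one unit from the table-based answer.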
Example 4
• The head of department calls a policy meeting among
his 23 clerical staff. From experience, he knows that
15% of them will be absent. What is the probability
(i) 3 to 5 of them will be absent?
(ii) At least 3 of them will be absent?
Solution: A = number absent ~ Bin(23, 0.15)
(i) We need $P(3 \le A \le 5)$. In this case, we can read $P(A \le 5)$ =
0.8811, and $P(A \le 2)$ = 0.3080, so $P(3 \le A \le 5) = P(A \le 5) - P(A \le 2)$ = 0.5731.
(contd)
Example 4 (contd)
(i) (contd) However, as $P(3 \le A \le 5)$ = P(A=3) + P(A=4) +
P(A=5), it is just as easy to calculate the three values
using the formula. Now
$P(A=3) = {}^{23}C_{3}\,(0.15)^{3}(0.85)^{20} = 0.23167$;
$P(A=4) = {}^{23}C_{4}\,(0.15)^{4}(0.85)^{19} = 0.20442$;
$P(A=5) = {}^{23}C_{5}\,(0.15)^{5}(0.85)^{18} = 0.13708$;
So $P(3 \le A \le 5)$ = 0.23167+0.20442+0.13708 = 0.5732 (4 d.p.),
which agrees with the table value 0.5731 up to rounding.
(ii) The event $A \ge 3$ needs to be interpreted as the
complement of the event $A \le 2$. Hence we decide that
$P(A \ge 3) = 1 - P(A \le 2)$ = 1 - 0.3080 = 0.6920.
Example 5
• The probability a new menu from a fast-food restaurant
MOM is successful in sale is 0.8. MOM proposes to
bring out 18 new menus this year. What is the
probability that
(i) All will be successful?
(ii) At least 12 will be successful?
(iii) 10 to 15 menus will be successful?
Solution :
Let S represent the number of successful menus. Then
S~Bin(18, 0.8)
(i) $P(S=18) = (0.8)^{18} = 0.0180$
Example 5 (contd)
(ii) The value we want is $P(S \ge 12)$. Unfortunately,
the table does not provide for p=0.8. Let us
interpret this another way. Let S’ represent the
number of unsuccessful menus. Then
S’~Bin(18, 0.2).
[Refer to the next slide]
Now $S \ge 12$ corresponds to $S' \le 6$. From the table, we
see that $P(S' \le 6) = 0.9487$. Hence we decide that
$P(S \ge 12) = P(S' \le 6) = 0.9487$.
Table for S and S’
S    0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18
S’  18 17 16 15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
(ii) $S \ge 12$ (S = 12, …, 18) corresponds to $S' \le 6$ (S’ = 6, …, 0).
(iii) $10 \le S \le 15$ corresponds to $3 \le S' \le 8$.
Example 5 (contd)
(iii) Again we need S’ for this. In this case, $10 \le S \le 15$ is
identical with $3 \le S' \le 8$. [See the previous slide.]
So $P(3 \le S' \le 8) = P(S' \le 8) - P(S' \le 2)$
= 0.9957 - 0.2713 = 0.7244.
WARNING: Determining events in S using the
complement S’ is a tricky process. You need to
exercise care to make sure the events do match.
Otherwise you will be getting wrong results. The
tables shown are very useful in identifying the
equivalent complement events. Use them.
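The matching of events between S and S’ follows one rule: since S’ = n - S, the event a ≤ S ≤ b is the same as n - b ≤ S’ ≤ n - a. The Python sketch below applies this rule to Example 5 and confirms it against a direct calculation with p = 0.8. The helper names (complement_range, binom_between) are ours.

```python
from math import comb

def binom_pmf(r, n, p):
    """P(X = r) for X ~ Bin(n, p)."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

def binom_between(a, b, n, p):
    """P(a <= X <= b) for X ~ Bin(n, p)."""
    return sum(binom_pmf(r, n, p) for r in range(a, b + 1))

def complement_range(a, b, n):
    """Translate a <= S <= b into the matching range for S' = n - S."""
    return n - b, n - a

n, p = 18, 0.8
lo, hi = complement_range(12, 18, n)                 # S >= 12  ->  0 <= S' <= 6
at_least_12_direct = binom_between(12, 18, n, p)
at_least_12_via_failures = binom_between(lo, hi, n, 1 - p)

lo3, hi3 = complement_range(10, 15, n)               # 10 <= S <= 15  ->  3 <= S' <= 8
ten_to_15 = binom_between(lo3, hi3, n, 1 - p)
print((lo, hi), (lo3, hi3))
print(round(at_least_12_via_failures, 4), round(ten_to_15, 4))
```

Note that the ends swap over: the largest value of S becomes the smallest value of S’. Forgetting this swap is the usual source of mismatched events.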
Example 6
• The probability a flight of airline A is full is
0.75. There are 19 flights of A from an airport.
What is the probability
(i) More than 15 will be full?
(ii) 10 to 15 will be full?
Solution : Let F represent the number of full flights.
Then F~Bin(19, 0.75). As p is more than 0.5, we
shall introduce F’ which represents the number
of flights which are not full. Hence F’~Bin(19,
0.25).
Table for F and F’
F    0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19
F’  19 18 17 16 15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
(i) $F > 15$ (F = 16, …, 19) corresponds to $F' \le 3$ (F’ = 3, …, 0).
(ii) $10 \le F \le 15$ corresponds to $4 \le F' \le 9$.
Example 6 (contd)
(i) More than 15 flights full is equivalent to 3
flights or fewer not full. Hence we conclude that
$P(F > 15) = P(F' \le 3) = 0.2631$.
(ii) Similarly, we find that $10 \le F \le 15$ is the same as
$4 \le F' \le 9$. This means that
$P(10 \le F \le 15)$
$= P(4 \le F' \le 9)$
$= P(F' \le 9) - P(F' \le 3)$
= 0.9911 - 0.2631 = 0.7280.
Properties of binomial
distribution
• For X~Bin(n, p), the mean
$\mu = np$, and the variance
$\sigma^{2} = npq$.
• When p is small, most of
the probability lies near
X=0. The opposite is true
when p is close to 1: most
of it lies near X=n.
• When p is close to 0.5
(say from 0.3 to 0.7), the
distribution will be quite
symmetric.
Bar charts for p=0.4 and p=0.9
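The mean np and variance npq can be confirmed numerically from the probabilities themselves. In the sketch below, n = 20 is our own illustrative choice (the slides do not state the n used in the bar charts), and the function names are hypothetical.

```python
from math import comb

def binom_pmf(r, n, p):
    """P(X = r) for X ~ Bin(n, p)."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

def mean_and_variance(n, p):
    """Mean and variance computed directly from the definition."""
    probs = [binom_pmf(r, n, p) for r in range(n + 1)]
    mean = sum(r * pr for r, pr in enumerate(probs))
    var = sum((r - mean) ** 2 * pr for r, pr in enumerate(probs))
    return mean, var

def most_likely_value(n, p):
    """The value r with the largest P(X = r)."""
    return max(range(n + 1), key=lambda r: binom_pmf(r, n, p))

n = 20  # assumed for illustration only
for p in (0.4, 0.9):
    mean, var = mean_and_variance(n, p)
    print(p, round(mean, 4), round(var, 4), most_likely_value(n, p))
```

For p = 0.4 the most likely value sits near the middle of the range, while for p = 0.9 it is pushed towards n, which is the behaviour the bar charts illustrate.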
Example 7
• The mean of a binomial distribution X is 8, and the
variance is 4.8. Find the probability of $3 \le X \le 8$.
Solution : Let X~Bin(n, p), then the mean = np = 8, and the
variance = npq = 4.8. Hence q = 4.8/8 = 0.6, so p = 0.4 and
n = 8/0.4 = 20. And so X~Bin(20, 0.4)
$P(3 \le X \le 8) = P(X \le 8) - P(X \le 2) = 0.5956 - 0.0036 = 0.5920$.
NOTE: This example is an exercise to test your skills in using
the knowledge about the mean and variance of a binomial
distribution. It has no other practical applications. As such,
we shall not be dealing with this type of problems
anymore.
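The steps of Example 7 (divide the variance by the mean to get q, then recover p and n) can be sketched in Python as follows. The function name solve_binomial is hypothetical, and rounding n to the nearest integer is our own safeguard against floating-point error.

```python
from math import comb

def solve_binomial(mean, variance):
    """Recover (n, p) from mean = np and variance = npq."""
    q = variance / mean      # npq / np = q
    p = 1 - q
    n = round(mean / p)      # n must be a whole number of trials
    return n, p

def binom_between(a, b, n, p):
    """P(a <= X <= b) for X ~ Bin(n, p)."""
    return sum(comb(n, r) * p**r * (1 - p)**(n - r) for r in range(a, b + 1))

n, p = solve_binomial(8, 4.8)
answer = binom_between(3, 8, n, p)
print(n, round(p, 4), round(answer, 4))
```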
Example 8
• The probability a man suffers from colour-blindness is 0.2; for a woman, it is 0.05. 4 men
and 7 women are checked for colour-blindness.
What is the probability
(i) Exactly 3 people are colour-blind?
(ii) More women than men are colour-blind?
Solution : Let M and W represent the numbers of
men and women who are colour-blind. Then
M~Bin(4, 0.2) and W~Bin(7, 0.05). We first
construct a table for each of the variables.
Distributions of M and W
Probability Distribution of M
r        0       1       2       3       4
P(M=r)   0.4096  0.4096  0.1536  0.0256  0.0016
Probability Distribution of W
r        0       1       2       3       4       5
P(W=r)   0.6983  0.2572  0.0406  0.0036  0.0002  0.0000
Note that P(W=5) = 0.0000059, NOT 0. However, as we keep
our values correct to only 4 decimal places, we show it as 0.
The values for r=6 and 7 are even smaller, and are not
shown.
Answer to Ex 8(i)
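(i) We need the value of P(M+W=3).
There is no short cut here. As M and W are
independent, we multiply the probabilities and work
out as below:
$P(M+W=3) = P(M=0 \cap W=3) + P(M=1 \cap W=2)$
$+ P(M=2 \cap W=1) + P(M=3 \cap W=0)$
$= 0.4096 \times 0.0036 + 0.4096 \times 0.0406 +$
$0.1536 \times 0.2572 + 0.0256 \times 0.6983$
= 0.0755 (4 d.p.)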
Answer to Ex 8(ii)
(ii) We need the value of P(W>M).
This also has to be analysed individually. We reason it this
way: Either
W=1 and M=0: $0.2572 \times 0.4096$, or
W=2 and M=0 or 1: $0.0406 \times 0.8192$, or
W=3 and M=0, 1 or 2: $0.0036 \times 0.9728$, or
W=4 and M=0, 1, 2 or 3: $0.0002 \times 0.9984$.
As the probabilities for W=5 or more are very small, we need
not consider them.
Thus $P(W>M) = 0.2572 \times 0.4096 + 0.0406 \times 0.8192 +
0.0036 \times 0.9728 + 0.0002 \times 0.9984 = 0.1423$.
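The case-by-case working of Example 8 amounts to summing products P(M=m)P(W=w) over the pairs (m, w) that belong to the event, which relies on M and W being independent. The following Python sketch does this with unrounded probabilities; all names are our own.

```python
from math import comb

def binom_dist(n, p):
    """List of P(X = r) for r = 0, ..., n, where X ~ Bin(n, p)."""
    return [comb(n, r) * p**r * (1 - p)**(n - r) for r in range(n + 1)]

pm = binom_dist(4, 0.2)    # colour-blind men
pw = binom_dist(7, 0.05)   # colour-blind women

# (i) exactly 3 people in total: pairs with m + w = 3
total_is_3 = sum(pm[m] * pw[3 - m] for m in range(4))

# (ii) more women than men: pairs with w > m, including W = 5, 6, 7
more_women = sum(pm[m] * pw[w]
                 for m in range(len(pm))
                 for w in range(len(pw)) if w > m)

print(round(total_is_3, 4), round(more_women, 4))
```

Since the code keeps full precision and also includes the tiny terms for W = 5 or more, it shows that dropping those terms does not change the answer at 4 decimal places.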
A note on combining binomial
distributions
• As can be seen from Ex 8, combining variables of
binomial distributions is difficult.
• In the exceptional event that X~Bin($n_1$, p) and Y~Bin($n_2$,
p) are independent, we can conclude that X+Y~Bin($n_1+n_2$, p).
• However, even when the p’s are the same, we cannot deal
with comparisons of the type X>Y (as in 8(ii)).
• Except in this case (equal p’s), there is no short-cut in
combining the two variables. The event has to be broken
down into separate cases and calculated step-by-step. This
is of course practical only when $n_1$ and $n_2$ are small. For
larger values of n, we need computer programs to perform
the calculations, or use approximations. (See 2D)
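The equal-p rule can be checked numerically: the distribution of X+Y, built case by case from the two separate distributions, should match Bin(n1+n2, p) term by term. The sketch below uses n1 = 4, n2 = 7 and p = 0.2, which are illustrative values of our own, and assumes X and Y are independent.

```python
from math import comb

def binom_dist(n, p):
    """List of P(X = r) for r = 0, ..., n, where X ~ Bin(n, p)."""
    return [comb(n, r) * p**r * (1 - p)**(n - r) for r in range(n + 1)]

def distribution_of_sum(px, py):
    """P(X + Y = s) for independent X and Y, built case by case."""
    out = [0.0] * (len(px) + len(py) - 1)
    for i, a in enumerate(px):
        for j, b in enumerate(py):
            out[i + j] += a * b
    return out

n1, n2, p = 4, 7, 0.2   # illustrative values
combined = distribution_of_sum(binom_dist(n1, p), binom_dist(n2, p))
expected = binom_dist(n1 + n2, p)
max_gap = max(abs(a - b) for a, b in zip(combined, expected))
print(max_gap < 1e-12)   # True
```

Running the same comparison with two different p values gives a visible gap, which is why the short-cut is limited to equal p’s.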
Interpolation
• It is impossible to provide a table which will include all
values of n and p for binomial distributions. In particular, it
is not meaningful to include tables for binomial
distributions for n>30.
• For p, the UTM tables give p in increments of 0.01 from 0.01
to 0.10, then in increments of 0.05 up to 0.5. This means we
have no table for p = 0.11, 0.12, 0.13 and 0.14, and
similarly for the other values between 0.15 and 0.2 and so
on. The table will become exceptionally thick if we were to
include these values.
• In problems where such p values appear, we have two
ways out. One is to do our own calculations, and the other
is to use interpolation.
Interpolation Procedure
• If $f(x_1) = a$ and $f(x_2) = b$, then for a value x between $x_1$ and
$x_2$, we can obtain a linear interpolation of f(x) by the
formula
$f(x) = f(x_1) + [f(x_2)-f(x_1)]\,\dfrac{x-x_1}{x_2-x_1}$.
For example, if f(5) = 18.7 and f(8) = 22.9, then $f(6.2) = f(5) +
[f(8)-f(5)] \times (6.2-5)/(8-5) = 18.7 + (22.9-18.7) \times 1.2/3
= 20.38$.
As linear interpolation uses the linear rule to replace the actual
relation of f, it can only provide an approximate value. This
further assumes that the function is not discontinuous or
wildly uneven. This method is only used to save time and
when accuracy is not too important. Otherwise, the only
way out is to carry out calculations yourself.
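The interpolation rule is short enough to write as a single function. This Python sketch reproduces the worked example with f(5) = 18.7 and f(8) = 22.9; the function name interpolate is our own.

```python
def interpolate(x, x1, f1, x2, f2):
    """Linear interpolation: f(x1) + [f(x2) - f(x1)] * (x - x1) / (x2 - x1)."""
    return f1 + (f2 - f1) * (x - x1) / (x2 - x1)

estimate = interpolate(6.2, 5, 18.7, 8, 22.9)
print(round(estimate, 2))   # 20.38
```

At x = x1 the function returns f(x1) and at x = x2 it returns f(x2), so the estimate always lies on the straight line joining the two known points.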
Example 9
• 12% of dengue patients hemorrhage. In a ward, there are
16 dengue patients. What is the probability that
(i) Up to 3 will hemorrhage?
(ii) If, by a certain time, 2 patients have already hemorrhaged,
what is the probability at least 2 more will hemorrhage?
Solution: H = number who hemorrhage ~Bin(16, 0.12).
Unfortunately, the tables we have do not include an entry
for n=16 and p = 0.12. We have to either resort to direct
calculations for each value or use the method of
interpolation.
Interpolation for Ex 9
We shall use the tables of $H_1$~Bin(16, 0.1) and $H_2$~Bin(16,
0.15). Note that $P(H_1 \le 3) = 0.9316$ and $P(H_2 \le 3) =
0.7899$. By the rule of interpolation,
(i) $P(H \le 3) = 0.9316 + (0.7899 - 0.9316) \times 2/5 = 0.8749$.
Note that exact calculations yield $P(H \le 3) = 0.8838$ (4 d.p.)
(ii) This asks for the conditional probability $P(H \ge 4 \mid H \ge 2)
= P(H \ge 4 \cap H \ge 2)/P(H \ge 2)$. Since $\{H \ge 4\} \cap \{H \ge 2\} = \{H \ge 4\}$,
the answer we need is $P(H \ge 4)/P(H \ge 2)$.
Now $P(H \ge 2) = 1 - [P(H=0)+P(H=1)]$ = 1 - [0.12934 +
0.28219] = 0.5885, and $P(H \ge 4) = 1 - P(H \le 3)$ = 1 - 0.8838 = 0.1162,
using the exact value from (i). So $P(H \ge 4)/P(H \ge 2)$ = 0.1162/0.5885 = 0.1975.
[In this case, we carry out exact calculations because
only a few numbers are involved]
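To see how far the interpolated value is from the exact one in Example 9, both can be computed side by side. In this Python sketch the two "table" values at p = 0.10 and p = 0.15 are calculated rather than read from the UTM table, so they carry more decimal places than the printed table; the function names are ours.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Bin(n, p)."""
    return sum(comb(n, r) * p**r * (1 - p)**(n - r) for r in range(k + 1))

def interpolate(x, x1, f1, x2, f2):
    """Linear interpolation between (x1, f1) and (x2, f2)."""
    return f1 + (f2 - f1) * (x - x1) / (x2 - x1)

n = 16
# (i) interpolated versus exact P(H <= 3) at p = 0.12
approx = interpolate(0.12, 0.10, binom_cdf(3, n, 0.10), 0.15, binom_cdf(3, n, 0.15))
exact = binom_cdf(3, n, 0.12)

# (ii) conditional probability P(H >= 4 | H >= 2) = P(H >= 4) / P(H >= 2)
conditional = (1 - binom_cdf(3, n, 0.12)) / (1 - binom_cdf(1, n, 0.12))

print(round(approx, 4), round(exact, 4), round(conditional, 4))
```

The gap between the interpolated and exact values is about 0.009, which gives a feel for the accuracy one can expect from interpolating across a step of 0.05 in p.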
Complex problems
• In some cases, the probability of an event is
built on another event. The next few
examples are complex problems of such
nature.
• In addition to this, we shall also meet
problems which combine different
distributions, including Poisson, normal and
others. We shall come to such problems
when they arise.
Example 10
• A school has 18 classes, each with 20 students.
During a flu season, 15% of students are absent.
(i) To avoid the epidemic from spreading, a class
will be closed if 6 or more students are absent.
What is the probability a class will close?
(ii) For administration purposes, the school is forced
to close temporarily when three classes or more
are closed. What is the probability the school will
close?
Example 10 - Solution
(i) Let A represent the number of absentees in a class, then
A~Bin(20, 0.15). The probability a class will close is
$P(A \ge 6) = 1 - P(A \le 5) = 1 - 0.9327 = 0.0673$.
(ii) Let the number of classes closed be represented by C.
Then C~Bin(18, 0.0673).
The probability the school will close is
$P(C \ge 3) = 1 - P(C \le 2) = 1 - [(0.9327)^{18} +
18(0.0673)(0.9327)^{17} + {}^{18}C_{2}\,(0.0673)^{2}(0.9327)^{16}]$
= 0.1168.
Note: If we wish to use the statistics tables, use Bin(18, 0.06)
and Bin(18, 0.07) and carry out interpolation.
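Example 10 feeds the answer of one binomial calculation in as the p of a second one. A minimal Python sketch of this two-stage calculation is given below; the variable names are our own, and the class-closure probability is carried at full precision rather than rounded to 0.0673.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Bin(n, p)."""
    return sum(comb(n, r) * p**r * (1 - p)**(n - r) for r in range(k + 1))

# Stage 1: a class of 20 closes when 6 or more students are absent
p_class_closes = 1 - binom_cdf(5, 20, 0.15)

# Stage 2: the school (18 classes) closes when 3 or more classes close
p_school_closes = 1 - binom_cdf(2, 18, p_class_closes)

print(round(p_class_closes, 4), round(p_school_closes, 4))
```

The second stage treats the 18 classes as independent trials with the same closure probability, which is the assumption behind writing C~Bin(18, 0.0673).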
Example 11
• The probability Chai succeeds in netting a ball
is 0.8.
(i) Chai is given 6 chances to net a ball, and she is
to be rewarded with an ice-cream cone if she
nets at least 4. What is the probability she will
get the reward in a trial?
(ii) 5 classmates each challenge Chai to win by netting 4
balls or more in 6 tries. Assuming that she
remains consistent throughout, what is
the probability she can get 3 ice-cream cones or
more?
Example 11 (Solution)
(i) B = number of balls netted. B~Bin(6, 0.8).
$P(B \ge 4) = P(B=4)+P(B=5)+P(B=6)$
$= {}^{6}C_{4}\,(0.8)^{4}(0.2)^{2} + {}^{6}C_{5}\,(0.8)^{5}(0.2) + (0.8)^{6}$
= 0.90112
(ii) I = number of ice-cream cones rewarded.
I~Bin(5, 0.90112)
$P(I \ge 3) = {}^{5}C_{3}\,(0.90112)^{3}(0.09888)^{2} +
{}^{5}C_{4}\,(0.90112)^{4}(0.09888) + (0.90112)^{5}$ =
0.0715426+0.3259935+0.5941733 = 0.9917.
Example 12
• A lecturer runs a quiz with 4 questions. The
questions are a bit confusing and the
probability a student can guess at the correct
answer for each question is 0.75. A student
passes if he/she gets 3 correct answers.
What is the probability, out of 40 students,
at least 30 will pass?