Transcript Document
Known Probability Distributions
• Engineers frequently work with data that can be
modeled as one of several known probability
distributions.
• Being able to model the data allows us to:
model real systems
design
predict results
• Key discrete probability distributions include:
binomial / multinomial
negative binomial
hypergeometric
Poisson
Discrete Uniform Distribution
• Simplest of all discrete distributions
All possible values of the random variable have the
same probability, i.e.,
f(x; k) = 1/k,  x = x1, x2, x3, …, xk
• Expectations of the discrete uniform distribution
μ = (1/k) Σ_{i=1}^{k} xᵢ   and   σ² = (1/k) Σ_{i=1}^{k} (xᵢ − μ)²
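As a quick numerical illustration, the same two expectations can be computed directly; a minimal sketch using a fair die (the values are hypothetical, not from the slides):

```python
# Sketch: mean and variance of a discrete uniform r.v., using a fair die
# (values are illustrative, not from the slides).
values = [1, 2, 3, 4, 5, 6]                     # x1, ..., xk
k = len(values)

mu = sum(values) / k                            # mu = (1/k) * sum(x_i)
var = sum((x - mu) ** 2 for x in values) / k    # sigma^2 = (1/k) * sum((x_i - mu)^2)
print(mu, var)
```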
Binomial & Multinomial Distributions
• Bernoulli Trials
Inspect tires coming off the production line. Classify each as
defective or not defective. Define “success” as defective. If historical
data shows that 95% of all tires are defect-free, then P(“success”) =
0.05.
Signals picked up at a communications site are either incoming
speech signals or “noise.” Define “success” as the presence of
speech. P(“success”) = P(“speech”)
Administer a test drug to a group of patients with a specific
condition. P(“success”) = ___________
• Bernoulli Process
n repeated trials
the outcome may be classified as “success” or “failure”
the probability of success (p) is constant from trial to trial
repeated trials are independent.
Binomial Distribution
• Example:
Historical data indicates that 10% of all bits
transmitted through a digital transmission channel are
received in error. Let X = the number of bits in error in
the next 4 bits transmitted. Assume that the
transmission trials are independent. What is the
probability that
Exactly 2 of the bits are in error?
At most 2 of the 4 bits are in error?
more than 2 of the 4 bits are in error?
• The number of successes, X, in n Bernoulli trials
is called a binomial random variable.
Binomial Distribution
• The probability distribution is called the binomial
distribution.
b(x; n, p) = (n choose x) p^x q^(n−x),  x = 0, 1, 2, …, n
where p = _________________
q = _________________
• For our example,
b(x; n, p) = _________________
For Our Example …
• What is the probability that exactly 2 of the bits
are in error?
• At most 2 of the 4 bits are in error?
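A sketch of how these probabilities could be checked numerically, assuming scipy is available (n = 4 and p = 0.1 come from the example):

```python
from scipy.stats import binom

n, p = 4, 0.1
print(binom.pmf(2, n, p))       # P(X = 2): exactly 2 bits in error
print(binom.cdf(2, n, p))       # P(X <= 2): at most 2 bits in error
print(1 - binom.cdf(2, n, p))   # P(X > 2): more than 2 bits in error
```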
Your turn …
• What is the probability that more than 2 of the 4
bits are in error?
Expectations of the Binomial Distribution
• The mean and variance of the binomial
distribution are given by
μ = np
σ2 = npq
• Suppose, in our example, we check the next 20 bits. What is the expected number of bits in error? What is the standard deviation?
μ = ___________
σ2 = __________ , σ = __________
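A minimal sketch of these quantities for n = 20 and p = 0.1, using only the standard library:

```python
import math

n, p = 20, 0.1
q = 1 - p
mu = n * p               # expected number of bits in error
var = n * p * q          # variance
sd = math.sqrt(var)      # standard deviation
print(mu, var, sd)
```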
Another example
A worn machine tool produces 1% defective parts. If we
assume that parts produced are independent, what is the
mean number of defective parts that would be expected
if we inspect 25 parts?
What is the expected variance of the 25 parts?
Helpful Hints …
• Sometimes it helps to draw a picture.
Suppose we inspect the next 5 parts …
P(at least 3)
P(2 ≤ X ≤ 4)
P(less than 4)
• Appendix Table A.1 (pp. 742-747) lists Binomial Probability Sums, Σ_{x=0}^{r} b(x; n, p)
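The tabulated sums are simply binomial CDF values, so they can be reproduced numerically; a sketch assuming scipy is available, using the 5-part inspection above (p = 0.01):

```python
from scipy.stats import binom

n, p, r = 5, 0.01, 3
print(binom.cdf(r, n, p))    # sum of b(x; n, p) for x = 0, ..., r
```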
Your turn …
• Use Table A.1 to determine
1. b(x; 15, 0.4) , P(X ≤ 8) = ______________
2. b(x; 15, 0.4) , P(X < 8) = ______________
3. b(x; 12, 0.2) , P(2 ≤ X ≤ 5) = ___________
4. b(x; 4, 0.1) , P(X > 2) = ______________
Multinomial Experiments
• What if there are more than 2 possible outcomes?
(e.g., acceptable, scrap, rework)
• That is, suppose we have:
n independent trials
k outcomes that are
mutually exclusive (e.g., ♠, ♣, ♥, ♦)
exhaustive (i.e., ∑all k pi = 1)
• Then
f(x1, x2, …, xk; p1, p2, …, pk, n) = (n choose x1, x2, …, xk) p1^x1 p2^x2 … pk^xk
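A sketch of evaluating this pmf directly; the counts and probabilities below are hypothetical (e.g., 10 parts classified as acceptable / rework / scrap) and not taken from the text:

```python
from math import factorial

n = 10
x = [8, 1, 1]              # x1, x2, x3 (must sum to n)
p = [0.85, 0.10, 0.05]     # p1, p2, p3 (must sum to 1)

coeff = factorial(n)
for xi in x:
    coeff //= factorial(xi)            # multinomial coefficient n!/(x1! x2! ... xk!)

prob = coeff
for xi, pi in zip(x, p):
    prob *= pi ** xi                   # multiply by p_i^(x_i)
print(prob)
```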
Example
• Look at problem 5.22, pg. 152
x1 = _______
p1 = _______
x2 = _______
p2 = _______
x3 = _______
p3 = _______
n = _____
f( __, __, __; ___, ___, ___, __) =_________________
= __________________________________
Hypergeometric Distribution
• Example*:
Automobiles arrive in a dealership in lots of 10. Five out of each 10 are inspected. For one lot, it is known that 2 out of 10 do not meet prescribed safety standards. What is the probability that at least 1 out of the 5 tested from that lot will be found not meeting safety standards?
*from Complete Business Statistics, 4th ed (McGraw-Hill)
• This example follows a hypergeometric distribution:
A random sample of size n is selected without replacement from N
items.
k of the N items may be classified as “successes” and N-k are
“failures.”
• The probability associated with getting x successes in the sample (given k successes in the lot) is
P(X = x) = h(x; N, n, k) = [ (k choose x) (N−k choose n−x) ] / (N choose n)
Where,
k = number of “successes” = 2
N = the lot size = 10
n = number in sample = 5
x = number found = 1 or 2
Hypergeometric Distribution
• In our example,
P(X ≥ 1) = P(X = 1) + P(X = 2) = h(1; 10, 5, 2) + h(2; 10, 5, 2)
= [ (2 choose 1)(8 choose 4) ] / (10 choose 5) + [ (2 choose 2)(8 choose 3) ] / (10 choose 5)
= _____________________________
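A sketch of the same calculation with scipy (assumed available); note that scipy's hypergeom is parameterized as (lot size, successes in lot, sample size):

```python
from scipy.stats import hypergeom

N, k, n = 10, 2, 5                    # lot size, cars below standard, sample size
rv = hypergeom(N, k, n)

print(rv.pmf(1) + rv.pmf(2))          # P(X >= 1) = P(X = 1) + P(X = 2)
print(rv.mean(), rv.var())            # mean and variance (see the next slide)
```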
Expectations of the Hypergeometric
Distribution
• The mean and variance of the hypergeometric distribution
are given by
μ = nk/N
σ² = [(N − n)/(N − 1)] · n · (k/N) · (1 − k/N)
• What are the expected number of cars that fail inspection in
our example? What is the standard deviation?
μ = ___________
σ2 = __________ , σ = __________
Your turn …
A worn machine tool produced defective parts for a period of
time before the problem was discovered. Normal sampling of
each lot of 20 parts involves testing 6 parts and rejecting the
lot if 2 or more are defective. If a lot from the worn tool
contains 3 defective parts:
1. What is the expected number of defective parts in a sample of
six from the lot?
2. What is the expected variance?
3. What is the probability that the lot will be rejected?
Binomial Approximation
• Note, if N >> n, then we can approximate this with the
binomial distribution. For example:
Automobiles arrive in a dealership in lots of 100. Five out of each 100 are inspected. As before, 2 out of 10 (p = 0.2) are below safety standards.
What is the probability that at least 1 out of 5 will be found not meeting safety standards?
• Recall: P(X ≥ 1) = 1 – P(X < 1) = 1 – P(X = 0)
Hypergeometric distribution:
Binomial distribution:
(Compare to example 5.15, pg. 155)
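A sketch comparing the exact hypergeometric probability with the binomial approximation (scipy assumed available):

```python
from scipy.stats import hypergeom, binom

N, k, n = 100, 20, 5                    # lot size, cars below standard, sample size
p = k / N                               # 0.2

exact = 1 - hypergeom(N, k, n).pmf(0)   # P(X >= 1), exact
approx = 1 - binom.pmf(0, n, p)         # P(X >= 1), binomial approximation
print(exact, approx)                    # the two should be close since N >> n
```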
Negative Binomial Distribution
• Example:
Historical data indicates that 30% of all bits
transmitted through a digital transmission channel are
received in error. An engineer is running an
experiment to try to classify these errors, and will start
by gathering data on the first 10 errors encountered.
What is the probability that the 10th error will occur on
the 25th trial?
• This example follows a negative binomial distribution:
Repeated independent trials.
Probability of success = p and probability of failure = q = 1-p.
Random variable, X, is the number of the trial on which the kth
success occurs.
• The probability associated with the kth success occurring on
trial x is given by,
b*(x; k, p) = (x−1 choose k−1) p^k q^(x−k),  x = k, k+1, k+2, …
Where,
k = “success number” = 10
x = trial number on which k occurs = 25
p = probability of success (error) = 0.3
q = 1 – p = 0.7
Negative Binomial Distribution
• In our example,
b*(25; 10, 0.3) = (24 choose 9) (0.3)^10 (0.7)^15
= _____________________________
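A sketch of evaluating this expression directly with the standard library:

```python
from math import comb

k, x, p = 10, 25, 0.3
q = 1 - p
prob = comb(x - 1, k - 1) * p**k * q**(x - k)   # b*(25; 10, 0.3)
print(prob)
```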
Geometric Distribution
• Example:
In our example, what is the probability that the 1st bit
received in error will occur on the 5th trial?
• This is an example of the geometric distribution,
which is a special case of the negative binomial
in which k = 1.
The probability associated with the 1st success
occurring on trial x is given by
g(x; p) = p q^(x−1),  x = 1, 2, 3, …
= __________________________________
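A sketch of the computation for x = 5 and p = 0.3:

```python
p, x = 0.3, 5
prob = p * (1 - p) ** (x - 1)   # g(x; p) = p * q^(x - 1)
print(prob)
```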
Your turn …
A worn machine tool produces 1% defective parts. If we
assume that parts produced are independent:
1.What is the probability that the 2nd defective part will be the
6th one produced?
2.What is the probability that the 1st defective part will be seen
before 3 are produced?
3.How many parts can we expect to produce before we see the
1st defective part? (Hint: see Theorem 5.4, pg. 161)
Poisson Process
• The number of occurrences in a given interval or
region with the following properties:
“memoryless”
P(occurrence) during a very short interval or small region is proportional to the size of the interval and does not depend on the number occurring outside the region or interval.
P(X>1) in a very short interval is negligible
Poisson Process
• Examples:
Number of bits transmitted per minute.
Number of calls to customer service in an hour.
Number of bacteria in a given sample.
Number of hurricanes per year in a given region.
Poisson Process
• Example
An average of 2.7 service calls per minute are received
at a particular maintenance center. The calls
correspond to a Poisson process. To determine
personnel and equipment needs to maintain a desired
level of service, the plant manager needs to be able to
determine the probabilities associated with numbers of
service calls.
What is the probability that fewer than 2 calls will be
received in any given minute?
Poisson Distribution
• The probability associated with the number of
occurrences in a given period of time is given by,
p(x; λt) = [e^(−λt) (λt)^x] / x!,  x = 0, 1, 2, …
Where,
λ = average number of outcomes per unit time or region
= 2.7
t = time interval or region = 1 minute
Our Example
• The probability that fewer than 2 calls will be
received in any given minute is …
P(X < 2) = P(X = 0) + P(X = 1)
= __________________________
• The mean and variance are both λt, so
μ = _____________________
• Note: Table A.2, pp. 748-750, gives Poisson probability sums, Σ_{x=0}^{r} p(x; μ)
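A sketch of the calculation, assuming scipy is available (λt = 2.7 for a one-minute interval):

```python
from scipy.stats import poisson

mu = 2.7                       # lambda * t for 1 minute
print(poisson.cdf(1, mu))      # P(X < 2) = P(X = 0) + P(X = 1)
```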
Poisson Distribution
• If more than 6 calls are received in a 3-minute
period, an extra service technician will be
needed to maintain the desired level of service.
What is the probability of that happening?
μ = λt = _____________________
P(X > 6) = 1 – P(X ≤ 6)
= _____________________
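A sketch of the three-minute calculation, assuming scipy is available:

```python
from scipy.stats import poisson

mu = 2.7 * 3                    # lambda * t for a 3-minute interval
print(1 - poisson.cdf(6, mu))   # P(X > 6) = 1 - P(X <= 6)
```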
Poisson Distribution
[Figure: histogram of observed service calls (frequency vs. calls per minute)]
Poisson Distribution
The effect of λ on the Poisson distribution
[Figure: Poisson probability mass functions for several values of λ, plotted over x = 1 to 30]
Continuous Probability Distributions
• Many continuous probability distributions,
including:
Uniform
Normal
Gamma
Exponential
Chi-Squared
Lognormal
Weibull
Uniform Distribution
• Simplest – characterized by the interval
endpoints, A and B.
f(x; A, B) = 1/(B − A),  A ≤ x ≤ B
           = 0,  elsewhere
• Mean and variance:
μ = (A + B)/2   and   σ² = (B − A)²/12
Example
A circuit board failure causes a shutdown of a computing
system until a new board is delivered. The delivery time
X is uniformly distributed between 1 and 5 days.
What is the probability that it will take 2 or more days for
the circuit board to be delivered?
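A sketch of the calculation, assuming scipy is available (scipy's uniform is parameterized by loc = A and scale = B − A):

```python
from scipy.stats import uniform

A, B = 1, 5
print(1 - uniform.cdf(2, loc=A, scale=B - A))   # P(X >= 2)
```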
Normal Distribution
• The “bell-shaped curve”
• Also called the Gaussian distribution
• The most widely used distribution in statistical
analysis
forms the basis for most of the parametric tests we’ll
perform later in this course.
describes or approximates most phenomena in
nature, industry, or research
• Random variables (X) following this distribution
are called normal random variables.
the parameters of the normal distribution are μ and σ
(sometimes μ and σ2.)
Normal Distribution
• The density function of the normal random
variable X, with mean μ and variance σ2, is
n(x; μ, σ) = [1/(√(2π) σ)] e^(−(x − μ)²/(2σ²)),  for all x
[Figure: normal density P(x) with μ = 5, σ = 1.5, plotted for x from 0 to 12]
Standard Normal RV …
• Note: the probability of X taking on any value
between x1 and x2 is given by:
P(x1 < X < x2) = ∫[x1 to x2] n(x; μ, σ) dx = ∫[x1 to x2] [1/(√(2π) σ)] e^(−(x − μ)²/(2σ²)) dx
• To ease calculations, we define a normal
random variable
Z = (X − μ) / σ
where Z is normally distributed with μ = 0 and σ2 = 1
Standard Normal Distribution
• Table A.3: “Areas Under the Normal Curve”
[Figure: standard normal density, Z from −5 to 5]
Examples
• P(Z ≤ 1) =
• P(Z ≥ -1) =
• P(-0.45 ≤ Z ≤ 0.36) =
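These areas can be checked against Table A.3 with the standard normal CDF; a sketch assuming scipy is available:

```python
from scipy.stats import norm

print(norm.cdf(1))                        # P(Z <= 1)
print(1 - norm.cdf(-1))                   # P(Z >= -1)
print(norm.cdf(0.36) - norm.cdf(-0.45))   # P(-0.45 <= Z <= 0.36)
```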
Your turn …
• Use Table A.3 to determine (draw the picture!)
1. P(Z ≤ 0.8) =
2. P(Z ≥ 1.96) =
3. P(-0.25 ≤ Z ≤ 0.15) =
4. P(Z ≤ -2.0 or Z ≥ 2.0) =
The Normal Distribution “In Reverse”
• Example:
Given a normal distribution with μ = 40 and σ = 6, find
the value of X for which 45% of the area under the
normal curve is to the left of X.
1) If P(Z < k) = 0.45,
k = ___________
2) Z = _______
X = _________
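A sketch of the reverse lookup using the inverse normal CDF, assuming scipy is available (μ = 40, σ = 6 from the example):

```python
from scipy.stats import norm

k = norm.ppf(0.45)    # z-value with 45% of the area to its left
x = 40 + 6 * k        # transform back: X = mu + sigma * Z
print(k, x)
```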
Normal Approximation to the Binomial
• If n is large and p is not close to 0 or 1,
or
if n is smaller but p is close to 0.5, then
the binomial distribution can be approximated by the
normal distribution using the transformation:
Z = (X − np) / √(npq)
• NOTE: add or subtract 0.5 from X to be sure the value
of interest is included (draw a picture to know which)
• Look at example 6.15, pg. 191
Look at example 6.15, pg. 191
p = 0.4
n = 100
μ = ____________
σ = ______________
if x = 30, then z = _____________________
and, P(X < 30) = P (Z < _________) =
_________
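A sketch of the approximation with the 0.5 continuity correction, assuming scipy is available:

```python
import math
from scipy.stats import norm

n, p = 100, 0.4
mu = n * p                           # np = 40
sigma = math.sqrt(n * p * (1 - p))   # sqrt(npq)

z = (29.5 - mu) / sigma              # P(X < 30) ~ P(X <= 29.5) after continuity correction
print(z, norm.cdf(z))                # approximate P(X < 30)
```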
Your Turn
• Refer to the previous example.
DRAW THE
PICTURE!!
1. What is the probability that more than 50 survive?
2. What is the probability that exactly 45 survive?
Gamma & Exponential Distributions
• Recall the Poisson Process
Number of occurrences in a given interval or region
“Memoryless” process
• Sometimes we’re interested in the time or area
until a certain number of events occur.
• For example
An average of 2.7 service calls per minute are
received at a particular maintenance center. The calls
correspond to a Poisson process.
What is the probability that up to a minute will elapse before 2
calls arrive?
How long before the next call?
Gamma Distribution
• The density function of the random variable X having a gamma distribution with parameters α (number of occurrences) and β (time or region) is:
f(x) = [1/(β^α Γ(α))] x^(α−1) e^(−x/β),  x > 0
Γ(n) = (n − 1)!
μ = αβ
σ² = αβ²
[Figure: gamma density f(x), plotted for x from 0 to 8]
Exponential Distribution
• Special case of the gamma distribution with α = 1.
f(x) = (1/β) e^(−x/β),  x > 0
Describes the time until or time between Poisson events.
μ=β
σ2 = β2
[Figure: exponential density, plotted for x from 0 to 30]
Example
An average of 2.7 service calls per minute are
received at a particular maintenance center. The
calls correspond to a Poisson process.
What is the probability that up to a minute will elapse
before 2 calls arrive?
β = ________
α = ________
P(X ≤ 1) = _________________________________
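A sketch using the gamma CDF with α = 2 and β = 1/2.7, assuming scipy is available (scipy's scale parameter plays the role of β):

```python
from scipy.stats import gamma

alpha = 2            # waiting for the 2nd Poisson event
beta = 1 / 2.7       # mean time between calls, in minutes
print(gamma.cdf(1, alpha, scale=beta))   # P(X <= 1 minute)
```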
Example (cont.)
What is the expected time before the next call
arrives?
β = ________
α = ________
μ = _________________________________
Your turn …
• Look at problem 6.40, page 205.
Chi-Squared Distribution
• Special case of the gamma distribution with α = ν/2
and β = 2.
f(x) = [1/(2^(ν/2) Γ(ν/2))] x^(ν/2 − 1) e^(−x/2),  x > 0
where ν is a positive integer.
The single parameter, ν, is called the degrees of freedom.
μ=ν
σ2 = 2ν
Lognormal Distribution
• When the random variable Y = ln(X) is normally
distributed with mean μ and standard deviation σ,
then X has a lognormal distribution with the
density function,
f(x; μ, σ) = [1/(√(2π) σ x)] e^(−[ln(x) − μ]²/(2σ²)),  x ≥ 0
Mean: e^(μ + σ²/2)
Variance: e^(2μ + σ²) (e^(σ²) − 1)
Example
Look at problem 6.72, pg. 207 …
Since ln(X) has normal distribution with μ = 5 and
σ = 2, the probability that X > 50,000 is,
P(X > 50,000) = __________________________
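A sketch of the calculation by standardizing ln(50,000), assuming scipy is available:

```python
import math
from scipy.stats import norm

z = (math.log(50000) - 5) / 2   # standardize ln(50,000) with mu = 5, sigma = 2
print(1 - norm.cdf(z))          # P(X > 50,000)
```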
Weibull Distribution
• Used for many of the same applications as the
gamma and exponential distributions, but
• does not require the memoryless property of the exponential
f(x; α, β) = αβ x^(β−1) e^(−α x^β),  x > 0
F(x) = 1 − e^(−α x^β)
Example
• Designers of wind turbines for power generation
are interested in accurately describing
variations in wind speed, which in a certain
location can be described using the Weibull
distribution with
α = 0.02 and β = 2. A
designer is interested in determining the
probability that the wind speed in that location is
between 3 and 7 mph.
P(3 < X < 7) =
___________________________
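A sketch using the cumulative form F(x) = 1 − e^(−αx^β) given above (standard library only):

```python
import math

alpha, beta = 0.02, 2

def F(x):
    return 1 - math.exp(-alpha * x ** beta)   # Weibull CDF in the book's form

print(F(7) - F(3))    # P(3 < X < 7)
```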
Populations and Samples
• Population: “a group of individual persons,
objects, or items from which samples are taken
for statistical measurement”
• Sample: “a finite part of a statistical population
whose properties are studied to gain information
about the whole”
(Merriam-Webster Online Dictionary, http://www.m-w.com/, October 5, 2004)
Examples
Population
Samples
• Students pursuing
undergraduate engineering
degrees
• Cars capable of speeds in
excess of 160 mph.
• Potato chips produced at the
Frito-Lay plant in Kathleen
• Freshwater lakes and rivers
Basic Statistics (review)
1. Sample Mean:
X̄ = ( Σ_{i=1}^{n} Xᵢ ) / n
• Example:
At the end of a team project, team members were
asked to give themselves and each other a grade on
their contribution to the group. The results for two
team members were as follows:
Q:  92  95  85  78
S:  85  88  75  92
X̄_Q = ___________________
X̄_S = ___________________
Basic Statistics (review)
2. Sample Variance:
S² = Σ_{i=1}^{n} (Xᵢ − X̄)² / (n − 1) = [ n Σ_{i=1}^{n} Xᵢ² − ( Σ_{i=1}^{n} Xᵢ )² ] / [ n(n − 1) ]
• For our example:
Q:  92  95  85  78
S:  85  88  75  92
S²_Q = ___________________
S²_S = ___________________
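A sketch of both statistics for the two sets of grades, using the standard library (statistics.variance uses the n − 1 denominator):

```python
from statistics import mean, variance, stdev

Q = [92, 95, 85, 78]
S = [85, 88, 75, 92]

print(mean(Q), variance(Q), stdev(Q))   # sample mean, variance, std. dev. for Q
print(mean(S), variance(S), stdev(S))   # sample mean, variance, std. dev. for S
```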
Your Turn
• Work in groups of 4 or 5. Find the mean,
variance, and standard deviation for your group
of the (approximate) number of hours spent
working on homework each week.
Sampling Distributions
• If we conduct the same experiment several
times with the same sample size, the probability
distribution of the resulting statistic is called a
sampling distribution
• Sampling distribution of the mean: if n
observations are taken from a normal population
with mean μ and variance σ2, then:
μ_X̄ = (μ + μ + … + μ)/n = μ
σ²_X̄ = (σ² + σ² + … + σ²)/n² = σ²/n
Central Limit Theorem
• Given:
X : the mean of a random sample of size n taken from
a population with mean μ and finite variance σ2,
• Then,
the limiting form of the distribution of
Z = (X̄ − μ) / (σ/√n),  as n → ∞,
is _________________________
Central Limit Theorem
• If the population is known to be normal, the
sampling distribution of X will follow a normal
distribution.
• Even when the distribution of the population
is not normal, the sampling distribution of X
is normal when n is large.
NOTE: when n is not large, we cannot assume the
distribution of X is normal.
Example:
The time to respond to a request for information from a
customer help line is uniformly distributed between 0 and
2 minutes. In one month 48 requests are randomly
sampled and the response time is recorded.
What is the probability that the average response time is
between 0.9 and 1.1 minutes?
μ =______________
σ2 = ________________
μX =__________
σX2 = ________________
Z1 = _____________
Z2 = _______________
P(0.9 < X < 1.1) = _____________________________
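A sketch of the CLT approximation, assuming scipy is available (μ = 1 and σ² = 1/3 for a Uniform(0, 2) response time, n = 48):

```python
import math
from scipy.stats import norm

mu = 1.0                    # mean of Uniform(0, 2)
var = (2 - 0) ** 2 / 12     # variance of Uniform(0, 2) = 1/3
se = math.sqrt(var / 48)    # std. deviation of the sample mean

z1 = (0.9 - mu) / se
z2 = (1.1 - mu) / se
print(norm.cdf(z2) - norm.cdf(z1))   # approximate P(0.9 < Xbar < 1.1)
```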
Sampling Distribution of the Difference
Between two Averages
• Given:
Two samples of size n1 and n2 are taken from two
populations with means μ1 and μ2 and variances σ12
and σ22
• Then,
μ_{X̄1 − X̄2} = μ1 − μ2
σ²_{X̄1 − X̄2} = σ1²/n1 + σ2²/n2
and
Z = [ (X̄1 − X̄2) − (μ1 − μ2) ] / √( σ1²/n1 + σ2²/n2 )
Sampling Distribution of S2
• Given:
S² is the variance of a random sample of size n taken from a population with mean μ and finite variance σ²,
• Then,
χ² = (n − 1)S²/σ² = Σ_{i=1}^{n} (Xᵢ − X̄)² / σ²
has a χ² distribution with ν = n − 1
χ2 Distribution
• χ²_α represents the χ² value above which we find an area of α, that is, for which P(χ² > χ²_α) = α.
Example
• Look at example 8.10, pg. 256:
μ=3
σ=1
n=5
s2 = ________________
χ2 = __________________
If the χ2 value fits within an interval that covers 95% of the
χ2 values with 4 degrees of freedom, then the estimate for
σ is reasonable.
(See Table A.5, pp. 755-756)
Your turn …
• If a sample of size 7 is taken from a normal
population (i.e., n = 7), what value of χ2
corresponds to P(χ2 < χα2) = 0.95? (Hint: first
determine α.)
t- Distribution
• Recall, by CLT:
Z = (X̄ − μ) / (σ/√n)   is n(z; 0, 1)
• Assumption: _____________________
(Generally, if an engineer is concerned with a familiar
process or system, this is reasonable, but …)
What if we don’t know σ?
• New statistic:
T = (X̄ − μ) / (S/√n)
Where,
X̄ = ( Σ_{i=1}^{n} Xᵢ ) / n   and   S = √[ Σ_{i=1}^{n} (Xᵢ − X̄)² / (n − 1) ]
follows a t-distribution with ν = n – 1 degrees of
freedom.
Characteristics of the t-Distribution
• Look at fig. 8.13, pg. 259
• Note:
Shape: _________________________
Effect of ν: __________________________
• See table A.4, pp. 753-754
Using the t-Distribution
• Testing assumptions about the value of μ
Example: problem 8.52, pg. 265
• What value of t corresponds to P(t < tα) = 0.95?
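A sketch of that lookup with the inverse t CDF, assuming scipy is available (the degrees of freedom below are illustrative; use ν = n − 1 for the problem at hand):

```python
from scipy.stats import t

nu = 9                    # e.g., a sample of size n = 10
print(t.ppf(0.95, nu))    # t-value with 95% of the area to its left
```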
Comparing Variances of 2 Samples
• Given two samples of size n1 and n2, with sample means X̄1 and X̄2 and variances s1² and s2² …
Are the differences we see in the means due to the means themselves or due to the variances (that is, are they real differences between the samples, or variability within each sample)?
See figure 8.16, pg. 262
F-Distribution
• Given:
S12 and S22, the variances of independent random
samples of size n1 and n2 taken from normal
populations with variances σ12 and σ22, respectively,
• Then,
F = (S1²/σ1²) / (S2²/σ2²) = (σ2² S1²) / (σ1² S2²)
has an F-distribution with ν1 = n1 - 1 and ν2 = n2 – 1
degrees of freedom.
(See table A.6, pp. 757-760)
Example
• Problem 8.55, pg. 266
S12 = ___________________
S22 = ___________________
F = _____________
• NOTE:
f0.05(4, 5) = _________
f_{1−α}(ν1, ν2) = 1 / f_α(ν2, ν1)
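A sketch of an F-table lookup and a check of the reciprocal relationship above, assuming scipy is available:

```python
from scipy.stats import f

nu1, nu2 = 4, 5
f_upper = f.ppf(0.95, nu1, nu2)       # f_0.05(4, 5): area 0.05 to the right
f_lower = f.ppf(0.05, nu1, nu2)       # f_0.95(4, 5)
print(f_upper, f_lower)
print(1 / f.ppf(0.95, nu2, nu1))      # should equal f_lower, per the identity above
```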