S2P3 - Lyle School of Engineering
EMIS 7300
SYSTEMS ANALYSIS METHODS
Spring 2006
Dr. John Lipp
Copyright © 2002 - 2006 John Lipp
Today’s Topics
• Discrete Random Variables
– Probability Density Functions (PDFs).
– Cumulative Distribution Functions (CDFs).
– Discrete Uniform Random Variables.
– Binomial Random Variables.
– Geometric Random Variables.
– Negative Binomial Random Variables.
– Hypergeometric Random Variables.
– Statistical Average / Expected Value.
– Mean and Variance Examples.
– Poisson Random Variables.
– Bivariate Random Variables.
Frequency, Relative Frequency, and the Histogram
• A common way to show the distribution of data values is by
tabulating the number of occurrences within data sub-ranges.
– The data sub-ranges are commonly referred to as bins.
– The number of data occurrences within a particular bin is
referred to as the frequency.
– Frequency vs. bin is known as a frequency distribution.
When plotted, often as a bar chart, the result is called a
histogram.
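The tallying described above can be sketched in a few lines of Python. This is only an illustration; the weight values and bin edges below are hypothetical, not the course data set.

    from collections import Counter

    # Hypothetical weights and bin edges (illustration only, not the course data set).
    weights = [98.2, 99.1, 99.7, 100.4, 100.9, 101.3, 99.9, 100.1, 98.8, 101.8]
    edges = [98, 99, 100, 101, 102, 103]            # bin k covers [edges[k], edges[k+1])

    def bin_index(x):
        """Return the index of the bin (sub-range) containing x, or None."""
        for k in range(len(edges) - 1):
            if edges[k] <= x < edges[k + 1]:
                return k
        return None

    freq = Counter(bin_index(w) for w in weights)   # frequency of each bin
    n, cum = len(weights), 0
    for k in range(len(edges) - 1):
        f = freq.get(k, 0)
        cum += f                                    # cumulative frequency
        print(f"[{edges[k]}, {edges[k+1]}): freq={f}  rel={f/n:.2f}  "
              f"cum={cum}  cum rel={cum/n:.2f}")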
Frequency, Relative Frequency, and the Histogram (cont.)
• Consider the n = 3 weight sample mean.
  (Worksheet table to be filled in, with columns: Bin Sub-Range, Bin “Label”, Frequency, Cumulative Frequency, Relative Frequency, Cumulative Relative Frequency.)
Sample Statistics (cont.)
(Figure: plot of the sample data; the horizontal axis is labeled 98 through 102.)
Frequency, Relative Frequency, and the Histogram (cont.)
• Histogram procedural suggestions
– A good rule of thumb for selecting the number of bins is to
use an integer close to the square root of the data set size.
– Bins should be sized so that at least 80% of them contain 5 or more counts.
Combine the outermost bins rather than the interior bins when needed.
Statistical outliers are sometimes excluded from the outer bins, such as when they distort the results.
Frequency, Relative Frequency, and the Histogram (cont.)
• Histogram procedural suggestions
– A good choice for the (center) bin of the histogram is near
the mean, median, or mode.
– Use sigma, range, or quartiles as a guide to bin size.
For example, if the number of data points is 30, that suggests using around 6 bins.
Choose bin 4 to be centered on the mean, and space the remaining bins one standard deviation apart (see the sketch below).
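As a rough sketch of the rules of thumb above (about sqrt(n) bins, bins roughly one standard deviation wide, with the grid centered near the mean), using made-up data:

    import math
    import statistics

    data = [99.2, 100.1, 98.7, 101.4, 100.9, 99.5, 100.3, 101.8,
            98.1, 100.0, 99.9, 100.6, 101.1, 99.4, 100.7, 98.9]  # hypothetical

    n_bins = round(math.sqrt(len(data)))          # ~sqrt(n) bins (here 4)
    mu = statistics.mean(data)
    sigma = statistics.stdev(data)

    # Bin grid centered on the mean, each bin one standard deviation wide.
    edges = [mu + (k - n_bins / 2) * sigma for k in range(n_bins + 1)]
    print(n_bins, [round(e, 2) for e in edges])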
Random Variables
“When you can express something in numbers,
you know something about it.”
- Lord Kelvin
• Engineering is primarily concerned with analyzing numbers,
not events. This is facilitated by defining functions that map
outcomes (and events) from a sample space to the real number
line.
• Such functions are referred to as random variables, even though they are not really variables.
• A random variable can then be operated on with appropriate
extensions to algebra and calculus.
Discrete Random Variables
• When the elementary outcomes of a random experiment are numbers, the result is a discrete random variable.
(Diagram: elementary outcomes A1 through A6 of the sample space mapped to the numbers 1 through 6.)
– The numeric outcomes are typically, but not required to be,
integers.
– The number of elementary outcomes can be infinite.
Discrete Random Variables (cont.)
• Denote a random variable as X and the possible elementary
outcomes as xi .
– The sample space S = {x1, x2, …, xM}.
– The probabilities of the elementary outcomes are known as the discrete probability density function (PDF) of X.
– The PDF is written as a function $f_X[x_i] = P(X = x_i)$.
– The PDF is also known as the probability mass function.
Discrete Random Variables (cont.)
Examples:
• Die (6-sided): $f_X[x] = 1/6$ for x in {1, 2, 3, 4, 5, 6}.
• Cards: $f_X[x] = 1/52$ for x in {1, …, 52}.
• Photons counted by an optical detector, X ∈ {0, 1, 2, …}.
• Weapon system computer simulation Monte Carlo score, i.e., the number of hits, X ∈ {0, 1, 2, …, N}.
• Digital voltmeter measurement. The range for X is the maximum scale divided equally by the number of counts (typically 200, 2000, or 4000).
Discrete Random Variables (cont.)
• Probability Axiom 1: $P(S) = 1 \;\Leftrightarrow\; \sum_{i=1}^{M} f_X[x_i] = 1$
• Probability Axiom 2: $0 \le P(E) \le 1 \;\Leftrightarrow\; 0 \le f_X[x] \le 1$
• Probability Axiom 3: $P(X = x_i \cup X = x_j) = P(X = x_i) + P(X = x_j) = f_X[x_i] + f_X[x_j]$
  ($x_i$ and $x_j$ are disjoint outcomes)
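A quick numerical check of the three axioms for the fair-die PDF, as a sketch using a plain Python dictionary for $f_X$:

    # Fair six-sided die: f_X[x] = 1/6 for x in {1, ..., 6}.
    pmf = {x: 1 / 6 for x in range(1, 7)}

    # Axiom 1: the probabilities sum to 1.
    assert abs(sum(pmf.values()) - 1.0) < 1e-12

    # Axiom 2: every probability lies between 0 and 1.
    assert all(0.0 <= p <= 1.0 for p in pmf.values())

    # Axiom 3: for disjoint outcomes, probabilities add.
    print(pmf[2] + pmf[5])     # P(X = 2 or X = 5) = 1/3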
Discrete Random Variables (cont.)
• Since all elementary outcomes are disjoint,
  $P(x_i \le X \le x_j) = f_X[x_i] + f_X[x_{i+1}] + \cdots + f_X[x_j] = \sum_{k=i}^{j} f_X[x_k]$
  (the summation symbol $\sum$ is shorthand for the repeated sum).
• $P(X \le x_i)$ is known as the cumulative distribution function (CDF) of X and is denoted $F_X[x]$,
  $F_X[x_i] = \sum_{j=1}^{i} f_X[x_j]$
Discrete Random Variables (cont.)
• The CDF is bounded, $0 \le F_X[x] \le 1$:
– $F_X[-\infty] = P(X \le -\infty) = 0$.
– $F_X[+\infty] = P(X \le +\infty) = 1$.
• The CDF is monotone increasing: $F_X[x] \le F_X[y]$ when $x \le y$.
• A discrete CDF is right-continuous and step-wise constant.
(Figure: side-by-side PDF and CDF plots for a fair die. The PDF is 1/6 at each of x = 1, …, 6; the CDF steps up by 1/6 at each value, from 1/6 to 1, and is right-continuous at every jump.)
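A sketch of how the CDF is built from the PDF and then evaluated as a right-continuous step function, again for the fair die:

    from bisect import bisect_right

    pmf = {x: 1 / 6 for x in range(1, 7)}        # fair die PDF
    xs = sorted(pmf)

    # F_X[x_i] = sum of f_X[x_j] for j <= i (running sum of the PDF).
    cdf_vals, running = [], 0.0
    for x in xs:
        running += pmf[x]
        cdf_vals.append(running)

    def cdf(x):
        """Right-continuous step CDF: P(X <= x)."""
        i = bisect_right(xs, x)
        return 0.0 if i == 0 else cdf_vals[i - 1]

    print(cdf(0.5), cdf(3), cdf(3.7), cdf(6))    # 0.0  0.5  0.5  1.0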
Discrete Uniform PDF
• The simplest discrete distribution is a discrete uniform
distribution
– The random variable X can take on one of N values in the
ordered set {x1, x2, x3, …, xN}.
– PDF: $f_X[x_i] = P(X = x_i) = 1/N$ for any $x_i$ in $\{x_1, x_2, x_3, \ldots, x_N\}$.
– CDF: $F_X[x_i] = P(X \le x_i) = i/N$.
– Note that X's values don't have to be uniformly spaced; only the probability is uniformly distributed.
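A minimal sketch of the discrete uniform PDF and CDF; the values below are hypothetical and deliberately not evenly spaced:

    # Discrete uniform over N values that need not be evenly spaced.
    values = [1, 2, 5, 10, 20, 50]          # hypothetical ordered set {x_1, ..., x_N}
    N = len(values)

    def f(x):
        """PDF: 1/N for any x in the set, 0 otherwise."""
        return 1 / N if x in values else 0.0

    def F(x_i):
        """CDF at the i-th smallest value: i/N."""
        return (values.index(x_i) + 1) / N

    print(f(5), F(5))                        # 1/6  0.5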
Binomial Random Variables
• A computer Monte Carlo simulation is a frequently used tool to assess system performance metrics such as Ph (probability of hit).
– The simulation is randomly initialized and executed N
times (or replicates).
– At the end of each run a performance metric, e.g., miss
distance, is compared to a requirement.
– A run that passes the requirement is a “hit” and scores 1, a
miss scores 0.
– Across all the runs, the result is X successes out of N possible, e.g., 11 out of 25.
– The result is often written as a percentage, e.g., 44% Ph.
Binomial Random Variables (cont.)
• Now imagine this scenario (or is it a true story?)
– You work in a missile program that routinely measures
performance with a set of about 50 Monte Carlos.
– Each Monte Carlo contains 25 runs.
– Your customer just discovered the cost benefits of LINUX
and has a SETA contractor porting the simulation to that
environment.
– The SETA contractor is very excited! One of the Monte
Carlos has changed significantly – from 24 hits out of 25
down to 16 hits out of 25 – a 40% reduction!!!
– Should you be excited as well?
Binomial Random Variables (cont.)
• Denote each run of the Monte Carlo as Bi, a discrete random
variable with boolean outcomes of {0,1}.
• Traditionally, a 1 represents “success” and a 0 represents
“failure,” but that is not required.
• Assign $P(B_i = 1) = p$, $0 \le p \le 1$.
– p is the probability of success.
– Consequently, $P(B_i = 0) = n = 1 - p$.
• Let X be the sum of N independent, boolean random variables with identical success rates p,
  $X = B_1 + B_2 + \cdots + B_N = \sum_{i=1}^{N} B_i$
X is called a binomial random variable.
Binomial Random Variables (cont.)
• A binomial random variable is often used as a model of
expected Monte Carlo performance.
• What is the PDF for an N = 3 Monte Carlo?
      x_i  |  f_X[x_i] = P(X = x_i)
     ------+------------------------
       0   |
       1   |
       2   |
       3   |
Binomial Random Variables (cont.)
• The density function (valid for x = 0, 1, …, N) is given by
  $f_X[x] = \binom{N}{x} p^x n^{N-x} = \frac{N!}{(N-x)!\,x!}\, p^x (1-p)^{N-x}$
  which is known as the binomial density function. ($\binom{N}{x}$ is the binomial coefficient, the "nCr" of a calculator.)
– N is the number of trials,
– x is the number of successful trials,
– p is the probability an individual trial is successful,
– All trials are independent, that is, one trial's outcome does not affect any others.
• Each individual trial of this type is called a Bernoulli trial.
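A sketch of the binomial density using Python's math.comb. It also fills in the N = 3 table from the previous slide, assuming a per-run hit probability of p = 0.8 (the value used in the mean and variance examples later in this deck):

    from math import comb

    def binom_pmf(x, N, p):
        """Binomial density: C(N, x) * p**x * (1 - p)**(N - x)."""
        return comb(N, x) * p**x * (1 - p)**(N - x)

    # N = 3 Monte Carlo with an assumed per-run hit probability p = 0.8.
    for x in range(4):
        print(x, round(binom_pmf(x, 3, 0.8), 3))
    # 0 0.008   1 0.096   2 0.384   3 0.512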
Binomial Random Variables (cont.)
(Figure: binomial PDF $f_X(x)$ for N = 25, p = 0.8, plotted for x = 0 to 25; the probabilities peak around x = Np = 20 at roughly 0.2.)
Binomial Random Variables (cont.)
(Figure: binomial PDF $f_X(x)$ for N = 25, p = 0.96, plotted for x = 0 to 25; nearly all of the probability mass sits at x = 23, 24, and 25, with the peak at x = 24.)
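As a sketch of how the earlier LINUX-port question can be examined with this model (this is not the deck's worked answer): if the per-run hit probability really were 0.96, the chance of 16 or fewer hits in 25 runs is tiny.

    from math import comb

    def binom_cdf(k, N, p):
        """P(X <= k) for a binomial(N, p) random variable."""
        return sum(comb(N, x) * p**x * (1 - p)**(N - x) for x in range(k + 1))

    # If each run really hit with probability 0.96 (24 of 25 on average),
    # 16 or fewer hits out of 25 would be an extremely unlikely outcome.
    print(binom_cdf(16, 25, 0.96))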
Geometric Random Variables
• Consider a sequence of Bernoulli trials. How many trials are required until a successful outcome is achieved?
• That is, the PDF for a geometric random variable is … (see the sketch below).
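The slide leaves the formula as a prompt. As a sketch, the standard geometric density, $P(\text{first success on trial } x) = (1-p)^{x-1} p$ (the negative binomial PDF of a later slide with r = 1), can be written directly:

    def geom_pmf(x, p):
        """P(first success on trial x) = (1 - p)**(x - 1) * p, for x = 1, 2, ..."""
        return (1 - p) ** (x - 1) * p

    # Probability the first hit comes on run 1, 2, or 3 when p = 0.8.
    print([round(geom_pmf(x, 0.8), 4) for x in (1, 2, 3)])   # [0.8, 0.16, 0.032]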
Negative Binomial Random Variables
• You are negotiating a contract exit criterion with a customer
– The product is a munition costing $4 million per unit.
– The customer doesn’t pay for any munitions until the exit
criterion is reached.
– The proposed exit criterion is to test (shoot) munitions
until 10 are successful (not necessarily 10 in a row).
• Two scenarios are available
– Go with the product as is: 90% are good.
– Improve the product manufacturing. The estimated cost is
$2 million, and the result is 95% will be good.
• Should the $2 million be invested?
Negative Binomial Random Variables (cont.)
• The math on the costs is straightforward:
  Cost = ($4 million) × (# of missiles fired)
  The # of missiles fired is random, so the cost is really random too. Thus, consider the average (statistical) cost:
  Ave Cost = ($4 million) × [10·P(10 missiles fired) + 11·P(11 missiles fired) + …]
• The statistical question is how many missiles X must be fired before r successes are achieved (r = 10 in this example).
• X is said to have a negative binomial distribution. The negative binomial distribution is a generalization of the geometric distribution (r = 1).
Negative Binomial Random Variables (cont.)
• Assume __________________________________________.
• What does the PDF look like?
– P(9 Missiles Fired) =
– P(10 Missiles Fired) =
– P(11 Missiles Fired) =
– P(12 Missiles Fired) =
Negative Binomial Random Variables (cont.)
• The negative binomial PDF is
  $f_X[x] = \binom{x-1}{r-1} (1-p)^{x-r} p^r = \frac{(x-1)!}{(x-r)!\,(r-1)!}\,(1-p)^{x-r} p^r$
• Average cost (in millions) to reach the exit criterion is
  $\text{Ave Cost} = 4 \sum_{x=10}^{\infty} x\, f_X[x] = 4 \sum_{x=10}^{\infty} x \binom{x-1}{9} (1-p)^{x-10} p^{10} = 4\,\frac{10}{p}$
• For p = 0.90, $44.4 million; for p = 0.95, $42.1 million.
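A sketch that reproduces the two average-cost figures, both by summing $4\,x\,f_X[x]$ over the negative binomial density and by the closed form mean r/p (so the average cost is $4M · 10/p):

    from math import comb

    def negbin_pmf(x, r, p):
        """P(r-th success occurs on trial x) = C(x-1, r-1)(1-p)**(x-r) p**r."""
        return comb(x - 1, r - 1) * (1 - p) ** (x - r) * p ** r

    def avg_cost_millions(p, r=10, unit_cost=4, max_x=400):
        # Truncated version of 4 * sum over x >= r of x * f_X[x].
        return unit_cost * sum(x * negbin_pmf(x, r, p) for x in range(r, max_x))

    print(round(avg_cost_millions(0.90), 1), 4 * 10 / 0.90)   # ~44.4
    print(round(avg_cost_millions(0.95), 1), 4 * 10 / 0.95)   # ~42.1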
Hypergeometric Random Variables
• You are in charge of selecting a vendor for the fuse in a new missile design that is expected to sell 10,000 or more units.
• The customer buys the missiles in lots of 100 and shoots 5
during lot acceptance testing.
– Each missile that fails is a lost sale of $250,000.
– If 3 or more missiles fail, the whole lot is rejected.
• Assuming the quality of the fuse is the dominant risk, which
of these vendors is the best choice?
– Vendor A: Fuses are $5k each and 3% are defective.
– Vendor B: Fuses are $4k each and 5% are defective.
Hypergeometric Random Variables (cont.)
• The appropriate probability distribution is the hypergeometric
– Set of N objects (N = 100 in the example).
– K of the objects are classified "successful" (K = 97 or 95).
– A sample of n objects is taken (n = 5).
– The random variable is the number of successes, X.
– The PDF of X is
  $f_X[x] = \frac{\binom{K}{x}\binom{N-K}{n-x}}{\binom{N}{n}} = \frac{K!}{(K-x)!\,x!}\cdot\frac{(N-K)!}{(N-K-n+x)!\,(n-x)!}\bigg/\frac{N!}{(N-n)!\,n!}$
Hypergeometric Random Variables (cont.)
                        Vendor A (3%)                Vendor B (5%)
                   Probability  Ave Cost (k$)   Probability  Ave Cost (k$)
    Parts Cost         n/a        5 × 100           n/a        4 × 100
    5 Work           85.600%          0           76.959%          0
    4 Work           13.806%       34.516         21.142%       52.857
    3 Work            0.588%        2.938          1.838%        9.192
    2 Work            0.006%        1.546          0.059%       14.827
    1 Works            n/a             0           0.001%        0.158
    0 Work             n/a             0           0.000%        0.000
    TOTAL              n/a           539            n/a           477
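The table can be reproduced with the hypergeometric PDF above; a sketch follows. The cost model below is inferred from the table rather than stated on the slide: each failed test missile is a $250k lost sale when the lot is accepted, the entire 100-missile lot ($25M of sales) is lost when 3 or more of the 5 test missiles fail, and each lot carries the parts cost of 100 fuses.

    from math import comb

    def hyper_pmf(x, N, K, n):
        """P(X = x) when n objects are sampled from N, K of which are 'successes'."""
        return comb(K, x) * comb(N - K, n - x) / comb(N, n)

    def expected_lot_cost_k(defect_rate, fuse_price_k):
        """Average cost (k$) per 100-missile lot under the inferred cost model."""
        N, n = 100, 5
        K = N - round(defect_rate * N)            # fuses that work
        cost = fuse_price_k * N                   # parts cost for the lot
        for x in range(n + 1):                    # x = test missiles that work
            fails = n - x
            cost += hyper_pmf(x, N, K, n) * (fails * 250 if fails < 3 else 100 * 250)
        return cost

    print(round(expected_lot_cost_k(0.03, 5)))    # Vendor A: ~539
    print(round(expected_lot_cost_k(0.05, 4)))    # Vendor B: ~477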
Discrete RVs - Statistical Average
• The average cost was $4\sum_{x=10}^{\infty} x\, f_X[x]$ in the negative binomial example.
• The summation computes the average number of missiles that
had to be fired.
• From session one, recall the population mean given by
  $\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$  or  $\mu = \frac{1}{N}\sum_{i=1}^{M} N_i x_i = \sum_{i=1}^{M} x_i \frac{N_i}{N} = \sum_{i=1}^{M} x_i f_i$
• Thus, we now have a third definition of the population mean, using the PDF:
  $\mu = \sum_{i=1}^{M} x_i f_X[x_i]$
Discrete RVs - Statistical Average (cont.)
• The population variance can be similarly defined:
  $\sigma^2 = \sum_{i=1}^{M} (x_i - \mu)^2 f_X[x_i] = \sum_{i=1}^{M} x_i^2 f_X[x_i] - \mu^2$
• Do you see the pattern?
  $\mu = E\{X\}$ and $\sigma^2 = E\{(X - \mu)^2\}$
  where
  $E\{g(X)\} = \sum_{i=1}^{M} g(x_i) f_X[x_i]$
• The operator E{} is the statistical average of g(X).
• The statistical average E{} is more commonly known as the
expected value.
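A sketch of the expected-value operator as a sum over the PDF, applied to the fair die; it reproduces the die mean and variance worked on the following slides.

    def expect(g, pmf):
        """E{g(X)} = sum over i of g(x_i) * f_X[x_i], with the PDF given as a dict."""
        return sum(g(x) * p for x, p in pmf.items())

    die = {x: 1 / 6 for x in range(1, 7)}
    mu = expect(lambda x: x, die)                    # population mean
    var = expect(lambda x: (x - mu) ** 2, die)       # population variance
    print(mu, var)                                    # 3.5  ~2.9167 (= 35/12)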
Discrete RVs - Statistical Average (cont.)
• The expected value of a constant is itself, i.e., E{c} = c.
• Expected value is a linear operator, that is, the principle of
superposition applies:
  $E\{c_1 g_1(X) + c_2 g_2(X)\} = c_1 E\{g_1(X)\} + c_2 E\{g_2(X)\}$
• For example,
  $\sigma_x^2 = E\{(X - \mu)^2\}$
  $\quad = E\{X^2 - 2\mu X + \mu^2\}$
  $\quad = E\{X^2\} - 2\mu E\{X\} + \mu^2$
  $\quad = E\{X^2\} - \mu^2$
  $\quad = \sum_{i=1}^{M} x_i^2 f_X[x_i] - \left(\sum_{i=1}^{M} x_i f_X[x_i]\right)^2$
Discrete RVs – Mean Examples
• Mean of a die:
  $\mu = 1(1/6) + 2(1/6) + 3(1/6) + 4(1/6) + 5(1/6) + 6(1/6) = 21/6 = 7/2 = 3.5$
• Mean of a 3-run Monte Carlo simulation with Ph = 80%:
  $\mu = 0\,P(0\text{ hits}) + 1\,P(1\text{ hit}) + 2\,P(2\text{ hits}) + 3\,P(3\text{ hits})$
  $\quad = 0(0.2)^3 + 1(0.2)^2(0.8)(3) + 2(0.2)(0.8)^2(3) + 3(0.8)^3$
  $\quad = 0 + 0.096 + 0.768 + 1.536 = 2.4$
  Compare with the formula $\mu = Np$.
Discrete RVs - Variance Examples
• Die variance:
  $\sigma^2 = (1^2 + 2^2 + 3^2 + 4^2 + 5^2 + 6^2)(1/6) - (7/2)^2 = 91/6 - 49/4 = 35/12 \approx 2.92$
• Variance of a 3-run Monte Carlo simulation with Ph = 80%:
  $\sigma^2 = 0^2(0.008) + 1^2(0.096) + 2^2(0.384) + 3^2(0.512) - 2.4^2$
  $\quad = 0 + 0.096 + 1.536 + 4.608 - 5.760 = 0.48$
  Compare this with the formula $\sigma^2 = Np(1-p)$.
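The same calculation for the 3-run Monte Carlo, as a sketch, checking the PDF-based mean and variance against the closed forms Np and Np(1 − p):

    def expect(g, pmf):
        """E{g(X)} = sum of g(x) * f_X[x] over the PDF."""
        return sum(g(x) * p for x, p in pmf.items())

    N, p = 3, 0.8
    mc = {0: 0.008, 1: 0.096, 2: 0.384, 3: 0.512}    # binomial PDF from above
    mu = expect(lambda x: x, mc)
    var = expect(lambda x: x * x, mc) - mu ** 2       # shortcut E{X^2} - mu^2
    print(mu, var, N * p, N * p * (1 - p))            # mean 2.4, variance 0.48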
Bernoulli Trial Revisited
• The mean of a simple binomial RV (a single Bernoulli trial) is
  $\mu = \sum_{i=1}^{2} x_i f_X[x_i] = 0(1-p) + 1(p) = p$
  and the variance is
  $\sigma^2 = \sum_{i=1}^{2} x_i^2 f_X[x_i] - \mu_x^2 = 0^2(1-p) + 1^2(p) - p^2 = p(1-p)$
• The mean of the sum of N Bernoulli trials (a binomial RV) is $\mu = Np$.
• The variance of the sum of N Bernoulli trials is $\sigma^2 = Np(1-p)$.
• How do you show
  $\mu = \sum_{i} x_i f_X[x_i] = \sum_{i} x_i \frac{N!}{(N-x_i)!\,x_i!}\, p^{x_i} (1-p)^{N-x_i} = Np$ ?
Bernoulli Trial Revisited (cont.)
Poisson Random Variable
• Imagine a painted piece of sheet metal divided up into “very, very, very small” regions and the following hold true
– There is at most one defect in the paint of each region,
– The probability of a defect in a particular region is
proportional to the size (in this case, area) of the region,
– The regions are mutually statistically independent.
• Then the total number of defects in the paint is
  $X = B_1 + B_2 + \cdots = \sum_{i} B_i$  (summing over all of the regions)
• Let the average number of flaws be $E\{X\} = \lambda$.
• Then X is a Poisson random variable.
Poisson PDF
• The PDF of a Poisson random variable is
  $f_X[x] = \frac{e^{-\lambda} \lambda^x}{x!}$, where $x \ge 0$.
• Poisson random variables have the interesting property that the mean and variance are equal, $\mu = \sigma^2 = \lambda$.
• Examples:
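A closing sketch of the Poisson density and its equal mean and variance, for an assumed flaw rate of λ = 2.5; the infinite sums are truncated where the remaining tail is negligible.

    from math import exp, factorial

    def poisson_pmf(x, lam):
        """Poisson density: exp(-lam) * lam**x / x!."""
        return exp(-lam) * lam**x / factorial(x)

    lam = 2.5                                   # assumed average number of flaws
    xs = range(60)                              # tail beyond 60 is negligible for lam = 2.5
    mean = sum(x * poisson_pmf(x, lam) for x in xs)
    var = sum(x * x * poisson_pmf(x, lam) for x in xs) - mean ** 2
    print(round(mean, 6), round(var, 6))        # both ~2.5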