S2P3 - Lyle School of Engineering


EMIS 7300
SYSTEMS ANALYSIS METHODS
Spring 2006
Dr. John Lipp
Copyright © 2002 - 2006 John Lipp
Today’s Topics
• Discrete Random Variables
  – Probability Density Functions (PDFs).
  – Cumulative Distribution Functions (CDFs).
  – Discrete Uniform Random Variables.
  – Binomial Random Variables.
  – Geometric Random Variables.
  – Negative Binomial Random Variables.
  – Hypergeometric Random Variables.
  – Statistical Average / Expected Value.
  – Mean and Variance Examples.
  – Poisson Random Variables.
  – Bivariate Random Variables.
Frequency, Relative Frequency, and the Histogram
• A common way to show the distribution of data values is by
tabulating the number of occurrences within data sub-ranges.
– The data sub-ranges are commonly referred to as bins.
– The number of data occurrences within a particular bin is
referred to as the frequency.
– Frequency vs. bin is known as a frequency distribution.
When plotted, often as a bar chart, the result is called a
histogram.
Frequency, Relative Frequency, and the Histogram (cont.)
• Consider the n = 3 weight sample mean data, tabulated by bin:

  [Table with columns: Bin Sub-Range, Bin "Label", Frequency, Cumulative Frequency, Relative Frequency, Cumulative Relative Frequency]
Sample Statistics (cont.)
[Figure: the weight sample data plotted over the range 98 to 102]
Frequency, Relative Frequency, and the Histogram (cont.)
• Histogram procedural suggestions
– A good rule of thumb for selecting the number of bins is to use an integer close to the square root of the data set size.
– Bins should be sized so that at least 80% of them contain 5 or more counts.
  Combine the outermost bins in preference to the inner bins.
  Statistical outliers may sometimes be excluded from the outer bins, such as when they appear to distort the results.
Frequency, Relative Frequency, and the Histogram (cont.)
• Histogram procedural suggestions
– A good choice for the center bin of the histogram is near the mean, median, or mode.
– Use the standard deviation, range, or quartiles as a guide to bin size.
  For example, if the number of data points is 30, the square-root rule suggests using around 6 bins.
  Choose bin 4 to be centered at the mean, with the other bins spaced one standard deviation apart (as sketched in the code below).
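A rough illustration of these rules of thumb (my own sketch, not from the slides; the function names, the rounding of the square root, and the handling of out-of-range points are all assumptions):

    import math
    import statistics

    def suggest_bin_edges(data):
        """Roughly sqrt(n) bins, one standard deviation wide, centered near the mean."""
        n = len(data)
        k = max(1, round(math.sqrt(n)))      # n = 30 gives k = 5; the slide rounds up to ~6
        mu = statistics.mean(data)
        sigma = statistics.stdev(data)
        lo = mu - (k / 2) * sigma            # edges placed symmetrically about the mean
        return [lo + i * sigma for i in range(k + 1)]

    def frequencies(data, edges):
        """Count observations per bin; points outside the edges are simply skipped here
        (in practice they would be merged into the outermost bins, per the rule above)."""
        counts = [0] * (len(edges) - 1)
        for x in data:
            for i in range(len(counts)):
                if edges[i] <= x < edges[i + 1] or (i == len(counts) - 1 and x == edges[-1]):
                    counts[i] += 1
                    break
        return counts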
Random Variables
“When you can express something in numbers,
you know something about it.”
- Albert Einstein
• Engineering is primarily concerned with analyzing numbers,
not events. This is facilitated by defining functions that map
outcomes (and events) from a sample space to the real number
line.
• Such functions are referred to as random variables, even though they are not really variables.
• A random variable can then be operated on with appropriate
extensions to algebra and calculus.
Discrete Random Variables
• When the elementary outcomes of a random experiment are numbers, the result is a discrete random variable.
  [Figure: elementary outcomes A1 through A6 mapped to the numbers 1 through 6]
– The numeric outcomes are typically, but not required to be,
integers.
– The number of elementary outcomes can be infinite.
Discrete Random Variables (cont.)
• Denote a random variable as X and the possible elementary
outcomes as xi .
– The sample space S = {x1, x2, …, xM}.
– The probabilities of the elementary outcomes are known as the discrete probability density function (PDF) of X.
– The PDF is written as a function f_X[x_i] = P(X = x_i).
– The PDF is also known as the probability mass function.
Discrete Random Variables (cont.)
Examples:
• Die (6-sided): f_X[x] = 1/6 for x in {1, 2, 3, 4, 5, 6}.
• Cards: f_X[x] = 1/52 for x in {1, …, 52}.
• Photons counted by an optical detector, X ∈ {0, 1, 2, …}.
• Weapon system computer simulation Monte Carlo score, i.e., the number of hits, X ∈ {0, 1, 2, …, N}.
• Digital volt meter measurement. The range of X is the maximum scale divided equally by the number of counts (typically 200, 2000, or 4000).
Discrete Random Variables (cont.)
• Probability Axiom 1:
  P(S) = 1 \;\Rightarrow\; \sum_{i=1}^{M} f_X[x_i] = 1
• Probability Axiom 2:
  0 \le P(E) \le 1 \;\Rightarrow\; 0 \le f_X[x] \le 1
• Probability Axiom 3 (the outcomes x_i and x_j are disjoint):
  P\big((X = x_i) \cup (X = x_j)\big) = P(X = x_i) + P(X = x_j) = f_X[x_i] + f_X[x_j]
Discrete Random Variables (cont.)
• Since all elementary outcomes are disjoint,
  P(x_i \le X \le x_j) = f_X[x_i] + \dots + f_X[x_j] = \sum_{k=i}^{j} f_X[x_k]
  (the summation symbol is shorthand for the sum of consecutive terms).
• P(X \le x_i) is known as the cumulative distribution function (CDF) of X and is denoted F_X[x],
  F_X[x_i] = \sum_{j=1}^{i} f_X[x_j]
Discrete Random Variables (cont.)
• The CDF is bounded, 0 ≤ F_X[x] ≤ 1,
  – F_X[−∞] = P(X ≤ −∞) = 0.
  – F_X[+∞] = P(X ≤ +∞) = 1.
• The CDF is monotone increasing: F_X[x] ≤ F_X[y] when x ≤ y.
• A discrete CDF is right-continuous and piecewise constant (a stair-step function).
[Figure: PDF and CDF of a fair die. The PDF is 1/6 at each of x = 1, …, 6; the CDF rises in steps of 1/6 from 1/6 to 1, illustrating right continuity.]
Discrete Uniform PDF
• The simplest discrete distribution is a discrete uniform
distribution
– The random variable X can take on one of N values in the ordered set {x_1, x_2, x_3, …, x_N}.
– PDF: f_X[x_i] = P(X = x_i) = 1/N for any x_i in {x_1, x_2, x_3, …, x_N}.
– CDF: F_X[x_i] = P(X ≤ x_i) = i/N.
– Note that the values of X do not have to be uniformly spaced; only the probability is uniformly distributed (a short illustration follows).
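A tiny illustration of that last point (the values {1, 2, 5, 10} are my own example, not from the slides): the spacing of the values is irregular, yet the PDF is still 1/N and the CDF is still i/N.

    from fractions import Fraction

    values = [1, 2, 5, 10]                                        # x_1 < x_2 < x_3 < x_4, N = 4
    N = len(values)
    pdf = {x: Fraction(1, N) for x in values}                     # f_X[x_i] = 1/N
    cdf = {x: Fraction(i + 1, N) for i, x in enumerate(values)}   # F_X[x_i] = i/N
    print(cdf[5])                                                 # P(X <= 5) = 3/4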
Binomial Random Variables
• A computer Monte Carlo simulation is a frequently used tool to assess system performance measures such as P_h (probability of hit).
  – The simulation is randomly initialized and executed N times (N replicates).
  – At the end of each run a performance metric, e.g., miss distance, is compared to a requirement.
  – A run that passes the requirement is a "hit" and scores 1; a miss scores 0.
  – Together the runs yield X successes out of N possible, e.g., 11 out of 25.
  – The result is often written as a percentage, e.g., 44% P_h.
Binomial Random Variables (cont.)
• Now imagine this scenario (or is it a true story?)
– You work in a missile program that routinely measures
performance with a set of about 50 Monte Carlos.
– Each Monte Carlo contains 25 runs.
– Your customer just discovered the cost benefits of LINUX
and has a SETA contractor porting the simulation to that
environment.
– The SETA contractor is very excited! One of the Monte
Carlos has changed significantly – from 24 hits out of 25
down to 16 hits out of 25 – a 40% reduction!!!
– Should you be excited as well?
Binomial Random Variables (cont.)
• Denote each run of the Monte Carlo as Bi, a discrete random
variable with boolean outcomes of {0,1}.
• Traditionally, a 1 represents “success” and a 0 represents
“failure,” but that is not required.
• Assign P(B_i = 1) = p, 0 ≤ p ≤ 1.
  – p is the probability of success.
  – Consequently, P(B_i = 0) = 1 − p.
• Let X be the sum of N independent boolean random variables with identical success rates p,
  X = B_1 + B_2 + \dots + B_N = \sum_{i=1}^{N} B_i
  X is called a binomial random variable.
Binomial Random Variables (cont.)
• A binomial random variable is often used as a model of
expected Monte Carlo performance.
• What is the PDF for an N = 3 Monte Carlo?

  x_i    f_X[x_i] = P(X = x_i)
  0      (1 − p)³
  1      3p(1 − p)²
  2      3p²(1 − p)
  3      p³
Binomial Random Variables (cont.)
• The density function (valid for x = 0, 1, …, N) is given by
  f_X[x] = \binom{N}{x} p^x (1-p)^{N-x} = \frac{N!}{(N-x)!\,x!}\, p^x (1-p)^{N-x}
  where \binom{N}{x} is the "nCr" combination count. This is known as the binomial density function (a numeric illustration follows below).
– N is the number of trials,
– x is the number of successful trials,
– p is the probability an individual trial is successful,
– All trials are independent, that is, one trial's outcome does not affect any others.
• This type of experiment is called a Bernoulli trial.
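A sketch (not from the slides) of how the binomial density might be used to revisit the LINUX-port scenario from a few slides back: if the true P_h really were 24/25 = 0.96, how likely is a score of 16 hits or fewer out of 25? The function name and the decision to sum up to 16 are my own.

    from math import comb

    def binomial_pdf(x, N, p):
        """f_X[x] = C(N, x) * p**x * (1 - p)**(N - x)."""
        return comb(N, x) * p**x * (1 - p)**(N - x)

    N, p = 25, 0.96
    p_low = sum(binomial_pdf(x, N, p) for x in range(17))    # P(X <= 16)
    print(f"P(X <= 16 | N = 25, p = 0.96) = {p_low:.1e}")    # on the order of 3e-7

A probability that small suggests the drop is not ordinary Monte Carlo scatter, which is presumably why the question is worth asking.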
Binomial Random Variables (cont.)
[Figure: binomial PDF f_X(x) for N = 25, p = 0.8, plotted for x = 0 to 25; the probabilities peak at about 0.2 near x = 20.]
Binomial Random Variables (cont.)
[Figure: binomial PDF f_X(x) for N = 25, p = 0.96, plotted for x = 0 to 25; the probabilities peak between 0.35 and 0.4 near x = 24.]
Geometric Random Variables
• Consider repeating a Bernoulli trial. How many trials are required until a successful outcome is achieved?
• That is, the PDF for a geometric random variable is
  f_X[x] = (1-p)^{x-1} p, \quad x = 1, 2, 3, \dots
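A two-line illustration of that PDF (p = 0.8 is my own choice of value, not from the slides):

    p = 0.8
    for x in range(1, 5):
        print(x, (1 - p)**(x - 1) * p)    # approx. 0.8, 0.16, 0.032, 0.0064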
Negative Binomial Random Variables
• You are negotiating a contract exit criterion with a customer
– The product is a munition costing $4 million per unit.
– The customer doesn’t pay for any munitions until the exit
criterion is reached.
– The proposed exit criterion is to test (shoot) munitions
until 10 are successful (not necessarily 10 in a row).
• Two scenarios are available
– Go with the product as is: 90% are good.
– Improve the product manufacturing. The estimated cost is
$2 million, and the result is 95% will be good.
• Should the $2 million be invested?
Negative Binomial Random Variables (cont.)
• The math on the costs is straightforward:
  Cost = ($4 million) × (# of missiles fired)
  The number of missiles fired is random, so really the cost is random too. Thus, consider the average (statistical) cost:
  Ave Cost = ($4 million) × (10·P(10 missiles fired) + 11·P(11 missiles fired) + …)
• The statistical question is how many missiles X must be fired before r successes are achieved (r = 10 in this example).
• X is said to have a negative binomial distribution. The negative binomial distribution is a generalization of the geometric distribution (r = 1).
Negative Binomial Random Variables (cont.)
• Assume each shot is an independent Bernoulli trial with success probability p.
• What does the PDF look like?
  – P(9 missiles fired) = 0 (fewer than 10 shots cannot produce 10 successes)
  – P(10 missiles fired) = p^{10}
  – P(11 missiles fired) = \binom{10}{9} p^{10} (1-p) = 10 p^{10} (1-p)
  – P(12 missiles fired) = \binom{11}{9} p^{10} (1-p)^2 = 55 p^{10} (1-p)^2
Negative Binomial Random Variables (cont.)
• The negative binomial PDF is
  f_X[x] = \binom{x-1}{r-1} (1-p)^{x-r} p^r = \frac{(x-1)!}{(x-r)!\,(r-1)!} (1-p)^{x-r} p^r
• The average cost (in millions) to reach the exit criterion is
  \text{Ave Cost} = 4 \sum_{x=10}^{\infty} x f_X[x] = 4 \sum_{x=10}^{\infty} x \binom{x-1}{9} (1-p)^{x-10} p^{10} = 4 \left( \frac{10}{p} \right)
• For p = 0.90 → $44.4 million; for p = 0.95 → $42.1 million.
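A sketch (not from the slides) that checks the two average-cost figures numerically; truncating the infinite sum at x = 500 and the function names are my own choices.

    from math import comb

    def neg_binomial_pdf(x, r, p):
        """f_X[x] = C(x-1, r-1) * (1-p)**(x-r) * p**r, for x = r, r+1, ..."""
        return comb(x - 1, r - 1) * (1 - p)**(x - r) * p**r

    def average_cost(p, r=10, unit_cost=4.0, x_max=500):
        # E{cost} = unit_cost * E{X}; the closed form is unit_cost * r / p
        return unit_cost * sum(x * neg_binomial_pdf(x, r, p) for x in range(r, x_max))

    print(average_cost(0.90))   # ~44.4 ($ million)
    print(average_cost(0.95))   # ~42.1 ($ million)

On these numbers the expected saving is roughly $2.3 million against a $2 million improvement cost, which appears to be the point of the exercise.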
Hypergeometric Random Variables
• You are in charge of selecting a vendor for the fuse in a new
missile design that expects to sell 10,000 or more units.
• The customer buys the missiles in lots of 100 and shoots 5
during lot acceptance testing.
– Each missile that fails is a lost sale of $250,000.
– If 3 or more missiles fail, the whole lot is rejected.
• Assuming the quality of the fuse is the dominant risk, which
of these vendors is the best choice?
– Vendor A: Fuses are $5k each and 3% are defective.
– Vendor B: Fuses are $4k each and 5% are defective.
Hypergeometric Random Variables (cont.)
• The appropriate probability distribution is the hypergeometric:
  – Set of N objects (N = 100 in the example).
  – K of the objects are classified "successful" (K = 97 or 95).
  – A sample of n objects is taken (n = 5).
  – The random variable is the number of successes, X.
  – The PDF of X is
    f_X[x] = \frac{\binom{K}{x}\binom{N-K}{n-x}}{\binom{N}{n}} = \frac{\dfrac{K!}{(K-x)!\,x!} \cdot \dfrac{(N-K)!}{(N-K-n+x)!\,(n-x)!}}{\dfrac{N!}{(N-n)!\,n!}}
Hypergeometric Random Variables (cont.)
                 Vendor A (3%)                 Vendor B (5%)
                 Probability   Ave Cost (k$)   Probability   Ave Cost (k$)
  Parts Cost     n/a           5 × 100 = 500   n/a           4 × 100 = 400
  5 Work         85.600%       0               76.959%       0
  4 Work         13.806%       34.516          21.142%       52.857
  3 Work          0.588%       2.938            1.838%       9.192
  2 Work          0.006%       1.546            0.059%       14.827
  1 Works        n/a           0                0.001%       0.158
  0 Work         n/a           0                0.000%       0.000
  TOTAL          n/a           539             n/a           477
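A sketch (mine, not from the slides) that reproduces the probability columns of the table above, assuming each lot of N = 100 fuses contains exactly 3 (Vendor A) or 5 (Vendor B) defective units and n = 5 are tested:

    from math import comb

    def hypergeom_pdf(x, N, K, n):
        """P(x of the n sampled objects are 'successes') with K successes among N objects."""
        return comb(K, x) * comb(N - K, n - x) / comb(N, n)

    N, n = 100, 5
    for label, K in (("Vendor A (3%)", 97), ("Vendor B (5%)", 95)):
        probs = [hypergeom_pdf(x, N, K, n) for x in range(n + 1)]
        # comb() returns 0 for impossible cases, matching the "n/a" rows in the table
        print(label, [f"{q:.5f}" for q in probs])   # P(0 work), ..., P(5 work)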
Discrete RVs - Statistical Average
• The average cost in the negative binomial example was
  \text{Ave Cost} = 4 \sum_{x=10}^{\infty} x f_X[x]
• The summation computes the average number of missiles that had to be fired.
• From session one, recall the population mean given by
  \mu = \frac{1}{N} \sum_{i=1}^{N} x_i \quad\text{or}\quad \mu = \frac{1}{N} \sum_{i=1}^{M} N_i x_i = \sum_{i=1}^{M} x_i \frac{N_i}{N} = \sum_{i=1}^{M} x_i f_i
  (where f_i = N_i / N is the relative frequency).
• Thus, we now have a third definition of the population mean, using the PDF:
  \mu = \sum_{i=1}^{M} x_i f_X[x_i]
Discrete RVs - Statistical Average (cont.)
• The population variance can be similarly defined:
  \sigma^2 = \sum_{i=1}^{M} (x_i - \mu)^2 f_X[x_i] = \sum_{i=1}^{M} x_i^2 f_X[x_i] - \mu^2
• Do you see the pattern?
  \mu = E\{X\} \quad\text{and}\quad \sigma^2 = E\{(X - \mu)^2\}
  where
  E\{g(X)\} = \sum_{i=1}^{M} g(x_i) f_X[x_i]
• The operator E{·} is the statistical average of its argument g(X).
• The statistical average E{·} is more commonly known as the expected value.
Discrete RVs - Statistical Average (cont.)
• The expected value of a constant is itself, i.e., E{c} = c.
• Expected value is a linear operator, that is, the principle of
superposition applies:
Ec1 g1 ( X )  c2 g 2 ( X )  c1Eg1 ( X )  c2 Eg 2 ( X )
• For example,
  \sigma_X^2 = E\{(X - \mu)^2\}
             = E\{X^2 - 2\mu X + \mu^2\}
             = E\{X^2\} - 2\mu E\{X\} + \mu^2
             = E\{X^2\} - \mu^2
             = \sum_{i=1}^{M} x_i^2 f_X[x_i] - \left( \sum_{i=1}^{M} x_i f_X[x_i] \right)^2
Discrete RVs – Mean Examples
• Mean of a die:
  μ = 1(1/6) + 2(1/6) + 3(1/6) + 4(1/6) + 5(1/6) + 6(1/6) = 21/6 = 7/2 = 3.5
• Mean of a 3-run Monte Carlo simulation with P_h = 80%:
  μ = 0·P(0 hits) + 1·P(1 hit) + 2·P(2 hits) + 3·P(3 hits)
    = 0(0.2)³ + 1(0.2)²(0.8)(3) + 2(0.2)(0.8)²(3) + 3(0.8)³
    = 0 + 0.096 + 0.768 + 1.536
    = 2.4
  Compare with the formula μ = Np.
Discrete RVs - Variance Examples
• Die variance:
  σ² = (1² + 2² + 3² + 4² + 5² + 6²)(1/6) − (7/2)²
     = (91/6) − (49/4) = 35/12 ≈ 2.92
• Variance of a 3-run Monte Carlo simulation with P_h = 80%:
  σ² = 0²(0.008) + 1²(0.096) + 2²(0.384) + 3²(0.512) − 2.4²
     = 0 + 0.096 + 1.536 + 4.608 − 5.760
     = 0.48
  Compare this with the formula σ² = Np(1 − p).
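A short sketch (not from the slides) that computes μ = E{X} and σ² = E{X²} − μ² directly from a PDF and reproduces both of these examples:

    from math import comb

    def mean_var(pdf):                       # pdf: dict mapping x to f_X[x]
        mu = sum(x * f for x, f in pdf.items())
        var = sum(x**2 * f for x, f in pdf.items()) - mu**2
        return mu, var

    die = {x: 1 / 6 for x in range(1, 7)}
    print(mean_var(die))                     # (3.5, ~2.9167) = (7/2, 35/12)

    N, p = 3, 0.8
    binom = {x: comb(N, x) * p**x * (1 - p)**(N - x) for x in range(N + 1)}
    print(mean_var(binom))                   # (~2.4, ~0.48) = (Np, Np(1 - p))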
Bernoulli Trial Revisited
• The mean of a single Bernoulli random variable is
  \mu = \sum_{i=1}^{2} x_i f_X[x_i] = 0(1-p) + 1(p) = p
  and the variance is
  \sigma^2 = \sum_{i=1}^{2} x_i^2 f_X[x_i] - \mu^2 = 0^2(1-p) + 1^2(p) - p^2 = p(1-p)
• The mean of a binomial random variable (the sum of N Bernoulli trials) is \mu = Np.
• The variance of a binomial random variable is \sigma^2 = Np(1-p).
• How do you show
  \mu = \sum_{x=0}^{N} x f_X[x] = \sum_{x=0}^{N} x \frac{N!}{(N-x)!\,x!} p^x (1-p)^{N-x} = Np \,?
Bernoulli Trial Revisited (cont.)
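One way to carry out the derivation asked for on the previous slide (a sketch of the algebra, not necessarily the steps used in class) uses the identity x\binom{N}{x} = N\binom{N-1}{x-1}:

\begin{align*}
\mu &= \sum_{x=0}^{N} x \binom{N}{x} p^x (1-p)^{N-x}
     = \sum_{x=1}^{N} N \binom{N-1}{x-1} p^x (1-p)^{N-x} \\
    &= Np \sum_{x=1}^{N} \binom{N-1}{x-1} p^{x-1} (1-p)^{(N-1)-(x-1)}
     = Np \sum_{y=0}^{N-1} \binom{N-1}{y} p^{y} (1-p)^{(N-1)-y} = Np,
\end{align*}

since the final sum is a binomial density summed over all of its values and therefore equals one. A similar manipulation applied to E{X(X−1)} gives the variance Np(1−p).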
Poisson Random Variable
• Imagine a painted piece of sheet metal divided up into "very, very, very small" regions such that the following hold true:
– There is at most one defect in the paint of each region,
– The probability of a defect in a particular region is
proportional to the size (in this case, area) of the region,
– The regions are mutually statistically independent.
• Then the total number of defects in the paint is
  X = B_1 + B_2 + \cdots = \sum_{i=1}^{\infty} B_i
• Let the average number of flaws be E{X} = λ.
• Then X is a Poisson random variable.
Poisson PDF
• The PDF of a Poisson random variable is
  f_X[x] = \frac{e^{-\lambda} \lambda^x}{x!}, \quad x = 0, 1, 2, \dots
• Poisson random variables have the interesting property that the mean and variance are equal, μ = σ² = λ.
• Examples:
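For instance (λ = 2 is an illustrative value of my own, not from the slides), the probabilities of seeing x paint defects:

    from math import exp, factorial

    lam = 2.0
    for x in range(4):
        print(x, exp(-lam) * lam**x / factorial(x))   # approx. 0.135, 0.271, 0.271, 0.180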