
STATISTICAL MODELS IN SIMULATION
How are probability and variation related to the modelling of system performance?
Random variable: any system variable, X, that can take different values
- continuous or discrete
- i.e. X has a range of values
17/07/2015
ENGN8101 Modelling and Optimization
1
How are the values within this range distributed?
- Described using probability functions
Probability functions: used to define the probabilities of events associated with a random variable
Often mathematical functions, or graphical in nature
Discrete variables
X is described by a probability mass function P(x), not P(X). Why?
Because x is a particular value that X can take (x ∈ X)
So P(x) = P(X = x)
Cumulative distribution function:
F(x) = P(X ≤ x)
F(x) is a step function, bounded by 0 and 1
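A minimal Python sketch of the PMF/CDF relationship above, assuming the PMF is stored as a dictionary mapping each value x to P(x). The function name `pmf_to_cdf` and the fair-die example are illustrative, not from the slides; exact fractions are used so the step values of F(x) print cleanly.

```python
from fractions import Fraction

def pmf_to_cdf(pmf):
    """Build F(x) = P(X <= x) from a PMF given as {value: probability}."""
    support = sorted(pmf)

    def F(x):
        # Add up P(v) for every support point v that does not exceed x;
        # F stays flat between support points, which gives the step shape.
        return sum(pmf[v] for v in support if v <= x)

    return F

# Hypothetical example: a fair six-sided die, P(x) = 1/6 for x = 1..6.
die = {k: Fraction(1, 6) for k in range(1, 7)}
F = pmf_to_cdf(die)
print(F(0), F(3), F(3.5), F(6))  # 0 1/2 1/2 1
```

F(3) and F(3.5) are equal because no probability mass lies between 3 and 3.5, and F never leaves the interval [0, 1].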
Continuous variables
Here we need a probability density function (pdf), f(x):
$f(x) = \lim_{dx \to 0} \frac{P(x \le X \le x + dx)}{dx}$
The cumulative distribution function is then
$F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt$
More generally:
$\int_{-\infty}^{\infty} f(x)\,dx = 1$
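The integral definition of F(x) can be checked numerically. This is a sketch using the trapezoidal rule, assuming f is zero below the point `lower` (so the integral from −∞ reduces to one starting at `lower`); the function name and the example pdf f(x) = 2x on [0, 1] are hypothetical.

```python
def cdf_from_pdf(f, lower, x, steps=10_000):
    """Approximate F(x), the integral of f from lower to x, with the trapezoidal rule.
    Assumes f(t) = 0 for t < lower."""
    if x <= lower:
        return 0.0
    h = (x - lower) / steps
    total = 0.5 * (f(lower) + f(x))
    for i in range(1, steps):
        total += f(lower + i * h)
    return total * h

# Hypothetical pdf: f(x) = 2x on [0, 1], zero elsewhere.
def f(x):
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0

print(round(cdf_from_pdf(f, 0.0, 1.0), 6))  # total area under the pdf: 1.0
print(round(cdf_from_pdf(f, 0.0, 0.5), 6))  # F(0.5) = 0.25
```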
[Figure: Probability Density Function]
Descriptive parameters
Mean value = Expected value
Discrete: $\mu = E(X) = \sum_{\text{all } i} x_i P(x_i)$
Continuous: $\mu = E(X) = \int_{-\infty}^{\infty} x f(x)\,dx$
Variance:
Discrete: $\sigma^2 = E[(X-\mu)^2] = \sum_{\text{all } x_i} (x_i-\mu)^2 P(x_i)$
Continuous: $\sigma^2 = E[(X-\mu)^2] = \int_{-\infty}^{\infty} (x-\mu)^2 f(x)\,dx$
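The discrete formulas translate directly into sums. A short sketch, assuming the same {value: probability} dictionary representation; the function name and the die example are illustrative only.

```python
from fractions import Fraction

def discrete_mean_var(pmf):
    """Mean and variance of a discrete random variable given as {value: probability}."""
    mu = sum(x * p for x, p in pmf.items())                  # sum of x_i P(x_i)
    var = sum((x - mu) ** 2 * p for x, p in pmf.items())     # sum of (x_i - mu)^2 P(x_i)
    return mu, var

# Hypothetical example: a fair six-sided die.
die = {k: Fraction(1, 6) for k in range(1, 7)}
mu, var = discrete_mean_var(die)
print(mu, var)  # 7/2 35/12
```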
Median ($x_m$): $P(X > x_m) = 0.50$
Mode: occurs where the density function has its peak
Skewness: $\theta = E[(X-\mu)^3]/\sigma^3$
What would the 4th moment describe?
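For a discrete variable the median, mode and skewness can be computed from the PMF. A sketch under stated assumptions: the median uses the convention "smallest x with F(x) ≥ 0.5" (the condition P(X > x_m) = 0.50 is only guaranteed to be met exactly for continuous variables), and the example PMF and the name `describe` are hypothetical.

```python
import math
from fractions import Fraction

def describe(pmf):
    """Median, mode and skewness of a discrete PMF given as {value: probability}."""
    support = sorted(pmf)

    # Median: smallest value whose cumulative probability reaches 0.5
    # (one common convention for discrete variables).
    median, cum = support[-1], Fraction(0)
    for x in support:
        cum += pmf[x]
        if cum >= Fraction(1, 2):
            median = x
            break

    # Mode: the value at which the PMF peaks.
    mode = max(support, key=lambda x: pmf[x])

    # Skewness: third central moment divided by sigma cubed.
    mu = sum(x * p for x, p in pmf.items())
    var = sum((x - mu) ** 2 * p for x, p in pmf.items())
    third = sum((x - mu) ** 3 * p for x, p in pmf.items())
    skew = float(third) / math.sqrt(var) ** 3
    return median, mode, skew

# Hypothetical right-skewed PMF.
pmf = {0: Fraction(2, 5), 1: Fraction(7, 20), 2: Fraction(1, 4)}
print(describe(pmf))
```

A positive skewness indicates a longer tail to the right of the mean.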
CASE STUDY 8
The number of signals arriving at a satellite monitoring system in any hour is defined
by a random variable X. The probability mass function of X is believed to be
$p(x) = \frac{c}{x^{2}+1}$ for x = 0, 1, 2, 3 and 4
$p(x) = 0$ elsewhere
a) compute the constant c
b) compute the mean and standard deviation of X
c) compute the probability that the number of signals detected by the system in
any hour is less than or equal to 2
a) Summed over all x, ΣP(x) = 1
So
$c\left(\frac{1}{0+1} + \frac{1}{1+1} + \frac{1}{4+1} + \frac{1}{9+1} + \frac{1}{16+1}\right) = 1$
Giving c = 0.538
b) Mean:
$\mu = \sum_{\text{all } x_i} x_i P(x_i) = c\left(\frac{0}{0+1} + \frac{1}{1+1} + \frac{2}{4+1} + \frac{3}{9+1} + \frac{4}{16+1}\right) = 0.772$
Similarly, $\sigma^2 = E(X^2) - \mu^2 = 1.690 - 0.772^2 = 1.094$, giving σ = 1.05
c) P(X≤2) desired = P(X=0) + P(X=1) + P(X=2)
= 0.538(1 + 0.5 + 0.2) = 0.915
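The hand calculation for Case Study 8 can be reproduced exactly with fractions. This sketch simply encodes the given PMF p(x) = c/(x² + 1) on x = 0..4; the variable names are illustrative.

```python
import math
from fractions import Fraction

support = range(5)                                        # x = 0, 1, 2, 3, 4
weights = {x: Fraction(1, x**2 + 1) for x in support}     # p(x) = c / (x^2 + 1)

c = 1 / sum(weights.values())                             # a) normalising constant
pmf = {x: c * w for x, w in weights.items()}

mean = sum(x * p for x, p in pmf.items())                 # b) mean
var = sum(x**2 * p for x, p in pmf.items()) - mean**2     #    E(X^2) - mu^2
sd = math.sqrt(var)
p_le_2 = sum(pmf[x] for x in support if x <= 2)           # c) P(X <= 2)

print(f"c = {float(c):.3f}, mean = {float(mean):.3f}, var = {float(var):.3f}, "
      f"sd = {sd:.2f}, P(X<=2) = {float(p_le_2):.3f}")
```

The exact values are c = 85/158 and μ = 61/79, which round to the slide's 0.538 and 0.772.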
Common probabilistic functions
A random variable often takes values that follow a probability ‘trend’
If the values follow a recognisable numerical pattern, they can be modelled easily using standard distribution functions
DISCRETE          CONTINUOUS
Hypergeometric    Gaussian
Binomial          Exponential
Poisson           Weibull
HYPERGEOMETRIC
Applicable to sampling a population without subsequent replacement of the sample
For D non-conforming items in a population of size N, the probability of getting x non-conforming items in a sample of size n is
$P(x) = \frac{\binom{D}{x}\binom{N-D}{n-x}}{\binom{N}{n}}$, where $\binom{D}{x} = \frac{D!}{x!(D-x)!}$
Mean $= \frac{nD}{N}$
Variance $= \frac{nD}{N}\left(1 - \frac{D}{N}\right)\left(\frac{N-n}{N-1}\right)$
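The mean and variance formulas can be confirmed by summing over the whole PMF. A sketch using Python's `math.comb` and exact fractions; the parameter values are the batch sizes from Case Study 9, and the function name is illustrative.

```python
from fractions import Fraction
from math import comb

def hypergeom_pmf(x, N, D, n):
    """P(x non-conforming items in a sample of size n drawn without replacement
    from N items, D of which are non-conforming)."""
    return Fraction(comb(D, x) * comb(N - D, n - x), comb(N, n))

N, D, n = 20, 5, 4                     # batch sizes from Case Study 9
xs = range(0, min(n, D) + 1)           # comb() returns 0 for impossible counts
mean = sum(x * hypergeom_pmf(x, N, D, n) for x in xs)
var = sum((x - mean) ** 2 * hypergeom_pmf(x, N, D, n) for x in xs)

# Compare the direct sums with the closed-form mean and variance.
print(mean == Fraction(n * D, N))
print(var == Fraction(n * D, N) * (1 - Fraction(D, N)) * Fraction(N - n, N - 1))
```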
CASE STUDY 9
A batch of 20 transistors is known to contain 5 non-conforming ones. If an inspector
randomly samples 4 items, find the probability of picking out 3 non-conforming ones
Here, N = 20, D = 5, n = 4, x = 3
Probability of 3 non-conformers:
$P(3) = \frac{\binom{5}{3}\binom{15}{1}}{\binom{20}{4}} = 0.031$
BINOMIAL
Series of independent trials: each trial gives ‘yes’ or ‘no’
Probability of success = p = constant for any trial
Probability of x successes in n trials:
$P(x) = \binom{n}{x} p^{x}(1-p)^{n-x}$
Mean = np, variance = np(1−p)
Uses:
Sampling without replacement from large populations, or
Sampling with replacement from small populations
As N → ∞ (with p = D/N held fixed), Hypergeometric → Binomial
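The limit Hypergeometric → Binomial can be seen numerically by fixing the non-conforming fraction D/N and letting N grow. A sketch assuming D/N = 0.25 (the 5-in-20 ratio of Case Study 9), n = 4 and x = 3; these choices are illustrative.

```python
from math import comb

def binom_pmf(x, n, p):
    """P(x successes in n independent trials with success probability p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def hypergeom_pmf(x, N, D, n):
    """P(x non-conforming in a sample of n from N items, D non-conforming)."""
    return comb(D, x) * comb(N - D, n - x) / comb(N, n)

# Keep the non-conforming fraction D/N = 0.25 fixed and let N grow.
n, x, frac = 4, 3, 0.25
for N in (20, 200, 2000, 20000):
    D = int(frac * N)
    print(N, round(hypergeom_pmf(x, N, D, n), 5), round(binom_pmf(x, n, frac), 5))
```

The hypergeometric column approaches the binomial value (0.046875) as N increases.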
CASE STUDY 10
A signal filtering device is known to be 95% successful. If a random sample of 5
filtered signals is chosen, find the probability that 2 of them have not been correctly
processed.
Here, n = 5, x = 2 and p = 0.05
(if success is defined as finding a dud filtered signal)
$P(X = 2) = \binom{5}{2}(0.05)^{2}(0.95)^{3} = 0.021$
Additionally, the mean and variance of the distribution are:
$\mu = np = 5(0.05) = 0.25$ and $\sigma^2 = np(1-p) = 5(0.05)(0.95) = 0.2375$
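The Case Study 10 figures follow directly from the binomial formula; a short check with n = 5, p = 0.05 and x = 2 as in the solution.

```python
from math import comb

n, p, x = 5, 0.05, 2
prob = comb(n, x) * p**x * (1 - p) ** (n - x)   # 10 * 0.0025 * 0.857375
mean = n * p
var = n * p * (1 - p)
print(f"P(X=2) = {prob:.3f}, mean = {mean:.2f}, variance = {var:.4f}")
```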
POISSON RANDOM VARIABLE
- Named after Simeon D. Poisson (1781-1840)
- Originated as an approximation to the binomial distribution
- Used extensively in stochastic modelling
- Examples include: number of phone calls received, number of messages arriving at a sending node, number of radioactive disintegrations, number of misprints found on a printed page, number of defects found on a sheet of processed metal, blood cell counts, etc.
POISSON
Models the number of occurrences of an event over time, space or volume
Events are random and independent
Uses: number of non-conformities in a product
number of machine breakdowns per month
$p(x) = \frac{e^{-\lambda}\lambda^{x}}{x!}, \quad x = 0, 1, 2, \ldots$
where λ = average number of events over the specified time period
mean = variance = λ
If n → ∞ and p → 0 with np = λ held constant, then binomial → Poisson
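Both properties on this slide can be checked numerically: mean = variance = λ, and the binomial approaching the Poisson when n grows with np = λ fixed. A sketch assuming λ = 3 and x = 2 (illustrative values); the infinite sums are truncated at 60 terms, where the remaining tail is negligible for λ = 3.

```python
import math

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson variable with mean lam."""
    return math.exp(-lam) * lam**x / math.factorial(x)

def binom_pmf(x, n, p):
    """Binomial P(X = x) for n trials with success probability p."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)

lam, x = 3.0, 2

# mean = variance = lam (sums truncated at 60 terms)
ks = range(60)
mean = sum(k * poisson_pmf(k, lam) for k in ks)
var = sum((k - mean) ** 2 * poisson_pmf(k, lam) for k in ks)
print(round(mean, 6), round(var, 6))

# Binomial -> Poisson: hold n * p = lam fixed while n grows and p shrinks.
for n in (10, 100, 1000, 10000):
    print(n, round(binom_pmf(x, n, lam / n), 5), round(poisson_pmf(x, lam), 5))
```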
CASE STUDY 11
It is estimated that the average number of surface defects in 20 m² of paper produced by a process is 3. What is the probability of finding no more than 2 defects in 40 m² of paper through random selection?
Here, 1 unit is now 40 m², so λ is now 6
We need: P(X≤2) = P(X=0) + P(X=1) + P(X=2)
$P(X \le 2) = \frac{e^{-6}6^{0}}{0!} + \frac{e^{-6}6^{1}}{1!} + \frac{e^{-6}6^{2}}{2!} = 0.062$
The mean and variance of this distribution are both 6
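The same sum in code, scaling the rate of 3 defects per 20 m² to a 40 m² sample before applying the Poisson formula:

```python
import math

lam = 3 * (40 / 20)   # 3 defects per 20 m^2, scaled to a 40 m^2 sample -> 6.0
p = sum(math.exp(-lam) * lam**k / math.factorial(k) for k in range(3))   # k = 0, 1, 2
print(round(p, 3))  # 0.062
```

Algebraically this is 25e⁻⁶, since 1 + 6 + 36/2 = 25.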
GAUSSIAN (normal)
- The most important continuous distribution in probability and statistics
- The story of the normal distribution is, in many ways, the story of the development of statistics as a science
- Gauss arrived at it while developing the method of least squares for reducing the errors in fitting curves to astronomical observations
GAUSSIAN (normal)
Most widely used distribution for continuous random variables
(“Nature’s Distribution”)
For a population with mean μ and variance σ², the probability density function is
$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left[-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right], \quad -\infty < x < \infty$
i.e. an open-ended bell curve
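A sketch that evaluates the Gaussian pdf and confirms numerically that the bell curve encloses unit area and is symmetric about μ. The values μ = 100 and σ = 2 are borrowed from Case Study 12 for illustration, and the integration range μ ± 10σ stands in for (−∞, ∞), an approximation whose neglected tails are far below floating-point precision.

```python
import math

def normal_pdf(x, mu, sigma):
    """Gaussian density with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

def area(f, a, b, steps=20_000):
    """Trapezoidal-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * (0.5 * f(a) + 0.5 * f(b) + sum(f(a + i * h) for i in range(1, steps)))

mu, sigma = 100.0, 2.0   # illustrative values (the machined-part example)
total = area(lambda x: normal_pdf(x, mu, sigma), mu - 10 * sigma, mu + 10 * sigma)
print(round(total, 6))                                                  # should be 1.0
print(normal_pdf(mu + 1, mu, sigma) == normal_pdf(mu - 1, mu, sigma))   # symmetric about mu
```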
[Figure: Graphs of various normal PDFs]
Often standardized such that μ = 0 and σ² = 1
Here, Z = standardized random variable and
$f(z) = \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{z^{2}}{2}\right), \quad -\infty < z < \infty$
Alternatively
$\Phi(z) = P(Z \le z) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{u^{2}}{2}\right)du$
Note $\Phi(0) = 0.5$ and $\Phi(-z) = 1 - \Phi(z)$ due to symmetry
Impossible to document every σ/μ combination, so standardization is required for easy tabulation and reference
Tabulated values give areas under the curve, hence probabilities
$z = \frac{X - \mu}{\sigma}$
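Instead of printed tables, Φ(z) can be computed from the error function using the identity Φ(z) = ½[1 + erf(z/√2)]. A sketch with Python's `math.erf`; the function names `phi` and `standardize` are illustrative.

```python
import math

def phi(z):
    """Standard normal CDF, Phi(z) = P(Z <= z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def standardize(x, mu, sigma):
    """z = (x - mu) / sigma."""
    return (x - mu) / sigma

print(phi(0.0))                                   # 0.5
print(round(phi(1.65), 4))                        # 0.9505, matching the table value
print(abs(phi(-1.2) - (1 - phi(1.2))) < 1e-12)    # symmetry: Phi(-z) = 1 - Phi(z)
```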
CASE STUDY 12
The length of a machined part is known to have a normal distribution with a mean of
100mm and a standard deviation of 2mm
a) What proportion of the parts can be expected to be over 103.3mm in length?
b) What proportion will be between 98.5mm and 102.0mm?
a)
$z_1 = \frac{X_1 - \mu}{\sigma} = \frac{103.3 - 100}{2} = 1.65$
P(X > 103.3) = P(Z > 1.65)
From tables: P(Z ≤ 1.65) = 0.9505
So P(Z > 1.65) = 1 − 0.9505 = 0.0495
4.95% of the parts will be above 103.3 mm
b)
We need P(98.5 ≤ X ≤ 102.0)
Now
$z_1 = \frac{102.0 - 100}{2} = 1.00$
$z_2 = \frac{98.5 - 100}{2} = -0.75$
From tables: Φ(1.00) = 0.8413 and Φ(−0.75) = 0.2266
Answer = 0.8413 − 0.2266 = 0.6147
61.47% of the output lies in the specified range
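Both parts of Case Study 12 can be checked without tables, again using the erf identity for Φ (an alternative to table lookup, not the method used on the slide):

```python
import math

def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu, sigma = 100.0, 2.0
p_over = 1 - phi((103.3 - mu) / sigma)                            # part (a)
p_between = phi((102.0 - mu) / sigma) - phi((98.5 - mu) / sigma)  # part (b)
print(f"a) {p_over:.4f}   b) {p_between:.4f}")
```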
EXPONENTIAL
Main use: reliability analysis
e.g. time to failure of a system entity
pdf: $f(x) = \lambda e^{-\lambda x}, \quad x \ge 0$
Here λ = failure rate
i.e. the density is highest at x = 0, so failure is most likely at switch-on
CDF: $F(x) = P(X \le x) = 1 - e^{-\lambda x}$
Mean = 1/λ, variance = 1/λ²
Most important facet: it is a memoryless distribution
i.e. no reliance on what has occurred before: $P(X > s + t \mid X > s) = P(X > t)$
e.g. Markov chains in simulation and modelling
- are also memoryless
- define the probability of a state change
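The memoryless property can be checked both analytically and by simulation. A sketch assuming a failure rate of 1/500 per hour and elapsed times s = 300 h, t = 200 h (illustrative values); lifetimes are sampled with Python's `random.expovariate`.

```python
import math
import random

lam = 1 / 500.0   # illustrative failure rate per hour (mean life 500 h)

def survival(t):
    """P(X > t) = exp(-lam * t) for an exponential lifetime."""
    return math.exp(-lam * t)

# Memorylessness: P(X > s + t | X > s) = P(X > s + t) / P(X > s) = P(X > t).
s, t = 300.0, 200.0
print(math.isclose(survival(s + t) / survival(s), survival(t)))

# Simulation check: among sampled lifetimes that survive past s, the fraction
# surviving a further t should be close to P(X > t).
random.seed(42)
lifetimes = [random.expovariate(lam) for _ in range(200_000)]
survivors = [x for x in lifetimes if x > s]
frac = sum(1 for x in survivors if x > s + t) / len(survivors)
print(round(frac, 3), round(survival(t), 3))
```

The units that have already run for 300 h behave like new ones: their chance of lasting another 200 h is the same as for a unit just switched on.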
CASE STUDY 13
It is known that the battery for a video game has an average life of 500 hours. The
failures of batteries are known to be random, independent and exponentially
distributed.
What is the probability of a battery failing within 200 hours?
Solution: the failure rate is λ = 1/500 per hour, so
$P(X \le 200) = 1 - e^{-(1/500)(200)} = 1 - e^{-0.4} = 0.330$
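A direct evaluation of the exponential CDF for Case Study 13; note that the exponent is (1/500)(200) = 0.4.

```python
import math

lam = 1 / 500                      # failures per hour (mean life 500 h)
p_fail = 1 - math.exp(-lam * 200)  # F(200) = 1 - e^(-0.4)
print(f"{p_fail:.3f}")             # 0.330
```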
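WEIBULL
Main use: reliability and failure analysis
$f(x) = \frac{\beta}{\alpha}\left(\frac{x-\gamma}{\alpha}\right)^{\beta-1}\exp\left[-\left(\frac{x-\gamma}{\alpha}\right)^{\beta}\right], \quad x \ge \gamma$
γ = location parameter
β = shape parameter
α = scale parameter
Very generic: for example, with γ = 0 and β = 1 it reduces to the exponential distribution with λ = 1/α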
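A sketch of the three-parameter Weibull density, checking the special case γ = 0, β = 1 against the exponential pdf. The function names and α = 500 are illustrative; for β < 1 the density is unbounded at x = γ, which the code reports as infinity.

```python
import math

def weibull_pdf(x, alpha, beta, gamma=0.0):
    """Three-parameter Weibull density: scale alpha, shape beta, location gamma."""
    if x < gamma:
        return 0.0
    u = (x - gamma) / alpha
    if u == 0.0 and beta < 1.0:
        return math.inf          # density is unbounded at x = gamma when beta < 1
    return (beta / alpha) * u ** (beta - 1) * math.exp(-(u ** beta))

def exponential_pdf(x, lam):
    """Exponential density lam * exp(-lam * x) for x >= 0."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

# With gamma = 0 and beta = 1 the Weibull reduces to an exponential with lam = 1/alpha.
alpha = 500.0
for x in (0.0, 100.0, 1000.0):
    print(x, weibull_pdf(x, alpha, 1.0), exponential_pdf(x, 1 / alpha))
```

Changing β alone moves the shape from a decreasing density (β < 1) through the exponential (β = 1) to a humped curve (β > 1), which is why the Weibull is described as generic.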
Joint PDF
- So far we have considered one random variable at a time.
- In practice, however, we often need to study more than one variable at a time.
- For example, the tensile strength (X) and diameter (Y) of a beam may both be of interest.
- Similarly, the diameter (X) and thickness (Y) of an injection-molded disk may both be of interest.
Joint PDF (Cont’d)
X and Y are continuous
- f(x,y) dx dy = P(x < X < x+dx, y < Y < y+dy) is the probability that the random variable X takes a value in (x, x+dx) and Y takes a value in (y, y+dy).
- $f(x,y) \ge 0$ for all x and y, and $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x,y)\,dx\,dy = 1$
- $P(a \le X \le b,\ c \le Y \le d) = \int_{c}^{d}\int_{a}^{b} f(x,y)\,dx\,dy$
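The double-integral definitions can be evaluated numerically with a two-dimensional midpoint rule. A sketch assuming the hypothetical joint pdf f(x, y) = x + y on the unit square (zero elsewhere), which satisfies both conditions above; the function name `joint_prob` is illustrative.

```python
def joint_prob(f, a, b, c, d, steps=200):
    """Midpoint-rule approximation of P(a <= X <= b, c <= Y <= d),
    i.e. the double integral of f over the rectangle [a, b] x [c, d]."""
    hx, hy = (b - a) / steps, (d - c) / steps
    total = 0.0
    for i in range(steps):
        x = a + (i + 0.5) * hx
        for j in range(steps):
            y = c + (j + 0.5) * hy
            total += f(x, y)
    return total * hx * hy

# Hypothetical joint pdf: f(x, y) = x + y on the unit square, zero elsewhere.
def f(x, y):
    return x + y if 0 <= x <= 1 and 0 <= y <= 1 else 0.0

print(round(joint_prob(f, 0, 1, 0, 1), 6))      # total probability: 1.0
print(round(joint_prob(f, 0, 0.5, 0, 0.5), 6))  # P(X <= 0.5, Y <= 0.5) = 0.125
```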
Measures of Joint PDF