Transcript Slide 1
STATISTICAL MODELS IN
SIMULATION
How are probability and variation related to the
modelling of system performance?
Random variable
– any system variable, X, that can
take different values
- continuous or discrete
i.e. X has a range of values
17/07/2015
ENGN8101 Modelling and Optimization
1
How are the values within this range distributed?
- Described using
Probability Functions
Probability functions – used to define probabilities of events
associated with a random variable
Often mathematical functions, or graphical in nature
Discrete variables
X – described by a probability mass function P(x) – not P(X)
why? X denotes the random variable itself; x denotes one particular value it can take
So P(x) = P(X = x)
Cumulative distribution function:
F(x) = P(X≤x)
F(x) = step function
bounded by 0 and 1
Continuous variables
Here – need a probability density function (pdf)
pdf: $f(x) = P(x \le X \le x + dx)/dx$
e.g. $F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt$
More generally: $\int_{-\infty}^{\infty} f(x)\,dx = 1$
Probability Density Function
Descriptive parameters
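Mean value = Expected value
Discrete: $\mu = E(X) = \sum_{\text{all } i} x_i P(x_i)$
Continuous: $\mu = E(X) = \int_{-\infty}^{\infty} x f(x)\,dx$
Variance:
Discrete: $\sigma^2 = E[(X - \mu)^2] = \sum_{\text{all } x_i} (x_i - \mu)^2 P(x_i)$
Continuous: $\sigma^2 = E[(X - \mu)^2] = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx$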
Median (xm): P(X>xm) = 0.50
Mode: occurs where the density function has its peak
Skewness: $\theta = E[(X - \mu)^3]/\sigma^3$
What would the 4th moment describe?
CASE STUDY 8
The number of signals arriving at a satellite monitoring system in any hour is defined
by a random variable X. The probability mass function of X is believed to be
$p(x) = \dfrac{c}{x^2 + 1}$ for x = 0, 1, 2, 3 and 4
$p(x) = 0$ elsewhere
a) compute the constant c
b) compute the mean and standard deviation of X
c) compute the probability that the number of signals detected by the system in
any hour is less than or equal to 2
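a) Summed over all x, the probabilities must equal 1: $\sum_{\text{all } x} p(x) = 1$
So
$c\left(\dfrac{1}{0+1} + \dfrac{1}{1+1} + \dfrac{1}{4+1} + \dfrac{1}{9+1} + \dfrac{1}{16+1}\right) = 1$
Giving c = 0.538
b) Mean =
$\sum_{\text{all } x_i} x_i P(x_i) = c\left(\dfrac{x_1}{x_1^2+1} + \dfrac{x_2}{x_2^2+1} + \dfrac{x_3}{x_3^2+1} + \dfrac{x_4}{x_4^2+1} + \dfrac{x_5}{x_5^2+1}\right)$
with $x_1, \dots, x_5 = 0, 1, 2, 3, 4$
= 0.772
Similarly – variance = 1.094 giving σ = 1.05
c) P(X≤2) desired = P(X=0)+ P(X=1)+ P(X=2)
= 0.538(1 +0.5 + 0.2) = 0.915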
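The arithmetic in Case Study 8 can be checked numerically. The following is a minimal sketch, not part of the original slides; the variable names (`support`, `weights`, `pmf`) are illustrative assumptions.

```python
# Numerical check of Case Study 8: p(x) = c / (x^2 + 1) for x = 0, 1, 2, 3, 4.
import math

support = range(5)                               # x = 0, 1, 2, 3, 4
weights = [1 / (x**2 + 1) for x in support]      # unnormalised masses 1/(x^2 + 1)
c = 1 / sum(weights)                             # chosen so the pmf sums to 1
pmf = [c * w for w in weights]

# Discrete mean and variance, straight from the definitions.
mean = sum(x * p for x, p in zip(support, pmf))
variance = sum((x - mean) ** 2 * p for x, p in zip(support, pmf))
std_dev = math.sqrt(variance)

# P(X <= 2) = P(X=0) + P(X=1) + P(X=2)
p_at_most_2 = sum(pmf[:3])

print(f"c = {c:.3f}, mean = {mean:.3f}, variance = {variance:.3f}, "
      f"sigma = {std_dev:.3f}, P(X<=2) = {p_at_most_2:.3f}")
```

The printed values should agree with the slide's answers to the rounding used there.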
Common probabilistic functions
A random variable – often takes values that follow a
probability ‘trend’
If they follow a numerical pattern – can be modelled easily
using distribution functions
DISCRETE: Hypergeometric, Binomial, Poisson
CONTINUOUS: Gaussian, Exponential, Weibull
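HYPERGEOMETRIC
Applicable to sampling a population without subsequent
replacement of sample
For D non-conforming items in a population of size N,
the probability of getting x non-conforming items in a sample
of size n is
$P(x) = \dfrac{\binom{D}{x}\binom{N-D}{n-x}}{\binom{N}{n}}$ where $\binom{D}{x}$ means $\dfrac{D!}{x!\,(D-x)!}$
Mean: $\mu = \dfrac{nD}{N}$
Variance: $\sigma^2 = \dfrac{nD}{N}\left(1 - \dfrac{D}{N}\right)\left(\dfrac{N-n}{N-1}\right)$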
CASE STUDY 9
A batch of 20 transistors is known to contain 5 non-conforming ones. If an inspector
randomly samples 4 items, find the probability of picking out 3 non-conforming ones
Here, N = 20, D = 5, n = 4, x = 3
Probability of 3 non-conformers:
$P(3) = \dfrac{\binom{5}{3}\binom{15}{1}}{\binom{20}{4}} = 0.031$
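Case Study 9 can be reproduced with Python's standard `math.comb`. This is a sketch only; the helper name `hypergeom_pmf` is an assumption made for illustration.

```python
# Hypergeometric pmf applied to Case Study 9 (N = 20, D = 5, n = 4, x = 3).
from math import comb

def hypergeom_pmf(x, N, D, n):
    """P(x non-conforming items in a sample of n drawn without replacement)."""
    return comb(D, x) * comb(N - D, n - x) / comb(N, n)

N, D, n = 20, 5, 4
p3 = hypergeom_pmf(3, N, D, n)

# Mean and variance from the closed-form expressions.
mean = n * D / N
variance = (n * D / N) * (1 - D / N) * ((N - n) / (N - 1))

# Sanity check: the pmf over x = 0..n should sum to 1.
total = sum(hypergeom_pmf(x, N, D, n) for x in range(n + 1))

print(f"P(3) = {p3:.3f}, mean = {mean:.3f}, variance = {variance:.3f}, total = {total:.3f}")
```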
BINOMIAL
Series of independent trials – each trial gives ‘yes’ or ‘no’
Probability of success = p = constant for any trial
Probability of x successes in n trials:
$P(x) = \binom{n}{x} p^{x} (1-p)^{n-x}$
Mean = np, variance = np(1-p)
Uses:
Sampling without replacement from large populations, or
Sampling with replacement from small populations
As N → ∞: Hypergeometric → Binomial
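The limit Hypergeometric → Binomial can be illustrated numerically. The sketch below is an illustration under assumed values: the non-conforming fraction D/N is held at 0.25 (the fraction in a batch of 20 with 5 non-conforming items) while N grows, with n = 4 and x = 3.

```python
# Hypergeometric probabilities approach the binomial value as N grows
# with the non-conforming fraction D/N held fixed (assumed here: 0.25).
from math import comb

def hypergeom_pmf(x, N, D, n):
    return comb(D, x) * comb(N - D, n - x) / comb(N, n)

def binomial_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, x, fraction = 4, 3, 0.25
target = binomial_pmf(x, n, fraction)

gaps = []
for N in (20, 200, 2000, 20000):
    D = int(N * fraction)                 # keep D/N = 0.25
    p_hyper = hypergeom_pmf(x, N, D, n)
    gaps.append(abs(p_hyper - target))
    print(f"N = {N:>5}: hypergeometric = {p_hyper:.5f}, binomial = {target:.5f}")
```

The gap between the two values shrinks as N increases, which is why sampling without replacement from a large population is commonly treated as binomial.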
CASE STUDY 10
A signal filtering device is known to be 95% successful. If a random sample of 5
filtered signals is chosen, find the probability that 2 of them have not been correctly
processed.
Here, n = 5, x = 2 and p = 0.05
(if success is defined as finding a dud filtered signal)
$P(X = 2) = \binom{5}{2}(0.05)^2(0.95)^3 = 0.021$
Additionally, mean and variance of the distribution:
$\mu = np = 5(0.05) = 0.25$ and $\sigma^2 = np(1-p) = 5(0.05)(0.95) = 0.2375$
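A short check of Case Study 10, written as a sketch with an assumed helper name `binomial_pmf`:

```python
# Binomial pmf applied to Case Study 10 (n = 5, x = 2, p = 0.05).
from math import comb

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 5, 0.05                     # "success" = an incorrectly processed signal
p_two_duds = binomial_pmf(2, n, p)
mean = n * p
variance = n * p * (1 - p)

print(f"P(X=2) = {p_two_duds:.3f}, mean = {mean}, variance = {variance:.4f}")
```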
POISSON RANDOM VARIABLE
Named after Simeon D. Poisson (1781-1840)
Originated as an approximation to binomial
Used extensively in stochastic modeling
Examples include:
Number of phone calls received, number of messages
arriving at a sending node, number of radioactive
disintegrations, number of misprints found on a printed
page, number of defects found on a sheet of processed
metal, blood cell counts, etc.
POISSON
Models the number of occurrences of an event over time
or space, or volume…
Events = random and independent
Uses: number of non-conformities in a product
number of machine breakdowns per month
$p(x) = \dfrac{e^{-\lambda}\lambda^{x}}{x!}$
here λ = average no. of events over specified time period
mean = variance = λ
If n → ∞ and p → 0 (with np = λ held fixed), then binomial → Poisson
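The relationship between the binomial and the Poisson can be illustrated numerically. The sketch below uses assumed values (λ = np = 3, x = 2) chosen only for illustration: as n grows and p = λ/n shrinks, the binomial probability approaches the Poisson one.

```python
# Binomial(n, p = lam/n) approaches Poisson(lam) as n grows (lam = 3 assumed).
from math import comb, exp, factorial

def binomial_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def poisson_pmf(x, lam):
    return exp(-lam) * lam**x / factorial(x)

lam, x = 3.0, 2
target = poisson_pmf(x, lam)

gaps = []
for n in (10, 100, 1000, 10000):
    p_binom = binomial_pmf(x, n, lam / n)     # p -> 0 while n * p stays at lam
    gaps.append(abs(p_binom - target))
    print(f"n = {n:>5}: binomial = {p_binom:.5f}, Poisson = {target:.5f}")
```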
CASE STUDY 11
It is estimated that the average number of surface defects in 20 m² of paper produced
by a process is 3. What is the probability of finding no more than 2 defects in 40 m² of
paper through random selection?
Here, 1 unit is now 40 m²
so λ is now 6
We need: P(X≤2) = P(X=0) + P(X=1) + P(X=2)
$= \dfrac{e^{-6}6^{0}}{0!} + \dfrac{e^{-6}6^{1}}{1!} + \dfrac{e^{-6}6^{2}}{2!} = 0.062$
The mean and variance of this distribution are both 6
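Case Study 11 as a sketch in code. The rescaling of λ from 20 m² to 40 m² is made explicit; the names `rate_per_20m2` and `poisson_pmf` are assumptions for illustration.

```python
# Poisson pmf applied to Case Study 11: 3 defects per 20 m^2, inspect 40 m^2.
from math import exp, factorial

def poisson_pmf(x, lam):
    """Probability of exactly x events when the expected count is lam."""
    return exp(-lam) * lam**x / factorial(x)

rate_per_20m2 = 3.0
area = 40.0
lam = rate_per_20m2 * area / 20.0      # expected defects in 40 m^2 (= 6)

p_at_most_2 = sum(poisson_pmf(x, lam) for x in range(3))   # x = 0, 1, 2
print(f"lambda = {lam}, P(X<=2) = {p_at_most_2:.3f}")
```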
GAUSSIAN (normal)
The most important continuous distribution in
probability and statistics
The story of the normal distribution is really the story
of the development of statistics as a science.
Gauss discovered this while incorporating the method
of least squares for reducing the errors in fitting
curves for astronomical observations.
GAUSSIAN (normal)
Most widely used distribution for continuous random variables
- “Nature’s Distribution” -
For a population mean = μ and variance = σ²
Probability density function for x:
$f(x) = \dfrac{1}{\sigma\sqrt{2\pi}}\exp\left(-\dfrac{(x-\mu)^2}{2\sigma^2}\right)$, $-\infty < x < \infty$
i.e. open-ended bell curve
Graphs of various normal PDF
Often standardized such that σ² = 1 and μ = 0
Here - Z = standardized random variable and
$f(z) = \dfrac{1}{\sqrt{2\pi}}\exp\left(-\dfrac{z^2}{2}\right)$, $-\infty < z < \infty$
Alternatively
$\Phi(z) = P(Z \le z) = \dfrac{1}{\sqrt{2\pi}}\int_{-\infty}^{z}\exp\left(-\tfrac{1}{2}t^2\right)dt$
Note $\Phi(0) = 0.5$
and $\Phi(-z) = 1 - \Phi(z)$ due to symmetry
Impossible to document every σ/μ
combination
standardization required for easy
tabulation and reference
Tabulated values –
give areas under the curve –
hence – probabilities
$z = \dfrac{X - \mu}{\sigma}$
CASE STUDY 12
The length of a machined part is known to have a normal distribution with a mean of
100mm and a standard deviation of 2mm
a) What proportion of the parts can be expected to be over 103.3mm in length?
b) What proportion will be between 98.5mm and 102.0mm?
a) $z_1 = \dfrac{X_1 - \mu}{\sigma} = \dfrac{103.3 - 100}{2} = 1.65$
P(X>103.3) = P(Z>1.65)
From tables: P(Z≤1.65) = 0.9505
So P(Z>1.65) = 1-0.9505 = 0.0495
4.95% of the parts will be above 103.3mm
b) We need P(98.5≤X≤102.0)
now
$z_1 = \dfrac{102.0 - 100}{2} = 1.00$
$z_2 = \dfrac{98.5 - 100}{2} = -0.75$
From tables: 1.00 → 0.8413 and -0.75 → 0.2266
answer = 0.8413 – 0.2266 = 0.6147
61.47% of the output lies in the specified range
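Instead of tables, the standard normal CDF can be evaluated with the error function via the identity Φ(z) = ½[1 + erf(z/√2)]. The sketch below applies it to both parts of Case Study 12; the function names are assumptions for illustration.

```python
# Case Study 12 without tables: Phi(z) = 0.5 * (1 + erf(z / sqrt(2))).
import math

def standard_normal_cdf(z):
    """P(Z <= z) for a standard normal variable Z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu, sigma = 100.0, 2.0                  # part length in mm

def standardize(x):
    """Convert a length x to its z-score."""
    return (x - mu) / sigma

# a) proportion of parts longer than 103.3 mm
p_over = 1.0 - standard_normal_cdf(standardize(103.3))

# b) proportion of parts between 98.5 mm and 102.0 mm
p_between = (standard_normal_cdf(standardize(102.0))
             - standard_normal_cdf(standardize(98.5)))

print(f"P(X > 103.3) = {p_over:.4f}, P(98.5 <= X <= 102.0) = {p_between:.4f}")
```

Small differences from the tabulated answers in the fourth decimal place are expected, since the table values are themselves rounded.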
EXPONENTIAL
Main use – reliability analysis
e.g. time to failure of a system entity
pdf: $f(x) = \lambda e^{-\lambda x}$, $x \ge 0$
Here λ = failure rate
i.e. failure most likely at t=0 (switching on)
Mean = 1/λ
variance = 1/λ²
Most important facet – memoryless distribution
i.e. no reliance on what has occurred before
e.g. Markov chains in simulation and modelling
- also memoryless
- define probability of state change
CASE STUDY 13
It is known that the battery for a video game has an average life of 500 hours. The
failures of batteries are known to be random, independent and exponentially
distributed.
What is the probability of a battery failing within 200 hours?
Solution – very simple: failure rate λ = 1/500 per hour
$P(X \le 200) = 1 - e^{-(1/500)(200)} = 1 - e^{-0.4} = 0.330$
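Case Study 13 and the memoryless property can both be checked in a few lines. This is a sketch; the 300-hour "already survived" figure is an assumed value used only to illustrate that P(X > s + t | X > s) = P(X > t).

```python
# Exponential distribution: Case Study 13 and a check of memorylessness.
import math

mean_life = 500.0                 # hours
rate = 1.0 / mean_life            # failure rate lambda

def exp_survival(x, lam):
    """P(X > x) for an exponential lifetime with rate lam (x >= 0)."""
    return math.exp(-lam * x)

# Probability of failing within 200 hours: 1 - e^(-0.4)
p_fail_200 = 1.0 - exp_survival(200.0, rate)

# Memoryless check: a battery that has already lasted s = 300 h (assumed value)
# is as likely to last another t = 200 h as a brand-new battery.
s, t = 300.0, 200.0
conditional = exp_survival(s + t, rate) / exp_survival(s, rate)
unconditional = exp_survival(t, rate)

print(f"P(X <= 200) = {p_fail_200:.3f}")
print(f"P(X > 500 | X > 300) = {conditional:.4f}, P(X > 200) = {unconditional:.4f}")
```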
WEIBULL
Main use – reliability and failure analysis
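$f(x) = \dfrac{\beta}{\alpha}\left(\dfrac{x-\gamma}{\alpha}\right)^{\beta-1}\exp\left[-\left(\dfrac{x-\gamma}{\alpha}\right)^{\beta}\right]$, $x \ge \gamma$
γ = location parameter
β = shape parameter
α = scale parameter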
Very generic!
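One way to see how generic the Weibull is: with β = 1 and γ = 0 it reduces to the exponential distribution with λ = 1/α. The sketch below assumes the standard three-parameter form (location γ, shape β, scale α); the parameter values are illustrative only.

```python
# Three-parameter Weibull pdf (assumed standard form) and two sanity checks.
import math

def weibull_pdf(x, alpha, beta, gamma=0.0):
    """Weibull density with scale alpha, shape beta, location gamma.

    Returns 0 below the location parameter. The case x == gamma with
    beta < 1 (unbounded density) is not handled in this sketch.
    """
    if x < gamma:
        return 0.0
    u = (x - gamma) / alpha
    return (beta / alpha) * u ** (beta - 1) * math.exp(-(u ** beta))

# Check 1: beta = 1, gamma = 0 gives the exponential pdf with lambda = 1/alpha.
alpha = 500.0
weibull_value = weibull_pdf(200.0, alpha, beta=1.0)
exponential_value = (1.0 / alpha) * math.exp(-200.0 / alpha)

# Check 2: the density integrates to about 1 (midpoint rule, beta = 2 assumed).
steps, upper = 20000, 5000.0
h = upper / steps
area = sum(weibull_pdf((i + 0.5) * h, alpha, beta=2.0) for i in range(steps)) * h

print(f"Weibull(beta=1) = {weibull_value:.6f}, exponential = {exponential_value:.6f}")
print(f"area under pdf (beta=2) = {area:.4f}")
```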
Joint PDF
So far we have seen one random variable at a time.
However, in practice, we often see situations where
more than one variable at a time needs to be studied.
For example, tensile strength (X) and diameter (Y) of
a beam are of interest.
Diameter (X) and thickness (Y) of an injection-molded
disk are of interest.
Joint PDF (Cont’d)
X and Y are continuous
f(x,y) dx dy = P(x < X < x+dx, y < Y < y+dy) is
the probability that the random variable X will
take values in (x, x+dx) and Y will take values in
(y, y+dy).
$f(x,y) \ge 0$ for all x and y and $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x,y)\,dx\,dy = 1$
$P(a \le X \le b,\ c \le Y \le d) = \int_{a}^{b}\int_{c}^{d} f(x,y)\,dy\,dx$
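The rectangle probability for a joint pdf can be approximated numerically. The sketch below uses a hypothetical joint density, f(x, y) = x + y on the unit square (not taken from the slides), and a simple midpoint rule; `joint_pdf` and `rect_probability` are illustrative names.

```python
# Midpoint-rule approximation of P(a <= X <= b, c <= Y <= d) for a joint pdf.
# The density below is a hypothetical example: f(x, y) = x + y on [0, 1] x [0, 1].

def joint_pdf(x, y):
    inside = 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0
    return x + y if inside else 0.0

def rect_probability(f, a, b, c, d, steps=200):
    """Approximate the double integral of f over [a, b] x [c, d]."""
    hx = (b - a) / steps
    hy = (d - c) / steps
    total = 0.0
    for i in range(steps):
        x = a + (i + 0.5) * hx            # midpoint of the i-th x strip
        for j in range(steps):
            y = c + (j + 0.5) * hy        # midpoint of the j-th y strip
            total += f(x, y)
    return total * hx * hy

total_mass = rect_probability(joint_pdf, 0.0, 1.0, 0.0, 1.0)   # should be about 1
p_corner = rect_probability(joint_pdf, 0.0, 0.5, 0.0, 0.5)     # analytically 0.125

print(f"total mass = {total_mass:.4f}, P(X <= 0.5, Y <= 0.5) = {p_corner:.4f}")
```

For this particular density the exact answer is easy to obtain by hand, which makes it a convenient check on the numerical routine before applying it to a density without a closed-form integral.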
Measures of Joint PDF