
Probability Theory Refresher
K.S. Trivedi
A Tool for Modeling Random Phenomena
• Random Phenomena in a Computer Environment
– Arrival of jobs.
– Execution time of jobs.
– Memory requirement of jobs.
– Status of computer resources (failed, non-failed).
• How to Quantify Randomness?
– Use probabilistic models.
• How to Estimate these Quantifiers?
– Use statistical techniques.
Modeling Random Phenomena
[Diagram: Measurement Data → Statistical Analysis → Input Parameters → Probability Model (PM) → Outputs, with Input/Output Validation feeding back.]
Components of a Probability Model
1. Sample Space (S): A set of all possible observable “states” of a random phenomenon.
2. Set of Events (F): A set of all possible
events of interest.
3. Probability of Events (P): A consistent
description of the likelihood of observing
an event.
PM = (S, F, P).
Example
Random Phenomenon: The outcome of the test of the condition in an if statement:
if B then T;
else E;
S = {T, E}
F = { ∅, {E}, {T}, {T,E} }
P = { 0, ½, ½, 1 }
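As a quick illustration, here is a minimal Python sketch of this model, assuming the two branches are equally likely (the ½ values from the slide):

from itertools import combinations

S = {"T", "E"}                                   # sample space
F = [set(c) for r in range(len(S) + 1)
         for c in combinations(sorted(S), r)]    # event set: all subsets of S
p = {"T": 0.5, "E": 0.5}                         # per-outcome probabilities

def P(event):
    # Probability of an event = sum of its outcome probabilities.
    return sum(p[s] for s in event)

for event in F:
    print(sorted(event), P(event))               # prints 0, 1/2, 1/2, 1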
Terminology and Definitions
E or F: E ∪ F : Union of two events
E and F: E ∩ F, EF : Intersection
Not E: Ē, S\E : Complement
⋃_{i=1}^n Ei = E1 ∪ E2 ∪ … ∪ En
⋂_{i=1}^n Ei = E1 ∩ E2 ∩ … ∩ En = E1E2…En
Terminology and Definitions
E and F are mutually exclusive if EF = ∅.
E1, E2, …, En are exhaustive if ⋃_{i=1}^n Ei = S.
Structure of F
S ∈ F
E ∈ F ⟹ Ē ∈ F
E, F ∈ F ⟹ E ∪ F ∈ F, E ∩ F ∈ F
Structure of P
1. P(E) > 0 for all E Є .
2. P(S) = 1.
3. If E and F are mutually exclusive
P(E F) = P(E) + P(F).

Consequences
1. P(∅) = 0
2. P(Ē) = 1 – P(E)
3. If E1, E2, …, En are mutually exclusive,
P(⋃_{i=1}^n Ei) = ∑_{i=1}^n P(Ei)
4. P(E ∪ F) = P(E) + P(F) – P(EF)
5. P(⋃_{i=1}^n Ei) = ∑_i P(Ei) – ∑_{i<j} P(EiEj) + ∑_{i<j<k} P(EiEjEk) – … + (–1)^(n+1) P(E1E2…En)
The last one is based on the principle of inclusion and exclusion.
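A small numeric check of consequence 4, using a fair six-sided die and two events of my own choosing:

from fractions import Fraction

def P(event):
    return Fraction(len(event), 6)      # six equally likely outcomes

E = {2, 4, 6}                           # "even roll"
F = {4, 5, 6}                           # "roll greater than 3"
assert P(E | F) == P(E) + P(F) - P(E & F)
print(P(E | F))                         # 2/3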
Conditional Probability
A, B ∈ F
P(A) > 0.
P(B|A) = Conditional probability that event B takes place given that event A has taken place.
Conditional Probability
P(B|A) = P(B ∩ A)/P(A).
Note 1: If P(A)=0, P(B|A) can be defined to be
any number.
Note 2: No implicit concept of chronological order
of A and B.
Independence
A, B independent (assume P(A) ≠ 0, P(B) ≠ 0)
⟺ P(B|A) = P(B)
⟺ P(A|B) = P(A)
⟺ P(A ∩ B) = P(A)P(B)
Pairwise Independence (P.I.)
A1, A2, …, An pairwise independent
⟺ Ai, Aj independent for i ≠ j
Mutual Independence (M.I.)
A1, A2, …, An mutually independent
⟺ P(Ai1 ∩ Ai2 ∩ … ∩ Aik) = P(Ai1) P(Ai2) … P(Aik)
for any k indices i1, i2, …, ik.
Note: P.I. does not imply M.I.
Series System
[Diagram: components 1, 2, …, n connected in series]
The system is operative if every component is operative.
Ai = Component i is operative.
A = The system is operative.
Assume: A1, …, An are mutually independent.
P(A) = P(A1 ∩ A2 ∩ … ∩ An)
= P(A1) P(A2) … P(An)
Note: P(A) ≤ min {P(A1), …, P(An)}, i.e., the system is
weaker than its weakest link.
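A minimal Python sketch of this product formula, with illustrative (made-up) component reliabilities:

import math

def series_reliability(probs):
    # P(A) = P(A1) P(A2) ... P(An), assuming mutual independence.
    return math.prod(probs)

p = [0.99, 0.95, 0.90]
print(series_reliability(p), "<=", min(p))   # weaker than the weakest link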
Parallel System
[Diagram: components 1, 2, …, n connected in parallel]
Parallel System
• The system is operative if any one component is
operative.
• Ai = Component i is operative.
• A = System is operative.
• A1, A2, …, An are mutually independent.
• P(Ā) = P(Ā1 ∩ Ā2 ∩ … ∩ Ān)
= P(Ā1) P(Ā2) … P(Ān)
= (1 – P(A1)) (1 – P(A2)) … (1 – P(An))
• ∴ P(A) = 1 – ∏_{i=1}^n (1 – P(Ai))
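The dual computation in Python, again with illustrative values:

import math

def parallel_reliability(probs):
    # P(A) = 1 - (1-P(A1))(1-P(A2))...(1-P(An)): the system fails
    # only if every component fails (mutual independence assumed).
    return 1 - math.prod(1 - p for p in probs)

print(parallel_reliability([0.9, 0.9, 0.9]))   # 0.999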
Series – Parallel System
[Diagram: two series paths from a to b, components 1–4 and 3–5, in parallel]
• The system operates if there is at least one path
of operating components from a to b.
Ri = P {Component i operates}
R = P {System operates}
= 1 – (1 – R1R4) (1 – R3R5)
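A quick Monte Carlo sanity check of this formula in Python; the component reliabilities are illustrative, not from the slide:

import random

R1, R3, R4, R5 = 0.9, 0.85, 0.8, 0.75
exact = 1 - (1 - R1 * R4) * (1 - R3 * R5)

rng = random.Random(0)
trials = 200_000
hits = 0
for _ in range(trials):
    up1, up3 = rng.random() < R1, rng.random() < R3
    up4, up5 = rng.random() < R4, rng.random() < R5
    if (up1 and up4) or (up3 and up5):      # some a-to-b path operates
        hits += 1
print(exact, hits / trials)                  # the two should agree closely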
Law of Total Probability
• B1, B2, …, Bn = mutually exclusive and
exhaustive events.
• P(A) = ∑_{i=1}^n P(A|Bi) P(Bi)
Bayes’ Rule
• Given: P(Bi ), P(A|Bi ) i =1, 2, ..., n
• Want: P(Bi |A)
P(Bi|A) = P(A|Bi) P(Bi) / ∑_{j=1}^n P(A|Bj) P(Bj)
Example of Law of Total Probability
[Diagram: bridge network from a to b; series paths 1–4 and 3–5, bridged by component 2]
Example of Law of Total Probability
P(A) = P(A|A2) P(A2) + P(A|Ā2) P(Ā2)
P(A|A2) = 1- (1-R4)(1-R5)
P(A|Ā2) = 1- (1-R1R4)(1-R3R5)
P(A2) = R2, P(Ā2) = 1-R2
∴ P(A) = [1- (1-R4)(1-R5)] R2
+ [1- (1-R1R4)(1-R3R5)] (1-R2)
= 1- R2(1-R4)(1-R5)
- (1-R2)(1-R1R4)(1-R3R5)
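The same conditioning argument carried out numerically in Python, with illustrative reliabilities:

R1, R2, R3, R4, R5 = 0.9, 0.8, 0.9, 0.85, 0.85

P_A_given_up   = 1 - (1 - R4) * (1 - R5)             # component 2 operates
P_A_given_down = 1 - (1 - R1 * R4) * (1 - R3 * R5)   # component 2 failed
P_A = P_A_given_up * R2 + P_A_given_down * (1 - R2)  # total probability
print(P_A)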
Example of Bayes’ Rule
[Channel diagram of an unreliable binary transmission channel: inputs T0, T1; outputs R0, R1; transition probabilities P(R0|T0), P(R1|T0), P(R0|T1), P(R1|T1).]
T0 = 0 is transmitted. T1 = 1 is transmitted.
R0 = 0 is received. R1 = 1 is received.
Example of Bayes’ Rule
Data:
P(T0) = 0.45
P(R0|T0) = 0.94
P(R1|T1) = 0.91
Compute: P(R0), P(R1)
P(T1|R1), P(T0|R0)
P(Error)
Solution
P(R0) = P(R0|T0) P(T0) + P(R0|T1) P(T1) = 0.4725
P(R1) = 1 – P(R0) = 0.5275
P(T1|R1) = P(R1|T1) P(T1) / [P(R1|T0) P(T0) + P(R1|T1) P(T1)] = 0.9488
P(T0|R0) = 0.8952
P(Error) = P(T1 ∩ R0) + P(T0 ∩ R1)
= P(R0|T1) P(T1) + P(R1|T0) P(T0)
= 0.0765
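The whole computation reproduced in Python; all inputs are the slide's data, so the printed values should match the solution above:

P_T0 = 0.45
P_T1 = 1 - P_T0
P_R0_T0 = 0.94                  # P(R0|T0)
P_R1_T1 = 0.91                  # P(R1|T1)
P_R1_T0 = 1 - P_R0_T0
P_R0_T1 = 1 - P_R1_T1

P_R0 = P_R0_T0 * P_T0 + P_R0_T1 * P_T1        # total probability: 0.4725
P_R1 = 1 - P_R0                                # 0.5275
P_T1_R1 = P_R1_T1 * P_T1 / P_R1                # Bayes' rule: 0.9488
P_T0_R0 = P_R0_T0 * P_T0 / P_R0                # 0.8952
P_err = P_R0_T1 * P_T1 + P_R1_T0 * P_T0        # 0.0765
print(P_R0, P_R1, P_T1_R1, P_T0_R0, P_err)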
Discrete Random Variables
A random variable is a function which assigns a numerical value to each sample point.
Example: Execute an if statement.
S = {TT, TE, ET, EE}
X = Number of then clauses in two executions.
X(TT) = 2, X(TE) = X(ET) = 1, X(EE) = 0.
A random variable is called discrete if it can take only countably many values.
Generally X would take values in
A ⊂ {…, -3, -2, -1, 0, 1, 2, 3, …}.
How to Describe a Random
Variable?
By giving its probability mass function.
px(x) = P(X = x), x ∈ A
Obviously:
px(x) ≥ 0
∑_{x∈A} px(x) = 1
[Plot: pmf of a fair six-sided die, px(x) = 1/6 for x = 1, 2, …, 6]
Cumulative Distribution Function
F(t) = P(X ≤ t) = ∑_{x∈A: x≤t} px(x)
[Plot: the corresponding staircase CDF F(t), rising by 1/6 at each of t = 1, …, 6 from 1/6 up to 1]
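A short Python sketch of the pmf and staircase CDF pictured above, for the fair six-sided die:

from fractions import Fraction

pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def cdf(t):
    # F(t) = P(X <= t) = sum of pmf over x <= t
    return sum(p for x, p in pmf.items() if x <= t)

print(cdf(3))      # 1/2
print(cdf(6))      # 1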
Examples of Discrete R.V.
1. Bernoulli
A = {0, 1}
Parameter: p, 0 < p < 1
px(0) = 1 – p = q
px(1) = p.
Examples of Discrete R.V.
2. Binomial
A = {0, 1, …, n}
Parameters: n, p; n ≥ 0, 0 < p < 1
px(k) = (n choose k) p^k (1–p)^(n–k)
where (n choose k) = n!/(k!(n–k)!) = number of ways of
picking k items out of n distinct items
without regard to order.
Examples of Discrete R.V.
3. Geometric
A = {1, 2, 3, …}
Parameter: p
px(k) = (1–p)^(k–1) p, 0 < p < 1
EX: Repeat S until B, with P(B = true) = p.
X = number of executions of the repeat loop.
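A simulation sketch of this repeat-until reading in Python; p = 0.3 is an illustrative value, and the sample mean should approach 1/p (see the moments table later):

import random

def repeat_until(p, rng):
    # Count executions of the loop body until B first comes up true.
    k = 1
    while rng.random() >= p:
        k += 1
    return k

rng = random.Random(42)
p = 0.3
n = 100_000
mean = sum(repeat_until(p, rng) for _ in range(n)) / n
print(mean)        # ≈ 1/p ≈ 3.33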
Examples of Discrete R.V.
Modified Geometric
A = {0, 1, 2, …}
px(k) = (1–p)^k p, 0 < p < 1
EX: While B do S; the number of executions
of the while loop.
Examples of Discrete R.V.
4. Negative Binomial or Pascal
A = {r, r+1, r+2, …}
Parameters: r, r ≥ 1; p, 0 < p < 1
px(k) = (k–1 choose r–1) p^r (1–p)^(k–r)
Examples of Discrete R.V.
5. Poisson
A = {0, 1, 2, …}
Parameter: λ > 0
px(k) = e^(–λ) λ^k / k!
where e = base of natural logarithms = 2.718…
EX:
• Number of jobs arriving per unit time.
• Number of component failures in unit time.
Examples of Discrete R.V.
6. Uniform
A = {1, 2, …, n}
Parameter: n
px(k) = 1/n
Cumulative Distribution Function
Fx(x) = P(X ≤ x), –∞ < x < ∞
Probability density function:
fx(x) = (d/dx) Fx(x)
Fx(x) = ∫_{–∞}^x fx(u) du
Cumulative Distribution Function
Properties:
1. fx(x) ≥ 0
2. ∫_{–∞}^∞ fx(x) dx = 1
3. lim_{x→∞} Fx(x) = 1 (non-defective distributions)¹
4. lim_{x→–∞} Fx(x) = 0
5. Fx(x) is a nondecreasing function of x.
6. P(a < X ≤ b) = F(b) – F(a)
¹ For a defective distribution, lim_{x→∞} Fx(x) < 1. SHARPE allows defective distributions.
Continuous Random Variables
A random variable is called continuous if it can
take uncountably many values.
Generally, the random variables will take values in
R = (-∞, ∞)
RT = [0, ∞)
I = [a,b] or [a,b) or (a,b] or (a,b)
EX: Execution Time of a Program ∈ [0, ∞)
Time to failure of a component ∈ [0, ∞)
Examples of Continuous Random
Variables
1. Uniform (X is U(a, b))
A = [a, b]
Parameters: a, b; –∞ < a < b < ∞
fx(x) = 1/(b–a)  if a ≤ x ≤ b
= 0  otherwise.
Examples of Continuous Random
Variables
2. Exponential (X is EXP(λ))
A = [0, ∞)
Parameter: λ, λ ≥ 0
fx(x) = λe^(–λx)  x ≥ 0
= 0  x < 0
Fx(x) = 1 – e^(–λx)  x ≥ 0
= 0  x < 0
EX: Time between two successive job arrivals.
Service time of a job.
Time between two component failures.
Examples of Continuous Random
Variables
3. Erlang
A = [0, ∞)
Parameters: λ, λ ≥ 0; k, k = 1, 2, 3, …
fx(x) = λ^k x^(k–1) e^(–λx) / (k–1)!  (x ≥ 0)
= 0  (x < 0)
Fx(x) = 1 – ∑_{r=0}^{k–1} e^(–λx) (λx)^r / r!  (x ≥ 0)
= 0  (x < 0)
Sum of k independent, identically distributed EXP (λ) random variables.
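This fact is easy to check by simulation; a Python sketch with illustrative k, λ, and t:

import math
import random

k, lam, t = 3, 2.0, 1.0
rng = random.Random(1)
n = 100_000
# Empirical P(sum of k EXP(lam) samples <= t) ...
empirical = sum(
    sum(rng.expovariate(lam) for _ in range(k)) <= t for _ in range(n)) / n
# ... versus the Erlang CDF given above.
erlang_cdf = 1 - sum(
    math.exp(-lam * t) * (lam * t) ** r / math.factorial(r) for r in range(k))
print(empirical, erlang_cdf)    # the two should agree closely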
Examples of Continuous Random
Variables
4. Hyperexponential
– Mixture of n EXP random variables.
A = [0, ∞)
Parameters: n, n = 1, 2, …
α1, α2, …, αn; αi ≥ 0, ∑ αi = 1
λ1, λ2, …, λn; λi ≥ 0
Examples of Continuous Random
Variables
Hyperexponential cont.
fx(x) = ∑_{i=1}^n αi λi e^(–λi x)  x ≥ 0
= 0  x < 0
Fx(x) = 1 – ∑_{i=1}^n αi e^(–λi x)  x ≥ 0
= 0  x < 0
Examples of Continuous Random
Variables
5. Weibull - Time to failure distribution.
A = [0, ∞)
Parameters: λ, λ > 0; α, α > 0
fx(x) = λαx^(α–1) e^(–λx^α)  (x ≥ 0)
= 0  (x < 0)
Fx(x) = 1 – e^(–λx^α)  (x ≥ 0)
= 0  (x < 0)
Examples of Continuous Random
Variables
6. Normal or Gaussian
A = (–∞, ∞)
Parameters: μ; σ, σ ≥ 0
fx(x) = (1 / (σ√(2π))) exp(–(1/2) ((x – μ)/σ)²)
Functions of a Random Variable
If X is a random variable, so is Y = f(X).
Computing the distribution of Y:
1. Y = X²
fy(y) = [fx(√y) + fx(–√y)] / (2√y)  if y > 0
2. X ~ N(0, 1) ⟹ Y = aX + b ~ N(b, a²), –∞ < a, b < ∞
3. X ~ U(0, 1) ⟹ Y = aX + b ~ U(b, b + a)
4. X ~ EXP(λ) ⟹ Y = rX ~ EXP(λ/r)
Expectation of a random variable
E(X) = ∑_i xi px(xi)  if X is discrete
= ∫_{–∞}^∞ x fx(x) dx  if X is continuous
Alternate definition for a positive r.v.:
E(X) = ∫_0^∞ (1 – Fx(x)) dx
Valid for all nonnegative random variables, discrete or continuous.
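A crude numeric check of the alternate definition for X ~ EXP(λ), where 1 – F(x) = e^(–λx) and E(X) = 1/λ; step size and cutoff are ad hoc choices:

import math

lam = 2.0
dx = 1e-4
# Left Riemann sum of (1 - F(x)) = e^(-lam x) out to x = 20.
integral = sum(math.exp(-lam * i * dx) * dx for i in range(200_000))
print(integral, 1 / lam)        # both ≈ 0.5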
Moments of a r.v.
Let Y = g(X).
E(Y) = ∑_i yi py(yi)  if Y is discrete
= ∫_{–∞}^∞ y fy(y) dy  if Y is continuous
= ∑_i g(xi) px(xi)  if X is discrete
= ∫_{–∞}^∞ g(x) fx(x) dx  if X is continuous
Alternate method for positive r.v.: if X ≥ 0 and Y = g(X), with g increasing and g(0) = 0,
E(Y) = ∫_0^∞ g′(x)(1 – Fx(x)) dx
n-th moment of X = E(Xⁿ)
n-th central moment of X = E((X – E(X))ⁿ)
Variance of X = E((X – E(X))²)
Properties
E(aX + b) = aE(X) + b
Var(aX + b) = a² Var(X)
Coefficient of variation: c(X) = √Var(X) / E(X)
√Var(X) is called the standard deviation of X, denoted by σx.
Table of Moments:
Discrete Distributions
X            Param.    Mean       Variance
Bernoulli    p         p          p(1–p)
Binomial     n, p      np         np(1–p)
Geometric    p         1/p        q/p²
Neg. Bin.    r, p      r/p        rq/p²
Poisson      λ         λ          λ
Uniform      n         (n+1)/2    (n²–1)/12
Mod. Geom.   p         q/p        q/p²
Table of Moments:
Continuous R.V.
Distribution   Param.       Mean                    Variance
Uniform        a, b         (a+b)/2                 (b–a)²/12
Exponential    λ            1/λ                     1/λ²
Erlang         λ, k         k/λ                     k/λ²
Hyper-Exp.     n, αi, λi    ∑ αi/λi                 ∑ 2αi/λi² – (∑ αi/λi)²
Weibull        λ, α         λ^(–1/α) Γ(1 + 1/α)     λ^(–2/α) Γ(1 + 2/α) – [E(X)]²
Normal         μ, σ         μ                       σ²
Reliability Concepts
Life time of a component can be modeled as
a random variable, T.
• R(t) = Reliability at time t
= P(Lifetime > t)
= 1 – FT(t)
• Mean Time To Failure (MTTF)
= E(T) = ∫_0^∞ t fT(t) dt = ∫_0^∞ (1 – FT(t)) dt
= ∫_0^∞ R(t) dt
Reliability Concepts
• Remaining Life Time at t
P(T ≤ t + x | T > t) = [F(t+x) – F(t)] / R(t)
• Expected Remaining Life Time at t
E(T | T > t) – t = ∫_0^∞ [1 – F(t+x)] / [1 – F(t)] dx
Hazard Rate or Failure Rate
Gt(x) = P(A component fails in (t, t+x] | it survived up to t)
= P(t < T ≤ t + x | T > t)
= [F(t+x) – F(t)] / [1 – F(t)]
= [R(t) – R(t+x)] / R(t)
Instantaneous Failure Rate
R(t)  R(t  x) f ( t )
1 d
h(t ) 


R(t )
R(t )
R(t ) R(t ) dt
h(t)∆t ≈ probability that a component surviving up
to t fails during t to t+∆t
Cumulative Hazard
H(t )   h( x)dx
t

Some important relations
R(t) = 1 – F(t)
R(t) = exp(-H(t))
Exponential Distribution
T ~ EXP(λ)
R(t) = e^(–λt)
MTTF = 1/λ
Remaining Life:
P(T > t + x | T > t) = e^(–λx)  (Memoryless Property)
E(T | T > t) – t = 1/λ
h(t) = λ  (Constant Failure Rate)
H(t) = λt
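A Python simulation sketch of the memoryless property; λ, t, and x are illustrative:

import math
import random

lam, t, x = 1.0, 2.0, 0.5
rng = random.Random(7)
samples = [rng.expovariate(lam) for _ in range(500_000)]
# Among lifetimes that survive past t, the remaining life is EXP(lam) again:
survivors = [s for s in samples if s > t]
frac = sum(s > t + x for s in survivors) / len(survivors)
print(frac, math.exp(-lam * x))   # both ≈ 0.6065, independent of t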
Weibull Distribution
T ~ WEI(α, λ)
R(t) = e^(–λt^α)
MTTF = λ^(–1/α) Γ(1 + 1/α)
P(T > t + x | T > t) = e^(–λ(t+x)^α) / e^(–λt^α)
h(t) = λαt^(α–1)
H(t) = λt^α
Often used in practice to allow for age-dependent failure rates.
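A tiny Python illustration of this age dependence: α > 1 gives an increasing (wear-out) hazard, α < 1 a decreasing one, and α = 1 recovers the constant exponential rate; the numbers are mine:

lam = 0.5
for alpha in (0.5, 1.0, 2.0):
    # h(t) = lam * alpha * t^(alpha - 1) at a few ages t
    rates = [lam * alpha * t ** (alpha - 1) for t in (0.5, 1.0, 2.0)]
    print(alpha, [round(h, 3) for h in rates])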