Transcript ppt - COINS
Al-Imam Mohammad bin Saud University
CS433: Modeling and Simulation
Lecture 04: Statistical Models
Dr. Anis Koubâa
09 October 2010
Goals for Today
Review of the most common statistical models
Understand how to determine the empirical distribution
from a statistical sample
Topics
Discrete Probability Distributions
• Binomial Distribution
• Bernoulli Distribution
• Geometric Distribution
• Poisson Distribution/Poisson Process
Continuous Probability Distribution
• Uniform
• Exponential
• Normal
Empirical Distributions
Discrete and Continuous
Random Variables
Discrete versus Continuous Random Variables
Discrete Random Variable
Finite Sample Space, e.g. {0, 1, 2, 3}
Probability Mass Function (PMF): $p(x_i) = P(X = x_i)$
1. $p(x_i) \ge 0$, for all $i$
2. $\sum_{i=1}^{\infty} p(x_i) = 1$
Cumulative Distribution Function (CDF): $P(X \le x) = \sum_{x_i \le x} p(x_i)$
Continuous Random Variable
Infinite Sample Space, e.g. [0,1], [2.1, 5.3]
Probability Density Function (PDF): $f(x)$
1. $f(x) \ge 0$, for all $x$ in $R_X$
2. $\int_{R_X} f(x)\,dx = 1$
3. $f(x) = 0$, if $x$ is not in $R_X$
Cumulative Distribution Function (CDF): $P(X \le x) = \int_{-\infty}^{x} f(t)\,dt$
$P(a \le X \le b) = \int_{a}^{b} f(x)\,dx$
Expectation
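E(X): The expected value (the mean) of X
• Discrete: $E(X) = \sum_{\text{all } i} x_i\, p(x_i)$
• Continuous: $E(X) = \int_{-\infty}^{\infty} x f(x)\,dx$
Var(X): The variance of X, with $\mu_x = E(X)$
• Discrete: $V(X) = \sigma^2 = \sum_{i=1}^{n} (x_i - \mu_x)^2\, p(x_i) = \sum_{i=1}^{n} x_i^2\, p(x_i) - \mu_x^2$
• Continuous: $V(X) = \sigma^2 = \int_{-\infty}^{\infty} (x - \mu_x)^2 f(x)\,dx = \int_{-\infty}^{\infty} x^2 f(x)\,dx - \mu_x^2$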
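The discrete formulas for the mean and the variance can be checked numerically. The sketch below is only an illustration: the helper name `mean_and_variance` and the PMF values on the sample space {0, 1, 2, 3} are assumptions, not part of the lecture.

```python
def mean_and_variance(pmf):
    """Return (E[X], V[X]) for a discrete PMF given as {value: probability}."""
    total = sum(pmf.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # E[X] = sum of x_i * p(x_i)
    mean = sum(x * p for x, p in pmf.items())
    # V[X] = sum of x_i^2 * p(x_i) minus the squared mean
    second_moment = sum(x * x * p for x, p in pmf.items())
    return mean, second_moment - mean ** 2


# Hypothetical PMF on the sample space {0, 1, 2, 3}
pmf = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.4}
mean, variance = mean_and_variance(pmf)
print(mean, variance)
```

The variance is computed with the shortcut form (second moment minus squared mean), which matches the second expression in each variance formula.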
Statistical Models: Properties
Context of applications
Probability distribution (PMF, PDF)
Cumulative Distribution Function (CDF)
Mean/Expected Value
Variance and Standard Deviation
Discrete Probability Distributions
Bernoulli Trials
Binomial Distribution
Geometric Distribution
Poisson Distribution
Poisson Process
Modeling of Random Events with
Two-States
Bernoulli Trials
Binomial Distribution
Bernoulli Trials
Context: Random events with two possible values
Two events: Yes/No, True/False, Success/Failure
Two possible values: 1 for success, 0 for failure.
Example: Tossing a coin, Packet Transmission Status
An experiment comprises n trials.
Probability Mass Function (PMF): Probability in one trial
PMF: $p(x_j) = \begin{cases} p, & x_j = 1 \\ 1 - p = q, & x_j = 0 \\ 0, & \text{otherwise} \end{cases} \quad j = 1, 2, \ldots, n$
Expected Value: $E(X_j) = p$
Variance: $V(X_j) = \sigma^2 = p(1 - p)$
Bernoulli Process: n trials
$p(x_1, x_2, \ldots, x_n) = p(x_1)\, p(x_2) \cdots p(x_n)$
Binomial Distribution
Context: Number of successes in a series of n trials.
Example: the number of 'heads' occurring when a coin is tossed 50 times.
The number of successful transmissions of 100 packets
Probability Mass Function (PMF)
Binomial distribution with parameters n and p, X ~ B(n,p)
Probability of having k successes in n trials:
$P(X = k) = \binom{n}{k} p^k (1 - p)^{n-k}$
where k = 0, 1, 2, ..., n; n = 1, 2, 3, ...; 0 < p < 1
with $\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$
k: number of successes
n: number of trials
p: success probability
Expected Value: $E(X) = n p$
Variance: $V(X) = \sigma^2 = n p (1 - p)$
Binomial Distribution
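Probability of k=2 successes in n=3 trials?
E1: 1 1 0
P(E1) = P(1 and 1 and 0) = p · p · (1-p) = p^2 (1-p)
or
E2: 1 0 1
P(E2) = P(1 and 0 and 1) = p · (1-p) · p = p^2 (1-p)
or
E3: 0 1 1
P(E3) = P(0 and 1 and 1) = (1-p) · p · p = p^2 (1-p)
The three events are mutually exclusive, so their probabilities add up:
$P(X = 2) = P(E_1) + P(E_2) + P(E_3) = 3 p^2 (1 - p)$
This agrees with the binomial formula:
$\binom{3}{2} p^2 (1 - p) = \frac{3!}{2!\,(3-2)!}\, p^2 (1 - p) = \frac{6}{2}\, p^2 (1 - p)$, using $\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$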
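As a quick check of the k=2, n=3 case, the binomial PMF can be evaluated directly. This is a minimal sketch: the function name `binomial_pmf` and the value p = 0.3 are chosen here for illustration only.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


p = 0.3  # hypothetical success probability
by_formula = binomial_pmf(2, 3, p)
by_enumeration = 3 * p ** 2 * (1 - p)  # P(E1) + P(E2) + P(E3)

# The PMF sums to 1 over k = 0..n and its mean equals n * p
n = 50
total = sum(binomial_pmf(k, n, p) for k in range(n + 1))
mean = sum(k * binomial_pmf(k, n, p) for k in range(n + 1))
print(by_formula, by_enumeration, total, mean)
```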
End of Part 01
Administrative issues
• Groups Formation
Modeling of Discrete Random
Time
Geometric Distribution
Geometric Distribution
Context: the number of Bernoulli trials until achieving the
FIRST SUCCESS.
It is used to represent random time until a first transition occurs.
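PMF: $P(X = k) = \begin{cases} q^{k-1} p, & k = 1, 2, \ldots \\ 0, & \text{otherwise} \end{cases}$, where $q = 1 - p$
CDF: $F(k) = P(X \le k) = 1 - (1 - p)^k$
Expected Value: $E(X) = \frac{1}{p}$
Variance: $V(X) = \sigma^2 = \frac{q}{p^2} = \frac{1 - p}{p^2}$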
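The geometric PMF, CDF and mean can be cross-checked with a few lines of code. The sketch below assumes the "number of trials until the first success" convention (k starts at 1); the function names and p = 0.2 are illustrative assumptions.

```python
def geometric_pmf(k, p):
    """P(X = k): first success occurs exactly on trial k (k = 1, 2, ...)."""
    if k < 1:
        return 0.0
    return (1 - p) ** (k - 1) * p


def geometric_cdf(k, p):
    """P(X <= k): first success occurs within the first k trials."""
    if k < 1:
        return 0.0
    return 1 - (1 - p) ** k


p = 0.2  # hypothetical success probability
cdf_by_sum = sum(geometric_pmf(i, p) for i in range(1, 6))
# Truncated series for E[X]; the neglected tail is negligible for p = 0.2
approx_mean = sum(k * geometric_pmf(k, p) for k in range(1, 1000))
print(cdf_by_sum, geometric_cdf(5, p), approx_mean)
```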
Modeling of Random Number of
Arrivals/Events
Poisson Distribution
Poisson Process
Poisson Distribution
Context: number of events occurring in a fixed period of time
The Poisson distribution is characterized by the average rate λ: the average number of arrivals in the fixed time period.
Events occur with a known average rate and are independent.
Examples
The number of cars passing a fixed point in a 5 minute interval. Average rate: λ = 3 cars/5 minutes
The number of calls received by a switchboard during a given period of time. Average rate: λ = 3 calls/minute
The number of messages coming to a router per second
The number of travelers arriving at the airport for flight registration
Poisson Distribution
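The Poisson distribution with the average rate parameter λ
PMF: $p(k) = P(X = k) = \begin{cases} \frac{\lambda^k}{k!} e^{-\lambda}, & k = 0, 1, 2, \ldots \\ 0, & \text{otherwise} \end{cases}$
the probability that there are exactly k arrivals in a certain period of time.
CDF: $F(k) = P(X \le k) = \sum_{i=0}^{k} \frac{\lambda^i}{i!} e^{-\lambda}$
the probability that there are at most k arrivals in a certain period of time.
Expected value: $E(X) = \lambda$
Variance: $V(X) = \lambda$
[Figures: PMF and CDF of the Poisson distribution]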
Example: Poisson Distribution
The number of cars that enter the parking lot follows a Poisson distribution with a mean rate equal to λ = 20 cars/hour
The probability of having exactly 15 cars entering the parking lot in one hour:
$p(15) = P(X = 15) = \frac{20^{15}}{15!} e^{-20} = 0.051649$
or
$p(15) = F(15) - F(14) = 0.156513 - 0.104864 = 0.051649$
The probability of having more than 3 cars entering the parking lot in one hour:
$P(X > 3) = 1 - P(X \le 3) = 1 - F(3) = 1 - [p(0) + p(1) + p(2) + p(3)] = 0.9999967$
USE EXCEL/MATLAB FOR COMPUTATIONS
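The same numbers can be reproduced in a few lines of code as an alternative to Excel or MATLAB. This is a sketch using only the standard library; the function names `poisson_pmf` and `poisson_cdf` are illustrative.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """P(X = k) for a Poisson distribution with average rate lam."""
    return lam ** k / factorial(k) * exp(-lam)


def poisson_cdf(k, lam):
    """P(X <= k), obtained by summing the PMF from 0 to k."""
    return sum(poisson_pmf(i, lam) for i in range(k + 1))


lam = 20  # cars/hour, as in the parking example
p_exactly_15 = poisson_pmf(15, lam)
p_15_from_cdf = poisson_cdf(15, lam) - poisson_cdf(14, lam)
p_more_than_3 = 1 - poisson_cdf(3, lam)
print(p_exactly_15, p_15_from_cdf, p_more_than_3)
```

For large k or large λ the direct formula can overflow; working with logarithms is a common workaround.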
Example: Poisson Distribution
Probability Mass Function, Poisson (λ = 20 cars/hour):
$P(X = k) = \frac{20^k}{k!} e^{-20}$
Cumulative Distribution Function, Poisson (λ = 20 cars/hour):
$F(k) = P(X \le k) = \sum_{i=0}^{k} \frac{20^i}{i!} e^{-20}$
Five Minutes Break
You are free to discuss with your classmates about the previous slides, or to
refresh a bit, or to ask questions.
Administrative issues
• Groups Formation
Modeling of Random Number of
Arrivals/Events
Poisson Distribution
Poisson Process
Poisson Process
Process N(t)
random variable N that depends on time t
Poisson Process: a Process that has a Poisson Distribution
N(t) is called a Poisson counting process
There are two types:
Homogeneous Poisson Process: constant average rate
Non-Homogeneous Poisson Process: variable average rate
What is “Stochastic Process”?
State Space = {SUNNY, RAINY}
X(day_i) = "S" or "R": RANDOM VARIABLE that varies with the DAY
Day 1 (THU): X(day_1) = "S"
Day 2 (FRI): X(day_2) = "S"
Day 3 (SAT): X(day_3) = "R"
Day 4 (SUN): X(day_4) = "S"
Day 5 (MON): X(day_5) = "R"
Day 6 (TUE): X(day_6) = "S"
Day 7 (WED): X(day_7) = "S"
X(day_i) IS A STOCHASTIC PROCESS
X(day_i): Status of the weather observed each DAY
Examples of using Poisson Process
The number of web page requests arriving at a server may be
characterized by a Poisson process except for unusual
circumstances such as coordinated denial of service attacks.
The number of telephone calls arriving at a switchboard, or at an
automatic phone-switching system, may be characterized by a
Poisson process.
The arrival of "customers" is commonly modelled as a Poisson
process in the study of simple queueing systems.
The execution of trades on a stock exchange, as viewed on a tick by
tick basis, is a Poisson process.
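(Homogeneous) Poisson Process
Formally, a counting process $\{N(\tau), \tau \ge 0\}$ is a (homogeneous) Poisson process with constant average rate $\lambda$ if, for $t \ge 0$ and $n = 0, 1, 2, \ldots$:
PMF: $P\big(N(t + \tau) - N(t) = n\big) = P\big(N(\tau) = n\big) = \frac{(\lambda\tau)^n}{n!} e^{-\lambda\tau}$
[Figure: time axis marked at t, t+τ and t+τ+q, with N(τ) events in the first interval and N(q) events in the second]
$N(t + \tau) - N(t)$ describes the number of events in the time interval $(t, t + \tau]$
The mean and the variance are equal: $E[N(\tau)] = V[N(\tau)] = \lambda\tau$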
(Homogeneous) Poisson Process
Properties of Poisson process
Arrivals occur one at a time (not simultaneous)
$\{N(\tau), \tau \ge 0\}$ has stationary increments, which means that $N(t) - N(s)$ has the same distribution as $N(t - s)$
The number of arrivals in time s to t is also Poisson-distributed with mean $\lambda(t - s)$
$\{N(\tau), \tau \ge 0\}$ has independent increments
Inter-Arrival Times of a Poisson Process
Inter-arrival time: time between two consecutive arrivals
The inter-arrival times of a Poisson process are random.
What is their distribution?
Consider the inter-arrival times of a Poisson process (A1, A2, …), where A1 is the time of the first arrival and Ai is the elapsed time between arrival i-1 and arrival i
The first arrival occurs after time t MEANS that there are no arrivals in the interval [0,t]. As a consequence:
$P(A_1 > t) = P(N(t) = 0) = e^{-\lambda t}$
$P(A_1 \le t) = 1 - P(A_1 > t) = 1 - e^{-\lambda t}$ (the CDF of the exponential distribution)
The inter-arrival times of a Poisson process are exponentially distributed and independent with mean $1/\lambda$
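This result suggests a simple way to simulate a homogeneous Poisson process: keep adding exponentially distributed inter-arrival times until the time horizon is passed. The sketch below is one possible illustration; the rate, horizon, seed and function name are assumptions chosen for the example.

```python
import random


def simulate_arrivals(rate, horizon, rng):
    """Arrival times in [0, horizon] of a homogeneous Poisson process."""
    arrivals = []
    t = rng.expovariate(rate)  # first inter-arrival time, mean 1/rate
    while t <= horizon:
        arrivals.append(t)
        t += rng.expovariate(rate)  # next exponential inter-arrival time
    return arrivals


rng = random.Random(42)  # fixed seed so the run is repeatable
rate, horizon, runs = 3.0, 10.0, 2000
counts = [len(simulate_arrivals(rate, horizon, rng)) for _ in range(runs)]

# For a Poisson process, both should be close to rate * horizon = 30
mean_count = sum(counts) / runs
var_count = sum((c - mean_count) ** 2 for c in counts) / (runs - 1)
print(mean_count, var_count)
```

The sample mean and sample variance of the counts should both be near λτ, in line with the property that the mean and variance of N(τ) are equal.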
Splitting and Pooling
Splitting
N(t) ~ Poi(λ) is split into two streams, with probabilities p and (1-p):
N1(t) ~ Poi[λp]
N2(t) ~ Poi[λ(1-p)]
Pooling
N1(t) ~ Poi[λ1] and N2(t) ~ Poi[λ2] are pooled into a single stream:
N(t) ~ Poi(λ1 + λ2)
Modeling of Random Number of
Arrivals/Events
Poisson Distribution
Non Homogenous Poisson Process
Non Homogenous (Non-stationary) Poisson Process
(NSPP)
The non homogeneous Poisson process is
characterized by a VARIABLE rate parameter λ(t),
the arrival rate at time t. In general, the rate parameter
may change over time.
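[Figure: arrival rate over time, with successive rates λ1, λ2, λ3]
The stationary increments property is not satisfied:
$\forall s, t: N(t) - N(s) \ne N(t - s)$
The expected number of events (e.g. arrivals) between time s and time t is
$\lambda_{s,t} = \int_{s}^{t} \lambda(u)\,du$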
Example: Non-stationary Poisson Process
(NSPP)
The number of cars that cross the intersection of King Fahd Road and Al-Ourouba Road is distributed according to a non-homogeneous Poisson process with a mean rate λ(t) defined as follows:
λ(t) = 80 cars/min if 8:00 ≤ t ≤ 9:00
λ(t) = 60 cars/min if 9:00 < t ≤ 11:00
λ(t) = 50 cars/min if 11:00 < t ≤ 15:00
λ(t) = 70 cars/min if 15:00 < t ≤ 17:00
Let us consider the time 8:00 as t=0.
Q1. Compute the average number of car arrivals up to 11:30.
Q2. Determine the equation that gives the probability of having exactly 10000 car arrivals between 12:00 and 16:00.
Q3. What is the distribution and the average (in seconds) of the inter-arrival time of two cars between 8:00 and 9:00?
Example: Non-stationary Poisson Process
(NSPP)
Q1. Compute the average number of car arrivals up to 11:30.
$\lambda_{8:00,11:30} = \int_{8:00}^{11:30} \lambda(u)\,du = \int_{8:00}^{9:00} \lambda(u)\,du + \int_{9:00}^{11:00} \lambda(u)\,du + \int_{11:00}^{11:30} \lambda(u)\,du$
$= 80 \text{ cars/min} \times 60 \text{ min} + 60 \text{ cars/min} \times 120 \text{ min} + 50 \text{ cars/min} \times 30 \text{ min} = 13500 \text{ cars}$
Q2. Determine the equation that gives the probability of having exactly 10000 car arrivals between 12:00 and 16:00.
We know that the number of cars between 12:00 and 16:00, i.e. N(16) - N(12), follows a Poisson distribution. Between 12:00 and 16:00, the average number of cars is
$\lambda_{12:00,16:00} = 180 \times 50 + 60 \times 70 = 13200 \text{ cars}$
Thus,
$P\big(N(16) - N(12) = 10000\big) = \frac{13200^{10000}}{10000!} e^{-13200}$
Q3. What is the distribution and the average (in seconds) of the inter-arrival time of two cars between 8:00 and 9:00? (Homework)
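The answers to Q1 and Q2 can be checked by integrating the piecewise-constant rate segment by segment. The sketch below is an illustration only: times are expressed in minutes after 8:00, and the names `SEGMENTS`, `expected_arrivals` and `poisson_log_pmf` are assumptions. The Q2 probability is evaluated in log form because 13200^10000 and 10000! are too large for ordinary floating-point arithmetic.

```python
from math import lgamma, log

# (start_min, end_min, cars_per_min), with t = 0 at 8:00
SEGMENTS = [(0, 60, 80), (60, 180, 60), (180, 420, 50), (420, 540, 70)]


def expected_arrivals(s, t):
    """Integral of the rate between s and t (minutes after 8:00)."""
    total = 0.0
    for start, end, rate in SEGMENTS:
        overlap = min(t, end) - max(s, start)
        if overlap > 0:
            total += rate * overlap
    return total


def poisson_log_pmf(n, mean):
    """Natural log of mean^n / n! * exp(-mean)."""
    return n * log(mean) - mean - lgamma(n + 1)


q1 = expected_arrivals(0, 210)         # 8:00 to 11:30
q2_mean = expected_arrivals(240, 480)  # 12:00 to 16:00
q2_log_prob = poisson_log_pmf(10000, q2_mean)
print(q1, q2_mean, q2_log_prob)
```

The log-probability is a large negative number, which reflects how unlikely exactly 10000 arrivals are when 13200 are expected.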
Two Minutes Break
You are free to discuss with your classmates about the previous slides, or
to refresh a bit, or to ask questions.
Administrative issues
• Groups Formation
Continuous Probability Distributions
Uniform Distribution
Exponential Distribution
Normal (Gaussian) Distribution
Uniform Distribution
Continuous Uniform Distribution
The continuous uniform distribution is a family of probability
distributions such that for each member of the family, all intervals of
the same length on the distribution's support are equally probable
A random variable X is uniformly distributed on the interval [a,b],
U(a,b), if its PDF and CDF are:
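PDF: $f(x) = \begin{cases} \frac{1}{b - a}, & a \le x \le b \\ 0, & \text{otherwise} \end{cases}$
CDF: $F(x) = \begin{cases} 0, & x < a \\ \frac{x - a}{b - a}, & a \le x < b \\ 1, & x \ge b \end{cases}$
Expected value: $E(X) = \frac{a + b}{2}$
Variance: $V(X) = \sigma^2 = \frac{(b - a)^2}{12}$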
Uniform Distribution
Properties
$P(x_1 < X < x_2)$ is proportional to the length of the interval:
$F(x_2) - F(x_1) = \frac{x_2 - x_1}{b - a}$
Special case: a standard uniform distribution U(0,1). Very useful for random number generators in simulators.
[Figures: PDF and CDF of the uniform distribution]
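A short sketch of how these properties are used in practice: the CDF gives interval probabilities directly, and a U(0,1) sample can be rescaled into a U(a,b) sample. The function names, the interval [2, 10] and the seed are assumptions made for illustration.

```python
import random


def uniform_cdf(x, a, b):
    """CDF of the continuous uniform distribution U(a, b)."""
    if x < a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)


def sample_uniform(a, b, rng):
    """Rescale a standard uniform U(0,1) draw to the interval [a, b)."""
    return a + (b - a) * rng.random()


a, b = 2.0, 10.0
# P(4 < X < 6) depends only on the interval length: (6 - 4) / (10 - 2)
prob = uniform_cdf(6.0, a, b) - uniform_cdf(4.0, a, b)

rng = random.Random(1)  # fixed seed for repeatability
samples = [sample_uniform(a, b, rng) for _ in range(1000)]
print(prob, min(samples), max(samples))
```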
Exponential Distribution
Modeling Random Time
Exponential Distribution
The exponential distribution describes the times between
events in a Poisson process, in which events occur continuously
and independently at a constant average rate.
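A random variable X is exponentially distributed with parameter $\lambda = 1/\mu > 0$ if its PDF and CDF are:
PDF: $f(x) = \begin{cases} \lambda e^{-\lambda x}, & x \ge 0 \\ 0, & \text{otherwise} \end{cases}$, or equivalently $f(x) = \begin{cases} \frac{1}{\mu} e^{-x/\mu}, & x \ge 0 \\ 0, & \text{otherwise} \end{cases}$
CDF: $F(x) = \begin{cases} 0, & x < 0 \\ \int_{0}^{x} \lambda e^{-\lambda t}\,dt = 1 - e^{-\lambda x}, & x \ge 0 \end{cases}$, or equivalently $F(x) = \begin{cases} 0, & x < 0 \\ 1 - e^{-x/\mu}, & x \ge 0 \end{cases}$
Expected value: $E(X) = \frac{1}{\lambda} = \mu$
Variance: $V(X) = \sigma^2 = \frac{1}{\lambda^2} = \mu^2$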
Exponential Distribution
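[Figure: PDF for µ = 20]
$f(x) = \begin{cases} \frac{1}{20} e^{-x/20}, & x \ge 0 \\ 0, & \text{otherwise} \end{cases}$
[Figure: CDF for µ = 20]
$F(x) = \begin{cases} 0, & x < 0 \\ 1 - e^{-x/20}, & x \ge 0 \end{cases}$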
Exponential Distribution
The memoryless property: In probability theory, memorylessness is a property of certain probability distributions (the exponential distributions and the geometric distributions): the remaining waiting time does not depend on how much time has already elapsed, so the distribution keeps no information (i.e. "memory") of the past.
Formally, the memoryless property is:
For all s and t greater than or equal to 0:
$P(X > s + t \mid X > s) = P(X > t)$
This means that the future does not depend on the past, but only on the present.
The fact that Pr(X > 40 | X > 30) = Pr(X > 10) does not mean that the events X > 40 and X > 30 are independent; i.e. it does not mean that Pr(X > 40 | X > 30) = Pr(X > 40).
Exponential Distribution
The memoryless property can be read as “the probability that you will wait more than s+t minutes given that you have already been waiting s minutes is equal to the probability that you will wait more than t minutes.”
In other words, “the probability that you will wait more than t additional minutes given that you have already been waiting s minutes is the same as the probability that you would have to wait more than t minutes from the beginning.”
$P(X > s + t \mid X > s) = P(X > t)$
Example: Exponential Distribution
The time needed to repair the engine of a car is exponentially
distributed with a mean time equal to 3 hours.
The probability that the car spends more than 3 hours in repair:
$P(X > 3) = 1 - P(X \le 3) = 1 - F(3) = 1 - \left(1 - e^{-3/3}\right) = e^{-1} = 0.368$
The probability that the car repair time lasts between 2 and 3 hours is:
$P(2 \le X \le 3) = F(3) - F(2) = 0.145$
The probability that the repair lasts for at least another hour given that it has already lasted 2.5 hours:
Using the memoryless property of the exponential distribution, we have:
$P(X > 2.5 + 1 \mid X > 2.5) = P(X > 1) = 1 - P(X \le 1) = e^{-1/3} = 0.717$
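The three results of the repair example can be verified numerically, including the memoryless property itself. This is a minimal sketch; the function name `exponential_cdf` is an illustrative choice.

```python
from math import exp


def exponential_cdf(x, mean):
    """CDF of the exponential distribution with the given mean (1/lambda)."""
    if x < 0:
        return 0.0
    return 1 - exp(-x / mean)


mean_repair = 3.0  # hours

p_more_than_3 = 1 - exponential_cdf(3, mean_repair)
p_between_2_and_3 = exponential_cdf(3, mean_repair) - exponential_cdf(2, mean_repair)

# Conditional probability computed from its definition, without memorylessness
p_conditional = (1 - exponential_cdf(3.5, mean_repair)) / (1 - exponential_cdf(2.5, mean_repair))
# Memoryless shortcut: P(X > 1)
p_shortcut = 1 - exponential_cdf(1, mean_repair)
print(p_more_than_3, p_between_2_and_3, p_conditional, p_shortcut)
```

The last two values agree, which is exactly what the memoryless property states.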
Normal (Gaussian) Distribution
Normal Distribution
The Normal distribution, also called the Gaussian
distribution, is an important family of continuous probability
distributions, applicable in many fields.
Each member of the family may be defined by two parameters,
location and scale: the mean ("average", μ) and variance
(standard deviation squared, σ²) respectively.
The importance of the normal distribution as a model of
quantitative phenomena in the natural and behavioral sciences
is due in part to the Central Limit Theorem.
It is usually used to model system error (e.g. channel error), the
distribution of natural phenomena, height, weight, etc.
Normal or Gaussian Distribution
A continuous random variable X, taking all real values in the
range (-∞,+∞) is said to follow a Normal distribution with
parameters µ and σ if it has the following PDF and CDF:
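PDF: $f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2\right)$
CDF: $F(x) = \frac{1}{2}\left[1 + \operatorname{erf}\left(\frac{x - \mu}{\sigma\sqrt{2}}\right)\right]$
where Error Function: $\operatorname{erf}(x) = \frac{2}{\sqrt{\pi}} \int_{0}^{x} e^{-t^2}\,dt$
The Normal distribution is denoted as $X \sim N(\mu, \sigma^2)$
This probability density function (PDF) is a symmetrical, bell-shaped curve, centered at its expected value µ.
The variance is $\sigma^2$.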
Normal distribution
Example
The simplest case of the normal distribution, known as the Standard
Normal Distribution, has expected value zero and variance one. This is
written as N(0,1).
Normal Distribution
Evaluating the distribution:
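Independent of $\mu$ and $\sigma$, using the standard normal distribution:
$Z \sim N(0, 1)$
Transformation of variables: let $Z = \frac{X - \mu}{\sigma}$
$F(x) = P(X \le x) = P\left(Z \le \frac{x - \mu}{\sigma}\right) = \int_{-\infty}^{(x - \mu)/\sigma} \frac{1}{\sqrt{2\pi}} e^{-z^2/2}\,dz = \int_{-\infty}^{(x - \mu)/\sigma} \phi(z)\,dz = \Phi\left(\frac{x - \mu}{\sigma}\right)$
where $\Phi(z) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}} e^{-t^2/2}\,dt$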
Normal Distribution
Example: The time required to load an oceangoing vessel, X, is
distributed as N(12,4)
The probability that the vessel is loaded in less than 10 hours:
$F(10) = \Phi\left(\frac{10 - 12}{2}\right) = \Phi(-1) = 0.1587$
Using the symmetry property, $\Phi(1)$ is the complement of $\Phi(-1)$: $\Phi(-1) = 1 - \Phi(1)$
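The vessel-loading example can be evaluated with the error function form of the normal CDF. The sketch below uses `math.erf` from the standard library; the function name `normal_cdf` is an illustrative choice, and N(12,4) is read as mean 12 and variance 4 (so σ = 2).

```python
from math import erf, sqrt


def normal_cdf(x, mu, sigma):
    """CDF of N(mu, sigma^2), written with the error function."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))


mu, sigma = 12.0, 2.0  # N(12, 4): variance 4, standard deviation 2
p_less_than_10 = normal_cdf(10, mu, sigma)

# Standard normal values used by the symmetry argument
phi_minus_1 = normal_cdf(-1, 0.0, 1.0)
phi_plus_1 = normal_cdf(1, 0.0, 1.0)
print(p_less_than_10, phi_minus_1, phi_plus_1)
```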
Empirical Distribution
Empirical Distributions
An Empirical Distribution is a distribution whose
parameters are the observed values in a sample of data.
May be used when it is impossible or unnecessary to establish
that a random variable has any particular parametric
distribution.
Advantage: no assumption beyond the observed values in the
sample.
Disadvantage: sample might not cover the entire range of
possible values.
Empirical Distributions
In statistics, an empirical distribution function is a cumulative
probability distribution function that concentrates probability
1/n at each of the n numbers in a sample.
Let X1, X2, …, Xn be iid random variables in $\mathbb{R}$ with the CDF equal to F(x).
The empirical distribution function Fn(x) based on the sample X1, X2, …, Xn is a step function defined by
$F_n(x) = \frac{\text{number of elements in the sample} \le x}{n} = \frac{1}{n} \sum_{i=1}^{n} I(X_i \le x)$
where I(A) is the indicator of event A: $I(X_i \le x) = \begin{cases} 1, & \text{if } X_i \le x \\ 0, & \text{otherwise} \end{cases}$
For a fixed value x, I(Xi≤x) is a Bernoulli random variable with parameter p=F(x), hence nFn(x) is a binomial random variable with mean nF(x) and variance nF(x)(1-F(x)).
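The definition of Fn(x) translates almost directly into code. The sketch below is one possible implementation: the function name `make_ecdf` and the sample values are hypothetical, and sorting plus binary search is used here only as a convenient way to count the elements less than or equal to x.

```python
from bisect import bisect_right


def make_ecdf(sample):
    """Return F_n, the empirical distribution function of the sample."""
    data = sorted(sample)
    n = len(data)
    if n == 0:
        raise ValueError("sample must not be empty")

    def ecdf(x):
        # bisect_right counts the elements of the sorted sample that are <= x
        return bisect_right(data, x) / n

    return ecdf


# Hypothetical sample of n = 5 observations
sample = [2.1, 3.4, 1.7, 3.4, 5.0]
F_n = make_ecdf(sample)
print(F_n(1.0), F_n(1.7), F_n(3.4), F_n(10.0))
```

Each observation contributes a jump of 1/n, so the repeated value 3.4 produces a jump of 2/n at that point.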
End of Chapter 4