Probability distribution functions
• Normal distribution
• Lognormal distribution
• Mean, median and mode
• Tails
• Extreme value distributions
Normal (Gaussian) distribution
• Normal density function
$f_X(x) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right]$
• What does the figure tell us about the values of the
CDF?
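The density formula can be checked numerically. The sketch below uses Python's standard library as a stand-in for MATLAB (the helpers `normal_pdf` and `trapezoid` are hypothetical names, and the values of mu and sigma are arbitrary). It evaluates the peak value $1/(\sigma\sqrt{2\pi})$, confirms that the area under the density is about 1, and shows that the area to the left of the mean, which is the CDF at $x=\mu$, is about 0.5.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """f_X(x) = exp(-((x - mu)/sigma)^2 / 2) / (sigma * sqrt(2*pi))."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def trapezoid(f, a, b, n=10_000):
    """Composite trapezoidal rule for the integral of f over [a, b] with n panels."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h

mu, sigma = 2.0, 0.5

def pdf(t):
    return normal_pdf(t, mu, sigma)

peak = pdf(mu)                                             # maximum of the density, at x = mu
area = trapezoid(pdf, mu - 10 * sigma, mu + 10 * sigma)    # total probability, ~1
cdf_at_mean = trapezoid(pdf, mu - 10 * sigma, mu)          # CDF evaluated at x = mu, ~0.5
print(f"peak {peak:.5f}, area {area:.8f}, CDF(mu) {cdf_at_mean:.8f}")
```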
More on the normal distribution
• P = normcdf(X,MU,SIGMA) returns the cdf of the normal
distribution with mean MU and standard deviation SIGMA,
evaluated at the values in X. The size of P is the common size
of X, MU and SIGMA.
• normcdf(1)=0.8413.
• 1-normcdf(6)= 9.8659e-010
• If X is normally distributed, Y=aX+b is also normally
distributed. What would be the mean and standard deviation
of Y?
• Notation: $N(\mu, \sigma^2)$
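Outside MATLAB, `normcdf` can be reproduced with the error function. A minimal Python sketch (hypothetical helper names `normcdf` and `norm_upper_tail`; only `math.erf`/`math.erfc` are library calls) recovers the two values quoted above, using `erfc` for the far tail to avoid subtracting nearly equal numbers. It then checks the Y = aX + b question by simulation, with arbitrary example values of a, b, mu and sigma.

```python
import math
import random
import statistics

def normcdf(x, mu=0.0, sigma=1.0):
    """Normal CDF via the error function: Phi(z) = (1 + erf(z / sqrt(2))) / 2."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def norm_upper_tail(x, mu=0.0, sigma=1.0):
    """1 - CDF computed with erfc, which avoids cancellation far in the tail."""
    return 0.5 * math.erfc((x - mu) / (sigma * math.sqrt(2.0)))

print(round(normcdf(1), 4))          # compare with normcdf(1)
print(f"{norm_upper_tail(6):.4e}")   # compare with 1-normcdf(6)

# Y = aX + b with X ~ N(mu, sigma^2): estimate the mean and standard deviation of Y
random.seed(0)
mu, sigma, a, b = 1.0, 2.0, -3.0, 4.0
y = [a * random.gauss(mu, sigma) + b for _ in range(200_000)]
print(statistics.fmean(y), statistics.stdev(y))  # compare with a*mu + b and |a|*sigma
```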
Estimating mean and standard deviation
• Given a sample from a normally distributed variable,
the sample mean is the best linear unbiased
estimator of the true mean.
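• For the variance, the equation below gives the best unbiased estimator, but its square root is not an unbiased estimator of the standard deviation:
$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad s^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2$
x=randn(5,10000); s=std(x);   % 10000 samples of size 5 from N(0,1); std uses the n-1 divisor
mean(s)                        % 0.9463, noticeably below sigma = 1
s2=s.^2; mean(s2)              % 1.0106, close to sigma^2 = 1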
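The same experiment can be sketched in Python with the standard library (`statistics.variance` uses the n-1 divisor, like MATLAB's `std`). The helper `c4(n)` is not on the slide: it is the known closed form for the expected value of s/σ in normal samples, included as a cross-check on how far below 1 the average of s should land for n = 5.

```python
import math
import random
import statistics

def c4(n):
    """Expected value of s / sigma for normal samples of size n (s uses the n-1 divisor)."""
    return math.sqrt(2.0 / (n - 1)) * math.gamma(n / 2) / math.gamma((n - 1) / 2)

random.seed(1)
n, trials = 5, 20_000
sds, variances = [], []
for _ in range(trials):
    sample = [random.gauss(0.0, 1.0) for _ in range(n)]   # true mean 0, true sigma 1
    variances.append(statistics.variance(sample))          # n-1 divisor: unbiased for sigma^2
    sds.append(math.sqrt(variances[-1]))                   # its square root: biased low for sigma

mean_var = statistics.fmean(variances)
mean_sd = statistics.fmean(sds)
print(f"mean of s^2 = {mean_var:.4f}, mean of s = {mean_sd:.4f}, c4(5) = {c4(n):.4f}")
```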
Lognormal distribution
• If ln(X) is normally distributed, X has a lognormal distribution. Equivalently, if X is normally distributed, exp(X) is lognormally distributed.
• Notation: $\ln N(\mu, \sigma)$, where $\mu$ and $\sigma$ are the mean and standard deviation of ln(X)
• Probability density function (PDF)
$f(x) = \frac{1}{x\sigma\sqrt{2\pi}}\exp\left[-\frac{(\ln x - \mu)^2}{2\sigma^2}\right], \quad x > 0$
• Mean and variance
$\mu_X = \exp\left(\mu + \sigma^2/2\right), \qquad \sigma_X^2 = \mathrm{Var}(X) = \left(e^{\sigma^2} - 1\right)e^{2\mu + \sigma^2}$
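The mean and variance formulas can be checked by exponentiating normal samples, which is the definition of the lognormal. A minimal Python sketch (the helper `lognormal_mean_var` is a hypothetical name; σ = 0.5 is chosen only so the sample variance settles quickly, since heavier tails make it noisy):

```python
import math
import random
import statistics

def lognormal_mean_var(mu, sigma):
    """Closed-form mean and variance of X when ln X ~ N(mu, sigma^2)."""
    mean = math.exp(mu + sigma**2 / 2)
    var = (math.exp(sigma**2) - 1) * math.exp(2 * mu + sigma**2)
    return mean, var

random.seed(2)
mu, sigma = 0.0, 0.5
# By definition, exp of a normal variable is lognormal
x = [math.exp(random.gauss(mu, sigma)) for _ in range(100_000)]

mean_th, var_th = lognormal_mean_var(mu, sigma)
mean_sim, var_sim = statistics.fmean(x), statistics.pvariance(x)
print(f"mean: simulated {mean_sim:.4f} vs formula {mean_th:.4f}")
print(f"variance: simulated {var_sim:.4f} vs formula {var_th:.4f}")
```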
Mean, mode and median
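• Mode (highest point of the PDF): $\exp(\mu - \sigma^2)$
• Median (50% of samples lie below it): $e^{\mu}$
• Compared with the mean $\exp(\mu + \sigma^2/2)$: mode < median < mean for $\sigma > 0$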
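A quick numerical check of the mode and median formulas, sketched in Python under the assumption μ = 0, σ = 0.5 (arbitrary example values; `lognormal_pdf` is a hypothetical helper, while `random.lognormvariate(mu, sigma)` is the standard-library sampler, whose parameters are those of ln X). The mode is found by brute force on a grid and the median from a large simulated sample.

```python
import math
import random
import statistics

def lognormal_pdf(x, mu, sigma):
    """Lognormal density; zero for x <= 0."""
    if x <= 0:
        return 0.0
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2.0 * math.pi))

mu, sigma = 0.0, 0.5
mode_th = math.exp(mu - sigma**2)        # highest point of the PDF
median_th = math.exp(mu)                 # 50% of the probability lies below it
mean_th = math.exp(mu + sigma**2 / 2)

# Locate the peak of the PDF on a grid with spacing 1e-4 over (0, 5]
grid = [i / 10_000 for i in range(1, 50_001)]
mode_grid = max(grid, key=lambda t: lognormal_pdf(t, mu, sigma))

# Median of simulated samples
random.seed(3)
samples = [random.lognormvariate(mu, sigma) for _ in range(100_001)]
median_sim = statistics.median(samples)

print(f"mode: grid {mode_grid:.4f} vs formula {mode_th:.4f}")
print(f"median: simulated {median_sim:.4f} vs formula {median_th:.4f}")
print(f"mode < median < mean: {mode_th < median_th < mean_th}")
```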
Light and heavy tails
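• The normal distribution has a light tail. Six sigma is equivalent to .999999999 (nine nines) safety, since 1-normcdf(6) = 9.8659e-010.
• The lognormal distribution is heavy tailed: for ln N(0,1), the probability of falling below the mean plus six standard deviations is only 0.9963.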
m=exp(0.5)              % mean of lnN(0,1): m = 1.6487
v=exp(1)*(exp(1)-1)     % variance: v = 4.6708
sig=sqrt(v)             % standard deviation: sig = 2.1612
sig6=m+6*sig            % mean + 6 sigma: sig6 = 14.6159
logncdf(sig6,0,1)       % ans = 0.9963
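The same tail comparison, sketched in Python as a stand-in for `logncdf` (the helpers `lognormal_cdf` and `normal_upper_tail` are hypothetical names built on `math.erf`/`math.erfc`). It repeats the MATLAB steps and then compares the probability of exceeding six standard deviations for the two distributions.

```python
import math

def lognormal_cdf(x, mu=0.0, sigma=1.0):
    """CDF of X with ln X ~ N(mu, sigma^2): Phi((ln x - mu) / sigma), via erf."""
    return 0.5 * (1.0 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2.0))))

def normal_upper_tail(z):
    """P(Z > z) for a standard normal Z, via erfc to avoid cancellation."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# Same steps as the MATLAB session, for ln X ~ N(0, 1)
m = math.exp(0.5)                        # mean
v = math.exp(1) * (math.exp(1) - 1)      # variance
sig = math.sqrt(v)                       # standard deviation
sig6 = m + 6 * sig                       # mean + 6 standard deviations
p_logn = lognormal_cdf(sig6)             # P(X <= mean + 6 sigma) for the lognormal
tail_logn = 1 - p_logn
tail_norm = normal_upper_tail(6.0)       # the corresponding tail for a normal variable

print(f"sig6 = {sig6:.4f}, logncdf = {p_logn:.4f}")
print(f"tail beyond 6 sigma: lognormal {tail_logn:.2e}, normal {tail_norm:.2e}")
```

The lognormal tail beyond six standard deviations comes out millions of times larger than the normal one, which is what "heavy tailed" means in practice here.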
Fitting distribution to data
• Typically, the distribution is fitted to the empirical CDF of the data.
Empirical CDF
[F,X] = ecdf(Y) calculates the Kaplan-Meier estimate of the
cumulative distribution function (cdf), also known as the empirical
cdf. Y is a vector of data values. F is a vector of values of the
empirical cdf evaluated at X.
[F,X,FLO,FUP] = ecdf(Y) also returns lower and upper confidence
bounds for the cdf. These bounds are calculated using Greenwood's
formula, and are not simultaneous confidence bounds.
ecdf(...) without output arguments produces a plot of the empirical
cdf. Use the data cursor to read precise values from the plot.
Example
x=lognrnd(0,1,1,20); ecdf(x)      % empirical CDF of 20 samples from lnN(0,1)
hold on
x=lognrnd(0,1,1,10000); ecdf(x)   % same distribution, 10000 samples
[Figure: empirical CDFs F(x) versus x (x from 0 to 40) for the 20-sample and 10000-sample data]
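The empirical CDF itself is simple to compute: sort the data and assign F = i/n to the i-th smallest value. The Python sketch below follows that definition (hypothetical helpers `ecdf` and `max_gap`; unlike MATLAB's `ecdf`, it handles no censoring and returns no Greenwood bounds). It measures the largest vertical distance between each empirical CDF and the true ln N(0,1) CDF, which is the Kolmogorov-Smirnov statistic, echoing the 20- versus 10000-sample comparison above.

```python
import math
import random

def ecdf(data):
    """Sorted values x_(1) <= ... <= x_(n) and empirical CDF values F = i/n at each x_(i)."""
    xs = sorted(data)
    n = len(xs)
    return xs, [(i + 1) / n for i in range(n)]

def lognormal_cdf(x, mu=0.0, sigma=1.0):
    """True CDF of X with ln X ~ N(mu, sigma^2)."""
    return 0.5 * (1.0 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2.0))))

def max_gap(data):
    """Largest vertical distance between the empirical CDF and the lnN(0,1) CDF
    (the Kolmogorov-Smirnov statistic), checked on both sides of every jump."""
    xs, fs = ecdf(data)
    n = len(xs)
    gap = 0.0
    for i, (x, f) in enumerate(zip(xs, fs)):
        f_true = lognormal_cdf(x)
        gap = max(gap, abs(f - f_true), abs(i / n - f_true))  # i/n is the ECDF just before x
    return gap

random.seed(4)
small = [random.lognormvariate(0.0, 1.0) for _ in range(20)]
large = [random.lognormvariate(0.0, 1.0) for _ in range(10_000)]
print(f"max gap, n=20: {max_gap(small):.3f}; n=10000: {max_gap(large):.4f}")
```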
Extreme value distributions
• No matter what distribution (with finite variance) you sample from, the mean of the sample tends to be normally distributed as the sample size increases (with what mean and standard deviation?)
• Similarly, the minimum (or maximum) of a sample tends to follow other distributions as the sample size increases.
• Even though there are infinitely many distributions, there are only three types of extreme value distributions.
– Type I (Gumbel): e.g., extremes of samples from the normal distribution
– Type II (Fréchet): e.g., maximum daily rainfall
– Type III (Weibull): weakest-link failure
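The first bullet can be checked by simulation. A minimal Python sketch, assuming a uniform parent distribution on [0, 1) (chosen only because it is clearly not normal) and samples of size n = 30: the sample means should center on the parent mean, spread with standard deviation σ/√n, and fall within one standard deviation about 68.3% of the time, as a normal variable would.

```python
import math
import random
import statistics

random.seed(5)
n, trials = 30, 20_000
# Parent distribution: uniform on [0, 1), mean 1/2, standard deviation sqrt(1/12)
mu_parent, sigma_parent = 0.5, math.sqrt(1 / 12)
means = [statistics.fmean(random.random() for _ in range(n)) for _ in range(trials)]

sd_theory = sigma_parent / math.sqrt(n)          # standard deviation of the sample mean
mean_of_means = statistics.fmean(means)
sd_of_means = statistics.stdev(means)
# Rough normality check: fraction of sample means within one standard deviation
within_1sd = sum(abs(m - mu_parent) < sd_theory for m in means) / trials

print(f"mean of sample means {mean_of_means:.4f} (parent mean {mu_parent})")
print(f"sd of sample means {sd_of_means:.4f} (sigma/sqrt(n) = {sd_theory:.4f})")
print(f"fraction within one sd: {within_1sd:.3f}")
```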
Example
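x=5-0.3*randn(10,1000); minx=min(x);   % minimum of each of 1000 samples of size 10 from N(5,0.3^2)
subplot(1,2,1); hist(minx); subplot(1,2,2); ecdf(minx)   % histogram and empirical CDF side by side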
[Figure: histogram of minx (left, counts) and its empirical CDF F(x) (right), x from 3.6 to 5]
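A Python analogue of this example, sketched with the standard library (20000 samples instead of 1000, an assumption made only so the summary statistics are stable). The minima are no longer centered at 5, and their negative skewness shows the asymmetry visible in the histogram, unlike the symmetric normal parent.

```python
import random
import statistics

random.seed(6)
samples, size = 20_000, 10
# Analogue of x=5-0.3*randn(10,1000); minx=min(x): minimum of each sample of 10 draws from N(5, 0.3^2)
minx = [min(5 - 0.3 * random.gauss(0.0, 1.0) for _ in range(size)) for _ in range(samples)]

mean_min = statistics.fmean(minx)
sd_min = statistics.stdev(minx)
# Moment-based sample skewness; a negative value means the long tail extends toward small x
skew = statistics.fmean(((v - mean_min) / sd_min) ** 3 for v in minx)

print(f"mean {mean_min:.4f}, sd {sd_min:.4f}, skewness {skew:.3f}")
```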
Gumbel distribution
• PDF and CDF
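PDF: $f(x) = \frac{1}{\beta}\exp\left[-\left(z + e^{-z}\right)\right], \quad z = \frac{x - \mu}{\beta}$
CDF: $F(x) = \exp\left(-e^{-z}\right)$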
• Mean, median, mode and variance
Mean: $\mu + \beta\gamma$
Variance: $\frac{\pi^2}{6}\beta^2$
Median: $\mu - \beta\ln(\ln 2)$
Mode: $\mu$
where $\gamma \approx 0.5772$ is the Euler-Mascheroni constant.
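These formulas can be checked with inverse-transform sampling: setting $F(x) = u$ gives $x = \mu - \beta\ln(-\ln u)$. A minimal Python sketch, assuming μ = 2 and β = 0.5 as arbitrary example values (the helpers `gumbel_pdf`, `gumbel_cdf` and `gumbel_sample` are hypothetical names). The formulas describe the maximum form; for minima, as in the previous example, they apply to −x.

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649015329   # Euler-Mascheroni constant

def gumbel_pdf(x, mu=0.0, beta=1.0):
    z = (x - mu) / beta
    return math.exp(-(z + math.exp(-z))) / beta

def gumbel_cdf(x, mu=0.0, beta=1.0):
    z = (x - mu) / beta
    return math.exp(-math.exp(-z))

def gumbel_sample(mu=0.0, beta=1.0):
    """Inverse-transform sampling: solve F(x) = u, i.e. x = mu - beta*ln(-ln u)."""
    u = random.random()
    while u == 0.0:                  # ln(0) is undefined; redraw
        u = random.random()
    return mu - beta * math.log(-math.log(u))

mu, beta = 2.0, 0.5
mean_th = mu + beta * EULER_GAMMA
var_th = (math.pi * beta) ** 2 / 6
median_th = mu - beta * math.log(math.log(2))
mode_th = mu

random.seed(7)
xs = [gumbel_sample(mu, beta) for _ in range(100_000)]
print(f"F(median) = {gumbel_cdf(median_th, mu, beta):.6f}")
print(f"mean: simulated {statistics.fmean(xs):.4f} vs {mean_th:.4f}")
print(f"variance: simulated {statistics.variance(xs):.4f} vs {var_th:.4f}")
```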
Weibull distribution
• Probability density function
$f(x; \lambda, k) = \frac{k}{\lambda}\left(\frac{x}{\lambda}\right)^{k-1} e^{-(x/\lambda)^{k}}, \quad x \ge 0,\ k > 0,\ \lambda > 0$
• Used to describe the distribution of strength or fatigue life in brittle materials (weakest-link connection).
• If it describes time to failure, then
k < 1 indicates that the failure rate decreases with time,
k = 1 indicates a constant failure rate,
k > 1 indicates an increasing failure rate.
• Also useful for other phenomena, such as the distribution of wind speed.
• A third parameter c can be added by replacing x with x - c.
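The failure-rate statements follow from the hazard $h(x) = f(x)/(1 - F(x)) = (k/\lambda)(x/\lambda)^{k-1}$, which falls, stays flat or rises with x depending on whether k is below, equal to or above 1. A minimal Python sketch (hypothetical helpers `weibull_pdf`, `weibull_survival`, `weibull_hazard`; λ = 2 is an arbitrary example value, and the third parameter c is not included). The sampling check relies on `random.weibullvariate(alpha, beta)`, whose alpha is the scale λ and beta the shape k, and on the known mean $\lambda\,\Gamma(1 + 1/k)$.

```python
import math
import random
import statistics

def weibull_pdf(x, lam, k):
    """f(x; lam, k) = (k/lam) (x/lam)^(k-1) exp(-(x/lam)^k) for x > 0; 0 for x <= 0 in this sketch."""
    if x <= 0:
        return 0.0
    return (k / lam) * (x / lam) ** (k - 1) * math.exp(-((x / lam) ** k))

def weibull_survival(x, lam, k):
    """1 - F(x) = exp(-(x/lam)^k) for x >= 0."""
    return math.exp(-((x / lam) ** k)) if x >= 0 else 1.0

def weibull_hazard(x, lam, k):
    """Failure rate h(x) = f(x) / (1 - F(x)) = (k/lam) (x/lam)^(k-1), for x > 0."""
    return (k / lam) * (x / lam) ** (k - 1)

lam = 2.0
times = (0.5, 1.0, 2.0)
for k in (0.5, 1.0, 3.0):   # decreasing, constant and increasing failure rate
    print(k, [round(weibull_hazard(t, lam, k), 4) for t in times])

# Sampling check: random.weibullvariate(alpha, beta) uses scale alpha and shape beta
random.seed(8)
k = 3.0
xs = [random.weibullvariate(lam, k) for _ in range(100_000)]
print(f"mean: simulated {statistics.fmean(xs):.4f} vs {lam * math.gamma(1 + 1 / k):.4f}")
```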