
WFM 6204: Hydrologic Statistics © Dr. Akm Saiful Islam
WFM-6204: Hydrologic Statistics
Lecture-5: Probabilistic analysis: (Part-3)
Akm Saiful Islam
Institute of Water and Flood Management (IWFM)
Bangladesh University of Engineering and Technology (BUET)
December, 2006
Probability Distributions and Their Applications

Continuous distributions:
- Normal distribution
- Exponential distribution
- Gamma distribution
- Lognormal distribution
- Extreme value distribution
  - Extreme value type-I: Gumbel distribution
  - Extreme value type-III (minimum): Weibull distribution
- Pearson Type III distribution
Table: Area under standardized normal distribution
Central Limit theorem

If $S_n$ is the sum of $n$ independent and identically distributed
random variables $X_i$, each having a mean $\mu$ and variance $\sigma^2$,
then in the limit as $n$ approaches infinity, the distribution of $S_n$
approaches a normal distribution with mean $n\mu$ and variance $n\sigma^2$.
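
The theorem can be checked numerically. The sketch below is only an illustration, assuming Uniform(0, 1) summands, n = 30 and 20000 trials (all hypothetical choices that do not come from the lecture). It compares the sample mean and variance of the simulated sums with the theoretical values n times the mean and n times the variance of one summand.

```python
import random
import statistics

def simulate_sums(n, trials, seed=0):
    """Return `trials` realisations of S_n, the sum of n Uniform(0, 1) draws."""
    rng = random.Random(seed)
    return [sum(rng.random() for _ in range(n)) for _ in range(trials)]

n = 30                          # number of summands (assumed for illustration)
mu, sigma2 = 0.5, 1.0 / 12.0    # mean and variance of one Uniform(0, 1) variable
sums = simulate_sums(n, trials=20000)

sample_mean = statistics.fmean(sums)
sample_var = statistics.variance(sums)

# Fraction of sums within one standard deviation of the mean;
# for a normal distribution this fraction is about 0.68.
sd = (n * sigma2) ** 0.5
within_one_sd = sum(abs(s - n * mu) <= sd for s in sums) / len(sums)

print(f"mean: sample {sample_mean:.3f}, theory {n * mu:.3f}")
print(f"variance: sample {sample_var:.3f}, theory {n * sigma2:.3f}")
print(f"fraction within one sd: {within_one_sd:.3f}")
```

Uniform summands are used only because they are clearly non-normal; any distribution with a finite variance could be substituted.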
Constructing normal curves for
data

Frequently the histogram of a set of observed data
suggests that the data may be approximated by
a normal distribution. One way to investigate the
goodness of this approximation is to superimpose
a normal curve on the frequency histogram and
visually compare the two distributions. Statistical
procedures are also available for testing the
hypothesis that a set of data can be approximated
by a normal (or any other) distribution.
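
One way to carry out this comparison numerically is to tabulate, for each class interval, the observed relative frequency next to the area under the fitted normal curve. The sketch below assumes a small hypothetical sample and hypothetical class boundaries (neither comes from the lecture), and fits the normal curve by the sample mean and standard deviation.

```python
from statistics import NormalDist

# Hypothetical sample (illustrative values only, not data from the lecture).
data = [31.2, 35.8, 38.1, 39.4, 40.7, 41.3, 42.0, 42.9, 43.5, 44.1,
        44.8, 45.6, 46.2, 47.0, 48.3, 49.1, 50.4, 52.2, 54.9, 58.6]
edges = [30, 35, 40, 45, 50, 55, 60]   # class boundaries covering all the data

# Normal curve with the sample mean and sample standard deviation.
fitted = NormalDist.from_samples(data)

observed, expected = [], []
for lo, hi in zip(edges[:-1], edges[1:]):
    count = sum(lo <= x < hi for x in data)
    observed.append(count / len(data))                # histogram bar height
    expected.append(fitted.cdf(hi) - fitted.cdf(lo))  # area under normal curve

for (lo, hi), obs, exp_ in zip(zip(edges[:-1], edges[1:]), observed, expected):
    print(f"{lo:>3}-{hi:<3} observed {obs:.3f}  normal {exp_:.3f}")
```

Plotting the two columns as bars and a curve gives the visual comparison described above.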
Normal Approximations

Binomial distribution

If X is a binomial random variable with parameters n1 and p, and
Y is an independent binomial random variable with parameters n2 and p,
then Z = X + Y is a binomial random variable with parameters
n = n1 + n2 and p.

The Central Limit Theorem indicates that the normal distribution
approximates the binomial distribution if n is large. Thus, as n
gets large, the distribution of

$$Z = \frac{X - \mu}{\sigma} = \frac{X - np}{\sqrt{np(1-p)}}$$

approaches N(0,1). This is sometimes known as the De Moivre-Laplace
limit theorem.
Example-1: [Haan, 1979]

X is a binomial random variable with n=25
and p=0.3. Compare the binomial and
normal approximation to the binomial for
evaluating the prob(5<X≤8).
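
A minimal sketch of this comparison follows. It assumes a continuity correction (integrating the normal curve from 5.5 to 8.5), which the example statement does not specify, so treat that choice as an assumption.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 25, 0.3

def binom_pmf(k, n, p):
    """Binomial probability of exactly k successes in n trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Exact binomial: P(5 < X <= 8) = P(X = 6) + P(X = 7) + P(X = 8)
exact = sum(binom_pmf(k, n, p) for k in range(6, 9))

# Normal approximation with mean np and variance np(1 - p);
# the limits 5.5 and 8.5 are the assumed continuity correction.
normal = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))
approx = normal.cdf(8.5) - normal.cdf(5.5)

print(f"binomial: {exact:.4f}")
print(f"normal approximation: {approx:.4f}")
```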
Normal Approximations

Poisson distribution

The sum of two independent Poisson random variables with
parameters $\lambda_1$ and $\lambda_2$ is also a Poisson random
variable with parameter $\lambda = \lambda_1 + \lambda_2$. Extending this to
the sum of a large number of Poisson random
variables, the Central Limit Theorem indicates that for
large $\lambda$, the Poisson may be approximated by a
normal distribution. In this case the distribution of

$$Z = \frac{X - \mu}{\sigma} = \frac{X - \lambda}{\lambda^{1/2}}$$

approaches N(0,1). Since the Poisson is the
limiting form of the binomial and the binomial can be
approximated by the normal, it is no surprise that the
Poisson can also be approximated by the normal.
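
The quality of the approximation can be examined with a short calculation. The sketch below is illustrative only: the value of lambda and the interval are hypothetical, the helper names are made up for this example, and a continuity correction is assumed for the normal approximation.

```python
from math import exp, factorial, sqrt
from statistics import NormalDist

def poisson_pmf(k, lam):
    """Poisson probability of exactly k events."""
    return exp(-lam) * lam**k / factorial(k)

def poisson_interval(lam, a, b):
    """Exact P(a < X <= b) for integers a < b."""
    return sum(poisson_pmf(k, lam) for k in range(a + 1, b + 1))

def normal_interval(lam, a, b):
    """Normal approximation to P(a < X <= b), mean lam and variance lam,
    with an assumed continuity correction of 0.5 at each limit."""
    approx = NormalDist(mu=lam, sigma=sqrt(lam))
    return approx.cdf(b + 0.5) - approx.cdf(a + 0.5)

lam, a, b = 10.0, 7, 12   # hypothetical values chosen for illustration
print(f"Poisson: {poisson_interval(lam, a, b):.4f}")
print(f"normal approximation: {normal_interval(lam, a, b):.4f}")
```

Repeating the calculation for smaller values of lambda shows how the agreement deteriorates when lambda is not large.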
Assignment-2:[due next week]

X is a Poisson random variable with $\lambda = np$,
where n = 25 and p = 0.3. Compare the
Poisson and the normal approximation to the
Poisson for evaluating the prob(5<X≤8).
Exponential Distribution

The exponential density function is given by

$$p_X(x) = \lambda e^{-\lambda x}, \qquad x > 0,\ \lambda > 0$$

and the cumulative exponential by

$$P_X(x) = \int_0^x \lambda e^{-\lambda t}\,dt = 1 - e^{-\lambda x}, \qquad x > 0$$

The mean and variance of the exponential distribution are

$$E(X) = 1/\lambda, \qquad \mathrm{var}(X) = 1/\lambda^2$$

The exponential distribution is positively skewed, with a
skewness coefficient of 2. Both the method of moments
and maximum likelihood estimation give the parameter
estimate $\hat{\lambda} = 1/\bar{X}$. The exponential distribution is a
special case of the gamma distribution.
Example-2: [Haan, 1979]

Haan and Johnson (1967) studied the physical
characteristics of depressions in north-central Iowa. The
data tabulated below shows the number of depressions
falling into various classes based on the surface area of
the depression. Plot a relative frequency histogram of
the data. Superimpose on the histogram the best fitting
exponential distribution. Estimate the probability that a
depression selected at random will have an area greater
than 2.25 acres.
Area (acres)   No. depressions   Area (acres)   No. depressions
0-½            106               4-4½           4
½-1            36                4½-5           5
1-1½           18                5-5½           2
1½-2           9                 5½-6           6
2-2½           12                6-6½           3
2½-3           2                 6½-7           1
3-3½           5                 7-7½           1
3½-4           1
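
A sketch of one possible solution is given below. It assumes that each depression can be represented by the midpoint of its class interval when computing the sample mean, which is an approximation and not something stated in the example. The parameter is then estimated as the reciprocal of that mean.

```python
from math import exp

# Counts per half-acre class, in order from 0-1/2 up to 7-7 1/2 acres.
counts = [106, 36, 18, 9, 12, 2, 5, 1, 4, 5, 2, 6, 3, 1, 1]
width = 0.5
midpoints = [width * (i + 0.5) for i in range(len(counts))]  # assumed class values

total = sum(counts)
mean_area = sum(c * m for c, m in zip(counts, midpoints)) / total
lam_hat = 1.0 / mean_area            # estimate: lambda = 1 / sample mean

# Observed relative frequency and fitted exponential probability per class.
relative_freq = [c / total for c in counts]
expected_freq = [exp(-lam_hat * i * width) - exp(-lam_hat * (i + 1) * width)
                 for i in range(len(counts))]

# P(X > 2.25) = 1 - (1 - exp(-lambda * 2.25))
p_exceed = exp(-lam_hat * 2.25)

print(f"mean area = {mean_area:.3f} acres, lambda = {lam_hat:.3f} per acre")
print(f"P(area > 2.25 acres) = {p_exceed:.3f}")
```

The lists `relative_freq` and `expected_freq` hold the heights needed to draw the histogram and the superimposed exponential curve.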
Gamma Distribution


The gamma distribution failure probability density obeys
the equation

$$f(t) = \frac{\lambda (\lambda t)^{r-1} e^{-\lambda t}}{\Gamma(r)}, \qquad \lambda > 0,\ r > 0 \qquad (4.6.1)$$

where the parameter r need not be an integer. The two
parameters are the shape parameter r and the scale
parameter $\lambda$. The shape of the distribution depends
significantly upon the value of r, which also affects
the hazard rate $\lambda(t)$. In the special case that r is an
integer, the Erlangian distribution is recovered; in the
special case that $\lambda = 0.5$ and $r = 0.5\nu$, where $\nu$ is the
number of degrees of freedom, the gamma distribution
becomes the chi-square distribution.

The cumulative failure probability F(t) is:

$$F(t) = \frac{1}{\Gamma(r)} \int_0^{\lambda t} y^{r-1} e^{-y}\,dy = \frac{\gamma(r, \lambda t)}{\Gamma(r)} \qquad (4.6.2)$$

where $\gamma(r, \lambda t)$ is the lower incomplete gamma function.

The mean and variance of the gamma distribution are:

$$m = r/\lambda \qquad \text{and} \qquad \sigma^2 = r/\lambda^2$$

The gamma distribution is especially appropriate for
systems subjected to an environment of repetitive,
random shocks generated according to the Poisson
distribution; thus the failure probability depends upon
how many shocks the device has suffered, i.e., its age.
As another application, if the mean rate of wear of a
device is a constant, but the rate of wear is subject to
random variations, then the gamma distribution should be
used.

For some devices, such as those for which
corrosion of metals is important, it may be
appropriate to modify the two-parameter
gamma distribution by introducing a time
delay $\tau$ before the onset of failures.
Then equation (4.6.1) is modified to read:

$$f(t) = \begin{cases} \dfrac{\lambda^r (t-\tau)^{r-1} e^{-\lambda (t-\tau)}}{\Gamma(r)}, & t \ge \tau \\ 0, & t < \tau \end{cases} \qquad (4.6.3)$$

In such a case, the mean of the
distribution becomes:

$$m = \tau + r/\lambda \qquad (4.6.4)$$
Example-3:[MacCormick,1981]


Suppose that a device subjected to repetitive random
shocks satisfies a gamma distribution with parameters
r = 3 and $\lambda = 10^{-3}$/hr, and that no failures can occur until
200 hours have passed. Estimate (a) the probability of
failure after the device has operated for t = 4500 hours
and (b) its mean time to failure (MacCormick, 1981, p. 37).

Solution: In this problem, the time displacement is $\tau$ =
200 hours. Integrating equation (4.6.3) from 0 to t and
using equation (4.6.2) gives the cumulative probability:

$$F(4500) = \frac{1}{\Gamma(3)}\,\gamma\!\left(3,\ 10^{-3}(4500 - 200)\right) = \frac{1}{\Gamma(3)}\,\gamma(3, 4.3) \approx 0.8$$

Using equation (4.6.4) gives the mean time to failure (MTTF)
= 200 + 3/10^{-3} = 3200 hr.
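
The arithmetic of this example can be checked with a few lines of code. The sketch below relies on the fact that, for an integer shape parameter r, the incomplete gamma ratio has the closed (Erlang) form 1 - exp(-x) * sum of x^k / k! for k = 0 to r-1, so no special-function library is needed. The function name `erlang_cdf` is made up for this illustration.

```python
from math import exp, factorial

def erlang_cdf(t, r, lam, delay=0.0):
    """Cumulative failure probability for a gamma distribution with integer
    shape r, parameter lam, and a time delay before failures can occur."""
    if t <= delay:
        return 0.0
    x = lam * (t - delay)
    # Closed form of gamma(r, x) / Gamma(r), valid for integer r only.
    return 1.0 - exp(-x) * sum(x**k / factorial(k) for k in range(r))

r, lam, delay = 3, 1e-3, 200.0       # values from the example (hours)
F = erlang_cdf(4500.0, r, lam, delay)
mttf = delay + r / lam               # equation (4.6.4)

print(f"F(4500) = {F:.3f}")
print(f"MTTF = {mttf:.0f} hr")
```

For a non-integer r the closed form does not apply and a numerical incomplete gamma routine would be needed instead.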
Example-4: [Haan, 1979]

The annual water yield for Cave Creek (near Fort Spring,
Kentucky) is shown in the following table. Estimate the
parameters of the gamma distribution for this data using
both the method of moments and the method of
maximum likelihood. Assuming the data follow a
gamma distribution, estimate the probability of an annual
water yield exceeding 20.0 inches.
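
The two estimation methods can be sketched as follows. The Cave Creek table is not reproduced in this transcript, so the yields below are hypothetical placeholders. The method of moments equates the sample mean and variance to r/lambda and r/lambda^2. For maximum likelihood, the sketch uses Thom's approximation for the shape parameter, which is a substitute chosen for illustration and may differ from the procedure used in Haan (1979).

```python
from math import log, sqrt
from statistics import fmean, variance

def gamma_moments(x):
    """Method of moments: r = mean^2 / s^2 and lambda = mean / s^2."""
    m, s2 = fmean(x), variance(x)
    return m * m / s2, m / s2

def gamma_mle_thom(x):
    """Approximate maximum likelihood estimates using Thom's formula for r;
    lambda then follows from the likelihood equation lambda = r / mean."""
    m = fmean(x)
    a = log(m) - fmean([log(v) for v in x])   # ln(mean) minus mean of ln(x)
    r = (1.0 + sqrt(1.0 + 4.0 * a / 3.0)) / (4.0 * a)
    return r, r / m

# Hypothetical annual yields in inches (NOT the Cave Creek data).
yields = [14.2, 9.8, 21.5, 17.3, 12.6, 25.1, 8.4, 16.0, 19.7, 11.9]

r_mom, lam_mom = gamma_moments(yields)
r_mle, lam_mle = gamma_mle_thom(yields)
print(f"moments: r = {r_mom:.2f}, lambda = {lam_mom:.3f}")
print(f"approx. ML: r = {r_mle:.2f}, lambda = {lam_mle:.3f}")
```

The exceedance probability for 20.0 inches then requires the incomplete gamma function evaluated with the fitted parameters, which is not part of the Python standard library.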
Log-Normal Distribution
The lognormal distribution (sometimes spelled out as the
logarithmic normal distribution) of a random variable X is
one for which the logarithm of X follows a normal or
Gaussian distribution. Denote Y = ln X; then Y has a
normal or Gaussian distribution given by:

$$f(y) = \frac{1}{\sqrt{2\pi\sigma_y^2}} \exp\!\left[-\frac{1}{2}\left(\frac{y-\mu_y}{\sigma_y}\right)^2\right], \qquad -\infty < y < \infty \qquad (4.13.1)$$

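Derived distribution: Since Y = ln X and dy/dx = 1/x, the
distribution of X can be found as:

$$f(x) = f(y)\left|\frac{dy}{dx}\right| = \frac{1}{x}\,\frac{1}{\sqrt{2\pi\sigma_y^2}} \exp\!\left[-\frac{1}{2}\left(\frac{y-\mu_y}{\sigma_y}\right)^2\right] = \frac{1}{\sqrt{2\pi x^2 \sigma_y^2}} \exp\!\left[-\frac{1}{2}\left(\frac{y-\mu_y}{\sigma_y}\right)^2\right], \qquad x > 0 \qquad (4.13.2)$$

where y = ln x. Note that equation (4.13.1) gives the
distribution of Y as a normal distribution
with mean $\mu_y$ and variance $\sigma_y^2$. Equation
(4.13.2) gives the distribution of X as the
lognormal distribution with parameters
$\mu_y$ and $\sigma_y^2$.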
Estimation of parameters ($\mu_y$, $\sigma_y^2$) of the lognormal
distribution:

Note: Y = ln X, with

$$\bar{y} = \frac{\sum y_i}{n}, \qquad S_y^2 = \frac{\sum y_i^2 - n\bar{y}^2}{n-1}$$

Chow (1954) method:
(1) $C_v = S_x / \bar{X}$
(2) $\bar{Y} = \frac{1}{2}\ln\!\left(\frac{\bar{X}^2}{C_v^2 + 1}\right)$
(3) $S_y^2 = \ln(C_v^2 + 1)$

The mean and variance of the lognormal distribution are:

$$E(X) = \exp(\mu_y + \sigma_y^2/2) \qquad \text{and} \qquad \mathrm{Var}(X) = \mu_x^2\left(e^{\sigma_y^2} - 1\right)$$

The coefficient of variation of the Xs is: $C_v = \sqrt{e^{\sigma_y^2} - 1}$
The coefficient of skew of the Xs is: $\gamma = 3C_v + C_v^3$

Thus the lognormal distribution is skewed to the right,
the skewness increasing with increasing values of $C_v$.
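
Chow's three steps translate directly into code. The sketch below uses a hypothetical sample (the values are placeholders, not data from the lecture) and a made-up function name. As a check, it confirms that substituting the estimates into E(X) = exp(mu_y + sigma_y^2 / 2) returns the sample mean of X, which follows algebraically from steps (2) and (3).

```python
from math import exp, log
from statistics import fmean, stdev

def chow_lognormal(x):
    """Estimate the mean and variance of Y = ln X from the moments of X."""
    x_bar = fmean(x)
    cv = stdev(x) / x_bar                          # step (1)
    y_bar = 0.5 * log(x_bar**2 / (cv**2 + 1.0))    # step (2)
    s_y2 = log(cv**2 + 1.0)                        # step (3)
    return y_bar, s_y2

# Hypothetical peak flows (illustrative values only).
flows = [12500.0, 18300.0, 9700.0, 22100.0, 15400.0,
         30800.0, 11200.0, 17600.0, 14100.0, 26500.0]

y_bar, s_y2 = chow_lognormal(flows)
mean_back = exp(y_bar + s_y2 / 2.0)   # should reproduce the sample mean of X

print(f"mean of ln X = {y_bar:.4f}, variance of ln X = {s_y2:.4f}")
print(f"sample mean of X = {fmean(flows):.1f}, recovered = {mean_back:.1f}")
```

The alternative noted above is to take logarithms of the data first and compute the mean and variance of the y values directly; the two approaches generally give slightly different estimates.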
Example-5:[Haan, 1979, p. 78]

Use the lognormal distribution and
calculate the expected relative frequency
for the third class interval on the data in
table 5.1
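
The expected relative frequency of a class interval under the lognormal distribution is the normal probability between the logarithms of the class limits. Table 5.1 is not reproduced in this transcript, so the sketch below uses hypothetical parameters and class limits, and a made-up function name.

```python
from math import exp, log
from statistics import NormalDist

def lognormal_class_probability(lower, upper, mu_y, sigma_y):
    """P(lower < X <= upper) when ln X is normal with mean mu_y, sd sigma_y."""
    y = NormalDist(mu=mu_y, sigma=sigma_y)
    return y.cdf(log(upper)) - y.cdf(log(lower))

# Hypothetical parameters of ln X and hypothetical class limits.
mu_y, sigma_y = 10.0, 0.5
lower, upper = 20000.0, 30000.0

p_class = lognormal_class_probability(lower, upper, mu_y, sigma_y)
print(f"expected relative frequency = {p_class:.3f}")
```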
Example-6: [Haan, 1979]

Assume the data of table 5.1 follow the
lognormal distribution. Calculate the
magnitude of the 100-year peak flood.
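
A sketch of the calculation follows, assuming the T-year flood is defined as the annual peak with exceedance probability 1/T, so that the 100-year flood is the 0.99 quantile. The parameters of ln X below are hypothetical, since table 5.1 is not reproduced in this transcript, and the function name is made up.

```python
from math import exp, log
from statistics import NormalDist

def lognormal_quantile(return_period, mu_y, sigma_y):
    """Magnitude with exceedance probability 1/return_period when ln X is
    normal with mean mu_y and standard deviation sigma_y."""
    p = 1.0 - 1.0 / return_period         # non-exceedance probability
    z = NormalDist().inv_cdf(p)           # standard normal deviate
    return exp(mu_y + z * sigma_y)

mu_y, sigma_y = 10.0, 0.5                 # hypothetical parameters of ln X
x100 = lognormal_quantile(100, mu_y, sigma_y)

# Check: the fitted distribution gives a non-exceedance probability of 0.99.
check = NormalDist(mu=mu_y, sigma=sigma_y).cdf(log(x100))
print(f"100-year flood = {x100:.0f}, non-exceedance probability = {check:.4f}")
```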