FREQUENCY ANALYSIS

• Basic Problem:
To relate the magnitude of extreme events to
their frequency of occurrence through the use
of probability distributions.
• Basic Assumptions:
(a) The data analyzed are statistically independent and identically distributed. This governs the selection of data (time dependence, time scale, mechanisms).
(b) Changes over time due to man-made (e.g., urbanization) or natural processes do not alter the frequency relation, i.e., the data show no temporal trend (stationarity).
• Practical Problems:
(a) Selection of reasonable & simple distribution.
(b) Estimation of parameters in distribution.
(c) Assessment of risk with reasonable accuracy.
REVIEW OF BASIC CONCEPTS
Probabilistic
Outcome of a hydrologic event (e.g., rainfall amount & duration; flood
peak discharge; wave height, etc.) is random and cannot be predicted with
certainty.
Terminologies
- Population
The collection of all possible outcomes relevant to the process of
interest. Examples:
(1) Max. 2-hr rainfall depth: all non-negative real numbers;
(2) No. of storms in June: all non-negative integers.
- Sample
A measured segment (or subset) of the population.
- Random Variable
A variable describable by a probability distribution, which specifies the chance that the variable will assume a particular value.
Convention: a capital letter denotes a random variable (say, X), whereas the lower-case letter (say, x) denotes a numerical realization of X.
Example:
X = rainfall amount in 2 hours (a random variable);
x = 100.2 mm/2hr (realization).
- Random variables can be
- discrete (e.g., no. of rainy days in June) or
- continuous (e.g., max. 2-hr rainfall amount, flood discharge).
Frequency & Relative Frequency
o For discrete random variables:
o Frequency is the number of occurrences of a specific event. Relative frequency is the frequency divided by the total number of events, e.g.,
n = no. of years having exactly 50 rainy days; N = total no. of years.
Let n = 10 years and N = 100 years. Then the frequency of having exactly 50 rainy days is 10, and the relative frequency of having exactly 50 rainy days in 100 years is n/N = 0.1.
o For continuous random variables:
o Frequency needs to be defined for a class interval.
o A plot of frequency or relative frequency versus class intervals is called a histogram or probability polygon.
o As the sample size becomes infinitely large and the class interval length approaches zero, the histogram of relative frequency per unit interval length becomes a smooth curve, called the probability density function.
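As a rough illustration of the class-interval idea above, the sketch below counts observations per class, converts the counts to relative frequencies, and divides by the class width to get a crude density estimate. The rainfall values, the class limits, and all function names are hypothetical and chosen only for the example.

```python
def class_frequencies(data, lower, width, n_classes):
    """Count observations in n_classes equal-width class intervals starting at `lower`."""
    counts = [0] * n_classes
    for x in data:
        k = int((x - lower) // width)  # index of the class interval containing x
        if 0 <= k < n_classes:
            counts[k] += 1
    return counts


def relative_frequencies(counts, n_total):
    """Relative frequency = frequency / total number of observations."""
    return [c / n_total for c in counts]


def density_estimate(rel_freq, width):
    """Relative frequency per unit class width (a crude PDF estimate)."""
    return [r / width for r in rel_freq]


# Hypothetical max. 2-hr rainfall depths (mm), for illustration only.
rain = [12.0, 18.5, 22.1, 25.0, 31.4, 33.3, 38.9, 41.0, 47.2, 55.6]
counts = class_frequencies(rain, lower=10.0, width=10.0, n_classes=5)
rel = relative_frequencies(counts, len(rain))
dens = density_estimate(rel, 10.0)
```

The density estimate integrates to one over the classes (sum of density times width), which is the property the PDF inherits in the limit.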
Probability Density Function (PDF) –
• For a continuous random variable, the PDF must satisfy
$\int_{-\infty}^{\infty} f(x)\,dx = 1$
and f(x) ≥ 0 for all values of x.
• For a discrete random variable, the probability mass function (PMF) must satisfy
$\sum_{\text{all } x} p(x) = 1$
and 1 ≥ p(x) ≥ 0 for all values of x.
- Cumulative Distribution Function (CDF)
For a continuous random variable,
$\Pr(X \le x_o) = \int_{-\infty}^{x_o} f(x)\,dx$
For a discrete random variable,
$\Pr(X \le x_o) = \sum_{\text{all } x_i \le x_o} p(x_i)$
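For the discrete case, the two PMF conditions and the cumulative sum can be checked in a few lines. This is only a sketch: the PMF for the number of storms in June is made up for illustration, and the function names are hypothetical.

```python
def is_valid_pmf(pmf, tol=1e-9):
    """Check 0 <= p(x) <= 1 for all x and that the probabilities sum to 1."""
    probs = list(pmf.values())
    return all(0.0 <= p <= 1.0 for p in probs) and abs(sum(probs) - 1.0) < tol


def cdf_discrete(pmf, x0):
    """Pr(X <= x0): sum of p(x_i) over all x_i <= x0."""
    return sum(p for x, p in pmf.items() if x <= x0)


# Hypothetical PMF for the number of storms in June (illustration only).
storms = {0: 0.10, 1: 0.25, 2: 0.30, 3: 0.20, 4: 0.15}
valid = is_valid_pmf(storms)
prob_at_most_two = cdf_discrete(storms, 2)
```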
Statistical Properties of Random Variables
• Population - Synonymous with sample space; the complete assemblage of all the values representative of a particular random process.
• Sample - Any subset of the population.
• Parameters - Quantities that are descriptive of the population
in a statistical model. Normally, Greek letters are used to
denote statistical parameters.
• Sample statistics (or simply statistics): Quantities calculated on
the basis of sample observations.
Statistical Moments of Random Variables
• Descriptors commonly used to show the statistical properties of a random variable (RV) are those indicative of its
(1) Central tendency;
(2) Dispersion;
(3) Asymmetry.
• Frequently used descriptors in these three categories
are related to statistical moments of a RV.
• Two types of statistical moments are commonly used
in hydrosystem engineering applications:
(1) product-moments and
(2) L-moments.
Product-Moments
• The rth-order product-moment of X about any reference point X = x_o is defined, for the continuous case, as
$E[(X - x_o)^r] = \int_{-\infty}^{\infty} (x - x_o)^r f_x(x)\,dx = \int_{-\infty}^{\infty} (x - x_o)^r\,dF_x(x)$
whereas for the discrete case,
$E[(X - x_o)^r] = \sum_{k=1}^{K} (x_k - x_o)^r\,p_x(x_k)$
where E[·] is a statistical expectation operator.
• In practice, the first three moments (r = 1, 2, 3) are used to describe the central tendency, variability, and asymmetry.
• Two types of product-moments are commonly used:
– Raw moments: $\mu'_r = E[X^r]$, the rth-order moment about the origin; and
– Central moments: $\mu_r = E[(X - \mu_x)^r]$, the rth-order central moment.
– Relations between the two types of product-moments are:
$\mu_r = \sum_{i=0}^{r} (-1)^i\,C_{r,i}\,\mu_x^i\,\mu'_{r-i}$
$\mu'_r = \sum_{i=0}^{r} C_{r,i}\,\mu_x^i\,\mu_{r-i}$
where $C_{n,x}$ = binomial coefficient = n!/(x!(n-x)!)
Main disadvantages of the product-moments are:
(1) Estimation from sample observations is sensitive to the presence of outliers; and
(2) Accuracy of sample product-moments deteriorates rapidly with increase in the order of the
moments.
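The two relations between raw and central moments given above can be sketched directly from the binomial sums. The three-point PMF used to exercise them is hypothetical, and the function names are illustrative only.

```python
from math import comb


def central_from_raw(raw):
    """raw[r] = E[X^r] for r = 0..R (raw[0] = 1). Returns central moments mu_0..mu_R."""
    mu = raw[1]
    return [
        sum((-1) ** i * comb(r, i) * mu ** i * raw[r - i] for i in range(r + 1))
        for r in range(len(raw))
    ]


def raw_from_central(central, mu):
    """central[r] = E[(X - mu)^r] for r = 0..R. Returns raw moments mu'_0..mu'_R."""
    return [
        sum(comb(r, i) * mu ** i * central[r - i] for i in range(r + 1))
        for r in range(len(central))
    ]


# Hypothetical discrete RV: X in {0, 1, 2} with probabilities 0.25, 0.5, 0.25.
pmf = {0: 0.25, 1: 0.5, 2: 0.25}
raw = [sum(x ** r * p for x, p in pmf.items()) for r in range(4)]
central = central_from_raw(raw)
raw_back = raw_from_central(central, raw[1])
```

For this symmetric PMF the third central moment comes out as zero, and converting back reproduces the raw moments.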
Mean, Mode, Median, and Quantiles
• Expectation (1st-order moment) measures the central tendency of random variable X:
$E[X] = \mu_x = \int_{-\infty}^{\infty} x\,f_x(x)\,dx = \int_{-\infty}^{\infty} x\,dF_x(x)$
For a non-negative random variable, this also equals $\int_{0}^{\infty} [1 - F_x(x)]\,dx$.
– Mean ($\mu_x$) = Expectation = $\lambda_1$ = location of the centroid of the PDF or PMF.
– Two operational properties of the expectation are useful:
• $E\left[\sum_{k=1}^{K} a_k X_k\right] = \sum_{k=1}^{K} a_k \mu_k$, in which $\mu_k = E[X_k]$ for k = 1, 2, …, K.
• For independent random variables, $E\left[\prod_{k=1}^{K} X_k\right] = \prod_{k=1}^{K} \mu_k$
• Mode ($x_{mo}$) - the value of a RV at which its PDF is peaked. The mode can be obtained by solving
$\left[\frac{\partial f_x(x)}{\partial x}\right]_{x = x_{mo}} = 0$
• Median ($x_{md}$) - the value that splits the distribution into two equal halves, i.e.,
$F_x(x_{md}) = \int_{-\infty}^{x_{md}} f_x(x)\,dx = 0.5$
• Quantiles - the 100p-th quantile of a RV X is a quantity $x_p$ that satisfies $P(X \le x_p) = F_x(x_p) = p$
• A PDF could be uni-modal, bi-modal, or multi-modal. Generally, the mean, median, and mode of a random variable are different, unless the PDF is symmetric and uni-modal.
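The quantile definition F_x(x_p) = p can be applied numerically when the CDF has no convenient inverse. The sketch below uses plain bisection; the exponential CDF with mean theta = 2 is an assumption introduced only to have a closed-form median (theta ln 2) to compare against, and the function names are hypothetical.

```python
import math


def quantile(cdf, p, lo, hi, tol=1e-10):
    """Solve cdf(x_p) = p by bisection.

    Assumes cdf is non-decreasing and that cdf(lo) <= p <= cdf(hi).
    """
    if not (cdf(lo) <= p <= cdf(hi)):
        raise ValueError("bracket [lo, hi] does not contain the quantile")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


THETA = 2.0  # assumed mean of an illustrative exponential distribution


def exp_cdf(x):
    """CDF of an exponential RV with mean THETA (illustration only)."""
    return 1.0 - math.exp(-x / THETA) if x > 0 else 0.0


x_md = quantile(exp_cdf, 0.5, 0.0, 100.0)   # median, the 50th quantile
x_90 = quantile(exp_cdf, 0.9, 0.0, 100.0)   # 90th quantile
```

For this positively skewed example the mode (0), median (about 1.39), and mean (2) are all different, in line with the remark above.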
[Figure: Uni-modal and bi-modal distributions, plotted as fx(x) versus x. (a) Uni-modal distribution; (b) Bi-modal distribution.]
Variance, Standard Deviation, and
Coefficient of Variation
• Variance is the second-order central moment, measuring the spread of a RV over its range:
$Var[X] = \mu_2 = \sigma_x^2 = E[(X - \mu_x)^2] = \int_{-\infty}^{\infty} (x - \mu_x)^2 f_x(x)\,dx$
• Standard deviation ($\sigma_x$) is the positive square root of the variance.
• Coefficient of variation, $\Omega_x = \sigma_x / \mu_x$, is a dimensionless measure; useful for comparing the degree of uncertainty of two RVs with different units.
• Three important properties of the variance are:
– (1) Var[c] = 0 when c is a constant.
– (2) $Var[X] = E[X^2] - E^2[X]$
– (3) For multiple independent random variables,
$Var\left[\sum_{k=1}^{K} a_k X_k\right] = \sum_{k=1}^{K} a_k^2 \sigma_k^2$
where $a_k$ = a constant and $\sigma_k$ = standard deviation of $X_k$, k = 1, 2, ..., K.
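A short numerical check of properties (2) and (3) may help. The PMF, the coefficients, and the standard deviations below are hypothetical, and the helper names are illustrative only.

```python
import math


def pmf_mean(pmf):
    """E[X] for a discrete RV given as {value: probability}."""
    return sum(x * p for x, p in pmf.items())


def pmf_variance(pmf):
    """Var[X] = E[(X - mu)^2], the second-order central moment."""
    mu = pmf_mean(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())


def variance_linear_combination(a, sigma):
    """Var[sum a_k X_k] = sum a_k^2 sigma_k^2, valid for independent X_k."""
    return sum(ak ** 2 * sk ** 2 for ak, sk in zip(a, sigma))


pmf = {0: 0.25, 1: 0.5, 2: 0.25}                 # hypothetical PMF
var_x = pmf_variance(pmf)
second_raw = sum(x ** 2 * p for x, p in pmf.items())
var_identity = second_raw - pmf_mean(pmf) ** 2   # property (2)
cv_x = math.sqrt(var_x) / pmf_mean(pmf)          # coefficient of variation

# Difference of two independent RVs with standard deviations 3 and 4.
var_diff = variance_linear_combination([1.0, -1.0], [3.0, 4.0])
```

Note that the variance of a difference of independent RVs is the sum of the variances, since the coefficients enter squared.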
Skewness Coefficient
• Measures the asymmetry of the PDF of a random variable.
• Skewness coefficient, $\gamma_x$, is defined as
$\gamma_x = \frac{\mu_3}{\mu_2^{1.5}} = \frac{E[(X - \mu_x)^3]}{\sigma_x^3}$
• The sign of the skewness coefficient indicates the direction of asymmetry of the probability distribution function: positive for a longer right tail, negative for a longer left tail, and zero for a symmetric distribution.
• Pearson skewness coefficient –
$\gamma_1 = \frac{\mu_x - x_{mo}}{\sigma_x}$
• In practice, product-moments higher than 3rd-order are less used because they are unreliable and inaccurate when estimated from a small number of samples.
• See Table for equations to compute the sample product-moments.
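[Figure: Relative locations of mean, median, and mode for positively-skewed, symmetric, and negatively-skewed distributions, plotted as fx(x) versus x. (a) Positively skewed, $\gamma_x > 0$: $x_{mo}$, $x_{md}$, $\mu_x$ from left to right; (b) Symmetric, $\gamma_x = 0$: $\mu_x = x_{mo} = x_{md}$; (c) Negatively skewed, $\gamma_x < 0$: $\mu_x$, $x_{md}$, $x_{mo}$ from left to right.]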
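Product-moments of random variables

First moment (measure of central location)
- Definition: Mean, expected value, $E(X) = \mu_x$
- Continuous variable: $\mu_x = \int_{-\infty}^{\infty} x\,f_x(x)\,dx$
- Discrete variable: $\mu_x = \sum_{\text{all } x\text{'s}} x_k\,p_x(x_k)$
- Sample estimator: $\bar{x} = \sum x_i / n$

Second moment (measure of dispersion)
- Definition: Variance, $Var(X) = \mu_2 = \sigma_x^2$; standard deviation, $\sigma_x = \sqrt{Var(X)}$; coefficient of variation, $\Omega_x = \sigma_x / \mu_x$
- Continuous variable: $\sigma_x^2 = \int_{-\infty}^{\infty} (x - \mu_x)^2 f_x(x)\,dx$
- Discrete variable: $\sigma_x^2 = \sum_{\text{all } x\text{'s}} (x_k - \mu_x)^2\,p_x(x_k)$
- Sample estimator: $s^2 = \frac{1}{n-1} \sum (x_i - \bar{x})^2$; $s = \sqrt{\frac{1}{n-1} \sum (x_i - \bar{x})^2}$; $C_v = s / \bar{x}$

Third moment (measure of asymmetry)
- Definition: Skewness, $\mu_3$; skewness coefficient, $\gamma_x = \mu_3 / \sigma_x^3$
- Continuous variable: $\mu_3 = \int_{-\infty}^{\infty} (x - \mu_x)^3 f_x(x)\,dx$
- Discrete variable: $\mu_3 = \sum_{\text{all } x\text{'s}} (x_k - \mu_x)^3\,p_x(x_k)$
- Sample estimator: $m_3 = \frac{n}{(n-1)(n-2)} \sum (x_i - \bar{x})^3$; $g = m_3 / s^3$

Fourth moment (measure of peakedness)
- Definition: Kurtosis, $\kappa_x = \mu_4 / \sigma_x^4$; excess coefficient, $\varepsilon_x = \kappa_x - 3$
- Continuous variable: $\mu_4 = \int_{-\infty}^{\infty} (x - \mu_x)^4 f_x(x)\,dx$
- Discrete variable: $\mu_4 = \sum_{\text{all } x\text{'s}} (x_k - \mu_x)^4\,p_x(x_k)$
- Sample estimator: $m_4 = \frac{n(n+1)}{(n-1)(n-2)(n-3)} \sum (x_i - \bar{x})^4$; $k = m_4 / s^4$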
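The sample estimators listed in the table can be sketched as one small function. It follows the table's formulas as read from the slide, with the n/((n-1)(n-2)) and n(n+1)/((n-1)(n-2)(n-3)) factors; the function name and the test data are hypothetical.

```python
import math


def sample_product_moments(x):
    """Sample mean, variance, Cv, skewness coefficient g, and kurtosis k.

    Uses the estimators from the product-moments table; needs n >= 4
    because the fourth-moment factor divides by (n - 3).
    """
    n = len(x)
    if n < 4:
        raise ValueError("at least 4 observations are required")
    mean = sum(x) / n
    dev = [xi - mean for xi in x]
    s2 = sum(d ** 2 for d in dev) / (n - 1)
    s = math.sqrt(s2)
    m3 = n / ((n - 1) * (n - 2)) * sum(d ** 3 for d in dev)
    m4 = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(d ** 4 for d in dev)
    return {
        "mean": mean,
        "s2": s2,
        "s": s,
        "Cv": s / mean,      # undefined when the sample mean is zero
        "g": m3 / s ** 3,
        "k": m4 / s ** 4,
    }


stats = sample_product_moments([1.0, 2.0, 3.0, 4.0, 5.0])  # made-up data
```

With so few observations the higher-order estimates are very unstable, which is the practical point made about moments above the third order.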
Kurtosis ($\kappa_x$)
• Measure of the peakedness of a distribution.
• Related to the 4th central product-moment as
$\kappa_x = \frac{\mu_4}{\mu_2^2} = \frac{E[(X - \mu_x)^4]}{\sigma_x^4}$
• For a normal RV, its kurtosis is equal to 3. Sometimes, the coefficient of excess, $\varepsilon_x = \kappa_x - 3$, is used.
• For all feasible distribution functions, the skewness coefficient and kurtosis must satisfy
$\gamma_x^2 + 1 \le \kappa_x$
Some Commonly Used Distributions
• NORMAL DISTRIBUTION
$f_N(x \mid \mu_x, \sigma_x^2) = \frac{1}{\sqrt{2\pi}\,\sigma_x} \exp\left[-\frac{1}{2}\left(\frac{x - \mu_x}{\sigma_x}\right)^2\right]$, for $-\infty < x < \infty$
Standardized Variable:
$Z = \frac{X - \mu}{\sigma}$
Z has mean 0 and standard deviation 1.
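A minimal sketch of the normal PDF and the standardization step, using only the standard library. The CDF is written with the error function, Φ(z) = 0.5[1 + erf(z/√2)], which is a standard identity rather than something stated on the slide; the mean of 100 and standard deviation of 20 are hypothetical values.

```python
import math


def normal_pdf(x, mu, sigma):
    """Normal density f_N(x | mu, sigma^2)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma)


def standardize(x, mu, sigma):
    """Standardized variable z = (x - mu) / sigma."""
    return (x - mu) / sigma


def std_normal_cdf(z):
    """Phi(z) = Pr(Z <= z) for the standard normal, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Hypothetical example: X normal with mean 100 and standard deviation 20.
z = standardize(130.0, 100.0, 20.0)          # 1.5
prob_not_exceeding = std_normal_cdf(z)       # Pr(X <= 130)
```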
• STANDARD NORMAL DISTRIBUTION:
[Figure: standard normal PDF $\phi(z)$ versus z, for z from -3 to 3.]
• LOG-NORMAL DISTRIBUTION
$f_{LN}(x \mid \mu_{\ln x}, \sigma_{\ln x}^2) = \frac{1}{\sqrt{2\pi}\,\sigma_{\ln x}\,x} \exp\left\{-\frac{1}{2}\left[\frac{\ln(x) - \mu_{\ln x}}{\sigma_{\ln x}}\right]^2\right\}$, x > 0
[Figure: log-normal PDF fLN(x) versus x, for x from 0 to 6. (a) $\mu_x = 1.0$ with $\Omega_x$ = 0.3, 0.6, and 1.3; (b) $\Omega_x = 1.30$ with $\mu_x$ = 1.65, 2.25, and 4.50.]
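The log-normal curves are labeled by the mean and coefficient of variation of X, while the density is written in terms of the mean and standard deviation of ln(X). The sketch below converts between the two using the standard log-normal relations σ²_lnx = ln(1 + Ω_x²) and μ_lnx = ln(μ_x) - σ²_lnx/2; these relations are not given on the slide and are brought in here as an assumption of the example, as are the function names.

```python
import math


def lognormal_params(mean, cv):
    """Return (mu_lnx, sigma_lnx) from the mean and coefficient of variation of X."""
    var_ln = math.log(1.0 + cv ** 2)
    mu_ln = math.log(mean) - 0.5 * var_ln
    return mu_ln, math.sqrt(var_ln)


def lognormal_pdf(x, mu_ln, sigma_ln):
    """Log-normal density f_LN(x | mu_lnx, sigma_lnx^2); zero for x <= 0."""
    if x <= 0.0:
        return 0.0
    z = (math.log(x) - mu_ln) / sigma_ln
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma_ln * x)


# One of the plotted cases: mean 1.0, coefficient of variation 0.3.
mu_ln, sigma_ln = lognormal_params(1.0, 0.3)
mean_back = math.exp(mu_ln + 0.5 * sigma_ln ** 2)
cv_back = math.sqrt(math.exp(sigma_ln ** 2) - 1.0)
density_at_one = lognormal_pdf(1.0, mu_ln, sigma_ln)
```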
• Gumbel (Extreme-Value Type I) Distribution
$F_{EV1}(x \mid \xi, \beta) = \exp\left\{-\exp\left[-\left(\frac{x - \xi}{\beta}\right)\right]\right\}$ for maxima
$= 1 - \exp\left\{-\exp\left[+\left(\frac{x - \xi}{\beta}\right)\right]\right\}$ for minima
$f_{EV1}(x \mid \xi, \beta) = \frac{1}{\beta} \exp\left\{-\left(\frac{x - \xi}{\beta}\right) - \exp\left[-\left(\frac{x - \xi}{\beta}\right)\right]\right\}$ for maxima
$= \frac{1}{\beta} \exp\left\{+\left(\frac{x - \xi}{\beta}\right) - \exp\left[+\left(\frac{x - \xi}{\beta}\right)\right]\right\}$ for minima
[Figure: Gumbel PDF fEV1(y) versus y, for y from -4 to 4, with curves labeled Max and Min.]
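Because the Gumbel CDF for maxima inverts in closed form, it gives a direct link between magnitude and frequency. The sketch below inverts F = p to get x_p = ξ - β ln(-ln p) and evaluates it at p = 1 - 1/T. The return period T and the parameter values ξ = 50, β = 10 are assumptions introduced for the example and do not come from the slides.

```python
import math


def gumbel_max_cdf(x, xi, beta):
    """Gumbel (EV1) CDF for maxima."""
    return math.exp(-math.exp(-(x - xi) / beta))


def gumbel_max_pdf(x, xi, beta):
    """Gumbel (EV1) PDF for maxima."""
    y = (x - xi) / beta
    return math.exp(-y - math.exp(-y)) / beta


def gumbel_max_quantile(p, xi, beta):
    """Invert the CDF for maxima: x_p = xi - beta * ln(-ln p), for 0 < p < 1."""
    return xi - beta * math.log(-math.log(p))


# Hypothetical parameters; T is an assumed return period in years.
xi, beta, T = 50.0, 10.0, 100.0
x_T = gumbel_max_quantile(1.0 - 1.0 / T, xi, beta)
```

Evaluating the CDF at x_T returns the non-exceedance probability 1 - 1/T, which is a quick consistency check on the inversion.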
• Pearson Type 3 Distribution
$f_{P3}(x \mid \xi, \alpha, \beta) = \frac{1}{|\beta|\,\Gamma(\alpha)} \left(\frac{x - \xi}{\beta}\right)^{\alpha - 1} e^{-(x - \xi)/\beta}$
[Figure: fG(x) versus x, for x from 0 to 14, with curves labeled 4,1; 1,4; and 2,4.]
• Log-Pearson Type 3 Distribution
$f_{LP3}(x \mid \xi, \alpha, \beta) = \frac{1}{x\,|\beta|\,\Gamma(\alpha)} \left[\frac{\ln(x) - \xi}{\beta}\right]^{\alpha - 1} e^{-[\ln(x) - \xi]/\beta}$
with $\alpha > 0$, $x \ge e^{\xi}$ when $\beta > 0$, and with $\alpha > 0$, $x \le e^{\xi}$ when $\beta < 0$.
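The Pearson Type 3 and log-Pearson Type 3 densities can be sketched together, since the LP3 density at x is the P3 density at ln(x) divided by x. The sketch assumes the normalizing constant uses |β|, so that the density stays positive when β < 0; the function names and parameter values are hypothetical.

```python
import math


def pearson3_pdf(x, xi, alpha, beta):
    """Pearson Type 3 density, assuming a 1 / (|beta| * Gamma(alpha)) constant.

    Returns 0 outside the support, i.e. where (x - xi) / beta < 0.
    For alpha < 1 the density is unbounded at x = xi and this sketch
    raises ZeroDivisionError there.
    """
    t = (x - xi) / beta
    if t < 0.0:
        return 0.0
    return t ** (alpha - 1.0) * math.exp(-t) / (abs(beta) * math.gamma(alpha))


def log_pearson3_pdf(x, xi, alpha, beta):
    """Log-Pearson Type 3 density: P3 density of ln(x), divided by x."""
    if x <= 0.0:
        return 0.0
    return pearson3_pdf(math.log(x), xi, alpha, beta) / x


# Hypothetical parameter sets, one with beta > 0 and one with beta < 0.
f_pos = log_pearson3_pdf(math.e, 0.0, 1.0, 1.0)   # x >= e^xi, inside support
f_out = log_pearson3_pdf(0.5, 0.0, 2.0, 1.0)      # x < e^xi, outside support
f_neg = log_pearson3_pdf(0.5, 0.0, 2.0, -1.0)     # beta < 0: support is x <= e^xi
```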