FREQUENCY ANALYSIS
• Basic Problem:
To relate the magnitude of extreme events to
their frequency of occurrence through the use
of probability distributions.
FREQUENCY ANALYSIS
• Basic Assumptions:
(a) Data analyzed are statistically independent and identically distributed. This depends on the selection of data (time dependence, time scale, mechanisms).
(b) Changes over time due to man-made (e.g., urbanization) or natural processes do not alter the frequency relation, i.e., there is no temporal trend in the data (stationarity).
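The two assumptions can be screened with simple diagnostics before fitting any distribution. The sketch below is not part of the original slides: it assumes an equally spaced record (e.g., one annual maximum per year) and uses two common screening quantities, the lag-1 serial correlation coefficient (for independence) and a least-squares trend slope (for stationarity). The function names and the record are made up for illustration.

```python
def lag1_autocorrelation(x):
    """Lag-1 serial correlation coefficient of an equally spaced series."""
    n = len(x)
    m = sum(x) / n
    num = sum((x[i] - m) * (x[i + 1] - m) for i in range(n - 1))
    den = sum((xi - m) ** 2 for xi in x)
    return num / den

def trend_slope(x):
    """Least-squares slope of x against the time index 0, 1, ..., n-1."""
    n = len(x)
    t_mean = (n - 1) / 2
    x_mean = sum(x) / n
    num = sum((t - t_mean) * (xi - x_mean) for t, xi in enumerate(x))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den

# Hypothetical annual peak discharges (made-up numbers, arbitrary units).
peaks = [310.0, 280.0, 450.0, 390.0, 300.0, 520.0, 410.0, 360.0]
r1 = lag1_autocorrelation(peaks)   # near 0 suggests little serial dependence
slope = trend_slope(peaks)         # near 0 suggests no linear trend
```

Values far from zero are only a warning sign; a formal test would be needed to decide whether the assumptions are actually violated.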
FREQUENCY ANALYSIS
• Practical Problems:
(a) Selection of reasonable & simple distribution.
(b) Estimation of parameters in distribution.
(c) Assessment of risk with reasonable accuracy.
REVIEW OF BASIC CONCEPTS
Probabilistic
Outcome of a hydrologic event (e.g., rainfall amount & duration; flood
peak discharge; wave height, etc.) is random and cannot be predicted with
certainty.
Terminologies
- Population
The collection of all possible outcomes relevant to the process of
interest. Example:
(1) Max. 2-hr rainfall depth: all non-negative real numbers;
(2) No. of storms in June: all non-negative integers.
- Sample
A measured segment (or subset) of the population.
REVIEW OF BASIC CONCEPTS
Terminologies
- Random Variable
A variable describable by a probability distribution which
specifies the chance that the variable will assume a particular
value.
Convention: a capital letter (say, X) denotes a random variable, whereas
the lower-case letter (say, x) denotes a numerical realization
that the random variable X takes.
Example:
X = rainfall amount in 2 hours (a random variable) ;
x = 100.2 mm/2hr (realization).
- Random variables can be
- discrete (e.g., no. of rainy days in June) or
- continuous (e.g., max. 2-hr rainfall amount, flood discharge).
REVIEW OF BASIC CONCEPTS
Terminologies
Frequency & Relative Frequency
o For discrete random variables:
o Frequency is the number of occurrences of a specific event. Relative frequency
is the frequency divided by the total number of events, e.g.,
n = no. of years having exactly 50 rainy days; N = total no. of years.
Let n = 10 years and N = 100 years. Then the frequency of having
exactly 50 rainy days is 10, and the relative frequency of having
exactly 50 rainy days in 100 years is n/N = 0.1.
o For continuous random variables:
o Frequency needs to be defined for a class interval.
o A plot of frequency or relative frequency versus class intervals is called a
histogram or probability polygon.
o As the sample size grows infinitely large and the class interval length
approaches zero, the histogram becomes a smooth curve, called the
probability density function.
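As a minimal sketch of these definitions, the code below computes relative frequencies for a discrete sample and for class intervals of a continuous sample. The data values, function names, and class limits are made up for illustration and are not from the slides.

```python
from collections import Counter

def relative_frequencies(values):
    """Return {value: relative frequency} for a discrete sample."""
    counts = Counter(values)
    total = len(values)
    return {v: c / total for v, c in counts.items()}

def class_relative_frequencies(values, lower, width, n_classes):
    """Relative frequency in each class [lower + i*width, lower + (i+1)*width)."""
    counts = [0] * n_classes
    for v in values:
        i = int((v - lower) // width)
        if 0 <= i < n_classes:
            counts[i] += 1
    return [c / len(values) for c in counts]

# Hypothetical record: number of rainy days in each of 10 years.
rainy_days = [50, 48, 50, 52, 47, 50, 49, 51, 48, 50]
rel = relative_frequencies(rainy_days)   # rel[50] is n/N for "exactly 50 rainy days"

# Hypothetical max. 2-hr rainfall depths (mm), grouped into 20-mm classes.
depths = [12.0, 35.5, 41.2, 18.9, 27.3, 55.0, 33.1, 22.4]
hist = class_relative_frequencies(depths, lower=0.0, width=20.0, n_classes=3)
```

Plotting `hist` against the class intervals gives the histogram described above; narrowing the classes as the sample grows is what leads to the smooth density curve.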
REVIEW OF BASIC CONCEPTS
Terminologies
Probability Density Function (PDF) –
• For a continuous random variable, the PDF must satisfy
$\int_{-\infty}^{\infty} f(x)\,dx = 1$
and $f(x) \ge 0$ for all values of x.
• For a discrete random variable, the PDF (probability mass function, PMF) must satisfy
$\sum_{\text{all } x} p(x) = 1$
and $1 \ge p(x) \ge 0$ for all values of x.
REVIEW OF BASIC CONCEPTS
Terminologies
- Cumulative Distribution Function
For a continuous random variable,
$\Pr(X \le x_o) = \int_{-\infty}^{x_o} f(x)\,dx$
For a discrete random variable,
$\Pr(X \le x_o) = \sum_{\text{all } x_i \le x_o} p(x_i)$
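For the discrete case, the two PMF conditions and the cumulative sum can be checked directly. The sketch below uses a made-up PMF for the number of storms in June; the probabilities and function names are illustrative assumptions only.

```python
# Hypothetical PMF for the number of storms in June (made-up probabilities).
pmf = {0: 0.1, 1: 0.3, 2: 0.4, 3: 0.2}

def check_pmf(p, tol=1e-12):
    """True if every p(x) lies in [0, 1] and the probabilities sum to 1."""
    in_range = all(0.0 <= v <= 1.0 for v in p.values())
    return in_range and abs(sum(p.values()) - 1.0) < tol

def cdf(p, xo):
    """Pr(X <= xo): sum of p(x_i) over all x_i <= xo."""
    return sum(prob for x, prob in p.items() if x <= xo)

valid = check_pmf(pmf)
p_at_most_one = cdf(pmf, 1)   # p(0) + p(1)
```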
Statistical Properties of Random Variables
• Population - Synonymous with sample space, which describes
the complete assemblage of all the values representative of a
particular random process.
• Sample - Any subset of the population.
• Parameters - Quantities that are descriptive of the population
in a statistical model. Normally, Greek letters are used to
denote statistical parameters.
• Sample statistics (or simply statistics): Quantities calculated on
the basis of sample observations.
Statistical Moments of Random Variables
• Descriptors commonly used to show statistical
properties of a RV are those indicative of
(1) Central tendency;
(2) Dispersion;
(3) Asymmetry.
• Frequently used descriptors in these three categories
are related to statistical moments of a RV.
• Two types of statistical moments are commonly used
in hydrosystem engineering applications:
(1) product-moments and
(2) L-moments.
Product-Moments
• The rth-order product-moment of X about any reference point $X = x_o$ is defined, for the continuous case, as
$E[(X - x_o)^r] = \int_{-\infty}^{\infty} (x - x_o)^r f_x(x)\,dx = \int_{-\infty}^{\infty} (x - x_o)^r\,dF_x(x)$
whereas for the discrete case,
$E[(X - x_o)^r] = \sum_{k=1}^{K} (x_k - x_o)^r\, p_x(x_k)$
where E[] is a statistical expectation operator.
• In practice, the first three moments (r=1, 2, 3) are used to describe the central tendency, variability, and asymmetry.
• Two types of product-moments are commonly used:
– Raw moments: $\mu'_r = E[X^r]$ = rth-order moment about the origin; and
– Central moments: $\mu_r = E[(X - \mu_x)^r]$ = rth-order central moment
– Relations between the two types of product-moments are:
$\mu_r = \sum_{i=0}^{r} (-1)^i\, C_{r,i}\, \mu_x^i\, \mu'_{r-i}$
$\mu'_r = \sum_{i=0}^{r} C_{r,i}\, \mu_x^i\, \mu_{r-i}$
where $C_{n,x}$ = binomial coefficient = $n!/(x!(n-x)!)$
Main disadvantages of the product-moments are:
(1) Estimation from sample observations is sensitive to the presence of outliers; and
(2) Accuracy of sample product-moments deteriorates rapidly with increase in the order of the
moments.
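The relations between raw and central moments can be verified numerically. The following sketch assumes a small made-up discrete distribution and checks that converting raw moments with the binomial formula reproduces the central moments computed directly; the function names are illustrative.

```python
from math import comb

def central_from_raw(raw, r):
    """mu_r from raw moments, where raw[j] = E[X^j] (raw[0] = 1, raw[1] = mean)."""
    mean = raw[1]
    return sum((-1) ** i * comb(r, i) * mean ** i * raw[r - i] for i in range(r + 1))

def raw_from_central(central, mean, r):
    """mu'_r from central moments, where central[j] = E[(X - mean)^j]."""
    return sum(comb(r, i) * mean ** i * central[r - i] for i in range(r + 1))

# Hypothetical discrete distribution (made-up values and probabilities).
xs = [1.0, 2.0, 4.0]
ps = [0.5, 0.25, 0.25]

raw = [sum(p * x ** j for x, p in zip(xs, ps)) for j in range(4)]
mean = raw[1]
central = [sum(p * (x - mean) ** j for x, p in zip(xs, ps)) for j in range(4)]

variance = central_from_raw(raw, 2)            # should equal central[2]
third = central_from_raw(raw, 3)               # should equal central[3]
raw3_back = raw_from_central(central, mean, 3) # should equal raw[3]
```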
Mean, Mode, Median, and Quantiles
• Expectation (1st-order moment) measures the central tendency of random variable X:
$E[X] = \mu_x = \int_{-\infty}^{\infty} x f_x(x)\,dx = \int_{-\infty}^{\infty} x\,dF_x(x)$
For a non-negative random variable, this can also be written as $E[X] = \int_{0}^{\infty} [1 - F_x(x)]\,dx$.
– Mean ($\mu_x$) = Expectation = $\lambda_1$ (the first-order L-moment) = location of the centroid of the PDF or PMF.
– Two operational properties of the expectation are useful:
$E\left[\sum_{k=1}^{K} a_k X_k\right] = \sum_{k=1}^{K} a_k \mu_k$
in which $\mu_k = E[X_k]$ for k = 1, 2, …, K.
For independent random variables,
$E\left[\prod_{k=1}^{K} X_k\right] = \prod_{k=1}^{K} \mu_k$
• Mode ($x_{mo}$) - the value of a RV at which its PDF is peaked. The mode, $x_{mo}$, can be obtained by solving
$\left[\frac{\partial f_x(x)}{\partial x}\right]_{x = x_{mo}} = 0$
• Median ($x_{md}$) - the value that splits the distribution into two equal halves, i.e.,
$F_x(x_{md}) = \int_{-\infty}^{x_{md}} f_x(x)\,dx = 0.5$
• Quantiles - the 100pth quantile of a RV X is a quantity $x_p$ that satisfies $P(X \le x_p) = F_x(x_p) = p$.
• A PDF could be uni-modal, bimodal, or multi-modal. Generally, the mean, median, and mode
of a random variable are different, unless the PDF is symmetric and uni-modal.
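To illustrate these definitions, the sketch below uses an exponential distribution, which is an example chosen here and not one from the slides. Its CDF inverts in closed form, so the median and other quantiles follow directly from $F_x(x_p) = p$, and because the variable is non-negative its mean can be approximated by integrating $1 - F_x(x)$ from zero. The scale parameter and integration settings are assumptions.

```python
import math

theta = 2.0  # hypothetical scale parameter; the exact mean of this distribution

def F(x):
    """CDF of the exponential distribution: F(x) = 1 - exp(-x/theta), x >= 0."""
    return 1.0 - math.exp(-x / theta)

def quantile(p):
    """x_p solving F(x_p) = p, obtained by inverting the CDF."""
    return -theta * math.log(1.0 - p)

def mean_from_survival(upper=60.0, n=60000):
    """Midpoint-rule approximation of the integral of [1 - F(x)] over [0, upper]."""
    h = upper / n
    return sum((1.0 - F((i + 0.5) * h)) * h for i in range(n))

median = quantile(0.5)         # theta * ln(2)
x90 = quantile(0.9)            # 90th-percentile value
mean = mean_from_survival()    # numerically close to theta
```

For this distribution the mode is at zero, so mode < median < mean, which is the ordering expected for a positively skewed PDF.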
Figure: Uni-modal and bi-modal distributions, $f_x(x)$ versus x: (a) uni-modal distribution; (b) bi-modal distribution.
Variance, Standard Deviation, and
Coefficient of Variation
• Variance is the second-order central moment measuring the spreading of a
RV over its range,
$\mathrm{Var}[X] = \mu_2 = \sigma_x^2 = E[(X - \mu_x)^2] = \int_{-\infty}^{\infty} (x - \mu_x)^2 f_x(x)\,dx$
• Standard deviation ($\sigma_x$) is the positive square root of the variance.
• Coefficient of variation, $\Omega_x = \sigma_x/\mu_x$, is a dimensionless measure; useful for
comparing the degree of uncertainty of two RVs with different units.
• Three important properties of the variance are:
– (1) Var[c] = 0 when c is a constant.
– (2) $\mathrm{Var}[X] = E[X^2] - E^2[X]$
– (3) For multiple independent random variables,
$\mathrm{Var}\left[\sum_{k=1}^{K} a_k X_k\right] = \sum_{k=1}^{K} a_k^2 \sigma_k^2$
where $a_k$ = a constant and $\sigma_k$ = standard deviation of $X_k$, k = 1, 2, ..., K.
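A short numerical check of these properties, assuming a made-up discrete distribution and made-up coefficients; none of the numbers or function names come from the slides.

```python
import math

def mean_and_variance(xs, ps):
    """Mean and variance of a discrete RV via Var[X] = E[X^2] - E[X]^2."""
    ex = sum(p * x for x, p in zip(xs, ps))
    ex2 = sum(p * x * x for x, p in zip(xs, ps))
    return ex, ex2 - ex ** 2

def variance_of_linear_combination(a, sigmas):
    """Var[sum a_k X_k] for independent X_k with standard deviations sigmas."""
    return sum(ak ** 2 * sk ** 2 for ak, sk in zip(a, sigmas))

# Hypothetical discrete distribution.
xs = [0.0, 10.0, 20.0]
ps = [0.2, 0.5, 0.3]
mu, var = mean_and_variance(xs, ps)
sigma = math.sqrt(var)     # standard deviation
cv = sigma / mu            # coefficient of variation (dimensionless)

# Var[2*X1 - 3*X2] for independent X1, X2 with sigma_1 = 1 and sigma_2 = 2.
var_combo = variance_of_linear_combination([2.0, -3.0], [1.0, 2.0])
```

Note that the sign of each coefficient drops out because it is squared, so a difference of independent variables is more variable than either term, not less.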
Skewness Coefficient
• Measures asymmetry of the PDF of a random variable.
• Skewness coefficient, $\gamma_x$, is defined as
$\gamma_x = \frac{\mu_3}{\mu_2^{1.5}} = \frac{E[(X - \mu_x)^3]}{\sigma_x^3}$
• The sign of the skewness coefficient indicates the direction of
asymmetry of the probability distribution function: positive for a longer right tail, negative for a longer left tail, and zero for a symmetric distribution.
• Pearson skewness coefficient –
$\gamma_1 = \frac{\mu_x - x_{mo}}{\sigma_x}$
• In practice, product-moments higher than 3rd-order are less
used because they are unreliable and inaccurate when
estimated from a small number of samples.
• See Table for equations to compute the sample product-moments.
Figure: Relative locations of mean, median, and mode for positively-skewed, symmetric, and negatively-skewed distributions ($f_x(x)$ versus x):
(a) Positively skewed, $\gamma_x > 0$: $x_{mo} < x_{md} < \mu_x$
(b) Symmetric, $\gamma_x = 0$: $\mu_x = x_{mo} = x_{md}$
(c) Negatively skewed, $\gamma_x < 0$: $\mu_x < x_{md} < x_{mo}$
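Table: Product-moments of random variables
• First moment - measure of central location
– Definition: mean, expected value, $E(X) = \mu_x$
– Continuous variable: $\mu_x = \int_{-\infty}^{\infty} x f_x(x)\,dx$
– Discrete variable: $\mu_x = \sum_{\text{all } x\text{'s}} x_k\, p_x(x_k)$
– Sample estimator: $\bar{x} = \sum x_i / n$
• Second moment - measure of dispersion
– Definition: variance, $\mathrm{Var}(X) = \mu_2 = \sigma_x^2$; standard deviation, $\sigma_x = \sqrt{\mathrm{Var}(X)}$; coefficient of variation, $\Omega_x = \sigma_x/\mu_x$
– Continuous variable: $\sigma_x^2 = \int_{-\infty}^{\infty} (x - \mu_x)^2 f_x(x)\,dx$
– Discrete variable: $\sigma_x^2 = \sum_{\text{all } x\text{'s}} (x_k - \mu_x)^2\, p_x(x_k)$
– Sample estimators: $s^2 = \frac{1}{n-1} \sum (x_i - \bar{x})^2$; $s = \sqrt{\frac{1}{n-1} \sum (x_i - \bar{x})^2}$; $C_v = s/\bar{x}$
• Third moment - measure of asymmetry
– Definition: skewness, $\mu_3$; skewness coefficient, $\gamma_x = \mu_3/\sigma_x^3$
– Continuous variable: $\mu_3 = \int_{-\infty}^{\infty} (x - \mu_x)^3 f_x(x)\,dx$
– Discrete variable: $\mu_3 = \sum_{\text{all } x\text{'s}} (x_k - \mu_x)^3\, p_x(x_k)$
– Sample estimators: $m_3 = \frac{n}{(n-1)(n-2)} \sum (x_i - \bar{x})^3$; $g = m_3/s^3$
• Fourth moment - measure of peakedness
– Definition: kurtosis, $\kappa_x = \mu_4/\sigma_x^4$; excess coefficient, $\varepsilon_x = \kappa_x - 3$
– Continuous variable: $\mu_4 = \int_{-\infty}^{\infty} (x - \mu_x)^4 f_x(x)\,dx$
– Discrete variable: $\mu_4 = \sum_{\text{all } x\text{'s}} (x_k - \mu_x)^4\, p_x(x_k)$
– Sample estimators: $m_4 = \frac{n(n+1)}{(n-1)(n-2)(n-3)} \sum (x_i - \bar{x})^4$; $k = m_4/s^4$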
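The sample estimators in the table translate directly into code. The sketch below is one possible implementation, using the factors n/[(n-1)(n-2)] for the third moment and n(n+1)/[(n-1)(n-2)(n-3)] for the fourth; the function name and the data are made up, and at least four observations are assumed so that no denominator is zero.

```python
import math

def sample_product_moments(x):
    """Sample mean, variance, std. deviation, Cv, skewness g, and kurtosis k."""
    n = len(x)
    if n < 4:
        raise ValueError("need at least 4 observations")
    xbar = sum(x) / n
    dev = [xi - xbar for xi in x]
    s2 = sum(d ** 2 for d in dev) / (n - 1)
    s = math.sqrt(s2)
    m3 = n / ((n - 1) * (n - 2)) * sum(d ** 3 for d in dev)
    m4 = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(d ** 4 for d in dev)
    return {
        "mean": xbar,
        "variance": s2,
        "std": s,
        "cv": s / xbar,
        "skewness": m3 / s ** 3,
        "kurtosis": m4 / s ** 4,
    }

# Hypothetical sample (made-up values).
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
stats = sample_product_moments(sample)
```

With only eight values the skewness and kurtosis estimates are very uncertain, which is the practical limitation of higher-order product-moments noted earlier.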
Kurtosis ($\kappa_x$)
• Measure of the peakedness of a distribution.
• Related to the 4th central product-moment as
$\kappa_x = \frac{\mu_4}{\mu_2^2} = \frac{E[(X - \mu_x)^4]}{\sigma_x^4}$
• For a normal RV, its kurtosis is equal to 3. Sometimes, the
coefficient of excess, $\varepsilon_x = \kappa_x - 3$, is used.
• For all feasible distribution functions, the skewness coefficient and
kurtosis must satisfy
$\gamma_x^2 + 1 \le \kappa_x$
Some Commonly Used Distributions
• NORMAL DISTRIBUTION
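$f_N(x \mid \mu_x, \sigma_x^2) = \frac{1}{\sqrt{2\pi}\,\sigma_x} \exp\left[-\frac{1}{2}\left(\frac{x - \mu_x}{\sigma_x}\right)^2\right]$, for $-\infty < x < \infty$
Standardized Variable:
$Z = \frac{X - \mu}{\sigma}$
Z has mean 0 and standard deviation 1.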
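Standardization lets any normal probability be read from the standard normal CDF. The sketch below uses Python's standard-library `statistics.NormalDist`; the mean, standard deviation, and threshold are made-up values for illustration.

```python
from statistics import NormalDist

mu, sigma = 100.0, 20.0   # hypothetical mean and standard deviation
x = 130.0                 # hypothetical value of interest

# Standardize: Z = (X - mu) / sigma has mean 0 and standard deviation 1.
z = (x - mu) / sigma

# Pr(X <= x) from the standard normal CDF, and directly from N(mu, sigma).
p_standard = NormalDist().cdf(z)
p_direct = NormalDist(mu, sigma).cdf(x)

# Probability of exceeding x.
p_exceed = 1.0 - p_standard
```

Both routes give the same probability, which is why tables (or code) for the standard normal distribution are sufficient for any normal variable.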
Some Commonly Used Distributions
• STANDARD NORMAL DISTRIBUTION:
Figure: PDF of the standard normal distribution, $\phi(z)$ versus z.
Some Commonly Used Distributions
• LOG-NORMAL DISTRIBUTION
$f_{LN}(x \mid \mu_{\ln x}, \sigma_{\ln x}^2) = \frac{1}{\sqrt{2\pi}\,\sigma_{\ln x}\, x} \exp\left[-\frac{1}{2}\left(\frac{\ln(x) - \mu_{\ln x}}{\sigma_{\ln x}}\right)^2\right]$, $x > 0$
Figure: Log-normal PDFs, $f_{LN}(x)$ versus x: (a) $\mu_x = 1.0$ with $\Omega_x$ = 0.3, 0.6, and 1.3; (b) $\Omega_x = 1.30$ with $\mu_x$ = 1.65, 2.25, and 4.50.
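The log-normal curves are labelled by the mean and coefficient of variation of X, while the PDF is written in terms of the mean and standard deviation of ln X. The sketch below converts between the two using the standard log-normal relations, which are added here and are not stated on the slide: the variance of ln X equals ln(1 + Ω_x²), and the mean of ln X equals ln(μ_x) minus half that variance. Function names are illustrative.

```python
import math

def lognormal_params(mean_x, cv_x):
    """Mean and standard deviation of ln X from the mean and Cv of X."""
    var_ln = math.log(1.0 + cv_x ** 2)
    mu_ln = math.log(mean_x) - 0.5 * var_ln
    return mu_ln, math.sqrt(var_ln)

def lognormal_pdf(x, mu_ln, sigma_ln):
    """Log-normal PDF; zero for x <= 0."""
    if x <= 0.0:
        return 0.0
    z = (math.log(x) - mu_ln) / sigma_ln
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma_ln * x)

# One of the plotted cases: mean of X = 1.0, coefficient of variation = 0.6.
mu_ln, sigma_ln = lognormal_params(1.0, 0.6)
density_at_one = lognormal_pdf(1.0, mu_ln, sigma_ln)
```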
Some Commonly Used Distributions
• Gumbel (Extreme-Value Type I) Distribution
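For maxima:
$F_{EV1}(x \mid \xi, \beta) = \exp\left[-\exp\left(-\frac{x - \xi}{\beta}\right)\right]$
$f_{EV1}(x \mid \xi, \beta) = \frac{1}{\beta} \exp\left[-\frac{x - \xi}{\beta} - \exp\left(-\frac{x - \xi}{\beta}\right)\right]$
For minima:
$F_{EV1}(x \mid \xi, \beta) = 1 - \exp\left[-\exp\left(+\frac{x - \xi}{\beta}\right)\right]$
$f_{EV1}(x \mid \xi, \beta) = \frac{1}{\beta} \exp\left[+\frac{x - \xi}{\beta} - \exp\left(+\frac{x - \xi}{\beta}\right)\right]$
Figure: Gumbel PDFs, $f_{EV1}(y)$ versus y, for maxima (Max) and minima (Min).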
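Because the Gumbel CDF for maxima inverts in closed form, the magnitude associated with a given non-exceedance probability p follows as $x_p = \xi - \beta \ln(-\ln p)$. The sketch below implements the CDF, PDF, and this inverse; the parameter values are made up and the function names are illustrative.

```python
import math

def gumbel_max_cdf(x, xi, beta):
    """F(x) = exp(-exp(-(x - xi)/beta)) for maxima."""
    return math.exp(-math.exp(-(x - xi) / beta))

def gumbel_max_pdf(x, xi, beta):
    """f(x) = (1/beta) * exp(-y - exp(-y)) with y = (x - xi)/beta."""
    y = (x - xi) / beta
    return math.exp(-y - math.exp(-y)) / beta

def gumbel_max_quantile(p, xi, beta):
    """x_p solving F(x_p) = p, for 0 < p < 1."""
    return xi - beta * math.log(-math.log(p))

# Hypothetical parameters for annual maximum discharge (arbitrary units).
xi, beta = 300.0, 80.0
x99 = gumbel_max_quantile(0.99, xi, beta)   # magnitude exceeded with probability 0.01
p_back = gumbel_max_cdf(x99, xi, beta)      # recovers 0.99
```

This is the basic operation of frequency analysis: relating a magnitude to its probability of being exceeded through a fitted distribution.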
Some Commonly Used Distributions
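• Pearson Type 3 Distribution
$f_{P3}(x \mid \xi, \alpha, \beta) = \frac{1}{|\beta|\,\Gamma(\alpha)} \left(\frac{x - \xi}{\beta}\right)^{\alpha - 1} e^{-(x - \xi)/\beta}$
Figure: PDFs $f_G(x)$ versus x for the parameter pairs (4,1), (1,4), and (2,4).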
Some Commonly Used Distributions
• Log-Pearson Type 3 Distribution
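$f_{LP3}(x \mid \xi, \alpha, \beta) = \frac{1}{x\,|\beta|\,\Gamma(\alpha)} \left(\frac{\ln(x) - \xi}{\beta}\right)^{\alpha - 1} e^{-(\ln(x) - \xi)/\beta}$
with $\alpha > 0$, $x \ge e^{\xi}$ when $\beta > 0$ and with $\alpha > 0$, $x \le e^{\xi}$ when $\beta < 0$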
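The log-Pearson Type 3 density is the Pearson Type 3 density applied to ln(x) and divided by x. A minimal sketch of both follows, assuming the parameter order (ξ, α, β) and using |β| in the normalizing constant so that the density stays positive when β < 0. The function names are illustrative, and the density is returned as zero outside the support (including at its boundary).

```python
import math

def p3_pdf(x, xi, alpha, beta):
    """Pearson Type 3 PDF with location xi, shape alpha > 0, scale beta != 0."""
    y = (x - xi) / beta
    if y <= 0.0:          # outside the support (x < xi if beta > 0, x > xi if beta < 0)
        return 0.0
    return y ** (alpha - 1.0) * math.exp(-y) / (abs(beta) * math.gamma(alpha))

def lp3_pdf(x, xi, alpha, beta):
    """Log-Pearson Type 3 PDF: ln(X) follows a Pearson Type 3 distribution."""
    if x <= 0.0:
        return 0.0
    return p3_pdf(math.log(x), xi, alpha, beta) / x

# With alpha = 1, xi = 0, beta = 1 the Pearson Type 3 PDF reduces to exp(-x).
check_p3 = p3_pdf(1.0, 0.0, 1.0, 1.0)
check_lp3 = lp3_pdf(math.e, 0.0, 1.0, 1.0)
```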