Multiple Random Variables and Joint Distributions


• The conditional dependence between
random variables serves as a foundation for
time series analysis.
• When multiple random variables are related, they are described by their joint distribution and density functions:
$F(x, y) = P(X \le x,\, Y \le y)$
$f(x, y) = \frac{\partial^2 F}{\partial x\, \partial y}$
$P((x, y) \in A) = \iint_A f(x, y)\, dx\, dy$
[Figure: joint density surface f(x,y) over the x-y plane]
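For a concrete sense of the double-integral definition above, the sketch below numerically integrates an assumed bivariate normal joint density over a rectangular event region A; the particular distribution, grid, and region are illustrative assumptions rather than anything specified in the slides.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Hypothetical joint density f(x, y): bivariate normal with correlation 0.6
rv = multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, 0.6], [0.6, 1.0]])

# Grid over the plane (wide enough that the density is ~0 at the edges)
x = np.linspace(-5, 5, 501)
y = np.linspace(-5, 5, 501)
X, Y = np.meshgrid(x, y, indexing="ij")
f = rv.pdf(np.dstack((X, Y)))            # f(x, y) evaluated on the grid

# Event A = {X > 1, Y > 1}; P((X, Y) in A) = double integral of f over A
in_A = (X > 1.0) & (Y > 1.0)
dx, dy = x[1] - x[0], y[1] - y[0]
p_A = f[in_A].sum() * dx * dy
print(f"P(X > 1, Y > 1) ~= {p_A:.4f}")
```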
Conditional and Joint Probability
Definition
$P(D \cap E) = P(D \mid E)\, P(E) = P(E \mid D)\, P(D)$
Bayes Rule
$P(D \mid E) = \frac{P(D \cap E)}{P(E)} = \frac{P(E \mid D)\, P(D)}{P(E)}$
If D and E are independent
$P(D \mid E) = P(D) \iff P(D \cap E) = P(D)\, P(E)$
Partition of the domain into non-overlapping sets
$P(E) = P(D_1 \cap E) + P(D_2 \cap E) + P(D_3 \cap E)$
$P(E) = P(E \mid D_1)\, P(D_1) + P(E \mid D_2)\, P(D_2) + P(E \mid D_3)\, P(D_3)$
General form of Bayes Rule
$P(D_i \mid E) = \frac{P(E \mid D_i)\, P(D_i)}{P(E \mid D_1)\, P(D_1) + P(E \mid D_2)\, P(D_2) + P(E \mid D_3)\, P(D_3)}$
[Figure: Venn diagram of event E overlapping the partition D1, D2, D3]
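A minimal numerical sketch of the total-probability and Bayes Rule formulas above; the three-class prior and likelihood values are made up purely for illustration.

```python
# Hypothetical partition D1, D2, D3 (e.g. dry, normal, wet years) and event E
p_D = {"D1": 0.2, "D2": 0.5, "D3": 0.3}             # priors P(Di), sum to 1
p_E_given_D = {"D1": 0.05, "D2": 0.15, "D3": 0.60}  # likelihoods P(E | Di)

# Total probability: P(E) = sum over i of P(E | Di) * P(Di)
p_E = sum(p_E_given_D[d] * p_D[d] for d in p_D)

# Bayes Rule: P(Di | E) = P(E | Di) * P(Di) / P(E)
posterior = {d: p_E_given_D[d] * p_D[d] / p_E for d in p_D}
print(p_E, posterior)
```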
Conditional and joint density
functions
$P(D \mid E) = \frac{P(D \cap E)}{P(E)}$
Conditional density function
$f(y \mid x) = \frac{f(x, y)}{f(x)}$
Marginal density function
$P(E) = P(D_1 \cap E) + P(D_2 \cap E) + P(D_3 \cap E)$
$f(x) = \int_{y=-\infty}^{\infty} f(x, y)\, dy$
If X and Y are independent
$P(D \cap E) = P(D)\, P(E)$
$f(x, y) = f(x)\, f(y)$
Marginal Distribution
$f(y) = \int_{x=-\infty}^{\infty} f(x, y)\, dx$
[Figure: joint density surface f(x,y) illustrating the marginal distribution]
Conditional Distribution
$f(y \mid x) = \frac{f(x, y)}{f(x)}$
[Figure: joint density surface f(x,y) sliced at a fixed x to illustrate the conditional distribution]
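To make the marginal and conditional definitions concrete, the sketch below evaluates them on a gridded joint density; the bivariate normal, the grid, and the conditioning value x0 = 1.0 are illustrative assumptions.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Hypothetical joint density on a grid (bivariate normal, correlation 0.6)
x = np.linspace(-5, 5, 501)
y = np.linspace(-5, 5, 501)
X, Y = np.meshgrid(x, y, indexing="ij")
f_xy = multivariate_normal([0, 0], [[1, 0.6], [0.6, 1]]).pdf(np.dstack((X, Y)))

# Marginal density: f(x) = integral of f(x, y) dy
f_x = np.trapz(f_xy, y, axis=1)

# Conditional density: f(y | x0) = f(x0, y) / f(x0) at a chosen x0
i0 = np.argmin(np.abs(x - 1.0))          # grid index closest to x0 = 1.0
f_y_given_x0 = f_xy[i0, :] / f_x[i0]
print(np.trapz(f_y_given_x0, y))         # integrates to ~1, as a density should
```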
Expectation and moments of
multivariate random variables
Population
• Mean: $\mu_x = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x\, f(x, y)\, dx\, dy$
• Expectation operator: $E(g(X, Y)) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x, y)\, f(x, y)\, dx\, dy$
• Covariance: $\mathrm{Cov}(X, Y) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x - \mu_x)(y - \mu_y)\, f(x, y)\, dx\, dy = E([X - E(X)][Y - E(Y)]) = E(XY) - E(X)\, E(Y)$
• Correlation: $\rho = \frac{\mathrm{Cov}(X, Y)}{\sigma_x\, \sigma_y}$
Sample
• Mean: $\bar{X} = \frac{1}{N}\sum_{i=1}^{N} X_i$
• Expectation operator: $\hat{E}(g(X, Y)) = \frac{1}{N}\sum_{i=1}^{N} g(X_i, Y_i)$
• Covariance: $S_{XY} = \frac{1}{N-1}\sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y})$
• Correlation: $\hat{\rho} = \frac{S_{XY}}{S_X\, S_Y}$
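The sketch below computes the sample versions of these moments for a synthetic paired sample; the sample size, slope, and noise level are assumptions chosen only to illustrate the formulas.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic paired sample (X, Y) with built-in linear dependence (assumed)
N = 1000
X = rng.normal(size=N)
Y = 0.7 * X + rng.normal(scale=0.5, size=N)

# Sample mean, covariance, and correlation, mirroring the formulas above
X_bar, Y_bar = X.mean(), Y.mean()
S_XY = np.sum((X - X_bar) * (Y - Y_bar)) / (N - 1)
rho_hat = S_XY / (X.std(ddof=1) * Y.std(ddof=1))
print(S_XY, rho_hat, np.corrcoef(X, Y)[0, 1])   # last value is a cross-check
```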
Covariance and Correlation are measures of Linear Dependence
[Figure: scatter plot of Log(Alafia Flow (cfs)) versus MD-11 DP Water Level, R = 0.67]
[Figure: scatter plot of variables s and sc with an apparent nonlinear relationship, R = 0.007]
• Is there a relationship between these two variables plotted?
• Correlation, the linear measure of dependence, is 0.
• How to quantify that a relationship exists? Mutual Information.
Entropy
• Entropy is a measure of randomness. The more random a variable is, the more entropy it will have.
$H(X) = -E[\log(f(x))] = -\int_{-\infty}^{\infty} f(x)\, \log(f(x))\, dx$
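One way to put the entropy definition to work is a plug-in estimate $-\frac{1}{N}\sum_i \log \hat{f}(x_i)$ with a kernel density estimate of f(x); the standard normal sample and the Gaussian KDE below are assumptions for illustration, not a method prescribed in the slides.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(1)
x = rng.normal(size=2000)               # hypothetical sample from f(x)

# Plug-in entropy estimate: H(X) ~= -(1/N) sum_i log f_hat(x_i)
f_hat = gaussian_kde(x)                 # kernel density estimate of f(x)
H_hat = -np.mean(np.log(f_hat(x)))

# Cross-check against the exact entropy of a standard normal, 0.5*log(2*pi*e)
print(H_hat, 0.5 * np.log(2 * np.pi * np.e))
```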
Mutual Information
• Mutual information is a general information theoretic
measure of the dependence between two variables.
• The amount of information gained about X when Y is
learned, and vice versa.
• I(X,Y) = 0 if and only if X and Y are independent
$I(X, Y) = H(X) + H(Y) - H(X, Y)$
$= -E[\log(f(x))] - E[\log(f(y))] + E[\log(f(x, y))]$
$= E\left[\log \frac{f(x, y)}{f(x)\, f(y)}\right] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y)\, \log \frac{f(x, y)}{f(x)\, f(y)}\, dx\, dy$
Mutual Information Sample Statistic
$\hat{I}(X, Y) = \frac{1}{N}\sum_{i=1}^{N} \log \frac{\hat{f}(x_i, y_i)}{\hat{f}(x_i)\, \hat{f}(y_i)}$
• Requires a Monte Carlo procedure to determine significance (see later).
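A sketch of the sample statistic above using Gaussian kernel density estimates for $\hat{f}$; the quadratic test relationship (near-zero correlation but strong dependence) and the default KDE bandwidths are assumptions, and the Monte Carlo significance procedure mentioned above is not shown.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(2)

# Hypothetical nonlinearly dependent pair: Y = X^2 + noise (correlation ~ 0)
N = 1000
x = rng.uniform(-1, 1, N)
y = x**2 + rng.normal(scale=0.05, size=N)

# Kernel density estimates of the joint and marginal densities
f_xy = gaussian_kde(np.vstack((x, y)))
f_x = gaussian_kde(x)
f_y = gaussian_kde(y)

# Sample mutual information: (1/N) sum_i log[ f(xi, yi) / (f(xi) f(yi)) ]
I_hat = np.mean(np.log(f_xy(np.vstack((x, y))) / (f_x(x) * f_y(y))))
print("I_hat =", I_hat, " correlation =", np.corrcoef(x, y)[0, 1])
```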
The theoretical basis for time series
models
• A random process is a sequence of random
variables indexed in time
$X(t_1), X(t_2), X(t_3), X(t_4), \ldots$
$X_1, X_2, X_3, X_4, \ldots$
• A random process is fully described by defining
the (infinite) joint probability distribution of the
random process at all times
$F(X(t_1), X(t_2), X(t_3), \ldots) = P(X(t_1) \le x_1,\, X(t_2) \le x_2,\, X(t_3) \le x_3, \ldots)$
$f(x_1, x_2, x_3, \ldots) = \frac{\partial}{\partial x_1}\frac{\partial}{\partial x_2}\frac{\partial}{\partial x_3}\cdots\, F(x_1, x_2, x_3, \ldots)$
Random Processes
• A sequence of random variables indexed in time
• Infinite joint probability distribution
$X_1, X_2, X_3, X_4, \ldots$
$f(x_1, x_2, x_3, x_4, \ldots)$
$f(x_t \mid x_{t-1}, \ldots, x_{t-d}) = \frac{f(x_t, x_{t-1}, \ldots, x_{t-d})}{\int f(x_t, x_{t-1}, \ldots, x_{t-d})\, dx_t}$
$x_{t+1} = g(x_t, x_{t-1}, \ldots) + $ random innovation (errors or unknown random inputs)
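The sketch below simulates a sequence in exactly this form, with an assumed one-lag dependence function g and normal innovations; both choices are illustrative assumptions rather than anything specified in the slides.

```python
import numpy as np

rng = np.random.default_rng(3)

def g(x_prev):
    # Hypothetical one-lag dependence function (assumed for illustration)
    return 0.8 * x_prev - 0.2 * x_prev**2

# x_{t+1} = g(x_t) + W_t, with W_t a random innovation (here, normal noise)
T = 500
x = np.zeros(T)
for t in range(1, T):
    x[t] = g(x[t - 1]) + rng.normal(scale=0.3)
print(x[:5])
```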
Classification of Random Quantities
A time series constitutes a possible realization of a random
process completely described by the full (infinite) joint
probability distribution
Bras, R. L. and I. Rodriguez-Iturbe, (1985), Random Functions and Hydrology, Addison-Wesley, Reading,
MA, 559 p.
The infinite set of all possible
realizations is called the Ensemble.
Random process properties are formally
defined with respect to the ensemble.
First order marginal density function f(x(t)), from which the mean and variance can be evaluated:
$m(t) = \int_{-\infty}^{\infty} x(t)\, f(x(t))\, dx(t)$
$\sigma^2(t) = \int_{-\infty}^{\infty} [x(t) - m(t)]^2\, f(x(t))\, dx(t)$
Stationarity
A strictly stationary stochastic process $\{x_{t_1}, x_{t_2}, x_{t_3}, \ldots\}$ has the same joint distribution as the series $\{x_{t_1+h}, x_{t_2+h}, x_{t_3+h}, \ldots\}$ for any given value of h.
$f(X(t_1), X(t_2), \ldots, X(t_N)) \stackrel{d}{=} f(X(t_1+h), X(t_2+h), \ldots, X(t_N+h))$
This applies for all values of N, i.e. all orders of
joint distribution function
Stationarity of a specific order
• 1st Order. A random process is classified as first-order stationary if its first-order probability density function is unchanged by any shift of the time origin:
$f(x(t_1)) \stackrel{d}{=} f(x(t_1+h))$ for any value of h
• 2nd Order. A random process is classified as second-order stationary if its second-order probability density function is unchanged by any time shift applied to both time points:
$f(x(t_1), x(t_2)) \stackrel{d}{=} f(x(t_1+h), x(t_2+h))$ for any value of h
This means that the joint distribution is not a function of the absolute values of $t_1$ and $t_2$ but only a function of the lag $\tau = (t_2 - t_1)$.
First order stationarity
$f(x(t_1)) \stackrel{d}{=} f(x(t_2)) \quad \forall\, t_1, t_2$
Stationarity of moments
$m(t) = m$
$\sigma^2(t) = \sigma^2$
Second order density function
f(x(t1), x(t2))
Second order moments
$\mathrm{Cov}(X(t_1), X(t_2)) = \mathrm{Cov}(t_1, t_2) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x(t_1) - m(t_1))(x(t_2) - m(t_2))\, f(x(t_1), x(t_2))\, dx(t_1)\, dx(t_2)$
Correlation
$\rho(t_1, t_2) = \frac{\mathrm{Cov}(t_1, t_2)}{\sigma(t_1)\, \sigma(t_2)}$
Second order stationarity
f(x(t1), x(t2)) is not a function of the absolute values of t1 and t2 but only a function of the lag $\tau = (t_2 - t_1)$
Second moment stationarity
$m(t) = m$
$\sigma^2(t) = \sigma^2$
$\mathrm{Cov}(X(t_1), X(t_2)) = \mathrm{Cov}(t_2 - t_1) = \mathrm{Cov}(\tau)$
$\rho(t_1, t_2) = \rho(t_2 - t_1) = \rho(\tau)$
Stationarity of the moments (weak or
wide sense stationarity)
2nd Moment. A random process is classified as 2nd Moment
stationary if its first and second moments are not a
function of the specific time.
mean: $\mu(t) = \mu$
variance: $\sigma^2(t) = \sigma^2$
and:
covariance: $\mathrm{Cov}(X(t_1), X(t_2)) = \mathrm{Cov}(X(t_1+h), X(t_2+h))$
This means that the covariance is not a function of the absolute values of t1 and t2 but only a function of the lag $\tau = (t_2 - t_1)$.
- Subset of 2nd order stationarity
- For a Gaussian process, equivalent to 2nd order stationarity
Periodic Stationarity
In hydrology it is common to work with data subject to a seasonal cycle, i.e. data that is formally non-stationary but is stationary once the period is recognized.
Periodic variable $X_{y,m}$, with y = year and m = month
Periodic first order stationarity
$f(x_{y_1, m}) \stackrel{d}{=} f(x_{y_2, m}) \quad \forall\, y_1, y_2$ for each m
Periodic second moment stationarity
$\mathrm{Cov}(X_{y, m_1}, X_{y+\tau, m_2}) = \mathrm{Cov}(m_1, m_2, \tau)$
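A sketch of how periodic (month-by-month) statistics could be estimated from a year-by-month array X[y, m]; the synthetic seasonal series below is an assumption used only to illustrate the idea.

```python
import numpy as np

rng = np.random.default_rng(4)

# Synthetic monthly series X[y, m] with a seasonal cycle in mean and spread
n_years, n_months = 50, 12
month = np.arange(n_months)
seasonal_mean = 10 + 5 * np.sin(2 * np.pi * month / 12)
seasonal_sd = 1 + 0.5 * np.cos(2 * np.pi * month / 12)
X = seasonal_mean + seasonal_sd * rng.normal(size=(n_years, n_months))

# Periodic statistics: estimated separately for each month across years,
# stationary month-by-month even though the full series is non-stationary
month_mean = X.mean(axis=0)                      # mu_m for each month m
month_sd = X.std(axis=0, ddof=1)                 # sigma_m for each month m
adj_month_cov = [np.cov(X[:, m], X[:, m + 1])[0, 1] for m in range(n_months - 1)]
print(month_mean.round(2), month_sd.round(2))
```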
Ergodicity
• Definitions given are with respect to the ensemble
• It is often possible to observe only one realization
• How can statistics be estimated from one realization?
• The ergodicity assumption for stationary processes asserts that averaging over the ensemble is equivalent to averaging over a realization
$m = \int_{-\infty}^{\infty} x\, f(x)\, dx = \lim_{T \to \infty} \frac{1}{T}\int_0^T x(t)\, dt$
$\sigma^2 = \int_{-\infty}^{\infty} [x - m]^2\, f(x)\, dx = \lim_{T \to \infty} \frac{1}{T}\int_0^T [x(t) - m]^2\, dt$
$\mathrm{Cov}(\tau) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x_1 - m)(x_2 - m)\, f(x_1, x_2, \tau)\, dx_1\, dx_2 = \lim_{T \to \infty} \frac{1}{T}\int_0^T (x(t+\tau) - m)(x(t) - m)\, dt$
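A sketch of the ergodicity idea: time averages over one long realization standing in for ensemble averages; the AR(1) process, its coefficient, and the record length are illustrative assumptions with known theoretical moments for comparison.

```python
import numpy as np

rng = np.random.default_rng(5)

# One long realization of a stationary AR(1) process (assumed for illustration)
phi, T = 0.7, 100_000
x = np.zeros(T)
for t in range(1, T):
    x[t] = phi * x[t - 1] + rng.normal()

# Time-average estimates of the mean, variance, and lag-tau covariance
tau = 3
m_hat = x.mean()
var_hat = np.mean((x - m_hat) ** 2)
cov_hat = np.mean((x[tau:] - m_hat) * (x[:-tau] - m_hat))

# Ensemble (theoretical) values for this AR(1): var = 1/(1 - phi^2), cov = phi^tau * var
var_true = 1.0 / (1.0 - phi**2)
print(m_hat, var_hat, var_true, cov_hat, phi**tau * var_true)
```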
Discrete representation
• A continuous random process can only be
observed at discrete intervals over a finite
domain
$Z(t) \rightarrow Z_t, \quad t = 1, 2, 3, \ldots$
• Zt may be averages from t-1 to t (Rainfall)
or instantaneous measurements at t
(Streamflow)
Markov Property
• The infinite joint PDF construct is not practical.
• A process is Markov order d if the joint PDF
characterizing the dependence structure is of dimension
no more than d+1.
Joint Distribution
$f(X_t, X_{t-1}, \ldots, X_{t-d})$
Conditional Distribution
$f(X_t \mid X_{t-1}, \ldots, X_{t-d}) = \frac{f(X_t, X_{t-1}, \ldots, X_{t-d})}{\int f(X_t, X_{t-1}, \ldots, X_{t-d})\, dX_t}$
Assumption of the Markov property is the basis for
simulation of time series as sequences of later values
conditioned on earlier values
Linear approach to time series
modeling
e.g. $X_t = \phi X_{t-1} + W_t$
AR1 (see the sketch after this list)
• Model structure and parameters identified to
match second moment properties
• Skewness accommodated using
– Skewed residuals
– Normalizing transformation (e.g. log, Box Cox)
• Seasonality through seasonally varying
parameters
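A minimal sketch of the AR(1) workflow outlined above: a normalizing log transform, a parameter matched to the lag-1 second moment, and simulation of a new sequence; the synthetic skewed series and the simple moment estimator are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(6)

# Synthetic skewed "streamflow-like" record: lognormal with AR(1) log values
z = np.zeros(500)
for t in range(1, 500):
    z[t] = 0.6 * z[t - 1] + rng.normal(scale=0.4)
obs = np.exp(z + 3)

# Normalizing transformation (log), then center
x = np.log(obs)
x = x - x.mean()

# Identify the AR(1) parameter from second moment properties (lag-1 moment)
phi_hat = np.sum(x[1:] * x[:-1]) / np.sum(x[:-1] ** 2)
sigma_w = np.std(x[1:] - phi_hat * x[:-1], ddof=1)

# Simulate a new sequence X_t = phi * X_{t-1} + W_t and back-transform
sim = np.zeros(len(x))
for t in range(1, len(sim)):
    sim[t] = phi_hat * sim[t - 1] + rng.normal(scale=sigma_w)
sim_flow = np.exp(sim + np.log(obs).mean())
print(phi_hat, sigma_w)
```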
Nonparametric/Nonlinear approach
to time series modeling
e.g. Multivariate nonparametric $\hat{f}(X_t, X_{t-1})$ estimated directly from data, then used to obtain $\hat{f}(X_t \mid X_{t-1})$ (NP1)
• 2nd Moments and Skewness inherited by distribution
• Seasonality through separate distribution for each season
Other variants
f̂ (X t | X t 1) Estimated directly using nearest neighbor
method KNN
f̂ (Xt | Xt 1)  LP(Xt 1)  Vt
Local polynomial trend
function plus residual
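A sketch of the nearest-neighbor (KNN) idea for sampling from $\hat{f}(X_t \mid X_{t-1})$ by resampling the successors of the k nearest neighbors of the current value; the record, the value of k, and the rank-based weights are assumptions, and published KNN resamplers differ in their weighting details.

```python
import numpy as np

rng = np.random.default_rng(7)

def knn_conditional_sample(record, x_cond, k=10):
    """Draw X_t given X_{t-1} = x_cond by resampling the successors of the
    k nearest neighbors of x_cond in the observed record."""
    x_prev, x_next = record[:-1], record[1:]
    idx = np.argsort(np.abs(x_prev - x_cond))[:k]   # k nearest predecessors
    w = 1.0 / np.arange(1, k + 1)                   # simple rank-based weights
    return rng.choice(x_next[idx], p=w / w.sum())

# Hypothetical observed record and a simulated continuation of length 100
record = rng.gamma(shape=2.0, scale=1.0, size=500)  # skewed, made-up data
sim = [record[-1]]
for _ in range(100):
    sim.append(knn_conditional_sample(record, sim[-1]))
print(np.mean(sim), np.mean(record))
```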