Transcript Estimation

Estimation
Method of Moments (MM)
Method of Moments estimation is a general method where equations for
estimating parameters are found by equating population moments with the
corresponding sample moments:

$\hat\mu = \bar Y$
$\hat\sigma^2 = s^2$
$\widehat{E(Y^3)} = \frac{1}{n}\sum_{i=1}^{n} Y_i^3$
etc.
The trivial MM estimates are the estimates of the population mean ($\mu$) and the
population variance ($\sigma^2$).
The benefit of the method is that the equations make it possible to estimate
other parameters as well.
Mixed moments
Moments can be raw (e.g. the mean) or central (e.g. the variance).
There are also mixed moments like the covariance and the correlation (which are
also central).
MM-estimation of parameters in ARMA-models is made by equating the
autocorrelation function with the sample autocorrelation function for a sufficient
number of lags.
For AR-models: Replace $\rho_k$ by $r_k$ in the Yule-Walker equations.
For MA-models: Use the developed relationships between $\rho_k$ and the parameters
$\theta_1, \dots, \theta_q$ and replace $\rho_k$ by $r_k$ in these.
$\Rightarrow$ Leads quickly to complicated equations with no unique solution.
Mixed ARMA: As complicated as the MA-case
Example of formulas
AR(1): $\rho_k = \phi^k \;\Rightarrow\; \hat\phi = r_1$

AR(2): $\gamma_k = \phi_1\,\gamma_{k-1} + \phi_2\,\gamma_{k-2} \;\Rightarrow\; \rho_k = \phi_1\,\rho_{k-1} + \phi_2\,\rho_{k-2}$

Set $r_k = \hat\phi_1\, r_{k-1} + \hat\phi_2\, r_{k-2}$ for $k = 1, 2$:

$$\hat\phi_1 = \frac{r_1\,(1 - r_2)}{1 - r_1^2} \qquad\text{and}\qquad \hat\phi_2 = \frac{r_2 - r_1^2}{1 - r_1^2}$$
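As an illustration, a minimal R sketch of these AR(2) moment equations; the series y is an assumption (any observed series in the workspace):

# Hedged sketch: method-of-moments (Yule-Walker) estimates for an AR(2),
# computed from the first two sample autocorrelations of an assumed series y.
r  <- acf(y, lag.max = 2, plot = FALSE)$acf[-1]   # r[1] = r_1, r[2] = r_2
r1 <- r[1]; r2 <- r[2]
phi1_mm <- r1 * (1 - r2) / (1 - r1^2)
phi2_mm <- (r2 - r1^2) / (1 - r1^2)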
MA(1): $\gamma_0 = \sigma_e^2\,(1 + \theta^2)$ ;  $\gamma_1 = -\theta\,\sigma_e^2$

$$\Rightarrow\; \rho_1 = \frac{-\theta}{1 + \theta^2}$$

Set $r_1 = \dfrac{-\hat\theta}{1 + \hat\theta^2}$
$\;\Rightarrow\; \hat\theta^2 + \dfrac{1}{r_1}\,\hat\theta + 1 = 0$ with solutions

$$\hat\theta = -\frac{1}{2r_1} + \sqrt{\frac{1}{4r_1^2} - 1} \qquad\text{and}\qquad \hat\theta = -\frac{1}{2r_1} - \sqrt{\frac{1}{4r_1^2} - 1}$$

If $|r_1| < 0.5$ $\Rightarrow$ real-valued roots (necessary!)
Only one solution at a time gives an invertible MA-process.
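A corresponding R sketch for the MA(1) case, again assuming a series y; the invertibility condition is used to pick one of the two roots:

# Hedged sketch: MA(1) method-of-moments estimate; only meaningful when |r_1| < 0.5.
r1 <- acf(y, lag.max = 1, plot = FALSE)$acf[2]
roots <- -1 / (2 * r1) + c(1, -1) * sqrt(1 / (4 * r1^2) - 1)
theta_mm <- roots[abs(roots) < 1]   # keep the root that gives an invertible process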
The parameter $\sigma_e^2$: Set $\gamma_0 = s^2$

$$\hat\sigma_e^2(\mathrm{MM}) = \bigl(1 - \hat\phi_1(\mathrm{MM})\, r_1 - \dots - \hat\phi_p(\mathrm{MM})\, r_p\bigr)\, s^2 \quad \text{for AR}(p)$$

$$\hat\sigma_e^2(\mathrm{MM}) = \frac{s^2}{1 + \hat\theta_1(\mathrm{MM})^2 + \dots + \hat\theta_q(\mathrm{MM})^2} \quad \text{for MA}(q)$$

$$\hat\sigma_e^2(\mathrm{MM}) = \frac{1 - \hat\phi_1(\mathrm{MM})^2}{1 - 2\,\hat\phi_1(\mathrm{MM})\,\hat\theta_1(\mathrm{MM}) + \hat\theta_1(\mathrm{MM})^2}\; s^2 \quad \text{for ARMA}(1,1)$$
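Continuing the hypothetical AR(2) sketch above, the AR(p) formula for the noise variance could be coded as:

# Hedged sketch: MM estimate of sigma_e^2 for the AR(2) example (p = 2),
# reusing r1, r2, phi1_mm and phi2_mm from the earlier sketch.
s2 <- var(y)                                   # sample variance, estimate of gamma_0
sigma2_mm <- (1 - phi1_mm * r1 - phi2_mm * r2) * s2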
Example
Simulated from the model $Y_t = 1.3 + 0.2\,Y_{t-1} + e_t$ ;  $\sigma_e^2 = 4$

> ar(yar1, method="yw")        # "yw" = Yule-Walker (leads to MM-estimates)

Call:
ar(x = yar1, method = "yw")

Coefficients:
     1
0.2439

Order selected 1   sigma^2 estimated as 4.185
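A minimal R sketch of how such an example could be reproduced; the seed and series length are assumptions, so the printed estimates will differ somewhat from those above:

# Hedged sketch: simulate an AR(1) with phi = 0.2, sigma_e^2 = 4 and mean 1.3/(1 - 0.2),
# then estimate it with the Yule-Walker (MM) method.
set.seed(1)
yar1 <- arima.sim(model = list(ar = 0.2), n = 200, sd = 2) + 1.3 / (1 - 0.2)
fit  <- ar(yar1, method = "yw")
fit$ar        # MM estimate of phi
fit$var.pred  # MM-type estimate of sigma_e^2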
Least-squares estimation
Ordinary least-squares
Find the parameter values $p_1, \dots, p_m$ that minimise the square sum
$$S(p_1,\dots,p_m) = \sum_{i=1}^{n} \bigl(Y_i - E(Y_i \mid \mathbf{X}, p_1,\dots,p_m)\bigr)^2$$
where $\mathbf{X}$ stands for an array of auxiliary variables that are used as predictors
for $Y$.
Autoregressive models
$$Y_t - \mu = \phi_1\,(Y_{t-1} - \mu) + \dots + \phi_p\,(Y_{t-p} - \mu) + e_t$$
Here, we take into account the possibility of a mean different from zero.
The counterpart of $S(p_1, \dots, p_m)$ is
$$S_c(\phi_1,\dots,\phi_p,\mu) = \sum_{t=p+1}^{n} \bigl[(Y_t - \mu) - \phi_1\,(Y_{t-1} - \mu) - \dots - \phi_p\,(Y_{t-p} - \mu)\bigr]^2$$
Now, the estimation can be made in two steps:
1) Estimate $\mu$ by $\bar Y$
2) Find the values of $\phi_1, \dots, \phi_p$ that minimise
$$S_c(\phi_1,\dots,\phi_p,\bar Y) = \sum_{t=p+1}^{n} \bigl[(Y_t - \bar Y) - \phi_1\,(Y_{t-1} - \bar Y) - \dots - \phi_p\,(Y_{t-p} - \bar Y)\bigr]^2$$
The estimation of the slope parameters thus becomes conditional on the
estimation of the mean.
The square sum $S_c$ is therefore referred to as the conditional sum-of-squares
function.
The resulting estimates become very close to the MM-estimates for moderately
long series.
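A minimal R sketch of the two-step procedure for an AR(1); the series y is an assumption, step 1 uses the sample mean and step 2 minimises $S_c$ numerically:

# Hedged sketch: conditional least squares for an AR(1).
yc <- y - mean(y)                  # step 1: estimate mu by the sample mean
n  <- length(yc)
Sc <- function(phi) sum((yc[2:n] - phi * yc[1:(n - 1)])^2)
phi_cls <- optimize(Sc, interval = c(-1, 1))$minimum   # step 2: minimise S_c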
Moving average models
More tricky, since each observed value is assumed to depend on unobservable
white-noise terms (and a mean):
$$Y_t = \mu + e_t - \theta_1\,e_{t-1} - \dots - \theta_q\,e_{t-q}$$
As for the AR-case, first estimate the mean and then estimate the slope
parameters conditionally on the estimated mean, i.e.
substitute $Y_t' = Y_t - \bar Y$ for $Y_t - \mu$ in the model.
For an invertible MA-process we may write
$$Y_t' = \pi_1(\theta_1,\dots,\theta_q)\,Y_{t-1}' + \pi_2(\theta_1,\dots,\theta_q)\,Y_{t-2}' + \dots + e_t$$
where the $\pi_i$'s are known functions of $\theta_1, \dots, \theta_q$.
The square sum to be minimized is then generally
$$S_c(\theta_1,\dots,\theta_q) = \sum_t e_t^2 = \sum_t \bigl[Y_t' - \pi_1(\theta_1,\dots,\theta_q)\,Y_{t-1}' - \pi_2(\theta_1,\dots,\theta_q)\,Y_{t-2}' - \dots\bigr]^2$$
Problems:
• The representation is infinite, but we only have a finite number of
observed values
• $S_c$ is a nonlinear function of the parameters $\theta_1, \dots, \theta_q$ $\Rightarrow$ a numerical
solution is needed

Compute $e_t$ recursively using the observed values $Y_1, \dots, Y_n$ and setting
$e_0 = e_{-1} = \dots = e_{-q} = 0$:
$$e_t = Y_t + \theta_1\,e_{t-1} + \dots + \theta_q\,e_{t-q}$$
for a certain set of values $\theta_1, \dots, \theta_q$.
Numerical algorithms are used to find the set that minimizes $\sum_{t=1}^{n} e_t^2$.
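A minimal R sketch for the MA(1) case, assuming a series y; $e_t$ is built recursively with $e_0 = 0$ and the sum of squares is minimised numerically:

# Hedged sketch: conditional least squares for an MA(1) via the recursion
# e_t = Y_t + theta * e_{t-1}, starting from e_0 = 0.
yc <- y - mean(y)
Sc <- function(theta) {
  e <- numeric(length(yc))
  e[1] <- yc[1]
  for (t in 2:length(yc)) e[t] <- yc[t] + theta * e[t - 1]
  sum(e^2)
}
theta_cls <- optimize(Sc, interval = c(-0.99, 0.99))$minimum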
Mixed Autoregressive and Moving average models
Least-squares estimation is applied analogously to pure MA-models.
The $e_t$-values are calculated recursively, setting $e_p = e_{p-1} = \dots = e_{p+1-q} = 0$.
Least-squares generally works well for long series.
For moderately long series, initializing the $e$-values to zero may have too
much influence on the estimates.
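In practice, base R's arima() performs this conditional (sum-of-squares) estimation when method = "CSS"; a minimal sketch for an assumed ARMA(1,1) and series y:

# Hedged sketch: conditional least-squares fit of an ARMA(1,1).
fit_css <- arima(y, order = c(1, 0, 1), method = "CSS")
fit_css$coef    # phi, theta and mean estimates
fit_css$sigma2  # estimate of sigma_e^2

Note that arima() writes the MA part with a plus sign, so its reported theta has the opposite sign to the convention used here.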
Maximum-Likelihood-estimation (MLE)
For a set of observations $y_1, \dots, y_n$ the likelihood function (of the parameters) is a
function proportional to the joint density (or probability mass) function of the
corresponding random variables $Y_1, \dots, Y_n$, evaluated at those observations:
$$L(p_1,\dots,p_m) = f_{Y_1,\dots,Y_n}(y_1,\dots,y_n \mid p_1,\dots,p_m)$$
For a time series such a function is not the product of the marginal
densities/probability mass functions.
We must assume a probability distribution for the random variables.
For time series it is common to assume that the white noise is normally distributed,
i.e.
$$e_t \sim N(0, \sigma_e^2) \qquad\Rightarrow\qquad \begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{pmatrix} \sim \mathrm{MVN}\bigl(\mathbf{0}\,;\ \sigma_e^2\,\mathbf{I}\bigr)$$
with known joint density function
$$f_e(e_1,\dots,e_n) = \bigl(2\pi\sigma_e^2\bigr)^{-n/2} \exp\!\left(-\frac{1}{2\sigma_e^2}\sum_{t=1}^{n} e_t^2\right)$$
For the AR(1) case we can use that the model defines a linear transformation
that forms $Y_2, \dots, Y_n$ from $Y_1, \dots, Y_{n-1}$ and $e_2, \dots, e_n$:
$$\begin{pmatrix} Y_2 - \mu \\ Y_3 - \mu \\ \vdots \\ Y_n - \mu \end{pmatrix} = \phi \begin{pmatrix} Y_1 - \mu \\ Y_2 - \mu \\ \vdots \\ Y_{n-1} - \mu \end{pmatrix} + \begin{pmatrix} e_2 \\ e_3 \\ \vdots \\ e_n \end{pmatrix}$$
This transformation has Jacobian = 1, which simplifies the derivation of the
joint density for $Y_2, \dots, Y_n$ given $Y_1$ to
$$f_{Y_2,\dots,Y_n \mid Y_1 = y_1}(y_2,\dots,y_n \mid y_1) = \bigl(2\pi\sigma_e^2\bigr)^{-(n-1)/2} \exp\!\left(-\frac{1}{2\sigma_e^2}\sum_{t=2}^{n} \bigl[(y_t - \mu) - \phi\,(y_{t-1} - \mu)\bigr]^2\right)$$
Now $Y_1$ should be normally distributed with mean $\mu$ and variance $\sigma_e^2/(1-\phi^2)$,
according to the derived properties and the assumption of normally distributed $e$.
Hence the likelihood function becomes
$$L(\phi,\mu,\sigma_e^2) = f_{Y_2,\dots,Y_n \mid Y_1 = y_1}(y_2,\dots,y_n \mid y_1)\cdot f_{Y_1}(y_1)$$
$$= \bigl(2\pi\sigma_e^2\bigr)^{-(n-1)/2} \exp\!\left(-\frac{1}{2\sigma_e^2}\sum_{t=2}^{n} \bigl[(y_t - \mu) - \phi\,(y_{t-1} - \mu)\bigr]^2\right)\cdot \left(\frac{2\pi\sigma_e^2}{1-\phi^2}\right)^{-1/2} \exp\!\left(-\frac{(1-\phi^2)\,(y_1 - \mu)^2}{2\sigma_e^2}\right)$$
$$= \bigl(2\pi\sigma_e^2\bigr)^{-n/2}\,\bigl(1-\phi^2\bigr)^{1/2} \exp\!\left(-\frac{1}{2\sigma_e^2}\,S(\phi,\mu)\right)$$
where
$$S(\phi,\mu) = \sum_{t=2}^{n} \bigl[(Y_t - \mu) - \phi\,(Y_{t-1} - \mu)\bigr]^2 + (1-\phi^2)\,(Y_1 - \mu)^2$$
and the MLEs of the parameters $\phi$, $\mu$ and $\sigma_e^2$ are found as the values that
maximise $L$.

Compromise between MLE and Conditional least-squares:
Unconditional least-squares estimates of $\phi$ and $\mu$ are found by minimising
$$S(\phi,\mu) = \sum_{t=2}^{n} \bigl[(Y_t - \mu) - \phi\,(Y_{t-1} - \mu)\bigr]^2 + (1-\phi^2)\,(Y_1 - \mu)^2$$
The likelihood function can be set up for any ARMA-model; however, it is
more involved for models more complex than AR(1).
The estimation needs (with a few exceptions) to be carried out numerically.
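A minimal R sketch of that numerical maximisation for the AR(1) likelihood above; the series y is an assumption, and the parameters are taken as (phi, mu, log sigma_e^2) so that the variance stays positive:

# Hedged sketch: maximise the exact AR(1) likelihood numerically.
negloglik <- function(par) {
  phi <- par[1]; mu <- par[2]; s2 <- exp(par[3]); n <- length(y)
  if (abs(phi) >= 1) return(Inf)                  # keep the process stationary
  S <- sum(((y[-1] - mu) - phi * (y[-n] - mu))^2) + (1 - phi^2) * (y[1] - mu)^2
  0.5 * (n * log(2 * pi * s2) - log(1 - phi^2) + S / s2)
}
mle <- optim(c(0, mean(y), log(var(y))), negloglik)
# arima(y, order = c(1, 0, 0), method = "ML") performs the same kind of estimation.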
Properties of the estimates
Maximum-Likelihood estimators have a well-established asymptotic theory:
$\hat\theta_{\mathrm{MLE}}$ is asymptotically unbiased and $\sim N\bigl(\theta,\ I^{-1}(\theta)\bigr)$
where $I(\theta)$ is the Fisher information $E\!\left[\left(\frac{\partial \log L}{\partial \theta}\right)^{2}\right]$.
Hence, by deriving large-sample expressions for the variances of the point
estimates, these can be used to make inference about the parameters (tests and
confidence intervals).
See the textbook.
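As an illustration, a minimal R sketch of such a large-sample confidence interval, assuming an AR(1) has been fitted by ML to a series y:

# Hedged sketch: approximate 95% confidence interval for phi from the
# large-sample standard error reported by arima().
fit <- arima(y, order = c(1, 0, 0), method = "ML")
se  <- sqrt(diag(fit$var.coef))["ar1"]
coef(fit)["ar1"] + c(-1, 1) * 1.96 * se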
Model diagnostics
Upon estimation of a model, its residuals should be checked as usual.
The residuals $\hat e_t = Y_t - \hat Y_t$ should be plotted in order to check for
• constant variance (plot them against predicted values)
• normality (Q-Q plots)
• substantial residual autocorrelation (SAC and SPAC plots)
A short sketch of these checks in R follows below.
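A minimal sketch, assuming a fitted arima() object fit and the original series y (both hypothetical here):

# Hedged sketch: the three residual checks above.
res  <- residuals(fit)
pred <- y - res                  # predicted values
plot(pred, res)                  # constant variance: residuals vs predicted values
qqnorm(res); qqline(res)         # normality
acf(res); pacf(res)              # residual SAC and SPAC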
Ljung-Box test statistic
Let $\hat r_k$ = SAC at lag $k$ for the obtained residuals $\{\hat e_t\}$ and define
$$Q_{*,K} = n\,(n+2)\sum_{j=1}^{K} \frac{\hat r_j^2}{n-j}$$
If the correct ARMA(p,q)-model is estimated, then $Q_{*,K}$ approximately follows a
Chi-square distribution with $K - p - q$ degrees of freedom.
Hence, excessive values of this statistic indicate that the model has been
erroneously specified.
The value of $K$ should be chosen large enough to cover what can be expected to
be a set of autocorrelations that are unusually high if the model is wrong.
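A minimal R sketch of the test, assuming an ARMA(1,1) fit (so fitdf = p + q = 2) and an arbitrary choice of K = 20:

# Hedged sketch: Ljung-Box test on the residuals of an assumed ARMA(1,1) fit;
# fitdf adjusts the degrees of freedom to K - p - q.
fit <- arima(y, order = c(1, 0, 1))
Box.test(residuals(fit), lag = 20, type = "Ljung-Box", fitdf = 2)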