ANN Regression


Gap filling using a Bayesian-regularized neural network
B.H. Braswell
University of New Hampshire
Proper Credit
MacKay DJC (1992) A practical Bayesian framework for backpropagation networks. Neural Computation, 4, 448-472.
Bishop C (1995) Neural Networks for Pattern Recognition. New York: Oxford University Press.
Nabney I (2002) NETLAB: Algorithms for Pattern Recognition. Advances in Pattern Recognition. New York: Springer-Verlag.
Two-layer ANN is a nonlinear regression
$$\hat{y}_k(\mathbf{x}) = f\!\left(\sum_{j=0}^{M} w_{kj}^{(2)}\, \tilde{f}\!\left(\sum_{i=0}^{d} w_{ji}^{(1)} x_i\right)\right)$$

where the outer function $f$ is usually linear and the inner function $\tilde{f}$ is usually nonlinear (e.g., tanh()).
Neural networks are efficient with respect to the number of estimated parameters. Consider a problem with d input variables:
- Polynomial of order M: $N_p \sim d^M$
- Neural net with M hidden nodes: $N_p \sim d \cdot M$

For example, with d = 10 inputs and M = 3, a polynomial needs on the order of $10^3 = 1000$ coefficients, while the network needs on the order of $10 \cdot 3 = 30$ weights.
Avoiding the problem of overfitting:
- Early stopping
- Regularization
- Bayesian methods
Artificial neural networks
An artificial neural network (ANN) is a functional mapping of a vector x containing d inputs into a vector y containing c outputs. An ANN consists of layers, each having M nodes. A node is a linear transformation of inputs followed by application of a prescribed function. The outputs of all nodes in a layer are collected into a new vector and input into the next layer. For a two-layer network,
$$\hat{y}_k(\mathbf{x}) = f\!\left(\sum_{j=0}^{M} w_{kj}^{(2)}\, \tilde{f}\!\left(\sum_{i=0}^{d} w_{ji}^{(1)} x_i\right)\right), \qquad (1)$$
where $f$ is a function, typically $f(a) = a$, and $\tilde{f}$ is a different function that is usually nonlinear (e.g., $\tanh(a)$). The two matrices $\mathbf{w}^{(1)}$ and $\mathbf{w}^{(2)}$ represent the free parameters in the regression and include bias terms for $j = 0$ and $i = 0$.
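As an illustration only (the author's implementation uses NETLAB; all names here are hypothetical), a minimal NumPy sketch of the two-layer forward pass in Equation 1:

```python
import numpy as np

def forward(x, W1, W2):
    """Two-layer network of Equation 1: linear output f(a) = a,
    tanh hidden activation f~(a) = tanh(a).

    x  : (d,) input vector
    W1 : (M, d+1) hidden-layer weights; column 0 holds the biases (i = 0)
    W2 : (c, M+1) output-layer weights; column 0 holds the biases (j = 0)
    """
    x0 = np.concatenate(([1.0], x))   # prepend x_0 = 1 for the bias term
    z = np.tanh(W1 @ x0)              # hidden-node outputs
    z0 = np.concatenate(([1.0], z))   # prepend z_0 = 1 for the bias term
    return W2 @ z0                    # linear output layer: y-hat
```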

For parameter estimation, the standard backpropagation algorithm is typically used. This method updates the weights and biases $\mathbf{w}$ for the $N$ pairs of observed data vectors $\mathbf{x}$ and $\mathbf{y}$, in order to minimize the error:
$$E(\mathbf{w}) = \sum_{n=1}^{N} \sum_{k=1}^{c} \left[ \hat{y}_k^{(n)}(\mathbf{w}) - y_k^{(n)} \right]^2, \qquad (2)$$

by estimating the derivative of $E(\mathbf{w})$ with respect to $\mathbf{w}$.
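For illustration (a real implementation such as NETLAB computes this gradient analytically by backpropagation), here is a sketch of Equation 2 and a finite-difference check of its derivative with respect to the hidden-layer weights, reusing the hypothetical forward() above:

```python
def sse_error(W1, W2, X, Y):
    """Sum-of-squares error of Equation 2 over all N (x, y) pairs."""
    return sum(np.sum((forward(x, W1, W2) - y) ** 2) for x, y in zip(X, Y))

def grad_fd(W1, W2, X, Y, eps=1e-6):
    """Finite-difference estimate of dE/dW1 (shown for W1 only, for brevity;
    analytic backpropagation gives the same derivatives far more cheaply)."""
    g = np.zeros_like(W1)
    for idx in np.ndindex(W1.shape):
        dW = np.zeros_like(W1)
        dW[idx] = eps
        g[idx] = (sse_error(W1 + dW, W2, X, Y)
                  - sse_error(W1 - dW, W2, X, Y)) / (2 * eps)
    return g
```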
Gaussian approximation to the posterior distribution
To estimate the uncertainty of the predictions, we use a Gaussian approximation to the posterior distribution for the weights and perform a Taylor series expansion around the most probable values $\mathbf{w}_{MP}$,

$$S(\mathbf{w}) \approx S(\mathbf{w}_{MP}) + \frac{1}{2} (\mathbf{w} - \mathbf{w}_{MP})^T \mathbf{A} (\mathbf{w} - \mathbf{w}_{MP}), \qquad (6)$$
where $\mathbf{A}$ is given by

$$\mathbf{A} = \nabla\nabla S \big|_{\mathbf{w}_{MP}} = \beta\, \nabla\nabla E_D \big|_{\mathbf{w}_{MP}} + \alpha \mathbf{I}. \qquad (7)$$
This is the Hessian matrix of the error function (Equation 5), and its elements can be calculated numerically during the backpropagation. The expansion in Equation 6 allows us to rewrite the posterior distribution for the weights as:
$$p(\mathbf{w} \mid D) = \frac{1}{Z_S^*} \exp\!\left( -S(\mathbf{w}_{MP}) - \frac{1}{2}\, \Delta\mathbf{w}^T \mathbf{A}\, \Delta\mathbf{w} \right), \qquad (8)$$

where $\Delta\mathbf{w} = \mathbf{w} - \mathbf{w}_{MP}$, and $Z_S^*$ is given by

$$Z_S^*(\alpha, \beta) = (2\pi)^{W/2}\, |\mathbf{A}|^{-1/2} \exp\!\left( -S(\mathbf{w}_{MP}) \right). \qquad (9)$$
Posterior distribution of outputs
The distribution of network outputs (and thus the uncertainty of the prediction) is estimated by
assuming that the width of the posterior distribution is not extremely broad, and so the predictions
are expanded as
$$\hat{y}(\mathbf{w}) \approx \hat{y}(\mathbf{w}_{MP}) + \mathbf{g}^T (\mathbf{w} - \mathbf{w}_{MP}), \qquad (10)$$

where

$$\mathbf{g} = \nabla_{\mathbf{w}} \hat{y} \,\big|_{\mathbf{w}_{MP}}. \qquad (11)$$
The posterior distribution of the predictions is given by

$$p(y \mid D) = \int p(y \mid \mathbf{w})\, p(\mathbf{w} \mid D)\, d\mathbf{w}. \qquad (12)$$

In the Gaussian approximation for the posterior of the weights (Equation 8), and assuming zero-mean Gaussian noise, this becomes

$$p(y \mid D) \propto \int \exp\!\left( -\frac{\beta}{2} \left[ y - \hat{y}(\mathbf{w}) \right]^2 \right) \exp\!\left( -\frac{1}{2}\, \Delta\mathbf{w}^T \mathbf{A}\, \Delta\mathbf{w} \right) d\mathbf{w}. \qquad (13)$$
Thus, substituting Equation 10, and given Equations 7 and 9, the posterior distribution for the outputs $p(y \mid \mathbf{x}, D)$ is normal, with variance

$$\sigma_y^2 = \frac{1}{\beta} + \mathbf{g}^T \mathbf{A}^{-1} \mathbf{g}. \qquad (14)$$
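As a minimal sketch of Equation 14 (not the author's NETLAB code; all names are hypothetical), turning the Hessian A (Equation 7), the output gradient g (Equation 11), and the noise precision β into a predictive error bar:

```python
import numpy as np

def predictive_variance(A, g, beta):
    """Equation 14: sigma_y^2 = 1/beta + g^T A^{-1} g.

    A    : (W, W) Hessian of S at w_MP (Equation 7)
    g    : (W,) gradient of the network output w.r.t. the weights at w_MP
    beta : noise precision (inverse noise variance)
    """
    # Solve A v = g rather than forming A^{-1} explicitly (better conditioned)
    v = np.linalg.solve(A, g)
    return 1.0 / beta + g @ v
```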
Previous Work
Hagen SC, Braswell BH, Frolking S, Richardson A, Hollinger D, Linder E (2006) Statistical uncertainty of eddy flux based estimates of gross ecosystem carbon exchange at Howland Forest, Maine. Journal of Geophysical Research, 111.
Braswell BH, Hagen SC, Frolking SE, Salas WE (2003) A multivariable approach for mapping subpixel land cover distributions using MISR and MODIS: An application in the Brazilian Amazon. Remote Sensing of Environment, 87, 243-256.
ANN Regression for Land Cover Estimation

[Figure: network schematic mapping four spectral bands (Band1-Band4) to three cover fractions (Forest Fraction, Secondary Fraction, Cleared Fraction); training data supplied by classified ETM imagery]
ANN Regression for Land Cover Estimation

[Figure: scatter plots of MISR-predicted vs. ETM+-observed cover fractions (0.0-1.0) for the Forest, Secondary, and Cleared classes; mean validation errors for the three panels are 0.045 km² (R²=0.62), 0.038 km² (R²=0.58), and 0.025 km² (R²=0.47)]
ANN Estimation of GEE and Resp, with Monte Carlo simulation of Total Prediction uncertainty

[Figure: schematic of climate drivers (Clim) used to predict fluxes (Flux)]
Weekly GEE from Howland Forest, ME based on NEE
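The Monte Carlo step can be sketched as follows, assuming the Gaussian posterior of Equation 8: draw weight vectors from N(w_MP, A⁻¹), run each through the network, and take the spread of the predictions as the total prediction uncertainty. The function and variable names here are illustrative, not the code used in the study:

```python
import numpy as np

def monte_carlo_uncertainty(w_mp, A, predict, X, n_draws=500):
    """Propagate posterior weight uncertainty (Equation 8) to predictions.

    w_mp    : (W,) most probable weights
    A       : (W, W) Hessian of S at w_mp; posterior covariance is A^{-1}
    predict : hypothetical function(weights, X) -> predictions
              (e.g., the network forward pass)
    """
    cov = np.linalg.inv(A)
    draws = np.random.multivariate_normal(w_mp, cov, size=n_draws)
    preds = np.array([predict(w, X) for w in draws])
    return preds.mean(axis=0), preds.std(axis=0)  # mean and spread per sample
```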
Some demonstrations of the MacKay/Bishop ANN regression with 1 input and 1 output

[Figure: example fits comparing linear regression and ANN regression on synthetic 1-D data at noise levels of 0.02, 0.05, 0.10, and 0.20]
Issues associated with multidimensional problems
- Sufficient sampling of the input space
- Data normalization (column mean zero and standard deviation one; see the sketch after this list)
- Processing time
- Algorithm parameter choices
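A one-line illustration of the normalization step, assuming the data sit in a NumPy array with variables in columns (the function name is hypothetical):

```python
import numpy as np

def standardize(X):
    """Center each column to mean zero and scale to standard deviation one.
    nan-aware, so that gaps (NaNs) do not corrupt the statistics."""
    return (X - np.nanmean(X, axis=0)) / np.nanstd(X, axis=0)
```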
Our gap-filling algorithm
1. Assemble meteorological and flux data in an N×d table
2. Create five additional columns for sin() and cos() of time of day and day of year, and potential PAR
3. Standardize all columns
4. First iteration: identify columns with no gaps; use these to fill all the others, one at a time (sketched in code below)
5. Create an additional column, NEE(t-1), flux lagged by one time interval
6. Second iteration: remove filled points from the NEE time series, refill with all other columns
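A compact sketch of the first-iteration fill (steps 3-4), assuming a NumPy array with variables in columns and NaNs marking gaps; `train_ann` and `predict_ann` are hypothetical stand-ins for the Bayesian-regularized NETLAB fit:

```python
import numpy as np

def fill_gaps_first_iteration(data, train_ann, predict_ann):
    """Steps 3-4: standardize, then fill every gappy column from the
    gap-free columns, one target column at a time.

    data        : (N, d) array, NaN = gap
    train_ann   : hypothetical fit function, train_ann(X, y) -> model
    predict_ann : hypothetical predict function, predict_ann(model, X) -> y_hat
    """
    X = (data - np.nanmean(data, axis=0)) / np.nanstd(data, axis=0)
    complete = ~np.isnan(X).any(axis=0)      # columns with no gaps (drivers)
    filled = X.copy()
    for col in np.where(~complete)[0]:       # each column that has gaps
        gap = np.isnan(X[:, col])
        model = train_ann(X[~gap][:, complete], X[~gap, col])
        filled[gap, col] = predict_ann(model, X[gap][:, complete])
    return filled
```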
Room for Improvement
1. Don't extrapolate wildly; revert to time-based filling in areas with low sampling density, especially at the beginning and end of the record
2. Carefully evaluate the sensitivity to internal settings (e.g., alpha, beta, Nnodes)
3. Stepwise analysis for relative importance of driver variables
4. Migrate to C or another faster environment
5. Include uncertainty estimates in the output
6. At least, clean up the code and make it available to others in the project and/or the broader community