Transcript Slide 1

Emulation of a Stochastic Forest Simulator
Using Kernel Stick-Breaking Processes
(Work in Progress)
James L. Crooks (SAMSI, Duke University)
Background
●
●
●
●
We desire to predict the distribution of tree species in
the North Carolina forest under a variety of future
climate change scenarios.
Toward this end we can use the forest simulator
developed by J. Clark and P. Agarwal’s joint research
group.
This simulator models the life-cycle of individual trees
within a tree stand of pre-specified area.
Growth and fecundity are in part mediated by the
climate-influenced variables temperature and soil
moisture.
Motivation
●
●
The forest simulator has the following properties that
make emulation both important and difficult:
–
Its speed limits the physical area that can be simulated in
reasonable time (the current standard is 128 m x 128 m)
–
Its output is stochastic
–
Its output distribution can be non-gaussian
–
Its output distribution can vary over the input space.
Thus there is a need for a local, nonparametric
statistical method to emulate the entire output
distribution across in the input space.
Objectives
●
●
●
Run simulator with 3 species under “standard” climatic
conditions for 1000+ years to establish equilibrium
initial conditions.
Run simulator for a further 100 years at each of various
points in the climate input space (temperature and soil
moisture increase rates).
Emulate the output over this input space using the
Kernel Stick-Breaking Processes idea of Dunson and
Park (2006).
Summary of Input and Output Variables
Simulator Climate Input Variables
i indexes the run of the simulator
xi1 = Mean Temperature Increase / Century
xi2 = Mean Soil Moisture Increase / Century
Design Matrix (see below)
T
2
X i  1 xi1 xi 2 xi1

2
i2
x
xi1xi 2

Simulator Output Variables
yi1 = Final Number of Adult Trees of Species 1
yi2 = Final Number of Adult Trees of Species 2
yi3 = Final Number of Adult Trees of Species 3
We will focus
on number of
adult trees.
Forest Simulator
output for the
1001 year
initialization run.
Legend
— Total
— Species 1
— Species 2
— Species 3
Justifying the Choice of Model
• We expect that the mean response will be suppressed at
Number of Trees
extreme values of climate variables.
Single Regression
Surface
Climate Variable (Temperature or Soil Moisture Increase Rate)
→Therefore we model the mean response as:
6

μ ij  exp X ikβijk ,
 k 1

i  {1,..., N }
j  {1, 2,3}
k  {1,...,6}
i indexes simulator run
j indexes the tree species
k indexes the regression coef.
with a design matrix having up to quadratic terms
●
We do not a priori expect the output distribution to be Gaussian
anywhere on the input space.
Number of Trees
→ Use a non-parametric (Dirichlet Process) infinite mixture of
regression surfaces instead of a single surface.
Finite (Truncated)
Mixture of
Regression Surfaces
Climate Variable (Temperature or Soil Moisture Increase Rate)
●
We do not a priori expect the shape of the output distribution to
be constant over the input space.
→ Use the Kernel Stick-Breaking Process of Dunson and Park
(2006) to allow the DP mixture to be predictor-dependent.
Negative Binomial Likelihood
●
●
The output variable of interest is number of adult trees of each
species. Why not use a Poisson likelihood?
Preliminary data show Var[y] scales roughly like E[y]2, not
E[y], and Var[y] is also inversely dependent on the forest area.
→Use the negative binomial distribution, which has pmf:
Γ(y  ν)  ν 
f y | μ, ν  


Γ(y  1) Γ(ν)  ν  μ 
ν
 μ 


 νμ 
y
and moments:
Ey | μ, ν  μ
μ2
Var y | μ, ν  μ 
ν
where the prior range of n can be increased with area.
The Full Model


 

 
  
f y i | ν    f y i | βi , ν dG x i βi ,


i  1,...,N
3
  
f y i | βi , ν   NegBinν j , μ ij ,
j1
 




G x i βi   W x i ; Vl , Γl

l 1




 
ml
 6

μ ij  exp  X ikβijk 
 k 1




1  W x i ; Vm , Γ m G l βi ,

  



 
Kernel Stick-Breaking
W x i ; Vl , Γl  Vl K x i , Γl ,
Process
 
2
2
K x i , Γl  exp  ψ1 x i1  Γl1   ψ 2 x i2  Γl 2 




Γl ~ DiscreteGrid,
Vl ~ Beta1,α ,
G l ~ DPηG0 ,


  
G 0 ~ Mat rixNormal β0 , Σ0 , Φ0 ,

β0 ~ Mat rixNormal,
 1  1
Σ0 , Φ0 ~ Wishart
ψ1,2 ~ LogNormal
ν j ~ DiscreteGrid,
j  {1,2,3}
Comments on the Model
●
This model, unlike Dunson and Park’s original, lacks
conjugacy between f and G0; thus two changes must be
made to their algorithm:

– We no longer have the full conditional for β i, so we must use
a Metropolis-Hastings step to update it.




  
– The integral  f yi | βi , ν dG 0 βi cannot be evaluated exactly
so we must approximate it numerically using (e.g., ) MonteCarlo integration.
●
The original MATLAB code is itself not fast, but once a
posterior sample has been generated it is cheap to
predict the output pmf at new points in the input space.
Generating Simple Climate Change
Scenarios
●
The ballpark estimates of today’s (soil moisture,
temperature) mean and covariance are:
mean  16.61,18.94
●
●
 1.78  0.19
cov  

  0.19 14.38
The 1000+ year initialization run has temperature and
soil moisture generated by a MVN with this mean and
covariance.
Temperature is measured in °C and soil moisture in %.
●
Future 100 year scenarios are generated assuming the
means change linearly in time with rates given by the
points on plot below:
• GCM’s generally
predict hotter, drier
conditions for the
Southeastern US.
•Accordingly, ranges
were:
[-1,+2]*SD/century
for Temperature and
[-2,+1]*SD/century
for Soil Moisture.
Shown are the
generated soil
moisture and
temperature used
in the
initialization run,
and three
generated future
scenarios.
Climate change
begins at year
1052.
Legend
— Stable Climate
— Hotter/Drier
— Cooler/Wetter
Results
●
I just got the initialization run back last week, so
ask me in 3 months.
Other Thoughts
●
●
May need to continue the initialization run another
500-1000 years to get a better equilibrium.
Need a lot more runs when using nonparametrics
anyway, so the benefits of using a Latin HyperCube design are less obvious (in 2-D anyway).
Acknowledgements
●
●
●
Jim Clark’s group for use of their simulator, and
especially Sean McMahon for his invaluable assistance.
David Dunson and Ju-Hyun Park for explaining their
paper to me and letting me use their algorithm.
The SAMSI Methodology and Terrestrial Models
Working Groups for fruitful discussions.
References
Dunson, D. B., and J.-H. Park, “Kernel Stick-Breaking Processess”, ISDS Discussion Paper
22 (2006) and Biometrika (accepted)
Govindarajan, S., M. Dietze, P. Agarwal, and J. S. Clark, “A scalable simulator for forest
dynamics”, Symposium on Computational Geometry 2004: 106-115
Govindarajan, S., M. Dietze, P. Agarwal, and J. S. Clark, “A scalable algorithm for
dispersing populations”, Journal of Intelligent Information Systems 2004 (online)