14 January 2009
2009 AMS Artificial Intelligence Conference
A Data Mining Approach to Soil Temperature and Moisture Prediction
Bill Myers
Seth Linden, Gerry Wiener
Project Overview and Goals
• Improve soil temperature and moisture
prediction
• Integrate and Evaluate NASA-MODIS data sets
– Leaf Area Index (LAI)
– Green Vegetation Fraction (i.e. FPAR)
– Albedo
• Deliver tailored products to end users
– Soil forecasts will drive Agriculture-specific models (e.g.
pest models)
– RAL partnered with DTN/Meteorlogix
– DTN DSS delivers Ag-specific forecasts to 80,000 users
Soil State Prediction
• Current soil state modified by atmospheric forcing conditions
• Heat and moisture are transferred between adjacent nodes
• Typically done with a physical model, called a Land-Surface Model (LSM)
[Diagram labels: Solar Energy, Weather, Subsurface Nodes, Fixed Node]
Physical Model
• This project uses the High Resolution Land Data
Assimilation System and the Noah LSM
– Used by NCEP as part of the NAM (WRF model)
• Many parameters are necessary to model soil
type and land surface characteristics
– Affect incident solar energy, heat transfer, etc
– Parameters must be generalized
• “Sandy loam” will have same parameterization at all sites
• Chemical compositions of “sandy loam” differ between sites
– Heat and moisture transfer will not be exact at ANY site
• Goal of this study:
Determine if a data mining approach can produce
results comparable to those of the physical
model
Data Mining System
• Regression Tree (Cubist)
– Available from www.rulequest.com
– Looks for patterns in data
– Builds rule-based numerical models
– Rules are developed based on training data
– At each leaf node, a regression equation is developed that best fits that
subset of the training data
– Effectively, linear approximations are being made when certain conditions
are met
– Soil state forecasts are generated by applying rule set to forecast data
• Training Data
– 29 Soil Climate Analysis Network (SCAN) sites
– Two years of observational history at each site used to develop rules
– NCAR scientists were consulted to determine the most important inputs to soil
state evolution
– These were extracted or derived from observed variable set
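As a rough illustration of the rule-plus-regression idea described above, the following Python sketch splits a small synthetic data set on one condition and fits a separate least-squares line to each subset. It is not Cubist's actual algorithm; the data values, the split threshold, and the function names are all assumptions made for illustration.

```python
# Minimal sketch of a "rule + leaf regression" model with one predictor.
# NOT Cubist's algorithm: the threshold is given, not learned, and the
# data below are synthetic.

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def fit_two_leaf_model(xs, ys, threshold):
    """Fit one line to rows with x <= threshold and another to the rest."""
    low = [(x, y) for x, y in zip(xs, ys) if x <= threshold]
    high = [(x, y) for x, y in zip(xs, ys) if x > threshold]
    return {
        "threshold": threshold,
        "low": fit_line(*zip(*low)),
        "high": fit_line(*zip(*high)),
    }


def predict(model, x):
    """Pick the leaf whose condition matches x, then apply its regression."""
    a, b = model["low"] if x <= model["threshold"] else model["high"]
    return a + b * x


# Synthetic piecewise-linear data (made up for this sketch).
xs = list(range(11))
ys = [1 + 2 * x if x <= 5 else 8.5 + 0.5 * x for x in xs]
model = fit_two_leaf_model(xs, ys, threshold=5)
```

Each leaf is a linear approximation that is valid only where its condition holds, which is the behavior the slide describes.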
Regression Tree Model Generation
• 10 Regression trees were developed for each site
– One regression tree each for soil temperature and soil moisture at
each depth (5, 10, 20, 50, 100 cm)
• Input variables:
– Julian day
– Air Temperature
– Delta air temperature (in current hr)
– Downward Shortwave Radiation
– Wind Speed
– Dew point temperature
– Precip amt
– Previous soil state:
• Previous hour’s soil temperature and moisture at adjacent depths
• A target variable (e.g. Current Soil Temp at 5 cm)
was provided with each hour’s data
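To make the input list concrete, here is a hedged Python sketch that assembles one training row for the 5 cm soil temperature tree from two consecutive hourly observations. The field names follow the names file shown later in the deck; the observation dictionary keys, the way deltaT is computed, and the numeric values are assumptions for illustration only.

```python
# Sketch: build (features, target) for the 5 cm soil temperature predictand.
# ASSUMPTIONS: observation keys, the deltaT definition (current minus
# previous hourly air temperature), and all numbers are hypothetical.

def make_training_row(prev_obs, curr_obs):
    """Return (features, target) for predicting the current 5 cm soil temp."""
    features = {
        "mon": curr_obs["mon"],
        "AirT": curr_obs["AirT"],
        "deltaT": curr_obs["AirT"] - prev_obs["AirT"],
        "dsw": curr_obs["dsw"],
        "wspd": curr_obs["wspd"],
        "TD": curr_obs["TD"],
        "qpf": curr_obs["qpf"],
        # Previous hour's soil state at the target depth and the adjacent depth.
        "ST5_prev": prev_obs["ST5"],
        "ST10_prev": prev_obs["ST10"],
        "SM5_prev": prev_obs["SM5"],
        "SM10_prev": prev_obs["SM10"],
    }
    target = curr_obs["ST5"]
    return features, target


# Hypothetical observations for two consecutive hours.
prev_hour = {"mon": 0.5, "AirT": 5.0, "dsw": 0.0, "wspd": 2.0, "TD": 1.0,
             "qpf": 0.0, "ST5": 8.0, "ST10": 9.0, "SM5": 30.0, "SM10": 35.0}
curr_hour = {"mon": 0.5, "AirT": 4.0, "dsw": 0.0, "wspd": 3.0, "TD": 0.5,
             "qpf": 0.0, "ST5": 7.5, "ST10": 8.9, "SM5": 30.0, "SM10": 35.0}

features, target = make_training_row(prev_hour, curr_hour)
```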
Example training data
| Names file for 5cm temperature prediction
ST5_curr                 | Predictand in list of variables below

siteID:     ignore       | SCAN site ID
date:       ignore       | YYYYMMDDHH
mon:        continuous   | fraction of Julian year
AirT:       continuous   | 2m air temp (avg over last hr)
deltaT:     continuous   | air temp change over last hour
dsw:        continuous   | avg downward shortwave radiation over last hr
wspd:       continuous   | avg wind speed over last hour
TD:         continuous   | avg dew point temp over last hour
qpf:        continuous   | precip amt over last hour
ST5_prev:   continuous   | 5 cm soil temp at previous hour
ST10_prev:  continuous   | 10 cm soil temp at previous hour
SM5_prev:   continuous   | 5 cm soil moisture at previous hour
SM10_prev:  continuous   | 10 cm soil moisture at previous hour
ST5_curr:   continuous   | 5 cm soil temp at current hour (predictand)
Sample line of training data
2001, 2007110211, 0.9167, 4.53, -0.89, 0.00, 2.81, -3.28, 0.00, 8.158, 9.847, 33.858, 39.616, 8.32
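Annotations, in column order after the site ID and date:
– Time of year: 0.9167
– Air Temp: 4.53
– Air Temp Falling in this hour: -0.89
– No downward Radiation (night): 0.00
– Wind Speed: 2.81
– Dewpoint Temp: -3.28
– No Precip: 0.00
– Previous hour’s soil temperature at 5 cm and 10 cm: 8.158, 9.847
– Previous hour’s soil moisture at 5 cm and 10 cm: 33.858, 39.616
– Current hour’s 5 cm Soil T (Predictand): 8.32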
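The mapping between the names file and the sample line can be sketched in Python as follows. The column order and sample values come from the slides; the function and variable names are hypothetical.

```python
# Column order taken from the names file; values from the sample line.
COLUMNS = ["siteID", "date", "mon", "AirT", "deltaT", "dsw", "wspd", "TD",
           "qpf", "ST5_prev", "ST10_prev", "SM5_prev", "SM10_prev", "ST5_curr"]
IGNORED = {"siteID", "date"}  # marked "ignore" in the names file
TARGET = "ST5_curr"

SAMPLE = ("2001, 2007110211, 0.9167, 4.53, -0.89, 0.00, 2.81, -3.28, "
          "0.00, 8.158, 9.847, 33.858, 39.616, 8.32")


def parse_row(line):
    """Split one comma-separated training line into a dict keyed by column."""
    parts = [p.strip() for p in line.split(",")]
    if len(parts) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(parts)}")
    row = {}
    for name, text in zip(COLUMNS, parts):
        # Ignored columns stay as text; all others are continuous values.
        row[name] = text if name in IGNORED else float(text)
    return row


row = parse_row(SAMPLE)
predictors = {k: v for k, v in row.items()
              if k not in IGNORED and k != TARGET}
target = row[TARGET]
```

The sample's mon value, 0.9167, matches 11/12 to four decimals for a November date, which suggests (but does not confirm) how that field was derived.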
Rules Development and Application
• Regression Trees generated for each predictand at each site
– Separate tree for Soil Temperature and Moisture at each depth
– Two years of training data for most sites
– Example rule and associated regression:
if dsw <= 0.09 and ST5_prev > 12.05
ST5_curr = -0.211 + 0.3165 dsw + 0.83 ST5_prev +
0.13 ST10_prev + 0.02 AirT + 0.02 TD
• 48 hour forecasts were generated iteratively
– Starting with observed soil state and first hour’s weather predictions
– Regression trees were applied for each predictand to generate forecast
state at hour 1
– Using the forecast soil state and weather predictions, the next hours’
forecasts were generated iteratively
• Soil forecasts generated for 2007 growing season (April-June)
– Data Mining and HRLDAS forecasts were compared to observations
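The iterative procedure above can be sketched in Python. Only the single example rule for 5 cm soil temperature is taken from the slides; the persistence fallbacks, the stand-in 10 cm model, the function names, and the input values are assumptions, since the full rule sets are not shown.

```python
# Sketch of the hour-by-hour forecast loop. Only st5_rule_from_slide uses
# coefficients from the slides; everything else is a hypothetical stand-in.

def st5_rule_from_slide(state, wx):
    """Apply the example rule; return None when its condition is not met."""
    if wx["dsw"] <= 0.09 and state["ST5"] > 12.05:
        return (-0.211 + 0.3165 * wx["dsw"] + 0.83 * state["ST5"]
                + 0.13 * state["ST10"] + 0.02 * wx["AirT"] + 0.02 * wx["TD"])
    return None


def predict_st5(state, wx):
    value = st5_rule_from_slide(state, wx)
    # ASSUMPTION: persistence when the one known rule does not apply.
    return state["ST5"] if value is None else value


def predict_st10(state, wx):
    # ASSUMPTION: persistence stand-in; the real 10 cm tree is not shown.
    return state["ST10"]


MODELS = {"ST5": predict_st5, "ST10": predict_st10}


def iterate_forecast(initial_state, weather_by_hour, models):
    """Feed each hour's predicted soil state into the next hour's prediction."""
    state = dict(initial_state)
    history = []
    for wx in weather_by_hour:
        # All predictands are computed from the previous hour's state.
        state = {name: model(state, wx) for name, model in models.items()}
        history.append(state)
    return history


# Hypothetical observed soil state and 48 hours of weather predictions.
observed = {"ST5": 15.0, "ST10": 15.5}
weather = [{"dsw": 0.0, "AirT": 10.0, "TD": 5.0} for _ in range(48)]
forecast = iterate_forecast(observed, weather, MODELS)
```

Because each hour starts from the previous hour's forecast rather than from observations, errors can accumulate over the 48 hours.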
Results
• Statistically, data mining better than HRLDAS at
nearly all the 29 SCAN sites
• Median (and quartile) MAEs significantly lower
for data mining
• Data mining errors generally 30%+ lower than
HRLDAS errors
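For readers unfamiliar with the statistics quoted above, this Python sketch shows one way to compute a mean absolute error (MAE), quartiles of per-site MAEs, and a percent error reduction. The function names are hypothetical, the quartile method is an assumption, and the numbers in the usage lines are illustrative, not results from the study.

```python
from statistics import quantiles


def mean_absolute_error(forecasts, observations):
    """Average of |forecast - observation| over paired values."""
    if not forecasts or len(forecasts) != len(observations):
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(f - o) for f, o in zip(forecasts, observations)) / len(forecasts)


def quartiles(values):
    """Return (Q1, median, Q3); the 'inclusive' method is an assumption."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return q1, q2, q3


def percent_reduction(baseline_error, new_error):
    """How much lower new_error is than baseline_error, in percent."""
    return 100.0 * (baseline_error - new_error) / baseline_error


# Illustrative numbers only (not from the study).
mae = mean_absolute_error([1.0, 2.0, 4.0], [1.5, 2.0, 3.0])
q1, median, q3 = quartiles([1, 2, 3, 4, 5])
reduction = percent_reduction(2.0, 1.4)
```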
Soil Temperature Errors
[Figure: soil temperature errors (degC) by quartile at 5, 10, 20, and 50 cm depths. Data mining: solid lines; HRLDAS: dashed lines.]
Summary
• Data mining with Cubist Regression Trees
– Reduces soil temperature and moisture errors
– Simple to develop rules
– Rules/Regressions can be displayed easily
– Regression Tree forecasts tuned to the site
– HRLDAS forecast parameters are more generic
• Applicability to non-observing sites
– Rules, as developed, are site specific
– Not valid away from that location
– HRLDAS can generate forecasts at any location
– Observing sites do not begin to cover all land use
and soil type combinations
Future Directions
• Add vegetation state (from NASA MODIS data) to
data mining training sets to determine whether these
results can be improved upon
• Train Cubist with all obs sites lumped together but
include land use and soil type as input variables
• Investigate combining data mining approach and LSM
to get best of both
Acknowledgements
This research effort has been supported by a NASA ROSES grant.
We appreciate the help provided by personnel at the
USDA Natural Resources Conservation Service, and
various NASA labs.
Soil forecast web site:
www.rap.ucar.edu/projects/nasa-ag/
hrldas/display_hrldas_animation.html
Cubist is available at www.rulequest.com