Transcript waller
Overcoming Chance Agreement in Classification Tree
Modeling: Predictor Variables, Training Data, and
Spatial Autocorrelation Considerations
Eric Waller
Colorado Division of Wildlife
Southwest Regional GAP Project
Arizona, Colorado, Nevada, New Mexico, Utah
US-IALE 2004, Las Vegas, Nevada: Transdisciplinary Challenges in Landscape Ecology
SWReGAP Approach
• Land Cover Mapping:
Classification Trees
– Satellite Image
Classification?
– Ecological
Modeling?
– Hybrid? (Be wary of
ancillary variables!)
Hybrid Implementation
– Iterative: Ecological modeling within spectral (NLCD)
strata?
– Single Model: “Kitchen Sink” approach?
• Captures nuances / avoids error in spectral strata
Classification Trees
“Need piles of data”
Pseudoreplication
• Pseudoreplication (multiple sampling within a polygon)
- Recommended by EROS Data Center (EDC)
Pseudoreplication: Pros and Cons
Exacerbates overfitting?
– Classification tree strategies
• Boosting, Cross-validation / pruning
– Rely on independent data
– Pseudoreplication = Non-independent data
Benefits of additional data?
- Swamp anomalous data
Autocorrelation
• Satellite imagery?
• Related to land cover?
• DEM-derived variables?
• When combined with pseudoreplication, the explanatory
power of those predictor variables is inflated.
– Even advanced classification tree techniques will be fooled.
Chance Agreement
Autocorrelation in predictor variables that lack strong
explanatory power, when combined with
pseudoreplication and a lack of sampling of the range of
classes for a given value or combination of those
predictor variables, leads to repeated chance agreement
between the "bad" predictor variables and the land cover
that fools the classification tree model.
Sampling
- Only one class for a given slope/aspect combination?
Slope
Mountain mahogany
Aspect
• Need to sample the range of classes (e.g. ponderosa
pine, mountain mahogany, etc.) that occur for any
particular combination of predictor variables
• Limit predictor variables used in modeling
(for those not strongly correlated with land cover)
Example Over Golden, Colorado
Landsat ETM+ Image
Path 34, Row 32
Summer, 2000
DEM and DEM-Derived Variables
Elevation
Slope
Aspect
Landform
autocorrelated
autocorrelated
autocorrelated
autocorrelated
Classification Comparison
Foothill Grass
Invasive Grass
Foothill Shrub
Ponderosa Pine
Mixed Conifer
Wooded Riparian
Residential
Agriculture
Urban
Mining
Water
Classification With DEMDerived Variables
Classification Without
DEM-Derived Variables
Matrix Overlay – Aerial Photography
- A matrix can be used to highlight areas of disagreement.
- Air photos can be used to resolve discrepancies.
Conclusions
• This presentation demonstrated a method for dealing
with problems associated with using DEM-derived
variables in classification tree modeling.
• The approach does not guarantee improvement upon a
more spectrally derived land cover product.
• Future efforts may want to establish a sampling strategy
that ensures, a priori, that training data represent the
range of classes across landscape variability, especially
if DEM-derived variables are to be used.