Seminar_Logistic_Regression - Sortie-ND

Download Report

Transcript Seminar_Logistic_Regression - Sortie-ND

Seminar 7
Nonlinear Logistic Regression
of Susceptibility to Windthrow
Likelihood Methods in Forest Ecology
October 9th – 20th , 2006
Analysis of windthrow data

Traditionally: Summarize variation in degree and
type of damage, across species and tree sizes, from
the storm, as a whole...

A likelihood alternative: Use the spatial variation in
storm intensity that occurs within a given storm to
estimate parameters of functions that describe
susceptibility to windthrow, as a function of
variation of storm severity...
Categorizing Wind Damage

BINARY: Simple binary response variable (windthrown vs.
not...)

CATEGORICAL: Multiple categories (uprooted,
snapped,...)

ORDINAL: Ordinal categories (degree of damage): none,
light, medium, heavy, complete canopy loss {usually estimated
visually}

CONTINUOUS: just what the term implies, but rarely used
because of the difficulties of quantifying damage in these
terms...
Analysis of BINARY Data:
Traditional Logistic Regression
Consider a sample space consisting of two outcomes (A,B)
where the probability that event A occurs is p
Definition: Logit = log of an odds ratio (i.e log[p/(1-p)])
Benefits of logits
• A logit is a continuous variable
• Ranges from negative when p < 0.5 to positive when p > 0.5
Standard logistic regression involves fitting a linear
function to the logit:
log  p /(1  p)  a  bX1  cX 2  ...
What if your terms are multiplicative?
log  p /(1  p)  a  bX1 X 2  ...
Example: Assume that the probability of windthrow is a
joint (multiplicative) function of
(1) Storm severity, and
(2) Tree size
In addition, assume that the effect of DBH is nonlinear....
A model that incorporates these can be written as:
log( pisj /(1  pisj ))  as  cs * Si * DBH isjbs
A little more detail....
log( pisj /(1  pisj ))  as  cs * Si * DBH isjbs




Pisj is the probability of windthrow of the jth individual of species s
in plot i
DBHisj is the DBH of that individual
as, bs, and cs are species-specific, estimated parameters, and
Si is the estimated storm severity in plot i
-
NOTE: storm severity is an arbitrary index, and was allowed to range
from 0-1
But don’t you have to measure storm severity (not estimate it)?
Likelihood Function
logit  log( pisj /(1  pisj ))  as  cs * Si * DBH isjbs
elogit
pisj 
1  elogit
It couldn’t be any easier... (since the scientific model is already
expressed as a probabilistic equation):.
log( pisj ) if treeisj was windthrown

log - likelihood   

isj log( 1  pisj ) if treeisj was not windthrown 
An alternative, compound likelihood
function and scientific model
What if we assume that storm severity is not fixed within a
plot (Si), but rather varies for different trees within the
plot?
log( pisj /(1  pisj ))  as  cs * Sij * DBH isjbs
Sij  N ( Si ,  2 )
If we are willing to assume a constant variance across all plots,
this only requires estimation of a single additional parameter
(the variance: s2)
Example: Windthrow in the Adirondacks
Highly variable damage due
to:
• variation within storm
• topography
• susceptibility of species
within a stand
Reference:
Canham, C. D., Papaik, M. J., and Latty, E. F. 2001. Interspecific variation in susceptibility to
windthrow as a function of tree size and storm severity for northern temperate tree
species. Canadian Journal of Forest Research 31:1-10.
The dataset

Study area: 15 x 6 km area perpendicular to the storm path

43 circular plots: 0.125 ha (19.95 m radius) censused in 1996
(20 of the 43 were in oldgrowth forests)

The plots were chosen to span a wide range of apparent
damage

All trees > 10 cm DBH censused

Tallied as windthrown if uprooted or if stem was < 45o
from the ground
Critical data requirements

Variation in storm severity across plots

Variation in DBH and species mixture within plots
log( pisj /(1  pisj ))  as  cs * Sij * DBH isjbs
The analysis...
log( pisj /(1  pisj ))  as  cs * Sij * DBH isjbs

7 species comprised 97% of stems – only stems of
those 7 species were included in the dataset for
analysis

# parameters = 64 (43 plots + 3 parameters for
each of 7 species)

Parameters estimated using simulated annealing
Observed Proportion Windthrown
Model evaluation
1.0
62
0.9
67
0.8
70
111
0.7
123
0.6
174
0.5
0.4
312
0.3
527
0.2
0.1
Numbers
above bars
represent the
number of
observations
in the class
787
279
0.0
0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
Predicted Probability of Windthrow
The solid line is a
1:1 relationship
Estimating Storm Severity
Storm Severity Index
1.0
0.8
0.6
0.4
0.2
density
basal area
0.0
0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
Plot-Level Windthrow
Probability of Mortality
Results: Big trees...
1.0
0.9
0.8
0.7
0.6
0.5
0.4
0.3
0.2
0.1
70 cm DBH
ACRU
ACSA
BEAL
FAGR
PIRU
PRSE
TSCA
ALL
0.0
0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
Storm Severity
Probability of Mortality
Little trees...
1.0
10 cm DBH
0.9
0.8
0.7
0.6
0.5
ACRU
ACSA
BEAL
FAGR
PIRU
PRSE
TSCA
ALL
0.4
0.3
0.2
0.1
0.0
0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
Storm Severity
100
100
ACRU
80
60
60
40
40
20
20
0
Percent Windthrown
0
10
20
30
40
50
60
70
10
20
30
40
50
60
70
40
50
60
70
40
50
60
70
100
100
BEAL
80
60
40
40
20
20
0
0
10
20
30
40
50
60
70
PIRU
80
10
40
40
20
20
0
0
30
40
50
60
70
30
PRSE
80
60
20
20
100
60
10
FAGR
80
60
100
ACSA
80
10
20
30
DBH Class (cm)
New twists

Effects of partial harvesting on risk of windthrow
to residual trees

Effects of proximity to edges of clearings on risk
of windthrow
Research with Dave Coates in cedar-hemlock forests of
interior B.C.
Effects of harvest intensity and
proximity to edge…
Equation (1): basic model – probability of windthrow is a species-specific
function of tree size and storm severity:
log( pisj /(1  pisj ))  as  Si * cs * DBH isjbs
(1)
Equation (2) introduces the effect of prior harvest removal to equation (1) by
adding basal area removal and assumes the effect is independent and additive
log( pisj /(1  pisj ))  as  Si * (cs * DBH isjbs  hBA / 100)
(2)
Equation (3) assumes the effects of prior harvest interact with tree size:
log( pisj /(1  pisj ))  as  Si * cs * DBH isj(bs  hBA / 100)
Models 1a – 3a: test models where separate c coefficients are estimated for
“edge” vs. “non-edge” trees (edge = any tree within 10 m of a forest edge)
(3)
Other issues…

Is the risk of windthrow independent of the fate of
neighboring trees? (not likely)
- Should we examine spatially-explicit models that factor in
the “nucleating” process of spread of windthrow gaps?…
Analysis for CATEGORICAL
Response Variables

Extension of the binary case??:
- Estimate a complete set of species-specific parameters
for each of n-1 categories (assuming that the set of
categories is complete and mutually exclusive...)

# of parameters required = P + (n-1)*(3*S)
- Where P = # plots, S = # species, and n = # of response
-
categories
{Is this feasible?...}
Analysis for ORDINAL Response Variables

The categories in this case are ranked (i.e. none,
light, heavy damage)

Analysis shifts to cumulative probabilities...
Simple Ordinal Logistic Regression
p  Pr( y  Yk X i)
If
(i.e. the probability that an observation y will be less than or equal to ordinal
level Yk(k = 1.. n-1 levels) , given a vector of X explanatory variables),
Then simple ordinal logistic regression fits a model of the
form:
logit( p )  ak  b1x1  b2 x2  ...
Remember:
p  e logit /(1  e logit )
The probability that an event will fall into a single class k
(rather than the cumulative probability) is simply
pk  Pr( y  Yk X i)  Pr( y  Yk 1 X i)
In our case...
logit ( p )  aks  cs Si DBH isj
where
bs
p  Pr( y  Yk )
and where aks, cs and bs are species specific parameters (s = 1.. m
species), and Si are the estimated storm severities for the i = 1..n
plots.
# of parameters: M + (n-1+2)*Q, where M = # of plots,
n = # of ordinal response levels, and Q = # of species
The Likelihood Function Stays the Same
The probability that an event will fall into a single class k
(rather than the cumulative probability) is simply
pk  Pr( y  Yk X i)  Pr( y  Yk 1 X i)
Again, since the scientific model is already expressed as a
probabilistic equation:
log( pk ) if treeisj experienced damage level k 
log - likelihood   

log(
1

p
)
if
tree
isj
did
not
isj 
k

Hurricanes in Puerto Rico

Storm damage assessment in the permanent plot at the
Luquillo LTER site
-

Combined the data into a single analysis: 136 plots, 13
species (including 1 lumped category for “other” species),
and 3 damage levels:
-

Hurricane Hugo - 1989
Hurricane Georges – 1998
No or light damage
Partial damage
Complete canopy loss
Total # of parameters = 188 (15,647 trees)
Parameter Estimation
0
2500000
5000000
-11300
Likelihood
-11320
-11340
-11360
-11380
-11400
-11420
-11440
-11460
Iteration
Solving simultaneously for 188 parameters in a
dataset containing > 15,000 trees takes time...
Model Evaluation
Goodness of Fit
Combined Hugo and Georges Analysis
Observed Proportion
1.0
0.8
Complete Damage
Partial Damage
No Damage
1:1 Line
0.6
0.4
0.2
0.0
0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
Predicted Probability
Comparison of the two storms...
Figure 1. Relative Variation in Severity
of Hurricanes Hugo and Georges
0.30
Hugo
Georges
Statistics on variation in storm
severity from Hurricanes Hugo
and Georges
Proportion of Plots
0.25
0.20
0.15
0.10
0.05
Severity
n
minimum
maximum
mean
S.D.
Hugo
96
0.20
1.00
0.63
0.17
Georges
40
0.02
0.69
0.44
0.16
0.00
0.0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
Storm Severity Index
0.8
0.9
1.0
Support for the Storm Severity
Parameter Estimates
Support limits for the 136 estimates of storm severity
were not particularly “tight”
Range of 95% Support Limits
Mean
0.292225
Median
0.269201
Standard Deviation
0.094282
Minimum
0.134572
Maximum
0.658898
Remember that the storm severity
parameter values range from 0 - 1
Support for the Species-specific
Parameters
Strength of support for the species-specific
parameters was better, but still not great...
Range of 95% support limits
Mean
19.80
Standard Error
3.55
Median
14.00
Minimum
3.50
Maximum
155.00
Range of the 1.92 Unit Support Intervals, as a
% of the parameter estimate
None
Medium
Complete
0.8
1
BUCCAP
Probability of Damage
Probability of Damage
1
0.6
0.4
0.2
0
None
Medium
Complete
0.6
0.4
0.2
0
0
0.2
0.4
0.6
0.8
1
0
Storm Severity
0.2
0.4
0.6
0.8
1
Storm Severity
1
1
None
Medium
Complete
0.8
CECSCH
Probability of Damage
Probability of Damage
ALCLAT
0.8
0.6
0.4
0.2
CASARB
0.8
0.6
None
Medium
Complete
0.4
0.2
0
0
0
0.2
0.4
0.6
Storm Severity
0.8
1
0
0.2
0.4
0.6
Storm Severity
0.8
1
Critical assumptions

Probability of damage to a tree in Georges was
independent of damage in Hugo (actually true…)

The “parallel slopes” model is reasonable

Others ?