Analysing the link between traits & invasive spread in
German flora: accounting for residence time
Joint work between
Eva Küster, Ingolf Kühn ~ UFZ
Adam Butler, Stijn Bierman, Glenn Marion ~ BioSS
Athens ALARM meeting, January 2007
Introduction
• Direct data on the arrival, establishment & spread of invasive
species are typically not available at the national or pan-European levels
• Indirect data about the traits & current spatial distribution of
species that invaded in the past can be used to identify
correlative relationships between traits and invasive success,
accounting for phylogeny
• Data on traits are often missing or ambiguous, however, creating
serious problems for the analysis – we look at how to address
these using Bayesian methods
Data
• We analyse data on German vascular plants
• Biolflor (www.ufz.de/biolflor):
database with information on traits & phylogeny of 3660 species
• Florkart (www.floraweb.de):
database with information on presence/absence of 4000+
species for 2995 grid cells within Germany
• We look at neophyte species (arrivals since 1490),
excluding ephemerophytes: there are 388 such species
• We use the # of grid cells occupied as a measure of
invasive success
• Morphology: Growth form, Life span, Life form, Generative reproductive cycles
• Leaf traits: Leaf persistence, Leaf anatomy, Leaf form
• Flowering phenology: Beginning of flowering season, Length of flowering season, End of flowering season
• Genetics: Ploidy, DNA content
• Propagation & dispersal: Types of storage organs, Existence of storage organs, Types of shoot metamorphoses, Types of root metamorphoses
• Diaspores & germinules: Types of diaspores, Weights of diaspores, Weights of germinules
• Life strategy: Ecological strategy, Ruderal life strategy
• Invasive history: Mode of introduction, Residence time
• Floral & reproductive biology: Strategy types of reproduction, Mating strategy, Pollen vector, Flower colour, Floral UV pattern, Floral UV reflection, Blossom type
• Native global distribution: Floristic zones of native area, # floristic zones in native area, Continent of native area, # continents in native area, Native in old or new world?, Oceanity of native area, Amplitude of oceanity
• Niche breadth in Germany: # hemerobic levels, Urbanity, # of habitat types, # of vegetation formations, # phytosociological classes
Current analysis by UFZ
Küster, Kühn and Klotz (in prep.)
• Regress log(# grid cells occupied) onto each of the ~40
individual traits in turn, in the presence of
phylogenetic variables
• Retain only traits that are significant at the 95% level,
exclude non-predictive traits, & then use cluster
analysis to further reduce the set of traits
• Use AIC to select the best model from within this set of
traits, including interactions
• At all stages, use only those species that have
complete data for all traits currently in the model
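The single-trait screening step can be sketched roughly as follows. This is only an illustration under stated assumptions, not the UFZ implementation: it leaves out the phylogenetic variables, uses made-up occupancy data and hypothetical trait names, and compares each one-trait model with an intercept-only model by AIC instead of a significance test.

```python
import math

def fit_simple_ols(x, y):
    """Least-squares fit of y = a + b*x; returns (a, b, residual sum of squares)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    return a, b, rss

def gaussian_aic(rss, n, n_coef):
    """Gaussian AIC up to an additive constant; the +1 counts the error variance."""
    return n * math.log(rss / n) + 2 * (n_coef + 1)

# Made-up data: number of grid cells occupied by eight hypothetical species.
cells = [12, 40, 150, 300, 25, 900, 60, 500]
log_cells = [math.log(c) for c in cells]

# Hypothetical trait columns (names and values are illustrative only).
traits = {
    "flowering_months": [2, 3, 4, 5, 2, 7, 3, 6],
    "polyploid": [0, 1, 0, 1, 1, 0, 0, 1],
}

n = len(cells)
mean_y = sum(log_cells) / n
null_rss = sum((v - mean_y) ** 2 for v in log_cells)
null_aic = gaussian_aic(null_rss, n, n_coef=1)

results = {}
for name, x in traits.items():
    a, b, rss = fit_simple_ols(x, log_cells)
    results[name] = (b, gaussian_aic(rss, n, n_coef=2))
    print(name, "slope:", round(b, 3), "AIC change vs null:",
          round(results[name][1] - null_aic, 2))
```

A negative AIC change means the one-trait model is preferred to the intercept-only model on these toy data.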
Phylogenetic correction
Küster, Kühn and Klotz (in prep.)
• Compute the patristic distance matrix based on the
phylogenetic codes given in biolflor
• For the current set of species –
• apply a principal coordinate analysis to the relevant part of
the distance matrix
• retain only axes associated with positive eigenvalues
• then retain the axes that account for the first 80% of variation
• then regress log(# grid cells occupied) onto the remaining axes
and retain only those that are significant at the 95% level
• The phylogenetic variables need to be recomputed
whenever the set of species is changed
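The first three steps of the phylogenetic correction can be sketched with NumPy as below. This is a minimal sketch under assumptions: the distance matrix is a made-up example, the function name `phylo_axes` is hypothetical, and "the first 80% of variation" is interpreted as the smallest set of leading axes whose positive eigenvalues sum to at least 80% of the positive-eigenvalue total. The final regression-based selection of axes is not shown.

```python
import numpy as np

def phylo_axes(dist, var_fraction=0.80):
    """Principal coordinate analysis of a symmetric distance matrix.

    Returns species coordinates on the leading positive-eigenvalue axes
    that together reach at least `var_fraction` of the positive variation.
    """
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ (d ** 2) @ centering      # Gower double-centring
    eigval, eigvec = np.linalg.eigh(b)               # ascending eigenvalues
    order = np.argsort(eigval)[::-1]                 # sort descending
    eigval, eigvec = eigval[order], eigvec[:, order]
    positive = eigval > 1e-10 * eigval.max()         # drop zero/negative axes
    eigval, eigvec = eigval[positive], eigvec[:, positive]
    cumulative = np.cumsum(eigval) / eigval.sum()
    k = int(np.searchsorted(cumulative, var_fraction)) + 1
    return eigvec[:, :k] * np.sqrt(eigval[:k])

# Made-up patristic distances for five hypothetical species.
dist = [
    [0, 2, 6, 6, 8],
    [2, 0, 6, 6, 8],
    [6, 6, 0, 2, 8],
    [6, 6, 2, 0, 8],
    [8, 8, 8, 8, 0],
]
axes = phylo_axes(dist)
print(axes.shape)
```

Because the axes depend on which rows of the distance matrix are included, the function has to be called again whenever the species set changes, as the last bullet notes.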
Missing data
• A large number of species are currently excluded
from the final analysis as data are missing on some
of their traits
• This is inefficient, & could potentially lead to bias if
the data are missing not at random
• The missing data arise from different sources –
• there being no record in the Biolflor database
• the qualifier in Biolflor suggesting that data quality is poor
• multiple states being recorded for a particular trait
• a very rare state being recorded
Residence times
• Residence time is a particularly important variable because
• it has good explanatory power to describe occupancy
• it partly accounts for the dynamic nature of invasive processes
• it allows us to make time-specific predictions about occupancy
• However, data on German residence times are only available for
171 species, & for 35 of these only to the nearest century
• Some auxiliary data are available for neighbouring countries
• How can we properly include residence time into the analysis,
given the large proportion of missing data?
Species | Region | Time
Amaranthus deflexus L. | Germany | 1889
Aesculus hippocastanum L. | Germany | 16th century
Acer negundo L. | Czech Republic | 1699
Acer negundo L. | Germany | 18th century
Oenothera depressa Greene | Germany | Early 19th century
Oxalis fontana Bunge | Central Europe | 17th century ?
Oxalis fontana Bunge | Germany | 1807
Epilobium ciliatum Raf. | Central Europe | 1871 / since 1971
Epilobium ciliatum Raf. | Germany | 1927
Nepeta grandiflora M. Bieb. | Germany | ca. 1900
Agrostis scabra Willd. | Central Europe | 1909
Agrostis scabra Willd. | Germany | 1960
Work at BioSS
• The aims of our research on this at BioSS –
• to explore how sensitive the results of inferences are to the
assumptions that we make about missing data
• to analyse the data in such a way that species with missing
data for some traits do not need to be excluded
• to relate the outputs from the analysis to invasive risk
• We work with the Biolflor-Florkart data, and focus upon missing
data for residence times; however, the methodological ideas are
widely applicable
Application to toolkit
• Application to the prediction of invasive risk
• e.g. Use traits & phylogeny to infer the number of cells
that a recently arrived species is likely to occupy after
N years of residence
• This number is uncertain, so it will be a probability
distribution rather than a single number
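As an illustration of what such a predictive distribution could look like, the sketch below pushes posterior draws through a log-normal occupancy model of the form log y = intercept + trait effect + residence effect × N + noise. Everything numerical here is an assumption: the "posterior draws" are simulated stand-ins for MCMC output, the phylogenetic term is omitted, and only the cap of 2995 grid cells comes from the Data slide.

```python
import math
import random

N_GRID_CELLS = 2995  # number of Florkart grid cells quoted in the Data slide

def predictive_draws(posterior, trait_value, residence_years, seed=0):
    """One predicted occupancy per posterior draw, capped at the grid size."""
    rng = random.Random(seed)
    cells = []
    for alpha, beta, delta, sigma in posterior:
        mean_log = alpha + beta * trait_value + delta * residence_years
        cells.append(min(math.exp(rng.gauss(mean_log, sigma)), N_GRID_CELLS))
    return sorted(cells)

def quantile(sorted_values, q):
    """Simple empirical quantile of an already sorted list."""
    idx = min(int(q * len(sorted_values)), len(sorted_values) - 1)
    return sorted_values[idx]

# Simulated stand-ins for posterior draws of (alpha, beta, delta, sigma).
gen = random.Random(1)
posterior = [
    (gen.gauss(2.0, 0.2), gen.gauss(0.6, 0.2),
     gen.gauss(0.01, 0.002), abs(gen.gauss(1.0, 0.1)))
    for _ in range(2000)
]

after_50 = predictive_draws(posterior, trait_value=1, residence_years=50)
after_200 = predictive_draws(posterior, trait_value=1, residence_years=200)
for label, draws in (("N = 50", after_50), ("N = 200", after_200)):
    print(label, "median:", round(quantile(draws, 0.5), 1),
          "95% interval:", round(quantile(draws, 0.025), 1),
          "to", round(quantile(draws, 0.975), 1))
```

The output is a median with an interval for each N, which is the kind of summary a risk toolkit could report instead of a single number.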
Bayesian methods
• An alternative approach to statistical modelling and inference, in
which data are regarded as fixed and parameters are regarded
as random
• Increasingly widely used: due to improvements in computational
power it is now often possible to fit more advanced models
using Bayesian inference than using classical statistical methods
• Particularly suitable for problems that involve missing data
• Implemented using free software called WinBUGS:
extremely powerful but not particularly user-friendly…
Bayesian modelling
Basic model
$\log y_i \sim N(\alpha + \beta x_i + \gamma z_i + \delta r_i, \sigma^2)$
…just the same as a GLM
Notation, for species i:
• $y_i$ = # of grid cells occupied
• $r_i$ = residence time
• $x_i$ = other trait data
• $z_i$ = phylogenetic variables
Prior distributions
We use uninformative priors
$\alpha, \beta, \gamma, \delta \sim N(0, 1000)$
$\sigma^2 \sim \text{Gamma}(1/1000, 1/1000)$
MCMC details:
Burn-in = 5000, Sample = 2000
Thinning ratio = 1:50
Imputation
• When data on residence times are missing, then we
can assume that they are random variables
• We can use data on the other traits, phylogeny &
number of grid cells occupied to infer the distribution
of the residence time for a particular species i
e.g.
$\log r_i \sim N(\exp\{a + b x_i + c z_i + d y_i\}, s^2)$
• Use of the cut function ensures this does not bias
inferences about $\alpha$, $\beta$, $\gamma$, $\delta$ and $\sigma$
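The forward direction of the imputation step can be sketched as below: given values for the imputation coefficients, draw residence times for a species whose arrival date is unknown and summarise them by mean and 2.5%/97.5% quantiles. The coefficient values, the covariate values and the function names are all hypothetical, and the sketch does not reproduce WinBUGS's cut function; it only shows information flowing from the imputation model to the residence time, which is the one-way flow that cut is meant to enforce.

```python
import math
import random

def impute_residence(x, z, y, coef, s, n_draws=2000, seed=0):
    """Draw residence times from log r ~ N(exp{a + b*x + c*z + d*y}, s^2)."""
    a, b, c, d = coef
    mean_log_r = math.exp(a + b * x + c * z + d * y)
    rng = random.Random(seed)
    return sorted(math.exp(rng.gauss(mean_log_r, s)) for _ in range(n_draws))

def summarise(draws):
    """Mean and empirical 2.5% / 97.5% quantiles of sorted draws."""
    n = len(draws)
    return sum(draws) / n, draws[int(0.025 * n)], draws[int(0.975 * n)]

# Hypothetical coefficient values (a, b, c, d) and standard deviation s.
coef = (1.2, 0.1, 0.0, 0.0005)
s = 0.5

# Two hypothetical species that differ only in the number of cells occupied.
few_cells = impute_residence(x=1, z=0.0, y=100, coef=coef, s=s)
many_cells = impute_residence(x=1, z=0.0, y=1000, coef=coef, s=s)

for label, draws in (("100 cells", few_cells), ("1000 cells", many_cells)):
    mean, lo, hi = summarise(draws)
    print(label, "imputed residence:", round(mean), "years",
          "(", round(lo), ",", round(hi), ")")
```

With a positive coefficient on occupancy, the more widespread species is imputed a longer residence time, which is the intuition behind using occupancy as a regressor.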
Results: Ploidy
Polyploid vs diploid
Estimate (SE) for trait effect $\beta$
Model | Classical | Bayesian
Trait | .580 (.226) | .587 (.225)
Trait + Phylogeny | .636 (.220) | .656 (.211)
Trait + Phylogeny + Residence | .790 (.347) | .630 (.216) [cut], .761 (.199) [full]
Pink result based on 124 species
Other results based on 345 species
42 species excluded
P(parameter > 0)
Main model | Imputation model
$\beta$: > .99 | b: .14
$\gamma$: 1, .94 | c: 1, .84
$\delta$: > .99 | d: .99
Results: Ploidy (figure: imputed values)
Results: Ploidy (figure: predictions)
Results: Duration of flowering
Estimate (SE) for trait effect $\beta$
Model | Classical | Bayesian
Trait | .362 (.084) | .358 (.080)
Trait + Phylogeny | .329 (.083) | .326 (.081)
Trait + Phylogeny + Residence | .298 (.113) | .229 (.082) [cut], .204 (.076) [full]
Pink result based on 135 species
Other results based on 379 species
8 species excluded
P(parameter > 0)
Main model | Imputation model
$\beta$: > .99 | b: .97
$\gamma$: .99 | c: > .99
$\delta$: > .99 | d: > .99
Results: End of flowering
Estimate (SE) for trait effect $\beta$
Model | Classical | Bayesian
Trait | .207 (.060) | .206 (.058)
Trait + Phylogeny | .167 (.060) | .166 (.059)
Trait + Phylogeny + Residence | .275 (.106) | .096 (.061) [cut], .227 (.060) [full]
Pink result based on 135 species
Other results based on 379 species
8 species excluded
P(parameter > 0)
Main model | Imputation model
$\beta$: .96 | b: .17
$\gamma$: .98 | c: > .99
$\delta$: > .99 | d: > .99
Results: Pollen vector
Estimate (SE) for trait effect
Model | Wind vs Self, Classical | Wind vs Self, Bayesian | Insect vs Self, Classical | Insect vs Self, Bayesian
Trait | -1.16 (.38) | -1.16 (.37) | -0.71 (.32) | -0.72 (.31)
Trait + Phylogeny | -0.79 (.38) | -0.81 (.36) | -0.72 (.32) | -0.72 (.31)
Trait + Phylogeny + Residence | -1.22 (.51) | -0.57 (.37) | -0.74 (.43) | -0.39 (.33)
Additional Bayesian estimates, Trait + Phylogeny + Residence: -0.51 (.32), -0.56 (.27)
Pink result: 108 species
Other results: 329 species
58 species excluded
P(parameter > 0)
Main model | Imputation model
$\beta$: .06, .13 | b: .06, .82
$\gamma$: < .01, < .01 | c: < .01, .08
$\delta$: > .99 | d: .99
Results: Shoot metamorphoses
Estimate (SE) for trait effect; T = Trait, P = Phylogeny, R = Residence
Contrast | Model | Classical | Bayesian
a vs no | T | 0.64 (.34) | 0.64 (.34)
a vs no | T+P | 0.68 (.34) | 0.70 (.34)
a vs no | T+P+R | 0.61 (.62) | 0.82 (.35)
rh vs no | T | -1.06 (.35) | -1.05 (.34)
rh vs no | T+P | -0.79 (.37) | -0.82 (.37)
rh vs no | T+P+R | 0.26 (.63) | -0.70 (.35)
p vs no | T | 0.09 (.34) | 0.10 (.32)
p vs no | T+P | 0.05 (.34) | 0.08 (.37)
p vs no | T+P+R | -0.02 (.65) | 0.23 (.33)
z vs no | T | -1.12 (.65) | -1.04 (.65)
z vs no | T+P | -0.24 (.75) | -0.26 (.75)
z vs no | T+P+R | ? | -0.06 (.69)
Significance of trait effect in Bayesian model: posterior probability that $\beta$ > 0
Trait | Contrast | Trait only | Trait + phylogeny | CUT: T + P + residence
Ploidy | polyploid vs diploid | > .99 | > .99 | > .99
Length of flowering season | | > .99 | > .99 | > .99
End of flowering season | | > .99 | > .99 | .94
Shoot | a vs none | .97 | .98 | .99
Shoot | rh vs none | < .01 | .01 | .02
Shoot | p vs none | .62 | .59 | .75
Shoot | z vs none | .05 | .36 | .47
Pollen vector | wind vs self | < .01 | .01 | .06
Pollen vector | insect vs self | .01 | .01 | .12
(Note: posterior probability that $\delta$ > 0 is always > 0.99)
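A posterior probability of this kind is simply the share of MCMC draws of the parameter that lie above zero. The sketch below shows the calculation, using simulated draws because the real chains are not available here. As a rough cross-check it treats the End of flowering [cut] estimate, .096 (.061), as the mean and standard deviation of a normal posterior; that normal approximation is an assumption of this sketch, not something the slides state.

```python
import random
from statistics import NormalDist

def prob_positive(draws):
    """Fraction of posterior draws that are greater than zero."""
    return sum(1 for value in draws if value > 0) / len(draws)

# Stand-in for 2000 retained MCMC draws of the trait effect
# (normal approximation with the reported estimate and SE).
rng = random.Random(0)
draws = [rng.gauss(0.096, 0.061) for _ in range(2000)]
p_mc = prob_positive(draws)

# Same quantity computed analytically under the normal approximation.
p_exact = 1.0 - NormalDist(mu=0.096, sigma=0.061).cdf(0.0)

print("Monte Carlo:", round(p_mc, 3), "analytic:", round(p_exact, 3))
```

Under this approximation both numbers come out close to the .94 shown in the table for End of flowering season.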
Further work 1:
Data Not Missing at Random
• Our model assumes that the data on residence times are missing
at random, as does the approach of excluding missing data
• We can also consider possible mechanisms by which the
missing data might be related to the variables of interest
Let $o_i$ = 1 if residence time observed for species i, 0 otherwise
• We could assume that
$o_i \sim \text{Binomial}(1, \text{logit}^{-1}\{A + B x_i + C z_i + D y_i + E r_i\})$
• The parameter E cannot be estimated, but we can assess
sensitivity to the value of it; we assume here that E is negative
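To see what a negative E implies, the sketch below evaluates the inverse-logit observation probability for a few residence times and a few fixed values of E. The values of A to D, the covariate values and the assumption that residence time enters on a standardised scale are all hypothetical; only the form of the model and the idea of fixing E at several negative values come from the slides.

```python
import math

def prob_observed(x, z, y, r, coef):
    """P(o_i = 1) = logit^{-1}(A + B*x + C*z + D*y + E*r)."""
    A, B, C, D, E = coef
    linear = A + B * x + C * z + D * y + E * r
    return 1.0 / (1.0 + math.exp(-linear))

# Hypothetical standardised residence times: short, average, long.
residence = [-1.0, 0.0, 1.0]

table = {}
for E in (0.0, -1.0, -2.0, -3.0):
    coef = (0.0, 0.2, 0.0, 0.3, E)   # A, B, C, D are made-up values
    table[E] = [prob_observed(x=1, z=0.0, y=0.5, r=r, coef=coef)
                for r in residence]
    print("E =", E, [round(p, 2) for p in table[E]])
```

With E = 0 the probability of observing a residence time does not depend on its value (missing at random given the other variables); the more negative E is, the more strongly long residence times are under-represented among the observed ones.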
Results: End of flowering
Model | Trait effect: estimate (SE) for $\beta$ | Mean (Q2.5%, Q97.5%) imputed residence
Trait only | .206 (.058) | -
+ Phylogeny | .166 (.059) | -
+ Residence, MAR, CUT | .096 (.061) | 114 (34, 355)
+ Residence, MAR, full | .227 (.060) | 104 (27, 351)
+ Residence, NMAR CUT, E = -1 | .094 (.062) | 145 (44, 454)
+ Residence, NMAR CUT, E = -2 | .096 (.064) | 191 (55, 619)
+ Residence, NMAR CUT, E = -3 | .090 (.058) | 315 (73, 916)
Further work 2:
Multiple traits
• Relatively low proportions of missing data for the other key traits:
can just exclude these when we look at traits individually,
but more problematic when we look at effects of multiple traits
• Most “missing data” for the other key traits arise because rare or
duplicate trait states are recorded in Biolflor
• We would like to incorporate this information directly into the
analysis, rather than attempting to impute the missing values
• We can deal with duplicate states either by assuming:
• that the parameter for species that have both states is the average of the
parameters for the two states; or
• by including a separate parameter for species that have duplicate traits
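The "average of parameters" option amounts to giving each recorded state a weight of one over the number of states recorded for the species. The sketch below is one way to code this for pollen vector with selfing as the baseline; the function names are hypothetical. The coefficient values are the Wind, Insect and Water estimates from the "Average of parameters" column of the classical analysis reported in this talk, used here only to check that the coding reproduces the tabulated combined effects.

```python
LEVELS = ("wind", "insect", "water")   # non-baseline states; "self" is baseline

def design_row(recorded_states, levels=LEVELS):
    """Design-matrix row in which each recorded state gets weight 1/k."""
    weight = 1.0 / len(recorded_states)
    return [weight if level in recorded_states else 0.0 for level in levels]

def implied_effect(recorded_states, coefficients, levels=LEVELS):
    """Effect relative to the baseline implied by the averaged coding."""
    row = design_row(recorded_states, levels)
    return sum(w * coefficients[level] for w, level in zip(row, levels))

# Estimates from the "Average of parameters" column (vs Self).
coefficients = {"wind": -0.508, "insect": -0.683, "water": -0.094}

for states in ({"wind"}, {"insect", "self"}, {"wind", "self"},
               {"wind", "insect"}):
    print(sorted(states), design_row(states),
          round(implied_effect(states, coefficients), 3))
```

The "separate parameter" option would instead add an extra indicator column for each duplicate combination, at the cost of estimating a parameter from very few species.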
Trait | # treated as missing in current analysis | # with no record at all
Ploidy | 42 | 13
Length of flowering season | 8 | 8
End of flowering season | 8 | 8
Pollen vector | 58 | 37
Shoot metamorphoses | 59 | 1
Any of the above five traits | 134 | 54
Missing data in current analysis
Species | Pollen vector | Qualifier
Acer negundo L. | Wind | Always
Adonis annua L. | Selfing | Unknown
Adonis annua L. | Insects | Unknown
Alcea rosea L. | Selfing | At failure of outcrossing
Alcea rosea L. | Insects | The rule
Artemisia dracunculus L. | Wind | The rule
Diplotaxis muralis (L.) DC. | Selfing | The rule
Diplotaxis muralis (L.) DC. | Selfing | At failure of outcrossing
Diplotaxis muralis (L.) DC. | Insects | The rule
Elodea canadensis Michx. | Water | The rule
Epilobium ciliatum Raf. | Selfing | The rule
Epilobium ciliatum Raf. | Cleistogamy | The rule
Classical analysis, model = Traits + Phylogeny
Method to deal with duplicates:
Trait | Contrast | Exclude | Average of parameters | Separate parameter
Ploidy | Polyploid vs Diploid | .636 (.220) | .592 (.226) | .641 (.225)
Ploidy | Both vs Diploid | - | .296 (.113) | .747 (.396)
Pollen vector | Wind vs Self | -.795 (.376) | -.508 (.376) | -.653 (.384)
Pollen vector | Insect vs Self | -.716 (.315) | -.683 (.310) | -.748 (.314)
Pollen vector | Water vs Self | - | -.094 (.935) | -.134 (.933)
Pollen vector | Insect+Self vs Self | - | -.342 (.155) | -.138 (.614)
Pollen vector | Wind+Self vs Self | - | -.254 (.188) | 2.18 (1.42)
Pollen vector | Wind+Insect vs Self | - | -.596 (.244) | .099 (1.99)
Further work 3:
Auxiliary residence time data
• The imputation model allows us to draw inferences
about residence times for species where the arrival
date is unknown
• The performance of the imputation model depends
upon it containing regressors that are strongly
correlated with residence time in Germany
• Possibility of using data on residence in a
neighbouring country, $n_i$, as an explanatory variable:
$\log r_i \sim N(\exp\{a + b x_i + c z_i + d y_i + e n_i\}, s^2)$
Further work 4:
Climate change
• UFZ are using the species-level model to identify key
traits for invasive success, & then a spatial approach
to estimate impact of environmental change on these
• A non-spatial approach might involve grouping cells
according to environmental characteristics, & fitting the
species-level model separately for each group of cells
• We are interested in comparing these approaches