y ik - Wageningen UR

Download Report

Transcript y ik - Wageningen UR

History of canonical correspondence analysis in ecology
correspondence analysis without chi-square and
the barycentric principle as ecological niche model
Cajo J. F. ter Braak
25
year
Biometris, Wageningen University and Research Centre
[email protected]
www.biometris.wur.nl/UK/Staff/Cajo+ter+Braak
Biometris
quantitative methods in life and earth sciences
1986-2011
History of canonical correspondence analysis in ecology

Introduction:



History of CA in ecology













From WA to CA to CCA, algorithm
Eigen equation and its origin
My CCA derivation 1-2
Duality diagram
CCA, triplets & biplots
Ordination diagram


my prehistory: absences don’t count 1,2,3,4,5
my history 1-3
Theory of CCA


Theory, WA method
From WA to CA, algorithm
Transition formulas, but why optimal?
History of CCA in ecology


Papers 1986-88 & citation
Example data, eigen equation & diagram
Description, Data, & example
CCA in R
CCA, RC model & ideal point discriminant analysis
Later landmarks
Summary of CA-related methods in ecology
Open questions
Canonical Correspondence Analysis (CCA) 25 years
1986
from the abstract:




New multivariate technique to relate community composition to
known variation in the environment
Extends Correspondence Analysis (CA)
Two steps (1. CA, 2. interpret axes) →one step (CCA)
Ordination diagram: the pattern of community variation that can
be explained best by known environment
CCA = (M)CA with external linear restrictions on the row points
1987
duality diagram of CCA (ACC)
barycentric principle
Citations, Canoco and usage
100 150 200 250 300
50
plant ecology
vegetation
animal ecology
0
citations
3000+ citations to ter Braak (86,87,95) on CCA
1996
2000
2004
2008
other
mycology
soil ecology
freshwater ecology
marine ecology
DATA:
CtB, Ecology 1986
Spider abundance Yn×m and environment Zn×p

n = 28 sites (pitfalls), m = 12 spider species ,
p = 6 environmental variables
sites
species
env. var.
YT=
ZT=
row/column order from CCA
discretized data
CtB, Ecology 1986
Canonical Correspondence Analysis (CCA)?
Eigenvector method to relate two data matrices Y and Z
 Y with non-negative values {yik }nm (e.g. abundances of
of m species in n sites)
Y~Z
 Z quantitative and/or nominal predictor matrix {zij }np
(on p environmental variables)
Eigen equation: (ZTY (Dc-1) YTZ – λZTDrZ)b =0 with
Dc = diag(y+1 ,..., y+m) , Dr = diag(y1+ ,..., yn+)
Looks like: canonical correlation analysis, but Dc/r ..?...
Special case: Z = In×n or indicator matrix 
correspondence analysis
eigen eq in appendix
CtB, Ecology 1986
CCA ordination diagram (factorial plane)
uk species
xi* sites
c* env var
Projection of
species points on
BARE SAND
arrow shows
approx. ranking of
the species
centroids along this
variable
joint plot { x*, u } of Y as in CA
biplot {u, c*} of Dc-1YT Z ={weighted average of species wrt
environment variables}
weighted least squares biplot
History of CA in ecology builds on (1)
Theory:
“a plant species does not grow when it is either too wet or too dry”
Liebig’s law: a species requires a minimum amount of a
resource (e.g. N): agriculture
Shelford’s law of tolerance (1919): but also does not
tolerate more than a certain maximum yik
Ecological niche: region where species
actually grows
Niches vary among species
preference model
→resource
Independently: Roux&Roux, RSA,1967
History of CA in ecology builds on (2)
Method of weighted averaging (WA) to obtain
 preference or indicator value of a species wrt to a
physical gradient, e.g. moisture


WA of moisture values at sites where species occurs
estimate of moisture value at a site

WA of indicator values of species that occur there
Gause (1930), Ellenberg (1948), Whittaker (1948)
Idea: iterate to replace a manifest gradient by the best latent one
→ Reciprocal averaging (Hill, 1973,74) == CA
“le principe barycentric” both ways
From weighted averaging (WA) to CA (1)
yik
i  sites
k  species
→moisture
• WA:
uk =
→moisture
yik ≥ 0
uk = i yik moisturei /i yik
centroid , weighted average
wrt moisture, “optimum”
From weighted averaging (WA) to CA (1)
yik
i  sites
k  species
→moisture
• WA:
→moisture
yik ≥ 0
uk = i yik moisturei /i yik
For presence/absence data yik is 1 or 0.
Example: species k found at sites with moisture values 20, 30 and 40
and nowhere else, then
uk = (1*20+1*30+1*40)/(1+1+1)=
uk =
90/3 = 30
centroid , weighted average
wrt moisture, “optimum”
Reverse (calibration): weighted averaging of species optima
From training data or literature, the ‘optima’ {uk} of 200 species wrt a
moisture scale [0-100] are known.
Then an estimate of the moisture value at a site is obtained from the
species composition at the site by the weighed average:
xi = k yikuk/ k yik
-site score is (prop. to) WA of species scores
where yik is abundances (amounts,yik ≥ 0 ) of these species.
Example:
Let site i contain the species with optima 75, 80, 85
Then it moisture value is estimated as
xi = (1*75+1*80+1*85)/(1+1+1)=80 (the average of the optima of species present)
If the abundances (amounts) of these species are 4, 1,1 then
xi = (4*75+1*80+1*85)/(4+1+1)= 465/6=77.5 (the weighted average of the
optima of species present)
Algorithm
CA
• Start with arbitrary site scores { xi } with zero mean
• Calculate new species scores = WA of site scores:
uk = i yikxi /i yik
• Calculate new site scores = WA of species scores:
xi = k yikuk / k yik
• Remove arbitrariness in scaling by standardizing the
site scores
• Stop on convergence, i.e. when site scores stay the
same after cycle
WA = Weighted Averaging; RA = Reciprocal Averaging
(Hill 1973)
→Transition formulae
(Hill 1973)
Correspondence analysis == Two way Weighted Averaging
1- uk = i yikxi / i yik
 xi = k yikuk / k yik
-species score is (prop. to) WA of site
scores
-site score is (prop. to) WA of species
scores
 eigenvalue (between 0 and 1)
 = scaling of the axis
For comparison
Principal components (PCA)
Reciprocal linear regression
bk = i yik xi /i xi2
xi = k yikbk /k bk2
But what does it
optimize?
==Two way weighted summation
1- bk = i yik xi
 xi = k yikbk

eigenvalue >0 (prop. to variance
explained)
But what does it optimize?




Seriation: recovery of a Petrie matrix, consecutive 1’s
Relation with canonical correlation:
max. correlation of row and column quantification
Finds block structures in two-way data → cluster analysis
(TWINSPAN), spectral clustering
Also: bad features


sensitivity to rare species
arch effect (Guttman effect)
CA
Hill 1974, Heiser 1981
truth
DECORANA, 1979
 Hill & Gauch 1980
1979-1984
How was CCA derived? My prehistory
Benzécri’s 1973 L’analyse des donneés I and II with Hans
 MSc Newcastle (UK, ’79-80), learned ML eqs from RL Plackett
 contacts with:
Colin Prentice (NMDS) and Mark Hill (CA/RA/DECORANA)
 Mark did not like my PCA biplot and diversity paper (Ecology ’83).



Ecology is not linear...
Differences in niche location create diversity! See DECORANA

Second Gifi course Leiden (’81):J de Leeuw/W Heiser

From ’81, for ecologists, I studied properties of



Weighted Averaging vs ML in ecological niche models (Gaussian model)
(ter Braak Vegetatio ’86, Math. Biosc. ’86)
Reciprocal Averaging (CA) vs ML (ter Braak Biometrics ’85)
When might WA be (nearly) as good as ML?
Absences don’t count in...weighted averaging
Greig-Smith 1983

“telephone poles have an optimum of pH of 5.5”
WA/CA uses
h(x|y>0) = probability density function of x given species presence
response function in logistic regression uses
f(y>0|x) = occurrence probability given x
g(x)
= (prior) pdf of x
g(x) is same for all
Bayes’theorem
species, so perhaps
h(x|y>0) ∝g(x) f(y>0|x)
distinction does not
matter in MVA...
WA for
Absences don’t count in... estimating optima
Regression and WA:
a) response functions for species A
and B:
b)
x~ Uniform or even, g(x) ~ 1
a)
b)
g(x)
c)
But for this g(x) reversal of optima
So be warned, use GLM ...
ter Braak, Vegetatio 1986
c)
WA for
Absences don’t count in... estimating site conditions
Calibration and WA:
Find a model such that the weighted average
xi = k yikuk / k yik is as efficient as maximum
likelihood (ML):
species packing model
μik= ckexp{-½ ( xi-uk )2 /tk };
yik~Poi(μik)
-uk uniform over large interval A
-ck constant or independent of uk
-tk constant
ter Braak, Math. Biosc. 1986
Absences don’t count: CA vs Gaussian ordination
 Transition formulas approximate ML equations of equiwidth Gaussian response model with latent predictor x
μik= ckexp{-½ ( xi-uk )2 /tk }; yik~Poi(μik)
-uk uniform over large interval A
-ck constant or independent of uk
-xi uniform over large interval B in A
-tk constant
ter Braak Biometrics 1985
Unimodal model
1984-5
How was CCA derived? My history (1)
Preparing with Colin Prentice in Uppsala (1984)
a Theory of Gradient Analysis, search for
“something like canonical correlation for niche models”

Idea:
 linear constraints on scores in Gaussian ordination and
approximate the ML equations, as I did for CA (Oct 1984)
1985:
 linear combination of Z that best separates species niches
Niche: region in x-space which species occurs (yik >0)
1985 & Canoco
How was CCA derived? My history (2)


Quick first draft → report (1985) sent around.
Jan de Leeuw commented quickly:


CCA is a relatively simple special case of HOMALS (or, if you like
CANALS). [YES!]
this suggests directly which geometry Gifi would use. Nothing bilinear,
nothing biplot, just centres of gravity [YES, for nominal predictors]
Extending DECORANA (Hill, 1979) to include CCA →
CANOCO v1.0 ’85 and later PCA & redundancy analysis
 Help from Onno van Tongeren and Petr Smilauer for
CANOCO v2.0 1985; v3.0 1990 (permutation testing), v4.0 1998
(1st Windows version), v4.5 2002 (2nd Windows version)
 Ambassadors: Colin Prentice, John Birks, Paul van de Brink

1986-France
How was CCA derived? My history (3)



Partial CCA at 1st IFCS in Aachen (Germany):
Y. Escoufier stood up and ..... disliked CCA and ... invited me to
Montpellier to work with JD Lebreton, D Chessel and P Sabatier,
who independently invented CCA just a few months after I did.
I tried to bridge the gap:



lectured in English with only French references
learned duality diagrams
Later found : RH Green, Ecology 1971,1974:
multi-group discriminant analysis a precursor of CCA (lacking environmental arrows)
Interest lost in same time that CA got popular via Mark Hill (1973-81)!
Lets celebrate now 40 years of CCA!
From weighted averaging (WA) to CA (2)
yik
i  sites
k  species
yik ≥ 0
• WA:
uk = i yik moisturei /i yik
How well does moisture explain the data?
δ = dispersion of species scores
=B/T
δ = k y+k uk 2 / y++ wrt standardized env. var.
uk =
centroid , weighted average
wrt env. var., “optimum”
From weighted averaging (WA) to CA (3)
yik
i  sites
k  species
yik ≥ 0
• WA:
uk = i yik moisturei /i yik
• CA:
uk = i yik xi /i yik
{xi} = Site scores that best separate the species curves
as measured by the dispersion of the {uk}
uk = Species score,
centroid, weighted average
“optimum”
Arch effect: folded first axis has also high δ2
From weighted averaging (WA) to CCA (4)
yik
i  sites
k  species
yik ≥ 0
• WA:
uk = i yik moisturei /i yik
• CCA: xi = i cj zij
uk = i yik xi /i yik
{xi} = Site scores that are the linear combination of
environmental variables {zj} that best separate the
species curves as measured by the dispersion of the {uk}
In multigroup disciminant: max B/T
uk = Species score, centroid, weighted average
“optimum”
Algorithm
CCA
• Start with arbitrary site scores with zero mean
• Calculate new species scores = WA of site scores
• Calculate new site scores = WA of species scores
and constrain the site scores by a weighted* multiple regression
on the env. variables; take fitted values as new scores
• Remove arbitrariness in scaling by standardizing the
site score
• Stop on convergence, i.e. when site scores stay the
same after cycle
WA = Weighted averaging;
(C)CA = (Canonical) Correspondence Analysis
*Site
weights = { yi+ } but why?
CtB, Ecology 1986
Canonical Correspondence Analysis (CCA)?
Eigenvector method to relate two data matrices Y and Z
 Y with non-negative values {yik }nm (e.g. abundances of
of m species in n sites)
Y~Z
 Z quantitative and/or nominal predictor matrix {zij }np
(on p environmental variables)
Eigen equation: (ZTY (Dc-1) YTZ – λZTDrZ)b =0 with
Dc = diag(y+1 ,..., y+m) , Dr = diag(y1+ ,..., yn+)
Looks like: canonical correlation analysis, but Dc/r ..?...
Special case: Z = In×n or indicator matrix 
correspondence analysis
eigen eq in appendix
Origin of CCA - Why is this equation appealing?
 Transition formulas approximate ML equations of equiwidth Gaussian response model with latent predictor
x = Zb (b unknown)
μik= ckexp{-½ ( xi-uk )2}; yik~Poi(μik)
-uk uniform over large interval A
-ck constant or independent of uk
-xi uniform over large interval B in A
As in CA, but now with linear restrictions
ter Braak 1986,1987
CtB, Ecology 1986
CCA derivation (1): ML eqs of equi-width
Gaussian response model with latent predictor
xi = j zij bj
{bj } to be estimated
uk = i yikxi /y+k – [i (xi - uk )μik /y+k ]
(A.1)
i zij [k yik (xi - uk )] = i [k (xi - uk ) μik ] zij
(A.2)
Under the conditions
(CtB,Biometris
*
k (xi - uk ) μik ]≈0 and i (xi - uk )μik ≈-λ uk y+k 1985)
we obtain
λ uk = i yikxi /y+k
(λ = 1- λ*)
i zij [k yik (xi - uk )]=0
CtB, Ecology 1986
CCA derivation (2) : Transition formulas of CCA
From:
 λ uk = i yikxi /y+k
xi = j zij bj
 xi*= k yikuk / yk+
  b = (ZTRZ )-1ZTRx*
λ uk = i yikxi /y+k
 x = Zb
i zij [k yik (xi - uk )]=0
In matrix notation:
b = (ZTDrZ )-1ZTDrx*
 u = Dc-1YT x
= (ZTDrZ )-1ZTYu
x* = Dr-1Yu
= (ZTDrZ )-1ZTY Dc-1YT x-1
b = (ZTDrZ )-1ZTDrx*
=(ZTDrZ )-1ZTY Dc-1YT Zb-1
x = Zb
(ZTY (Dc-1) YTZ – λZTDrZ)b =0
Duality diagrams and transition formulas
Duality diagrams and transition formulas
CCA, PCA-triplets and biplots
fitted contingency ratios
fitted table of WA’s
low rank regression coefs
CtB, Ecology 1986
Ordination diagram (factorial plane)

Two sets of sites (row) scores






x derived from Z (environmental data)
x* derived from Y (species data)[chosen to be closer to Y]
Plot x*, u and interset correlations, c* = cor (Z,x* )
CA type of joint plot { x*, u } of Y
(instead of b)
WLS-biplot {u, c*} of Dc-1YT Z ={weighted average of
species wrt environment variables}
For nominal predictors: class point at centroid of x*
CA type plot, but no example given except in Canoco
manual
CtB, Vegetio,1987
Dune meadow data
n = 20 sites (meadows),
m = 30 plant species ,
p = 8 environmental variables
Predictors mix of


quantitative (3)

nominal (1 with 4 classes BF,SF,NM,HF)
→ class centroids (squares)
pCCA uses all power of regression

YT=
ZT=
factors, interactions between pred.
row/column order from CCA
discretized data
+1.0
CCA Dune meadow biplot of species’
u = WA(x)
centroids (WA) w.r.t. Manure
squares:
class centroids
spec
uk,
18......17......
Vic lat
Pla lan Ant odo
7.......11......
2.......
Leo aut
BF
Ach
mil
10......
Tri rep
6.......
Tri
pra
Bel per
Bro hor Poa pra
9.......
Rum ace Lol per
5....... 1.......
Ely repHF
Poa tri
Cir
arv
4.......
3.......
xi* sites
c* env var
λx= WA(u)
Air
Barycentric
Salpra
rep
20......
19......
Emp
nig
Hyp rad
principle
everywhere:
NM
projection
14......
Pot pal
points of
Cal cus
Bra rut
15...... species are
Ran
fla
Jun art
centroids of
Ele pal
the site
Moisture
Sag pro
A1 hor projections on
Agr
8....... sto
Use Jun buf
the same
Alo gen
12......
environmental
16......
SF
variable
Che
alb
13......
-1.0
Manure
-1.0
+1.0
CCA in R packages anacor, vegan, ade4
1
#e.g.:
library(anacor) # Jan de Leeuw & Patrick Mair
# CCA
res = anacor(dune, row.covariates = dune.env)
plot(res)
0
Chealb Cirarv
JunbufElyrep1
Alogen
3
13 9Poatri
2
12 4 Belper
Brohor
Lolper
Poapra
Rumace
Agrsto 8
75Tripra
Trirep
10
6 Achmil
Sagpro
16
Junart
Brarut
11 Plalan
Ranfla
15
14
Elepal
Leoaut
18 Viclat
Potpal
Calcus
20
-1
Antodo
Salrep
17
-2
19
Hyprad
-3
Airpra
Empnig
-4
Dimension 2
library(vegan)
cca(Y~ Z)
# also with a formula interface
# e.g factors A, B and C
cca(Y~ A*B + Condition(C))
Joint plot
-4
-3
-2
-1
Dimension 1
0
1
(C)CA & RC-model, ideal point discr. anal.
bilinear model
Goodman’s RC model Eyik = rickexp(xi T uk ) is
equivalent to
distance model
 Eyik = rickexp(d2(xi , uk )) with
d2(xi , uk )= (xi - uk )T(xi - uk )
which Ihm’ model B, and
with x=Zb identical to ideal point discriminant anal.

(C)CA is approx. to this model (ter Braak 1988, Takane ...)
Goodman showed this for small λ, ter Braak for large λ
See de Rooij, 2007 JCGS for plot
scalings
and forward selection ...
Later landmarks

Cointertia analysis (Dolédec & Chessel , 1994)


Linear restrictions, norm on b coefficients
CCA-PLS (ter Braak & Verdonschot, 1995) Key: 1.Pre-transform:
2. apply PLS 3. post-transform!




Variance partitioning (Borcard, Legendre, Drapeau, 1992)
RLQ (Dolédec et al. 1996) ; doubly constrained CA (Lavourel...)
All (?) CA methods obtainable from standard methods


by inflating the data matrices (super indicator matrices, in
ecology: unit = the individual instead of the site
by pre and post transformation
→Regularized generalized canonical
correlation, Tenenhaus2
Summary: CA-related methods in ecology
Method
Abbr
Correspondence analysis
CA
Canonical CA
CCA
CCA Partial Least Squares
CCA-PLS
Weighted averaging
WA
WA-PLS
WA-PLS
Community Environmen
data
tal variables
Community Many env.
vars
Env.var.
Community
data
Env.var(s)
Community
Co-correspondence
analysis
CO-CA
Community Community
Coinertia anal., RLQ,...
Response
Predictors
vars
Community -
Open questions
(C)CA is a chameleon: sometimes shows up as


linear method: reconstitution formula & biplot
unimodal method: relation with CVA & r-c distance plot
In the RC model and logratio analysis: clearly both
With arch effect: 1-d reconstitution bad....
Can Greenacre 2009 power transformations shed light?
Sparse CA?
Sparse CCA?
Apply principal response curves (PRC) in CCA context
PRC: good for display of interactions
Unfolding criterion
Max δ* = i,k yik (xi - uk )2 / y++
subject to i yi+ xi =0, i (yi+ /y++) xi 2=1 and x=Zb
distance (xi - uk )2 has meaning (unfolding)
 the data are a kind of weights and are not approximated
in any formal way.
 squared
As in multigroup disciminant: max
W/B
See also Nishisato’s (1980) quantification of row and
columns in one way ANOVA: his D2 = 1/δ*=B/W
and ideal point model
(cf. Heiser 1987, Takane et al 1991)
Best separation of species niches, max dispersion δ
CA:
Max ! δ = k y+k uk2 / y++ with
uk = i yik xi /i yik, ,weighted average of sites scores {xi }
which are Dr-standardized
i yi+ xi =0 and i (yi +/y++) xi 2=1 .
CCA: extra contraint x=Zb
δ = dispersion = weighted variance = inertia
In multigroup disciminant: B/W instead of δ=B/T
ter Braak 1987
Gaussian regression → ordination
yki
abundance yik
i  sites
k  species
↑
yik ≥ 0
→moisture
• Gaussian regression:
log(Eyik) = ck - (moisture - uk)2 / 2tk2
• Gaussian ordination:
log(Eyik ) = ck - (xi - uk)2 / 2tk2
{xi} = Site scores that best explain the species
data by Gaussian curves
{uk} = Species scores or optimum
Gaussian ordination?
• Axes cannot be extracted one after the
other; a joint fit is needed
- difficult to fit more than 1 axis, but see R{VGAM}
• Axes are not nested:
- solution for axis 1 changes when fitting 2 axes
jointly
- this problem is common outside the eigenvalue
techniques such as PCA - CCA
• Ecologists invented a simpler approximate
method: reciprocal averaging alias
correspondence analysis (CA)