Transcript Document
NUMERICAL ANALYSIS OF
BIOLOGICAL AND
ENVIRONMENTAL DATA
Lecture 6.
Indirect Gradient
Analysis
INDIRECT GRADIENT ANALYSIS
Introduction
Principal components analysis (PCA)
Factor analysis
Correspondence analysis (CA)
Detrended correspondence analysis (DCA)
PCA or CA/DCA?
Metric scaling or principal co-ordinates analysis (PCOORD)
Non-metric scaling (NMDSCAL)
Is there an ideal method?
Conclusions
Future developments
Principal curves
Comparing different ordinations
Software
GRADIENT MODEL OF REAL WORLD
• Environment varies continuously in space and time
• Gradients are real or abstract environmental factors
• “Community” is an abstract concept
• Transects are concrete entities in the landscape
You can have continuous gradients but boundaries on transects (e.g.
geological change)
• Resource gradients are factors that are used by organisms (e.g. nitrogen)
• Direct gradients influence organisms (e.g. pH) but do not use them
• Indirect gradients (e.g. altitude) influence organisms but many
environmental variables may be important (e.g. temperature, wind,
snow-lie)
• Complex gradients are correlated direct and/or resource gradients
In field data sets you can have mainly complex gradients
Complex gradients are dependent on the data set (landscape concept)
• Species have curvilinear relations to gradients
Advantages of INDIRECT GRADIENT ANALYSIS
1.
Species compositions easier to determine than full range of environmental
conditions. Many possible environmental variables. Which are important?
2.
Overall composition is often a good reflection of overall environment.
3.
Overall composition often of greater concern that individual species.
Global, holistic picture, in contrast to regression which gives a local,
individualist reductionist view.
Constrained canonical ordinations stand between INDIRECT GRADIENT
ANALYSIS and REGRESSION. Many species, many environmental variables.
Aims of INDIRECT GRADIENT ANALYSIS
1.
Summarise multivariate data in a convenient low-dimensional geometric
way. Dimension–reduction technique.
2.
Uncover the fundamental underlying structure of data. Assume that there
is underlying LATENT structure. Occurrences of all species are determined
by a few unknown environmental variables, LATENT VARIABLES, according to
a simple response model. In ordination trying to recover and identify that
underlying structure.
UNDERLYING RESPONSE MODELS
A straight line displays the linear
relation between the abundance
value (y) of a species and an
environmental variable (x), fitted
to artificial data (●). (a = intercept;
b = slope or regression coefficient).
A Gaussian curve displays a
unimodal relation between
the abundance value (y) of
a species and an
environmental variable (x).
(u = optimum or mode; t =
tolerance;
c=
maximum = exp(a)).
RESPONSES
ENVIRONMENT
SPACE
ORDINATION
SPECIES
SPACE
Linear
response
Unimodal
response
Response curves for two species A and B against a latent variable x (a, c) and the
expected abundances of the species plotted against each other (b, d), for the
linear model (a, b) and the unimodal model (c, d). The numbers refer to sites
with a particular value for x. The ordination problem is to make inferences about
the relations in Figures a and c from species data plotted in Figures b and d.
Indirect gradient analysis can thus be viewed as being like regression
analysis but with the MAJOR difference that in ordination the
explanatory variables are not known environmental variables but
theoretical ‘latent’ variables.
Constructed so that they ‘best’ explain the species data.
As in regression, each species is a response variable but in contrast to
regression, consider all response variables simultaneously.
____________________
PRINCIPAL COMPONENTS ANALYSIS
PCA
CORRESPONDENCE ANALYSIS
CA
& relative DCA
PCA – linear response model
CA – unimodal response model
PRINCIPAL CO-ORDINATES ANALYSIS
PCOORD
NON-METRIC MULTIDIMENSIONAL SCALING
NMDSCAL
____________________
TOWARDS A PHILOSOPHY OF USING
ORDINATION METHODS IN DATA ANALYSIS
Ockham’s (or Occam's) razor: invaluable philosophical
concept because of its strong appeal to common sense
William of Ockham, 14th century English philosopher, stated the principle as:
PLURALITAS NON EST PONENDA SINE NECESSITATE
Plurality must not be posited without necessity
Entities should not be multiplied without necessity
It is vain to do with more what can be done with less
An explanation of the facts should be no more complicated than necessary
Among competing hypotheses, favour the simplest one that is consistent with
the data
OCKHAM’S RAZOR
What has this to do with ordination?
Data consist of few variables and few objects, perhaps easiest to present
and interpret as tables or diagrams.
Typical data of many variables and many objects, purpose of ordination is
to assist in the application of Ockham’s razor.
A few dimensions are easier to understand than many dimensions
A good ordination technique will be able to determine the most important
dimensions or gradients of variation in a data set and help us to ignore
‘noise’ or chance variation.
i.e. ordination helps us avoid being vain by doing with more what can be
done with less.
Want to show as much as possible with only a few axes. A few important
axes are likely to be more easily explicable than many less important axes.
PRINCIPAL COMPONENTS ANALYSIS
Extension of fitting straight lines and planes by least-squares regression. (Pearson 1901)
Fit moisture to all the species in data by a series of
least-squares regressions of Ey = b0 + b1x + , we
obtain for each regression the RESIDUAL SUM OF
SQUARES, the sum of squared vertical distances
between observed and fitted line.
Total of the separate residual sum of squares for all species, total residual SS, is a
measure of how badly moisture explains the data of all species.
What is the best possible fit that is theoretically possible with straight-line regression?
y = b0 + b1x +
or, if we have centred the data (subtracted the mean)?
y = b1x +
Defines an ORDINATION problem – construct the single hypothetical variable (latent
variable) that gives the best fit to the species data according to our linear equation.
PCA is the ordination technique that constructs the theoretical variable that minimises the
total residual SS after fitting straight lines or planes to the species data.
y = b,x
Fitted y = b1x1 where y = abundance, b1 = slope for species
with respect to the latent variable, and x is site score on
latent variable, where latent variable is first PCA axis.
How good a fit is the latent variable to the data?
Total SS of regressions is so-called EIGENVALUE of first PCA
axis = goodness of fit. Express as % of total variance in
species data.
Straight lines for several
Dune Meadow species,
fitted by PCA to species
abundances. Also shown
are the abundances of
Lolium perenne and their
deviations from the fitted
straight line. The
horizontal axis is the first
principal component.
Fitting straight lines by
least-squares regression of
the abundances of the
species on the site scores
of the first PCA axis gives
the same results. The
slope equals the species
score (b) of the first axis.
The site scores are shown
by small vertical lines
below the horizontal axis.
Three dimensional view of a plane
fitted by least-squares regression
of responses (●) on two
explanatory variables PCA axis 1
and PCA axis 2. The residuals, i.e.
the vertical distances between
the responses and the fitted plane
are shown. Least squares
regression determines the plane
by minimization of the sum of
these squared distances.
PCA-ordination diagram of
the Dune Meadow Data in
covariance biplot scaling
with species represented by
arrows. The b scale applies
to species, the x scale to
sites. Species not
represented in the diagram
lie close to the origin (0,0).
Total sum-of-squares (variance) = 1598 = SUM OF EIGENVALUES
Axis 1 471 = 29 %
Axis 2 344 = 22 %
} 51%
Each axis – vector of species slopes or scores (b) EIGENVECTORS
[Regression Ey = b0 + b1x1
= b1x1 (if centred data)
PCA
y = b1x1 + b2x2
Species score
Eigenvector
]
Site score
REGRESSION AND PCA
Linear Regression
PCA
Ey
Fitted y, response variable
Species score
b1
Slope
Species loading (eigenvector)
x1
Explanatory variable
Site score, linear combination of
variables
Regression Sum of Squares
Eigenvalues, function to be
maximised
Total SS
Sum of eigenvalues (variance) ()
R2 = 1 - Residual SS / Total SS
Fraction of variance explained
(1 + 2) / Sum
Residual SS
Function to be minimised
PCA Biplots
Correlation (=covariance) biplot scaling
Species scores sum of squares = λ
Site scores scaled to unit sum of squares
Distance biplot scaling
Site scores sum of squares = λ
Species scores scaled to unit sum of squares
Emphasis on species
Emphasis on sites
CANOCO
R
PCA BIPLOTS
• Axes must have identical scales
• Species loadings and site scores in the same plot: graphical order 2
is an approximation of the data
• Origin: species averages. Points near the origin are average or are
poorly represented
• Species increase in the direction of the arrow, and decrease in the
opposite direction
• The longer the arrow, the stronger the increase
• Angles between vector arrows approximate their correlations
• Approximation: project site point onto species vector
• Distance from origin reflects magnitude of change
Biplot interpretation for Agrostis stolonifera.
In correlation (covariance) biplot, site scores scaled to unit sum of
squares and sum of squared species scores equals eigenvalue
r = Cos = Correlation
AN ALTERNATIVE GEOMETRICAL
REPRESENTATION OF PCA
1. Move the axis origin to the
centroid of the species space
(species averages).
2. Rotate the axes so that the first
axis becomes
(a) as close to all observations
as possible, which means that it
(b) explains as much of the
variance as possible.
•
First rotated axes show the
configuration as well as
possible.
(PCA = reduced major axis
regression in Model II regression)
J. Oksanen (2002)
EXPLAINING THE VARIATION IN PCA
• Total variation: sum of squared
distances from the origin
• Explained variation: sum of
squared projections onto principal
components =
• Residual variation: sum of
squared orthogonal distances from
the principal components
• Only a rotation: all axes (min (M,
N)) explain everything
J. Oksanen (2002)
DATA TRANSFORMATIONS
1. Centred species data PCA variance–covariance matrix.
Species implicitly weighted by the variance of their values.
2. Standardised PCA PCA correlation matrix. Centre species
and divide by standard deviation (zero mean, unit variance).
All species receive equal weight, including rare species. Use
when data are in different units, e.g. pH, LOI, Ca.
3. Square root transformation of percentage data. Chord
distance.
4. Log (y + 1) transformation for abundance data.
5. Log transformation and centre by species and samples = loglinear contrast PCA for closed % data (few variables, e.g. blood
groups).
STANDARDISED OR NON-STANDARDISED PCA
Standardising species to unit variance (using the correlation coefficient)
makes all species equally important, instead of concentrating on the
abundant species with largest variances (using covariance matrix).
Covariance matrix
Correlation matrix
J. Oksanen (2002)
Adding ‘Unknown’ Samples
Passive samples
m
x1 y ik bk
k 1
A PCA biplot showing the scores of the
first and second components of the
modern pollen spectra, the vectors of
the pollen taxa, and the means and
standard deviations of the five pollen
zones from the Lateral Pond fossil site
(zone 1 is the oldest); o represents
the projection of the origin.
EQUILIBRIUM CONTRIBUTION OF A VARIABLE
If a variable is equally associated
with each of the principal
component axes, all its loadings
would be equal and be s 1/p
where s = standard deviation and
p is the number of variables.
For a two-dimensional PCA
representation the equilibrium
circle has radius s d/p where s =
standard deviation (all = 1 with
standardised data), d = number of
axes = 2, and p = number of
variables
HOW MANY ORDINATION AXES TO
RETAIN FOR INTERPRETATION?
Jackson, D.A. (1993) Ecology 74, 2204–2214
PCA – applicable to CA, PCOORD, ?RDA, CCA, ??DCA
Real data – 3 data-sets
1) Correlations between variable 0.6 – 0.9
2) 0.0 – 0.7
3) 0.3 – 0.4
Simulated data
1000 observations; 4, 12, 32
r = 0, 0.3 or 0.8 for off-diagonal
i.e. no dimensions, weak one-dimension, strong one-dimension
Assessment of eigenvalues:
> 1.0
Bootstrap to derive confidence intervals encompassing the = 1.0
criterion.
Scree plot.
Broken-stick.
Total variance (=) divided randomly amongst the axes,
eigenvalues follow a broken stick distribution.
p
bk
i k
1
i
p = number of variables (= no)
bk = size of eigenvalue
e.g. 6 eigenvalues
% variance – 40.8, 24.2, 15.8, 10.7, 6.1, 2.8
Proportion of variance
75%
95%
Statistical tests
Bartlett’s test of sphericity – is each eigenvalue different from remaining ones?
Bartlett’s equality test of 1
Lawley’s test of 1
Bootstrap eigenvalue 100 times
Mean, minima, maxima, 95% confidence intervals
If overlap between pairs of successive eigenvalues, eigenvalues indistinguishable
Applications:
1)
2)
3)
4)
5)
1. 0
Gave too many components
Gave too many components
Gave too many components
Correct
Unreliable
Bootstrap
Scree plot
Broken stick
Proportion of
variance
6) Bartlett & Lawley
Overestimated
tests
7) Bootstrap eigenvalue Correct except for poorly structured data
(& eigenvector > 0)
BROKEN STICK – simple to calculate, easy to follow
BSTICK
= observed eigenvalues
= broken-stick model expectation
R
SCALING OF ORDINATION SCORES IN PRINCIPAL COMPONENTS
ANALYSIS, as implemented by CANOCO 3.1 and later
CHOICES
1. Euclidean distance biplot
2. Correlation biplot
3. Symmetric scaling
Negative (–1, –2, –3) for covariance-based scores
Negative scalings use raw data in which variables have different variances.
Resulting species scores are covariances between species and eigenvalues in –2.
Positive values adjust species scores for their variances by standardising data by
species. Resulting species scores are correlations between species and
eigenvalues in scaling 2.
1. Euclidean distance biplot – focus on samples, distance between samples
approximates Euclidean distance in species-space.
2. Correlation biplot – focus on species. Reflects correlations between species.
3. Intermediate between 1 and 2.
No extra mathematical properties. Compromise.
But see Gabriel (2002) Biometrika – may be optimal!
STANDARDISED OR NON-STANDARDISED PCA
Standardising species to unit variance (using the correlation coefficient)
makes all species equally important, instead of concentrating on the
abundant species with largest variances (using covariance matrix).
Covariance matrix
Correlation matrix
J. Oksanen (2002)
GENERAL INTERPRETATION OF PCA PLOTS
1. Samples close together are inferred to resemble one
another in species composition.
2. There is a tacit assumption that samples with similar
species composition have similar environments.
PRINCIPAL COMPONENTS BIPLOTS & DIVERSITY
ter Braak, C.J.F. (1983) Ecology 64: 454-462
In non-centred PCA of proportional data and distance scaling ('distance biplot')
the squared length of each site vector pi is
pi
2
m
pki 2 SI
k 1
and is exactly the formula for Simpson's Index. It can be transformed into a
measure of alpha-diversity (within-site diversity) as 1/SI (= Hill's N2).
Squared Euclidean distances between sites
m
dij (pki pkj )2
2
k 1
is a measure of dissimilarity between sites and can be regarded as a measure
of beta-diversity (between-site diversity). "Beta or between-habitat diversity
refers to the degree of contrast in species composition among samples of a
set taken from a landscape" (Whittaker, 1973).
Beta-diversity is simply the dissimilarity between two samples. If there are
more than two samples, their mean dissimilarity is a measure of their betadiversity.
More efficient to use a species-centred PCA than a non-centred PCA to
represent beta-diversities. Can calculate where the true origin of the
axes is if the PCA was non-centred so as to represent alpha-diversity.
• = 1920
o = 1978
ALPHA-DIVERSITY
(WITHIN-SITE DIVERSITY)
1920
Simpson
Index
A
B
C
D
1978
E
F
a
b
c
d
e
f
0.46 0.41 0.23 0.16 0.15 0.12 0.74 0.93 0.88 0.78 0.32 0.60
In PCA biplot samples, the Simpson Index values are approximated by the
origin position (0.2772) plus the squared distance from the projection of
the origin.
D, E, and F are close to asterisk, hence have high alpha diversity (Simpson
= 0.12 - 0.16), whereas A is far away and has low diversity (Simpson 0.46).
a, b, c, d, and f are far from asterisk, and hence have high Simpson values
and low alpha diversity.
Can thus relate alpha-diversity to species composition.
BETA-DIVERSITY
(BETWEEN-SITE DIVERSITY)
A
0
B
83
0
C
68
23
0
D
61
56
36
0
E
55
40
32
27
0
F
33
38
24
25
18
0
a
119
45
21
89
85
78
0
b
138
56
30
108
104
97
2
0
c
133
52
27
103
99
92
1
0
0
d
122
46
22
92
88
82
1
1
0
0
e
75
3
14
47
34
34
31
41
38
32
0
f
104
38
13
74
70
63
2
4
4
2
25
0
A
B
C
D
E
F
a
b
c
d
e
f
Squared Euclidean
distances (x100)
Samples close together have low pairwise squared Euclidean distances, hence low
beta-diversity.
1920 sample (A-F) have mean squared distance = 0.41
1978 samples (a-f) have mean squared distance = 0.12
Indicates decrease in alpha-diversity of pools and the beta-diversity between
pools has decreased between 1920 and 1978. Interpreted as result of acidification.
FACTOR ANALYSIS
• Factor Analysis instead of PCA in many programs
• Factor Analysis is a statistical method, whereas PCA is only a
geometrical rotation
- nature has two components: common and unique
- FA tries to explain only the common component
- PCA discards later axes in the hope of discarding unique
variation, but in fact it may be mixed on all axes – kept and
rejected
• Latent variables generate the observed variables x which have
unique errors
• Exploratory FA: rotation to a simple structure, usually varimax
• Confirmatory FA: modelling procedure to test specific
hypothesis
J. Oksanen (2002)
EXPLORATORY FACTOR ANALYSIS
PCA with varimax rotation ('simple structure')
Few taxa have high loadings on each axis and
remaining loadings are low, thereby easing
interpretation.
CONFIRMATORY FACTOR ANALYSIS
Maximum likelihood solution
• The real variables are nonobservable (latent), but they
influence several observable
variables x.
• Select a set of observable
variables as indicators of the
latent variable, and use them
to build a measurement model
for the latent variable.
• The latent variables are truer
explanatory variables than
single noisy observed variables
and can be used in further
modelling.
J. Oksanen (2002)
CONFIRMATORY FACTOR ANALYSIS
Unique
component
Loadings for
factor 1
P
0.340
0.812
K
0.399
0.776
Ca
0.163
0.915
Mg
0.310
0.831
SS
2.789
% variance
69.7
Test if one factor is sufficient
Chi-square value is 4.93 with 2 degrees of freedom
P-value is 0.085
CONFIRMATORY FACTOR ANALYSIS
Malmgren & Haq (1982) Marine Micropaleontology 7: 213-236
Compared PCA, correspondence analysis, 'true' factor analysis, and PCA
with varimax rotation (‘exploratory factor analysis’) on 399 samples
containing % data of 12 coccolith taxa in Miocene cores.
General model for R-mode analysis (species x species correlation or
covariance matrix R)
R = (A-matrix) x (A-matrix)' + -matrix
where A-matrix is matrix of component or factor loadings of
species (eigenvectors) and -matrix is matrix of residuals.
Partitioning variation expressed by R into
(1)
Common part (A)(A)'
(2)
Unique part consisting of (i) variation in one variable not shared
with other variables and (ii) variation due to errors in analysis,
etc.
PCA - accounts for the maximum variance in the data (diagonal
elements in R)
FA - portrays optimally the interrelationships among variables (offdiagonal elements of R).
PCA
- variance-orientated
FA
- correlation-orientated
Unique component assumed to be small in PCA and larger in FA and may
be very large for some variables.
FA accepts that some part of the variation in each variable is unique.
FA - A-matrix and -matrix estimated separately using different
mathematical criteria. Start with squared multivariate correlation
coefficient (multivariate generalisation of r2) between a variable and
other variables. Modify these estimates of the common part iteratively
to find the 'optimal' number of factors. Maximum likelihood estimation.
-test for number of factors.
PCA - -matrix is set to zero and uses A only.
-TEST IN FACTOR ANALYSIS
-test for number of factors - how well are true correlations in data
reproduced?
1.
Determine the true correlations (Rt) between variables from
correlation matrix. Define 'true' as above some critical threshold (e.g.
p 0.05).
2.
Compare with principal component or factor loadings for variables and
define 'significant' as above some critical value. Useful rule-of-thumb
is to use loadings > 50% of highest loading (Lt).
3.
Compare Rt with Lt for axis 1, axes 1 and 2, axes 1, 2, and 3, and so
on. Calculate -value for each solution.
4.
Best model is chosen to be the one with the maximum -value (max),
where there is a balance between 'correct' and 'false'
interrelationships.
5.
Plot against the number of factors or components. Usually have a
peak and then decrease. May have a 'plateau', indicating no single
solution is superior. Select the model with the smallest number of
axes (minimal adequate model).
Compare results for coccolith data-set (max is number of 'true' correlations = 23)
PCA
max
6
1 axis
30.3%
PCA + varimax rotation
max
13
5 axes
70.8%
CA
max
9
1 axis
18.4%
ML-Factor Analysis
max
13
4 axes
50.8%
ML-Factor Analysis
max
13
1 axis
25.1%
(2-test indicates axes 1-7 are significant)
PCA - high number of significant residual correlations compared to FA which is
primarily correlation-orientated whereas PCA is variance-orientated.
PCA axis 1 30.3%
ML-FA axis 1
25.1%
PCA - axes contain many more taxa and more complex combinations that FA
axes.
FA axes 1 + 4
= PCA axes 2 + 3
FA axis 2
= PCA axis 1
FA axis 3
?
= PCA axis 5
= PCA axis 4
In terms of recovering 'true' correlations, FA performs best but
needs 5 factors. Resulting factors are simple in composition and
easy to interpret.
Balance between model-based or model-free analysis (FA vs PCA
and CA) and between few (PCA, CA) or many (FA, PCA with
varimax rotation) factors.
Depends on nature of the research question.
CORRESPONDENCE ANALYSIS (CA)
Invented independently numerous times:
1. Correspondence Analysis: Weighted Principal Components
with Chi-squared metric.
2. Optimal or Dual Scaling: Find site and species scores so that (i)
all species occurring in one site are as similar as possible, but
(ii) species at different sites are as different as possible, and
(iii) sites are dispersed as widely as possible relative to species
scores.
3. Reciprocal Averaging: species scores are weighted averages of
site scores, and simultaneously, site scores are weighted
averages of species scores.
CORRESPONDENCE ANALYSIS
Weighted averaging regression
n
Species optimum
(species score)
uˆk
y
i 1
n
y
i 1
xi
ik
ik
Can also have weighted averaging
calibration
m
Environmental variable
(site score)
xˆi
y
k 1
m
ik
y
k 1
uˆk
ik
Artificial example of
unimodal response curves of
five species (A-E) with
respect to standardized
variables, showing different
degrees of separation of the
species curves.
a: moisture b: First axis of
CA c: First axis of CA folded
in this middle and the
response curves of the
species lowered by a factor
of about 2. Sites are shown
as dots at y = 1 if Species D
is present and at y = 0 if
Species D is absent.
Two-way weighted averaging
Species optimum estimation by WA regression
n
y x
ik
uˆk
i
i 1
n
y
ik
i 1
Environmental variable estimation by WA calibration
m
xi
y u
ik
k 1
m
y
k 1
ik
k
Two-way weighted averaging algorithm of CA.
a) Iteration process
n
Step 1: Take arbitrary, but unequal, initial site scores (x i ).
Step 2: Calcuate new species scores (u k ) by weighted averaging of the site scores (Equation 5.1).
Step 3: Calculate new site scores (x i ) by weighted averaging of the species scores (Equation 5.2).
Step 4: For the first axis, go to Step 5. For second and higher axes, make the site scores (x i )
uncorrelated with the previous axes by the orthogonalization procedure described below
Step 5: Standardize the site scores (x i ). See below for the standardization procedure.
Step 6: Stop on convergence, i.e. when the new site scores are sufficiently close to the site scores of the
previous iteration; ELSE go to Step 2.
yik xi
u k i 1n
yik
i 1
y
ik
u
k
k 1
xi m
yik
m
k 1
b) Orthogonalization procedure
Step 4.1: Denote the site scores of the previous axis by f i and the trial scores of the present axis by x i .
Step 4.2. Calculate v = in 1 y+I xi fi / y++
m
where y +i = k 1
yki
and y ++ =
in1
y+I
Step 4.3: Calculate x i.new = x i.old – vf i
Step 4.4: Repeat Steps 4.1-4.3 for all previous axes.
c) Standardization procedure
Step 5.1: Calculate the centroid, z, of site scores (x i )z = i 1 y+i xi/y++
n
Step 5.2: Calculate the dispersion of the site scores s 2 = i 1
Step 5.3: Calculate x i.new = (x i.old –z)/s
Note that, upon convergence, s equals the eigenvalue.
n
y +i (x i – z) 2 /
++
y ik is abundance of species k at site i
CORRESPONDENCE ANALYSIS
Simultaneously ordinates both species and samples.
Usual to standardise site scores to weighted mean of 0 and variance 1.
m
Dispersion of species scores
y
ik
uk2
k 1
m n
y
ik
k 1 i 1
Dispersion steadily increases and stabilises at maximum value.
dispersion = CA axis 1.
Maximum
Maximises dispersion of species scores or 'optima'.
Can extract axis 2 in similar way but with the complication that the trial scores
of axis 2 are made uncorrelated with scores of axis 1.
Eigenvalue is maximised dispersion of species scores on ordination axis. λ1 > λ2 ,
etc. Lies between 0 and 1.
Eigenvectors are species scores or 'optima'.
UNIMODAL RESPONSE MODEL
• Optimal scaling tries to pack
species occurrences into
tight packets: unimodal
species-packing response
model
• Eigenvalue tells the success
of packing – but too high a
value (~1) indicates disjunct
subsets of sites
CA axis 1
J. Oksanen (2002)
HILL'S SCALING IN CA
We standardised site scores to unit variance. Want them to be spread out most along the
most important axes.
Hill proposed that the average width of the species curves should be the same for each
axis. Calculate, for each species, the variance of the site scores containing the species
and take weighted average of these variances.
Equalise average curve width among different axes by dividing all scores of an axis by its
average curve width.
i.e. divide site scores by
and species scores by
1 /
1
Scores so obtained are in multiples of one standard deviation or ‘TURNOVER’. Sites
differing by 4SD in scores tend to have no species in common.
TRANSFORMATIONS
Log, square root
GOOD APPROXIMATION to
ML Gaussian ordination
CA Joint Plot
λ2 =0.40
λ1 =0.53
CA ordination diagram of the Dune Meadow Data in Hill’s scaling. In this and
the following ordination diagrams, the first axis is horizontal and the second
axis vertical; the sites are represented by crosses.
(λ3 = 0.26, λ4 = 0.17)
CANOCO
CA: Joint Plot Interpretation
Joint plot with weighted Chi-squared metric: species and sites in the same plot
with Hill's scaling.
• Distance from the origin: Chi-squared
difference from the profile
• Points at the origin either average or poorly
explained
• Distant species often rare, close species
usually common
• Unimodal centroid interpretation: species
optima and gradient values – at least for
well-explained species
• Can also construct CA biplots
• Samples close together are inferred to
resemble one another in species composition
• Samples with similar species composition are
assumed to be from similar environments
J. Oksanen (2002)
REF
WHEN IS HILL'S SCALING OPTIMAL?
REF
• The species optima should be widely
dispersed: measured by between
species variance SSB
• The species responses or tolerances
should be narrow: measured by
within site variance of species
optima SSW
• Total variance is SST = SSB + SSW
• If most variance is between sites,
the scaling is optimal
• The criterion variable = SSB/SST is
maximised in CA
• Nishisato's 'optimal scaling' solution
to CA
REF
J. Oksanen (2002)
REF
WHAT IS THE CHI-SQUARED METRIC?
Implicit in correspondence analysis, just as Euclidean distance or
standardised Euclidean distance is implicit in PCA.
CA used ‘expected’ abundances as metric. Expected abundances from
the marginal totals. Exactly like in 2 analysis of contingency tables.
Species profile is the average proportion of species in the data
Site profile is the average proportion of sites
Euclidean dij = fij – eij
Chi-squared ij = (fij – eij)/√eij
Weights in CA are proportional to the square root of site totals
Euclidean distance2
2
dij y ki y kj
2
m
k 1
Chi-squared distance2
m
d ij y
2
k 1
where
y
y i y kj y j
2
ki
yk
y+i = sum of yki over index k = 1, …, m
yk+ = sum of yki over index i = 1, …, m
y++ = sum of yki over index k and i
Euclidean distance implicit in PCA involves absolute differences
of species between sites.
Chi-squared distance involves proportional differences in
abundances of species between sites.
Differences in site and species totals are therefore less
influential in CA than in PCA unless some transformation is used
in PCA to correct for this effect.
= observed eigenvalues
= broken-stick model expectation
R
DETRENDED CORRESPONDENCE ANALYSIS
(DCA)
Aim to correct three 'artefacts' or 'faults' in CA:
1. Detrending to remove 'spurious' curvature in the
ordination of strong single gradients
2. Rescaling to correct shrinking at the ends of ordination
axes resulting in packing of sites at gradient ends
3. Downweighting to reduce the influence of rare species
Implemented originally in DECORANA and now in CANOCO
2 = 0.57
CA applied to artificial data (- denotes absence). Column a: The
table looks chaotic. Column b: After rearrangement of species and
sites in order of their scores on the first CA axis (u k and x i ), a twoway Petrie matrix appears: λ1=0.87
Column a
Column b
Species
A
B
C
D
E
F
G
H
I
Sites
1 2 3
1 – –
1 – –
1 1 –
– – –
– 1 –
– 1 –
– – 1
– – 1
– – 1
Species
4
–
–
–
1
1
1
–
–
–
5
–
–
–
1
–
–
1
1
–
6
–
–
–
1
–
1
1
–
–
7
–
1
1
–
1
–
–
–
–
A
B
C
E
F
D
G
H
I
xi
Arch effect
uk
Sites
1 7
1 –
1 1
1 1
– 1
– –
– –
– –
– –
– –
– –
1 1
. .
4 0
2
–
–
1
1
1
–
–
–
–
–
0
.
6
4
–
–
–
1
1
1
–
–
–
6
–
–
–
–
1
1
1
–
–
5
–
–
–
–
–
1
1
1
–
3
–
–
–
–
–
–
1
1
1
0 0 1 2
. . . .
0 6 0 4
0 8 0 0 0 8 0
'Seriation' to arrange data into
a sequence
-1.4
-1.24
-1.03
-0.56
0
0.56
1.03
1.24
1.4
Distorted
distances
1 = 0.87
Ordination by CA of the two-way Petrie matrix in
the table above. a: Arch effect in the ordination
diagram (Hill’s scaling; sites labelled as in table
above; species not shown). b: One-dimensional
CA ordination (the first axis scores of Figure a,
showing that sites at the ends of the axis are
closer together than sites near the middle of the
axis. c: One-dimensional DCA ordination,
obtained by nonlinearly rescaling the first CA
axis. The sites would not show variation on the
second axis of DCA.
DETRENDING CA: THE ARGUMENT
PCA and CA both produce a curve from a single, ideal gradient, but
the shapes have one important difference:
Species packing model along a gradient
• Horseshoe in PCA curved
inwards at ends: wrong
order along the first axis
• Arch in CA preserves the
correct ordering along the
first axis: worth detrending
J. Oksanen (2002)
THE CAUSE OF THE ARCH IN CA
• There is a curve in the species space, and PCA shows it correctly.
• CA may be able to deal with unimodal responses, but if there is one
dominant gradient, the second axis is the first axis folded. Occurs
when the first axis is at least twice as long as the second 'real' axis.
• Problems clearly arise when there is one strong dominant gradient.
Gradient space
Species space
J. Oksanen (2002)
Artificial example of unimodal
response curves of five species (A-E)
with respect to standardized
variables, showing different degrees
of separation of the species curves.
a: Moisture.
b: First axis of CA.
c: First axis of CA folded in this
middle and the response curves of
the species lowered by a factor of
about 2.
Sites are shown as dots at y = 1 if
species D is present and y = 0 if
Species D is absent
n=
n
i 1
i 1
m
m
uˆk yikxi / yik
xˆi yikûk/ yik
k 1
k 1
Species optima
or score ûk
Sample score
DETRENDING BY SEGMENTS – BASIC IDEA
Method of detrending by segments (simplified). The crosses
indicate site scores before detrending; the dots are site scores
after detrending. The dots are obtained by subtracting, within
each of the five segments, the mean of the trial scores of the
second axis (after Hill & Gauch, 1980).
In the two-way CA algorithm, replace the orthogonalisation step
by detrending to remove any systematic dependences between
axes.
DETRENDING BY SEGMENTS- DETAILS
• Divide the axis into segments
(default 26)
• Average using moving weighted
windows in segments with weights
(1, 2, 3, 2, 1) and take the
residuals as the new 2nd axis
REF
CA2
REF
CA1
• Ideally the direction of 2nd axis
changes to an important, nonlinearly independent gradient
• May remove other information as
well, not only the arch as desired
REF
CA1
J. Oksanen (2002)
REF
ALTERNATIVE DETRENDING PROCEDURES
Detrend by polynomials - in the two-way CA algorithm, trial site scores for
a particular axis are made to be uncorrelated to the ordination axes
already extracted (orthogonalisation).
In detrending-by-polynomials they are also made uncorrelated to the k-th
order polynomials of the axes already extracted (k = 2, 3, or 4) and to firstorder cross-products of these axes.
When there is an arch, CA axis 2 is approximately a quadratic function (=
second-order polynomial) of CA axis 1. Detrending by second-order
polynomials should therefore remove the arch effect.
But this may not be enough because when there is a very dominant first
gradient, CA axis 3 can be a function of CA axis 1, namely a cubic function,
and CA axis 4 can be a quartic function, thereby obscuring the true second
underlying gradient.
Promise of detrending-by-polynomials not supported by experiments with
simulated or real data.
Detrending-by-segments consistently performed best.
DETRENDING ARTEFACTS
DCA configuration often a triangle, diamond, or 'trumpet'.
DCA detrends by twisting the space so that empty space is left in
the middle of the points. Not really seen in 2-dimensional plots.
In 2-dimensional plots, one end of DCA axis 1 (usually the upper
end) has a wider range along DCA axis 2. This does not mean that
there is greater variation at that end of the gradient.
Twisting often exposes this variation.
Any variation at the opposite end can get twisted into DCA axis 3.
Carefully inspect DCA plots (2-D and 3-D) and their eigenvalues.
NON-LINEAR RESCALING IN DCA
Assume a species-packing model, variance of optima of species at a site (‘withinsite variance’) is an estimate of average response curve breadth (‘tolerance’) of
those species. Because of edge effect, species curves are narrower at edges of
axes than in centre and within-site variance is correspondingly smaller in sites
near the ends.
Rescale by equalising within-site variance at all points along axis by dividing into
small segments, expand those with small within-site variance and contract those
with large within-site variance.
Site scores then calculated as WA of species scores and standardised so that
within-site variance is 1.
Length of axis is range of site scores in ‘standard deviation’ units.
DCA – close approximation to ML Gaussian ordination with simulated data based
on two-dimensional species-packing model.
REF
RESCALING IN DCA
REF
• Rescale gradients so that the
weighted variance of species
scores = 1 along the axes
(Hill 2). Used in DECORANA
and CANOCO.
• Can also rescale by the
weighted mean tolerance
(Hill 2), so that the average
width of species responses
in 1 standard deviation.
REF
J. Oksanen (2002)
REF
DOWNWEIGHTING OF RARE TAXA IN
CA AND DCA
Downweighting – to reduce the abundances of rare species so that they
have even less weight on the ordination than if they were not
downweighted.
CA and DCA sensitive to species that occur only in a few species-poor sites.
If AMAX is the frequency of the commonest species, effect of downweighting is to reduce the abundance of species rarer than (AMAX/5) in
proportion to their frequency.
Rare species can only distort the analysis if they appear in samples with
few other more common species. These are, by definition, ‘deviant
samples’.
The same effect can often be achieved, more elegantly, by deleting these
deviant samples (unpopular!) or by making them passive or supplementary.
Do not take part in the analysis but positioned in the final results as if they
had taken part.
DCA PLOTS
• Axes scaled so that the smallest
sample score is 0. Species scores
have a wider dispersion than
samples scores. Some will be
negative.
• Axes taken as gradients, and
scaled in ecologically meaningful
'sd' units.
• Species scores taken as species
optima.
• Species scaled so that site scores
are their direct weighted averages
( = 1).
J. Oksanen (2002)
DCA ordination of the Dune Meadow Data. The scale marks are in
multiples of standard deviations (s.d.). (1 = 0.53, 2 = 0.29; axis
1 = 3.7 sd, axis 2 = 3.1 sd)
Gaussian response surfaces for several Dune Meadow species fitted by loglinear regression of the abundances of species on the site scores for the
first two DCA axes. Arrows for species running from their DCA scores to
their fitted optimum.
Optima and contours for some of the species. The contour
indicates where the abundance of a species is 60% of the
abundance at its optimum.
WHEN DOES DCA APPROXIMATE THE
GAUSSIAN RESPONSE MODEL?
CAJO TER BRAAK (1985 Biometrics 41, 870)
“Four conditions (equal tolerances, equal or independent
maxima, and equally-spaced or uniformly distributed optima and
sample points) are needed to show that (detrended)
correspondence analysis provides an approximate solution to the
unimodal models.”
= (D)CA can approximate the Gaussian model if we have infinite
species packing gradients. Inevitable that some distortions occur,
especially at gradient ends. Still a good approximation.
Despite its heuristic nature, DCA usually very useful.
EIGENVALUES IN DCA
It appears that the eigenvalues estimated in DECORANA and
CANOCO can be too low and may have an unclear interpretation.
The estimation of the eigenvalues can be upset by detrending
and in DECORANA and CANOCO they are estimated before
rescaling.
True eigenvalues are the ratio of the weighted variances of site
and species scores after detrending and rescaling
( = SSSITE/SSSPECIES)
Differences are generally small in many (but not all!) data-sets,
so be careful!
R (VEGAN)
CORRESPONDENCE ANALYSIS AND BLOCK
STRUCTURE
Data table with block structure. Outside the sub-tables A1, A2, A3
and A4, there are no presences so that there are four clusters of
sites that have no species in common (λ1=1, λ2=1, λ3=1, λ4=1)
Basis of TWINSPAN.
ORDERED VEGETATION TABLE
Axis 1
CEDIT or DIATAB
Axis 1
J. Oksanen (2002)
RARE SPECIES IN CA AND DCA
1.
2.
3.
4.
Remove rare species: unpopular
Downweight rare species: CANOCO option
Do not show rare species: popular
Make rare species ‘passive’; rarely done
CA2
Rare species cannot have an average profile and are
extreme in CA and DCA, but still have small weights.
Solutions to the 'problem':
5. Make deviant samples ‘passive’; rarely done
CA2
CA1
CA1
J. Oksanen (2002)
SCALING OF CA ORDINATION SCORES IN CORRESPONDENCE
ANALYSIS, as implemented by CANOCO 3.1 and later
Choices:
1.
Sample scores are weighted mean species scores.
2.
Species scores are weighted mean sample scores.
3.
Symmetric scaling. May be good compromise
Negative (–1, –2, –3) for Hill scaling
Positive scalings standardise scores to
Negatives standardise scores to / (1 - )
where =0, 0.5 or 1.
-1
Equalises the root mean square species tolerance among axes (Hill). Joint plot.
1 or 2
Gives biplots with approximate values of species values.
-1 or 1
Appropriate when interest is on configuration of samples. With 1, inter-sample
distance approximates 2 distance.
-2 or 2
Appropriate when interest is on configuration of variables. With 2, intersample distance approximates 2 distance.
3
Useful or even optimal biplot of species and samples.
DCA scaling is most like –1 in CA
CA/DCA INSTABILITY
TAUSCH, CHARLET, WRIXELMAN & ZAMUDIO (1996)
J. Vegetation Science 6, 897–902
“Patterns of ordination and classification instability resulting from changes in input
data order.”
“Random re-arrangement of entry order in three data sets often changed ordination
and classification results based on reciprocal averaging. Detrended correspondence
analysis had the greatest variability of the ordination methods tested.”
Is this a problem of
method – DCA, CA
algorithm – recripocal averaging, detrending, etc
program – DECORANA compiled for the PC-ORD package
INSTABILITY OF ORDINATION RESULTS
Oksanen, J. (1988) Vegetatio 74, 29–32
When the first two CA axes have similar eigenvalues, their order can be reversed due to
random variation in the data. In detrending the ordering on the first axis is preserved
and the second axis is detrended with respect to the first axis and different
configurations will arise if axes 1 and 2 can be interchanged.
1 = 0.549
2 = 0.538
3 = 0.387
“the investigator should be alert to associated aberrant results produced by DCA.”
Oksanen & Minchin (1997) J. Vegetation Science 8, 447–454
In eigenvalue routine in DECORANA and CANOCO based on so-called Power algorithm, the
solution is found iteratively. Convergence criteria (tolerances) are used to decide if the
current estimate of an eigenvalue is sufficiently close to the ‘true’ value. Iterations stop
when the residual (which would be zero if the estimate of the eigenvalue is exact) is less
than a specified tolerance level or the number of iterations allowed is reached.
Tolerance
Max no of iterations
TWINSPAN
0.003
5
DECORANA
0.0001
10
CANOCO
0.00005
15
Affects of using CANOCO, PCA, CA, DCA, etc
STRICT CANOCO 3.12a & later
0.000005
999
STRICT TWINSPAN 2.2a
0.000005
999
Instability problem disappears
Some additional thoughts:
“If eigenvalues are indeed very close, there is not likely to be any great biological insight
or gain from getting exactly Oksanen’s ‘right’ answer. Have I missed something?”
M.O. Hill
“This is a big fuss about nothing.”
C.J.F. ter Braak
LESSON OF STRICT CONVERGENCE – WATCH THE EIGENVALUES
An important lesson from Oksanen & Minchin is to WATCH THE EIGENVALUES.
When successive eigenvalues are very different, their respective axes have a
well-defined ranking in importance and interpretation is straightforward.
If a pair of eigenvalues is close, the rank of their associated axes is less clear. In
effect they are TWINS. While strict convergence will clarify the issue by using
more decimal places, thus determining the birth order of the twins, they are still
fundamentally TWINS!
UNIT OF LEAST PRECISION (ULP)
Real-life data are often very difficult to collect with precision. Ideally the
analyses we perform should be resilient to such noise. Is the unambiguous choice
of an eigenvector thanks to strict convergence criteria rendered ambiguous again
if we allow for some noise in the data?
PCA or CA/DCA?
PCA – linear response model
CA/DCA – unimodal response model
How to know which to use?
Gradient lengths important.
If short, good statistical reasons to use LINEAR methods.
If long, linear methods become less effective, UNIMODAL methods become more effective.
Range 1.5–3.0 standard deviations both are effective.
In practice:
Do a DCA first and establish gradient length.
If less than 2 SD, responses are monotonic. Use PCA.
If more than 2 SD, use CA or DCA.
When to use CA or DCA more difficult.
Ideally use CA (fewer assumptions) but if arch is present, use DCA.
DCA results can be unstable when eigenvalues 1 and 2 are close to each other (e.g. 0.55,
0.54) (Oksanen (1988) Vegetatio 74, 29–32). Always do a CA to assess the effect of
downtrending on the data-set.
(a) The response curves of 3 species
along a gradient; 12 quadrats are
located at the numbered points
marked with arrowheads (artificial
data). (b) Ordinations of the 12 data
points by PCA (hollow dots, dashed
line) and by CA (solid dots, solid
line). Both ordinations exhibit the
arch effect. The CA ordination also
shows scale contractions at both
extremities.
Hypothetical diagram of the occurrence of species A-J over an environmental gradient. The length of the
gradient is expressed in standard deviation units (SD units). Broken lines (A’, C’, H’, J’) describe fitted
occurrences of species A, C, H and J respectively. If sampling takes place over a gradient range <1.5 SD,
this means the occurrences of most species are best described by a linear model (A’ and C’). If sampling
takes place over a gradient range >3 SD, occurrences of most species are best described by an unimodal
model (H’ and J’).
Outline of ordination techniques. DCA
(detrended correspondence analysis)
was applied for the determination of
the length of the gradient (LG). LG is
important for choosing between
ordination based on a linear or on a
unimodal response model. In cases
where LG <3, ordination based on
linear response models is considered to
be the most appropriate. PCA
(principal component analysis)
visualizes variation in species data in
relation to best fitting theoretical
variables. Environmental variables
explaining this visualized variation are
deduced afterwards, hence, indirectly.
RDA (redundancy analysis) visualizes
variation in species data directly in
relation to quantified environmental
variables. Before analysis, covariables
may be introduced in RDA to
compensate for systematic differences
in experimental units. After RDA, a
permutation test can be used to
examine the significance of effects.
CURVATURE IN ORDINATIONS
Horseshoe and Arch
Curving of Gradients
• Eigenvector methods project species space into ordination space
• If there is a curvature in the ordination, there may not be a curvature in
the data
• The model behind ordination may be different from the vegetation
model
• PCA assumes linear relation between axes and latent variables
• CA assumes unimodal model
• In both cases a curvature (“horseshoe”, “arch”) can be produced from a
“straight” gradient
• All ordination methods are based on dissimilarity measures (hidden in
CA)
• If there are no species in common between two sites, it is difficult to
estimate their similarity (cf. extended dissimilarity approach)
HOW TO HANDLE CURVATURE
1.
Accept and acknowledge probably only one dominant gradient (try
Principal Curves to check).
2.
Degree of absence - if the curve is caused by the 'naughty noughts',
estimate how many and how much the species are absent.
3.
Extended dissimilarity - if all distant points have nothing in common,
try to estimate dissimilarity using stepping-stone points.
4.
Detrend as in DCA - remove curvature in ordination. May also remove
other structure and introduce a 'trumpet' or 'triangle' concentration
of sample points with a wider range of sample scores on DCA axis 2 at
one end (usually the upper end) of DCA axis 1. Does not mean that
there is greater variation in the data at that end!
5.
Monotonic regression as in non-metric scaling - does not require
linear relation between dissimilarities and ordination distances.
6.
Constrain - use of external variables to constrain ordination axes
seems to remove curvature. Direct gradient analysis (CCA, RDA).
TRANSFORMATION OF SPECIES
ABUNDANCES IN ORDINATION
Not a well-explored topic.
DCA/CA/CCA
Reasons for species
transformations
Work well with raw data, +/- data, log
transformed, square root transformed
Stabilise variance
Dampen effects of very abundant species
Suggest CA/DCA with untransformed, square root, and log (y + 1).
Look at total variance (= inertia) and relative sizes of axes 1, 2, and 3.
See which reduces variance without destroying axis 1, 2, 3 structure.
Square root with % data
PCA/RDA
Log double centring for % data with few variables
Log for abundance data
See P. Legendre & E.D. Gallagher (2001) Oecologia 129, 271-280.
Ecologically meaningful transformations for ordination of species data.
MULTIDIMENSIONAL SCALING (MDS)
RATIONALE
• Map points into low-dimensional ordination space so that the distances in
the ordination space are as close to the original dissimilarities as possible
- regression from measured dissimilarities to ordination distances
- principal co-ordinates analysis or metric scaling (PCOORD)
• Non-metric MDS: Real distances have the same rank order as the ordination
distances
- Monotonic regression
- No specified shape of the regression
- Goodness of fit measured by ‘stress’
- NMDSCAL
Crucial: How to measure distances in a meaningful way?
Question of selection of dissimilarity or proximity coefficient. Critical problem,
as in classification procedures
DISSIMILIARITIES FOR ECOLOGICAL DATA
• Presence/absence indices based on the
number of common species J on two sites
compared to species richness A, B of sites
• Similarity index s can be transformed to a
dissimilarity index d = 1 – s
• Quantitative generalisations:
1.
Manhattan style: use common part
of abundance and sums
2.
Euclidean style: use cross
products and sums of squares
• 2 x 2 contingency table Jaccard =
a
.
a+b+c
• J = a, A = a + b, B = a + c
J. Oksanen (2002)
POSSIBLE PROXIMITY MEASURES
Manhattan style
0/1
Jaccard
Sørensen
Euclidean style
Manhattan style
Jaccard
Steinhaus (Bray-Curtis)
Euclidean style
Similarity ratio
Ruzicka
J. Oksanen (2002)
PRINCIPAL CO-ORDINATES ANALYSIS – PCOORD
J.C.Gower 1966 Biometrika 53: 325-338
Start with a matrix of similarities A or dissimilarities D
aij 1 2 dij2
Transform A by subtracting column and row means and adding matrix mean.
Extract eigenvalues and eigenvectors, scale each eigenvector so that sum of
squares = eigenvalue (multiply by ).
Scaled eigenvectors are co-ordinates of objects whose distances apart are
given by best-fitting possible approximations to original distances in chosen
number of dimensions.
Measure of fit
2
n
/
i
i 1
i
i 1
Species can be added as their weighted sums (cf. PCA) or weighted averages
(cf. CA) of the site scores.
Note that d2ij is Euclidean distance, PCOORD = PCA
If d2ij is squared chi-squared distance, PCOORD = CA
R
Two-dimensional solution given by
applying principal co-ordinates
analysis to matrix of distances
between 48 British towns.
94.6 % goodness of fit.
NON-METRIC MULTIDIMENSIONAL
SCALING – NMDSCAL
Metric scaling
dij dij
or
dij2 dij2
Non-metric scaling
dij dik d jk dij dik d jk
i.e. monotonic relationship between original and new distances.
Use ranks, try to find a low-dimensional representation that minimises
‘stress’, a measure of how monotonic are d* with d. No unique solution.
Iterative procedures.
R
NON-METRIC MULTIDIMENSIONAL
SCALING (NMDSCAL)
• Rank-order relation with community dissimilarity and ordination
distance: no specified form of regression, but the best shape is
found from the data. Monotonic regression.
• Non-linear regression can cope with non-linear species responses
of various shapes: not dependent on Gaussian model.
• Iterative solution: no guarantee of convergence
• Must be solved separately for each number of dimensions: a lower
dimensional solution is not a subset of a higher, but each case is
solved individually.
R
MONOTONIC REGRESSION IN KRUSKAL'S
NMDSCAL ALGORITHM
• Measured community dissimilarities
and ordination distances have similar
rank ordering.
• No specified shape, but can cope with
different response shapes.
• Sum of squared residuals from the
regression: Stress.
• The basic model: finds gradients if
dissimilarities meaningful.
• Iterative solution: no guaranteed
convergence.
• Shephard plot comparing dissimilarities
with ordination distance.
R
J. Oksanen (2002)
NMDSCAL: Algorithm
• Take arbitrary initial site scores
• Move points so that stress decreases
• Stop when stress is stable: Minimum found
• Must be solved separately for each dimensionality
• The minimum may be local instead of global
• Use several starting configurations and compare the solutions to find the
stable one with minimum stress
• Species scores can be estimated as weighted averages or weighted sums and
can be “stretched” to make them comparable with CA or PCA species scores
NMDSCAL: Scores, stress and plots
• Stress cannot be meaningfully compared between data sets
• Site scores are arbitrary
• Site scores can be scaled to “half change”
• Species scores are supplementary
NMDSCAL plot is a map:
• All directions equal:
No axes
No origin
However, often rotated to principal components
RECOMMENDED PROCEDURE IN NMDSCAL
1. Use adequate dissimilarity indices: an adequate index gives a
good rank-order relation between community dissimilarity
and gradient distance
2. No convergence guaranteed: start with several random starts
and inspect those with lowest stress
3. Satisfied only if minimum stress configurations are similar
R
HOW MANY DIMENSIONS TO USE IN
NMDSCAL?
• In eigenvector methods axes are
orthogonal and previous axes
remain unchanged when new axes
are evaluated
• NMDSCAL solutions are separate
for each number of dimensions
• Adequate number of dimensions
difficult to know; after sudden
drop of stress, reached adequate
number of dimensions
J. Oksanen (2002)
NMDSCAL PLOT
• All that counts is the
configuration: axes have no
direct meaning
• No origin, just a map
• Species can be added as their
weighted averages or
weighted sums
J. Oksanen (2002)
METRIC AND NON-METRIC SCALINGS ONLY
PROVIDE MAPS
• Scaling tries to find an underlying
configuration from dissimilarities
• Scaling tries to draw a map using
distance data
• Metric scaling (PCOORD) assumes
linear relation, but NMDSCAL finds a
ranked relation
• Only the configuration counts:
- no origin, but only the
constellations
- no axes or natural directions, but
only a framework for points
Map of Europe from
road distances
J. Oksanen (2002)
SCALING OF NMDSCAL AXES
NMDSCAL axes have no unique scaling or direction, and all
rotations and scaling are equally good solutions.
• Customary to rotate to
principal components:
first dimension most
important
• Half-change scaling gives
ecologically meaningful
units for the axes
J. Oksanen (2002)
Comparison of results of analyses of road distances between 45 towns. (c)
Principal components analysis using the first two principal components. (d)
Non-metric multidimensional scaling. Outline of Britain in (c) and (d)
aligned so that Carlisle and Brighton are correctly positioned.
COMPARING ORDINATION METHODS
• Ordination methods cannot be compared with real data sets: the
truth is unknown
- The correct structure is inferred from the data, and the
comparison may be biased towards the favoured result
• Comparison needs external criteria (e.g. environmental variables)
• Simulated community pattern:
- assume an interpretable gradient pattern, and see if the method
can find this pattern; reliable only if it finds the known pattern
- robustness is the ability to work even when the assumptions are
violated
COMPAS, COENOFLEX, R (Vegan)
METHODS OF UNCONSTRAINED ORDINATION OF A
MULTIVARIATE SET OF DATA, Y.
Name of method
(acronyms, synonyms)
Distance
measure
preserved
Relationship of
ordination axes
with original
variables
Criterion for drawing ordination axes
Principal Components
Analysis (PCA)
Euclidean
distance
Linear
Finds axis that maximises the total
variance (or, equivalently, that
minimises the total residual
variation)
Correspondence Analysis
(CA, reciprocal
averaging, dual scaling)
Chi-square
distance
Unimodal
(approximately
Gaussian)
Finds axis that maximises dispersion
of species scores (which are
themselves weighted averages of site
scores)
Principal Coordinates
Analysis (PCO, PCoA,
metric multidimensional
scaling, classical scaling,
Torgerson scaling)
Any chosen
distance or
dissimilarity
measure
Unknown;
depends on
distance
measure chosen
Euclidean distances in new fulldimensional space are equal to
original distances (or dissimilarities)
Nonmetric
Multidimensional Scaling
(NMD, NMDSCAL)
Any chosen
distance or
dissimilarity
measure
Unknown;
depends on
distance
measure chosen
The number of dimensions for the
new space is chosen a priori
(reduced). Euclidean distances in
new space are monotonically related
to original distances
IS THERE AN ‘IDEAL’ METHOD FOR
INDIRECT GRADIENT ANALYSIS?
The ‘ideal’ ordination method should possess the following qualities:
1. It recovers gradients without distortion.
2. If clusters exist in nature, these should be obvious in the ordination.
3. It gives the same result every time for a given data set
4. There is a unique solution.
5. Ecological similarity is related to proximity in ordination space.
6. Scaling of axes is related to beta-diversity.
7. The method is not sensitive to noise.
8. ‘Signal’ and ‘noise’ are easily separated.
9. You do not need to pre-specify the number of axes.
10. The solution is the same, no matter how many dimensions one selects to examine.
11. Unless by choice, all objects are treated equally.
12. The solution does not take too much computer time.
13. The method is robust – works well for short or long gradients; for low or high noise;
for sparse or full data matrices; for big or small data sets; for species-poor and
species-rich data.
14. To the statistician it is elegant.
15. To the ecologist it is available, easy to use, and inexpensive.
CURRENT STATUS OF INDIRECT
GRADIENT ANALYSIS METHODS
PCA
CA
DCA
Yes/No
Yes
Yes
Yes
Yes
No
No
Yes
Yes
Yes
Yes
Yes/No
Yes
Yes
Yes
Yes
Yes/No
No
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
No
Yes
Yes
Yes
Yes
Yes
Yes
Computer time
Fast
Fast
Fast
Robust
Elegant
Ecologist – available, easy to use, inexpensive
Current status (15)
No
Yes
Yes
11½
No
Yes
Yes
12
Yes
No
Yes
13
Good gradient recovery and no distortion
Find clusters
Same result every time
Unique solution
Ecological similarity
Scaling of axes diversity
Not sensitive to noise
Signal/noise separation
Do not have to specify axis number
Solution same, irrespective of axis number
Objects treated equally
PCOORD NMDSCAL
No
Yes
Yes
Yes
Yes
No
No
Yes
Yes
Yes
Yes
Can be
slow
No
Yes
Yes
10
Yes
Yes
No
No
Yes
Can
Yes
No
No
No
Yes
Slow
Yes
No
No
6½
CONCLUSIONS
Non-metric
scaling
Useful with ‘poor’ data, missing values. Q-mode
(n x n matrix). 400-500 objects.
Metric scaling =
Any distance measure, sample scores only, species
principal coscores can be roughly estimated, interpretation
ordinates analysis difficult. Linear or unimodal – depends on DC.
Correspondence
analysis & DCA
Specific 2 distance, unimodal response model,
species and site scores, interpretation moderate,
relative abundances, ‘shape’.
Principal
components
analysis
Euclidean distance, species and site scores, linear
response model, interpretation via biplots easy,
absolute abundances, ‘size’.
FUTURE DEVELOPMENTS
What to do with data containing many zero values and long gradients, and
to avoid the inherent problems of the Chi-squared distance metric implicit
in CA and CCA?
These are:
1.
Implicit use of relative abundances
2.
A difference between abundance values for a common species
contributes less to the distance than the same difference for a rare
species, so rare species may have an unduly large influence on the
analysis
Solutions:
1.
Delete rare species
2.
Empirical downweighting of rare species as in CANOCO
3.
Data transformations that preserve the Euclidean distances and balance
rare and common species
P. Legendre & E.D. Gallagher (2001) Oecologia 129, 271-280
Software:
www.fas.umontreal.ca/biol/legendre
(1) Chord distance
m
D12
j 1
y1 j
m
y
j 1
(2) Chi-square metric D 1
12
j 1 y
m
y2 j
m
y
2
1j
j 1
2
2j
y1 j y 2 j
j y1 y 2
2
2
where y+j is the column (species) sum for species j, and y1+
is the row (sample) sum for sample 1
y
y
D12 1 j 2 j
y 2
j 1 y 1
m
(3) Species profiles
(4) Hellinger distance
D12
y1 j
j 1
y 1
m
2
y2j
y 2
2
(These are also relevant transformations for minimum-variance cluster analysis and k-means
cluster analysis that minimise a least-squares function. These transformations result in
Euclidean metrics that can be represented in Euclidean space & that preserve sum of squares.)
PRINCIPAL CURVES
De'ath, G. (1999) Ecology 80, 2237-2253
Principal curves are smooth one-dimensional curves in a high-dimension space.
Form of non-linear PCA, analogous to LOESS smoothers as a non-linear
regression tool.
Principal curves minimise sum of squares distances from data (as does PCA) but
to a curve, not to a line or plane as in PCA.
Two species along single gradient
Principal curve showing gradient location
R, S-Plus
Relation of distances along principal curve and
original gradient. Rates high when species
abundances change rapidly (centre of gradient),
low when species constant (ends of gradient).
FITTING PRINCIPAL CURVES
25 sites, 4 species, Gaussian responses, one gradient. Plotted on first two
PCA axes. Iterative fitting of principal curves.
(a) Data using PCA axis 2
(39.4%) as initial curve
(b) First iteration,
snooth spline 3 d.f.
(c) Convergence with
3 d.f.
(d) Improved and final
fit with 7 d.f. 98.3%
variance
(e) Result of using PCA axis
1 (50.4%) as start and 3 d.f.
(f) As (e) but 7 d.f.
Actual and modelled species responses along principal curve
PRINCIPAL CURVES AND REAL DATA
12 hunting spiders at 28 sites and 6 environmental variables
Principal curve superimposed on PCA biplot. Numbers are locations along the gradient.
Principal curve captures 90% of species variance. Modelled environmental variable
values for 6 locations show PC is mainly moisture, sand, moss and twig gradient.
SPECIES RESPONSES ALONG PRINCIPAL CURVES
All are unimodal and optima
well approximated by
intersection of species vectors
with the principal curve.
Curves have approximately
equal tolerances.
Ideal for finding onedimensional gradients that
explain species composition as
well, or better than, higher
dimensional ordination
methods. Have been extended
to two-dimensional gradients
as principal surfaces.
Abundances and response curves from the principal curve
gradient analysis of the hunting spider data. Each panel
represents a single species (eight-letter code). The plots suggest
that the principal curve fit is adequate and show unimodal
response curves of approximately equal tolerances, with
maxima located at widely varying locations along the gradient.
Less restrictive in assumptions
that PCA, CA, or DCA. Only
assume that responses are
smooth. Very neutral method
(cf. LOESS in regression).
Computationally difficult,
hardly used yet...
CURRENT USES OF INDIRECT
ORDINATION METHODS
Hill, M.O. (1988) Bull. Soc. Roy. Bot. Belg. 121, 134–41
“Ordination is a rather artificial technique. The idea that the world
consists of a series of environmental gradients, along which we should
place our vegetation samples, is attractive. But this remains an
artificial view of vegetation. In the end the behaviour of vegetation
should be interpreted in terms of its structure, the autoecology of its
species and, above all, the time factor. At this level, trends become
unimportant and multivariate analysis is perhaps irrelevant.
Ordination is useful to provide a first description but it cannot provide
deeper biological insights.”
MERITS AND DRAWBACKS OF INDIRECT
ORDINATION METHODS
1. Can distract attention from individual species responses by
focussing on the overall multivariate response only.
2. As it is a correlative method, it can help with hypothesis
generation.
3. It can rarely, if ever, demonstrate causality.
4. Ordinations can provide useful low-dimensional representations
of complex data. Valuable for summarisation and for hypothesis
generation.
5. Ordination is a tool and a means to an end. It is not an end in
itself.
COMPARING DIFFERENT ORDINATIONS
Ordinations can be almost similar, but
- Axes have slightly different
orientation
- Axes are reversed
• Procrustes analysis: Rotates
ordinations to maximal similarity
and estimates the minimized
difference
- Standard in MDS, should be used
always when comparing ordinations
• Individual differences scaling can be
used to analyse whether there is a
common structure between
repeated samplings
PROCRUSTES ROTATION
Two configurations of points in ordinations representing the same n
samples.
Take one configuration as fixed, move the other to match as closely as
possible and to minimise the sum of squared distances of the
transformed points from the respective points of the fixed configuration
1.
Translation of origin – shift the origins of the co-ordinate axes
2.
Rotation and/or reflection of axes
3.
Uniform scaling (deflate or inflate the axis scale)
Single points can move a lot although the sum of squared distances can
stay fairly constant, especially in large data sets.
R (Vegan)
EXAMPLE OF PROCRUSTES ROTATION
pH and plot yield for same plots
PCA of vegetation data from Park Grass plots
Yield and pH explain about 65% of the sum of squares in the PCA ordination.
Some plots do not fit well e.g. 14a, 14d, 11/1b
PCA
Total for PCA ordination
Configuration of Park Grass plots for
standardised yield and pH: rotated,
reflected and scaled to fit PCA ordination.
Lines connect labelled points for the
rotated yield/pH ordination to their
respective fixed points from the PCA.
Procrustes rotation of
NMDSCAL (circles)
and PCA (arrows)
ordinations
R
Procrustes rotation residuals
(differences between NMDSCAL and
PCA ordination site scores)
R
GENERALISED PROCRUSTES ROTATION
Any number of configurations. Basic idea is to find a
consensus or centroid configuration so that the fit of
ordinary Procrustes rotation to this centroid over all
configurations, is optimal. Minimise m2 where m2 = mi2
where mi2 is Procrustes statistic for each pair-wise
comparison.
1. Translation
2. Rotation and/or
reflection
3. Scaling
Example – to compare results of 12 different ordination methods to the same data
m2 can be considered as squared distances in a PCOORD
analysis
N = non-metric scaling
C = correspondence analysis
P = principal co-ordinates analysis
Location of ordination methods on the
two principal co-ordinates axes: these
two axes represent 75% of the variation
m2 statistics.
1,2 = presence/absence data only
2 = all joint absences ignored
3,4,5 = abundance data
3 = log abundance data
4 = all joint absences ignored
5 = abundance data
SOFTWARE
Technique
Program
PCA
CANOCO, NT-SYS, R, R (Vegan)
CA
CANOCO, DECORANA, NT-SYS, R (Vegan)
DCA
CANOCO, DECORANA, R (Vegan)
PCOORD
PCOORD (+DCMAT), PRCOORD,
NT-SYS, Dist-PCoA, R, R (Vegan)
PCOORD (Marti Andersen
www.stat.auckland.ac.nz/~mja/)
Non-metric scaling
EUSCAL, NT-SYS, WINKYST, R, R (Vegan)
Simulations
COMPAS, COENOFLEX, R (Vegan)
Principal curves
S-PLUS, R
Procrustes rotation
GENSTAT, SOLCOMP, R (Vegan)
MAJOR PLAYERS IN INDIRECT GRADIENT
ANALYSIS
Karl Pearson 1901
Invented PCA
Joseph B. Kruskal 1964
Development of NMDSCAL
Cajo J.F. ter Braak
1985 Unified PCA
and CA in terms of
response models
David W. Goodall
1954 First use of
PCA in ecology
John C. Gower 1966
Popularised PCOORD,
invented Procrustes
rotation
Mark O. Hill 1973
Popularised CA in
ecology, 1980 DCA
Jari Oksanen 1990s
Continuous questioning
about ordination methods,
championing NMDSCAL