Multivariate analyses - University of Melbourne

Multidimensional scaling
MDS
 G. Quinn, M. Burgman & J. Carey 2003
Aim
• Graphical representation of
dissimilarities between objects in as few
dimensions (axes) as possible
• Graphical representation is termed an
“ordination” in ecology
• Axes of graph represent new variables
which are summaries of original
variables
Approximate distances by air (km) between Australian capital cities

        CAN   SYD  MELB  BRIS  ADEL   PER   HOB   DAR
CAN       0   246   506  1021   976  3126  1120  3409
SYD       .     0   727   775  1185  3339  1075  3163
MELB      .     .     0  1393   651  2804   613  3355
BRIS      .     .     .     0  1961  4114  1852  2886
ADEL      .     .     .     .     0  2152  1264  2727
PER       .     .     .     .     .     0  3417  2951
HOB       .     .     .     .     .     .     0  4186
DAR       .     .     .     .     .     .     .     0
[Figure: two-dimensional MDS ordination of the eight cities, Dimension 1 against Dimension 2; Stress = 0.014]
[Figure: the same ordination with Dimension 2 multiplied by -1; points labelled Darwin, Brisbane, Sydney, Canberra, Melbourne, Adelaide, Hobart and Perth]
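The "Dimension 2 x -1" plot works because MDS only tries to reproduce distances between objects, so reflecting an axis changes nothing that the method cares about. A minimal sketch of that point, using made-up coordinates (the labels A, B, C and their scores are hypothetical, not values read from the slide):

```python
import math

# Hypothetical 2-D ordination scores (NOT the city scores from the slide).
coords = {
    "A": (-1.2, 0.4),
    "B": (0.3, 1.1),
    "C": (0.9, -0.7),
}

def pairwise_distances(points):
    """Euclidean distance for every unordered pair of labels."""
    labels = sorted(points)
    return {
        (a, b): math.dist(points[a], points[b])
        for i, a in enumerate(labels)
        for b in labels[i + 1:]
    }

# Multiply Dimension 2 by -1, i.e. reflect the plot about the Dimension 1 axis.
flipped = {name: (x, -y) for name, (x, y) in coords.items()}

original = pairwise_distances(coords)
reflected = pairwise_distances(flipped)

# Every inter-point distance is the same before and after the reflection.
distances_unchanged = all(
    math.isclose(original[pair], reflected[pair]) for pair in original
)
print(distances_unchanged)
```

The same holds for rotating the whole configuration, which is why the orientation of an MDS plot is arbitrary.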
http://www.boardtheworld.com/resorts/country.php?cc=AU
Haynes & Quinn (unpublished)
• Four sites along Morwell River
– site 1 upstream from planned sewage
outfall
– sites 2, 3 and 4 downstream
– site 3 below fish farm
• Abundance of all species of
invertebrates recorded from 3 stations
at each site
• 12 objects (sampling units):
– 4 sites by 3 stations at each site
• 94 variables (species)
Do invertebrate communities (or
assemblages) differ between stations
and sites?
– Is Site 1 different from rest?
Multidimensional scaling
1. Set up a raw data matrix
(rows: species; columns: site/sample)

Species   S11   S12   S13   S21   S22   S23   etc.
1          54    37    68    60    47    60
2           0     1     2     0     0     0
3           0     0     0     0     0     0
4           5     4     2     0     2     0
5           0     0     0     1     0     0
etc.
2. Calculate a dissimilarity (Bray-Curtis) matrix

        S11    S12    S13    S21    S22    S23   etc.
S11    .000
S12    .203   .000
S13    .666   .652   .000
S21    .216   .331   .759   .000
S22    .328   .410   .796   .191   .000
S23    .336   .432   .796   .183   .054   .000
etc.
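As a rough illustration of step 2, the Bray-Curtis dissimilarity between two samples is the sum of absolute abundance differences divided by the sum of all abundances in both samples. The sketch below uses only the five species shown in step 1 (read as species 1 to 5 for samples S11 and S12, which is an assumption about the flattened layout), so it will not reproduce the .203 in the matrix, which is based on all 94 species.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    if denominator == 0:
        raise ValueError("both samples are empty")
    return numerator / denominator

# Abundances of species 1-5 in samples S11 and S12 (from the step 1 matrix).
s11 = [54, 0, 0, 5, 0]
s12 = [37, 1, 0, 4, 0]

d = bray_curtis(s11, s12)  # (17 + 1 + 0 + 1 + 0) / 101
print(round(d, 3))
```

The value is 0 for identical samples and 1 for samples that share no species.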
3. Decide on number of dimensions
(axes) for the ordination:
– suspected number of underlying
ecological gradients
– match distances between objects on plot
and dissimilarities between objects as
closely as possible
– more dimensions means better match
– usually between 2 and 4 dimensions
4. Arrange objects (eg. sampling units)
initially on ordination plot in chosen
number of dimensions
– starting configuration
– usually generated randomly
[Figure: starting configuration, Axis I against Axis II; symbols distinguish Site 1, Site 2, Site 3 and Site 4]
5. Compare distances between objects on
ordination plot and Bray-Curtis
dissimilarities between objects
– strength of relationship measured by
Kruskal’s stress value
– measures “badness of fit” so lower values
indicate better match
– plot of distances against dissimilarities is called a Shepard plot
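One common way to write Kruskal's stress (his "formula 1") is the square root of the summed squared differences between plot distances and fitted distances, divided by the summed squared plot distances. The sketch below assumes that form and, for simplicity, fits the distances with a straight line (the metric case); the plot distances are hypothetical, and the three dissimilarities are the S11/S12/S13 values from step 2.

```python
import math

def linear_fit(x, y):
    """Fitted values from an ordinary least-squares line of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [intercept + slope * a for a in x]

def stress1(distances, fitted):
    """Kruskal's stress formula 1: sqrt(sum((d - fit)^2) / sum(d^2))."""
    numerator = sum((d - f) ** 2 for d, f in zip(distances, fitted))
    denominator = sum(d ** 2 for d in distances)
    return math.sqrt(numerator / denominator)

# Bray-Curtis dissimilarities for S11-S12, S11-S13 and S12-S13 (step 2).
dissimilarities = [0.203, 0.666, 0.652]
# Hypothetical distances between the same pairs on an ordination plot.
plot_distances = [0.5, 1.4, 1.6]

stress = stress1(plot_distances, linear_fit(dissimilarities, plot_distances))
print(round(stress, 3))
```

A stress of 0 means the points on the Shepard plot fall exactly on the fitted line.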
[Figure: starting configuration (Axis I against Axis II; Sites 1 to 4) beside its Shepard plot of distance against dissimilarity; Stress = 0.394]
6. Move objects on ordination plot
iteratively by method of steepest
descent
– each step improves match between
dissimilarities and distances between
objects on ordination plot
– lowers stress value
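Step 6 can be sketched as a plain gradient (steepest) descent. This is a simplified illustration, not the routine used to produce the plots: it minimises the raw sum of squared differences between plot distances and the Bray-Curtis values from step 2 (a metric criterion rather than Kruskal's normalised stress), and the step-halving rule and random seed are assumptions made so that the run is repeatable.

```python
import math
import random

# Bray-Curtis dissimilarities for S11..S23 from step 2 (lower triangle).
LABELS = ["S11", "S12", "S13", "S21", "S22", "S23"]
LOWER = [
    [],
    [0.203],
    [0.666, 0.652],
    [0.216, 0.331, 0.759],
    [0.328, 0.410, 0.796, 0.191],
    [0.336, 0.432, 0.796, 0.183, 0.054],
]

def dissim(i, j):
    """Dissimilarity between objects i and j (i != j)."""
    return LOWER[max(i, j)][min(i, j)]

def raw_stress(points):
    """Sum over pairs of (plot distance - dissimilarity)^2."""
    return sum(
        (math.dist(points[i], points[j]) - dissim(i, j)) ** 2
        for i in range(len(points)) for j in range(i)
    )

def gradient(points):
    """Partial derivatives of raw_stress with respect to each coordinate."""
    grad = [[0.0, 0.0] for _ in points]
    for i, p in enumerate(points):
        for j, q in enumerate(points):
            d = math.dist(p, q)
            if i == j or d == 0.0:
                continue
            coeff = 2.0 * (d - dissim(i, j)) / d
            for k in range(2):
                grad[i][k] += coeff * (p[k] - q[k])
    return grad

def descend(points, iterations=50, step=0.1):
    """Move points downhill, halving the step until stress decreases."""
    history = [raw_stress(points)]
    for _ in range(iterations):
        grad = gradient(points)
        trial_step = step
        for _halving in range(30):
            trial = [[p[k] - trial_step * g[k] for k in range(2)]
                     for p, g in zip(points, grad)]
            if raw_stress(trial) < history[-1]:
                points = trial
                break
            trial_step /= 2
        history.append(raw_stress(points))
    return points, history

random.seed(1)  # random starting configuration, as in step 4
start = [[random.uniform(-1, 1), random.uniform(-1, 1)] for _ in LABELS]
final, history = descend(start)
print(round(history[0], 3), round(history[-1], 3))
```

Because a move is only accepted when it lowers the criterion, the recorded history never increases, mirroring the falling stress values in the iteration history.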
[Figure: configuration after 20 iterations (Axis I against Axis II) beside its Shepard plot of distance against dissimilarity; Stress = 0.119]
7. Final configuration
• further moving of objects on ordination
plot cannot improve match between
dissimilarities and distances
• stress as low as possible
[Figure: final configuration after 50 iterations (Axis I against Axis II) beside its Shepard plot of distance against dissimilarity; Stress = 0.069]
Iteration history

Iteration   Stress
1           0.394
2           0.368
3           0.357
4           0.351
...         ...
20          0.119
...         ...
49          0.069
50          0.069

Stress of final configuration is 0.069
How low should stress be?
Clarke (1993) suggests:
• > 0.20 is basically random
• < 0.15 is good
• < 0.10 is ideal
– configuration is close to actual
dissimilarities
How many dimensions?
• Increasing no. of dimensions above 4
usually offers little reduction in stress
• 2 or 3 dimensions usually adequate to
get good fit (ie. low stress)
• 2 dimensions straightforward to plot
Types of MDS
• Based on how stress is measured
• Relationship between distance and dissimilarity
[Figure: plot of distance against dissimilarity]
Metric MDS
• stress measured from relationship
between actual dissimilarities and
distances
• but relationship often non-linear
• inefficient?
Non-metric MDS
• stress measured from relationship
between ranks of dissimilarities and
ranks of distances
• similar to Spearman rank correlation
• better for ecological data
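A sketch of the non-metric idea, assuming Kruskal's monotone-regression formulation (one standard approach; other software may differ in detail): only the rank order of the dissimilarities is used, and distances are fitted by the best non-decreasing step function. The pool-adjacent-violators routine and the example numbers below are illustrative, and ties in the dissimilarities are not given any special treatment.

```python
import math

def monotone_fit(values):
    """Pool-adjacent-violators: least-squares non-decreasing fit."""
    blocks = []  # each block is [mean, count]
    for v in values:
        blocks.append([v, 1])
        # Merge backwards while the sequence of block means decreases.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            mean2, n2 = blocks.pop()
            mean1, n1 = blocks.pop()
            blocks.append([(mean1 * n1 + mean2 * n2) / (n1 + n2), n1 + n2])
    fitted = []
    for mean, count in blocks:
        fitted.extend([mean] * count)
    return fitted

def nonmetric_stress(dissimilarities, distances):
    """Stress of distances against a monotone fit on dissimilarity order."""
    order = sorted(range(len(dissimilarities)), key=lambda i: dissimilarities[i])
    ordered = [distances[i] for i in order]
    fitted = monotone_fit(ordered)
    numerator = sum((d - f) ** 2 for d, f in zip(ordered, fitted))
    denominator = sum(d ** 2 for d in ordered)
    return math.sqrt(numerator / denominator)

# Strongly non-linear but perfectly monotone relationship: stress is zero.
curved = nonmetric_stress([0.1, 0.2, 0.4, 0.8], [1.0, 4.0, 9.0, 100.0])

# One pair out of rank order: the two offending distances are averaged.
swapped = nonmetric_stress([0.1, 0.2, 0.3, 0.4], [1.0, 3.0, 2.0, 4.0])
print(curved, round(swapped, 3))
```

The first example shows why the non-metric version suits non-linear relationships: a curved but monotone Shepard plot is not penalised.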
Anderson et al. (1994)
• Effects of substratum type on
recruitment of intertidal estuarine fouling
assemblage
• Six replicate panels of 4 substrata
placed in estuary for 1 month at 2 times
of the year
• 14 species in total recorded
• MDS to examine relationships between
panels
– do substrata appear different in species
composition?
• Bray-Curtis dissimilarity
• Non-metric MDS
[Figure: non-metric MDS ordinations of panels for January and October (stress values shown: 0.126 and 0.116); symbols distinguish concrete, aluminium, plywood and fibreglass]
Comparing groups in MDS
Haynes & Quinn data
• 4 groups (sites) - must be a priori groups
• 3 replicate stations per site (n = 3)
• Are sites significantly different in species
composition?
• Is there an ANOVA-like equivalent for
MDS?
Analysis of similarities ANOSIM
• Uses (dis)similarity matrix
• Because dissimilarities are not normally
distributed, uses ranks of pairwise dissimilarities
• Because dissimilarities are not independent of
each other, uses randomisation test rather than
usual significance testing procedure
• Generates its own test statistic (called R) from rank dissimilarities;
its null distribution is obtained by randomisation
• Available through PRIMER package
– Not SYSTAT nor SPSS
Null hypothesis
Average of rank dissimilarities between objects
within groups = average of rank dissimilarities
between objects in different groups:
$\bar{r}_B = \bar{r}_W$
No difference in species composition between
groups
[Figure: diagram distinguishing within-group dissimilarities from between-group dissimilarities]
Test statistic R
R is the average of rank dissimilarities between objects
in different groups minus the average of rank
dissimilarities between objects within groups, scaled by M/2:
$R = \dfrac{\bar{r}_B - \bar{r}_W}{M/2}$
where $M = n(n-1)/2$ and n is the total number of objects
• R lies between -1 and +1.
• Use a randomization test to generate the probability
distribution of R when H0 is true.
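A minimal sketch of the R statistic and its randomization test, using only the six samples from sites 1 and 2 in the Bray-Curtis matrix shown earlier (so the result is not the four-site value reported for the full data set). Tied dissimilarities are given average ranks; the number of permutations and the seed are arbitrary choices.

```python
import random

LOWER = [  # Bray-Curtis lower triangle for S11, S12, S13, S21, S22, S23
    [],
    [0.203],
    [0.666, 0.652],
    [0.216, 0.331, 0.759],
    [0.328, 0.410, 0.796, 0.191],
    [0.336, 0.432, 0.796, 0.183, 0.054],
]
dissim = {(j, i): LOWER[i][j] for i in range(6) for j in range(i)}
groups = [1, 1, 1, 2, 2, 2]  # site membership of each sample

def average_ranks(values):
    """Ranks from 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        for k in range(pos, end + 1):
            ranks[order[k]] = (pos + end) / 2 + 1
        pos = end + 1
    return ranks

def anosim_r(dissim, groups):
    """R = (mean between-group rank - mean within-group rank) / (M / 2)."""
    pairs = sorted(dissim)
    ranks = average_ranks([dissim[p] for p in pairs])
    within = [r for p, r in zip(pairs, ranks) if groups[p[0]] == groups[p[1]]]
    between = [r for p, r in zip(pairs, ranks) if groups[p[0]] != groups[p[1]]]
    m = len(pairs)  # M = n(n-1)/2
    return (sum(between) / len(between) - sum(within) / len(within)) / (m / 2)

def permutation_p(dissim, groups, n_perm=999, seed=0):
    """Share of label shuffles giving R at least as large as observed."""
    rng = random.Random(seed)
    observed = anosim_r(dissim, groups)
    shuffled = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if anosim_r(dissim, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

r_obs, p_value = permutation_p(dissim, groups)
print(round(r_obs, 3))  # (87/9 - 33/6) / 7.5, about 0.556
```

With two groups of three there are only ten distinct ways to split the samples, so the P value from such a small design cannot get very small however strong the separation.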
Haynes & Quinn ANOSIM
• R = 0.583, P = 0.002 so reject H0.
• Significant differences between sites
• Followed by pairwise ANOSIM comparisons
• Adjusted significance levels
ANOSIM
• Available also for 2 level nested and
factorial designs.
• Primer package.
• Limited to total of 125 objects (e.g. SU’s).
• If 2 groups, n must be > 4 for randomization
procedure.
• Alternative is to use ANOVA on NMDS
axis scores - ANOSIM is better.
Which variables (species) most
important?
• For MDS-type analyses, three methods:
– correlate individual variables (species
abundances) with axis scores
– SIMPER (similarity percentages) to determine
which species contribute most to Bray-Curtis
dissimilarity
– CA and/or CANOCO to simultaneously
ordinate objects and species - biplots
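SIMPER (similarity percentages)
Bray-Curtis dissimilarity between objects j and k:
$\dfrac{\sum_{i=1}^{p} |y_{ij} - y_{ik}|}{\sum_{i=1}^{p} (y_{ij} + y_{ik})}$
Note $\sum$ is summing over each species, 1 to p.
The contribution of species i is:
$\delta_i = \dfrac{|y_{ij} - y_{ik}|}{\sum_{i=1}^{p} (y_{ij} + y_{ik})}$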
Which species discriminate
groups of objects?
• Calculate the average contribution $\delta_i$ of species i over all pairs of objects
between groups
– larger values indicate species contribute more to group
differences
• Calculate standard deviation of $\delta_i$
– smaller values indicate species contribution is
consistent across all pairs of objects
• Calculate ratio of average $\delta_i$ / SD($\delta_i$)
– larger values indicate good discriminating species
between 2 groups
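The three calculations above can be sketched as follows, assuming each species' contribution for a pair of objects is its absolute abundance difference divided by the total abundance of all species in both objects (so the contributions for a pair sum to that pair's Bray-Curtis dissimilarity). The data are the five species shown in the raw data matrix for sites 1 and 2, read as rows of species, which is an assumption about the flattened layout; the sample standard deviation (n - 1 divisor) is also an assumption.

```python
import math

def species_contributions(sample_j, sample_k):
    """Contribution of each species to Bray-Curtis between two samples."""
    total = sum(a + b for a, b in zip(sample_j, sample_k))
    return [abs(a - b) / total for a, b in zip(sample_j, sample_k)]

def simper(group_a, group_b):
    """Per species: (mean contribution, SD, mean / SD) over between-group pairs."""
    per_pair = [species_contributions(j, k) for j in group_a for k in group_b]
    results = []
    for i in range(len(group_a[0])):
        values = [pair[i] for pair in per_pair]
        mean = sum(values) / len(values)
        sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))
        if sd > 0:
            ratio = mean / sd
        else:
            ratio = float("inf") if mean > 0 else 0.0
        results.append((mean, sd, ratio))
    return results

# Species 1-5 abundances per sample (columns of the raw data matrix).
site1 = [[54, 0, 0, 5, 0], [37, 1, 0, 4, 0], [68, 2, 0, 2, 0]]  # S11, S12, S13
site2 = [[60, 0, 0, 0, 1], [47, 0, 0, 2, 0], [60, 0, 0, 0, 0]]  # S21, S22, S23

results = simper(site1, site2)
for species, (mean, sd, ratio) in enumerate(results, start=1):
    print(species, round(mean, 4), round(sd, 4), round(ratio, 2))
```

Species 3 is absent from every sample here, so it contributes nothing and is reported with a ratio of 0.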
Linking biota MDS to
environmental variables
• Are differences between SU’s in species
abundances related to differences in
environmental variables?
• Correlate MDS axis scores with
environmental variables
• BIO-ENV procedure - correlates
dissimilarities from biota with
dissimilarities from environmental variables
BIO-ENV procedure
[Flow diagram]
• Samples by species abundances → Bray-Curtis dissimilarity matrix
• Samples by environmental variables (subsets of variables) → Euclidean dissimilarity matrix
• Rank correlation between the two matrices
– Spearman
– Weighted Spearman
BIO-ENV correlations
• Exploratory rather than hypothesis testing
procedure.
• Tries to find best combination of
environmental variables, ie. combination
most correlated with biotic dissimilarities.
• A priori chosen correlations can be tested
with RELATE procedure - randomization
test of correlation.
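A rough sketch of the BIO-ENV search, assuming plain (unweighted) Spearman correlation and an exhaustive search over all subsets. The biotic dissimilarities are the Bray-Curtis values for S11 to S23 shown earlier; the environmental variable names and values are entirely hypothetical and are treated as already standardised.

```python
import math
from itertools import combinations

# Bray-Curtis lower triangle for S11..S23, flattened row by row.
BIOTIC = [0.203, 0.666, 0.652, 0.216, 0.331, 0.759, 0.328, 0.410,
          0.796, 0.191, 0.336, 0.432, 0.796, 0.183, 0.054]

# HYPOTHETICAL standardised environmental data, one row per sample.
ENV_NAMES = ["depth", "flow", "nitrate"]
ENV = [
    [0.2, 1.1, -0.5],
    [0.4, 0.7, -0.9],
    [1.9, -1.3, 0.3],
    [-0.3, 0.9, 1.2],
    [-0.8, 0.2, 0.6],
    [-1.0, 0.1, -1.4],
]

def ranks(values):
    """Ranks from 1, with ties sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        for k in range(pos, end + 1):
            out[order[k]] = (pos + end) / 2 + 1
        pos = end + 1
    return out

def pearson(x, y):
    """Pearson correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def env_distances(rows, columns):
    """Euclidean distances using only the chosen variables (same pair order)."""
    picked = [[row[c] for c in columns] for row in rows]
    return [math.dist(picked[i], picked[j])
            for i in range(len(rows)) for j in range(i)]

def best_subset(biotic, rows):
    """Try every non-empty subset of variables; keep the best correlation."""
    n_vars = len(rows[0])
    scored = [
        (spearman(biotic, env_distances(rows, columns)), columns)
        for size in range(1, n_vars + 1)
        for columns in combinations(range(n_vars), size)
    ]
    return max(scored)

rho, columns = best_subset(BIOTIC, ENV)
print(round(rho, 3), [ENV_NAMES[c] for c in columns])
```

Because the best subset is picked after looking at every combination, its correlation is optimistic, which is why the procedure is described as exploratory rather than a hypothesis test.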
Vector fitting
• Uses final NMDS configuration rather than
dissimilarity matrix - dependent on dimension
number.
• Calculates vector (direction) through configuration
of samples along which sample scores have max.
correlation with environmental variable (one at a
time).
• Significance testing (H0: no correlation) done with
randomization (Monte-Carlo) test.
• Available in DECODA and PATN.