Correspondence Analysis: Simple (CA) and Detrended (DCA)

Vamsi
Sundus
Shawnalee
What is Correspondence Analysis?
 AKA Reciprocal Averaging (RA).
 Basically: An ordination technique that involves repeatedly
calculating weighted averages.
 Popular only in France (due to Benzécri).
What is Detrended Correspondence Analysis?
 Designed specifically to solve certain problems found when
using CA on ecological data based on “empirical desire to
reshape data closer to the models visualized by ecologists.”
 Popular mainly in the ecological community.
Weighted Means
 A weighted mean arises when some of the values in the data are
repeated: each distinct value is weighted by the number of times it occurs.
 Consider:
 Arithmetic Mean:
$$\frac{1 + 2 + 3 + 4 + 5}{5} = \frac{15}{5} = 3$$
 Weighted Mean; value of 1 found 10 times.
$$\frac{(10 \times 1) + 2 + 3 + 4 + 5}{10 + 1 + 1 + 1 + 1} = \frac{24}{14} = 1.71\ldots$$
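Both calculations on this slide are instances of one general formula. As a sketch, writing $x_i$ for each distinct value and $w_i$ for the number of times it occurs (these symbols are introduced here for illustration and do not come from the slides):

```latex
\[
\bar{x}_w = \frac{\sum_{i} w_i \, x_i}{\sum_{i} w_i}
\]
```

Setting every $w_i = 1$ recovers the arithmetic mean; setting $w_1 = 10$ for the value 1 gives the weighted mean of $24/14$.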
Application of Weighted Means
 Let’s say we had some hypothetical data as follows:
Year     1    2    3    4    5    6    7    8    9   10
Counts 100   90   80   60   50   40   20    5    0    0
Application of Weighted Means
 To find the average lifetime of the species, weight each year by
its count and compute the weighted mean:
1100  2  90  3  80  4  60  5  50  6  40  7  20  8  5
 3.21
100  90  80  60  50  40  20  5
[Figure: the Year/Counts table, with the mean year marked.]
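The lifetime calculation can be checked with a few lines of Python. This is a minimal sketch using the slide's year and count data; the helper name `weighted_mean` is an illustrative choice, not something from the slides.

```python
def weighted_mean(values, weights):
    """Return sum(w * x) / sum(w) for paired values and weights."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


years = list(range(1, 11))                       # years 1..10
counts = [100, 90, 80, 60, 50, 40, 20, 5, 0, 0]  # individuals seen each year

mean_year = weighted_mean(years, counts)
print(round(mean_year, 2))  # 3.21
```

Years with a count of zero contribute nothing to either sum, which is why the slide's formula stops at year 8.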
Application of the CA Algorithm to find “mean species” in a 3-species case
 In practice, though, most ecologists and the like would be
observing multiple species at the same time, and hence have
count data for these multi-species groups such as the following:
Year  Counts A  Counts B  Counts C
1          100         0         0
2           90        10         0
3           80        20         5
4           60        35        10
5           50        50        20
6           40        60        30
7           20        30        40
8            5        20        60
9            0        10        75
10           0         0        90
[Figure: one Year/Counts table per species, each with its mean year marked.]
Step 1
 Start with an arbitrary weighting for the species. It’s pretty kosher to start
from 0.0 to 100.0 in whatever increments are needed.
 In our case, we’ll do (0, 50, 100) for (A, B, C).
 Use this formula for the starting weight of the nth species:
$$100 \times \frac{n - 1}{S - 1}, \quad S = \text{number of species}$$
Step 2
 Use the starter weights (which are essentially arbitrary) to
compute a weighting for each of the years.
Year  Counts A  Counts B  Counts C     Y1
1          100         0         0    0.0
2           90        10         0    5.0
3           80        20         5   14.3
4           60        35        10   26.2
5           50        50        20   37.5
6           40        60        30   46.2
7           20        30        40   61.1
8            5        20        60   82.4
9            0        10        75   94.1
10           0         0        90  100.0
 For Year 1:
$$\frac{0 \times 100 + 50 \times 0 + 100 \times 0}{100 + 0 + 0} = 0.0$$
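The whole Y1 column can be reproduced with a short Python sketch. It assumes the counts and starter weights from the slides; the function name `year_scores` is illustrative, and the code assumes every year has at least one individual recorded.

```python
# One row per year (1-10); columns are the counts for species A, B, C.
counts = [
    (100, 0, 0), (90, 10, 0), (80, 20, 5), (60, 35, 10), (50, 50, 20),
    (40, 60, 30), (20, 30, 40), (5, 20, 60), (0, 10, 75), (0, 0, 90),
]
species_weights = [0, 50, 100]  # starter weights for A, B, C


def year_scores(counts, species_weights):
    """Weighted mean of the species weights for each year, weighted by counts."""
    scores = []
    for row in counts:
        total = sum(row)  # assumed non-zero for every year
        scores.append(sum(w * c for w, c in zip(species_weights, row)) / total)
    return scores


y1 = year_scores(counts, species_weights)
print([round(score, 1) for score in y1])
# [0.0, 5.0, 14.3, 26.2, 37.5, 46.2, 61.1, 82.4, 94.1, 100.0]
```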
Step 3
 We can now calculate a new weighting for each species using
these new year weightings.
$$\frac{0 \times 100 + 5 \times 90 + 14.3 \times 80 + \ldots + 94.1 \times 0 + 100 \times 0}{100 + 90 + \ldots + 20 + 5} = 19.1 \quad \text{(species A)}$$
 Calculate similarly for B, C
Species                                           A      B      C
Old weightings for species (S10)                  0     50    100
New calculated weightings for species (S1a)    19.1   43.9   78.5
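Step 3 is the mirror image of Step 2: each species gets the weighted mean of the year weightings, weighted by its own counts. A minimal Python sketch, assuming the data from the slides (the names `year_scores` and `species_scores` are illustrative). The year weightings are recomputed at full precision here rather than typed in rounded, so that the results round to the slide's values.

```python
# Counts per species across years 1-10, taken from the slides.
counts = {
    "A": [100, 90, 80, 60, 50, 40, 20, 5, 0, 0],
    "B": [0, 10, 20, 35, 50, 60, 30, 20, 10, 0],
    "C": [0, 0, 5, 10, 20, 30, 40, 60, 75, 90],
}
old_weights = {"A": 0, "B": 50, "C": 100}  # S10


def year_scores(counts, weights):
    """Step 2: per-year weighted mean of the species weights."""
    n_years = len(next(iter(counts.values())))
    scores = []
    for t in range(n_years):
        total = sum(counts[s][t] for s in counts)
        scores.append(sum(weights[s] * counts[s][t] for s in counts) / total)
    return scores


def species_scores(counts, years):
    """Step 3: per-species weighted mean of the year weightings."""
    return {
        s: sum(c * y for c, y in zip(col, years)) / sum(col)
        for s, col in counts.items()
    }


s1a = species_scores(counts, year_scores(counts, old_weights))
print({s: round(v, 1) for s, v in s1a.items()})
# {'A': 19.1, 'B': 43.9, 'C': 78.5}
```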
Step 4
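 These new weightings for each species aren’t that useful as they
stand, so we need to rescale them back to 0 to 100, instead
of the current 19.1 to 78.5.
 To do this, apply a simple linear rescaling to the S1a values (19.1, 43.9, 78.5),
where MIN and MAX are the smallest and largest of them:
$$\mathrm{S1b} = \frac{100 \times (\mathrm{S1a} - \mathrm{MIN})}{\mathrm{MAX} - \mathrm{MIN}}$$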
Step 4 cont.
 So, after computing the rescaled values, we find the
following:
Species       A       B       C
S10           0      50     100
S1a        19.1    43.9    78.5
S1b        0.00   41.75  100.00
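The rescaling is easy to verify in Python. This is a minimal sketch using the rounded S1a values shown on the slide; the function name `rescale_0_100` is a hypothetical label for the Step 4 formula.

```python
def rescale_0_100(weights):
    """Map the smallest weight to 0 and the largest to 100, linearly."""
    lo, hi = min(weights), max(weights)
    if hi == lo:
        raise ValueError("all weights are equal; nothing to rescale")
    return [100 * (w - lo) / (hi - lo) for w in weights]


s1a = [19.1, 43.9, 78.5]  # species A, B, C after Step 3
s1b = rescale_0_100(s1a)
print([round(w, 2) for w in s1b])  # [0.0, 41.75, 100.0]
```

Only the middle value carries new information after rescaling, since the extremes are pinned to 0 and 100 by construction.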
Step 5
 This is now one cycle of the CA completed.
 “Weightings for each year are recalculated using the new,
rescaled weightings for the species.”
 Eventually a stable pattern will emerge.
 This usually takes 10-20 iterations.
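Putting Steps 1 to 5 together, the cycle can be repeated until the species weightings stop changing. The following is a sketch under stated assumptions rather than a reference implementation: the function names, the stopping tolerance `tol`, and the cap `max_cycles` are all illustrative choices, and every row and column of the count table is assumed to contain at least one non-zero count.

```python
def weighted_mean(values, weights):
    """Mean of `values`, each weighted by the matching entry of `weights`."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def rescale_0_100(values):
    """Linear rescaling: smallest value maps to 0, largest to 100."""
    lo, hi = min(values), max(values)
    return [100 * (v - lo) / (hi - lo) for v in values]


def reciprocal_averaging(counts, tol=1e-6, max_cycles=100):
    """Repeat Steps 2-4 until the species weightings change by less than tol.

    counts: one row per year, one column per species.
    Returns (species weightings, year weightings, cycles used).
    """
    n_species = len(counts[0])
    species = [100 * j / (n_species - 1) for j in range(n_species)]  # Step 1
    by_species = list(zip(*counts))  # transpose: one tuple per species
    for cycle in range(1, max_cycles + 1):
        years = [weighted_mean(species, row) for row in counts]      # Step 2
        raw = [weighted_mean(years, col) for col in by_species]      # Step 3
        updated = rescale_0_100(raw)                                 # Step 4
        change = max(abs(new - old) for new, old in zip(updated, species))
        species = updated
        if change < tol:
            break
    # Year weightings that match the final species weightings.
    years = [weighted_mean(species, row) for row in counts]
    return species, years, cycle


counts = [
    (100, 0, 0), (90, 10, 0), (80, 20, 5), (60, 35, 10), (50, 50, 20),
    (40, 60, 30), (20, 30, 40), (5, 20, 60), (0, 10, 75), (0, 0, 90),
]
species, years, cycles = reciprocal_averaging(counts)
print([round(w, 2) for w in species], cycles)
```

With three species the two extreme weightings are pinned at 0 and 100 by the rescaling, so the iteration is really just settling the position of species B between them.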
Correspondence Analysis
 That was CA utilized in a simplistic example.
Detrended Correspondence Analysis
• This technique is not purely mathematical
• It’s a series of rules that are used to reshape data to make it
friendlier for analysis.
• Once again, primarily used for ecological data, but can be
extended to anything (data simply can’t contain negative
values).
• The reason this technique is used is to overcome the
arch effect (the horseshoe effect).
Arch Effect (Horseshoe Effect)
• Found in data whenever “PCA or other distance conserving
ordination techniques are applied to data which follow a
continuous gradient, along which there is a progressive
turnover of dominant variables.”
– Such as in ecological succession
• When the first two axes from a distance conserving ordination
are plotted against each other, one finds
an arch shape.
Steps of DCA
 Two major stages
 Ordination by CA (as previous)
 Then get rid of arch effect by brute-force.
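The slides do not spell out the brute-force step. One common description of it is "detrending by segments": cut the first axis into segments and, within each segment, subtract the mean second-axis score so that the arch is flattened. The Python sketch below illustrates only that idea on made-up arch-shaped scores. The function name, the segment count, and the example data are assumptions for illustration, and DECORANA itself does more than this (for example, it also rescales the axes).

```python
def detrend_by_segments(axis1, axis2, num_segments=5):
    """Subtract the segment-wise mean of axis2, with segments cut along axis1."""
    lo, hi = min(axis1), max(axis1)
    width = (hi - lo) / num_segments
    if width == 0:
        width = 1.0  # all axis1 scores equal: everything falls in segment 0
    # Segment index of each point; the maximum goes into the last segment.
    segment_of = [min(int((x - lo) / width), num_segments - 1) for x in axis1]

    sums = [0.0] * num_segments
    sizes = [0] * num_segments
    for seg, y in zip(segment_of, axis2):
        sums[seg] += y
        sizes[seg] += 1
    means = [s / n if n else 0.0 for s, n in zip(sums, sizes)]

    return [y - means[seg] for seg, y in zip(segment_of, axis2)]


# Hypothetical arch: axis 2 rises and falls as axis 1 runs from 0 to 100.
axis1 = list(range(0, 101, 10))
axis2 = [100 - (x - 50) ** 2 / 25 for x in axis1]
detrended = detrend_by_segments(axis1, axis2)
print([round(d, 1) for d in detrended])
```

After detrending, the second-axis scores no longer follow the systematic rise and fall along the first axis, which is the loss of information noted on the following slide.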
Goal (the bold one)
Notice
 There’s a loss of information, specifically the second CA axis,
the Y-axis in this case.
Software
 According to Shaw, the standard software is all based on the same
source code, DECORANA, and is used through some front-end to
it.
 However, there is a package to do this in R (decorana in the vegan package).
Basics in R.
 decorana(veg, iweigh=0, iresc=4, ira=0, mk=26, short=0,
before=NULL, after=NULL)
 veg = data matrix
 iweigh = downweighting of rare species (0 = off). Both CA and DCA are
extremely sensitive to rare species, so this would decrease the
importance of rare species.
 iresc = number of rescaling cycles (0 = no rescaling).
 ira = type of analysis (0 = detrended, 1 = simple, i.e. basic
reciprocal averaging).
 There’s no information to extend this in Shaw, so, leaving it
until a later time.
FIN