Introduction


ORDINATION AND GRADIENT ANALYSIS
Lecture 1: Introduction and Concepts of Gradient Analysis
BIO-303
INTRODUCTION
Level of course
Aims of course
What are multivariate data?
What is multivariate data analysis?
Aims of multivariate data analysis
Why do multivariate data analysis?
Terminology
Types of variables
Geometrical models and concept of similarity (dissimilarity or distance)
Gradient analysis concepts
Course topics
LEVEL OF THE COURSE
Approach from practical biological viewpoint, not statistical
theory viewpoint.
Assume no background in matrix algebra, eigenanalysis, or
statistical theory.
“Truths which can be proved can also be
known by faith. The proofs are difficult and
can only be understood by the learned; but
faith is necessary also to the young, and to
those who, from practical preoccupations,
have not the leisure to learn. For them,
revelation suffices.”
Bertrand Russell 1946
The History of Western Philosophy
“It cannot be too strongly emphasised that a long
mathematical argument can be fully understood on first
reading only when it is very elementary indeed, relative to
the reader’s mathematical knowledge. If one wants only the
gist of it, he may read such material once only, but otherwise
he may expect to read it at least once again. Serious reading
of mathematics is best done sitting bolt upright on a hard
chair at a desk. Pencil and paper are indispensable.”
L Savage 1972
The Foundations of Statistics.
BUT:
“A journey of a thousand miles begins with a single step”
Lao Tsu
COURSE PENSUM
Jan Lepš and Petr Šmilauer 2003
Multivariate analysis of ecological data using
CANOCO. Cambridge University Press
and PowerPoint notes for Lectures 1 - 5
and Notes for practicals for Classes 1 - 5
STATUS OF MULTIVARIATE NUMERICAL
DATA ANALYSIS
Basic mathematics of correlation, regression, analysis of variance,
eigenanalysis, randomisation etc. not new, worked out in the 1920s-1930s.
Arithmetic manipulations and calculations involved so numerous and so
time consuming; virtually impossible to work with anything other than
smallest data-sets on hand calculator or early computer.
Development of numerical data analysis closely linked to development of
computers.
Now possible to do in seconds what would have taken hours, days, even
weeks.
Increased availability of computer program packages has advantages and
disadvantages.
Advantages      Disadvantages
• fast          • too fast
• painless      • too easy
• simple        • too simple
Need to understand a technique
well before one can critically
evaluate results. Sound
interpretation requires a good
understanding of the technique.
AIMS
Provide introductory understanding of the most appropriate methods for
the numerical analysis of complex multivariate biological data. Recent
maturation of methods.
Provide introduction to what these methods do and do not do.
Provide some guidance as to when and when not to use particular
methods.
Provide an outline of major assumptions, limitations, strengths, and
weaknesses of different methods.
Indicate to you when to seek expert advice.
Encourage numerical thinking (ideas, reasons, potentialities behind the
techniques). Not so concerned here with numerical arithmetic (the
numerical manipulations involved).
July 18, 1998. Plot 6 (quadrats)
(Rt. Bank, c 300 m S of mouth of Steepbank R., 40m inland)

Species                          #11 #12 #13 #14 #15 #16 #17 #18 #19 #20
Equisetum pratense                 4   -   1   2   -   7  10  13  18  17
Rubus pubescens                   11   4  13  18   4   7  17   -  13   2
R. strigosus                       1   8   1   2  19   8   3   5   2   8
Cornus stolonifera                 6   -   -   1   -   -   1   1   -   1
C. canadensis                      -   -   2   -  12   -   -   1   -   -
Rosa acicularis                    2   2   1   6  11   2   1   -   3   3
Galium boreale                     -   -  12   3  22   -   2   -   1   -
Ribes oxyacanthoides               -   1   -   4  15   -   -   8   -   3
R. triste                          2   9  13   2   -   4  10   6  16   9
Mitella nuda                       -   6   -   -   1   9   -  16  25  19
Mertensia nudicaulis               -  11   6  10   -   2  10   4   1  12
Aralia nudicaulis                  4   -   6   1   3   -   -   1   -   1
Viburnum edule                     2  15   5   6   -   7   4   5   3   4
Calamagrostis canescens            3   3   -   1   1   6  11   8   4   4
Populus balsamifera (seedling)     2   1   -   1   1   2   2   -   1   -
Prunus virginiana (seedling)       -   -   1   -   -   -   -   -   1   -
Populus tremuloides (seedling)     -   -   1   -   1   -   -   1   -   -
Actaea rubra                       -   -   1   -   1   -   -   -   -   1
Circaea alpina                     4   -   1  18   1   3   -   -   2  11
Thalictrum venulosum               3   -   -   -   -   1   1   -   -   -
Matteuccia struthiopteris          -   -   -   -   -   -   -   -   -   2
NO. OF SPECIES                    12  10  14  14  12  12  12  12  13  14
A typical page from a field notebook. This one records observations on the ground vegetation in
Populus balsamifera woodland in the flood plain of the Athabasca River, Alberta.
TYPES OF MULTIVARIATE DATA

Field                    Object (n)                Variable (m)
Botany (plant ecology)   Quadrat, Relevé, Plot     Plant species
Archaeology              Sites                     Artefacts
Geology                  Samples                   Particle-size classes
Chemistry                Stream sediments          Trace elements
Zoology                  Geographical localities   Morphometric characters
Pollen analysis          Sediment samples          Pollen types
Diatom analysis          Sediment samples          Diatom types
Palaeontology            Rock samples              Fossil taxa
...                      ...                       ...
Features in common –
MANY OBJECTS n
MANY VARIABLES m
CAN BE ARRANGED IN DATA MATRIX of
SAMPLES or OBJECTS x VARIABLES
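DATA MATRIX
Variables (m variables) form the rows; samples (n samples) form the columns.

$$
X = \begin{pmatrix}
x_{11} & x_{12} & x_{13} & \cdots & x_{1n} \\
x_{21} & x_{22} & x_{23} & \cdots & x_{2n} \\
\vdots & \vdots & \vdots &        & \vdots \\
x_{m1} & x_{m2} & x_{m3} & \cdots & x_{mn}
\end{pmatrix}
$$

Matrix X has m rows and n columns: by the usual rows x columns convention, an m x n matrix of order (m x n).
Subscripts: $x_{21}$ is the element in row two, column one. In general, $x_{ik}$ is the element in row i, column k.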
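The subscript notation can be tried out with a small array. The sketch below assumes a hypothetical data matrix with m = 3 variables (rows) and n = 4 samples (columns); the values and the helper name `element` are invented for illustration. Python counts from 0, so the element in row two, column one is `X[1, 0]`.

```python
import numpy as np

# Hypothetical data: m = 3 variables (rows) x n = 4 samples (columns).
X = np.array([
    [4, 0, 1, 2],
    [11, 4, 13, 18],
    [1, 8, 1, 2],
])

m, n = X.shape  # number of rows (variables), number of columns (samples)

def element(X, i, k):
    """Return x_ik using the 1-based row i / column k convention."""
    return X[i - 1, k - 1]

x21 = element(X, 2, 1)  # row two, column one
```

Keeping the 1-based helper separate from the raw array makes it harder to confuse the slide's subscripts with Python's 0-based indices.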
FEATURES OF MULTIVARIATE DATA
Complex
Show:
Noise
Redundancy
Internal relationships
Outliers
Some information in the data
is only indirectly interpretable
BIOLOGICAL DATA
ENVIRONMENTAL DATA
many species
fewer variables
+/–, quantitative, often %, many
zero values, skewed
+/–, ranks, quantitative
non-linear responses to
environment
non-normal
linear inter-relationships, often
high correlations, some redundancy
STATISTICS AND DATA ANALYSIS
1. Hypothesis testing ‘confirmatory data analysis’ (CDA).
2. Model building: explanatory; empirical [statistical]
Pielou (1981, Quart. Rev. Biol.)
“Models are often displayed with little or no effort to link them with the
real world. As a result the whole body of knowledge and theory has grown
top-heavy with models... Models are not useless but too much should not be
expected of them. Modelling is only a part, and a subordinate part, of
research.”
3. Hypothesis generation ‘exploratory data analysis’ (EDA). Detective work.
CDA & EDA - different aims, philosophies, methods
“We need both exploratory and confirmatory”.
J W Tukey 1980
[Figure: flow diagrams for EXPLORATORY DATA ANALYSIS and CONFIRMATORY DATA ANALYSIS. Box labels: real world 'facts'; observations, measurements; data; data analysis; patterns; 'information'; hypotheses; statistical testing; hypothesis testing; decisions; theory.]
[Figure: biological data Y; additional (e.g. environmental) data X; underlying statistical model (e.g. linear or unimodal response); exploratory data analysis; description; confirmatory data analysis; testable ‘null hypothesis’; rejected hypotheses.]
[Figure: box labels: observation; induction; theory/paradigm; deduction; scientific H0, scientific HA; prediction; statistical H0, statistical HA; conceptual design of study, choice of format (experimental, non-experimental) and classes of data; sampling or experimental design; data collection; analysis; evaluate statistical H0, HA; evaluate prediction; evaluate scientific H0, HA; evaluate theory/paradigm.]
The Popperian hypothetico-deductive method, after Underwood and others.
H0 = null hypothesis
HA = alternative hypothesis
EXPLORATORY DATA ANALYSIS compared with CONFIRMATORY DATA ANALYSIS

- Exploratory: How can I optimally describe or explain variation in data set?
  Confirmatory: Can I reject the null hypothesis that the species are unrelated to a particular environmental factor or set of factors?
- Exploratory: Samples can be collected in many ways, including subjective sampling.
  Confirmatory: Samples must be representative of universe of interest – random, stratified random, systematic.
- Exploratory: ‘Data-fishing’ permissible, post-hoc analyses, explanations, hypotheses, narrative okay.
  Confirmatory: Analysis must be planned a priori.
- Exploratory: P-values only a rough guide.
  Confirmatory: P-values meaningful.
- Exploratory: Stepwise techniques (e.g. forward selection) useful and valid.
  Confirmatory: Stepwise techniques not strictly valid.
- Exploratory: Main purpose is to find ‘pattern’ or ‘structure’ in nature. Inherently subjective, personal activity. Interpretations not repeatable.
  Confirmatory: Main purpose is to test hypotheses about patterns. Inherently analytical and rigorous. Interpretations repeatable.
A well-designed modern ecological study combines both.
1) Two-phase study
- Initial phase is exploratory, perhaps involving
subjectively located plots or previous data to
generate hypotheses.
- Second phase is confirmatory, collection of
new data from defined sampling scheme,
planned data analysis.
2) Split-sampling
- Large data set (>100 objects), randomly split
into two (75/25) – exploratory set and
confirmatory set.
- Generate hypotheses from exploratory set
(allow data fishing); test hypotheses with
confirmatory set.
- Rarely done in ecology.
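A random 75/25 split of this kind can be sketched in a few lines. The function name `split_sample`, the seed, and the choice of 120 objects are assumptions made for illustration; only the 75/25 proportion comes from the slide.

```python
import numpy as np

def split_sample(n_objects, explore_fraction=0.75, seed=0):
    """Randomly split object indices into exploratory and confirmatory sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_objects)          # random order of all objects
    n_explore = int(round(explore_fraction * n_objects))
    explore = np.sort(order[:n_explore])        # used for hypothesis generation
    confirm = np.sort(order[n_explore:])        # held back for hypothesis tests
    return explore, confirm

explore_idx, confirm_idx = split_sample(120)
```

The confirmatory indices should stay untouched until the hypotheses from the exploratory set have been written down, otherwise the split offers no protection against data fishing.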
Data diving with cross-validation: an investigation of broadscale gradients in Swedish weed communities.
ERIK HALLGREN, MICHAEL W. PALMER and PER MILBERG.
Journal of Ecology, 1999, 87, 1037-1051.
[Flow chart. Box labels: full data set; remove observations with missing data; clean data set; random split; exploratory data set; ideas for more analysis; hypotheses; choice of variables; confirmatory data set; some previously removed data; combined data set; hypothesis tests; analyses for display; results.]
Flow chart for the sequence of
analyses. Solid lines represent
the flow of data and dashed
lines the flow of analysis.
EUROPEAN FOOD
(From A Survey of Europe Today, The Reader’s Digest Association
Ltd.) Percentage of all households with various foods in house at
time of questionnaire. Foods by countries.
GC ground coffee
IC instant coffee
TB tea or tea bags
SS sugarless sugar
BP packaged biscuits
SP soup (packages)
ST soup (tinned)
IP instant potatoes
FF frozen fish
VF frozen vegetables
AF fresh apples
OF fresh oranges
FT tinned fruit
JS jam (shop)
CG garlic clove
BR butter
ME margarine
OO olive, corn oil
YT yoghurt
CD crispbread
Values are listed by country, in the order of the food codes above.

Country  GC  IC  TB  SS  BP  SP  ST  IP  FF  VF  AF  OF  FT  JS  CG  BR  ME  OO  YT  CD
D        90  49  88  19  57  51  19  21  27  21  81  75  44  71  22  91  85  74  30  26
I        82  10  60   2  55  41   3   2   4   2  67  71   9  46  80  66  24  94   5  18
F        88  42  63   4  76  53  11  23  11   5  87  84  40  45  88  94  47  36  57   3
NL       96  62  98  32  62  67  43   7  14  14  83  89  61  81  16  31  97  13  53  15
B        94  38  48  11  74  37  25   9  13  12  76  76  42  57  29  84  80  83  20   5
L        97  61  86  28  79  73  12   7  26  23  85  94  83  20  91  94  94  84  31  24
GB       27  86  99  22  91  55  76  17  20  24  76  68  89  91  11  95  94  57  11  28
P        72  26  77   2  22  34   1   5  20   3  22  51   8  16  89  65  78  92   6   9
A        55  31  61  15  29  33   1   5  15  11  49  42  14  41  51  51  72  28  13  11
CH       73  72  85  25  31  69  10  17  19  15  79  70  46  61  64  82  48  61  48  30
DK       96  17  92  35  66  32  32  11  51  42  81  72  50  64  11  92  91  30  11  34
N        96  17  83  13  62  51   4  17  30  15  61  72  34  51  11  63  94  28   2  62
IRL      13  52  99  11  80  75  18   2   5   3  57  52  46  89   5  97  25  31   3   9

Only 19 of the 20 values survive for the following three countries, and the position of the missing entry cannot be determined from this transcript, so the values are listed in sequence without column alignment:
S:  97 13 93 31 43 43 39 54 45 56 78 53 75 9 68 32 48 2 93
SF: 98 12 84 20 64 27 10 8 18 12 50 57 22 37 15 96 94 17 64
E:  70 40 40 62 43 2 14 23 7 59 77 30 38 86 44 51 91 16 13
Dendrogram showing the results of minimum variance agglomerative cluster
analysis of the 16 European countries for the 20 food variables listed in the table.
Key:
Countries: A Austria, B Belgium, CH Switzerland, D West Germany, DK Denmark, E Spain, F France, GB Great
Britain, I Italy, IRL Ireland, L Luxembourg, N Norway, NL Holland, P Portugal, S Sweden, SF Finland
Correspondence analysis of percentages of households in 16
European countries having each of 20 types of food.
Key:
Countries: A Austria, B Belgium, CH Switzerland, D West Germany, DK Denmark, E Spain, F
France, GB Great Britain, I Italy, IRL Ireland, L Luxembourg, N Norway, NL
Holland, P Portugal, S Sweden, SF Finland
Minimum spanning tree fitted to the full 15-dimensional
correspondence analysis solution superimposed on a
rotated plot of countries from previous figure.
Percentages of people employed in nine different industry groups in Europe. (AGR = agriculture, MIN =
mining, MAN = manufacturing, PS = power supplies, CON = construction, SER = service industries, FIN =
finance, SPS = social and personal services, TC = transport and communications).
Country            AGR   MIN   MAN    PS   CON   SER   FIN   SPS    TC
Belgium            3.3   0.9  27.6   0.9   8.2  19.1   6.2  26.6   7.2
Denmark            9.2   0.1  21.8   0.6   8.3  14.6   6.5  32.2   7.1
France            10.8   0.8  27.5   0.9   8.9  16.8   6     22.6   5.7
W. Germany         6.7   1.3  35.8   0.9   7.3  14.4   5     22.3   6.1
Ireland           23.2   1    20.7   1.3   7.5  16.8   2.8  20.8   6.1
Italy             15.9   0.6  27.6   0.5  10    18.1   1.6  20.1   5.7
Luxembourg         7.7   3.1  30.8   0.8   9.2  18.5   4.6  19.2   6.2
Netherlands        6.3   0.1  22.5   1     9.9  18     6.8  28.5   6.8
UK                 2.7   1.4  30.2   1.4   6.9  16.9   5.7  28.3   6.4
Austria           12.7   1.1  30.2   1.4   9    16.8   4.9  16.8   7
Finland           13     0.4  25.9   1.3   7.4  14.7   5.5  24.3   7.6
Greece            41.4   0.6  17.6   0.6   8.1  11.5   2.4  11     6.7
Norway             9     0.5  22.4   0.8   8.6  16.9   4.7  27.6   9.4
Portugal          27.8   0.3  24.5   0.6   8.4  13.3   2.7  16.7   5.7
Spain             22.9   0.8  28.5   0.7  11.5   9.7   8.5  11.8   5.5
Sweden             6.1   0.4  25.9   0.8   7.2  14.4   6    32.4   6.8
Switzerland        7.7   0.2  37.8   0.8   9.5  17.5   5.3  15.4   5.7
Turkey            66.8   0.7   7.9   0.1   2.8   5.2   1.1  11.9   3.2
Bulgaria          23.6   1.9  32.3   0.6   7.9   8     0.7  18.2   6.7
Czechoslovakia    16.5   2.9  35.5   1.2   8.7   9.2   0.9  17.9   7
E. Germany         4.2   2.9  41.2   1.3   7.6  11.2   1.2  22.1   8.4
Hungary           21.7   3.1  29.6   1.9   8.2   9.4   0.9  17.2   8
Poland            31.1   2.5  25.7   0.9   8.4   7.5   0.9  16.1   6.9
Romania           34.7   2.1  30.1   0.6   8.7   5.9   1.3  11.7   5
USSR              23.7   1.4  25.8   0.6   9.2   6.1   0.5  23.6   9.3
Yugoslavia        48.7   1.5  16.8   1.1   4.9   6.4  11.3   5.3   4
Source: Euromonitor (1979, pp.
76-7) with the percentage
employed in finance in Spain
reduced from 14.7 to the more
reasonable figure of 8.5
[Two figures: correspondence analysis plots]
WHY DO MULTIVARIATE DATA ANALYSIS?
1: Data simplification and data reduction
'signal from noise'
2: Detect features that might otherwise escape attention.
3: Hypothesis generation and prediction.
4: Data exploration as aid to further data collection.
5: Communication of results of complex data.
Ease of display of complex data.
6: Aids communication and forces us to be explicit.
“The more orthodox amongst us should at least reflect that many of the same
imperfections are implicit in our own cerebrations and welcome the exposure
which numbers bring to the muddle which words may obscure”.
D Walker (1972)
7: Tackle problems not otherwise soluble. Hopefully better science.
8: Fun!
“General impressions are never to be trusted. Unfortunately
when they are of long standing they become fixed rules of life, and
assume a prescriptive right not to be questioned. Consequently
those who are not accustomed to original inquiry entertain a
hatred and a horror of statistics. They cannot endure the idea of
submitting their sacred impressions to cold-blooded verification.
But it is the triumph of scientific men to rise superior to their
superstitions, to desire tests by which the value of their beliefs
may be ascertained, and to feel sufficiently masters of themselves
to discard contemptuously whatever may be found untrue.”
Francis Galton
Quoted from Quotes, Damned Quotes and...
compiled by J Bibby Edinburgh: John Bibby
(Books)
TERMINOLOGY
Sample, object, individual 'sampling unit'
                      Statistician     Others
Single unit           Sampling unit    Sample
Collection of units   Sample           Sample set
Variable, character, attribute
Algorithms, methods, models, programs
Classification, clustering, partitioning, scaling, gradient analysis
[assignment, identification, discrimination]
[dissection]
Objective, repeatable
TYPES OF VARIABLES
1) Numeric, quantitative, continuous variables
2) Nominal and ordinal variables (qualitative multistate)
Nominal 'disordered multistate' (e.g. red, white, blue)
Ordinal 'ordered multistate' (e.g. dry, moist, wet)
3) Binary or dichotomous variables +/- (e.g. male, female)
4) Conditionally present variables
e.g. 3 species - A, B, C
Only A & B have petals: A pink petals, B white petals

                A   B   C
Pink petals     +   -   -
White petals    -   +   -
No petals       -   -   +
5) Mixed data
nominal disordered
GEOMETRICAL MODELS
Pollen data - 2 pollen types (variables) x 15 samples
Depths are in centimetres, and the units for pollen frequencies may be either in grains counted or percentages.

Sample   Depth   Type A   Type B
1            0       10       50
2           10       12       42
3           20       15       47
4           30       17       38
5           40       18       43
6           50       22       37
7           60       23       35
8           70       26       26
9           80       35       23
10          90       37       22
11         100       43       18
12         110       38       17
13         120       47       15
14         130       42       12
15         140       50       10
Adam (1970)
ALTERNATE REPRESENTATIONS OF THE
POLLEN DATA
Palynological representation
Geometrical representation
In (a) the data are plotted as a standard diagram, and in (b) they are plotted using
the geometric model. Units along the axes may be either pollen counts or
percentages.
Adam (1970)
Geometrical model of a vegetation
space containing 52 records (stands).
A:
A cluster within the cloud of points (stands)
occupying vegetation space.
B:
3 dimensional abstract vegetation space: each
dimension represents an element (e.g.
proportion of a certain species) in the analysis
(X Y Z axes).
A, the results of a classification approach (here
attempted after ordination) in which similar
individuals are grouped and considered as a single
cell or unit.
B, the results of an ordination approach in which
similar stands nevertheless retain their unique
properties and thus no information is lost (X1 Y1 Z1
axes).
N. B. Abstract space has no connection with real
space from which the records were initially
collected.
CONCEPT OF SIMILARITY, DISSIMILARITY,
DISTANCE & PROXIMITY
$s_{ij}$ - how similar object i is to object j
Proximity measure -> DC or SC
Dissimilarity = distance
Convert $s_{ij} \rightarrow d_{ij}$:
$s_{ij} = C - d_{ij}$, where C is a constant
$d_{ij} = \sqrt{1 - s_{ij}}$
$d_{ij} = (1 - s_{ij})$
$s_{ij} = \dfrac{1}{1 + d_{ij}}$
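A minimal sketch of these conversions follows, assuming they read d = C - s, d = sqrt(1 - s), and s = 1 / (1 + d), with similarities scaled to lie between 0 and 1. The function names are hypothetical.

```python
import math

def d_from_s_linear(s, C=1.0):
    """Dissimilarity from similarity via s = C - d, i.e. d = C - s."""
    return C - s

def d_from_s_sqrt(s):
    """Dissimilarity as the square root of (1 - s); assumes 0 <= s <= 1."""
    return math.sqrt(1.0 - s)

def s_from_d_reciprocal(d):
    """Similarity from a non-negative distance via s = 1 / (1 + d)."""
    return 1.0 / (1.0 + d)

# Two identical objects: similarity 1 corresponds to distance 0.
d_identical = d_from_s_linear(1.0)
s_identical = s_from_d_reciprocal(0.0)
```

The reciprocal form is convenient for distances with no upper bound, because it maps any d >= 0 into the interval (0, 1].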
BASIC CONCEPTS OF GRADIENT ANALYSIS
TYPES OF GRADIENT ANALYSIS
Relation of species to environmental variables or gradients.
Individualistic species responses
[Diagram: (1) gradient analysis and (2) bioindication, each linking an environmental gradient and a biological community.]
Gradient analysis (1) - methods for relating species composition to measured
or hypothetical environmental gradients. Techniques of classification and/or
ordination for many species. Techniques of regression for one or a few
species.
TYPES OF GRADIENTS
1. Direct gradients: Influence organisms but are not utilised.
   - correspond to ecological conditions (e.g. soil pH, temperature)
2. Resource gradients: Utilised resources
   - correspond to limiting resources (e.g. nitrogen)
3. Indirect or complex gradients: Covarying direct and/or resource
   gradients. Impossible to separate effects of single gradients.
   - most observed gradients (e.g. altitude, geographical position, slope,
   aspect).
UNDERLYING RESPONSE MODELS
A straight line displays the linear
relation between the abundance
value (y) of a species and an
environmental variable (x), fitted
to artificial data (●). (a = intercept;
b = slope or regression coefficient).
A Gaussian curve displays a
unimodal relation between
the abundance value (y) of
a species and an environmental variable (x). (u =
optimum or mode; t =
tolerance; c = maximum =
exp(a)).
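The two response models can be written as short functions. This is a sketch using the symbols from the captions (a, b for the line; c, u, t for the Gaussian curve), with the Gaussian curve assumed to have the usual form y = c * exp(-0.5 * (x - u)^2 / t^2). The parameter values are invented.

```python
import numpy as np

def linear_response(x, a, b):
    """Linear model: abundance y = a + b * x (a = intercept, b = slope)."""
    return a + b * x

def gaussian_response(x, c, u, t):
    """Gaussian (unimodal) model with maximum c, optimum u and tolerance t."""
    return c * np.exp(-0.5 * ((x - u) / t) ** 2)

x = np.linspace(0.0, 10.0, 101)          # hypothetical environmental gradient
y_lin = linear_response(x, a=2.0, b=0.5)
y_gau = gaussian_response(x, c=20.0, u=5.0, t=1.5)
```

At one tolerance away from the optimum (x = u ± t) the Gaussian curve has fallen to exp(-0.5), about 61%, of its maximum.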
SPECIES ALONG GRADIENTS
Hypothetical linear responses of species
abundance to an environmental gradient
Hypothetical unimodal responses of species
abundance to an environmental gradient
SPECIES RESPONSES ALONG GRADIENTS
Overlapping Gaussian unimodal curves of species responses to an environmental factor
Kent & Coker (1992)
SPECIES RESPONSES
Species generally have non-linear responses along environmental gradients.
J. Oksanen (2002)
LINEAR MODELS ARE CLEARLY INADEQUATE
The slope, sign, and significance depend on the studied range along the gradient.
J. Oksanen (2002)
Scatterplot of
abundance, measured
as cover, of four
species in relation to a
gradient. Each point is
a plot. Least-square
linear regression lines
and fitted envelope
lines are shown
McCune & Grace (2002)
UNIMODAL MODEL
The Gaussian unimodal curve of plant species response to a single environmental
factor and zones of tolerance.
Kent & Coker (1992)
GAUSSIAN OR UNIMODAL RESPONSE
Three interpretable parameters:
1. Location of optimum u on
gradient x
2. Expected height h at the
optimum
3. Width (tolerance) t of the
response
J. Oksanen (2002)
GAUSSIAN RESPONSE: A CASE OF GLM
Can be reparametrised as a generalised
linear model:
• Gradient as a 2nd degree polynomial.
• Logarithmic link function.
J. Oksanen (2002)
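The reparametrisation can be made concrete. Taking logs of y = h * exp(-0.5 * (x - u)^2 / t^2) gives log(y) = b0 + b1*x + b2*x^2 with b2 = -1/(2 t^2), b1 = u/t^2 and b0 = log(h) - u^2/(2 t^2). The sketch below converts between the two parameter sets; the coefficient names b0, b1, b2 and the function names are assumptions for illustration. In practice the coefficients would be estimated by fitting the GLM, which is not shown here.

```python
import numpy as np

def gaussian_to_polynomial(h, u, t):
    """Coefficients (b0, b1, b2) of log(y) = b0 + b1*x + b2*x**2."""
    b2 = -1.0 / (2.0 * t ** 2)
    b1 = u / t ** 2
    b0 = np.log(h) - u ** 2 / (2.0 * t ** 2)
    return b0, b1, b2

def polynomial_to_gaussian(b0, b1, b2):
    """Recover height h, optimum u and tolerance t; needs b2 < 0."""
    if b2 >= 0:
        raise ValueError("b2 must be negative for a unimodal response")
    u = -b1 / (2.0 * b2)
    t = np.sqrt(-1.0 / (2.0 * b2))
    h = np.exp(b0 - b1 ** 2 / (4.0 * b2))
    return h, u, t

b0, b1, b2 = gaussian_to_polynomial(h=20.0, u=5.0, t=1.5)
h, u, t = polynomial_to_gaussian(b0, b1, b2)  # round trip back to 20, 5, 1.5
```

A non-negative quadratic coefficient means the fitted curve has no maximum, so no optimum or tolerance can be derived from it.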
EVIDENCE FOR GAUSSIAN RESPONSE MODEL
• R.H. Whittaker described many response types: multimodal,
skewed, flat, plateaux, and symmetric.
• Only a small part of responses were regarded as symmetric, but
symmetric unimodal model became the standard response
model.
• First popularised in coenocline simulations (Gauch & Whittaker,
1972).
• Species packing model is the theoretical basis of (canonical)
correspondence analysis.
• Must be tested with field data.
SPECIES PACKING MODEL
Species have Gaussian responses and divide the gradient optimally:
• Equal heights h
• Equal widths t
• Evenly distributed optima u
J. Oksanen (2002)
Assumes species evolve to occupy maximally separate niches with respect to
a limiting resource or gradient.
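A species packing coenocline is easy to simulate. The sketch below assumes Gaussian responses with equal height h, equal width t and evenly spaced optima u along a hypothetical gradient; the function name and all numerical values are invented for illustration.

```python
import numpy as np

def species_packing(n_species, gradient, h=10.0, t=1.0):
    """Return optima and a species x sites matrix of expected abundances."""
    # Evenly distributed optima spanning the sampled gradient.
    optima = np.linspace(gradient.min(), gradient.max(), n_species)
    # Broadcasting: rows are species, columns are sites along the gradient.
    z = (gradient[None, :] - optima[:, None]) / t
    return optima, h * np.exp(-0.5 * z ** 2)

gradient = np.linspace(0.0, 10.0, 51)
optima, abundance = species_packing(6, gradient)
```

Each row of `abundance` is one species' response curve, so plotting the rows against `gradient` would give a set of identical, evenly spaced bell curves.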
SHAPE OF RESPONSE MATTERS
Fundamental response may be symmetric, but realised response can be
skewed or even multimodal due to interactions.
J. Oksanen (2002)
POPULAR SPECIES RESPONSE MODELS
• Gaussian response model: the most popular model that gives
symmetric responses, and it is the basis of much of theory of
ordination and gradient analysis.
• Beta response: able to produce responses for individual species
of varying skewness and kurtosis, and challenges the Gaussian
model.
• HOF response: a family of hierarchical models which can
produce skewed, symmetric, or different monotonic responses,
and can be used to analyse the response shape of each species.
• GAM models: can find any smooth shape and fit any kind of
smooth response to each species.
ZERO TRUNCATION PROBLEM
The zero truncation problem
A 'solid' response curve. Points
represent a species abundance
Beyond extremes of species tolerance, only zero values are possible. Once species
is absent, we have no information about how unfavourable the environment is for
the species.
FEATURES OF BIOLOGICAL
COMMUNITY DATA
1. Many species.
2. Many sites.
3. Contain many zero values (absences).
4. Non-zero values may be presences (1) or quantitative estimates or measures
   of abundance (e.g. cover, density, frequency, biomass).
5. Quantitative data are highly variable, invariably show a skewed distribution,
   even the 'dust bunny' distribution.
6. Biological data rarely, if ever, follow normal or log-normal distributions.
7. Species generally have a non-linear hump-shaped relationship with the
   environment.
8. Relationships among species typically non-linear.
9. Zero-truncation problem limits species abundance as a measure of
   environmental favourability. When a species is absent, we have no
   information about how unfavourable the environment is for that species.
DUST BUNNY DISTRIBUTION
Bruce McCune & James B. Grace (2002) Analysis of Ecological Communities
Dust Bunny distribution
Resembles the fluffy accumulations of dirt and dust that
are found in the corners of all but the tidiest spaces.
McCune & Grace
(2002)
Multivariate species space can be imagined as an m-dimensional hyperspace. If
we extend the dust bunny to m dimensions, most sites will lie along the
corners of the m-dimensional space, just like the dust in the 3-dimensional
space of a room.
A 3-dimensional space has 2-D corners and 3-D corners;
an m-dimensional space has 2-, 3-, 4-, ... m-dimensional corners.
A site that contains 10 of 100 species lies in a 90-dimensional
corner of the 100-dimensional space.
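The corner a site occupies can be found by counting its absent species. In the sketch below the site vector is invented (100 species, 10 of them present) and the function name is hypothetical.

```python
import numpy as np

def corner_dimension(site_abundances):
    """Number of absent species = dimension of the corner the site lies in."""
    site = np.asarray(site_abundances)
    return int(np.count_nonzero(site == 0))

# Hypothetical site: 100 species, only the first 10 present.
site = np.zeros(100)
site[:10] = [3, 1, 7, 2, 5, 1, 1, 9, 4, 2]
dim = corner_dimension(site)
```

Applied to every site in a community matrix, the same count shows how heavily the data are concentrated in the corners of species space.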
REAL DATA
Clearly raises major problems for data analysis: nothing to do with standard statistical distributions!
COMMUNITY AND CONTINUUM CONCEPTS
Community concept
F.E. Clements
Discrete units, plant associations, alliances, etc
Plant sociology
Continuum concept
H.A. Gleason
Species distributed as a continuum in response to
variation in the environment in time and space
Individualistic concept of plant community
MODELS OF TAXA ALONG GRADIENTS
Kent & Coker
(1992)
COMMUNITY - UNIT THEORY
R.H. Whittaker (1956) Ecological Monographs 26: 1-80
Kent & Coker (1992)
Topographic distribution of vegetation types on an idealised west-facing mountain
and valley in the Great Smoky Mountains, USA.
Community - unit - vegetation of an area is distributed as a mosaic, so-called
climax pattern.
Broadly similar environmental factors (both abiotic and biotic) occur and repeat
themselves in an area. Not all areas can be placed in one or other
type. Boundaries may be distinct or vague (ecotones).
60 - 80% can be unambiguously assigned to particular vegetation types.
20 - 40% can not be assigned because they are transitional or ecotonal areas
between types.
Community - unit  vegetation - landform unit. Features of landscape.
Noda - reference points in a continuum of either geographical or environmental
space.
Continuum of species responses in environmental space or gradients. Features
of underlying environment.
THEORETICAL PROPOSITIONS
Gauch and Whittaker (1972) Coenocline simulations. Ecology 53: 446-451
1. Response curves are generally of Gaussian form, with a single
   peak or mode, symmetric, and gradually tapering tails.
2. Modes of 'minor' species have a uniform random distribution
   along gradients.
3. Modes of 'major' or dominant species are evenly distributed.
4. Abundance at the mode has either a log-random or log-normal
   distribution.
5. Aspects of alpha-diversity (plot diversity) for groups of species
   defined by structural form, have unimodal trends along
   environmental gradients.
FIELD DATA TESTS
Minchin (1989) Vegetatio 83: 97-110
100 species, Mt Field, Tasmania, 438 quadrats at 100m intervals along 17
west-east traverses. Recorded altitude and soil drainage.
1. Responses for 52 commonest species (HOF models)

   Response shape        No. of species
   Skewed                11
   Symmetric unimodal    22
   Plateau                4
   Monotonic             12
   No fit                 3
HOF RESPONSE MODELS

$$
\mu = \frac{M}{[1 + \exp(a + bx)]\,[1 + \exp(c - dx)]}
$$
Huisman-Olff-Fresco: a set of five
hierarchic models with different shapes.
J. Oksanen (2002)
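The most general HOF shape can be evaluated directly. This sketch assumes the formula on the slide reads M / ([1 + exp(a + bx)][1 + exp(c - dx)]), where M is the maximum attainable value; the function name and the parameter values are invented for illustration.

```python
import numpy as np

def hof_model_v(x, M, a, b, c, d):
    """Expected response for the most general (skewed) HOF model."""
    x = np.asarray(x, dtype=float)
    return M / ((1.0 + np.exp(a + b * x)) * (1.0 + np.exp(c - d * x)))

x = np.linspace(-5.0, 5.0, 11)
# With a == c and b == d the curve is symmetric about x = 0.
y_sym = hof_model_v(x, M=100.0, a=-2.0, b=1.0, c=-2.0, d=1.0)
# Unequal b and d give a skewed response.
y_skew = hof_model_v(x, M=100.0, a=-2.0, b=0.5, c=-2.0, d=2.0)
```

Constraining the parameters (for example setting d equal to b) yields the simpler shapes in the hierarchy, which is what makes the models comparable with statistical criteria.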
HOF: INFERENCES ABOUT RESPONSE SHAPES
• Alternative models differ only
in response shape.
• Selection of parsimonious
model with statistical criteria.
• 'Shape' is a parametric concept,
and parametric HOF models
may be the best way of
analysing differences in
response shapes.
J. Oksanen (2002)
FIELD DATA (continued)
2. Modes of 'minor' species have uniform random distribution. Rejected for
   full set of species, but supported for all but one of the structural groups
   (trees, shrubs, herbs, graminoids, pteridophytes). Rejected for herbs.
3. Modes of 'major' species evenly distributed. Rejected. Actually were
   randomly distributed.
4. Abundances at mode are log-random or log-normal. Rejected for full set
   of species, but supported for all but one of the structural groups
   (shrubs).
5. Total alpha diversity has complex pattern but is unimodal for all
   structural groups.
Conclusions - simple Gauch and Whittaker coenocline a remarkably
good approximation to 'reality'.
Five simulated coenoclines.
A-D have species turnovers
of 1.1, 2.2, 4.4, and 8.9
half-changes. E is identical
to D except that sampling
errors have been introduced.
Gauch (1982)
REAL WORLD (ALMOST!)
• In Danish beech forests, dominant species skew other species away.
• Austin predicted skewed responses at gradient ends.
• A measured environmental gradient would be nice instead of an indirect
  DCA axis of hypothetical latent variation.
• 60% responses symmetric, 29% skewed, 8% monotonic, 3% no fit.
Lawesson & Oksanen (2002) J. Veg. Sci. 13: 279-290
J. Oksanen (2002)
GRADIENTS AND LANDSCAPES
Austin & Smith (1989)
Patterns of co-occurrence of four species on a landscape along an indirect environmental
gradient altitude; note continuous variation of composition along altitude gradient. A
plant community is a landscape concept and recognition of communities depends on the
frequency of environmental combinations in a particular landscape.
Important distinction between geographical distribution and environmental
distribution.
In geographical space, have species associations A, AB, B, C, and D with BC and
CD as 'ecotones'. These associations are a consequence of the spatial pattern of
the landscape.
In environmental space, find a continuum of species A, B, C, and D replacing
each other with increasing altitude.
Communities or associations are a function of the landscape examined.
Continuum concept applies to the environmental space, and not necessarily
to geographical distance on the ground or to any indirect or complex
environmental gradient.
Community concept and continuum concept not alternatives but are features
of vegetation viewed in different ways - geographical space or environmental
space.
DIVERSITY COMPONENTS
R.H. Whittaker suggested several concepts of diversity:
- α: diversity on a sample plot, or 'point' diversity (or within-habitat
diversity)
- β: diversity along ecological gradients (or between-habitat diversity).
Differentiation diversity. Many meanings - poorly understood. Cannot
be estimated unless there are known environmental gradients or the
underlying latent structure of the data has been recovered.
- γ: diversity among parallel gradients or classes of environmental
variables. Product of α-diversity of communities and β-differentiation
among them.
- : the total regional diversity of an area: sum of all previous
components. Applicable to broad biogeographical areas. 'Species pool'.
In practice,  and  diversities are rarely distinguished.  is often used to
designate the total diversity of a landscape, geographical area, or
island.
MANY FACES OF BETA DIVERSITY
What are we talking about when we are talking about beta diversity?
1. General heterogeneity of a community.
2. Change of similarity with gradient separation.
3. Widths of species responses along gradients.
4. Rate of change in community composition along gradients. Requires
   environmental gradients to be measured or the underlying latent
   structure of the data to be recovered.
5. Degree of contrast in composition among samples of a set taken from
   a landscape.
6. Dissimilarity between sets of samples.
7. Extent of differentiation of communities along habitat gradients.
APPLICATIONS OF BETA DIVERSITY IN
GRADIENT ANALYSIS
1. Direct gradient - β-diversity is the amount of change in species
   composition along a directly measured gradient in environment
   or time.
2. Indirect gradient - β-diversity is the length of a presumed
   environmental or temporal gradient as measured or estimated
   by the species assemblage.
3. No specific gradient - β-diversity measures compositional
   heterogeneity without reference to a single gradient.
CLASSIFICATION AND (?OR) ORDINATION
Traditionally classification associated with the discontinuum or
community concept of vegetation and ordination with the continuum
concept.
Reflects the history of the methods but is the distinction valid?
Approaches are complementary: choice depends on the aim of the
study.
Vegetation mapping -
classification is necessary.
Fine-scale studies -
ordinations are necessary to find repeatable
patterns and discontinuities in composition.
Not a case of Classification or Ordination, but Classification and/or
Ordination.
COURSE TOPICS
Topic                                               Lecture      Practical
Introduction                                        Lecture 1
Data preparation and formatting                                  Practical 1
Classification                                      Lecture 2    Practical 2
Ordination (indirect gradient analysis)             Lecture 3    Practical 3
Constrained ordination (direct gradient analysis)   Lecture 4    Practical 4
Hypothesis testing                                  Lecture 5    Practical 5
Overview                                            Lecture 5    -