Diversity_Index_and_Chi-Squared_Tutorial
Diversity and Distribution of Species
Calculating the index of diversity for species in a
sample and comparing the distribution of species
between samples from two sites
Core Quantitative concepts and skills
Statistics: Simpson’s index of diversity and the
chi-squared test
Prepared for SSAC by
*David McAvity – The Evergreen State College*
© The Washington Center for Improving the Quality of Undergraduate Education. All rights reserved. *2007*
Measure of Diversity
There are a number of different methods for determining the diversity of species. The key idea is to
balance two important components of diversity. One is the richness of the sample and the
other is the evenness of the sample. The richness of a sample is the number of different
types of organisms (species, or other category) present in the sample. The evenness refers
to the balance between the numbers of each type. The following example illustrates these
concepts:
Consider the following data giving the number of different flowers contained in a 10 square
meter plot from three different lawns, A, B and C.
lawn flower    plot A    plot B    plot C
daisy              10        35        47
dandelion          20        42         4
clover             19        23         6
butter cup          8         0         1
thistle             3         0         2
total              60       100        60
Plots A and C are both richer than plot B because they have five species present and plot B
has only three. However, A is more even than C, because the numbers of organisms are more
evenly distributed between the species in A. Plot C has a much larger proportion of daisies
and very few of the others. Notice that plot B actually has more organisms in total.
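The comparison of richness and evenness above can be checked with a short script. This is a minimal sketch, not part of the original tutorial: the function names `richness` and `proportions` are illustrative, and the missing butter cup and thistle entries for plot B are assumed to be zero.

```python
# Flower counts for the three lawn plots (blank plot B entries assumed to be 0).
plots = {
    "A": {"daisy": 10, "dandelion": 20, "clover": 19, "butter cup": 8, "thistle": 3},
    "B": {"daisy": 35, "dandelion": 42, "clover": 23, "butter cup": 0, "thistle": 0},
    "C": {"daisy": 47, "dandelion": 4, "clover": 6, "butter cup": 1, "thistle": 2},
}


def richness(counts):
    """Number of species with at least one organism present."""
    return sum(1 for n in counts.values() if n > 0)


def proportions(counts):
    """Share of the sample belonging to each species that is present."""
    total = sum(counts.values())
    return {species: n / total for species, n in counts.items() if n > 0}


for name, counts in plots.items():
    shares = proportions(counts)
    largest = max(shares, key=shares.get)  # most abundant species in the plot
    print(f"plot {name}: richness={richness(counts)}, "
          f"total={sum(counts.values())}, "
          f"largest share={largest} ({shares[largest]:.0%})")
```

Plots A and C both report a richness of 5, but the largest share in plot C (daisies, 47 of 60) is far bigger than the largest share in plot A (dandelions, 20 of 60), which is what makes C less even.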
Diversity Index
One way to quantify the diversity in a sample is to think about the likelihood of getting two of
the same species if you removed two organisms at random from the sample. If the sample is
very diverse you would be unlikely to get two of the same type of organism. In plot C from
the previous example you would be very likely to get two daisies if you randomly chose two,
whereas in plot A, this would be less likely.
Simpson’s index of diversity is the probability of getting two different species when picking
two organisms from your sample. This is one minus the probability of getting two of the same
kind. For example, suppose you have 4 red and 3 blue balls in a bag. The probability of getting
two reds when drawing out two is $\frac{4}{7}\cdot\frac{3}{6}$ and the probability of getting
two blues is $\frac{3}{7}\cdot\frac{2}{6}$. Notice that each of these has the form

$$\frac{n}{N}\cdot\frac{n-1}{N-1} = \frac{n(n-1)}{N(N-1)}$$

where n is the number of balls of a particular color and N is the total number of balls.
The probability of getting two of a kind is the sum of these terms for each type of ball.
This can be expressed as

$$\sum \frac{n(n-1)}{N(N-1)}$$

So we define Simpson’s index of diversity to be:

$$D = 1 - \sum \frac{n(n-1)}{N(N-1)}$$
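The bag-of-balls arithmetic can be verified exactly with rational numbers. The sketch below is illustrative only; the helper name `prob_same_kind` is an assumption, not something defined in the tutorial.

```python
from fractions import Fraction


def prob_same_kind(counts):
    """Probability that two items drawn without replacement are the same kind.

    Sums n(n-1) / (N(N-1)) over every kind, where N is the total count.
    """
    N = sum(counts)
    return sum(Fraction(n * (n - 1), N * (N - 1)) for n in counts)


bag = [4, 3]  # 4 red balls and 3 blue balls
p_same = prob_same_kind(bag)  # (4/7)(3/6) + (3/7)(2/6)
D = 1 - p_same                # probability of drawing two different colors
print(p_same, D)              # prints: 3/7 4/7
```

Using `Fraction` keeps the result as 3/7 rather than a rounded decimal, so it is easy to compare with the hand calculation.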
Calculating Simpson’s Index of Diversity
Simpson’s index of diversity is

$$D = 1 - \sum \frac{n(n-1)}{N(N-1)}$$

where n is the number of organisms of each type and N is the total number of organisms. Since
this is a probability, its values range between 0 and 1, with 1 being the most diverse
and 0 being the least.
Use a spreadsheet to calculate Simpson’s index of diversity using the data from Plot A
shown in yellow cells below. You should enter a formula to duplicate the results in the peach
cells.
lawn flower            n    n(n-1)
daisy                 10        90
dandelion             20       380
clover                19       342
butter cup             8        56
thistle                3         6
total                 60       874
Index of Diversity  0.75

Yellow = cell with a number in it
Peach = cell with a formula in it
Repeat this calculation for the other two samples of data in the previous example to
see if this measure of diversity fits our intuition about what diversity is.
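For readers who would rather script the calculation than use a spreadsheet, the same steps can be sketched as follows. The function name `simpson_index` is an illustrative choice, and plot B is entered with only its three observed species.

```python
def simpson_index(counts):
    """Simpson's index of diversity: D = 1 - sum(n(n-1)) / (N(N-1))."""
    N = sum(counts)                                 # total number of organisms
    same_kind = sum(n * (n - 1) for n in counts)    # the n(n-1) column, summed
    return 1 - same_kind / (N * (N - 1))


samples = {
    "A": [10, 20, 19, 8, 3],
    "B": [35, 42, 23],
    "C": [47, 4, 6, 1, 2],
}

for name, counts in samples.items():
    print(f"plot {name}: D = {simpson_index(counts):.2f}")
```

For plot A the intermediate sum is 874 and N(N-1) is 60 × 59 = 3540, which gives the 0.75 shown in the spreadsheet.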
Chi-squared test for the distribution of species
Even when two sites have a similar diversity, the species may have different relative
abundances. For example, if we have two bags, one with 3 red balls and 8 blue balls and
another with 8 red balls and 3 blue balls, the diversity of the two bags will be the same, but
the distribution is different. To test whether two sites have a significantly different
distribution of species we do a chi-squared test.
For this example we will compare plots A and B. The first thing to do is to group the data
into categories so that the frequency of each type is greater than or equal to 5. Here clover,
butter cup and thistle are combined into a single category, "other".

Before grouping:

lawn flower    plot A    plot B
daisy              10        35
dandelion          20        42
clover             19        23
butter cup          8         0
thistle             3         0
total              60       100

After grouping:

lawn flower    plot A    plot B
daisy              10        35
dandelion          20        42
other              30        23
total              60       100
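The grouping step can be sketched in code as well. This is a hedged illustration: `pool_categories` is a hypothetical helper, the choice of which rows to pool is copied from the grouped table above rather than derived automatically, and plot B's missing butter cup and thistle counts are assumed to be zero.

```python
def pool_categories(table, to_pool, label="other"):
    """Merge the rows named in to_pool into a single row called label."""
    pooled = {}
    merged = None
    for species, counts in table.items():
        if species in to_pool:
            if merged is None:
                merged = list(counts)
            else:
                merged = [a + b for a, b in zip(merged, counts)]
        else:
            pooled[species] = list(counts)
    if merged is not None:
        pooled[label] = merged
    return pooled


# Counts as [plot A, plot B]; blank plot B entries assumed to be 0.
observed = {
    "daisy": [10, 35],
    "dandelion": [20, 42],
    "clover": [19, 23],
    "butter cup": [8, 0],
    "thistle": [3, 0],
}

grouped = pool_categories(observed, {"clover", "butter cup", "thistle"})
print(grouped)

# Every cell should now meet the "at least 5" rule.
print(all(n >= 5 for row in grouped.values() for n in row))
```

Pooling only butter cup and thistle would still leave a zero in plot B, which is presumably why clover is folded into "other" too.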
Chi-squared test for the distribution of species
Now we form the row totals to find out how many of each species are present in both
plots together. To see if there is a difference in the distribution of species between the two
plots we calculate the chi-squared statistic with the null hypothesis that there is no
difference. If there were no difference we would expect that the distribution in each plot
would be the same as the overall distribution. In the example below the proportion of
daisies in the overall total is 45:160 or about 28%. This means we would expect about 28% of the
flowers in Plot A to be daisies and about 28% of the flowers in Plot B to be daisies. Using the
exact proportion, 45/160 of 60 is 16.875 and 45/160 of 100 is 28.125. In general the entry in
any cell in the table for expected frequencies is (row total)(column total)/(grand total).
Complete the following expected frequency table as shown below by finding formulas for each of
the peach cells.
Observed
lawn flower    plot A    plot B    Total
daisy              10        35       45
dandelion          20        42       62
other              30        23       53
total              60       100      160

Expected
lawn flower    plot A    plot B    Total
daisy          16.875    28.125       45
dandelion       23.25     38.75       62
other          19.875    33.125       53
total              60       100      160
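The rule (row total)(column total)/(grand total) can be applied to every cell at once. The following is a minimal sketch with an illustrative function name, `expected_frequencies`; it stands in for the spreadsheet formulas rather than reproducing them.

```python
def expected_frequencies(table):
    """Expected count for each cell: row total * column total / grand total."""
    row_totals = {species: sum(counts) for species, counts in table.items()}
    col_totals = [sum(col) for col in zip(*table.values())]
    grand_total = sum(col_totals)
    return {
        species: [row_totals[species] * c / grand_total for c in col_totals]
        for species in table
    }


# Observed counts after grouping, as [plot A, plot B].
observed = {"daisy": [10, 35], "dandelion": [20, 42], "other": [30, 23]}

expected = expected_frequencies(observed)
for species, row in expected.items():
    print(species, row)
```

Each expected column still sums to its plot total (60 and 100), which is a quick check that the formulas were entered correctly.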
Calculating chi-squared.
Once you have your expected frequencies you calculate chi-squared using the formula

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

where O is the observed frequency of each cell and E is the expected frequency. Once you have
your chi-squared value you need to calculate the degrees of freedom. In a contingency table this is
given by (N-1)(M-1), where N is the number of rows and M is the number of columns. In our example
there are 3 rows and 2 columns, so there are 2 degrees of freedom. Finally, we compare our value of
chi-squared to the critical value at the 0.05 level of significance. If our value is less than the
critical value, we cannot reject the null hypothesis, i.e. we cannot say that the two sites have
different distributions. However, if our value of chi-squared is greater than the critical value, we
can reject the null hypothesis, i.e. we can say the sites have a different distribution. Critical
values of chi-squared are given in the table of critical values below.

In each of the cells below calculate $(O - E)^2 / E$, then find the grand total, which is the value of
chi-squared. Here the grand total is 13.46. The critical value of chi-squared with two degrees of
freedom at the 0.05 level of significance is 5.99. Since ours is greater, we can say that the two
plots have a significantly different distribution at that level of significance. Now compare plot A
and plot C in the same way.
Chi-squared
lawn flower    plot A    plot B     Total
daisy          2.8009    1.6806    4.4815
dandelion      0.4543    0.2726    0.7269
other           5.158    3.0948    8.2528
total          8.4132    5.0479    13.461
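The chi-squared statistic and its degrees of freedom can also be computed directly from the observed table. This sketch assumes a plain Python implementation (no statistics library); the function name `chi_squared` is illustrative.

```python
def chi_squared(table):
    """Return (chi-squared statistic, degrees of freedom) for a contingency table.

    table maps each row label to a list of observed counts, one per column.
    """
    col_totals = [sum(col) for col in zip(*table.values())]
    grand_total = sum(col_totals)
    statistic = 0.0
    for counts in table.values():
        row_total = sum(counts)
        for observed_count, col_total in zip(counts, col_totals):
            expected_count = row_total * col_total / grand_total
            statistic += (observed_count - expected_count) ** 2 / expected_count
    degrees_of_freedom = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, degrees_of_freedom


# Observed counts after grouping, as [plot A, plot B].
observed = {"daisy": [10, 35], "dandelion": [20, 42], "other": [30, 23]}

stat, df = chi_squared(observed)
print(f"chi-squared = {stat:.2f}, degrees of freedom = {df}")
```

For plots A and B this gives 13.46 with 2 degrees of freedom, matching the grand total in the table. Swapping in the grouped counts for plots A and C would handle the follow-up exercise, provided every cell still has a frequency of at least 5.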
Critical Values of Chi-squared
Chi-Square Table

        level of significance
df      0.050      0.010      0.001
 1    3.84146    6.63490     10.828
 2    5.99147    9.21034     13.816
 3    7.81473    11.3449     16.266
 4    9.48773    13.2767     18.467
 5    11.0705    15.0863     20.515
 6    12.5916    16.8119     22.458
 7    14.0671    18.4753     24.322
 8    15.5073    20.0902     26.125
 9    16.9190    21.6660     27.877
10    18.3070    23.2093     29.588
11    19.6751    24.7250     31.264
12    21.0261    26.2170     32.909
13    22.3621    27.6883     34.528
14    23.6848    29.1413     36.123
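The decision rule can be sketched as a table lookup. Only the first four rows of the table are copied here to keep the example short; the names `CRITICAL_VALUES` and `reject_null` are illustrative assumptions.

```python
# Critical values copied from the table above: df -> {significance level: value}.
CRITICAL_VALUES = {
    1: {0.05: 3.84146, 0.01: 6.63490, 0.001: 10.828},
    2: {0.05: 5.99147, 0.01: 9.21034, 0.001: 13.816},
    3: {0.05: 7.81473, 0.01: 11.3449, 0.001: 16.266},
    4: {0.05: 9.48773, 0.01: 13.2767, 0.001: 18.467},
}


def reject_null(chi_sq, df, alpha=0.05):
    """True if chi_sq exceeds the critical value, i.e. the distributions differ."""
    return chi_sq > CRITICAL_VALUES[df][alpha]


# Plots A and B: chi-squared = 13.461 with 2 degrees of freedom.
for alpha in (0.05, 0.01, 0.001):
    print(alpha, reject_null(13.461, 2, alpha))
```

With these inputs the null hypothesis is rejected at the 0.05 and 0.01 levels but not at 0.001, since 13.461 is below 13.816.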