Descriptive Statistics - University of Alberta
Choosing and using
statistics to test ecological
hypotheses
Botany 332 Lab Tutorial
Department of Biological Sciences
University of Alberta
November 2004
[Flowchart, after Underwood (1997)]
OBSERVATIONS: Patterns in space or time
MODELS: Explanations or theories
HYPOTHESIS: Predictions based on model
NULL HYPOTHESIS: Logical opposite to hypothesis
EXPERIMENT: Critical test of null hypothesis
INTERPRETATION
– Reject H0 (Null Hypothesis): Support hypothesis and model
– Retain H0 (Null Hypothesis): Refute hypothesis and model
Ecological experiments
1. OBSERVE things.
2. Come up with MODELS (explanations or theories) to explain your observations.
3. Based on your model, come up with a testable HYPOTHESIS (and a NULL hypothesis).
4. Design an EXPERIMENT to test your null hypothesis statistically.
5. Conduct the experiment and collect DATA.
6. Use STATISTICS with your data to TEST the null hypothesis.
7. INTERPRET your results. Did you retain or reject the null hypothesis?
8. Repeat!
Testing (null) hypotheses
statistically
Recall we can’t prove our hypothesis, so
we try to disprove a null hypothesis
instead!
Null hypothesis = opposite of our actual
hypothesis
– H0 = Null Hypothesis
– HA = Alternative hypothesis
Testing (null) hypotheses
statistically
We formally test hypotheses using
statistics
Which statistical test to use? Depends on
your experimental design, data and your
hypotheses
It’s important to understand the basics of
statistical hypothesis testing
Testing (null) hypotheses
statistically
Based on assumptions about the data, statistics
tell us the probability of getting results at least
as extreme as ours if the null hypothesis were
true (P-value).
If P is small enough, we can reject the null
hypothesis (result is “statistically significant”).
What’s “small enough”? By convention:
– P < 0.05
Reject null hypothesis (supports our hypothesis)
– P ≥ 0.05
Retain null hypothesis (our hypothesis is not supported)
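A permutation test makes the meaning of a P-value concrete: it counts how often randomly relabelled data give a difference at least as large as the one observed. The sketch below is only an illustration in Python (the lab itself uses SPSS); the function name and the seeds-per-pod values are hypothetical.

```python
import random
import statistics

def permutation_p_value(group_a, group_b, n_shuffles=10_000, seed=1):
    """Two-sided permutation P-value for a difference in group means."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = abs(statistics.mean(group_a) - statistics.mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # relabel individuals as if H0 were true
        diff = abs(statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add 1 to both counts so the estimate is never exactly zero
    return (extreme + 1) / (n_shuffles + 1)

# Hypothetical seeds-per-pod counts, invented for illustration only
outcrossed = [12, 15, 11, 14, 16, 13, 15, 12]
inbred = [9, 11, 10, 8, 12, 10, 9, 11]

p = permutation_p_value(outcrossed, inbred)
alpha = 0.05  # conventional significance threshold
decision = "reject H0" if p < alpha else "retain H0"
print(f"P = {p:.4f}: {decision}")
```

The threshold of 0.05 is a convention, not a law; the decision line simply applies the rule from the slide.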
Testing (null) hypotheses
statistically
Many statistical methods also tell us the effect
size or proportion of variation in the dependent
variable explained by the independent variable.
e.g. Regression and correlation
– P-values
H0 = No relationship between variables
HA = Relationship between variables
– R² (variation explained)
– Can have significant P-values but very small R²
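With a large sample, a weak relationship can be highly significant and still explain little variation. The sketch below simulates such a case in Python. The data are randomly generated (not from the lab), and the P-value uses Fisher's z, a large-sample approximation chosen here to keep the example dependency-free; a statistics package would normally use the t distribution.

```python
import math
import random
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def approx_p_value(r, n):
    """Two-sided P for H0: no relationship (Fisher z approximation)."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))

# Simulated data: a weak true slope buried in a lot of noise
rng = random.Random(42)
n = 2000
x = [rng.gauss(0, 1) for _ in range(n)]
y = [0.2 * xi + rng.gauss(0, 1) for xi in x]

r = pearson_r(x, y)
r_squared = r ** 2  # for one predictor, R² is the squared correlation
p = approx_p_value(r, n)
print(f"R^2 = {r_squared:.3f}, P = {p:.2g}")
```

Here the relationship is real but explains only a few percent of the variation, which is why both numbers should be reported.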
Choosing and using statistics
Determine what kinds of data you have
Describe your data
Choose an appropriate statistical test
Perform the test
Report and interpret the results
What kinds of data do you have?
Categorical
– Fertilizer addition, species identity
Continuous and discrete
– Biomass, height, number of bites
Independent and Dependent variables
Describe your data
Measures of central tendency
– Mean, median
Measures of dispersion
– Variance, standard deviation, standard error,
range, quartiles
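As a rough illustration, the measures listed above can be computed with Python's standard `statistics` module. The height values are hypothetical, and quartile conventions differ between packages, so SPSS may report slightly different quartiles.

```python
import math
import statistics

# Hypothetical plant heights (cm), invented for illustration only
heights = [4, 8, 6, 5, 3, 7, 9, 6]

n = len(heights)
mean = statistics.mean(heights)
median = statistics.median(heights)
variance = statistics.variance(heights)  # sample variance (divides by n - 1)
sd = statistics.stdev(heights)           # standard deviation = sqrt(variance)
se = sd / math.sqrt(n)                   # standard error of the mean
data_range = max(heights) - min(heights)
q1, q2, q3 = statistics.quantiles(heights, n=4)  # quartile cut points

print(f"mean = {mean}, median = {median}")
print(f"variance = {variance}, SD = {sd:.2f}, SE = {se:.2f}")
print(f"range = {data_range}, quartiles = {q1}, {q2}, {q3}")
```

The standard deviation describes spread among individuals, while the standard error describes uncertainty in the mean and shrinks as sample size grows.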
Descriptive Statistics – Visual Aids
Boxplots
– median, upper and lower quartiles, whiskers (fences), outliers
Histograms
– separate, stackbar, or paired
Bar Plots
[Figures: boxplot, histograms, and bar plot with error bars of mean # of seeds/pod by treatment (In, N = 37; Out, N = 44)]
Describe your data
Normal vs. non-normal distributions
– histograms, Q-Q plots, Kolmogorov-Smirnov (K-S) test
(a significant result means the data are non-normal)
Data transformation
If your data are non-normal
– Use non-parametric statistics
– Transform your data
square-root transform
log transform
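A minimal Python sketch of the two transformations named above. The counts are hypothetical, and adding 1 before taking logs is an assumption made here so that zero counts can be transformed.

```python
import math

# Hypothetical right-skewed counts, invented for illustration only
counts = [0, 1, 4, 9, 100]

# Square-root transform: compresses large values moderately
sqrt_counts = [math.sqrt(c) for c in counts]

# Log transform: compresses large values strongly.
# log(0) is undefined, so this sketch uses log10(x + 1) (an assumption).
log_counts = [math.log10(c + 1) for c in counts]

print("raw: ", counts)
print("sqrt:", sqrt_counts)
print("log: ", [round(v, 2) for v in log_counts])
```

After transforming, re-check normality on the transformed values and run the test on them, then say so when reporting the results.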
Choose your statistical test
Choose statistical tests based on your
hypothesis, experimental design and the
data you have collected
Parametric tests assume data are normally
distributed; non-parametric tests do not
Many textbooks have recipes or flowcharts
for choosing statistics
Check with your TAs
Common statistical tests
Chi-squared test
t-test (non-parametric: Mann-Whitney U test)
One-way ANOVA (non-parametric: Kruskal-Wallis test)
Two-way ANOVA
ANCOVA
– ANOVA with covariate
Correlation and regression
Chi-squared test
For analysis of tables of counts or frequencies
Good with categorical variables
Non-parametric
# plants      Germinated   Not germinated
Outcrossed    14           10
Inbred        6            10
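The chi-squared statistic for the germination table can be worked through by hand. The Python sketch below does so without a continuity correction, so a package that applies one may report a slightly different value. The P-value shortcut with `erfc` holds only for 1 degree of freedom.

```python
import math

# Observed counts from the germination table
# rows: Outcrossed, Inbred; columns: Germinated, Not germinated
observed = [[14, 10],
            [6, 10]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        # Expected count if germination were independent of cross type
        expected = row_totals[i] * col_totals[j] / total
        chi2 += (obs - expected) ** 2 / expected

df = (len(observed) - 1) * (len(observed[0]) - 1)

# For df = 1 only, the upper-tail chi-squared probability is erfc(sqrt(chi2 / 2))
p = math.erfc(math.sqrt(chi2 / 2))

print(f"chi-squared = {chi2:.2f}, df = {df}, P = {p:.2f}")
# → chi-squared = 1.67, df = 1, P = 0.20
```

With P of about 0.20, these counts alone do not allow us to reject the null hypothesis that germination is independent of cross type.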
t-test
For analysis of categorical independent
variable (2 categories) and a continuous
dependent variable
Samples may be paired (measurements on
same individual) or independent
(measurements on two sets of individuals)
Assumes data are normally distributed
(non-parametric alternative – Mann-Whitney U)
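To show what the independent-samples t-test computes, here is a minimal Python sketch of the pooled-variance (Student's) t statistic. The function name and biomass values are hypothetical, and equal variances in the two groups are assumed.

```python
import math
import statistics

def pooled_t(sample_a, sample_b):
    """Student's t for two independent samples, assuming equal variances."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    # Weighted average of the two sample variances
    pooled_var = ((n_a - 1) * statistics.variance(sample_a)
                  + (n_b - 1) * statistics.variance(sample_b)) / df
    # Standard error of the difference between the two means
    se_diff = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / se_diff
    return t, df

# Hypothetical biomass values (g), invented for illustration only
fertilized = [10, 12, 14, 16, 18]
control = [8, 10, 12, 14, 16]

t, df = pooled_t(fertilized, control)
print(f"t = {t:.2f}, df = {df}")
# → t = 1.00, df = 8
```

The P-value comes from the t distribution with the given degrees of freedom. For df = 8 the two-sided critical value at P = 0.05 is 2.306, so t = 1.00 is not significant.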
ANOVA
Analysis of Variance examines variation
within and between groups
For analysis of categorical independent
variables (2 or more categories) and a
continuous dependent variable
Assumes data are normally distributed
(non-parametric alternative – Kruskal-Wallis)
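The F statistic in a one-way ANOVA is the ratio of between-group to within-group variation. A minimal Python sketch, with a hypothetical function name and invented data for three treatment levels:

```python
import statistics

def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [v for g in groups for v in g]
    grand_mean = statistics.mean(all_values)
    # Variation of group means around the grand mean
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2
                     for g in groups)
    # Variation of individual values around their own group mean
    ss_within = sum((v - statistics.mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical measurements at three fertilizer levels, invented for illustration
groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]

f, df_b, df_w = one_way_anova(groups)
print(f"F = {f:.2f}, df = {df_b},{df_w}")
# → F = 3.00, df = 2,6
```

A large F means the group means differ more than the scatter within groups would suggest; the P-value is read from the F distribution with these two degrees of freedom.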
ANOVA
One-way ANOVA
– Single independent variable
– Main effect
Two-way ANOVA
– Two independent variables
– Main effects and interaction terms
Significant result means at least one group
differed from another
Use post-hoc tests to test for differences among
individual treatments
ANCOVA
Analysis of Covariance
For analysis of categorical independent
variables (2 or more categories), a
continuous dependent variable, and a
covariate
Effects of covariate removed before
testing for effect of independent
variable(s)
Correlation and regression
Tests for relationships between two (or
more) continuous variables
Important to consider both significance
(P-value) and effect size (R²)
Report statistical results
What’s important?
– Test used and assumptions tested
– Test statistic (t, F, R², χ², etc.)
– Significance (P-value)
– Sample size / degrees of freedom
How to report results?
– Text
– Figures
– Tables
[Figure: number of flowers per plant by treatment (IN, N = 46; OUT, N = 39). ANOVA, F = 1.8, df = 1,83, P = 0.17]
Interpret your results
Remember to relate results/tests to your
original hypotheses
Correlation ≠ causation
(P > 0.05) ≠ bad
Recognize trends even when not
statistically significant
Talk to your TAs if you have any questions
SPSS walkthrough
Data entry and transformation
Descriptive statistics
Creating figures
Analyses
– Chi-square (inbreeding data)
– t-test / ANOVA (inbreeding data)
– ANCOVA (tomato data)
– Correlation and regression (inbreeding data)