Inference in Biology
BIOL4062/5062
Hal Whitehead
• What are we trying to do?
• Null Hypothesis Significance Testing
• Problems with Null Hypothesis Significance Testing
• Alternatives:
– Displays, confidence intervals, effect size statistics
– Model comparison using information-theoretic approaches
– Bayesian analysis
• Methods of Inference in Biology
What are we trying to do?
• Descriptive or exploratory analyses
– What factors influence species diversity?
• Fitting predictive models
– Can we make global maps of species
diversity?
• Challenging research hypotheses
– Is diversity inversely related to latitude?
The traditional approach:
Null Hypothesis Significance Testing
• Formulate null hypothesis
• Formulate alternative hypothesis
• Decide on test statistic
• Collect data
• What is probability (P) of test statistic, or more
extreme value, under null hypothesis?
• If P<α (usually 0.05) conclude:
– Reject null in favour of alternative
• If P>α conclude:
– Do not reject null hypothesis
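The steps above can be sketched in code. This is a minimal illustration, not the lecture's own analysis: it uses a permutation test (a technique the slides do not name) to get P for a correlation, and the data, the function names and the choice of 999 shuffles are all hypothetical.

```python
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def one_sided_permutation_test(x, y, n_perm=999, seed=0):
    """Return (r, P) for the alternative 'y decreases with x'.

    P is the proportion of data sets (the observed one plus n_perm
    shuffles of y) whose r is at least as negative as the observed r.
    """
    rng = random.Random(seed)
    r_obs = pearson_r(x, y)
    shuffled = list(y)
    as_extreme = 1  # the observed arrangement counts once
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pearson_r(x, shuffled) <= r_obs:
            as_extreme += 1
    return r_obs, as_extreme / (n_perm + 1)


# Hypothetical data, not the measurements used in the lecture.
latitude = [0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55]
diversity = [3.4, 3.1, 3.6, 3.0, 3.2, 2.7, 2.9, 2.4, 2.6, 2.1, 2.3, 1.9]

alpha = 0.05
r, p = one_sided_permutation_test(latitude, diversity)
decision = "reject null" if p < alpha else "do not reject null"
print(f"r = {r:.3f}, P = {p:.3f}: {decision}")
```

Note that the smallest P this sketch can return is 1 / (n_perm + 1), so the number of shuffles limits how small a P-value can be reported.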
Null Hypothesis Significance Testing
An example
• Formulate null hypothesis
– “Species diversity does not change with latitude”
• Formulate alternative hypothesis
– “Species diversity decreases with latitude”
• Decide on test statistic
– Correlation between diversity measure and latitude, r
• Collect data
– 405 measures of diversity at different latitudes
• What is probability (P) of test statistic, or more extreme
value, under null hypothesis?
– r = -0.1762; P = 0.002 (one-sided)
• If P<α (usually 0.05) conclude:
– Reject “Species diversity does not change with latitude”
Criticisms of: Null Hypothesis
Significance Testing (1)
• α is arbitrary
• Most null hypotheses are false, so why test them?
• Statistical significance is not equivalent to biological
significance
– with large samples, a result can be statistically significant
but not biologically significant
– with small samples, a result can be biologically significant
but not statistically significant
• If statistical power is low, the null hypothesis will usually
not be rejected when false
• Encourages arbitrary inferences when many tests carried
out
Criticisms of: Null Hypothesis
Significance Testing (2)
• Power analysis does not save NHST
– arbitrary, confounded with P-value
– “vacuous intellectual game” (Shaver 1993)
• Incomplete reporting and publishing
– only report statistically significant results
– only publish statistically significant results
• Focussing on one null and one alternative
hypothesis limits scientific advance
• Emphasis on falsification obscures uncertainty
about “best” explanation for phenomenon
Misuse of: Null Hypothesis
Significance Testing
• Failure to reject null hypothesis does not
imply null is true
• Probability of obtaining data given null
hypothesis is not probability null hypothesis
is true
• Poor support for null hypothesis does not
imply alternative hypothesis is true
Practical importance of observed difference (rows) against statistical significance (columns):

                   Not significant    Significant
Not important      Happy              Annoyed
Important          Very sad           Elated

Johnson (1999) “The insignificance of statistical
significance testing” J. Wild. Manage.
Practical importance of observed difference (rows) against statistical significance (columns):

                   Not significant    Significant
Not important      n OK               n too large
Important          n too small        n OK

Johnson (1999) “The insignificance of statistical
significance testing” J. Wild. Manage.
Null Hypothesis Significance Testing:
• “no longer a sound or fruitful basis for statistical
investigation” (Clarke 1963)
• “essential mindlessness in the conduct of research”
(Bakan 1966)
• “In practice, of course, tests of significance are not
taken seriously” (Guttman 1985)
• “simple P-values are not now used by the best
statisticians” (Barnard 1998)
• “The most common and flagrant misuse of
statistics... is the testing of hypotheses, especially
the vast majority of them known beforehand to be
false” (Johnson 1999)
“The problems with Null
Hypothesis Significance Testing
are so severe that some have
argued for it to be completely
banned from scholarly journals”
Denis (2003) Theory & Science
Alternatives to:
Null Hypothesis Significance
Testing
• Displays, confidence intervals, effect size
statistics
• Model comparison using information-theoretic approaches
• Bayesian statistics
Diversity and latitude
[Figure: plot of Diversity against Latitude, with 95% c.i.]
• r = -0.1762; P = 0.002
• r = -0.1762; 95% c.i.: -0.2690; -0.0801
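A confidence interval for a correlation coefficient can be obtained with Fisher's z-transformation. The lecture does not say which method it used, so treat this as a sketch under that assumption; with r = -0.1762 and n = 405 it gives the same interval as the slide to four decimals. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def correlation_ci(r, n, level=0.95):
    """Approximate confidence interval for a Pearson r via Fisher's z."""
    z = math.atanh(r)                # Fisher transformation of r
    se = 1.0 / math.sqrt(n - 3)      # approximate standard error of z
    crit = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for 95%
    # Back-transform the interval limits to the correlation scale.
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


lower, upper = correlation_ci(r=-0.1762, n=405)
print(f"95% c.i.: {lower:.4f}; {upper:.4f}")
```

Unlike the bare P-value, the interval shows both that zero is excluded and that the correlation is weak at either limit.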
Diversity and latitude:
Maybe by focussing on the diversity-latitude
hypothesis, we have missed the real story
[Figure: Diversity plotted against Latitude and against SST; the SST panels distinguish oceans (Atlantic, Pacific) and areas (Galápagos, Gully, Other)]
Effect Size Statistics
• indicate the association that exists between two or
more variables
– Pearson’s r correlation coefficient (or r²)
• for two continuous variables
– Cohen’s d
• for one continuous, one two-level category (t-test)
– Hedges’ g
• better than d when sample sizes are very different
– Cohen’s f²
• for one continuous, one multi-level category (F-test)
– Cramér’s φ
• for two categorical variables (χ² test)
– Odds ratio
• for two binary variables
Cohen’s d
d = (Difference between means of two groups) / (Pooled standard deviation)
• d = 0.2 indicative of a small effect size
• d = 0.5 a medium effect size
• d = 0.8 a large effect size
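As a minimal sketch of the definition above, Cohen's d can be computed from two samples using the usual pooled standard deviation. The function names and the data are hypothetical, and treating 0.2, 0.5 and 0.8 as lower bounds of the small, medium and large bands is an assumption made here for illustration.

```python
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Difference between group means divided by the pooled SD."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / pooled_var ** 0.5


def describe(d):
    """Label an effect size using the conventional benchmarks."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "below small"


# Hypothetical measurements for two groups.
group_a = [4.1, 3.8, 4.6, 4.0, 4.4, 3.9]
group_b = [3.7, 3.5, 4.2, 3.6, 4.0, 3.4]
d = cohens_d(group_a, group_b)
print(f"d = {d:.2f} ({describe(d)})")
```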
Problems with effect size
statistics
• No serious problems
• But they don’t tell the whole story
Model fitting:
How can we best predict diversity?
[Figure: Diversity plotted against Latitude and against SST; the SST panels distinguish oceans (Atlantic, Pacific) and areas (Galápagos, Gully, Other)]
Some models of diversity
SST = Sea Surface Temperature
lat = Latitude
ocean = Atlantic / Pacific
area = Ocean area (categorical)

constant
SST
SST, SST²
SST, SST², SST³
lat
lat, lat²
lat, lat², lat³
SST, SST², lat
SST, SST², lat, lat²
SST, SST², lat, lat², lat³
ocean
SST, SST², ocean
area
SST, SST², area
Which model is best?
Model                         RSS     Parameters
constant                      0.854   2
SST                           0.774   3
SST, SST²                     0.724   4
SST, SST², SST³               0.726   5
lat                           0.835   3
lat, lat²                     0.804   4
lat, lat², lat³               0.785   5
SST, SST², lat                0.725   5
SST, SST², lat, lat²          0.722   6   (lowest RSS but many parameters)
SST, SST², lat, lat², lat³    0.724   7
ocean                         0.844   3
SST, SST², ocean              0.725   5
area                          0.831   4
SST, SST², area               0.723   6

RSS = Residual sum of squares
Which model is best?
• Information-theoretic AIC
– Akaike Information Criterion
• A measure of the similarity between the
statistical model and the true distribution
• Trades off the complexity of a model
against how well it fits the data
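The trade-off between fit and complexity can be illustrated with the standard least-squares form of AIC, n·ln(RSS/n) + 2K. This is a sketch only: the lecture does not state the sample size or scaling behind its AIC values, so the numbers below are hypothetical and are not an attempt to reproduce the slides.

```python
import math


def aic_least_squares(rss, n, k):
    """AIC for a least-squares fit with Gaussian errors (up to a constant).

    rss: residual sum of squares
    n:   number of observations
    k:   number of estimated parameters, including the error variance
    """
    return n * math.log(rss / n) + 2 * k


# Hypothetical fits to the same 100 observations.
simple_aic = aic_least_squares(rss=52.0, n=100, k=3)
complex_aic = aic_least_squares(rss=51.5, n=100, k=5)

# The complex model fits slightly better (lower RSS) but pays a
# penalty of 2 per extra parameter, so its AIC is higher here.
print(f"simple:  {simple_aic:.2f}")
print(f"complex: {complex_aic:.2f}")
```

A lower AIC is better, so in this made-up case the two extra parameters do not earn their keep.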
Which model is best?
Model                         RSS     Parameters   AIC
constant                      0.854   2            -61.08
SST                           0.774   3            -99.81
SST, SST²                     0.724   4            -125.54   (lowest AIC: best model)
SST, SST², SST³               0.726   5            -123.64
lat                           0.835   3            -69.09
lat, lat²                     0.804   4            -83.19
lat, lat², lat³               0.785   5            -92.19
SST, SST², lat                0.725   5            -124.10
SST, SST², lat, lat²          0.722   6            -125.05
SST, SST², lat, lat², lat³    0.724   7            -123.05
ocean                         0.844   3            -64.88
SST, SST², ocean              0.725   5            -124.27
area                          0.831   4            -69.77
SST, SST², area               0.723   6            -124.59
How much support for different models?
Model                         AIC       ΔAIC    Support
constant                      -61.08    64.46   No support
SST                           -99.81    25.73   No support
SST, SST²                     -125.54   0.00    Best model
SST, SST², SST³               -123.64   1.90    Some support
lat                           -69.09    56.45   No support
lat, lat²                     -83.19    42.35   No support
lat, lat², lat³               -92.19    33.35   No support
SST, SST², lat                -124.10   1.45    Some support
SST, SST², lat, lat²          -125.05   0.49    Some support
SST, SST², lat, lat², lat³    -123.05   2.49    Little support
ocean                         -64.88    60.66   No support
SST, SST², ocean              -124.27   1.27    Some support
area                          -69.77    55.77   No support
SST, SST², area               -124.59   0.96    Some support
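ΔAIC is each model's AIC minus the lowest AIC in the set. The sketch below recomputes it from the AIC values in the table and attaches support labels. The cut-offs of 2 and 10 are assumptions chosen to agree with the labels on the slide; the lecture does not state its thresholds. Because the inputs are rounded, a recomputed ΔAIC can differ from the slide in the last digit.

```python
# AIC values copied from the lecture's table.
# In the model names, "SST2" means SST squared, "lat3" means lat cubed, etc.
aic = {
    "constant": -61.08,
    "SST": -99.81,
    "SST, SST2": -125.54,
    "SST, SST2, SST3": -123.64,
    "lat": -69.09,
    "lat, lat2": -83.19,
    "lat, lat2, lat3": -92.19,
    "SST, SST2, lat": -124.10,
    "SST, SST2, lat, lat2": -125.05,
    "SST, SST2, lat, lat2, lat3": -123.05,
    "ocean": -64.88,
    "SST, SST2, ocean": -124.27,
    "area": -69.77,
    "SST, SST2, area": -124.59,
}


def delta_aic(aic_by_model):
    """Difference between each model's AIC and the lowest AIC."""
    best = min(aic_by_model.values())
    return {model: round(value - best, 2) for model, value in aic_by_model.items()}


def support(delta, some=2.0, none=10.0):
    """Label a ΔAIC value; the thresholds are illustrative assumptions."""
    if delta == 0:
        return "Best model"
    if delta < some:
        return "Some support"
    if delta < none:
        return "Little support"
    return "No support"


deltas = delta_aic(aic)
for model, delta in deltas.items():
    print(f"{model:28s} {delta:6.2f}  {support(delta)}")
```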
Relative importance of variables
from AIC
Variable   Relative importance
SST        1.000
SST²       1.000
SST³       0.211
lat        0.398
lat²       0.280
lat³       0.075
ocean      0.128
area       0.141
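One common way to get a variable's relative importance is to convert each model's AIC into an Akaike weight and sum the weights of the models that contain the variable. The sketch below applies that to the AIC table from the lecture. It is an assumption that the slide used this procedure: the sketch gives 1.000 for SST and SST², as on the slide, but its other values need not match the slide's figures. The function names are hypothetical.

```python
import math

# AIC values copied from the lecture's table.
# In the model names, "SST2" means SST squared, "lat3" means lat cubed, etc.
aic = {
    "constant": -61.08,
    "SST": -99.81,
    "SST, SST2": -125.54,
    "SST, SST2, SST3": -123.64,
    "lat": -69.09,
    "lat, lat2": -83.19,
    "lat, lat2, lat3": -92.19,
    "SST, SST2, lat": -124.10,
    "SST, SST2, lat, lat2": -125.05,
    "SST, SST2, lat, lat2, lat3": -123.05,
    "ocean": -64.88,
    "SST, SST2, ocean": -124.27,
    "area": -69.77,
    "SST, SST2, area": -124.59,
}


def akaike_weights(aic_by_model):
    """Weights proportional to exp(-ΔAIC / 2), normalised to sum to 1."""
    best = min(aic_by_model.values())
    relative = {m: math.exp(-(v - best) / 2.0) for m, v in aic_by_model.items()}
    total = sum(relative.values())
    return {m: x / total for m, x in relative.items()}


def variable_importance(weights, variable):
    """Sum of the weights of all models whose terms include the variable."""
    return sum(w for model, w in weights.items() if variable in model.split(", "))


weights = akaike_weights(aic)
for variable in ["SST", "SST2", "SST3", "lat", "lat2", "lat3", "ocean", "area"]:
    print(f"{variable:6s} {variable_importance(weights, variable):.3f}")
```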
Best model of diversity:
Diversity = 0.293 + 0.261 SST - 0.00614 SST²
[Figure: fitted curve of Diversity against SST]
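The fitted equation can be evaluated directly, which is all that is needed to turn a set of SST values into predicted diversity. In this sketch the coefficients are the ones given above; the function name and the list of SST values (standing in for cells of an SST map) are hypothetical.

```python
def predicted_diversity(sst):
    """Best model from the lecture: 0.293 + 0.261*SST - 0.00614*SST^2."""
    return 0.293 + 0.261 * sst - 0.00614 * sst ** 2


# A downward-opening quadratic a*x^2 + b*x + c peaks at x = -b / (2a).
peak_sst = 0.261 / (2 * 0.00614)
print(f"predicted diversity peaks at SST = {peak_sst:.1f}")

# Hypothetical SST values standing in for cells of an SST map.
sst_cells = [2.0, 10.0, 20.0, 28.0]
for sst in sst_cells:
    print(f"SST {sst:5.1f} -> diversity {predicted_diversity(sst):.3f}")
```

Because the curve is quadratic, predictions fall away on both sides of the peak, so the model should be applied with care to SST values outside the range of the data.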
Global pattern of diversity
apply equation to global SST map
Global pattern of diversity
apply equation to SST predictions
from global circulation models
Advantages and criticisms of
information-theoretic model-fitting
Advantages:
• Indicates “best” model and support for other models
• Can compare very different models
• Balances complexity of model against fit
• Produces predictive models
• Fairly simple mathematically and computationally
• Model averaging
Criticisms:
• Philosophical basis “nuanced”
• Which models to consider is subjective
Bayesian Analysis
• Given prior distribution of models or model
parameters
• Collect data
• Work out probability of data for each model
and combination of model parameters
• Work out posterior distribution of models
or model parameters
– using Bayes’ theorem
Bayes’ Theorem
Posterior probability of model given data =
(Probability of data given model × Probability of model) / Probability of data
Bayesian Analysis
• So, Bayesian analysis gives:
– the probability of models or parameters
given prior knowledge and data
– very nice!
– but may need considerable computation
Example of Bayesian Analysis
• Trying to work out survival rate of newly
studied species of rodent
• Ten other species in genus have mean
survival per year of 0.72 (SD 0.13)
• Of 20 animals marked, 17 survive for 1 year
• Standard (binomial) estimate of survival =
0.850 (95% c.i. 0.621 - 0.968)
• Bayesian estimate of survival =
0.797 (95% c.i. 0.637 - 0.921)
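A calculation of this kind can be sketched with a conjugate Beta prior. The lecture does not say which prior distribution it used, so the Beta prior here, matched to the mean (0.72) and SD (0.13) of the other species, is an assumption. For that reason the posterior mean from this sketch, about 0.804, is close to but not the same as the 0.797 on the slide. The function names are hypothetical.

```python
def beta_from_mean_sd(mean, sd):
    """Moment-match a Beta(a, b) distribution to a given mean and SD."""
    total = mean * (1.0 - mean) / sd ** 2 - 1.0  # a + b
    return mean * total, (1.0 - mean) * total


def beta_mean_sd(a, b):
    """Mean and standard deviation of a Beta(a, b) distribution."""
    mean = a / (a + b)
    sd = (a * b / ((a + b) ** 2 * (a + b + 1.0))) ** 0.5
    return mean, sd


# Prior from the ten other species in the genus (assumed to be Beta).
prior_a, prior_b = beta_from_mean_sd(0.72, 0.13)

# Data: 17 of 20 marked animals survive one year.
survived, marked = 17, 20

# Conjugate update: add survivors to a and deaths to b.
post_a = prior_a + survived
post_b = prior_b + (marked - survived)
post_mean, post_sd = beta_mean_sd(post_a, post_b)

print(f"binomial estimate = {survived / marked:.3f}")
print(f"posterior mean    = {post_mean:.3f} (SD {post_sd:.3f})")
```

The posterior mean lies between the prior mean and the binomial estimate, which is the pull towards prior information that the slide's Bayesian estimate also shows. A credible interval would need Beta quantiles, for example from scipy.stats.beta.ppf.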
Advantages and Difficulties with
Bayesian Analysis
Advantages:
• Philosophically very nice
• Gives probability of model given data and prior information
• Updates estimates as more information becomes available
• Does not give biologically implausible estimates
– e.g. survival >1
• Fits adaptive management paradigm
Difficulties:
• Choice of priors somewhat arbitrary
• Bayesian analysis with “uninformative priors” gives similar results to simpler methods
• Complex
• Computation can be VERY time consuming and opaque
Methods of Inference in Biology
• Descriptive or exploratory analyses
– Displays, confidence intervals, effect size statistics
– Model comparisons using AIC, etc
– Bayesian analysis (if prior information)
– Null hypothesis significance tests?
• Fitting predictive models
– Model comparisons using AIC, etc
– Bayesian analysis (if prior information)
• Challenging research hypotheses
– Model comparisons using AIC, etc
– Null hypothesis significance tests
This class
• Displays, confidence intervals, effect
size statistics ***
• Model comparisons using AIC, etc **
• Bayesian analysis
• Null hypothesis significance tests *