Outline
Some field sampling issues
Overview of approach to understanding a system
Example 1 – KPBS
Example 2 – Xoo
Example 3 – SIR model
Run through the approach again with these three examples
Karen A. Garrett
Kansas State University
Cercospora apii infects both
humans and… celery
Phytophthora infestans,
an oomycete
Rust fungi
Wheat curl mite, vector of Wheat streak mosaic virus
Designed experiments vs.
observational experiments
• Designed experiments generally have a
more straightforward analysis
• Observational experiments rely more on
correlation, so that interpreting causality
may be more difficult
• Many experiments in disease ecology
have some designed elements and some
observational elements
Ratio of Phaeosphaeria nodorum to Mycosphaerella
graminicola compared to sulfur dioxide emissions
Bearchell et al. 2005 PNAS
Defining an inference space
• The inference space of an experiment is the
group to which the experimental conclusions can
be correctly applied
• The pool from which the experimental units are
randomly drawn will clearly be part of the
inference space
• Logic outside statistical inference may be used
to extend results to a broader set of units
• Definition of this space allows definition of the
appropriate experimental unit
Pseudoreplication
• Pseudoreplication occurs when repeated
observations of a subject are substituted for
replicated applications of a treatment on different
subjects
• In general, if it seems that the number of replicates
can be increased indefinitely by splitting samples into
increasingly smaller units, these are probably
pseudoreplicates
• “What is objectionable is when the tentative
conclusions derived from unreplicated treatments are
given an unmerited veneer of rigor by the erroneous
application of inferential statistics” - Hurlbert
Classic example of
pseudoreplication
Pseudoreplicates
True replicates
Scenario in which an individual mite is the appropriate experimental unit
Pseudoreplication – ex 2 –
pseudoreplication in spatial samples
Suppose that a treatment has been applied at the larger scale
Pseudoreplication
Suppose there is no treatment
application or experimental design?
• Defining pseudoreplication in an
observational study is more challenging
• The variance associated with sampling at
different spatial scales or across different
types of groups of individuals can be
compared to determine what are the
largest sources of variation
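As a rough illustration of comparing variance across sampling scales, the sketch below estimates between-group and within-group variance for a balanced two-level layout using method-of-moments estimates from a one-way random-effects analysis of variance. The layout (fields, with plants sampled within each field), the function name, and the simulated data are assumptions made for illustration and do not come from the slides.

```python
import random
import statistics

def variance_components(groups):
    """Method-of-moments variance components for a balanced one-way layout.

    groups: list of equal-length lists, one list of observations per group
    (for example, one list of plant measurements per field).
    Returns (between_group_variance, within_group_variance).
    """
    k = len(groups)          # number of groups
    n = len(groups[0])       # observations per group (balanced design assumed)
    group_means = [statistics.fmean(g) for g in groups]
    grand_mean = statistics.fmean(group_means)
    ms_between = n * sum((m - grand_mean) ** 2 for m in group_means) / (k - 1)
    ms_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    ) / (k * (n - 1))
    # Negative estimates can occur by chance; truncate them at zero.
    between = max((ms_between - ms_within) / n, 0.0)
    return between, ms_within

# Hypothetical data: 30 fields, 10 plants per field.
rng = random.Random(1)
fields = []
for _ in range(30):
    field_effect = rng.gauss(0, 3)  # assumed standard deviation among fields
    fields.append([20 + field_effect + rng.gauss(0, 1) for _ in range(10)])

between_var, within_var = variance_components(fields)
print(f"among-field variance:  {between_var:.2f}")
print(f"within-field variance: {within_var:.2f}")
```

In this simulated case most of the variation lies among fields, which would suggest sampling more fields rather than more plants per field.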
Important note about correlation
and newer statistical packages
• Recall the standard assumption for typical
analyses of variance that observations are
independent
• In the past, and sometimes in the present,
people might disregard the possibility of using
packages like SAS Proc GLM because they
knew their samples were not truly independent
• Newer programs like SAS Proc Mixed (and
programs in R?) make it easier to specify more
complicated correlation matrices for the errors in
an analysis of variance
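To make the idea of a non-independent error structure concrete, the sketch below builds a first-order autoregressive, AR(1), correlation matrix, in which the correlation between two observations decays with the distance between them. AR(1) is only one commonly used structure, chosen here as an assumption for illustration; the function name and the value of rho are hypothetical.

```python
def ar1_correlation(n_obs, rho):
    """Return an n_obs x n_obs AR(1) correlation matrix as nested lists.

    Entry (i, j) is rho ** |i - j|, so observations taken closer together
    (in time or along a transect) are more strongly correlated.
    """
    return [[rho ** abs(i - j) for j in range(n_obs)] for i in range(n_obs)]

corr = ar1_correlation(4, 0.5)
for row in corr:
    print(["%.3f" % value for value in row])
```

Setting rho to 0 recovers the identity matrix, which corresponds to the usual independence assumption.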
Statistical power
• Statistical power: the probability of detecting treatment
effects that really exist
• Scientists have tended to emphasize controlling the Type
I error rate (the probability of designating an effect as
“significant” when it is not real) rather than maximizing
power
• This seems to be based on the idea that journals should
not be cluttered with reports of a lot of effects that are not
real
• However, if you want to manage a disease, discarding
an effect because the associated p-value is greater than
0.05 may lead you to leave out important effects
• Real effects may be difficult to detect because of noise
• Sensitivity analyses can be used to explore the
implications of removing an effect when it is actually real
Parsimony
• On the other hand, parsimony is a good
general goal
• Statistical models need to strike a balance
to avoid leaving out important predictors
and also to avoid overparameterizing
• Mechanistic models can be applied to
explore the potential impacts of many
predictors
Statistical power
• Power is increased by reducing measurement
errors and by increasing sample size
• Just because a null hypothesis has not been
rejected doesn’t mean that there are no
treatment effects
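The effect of sample size and noise on power can be sketched with a simple approximation. The function below assumes a two-sided, two-sample z-test with a known, common standard deviation and equal group sizes; these assumptions, the function name, and the example numbers are for illustration only, and a real analysis would usually rely on a t-based or simulation-based calculation.

```python
from statistics import NormalDist

def approx_power(effect, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test.

    effect: true difference between treatment means
    sd: standard deviation of individual observations (assumed known)
    n_per_group: number of true replicates in each treatment
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    se = sd * (2 / n_per_group) ** 0.5   # standard error of the difference
    shift = effect / se
    # Probability that the test statistic falls beyond either critical value.
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)

for n in (5, 10, 20, 40):
    print(n, round(approx_power(effect=1.0, sd=1.5, n_per_group=n), 3))
```

Reducing `sd` (less measurement error) or increasing `n_per_group` both raise the computed power, and with `effect=0` the function returns the Type I error rate `alpha`.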
Testing for bioequivalence
• Bioequivalence tests can be used to
formally test whether there is no difference
between the effects of treatments (within
some tolerance)
Garrett 1997
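One common way to test for equivalence within a tolerance is the two one-sided tests (TOST) procedure. The sketch below is a large-sample normal approximation written for illustration; it is not necessarily the procedure used in the cited work, a t-based version would be preferable for small samples, and the data, tolerance values, and function name are hypothetical.

```python
from statistics import NormalDist, fmean, variance

def equivalence_test(a, b, tolerance, alpha=0.05):
    """Two one-sided tests (TOST) with a normal approximation.

    Equivalence is concluded only if both one-sided null hypotheses
    (difference <= -tolerance, difference >= +tolerance) are rejected.
    Returns (difference, p_value, equivalent).
    """
    diff = fmean(a) - fmean(b)
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    z = NormalDist()
    p_lower = 1 - z.cdf((diff + tolerance) / se)  # H0: diff <= -tolerance
    p_upper = z.cdf((diff - tolerance) / se)      # H0: diff >= +tolerance
    p_value = max(p_lower, p_upper)
    return diff, p_value, p_value < alpha

# Hypothetical responses under two treatments.
treat_a = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 10.1, 9.7]
treat_b = [10.0, 10.2, 9.9, 10.1, 9.8, 10.3, 10.0, 9.9]

for tol in (1.0, 0.05):
    diff, p_value, equivalent = equivalence_test(treat_a, treat_b, tol)
    print(f"tolerance={tol}: diff={diff:.4f}, p={p_value:.4f}, equivalent={equivalent}")
```

The same data support equivalence under a wide tolerance but not under a narrow one, which is why the tolerance needs a biological justification.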
Defining a “biological tolerance
level”
• A sensitivity analysis might be used to
define a tolerance level for effects below
which there is not expected to be any
important impact
• Formal discrimination between statistical
significance and biological significance
• BUT…you would need to have a great
deal of confidence in your model to rely on
this for management decisions
Meta-analysis applications in
plant pathology
• Comparisons across studies can be
formalized in meta-analyses
• We have illustrated the application of
meta-analysis to the large quantities of
data available from plant pathology field
trials
[Figure: relative yield loss percentage (y-axis, 0 to 40) against percentage disease severity (x-axis, 0 to 100), with series labeled TS1, TS2, TS3, LR1, and LR2]
Rosenberg, Garrett, Su, and Bowden 2004 Phytopathology
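A minimal sketch of how results from several studies can be combined is inverse-variance weighting under a fixed-effect model. This is a generic textbook calculation shown for illustration, not necessarily the approach taken in the cited paper, and the effect sizes and variances below are made up.

```python
def fixed_effect_pool(effects, variances):
    """Inverse-variance weighted (fixed-effect) pooled estimate.

    effects: per-study effect size estimates
    variances: sampling variance of each estimate
    Returns (pooled_effect, pooled_standard_error).
    """
    weights = [1 / v for v in variances]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total_weight
    pooled_se = (1 / total_weight) ** 0.5
    return pooled, pooled_se

# Hypothetical per-trial effects (e.g. yield loss per unit severity) and variances.
trial_effects = [0.30, 0.42, 0.25, 0.38]
trial_variances = [0.010, 0.020, 0.005, 0.015]

pooled, pooled_se = fixed_effect_pool(trial_effects, trial_variances)
print(f"pooled effect = {pooled:.3f} (SE {pooled_se:.3f})")
```

More precise trials receive more weight; a random-effects model would add a between-study variance term to each weight.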
Metadata
• The National Center for Ecological
Analysis and Synthesis works with
metadata and metadata standards as one
of its many projects
• http://www.nceas.ucsb.edu/nceasweb/resources/metadata.html
For discussing the disease
data set analyses
• Here are some suggestions for pondering
your data sets and projects
• You might consider addressing these
questions in your discussions and final
presentation
Defining the goals of the project
• A. What is the motivation for the project?
– Understanding the system better
• In what way in particular?
– Learning to manipulate the disease
• What are the potential methods for manipulation?
• B. What are the hypotheses to be tested or
parameters to be estimated?
– Will the project be sufficient to test hypotheses?
– Or will it more appropriately generate hypotheses to
be tested in a more controlled context?
Variables and parameters
• What are the potential predictor and
response variables?
• What are the parameters to be estimated?
• When using parameter estimates from an
experiment in a mechanistic simulation
model, the estimates might be viewed as
values to emphasize while considering a
wider range of possible values
Studying the distribution of variables
• It may be necessary to split variables into
logical groups, such as by environment
• For example, if environment has a large
effect, analyzing the disease severity for
samples from all environments in the
same analysis might produce a multimodal distribution
What are sources of bias?
• Since samples may not have been
collected specifically to answer later
questions, estimates may be biased for
some questions
• For example, rather than random
sampling, specific individuals may have
been sampled because of their observed
characteristics (symptoms, family size, …)
• True random sampling is often a
challenge, anyway
Deciding what to average prior to
analysis
• Once the appropriate experimental unit is
identified, you might average the
subsamples within a unit
• Possibly the subsample variance is
interesting in its own right and you would
like to include it in analyses
• You can keep all the individual subsample
measures in the analysis if you are careful
to use the correct error estimates for
testing effects
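Averaging subsamples within each experimental unit can be sketched as follows. The record layout (treatment, unit, value), the plot names, and the numbers are hypothetical; the point is only that the analysis then proceeds on one value per experimental unit.

```python
from collections import defaultdict
from statistics import fmean

def unit_means(records):
    """Average subsample values within each experimental unit.

    records: iterable of (treatment, unit_id, value) tuples.
    Returns a dict mapping (treatment, unit_id) to the unit mean.
    """
    grouped = defaultdict(list)
    for treatment, unit, value in records:
        grouped[(treatment, unit)].append(value)
    return {key: fmean(values) for key, values in grouped.items()}

# Hypothetical data: 2 treatments x 3 plots x 2 subsamples per plot.
records = [
    ("control", "plot1", 12.0), ("control", "plot1", 14.0), ("control", "plot2", 15.0),
    ("control", "plot2", 17.0), ("control", "plot3", 11.0), ("control", "plot3", 12.0),
    ("treated", "plot4", 8.0), ("treated", "plot4", 9.0), ("treated", "plot5", 7.0),
    ("treated", "plot5", 10.0), ("treated", "plot6", 6.0), ("treated", "plot6", 9.0),
]

by_treatment = defaultdict(list)
for (treatment, _unit), mean_value in unit_means(records).items():
    by_treatment[treatment].append(mean_value)

for treatment, means in by_treatment.items():
    # n is the number of plots (true replicates), not the number of subsamples.
    print(treatment, "n =", len(means), "unit means =", means)
```

Each treatment contributes three values to the analysis rather than six, so the degrees of freedom reflect the number of plots.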
Is there a widely accepted model
for this system already available?
• Can your data be used to further validate
this model or perhaps as an example of a
case in which the model does not hold?
• Does your data add a new component to
this model, such as considering the effects
of a novel environmental parameter?
If there are not already accepted
models for your system…
• Is there a related system that has been
studied more, modeled, and might be used
as a starting point for considering your
system?
• For example, SIR models might be
generally applied for many types of
disease
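For reference, a basic SIR (susceptible, infected, removed) model can be sketched in a few lines. The version below uses a simple Euler time step and frequency-dependent transmission; the parameter values, population size, and function name are assumptions for illustration rather than values from any particular disease system.

```python
def simulate_sir(beta, gamma, s0, i0, r0=0.0, days=160, dt=0.1):
    """Simulate a basic SIR model with Euler steps.

    beta: transmission rate per day
    gamma: removal (recovery) rate per day
    Returns a list of (time, S, I, R) tuples.
    """
    s, i, r = s0, i0, r0
    n = s + i + r                      # total population, held constant
    history = [(0.0, s, i, r)]
    steps = int(round(days / dt))
    for step in range(1, steps + 1):
        new_infections = beta * s * i / n * dt
        new_removals = gamma * i * dt
        s -= new_infections
        i += new_infections - new_removals
        r += new_removals
        history.append((step * dt, s, i, r))
    return history

history = simulate_sir(beta=0.3, gamma=0.1, s0=990.0, i0=10.0)
peak_time, _, peak_infected, _ = max(history, key=lambda row: row[2])
print(f"peak of {peak_infected:.0f} infected at day {peak_time:.1f}")
```

Adapting this to a plant disease would typically mean reinterpreting the compartments (for example, healthy, infectious, and removed host tissue) and re-estimating `beta` and `gamma` from data.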
Iteration between input from experimental
analyses and input from modeling
Modeling
Construction of new
hypotheses and predictions
Empirical experimentation:
Testing of hypotheses in
experiments; generation of
new parameter estimates;
generation of new
hypotheses
Sensitivity analysis
• Analysis of model output for a range of
parameter and variable inputs – analysis
of the sensitivity of outputs to changes in
the inputs
• The distribution of outputs for a particular
set of inputs can be evaluated in terms not
only of the mean or median, but also the
maxima and minima
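A simple sensitivity analysis can be sketched by running a model over a grid of parameter values and summarizing the outputs. The example below uses a basic SIR model with Euler steps as a stand-in model; the model, the parameter grid, and the choice of peak number infected as the output are all assumptions for illustration.

```python
import itertools
import statistics

def peak_infected(beta, gamma, s0=990.0, i0=10.0, days=200, dt=0.1):
    """Peak number infected in a basic SIR model (Euler steps)."""
    s, i = s0, i0
    n = s0 + i0
    peak = i
    for _ in range(int(round(days / dt))):
        new_infections = beta * s * i / n * dt
        new_removals = gamma * i * dt
        s -= new_infections
        i += new_infections - new_removals
        peak = max(peak, i)
    return peak

# Hypothetical ranges of transmission and removal rates to explore.
betas = [0.2, 0.3, 0.4, 0.5]
gammas = [0.05, 0.1, 0.2]
peaks = {(b, g): peak_infected(b, g) for b, g in itertools.product(betas, gammas)}

values = list(peaks.values())
summary = {
    "mean": statistics.fmean(values),
    "median": statistics.median(values),
    "min": min(values),
    "max": max(values),
}
for name, value in summary.items():
    print(f"{name}: {value:.1f}")
print("largest peak at (beta, gamma) =", max(peaks, key=peaks.get))
```

Reporting the minimum and maximum alongside the mean and median shows how extreme the outcome could be across the plausible parameter range, not just what is typical.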
Model validation
• A data set might be split by location, so
that a model developed based on one
subset of locations is validated using
another subset
• A data set might be split by time, so that a
model developed based on earlier time
points is validated using later time points
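Splitting a data set by time for validation can be sketched as follows: fit a model to the earlier time points, then measure prediction error on the later ones. The straight-line model, the weekly data, and the split point are hypothetical and chosen only to keep the example short.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def rmse(xs, ys, slope, intercept):
    """Root mean squared prediction error of the fitted line."""
    errors = [(y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys)]
    return (sum(errors) / len(errors)) ** 0.5

# Hypothetical weekly observations (e.g. a transformed disease severity).
weeks = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
severity = [0.9, 2.1, 2.9, 4.2, 5.0, 5.8, 7.1, 8.0, 9.1, 9.9]

split = 6  # fit on the first six weeks, validate on the remaining four
slope, intercept = fit_line(weeks[:split], severity[:split])
train_error = rmse(weeks[:split], severity[:split], slope, intercept)
valid_error = rmse(weeks[split:], severity[split:], slope, intercept)
print(f"slope={slope:.3f}, intercept={intercept:.3f}")
print(f"training RMSE={train_error:.3f}, validation RMSE={valid_error:.3f}")
```

Splitting by location works the same way, with the subsets defined by site rather than by week.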