50_Analysis & interpretation



Steps 3 & 4: Analysis & interpretation of evidence
Hypothesis testing: a simple scenario
SCENARIO 1
• Dissolved oxygen (DO) measured upstream & downstream over 9 months
– Upstream = 9.3 mg/L
– Downstream = 8.4 mg/L
• Difference significant at P < 0.05
SCENARIO 2
• DO measured upstream & downstream over 3 months
– Upstream = 7.9 mg/L
– Downstream = 4.2 mg/L
• Difference not significant at P < 0.05
Which scenario presents a stronger case for DO causing impairment?
Using statistics responsibly
Use caution in interpreting differences
• Look at the magnitude & consistency of differences, rather than at statistical significance
• Statistical significance detects differences that exceed natural variability
– It does not show that a stressor caused the difference
– It does not equal biological significance
• Can use statistics, but also use your head
– Consider the relationship between the minimum detectable difference (which depends on statistical power) & the biologically relevant difference
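To make the last point concrete, here is a minimal sketch of the minimum detectable difference (MDD) for comparing two means, such as upstream vs. downstream DO. It assumes a two-sided test, equal sample sizes, a common known standard deviation, and the normal approximation; the function name and the defaults (alpha = 0.05, power = 0.8) are illustrative assumptions, not values from this course.

```python
from math import sqrt
from statistics import NormalDist


def minimum_detectable_difference(sd, n_per_group, alpha=0.05, power=0.8):
    """Approximate MDD for a two-sided, two-sample comparison of means.

    Assumes equal sample sizes, a common known standard deviation `sd`,
    and the normal approximation (a sketch, not an exact t-based power analysis).
    """
    z = NormalDist()  # standard normal
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_beta = z.inv_cdf(power)  # quantile corresponding to the desired power
    return (z_alpha + z_beta) * sd * sqrt(2 / n_per_group)


# Hypothetical: a standard deviation of 1.0 mg/L, with 3 vs. 9 samples per location.
for n in (3, 9):
    print(n, round(minimum_detectable_difference(sd=1.0, n_per_group=n), 2))
```

If the MDD is larger than the difference you consider biologically relevant, a non-significant result says little about whether an effect exists. Conversely, with many samples a biologically trivial difference can still reach significance.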
That said, graphs & statistics can help a lot…
STEP 2: List candidate causes
STEP 3: Evaluate data from case
STEP 4: Evaluate data from elsewhere
• Regression analysis
• Predicting environmental conditions from biological observations (PECBO)
• Scatterplots
• Boxplots
• Correlation analysis
• Conditional probability analysis
• Classification & regression tree
• Species sensitivity distributions (SSDs)
• Quantile regression
Sampling/data collection issues
• Want to collect candidate cause & biological response variables at same sites, at same times
• Want to time sampling to accurately assess exposures
– For continuous exposures, try to catch sensitive life stages & conditions that enhance exposure or effects
– For episodic exposures, try to characterize episodes through continuous, event, and post-event sampling
• Want to collect good reference comparisons, in space & time
– At unimpaired sites, at same times you sample impaired sites
Data analysis tools
• Species sensitivity distribution (SSD) generator
– Tool that calculates & plots proportion of species affected at different levels of exposure in laboratory toxicity tests
• Command-line R scripts
– Powerful and free statistical package [http://www.r-project.org/]
– R scripts provided for PECBO (predicting environmental conditions from biological observations)
• CADStat
– Menu-driven package of data visualization & statistical techniques, based on JGR (Java GUI for R)
R
R is a powerful and free statistical package… [http://www.r-project.org/]
…but it’s run at the command line, so there’s a steep learning curve
JGR/CADStat: a gentler introduction to R
• JGR provides point-and-click tools for basic R functions
– Loading data
– Setting up a workspace
– Managing data in your workspace
– Installing packages
– Scatterplots
– Boxplots
– Correlation analysis
– Linear regression
– Quantile regression
• CADStat adds several tools to JGR
– Point-and-click interfaces for tools useful in causal analysis, including conditional probability analysis & PECBO
– Rendering of analysis results in browser window
– Additional help files
Load your data into CADStat, or use example data…
Where are we?
• Step 1
– Biological effects of interest defined
– Impaired & reference sites identified
• Step 2
– Map completed
– Conceptual model developed
– Candidate causes identified
– Data identified & collected
Now we’re ready to analyze available data & use results as evidence in Steps 3 & 4
Begin with data exploration
• Want to examine relationships between different variables
– Linear & more complex
– Expected & unexpected
• First step is visualizing data
– Scatterplots & boxplots
– No statistics, hypotheses, or p-values
Looking at your data… examples from exercise
                          PC1 (reference)   PC2 (impaired)
Dissolved oxygen (mg/L)   7.3               7.9
Canopy cover              moderate          low

EPT taxa richness over time
        2004   2005   2006   2007
PC1     18     16     15     16
PC2     8      9      13     15

In Fall 2004, an unpermitted industrial discharge with high metal concentrations was discovered and removed, just upstream of PC2.
• Which candidate cause do the data relate to?
• Do the data support or weaken the case for the candidate cause?
• What type of evidence do the data represent?
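One way to examine the temporal pattern is to tabulate, year by year, the gap in EPT richness between the reference and impaired sites. The minimal sketch below uses the exercise values above; the function name is hypothetical. A gap that shrinks after the discharge was removed would be consistent with recovery, but deciding what that means for the candidate cause, and what type of evidence it is, remains the exercise's question.

```python
# EPT taxa richness by year, copied from the exercise data above.
ept = {
    "PC1": {2004: 18, 2005: 16, 2006: 15, 2007: 16},  # reference site
    "PC2": {2004: 8, 2005: 9, 2006: 13, 2007: 15},  # impaired site
}


def reference_gap(reference, impaired):
    """EPT richness at the reference site minus the impaired site, for each year."""
    return {year: reference[year] - impaired[year] for year in sorted(reference)}


gap = reference_gap(ept["PC1"], ept["PC2"])
for year, diff in gap.items():
    print(year, diff)
```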
Boxplots
• Depict distribution of observations
– Center box spans the middle 50% of observations (the interquartile range)
– Whiskers extend either to the max & min values, or to the most extreme observations within 1.5× the interquartile range of the box
• Good data exploration & visualization tool
• Can be used to evaluate data from case & data from elsewhere
– Spatial/temporal co-occurrence
– Stressor-response relationships from the field or from other field studies
– Causal pathway
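The box-and-whisker rule can be computed directly. This minimal sketch assumes quartiles are taken with the "inclusive" interpolation method of Python's `statistics.quantiles` and that whiskers follow the 1.5 × IQR convention. Statistical packages differ slightly in how they compute quartiles. The function name and return keys are hypothetical.

```python
from statistics import quantiles


def boxplot_stats(values, whisker_coef=1.5):
    """Summary numbers behind a box-and-whisker plot (1.5 x IQR whisker rule)."""
    data = sorted(values)
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1  # height of the box: the middle 50% of observations
    low_fence = q1 - whisker_coef * iqr
    high_fence = q3 + whisker_coef * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": min(inside),  # most extreme values within the fences
        "whisker_high": max(inside),
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


# Hypothetical observations with one extreme value.
print(boxplot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 50]))
```

Points outside the fences are usually drawn individually. Setting `whisker_coef` very large reproduces the "max & min" whisker convention.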
Scatterplots
• Graphical displays of matched data
– Dependent variable (y-axis) vs. independent variable (x-axis)
– Often the dependent variable is a biological response and the independent variable is a measure of a candidate cause
– Can view several at once in a scatterplot matrix
• Good data exploration & visualization tool
• Can be used to evaluate data from case & from elsewhere
– Stressor-response relationships from the field, from other field studies, & from laboratory studies
– Causal pathway
[Figure: example scatterplots of EPT richness and % non-insects vs. conductivity (µS/cm), with sites labeled A to I]
Examples from exercise: scatterplots
[Figure: EPT taxa richness vs. maximum summer water temperature (°C), with sites PC1 and PC2 marked]
• Do the data support or weaken the case for temperature as a candidate cause?
• What type(s) of evidence do these data provide?
Examples from exercise: scatterplots
[Figure: EPT taxa richness vs. log [Zn] (µg/L) and vs. log [Cu] (µg/L)]
• Do the data support or weaken the cases for copper & zinc as candidate causes?
• What type(s) of evidence do these data provide?
Correlation analysis
• Method for measuring degree of association between 2 variables in a matched data set
– Correlation coefficient provides quantitative measure of degree to which 2 variables co-vary

                    Elevation   Water Temp   Air Temp   Log Conductivity
Elevation            1          -0.75        -0.25      -0.02
Water Temp          -0.75        1            0.64       0.07
Air Temp            -0.25        0.64         1          0.39
Log Conductivity    -0.02        0.07         0.39       1

• Good data exploration tool
• Can be used to evaluate data from case
– Stressor-response relationships from the field
– Causal pathway
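A matrix like the one above can be built from matched columns of site data. This minimal sketch uses Pearson's correlation coefficient, which is one common choice; rank-based Spearman correlation is another, and the slide does not say which was used. The function names and the example values are hypothetical, not the exercise data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient for two matched sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """All pairwise correlations among named, matched columns."""
    names = list(columns)
    return {(a, b): pearson_r(columns[a], columns[b]) for a in names for b in names}


# Hypothetical matched observations from five sites.
columns = {
    "elevation": [120, 340, 560, 780, 900],
    "water_temp": [22.5, 20.1, 18.7, 16.0, 15.2],
    # Log-transform skewed variables such as conductivity before correlating.
    "log_conductivity": [math.log10(c) for c in (650, 420, 300, 310, 150)],
}
matrix = correlation_matrix(columns)
print(round(matrix[("elevation", "water_temp")], 2))
```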
Regression analysis
• Method for quantifying the relationship between a dependent & an independent variable
– Can include one or more independent variables
• Models can be used to:
– Predict value of response variable for new values of explanatory variable
– Estimate value of explanatory variable needed to account for change in response variable
– Model natural variation in stressors, to define reference expectations
[Figure: 90th percentile prediction limits]
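The first two uses listed above can be sketched with an ordinary least-squares line for a single explanatory variable. The sketch shows prediction of the response and the inverse estimate of the stressor level that corresponds to a target response. It does not compute prediction limits such as those in the figure. The function names and data are hypothetical.

```python
def fit_ols(x, y):
    """Ordinary least-squares line y = intercept + slope * x (one explanatory variable)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope


def predict(intercept, slope, x_new):
    """Predicted response for a new value of the explanatory variable."""
    return intercept + slope * x_new


def stressor_level_for_response(intercept, slope, y_target):
    """Explanatory-variable value at which the fitted line reaches y_target."""
    if slope == 0:
        raise ValueError("a flat line never reaches a different response")
    return (y_target - intercept) / slope


# Hypothetical stressor (x) and biological response (y) values.
x = [100, 200, 300, 400, 500]
y = [20, 17, 15, 11, 9]
b0, b1 = fit_ols(x, y)
print(predict(b0, b1, 350), stressor_level_for_response(b0, b1, 12))
```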
Quantile regression
• Models the relationship between a specified quantile of the response variable and the explanatory variable (stressor)
– 50th percentile (0.5 quantile) gives the median line, under which 50% of observed responses occur
– 90th percentile (0.9 quantile) gives the line under which 90% of observed responses occur
• Provides a means of estimating the location of the upper boundary on a scatterplot
• Used to evaluate stressor-response relationships from other field studies (Step 4: Evaluating data from elsewhere)
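Quantile regression minimizes an asymmetric absolute-error loss (the "check" or "pinball" loss) rather than squared error. For one explanatory variable, an optimal quantile-regression line can always be chosen to pass through two of the data points, as long as the x values are not all equal. Trying every pair of points therefore gives an exact fit for small illustrative data sets. This brute-force sketch has a cost that grows with the cube of the number of points. In R, the quantreg package's `rq()` function is the usual tool. The function names and data here are hypothetical.

```python
from itertools import combinations


def pinball_loss(residuals, tau):
    """Check loss: points above the line weighted by tau, points below by 1 - tau."""
    return sum(tau * r if r >= 0 else (tau - 1) * r for r in residuals)


def quantile_regression_line(x, y, tau):
    """Exact tau-quantile line for small data sets, by searching all point pairs.

    Returns (intercept, slope). Requires at least two distinct x values.
    """
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # a vertical line is not a valid fit
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        residuals = [yk - (intercept + slope * xk) for xk, yk in zip(x, y)]
        loss = pinball_loss(residuals, tau)
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    if best is None:
        raise ValueError("need at least two distinct x values")
    return best[1], best[2]


# Hypothetical responses: at each stressor level, one site near 0 and one near 10.
x = [1, 1, 2, 2, 3, 3]
y = [0, 10, 0, 10, 0, 10]
print(quantile_regression_line(x, y, tau=0.9))  # upper boundary
print(quantile_regression_line(x, y, tau=0.1))  # lower boundary
```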
Applying quantile regression
[Figure: two example panels, labeled SUPPORTS and WEAKENS]
Predicting environmental conditions from biological observations (PECBO)
• Uses taxa list & niche characteristics to predict environmental conditions at site
• Maximum likelihood inference based on taxon-environment relationships
[Figure: predicted temperature if both Ameletus & Diphetor are present at site]
• Useful when environmental data not available
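The maximum likelihood idea can be sketched as follows. Every modeling choice here is an assumption:

- Each taxon's probability of occurrence follows a Gaussian-shaped curve along the environmental gradient, defined by an optimum, a tolerance, and a maximum probability.
- Sites are scored on presence/absence.
- Taxa occur independently.
- The inferred value is found by grid search.

The taxon names and curve parameters are invented. They are not the Ameletus/Diphetor relationships in the figure, and the CADDIS PECBO scripts may model taxon-environment relationships differently.

```python
import math


def occurrence_prob(x, optimum, tolerance, max_prob):
    """Hypothetical Gaussian-shaped probability of occurrence along a gradient."""
    return max_prob * math.exp(-((x - optimum) ** 2) / (2 * tolerance ** 2))


def infer_environment(observed, curves, grid):
    """Grid value that maximizes the presence/absence log-likelihood.

    observed: dict taxon -> True (present) or False (absent)
    curves:   dict taxon -> (optimum, tolerance, max_prob)
    """
    best_x, best_ll = None, -math.inf
    for x in grid:
        ll = 0.0
        for taxon, present in observed.items():
            p = occurrence_prob(x, *curves[taxon])
            p = min(max(p, 1e-12), 1 - 1e-12)  # guard against log(0)
            ll += math.log(p) if present else math.log(1 - p)
        if ll > best_ll:
            best_x, best_ll = x, ll
    return best_x


# Hypothetical temperature curves (deg C) for two invented taxa, both found at a site.
curves = {"taxon_A": (12.0, 3.0, 0.8), "taxon_B": (16.0, 3.0, 0.8)}
grid = [i / 10 for i in range(0, 301)]  # 0.0 to 30.0 deg C
print(infer_environment({"taxon_A": True, "taxon_B": True}, curves, grid))
```

Including absences as well as presences lets taxa that are missing from a site also inform the estimate.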
Using PECBO to verify predictions
• Biologically-based predictions can provide valuable supporting evidence, under “Verified predictions”
– If a stressor is elevated at a site, you can test the hypothesis that the biologically-based prediction should also be elevated
• Tools for calculating taxon-environment relationships & PECBO available on CADDIS
[Figure: R² = 0.67]
Species sensitivity distributions (SSDs)
An SSD is a statistical distribution describing the variation of responses of a set of species to a stressor; it can be used to estimate the proportion of species adversely affected at a given stressor intensity.
[Figure: SSD plot showing LC50s for arthropods exposed to dissolved cadmium]
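As a minimal sketch of the definition above, the code fits a log-normal SSD to a set of species toxicity values such as LC50s. It then estimates the proportion of species whose value is exceeded at a given concentration, and inverts the fit to get a hazardous concentration (e.g., HC5). The log-normal form is one common assumption, not necessarily what the CADDIS SSD generator uses. Function names and the LC50 values are hypothetical.

```python
import math
from statistics import NormalDist


def fit_lognormal_ssd(toxicity_values):
    """Fit a normal distribution to log10-transformed toxicity values (e.g., LC50s)."""
    logs = [math.log10(v) for v in toxicity_values]
    return NormalDist.from_samples(logs)  # mean and sample standard deviation


def fraction_affected(ssd, concentration):
    """Estimated proportion of species whose toxicity value is at or below `concentration`."""
    return ssd.cdf(math.log10(concentration))


def hazardous_concentration(ssd, fraction):
    """Concentration expected to exceed the toxicity value of `fraction` of species."""
    return 10 ** ssd.inv_cdf(fraction)


# Hypothetical LC50s (ug/L) for eight species; not the cadmium data in the figure.
lc50s = [3.1, 7.5, 12.0, 25.0, 40.0, 88.0, 150.0, 410.0]
ssd = fit_lognormal_ssd(lc50s)
print(round(fraction_affected(ssd, 20.0), 2), round(hazardous_concentration(ssd, 0.05), 2))
```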
Now where are we?
• Step 1
– Biological effects of interest defined
– Impaired & reference sites identified
• Step 2
– Map completed
– Conceptual model developed
– Candidate causes identified
– Data identified & collected
• Steps 3 & 4
– Available data examined & analyzed
– Available data organized into types of evidence
Now we’re ready to evaluate & score the evidence across candidate causes