Mod18-A Applications of Regression to Water Quality Analysis


Applications of Regression
to Water Quality Analysis
Unit 5: Module 18, Lecture 1
Statistics
• A branch of mathematics dealing with the collection, analysis, interpretation and presentation of masses of numerical data
• Descriptive Statistics (Lecture 1)
  - Basic description of a variable
• Hypothesis Testing (Lecture 2)
  - Asks the question: is X different from Y?
• Predictions (Lecture 3)
  - What will happen if…
Developed by: Host
Updated: Jan. 21, 2003
U5-m18a-s2
Objectives
• Introduce the basic concepts and assumptions of regression analysis
  - Making predictions
  - Correlation vs. causal relationships
  - Applications of regression
• Basic linear regression
  - Assumptions
  - Techniques
• What if it is not linear: data transformations
• Water quality applications of regression analyses
• Survey of regression software
Regression defined
• A statistical technique to define the relationship between a response variable and one or more predictor variables
• Here, fish length is a predictor variable (also called an “independent” variable)
• Fish weight is the response variable
[Figure: Fish Weight (oz) vs. Fish Length (in)]
Regression and correlation
• Regression
  - Identifies the relationship between a predictor variable and a response variable
• Correlation
  - Estimates the degree to which two variables vary together
  - Does not express one variable as a function of the other
  - Makes no distinction between dependent and independent variables
  - Does not assume that one variable is the cause of the other
  - Typically assumes that the two variables are both effects of a common cause
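As a minimal sketch of the distinction above, the correlation coefficient r can be computed without naming either variable as the predictor; swapping the two inputs gives the same value. The function name `pearson_r` and the length and weight numbers are illustrative assumptions, not data from this lecture.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squares of deviations from the means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical fish lengths (in) and weights (oz), for illustration only
length = [7.0, 9.0, 11.0, 13.0, 15.0]
weight = [4.0, 10.0, 21.0, 30.0, 38.0]

r_xy = pearson_r(length, weight)
r_yx = pearson_r(weight, length)
print(round(r_xy, 3), round(r_yx, 3))  # the two values are identical
```

Because the formula is symmetric in its two arguments, it says nothing about which variable depends on the other.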
Basic linear regression
• Assumes there is a straight-line relationship between a predictor (or independent) variable X and a response (or dependent) variable Y
• Equation for a line: Y = mX + b
  - m: the slope coefficient (increase in Y per unit increase in X)
  - b: the constant or Y intercept (value of Y when X = 0)
[Figure: Fish Weight (oz) vs. Fish Length (in)]
Basic linear regression
• Regression analysis finds the ‘best fit’ line that describes the dependence of Y on X
[Figure: Fish Weight (oz) vs. Fish Length (in)]
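The slides do not say how the ‘best fit’ line is chosen; the sketch below assumes ordinary least squares, the usual criterion, which picks the m and b that minimize the sum of squared vertical distances from the points to the line. The function name `fit_line` and the data are hypothetical and are not the fish data behind the figure.

```python
def fit_line(x, y):
    """Ordinary least-squares slope m and intercept b for Y = mX + b."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    m = sxy / sxx              # slope: change in Y per unit change in X
    b = mean_y - m * mean_x    # intercept: the line passes through the means
    return m, b

# Hypothetical fish lengths (in) and weights (oz), for illustration only
length = [7.0, 9.0, 11.0, 13.0, 15.0]
weight = [4.0, 10.0, 21.0, 30.0, 38.0]

m, b = fit_line(length, weight)
print("slope:", round(m, 3), "intercept:", round(b, 3))
```

A fitted line always passes through the point (mean of X, mean of Y), which is why the intercept can be computed from the slope and the two means.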
Basic linear regression
• Outputs of regression
  - Regression model: Y = mX + b
  - Weight = 4.48*Length + 28.722
[Figure: Fish Weight (oz) vs. Fish Length (in)]
Basic linear regression
• Outputs of regression
  - Coefficient of determination: R² = 0.89
[Figure: Fish Weight (oz) vs. Fish Length (in)]
How good is the fit? The Coefficient of Determination
• R²: the proportion of the total variation that is explained by the regression
• Ranges from 0.00 to 1.00
  - 0.00: no correlation
  - 1.00: perfect correlation, no scatter around the line
• Coefficient of determination for the fish data: R² = 0.89
[Figure: Fish Weight (oz) vs. Fish Length (in)]
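One common way to compute the coefficient of determination is R² = 1 - SS_residual / SS_total, where SS_total is the variation of Y around its mean and SS_residual is the variation left around the fitted line. The sketch below assumes that definition; the function name `r_squared` and the data are hypothetical.

```python
def r_squared(x, y, m, b):
    """Proportion of the variation in y explained by the line y = m*x + b."""
    mean_y = sum(y) / len(y)
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_resid = sum((yi - (m * xi + b)) ** 2 for xi, yi in zip(x, y))
    return 1.0 - ss_resid / ss_total

# Points that fall exactly on Y = 2X + 1: no scatter around the line
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]
perfect = r_squared(x, y, m=2.0, b=1.0)

# A flat line through the mean of Y explains none of the variation
flat = r_squared(x, y, m=0.0, b=sum(y) / len(y))

print(perfect, flat)
```

The two cases correspond to the ends of the range described above: no scatter around the line at one end, and a line that explains nothing at the other.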
Example coefficients of determination
[Figure: two example scatter plots, with R² = 0.54 and R² = 0.08]
Four assumptions of linear regression
-adapted from Sokal and Rohlf (1981)
• The independent variable X is measured without error
  - Under control of the investigator
  - X’s are ‘fixed’
• The expected value of Y for a given value of X is described by the standard linear function Y = mX + b
• For any value of X, the Y’s are independently and normally distributed
  - See figure 14.4 in Sokal and Rohlf (1981)
• The variance around the regression line is constant; the variability of Y does not depend on the value of X
  - Extra credit word: the samples are homoscedastic
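The constant-variance assumption can be checked informally by looking at the residuals (observed Y minus fitted Y). The sketch below is a crude illustration, not a formal test and not a method from Sokal and Rohlf: it compares the mean squared residual in the upper half of the X range with that in the lower half. The function names, the line Y = 2X, and the data are all hypothetical.

```python
def residuals(x, y, m, b):
    """Observed minus fitted values for the line y = m*x + b."""
    return [yi - (m * xi + b) for xi, yi in zip(x, y)]

def spread_ratio(x, resid):
    """Mean squared residual in the upper half of X divided by the lower half."""
    pairs = sorted(zip(x, resid))          # order the residuals by X
    half = len(pairs) // 2
    lower = [r for _, r in pairs[:half]]
    upper = [r for _, r in pairs[-half:]]
    ms_lower = sum(r * r for r in lower) / len(lower)
    ms_upper = sum(r * r for r in upper) / len(upper)
    return ms_upper / ms_lower

# Hypothetical data whose scatter grows with X (not homoscedastic)
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.1, 3.9, 6.2, 7.8, 10.5, 11.4, 14.9, 14.8]

ratio = spread_ratio(x, residuals(x, y, m=2.0, b=0.0))
print(round(ratio, 1))  # a ratio far from 1 suggests non-constant variance
```

A plot of residuals against X conveys the same information and is usually the first thing to look at.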
Data transformations: What if data are not linear?
• It is often possible to ‘linearize’ data in order to use linear models
• This is particularly true of exponential relationships
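As a sketch of how an exponential relationship can be linearized: if Y = a·exp(kX), then ln(Y) = ln(a) + kX, which is a straight line in X with slope k and intercept ln(a). The example below generates hypothetical data from assumed values a = 2 and k = 0.5, fits a least-squares line to (X, ln Y), and recovers the two parameters. The function name `fit_line` and all numbers are illustrative assumptions.

```python
import math

def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y = m*x + b."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    m = sxy / sxx
    return m, mean_y - m * mean_x

# Hypothetical exponential data: Y = 2 * exp(0.5 * X)
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * math.exp(0.5 * x) for x in xs]

# Transform Y, then fit a straight line to (X, ln Y)
log_ys = [math.log(y) for y in ys]
k, ln_a = fit_line(xs, log_ys)
a = math.exp(ln_a)   # back-transform the intercept to the original scale

print(round(k, 6), round(a, 6))
```

With real, noisy data the transformation also changes the error structure, so the regression assumptions should be checked on the transformed scale.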
Applications: Standard curves for lab analyses
• A classic use of regression: calibrate a lab instrument to predict some response variable (a “calibration curve”)
• In this example, absorbance from a spectrophotometer is measured for a series of standards with fixed N concentrations
• Once the relationship between absorbance and concentration is established, measuring the absorbance of an unknown sample can be used to predict its N concentration
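A calibration curve along these lines might be sketched as follows. Because the standards have fixed, known concentrations, this sketch treats concentration as X and absorbance as Y, fits absorbance = m·concentration + b, and then inverts the line to estimate the concentration of an unknown. The standards, absorbance readings, units (mg/L), and function names are hypothetical, and a particular lab method may specify a different procedure.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y = m*x + b."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    m = sxy / sxx
    return m, mean_y - m * mean_x

def concentration_from_absorbance(absorbance, m, b):
    """Invert absorbance = m*concentration + b to estimate concentration."""
    return (absorbance - b) / m

# Hypothetical N standards (mg/L) and their measured absorbances
standards = [0.0, 0.5, 1.0, 2.0, 4.0]
absorbances = [0.01, 0.06, 0.11, 0.21, 0.41]

m, b = fit_line(standards, absorbances)

# Absorbance of an unknown sample, read on the same instrument
unknown = concentration_from_absorbance(0.25, m, b)
print(round(unknown, 3))
```

Estimates are only trustworthy for absorbances within the range spanned by the standards; readings outside that range require extrapolating the line.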
Using regression to estimate stream nutrient and
bacteria concentrations in streams
• The USGS has real-time water quality monitors installed at several stream gaging sites in Kansas
Using regression to estimate stream nutrient and
bacteria concentrations in streams: data flow
[Figure: data flow diagram]
Using Regression to estimate stream nutrient and
bacteria concentrations in streams: Results
• USGS developed a series of single or multiple regression models
  - Total P = 0.000606*Turbidity + 0.186 (R² = 0.964)
  - Total N = 0.0018*Turbidity + 0.0000940*Discharge + 1.08 (R² = 0.916)
  - Total N = 0.000325*Turbidity + 0.0214*Temperature + 0.0000796*Conductance + 0.515 (R² = 0.764)
  - Fecal Coliform = 3.14*Turbidity + 24.2 (R² = 0.62)
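To show how such models are used once fitted, the sketch below evaluates three of the equations above for a single set of sensor readings. The coefficients are copied from the slide; the function names and the input values are hypothetical, and the slide does not state units, so none are assumed here.

```python
def total_p(turbidity):
    """Total P from turbidity (single regression, R2 = 0.964 on the slide)."""
    return 0.000606 * turbidity + 0.186

def total_n(turbidity, discharge):
    """Total N from turbidity and discharge (multiple regression, R2 = 0.916)."""
    return 0.0018 * turbidity + 0.0000940 * discharge + 1.08

def fecal_coliform(turbidity):
    """Fecal coliform from turbidity (single regression, R2 = 0.62)."""
    return 3.14 * turbidity + 24.2

# Hypothetical real-time sensor readings (units as used when fitting the models)
turbidity = 100.0
discharge = 500.0

print(round(total_p(turbidity), 4))
print(round(total_n(turbidity, discharge), 4))
print(round(fecal_coliform(turbidity), 1))
```

Each equation applies only to the stream it was fitted for, so the same readings at a different site would need that site's own coefficients.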
Using Regression to estimate stream nutrient
and bacteria concentrations in streams:
Important Considerations
• Explanatory variables were included only if they had a significant physical basis for their inclusion
  - Water temperature is correlated with season and therefore with the application of fertilizers
  - Conductance is inversely related to TN and TP, which tend to be high during high flow
  - Turbidity is a measure of particulate matter; TN and TP are related to sediment loads
• The USGS needed a separate model for each stream!
  - The basins were different enough that a general model could not be developed
• By using the models with the real-time sensors, USGS can predict events, e.g. when fecal coliform concentrations exceed criteria
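The idea of predicting exceedance events from real-time sensor data might be sketched as below, using the fecal coliform model from the results slide. The criterion value of 200 is a placeholder chosen for illustration, not a regulatory number from the lecture, and the turbidity series and function names are hypothetical.

```python
def fecal_coliform(turbidity):
    """Fecal coliform estimated from turbidity (model from the results slide)."""
    return 3.14 * turbidity + 24.2

def flag_exceedances(turbidity_series, criterion):
    """Return (estimate, exceeds) pairs for each turbidity reading."""
    results = []
    for turbidity in turbidity_series:
        estimate = fecal_coliform(turbidity)
        results.append((estimate, estimate > criterion))
    return results

# Hypothetical sequence of real-time turbidity readings
readings = [20.0, 45.0, 60.0, 150.0]
CRITERION = 200.0   # placeholder threshold, for illustration only

flags = flag_exceedances(readings, CRITERION)
for estimate, exceeds in flags:
    print(round(estimate, 1), exceeds)
```

With R² = 0.62 for this model, an estimate close to the criterion is uncertain, so a flag near the threshold is better read as a prompt to collect a sample than as a confirmed exceedance.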
Measured and regression estimated density
Using regression to estimate stream nutrient and bacteria
concentrations in streams: Important Considerations
• Concentration estimates can be coupled with flow data to estimate nutrient loads
• Finally, these regressions can be useful tools for estimating TMDLs
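A load is concentration multiplied by flow. As a sketch under assumed units (concentration in mg/L, discharge in m³/s), the daily load in kg/day is concentration × discharge × 86.4, because 1 mg/L equals 1 g/m³, there are 86,400 seconds in a day, and 1,000 g in a kilogram. The function name and input values are hypothetical; data reported in other units, such as discharge in ft³/s, would need a different conversion factor.

```python
SECONDS_PER_DAY = 86_400
GRAMS_PER_KG = 1_000

def load_kg_per_day(conc_mg_per_l, discharge_m3_per_s):
    """Daily load from a concentration (mg/L) and a discharge (m^3/s)."""
    grams_per_second = conc_mg_per_l * discharge_m3_per_s   # mg/L == g/m^3
    return grams_per_second * SECONDS_PER_DAY / GRAMS_PER_KG

# Hypothetical regression-estimated concentration and measured flow
estimated_conc = 1.3    # mg/L
flow = 10.0             # m^3/s

print(round(load_kg_per_day(estimated_conc, flow), 1))
```

Applying this to each time step of a concentration estimate and summing gives a load over a longer period, assuming flow and concentration are roughly constant within each step.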
Software for regression analyses
• Any basic statistical package will do regressions
  - SigmaStat
  - Systat
  - SAS
• Excel and other spreadsheets also have regression functions
  - Excel requires the Analysis ToolPak add-in
  - Tools > Add-Ins > Analysis ToolPak