assuming the null hypothesis is correct

Data Analysis
A Few Necessary Terms
Categorical Variable: Discrete groups, such as Type
of Reach (Riffle, Run, Pool)
Continuous Variable: Measurements along a
continuum, such as Flow Velocity
What type of variable would “Mottled Sculpin/Meter^2” be?
What type of variable is “Substrate Type”?
What type of variable is “% of bank that is undercut”?
A Few Necessary Terms
Explanatory Variable: Independent variable. On x-axis.
The variable you use as a predictor.
Response Variable: Dependent variable. On y-axis.
The variable that is hypothesized to depend on/be
predicted by the explanatory variable.
Statistical Tests: Appropriate Use
For our data, the response variable will always be
continuous.
T-test: A categorical explanatory variable with 2
options.
ANOVA: A categorical explanatory variable with >2
options.
Regression: A continuous explanatory variable
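The rule above can be sketched as a small lookup. This is only an illustration: the function name `choose_test` and its arguments are hypothetical, and it assumes, as the slide does, that the response variable is continuous.

```python
def choose_test(explanatory_type, n_groups=None):
    """Suggest a test for a continuous response variable.

    explanatory_type: "categorical" or "continuous" (hypothetical labels)
    n_groups: number of options, needed only for a categorical variable
    """
    if explanatory_type == "continuous":
        return "regression"
    if explanatory_type == "categorical":
        if n_groups is None or n_groups < 2:
            raise ValueError("a categorical explanatory variable needs at least 2 options")
        # Exactly 2 options: t-test. More than 2 options: ANOVA.
        return "t-test" if n_groups == 2 else "ANOVA"
    raise ValueError("explanatory_type must be 'categorical' or 'continuous'")


print(choose_test("categorical", 2))  # e.g. two sites
print(choose_test("categorical", 3))  # e.g. Type of Reach: Riffle, Run, Pool
print(choose_test("continuous"))      # e.g. Flow Velocity
```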
Statistical Tests
Hypothesis Testing: In statistics, we are always
testing a Null Hypothesis (Ho) against an alternate
hypothesis (Ha).
Test Statistic: A single number calculated from the
sample data (such as t or F) that is used to find
the p-value
p-value: The probability of observing our data or
more extreme data assuming the null hypothesis
is correct
Statistical Significance: We reject the null
hypothesis if the p-value is below a set value,
usually 0.05.
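To make the p-value definition concrete, here is a minimal sketch that counts how often a difference at least as extreme as the observed one appears when the null hypothesis (no difference between groups) is taken to be true. It uses an exact permutation test, which is not one of the tests covered in these slides, because it follows the definition directly. The data values and the function name `permutation_p_value` are made up for illustration.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(group_a, group_b):
    """Two-sided exact permutation p-value for a difference in means."""
    pooled = list(group_a) + list(group_b)
    indices = range(len(pooled))
    observed = abs(mean(group_a) - mean(group_b))

    total = 0
    as_extreme = 0
    # Under the null hypothesis the group labels are interchangeable,
    # so every way of splitting the pooled data is equally likely.
    for chosen in combinations(indices, len(group_a)):
        chosen_set = set(chosen)
        a = [pooled[i] for i in indices if i in chosen_set]
        b = [pooled[i] for i in indices if i not in chosen_set]
        total += 1
        # Small tolerance guards against floating-point ties.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            as_extreme += 1
    return as_extreme / total


# Hypothetical densities (fish per square meter) at two sites.
site_1 = [0.30, 0.35, 0.40, 0.45]
site_2 = [0.05, 0.10, 0.15, 0.20]

p = permutation_p_value(site_1, site_2)
print(f"p-value = {p:.4f}")
print("reject Ho" if p < 0.05 else "fail to reject Ho")
```

Only 2 of the 70 possible splits are as extreme as the observed one, so the p-value falls below 0.05 and the null hypothesis would be rejected for these made-up numbers.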
Student’s T-Test
Tests the statistical significance of the
difference between means from two
independent samples
Compares the mean of a continuous response variable
between the 2 groups of a categorical variable
[Figure: Mottled Sculpin/m2 at two sites, Cross Plains and Salmo Pond]
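As a rough sketch of what the test computes, the t statistic for two independent samples with assumed equal variance can be worked out by hand. The sample values below are invented, and `pooled_t_statistic` is a hypothetical helper name. Turning t into a p-value requires the t distribution, which a statistics package or Excel would supply.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(sample_a, sample_b):
    """Student's t statistic and degrees of freedom, assuming equal variance."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    # Pooled variance: the two sample variances weighted by their degrees of freedom.
    pooled_var = ((n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)) / df
    # Standard error of the difference between the two means.
    std_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (mean(sample_a) - mean(sample_b)) / std_error
    return t, df


# Hypothetical Mottled Sculpin densities (per square meter), not real survey data.
cross_plains = [0.30, 0.35, 0.40, 0.45]
salmo_pond = [0.05, 0.10, 0.15, 0.20]

t, df = pooled_t_statistic(cross_plains, salmo_pond)
print(f"t = {t:.3f} with {df} degrees of freedom")
```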
Precautions and Limitations
• Meet Assumptions
• Observations from data with a normal
distribution (histogram)
• Samples are independent
• Assumed equal variance (boxplot)
• No other sample biases
• Interpreting the p-value
Analysis of Variance (ANOVA)
Tests the statistical significance of the
difference between means from two or
more independent samples
[Figure: Mottled Sculpin/m2 by reach type (Riffle, Pool, Run), with the grand mean marked]
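The comparison behind ANOVA can be sketched as the ratio of variation between group means (measured against the grand mean) to variation within groups. The densities below are invented for illustration, and `one_way_anova_f` is a hypothetical helper name. The p-value would come from the F distribution in a statistics package.

```python
from statistics import mean


def one_way_anova_f(groups):
    """F statistic for a one-way ANOVA. groups maps a group name to its values."""
    all_values = [v for values in groups.values() for v in values]
    grand_mean = mean(all_values)

    # Variation of each group mean around the grand mean.
    ss_between = sum(
        len(values) * (mean(values) - grand_mean) ** 2 for values in groups.values()
    )
    # Variation of each observation around its own group mean.
    ss_within = sum(
        (v - mean(values)) ** 2 for values in groups.values() for v in values
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical Mottled Sculpin densities (per square meter) by reach type.
densities = {
    "Riffle": [0.4, 0.5, 0.6],
    "Run": [0.2, 0.3, 0.4],
    "Pool": [0.0, 0.1, 0.2],
}

f, df_between, df_within = one_way_anova_f(densities)
print(f"F = {f:.2f} on {df_between} and {df_within} degrees of freedom")
```

A large F means the group means differ by more than the scatter within groups would suggest.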
Precautions and Limitations
• Meet Assumptions
• Observations from data with a normal
distribution
• Samples are independent
• Assumed equal variance
• No other sample biases
• Interpreting the p-value
• Pairwise T-tests to follow
Simple Linear Regression
• What is it? Least squares line
• When is it appropriate to use?
• Assumptions?
• What does the p-value mean? The R-value?
• How to do it in Excel
Simple Linear Regression
Tests the statistical significance of a
relationship between two continuous
variables, Explanatory and Response
[Figure: scatterplot of Brown Trout/Meter^2 (y-axis) against Mottled Sculpin/Meter^2 (x-axis) with the best-fit line, R2 = 0.6955]
Precautions and Limitations
• Meet Assumptions
• Observations from data with a normal
distribution
• Samples are independent
• Assumed equal variance
• Relationship is linear
• No other sample biases
• Interpret the p-value and R-squared value.
Residual Plots
Residuals are the vertical distances from observed
points to the best-fit line
Residuals from the best-fit line always sum to zero
Regression chooses the best-fit line to minimize
the sum of squared residuals. It is called the Least
Squares Line.
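The least squares calculation can be sketched in a few lines. The data are invented, not the values behind the plotted figure, and `least_squares_fit` is a hypothetical helper name. The sketch also checks that the residuals sum to zero, within floating-point error.

```python
from statistics import mean


def least_squares_fit(x, y):
    """Slope, intercept, residuals, and R-squared for a simple linear regression."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    # These are the slope and intercept that minimize the sum of squared residuals.
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    fitted = [intercept + slope * xi for xi in x]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]

    ss_residual = sum(r ** 2 for r in residuals)
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    r_squared = 1 - ss_residual / ss_total
    return slope, intercept, residuals, r_squared


# Hypothetical densities (per square meter).
sculpin = [0.1, 0.2, 0.3, 0.4, 0.5]      # explanatory variable
trout = [0.05, 0.15, 0.10, 0.25, 0.30]   # response variable

slope, intercept, residuals, r_squared = least_squares_fit(sculpin, trout)
print(f"trout = {intercept:.3f} + {slope:.3f} * sculpin")
print(f"R-squared = {r_squared:.3f}")
print(f"sum of residuals = {sum(residuals):.1e}")
```

Plotting `residuals` against the fitted values gives the residual vs. fitted value plot used to check assumptions.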
[Figure: the same scatterplot of Brown Trout/Meter^2 against Mottled Sculpin/Meter^2 (R2 = 0.6955), with the residuals marked between each point and the line]
Residual vs. Fitted Value Plots
[Figure: residuals (y-axis) plotted against fitted values, MS_CPUA (x-axis). The observed values are the points; the model values are the horizontal line at zero.]
Residual Plots Can Help Test Assumptions
[Figure: three example residual plots]
• “Normal” Scatter
• Fan Shape: Unequal Variance
• Curve (linearity)
Have we violated any assumptions?
[Figure: the Brown Trout/Meter^2 vs. Mottled Sculpin/Meter^2 scatterplot (R2 = 0.6955) shown beside its plot of residuals vs. fitted values (MS_CPUA)]
R-Squared and P-value
High R-Squared
Low p-value (significant relationship)
R-Squared and P-value
Low R-Squared
Low p-value (significant relationship)
R-Squared and P-value
High R-Squared
High p-value (NO significant relationship)
R-Squared and P-value
Low R-Squared
High p-value (No significant relationship)
P-value indicates the strength of the evidence for a
relationship between the two variables: the
probability of seeing a relationship at least this
strong if there were truly none
R-Squared indicates how much variance in the
response variable is explained by the explanatory
variable.
You can think of this as a measure of
predictability
If this is low, other variables likely play a role. If
this is high, it DOES NOT INDICATE A
SIGNIFICANT RELATIONSHIP!
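A small worked case shows how a high R-squared can sit alongside a high p-value. The sketch below is limited to exactly three points, because with one degree of freedom the t distribution has a simple closed form (it is the Cauchy distribution), so no statistics library is needed. The data and the helper name `r_squared_and_p_three_points` are invented for illustration.

```python
from math import atan, pi, sqrt
from statistics import mean


def r_squared_and_p_three_points(x, y):
    """R-squared and two-sided slope p-value for a regression on exactly 3 points."""
    if len(x) != 3 or len(y) != 3:
        raise ValueError("this shortcut only holds for exactly 3 points")
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    r_squared = sxy ** 2 / (sxx * syy)
    # t statistic for the slope with n - 2 = 1 degree of freedom.
    # (A perfect fit, r_squared == 1, would divide by zero here.)
    t = sqrt(r_squared / (1 - r_squared))
    # Two-sided tail probability of a t distribution with 1 degree of freedom.
    p_value = 1 - (2 / pi) * atan(t)
    return r_squared, p_value


# Hypothetical densities (per square meter) from only three sampling sites.
sculpin = [0.1, 0.2, 0.3]
trout = [0.1, 0.2, 0.2]

r_squared, p_value = r_squared_and_p_three_points(sculpin, trout)
print(f"R-squared = {r_squared:.2f}, p-value = {p_value:.2f}")
```

With so few points the line explains 75% of the variance, yet the p-value is about 0.33, well above 0.05, so the relationship is not significant.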