Mod17-A Statistics for Water Science
Statistics for Water Science:
Hypothesis Testing:
Fundamental concepts and a
survey of methods
Unit 5: Module 17, Lecture 2
Statistics
A branch of mathematics dealing with the
collection, analysis, interpretation and
presentation of masses of numerical data:
Descriptive Statistics (Lecture 1)
Basic description of a variable
Hypothesis Testing (Lecture 2)
Asks the question – is X different from Y?
Predictions (Lecture 3)
What will happen if…
Developed by: Host
Updated: Jan. 21, 2004
U5-m17b-s2
Objectives
Introduce the basic concepts and assumptions of
significance tests
Distributions on parade
Developing hypotheses
What is “true”?
Survey statistical methods for testing for differences in
populations of numbers
Sample size issues
Appropriate tests
What we won’t do:
Elaborate on mathematical underpinnings of tests (take a
good stats course for this!)
U5-m17b-s3
From our last lecture
The mean:
A measure of central tendency
The Standard Deviation:
A measure of the ‘spread’ of the data
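As a quick refresher, both summary statistics can be computed with Python's standard library. This is a minimal sketch; the readings below are hypothetical values invented for illustration, not data from the lecture.

```python
import statistics

# Hypothetical temperature readings (deg C), invented for illustration only.
readings = [19.2, 19.5, 19.7, 19.8, 20.3]

mean = statistics.mean(readings)   # measure of central tendency
sd = statistics.stdev(readings)    # sample standard deviation (n - 1 denominator)

print(f"mean = {mean:.2f}, SD = {sd:.2f}")
```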
U5-m17b-s4
Tales of the normal distribution
Many kinds of data follow this symmetrical, bell-shaped
curve, often called a Normal Distribution.
Normal distributions have statistical properties that
allow us to predict the probability of getting a certain
observation by chance.
[Figure: bell-shaped normal distribution curve; x-axis from -2.0 to 2.0]
U5-m17b-s5
Tales of the normal distribution
When sampling a variable, you are most likely to obtain
values close to the mean
68% within 1 SD
95% within 2 SD
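The 68% and 95% figures can be checked directly from the standard normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `coverage` is an illustrative choice, not something from the lecture.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

def coverage(k):
    """Fraction of a normal population lying within k SD of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

within_1sd = coverage(1)
within_2sd = coverage(2)
print(f"within 1 SD: {within_1sd:.1%}, within 2 SD: {within_2sd:.1%}")
```

The exact values are slightly above 68% and 95%; the slide's figures are the usual rounded rule of thumb.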
[Figure: normal distribution with the 1 SD and 2 SD ranges; x-axis from -2.0 to 2.0]
U5-m17b-s6
Tales of the normal distribution
Note that a couple of values fall outside the 95% (2 SD)
interval
These values are improbable
[Figure: normal distribution with a couple of values outside the 2 SD interval; x-axis from -2.0 to 2.0]
U5-m17b-s7
Tales of the normal distribution
The essence of hypothesis testing:
If an observation falls in one of the tails of a
distribution, it is improbable under that distribution, so
there is a good chance that it is not part of that population.
[Figure: normal distribution; x-axis from -2.0 to 2.0]
U5-m17b-s8
“Significant Differences”
A difference is considered significant if the
probability of getting that difference by random
chance is very small.
P value:
The probability of seeing a difference at least this large
by random chance alone, if there is really no difference
Historically we use p < 0.05
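For a value measured in standard-normal (Z) units, the p value is simply a tail area of the normal curve. The sketch below is one way to compute it with the standard library; the function names are illustrative assumptions.

```python
from statistics import NormalDist

def one_tailed_p(z):
    """Probability of a standard normal value at least as large as z."""
    return 1.0 - NormalDist().cdf(z)

def two_tailed_p(z):
    """Probability of a standard normal value at least as far from 0 as z."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))

# z = 1.645 leaves about 5% in the upper tail; z = 1.96 leaves about 5% in both tails.
print(f"{one_tailed_p(1.645):.4f}  {two_tailed_p(1.96):.4f}")
```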
U5-m17b-s9
The probability of detecting a significant
difference is influenced by:
The magnitude of the effect
A big difference is more likely to be significant
than a small one
U5-m17b-s10
The probability of detecting a significant
difference is influenced by:
The spread of the data
If the Standard Deviation is low, it will be easier
to detect a significant difference
U5-m17b-s11
The probability of detecting a significant
difference is influenced by:
The number of observations
A large sample is more likely to detect a difference
than a small sample
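The three influences (magnitude of the effect, spread, and number of observations) can be seen together in a single statistic. The sketch below assumes two groups with the same number of observations and the same standard deviation; the function name and all numbers are hypothetical.

```python
import math

def standardized_difference(difference, sd, n):
    """Difference between two group means in units of its standard error.

    Assumes both groups have n observations and the same SD.
    """
    standard_error = sd * math.sqrt(2.0 / n)
    return difference / standard_error

baseline = standardized_difference(difference=0.5, sd=1.0, n=10)
bigger_effect = standardized_difference(difference=1.0, sd=1.0, n=10)
less_spread = standardized_difference(difference=0.5, sd=0.5, n=10)
more_data = standardized_difference(difference=0.5, sd=1.0, n=40)

print(baseline, bigger_effect, less_spread, more_data)
```

Doubling the effect, halving the spread, or quadrupling the sample size each doubles the statistic, which makes a significant result easier to reach.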
U5-m17b-s12
Hypothesis testing
Hypothesis:
A statement which can be proven false
Null hypothesis (Ho):
“There is no difference”
Alternative hypothesis (HA):
“There is a difference…”
In statistical testing, we try to “reject the null
hypothesis”
If the null hypothesis is false, it is likely that our
alternative hypothesis is true
“False” – there is only a small probability that the results
we observed could have occurred by chance
U5-m17b-s13
Common probability levels
Alpha level    Chance       Result                Reject null hypothesis?
P > 0.05                    Not significant       No
P < 0.05       1 in 20      Significant           Yes
P < 0.01       1 in 100     Significant           Yes
P < 0.001      1 in 1000    Highly significant    Yes
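The table can be read as a simple decision rule. The sketch below encodes it; the function name is hypothetical, and treating a p value of exactly 0.05 as not significant is an assumption, since the table lists only P > 0.05 and P < 0.05.

```python
def classify_p_value(p):
    """Return (label, reject_null) following the probability-level table.

    Assumption: p exactly equal to 0.05 is treated as not significant.
    """
    if p < 0.001:
        return "highly significant", True
    if p < 0.05:
        return "significant", True
    return "not significant", False

for p in (0.2, 0.03, 0.004, 0.0002):
    label, reject = classify_p_value(p)
    print(f"p = {p}: {label}, reject null hypothesis: {reject}")
```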
U5-m17b-s14
Types of statistical errors (you could be right,
you could be wrong)
               Accept Ho               Reject Ho
Ho is True     Correct Decision        Type I Error (Alpha)
Ho is False    Type II Error (Beta)    Correct Decision
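Alpha is chosen by the analyst, while beta depends on how large the true difference is. The sketch below illustrates this for a one-tailed Z-test, under the assumption that the true shift is expressed in standard-error units; the function name is hypothetical.

```python
from statistics import NormalDist

def type_ii_error_rate(shift_in_se, alpha=0.05):
    """Beta for a one-tailed Z-test when the true mean lies
    shift_in_se standard errors above the null-hypothesis mean."""
    z_critical = NormalDist().inv_cdf(1.0 - alpha)
    # Ho is (wrongly) accepted whenever the test statistic stays below z_critical.
    return NormalDist().cdf(z_critical - shift_in_se)

for shift in (0.0, 1.0, 2.0, 3.0):
    print(f"true shift = {shift} SE -> beta = {type_ii_error_rate(shift):.3f}")
```

With no true shift, Ho is true and the test accepts it 95% of the time; as the true shift grows, the chance of a Type II error falls.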
U5-m17b-s15
Examples of type I and type II errors
[Figure: normal distribution with regions labeled "Type I Error" and "Type II Error"; x-axis from -2.0 to 2.0]
U5-m17b-s16
Common statistical tests
Question                                                       Test
Does a single observation belong to a population of values?    Z-test
Are two (or more) populations of numbers different?            T-test, F-test (ANOVA)
Is there a relationship between x and y?                       Regression
Is there a trend in the data (special case of the above)?      Regression
U5-m17b-s17
Does a single observation belong to a
population of values: The Z-test
On June 26, 2002, a temperature probe reading at 7 m
depth in Medicine Lake was 20.3 °C. Is this unusually
high for June?
Note: this is a “one-tailed test”: we just want to know if the reading is high.
We’re not asking if it is unusually low or high (2-tailed).
[Figure: histogram "Medicine Lake June 2002 Temp - 7 m"; x-axis: Temperature (18.00 to 21.00), y-axis: # observations (0 to 12)]
U5-m17b-s18
The Z distribution (standard normal distribution)
The Z distribution is a Normal Distribution with special
properties:
Mean = 0, Variance = 1
Z = (observed value – mean) / standard error
Standard error = standard deviation / sqrt(n)
[Figure: The Z distribution]
U5-m17b-s19
Medicine lake example
Calculate the Z-score for the observed data
Compare the Z score with the significant value
for a one tailed test (1.645)
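The critical value of 1.645 is the point on the standard normal curve that leaves 5% of the area in the upper tail. A short sketch using the standard library shows where it comes from, alongside the two-tailed value for comparison; alpha = 0.05 is assumed, as elsewhere in the lecture.

```python
from statistics import NormalDist

alpha = 0.05

# One-tailed: all of alpha sits in the upper tail.
one_tailed_critical = NormalDist().inv_cdf(1.0 - alpha)
# Two-tailed: alpha is split evenly between both tails.
two_tailed_critical = NormalDist().inv_cdf(1.0 - alpha / 2.0)

print(f"one-tailed: {one_tailed_critical:.3f}, two-tailed: {two_tailed_critical:.3f}")
```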
[Figure: histogram "Medicine Lake June 2002 Temp - 7 m"; x-axis: Temperature (18.00 to 21.00), y-axis: # observations (0 to 12)]
U5-m17b-s20
The Deep Math…
Z = (observed value – mean) / standard error
Standard error = standard deviation / sqrt(n)
Z = (20.3 – 19.7) / 0.08 = 6.89
(6.89 was presumably computed from unrounded inputs; the rounded values shown give 7.5)
Since 6.89 > the critical Z value of 1.645,
our deep temperature is significantly higher than the
June average temperature.
Further exploration shows that a storm the previous
day caused the warmer surface waters to mix into the
deeper waters.
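The Medicine Lake calculation can be reproduced in a few lines. This sketch uses the rounded values printed on the slide (20.3, 19.7 and 0.08), so it yields a Z of about 7.5 rather than 6.89; the slide's figure presumably came from unrounded inputs, which is an assumption. The conclusion is the same either way. Variable names are illustrative.

```python
observed = 20.3          # probe reading at 7 m, deg C (from the slide)
june_mean = 19.7         # June mean at 7 m, deg C (rounded, from the slide)
standard_error = 0.08    # rounded, from the slide
critical_z = 1.645       # one-tailed critical value at alpha = 0.05

z = (observed - june_mean) / standard_error
is_unusually_high = z > critical_z

print(f"Z = {z:.2f}, unusually high: {is_unusually_high}")
```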
U5-m17b-s21
Are two populations different: The t-test
Also called Student’s t-test. “Student” was the
pseudonym of a statistician who worked for the
Guinness brewery
Useful for “small” samples (<30)
One of the most basic statistical tests, can be
performed in Excel or any common statistical
package
U5-m17b-s22
Are two populations different: The t-test
Same principle as Z-test – calculate a t value,
and assess the probability of getting that value
U5-m17b-s24
In Excel
Formula:
@ttest(Pop1, Pop2, #Tails, TestType)
Tailed tests: 1 or 2
TestType
1 - paired (if there is a logical pairing of XY data)
2 - equal variance
3 - unequal variance
Test returns exact probability value
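For readers working outside Excel, the unequal-variance option (TestType 3) corresponds to what is usually called Welch's t-test. The sketch below computes the t value and its approximate degrees of freedom with the Python standard library only. Unlike the Excel function it stops short of the probability, because that needs the t distribution, which a statistics package would supply. The function name and both samples are hypothetical.

```python
import math
import statistics

def welch_t(sample_a, sample_b):
    """Return (t, degrees_of_freedom) for an unequal-variance t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error contributed by each sample.
    se2_a = statistics.variance(sample_a) / n_a
    se2_b = statistics.variance(sample_b) / n_b
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df

# Hypothetical temperatures (deg C), invented for illustration only.
surface = [22.1, 22.4, 21.9, 22.6, 22.3]
deep = [19.6, 19.9, 19.5, 19.8, 19.7]

t, df = welch_t(surface, deep)
print(f"t = {t:.2f}, df = {df:.1f}")
```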
U5-m17b-s25
Example: 1-tailed temperature comparison
@ttest(Pop1, Pop2, 1, 3) = 1.5 * 10^-149
U5-m17b-s26
ANOVA: Tests of multiple populations
ANOVA – analysis of variance
Compare 2 or more populations
Surface temperatures for 3 lakes
Can handle single or multiple factors
One way ANOVA – comparing lakes
Two-way ANOVA – compare two factors
Temperature x Light effects on algal populations
Repeated measures ANOVA – compare factors
over time
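A one-way ANOVA boils down to an F value: the variation between group means divided by the variation within groups. The sketch below computes it with the standard library; the function name and the three sets of lake temperatures are hypothetical, and converting F to a probability would require the F distribution from a statistics package.

```python
import statistics

def one_way_anova_f(groups):
    """F statistic for a one-way ANOVA over a list of samples."""
    all_values = [x for group in groups for x in group]
    grand_mean = statistics.mean(all_values)
    k = len(groups)            # number of groups
    n_total = len(all_values)  # total number of observations

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within

# Hypothetical surface temperatures (deg C) for 3 lakes, invented for illustration.
lake_a = [20.1, 20.4, 19.8]
lake_b = [22.0, 22.3, 21.8]
lake_c = [18.9, 19.2, 19.0]

f_value = one_way_anova_f([lake_a, lake_b, lake_c])
print(f"F = {f_value:.1f}")
```

A large F means the lakes differ from one another by more than the scatter within each lake would explain.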
U5-m17b-s27
Next Time: Regression - Finding relationships
among variables
[Figure: "Halsted Surface - August 1999"; x-axis: pH (7.5 to 9.5), y-axis: Dissolved Oxygen (ppm) (0 to 20)]
U5-m17b-s28