Statistical Analysis - Lincoln Park High School
Statistical Analysis
Null vs. Alternative Hypothesis
Null hypothesis: observed differences are due to
chance (no causal relationship)
Ex: If light intensity increases, then the rate of
photosynthesis will not be affected
Alternative hypothesis: states that a causal
relationship exists between the independent variable
and the observed data
Ex: If light intensity increases, then the rate of
photosynthesis will increase
In statistics, the world is null until proven alternative
Mean, Median, Mode, % Difference, & Standard Deviation
A mean is an average of all data points in a set
A median is the middle value in a data set
A mode is the most common value in a data set
Percent difference shows the difference between
the means of the experimental and control groups
% difference = (|experimental – control| / control) x 100
Standard deviation is the average measure of how
much each value differs, or deviates, from the mean
With 2 data sets, you could have the same mean but
very different standard deviations.
A small standard deviation shows more consistency
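The definitions above can be tried out with a short Python sketch. The measurements below are hypothetical, invented only for illustration, and `percent_difference` is an assumed helper name that follows the % difference formula given above.

```python
from statistics import mean, median, mode

# Hypothetical measurements (not from the slides), for illustration only.
control = [16, 20, 20, 21, 23]
experimental = [22, 24, 24, 26, 29]


def percent_difference(experimental_mean, control_mean):
    """% difference = (|experimental - control| / control) x 100."""
    return abs(experimental_mean - control_mean) / control_mean * 100


for name, values in (("control", control), ("experimental", experimental)):
    # mean: average of all points; median: middle value; mode: most common value
    print(name, "mean:", mean(values), "median:", median(values), "mode:", mode(values))

print("% difference:", percent_difference(mean(experimental), mean(control)))
```

Percent difference is measured relative to the control mean, so swapping the two groups changes the result.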
Formula for Standard Deviation
[Figure: the standard deviation formula with its terms labeled: N = total number of values, the mean, and each individual value]
What does this mean?
SD example
Data Set 1: 4,4,4,4,4,6,6,6,6,6,5,5,5,5,5
Data Set 2: 5,5,5,4,4,6,6,3,3,7,7,1,1,9,9
Both sets have an identical mean. Which data set has
a smaller standard deviation?
Set 1 has less spread around the mean, which would
give it a lower standard deviation
Mean and SD
For our data sets:
Set 1: Mean = 5, SD = 0.8
Set 2: Mean = 5, SD = 2.4
What these numbers really mean is that, given a
normal (bell curve) distribution, about 68% of data points
fall within 1 SD of the mean, and about 95% fall within 2
standard deviations
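The means and standard deviations quoted for the two data sets can be checked with a minimal Python sketch. The slides do not say whether the formula divides by N or by N – 1, so the sketch computes both versions; `summarize` is an assumed helper name.

```python
from statistics import mean, pstdev, stdev

# The two data sets from the SD example.
set_1 = [4, 4, 4, 4, 4, 6, 6, 6, 6, 6, 5, 5, 5, 5, 5]
set_2 = [5, 5, 5, 4, 4, 6, 6, 3, 3, 7, 7, 1, 1, 9, 9]


def summarize(values):
    """Return the mean plus both common versions of the standard deviation."""
    return {
        "mean": mean(values),
        "population_sd": pstdev(values),  # divides by N
        "sample_sd": stdev(values),       # divides by N - 1
    }


for name, values in (("Set 1", set_1), ("Set 2", set_2)):
    stats = summarize(values)
    print(name, stats["mean"], round(stats["population_sd"], 1), round(stats["sample_sd"], 1))
```

For these two sets, both versions round to the same one-decimal values, so either reading of the formula agrees with the numbers above.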
Precision of Data- BE CONSISTENT
Which data set is more useful? Why?
Error Bars
When we graph our data,
we can use error bars to
show the SD for each
mean
What is the approximate
standard deviation of mealworms
per tray in the
canopy cover group at
4 m from cover?
Answer: ~1 mealworm per
tray
Chi-square Analysis
χ² = Σ (o – e)² / e
o = observed values
e = expected values
A chi-square analysis tests the significance of
results
Answers the question: were the differences between the
observed and expected values large enough to reject the
null hypothesis (and support the alternative hypothesis)?
Tests the probability of observed differences being
random and NOT due to the independent variable
In the chi-square formula, the expected (e) values are
those that you would expect if the null were true.
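As a rough sketch of how the chi-square statistic is calculated, the Python snippet below sums (o – e)² / e over every category. The function name `chi_square` and the counts are hypothetical, made up for illustration: 100 organisms sorted into four categories, with a null hypothesis that predicts equal numbers in each.

```python
def chi_square(observed, expected):
    """Sum (o - e)^2 / e over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts (not from the slides).
observed = [30, 20, 28, 22]
expected = [25, 25, 25, 25]  # what we would expect if the null were true

statistic = chi_square(observed, expected)
print("chi-square:", round(statistic, 2))
```

The larger the gap between observed and expected counts, the larger the statistic becomes.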
Null vs. Alternative Hypothesis
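A p-value of .05 means that, if the null hypothesis were
true, there would be only a 5% chance of seeing a
difference between observed and expected data at least
this large through random variation alone. By convention,
a p-value of .05 or less is treated as a significant
difference.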
Critical value – predetermined value establishing the
boundary for rejecting or failing to reject the null hypothesis
• Maximum chi-square value that would fail to reject the null
hypothesis (i.e., a chi-square value higher than the critical
value shows support for the alternative hypothesis)
• Critical values will be provided in a chi-square table
• Dependent on degrees of freedom: number of possible
outcomes minus 1 (d = N – 1)
CHI-SQUARE DISTRIBUTION TABLE
Critical values, by degrees of freedom (df) and probability (p-value)

      Accept Null Hypothesis (difference due to chance)               |  Reject Null Hypothesis
df      0.95    0.90    0.80    0.70    0.50    0.30    0.20    0.10  |    0.05    0.01   0.001
1      0.004    0.02    0.06    0.15    0.46    1.07    1.64    2.71  |    3.84    6.64   10.83
2       0.10    0.21    0.45    0.71    1.39    2.41    3.22    4.60  |    5.99    9.21   13.82
3       0.35    0.58    1.01    1.42    2.37    3.66    4.64    6.25  |    7.82   11.34   16.27
4       0.71    1.06    1.65    2.20    3.36    4.88    5.99    7.78  |    9.49   13.28   18.47
5       1.14    1.61    2.34    3.00    4.35    6.06    7.29    9.24  |   11.07   15.09   20.52
6       1.63    2.20    3.07    3.83    5.35    7.23    8.56   10.64  |   12.59   16.81   22.46
7       2.17    2.83    3.82    4.67    6.35    8.38    9.80   12.02  |   14.07   18.48   24.32
8       2.73    3.49    4.59    5.53    7.34    9.52   11.03   13.36  |   15.51   20.09   26.12
9       3.32    4.17    5.38    6.39    8.34   10.66   12.24   14.68  |   16.92   21.67   27.88
10      3.94    4.86    6.18    7.27    9.34   11.78   13.44   15.99  |   18.31   23.21   29.59
Chi-square Analysis
For example, using a p-value of .05 and 3 degrees of
freedom, a chi-square value must be greater than
__________ (the critical value) to reject the null
hypothesis and support the alternative hypothesis.
Put another way, a calculated chi-square value that is
greater than 7.82 corresponds to a p-value below .05: if
the null hypothesis were true, a difference this large
between the observed and expected data would arise by
chance less than 5% of the time, so the difference is
treated as significant.
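The decision rule can be sketched in Python as a lookup against the p = 0.05 column of the chi-square table. The dictionary and function names are assumptions for illustration, and the chi-square values passed in at the end are hypothetical.

```python
# Critical values from the p = 0.05 column of the chi-square table,
# keyed by degrees of freedom.
CRITICAL_VALUES_P05 = {
    1: 3.84, 2: 5.99, 3: 7.82, 4: 9.49, 5: 11.07,
    6: 12.59, 7: 14.07, 8: 15.51, 9: 16.92, 10: 18.31,
}


def rejects_null(chi_square_value, possible_outcomes):
    """True if the chi-square value exceeds the critical value at p = 0.05."""
    degrees_of_freedom = possible_outcomes - 1  # d = N - 1
    return chi_square_value > CRITICAL_VALUES_P05[degrees_of_freedom]


# Hypothetical chi-square values for an experiment with 4 possible outcomes.
print(rejects_null(2.72, possible_outcomes=4))  # below 7.82
print(rejects_null(9.50, possible_outcomes=4))  # above 7.82
```

A value that does not exceed the critical value means we fail to reject the null hypothesis; it does not prove the null is true.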
1.1.5 T-test
A T-test determines whether or not there
is a significant difference between 2
samples
Assume we’re measuring the wingspan of 2
populations of eagles, 1 wild and 1
captive-bred
We want to know if the difference
between the wingspans is significant (as
opposed to being due to chance)
1.1.5 T-test
Captive (cm): 180, 187, 212, 196, 200, 204, 194, 189
Wild (cm): 188, 205, 201, 214, 194, 189, 206, 203
Degrees of Freedom = 8 + 8 – 2 = 14
When we apply the T-test and look up the result in a
T value chart, we obtain only a 66% confidence level
that the difference is significant. Not enough.
We need a confidence level of at least 95%, with a
minimum sample size of 5.
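The eagle example can be reproduced with a minimal Python sketch of a two-sample t-test that pools the variances, which matches the 8 + 8 – 2 = 14 degrees of freedom used above. The function name is an assumption. The critical value 2.145 (two-tailed, 95% confidence, 14 degrees of freedom) comes from a standard t table, not from the slides.

```python
from math import sqrt
from statistics import mean, variance

# Wingspans in cm, from the example above.
captive = [180, 187, 212, 196, 200, 204, 194, 189]
wild = [188, 205, 201, 214, 194, 189, 206, 203]


def pooled_t_statistic(a, b):
    """Return (t, degrees of freedom) for a pooled-variance two-sample t-test."""
    n_a, n_b = len(a), len(b)
    df = n_a + n_b - 2
    # Pool the two sample variances, weighting each by its own degrees of freedom.
    pooled_variance = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / df
    standard_error = sqrt(pooled_variance * (1 / n_a + 1 / n_b))
    return (mean(a) - mean(b)) / standard_error, df


t_value, df = pooled_t_statistic(wild, captive)

# Assumed critical value: standard t table, two-tailed, 95% confidence, df = 14.
CRITICAL_T_95_DF14 = 2.145
significant = abs(t_value) > CRITICAL_T_95_DF14

print("t =", round(t_value, 2), "df =", df, "significant at 95%:", significant)
```

The t value falls well short of the critical value, in line with the conclusion above that the difference does not reach the 95% confidence level.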
1.1.6 Correlation and Causality
A correlation in the data does not, by itself,
imply causation.
Causation requires that one variable causes the
other to occur.
The number of cavities in children shows a strong
positive correlation with their vocabulary level.
We should not assume that well-spoken children
will have dentures by college.
Stats Quiz
1. Define standard deviation as required on the
syllabus. (2)
2. State the usefulness of knowing a standard
deviation. (2)
3. Give the minimum confidence level for results to
be significant in science. (1)
4. If I told you, based on measurements in a
previous class, that blue-haired people are hard of
hearing, how would you respond (regarding the
relationship)? (2)
Stats Quiz Answers
1. Summarizes the spread of values around the
mean; 68% of data lies within 1 SD of the
mean (95% within 2)
2. Comparing samples/data points: a large SD
means widely spread data, a small SD means
consistent data
3. 95% (or greater)
4. Just because there is a correlation does
not imply causation.