TESTING HYPOTHESES
Two ways of arriving at a conclusion
1. Deductive inference: population -> sample
2. Inductive inference: sample -> population
IF YOUR DATA ARE:
1. Continuous data
2. Ratio or interval
3. Approximately normal distribution
4. Equal variance (F-test)
5. Conclusions about population based on sample (inductive): sample -> population
6. Sample size > 10
Imagine the following experiment:
2 groups of crickets
Group 1 – fed a diet with extra supplements
Group 2 – fed a diet with no supplements
Weights
Group 1 (extra supplements)      Group 2 (no supplements)
12.1  13.9  13.0  12.1           9.1   8.9   11.0  10.1
14.9  12.2  12.9  14.9           9.9   9.2   8.0   11.9
13.6  12.0  13.5  13.6           8.6   9.0   8.5   9.6
12.0  15.9  12.4  12.0           10.0  10.9  9.4   8.0
10.9  12.1  11.0  10.9           11.9  7.1   10.0  8.9
Mean = 12.8                      Mean = 9.49
What you’re doing here is comparing two samples that,
because you’ve not violated any of the assumptions we saw
before, should represent populations that look like this:
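[Figure: frequency distributions of weight for the two populations, centred on 9.49 and 12.8. Axes: Weight (x), Frequency (y).]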
Are the means of these populations different??
To answer this question – use a statistical test
A statistical test is just a method of determining
mathematically whether you can say ‘yes’ or ‘no’
to this question at a chosen level of confidence
What test should I use??
IF YOU HAVEN’T VIOLATED ANY OF THE ASSUMPTIONS WE MENTIONED BEFORE……
Number of groups compared?
  2 -> t-test (Are the means of two populations the same?)
      Direction of difference specified?
          Yes -> One-tailed
          No  -> Two-tailed
      Does each data point in one data set (population) have a corresponding one in the other data set?
          Yes -> Paired t-test
          No  -> Unpaired t-test
  Other than 2 -> ANOVA (Are the means of more than two populations the same?)
      Number of factors being tested?
          1  -> Does each data point in one data set (population) have a corresponding one in the other data sets?
                    Yes -> Repeated Measures ANOVA
                    No  -> One way ANOVA
          2  -> Two way ANOVA
          >2 -> Other tests
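The flow chart can be read as a small decision function. The sketch below is only an illustration: the name `choose_test` and its parameters are made up for this example, and the one-tailed versus two-tailed choice is left out because it is a separate decision made after a t-test has been chosen.

```python
def choose_test(n_groups: int, paired: bool, n_factors: int = 1) -> str:
    """Walk the flow chart and return the name of the suggested test.

    Hypothetical helper: it only encodes the branches shown on the chart
    and assumes none of the parametric assumptions have been violated.
    """
    if n_groups == 2:
        # Two groups: t-test branch, split on whether data points correspond.
        return "Paired t-test" if paired else "Unpaired t-test"
    # Any other number of groups goes down the ANOVA branch.
    if n_factors == 1:
        return "Repeated Measures ANOVA" if paired else "One way ANOVA"
    if n_factors == 2:
        return "Two way ANOVA"
    return "Other tests"


# The cricket experiment: two independent groups, one factor (diet).
print(choose_test(n_groups=2, paired=False))
```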
A simple t-test
1. State hypotheses
Ho – there is no difference between the means of the
two populations of crickets (i.e. the extra nutrients had
no effect on weight)
H1 – there is a difference between the means of the two
populations of crickets (i.e. the extra nutrients had an
effect on weight)
2. Calculate a t-value (any stats program does this for
you; the truly masochistic can work it out by hand)
3. Use a probability table for the test you used to
determine the probability that corresponds to the t-value that was calculated.
Data -> Test statistic -> Probability
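For anyone who does want to see step 2 worked through, here is a minimal sketch of an unpaired, equal-variance t-value using only the Python standard library. The two lists are the cricket weights as they appear in this transcript; the function name `pooled_t` is made up for the example, and the critical value 2.024 (two-tailed, 5%, 38 df) is taken from a standard t table rather than computed. The transcribed weights give a second-group mean of 9.50 rather than 9.49, so the t-value comes out at roughly 7.9, slightly off the figure a stats program reports from the original data.

```python
import math
import statistics

# Cricket weights as transcribed from the slide (units not given).
supplemented = [12.1, 13.9, 13.0, 12.1, 14.9, 12.2, 12.9, 14.9, 13.6, 12.0,
                13.5, 13.6, 12.0, 15.9, 12.4, 12.0, 10.9, 12.1, 11.0, 10.9]
unsupplemented = [9.1, 8.9, 11.0, 10.1, 9.9, 9.2, 8.0, 11.9, 8.6, 9.0,
                  8.5, 9.6, 10.0, 10.9, 9.4, 8.0, 11.9, 7.1, 10.0, 8.9]


def pooled_t(a, b):
    """Return (t, df) for an unpaired t-test that assumes equal variances."""
    n_a, n_b = len(a), len(b)
    df = n_a + n_b - 2
    # Pooled variance: the two sample variances weighted by their df.
    pooled_var = ((n_a - 1) * statistics.variance(a)
                  + (n_b - 1) * statistics.variance(b)) / df
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (statistics.mean(a) - statistics.mean(b)) / standard_error
    return t, df


t_value, df = pooled_t(supplemented, unsupplemented)

# Assumption: two-tailed 5% critical value for 38 df, read from a t table.
T_CRITICAL = 2.024

print(f"t = {t_value:.3f} with {df} degrees of freedom")
print("Reject Ho" if abs(t_value) > T_CRITICAL else "Do not reject Ho")
```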
Unpaired t test
Do the means of Nutrient fed and No nutrient differ significantly?
P value
The two-tailed P value is < 0.0001, considered extremely significant.
t = 7.941 with 38 degrees of freedom.
95% confidence interval
Mean difference = -3.307 (Mean of No nutrient minus mean of Nutrient fed)
The 95% confidence interval of the difference: -4.150 to -2.464
Assumption test: Are the standard deviations equal?
The t test assumes that the columns come from populations with equal SDs.
The following calculations test that assumption.
F = 1.192
The P value is 0.7062.
This test suggests that the difference between the two SDs is
not significant.
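The F value in this assumption test is just the ratio of the two sample variances, larger over smaller. A minimal sketch, again using the weights as transcribed and only the standard library; turning F into a P value needs an F table or a stats package and is not attempted here.

```python
import statistics

# Cricket weights as transcribed from the slide.
nutrient_fed = [12.1, 13.9, 13.0, 12.1, 14.9, 12.2, 12.9, 14.9, 13.6, 12.0,
                13.5, 13.6, 12.0, 15.9, 12.4, 12.0, 10.9, 12.1, 11.0, 10.9]
no_nutrient = [9.1, 8.9, 11.0, 10.1, 9.9, 9.2, 8.0, 11.9, 8.6, 9.0,
               8.5, 9.6, 10.0, 10.9, 9.4, 8.0, 11.9, 7.1, 10.0, 8.9]

# Sample variances (n - 1 in the denominator).
var_fed = statistics.variance(nutrient_fed)
var_none = statistics.variance(no_nutrient)

# Put the larger variance on top so that F >= 1.
f_ratio = max(var_fed, var_none) / min(var_fed, var_none)

print(f"F = {f_ratio:.3f}")
```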
Assumption test: Are the data sampled from Gaussian distributions?
The t test assumes that the data are sampled from populations that follow
Gaussian distributions. This assumption is tested using the method
Kolmogorov and Smirnov:
Group           KS      P Value  Passed normality test?
=============== ======  ======== =======================
Nutrient fed    0.1676  >0.10    Yes
No nutrient     0.1279  >0.10    Yes
Interpretation of p < .0001?
This means that, if the two samples really came from the same
population, there would be less than 1 chance in 10,000 of seeing
a difference between their means this large.
In the world of statistics, that is too small a chance to have
happened randomly, and so Ho is rejected and H1
accepted
For all statistical tests that you’ll use, the convention is a
cutoff of 5% (p = .05): when the probability of the observed
difference arising by chance between two samples from the same
population is below .05, Ho is rejected
Nonparametric Statistics
(Nominal Data)
&
Goodness-of-Fit Tests
What happens if you violate any of the assumptions?
Step 1 - Panic
Step 2 - It depends on what assumptions have been violated.
Assumption                Other tests?     Another solution?
1. Continuous data        Yes
2. Ratio/interval         Yes
3. Normal distribution    Yes              Transform the data
4. Equal variance         Yes - Welch’s
5. Sample -> population   Yes
6. N < 10                 Yes              Take more samples
Nonparametric Tests
These tests are used when the assumptions of t-tests and
ANOVA have been violated
They are called “nonparametric” because there is no
estimation of parameters (means, standard deviations or
variances) involved.
Several kinds:
1) Goodness-of-Fit tests - when you calculate an expected value
2) Non-parametric equivalents of parametric tests
Goodness-of-Fit Tests
Use with nominal scale data
e.g. results of genetic crosses
Also, you’re using the population to deduce what the
sample should look like
Classic example - genetic crosses
Do they conform to an “expected” Mendelian ratio?
Back to our little ball creatures - Critterus sphericales
Phenotypes: A_B_, A_bb, aaB_, aabb
Mendelian inheritance
- Predict a 9:3:3:1 ratio
- Sampled 320 animals
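The expected counts come straight from the predicted ratio and the sample size: each class gets its share of the 16 parts. A minimal sketch of that arithmetic (variable names are made up for the example):

```python
phenotypes = ["A_B_", "A_bb", "aaB_", "aabb"]
ratio = [9, 3, 3, 1]   # predicted Mendelian ratio
total = 320            # animals sampled

# Each class is expected to hold its fraction of the total sample.
expected = [total * part / sum(ratio) for part in ratio]

for name, count in zip(phenotypes, expected):
    print(name, count)
```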
                 A_B_   A_bb   aaB_   aabb
Observed (o)     194    53     67     6
Expected (e)     180    60     60     20
o - e            14     -7     7      -14
(o - e)^2        196    49     49     196
(o - e)^2 / e    1.08   .82    .82    9.8
$$\chi^2 = \sum \frac{(o - e)^2}{e} = 1.08 + .82 + .82 + 9.8 = 12.52$$
df = number of classes -1 = 3
χ² = 12.52
Critical value for 3 degrees of freedom at .05 level is 7.82
From the χ² table, the actual probability of χ² = 12.52 and df = 3 is .01 > p > .001
Conclusion: the probability of a deviation from the expected distribution at least this large arising by chance is < .05,
therefore we reject the hypothesis that these animals are from a Mendelian population
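The whole calculation fits in a few lines. The sketch below recomputes χ² from the observed and expected counts and compares it with the tabled critical value of 7.82. The helper `chi_square_p_df3` is made up for this example and uses a closed-form upper-tail probability that holds only for exactly 3 degrees of freedom; for other df, use a table or a stats package.

```python
import math

observed = [194, 53, 67, 6]
expected = [180, 60, 60, 20]

# Sum of (o - e)^2 / e over the four phenotype classes.
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1

CRITICAL_05 = 7.82  # critical value for 3 df at the .05 level, from the table


def chi_square_p_df3(x):
    """Upper-tail probability of chi-square with exactly 3 df (closed form)."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


p_value = chi_square_p_df3(chi_square)

print(f"chi-square = {chi_square:.2f}, df = {df}")
print("Reject the Mendelian ratio" if chi_square > CRITICAL_05 else "Do not reject")
```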
A little χ² wrinkle - the Yates correction
The formula is
$$\chi^2 = \sum \frac{(o - e)^2}{e}$$
except if df = 1 (i.e. you’re using two categories of data)
Then the formula becomes
$$\chi^2 = \sum \frac{(|o - e| - 0.5)^2}{e}$$
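A short sketch of the corrected and uncorrected formulas side by side. The two-category counts are invented purely for illustration (a hypothetical 3:1 cross with 100 offspring) and do not come from the slides; the function name `chi_square_stat` is also made up.

```python
def chi_square_stat(observed, expected, yates=False):
    """Chi-square statistic, with the Yates correction if requested.

    The correction subtracts 0.5 from each |o - e| before squaring and is
    meant for the df = 1 case (two categories).
    """
    shift = 0.5 if yates else 0.0
    return sum((abs(o - e) - shift) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical data: 84 and 16 observed where a 3:1 ratio predicts 75 and 25.
observed = [84, 16]
expected = [75, 25]

uncorrected = chi_square_stat(observed, expected)
corrected = chi_square_stat(observed, expected, yates=True)

print(f"uncorrected = {uncorrected:.2f}, with Yates correction = {corrected:.2f}")
```

The correction always pulls the statistic down a little, which makes the test slightly more conservative.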
A second goodness-of-fit test
G-test or Log-Likelihood Ratio
Use if |o - e | < e
e.g. if o is 12 and e is 7
$$G = 2 \sum o \ln\frac{o}{e} = 4.60517 \sum o \log_{10}\frac{o}{e}$$
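A minimal sketch of the G calculation, showing that the natural-log and log10 forms agree (4.60517 is 2 × ln 10). Applying it to the Critterus counts is a choice made for this example only; the slides do not run a G-test on those data.

```python
import math

# Observed and expected counts from the 9:3:3:1 example.
observed = [194, 53, 67, 6]
expected = [180, 60, 60, 20]

# G using natural logs.
g_ln = 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected))

# Same statistic using base-10 logs; 2 * ln(10) is the 4.60517 constant.
g_log10 = 2 * math.log(10) * sum(o * math.log10(o / e)
                                 for o, e in zip(observed, expected))

print(f"G = {g_ln:.2f}")
```

G is compared against the same χ² table, with the same degrees of freedom, as the χ² statistic.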
Summary!
All of the parametric tests (remember the big flow chart!) have non-parametric
equivalents (or analogues)
Type of data   Number of samples   Are data related?   Test to use
Nominal        2                   Yes                 McNemar
Nominal        2                   No                  Fisher’s Exact
Nominal        >2                  Yes                 Cochran’s Q
Ordinal        1                   No                  Kolmogorov-Smirnov
Ordinal+       2                   Yes                 Wilcoxon (paired t-test analogue)
Ordinal+       2                   No                  Mann-Whitney U (unpaired t-test analogue)
Ordinal+       >2                  No                  Kruskal-Wallis (analogue of one-way ANOVA)
Ordinal        >2                  Yes                 Friedman two-way ANOVA