Transcript Document
The Student’s T-test and other
tests for significance
Faced with an Observation
• Question an Observation (I feel more secure
using my right hand)
• Why This Question? Background Information
(Reaction time can be calculated)
• Hypotheses: H0, H1, H2
• Experimental Design (Test Reaction Time)
• Results and Analysis of Data (Statistics)
• Conclusions
Let’s get started!!
State the Hypotheses
• H0: There is no difference between the reaction time of the
dominant and non-dominant hand (null hypothesis)
• H1: Reaction time in the dominant hand is significantly faster
than reaction time in the non-dominant hand.
• H2: Reaction time in the non-dominant hand is significantly
faster than reaction time in the dominant hand.
Experimental Design and
Analysis of Data
• Collecting reaction time data
– Reaction time t = √(2d/g), where d is the distance a dropped
object falls before it is caught and g is the acceleration due
to gravity (9.8 m/s²)
– How many pieces of data?
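As a quick illustration of this formula, here is a minimal Python sketch (assuming a drop-and-catch setup where d is the distance fallen in metres and g = 9.8 m/s²; the function name and the 0.15 m example are illustrative, not from the slides):

```python
import math

def reaction_time(d_metres, g=9.8):
    """Free-fall time t = sqrt(2d/g) for an object caught after falling d metres."""
    return math.sqrt(2 * d_metres / g)

print(round(reaction_time(0.15), 3))  # an object caught after 15 cm implies roughly 0.175 s
```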
Reaction Time: Right Hand vs Left Hand
Reaction time t = √(2d/g)
Create a data table in Excel for your reaction times: ten repeats on each hand.
[Example Excel table: TIME (sec) recorded for the RIGHT and LEFT hand, e.g. 0.012, 0.014, 0.018, 0.016, 0.018, 0.015, 0.016, 0.018]
How are these two experiments different even though the groups have the same mean?
Experiment I
Grp A: 38, 10, 84, 36, 50, 35, 73, 48
Grp B: 32, 16, 57, 28, 55, 12, 61, 29
avg:   Grp A = 46.75, Grp B = 36.25
stdev: Grp A = 23.21, Grp B = 19.02
t-test p value: 0.339037829
Experiment II
Grp A: 46.75, 46.25, 46.75, 47.25, 46.75, 46.25, 46.75, 47.25
Grp B: 36.25, 36.25, 36.00, 36.25, 36.50, 36.25, 36.00, 36.50
avg:   Grp A = 46.75, Grp B = 36.25
stdev: Grp A = 0.378, Grp B = 0.189
t-test p value: 3.02246E-19
Conclusions?? Significant Difference?
What Statistical Method?
• Is there a difference between the right and
left hand with regard to reaction time?
• Let’s learn how to do the Student’s T test
Statistical Significance
• If the probability of an event occurring due to
chance is less than 5% it is most likely due to
the independent variable. This means there is a
95% chance it is due to your independent
variable. The “p” value gives you this probability
and 5% is defined as alpha and is usually the
chosen “cut off” for significance in science. Let
us look at some graphs to determine which
ones may represent data that is significant and
due to the independent variable.
• The t-test assesses
whether the means
of two groups are
statistically different
from each other.
This analysis is
appropriate
whenever you want
to compare the
means of two
groups.
What are some representative units for the axes? (e.g., # of students vs. grades (male/female);
# of planaria regenerated vs. concentration (anterior/posterior);
# of planaria regenerated vs. time (concentration: 0 / 70 ppb); diameter of zone vs.
number of disks)
• What does it mean to say that the
averages for two groups are
statistically different?
• Notice that the difference between the
means is the same in all three situations.
• But, the three situations don't look
the same -- they tell very different
stories.
• The top example shows a case with
moderate variability of scores within
each group.
• The second situation shows the high
variability case.
• The third shows the case with low
variability.
• The two groups appear most
different or distinct in the bottom or
low-variability case.
• Why? Because there is relatively
little overlap between the two bell-shaped curves.
Significant???????
• The formula for the
t-test is a ratio. The
top part of the ratio
is just the difference
between the two
means or averages.
The bottom part is a
measure of the
variability or
dispersion of the
scores.
• The top part of the formula is easy to compute - just find the difference between the means.
The bottom part is called the standard error of
the difference.
– To compute it, we take the variance for each group
and divide it by the number of samples in that
group. We add these two values and then take their
square root
• The formula is:
SE(difference) = √( var_A/n_A + var_B/n_B )
• Remember, the variance is simply the square
of the standard deviation.
• The final formula for the t-test is:
t = (mean_A − mean_B) / √( var_A/n_A + var_B/n_B )
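To make the ratio concrete, here is a minimal Python sketch of the formula above, applied to the Experiment I data from the earlier slide; it computes only the t statistic itself (no degrees of freedom or p value), and the function name is illustrative:

```python
from statistics import mean, variance  # variance() is the sample variance (stdev squared)

def t_statistic(group_a, group_b):
    """t = (mean_A - mean_B) / sqrt(var_A/n_A + var_B/n_B)."""
    se_difference = (variance(group_a) / len(group_a) + variance(group_b) / len(group_b)) ** 0.5
    return (mean(group_a) - mean(group_b)) / se_difference

grp_a = [38, 10, 84, 36, 50, 35, 73, 48]  # Experiment I, Grp A
grp_b = [32, 16, 57, 28, 55, 12, 61, 29]  # Experiment I, Grp B
print(round(t_statistic(grp_a, grp_b), 2))
```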
Let’s Try
• Cedar-apple rust is a (non-fatal) disease that affects
apple trees. Its symptom is rust-colored spots on
apple leaves. Red cedar trees are the immediate
source of the fungus that infects the apple trees. If you
could remove all red cedar trees within a few miles of
the orchard, you should eliminate the problem. In the
first year of this experiment the number of affected
leaves on 8 trees was counted; the following winter all
red cedar trees within 100 yards of the orchard were
removed and the following year the same trees were
examined for affected leaves. The results are
recorded on the next panel:
Data
Tree | rusted leaves: year 1 | rusted leaves: year 2 | difference: 1-2
1    | 38   | 32   |  6
2    | 10   | 16   | -6
3    | 84   | 57   | 27
4    | 36   | 28   |  8
5    | 50   | 55   | -5
6    | 35   | 12   | 23
7    | 73   | 61   | 12
8    | 48   | 29   | 19
average      | 46.8 | 36.2 | 10.5
standard dev | 23   | 19   | 12
Determine whether there was a significant change in the number of rusted
leaves between years 1 and 2. Did the treatment cure the problem?
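Because the same eight trees are measured in both years, a paired t-test is the natural choice here. The slides use Excel, but as an illustrative alternative, here is a minimal Python sketch with SciPy's paired test, using the data from the table above:

```python
from scipy import stats

year1 = [38, 10, 84, 36, 50, 35, 73, 48]  # rusted leaves per tree, year 1
year2 = [32, 16, 57, 28, 55, 12, 61, 29]  # same trees, year 2 (after cedar removal)

t_stat, p_value = stats.ttest_rel(year1, year2)  # paired (dependent-samples) t-test
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")    # p < 0.05 would indicate a significant change
```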
How are these two experiments different even though the groups have the same mean?
Experiment I
Grp A: 38, 10, 84, 36, 50, 35, 73, 48
Grp B: 32, 16, 57, 28, 55, 12, 61, 29
avg:   Grp A = 46.75, Grp B = 36.25
stdev: Grp A = 23.21, Grp B = 19.02
t-test p value: 0.339037829
Experiment II
Grp A: 46.75, 46.25, 46.75, 47.25, 46.75, 46.25, 46.75, 47.25
Grp B: 36.25, 36.25, 36.00, 36.25, 36.50, 36.25, 36.00, 36.50
avg:   Grp A = 46.75, Grp B = 36.25
stdev: Grp A = 0.378, Grp B = 0.189
t-test p value: 3.02246E-19
Significant?????
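To reproduce those two p values outside Excel, here is a short Python sketch using SciPy's two-sample t-test (equal_var=True corresponds to Excel's TTEST type 2, two tails); the data are copied from the tables above, and the printed values should roughly match the p values shown (≈0.34 for Experiment I, ≈3.0E-19 for Experiment II):

```python
from scipy import stats

# Experiment I: same group means, large spread within each group
exp1_a = [38, 10, 84, 36, 50, 35, 73, 48]
exp1_b = [32, 16, 57, 28, 55, 12, 61, 29]

# Experiment II: same group means, very little spread within each group
exp2_a = [46.75, 46.25, 46.75, 47.25, 46.75, 46.25, 46.75, 47.25]
exp2_b = [36.25, 36.25, 36.00, 36.25, 36.50, 36.25, 36.00, 36.50]

print(stats.ttest_ind(exp1_a, exp1_b, equal_var=True).pvalue)  # not significant (p > 0.05)
print(stats.ttest_ind(exp2_a, exp2_b, equal_var=True).pvalue)  # highly significant (p << 0.05)
```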
• In a scientific study, a theory is proposed, then data are collected and
analyzed. The statistical analysis of the data produces a number; the
result is called statistically significant if the probability of obtaining it
by chance falls below 5%, the chosen significance level. In other words, if a
result is statistically significant, the researcher can be 95% confident that
it did not happen by chance.
• When the significance of an experiment is especially important, such as
testing the safety of a drug meant for humans, the threshold may be set
lower, for example at 3%. In that case a researcher could be 97% sure that a
particular drug is safe for human use. This number can be lowered or raised
to match the importance of the result and the certainty desired.
• Statistical significance is used to reject or accept what is called the
null hypothesis. A hypothesis is a statement of the theory that a
researcher is trying to support. The null hypothesis holds that the
factors a researcher is looking at have no effect on differences in the
data. Statistical significance is usually written, for example, as t = .02,
p < .05. Here, "t" stands for the test statistic and "p < .05" means
that the probability of the result occurring by chance is less than 5%.
These numbers would cause the null hypothesis to be rejected,
supporting the particular theory.
Other Statistical Tests
www.wikipedia.org
68-95-99.7 rule
Dark blue is less than one standard deviation from the
mean. For the normal distribution, this accounts for 68.27 %
of the set; while two standard deviations from the mean
(medium and dark blue) account for 95.45 %; and three
standard deviations (light, medium, and dark blue) account
for 99.73 %.
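As an illustrative check of the rule (not from the slides), one can draw a large normal sample and count how many values fall within 1, 2, and 3 standard deviations of the mean:

```python
import numpy as np

rng = np.random.default_rng(0)
sample = rng.normal(loc=0.0, scale=1.0, size=1_000_000)  # standard normal sample

for k in (1, 2, 3):
    fraction = np.mean(np.abs(sample - sample.mean()) < k * sample.std())
    print(f"within {k} standard deviation(s): {fraction:.4f}")  # ~0.683, ~0.954, ~0.997
```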
Standard Deviation
• The standard deviation of a probability distribution,
random variable, or population or multiset of values is a
measure of the spread of its values (wiki).
• The standard deviation is the most common measure of
statistical dispersion, measuring how widely spread the
values in a data set are. If the data points are close to
the mean, then the standard deviation is small. Conversely, if
many data points are far from the mean, then the
standard deviation is large. If all the data values are
equal, then the standard deviation is zero.
• www.wikipedia.org
Why???
• The standard deviation can also help you
evaluate the worth of all those so-called
"studies" that seem to be released to the
press every day. A large standard deviation
(greater than 10% of the mean) in a study
that claims to show a relationship between
eating Twinkies and better SAT scores, for
example, might tip you off that the study's
claims aren't all that trustworthy.
Coefficient of Variance
• The coefficient of variance (CV) measures the
precision of the person/s during a set of individual
tests (replicates) performed for one specific water
quality parameter. As an example, if a team has just
finished collecting data on 5 replicates of dissolved
oxygen data, the team can use the coefficient of
variance formula to determine how precisely they
performed the data. The higher the precision (the
lower the %), the higher the likelihood that there was
no difference in the way each replicate/ individual
test was performed. In other words, data within a
data set which is collected consistently should
hypothetically have a Coefficient of Variance equal
to zero percent!
The Formula
• Calculating the coefficient of variance:
CV (%) = (s / X) × 100
• s = standard deviation
• X = average
• You can calculate the CV for the 3-5
replicates for a single sampling.
• Distributions with CV < 1 are considered low-variance
(that's good), while those with CV > 1 are considered
high-variance (that's bad).
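A minimal sketch of the calculation above, using hypothetical dissolved-oxygen replicates (the numbers are illustrative, not real class data):

```python
from statistics import mean, stdev

def coefficient_of_variance(replicates):
    """CV (%) = sample standard deviation / average * 100."""
    return stdev(replicates) / mean(replicates) * 100

do_replicates = [8.2, 8.1, 8.3, 8.2, 8.2]  # hypothetical dissolved-oxygen readings (mg/L)
print(round(coefficient_of_variance(do_replicates), 2))  # a small % suggests consistent technique
```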
Percent Error
• The percent error can be determined when the
true value is compared to the observed value
according to the equation below:
% error = ( | your result − accepted value | / accepted value ) × 100%
• Less than 5% error is acceptable.
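The same percent-error calculation as a small Python helper (the 9.6 vs 9.8 example values are made up for illustration):

```python
def percent_error(result, accepted):
    """% error = |result - accepted| / accepted * 100."""
    return abs(result - accepted) / accepted * 100

print(round(percent_error(9.6, 9.8), 1))  # about 2% error, within the 5% cutoff
```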
• Additional Slides
A Tale of Two Tails
• Directional hypotheses are called one-tailed
– We are only interested in deviations at one tail
of the distribution
• Non-directional hypotheses are called two-tailed
– We are usually interested in any significant
deviations from the null hypothesis
How do you decide to use a one- or two-tailed approach?
One Tail or Two? The moderate
approach:
• If there’s a strong, prior, theoretical expectation
that the effect will be in a particular direction
(A>B), then you may use a one-tailed approach.
Otherwise, use a two-tailed test.
• Because only an A>B result is interesting,
concentrate your attention on whether there is
evidence for a difference in that direction.
– e.g., does this new educational reform improve
students’ test scores?
– Does this drug reduce depression?
One tail or two? The more
conservative approach:
• The problem with the moderate approach
is that you probably would actually find it
interesting if the result went the other way,
in many cases.
– If the new educational reform leads to
worse test scores, we’d want to know!
– If the new drug actually
increases symptoms of depression,
we’d want to know!
One tail or two? The more
conservative approach:
• Only use a one-tailed test if you have a
strong hypothesis about the directionality
of the results (A>B) AND it could also be
argued that a result in the “wrong tail”
(A<B) is meaningless, and might as well
be due to chance.
One tail or two: The most
conservative approach
• Always use two-tailed tests!
• Correcting for one- vs. two-tailed tests
– If you think a researcher has run the wrong kind
of test, it’s easy to recalculate the p-value
yourself:
– P (one-tailed) = ½ P (two-tailed)
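That halving rule is easy to script; a minimal sketch (valid only when the observed effect lies in the hypothesized direction):

```python
def one_tailed_p(two_tailed_p):
    """Halve a two-tailed p-value, assuming the effect lies in the predicted direction."""
    return two_tailed_p / 2

print(one_tailed_p(0.08))  # 0.04: significant one-tailed, but not significant two-tailed
```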
Was the result significant?
• There is no true sharp dividing line between
probable and improbable results.
• There’s little difference between p=0.051 and
p=0.049, except that some journals will not
publish results at p=0.051, and some readers
will accept results at p=0.049 but not at
p=0.051.
• In any case it does not tell us if the result is
IMPORTANT!
Decision theory and tradeoffs
between types of errors
• Think of a household smoke detector.
• Sometimes it goes off and there’s no fire (you
burn some toast, or take a shower).
–A false alarm.
–A Type I error.
• Easy to avoid this type of error: take out the
batteries!
• However, this increases the chance of a Type
II error: there’s a fire, but no alarm.
Decision theory and tradeoffs
between types of errors
• Similarly, one could reduce the chances of a
Type II error by making the alarm
hypersensitive to smoke.
• Then the alarm will very likely go off in a fire.
• But you’ll increase your chances of a false
alarm = Type I error. (The alarm is more likely
to go off because someone sneezed.)
• There is typically a tradeoff of this sort between
Type I and Type II errors.
Standard Deviation
In statistics, the standard deviation of
a data set is the square root of its
variance. Standard deviation is a widely
used measure of variability. It may
be thought of as the average difference
of the data points from the average of the
distribution, that is, how far they are
from the average. A low standard
deviation indicates that the data points
tend to be very close to the average,
whereas a high standard deviation
indicates that the data are spread out
over a large range of values. The less
spread in the data points, the more
precise the data. We want a low
standard deviation, which we shall say
is less than 10% of the average. If it is
greater, then our fish are not similar or
our data collection is not precise.
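As a small illustration of that 10% rule of thumb (the replicate numbers here are made up, not class data):

```python
from statistics import mean, stdev

readings = [102, 98, 95, 101, 99, 97]  # hypothetical replicate measurements

avg = mean(readings)
sd = stdev(readings)                   # sample standard deviation
print(f"average = {avg:.1f}, stdev = {sd:.1f}")

# Rule of thumb from the slide: we want the stdev to be under 10% of the average.
print("acceptably precise" if sd < 0.10 * avg else "stdev exceeds 10% of the average")
```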
Student T test
• Add a column in Excel to your data table.
• Title it T-test.
• Click on the next open cell next to the temperature readings at 18 degrees.
• Go up to fx and choose TTEST, hit OK.
• Array 1 is the readings (of 100%) across the row at 20 °C (do not include temp, standard deviation,
or average).
• Array 2 is the readings across the row at 18 °C (do not include temp, standard deviation, or
average).
• Type is 2 and tails is 2.
• Hit OK.
• The p value will appear in the box.
• Grab the small box in the lower right-hand corner of the cell and pull it down to the last row of temp
readings at 6 °C.
• The p values that appear compare the readings in each row to the previous row (e.g., the 8 °C
respiratory rate to the 10 °C rate).
• When p ≤ 0.05, there is a statistically significant difference between the respiratory rate in that
row and the readings in the previous row.
• Is there a temp at which it no longer matters how cold the water is (when p is no longer ≤ 0.05)?
Is this because there is no longer a temperature effect or because of the variance of the
data? Really hard question. Is the standard deviation less than 10% of the mean?
• Indicate where the independent variable is having a statistically significant effect on the fish
respiratory rate by placing a "*" above the bar on the bar graph of class data. Insert a text box
below the graph and type "p ≤ 0.05", which indicates the independent variable is having a
statistically significant effect.
• (The same TTEST comparison can also be run outside Excel; see the sketch below.)
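For anyone working outside Excel, the same comparison can be sketched in Python: scipy.stats.ttest_ind with equal variances mirrors Excel's TTEST(array1, array2, 2, 2), i.e. two tails, type 2. This is an illustrative sketch, not part of the original exercise; the readings are the 20 ºC and 18 ºC rows of the class data table below.

```python
from scipy import stats

readings_20C = [100, 100, 100, 100, 100, 100, 100]  # % control readings at 20 ºC (control row)
readings_18C = [77, 86, 75, 76, 58, 64]              # % control readings at 18 ºC

# Two-tailed, equal-variance t-test: equivalent to Excel's TTEST(array1, array2, 2, 2).
t_stat, p_value = stats.ttest_ind(readings_20C, readings_18C, equal_var=True)
print(f"p = {p_value:.2e}")  # compare against the 1.6172E-05 shown in the table below
```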
Respiratory Rate (% Control) vs Temperature

Temp ºC | % control readings                | Average % control | Standard Deviation | T-test p value (vs previous row)
20      | 100, 100, 100, 100, 100, 100, 100 | 100               | 0                  | (n/a)
18      | 77, 86, 75, 76, 58, 64            | 72.7              | 10.0               | 1.6172E-05
16      | 63, 57, 54, 58, 50, 81, 54        | 59.6              | 10.3               | 0.04092812
14      | 47, 55, 45, 42, 38, 18, 39        | 40.6              | 11.5               | 0.00678194
12      | 37, 38, 33, 27, 27, 16, 29        | 29.6              | 7.5                | 0.05504315
10      | 29, 36, 29, 19, 19, 11, 21        | 23.4              | 8.4                | 0.1731174
8       | 21, 28, 20, 12, 12, 9             | 17.0              | 7.2                | 0.16959127
6       | 29, 13, 8, 4, 4, 5, 7             | 10.0              | 8.9                | 0.15326909