Stats PowerPoint (t-test)

Histograms and Distributions
Question:
Do athletes have faster reflexes than non-athletes?
Experiment:
- First, go out and collect the reaction times of 25 non-athletes.
Histograms and Distributions
Non-athletes: reaction time in milliseconds (ms)
Individual   Reaction Time (ms)
1            230
2            268
3            243
4            233
5            210
6            329
7            314
8            278
9            324
10           311
11           210
12           225
13           295
14           282
15           274
16           270
17           307
18           247
19           298
20           276
21           257
22           233
23           256
24           298
25           300
Calculate the mean…
Mean: 278.5 ms
Histograms and Distributions
Athletes: reaction time in milliseconds (ms)
Individual   Reaction Time (ms)
1            215
2            218
3            223
4            226
5            230
6            231
7            231
8            245
9            251
10           255
11           261
12           265
13           268
14           270
15           275
16           275
17           284
18           287
19           290
20           294
21           294
22           298
23           301
24           307
25           315
Calculate the mean score…
Mean: 264.4 ms
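The mean is the sum of the values divided by how many values there are. A minimal Python sketch using the 25 athlete reaction times listed above (the variable names are illustrative and not from the slides):

```python
# Athlete reaction times in ms, copied from the athletes slide.
athletes = [215, 218, 223, 226, 230, 231, 231, 245, 251, 255, 261, 265, 268,
            270, 275, 275, 284, 287, 290, 294, 294, 298, 301, 307, 315]

# Mean = sum of all values divided by the number of values.
athlete_mean = sum(athletes) / len(athletes)
print(f"mean = {athlete_mean:.1f} ms")  # prints: mean = 264.4 ms
```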
Compare the means:
Group          Mean reaction time (ms)
Athletes       264.4
Non-athletes   278.5
Histograms and Distributions
Non-athletes: reaction time in milliseconds (ms),
arranged from low to high reaction time
Make a histogram to display the data…
Histograms and Distributions
Histogram = a plot of frequency
[Figure: histogram of non-athlete reaction times. x-axis: reaction time (ms), in bins from 200-210 to 330-339; y-axis: frequency, 0 to 4.5. Sample size: 25]
Histograms and Distributions
[Figure: histogram of athlete reaction times. x-axis: reaction time (ms), in bins from 200-210 to 330-339; y-axis: frequency, 0 to 4.5. Sample size: 25]
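Building a histogram comes down to sorting each value into a bin and counting how many land in each one. The sketch below does this for the athlete data. It assumes evenly spaced 10 ms bins starting at 200 ms, which only approximates the bin labels on the slides, and it draws a rough text histogram rather than a chart.

```python
from collections import Counter

# Athlete reaction times in ms, copied from the athletes slide.
athletes = [215, 218, 223, 226, 230, 231, 231, 245, 251, 255, 261, 265, 268,
            270, 275, 275, 284, 287, 290, 294, 294, 298, 301, 307, 315]

BIN_WIDTH = 10  # ms; an assumed bin width, not taken from the slides

# Map each value to the lower edge of its bin (e.g. 215 -> 210) and count.
counts = Counter((t // BIN_WIDTH) * BIN_WIDTH for t in athletes)

# One row per bin; each '#' is one person (the frequency).
for low in range(200, 340, BIN_WIDTH):
    print(f"{low}-{low + BIN_WIDTH - 1} ms | {'#' * counts[low]}")
```

With only 25 values, no bin holds more than 4 people, which is why the small-sample histograms look so ragged.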
Histograms and Distributions
Compare the histograms of non-athletes to athletes:
[Figure: the two histograms side by side. x-axis: reaction time (ms), in bins from 200-210 to 330-339; y-axis: frequency, 0 to 4.5]
MEAN:
Non-athletes   278.5
Athletes       264.4
Histograms and Distributions
Compare the histograms of non-athletes to athletes:
[Figure: both groups plotted on one histogram. x-axis: reaction time (ms); y-axis: number of students (frequency), 0 to 4.5]
MEAN:
Non-athletes   278.5
Athletes       264.4
Q: Is there really a difference between these two groups?
Histograms and Distributions
The student decided to collect more data (a larger sample size),
which is really the only option at this point…
Bin (ms)      Non-athletes   Athletes
200-210       0              3
210-220       1              6
221-230       2              8
231-240       2              12
241-250       1              15
251-260       2              10
261-270       2              8
271-280       6              6
281-290       12             3
291-300       17             3
301-310       15             2
311-320       9              1
321-329       4              0
330-339       0              0
Sample size   73             77
[Figure: both groups plotted on one histogram. x-axis: reaction time (ms); y-axis: number of students (frequency), 0 to 18]
MEAN:
Non-athletes   298
Athletes       251
Histograms and Distributions
Comparison of histograms with small vs. large sample size:
[Figure: small-sample histogram of both groups. x-axis: reaction time (ms), in bins from 200-210 to 330-339; y-axis: number of students (frequency), 0 to 4.5]
MEAN:
Non-athletes   279
Athletes       264
Sample size: 25 in each group (N=50)
[Figure: large-sample histogram of both groups. x-axis: reaction time (ms), in bins from 200-210 to 330-339; y-axis: number of students (frequency), 0 to 18]
MEAN:
Non-athletes   298
Athletes       251
Sample size: 73 in non-athletes, 77 in athletes (N=150)
Histograms and Distributions
Let’s go back to the small sample size data…
[Figure: small-sample histogram of both groups. x-axis: reaction time (ms); y-axis: number of students (frequency), 0 to 4.5]
MEAN:
Non-athletes   278.5
Athletes       264.4
How can we determine if there is a significant difference between these two groups?
Histograms and Distributions
Normal or Gaussian Distribution
Standard deviation (sigma)
First, one needs to determine the standard deviation, which is basically a measure of
the width of the histogram.
For example, the mean of the non-athletes is 278.5 ms. If the standard deviation is
determined to be 30 ms, then, assuming the data are normally distributed, about 68.2%
of the data are expected to fall within one standard deviation of the mean:
278.5 +/- 30 ms (between 248.5 and 308.5 ms).
Would you prefer your standard deviation to be larger or smaller in value?
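The 68.2% rule can be checked against the actual data. The sketch below uses the sorted non-athlete values from the standard deviation slides later in this deck and the sample standard deviation computed from them (about 31.6 ms) instead of the round 30 ms used in the example. The variable names are illustrative.

```python
import statistics

# Non-athlete reaction times in ms (sorted list from the standard deviation slides).
non_athletes = [210, 225, 233, 233, 247, 256, 257, 268, 270, 274, 276, 278, 282,
                286, 287, 295, 298, 298, 300, 305, 307, 311, 314, 324, 329]

mean = statistics.mean(non_athletes)
sd = statistics.stdev(non_athletes)  # sample standard deviation (divides by n - 1)

# Keep the values that lie within one standard deviation of the mean.
low, high = mean - sd, mean + sd
inside = [t for t in non_athletes if low <= t <= high]
fraction = len(inside) / len(non_athletes)

print(f"{len(inside)} of {len(non_athletes)} values ({fraction:.0%}) "
      f"lie between {low:.1f} and {high:.1f} ms")
```

For these 25 values, 17 (68%) fall within one standard deviation of the mean, close to what a normal distribution predicts.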
Histograms and Distributions
How do we determine the standard
deviation (sigma) of the data?
Histograms and Distributions
1. Find the distance between each value and the mean
Non-Athletes
Individual   Reaction Time (ms)   Distance from mean (ms)
1            210                  -68.52
2            225                  -53.52
3            233                  -45.5
4            233                  -45.5
5            247                  -31.5
6            256                  -22.5
7            257                  -21.5
8            268                  -10.5
9            270                  -8.5
10           274                  -4.5
11           276                  -2.5
12           278                  -0.5
13           282                  3.5
14           286                  7.5
15           287                  8.5
16           295                  16.5
17           298                  19.5
18           298                  19.5
19           300                  21.5
20           305                  26.5
21           307                  28.5
22           311                  32.5
23           314                  35.5
24           324                  45.5
25           329                  50.5
Each distance is the value minus the mean: 210-278.5, 225-278.5, 233-278.5, 233-278.5, …
This will tell you how far away each value is from the mean and begin to help you
understand the width of your distribution.
Histograms and Distributions
2. Square all the differences
Non-Athletes
Individual   Reaction Time (ms)   Distance from mean (ms)   Distance squared
1            210                  -68.52                    4694.9904
2            225                  -53.52                    2864.3904
3            233                  -45.5                     2070.25
4            233                  -45.5                     2070.25
5            247                  -31.5                     992.25
6            256                  -22.5                     506.25
7            257                  -21.5                     462.25
8            268                  -10.5                     110.25
9            270                  -8.5                      72.25
10           274                  -4.5                      20.25
11           276                  -2.5                      6.25
12           278                  -0.5                      0.25
13           282                  3.5                       12.25
14           286                  7.5                       56.25
15           287                  8.5                       72.25
16           295                  16.5                      272.25
17           298                  19.5                      380.25
18           298                  19.5                      380.25
19           300                  21.5                      462.25
20           305                  26.5                      702.25
21           307                  28.5                      812.25
22           311                  32.5                      1056.25
23           314                  35.5                      1260.25
24           324                  45.5                      2070.25
25           329                  50.5                      2550.25
Histograms and Distributions
3. Sum all the squares
Non-Athletes
Sum of the 25 squared distances from step 2: 23957.13
Histograms and Distributions
4. Divide the sum by the number of scores minus 1
Non-Athletes
23957.13 / 24 = 998.2 (variance)
Histograms and Distributions
5. Take the square root of the variance
Non-Athletes
Square root of 998.2 = 31.6 (standard deviation)
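The five steps above can be sketched in Python, one line per step. The data are the sorted non-athlete values from these slides. The variable names are illustrative.

```python
import math

# Non-athlete reaction times in ms (sorted list from the slides above).
non_athletes = [210, 225, 233, 233, 247, 256, 257, 268, 270, 274, 276, 278, 282,
                286, 287, 295, 298, 298, 300, 305, 307, 311, 314, 324, 329]

n = len(non_athletes)
mean = sum(non_athletes) / n

# 1. Find the distance between each value and the mean.
differences = [t - mean for t in non_athletes]
# 2. Square all the differences.
squares = [d ** 2 for d in differences]
# 3. Sum all the squares.
sum_of_squares = sum(squares)
# 4. Divide the sum by the number of scores minus 1 (this is the variance).
variance = sum_of_squares / (n - 1)
# 5. Take the square root of the variance (this is the standard deviation).
sd = math.sqrt(variance)

print(f"mean = {mean:.1f}, variance = {variance:.1f}, SD = {sd:.1f}")
```

The variance comes out near 998.0 rather than the 998.2 on the slide, because the slide rounds most of the distances to one decimal place before squaring them. The standard deviation still rounds to 31.6.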
Histograms and Distributions
Standard deviation formula (what we just did):
- the square root of the sum of the squared deviations from the
mean divided by the number of scores minus one
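Written as an equation, the sentence above reads as follows. The symbols are assumptions chosen here, since the formula itself is not in the transcript: x_i is each score, x-bar is the mean, n is the number of scores, and s is the standard deviation (the slides label it σ).

```latex
s = \sqrt{\frac{\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2}{n - 1}}
```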
Histograms and Distributions
Standard deviation formula:
Non-athletes: mean = 278.5 ms, SD (σ) = 31.6 ms
Athletes: mean = 264.4 ms, SD (σ) = 30.6 ms
Are these groups statistically different from each other?
Histograms and Distributions
T-Test
assesses whether the means of two groups are statistically
different from each other
Histograms and Distributions
[Formula slides; only one label survives:] = Standard Error of the difference
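Therefore the t-value is related to how different the means are and how
broad your data is. A t-value far from zero (positive or negative) is what you hope for,
because it means the difference between the means is large compared with the spread of the data.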
Calculate the t-score
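The formula itself is not in this transcript. A common form of the two-sample t formula, assumed here, divides the difference of the two means by the standard error of the difference. The symbols are chosen for illustration: x-bar is a group mean, s² a group variance, and n a group size.

```latex
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}
```

When both groups are the same size, as they are here (25 each), this gives the same value as the pooled-variance version of the formula.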
Histograms and Distributions
t = -1.61
- Degrees of freedom (df) is the number of people in both groups combined, minus 2:
df = 25 + 25 - 2 = 48
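As a check on these numbers, here is a minimal Python sketch that computes the t-score and degrees of freedom from the two data sets. It assumes the standard two-sample formula (difference of means divided by the standard error of the difference). The function name `t_score` is hypothetical, and the non-athlete values are the sorted list from the standard deviation slides.

```python
import math
import statistics

athletes = [215, 218, 223, 226, 230, 231, 231, 245, 251, 255, 261, 265, 268,
            270, 275, 275, 284, 287, 290, 294, 294, 298, 301, 307, 315]
non_athletes = [210, 225, 233, 233, 247, 256, 257, 268, 270, 274, 276, 278, 282,
                286, 287, 295, 298, 298, 300, 305, 307, 311, 314, 324, 329]

def t_score(group_a, group_b):
    """Difference of the means divided by the standard error of the difference."""
    mean_diff = statistics.mean(group_a) - statistics.mean(group_b)
    # statistics.variance is the sample variance (divides by n - 1).
    se_diff = math.sqrt(statistics.variance(group_a) / len(group_a)
                        + statistics.variance(group_b) / len(group_b))
    return mean_diff / se_diff

t = t_score(athletes, non_athletes)
df = len(athletes) + len(non_athletes) - 2
print(f"t = {t:.2f}, df = {df}")  # prints: t = -1.61, df = 48
```

The sign is negative only because the athletes' mean is the smaller of the two. Swapping the order of the groups gives +1.61.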
Histograms and Distributions
The null hypothesis vs the hypothesis
1. The hypothesis:
Athletes will have a quicker reaction time than non-athletes.
2. The null hypothesis:
The null hypothesis always states that there is no
difference between the two groups. Here: there is no
difference in reaction time between athletes and non-athletes.
Histograms and Distributions
The p-value
1. The p-value is a number between 0 and 1.
2. It is the probability (hence the p-value) of getting a
difference at least as large as the one in our data if the null
hypothesis were true, that is, if there were really no
difference between the groups.
3. Therefore, the smaller the p-value, the harder it is to
explain our data by chance alone. (Note that 1 minus the p-value
is not the probability that there is a difference.)
4. In order for the data to support the hypothesis, must the
p-value be high or low?
The p-value should be low (<0.05), which says that if there were no difference between
the two groups, there would be less than a 5% chance of seeing a difference this large
in the data. The data are then unlikely to be explained by chance alone.
Histograms and Distributions
Statistical Significance
When the p-value is less than 0.05, we say that the data is
statistically significant, and there may be a real difference
between the two groups.
Be warned that just because p is less than 0.05 between two groups doesn’t
mean that there is actually a difference. For example, if we find p < 0.05 for the
reaction time experiment, it doesn’t mean that there is a definite difference
between athletes and non-athletes. It only means that there is a difference in our
data, but our data might be flawed or there is not enough data yet (sample size
too small) or we measured the data improperly, or the sampling wasn’t random,
or the experiment was garbage, etc…
Doubt is the greatest tool of any scientist (person).
Histograms and Distributions
How is the p-value determined?
The p-value is found by using a standard t-table in
combination with the t-value and the degrees of freedom
previously determined:
http://bioinfo-out.curie.fr/ittaca/documentation/Images/ttable.gif
http://davidmlane.com/hyperstat/t-table.html
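A t-table lists areas under the tails of the t distribution. As a rough cross-check on a table lookup, the sketch below computes that area directly by numerical integration (Simpson's rule) for t = -1.61 and df = 48. This is a swapped-in technique, not the table method the slides describe. The helper names `t_pdf` and `upper_tail`, the integration limit, and the step count are all assumptions made for illustration.

```python
import math

def t_pdf(x, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)

def upper_tail(t_abs, df, upper=60.0, steps=20000):
    """Area under the t density from t_abs to `upper`, using Simpson's rule.

    `upper` stands in for infinity; the density beyond it is negligible here.
    `steps` must be even.
    """
    h = (upper - t_abs) / steps
    total = t_pdf(t_abs, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(t_abs + i * h, df)
    return total * h / 3

t_value = -1.61  # from the t-score slide
df = 48          # from the degrees-of-freedom slide

one_tailed_p = upper_tail(abs(t_value), df)
two_tailed_p = 2 * one_tailed_p
print(f"one-tailed p = {one_tailed_p:.3f}, two-tailed p = {two_tailed_p:.3f}")
```

Both values come out above 0.05 (the two-tailed value is roughly 0.11), so with only 25 people per group the difference does not reach statistical significance.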
Histograms and Distributions
Now you try it:
1. On Edmodo you will find data collected by Tom and
Ileana regarding one’s ability to estimate the length of a
line or the number of spots on a screen.
2. The questions were accompanied with a survey that
asked for the subject’s grade level, ethnicity, participation
in sports, and honors vs. regents level.
3. They wanted to know if any of these differences would
correlate with their ability to estimate.
How should we analyze this data?
Histograms and Distributions
1. Begin by choosing the grouping (independent) variable, like grade for
example.
Since the t-test can only compare two groups at a time and there are four grades,
we need to test every possible pair (there was apparently only one 9th
grader, so the sample size is too small to look at this grade):
10th vs 11th
10th vs 12th
11th vs 12th
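The list of pairs above can be generated rather than written out by hand, which helps when a variable has more groups. A small sketch using Python's `itertools.combinations`, with the grade labels typed in as plain strings:

```python
from itertools import combinations

# 9th grade is left out because there was only one 9th grader.
grades = ["10th", "11th", "12th"]

# Every way of choosing 2 different grades, ignoring order.
pairs = list(combinations(grades, 2))
for a, b in pairs:
    print(f"{a} vs {b}")
```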
We would also want to know if the mean of each group is significantly different from the
actual value.
Actual value vs 10th
Actual value vs 11th
Actual value vs 12th
This needs to be done twice, once for the line estimation and once for the dots estimation!!
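One possible way to compare a group's mean with the actual value is a one-sample t-test, sketched below. This is an assumption about how the comparison could be done, not necessarily the method intended for the assignment. The function name `one_sample_t` is hypothetical, and the numbers are made up for illustration; they are not the Edmodo data.

```python
import math
import statistics

def one_sample_t(estimates, actual_value):
    """t-score for the difference between a group's mean and a known value.

    Returns (t, degrees of freedom). Here df is n - 1 because only one
    group is involved.
    """
    mean = statistics.mean(estimates)
    standard_error = statistics.stdev(estimates) / math.sqrt(len(estimates))
    return (mean - actual_value) / standard_error, len(estimates) - 1

# Hypothetical line-length estimates in cm (made up, NOT the real survey data).
hypothetical_estimates_cm = [9.0, 11.0, 10.0, 12.0, 8.0]
hypothetical_actual_cm = 9.0

t, df = one_sample_t(hypothetical_estimates_cm, hypothetical_actual_cm)
print(f"t = {t:.2f}, df = {df}")
```

The resulting t-score and degrees of freedom are then looked up in a t-table in the same way as for the two-group comparison.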
Histograms and Distributions
These are the tables you need to fill out:
Grade   Mean   SD   Variance
10th
11th
12th

Groups           Difference of means   Variability of groups   T-score   P-value
10th vs actual
11th vs actual
12th vs actual
10th vs 11th
10th vs 12th
11th vs 12th
Write a conclusion based on your analysis. Remember, just because p < 0.05 it doesn’t
necessarily mean your hypothesis is supported!