m2_p4_variation_description


Thinking about variation
Learning Objectives
By the end of this lecture, you should be able to:
– Discuss with an example why it is important to know the variation when
analyzing a dataset
– Interpret a series of Normal curves relative to each other in terms of
their center and variation
– Compare values from different datasets by comparing their z-scores
Thoughts on variation continued
• Let’s take a moment to think about spread (again)…
Suppose you score 12 out of 15 on a test. Is this a:
– Great score?
– Good score?
– Average score?
– Poor score?
– Terrible score?
• Answer: You can’t tell! I hope you’d agree that you’d at least need the mean in order to interpret how good a score this was.
• Okay then, suppose I tell you that the mean was 11/15. Now answer the same question: with a mean of 11, is 12/15 a great, good, average, poor, or terrible score?
Answer: You STILL can’t tell! While you could say that it is somewhat better than average, you really have no way of knowing whether it is approximately average, good, or great.
Thoughts on variation continued
Suppose I tell you that the mean was 11/15. Is 12/15 a great, good, average, poor, or terrible score?
Discussion: What’s missing from this interpretation is a measure of spread. Suppose I told you that of the 500 students who took this test, the vast majority scored between 10.5 and 11.5. In this case, you’d suspect that a score of 12 was, in fact, quite good, but you couldn’t put a number on it.
KEY POINT: In order to properly interpret any score (of a Normal distribution), we simply cannot ignore the standard deviation!!!
Suppose the standard deviation was 0.5. In this case, a score of 12 is two standard deviations above the mean. This would be a score at about the 98th percentile, which is a great result.
Suppose the standard deviation was 2. In that case, your z-score is +0.5 and you are at about the 69th percentile, which is good, but not fantastic.
In other words, without knowing the spread, you simply do not know the story!
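The two scenarios above can be checked with a short sketch. It assumes Python 3.8+ and uses the standard library's `statistics.NormalDist` in place of a printed normal table; the helper names `z_score` and `percentile` are illustrative, not part of the lecture.

```python
from statistics import NormalDist

def z_score(x, mean, sd):
    """How many standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

def percentile(x, mean, sd):
    """Percent of a Normal(mean, sd) population scoring below x."""
    return 100 * NormalDist(mean, sd).cdf(x)

score, mean = 12, 11
for sd in (0.5, 2):  # the two standard deviations discussed above
    print(f"sd = {sd}: z = {z_score(score, mean, sd):+.1f}, "
          f"percentile = {percentile(score, mean, sd):.1f}")
```

The score and the mean stay fixed; only the standard deviation changes, and the percentile moves from about 97.7 to about 69.1.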
What’s different? What’s the same?
In this group, the means are the same (μ = 15) but the standard deviations are different (σ = 2, 4, and 6).
[Figure: three Normal curves centered at 15 with σ = 2, 4, and 6; horizontal axis from 0 to 30]
In this group, the means are different (μ = 10, 15, and 20) while the standard deviations are the same (σ = 3).
[Figure: three Normal curves with μ = 10, 15, and 20, each with σ = 3; horizontal axis from 0 to 30]
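To put numbers on what the two groups of curves show, here is a minimal sketch, again assuming Python 3.8+ and `statistics.NormalDist`. It reports each curve's peak height and the interval holding its middle 95%; the helpers `peak_height` and `middle_95` are hypothetical names chosen for this illustration.

```python
from statistics import NormalDist

def peak_height(mu, sigma):
    """Height of the Normal curve at its center; it shrinks as sigma grows."""
    return NormalDist(mu, sigma).pdf(mu)

def middle_95(mu, sigma):
    """Interval that holds the middle 95% of a Normal(mu, sigma) population."""
    d = NormalDist(mu, sigma)
    return d.inv_cdf(0.025), d.inv_cdf(0.975)

# Group 1: same center (mu = 15), different spreads
for sigma in (2, 4, 6):
    lo, hi = middle_95(15, sigma)
    print(f"mu=15, sigma={sigma}: peak={peak_height(15, sigma):.3f}, "
          f"middle 95% = [{lo:.1f}, {hi:.1f}]")

# Group 2: same spread (sigma = 3), different centers
for mu in (10, 15, 20):
    lo, hi = middle_95(mu, 3)
    print(f"mu={mu}, sigma=3: peak={peak_height(mu, 3):.3f}, "
          f"middle 95% = [{lo:.1f}, {hi:.1f}]")
```

Tripling σ from 2 to 6 divides the peak height by three and triples the width of the middle-95% interval, while changing μ only slides the curve along the axis without changing its shape.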
Another extremely useful thing about working with normally distributed data is that we can compare apples and oranges! That is, because we can convert any observation into a z-score, we can answer questions that compare seemingly non-comparable distributions.
SAT vs ACT
• Question: Suppose that student A scores 1140 on their SAT, and student B
scores 18.2 on their ACT. You are an admissions counselor and you need to
make a decision based exclusively on their test score. Can you use this
data to decide?
• Answer: If you can convert these numbers to their corresponding z-scores,
then absolutely! To do so, you would, of course, need to know the mean
and standard deviation of the two exams. This information is routinely
provided by the testing services.
• For example, if student A had a z-score of +1, that means they were at about the 84th percentile for the SAT. If student B had a z-score of +1.3, that means they were at about the 90th percentile for the ACT, so student B did relatively better. Even though they took completely different exams, you do have a way of comparing them!
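The comparison above can be sketched in a few lines, assuming Python 3.8+ and `statistics.NormalDist`. The lecture gives only the z-scores (+1 and +1.3), not the exams' means and standard deviations, so the sketch starts from those z-scores; `z_score` shows the conversion you would apply once you have the published mean and standard deviation for each exam (none are assumed here). The helper names are illustrative.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def z_score(x, mean, sd):
    """Convert a raw exam score to a z-score using that exam's mean and SD."""
    return (x - mean) / sd

def percentile_from_z(z):
    """Percent of test takers scoring below a given z-score."""
    return 100 * STANDARD_NORMAL.cdf(z)

# z-scores from the example above
students = {"A (SAT)": 1.0, "B (ACT)": 1.3}
for name, z in students.items():
    print(f"Student {name}: z = {z:+.1f}, "
          f"about the {percentile_from_z(z):.0f}th percentile")

best = max(students, key=students.get)
print("Higher relative standing:", best)
```

Once both scores are on the z scale, the exam they came from no longer matters: the larger z-score is the stronger relative result.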
Example: Gestation time in malnourished mothers
A study was done in which the gestation time of mothers in a poor neighborhood was measured. While there were
free prenatal vitamins available, there was a great deal of misinformation about proper prenatal nutrition. The
gestation time of this group can be seen on the light-blue curve below.
Over the next couple of years, a public health project was implemented at local health-care institutions in which
women were also provided with nutritional counseling and healthier food. The results of a study after the nutritional
program was implemented are summarized on the orange graph below.
Try to interpret the results in your own words…
[Figure: Normal curves of gestation time (days), axis from 180 to 320. Vitamins only (light blue): μ = 250, σ = 20. Vitamins and better food (orange): μ = 266, σ = 15.]
• The mean gestational time improved from about 250 days to 266 days.
• In addition to the mean improving, more mothers had gestation times close to their group’s mean (the peak of the orange curve is higher than the peak of the blue curve).
• There was more consistency in the “better nutrition” group: the spread of the orange distribution is narrower. (You can simply eyeball this, but you can also quantify it with the standard deviation: 15 days vs. 20 days.)
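The "higher peak, narrower spread" reading in the bullets above can be checked numerically. This is a minimal sketch assuming Python 3.8+ and `statistics.NormalDist`; the ±10-day window around each mean is an arbitrary illustrative choice, and `share_near_mean` is a hypothetical helper.

```python
from statistics import NormalDist

groups = {
    "Vitamins only": NormalDist(mu=250, sigma=20),
    "Vitamins and better food": NormalDist(mu=266, sigma=15),
}

def share_near_mean(dist, days):
    """Fraction of the distribution within +/- `days` of its own mean."""
    return dist.cdf(dist.mean + days) - dist.cdf(dist.mean - days)

for name, d in groups.items():
    print(f"{name}: mean = {d.mean}, sd = {d.stdev}, "
          f"peak height = {d.pdf(d.mean):.4f}, "
          f"within 10 days of mean = {share_near_mean(d, 10):.0%}")
```

The smaller standard deviation (15 vs. 20 days) is what makes the orange curve taller: the same total area is packed into a narrower range, so a larger share of mothers falls close to the mean.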
Don’t feel bad if you didn’t automatically ‘get’ all these facts. That’s why we do examples here! Your goal should be to begin making these kinds of interpretations on your own.
Example: Gestation time in malnourished mothers
A commonly accepted (ideal) minimum gestational period is about 240 days. How might we quantify the improvement shown below?
Instead of waiting for me to answer, try to come up with it on your own. That is, STOP and THINK about it for a moment…
Answer: The best way would be to look at the percentage of women who reached the target of
240 days in each group.
[Figure: Normal curves of gestation time (days), axis from 180 to 320. Vitamins only: μ = 250, σ = 20. Vitamins and better food: μ = 266, σ = 15.]
In the group without nutritional counseling (vitamins only), what percent of mothers failed to carry their babies at least 240 days?
Vitamins only: μ = 250, σ = 20, x = 240
$z = \frac{x - \mu}{\sigma} = \frac{240 - 250}{20} = \frac{-10}{20} = -0.5$ (half a standard deviation below the mean)
The area to the left of z = -0.5 is 0.3085.
[Figure: Normal curve of gestation time (days) for vitamins only, axis from 170 to 310 in 20-day steps]
Vitamins only: About 31% of women failed to reach the target length of 240 days.
Nutritional counseling and better food: μ = 266, σ = 15, x = 240
$z = \frac{x - \mu}{\sigma} = \frac{240 - 266}{15} = \frac{-26}{15} \approx -1.73$ (almost 2 standard deviations below the mean)
The area to the left of z = -1.73 is 0.0418.
[Figure: Normal curve of gestation time (days) for the nutritional assistance program, axis from 206 to 311 in 15-day steps]
Nutritional assistance program: Only about 4% of women failed to carry their babies 240 days!
Conclusion: Compared to vitamin supplements alone, vitamins and better food resulted in a much smaller
percentage of women with pregnancy terms below 8 months (4% vs. 31%).
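Both percentages can be reproduced with a short sketch, assuming Python 3.8+ and `statistics.NormalDist` in place of the printed table; `share_below` is an illustrative helper name.

```python
from statistics import NormalDist

TARGET_DAYS = 240  # commonly accepted minimum gestation used in the lecture

groups = [("Vitamins only", 250, 20), ("Vitamins and better food", 266, 15)]

def share_below(x, mean, sd):
    """Proportion of a Normal(mean, sd) population falling below x."""
    return NormalDist(mean, sd).cdf(x)

for name, mean, sd in groups:
    z = (TARGET_DAYS - mean) / sd
    print(f"{name}: z = {z:.2f}, below {TARGET_DAYS} days = "
          f"{share_below(TARGET_DAYS, mean, sd):.1%}")
```

For the better-food group the unrounded z is about -1.733, so the software gives roughly 0.0415 rather than the table's 0.0418 for z = -1.73; both round to about 4%.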
Going in the other direction…
Remember: stats teachers love this!!
We may also want to find the range of observed values that corresponds to a given proportion (area under the curve).
For that, we go backward; that is, we start with the normal table:
• we first find the desired area/proportion in the body of the table,
• we then read the corresponding z-value from the left column and top row.
For an area of 1.25% (0.0125) to the left, the z-value is -2.24.
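The backward table lookup corresponds to the inverse of the cumulative distribution function. A minimal sketch, assuming Python 3.8+ where `statistics.NormalDist.inv_cdf` is available; `z_for_left_area` is an illustrative name.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def z_for_left_area(area):
    """Reverse table lookup: the z-value with `area` of the curve to its left."""
    return STANDARD_NORMAL.inv_cdf(area)

z = z_for_left_area(0.0125)
print(round(z, 2))  # -2.24

# Forward check: the area to the left of that z is back to 0.0125
print(STANDARD_NORMAL.cdf(z))
```

Going forward with `cdf` and backward with `inv_cdf` are inverses of each other, which is exactly the relationship between reading the table from the margins in and from the body out.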
Example:
How long are the longest 75% of pregnancies when mothers in the neighborhood are entered in the
“better food” program?
Answer: This is another case where we start with an area, and need to come back to our ‘x’.
m  266
s  15
upper area  75%
lower area  25%
x?
z value for the
lower 25%
is about - 0.67.
(x  m)
z
 x  m  ( z *s )
s
x  266  (0.67 *15)
x  255.95  256
m=266, s=15,
upper area 75%
upper 75%
206
221
236
?
251
266
281
Gestation time (days)
Conclusion: The longest 75% of pregnancies in this group last about 256 days or longer.
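The same backward calculation, from an area to an x-value, can be sketched as follows, assuming Python 3.8+ and `statistics.NormalDist`; `x_with_left_area` is a hypothetical helper. It also repeats the hand calculation with the table's rounded z = -0.67 for comparison.

```python
from statistics import NormalDist

def x_with_left_area(area, mean, sd):
    """Value x with the given proportion of a Normal(mean, sd) population below it."""
    return NormalDist(mean, sd).inv_cdf(area)

MEAN, SD = 266, 15          # "better food" group
lower_area = 1 - 0.75        # longest 75% lie above the cutoff, so 25% lie below it

cutoff = x_with_left_area(lower_area, MEAN, SD)
z = (cutoff - MEAN) / SD                # back out z, as in the table step
table_cutoff = MEAN + (-0.67) * SD      # x = mu + z*sigma with the table's rounded z

print(f"z = {z:.3f}, cutoff = {cutoff:.2f} days (table version: {table_cutoff:.2f})")
```

The exact z is about -0.674, giving a cutoff near 255.88 days; the table's rounded z gives 255.95. Both round to 256 days.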