Please turn off cell phones, pagers, etc.
The lecture will begin shortly.
There will be a quiz at the end of today’s lecture.
Lecture 34
Today’s lecture will cover material that is not in the
textbook.
1. Review CI for a difference between means
2. Difference in means for paired data
3. Regression to the mean
1. Review CI for difference in means
Last time, we learned how to construct a CI and test for
the difference between two population means.
Population #1: mean μ1, standard deviation σ1
Population #2: mean μ2, standard deviation σ2
These methods assume that we have independent
samples from the two populations.
How to do it
Compute the difference between the sample means:
diff = mean from sample #1 – mean from sample #2
Then compute the standard error of the difference:
SE diff = sqrt( (SE of first mean)^2 + (SE of second mean)^2 )
The 95% confidence interval is
diff plus or minus 2 × SE of diff
If this confidence interval doesn’t cover zero, we can
reject the null hypothesis that the two population means
are equal.
Example
An experiment was conducted to assess the manual
dexterity of male and female factory workers. On average,
men took longer to assemble a product than women, and
their standard deviation was also higher.
                      men     women
Sample size           32      40
Average               30.3    25.3
Standard deviation    12.5    10.0
Can we conclude beyond a reasonable doubt that the
population average time is longer for men than for women?
Solution
                      men     women
Sample size           32      40
Average               30.3    25.3
Standard deviation    12.5    10.0
diff = 30.3 – 25.3 = 5.0
SE for first mean = 12.5 / sqrt( 32 ) = 2.21
SE for second mean = 10.0 / sqrt( 40 ) = 1.58
SE diff = sqrt( 2.21^2 + 1.58^2 ) = 2.72
5.0 – 2 × 2.72 = – 0.44
5.0 + 2 × 2.72 = 10.44
The confidence interval (– 0.44, 10.44) covers zero.
So we do not have enough evidence to conclude that the
population means for men and women are different.
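The arithmetic in this solution can be checked with a short script. This is only a sketch: the function name two_sample_ci and its parameter names are made up for illustration, and the multiplier of 2 is the rule of thumb used in this lecture. Because the script does not round the SE to 2.72 before doubling it, the endpoints come out near –0.43 and 10.43 rather than the –0.44 and 10.44 shown on the slide. The conclusion is the same.

```python
import math

def two_sample_ci(mean1, sd1, n1, mean2, sd2, n2, multiplier=2.0):
    """Approximate 95% CI for (mean1 - mean2), assuming independent samples.

    Hypothetical helper for illustration; multiplier=2 is the lecture's
    rule of thumb for a 95% interval.
    """
    diff = mean1 - mean2
    se1 = sd1 / math.sqrt(n1)              # SE of first mean
    se2 = sd2 / math.sqrt(n2)              # SE of second mean
    se_diff = math.sqrt(se1**2 + se2**2)   # SE of the difference
    margin = multiplier * se_diff
    return diff, se_diff, (diff - margin, diff + margin)

# Men: n = 32, average 30.3, SD 12.5.  Women: n = 40, average 25.3, SD 10.0.
diff, se_diff, (low, high) = two_sample_ci(30.3, 12.5, 32, 25.3, 10.0, 40)
print(f"diff = {diff:.2f}, SE diff = {se_diff:.2f}")
print(f"95% CI: ({low:.2f}, {high:.2f})")
print("covers zero:", low <= 0 <= high)
```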
2. Difference in means for paired data
The method we just learned assumes that the two
samples are independent, i.e. that there are no strong
connections between the subjects in one sample and
the subjects in the other sample.
In some studies, the same subjects are being measured twice.
Or, each subject in sample #1 may have a pairwise
relationship with a subject in sample #2 (spouses, siblings,
twins, etc.)
In that case, we would say that we have paired data.
We will now learn how to form a confidence interval for the
difference between the means from paired samples.
Step 1: Compute the change scores
For each pair, compute the change score.
The change score is simply the difference between the
first and second measurements.
        Measurement:
Pair    first    second    change
1       4.2      5.1       0.9
2       6.6      4.3       –2.3
3       2.7      6.7       4.0
…       …        …         …
n       9.5      12.5      3.0
Step 2: Find the average and SD of the change
scores
Change scores: 0.9, –2.3, 4.0, …, 3.0
Find the average of these
numbers in the usual way
(add them up, divide by n).
Find the SD of these
numbers in the usual way
(use a computer, or use
the three-column method
from Lecture 14).
We now have the average change score and the SD of the
change scores.
Step 3: Find a 95% CI for the population average
change
The SE for the average change is
SE = SD of the change scores / sqrt( n )
The 95% CI is then
average change plus or minus 2 × SE
If this 95% CI does not cover zero, then we can reject the
null hypothesis that there is no average change in the
population.
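Steps 1 through 3 can be sketched in a few lines of Python. Several things here are assumptions, not part of the lecture: the function name paired_ci, the made-up before/after readings, the choice to compute each change as second minus first, and the use of statistics.stdev (which divides by n – 1; the method from Lecture 14 may differ slightly).

```python
import math
import statistics

def paired_ci(first, second, multiplier=2.0):
    """Approximate 95% CI for the population average change (paired data).

    Hypothetical helper: change = second - first (an assumed convention),
    and the SD is the sample SD with an n - 1 divisor.
    """
    if len(first) != len(second):
        raise ValueError("paired data need equally long lists")
    # Step 1: one change score per pair
    changes = [b - a for a, b in zip(first, second)]
    # Step 2: average and SD of the change scores
    avg = statistics.mean(changes)
    sd = statistics.stdev(changes)
    # Step 3: SE of the average change, then the interval
    se = sd / math.sqrt(len(changes))
    margin = multiplier * se
    return avg, sd, (avg - margin, avg + margin)

# Made-up measurements for five subjects (illustration only)
before = [150, 142, 160, 155, 148]
after = [146, 143, 152, 150, 147]

avg, sd, (low, high) = paired_ci(before, after)
print(f"average change = {avg:.2f}, SD = {sd:.2f}")
print(f"95% CI: ({low:.2f}, {high:.2f})")
print("covers zero:", low <= 0 <= high)
```

With only five pairs, the multiplier of 2 is rough; the lecture's rule of thumb is better suited to larger samples.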
Example
For six months, forty hypertensive subjects were given an
herbal supplement that is supposed to help reduce blood
pressure. The average change in systolic BP (after minus
before) was –2.2, with a standard deviation of 3.6. Is this
convincing evidence that the supplement works?
Solution
We have already been given the average change score and
the SD of the change scores.
The SE of the average change is SE = 3.6 / sqrt( 40 ) = 0.57,
and two SEs is 2 × 0.57 = 1.14.
The 95% CI for the population average change is
–2.2 – 1.14 = –3.34 to –2.2 + 1.14 = –1.06.
This interval lies entirely below zero. This provides convincing
evidence that, on average, subjects similar to those in the
study who take the supplement will experience a drop in BP.
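When only the summary numbers are available, as in this example, the same interval can be computed directly from them. The helper name ci_from_summary is hypothetical; the inputs are the figures given in the example.

```python
import math

def ci_from_summary(avg_change, sd_change, n, multiplier=2.0):
    """Approximate 95% CI from the average and SD of the change scores.

    Hypothetical helper; multiplier=2 is the lecture's rule of thumb.
    """
    se = sd_change / math.sqrt(n)
    margin = multiplier * se
    return se, (avg_change - margin, avg_change + margin)

# Average change -2.2, SD 3.6, n = 40 subjects
se, (low, high) = ci_from_summary(-2.2, 3.6, 40)
print(f"SE = {se:.2f}")
print(f"95% CI: ({low:.2f}, {high:.2f})")
print("entirely below zero:", high < 0)
```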
Discussion
In the previous example, we concluded that the average
population change was not zero.
Does this prove that the herbal supplement works?
It does provide convincing evidence that, if the supplement
were given to a large population of subjects similar to those
in the study, on average they would experience a drop in BP.
But this does not prove that the supplement is any more
effective than taking a placebo or doing nothing!
This study had no control group. So there is no firm basis
to claim that the supplement is effective.
In fact, if these subjects were hypertensive to begin with,
then it is highly likely that they would experience a drop in
blood pressure even if they were given no treatment at all!
3. Regression to the mean
“Regression to the mean” is a phenomenon first noticed by
Sir Francis Galton in the 1870s.
Suppose a sample of units is measured twice, and the
measurements are positively (but not perfectly) correlated.
Units that score above average on the first measurement
also tend to score above average on the second
measurement, but their average score is not as high as it
was the first time.
Units that score below average on the first measurement
also tend to score below average on the second
measurement, but their average score is not as low as it
was the first time.
Example
Suppose you take a sample of adult men and identify a group
whose heights are above average. If you examine their sons,
you will find that the sons of these men are also above
average in height, but on average they are not as tall as their
fathers.
Example
Students in Stat 100 who scored very well on Exam 1 also
tended to score well on Exam 2, but their average score on
Exam 2 was lower than their average score on Exam 1.
What’s happening?
Regression to the mean will occur in any test-retest situation.
Whenever you identify units who score high on the first test,
some of them will have high scores just by chance.
If these same units are measured a second time, the laws of
probability suggest that they are unlikely to do as well as
they did the first time.
Regression to the mean does not indicate that, in the long
run, everyone’s scores will converge to the population
average. Random fluctuations are still present in every wave
of measurement.
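A small simulation can make this concrete. The model below is an assumption chosen for illustration, not something from the lecture: each unit has a fixed "true" level, and each test score is that level plus independent random noise. The function name, the numbers 70 and 10, and the top-10% cutoff are all hypothetical.

```python
import random
import statistics

def simulate_test_retest(n_units=10_000, top_fraction=0.10, seed=0):
    """Simulate two noisy measurements of the same units.

    Assumed model: score = true level + independent noise on each test.
    """
    rng = random.Random(seed)
    level = [rng.gauss(70, 10) for _ in range(n_units)]
    test1 = [a + rng.gauss(0, 10) for a in level]
    test2 = [a + rng.gauss(0, 10) for a in level]

    # Identify the units that scored highest on the first test
    cutoff = sorted(test1)[int(n_units * (1 - top_fraction))]
    top = [i for i in range(n_units) if test1[i] >= cutoff]

    return {
        "overall_mean_test1": statistics.mean(test1),
        "top_group_mean_test1": statistics.mean([test1[i] for i in top]),
        "top_group_mean_test2": statistics.mean([test2[i] for i in top]),
    }

result = simulate_test_retest()
for name, value in result.items():
    print(f"{name}: {value:.1f}")
```

Under this model, the top group stays above the overall average on the second test, but its second-test average falls back toward the overall average, even though nothing about the units changed between tests.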
One more example
Suppose that you toss 100 pennies into the air and 50 of
them land “heads”.
You collect those 50 pennies and lecture them: “You pennies
are very naughty. Half of you were supposed to be tails, but
none of you were. Make sure that you do a better job next
time!”
You toss those 50 pennies into the air again, and you find
that half of them fall “heads” and the other half fall “tails.”
Your lecture cured them of their bad behavior!
Of course, your lecture had no effect whatsoever. Those
pennies were not naughty at all. They were just behaving as
pennies always do. It was regression to the mean.
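The penny story can be acted out in code. This is a sketch assuming fair, independent coin tosses; the function names are made up for illustration.

```python
import random

def toss(n, rng):
    """Toss n fair pennies; True means heads."""
    return [rng.random() < 0.5 for _ in range(n)]

def lecture_the_pennies(n_pennies=100, seed=0):
    """Toss the pennies, keep the ones that landed heads, and toss those again."""
    rng = random.Random(seed)
    first_toss = toss(n_pennies, rng)
    n_heads = sum(first_toss)          # the "naughty" pennies
    second_toss = toss(n_heads, rng)   # the lecture changes nothing
    return n_heads, sum(second_toss)

n_heads, heads_again = lecture_the_pennies()
print(f"{n_heads} pennies landed heads on the first toss")
print(f"{heads_again} of those landed heads again")
```

With only 100 pennies the counts will bounce around, but with a large number of pennies about half of the selected group lands heads on the second toss, with or without the lecture.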
Today’s quiz
1. Write your name legibly.