Transcript Slide 1

The Role of Confidence Intervals in Research
Thought Questions
1. A study compared the serum HDL cholesterol levels in people with low-fat diets to people
with diets high in fat intake.
From the study, a 95% confidence interval for the mean HDL cholesterol for the low-fat group
extends from 43.5 to 50.5.
a. Does this mean that 95% of all people with low-fat diets will have HDL cholesterol levels
between 43.5 and 50.5? Explain.
b. A 95% confidence interval for the mean HDL cholesterol for the low-fat group extends from
43.5 to 50.5. A 95% confidence interval for the mean HDL cholesterol for the high-fat group
extends from 54.5 to 61.5.
Based on these results, would you conclude that people with low-fat diets have lower HDL
cholesterol levels, on average, than people with high-fat diets?
2. In Question 1, we compared average HDL cholesterol levels for two diet groups by computing
separate confidence intervals for the two means. Is there a more direct value (and single C.I.) to
examine in order to make the comparison between the two groups?
Making guesses about an individual
•While waiting for a friend outside the library, try to guess whether the next student leaving the
library is overweight.
•To keep things simple, select the next male student, but not an athlete (hence excluding the 300 lb
offensive lineman).
•We'll also imagine that our task is to guess body mass index (BMI), which is weight in kilograms
divided by the square of height in meters.
•In a study of 100 non-athlete male students at your university, the mean BMI was 26.0 and the
standard deviation was 3.9. So if you had to guess the BMI of the next guy leaving the library, your
"best guess" would be ???
•How confident are you that they have a BMI of your "best guess"?
•Your alternative to guessing a single value would be to say something like, "I guess that his BMI is
somewhere between 22 and 30." How confident should you be in this answer?
•If you guessed "between 18 and 34" (within two standard deviations of the mean), about what percentage
of your guesses would be correct?
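The "two standard deviations" guess can be checked with a short calculation. This is a minimal sketch that assumes BMI in this group is roughly bell-shaped (the slides do not state this); the helper names `guess_range` and `normal_coverage` are made up for illustration.

```python
from statistics import NormalDist

mean_bmi = 26.0  # sample mean from the study of 100 students
sd_bmi = 3.9     # sample standard deviation from the same study


def guess_range(mean, sd, k):
    """Range covering k standard deviations on either side of the mean."""
    return (mean - k * sd, mean + k * sd)


def normal_coverage(k):
    """Share of a Normal population lying within k SDs of its mean (assumption: Normal)."""
    z = NormalDist()
    return z.cdf(k) - z.cdf(-k)


low, high = guess_range(mean_bmi, sd_bmi, 2)
print(f"{low:.1f} to {high:.1f}")   # 18.2 to 33.8
print(f"{normal_coverage(2):.3f}")  # 0.954
```

Under that assumption, a guess of "between 18 and 34" would be right for roughly 95% of students.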
Making guesses about the results of a Study
Let's try to do the same with the results of studies. Let’s say your statistics teacher is finalizing his
own analysis of BMI in 100 non-athlete males.
What will he announce as the mean?
•Again, your best guess would be 26.0, but again you know that results of studies can vary. This
time we aren't worried about how BMI varies between different individuals but how the mean
BMI varies between studies.
•So we want to think about the standard error of the sample mean rather than the standard deviation
of the population.
•The standard error of our study is the standard deviation divided by the square root of the
sample size, which gives 0.39.
•About 95% of the sample means from studies like this one will fall within two standard errors of the
true population mean (the mean you would get if you measured the BMI of all non-athlete males at Columbia).
If we guessed that the results of the lecturer's study would be between 25.2 and 26.8 we'd have a
pretty good chance of being right.
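The arithmetic behind the 0.39 and the 25.2 to 26.8 range can be sketched as follows. The function name `standard_error` is an illustrative choice, and the range treats 26.0 as the center of the guess, as the slide does.

```python
import math


def standard_error(sd, n):
    """Standard error of a sample mean: standard deviation over the square root of n."""
    return sd / math.sqrt(n)


sem = standard_error(3.9, 100)                   # SD 3.9, sample of 100 students
low, high = 26.0 - 2 * sem, 26.0 + 2 * sem       # two standard errors either side

print(round(sem, 2))                  # 0.39
print(round(low, 1), round(high, 1))  # 25.2 26.8
```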
Confidence Intervals for Studies
The range we give for results of a study is called a confidence interval.
A confidence interval is useful for interpreting the results of a study.
For example, imagine that we were looking at a study of whether a "mentoring" program
affected SAT scores.
We read that mentoring was associated with an increase in SAT scores by 5 points, with a 95%
confidence interval from 2 to 8 points.
What does the confidence interval tell us?
What would we conclude if the confidence interval went from -2 to 11?
Confidence Intervals for Population Means
Examples of measures (parameters) of interest: What is the mean number of hours Columbia first
year students study each week? What was their mean grade point average in high school?
Sampling Distribution of a sample mean
The sampling distribution of the sample mean is approximately Normal when the sample
size n is large or when the population (the sample is drawn from) is Normal.
The mean of the sampling distribution is equal to the true population mean.
The standard deviation (SD) for the sampling distribution of the sample mean is
population standard deviation / √(sample size)
Proportions have a link between the proportion value and the standard deviation of the sample
proportion. This is not the case with means.
We’ll do the best we can: estimate the population standard deviation with the sample standard
deviation.
SEM = standard error of the sample mean = sample standard deviation/√n
Recall: Conditions for Rule for Sample Means
1. Population of measurements is bell-shaped, and a random sample of any size is
measured.
OR
2. Population of measurements of interest is not bell-shaped, but a large random sample
is measured. Sample of size 40 is considered “large,” but if there are extreme outliers,
better to have a larger sample.
Constructing a Confidence Interval for a Mean
In 95% of all samples, the sample mean will fall within 2 standard errors of the true population
mean.
A 95% confidence interval for a population mean:
sample mean ± 2 (SEM)
SEM = standard error of the sample mean = sample standard deviation/√n
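Starting from raw measurements rather than summary statistics, the same recipe might look like the sketch below. The data values are invented for illustration and the function name `mean_ci` is hypothetical; with only ten values, the interval is trustworthy only if the population is bell-shaped, per the conditions above.

```python
import math
from statistics import mean, stdev


def mean_ci(values, multiplier=2.0):
    """Approximate confidence interval: sample mean ± multiplier × SEM."""
    n = len(values)
    sem = stdev(values) / math.sqrt(n)  # sample SD divided by the square root of n
    center = mean(values)
    return center - multiplier * sem, center + multiplier * sem


# Hypothetical weekly study hours for ten students (made-up numbers).
hours = [12, 15, 9, 20, 14, 11, 17, 13, 16, 10]
low, high = mean_ci(hours)
print(f"{low:.2f} to {high:.2f}")
```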
Example : Comparing Diet and Exercise
Compare weight loss (over 1 year) in men who diet but do not exercise
and vice versa.
Diet Only Group:
sample mean = 7.2 kg
sample standard deviation = 3.7 kg
sample size = n = 42
standard error of the sample mean = 3.7/√42 = 0.571
95% confidence interval for population mean: 7.2 ± 2(0.571) = 7.2 ± 1.1
= 6.1 kg to 8.3 kg
Exercise Only Group
sample mean = 4.0 kg
sample standard deviation = 3.9 kg
sample size = n = 47
standard error of the sample mean = 3.9/√47 = 0.569
95% confidence interval for population mean: 4.0 ± 2(0.569) = 4.0 ± 1.1 = 2.9 kg to 5.1 kg
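Both intervals can be reproduced from the summary statistics alone. A minimal sketch, with `ci_from_summary` as an illustrative helper name:

```python
import math


def ci_from_summary(sample_mean, sample_sd, n, multiplier=2.0):
    """Return (SEM, (low, high)) for sample mean ± multiplier × SEM."""
    sem = sample_sd / math.sqrt(n)
    margin = multiplier * sem
    return sem, (sample_mean - margin, sample_mean + margin)


diet_sem, diet_ci = ci_from_summary(7.2, 3.7, 42)
exercise_sem, exercise_ci = ci_from_summary(4.0, 3.9, 47)

print(round(diet_sem, 3), [round(x, 1) for x in diet_ci])          # 0.571 [6.1, 8.3]
print(round(exercise_sem, 3), [round(x, 1) for x in exercise_ci])  # 0.569 [2.9, 5.1]
```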
Interpretation of your confidence interval
Diet Only Group: 95% Confidence Interval : 6.1 kg to 8.3 kg
sample mean : 7.2 kg
“95% of all men will lose between 6.1 and 8.3 kg on this diet.”
“We are 95% confident that a randomly selected man will lose between 6.1 and 8.3 kg on this
diet”
“If we took many random samples of men, about 95 out of every 100 of them would produce
a confidence interval that contained the true mean weight loss of men on this diet”
“The true mean weight loss of men is 7.2 kg 95% of the time.”
“95% of all samples will have a weight loss between 6.1 and 8.3 kg .”
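The statement about many random samples can be explored with a simulation. The sketch below is not from the slides: it assumes a Normal population whose true mean (7.2 kg) and standard deviation (3.7 kg) are simply borrowed from the diet sample, and `coverage_rate` is a hypothetical helper.

```python
import math
import random
from statistics import mean, stdev


def coverage_rate(true_mean, true_sd, n, trials, seed=0):
    """Fraction of simulated samples whose mean ± 2 SEM interval contains true_mean."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        sem = stdev(sample) / math.sqrt(n)
        center = mean(sample)
        if center - 2 * sem <= true_mean <= center + 2 * sem:
            hits += 1
    return hits / trials


# Assumed population values, chosen only for illustration.
rate = coverage_rate(true_mean=7.2, true_sd=3.7, n=42, trials=2000)
print(rate)  # expected to land near 0.95
```

Each simulated sample gives a different interval; what stays fixed is the true mean, and about 95% of the intervals capture it.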
Project 03 City Data - 2009 School Survey
Parent Engagement Score:
sample mean = 7.1
sample standard deviation = 0.5
sample size = n = 30
standard error of the sample mean = 0.5/√30 = 0.091
95% confidence interval for population mean: 7.1 ± 2(0.091) = 7.1 ± 0.182 = 6.918 to 7.282
True Population Mean: 7.2
Teacher Engagement Score:
sample mean = 7.0
sample standard deviation = 0.8
sample size = n = 30
standard error of the sample mean = 0.8/√30 = 0.146
95% confidence interval for population mean: 7.0 ± 2(0.146) = 7.0 ± 0.292 = 6.708 to 7.292
True Population Mean: 7.1
Confidence Intervals for Difference Between Two Means
In many instances, such as in the diet versus exercise example, we are interested in comparing
the population means under two conditions or for two groups. Construct a single confidence
interval for the difference in the population means for the two groups/conditions.
General form for Confidence Intervals:
sample value ± 2 × (measure of variability)
1. Collect a large sample of observations, independently, under each condition/from each
group. Compute the mean and standard deviation for each sample.
2. Compute the standard error of the mean (SEM) for each sample by dividing the sample
standard deviation by the square root of the sample size.
3. For independent random quantities, variances add. Square the two SEMs and add them
together. Then take the square root. This will give you the standard error of the difference
in two means.
measure of variability = √[(SEM1)² + (SEM2)²]
4. A 95% confidence interval for the difference in the two population means is:
difference in sample means ± 2 × √[(SEM1)² + (SEM2)²]
Example: Comparing Diet and Exercise
Compare weight loss (over 1 year) in men who diet but do not exercise and vice versa.
Diet Only Group:
sample mean = 7.2 kg
sample standard deviation = 3.7 kg
sample size = n = 42
standard error = SEM1 = 3.7/√42 = 0.571
Exercise Only Group
sample mean = 4.0 kg
sample standard deviation = 3.9 kg
sample size = n = 47
standard error = SEM2 = 3.9/√47 = 0.569
Compute standard error of the difference in two means:
measure of variability = √[(0.571)² + (0.569)²] = 0.81
Compute the confidence interval:
[7.2 – 4.0] ± 2(0.81)
= 3.2 ± 1.6
= 1.6 kg to 4.8 kg
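The four steps for a difference in two means can be collected into one function. A minimal sketch, assuming independent samples as the recipe requires; `diff_ci` is an illustrative name.

```python
import math


def diff_ci(mean1, sd1, n1, mean2, sd2, n2, multiplier=2.0):
    """Return (SE of the difference, (low, high)) for mean1 - mean2."""
    sem1 = sd1 / math.sqrt(n1)
    sem2 = sd2 / math.sqrt(n2)
    se_diff = math.sqrt(sem1 ** 2 + sem2 ** 2)  # variances add for independent samples
    diff = mean1 - mean2
    return se_diff, (diff - multiplier * se_diff, diff + multiplier * se_diff)


# Diet-only group versus exercise-only group.
se_diff, (low, high) = diff_ci(7.2, 3.7, 42, 4.0, 3.9, 47)
print(round(se_diff, 2))              # 0.81
print(round(low, 1), round(high, 1))  # 1.6 4.8
```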
Project 03 City Data - 2009 School Survey
Parent Engagement Score:
sample mean = 7.1
sample standard deviation = 0.5
sample size = n = 30
standard error of the sample mean = 0.5/ 30= 0.091
Teacher Engagement Score:
sample mean = 7.0
sample standard deviation = 0.8
sample size = n = 30
standard error of the sample mean = 0.8/ 30= 0.146
Compute standard error of the difference in two means (Parent minus Teacher’s Mean Score):
measure of variability = √[(0.091)² + (0.146)²] = 0.172
Compute the confidence interval:
[7.1 – 7.0] ± 2(0.172)
= 0.1 ± 0.344
= –0.244 to 0.444
Actual Difference between Population Means: 7.2 – 7.1 = 0.1
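For the Parent minus Teacher comparison, a short check can confirm the interval and test which values it contains. This is a sketch; the helper `contains` is an illustrative name, and the "true" difference is the 7.2 – 7.1 reported on the slide.

```python
import math

# Summary statistics from the survey samples (n = 30 each).
parent_mean, parent_sd, teacher_mean, teacher_sd, n = 7.1, 0.5, 7.0, 0.8, 30

se_diff = math.sqrt(parent_sd ** 2 / n + teacher_sd ** 2 / n)  # SEM1² + SEM2², then root
diff = parent_mean - teacher_mean
interval = (diff - 2 * se_diff, diff + 2 * se_diff)


def contains(ci, value):
    """True when value lies inside the closed interval ci = (low, high)."""
    return ci[0] <= value <= ci[1]


print(round(se_diff, 3))                 # 0.172
print([round(x, 3) for x in interval])   # [-0.244, 0.444]
print(contains(interval, 0.0))           # True: zero difference is plausible
print(contains(interval, 7.2 - 7.1))     # True: the interval captures the true difference
```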
How Journals Present Confidence Intervals
Study of the relationship between smoking during pregnancy and subsequent IQ of child.
Journal article (Olds, Henderson, and Tatelbaum, 1994) provided 95% confidence intervals, most
comparing the means for mothers who didn’t smoke and mothers who smoked ten or more
cigarettes per day, hereafter called “smokers.”
After control for confounding background variables (diet, education, age, drug use, parents’ IQ,
quality of parental care and duration of breast feeding), the average difference observed at 12 and
24 months was 2.59 points (95% CI: –3.03, 8.20); the difference observed at 36 and 48 months was
reduced to 4.35 points (95% CI: 0.02, 8.68).
Reporting Standard Errors of the Mean
Comparison of serum DHEA-S levels for practitioners and non-practitioners of transcendental
meditation.
Serum DHEA-S Concentrations (± SEM)
difference in sample means ± 2 × √[(SEM1)² + (SEM2)²]
[117 – 88] ± 2 × √[(12)² + (11)²]
29 ± 2(16.3)
29 ± 32.6
–3.6 to 61.6
How do we interpret this interval?
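When a journal reports means with their SEMs, the interval for the difference follows directly, with no need for the sample sizes. A minimal sketch using the numbers from the calculation above; `diff_ci_from_sems` is a hypothetical helper name.

```python
import math


def diff_ci_from_sems(mean1, sem1, mean2, sem2, multiplier=2.0):
    """Interval for mean1 - mean2 when each mean is reported with its own SEM."""
    se_diff = math.hypot(sem1, sem2)  # square root of sem1² + sem2²
    diff = mean1 - mean2
    return se_diff, (diff - multiplier * se_diff, diff + multiplier * se_diff)


se_diff, (low, high) = diff_ci_from_sems(117, 12, 88, 11)
print(round(se_diff, 1))              # 16.3
print(round(low, 1), round(high, 1))  # -3.6 61.6
print(low <= 0 <= high)               # True: the interval covers zero
```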
Understanding the Confidence Level
For a confidence level of 95%, we expect that about 95% of all such intervals will actually
cover the true population value.
The remaining 5% will not. Confidence is in the procedure over the long run.
• 90% confidence level => multiplier = 1.645
• 95% confidence level => multiplier = 2 (to be exact it is 1.96)
• 99% confidence level => multiplier = 2.576
• More confidence => wider interval
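The multipliers listed above come from the standard Normal distribution and can be recovered with Python's standard library. A minimal sketch; the function name `multiplier` is an illustrative choice.

```python
from statistics import NormalDist


def multiplier(confidence):
    """z-value leaving (1 - confidence)/2 in each tail of the standard Normal curve."""
    tail = (1 - confidence) / 2
    return NormalDist().inv_cdf(1 - tail)


for level in (0.90, 0.95, 0.99):
    print(level, round(multiplier(level), 3))
# 0.9 1.645
# 0.95 1.96
# 0.99 2.576
```

Because the multiplier grows with the confidence level, the interval widens for the same data.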
Text Questions
6. Suppose you were given a 95% confidence interval for the difference in two population means.
What could you conclude about the population means if
a. The confidence interval did not cover zero
b. The confidence interval did cover zero
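One way to think about this question is as a simple decision rule on the interval's endpoints. The sketch below is an illustration and not a substitute for the reasoning asked for; the function name and the wording of its return values are assumptions.

```python
def interpret_difference_ci(low, high):
    """Describe what an interval for (mean 1 - mean 2) suggests about the two means."""
    if low > 0:
        return "interval above zero: population mean 1 appears larger"
    if high < 0:
        return "interval below zero: population mean 2 appears larger"
    return "interval covers zero: no clear difference between the population means"


print(interpret_difference_ci(1.6, 4.8))       # diet versus exercise interval
print(interpret_difference_ci(-0.244, 0.444))  # parent versus teacher interval
```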