Chapter 12: More About Confidence Intervals
Copyright ©2006 Brooks/Cole, a division of Thomson Learning, Inc.
Recall:
• A parameter is a population characteristic – value
is usually unknown. We estimate the parameter using
sample information.
• A statistic, or estimate, is a characteristic of a sample.
A statistic estimates a parameter.
• A confidence interval is an interval of values
computed from sample data that is likely to include
the true population value.
• The confidence level for an interval describes our
confidence in the procedure we used. We are confident
that most of the confidence intervals we compute using
a procedure will contain the true population value.
12.1 Examples of Different
Estimation Situations
Situation 1. Estimating the proportion falling into a
category of a categorical variable.
Example research questions:
What proportion of American adults believe there is
extraterrestrial life? In what proportion of British
marriages is the wife taller than her husband?
Population parameter: p = proportion in the population
falling into that category.
Sample estimate: p̂ = proportion in the sample falling
into that category.
More Estimation Situations
Situation 2. Estimating the mean of a quantitative variable.
Example research questions:
What is the mean time that college students watch TV
per day? What is the mean pulse rate of women?
Population parameter: μ (spelled “mu” and pronounced
“mew”) = population mean for the variable
Sample estimate: x̄ = the sample mean for the variable
More Estimation Situations
Situation 3. Estimating the difference between two
populations with regard to the proportion falling
into a category of a qualitative variable.
Example research questions:
How much difference is there between the proportions that
would quit smoking if taking the antidepressant bupropion
(Zyban) versus if wearing a nicotine patch?
How much difference is there between men who snore
and men who don’t snore with regard to the proportion
who have heart disease?
Population parameter: p1 – p2 = difference between the
two population proportions.
Sample estimate: p̂1 – p̂2 = difference between the two
sample proportions.
More Estimation Situations
Situation 4. Estimating the difference between two
populations with regard to the mean
of a quantitative variable.
Example research questions:
How much difference is there in average weight loss for
those who diet compared to those who exercise to lose
weight? How much difference is there between the mean
foot lengths of men and women?
Population parameter: μ1 – μ2 = difference between the
two population means.
Sample estimate: x̄1 – x̄2 = difference between the two
sample means.
Independent Samples
Two samples are called independent samples
when the measurements in one sample are not
related to the measurements in the other sample.
• Random samples taken separately from two
populations and same response variable is recorded.
• One random sample taken and a variable recorded,
but units are categorized to form two populations.
• Participants randomly assigned to one of two
treatment conditions, and same response variable
is recorded.
Paired Data: A Special Case of One Mean
Paired data (or paired samples): when pairs of variables
are collected. Only interested in population (and sample)
of differences, and not in the original data.
• Each person measured twice. Two measurements of same
characteristic or trait are made under different conditions.
• Similar individuals are paired prior to an experiment. Each
member of a pair receives a different treatment.
Same response variable is measured for all individuals.
• Two different variables are measured for each individual.
Interested in amount of difference between two variables.
12.2 Standard Errors
Rough Definition: The standard error of a
sample statistic measures, roughly, the average
difference between the statistic and the population
parameter. This “average difference” is over all
possible random samples of a given size that can
be taken from the population.
Technical Definition: The standard error of a
sample statistic is the estimated standard deviation
of the sampling distribution for the statistic.
Standard Error of a Sample Proportion
$\text{s.e.}(\hat{p}) = \sqrt{\dfrac{\hat{p}(1 - \hat{p})}{n}}$, where $\hat{p}$ = sample proportion
Example 12.1 Intelligent Life on Other Planets
Poll: Random sample of 935 Americans
Do you think there is intelligent life on other planets?
Results: 60% of the sample said “yes”, p̂ = .60
$\text{s.e.}(\hat{p}) = \sqrt{\dfrac{.6(1 - .6)}{935}} = .016$
The standard error of .016 is roughly the average difference
between the statistic, p̂ , and the population parameter, p, for
all possible random samples of n = 935 from this population.
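The calculation in Example 12.1 can be checked with a few lines of Python. This is a minimal sketch; the function name `se_proportion` is illustrative and does not come from the slides.

```python
import math

def se_proportion(p_hat: float, n: int) -> float:
    """Standard error of a sample proportion: sqrt(p_hat * (1 - p_hat) / n)."""
    return math.sqrt(p_hat * (1 - p_hat) / n)

# Example 12.1: 60% of a random sample of 935 said "yes".
se_life = se_proportion(0.60, 935)
print(round(se_life, 3))  # 0.016
```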
Standard Error of a Sample Mean
$\text{s.e.}(\bar{x}) = \dfrac{s}{\sqrt{n}}$, where $s$ = sample standard deviation
Example 12.2 Mean Hours Watching TV
Poll: Class of 175 students. In a typical day, about
how much time do you spend watching television?
Variable    N   Mean  Median  TrMean  StDev  SE Mean
TV        175   2.09   2.000   1.950  1.644    0.124
$\text{s.e.}(\bar{x}) = \dfrac{s}{\sqrt{n}} = \dfrac{1.644}{\sqrt{175}} = .124$
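As a quick check of the "SE Mean" column in the output above, the same value can be recomputed from s and n. The helper name `se_mean` is an illustrative choice, not part of the slides.

```python
import math

def se_mean(s: float, n: int) -> float:
    """Standard error of a sample mean: s / sqrt(n)."""
    return s / math.sqrt(n)

# Example 12.2: s = 1.644 hours, n = 175 students.
se_tv = se_mean(1.644, 175)
print(round(se_tv, 3))  # 0.124
```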
Standard Error of the Difference
Between Two Sample Proportions
$\text{s.e.}(\hat{p}_1 - \hat{p}_2) = \sqrt{\dfrac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$
Example 12.3 Patches vs Antidepressant (Zyban)?
Study: n1 = n2 = 244 randomly assigned to each treatment
Zyban: 85 of the 244 Zyban users quit smoking p̂1 = .348
Patch: 52 of the 244 patch users quit smoking p̂2 = .213
So, $\hat{p}_1 - \hat{p}_2 = .348 - .213 = .135$
and $\text{s.e.}(\hat{p}_1 - \hat{p}_2) = \sqrt{\dfrac{.348(1 - .348)}{244} + \dfrac{.213(1 - .213)}{244}} = .040$
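The Example 12.3 numbers can be reproduced from the raw counts. This is a sketch under the slide's formula; the function and variable names are illustrative.

```python
import math

def se_diff_proportions(p1_hat, n1, p2_hat, n2):
    """Standard error of the difference between two independent sample proportions."""
    return math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)

# Example 12.3: 85 of 244 Zyban users and 52 of 244 patch users quit smoking.
p1_hat = 85 / 244
p2_hat = 52 / 244
diff = p1_hat - p2_hat
se_diff = se_diff_proportions(p1_hat, 244, p2_hat, 244)
print(round(diff, 3), round(se_diff, 3))  # 0.135 0.04
```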
Standard Error of the Difference
Between Two Sample Means
$\text{s.e.}(\bar{x}_1 - \bar{x}_2) = \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$
Example 12.4 Lose More Weight by Diet or Exercise?
Study: n1 = 42 men on diet, n2 = 47 men on exercise routine
Diet: Lost an average of 7.2 kg with std dev of 3.7 kg
Exercise: Lost an average of 4.0 kg with std dev of 3.9 kg
So, $\bar{x}_1 - \bar{x}_2 = 7.2 - 4.0 = 3.2$ kg
and $\text{s.e.}(\bar{x}_1 - \bar{x}_2) = \sqrt{\dfrac{3.7^2}{42} + \dfrac{3.9^2}{47}} = 0.81$
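The unpooled standard error in Example 12.4 can be verified the same way. The helper name `se_diff_means` is illustrative.

```python
import math

def se_diff_means(s1, n1, s2, n2):
    """Unpooled standard error of the difference between two independent sample means."""
    return math.sqrt(s1**2 / n1 + s2**2 / n2)

# Example 12.4: diet group (s = 3.7 kg, n = 42), exercise group (s = 3.9 kg, n = 47).
se_weight = se_diff_means(3.7, 42, 3.9, 47)
print(round(se_weight, 2))  # 0.81
```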
12.3 Approximate 95% CI
For sufficiently large samples, the interval
Sample estimate ± 2 × Standard error
is an approximate 95% confidence interval
for a population parameter.
Note: The 95% confidence level describes how often
the procedure provides an interval that includes the
population value. For about 95% of all random
samples of a specific size from a population, the
confidence interval captures the population parameter.
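The statement that about 95% of intervals capture the parameter can be illustrated by simulation. The sketch below assumes a hypothetical population in which the true proportion is p = .60 (borrowing n = 935 from Example 12.1), draws many random samples, and counts how often the interval "estimate ± 2 × standard error" contains p. The seed and number of repetitions are arbitrary choices.

```python
import math
import random

def coverage_rate(p=0.60, n=935, repetitions=1000, seed=0):
    """Fraction of simulated samples whose approximate 95% CI contains p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(repetitions):
        # Draw one random sample of size n and compute the sample proportion.
        successes = sum(rng.random() < p for _ in range(n))
        p_hat = successes / n
        se = math.sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - 2 * se <= p <= p_hat + 2 * se:
            hits += 1
    return hits / repetitions

rate = coverage_rate()
print(rate)  # expected to land near 0.95
```

Any single interval either captures p or it does not; the 95% describes the long-run behavior of the procedure.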
Necessary Conditions
Sample Size Requirements:
• For one proportion: Both np̂ and n(1 – p̂) are
at least 5, preferably at least 10.
• For one mean: n is greater than 30.
• For two proportions: np̂ and n(1 – p̂) are
at least 5 (preferably 10) for each sample.
• For two means: n1 and n2 are each greater than 30.
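The sample size requirements above are simple to encode as checks. This is a sketch only; the function names and the `minimum` parameter are illustrative.

```python
def proportion_sample_large_enough(n, p_hat, minimum=5):
    """True when both n*p_hat and n*(1 - p_hat) reach the minimum count."""
    return n * p_hat >= minimum and n * (1 - p_hat) >= minimum

def mean_sample_large_enough(n):
    """True when n is greater than 30, the rule of thumb stated for means."""
    return n > 30

# Example 12.1: n = 935 and p_hat = .60 give counts of about 561 and 374.
print(proportion_sample_large_enough(935, 0.60, minimum=10))  # True
# Example 12.4: n1 = 42 and n2 = 47 are each greater than 30.
print(mean_sample_large_enough(42) and mean_sample_large_enough(47))  # True
```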
Necessary Conditions
Other Requirements:
• The samples are randomly selected.
In practice, it is sufficient to assume that
samples are representative of the population
for the question of interest.
• For the confidence intervals for the difference
between two proportions or two means, the two
samples must be independent of each other.
Example 12.1 Intelligent Life? (cont)
Poll: Random sample of 935 Americans
Do you think there is intelligent life on other planets?
Results: 60% of the sample said “yes”, p̂ = .60
$\text{s.e.}(\hat{p}) = \sqrt{\dfrac{.6(1 - .6)}{935}} = .016$
Approximate 95% Confidence Interval:
.60 ± 2(.016) => .60 ± .032 => .568 to .632
Note: For about 95% of all random samples from the
population, the corresponding confidence interval captures
the population parameter. We don’t know whether this particular
interval does or does not capture the population value.
Example 12.2 Watching TV (cont)
Poll: Class of 175 students. In a typical day, about
how much time do you spend watching television?
The sample mean was 2.09 hours and the sample standard
deviation was 1.644 hours.
$\text{s.e.}(\bar{x}) = \dfrac{s}{\sqrt{n}} = \dfrac{1.644}{\sqrt{175}} = .124$
Approximate 95% Confidence Interval:
2.09 ± 2(.124) => 2.09 ± .248 => 1.842 to 2.338 hours
Note: We are 95% confident that the mean time that
Penn State students spend watching television per day
is somewhere between 1.842 and 2.338 hours.
Example 12.3
Patch vs Antidepressant (cont)
Study: n1 = n2 = 244 randomly assigned to each group
Zyban: 85 of the 244 Zyban users quit smoking p̂1 = .348
Patch: 52 of the 244 patch users quit smoking p̂2 = .213
So, $\hat{p}_1 - \hat{p}_2 = .348 - .213 = .135$ and $\text{s.e.}(\hat{p}_1 - \hat{p}_2) = .040$
Approximate 95% Confidence Interval:
.135 ± 2(.040) => .135 ± .080 => .055 to .215
Note: Zyban had a higher success rate and the interval
does not include the value 0, so it supports a difference
between the success rates of the two methods.
Example 12.4 Diet vs Exercise (cont)
Study: n1 = 42 men on diet, n2 = 47 men on exercise routine
Diet: Lost an average of 7.2 kg with std dev of 3.7 kg
Exercise: Lost an average of 4.0 kg with std dev of 3.9 kg
So, $\bar{x}_1 - \bar{x}_2 = 7.2 - 4.0 = 3.2$ kg and $\text{s.e.}(\bar{x}_1 - \bar{x}_2) = 0.81$ kg
Approximate 95% Confidence Interval:
3.2 ± 2(.81) => 3.2 ± 1.62 => 1.58 to 4.82 kg
Note: We are 95% confident that the interval 1.58 to 4.82 kg
covers the amount by which the population mean weight loss for
dieters exceeds that for exercisers. The interval does not cover
0, so a real difference is likely to hold for the population.
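Examples 12.1 through 12.4 all apply the same recipe, "estimate ± 2 × standard error". The sketch below reruns the four intervals from the rounded estimates and standard errors shown on the slides; the function name and the dictionary labels are illustrative.

```python
def approx_95_ci(estimate, standard_error):
    """Approximate 95% CI: estimate plus or minus 2 standard errors."""
    margin = 2 * standard_error
    return estimate - margin, estimate + margin

# (estimate, standard error) pairs as rounded on the slides.
examples = {
    "12.1 proportion (intelligent life)": (0.60, 0.016),
    "12.2 mean (TV hours)": (2.09, 0.124),
    "12.3 difference in proportions (Zyban vs patch)": (0.135, 0.040),
    "12.4 difference in means (diet vs exercise, kg)": (3.2, 0.81),
}

for label, (estimate, se) in examples.items():
    low, high = approx_95_ci(estimate, se)
    print(f"{label}: {low:.3f} to {high:.3f}")
```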
12.4 General CI for
One Mean or Paired Data
A Confidence Interval for a Population Mean
$\bar{x} \pm t^* \times \text{s.e.}(\bar{x}) = \bar{x} \pm t^* \times \dfrac{s}{\sqrt{n}}$
where the multiplier t* is the value in a t-distribution
with degrees of freedom = df = n - 1 such that the area
between -t* and t* equals the desired confidence level.
(Found from Table A.2.)
Conditions:
• Population of measurements is bell-shaped and
a random sample of any size is measured; OR
• Population of measurements is not bell-shaped,
but a large random sample is measured, n ≥ 30.
Example 12.5 Mean Forearm Length
Data: Forearm lengths (cm)
for a random sample of n = 9 men
25.5, 24.0, 26.5, 25.5, 28.0, 27.0, 23.0, 25.0, 25.0
Note: Dotplot shows no obvious skewness and no outliers.
$\bar{x} = 25.5$, $s = 1.52$, and $\text{s.e.}(\bar{x}) = \dfrac{s}{\sqrt{n}} = \dfrac{1.52}{\sqrt{9}} = .507$
Multiplier t* from Table A.2 with df = 8 is t* = 2.31
95% Confidence Interval:
25.5 ± 2.31(.507) => 25.5 ± 1.17 => 24.33 to 26.67 cm
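Example 12.5 can be recomputed directly from the nine forearm lengths. This sketch uses only the Python standard library and takes the multiplier t* = 2.31 from the slide rather than computing it; the variable names are illustrative.

```python
import math
import statistics

forearm_cm = [25.5, 24.0, 26.5, 25.5, 28.0, 27.0, 23.0, 25.0, 25.0]
T_STAR_95_DF8 = 2.31  # multiplier quoted on the slide (Table A.2, df = 8)

n = len(forearm_cm)
x_bar = statistics.mean(forearm_cm)
s = statistics.stdev(forearm_cm)      # sample standard deviation (divides by n - 1)
se = s / math.sqrt(n)
margin = T_STAR_95_DF8 * se
ci = (x_bar - margin, x_bar + margin)

print(round(x_bar, 1), round(s, 2), round(se, 3))  # 25.5 1.52 0.507
print(round(ci[0], 2), round(ci[1], 2))            # 24.33 26.67
```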
Example 12.6 What Students Sleep More?
Q: How many hours of sleep did you get last night,
to the nearest half hour?
Class                      N   Mean  StDev  SE Mean
Stat 10 (stat literacy)   25   7.66   1.34     0.27
Stat 13 (stat methods)   148   6.81   1.73     0.14
Note: Bell-shape was reasonable for Stat 10 (with smaller n).
Notes: Interval for Stat 10 is wider (smaller sample size)
Two intervals do not overlap => Stat 10 average
significantly higher than Stat 13 average.
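The two intervals referred to in the notes are not shown in this transcript. The sketch below recomputes 95% intervals from the summary table. The multipliers are an assumption: they are approximate values from a standard t table for df = 24 and df = 147 and are not given on the slides.

```python
import math

# Assumption: approximate 95% multipliers from a standard t table.
T_STAR_95 = {24: 2.064, 147: 1.976}

def t_interval(mean, s, n, t_star):
    """Confidence interval for one mean: mean plus or minus t* times s / sqrt(n)."""
    margin = t_star * s / math.sqrt(n)
    return mean - margin, mean + margin

stat10 = t_interval(7.66, 1.34, 25, T_STAR_95[24])
stat13 = t_interval(6.81, 1.73, 148, T_STAR_95[147])

print([round(v, 2) for v in stat10])  # [7.11, 8.21]
print([round(v, 2) for v in stat13])  # [6.53, 7.09]
```

Under these assumptions the Stat 10 interval is the wider of the two, and its lower end sits just above the upper end of the Stat 13 interval, which matches the notes.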
Paired Data Confidence Interval
Data: two variables for n individuals or pairs;
use the difference d = x1 – x2.
Population parameter: μd = mean of differences
for the population = μ1 – μ2.
Sample estimate: d̄ = sample mean of the differences
Standard deviation and standard error:
sd = standard deviation of the sample of differences;
$\text{s.e.}(\bar{d}) = \dfrac{s_d}{\sqrt{n}}$
Confidence interval for μd: $\bar{d} \pm t^* \times \text{s.e.}(\bar{d})$,
where df = n – 1 for the multiplier t*.
Example 12.7 Screen Time: Computer vs TV
Data: Hours spent watching TV and hours spent on
computer per week for n = 25 students.
Task: Make a 90% CI for the mean difference in hours
spent using computer versus watching TV.
Note: Boxplot shows no obvious skewness and no outliers.
Example 12.7 Screen Time: Computer vs TV
Results:
$\bar{d} = 5.36$, $s_d = 15.24$, and $\text{s.e.}(\bar{d}) = \dfrac{s_d}{\sqrt{n}} = \dfrac{15.24}{\sqrt{25}} = 3.05$
Multiplier t* from Table A.2 with df = 24 is t* = 1.71
90% Confidence Interval:
5.36 ± 1.71(3.05) => 5.36 ± 5.22 => 0.14 to 10.58 hours
Interpretation: We are 90% confident that the average
difference between computer usage and television viewing
for students represented by this sample is covered by the
interval from 0.14 to 10.58 hours per week, with more hours
spent on computer usage than on television viewing.
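The paired-data interval in Example 12.7 can be rechecked from the summary statistics, since the raw pairs are not listed here. The multiplier 1.71 is the one quoted on the slide. Carrying full precision gives endpoints of roughly 0.15 and 10.57; the slide's 0.14 to 10.58 comes from rounding the standard error to 3.05 and the margin to 5.22 before the final step.

```python
import math

d_bar, s_d, n = 5.36, 15.24, 25   # mean and std dev of the differences, number of pairs
T_STAR_90_DF24 = 1.71             # multiplier quoted on the slide (Table A.2, df = 24)

se_d = s_d / math.sqrt(n)
margin = T_STAR_90_DF24 * se_d
low, high = d_bar - margin, d_bar + margin

print(round(se_d, 3), round(low, 2), round(high, 2))  # 3.048 0.15 10.57
```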
12.5 General CI for Difference
Between Two Means (Indep)
A CI for the Difference Between Two Means
(Independent Samples):
$\bar{x}_1 - \bar{x}_2 \pm t^* \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$
where t* is the value in a t-distribution with area
between -t* and t* equal to the desired confidence
level. The df used depends on whether equal population
variances are assumed.
Necessary Conditions
• Two samples must be independent.
Either …
• Populations of measurements both bell-shaped,
and random samples of any size are measured.
or …
• Large (n ≥ 30) random samples are measured.
Degrees of Freedom
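The t-distribution is only approximately correct
and the df formula is complicated (Welch’s approx):
$df \approx \dfrac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{1}{n_1 - 1}\left(\dfrac{s_1^2}{n_1}\right)^2 + \dfrac{1}{n_2 - 1}\left(\dfrac{s_2^2}{n_2}\right)^2}$
Statistical software can use the above
approximation, but if done by hand then use a
conservative df = smaller of n1 – 1 and n2 – 1.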
Example 12.8 Effect of a Stare on Driving
Randomized experiment: Researchers either stared
or did not stare at drivers stopped at a campus stop
sign; Timed how long (sec) it took driver to proceed
from sign to a mark on other side of the intersection.
No Stare Group (n = 14):
8.3, 5.5, 6.0, 8.1, 8.8, 7.5, 7.8,
7.1, 5.7, 6.5, 4.7, 6.9, 5.2, 4.7
Stare Group (n = 13):
5.6, 5.0, 5.7, 6.3, 6.5, 5.8, 4.5,
6.1, 4.8, 4.9, 4.5, 7.2, 5.8
Task: Make a 95% CI for the difference between
the mean crossing times for the two populations
represented by these two independent samples.
Example 12.8 Effect of a Stare on Driving
Checking Conditions:
Boxplots show …
• No outliers and no strong skewness.
• Crossing times in stare group generally faster and less variable.
Example 12.8 Effect of a Stare on Driving
Note: The df = 21 was reported by the computer package
based on the Welch’s approximation formula.
The 95% confidence interval for the difference between
the population means is 0.14 seconds to 1.93 seconds.
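The computer output for this example is not reproduced in the transcript, so the sketch below recomputes the unpooled interval from the crossing times listed earlier. It uses the standard Welch approximation for df, rounded down to a whole number as the package appears to do. The multiplier 2.080 for df = 21 is an assumption taken from a standard t table, not from the slides.

```python
import math
import statistics

no_stare = [8.3, 5.5, 6.0, 8.1, 8.8, 7.5, 7.8, 7.1, 5.7, 6.5, 4.7, 6.9, 5.2, 4.7]
stare = [5.6, 5.0, 5.7, 6.3, 6.5, 5.8, 4.5, 6.1, 4.8, 4.9, 4.5, 7.2, 5.8]
T_STAR_95_DF21 = 2.080  # assumption: standard t-table value for df = 21

n1, n2 = len(no_stare), len(stare)
v1 = statistics.variance(no_stare) / n1   # s1^2 / n1
v2 = statistics.variance(stare) / n2      # s2^2 / n2

diff = statistics.mean(no_stare) - statistics.mean(stare)
se = math.sqrt(v1 + v2)                   # unpooled standard error
# Welch's approximation: (v1 + v2)^2 / (v1^2/(n1 - 1) + v2^2/(n2 - 1))
welch_df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

margin = T_STAR_95_DF21 * se
ci_low, ci_high = diff - margin, diff + margin

print(math.floor(welch_df))                   # 21
print(round(ci_low, 2), round(ci_high, 2))    # 0.14 1.93
```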
Equal Variance Assumption
Often reasonable to assume the two populations have
equal population standard deviations, or equivalently,
equal population variances: $\sigma_1^2 = \sigma_2^2 = \sigma^2$
Estimate of this variance based on the combined
or “pooled” data is called the pooled variance.
The square root of the pooled variance is called
the pooled standard deviation:
Pooled standard deviation $s_p = \sqrt{\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$
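Pooled Standard Error
Pooled $\text{s.e.}(\bar{x}_1 - \bar{x}_2) = \sqrt{\dfrac{s_p^2}{n_1} + \dfrac{s_p^2}{n_2}} = \sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)} = s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}$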
Note: Pooled df = (n1 – 1) + (n2 – 1) = (n1 + n2 – 2).
Pooled Confidence Interval
Pooled CI for the Difference Between
Two Means (Independent Samples):
$\bar{x}_1 - \bar{x}_2 \pm t^* \left(s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}\right)$
where t* is found using a t-distribution
with df = (n1 + n2 – 2) and
sp is the pooled standard deviation.
Example 12.9 Male and Female Sleep Times
Q: How much difference is there between how long
female and male students slept the previous night?
Data: The 83 female and 65 male responses from
students in an intro stat class.
Task: Make a 95% CI for the difference between
the two population mean sleep hours for
females versus males.
Note: We will assume equal population variances.
Example 12.9 Male and Female Sleep Times
Two-sample T for sleep [with “Assume Equal Variance” option]
Sex       N   Mean  StDev  SE Mean
Female   83   7.02   1.75     0.19
Male     65   6.55   1.68     0.21
Difference = mu (Female) – mu (Male)
Estimate for difference: 0.461
95% CI for difference: (-0.103, 1.025)
T-Test of difference = 0 (vs not =): T-Value = 1.62 P = 0.108 DF = 146
Both use Pooled StDev = 1.72
Notes:
• Two sample standard deviations are very similar.
• Sample mean for females higher than for males.
• 95% confidence interval contains 0 so cannot rule out
that the population means may be equal.
Example 12.9 Male and Female Sleep Times
Pooled standard deviation and
pooled standard error “by-hand”:
Pooled std dev $s_p = \sqrt{\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} = \sqrt{\dfrac{(83 - 1)1.75^2 + (65 - 1)1.68^2}{83 + 65 - 2}} = \sqrt{2.957} = 1.72$
Pooled $\text{s.e.}(\bar{x}_1 - \bar{x}_2) = s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}} = 1.72\sqrt{\dfrac{1}{83} + \dfrac{1}{65}} = 0.285$
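The by-hand pooled calculation can be mirrored in code. This sketch works from the rounded summary statistics in the output and uses the reported estimate of the difference, 0.461. The multiplier 1.976 for df = 146 is an assumption from a standard t table, so the interval agrees with the reported (-0.103, 1.025) only to about two decimal places.

```python
import math

n1, s1 = 83, 1.75   # females
n2, s2 = 65, 1.68   # males
estimate = 0.461    # difference reported in the software output
T_STAR_95_DF146 = 1.976  # assumption: approximate t-table value for df = 146

pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
s_p = math.sqrt(pooled_var)
pooled_se = s_p * math.sqrt(1 / n1 + 1 / n2)

margin = T_STAR_95_DF146 * pooled_se
ci_low, ci_high = estimate - margin, estimate + margin

print(round(pooled_var, 3), round(s_p, 2), round(pooled_se, 3))  # 2.957 1.72 0.285
print(ci_low, ci_high)  # close to the reported (-0.103, 1.025)
```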
Pooled or Unpooled?
• If sample sizes are equal, the pooled and unpooled
standard errors are equal. If sample standard deviations
are similar, assumption of equal population variance is
reasonable and pooled procedure can be used.
• If sample sizes are very different, pooled test can be quite
misleading unless sample standard deviations are similar. If
the smaller standard deviation accompanies the larger
sample size, we do not recommend using the pooled
procedure.
• If sample sizes are very different, the standard deviations
are similar, and the larger sample size produced the larger
standard deviation, the pooled procedure is acceptable
because it will be conservative.
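Two of the claims above can be checked numerically. The sketch below reuses the standard deviations from Example 12.4 (3.7 and 3.9 kg); the equal sample size of 45 per group is hypothetical and chosen only to illustrate the first bullet. Function names are illustrative.

```python
import math

def unpooled_se(s1, n1, s2, n2):
    """Unpooled standard error of the difference between two sample means."""
    return math.sqrt(s1**2 / n1 + s2**2 / n2)

def pooled_se(s1, n1, s2, n2):
    """Pooled standard error, assuming equal population variances."""
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    return math.sqrt(pooled_var * (1 / n1 + 1 / n2))

# Equal sample sizes (hypothetical n = 45 each): the two standard errors coincide.
print(math.isclose(pooled_se(3.7, 45, 3.9, 45), unpooled_se(3.7, 45, 3.9, 45)))  # True

# Example 12.4 sizes: the larger sample (n2 = 47) has the larger std dev (3.9),
# so the pooled standard error is slightly larger, which is the conservative direction.
print(pooled_se(3.7, 42, 3.9, 47) > unpooled_se(3.7, 42, 3.9, 47))  # True
```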
12.6 The Difference Between
Two Proportions (Indep)
A CI for the Difference Between Two
Proportions (Independent Samples):
$\hat{p}_1 - \hat{p}_2 \pm z^* \sqrt{\dfrac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$
where z* is the value of the standard normal
variable with area between -z* and z* equal to
the desired confidence level.
Necessary Conditions
• Condition 1: Sample proportions are available
based on independent, randomly selected
samples from the two populations.
• Condition 2: All of the quantities –
n1 p̂1, n1(1 – p̂1), n2 p̂2, and n2(1 – p̂2)
– are at least 5 and preferably at least 10.
Example 12.10 Snoring and Heart Attacks
Q: Is there a relationship between snoring and risk
of heart disease?
Data: Of 1105 snorers, 86 had heart disease.
Of 1379 nonsnorers, 24 had heart disease.
$\hat{p}_1 = \dfrac{86}{1105} = .0778$, $\hat{p}_2 = \dfrac{24}{1379} = .0174$, and $\hat{p}_1 - \hat{p}_2 = .0604$
and $\text{s.e.}(\hat{p}_1 - \hat{p}_2) = \sqrt{\dfrac{.0778(1 - .0778)}{1105} + \dfrac{.0174(1 - .0174)}{1379}} = .0088$
Example 12.10 Snoring and Heart Attacks
Note: the higher the level of confidence,
the wider the interval.
It appears that the proportion of snorers with heart
disease in the population is about 4 to 8 percentage points
higher than the proportion of nonsnorers with heart disease.
$\dfrac{\hat{p}_1}{\hat{p}_2} = \dfrac{.0778}{.0174} = 4.5$
Risk of heart disease for snorers is about 4.5 times
what the risk is for nonsnorers.
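The intervals at different confidence levels are not shown in this transcript. The sketch below recomputes them from the counts, obtaining z* from the standard normal distribution in Python's `statistics` module. The choice of the 90%, 95%, and 99% levels is an assumption made for illustration.

```python
import math
from statistics import NormalDist

x1, n1 = 86, 1105    # snorers with heart disease, total snorers
x2, n2 = 24, 1379    # nonsnorers with heart disease, total nonsnorers

p1_hat, p2_hat = x1 / n1, x2 / n2
diff = p1_hat - p2_hat
se = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)

def z_star(confidence):
    """Multiplier with area `confidence` between -z* and z* under the standard normal."""
    return NormalDist().inv_cdf((1 + confidence) / 2)

intervals = {
    level: (diff - z_star(level) * se, diff + z_star(level) * se)
    for level in (0.90, 0.95, 0.99)
}
relative_risk = p1_hat / p2_hat

print(round(se, 4), round(relative_risk, 1))  # 0.0088 4.5
for level, (low, high) in intervals.items():
    print(f"{level:.0%}: {low:.3f} to {high:.3f}")
```

The printed intervals widen as the confidence level rises, and the 95% interval falls between .04 and .08, consistent with the statement above.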