
Chapter 13
More About Significance Tests
Copyright ©2006 Brooks/Cole, a division of Thomson Learning, Inc.
Hypothesis testing about:
• a population mean or mean difference (paired data)
• the difference between means of two populations
• the difference between two population proportions
Three Cautions:
1. Inference is only valid if the sample is representative
of the population for the question of interest.
2. Hypotheses and conclusions apply to the larger
population(s) represented by the sample(s).
3. If the distribution of a quantitative variable is highly
skewed, consider analyzing the median rather than the
mean – called nonparametric methods (Topic 2 on CD).
13.1 General Ideas of
Significance Testing
Steps in Any Hypothesis Test
1. Determine the null and alternative hypotheses.
2. Verify necessary data conditions, and if met,
summarize the data into an appropriate test statistic.
3. Assuming the null hypothesis is true,
find the p-value.
4. Decide whether or not the result is statistically
significant based on the p-value.
5. Report the conclusion in the context of the situation.
13.2 Testing Hypotheses About
One Mean or Paired Data
Step 1: Determine null and alternative hypotheses
1. H0: μ = μ0 versus Ha: μ ≠ μ0 (two-sided)
2. H0: μ ≥ μ0 versus Ha: μ < μ0 (one-sided)
3. H0: μ ≤ μ0 versus Ha: μ > μ0 (one-sided)
Often H0 for a one-sided test is written as H0: μ = μ0.
Remember a p-value is computed assuming H0 is true,
and μ0 is the value used for that computation.
Step 2: Verify Necessary Data Conditions …
Situation 1: Population of measurements of interest
is approximately normal, and a random sample of
any size is measured. In practice, use the method if the
shape is not notably skewed and there are no extreme outliers.
Situation 2: Population of measurements of interest
is not approximately normal, but a large random
sample (n ≥ 30) is measured. If there are extreme outliers or
extreme skewness, it is better to have a larger sample.
Continuing Step 2: The Test Statistic
The t-statistic is a standardized score for measuring
the difference between the sample mean and the null
hypothesis value of the population mean:
t = (sample mean − null value) / standard error = (x̄ − μ0) / (s/√n)
This t-statistic has (approx) a t-distribution with df = n - 1.
Step 3: Assuming H0 true, Find the p-value
• For Ha less than, the p-value is the area below t,
even if t is positive.
• For Ha greater than, the p-value is the area above t,
even if t is negative.
• For Ha two-sided, p-value is 2 × area above |t|.
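For readers who want to check these rules numerically, here is a minimal Python sketch (the summary numbers are made up, not from the text) that computes the one-mean t-statistic from Step 2 and the p-value for each form of Ha using scipy.

```python
import math
from scipy import stats

# Hypothetical summary statistics (not from the text)
x_bar, s, n, mu_0 = 10.3, 2.1, 25, 10.0

t = (x_bar - mu_0) / (s / math.sqrt(n))   # standardized distance from the null value
df = n - 1

p_less    = stats.t.cdf(t, df)            # Ha: mu < mu_0 -> area below t
p_greater = stats.t.sf(t, df)             # Ha: mu > mu_0 -> area above t
p_two     = 2 * stats.t.sf(abs(t), df)    # two-sided     -> 2 x area above |t|
print(t, p_less, p_greater, p_two)
```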
Steps 4 and 5: Decide Whether or Not the
Result is Statistically Significant based on
the p-value and Report the Conclusion in
the Context of the Situation
These two steps remain the same for all of the
hypothesis tests considered in this book.
Choose a level of significance α, and reject H0
if the p-value is less than (or equal to) α.
Otherwise, conclude that there is not enough
evidence to support the alternative hypothesis.
Example 13.1 Normal Body Temperature
What is normal body temperature? Is it actually
less than 98.6 degrees Fahrenheit (on average)?
Step 1: State the null and alternative hypotheses
H0: μ = 98.6
Ha: μ < 98.6
where μ = mean body temperature in human population.
Example 13.1 Normal Body Temp (cont)
Data: random sample of n = 18 normal body temps
98.2  97.4  97.8  97.6  99.0  98.4  98.6  98.0  98.2
99.2  97.8  98.6  98.4  97.1  99.7  97.2  98.2  98.5
Step 2: Verify data conditions …
Boxplot shows no outliers
nor strong skewness.
Sample mean of 98.217
is close to sample median
of 98.2.
Example 13.1 Normal Body Temp (cont)
Step 2: … Summarizing data with a test statistic
Test of mu = 98.600 vs mu < 98.600

Variable      N    Mean  StDev  SE Mean      T      P
Temperature  18  98.217  0.684    0.161  -2.38  0.015
Key elements:
Sample statistic: x̄ = 98.217 (under "Mean")
Standard error: s.e.(x̄) = s/√n = 0.684/√18 = 0.161 (under "SE Mean")
t = (x̄ − μ0) / (s/√n) = (98.217 − 98.6) / 0.161 = −2.38 (under "T")
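The same computation can be reproduced directly from the 18 temperatures listed earlier; a sketch using scipy (the alternative= argument needs a reasonably recent scipy, roughly 1.6 or later):

```python
from scipy import stats

# Data from the slide above
temps = [98.2, 97.4, 97.8, 97.6, 99.0, 98.4, 98.6, 98.0, 98.2,
         99.2, 97.8, 98.6, 98.4, 97.1, 99.7, 97.2, 98.2, 98.5]

# One-sample t-test of H0: mu = 98.6 vs Ha: mu < 98.6
t_stat, p_value = stats.ttest_1samp(temps, popmean=98.6, alternative='less')
print(t_stat, p_value)   # about -2.38 and 0.015, matching the output above
```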
Example 13.1 Normal Body Temp (cont)
Step 3: Find the p-value
From output: p-value = 0.015
From Table A.3: the p-value
is between 0.010 and 0.016.
The area to the left of t = −2.38 equals the
area to the right of t = +2.38. The
value t = 2.38 is between the column
headings 2.33 and 2.58 in the table,
and for df = 17, the one-sided
p-values are 0.016 and 0.010.
Example 13.1 Normal Body Temp (cont)
Step 4: Decide whether or not the result is
statistically significant based on the p-value
Using α = 0.05 as the level of significance criterion,
the results are statistically significant because 0.015,
the p-value of the test, is less than 0.05. In other
words, we can reject the null hypothesis.
Step 5: Report the Conclusion
We can conclude, based on these data, that the mean
temperature in the human population is actually less
than 98.6 degrees.
Paired Data and the Paired t-Test
Data: two variables for n individuals or pairs;
use the difference d = x1 – x2.
Parameter: μd = population mean of differences
Sample estimate: d̄ = sample mean of the differences
Standard deviation and standard error:
sd = standard deviation of the sample of differences;
s.e.(d̄) = sd/√n
Often of interest: Is the mean difference in the
population different from 0?
Steps for a Paired t-Test
Step 1: Determine null and alternative hypotheses
H0: μd = 0 versus Ha: μd ≠ 0 or Ha: μd < 0 or Ha: μd > 0
Watch how differences are defined for selecting the Ha.
Step 2: Verify data conditions and compute test statistic
Conditions apply to the differences.
The t-test statistic is: t = (sample mean − null value) / standard error = (d̄ − 0) / (sd/√n)
Steps 3, 4 and 5: Similar to t-test for a single mean.
The df = n – 1, where n is the number of differences.
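A short sketch (with made-up numbers, not the study data) showing that a paired t-test is simply a one-sample t-test applied to the differences:

```python
from scipy import stats

# Hypothetical before/after measurements on the same five subjects
before = [12.1, 10.8, 13.5, 9.9, 11.2]
after  = [11.4, 10.1, 12.9, 10.2, 10.5]
diffs  = [b - a for b, a in zip(before, after)]

# Equivalent tests of H0: mu_d = 0 vs Ha: mu_d > 0 (alternative= needs scipy >= 1.6)
print(stats.ttest_rel(before, after, alternative='greater'))
print(stats.ttest_1samp(diffs, popmean=0, alternative='greater'))
```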
Example 13.2 Effect of Alcohol
Study: n = 10 pilots perform simulation first under
sober conditions and then after drinking alcohol.
Response: Amount of useful performance time.
(longer time is better)
Question: Does useful performance time decrease
with alcohol use?
Step 1: State the null and alternative hypotheses
H0: μd = 0 versus Ha: μd > 0
where μd = population mean difference between no-alcohol and
alcohol measurements if all pilots took these tests.
Example 13.2 Effect of Alcohol (cont)
Data: random sample of n = 10 time differences
Step 2: Verify data conditions …
Boxplot shows no outliers
nor extreme skewness.
Example 13.2 Effect of Alcohol (cont)
Step 2: … Summarizing data with a test statistic
Test of mu = 0.0 vs mu > 0.0

Variable    N   Mean  StDev  SE Mean     T      P
Diff       10  195.6  230.5     72.9  2.68  0.013
Key elements:
Sample statistic: d̄ = 195.6 (under "Mean")
Standard error: s.e.(d̄) = sd/√n = 230.5/√10 = 72.9 (under "SE Mean")
t = (d̄ − 0) / (sd/√n) = (195.6 − 0) / 72.9 = 2.68 (under "T")
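The individual differences are not shown on the slide, but the test statistic and p-value can be reproduced from the summary statistics alone; a sketch:

```python
import math
from scipy import stats

d_bar, s_d, n = 195.6, 230.5, 10   # summary statistics from the output above

se = s_d / math.sqrt(n)            # about 72.9
t = (d_bar - 0) / se               # about 2.68
p_value = stats.t.sf(t, df=n - 1)  # one-sided area above t, about 0.013
print(se, t, p_value)
```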
Example 13.2 Effect of Alcohol (cont)
Step 3: Find the p-value
From output: p-value = 0.013
From Table A.3: p-value
is between 0.007 and 0.015.
The value t = 2.68 is between
column headings 2.58 and 3.00
in the table, and for df = 9,
the one-sided p-values are
0.015 and 0.007.
Example 13.2 Effect of Alcohol (cont)
Steps 4 and 5: Decide whether or not the
result is statistically significant based on the
p-value and Report the Conclusion
Using α = 0.05 as the level of significance
criterion, we can reject the null hypothesis
since the p-value of 0.013 is less than 0.05.
Even with a small experiment, it appears that
alcohol has a statistically significant effect
and decreases performance time.
Rejection Region Approach
Replaces Steps 3 and 4 with:
Substitute Step 3: Find the critical value and rejection
region for the test.
Substitute Step 4: If the test statistic is in the rejection
region, conclude that the result is statistically
significant and reject the null hypothesis. Otherwise,
do not reject the null hypothesis.
Note: Rejection region method and p-value method will always
arrive at the same conclusion about statistical significance.
Rejection Region Approach
Summary (use row of Table A.2 corresponding to df)
Alternative Hypothesis           Column heading   Heading for   Rejection region for
                                 in Table A.2     α = 0.05      df = 10 and α = 0.05
Ha: Parameter ≠ null value       1 − α            0.95          t ≤ −2.23 or t ≥ 2.23
Ha: Parameter > null value       1 − 2α           0.90          t ≥ 1.81
Ha: Parameter < null value       1 − 2α           0.90          t ≤ −1.81
For Example 13.1 (Normal Body Temperature):
The alternative was one-sided to the left, df = 17, and α = 0.05.
The critical value from Table A.2 is −1.74, so the rejection
region is t ≤ −1.74. The test statistic was −2.38, so
the null hypothesis is rejected. The same conclusion is reached.
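Critical values like −1.74 can also be obtained from software instead of Table A.2; a minimal sketch for df = 17 and α = 0.05:

```python
from scipy import stats

df, alpha = 17, 0.05

t_lower = stats.t.ppf(alpha, df)          # Ha: "<"   -> reject if t <= -1.74
t_upper = stats.t.ppf(1 - alpha, df)      # Ha: ">"   -> reject if t >= 1.74
t_two   = stats.t.ppf(1 - alpha / 2, df)  # two-sided -> reject if |t| >= 2.11
print(round(t_lower, 2), round(t_upper, 2), round(t_two, 2))
```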
13.3 Testing The Difference
between Two Means (Indep)
Step 1: Determine null and alternative hypotheses
H0: μ1 – μ2 = 0 versus
Ha: μ1 – μ2 ≠ 0 or Ha: μ1 – μ2 < 0 or Ha: μ1 – μ2 > 0
Watch how Population 1 and 2 are defined.
Step 2: Verify data conditions and compute test statistic
Both n's are large, or there are no extreme outliers or
extreme skewness in either sample. Samples are independent. The t-test statistic is:
t = (sample mean − null value) / standard error = ((x̄1 − x̄2) − 0) / √(s1²/n1 + s2²/n2)
Steps 3, 4 and 5: Similar to t-test for one mean.
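A minimal sketch of the unpooled (Welch) version with made-up data, not from the text; equal_var=False gives the standard error shown above:

```python
from scipy import stats

# Hypothetical samples from two independent groups
group1 = [6.2, 7.1, 5.8, 6.9, 7.4, 6.5]
group2 = [5.1, 5.9, 6.0, 5.4, 5.7, 6.3]

# Unpooled (Welch) test of H0: mu1 - mu2 = 0 vs Ha: mu1 - mu2 > 0
t_stat, p_value = stats.ttest_ind(group1, group2, equal_var=False,
                                  alternative='greater')
print(t_stat, p_value)
```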
Example 13.3 Effect of Stare on Driving
Randomized experiment: Researchers either stared
or did not stare at drivers stopped at a campus stop
sign, and timed how long (sec) it took the driver to proceed
from the sign to a mark on the other side of the intersection.
Question: Does staring speed up crossing times?
Step 1: State the null and alternative hypotheses
H0: μ1 – μ2 = 0 versus Ha: μ1 – μ2 > 0
where 1 = no-stare population and 2 = stare population.
Example 13.3 Effect of Stare (cont)
Data: n1 = 14 no stare and n2 = 13 stare responses
Step 2: Verify data conditions …
No outliers nor extreme skewness for either group.
Example 13.3 Effect of Stare (cont)
Step 2: … Summarizing data with a test statistic
Sample statistic: x̄1 − x̄2 = 6.63 − 5.59 = 1.04 seconds
Standard error: s.e.(x̄1 − x̄2) = √(s1²/n1 + s2²/n2) = √(1.36²/14 + 0.822²/13) = 0.43
t = ((x̄1 − x̄2) − 0) / s.e.(x̄1 − x̄2) = (1.04 − 0) / 0.43 = 2.41
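Because only summary statistics are shown, the result can be checked with scipy's from-stats helper (the alternative= argument needs a recent scipy); small differences from the slide's 2.41 and 0.013 come from the rounding in the summary values:

```python
from scipy import stats

# Unpooled (Welch) test from the summary statistics above
res = stats.ttest_ind_from_stats(mean1=6.63, std1=1.36, nobs1=14,
                                 mean2=5.59, std2=0.822, nobs2=13,
                                 equal_var=False, alternative='greater')
print(res)   # t about 2.42, one-sided p about 0.012 with these rounded inputs
```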
Example 13.3 Effect of Stare (cont)
Steps 3, 4 and 5: Determine the p-value and make
a conclusion in context.
The p-value = 0.013, so we reject the null hypothesis;
the results are "statistically significant".
The p-value is determined using a t-distribution with
df = 21 (df using Welch approximation formula) and
finding area to right of t = 2.41.
Table A.3 => p-value is between 0.009 and 0.015.
We can conclude that if all drivers were stared at,
the mean crossing times at an intersection would
be faster than under normal conditions.
Pooled Two-Sample t-Test
Based on assumption that the two populations have
equal population standard deviations: σ1 = σ2 = σ
Pooled standard deviation: sp = √[ ((n1 − 1)s1² + (n2 − 1)s2²) / (n1 + n2 − 2) ]
Pooled s.e.(x̄1 − x̄2) = sp √(1/n1 + 1/n2)
t = (sample mean − null value) / pooled standard error = ((x̄1 − x̄2) − 0) / (sp √(1/n1 + 1/n2))
Note: Pooled df = (n1 – 1) + (n2 – 1) = (n1 + n2 – 2).
Guidelines for Using Pooled t-Test
• If sample sizes are equal, pooled and unpooled standard
errors are equal and so t-statistic is same. If sample standard
deviations are similar, assumption of common population
variance is reasonable and pooled procedure can be used.
• If sample sizes are very different, pooled test can be
quite misleading unless sample standard deviations similar.
If sample sizes very different and smaller standard deviation
accompanies larger sample size, do not recommend using
pooled procedure.
• If sample sizes are very different, standard deviations are
similar, and larger sample size produced the larger standard
deviation, pooled t-test is acceptable and will be conservative.
Example 13.5 Male and Female Sleep Times
Q: Is there a difference between how long female
and male students slept the previous night?
Data: The 83 female and 65 male responses from
students in an intro stat class.
The null and alternative hypotheses are:
H0: μ1 – μ2 = 0 versus Ha: μ1 – μ2 ≠ 0
where 1 = female population and 2 = male population.
Note: Sample sizes similar, sample standard deviations
similar. Use of pooled procedure is warranted.
Example 13.5 Male and Female Sleep Times
Two-sample T for sleep [without "Assume Equal Variance" option]

Sex      N   Mean  StDev  SE Mean
Female  83   7.02   1.75     0.19
Male    65   6.55   1.68     0.21

95% CI for mu(f) – mu(m): (-0.10, 1.02)
T-Test mu(f) = mu(m) (vs not =): T-Value = 1.62  P = 0.11  DF = 140

Two-sample T for sleep [with "Assume Equal Variance" option]

Sex      N   Mean  StDev  SE Mean
Female  83   7.02   1.75     0.19
Male    65   6.55   1.68     0.21

95% CI for mu(f) – mu(m): (-0.10, 1.03)
T-Test mu(f) = mu(m) (vs not =): T-Value = 1.62  P = 0.11  DF = 146
Both use Pooled StDev = 1.72
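Both outputs can be approximated from the summary statistics shown (the means and SDs here are rounded, so the t-values come out slightly above the 1.62 computed from the raw data); a sketch:

```python
from scipy import stats

f = dict(mean1=7.02, std1=1.75, nobs1=83)   # female summary statistics
m = dict(mean2=6.55, std2=1.68, nobs2=65)   # male summary statistics

print(stats.ttest_ind_from_stats(**f, **m, equal_var=False))  # unpooled version
print(stats.ttest_ind_from_stats(**f, **m, equal_var=True))   # pooled version
# Both give t near 1.65 with these rounded inputs and a two-sided p of roughly 0.10.
```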
13.4 Testing The Difference Between
Two Population Proportions
Step 1: Determine null and alternative hypotheses
H0: p1 – p2 = 0 versus
Ha: p1 – p2 ≠ 0 or Ha: p1 – p2 < 0 or Ha: p1 – p2 > 0
Watch how Population 1 and 2 are defined.
Step 2: Verify data conditions …
Samples are independent. Sample sizes are large
enough so that n1p̂1, n1(1 − p̂1), n2p̂2, and n2(1 − p̂2)
are all at least 5, and preferably at least 10.
Continuing Step 2: The Test Statistic
Under the null hypothesis, there is a common population
proportion p. This common value is estimated using all
the data as:
p̂ = (n1p̂1 + n2p̂2) / (n1 + n2)
The standardized test statistic is:
z = (sample statistic − null value) / null standard error = (p̂1 − p̂2 − 0) / √[ p̂(1 − p̂)(1/n1 + 1/n2) ]
This z-statistic has (approx) a standard normal distribution.
Step 3: Assuming H0 true, Find the p-value
• For Ha less than, the p-value is the area below z,
even if z is positive.
• For Ha greater than, the p-value is the area above z,
even if z is negative.
• For Ha two-sided, p-value is 2 × area above |z|.
Steps 4 and 5: Decide Whether or Not the
Result is Statistically Significant based on
p-value and Make a Conclusion in Context
Choose a level of significance α, and
reject H0 if the p-value is less than (or equal to) α.
Example 13.6 Prevention of Ear Infections
Question: Does the use of sweetener xylitol
reduce the incidence of ear infections?
Randomized Experiment Results:
Of 165 children on placebo, 68 got ear infection.
Of 159 children on xylitol, 46 got ear infection.
Step 1: State the null and alternative hypotheses
H0: p1 – p2 = 0
versus Ha: p1 – p2 > 0
where
p1 = population proportion with ear infections on placebo
p2 = population proportion with ear infections on xylitol
Example 13.6 Ear Infections (cont)
Step 2: Verify conditions and compute z statistic
There are at least 10 children in each sample who did
and did not get ear infections, so conditions are met.
p̂1 = 68/165 = .412,  p̂2 = 46/159 = .289,  and p̂1 − p̂2 = .123
p̂ = (n1p̂1 + n2p̂2) / (n1 + n2) = (68 + 46) / (165 + 159) = 114/324 = .35
z = (p̂1 − p̂2 − 0) / √[ p̂(1 − p̂)(1/n1 + 1/n2) ] = (.123 − 0) / √[ .35(1 − .35)(1/165 + 1/159) ] = 2.32
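The z computation above can be reproduced step by step from the counts in the experiment; a sketch:

```python
import math
from scipy import stats

x1, n1 = 68, 165   # placebo group: infections / children
x2, n2 = 46, 159   # xylitol group: infections / children

p1_hat, p2_hat = x1 / n1, x2 / n2
p_hat = (x1 + x2) / (n1 + n2)                                # combined estimate, about .35

se_null = math.sqrt(p_hat * (1 - p_hat) * (1 / n1 + 1 / n2))
z = (p1_hat - p2_hat) / se_null                              # about 2.32
p_value = stats.norm.sf(z)                                   # one-sided, about 0.010
print(z, p_value)
```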
Example 13.6 Ear Infections (cont)
Steps 3, 4 and 5: Determine the p-value and make
a conclusion in context.
The p-value is the area above
z = 2.32 using Table A.1.
We have p-value = 0.0102, so
we reject the null hypothesis;
the results are
"statistically significant".
We can conclude that taking xylitol would reduce the
proportion of ear infections in the population of similar
preschool children in comparison to taking a placebo.
13.5 Relationship Between Tests
and Confidence Intervals
For two-sided tests (for one or two means):
H0: parameter = null value and Ha: parameter ≠ null value
• If the null value is covered by a (1 − α)100%
confidence interval, the null hypothesis is not rejected
and the test is not statistically significant at level α.
• If the null value is not covered by a (1 − α)100%
confidence interval, the null hypothesis is rejected and
the test is statistically significant at level α.
Note: 95% confidence interval ↔ 5% significance level;
99% confidence interval ↔ 1% significance level
Example 13.4 Mean TV hours (M vs F)
Question: Does the population mean daily TV hours
differ for male and female college students?
95% CI for difference in population means: (-0.14, +0.98)
Test H0: μ1 – μ2 = 0 versus Ha: μ1 – μ2 ≠ 0 using α = 0.05
The null value of 0 hours is in this interval.
Thus the difference in the sample means of 0.42 hours
is not significantly different from 0.
Confidence Intervals and One-Sided Tests
When testing the hypotheses:
H0: parameter = null value versus a one-sided alternative,
compare the null value to a (1 − 2α)100% confidence interval:
• If the null value is covered by the interval, the test is
not statistically significant at level α.
• For the alternative Ha: parameter > null value, the test is
statistically significant at level α if the entire interval
falls above the null value.
• For the alternative Ha: parameter < null value, the test is
statistically significant at level α if the entire interval
falls below the null value.
Example 13.6 Ear Infections (cont)
95% CI for p1 – p2 is 0.020 to 0.226
Reject H0: p1 – p2 = 0 and accept Ha: p1 – p2 > 0
with α = 0.025, because the entire confidence
interval falls above the null value of 0.
Note that the p-value for the test was 0.01,
which is less than 0.025.
13.6 Choosing an Appropriate
Inference Procedure
• Confidence Interval or Hypothesis Test?
Is main purpose to estimate the numerical value
of a parameter or to make a “maybe not/maybe yes”
conclusion about a specific hypothesized value for
a parameter?
• Determining the Appropriate Parameter
Is response variable categorical or quantitative? Is there
one sample or two? If two, independent or paired?
13.7 The Two Types of Errors
and Their Probabilities
When the null hypothesis is true, the
probability of a type 1 error, the level of
significance, and the α-level are all equivalent.
When the null hypothesis is not true,
a type 1 error cannot be made.
Trade-Off in Probability for Two Errors
There is an inverse relationship between the
probabilities of the two types of errors.
Increase probability of a type 1 error =>
decrease in probability of a type 2 error
Type 2 Errors and Power
Three factors that affect probability of a type 2 error
1. Sample size; larger n reduces the probability of a type 2
error without affecting the probability of a type 1 error.
2. Level of significance; a larger α reduces the probability of a
type 2 error by increasing the probability of a type 1 error.
3. Actual value of the population parameter (not in the
researcher's control); the farther the truth falls from the null
value (in the direction of Ha), the lower the probability of a type 2 error.
When the alternative hypothesis is true, the probability of
making the correct decision is called the power of a test.
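A rough way to see these three factors at work is to compute the power of a one-sided, one-sample t-test directly from the noncentral t-distribution; a sketch (my own helper, not from the text), where d is the true effect size (μtrue − μ0)/σ:

```python
import math
from scipy import stats

def power_one_sample_t(d, n, alpha=0.05):
    """Approximate power of a one-sided (Ha: mu > mu_0) one-sample t-test
    when the true effect size is d = (mu_true - mu_0) / sigma."""
    df = n - 1
    t_crit = stats.t.ppf(1 - alpha, df)   # rejection cutoff
    ncp = d * math.sqrt(n)                # noncentrality when Ha is true
    return stats.nct.sf(t_crit, df, ncp)  # P(test statistic lands in rejection region)

print(power_one_sample_t(0.5, 10))              # roughly 0.43
print(power_one_sample_t(0.5, 30))              # larger n     -> roughly 0.84
print(power_one_sample_t(0.5, 10, alpha=0.10))  # larger alpha -> roughly 0.59
```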
13.8 Effect Size
Effect size is a measure of how much the truth
differs from chance or from a control condition.
Effect size for a single mean: d = (μ1 − μ0) / σ
Effect size for comparing two means: d = (μ1 − μ2) / σ
Estimating Effect Size
Estimated effect size for a single mean:
d̂ = (x̄ − μ0) / s
Estimated effect size for comparing two means:
d̂ = (x̄1 − x̄2) / s
Relationship:
Test statistic = Size of effect × Size of study
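A quick check of this relationship using the Example 13.1 numbers (a sketch):

```python
import math

x_bar, mu_0, s, n = 98.217, 98.6, 0.684, 18   # from Example 13.1

d_hat = (x_bar - mu_0) / s        # estimated effect size, about -0.56
t = d_hat * math.sqrt(n)          # size of effect x size of study, about -2.38
print(d_hat, t)                   # matches the t-statistic computed earlier
```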