Transcript document

11.1 – Significance Tests: The basics
Inference:
Using the evidence provided by the sample
to draw conclusions about the population.
Statistic → Parameter:
x̄ → μ
p̂ → p
Hypothesis:
A claim made about a population parameter.
Sample data are gathered to assess whether the
evidence supports the claim.
Null Hypothesis:
The statement being tested. We believe this to
be true until we get evidence against it.
NOTATION:
Ho: μ = μo
Alternative Hypothesis:
Statement we hope or suspect is true instead
of the null hypothesis
NOTATION:
Ha: μ < μo (One-Tailed, left)
Ha: μ > μo (One-Tailed, right)
Ha: μ ≠ μo (Two-Tailed)
Test Statistic:
A sample statistic that is computed from the data. It
helps us to make a statistical decision. Do we have
enough evidence to reject the null hypothesis or not?
test statistic = (estimate − hypothesized value) / (standard deviation of the estimate)
Z = (x̄ − μo) / (σ/√n)
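The formula above can be sketched in Python. This is a minimal illustration, not part of the original notes: the function name `z_statistic` and the numbers in the usage line are made up for the example.

```python
from math import sqrt

def z_statistic(sample_mean, mu_0, sigma, n):
    """One-sample z statistic: (estimate - hypothesized value) / (sd of the estimate)."""
    standard_error = sigma / sqrt(n)  # standard deviation of the sample mean
    return (sample_mean - mu_0) / standard_error

# Illustrative values only (hypothetical, not from the notes)
z = z_statistic(sample_mean=52.0, mu_0=50.0, sigma=8.0, n=64)
print(round(z, 2))
```

Here the standard error is 8/√64 = 1, so the sample mean sits 2 standard errors above the hypothesized value.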
p-value:
• The probability, computed assuming the null hypothesis is
true, of getting a sample result at least as extreme as the
one observed. It measures how much evidence you have
against the null hypothesis.
• Small p-values indicate the outcome measured from
the sample data is unlikely if the null hypothesis is
true, so they provide strong evidence against the null
hypothesis.
Statistically Significant:
An outcome unlikely to occur by chance alone if the null
hypothesis is true. If your p-value is smaller than the
significance level α, the result is statistically significant.
Significance Level:
The decisive value, called alpha (α), that we fix in advance.
This states when the null hypothesis should be rejected. This
level is compared to the p-value. Common α levels
of rejection are α = 0.10, α = 0.05, and α = 0.01.
p < α, then reject the null
p ≥ α, then fail to reject the null
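As a rough sketch of how a p-value is obtained from a z statistic and then compared with α, the Python below uses the standard library's `NormalDist`. The helper names `p_value` and `decide` and the value z = 2.0 are assumptions made for illustration.

```python
from statistics import NormalDist

def p_value(z, alternative):
    """P-value of a z statistic; alternative is 'less', 'greater', or 'two-sided'."""
    std_normal = NormalDist()  # standard Normal: mean 0, sd 1
    if alternative == "less":
        return std_normal.cdf(z)
    if alternative == "greater":
        return 1 - std_normal.cdf(z)
    if alternative == "two-sided":
        return 2 * std_normal.cdf(-abs(z))
    raise ValueError("alternative must be 'less', 'greater', or 'two-sided'")

def decide(p, alpha):
    """Compare the p-value with the significance level fixed in advance."""
    return "reject Ho" if p < alpha else "fail to reject Ho"

p = p_value(2.0, "two-sided")  # z = 2.0 is a hypothetical value
print(round(p, 4), decide(p, alpha=0.05), "/", decide(p, alpha=0.01))
```

The same p-value can lead to different decisions at different α levels, which is why α is fixed before looking at the data.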
Conditions:
SRS
Normality
Independence
Example #1 – State the null and alternative hypotheses in notation
a. Suppose we work in the quality control department of
Ruffles Potato Chips. The quality control manager wants
us to verify that the filling machine is calibrated properly.
We wish to determine if the mean amount of chips in a bag
is different from the advertised 12.5 ounces. The company
is concerned if there are too many or too few chips in the
bag.
Ho: μ = 12.5
Ha: μ ≠ 12.5
b. According to the US Department of Agriculture, the mean
farm rent in Indiana was $89 per acre in 1995. A
researcher for the USDA claims that the mean rent has
decreased since then. He randomly selected 50 farms from
Indiana and determined the mean farm rent to be $67.
Ho: μ = $89
Ha: μ < $89
c. Researchers claim to have found a brain protein that blocks
the craving for fatty food and therefore increases the loss of
body fat. To test this theory, 100 people are treated with
the protein and the reduction in body fat is measured.
μ = mean reduction in body fat
Ho: μ = 0
Ha: μ > 0
Example #2
At the bakery where you work, loaves of bread are supposed to
weigh 1 pound. From experience, the weights of loaves produced
at the bakery follow a Normal distribution with standard
deviation σ = 0.13 pounds. You believe that new personnel are
producing loaves that are heavier than 1 pound. As supervisor
of Quality Control, you want to test your claim at the 5%
significance level. You weigh 20 loaves and obtain a mean
weight of 1.05 pounds.
a. Identify the parameter of interest. State your null and
alternative hypotheses.
μ = True mean weight of loaves of bread
Ho: μ = 1
Ha: μ > 1
b. Verify the conditions are met.
SRS (must assume)
Normality (yes, the population is approx. Normal, so the
sampling distribution of x̄ is too)
Independence (There are more than 10 × 20 = 200 loaves of bread)
c. Calculate the test statistic and the P-value. Illustrate using
the graph provided.
Z = (x̄ − μo) / (σ/√n) = (1.05 − 1) / (0.13/√20) = 0.05 / 0.029 ≈ 1.72
[Graph: standard Normal curve with the area to the right of 1.72 shaded]
P(Z > 1.72) = 1 – P(Z < 1.72) = 1 – 0.9573 = 0.0427
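The arithmetic for the bakery example can be checked with a few lines of Python. This is only a sketch using the standard library; the variable names are chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist

x_bar, mu_0, sigma, n = 1.05, 1.0, 0.13, 20

z = (x_bar - mu_0) / (sigma / sqrt(n))
p = 1 - NormalDist().cdf(z)  # Ha: mu > 1, so use the upper tail

print(f"z = {z:.2f}, p-value = {p:.4f}")
```

Software keeps more decimal places of z than the table lookup does, so the last digit of the p-value can differ slightly from a Table A answer.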
d. State your conclusions clearly in complete sentences.
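Because p = 0.0427 < α = 0.05, I would reject the null
hypothesis at the 0.05 level. There is enough evidence to
conclude that the mean weight of the loaves is greater than
1 pound, so the new personnel appear to be making heavier loaves.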
11.2 - Carrying Out Significance Tests
Steps to Hypothesis Testing: PHANTOMS
P: Parameter of interest
H: Hypothesis
A: Assumptions
N: Name of Test
T: Test Statistic
O: Obtain P-Value
M: Make a Statistical Decision
S: Summary in context of problem.
One-Sample Z-Test:
Testing the mean when σ is known.
Calculator Tip: Z-Test
Stat – Tests - ZTest
Example #1
An energy official claims that the oil output per well in the US
has declined from the 1998 level of 11.1 barrels per day. He
randomly samples 50 wells throughout the US and determines
the mean output to be 10.7 barrels per day. Assume σ = 1.3
barrels. Test the official's claim at the α = 0.05 level.
P: Mean oil output per well in the US
H:
Ho: μ = 11.1
HA: μ < 11.1
A:
SRS (says so)
Normality (n ≥ 30, so by the CLT, approx normal)
Independence
(Safe to assume more than 10 × 50 = 500 wells in the US)
N: ZTest
T:
Z = (x̄ − μo) / (σ/√n) = (10.7 − 11.1) / (1.3/√50) = −0.4 / 0.1838 ≈ −2.176
O:
[Graph: standard Normal curve with the area to the left of −2.176 shaded]
P(Z < −2.176) ≈ 0.0146 (table value, with z rounded to −2.18)
M:
p < α
0.0146 < 0.05
Reject the Null
S:
There is enough evidence to reject the claim that the
average oil output per well in the US is 11.1 barrels
per day. The data support the official's claim that
output has declined.
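The T, O, and M steps of this example could be sketched in Python as follows. The function `one_sample_z_test` is a hypothetical helper written for these notes, not a library routine; it assumes σ is known.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z_test(x_bar, mu_0, sigma, n, alternative):
    """Return (z, p_value) for a one-sample z test with known sigma."""
    z = (x_bar - mu_0) / (sigma / sqrt(n))
    cdf = NormalDist().cdf
    if alternative == "less":          # Ha: mu < mu_0
        p = cdf(z)
    elif alternative == "greater":     # Ha: mu > mu_0
        p = 1 - cdf(z)
    else:                              # Ha: mu != mu_0 (two-tailed)
        p = 2 * cdf(-abs(z))
    return z, p

# Oil-well example: Ho: mu = 11.1 vs Ha: mu < 11.1, alpha = 0.05
z, p = one_sample_z_test(x_bar=10.7, mu_0=11.1, sigma=1.3, n=50, alternative="less")
alpha = 0.05
decision = "reject Ho" if p < alpha else "fail to reject Ho"
print(f"z = {z:.3f}, p-value = {p:.4f}, decision: {decision}")
```

Without rounding z first, the p-value comes out near 0.0148 rather than the table's 0.0146. The decision at α = 0.05 is the same either way.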
Example #2
The average daily volume of Dell computer stock in 2000 was
31.8 million shares with a standard deviation of 14.8 million
shares according to Yahoo! A stock analyst claims that the
stock volume in 2001 is different from the 2000 level. Based on
a random sample of 35 trading days in 2001, he finds the
sample mean to be 37.2 million shares. Test the analyst’s claim
at the α = 0.01 level.
P: Mean volume of Dell computer stock
H:
Ho: μ = 31.8
HA: μ ≠ 31.8
A:
SRS (says so)
Normality (n ≥ 30, so by the CLT, approx normal)
Independence
(Safe to assume more than 350 trading days)
N: ZTest
T:
Z = (x̄ − μo) / (σ/√n) = (37.2 − 31.8) / (14.8/√35) = 5.4 / 2.50 ≈ 2.16
O:
[Graph: standard Normal curve with the areas to the left of −2.16 and to the right of 2.16 shaded]
2[P(Z < −2.16)] = 2[0.0154] = 0.0308
M:
p > α
0.0308 > 0.01
Fail to reject the Null
S:
There is not enough evidence to claim that the
average daily volume of Dell stock is different from
31.8 million shares.
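For a two-tailed test the tail area is doubled. A short Python check of the Dell numbers, offered as an illustrative sketch with variable names chosen here:

```python
from math import sqrt
from statistics import NormalDist

x_bar, mu_0, sigma, n, alpha = 37.2, 31.8, 14.8, 35, 0.01

z = (x_bar - mu_0) / (sigma / sqrt(n))
one_tail = NormalDist().cdf(-abs(z))   # area beyond |z| in one tail
p = 2 * one_tail                       # Ha: mu != 31.8 counts both tails

print(f"z = {z:.2f}, p-value = {p:.4f}")
print("reject Ho" if p < alpha else "fail to reject Ho")
```

Note that the same data would be significant at α = 0.05 but not at α = 0.01, so the conclusion depends on the level chosen in advance.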
Duality of Confidence Intervals and Hypothesis Testing
If the (1 − α) confidence interval does not contain μo, we have
evidence that supports the alternative hypothesis,
thus we reject the null hypothesis at the α level.
Note: The Confidence Interval matches the
two-tailed test only!
Example #3
The average daily volume of Dell computer stock in 2000 was 31.8
million shares with a standard deviation of 14.8 million shares
according to Yahoo! A stock analyst claims that the stock volume
in 2001 is different from the 2000 level. Based on a random
sample of 35 trading days in 2001, he finds the sample mean to be
37.2 million shares. Test the analyst’s claim at the α = 0.01 level.
a. What was your conclusion from this hypothesis test
in Example #2?
Fail to reject Ho: the mean is 31.8 million shares per day
b. Construct a 99% confidence interval for the true average
daily volume of Dell Computer stock in 2001.
Note: We already did P and A
N:
Z-Interval
I:
x̄ ± Z*(σ/√n)
37.2 ± 2.576(14.8/√35)
37.2 ± 6.444
(30.756, 43.644)
C: I am 99% confident the true mean daily volume of
Dell stock in 2001 is between 30.756 and 43.644
million shares.
c. Does this interval reaffirm your statistical decision
from the hypothesis test? Explain.
Yes, 31.8 is in the interval, so we cannot conclude the
mean is different (we fail to reject Ho)
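The interval and its agreement with the two-tailed test can be sketched in Python. The helper names `z_interval` and `two_sided_p_value` are assumptions for this illustration; `NormalDist().inv_cdf` supplies the critical value Z*.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(x_bar, sigma, n, confidence):
    """Confidence interval for mu with known sigma: x_bar +/- z* (sigma / sqrt(n))."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 2.576 for 99%
    margin = z_star * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin

def two_sided_p_value(x_bar, mu_0, sigma, n):
    """P-value for Ho: mu = mu_0 against the two-tailed alternative."""
    z = (x_bar - mu_0) / (sigma / sqrt(n))
    return 2 * NormalDist().cdf(-abs(z))

lower, upper = z_interval(x_bar=37.2, sigma=14.8, n=35, confidence=0.99)
p = two_sided_p_value(x_bar=37.2, mu_0=31.8, sigma=14.8, n=35)

in_interval = lower <= 31.8 <= upper   # 99% interval contains the null value
fail_to_reject = p >= 0.01             # two-tailed test at alpha = 0.01
print(f"({lower:.3f}, {upper:.3f})", in_interval, fail_to_reject)
```

Both checks give the same answer, as the duality says they should when the confidence level is 1 − α and the test is two-tailed.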
Example #4
Does marijuana use affect anger expression? Assume for all nonusers, the mean score on an anger expression scale is 41.5 with a
standard deviation of 6.05. For a random sample of 47 frequent
marijuana users, the mean score was 44.
a. Test the claim that marijuana affects the expression of
anger at the α = 0.05 level.
P: True mean anger expression for marijuana users
H:
Ho: μ = 41.5
HA: μ ≠ 41.5
A:
SRS (says so)
Normality (n ≥ 30, so by the CLT, approx normal)
Independence
(Safe to assume more than 470 marijuana users)
N: ZTest
T:
Z = (x̄ − μo) / (σ/√n) = (44 − 41.5) / (6.05/√47) = 2.5 / 0.88 ≈ 2.83
O:
[Graph: standard Normal curve with the areas to the left of −2.83 and to the right of 2.83 shaded]
2[P(Z < −2.83)] = 2[0.0023] = 0.0046
M:
p < α
0.0046 < 0.05
Reject the Null
S:
There is enough evidence to conclude that the
average anger expression for marijuana users is not 41.5.
Does marijuana use affect anger expression?
Yes, anger expression is different for marijuana
users
b. Calculate a 95% confidence interval for the mean anger
expression of frequent marijuana users. Does this interval
reaffirm your statistical decision in part a?
N:
Z-Interval
I:
x̄ ± Z*(σ/√n)
44 ± 1.96(6.05/√47)
44 ± 1.72966
(42.27, 45.73)
C: I am 95% confident the true mean anger
expression for marijuana users is between 42.27
and 45.73.
Yes, 41.5 is not in the interval, so 41.5 is not a plausible
value for the mean. This agrees with rejecting Ho in part a.
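To see the duality at work on this example, the sketch below tests several hypothesized means against the same sample and compares each decision with the 95% interval. The value 41.5 comes from the example; 43 and 46 are extra hypothetical values added only for illustration.

```python
from math import sqrt
from statistics import NormalDist

x_bar, sigma, n, alpha = 44, 6.05, 47, 0.05
se = sigma / sqrt(n)

z_star = NormalDist().inv_cdf(1 - alpha / 2)       # about 1.96
lower, upper = x_bar - z_star * se, x_bar + z_star * se

def rejects(mu_0):
    """Two-tailed z test of Ho: mu = mu_0 at level alpha."""
    z = (x_bar - mu_0) / se
    return 2 * NormalDist().cdf(-abs(z)) < alpha

# 41.5 is the value from the example; 43 and 46 are extra hypothetical values.
for mu_0 in (41.5, 43, 46):
    outside = not (lower <= mu_0 <= upper)
    print(mu_0, "rejected:", rejects(mu_0), "outside interval:", outside)
```

In each case the test rejects exactly when the hypothesized mean falls outside the interval.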
11.3 – Use and Abuse of Tests
11.4 – Using Inference to Make Decisions
What α level to use?
• How plausible is Ho? If it represents an assumption that the
people you must convince have believed for years, strong
evidence (small α) will be needed.
• What are the consequences of rejecting Ho? Rejecting it
might mean making major changes to act on Ha.
• Consider the sample: do you need to increase the sample
size or look for outliers? Is the sample a true representation of
the population?
• Remember that a test at level α rejects a true Ho a certain
percent of the time by chance alone (ex. 5% when α = 0.05).
Repeating the study helps to check a significant result.
• Typically ok to use α = 0.05
• Beware of treating p-values of 0.049 and 0.051 differently!
There is no sharp border between significant and not significant.
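One way to see why a single significant result deserves a second look: if several tests are run and every null hypothesis is actually true, the chance that at least one of them rejects grows quickly. The sketch below assumes the tests are independent, which is an assumption added here, and the function name is made up for illustration.

```python
def chance_of_false_rejection(alpha, num_tests):
    """P(at least one Type I error) across independent tests when every Ho is true."""
    return 1 - (1 - alpha) ** num_tests

for k in (1, 5, 20):
    print(k, round(chance_of_false_rejection(0.05, k), 3))
```

With 20 independent tests at α = 0.05, the chance of at least one false rejection is roughly 64%.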
Errors in Hypothesis Testing:
Because a statistician must make inferences (or conclusions)
based on random data that is subject to sampling errors, we can
make mistakes in hypothesis testing. In fact, there are two types
of errors that can be made.
                    Ho True                  Ho False
Reject Ho           Type I Error (p = α)     Power of the test (p = 1 – β)
Do not Reject Ho    Correct decision         Type II Error (p = β)
Note: You will never have to calculate β
To reduce type II error and increase the power of the test:
• Increase the sample size
• Increase the significance level alpha. (Be careful: a larger
alpha also raises the chance of a type I error. On the other
hand, if we choose an alpha that almost guarantees never to
make a type I error, then there is a large type II error, because
it would be hard to reject the null under any circumstance.)
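Both effects can be illustrated with a small power calculation for an upper-tailed z test. This is a sketch, not something the course requires: it reuses σ = 0.13 and μo = 1 from the bakery example and assumes, purely for illustration, that the true mean is 1.05 pounds. The function name `power_upper_tail` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def power_upper_tail(mu_0, mu_true, sigma, n, alpha):
    """Power of the z test of Ho: mu = mu_0 vs Ha: mu > mu_0 when mu = mu_true."""
    std_normal = NormalDist()
    z_critical = std_normal.inv_cdf(1 - alpha)      # reject Ho when Z > z_critical
    shift = (mu_true - mu_0) / (sigma / sqrt(n))    # center of Z under the true mean
    return 1 - std_normal.cdf(z_critical - shift)

# Assumed true mean of 1.05 (hypothetical); beta = 1 - power
for n, alpha in [(20, 0.05), (50, 0.05), (20, 0.01), (20, 0.10)]:
    power = power_upper_tail(mu_0=1.0, mu_true=1.05, sigma=0.13, n=n, alpha=alpha)
    print(f"n = {n}, alpha = {alpha}: power = {power:.3f}, beta = {1 - power:.3f}")
```

Raising n from 20 to 50 or raising α from 0.01 to 0.10 both increase the power, while the smaller α makes a Type II error more likely.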
Example #1: In a criminal trial, the defendant is held to be
innocent until shown to be guilty beyond a reasonable doubt. If
we consider hypotheses
H0: defendant is innocent
Ha: defendant is guilty
we can reject H0 only if the evidence strongly favors Ha.
1. Make a diagram that shows the truth about the defendant,
and the possible verdicts and that identifies the two types of
error. Which type of error is more serious?
                    Ho True                                  Ho False
Reject Ho           I: Innocent and found guilty             Correct: Guilty and found guilty
Do not Reject Ho    Correct: Innocent and found innocent     II: Guilty and found innocent
Type I error is more serious
2. Is this goal better served by a test with α = 0.20 or a test
with α = 0.01? Explain your answer.
α = 0.01, because the probability of a Type I error
(finding an innocent defendant guilty) would then be smaller
3. Explain what is meant by the power of the test in this
setting.
The probability of finding a person guilty who is in fact guilty
Example #2: For each of the following situations, state the null and
alternative hypotheses. Identify when a Type I and a Type II Error
would occur.
a. A company specializing in parachute assembly claims that its
competitor’s main parachute failure rate is more than 1%.
You perform a hypothesis test to determine whether the
company’s claim is true. Which error is more serious?
Ho: The main parachute failure rate is 1%
Ha: The main parachute failure rate is more than 1%
Reject Ho, Ho True (Type I Error): Failure rate is 1% and think it's more than 1%
Reject Ho, Ho False (correct): Failure rate is not 1% and think it's more than 1%
Do not Reject Ho, Ho True (correct): Failure rate is 1% and think it is 1%
Do not Reject Ho, Ho False (Type II Error): Failure rate is not 1% and think it is 1%
Type II error is more serious
b. A company that produces snack foods uses a machine to
package 454 gram bags of pretzels. If it is working properly,
the bags will be exactly 454 grams. You perform a hypothesis
test to determine whether the company is packaging the right
amount of grams per bag.
Ho: There are 454 grams of pretzels in the bag
Ha: There are not 454 grams of pretzels in the bag
Reject Ho, Ho True (Type I Error): 454 grams in bag and don’t think 454g
Reject Ho, Ho False (correct): Not 454g in bag and don’t think 454g
Do not Reject Ho, Ho True (correct): 454 grams in bag and think 454g
Do not Reject Ho, Ho False (Type II Error): Not 454g in bag and think 454g