Ch 9A Hypothesis Testing

Transcript Ch 9A Hypothesis Testing

Chapter 9 Tests of Hypothesis
Single Sample Tests
The Beginnings – concepts and
techniques
Chapter 9A
9-1.1 Statistical Hypotheses
Some Definitions
Statistical Hypothesis - An assertion about a population
parameter or distribution.
Test of hypothesis – arriving at a decision to reject or not
reject a hypothesis based upon a sample from the population.
Null hypothesis – usually the hypothesis of no difference.
The assertion that the researcher usually wants to reject.
$H_0: \mu = \mu_0$
Alternate Hypothesis – the assertion that is accepted if the
null hypothesis is rejected. The assertion that the researcher
generally wants to prove.
$H_1: \mu \neq \mu_0$
Hypothesis Test on a Population Mean
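Two-Sided Test:
$H_0: \mu = \mu_0$ versus $H_1: \mu \neq \mu_0$
One-Sided Tests:
$H_0: \mu = \mu_0$ versus $H_1: \mu > \mu_0$, or $H_0: \mu = \mu_0$ versus $H_1: \mu < \mu_0$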
Test of Hypothesis



If the information in a sample is consistent with the null
hypothesis, then we will conclude that the null hypothesis
cannot be rejected;
If this information is inconsistent with the null hypothesis,
we will conclude that the hypothesis is false and reject the
null hypothesis in favor of the alternate hypothesis.
Critical Region – the set of values for the test statistic that
results in rejecting the null hypothesis.

The test statistic is calculated from the sample; i.e. a sample
statistic
If $\hat{\theta} \in$ critical region, then reject $H_0$
What can go wrong?
                        H0 true             H0 false
Do not reject H0        correct decision    Type II error
Reject H0 (accept H1)   Type I error        correct decision
How Likely are the errors?
• Type I Error – Incorrectly Rejecting a True Hypothesis
$\alpha = P(\text{Type I error})$
• $(1 - \alpha)$ = probability of not rejecting a true hypothesis
• Type II Error – Incorrectly Accepting a False Hypothesis
$\beta = P(\text{Type II error})$
• Power of test $(1 - \beta)$ - probability of correctly rejecting
the null when the alternative is true.
The probability of a type I error is called the significance level of the test.
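One way to make the meaning of the significance level concrete is to simulate it. The sketch below repeatedly draws samples while the null hypothesis is actually true and counts how often an upper-tailed test rejects anyway. The parameter values (mean 50, standard deviation 2.5, n = 10), the seed, the number of trials, and the function name `simulate_type_i_rate` are illustrative assumptions.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def simulate_type_i_rate(mu0=50.0, sigma=2.5, n=10, alpha=0.05,
                         trials=20_000, seed=1):
    """Estimate P(reject H0 | H0 true) for an upper-tailed z-test by simulation."""
    rng = random.Random(seed)
    # Critical sample mean: reject H0 when the sample mean reaches this value.
    x_crit = mu0 + NormalDist().inv_cdf(1 - alpha) * sigma / sqrt(n)
    rejections = 0
    for _ in range(trials):
        # Draw one sample of size n from the population described by H0.
        sample_mean = fmean(rng.gauss(mu0, sigma) for _ in range(n))
        if sample_mean >= x_crit:
            rejections += 1
    return rejections / trials

rate = simulate_type_i_rate()
print(f"estimated Type I error rate: {rate:.4f}")
```

The estimated rate should land close to the chosen alpha of .05, since every rejection in this simulation is by construction a Type I error.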
Our Very First Hypothesis Test


Professor Notso Brite believes that his mean driving time
to the campus from his home is 50 minutes, while Dean
Nowet Ah disagrees, believing that it takes him,
on average, more than 50 minutes.
It is known that the standard deviation of his driving time
is 2.5 minutes and driving time is normally distributed.



humor me here
For the next 10 days, Professor Brite records his driving
time and obtains a sample mean of
$\bar{X} = 51.7$ minutes
Can we accept Dean Nowet Ah’s assertion that the mean
driving time must be greater than 50 minutes?
The Hypothesis
$H_0: \mu = 50$ minutes
$H_1: \mu > 50$ minutes (one-tailed test)
Given:
$\bar{X} = 51.7$ minutes
$\sigma = 2.5$ minutes
$n = 10$
$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{2.5}{\sqrt{10}} = .79$
More Probability of a Type I Error
Let’s set the probability of a Type I error = .05. Then
$P(\text{Type I error}) = P(\text{reject } H_0 \mid H_0 \text{ is correct}) = \alpha = .05$
$P(\bar{X} \ge \bar{X}_c \mid \mu = 50) = .05$
$P\left( \frac{\bar{X} - \mu}{\sigma_{\bar{X}}} \ge \frac{\bar{X}_c - 50}{.79} \right) = .05$
$P(Z \ge z_{.05}) = .05; \quad P(Z \ge 1.6449) = .05$
$H_0: \mu = 50$ minutes
$H_1: \mu > 50$ minutes
$\frac{\bar{X}_c - 50}{.79} = 1.6449; \quad \bar{X}_c = 50 + 1.6449(.79) = 51.3$
If $\bar{X} \ge 51.3$, reject $H_0$
Since $\bar{X} = 51.7 > 51.3$, reject $H_0$
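The critical-value calculation above can be checked with a few lines of Python. This is a minimal sketch using the standard library's `statistics.NormalDist`; the variable names are illustrative. It keeps the standard error unrounded, so results can differ from the slide values (which use .79) in the third decimal place.

```python
from math import sqrt
from statistics import NormalDist

mu0 = 50.0      # mean under H0 (minutes)
sigma = 2.5     # known population standard deviation (minutes)
n = 10          # sample size
xbar = 51.7     # observed sample mean (minutes)
alpha = 0.05    # chosen probability of a Type I error

std_err = sigma / sqrt(n)                    # standard error of the sample mean
z_alpha = NormalDist().inv_cdf(1 - alpha)    # upper-tail critical z
x_crit = mu0 + z_alpha * std_err             # critical sample mean

reject_h0 = xbar >= x_crit
print(f"std_err={std_err:.4f}  z_alpha={z_alpha:.4f}  "
      f"x_crit={x_crit:.4f}  reject H0: {reject_h0}")
```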
What about the Type II Error?
Incorrectly Accepting a False Hypothesis
$P(\text{Type II error}) = P(\text{not rejecting } H_0 \mid H_1 \text{ is correct}) = \beta$
$P(\bar{X} < \bar{X}_c \mid \mu > 50) = \beta$
$H_0: \mu = 50$ minutes
$H_1: \mu > 50$ minutes
But professor, that
probability depends upon
the true value of the
population mean under the
alternate hypothesis.
More about Type II Errors
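Say the true mean is 51:
$P(\bar{X} < \bar{X}_c \mid \mu = 51) = P\left( \frac{\bar{X} - \mu}{\sigma_{\bar{X}}} < \frac{51.3 - 51}{.79} \right) = P(z < .3797) = .6479$
Say the true mean is 52:
$P(\bar{X} < \bar{X}_c \mid \mu = 52) = P\left( \frac{\bar{X} - \mu}{\sigma_{\bar{X}}} < \frac{51.3 - 52}{.79} \right) = P(z < -.8861) = .1878$
Say the true mean is 53:
$P(\bar{X} < \bar{X}_c \mid \mu = 53) = P\left( \frac{\bar{X} - \mu}{\sigma_{\bar{X}}} < \frac{51.3 - 53}{.79} \right) = P(z < -2.1519) = .0157$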
The Situation Graphically Displayed
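[Figure: probability density functions of $\bar{X}$ for $\mu_0 = 50$ and $\mu_1 = 51$. The critical value $\bar{X}_c = 51.3$ separates the Type II error area under the $\mu_1$ curve (Prob = .6479) from the Type I error area under the $\mu_0$ curve (Prob = .05).]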
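More Graphical Display
[Figure: probability density functions of $\bar{X}$ for $\mu_0 = 50$ and $\mu_1 = 52$. The critical value $\bar{X}_c = 51.3$ separates the Type II error area under the $\mu_1$ curve (Prob = .1878) from the Type I error area under the $\mu_0$ curve (Prob = .05).]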
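The Type II error probabilities worked out for true means of 51, 52, and 53 can be reproduced with a short function. This is a sketch under the example's assumptions (upper-tailed test, known sigma); the function name `type_ii_error` is illustrative. It does not round the standard error to .79, so values may differ from the slides in the third or fourth decimal place.

```python
from math import sqrt
from statistics import NormalDist

def type_ii_error(mu_true, mu0=50.0, sigma=2.5, n=10, alpha=0.05):
    """P(fail to reject H0) for the upper-tailed z-test when the mean is mu_true."""
    std_err = sigma / sqrt(n)
    x_crit = mu0 + NormalDist().inv_cdf(1 - alpha) * std_err
    # Under the true mean, the sample mean is Normal(mu_true, std_err);
    # beta is the area of that distribution to the left of the critical value.
    return NormalDist(mu_true, std_err).cdf(x_crit)

for mu in (51, 52, 53):
    print(f"true mean {mu}: beta = {type_ii_error(mu):.4f}")
```

Note that evaluating the function at the null value (50) returns 1 minus alpha, which is the probability of not rejecting a true hypothesis.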
The Operating Characteristic (OC)
Curve
[Figure: OC curve. Vertical axis: probability of accepting the null hypothesis (0 to 1); horizontal axis: true mean (49.5 to 53.5).]
The Power of the Test
• The power is computed as $1 - \beta$, and power can be
interpreted as the probability of correctly rejecting a
false null hypothesis.
• We often compare statistical tests by comparing their
power properties.
The Power of the Test
Power of test $(1 - \beta)$ - probability of correctly rejecting the
null hypothesis when the alternative is true.
Power Curve
[Figure: power curve. Vertical axis: probability of rejecting the null hypothesis; horizontal axis: true mean (49.5 to 53.5).]
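The points on an OC curve and a power curve for the driving-time example can be tabulated as follows. This is a minimal sketch; the grid of true means mirrors the horizontal axis of the plots, and the function name `oc_and_power` is an assumption made for illustration.

```python
from math import sqrt
from statistics import NormalDist

def oc_and_power(mu_true, mu0=50.0, sigma=2.5, n=10, alpha=0.05):
    """Return (beta, power) for the upper-tailed z-test at a given true mean."""
    std_err = sigma / sqrt(n)
    x_crit = mu0 + NormalDist().inv_cdf(1 - alpha) * std_err
    beta = NormalDist(mu_true, std_err).cdf(x_crit)   # OC curve ordinate
    return beta, 1 - beta                             # power = 1 - beta

grid = [49.5 + 0.5 * i for i in range(9)]   # 49.5, 50.0, ..., 53.5
print("true mean   P(accept H0)   P(reject H0)")
for mu in grid:
    beta, power = oc_and_power(mu)
    print(f"{mu:9.1f}   {beta:12.4f}   {power:12.4f}")
```

At a true mean of 50 the power equals alpha, and it rises toward 1 as the true mean moves further above 50.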
The Prob-Value
$H_0: \mu = 50$ minutes
$H_1: \mu > 50$ minutes (one-tailed test)
Given: $\bar{X} = 51.7$, $\sigma = 2.5$, $n = 10$
$\text{P-value} = P(\bar{X} \ge 51.7 \mid \mu = 50) = P\left( \frac{\bar{X} - \mu}{\sigma_{\bar{X}}} \ge \frac{51.7 - 50}{.79} \right) = P(z \ge 2.1519) = .0157$
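The same P-value can be obtained numerically. The sketch below uses the unrounded standard error, so the test statistic comes out slightly below the slide's 2.1519, while the P-value agrees to about three decimal places. Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n, xbar = 50.0, 2.5, 10, 51.7

z0 = (xbar - mu0) / (sigma / sqrt(n))   # observed test statistic
p_value = 1 - NormalDist().cdf(z0)      # upper-tail area beyond z0

print(f"z0 = {z0:.4f}, P-value = {p_value:.4f}")
```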
The Prob-Value
[Figure: PDF of $\bar{X}$ under $H_0$ ($\mu = 50$). The area to the right of the critical value 51.3 is $\alpha = .05$; the area to the right of the observed $\bar{X} = 51.7$ is the P-value = .0157.]
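Sample Size Determination

$P(\bar{X} \ge \bar{X}_c \mid \mu_0) = P\left( z \ge \frac{\bar{X}_c - \mu_0}{\sigma/\sqrt{n}} \right) = \alpha$
$P(\bar{X} < \bar{X}_c \mid \mu) = P\left( z < \frac{\bar{X}_c - \mu}{\sigma/\sqrt{n}} \right) = \beta$
$\frac{\bar{X}_c - \mu_0}{\sigma/\sqrt{n}} = z_\alpha; \quad \frac{\bar{X}_c - \mu}{\sigma/\sqrt{n}} = -z_\beta$
Subtracting the second equation from the first eliminates $\bar{X}_c$; solving for n gives
$n = \frac{(z_\alpha + z_\beta)^2 \sigma^2}{(\mu - \mu_0)^2}$
For a 2-tailed test, replace $z_\alpha$ with $z_{\alpha/2}$ (an approximation).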
Sample Size in Action

What sample size is needed if the level of significance is
one percent and the probability of rejecting the null
hypothesis if the true mean is 52 is 95 percent?
$\alpha = .01 \Rightarrow z_{.01} = 2.33$
$1 - \beta = .95 \Rightarrow z_{.05} = 1.645$
$n = \frac{(2.33 + 1.645)^2 (2.5)^2}{(52 - 50)^2} = 24.69$, so use $n = 25$
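The sample-size calculation can be wrapped in a small helper. This is a sketch for the one-sided case with known sigma; the function name `sample_size_one_sided` is an assumption. Because it uses unrounded z-values (about 2.326 rather than 2.33), the unrounded n is slightly smaller than the hand calculation, but it rounds up to the same answer.

```python
from math import ceil
from statistics import NormalDist

def sample_size_one_sided(mu0, mu1, sigma, alpha, power):
    """Smallest n giving the requested power at mu1 for a one-sided z-test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)   # controls the Type I error
    z_beta = NormalDist().inv_cdf(power)        # power = 1 - beta
    n_exact = ((z_alpha + z_beta) * sigma / (mu1 - mu0)) ** 2
    return n_exact, ceil(n_exact)               # always round n up

n_exact, n_required = sample_size_one_sided(50.0, 52.0, 2.5, alpha=0.01, power=0.95)
print(f"unrounded n = {n_exact:.2f}, required n = {n_required}")
```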
A Two-Tailed Test
$H_0: \mu = 50$ minutes
$H_1: \mu \neq 50$ minutes (two-tailed test)
Given:
$\bar{X} = 51.7$ minutes
$\sigma = 2.5$ minutes
$n = 10$
$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{2.5}{\sqrt{10}} = .79$
More Probability of a Type I Error
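Let’s set the probability of a Type I error = .05. Then
$P(\bar{X}_{c1} \le \bar{X} \le \bar{X}_{c2} \mid \mu = 50) = 1 - \alpha = .95$
$P\left( \frac{\bar{X}_{c1} - 50}{.79} \le \frac{\bar{X} - \mu}{\sigma_{\bar{X}}} \le \frac{\bar{X}_{c2} - 50}{.79} \right) = .95$
$P(-z_{.025} \le Z \le z_{.025}) = .95; \quad P(-1.96 \le Z \le 1.96) = .95$
$\frac{\bar{X}_{c1} - 50}{.79} = -1.96; \quad \bar{X}_{c1} = 50 - 1.96(.79) = 48.4516$
$\frac{\bar{X}_{c2} - 50}{.79} = 1.96; \quad \bar{X}_{c2} = 50 + 1.96(.79) = 51.5484$
If $\bar{X} < 48.4516$ or $\bar{X} > 51.5484$, reject $H_0$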
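The two critical values can be computed directly. This is a minimal sketch with illustrative variable names; it uses the unrounded standard error and z-value, so the limits differ from the slide values by about .001.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n, xbar, alpha = 50.0, 2.5, 10, 51.7, 0.05

std_err = sigma / sqrt(n)
z_half = NormalDist().inv_cdf(1 - alpha / 2)   # z for alpha/2 in each tail
x_c1 = mu0 - z_half * std_err                  # lower critical value
x_c2 = mu0 + z_half * std_err                  # upper critical value

reject_h0 = xbar < x_c1 or xbar > x_c2
print(f"critical region: below {x_c1:.4f} or above {x_c2:.4f}; "
      f"reject H0: {reject_h0}")
```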
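Probability of a Type II Error
$\beta = P(\bar{X}_{c1} \le \bar{X} \le \bar{X}_{c2} \mid \mu = \mu_1) = P(48.4516 \le \bar{X} \le 51.5484 \mid \mu \neq 50)$
$= P\left( \frac{48.4516 - \mu_1}{.79} \le \frac{\bar{X} - \mu}{\sigma_{\bar{X}}} \le \frac{51.5484 - \mu_1}{.79} \right)$
$\mu_1 = 49$:
$\beta = P\left( \frac{48.4516 - 49}{.79} \le \frac{\bar{X} - \mu}{\sigma_{\bar{X}}} \le \frac{51.5484 - 49}{.79} \right) = P(-0.6941 \le z \le 3.2258) = .7556$
$\mu_1 = 52$:
$\beta = P\left( \frac{48.4516 - 52}{.79} \le \frac{\bar{X} - \mu}{\sigma_{\bar{X}}} \le \frac{51.5484 - 52}{.79} \right) = P(-4.4916 \le z \le -.5716) = .2838$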
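For the two-tailed test, the Type II error probability is the area of the true sampling distribution that falls between the two critical values. The sketch below computes it for the two cases worked above; the function name `two_sided_beta` is an assumption, and unrounded intermediate values make the results differ from the slides by less than .001.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_beta(mu_true, mu0=50.0, sigma=2.5, n=10, alpha=0.05):
    """P(fail to reject H0) for the two-tailed z-test when the mean is mu_true."""
    std_err = sigma / sqrt(n)
    z_half = NormalDist().inv_cdf(1 - alpha / 2)
    x_c1 = mu0 - z_half * std_err
    x_c2 = mu0 + z_half * std_err
    # Sampling distribution of the sample mean under the true mean.
    sampling_dist = NormalDist(mu_true, std_err)
    return sampling_dist.cdf(x_c2) - sampling_dist.cdf(x_c1)

for mu in (49, 52):
    print(f"true mean {mu}: beta = {two_sided_beta(mu):.4f}")
```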


A Two-Tailed Prob-Value
$\text{P-value} = 2 \times P(\bar{X} \ge 51.7 \mid \mu = 50) = 2 \times P\left( \frac{\bar{X} - \mu}{\sigma_{\bar{X}}} \ge \frac{51.7 - 50}{.79} \right)$
$= 2 \times P(z \ge 2.1519) = 2(.0157) = .0314$
Reject $H_0$ if $\alpha \ge .0314$
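A short check of the two-tailed P-value and of the decision it implies at two common significance levels. This is a sketch with illustrative variable names; it doubles the tail area beyond the absolute value of the test statistic, which is the usual convention for a symmetric test statistic.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n, xbar = 50.0, 2.5, 10, 51.7

z0 = (xbar - mu0) / (sigma / sqrt(n))
p_two_sided = 2 * (1 - NormalDist().cdf(abs(z0)))   # both tails

for alpha in (0.05, 0.01):
    decision = "reject H0" if p_two_sided <= alpha else "fail to reject H0"
    print(f"alpha = {alpha}: P-value = {p_two_sided:.4f}, {decision}")
```

With these data the null hypothesis is rejected at the 5 percent level but not at the 1 percent level.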
$z_0 = \frac{\bar{X} - \mu_0}{\sigma_{\bar{X}}} = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$
A two-sided confidence interval –
a study in comparison
Given:
$\bar{X} = 51.7$ minutes
$\sigma = 2.5$ minutes and $n = 10$
95% confidence interval:
$z_{.025} = 1.96$
$\bar{X} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}} = 51.7 \pm 1.96(.79) = (50.1516, 53.2484)$
A 95% confidence interval identifies a set of acceptable hypotheses
at the 5% level of significance. A mean of 50 lies outside the
interval and is therefore rejected.
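The link between the interval and the test can be verified numerically: the hypothesized mean is rejected at the 5 percent level exactly when it falls outside the 95 percent interval. The sketch below uses illustrative variable names and unrounded intermediate values, so the limits differ from the hand calculation by about .001.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n, xbar, alpha = 50.0, 2.5, 10, 51.7, 0.05

half_width = NormalDist().inv_cdf(1 - alpha / 2) * sigma / sqrt(n)
lower, upper = xbar - half_width, xbar + half_width

mu0_in_interval = lower <= mu0 <= upper
print(f"95% CI: ({lower:.4f}, {upper:.4f}); contains mu0 = {mu0}: {mu0_in_interval}")
```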
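Confidence Intervals and Hypothesis
Tests – together at last
[Figure: number line comparing the acceptance region $\mu_0 \pm z_{\alpha/2} \sigma_{\bar{x}}$, centered at $\mu_0$, with the confidence interval $\bar{x} \pm z_{\alpha/2} \sigma_{\bar{x}}$, centered at $\bar{x}$.]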
9-2 Tests on the Mean of a Normal
Distribution, Variance Known
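We wish to test:
$H_0: \mu = \mu_0$
$H_1: \mu \neq \mu_0$
The test statistic is:
$Z_0 = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$
Reject $H_0$ if the observed value of the test statistic $z_0$ is either:
$z_0 > z_{\alpha/2}$ or $z_0 < -z_{\alpha/2}$
Fail to reject $H_0$ if
$-z_{\alpha/2} < z_0 < z_{\alpha/2}$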
9-2 Tests on the Mean of a Normal
Distribution, Variance Known
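Alternately
$H_0: \mu = \mu_0$
$H_1: \mu \neq \mu_0$
$\bar{X}_{c1} = \mu_0 - z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$
$\bar{X}_{c2} = \mu_0 + z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$
If $\bar{X} < \bar{X}_{c1}$ or $\bar{X} > \bar{X}_{c2}$, reject $H_0$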
Points to Ponder
When we are considering Type II errors (beta), we use
the distribution of the test statistic under the
alternative hypothesis.
When we are considering Type I errors (alpha), we
use the distribution of the test statistic under the null
hypothesis.
Statistical versus Practical
Significance


• Statistical significance says nothing about the
importance of the difference.
• There may be a statistically significant difference
between two values with no practical difference.
  • mean of 50.4 driving minutes versus 49.7 driving
  minutes
  • large sample sizes will identify a difference
• There may be no statistically significant difference
between two values even though there is a significant
practical difference.
  • mean of .20 mm in the diameter of a ball-bearing versus
  .18 mm
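The remark that large sample sizes will identify a difference can be illustrated numerically. In the sketch below, the observed mean of 50.1 minutes, the sample sizes, and the function name `upper_tail_p_value` are all hypothetical values chosen for illustration; only the null mean of 50 and the standard deviation of 2.5 come from the driving-time example.

```python
from math import sqrt
from statistics import NormalDist

def upper_tail_p_value(xbar, mu0, sigma, n):
    """P-value of the upper-tailed z-test on a mean with known sigma."""
    z0 = (xbar - mu0) / (sigma / sqrt(n))
    return 1 - NormalDist().cdf(z0)

# Hypothetical scenario: the same tiny 0.1-minute excess at three sample sizes.
for n in (10, 1_000, 10_000):
    p = upper_tail_p_value(xbar=50.1, mu0=50.0, sigma=2.5, n=n)
    print(f"n = {n:6d}: P-value = {p:.5f}")
```

The difference of a tenth of a minute is of no practical importance at any sample size, yet with ten thousand observations it becomes highly statistically significant.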
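Interactions -- alpha, beta, sample size
N fixed and $\alpha \downarrow$, then $\beta \uparrow$
N fixed and $\alpha \uparrow$, then $\beta \downarrow$
N $\uparrow$, then: $\alpha \downarrow$ and $\beta \downarrow$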

Alpha/beta tradeoffs.
Lower alpha value means a
larger beta value. Power of a
test is (1-beta). Lower alpha
implies we are reluctant to
risk rejecting a true
hypothesis. But it means we
must risk accepting a false
one. Only way to improve
both is to increase the
sample size.
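The tradeoff can be tabulated for the driving-time example. This sketch assumes a true mean of 51 minutes and compares three significance levels at two sample sizes; the true mean, the sample size of 40, and the function name `beta` are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist

def beta(mu_true, mu0, sigma, n, alpha):
    """Type II error probability of the upper-tailed z-test at mu_true."""
    std_err = sigma / sqrt(n)
    x_crit = mu0 + NormalDist().inv_cdf(1 - alpha) * std_err
    return NormalDist(mu_true, std_err).cdf(x_crit)

print("    n   alpha    beta")
for n in (10, 40):
    for alpha in (0.10, 0.05, 0.01):
        b = beta(51.0, 50.0, 2.5, n, alpha)
        print(f"{n:5d}   {alpha:5.2f}   {b:.4f}")
```

Within each sample size, beta grows as alpha shrinks; moving from n = 10 to n = 40 lowers beta at every alpha.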
On the selection of the level of
significance



• Convention is to use .01 or .05
• Consider practical consequences of making a type I
or II error
• Consider power of the test and sample size
  • Large N: small difference will be statistically significant;
  use small $\alpha$ (.001 - .01)
  • Small N: large differences may not be detected; use large
  $\alpha$ (.05 - .10)
  • Consider “true” difference
  • Type I versus Type II errors
• Use the P-value and let the reader decide
  • “I’m just reporting the facts; you decide”
General Procedures for Hypothesis Tests
1. Identify the parameter of interest.
2. State the null hypothesis – H0.
3. Specify the alternative – H1.
4. Choose the significance level – alpha – risk of Type I error.
5. Determine the appropriate test statistic.
6. State the rejection region for the statistic.
7. Compute the sample quantities (i.e. from the experiment or
measurement) and substitute into the equation for test
statistic.
8. Decide whether to reject H0.
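The eight steps can be collected into one small function for the z-test on a mean with known sigma. This is a sketch, not a definitive implementation: the function name `z_test` and the `alternative` labels are assumptions, and the decision is made by comparing the P-value with alpha, which is equivalent to checking whether the statistic falls in the rejection region.

```python
from math import sqrt
from statistics import NormalDist

def z_test(xbar, mu0, sigma, n, alpha=0.05, alternative="two-sided"):
    """One-sample z-test on a mean, sigma known. Returns (z0, p_value, reject)."""
    std_normal = NormalDist()
    z0 = (xbar - mu0) / (sigma / sqrt(n))         # step 7: compute the statistic
    if alternative == "greater":                  # H1: mu > mu0
        p_value = 1 - std_normal.cdf(z0)
    elif alternative == "less":                   # H1: mu < mu0
        p_value = std_normal.cdf(z0)
    elif alternative == "two-sided":              # H1: mu != mu0
        p_value = 2 * (1 - std_normal.cdf(abs(z0)))
    else:
        raise ValueError(f"unknown alternative: {alternative!r}")
    return z0, p_value, p_value <= alpha          # step 8: decide

# Driving-time example: steps 1 to 4 fix mu0 = 50, H1: mu > 50, alpha = .05.
z0, p_value, reject = z_test(51.7, 50.0, 2.5, 10, alpha=0.05, alternative="greater")
print(f"z0 = {z0:.4f}, P-value = {p_value:.4f}, reject H0: {reject}")
```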
A Little Philosophy
Consider the following:
$H_0: \mu = 10$, $H_1: \mu > 10$
$\bar{X}_c = 11.7$; $\alpha = .05$
• If we reject the null hypothesis:
• either H1 is true
• or we were extremely unlucky and hit on the 5 percent of the samples
that fall in the critical region
• We go with the odds and reject the null
• If we fail to reject the null (assume x-bar = 11.1)
• H0 is still left standing at the end of the test
• The alternative hypothesis is what we wish to prove and believe to be
correct
• The sample supports H1 but the test does not allow us to reject H0
• Therefore, we conclude that the evidence does not allow us to
reject H0 stopping short of saying we accept H0.
Large Sample Test



• In most situations, the population variance is
unknown and the population may not be well
modeled as a normal distribution.
• If n is large (n > 40), the sample standard deviation,
s, can be substituted for $\sigma$ with little effect, appealing
to the central limit theorem.
• Exact tests where the population is normal, $\sigma^2$ is
unknown, and n is small result in the t-distribution.
A Little Recap

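Tests on a mean, variance known, normal
population or large sample size (CLT)
$H_0: \mu = \mu_0$
$H_1: \mu \neq \mu_0$
$\bar{X}_{c1} = \mu_0 - z_{\alpha/2} \frac{\sigma \text{ (or } s)}{\sqrt{n}}$; reject if $\bar{X} < \bar{X}_{c1}$
$\bar{X}_{c2} = \mu_0 + z_{\alpha/2} \frac{\sigma \text{ (or } s)}{\sqrt{n}}$; reject if $\bar{X} > \bar{X}_{c2}$
$H_0: \mu = \mu_0$
$H_1: \mu > \mu_0$
$\bar{X}_c = \mu_0 + z_\alpha \frac{\sigma \text{ (or } s)}{\sqrt{n}}$; reject if $\bar{X} > \bar{X}_c$
$H_0: \mu = \mu_0$
$H_1: \mu < \mu_0$
$\bar{X}_c = \mu_0 - z_\alpha \frac{\sigma \text{ (or } s)}{\sqrt{n}}$; reject if $\bar{X} < \bar{X}_c$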
Next Time
Time Permitting
