Chapter 21: More About Tests

AP Statistics
Null Hypothesis
Stating it can sometimes be tricky
– If the event is random or the result of “guessing”, then the
null value is often an unusual one, such as
H0: p = 0.5 or H0: p = 1/6
– Especially when testing to see if “coin is fair” or to see if
“someone has ESP and can predict which closed hand contains
a prize” or to see if “die is fair”
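As a rough illustration of testing such a null, the sketch below runs a one-proportion z-test of H0: p = 1/6 for a die. The roll counts are made up for the example, and the function name one_proportion_z_test is only illustrative; it assumes the sample is large enough for the normal model.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_z_test(successes, n, p0):
    """Return (z, two-sided P-value) for H0: p = p0 under the normal model."""
    p_hat = successes / n
    sd = sqrt(p0 * (1 - p0) / n)          # standard deviation of p-hat if H0 is true
    z = (p_hat - p0) / sd
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical data: a die is rolled 600 times and shows a six 120 times.
z, p_value = one_proportion_z_test(successes=120, n=600, p0=1/6)
print(f"z = {z:.2f}, P-value = {p_value:.4f}")
```

For a coin, the same function would be called with p0=0.5.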
Alpha Levels (Significance Levels)
• If P-Value is small, it tells us that our data
is rare given the null hypothesis
• How rare is rare? How low must it be to
reject the null hypothesis?
• We arbitrarily set a threshold for P-value
and if it is below that value, we reject the
null hypothesis
• This threshold is called the ALPHA LEVEL
Alpha Levels (Significance Levels)
• We denote the alpha level by α
• Also called the significance level because P-values
under α are considered statistically
significant.
• When we reject the null, “the test is significant at
the α = .05 level.”
• Always select α before you look at data
• Always report the P-value and α level in the conclusion
Common Alpha Levels
α        1-sided    2-sided
0.05     1.645      1.96
0.01     2.33       2.576
0.001    3.09       3.29
• When the alternative is
one-sided, the critical
value puts all of α on
one side.
• When the alternative is
two-sided, the critical
value splits α equally
into two tails.
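The critical values for the common alpha levels can be reproduced from the standard normal model. This is a minimal sketch using Python's statistics.NormalDist; the helper name critical_value is an assumption made for the example.

```python
from statistics import NormalDist

def critical_value(alpha, two_sided=False):
    """z* that leaves alpha in one tail, or alpha/2 in each of two tails."""
    tail_area = alpha / 2 if two_sided else alpha
    return NormalDist().inv_cdf(1 - tail_area)

for alpha in (0.05, 0.01, 0.001):
    one_sided = critical_value(alpha)
    two_sided = critical_value(alpha, two_sided=True)
    print(f"alpha = {alpha}: 1-sided z* = {one_sided:.3f}, 2-sided z* = {two_sided:.3f}")
```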
Practical vs Statistical Significance
• For larger sample sizes, small unimportant
deviations from the null can be statistically
significant.
• For smaller sample sizes, large seemingly
important deviations from the null may not
be statistically significant.
• Also, always do a reality check—what is
the big deal about that difference?
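The sample-size effect described above can be seen numerically. The sketch below uses invented sample proportions and a hypothetical helper, two_sided_p_value, to compare a tiny deviation in a huge sample with a large deviation in a small sample, both tested against H0: p = 0.5.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_p_value(p_hat, n, p0):
    """Two-sided P-value for H0: p = p0 using the normal model."""
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical samples, both tested against H0: p = 0.5
p_tiny_deviation_huge_n = two_sided_p_value(p_hat=0.51, n=100_000, p0=0.5)
p_big_deviation_small_n = two_sided_p_value(p_hat=0.65, n=20, p0=0.5)

print(f"0.51 observed with n = 100000: P-value = {p_tiny_deviation_huge_n:.2g}")
print(f"0.65 observed with n = 20:     P-value = {p_big_deviation_small_n:.2g}")
```

The one-point deviation is statistically significant at any common alpha level, while the fifteen-point deviation is not significant at α = .05. Whether either difference matters in practice is a separate question.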
Confidence Intervals
• You can approximate a hypothesis test by
examining a confidence interval
– Just ask whether the null value is consistent with a CI for
the parameter at the corresponding confidence
level
• A 95% confidence interval corresponds to a
two-sided hypothesis test at α = .05 (sum of
the two tails)
• A 95% confidence interval corresponds to a
one-sided hypothesis test at α = .025
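A small sketch of this check, using a made-up sample (56% successes in 400 trials) and hypothetical helper names. The correspondence is only approximate for proportions because the interval uses the sample proportion in its standard error, while the test uses the null value.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(p_hat, n, confidence=0.95):
    """Normal-model confidence interval for a proportion."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

def null_is_plausible(p0, interval):
    """True if the null value lies inside the interval."""
    low, high = interval
    return low <= p0 <= high

ci = proportion_ci(p_hat=0.56, n=400)     # hypothetical sample
print(f"95% CI: ({ci[0]:.3f}, {ci[1]:.3f})")
print("H0: p = 0.50 plausible?", null_is_plausible(0.50, ci))
print("H0: p = 0.55 plausible?", null_is_plausible(0.55, ci))
```

A null value outside the 95% interval would be rejected by a two-sided test at α = .05; a null value inside it would not.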
Errors
• Here’s some shocking news for you:
nobody’s perfect. Even with lots of
evidence we can still make the wrong
decision.
• When we perform a hypothesis test, we
can make mistakes in two ways:
I. The null hypothesis is true, but we
mistakenly reject it. (Type I error)
II. The null hypothesis is false, but we fail to
reject it. (Type II error)
Errors
Type I: You are healthy, but a test says you
have a disease (false positive)
Type II: You are not healthy, but the test
says you do not have a disease (false
negative)
___________________________________
Type I: Jury convicts an innocent person
Type II: Jury fails to convict a guilty person
Type I Errors
How often does a Type I Error Occur?
• Happens when null hypothesis is true, but you have the
misfortune to draw the unusual sample.
• To reject the null hypothesis, the P-value must fall below α
• When the null hypothesis is true, that happens with probability exactly α
• When you set α, you are setting the probability of a
Type I error
• Remember, you can only have a Type I error if the null
hypothesis is true
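To see that the Type I error rate tracks the chosen alpha level, the sketch below adds up exact binomial probabilities over the rejection region of a two-sided z-test when the null hypothesis is true. The function name and the sample sizes are assumptions for illustration. Because counts are discrete, the result is close to α rather than exactly equal to it.

```python
from math import comb, sqrt
from statistics import NormalDist

def type_i_error_rate(n, p0, alpha):
    """Exact chance that a two-sided z-test rejects H0: p = p0 when H0 is true."""
    z_star = NormalDist().inv_cdf(1 - alpha / 2)
    sd = sqrt(p0 * (1 - p0) / n)
    rate = 0.0
    for x in range(n + 1):                     # every possible success count
        z = (x / n - p0) / sd
        if abs(z) > z_star:                    # this sample would reject H0
            rate += comb(n, x) * p0**x * (1 - p0)**(n - x)
    return rate

for n in (100, 400):
    print(f"n = {n}: Type I error rate = {type_i_error_rate(n, 0.5, 0.05):.4f}")
```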


Type II Error
What happens if Null Hypothesis is not true?
• If the null hypothesis is false and we reject it, we
have done the correct thing.
• If the null hypothesis is false and we fail to reject
it, we have committed a Type II Error.
• The probability that this error occurs is denoted
by β
Errors in General
Review
• Probability of Type I Error = α
• Probability of Type II Error = β
Reducing Error
• Neither Error is good
• Difficulty is reducing one, without increasing the other.
• Imagine: To reduce Type I error, I reduce my α;
however, then my β, or Type II error, would increase.
• The only way to reduce both errors is to collect more
evidence—more data
– Many times studies fail because sample sizes are too small to
detect the change they are looking for
– When designing a survey or experiment it is a good idea to
calculate β for a reasonable α.

Power—related to reducing error
It is natural to think that if we failed to reject the
null hypothesis we did not look hard enough and
made the wrong decision.
– Is the null hypothesis really false and our test was too
weak to detect the strength of the difference?
• We want a test that is strong enough to make
the right decision: rejecting the
null hypothesis when it really is false
• The POWER of the test tells us how strong
our test is in rejecting a false null
hypothesis.
Power
• When power is high, we can be confident that we
looked hard enough.
• High Power tells us that our test is strong and
has a very good chance of detecting a false null
hypothesis (very good chance of NOT making a
Type II Error)
• Power is calculated by: 1 − β
This is the complement of the probability of making a Type II Error
Power
• Whenever a study fails to reject the Null
Hypothesis, the Power of the test comes into
play.
• When we calculate Power, we imagine the null
hypothesis is FALSE.
• The value of Power depends on how far the truth
lies from the null hypothesis—this distance is
called the “effect size”
effect size = p0 − p
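A worked sketch of the power calculation for one case, assuming an upper-tail test of H0: p = 0.5 and imagining the truth is p = 0.6. All numbers are hypothetical and the normal model is assumed for the sample proportion.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical setup: null value, imagined true value, sample size, alpha
p0, p_true, n, alpha = 0.50, 0.60, 100, 0.05
std_normal = NormalDist()

# Critical value p*: reject H0 when the sample proportion exceeds it
z_star = std_normal.inv_cdf(1 - alpha)
p_crit = p0 + z_star * sqrt(p0 * (1 - p0) / n)

# Beta: chance the sample proportion stays below p* when the truth is p_true
sd_true = sqrt(p_true * (1 - p_true) / n)
beta = std_normal.cdf((p_crit - p_true) / sd_true)

power = 1 - beta
effect_size = p0 - p_true

print(f"p* = {p_crit:.3f}, effect size = {effect_size:.2f}")
print(f"beta = {beta:.3f}, power = {power:.3f}")
```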
Power
Notice from visual:
• Power = 1 − β
• Reducing α to lower Type I error will move the critical
value, p*, to the right and have the effect of increasing β,
the probability of a Type II error, and consequently
reducing power
• Notice that the larger the effect size, the smaller the
chance of making a Type II Error and the greater the
power of the test.
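Both observations can be checked numerically. The sketch below assumes an upper-tail test of H0: p = 0.5 with n = 100 under the normal model. The helper power_upper_tail and the chosen values are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

def power_upper_tail(p0, p_true, n, alpha):
    """Approximate power of the upper-tail z-test of H0: p = p0 when p = p_true."""
    z_star = NormalDist().inv_cdf(1 - alpha)
    p_crit = p0 + z_star * sqrt(p0 * (1 - p0) / n)         # critical value p*
    sd_true = sqrt(p_true * (1 - p_true) / n)
    beta = NormalDist().cdf((p_crit - p_true) / sd_true)   # P(fail to reject)
    return 1 - beta

p0, n = 0.5, 100

# Smaller alpha pushes p* to the right, so power drops (true p held at 0.60)
power_by_alpha = {a: power_upper_tail(p0, 0.60, n, a) for a in (0.10, 0.05, 0.01)}

# A truth farther from p0 (larger effect size) gives more power (alpha = 0.05)
power_by_effect = {p: power_upper_tail(p0, p, n, 0.05) for p in (0.55, 0.60, 0.65)}

for a, pw in power_by_alpha.items():
    print(f"alpha = {a}: power = {pw:.3f}")
for p, pw in power_by_effect.items():
    print(f"true p = {p}: power = {pw:.3f}")
```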
Reducing Both Type I and Type II Error
• This was discussed earlier—increase the sample size. The effect of a
larger sample size can be seen below. An increased sample size will
reduce the standard deviation, thus making the curves narrower.
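One way to put the sample-size idea to work is to hold α fixed and search for the smallest n that reaches a target power. This is a sketch under the normal model for an upper-tail test. The target of 80% power, the values of p, and both function names are assumptions chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist

def power_upper_tail(p0, p_true, n, alpha):
    """Approximate power of the upper-tail z-test of H0: p = p0 when p = p_true."""
    z_star = NormalDist().inv_cdf(1 - alpha)
    p_crit = p0 + z_star * sqrt(p0 * (1 - p0) / n)
    sd_true = sqrt(p_true * (1 - p_true) / n)    # shrinks as n grows
    return 1 - NormalDist().cdf((p_crit - p_true) / sd_true)

def smallest_n_for_power(p0, p_true, alpha, target_power, n_max=100_000):
    """Smallest sample size whose power reaches target_power, or None if not found."""
    for n in range(1, n_max + 1):
        if power_upper_tail(p0, p_true, n, alpha) >= target_power:
            return n
    return None

# Power rises with n while alpha stays at 0.05
for n in (50, 100, 400):
    print(f"n = {n}: power = {power_upper_tail(0.5, 0.6, n, 0.05):.3f}")

n_needed = smallest_n_for_power(p0=0.5, p_true=0.6, alpha=0.05, target_power=0.80)
print("smallest n for 80% power:", n_needed)
```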
Reducing Both Type I and Type II Error (cont.)
• [Figure: original comparison of errors]
• [Figure: comparison of errors with a larger sample size]