Significance Testing


Significance
Testing
Chapter 13
Victor Katch Kinesiology
Critical Region
The critical region (or rejection
region) is the set of all values
of the test statistic that cause
us to reject the null
hypothesis. For example, see
the red-shaded region in
previous Figure.
Significance Level
The significance level (denoted
by α) is the probability that the
test statistic will fall in the
critical region when the null
hypothesis is actually true.
Common choices for α are
0.05, 0.01, and 0.10.
Critical Value
A critical value is any value that separates
the critical region (where we reject
H0) from the values of the test statistic
that do not lead to rejection of the
null hypothesis. Critical values depend on
the nature of the null hypothesis, the sampling
distribution that applies, and the
significance level α. For example, the
critical value of z = 1.645 corresponds
to a significance level of α = 0.05 in a
right-tailed test.
Two-tailed,
Right-tailed,
Left-tailed Tests
The tails in a distribution are the
extreme regions bounded
by critical values.
Two-tailed Test
H0: =
H1: ≠ (means less than or greater than)
α is divided equally between
the two tails of the critical
region
Right-tailed Test
H0: =
H1: >
Points Right
Left-tailed Test
H0: =
H1: <
Points Left
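The critical values that bound each kind of tail can be checked numerically. This is a minimal sketch that assumes a z test statistic (standard normal sampling distribution) and uses Python's standard-library `statistics.NormalDist`; the helper name `z_critical` is illustrative and does not come from the slides.

```python
from statistics import NormalDist

def z_critical(alpha, tail):
    """Return the critical value(s) of a z test for the given tail type."""
    z = NormalDist()  # standard normal: mean 0, standard deviation 1
    if tail == "right":
        return z.inv_cdf(1 - alpha)        # critical region lies to the right
    if tail == "left":
        return z.inv_cdf(alpha)            # critical region lies to the left
    if tail == "two":
        upper = z.inv_cdf(1 - alpha / 2)   # alpha is split equally between tails
        return -upper, upper
    raise ValueError("tail must be 'right', 'left', or 'two'")

right = z_critical(0.05, "right")
left = z_critical(0.05, "left")
lower, upper = z_critical(0.05, "two")
print(f"right-tailed: z = {right:.3f}")
print(f"left-tailed:  z = {left:.3f}")
print(f"two-tailed:   z = {lower:.3f} and {upper:.3f}")
```

With α = 0.05 the right-tailed value should agree with the z = 1.645 quoted earlier, while the two-tailed values are farther out because each tail holds only α/2.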
P-Value
The P-value (or p-value or probability
value) is the probability of getting a
value of the test statistic that is at
least as extreme as the one
representing the sample data,
assuming that the null hypothesis is
true. The null hypothesis is rejected if
the P-value is very small, such as 0.05
or less.
Conclusions
in Hypothesis Testing
We always test the null
hypothesis.
1. Reject the H0
2. Fail to reject the H0
Accept versus
Fail to Reject
Some texts use “accept the null
hypothesis.”
We are not proving the null hypothesis.
The sample evidence is not strong
enough to warrant rejection (such as
not enough evidence to convict a
suspect).
Decision Criterion
Traditional method:
Reject H0 if the test statistic falls
within the critical region.
Fail to reject H0 if the test statistic
does not fall within the critical region.
Decision Criterion
P-value method:
Reject H0 if P-value ≤ α (where α is the
significance level, such as 0.05).
Fail to reject H0 if P-value > α.
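The traditional method and the P-value method always lead to the same decision. The sketch below illustrates this for a right-tailed z test; the function name and the test statistic z = 2.10 are made up for illustration and are not taken from the slides.

```python
from statistics import NormalDist

def right_tailed_z_test(z_stat, alpha=0.05):
    """Apply both decision criteria to a right-tailed z test."""
    z = NormalDist()
    critical_value = z.inv_cdf(1 - alpha)          # boundary of the critical region
    p_value = 1 - z.cdf(z_stat)                    # area at or beyond the statistic
    reject_traditional = z_stat >= critical_value  # statistic in critical region?
    reject_p_value = p_value <= alpha              # P-value at most alpha?
    return critical_value, p_value, reject_traditional, reject_p_value

# Hypothetical test statistic, used only to show the two rules side by side.
crit, p, by_region, by_p = right_tailed_z_test(2.10)
print(f"critical value = {crit:.3f}, P-value = {p:.4f}")
print("reject H0 (traditional method):", by_region)
print("reject H0 (P-value method):    ", by_p)
```

Trying a smaller statistic, such as 1.0, should flip both answers to "fail to reject" at the same time.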
Decision Criterion
Another option:
Instead of using a significance
level such as 0.05, simply identify
the P-value and leave the decision
to the reader.
Example:
Finding P-values
Wording of Final Conclusion
Hypothesis testing about:
• a population mean or mean difference (paired data)
• the difference between means of two populations
• the difference between two population proportions
Three Cautions:
1. Inference is only valid if the sample is representative
of the population for the question of interest.
2. Hypotheses and conclusions apply to the larger
population(s) represented by the sample(s).
3. If the distribution of a quantitative variable is highly
skewed, consider analyzing the median rather than the
mean, using what are called nonparametric methods.
Significance Testing
Steps in Any Hypothesis Test
1. Determine the null and alternative hypotheses.
2. Verify necessary data conditions, and if met,
summarize the data into an appropriate test
statistic.
3. Assuming the null hypothesis is true, find the p-value.
4. Decide whether or not the result is statistically
significant based on the p-value.
5. Report the conclusion in the context of the
situation.
Testing Hypotheses About One
Mean or Paired Data
Step 1: Determine null and alternative hypotheses
1. H0: μ = μ0 versus Ha: μ ≠ μ0 (two-sided)
2. H0: μ ≥ μ0 versus Ha: μ < μ0 (one-sided)
3. H0: μ ≤ μ0 versus Ha: μ > μ0 (one-sided)
Often H0 for a one-sided test is written as H0: μ = μ0.
Remember a p-value is computed assuming H0 is true,
and μ0 is the value used for that computation.
Step 2: Verify Necessary Data Condition
Situation 1: Population of measurements of interest
is approximately normal, and a random sample of
any size is measured. In practice, use the method if the
sample shape is not notably skewed and has no extreme outliers.
Situation 2: Population of measurements of interest
is not approximately normal, but a large random
sample (n ≥ 30) is measured. If there are extreme outliers or
extreme skewness, it is better to have a larger sample.
Continuing Step 2: The Test Statistic
The t-statistic is a standardized score for measuring
the difference between the sample mean and the null
hypothesis value of the population mean:
$$t = \frac{\text{sample mean} - \text{null value}}{\text{standard error}} = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$$
This t-statistic has (approx) a t-distribution with df = n - 1.
Step 3: Assuming H0 true, Find the p-value
• For H1 less than, the p-value is the area below t,
even if t is positive.
• For H1 greater than, the p-value is the area above t,
even if t is negative.
• For H1 two-sided, p-value is 2 × area above |t|.
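The three rules above can be sketched in code. Python's standard library has no t-distribution, so this example integrates the t density numerically with Simpson's rule; the function names are illustrative, and the approach is only a sketch for small to moderate df (very large df would overflow `math.gamma`). Statistical software would normally be used instead.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def area_above(t, df, steps=2000):
    """Area under the t density to the right of t (Simpson's rule, even steps)."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    upper = 0.5 - total * h / 3        # area above |t|, using symmetry about 0
    return upper if t >= 0 else 1 - upper

def p_value(t, df, alternative):
    """p-value for a t statistic under the three forms of the alternative."""
    if alternative == "less":
        return 1 - area_above(t, df)        # area below t, even if t is positive
    if alternative == "greater":
        return area_above(t, df)            # area above t, even if t is negative
    if alternative == "two-sided":
        return 2 * area_above(abs(t), df)   # 2 x area above |t|
    raise ValueError("alternative must be 'less', 'greater', or 'two-sided'")

# t = 2.228 with df = 10 is the usual table value for a two-sided 0.05 test.
print(round(p_value(2.228, 10, "two-sided"), 3))
print(round(p_value(-2.228, 10, "less"), 3))
print(round(p_value(-2.228, 10, "greater"), 3))
```

The last call shows why the direction matters: a negative t gives a large p-value when the alternative is "greater than".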
Steps 4 and 5: Decide Whether or Not the
Result is Statistically Significant based on
the p-value and Report the Conclusion in
the Context of the Situation
These two steps remain the same for all of the
hypothesis tests.
Choose a level of significance α, and reject H0
if the p-value is less than (or equal to) α.
Otherwise, conclude that there is not enough
evidence to support the alternative hypothesis.
Example Normal Body Temperature
What is normal body temperature? Is it actually
less than 98.6 degrees Fahrenheit (on average)?
Step 1: State the null and alternative hypotheses
H0: μ = 98.6
Ha: μ < 98.6
where μ = mean body temperature in the human population.
Example Normal Body Temp (cont)
Data: random sample of n = 18 normal body temps
98.2  97.4  97.8  97.6  99.0  98.4
98.6  98.0  98.2  99.2  97.8  98.6
98.4  97.1  99.7  97.2  98.2  98.5
Step 2: Verify data conditions …
no outliers nor strong
skewness.
Sample mean of 98.217
is close to sample median
of 98.2.
Example Normal Body Temp (cont)
Step 2: … Summarizing data with a test statistic
Test of mu = 98.600 vs mu < 98.600

Variable      N    Mean   StDev  SE Mean      T      P
Temperature  18  98.217   0.684    0.161  -2.38  0.015

Key elements:
Sample statistic: x̄ = 98.217 (under “Mean”)
Standard error (under “SE Mean”):
$$\text{s.e.}(\bar{x}) = \frac{s}{\sqrt{n}} = \frac{0.684}{\sqrt{18}} = 0.161$$
Test statistic (under “T”):
$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} = \frac{98.217 - 98.6}{0.161} = -2.38$$
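The key elements of the software output can be reproduced from the 18 temperatures listed earlier. This is a minimal sketch using only Python's standard library; the variable names are illustrative.

```python
import math
import statistics

# The n = 18 body temperatures from the example.
temps = [98.2, 97.4, 97.8, 97.6, 99.0, 98.4, 98.6, 98.0, 98.2,
         99.2, 97.8, 98.6, 98.4, 97.1, 99.7, 97.2, 98.2, 98.5]
null_value = 98.6                   # value of the mean under H0

n = len(temps)
mean = statistics.mean(temps)       # sample mean
s = statistics.stdev(temps)         # sample standard deviation (n - 1 divisor)
se = s / math.sqrt(n)               # standard error of the mean
t = (mean - null_value) / se        # t statistic
df = n - 1                          # degrees of freedom

print(f"n = {n}, mean = {mean:.3f}, s = {s:.3f}")
print(f"s.e. = {se:.3f}, t = {t:.2f}, df = {df}")
```

The printed values should match the Mean, StDev, SE Mean and T columns of the output.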
Example Normal Body Temp (cont)
Step 3: Find the p-value
From output: p-value = 0.015
From Table A.3: p-value
is between 0.016 and 0.010.
Area to left of t = -2.38 equals
area to right of t = +2.38. The
value t = 2.38 is between column
headings 2.33 and 2.58 in table,
and for df =17, the one-sided
p-values are 0.016 and 0.010.
Example Normal Body Temp (cont)
Step 4: Decide whether or not the result is
statistically significant based on the p-value
Using α = 0.05 as the level of significance criterion,
the results are statistically significant because 0.015,
the p-value of the test, is less than 0.05. In other
words, we can reject the null hypothesis.
Step 5: Report the Conclusion
We can conclude, based on these data, that the mean
temperature in the human population is actually less
than 98.6 degrees.
Paired Data and the Paired t-Test
Data: two variables for n individuals or pairs;
use the difference d = x1 – x2.
Parameter: μd = population mean of differences
Sample estimate: d̄ = sample mean of the differences
Standard deviation and standard error:
sd = standard deviation of the sample of differences;
$$\text{s.e.}(\bar{d}) = \frac{s_d}{\sqrt{n}}$$
Often of interest: Is the mean difference in the
population different from 0?
Steps for a Paired t-Test
Step 1: Determine null and alternative hypotheses
H0: μd = 0 versus Ha: μd ≠ 0 or Ha: μd < 0 or Ha: μd > 0
Watch how differences are defined for selecting the Ha.
Step 2: Verify data conditions and compute test statistic
Conditions apply to the differences.
The t-test statistic is:
$$t = \frac{\text{sample mean} - \text{null value}}{\text{standard error}} = \frac{\bar{d} - 0}{s_d / \sqrt{n}}$$
Steps 3, 4 and 5: Similar to t-test for a single mean.
The df = n – 1, where n is the number of differences.
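The paired t statistic is just a one-sample t statistic computed on the differences. The sketch below uses a tiny hypothetical data set (the lists `before` and `after` are invented for illustration, not taken from any study in these slides) and defines each difference as before minus after.

```python
import math
import statistics

# Hypothetical paired measurements on the same four individuals.
before = [10, 12, 9, 14]
after = [8, 11, 9, 10]

# The direction of subtraction determines the sign of the mean difference
# and therefore which one-sided alternative is appropriate.
diffs = [b - a for b, a in zip(before, after)]

n = len(diffs)                       # number of differences
d_bar = statistics.mean(diffs)       # sample mean of the differences
s_d = statistics.stdev(diffs)        # standard deviation of the differences
se = s_d / math.sqrt(n)              # standard error of d_bar
t = (d_bar - 0) / se                 # null value is 0
df = n - 1

print(f"differences = {diffs}")
print(f"d_bar = {d_bar:.2f}, s_d = {s_d:.3f}, s.e. = {se:.3f}")
print(f"t = {t:.3f}, df = {df}")
```

Reversing the subtraction (after minus before) would flip the sign of t, which is why the definition of the differences has to be fixed before choosing Ha.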
Example Effect of Alcohol
Study: n = 10 pilots perform simulation first under
sober conditions and then after drinking alcohol.
Response: Amount of useful performance time.
(longer time is better)
Question: Does useful performance time decrease
with alcohol use?
Step 1: State the null and alternative hypotheses
H0: μd = 0 versus Ha: μd > 0
where μd = population mean difference (no alcohol minus alcohol)
in useful performance time if all pilots took these tests.
Example Effect of Alcohol (cont)
Data: random sample of n = 10 time differences
Step 2: Verify data conditions …
Boxplot shows no outliers
nor extreme skewness.
Example Effect of Alcohol (cont)
Step 2: … Summarizing data with a test statistic
Test of mu = 0.0 vs mu > 0.0

Variable   N   Mean  StDev  SE Mean     T      P
Diff      10  195.6  230.5     72.9  2.68  0.013

Key elements:
Sample statistic: d̄ = 195.6 (under “Mean”)
Standard error (under “SE Mean”):
$$\text{s.e.}(\bar{d}) = \frac{s_d}{\sqrt{n}} = \frac{230.5}{\sqrt{10}} = 72.9$$
Test statistic (under “T”):
$$t = \frac{\bar{d} - 0}{s_d / \sqrt{n}} = \frac{195.6 - 0}{72.9} = 2.68$$
Example Effect of Alcohol (cont)
Step 3: Find the p-value
From output: p-value = 0.013
From Table A.3: p-value
is between 0.007 and 0.015.
The value t = 2.68 is between
column headings 2.58 and 3.00
in the table, and for df =9,
the one-sided p-values are
0.015 and 0.007.
Example Effect of Alcohol (cont)
Steps 4 and 5: Decide whether or not the
result is statistically significant based on the
p-value and Report the Conclusion
Using α = 0.05 as the level of significance
criterion, we can reject the null hypothesis
since the p-value of 0.013 is less than 0.05.
Even with a small experiment, it appears that
alcohol has a statistically significant effect
and decreases performance time.
Testing The Difference between Two
Means (Independent Samples)
Step 1: Determine null and alternative hypotheses
H0: μ1 – μ2 = 0 versus
Ha: μ1 – μ2 ≠ 0 or Ha: μ1 – μ2 < 0 or Ha: μ1 – μ2 > 0
Watch how Population 1 and 2 are defined.
Step 2: Verify data conditions and compute test statistic
Both n’s are large or no extreme outliers or skewness in
either sample. Samples are independent. The t-test statistic is:
$$t = \frac{\text{sample mean} - \text{null value}}{\text{standard error}} = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$
Steps 3, 4 and 5: Similar to t-test for one mean.
Example Effect of Stare on Driving
Randomized experiment: Researchers either stared
or did not stare at drivers stopped at a campus stop
sign; Timed how long (sec) it took driver to proceed
from sign to a mark on other side of the intersection.
Question: Does stare speed up crossing times?
Step 1: State the null and alternative hypotheses
H0: μ1 – μ2 = 0 versus Ha: μ1 – μ2 > 0
where 1 = no-stare population and 2 = stare population.
Example Effect of Stare (cont)
Data: n1 = 14 no stare and n2 = 13 stare responses
Step 2: Verify data conditions …
No outliers nor extreme skewness for either group.
Example Effect of Stare (cont)
Step 2: … Summarizing data with a test statistic
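Sample statistic: x̄1 – x̄2 = 6.63 – 5.59 = 1.04 seconds
Standard error:
$$\text{s.e.}(\bar{x}_1 - \bar{x}_2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = \sqrt{\frac{1.36^2}{14} + \frac{0.822^2}{13}} = 0.43$$
Test statistic:
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{1.04 - 0}{0.43} = 2.41$$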
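The standard error and test statistic can be recomputed from the summary statistics shown on this slide. The sketch below also computes the usual Welch (Satterthwaite) approximation for the degrees of freedom; the function name `welch_t` is illustrative. Because the inputs are rounded summary values, the resulting t may differ from the slide's 2.41 in the second decimal place.

```python
import math

def welch_t(mean1, s1, n1, mean2, s2, n2, null_value=0.0):
    """Unpooled two-sample t statistic with Welch's approximate df."""
    v1 = s1 ** 2 / n1                  # estimated variance of the first sample mean
    v2 = s2 ** 2 / n2                  # estimated variance of the second sample mean
    se = math.sqrt(v1 + v2)            # standard error of the difference
    t = ((mean1 - mean2) - null_value) / se
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return se, t, df

# Group 1 = no stare, group 2 = stare (rounded values from the slide).
se, t, df = welch_t(6.63, 1.36, 14, 5.59, 0.822, 13)
print(f"s.e. = {se:.2f}, t = {t:.2f}, Welch df = {df:.1f}")
```

The Welch formula generally gives a non-integer df; rounding it down to a whole number is a common, slightly conservative convention.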
Example Effect of Stare (cont)
Steps 3, 4 and 5: Determine the p-value and make
a conclusion in context.
The p-value = 0.013, so we reject the null hypothesis;
the results are “statistically significant”.
The p-value is determined using a t-distribution with
df = 21 (df using Welch approximation formula) and
finding area to right of t = 2.41.
Table A.3 => p-value is between 0.009 and 0.015.
We can conclude that if all drivers were stared at,
the mean crossing times at an intersection would
be faster than under normal conditions.
The Two Types of Errors and
Their Probabilities
When the null hypothesis is true, the
probability of a type 1 error, the level of
significance, and the α-level are all equivalent.
When the null hypothesis is not true,
a type 1 error cannot be made.
Type I Error
• A Type I error is the mistake of
rejecting the null hypothesis when it
is true.
• The symbol α (alpha) is used to
represent the probability of a type I
error.
Type II Error
A Type II error is the mistake of failing
to reject the null hypothesis when it is
false.
The symbol β (beta) is used to
represent the probability of a type II
error.
Example: Assume that we are conducting
a hypothesis test of the claim p > 0.5. Here
are the null and alternative hypotheses: H0:
p = 0.5, and H1: p > 0.5.
a) Identify a type I error.
b) Identify a type II error.
Identify a type I error.
A type I error is the mistake of rejecting a true
null hypothesis, so this is a type I error:
Conclude that there is sufficient evidence to
support p > 0.5, when in reality p = 0.5.
Identify a type II error
A type II error is the mistake of failing to reject
the null hypothesis when it is false, so this is
a type II error: Fail to reject p = 0.5 (and
therefore fail to support p > 0.5) when in
reality p > 0.5.
Type I and Type II Errors
Controlling Type I and
Type II Errors
For any fixed α, an increase in the sample size
n will cause a decrease in β.
For any fixed sample size n, a decrease in α
will cause an increase in β. Conversely, an
increase in α will cause a decrease in β.
To decrease both α and β, increase the sample
size.
Definition
Power of a Hypothesis Test
The power of a hypothesis test is the
probability (1 - β) of rejecting a false
null hypothesis, which is computed by
using a particular significance level α
and a particular value of the population
parameter that is an alternative to the
value assumed true in the null
hypothesis.
Trade-Off in Probability for Two Errors
There is an inverse relationship between the
probabilities of the two types of errors.
Increase probability of a type 1 error =>
decrease in probability of a type 2 error
Type 2 Errors and Power
Three factors that affect probability of a type 2 error
1. Sample size; larger n reduces the probability of a type 2
error without affecting the probability of a type 1 error.
2. Level of significance; larger α reduces probability of a
type 2 error by increasing the probability of a type 1 error.
3. Actual value of the population parameter (not in
researcher’s control); the farther the truth falls from the null value (in
Ha direction), the lower the probability of a type 2 error.
When the alternative hypothesis is true, the probability of
making the correct decision is called the power of a test.
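The effect of each of the three factors on power can be illustrated with a simple calculation. This sketch assumes a right-tailed z test of H0: μ = μ0 with a known population standard deviation σ, which is a simplification chosen so that power has a closed form; all numbers are hypothetical and the function name is illustrative.

```python
from statistics import NormalDist

def power_right_tailed_z(mu0, mu_true, sigma, n, alpha=0.05):
    """Power (1 - beta) of a right-tailed z test when the true mean is mu_true."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha)                  # critical value under H0
    shift = (mu_true - mu0) * n ** 0.5 / sigma     # true mean in standard-error units
    return 1 - z.cdf(z_crit - shift)               # P(test statistic in critical region)

# Hypothetical baseline: mu0 = 0, true mean 0.5, sigma = 1, n = 16, alpha = 0.05.
base = power_right_tailed_z(0, 0.5, 1, 16)
larger_n = power_right_tailed_z(0, 0.5, 1, 36)                  # factor 1
larger_alpha = power_right_tailed_z(0, 0.5, 1, 16, alpha=0.10)  # factor 2
farther_truth = power_right_tailed_z(0, 1.0, 1, 16)             # factor 3

print(f"baseline power:        {base:.3f}")
print(f"larger n (36):         {larger_n:.3f}")
print(f"larger alpha (0.10):   {larger_alpha:.3f}")
print(f"truth farther from H0: {farther_truth:.3f}")
```

Each change raises the power above the baseline, so each lowers the probability of a type 2 error. If the true mean equals μ0, the same function returns α, the probability of a type 1 error.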