null hypothesis

Recap of confidence intervals
If the 95% CI we calculated includes the hypothesized μ,
we conclude that our sample is not statistically
different from the assumed population
If the 95% CI does not include the hypothesized μ, we
conclude that our sample is statistically different from
the assumed population
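The recap above can be sketched in code. This is a minimal illustration, assuming a normal approximation for the interval (z of about 1.96 for 95%). The IQ scores and the function name `ci_includes` are made up for the example and do not come from the slides.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def ci_includes(sample, mu0, level=0.95):
    """Return (lower, upper, includes_mu0) for a CI of the sample mean.

    Assumption: normal approximation; a t-based interval would be
    slightly wider for a sample this small.
    """
    se = stdev(sample) / sqrt(len(sample))      # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for level=0.95
    centre = mean(sample)
    lower, upper = centre - z * se, centre + z * se
    return lower, upper, lower <= mu0 <= upper

# Hypothetical IQ scores (illustrative data only)
iq_scores = [102, 98, 110, 105, 95, 101, 99, 107, 103, 100]

lower, upper, includes = ci_includes(iq_scores, mu0=100)
print(f"95% CI: ({lower:.1f}, {upper:.1f}); includes 100? {includes}")
```

If the hypothesized μ falls inside the interval, the sample is not statistically different from the assumed population at the 5% level; if it falls outside, it is.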
Hypothesis Testing Set-up
State the null hypothesis (H0)


In statistics, we always start by assuming that the null
hypothesis is true (“no effect” or “no difference”)
Only if there is convincing evidence do we reject the
null hypothesis
IQ example


In words: “There is no difference in average IQ
between group 1 and group 2.”
In symbols:
μ1 = μ2
or
μ1 – μ2 = 0
Note: The hypothesis is always written in terms of the
population parameter, not the sample statistic.
Hypothesis Testing Set-up
State the alternate hypothesis (HA, Ha, or H1)


The alternative hypothesis states that there is a
difference
Can go in either direction (two-sided or two-tailed)
IQ example


In words: “There is a difference in average IQ between
group 1 and group 2.”
In symbols:
μ1 ≠ μ2
or
μ1 – μ2 ≠ 0
Note: In the medical literature, specific hypotheses are
rarely stated explicitly.
How hypothesis testing is done
Define the null and alternate hypotheses
Collect relevant data from a sample
Calculate the test statistic specific to the null hypothesis
Compare the value of the test statistic to a
known probability distribution
Interpret the resultant p-value
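The five steps above can be sketched for the IQ example. This is only an illustration under stated assumptions: the data are invented, the function name `two_sample_z_test` is hypothetical, and the test statistic is compared with a standard normal distribution (a t distribution would usually be preferred for samples this small).

```python
from math import sqrt
from statistics import NormalDist, mean, variance

def two_sample_z_test(group1, group2):
    """Two-sided test of H0: mu1 - mu2 = 0 using a normal approximation."""
    # Step 3: test statistic = observed difference / its standard error
    se = sqrt(variance(group1) / len(group1) + variance(group2) / len(group2))
    z = (mean(group1) - mean(group2)) / se
    # Step 4: compare with the standard normal distribution (both tails)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Step 1: H0: mu1 = mu2, HA: mu1 != mu2
# Step 2: hypothetical IQ samples (illustrative data only)
group1 = [104, 99, 110, 101, 97, 108, 103, 100, 106, 102]
group2 = [98, 95, 104, 100, 93, 101, 97, 99, 102, 96]

z, p_value = two_sample_z_test(group1, group2)

# Step 5: interpret the p-value against alpha
alpha = 0.05
print(f"z = {z:.2f}, p = {p_value:.4f}, reject H0: {p_value < alpha}")
```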
What is alpha (α)?
The type I error rate
The probability threshold beyond which the null
hypothesis would be rejected
The probability we are willing to accept of rejecting
H0 when H0 is actually true
Conventionally set to 5%
[Figure: two-tailed test with α = 5% split as 2.5% in each tail]
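As a rough sketch of what the two 2.5% tails mean in practice, the snippet below finds the cut-off values for a two-sided test. It assumes the test statistic follows a standard normal distribution under H0, which the slides do not specify; the helper `in_rejection_region` is a hypothetical name.

```python
from statistics import NormalDist

alpha = 0.05             # conventional type I error rate
tail_area = alpha / 2    # 2.5% in each tail for a two-sided test

# Assumption: the test statistic is standard normal under H0
critical_value = NormalDist().inv_cdf(1 - tail_area)

def in_rejection_region(z, critical=critical_value):
    """True if z falls in either 2.5% tail."""
    return abs(z) > critical

print(f"Reject H0 when |z| > {critical_value:.2f}")
```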
How is the p-value derived?
Look up in table
Each test statistic is associated with a p-value
What is the p-value?
Under the null hypothesis (H0), the
p-value is the probability of obtaining a
test statistic at least as extreme as the one
observed by chance alone.
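The definition can also be written symbolically. In this sketch, T denotes the test statistic and t_obs its observed value (notation introduced here, not in the slides). The second equality holds only under the added assumption that T is standard normal under H0, with cumulative distribution function Φ.

```latex
p = P\left(|T| \ge |t_{\text{obs}}| \;\middle|\; H_0\right)
  = 2\left(1 - \Phi\left(|t_{\text{obs}}|\right)\right)
```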
What the p-value looks like
[Figure: theoretical distribution of the test statistic, with probability on the vertical axis and the value of the test statistic on the horizontal axis (0 marked). The sum of the yellow areas under the curve is the p-value: the probability of obtaining a test statistic at least as extreme as the one observed by chance.]
Alpha and p-value
If p < α then we reject the null hypothesis in favor of
the alternate hypothesis.
[Figure: distribution of the test statistic showing the p-value relative to the 2.5% rejection region in each tail]
Alpha and p-value
If p > α then we do not reject the null hypothesis.
[Figure: distribution of the test statistic showing the p-value relative to the 2.5% rejection region in each tail]
Alpha and p-value: Example
Table 1: Baseline characteristics of a sample from a study examining bear attacks in a population of campers

Characteristic                        Cases   Controls   p-value
Percent female                        13.4    40.2       0.001
Mean age                              27.4    27.1       0.239
Mean number of days spent camping     5.4     4.6        0.070
Mean daily honey consumption (oz.)    2.3     0.7        0.003

We reject H0 and conclude that there is a statistically significant
difference in the sex distribution between cases and controls.
We do not reject H0 and conclude that there is no statistically
significant difference in mean age between cases & controls.
We do not reject H0 and conclude that there is no statistically significant
difference in the mean of days spent camping between cases & controls.
We reject H0 and conclude that there is a statistically significant
difference in mean honey consumption between cases & controls.
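The four conclusions follow mechanically from comparing each p-value in Table 1 with α. A minimal sketch, assuming α = 0.05 as on the earlier slide; the function name `decide` is hypothetical.

```python
ALPHA = 0.05  # conventional threshold from the slides

# p-values copied from Table 1
p_values = {
    "Percent female": 0.001,
    "Mean age": 0.239,
    "Mean number of days spent camping": 0.070,
    "Mean daily honey consumption (oz.)": 0.003,
}

def decide(p, alpha=ALPHA):
    """Apply the rule: reject H0 only when p < alpha."""
    return "reject H0" if p < alpha else "do not reject H0"

decisions = {name: decide(p) for name, p in p_values.items()}
for name, decision in decisions.items():
    print(f"{name}: p = {p_values[name]:.3f} -> {decision}")
```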
Type I and II error
Type I error (α) occurs when H0 is rejected when it
shouldn’t be

When there truly is no effect or association, but one was
observed by chance
Type II error (β) occurs when H0 is not rejected when
it should be


When there truly is an effect or association, but there
was not one detected
β is the complement of statistical power (power = 1 − β)
Power, sample size, alpha, and beta
For a given level of α, increasing n (the sample size)
will…


Increase the power of the study to detect a difference or
association
Decrease type II error rate (β)
Studies with small samples are more likely to be
underpowered


Large p-values, even if there appears to be an
association or difference
Wide confidence intervals
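The link between sample size, power, and β can be illustrated numerically. The sketch below uses a textbook approximation for a two-sided z-test comparing two means. Its assumptions are not from the slides: equal group sizes, a known common standard deviation, and an invented true difference of 5 IQ points with a standard deviation of 15. The function name `power_two_sample` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def power_two_sample(delta, sigma, n, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two means.

    Assumptions (not from the slides): equal group sizes n, a common
    known standard deviation sigma, and a true mean difference delta.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = delta / (sigma * sqrt(2 / n))   # true difference in standard-error units
    # Probability that the test statistic lands in either rejection tail
    return std_normal.cdf(-z_crit + shift) + std_normal.cdf(-z_crit - shift)

for n in (10, 50, 200):
    power = power_two_sample(delta=5, sigma=15, n=n)
    print(f"n = {n:3d}: power = {power:.2f}, beta = {1 - power:.2f}")
```

With α held fixed, the printed power rises and β falls as n grows, which is the pattern the slide describes.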
Types of error and study conclusions
Decision based on     Unknown reality or truth about population
study results         H0 true              HA true
Reject H0             Type I error (α)     Proper decision
Do not reject H0      Proper decision      Type II error (β)
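Because the truth about the population is unknown in a real study, one way to see both error types is to simulate studies where the truth is fixed in advance. This is a sketch under invented assumptions: normally distributed IQ scores with standard deviation 15, 30 subjects per group, a normal-approximation test, and a true difference of either 0 (H0 true) or 5 points (HA true). All function names are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, variance

def rejects_h0(group1, group2, alpha=0.05):
    """Two-sided z-test (normal approximation) of H0: mu1 = mu2."""
    se = sqrt(variance(group1) / len(group1) + variance(group2) / len(group2))
    z = (mean(group1) - mean(group2)) / se
    return 2 * (1 - NormalDist().cdf(abs(z))) < alpha

def rejection_rate(true_diff, n=30, trials=2000, seed=0):
    """Fraction of simulated studies that reject H0."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        g1 = [rng.gauss(100, 15) for _ in range(n)]
        g2 = [rng.gauss(100 + true_diff, 15) for _ in range(n)]
        rejections += rejects_h0(g1, g2)
    return rejections / trials

# H0 true: every rejection is a type I error
type_i_rate = rejection_rate(true_diff=0)
# HA true: every non-rejection is a type II error
type_ii_rate = 1 - rejection_rate(true_diff=5)

print(f"Type I error rate:  {type_i_rate:.3f}")
print(f"Type II error rate: {type_ii_rate:.3f}")
```

The simulated type I error rate should land near the chosen α of 5%, while the type II error rate depends on the sample size and the size of the true difference.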
Analogous to the American justice system…
Jury’s decision       Unknown reality or truth about defendant
                      Innocent             Guilty
Found guilty          Type I error (α)     Proper decision
Found innocent        Proper decision      Type II error (β)
Different types of data
Age and race are different types of variables
State the null hypothesis for the distribution of race.
a. The proportion of Whites is the same in cases and controls.
b. The proportion of Whites is different comparing cases to controls.
c. The proportion of Whites is lower in the cases than the controls.
d. The proportion of Whites is higher in the cases than the controls.