Null Hypothesis Significance Testing (NHST)

A way by which psychologists and other scientists attempt to draw inferences about population parameters.
The way NHST is taught and practiced today represents an amalgam of ideas developed by Fisher, Neyman, and
Pearson in the first half of the 20th Century. This history partly explains why NHST is such a convoluted way of
thinking about research and why so many psychologists cannot accurately define the p-value from NHST.
Suppose you hypothesize that OSU students will, on average, have higher ACT scores than the national average.
You go about testing your hypothesis by randomly sampling 15 students who are currently enrolled at OSU. Their
ACT scores are as follows:
24, 29, 19, 19, 26, 28, 22, 18, 19, 32, 30, 25, 26, 21, 23
You obtain normative data for the ACT from the U.S. government website:
http://nces.ed.gov/programs/digest/d09/tables/dt09_147.asp
There you find that the average ACT score for 2009 (the last year reported) was equal to 21.1 with a standard
deviation of 5.1. You treat these values as population parameters to which you will compare your estimated
population mean for OSU students. Do “OSU Students” have a higher mean than the norm?
The point of NHST is to make an inference about population parameters.
Pop1 : General Population (the norm)
µNorm = 21.1
σNorm = 5.1
Pop2 : Population of OSU Students
µOSU = ?
σOSU assumed equal to σNorm
Hypotheses (3-Valued Logic Form)
Null Hypothesis
Ho : µOSU = µNorm
Alternative or Research Hypothesis
HA : µOSU < µNorm
or µOSU > µNorm (*predicted)
Three Properties
1. Population parameters
2. Mutually Exclusive
3. All possible outcomes covered
OSU Sample : 24, 29, 19, 19, 26, 28, 22, 18, 19, 32, 30, 25, 26, 21, 23
We enter these numbers into our calculators and compute the sample mean. For these 15 students
it equals 24.07 (n = 15, x̄OSU = 24.07).
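For anyone who would rather check the calculator work in code, here is a minimal Python sketch. The variable names (osu_scores, sample_mean) are our own labels, not part of the original example.

```python
from statistics import mean

# ACT scores for the 15 randomly sampled OSU students (from the text above).
osu_scores = [24, 29, 19, 19, 26, 28, 22, 18, 19, 32, 30, 25, 26, 21, 23]

n = len(osu_scores)
sample_mean = mean(osu_scores)  # sum of the scores divided by n

print(f"n = {n}, sample mean = {sample_mean:.2f}")  # n = 15, sample mean = 24.07
```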
Is this value higher than 21.1? Yes! But, we do not yet conclude that µOSU > µNorm.
Why not? Because we only have the sample mean, not the population mean.
It could very well be the case that the population mean for OSU students is 21.1, thus equal to the
norm.
What do we do then, since we wish to choose one of the hypotheses? Again, notice how the
hypotheses are written in terms of population parameters, and the µOSU value has not been
observed. This is why we refer to this process as “inferential statistics.” We are making an inference
to an unobserved value.
So, we are going to put on our NHST hats and think in terms of probabilities. Specifically, if the
population OSU mean were in fact 21.1, what is the probability of obtaining a sample mean of at
least 24.07 from 15 randomly selected students? If a result of obtaining at least 24.07 is really
strange (improbable), then perhaps we can at least rule out 21.1; that is, we can reject the null
hypothesis which states that µOSU = µNorm = 21.1. After rejecting the null hypothesis we can then
choose between the other two hypotheses.
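One way to get a feel for this question is to simulate it. The sketch below repeatedly draws 15 scores from a population whose mean really is 21.1 and counts how often the sample mean reaches 24.07 or more. It assumes, purely for illustration, that individual ACT scores are normally distributed with σ = 5.1; the function name, the number of simulated samples, and the random seed are also our own choices.

```python
import random

MU_NULL = 21.1         # population mean if the null hypothesis is true
SIGMA = 5.1            # population standard deviation (treated as known)
N = 15                 # sample size
OBSERVED_MEAN = 24.07  # sample mean of the 15 OSU students

def simulate_upper_tail(n_sims=50_000, seed=1):
    """Estimate P(sample mean >= OBSERVED_MEAN) when the null is true."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        # Assumption: individual scores are drawn from a normal population.
        sample_mean = sum(rng.gauss(MU_NULL, SIGMA) for _ in range(N)) / N
        if sample_mean >= OBSERVED_MEAN:
            hits += 1
    return hits / n_sims

upper_tail_estimate = simulate_upper_tail()
print(f"Estimated probability of a mean of at least {OBSERVED_MEAN}: "
      f"{upper_tail_estimate:.4f}")
```

The estimate should come out small, in the neighborhood of one percent, which is the sense in which such a sample mean would be "strange" if the null hypothesis were true.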
A bit convoluted? Yes! But this is the wacky world of NHST.
Ho : µOSU = µNorm
HA : µOSU < µNorm
or µOSU > µNorm
With NHST, we will make several assumptions and set up some ground rules.
1. Assume the null hypothesis is true. µOSU = µNorm
2. Assume random sampling and a few other things (we’ll discuss later in class)
3. Set up an agreed-upon “critical probability” (i.e., critical p-value) before examining the data.
4. In this case use our knowledge of how means behave in a sampling scenario to evaluate the
data and draw a conclusion about the population parameters.
Using the Central Limit Theorem, which applies to means computed from randomly drawn
samples, let’s assume a perfectly normal distribution of mean values. If we assume the null
hypothesis to be true (µOSU = µNorm), then the mean of this distribution of means will equal 21.1,
and its standard deviation (the standard error of the mean) will equal σ/√n = 5.1/√15 ≈ 1.32.
µx̄ = 21.1
Next, let’s convert the means in the distribution to z-scores. We now have the Standard Normal
Curve, which has a mean equal to 0 and a standard deviation equal to 1.
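The behavior of this distribution of means can be checked with a short simulation. The sketch below draws many samples of 15 from a population with µ = 21.1 and σ = 5.1, then compares the mean and standard deviation of the resulting sample means with the theoretical values 21.1 and σ/√n. As an illustrative assumption, individual scores are drawn from a normal population; the seed and the number of simulated samples are arbitrary.

```python
import math
import random
from statistics import mean, stdev

MU_NULL, SIGMA, N = 21.1, 5.1, 15
standard_error = SIGMA / math.sqrt(N)  # theoretical SD of the sample means

rng = random.Random(2)
# Each entry is the mean of one simulated random sample of 15 scores.
sample_means = [
    sum(rng.gauss(MU_NULL, SIGMA) for _ in range(N)) / N
    for _ in range(20_000)
]
# Convert every simulated mean to a z-score.
z_scores = [(m - MU_NULL) / standard_error for m in sample_means]

print(f"theoretical standard error = {standard_error:.3f}")
print(f"simulated means: mean = {mean(sample_means):.2f}, "
      f"SD = {stdev(sample_means):.2f}")
print(f"z-scores:        mean = {mean(z_scores):.2f}, "
      f"SD = {stdev(z_scores):.2f}")
```

The simulated means should center near 21.1 with a standard deviation near 1.32, and the z-scores should center near 0 with a standard deviation near 1.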
Next, we set up what is known as the “Rejection Region” in the distribution. Following the
conventional practice of psychologists for the past 70+ years, we set up this region to correspond
to 5% of the distribution. This is the infamous value underlying the statement “p ≤ .05” in journal
articles (sometimes you might see p ≤ .01). Again, it is a value that is completely arbitrary. We use
it solely on the basis of convention.
You can see below that we’ve cut the 5% evenly so that 2.5% is in the lower tail of the distribution
and 2.5% is in the upper tail of the distribution. This is what we call a two-tailed test. Notice how
the alternative hypothesis has two parts: one in which the OSU mean is higher than the norm, and
one in which the OSU mean is lower. These two directions correspond to the two tails of the
distribution.
If the observed mean, converted to a z-score, is NOT in the rejection region, then we’ll “fail to
reject the null hypothesis” and declare the result as “nonsignificant, p > .05.”
If the observed mean, converted to a z-score, is in the rejection region, then we’ll “reject the null
hypothesis” and declare the finding as “statistically significant, p ≤ .05.”
Next, let’s refer to the .05 (5%) as our “alpha level” (α = .05). We’ll also refer to this as our pcritical, or pcrit, value.
Using alpha (pcrit) we determine the critical z-values for our statistical test. As we’ll see
shortly, the test we will run on these ACT data is a z-test for means. The critical values are those
z-scores that cut off the rejection region. Looking up these values in the z-table, we find
zcrit = ±1.96. Again, as seen below, these are the z-values that demarcate 2.5% of each tail of the
distribution.
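Instead of a printed z-table, the critical values can be looked up in code. The sketch below uses NormalDist from Python's standard library; the helper in_rejection_region and the two z-scores passed to it (1.50 and -2.10) are hypothetical, included only to show the decision rule at work.

```python
from statistics import NormalDist

ALPHA = 0.05  # conventional alpha level (p_crit)

# Two-tailed test: put ALPHA / 2 in each tail of the standard normal curve
# and find the z-score that cuts off the upper tail.
z_crit = NormalDist().inv_cdf(1 - ALPHA / 2)

def in_rejection_region(z_obs, z_crit=z_crit):
    """Return True when z_obs is at least as extreme as the critical value."""
    return abs(z_obs) >= z_crit

print(f"z_crit = ±{z_crit:.2f}")   # z_crit = ±1.96
print(in_rejection_region(1.50))   # False: fail to reject the null
print(in_rejection_region(-2.10))  # True: reject the null
```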
Finally, we are ready to see if the OSU students’ data are “statistically significant.”
OSU Sample : 24, 29, 19, 19, 26, 28, 22, 18, 19, 32, 30, 25, 26, 21, 23
(n = 15, x̄OSU = 24.07)
We next convert the mean to a z-score using what is known as the “z-test for means” formula:
$$z_{\text{observed}} = z_{\text{obs}} = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{24.07 - 21.1}{5.1 / \sqrt{15}} = \frac{2.97}{1.32} = 2.25$$
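The same arithmetic can be sketched in a few lines of Python. The function name z_test_for_mean is our own label for the formula above, not a library routine.

```python
import math

def z_test_for_mean(sample, mu, sigma):
    """z-test for means: (sample mean - mu) / (sigma / sqrt(n))."""
    n = len(sample)
    sample_mean = sum(sample) / n
    standard_error = sigma / math.sqrt(n)  # SD of the distribution of means
    return (sample_mean - mu) / standard_error

osu_scores = [24, 29, 19, 19, 26, 28, 22, 18, 19, 32, 30, 25, 26, 21, 23]

# Norm values from the text: mu = 21.1, sigma = 5.1.
z_obs = z_test_for_mean(osu_scores, mu=21.1, sigma=5.1)
print(f"z_obs = {z_obs:.2f}")  # z_obs = 2.25
```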
We can see that 2.25 falls in the rejection region. We therefore “Reject the Null Hypothesis”
which is equivalent to saying that the result is “statistically significant, p ≤ .05.”
Now that we’ve rejected the null hypothesis, which alternative hypothesis do we accept?
We see that the sample mean for the OSU students is 24.07, which is higher than the norm (21.1).
If we regard 24.07 as an estimate of the OSU population mean (µOSU), then we conclude that the
OSU population mean is higher than the norm, µOSU > µNorm.
That is the specific inference we have drawn here.
How improbable is the sample mean? x̄OSU = 24.07
This mean was converted to a z-score of 2.25. Since we are running a two-tailed test, we always look in both tails
of the distribution. We thus look up the probability for 2.25 (or higher) in our z-table and double the probability
to get the total region demarcated by the +/- 2.25 values in the distribution (see below).
[Figure: standard normal curve with the two tail regions beyond z = -2.25 and z = +2.25 marked]
From the z-table, we find .0122. Doubling this value gives us .0244. This is our pobserved (or pobs) value which is less
than pcrit (.05). This is another way of reaching the same conclusion: Reject the null hypothesis, the result is
“statistically significant, p = .0244.”
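The z-table lookup and the doubling step can also be reproduced in code. This is a minimal sketch using NormalDist from Python's standard library; the variable names are ours.

```python
from statistics import NormalDist

z_obs = 2.25   # observed z-score from the z-test for means
alpha = 0.05   # p_crit

# Area under the standard normal curve at or beyond z_obs in the upper tail.
upper_tail = 1 - NormalDist().cdf(abs(z_obs))

# Two-tailed test: count the matching area in the lower tail as well.
p_obs = 2 * upper_tail

print(f"one tail = {upper_tail:.4f}, p_obs = {p_obs:.4f}")
# one tail = 0.0122, p_obs = 0.0244
print("reject the null" if p_obs <= alpha else "fail to reject the null")
# reject the null
```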
So what is this pobs value psychologists consider so important?
Simply stated, it is a “weirdness statistic.” Assuming the null hypothesis is true, along with other assumptions
(we’ll cover later in class), and given a random sample, obtaining a sample mean at least as far from 21.1 as 24.07
(in either direction, since this is a two-tailed test) is an unusual (i.e., weird) result. One could see values this
extreme as a normal part of sampling variability, but they would be unusual (p = .0244). All in all, then, we aren’t
saying a whole lot when we declare our results to be “statistically significant!”