Lecture 2: Null Hypothesis Significance Testing


Lecture 2: Null Hypothesis Significance Testing Continued
Laura McAvinue
School of Psychology
Trinity College Dublin
Null Hypothesis Significance Testing
• Previous lecture: Steps of NHST
– Specify the alternative/research hypothesis
– Set up the null hypothesis
– Collect data
– Run the appropriate statistical test
– Obtain the test statistic and associated p value
– Decide whether to reject or fail to reject the null hypothesis on the basis of the p value
Null Hypothesis Significance Testing
• Decision to reject or fail to reject Ho
– P value
– Probability of obtaining the observed results if Ho is
true
– By convention, use the significance level of p < .05
– Conclude that it is highly unlikely that we would obtain these results if Ho were true, so we reject Ho
– Caveat! The fact that there is a significance level
does not mean that there is a simple ‘yes’ or ‘no’
answer to your research question
Null Hypothesis Significance Testing
• If you obtain results that are not statistically
significant (p>.05), this does not necessarily mean
that the relationship you are interested in does not
exist
• There are a number of factors that affect whether
your results come out as statistically significant
– One and two-tailed tests
– Type I and Type II errors
– Power
One and Two-tailed Tests
• One-tailed / Directional Test
– Run this when you have a prediction about the
direction of the results
• Two-tailed / Non-Directional Test
– Run this when you don’t have a prediction about the
direction of the results
Recall previous example…
• Research Question
– Do anxiety levels of students differ from anxiety
levels of young people in general?
• Prediction
– Due to the pressure of exams and essays, students
are more stressed than young people in general
• Method
– You know the mean score for the normal young
population on the anxiety measure = 50
– You predict that your sample will have mean > 50
– Run a one-tailed one-sample t test at p < .05 level
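A minimal Python sketch of this one-tailed one-sample t test, assuming scipy 1.6 or later (for the `alternative` argument) and using made-up illustrative anxiety scores; only the population mean of 50 comes from the example above.

```python
import numpy as np
from scipy import stats

# Made-up anxiety scores for an illustrative sample of students
student_scores = np.array([55, 62, 48, 58, 61, 53, 57, 60, 52, 59])

population_mean = 50  # known mean for young people in general (from the example)

# One-tailed (directional) test: H1 is that the student mean is greater than 50
t_stat, p_value = stats.ttest_1samp(student_scores, popmean=population_mean,
                                    alternative='greater')

print(f"t = {t_stat:.3f}, one-tailed p = {p_value:.4f}")
if p_value < .05:
    print("Reject Ho: students score higher than the general young population")
else:
    print("Fail to reject Ho")
```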
One-tailed Test
• Compare the mean of your
sample to the sampling
distribution for the
population mean
• Decide to reject Ho if your
sample mean falls into the
highest 5% of the sampling
distribution
Dilemma
• But! What if your prediction is wrong?
– Perhaps students are less stressed than the general
young population
• Their own bosses, summers off, no mortgages
– With previous one-tailed test, you could only reject
Ho if you got an extremely high sample mean
– What if you get an extremely low sample mean?
• Run a two-tailed test
– Hedge your bets
– Reject Ho if you obtain scores at either extreme of
the distribution, very high or very low sample mean
Two-tailed Test
• You will reject Ho when a
score appears in the highest
2.5% of the distribution or
the lowest 2.5%
• Note that it’s not the highest
5% and the lowest 5% as
then you’d be operating at p
= .1 level, rejecting Ho for
10% of the distribution
• So, we gain ability to reject
Ho for extreme values at
either end but values must
be more extreme
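A short sketch of how the rejection regions differ, using the t distribution's quantile function in scipy; the degrees of freedom (df = 9) are just an assumed illustrative value.

```python
from scipy import stats

alpha, df = 0.05, 9  # illustrative values

# One-tailed test: the whole 5% rejection region sits in the upper tail
one_tailed_cutoff = stats.t.ppf(1 - alpha, df)

# Two-tailed test: the 5% is split, 2.5% per tail, so the cutoffs are more extreme
two_tailed_upper = stats.t.ppf(1 - alpha / 2, df)
two_tailed_lower = stats.t.ppf(alpha / 2, df)

print(f"One-tailed: reject Ho if t > {one_tailed_cutoff:.3f}")
print(f"Two-tailed: reject Ho if t > {two_tailed_upper:.3f} or t < {two_tailed_lower:.3f}")
```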
Errors in NHST
• Howell (2008) p. 157
– “Whenever we reach a decision with a statistical test,
there is always a chance that our decision is the
wrong one”
• Misleading nature of NHST
– Because there is a significance level (p = .05), people
interpret NHST as a definitive exercise
– Results are statistically significant or not
– We reject Ho or we don’t
– The Ho is wrong or right
Errors in NHST
• Remember we are dealing with probabilities
– We make our decision on the basis of the likelihood
of obtaining the results if Ho is true
– There is always the chance that we are making an
error
• Two kinds of Error
– We reject Ho when it is true (Type I error)
• We say there’s a significant difference when there’s not
– We accept Ho when it is false (Type II error)
• We say there is no significant difference when there is
Type I Error
• Our anxiety example
• Predict that students will
have greater anxiety score
than young people in general
• Test Ho that students’
anxiety levels do not differ
from young people
• One-tailed one sample t-test
at p < .05
• Compare sample mean with
sampling distribution of
mean for the population (Ho)
Type I Error
• Decide to reject Ho if your
sample mean falls in the top
5% of the distribution
• But!
• This 5%, even though at the
extreme end, still belongs to
the distribution
• If your sample mean falls
within this top 5%, there is
still a chance that your
sample came from the Ho
population
Type I Error
• For example, if p = .04, there is only a 4% chance of obtaining a sample mean this extreme if your sample really came from the Ho population
– But this is still a chance; you could be rejecting Ho when it is in fact true
• Researchers are willing to accept this small risk (5%) of making
a Type I error, of rejecting Ho when it is in fact true
• Probability of making a Type I error = alpha (α) = the significance level that you chose
– .05, .01
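The claim that α equals the Type I error rate can be checked by simulation. The sketch below (illustrative values only) repeatedly draws samples from a population in which Ho is true and counts how often p < .05; the proportion comes out at roughly .05.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_experiments, n, alpha = 10_000, 30, 0.05
population_mean = 50  # Ho is true: every sample really comes from this population

false_rejections = 0
for _ in range(n_experiments):
    sample = rng.normal(loc=population_mean, scale=10, size=n)
    _, p = stats.ttest_1samp(sample, popmean=population_mean)
    if p < alpha:
        false_rejections += 1  # a Type I error: rejecting a true Ho

print(f"Proportion of Type I errors: {false_rejections / n_experiments:.3f}")  # ~ 0.05
```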
Type II Error
• So why not set a very low significance level to
minimise your risk of making a Type I error?
– Set p < .01 rather than p < .05
• As you decrease the probability of making a Type I
error you increase the probability of making a Type II
error
• Type II Error
– Fail to reject Ho when it is false
– Fail to detect a significant relationship in your data
when a true relationship exists
• For argument’s sake,
imagine that H1 is correct
• Sampling Distribution under
Ho
• Sampling Distribution under
H1
• Reject Ho if the sample mean falls to the right of the critical value (the rejection region)
– Correct Decision
• Accept Ho if the sample mean falls to the left of the critical value
– Type II Error
Four Outcomes of Decision Making

                  True State of Nature
Decision          Ho is True          Ho is False
Accept Ho         Correct Decision    Type II Error
Reject Ho         Type I Error        Correct Decision
Power
• You should minimise both Type I and Type II errors
– In reality, people are often very careful about Type I (i.e. strict about α) but ignore Type II altogether
• If you ignore Type II error, your experiment could be
doomed before it begins
– even if a true effect exists (i.e. H1 is correct), if β is high, the results may not show a statistically significant effect
• How do you reduce the probability of a Type II error?
– Increase the power of the experiment
Power
• Power
– The probability of
correctly rejecting a
false Ho
– A measure of the ability
of your experiment to
detect a significant
effect when one truly
exists
– 1 − β
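As a rough numerical sketch (all values assumed for illustration), power for a one-tailed test of a mean can be approximated with the normal distribution: power ≈ 1 − Φ(z₁₋α − (μ₁ − μ₀)/(σ/√n)).

```python
import math
from scipy import stats

mu0, mu1 = 50, 55        # means under Ho and H1 (illustrative)
sigma, n, alpha = 10, 25, 0.05

standard_error = sigma / math.sqrt(n)
z_crit = stats.norm.ppf(1 - alpha)              # one-tailed critical value under Ho
shift = (mu1 - mu0) / standard_error            # how far H1 sits from Ho, in SE units

power = 1 - stats.norm.cdf(z_crit - shift)      # power = 1 - beta
print(f"Power = {power:.3f}, beta = {1 - power:.3f}")
```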
How do we increase the power of
our experiment?
• Factors affecting power
– The significance level ()
– One-tailed v two-tailed test
– The true difference between Ho and H1(o 1)
– Sample Size (n)
The Influence of α on Power
• Reduce the significance level (α)…
– Reduce the probability of making a Type I
error
• Rejecting the Ho when it is true
– Increase the probability of making a Type II
error
• Accepting the Ho when it is false
– Reduce the power of the experiment to
detect a true effect as statistically
significant
Reduce α and reduce power
Increase α and increase power
But! You increase the probability of a Type I error!
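Plugging two different values of α into the same normal-approximation sketch (same assumed values as above) shows the trade-off numerically: a stricter α means fewer Type I errors at the cost of lower power.

```python
import math
from scipy import stats

mu0, mu1, sigma, n = 50, 55, 10, 25   # same illustrative values as before
se = sigma / math.sqrt(n)

for alpha in (0.05, 0.01):
    power = 1 - stats.norm.cdf(stats.norm.ppf(1 - alpha) - (mu1 - mu0) / se)
    print(f"alpha = {alpha:.2f} -> power = {power:.3f}")
```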
The Influence of One v Two-tailed
Tests on Power
• We lose power with a two-tailed test
– The significance level (α) is split across the two tails of the distribution
– Values must be more extreme to be statistically significant
The Influence of the True Difference
between Ho and H1
• The bigger the difference between μ0 and μ1, the easier it is to detect it
The Influence of Sample Size on
Power
• The bigger the sample size, the more power you have
• A big sample provides a better estimate of the population mean
• With bigger sample sizes, the sampling distribution for the mean
clusters more tightly around the population mean
• The standard deviation of the sampling distribution, known as the standard error of the mean, is reduced
• There is less overlap between the sampling distributions under
Ho and H1
• The power to detect a significant difference increases
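A quick sketch of the mechanism, with an assumed population standard deviation of 10: the standard error of the mean, σ/√n, shrinks as n grows, so the sampling distributions under Ho and H1 overlap less.

```python
import math

sigma = 10  # assumed population standard deviation
for n in (10, 50, 100, 500, 1000):
    se = sigma / math.sqrt(n)  # standard error of the mean
    print(f"n = {n:4d} -> standard error of the mean = {se:.2f}")
```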
The Influence of Sample Size on
Power
Sample Size Exercise
• Open the following dataset
– Software / Kevin Thomas / Power dataset (revised)
– Explores the effects of Therapy on Depression
• Perform two Independent Samples t-tests
– Analyse / Compare means / Independent Samples t test
– Group represents Therapy v Control
– Score represents post-treatment depression
– 1. Group1 & Score1
– 2. Group2 & Score2
Complete the following table

                              Analysis 1      Analysis 2
Size of sample
Therapy mean score
Therapy standard deviation
Control mean score
Control standard deviation
Mean difference
T statistic
df
P-value
What explains these results?

                              Analysis 1      Analysis 2
Size of sample                20              200
Therapy mean score            5.5             5.5
Therapy standard deviation    3.03            2.89
Control mean score            6.3             6.3
Control standard deviation    2.75            2.62
Mean difference               -.8             -.8
T statistic                   -.618           -2.051
df                            18              198
P-value                       .54             .042
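The table can be reproduced outside SPSS from the summary statistics alone, for example with scipy's ttest_ind_from_stats; the per-group sample sizes of 10 and 100 used below are inferred from the degrees of freedom (18 and 198).

```python
from scipy import stats

# Analysis 1: 10 participants per group (df = 18)
a1 = stats.ttest_ind_from_stats(mean1=5.5, std1=3.03, nobs1=10,
                                mean2=6.3, std2=2.75, nobs2=10)

# Analysis 2: 100 participants per group (df = 198)
a2 = stats.ttest_ind_from_stats(mean1=5.5, std1=2.89, nobs1=100,
                                mean2=6.3, std2=2.62, nobs2=100)

print(f"Analysis 1: t = {a1.statistic:.3f}, p = {a1.pvalue:.3f}")  # ~ -0.618, .54
print(f"Analysis 2: t = {a2.statistic:.3f}, p = {a2.pvalue:.3f}")  # ~ -2.051, .042
```

The mean difference (−.8) is identical in both analyses; only the sample size changes, and only the larger sample reaches statistical significance. That is the point of the exercise: power grows with n.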
So, how do I increase the power of
my study?
• You can’t manipulate the true difference between Ho
and H1
• You could increase your significance level (α), but then you would increase the risk of a Type I error
• If you have a strong prediction about the direction of
the results, you should run a one-tailed test
• The factor that is most under your control is sample
size
– Increase it!
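As a closing sketch (assumed values, same normal approximation as earlier), you can search for the smallest n that reaches a target power, which is essentially what an a priori power analysis does.

```python
import math
from scipy import stats

mu0, mu1, sigma = 50, 52, 10          # illustrative means and SD
alpha, target_power = 0.05, 0.80      # conventional choices

n = 2
while True:
    se = sigma / math.sqrt(n)
    power = 1 - stats.norm.cdf(stats.norm.ppf(1 - alpha) - (mu1 - mu0) / se)
    if power >= target_power:
        break
    n += 1

print(f"Smallest n with power >= {target_power}: {n} (power = {power:.3f})")
```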