Transcript Chapter 4
Psy B07
SAMPLING DISTRIBUTIONS
AND HYPOTHESIS TESTING
Chapter 4
Slide 1
Psy B07
Outline
Sampling Distributions revisited
Hypothesis Testing
Using the Normal Distribution to test
Hypotheses
Type I and Type II Errors
One vs. Two Tailed Tests
Chapter 4
Slide 2
Psy B07
Statistics is Arguing
Typically, we are arguing either 1) that some
value (or mean) is different from some other
mean, or 2) that there is a relation between
the values of one variable, and the values of
another.
Thus, we typically first produce some null
hypothesis (i.e., no difference or relation) and
then attempt to show how improbably
something is given the null hypothesis.
Chapter 4
Slide 3
Psy B07
Sampling Distributions
Just as we can plot distributions of
observations, we can also plot distributions of
statistics (e.g., means).
These distributions of sample statistics are
called sampling distributions.
For example, if we consider the 24 students in
a tutorial who estimated my weight as a
population, their guesses have an x of 168.75
and an of 12.43 (2 = 154.51)
Chapter 4
Slide 4
Psy B07
Sampling Distributions
If we repeatedly sampled groups of 6 people,
found the x of their estimates, and then
plotted the x’s, the distribution might look like:
35
30
25
20
15
10
5
0
155 157.5 160 162.5 165 167.5 170 172.5 175 177.5 180 182.5 185
Chapter 4
Slide 5
Psy B07
Hypothesis Testing
What I have previously called “arguing” is
more appropriately called hypothesis testing.
Hypothesis testing normally consists of the
following steps:
1) some research hypothesis is proposed (or
alternate hypothesis) - H1.
2) the null hypothesis is also proposed - H0.
Chapter 4
Slide 6
Psy B07
Hypothesis Testing
3) the relevant sampling distribution is
obtained under the assumption that H0 is
correct.
4) I obtain a sample representative of H1 and
calculate the relevant statistic (or
observation).
5) Given the sampling distribution, I calculate
the probability of observing the statistic (or
observation) noted in step 4, by chance.
6) On the basis of this probability, I make a
decision
Chapter 4
Slide 7
Psy B07
The Beginnings of an Example
One of the students in the tutorial
guessed my weight to be 200 lbs. I
think that said student was fooling
around. That is, I think that guess
represents something different that do
the rest of the guesses.
H0 - the guess is not really different.
H1 - the guess is different.
Chapter 4
Slide 8
Psy B07
The Beginnings of an Example
1) obtain a sampling distribution of H 0.
2) calculate the probability of guessing
200, given this distribution
3) Use that probability to decide
whether this difference is just chance,
or something more.
Chapter 4
Slide 9
Psy B07
A Touch of Philosophy
Some students new to this idea of hypothesis
testing find this whole business of creating a
null hypothesis and then shooting it down as a
tad on the weird side, why do it that way?
This dates back to a philosopher named Karl
Popper who claimed that it is very difficult to
prove something to be true, but no so difficult
to prove it to be untrue.
Chapter 4
Slide 10
Psy B07
A Touch of Philosophy
So, it is easier to prove H0 to be wrong,
than to prove HA to be right.
In fact, we never really prove H1 to be
right. That is just something we imply
(similarly H0).
Chapter 4
Slide 11
Psy B07
Using the Normal Distribution to
test Hypotheses
The “Marty’s Weight” example begun
earlier is an example of a situation
where we want to compare one
observation to a distribution of
observations.
This represents the simplest hypothesistesting situation because the sampling
distribution is simply the distribution of
the individual observations.
Chapter 4
Slide 12
Psy B07
Using the Normal Distribution to
test Hypotheses
Thus, in this case we can use the stuff we
learned about z-scores to test hypotheses that
some individual observation is either
abnormally high (or abnormally low).
That is, we use our mean and standard
deviation to calculate the a z-score for the
critical value, then go to the tables to find the
probability of observing a value as high or
higher than (or as low or lower than) the one
we wish to test.
Chapter 4
Slide 13
Psy B07
Finishing the Example
= 168.75
= 12.43
Critical = 200
x
z
200 168 .75
12.43
2.51
Chapter 4
Slide 14
Psy B07
Finishing the Example
From the z-table, the area of the portion
of the curve above a z of 2.51 (i.e., the
smaller portion) is approximately .0060.
Thus, the probability of observing a
score as high or higher than 200 is
.0060
Chapter 4
Slide 15
Psy B07
Making Decisions given
Probabilities
It is important to realize that all our test really
tells us is the probability of some event given
some null hypothesis.
It does not tell us whether that probability is
sufficiently small to reject H0, that decision is
left to the experimenter.
In our example, the probability is so low, that
the decision is relatively easy. There is only a
.60% chance that the observation of 200 fits
with the other observations in the sample.
Thus, we can reject H0 without much worry.
Chapter 4
Slide 16
Psy B07
Making Decisions given
Probabilities
But what if the probability was 10% or
5%? What probability is small enough
to reject H0?
It turns out there are two answers to
that:
the real answer.
the “conventional” answer.
Chapter 4
Slide 17
Psy B07
The “Real” Answer
First some terminology. . . .
The probability level we pick as our
cut-off for rejecting H0 is referred to
as our rejection level or our significance
level.
Any level below our rejection or
significance level is called our rejection
region
Chapter 4
Slide 18
Psy B07
The “Real” Answer
OK, so the problem is choosing an appropriate
rejection level.
In doing so, we should consider the four
possible situations that could occur when
we’re hypothesis testing.
Decision
Chapter 4
Real state of the World
H0 true
H0 false
Reject H0
Type I error
Correct
Fail to
reject H0
Correct
Type II error
Slide 19
Psy B07
Type I and Type II Errors
Type I error is the probability of rejecting the
null hypothesis when it is really true.
Example: saying that the person who
guessed I weigh 200 lbs was just screwing
around when, in fact, it was an honest guess
just like the others.
We can specify exactly what the probability of
making that error was, in our example it was
.60%.
Chapter 4
Slide 20
Psy B07
Type I and Type II Errors
Usually we specify some “acceptable” level of
error before running the study.
then call something significant if it is below
this level.
This acceptable level of error is typically
denoted as
Before setting some level of it is important to
realize that levels of are also linked to Type
II errors
Chapter 4
Slide 21
Psy B07
Type I and Type II Errors
Type II error is the probability of failing
to reject a null hypothesis that is really
false.
Example: judging OJ as not guilty
when he is actually guilty.
The probability of making a Type II
error is denoted as
Chapter 4
Slide 22
Psy B07
Type I and Type II Errors
Unfortunately, it is impossible to precisely
calculate because we do not know the shape
of the sampling distribution under H1.
It is possible to “approximately” measure ,
and we will talk a bit about that in Chapter 8.
For now, it is critical to know that there is a
trade-off between and , as one goes down,
the other goes up.
Thus, it is important to consider the situation
prior to setting a significance level.
Chapter 4
Slide 23
Psy B07
The Conventional Answer
While issues of Type I versus Type II error are
critical in certain situations, psychology
experiments are not typically among them
(although they sometimes are).
As a result, psychology has adopted the
standard of accepting =.05 as a conventional
level of significance.
It is important to note, however, that there is
nothing magical about this value (although
you wouldn’t know it by looking at published
articles).
Chapter 4
Slide 24
Psy B07
One vs. Two Tailed Tests
Often, we want to determine if some critical
difference (or relation) exists and we are not
so concerned about the direction of the effect.
That situation is termed two-tailed, meaning
we are interested in extreme scores at either
tail of the distribution.
Note, that when performing a two-tailed test
we must only consider something significant if
it falls in the bottom 2.5% or the top 2.5% of
the distribution (to keep at 5%).
Chapter 4
Slide 25
Psy B07
One vs. Two Tailed Tests
If we were interested in only a high or low
extreme, then we are doing a one-tailed or
directional test and look only to see if the
difference is in the specific critical region
encompassing all 5% in the appropriate tail.
Two-tailed tests are more common usually
because either outcome would be interesting,
even if only one was expected.
Chapter 4
Slide 26
Psy B07
Other Sampling Distributions
The basics of hypothesis testing
described in this chapter do not change.
All that changes across chapters is the
specific sampling distribution (and its
associated table of values).
The critical issue will be to realize which
sampling distribution is the one to use
in which situation.
Chapter 4
Slide 27