t Test for a Single Group Mean (Part 2)

Psych 5500/6500
The t Test for a Single Group Mean (Part 2):
• p Values
• One-Tail Tests
• Assumptions
Fall, 2008
Reporting Results
When a t test analysis is reported in the literature, the
t_critical values and a curve with the rejection regions
are rarely given. Usually there is not even an
explicit statement about whether or not H0 was
rejected.
Instead, the first example from the previous lecture
would commonly be reported like this:
t(5) = 3.82, p=.012
The general format is:
t(d.f.) = t_obtained, p=...
p values
Let’s take a closer look at what is reported:
t(5) = 3.82, p=.012, or in generic terms,
t(d.f.) = t_obtained, p=...
While the d.f. and the value of t_obtained are of
interest, the most important piece of
information is the value of ‘p’, which tells you
whether or not H0 was rejected.
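As a small illustration of the reporting format, here is a sketch of a helper that builds the report string from the three pieces of information. The function name format_t_result is hypothetical (it is not part of any statistics package), and dropping the leading zero of p simply mirrors the way the result is written in these slides.

```python
def format_t_result(df, t_obtained, p):
    """Build a report string of the form t(d.f.) = t_obtained, p=..."""
    # Drop the leading zero so 0.012 is written as .012, as in the slides.
    p_text = f"{p:.3f}".lstrip("0")
    return f"t({df}) = {t_obtained:.2f}, p={p_text}"

print(format_t_result(5, 3.82, 0.012))  # t(5) = 3.82, p=.012
```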
1) Pragmatic Understanding of ‘p values’
The single most important thing to understand about
a ‘p value’ is this:
If p is less than or equal to your significance level,
then you reject H0; if it is greater than your
significance level, then you do not reject H0.
t(5) = 3.82, p=.012
If our significance level was the usual .05, then we
know that H0 was rejected.
Even if you are unfamiliar with the statistical procedure,
e.g. χ²(5) = 6.43, p=.06, you can still easily tell whether H0
was rejected simply by looking at the p value
(here H0 is not rejected).
Examples
In the following analyses H0 was rejected (assuming
the significance level was set at .05):
p=0.04
p<.05
p<.001
p=.05
In the following H0 was not rejected:
p=0.07
p>.05
p=0.051
Giving a range for the p value (e.g. ‘p<.05’) used to be quite common
because computing the exact value of p was difficult without a
computer; now an exact value of p is usually provided.
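The decision rule behind these examples can be written in one line. The sketch below assumes the usual significance level of .05 as a default; the function name reject_h0 is just an illustrative choice.

```python
def reject_h0(p, alpha=0.05):
    """Reject H0 when p is less than or equal to the significance level."""
    return p <= alpha

# The exact p values from the examples above.
for p in (0.04, 0.05, 0.051, 0.07):
    decision = "reject H0" if reject_h0(p) else "do not reject H0"
    print(f"p={p}: {decision}")
```

Note that p=.05 leads to rejection while p=.051 does not, because the rule is ‘less than or equal to’.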
2) Conceptual Understanding of ‘p values’
The ‘p value’ is a probability; specifically, it is a
conditional probability:
p(getting a result that far or further from what
H0 predicted | H0 is true)
Let’s go back and take a look at the t test
examples from the previous lecture.
First Example
(Sample Mean = 106)
Note that the obtained value of t falls within the ‘reject H0’ region.
H0 stated that the mean of the population was
100, the sample mean we obtained was
106. The ‘p value’ in this case would be the
probability of obtaining a sample mean that
is 6 or more away from 100 (in either
direction).
The p value is the probability of obtaining a sample mean of 106 (t=3.82)
or greater, or a sample mean of 94 (t=-3.82) or less, if this curve (based
upon H0) is correct. In this case p = .012 (.006 on each tail). How to compute that
is covered next. For now, note that the obtained t falls inside a reject H0 region;
as those regions add up to .05, the p value of our result must be less than .05.
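One way to make the conditional probability concrete is to simulate it: pretend H0 is true, draw many samples of N=6, and count how often the t value lands at least as far out as 3.82 in either direction. This is only a sketch under stated assumptions. The population is assumed to be normal, the population standard deviation of 15 is an arbitrary choice (the t statistic does not depend on it), and the function name simulated_p_value is hypothetical.

```python
import math
import random

def simulated_p_value(t_obtained, n, n_samples=100_000, seed=1):
    """Estimate p(|t| at least as large as |t_obtained| | H0 is true)."""
    rng = random.Random(seed)
    mu_h0 = 100.0   # population mean according to H0
    sigma = 15.0    # ASSUMED population SD; any positive value gives the same t distribution
    as_extreme = 0
    for _ in range(n_samples):
        sample = [rng.gauss(mu_h0, sigma) for _ in range(n)]
        mean = sum(sample) / n
        variance = sum((y - mean) ** 2 for y in sample) / (n - 1)
        t = (mean - mu_h0) / math.sqrt(variance / n)
        if abs(t) >= abs(t_obtained):
            as_extreme += 1
    return as_extreme / n_samples

p_simulated = simulated_p_value(3.82, n=6)
print(p_simulated)  # should land close to .012
```

The proportion is an estimate, so it will wobble a little from one seed to the next, but it should stay near the p value reported for this example.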
p=0.012 (.006 on each tail). The t table is not very useful for computing the p
value here, as the table covers only a small number of possible p values (you can’t
look up t=3.82 and see what value of p goes with that). We have two tools we can
use: 1) in the t tool I have written, you can input the value of t and the df and the
tool will give you the value of p; and 2) if you have SPSS, analyze the data using the
‘One Sample t Test’ and it will give you both the value of t and the value of p.
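If neither tool is at hand, the same area can be approximated directly. The sketch below is not the t tool mentioned above and not what SPSS does internally; it simply integrates the t density numerically (Simpson's rule) using only the Python standard library. The function names t_pdf and two_tailed_p are illustrative.

```python
import math

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t_obtained, df, steps=2000):
    """Area in both tails beyond +/-|t_obtained| (steps must be even)."""
    b = abs(t_obtained)
    h = b / steps
    total = t_pdf(0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_zero_to_t = total * h / 3      # Simpson's rule on [0, |t|]
    return 1 - 2 * area_zero_to_t       # what is left over lies in the two tails

p = two_tailed_p(3.82, 5)
print(round(p, 3))  # 0.012
```

Each tail holds half of that area, which matches the .006 per tail given above.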
Second Example
(Sample Mean = 102)
Note the obtained value of t falls within the ‘do not reject H0’ region.
H0 stated that the mean of the population
was 100, the sample mean we obtained
was 102. The ‘p value’ in this case would
be the probability of obtaining a sample
mean that is 2 or more away from 100 (in
either direction).
The p value is the probability of obtaining a sample mean of 102 (t=1.27)
or greater, or a sample mean of 98 (t=-1.27) or less, if this curve (based
upon H0) is correct. Note that p must be greater than .05 here; in this case p≈.26
(about .13 on each tail).
Important!
Note again that the ‘p value’ is the probability of
obtaining a sample mean at least this far from what
H0 predicted, given that H0 is true. It is not the
probability that H0 is true given the sample mean
we obtained. Confusing the two is a common mistake.
‘A’ = getting a sample mean that far from what H0
predicted.
‘B’ = H0 being true.
p value = p(A|B).
It would be nice if it were p(B|A), or in other words if it
were the probability that H0 is true given our
sample mean, because that is what we really want
to know. To find p(B|A), however, we would
have to turn to Bayes’ theorem (we will take a look
at that again later in the semester).
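To see why p(A|B) and p(B|A) can differ, here is a rough numerical sketch of Bayes' theorem. Everything except p(A|B) = .012 is made up for illustration: the prior probability that H0 is true and the probability of such a result if HA is true (.30) are hypothetical values, not something the t test provides.

```python
def prob_h0_given_result(p_result_given_h0, p_result_given_ha, prior_h0):
    """Bayes' theorem: p(B|A) from p(A|B), p(A|not B), and the prior p(B)."""
    numerator = p_result_given_h0 * prior_h0
    denominator = numerator + p_result_given_ha * (1 - prior_h0)
    return numerator / denominator

p_a_given_h0 = 0.012   # the p value from the first example
p_a_given_ha = 0.30    # HYPOTHETICAL: chance of such a result if HA is true

for prior_h0 in (0.5, 0.9):   # HYPOTHETICAL priors
    posterior = prob_h0_given_result(p_a_given_h0, p_a_given_ha, prior_h0)
    print(f"prior p(H0)={prior_h0}: p(H0 | result)={posterior:.3f}")
```

With these invented numbers p(B|A) comes out at roughly .04 or .26 depending on the prior, and neither equals the p value of .012.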
3) Mechanical Understanding of ‘p values’
Looking back at the graphs with the rejection regions
and the p values, we can see that another way to
understand a p value is that it is the smallest
significance level at which we would still
reject H0.
Thus p=.04 means that we could have set the
significance level at .04 (rather than .05) and still
have rejected H0, while p=.06 means we
would have had to set our significance level at .06
(which is not allowed) in order to reject H0.
So What Is Our Significance Level Really?
We have to decide upon a significance level before
we run an analysis (we have to set up our decision
making criteria before we look at the results) and
so we set our significance level to .05.
Say we then analyze the data and find that p=.03
(this is often reported as ‘being significant at the
.03 level’).
Is our significance level .05 or .03? Authors often
write as if it were .03.
My Answer
The significance level is .05 and α=.05. We
would have rejected H0 if p=.05 or p=.049;
that is our criterion. After the fact we see that
we could have set the significance level at .03 and still have
rejected H0, but our decision-making criterion was .05, and thus that was the
actual probability of making a type 1 error if
H0 were true.
On ‘Significance’
Dictionary definition of significance: ‘Full of meaning,
important, momentous’.
Statistical Significance: ‘We were able to reject H0
(i.e. the results were unlikely if H0 were true)’.
When p is less than or equal to .05 we say the
results were ‘statistically significant’. This does not
necessarily mean the results are very meaningful or
important or momentous. It means only that the results
would be unlikely if H0 were true, so we reject H0, which
may or may not be important.
Example
Let’s go back to:
H0: μElbonia = 100 (same as USA)
HA: μElbonia ≠ 100 (different than the USA)
Let’s say that with a sample N=10,000 we
obtain a sample mean of 101. As we will
see in the lecture on power, with such a
large N even such a small difference is
likely to lead to rejection of H0.
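A quick calculation shows how a 1-point difference can become significant with N=10,000. The slides do not give a standard deviation for this example, so the sketch assumes a sample standard deviation of 15 (a typical value for IQ scales). With df = 9,999 the t distribution is nearly identical to the normal curve, so the p value is approximated with the normal distribution from the standard library.

```python
import math
from statistics import NormalDist

n = 10_000
sample_mean = 101.0
mu_h0 = 100.0
assumed_sd = 15.0   # ASSUMPTION: not given in the example

std_error = assumed_sd / math.sqrt(n)            # 15 / 100 = 0.15
t_obtained = (sample_mean - mu_h0) / std_error   # about 6.67

# Normal approximation to the two-tailed p value (reasonable for df = 9999).
p_approx = 2 * NormalDist().cdf(-abs(t_obtained))

print(round(t_obtained, 2))
print(p_approx < 0.05)
```

Under that assumption the standard error is only 0.15, so a difference of 1 point is more than six standard errors from what H0 predicted.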
The results are ‘statistically significant’
because we rejected H0.
Whether the results are ‘theoretically
significant’ would depend upon whether
rejecting H0 sheds important light on the
theory that predicted Elbonians would have
an IQ other than 100.
Whether the results are ‘socially significant’
would depend upon whether it significantly
adds to our world to know that Elbonians are
just a little tiny bit smarter on average than
people in the USA.
Bottom Line
Statistical significance is important because it
is a prerequisite to making something out of
the analysis.
Whether that something is important or trivial is
beyond the scope of statistics.
People sometimes lose track of that, and begin
to believe that getting statistically significant
results is an end in itself.
Final Point about Significance
(Really...I Promise)
One view on statistical significance is that results are either
statistically significant or they are not. To say something
‘neared significance’ or was ‘almost significant’ is
meaningless (like saying someone is almost pregnant) at
best and violates the logic of null hypothesis testing at
worst.
Others argue that .05 is arbitrary, and so a p value of .06 is just
about as good and probably reflects that the null hypothesis
is wrong.
More on this later in the semester when we tackle the
controversy surrounding null hypothesis testing.
However, it seems like both sides agree that if
the results are statistically significant (i.e.
p ≤ .05) then it makes sense to say that
p=.001 is more convincing than p=.049
(following along with the pregnancy
metaphor: if you are pregnant you can
either be at the beginning of the pregnancy
or well advanced).
Segue
Recall that when we run an experiment we are
testing a theory that makes some prediction, and
that prediction becomes our alternative hypothesis
(HA).
So far our examples have involved examining a
theory which predicts a difference (i.e. μElbonia ≠
100) but does not predict in which direction that
difference will go (whether μElbonia will be less than
or greater than 100). Such ‘nondirectional’
hypotheses are examined by putting a reject H0
region on both tails of the sampling distribution,
and are thus called ‘two-tailed tests’.
One-tailed Tests
When the theory we are testing specifically
predicts in which direction the difference
should go (i.e. we are testing a ‘directional’
hypothesis), then we perform a ‘one-tailed
test’.
First we will look at how to perform the t test
when the theory predicts that Elbonians
should have higher IQs than people in the
USA.
Writing the Hypotheses
If the theory predicts that the mean score of
Elbonians is greater than 100:
H0: μElbonia ≤ 100
HA: μElbonia > 100
Notes:
1. HA always states what the theory predicts.
2. Conceptually H0 is still the hypothesis of ‘no
difference’, but it needs to be written the way it
is to ensure that the two hypotheses are
mutually exclusive and exhaustive.
Setting up the Rejection Region
Note that the full .05 is in one tail now, making it easier to
reject H0 if the results go in that direction and impossible to
reject H0 if the results go in the other direction. The new t_critical
value is 2.015; for the two-tail test it was 2.571.
Determining Which Tail Gets the Rejection Region
H0: μElbonia ≤ 100
HA: μElbonia > 100
1. The conceptual approach: If HA is correct we will
want to ‘reject H0’, so put the reject H0 region
where HA predicts the results will fall (in this case
above 100).
2. The idiot-proof approach: pretend that the
symbol in HA is an arrow; it points to the tail with
the rejection region.
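Decision
t_obtained = 3.82 is greater than t_critical = 2.015, so it falls in the rejection region and we reject H0.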
p Value
With a one-tail test the p value is the area of the curve from the
t_obtained value to the end of the tail that has the rejection region.
p Value
The p value is the probability of getting a result that is that
far from what H0 predicted if H0 is true. We can see that the
p value must be less than .05; in fact p=.006.
Another Example
Now let’s see how to set things up if we are
testing a theory which specifically predicts
that the mean IQ of Elbonians is less than
100.
Writing Hypotheses
If the theory predicts that the mean score of
Elbonians is less than 100:
H0: μElbonia ≥ 100
HA: μElbonia < 100
Again, HA expresses the prediction made by
the theory, while H0 is every other possibility.
Setting Up the Rejection Region
The full .05 is now in the lower tail, so t_critical = -2.015.
Which Tail?
H0: μElbonia ≥ 100
HA: μElbonia < 100
1. The conceptual approach: If HA is correct we will
want to ‘reject H0’, so put the reject H0 region
where HA predicts the results will fall (in this case
below 100).
2. The idiot-proof approach: pretend that the symbol
in HA is an arrow; it points to the tail with the
rejection region.
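Decision
t_obtained = 3.82 falls in the upper tail, on the opposite side from the rejection region, so we do not reject H0.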
p Value
With a one-tail test the p value is the area of the curve from the
t_obtained value to the end of the tail that has the rejection region.
p Value
The p value is the probability of getting a result that is that
far from what H0 predicted if H0 is true. We can see that the
p value must be greater than .05; in fact p=.994.
The Trade-Off
Doing a one-tail test is a bit of a gamble. If
the theory is correct in its prediction then it
is easier to reject H0 in favor of HA (i.e. the
theory) because all .05 is put in the tail the
theory predicts. If the results go in the
opposite direction from that predicted by
the theory, however, then it is impossible to
reject H0 no matter how large the difference
is.
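The gamble can be seen by running the same t values through all three decision rules. This sketch uses the critical values quoted earlier for df=5 (2.571 two-tailed, 2.015 one-tailed); the function name decide and the labels "two-sided", "greater" and "less" are illustrative, and t=2.30 is a made-up value chosen to fall between the two critical values.

```python
T_CRITICAL_TWO_TAIL = 2.571   # df=5, .05 split across both tails
T_CRITICAL_ONE_TAIL = 2.015   # df=5, all .05 in one tail

def decide(t_obtained, alternative):
    """Return True if H0 is rejected under the given alternative hypothesis."""
    if alternative == "two-sided":    # HA: mean is not equal to 100
        return abs(t_obtained) >= T_CRITICAL_TWO_TAIL
    if alternative == "greater":      # HA: mean > 100
        return t_obtained >= T_CRITICAL_ONE_TAIL
    if alternative == "less":         # HA: mean < 100
        return t_obtained <= -T_CRITICAL_ONE_TAIL
    raise ValueError(f"unknown alternative: {alternative}")

for t in (3.82, 2.30, -3.82):   # 2.30 is a HYPOTHETICAL value
    for alternative in ("two-sided", "greater", "less"):
        print(t, alternative, "reject H0" if decide(t, alternative) else "do not reject H0")
```

The hypothetical t=2.30 is rejected only by the one-tailed test in the predicted direction, while t=-3.82 can never be rejected when HA predicts ‘greater’, however large the difference.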
Justifications for Making a Test One-Tailed
Bottom line: you have to decide that the test is
one-tailed before you obtain your data.
1. Refer to results from prior, similar
experiments.
2. You are testing a theory which specifically
predicts which way the results should go.
3. Only one tail is of interest.
SPSS Analysis
The t test for a single group mean can be found
in SPSS under the ‘Analyze>>Compare
Means>>One-Sample T Test...’ menu.
For the ‘Test Value’, input the
population mean according to H0.
SPSS Output
One-Sample Statistics
      N    Mean     Std. Deviation   Std. Error Mean
Y     6    106.00   3.847            1.571

One-Sample Test (Test Value = 100)
                                                         95% Confidence Interval of the Difference
      t       df   Sig. (2-tailed)   Mean Difference     Lower    Upper
Y     3.820   5    .012              6.000               1.96     10.04
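The t value in the output can be checked by hand from the summary statistics in the first table. This is only a sketch of the arithmetic (standard error = standard deviation divided by the square root of N, then t = difference divided by standard error); the raw scores are not given here, so the inputs are the rounded values SPSS printed.

```python
import math

# Summary values copied from the SPSS output above.
n = 6
sample_mean = 106.00
std_deviation = 3.847
test_value = 100          # population mean according to H0

std_error = std_deviation / math.sqrt(n)             # standard error of the mean
t_obtained = (sample_mean - test_value) / std_error  # mean difference in standard-error units
df = n - 1

print(round(std_error, 3), round(t_obtained, 2), df)  # 1.571 3.82 5
```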
SPSS Analysis (cont.)
When SPSS does a ‘One Sample T Test’ analysis it will give a
value of ‘t’ and a value of ‘p’. The value of t is t_obtained.
The value of p is for a two-tailed test. If you want the p
value for a one-tailed test do the following:
First, determine whether the difference between the sample
mean and population mean proposed by H0 was in the
direction predicted by HA.
Second, if the direction was predicted by HA then the actual
p value = (SPSS p value)/2; if the direction was the
opposite of that predicted by HA then the actual p value =
1 – ((SPSS p value)/2).
In the former case (HA made the correct prediction) the p value
is cut in half; in the latter case (HA made the wrong
prediction) the p value is greater than .50.
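The two-step conversion above is short enough to write as a function. The name one_tailed_p and its arguments are illustrative choices; the inputs are the two-tailed p value from SPSS and whether the difference went in the direction HA predicted.

```python
def one_tailed_p(two_tailed_p, direction_predicted):
    """Convert a two-tailed p value into the one-tailed p value."""
    half = two_tailed_p / 2
    # Correct prediction: half the two-tailed p. Wrong prediction: everything else.
    return half if direction_predicted else 1 - half

print(round(one_tailed_p(0.012, True), 3))   # 0.006
print(round(one_tailed_p(0.012, False), 3))  # 0.994
```

These are the same two one-tailed p values found earlier for the sample mean of 106: .006 when HA predicted a mean above 100, and .994 when HA predicted a mean below 100.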
SPSS Analysis (cont.)
The confidence interval given by SPSS in this
analysis is the interval that has the stated
probability (95% unless you indicate
otherwise) of containing the true value of the
difference between the mean of the
population from which the sample was
drawn, and the mean of the population as
stated by H0, i.e. the true value of:
Difference = μ_Y(actual) − μ_Y(according to H0)
SPSS Analysis (cont.)

Difference = μ_Y(actual) − μ_Y(according to H0)
Difference = μElbonia − 100
For the example when the sample mean was 106,
the SPSS output states that the 95% confidence
interval of the difference between the mean IQ of
Elbonians and the value proposed by H0 is:
1.96 ≤ Difference ≤ 10.04
As that interval does not contain ‘0’ (the difference H0 proposes) we reject H0.
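The interval limits can be reproduced from numbers already seen: the mean difference and standard error from the SPSS output, and the two-tailed t_critical value for df=5 quoted earlier (2.571). This is a sketch of the usual ‘difference plus or minus t_critical times standard error’ calculation, using the rounded values as printed.

```python
mean_difference = 6.000   # sample mean minus the Test Value (106 - 100)
std_error = 1.571         # Std. Error Mean from the SPSS output
t_critical = 2.571        # two-tailed critical value, df=5, significance level .05

margin = t_critical * std_error
lower = mean_difference - margin
upper = mean_difference + margin

# H0 says the difference is 0, so H0 is rejected when 0 lies outside the interval.
reject_h0 = not (lower <= 0 <= upper)

print(round(lower, 2), round(upper, 2), reject_h0)  # 1.96 10.04 True
```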
Assumptions
In all of the statistical tests we look at we will
be assuming that we have a valid measurement
procedure, and that there is no systematic
bias in our samples.
Assumptions Underlying This t Test
The t test for a single group mean has these
additional assumptions:
1. That the scores in the population are
normally distributed.
2. That the scores are independent.
Assumption of Normality
The t test is said to be ‘robust’ in terms of this
assumption, which means that the population can
be fairly non-normal without it having much of an
effect on the validity of the analysis, particularly
when the N of our sample is large (which will
influence the shape of the SDM towards
normality thanks to the Central Limit Theorem).
But some deviations from normality can create
problems. We will take a closer look at this
assumption in an upcoming lecture.
Assumption of Independence
The assumption of independence of scores
means that any one person’s score is
unaffected by (can’t be predicted by)
anyone else’s score. This would be
violated if we measured the same person
twice, or if we let two Elbonians work
together on the IQ test.
Independence of Errors
The ‘independence of the scores’ is sometimes
referred to as the ‘independence of errors’.
Calling it the ‘independence of errors’ will make
more sense within the context of the type of
analyses we will learn next semester, and we will
take a look at what it means at that time. For
now, just think of independence in terms of any
one score not influencing (i.e. not able to
predict) any other score in the sample.