RM_Diff_Betw_Means


Difference Between Means Test (“t” statistic)
Analysis of Variance (F statistic)
Difference Between Means (“t”) Test
•
So far we’ve examined several statistics that can be used to test hypotheses:
– Chi-Square (χ²), which requires all variables be categorical
– Regression (R²), which requires all variables be continuous
– Logistic regression (b and Exp b), which requires a nominal dependent variable
•
The difference between the means test (t) is used to test hypotheses with categorical
independent and continuous dependent variables
– Gender → height
– Gender → cynicism (1-5 scale)
•
We compare the means of two randomly drawn samples
– The null hypothesis can be rejected if the difference, reflected in the magnitude of
the t statistic, goes so far beyond what would be expected by sampling error that
there are fewer than five chances in 100 (p < .05) that the relationship between the
variables is due to chance
– This sampling error is called the “standard error of the difference between means”:
the difference between all possible pairs of means, due to chance alone
•
When using the t table we must know whether the hypothesis is 1-tailed (direction of
effect predicted) or 2-tailed (direction not predicted)
•
Major advantage: Remember that weak real-life effects can produce significant results?
– When comparing means, we know their actual values. This lets us recognize
situations where differences are, in the real world, trivial.
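Calculating t
1. Obtain the “pooled sample variance” $S_p^2$
(Simplified method: the midpoint between the two sample variances)
$$S_p^2 = \frac{s_1^2 + s_2^2}{2}$$
2. Compute the S.E. of the Diff. Between Means
$$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{S_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}$$
3. Compute the t statistic: the actual (“obtained”) difference between means divided by the predicted difference due to sampling error
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sigma_{\bar{x}_1 - \bar{x}_2}}$$
4. Compute the “degrees of freedom”: df = n1 + n2 - 2 (total number of cases in both samples minus 2)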
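The four steps above can be sketched in Python. This is a minimal illustration rather than part of the original slides: the function name `t_from_summary` and its parameter names are hypothetical, and it assumes the simplified pooled variance (the midpoint of the two sample variances) named in step 1. The example numbers are the rounded means and variances from the parking lot exercise in this transcript.

```python
import math

def t_from_summary(mean1, mean2, var1, var2, n1, n2):
    """Return (t, df) using the simplified pooled-variance method."""
    pooled_var = (var1 + var2) / 2                        # step 1
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))   # step 2
    t = (mean1 - mean2) / se_diff                         # step 3
    df = n1 + n2 - 2                                      # step 4
    return t, df

# Parking lot exercise: means 1.4 and 3.3, variances .27 and 2.7, n = 10 each.
t, df = t_from_summary(1.4, 3.3, 0.27, 2.7, 10, 10)
print(round(t, 1), df)  # -3.5 18
```

The sign of t only reflects which sample was entered first; the magnitude is what gets compared against the t table.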
•
The t is a ratio: the greater the difference between means and the smaller the predicted error, the larger
the t coefficient
•
The larger the t, the more likely we are to reject the null hypothesis. According to the null, any
relationship between variables, any difference between means, is due to chance. Our actual difference
between means must be “significantly” larger than the difference we would obtain by chance.
•
We use a table to determine whether the t is large enough to reject the null hypothesis (see next slide).
We can reject the null if the probability that the difference between means is due to chance is less
than five in one-hundred (p< .05).
•
If the probability that the difference between the means is due to chance is five in one-hundred or
larger (p ≥ .05), we cannot reject the null hypothesis.
Classroom exercise - Jay’s police department
H1: Male officers more cynical than females (1 - tailed)
H2: Officer gender determines cynicism (2 - tailed)
1. Draw one sample of male officers, one of females
2. Compute each sample’s variance, then obtain the pooled sample variance
3. Compute the S.E. of the Difference Between Means
4. Calculate the t coefficient
5. Compute the degrees of freedom: df = n1 + n2 - 2
6. Check the t table (next slide) for significance. Confirm the (working) hypothesis if there
are fewer than 5 chances in 100 that the null is true. Be sure to use the correct significance
row (1-tailed or 2-tailed). Use the one-tailed test if the working hypothesis predicted the
direction of the difference (H1) - that is, that one group, male or female, would be
significantly more cynical than the other. Use the two-tailed test if you predicted that
cynicism would be significantly different, but not which group would be more cynical
(H2). One-tailed hypotheses require a smaller t to reach statistical significance.
t-table
1. Is hypothesis one-tailed (direction of change
in the DV predicted) or two-tailed (direction not
predicted)?
H1: Males more cynical than females.
This is one-tailed, so use the top row.
H2: Males and females differ in cynicism.
This is two-tailed, so use the second row.
2. df, “Degrees of Freedom” represents sample
size – add the numbers of cases in both samples,
then subtract two: df = n1 + n2 – 2
3. To call a t “significant” (thus reject the null
hypothesis) the coefficient must be as large or
larger than what is required at the .05 level; that
is, we cannot take more than 5 chances in 100
that the difference between means is due to
chance.
•
For a one-tailed test, use the top row, then
slide over to the .05 column. For a two-tailed
test, use the second row, then slide to .05
column. If the t is smaller than the number at
the intersection of the .05 column and the
appropriate df row, it is non-significant.
•
If the t is that size or larger, it is significant.
Slide to the right to see if it is large enough
to be significant at a more stringent level.
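The table lookup described above can be sketched as a comparison against a critical value. This is an illustrative excerpt only, not the full t table: the dictionary `CRITICAL_T_05` and the function `is_significant` are hypothetical names, and only the .05-level values for df = 8 and df = 18 (the two cases worked in this transcript) are included.

```python
# Critical t values at the .05 level for two df rows only (an excerpt,
# not a full t table). Keys: df -> {"one": one-tailed, "two": two-tailed}.
CRITICAL_T_05 = {
    8: {"one": 1.860, "two": 2.306},
    18: {"one": 1.734, "two": 2.101},
}

def is_significant(t, df, tails):
    """True if |t| meets or exceeds the .05 critical value.

    For a one-tailed test the caller must also confirm that the sign of t
    matches the predicted direction; this sketch checks magnitude only.
    """
    return abs(t) >= CRITICAL_T_05[df][tails]

print(is_significant(1.14, 18, "one"))   # homework t: False
print(is_significant(-3.5, 18, "two"))   # parking lot t: True
```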
Parking lot exercise
Higher income → More expensive car
1. Transfer your panel’s data from the other coding sheet
2. Compute each sample’s variance, then calculate the pooled sample variance
$$S_p^2 = \frac{.27 + 2.7}{2} = \frac{2.97}{2} = 1.49$$
3. Compute the S.E. of the Difference Between Means
$$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{1.49 \left( \frac{1}{10} + \frac{1}{10} \right)} = \sqrt{1.49\,(.2)} = \sqrt{.3} = .55$$
…continued on next slide
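Sample 1
Score   Mean   Difference   Difference squared
2       1.4    .6           .36
1       1.4    .4           .16
1       1.4    .4           .16
1       1.4    .4           .16
1       1.4    .4           .16
1       1.4    .4           .16
2       1.4    .6           .36
1       1.4    .4           .16
2       1.4    .6           .36
2       1.4    .6           .36
Sum of squared differences: 2.4     Variance: .27

Sample 2
Score   Mean   Difference   Difference squared
4       3.3    .7           .49
5       3.3    1.7          2.9
2       3.3    1.3          1.7
5       3.3    1.7          2.9
2       3.3    1.3          1.7
1       3.3    2.3          5.3
4       3.3    .7           .49
4       3.3    .7           .49
5       3.3    1.7          2.9
1       3.3    2.3          5.3
Sum of squared differences: 24.17   Variance: 2.7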
4. Calculate the t coefficient
$$t = \frac{1.4 - 3.3}{.55} = \frac{-1.9}{.55} = -3.5$$
Note: the sign, + or -, indicates the direction of
the difference between groups. Keep that in
mind! It turns out that, consistent with the
hypothesis, the faculty lot has the more
expensive cars. We had arbitrarily placed it
second, so subtracting yields a negative t.
5. Compute the df (degrees of freedom)
df = n1 + n2 - 2 = 10 + 10 - 2 = 18
6. Check the t table for significance.
Note: Use the one-tailed test if the working
hypothesis predicts the direction of the
difference (which parking lot would have
more expensive cars). Use the two-tailed test if
the hypothesis predicts there will be a
difference in car values between the lots, but
not which lot would have the more expensive
cars.
Yes! There are less than five chances
in one-thousand that the null
hypothesis is true (one-tailed) or less
than one in one-hundred that it is true
(two-tailed)
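The two sample variances used in this exercise (.27 and 2.7) come from the deviation columns of the coding sheet: each score's difference from the sample mean is squared, the squares are summed, and the sum is divided by n - 1. A minimal Python sketch of that procedure follows; the function name `sample_variance` and the variable names are hypothetical. Computed without intermediate rounding, the second sample's sum of squares is 24.1 rather than the slide's 24.17, and the variance still rounds to 2.7.

```python
def sample_variance(scores):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(scores)
    mean = sum(scores) / n
    squared_devs = [(x - mean) ** 2 for x in scores]
    return sum(squared_devs) / (n - 1)

lot_1 = [2, 1, 1, 1, 1, 1, 2, 1, 2, 2]
lot_2 = [4, 5, 2, 5, 2, 1, 4, 4, 5, 1]  # the faculty lot (placed second)

var_1 = sample_variance(lot_1)
var_2 = sample_variance(lot_2)
print(round(var_1, 2), round(var_2, 1))  # 0.27 2.7
```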
More complex mean comparisons:
Analysis of Variance
When there are more than two groups:
Analysis of Variance
Dependent variable: continuous
Independent variable(s): categorical
Example: does officer professionalism vary between cities? (scale 1-10)
City    L.A.   S.F.   S.D.
Mean    8      5      3
Calculate the “F” statistic, look up the table. An “F” statistic that is sufficiently
large can overcome the null hypothesis that the differences between the
means are due to chance.
“Two-way” Analysis of Variance
•
Stratified independent variable(s)
City       L.A.   S.F.   S.D.
Mean – M   10     7      5
Mean – F   6      3      2
[Slide diagram labels: “Within” and “Between”]
The F statistic is a ratio of “between-group” to “within-group” differences. To
overcome the null hypothesis, the differences in scores between groups
(between cities and, overall, between genders) should be much greater than
the differences in scores within cities
$$F = \frac{\text{Between-group variance (error + systematic effects of ind. variable)}}{\text{Within-group variance (how scores disperse within each city)}}$$
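To make the between/within ratio concrete, here is a minimal Python sketch of the one-way case only (cities, without the gender stratification). The slides give only the city means, so the individual scores below are hypothetical values chosen to reproduce means of 8, 5, and 3; the function name `one_way_f` is likewise an assumption for illustration.

```python
def one_way_f(groups):
    """F = between-group mean square / within-group mean square."""
    all_scores = [x for g in groups for x in g]
    grand_mean = sum(all_scores) / len(all_scores)
    means = [sum(g) / len(g) for g in groups]

    # Between groups: how far each group mean sits from the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    df_between = len(groups) - 1

    # Within groups: how scores disperse around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g)
    df_within = len(all_scores) - len(groups)

    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical professionalism scores; city means are 8, 5 and 3.
la, sf, sd = [7, 8, 9], [4, 5, 6], [2, 3, 4]
f_stat = one_way_f([la, sf, sd])
print(round(f_stat, 2))  # 19.0
```

With these made-up scores the city means are far apart relative to the spread inside each city, so F is large; the F table would still be needed to judge significance.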
Homework
Two random samples of 10 patrol officers from the XYZ
Police Department, each officer tested for cynicism
(continuous variable, scale 1-5)
Sample 1 scores: 3 3 3 3 3 3 3 1 2 5 -- Variance = .99
Sample 2 scores: 2 1 1 2 3 3 3 3 4 2 -- Variance = .93
Pooled sample variance $S_p^2$
Simplified method: midpoint between the two sample variances
$$S_p^2 = \frac{s_1^2 + s_2^2}{2}$$
Standard error of the difference between means
$$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{S_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}$$
T-Test for significance of the difference between means
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sigma_{\bar{x}_1 - \bar{x}_2}}$$
CALCULATIONS
Pooled sample variance: .96
Standard error of the difference between means: .44
t statistic: 1.14
df – degrees of freedom: (n1 + n2) – 2 = 18
Would you use a ONE-tailed t-test OR a TWO-tailed t-test?
Depends on the hypothesis
Two-tailed (does not predict direction of the change): Gender → cynicism
One-tailed (predicts direction of the change): Males more cynical than females
Can you reject the NULL hypothesis? (probability
that the t coefficient could have been produced by
chance must be less than five in a hundred)
NO – For a ONE-tailed test need a t of 1.734 or higher
NO – For a TWO-tailed test need a t of 2.101 or higher
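The homework figures can be checked by substituting into the three formulas. The sample means (2.9 and 2.4) are not stated on the slide; they are computed here from the listed scores.

```latex
S_p^2 = \frac{.99 + .93}{2} = .96

\sigma_{\bar{x}_1 - \bar{x}_2}
  = \sqrt{.96 \left( \frac{1}{10} + \frac{1}{10} \right)}
  = \sqrt{.192} \approx .44

t = \frac{2.9 - 2.4}{.44} = \frac{.5}{.44} \approx 1.14
```

Because 1.14 is below both 1.734 (one-tailed) and 2.101 (two-tailed) at df = 18, the null hypothesis cannot be rejected under either test.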
Final exam practice
•
You will be given scores and variances for two samples and asked to decide whether their
means are significantly different.
•
You will be asked to state the null hypothesis. You will then compute the t statistic. You will be
given formulas, but should know the methods by heart. Please refer to week 15 slide show.
•
To compute the t you will compute the pooled sample variance and the standard error of the
difference between means.
•
You will then compute the degrees of freedom (adjusted sample size) and use the t table to
determine whether the coefficient is sufficiently large to reject the null hypothesis.
– Print and bring to class:
http://www.sagepub.com/fitzgerald/study/materials/appendices/app_f.pdf
– Use the one-tailed test if the direction of the effect is specified, or two-tailed if not
•
You will be asked to express using words what the t-table conveys about the significance (or
non-significance) of the t coefficient
•
Sample question: Are male CJ majors significantly more cynical than female CJ majors? We
randomly sampled five males and five females. Males: 4, 5, 5, 3, 4 Females: 4, 3, 4, 4, 5
– Null hypothesis: No significant difference between cynicism of males and females
– Variance for males (provided): 0.7 Variance for females (provided): 0.5
– Pooled sample variance = .6 SE of the difference between means = .49 t = .41 df = 8
– Check the “t” table. Can you reject the null hypothesis? NO
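– Describe conclusion using words: The obtained t of .41 is smaller than the 1.86 required
at the .05 level (one-tailed test, df = 8), so we cannot reject the null hypothesis of no
significant difference in cynicism; there are more than five chances in 100 that the
difference between means is due to chance.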
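The sample question can be verified from the raw scores with Python's standard `statistics` module. This is a sketch for checking your hand calculation, not a required method; the variable names are hypothetical, and it uses the simplified pooled variance (midpoint of the two variances) from the slides.

```python
import math
import statistics

males = [4, 5, 5, 3, 4]
females = [4, 3, 4, 4, 5]

# statistics.variance uses the n - 1 denominator (sample variance).
var_m = statistics.variance(males)      # 0.7
var_f = statistics.variance(females)    # 0.5

pooled_var = (var_m + var_f) / 2
se_diff = math.sqrt(pooled_var * (1 / len(males) + 1 / len(females)))
t = (statistics.mean(males) - statistics.mean(females)) / se_diff
df = len(males) + len(females) - 2

print(round(pooled_var, 2), round(se_diff, 2), round(t, 2), df)
# 0.6 0.49 0.41 8
```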