Infer_Stat_II

Download Report

Transcript Infer_Stat_II

Review - Confidence Interval
-1.96
-1.0
17 pct. of scores
0
+1.0 +1.96
66 pct.
17 pct. of scores
100 percent of scores
2 ½ pct.
•
•
•
95 percent of scores
2 ½ pct.
Most variables used in social science research (e.g., age, officer cynicism) are normally
distributed, meaning that their scores cluster around the population mean (parameter)
If we take repeated random samples from a normal or near-normal population, the
means of these samples will also distribute normally
Placing the mean of a random sample at the center, and specifying the level of certainty
that we want by setting the corresponding z score, we create a “confidence interval”, a
range of means into which, to our chosen certainty, the population mean will fall
Homework
• Two random samples of officers tested for cynicism
• For each sample, we needed to specify the confidence interval into
which the population parameter (mean of population) will fall, to a 95
percent certainty
– In social science research we don’t want to take more than five
chances in 100, or 5 percent, of being wrong
• Remember that 95 percent of the cases in a normally distributed
population fall between a z of -1.96 and +1.96 (meaning that 5 percent
will not)
• So - always use a z of 1.96
Sample 1 (n=10)
Officer
Score
Mean
Diff.
Sq.
1
3
2.9
.1
.01
2
3
2.9
.1
.01
3
3
2.9
.1
.01
4
3
2.9
.1
.01
5
3
2.9
.1
.01
6
3
2.9
.1
.01
7
3
2.9
.1
.01
8
1
2.90
1.9
3.61
9
2
2.9
.9
.81
10
5
2.9
2.1
4.41
Sum
8.90
s2
.99
Variance (sum of squares / n-1)
Standard deviation (sq. root of variance)
s
.99
Sample 2 (n=10)
Officer
Score
Mean
Diff.
Sq.
1
2
2.4
.4
.16
2
1
2.4
1.4
1.96
3
1
2.4
1.4
1.96
4
2
2.4
.4
.16
5
3
2.4
.6
.36
6
3
2.4
.6
.36
7
3
2.4
.6
.36
8
3
2.4
.6
.36
9
4
2.4
1.6
2.56
10
2
2.4
.4
.16
Sum
s2
Variance (sum of squares / n-1)
Standard deviation (sq. root of variance)
8.40
s
.93
.97
Standard error of the mean of the population of patrol officers
Estimated difference, due to chance, between the means of
repeated random samples (in this example, of 10 patrol officers)
and the population parameter (mean of the population)
s
Sx = ------___
 n-1
Estimate based on sample 1 = .33
Estimate based on sample 2 = .32
Confidence interval into which the population mean should fall with a 95 percent certainty
(probability 5 percent that population parameter - mean - falls outside this range)
ci =x +/- z (Sx )
Results based on sample 1:
Mean = 2.9
Sx = .33 (z = 1.96)
2.9 +/- .65
Left limit = 2.25
Right limit = 3.55
Results based on sample 2:
Mean = 2.4
Sx = .32 (z = 1.96)
2.4 +/- .63
Left limit = 1.77
Right limit = 3.03
Answer for final exam
• There will be one word question on the final for each statistical
technique covered during the period
• Here is the question for confidence interval:
– Explain what you would tell the police chief that the confidence
interval that you computed actually means
• Here is a good answer:
– There are ninety-five chances in one-hundred that, based on the
mean __(insert variable)___ calculated from a random sample of
(insert sample size) officers, the mean for all the officers in the
department would fall between _______ and _______.
Review - Confidence Interval
z scores
-1.96 -1.0
0
Sample 1
2.25
2.9
3.55
Sample 2
1.77
2.4
3.03
2 ½ pct.
•
•
•
•
+1.0 +1.96
95 percent of scores
2 ½ pct.
Our first sample had a mean of 2.9. The second sample mean was 2.4.
We used a z of 1.96, which set the probability that the population mean would fall within
our confidence interval at 95 percent
Based on sample 1, there are 95 chances in 100 that the population mean (parameter)
falls between 2.25 and 3.55. Or, there are 5 chances in 100 that it doesn’t.
Based on sample 2, there are 95 chances in 100 that the population mean (parameter)
falls between 1.77 and 3.03. Or, there are 5 chances in 100 that it doesn’t.
How to narrow the confidence interval #1
•
•
•
Increase the probability of being wrong! That is, the chance that the population
mean (parameter) actually falls OUTSIDE the calculated range
First, select a smaller z-score. Say, z = 1.0
– According to the z-table, .3413 of cases (34%) fall between the mean and z = 1.0
– So, another 34% of the cases must fall between the mean and z = -1.0
– That’s 68% of the cases. So, 32% of the cases must fall outside this range
(68% + 32% = 100%)
– This presents a much bigger chance of being wrong than by using a z of 1.96,
where only 5% of the cases fall outside the range
In social science research, we always use a z of 1.96 to set confidence intervals,
because we don’t want to take a greater than 5 in 100 chance of being wrong
ci =x +/- z (Sx )
•
Results based on sample 1:
– Mean = 2.9
Sx = .33 z = 1.00
– 2.9 +/- .33 Left limit = 2.57 Right limit = 3.23
•
Results based on sample 2:
– Mean = 2.4
Sx = .32 z = 1.00
– 2.4 +/- .32 Left limit = 2.08 Right limit = 2.72
How to narrow the confidence interval #2
•
Increase the sample size!
– This lets you use z = 1.96, so that the probability that the population parameter
falls within the confidence interval is the social science required 95 percent
ci =x +/- z (Sx )
•
Increase sample 1 size to 30, and to 100
•
To keep things simple, the sum of squares (8.9) was tripled, and also multiplied by
10. The mean (2.9) was kept the same
n = 30
n = 100
New sum of squares = 26.7
New sum of squares = 89
s2 (variance) = .92
s2 (variance) = .9
s (standard deviation) = .96
s (standard deviation) = .95
Sx (standard error of the mean) = .18
Sx (standard error of the mean) = .1
z (Sx ) = .35
z (Sx ) = .2
Confidence interval = 2.55  3.25
Confidence interval = 2.7  3.1
Old ci was 2.25  3.55
Old ci was 2.25  3.55
Diff. Between Means Test
•
Independent variable is categorical and the dependent variable is
continuous
•
To determine if these variables are associated compare the means of two
randomly drawn samples
Is the difference between the means so large that we can reject the null
hypothesis (that the difference was produced by chance?)
•
– Must calculate standard error of the difference between means
• The difference between all possible pairs of means, due to chance
alone
– To do so:
• Obtain the “pooled sample variance” (for us, half-way between the
obtained variances)
• Compute the S.E. of the Diff. Between Means
• Compute the t-statistic
Pooled sample variance Sp2
Simplified method: midpoint between the two sample variances
2
Sp =
s2 1 + s2 2
2
Standard error of the difference between means
x
1
-x2 =  Sp2 (
1
+
n1
1
x1
-x2
)
n2
T-Test for significance of the difference between means
x1 -x2
t = -------------x -x
1
2
df = (n1 + n2) - 2
• The result is a ratio
• Our obtained difference between means
is on top
t=
x 1 - x 2
_____________________

x 1 - x 2
• The predicted difference due to error is
on the bottom
• The greater our difference, the larger
the t
• The larger the t, the more likely we are
to reject the null hypothesis (see the ttable in the back of the book)
• When we reject the null hypothesis, we
say that the difference between means is
statistically significant, meaning that the
samples were not drawn from the same
population – we only thought they were
Class exercise -- Difference Between Means Test
H1: Male officers more cynical than females (1 - tailed)
H2: Officer gender determines cynicism (2 - tailed)
1.
2.
3.
4.
5.
What is the null hypothesis?
Draw one sample of male officers, one of females
Compute the Standard Error of the Difference Between Means
Perform a t-test
Use the table in the back of the book to check for significance
(no more than 5 chances in 100 that the null hypo. is correct)
Note: one tailed hypotheses (direction of the effect on the dependent variable is
predicted) require a smaller “t” to reach statistical significance than two-tailed
hypotheses, where only an effect is predicted, not its direction
Discuss homework assignment