File - collingwoodresearch

POSC 202A: Lecture 9
Lecture: statistical significance.
Statistical Significance
The fundamental Question:
How likely is the result we observe to be the
product of chance?
This question drives all of the tests we perform by
allowing us to differentiate the systematic
from the stochastic.
Statistical Significance
Confidence Interval: An interval calculated from sample data that, over repeated
sampling, captures the true population parameter a fixed percentage of the time.
It is not guaranteed to contain the parameter in any single sample.
It tells us how large an interval we need to create in order to
capture the true population value in some fixed percentage of
the intervals we draw.
Statistical Significance
Think of it this way:
We can draw a sample in order to estimate some statistic. If we
repeat this over and over, the collection of estimates forms a
sampling distribution.
To create a 95% CI, we need to find how large an interval around
the statistic must be so that, across all the samples drawn, 95%
of the intervals contain the true population parameter (value).
Page 386 has a nice graph of this.
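To see the repeated-sampling idea in action, here is a small simulation sketch. It assumes a hypothetical true proportion of .60 and samples of 400 (borrowing the Dear Abby numbers used later), builds a +/- 2 SD interval around each sample's estimate, and counts how often that interval captures the true value. The function names are illustrative, not from any library.

```python
import math
import random

def ci_covers(p_true, n, rng, z=2.0):
    """Draw one sample of size n and report whether its CI contains p_true."""
    # Each respondent answers "yes" with probability p_true (assumed value).
    successes = sum(rng.random() < p_true for _ in range(n))
    p_hat = successes / n
    sd = math.sqrt(p_hat * (1 - p_hat) / n)  # estimated SD of the sample proportion
    return p_hat - z * sd <= p_true <= p_hat + z * sd

def coverage(p_true=0.6, n=400, trials=2000, seed=42):
    """Share of repeated samples whose +/- 2 SD interval captures p_true."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    hits = sum(ci_covers(p_true, n, rng) for _ in range(trials))
    return hits / trials

print(coverage())  # close to 0.95, though not exactly, because of sampling noise
```

Any single interval either contains .60 or it does not; the 95% describes how often the procedure succeeds across many samples.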
Statistical Significance
Confidence Interval:
An example: a 95% confidence interval is the range needed to
capture the true population value in 95% of the intervals we
draw from a population.
Statistical Significance
Confidence Interval:
The interval thus captures the variability inherent in using samples
to draw inferences about a population.
To construct it, we need an estimate of the population mean and the
standard deviation.
Reported in the form:
Estimate ± Margin of Error
Generally, the interval runs from:
mean − (z × sd) to mean + (z × sd)
Statistical Significance
Confidence Interval:
How do we calculate this for a sampling distribution?
For proportions:
Mean = p
$SD = \sqrt{\frac{p(1-p)}{n}}$
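A minimal Python sketch of these formulas, assuming the +/- 2 SD rule of thumb (z = 2) for a roughly 95% interval. The helper names `sd_proportion` and `confidence_interval` are hypothetical, and the inputs (p = .5, n = 100) are only illustrative.

```python
import math

def sd_proportion(p, n):
    """SD of the sampling distribution of a proportion: sqrt(p(1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)

def confidence_interval(estimate, sd, z=2.0):
    """Return (lower, upper) = estimate -/+ z * sd; z = 2 gives roughly 95%."""
    margin = z * sd
    return estimate - margin, estimate + margin

# Illustrative inputs: p = 0.5 with n = 100
sd = sd_proportion(0.5, 100)
print(sd)                            # approximately 0.05
print(confidence_interval(0.5, sd))  # approximately (0.4, 0.6)
```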
Refresher: Rule of Thumb
Recall that with a normal distribution:
Apportionment of area about the mean is
+/- 1sd= 68%
+/- 2sd= 95%
+/- 3sd= 99.7%
So, a 95% CI corresponds to
x̄ +/- 2 SD (more precisely, +/- 1.96 SD)
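These percentages can be checked with Python's standard `math.erf`. The sketch assumes a standard normal curve, where the area within k SDs of the mean equals erf(k/√2); it also shows why the exact 95% multiplier is 1.96 rather than 2.

```python
import math

def area_within(k):
    """Share of a normal distribution lying within k SDs of the mean."""
    # For a standard normal Z, P(-k < Z < k) = erf(k / sqrt(2)).
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(area_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
print(round(area_within(1.96), 4))  # 0.95
```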
Statistical Significance
Let's create an example using a sample from Dear Abby, in which
400 women responded, of whom 60% would rather just
cuddle than have sex with their husbands.
What do we want to know?
Is 60% beyond what we would expect to see
due to chance alone?
Is this likely to be just random variation?
Sample mean = p = .6, so:
$SD = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{.6(1-.6)}{400}} = \frac{\sqrt{.24}}{\sqrt{400}} = \frac{.49}{20} \approx .0245$
Confidence Interval
Our confidence interval is thus:
x̄ +/- 2 SD = 95% CI
Or
.6 +/- 2(.0245) = .6 − .049 to .6 + .049
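The same arithmetic as a short Python sketch, using the lecture's numbers (p = .60, n = 400) and the 2 SD multiplier. Variable names are illustrative.

```python
import math

p_hat = 0.60   # share preferring to cuddle
n = 400        # respondents

sd = math.sqrt(p_hat * (1 - p_hat) / n)   # sqrt(.24 / 400)
margin = 2 * sd                            # 2 SD, roughly 95%
lower, upper = p_hat - margin, p_hat + margin

print(round(sd, 4))                        # 0.0245
print(round(lower, 3), round(upper, 3))    # 0.551 0.649
```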
Confidence Interval
Rounding .049 off to .05 gives the interval .55 to .65.
[Figure: 95% CI on a number line, running from .55 to .65 around x̄ = .60.]
Confidence Interval
In 95% of all samples, an interval constructed this way will capture
the true population parameter. For our sample, the interval runs
from .55 to .65.
From this we conclude that we are 95%
confident that between 55% and 65% of
women prefer cuddling to sex.
Confidence Interval
What would we have expected if they
answered randomly?
Exercise
A study of graduate placement examines 50 graduates and finds
that 20% earn jobs at universities that are not teaching
intensive.
Construct a confidence interval that allows
us to assess whether this result is too
small to attribute to chance.
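To check your work on the exercise, here is a hedged sketch that reuses the same proportion formula. It only builds the +/- 2 SD interval for p = .20 and n = 50; comparing it with a chance value is left to you. The helper name `proportion_ci` is hypothetical.

```python
import math

def proportion_ci(p_hat, n, z=2.0):
    """Return (lower, upper) for a sample proportion using p_hat -/+ z * SD."""
    sd = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * sd, p_hat + z * sd

# Exercise inputs: 20% of 50 graduates.
lower, upper = proportion_ci(0.20, 50)
print(f"95% CI: {lower:.3f} to {upper:.3f}")
```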
Statistical Significance
How do our estimates change with the size of the sample?
Recall we found
$SD = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{.6(1-.6)}{400}} = \frac{\sqrt{.24}}{\sqrt{400}} = \frac{.49}{20} \approx .0245$
The mean of the sampling distribution (p) stays the same
regardless of sample size. But what happens to the SD?
Sample Size and Confidence Intervals
Where the sample statistic is .5
Sample   Mean   N       SD      95% CI
#1       0.5    100     0.05    0.4 to 0.6
#2       0.5    200     0.035   0.43 to 0.57
#3       0.5    300     0.029   0.44 to 0.56
#4       0.5    400     0.025   0.45 to 0.55
#5       0.5    500     0.022   0.46 to 0.54
#6       0.5    1000    0.016   0.47 to 0.53
#7       0.5    10000   0.005   0.49 to 0.51
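The table above can be regenerated with a few lines of Python. This is a sketch assuming p = .5 and the 2 SD multiplier, as in the table; `sample_size_table` is an illustrative name.

```python
import math

def sample_size_table(p=0.5, sizes=(100, 200, 300, 400, 500, 1000, 10000)):
    """Return (n, SD, lower, upper) rows for a proportion p at each sample size."""
    rows = []
    for n in sizes:
        sd = math.sqrt(p * (1 - p) / n)  # SD shrinks with the square root of n
        rows.append((n, sd, p - 2 * sd, p + 2 * sd))
    return rows

for n, sd, lower, upper in sample_size_table():
    print(f"N = {n:>5}   SD = {sd:.3f}   95% CI: {lower:.2f} to {upper:.2f}")
```

Because the SD depends on 1/√n, quadrupling the sample (100 to 400) only halves the SD (.05 to .025).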
[Figure: Sample Size and Standard Error of an Average. Standard error (y-axis, 0 to 0.06) plotted against sample size (x-axis, 0 to 6000), with points labeled N = 100, 200, 300, 400, 500, 1000, and 5000.]
Confidence Interval
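A shortcut for approximating the 95% CI for
a proportion:
$p \pm \frac{1}{\sqrt{N}}$
This is exact for a 2 SD interval at an even split (i.e. 50%), since
2√(.5(1 − .5)/N) = 1/√N; the further p moves from .5, the more the
shortcut overstates the margin.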
Confidence Interval
Mean   Sample Size   SD = √(p(1 − p)/N)   2 × SD   2 SD approximation (1/√N)
0.10   900           0.010                0.020    0.033
0.20   900           0.013                0.027    0.033
0.30   900           0.015                0.031    0.033
0.40   900           0.016                0.033    0.033
0.50   900           0.017                0.033    0.033
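A sketch comparing the exact 2 SD margin with the 1/√N shortcut for N = 900, the sample size used in the table. The helper names `two_sd` and `shortcut` are hypothetical.

```python
import math

def two_sd(p, n):
    """Exact 2-SD margin for a proportion: 2 * sqrt(p(1 - p) / n)."""
    return 2 * math.sqrt(p * (1 - p) / n)

def shortcut(n):
    """Rule-of-thumb 95% margin: 1 / sqrt(n)."""
    return 1 / math.sqrt(n)

n = 900
for p in (0.10, 0.20, 0.30, 0.40, 0.50):
    print(f"p = {p:.2f}   2 SD = {two_sd(p, n):.3f}   1/sqrt(N) = {shortcut(n):.3f}")
```

The shortcut never falls below the exact margin, so it errs on the side of a wider (more cautious) interval.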
Statistical Significance
Recall--The fundamental Question:
How likely is the result we observe to be the
product of chance?
This question drives all of the tests we perform by
allowing us to differentiate the systematic
from the stochastic.
Significance testing is all about comparisons.
Statistical Significance
Is what we observe close to or far from what we
expect?
Is what we observe so far from what we expect
that we cannot attribute it to chance alone?
Statistical Significance
An example: Imagine we took a valid random sample of women’s
preferences and got the same result as Dear Abby’s survey
of women’s cuddling preferences (60% preferred cuddling to
sex; 400 women responded).
How would we conduct a significance test?
We need to identify the appropriate comparison.
What if women randomly answered “snuggle” or “sex”?
Statistical Significance
What would we expect if women just answered randomly?
If so, we might expect 50% of respondents to prefer snuggling.
Then we need to know whether what we observed (60%) is too large a
result to be attributed to chance alone.
How can we determine this?
One way is to estimate a 95% confidence interval around our
sample mean (60%) and see whether it contains the result we would
expect due to chance alone (50%).
Confidence Interval
Recall, we created the following interval earlier. We can simply
look to see whether .50 falls within it.
Since it does not, our sample result is statistically significantly
different from chance.
[Figure: number line showing the 95% CI from .55 to .65 around x̄ = .60, with the chance value .50 lying outside it.]
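The confidence-interval check can be written as a small Python function. This sketch assumes the +/- 2 SD interval used throughout; `ci_excludes` is a hypothetical name, and the inputs are the Dear Abby numbers (60% of 400, chance value 50%).

```python
import math

def ci_excludes(value, p_hat, n, z=2.0):
    """True if `value` lies outside the interval p_hat -/+ z * SD."""
    sd = math.sqrt(p_hat * (1 - p_hat) / n)
    lower, upper = p_hat - z * sd, p_hat + z * sd
    return not (lower <= value <= upper)

# Dear Abby numbers: 60% of 400, compared with a chance value of 50%.
print(ci_excludes(0.50, 0.60, 400))  # True: .50 falls outside .551 to .649
```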
Statistical Significance
How can we determine this?
A second way is to conduct a significance test by solving for
areas under the normal curve. Recall we know how to find
the likelihood that some event occurs using the formula:
$Z = \frac{X_i - \bar{X}}{\sigma}$
This formula measures how many standard deviations what we see lies from what
chance would produce; a large Z means the result is too far from chance to be
attributed to chance alone.
Statistical Significance
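We simply calculate how many standard units what we observe lies
from what random chance would produce:
$Z = \frac{X_i - \bar{X}}{\sigma} = \frac{60 - 50}{2.5} = 4$
Here σ = 2.5 percentage points, the SD implied by a 50% chance rate
with 400 respondents: √(.5(1 − .5)/400) = .025.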
Then use the Z table to obtain the likelihood that we see a sample as large as 60%
if the true value is 50%.
Statistical Significance
Our Z table only goes to 3.4!
Less than 1 time in 10,000 would we see a sample mean of 60% if
the process were driven by chance alone.
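In place of the Z table, the same tail area can be computed with Python's standard `math.erfc`. This sketch assumes σ = 2.5 percentage points, the SD implied by a 50% chance rate with 400 respondents (√(.5 × .5/400) = .025).

```python
import math

observed = 60.0   # percent preferring to cuddle
expected = 50.0   # percent expected under pure chance
sd = 2.5          # assumed SD in percentage points: sqrt(.5 * .5 / 400) = .025

z = (observed - expected) / sd
# Upper-tail area of the standard normal: P(Z > z) = 0.5 * erfc(z / sqrt(2)).
p_upper = 0.5 * math.erfc(z / math.sqrt(2))

print(z)         # 4.0
print(p_upper)   # about 3.2e-05, i.e. fewer than 1 in 10,000
```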
Confidence Interval
We could illustrate this process on the normal curve as well.
The Q: How likely is it that we would see a result above 60%?
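[Figure: normal curve centered on the chance value .50, with .55 and the observed result Xi = .60 marked; more than .9998 of the area lies below .60 and less than .0001 lies above it.]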