sampling error

Making Inferences
Sample Size, Sampling Error, and 95% Confidence Intervals
• Samples: usually necessary (with some exceptions), and they
don’t need to be huge to be accurately representative of the
entire population you want to study
• e.g., 1936 election between Alf Landon and FDR:
Literary Digest predicted a sweeping victory for Landon
(based on a sample of 2 million people), yet FDR won in a
landslide; a huge sample does not help if it is not representative
Sample Size, Sampling Error, and 95% Confidence Intervals, cont’d
• Sampling Error: simply the difference between the estimate
obtained from the sample and the true population value. Its
typical size is the Standard Error, which is determined by
the sample’s size and standard deviation. Polls report it as a
margin of error (e.g., president’s approval rating of 52% (±4%))
• Confidence Level and Confidence Interval: the interval is
the range around the estimate (here 48-56%); the level is how
often such ranges capture the true value. A 95 percent
confidence level means that about 95 out of 100 samples that
might be selected would generate an interval that contains
the true presidential approval rating.
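As a rough illustration of where a figure like ±4% can come from, here is a minimal sketch using the usual normal-approximation formula for a proportion. The sample size of 600 is an assumption chosen for illustration (the slide does not give one), and `margin_of_error` is a hypothetical helper name.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Half-width of an approximate 95% interval for a sample proportion."""
    standard_error = math.sqrt(p * (1 - p) / n)
    return z * standard_error

# Hypothetical poll: 52% approval among 600 respondents (n is an assumption).
p_hat, n = 0.52, 600
moe = margin_of_error(p_hat, n)
low, high = p_hat - moe, p_hat + moe
print(f"{p_hat:.0%} ± {moe:.1%} -> {low:.1%} to {high:.1%}")
```

Under that assumed sample size the margin comes out close to 4 percentage points, which matches the 48-56% range quoted above.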
The Mayor and Your Job as Lead Pollster
The Normal Distribution & Sampling
• Example: For his upcoming reelection campaign, Michael Bloomberg wants
to know how many Independents there are in New York City, whose population
has grown rapidly over the last several years. Although the N.Y. Bureau of
Elections reports that 25% of registered voters claim “Independent” status, he
wants to test the validity of this figure.
• Consequently, Bloomberg asks you to conduct a poll to estimate the
proportion/percent of citizens, 18 years or older, who are Independents rather
than Democrats or Republicans in NYC.
• You interview 10 randomly chosen individuals and find that 2 of them are
registered to vote as Independents. Based on this finding you might think
that the proportion is closer to 20%, which is a little bit below the last reported
proportion. If the reported 25% is the true value, this difference of 5
percentage points is the sampling error.
• What you need is some way to measure the uncertainty in your estimate, so
that you can tell Mr. Bloomberg what the margin of error is.
The Normal Distribution & Sampling, cont’d
• Let’s say you repeat the interview procedure 4 more times and get
estimates of 20%, 30%, 40% and 20% (2, 3, 4 and 2 Independents, each
out of 10 respondents). Their average is (20% + 30% + 40% + 20%) ÷ 4 = 27.5%,
which is not too far from the originally reported value of 25%.
• What would happen if you repeated the process over and over and over,
say, 1,000 independent samples of 10 interviewees and calculated the
proportion of Independents in each one? After a while you would have a
substantial list of sample proportions (see the first figure on handout #1.)
In a simulated example, you end up with a mean proportion of .248
(24.8%) and a standard deviation of .141 (14.1%).
• Now think of this standard deviation of the sample proportions (the
standard error) as an indicator of how much uncertainty there is in your
estimate. We see in the first figure, for example, that about 2/3rds of the
estimates are in the range of .248 +/- .141, or between .107 and .389
(10.7% and 38.9%). Consider this the “68% confidence interval.” The other
third of the samples are below .107 (10.7%) or above .389 (38.9%).
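The repeated-sampling exercise can be tried out directly. The sketch below is not the simulation behind the handout: it assumes a true proportion of 25%, uses Python's standard library, and the function name and seed are illustrative choices. It also prints the textbook value sqrt(p(1 - p) / n) for comparison with the simulated spread.

```python
import math
import random
import statistics

def simulate_sample_proportions(true_p, sample_size, n_samples, seed=0):
    """Draw n_samples polls of sample_size people; return each poll's proportion."""
    rng = random.Random(seed)  # fixed seed so reruns give the same list
    proportions = []
    for _ in range(n_samples):
        # Each respondent is an Independent with probability true_p (assumed).
        hits = sum(rng.random() < true_p for _ in range(sample_size))
        proportions.append(hits / sample_size)
    return proportions

props = simulate_sample_proportions(true_p=0.25, sample_size=10, n_samples=1000)
mean_p = statistics.mean(props)
sd_p = statistics.pstdev(props)
theoretical_sd = math.sqrt(0.25 * 0.75 / 10)

print(f"simulated mean = {mean_p:.3f}, simulated sd = {sd_p:.3f}")
print(f"theoretical sd = {theoretical_sd:.3f}")
```

Exact numbers depend on the random draws, so a run will not reproduce .248 and .141 to the digit, but the mean should sit near .25 and the spread near the theoretical value.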
The Normal Distribution & Sampling, cont’d
• In other words, after 1,000 samples of 10 randomly chosen NYC voters, you
could tell Michael Bloomberg that the proportion of Independents in New York
City is probably between 10% and 39%. When he asks, “What do you mean by
‘probably’?” a technical answer on your part would be, “I’m 68% sure.”
• Not surprisingly, he threatens to fire you by the end of the month if you can’t do
better than that. He wants you to narrow the range of uncertainty. What do you
do? You ask for another $1 million, so that you can take larger samples
(e.g., 50 randomly chosen people instead of 10).
• Repeating the same process of 1,000 samples, but this time with 50 NYC voters
in each sample, you get the results in the 2nd figure on handout #1: a mean
proportion of .251 (25.1%) and a standard deviation of .064 (6.4%).
• Now you can tell Mr. Bloomberg that the proportion of Independents in NYC is
probably, with 68% confidence, between 19% and 31%. “You’re getting better,”
he says, “but I still want some more certainty.” “O.k.,” you say, “show me some
more $$ and I’ll get you some more certainty.”
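To see how much the larger sample narrows the range, the two 68% ranges can be recomputed from the reported means and standard deviations. This is only a sketch of the arithmetic; `interval` is a hypothetical helper, and the inputs are the simulated figures quoted above.

```python
def interval(mean, sd, k=1):
    """Range covering k standard deviations on either side of the mean."""
    return mean - k * sd, mean + k * sd

# Means and standard deviations as reported for the two simulated studies.
reported = {10: (0.248, 0.141), 50: (0.251, 0.064)}

ranges = {n: interval(mean, sd) for n, (mean, sd) in reported.items()}
for n, (low, high) in ranges.items():
    print(f"n = {n}: {low:.1%} to {high:.1%} (width {high - low:.1%})")
```

Going from 10 to 50 respondents per sample cuts the width of the range by more than half, which is what Mr. Bloomberg's extra money buys.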
The Normal Distribution & Sampling, cont’d
• By the time you’ve conducted 1,000 samples of interviews with 500 randomly selected
NYC voters (see figure 4 on handout #2), your mean is still .250 (25%), but with a
standard deviation now of only .019 (essentially 2%).
• Hence, now you can tell Mr. Bloomberg that the proportion of Independents in NYC is
probably, with 95% confidence (2 standard deviations either way; 1 standard deviation
either way would give you 68% confidence), between 21% and 29%.
• But let’s say that earlier in the process, Mayor Bloomberg randomly surveyed 10
individuals (e.g., friends, butler, misc. staff) about their voter status and found a rate of
35% registered Independents. He asks you to demonstrate why the upcoming
campaign shouldn’t work with his figure instead. I mean, he got it himself personally.
• You have to show him, based on your original study of 1,000 samples of 10 random New
York City voters, what the likelihood is of a finding of 35% registered Independents. This
is first done by computing the amount of random sampling error or standard error
(Pollack, p. 106):
Sampling or standard error = standard deviation ÷ square root of the sample size (n = 10)
= 14.1% ÷ 3.16
= 4.5%
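The division above is easy to reproduce. One caution on the inputs: the formula is normally applied to the standard deviation of individual responses, whereas 14.1% is the spread of the sample proportions themselves. The sketch below simply plugs in the figures as they appear above (14.1% and n = 10), so treat it as a reproduction of the worked example rather than an independent check. `standard_error` is a hypothetical helper name.

```python
import math

def standard_error(standard_deviation, n):
    """Standard deviation divided by the square root of the sample size."""
    return standard_deviation / math.sqrt(n)

# Inputs taken from the worked example, in percentage points.
se = standard_error(14.1, 10)
print(f"sqrt(10) = {math.sqrt(10):.2f}")
print(f"standard error = {se:.2f}, i.e. about {round(se, 1)}%")
```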
Inference Using the Normal Distribution & Z Scores
• The central limit theorem (Pollack, p. 108) tells us that there is a 68% chance that
the true population mean of NYC voters registered as “Independent” lies within
plus or minus 1 standard error of the sample mean (25% in your study), and there
is a 95% chance that it lies within plus or minus 1.96 standard errors of the
sample mean (again, 25%).
• Conversely, there is only a 5% probability that the true proportion of NYC voters
registered as “Independent” lies more than 1.96 standard errors above or below
the sample mean:
• “Low” end of 95% confidence interval = sample mean – 1.96 standard errors
= 25% – 1.96 (4.5%)
= 16.2%
• “High” end of 95% confidence interval = sample mean + 1.96 standard errors
= 25% + 1.96 (4.5%)
= 33.8%
Conclusion: By this calculation, then, 95% of all possible random samples of 10 NYC voters will
produce sample means of between 16.2% and 33.8% “Independent” registered voters.
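The two interval endpoints follow from one small calculation. Here is a sketch of it using the slide's sample mean (25%) and standard error (4.5%) as given; `confidence_interval` is a hypothetical helper, and 1.96 is the usual multiplier for 95% under the normal distribution.

```python
def confidence_interval(mean, standard_error, z=1.96):
    """Return (low, high): the mean plus or minus z standard errors."""
    margin = z * standard_error
    return mean - margin, mean + margin

# Values from the worked example, in percentage points.
low, high = confidence_interval(25.0, 4.5)
print(f"95% interval: {low:.1f}% to {high:.1f}%")

# The same helper gives the narrower 68% range with z = 1.
low_68, high_68 = confidence_interval(25.0, 4.5, z=1.0)
print(f"68% interval: {low_68:.1f}% to {high_68:.1f}%")
```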
Inference Using the Normal Distribution & Z Scores, cont’d
• Given these results, how “random” is Mayor Bloomberg’s finding of 35%
registered Independents among NYC’s voting population?
• First, standardize his 35% finding into a Z score:
Z = (Bloomberg mean – mean of the 1,000 samples of 10 voters) ÷ standard error
Z = (35% – 25%) ÷ 4.5%
Z = 10% ÷ 4.5%
Z = 2.22
• Based on the table of Z scores (Pollack, p. 110), how likely is it that a truly
random sample of registered NYC voters would find 35% or more to be Independents?
.0132 = 1.32%, or about 1.3%. (Basically, you could say, “Mr. Mayor, the odds of
finding that 35% of registered voters in NYC are ‘Independent’ are about 1 out
of 100 and, congratulations Sir, you got that one. Now stop wasting my
time and let me do the polling in this campaign.”)
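The table lookup can also be done in code. The sketch below standardizes the 35% figure with the slide's inputs (mean 25%, standard error 4.5%) and computes the one-tailed probability from the standard normal distribution with `math.erfc`, in place of the printed table. Both function names are illustrative.

```python
import math

def z_score(value, mean, standard_error):
    """Number of standard errors that value lies from the mean."""
    return (value - mean) / standard_error

def upper_tail_probability(z):
    """P(Z >= z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# Inputs from the worked example, in percentage points.
z = z_score(35.0, 25.0, 4.5)
p = upper_tail_probability(z)
print(f"Z = {z:.2f}, one-tailed probability = {p:.4f}")
```

With these inputs the quotient 10 ÷ 4.5 is about 2.22, and the tail probability is a little over 1%, or roughly 1 chance in 100.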