Simple Random Sampling
SOCY3700
Selected Overheads
Prof. Backman
Fall 2007
Update history
10/05/07 Add slides 1-25
10/10/07 Add slides 26-33
10/14/07 Add slides 34-45
Measurement Validity
• Measurement validity is the
extent to which a measure
measures whatever it is
intended to measure
• Three types of measurement
validity
– Face validity – does the
measure seem (“on its face”) like
it measures what it’s supposed
to (often tested by asking experts
and others)
Measurement Validity, cont.
– Content validity - the extent to
which the measure covers the
full range of the concept
• The richer the concept (say,
religiosity or feminism), the more
likely that multiple indicators will be
needed
– Criterion validity – the extent to
which the measure is supported
by other accepted measures
• Concurrent validity – how well the
measure correlates with other
measures of the concept
• Predictive validity – how well the
measure correlates with other
concepts it should be related to
Levels of Measurement
• Nominal – values identify
categories only
– Do not have arithmetic meaning
– Also called categorical variables
– When there are only two
categories, called dichotomies
or binary variables
– Two technical requirements
for categories:
• Exhaustive (every observation fits
into some category)
– Leads to lots of “Others”
• Mutually exclusive (every
observation fits in exactly one
category)
Levels of Measurement, cont.
• Ordinal – same characteristics as
nominal PLUS the fact that
categories can be ranked from
lower to higher
– Mathematical operation of subtraction
makes no sense, but > and < do
– Most common example: Likert scale items
• Interval – same characteristics as
ordinal PLUS the fact that the
arithmetic difference between any
two values makes sense
– That is, the usual subtraction
operation makes the usual arithmetic
sense
• Ratio – same characteristics as
interval PLUS the fact that there is
a sensible zero value
– Thus division and ratios make sense
Abbreviations often used
for “Other” categories
• NA – no answer or not
answered
• DK – don’t know
• NAP – not applicable. Often
this means the question was
not even asked
• nec or n.e.c. – not elsewhere
classified. Typically in the
category title, “Other, nec”
Ecological and
Reductionist Fallacies
• Unit of analysis – level (individual
or some kind of aggregate)
addressed by your theory or
hypothesis
• Unit of observation – level
(individual or some kind of
aggregate) from which data are
collected
• Ecological fallacy – drawing
conclusions about individuals
based on data from aggregates
• Reductionist fallacy – drawing
conclusions about aggregates
based on data from individuals
Writing About Crosstabulations
From a Sample
• Lead with what is important
– What’s important?
• The fate of your hypotheses (if you
have stated some)
• The overall pattern for the
dependent variable, especially if it
is striking or surprising. Then look
at deviations from the pattern in the
categories of your independent
variable
• Big differences between categories
of your independent variable
• Things of interest to your audience
– Remember, the usual point of a
crosstab is to display differences
between categories of the
independent variable
Writing About Crosstabulations
From a Sample, cont.
• Do not use raw counts; use
percents
• Use the correct percents
– Do not confuse row, column, and total
percents
• Be sure to specify the base for
percents
– Usually something like, “… x percent
of [the base] …” or “Of all [bases]
surveyed, x percent responded…”
• Round percents in your text (but
not necessarily in your tables) to
integers
• Be ready to convert percents to
simple fractions
– For example, 23 percent could be
called “nearly a quarter” or “about one
in four”
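The difference between row, column, and total percents can be sketched in a few lines of Python. This is only an illustration: the table, its category labels, and the function names are all invented here and are not from the course materials. The dependent variable is in the rows and the independent variable is in the columns, so the column percents are the ones that compare categories of the independent variable.

```python
# Hypothetical crosstab (invented counts): rows are categories of the
# dependent variable, columns are categories of the independent variable.
counts = {
    "Agree":    {"Women": 60, "Men": 30},
    "Disagree": {"Women": 40, "Men": 70},
}

def column_percents(table):
    """Each cell as a percent of its column total (base = column)."""
    columns = list(next(iter(table.values())))
    totals = {c: sum(cells[c] for cells in table.values()) for c in columns}
    return {r: {c: 100 * cells[c] / totals[c] for c in columns}
            for r, cells in table.items()}

def row_percents(table):
    """Each cell as a percent of its row total (base = row)."""
    return {r: {c: 100 * n / sum(cells.values()) for c, n in cells.items()}
            for r, cells in table.items()}

def total_percents(table):
    """Each cell as a percent of the grand total (base = everyone)."""
    grand = sum(sum(cells.values()) for cells in table.values())
    return {r: {c: 100 * n / grand for c, n in cells.items()}
            for r, cells in table.items()}

col_pct = column_percents(counts)
row_pct = row_percents(counts)
tot_pct = total_percents(counts)

# Same cell, three different bases, three different percents.
print(round(col_pct["Agree"]["Women"]))  # 60: of all women, 60 percent agree
print(round(row_pct["Agree"]["Women"]))  # 67: of all who agree, 67 percent are women
print(round(tot_pct["Agree"]["Women"]))  # 30: of everyone, 30 percent are women who agree
```

Rounding is applied only when the numbers are printed for the text, which matches the advice to round percents in prose but not necessarily in tables.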
Writing About Crosstabulations
From a Sample, cont.
• Do not confuse percentage
differences and percentage
point differences
– Percentage differences cannot
be calculated by simple
subtraction
• Be ready to collapse
categories
– For example, to combine
“Strongly agree” and “Agree”
responses into one category
• Be ready to calculate
cumulative percents
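A short Python sketch of the three points on this slide, using invented numbers. The function names and the Likert distribution are assumptions made for illustration. The percentage difference here is taken relative to the second group, which is one common convention; the slides do not say which base to use.

```python
from itertools import accumulate

def percentage_point_difference(p1, p2):
    """Simple subtraction of two percents."""
    return p1 - p2

def percentage_difference(p1, p2):
    """Relative difference: how much larger p1 is than p2, as a percent of p2."""
    return 100 * (p1 - p2) / p2

# Invented example: 60 percent of one group agrees, 40 percent of another.
print(percentage_point_difference(60, 40))  # 20 percentage points
print(percentage_difference(60, 40))        # 50.0 percent higher

# Invented percent distribution for one Likert item, in category order.
likert = {
    "Strongly agree": 15,
    "Agree": 35,
    "Neutral": 20,
    "Disagree": 20,
    "Strongly disagree": 10,
}

# Collapsing categories: combine the two "agree" responses.
collapsed_agree = likert["Strongly agree"] + likert["Agree"]
print(collapsed_agree)  # 50

# Cumulative percents: running total down the ordered categories.
cumulative = dict(zip(likert, accumulate(likert.values())))
print(cumulative["Neutral"])  # 70
```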
Central Limit Theorem
If repeated random samples of size n
are drawn from any population with
mean μ and standard deviation δ,
the sampling distribution of sample
means will approach a normal
distribution as n gets large, with
mean μ and standard deviation δ/√n
(also known as the standard error of
the mean).
Hence, the standard deviation of the
means drawn from many, many
samples reflects 1) the standard
deviation of the population, and 2)
the sample size
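The theorem can be checked with a small simulation. The sketch below is not from the course. It assumes a deliberately skewed population (exponential, with mean 1 and standard deviation 1), draws many samples of size n, and compares the standard deviation of the sample means with the predicted δ/√n. The seed, the sample size, and the number of samples are arbitrary choices.

```python
import random
import statistics

random.seed(3700)  # arbitrary seed so the run is repeatable

POP_MEAN = 1.0  # mean of the exponential population used here
POP_SD = 1.0    # its standard deviation (δ in the slides)

def sample_mean(n):
    """Mean of one random sample of size n from the exponential population."""
    return statistics.fmean([random.expovariate(1.0) for _ in range(n)])

n = 50
means = [sample_mean(n) for _ in range(2000)]

observed_se = statistics.stdev(means)  # spread of the sample means
predicted_se = POP_SD / n ** 0.5       # δ/√n from the theorem

print("mean of sample means:", round(statistics.fmean(means), 3))
print("observed s.e.:", round(observed_se, 3))
print("predicted s.e.:", round(predicted_se, 3))
```

The observed and predicted standard errors should be close but not identical, since the simulation itself uses a finite number of samples.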
Probability Sampling
• Probability sampling is any
method of drawing a sample of
elements from a population
such that the probability that
any element or set of elements
will be included in the sample
is known and is not zero
• The chief advantage of
probability sampling is that
the accuracy (or lack thereof)
of estimates of population
parameters from the sample
can be estimated
Simple Random Sampling
(SRS)
• Frame – complete list of the
survey population
• Sample size – calculated
based on desired precision of
results
• Selection rule – random
selection without replacement
• Estimate of population mean is
the sample mean
– Unbiased
– s.e. = √fpc * (δ / √sample size)
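Drawing a simple random sample from a frame is a one-line operation in Python. The sketch below assumes a hypothetical frame of 500 ID numbers with an invented value attached to each. `random.sample` selects without replacement, which matches the selection rule above.

```python
import random
import statistics

random.seed(2007)  # arbitrary seed so the run is repeatable

# Hypothetical frame: a complete list of 500 units, each with an invented value.
frame = {unit_id: random.gauss(50, 10) for unit_id in range(1, 501)}

n = 40  # sample size, chosen arbitrarily for this illustration

# Random selection without replacement from the list of IDs.
sampled_ids = random.sample(sorted(frame), n)

# The sample mean is the estimate of the population mean.
sample_mean = statistics.fmean([frame[i] for i in sampled_ids])
population_mean = statistics.fmean(frame.values())

print("sample mean:", round(sample_mean, 2))
print("population mean:", round(population_mean, 2))
```

In real work the population mean is unknown. It is computed here only because the frame values were simulated.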
Simple Random Sampling:
Advantages and Disadvantages
• SRS advantages
– Samples are easy to draw
– Samples are easy to use
– Estimation of errors is “easy”
• SRS disadvantages
– Not always the method with the
lowest standard error
– Requires complete roster
– Can be very expensive
• Completing the frame may be expensive
• Reaching geographically dispersed
respondents may be expensive
– May require large sample sizes to deal
with rare population elements
• Most elements in the sample will not be
rare
Finite Populations and
Sampling
• Sampling error estimation depends
on the Central Limit Theorem
• The Central Limit Theorem applies
to infinite populations
– Infinite populations are easy to do in
theory, but rare in practice
• If you sample everyone in a finite
population, the sampling error
would be 0
– The closer you get to sampling
everyone, the smaller your error
should be
– Central Limit Theorem says error is
proportional to δ/√n
Finite Populations and
Sampling, cont.
• The finite population correction
factor (fpc) takes into account the
reduction in error you should get
from sampling all or a large fraction
of a finite population
• The fraction of the population that
is in the sample, n/N, is called the
sampling ratio (f)
• fpc = (N-n)/(N-1) ≈ (N-n)/N
= (1 – f)
• The standard error of the mean
from a finite population (with
simple random sampling) is
√fpc * (δ/√n)
• In practice, we ignore the fpc when
the sampling ratio is less than 10%
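The effect of the fpc is easy to see numerically. The sketch below implements the formulas on this slide. The function names and the example values (a standard deviation of 10 and a sample of 100) are invented for illustration.

```python
import math

def fpc(N, n):
    """Finite population correction factor, (N - n) / (N - 1)."""
    return (N - n) / (N - 1)

def standard_error(sd, n, N=None):
    """Standard error of the mean under SRS.

    sd is the population standard deviation (δ in the slides).
    If the population size N is given, the fpc is applied.
    """
    se = sd / math.sqrt(n)
    if N is not None:
        se *= math.sqrt(fpc(N, n))
    return se

print(round(standard_error(10, 100), 3))            # 1.0   no correction
print(round(standard_error(10, 100, N=100000), 3))  # 0.999 sampling ratio 0.1%
print(round(standard_error(10, 100, N=1000), 3))    # 0.949 sampling ratio 10%
print(round(standard_error(10, 100, N=100), 3))     # 0.0   everyone sampled
```

With a sampling ratio of 10 percent the correction trims the standard error by about 5 percent, which suggests why the fpc is usually ignored below that point.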
Stratified Sampling
• Frame
– Usual SRS frame except broken
into exhaustive, mutually
exclusive groups
– Requires knowledge ahead of
time about how many elements
in the population there are in
each group
– Each group is a stratum (plural
strata)
• Sample size - calculated based
on desired precision of results
– Calculations more complex than
with SRS because there are
more alternatives
Stratified Sampling (2)
• Selection rules
– Cases are drawn from each
stratum
– Cases within strata are drawn by
SRS
– Two alternatives for the number
drawn within each stratum
• Proportionate to size – every
element in the population has an
equal chance of being drawn into
the sample, regardless of stratum
• Disproportionate – some strata
will have a larger proportion of the
sample than they will of the
population
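Proportionate selection can be sketched as follows. The strata, their sizes, and the function name are hypothetical. Each stratum gets a share of the sample equal to its share of the population, and cases within each stratum are then drawn by SRS.

```python
import random

random.seed(1)  # arbitrary seed so the run is repeatable

# Hypothetical frame, already broken into exhaustive, mutually exclusive strata.
strata = {
    "urban": [f"U{i}" for i in range(600)],
    "suburban": [f"S{i}" for i in range(300)],
    "rural": [f"R{i}" for i in range(100)],
}

def proportionate_allocation(strata, n):
    """Number of cases to draw from each stratum, proportionate to stratum size.

    Rounding can make the total differ slightly from n when the shares
    are not whole numbers; this sketch does not adjust for that.
    """
    N = sum(len(units) for units in strata.values())
    return {name: round(n * len(units) / N) for name, units in strata.items()}

allocation = proportionate_allocation(strata, 100)
print(allocation)  # {'urban': 60, 'suburban': 30, 'rural': 10}

# Simple random sample within each stratum.
sample = {name: random.sample(units, allocation[name])
          for name, units in strata.items()}
```

Every element has the same 1 in 10 chance of selection here, regardless of stratum. A disproportionate design would replace the allocation with, say, equal numbers per stratum.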
Stratified Sampling (3)
• Estimation of the mean
– If proportionate to size selection is
used, the sample mean is an unbiased
estimate of the population mean
– If disproportionate selection is used,
weights must be used to obtain an
unbiased estimate of the population
mean
– Standard error of the mean will
ordinarily be lower than the standard
error from a simple random sample of
the same size
– The more homogeneous the elements
are within strata, the more efficient
stratified sampling will be
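The need for weights under disproportionate selection shows up clearly in a toy example. In this sketch (invented data, hypothetical names) a rare stratum that is 10 percent of the population makes up half of the sample. Each stratum mean is weighted by the stratum's share of the population.

```python
def stratified_mean(stratum_sizes, stratum_samples):
    """Weighted estimate of the population mean.

    Each stratum's sample mean is weighted by that stratum's share
    of the population (stratum size / population size).
    """
    N = sum(stratum_sizes.values())
    return sum(
        stratum_sizes[h] / N * (sum(values) / len(values))
        for h, values in stratum_samples.items()
    )

# Hypothetical population: 900 "common" elements and 100 "rare" ones.
sizes = {"common": 900, "rare": 100}

# Disproportionate sample: four cases from each stratum (invented values).
samples = {"common": [10, 12, 14, 12], "rare": [30, 34, 32, 32]}

all_values = [v for values in samples.values() for v in values]
unweighted = sum(all_values) / len(all_values)
weighted = stratified_mean(sizes, samples)

print(round(unweighted, 1))  # 22.0, pulled upward by the oversampled rare stratum
print(round(weighted, 1))    # 14.0, stratum means weighted 0.9 and 0.1
```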
Stratified Sampling:
Advantages and Disadvantages
(compared with
Simple Random Sampling)
• Advantages
– Reduced standard errors of estimate
over SRS
– Can thus get the same precision as
SRS with smaller sample size
– If proportionate selection is used,
unweighted sample statistics can be
used to estimate population
parameters
– Disproportionate selection can be
used to get sufficient numbers of
members of rare populations
• Disadvantages
– Requires advance knowledge about
stratum sizes
– Disproportionate selection requires
use of weights in making estimates of
parameters
Cluster Sampling
• Most complex method. Often used
in conjunction with stratification and
SRS; this is called multi-stage
sampling
• Frame
– Broken into groups called clusters
– Complete frame is needed only for
clusters that are selected
• It is necessary to know the size of clusters
that are not selected
• Sample size – usually calculated
based on explicit tradeoff between
costs and precision of results
– Calculations more complex than with
SRS or stratification because there are
more alternatives
Cluster Sampling (2)
• Selection rules
– A sample of the clusters is drawn
by simple random sampling
– Within each cluster either all the
elements or a simple random
sample of the elements are
drawn
– When possible, sample sizes
within clusters are drawn
proportionate to size
– NOTE that in cluster sampling
only some of the clusters are
used, while in stratified sampling,
all of the strata are
Cluster Sampling (3)
• Estimation of the mean
– If clusters and elements within clusters
were drawn so that all elements in the
population had equal probabilities of
selection, the sample mean is an
unbiased estimate of the population
mean. This rarely is possible
– In the likely case of unequal
probabilities of selection, weights must
be used to obtain an unbiased
estimate of the population mean
– Standard error of the mean will
ordinarily be higher than the standard
error from a simple random sample of
the same size
– The more heterogeneous the
elements are within clusters, the more
efficient cluster sampling will be
• To the extent possible, each cluster should
be representative of the entire population
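A two-stage cluster sample, with the weights it implies, might be sketched like this. Everything here is hypothetical: the clusters, their sizes, and the function name. The sketch takes a fixed number of elements per selected cluster, which is a simplification and not the proportionate-to-size approach the slides recommend when possible. Because selection probabilities then differ by cluster size, each sampled element carries a weight equal to one over its probability of selection.

```python
import random

random.seed(42)  # arbitrary seed so the run is repeatable

# Hypothetical clusters (say, schools) of different sizes.
cluster_sizes = [20, 35, 50, 15, 40, 30]
clusters = {
    f"school_{k}": [f"s{k}_{i}" for i in range(size)]
    for k, size in enumerate(cluster_sizes)
}

def two_stage_sample(clusters, n_clusters, n_per_cluster):
    """Stage 1: SRS of clusters. Stage 2: SRS of elements in each chosen cluster.

    Returns (element, cluster, weight) tuples, where weight is the
    inverse of the element's overall probability of selection.
    """
    chosen = random.sample(sorted(clusters), n_clusters)
    p_cluster = n_clusters / len(clusters)
    sample = []
    for name in chosen:
        units = clusters[name]
        take = min(n_per_cluster, len(units))
        p_element = p_cluster * take / len(units)
        for unit in random.sample(units, take):
            sample.append((unit, name, 1 / p_element))
    return sample

sample = two_stage_sample(clusters, n_clusters=3, n_per_cluster=10)
print(len(sample))                           # 30 elements in all
print(sorted({name for _, name, _ in sample}))  # only 3 of the 6 clusters appear
```

Elements from larger clusters get larger weights, since each of them had a smaller chance of being drawn.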
Cluster Sampling:
Advantages and Disadvantages
(compared with
Simple Random Sampling)
• Advantages
– Cost control
• In general, the only reason to use
clustering is to reduce financial or time
costs
– Can be used with stratification of
clusters to help control standard errors
– If proportionate selection is used,
unweighted sample statistics can be
used to estimate population
parameters
• Disadvantages
– Sampling consultant probably needed
– Larger standard errors than with SRS
– Parameter and error estimation
usually requires use of weights
Sample Pathologies
• Biggest, most common
problem: non-response
– Estimation of parameters and
errors assumes that data were
collected from every element in
the sample
• Limitations on generalizability
due to mismatch between the
population of interest (target
population) and the frame
(survey population)
– Called coverage error
Surveys à la Dillman:
Eight Steps
1. Decide what information you
need
2. Choose a survey method
3. Draw a sample
4. Write questions
5. Design the questionnaire
6. Field the survey
7. Turn answers into usable
data
8. Report results
Source: Patricia Salant and Don A. Dillman. 1994.
How to Conduct Your Own Survey. NY: Wiley
Dillman on the Survey
Process
• Dillman analyzes the survey
process from an exchange
theory perspective
– There is an exchange between
the researcher and the
respondent
– Compliance with researcher’s
request for information is a
function of the social rewards the
researcher can offer the
respondent
• Rewards such as gratitude,
opportunity to have a say on
something important
Writing Survey Questions
• Question topics
– There is little you can’t ask about
– Useful distinction:
• Questions about subjective states
like attitudes, beliefs, and
knowledge
• Questions about objective
phenomena like behavior or
demographic attributes
– Always remembering that in a
questionnaire even objective
phenomena are filtered through the
respondent’s mind
Writing Survey Questions (2):
Question Form
• Two basic question forms:
open-ended and closed-ended
• Open-ended questions are
questions to which
respondents can give any
answer
• Closed-ended questions both
ask a question and provide the
respondent with preset
answers to the question to
choose among
Pp. 177ff in W.L. Neuman. 2007. Basics of Social
Research. 2nd ed. Boston: Pearson
Writing Survey Questions (3):
Closed-ended Questions
• Questions with ordered
categories
– E.g., Likert scale items
– When there is an order, be sure
to use it
• Questions with unordered
categories
• Partially closed-ended
– One option is something like
“Other (please specify) ____”
Writing Survey Questions:
Neuman’s
Dirty Dozen Don’ts
1. Avoid jargon, slang, and
abbreviations
2. Avoid ambiguity, confusion,
and vagueness
a. Whatever
3. Avoid emotional language
a. Can evoke frames that
effectively hijack the intent of
the question
4. Avoid prestige bias
Pp. 170-3 in W.L. Neuman. 2007. Basics of Social
Research. 2nd ed. Boston: Pearson
Writing Survey Questions:
Neuman’s
Dirty Dozen Don’ts (2)
5. Avoid double-barreled
questions
6. Do not confuse beliefs with
reality
7. Avoid leading questions
8. Avoid asking questions that
are beyond respondents’
capabilities
Pp. 170-3 in W.L. Neuman. 2007. Basics of Social
Research. 2nd ed. Boston: Pearson
Writing Survey Questions:
Neuman’s
Dirty Dozen Don’ts (3)
9. Avoid false premises
10. Avoid asking about intentions
in the distant future
11. Avoid double negatives
12. Avoid overlapping or
unbalanced response
categories
Pp. 170-3 in W.L. Neuman. 2007. Basics of Social
Research. 2nd ed. Boston: Pearson
Questionnaire Layout (1)
• Very important
– Reflects your professionalism in the
eyes or ears of your respondents and
the eyes of your interviewers
– Affects the likelihood of measurement
error through respondent or
interviewer error
– Affects response rate
• In mail surveys, designed primarily
with the respondent in mind
• In telephone and face-to-face
surveys, designed with both
interviewer and respondent in mind
Questionnaire Layout (2):
Mail Surveys
• Overall objectives
– Minimize perceived (and real)
respondent burden
– Don’t confuse respondent
– Simplify later data entry
• Make a booklet
– Questions are enclosed inside a
booklet made of folded legal
sized (8.5 x 14 inch) paper
– No questions on the front or
back of the booklet
Questionnaire Layout (3):
Mail Surveys
• Front page of booklet:
– Title of study
– Some graphic stuff
– Sponsor
– Return address
• Back page
– Request for comments
– Thank you
– Return address and telephone
contact information
Questionnaire Layout (4):
Mail Surveys
• Overall question sequence
– Start easy
• First question must grab attention,
reflect the issues in the cover letter,
and not be too difficult or
threatening
– Start on topic
– Group like questions together
• Makes writing transitions easier
– Keep threatening questions until
later in the questionnaire
– Get your demographics last
• That’s probably least important to
you and apparently least relevant to
respondent
Questionnaire Layout (5):
Mail Surveys
• Layout of individual pages
– Use white space
• What counts is not how many
pages the survey is, but rather how
long it seems to be to respondents
– Use fonts consistently to
distinguish questions, answers,
and instructions
• Dillman likes to use bold for
questions, all caps for answers,
unbolded for transitions, and
unbolded in parentheses for
instructions
– Establish a vertical flow
– Precode the answers, usually on
the left margin
Fielding Mail Surveys (1)
Overview
1. We’re always trying to
increase response rates
2. Respondents are most
likely to respond if they think
benefits outweigh their costs
3. We need to keep
respondents engaged from the
opening of the mail through the
returning of the completed
questionnaire
Source: Salant and Dillman
Fielding Mail Surveys (2)
Bottom lines
1. Mail survey response rates
depend very much on the
number of contacts
2. Mail surveys require
advance planning
- Be sure you have the
resources to meet the schedule
3. What really matters is the
overall look and feel of the
questionnaire
- It’s a lot like buying (or selling!)
a car
Source: Salant and Dillman
Fielding Mail Surveys (3)
• First mailout – advance
notice letter
– Sent to the entire sample
– Mailed first class
– Handwritten signature
– Explains why there will be a
survey
– Explains why participation will be
appreciated
• Put yourself on the mailing list
for this and all other mailings
Fielding Mail Surveys (4)
• Second mailout – cover letter,
questionnaire, and return
envelope
– Sent one week after advance
notice
– Cover letter
• Personalized
• Explains survey purpose
• Explains ID# on the questionnaire
and promises confidentiality
• Reinforces importance of
everyone’s participation
• Specifies who should complete the
questionnaire
• Thanks respondent for participation
• Hand signed
Source: Salant and Dillman
Fielding Mail Surveys (5)
• Questionnaire – with ID
number
• Return envelope is stamped,
addressed, and ready for use
Fielding Mail Surveys (6)
• Third mailout – postcard
followup
– 4 to 8 days later
– Personalized
– Reminding and thanking
• Fourth mailout – new cover
letter, questionnaire, and
return envelope
– Three weeks after the second
mailout (the first one with a copy
of the questionnaire)
– Sent only to addresses that have
not yet returned the survey
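The four-mailing schedule can be laid out with a few lines of date arithmetic. The start date and the function name below are hypothetical, and the sketch assumes that the postcard's "4 to 8 days later" is counted from the second mailout, which the slides do not state explicitly.

```python
from datetime import date, timedelta

def mailing_schedule(first_mailout):
    """Dates for the four mailouts, given the date of the advance notice letter."""
    second = first_mailout + timedelta(weeks=1)  # questionnaire packet
    return {
        "1 advance notice letter": first_mailout,
        "2 cover letter and questionnaire": second,
        # Assumption: the 4 to 8 day window is counted from the second mailout.
        "3 postcard (earliest)": second + timedelta(days=4),
        "3 postcard (latest)": second + timedelta(days=8),
        # Three weeks after the second mailout, nonrespondents only.
        "4 replacement questionnaire": second + timedelta(weeks=3),
    }

schedule = mailing_schedule(date(2007, 10, 1))  # hypothetical start date
for step, day in schedule.items():
    print(step, day.isoformat())
```

With this start date the replacement questionnaire goes out on 2007-10-29, four weeks after the advance notice letter.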
Fielding Mail Surveys (7)
• The four mailings should yield
a final response rate of 50 – 60
percent
• To further increase response
rate, one can:
– Send another follow up like the
fourth mailing
– Send the follow up as certified or
express mail
– Telephone
• Often you will discover that people
shouldn’t have been in the sample
in the first place
Sampling Review
• Rule of thumb sampling error of a
proportion at the 95 percent
confidence level =
1 / square root (sample size)
– If size = 400, error = 1/20 = 5%
• The Central Limit Theorem is
important for social science
research because it provides the
mathematical basis for using
probability samples 1) to make
estimates of parameters from large
populations using small samples
and 2) to estimate the precision of
those estimates
Sampling Review (2)
• In both stratified and cluster
sampling the survey population
is divided into exhaustive,
mutually exclusive groups.
Each group could be either a
stratum or a cluster
• If we use all the groups in our
final sample, we call each
group a stratum
• If we use only some of the
groups in our final sample, we
call each group a cluster