Lecture Notes

Psych 5500/6500
Populations, Samples, Sampling
Procedures, and Bias
Fall, 2008
1
Populations
Population
1. A large set of items.
2. It is the group we are trying to find out about
when we run a study.
2
Populations (cont.)
A population does not need to be an existing group
of people; it can be:
1. Animals
2. Objects
3. Hypothetical
4. Scores on some measure
3
Samples
Sample
1. A subset of the population.
2. Those members of the population that we
actually measure.
4
Basic Approach
Basic approach: use the sample to tell us about the larger
population from which the sample was drawn.
• representative sample: a sample that is similar to the
population in terms of the variables in which we are
interested.
• non-representative (‘biased’) sample: a sample that is
not similar to the population in terms of the variables
in which we are interested.
5
Basic Problem
Basic problem: we don’t know whether the
sample we have drawn is representative of the
population. To know that, we would have to
compare the sample to the population, but if we
knew that much about the population, why
would we be sampling?
6
The Solution
• The solution is to turn to probability theory. We
can only use probability theory, however, if
certain conditions are met.
• First, we must determine the possible reasons for
obtaining a non-representative sample.
7
The Two Biases
• Random bias: obtaining a non-representative
sample purely due to chance (this can be
modeled using probability theory).
• Systematic bias: obtaining a non-representative
sample for any reason other than chance (this
cannot be modeled using probability theory).
8
Handling Bias
• Statistical procedures cannot cope with
systematic bias; it is a case of garbage in,
garbage out.
• Statistical procedures can, however, cope with
the possibility of random bias, because the probability
of getting a non-representative sample due to
chance can be determined mathematically.
9
Determining the Type of Bias that
Might Arise
The type of bias that might arise is determined by
how we sample from the population. Random
sampling is a way to avoid systematic bias.
There are many forms of random sampling;
we’ll take a look at one just to develop the idea.
10
Random Sampling
Simple Random Sampling
1. Everyone in the population has an equal chance of
being included in the sample.
2. The selections are independent of each other.
If we successfully randomly sample then we will only
have random bias to worry about (no systematic bias).
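As a minimal sketch of what "equal chance" looks like in practice, assume a small hypothetical population stored as a list of ID numbers. The names `population` and `draw_simple_random_sample` are illustrative and not from the lecture. Python's standard `random.sample` gives every member the same chance of selection and never picks the same member twice (that is, it samples without replacement).

```python
import random

def draw_simple_random_sample(population, n, seed=None):
    """Return n members chosen so that each member is equally likely to be picked."""
    rng = random.Random(seed)
    # random.sample draws without replacement: nobody is selected twice.
    return rng.sample(population, n)

# Hypothetical population of 1,000 ID numbers (an assumption for illustration).
population = list(range(1000))
sample = draw_simple_random_sample(population, n=20, seed=1)
print(sample)
```

Fixing the seed only makes the example repeatable; in a real study you would not choose a seed to get a sample you like.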
11
Simple Random Sampling
1) Everyone in the population has an equal chance
of being included in the sample:
a) Selection must be random. Random is not the same
as haphazard.
b) Subject attrition (someone drops out of the study
or refuses to participate) might bias the sample.
More on that next:
12
Subject Attrition
If attrition is due to non-random factors then that
can introduce systematic bias. If, for example,
people with a particular viewpoint refuse to
participate then they no longer have an equal
chance of being in the data and the sample will be
nonrepresentative for a systematic (non-random)
reason.
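To make the attrition argument concrete, here is a small worked calculation. All three numbers are invented for illustration: half the population holds a viewpoint, and people who hold it are assumed to be three times as likely to respond as people who do not. The function name `share_among_respondents` is hypothetical.

```python
def share_among_respondents(pop_share, respond_if_yes, respond_if_no):
    """Expected share holding the viewpoint among those who actually respond."""
    yes = pop_share * respond_if_yes          # holders who respond
    no = (1 - pop_share) * respond_if_no      # non-holders who respond
    return yes / (yes + no)

# Invented numbers: 50% hold the view; response rates are 90% vs. 30%.
observed = share_among_respondents(pop_share=0.5, respond_if_yes=0.9, respond_if_no=0.3)
print(f"population share: 0.50, share among respondents: {observed:.2f}")
```

Under these assumptions the respondents look 75% in favor even though the population is split evenly. When both groups respond at the same rate, the function returns the population share, which is the case where attrition adds no systematic bias.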
13
Random Sampling (cont.)
2) The selections are independent of each other:
a) Selecting every 10th name from an alphabetical list
could bias the results. The probability of sampling
people with the same last name (who might be
related) would be affected.
b) Technically we should sample with replacement,
but we don’t. See next slide.
14
Sampling with Replacement
Sampling with replacement means that once someone has
been selected, their name goes back into the pool of
people to select from (like returning a card to the deck
before drawing again). This preserves the criterion that
each selection is independent of the other selections.
We don’t do this in psychology because measuring the
same person twice creates other problems. If our
samples are small compared to the population, the
independence of selections differs only trivially between
sampling with and without replacement, so sampling
without replacement is OK.
15
Sampling with Replacement
Example: if we draw ten cards from a deck without
replacement then the earlier draws influence the
later draws. For example, if you draw four aces
then the chance of drawing another ace is zero.
If, however, we shuffle 500 decks together and
draw ten cards, then the fact that we draw four
aces earlier in the sample does not have a big
influence on the probability of drawing another
ace.
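The card example can be checked with a few lines of arithmetic. This is a sketch that assumes standard 52-card decks with four aces each and that the four aces were the first four cards drawn; the function name `prob_next_is_ace` is illustrative.

```python
from fractions import Fraction

def prob_next_is_ace(decks, aces_drawn, cards_drawn):
    """Probability that the next card is an ace when drawing without replacement."""
    aces_left = 4 * decks - aces_drawn
    cards_left = 52 * decks - cards_drawn
    return Fraction(aces_left, cards_left)

with_replacement = Fraction(4, 52)   # every draw is independent of the others
one_deck = prob_next_is_ace(decks=1, aces_drawn=4, cards_drawn=4)
many_decks = prob_next_is_ace(decks=500, aces_drawn=4, cards_drawn=4)

print("with replacement:     ", float(with_replacement))
print("1 deck, 4 aces gone:  ", float(one_deck))
print("500 decks, 4 aces gone:", float(many_decks))
```

With one deck the probability falls from about 0.077 to exactly zero. With 500 decks it barely moves (1996 aces left among 25,996 cards), which is why sampling without replacement from a large population is treated as approximately independent.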
16
Reducing Random Bias
When we randomly sample we can never know for
sure whether or not we have a representative
sample. We can, however, increase the
probability of obtaining a representative
sample by:
1. Increasing the size of our sample.
2. Selecting a variable that has low variability in
its values.
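One common way to quantify both points, not stated on the slide, is the standard error of the sample mean, which equals the population standard deviation divided by the square root of the sample size. The sketch below assumes simple random sampling; the standard deviations and sample sizes are arbitrary illustrative values.

```python
import math

def standard_error(sd, n):
    """Standard error of the sample mean under simple random sampling."""
    return sd / math.sqrt(n)

# Smaller standard error means the sample mean tends to land closer to the
# population mean, i.e. less random bias on average.
for sd in (15, 5):
    for n in (25, 100, 400):
        print(f"sd={sd:>2}  n={n:>3}  standard error={standard_error(sd, n):.2f}")
```

Quadrupling the sample size halves the standard error, and a variable with one third the variability has one third the standard error at any given sample size.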
17
Hite Report: A Cautionary Tale
Back Off Buddy: A New Hite Report stirs up a
furor over sex and love in the 1980's.
Time Magazine Oct 12, 1987.
1. 95% of the women respondents reported
emotional and psychological harassment
from the men they love.
2. 98% want to make "basic changes" in their
love relationship.
3. 70% of women married 5 years or more
have had an affair.
18
But only 4.5% of the 100,000 questionnaires
mailed were returned. If you express the results as
percentages of the whole group that was sent a
questionnaire, then:
1) 4.2% reported harassment
   0.3% did not report harassment
   95.5% did not respond
2) 4.4% want to make basic changes in their
   relationship
   0.1% do not
   95.5% did not respond
3) 3.2% have had an affair
   1.4% have not
   95.5% did not respond
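The conversion behind these figures is just the return rate multiplied by each reported percentage. The sketch below uses only the numbers given on the slides (4.5% returned; 95%, 98%, and 70%) and, as the slide does, treats each percentage as applying to all respondents. The function name `share_of_all_mailed` is illustrative. The unrounded results may differ from the slide's one-decimal figures in the last digit.

```python
def share_of_all_mailed(return_rate, share_of_respondents):
    """Convert a percentage of respondents into a percentage of everyone mailed."""
    return return_rate * share_of_respondents / 100

return_rate = 4.5  # percent of questionnaires returned, from the slide
for label, pct in [("harassment", 95), ("basic changes", 98), ("affair", 70)]:
    yes = share_of_all_mailed(return_rate, pct)
    no = return_rate - yes
    print(f"{label}: {yes:.3f}% yes, {no:.3f}% no, {100 - return_rate:.1f}% did not respond")
```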
19
Hite’s Response
When questioned about the low return rate,
Hite responded that 4,500 replies was a
sufficiently large N. What neither Hite
nor the interviewer understood is that
if you have systematic bias in your sample
(in this case from attrition) then increasing N
doesn’t help.
If, for example, your population consists of all
voters, and you only sample from one
political party, then it doesn’t matter how big
N is; it still won’t be a representative sample.
20
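The voter example can be simulated directly. This is a sketch with an invented electorate: two equally sized parties whose support for some measure is assumed to be 90% and 10%, so true support is 50%. The variable names and the fixed seed are illustrative choices.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical electorate (1 = supports the measure, 0 = does not).
party_a = [1] * 45000 + [0] * 5000      # 90% support
party_b = [1] * 5000 + [0] * 45000      # 10% support
voters = party_a + party_b              # true support is 50%

def mean(xs):
    return sum(xs) / len(xs)

for n in (100, 1000, 10000):
    random_est = mean(rng.sample(voters, n))      # simple random sample of all voters
    one_party_est = mean(rng.sample(party_a, n))  # systematic bias: party A only
    print(f"N={n:>5}  random: {random_est:.3f}  party A only: {one_party_est:.3f}")
```

As N grows, the random-sample estimate settles near 0.5, while the one-party estimate settles near 0.9. A larger N makes the biased estimate more precise, not more accurate.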
Random Sampling in Psychology
We can almost never randomly sample in
psychology.
1. It usually requires a complete and up-to-date list of
everyone in the population. If your population, for
example, is ‘women’, good luck getting a list of
all women on earth.
2. Subject attrition for non-random reasons violates
the principle of everyone having an equal chance of
being measured. Good luck getting people from
Mongolia to come to your study.
21
Convenience Sampling
Convenience sampling: a non-random sampling
procedure based upon using whoever is convenient to
recruit for your study (e.g., Intro to Psych
students).
This is about as non-random as you can get. How do we
justify this? It is based on the idea that just because
you open the door (wide) to systematic bias doesn’t
mean that it will come in. It is up to you to keep
systematic bias out of your sample. There are at
least three ways of accomplishing that.
22
Controlling Systematic Bias (1)
1) Design a study where it would be reasonable to argue
that you would have obtained a similar result if you
could have randomly sampled. This will be
influenced by what you are measuring. For example,
if you are measuring the effect of a chemical on pupil
dilation then Intro to Psych students would probably
be as representative as a random sample from a larger
population of the same age range. If, on the other
hand, you are measuring support for higher education
among voters then Intro to Psych students would
probably be a biased sample.
23
Controlling Systematic Bias (2)
2) Create a representative sample. An example of
this would be the selection of institutions in the
article On Being Sane in Insane Places
(described in the lecture).
24
Controlling Systematic Bias (3)
3) Get your sample, and then decide what
population it represents (i.e. determine to what
population you would feel comfortable
generalizing the results). This seems to be the
most common approach in psychology.
25
Summary
We are interested in populations, but they are too
big to measure, so we rely on samples to tell us
what the populations are like. We don’t,
however, know if the sample is representative of
the population. Statistical procedures were
essentially invented to solve this problem, but
they only work if you do not have systematic
bias in your sample.
26