Jan11-16: ppt version


Stat 1510
Statistical Thinking & Concepts
Producing Data: Sampling
1
Data
• Primary data are data collected by the
investigator conducting the research or study,
for a specific purpose.
• Secondary data are data collected by someone
other than the user, for the same or a different
purpose.
2
Population
Researchers often want to answer questions
about some large group of individuals (this group
is called the population).
• A population is a set of units. It may be
potentially infinite, or even hypothetical.
• If the time at which units are measured is important,
the population is often called a process.
• So, in any analysis, it is important to be clear about
how the population is defined.

3
Population & Sample
We consider three sets of units.
• The target population is the set of units that
the investigators set out to study, as specified in
the definition of the problem.
• The study population is the set of units that could
have been in the sample.
• The sample is the set of units actually
selected for the investigation. The number of
units in the sample is called the sample size, and the
way the sample is selected is called the
sampling protocol or sampling design.

4
Population & Sample
5
Example
The Faculty of Science of Memorial University
wants to know students' opinions of
the university facilities. For this purpose,
it conducted a survey of a
random sample of 200 students
registered for Winter 2013.
Identify the target population, study
population, sample, and sampling unit.
6
Example
Target Population: Students of the Faculty of Science,
MUN
Study Population: All students registered
for Winter 2013 in the FoS of MUN
Sample: The 200 students selected for this
survey
Sampling Unit: Each selected student
7
Bad Sampling Designs
• Voluntary response sampling
– allowing individuals to choose to be in the sample
• Convenience sampling
– selecting individuals who are easiest to reach
• Both of these techniques are biased
– they systematically favor certain outcomes
8
Voluntary Response
• To prepare for her book Women and Love, Shere
Hite sent questionnaires to 100,000 women asking
about love, sex, and relationships.
– 4.5% responded
– Hite used those responses to write her book
• Moore (Statistics: Concepts and Controversies,
1997) noted:
– respondents “were fed up with men and eager to fight
them…”
– “the anger became the theme of the book…”
– “but angry women are more likely” to respond
9
Convenience Sampling
• Sampling mice from a large cage to study
how a drug affects physical activity
– a lab assistant reaches into the cage to select
the mice one at a time until 10 are chosen
• Which mice are likely to be chosen?
– could this sample yield biased results?
10
Purposive Sampling
• Consider the selection of a football or
soccer team.
• Consider the selection of students for a math
skills competition.
In these sampling schemes, the sampling units
are selected for a well-defined purpose, and
the sample is not picked at random.
11
Simple Random Sampling
• Each individual in the population has the same
chance of being chosen for the sample
• Each group of individuals (in the population) of
the required size (n) has the same chance of
being the sample actually selected
• Random selection:
– “drawing names out of a hat”
– table of random digits
– computer software
12
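The “computer software” option can be sketched with Python's standard `random` module: `random.sample` draws without replacement, so every group of n distinct units has the same chance of being the one selected. The student labels, the population size of 20, the seed, and the function name `simple_random_sample` are hypothetical choices for illustration.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Return an SRS of size n, drawn without replacement.

    random.sample gives every subset of n distinct units the same
    chance of being the one returned.
    """
    rng = random.Random(seed)  # separate, seedable generator for reproducibility
    return rng.sample(population, n)


# Hypothetical population of 20 labelled students.
students = [f"student_{i:02d}" for i in range(1, 21)]
sample = simple_random_sample(students, 5, seed=1510)
print(sample)
```

Fixing the seed makes the draw reproducible; leaving `seed=None` gives a fresh random sample each run.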
Table of Random Digits
• Table B on p. 692 of the text
– each entry is equally likely to be any of the 10
digits 0 through 9
– entries are independent of each other
(knowledge of one entry gives no information about
any other entries)
– each pair of entries is equally likely to be any
of the 100 pairs 00, 01,…, 99
– each triple of entries is equally likely to be
any of the 1000 values 000, 001, …, 999
13
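A table with the properties listed above can be approximated with a pseudo-random number generator. The sketch below is not Table B; the layout (groups of five digits, eight groups per row) and the function name `random_digit_rows` are assumptions. Because each digit is drawn independently and uniformly from 0 to 9, pairs and triples of consecutive digits are also (approximately) uniform over 00 to 99 and 000 to 999.

```python
import random


def random_digit_rows(n_rows, groups_per_row=8, group_size=5, seed=None):
    """Build rows of pseudo-random digits, printed in groups like a digit table."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_rows):
        # Each digit is drawn independently and uniformly from 0-9.
        groups = [
            "".join(str(rng.randrange(10)) for _ in range(group_size))
            for _ in range(groups_per_row)
        ]
        rows.append(" ".join(groups))
    return rows


for row in random_digit_rows(3, seed=2013):
    print(row)
```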
Choosing a
Simple Random Sample (SRS)
STEP 1: Label each individual in the
population
STEP 2: Use Table B to select labels at
random
14
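The two steps above can be sketched in Python. This assumes the labels run from 1 to N and are all written with the same number of digits (01, 02, …, 30 for N = 30), and that the digit string is read left to right in blocks of that width, skipping blocks that are not valid labels and labels already chosen. The function name `srs_from_digits` is hypothetical, and the digit string in the example is made up for illustration; it is not taken from Table B.

```python
def srs_from_digits(digits, population_size, n):
    """Pick n distinct labels (1..population_size) by reading random digits.

    The digits are read left to right in blocks as wide as the largest label;
    blocks that are not valid labels, and labels already chosen, are skipped.
    """
    width = len(str(population_size))
    digits = "".join(ch for ch in digits if ch.isdigit())  # ignore the spaces between groups
    chosen = []
    for start in range(0, len(digits) - width + 1, width):
        label = int(digits[start:start + width])
        if 1 <= label <= population_size and label not in chosen:
            chosen.append(label)
            if len(chosen) == n:
                return chosen
    raise ValueError("not enough usable digits; keep reading further along the table")


# Hypothetical digit string; population of 30 labelled 01..30, sample of 5.
print(srs_from_digits("47203 18895 62011 40928 73561 05519", 30, 5))  # [20, 1, 14, 9, 28]
```

Here the blocks 47, 31, 88, 95, and 62 are skipped because they are not labels between 01 and 30.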
Simple Random Sample
with and without replacement
Case 1: In sampling without replacement, each
selected unit is not returned to the
population, so it can be chosen at most once.
Case 2: In sampling with replacement, each
sampled unit is returned to the population
before the next draw, so it can be chosen more than once.
15
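The two cases map onto two functions in Python's standard `random` module: `sample` draws without replacement and `choices` draws with replacement. The population of ten labelled units and the seed are hypothetical.

```python
import random

rng = random.Random(7)  # seeded so the draws are reproducible
population = list(range(1, 11))  # hypothetical units labelled 1..10

# Case 1, without replacement: a drawn unit is not put back,
# so no label can appear twice and n cannot exceed the population size.
without_replacement = rng.sample(population, 5)

# Case 2, with replacement: each unit is put back before the next draw,
# so a label may appear more than once and n may exceed the population size.
with_replacement = rng.choices(population, k=15)

print(without_replacement)
print(with_replacement)
```

Asking `rng.sample(population, 11)` raises a `ValueError`, since there are only ten units to draw without replacement; `choices` has no such limit.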
Probability Sample
• a sample chosen by chance
• we must know what samples are possible and
what chance, or probability, each possible
sample has of being selected
• an SRS gives each member of the
population an equal chance to be selected
16
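For a small population, every possible sample and its probability can be listed outright. The sketch below assumes a hypothetical population of four units and an SRS of size 2: each of the C(4, 2) = 6 possible samples has probability 1/6, and any single unit lands in the sample with probability 3/6 = 1/2, which equals n/N.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

population = ["A", "B", "C", "D"]  # hypothetical population, N = 4
n = 2

# Under an SRS every possible sample of size n is equally likely,
# so each one has probability 1 / C(N, n).
possible_samples = list(combinations(population, n))
prob_each = Fraction(1, comb(len(population), n))

# Probability that unit "A" is in the sample: add up the samples containing it.
prob_A = sum((prob_each for s in possible_samples if "A" in s), Fraction(0))

print(len(possible_samples), prob_each, prob_A)  # 6 1/6 1/2
```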
Stratified Random Sample
• first, divide the population into groups of
similar individuals, called strata
• second, choose a separate SRS in each
stratum
• third, combine these SRSs to form the full
sample
17
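The three steps can be sketched in Python. The stratum names and sizes (a hypothetical frame of 1,000 students) and the function name `stratified_sample` are assumptions; the per-stratum sample sizes mirror the university example that follows.

```python
import random


def stratified_sample(strata, sizes, seed=None):
    """Draw a separate SRS within each stratum, then combine them.

    strata: dict mapping stratum name -> list of units in that stratum
    sizes:  dict mapping stratum name -> number of units to draw from it
    """
    rng = random.Random(seed)
    combined = []
    for name, units in strata.items():
        combined.extend(rng.sample(units, sizes[name]))  # SRS within this stratum
    return combined


# Hypothetical frame of 1,000 students split into four strata.
strata = {
    "undergraduate": [f"U{i}" for i in range(550)],
    "graduate": [f"G{i}" for i in range(200)],
    "first_professional": [f"F{i}" for i in range(50)],
    "special": [f"S{i}" for i in range(200)],
}
sizes = {"undergraduate": 55, "graduate": 20, "first_professional": 5, "special": 20}
sample = stratified_sample(strata, sizes, seed=17)
print(len(sample))  # 100
```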
Stratified Random Sample
Example
Suppose a university has the following student
demographics:

Student type          Share of students
Undergraduate         55%
Graduate              20%
First Professional     5%
Special               20%

A stratified random sample of 100 students could be
chosen as follows: select an SRS of 55
undergraduates, an SRS of 20 graduates, an SRS of
5 first professional students, and an SRS of 20
special students; combine these 100 students.
18
Stratified Random Sample
Example
We would like to take a sample that represents the
Canadian population.
Canada is divided into provinces, and we want every
province to be represented in the sample.
A stratified random sample of 1000 people could be
chosen as follows: from each province, select a
random sample. Since provincial populations
differ greatly, the sample size from each province should
be proportional to its population.
19
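Proportional allocation can be sketched as a small function. The slides give no provincial populations, so the usage below reuses the university shares from the previous example; for the Canadian survey, provincial population counts and n = 1000 would be passed in instead. Exact shares are rarely whole numbers, so the sketch rounds with the largest-remainder rule; that rule and the function name `proportional_allocation` are assumptions, and other rounding rules are also used in practice.

```python
import math


def proportional_allocation(stratum_sizes, n):
    """Split a total sample size n across strata in proportion to stratum size.

    Each stratum first receives the whole-number part of its exact share;
    any units left over go to the strata with the largest fractional parts
    (largest-remainder rounding), so the allocations always add up to n.
    """
    total = sum(stratum_sizes.values())
    exact = {name: n * size / total for name, size in stratum_sizes.items()}
    alloc = {name: math.floor(share) for name, share in exact.items()}
    leftover = n - sum(alloc.values())
    by_remainder = sorted(exact, key=lambda name: exact[name] - alloc[name], reverse=True)
    for name in by_remainder[:leftover]:
        alloc[name] += 1
    return alloc


# Shares from the university example (percentages), n = 100.
shares = {"Undergraduate": 55, "Graduate": 20, "First Professional": 5, "Special": 20}
print(proportional_allocation(shares, 100))
# {'Undergraduate': 55, 'Graduate': 20, 'First Professional': 5, 'Special': 20}
```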
Multistage Sample
• several stages of sampling are carried out
• useful for large-scale sample surveys
• samples at each stage may be SRSs, but
are often stratified
• stages may involve other random sampling
techniques as well (cluster, systematic,
random digit dialing, …)
20
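A two-stage sample is the simplest multistage design. The sketch below uses an SRS at both stages (first an SRS of clusters, then an SRS of units inside each chosen cluster); real surveys often stratify at one or more stages instead. The frame of towns and households and the function name `two_stage_sample` are hypothetical.

```python
import random


def two_stage_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Two-stage sample: an SRS of clusters, then an SRS of units in each chosen cluster.

    clusters: dict mapping cluster name -> list of units in that cluster
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # stage 1: SRS of clusters
    return {c: rng.sample(clusters[c], n_per_cluster) for c in chosen}  # stage 2


# Hypothetical frame: 10 towns, each with 50 households.
towns = {f"town_{t}": [f"town_{t}/hh_{h}" for h in range(50)] for t in range(10)}
result = two_stage_sample(towns, n_clusters=3, n_per_cluster=4, seed=20)
for town, households in result.items():
    print(town, households)
```

Only the chosen towns need a full household list, which is what makes this kind of design practical for large-scale surveys.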
Cautions about Sample Surveys
• Undercoverage
– some individuals or groups in the population are left
out of the process of choosing the sample
• Nonresponse
– individuals chosen for the sample cannot be contacted
or refuse to cooperate/respond
• Response bias
– the behavior of the respondent or interviewer may lead to
inaccurate answers or measurements
• Wording of questions
– confusing or leading (biased) questions; words with
different meanings
21
Nonresponse
• To prepare for her book Women and Love,
Shere Hite sent questionnaires to 100,000
women asking about love, sex, and
relationships.
– 4.5% responded
– Hite used those responses to write her book
– angry women are more likely to respond
22
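A 4.5% response rate to 100,000 questionnaires means about 4,500 replies. The sketch below shows, with made-up response rates (they are not figures from Hite's study), how a higher response rate in one group inflates that group's share among the respondents. It is a simple expected-value calculation that assumes everyone in a group responds at that group's rate; the function name `share_among_respondents` is hypothetical.

```python
def share_among_respondents(p_group, r_group, r_other):
    """Expected fraction of respondents who belong to a particular group.

    p_group: fraction of the population that is in the group
    r_group: response rate within the group
    r_other: response rate among everyone else
    """
    from_group = p_group * r_group
    from_others = (1 - p_group) * r_other
    return from_group / (from_group + from_others)


# Hypothetical inputs: the group is 30% of the population and responds
# at 10%, while everyone else responds at 3%.
share = share_among_respondents(0.30, 0.10, 0.03)
print(round(share, 2))  # 0.59
```

Under these assumed rates the group is 30% of the population but about 59% of the replies; with equal response rates its share would stay at 30%.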
Response Bias
• A door-to-door survey is being conducted
to determine drug use (past or present) among
members of the community. Respondents
may give socially acceptable answers
(which may not be the truth!)
• For this survey on drug use, would it
matter if a police officer conducted the
interview? (bias from interviewer)
23
Response Bias
Asking the Uninformed
Washington Post National Weekly Edition (April 10-16, 1995, p. 36)
• A 1978 poll done in Cincinnati asked
people whether they “favored or
opposed repealing the 1975 Public
Affairs Act.”
– There was no such act!
– About one third of those asked expressed
an opinion about it.
24
Wording of Questions
A newsletter distributed by a politician to his
constituents gave the results of a “nationwide survey
on Americans’ attitudes about a variety of
educational issues.” One of the questions asked
was, “Should your legislature adopt a policy to assist
children in failing schools to opt out of that school
and attend an alternative school--public, private, or
parochial--of the parents’ choosing?” From the
wording of this question, can you speculate on what
answer was desired? Explain.
25
Wording: Deliberate Bias
• “If you found a wallet with $20 in it,
would you return the money?”
• “If you found a wallet with $20 in it,
would you do the right thing of returning
the money?”
26
Wording: Unintentional Bias
• “I have taught several students over the
past few years.”
– How many students do you think I have
taught?
– How many years am I referring to?
• “Over the past few days, how many
servings of fruit have you eaten?”
servings of fruit have you eaten?”
– How many days are you considering?
– What constitutes a serving?
27
Wording: Unnecessary Complexity
• “Do you sometimes find that you have
arguments with your family members
and co-workers?”
– This asks two questions at once:
– arguments with family members
– arguments with co-workers
28
Wording: Ordering of Questions
• “How often do you normally go out on a
date? About ___ times a month.”
• “How happy are you with life in general?”
– Asked in this order, the answers to these questions
show a strong association.
– If the order is reversed, there is no strong
association between the answers.
29
Inferences about the Population
• Values calculated from samples are used to
draw conclusions (inferences) about unknown
values in the population
• Variability
– different samples from the same population may yield
different results for a particular value of interest
– estimates from random samples tend to be closer to the
true population values when the samples are larger
– how close the estimates are likely to be to the true values
can be calculated; this is called the margin of error
30
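Both points about variability can be checked numerically. The sketch assumes the common large-sample margin of error for a sample proportion, z·sqrt(p(1 - p)/n) with z = 1.96 for roughly 95% confidence; whether this is the exact form used later in the course is an assumption. The simulation draws repeated random samples, treating each individual as independently having the trait with a hypothetical true proportion of 0.4, which approximates sampling from a very large population. The function names are hypothetical.

```python
import math
import random
import statistics


def margin_of_error(p_hat, n, z=1.96):
    """Approximate 95% margin of error for a sample proportion (large-sample formula)."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)


def simulated_proportions(true_p, n, reps, seed=None):
    """Sample proportions from `reps` simulated random samples of size n.

    Each individual independently has the trait with probability true_p,
    which approximates sampling from a very large population.
    """
    rng = random.Random(seed)
    return [sum(rng.random() < true_p for _ in range(n)) / n for _ in range(reps)]


for n in (100, 400, 1600):
    spread = statistics.stdev(simulated_proportions(0.4, n, reps=500, seed=n))
    print(f"n={n:5d}  margin of error={margin_of_error(0.4, n):.3f}  simulated SD={spread:.4f}")
```

Each fourfold increase in n halves the margin of error, and the spread of the simulated sample proportions shrinks in the same way (the margin is about 1.96 times that spread).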