Chapter 8: Determining the Sample Plan and Sample Size

Basic Concepts of Samples and Sampling

Population
Sample
Sample unit
Census
Sampling error
Defining the Population

The entire group under study as specified by
the research objectives
Supermarkets that are part of chains
(Safeway, Publix, Woodman’s) located in
Wisconsin.
Persons in charge of household financial
planning located in Pima County, Arizona
Sample and Sample Unit


A sample is a subset of the population that
should represent the entire group. A sample
is said to be “representative” if, in fact, it
represents the entire population.
A sample unit is the basic level of
investigation…a college student, a
housewife, a purchasing agent, a
supermarket, a bank, etc.
Census



A census is defined as an accounting of the
complete population.
The U.S. census, for example, is taken every 10 years.
It is very difficult to take a census when the
population is large and/or not easily
accessed.
Sampling Error



Sampling error is any error that occurs
because a sample is used.
Sampling error is caused by two factors: 1.
Method of sample selection (the PLAN) and 2.
The size of the sample.
If the average height in this class is really 5’9”
and a sample of the people on the first row
shows their average height to be 5’6”, we
have sampling error of 3”.
Sample Frame



The sample frame is a master list of all the sample
units in the population.
If the population were defined as all chain
supermarkets in Wisconsin, then a list containing all
such stores would be the sample frame.
Sometimes a physical list is not possible.
The sample frame for a mall intercept survey would
be all shoppers who were walking through the mall
on the days data were collected.
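Basic Concepts of Samples and Sampling

Population: the entire group under study, as specified by the objectives of the research plan.
Sample: a subset of the population that can represent the whole.
Sample unit: the most basic unit of investigation in the research.
Census: an accounting of the complete population.
Sampling error: the error that arises because a sample is used in the survey.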
12.2 Reasons for Taking a Sample



Taking a census can be very costly if the
population is large and/or hard to access.
Data analysis with hundreds of thousands of
observations, even with today’s computers, is
cumbersome.
We shall learn that samples, some
surprisingly small, can produce very precise
estimates of the population values.
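Two Basic Sampling Methods

Probability samples: every member of the population has a known probability of being selected into the sample.
Nonprobability samples: the probability that a member of the population is selected into the sample is unknown.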
Two Basic Sampling Methods


Probability Samples…are ones in which
members of the population have a
known chance (probability) of being
selected into the sample.
Nonprobability Samples…are those in
which the chances of selecting
members from the population into the
sample are unknown.
Examples...


If there are 10,000 students in this university and
we took a sample of 100 of them drawn from a
“frame” provided to us from the Registrar’s office,
would we know the probability of drawing a
member of the population into our sample? Yes.
100/10000=.01 or 1%.
Therefore, this would be a probability sample.
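The arithmetic in this example can be sketched in a few lines of Python. The numbers come from the example above; the variable names are only illustrative.

```python
# Known chance of selection when 100 students are drawn at random
# from a frame listing all 10,000 students (figures from the example).
population_size = 10_000
sample_size = 100

inclusion_probability = sample_size / population_size
print(inclusion_probability)           # 0.01
print(f"{inclusion_probability:.0%}")  # 1%
```

Because this probability can be computed for every member of the frame, the sample qualifies as a probability sample.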
Examples, continued

What if we took the members of this class as
our sample to represent all students in the
university? Let's assume there are 30
students in the class. Can you calculate the
chance (probability) that ALL students in the
university had of being a part of our sample?
No. The 30 members of this class have a
probability of 1.00 but all other students have
a probability of 0! Therefore, we have a
nonprobability sample.


Why is it important to distinguish
between probability and nonprobability
sampling methods?
It is only through probability samples
that one may assess the preciseness
with which the sample values represent
the population values.
Probability Sampling

Simple random sampling;
Systematic sampling;
Cluster sampling;
Stratified sampling:
Proportionate stratified sampling
Disproportionate stratified sampling
Nonprobability Sampling

Convenience samples;
Judgment samples;
Referral samples (snowball samples);
Quota samples.
12.3 Two Basic Sampling Methods
Sampling methods
Probability samples: simple random, systematic, stratified, cluster
Nonprobability samples: convenience, judgment, snowball, quota
Simple Random Sampling




Define the population
Develop a frame that has all of the
population members included
Use an unbiased (random) method to draw
the sample units from the frame
To draw, you may use a “blind draw” or a table
of random numbers
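The steps above can be sketched in Python, with the standard library's random number generator standing in for the blind draw or random number table. The frame contents and the function name `simple_random_sample` are hypothetical, chosen only for illustration.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct sample units from the frame, each equally likely."""
    rng = random.Random(seed)  # seed only makes the draw repeatable
    return rng.sample(list(frame), n)

# Hypothetical frame: every population member predesignated with a number.
frame = [f"member_{i:04d}" for i in range(1, 501)]

sample = simple_random_sample(frame, 25, seed=42)
print(len(sample))  # 25
```

Sampling is done without replacement, so no member of the frame can appear in the sample twice.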
Advantages/Disadvantages of SRS
+ Can be simple (under certain conditions)
+ Does give you a probability sample
- Must predesignate (number, label) every member of the population as listed on the frame
- Requires complete listings on the frame, and they may not be available in all situations
Systematic Sampling




Define the population
Obtain a list of all population members as the
frame
Estimate the number of members of the
population and divide by the sample size:
Skip interval = population list size / sample size
Take a random start on the frame and draw a
sample unit, skip the interval, draw another,
continue
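A minimal sketch of this procedure in Python follows. It assumes the frame is an ordinary list and rounds the skip interval down to a whole number; the function name and the frame of 1,000 numbered members are hypothetical.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Random start, then every skip-th unit until n units are drawn."""
    frame = list(frame)
    skip = len(frame) // n  # skip interval = list size / sample size, rounded down
    if skip < 1:
        raise ValueError("sample size cannot exceed the frame size")
    rng = random.Random(seed)
    start = rng.randrange(skip)  # random start within the first interval
    return [frame[start + i * skip] for i in range(n)]

# Hypothetical frame of 1,000 numbered members; a sample of 50 gives skip = 20.
frame = list(range(1, 1001))
sample = systematic_sample(frame, 50, seed=7)
print(len(sample))  # 50
```

Only one random number is needed (the start), which is the source of the method's economy relative to simple random sampling.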
Systematic Sampling
+ One of the most prevalent types of sampling
used in place of SRS: it has “Economic
Efficiency” over SRS because it can be
conducted more easily and in a shorter time
period
+ Provides a probability sample
- May not be as precise as SRS; care must be
taken that there are no “periodicities” in the
frame.
PROBABILITY SAMPLING METHODS
Cluster Sampling…population is divided into subgroups and
a sample of subgroups is drawn
Area Sampling: One Step--select one group and take a census
Two Step Approach--randomly select several groups and
sample from each
Stratified Sampling…presence of subgroups (strata) that may differ
Weighted Mean=the strata means, each weighted by its stratum's share of the population, combined to get a total mean
Proportionate/Disproportionate Sample
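The weighted mean used with stratified samples can be illustrated with a short Python sketch. The three strata, their population shares, and their means are made-up numbers used only to show the calculation.

```python
def weighted_mean(strata):
    """strata: (population_share, stratum_mean) pairs; shares should sum to 1."""
    return sum(share * mean for share, mean in strata)

# Hypothetical strata: (share of the population, mean observed in that stratum)
strata = [(0.60, 10.0), (0.30, 20.0), (0.10, 50.0)]

total_mean = weighted_mean(strata)  # 0.6*10 + 0.3*20 + 0.1*50
print(round(total_mean, 2))         # 17.0
```

Weighting by population share matters most for a disproportionate sample, where a stratum's share of the sample differs from its share of the population.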
NON-PROBABILITY SAMPLING
METHODS
Convenience Sampling
Judgment Sampling
Referral Sampling
Quota Sampling
Nonprobability Sampling
Methods
Convenience Samples
Easy to collect
Judgment Samples
Based on judgmental selection criteria
Quota Samples
Demographic characteristics in the same proportion as
in the population
Snowball Samples
Additional respondents selected on referral from initial
respondents.
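Designing the Sample Plan

Define the relevant population;
Obtain a list of the population;
Design the sample plan (size and method);
Access the population;
Draw the sample
Substitution methods: drop-down substitution, oversampling, resampling
Validate the sample (sample validation);
Resample, if necessary.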
Concepts and Key Points Related to Sample Size

Sample size and sample accuracy
Sample size is unrelated to how representative the sample is of the population
Sample size can affect the accuracy of the results
Sample Representativeness v. Sample Accuracy

How representative a sample is of the population is
determined by the sampling method or plan.
Sample accuracy refers to how close the statistic, generated
from the sample data, is to the true population value the
statistic is estimating. Sample accuracy is related to the
size of the sample.
Methods of Determining Sample Size

Arbitrary approach (e.g., 10% of the population)
Conventional approach (what others have done)
Cost basis approach (how many does the budget allow?)
Statistical analysis approach (adequate n for subgroup analysis)
Confidence interval approach (allows us to predetermine how precise our estimates are)
The confidence interval approach applies the concepts of variability, confidence intervals, the sampling distribution, and the standard error of the mean or of a percentage to create a valid sample.
Several Related Concepts

Variability: the amount of dissimilarity in respondents' answers to a particular question.
Confidence interval
Sampling distribution
Standard error of the mean, or standard error of a percentage
Variability: the amount of dissimilarity (or similarity) in respondents' answers to a question

Nominal data: Do you prefer product A
over product B? Yes or No
The greatest variability is 50% yes;
50% no.
We can conceptualize variability for
nominal or ordinal data by examining
bar charts. The more even the bars,
the greater the variability.
[Figure: two bar charts, one labeled High Variance and one labeled Low Variance]
Variability: the amount of dissimilarity (or similarity) in respondents' answers to a question


Interval or ratio data: On this scale from 1 to 5, how
would you value a college degree? Or, how many
miles do you drive in your personal car during a year?
We can conceptualize variability for interval or ratio
data by examining the distribution of scores. The
flatter the distribution, the greater the variability.
The more peaked, the less variability. Standard
deviation is a measure of variability when data is
either interval or ratio.
Confidence Intervals: a range whose endpoints define a certain percentage of responses to a question.

When answers are normally distributed, the range
within +/-1.96 standard deviations (z scores) of the
mean contains 95% of all answers to a particular question.
Therefore, it allows us to make statements of
how confident we are that data will fall within
a certain range (95% confidence=+/-1.96z;
99% confidence=+/-2.58z)
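The two z values quoted above can be recovered from the standard normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `z_for_confidence` is an illustrative choice.

```python
from statistics import NormalDist

def z_for_confidence(level):
    """z value that leaves (1 - level) / 2 in each tail of the normal curve."""
    tail = (1 - level) / 2
    return NormalDist().inv_cdf(1 - tail)

print(round(z_for_confidence(0.95), 2))  # 1.96
print(round(z_for_confidence(0.99), 2))  # 2.58
```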
Why Can We Apply our Knowledge of the Normal Curve?

The Central Limit Theorem states that
if we take many samples, each of at least
size 30, then the means from these
samples will form an approximately normal
distribution.
It is this THEORETICAL SAMPLING
DISTRIBUTION (a NORMAL curve) that
allows us to use statistical inference.
When we use z =+/-1.96 we are
capturing 95% of the total sample
means in a distribution of possible
sample means. This allows us to be
95% confident…95 times out of 100,
our sample mean is not going to be
too far removed from the population
mean.
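A small simulation can make this claim concrete. The sketch below assumes a made-up, deliberately non-normal population (uniform values from 0 to 100), draws 2,000 samples of size 30, and counts how many sample means land within 1.96 standard errors of the population mean. The exact share varies with the random seed but should be close to 95%.

```python
import random
from statistics import mean, pstdev

rng = random.Random(0)

# Hypothetical population: 20,000 uniform values, far from bell shaped.
population = [rng.uniform(0, 100) for _ in range(20_000)]
mu = mean(population)
sigma = pstdev(population)

n = 30
standard_error = sigma / n ** 0.5

# Draw many samples and record each sample mean.
sample_means = [mean(rng.sample(population, n)) for _ in range(2_000)]

within = sum(abs(m - mu) <= 1.96 * standard_error for m in sample_means)
share_within = within / len(sample_means)
print(round(share_within, 3))  # close to 0.95
```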


Standard Error of a Mean (or Percentage)
The standard error is a measure of how
far away from the true population value
a typical sample result is expected to
fall. It is calculated by dividing a
measure of variability by the square root of the
sample size (for a mean, s divided by the square
root of n; for a percentage, the square root of pq/n).
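As a sketch, the two standard errors can be written as small Python functions. They use the usual textbook definitions (s over the square root of n for a mean, and the square root of pq over n for a percentage, with p and q on a 0 to 100 scale); the function names and input values are illustrative.

```python
from math import sqrt

def standard_error_of_mean(s, n):
    """s = standard deviation, n = sample size."""
    return s / sqrt(n)

def standard_error_of_percentage(p, n):
    """p = percentage on a 0-100 scale, q = 100 - p, n = sample size."""
    q = 100 - p
    return sqrt(p * q / n)

print(standard_error_of_mean(10, 100))        # 1.0
print(standard_error_of_percentage(50, 400))  # 2.5
```

Quadrupling the sample size only halves the standard error, which is why accuracy improves slowly once samples are already large.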
How to Estimate Variability
1. Estimate based upon a former study of
the same population.
2. Estimate based upon a pilot study of
the population.
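3.a. For a percentage, set pq at 50 x 50, the case of greatest variability.
3.b. For a mean, estimate the range and
divide by 6 (a normal distribution spans roughly 6 standard deviations, 3 on each side of the mean).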
How to Set Accuracy (e=how precise we want our estimate to be of the population value)



Managers, working with researchers,
determine e. How precise do you need to be?
+/-5%?? The more precise, the lower e.
The less precise, the higher e.
When estimating %, e is set as a
percentage…e.g., 5.
When estimating a mean, e is set in terms of
the number of units being estimated, e.g.,
number of miles driven, number of Big Macs
eaten, etc.
How Do We Set Level of Confidence (z)?


Decision made by management, with
advice from researchers.
Rules of Thumb:
95% level of confidence=1.96z
99% level of confidence=2.58z
SAMPLE SIZE FORMULA FOR A PERCENTAGE
$$n = \frac{z^2 (pq)}{e^2}$$
Where…
n = the sample size
z = number of standard errors associated with the chosen level of
confidence (1.96 for 95%, 2.58 for 99%)
p = estimated variability in the population, expressed as a percentage
q = (100-p)
e = acceptable error, expressed as a percentage
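A short Python sketch of the percentage formula follows, using the symbols defined above with p and e on a 0 to 100 scale. The function name is an illustrative choice, and the inputs are the worst-case variability (p = 50) at 95% confidence.

```python
def sample_size_for_percentage(p, e, z=1.96):
    """n = z^2 * (p * q) / e^2, with q = 100 - p."""
    q = 100 - p
    return z ** 2 * p * q / e ** 2

print(round(sample_size_for_percentage(p=50, e=5)))  # 384
print(round(sample_size_for_percentage(p=50, e=3)))  # 1067
```

Tightening e from 5 to 3 nearly triples the required sample, because e enters the formula squared.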
SAMPLE SIZE FORMULA FOR A MEAN
$$n = \frac{s^2 z^2}{e^2}$$
Where…
n = the sample size
z = number of standard errors associated with the chosen level of
confidence (1.96 for 95%, 2.58 for 99%)
s = variability indicated by an
estimated standard deviation
e = acceptable error, in the units being estimated
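The formula for a mean can be sketched the same way. The example below is hypothetical: it supposes annual miles driven are thought to range from 0 to 30,000, estimates s with the range-divided-by-6 rule mentioned earlier, and asks for an accuracy of plus or minus 500 miles at 95% confidence.

```python
def sample_size_for_mean(s, e, z=1.96):
    """n = s^2 * z^2 / e^2, with s and e in the units being estimated."""
    return (s ** 2) * (z ** 2) / e ** 2

# Hypothetical inputs: range of 30,000 miles, so s is estimated as range / 6.
estimated_range = 30_000
s = estimated_range / 6  # 5,000 miles

n = sample_size_for_mean(s=s, e=500)
print(round(n))  # 384
```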
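SAMPLING FROM SMALL POPULATIONS
If the sample exceeds 5% of the population size, use the
Finite Multiplier to recalculate the sample as follows...
$$\text{Sample size} = \text{Sample size formula} \times \sqrt{\frac{N - n}{N - 1}}$$
Where N = the population size and n = the sample size given by the sample size formula.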
The Logic of Sample Accuracy

Sample Size and Survey Accuracy with p=50% and q=50%
[Chart: Sample Size and Accuracy; vertical axis: Accuracy (+,-), 0% to 20%; horizontal axis: Sample Size]

Sample Size   50     200    350    500    650    800    950    1100   1250   1400
95%           13.9%  6.9%   5.2%   4.4%   3.8%   3.5%   3.2%   3.0%   2.8%   2.6%
99%           18.2%  9.1%   6.9%   5.8%   5.1%   4.6%   4.2%   3.9%   3.6%   3.4%
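The accuracy figures in this chart are consistent with solving the percentage formula for e, that is, e = z times the square root of pq/n. A minimal Python sketch, assuming p = q = 50 and the z values 1.96 and 2.58 used throughout, regenerates the two rows; the function name is illustrative.

```python
from math import sqrt

def accuracy(n, z, p=50):
    """Accuracy (+/- percentage points) for sample size n: e = z * sqrt(pq / n)."""
    q = 100 - p
    return z * sqrt(p * q / n)

sample_sizes = range(50, 1401, 150)  # 50, 200, 350, ..., 1400

for label, z in (("95%", 1.96), ("99%", 2.58)):
    row = [f"{accuracy(n, z):.1f}%" for n in sample_sizes]
    print(label, *row)
```

The rows show the diminishing return from larger samples: moving from 50 to 200 respondents halves the error, while moving from 1,250 to 1,400 changes it by only about 0.2 points.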
Sample Size Formula for a Percentage
At 95% (z = 1.96)
[Chart: required sample size n (vertical axis, 200 to 1,200) plotted against p (10 to 50) for e=3, e=4, and e=5]