Week 7 - Sampling and Basic Statistics


Basic Sampling & Review of Statistics
Basic Sampling

What is a sample?
- Selection of a subset of elements from a larger group of objects

Why use a sample?
- Saves
  - Time
  - Money
- Accuracy
  - Lessens non-sampling error
Basic Sampling

Major definitions
- Sample population: entire group of people from whom the researcher needs to obtain information
- Sample element: unit from which information is sought (consumers)
- Sampling unit: elements available for selection during the sampling process (consumers who are in the US at the time of the study)
- Sampling frame: list of all sampling units available for selection to the sample (list of all consumers who are in the US at the time of the study)
- Sampling error: difference between population response and sample response
- Non-sampling error: all other errors that emerge during data collection
Basic Sampling

Procedure for selecting a sample
1. Define the population: who (or what) we want data from
2. Identify the sampling frame: those available to get data from
3. Select a sampling procedure: how we are going to obtain the sample
4. Determine the sample size (n)
5. Draw the sample
6. Collect the data
Basic Sampling

General Types of Samples
- Non-probability: selection of the elements to be included in the final sample is based on the judgment of the researcher
- Probability: each element of the population has a known chance of being selected
  - Elements are selected on the basis of probability
- Characteristics of probability samples
  - Sampling error can be calculated ($\pm z \, s_{\bar{x}}$)
  - Inferences can be made to the population as a whole
Non-Probability samples

Convenience
- Sample is defined on the basis of the convenience of the researcher

Judgment
- Hand-picked sample because elements are thought to be able to provide special insight to the problem at hand

Snowball
- Respondents are selected on the basis of referrals from other sample elements
- Often used in more qualitative/ethnographic type studies

Quota
- Sample chosen such that a specified proportion of elements possessing certain characteristics is approximately the same as the proportion of elements in the universe
Probability Samples

Simple random sample (SRS)
- Assign a number to each sampling unit
- Use random number table

Systematic sample
- Easy alternative to SRS

Stratified sample
- Divide population into mutually exclusive strata
- Take an SRS from each stratum
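
The three designs above can be sketched in a few lines of Python. This is only an illustration, not part of the original slides: the function names are hypothetical, the seed is arbitrary, and the systematic version assumes the usual "random start, then every k-th unit" rule. The strata use the City column of the 20 hypothetical respondents in the deck.

```python
import random

def simple_random_sample(units, n, rng):
    """SRS: draw n units without replacement, each equally likely."""
    return rng.sample(units, n)

def systematic_sample(units, n, rng):
    """Systematic: random start in the first interval, then every k-th unit."""
    k = len(units) // n            # sampling interval
    start = rng.randrange(k)       # random position within the first interval
    return [units[start + i * k] for i in range(n)]

def stratified_sample(strata, n_per_stratum, rng):
    """Stratified: an independent SRS of the same size from each stratum."""
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

rng = random.Random(42)               # arbitrary seed, for repeatability
respondents = list(range(1, 21))      # sampling units numbered 1..20

# Strata by city for the 20 hypothetical respondents
strata = {
    "Madison":   [1, 5, 7, 8, 9, 12, 15, 19, 20],
    "Milwaukee": [2, 3, 4, 6, 10, 13, 14, 17, 18],
    "Other":     [11, 16],
}

srs = simple_random_sample(respondents, 5, rng)
systematic = systematic_sample(respondents, 5, rng)
stratified = stratified_sample(strata, 2, rng)
print(srs, systematic, stratified)
```

Drawing the same number from each stratum is a simplification; a proportionate design would scale each stratum's sample to its share of the population.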
Probability Samples

Cluster sample
- Divide population into mutually exclusive clusters
- Select an SRS of clusters
  - One-stage: measure all members in the cluster
  - Two-stage: measure an SRS within the cluster

Area sample
- One-stage: choose an SRS of blocks in an area; sample everyone on the block
- Two-stage: choose an SRS of blocks in an area; select an SRS of houses on the block
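
A minimal sketch of the one-stage versus two-stage distinction, assuming a made-up area of six blocks with eight houses each. The block and house labels, function names, and seed are all hypothetical and serve only to show where the second SRS enters.

```python
import random

def one_stage_cluster(clusters, n_clusters, rng):
    """Pick an SRS of clusters, then measure every member of each chosen cluster."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen}

def two_stage_cluster(clusters, n_clusters, n_within, rng):
    """Pick an SRS of clusters, then an SRS of members within each chosen cluster."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: rng.sample(clusters[name], n_within) for name in chosen}

# Hypothetical area: blocks A..F, houses 1..8 on each block
blocks = {f"block_{b}": [f"{b}-{h}" for h in range(1, 9)] for b in "ABCDEF"}

rng = random.Random(7)  # arbitrary seed
one_stage = one_stage_cluster(blocks, 2, rng)     # everyone on 2 blocks
two_stage = two_stage_cluster(blocks, 2, 3, rng)  # 3 houses on each of 2 blocks
print(one_stage)
print(two_stage)
```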
Random Number Table

80147 27404 38749 31272 53703 59853 88288 29540 32340
50499 69466 59448 16059 46226 82283 20995 57976 47035
26741 87624 04973 06042 02837 12450 83611 70130 84015
42358 67330 65857 96833 03905 09246 93224 41290 70534
56244 25672 90829 95360 34881 89760 98565 25268 45158
85488 11382 86815 60516 12855 55839 53444 07514 71861
05378 78270 86152 35949 86556 08178 96428 31677 25932
69725 11787 59044 43831 36354 58785 91492 19927 61180
37422 55580 01105 91088 47699 51308 13923 52635 63057
78675 58380 19264 36613 37681 34477 44090 88692 01769
15655 73998 98969 97496 28472 35545 40885 24863 72929
02174
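
One common way to use such a table, sketched here as an assumption rather than as the procedure the slides prescribe: read the digits left to right in groups as wide as the largest unit number (two digits for 20 respondents), keep values that fall between 1 and the population size, and skip repeats. The function name is hypothetical, and the digit stream is read continuously across rows.

```python
def select_from_table(table_rows, population_size, n):
    """Read fixed-width digit groups from a random number table and keep
    the first n distinct values in the range 1..population_size."""
    digits = "".join("".join(row.split()) for row in table_rows)
    width = len(str(population_size))      # 2 digits for a population of 20
    chosen = []
    for i in range(0, len(digits) - width + 1, width):
        number = int(digits[i:i + width])
        if 1 <= number <= population_size and number not in chosen:
            chosen.append(number)
            if len(chosen) == n:
                break
    return chosen

# First two rows of the table above
rows = [
    "80147 27404 38749 31272 53703 59853 88288 29540 32340",
    "50499 69466 59448 16059 46226 82283 20995 57976 47035",
]

picked = select_from_table(rows, population_size=20, n=5)
print(picked)
```

Most two-digit groups are discarded because they exceed 20, which is why several rows may be needed for even a small sample.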
Hypothetical Sample Populations
Respondent   Income    Education   Yogurt Consumption   Satisfaction   City
Number       ($000)    (Years)     (Cartons/Year)       Level (1-7)
1            56        8           73                   1              Madison
2            60        9           3                    3              Milwaukee
3            64        11          95                   5              Milwaukee
4            68        11          71                   4              Milwaukee
5            72        11          86                   6              Madison
6            76        12          40                   2              Milwaukee
7            80        12          21                   7              Madison
8            84        12          81                   7              Madison
9            88        12          65                   7              Madison
10           92        12          44                   7              Milwaukee
11           96        13          80                   4              Other
12           100       13          12                   5              Madison
13           104       14          43                   2              Milwaukee
14           108       14          56                   4              Milwaukee
15           112       15          35                   7              Madison
16           116       16          17                   1              Other
17           120       16          72                   3              Milwaukee
18           124       17          70                   3              Milwaukee
19           128       18          80                   7              Madison
20           132       20          15                   4              Madison
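
To make "sampling error" concrete, the sketch below treats the 20 respondents as the whole population, draws an SRS of five, and compares the sample mean of yogurt consumption with the population mean. The seed and sample size are arbitrary choices for illustration, so the particular error printed has no special meaning.

```python
import random
import statistics

# Yogurt consumption (cartons/year) for respondents 1..20
yogurt = [73, 3, 95, 71, 86, 40, 21, 81, 65, 44,
          80, 12, 43, 56, 35, 17, 72, 70, 80, 15]

population_mean = statistics.fmean(yogurt)   # 1059 / 20 = 52.95

rng = random.Random(2024)                    # arbitrary seed
sample_ids = rng.sample(range(1, 21), 5)     # SRS of 5 respondent numbers
sample = [yogurt[i - 1] for i in sample_ids]

sample_mean = statistics.fmean(sample)
sampling_error = sample_mean - population_mean   # sample response minus population response

print("respondents drawn:", sample_ids)
print("sample mean:", sample_mean, "population mean:", population_mean)
print("sampling error:", round(sampling_error, 2))
```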
Review of Statistics

Probability samples: note that statistical error can be computed when they are used
- Thus, need to know about statistics

Descriptive statistics
- Estimates of descriptions of a population

Statistical terms used in sampling
- Mean ($\mu$ or $\bar{x}$): $\sum x_i / n$
- Variance ($\sigma^2$ or $s^2$): $\sum (x_i - \bar{x})^2 / (n - 1)$
- Standard deviation ($\sigma$ or $s$): $\sqrt{\text{variance}}$
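
As a quick check of these three formulas, the sketch below applies them to the Income column of the hypothetical respondents (56, 60, ..., 132, in $000), treating the 20 values as a sample so that the variance divides by n - 1. Variable names are illustrative.

```python
import math
import statistics

income = list(range(56, 133, 4))   # 56, 60, ..., 132 ($000), 20 respondents

n = len(income)
x_bar = sum(income) / n                                      # mean
variance = sum((x - x_bar) ** 2 for x in income) / (n - 1)   # sample variance
std_dev = math.sqrt(variance)                                # standard deviation

print(x_bar, variance, round(std_dev, 2))   # 94.0 560.0 23.66

# The standard library gives the same results
assert statistics.variance(income) == variance
```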
Review of Statistics

Inferential Statistics
- Terms
  - Parameter: $\mu$
  - Statistic: $\bar{x}$
- Sample statistics
  - Best estimate of population parameter
  - Why? Central Limit Theorem
Review of Statistics

Central Limit Theorem
- Based on the distribution of the means of numerous samples
  - Sampling distribution of means
- Theorem states:
  - As sample size (n) gets large, the sampling distribution of means becomes approximately normally distributed with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$
- Allows the calculation of sampling error ($s/\sqrt{n}$)
  - Thus a confidence interval can be calculated
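
A small simulation can illustrate the theorem. The sketch below treats the 20 income values (56, 60, ..., 132) as the population, repeatedly draws samples of size 25 with replacement, and compares the spread of the sample means with $\sigma/\sqrt{n}$. The sample size, number of repetitions, and seed are arbitrary assumptions; sampling is done with replacement so that no finite-population correction is needed.

```python
import math
import random
import statistics

income = list(range(56, 133, 4))     # population: 56, 60, ..., 132 ($000)
mu = statistics.mean(income)         # population mean
sigma = statistics.pstdev(income)    # population standard deviation

def simulate_sample_means(population, n, reps, rng):
    """Draw `reps` samples of size n (with replacement) and return their means."""
    return [statistics.fmean(rng.choices(population, k=n)) for _ in range(reps)]

rng = random.Random(0)               # arbitrary seed
n = 25
means = simulate_sample_means(income, n, reps=5000, rng=rng)

print("mean of sample means:", round(statistics.fmean(means), 2), "vs mu =", mu)
print("sd of sample means:  ", round(statistics.stdev(means), 2),
      "vs sigma/sqrt(n) =", round(sigma / math.sqrt(n), 2))
```

The two pairs of numbers should agree closely, even though the population itself is uniform rather than normal.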
Review of Statistics

Confidence interval: tells us, based on n and the sampling procedure, how close the sample mean ($\bar{x}$) is to the population mean ($\mu$)
- Formula:
  - $\bar{x} - z \, s_{\bar{x}} < \mu < \bar{x} + z \, s_{\bar{x}}$, where $s_{\bar{x}} = s/\sqrt{n}$
- z-values (two-sided):
  - 90%: 1.645
  - 95%: 1.96
  - 99%: 2.58
Review of Statistics

Confidence interval -- interpretation

For the same sampling procedure, about 95 out of 100 calculated 95% confidence intervals would include the true mean ($\mu$)
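
This interpretation can be checked by simulation. The sketch below assumes a normal population with made-up parameters (mean 50, standard deviation 10), draws many samples of size 100, builds a 95% interval from each, and counts how often the interval contains the true mean. All names and numbers are illustrative assumptions.

```python
import math
import random
import statistics

def coverage(mu, sigma, n, reps, z, rng):
    """Share of repeated-sample intervals x_bar -/+ z*s/sqrt(n) that contain mu."""
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = statistics.fmean(sample)
        s_xbar = statistics.stdev(sample) / math.sqrt(n)
        if x_bar - z * s_xbar < mu < x_bar + z * s_xbar:
            hits += 1
    return hits / reps

rate = coverage(mu=50, sigma=10, n=100, reps=2000, z=1.96, rng=random.Random(1))
print(f"share of intervals containing the true mean: {rate:.3f}")
```

The share should land near 0.95; any single interval either contains the true mean or it does not.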
Sample Size

Sample size and total error
- Larger n increases probability of non-sampling error
- Larger n reduces sampling error ($s/\sqrt{n}$)
- Effect of n on total error?
- Can pre-determine the level of error (by setting n)
  - Depends mainly on the method of analysis
Sample Size

Sample size when the research objective is to estimate a population parameter
- CI = $\bar{x} \pm z \, s_{\bar{x}}$
- CI = $\bar{x} \pm 1.96 \, (s/\sqrt{n})$
- Setting the half-width $h = z \, s/\sqrt{n}$ and solving for n: $n = z^2 s^2 / h^2$
- $n = (1.96)^2 s^2 / h^2$
- $n = (3.84) \, s^2 / h^2$

where
- s = expected standard deviation
- h = absolute precision of the estimate (half the width of the desired confidence interval)
Sample Size (Sample Exercise)

$n = (1.96)^2 s^2 / h^2$
- s = 7.5
- h = .50
  - n = (3.84)(56.25)/.25
  - n = 216/.25
  - n = 864

What if s = 10; h = 1?
  - n = (3.84)(100)/1
  - n = 384
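
The same arithmetic as a small function, offered as a sketch: `sample_size` is a hypothetical helper that returns the unrounded value of $z^2 s^2 / h^2$, and the caller rounds up. For s = 7.5 and h = 0.50, $h^2$ is 0.25, so with $z^2$ rounded to 3.84 the hand calculation is 216/0.25 = 864; using the unrounded $1.96^2 = 3.8416$ gives a slightly larger figure.

```python
import math

def sample_size(s, h, z=1.96):
    """Unrounded n = z^2 * s^2 / h^2 for estimating a mean.

    s: expected standard deviation
    h: absolute precision (half-width of the confidence interval)
    z: z-value for the chosen confidence level (1.96 for 95%)
    """
    return (z ** 2) * (s ** 2) / (h ** 2)

for s, h in [(7.5, 0.5), (10, 1)]:
    n = sample_size(s, h)
    print(f"s = {s}, h = {h}: n = {n:.2f}, round up to {math.ceil(n)}")
```

Halving h quadruples n, which is why tight precision targets become expensive quickly.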
Sample Size (Conclusion)


- Unaffected by size of universe
- Affected by
  - Choice of desired precision of confidence interval
  - Estimate of standard deviation
Sample Size

Sample size estimation
- With cross-tabulation based research
- Objective is to get a minimum of 25 subjects per cell
- Must estimate relationship up front: what is the smallest cell?

         <30    30+              Total
Fem      .25    .35              .60
Male     .30    .10 (smallest)   .40
Total    .55    .45              1.00
Sample Size



- Know smallest cell size should be 25
- Calculate total sample size: 25 is 10% of sample
- Total sample size
  - 25 = .10 n
  - 25/.10 = n
  - 250 = n

         <30    30+    Total
Fem
Male            25
Total
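
The cross-tabulation rule can be sketched as a short calculation: find the smallest expected cell proportion, then divide the minimum cell count by it. The function name is hypothetical; the proportions are the Fem/Male by <30/30+ estimates from the slides.

```python
import math

def total_sample_size(cell_proportions, min_per_cell=25):
    """Return (smallest cell, n) so that the smallest cell is expected
    to hold at least min_per_cell subjects."""
    smallest_cell = min(cell_proportions, key=cell_proportions.get)
    p_min = cell_proportions[smallest_cell]
    # round() guards against floating-point noise before rounding up
    n = math.ceil(round(min_per_cell / p_min, 6))
    return smallest_cell, n

cells = {
    ("Fem", "<30"): 0.25, ("Fem", "30+"): 0.35,
    ("Male", "<30"): 0.30, ("Male", "30+"): 0.10,
}

smallest_cell, n = total_sample_size(cells)
print(smallest_cell, n)   # ('Male', '30+') 250
```

If the up-front estimate of the smallest cell is too optimistic, the realized cell will fall short of 25, so a conservative estimate is the safer choice.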