Sampling for EHES

Sampling for EHES: Principles and Guidelines
Johan Heldal & Susie Cooper
Statistics Norway
Overview
• Why this kind of sampling?
• Target population & sample size
• Sampling frames
• Probability sampling
  • Two-stage sampling - PSUs
  • Stratification
  • Stage 1 sampling
    - Sample sizes
    - Sampling PSUs with PPS
• Stage 2 sampling
• A cost model
• Age-gender stratification
• Further aspects
Why?
• Goals for EHES:
• To estimate the distribution of risk levels within national populations.
• To compare risk levels among national populations.
• To predict levels of disease in the future.
• These goals differ from the usual goals of epidemiologists, which are to establish risk factors and models for risk.
Ideal Target Population
• Core: all persons aged 25-64 years at a given date with permanent residence in the country.
  - Can be extended by age to 18+.
  - Should also include institutionalized persons.
• Sample size: at least 500 persons in each of the (M,W) x (25-34, 35-44, 45-54, 55-64) domains:
  - Total ≥ 4000 persons.
  - For a pilot, ≥ 200 persons.
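The total follows directly from the domain grid: 2 sexes x 4 age groups = 8 domains, each with at least 500 persons. A minimal Python sketch of that arithmetic (the names `SEXES`, `AGE_GROUPS` and `minimum_sample` are illustrative, not taken from EHESsampling):

```python
from itertools import product

# Domains and the minimum of 500 persons per domain, as stated on the slide.
SEXES = ("M", "W")
AGE_GROUPS = ("25-34", "35-44", "45-54", "55-64")
MIN_PER_DOMAIN = 500


def minimum_sample(min_per_domain: int = MIN_PER_DOMAIN) -> dict:
    """Return the minimum number of persons for every (sex, age group) domain."""
    return {(sex, age): min_per_domain for sex, age in product(SEXES, AGE_GROUPS)}


domains = minimum_sample()
print(len(domains), sum(domains.values()))  # 8 4000
```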
Main Sampling Frame
• A list of persons or addresses from which to take a sample (a register or census).
  - Should cover the target population but may need "add-ons".
  - Add-ons: e.g. a list of institutions.
• A good list frame may be unavailable.
• "Map frames" can then be used (as in NHANES).
• Telephone directories may be complicated to use as frames.
Probability sampling
• Sampling in scientific surveys is carried out as probability sampling (e.g. simple random sampling).
• Every sampling unit and every target unit has a known, non-zero probability of being selected.
• It must be possible to calculate this probability, at least for all units that are sampled.
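As a small illustration of a computable selection probability, the sketch below draws a simple random sample without replacement and reports the inclusion probability n/N that every unit shares. The frame of 10,000 person IDs is hypothetical, and the design weight 1/(n/N) is shown only to illustrate why the probability must be known for every sampled unit.

```python
import random
from fractions import Fraction


def srs(frame, n, rng):
    """Simple random sampling without replacement of n units from the frame."""
    return rng.sample(frame, n)


def srs_inclusion_probability(n, N):
    """Under SRS without replacement, every unit has inclusion probability n/N."""
    if not 0 < n <= N:
        raise ValueError("need 0 < n <= N")
    return Fraction(n, N)


frame = list(range(1, 10_001))   # hypothetical frame of 10,000 person IDs
rng = random.Random(2024)        # fixed seed so the draw is reproducible
sample = srs(frame, 400, rng)
pi = srs_inclusion_probability(400, len(frame))
weight = 1 / pi                  # each sampled person represents 1/pi persons
print(len(sample), pi, weight)   # 400 1/25 25
```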
Two-stage sampling
• Primary Sampling Unit (PSU): an area that can be handled by one examination site.
• Small enough that every person living there can easily travel to the site,
• or can easily be visited.
• Can be created from small census tracts, municipalities, electoral districts, post code areas or … .
• Divide the country into disjoint PSUs.
Two-stage sampling
• Stratification: group the PSUs into groups of "close" PSUs, called strata.
• Use geography and other known information to group similar PSUs together.
• Stage 1: take a probability sample of PSUs in each stratum.
• Stage 2: then take a probability sample of persons, households or addresses in each sampled PSU.
Strata consist of PSUs
• PSU sizes:
  - $N_i$ = number of persons, households or addresses in PSU no. i.
  - Can vary, but not too much within a stratum.
  - Recommended: $N_i \geq 1000$.
• Stratum size: $N = N_1 + \dots + N_M$, where M is the number of PSUs in the stratum.
• A sample of $m \geq 2$ PSUs and n persons or addresses is taken from the stratum.
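One concrete reason PSU sizes should not vary too much within a stratum: with PPS probabilities $\pi_i = m N_i / N$, a PSU with $N_i > N/m$ would get a "probability" above 1. The sketch below computes the stratum size and checks this; the function name and the example sizes are hypothetical.

```python
def stratum_summary(psu_sizes, m):
    """Stratum size N = N_1 + ... + N_M and PPS probabilities pi_i = m * N_i / N.

    Raises if m < 2, if m exceeds the number of PSUs, or if any PSU is so large
    that its pi_i would exceed 1.
    """
    if m < 2:
        raise ValueError("take at least m = 2 PSUs per stratum")
    if m > len(psu_sizes):
        raise ValueError("cannot sample more PSUs than the stratum contains")
    N = sum(psu_sizes)
    pis = [m * Ni / N for Ni in psu_sizes]
    too_big = [i for i, pi in enumerate(pis) if pi > 1]
    if too_big:
        raise ValueError(f"PSUs {too_big} have m * N_i > N; split them or revise the strata")
    return N, pis


sizes = [1200, 1500, 1100, 1800, 1400]   # hypothetical N_i for one stratum
N, pis = stratum_summary(sizes, m=2)
print(N, [round(p, 3) for p in pis])     # 7000 [0.343, 0.429, 0.314, 0.514, 0.4]
```

The probabilities always sum to m, which is a quick sanity check on any PPS design.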
Stage 1 sampling
• Selection probability for PSU i (PPS sampling):
  $\pi_i = m N_i / N$
• Each sampled PSU gets the same sample size $p = n/m$ (persons or addresses).
• This gives every person in the same stratum the same selection probability: $\pi_i \cdot p / N_i = (m N_i / N)(n/m) / N_i = n/N$.
• m and p can be calculated in a cost-variance optimal way in each stratum.
• The program EHESsampling takes care of the calculations and performs the sampling.
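The slides do not say which PPS method EHESsampling uses. Systematic PPS is one standard way to select m PSUs with probabilities $\pi_i = m N_i / N$, and it is assumed here for illustration. The sketch also checks with exact fractions that the per-person probability $\pi_i \cdot p / N_i$ equals n/N for every PSU. Function names and the example stratum are hypothetical.

```python
import random
from bisect import bisect_right
from fractions import Fraction
from itertools import accumulate


def systematic_pps(sizes, m, rng):
    """Systematic PPS: select m PSU indices with probability pi_i = m * N_i / N.

    Assumes m * N_i <= N for every PSU, so no PSU can be hit twice.
    """
    N = sum(sizes)
    k = N / m                          # sampling interval
    start = rng.random() * k           # random start in [0, k)
    bounds = list(accumulate(sizes))   # cumulative upper bound of each PSU
    chosen = []
    for j in range(m):
        x = start + j * k              # j-th selection point on [0, N)
        chosen.append(min(bisect_right(bounds, x), len(sizes) - 1))
    return chosen


def person_probability(Ni, N, m, n):
    """P(person selected) = pi_i * (p / N_i), with pi_i = m N_i / N and p = n / m."""
    pi_i = Fraction(m * Ni, N)
    p = Fraction(n, m)
    return pi_i * p / Ni


sizes = [1200, 1500, 1100, 1800, 1400]   # hypothetical stratum, N = 7000
rng = random.Random(7)
print(systematic_pps(sizes, 2, rng))
print({Ni: person_probability(Ni, 7000, 2, 140) for Ni in sizes})  # every value is 1/50
```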
Stage 2 sampling
• Sampling of persons or addresses within each of the PSUs sampled at stage 1.
• Simple random sampling of $p = n/m$ persons or addresses in every sampled PSU.
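Stage 2 as described is an independent simple random sample of the same size p in every PSU selected at stage 1. A minimal sketch, assuming the frame is available as a mapping from a (hypothetical) PSU id to its list of persons or addresses:

```python
import random


def stage2_srs(psu_members, p, rng):
    """Draw a simple random sample of p units from each sampled PSU.

    psu_members maps a hypothetical PSU id to the persons or addresses in it.
    """
    sample = {}
    for psu_id, members in psu_members.items():
        if p > len(members):
            raise ValueError(f"PSU {psu_id} has fewer than p = {p} units")
        sample[psu_id] = rng.sample(members, p)
    return sample


# Hypothetical frame: two PSUs selected at stage 1, with person IDs.
psus = {
    "A": [f"A{i}" for i in range(1200)],
    "B": [f"B{i}" for i in range(1800)],
}
drawn = stage2_srs(psus, p=70, rng=random.Random(11))
print({k: len(v) for k, v in drawn.items()})  # {'A': 70, 'B': 70}
```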
A cost model
$C_1$ = cost of establishing an extra PSU
$C_2$ = cost of inviting an extra person to the PSU
Total variable cost (budget) model:
$C = C_1 m + C_2 n$
m and $p = n/m$ can be calculated to minimize the variance for a given budget C. EHESsampling can do this.
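The slide does not state the variance model behind this optimisation. A common textbook choice, assumed here, is that clustering inflates the variance of a mean by the design effect 1 + (p - 1)ρ, where ρ is the intraclass correlation within PSUs (ρ is an assumed input, not a quantity from the slides). Minimising (1 + (p - 1)ρ)/n subject to $C = C_1 m + C_2 n$ gives $p = \sqrt{C_1 (1 - \rho) / (C_2 \rho)}$. EHESsampling may well use a different model; this is only a sketch with hypothetical costs.

```python
import math


def optimal_allocation(C, C1, C2, rho):
    """Choose p (persons per PSU) and m (PSUs) to minimise variance for budget C.

    Assumed variance model: Var proportional to (1 + (p - 1) * rho) / n, with
    n = m * p and cost C = C1 * m + C2 * n. The minimiser is
    p_opt = sqrt(C1 * (1 - rho) / (C2 * rho)).
    """
    if not 0 < rho < 1:
        raise ValueError("rho must lie strictly between 0 and 1")
    p = math.sqrt(C1 * (1 - rho) / (C2 * rho))
    m = C / (C1 + C2 * p)   # solve C = C1*m + C2*m*p for m
    return m, p, m * p


# Hypothetical budget and unit costs, with an assumed intraclass correlation.
m, p, n = optimal_allocation(C=100_000, C1=1_000, C2=10, rho=0.05)
print(round(m, 1), round(p, 1), round(n))
```

In practice m and p have to be rounded to integers, keeping m ≥ 2 in each stratum, and the rounded design should be re-checked against the budget.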
Age-gender stratification
• At stage 2: sample separately in each of the eight (M,W) x 4 age-group domains.
• Only an option if the main sampling frame consists of individual persons.
• Gives better control of the sample size within each age-gender domain.
• Not necessary if the sample size is very large.
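With a person register as frame, stage 2 can be split by domain inside each sampled PSU. The sketch assumes register records of the form (person_id, sex, age) and a hypothetical per-domain target `per_domain`; persons outside 25-64 are dropped, and small domains are taken in full.

```python
import random
from collections import defaultdict


def stage2_by_domain(persons, per_domain, rng):
    """Sample separately within each sex x age-group domain of one sampled PSU.

    persons: iterable of (person_id, sex, age) tuples from a person register.
    per_domain: target number per domain (hypothetical parameter).
    Only ages 25-64 are kept; age groups are labelled by their lower bound.
    """
    groups = defaultdict(list)
    for pid, sex, age in persons:
        if 25 <= age <= 64:
            age_group = 25 + 10 * ((age - 25) // 10)   # 25, 35, 45 or 55
            groups[(sex, age_group)].append(pid)
    # Simple random sample within each domain; take everyone if the domain is small.
    return {
        domain: rng.sample(ids, min(per_domain, len(ids)))
        for domain, ids in sorted(groups.items())
    }


rng = random.Random(5)
# Hypothetical register extract for one PSU: 2000 persons aged 18-80.
register = [(i, rng.choice("MW"), rng.randint(18, 80)) for i in range(2000)]
by_domain = stage2_by_domain(register, per_domain=10, rng=rng)
for domain, ids in by_domain.items():
    print(domain, len(ids))
```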
With address frames
• An address is either:
  1. a dwelling, or
  2. a house with many dwellings.
1. Dwelling: invite all eligible persons in the dwelling, if there are not too many.
2. House with many dwellings: sample some dwellings at the address with a Kish grid (stage 3), then do as in 1.
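The extra stage changes the selection probability: a dwelling's overall probability is the address probability times the conditional probability of choosing that dwelling. A Kish grid is a pre-printed selection table used in the field; the sketch below substitutes a seeded random draw of k of the D dwellings, which gives the same conditional probability k/D. All names and numbers are hypothetical.

```python
import random
from fractions import Fraction


def stage3_dwellings(dwellings, k, rng):
    """Select k of the D dwellings at a multi-dwelling address at random.

    Stand-in for a Kish grid: returns the chosen dwellings and their
    conditional selection probability k / D.
    """
    D = len(dwellings)
    k = min(k, D)
    return rng.sample(dwellings, k), Fraction(k, D)


def dwelling_probability(address_prob, conditional_prob):
    """Overall probability = P(address sampled) * P(dwelling chosen | address)."""
    return address_prob * conditional_prob


flats = [f"flat {i}" for i in range(1, 13)]   # hypothetical address with 12 dwellings
chosen, cond = stage3_dwellings(flats, 2, random.Random(3))
print(chosen, cond, dwelling_probability(Fraction(1, 50), cond))  # cond = 1/6, overall 1/300
```

Since all eligible persons in a chosen dwelling are then invited, each of them inherits the dwelling's overall probability.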
Time and place aspects
• A HES takes time (say, a year).
• Avoid confounding between time of year and geography.
• A randomized design for the order of visiting the PSUs is recommended.
• This is simpler to handle if many teams work in parallel.
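A randomized visiting order can be as simple as shuffling the sampled PSUs and dealing them out to the teams in turn, so that no region is systematically visited in one season. The round-robin team assignment and the PSU labels are assumptions for illustration; the slides do not specify how teams are allocated, and a real schedule would also have to respect travel distances.

```python
import random


def randomized_schedule(psu_ids, n_teams, rng):
    """Randomly order the sampled PSUs and deal them out to teams in turn.

    A random order spreads regions over the field period, so time of year is
    not confounded with geography. Round-robin team assignment is assumed.
    """
    order = list(psu_ids)
    rng.shuffle(order)
    schedule = {team: [] for team in range(1, n_teams + 1)}
    for slot, psu in enumerate(order):
        schedule[slot % n_teams + 1].append(psu)
    return schedule


psus = [f"PSU{i:02d}" for i in range(1, 13)]   # hypothetical 12 sampled PSUs
plan = randomized_schedule(psus, n_teams=3, rng=random.Random(42))
for team, visits in plan.items():
    print(team, visits)
```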
Thank you!