Measuring Coverage:
Post Enumeration Surveys
Owen Abbott
Office for National Statistics, UK
Agenda
• Introduction
• Why have a PES?
• Essential features of a PES
– Survey Design
– Fieldwork
• Analysing the data
– Matching
– Estimation
• Results from 2001 UK Census
• Discussion
Why do we need a PES?
• Census won’t count every household or person
• Undercount causes bias in estimates
• In the UK in 2001, we estimated that 3 million
persons (6%) did not fill in the form
• Increasing problem from 1981 to 1991 to 2001
• The undercount is not evenly spread
– Inner Cities
– Deprived areas
– Young persons
Why do we need a PES?
• Census counts alone not good enough
• UK Users demand robust census population
estimates
– Central Government resource allocation
– Yearly demographic population estimates
– Government Policy
• So we need to measure how many households and
persons the census misses, and work out:
– where they are missed from
– their characteristics
Basic Methodology
• PES - called the Census Coverage Survey (CCS) in the UK
– In the UK, a sample of approx 1% of the population
• Match the PES to the Census
• Use the people the PES sees that the census didn’t
to estimate how many were missed
– where they are and their characteristics
• Add to the Census counts (either at aggregate
level or impute (UK))
2001 UK ‘One Number Census’ framework
CENSUS
CCS
MATCHING
CENSUS +
CCS
Dual System and
regression estimation
DESIGN GROUP
ESTIMATE BY
AGE AND SEX
Quality
Assurance
Synthetic estimation
LAD
ESTIMATES
Imputation
controlled to LAD
estimates
ADJUSTED
INDIVIDUAL AND
HOUSEHOLD DATA
AND TABLES
Sum
NATIONAL
POPULATION
ESTIMATE
Post Enumeration Survey
Key features:
A - Design
– Sample survey
– Sample size dependent on accuracy (and geographic
level) requirements
B - Fieldwork
– Conducted after the census has finished
– Independent re-enumeration
– Area based
– Door to door interview
– Focused on measuring coverage
Post Enumeration Survey - Design
• Multi-stage Stratified sample
• Select a sample of (small) geographical areas that
can be re-enumerated
– UK uses Postcodes (about 20 hhs)
– US uses blocks (about 100 hhs?)
• Sample stratified by:
– Geography
– Area type
– Demography
2001 UK PES Design
Geographical Strata:
• Local Authorities (mean pop 120k) grouped into
contiguous groups called Estimation areas (EAs),
each having 500k pop
Area Type and Demographic strata:
• Within every EA a sample of 1991 Enumeration
Districts was selected, stratified using a hard-to-count
index and the 1991 age-sex structure
– (1991 EDs have about 200 households)
2001 UK PES Design
• Hard to count index was a national stratification
using a combination of variables associated with
undercount, e.g.:
– Unemployed
– Multi-occupied
– Private rented
– Language difficulty
• 3 level index, split into 40%, 40%, 20% nationally
• Within each selected ED a sample of 3, 4 or 5
postcodes was selected
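A minimal sketch of how a three-level index with a 40%, 40%, 20% split could be assigned. The score values, the area names and the function names are hypothetical; the slides do not say how the variables were combined into a score, so this only illustrates the ranking and cutting step, plus a simple random draw of postcodes within a selected area.

```python
import random


def assign_htc_levels(scores, shares=(0.4, 0.4, 0.2)):
    """Rank areas by a hard-to-count score and cut them into levels 1..3.

    scores: dict mapping area name -> score (higher = harder to count).
    shares: share of areas in each level, easiest level first.
    """
    order = sorted(scores, key=scores.get)  # easiest (lowest score) first
    n = len(order)
    levels = {}
    cumulative = 0.0
    start = 0
    for level, share in enumerate(shares, start=1):
        cumulative += share
        # The last level takes whatever is left, to avoid rounding gaps.
        end = n if level == len(shares) else round(cumulative * n)
        for area in order[start:end]:
            levels[area] = level
        start = end
    return levels


def sample_postcodes(postcodes, k, seed=0):
    """Simple random sample of k postcodes within one selected area."""
    return random.Random(seed).sample(postcodes, k)


# Hypothetical scores for ten areas.
raw = [0.12, 0.55, 0.31, 0.08, 0.77, 0.42, 0.19, 0.64, 0.25, 0.93]
scores = {f"ED{i:02d}": s for i, s in enumerate(raw)}
levels = assign_htc_levels(scores)

# Hypothetical postcode labels within one selected area.
chosen = sample_postcodes([f"PC{i}" for i in range(12)], k=4)
```

With ten areas the split gives four areas in level 1, four in level 2 and two in level 3.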
Post Enumeration Survey - Field
• Aim: enumerate all the people and households in
the sampled areas
• Carry out the survey after the Census
– Census fieldwork finished
• Independence critical (see later)
– Interview based
– Independent re-enumeration
– Separate fieldforce and management
– No address list (UK have address list for Census)
– Difficult if measuring quality at the same time, as the survey is then not independent
Post Enumeration Survey - Field
• In UK, focused on measuring coverage
– Previously measured quality as well
– Found that separate surveys more effective
– Can focus on getting maximal response in sampled areas
• UK 2001 PES used very short interview
– key household and demographic questions only
• Accommodation type
• Tenure
• Name
• Gender
• Date of Birth (or Age)
• Student
• Ethnicity
• Activity last week
Post Enumeration Survey - Field
• Other initiatives to maximise response:
– Pairwork and teamwork
– Refusal avoidance training
– Calling strategy
– Up to 10 attempts to interview
– Last attempt deliver form to return in post
Post Enumeration Survey
• Interviewer Duties:
– Establish the postcode boundaries
– Conduct independent listing of all residential and non-residential addresses
– Seek out obscure accommodation
– Deliver advance notification cards
– Identify/probe for all households at an address
– Make contact with householders
– Conduct doorstep interviews
– Persuade potential refusals
– Report Progress
Post Enumeration Survey
• Map
Post Enumeration Survey
• Property Listing
[Form: Listing Sheet for Postcode …, Sheet … of …. One row per Address/Household, with columns for Advance Leaflet Delivered, Date and Time of visits 1 to 10, outcome (Interview, Refusal, Non-residential, Vacant, Visitor only, Communal), Interviewer No., Form No. and Notes.]
Analysing the data - Matching
• Match Census returns to CCS returns
• Require very high quality
– Minimise false negative matches (missed matches, see
later)
• In 2001, we used hierarchical nature of data to help
match
– Match within sampled areas (geographical blocking)
– First match household
– Then match persons within households
Analysing the data - Matching
• Used a five stage strategy, designed to minimise
false negative matches:
– Exact matching
– High probability matching
– Clerical assisted probability matching
– Clerical matching
– Final expert review of non-matches
• Developed our own in-house system
• Allowed access to scanned form images (this was
crucial)
[Figure: example pair of records with identical values on both sides: PO155RR, 29, ERIC, SMITH, 13, MALE, SINGLE]
Analysing the data - Matching
• Output:
– Match between Census and CCS
– Census only
– CCS only
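A minimal sketch of the first (exact) matching stage and the three outputs it produces. This is not the in-house system described above: the record layout, field names and example records are assumptions, and the later probabilistic and clerical stages are not shown. Including the postcode in the match key gives a simple form of geographical blocking.

```python
from collections import defaultdict

# Hypothetical field names; the real record layout is not given in the slides.
MATCH_KEYS = ("postcode", "forename", "surname", "dob", "sex")


def match_key(record):
    """Normalise the key fields so trivial differences do not block a match."""
    return tuple(str(record[k]).strip().upper() for k in MATCH_KEYS)


def exact_match(census, ccs):
    """Return (matched pairs, census-only records, CCS-only records)."""
    ccs_index = defaultdict(list)
    for rec in ccs:
        ccs_index[match_key(rec)].append(rec)

    matched, census_only = [], []
    for rec in census:
        candidates = ccs_index.get(match_key(rec))
        if candidates:
            # Each CCS record can be matched at most once.
            matched.append((rec, candidates.pop()))
        else:
            census_only.append(rec)

    ccs_only = [rec for recs in ccs_index.values() for rec in recs]
    return matched, census_only, ccs_only


# Hypothetical records for one sampled postcode.
census = [
    {"postcode": "PO155RR", "forename": "Eric", "surname": "Smith",
     "dob": "1988-01-01", "sex": "M"},
    {"postcode": "PO155RR", "forename": "Ann", "surname": "Jones",
     "dob": "1970-05-20", "sex": "F"},
]
ccs = [
    {"postcode": "PO155RR", "forename": "ERIC", "surname": "SMITH ",
     "dob": "1988-01-01", "sex": "M"},
    {"postcode": "PO155RR", "forename": "Raj", "surname": "Patel",
     "dob": "1995-11-02", "sex": "M"},
]

matched, census_only, ccs_only = exact_match(census, ccs)
```

Records left in `census_only` and `ccs_only` after this stage would go on to the later stages, which is where false negative matches are reduced.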
Analysing the data – Estimation
• Dual System Estimation (DSE)
– Capture-recapture as used for wildlife
• Simple example: How many fish in a lake?
– Catch as many as possible on day 1
• Count them (N1)
• Mark with a red dot
• Return them to the lake
– Catch as many as possible on day 2
• Count them (N2)
• Count how many have red dots (N12)
– Number of fish in lake= (N1 * N2)/N12
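The fish example can be written as a short function. This is a sketch of the formula on the slide (often called the Lincoln-Petersen estimator); the function name and the catch numbers are made up for illustration.

```python
def lincoln_petersen(n1, n2, n12):
    """Estimate population size from two catches.

    n1:  fish caught (and marked) on day 1
    n2:  fish caught on day 2
    n12: fish caught on day 2 that carry a red dot
    """
    if n12 <= 0:
        raise ValueError("need at least one marked fish in the second catch")
    return n1 * n2 / n12


# Hypothetical catches: 200 marked on day 1, 150 caught on day 2,
# of which 30 have red dots.
estimate = lincoln_petersen(200, 150, 30)
print(estimate)  # 1000.0
```

In this example a fifth of the day 2 catch is marked, so the 200 marked fish are taken to be about a fifth of the lake.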
Analysing the data - Estimation
• Use matched Census+CCS data
• DSE estimates adjustment for those missed in both
Census and CCS
                        Counted By CCS
                        Yes     No      Total
Counted      Yes        n11     n10     n1+
By Census    No         n01     n00     n0+
             Total      n+1     n+0     n++

DSE count (for a postcode):
n++ = (n1+ x n+1) / n11
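A small sketch of the DSE count for one postcode, using the cell names from the table. The function name and the counts are hypothetical. The three observed cells are n11 (in both), n10 (Census only) and n01 (CCS only); n00, those missed by both, is what the DSE fills in.

```python
def dual_system_estimate(n11, n10, n01):
    """Return (n++, n00) for one postcode from the three observed cells."""
    if n11 <= 0:
        raise ValueError("DSE needs at least one matched person")
    n1_plus = n11 + n10  # everyone counted by the Census
    n_plus1 = n11 + n01  # everyone counted by the CCS
    n_total = n1_plus * n_plus1 / n11
    n00 = n_total - (n11 + n10 + n01)  # estimated missed by both
    return n_total, n00


# Hypothetical postcode: 40 matched, 5 Census only, 8 CCS only.
n_total, n00 = dual_system_estimate(n11=40, n10=5, n01=8)
print(n_total, n00)  # 54.0 1.0
```

Here 53 people were seen by at least one source, and the DSE adds one more for those missed by both.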
Analysing the data - Estimation
• DSE assumptions
– Independence
– Homogeneity of capture probabilities
– Perfect matching
– Closure
– No list inflation
• Violation of these assumptions leads to bias (in
both directions)
• Lots of literature on DSE
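A worked illustration of one violation, heterogeneity of capture probabilities. The two groups, their sizes and their capture probabilities are invented, and expected counts are used instead of random draws. Within each group the Census and CCS are assumed independent.

```python
def dse(n1_plus, n_plus1, n11):
    """Dual system estimate from the two margins and the matched count."""
    return n1_plus * n_plus1 / n11


def expected_counts(n, p_census, p_ccs):
    """Expected (n1+, n+1, n11) when the two counts are independent."""
    return n * p_census, n * p_ccs, n * p_census * p_ccs


# Hypothetical groups: (true size, P(in Census), P(in CCS)).
groups = {
    "easy to count": (1000, 0.9, 0.9),
    "hard to count": (1000, 0.5, 0.5),
}
true_total = sum(size for size, _, _ in groups.values())

per_group = {name: expected_counts(*g) for name, g in groups.items()}

# DSE applied separately within each homogeneous group, then summed.
stratified = sum(dse(*counts) for counts in per_group.values())

# DSE applied once to the pooled counts, ignoring the group difference.
pooled_counts = [sum(c[i] for c in per_group.values()) for i in range(3)]
pooled = dse(*pooled_counts)

print(true_total, round(stratified), round(pooled))
```

In this example the stratified estimate recovers the true total of 2000, while the pooled estimate is about 1849, so ignoring the difference between groups biases the estimate downwards. Other violations can push the estimate the other way.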
Analysing the data – Estimation
• DSE can only be used within the sample
• Need additional step to get to population totals
• In 2001, we used DSE at postcode level
• Then used a ratio estimator to predict for non-sampled
postcodes (again lots of literature)
[Figure: axes labelled DSE and Census]
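A minimal sketch of a plain ratio estimator, as one way to go from sampled postcodes to a population total. It is not the modified estimator used in 2001; the function name, the postcode counts and the census total are assumptions for illustration.

```python
def ratio_estimate(sampled, census_total):
    """Scale a census total by the DSE-to-census ratio seen in the sample.

    sampled:      list of (census_count, dse_count) per sampled postcode
    census_total: census count over all postcodes (sampled and non-sampled)
                  in the same stratum
    """
    sum_census = sum(census for census, _ in sampled)
    sum_dse = sum(dse for _, dse in sampled)
    ratio = sum_dse / sum_census
    return ratio, ratio * census_total


# Hypothetical sampled postcodes: (census count, DSE count).
sampled = [(40, 42.0), (55, 60.5), (30, 30.0), (75, 82.5)]
ratio, population_estimate = ratio_estimate(sampled, census_total=10_000)
```

Here the sampled postcodes have 200 census persons against a DSE of 215, so the census total is scaled up by about 7.5%.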
Analysis – Getting to small areas
• Ratio estimator produced estimates for 500k
population blocks
• Needed estimates for Local Authorities (about 120k
population)
• Sample size not sufficient to do directly
• So used small area estimation techniques
– these borrow strength across areas
– We used a fixed effect to model LA differences
• LA population estimates from the model then
constrained to EA totals
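A sketch of the final constraining step only. Proportional scaling is an assumption here, since the slides say only that the LA estimates were constrained to EA totals; the LA names and figures are invented and the small area model itself is not shown.

```python
def constrain_to_total(la_estimates, ea_total):
    """Scale model-based LA estimates so that they sum to the EA total."""
    model_sum = sum(la_estimates.values())
    factor = ea_total / model_sum
    return {la: estimate * factor for la, estimate in la_estimates.items()}


# Hypothetical model estimates for three LAs in one Estimation Area.
la_estimates = {"LA_A": 118_000, "LA_B": 131_000, "LA_C": 251_000}

# Hypothetical EA total from the ratio estimator.
constrained = constrain_to_total(la_estimates, ea_total=510_000)
```

Each LA keeps its share of the EA under the model; only the overall level is moved to agree with the EA estimate.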
Quick summary of 2001 UK method
• In 2001, One Number Census methodology was
developed
– Large CCS (320,000 households)
– Matching
– Capture Recapture
– Modified ratio estimator
– Small area estimation to get LA totals
– Imputation
• Estimated 1.5 million households missed
• 3 million persons missed (most from the missing
households but some from counted households)
Results
• England and Wales population about 50m
individuals in 20m households
• Estimated 1.5 million households missed
• 3 million persons missed (most from the missing
households but some from counted households)
Underenumeration in 2001
[Figure: Underenumeration of Census by agegroup. Vertical axis: ONC/Census, 0.0% to 16.0%. Horizontal axis: Agegroup, from 0 and 1-4 up to 85+. Series: Males, Females.]
Response Rates in 2001
Summary
• Fundamental that the census is good
– This does not make a bad census good, it makes a
good census better!
• US, Australia, NZ, Canada, UK all measure
coverage (and most use a PES)
– All aim at measuring coverage for assessing census
quality, most do not fully adjust the outputs
– Coverage for most is around 96-98%
– Increasing problems of overcoverage
• The design and fieldwork of the PES are important
to get right
More info
• Brown, J.J., Diamond, I.D., Chambers, R.L., Buckner, L.J.,
and Teague, A.D. (1999), “A methodological strategy for a
one-number census in the UK,” Journal of the Royal
Statistical Society A, 162, 247-267.
• www.statistics.gov.uk/census2001/onc.asp