Transcript PPT
Could that be true?
Methodological issues when deriving educational
attainment from different administrative
datasources and surveys
Presentation for the
IAOS Conference on Reshaping Official Statistics
Shanghai, October 14-16, 2008
Bart F.M. Bakker
Manager Section Socio-Economic State
Statistics Netherlands
The problem
• Increasing use of administrative data for official statistics,
because:
• lower costs
• smaller response burden
• covering all elements of the population for small-domain
statistics
• Surveys only as a supplement
• The problem:
• unknown or poor quality of part of the administrative data
• unknown or poor quality of statistical outcomes if
administrative sources are combined
General idea
• Administrative data are collected with one or
more traditional survey techniques, so:
they have the same errors
as traditional surveys
• The size of the errors depends on the audits
the register keeper carries out
• Variables that are important to the register
keeper are assumed to be of better quality
Survey Error (Groves et al., 2004)
Measurement: Concept → [validity] → Operationalisation → [measurement error] → Response → [processing error] → Corrected response
Representation: Target population → [frame error] → Sample frame → [sampling error] → Sample → [non-response error] → Respondents → [correction error] → Postsurvey corrections
Both paths lead to the survey outcome.
Measurement: Administrative concept → [validity of administrative concept] → Operationalisation of administrative concept → [measurement error] → Response (administrative concept) → [processing error] → Corrected response (statistical concept)
Representation: Target population → [coverage errors] → Set of registered population elements → [linking error] → Set of linked population elements → [correction error] → Postlinking corrections
Both paths lead to the register outcome.
An example: educational attainment
The goal of the project
• Determining the educational attainment of as
many persons as possible
• that can be used to derive a background
variable for all kinds of research
• and, if the validity is reasonable, can be used
for the estimation of the educational attainment
in small areas and small subgroups
• no single register is available
Sources
• CRIHO: students in higher education from 1986
• ERR: students who sat an exam in general
secondary education from 1999
• Education Number Registers: students in
secondary general education from 2004
• CWI: job-seekers who are registered as such in
the employment exchange from 1990
• WSF: students with student grants from 1999
• LFS: 1% samples from the population aged >15
from 1996
Table 1. The registers and their quality
Error type (object) | CRIHO | Education Registers | ERR | WSF | CWI
Measurement
Validity (register variable) | good | good | good | good | reasonable
Measurement error (register variable) | nil | nil | nil | nil | many
Processing error (register variable) | nil | few | nil | few | few
Processing error (statistical variable) | nil | nil | nil | many | many
Representation
Coverage error (register target population) | nil | a few schools are missing | from second year alright, improvements still possible | nil | nil
Coverage error (statistical target population) | only public higher education in the Netherlands from 1986 | only (large part of) public secondary general education from 1999 | only (large part of) secondary education from 2003 | only higher education in the Netherlands from 1995 | only a large part of jobseekers from 1990
Linking error (statistical target population) | nil | nil | nil | few | few
Correction error (statistical target population) | nil | nil | nil | nil | nil
Micro-integration: harmonisation
• Determine the classification of educational
attainment
• Harmonise the copied information on the
training programmes
• Derive the classification
• Derive whether certificates were
attained
• Derive the date on which the certificates were attained
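The harmonisation steps above could be sketched as follows. This is only an illustration: the programme codes, the three-level classification and the record layout are hypothetical and do not come from the presentation.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical mapping from (source, source-specific programme code) to a
# level in a common classification; the real code lists are not in the slides.
PROGRAMME_TO_LEVEL = {
    ("CRIHO", "HBO-BA"): 3,
    ("ERR", "HAVO"): 2,
    ("CWI", "havo diploma"): 2,
}


@dataclass
class HarmonisedRecord:
    person_id: str
    source: str
    level: int                        # level in the common classification
    certificate: bool                 # was a certificate attained?
    certificate_date: Optional[date]  # date of attainment, if known


def harmonise(person_id, source, programme_code, certificate_date=None):
    """Translate one source record into the common classification."""
    level = PROGRAMME_TO_LEVEL[(source, programme_code)]
    return HarmonisedRecord(
        person_id, source, level, certificate_date is not None, certificate_date
    )


rec = harmonise("p1", "ERR", "HAVO", date(2005, 6, 30))
```

Keeping the source name on every harmonised record makes it possible to rank sources by quality at a later stage.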
Micro-integration: correction for
measurement errors
Is the educational attainment valid at the
reference date?
1. Threshold beyond which the probability that
someone will still attain a higher level is <5%
2. Probability <5% that someone has attained
a higher level since the latest certificate was
attained
Both empirically determined with the use of life
tables
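As an illustration of how such a 5% border might be read off a life table, here is a minimal sketch. It assumes the life table is available as age-specific conditional probabilities of attaining a higher level; the numbers and the function names are made up for the example and are not from the presentation.

```python
def prob_higher_level_later(hazards, start):
    """Probability of still attaining a higher level at or after index `start`.

    hazards[i] is the (hypothetical) conditional probability of attaining a
    higher level in period i, given that it has not happened before.
    """
    survive = 1.0
    for q in hazards[start:]:
        survive *= 1.0 - q
    return 1.0 - survive


def validity_border(hazards, first_age, cutoff=0.05):
    """First age from which the remaining probability is below `cutoff`."""
    for i in range(len(hazards)):
        if prob_higher_level_later(hazards, i) < cutoff:
            return first_age + i
    return None  # the cutoff is never reached within the table


# Illustrative, made-up hazards for ages 25 to 30.
hazards = [0.20, 0.10, 0.04, 0.02, 0.01, 0.005]
border = validity_border(hazards, first_age=25)
```

With these made-up hazards the remaining probability first drops below 5% at age 28, so a score observed from that age onward would be treated as still valid.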
Micro-integration: correction for
measurement errors
• For one person on one reference date, more than one valid score on
educational attainment is available
• Choose the source with the best quality:
1. CRIHO, Education Number Register, ERR
2. LFS
3. WSF
CWI only for weighting
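The source ranking above lends itself to a simple priority lookup. The sketch below assumes scores arrive as (source, level) pairs that are already valid on the reference date; the tie-break within rank 1 (take the highest level) is an assumption of this example, not something the slide states.

```python
# Priority taken from the slide: a lower rank means better quality.
SOURCE_RANK = {
    "CRIHO": 1,
    "Education Number Register": 1,
    "ERR": 1,
    "LFS": 2,
    "WSF": 3,
}


def choose_score(scores):
    """Pick one (source, level) pair from scores valid on the reference date.

    CWI is not ranked because the slide reserves it for weighting. Ties
    within the same rank are broken by taking the highest level, which is
    an assumption of this sketch.
    """
    ranked = [
        (SOURCE_RANK[source], -level, source, level)
        for source, level in scores
        if source in SOURCE_RANK
    ]
    if not ranked:
        return None
    _, _, source, level = min(ranked)
    return source, level


best = choose_score([("WSF", 3), ("LFS", 2), ("CWI", 4)])
```

Here `best` is the LFS score, because LFS outranks WSF and the CWI score is ignored.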
Derive educational attainment
Derive the highest educational level attained from:
• all training programmes followed before the reference date
• the certificates attained before the reference date
• validity on the reference date
• choose the source with the best quality
• downgrade training programmes that did not
end with a certificate
• impute on the basis of age for persons younger than 15 years
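A rough sketch of the derivation rule, leaving out the validity check and the choice between sources. The record layout, the one-level downgrade and the level imputed for children are all assumptions made for illustration; the slides do not specify them.

```python
from datetime import date


def derive_attainment(records, reference_date, age, downgrade=1, child_level=0):
    """Highest level attained on the reference date.

    records: iterable of dicts with keys 'level' (int), 'date' (date the
    programme ended) and 'certificate' (bool). `downgrade` and `child_level`
    are hypothetical parameters of this sketch.
    """
    if age < 15:
        return child_level  # imputed from age alone (assumed value)
    levels = []
    for r in records:
        if r["date"] > reference_date:
            continue  # only programmes ended on or before the reference date
        if r["certificate"]:
            levels.append(r["level"])
        else:
            # programme not ended with a certificate: count a lower level
            levels.append(max(r["level"] - downgrade, 0))
    return max(levels) if levels else None


records = [
    {"level": 3, "date": date(2005, 7, 1), "certificate": False},
    {"level": 2, "date": date(2000, 7, 1), "certificate": True},
    {"level": 4, "date": date(2007, 7, 1), "certificate": True},
]
attained = derive_attainment(records, date(2006, 1, 1), age=30)
```

In this example the level-4 programme falls after the reference date and the uncertified level-3 programme is downgraded, so the derived level is 2.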
Results: coverage
[Figure: coverage (0% to 100%) by age (0 to 105) for three series: register 15+; LFS 15+; PR 0-14 + register 15+ + LFS 15+]
Weighting the data
Coverage shows selectivity
• underrepresentation of vocational education at
secondary level
• overrepresentation of youngsters
Weighting to the population results in two vectors:
• the valid scores on educational attainment on
reference date and
• a weight
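The presentation does not say which weighting method is used. As one possible illustration, the sketch below uses simple post-stratification: each covered person receives the population count of their stratum divided by the number of covered persons in it. The strata and counts are made up.

```python
from collections import Counter


def poststratification_weights(covered, population_counts):
    """Return the two vectors: attainment scores and weights.

    covered: list of (person_id, stratum, level) for persons with a valid
    score. population_counts: dict mapping stratum to its population size.
    Post-stratification is an assumption of this sketch.
    """
    n_covered = Counter(stratum for _, stratum, _ in covered)
    levels = [level for _, _, level in covered]
    weights = [
        population_counts[stratum] / n_covered[stratum]
        for _, stratum, _ in covered
    ]
    return levels, weights


covered = [("p1", "15-24", 2), ("p2", "15-24", 3), ("p3", "65+", 1)]
population_counts = {"15-24": 4, "65+": 5}
levels, weights = poststratification_weights(covered, population_counts)
```

The weights sum to the total population, and strata with low coverage (here the oldest group) get larger weights, which offsets the overrepresentation of youngsters.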
Conclusions
• Administrative data have the same errors as
traditional surveys
• And some more…
• Combining data from registers and surveys is
promising
• But complicated
• Always do research on the quality of the
administrative data