VIP project-Statistical Matching in the framework of - CROS
Statistical Matching in the framework of
the modernization of social statistics
Aura Leulescu & Emilio Di Meglio
EUROSTAT
Unit F3 - Living conditions and social protection statistics
Key priorities in the EU context
to respond to cross-cutting and complex user needs by
providing broad indicators on economic well-being and Quality
of Life (Stiglitz Report, Europe 2020, GDP and beyond
communication, OECD initiative on measuring well-being, etc.);
– Demand for a comprehensive and coherent system of
socio-economic statistics
to go beyond aggregates and capture heterogeneity in the
population: multivariate distributions, sub-national statistics,
vulnerable sub-groups;
– Demand for micro-level statistical information that
encompasses both social and economic aspects
Premises
No single survey can provide all the necessary
information
No common identifiers allow record linkage at EU level
Need for micro (meso)-level integrated statistical
information from a coordinated network of surveys
and data collection processes at EU level
Statistical matching?
High potential benefits:
– Increased and better use of existing data at minimum costs,
– Enhanced conceptual and statistical consistency across
surveys,
– Development of in-house expertise in data matching
that is transferable to other projects.
But also high risks:
– Inherent limitations of statistical matching techniques and
model-based imputation;
– Need to consider both micro level data matching and
meso-level data matching (small sub-populations could
also be matched).
Matching project: 1) Scope
This project should:
carry out methodological work, identify and test statistical
matching algorithms based on the “fitness for purpose”
principle;
identify suitable criteria for assessing validity of findings based
on both input quality and the robustness of the matching
methods proposed;
produce methodological guidelines and recommendations for
further implementation in Eurostat and/or MSs.
Matching project: 2) Investigation streams
The project should assess the quality of the results and the
relevance of the approach to cover specific needs:
Material well-being estimates based on wealth, consumption
and income (matching of HFCS, HBS and SILC);
Quality of Life indicators that go beyond monetary
resources (matching of SILC with LFS and EHIS and outside
sources, such as ESS and EQLS);
Poverty estimates at regional level, linked to the monitoring
of Europe 2020 (matching of data from SILC, EHIS and LFS).
Matching project: 3) Timeline
Phase I: preliminary analysis, focused especially on
setting the boundaries for the project
– Dec 2010 - July 2011 External contract for matching EU-SILC, ESS and EQLS
– Dec 2010 - April 2011 In-house matching exercise (review of the
state of the art & preliminary analysis focused on the
reconciliation of datasets)
Phase II
– May 2011 - Dec 2012 Follow-up of the in-house exercise
– May 2011 Launch of call for tenders (according to preliminary
results of the three investigation streams)
– November 2011 Signature of contract(s)
– December 2012 Recommendations for implementation
Matching project: 4) Organizational aspects
The project is expected:
– to draw on both external contracts and the development of
in-house expertise on matching techniques;
– to involve various stakeholders: concerned units in
Eurostat, ECB, Eurofound, Commission users (DG EMPL,
DG SANCO, DG REGIO) and academic experts;
– to develop synergies with ESS initiatives:
• Core social variables
• ESSnet on Data Integration
• ESSnet on Small Area Estimation
Matching exercise: ex-ante reconciliation 1
Main purpose: identify specific realistic objectives
Identify target variables
a) Income, consumption and wealth
– HFCS: value of assets and liabilities;
– EU-SILC: material deprivation, detailed income;
– HBS: food expenditure, leisure goods and services, transport expenditure;
b) Quality of life indicators
– EQLS/ESS: social capital, quality of society, satisfaction variables
– LFS: job quality, training...
– SILC: standards of living
c) Regional estimates
– Impute household disposable equivalized income in LFS
Matching exercise: ex-ante reconciliation 2
Select matching/ stratification variables
– Predictive power (econometric models, correlations, multivariate analysis)
– Data quality
– Consistency of concepts and statistical content
Deal with different weights from the various surveys
Define the observation level
– Individual
– Household
– Sub-population
What type of auxiliary information can we use to validate results?
– overlap samples (NL);
– (partial) overlap variables (income classes in EQLS; some material
deprivation; food consumption in HFCS)
Matching exercise: methods and quality
assessment - Preliminary ideas
Matching algorithms
– Hot deck techniques, regression based, multiple imputation?
– Deal with complex survey designs (constraints)
– Create synthetic datasets versus estimate parameters (e.g. estimate
frequencies by class of income & wealth);
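To make the hot deck option concrete, here is a minimal sketch of a nearest-neighbour distance hot deck within strata. It is only an illustration under stated assumptions: the function name `nn_hot_deck`, the variable names and the toy records are hypothetical, the distance is an unscaled Manhattan distance, and survey weights and design constraints are ignored.

```python
from collections import defaultdict

def nn_hot_deck(recipients, donors, stratum_key, match_vars, target):
    """Copy `target` from the closest donor in the same stratum (hypothetical helper)."""
    donors_by_stratum = defaultdict(list)
    for d in donors:
        donors_by_stratum[d[stratum_key]].append(d)

    fused = []
    for r in recipients:
        pool = donors_by_stratum.get(r[stratum_key])
        if not pool:
            # No donor in this stratum: leave the value missing rather than guess.
            fused.append({**r, target: None})
            continue
        # Unscaled Manhattan distance on the common matching variables
        # (an assumption; in practice the variables would be standardised).
        best = min(pool, key=lambda d: sum(abs(d[v] - r[v]) for v in match_vars))
        fused.append({**r, target: best[target]})
    return fused

# Toy records (invented): the recipient file observes income,
# the donor file observes food expenditure; both share region, age, hh_size.
recipients = [
    {"id": 1, "region": "A", "age": 34, "hh_size": 2, "income": 21000},
    {"id": 2, "region": "B", "age": 61, "hh_size": 1, "income": 15000},
    {"id": 3, "region": "C", "age": 45, "hh_size": 4, "income": 30000},
]
donors = [
    {"region": "A", "age": 30, "hh_size": 2, "food_exp": 4200},
    {"region": "A", "age": 58, "hh_size": 3, "food_exp": 6100},
    {"region": "B", "age": 65, "hh_size": 1, "food_exp": 2900},
]

fused = nn_hot_deck(recipients, donors, "region", ["age", "hh_size"], "food_exp")
for row in fused:
    print(row["id"], row["income"], row["food_exp"])
```

The result is a synthetic (fused) dataset; the alternative mentioned above, estimating parameters such as cell frequencies directly, would not need record-level donors at all.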
How to assess quality/validity?
– Checking the marginal and joint distributions of the donor/fused dataset;
– Assess the probability of a good match (e.g. distribution distances between donor and recipient)
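One possible way to compute such distribution distances is sketched below for a common categorical variable, using weighted shares and two standard measures (total variation and Hellinger distance). The choice of these two measures, the helper names and the toy data are assumptions made for illustration, not a method prescribed by the project.

```python
import math
from collections import Counter

def weighted_shares(values, weights):
    """Weighted relative frequencies of a categorical variable."""
    totals = Counter()
    for v, w in zip(values, weights):
        totals[v] += w
    grand = sum(totals.values())
    return {k: t / grand for k, t in totals.items()}

def total_variation(p, q):
    """Half the sum of absolute differences between shares (0 = identical)."""
    cats = set(p) | set(q)
    return 0.5 * sum(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in cats)

def hellinger(p, q):
    """Hellinger distance, bounded between 0 and 1."""
    cats = set(p) | set(q)
    s = sum((math.sqrt(p.get(c, 0.0)) - math.sqrt(q.get(c, 0.0))) ** 2 for c in cats)
    return math.sqrt(s / 2)

# Toy data (invented): age class observed in both files, with survey weights.
donor_shares = weighted_shares(["young", "young", "old"], [1.0, 1.0, 2.0])
recipient_shares = weighted_shares(["young", "old", "old"], [1.0, 1.0, 2.0])

print("total variation:", total_variation(donor_shares, recipient_shares))
print("Hellinger:", hellinger(donor_shares, recipient_shares))
```

Small distances on the common variables suggest that the two samples describe the same population; they say nothing by themselves about the quality of the imputed joint distribution.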
Need to assess the sensitivity of the results to changes in assumptions:
– Simulation exercises; auxiliary information; theoretical validation;
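One standard way to look at sensitivity to the matching assumptions is to compare the estimate obtained under conditional independence with the Fréchet bounds, which give the range of joint frequencies compatible with the observed margins. The sketch below assumes two categorical variables Y and Z that are never observed together and a common variable X; the function name and all probabilities are invented for illustration.

```python
def frechet_bounds(p_x, p_y_given_x, p_z_given_x, y, z):
    """Return (lower, cia, upper) for P(Y=y, Z=z) given margins conditional on X."""
    lower = cia = upper = 0.0
    for x, px in p_x.items():
        py = p_y_given_x[x].get(y, 0.0)
        pz = p_z_given_x[x].get(z, 0.0)
        lower += px * max(0.0, py + pz - 1.0)  # Frechet lower bound within class x
        cia += px * py * pz                    # conditional independence estimate
        upper += px * min(py, pz)              # Frechet upper bound within class x
    return lower, cia, upper

# Invented shares: X = area type, Y = income class (one file), Z = wealth class (other file).
p_x = {"urban": 0.6, "rural": 0.4}
p_income = {"urban": {"low": 0.3, "high": 0.7}, "rural": {"low": 0.5, "high": 0.5}}
p_wealth = {"urban": {"low": 0.4, "high": 0.6}, "rural": {"low": 0.7, "high": 0.3}}

lower, cia, upper = frechet_bounds(p_x, p_income, p_wealth, "low", "low")
print(f"P(low income, low wealth) lies in [{lower:.3f}, {upper:.3f}]; CIA estimate {cia:.3f}")
```

A wide interval signals that the matched joint distribution depends heavily on the independence assumption, which is where auxiliary information such as overlap samples becomes valuable.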
Some applications: SPSD Canada (Liu & Kovacevic, 1997), ISTAT (Coli
et al., 2006)