Transcript Estimation

The Challenge of Integrating New
Surveys into an Existing Business Survey
Infrastructure
Éric Pelletier
Statistics Canada
ICES-III Montréal, Québec, Canada
June 18-21, 2007
Outline
Introduction to the Unified Enterprise Survey
(UES)
Culture surveys environment
Integration steps to UES
From Culture to UES: Frame, sampling, etc.
Special case: Film Production survey
UES Estimation process
Back-casting for the previous two years
Conclusion and future work
2
Unified Enterprise Survey (UES)
UES comprises many business surveys which
use unified concepts and processes
1997: 7 surveys
…
2005: 45 surveys
2006: 54 surveys
2007: 62 surveys
The goal of UES: produce reliable estimates at
the provincial and industrial levels
3
Objectives of the UES
Promote an increasing use of tax data
Reduce the cost of the surveys
Reduce the response burden
Produce estimates for the financial
variables (revenue, expenses, salaries and
wages, etc.) and non-financial variables for
all UES industrial sectors
4
UES Sampling Process
Sampling frame: Business Register of
Statistics Canada (list of establishments)
Sampling unit: Within a given enterprise, a
cluster of establishments within the same
province and industrial group
For example: establishments A and B in the
same province and industry → sampling unit
Simple units (activity in one province and one
industry) and complex units
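As a rough illustration of how establishments roll up into sampling units, the sketch below groups establishment records by enterprise, province and industry. The record layout, the identifiers and the `is_simple` helper are assumptions made for this example; they do not reflect the actual Business Register structure.

```python
from collections import defaultdict

# Hypothetical establishment records:
# (establishment, enterprise, province, industry code).
establishments = [
    ("A", "E1", "QC", "5121"),
    ("B", "E1", "QC", "5121"),
    ("C", "E1", "ON", "5121"),
    ("D", "E2", "QC", "5111"),
]


def build_sampling_units(records):
    """Cluster establishments that share enterprise, province and industry."""
    units = defaultdict(list)
    for est, enterprise, province, industry in records:
        units[(enterprise, province, industry)].append(est)
    return dict(units)


def is_simple(enterprise, units):
    """Treat an enterprise as 'simple' when it yields exactly one sampling unit,
    i.e. it is active in one province and one industry (our interpretation)."""
    return sum(1 for key in units if key[0] == enterprise) == 1


units = build_sampling_units(establishments)
```

Here establishments A and B form one sampling unit, while enterprise E1 as a whole is complex because it is also active in a second province.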
5
UES Sampling Process
Stratification:
Province, Industry, Revenue
Strata
1 take-all stratum
2 take-some strata
1 take-none stratum → below thresholds, tax data
Exclusion thresholds
Delimit the take-none units from the take-some units
(no questionnaire is sent to the take-none)
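The size stratification above can be sketched as a simple rule applied within each province-by-industry cell. The function name, the three threshold parameters and the labels for the two take-some strata are hypothetical; the slides do not give the actual boundaries or how they are derived.

```python
def assign_stratum(revenue, exclusion, boundary, take_all):
    """Return the size stratum of one unit within a province-by-industry cell.

    exclusion: exclusion threshold (below it, the unit is take-none)
    boundary:  hypothetical boundary between the two take-some strata
    take_all:  revenue at or above which the unit is sampled with certainty
    """
    if not exclusion <= boundary <= take_all:
        raise ValueError("thresholds must satisfy exclusion <= boundary <= take_all")
    if revenue < exclusion:
        return "take-none"  # no questionnaire is sent; tax data are used
    if revenue >= take_all:
        return "take-all"
    if revenue >= boundary:
        return "take-some (large)"
    return "take-some (small)"
```

For instance, with thresholds of 50, 500 and 5000, a unit with revenue 30 would fall in the take-none stratum and a unit with revenue 700 in the larger take-some stratum.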
6
UES Sample Design
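[Figure: sample design for T1 (unincorporated) and T2 (corporation) businesses. The survey portion covers the take-all and take-some units (strata labelled Stratum=2 and Stratum=1); the tax portion covers the take-none units.]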
7
UES schedule
For example, for reference year 2006 (RY2006):
Sampling: October 2006
Collection: February to October 2007
Edit & Imputation: July 2007 to December 2007
Estimation: November 2007 to March 2008
The estimates are produced within 15 months
after the reference year (January 2007 to March 2008)
Estimation starts about one year after the
selection of the sample
8
Culture surveys environment
‘Activity’ based frames (e.g. list of books)
Census surveys
Occasional surveys (annual surveys that are not
necessarily conducted every year)
Maintained by Culture Division
→ The Culture Streamlining Initiative was put in
place to reduce the duplication in annual survey
processes while promoting the use of the
business survey infrastructure
9
Culture environment
versus UES environment
In the UES, the frame is based on industrial
structure (economic survey) rather than
activity (e.g. list of books, list of films, etc.)
For the analysts, this changes the way
they analyse the data
More flexibility in the UES environment
All the steps of a survey were compared to
facilitate the integration
10
Advantages of the
integration to UES
Common methodologies for all annual
enterprise surveys
Possible to adapt some of the parameters
for the needs of the surveys (at the
sampling, imputation or estimation process)
Infrastructure was established in 1997 with
the Enterprise Statistics Division
Relatively easy to integrate new surveys
11
Integration of surveys into UES
Two sets of surveys:
“Wave 1” surveys in RY2006 (Book Publishers, Heritage
Institutions and Performing Arts)
“Wave 2” surveys in RY2007 (Film Distribution, Film
Production, Film Post-Production, Movie Theatres and Sound
Recording)
Integration in two steps:
Step 1: Move from the Culture environment to an industry-based
survey in the years before UES (called “UES_lite”)
Step 2: Integration to UES
12
Integration schedule
                   RY2004    RY2005    RY2006    RY2007
“Wave 1” surveys   UES_lite  UES_lite  UES       UES
“Wave 2” surveys   Culture   UES_lite  UES_lite  UES
13
“UES_lite” environment
Concepts are similar to the UES surveys
The processing is done outside the UES
infrastructure
The surveys are processed by the subject
matter division and the methodology division
As opposed to UES processing, which is
primarily handled by another Statistics Canada
division called the Enterprise Statistics Division
14
From Culture to UES
Sampling, Frame:
1. Culture: Census - ‘Activity’ based
2. “UES_lite”: Sample - Establishments
3. UES: Sample - Establishments within the
same enterprise, same province, same
industry code
The analysts were able to create
reconciliation files between the frames
Some other minor differences
15
Special case:
Film Production survey
Collection:
Special case with the Film Production survey
for RY2005
The Business Register (BR) is not up-to-date
enough for this survey
Links were discovered between the sampled
establishments and establishments outside the
sampling frame
16
Special case:
Film Production survey
Pre-contact was done for all the units
Approximately 400 units were added to the
sample (these units were not on the Business
Register)
Indirect sampling was used to address this
problem
A different estimation program was created for
this survey
17
UES and “UES_lite”
Estimation Process
Total estimate = Survey portion + Tax portion
Survey portion:
Horvitz-Thompson estimator
Outlier detection and treatment
Final weight calculation
Tax portion (take-none portion):
Below the exclusion thresholds: Tax data
Domain estimations: Industry, Province, etc.
Variance and coefficient of variation (CV)
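The decomposition "total estimate = survey portion + tax portion" can be sketched as follows. This is a minimal illustration, assuming simple random sampling without replacement within each stratum and treating the tax total as a known constant with no sampling variance; outlier treatment and domain estimation are left out, and all function names are hypothetical.

```python
import math
from statistics import variance


def survey_portion(strata):
    """Horvitz-Thompson total and its estimated variance.

    strata: list of (N_h, ys), where N_h is the stratum population size and
    ys holds the reported values of the n_h sampled units.
    Assumes simple random sampling without replacement within each stratum.
    """
    total, var = 0.0, 0.0
    for N_h, ys in strata:
        n_h = len(ys)
        weight = N_h / n_h  # design weight = inverse selection probability
        total += weight * sum(ys)
        # Take-all strata (n_h == N_h) contribute no sampling variance;
        # strata with a single sampled unit are skipped (variance not estimable).
        if 1 < n_h < N_h:
            var += N_h ** 2 * (1 - n_h / N_h) * variance(ys) / n_h
    return total, var


def total_estimate(strata, tax_total):
    """Add the take-none (tax) portion and return (total, CV)."""
    survey, var = survey_portion(strata)
    total = survey + tax_total
    return total, math.sqrt(var) / total
```

For example, `total_estimate([(2, [100.0, 200.0]), (10, [10.0, 20.0, 30.0])], 50.0)` combines one take-all stratum, one take-some stratum and a tax portion of 50.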
18
Special case:
Film Production survey
Estimation:
The Film Production survey RY2005 was a special
case
Due to the application of indirect sampling, the inverse
probability method was implemented (see Choudhry
(2006))
Without going into all the details:
The inverse probability method determines the probability that
at least one of the sampling units on the frame that are linked
to the reporting unit is sampled
The base weight is computed as the inverse of this selection
probability
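To make the idea concrete, the sketch below computes the probability that a reporting unit is reached through at least one linked frame unit, and the resulting base weight. It assumes the linked frame units are selected independently with known probabilities, which is a simplification: under stratified sampling without replacement the selections are not independent, and the formulation in Choudhry (2006) may differ. The function names are hypothetical.

```python
import math


def reach_probability(link_probs):
    """P(at least one linked frame unit is sampled).

    link_probs: selection probabilities of the frame units linked to one
    reporting unit. Assumes independent selections (a simplification).
    """
    return 1.0 - math.prod(1.0 - p for p in link_probs)


def base_weight(link_probs):
    """Base weight = inverse of the probability of reaching the unit."""
    p = reach_probability(link_probs)
    if p <= 0.0:
        raise ValueError("reporting unit cannot be reached from the frame")
    return 1.0 / p
```

With two linked frame units each selected with probability 0.5, the reporting unit is reached with probability 0.75, so its base weight is 1/0.75.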
19
Special case:
Film Production survey
The complex weighting procedure led to the
use of replicates in estimating the variance of
the estimates
More precisely, the jackknife replication
method is used to calculate the variance
The estimates will be produced within the next
few weeks: the release date for RY2005 is July
2007 (same release date as the other Wave 2
surveys), a little bit behind schedule…
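The replication idea behind the jackknife can be illustrated with a delete-one version for a weighted total. This is only a sketch under simplifying assumptions: it is unstratified, it rescales the remaining weights by n / (n - 1), and it does not recompute the indirect-sampling weights within each replicate as a production implementation presumably would. The function name is hypothetical.

```python
def jackknife_variance(weights, values):
    """Delete-one jackknife variance of the weighted total sum(w * y)."""
    n = len(values)
    if n < 2 or len(weights) != n:
        raise ValueError("need at least two units and matching weights")
    replicates = []
    for drop in range(n):
        # Drop one unit and rescale the remaining weights by n / (n - 1).
        rep = sum(
            w * y * n / (n - 1)
            for i, (w, y) in enumerate(zip(weights, values))
            if i != drop
        )
        replicates.append(rep)
    mean_rep = sum(replicates) / n
    return (n - 1) / n * sum((r - mean_rep) ** 2 for r in replicates)
```

With equal weights this reduces to the usual with-replacement variance of an expansion total, which gives a quick sanity check on the replicate construction.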
20
Special case:
Film Production survey
The Film Production survey for RY2007
(integration year in UES) could not be put into the
UES process because of:
Cost of the post-selection additions
Timeliness
Different processes, like the jackknife replication
method for the variance calculations
Instead of the inverse probability method, the
weight share method will be used
With this method, we assign an average weight based
on the sampled units and the number of links
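A minimal sketch of the weight share idea follows: each reporting unit receives the sum of the design weights of its sampled linked frame units, divided by its total number of links to the frame. The data structures and names are hypothetical, and the version actually adapted to the UES (see Beaumont (2007)) may differ in its details.

```python
def weight_share(links, design_weights):
    """Assign each reporting unit an average weight over its links.

    links:          {reporting unit: [frame units linked to it]}
    design_weights: {sampled frame unit: design weight}
    Linked frame units that were not sampled contribute zero.
    """
    shared = {}
    for unit, frame_units in links.items():
        if not frame_units:
            raise ValueError(f"{unit} has no link to the frame")
        total = sum(design_weights.get(f, 0.0) for f in frame_units)
        shared[unit] = total / len(frame_units)
    return shared
```

For example, a reporting unit linked to two frame units, only one of which was sampled with design weight 4, would receive a weight of 2.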
21
Special case:
Film Production survey
The weight share method cannot be integrated
directly into the UES process
A way to integrate the weight share method into the
UES process was derived (see Beaumont (2007))
With this derivation, the method can be adapted to the regular UES
estimation program
Compared with the inverse probability method, the
weight share method is expected to give a slightly
higher variance
This “special” integration will be done at the end of
2007 / beginning of 2008
22
Estimation – Back-casting
RY2005 is the first year in “UES_lite” for the
“Wave 2” surveys; the previous estimates
were produced in the Culture environment
As was previously shown, the frame is
different for RY2005 (Business Register)
Potential break in the series
Back-casting procedure is used to
reproduce historical estimates using the
Business Register
23
Estimation – Back-casting
Back-casting is done for the two previous
reference years (for example, RY2003 and
RY2002)
A match between the units from the
RY2005 sample and the units from the
previous culture files is done using the
reconciliation files
If the unit is not matched to the previous
year’s culture files, the data is imputed
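The matching step can be sketched as a lookup through a reconciliation mapping, with unmatched units set aside for imputation. The dictionary-based layout and all identifiers are assumptions made for illustration; the imputation itself is not shown.

```python
def match_to_culture(sample_ids, reconciliation, culture_data):
    """Split sampled units into matched records and units needing imputation.

    sample_ids:     identifiers of the units in the RY2005 sample
    reconciliation: {Business Register id: culture file id}
    culture_data:   {culture file id: reported data for a back-casting year}
    """
    matched, to_impute = {}, []
    for unit in sample_ids:
        culture_id = reconciliation.get(unit)
        if culture_id in culture_data:
            matched[unit] = culture_data[culture_id]
        else:
            to_impute.append(unit)  # no historical record: impute
    return matched, to_impute
```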
24
Estimation – Back-casting
Adjustments to the weights will be done
based on the population counts from the
Business Register for the two back-casting
years (for example, RY2003 and RY2002)
Estimates are produced by domain, and
the CVs are calculated for the two back-casting years for the “Wave 2” surveys
(release date is July 2007)
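One plausible form of the weight adjustment is a ratio adjustment within cells, so that the weights in each cell sum to the Business Register population count for the back-casting year. The slides do not specify the adjustment actually used, so the sketch below, including its cell definition and field names, is an assumption.

```python
def adjust_weights(units, population_counts):
    """Scale weights so that they sum to the population count in each cell.

    units:             list of dicts with keys 'cell' and 'weight'
    population_counts: {cell: Business Register population count}
    """
    weight_sums = {}
    for u in units:
        weight_sums[u["cell"]] = weight_sums.get(u["cell"], 0.0) + u["weight"]
    return [
        {**u, "weight": u["weight"] * population_counts[u["cell"]] / weight_sums[u["cell"]]}
        for u in units
    ]
```

For example, two units in the same cell with weights 2 and 3 and a population count of 10 would end up with weights 4 and 6.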
25
Infrastructure - Processing
One of the main challenges in the integration of
those surveys is the communication between the
three parties:
Methodology division (responsible for the survey
methods)
Subject matter division (responsible for the content,
the analysis and the publication)
Enterprise Statistics Division (responsible for the
business survey infrastructure)
Started in October 2006, the process will be
completed in March 2009
26
Conclusion and future work
Presently, three “Wave 1” surveys are being
integrated into UES for RY2006 (sample was
selected in October 2006, estimation is being
prepared)
Next year, for RY2007, the “Wave 2” surveys will
be integrated
Because of the UES infrastructure, some modifications
will be made to the UES estimation program so that
the Film Production survey can be integrated
into UES
27
Thanks
Special thanks to everyone who worked on
those surveys, and who helped me in the
preparation of this presentation
28
For more information, please contact
Éric Pelletier
(613) 951-5213
Visit our web site at
www.statcan.ca