Monitoring year-to-year variation in structural business

Monitoring year-to-year variation
in structural business statistics
Contribution to Q2008 – Rome, 9 July 2008
Session: Editing and Imputation I
[email protected], ESTAT – G1
Structural Business Statistics (SBS)
 Yearly statistics covering the ‘Business Economy’
(NACE sections C-K).
 Geographical coverage: EEA + candidate countries
 Many characteristics: financial, employment…
 Multiple breakdowns: activity, size class, region
Actors
 Produced by National Statistical Institutes, using uniform
definitions (but data collection methodologies may vary)
 Role of ESTAT: collecting, validating data flows,
confidentiality treatment, publishing data series.
Data validation: a ‘macro editing’ tool
Main causes of variation of
aggregates in data flows:
– performance of the
individual enterprises
– change in the composition
of the set of enterprises
– raw data error
(misreporting)
– data processing error
(editing flaw).
Essential characteristics of macro-editing tool
 not to overload correspondents with false alerts
 suitable threshold to single out influential anomalies
Previous practice
 Symmetric [-20%, +20%] confidence interval
 Applied to all (but a few) characteristics
 Possibly generating hundreds (if not thousands) of
“anomalous” variations
 Skilful application required by ESTAT database manager
 Small aggregates vary more -> Unreasonable burden for
NSI of small countries
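As a point of comparison, the previous fixed-threshold rule can be sketched in a few lines. This is only an illustration of the symmetric ±20% test described above; the function name and the handling of a zero previous value are assumptions, not part of the original tool.

```python
def flag_previous_practice(prev, curr, tolerance=0.20):
    """Flag a year-to-year variation outside the symmetric [-20%, +20%] band.

    Hypothetical helper: the name, signature and zero handling are assumptions.
    """
    if prev == 0:
        # Relative change is undefined; assumption: flag it for manual review.
        return True
    relative_change = (curr - prev) / abs(prev)
    return abs(relative_change) > tolerance


# The same rule is applied regardless of the size of the aggregate,
# which is why small, more volatile aggregates trigger many alerts.
print(flag_previous_practice(100.0, 125.0))  # +25% -> flagged
print(flag_previous_practice(100.0, 110.0))  # +10% -> accepted
```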
Factors influencing evolution of SBS data
 Macro-economic (can be compensated for)
– Economic growth
– Inflation PPI/CPI (SBS data are in current prices)
– Currency fluctuations
 Micro economic
– Prospering of enterprises
– Business demography in the sector
 Administrative: business register related
– Registering enterprises / deregistering merged, closed
down or suspended units
– Activity classification of enterprises
Heuristics: Basic assumptions
 Assumptions:
– Year-to-year variations (YTYV) of individual enterprises = set of random
observations.
– Enterprises are very unevenly distributed in size, and the YTYV of a large
corporation is influential on the sector average.
– Economies of scale come to our rescue, since large YTYV are more typical
for small companies.
– Variance of the average YTYV ~ $1/n$
– Standard deviation of the average YTYV ~ $1/\sqrt{n}$
 Knowing economic growth G and inflation I, change of the
aggregates could be estimated.
– So can we expect $V_t \in [\,V_{t-1}(1+G_t)(1+I_t)(1 \pm 2\sigma/\sqrt{n_{t-1}})\,]$ with 95%
probability?
 No, because of several sources of bias
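The naive interval asked about on this slide can be written out as a small sketch. It only illustrates the formula; the slides state that σ is unknown, so the value used in the example is an arbitrary assumption, and the function name is hypothetical.

```python
import math


def naive_expected_interval(v_prev, growth, inflation, sigma, n_prev):
    """Interval V_{t-1} * (1+G) * (1+I) * (1 +/- 2*sigma/sqrt(n_{t-1})).

    sigma is NOT known in practice (see the slides); it is passed in here
    purely for illustration.
    """
    centre = v_prev * (1 + growth) * (1 + inflation)
    half_width = 2 * sigma / math.sqrt(n_prev)
    return centre * (1 - half_width), centre * (1 + half_width)


# Assumed example values: 2% real growth, 3% inflation, sigma = 0.5, 100 enterprises.
low, high = naive_expected_interval(1000.0, 0.02, 0.03, 0.5, 100)
print(round(low, 2), round(high, 2))
```

The relative half-width shrinks as the number of enterprises grows, which is the property the later slides keep while dropping the claim of a 95% probability.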
Heuristics: sources of bias
 Non-financial business economy: NACE C-K \ J: not a full
coverage
 Stratification by NACE: non-random sample -> heavily
biased sector evolution, moreover:
– We use a single ‘inflation’ figure (CPI) instead of an array of
sectoral PPIs
 GDP is a sum of values added. Other characteristics:
possibly different evolution
Result of bias: the expectation value is replaced by an
expectation interval
Heuristics: variability of characteristics
 A few characteristics can be negative or close to zero:
– Change in stocks or work in progress (frequently)
– Gross operating surplus (rarely)
– Value added (almost never)
 Consequences:
– Volatile characteristics -> large % YTYV
– Variance increase of the characteristics
 Measures taken:
– Dropping volatile characteristics
– Widening confidence limits of expectation interval (lack
of predictability ≈ extra bias source)
Heuristics: Bringing it together
 (Standard) Confidence interval limited by a Standard lower
boundary (SLB) and standard upper boundary (SUB)
 Adapted boundaries, with $n_{t-1}$ the number of enterprises in year t-1:
$\mathrm{SLB} / (1 + 2\sigma/\sqrt{n_{t-1}})$ ; $\mathrm{SUB} \cdot (1 + 2\sigma/\sqrt{n_{t-1}})$
 $2\sigma$ implies 95% confidence limits, leaving 5% anomalies (too
many) … but we have no idea about $\sigma$.
=> $2\sigma$ is treated as a parameter: we fit the value 4 to obtain an
80% reduction of the number of ‘anomalies’ as compared to
previous practice.
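A minimal sketch of the adapted boundaries may help to see how the tolerance widens for small aggregates. The function name is hypothetical; the default of 4 for the fitted parameter comes from this slide, and the SLB/SUB values 0.82 and 1.22 in the example are those listed for turnover in the boundaries table of this deck.

```python
import math


def adapted_bounds(slb, sub, n_prev, two_sigma=4.0):
    """Widen the standard boundaries according to the number of enterprises.

    Returns (SLB / (1 + two_sigma/sqrt(n_prev)), SUB * (1 + two_sigma/sqrt(n_prev))).
    two_sigma=4.0 is the fitted parameter quoted on the slide.
    """
    widening = 1 + two_sigma / math.sqrt(n_prev)
    return slb / widening, sub * widening


# Small aggregate (100 enterprises) versus large aggregate (10,000 enterprises).
print(adapted_bounds(0.82, 1.22, 100))
print(adapted_bounds(0.82, 1.22, 10_000))
```

With 100 enterprises the widening factor is 1.4, while with 10,000 enterprises it is only 1.04, so large aggregates are tested almost against the standard boundaries.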
Heuristics: Method applied
 Standard Confidence Interval:
– width depending on characteristics
– tuned using CPI and/or growth data (compare in national currency)
– Symmetrical on log-scale
 Tuned interval for ‘Business demography’ characteristics
– $\mathrm{SLB} / (1 + 4/\sqrt{n_{t-1}}) < n_t/n_{t-1} < \mathrm{SUB} \cdot (1 + 4/\sqrt{n_{t-1}})$
 Tuned interval for Financial characteristics
– [SLB / (1+…) * (1+real growth) * (1+inflation rate) ; SUB*(1+ …) * (1+real
growth)*(1+inflation rate)]
 Tuned interval for Employment characteristics
– [SLB / (1+…) * (1+real growth); SUB*(1+ …) * (1+real growth)]
Confidence interval standard lower and upper boundaries
Characteristics            Inflation?  Growth?  SLB   SUB
Number of enterprises      N           N        0.82  1.22
Turnover                   Y           Y        0.82  1.22
Purchases                  Y           Y        0.82  1.22
Value added                Y           Y        0.77  1.30
Personnel costs            Y           Y        0.82  1.22
Number of employees        N           Y        0.82  1.22
Turnover / person empl.    Y           N        0.85  1.18
Purchases / product. value N           N        0.85  1.15
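The table and the tuned intervals can be combined into a single check. The following is a sketch only, not the Access implementation mentioned in the slides: the function names are hypothetical, and the elided term "(1+…)" in the financial and employment intervals is assumed here to be the same widening factor 1 + 4/√n(t-1) used for business demography.

```python
import math

# (adjust for inflation?, adjust for growth?, SLB, SUB), copied from the table above.
BOUNDS = {
    "Number of enterprises": (False, False, 0.82, 1.22),
    "Turnover": (True, True, 0.82, 1.22),
    "Purchases": (True, True, 0.82, 1.22),
    "Value added": (True, True, 0.77, 1.30),
    "Personnel costs": (True, True, 0.82, 1.22),
    "Number of employees": (False, True, 0.82, 1.22),
    "Turnover / person empl.": (True, False, 0.85, 1.18),
    "Purchases / product. value": (False, False, 0.85, 1.15),
}


def tuned_interval(characteristic, n_prev, real_growth=0.0, inflation=0.0, two_sigma=4.0):
    """Accepted range for the ratio V_t / V_{t-1} of one characteristic."""
    use_inflation, use_growth, slb, sub = BOUNDS[characteristic]
    # Assumption: the elided "(1+...)" is the same widening term as for demography.
    widening = 1 + two_sigma / math.sqrt(n_prev)
    drift = 1.0
    if use_growth:
        drift *= 1 + real_growth
    if use_inflation:
        drift *= 1 + inflation
    return slb / widening * drift, sub * widening * drift


def is_anomalous(characteristic, v_prev, v_curr, n_prev, real_growth=0.0, inflation=0.0):
    """True when the year-to-year ratio falls outside the tuned interval.

    v_prev must be non-zero; values are compared in national currency.
    """
    low, high = tuned_interval(characteristic, n_prev, real_growth, inflation)
    return not (low < v_curr / v_prev < high)


# Assumed example: 400 enterprises, 2% real growth, 3% inflation.
print(tuned_interval("Turnover", 400, 0.02, 0.03))
print(is_anomalous("Turnover", 100.0, 160.0, 400, 0.02, 0.03))
```

Keeping the boundaries in one lookup table mirrors the slide layout, so a change to a single characteristic's SLB or SUB stays a one-line edit.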
Implementation and discussion
 Deterministic method => programmed in Access for distribution
 Test more tolerant on small aggregates => Reduced burden for small MS
(confirmation in ‘2003-04 field test’)
 Raising awareness on influential changes
 ‘Macro-editing tool’: signalling suspicious aggregates:
– Business demographic change?
– Micro-data to be reviewed? Selective editing of ‘suspect’ subset.
– Same ‘macro editing tool’ front end (NSI) and back end (ESTAT) ->
shorter validation cycle
 Field test: Number of anomalies varies between 0.37% and 4.6% (!)
 Correlation low (0.15) between ‘country size’ (number of inhabitants) and
anomaly frequency: small and large MS are treated on equal footing.