Small area data sources - NCRM EPrints Repository
What is … small area estimation
Dimitris Ballas
Department of Geography
University of Sheffield
e-mail: [email protected]
http://www.sheffield.ac.uk/geography/staff/ballas_dimitris
Outline
• Small area data sources
• Why small area estimation?
• Methodological approaches to small
area estimation
• Spatial microsimulation
• Policy relevance examples
• Further reading and resources
(including web-links to free software)
Small area data sources: the census of population
• Census data describe the state of the whole
nation, area by area – no other social
survey has such comprehensive spatial
coverage
• Extremely relevant for policy analysis –
used by government in the allocation of
billions of pounds of public expenditure
• Very valuable commercially – essential
ingredients in marketing analysis and retail
modelling
After Rees, P, Martin, D, Williamson, P (eds) (2002),
The Census Data System, Chichester, Wiley
Examples of more small area data sources
Neighbourhood statistics topics
(http://www.neighbourhood.statistics.gov.uk ):
Census of population
Crime and Safety
Economic Deprivation
Education, Skills and Training
Health and Care
Housing
Physical Environment
Deprivation and Classification
Income and Lifestyles
Population and Migration
Why small area estimation?
• Need for small area estimates of variables such as
income, poverty, wealth, health, fear of crime, healthy
lifestyles…
• We know little about the interdependencies between
household structure or type and their lifestyles at the
small area level
• There is no ‘live’ geographical database of household
types linked to earning capabilities (both earned
and/or transfer payments) which can be used both to
explore spatial variations in lifestyles and behaviour
and to monitor the effects of changes in taxation,
family credit, pensions, social security payments etc.
Why small area estimation?
• Policy makers need small area estimates
• Academics need small area estimates
• The public like small area estimates
– “What’s happening in my backyard”
Policy relevance
– socio-economic impact assessment
– geographical impacts of social policy
– what-if socio-spatial analysis
Small area estimation methods
• Conduct a survey
- very costly
- confidentiality issues
• Small area estimation methods can be applied to get
survey data down to small area level and to evaluate
the spatial impacts of policies
• Various methodologies of small area estimation
– Statistical approaches
– Spatial microsimulation approaches
  - Deterministic reweighting (IPF)
  - Probabilistic reweighting (CO)
  - Generalised linear regression (GREGWT)
Methodological approaches to small area
estimation
• Statistical approaches (more linked to statisticians)
– Synthetic estimation
– Multi-level modelling
– Bayesian approaches
• Spatial microsimulation approaches (more linked to
geographers)
– Deterministic reweighting approaches (IPF)
– Probabilistic reweighting approaches (combinatorial optimisation)
– Generalised linear regression (GREGWT)
• But many links between the methods
For a review of a recent effort to explore linkages between these two often
separate sets of approaches see:
http://www.ncrm.ac.uk/research/NMI/2012/smallarea.php
A very simple approach to generating
indirect non-survey designed estimates
- Obtain small area total numbers from the census
on variables that may be correlated with a ‘target
variable’ (e.g. income would be correlated
with “occupational classification”)
- Obtain information at the national, or sometimes
regional, level on the same variable
cross-tabulated by the census variable (e.g.
earnings by occupational classification)
- For each area, multiply the known census totals
by the corresponding average values
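The three steps above can be sketched in a few lines of Python. This is only an illustration: the occupational classes, the national average earnings and the census counts are all hypothetical numbers invented for the example, and the area estimate is taken to be the count-weighted mean of the national averages.

```python
# Hypothetical national average weekly earnings by occupational class
# (illustrative numbers only, not real survey results).
national_mean_earnings = {"professional": 800.0, "skilled": 500.0, "unskilled": 300.0}

# Hypothetical census counts of workers by occupational class in two small areas.
census_counts = {
    "Area 1": {"professional": 20, "skilled": 50, "unskilled": 30},
    "Area 2": {"professional": 60, "skilled": 30, "unskilled": 10},
}


def estimate_area_mean(counts, means):
    """Multiply each census count by the national mean for that class,
    then divide the total by the number of people in the area."""
    total_people = sum(counts.values())
    total_earnings = sum(counts[c] * means[c] for c in counts)
    return total_earnings / total_people


area_estimates = {
    area: estimate_area_mean(counts, national_mean_earnings)
    for area, counts in census_counts.items()
}

for area, value in area_estimates.items():
    print(f"{area}: estimated mean weekly earnings = {value:.2f}")
```

The estimate varies between areas only because their occupational mix differs; any local variation within an occupational class is ignored, which is the main weakness of this simple approach.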
A model-based approach (Office for National
Statistics, Heady et al., 2003)
• Estimating ‘average weekly household income’ at the electoral ward level in
England and Wales on the basis of the following predictors:
• the social class of the ward population;
• Household type/composition
• Regional/country indicators
• the employment status of the ward population
• the proportion of the ward population claiming DWP benefits;
• the proportion of dwellings in each of the Council Tax bands in a ward
• “The model-based approach is based on finding a relationship between
weekly household income (as measured in the Family Resources Survey
(FRS)) and covariate information (usually from Census or administrative
sources) for the wards that are represented in the Survey”
see http://www.neighbourhood.statistics.gov.uk/HTMLDocs/images/ModelBased_Income_Estimates%28V2%29_tcm97-51115.pdf
Spatial Microsimulation
• A technique aiming at building large
scale data sets
• Modelling at the microscale
• A means of modelling real life events
by simulating the characteristics and
actions of the individual units that
make up the system where the events
occur
What is microsimulation?
PERSON  AHID     PID       AAGE12  SEX  AJBSTAT  …  AHLLT  AQFVOC  ATENURE  AJLSEG  …
1       1000209  10002251  91      2    4        …  1      1       6        9       …
2       1000381  10004491  28      1    3        …  2      0       7        -8      …
3       1000381  10004521  26      1    3        …  2      0       7        -8      …
4       1000667  10007857  58      2    2        …  2      1       7        -8      …
5       1001221  10014578  54      2    1        …  2      0       2        -8      …
6       1001221  10014608  57      1    2        …  2      1       2        -8      …
7       1001418  10016813  36      1    1        …  2      1       3        -8      …
8       1001418  10016848  32      2    -7       …  2      -7      3        -7      …
9       1001418  10016872  10      1    -8       …  -8     -8      3        -8      …
10      1001507  10017933  49      2    1        …  2      0       2        -8      …
11      1001507  10017968  46      1    2        …  2      0       2        -8      …
12      1001507  10017992  12      2    -8       …  -8     -8      2        -8      …
Static spatial microsimulation
• Reweighting probabilistic approaches, which
typically reweight an existing national
microdata set to fit a geographical area
description on the basis of random sampling
and optimisation techniques
• Reweighting deterministic approaches, which
reweight a non geographical population
microdata set to fit small area descriptions,
but without the use of random sampling
procedures
• Synthetic probabilistic reconstruction models,
which involve the use of random sampling
Static spatial microsimulation
Small area table 1               Small area table 2   Small area table 3
(household type)                 (number of cars)     (tenure status)
Area 1
60 "married couple households"   10 no car            60 owner occupier
20 "Single-person households"    80 1 car             20 Local Authority or Housing association
20 "Other"                       10 2+ cars           20 Rented privately
Area 2
40 "married couple households"   40 no car            60 owner occupier
20 "Single-person households"    40 1 car             20 Local Authority or Housing association
40 "Other"                       20 2+ cars           20 Rented privately
Tenure and car ownership example
                 Household car ownership characteristics   Household tenure characteristics
                 1 car   2+ cars   No car                  Owner-occupier   LA/HA rented   Other
Simulation       27      24        49                      39               17             44
Census           50      20        30                      60               10             30
Absolute error   23      4         19                      21               7              14
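The absolute errors in the tenure and car ownership example are the cell-by-cell differences between the simulated and census counts. A minimal sketch of that calculation follows; summing the errors into a single total is a common way of scoring the fit and is an addition here, not something shown on the slide.

```python
categories = ["1 car", "2+ cars", "No car", "Owner-occupier", "LA/HA rented", "Other"]
simulation = [27, 24, 49, 39, 17, 44]
census = [50, 20, 30, 60, 10, 30]

# Absolute error per category: |simulated count - census count|
absolute_errors = [abs(s - c) for s, c in zip(simulation, census)]

# A single score for the whole fit (lower is better).
total_absolute_error = sum(absolute_errors)

for name, error in zip(categories, absolute_errors):
    print(f"{name}: {error}")
print("Total absolute error:", total_absolute_error)
```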
Combinatorial optimisation:
simulated annealing
• Origins in thermodynamics
• Metropolis et al. (1953) suggested an algorithm for the
efficient simulation of the evolution of a solid material to
thermal equilibrium
• Annealing is a physical process in which a solid material is
first melted in a heat bath and then it is cooled down
slowly until it crystallises
• First used in a spatial microsimulation context by
Williamson, P., Birkin, M., Rees, P. (1998), The estimation
of population microdata by using data from small area
statistics and samples of anonymised records,
Environment and Planning A, 30, 785-816
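A minimal sketch of how simulated annealing might be used to select households that fit small area constraints is given below. It is not the Williamson et al. implementation: the nine-household survey sample, the cooling schedule, the step count and the function names are all assumptions made for illustration. The constraint totals are those of Area 1 in the small area tables shown earlier.

```python
import math
import random

# Hypothetical survey sample: one household for each (car ownership, tenure) pair.
SAMPLE = [
    (cars, tenure)
    for cars in ("no car", "1 car", "2+ cars")
    for tenure in ("owner", "social", "private")
]

# Constraint totals for one small area (Area 1 in the small area tables).
TARGET_CARS = {"no car": 10, "1 car": 80, "2+ cars": 10}
TARGET_TENURE = {"owner": 60, "social": 20, "private": 20}
N_HOUSEHOLDS = 100


def total_absolute_error(selection):
    """Sum of |simulated count - census count| over all constraint cells."""
    cars = dict.fromkeys(TARGET_CARS, 0)
    tenure = dict.fromkeys(TARGET_TENURE, 0)
    for index in selection:
        car_category, tenure_category = SAMPLE[index]
        cars[car_category] += 1
        tenure[tenure_category] += 1
    return (sum(abs(cars[k] - TARGET_CARS[k]) for k in TARGET_CARS)
            + sum(abs(tenure[k] - TARGET_TENURE[k]) for k in TARGET_TENURE))


def anneal(seed=0, temperature=10.0, cooling=0.998, steps=5000):
    rng = random.Random(seed)
    # Start from a random selection of households (sampling with replacement).
    selection = [rng.randrange(len(SAMPLE)) for _ in range(N_HOUSEHOLDS)]
    error = initial_error = total_absolute_error(selection)
    best, best_error = list(selection), error
    for _ in range(steps):
        if best_error == 0:
            break
        # Propose swapping one selected household for a random one.
        position = rng.randrange(N_HOUSEHOLDS)
        previous = selection[position]
        selection[position] = rng.randrange(len(SAMPLE))
        new_error = total_absolute_error(selection)
        delta = new_error - error
        # Always accept improvements; accept a worse fit with a probability
        # that shrinks as the temperature falls (Metropolis criterion).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            error = new_error
            if error < best_error:
                best, best_error = list(selection), error
        else:
            selection[position] = previous
        temperature *= cooling
    return best, initial_error, best_error


best_selection, initial_error, final_error = anneal()
print("Total absolute error: start =", initial_error, "end =", final_error)
```

Occasionally accepting a worse selection early on is what distinguishes annealing from simple hill-climbing: it lets the search escape selections that cannot be improved by any single swap.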
Other methodologies
• Hill-climbing, genetic algorithms
• Deterministic reweighting
approaches
• Probabilistic synthetic
reconstruction techniques (IPF-based approaches)
Deterministic Reweighting the British
Household Panel Survey (BHPS) - a
simple example (1)
A hypothetical sample of individuals (list format)
Individual  sex     age-group  weight
1st         male    over-50    1
2nd         male    over-50    1
3rd         male    under-50   1
4th         female  over-50    1
5th         female  under-50   1
In tabular format:
age/sex   male  female
under-50  1     1
over-50   2     1
Hypothetical Census data for
a small area:
age/sex   male  female
under-50  3     5
over-50   3     1
Reweighting the BHPS - a
simple example (2)
Calculating a new weight,
so that the sample will fit into the Census table
Individual  sex     age-group  weight  New weight
1st         male    over-50    1       1 x 3/2 = 1.5
2nd         male    over-50    1       1 x 3/2 = 1.5
3rd         male    under-50   1       1 x 3/1 = 3
4th         female  over-50    1       1 x 1/1 = 1
5th         female  under-50   1       1 x 5/1 = 5
In tabular format:
age/sex   male  female
under-50  1     1
over-50   2     1
Hypothetical Census data for
a small area:
age/sex   male  female
under-50  3     5
over-50   3     1
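The reweighting in this example multiplies each individual's weight by the census count for their age/sex cell and divides by the sample total for the same cell. A short sketch using the five hypothetical individuals and the hypothetical census table from the slide (the variable names are illustrative):

```python
# (individual, sex, age group, initial weight), as in the hypothetical sample.
sample = [
    ("1st", "male", "over-50", 1.0),
    ("2nd", "male", "over-50", 1.0),
    ("3rd", "male", "under-50", 1.0),
    ("4th", "female", "over-50", 1.0),
    ("5th", "female", "under-50", 1.0),
]

# Hypothetical census counts for the small area, by (sex, age group).
census = {
    ("male", "under-50"): 3,
    ("female", "under-50"): 5,
    ("male", "over-50"): 3,
    ("female", "over-50"): 1,
}

# Weighted sample total in each age/sex cell.
sample_totals = {}
for _, sex, age, weight in sample:
    sample_totals[(sex, age)] = sample_totals.get((sex, age), 0.0) + weight

# New weight = old weight x census count / sample total for the same cell.
new_weights = {
    name: weight * census[(sex, age)] / sample_totals[(sex, age)]
    for name, sex, age, weight in sample
}

for name, weight in new_weights.items():
    print(name, weight)
```

The new weights sum to 12, the total population of the hypothetical census table, so the reweighted sample reproduces every cell of that table.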
Probabilistic synthetic reconstruction
After Birkin, M., Clarke, M. (1988), SYNTHESIS – a synthetic spatial information
system for urban and regional analysis: methods and examples,
Environment and Planning A, 20, 1645-1671
Probabilistic synthetic reconstruction techniques
Steps
1. Age, sex and marital status and location (DED level) (given)
2. Probability of hh of given age, sex and location (ED level) being at work
3. Random number
4. Economic activity assigned to hh on the basis of random sampling
5. Probability (conditional upon the above attributes) of hh being an employee in the Agriculture industry
6. Random number
7. Economic activity category assigned on the basis of random sampling
…
Individuals (step 1, given)
1st:  Age: 25; Sex: Male; Marital Status: Single; GeoCode: Leitrim Co., DED 001 Ballinamore
2nd:  Age: 76; Sex: Female; Marital Status: married; GeoCode: Leitrim Co., DED 002 Cloverhill
…
Last: Age: 30; Sex: Male; Marital Status: married; GeoCode: Leitrim Co., DED 078 Rowan
Step  1st                      2nd                   …  Last
2     0.7                      0.2                   …  0.7
3     0.55                     0.5                   …  0.45
4     At work                  Other (e.g. Retired)  …  At work
5     0.6                                            …  0.2
6     0.4                                            …  0.6
7     Employee in Agriculture                        …  Not an employee in Agriculture
…     …                        …                     …  …
SMILE model, after Ballas, D., Clarke, G. P., Wiemers, E., (2006) Spatial microsimulation for rural policy analysis
in Ireland: The implications of CAP reforms for the national spatial strategy,
Journal of Rural Studies, vol. 22, pp. 367-378 (doi:10.1016/j.jrurstud.2006.01.002)
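The sampling logic of probabilistic synthetic reconstruction can be sketched as follows. This is not the SMILE code: the function names are hypothetical, and the rule "assign the attribute when the random number is below the conditional probability" is an assumption, although it is consistent with the values shown for the first individual (0.55 against 0.7 gives "At work").

```python
import random


def assign(probability, random_number, if_below, otherwise):
    """Monte Carlo assignment: pick `if_below` when the random number
    falls below the conditional probability, else `otherwise`."""
    return if_below if random_number < probability else otherwise


def reconstruct(person, p_work, p_agriculture, rng):
    """Add attributes one at a time, each conditional on those already assigned.
    The two probabilities would normally be looked up from published tables."""
    result = dict(person)
    result["activity"] = assign(p_work, rng.random(), "At work", "Other")
    if result["activity"] == "At work":
        result["industry"] = assign(
            p_agriculture, rng.random(),
            "Employee in Agriculture", "Not an employee in Agriculture",
        )
    return result


# Reproducing the first individual's assignments from the quoted numbers.
first_activity = assign(0.7, 0.55, "At work", "Other")
first_industry = assign(0.6, 0.4, "Employee in Agriculture",
                        "Not an employee in Agriculture")
print(first_activity, "/", first_industry)

# Drawing fresh random numbers for a synthetic individual.
rng = random.Random(42)
person = {"age": 25, "sex": "Male", "marital_status": "Single"}
print(reconstruct(person, p_work=0.7, p_agriculture=0.6, rng=rng))
```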
Dynamic spatial microsimulation
• Probabilistic dynamic models, which use event
probabilities to project each individual in the
simulated database into the future (e.g. using
event conditional probabilities).
• Implicitly dynamic models, which use
independent small area projections and then
apply the static simulation methodologies to
create small area microdata statically
Probabilistic dynamic models
Steps
1. Age, sex and marital status and location (DED level) (given)
2. Probability (conditional upon age, sex, location) of hh to migrate
3. Random number
4. Migration status assigned on the basis of random sampling
5. Probability (conditional upon age, sex, location) of hh to survive
6. Random number
7. Survival status
Individuals (step 1, given)
1st:  Age: 25; Sex: Male; Marital Status: Single; GeoCode: Leitrim Co., DED 001 Ballinamore
2nd:  Age: 76; Sex: Female; Marital Status: married; GeoCode: Leitrim Co., DED 002 Cloverhill
…
Last: Age: 30; Sex: Male; Marital Status: married; GeoCode: Leitrim Co., DED 078 Rowan
Step  1st       2nd          …  Last
2     0.30      0.05         …  0.26
3     0.2       0.4          …  0.4
4     Migrant   Non-migrant  …  Non-migrant
5     0.9       0.4          …  0.8
6     0.5       0.5          …  0.4
7     Survived  Deceased     …  Survived
after Ballas, D., Clarke, G. P., Wiemers, E. (2005) Building a dynamic spatial microsimulation
model for Ireland, Population, Space and Place, 11, 157–172 (http://dx.doi.org/10.1002/psp.359)
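A probabilistic dynamic model repeats this kind of event sampling year after year. The sketch below projects one individual forward; the event order (migration, then survival), the one-year time step, the function names and the probability functions are all assumptions made for illustration and are not taken from the model cited above.

```python
import random


def project(person, years, p_migrate, p_survive, rng):
    """Project one individual forward, sampling events once per simulated year.
    `p_migrate` and `p_survive` are functions returning a probability
    conditional on the individual's current state."""
    state = dict(person, alive=True, migrated=False)
    for _ in range(years):
        if not state["alive"]:
            break
        # Migration event: occurs when the random number is below the probability.
        if rng.random() < p_migrate(state):
            state["migrated"] = True
        # Survival event: the individual dies when the draw is not below p_survive.
        if rng.random() >= p_survive(state):
            state["alive"] = False
        else:
            state["age"] += 1
    return state


# Hypothetical conditional probabilities, for illustration only.
def p_migrate(state):
    return 0.30 if state["age"] < 30 else 0.05


def p_survive(state):
    return 0.99 if state["age"] < 70 else 0.90


rng = random.Random(1)
print(project({"age": 25, "sex": "Male"}, years=10,
              p_migrate=p_migrate, p_survive=p_survive, rng=rng))
```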
SimBritain: combining Census
data with the BHPS
Census of UK population:
• 100% coverage
• fine geographical detail
• Small area data available
only in tabular format
with limited variables to
preserve confidentiality
• cross-sectional
British Household Panel
Survey:
• sample size: more than
5,000 households
• Annual surveys (waves)
since 1991
• Coarse geography
• Household attrition
Ballas, D. , Clarke, G.P., Dorling, D., Eyre, H. and Rossiter, D., Thomas, B (2005)
SimBritain: a spatial microsimulation approach to population dynamics,
Population, Space and Place 11, 13–34 (http://dx.doi.org/10.1002/psp.351)
SimBritain modelling
approach
1. Establish a set of constraints
2. Choose a spatially defined source population
3. Repeatedly sample from source
4. Adjust weightings to match first constraint
5. Adjust weightings to match second constraint
6. …
7. Adjust weightings to match final constraint
8. Go back to step 4 and repeat loop until results
converge
9. Save weightings which define membership of
SimBritain
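The loop in steps 4 to 8 can be sketched as a simple iterative proportional fitting routine. This is an illustrative sketch and not the SimBritain code: the nine source households, the two constraints (taken from Area 1 of the small area tables shown earlier), the convergence tolerance and the function names are assumptions, and the sampling in step 3 is skipped.

```python
# Hypothetical source households: one for each (cars, tenure) combination.
households = [
    {"cars": cars, "tenure": tenure}
    for cars in ("no car", "1 car", "2+ cars")
    for tenure in ("owner", "social", "private")
]

# Steps 1-2: constraint totals for one small area.
constraints = {
    "cars": {"no car": 10, "1 car": 80, "2+ cars": 10},
    "tenure": {"owner": 60, "social": 20, "private": 20},
}


def weighted_totals(households, weights, variable, categories):
    """Sum of household weights in each category of one constraint variable."""
    totals = dict.fromkeys(categories, 0.0)
    for household, weight in zip(households, weights):
        totals[household[variable]] += weight
    return totals


def fit_weights(households, constraints, tolerance=1e-6, max_iterations=100):
    weights = [1.0] * len(households)
    for _ in range(max_iterations):
        # Steps 4-7: rescale the weights to match each constraint in turn.
        # Assumes every constraint category has at least one source household.
        for variable, targets in constraints.items():
            totals = weighted_totals(households, weights, variable, targets)
            weights = [
                w * targets[h[variable]] / totals[h[variable]]
                for h, w in zip(households, weights)
            ]
        # Step 8: stop once every constraint cell is matched closely enough.
        worst_gap = max(
            abs(weighted_totals(households, weights, v, t)[c] - t[c])
            for v, t in constraints.items() for c in t
        )
        if worst_gap < tolerance:
            break
    return weights  # Step 9: the saved weights define the simulated population.


weights = fit_weights(households, constraints)
for household, weight in zip(households, weights):
    print(household, round(weight, 2))
```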
CONSTRAINT TABLES
TABLE          CATEGORY
Car Ownership  no cars / 1 car / 2+ cars
Social Class   affluent / middle income / less affluent
Demography     1 child / 2+ children / no children
Employment     active / retired / inactive
Households     married couple / lone parent / other
Tenure         owner occupied / council tenant / other
How do we know it makes
sense?
[Figure: Average age. Scatter plot of "cage" (vertical axis, 30 to 50) against "sage" (horizontal axis, 30 to 50); se = 1.0, r squared = .760, beta = 1.22]
How do we know it makes
sense?
[Figure: Long-term illness. Scatter plot of "cill" (vertical axis, 0.00 to 0.30) against "sill" (horizontal axis, 0.00 to 0.30); se = 1.7, r squared = .767, beta = 1.19]
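One way to produce validation statistics of this kind is to regress a known value for each area on the simulated value for the same area and report the slope and r squared. The sketch below assumes that this is the comparison being made; the area values are hypothetical and the function name is illustrative.

```python
def simple_regression(x, y):
    """Ordinary least squares of y on x: returns slope, intercept and r squared."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical average ages for five areas: simulated versus known values.
simulated_age = [34.0, 36.0, 38.0, 40.0, 42.0]
known_age = [33.0, 37.0, 38.0, 41.0, 44.0]

slope, intercept, r_squared = simple_regression(simulated_age, known_age)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}, r squared = {r_squared:.3f}")
```

A slope near 1 with a high r squared would suggest that the simulation reproduces the variation between areas; a slope well away from 1 would suggest that it compresses or exaggerates that variation.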
The potential of microsimulation for policy
analysis
Classifying households
• Very poor: all households with income below 50% of the
median York income
• Poor: all households with income more than 50% of the
median but lower than 75% of the median
• Below-average: all households living on incomes higher
than 75% of the median but less than or equal to the
median
• Above-average: all households living on incomes higher
than the median and lower than 125% of the median
• Affluent: all households living on incomes above 125% of
the median
Ballas, D., Clarke, G P, Dorling D, Rossiter, D. (2007), Using SimBritain to Model
the Geographical Impact of National Government Policies,
Geographical Analysis 39, pp.44-77 (doi:10.1111/j.1538-4632.2006.00695.x)
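The classification above can be written as a small function of household income relative to the median. The slide does not say which class a household sitting exactly on the 50%, 75% or 125% boundary falls into, so the choices below are assumptions; the incomes are hypothetical.

```python
from statistics import median


def classify(income, median_income):
    """Classify a household by its income relative to the area median.
    Households exactly on the 50%, 75% or 125% boundary are placed in the
    higher class here; that boundary treatment is an assumption."""
    ratio = income / median_income
    if ratio < 0.5:
        return "Very poor"
    if ratio < 0.75:
        return "Poor"
    if ratio <= 1.0:
        return "Below-average"
    if ratio < 1.25:
        return "Above-average"
    return "Affluent"


# Hypothetical weekly household incomes, for illustration only.
incomes = [100, 250, 400, 450, 600]
median_income = median(incomes)
classes = [classify(income, median_income) for income in incomes]

for income, label in zip(incomes, classes):
    print(income, label)
```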
Living standards of very poor households
Very poor households                                                            1991   2001   2011   2021
Households (% of all households in York)                                        17.2%  17.3%  17.8%  21.3%
Individuals (% of all individuals in York)                                      14.7%  13.3%  13.7%  20.5%
Children (% of all children in York)                                            21.8%  17.7%  18.6%  38.5%
LLTI (as a % of all individuals in group)                                       9.0%   7.3%   5.4%   7.9%
Elderly (over 64 years as a % of all individuals in group)                      30.1%  32.0%  33.3%  44.2%
Individuals in group with father's occupation: unskilled (%)                    10.5%  6.8%   3.3%   15.1%
Reporting anxiety and depression (% of all individuals in group)                10.6%  10.3%  7.4%   3.1%
Reporting health problems with alcohol or drugs (% of all individuals in group) 0.9%   1.1%   0.3%   0.0%
Individuals who reported that they have no one to talk to                       19.9%  23.8%  31.1%  31.5%
Using SimBritain to Model the Geographical Impact of National
Government Policies
Working Families Tax Credits                   Amount in 2002-3    Adjusted for 1991
Couple or lone parent                          £60.00              £42.39
Child aged under 16                            £26.35              £18.62
Child aged 16-18                               £27.20              £19.22
30 hours credit                                £11.65              £8.23
Disabled child credit                          £35.50              £25.08
Enhanced disability credit
  Couple or lone parent                        £16.25              £11.48
  Child                                        £46.75              £33.03
Childcare credit
  One child                                    70% of up to £135   70% of up to £95.39
  Two or more children                         70% of up to £200   70% of up to £141.31
Additional partners in a polygamous marriage   £22.70              £16.04
The estimated spatial impact in York
The estimated spatial impact in Wales
Further reading and resources (including software)
• Combinatorial Optimisation software (including dummy dataset and associated
documentation) by Paul Williamson (University of Liverpool):
http://pcwww.liv.ac.uk/~william/microdata/CO%20070615/CO_software.html
• Iterative Proportional Fitting and integerisation R code and data: Lovelace, R, Ballas D
(2013), ‘Truncate, replicate, sample’: A method for creating integer weights for spatial
microsimulation, Computers, Environment and Urban Systems,
http://www.sciencedirect.com/science/article/pii/S0198971513000240 (open access article
including publicly available R code and data)
• A recent review of the state of the art and research challenges by Adam Whitworth
(University of Sheffield): Whitworth, A et al. (2013) Evaluations and improvements in small
area estimation methodologies. Discussion Paper. NCRM
(http://www.ncrm.ac.uk/research/NMI/2012/smallarea.php and
http://eprints.ncrm.ac.uk/3210/). This includes a Spatial Microsimulation R-Library by
Dimitris Kavroudakis (University of the Aegean) including R code available from:
http://www.shef.ac.uk/polopoly_fs/1.268326!/file/sms_Manual_v9.zip
• An introductory text to spatial microsimulation: Ballas, D., Rossiter, D, Thomas, B., Clarke G,
Dorling D, (2005), Geography matters: simulating the local impacts of national social policies,
Joseph Rowntree Foundation http://www.jrf.org.uk/sites/files/jrf/1859352669.pdf
• 2-day NCRM/TALISMAN course: An Introduction to Spatial Microsimulation Using R,
19-20 September 2014, University of Cambridge
http://store.leeds.ac.uk/browse/extra_info.asp?compid=1&modid=2&deptid=9&catid=47&prodid=449