Ethnic dealignment?
Combining individual and aggregate data to
improve estimates of ethnic voting in Britain in
2001 and 2005
Stephen Fisher, Jane Key, Nicky
Best, Sylvia Richardson
Department of Sociology, University of Oxford and
Department of Epidemiology and Public Health
Imperial College, London
http://www.bias-project.org.uk
Outline
Introduction and substantive issues
Methods and Results
Standard multilevel model
Ecological and Hierarchical Related Regression
Results
Results
Discussion and further work
Introduction and substantive issues
NCRM BIAS project: Overall goals
To develop a set of statistical frameworks for
combining data from multiple sources
To improve our capacity to handle limitations
inherent in observational data.
Key statistical tools: Bayesian hierarchical
models and ideas from graphical models form
the basic building blocks for these
developments
A decline in ethnic minority support for Labour?
Ethnic minority vote for Labour consistently around 80% from 1974 to 2001
Between 2001 and 2005 there were
Islamic terrorist attacks
US- and UK-led invasions of Afghanistan and Iraq
Heightened security and suspicion of non-whites
Unlawful detention of foreign terror suspects
Convictions of British soldiers for Iraqi prisoner abuse
These and other events are thought to have undermined support
for Labour among ethnic minorities.
On the other hand, the harsh stance on immigration in the Conservatives'
2005 election campaign may have alienated ethnic minority voters
This paper seeks to test whether the gap in Labour vote between
whites and non-whites narrowed between 2001 and 2005.
Data
Problem:
Not enough high quality survey data on ethnic minorities
So combine individual and aggregate data using new class of
multilevel models developed by BIAS
Individual-level:
British Election Study post-election surveys.
97 registered ethnic minority respondents in 2001, and 137 in 2005
Constituency-level:
2001 and 2005 election results
2001 Census data on % who are non-white
Population:
Focus on Labour voting as a proportion of the registered population, since
census data might be a reasonable proxy for this, but not for the voting population.
Methods and Results
Data sources: individual data
Source of data                        Resolution                         Coverage              Variables             N
British Election Study, 2001, 2005    Individual                         Sample of electorate  Outcome*, Predictor$  2,669 individuals in 128 constituencies (2001); 3,431 in 128 (2005)
General Election results, 2001, 2005  Area (Parliamentary Constituency)  Electorate            Outcome†              43.2 million in 641 constituencies (2001); 43.1 million in 628 (2005)
2001 Census                           Area (Parliamentary Constituency)  Population            Predictor‡            57.1 million in 641 constituencies
* Outcome = Vote choice (Labour / other)
† Outcome = Proportion voting Labour
$ Predictor = Ethnicity (white / non-white)
‡ Predictor = Proportion non-white
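Multilevel model for individual BES data
[Diagram: graphical model with parameters \(\alpha\), \(\beta\) and \(\sigma^2\), area effect \(\theta_i\) for area i, and covariate \(x_{ij}\) and outcome \(y_{ij}\) for person j in area i]
\(y_{ij}\) = voted Labour (1) / other (0)
\(x_{ij}\) = non-white (1) / white (0)
\(y_{ij} \sim \mathrm{Bernoulli}(p_{ij})\), person j, area i
\(\operatorname{logit} p_{ij} = \alpha + \beta x_{ij} + \theta_i\)
\(\theta_i \sim \mathrm{Normal}(0, \sigma^2)\)
\(\beta\) = individual-level effect of ethnicity on vote choice
\(\theta_i\) = “unexplained” area effects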
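The multilevel model above can be sketched in a few lines of Python using only the standard library. The parameter values are illustrative assumptions rather than estimates from the BES data, and `labour_probability` and `simulate_area` are hypothetical helper names introduced for this sketch.

```python
import math
import random


def expit(z):
    """Inverse logit: maps a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def labour_probability(alpha, beta, theta_i, x_ij):
    """p_ij implied by logit p_ij = alpha + beta * x_ij + theta_i."""
    return expit(alpha + beta * x_ij + theta_i)


def simulate_area(alpha, beta, sigma, xs, rng):
    """Draw one area effect theta_i ~ Normal(0, sigma^2), then one vote per person."""
    theta_i = rng.gauss(0.0, sigma)
    votes = [int(rng.random() < labour_probability(alpha, beta, theta_i, x))
             for x in xs]
    return theta_i, votes


# Illustrative values only (assumptions, not fitted estimates).
alpha, beta, sigma = -0.7, 0.9, 0.5
rng = random.Random(0)
theta_i, votes = simulate_area(alpha, beta, sigma, xs=[0, 0, 1, 1, 0], rng=rng)

# Probabilities for an "average" area (theta_i = 0).
p_white = labour_probability(alpha, beta, 0.0, 0)
p_nonwhite = labour_probability(alpha, beta, 0.0, 1)
print(round(p_white, 3), round(p_nonwhite, 3))
```

In this sketch the ethnicity effect is the difference in log odds between the two groups within the same area, which is why the area effect cancels out of the comparison.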
Results: Share of the vote for whites and non-whites
based on BES survey data
A. As a proportion of voters
Year  Group      N     % Labour  Std. error  95% CI
2001  White      1876  43        1.4         (40, 46)
2001  Non-White  67    72        6.1         (60, 84)
2005  White      2546  36        1.3         (33, 38)
2005  Non-White  97    51        6.3         (38, 63)
Results: Share of the vote for whites and non-whites
based on BES survey data
B. As a proportion of electorate
Year  Group      N     % Labour  Std. error  95% CI
2001  White      2572  33        1.1         (31, 35)
2001  Non-White  97    55        5.7         (43, 66)
2005  White      3294  27        1.0         (25, 29)
2005  Non-White  137   37        5.0         (27, 47)
Results from regression analysis of BES electorate data
[Figure: estimated effect of ethnicity on Labour voting (log odds ratio, axis from -1.0 to 1.0) for 2001, 2005 and the change between them, from models with no random effect and with random effect]
Comments
Between 2001 and 2005, non-white Labour vote drops from
72% to 51% (voters)
55% to 37% (electorate)
Small sample size → large SE and CI for the proportion of non-whites voting Labour
Gap in Labour vote between whites and non-whites narrows from
29 points in 2001 to 15 points in 2005 (voters)
22 points in 2001 to 10 points in 2005 (electorate)
But, change is not statistically significant (multilevel analysis)
Is this just because sample size is too small?
What can we learn from aggregate data?
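As a rough check on the scale of these numbers, the electorate percentages and standard errors reported in the BES tables can be converted to log odds ratios. This is a back-of-envelope sketch, not the authors' multilevel analysis: it assumes a delta-method approximation for the standard error of a logit, treats groups and years as independent, and ignores area random effects and survey design.

```python
import math


def logit(p):
    """Log-odds of a probability p."""
    return math.log(p / (1.0 - p))


def logit_se(p, se_p):
    """Delta-method standard error of logit(p), given the standard error of p."""
    return se_p / (p * (1.0 - p))


# (proportion voting Labour, standard error), share of electorate, from the BES tables.
table_b = {
    2001: {"white": (0.33, 0.011), "non_white": (0.55, 0.057)},
    2005: {"white": (0.27, 0.010), "non_white": (0.37, 0.050)},
}


def log_odds_ratio(year):
    """Return (log odds ratio, non-white vs white; approximate standard error)."""
    p_w, se_w = table_b[year]["white"]
    p_n, se_n = table_b[year]["non_white"]
    estimate = logit(p_n) - logit(p_w)
    se = math.hypot(logit_se(p_n, se_n), logit_se(p_w, se_w))
    return estimate, se


lor_2001, se_2001 = log_odds_ratio(2001)
lor_2005, se_2005 = log_odds_ratio(2005)
change = lor_2005 - lor_2001
se_change = math.hypot(se_2001, se_2005)  # assumes the two surveys are independent
z = change / se_change

print(f"2001: {lor_2001:.2f}  2005: {lor_2005:.2f}  change: {change:.2f}  z: {z:.2f}")
```

Under these assumptions the log odds ratio falls between the two elections, but the ratio of the change to its approximate standard error stays below 1.96 in absolute value, consistent with the comment that the change is not statistically significant.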
Data sources: individual and aggregate data
Source of data                        Resolution                         Coverage              Variables             N
British Election Study, 2001, 2005    Individual                         Sample of electorate  Outcome*, Predictor$  2,669 individuals in 128 constituencies (2001); 3,431 in 128 (2005)
General Election results, 2001, 2005  Area (Parliamentary Constituency)  Electorate            Outcome†              43.2 million in 641 constituencies (2001); 43.1 million in 628 (2005)
2001 Census                           Area (Parliamentary Constituency)  Population            Predictor‡            57.1 million in 641 constituencies
* Outcome = Vote choice (Labour / other)
† Outcome = Proportion voting Labour
$ Predictor = Ethnicity (white / non-white)
‡ Predictor = Proportion non-white
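Standard ecological regression model
[Diagram: graphical model with parameters \(a\), \(b\) and \(\tau^2\), area effect \(c_i\), and area-level \(Y_i\), \(X_i\), \(N_i\) for area i]
\(Y_i\) = number voting Labour
\(N_i\) = registered electorate
\(X_i\) = proportion non-white
\(Y_i \sim \mathrm{Binomial}(q_i, N_i)\), area i
\(\operatorname{logit} q_i = a + b X_i + c_i\)
\(c_i \sim \mathrm{Normal}(0, \tau^2)\)
\(b\) = effect of area ethnicity on probability of voting Labour
\(b \neq \beta\) → ecological bias, where \(\beta\) is the individual-level effect of ethnicity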
Ecological bias
Bias in ecological studies can be caused by:
Confounding
confounders can be area-level (between-area) or individual-level (within-area).
→ include control variables and/or random effects in model
Non-linear covariate-outcome relationship, combined with
within-area variability of covariate
No bias if covariate is constant in area (contextual effect)
Bias increases as within-area variability increases
…unless models are refined to account for this hidden
variability
Alleviating ecological bias
Alleviate bias associated with within-area covariate variability
Obtain information on within-area distribution fi(x) of
covariates, e.g. from individual-level data
Use this to form well-specified model for ecological data
by integrating (averaging) the underlying individual-level
model
\(Y_i \sim \mathrm{Binomial}(q_i, N_i); \quad q_i = \int p_{ij}(x)\, f_i(x)\, dx\)
\(q_i\) is average group-level probability (of voting Labour)
\(p_{ij}(x)\) is individual-level probability given covariates x
\(f_i(x)\) is distribution of covariate x within area i
Alleviating ecological bias
Consider single binary covariate x, e.g. white/non-white
\(f_i(x)\) → proportion of individuals with \(x = 1\) in each area
Individual-level model
\(p_{ij}\) = probability of voting Labour
\(\log p_{ij} = \alpha + \beta x_{ij}\)
(log link assumed for simplicity)
→ \(p_{ij} = e^{\alpha}\) if person j is white (\(x_{ij} = 0\))
\(p_{ij} = e^{\alpha + \beta}\) if person j is non-white (\(x_{ij} = 1\))
Integrated group-level model
\(X_i\) = proportion non-white in area i (mean of \(x_{ij}\))
\(q_i\) = average probability (proportion) voting Labour in area i
\(q_i = \sum_j p_{ij} / N_i = e^{\alpha}(1 - X_i) + e^{\alpha + \beta} X_i\)
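The log-link example above can be checked numerically. The sketch below uses made-up values for the individual-level intercept and ethnicity effect (`alpha` and `beta` in the code, chosen so that whites vote Labour with probability 0.3 and non-whites with probability 0.6). It compares the slope a naive ecological model of log area proportion on proportion non-white would pick up with the value recovered by inverting the integrated model. The function names are hypothetical.

```python
import math

# Illustrative individual-level parameters (assumptions, not estimates).
alpha = math.log(0.3)   # whites: probability 0.3
beta = math.log(2.0)    # non-whites: probability 0.6


def area_probability(X_i):
    """Integrated model: q_i = e^alpha (1 - X_i) + e^(alpha + beta) X_i."""
    return math.exp(alpha) * (1.0 - X_i) + math.exp(alpha + beta) * X_i


def naive_slope(X_low, X_high):
    """Slope of log q_i against X_i between two areas, i.e. what a naive
    ecological model log q_i = a + b X_i would fit through them."""
    rise = math.log(area_probability(X_high)) - math.log(area_probability(X_low))
    return rise / (X_high - X_low)


def integrated_beta(X_low, X_high):
    """Recover beta by inverting the integrated model, which is linear in X_i."""
    q_low, q_high = area_probability(X_low), area_probability(X_high)
    slope = (q_high - q_low) / (X_high - X_low)   # equals e^(alpha+beta) - e^alpha
    intercept = q_low - slope * X_low             # equals e^alpha
    return math.log((intercept + slope) / intercept)


print("true beta:          ", round(beta, 3))
print("naive slope, 0-33%: ", round(naive_slope(0.0, 0.33), 3))
print("integrated estimate:", round(integrated_beta(0.0, 0.33), 3))
```

With these illustrative values, the naive slope over areas that are 0% to 33% non-white is larger than the true individual-level effect, while the integrated model recovers it. The example is noise-free, so it shows only the bias from model form, not sampling variability.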
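Standard ecological regression model
[Diagram: graphical model with parameters \(a\), \(b\) and \(\tau^2\), area effect \(c_i\), and area-level \(Y_i\), \(X_i\), \(N_i\) for area i]
\(Y_i \sim \mathrm{Binomial}(q_i, N_i)\), area i
\(\operatorname{logit} q_i = a + b X_i + c_i\)
\(c_i \sim \mathrm{Normal}(0, \tau^2)\)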
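Integrated ecological regression model
[Diagram: graphical model with parameters \(\alpha\), \(\beta\) and \(\sigma^2\), area effect \(\theta_i\), and area-level \(Y_i\), \(X_i\), \(N_i\) for area i]
\(Y_i \sim \mathrm{Binomial}(q_i, N_i)\), area i
\(q_i = \int p_{ij}(x_{ij}, \alpha, \beta, \theta_i)\, f_i(x)\, dx\)
\(\theta_i \sim \mathrm{Normal}(0, \sigma^2)\)
\(\beta\) can be interpreted as individual-level effect of ethnicity on probability of voting Labour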
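For the single binary covariate used here, the integral reduces to a two-term weighted average. The worked special case below assumes the logit link of the individual-level multilevel model and writes \(\operatorname{expit}(z) = 1/(1 + e^{-z})\) for the inverse logit. It is derived from the model definitions rather than taken from the slides.

```latex
\[
q_i \;=\; \sum_{x \in \{0,1\}} p_{ij}(x)\, f_i(x)
    \;=\; (1 - X_i)\,\operatorname{expit}(\alpha + \theta_i)
        \;+\; X_i\,\operatorname{expit}(\alpha + \beta + \theta_i)
\]
```

Here \(X_i\) is the proportion non-white in area i, so \(f_i(1) = X_i\) and \(f_i(0) = 1 - X_i\).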
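Combining individual and aggregate data
[Diagram: the multilevel model for individual data (parameters \(\alpha\), \(\beta\), \(\sigma^2\); area effect \(\theta_i\); person-level \(x_{ij}\), \(y_{ij}\)) shown beside the integrated ecological model (parameters \(\alpha\), \(\beta\), \(\sigma^2\); area effect \(\theta_i\); area-level \(Y_i\), \(X_i\), \(N_i\))]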
Combining individual and aggregate data
Hierarchical Related Regression (HRR) model
Joint likelihood for \(y_{ij}\) and \(Y_i\) depending on shared parameters \(\alpha\), \(\beta\), \(\theta_i\) and \(\sigma^2\)
[Diagram: single graphical model in which \(\alpha\), \(\beta\), \(\sigma^2\) and \(\theta_i\) feed both the person-level \(x_{ij}\), \(y_{ij}\) and the area-level \(Y_i\), \(X_i\), \(N_i\)]
Estimation carried out using R software (maximum likelihood) or WinBUGS (Bayesian)
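The idea of a joint likelihood with shared parameters can be sketched in Python with the standard library alone. This is a simplified illustration, not the authors' R or WinBUGS code: it assumes no area random effects, a logit link, a single binary covariate, and a crude grid search in place of a proper optimiser. The data are made up, and all function names are hypothetical.

```python
import math


def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def individual_loglik(alpha, beta, people):
    """Bernoulli log-likelihood for survey respondents.

    people: list of (x_ij, y_ij), with x_ij = 1 for non-white and
    y_ij = 1 for a Labour vote.
    """
    total = 0.0
    for x, y in people:
        p = expit(alpha + beta * x)
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total


def aggregate_loglik(alpha, beta, areas):
    """Binomial log-likelihood (dropping the constant) for constituencies.

    areas: list of (Y_i, N_i, X_i). The area probability q_i averages the
    individual-level probabilities over the white / non-white split X_i.
    """
    total = 0.0
    for Y, N, X in areas:
        q = (1.0 - X) * expit(alpha) + X * expit(alpha + beta)
        total += Y * math.log(q) + (N - Y) * math.log(1.0 - q)
    return total


def joint_loglik(alpha, beta, people, areas):
    """Both data sources share alpha and beta, so their log-likelihoods add."""
    return individual_loglik(alpha, beta, people) + aggregate_loglik(alpha, beta, areas)


def grid_search(people, areas, step=0.1):
    """Crude maximum-likelihood fit over a grid from -2 to 2 (illustration only)."""
    grid = [round(-2.0 + step * k, 10) for k in range(int(round(4.0 / step)) + 1)]
    return max(((a, b) for a in grid for b in grid),
               key=lambda ab: joint_loglik(ab[0], ab[1], people, areas))


# Made-up data generated from alpha = -0.7, beta = 0.9 (not BES or census data).
true_alpha, true_beta = -0.7, 0.9
people = [(0, 1)] * 33 + [(0, 0)] * 67 + [(1, 1)] * 11 + [(1, 0)] * 9
areas = []
for X in (0.02, 0.10, 0.30, 0.50):
    q = (1.0 - X) * expit(true_alpha) + X * expit(true_alpha + true_beta)
    areas.append((round(10000 * q), 10000, X))

alpha_hat, beta_hat = grid_search(people, areas)
print(alpha_hat, beta_hat)
```

Adding the two log-likelihoods is what lets the large aggregate counts sharpen the estimates that the small individual sample helps to identify. A fuller version would add the area effects and integrate or sample over them.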
Comparison of results from individual and HRR analysis
[Figure: estimated effect of ethnicity on Labour voting (log odds ratio, axis from -1.0 to 1.0) from the individual analysis and the combined (HRR) analysis, for 2001, 2005 and the change between them, with no random effect and with random effect]
Discussion and further work
Conclusions
BES survey estimates a halving of the gap in Labour voting
between whites and non-whites, from 29 to 15 points (share of voters)
Due to small-N for ethnic minorities, not statistically significant
Combined aggregate and individual level HRR analysis
suggests a significant decline in the ethnic voting gap
But if constituency level random effects are allowed for the
change is again statistically insignificant
→ considerable heterogeneity between constituencies
→ suggests other important individual or area predictors
Lack of statistical significance
may reflect data problems (see below)
may be ‘real’ – BES may over-estimate change in Labour share
of ethnic vote (a quota sample reported that 66% of ethnic minority
respondents voted Labour in 2005, compared with 51% in BES)
Substantive Data Limitations
Norm is to consider share of the vote, so it is unfortunate that this
can’t be done using the HRR model
But Labour voting as a share of the electorate is still a valid issue,
and substantive conclusions are likely to be similar.
Ethnic minorities aren’t all the same
Previous research suggests Black voters are more Labour than South Asian voters
Unfortunately there is not enough data or variance (at both levels) to
explore differences between minority groups.
Other sources of ecological bias are likely, due to the absence of
controls for other relevant variables, e.g. socio-economic factors
HRR models can be extended to include additional variables
This requires constituency-level data on the joint distribution of ethnicity
and other relevant variables
Strengths of HRR approach……
Aims to provide individual-level inference using aggregate
data by:
Fitting integrated individual-level model to alleviate one
source of ecological bias
Including samples of individual data to help identify effects
Uses data from all constituencies, not just those in BES
survey
Improves precision of parameter estimates
…..and limitations of HRR approach
Integrated individual-level model relies on large contrasts in
the predictor proportion across areas
Limited variation in % non-white across constituencies:
(median 2.7%, 95th percentile 33%; only 9 constituencies
in 2005 had non-white majority)
Our estimates may not be completely free from ecological
bias (Jackson et al, 2006)
Estimation of ethnicity effect strongly confounded with
area random effects
Further Work
Further analysis will consider fuller model specifications with
ethnic contextual effects and individual and aggregate level
control variables
Also intend to investigate inclusion of other sources of
individual-level data, such as opinion polls, in HRR models
References
Fisher S, Key J, Best N, Richardson S. Ethnic dealignment? Combining individual
and aggregate data to improve estimates of ethnic voting in Britain in 2001 and
2005. Paper in preparation.
Jackson C, Best N and Richardson S. (2008) Studying place effects on health by
synthesising individual and area-level outcomes. Social Science and Medicine,
67:1995-2006
Jackson C, Best N and Richardson S. (2008) Hierarchical related regression for
combining aggregate and individual data in studies of socio-economic disease
risk factors. J Royal Statistical Society Series A: Statistics in Society
171(1):159-178
Jackson C, Best N and Richardson S. (2006) Improving ecological inference using
individual-level data. Statistics in Medicine, 25(12):2136-2159
Papers available from
www.bias-project.org.uk
Validated turnout based on BES data
Year  Group      N     % Voting  Std. error  95% CI
2001  White      2572  72        1.1         (70, 74)
2001  Non-White  97    66        5.7         (54, 77)
2005  White      3294  75        1.0         (73, 77)
2005  Non-White  137   73        4.5         (64, 81)
Simulation Study
[Figure: estimated effect of exposure on outcome (log OR, axis from -0.5 to 1.5) compared with the true log OR, for individual data, area data (ecological model) and area data plus a sample of 10 individuals (eco + ind model), across four scenarios for the range of % exposed: 0-25% (100 areas), 0-50% (100 areas), 0-100% (100 areas) and 0-25% (25 areas)]