measurement - World Bank
+
Michael J. Gilligan, New York University
Lab Experiments for Measurement in Program Evaluation
+
The Task
Government/NGO/CBO programs wish to change participants’ attitudes and beliefs in particular ways
Typically these programs coach participants in the ‘right’ set of attitudes and beliefs
Examples
Pro-social behaviors: contributions to public goods, trust, tolerance, non-violence and so on
Attitudes and behaviors toward marginalized groups: women, minorities, particular ethnic groups
These programs would like to be able to measure whether their efforts have been successful
+ The Problem
Randomized control trials (RCTs) are essential for making causal statements about the effects of the program
But randomized control trials are not a solution to the measurement problem; indeed they are a hindrance to it
RCT programmers only operate with ‘treated’ populations, so only treated populations receive coaching on the ‘right’ responses
In RCTs, the very thing that ensures unbiasedness with respect to subject pools (balance) introduces bias in measurement
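To make the measurement problem concrete, here is a minimal sketch in Python. It assumes, purely for illustration, that the program has no true effect on attitudes and that coaching shifts treated respondents' survey answers upward by a fixed amount. The function names and all numbers are hypothetical and are not taken from any study.

```python
def mean(values):
    return sum(values) / len(values)

def survey_responses(true_attitudes, coaching_shift=0.0):
    """Reported answer = true attitude + coaching shift (additive shift is an assumption)."""
    return [a + coaching_shift for a in true_attitudes]

def difference_in_means(treated, control):
    return mean(treated) - mean(control)

# Hypothetical data: randomization balanced the two arms and the program changed nothing.
true_treated = [2.0, 3.0, 3.0, 4.0]
true_control = [2.0, 3.0, 3.0, 4.0]

# Only the treated arm was coached on the 'right' answers.
reported_treated = survey_responses(true_treated, coaching_shift=1.0)
reported_control = survey_responses(true_control)

true_effect = difference_in_means(true_treated, true_control)
estimated_effect = difference_in_means(reported_treated, reported_control)
print(true_effect, estimated_effect)
```

Under these assumptions the survey-based estimate equals the coaching shift even though the true effect is zero: balance across arms does nothing to remove a bias that enters through the measurement itself.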
+
Social Capital & Pro-social Attitudes
+ Definition
[S]ocial networks and the norms of reciprocity and trustworthiness that arise from them. … [S]ocial capital is closely related to ... “civic virtue.” The difference is … civic virtue is more powerful when embedded in a dense network of reciprocal social relations. A society of many virtuous but isolated individuals is not necessarily rich in social capital (Putnam 2000).
+
We are interested in measuring:
Altruism
Trust
Trustworthiness
Willingness to contribute to public goods
The social networks that (purportedly) support these behaviors
+ Implications for Development
Trust: crucial for cost-effective self-enforcement of contracts
Compliance with social norms: nonviolence, compromise, fairness
Contributions to public goods: essential for economic efficiency
Respect for legitimate sources of authority
+ A Few Findings (among many)
Putnam (1993) shows that local governments in Italy are more efficient where there is greater civic engagement.
Knack and Keefer (1997) demonstrate that increases in country-level trust lead to large increases in the country’s economic growth.
La Porta et al. (1997) establish a strong positive link between trust and judicial efficiency and a strong negative link between trust and corruption.
+ Implications
The World Bank and other international actors have many programs to foster social capital and pro-sociality
Community-based disarmament, demobilization and reintegration (DDR)
Community-driven development programs
A focus on local capacity in development efforts
“Local ownership” of development programs to foster sustainability
+
Measuring Social Capital and Social Norms
These are very difficult concepts to measure
In many cases they are not observed directly
Indicators differ greatly across different cultures
People are often unwilling to reveal behavior that is not pro-social
+ Traditional survey measures
“Generally speaking, would you say that most people can be trusted or that you can’t be too careful in dealing with people?” (World Values Survey)
“Would you be willing to contribute a day of free time to …?”
“How difficult do you think it would be for your community to reach agreement on …?”
“In the last three months have you contributed time or money to a community-based organization?”
“Did you vote in the last election?”
+
Bias concerns with surveys
Programmers coach respondents in the ‘right’ answers to these types of questions
They do not operate in control communities at all, so respondents there may not even know the ‘right’ answers
+ Observational Measures
Number of people who voted in the last election
Number of people who show up to clean up a public park
Contributions to a community fund
+ The measures have great external (real world) validity but …
Are we measuring social attitudes or leadership strength?
… or intimidation?
… or corruption?
Example: Voter turnout in the Soviet Union was routinely above 98 percent
‘Good’ outcomes may be caused by the exact opposite of good institutions and pro-social attitudes
+ Structured Observational Measures
‘Structured Community Activities’ (Casey, Glennerster and Miguel)
Funds collected in matching-grant scheme
Decision making over allocating salt or batteries
Allocation of tarpaulin
Tuungane Project, Congo (Humphreys, Sanchez de la Sierra and van der Windt 2013)
Participation in matching funds for a public good
Allocation of a $100 ‘windfall’
Participation in a community meeting
+ Structured Observational Measures
Structured and therefore more comparable to each other
Have great external validity …
but we still cannot disentangle individual factors (attitudes) from community-wide factors (leadership, institutions)
+ Lab-in-the-Field Activities
Observing behavior in a controlled laboratory setting
All social pressures, political and institutional effects, etc., are removed by design of the experiment
We observe only people’s responses to the incentives that we (the experimenters) offer them
We are able to disentangle attitudes from community-wide factors
+ Loss in External Validity
Community-wide factors (leadership, institutional efficiency) are excluded from the lab so we cannot obtain measures of them
Thus lab activities are best combined with the other measurement methods
+ Behavioral games
Three important games are:
Altruism game
Trust game
Public goods game
Our main interest is in the altruism, trust and public goods games, but we also need to conduct the other games to control for risk attitudes, patience and altruism
+ Game Instruction
+
+ Altruism Activity
Subjects were given a sum of money
Nepal: 40 NPR in 5 NPR notes
Sudan: 3 pounds in half-pound coins
Cambodia: 16,000 KHR in 4,000 KHR notes
Subjects decide how much they want to contribute to a local needy family
The identity of the family is not revealed
+
+ Trust/Trustworthiness Activity
Subjects are randomly assigned to one of two roles: sender or receiver (we use neutral names in the field)
Both types are given an initial endowment of money
Senders decide how much of their endowment to send to the receiver
We triple that amount and give it to the receiver
The receiver decides how much of this total to return to the sender
All players and types are anonymous
Nash: send zero, return zero
Social optimum: send full endowment, return whatever is necessary to support trusting behavior
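The payoff arithmetic of the trust activity can be sketched as follows. This is an illustration under stated assumptions rather than the study's protocol: the receiver is assumed to be able to return any amount up to the tripled transfer, the 40-unit endowment is borrowed from the Nepal altruism activity only as an example, and `trust_game` and `TrustOutcome` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TrustOutcome:
    sender_payoff: float
    receiver_payoff: float

def trust_game(endowment, sent, returned, multiplier=3):
    """Payoffs when both roles start with `endowment` and the transfer is multiplied."""
    if not 0 <= sent <= endowment:
        raise ValueError("sent must be between 0 and the endowment")
    transferred = multiplier * sent
    # Assumption: the receiver can return at most the multiplied transfer.
    if not 0 <= returned <= transferred:
        raise ValueError("returned must be between 0 and the multiplied transfer")
    return TrustOutcome(
        sender_payoff=endowment - sent + returned,
        receiver_payoff=endowment + transferred - returned,
    )

# Nash prediction: send zero, return zero.
nash = trust_game(endowment=40, sent=0, returned=0)

# Full trust: send everything; here the receiver returns enough to equalize payoffs.
full_trust = trust_game(endowment=40, sent=40, returned=80)

print(nash, full_trust)
```

With these illustrative numbers, the joint payoff is 80 under the Nash prediction and 160 when the full endowment is sent, which is why sending is socially efficient even though a purely self-interested receiver would return nothing.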
+
+ Public Goods Game
All subjects play simultaneously
Each player is given two cards, one with an “X” and one blank
For each “X” card turned in in the first round all players receive an amount of money, say 4 NPR
Turning in an “X” card in the second round earns the player that turned it in a larger amount, say 20 NPR
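A short sketch of the incentive structure, under one reading of the rules above: turning in the “X” card in the first round is treated as contributing to the group (every player earns 4 NPR per contributed card), while holding it for the second round is treated as keeping a private payment of 20 NPR. That reading, the group size of 10, and the function name are assumptions made for illustration.

```python
def public_goods_payoffs(contributed, group_return=4, private_return=20):
    """Return each player's payoff given a list of booleans (True = contributed the X card)."""
    total_contributed = sum(1 for c in contributed if c)
    return [
        group_return * total_contributed + (0 if c else private_return)
        for c in contributed
    ]

n_players = 10  # hypothetical group size

everyone_contributes = public_goods_payoffs([True] * n_players)
nobody_contributes = public_goods_payoffs([False] * n_players)
one_free_rider = public_goods_payoffs([False] + [True] * (n_players - 1))

print(everyone_contributes[0], nobody_contributes[0], one_free_rider[0], one_free_rider[1])
```

Whatever the others do, a player gains 20 NPR by keeping the card and gives up only the 4 NPR share of their own contribution, so keeping is individually tempting. With these example amounts, universal contribution beats universal non-contribution only when the group has more than five players, since each player then earns 4 NPR times the group size instead of 20 NPR.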
+
Attitudes Toward Marginalized Groups
+ Examples
Many programs are interested in improving the status of marginalized groups, especially women
Governments/NGOs/CBOs are often interested in easing (often violent) ethnic rivalries, especially in post-conflict settings
+ Same Problem
RCT programmers only operate with ‘treated’ populations, so only treated populations receive coaching on the ‘right’ responses
In RCTs, the very thing that ensures unbiasedness with respect to subject pools (balance) introduces bias in measurement
+ A Variety of Options
Standard games (altruism, trust, public goods etc.) can be used to measure attitudes toward ‘out-groups’
Bracic 2013: attitudes toward Roma in the former Yugoslavia
Observing behavior of deliberation, cooperation and teamwork among mixed groups
Karpowitz and Mendelberg 2014: deliberation in mixed groups of men and women
+ Observing group behavior
Bales Interaction Process Analysis
Participants are given a task that requires a group decision or cooperation
Record interactions according to a specific set of criteria to code whatever the researcher is interested in measuring (respect, hostility, etc.)
The trick
Not cuing participants that this is a study of in-group/out-group interaction
Incentivizing participants to act according to beliefs about the out-group
+ Example: Attitudes toward Gender and Ethnicity in the Liberian National Police (LNP)
The government of Liberia adopted an explicit 30% quota for women in the LNP
We did NOT conduct an RCT but we were interested in testing some of the assumptions of the gender program
+ Program proponents claimed that more women would produce a variety of benefits
More consensual decision making
Greater sensitivity to gendered crimes
Decades of social psychology findings suggest that women would not participate fully in group deliberations.
+ The program had been underway for several years so officers knew the attitudes toward female officers that they were supposed to have
Thus a survey would not have been a convincing measurement strategy
We had groups of six officers complete team tasks and randomized the number of female officers in each group
We observed team members’ interactions to see whether:
men reacted differently in groups with more women
groups with more women deliberated more consensually and were more likely to see crime as gendered
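One way to randomize the number of women per group can be sketched as below. This is not the study's actual assignment procedure: the function name, the allowed compositions (0, 2, 4 or 6 women), the group size and the officer labels are all hypothetical values chosen for illustration.

```python
import random

def assign_groups(women, men, group_size, women_counts, seed=0):
    """Form groups whose number of women is drawn at random from `women_counts`."""
    if any(k < 0 or k > group_size for k in women_counts):
        raise ValueError("each women count must be between 0 and group_size")
    rng = random.Random(seed)  # seeded so the assignment is reproducible
    women, men = list(women), list(men)
    rng.shuffle(women)
    rng.shuffle(men)
    groups = []
    while True:
        # Keep only compositions that the remaining pools can still fill.
        feasible = [k for k in women_counts
                    if k <= len(women) and group_size - k <= len(men)]
        if not feasible:
            break
        k = rng.choice(feasible)
        members = [women.pop() for _ in range(k)]
        members += [men.pop() for _ in range(group_size - k)]
        groups.append({"n_women": k, "members": members})
    return groups

# Hypothetical officer pools.
women = [f"W{i}" for i in range(12)]
men = [f"M{i}" for i in range(24)]
groups = assign_groups(women, men, group_size=6, women_counts=(0, 2, 4, 6))
for g in groups:
    print(g["n_women"], g["members"])
```

Because the composition is drawn by chance rather than chosen by the officers or their commanders, differences in behavior across groups can be attributed to the number of women present. Note that this sketch restricts draws to compositions the remaining pools can fill, so late groups are not drawn from the full set; a real design would fix the number of groups per composition in advance.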
+ Findings
Female officers were not, in general, more likely to see a gendered crime, but more competent women were
Groups with more women members were not more likely to see a gendered crime
Groups with more women were not more consensual
Backlash effect: Men in majority-female groups were significantly more aggressive.
+ Conclusion
Programming by its very nature coaches beneficiaries in giving the types of survey responses the program would like to hear
Randomization exacerbates this problem
Behavioral measures are appealing but:
Measures with high external validity can make it hard to disentangle mechanisms at the individual and community level
Fine-tuning individual incentives correctly gets at attitudes even when subjects are cued to the ‘right’ answer: monetary reward will induce people to act on actually held beliefs rather than the ‘socially correct’ ones
Lab-in-the-field activities address both of these issues and provide an important tool for measuring the social effects of programs, at some loss of external validity