measurement - World Bank

+
Michael J. Gilligan, New York University
Lab Experiments for Measurement in Program Evaluation
+
The Task
• Government/NGO/CBO programs wish to change participants’ attitudes and beliefs in particular ways
• Typically these programs coach participants in the ‘right’ set of attitudes and beliefs
• Examples
  • Pro-social behaviors: contributions to public goods, trust, tolerance, non-violence and so on
  • Attitudes and behaviors toward marginalized groups: women, minorities, particular ethnic groups
• These programs would like to be able to measure whether their efforts have been successful
+ The Problem
• Randomized control trials are essential for making causal statements about the effects of the program
• But randomized control trials are not a solution to the measurement problem; indeed, they are a hindrance to it
• RCT programmers only operate with ‘treated’ populations, so only treated populations receive coaching on the ‘right’ responses
• RCTs, the very thing that ensures unbiasedness with respect to subject pools (balance), introduce bias in measurement
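
A minimal sketch of the measurement problem above, under loudly stated assumptions: the 1-5 attitude scale, the made-up responses, and the names `survey_difference` and `coaching_shift` are hypothetical and do not come from any study. It assumes coaching adds a constant amount to what treated respondents report without changing what they believe.

```python
from statistics import mean

def survey_difference(treated_true, control_true, coaching_shift):
    """Treated-minus-control difference in mean *reported* attitudes.

    Assumption (hypothetical): coaching adds a constant `coaching_shift`
    to what treated respondents report; control respondents, who were
    never coached, report their true attitudes.
    """
    treated_reported = [a + coaching_shift for a in treated_true]
    return mean(treated_reported) - mean(control_true)

# Made-up attitudes on a 1-5 scale. Randomization has balanced the two
# groups, and the program has had no true effect on attitudes.
treated_true = [2, 3, 3, 4]
control_true = [2, 3, 3, 4]

true_effect = mean(treated_true) - mean(control_true)
measured = survey_difference(treated_true, control_true, coaching_shift=1)
print(true_effect, measured)
```

In this toy case the groups are perfectly balanced, yet the survey comparison reports a positive effect that is entirely the coaching shift.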
+
Social Capital & Pro-social Attitudes
+ Definition
[S]ocial networks and the norms of reciprocity and trustworthiness that arise from them. … [S]ocial capital is closely related to … “civic virtue.” The difference is … civic virtue is more powerful when embedded in a dense network of reciprocal social relations. A society of many virtuous but isolated individuals is not necessarily rich in social capital (Putnam 2000).
+
We are interested in measuring:
• Altruism
• Trust
• Trustworthiness
• Willingness to contribute to public goods
• The social networks that (purportedly) support these behaviors
+ Implications for Development
• Trust: crucial for cost-effective self-enforcement of contracts
• Compliance with social norms: nonviolence, compromise, fairness
• Contributions to public goods: essential for economic efficiency
• Respect for legitimate sources of authority
+ A Few Findings (among many)
• Putnam (1993) shows that local governments in Italy are more efficient where there is greater civic engagement.
• Knack and Keefer (1997) demonstrate that increases in country-level trust lead to large increases in the country’s economic growth.
• La Porta et al. (1997) establish a strong positive link between trust and judicial efficiency and a strong negative link between trust and corruption.
+ Implications
• The World Bank and other international actors have many programs to foster social capital and pro-sociality
  • Community-based DDR
  • Community-driven development programs
  • A focus on local capacity in development efforts
  • “Local ownership” of development programs to foster sustainability
+
Measuring Social Capital and Social Norms
• These are very difficult concepts to measure
  • In many cases they are not observed directly
  • Indicators differ greatly across different cultures
  • People are often unwilling to reveal behavior that is not pro-social
+ Traditional survey measures
• “Generally speaking, would you say that most people can be trusted or that you can’t be too careful in dealing with people?” (World Values Survey)
• “Would you be willing to contribute a day of free time to … ?”
• “How difficult do you think it would be for your community to reach agreement on …?”
• “In the last three months have you contributed time or money to a community-based organization?”
• “Did you vote in the last election?”
+
Bias concerns with surveys
• Programmers coach respondents in the ‘right’ answers to these types of questions
• They do not operate in control communities at all, so respondents there may not even know the ‘right’ answers
+ Observational Measures
• Number of people who voted in the last election
• Number of people who show up to clean up a public park
• Contributions to a community fund
+ The measures have great external (real world) validity but …
• Are we measuring social attitudes or leadership strength?
  • … or intimidation?
  • … or corruption?
• Example: Voter turnout in the Soviet Union was routinely above 98 percent
• ‘Good’ outcomes may be caused by the exact opposite of good institutions and pro-social attitudes
+ Structured Observational Measures
• ‘Structured Community Activities’ (Casey, Glennerster and Miguel)
  • Funds collected in matching-grant scheme
  • Decision making over allocating salt or batteries
  • Allocation of tarpaulin
• Tuungane Project, Congo (Humphreys, Sanchez de la Sierra and van der Windt 2013)
  • Participation in matching funds for a public good
  • Allocation of a $100 ‘windfall’
  • Participation in a community meeting
+ Structured Observational Measures
• Structured and therefore more comparable to each other
• Have great external validity …
  • but we still cannot disentangle individual factors (attitudes) from community-wide factors (leadership, institutions)
+ Lab-in-the-Field Activities
• Observing behavior in a controlled laboratory setting
• All social pressures, political institutional effects, etc., are removed by design of the experiment
• We observe only people’s responses to the incentives that we (the experimenters) offer them
• We are able to disentangle attitudes from community-wide factors
+ Loss in External Validity
• Community-wide factors (leadership, institutional efficiency) are excluded from the lab, so we cannot obtain measures of them
• Thus lab activities are best combined with the other measurement methods
+ Behavioral games
• Three important games are:
  • Altruism game
  • Trust game
  • Public goods game
• Our main interest is in the altruism, trust and public goods games, but we also need to conduct the other games to control for risk attitudes, patience and altruism
+ Game Instruction
+
+ Altruism Activity
• Subjects were given a sum of money
  • Nepal: 40 NPR in 5 NPR notes
  • Sudan: 3 pounds in half-pound coins
  • Cambodia: 16,000 KHR in 4,000 KHR notes
• Subjects decide how much they want to contribute to a local needy family
• The identity of the family is not revealed
+
+ Trust/Trustworthiness Activity
• Subjects are randomly assigned to one of two roles: sender or receiver (we use neutral names in the field)
• Both types are given an initial endowment of money
• Senders decide how much of their endowment to send to the receiver
• We triple that amount and give it to the receiver
• The receiver decides how much of this total to return to the sender
• All players and types are anonymous
• Nash: send zero, return zero
• Social optimum: send full endowment, return whatever is necessary to support trusting behavior
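
The payoffs in the activity above can be sketched as follows. This is an illustration under assumptions, not the field protocol: the function name `trust_game_payoffs`, the 40 NPR endowment, and the reading that the receiver returns money out of the tripled transfer only (and not out of their own endowment) are all hypothetical.

```python
def trust_game_payoffs(endowment, sent, returned, multiplier=3):
    """Return (sender_payoff, receiver_payoff) for one anonymous pairing.

    Assumptions (not stated on the slide): both roles get the same
    endowment, and the receiver returns money out of the tripled
    transfer only.
    """
    if not 0 <= sent <= endowment:
        raise ValueError("sender can send between 0 and the endowment")
    transfer = multiplier * sent  # the experimenter triples the amount sent
    if not 0 <= returned <= transfer:
        raise ValueError("receiver can return between 0 and the transfer")
    sender_payoff = endowment - sent + returned
    receiver_payoff = endowment + transfer - returned
    return sender_payoff, receiver_payoff

# Nash prediction: send zero, return zero.
nash = trust_game_payoffs(endowment=40, sent=0, returned=0)

# Trusting play: send everything; the receiver returns half the transfer.
trusting = trust_game_payoffs(endowment=40, sent=40, returned=60)
print(nash, trusting)
```

With these made-up numbers, both players end up better off under trusting play than under the Nash prediction, which is why the amount sent is read as trust and the amount returned as trustworthiness.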
+
+ Public Goods Game
• All subjects play simultaneously
• Each player is given two cards, one with an “X” and one blank
• For each “X” card turned in in the first round, all players receive an amount of money, say 4 NPR
• Turning in an “X” card in the second round earns the player that turned it in a larger amount, say 20 NPR
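
A minimal sketch of the payoffs in this game, using the 4 NPR and 20 NPR amounts from the slide. The function name `public_goods_payoffs`, the group size of six, and the assumption that each player turns in the “X” card in exactly one of the two rounds are hypothetical.

```python
def public_goods_payoffs(contributes, public_amount=4, private_amount=20):
    """Payoff per player, given who turns in the "X" card in round one.

    `contributes[i]` is True if player i turns in "X" in the first round
    (the public choice) and False if they hold it for the second round
    (the private choice). Assumes each player uses "X" exactly once.
    """
    n_public = sum(contributes)  # every first-round "X" pays all players
    return [
        public_amount * n_public + (0 if c else private_amount)
        for c in contributes
    ]

# Hypothetical group of six players.
all_keep = public_goods_payoffs([False] * 6)
all_give = public_goods_payoffs([True] * 6)
one_free_rider = public_goods_payoffs([True] * 5 + [False])
print(all_keep, all_give, one_free_rider)
```

In a group this size, everyone does better when all contribute than when none do, yet a single player always gains by holding the “X” card back. That tension is what makes first-round contributions a measure of willingness to contribute to public goods.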
+
Attitudes Toward Marginalized Groups
+ Examples
• Many programs are interested in improving the status of marginalized groups, especially women
• Governments/NGOs/CBOs are often interested in easing (often violent) ethnic rivalries, especially in post-conflict settings
+ Same Problem
• RCT programmers only operate with ‘treated’ populations, so only treated populations receive coaching on the ‘right’ responses
• RCTs, the very thing that ensures unbiasedness with respect to subject pools (balance), introduce bias in measurement
+ A Variety of Options
• Standard games (altruism, trust, public goods etc.) can be used to measure attitudes toward ‘out groups’
  • Bracic 2013: attitudes toward Roma in the former Yugoslavia
• Observing behavior of deliberation, cooperation and teamwork among mixed groups
  • Karpowitz and Mendelberg 2014: deliberation in mixed groups of men and women
+ Observing group behavior
• Bales Interaction Process Analysis
• Participants are given a task that requires a group decision or cooperation
• Record interactions according to a specific set of criteria to code whatever the researcher is interested in measuring (respect, hostility, etc.)
• The trick
  • Not cuing participants that this is a study of in-group/out-group interaction
  • Incentivizing participants to act according to beliefs about the out-group
+ Example: Attitudes toward Gender and Ethnicity in the Liberian National Police (LNP)
• The government of Liberia adopted an explicit 30% quota for women in the LNP
• We did NOT conduct an RCT, but we were interested in testing some of the assumptions of the gender program
+ Program proponents claimed that more women would produce a variety of benefits
• More consensual decision making
• Greater sensitivity to gendered crimes
• Decades of social psychology findings suggest that women would not participate fully in group deliberations.
+ The program had been underway for several years, so officers knew the attitudes toward female officers that they were supposed to have
• Thus a survey would not have been a convincing measurement strategy
• We had groups of six officers complete team tasks and randomized the number of female officers in each group
• We observed team members’ behavior to see if
  • Men reacted differently in groups with more women
  • Groups with more women deliberated more consensually and were more likely to see crime as gendered
+ Findings
• Female officers were not, in general, more likely to see a gendered crime, but more competent women were
• Groups with more women members were not more likely to see a gendered crime
• Groups with more women were not more consensual
• Backlash effect: Men in majority-female groups were significantly more aggressive.
+ Conclusion
• Programming by its very nature coaches beneficiaries in giving the types of survey responses the program would like to hear
• Randomization exacerbates this problem
• Behavioral measures are appealing but:
  • Measures with high external validity can make it hard to disentangle mechanisms at the individual and community level
  • Fine-tuning individual incentives correctly gets at attitudes even when subjects are cued to the ‘right’ answer: monetary reward will induce people to act on actually held beliefs rather than the ‘socially correct’ ones
• Lab-in-the-field activities address both of these issues and provide an important tool for measuring the social effects of programs, at some loss of external validity