Defining and measuring fairness of districting plans


Some things to talk about
• Social and political polarization
• A cool dynamic network simulation (which we haven’t done yet)
• Statistical cutoffs and p-values (work of Wald, Berger, …)
• Survey weighting and poststratification
Studying social and political polarization
Andrew Gelman
Departments of Statistics and Political Science, Columbia University
7 Feb 2009
Also: Tian Zheng, Thomas DiPrete, Julien Teitler, Jiehua Chen, Tyler McCormick,
Rozlyn Redd, Juli Simon Thomas, Delia Baldassarri, David Park, Yu-Sung Su,
Matt Salganik, Duncan Watts, Sharad Goel
Studying social and political polarization
• Questions from sociology
• Questions from political science
• Sources of data
• Statistical challenges
Questions from sociology
• The “degree distribution”
• Characteristics of “the social network”
• Homophily
• Quantifying segregation
• Knowing and trusting
Questions from political science
• Polarization of Democrats and Republicans
• Polarization of political discourse
• How are people swayed by news media, talk radio, each other, …
• Geographic polarization
• Polarization and the perception of polarization
Sources of data
• Complete data on small social networks (schools, monks, …)
• Very sparse data on large social networks (Framingham, …)
• Complete data on other networks (scientific coauthors, …)
• Other network datasets (email, Facebook, …)
• From random sample surveys
• Questions about close contacts (GSS 1985/2004, NES 2000)
• Questions about acquaintances (“How many X’s do you know?”)
Statistical challenges: Misconceptions of others
• Examples
• Name
• Disease status
• Sexual preference
• Political leanings
• Challenge/opportunity: attributed and perceived attributes
• Appearance vs. reality
• How large is the “footprint” of a group?
Statistical challenges: Learning about small and large groups
• 1500 respondents × 750 acquaintances ≈ 1 million people reached
• Potential to learn about small groups
• Potential to learn about people you can’t interview
• Difficulty with large groups
• For example, “How many Democrats do you know?”
• The number known is too high to estimate quickly
• Potential solution: look at subnetworks
• “Cube model” (individuals x groups x subnetworks)
• Need main effects and two-way interactions
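The arithmetic above is the appeal of the “How many X’s do you know?” design: each answer reports on many people the survey never contacts. The classical network scale-up estimator is one simple way to turn such counts into estimates. The sketch below is that textbook estimator, not the model used in this work, and it assumes acquaintances are a random draw from the population, which is exactly what the slides on network structure and recall bias question. Function names and example numbers are hypothetical.

```python
def scale_up_degree(counts, known_sizes, total_pop):
    """Estimate one respondent's number of acquaintances (degree).

    counts[k]: answer to "How many people in group k do you know?"
    known_sizes[k]: population size N_k of reference group k
    total_pop: total population size N
    """
    # If acquaintances were a random draw from the population, each
    # y_k / d would be close to N_k / N; pooling over groups gives:
    return total_pop * sum(counts) / sum(known_sizes)


def scale_up_group_size(counts_unknown, degrees, total_pop):
    """Estimate the size of a group that cannot be sampled directly."""
    # Pool over respondents: sum_i y_iu / sum_i d_i is about N_u / N.
    return total_pop * sum(counts_unknown) / sum(degrees)


# Hypothetical numbers: two respondents, two reference groups.
degrees = [scale_up_degree(r, [100_000, 200_000], 300_000_000)
           for r in ([2, 1], [4, 2])]
print(degrees)  # [3000.0, 6000.0]
print(scale_up_group_size([3, 6], degrees, 300_000_000))  # 300000.0
```

Pooling over groups and respondents keeps the estimates stable when individual counts are small, which is why small groups are informative; for large groups such as Democrats the counts themselves become hard to report, as noted above.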
Statistical challenges: Network structure
• Social network is patterned
• Sex, age, ethnicity, SES, location
• Names, occupations, attitudes
• Correct for non-uniform patterns by using a mix of names
• Estimate non-uniform patterns using a conditional probability matrix for ages
• Overdispersion to model unexplained variation
• Can’t do much with triangles, 4-cycles, etc.
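One common way to represent the overdispersion mentioned above is a gamma-Poisson mixture, i.e. a negative binomial count whose variance is a multiple ω of its mean. The sketch below simulates such counts with Python’s standard library. It assumes the mean μ for a person-group pair is already given (for example, as a product of a person effect and a group effect); the helper names `poisson` and `overdispersed_count` are illustrative, not from the original.

```python
import math
import random


def poisson(lam, rng):
    """Poisson draw by Knuth's multiplication method (cost grows with lam)."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def overdispersed_count(mu, omega, rng):
    """Count with mean mu and variance omega * mu, for omega > 1.

    lambda ~ Gamma(shape=mu/(omega-1), scale=omega-1) has mean mu and
    variance mu*(omega-1); mixing a Poisson over it gives a negative
    binomial with variance mu + mu*(omega-1) = omega*mu.
    """
    lam = rng.gammavariate(mu / (omega - 1), omega - 1)
    return poisson(lam, rng)


rng = random.Random(1)
draws = [overdispersed_count(4.0, 3.0, rng) for _ in range(20000)]
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / (len(draws) - 1)
print(mean, var / mean)  # roughly 4 and 3; a plain Poisson gives a ratio near 1
```

The gamma layer stands in for the unexplained variation: two respondents with the same expected count can still differ a lot in how many members of a group they actually know.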
Statistical challenges: Recall bias
• Some people are easier to recall than others
• David, Olga, Sharad
• For some sets of names, this can be quantified: Nicole/Christine/Michael
• Sliding definitions
• Who are your friends?
• Estimates of the average number known range from 300 to 750 to …
• Estimates of the average number trusted range from 1.5 to 15 to 150
Statistical challenges: Returning to the social science questions
• Polarization as political segregation in the social network
• Comparing polarization to perceived polarization
• Answering conjectures such as: People in big cities know more people but trust fewer people
• Getting geography back in the picture
Forming Voting Blocs and Coalitions as a Prisoner's Dilemma: A Possible Theoretical Explanation for Political Instability
Dynamic network model for political coalitions
• Mathematics of coalitions
• Forming a coalition helps the subgroup (or they wouldn’t do it)
• But it hurts the general population (negative externality)
• Coalitions are inherently unstable
• Coalitions of coalitions
• Opportunistic acts of secession, poaching, and dissolution
• The simulation I want to do:
• Set up a political setting: “agents” with attributes and locations
• Payoff function for agents
• Locally optimal moves
• Scheduling
• Implementation
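The simulation outlined above is not specified in detail, so the following is only a toy version under invented assumptions. Agents sit at positions in [0, 1]; each member of a bloc of two or more gains `gain`, pays `dist_cost` times its distance from the bloc’s centroid, and every agent pays an externality proportional to the number of agents in blocs. Moves are locally optimal (each agent picks the bloc to join, or secession, that maximizes its own payoff) and scheduling is random sequential order. With `ext_cost > gain`, joining is individually rational but collectively harmful, which is the prisoner’s dilemma structure in the title. All function names and parameter values are hypothetical.

```python
import random


def payoff(i, labels, x, gain=1.0, dist_cost=0.1, ext_cost=2.0):
    """Payoff to agent i in a hypothetical coalition model.

    labels[j] is the bloc id of agent j; x[j] is its position in [0, 1].
    """
    n = len(labels)
    sizes = {}
    for lab in labels:
        sizes[lab] = sizes.get(lab, 0) + 1
    # Number of agents currently in blocs of size >= 2.
    in_blocs = sum(s for s in sizes.values() if s >= 2)
    own = 0.0
    if sizes[labels[i]] >= 2:
        members = [j for j in range(n) if labels[j] == labels[i]]
        centroid = sum(x[j] for j in members) / len(members)
        own = gain - dist_cost * abs(x[i] - centroid)
    # Negative externality: everyone pays for the blocs that exist.
    return own - ext_cost * in_blocs / n


def welfare(labels, x, gain=1.0, dist_cost=0.1, ext_cost=2.0):
    """Sum of all agents' payoffs."""
    return sum(payoff(i, labels, x, gain, dist_cost, ext_cost)
               for i in range(len(x)))


def best_response_dynamics(x, gain=1.0, dist_cost=0.1, ext_cost=2.0,
                           max_sweeps=50, seed=0):
    """Agents take locally optimal moves in random order until none moves."""
    rng = random.Random(seed)
    n = len(x)
    labels = list(range(n))  # start with every agent on its own
    for _ in range(max_sweeps):
        moved = False
        order = list(range(n))
        rng.shuffle(order)  # scheduling: random sequential updates
        for i in order:
            best_label = labels[i]
            best_val = payoff(i, labels, x, gain, dist_cost, ext_cost)
            # Options: join (poach into) any other bloc, or secede alone.
            options = (set(labels) - {labels[i]}) | {max(labels) + 1}
            for lab in sorted(options):
                trial = labels[:]
                trial[i] = lab
                val = payoff(i, trial, x, gain, dist_cost, ext_cost)
                if val > best_val + 1e-12:
                    best_label, best_val = lab, val
            if best_label != labels[i]:
                labels[i] = best_label
                moved = True
        if not moved:
            break
    return labels


rng = random.Random(42)
positions = [rng.random() for _ in range(20)]
blocs = best_response_dynamics(positions)
print(welfare(list(range(20)), positions))  # 0.0: no blocs, no externality
print(welfare(blocs, positions))            # negative once blocs have formed
```

In this toy model every bloc member nets `gain` minus its share of the externality, so total welfare falls below the all-singletons baseline of zero as soon as blocs form. A size-dependent payoff (for example, a prize for the largest bloc) would be one way to make poaching and secession drive the instability the slides describe.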
Statistical cutoffs and p-values
Setting a cutoff for selecting patterns for further study
• Old problem in statistics: Neyman, Wald, Berger, …
• Also of interest to biologists!
• Some different goals:
• Finding patterns that are “statistically significant”
• Classifying into those to study further, and those to set aside
• Mathematical framework: distribution of a “score”
• Solution depends upon:
• Distribution of the score among “uninteresting” cases
• Distribution of the score among “interesting” cases
• Number of uninteresting and interesting cases
• Cost of follow-up of uninteresting cases
• Cost of follow-up of interesting cases
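The ingredients listed above are enough to pick a cutoff by expected value rather than by a fixed significance level. The sketch below assumes normally distributed scores with unit variance (mean 0 for uninteresting cases, `mu1` for interesting ones), and it adds one quantity the slide does not list: a hypothetical payoff `value1` for each interesting case that gets followed up. Function and parameter names are illustrative.

```python
import math
from statistics import NormalDist


def expected_net_value(t, n0, n1, mu1, cost0, cost1, value1):
    """Expected net value of following up every case whose score exceeds t."""
    null = NormalDist(0.0, 1.0)  # scores of "uninteresting" cases (assumed)
    alt = NormalDist(mu1, 1.0)   # scores of "interesting" cases (assumed)
    follow0 = n0 * (1.0 - null.cdf(t))  # expected uninteresting follow-ups
    follow1 = n1 * (1.0 - alt.cdf(t))   # expected interesting follow-ups
    return follow1 * (value1 - cost1) - follow0 * cost0


def optimal_cutoff(n0, n1, mu1, cost0, cost1, value1):
    """Closed-form cutoff for equal-variance normal scores (mu1 > 0).

    Set the marginal gain n1*f1(t)*(value1 - cost1) equal to the marginal
    loss n0*f0(t)*cost0, using f1(t)/f0(t) = exp(mu1*t - mu1**2/2).
    """
    if value1 <= cost1:
        raise ValueError("following up interesting cases must be worthwhile")
    ratio = (n0 * cost0) / (n1 * (value1 - cost1))
    return mu1 / 2.0 + math.log(ratio) / mu1


# Hypothetical setting: 10,000 uninteresting cases, 100 interesting ones.
t_star = optimal_cutoff(10_000, 100, 3.0, 1.0, 1.0, 20.0)
print(t_star)  # about 2.05
```

Unlike a fixed threshold such as z = 1.96, this cutoff moves with the base rates and costs: fewer interesting cases push it up, and cheaper follow-up of uninteresting cases pulls it down.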
Survey weighting and poststratification
Survey weighting and poststratification
• General framework for adjusting for differences between sample and population
• Population estimate = average over poststratification cells, weighted by each cell’s population size
• You might have to model:
• The survey response
• Size of poststratification cells
• Probabilities of selection
• Respondent-driven sampling example:
• Cells determined by “gregariousness” and “distance”
• Could approximate correlations using clustering
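The basic poststratification estimate is a weighted average of cell means, with weights proportional to population cell sizes. The sketch below assumes every cell has at least one respondent and that the population counts are known; when cells are empty or sparse, the cell means have to come from a model of the survey response instead, as the slide notes. The function name and data are hypothetical.

```python
def poststratified_mean(sample, pop_counts):
    """Poststratified estimate of a population mean.

    sample: list of (cell, y) pairs from respondents
    pop_counts: dict mapping cell -> population size N_j
    """
    sums, counts = {}, {}
    for cell, y in sample:
        sums[cell] = sums.get(cell, 0.0) + y
        counts[cell] = counts.get(cell, 0) + 1
    missing = set(pop_counts) - set(counts)
    if missing:
        # Raw cell means are unavailable; a model would be needed here.
        raise ValueError(f"no respondents in cells {sorted(missing)}")
    total = sum(pop_counts.values())
    # Weight each cell mean by that cell's share of the population.
    return sum(pop_counts[c] * sums[c] / counts[c] for c in pop_counts) / total


# Hypothetical data: younger people are under-represented in the sample.
sample = ([("young", 1), ("young", 1), ("young", 0), ("young", 1)]
          + [("old", 0)] * 7 + [("old", 1)])
pop = {"young": 600, "old": 400}
print(poststratified_mean(sample, pop))  # 0.5 (raw sample mean is 4/12)
```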