No Slide Title
Download
Report
Transcript No Slide Title
Social Network Data
Outline
1. Foundations: Data basics & Software (review/catch up)
2. Collecting Data
1. Relations
2. Level of analysis
3. Sources
3. Data Accuracy
1. How accurate is it?
2. What can we do about it
3. Effect on measurement
4. Network Ethics
1. Data collection
1. Informed Consent
2. Deductive Disclosure
3. Building a frame (closed-answer sets)
4. Illicit Relations
5. Novel Compilations of “extant” data
2. Data Use
1. Risks of identifying R’s (and named non-R’s) positions
2. Action in response to nets (military, police, firm)
Foundations
Methods
From pictures to matrices
b
b
d
a
c
e
Undirected, binary
a
b
1
a
b 1
c
1
d
e
c
d
1
1
c
e
a
1
a
b 1
c
1
d
e
1
1
a
e
Directed, binary
1
1
d
b
1
c
1
d
e
1
1
1
Foundations
Methods
From matrices to lists
a
a
b 1
c
d
e
b
1
c
d
e
1
1
1
1
1
1
1
1
Adjacency List
ab
bac
cbde
dce
ecd
Arc List
ab
ba
bc
cb
cd
ce
dc
de
ec
ed
Foundations
Basic Measures
Basic Measures & A little graph theory
For greater detail, see:
http://www.analytictech.com/networks/graphtheory.htm
Volume
The first measure of interest is the simple volume of
relations in the system, known as density, which is the
average relational value over all dyads. Under most
circumstances, it is calculated as:
D=
X
N ( N - 1)
Foundations
Basic Measures
Basic Measures & A little graph theory
Volume
At the individual level, volume is the number of relations, sent
or received, equal to the row and column sums of the adjacency
matrix.
a
a
b 1
c
d
e
b
1
c
1
1
d
e
1
1
1
Node In-Degree Out-Degree
a
1
1
b
2
1
c
1
3
d
2
0
e
1
2
Mean:
7/5
7/5
Foundations
Data
Basic Measures & A little graph theory
Reachability
Indirect connections are what make networks systems. One
actor can reach another if there is a path in the graph
connecting them.
b
a
a
d
c
b
e
f
c
f
d
e
Foundations
Basic Matrix Operations
One of the key advantages to storing networks as matrices is that we can use all of the
tools from linear algebra on the socio-matrix.
Some of the basics matrix manipulations that we use are as follows:
1)
Definition
A matrix is any rectangular array of numbers. We refer to the matrix dimension as
the number of rows and columns
a b c d e
a 0 1 0 0 0
b 1 0 0 0 0
c 0 1 0 1 1
d 0
e 0
0 0 0 0
0 1 1 0
(5 x 5)
W B
1 0
1 0
0 1
0 1
0 1
1 0
(5x2)
Age
13
10
7
8
16
11
(5x1)
Foundations
Basic Matrix Operations
Matrix operations work on the elements of the matrix in particular ways. To do so,
the matrices must be conformable. That means the sizes allow the operation.
For addition (+), subtraction (-), or elementwise multiplication (#), both matrices
must have the same number of rows and columns. For these operations, the matrix
value is the operation applied to the corresponding cell values.
1 3
A= 4 7
2 5
2 3
B= 7 1
0 4
3 6
A+B = 11 8
2 9
3 9
Multiplication by a scalar: 3A = 12 21
6 15
A-B =
-1 0
-3 6
2 1
2 9
A#B = 28 7
0 20
Foundations
Basic Matrix Operations
The transpose (` or T) of a matrix reverses the row and column
dimensions.
Atij=Aji
So a M x N matrix becomes an N x M matrix.
a b
c d
e f
T
=
a c e
b d f
Foundations
Basic Matrix Operations
The matrix multiplication (x) of two matrices involves all elements of the
matrix, and will often result in a matrix of new dimensions. In general, to
be conformable, the inner dimension of both matrices must match. So:
A3x2 x B2x3 = C3 x 3
But
A3x3 x B2x3 is not defined
Substantively, adding ‘names’ to the dimensions will help us keep track of
what the resulting multiplications mean:
So multiplying (send x receive)x (send x receive) = (send x receive), giving
us the two-step distances (the sender’s recipient's receivers).
Foundations
Basic Matrix Operations
The multiplication of two matrices Amxn and Bnxq results in Cmxq
n
C mq = amk bkq
k =1
a b
c d
e f
g h
a b
c d
e f
g h i
j k l
(3x2)
(2x3)
=
=
ae+bg
ce+dg
ag+bj
cg+dj
eg+fg
af+bh
cf+dh
ah+bk
ch+dk
eh+fk
(3x3)
ai+bl
ci+dl
ei+fl
Foundations
Basic Matrix Operations
The powers (square, cube, etc) of a matrix are just the matrix times
itself that many times.
A2 = AA or A3 = AAA
We often use matrix multiplication to find types of people one is
tied to, since the ‘1’ in the adjacency matrix effectively captures
just the people each row is connected to.
(Preview: This is also how we do compound relations: Mother x Brother “Uncle”)
Foundations
Data
Basic Measures & A little graph theory
Reachability
The distance from one actor to another is the shortest path
between them, known as the geodesic distance. If there is
at least one path connecting every pair of actors in the
graph, the graph is connected and is called a component.
Two paths are independent if they only have the two endnodes in common. If a graph has two independent paths
between every pair, it is biconnected, and called a
bicomponent. Similarly for three paths, four, etc.
Foundations
Data
Basic Measures & A little graph theory
Calculate reachability through matrix multiplication.
(see p.162 of W&F)
0
1
0
0
0
1
e
d
c
b
a
f
1
0
1
0
0
0
X
0 0
1 0
0 1
1 0
1 1
1 0
0
0
1
1
0
0
1
0
1
0
0
0
2
0
2
0
0
0
0
2
0
1
1
2
X2
2 0
0 1
4 1
1 2
1 1
0 1
Distance
. 1 2 0
1 . 1 2
2 1 . 1
0 2 1 .
0 2 1 1
1 2 1 2
0
1
1
1
2
1
0
2
1
1
.
2
4
0
6
1
1
0
X3
0 2
6 1
2 5
5 2
5 3
6 1
0
2
0
1
1
2
0
4
0
2
2
4
2
1
5
3
2
1
4
0
6
1
1
0
1
2
1
2
2
.
Distance
. 1 2 3 3
1 . 1 2 2
2 1 . 1 1
3 2 1 . 1
3 2 1 1 .
1 2 1 2 2
1
2
1
2
2
.
Foundations
Data
Basic Measures & A little graph theory
Mixing patterns
Matrices make it easy to look at mixing patterns: connections
among types of nodes. Simply multiply an indicator of
category by the adjacency matrix.
e
d
c
b
a
f
0
1
0
0
0
1
1
0
1
0
0
0
X
0 0
1 0
0 1
1 0
1 1
1 0
0
0
1
1
0
0
1
0
1
0
0
0
Race
1 0
1 0
0 1
0 1
0 1
1 0
R G
R 4 2
Race`(X)Race=
G 2 6
X(Race)
2 0
1 1
2 2
0 2
0 2
1 1
Foundations
Data
Basic Measures & A little graph theory
Matrix manipulations allow you to look at direction of ties,
and distinguish symmetric from asymmetric ties.
To transform an asymmetric graph to a symmetric graph, add
it to its transpose.
0
1
0
0
0
1
0
1
0
0
X
0
0
0
0
1
0
0
1
0
1
0
0
1
0
0
0
1
0
0
0
1
0
0
0
0
XT
0 0
1 0
0 0
1 0
1 0
0
0
1
1
0
0
2
0
0
0
2
0
1
0
0
0
1
0
1
2
0
0
1
0
1
0
0
2
1
0
Max Sym
0 1 0 0 0
1 0 1 0 0
0 1 0 1 1
0 0 1 0 1
0 0 1 1 0
MIN Sym
0 1 0 0 0
1 0 0 0 0
0 0 0 0 1
0 0 0 0 0
0 0 1 0 0
Social Network Software
UCINET
•The Standard network analysis program, runs in Windows
•Good for computing measures of network topography for single nets
•Input-Output of data is a special 2-file format, but is now able to read
PAJEK files directly.
•Not optimal for large networks, but much better than it used to be!
•Available from:
Analytic Technologies
Social Network Software
PAJEK
•Program for analyzing and plotting very large networks
•Intuitive windows interface
•Used for most of the real data plots in this presentation
•Started mainly a graphics program, but has expanded to a wide range of
analytic capabilities
•Can link to the R & SPSS statistical package
•Free
•Available from:
Social Network Software
SPAN - Sas Programs for Analyzing Networks (Moody, ongoing)
•is a collection of IML and Macro programs that allow one to:
a) create network data structures from nomination data
b) import/export data to/from the other network programs
c) calculate measures of network pattern and composition
d) analyze network models
•Allows one to work with multiple, large networks
•Easy to move from creating measures to analyzing data
http://www.soc.duke.edu/~jmoody77/span/span.zip
Social Network Software
STATNET
•Program designed to estimate statistical models on networks in R.
Statnet Team
http://csde.washington.edu/statnet/
Other R Resources:
Carter Butts (UC-Irvine, Sociology) – SNA & PermNet
•Program for general network analysis in R
•Does most of what we’ve discussed today…
Social Network Software
STATNET
•Program designed to estimate statistical models on networks in R.
Statnet Team
http://csde.washington.edu/statnet/
Other R Resources:
iGraph
Social Network Software
Other R Resources:
McFarland’s R Labs:
Social Network Data
Collecting: theory concepts
What information do you want to collect? This is ultimately a theory
question – about how you think the social network setting matters.
Some dimensions to this question:
•“actually existing social relations” or “perceived relations”
•“Who do you eat lunch with?” vs. “Who is your friend”
•“Who do you talk to” vs. “who is important in your life”
•Are you more interested in getting the right contacts or the right
type of contacts?
•Dynamism: “Episodic” relations or “typical”/ “long-term” ties?
•Research shows that people have a bias toward naming the
normal – so we include people who are “usually” there – is that
what you want?
•Do you need to be able to distinguish naming flux from structural
dynamics?
Social Network Data
Level of Analysis
What scope of information do you want?
•Boundary Specification: key is what constitutes the “edge” of the
network
Local
“Realist”
(Boundary from actors’
Point of view)
Nominalist
(Boundary from researchers’
point of view)
Global
Everyone connected to
ego in the relevant
manner (all friends, all
(past?) sex partners)
All relations relevant
to social action
(“adolescent peers
network” or “Ruling
Elite” )
Relations defined by a
name-generator,
typically limited in
number (“5 closest
friends”)
Relations within a
particular setting
(“friends in school” or
“votes on the supreme
court”)
Social Network Data
Level of Analysis
Boundary Specification Problem
While students were
given the option to name
friends in the other
school, they rarely do.
As such, the school likely
serves as a strong
substantive boundary
Social Network Data
Level of Analysis
Boundary Specification Problem
Granis: Boundary can be
temporal: effects on characteristics
of the PhD exchange graph.
Social Network Data
Level of Analysis
Network Sampling
1.
The level of analysis implies a perspective on sampling:
1. Local random probability sampling
2. Complete Census
These two are not as dissimilar as they may appear:
a) Local nets imply global connectivity:
a) Every ego-network is a sample from the population-level global network,
and thus should be consistent with a constrained range of global
networks. See Jeff Smith’s work on this.
b) If you have a clustered setting, many alters in a local network may
overlap, making partial connectivity information possible.
c) For attribute mixing (proportion of whites with black friends, etc.), egonetwork data is sufficient to draw population inference
Social Network Data
Level of Analysis
Network Sampling
1.
The level of analysis implies a perspective on sampling:
1. Local random probability sampling
2. Complete Census
These two are not as dissimilar as they may appear:
a) Complete networks never are:
a) Lots of efficiency in data collection is had because you don’t have to ask
ego about alter’s characteristics if you are going to interview alter
anyway. But, taking advantage of this efficiency assumes very high
coverage rates.
Social Network Data
Level of Analysis
Network Sampling
2.
Snowball and “link trace” designs
Link-Tracing Designs
Ego-networks
Complete Census
Basic idea is to use “adaptive sampling” – start with (a) seed node(s), identify
the network partners, and then interview them.
Earliest “snowball” samples are of this type. Most recent work is “respondent
driven sampling. (RDS)”
-- If done systematically, some inference elements are knowable. Else, you
have to try and disentangle the sampling process from the real structure
Social Network Data
Level of Analysis
Snowball Samples:
• Start with a name generator, then any demographic or relational
questions.
• Have a sample strategy, examples include:
• Random Walk designs (Klovdahl)
• Strong tie designs
• All names designs
• Cross Links (Project 90)
• Get contact information from the people named
• If time, as at least some of the people a “network density”
module as you would for any ego-network design
Snowball samples are very effective at providing network context
around focal nodes, RDS tools allow generalizations from these
data.
Social Network Data
Sources
Existing Sources of Social Network Data:
There are lots of network data archived. Check INSNA for a listing. The PAJEK data
page includes a number of exemplars for large-scale networks.
Local Network data:
•
Fairly common, because it is easy to collect from sample surveys.
•
GSS, NHSL, Urban Inequality Surveys, etc.
•
Pay attention to the question asked
•
Key features are (a) number of people named and (b) whether alters are
able to nominate each other or not.
Social Network Data
Sources
Existing Sources of Social Network Data:
Partial network data:
• Much less common, because cost goes up significantly once you
start tracing to contacts.
• Snowball data: start with focal nodes and trace to contacts
• CDC style data on sexual contact tracing
• Limited snowball samples:
• Colorado Springs drug users data
• Geneology data
• Small-world network samples
• Limited Boundary data: select data within a limited bound
• Cross-national trade data
• Friendships within a classroom
• Family support ties
Social Network Data
Sources
Existing Sources of Social Network Data:
Complete network data:
• Significantly less common and never perfect.
• Start by defining a theoretically relevant boundary
• Then identify all relations among nodes within that boundary
• Co-sponsorship patterns among legislators
• Friendships within strongly bounded settings (sororities,
schools)
• Examples:
• Add Health on adolescent friendships
• Hallinan data on within-school friendships
• McFarland’s data on verbal interaction
• Electronic data on citations or coauthorship (see Pajek data
page)
• See INSNA home page for many small-scale networks
Social Network Data
Sources
Existing Sources of Social Network Data:
Complete network data:
• Electronic Trace Data
• Examples:
• Sensor data (PNAS on high-school)
• Cell Phone logs
• Email logs
• Bluetooth devices
• Web traffic cookie data (which sites do you visit)
•
Often complete and non-intrusive; but meaning is still
ambiguous and there are potential ethical issues that run deep.
Social Network Data
Sources - Survey
a)
Network data collection can be time consuming. It is better (I think) to
have breadth over depth. Having detailed information on <50% of the
sample will make it very difficult to draw conclusions about the general
network structure.
b) Question format:
• If you ask people to recall names (an open list format), fatigue will
result in under-reporting
• If you ask people to check off names from a full list, you can often get
over-reporting
c) It is common to limit people to a small number if nominations (~5). This
will bias network measures, but is sometimes the best choice to avoid
fatigue.
d) People answer the question you ask, so be clear in what you ask.
Social Network Data
Sources - Survey
Local Network data:
• When using a survey, common to use an “ego-network module.”
• First part: “Name Generator” question to elicit a list of names
• Second part: Working through the list of names to get
information about each person named
• Third part: asking about relations among each person named.
GSS Name Generator:
“From time to time, most people discuss important matters with other people.
Looking back over the last six months -- who are the people with whom you
discussed matters important to you? Just tell me their first names or initials.”
Why this question?
•Only time for one question
•Normative pressure and influence likely travels through strong ties
•Similar to ‘best friend’ or other strong tie generators
•Note there are significant ambiguities with this name generator
Social Network Data
Sources - Survey
Electronic Small World name generator:
Social Network Data
Sources - Survey
Local Network data:
The second part usually asks a series of questions about each person
GSS Example:
“Is (NAME) Asian, Black, Hispanic, White or something else?”
ESWP example:
Will generate N x (number of attributes) questions to the survey
Social Network Data
Sources - Survey
Local Network data:
The third part usually asks about relations among the alters. Do this
by looping over all possible combinations. If you are asking about a
symmetric relation, then you can limit your questions to the n(n-1)/2
cells of one triangle of the adjacency matrix:
1 2 3 4 5
1
2
3
4
5
GSS: Please think about the relations between the people you just mentioned. Some of them may
be total strangers in the sense that they wouldn't recognize each other if they bumped into each
other on the street. Others may be especially close, as close or closer to each other as they are to
you. First, think about NAME 1 and NAME 2. A. Are NAME 1 and NAME 2 total strangers? B.
ARe they especially close? PROBE: As close or closer to eahc other as they are to you?
Social Network Data
Sources - Survey
Local Network data:
The third part usually asks about relations among the alters. Do this
by looping over all possible combinations. If you are asking about a
symmetric relation, then you can limit your questions to the n(n-1)/2
cells of one triangle of the adjacency matrix:
Social Network Data
Sources - Survey
Snowball Samples:
Social Network Data
Sources - Survey
Complete Network data
• Data collection is concerned with all relations within a specified
boundary.
•
Requires sampling every actor in the population of interest (all
kids in the class, all nations in the alliance system, etc.)
•
The network survey itself can be much shorter, because you are
getting information from each person (so ego does not report on
alters).
•
Two general formats:
• Recall surveys (“Name all of your best friends”)
• Check-list formats: Give people a list of names, have them
check off those with whom they have relations.
Social Network Data
Sources - Survey
Complete network surveys require
a process that lets you link answers
to respondents.
•You cannot have anonymous
surveys.
•Recall:
•Need Id numbers & a
roster to link, or handcode names to find
matches
•Checklists
•Need a roster for people
to check through
Social Network Data
Sources - Survey
Some important points that should (?) be obvious:
•You cannot have anonymous complete network surveys (you have to match
names).
•With a recall format:
•Get some way to uniquely identify nodes
• ID Numbers from a roster link, phone numbers, etc.
• If you have to hand-code, compare to a roster or w. an expert
informant
•Need to have uniform nomination opportunity:
•If A is asked to nominate B, but B is never given the opportunity…standard
measures don’t work.
•Checklists
•Need a roster for people to check through
Social Network Data
Sources - Archive
We often have information on links among people or organizations from
archival records. Examples:
•Citation or Acknowledgements in Science Networks
•Co-membership in boards of directors
•See as examples: Olimoney.net or theyrule.org both projects that use
electronic tools to “scrape” the web for data on companies or campaign
contributions.
•http://dirtyenergymoney.com/view.php
•http://www.theyrule.net/
Social Network Data
Name Generator Effects
Key Questions – How does the question affect responses?
“Cloning Headless Frogs” & “Trivial Topics & Rich Ties”
1)
What do people talk about?
2)
Given the heterogeneity of the topics discussed, is there a foundation from
which one could use the GSS data to describe anything meaningful about core
discussion networks?
Social Network Data
Accuracy & Missing Data – Cloning Headless Frogs
Key Questions:
1)
What do people talk about? & Who did they talk to?
Note that the topic was heavily dependent on the questionnaire order. In this survey, it
was the first question.
Social Network Data
Accuracy & Missing Data – Cloning Headless Frogs
Social Network Data
Accuracy & Missing Data – Cloning Headless Frogs
talks about what with who?
Connections are significant cells from table 5.
Social Network Data
Accuracy & Missing Data – Rich Ties
talks about what with who?
Brashears
Social Network Data
Accuracy & Missing Data – Rich Ties
talks about what with who?
Brashears
Social Network Data
Accuracy & Missing Data
1)
Why do so many people not report talking about anything with anybody?
•
44% report nobody to talk to
•
More likely to be without spouses, unemployed and non-white
•
56% report nothing important to talk about.
Other key bits on data collection:
2) Structure of the survey matters: lines given one of the best predictors of
number of relations observed
3) Reciprocity is not expected, particularly w. limited response formats.
- Better to symmetrize based on “or”
Social Network Data
Effects of missing data
Whatever method is used, data will always be incomplete. What are the
implications for analysis?
Example 1. Ego is a matchable person in the School
Out
Un
Ego
M
M
True Network
Out
Un
Ego
M
M
Out
Un
M
M
M
M
Observed Network
Social Network Data
Effects of missing data
Example 2. Ego is not on the school roster
M
M
Un
Un
M
M
M
M
M
M
M
Un
Un
Un
True Network
M
Observed Network
Social Network Data
Effects of missing data
Example 3:
Node population: 2-step neighborhood of Actor X
Relational population: Any connection among all nodes
1-step
2-step
3-step
F
1.1
1.2
1.3
1.4
1.5
2.1
2.2
2.3
2.4
2.5
2.6
2.7
2.8
3.1
3.2
3.3
F 1 2 3 4 5 1 2 3 4 5 6 7 8 1 2 3
Full (0)
Full
Full (0)
F
Full
Full
Full (0)
F
(0)
Full
Full
UK
F
(0)
Full (0)
Unknown
UK
Social Network Data
Effects of missing data
Example 4
Node population: 2-step neighborhood of Actor X
Relational population: Trace, plus All connections among 1-step contacts
1-step
2-step
3-step
F
1.1
1.2
1.3
1.4
1.5
2.1
2.2
2.3
2.4
2.5
2.6
2.7
2.8
3.1
3.2
3.3
F 1 2 3 4 5 1 2 3 4 5 6 7 8 1 2 3
Full (0)
Full
Full (0)
Full
Full (0)
F
Full
F
(0)
Full
Unknown
UK
F
(0)
Full (0)
Unknown
UK
Social Network Data
Effects of missing data
Example 5.
Node population: 2-step neighborhood of Actor X
Relational population: Only tracing contacts
1-step
2-step
3-step
F
1.1
1.2
1.3
1.4
1.5
2.1
2.2
2.3
2.4
2.5
2.6
2.7
2.8
3.1
3.2
3.3
F 1 2 3 4 5 1 2 3 4 5 6 7 8 1 2 3
Full (0)
Full
Full (0)
Full
Full (0)
F
Unknown
F
(0)
Full
Unknown
UK
F
(0)
Full (0)
Unknown
UK
Social Network Data
Effects of missing data
Example 6
Node population: 2-step neighborhood from 3 focal actors
Relational population: All relations among actors
Focal
1-Step
2-Step
3-Step
Focal
Full
Full
Full (0)
Full (0)
1-Step
Full
Full
Full
Full (0)
2-Step
Full
(0)
Full
Full
UK
3-Step
Full
(0)
Full (0)
Unknown
UK
Social Network Data
Effects of missing data
Example 7.
Node population: 1-step neighborhood from 3 focal actors
Relational population: Only relations from focal nodes
Focal
1-Step
2-Step
3-Step
Focal
Full
Full
Full (0)
Full (0)
1-Step
Full
Unknown
Unknown
Full (0)
2-Step
Full
(0)
Unknown
Unknown
UK
3-Step
Full
(0)
Full (0)
Unknown
UK
Social Network Data
Effects of missing data on measures
Smith & Moody, 2014
Identify the practical effect of missing data as a measurement error problem:
induce error and evaluate effect.
Randomly select nodes to delete, remove their edges & recalculate statistics of
interest.
Social Network Data
Effects of missing data on measures
Smith & Moody, 2014
Social Network Data
Smith & Moody, 2014
Effects of missing data on measures
Centrality
Social Network Data
Smith & Moody, 2014
Effects of missing data on measures
Homophily
Social Network Data
Effects of missing data on measures
What to do about missing data?
Easy:
• Do nothing. If associated error is small ignore it. This is the default, not
particularly satisfying.
Harder: Impute ties
• If the relation has known constraints, use those (symmetry, for example)
• If there is a clear association, you can use those to impute values.
• If imputing and can use a randomization routine, do so (akin to multiple
imputation routines)
• All ad hoc.
Hardest:
• Model missingness with ERGM/Latent-network models.
• Build a model for tie formation on observed, include structural missing &
impute. Handcock & Gile have new routines for this.
• Computationally intensive…but analytically not difficult.
Social Network Data
Ethics – Data Collection
Goal: To gain key social insights in a manner than helps without hurting.
- Responsibility to respondents
- Fair, honest, safe treatment in return for participation.
- Safety has typically relied on some combination of:
a) Informed consent (they know what they are in for)
b) Anonymity / Confidentiality
- Responsibility to other network scientists
- Our works should not jeopardize other’s ability to work
Key problems:
- Need to link respondents means anonymity cannot be complete
- Some people named may not be respondents – thus have not given consent
- Position within the network may create social hardship
Social Network Data
Ethics – Data Collection
Informed Consent
- Respondents have a right to refuse to participate if they feel the work is unethical,
burdensome, dangerous, or just plain don’t want to play.
What standing do “secondary” respondents have?
What dangers?
- Imagine link-tracing from a mistress to a spouse….
Social Network Data
Ethics – Data Collection
Anonymity & Confidentiality
Can work without names (in Ego-network) survey. But if the setting is highly clustered,
you may still be able to identify through “deductive disclosure” -
Deductive Disclosure Risks:
Social Network Data
Ethics – Data Collection
Start with: 536
White, Male, 10th Graders in Two parent Households:
Who are Jewish:
10
And Have No Siblings:
1
Start with: 484
White, Male, 7th Graders in Two parent Households:
Who Have Ever Been Held Back A Grade in School:
87
And Play Basketball:
5
And Smoke:
1
Deductive Disclosure Risks:
Social Network Data
Ethics – Data Collection
Start with: 87
Black, Female, 12th Graders in Two parent Households:
Who have Never been Held Back:
77
And Smoke Regularly:
5
And Have 2 siblings
1
And are Catholic
1
Deductive Disclosure Risks:
Social Network Data
Ethics – Data Collection
Start with: 98
Black, Female, 7th Graders in One parent Households:
Who Are Baptist:
41
And have no Siblings:
9
And Play Baskettball:
1
And have one Sibling:
13
And Smoke:
1
And have > one Sibling:
19
And are Born in April:
1
Deductive Disclosure Risks:
Social Network Data
Ethics – Data Collection
This same feature can allow for anonymous interviewing of sensitive partners:
use simple information to find implicit matches
0.5
0.5
Long Duration
Proportion of Group
Proportion of Group
Married
0
0
0.4
0.48
0.56
0.5
0.64
0.72
0.8
0.88
0.96
0.4
0.48
0.5
0.64
0.72
0.8
0.88
0.96
Same Sex (known to be false)
Proportion of Group
Proportion of Group
Short Duration
0.56
0
0
0.4
0.48
0.56
0.64
0.72
0.8
0.88
Pair Mean Match Score
0.96
Observed Distribution
0.4
0.48
0.56
0.64
0.72
0.8
0.88
Pair Mean Match Score
Random Distribution
0.96
Social Network Data
Ethics – Data Collection
Other Data Collection issues:
-- Building a roster? You may need people’s permission to be on the roster.
-- Binding limits of past research promises (if the interviewer knows, does that violate
a “we will not tell anyone your answer” clause?
-- Illicit or Illegal relations. If you find evidence of a crime in your snowball, do you
have to report? (think age of sex partners vs selling/buying drugs)
-- “non-invasive” data collection – bluetooth device readers, email logs, web-click or
purchase / marketing data?
Social Network Data
Ethics – Data use
How the data are ultimately used
is a key issue for the analyst.
Consider this diagram.
What role does the social
network analyst have in
deconstructing terrorist
networks? Criminal Cartels?
Gangs?
Covert connectivity is a hallmark
of many illicit relations, how do
we fit into that work?
Social Network Data
Ethics – Data use
What about performace in a
firm? Burt shows that one’s
position in the network is key to
having “good ideas” – if true,
you could imagine linking
position in the network to
evaluations and promotions…