lecture17 - personal homepage server for the University of
School of Information
University of Michigan
SI 614
Community structure in networks
Lecture 17
Outline
One-mode networks and cohesive subgroups
measures of cohesion
types of subgroups
Affiliation networks
team assembly
Why care about group cohesion?
opinion formation and uniformity
if each node adopts the opinion of the majority of its neighbors, it is possible to have different opinions in different cohesive subgroups
within a cohesive subgroup – greater uniformity
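The majority rule above can be illustrated with a minimal sketch. The toy graph (two triangles joined by one bridge), the +1/-1 opinion coding, and the function name `majority_step` are assumptions made for illustration, not part of the lecture.

```python
def majority_step(adj, opinion):
    """One synchronous update: each node adopts the majority opinion
    of its neighbors (a tie keeps the current opinion)."""
    new = {}
    for node, nbrs in adj.items():
        votes = sum(opinion[n] for n in nbrs)  # opinions are +1 or -1
        new[node] = opinion[node] if votes == 0 else (1 if votes > 0 else -1)
    return new

# Hypothetical toy network: two triangles joined by the bridge c-d.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

opinion = {"a": 1, "b": 1, "c": 1, "d": -1, "e": -1, "f": -1}
for _ in range(10):
    opinion = majority_step(adj, opinion)
print(opinion)
```

In this toy case each triangle keeps its own opinion indefinitely, because every node has more neighbors inside its subgroup than outside it.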
Other reasons to care
Discover communities of practice (more on this next time)
Measure isolation of groups
Threshold processes:
I will adopt an innovation if some number of my contacts do
I will vote for a measure if a fraction of my contacts do
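A threshold process of the fractional kind can be sketched as follows. The function name `spread`, the toy graph, and the specific threshold values are illustrative assumptions only.

```python
def spread(adj, seeds, threshold):
    """Nodes adopt once the fraction of their neighbors that have
    adopted reaches `threshold`; repeat until nothing changes."""
    adopted = set(seeds)
    changed = True
    while changed:
        changed = False
        for node, nbrs in adj.items():
            if node in adopted or not nbrs:
                continue
            if len(nbrs & adopted) / len(nbrs) >= threshold:
                adopted.add(node)
                changed = True
    return adopted

# Hypothetical toy network: two triangles joined by the bridge c-d.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

print(sorted(spread(adj, {"a", "b"}, 0.5)))  # stays inside the first triangle
print(sorted(spread(adj, {"a", "b"}, 0.3)))  # crosses the bridge
```

With a threshold of one half the innovation stays inside the first triangle, while a lower threshold lets it cross the single bridging tie. This is one way cohesive subgroups can block or contain a threshold process.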
What properties indicate cohesion?
mutuality of ties
everybody in the group knows everybody else
closeness or reachability of subgroup members
individuals are separated by at most n hops
frequency of ties among members
everybody in the group has links to at least k others in the group
relative frequency of ties among subgroup members compared to nonmembers
Cliques
Every member of the group has links to every other member
Cliques can overlap
overlapping cliques of size 3
clique of size 4
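As a rough illustration of overlapping cliques, the sketch below enumerates maximal cliques with the basic Bron-Kerbosch recursion. That algorithm is swapped in here for illustration (the lecture itself uses Pajek's fragment search), and the example graph and the function name `maximal_cliques` are hypothetical.

```python
def maximal_cliques(adj):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    found = []

    def expand(r, p, x):
        # r: current clique, p: candidates, x: already-processed nodes
        if not p and not x:
            found.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return found

# Hypothetical graph: a clique of size 4 (1-4) and two overlapping
# cliques of size 3 ({4,5,6} and {5,6,7}).
edges = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4),
         (4, 5), (4, 6), (5, 6), (5, 7), (6, 7)]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

cliques = maximal_cliques(adj)
print(sorted(sorted(c) for c in cliques))
```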
Considerations in using cliques as subgroups
Not robust
one missing link can disqualify a clique
Not interesting
everybody is connected to everybody else
no core-periphery structure
no centrality measures apply
How cliques overlap can be more interesting than the fact that they exist
Pajek
remember from class on motifs:
construct a network that is a clique of the desired size
Nets>Fragment (1 in 2)>Find
a less stingy definition of cohesive subgroups: k-cores
Each node within a group is connected to at least k other nodes in the group
4 core
3 core
Pajek: Net>Partitions>Core>Input,Output,All
Assigns each vertex to the k-core with the largest k that it belongs to
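Outside Pajek, the same assignment can be sketched by repeatedly peeling off the node of smallest remaining degree. This is a simple, unoptimized version for an undirected graph; the function name `core_numbers` and the example graph are assumptions for illustration.

```python
def core_numbers(adj):
    """For each node, the largest k such that the node is in the k-core."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Peel the node with the smallest remaining degree.
        v = min(remaining, key=degree.get)
        k = max(k, degree[v])
        core[v] = k
        remaining.remove(v)
        for u in adj[v]:
            if u in remaining:
                degree[u] -= 1
    return core

# Hypothetical graph: a clique on nodes 1-4 with a tail 4-5-6.
edges = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (4, 5), (5, 6)]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

core = core_numbers(adj)
print(core)
```

The clique members end up in the 3-core, while the two tail nodes only reach the 1-core.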
subgroups based on reachability and diameter
n-cliques
maximal distance between any two nodes in the subgroup is n
2-cliques
theoretical justification
information flow through intermediaries
frequency of in-group ties
Compare # of in-group ties
within-group ties
ties from group to nodes external to the group
Given the number of edges incident on nodes in the group, what is the probability that the observed fraction of them falls within the group?
The smaller the probability, the stronger the cohesion
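One way to put a number on this comparison is sketched below. The null model is an assumption and is not stated in the lecture: each tie incident on the group is treated as an independent trial that lands inside the group with probability (group size - 1) / (network size - 1), and the binomial tail gives the chance of seeing at least the observed number of within-group ties. The function names and toy graph are hypothetical.

```python
from math import comb

def tie_counts(edges, group):
    """Count within-group ties and ties from the group to outsiders."""
    within = sum(1 for u, v in edges if u in group and v in group)
    external = sum(1 for u, v in edges if (u in group) != (v in group))
    return within, external

def binomial_tail(successes, trials, p0):
    """P(X >= successes) for X ~ Binomial(trials, p0)."""
    return sum(comb(trials, k) * p0**k * (1 - p0)**(trials - k)
               for k in range(successes, trials + 1))

# Hypothetical toy network: two triangles joined by the bridge c-d.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
nodes = {n for e in edges for n in e}
group = {"a", "b", "c"}

within, external = tie_counts(edges, group)
p0 = (len(group) - 1) / (len(nodes) - 1)   # assumed null probability
prob = binomial_tail(within, within + external, p0)
print(within, external, round(prob, 4))
```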
considerations with n-cliques
problem
diameter may be greater than n
n-clique may be disconnected (paths go through nodes not in subgroup)
2-clique
diameter = 3
path outside the 2-clique
fix
n-club: maximal subgraph of diameter at most n (e.g. a 2-club has diameter at most 2)
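The difference between the two definitions comes down to whether shortest paths may leave the subgroup. The sketch below checks only the distance condition (not maximality) on a small hypothetical graph in which a 2-clique has diameter 3. The function names and the graph are assumptions for illustration.

```python
from collections import deque
from itertools import combinations

def distances(adj, source, allowed=None):
    """BFS hop counts from source; if `allowed` is given,
    paths may only pass through nodes in that set."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if u not in dist and (allowed is None or u in allowed):
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist

def max_distance(adj, group, inside_only):
    """Largest pairwise distance in `group`, measured in the whole graph
    (n-clique view) or inside the induced subgraph (n-club view)."""
    allowed = set(group) if inside_only else None
    worst = 0
    for u, v in combinations(sorted(group), 2):
        worst = max(worst, distances(adj, u, allowed).get(v, float("inf")))
    return worst

# Hypothetical graph: nodes 4 and 5 are two hops apart only via node 6.
edges = [(1, 2), (1, 3), (2, 3), (2, 4), (3, 5), (4, 6), (5, 6)]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

group = {1, 2, 3, 4, 5}
print(max_distance(adj, group, inside_only=False))  # distance in the full graph
print(max_distance(adj, group, inside_only=True))   # diameter of the subgroup
```

Here the group satisfies the 2-clique distance condition, but its own diameter is 3, so it does not qualify as a 2-club.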
cohesion in directed and weighted networks
something we’ve already learned how to do:
find strongly connected components
keep only a subset of ties before finding connected components
reciprocal ties
edge weight above a threshold
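The filter-then-find-components recipe can be sketched as follows for a weighted, directed citation network. The blog names, citation counts, and the function name `reciprocal_components` are hypothetical. Nodes left without any qualifying tie are simply dropped rather than reported as singletons.

```python
def reciprocal_components(citations, min_each_way):
    """citations maps (source, target) -> count. Keep a pair only if both
    directions reach `min_each_way`, then return connected components."""
    adj = {}
    for (u, v), count in citations.items():
        if u != v and count >= min_each_way \
                and citations.get((v, u), 0) >= min_each_way:
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)

    components, seen = [], set()
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(adj[node] - comp)
        seen |= comp
        components.append(comp)
    return components

# Hypothetical citation counts between five blogs.
citations = {
    ("a", "b"): 7, ("b", "a"): 6,
    ("b", "c"): 5, ("c", "b"): 9,
    ("c", "x"): 12, ("x", "c"): 1,   # not reciprocated strongly enough
    ("x", "y"): 8, ("y", "x"): 5,
}
groups = reciprocal_components(citations, min_each_way=5)
print(sorted(sorted(g) for g in groups))
```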
1 Digbys Blog
2 James Walcott
3 Pandagon
4 blog.johnkerry.com
5 Oliver Willis
6 America Blog
7 Crooked Timber
8 DailyKos
9 American Prospect
10 Eschaton
11 Wonkette
12 TalkLeft
13 Political Wire
14 Talking Points Memo
15 Matthew Yglesias
16 Washington Monthly
17 MyDD
18 Juan Cole
19 Left Coaster
20 Bradford DeLong
21 JawaReport
22 Voka Pundit
23 Roger L Simon
24 Tim Blair
25 Andrew Sullivan
26 Instapundit
27 Blogs for Bush
28 Little Green Footballs
29 Belmont Club
30 Captain’s Quarters
31 Powerline
32 Hugh Hewitt
33 INDC Journal
34 RealClearPolitics
35 Winds of Change
36 Allahpundit
37 Michelle Malkin
38 WizBang
39 Dean’s World
40 Volokh
[Figure: citation networks among blogs 1–40, panels (A), (B), (C)]
Example: political blogs (Aug 29th – Nov 15th, 2004)
A) all citations between A-list blogs in the 2 months preceding the 2004 election
B) citations between A-list blogs with at least 5 citations in both directions
C) edges further limited to those exceeding 25 combined citations
only 15% of the citations bridge communities
Affiliation networks
otherwise known as
membership network
e.g. board of directors
hypernetwork or hypergraph
bipartite graphs
interlocks
m-slices
transform to a one-mode network
weights of edges correspond to number of affiliations in common
m-slice: maximal subnetwork containing the lines with a multiplicity equal to or greater than m
A =
1 1 1 1 0
1 1 1 1 0
1 1 2 2 0
1 1 2 4 1
0 0 0 1 1
[Figure: the 1-slice and the 2-slice of the one-mode network, with line values shown on the edges]
Pajek:
Net>Transform>2Mode to 1-Mode>Include Loops, Multiple Lines
Info>Network>Line Values (to view)
Net>Partitions>Valued Core>First threshold and step
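The one-mode transformation and the m-slice can also be sketched without Pajek. The membership lists below are hypothetical: they were chosen so that the shared-affiliation counts match the off-diagonal entries of the 5 x 5 matrix A on this slide. Loops (the diagonal) are not computed here, and the function names are illustrative.

```python
from itertools import combinations

# Hypothetical affiliations (event -> actors), chosen to reproduce
# the off-diagonal line values of the slide's matrix A.
memberships = {
    "e1": [1, 2, 3, 4],
    "e2": [3, 4],
    "e3": [4, 5],
    "e4": [4],
}

def project(memberships):
    """Line value for each pair of actors = number of shared affiliations."""
    weight = {}
    for members in memberships.values():
        for u, v in combinations(sorted(members), 2):
            weight[(u, v)] = weight.get((u, v), 0) + 1
    return weight

def m_slice(weight, m):
    """Keep only the lines with multiplicity >= m."""
    return {pair: w for pair, w in weight.items() if w >= m}

weights = project(memberships)
print(weights)
print(m_slice(weights, 2))
```

The 1-slice keeps every line, while the 2-slice keeps only the line between actors 3 and 4.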
Scottish firms interlocking directorates
legend:
2-railways
4-electricity
5-domestic products
6-banks
7-insurance companies
8-investment banks
methods used directly on bipartite graphs are rare
Finding bicliques of users accessing documents
An algorithm by Nina Mishra, HP Labs
Documents
Users
Team Assembly Mechanisms Determine Collaboration Network Structure and Team Performance
Roger Guimerà, Brian Uzzi, Jarrett Spiro, Luís A. Nunes Amaral
Science, 2005
[Figure panels: astronomy and astrophysics; social psychology; economics]
Issues in assembling teams
Why assemble a team?
different ideas
different skills
different resources
What spurs innovation?
applying proven innovations from one domain to another
Is diversity (working with new people) always good?
spurs creativity + fresh thinking
but
conflict
miscommunication
loss of the sense of security that comes from working with close collaborators
Parameters in team assembly
1. m, # of team members
2. p, probability of selecting individuals who already belong to the network
3. q, propensity of incumbents to select past collaborators
Two phases
giant component of interconnected collaborators
isolated clusters
creation of a new team
incumbents (people who have already collaborated with someone)
newcomers (people available to participate in new teams)
for each team member, pick an incumbent with probability p (otherwise a newcomer)
if an incumbent is picked, choose a past collaborator with probability q
Time evolution of a collaboration network
newcomer-newcomer collaborations
newcomer-incumbent collaborations
new incumbent-incumbent collaborations
repeat collaborations
after a time t of inactivity, individuals are removed from the network
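The assembly rules above can be turned into a small simulation. This is a simplified sketch rather than the authors' exact procedure: removal after inactivity is omitted, "past collaborator" is taken to mean a past collaborator of anyone already on the team, and the function names `assemble_teams` and `giant_fraction` are hypothetical.

```python
import random

def assemble_teams(n_teams, m, p, q, seed=0):
    """Grow a collaboration network team by team.
    Returns person -> set of collaborators (keys are the incumbents)."""
    rng = random.Random(seed)
    collaborators = {}
    next_id = 0
    for _ in range(n_teams):
        team = []
        while len(team) < m:
            incumbents = [a for a in collaborators if a not in team]
            if incumbents and rng.random() < p:
                # Incumbent slot: with probability q, favor a past
                # collaborator of someone already on the team.
                past = sorted({c for a in team
                               for c in collaborators.get(a, ())} - set(team))
                if past and rng.random() < q:
                    candidate = rng.choice(past)
                else:
                    candidate = rng.choice(incumbents)
            else:
                candidate = next_id          # newcomer
                next_id += 1
            team.append(candidate)
        for a in team:
            collaborators.setdefault(a, set()).update(b for b in team if b != a)
    return collaborators

def giant_fraction(collaborators):
    """S: fraction of nodes in the largest connected component."""
    seen, best = set(), 0
    for start in collaborators:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            v = stack.pop()
            size += 1
            for u in collaborators[v]:
                if u not in seen:
                    seen.add(u)
                    stack.append(u)
        best = max(best, size)
    return best / len(collaborators)

for p in (0.0, 0.3, 0.6, 0.9):
    net = assemble_teams(n_teams=500, m=3, p=p, q=0.5, seed=1)
    print(p, round(giant_fraction(net), 3))
```

With p = 0 every team consists of newcomers, so the network is a set of isolated clusters. Raising p lets incumbents link teams together.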
BMI data
Broadway musical industry
2258 productions from 1877 to 1990
musical shows performed at least once on Broadway
team: composers, writers, choreographers, directors, producers but not actors
Team size increases from 1877 to 1929
the musical as an art form is still evolving
After 1929 team composition stabilizes to include 7 people across six roles: choreographer, composer, director, librettist, lyricist, producer
Collaboration networks
4 fields (with the top journals in each field)
social psychology (7)
economics (9)
ecology (10)
astronomy (4)
impact factor of each journal
ratio between citations and recent citable items published
A= total cites in 1992
B= 1992 cites to articles published in 1990-91 (this is a subset of A)
C= number of articles published in 1990-91
D= B/C = 1992 impact factor
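The arithmetic for D can be written as a one-line helper. The function name `impact_factor` and the example counts are made up for illustration and are not data from the lecture.

```python
def impact_factor(cites_to_recent, recent_items):
    """D = B / C: citations in the census year to articles from the
    two preceding years, divided by the number of those articles."""
    if recent_items <= 0:
        raise ValueError("need at least one citable item")
    return cites_to_recent / recent_items

# Hypothetical journal: B = 300 cites in 1992 to 1990-91 articles,
# C = 120 articles published in 1990-91.
print(impact_factor(300, 120))
```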
size of teams grows over time
degree distributions
data
data generated from a model with the same p and q and sequence of team sizes formed
Predictions for the size of the giant component
higher p means already published individuals are co-authoring – linking the network together and increasing the giant component
S = fraction of network occupied by the giant component
Predictions for the size of the giant component (cont’d)
increasing q can slow the growth of the giant component – co-authoring with previous collaborators does not create new edges
network statistics
Field              teams    individuals  p     q     fR    S (size of giant component)
BMI                2258     4113         0.52  0.77  0.16  0.70
social psychology  16,526   23,029       0.56  0.78  0.22  0.67
economics          14,870   23,236       0.57  0.73  0.22  0.54
ecology            26,888   38,609       0.59  0.76  0.23  0.75
astronomy          30,552   30,192       0.76  0.82  0.39  0.98
what stands out?
what is similar across the networks?
different network topologies
[Figure panels: ecology, economics, astronomy]
main findings
all networks except astronomy are close to the “tipping” point where the giant component emerges
sparse and stringy networks
giant component takes up more than 50% of nodes in each network
impact factor (how good the journal is where the work was published)
p positively correlated with impact factor
going with experienced members is good
q negatively correlated with impact factor
new combinations are more fruitful
ecology, economics, social psychology
S for individual journals positively correlated with impact factor
more isolated clusters in lower-impact journals
ecology, social psychology