CS728 Lecture 5 Stochastic Models of the Web

Download Report

Transcript CS728 Lecture 5 Stochastic Models of the Web

CS728
Lecture 5
Generative Graph Models and
the Web
Importance of Generative Models
Gives insight into the graph formation process:
– Anomaly detection – abnormal behavior,
evolution
– Predictions – predicting future from the past
– Simulations and evaluation of new algorithms
– Graph sampling – many real world graphs like
the web are too large and complex to deal with
– Goal: generating graphs with small world
property, clustering, power-laws, other naturally
occurring structures
Graph Models: Waxman Models
• Used for models of clustering in Internet-like topologies and
networks with long and short edges
• The vertices are distributed at random in a plane.
• An edge is added between each pair of vertices with
probability p.
p(u,v) =  * exp( -d / (*L) ), 0  ,   1.
• L is the maximum distance between any two nodes.
• Increase in alpha increases the number of edges in the graph.
• Increase in beta increases the number of long edges relative
to short edges.
• d is the Euclidean distance from u to v in Waxman-1.
• d is a random number between [0, L] in Waxman-2.
Graph Models: Configuration Model
• Random Graph from given degree
sequence
• Problem: Given a degree sequence,
d1,d2, d3, …., dn generate a random graph
with that degree sequence
• Solution:
Place di stubs onto vertex I
Choose pairs of stubs at random
• Problem: we may construct graphs with loops and
multiedges
• To prevent this there must be enough “absorbing”
residual degree capacity.
• Algorithm:
• Maintain list of nodes sorted by residual degrees d(v)
• Repeat until all nodes have been chosen:
– pick arbitrary vertex v
– add edges from v to d(v) vertices of highest residual
degree
– update residual degrees
To randomize further, we can start with a realization and
repeatedly 2-swap pairs of edges (u,v), (s,t) to (u,t), (s,v)
Works OK, But is there a more ‘natural’ generative model?
Generative Graph models:
Preferential attachment
• Price’s Model [65] : Physics citations –
“cummulative advantage”
• Herb Simon [50’s]: Nobel and Turing Awards,
political scientist “rich get richer” (Pareto)
• Matthew effect / Matilda effect: sociology
• Barabasi and Albert 99: Preferential attachment:
– Add a new node, create d out-links
– Probability of linking a node is proportional to
its current degree
• Simple explanation of power-law degree
distributions
Issues with preferential attachment and
Power-laws
• Barabasi model fixed constant m for out-degree
• Price’s model directed with m mean out-degree
• Probability of adding a new edge is proportional to its
(in) degree k
– problem at the start degree 0
– Price’s model: prop to deg + 1
– Analysis: prob a node has degree k
• pk ~ k-3 (Barabasi model)
• pk ~ k-(2+1/m) power-law with exponent 2-3 (Price)
• Exercise: give pseudocode that generates such a
graph in linear time
Variations on the PA Theme
•
•
•
•
•
Clustering, Small-World and Ageing
Copying Model
Alpha and beta Models
Temporal Evolution
Densification
Graph models: Copying model
• Copying model
• [Kleinberg, Kumar, Raghavan, Rajagopalan and Tomkins, 99]:
– Add a node and choose the number of
edges to add
– Choose a random vertex and “copy” its
links (neighbors)
• Also generates power-law degree
distributions
• Generates communities - clustering
Graph Models: The Alpha Model
Watts (1999)
 model: Add edges to nodes, as
in random graphs, but makes
links more likely when two
nodes have a common friend.
For a range of  values:
Probability of linkage as a function
of number of mutual friends
( is 0 in upper left,
1 in diagonal,
and ∞ in bottom right curves.)
– The world is small (average
path length is short), and
– Groups tend to form (high
clustering coefficient).
Graph Models: The Beta Model
Watts and Strogatz (1998)
“Link Rewiring”
=0
 = 0.125
=1
People know
their neighbors.
People know
their neighbors,
and a few distant people.
People know
others at
random.
Clustered, but
not a “small world”
Clustered and
“small world”
Not clustered,
but “small world”
Graph Models: The Beta Model
First five random links reduce the
average path length of the
network by half, regardless of N!
Both  and  models reproduce
short-path results of random
graphs, but also allow for
clustering.
Small-world phenomena occur at
threshold between order and
chaos.
Clustering coefficient /
Normalized path length
Watts and Strogatz (1998)
Clustering coefficient (C) and average
path length (L) plotted against 
Other Related Work
• Hybrid models: Beta + Waxman on grid
• Huberman and Adamic, 1999: Growth dynamics of the
world wide web
– Argue against Barabasi model for its age dependence
• Kumar, Raghavan, Rajagopalan, Sivakumar and
Tomkins, 1999: Stochastic models for the web graph
• Watts, Dodds, Newman, 2002: Identity and search in
social networks
• Medina, Lakhina, Matta, and Byers, 2001: BRITE: An
Approach to Universal Topology Generation
• …
Statistics
• Statistics of common networks:
N - nodes
K-
D-
C-
degree distance clique
fraction
Actors
225,226 61
Powergrid
4,941
C.elegans 282
3.65
0.79
2.67 18.7
0.08
14
0.28
2.65
Large k =
large c?
Small c =
large d?
Modeling Ageing and Temporal
Evolution
• N(t) … nodes at time t
• E(t) … edges at time t
• Suppose that
N(t+1) = 2 * N(t)
• Q: what is guess for
E(t+1) =? 2 * E(t)
• A: over-doubled?
Temporal Evolution of Graphs
• Densification Power Law
– networks appear denser over time
– the number of edges grows faster than the
number of nodes – average degree is
increasing
or
equivalently
a … densification exponent
Graph Densification
• Densification Power Law
• Densification exponent: 1 ≤ a ≤ 2:
– a=1: linear growth – constant outdegree (assumed in the literature so
far)
– a=2: quadratic growth – clique
Densification – ArXiv citation
graph in Physics
• Citations among
physics papers
E(t)
• 1992:
– 1,293 papers,
2,717 citations
• 2003:
– 29,555 papers,
352,807 citations
• For each month M,
create a graph of all
citations up to month
M
1.69
N(t)
Densification – Patent Citations
• Citations among
patents granted
E(t)
• 1975
– 334,000 nodes
– 676,000 edges
• 1999
– 2.9 million nodes
– 16.5 million edges
• Each year is a
datapoint
1.66
N(t)
Densification – Internet
Autonomous Systems
• Graph of Internet
• 1997
– 3,000 nodes
– 10,000 edges
• 2000
– 6,000 nodes
– 26,000 edges
• One graph per day
E(t)
1.18
N(t)
Evolution of the Diameter
• Prior work on Power Law graphs hints
at Slowly growing diameter:
– diameter ~ O(log N)
– diameter ~ O(log log N)
• What is happening in real data?
• Diameter shrinks over time
– As the network grows the distances
between nodes slowly decrease
Diameter – ArXiv citation graph
• Citations
among physics
papers
• 1992 –2003
• One graph per
year
diameter
time [years]
Diameter – “Patents”
diameter
• Patent citation
network
• 25 years of data
time [years]
Diameter – Autonomous
Systems
diameter
• Graph of
Internet
• One graph per
day
• 1997 – 2000
number of nodes
Next Time: Densification –
Possible Explanations
• Generative models to capture the
Densification Power Law and Shrinking
diameters
• 2 proposed models:
– Community Guided Attachment – obeys
Densification
– Forest Fire model – obeys Densification,
Shrinking diameter (and Power Law degree
distribution)