H. Hubert Chan, Mugizi Robert Rwebangira, “A Random

Download Report

Transcript H. Hubert Chan, Mugizi Robert Rwebangira, “A Random

A Random-Surfer Web-Graph
Model
Mugizi Rwebangira
(Joint work with Avrim Blum & Hubert Chan)
1
The Web as a Graph
Consider the World Wide Web as a graph, with web pages as
nodes and hyperlinks between pages as edges.
links.html
index.html
resume.html
http://cnn.com
2
Studying the Web
Since the Web emerged there has been a lot of interest in:
1. Empirically studying properties of the Web Graph.
2. Modeling the Web Graph mathematically.
Benefits of Generative Models:
1. Simulation – When real data is scarce
2. Extrapolation – How will the graph change?
3. Understanding – Inspire further research on real data
3
Power Law
f(x) ~ g(x) if Limx→∞ f(x)/g(x) = 1
e.g (x+1) ~ (x+2)
The distribution of a random variable X follows a power law if
Prob [X=k] ~ Ck-α
Example: Prob [X=k] = k-2
4
Power Law: Prob [X=k] = k-2
5
Power Law
Prob [X=k] ~ Ck-α
log Prob [X=k] ~ log C –α log k
Prob [X=k] = k-2
log Prob [X=k] = -2 log k
6
Power Law: Log-Log plot
7
Power Law contd.
More general definition:
Prob [X≥k] ~ Ck-α
Particularly useful if X takes on real values.
Sometimes referred to as “heavy tailed” or “scale free.”
8
Power Laws in Degree distribution
Let G be a graph.
Let Xk be the proportion of nodes with degree k in G.
Then if Xk ~ Ck-α
we say that G has power law degree distribution.
9
Properties of the Web Graph
A Power-law degree distribution has been observed
in a wide variety of graphs including citation
networks, social networks, protein-protein
interaction networks and so on.
It has also been observed in the Web Graph.
[Barabási & Albert]
10
Outline
•
•
•
•
•
•
Background/Previous Work
Motivation
Models
Theoretical results
Experimental results
Conclusions
11
Classic Random Graph Models
•
In the G(n,p) random graph model:
1. There are n nodes.
2. There is an edge between any two nodes
with probability p.
•Was proposed by Erdös and Renyi in 1960s.
12
Online G(n,p)
In this model each new node makes k
connections to existing nodes uniformly
at random.
For this talk we will focus on k = 1,
hence the graph will be a tree.
13
Online G(n,p)
T=1
T=2
T=3
½
T=4
½
⅓
⅓
⅓
14
Properties of Online G(n,p)
• E[degree of first node] = 1+ 1/2 +1/3+1/4 +
…1/n = (log n)
• E[max degree] = (log n)
• Xk = Proportion of nodes with degree k
E[Xk] = (½k)
NOT POWER LAWED!!
15
Online G(n,p)
(n=100,000, average of 100 runs)
16
Preferential Attachment
In the Preferential Attachment model, each new
node connects to the existing nodes with a
probability proportional to their degree.
[Barabási & Albert]
17
Preferential Attachment
Degree = in-degree + out-degree
T=1
T=2
T=3
Deg = 1
Deg = 3
¾
¼
T=4
Deg = 4
Deg = 1
2
1
Deg = 1
3
1
6
6
18
Preferential Attachment
E[degree of 1st node] = √n
Preferential Attachment gives a power-law
degree distribution. [Mitzenmacher, Cooper
& Frieze 03, KRRSTU00]
19
Preferential Attachment
20
Other Models
Kumar et. al. proposed the “copying model.”
[KRRSTU00]
Leskovec et. al. propose a “forest fire” model
which has some similarites to this work.
[LKF05]
21
Outline
•
•
•
•
•
•
Background/Previous Work
Motivation
Models
Theoretical results
Experimental results
Conclusions
22
Motivating Questions
Why would a new node connect to nodes of high degree?
-Are high degree nodes more attractive?
-Or are there other explanations?
How does a new node find out what the high degree nodes are?
23
Motivating Questions
Motivating Observation:
•Suppose each page has a small probability p of being interesting.
•Suppose a user does a (undirected) random walk until they
find an interesting page.
•If p is small then this is the same as preferential attachment.
•What about other processes and directed graphs?
24
Outline
•
•
•
•
•
•
Background/Previous Work
Motivation
Models
Theoretical results
Experimental results
Conclusions
25
Directed 1-step Random Surfer, p=.5
T=1
T=2
Start with a single node
with a self-loop.
1. Choose a node uniformly at random
2. With probability p connect
3. With probability (1-p) connect to its neighbor
T=3
(½) (½)+ (½) (½)+ (½) (½)
¾
¼
26
Directed 1-step Random Surfer
It turns out this model is a mixture of connecting to nodes
uniformly at random and preferential attachment.
Has a power-law degree distribution.
But taking one step is not very natural.
What about doing a real random walk?
27
Directed Coin Flipping model
1. 2.Pick
aanode
uniformly
at random.
IfFlip
HEADS
coinconnect
of bias pto current
node,
else walk to neighbor
D
C
NEW NODE
B
A
RANDOM STARTING NODE
1. COIN TOSS: TAIL (at node A)
2. COIN TOSS: TAIL (at node B)
3. COIN TOSS: HEAD (at node C)
28
Directed Coin Flipping model
1. At time 1, we start with a single node with a self-loop.
2. At time t, we choose a node u uniformly at random.
3. We then flip a coin of bias p.
4. If the coin comes up heads, we connect to the current node.
5. Else we walk to a random neighbor and go to step 3.
“each page has equal probability p of being interesting to us”
29
Outline
•
•
•
•
•
•
Background/Previous Work
Motivation
Models
Theoretical results
Experimental results
Conclusions
30
Is Directed Coin-Flipping Powerlawed?
We don’t know … but we do have some partial results ...
31
Virtual Degree
Definitions:
Let li(u) be the number of level i descendents of node u.
l1(u) = # of children
l2(u) = # of grandchildren, e.t.c.
Let  = (β1, β2,..) be a sequence of real numbers with 1=1.
Then v(u) = 1 + β1 l1(u) + β2 l2(u) + β3 l3(u) + …
We’ll call v(u) the “Virtual degree of u with respect to .”
32
Virtual Degree
u
v(u) = 1 + β1 (2)
# of children
+ β2 (4) + β3 (0)
+ β4 (0) + ...
# of grandchildren
33
Virtual Degree
Easy observation: If we set βi = (1-p)i then the expected
increase in deg(u) is proportional to v(u).
Expected increase in deg(u) = p/t + (1-p)pl1(u)/t + (1-p)2pl2(u)/t + …
= (p/t)v(u)
u
34
Virtual Degree
Theorem: There always exist βi such that
1. For i ≥ 1, |βi| · 1.
2. As i → ∞, βi →0 exponentially.
3. The expected increase in v(u) is proportional to v(u).
Recurrence: 1=1, 2=p, i+1=i – (1-p)i-1
E.g., for p=¾, i = 1, 3/4, 1/2, 5/16, 3/16, 7/64,...
for p=½, i = 1, 1/2, 0, -1/4, -1/4, -1/8, 0, 1/16, …
35
Virtual Degree, continued
Let vt(u) be the virtual degree of node u at time t and
tu be the time when node u first appears.
Theorem: For any node u and time t ≥ tu,
E[vt(u)] = Θ((t/tu)p)
So, the expected virtual degrees follow a power law.
36
Actual Degree
We can also obtain lower bounds on the expected values
of the actual degrees:
Theorem: For any node u and time t ≥ tu,
E[degree(u)] ≥ Ω((t/tu)p(1-p))
37
Outline
•
•
•
•
•
•
Background/Previous Work
Motivation
Models
Theoretical results
Experimental results
Conclusions
38
Experiments
• Random graphs of n=100,000 nodes
• Compute statistics averaged over 100
runs.
• K=1 (Every node has out-degree 1)
39
Online Erdös-Renyi
40
Directed 1-Step Random Surfer, p=3/4
41
Directed 1-Step Random Surfer, p=1/2
42
Directed 1-Step Random Surfer, p=1/4
43
Directed Coin Flipping, p=1/2
44
Directed Coin Flipping, p=1/4
45
Undirected coin flipping, p=1/2
46
Undirected Coin Flipping p=0.05
47
Outline
•
•
•
•
•
•
Background/Previous Work
Motivation
Models
Theoretical results
Experimental results
Conclusions
48
Conclusions
Directed random walk models appear to
generate power-laws (and partial
theoretical results).
Power laws can naturally emerge, even if all nodes
have the same intrinsic “attractiveness”.
49
Open questions
•Can we prove that the degrees in the directed coinflipping model do indeed follow a power law?
•Analyze degree distribution for the undirected coin-flipping
model with p=1/2?
•Suppose page i has “interestingness” pi. Can we analyze
the degree as a function of t, i and pi?
50
Questions?
51