A. Blum, J. Lafferty, M.R. Rwebangira, R. Reddy “Semi

Download Report

Transcript A. Blum, J. Lafferty, M.R. Rwebangira, R. Reddy “Semi

A Random-Surfer Web-Graph
Model
Avrim Blum, Hubert Chan, Mugizi Rwebangira
Carnegie Mellon University
1
The Web as a Graph
Consider the World Wide Web as a graph, with web
pages as nodes and links between pages as edges.
links.html
index.html
resume.html
http://cnn.com
Experiments suggest that the degree distribution of
the Web-Graph follows a power law [FFF99].
2
Power Law
The distribution of a quantity X follows a power law if
Pr (X=k) = Ck-α
Taking the logarithm of both sides:
log Pr (X=k) = log C –α log k
Thus if we take a log-log plot of a power law
distribution we will obtain a straight line.
3
Previous Work
Barabási and Albert proposed the Preferential
Attachment model[BA99]:
Each new node connects to the existing nodes
with a probability proportional to their degree.
It is known that Preferential Attachment gives a
power-law distribution. [Mitzenmacher,
Cooper & Frieze 03, KRRSTU00]
Other models proposed include the “copying
model.” [KRRSTU00]
4
Motivating Questions
Why would a new node connect to nodes of high degree?
-Are high degree nodes more attractive?
-Or are there other explanations?
How does a new node find out what the high degree nodes are?
Motivating Observation:
•Suppose each page has a small probability p of being interesting.
•Suppose a user does a (undirected) random walk until they find an
interesting page.
•If p is small then this is the same as preferential attachment.
•What about other processes and directed graphs?
5
Directed 1-Step Random Surfer
At time 1, we start with a single node with a self-loop.
At time t, a node is chosen uniformly at random, with probability
p the new node connects to this node, or with probability 1-p
it connects to a random out-neighbor of that node.
(Extension: Repeat process k times for each new node to get out-degree k)
Note: This model is just another way of stating the directed
preferential attachment model.
6
Directed 1-step Random Surfer, p=.5
T=1
T=2
(½) (½)+ (½) (½)+ (½) (½)
T=3
¾
T=4
¼
(½) (⅓)+ (½) (⅓)+ (½) (⅓)+(½) (⅓)
2
1
3
1
6
6
7
Directed Coin Flipping model
1. At time 1, we start with a single node with a self-loop.
2. At time t, we choose a node uniformly at random.
3. We then flip a coin of bias p.
4. If the coin comes up heads, we connect to the current node.
5. Else we walk to a random neighbor and go to step 3.
“each page has equal probability p of being interesting to us”
8
NEW NODE
RANDOM STARTING NODE
1. COIN TOSS: TAIL
2. COIN TOSS: TAIL
3. COIN TOSS: HEAD
9
Is Directed Coin-Flipping Powerlawed?
We don’t know … but we do have some partial results ...
Note: unlike for undirected graphs, the case p → 0 is not so
interesting since then you just get a star.
10
Virtual Degree
Definition:
Let li(u) be the number of level i descendents of u.
Let i (i ≥ 1) is a sequence of real number with 1=1.
Then v(u) = 1 + ∑ βi li(u) (i ≥ 1)
11
Virtual Degree
u
v(u) = 1 + β1 l1(u) + β2 l2(u) + β3 l3(u) + β4 l4(u) + ...
= v(u) = 1 + β1 (2) + β2 (4) + β3 (0) + β4 (0) + ...
Easy observation: If we set βi = (1-p)i then the expected
increase in degree(u) is proportional to v(u).
12
Virtual Degree
Theorem: There always exist βi such that
1. For i ≥ 1, |βi| · 1.
2. As i → ∞, βi →0 exponentially.
3. The expected increase in v(u) is proportional to v(u).
Recurrence: 1=1, 2=p, i+1=i – (1-p)i-1
E.g., for p=¾, i = 1, 3/4, 1/2, 5/16, 3/16, 7/64,...
for p=½, i = 1, 1/2, 0, -1/4, -1/4, -1/8, 0, 1/16, …
Let vt(u) be the virtual degree of node u at time t and
tu be the time when node u first appears.
Theorem: For any node u and time t ≥ tu,
E[vt(u)] = Θ((t/tu)p)
13
Virtual Degree, contd
Let vt(u) be the virtual degree of node u at time t and
tu be the time when node u first appears.
Theorem: For any node u and time t ≥ tu,
E[vt(u)] = Θ((t/tu)p)
We also have some weak concentration bounds. Unfortunately
not strong enough: if these could be strengthened then would
have a proof that virtual degrees (not just their expectations)
follow power law.
14
Actual Degree
We can also obtain lower bounds on the actual degrees:
Theorem: For any node u and time t ≥ tu,
E[l1(u)] ≥ Ω((t/tu)p(1-p))
15
Experiments
• Random graphs of n=100,000 nodes
• Compute statistics averaged over 100
runs.
• K=1 (Every node has out-degree 1)
16
Uniform random connections
17
Directed 1-Step Random Surfer, p=3/4
18
Directed 1-Step Random Surfer, p=1/2
19
Directed 1-Step Random Surfer, p=1/4
20
Directed Coin Flipping, p=1/2
21
Directed Coin Flipping, p=1/4
22
Undirected coin flipping, p=1/2
23
Undirected Coin Flipping p=0.05
24
Conclusions
• Directed random walk models appear to
generate power-laws (and partial
theoretical results).
• Power laws can naturally emerge, even
if all nodes have the same intrinsic
“attractiveness”. (Even in absence of
“role model” as in copying-model)
25
Open questions
•Can we prove that the degrees in the directed coinflipping model indeed follow a power law?
•Analyze degree distribution for undirected coinflipping model with p=1/2?
•Suppose page i has “interestingness” pi. Can we
analyze the degree as a function of t, i and pi?
26
Questions?
27