Transcript Pagerank

Link Analysis
HITS - Kleinberg’s Algorithm
• HITS – Hypertext Induced Topic Selection
• For each vertex v Є V in a subgraph of interest:
a(v) - the authority of v
h(v) - the hubness of v
• A site is very authoritative if it receives many
citations. Citation from important sites weight
more than citations from less-important sites
• Hubness shows the importance of a site. A good
hub is a site that links to many authoritative sites
2
Authority and Hubness
5
2
3
1
4
1
6
7
a(1) = h(2) + h(3) + h(4)
h(1) = a(5) + a(6) + a(7)
3
Authority and Hubness Convergence
• Recursive dependency:
a(v)  Σ
w Є pa[v]
h(w)
h(v)  Σ w Є ch[v] a(w)
• Using Linear Algebra, we can prove:
a(v) and h(v) converge
4
Authority and Hub Pages
Matrix representation of operations I and O.
Let A be the adjacency matrix of SG: entry (p, q) is 1 if
p has a link to q, else the entry is 0.
Let AT be the transpose of A.
Let hi be vector of hub scores after i iterations.
Let ai be the vector of authority scores after i
iterations.
i
Operation I: ai = AT hi-1
T
T
ai  A Aai 1 ai  A A a0
Operation O: hi = A ai
T i
T
hi  AA hi 1 hi  AA  h0
Normalize after every multiplication
5
Does the procedure converge?
x1  Mx0 ( M  AAT )
x x
2
x2  Mx1  M 2 x0
xk  M x0
k
xk
diag( 1 ,2 ,... n ), 1  2 ... n )
M


E

E 1
[ eˆ1eˆ2 ... eˆn ]
M 2  EE 1 EE 1  E2 E 1
 
M  E E  1 E  k
 1
x0  c1eˆ1  c2 eˆ2  ...  cn eˆn
k
k
M k x0  eˆ1
1
k
k
 1
 E


The rate of convergence depends on the “eigen gap” 1  2
6
HITS Example
Find a base subgraph:
• Start with a root set R {1, 2, 3, 4}
• {1, 2, 3, 4} - nodes relevant to
the topic
• Expand the root set R to include
all the children and a fixed
number of parents of nodes in R
 A new set S (base subgraph) 
7
HITS Example
BaseSubgraph( R, d)
1.
Sr
2.
for each v in R
3.
do S  S U ch[v]
4.
P  pa[v]
5.
if |P| > d
6.
then P  arbitrary subset of P having size d
7.
SSUP
8.
return S
8
HITS Example
Hubs and authorities: two n-dimensional a and h
HubsAuthorities(G)
|V|
1
2
3
4
5
6
1  [1,…,1] Є R
a0  h 0  1
t 1
repeat
for each v in V
do at (v)  Σ
7
8
9
10
11
12
ht (v)  Σ w Є pa[v] a
(w)
t -1
a t  at / || at ||
h t  ht / || ht ||
t  t+1
until || a t – at -1 || + || h t – ht -1 || < ε
return (a t , h t )
h
w Є pa[v] t -1
(w)
9
HITS Example Results
Authority
Hubness
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Authority and hubness weights
10
PageRank

Introduced by Page et al (1998)


The weight is assigned by the rank of parents
Difference with HITS


HITS takes Hubness & Authority weights
The page rank is proportional to its parents’ rank, but
inversely proportional to its parents’ outdegree
11
Matrix Notation
P is the adjacency matrix based
on the link structure of the Internet.
r   Pr
12
Problem

“Rank Sink” Problem


In general, many Web pages have no
inlinks/outlinks
It results in dangling edges in the graph
1
P   P  (1   ) E
n
E is the matrix of all ones
Interpretation: In each time step the surfer visiting a page
will jump to a random page With a probability 1  
13
Stability

Whether the link analysis algorithms based
on eigenvectors are stable in the sense that
results don’t change significantly?

The connectivity of a portion of the graph is
changed arbitrary

How will it affect the results of algorithms?
14
Stability of HITS
Ng et al (2001)
• A bound on the number of hyperlinks k that can added or deleted
from one page without affecting the authority or hubness weights
• It is possible to perturb a symmetric matrix by a quantity that grows
as δ that produces a constant perturbation of the dominant
eigenvector
δ: eigengap λ1 – λ2
d: maximum outdegree of G
15
Stability of PageRank
Ng et al (2001)
V: the set of vertices touched by the perturbation


The parameter ε of the mixture model has a stabilization role
If the set of pages affected by the perturbation have a small rank, the
overall change will also be small
tighter bound by
Bianchini et al (2001)
δ(j) >= 2 depends on the edges incident on j
16