The effect of New Links on Google Pagerank

Transcript The effect of New Links on Google Pagerank

The effect of New Links on
Google Pagerank
By Hui Xie
Apr , 07
Computing PageRank
Matrix representation
Let P be an nn matrix and pij be the entry at the
i-th row and j-th column.
If page i has k>0 outgoing links
pij = 1/k if page i has a link to page j
pij = 0 if there is no link from i to j
If page I has no outgoing links
pij = 1/n j=1,…,n
Google matrix
• G=cP+(1-c)(1/n)eeT
e=(1,…,1)T
• G is stochastic matrix
Ge=e
• There exists a unique column vector π such
that
πT G= πT,
πT e=1
• πT =(1-c)/n eT(I-cP)-1
Discrete Time Markov Chains
• A sequence of random variables {Xn} is
called a Markov chain if it has the Markov
property:
• States are usually labeled {(0,)1,2,…}
• State space can be finite or infinite
Transition Probability
• Probability to jump from state i to state j
• Assume stationary: independent of time
• Transition probability matrix:
P = (pij)
• Two state MC:
Side Topic: Markov Chains
• A discrete time stochastic process is a sequence
of random variables {X0, X1, …, Xn, …} where the
0, 1, …, n, … are discrete points in time.
• A Markov chain is a discrete-time stochastic
process defined over a finite (or countably infinite)
set of states S in terms of a matrix P of transition
probabilities.
• Memorylessness property: for a Markov chain
• Pr[Xt+1 = j | X0 = i0, X1 = i1, …, Xt = i] =
Pr[Xt+1 = j | Xt = i]
•
Side Topic: Markov Chains
• Let pi(t) be the probability of being in state i at time
step t.
• Let p(t) = [p0(t), p1(t), … ] be the vector of
probabilities at time t.
• For an initial probability distribution p(0), the
probabilities at time n are
•
p(n) = p(0) Pn
• A probability distribution p is stationary if p = p P
• P(Xm+n =j|Xm = i) = P(Xn =j|X0 = i) = Pn(i,j)
absorbing Markov chain
Define a discrete-time absorbing markov chain
{Xt ,t=0,1,…}with the state space {0,1,…,n}
Where transitions between the states 1,…, n are
conducted by the matrix cP, and the state 0 is
absorbing.
The transition matrix is
0 
1


 (1  c)e cP 
Random walk interpretation
• Walk starts at a uniformly chosen web page
• At each step, if currently at page p
• W/p  , go to a uniformly chosen
outneighbor of p
• W/p 1 -  , stop
• Let Nj be the total number of
visits to state j before absorption
including the visit at time t = 0
if X0 is j . Formally,
N  1
,
j  1,..., n.

j
t 0
{ X t  j}
• Then zij=(I-cP)-1ij=E(Nj|X0=I)
• Let qij be the probability of reaching
the state j before absorption if
the initial state is i. Then we
have
• Theorem Let X denote a Markov chain
with state space E. The total number of
visits to a state j∈E under the condition
that the chain starts in state i is given by
P(Nj=m|X0=j)=qjjm-1(1-qjj)
and for i!=j
P(Nj=m|X0=i)= 1-qij
if m=0
qij qjjm-1(1-qjj) if m>=1
Corollary For all i,j ∈E the relations
zij=(1-qii)-1 and zij=qijzjj hold
Outgoing links from i do not affect qji for any j!=I
So by changing the outgoing links, a page can control its
PageRank up to multiplication by a factor zii=1/(1-qii)
For 0<=qii<=c2 ,
1<=zii<=(1-c2)-1≈3.6 for c=0.85
Rank one update of google pagerank
• Page 1 with k0 old links has k1 newly
created links to page 2 to k1+1
• k=k0+k1 , p1T be the first row of matrix P
• Updated hyperlink matrix
P  P  e1uT ,
k1 1
k1 T
1
T
T
u   ei  p1
k i 2
k
• According to (9) the ranking of
page 1 increases when
For z11=1/(1-q11),
zi2=qi1z11, i>1
The above is equivalent to
Hence, the page 1 increases its ranking when it
refers to pages that are characterized by a
high value of qi1. These must be the pages that
refer to page 1 or at least belong to the same
Web community. Here by a Web community we mean
a set of Web pages that a surfer can reach from
one to another in a relatively small number of
steps.
the PageRank of page j increases if
c
k1
k1 1
z
i 2
ij
 z1 j
1 c n
pj 
zkj

n k 1
the PageRank of page j increases if
c
k1
k1 1
z
i 2
ij
 z1 j
if several new links are added then the PageRank
of page j might actually decrease even if this
page receives one of the new links.
Such situation occurs when most of newly created
links point to “irrelevant” pages.
• For instance, let j = 2 and assume
that there is no hyperlink path
from pages 3,…,k+1 to page 2.Then
zij is close to zero for i = 3,…, k
+ 1, and the PageRank of page 2
will increase only if (c/k1)z22 >
z12, which is not necessarily true,
especially if z12 and k1 are
considerably large.
Asymptotic analysis
• Let  j  min{n  , X n  j} be the stopping time of
the first visit to the state j
• Mij=E( |X
=i) be the average time needed to
j 0
reach j starting from i(mean first passage time)
Optimal Linking Strategy
• Consider a page i = 1,…,n and
assume that i has links to pages
i1,…,ik distinct from i. Further,
let mij(c) be the mean first passage
time from page i to page j for the
Google transition matrix G with
parameter c.
1
pi 
mii (c)
• outgoing links from i do not affect
mji(c) for any j!= i. Thus, by linking
from i to j , one can only alter k, this
means that the owner of the page I has
very little control over its pagerank.
The best that he can do is to link only
to one page j* such that
m j i (c)  min{m ji (c)}
•
j
*
Note that (surprisingly) the PageRank of
j* plays no role here.
• Theorem. The optimal linking
strategy for a Web page is to have
only one outgoing link pointing to
a Web page with a shortest mean
first passage time back to the
original page.
Conclusions
• Our main conclusion is that a Web
page cannot significantly
manipulate its PageRank by changing
its outgoing links.
• Furthermore, keeping a logical
hyperlink structure and linking to
a relevant Web community is the
most sensible and rewarding policy.

The effect of New Links on Google Pagerank

Transcript The effect of New Links on Google Pagerank

Directory