Transcript Document

A Fuzzy Web Surfer Model
Narayan L. Bhamidipati
and
Sankar K. Pal
Indian Statistical Institute
Kolkata
Contents
•
•
•
•
•
Web Surfer Models
Markov Chains
Fuzzy Markov Chains
Fuzzy Web Surfer Models
Advantages and Limitations
Web Surfer Models
•
•
•
•
•
•
Surfer visits pages randomly
Various assumptions, various models
Random Surfer Model (PageRank)
HITS (goes back and forth)
Intelligent/Directed Surfer Model
PHITS, WPSS
Markov Chains
P( X n1 )  QP ( X n )  Q P( X 0 )
n
• MC is aperiodic if each of its states is
• MC is irreducible if it is possible to reach
any state from any other state
• Regular if aperiodic and irreducible
• A regular MC has a unique stationary
distribution
Surfer Models as Markov Chains
•
•
•
•
•
Web pages are states
Moving between pages are the transitions
Clicking or typing URLs
Transition probabilities
Steady state distribution yields the page
ranks
Fuzzy Markov Chains
• As opposed to classical Markov Chains
(which are based on Probability Theory)
• Fuzzy Transition Matrix
• Employ max-min algebra (instead of the
usual addition and multiplication)
 ( X n1 )  Q ( X n )  Q  ( X 0 )
n
Fuzzy Web Surfer Model
• Uncertainty involved in following links
• Uncertainty involved in existence of links
• Uncertainty involved in page(let)
boundaries
• Modeled in a fuzzy sense
Fuzziness in Link’s Existence ?
•
•
•
•
•
“A link either exists or it does not exist”
Pagelets: web pages split into sections
Link to a page, but which section ?
What if the sections are not made explicit ?
Have to guess if a link is intended for a
particular pagelet
Pagelets
• Coherent parts of web pages
• Pagelets may differ widely in terms of
content
• Need not necessarily be split into explicit
sections
• Identification by considering the structure
of the documents
Model Formulation
• Each pagelet is referred to as a page
• Every link in the web graph has an
associated fuzzy number denoting the
possibility of it being followed
• Obtain fuzzy transition matrix Q
• (i, j) element of Q denotes the belief of
moving to page i, when the surfer is on
page j
Model Formulation
• Usual PageRank link computations are
performed using the Fuzzy Transition
Matrix on the max-min algebra
 ( X n1 )  Q ( X n )  Q  ( X 0 )
n
• Concept of FuzzRank
Favorable Properties
•
•
•
•
•
Able to model fuzziness in links
Also can capture fuzziness in page contents
Finite Convergence
Analysis in terms of fuzzy eigen sets
Robust Computation
To be Explored…
• Conditions for ergodicity (and hence
regularity) of fuzzy Markov chains are not
completely known (as yet)
• The steady state fuzzy distribution depends
on the initial state
• What do the different convergent fuzzy
distributions correspond to ?
Conjecture
• The distinct steady state fuzzy distributions
correspond to web communities
• Probably helpful in identifying communities
from a given set of pages
References
• K. Avrachenkov and E. Sanchez. Fuzzy markov chains and
decision-making. Fuzzy Optimization and Decision
Making, 1(2):143–159, June 2002.
• J. J. Buckley and E.Eslami. Fuzzy markov chains:
Uncertain probabilities. Mathware and Soft Computing,
9(1):33–41, 2002.
• M. Diligenti, M. Gori, and M. Maggini. A unified
probabilistic framework for web page scoring systems.
IEEE Transactions on Knowledge and Data Engineering,
16(1):4–16, January 2004.