Homogeneous Ants for Web Document Similarity Modeling and

Download Report

Transcript Homogeneous Ants for Web Document Similarity Modeling and

Selected topics in Ant 2002
By Hanh Nguyen
Selected topics in Ant 2002
• Homogeneous Ants for Web Document
Similarity Modeling and Categorization
• Ant Colonies as Logistic Processes Optimizers
• Anti-pheromone as a Tool for Better Exploration
of Search Space.
Homogeneous Ants for Web
Document Similarity Modeling
and Categorization
Outline
•
•
•
•
Abstract
Introduction
Ant Colony Models for Data Clustering
Homogeneous Multi-agent System for
Document Clustering
• Experimental Result
• Concluding Remarks & Future Directions
• References
Abstract
Apply the self-organizing and autonomous
behavior of social insects for organizing of
web document.
Introduction
Why?
The size of the internet has doubling its size every
year. Estimated 2.1 billion as of July 2001
Organizing and categorizing document is not
scalable to the growth of internet.
Document clustering?
Is the operation of grouping similar document
to classes that can be used to obtain an analysis
of the content.
Ant clustering algorithm categorize web document
to different interest domain.
Ant Colony Models for Data Clustering
Data clustering?
is the task that seek to identify groups of similar
objects based on the value of their attributes.
• Messor sancta ants collect and pile dead corpses to
form “cemeteries” (Deneubourg et al. )
f: fraction of items in the neighborhood of the agent
k1, k2: threshold constants
Ant Colony Models for Data Clustering
The model later extend by Lume & Faieta to include
distance function d, between data objects .
• c is a cell, N(c) is the number of adjacent cells of c, alpha
is constant
Homogeneous Multi-agent System for
Document Clustering
• Main components: colony of agents, feature vector of
web document, 2D grid.
• Rule: agent move one step at a time to an adjacent cell.
Only a single agent and/or a single item are allowed to
occupy a cell at a time. Picking up or dropping item
based on Pp & Pd
• N(c) = 8,oi is the item at cell i, g(oi) determine the
similarity of oi and other item of oj, where j E N(c)
• Density:
Homogeneous Multi-agent System for
Document Clustering
Similarity measure
r is the number of common term in doci and docj
m,n is the total number of term in doci and docj,
respectively. F is the frequency
Homogeneous Multi-agent System for
Document Clustering
Experimental Results
• Experimental data: 84 web pages from 4
different categories: Business, Computer,
Health and Science. These web page
have 17,776 distinct words.
• Use 30x30 toroidal grid
• 15 agents.
• tmax is 300,000. k1 and k2 in [0.01, 0.2]
increment of 0.05 for each run.
Experimental Results
• t=0
Experimental Results
• t = 50,000
Experimental Results
• t = 200,000
Experimental Results
• t = 300,000
Experimental Result
• Table
Concluding Remarks and Future
Directions
• Initial study on ant-based in clustering web documents.
• The experiments are able to product a near
homogeneous clusters.
Future work:
• a larger perceivable time dependent neighborhood for
agent.
• Formulation of a stopping criterion based on
homogeneity and spatial distribution of clusters.
• Introduction of pheromone traces.
References
• References
1. Lawrence, S., Giles, C.L.: Accessibility of Information on the Web. Nature,
400. (1999)
2. Cyveillance: Sizing the Internet. A Cyveillance Study (2000)
3. Yahoo! Web Directory: http://www.yahoo.com
4. Baeza-Yates, R., Ribeiro-Yates, B.: Modern Information Retrieval. ACM, NY
(1999)
5. Duda, R.O., Hart, P.E.: Pattern Classification and Scene Analysis. Wiley, NY
(1973)
6. Bonabeau, E., Dorigo, M., Theraulaz, G.: Swarm Intelligence: From Natural
to Artificial
Systems. Oxford University Press, New York, NY (1999)
7. Deneubourg, J. L., Goss, S., Franks, N.R., Sendova-Franks, A., Detrain, C.,
Chretien, L.:
The Dynamics of Collective Sorting: Robot-like Ants and Ant-like Robots. Proc.
Int. Conf.
Sim. of Adap. Behavior: From Animals to Animats. MIT, MA (1990)
The End
• Question?
• Comment?