Semantic Overlay Networks for P2P Systems

Download Report

Transcript Semantic Overlay Networks for P2P Systems

Semantic Overlay Networks
for P2P Systems
Arturo Crespo and Hector Garcia-Molina
Motivation and Related
Work

Partition the P2P network into several
thematic networks



Queries for a content will not reach nodes
without such content
Flooding in smaller networks with smaller TTL
(or more results with same)
Edutella, Hypercup: peers with similar
content connect to the same SuperPeer
Node partitioning

When does a node belong to SON A?
When it contains a piece of type A
 When it contains more than x pieces of
type A

• Less nodes per SON=>more results sooner
• Less SONs per node=>less connections
• As in DBFS, coverage problem
SON Classification

Classification must provide:

Load-balance
• Each category has similar number of nodes
• Each node belongs to a small number of
categories

Easy and accurate way to classify a
document
Statistics
Classification from “All Music Guide”
based on music style (26 styles
without and 255 styles with
substyles)
 Traces from 1800 Napster nodes
 Classifications on decade and tone
provide worse results

Statistics of styles
A node belongs to a style if it has 1
document
 24% of the nodes belong to one
style
 90% nodes belong to up to eight
 14 styles with 200 to 400 nodes
 2 styles with 1000 to 1200 nodes
 1 style with 1600 to 1800 nodes

Statistics of substyles
18% of the nodes belong to one
substyle
 90% nodes belong to up to 30 styles
 87% of substyles with less than 400
nodes

Statistics of document
classification


Classification based on the “All Music
Guide” database
Reasons of failure:






Filename format
All documents not songs
Misspellings
Database not complete
25% of documents classified incorrectly
4% of nodes classified incorrectly
Layered SONs
Styles and substyles
 Minimum required percentage (or
number) of documents of type A to
join SON A
 May not be able to join a substyle
but may join a style
 Nodes belonging to a substyle, do
not join the parent style

Searching
A document is classified to belong to
(sub)style A
 Search all substyles of style A SONs
 Search (sub)style A
 Search a higher level SON
 Until we get enough results
 (How do we locate each SON?)

Results
Acyclic graphs, to measure effect of
msgs to nodes without such content
 To get half the documents that
match a query:

Layered SONs: 461 msgs
 Gnutella:
1731 msgs
