Semantic Overlay Networks for P2P Systems
Download
Report
Transcript Semantic Overlay Networks for P2P Systems
Semantic Overlay Networks
for P2P Systems
Arturo Crespo and Hector Garcia-Molina
Motivation and Related
Work
Partition the P2P network into several
thematic networks
Queries for a content will not reach nodes
without such content
Flooding in smaller networks with smaller TTL
(or more results with same)
Edutella, Hypercup: peers with similar
content connect to the same SuperPeer
Node partitioning
When does a node belong to SON A?
When it contains a piece of type A
When it contains more than x pieces of
type A
• Less nodes per SON=>more results sooner
• Less SONs per node=>less connections
• As in DBFS, coverage problem
SON Classification
Classification must provide:
Load-balance
• Each category has similar number of nodes
• Each node belongs to a small number of
categories
Easy and accurate way to classify a
document
Statistics
Classification from “All Music Guide”
based on music style (26 styles
without and 255 styles with
substyles)
Traces from 1800 Napster nodes
Classifications on decade and tone
provide worse results
Statistics of styles
A node belongs to a style if it has 1
document
24% of the nodes belong to one
style
90% nodes belong to up to eight
14 styles with 200 to 400 nodes
2 styles with 1000 to 1200 nodes
1 style with 1600 to 1800 nodes
Statistics of substyles
18% of the nodes belong to one
substyle
90% nodes belong to up to 30 styles
87% of substyles with less than 400
nodes
Statistics of document
classification
Classification based on the “All Music
Guide” database
Reasons of failure:
Filename format
All documents not songs
Misspellings
Database not complete
25% of documents classified incorrectly
4% of nodes classified incorrectly
Layered SONs
Styles and substyles
Minimum required percentage (or
number) of documents of type A to
join SON A
May not be able to join a substyle
but may join a style
Nodes belonging to a substyle, do
not join the parent style
Searching
A document is classified to belong to
(sub)style A
Search all substyles of style A SONs
Search (sub)style A
Search a higher level SON
Until we get enough results
(How do we locate each SON?)
Results
Acyclic graphs, to measure effect of
msgs to nodes without such content
To get half the documents that
match a query:
Layered SONs: 461 msgs
Gnutella:
1731 msgs