P2P Systems and Technologies - Department of Computer Science

Transcript P2P Systems and Technologies - Department of Computer Science

SeLene November 2002
ICS-FORTH & Univ. of Crete
P2P Systems & technologies
Zacharioudakis Giorgos
Zacharioudakis Giorgos
1
SeLene November 2002
ICS-FORTH & Univ. of Crete
Presentation overview






P2P architectures & typical systems
Technical issues
Popular P2P Systems
Research areas
Project JXTA technology
Vision about SeLene project
Zacharioudakis Giorgos
2
SeLene November 2002
ICS-FORTH & Univ. of Crete
What is Peer-to-Peer?

Definition: Nodes of equal roles exchanging information and services
directly
Scale: millions (billions?) of peers
Nature of peers: PC’s
Application: lightweight semantics (e.g., file-sharing)

Is this a new idea?
IP routing
DNS, NTP
Distributed Databases
Zacharioudakis Giorgos
3
SeLene November 2002
ICS-FORTH & Univ. of Crete
P2P vs. Distributed DBMS
Example P2P application: file-sharing
Simple data model & query language
No complex query optimization
Transactions
Easy interoperation
Network Partitions
No guarantee on quality of results
Distributed Query Optimization
Individual site availability
Interoperation of
unimportant
heterogeneous data sources
Local updates
Reliability/failure of nodes
No transactions
Network partitions OK


Traditional DDBMS Issues:

Complex features do not scale  Simple
Amenable to large-scale
network of PCs
Zacharioudakis Giorgos
4
SeLene November 2002
ICS-FORTH & Univ. of Crete
P2P Applications
File sharing
Napster, Gnutella
 Instant Messaging
Jabber
 Distributed Computation
SETI@home
 Web services
Akamai

Zacharioudakis Giorgos
Distributed storage
Freenet
 Anonymity, censorship resistance
Mixmaster remailers
Red Rover, Publius
 Cooperative work
Groove
 Other ...

5
SeLene November 2002
ICS-FORTH & Univ. of Crete
Technical issues







scalability
fault tolerance
speed
bandwidth consumption
processing cost
security
anonymity
Zacharioudakis Giorgos






publishing/retrieval
metadata
semantic querying
availability of results
interoperability
...
6
ICS-FORTH & Univ. of Crete
SeLene November 2002
Metadata and Interoperability
Metadata
System metadata (e.g filename, bitrate, filesize etc)
Resource metadata (e.g relations, hierarchies etc)
 Currently, queries are in the form of keyword matching
We would like to perform queries in more expressive languages,
taking advantage of semantic knowledge metadata
 Technologies:
Programming interfaces:
XML-RPC, SOAP, HTTP, JXTA
Data and metadata representation - common ontologies and format
XML, RDF

Zacharioudakis Giorgos
7
ICS-FORTH & Univ. of Crete
SeLene November 2002
Different Approaches to Distributed Search

Network topology based architectures
Relies on the organization of peers within the network to route
requests
These approaches focus on how to reduce the diameter of the graph
representing the distributed networks

Content based approaches
Message content is used in either the organization of the network or
the routing of messages or both
These approaches focus on how to reduce the query path-length of
the access structure they use
Zacharioudakis Giorgos
8
SeLene November 2002
ICS-FORTH & Univ. of Crete
Spectrum of “Purity”
Hybrid
Centralized index, P2P file
storage and transfer
Napster, SETI@home
 Super-peer
A “pure” network of “hybrid”
clusters
Morpheus, e-donkey
 Pure
functionality completely
distributed
Freenet, Gnutella

Zacharioudakis Giorgos
9
ICS-FORTH & Univ. of Crete
SeLene November 2002
Publishing/Requesting/Responding

hybrid
central
indexing
each node registers to a central index
queries are performed to the central index
retrieval is done from other ‘peer’ nodes

pure
each
‘peer’ manages its own index about local (remote) resources
queries are typically performed with broadcasts
retrieval is done from responding ‘peers’ that hold the requested resource

super-peers
some
nodes act as coordinators and manage indices for a subset of nodes
each node registers to its local coordinator
queries are performed to the coordinators, which in turn communicate as
in a distributed p2p system with other super-peers
retrieval is done from other ‘peers’ that hold the requested resource
Zacharioudakis Giorgos
10
ICS-FORTH & Univ. of Crete
SeLene November 2002
Representative P2P Systems

Network topology based architectures
Napster
Gnutella
Morpheus

Content based architectures
Chord
P-Grid
Zacharioudakis Giorgos
11
SeLene November 2002
ICS-FORTH & Univ. of Crete
Napster (hybrid)

Membership: Each client joins a server, where he registers its local
files to the central index

Query: A client make queries to the central server which returns
references to the clients that actually hold the resources

Retrieval: The client connects to other ‘peer’ clients and retrieves the
resource. The selection is performed by the user but it could be done
automatically based on bandwidth, load or other criteria
Zacharioudakis Giorgos
12
SeLene November 2002
ICS-FORTH & Univ. of Crete
Napster (hybrid)
1
membership /
register resources
server
2
get file
3
query
response {1,4}
4
...
Zacharioudakis Giorgos
13
SeLene November 2002
ICS-FORTH & Univ. of Crete
Gnutella (pure)
Gnutella is not a system: it is a protocol, with various existing gnutella
clients that implement it.
 Membership: Through a predefined static list with addresses or
through “host caches”, a peer can connect to a set of gnutella clients.
After connection a client expands its list of known addresses with the
lists obtained from other peers.
 Query: A peer broadcasts a query to its known peers; these forward
the query to their known peers and so on until a max TTL (packet’s
Time To Live) is reached, which is the depth limit of the query.
 Retrieval: Peers that hold the requested resource respond to the
peer that issued the query. Through the reverse path of the query,
the originating peer finally discovers a list of peers having the
resource and then obtains it from one of them.

Zacharioudakis Giorgos
14
SeLene November 2002
ICS-FORTH & Univ. of Crete
Gnutella (pure)
Breadth-First Search (BFS)
= source
= forward
query
= processed
query
= found
result
= forward
response
Zacharioudakis Giorgos
15
SeLene November 2002
ICS-FORTH & Univ. of Crete
Gnutella (pure)
Each peer maintains a small minimum number of simultaneous active
connections
 These peers are selected from a locally maintained host catcher list
containing the addresses of all known peers
 Peer discovery
 watching PING-PONG messages
 noting the addresses of peers initiating queries
 receiving connections from previously unknown hosts
 out-of-band channels (IRC, Web)
 host caches
 Query propagation: upon receiving a query a peer broadcasts it to all
peers that is currently connected to, and so on as a chain letter
 If a peer has a file that matches the query, sends an answer back
(though it still forwards the query). This process continues to a
maximum depth (“search horizon”)

Zacharioudakis Giorgos
16
SeLene November 2002
ICS-FORTH & Univ. of Crete
Morpheus (Super-Peer)

Self organizing network
 Neither search requests nor actual downloads pass through any
central server
 The network is multi-layered, so that more powerful computers get to
become search hubs ("SuperNodes")
 Any client may become a SuperNode, if it meets the criteria of
processing power, bandwidth and latency
 Network management is automatic - SuperNodes appear and
disappear according to demand
Zacharioudakis Giorgos
17
SeLene November 2002
ICS-FORTH & Univ. of Crete
Morpheus (Super-Peer)
SN2
SN4
SN4
12.34.56.78
SN3
SN1
Zacharioudakis Giorgos
18
SeLene November 2002
ICS-FORTH & Univ. of Crete
Morpheus (Super-Peer)
Intelligent downloads
 Morpheus implements a type of fail-over system that attempts to
locate another peer sharing the same file, and automatically resume
the download where it left off at the failed host
 When Morpheus search engine finds that more than one active peer
is serving a particular file, it associates the list of peers with the file
for later reference
 If the user instructs Morpheus to download the file, it can distribute
the download task over this list of peers
Supernode
 SuperNodes act like local search
hubs and proxy search requests
on behalf of their connected peers

Peer 1
Peer 2
Peer 3
Get file 1
Zacharioudakis Giorgos
File 1
File 2
.
.
.
File n
File 1
File 2
.
.
.
File n
File 1
File 2
.
.
.
File n
19
SeLene November 2002
ICS-FORTH & Univ. of Crete
Chord (content based search)
Chord is a lookup service, not a search
service
Based on binary search trees
 Provides just one operation:
0 0
A peer-to-peer hash lookup:
Lookup(key)  IP address
Chord does not store the data
Uses Hash function:
Key identifier = SHA-1 (key)
Node identifier = SHA-1 (IP
address)
Both are uniformly distributed
Both exist in the same ID space
 How to map key IDs to node IDs?
A key is stored at its successor:
node with next higher ID (modulo N)

Zacharioudakis Giorgos
1
4
6
- a node
7
10
M
- an item
K11
K0
N10
N1
Circular
ID space
K7
K4
20
SeLene November 2002
ICS-FORTH & Univ. of Crete
Chord (content based search)
The
goal of Chord is to provide the
performance of a binary search which means
O(log N) query path-length
In order to manage a maximum path-length
O(log N) each node maintains a routing table
(called “finger table”) with at most m entries
(where m=logN)
 The ith entry in the table at node n contains
the identity of the first node s that succeeds n
by at least 2i-1 on the identifier circle (all
arithmetic modulo 2m)
 i.e., s = successor(n + 2i-1), 1≤ i ≤ m
 Note that the first finger of n is its
immediate successor on the circle
Start (n + 2i1)
Interval of
responsibility
Successor
1
[1,2)
1
2
[2,4)
3
4
[4,0)
0
0
7
1
6
2
5
4
3
existing node
not existing node, but a
possible value in ID space
Zacharioudakis Giorgos
21
SeLene November 2002
ICS-FORTH & Univ. of Crete
Chord (content based search)
Important characteristics
 Each node stores info only about a
small number of possible IDs (at most
logN)
 Knows more info about nodes closely
following it on the identifier circle
 A node’s table does not generally
contain enough info to locate the
successor of an arbitrary key k
Start
Int.
Succ.
1
[1,2)
1
2
[2,4)
3
4
[4,0)
0
0
1
7
6
2
5
Int.
Succ.
2
[2,3)
3
3
[3,5)
3
5
[5,1)
0
3
4
Zacharioudakis Giorgos
Start
Start
Int.
Succ.
4
[4,5)
0
5
[5,7)
0
7
[7,3)
0
22
SeLene November 2002
ICS-FORTH & Univ. of Crete
Chord (content based search)
How
do we locate the successor of a key k?
If n can find a node whose ID is closer than
its own to k, that node will know more about the
identifier circle in the region of k than n does
Thus n searches its finger table for the node j
whose ID most immediately precedes k, and
asks j for the node it knows whose ID is closest
to k
N110
By repeating this
process, n learns about start Interval Succ.
nodes with IDs closer
100 [100,101) 110
and closer to k
N99
101
[101,103)
5
Gradually we will find
103 [103,107)
5
the immediate
predecessor of k
107 [107,115)
5
Zacharioudakis Giorgos
115
[115,3)
5
3
[3,35)
5
35
[35,100)
60
“Finger Table” Allows
Log(n)-time Lookups
…
…
…
9
[9,13)
10
13
[13,21)
20
N5
N10
K19
N20
N32
N80
N60
Lookup
(K19)
23
SeLene November 2002
ICS-FORTH & Univ. of Crete
Chord Autonomy






When new keys are inserted the system is not affected. It just finds
the appropriate node and stores it
When nodes join or leave, the finger tables must be correctly
maintained and also some keys must be transferred to other nodes
Also, every key is stored only in one node, which means that if that
node becomes unavailable the key is also unavailable
This incurs an O(log2N) cost for maintaining the finger tables and
assuring correctness of the system while nodes join/leave the system
This imply a restricted autonomy of the system
The only replicated information is (implicitly) the finger tables,
because each node has to maintain its own
Zacharioudakis Giorgos
24
SeLene November 2002
ICS-FORTH & Univ. of Crete
P-Grid
Basic characteristics
Based on building distributed, binary prefix trees
Use of randomized algorithms for constructing the access structure,
updating the data and performing the search
Scale gracefully, equally for all nodes
 Access structure
We assume that the index terms are binary strings, built from 0’s & 1’s
The search space is partitioned into intervals
Every peer takes over responsibility for one interval
As each key corresponds to a path in the binary prefix tree the peer is
also responsible for one path of the search tree
Each peer stores the peers responsible for the other branches of the
path for routing
Search requests are either processed locally or forwarded to the peers
on the alternative branches
25

Zacharioudakis Giorgos
SeLene November 2002
ICS-FORTH & Univ. of Crete
P-Grid
1
Key intervals
Level 0
2
3
4
5
6
0
1
1
Key intervals
Level 1
2
01
00
Key intervals
Level 2
001
Zacharioudakis Giorgos
6
1
0010
6
4
5
11
10
2
01
3
3
0100
100
4
1001
5
1011
110
27
SeLene November 2002
ICS-FORTH & Univ. of Crete
P-Grid
01
queries
Key intervals
Level 0
10
1
2
3
4
5
6
0
Key intervals
Level 1
1
1
2
001
Zacharioudakis Giorgos
3
01
00
Key intervals
Level 2
6
1
0010
6
5
11
10
2
01
4
3
0100
100
4
1001
5
1011
110
28
SeLene November 2002
ICS-FORTH & Univ. of Crete
P-Grid Autonomy
The system implies that peers eventually meet, but does not examine
how does this occur, i.e. it is possible that they never meet
 As many peers can be responsible for the same key the general
problem is how to find all those peers in case of an update
Proposed solutions
multiple BFS or DFS searches for a key and propagating the
update to them
Creating lists of “buddies” for each peer (i.e. other peers that
share the same key) and propagate the update to all buddies
 These imply that although the system is decentralized and peers
does not rely to central authorities, the construction and update of the
access structure may impose some performance issues, especially
when updating a key

Zacharioudakis Giorgos
29
SeLene November 2002
ICS-FORTH & Univ. of Crete
P-Grid Autonomy
When a new node enters the system, assumes that he is responsible
over the whole prefix namespace interval
 When he meets with other nodes they split the interval and each
maintain a reference to the other node
 When a node leaves abruptly, the other nodes have incorrect
references and as soon as they are aware of it they ‘resume’
responsibility over that prefix interval
 The replicated information in this system is the multiple references to
the same keys and the “buddies” lists (when used) in order to face
the update problem

Zacharioudakis Giorgos
30
SeLene November 2002
ICS-FORTH & Univ. of Crete
P2P comparison
Paradigm
Search type
Search cost
(messages)
Autonomy
Napster
Centralized
indexing
String
comparison
O(1)
Low
Gnutella
Breadth-first
search on
graph
String
comparison
Morpheus Super-peers
Metadata
comparison
Very high
TTL
2 *  C * (C  1)
i
i 0
O(logN)?
High
Chord
Implicit binary Equality
search trees
O(logN)
Restricted
P-Grid
Binary prefix
trees
O(logN)
High
Zacharioudakis Giorgos
Prefix
31
ICS-FORTH & Univ. of Crete
SeLene November 2002
P2P performance metrics





Bandwidth
Storage (replication)
Processing cost
Path-length (required hops)
Quality of Results
Number of results
Satisfaction (true if # results >= X, false otherwise)
Time to satisfaction
Zacharioudakis Giorgos
32
SeLene November 2002
ICS-FORTH & Univ. of Crete
Hybrid p2p




Advantages
Simple to manage and
availability of results -due to
central indexing
Less (aggregated) bandwidth
consumption
Small processing cost for peers
Idle nodes that do not offer
resources does not downscale
system’s performance
Zacharioudakis Giorgos




Disadvantages
Does not scale
Single point of failure
Great processing cost for server
Vulnerable to censorship
33
SeLene November 2002
ICS-FORTH & Univ. of Crete
Pure p2p




Advantages
Efficiency: harnessing unused
resources
Self-organizing
Robustness and availability
through replication
Anonymity/legal
protection/censorship resistant
Zacharioudakis Giorgos




Disadvantages
Difficult to manage and poor
results due to lack of central
indexing
Bandwidth consuming
Idle nodes downscale the
overall performance
Higher processing cost for
peers
34
SeLene November 2002
ICS-FORTH & Univ. of Crete
Super peers
Advantages





Scalable
Fault tolerant
Adaptable and self-organizing
Efficient
Low path-length
Zacharioudakis Giorgos
Disadvantages
 Hard to manage/maintain
 Complex topology, difficult to
evaluate its metrics (through
simulation or trace driven
analysis)
35
SeLene November 2002
ICS-FORTH & Univ. of Crete
Content-based searching architectures





Advantages
Low search cost ( O(logN) )
Harnessing the content
information into queries.
Good approach for content that
can be described with simple
attributes.
Less messages per query than
a random graph.
Load balancing.




Zacharioudakis Giorgos
Disadvantages
More restrictions than topologybased architectures: when
nodes join/leave, rehashing and
content migration needs to be
performed.
A peer needs to know what is
looking for, to map it to an
address.
Not practical for content
described by multiple attributes.
Storage and routing are closely
connected
36
SeLene November 2002
ICS-FORTH & Univ. of Crete
Conclusions about p2p systems

Benefits
efficiency: harnessing
unused resources
Self-organizing
Sharing cost of ownership
Robustness and availability
through replication
Anonymity/legal protection

Challenges
No authority to enforce
behavior
Cooperation
Unreliability of individual
peers
Efficiency of distributed
operations (absolute
resources)
Imposed research issues
• Resource Management
• Security
• Efficient Search
Zacharioudakis Giorgos
37
SeLene November 2002
ICS-FORTH & Univ. of Crete
Project JXTA
JXTA is a set of protocols which allow peers to discover and
communicate with each other
 Protocols are defined in terms of XML messages exchanged between
peers
 JXTA is platform (e.g Windows), language (e.g Java) and transport
(e.g TCP/IP) independent

Zacharioudakis Giorgos
41
SeLene November 2002
ICS-FORTH & Univ. of Crete
JXTA Concepts

Concepts:
Peer
- a node that speaks the JXTA
protocols
Peer
Group - a collection of
cooperating peers
Message
- a datagram containing an
envelope, protocol headers and
bodies
peer
peer
pipe
advertisement
Pipe
- an async communication
channel for sending/receiving
messages
Advertisement
- an XML document
that publishes the existence of a
resource (peer, peer group, pipe,
service)
Zacharioudakis Giorgos
peer
peer
peer
peer
group
42
SeLene November 2002
ICS-FORTH & Univ. of Crete
JXTA Model
Zacharioudakis Giorgos
43
SeLene November 2002
ICS-FORTH & Univ. of Crete
JXTA Protocols

Peer Discovery Protocol - used
between any peers to find other
peers, peer groups, or
advertisements

Peer Information Protocol used to learn about another
peer's properties

Peer Resolver Protocol 'foundation protocol' for the
Peer Discovery Protocol and
the Peer Information Protocol.
Can be used to build other
protocols as well. Defines
send/receive 'generic queries'
and responses to be sent from
one peer to another
Zacharioudakis Giorgos

Peer Membership Protocol - used
to find out about, join and leave
groups

Pipe Binding Protocol - used to
bind a pipe to an actual endpoint

Peer Endpoint Protocol - used to
provide routing information for
paths between peers (if a direct
connection is not possible)
44
SeLene November 2002
ICS-FORTH & Univ. of Crete
JXTA Search

JXTASearch is a framework for searching in distributed networks
A protocol for registration, query and response
A series of services for interacting via this protocol
Gnutella style peer search
Zacharioudakis Giorgos
JXTA style peer search
45
SeLene November 2002
ICS-FORTH & Univ. of Crete
JXTA Search

Advantages
 Disadvantages
Supports very dynamic networks
Single point of failure
Reduce publishing and query
Scalability
response latency
Centralized control …
Centralized control (centralized
implementation of security,
accounting, membership, …)
Zacharioudakis Giorgos
46
SeLene November 2002
ICS-FORTH & Univ. of Crete
Towards a Super-Peer Architecture for SeLene
Birkbeck
Orsay
Uoc
Zacharioudakis Giorgos
UoCyprus
47
SeLene November 2002
ICS-FORTH & Univ. of Crete
References







http://www.internet2.edu/presentations/20020131-P2P-Kan.htm
http://softwaredev.earthweb.com/java/article/0,,12082_783281,00.html
http://www.cs.vu.nl/pub/globe/cp2pc/notes/allnotes/jxta.overview
http://wiki.cs.uiuc.edu/cs427/P2P+Architecture
http://www.stanford.edu/class/cs347/handouts/p2p.ppt
http://cv.uoc.es/~grc0_000228_web/Marques/Tesi_JM.htm
http://iew3.technion.ac.il/~spektory/098223/presentations/fastTrack.ppt
Zacharioudakis Giorgos
48

P2P Systems and Technologies - Department of Computer Science

Transcript P2P Systems and Technologies - Department of Computer Science

Directory