UCDavis, ecs251
Fall 2007
ecs251 Fall 2007:
Operating System Models
#4: Peer-to-Peer Systems
Dr. S. Felix Wu
Computer Science Department
University of California, Davis
http://www.cs.ucdavis.edu/~wu/
[email protected]
10/23/2007
P2P
1
The role of the service provider
Centralized management of services
– DNS, Google, www.cnn.com, Blockbuster,
SBC/Sprint/AT&T, cable service, Grid
computing, AFS, bank transactions…
Information, Computing, & Network
resources owned by one or very few
administrative domains.
– Some with SLA (Service Level Agreement)
Interacting with the “SP”
Service providers own the information and the interactions
– Some enhance/establish the interactions
Incentives for SP
Business model
By law
Let’s compare …
Google, Blockbuster, CNN, MLB/NBA, LinkedIn, e-Bay
versus
Skype, BitTorrent, Blog, YouTube, BotNet, Cyber-Paparazzi
Toward P2P
More participation of the end nodes (or their
users)
– More decentralized Computing/Network
resources available
– End-user controllability and interactions
– Security/robustness concerns
Peer’s role
Shifting functionality/service/information
from the SP to the peers.
– Peer’s CPU cycles
– Peer’s Internet access
– Peer’s storage
Service Providers in P2P
We might not like SPs, but we still cannot avoid them entirely.
– Who is going to lay the fiber and switches?
– Can we avoid DNS?
– How can we stop “Cyber-Bullying” and similar abuses?
– Copyright enforcement?
– Internet becomes a junkyard?
We will discuss…
P2P system examples
– Unstructured, structured, incentive
Architectural analysis and issues
Future P2P applications and why?
Questions to ask
Peer’s role (or SP’s role)
Peer’s controllability and vulnerability
Incentives to contribute
Peer’s mobility and dynamics
Scalability
Challenge to you…
Define a new P2P-related application,
service, or architecture.
Justify why it is practical, useful and will
scale well.
– Example: sharing cooking recipes, experiences
& recommendations about restaurants and
hotels
Napster
P2P File sharing
“Unstructured”
Napster
[Figure: peers connected to replicated Napster index servers.
1. File location request (peer to server)
2. List of peers offering the file (server to peer)
3. File request (peer to peer)
4. File delivered (peer to peer)
5. Index update (peer to server)]
Napster
Advantages?
Disadvantages?
Napster: Questions to ask
Peer’s role (or SP’s role)
Peer’s controllability and vulnerability
Incentives to contribute
Peer’s mobility and dynamics
Scalability
Originally conceived of by Justin Frankel, the 21-year-old founder of Nullsoft
March 2000: Nullsoft posts Gnutella to the web
A day later, AOL removes Gnutella at the behest of Time Warner
The Gnutella protocol version 0.4
http://www9.limewire.com/developer/gnutella_protocol_0.4.pdf
and version 0.6
http://rfc-gnutella.sourceforge.net/Proposals/Ultrapeer/Ultrapeers.htm
There are multiple open-source implementations at http://sourceforge.net/, including:
– Jtella
– Gnucleus
Software released under the GNU Lesser General Public License (LGPL)
The Gnutella protocol has been widely analyzed
Gnutella Protocol Messages
Broadcast Messages
– Ping: initiating message (“I’m here”)
– Query: search pattern and TTL (time-to-live)
Back-Propagated Messages
– Pong: reply to a ping, contains information about the
peer
– Query response: contains information about the
computer that has the needed file
Node-to-Node Messages
– GET: return the requested file
– PUSH: push the file to me
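The broadcast messages above can be sketched in a few lines: a servant floods a Query to its neighbors, decrementing the TTL at each hop and using a per-node cache of message GUIDs to drop duplicates. This is an illustrative sketch, not the wire protocol; all names are hypothetical.

```python
from collections import defaultdict

class Node:
    def __init__(self, nid):
        self.id, self.neighbors, self.got = nid, [], 0

def forward_query(node, msg, seen, sender=None):
    """Flood msg to node's neighbors, honoring TTL and the duplicate cache."""
    if msg["guid"] in seen[node.id]:       # already handled: drop duplicate
        return
    seen[node.id].add(msg["guid"])
    node.got += 1                          # this node saw the query once
    if msg["ttl"] <= 0:                    # TTL exhausted: stop flooding
        return
    child = dict(msg, ttl=msg["ttl"] - 1)
    for nbr in node.neighbors:
        if nbr is not sender:              # don't echo back to the sender
            forward_query(nbr, child, seen, sender=node)

# 4-node line a - b - c - d; a issues a query with TTL 2
a, b, c, d = (Node(x) for x in "abcd")
a.neighbors, b.neighbors, c.neighbors, d.neighbors = [b], [a, c], [b, d], [c]
seen = defaultdict(set)
forward_query(a, {"guid": "q1", "ttl": 2, "pattern": "file A"}, seen)
print([n.got for n in (a, b, c, d)])  # d is 3 hops away, so it never sees q1
```

The TTL bound is what makes this "limited-scope" flooding: node d, three hops out, is never reached.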
Limited Scope Flooding
Reverse Path Forwarding
[Figure: a 7-node Gnutella overlay searching for file A]
Steps:
• Node 2 initiates search for file A
• Sends message to all neighbors
• Neighbors forward message
• Nodes that have file A initiate a reply message
• Query reply message is back-propagated
• File download
• Note: file transfer between clients behind firewalls is not possible; if only one client, X, is behind a firewall, Y can request that X push the file to Y
Gnutella
Advantages?
Disadvantages?
Gnutella: Questions to ask
Peer’s role (or SP’s role)
Peer’s controllability and vulnerability
Incentives to contribute
Peer’s mobility and dynamics
Scalability
Gnutella vs. Napster
GUID:
Short for Globally Unique Identifier, a randomized string
used to uniquely identify a host or message on the
Gnutella network. This prevents duplicate messages from
being sent on the network.
GWebCache:
a distributed system for helping servants connect to the
Gnutella network, thus solving the "bootstrapping"
problem. Servants query any of several hundred
GWebCache servers to find the addresses of other servants.
GWebCache servers are typically web servers running a
special module.
Host Catcher:
Pong responses allow servants to keep track of active
gnutella hosts
On most servants, the default port for Gnutella is 6346
Gnutella Network Growth
[Figure: number of nodes in the largest network component (in thousands, up to about 50), measured between 11/20/00 and 05/29/01]
“Limited Scope Flooding”
Ripeanu reported that Gnutella traffic totals 1 Gbps (or
330 TB/month).
– Compare to 15,000 TB/month in the US Internet backbone
(December 2000)
– this estimate excludes actual file transfers
Reasoning:
– QUERY and PING messages are flooded; they form more than 90% of generated traffic
– predominant TTL = 7
– >95% of nodes are less than 7 hops away
– measured traffic on each link: about 6 kbps
– network with 50k nodes and 170k links
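The 1 Gbps and 330 TB/month figures are consistent with the measured per-link rate, as a quick back-of-the-envelope check shows (assuming a 30-day month):

```python
# 170k links, each carrying about 6 kbps of flooded traffic
links, per_link_kbps = 170_000, 6
total_gbps = links * per_link_kbps / 1e6          # kbps -> Gbps
tb_per_month = total_gbps / 8 * 2_592_000 / 1e3   # Gbps -> GB/s -> TB over 30 days
print(round(total_gbps, 2), round(tb_per_month))  # → 1.02 330
```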
Perfect Mapping
[Figure: overlay links among nodes A through H coincide with the underlying physical links]
Inefficient mapping
[Figure: the same overlay among nodes A through H placed over a mismatched physical topology]
Link D-E needs to support six times higher traffic.
Topology mismatch
The overlay network topology doesn’t match
the underlying Internet infrastructure
topology!
40% of all nodes are in the 10 largest Autonomous
Systems (AS)
Only 2-4% of all TCP connections link nodes
within the same AS
Largely ‘random wiring’
Most Gnutella-generated traffic crosses AS borders,
making the traffic more expensive
May cause ISPs to change their pricing scheme
Scalability
Whenever a node receives a message,
(ping/query) it sends copies out to all of its
other connections.
existing mechanisms to reduce traffic:
– TTL counter
– Cache information about messages they
received, so that they don't forward duplicated
messages.
70% of Gnutella users share no files
90% of users answer no queries
Those who have files to share may limit number of connections or
upload speed, resulting in a high download failure rate.
If only a few individuals contribute to the public good, these few
peers effectively act as centralized servers.
Anonymity
Gnutella provides for anonymity by
masking the identity of the peer that
generated a query.
However, IP addresses are revealed at
various points in its operation: HITS
packets include the URL for each file,
revealing the peers’ IP addresses
Query Expressiveness
Format of query not standardized
No standard format or matching semantics for the
QUERY string. Its interpretation is completely
determined by each node that receives it.
String literal vs. regular expression
Directory name, filename, or file contents
Malicious users may even return files unrelated to
the query
Superpeers
Cooperative, long-lived peers, typically with
significant resources, that handle a very high
volume of query-resolution traffic.
Gnutella is a self-organizing, large-scale, P2P application
that produces an overlay network on top of the Internet; it
appears to work
Growth is hindered by the volume of generated traffic and
inefficient resource use
Since there is no central authority, the open-source
community must commit to making any changes
Suggested changes have been proposed in
– Peer-to-Peer Architecture Case Study: Gnutella Network, by Matei
Ripeanu
– Improving Gnutella Protocol: Protocol Analysis and Research
Proposals, by Igor Ivkovic
Freenet before v0.7
Essentially the same as Gnutella:
– Limited-scope flooding
– Reverse-path forwarding
Difference:
– Data objects (i.e., files) are also delivered
via “reverse-path forwarding”
Freenet v0.7 -- “Darknet routing”
P2P Issues
Scalability & Load Balancing
Anonymity
Fairness, Incentives & Trust
Security and Robustness
Efficiency
Mobility
Incentive-driven Fairness
P2P means we all should contribute..
– Hopefully fair, but the majority is selfish…
“Incentive for people to contribute…”
File Organization
[Figure: a file is divided into pieces 1, 2, 3, 4, …; each piece is 256 KB and consists of 16 KB blocks; the last piece may be incomplete]
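The arithmetic behind this layout can be sketched as follows (the file size is hypothetical; piece and block sizes are the 256 KB and 16 KB values above):

```python
PIECE, BLOCK = 256 * 1024, 16 * 1024  # 256 KB pieces, 16 KB blocks

def layout(file_size):
    """Number of pieces, blocks per full piece, and size of the last piece."""
    pieces = -(-file_size // PIECE)            # ceiling division
    last = file_size - (pieces - 1) * PIECE    # the last piece may be incomplete
    return pieces, PIECE // BLOCK, last

print(layout(1_000_000))  # a ~1 MB file → (4, 16, 213568): last piece is short
```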
Initialization
[Figure: the user fetches MYFILE.torrent from a web server via HTTP GET; the .torrent file names the tracker (http://mytracker.com:6969/) and lists piece hashes (S3F5YHG6FEB, FG5467HGF367, F456JI9N5FF4E, …); the client then “registers” with the tracker and receives a list of peers, e.g. ID1 169.237.234.1:6881, ID2 190.50.34.6:5692, ID3 34.275.89.143:4545, …, ID50 231.456.31.95:6882, and connects to Peer 1, Peer 2, …, Peer 40]
Peer/Seed
[Figure: a swarm of peers/seeds 1 through 4 exchanging pieces]
“On the Wire” Protocol
(Over TCP)
[Figure: local and remote peer handshake (ID/Infohash), then exchange BitField messages; each side starts with Interested = 0, choked = 1]
Non-keepalive messages:
0 – choke
1 – unchoke
2 – interested
3 – not interested
4 – have
5 – bitfield
6 – request
7 – piece
8 – cancel
Choking
By default, every peer is “choked”
– stop “uploading” to them, but the TCP
connection is still there.
Select 4~6 peers to “unchoke” ??
– “Re-choke” every 30 seconds
– How to decide?
Optimistic Unchoking
– What is this?
“Interested”
A request for a piece (or its sub-pieces)
Get a piece/block!!
Download:
– Which peer? (download from whom? Does it
matter?)
– Which piece?
How about “upload”?
– Which peer?
– Which piece?
Piece Selection
Pipelining (5 requests)
Strict Priority (incomplete pieces first)
Rarest First
What is the problem?
Rarest First
Exchanging bitmaps with 20+ peers
– Initial messages
– “have” messages
Array of buckets
– Ith buckets contains “pieces” with I known
instances
– Within the same bucket, the client will
randomly select one piece.
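The bucket idea can be sketched directly from the exchanged bitmaps: count each piece's known instances, then pick at random among the pieces in the lowest non-empty bucket that we still need. The function and field names are illustrative, not from any client.

```python
import random

def rarest_first(peer_bitmaps, have, rng=random):
    """Pick a needed piece with the fewest known instances among peers."""
    counts = {}
    for bm in peer_bitmaps:                 # tally each piece's availability
        for piece, present in enumerate(bm):
            if present:
                counts[piece] = counts.get(piece, 0) + 1
    needed = {p: c for p, c in counts.items() if p not in have}
    if not needed:
        return None                         # nothing useful is available
    rarest = min(needed.values())           # the lowest non-empty bucket
    return rng.choice([p for p, c in needed.items() if c == rarest])

# three peers' bitmaps over 4 pieces; we already have pieces 0 and 3
bitmaps = [[1, 1, 0, 1], [1, 0, 0, 1], [1, 1, 0, 0]]
print(rarest_first(bitmaps, have={0, 3}))  # → 1 (piece 2 exists nowhere yet)
```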
Piece Selection
Pipelining (5 requests)
Strict Priority
3 stages:
– Random first piece
– Rarest First
– Endgame mode
Piece Selection
Piece (64 KB~1 MB) vs. sub-piece (16 KB)
– Piece-size: trade-off between performance and the size
of the torrent file itself
– A client might request different sub-pieces of the same
piece from different peers.
Strict Priority - sub-pieces and piece
Rarest First
– Exception: “random first”
– Get the stuff out of Seed(s) as soon as possible..
Get a piece/block!!
Download:
– Which peer?
– Which piece?
How about “upload”?
– Which peer?
– Which piece?
Peer Selection
Focus on Rate
Upload to 4~6 peers
Random Unchoke
Global rate cap only
Bittorrent: “Tit for Tat”
Equivalent Retaliation (Game theory)
– A peer will “initially” cooperate, then respond
in kind to an opponent's previous action. If the
opponent previously was cooperative, the agent
is cooperative. If not, the agent is not.
Choking
By default, every peer is “choked”
– stop “uploading” to them, but the TCP
connection is still there.
Select 4~6 peers to “unchoke” ??
– Best “upload rates” and “interested”.
– Uploading to the unchoked ones and monitor
the download rate for all the peers
– “Re-choke” every 30 seconds
Optimistic Unchoking (6+1)
– Randomly select a choked peer to unchoke
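The re-choke round described above can be sketched as follows: keep the fastest interested peers unchoked, plus one randomly chosen choked peer as the optimistic unchoke. This is a simplified sketch; the dictionary fields are hypothetical stand-ins for a client's per-peer state.

```python
import random

def rechoke(peers, regular=4, rng=random):
    """Unchoke the fastest interested peers, plus one optimistic unchoke.

    peers: list of dicts like {"id": ..., "rate": ..., "interested": bool}
    """
    interested = [p for p in peers if p["interested"]]
    best = sorted(interested, key=lambda p: p["rate"], reverse=True)[:regular]
    unchoked = {p["id"] for p in best}
    choked = [p for p in interested if p["id"] not in unchoked]
    if choked:                 # optimistic unchoke: give a random peer a chance
        unchoked.add(rng.choice(choked)["id"])
    return unchoked

peers = [{"id": i, "rate": r, "interested": True}
         for i, r in enumerate([50, 10, 80, 30, 60, 5])]
print(sorted(rechoke(peers)))  # the 4 fastest ids plus one random extra
```

Running this "every 30 seconds" (the re-choke period) is what gives slow peers a periodic chance to prove themselves.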
Bittorrent
Fairness of download and upload between a
pair of peers
Every 10 seconds, estimate the download
bandwidth from the other peer
– Based on the performance estimation to decide
to continue uploading to the other peer or not
Properties
Bigger “%” = better chance of unchoked
Bigger “%” ~= better UL and DL rates ?!
Who to Unchoke?
[Figure: the same swarm of peers/seeds 1 through 4; which of them should be unchoked?]
Seed unchoking
old algorithm
– unchoke the fastest peers (how?)
– problem: fastest peers may monopolize seeds
new algorithm
– periodically sort all peers according to their last unchoke time
– prefer the most recently unchoked peers; on a tie, prefer the fastest
– (presumably) achieves equal spread of seed bandwidth
Attacks to BT
???
Attacks to BT
Download only from the seeds
Download only from fastest peers
Announcing false pieces
Privacy -- (Torrent, source IP addresses)
BitTorrent: Questions to ask
Peer’s role (or SP’s role)
Peer’s controllability and vulnerability
Incentives to contribute
Peer’s mobility and dynamics
Scalability
Bittorrent
“Tit-for-Tat” incentive model within the
same torrent
Piece/Peer selection and choking
The need for tracker and torrent file
Client implementations
mainline: written in Python; right now, the only
one employing the new seed unchoking algorithm
Azureus: the most popular, written in Java;
implements a special protocol between clients
(e.g. peers can exchange peer lists)
other popular clients: ABC, BitComet, BitLord,
BitTornado, μTorrent, Opera browser
various non-standard extensions
– retaliation mode: detect compromised/malicious peers
– anti-snubbing: ignore a peer who ignores us
– super seeding: seed masquerading as a leecher
Resources
Basic BitTorrent mechanisms
[Cohen, P2PECON’03]
BitTorrent specification Wiki
http://wiki.theory.org/BitTorrentSpecification
Measurement studies
[Izal et al., PAM’04],
[Pouwelse et al., Delft TR 2004 and IPTPS’05], [Guo et al., IMC’05], and
[Legout et al., INRIA-TR-2006]
Theoretical analysis and modeling
[Qiu et al., SIGCOMM’04], and
[Tian et al., Infocom’06]
Simulations
[Bharambe et al., MSR-TR-2005]
Sharing incentives and exploiting them
[Shneidman et al., PINS’04],
[Jun et al., P2PECON’05], and
[Liogkas et al., IPTPS’06]
Trackerless Bittorrent
Every BT peer is a tracker!
But, how would they share and exchange
information regarding other peers?
Similar to Napster’s index server or DNS
Pure P2P
Every peer is a tracker
Every peer is a DNS server
Every peer is a Napster Index server
How can this be done?
– We try to remove/reduce the role of “special
servers”!
Unstructured P2P
P2P network topology is arbitrary!
– Gnutella
– BitTorrent and Napster
Unstructured P2P
P2P network topology is arbitrary!
– Gnutella
– BitTorrent and Napster
Mapping - [Content/Object, Peer] is arbitrary!
– How to search the content in Gnutella?
– BitTorrent and Napster?
Disadvantages of u-P2P
Flooding cost
Rare contents
– Also, the issue of “All or 1”
From u-P2P to s-P2P
P2P network topology is formed according
to the content ownership
Unique “naming/keying” for the content
object as well as the peer
Peer
The requirements of Peer?
Structured Peering
Peer identity and routability
Structured Peering
Peer identity and routability
Key/content assignment
– Which identity owns what? (Google Search?)
Structured Peering
Peer identity and routability
Key/content assignment
– Which identity owns what?
Napster: centralized index service
Skype/Kazaa: login-server & super peers
DNS: hierarchical DNS servers
Two problems:
(1). How to connect to the “topology”?
(2). How to prevent failures/changes?
DHT
Most s-P2P systems are DHT-based.
Distributed hash tables (DHTs)
– decentralized lookup service of a hash table
– (name, value) pairs stored in the DHT
– any peer can efficiently retrieve the value
associated with a given name
– the mapping from names to values is distributed
among peers
HT as a search table (BitTorrent, Napster)
[Figure: a hash table keyed by a 160-bit index key, mapping content-object/peer naming to its location]
Information/content is distributed, and we need to know where:
– Where is this piece of music?
– Is this BT piece available?
– What is the location of this type of content?
– What is the current IP address of this Skype user?
DHT as a search table
[Figure: the same index-key table, now split into segments distributed across peers]
DHT segment ownership
[Figure: each peer owns one segment of the index-key space]
DHT
Scalable
Peer arrivals, departures, and failures
Unstructured versus structured
DHT (Name, Value)
How to utilize DHT to avoid Trackers in
Bittorrent?
DHT-based Tracker
[Figure: “FreeBSD 5.4 CD images” is hashed to an index key, which is published on the class web site; whoever owns this hash entry is the tracker for the corresponding key, storing the seed’s IP address and serving it via PUT & GET]
Chord
Given a key (content object), it maps the
key onto a peer -- consistent hash
Assign keys to peers.
Solves problem of locating key in a
collection of distributed peers.
Maintains routing information as peers join
and leave the system
Chord
Consistent Hashing
A Simple Key Lookup Algorithm
Scalable Key Lookup Algorithm
Node Joins and Stabilization
Node Failures
Consistent Hashing
A consistent hash function assigns each peer and
each key an m-bit identifier (e.g., 160 bits).
SHA-1 as a base hash function.
A peer’s identifier is defined by hashing the peer’s
IP address. (other possibilities?)
A content identifier is produced by hashing the
key:
– ID(peer) = SHA-1(IP, Port)
– ID(content) = SHA-1(related to the content object)
– Application-dependent!
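The ID assignment can be sketched with Python's hashlib (the exact bytes fed to the hash are application-dependent, as the slide notes; the inputs below are illustrative):

```python
import hashlib

M = 160  # SHA-1 produces 160-bit identifiers

def chord_id(data: bytes) -> int:
    """Map a peer address or content key onto the 2**M identifier space."""
    return int.from_bytes(hashlib.sha1(data).digest(), "big")

peer_id = chord_id(b"169.237.234.1:6881")    # ID(peer) = SHA-1(IP, port)
key_id = chord_id(b"FreeBSD 5.4 CD images")  # ID(content) = SHA-1(content name)
print(peer_id < 2**M and key_id < 2**M)      # → True: both fit on the ring
```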
Peer, Content
In an m-bit identifier space, there are 2^m identifiers (for both peers and content).
Which peer handles which content?
Peer, Content
In an m-bit identifier space, there are 2^m identifiers (for both peers and content).
Which peer handles which contents?
– We will not have 2^m peers/contents!
– Each peer might need to handle more than one content object.
– In that case, which peer has what?
Consistent Hashing
In an m-bit identifier space, there are 2^m identifiers, arranged on an identifier circle modulo 2^m.
The identifier ring is called the Chord ring.
Content X is assigned to the first peer whose identifier is equal to or follows (the identifier of) X in the identifier space.
This peer is the successor peer of key X, denoted by successor(X).
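The successor rule can be sketched directly: with the node identifiers sorted, successor(X) is the first node ID equal to or greater than X, wrapping around the ring modulo 2^m.

```python
from bisect import bisect_left

def successor(x, nodes, m=3):
    """First node whose ID is equal to or follows x on a 2**m ring."""
    ring = sorted(nodes)
    i = bisect_left(ring, x % 2**m)
    return ring[i % len(ring)]          # wrap past the top of the ring

# nodes 0, 1, 3 on a 3-bit ring (the classic Chord example)
print(successor(1, [0, 1, 3]),
      successor(2, [0, 1, 3]),
      successor(6, [0, 1, 3]))          # → 1 3 0
```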
Successor Peers
[Figure: identifier circle modulo 2^3 with nodes 0, 1, 3 and keys 1, 2, 6:
successor(1) = 1, successor(2) = 3, successor(6) = 0]
Join and Departure
When a node N joins the network, certain
contents previously assigned to N’s
successor now become assigned to N.
When node N leaves the network, all of its
assigned contents are reassigned to N’s
successor.
Join
[Figure: a node joins the identifier circle and takes over, from its successor, the keys that now fall to it]
Departure
[Figure: a node leaves the identifier circle and its keys are reassigned to its successor]
Join/Depart
What information must be maintained?
Join/Depart
What information must be maintained?
– Pointer to successor(s)
– Content itself (but application dependent)
Tracker gone?
[Figure: the DHT-based tracker picture again; the peer owning the hash entry is the tracker for the corresponding key, storing the seed’s IP address and serving PUT & GET. What happens when that peer is gone?]
How to identify the tracker?
And, its IP address, of course?
A Simple Key Lookup
A very small amount of routing information suffices
to implement consistent hashing in a distributed
environment
If each node knows only how to contact its current
successor node on the identifier circle, all nodes can
be visited in linear order.
Queries for a given identifier could be passed
around the circle via these successor pointers until
they encounter the node that contains the key.
A Simple Key Lookup
Pseudo code for finding the successor:
// ask node N to find the successor of id
N.find_successor(id)
if (id ∈ (N, successor])
return successor;
else
// forward the query around the circle
return successor.find_successor(id);
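The pseudo code above can be run as a small simulation, with a local successor map standing in for the distributed ring (node IDs taken from the classic Chord example):

```python
def find_successor(n, target, succ, m=6):
    """Follow successor pointers until target falls in (n, successor(n)]."""
    hops = 0
    while True:
        s = succ[n]
        # is target in the half-open ring interval (n, s]?
        in_range = (n < target <= s) if n < s else (target > n or target <= s)
        if in_range:
            return s, hops
        n, hops = s, hops + 1              # forward the query one hop

nodes = [1, 8, 14, 21, 32, 38, 42, 48, 51, 56]
succ = {a: b for a, b in zip(nodes, nodes[1:] + nodes[:1])}
print(find_successor(8, 54, succ))  # → (56, 7): key 54 is stored at node 56
```

Seven hops for ten nodes makes the O(N) cost of the simple scheme concrete.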
A Simple Key Lookup
The path taken by a query from node 8 for key 54 follows successor pointers around the circle.
Successor
Each active node MUST know the IP
address of its successor!
– N8 has to know that the next node on the ring is
N14.
– On N14’s departure, N8’s successor becomes N21
But, how about failure or crash?
Robustness
Successor in R hops
– N8 => N14, N21, N32, N38 (R=4)
– Periodic pinging along the path to check, &
also find out maybe there are “new
members” in between
Is that good enough?
Without Periodic Ping…??
Triggered only by dynamics (Join/Depart)!
Complexity of the search
Time/messages: O(N)
– N: # of nodes on the Ring
Space: O(1)
– We only need to remember R IP addresses
Stabilization depends on the “period”.
Scalable Key Location
To accelerate lookups, Chord maintains
additional routing information.
This additional information is not essential
for correctness, which is achieved as long as
each node knows its correct successor.
Finger Tables
Each node N maintains a routing table with up to m
entries (m is the number of bits in identifiers), called the
finger table.
The ith entry in the table at node N contains the
identity of the first node s that succeeds N by at
least 2^(i-1) on the identifier circle:
s = successor(N + 2^(i-1)).
s is called the ith finger of node N, denoted by
N.finger(i)
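The definition translates directly into code; this sketch builds the finger tables for the 3-bit example ring with nodes 0, 1, and 3:

```python
from bisect import bisect_left

def successor(x, ring, m):
    """First node whose ID is equal to or follows x on a 2**m ring."""
    i = bisect_left(ring, x % 2**m)
    return ring[i % len(ring)]

def finger_table(n, nodes, m=3):
    """finger[i] = successor(n + 2**(i-1)) for i = 1..m."""
    ring = sorted(nodes)
    return [successor(n + 2**(i - 1), ring, m) for i in range(1, m + 1)]

print(finger_table(0, [0, 1, 3]))  # starts 1, 2, 4 → [1, 3, 0]
print(finger_table(1, [0, 1, 3]))  # starts 2, 3, 5 → [3, 3, 0]
print(finger_table(3, [0, 1, 3]))  # starts 4, 5, 7 → [0, 0, 0]
```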
Finger Tables
s = successor(n + 2^(i-1))
[Figure: 3-bit Chord ring with nodes 0, 1, 3 and keys 1, 2, 6.
Finger table of node 0: starts 1, 2, 4 with successors 1, 3, 0 (key 6 stored at node 0).
Finger table of node 1: starts 2, 3, 5 with successors 3, 3, 0 (key 1 stored at node 1).
Finger table of node 3: starts 4, 5, 7 with successors 0, 0, 0 (key 2 stored at node 3).]
Finger Tables
A finger table entry includes both the Chord
identifier and the IP address (and port
number) of the relevant node.
The first finger of N is the immediate
successor of N on the circle.
Example query
The path of a query for key 54 starting at node 8.
Scalable Key Location
Since each node has finger entries at power of two
intervals around the identifier circle, each node
can forward a query at least halfway along the
remaining distance between the node and the
target identifier. From this intuition follows a
theorem:
Theorem: With high probability, the number of nodes
that must be contacted to find a successor in an N-node
network is O(logN).
Complexity of the Search
Time/messages: O(logN)
– N: # of nodes on the Ring
Space: O(logN)
– We need to remember R IP addresses
– We need to remember logN Fingers
Stabilization depends on the “period”.
An Example
M = 140 (identifier size), ring size is 2^140
N = 2^16 (# of nodes)
How many entries do we need in the Finger Table?
Each node n maintains a routing table with up to M entries
(M is the number of bits in identifiers), called the finger table.
The ith entry in the table at node n contains the identity of
the first node s that succeeds n by at least 2^(i-1) on the
identifier circle:
s = successor(n + 2^(i-1)).
Complexity of the Search
Time/messages: O(M)
– M: # of bits of the identifier
Space: O(M)
– We need to remember R IP addresses
– We need to remember M Fingers
Stabilization depends on the “period”.
Structured Peering
Peer identity and routability
– 2^M identifiers, Finger Table routing
Key/content assignment
– Hashing
Dynamics/Failures
– Inconsistency??
Joins and Stabilizations
The most important thing is the successor pointer.
If the successor pointer is ensured to be up to date,
which is sufficient to guarantee correctness of
lookups, then the finger table can always be verified.
Each node runs a “stabilization” protocol
periodically in the background to update successor
pointer and finger table.
Node Joins – stabilize()
Each time node N runs stabilize(), it asks its
successor for its predecessor p, and decides
whether p should be N’s successor instead.
stabilize() notifies node N’s successor of N’s
existence, giving the successor the chance to
change its predecessor to N.
The successor does this only if it knows of no
closer predecessor than N.
Node Joins – stabilize()
// called periodically. verifies N’s immediate
// successor, and tells the successor about N.
N.stabilize()
x = successor.predecessor;
if (x (N, successor))
successor = x;
successor.notify(N);
// N’ thinks it might be our predecessor.
n.notify(N’)
if (predecessor is nil or N’ (predecessor, N))
predecessor = N’;
Stabilization
[Figure: node n joins the ring between np and ns; initially succ(np) = ns and pred(ns) = np, and after stabilization succ(np) = n and pred(ns) = n]
n joins:
– n’s predecessor = nil
– n acquires ns as its successor via some n’
n runs stabilize:
– n notifies ns that it may be ns’s new predecessor
– ns acquires n as its predecessor
np runs stabilize:
– np asks ns for its predecessor (now n)
– np acquires n as its successor
– np notifies n
– n acquires np as its predecessor
All predecessor and successor pointers are now correct.
Fingers still need to be fixed, but old fingers will still work.
fix_fingers()
Each node periodically calls fix_fingers() to
make sure its finger table entries are correct.
It is how new nodes initialize their finger
tables
It is how existing nodes incorporate new
nodes into their finger tables.
Node Joins – fix_fingers()
// called periodically. refreshes finger table entries.
N.fix_fingers()
next = next + 1 ;
if (next > m)
next = 1 ;
finger[next] = find_successor(N + 2next-1);
// checks whether predecessor has failed.
n.check_predecessor()
if (predecessor has failed)
predecessor = nil;
Node Failures
Key step in failure recovery is maintaining correct successor pointers
To help achieve this, each node maintains a successor-list of its r nearest
successors on the ring
If node n notices that its successor has failed, it replaces it with the first
live entry in the list
Successor lists are stabilized as follows:
– node n reconciles its list with its successor s by copying s’s successor list,
removing its last entry, and prepending s to it.
– If node n notices that its successor has failed, it replaces it with the first
live entry in its successor list and reconciles its successor list with its new
successor.
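The reconciliation step described above amounts to simple list surgery; this sketch uses illustrative names and a successor list of r = 4 entries:

```python
def reconcile(successor, succ_list_of_succ, r=4):
    """Copy the successor's list, drop its last entry, prepend the successor."""
    return ([successor] + succ_list_of_succ[:-1])[:r]

# node 8's successor is 14; node 14's own successor list is [21, 32, 38, 42]
print(reconcile(14, [21, 32, 38, 42]))  # → [14, 21, 32, 38]
```

If node 14 then fails, node 8 promotes 21, the first live entry, and reconciles again with 21's list.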
Chord – The Math
Every node is responsible for about K/N keys (N nodes,
K keys)
When a node joins or leaves an N-node network, only
O(K/N) keys change hands (and only to and from
joining or leaving node)
Lookups need O(log N) messages
To reestablish routing invariants and finger tables after
a node joins or leaves, only O(log^2 N) messages are
required
Chord: Questions to ask
Peer’s role (or SP’s role)
Peer’s controllability and vulnerability
Incentives to contribute
Peer’s mobility and dynamics
Scalability
Pros/Cons about DHT