EECS 122:
Introduction to Computer Networks
Overlay Networks, CDNs, and P2P
Networks
Computer Science Division
Department of Electrical Engineering and Computer Sciences
University of California, Berkeley
Berkeley, CA 94720-1776
Katz, Stoica F04
Overlay Networks: Motivations
Changes in the network happen very slowly
Why?
- The Internet is a shared infrastructure; changes require consensus (IETF)
- Many proposals require changing a large number of routers (e.g., IP Multicast, QoS); otherwise end-users won't benefit
Proposed changes that haven't happened yet on a large scale:
- Congestion (RED ‘93); More Addresses (IPv6 ‘91)
- Security (IPSEC ‘93); Multicast (IP multicast ‘90)
Motivations (cont’d)
One size does not fit all
Applications need different levels of
- Reliability
- Performance (latency)
- Security
- Access control (e.g., who is allowed to join a multicast group)
- …
Goals
Make it easy to deploy new functionalities in the
network → accelerate the pace of innovation
Allow users to customize their service
Solution
Deploy processing in the network
Have packets processed as they traverse the
network
[Figure: an overlay network (over IP) running on top of the underlying IP network, which spans multiple ASes.]
Examples
Overlay multicast
Content Distribution Networks (CDNs)
Peer-to-peer systems
Motivation Example: Internet Radio
www.digitallyimported.com (techno station)
- Sends out 128Kb/s MP3 music streams
- Peak usage ~9000 simultaneous streams
• Only 5 unique streams (trance, hard trance, hard
house, eurodance, classical)
- Consumes ~1.1Gb/s
• Bandwidth costs are large fraction of their
expenditures (maybe 50%?)
- If 1000 people are getting their groove on in Berkeley,
1000 unicast streams are sent from NYC to Berkeley
This approach does not scale…
[Figure: the broadcast center sends a separate unicast stream across the backbone ISP to each client ISP and receiver.]
Multicast Service Model
[Figure: with unicast, the source S sends a separate copy [Ri, data] across the network to each receiver R0, R1, …, Rn-1; with multicast, S sends a single copy [G, data] addressed to the group G and the network delivers it to every receiver.]
Receivers join a multicast group which is
identified by a multicast address (e.g. G)
Sender(s) send data to address G
Network routes data to each of the receivers
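To make the service model concrete, here is a minimal Python sketch of the standard socket calls a receiver and a sender would use; the group address 224.1.1.1 and port 5007 are arbitrary example values, not part of the slides.

```python
# Minimal sketch of the IP multicast service model using the standard
# socket API. The group address and port below are arbitrary examples.
import socket
import struct

GROUP = "224.1.1.1"   # multicast group address G (example value)
PORT = 5007

def receiver():
    """Join group G and wait for data the network routes to us."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", PORT))
    # IP_ADD_MEMBERSHIP tells the local network we want traffic for G.
    mreq = struct.pack("4sl", socket.inet_aton(GROUP), socket.INADDR_ANY)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    data, sender_addr = sock.recvfrom(1500)
    print("received", len(data), "bytes from", sender_addr)

def sender():
    """Send a single packet to address G; the network replicates as needed."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 2)
    sock.sendto(b"[G, data]", (GROUP, PORT))
```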
Instead build trees
Copy data at routers
At most one copy of a data packet per link
- Routers keep track of groups in real time
- "Path" computation becomes tree computation
- LANs implement layer-2 multicast by broadcasting
[Figure: the broadcast center's stream is replicated at routers inside the backbone and the receiver ISPs.]
Multicast Primer
Types of trees
- Source-specific trees
- Shared trees
Examples
- Distance Vector Multicast Routing Protocol (DVMRP) – source-specific trees
- Core Based Tree (CBT) – shared trees
- Protocol Independent Multicast (PIM)
• Sparse mode → shared trees
• Dense mode → single-source trees
Source Specific Trees
Each source is the root of its own tree
[Figure: an example topology (nodes 1–13) with the multicast tree rooted at one source highlighted.]
Source Specific Trees
Each source is the root of its own tree; one tree for each source
Can pick "good" trees, but lots of state at the routers!
[Figure: the same example topology, now with a separate tree rooted at each source.]
Shared Tree
One tree used by all sources
Can't pick "good" trees, but minimal state at the routers
[Figure: the same topology with a single shared tree used by all sources.]
IP Multicast Problems
Fifteen years of research, but still not widely deployed
Poor scalability
- Routers need to maintain per-group, or even per-group and per-sender, state!
- Aggregation of multicast addresses is complicated
Supporting higher level functionality is difficult
- IP Multicast: best-effort multi-point delivery service
- Reliability and congestion control for IP Multicast are complicated
• Need to deal with heterogeneous receivers → negotiation is hard
No support for access control
- No restriction on who can send → very easy to mount Denial of Service (DoS) attacks!
Overlay Approach
Provide IP multicast functionality above the IP layer → application-level multicast
Challenge: do this efficiently
Projects:
- Narada
- Overcast
- Scattercast
- Yoid
- …
Narada [Yang-hua et al, 2000]
Source-specific trees
Involves only end hosts
Small group sizes <= hundreds of nodes
Typical application: chat
Narada: End System Multicast
[Figure: the physical topology connecting end hosts at CMU, Stanford (Stan1, Stan2), Berkeley (Berk1, Berk2), and Gatech, and the overlay tree built only among those end hosts on top of it.]
Performance Concerns
Stretch
- Ratio of latency in the overlay to latency in the
underlying network
Stress
- Number of duplicate packets sent over the same
physical link
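A small illustrative sketch of how stretch and stress could be computed for an overlay tree, assuming we know which physical links each overlay edge traverses; the topology, latencies, and function names below are made up for illustration.

```python
# Illustrative sketch: compute stress and stretch for an overlay tree,
# given the physical links (and latencies) behind each overlay edge.
# All hosts, routers, latencies, and paths below are made-up examples.
from collections import Counter

underlay_latency = {("CMU", "R1"): 5, ("R1", "Stan1"): 20, ("Stan1", "Stan2"): 1,
                    ("R1", "Berk1"): 25, ("Berk1", "Berk2"): 1}

underlay_path = {                     # physical links crossed by each overlay edge
    ("CMU", "Stan1"): [("CMU", "R1"), ("R1", "Stan1")],
    ("Stan1", "Stan2"): [("Stan1", "Stan2")],
    ("CMU", "Berk1"): [("CMU", "R1"), ("R1", "Berk1")],
    ("Berk1", "Berk2"): [("Berk1", "Berk2")],
}

def stress(overlay_edges):
    """Copies of the same packet crossing each physical link."""
    counts = Counter()
    for edge in overlay_edges:
        counts.update(underlay_path[edge])
    return counts

def latency(links):
    return sum(underlay_latency[l] for l in links)

def stretch(overlay_route, direct_links):
    """Overlay latency divided by the latency of the direct underlay route."""
    overlay = sum(latency(underlay_path[e]) for e in overlay_route)
    return overlay / latency(direct_links)

tree = list(underlay_path)            # the overlay tree uses all four edges
print(stress(tree))                   # link ("CMU", "R1") carries 2 copies
print(stretch([("CMU", "Stan1"), ("Stan1", "Stan2")],      # CMU -> Stan2 via Stan1
              [("CMU", "R1"), ("R1", "Stan1"), ("Stan1", "Stan2")]))
```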
Performance Concerns
[Figure: in the overlay tree, the delay from CMU to Berk1 increases (stretch), and duplicate packets are sent over the same physical links, wasting bandwidth (stress).]
Properties
Easier to deploy than IP Multicast
- Don’t have to modify every router on path
Easier to implement reliability than IP Multicast
- Use hop-by-hop retransmissions
Can consume more bandwidth than IP Multicast
Can have higher latency than IP Multicast
Not clear how well it scales
- Neither has been used for a group with 1M receivers or
1M groups
Can use IP Multicast where available to optimize
performance
Examples
Overlay Multicast
Content Distribution Networks (CDNs)
Peer-to-peer systems
Content Distribution Networks
Problem: You are a web content provider
- How do you handle millions of web clients?
- How do you ensure that all clients experience good
performance?
- How do you maintain availability in the presence of server
and network failures?
Solutions:
- Add more servers at different locations → if you are CNN this might work!
- Caching
- Content Distribution Networks
“Base-line”
Many clients transfer same information
- Generate unnecessary server and network load
- Clients experience unnecessary latency
[Figure: a single server reached through the backbone ISP by clients in ISP-1 and ISP-2; every client transfer goes all the way to the server.]
Reverse Caches
Cache documents close to the server → decrease server load
Typically done by content providers
[Figure: reverse caches placed next to the server, between it and the backbone ISP, in front of the client ISPs.]
Forward Proxies
Cache documents close to clients → reduce network traffic and decrease latency
Typically done by ISPs or corporate LANs
[Figure: forward caches placed inside ISP-1 and ISP-2, close to the clients, in addition to the reverse caches near the server.]
Content Distribution Networks
(CDNs)
Integrate forward and reverse caching functionality into one overlay network, (usually) administered by a single entity
- Example: Akamai
Documents are cached both
- As a result of clients’ requests (pull)
- Pushed in the expectation of a high access rate
Besides caching, do processing, e.g.,
- Handle dynamic web pages
- Transcoding
CDNs (cont’d)
[Figure: CDN servers deployed between the origin server and the client ISPs, alongside the ISPs' forward caches.]
Example: Akamai
Akamai creates new domain names for each client
content provider.
- e.g., a128.g.akamai.net
The CDN’s DNS servers are authoritative for the
new domains
The client content provider modifies its content so
that embedded URLs reference the new domains.
- "Akamaize" content, e.g.: http://www.cnn.com/image-of-the-day.gif becomes http://a128.g.akamai.net/image-of-the-day.gif.
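A toy sketch of the URL-rewriting ("Akamaizing") step; the domain a128.g.akamai.net and the CNN URL come from the slide, while the function and regular expression are illustrative, not Akamai's actual mechanism.

```python
# Illustrative sketch: rewrite embedded URLs so they point at the
# CDN-assigned domain instead of the origin server.
import re

CDN_DOMAIN = "a128.g.akamai.net"   # domain created by the CDN for this provider

def akamaize(html: str, origin: str = "www.cnn.com") -> str:
    """Rewrite http://<origin>/... references to http://<CDN_DOMAIN>/..."""
    return re.sub(r"http://" + re.escape(origin), "http://" + CDN_DOMAIN, html)

page = '<img src="http://www.cnn.com/image-of-the-day.gif">'
print(akamaize(page))
# <img src="http://a128.g.akamai.net/image-of-the-day.gif">
```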
Example: Akamai
www.nhc.noaa.gov "Akamaizes" its content.
[Figure: a client gets http://www.nhc.noaa.gov, resolved through the DNS server for nhc.noaa.gov; the "Akamaized" response object has inline URLs for secondary content at a128.g.akamai.net and other Akamai-managed DNS names; the client's local DNS server then looks up a128.g.akamai.net at the akamai.net DNS servers, and Akamai servers store/cache the secondary content for "Akamaized" services.]
Examples
Overlay Multicast
Content Distribution Networks (CDNs)
Peer-to-peer systems
Napster, Gnutella, KaZaa, DHTs
Skype, BitTorrent (next lecture)
How Did it Start?
A killer application: Napster
- Free music over the Internet
Key idea: share the storage and bandwidth of
individual (home) users
[Figure: home users' machines connected through the Internet, sharing their storage and bandwidth with one another.]
Model
Each user stores a subset of files
Each user has access (can download) files from
all users in the system
Main Challenge
Find where a particular file is stored
- Note: problem similar to finding a particular page in web
caching (see last lecture – what are the differences?)
[Figure: peers holding files A–F; one peer issues the query "E?" to find which peer stores file E.]
Other Challenges
Scale: up to hundreds of thousands or millions of machines
Dynamicity: machines can come and go at any time
Napster
Assume a centralized index system that maps
files (songs) to machines that are alive
How to find a file (song)
- Query the index system → it returns a machine that stores the required file
• Ideally this is the closest/least-loaded machine
- ftp the file
Advantages:
- Simplicity, easy to implement sophisticated search
engines on top of the index system
Disadvantages:
- Robustness, scalability (?)
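A minimal sketch of such a centralized index, assuming a simple in-memory map from file name to the live machines storing it; the class and machine names are illustrative.

```python
# Toy sketch of a Napster-style centralized index: the server maps each
# file (song) to the machines that currently store it; clients query the
# index and then fetch the file directly from one of the returned machines.

class CentralIndex:
    def __init__(self):
        self.locations = {}          # file name -> set of machine addresses

    def register(self, machine, files):
        """A machine joins and advertises the files it stores."""
        for f in files:
            self.locations.setdefault(f, set()).add(machine)

    def unregister(self, machine):
        """A machine leaves; drop it from every entry."""
        for machines in self.locations.values():
            machines.discard(machine)

    def lookup(self, filename):
        """Return some machine storing the file (ideally closest/least loaded)."""
        machines = self.locations.get(filename, set())
        return next(iter(machines), None)

index = CentralIndex()
index.register("m5", ["E"])
index.register("m6", ["F"])
print(index.lookup("E"))   # -> 'm5'; the client then transfers E directly from m5
```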
Napster: Example
[Figure: machines m1–m6 store files A–F (A on m1, B on m2, …, F on m6); the centralized index maps each file to the machine that stores it. A client asks the index "E?", the index answers "m5", and the client then downloads E directly from m5.]
Gnutella
Distribute file location
Idea: broadcast the request
How to find a file:
- Send request to all neighbors
- Neighbors recursively multicast the request
- Eventually a machine that has the file receives the request,
and it sends back the answer
Advantages:
- Totally decentralized, highly robust
Disadvantages:
- Not scalable; the entire network can be swamped with
requests (to alleviate this problem, each request has a TTL)
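A toy simulation of this flooding with a TTL; the overlay topology, TTL value, and function names are made up for illustration.

```python
# Toy simulation of Gnutella-style flooding: forward the query to all
# neighbors, decrementing a TTL so the request eventually dies out.

neighbors = {                      # example overlay topology
    "m1": ["m2", "m3"], "m2": ["m1"], "m3": ["m1", "m4", "m5"],
    "m4": ["m3"], "m5": ["m3", "m6"], "m6": ["m5"],
}
stores = {"m5": {"E"}, "m6": {"F"}}   # which machine stores which files

def flood(node, filename, ttl, seen=None):
    """Return the set of machines found to store `filename`."""
    seen = set() if seen is None else seen
    if node in seen or ttl < 0:
        return set()
    seen.add(node)
    found = {node} if filename in stores.get(node, set()) else set()
    for peer in neighbors[node]:                 # recursively flood the request
        found |= flood(peer, filename, ttl - 1, seen)
    return found

print(flood("m1", "E", ttl=3))    # -> {'m5'} (reachable within the TTL)
```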
Gnutella: Example
Assume: m1’s neighbors are m2 and m3; m3’s
neighbors are m4 and m5;…
[Figure: m1 sends the query "E?" to its neighbors m2 and m3; the query is recursively flooded (m3 forwards it to m4 and m5, and so on) until it reaches m5, which stores E and sends back the answer.]
Two-Level Hierarchy
Current Gnutella implementation,
KaZaa
Leaf nodes are connected to a small number of ultrapeers (supernodes)
- A leaf sends its query to its ultrapeers
- If the ultrapeers don't know the answer, they flood the query to other ultrapeers
More scalable:
- Flooding only among ultrapeers
[Figure: an October 2003 crawl of Gnutella, showing ultrapeer nodes in the core and leaf nodes at the edge; a query travels from a leaf to its ultrapeer and is then flooded among ultrapeers.]
Skype
Peer-to-peer Internet telephony
Two-level hierarchy like KaZaa
- Ultrapeers used mainly to route traffic between NATed end-hosts (see next slide)…
- … plus a login server to
• authenticate users
• ensure that names are unique across the network
[Figure: hosts A and B exchange messages with the login server; data traffic flows between the end hosts.]
(Note: probable protocol; the Skype protocol is not published.)
Detour: NAT (1/3)
Network Address Translation:
Motivation: address scarcity problem in IPv4
Allows addresses to be allocated independently to hosts behind a NAT
- Two hosts behind two different NATs can have the same address
[Figure: two NAT boxes, with public addresses 128.2.12.30 and 169.32.41.10, connect private hosts (e.g., 192.168.0.1 and 192.168.0.2) to the Internet and to a server at 64.36.12.64; hosts behind different NATs can use the same private address (192.168.0.1 appears behind both).]
Detour: NAT (2/3)
Main idea: use port numbers to multiplex/demultiplex
connections of NATed end-hosts
- Map (IPaddr, Port) of a NATed host to (IPaddrNAT, PortNAT)
Example: host 192.168.0.1 (behind a NAT whose public address is 169.32.41.10) talks to server 64.36.12.64. Packets are written as src addr:src port → dst addr:dst port.
1. Host sends: 192.168.0.1:1005 → 64.36.12.64:80
2. NAT table entry created: (192.168.0.1:1005) ↔ 78
3. NAT forwards: 169.32.41.10:78 → 64.36.12.64:80
4. Server replies: 64.36.12.64:80 → 169.32.41.10:78
5. NAT rewrites and delivers: 64.36.12.64:80 → 192.168.0.1:1005
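A sketch of the translation logic, assuming the NAT keeps a table from (private IP, private port) to an externally visible port; the addresses and port numbers are taken from the example above, but the class itself is illustrative.

```python
# Toy sketch of port translation: the NAT rewrites the source of outgoing
# packets and the destination of incoming packets using its table.

NAT_IP = "169.32.41.10"

class NAT:
    def __init__(self):
        self.table = {}        # (private_ip, private_port) -> external_port
        self.reverse = {}      # external_port -> (private_ip, private_port)
        self.next_port = 78    # example starting port, as in the figure

    def outgoing(self, src_ip, src_port, dst_ip, dst_port):
        key = (src_ip, src_port)
        if key not in self.table:               # allocate a new external port
            self.table[key] = self.next_port
            self.reverse[self.next_port] = key
            self.next_port += 1
        return (NAT_IP, self.table[key], dst_ip, dst_port)

    def incoming(self, src_ip, src_port, dst_ip, dst_port):
        priv_ip, priv_port = self.reverse[dst_port]   # demultiplex on the port
        return (src_ip, src_port, priv_ip, priv_port)

nat = NAT()
# Host 192.168.0.1:1005 sends to web server 64.36.12.64:80 ...
print(nat.outgoing("192.168.0.1", 1005, "64.36.12.64", 80))
# ... and the server's reply to 169.32.41.10:78 is mapped back to the host.
print(nat.incoming("64.36.12.64", 80, NAT_IP, 78))
```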
Detour: NAT (3/3)
Limitations
1) Number of machines behind a NAT <= 64000. Why?
2) A host outside NAT cannot initiate connection to a host
behind a NAT
Skype and other P2P systems use
- Login servers and ultrapeers to solve limitation (2)
- How? (Hint: ultrapeers have globally unique (Internet-routable) IP addresses)
BitTorrent (1/2)
Allow fast downloads even when sources have
low connectivity
How does it work?
- Split each file into pieces (~ 256 KB each), and each
piece into sub-pieces (~ 16 KB each)
- The loader loads one piece at a time
- Within one piece, the loader can load up to five sub-pieces in parallel
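A quick sketch of the piece/sub-piece arithmetic; the 256 KB and 16 KB sizes come from the slide, the helper functions are illustrative.

```python
# Split a file of `size` bytes into pieces and sub-pieces, using the
# approximate sizes mentioned above (256 KB pieces, 16 KB sub-pieces).
PIECE = 256 * 1024
SUB = 16 * 1024

def pieces(size):
    """Return (offset, length) ranges, one per piece."""
    return [(off, min(PIECE, size - off)) for off in range(0, size, PIECE)]

def sub_pieces(piece_off, piece_len):
    """Return (offset, length) ranges, one per sub-piece within a piece."""
    return [(piece_off + off, min(SUB, piece_len - off))
            for off in range(0, piece_len, SUB)]

size = 700 * 1024                           # example: a 700 KB file
print(len(pieces(size)))                    # 3 pieces (256 + 256 + 188 KB)
print(len(sub_pieces(*pieces(size)[-1])))   # the last piece has 12 sub-pieces
```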
BitTorrent (2/2)
Download consists of three phases:
Start: get a piece as soon as possible
- Select a random piece
Middle: spread all pieces as soon as possible
- Select rarest piece next
End: avoid getting stuck with a slow source,
when downloading the last sub-pieces
- Request the same sub-piece from several sources in parallel
- Cancel the slowest downloads once the sub-piece has been received
(For details see: http://bittorrent.com/bittorrentecon.pdf)
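A hedged sketch of the three selection policies; the data structures and function names are illustrative and not BitTorrent's actual implementation.

```python
# Illustrative piece-selection policies for the three download phases.
import random
from collections import Counter

def pick_start(missing):
    """Start phase: get any piece as soon as possible -> pick at random."""
    return random.choice(sorted(missing))

def pick_rarest(missing, peer_has):
    """Middle phase: rarest-first. peer_has maps peer -> set of piece ids."""
    counts = Counter()
    for pieces in peer_has.values():
        counts.update(pieces & missing)
    return min(counts, key=counts.get) if counts else None

def pick_endgame(last_sub_pieces, peers):
    """End phase: request the same remaining sub-pieces from several peers
    in parallel; the slower transfers are cancelled once one copy arrives."""
    return [(sub, peer) for sub in last_sub_pieces for peer in peers]

missing = {0, 1, 2, 3}
peer_has = {"p1": {0, 1, 2}, "p2": {0, 1}, "p3": {0}}
print(pick_rarest(missing, peer_has))   # -> 2 (held by only one peer)
```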
Distributed Hash Tables
Problem:
- Given an ID, map to a host
Challenges
- Scalability: hundreds of thousands or millions of machines
- Instability
• Changes in routes, congestion, availability of machines
- Heterogeneity
• Latency: 1ms to 1000ms
• Bandwidth: 32Kb/s to 100Mb/s
• Nodes stay in system from 10s to a year
- Trust
• Selfish users
• Malicious users
Content Addressable Network
(CAN)
Associate with each node and item a unique id in a d-dimensional space
Properties
- Routing table size: O(d)
- Guarantees that a file is found in at most d·n^(1/d) steps, where n is the total number of nodes
CAN Example: Two Dimensional
Space
Space is divided among the nodes; together the nodes cover the entire space
Each node covers either a square or a rectangular area with aspect ratio 1:2 or 2:1
Example:
- Assume the space is 8 x 8
- Node n1:(1, 2) is the first node to join → it covers the entire space
[Figure: the 8 x 8 coordinate space with n1 at (1, 2) owning the whole space.]
CAN Example: Two Dimensional
Space
Node n2:(4, 2) joins → the space is divided between n1 and n2
[Figure: the 8 x 8 space split into two zones, one owned by n1 and one by n2.]
CAN Example: Two Dimensional
Space
Node n2:(4, 2) joins → the space is divided between n1 and n2
[Figure: the same space after a third node, n3:(3, 5), has also joined and taken over part of n1's zone.]
CAN Example: Two Dimensional
Space
Nodes n4:(5, 5) and n5:(6, 6) join
[Figure: the space now divided among n1, n2, n3, n4, and n5.]
CAN Example: Two Dimensional
Space
Nodes: n1:(1, 2); n2:(4, 2); n3:(3, 5); n4:(5, 5); n5:(6, 6)
Items: f1:(2, 3); f2:(5, 1); f3:(2, 1); f4:(7, 5)
[Figure: the five nodes' zones with the four items placed at their coordinates.]
CAN Example: Two Dimensional
Space
Each item is stored by the node that owns its mapping in the space
[Figure: each item f1–f4 stored at the node whose zone contains its coordinates.]
CAN: Query Example
Each node knows its neighbors in the
d-space
Forward query to the neighbor that is
closest to the query id
Example: assume n1 queries f4
[Figure: the query for f4:(7, 5) is forwarded greedily from n1, neighbor by neighbor, toward the node whose zone contains f4.]
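A toy sketch of this greedy forwarding; the node coordinates come from the example above, but the neighbor map and the distance metric are illustrative rather than the exact zone adjacencies of the figure.

```python
# Toy sketch of CAN-style greedy routing in a 2-d space: each node knows
# only its neighbors and forwards the query to the neighbor closest to the
# query point. Coordinates come from the example; the neighbor map below
# is illustrative, not the exact zone adjacency of the figure.

nodes = {"n1": (1, 2), "n2": (4, 2), "n3": (3, 5), "n4": (5, 5), "n5": (6, 6)}
neighbors = {"n1": ["n2", "n3"], "n2": ["n1", "n3", "n4"],
             "n3": ["n1", "n2", "n4"], "n4": ["n2", "n3", "n5"], "n5": ["n4"]}

def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def route(start, target):
    """Greedily forward toward `target`; return the list of hops taken."""
    path, current = [start], start
    while True:
        best = min([current] + neighbors[current],
                   key=lambda n: dist2(nodes[n], target))
        if best == current:    # no neighbor is closer -> this node owns the point
            return path
        path.append(best)
        current = best

print(route("n1", (7, 5)))     # query for f4:(7, 5) -> ['n1', 'n3', 'n4', 'n5']
```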
Chord
Associate with each node and item a unique ID in a one-dimensional space
Properties
- Routing table size O(log(N)) , where N is the total
number of nodes
- Guarantees that a file is found in O(log(N)) steps
Data Structure
Assume the identifier space is 0..2^m
Each node maintains
- Finger table
• Entry i in the finger table of n is the first node that succeeds or equals n + 2^i
- Predecessor node
An item identified by id is stored on the successor node of id
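A small sketch of how a finger table could be built from this definition, assuming m = 3 (identifiers 0..7, as in the example that follows); the helper names are illustrative.

```python
# Sketch: build the finger table of a Chord node. Identifiers live in
# 0 .. 2**M - 1; entry i points to the first node that succeeds or equals
# (n + 2**i) mod 2**M.

M = 3                                    # m = 3 -> identifier space 0..7

def successor(ident, nodes):
    """First node id clockwise from `ident` (wrapping around the ring)."""
    nodes = sorted(nodes)
    for n in nodes:
        if n >= ident:
            return n
    return nodes[0]                      # wrap around

def finger_table(n, nodes):
    return [(i, (n + 2**i) % 2**M, successor((n + 2**i) % 2**M, nodes))
            for i in range(M)]

# A single node with id 1 in the system: all entries point back to itself.
print(finger_table(1, [1]))   # [(0, 2, 1), (1, 3, 1), (2, 5, 1)]
```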
Chord Example
Assume an identifier space of size 8 (ids 0..7, m = 3)
Node n1:(1) joins → all entries in its finger table are initialized to itself

Succ. table of n1:(1):
  i | id+2^i | succ
  0 |   2    |  1
  1 |   3    |  1
  2 |   5    |  1

[Figure: the identifier circle 0–7 with node 1 marked.]
Chord Example
Node n2:(3) joins

Succ. table of n1:(1):
  i | id+2^i | succ
  0 |   2    |  2
  1 |   3    |  1
  2 |   5    |  1

Succ. table of n2:
  i | id+2^i | succ
  0 |   3    |  1
  1 |   4    |  1
  2 |   6    |  1

[Figure: the identifier circle 0–7 with both nodes marked.]
Chord Example
Nodes n3:(0), n4:(6) join

Succ. table of n3:(0):
  i | id+2^i | succ
  0 |   1    |  1
  1 |   2    |  2
  2 |   4    |  0

Succ. table of n1:(1):
  i | id+2^i | succ
  0 |   2    |  2
  1 |   3    |  6
  2 |   5    |  6

Succ. table of n4:(6):
  i | id+2^i | succ
  0 |   7    |  0
  1 |   0    |  0
  2 |   2    |  2

Succ. table of n2:
  i | id+2^i | succ
  0 |   3    |  6
  1 |   4    |  6
  2 |   6    |  6

[Figure: the identifier circle 0–7 with all four nodes marked.]
Chord Examples
Nodes: n1:(1), n2:(3), n3:(0), n4:(6)
Items: f1:(7), f2:(2)

[Figure: the identifier circle 0–7 with the four nodes, each node's successor table as on the previous slide, and the items stored at their successor nodes: f1:(7) at n3:(0) and f2:(2) at n2.]
Query
Upon receiving a query for item id, a node
- Checks whether it stores the item locally
- If not, forwards the query to the largest node in its successor table that does not exceed id
[Figure: the same identifier circle and successor tables; query(7) is forwarded along the ring until it reaches n3:(0), which stores item f1:(7).]
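A sketch of the forwarding rule above, implemented with circular (mod 2^m) interval arithmetic; node ids follow the example caption (0, 1, 3, 6), but the tables are recomputed here, so treat the exact hops as illustrative.

```python
# Sketch of the lookup rule above, using circular (mod 2**M) interval
# arithmetic. Node ids follow the example caption (0, 1, 3, 6); the finger
# tables are recomputed here, so treat the exact hops as illustrative.
M = 3
NODES = sorted([0, 1, 3, 6])

def successor(ident):
    """First node clockwise from `ident`: the node that stores item `ident`."""
    return next((n for n in NODES if n >= ident % 2**M), NODES[0])

def in_interval(x, a, b):
    """True if x lies in the circular interval (a, b]."""
    return (a < x <= b) if a < b else (x > a or x <= b)

def fingers(node):
    return [successor((node + 2**i) % 2**M) for i in range(M)]

def lookup(start, item):
    """Return (node storing `item`, path the query follows)."""
    node, path = start, [start]
    while successor(item) != node:
        preceding = [f for f in fingers(node) if in_interval(f, node, item)]
        if preceding:                  # furthest table entry that precedes the item
            node = max(preceding, key=lambda f: (f - node) % 2**M)
        else:                          # item lies just past this node's successor
            node = fingers(node)[0]
        path.append(node)
    return node, path

print(lookup(1, 7))   # query(7) issued at node 1 -> (0, [1, 6, 0])
```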
Discussion
Query can be implemented
- Iteratively
- Recursively
Performance: routing in the overlay network can be more
expensive than in the underlying network
- Because usually there is no correlation between node IDs and their locality; a query can repeatedly jump from Europe to North America, even though both the initiator and the node that stores the item are in Europe!
- Solutions: Tapestry takes care of this implicitly; CAN and Chord
maintain multiple copies for each entry in their routing tables and
choose the closest in terms of network distance
Discussion
Robustness
- Maintain multiple copies associated to each entry in the routing
tables
- Replicate an item on nodes with close ids in the identifier space
Security
- Can be built on top of CAN, Chord, Tapestry, and Pastry
Discussion
The key challenge of building wide area P2P systems is a
scalable and robust location service
Napster: centralized solution
- Guarantees correctness and supports approximate matching…
- …but is neither scalable nor robust
Gnutella, KaZaa
- Support approximate queries, scalable, and robust…
- …but don't guarantee correctness (i.e., they may fail to locate an existing file)
Distributed Hash Tables
- Guarantee correctness, highly scalable and robust…
- … but difficult to implement approximate matching