EECS 122:
Introduction to Computer Networks
Overlay Networks, CDNs, and P2P
Networks
Computer Science Division
Department of Electrical Engineering and Computer Sciences
University of California, Berkeley
Berkeley, CA 94720-1776
Katz, Stoica F04
Overlay Networks: Motivations


Changes in the network happen very slowly
Why?
- The Internet is a shared infrastructure; changes require consensus (IETF)
- Many proposals require changing a large number of routers (e.g., IP Multicast, QoS); otherwise end-users won't benefit

Proposed changes that haven't happened yet on a large scale:
- Congestion control (RED '93); more addresses (IPv6 '91)
- Security (IPsec '93); multicast (IP multicast '90)
Motivations (cont’d)


One size does not fit all
Applications need different levels of:
- Reliability
- Performance (latency)
- Security
- Access control (e.g., who is allowed to join a multicast group)
- ...
Goals


Make it easy to deploy new functionalities in the network → accelerate the pace of innovation
Allow users to customize their service
Solution


Deploy processing in the network
Have packets processed as they traverse the network

[Figure: an overlay network built on top of IP, spanning end hosts across multiple ASes]
Examples



Overlay multicast
Content Distribution Networks (CDNs)
Peer-to-peer systems
Motivation Example: Internet Radio

www.digitallyimported.com (techno station)
- Sends out 128Kb/s MP3 music streams
- Peak usage ~9000 simultaneous streams
• Only 5 unique streams (trance, hard trance, hard
house, eurodance, classical)
- Consumes ~1.1Gb/s
• Bandwidth costs are large fraction of their
expenditures (maybe 50%?)
- If 1000 people are getting their groove on in Berkeley,
1000 unicast streams are sent from NYC to Berkeley
Katz, Stoica F04
7
This approach does not scale…
[Figure: a single broadcast center unicasting a separate stream through the backbone to every client at each ISP]
Multicast Service Model
Unicast: the source S sends a separate copy [Ri, data] to each receiver R0, R1, ..., Rn-1.
Multicast: S sends a single copy [G, data]; the network delivers it to every receiver in group G.

Receivers join a multicast group, which is identified by a multicast address (e.g., G)
Sender(s) send data to address G
The network routes the data to each of the receivers
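
To make the model concrete, here is a minimal sketch, in Python with the standard socket API, of how a receiver joins group G and a sender transmits to it; the group address 224.1.1.1 and port 5007 are arbitrary examples, not part of the slides.

```python
import socket
import struct

GROUP, PORT = "224.1.1.1", 5007          # example multicast group G

# Receiver: join group G; the network, not the sender, delivers copies.
rsock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
rsock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
rsock.bind(("", PORT))
mreq = struct.pack("4sl", socket.inet_aton(GROUP), socket.INADDR_ANY)
rsock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)

# Sender: one send of [G, data]; no per-receiver state at the source.
ssock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
ssock.sendto(b"data", (GROUP, PORT))

print(rsock.recv(1024))                  # every joined receiver gets a copy
```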
Instead build trees
Copy data at routers: at most one copy of a data packet per link
- Routers keep track of groups in real time
- "Path" computation becomes tree computation
- LANs implement layer-2 multicast by broadcasting

[Figure: the broadcast center sends one stream into the backbone; routers replicate it along a tree toward each ISP]
Multicast Primer

Types of trees
- Source-specific trees
- Shared trees

Examples
- Distance Vector Multicast Routing Protocol (DVMRP) → source-specific trees
- Core Based Tree (CBT) → shared trees
- Protocol Independent Multicast (PIM)
  • Sparse mode → shared trees
  • Dense mode → source-specific trees
Source Specific Trees
Each source is the root of its own tree; one tree for each source.

[Figure: a 13-router topology, with a separate delivery tree rooted at each source]

Can pick "good" trees, but lots of state at the routers!
Shared Tree
One tree used by all sources.

[Figure: the same 13-router topology with a single shared tree]

Can't pick "good" trees, but minimal state at the routers.
IP Multicast Problems


Fifteen years of research, but still not widely deployed
Poor scalability
- Routers need to maintain per-group or even per-group, per-sender state!
- Aggregation of multicast addresses is complicated

Supporting higher-level functionality is difficult
- IP Multicast: best-effort multi-point delivery service
- Reliability and congestion control for IP Multicast are complicated
  • Need to deal with heterogeneous receivers → negotiation is hard

No support for access control
- No restriction on who can send → very easy to mount Denial of Service (DoS) attacks!
Overlay Approach



Provide IP multicast functionality above the IP layer → application-level multicast
Challenge: do this efficiently
Projects:
- Narada
- Overcast
- Scattercast
- Yoid
- ...
Narada [Yang-hua et al, 2000]




Source-specific trees
Involves only end hosts
Small group sizes (up to hundreds of nodes)
Typical application: chat
Narada: End System Multicast
[Figure: the physical topology connecting Stan1 and Stan2 (Stanford), Berk1 and Berk2 (Berkeley), CMU, and Gatech, and the overlay tree built among the same end hosts on top of it]
Performance Concerns

Stretch
- The ratio of the latency in the overlay to the latency in the underlying network

Stress
- The number of duplicate packets sent over the same physical link
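
A toy calculation of both metrics; the latencies, hosts, and link names below are made-up inputs, and the physical path of each overlay edge is assumed to be known:

```python
from collections import Counter

# Made-up inputs: unicast latency (ms) between hosts, and the physical
# links each overlay edge travels over.
lat = {("CMU", "Stan1"): 40, ("Stan1", "Berk1"): 10, ("CMU", "Berk1"): 45}
links = {("CMU", "Stan1"): ["cmu-core", "core-stan"],
         ("Stan1", "Berk1"): ["stan-lan", "stan-berk"],
         ("Stan1", "Stan2"): ["stan-lan"]}       # Stan1 relays to both

# Stretch of CMU -> Berk1 when the overlay relays through Stan1:
stretch = (lat["CMU", "Stan1"] + lat["Stan1", "Berk1"]) / lat["CMU", "Berk1"]
print(f"stretch = {stretch:.2f}")                # 1.11: the detour adds latency

# Stress: how many overlay edges cross each physical link.
stress = Counter(l for ls in links.values() for l in ls)
print(stress["stan-lan"])                        # 2: duplicate packets on one link
```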
Performance Concerns
[Figure: in the overlay tree, the delay from CMU to Berk1 increases (stretch), and duplicate packets on the links near Stanford waste bandwidth (stress)]
Properties

Easier to deploy than IP Multicast
- Don't have to modify every router on the path

Easier to implement reliability than IP Multicast
- Use hop-by-hop retransmissions

Can consume more bandwidth than IP Multicast
Can have higher latency than IP Multicast
Not clear how well it scales
- Neither has been used for a group with 1M receivers or 1M groups

Can use IP Multicast where available to optimize performance
Examples



Overlay Multicast
Content Distribution Networks (CDNs)
Peer-to-peer systems
Content Distribution Networks

Problem: You are a web content provider
- How do you handle millions of web clients?
- How do you ensure that all clients experience good performance?
- How do you maintain availability in the presence of server and network failures?

Solutions:
- Add more servers at different locations → if you are CNN, this might work!
- Caching
- Content Distribution Networks
“Base-line”

Many clients transfer the same information
- Generates unnecessary server and network load
- Clients experience unnecessary latency

[Figure: a single server behind a backbone ISP serving every client at ISP-1 and ISP-2 directly]
Reverse Caches


Cache documents close to the server → decreases server load
Typically done by content providers

[Figure: reverse caches deployed in front of the server, between it and the backbone ISP]
Forward Proxies


Cache documents close to clients → reduces network traffic and decreases latency
Typically done by ISPs or corporate LANs

[Figure: forward caches at ISP-1 and ISP-2, in addition to the reverse caches in front of the server]
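
A minimal sketch of the forward-cache idea; a real proxy also honors HTTP cache-control and expiration headers, which this ignores:

```python
import urllib.request

cache = {}   # URL -> response body; real proxies also track expiry/validation

def proxy_get(url):
    """Serve from the local cache when possible; fetch and cache otherwise."""
    if url in cache:
        return cache[url]                         # hit: no traffic to origin
    body = urllib.request.urlopen(url).read()     # miss: fetch from origin
    cache[url] = body
    return body

# Clients behind the same proxy share hits: a second request for the
# same document never leaves the ISP or corporate LAN.
```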
Content Distribution Networks
(CDNs)

Integrate forward and reverse caching functionality into one overlay network, (usually) administered by a single entity
- Example: Akamai

Documents are cached both
- As a result of clients' requests (pull)
- Pushed in the expectation of a high access rate

Besides caching, CDNs also do processing, e.g.,
- Handle dynamic web pages
- Transcoding
CDNs (cont’d)
[Figure: CDN servers deployed both at ISP-1 and ISP-2 and in front of the origin server, replacing separate forward and reverse caches]
Example: Akamai

Akamai creates new domain names for each client content provider
- e.g., a128.g.akamai.net

The CDN's DNS servers are authoritative for the new domains
The client content provider modifies its content so that embedded URLs reference the new domains
- "Akamaize" content, e.g.: http://www.cnn.com/image-of-the-day.gif becomes http://a128.g.akamai.net/image-of-the-day.gif
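
A sketch of what the rewriting step could look like; the rewrite rule below is purely illustrative (the slide's a128.g.akamai.net domain is reused) and is not Akamai's actual tooling:

```python
import re

CDN_HOST = "a128.g.akamai.net"   # CDN domain assigned to this content provider

def akamaize(html, origin_host):
    """Point embedded object URLs at the CDN; the page URL itself is untouched."""
    return re.sub(rf"http://{re.escape(origin_host)}/(\S+?\.(?:gif|jpg|png|css|js))",
                  rf"http://{CDN_HOST}/\1", html)

page = '<img src="http://www.cnn.com/image-of-the-day.gif">'
print(akamaize(page, "www.cnn.com"))
# <img src="http://a128.g.akamai.net/image-of-the-day.gif">
```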
Example: Akamai
[Figure: www.nhc.noaa.gov "Akamaizes" its content. The client's GET of http://www.nhc.noaa.gov is resolved via its local DNS server and the DNS server for nhc.noaa.gov; the "Akamaized" response object has inline URLs for secondary content at a128.g.akamai.net and other Akamai-managed DNS names. The local DNS server then looks up a128.g.akamai.net at the akamai.net DNS servers, and Akamai servers store/cache the secondary content for "Akamaized" services.]
Examples



Overlay Multicast
Content Distribution Networks (CDNs)
Peer-to-peer systems
- Napster, Gnutella, KaZaa, DHTs
- Skype, BitTorrent (next lecture)
How Did it Start?

A killer application: Napster
- Free music over the Internet

Key idea: share the storage and bandwidth of individual (home) users

[Figure: home users' machines connected across the Internet, sharing files directly]
Model


Each user stores a subset of the files
Each user can access (download) files from all users in the system
Main Challenge

Find where a particular file is stored
- Note: the problem is similar to finding a particular page in web caching (see last lecture; what are the differences?)

[Figure: nodes A through F, each storing some files; A issues the query "E?" to locate file E]
Other Challenges


Scale: up to hundreds of thousands or millions of machines
Dynamicity: machines can come and go at any time
Napster


Assume a centralized index system that maps files (songs) to the machines that are alive
How to find a file (song):
- Query the index system → it returns a machine that stores the required file
  • Ideally this is the closest/least-loaded machine
- FTP the file from that machine

Advantages:
- Simplicity; easy to implement sophisticated search engines on top of the index system

Disadvantages:
- Robustness, scalability (?)
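
A minimal sketch of such a centralized index; the data structures and machine names are illustrative:

```python
# Napster-style central index (illustrative sketch).
index = {}                        # song -> set of online machines storing it

def register(machine, songs):
    """A machine logs in and announces the files it stores."""
    for song in songs:
        index.setdefault(song, set()).add(machine)

def locate(song):
    """Ideally would pick the closest/least-loaded machine; any one here."""
    machines = index.get(song)
    return next(iter(machines)) if machines else None

register("m5", ["E"])
register("m6", ["F"])
print(locate("E"))                # -> 'm5'; the client then fetches E from m5
```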
Napster: Example
[Figure: machines m1 through m6 store files A through F (A on m1, B on m2, C on m3, D on m4, E on m5, F on m6). A client asks the central index "E?", the index answers "m5", and the client downloads E directly from m5.]
Gnutella



Distribute the file location
Idea: broadcast the request
How to find a file:
- Send the request to all neighbors
- Neighbors recursively multicast the request
- Eventually a machine that has the file receives the request, and it sends back the answer

Advantages:
- Totally decentralized, highly robust

Disadvantages:
- Not scalable; the entire network can be swamped with requests (to alleviate this problem, each request has a TTL)
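
A sketch of TTL-limited flooding on the topology of the next slide (the m5-m6 link is assumed; real Gnutella also deduplicates queries by message ID):

```python
# Gnutella-style flooding (illustrative sketch).
neighbors = {"m1": ["m2", "m3"], "m2": [], "m3": ["m4", "m5"],
             "m4": [], "m5": ["m6"], "m6": []}
files = {"m5": {"E"}, "m6": {"F"}}

def query(node, wanted, ttl, seen=None):
    """Flood the request; return the first node found that has the file."""
    seen = seen if seen is not None else set()
    if node in seen or ttl < 0:
        return None              # TTL exhausted: the file may exist yet go unfound
    seen.add(node)
    if wanted in files.get(node, set()):
        return node
    for n in neighbors[node]:
        hit = query(n, wanted, ttl - 1, seen)
        if hit:
            return hit
    return None

print(query("m1", "E", ttl=3))   # -> 'm5'
print(query("m1", "E", ttl=1))   # -> None: TTL caps the swamping, loses answers
```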
Gnutella: Example

Assume: m1's neighbors are m2 and m3; m3's neighbors are m4 and m5; ...

[Figure: the query "E?" floods from m1 to m2 and m3, then from m3 to m4 and m5; m5 stores E and sends back the answer]
Two-Level Hierarchy



Current Gnutella implementation, KaZaa
Leaf nodes are connected to a small number of ultrapeers (supernodes)
- A leaf sends its query to its ultrapeers
- If the ultrapeers don't know the answer, they flood the query to other ultrapeers

More scalable:
- Flooding happens only among ultrapeers

[Figure: an Oct 2003 crawl of Gnutella, showing a core of ultrapeer nodes with leaf nodes attached]
Skype

Peer-to-peer Internet telephony
Two-level hierarchy like KaZaa
- Ultrapeers are used mainly to route traffic between NATed end-hosts (see next slide)...
- ... plus a login server to
  • authenticate users
  • ensure that names are unique across the network

[Figure: hosts A and B exchange messages with the login server; their data traffic is relayed through ultrapeers]
(Note: this is the probable protocol; the Skype protocol is not published)
Detour: NAT (1/3)



Network Address Translation
Motivation: the address scarcity problem in IPv4
Allows addresses to be allocated to hosts behind a NAT independently
- Two hosts behind two different NATs can have the same address

[Figure: two NAT boxes on the Internet alongside public hosts 64.36.12.64 and 128.2.12.30; behind one NAT sit 192.168.0.1 and 192.168.0.2, and behind the other (public address 169.32.41.10) sits another 192.168.0.1: the same address]
Detour: NAT (2/3)

Main idea: use port numbers to multiplex/demultiplex connections of NATed end-hosts
- Map (IPaddr, Port) of a NATed host to (IPaddrNAT, PortNAT)

Example (packets written as (src addr : dst addr)(src port : dst port); the NAT box's public address is 169.32.41.10):
1. The NATed host sends (192.168.0.1 : 64.36.12.64)(1005 : 80)
2. The NAT box installs the table entry (192.168.0.1:1005) ↔ 78
3. The packet leaves the NAT box rewritten as (169.32.41.10 : 64.36.12.64)(78 : 80)
4. The reply arrives as (64.36.12.64 : 169.32.41.10)(80 : 78)
5. The NAT box rewrites it back to (64.36.12.64 : 192.168.0.1)(80 : 1005)
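
A sketch of the NAT box's translation logic using the addresses and ports above; the port-allocation policy is an assumption:

```python
NAT_IP = "169.32.41.10"
nat_table = {}                  # (private_ip, private_port) <-> public_port
next_port = 78                  # assumed allocation policy, for illustration

def outbound(src_ip, src_port, dst_ip, dst_port):
    """Rewrite the source of an outgoing packet, installing a mapping if needed."""
    global next_port
    key = (src_ip, src_port)
    if key not in nat_table:
        nat_table[key] = next_port        # e.g. (192.168.0.1, 1005) <-> 78
        next_port += 1
    return (NAT_IP, nat_table[key], dst_ip, dst_port)

def inbound(src_ip, src_port, dst_port):
    """Map a reply arriving at (NAT_IP, dst_port) back to the private host."""
    for (ip, port), pub in nat_table.items():
        if pub == dst_port:
            return (src_ip, src_port, ip, port)
    return None     # no mapping: an outside host cannot initiate (limitation 2)

print(outbound("192.168.0.1", 1005, "64.36.12.64", 80))   # step 3
print(inbound("64.36.12.64", 80, 78))                      # step 5
```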
Detour: NAT (3/3)

Limitations
1) The number of machines behind a NAT is <= ~64,000. Why? (Hint: the port field is 16 bits, so there are at most 2^16 = 65,536 distinct ports to map to)
2) A host outside the NAT cannot initiate a connection to a host behind the NAT

Skype and other P2P systems use login servers and ultrapeers to solve limitation (2)
- How? (Hint: ultrapeers have globally unique (Internet-routable) IP addresses)
BitTorrent (1/2)


Allows fast downloads even when sources have low connectivity
How does it work?
- Split each file into pieces (~256 KB each), and each piece into sub-pieces (~16 KB each)
- The downloader downloads one piece at a time
- Within one piece, the downloader can download up to five sub-pieces in parallel
BitTorrent (2/2)


The download consists of three phases:

Start: get a piece as soon as possible
- Select a random piece

Middle: spread all pieces as soon as possible
- Select the rarest piece next

End: avoid getting stuck with a slow source when downloading the last sub-pieces
- Request the same sub-piece from several sources in parallel
- Cancel the slowest downloads once the sub-piece has been received

(For details see: http://bittorrent.com/bittorrentecon.pdf)
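
A sketch of the piece-selection policy for the start and middle phases; the end-game parallel requests are omitted and the data structures are illustrative:

```python
import random
from collections import Counter

def next_piece(have, peer_pieces, started):
    """Pick the next piece to request from the pieces our peers advertise."""
    candidates = set().union(*peer_pieces.values()) - have
    if not candidates:
        return None
    if not started:                     # start phase: any piece, quickly
        return random.choice(sorted(candidates))
    counts = Counter(p for ps in peer_pieces.values() for p in ps)
    return min(candidates, key=lambda p: counts[p])   # middle: rarest first

peers = {"p1": {0, 1, 2}, "p2": {1, 2}, "p3": {2}}
print(next_piece(set(), peers, started=False))   # a random first piece
print(next_piece({2}, peers, started=True))      # 0: rarest (held by one peer)
```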
Distributed Hash Tables

Problem:
- Given an ID, map it to a host

Challenges
- Scalability: hundreds of thousands or millions of machines
- Instability
  • Changes in routes, congestion, availability of machines
- Heterogeneity
  • Latency: 1 ms to 1000 ms
  • Bandwidth: 32 Kb/s to 100 Mb/s
  • Nodes stay in the system from 10 s to a year
- Trust
  • Selfish users
  • Malicious users
Content Addressable Network
(CAN)

Associate to each node and each item a unique ID in a d-dimensional space
Properties
- Routing table size O(d)
- Guarantees that a file is found in at most d·n^(1/d) steps, where n is the total number of nodes
CAN Example: Two-Dimensional Space

Space is divided between the nodes
Together, all nodes cover the entire space
Each node covers either a square or a rectangular area with aspect ratio 1:2 or 2:1
Example:
- Assume a space of size 8 x 8
- Node n1:(1, 2) is the first node to join → it covers the entire space

[Figure: an 8 x 8 grid with n1 at (1, 2) owning the whole space]
CAN Example: Two-Dimensional Space

Node n2:(4, 2) joins → the space is divided between n1 and n2

[Figure: the grid split between n1:(1, 2) and n2:(4, 2)]
CAN Example: Two-Dimensional Space

Node n3:(3, 5) joins → the space is divided further

[Figure: the grid now divided among n1:(1, 2), n2:(4, 2), and n3:(3, 5)]
CAN Example: Two-Dimensional Space

Nodes n4:(5, 5) and n5:(6, 6) join

[Figure: the grid divided among n1 through n5]
CAN Example: Two-Dimensional Space

Nodes: n1:(1, 2); n2:(4, 2); n3:(3, 5); n4:(5, 5); n5:(6, 6)
Items: f1:(2, 3); f2:(5, 1); f3:(2, 1); f4:(7, 5)

[Figure: the items placed at their coordinates in the divided grid]
CAN Example: Two-Dimensional Space

Each item is stored by the node that owns the region of space the item's coordinates fall into

[Figure: each item shown inside the zone of the node that stores it]
CAN: Query Example

Each node knows its neighbors in the d-dimensional space
Forward the query to the neighbor closest to the query ID
Example: assume n1 queries f4:(7, 5)

[Figure: the query hops greedily from n1 across neighboring zones toward f4]
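
A sketch of this greedy rule on the example above; zones are simplified to the nodes' points and the neighbor lists are assumed for illustration:

```python
nodes = {"n1": (1, 2), "n2": (4, 2), "n3": (3, 5), "n4": (5, 5), "n5": (6, 6)}
neighbors = {"n1": ["n2", "n3"], "n2": ["n1", "n4"], "n3": ["n1", "n4"],
             "n4": ["n2", "n3", "n5"], "n5": ["n4"]}

def dist(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])   # grid distance in the d-space

def route(start, item):
    """Forward to the neighbor closest to the item until no one is closer."""
    hops, cur = [start], start
    while True:
        best = min(neighbors[cur], key=lambda n: dist(nodes[n], item))
        if dist(nodes[best], item) >= dist(nodes[cur], item):
            return hops                  # the current node's zone owns the item
        hops.append(best)
        cur = best

print(route("n1", (7, 5)))   # n1 queries f4:(7, 5) -> ['n1', 'n3', 'n4']
```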
Chord


Associate to each node and each item a unique ID in a one-dimensional space
Properties
- Routing table size O(log N), where N is the total number of nodes
- Guarantees that a file is found in O(log N) steps
Data Structure


Assume the identifier space is 0..2^m
Each node maintains
- A finger table
  • Entry i in the finger table of node n is the first node that succeeds or equals n + 2^i
- Its predecessor node

An item identified by id is stored on the successor node of id
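
A sketch of how the finger tables follow from these definitions, computed on the toy ring of the next slides (m = 3, nodes at IDs 0, 1, 2, 6):

```python
M = 3
nodes = sorted([0, 1, 2, 6])    # the example's n3, n1, n2, n4

def successor(ident):
    """First node clockwise from ident, wrapping around the ring."""
    return next((n for n in nodes if n >= ident), nodes[0])

def finger_table(n):
    """Entry i points to successor(n + 2^i), taken mod 2^m."""
    return [successor((n + 2 ** i) % 2 ** M) for i in range(M)]

for n in nodes:
    print(n, finger_table(n))   # 0 [1, 2, 6]; 1 [2, 6, 6]; 2 [6, 6, 6]; 6 [0, 0, 2]

print(successor(7))             # an item with id 7 is stored on node 0
```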
Chord Example


Assume an identifier space 0..7 (m = 3)
Node n1:(1) joins → all entries in its finger table are initialized to itself

Succ. Table of n1
i | id+2^i | succ
0 |   2    |  1
1 |   3    |  1
2 |   5    |  1

[Figure: the identifier ring 0..7 with node 1 on it]
Chord Example

Node n2:(2) joins

Succ. Table of n1
i | id+2^i | succ
0 |   2    |  2
1 |   3    |  1
2 |   5    |  1

Succ. Table of n2
i | id+2^i | succ
0 |   3    |  1
1 |   4    |  1
2 |   6    |  1

[Figure: the ring with nodes 1 and 2]
Chord Example
Nodes n3:(0) and n4:(6) join

Succ. Table of n3 (ID 0)
i | id+2^i | succ
0 |   1    |  1
1 |   2    |  2
2 |   4    |  6

Succ. Table of n1 (ID 1)
i | id+2^i | succ
0 |   2    |  2
1 |   3    |  6
2 |   5    |  6

Succ. Table of n4 (ID 6)
i | id+2^i | succ
0 |   7    |  0
1 |   0    |  0
2 |   2    |  2

Succ. Table of n2 (ID 2)
i | id+2^i | succ
0 |   3    |  6
1 |   4    |  6
2 |   6    |  6

[Figure: the ring with nodes 0, 1, 2, and 6]
Chord Examples
Nodes: n1:(1), n2:(2), n3:(0), n4:(6)
Items: f1:(7), f2:(2)
Each item is stored on the successor of its ID: f1:(7) on n3:(0), f2:(2) on n2:(2)

[Figure: the ring with the four finger tables from the previous slide, with item 7 attached to node 0 and item 2 attached to node 2]
Query



Upon receiving a query for item id, a node:
- Checks whether it stores the item locally
- If not, forwards the query to the largest node in its successor table that does not exceed id

Example: query(7) is forwarded along finger-table entries until it reaches n3:(0), the successor of ID 7, which stores item f1

[Figure: the ring and finger tables from the previous slide, with the hops of query(7) drawn between nodes]
Discussion

The query can be implemented
- Iteratively
- Recursively

Performance: routing in the overlay network can be more expensive than in the underlying network
- Usually there is no correlation between node IDs and their locality; a query can repeatedly jump from Europe to North America even though both the initiator and the node that stores the item are in Europe!
- Solutions: Tapestry takes care of this implicitly; CAN and Chord maintain multiple copies of each entry in their routing tables and choose the copy closest in terms of network distance
Discussion

Robustness
- Maintain multiple copies associated with each entry in the routing tables
- Replicate an item on nodes with close IDs in the identifier space

Security
- Can be built on top of CAN, Chord, Tapestry, and Pastry
Discussion


The key challenge in building wide-area P2P systems is a scalable and robust location service
Napster: centralized solution
- Guarantees correctness and supports approximate matching...
- ... but is neither scalable nor robust

Gnutella, KaZaa
- Support approximate queries; scalable and robust...
- ... but don't guarantee correctness (i.e., they may fail to locate an existing file)

Distributed Hash Tables
- Guarantee correctness; highly scalable and robust...
- ... but approximate matching is difficult to implement