COMP/ELEC 429
Introduction to Computer Networks
Overlay networks
Some slides used with permission from Edward W. Knightly, T. S. Eugene Ng, Ion Stoica, and Hui Zhang
T. S. Eugene Ng
eugeneng at cs.rice.edu
Rice University
Abstract View of the Internet
• A collection of IP routers and point-to-point physical links connecting routers
• Point-to-point links between two routers are physically as direct as possible
– A copper wire, a coax cable, or a fiber laid from one router to another
Reality
• Fibers and wires are laid with tremendous physical constraints
– You can’t just dig up the ground everywhere and lay fibers
– Right-of-way issue
– Most fibers are laid along railroads
• The physical fiber topology is often very far from the topology you want
• The Internet is overlaid on top of this physical fiber topology
• The Internet topology is only logical!
• Revelation: the Internet is an overlay network
National Lambda Rail Project – Fiber Topology
[Figure: National Lambda Rail fiber map; each IP logical link is realized as a circuit over the physical fiber]
Overlay Made Possible by Layer Abstraction
• Layering hides the details of the lower layer from the higher layer
• IP operates on the datalink layer's (say ATM or SONET) logical topology
• ATM or SONET creates point-to-point circuits on the fibers
[Figure: protocol stacks (Network / Datalink / Physical) at three nodes; the network layer sees only the logical topology that the datalink layer exposes]
Overlay
• Overlay is clearly a general concept
– You can keep overlaying one network on another; it's all logical
• The IP Internet overlays on top of the physical topology
– Why stop here?
• Something else can overlay on top of the IP Internet
– Use IP encapsulation to create logical links for yet another logical topology (see the sketch below)
– E.g., a VPN link from home to Rice's campus network
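To make the encapsulation idea concrete, here is a toy Python sketch (not from the slides; all addresses and names are illustrative) showing a logical link as IP-in-IP style nesting: the overlay packet, with its own logical source and destination, rides as the payload of an ordinary packet between tunnel endpoints.

```python
# Toy model of tunneling: an overlay packet travels as the opaque
# payload of an underlay packet between tunnel endpoints.

def encapsulate(overlay_src, overlay_dst, payload, tunnel_src, tunnel_dst):
    inner = {"src": overlay_src, "dst": overlay_dst, "data": payload}
    # Routers along the path see only tunnel_src -> tunnel_dst;
    # the overlay header is just payload to them.
    return {"src": tunnel_src, "dst": tunnel_dst, "data": inner}

def decapsulate(underlay_pkt):
    return underlay_pkt["data"]   # recover the overlay packet at the far end

# Hypothetical home host -> campus VPN gateway:
pkt = encapsulate("10.8.0.2", "10.8.0.1", b"hello", "203.0.113.7", "128.42.0.1")
print(decapsulate(pkt)["dst"])    # logical (overlay) destination: 10.8.0.1
```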
Advanced Reasons to Overlay on IP Internet
• IP provides a basic best-effort datagram service
• Many features you may want in a network are not widely supported:
– Performance-based routing (vs. shortest AS path)
– Multicast: one-to-many packet delivery
Overlay Multicast
• IP multicast is not widely deployed
– Increases router complexity
– Needs universal router support; cannot be deployed incrementally
– Poor built-in access control (anyone can potentially send multicast traffic)
– Hard to multicast data reliably
– Hard to perform congestion control on multicast data
• Overlay multicast
– Uses end hosts as multicast tree nodes to distribute data
– Most of the problems with IP multicast are eliminated
– But overlay multicast has its own set of problems
Motivating Example:
Conference Attendance
Solution based on naive unicast
[Figure: the source at CMU sends a separate unicast copy of the stream to Stanford, Berkeley, and Gatech]
• Client-server architecture (the Web)
• Does not scale well with group size
– Source host is the bottleneck
End System Multicast
[Figure: hosts CMU, Stanford-LAN, Stanford-Modem, Berkeley1, Berkeley2, and Gatech, first shown as plain end hosts and then organized into an overlay tree of unicast links]
End System Multicast: Benefits
• Scalability
– Routers do nothing special; all new features live in end systems
• Easy to deploy
– Works over the existing IP infrastructure
• Can simplify support for higher-level functionality
[Figure: overlay tree where each unicast hop runs its own congestion control, and the hop to Stanford-Modem transcodes the stream down to modem rate]
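As a minimal sketch of how data flows down such a tree (the tree map and node names here are illustrative, matching the figure; a real system would send over TCP or UDP instead of a function call):

```python
# Minimal end-system-multicast sketch: the overlay tree is a dict
# mapping each member to its children; forwarding is plain unicast,
# simulated here by a recursive call.

TREE = {
    "CMU": ["Stanford-LAN", "Berkeley1"],
    "Stanford-LAN": ["Stanford-Modem"],
    "Berkeley1": ["Berkeley2", "Gatech"],
    "Stanford-Modem": [], "Berkeley2": [], "Gatech": [],
}

def multicast(node, packet):
    """Deliver locally, then re-unicast one copy per child."""
    print(f"{node} received {packet!r}")
    for child in TREE[node]:
        multicast(child, packet)   # in reality: one unicast send per child

multicast("CMU", "frame-42")       # the source injects data at the root
```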
Concerns with End System Multicast
• Challenging to construct efficient overlay trees
• Performance concerns compared to IP multicast
– Increased delay
– Wasted bandwidth (packet duplication)
[Figure: side-by-side comparison over the same hosts: IP multicast (routers duplicate packets inside the network) vs. end system multicast (end hosts duplicate packets over unicast paths)]
More Challenges
[Figure: an overlay tree rerouting around a congested overlay link]
Overlays must adapt to network dynamics and congestion
[Figure: the overlay tree re-forming as members join and leave]
Group membership is dynamic: members can join and leave
Inefficient Overlay Trees
[Figure: three poorly constructed trees over the same hosts, illustrating potential congestion near CMU, poor bandwidth to members, and high latency]
An Efficient Overlay Tree
[Figure: an efficient overlay tree over the same six hosts]
Auto Tree Building is Very Hard
• When a node joins, it needs to quickly find a good parent to connect to in the tree
• A good parent (scored in the sketch below):
– has low delay to the source
– experiences low packet loss
– has enough spare outbound network capacity
– has a low-delay, high-bandwidth path to the new node
• When a node leaves, its orphaned children need to quickly find new parents
• When the loss rate or delay is high, a node must decide whether to find a new parent
– Is the performance problem at this node, at its parent, or in the middle of the network?
– Are there actually better alternatives?
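A rough illustration of the parent-selection decision (metrics, weights, and names here are hypothetical; a real protocol would measure these with probes):

```python
# Hypothetical scoring of candidate parents for a joining node.
# Lower score is better; a parent with no spare outbound capacity
# is excluded outright.

def score(c):
    if c["spare_capacity"] <= 0:
        return float("inf")             # cannot support another child
    return (c["delay_to_source_ms"]     # low delay to the source
            + 1000 * c["loss_rate"]     # low observed packet loss
            + c["delay_to_me_ms"])      # good path to the new node

candidates = [
    {"name": "Stanford-LAN",   "delay_to_source_ms": 30, "loss_rate": 0.00,
     "spare_capacity": 2, "delay_to_me_ms": 5},
    {"name": "Stanford-Modem", "delay_to_source_ms": 90, "loss_rate": 0.02,
     "spare_capacity": 0, "delay_to_me_ms": 4},
]
print(min(candidates, key=score)["name"])   # -> Stanford-LAN
```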
Introducing a New Idea
• Multicast and unicast overlays are meant to facilitate the distribution of data
– Route data to receivers
– Basically the same fundamental function the Internet is designed for
• Content-addressable overlays are meant to facilitate the look-up of information!
– Route questions to answers
In 1999, Napster was a huge hit
[Figure: Napster's centralized index, essentially one big hash table]
Distributed Hash Tables (DHT)
Operations: insert(k,v), lookup(k)
[Figure: key-value pairs (k1,v1) through (k6,v6) spread across the nodes of a P2P overlay network]
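A minimal sketch of the interface (this one uses naive consistent hashing onto a sorted ring of node IDs; the placement rule is an assumption for illustration, not a detail from the slide):

```python
import hashlib
from bisect import bisect_left

# Toy DHT: each key lives at the node whose hashed ID is the successor
# of hash(key) on a circular ID space. Illustrative only.

M = 2**32                                    # small ID space for readability

def h(s: str) -> int:
    return int(hashlib.sha1(s.encode()).hexdigest(), 16) % M

class ToyDHT:
    def __init__(self, node_names):
        self.ring = sorted((h(n), n) for n in node_names)
        self.store = {n: {} for n in node_names}

    def _owner(self, key: str) -> str:       # successor of hash(key)
        ids = [nid for nid, _ in self.ring]
        i = bisect_left(ids, h(key)) % len(self.ring)
        return self.ring[i][1]

    def insert(self, k, v):
        self.store[self._owner(k)][k] = v

    def lookup(self, k):
        return self.store[self._owner(k)].get(k)

dht = ToyDHT(["nodeA", "nodeB", "nodeC"])
dht.insert("song.mp3", "host-42")
print(dht.lookup("song.mp3"))                # -> host-42
```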
Why p2p overlays?
• Leverage pooled resources (storage, bandwidth, CPU)
• Leverage resource diversity (geographic, ownership)
• Scalability
• Robustness
• Self-organization
Pastry – An example system developed at Rice and Microsoft
Generic p2p location and routing substrate
• Self-organizing overlay network
• Lookup/insert an object in < log₁₆ N routing steps (expected)
• O(log N) per-node state
Pastry: Object distribution
[Figure: circular 128-bit ID space, from 0 to 2^128 − 1]
• nodeIds are assigned uniformly at random
• objIds are chosen uniformly at random
• Invariant: the node with the numerically closest nodeId maintains the object
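A sketch of the placement invariant on a tiny 8-bit ring (the real system uses 128 bits; circular distance is my reading of "numerically closest" on a circular space):

```python
# Toy version of the Pastry placement invariant on an 8-bit circular
# ID space (the real system uses 128 bits). Illustrative only.

M = 2**8

def circ_dist(a: int, b: int) -> int:
    """Distance between two IDs on the circular space."""
    d = abs(a - b) % M
    return min(d, M - d)

def owner(obj_id: int, node_ids: list) -> int:
    """The node with the numerically closest nodeId stores the object."""
    return min(node_ids, key=lambda n: circ_dist(n, obj_id))

nodes = [12, 70, 145, 200, 250]
print(owner(210, nodes))   # -> 200
print(owner(2, nodes))     # -> 250 (wraps around past 0)
```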
Pastry
One primitive:
• route(M, X): route message M to the live node with nodeId closest to key X
Pastry: Object insertion/lookup
[Figure: on the circular ID space (0 to 2^128 − 1), a message with key X, sent via Route(X), is delivered to the live node with nodeId closest to X]
Pastry: Leaf sets
Each node maintains the IP addresses of the L/2 nodes with numerically closest larger nodeIds and the L/2 with numerically closest smaller nodeIds
• L is a configurable parameter
• Enables robust routing through neighbor nodes
• Enables fault detection (via periodic hello messages)
Pastry: Routing
[Figure: Route(d46a1c) issued at node 65a1fc; the message hops through nodes such as d13da3, d4213f, and d462ba, matching a longer prefix of the key at each step, until it reaches the node numerically closest to d46a1c (near d467c4 and d471f1)]
Properties
• log₁₆ N steps
• O(log N) state
Pastry: Routing table (node 65a1fcx)
Row 0: 0x 1x 2x 3x 4x 5x 7x 8x 9x ax bx cx dx ex fx
Row 1: 60x 61x 62x 63x 64x 66x 67x 68x 69x 6ax 6bx 6cx 6dx 6ex 6fx
Row 2: 650x 651x 652x 653x 654x 655x 656x 657x 658x 659x 65bx 65cx 65dx 65ex 65fx
Row 3: 65a0x 65a2x 65a3x 65a4x 65a5x 65a6x 65a7x 65a8x 65a9x 65aax 65abx 65acx 65adx 65aex 65afx
(log₁₆ N rows; row i holds one entry per value of the next digit for nodes sharing the first i digits with 65a1fc, omitting the column that matches the node's own digit; x stands for an arbitrary suffix)
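How a node indexes this table, as a small sketch (the helper name is illustrative): the row is the length of the prefix shared with the key, and the column is the key's next hex digit.

```python
# How a Pastry node picks a routing-table cell for a key.
# Row = shared-prefix length, column = the key's next digit (hex).
# Helper names are illustrative, not from the Pastry code.

def shared_prefix_len(a: str, b: str) -> int:
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n

node = "65a1fc"
key  = "65b2d3"
row = shared_prefix_len(node, key)   # 2 digits shared: "65"
col = key[row]                       # next digit of the key: 'b'
print(row, col)                      # -> 2 b   (cell "65bx" in the table above)
```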
Pastry: Routing
Tradeoff
• O(log N) routing table size
• O(log N) message forwarding steps
Pastry: Routing procedure
if (destination D is within range of our leaf set)
    forward to the numerically closest leaf set member
else
    let l = length of the prefix shared with D
    let d = value of the l-th digit of D
    if (routing table entry R[l][d] exists)
        forward to R[l][d]
    else
        forward to a known node that
        (a) shares at least as long a prefix with D, and
        (b) is numerically closer to D than this node
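A runnable sketch of this procedure (a simplified model: nodeIds are fixed-length lowercase hex strings, the leaf set is a plain list, and table[l][d] is the cell at row l, column d; real Pastry state is richer):

```python
# Simplified Pastry forwarding decision at one node. State shapes are
# toy stand-ins for the leaf set and routing table.

def shared_prefix_len(a, b):
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n

def next_hop(me, key, leaf_set, table, known):
    # Case 1: the key falls within the leaf set's ID range
    if leaf_set and min(leaf_set) <= key <= max(leaf_set):
        return min(leaf_set + [me], key=lambda n: abs(int(n, 16) - int(key, 16)))
    # Case 2: the routing table has an entry one digit closer
    l = shared_prefix_len(me, key)
    d = key[l]
    if table.get(l, {}).get(d):
        return table[l][d]
    # Case 3 (rare): fall back to any known node that is no worse on
    # prefix length and numerically closer to the key
    for n in known:
        if (shared_prefix_len(n, key) >= l
                and abs(int(n, 16) - int(key, 16)) < abs(int(me, 16) - int(key, 16))):
            return n
    return me   # this node is the destination

me = "65a1fc"
table = {0: {"d": "d13da3"}}
print(next_hop(me, "d46a1c", ["65a0aa", "65a212"], table, []))  # -> d13da3
```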
Pastry: Node addition
New node: d46a1c
[Figure: the new node has Route(d46a1c) routed from a nearby node 65a1fc through d13da3, d4213f, and d462ba to the existing nodes numerically closest to d46a1c (d467c4, d471f1)]
Node departure (failure)
Leaf set members exchange keep-alive messages
• Leaf set repair (eager): request the leaf set from the farthest live node in the set (sketched below)
• Routing table repair (lazy): fetch replacement entries from peers in the same row, then from higher rows
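A toy sketch of the eager leaf-set repair (data shapes are hypothetical; real Pastry keeps the larger and smaller halves of the leaf set separately):

```python
# Toy eager leaf-set repair: on detecting a dead leaf, ask the farthest
# live leaf for its leaf set and refill ours with the numerically
# closest IDs. Integer IDs and the RPC stub are illustrative.

def repair_leaf_set(me, leaf_set, dead, fetch_leaf_set, L=8):
    live = [n for n in leaf_set if n != dead]
    farthest = max(live, key=lambda n: abs(n - me))
    merged = set(live) | (set(fetch_leaf_set(farthest)) - {dead, me})
    return sorted(merged, key=lambda n: abs(n - me))[:L]

# Fake RPC returning the neighbor's own leaf set:
remote = {230: [185, 200, 240, 250]}
print(repair_leaf_set(210, [195, 200, 220, 230], dead=220,
                      fetch_leaf_set=lambda n: remote[n]))
# -> [200, 195, 230, 185, 240, 250] (closest first)
```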
Application Example
PAST: File storage
[Figure: Insert(fileId) is routed over the ring toward the node with nodeId closest to fileId]
PAST: File storage
[Figure: Insert(fileId) with k = 4; the four nodes with nodeIds closest to fileId each store a replica]
Storage invariant: file "replicas" are stored on the k nodes with nodeIds closest to fileId (k is bounded by the leaf set size)
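A sketch of the storage invariant (toy 8-bit integer IDs; circular distance is assumed as the closeness metric, as before):

```python
# Toy PAST replica placement: a file is stored on the k nodes whose
# nodeIds are numerically closest to its fileId on the circular space.

M = 2**8                     # tiny ID space; PAST uses 128 bits

def circ_dist(a, b):
    d = abs(a - b) % M
    return min(d, M - d)

def replica_set(file_id, node_ids, k=4):
    # k is bounded by the leaf set size in the real system
    return sorted(node_ids, key=lambda n: circ_dist(n, file_id))[:k]

nodes = [12, 70, 145, 200, 230, 250]
print(replica_set(210, nodes))   # -> [200, 230, 250, 12]
```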
PAST: File Retrieval
[Figure: client C issues Lookup(fileId) and reaches one of the k replicas]
• File located in log₁₆ N steps (expected)
• Usually locates the replica nearest to client C
Application Example
SCRIBE: Large scale multicast
[Figure: nodes route Subscribe(topicId) and Publish(topicId) messages toward the node responsible for topicId, forming a multicast tree]