
Practical Network Coding for the
Internet and Wireless Networks
Philip A. Chou
with thanks to Yunnan Wu, Kamal Jain, Pablo Rodriguez
Rodriguez, Christos Gkantsidis, and Baochun Li, et al.
Globecom Tutorial, December 3, 2004
Outline

Practical Network Coding
- Packetization
- Buffering
- Results for SprintLink ISP
- Experimental comparisons to multicast routing

Internet and Wireless Network Applications
- Live Broadcasting, File Downloading, Messaging, Interactive Communication
Network Coding applicable to real networks?

Internet
- IP Layer: Routers (e.g., ISP)
- Application Layer: Infrastructure (e.g., CDN); Ad hoc (e.g., P2P)

Wireless
- Mobile ad hoc multihop wireless networks
- Sensor networks
- Stationary wireless (residential) mesh networks
Theory vs. Practice

Theory:
- Symbols flow synchronously throughout the network; edges have integral capacities

Practice:
- Information travels asynchronously in packets; packets are subject to random delays and losses; edge capacities are often unknown, varying as competing communication processes begin and end
Theory vs. Practice (2)

Theory:
- Some centralized knowledge of topology to compute capacity or coding functions

Practice:
- May be difficult to obtain centralized knowledge, or to arrange its reliable broadcast to nodes across the very communication network being established
Theory vs. Practice (3)

Theory:
- Can design encoding to withstand failures, but decoders must know the failure pattern

Practice:
- Difficult to communicate the failure pattern reliably to receivers
Theory vs. Practice (4)

Theory:
- Cyclic graphs present difficulties, e.g., capacity achieved only in the limit of large blocklength

Practice:
- Cycles abound: if A → B, then B → A
Need to address practical network coding in real networks

- Packets subject to random loss and delay
- Edges have variable capacities due to congestion and other cross traffic
- Node & link failures, additions, & deletions are common (as in P2P, ad hoc networks)
- Cycles are everywhere
- Broadcast capacity may be unknown
- No centralized knowledge of graph topology or encoder/decoder functions
- Simple technology, applicable in practice
Approach

Packet Format
- Removes need for centralized knowledge of graph topology and encoding/decoding functions

Buffer Model
- Allows asynchronous packet arrivals & departures with arbitrarily varying rates, delay, loss

[Chou, Wu, and Jain, Allerton 2003]
[Ho, Koetter, Médard, Karger, and Effros, ISIT 2003]
Standard Framework

- Graph (V,E) having unit capacity edges
- Sender s in V, set of receivers T = {t, …} in V
- Broadcast capacity $h = \min_{t \in T} \mathrm{Maxflow}(s,t)$

[Figure: source s emits $y(e'_1) = x_1, \dots, y(e'_h) = x_h$; an interior node v combines incoming edges $e'$ onto an outgoing edge e carrying y(e); a receiver t has incoming edges $e_1, \dots, e_h$.]

$$y(e) = \sum_{e'} b_e(e')\, y(e')$$

$b(e) = [b_e(e')]_{e'}$ is the local encoding vector.
Global Encoding Vectors

[Figure: same network as before; each edge e carries y(e).]

By induction, $y(e) = \sum_{i=1}^{h} g_i(e)\, x_i$, where $g(e) = [g_1(e), \dots, g_h(e)]$ is the global encoding vector.

Receiver t can recover $x_1, \dots, x_h$ from

$$\begin{bmatrix} y(e_1) \\ \vdots \\ y(e_h) \end{bmatrix} = \begin{bmatrix} g_1(e_1) & \cdots & g_h(e_1) \\ \vdots & & \vdots \\ g_1(e_h) & \cdots & g_h(e_h) \end{bmatrix} \begin{bmatrix} x_1 \\ \vdots \\ x_h \end{bmatrix} = G_t \begin{bmatrix} x_1 \\ \vdots \\ x_h \end{bmatrix}$$
Invertibility of G_t

- $G_t$ will be invertible with high probability if local encoding vectors are random and the field size is sufficiently large
- If field size = $2^{16}$ and $|E| = 2^8$, then $G_t$ will be invertible w.p. $\geq 1 - 2^{-8} \approx 0.996$

[Ho et al., 2003]
[Jaggi, Sanders, et al., 2003]
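Unpacking the slide's numbers with the randomized-coding bound of Ho et al. (stated here as commonly quoted, with d the number of receivers and q the field size; reading the slide as d = 1):

$$\Pr[\text{all } G_t \text{ invertible}] \;\ge\; \left(1 - \frac{d}{q}\right)^{|E|} \;\approx\; 1 - \frac{d\,|E|}{q} \;=\; 1 - \frac{2^{8}}{2^{16}} \;=\; 1 - 2^{-8}.$$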
Packetization

[Figure: source vectors $x_i = [x_{i,1}, x_{i,2}, \dots, x_{i,N}]$ enter at s; each edge e carries a packet $y(e) = [y_1(e), y_2(e), \dots, y_N(e)]$.]

Internet: MTU size typically ≈ 1400+ bytes

$y(e) = \sum_{e'} b_e(e')\, y(e') = \sum_{i=1}^{h} g_i(e)\, x_i$, so that

$$\begin{bmatrix} y_1(e_1) & y_2(e_1) & \cdots & y_N(e_1) \\ \vdots & & & \vdots \\ y_1(e_h) & y_2(e_h) & \cdots & y_N(e_h) \end{bmatrix} = G_t \begin{bmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,N} \\ \vdots & & & \vdots \\ x_{h,1} & x_{h,2} & \cdots & x_{h,N} \end{bmatrix}$$
Key Idea

- Include within each packet on edge e: $g(e) = \sum_{e'} b_e(e')\, g(e')$ and $y(e) = \sum_{e'} b_e(e')\, y(e')$
- Can be accomplished by prefixing the i-th unit vector to the i-th source vector $x_i$, $i = 1, \dots, h$:

$$\begin{bmatrix} g_1(e_1) & \cdots & g_h(e_1) & y_1(e_1) & \cdots & y_N(e_1) \\ \vdots & & \vdots & \vdots & & \vdots \\ g_1(e_h) & \cdots & g_h(e_h) & y_1(e_h) & \cdots & y_N(e_h) \end{bmatrix} = G_t \begin{bmatrix} 1 & & 0 & x_{1,1} & \cdots & x_{1,N} \\ & \ddots & & \vdots & & \vdots \\ 0 & & 1 & x_{h,1} & \cdots & x_{h,N} \end{bmatrix}$$

Then the global encoding vectors needed to invert the code at any receiver can be found in the received packets themselves!
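To make the packet format concrete, here is a minimal sketch in Python, assuming GF(2^8) symbol arithmetic (the tutorial's field sizes range from 2^8 to 2^16) and toy sizes; all names (source_packet, random_combination, H, N) are illustrative, not from the tutorial.

```python
import random

# GF(2^8) exp/log tables over the AES polynomial x^8+x^4+x^3+x+1 (0x11B),
# built with 0x03 as generator.
EXP, LOG = [0] * 255, [0] * 256
_x = 1
for _i in range(255):
    EXP[_i], LOG[_x] = _x, _i
    _t = _x << 1                       # multiply by 0x02 ...
    if _t & 0x100:
        _t ^= 0x11B                    # ... reducing mod the polynomial
    _x ^= _t                           # net effect: multiply by 0x03

def gf_mul(a, b):
    """Multiplication in GF(2^8)."""
    if a == 0 or b == 0:
        return 0
    return EXP[(LOG[a] + LOG[b]) % 255]

H, N = 4, 16                           # toy generation size h, payload symbols

def source_packet(i, data):
    """Prefix the i-th unit vector to source vector x_i (the Key Idea)."""
    g = [0] * H
    g[i] = 1
    return g + list(data)              # [ g_1..g_h | y_1..y_N ]

def random_combination(packets):
    """y(e) = sum over e' of b_e(e') y(e'). The same random coefficients
    mix the prefixed vectors, so g(e) rides inside the packet."""
    out = [0] * (H + N)
    for p in packets:
        b = random.randrange(256)      # local encoding coefficient b_e(e')
        for j, sym in enumerate(p):
            out[j] ^= gf_mul(b, sym)   # addition in GF(2^8) is XOR
    return out
```

A relay holding packets p1, p2 would forward random_combination([p1, p2]); because unit vectors were prefixed at the source, the first H symbols of every packet always equal its current global encoding vector g(e), which is exactly what a receiver needs to build and invert $G_t$.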
Cost vs. Benefit

Cost:
- Overhead of transmitting h extra symbols per packet; if h = 50 and field size = $2^8$, then overhead ≈ 50/1400 ≈ 3%

Benefit:
- Receivers can decode even if
  - Network topology & encoding functions are unknown
  - Nodes & edges are added & removed in an ad hoc way
  - Packet loss, node & link failures occur at unknown locations
  - Local encoding vectors are time-varying & random
Erasure Protection

- Removals, failures, losses, or poor random encoding may reduce capacity below h, so a receiver holds only k < h packets:

$$\begin{bmatrix} g_1(e_1) & \cdots & g_h(e_1) & y_1(e_1) & \cdots & y_N(e_1) \\ \vdots & & \vdots & \vdots & & \vdots \\ g_1(e_k) & \cdots & g_h(e_k) & y_1(e_k) & \cdots & y_N(e_k) \end{bmatrix} = G_{k \times h} \begin{bmatrix} 1 & & 0 & x_{1,1} & \cdots & x_{1,N} \\ & \ddots & & \vdots & & \vdots \\ 0 & & 1 & x_{h,1} & \cdots & x_{h,N} \end{bmatrix}$$

- Basic form of erasure protection: send redundant packets, e.g., the last h−k packets of $x_1, \dots, x_h$ are known zero
Priority Encoding Transmission
(Albanese et al., IEEE Trans IT '96)

- More sophisticated form: partition data into layers of importance, vary redundancy by layer
- Received rank k → recover k layers
- Exact capacity can be unknown

[Figure: example with data layered across h = 6 source packets. Each source packet i begins with the i-th unit global encoding vector followed by its data $x_{i,1}, \dots, x_{i,N}$; less important packets begin with progressively longer runs of known zeros (e.g., packet 4's data starts at $x_{4,5}$, packet 5's at $x_{5,8}$, packet 6's at $x_{6,12}$), so a receiver of rank k can solve a triangular subsystem and recover the first k layers.]
Asynchronous Communication

- In real networks, "unit capacity" edges are grouped
  - Packets on real edges are carried sequentially
  - Separate edges → separate propagation & queuing delays
- Number of packets per unit time on an edge varies
  - Loss, congestion, competing traffic, rounding
- Need to synchronize
  - All packets related to the same source vectors $x_1, \dots, x_h$ are in the same generation; h is the generation size
  - All packets in the same generation are tagged with the same generation number; one byte (mod 256) is sufficient
Buffering

[Figure: buffer model at a node. Packets arrive asynchronously on incoming edges (jitter, loss, variable rate) and are stored in a single buffer; at each transmission opportunity on an outgoing edge, the node generates a packet as a random combination of the packets currently in its buffer; a sketch follows.]
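A minimal sketch of this buffer model, reusing random_combination from the packetization sketch above. The flush-on-next-generation policy anticipates the flushing slide below, and wraparound of the mod-256 generation tag is glossed over; all names are illustrative.

```python
class NodeBuffer:
    """Per-node buffer: store packets of the current generation; at each
    transmission opportunity, emit a fresh random combination of them."""

    def __init__(self):
        self.generation = -1
        self.packets = []

    def on_receive(self, gen, pkt):
        if gen > self.generation:      # first packet of a new generation:
            self.generation = gen      # flush the old generation
            self.packets = []
        if gen == self.generation:     # stale packets are simply discarded
            self.packets.append(pkt)

    def on_transmit_opportunity(self):
        if not self.packets:
            return None                # nothing informative to send yet
        return self.generation, random_combination(self.packets)
```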
Decoding

Block decoding:
- Collect h or more packets, hope to invert $G_t$

Earliest decoding (recommended):
- Perform Gaussian elimination after each received packet (sketched below)
  - At every node, detect & discard non-informative packets
- $G_t$ tends to be lower triangular, so can typically decode $x_1, \dots, x_k$ with little more than k packets
- Much lower decoding delay than block decoding
  - Approximately constant, independent of block length h
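A minimal sketch of earliest decoding by incremental Gaussian elimination, again assuming the GF(2^8) helpers from the packetization sketch (gf_mul, plus the inverse below); the class name and structure are illustrative, and back-substitution to read out decoded symbols is omitted for brevity.

```python
def gf_inv(a):
    """Multiplicative inverse in GF(2^8), a != 0."""
    return EXP[(255 - LOG[a]) % 255]

class EarliestDecoder:
    def __init__(self, h):
        self.h = h
        self.rows = {}                     # pivot column -> normalized row

    def receive(self, pkt):
        """Reduce one packet against the rows seen so far.
        Returns True iff the packet was informative (rank increased)."""
        pkt = list(pkt)
        for col in range(self.h):
            if pkt[col] == 0:
                continue
            if col in self.rows:           # eliminate the existing pivot
                b, row = pkt[col], self.rows[col]
                pkt = [s ^ gf_mul(b, r) for s, r in zip(pkt, row)]
            else:                          # new pivot: normalize and store
                inv = gf_inv(pkt[col])
                self.rows[col] = [gf_mul(inv, s) for s in pkt]
                return True
        return False                       # linearly dependent: discard
```

Once the first k pivot columns are filled, back-substitution over those rows yields $x_1, \dots, x_k$; this is why earliest decoding achieves roughly constant delay.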
Flushing Policy, Delay Spread, and Throughput Loss

- Policy: flush when the first packet of the next generation arrives on any edge
  - Simple, robust, but leads to some throughput loss

[Figure: timeline of packet arrivals at node v along the fastest and slowest paths from s; the gap between them is the delay spread. Packets of the old generation arriving after the first packet of the next generation (marked X) are flushed and lost.]

$$\text{throughput loss} \;\approx\; \frac{\text{delay spread (s)}}{\text{generation duration (s)}} \;=\; \frac{\text{delay spread (s)} \times \text{sending rate (pkt/s)}}{h \cdot I}$$

(where I is the interleaving length, introduced on the next slide)
Interleaving

- Decomposes session into several concurrent interleaved sessions with lower sending rates

[Figure: packet stream with generations interleaved as 0 1 0 1 … 2 3 2 3 …, so consecutive packets of any one generation are spaced further apart.]

- Does not decrease overall sending rate
- Increases space between packets in each session; decreases relative delay spread
Simulations

- Implemented event-driven simulator in C++
- Six ISP graphs from Rocketfuel project (UW)
  - SprintLink: 89 nodes, 972 bidirectional edges
  - Edge capacities: scaled to 1 Gbps / "cost"
  - Edge latencies: speed of light × distance
- Sender: Seattle; Receivers: 20 arbitrary (5 shown)
  - Broadcast capacity: 450 Mbps; max maxflow: 833 Mbps
  - Union of maxflows: 89 nodes, 207 edges
- Send 20,000 packets in each experiment; measure received rank, throughput, throughput loss, and decoding delay vs. sending rate (450 Mbps), field size ($2^{16}$), generation size (100), and interleaving length (100)
Received Rank

[Figure, two panels: (left) received rank vs. generation number; (right) average received rank vs. field size (bits). Curves for receivers Chicago (450 Mbps), Pearl Harbor (525 Mbps), Anaheim (625 Mbps), Boston (733 Mbps), and San Jose (833 Mbps).]
Throughput

[Figure, two panels: throughput (Mbps) vs. sending rate (Mbps) for the same five receivers.]
Throughput Loss

[Figure, two panels: throughput loss (Mbps) vs. generation size (left) and vs. interleaving length (right) for the same five receivers.]
Decoding Delay

[Figure, two panels: packet delay (ms) for block decoding vs. earliest decoding, as a function of generation size (left) and interleaving length (right).]
Distributed Flow Algorithms

- Work in progress…
Comparing Network Coding to Multicast Routing

- Throughput
- Computational complexity
- Resource cost (efficiency)
- Robustness to link failure
- Robustness to packet loss
Multicast Routing

[Figure: a sender multicasting to several receivers over a tree.]

- Standard multicast: union of shortest reverse paths
- Widest multicast: maximum-rate Steiner tree
- Multiple multicast: Steiner tree packing
Complexity and Throughput of Optimal Steiner Tree Packing

- Packing Steiner trees optimally is hard [e.g., JMS03]
- Network Coding has a small throughput advantage [LLJL04]:

Z. Li, B. Li, D. Jiang, and L. C. Lau, On achieving optimal end-to-end throughput in data networks: theoretical and empirical studies, ECE Tech. Rpt., U. Toronto, Feb. 2004, reprinted with permission.
Complexity and Throughput of Optimal Steiner Tree Packing (2)

Z. Li, B. Li, D. Jiang, L. C. Lau '04:
- Optimally packed Steiner trees in 1000 randomly generated networks (|E| < 35)
- Network Coding advantage = 1.0 in every network

"The fundamental benefit of network coding is not higher optimal throughput, but to facilitate significantly more efficient computation and implementation of strategies to achieve such optimal throughput."
Complexity of Greedy Tree Packing

- Find widest (max-rate) distribution tree in polynomial time [Prim]:
  - Grow tree edge by edge, at each step choosing the maximum-capacity edge from reached to non-reached nodes
  - Postprocess tree
- Repeatedly pack widest distribution trees (see the sketch below)
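A minimal sketch of the widest-tree step, assuming the graph is an adjacency map {node: {neighbor: capacity}}; function and variable names are illustrative. Repeated packing would subtract each tree's bottleneck rate from its edges and re-run the same step.

```python
import heapq

def widest_tree(graph, source):
    """Prim-style greedy: grow the tree edge by edge, always taking the
    maximum-capacity edge from reached to non-reached nodes. Returns the
    tree edges and the bottleneck capacity (the tree's achievable rate)."""
    reached = {source}
    tree, bottleneck = [], float('inf')
    heap = [(-cap, source, v) for v, cap in graph[source].items()]
    heapq.heapify(heap)
    while heap and len(reached) < len(graph):
        neg_cap, u, v = heapq.heappop(heap)
        if v in reached:
            continue
        reached.add(v)
        tree.append((u, v))
        bottleneck = min(bottleneck, -neg_cap)
        for w, cap in graph[v].items():
            if w not in reached:
                heapq.heappush(heap, (-cap, v, w))
    return tree, bottleneck

g = {'s': {'a': 10, 'b': 3}, 'a': {'s': 10, 'b': 5}, 'b': {'s': 3, 'a': 5}}
print(widest_tree(g, 's'))   # ([('s', 'a'), ('a', 'b')], 5)
```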
Comparison of Achievable Throughput

[Figure: bar chart of achievable throughput (Mbps) on ISPs #1–6 for: path-packing; max-rate Steiner tree (Prim's algorithm); greedy tree-packing based on Prim's algorithm; greedy tree-packing based on Lovász's proof of Edmonds' theorem; practical network coding; multicast capacity.]

[Wu, Chou, and Jain, ISIT 2004]
Comparison of Efficiency
(#receivers × throughput / bandwidth cost)

[Figure: efficiency of network coding vs. widest tree.]

Z. Li, B. Li, D. Jiang, and L. C. Lau, On achieving optimal end-to-end throughput in data networks: theoretical and empirical studies, ECE Tech. Rpt., U. Toronto, Feb. 2004, reprinted with permission.
Robustness to Link Failure

- Ergodic link failure: packet loss
- Non-ergodic link failure: G → G−e

Compute throughput achievable by:
- Network Coding: multicast capacity of G−e
- Adaptive Tree Packing: re-pack trees on G−e
- Fixed Tree Packing: keep trees packed on G; delete subtrees downstream from e

[Wu, Chou, and Jain, ISIT 2004]
Robustness to Link Failures

[Figure: throughput under link failures for the three schemes above.]
Robustness to Packet Loss

- Ergodic link failure: e.g., 5% loss on each link
  - Routing requires link-level error control to achieve throughput 95% of the sending rate
- Network Coding obviates the need for link-level error control
  - Outputs random linear combinations at intermediate nodes
  - Delays reconstruction until terminal nodes
Comparing Network Coding to Multicast Routing

- Throughput √
- Computational complexity √
- Resource cost (efficiency) √
- Robustness to link failure √
- Robustness to packet loss √
- Manageability
Network Coding for Internet and Wireless Applications

[Figure: taxonomy of applications along two axes (wireless vs. Internet; Ad Hoc/P2P vs. Infrastructure/CDN), with examples including file download, Xbox Live, Windows Messenger, PeerNet, BitTorrent, Digital Fountain, ALM, Gnutella, Kazaa, Akamai, RBN, SplitStream, and CoopNet.]
Live Broadcast

- State-of-the-art: Application Layer Multicast (ALM) trees with disjoint edges (e.g., CoopNet)
  - FEC/MDC striped across trees
  - Up/download bandwidths equalized

[Figure: multiple disjoint ALM trees with a failed node.]
Live Broadcast (2)

- Network Coding [Jain, Lovász, and Chou, 2004]: does not propagate losses/failures beyond the children of the failed node

[Figure: a failed node; affected nodes (maxflow: h_t → h_t − 1); unaffected nodes (maxflow unchanged).]

- ALM/CoopNet average throughput: (1−ε)^depth × sending rate
- Network Coding average throughput: (1−ε) × sending rate
  - e.g., with ε = 2% per-hop loss and depth-10 trees, ALM delivers ≈ 0.98^10 ≈ 82% of the sending rate, versus ≈ 98% with network coding
File Download

- State-of-the-art: parallel download (e.g., BitTorrent)
  - Selects parents at random
  - Reconciles working sets
  - Flash crowds stressful
- Network Coding:
  - Does not need to reconcile working sets
  - Handles flash crowds similarly to live broadcast
  - Higher throughput → lower download time
  - Seamlessly transitions from broadcast to download mode
File Download (2)

Scheme      Mean    Max
LR Free     124.2   161
LR TFT      126.1   185
FEC Free    123.6   159
FEC TFT     127.1   182
NC Free     117.0   136
NC TFT      117.2   139

(Mean and maximum download finish times; NC = network coding. Lower is better.)

C. Gkantsidis and P. Rodriguez Rodriguez, Network Coding for large scale content distribution, submitted to INFOCOM 2005, reprinted with permission.
Instant Messaging

- State-of-the-art: flooding (e.g., PeerNet)
  - Peer Name Resolution Protocol (distributed hash table)
  - Maintains group as a graph with 3–7 neighbors per node
  - Messaging service: pushed down at the source, pops up at receivers
  - How? Flooding
    - Adaptive, reliable
    - 3–7× over-use
- Network Coding:
  - Improves network usage 3–7× (since all packets are informative)
  - Scales naturally from short messages to long flows
Interactive Communication in Mobile Ad Hoc Wireless Networks

- State-of-the-art: route discovery and maintenance
  - Timeliness, reliability
- Network Coding:
  - Is as distributed, robust, and adaptive as flooding
    - Each node becomes a collector and beacon of information
  - Can also minimize energy
Network Coding Minimizes Energy (per bit)

[Figure: two-source wireless example. With optimal multicast routing, delivering packets a and b takes 5 transmissions per packet; with network coding, the middle node broadcasts a+b, taking 4.5 transmissions per packet.]

Wu et al. (2003); Wu, Chou, Kung (2004)
Lun, Médard, Ho, Koetter (2004)
Network Coding in Residential
Mesh Network (simulation)
Physical Piggybacking

[Figure: s sends a and t sends b toward a middle relay; the relay broadcasts a+b, which both sides can decode.]

- Information sent from t to s can be piggybacked on information sent from s to t
- Network coding helps even with point-to-point interactive communication
  - Throughput
  - Energy per bit
  - Delay
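A minimal sketch of the piggybacking step, assuming equal-length packets and a single relay between s and t; names are illustrative. This is the field-size-2 (XOR) special case of the random mixing used throughout the tutorial.

```python
def relay_mix(pkt_from_s, pkt_from_t):
    """One broadcast of (a XOR b) replaces two separate forwards,
    saving one transmission (and its energy) per exchanged pair."""
    return bytes(x ^ y for x, y in zip(pkt_from_s, pkt_from_t))

def endpoint_recover(mixed, own_packet):
    """Each endpoint XORs out the packet it sent to get the other's."""
    return bytes(x ^ y for x, y in zip(mixed, own_packet))

a, b = b"data from s 0001", b"data from t 0001"
m = relay_mix(a, b)
assert endpoint_recover(m, a) == b and endpoint_recover(m, b) == a
```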
Network Coding in Residential Mesh Network (multi-session)

Network Coding with Cross-Layer Optimization

- Network Coding has > 4× bits per Joule
Summary

- Network Coding is practical
  - Packetization
  - Buffering
- Network Coding outperforms routing on
  - Throughput
  - Computational complexity
  - Resource cost (efficiency)
  - Robustness to link failure
  - Robustness to packet loss
- Network Coding can improve performance
  - In IP or wireless networks
  - In infrastructure-based or P2P networks
  - For live broadcast, file download, messaging, and interactive communication applications