Datacast: A Scalable and Efficient Reliable Group Data

Download Report

Transcript Datacast: A Scalable and Efficient Reliable Group Data

Datacast: A Scalable and Efficient
Reliable Group Data Delivery
Service for Data Centers
Jiaxin Cao, Chuanxiong Guo, Guohan Lu, Yongqiang Xiong,
Yixin Zheng, Yongguang Zhang, Yibo Zhu, Chen Chen
University of Science and Technology of China
Microsoft Research Asia
Tsinghua University
University of California, Santa Barbara
University of Pennsylvania
Reliable Group Data Delivery
The problem of RGDD is:
In a data center network, given a data source, Src, and a set of receivers, R1, R2, …, Rn,
how to reliably transmit bulk data from Src to all the receivers?
<1,0>
<1,1>
<1,2>
<1,3>
<0,0>
<0,1>
<0,2>
<0,3>
00 01 02 03
10 11 12 13
20 21 22 23
Data
30 31 32 33
Reliable Group Data Delivery
•RGDD is important in DCNs:
• Bootstrapping or OS upgrading.
• Distributed file systems, e.g., GFS.
• VM setup.
• And more...
Reliable Group Data Delivery
A good RGDD design should have the following
properties:
• Scalable (large group numbers and large group sizes)
• High bandwidth efficiency
Existing solutions to RGDD
Existing solutions can be classified into two
categories:
• Reliable IP multicast.
Not scalable, e.g., ACK implosion.
• End-host based overlays. Low bandwidth efficiency.
None of the existing systems can perfectly achieve RGDD.
New opportunities in DCN
Recently, there are two clear trends in DCN:
• Multiple edge-disjoint Steiner trees for RGDD.
• Practical packet caching abilities in network devices.
We can cache
packet!
00
00
<0,0>
<1,0>
01
02
03
10
20
30
<1,1>
<1,2>
<1,3>
<0,1>
<0,2>
<0,3>
11 21 31 12 22 32 13 23 33
11 12 13 21 22 23 31 32 33
<0,1>
<0,2>
<0,3>
<1,1>
<1,2>
<1,3>
10
20
30
01
02
03
The architecture of Datacast
Src
Master j
R1
Fabric
Manager
How to efficiently
transmit data in
each Steiner tree?
R2
RGDD Group in
RGDD Group i2
RGDD Group i1
Network
Topology
Src
Master i
IMD
How to calculate
multiple Steiner
trees?
R1
R2
R3
R4
Multiple edge-disjoint Steiner
trees in DCN
• Our multiple Steiner trees algorithm takes three steps:
1. Use specific algorithms to construct spanning trees.
2. Prune the spanning trees.
3. Use Breath First Search(BFS) to repair the trees broken
by network failures.
This algorithm is fast (O(k|V|) + O(|E|) + O(k|E|)) and
efficient.
Datacast transport protocol
• Datacast is built on top of Content Centric Network (CCN):
Data
00
Data
01
Data
02
Data
Inst
Data
10
11
03
Inst
12
13
22
23
Inst
20
30
21
31
32
Data
Inst
33
Datacast transport protocol
• Datacast uses a rate based congestion control algorithm at
the source side:
•𝑟=
𝑟
,
2
𝑤ℎ𝑒𝑛 𝑎 𝑑𝑢𝑝𝑙𝑖𝑐𝑎𝑡𝑒 𝑖𝑛𝑡𝑒𝑟𝑠𝑡 𝑖𝑠 𝑟𝑒𝑐𝑒𝑖𝑣𝑒𝑑.
𝑟 + 𝛿, 𝑤ℎ𝑒𝑛 𝑡ℎ𝑒𝑟𝑒 𝑖𝑠 𝑛𝑜 𝑑𝑢𝑝𝑙𝑖𝑐𝑎𝑡𝑒 𝑖𝑛𝑡𝑒𝑟𝑒𝑠𝑡 𝑖𝑛 𝑇.
Datacast transport protocol
• We built a fluid model for Datacast congestion control algorithm.
• 𝑥𝑠 (𝑡) and 𝑥𝑟 (𝑡) are the current data positions of the data source and the
slowest receiver.
• R is the “desired” rate of the slowest receiver.
𝛿
𝑇
•
𝑥𝑠′′
𝑡 = 1 −𝑝 𝑡
•
𝑥𝑟′
𝑅
𝑡 =
max 𝑅, 𝑥𝑠′ (𝑡)
• 𝑝 𝑡 = 1 𝑥𝑠 (𝑡)−𝑥𝑟 (𝑡)>𝐶
−𝑝 𝑡
𝑥𝑠′ (𝑡) 𝑥𝑟′ (𝑡)
2 𝑀𝑇𝑈
𝑖𝑓 𝑥𝑟 (𝑡) < 𝑥𝑠 (𝑡)
𝑖𝑓 𝑥𝑟 (𝑡) = 𝑥𝑠 (𝑡)
Datacast transport protocol
• Theorem 1 (Cache Usage). Datacast works at the full
rate, i.e., the rate of the slowest receiver, R, if the cache
size, C, satisfies
𝑅2 𝑇
𝐶>
2𝛿
When R is 100Mbps, 𝛿 is 5Mbps, T is 1ms, C only needs
to be greater than 125KB.
Datacast transport protocol
• Theorem 2 (Duplicate Data Ratio). When Datacast works
at the full rate, the duplicate data ratio of Datacast is
𝛿
𝑇
𝛿
𝑅2
𝑇 + 2 𝑀𝑇𝑈
When R is 100Mbps, 𝛿 is 5Mbps, T is 1ms, and MTU is
1.5KB, the duplicate data ratio is 1.19%.
Simulation: multiple Steiner trees
algorithm
• We tested our algorithm in Fattree(24,3), BCube(8, 3), Torus(16, 3)
under the link failure rates (LFR) of 1%, 3% and 5%.
Running times.
Steiner tree numbers.
Simulation: Datacast congestion
control
• Simulation Setup:
• BCube(4, 1) with 1Gbps
links.
• The link from <0,0> to
02 is slowed down to
100Mbps.
• 𝛿 = 5Mbps, T = 1ms
• MTU = 1.5KB
00
00
<0,0>
<1,0>
01
02
03
10
20
30
<1,1>
<1,2>
<1,3>
<0,1>
<0,2>
<0,3>
11 21 31 12 22 32 13 23 33 11 12 13 21 22 23 31 32 33
<0,1>
<0,2>
<0,3>
<1,1>
<1,2>
<1,3>
10
20
30
01
02
03
Steiner Tree 1.
Steiner Tree 2.
Simulation: Datacast congestion
control
• Based on Theorem 1, Datacast needs 125KB caches to work at full
rate.
• Based on Theorem 2, the duplicate data ratios is 1.19%.
Cache Size (KB)
Throughput(Mbps)
Duplicate Data Ratio (%)
8
91.380
1.15
32
95.076
1.14
128
98.799
1.11
512
98.799
1.10
2048
98.799
1.12
Simulation: Datacast congestion
control
• Compare with BitTorrent.
Fattree.
BCube.
Torus.
Experiment: Datacast congestion
control
• We build Datacast with ServerSwitch platform, and evaluate it in a
BCube(4, 1).
• 𝛿 = 5Mbps, T = 1ms, MTU = 8KB.
• Based on Theorem 1, Datacast works at the full rate when C > 256KB.
• When C = 64KB, the average throughput is 91.998Mbps.
• When C = 256KB, the average throughput is 99.595Mbps.
• Based on Theorem 2, the duplicate data ratio should be 3.48%.
• When Datacast works at the full rate, the measured duplicate data ratio is
3.10%.
Experiment: Datacast congestion
control
• We compare Datacast with BitTorrent. We use both of them to
transmit 4GB data.
Finish time (s)
Link stress
Datacast
16.9
1.01
BitTorrent
52
1.39
Related work
• Reliable IP multicast
• Pgm congestion control (pgmcc)
• Active Reliable Multicast (ARM)
• End-host based overlays
• SplitStream
• End System Multicast
• Cornet
Conclusion
•In this paper, we propose Datacast which
• Calculates multiple edge-disjoint Steiner trees in DCNs
• Uses CCN to turn hard group states to soft packet caching
• Uses a simple rate-based AIMD congestion control
algorithm to achieve high efficiency
Datacast is scalable and achieves high bandwidth efficiency
Thank you!