PowerPoint 演示文稿

Download Report

Transcript PowerPoint 演示文稿

Lijun Lyu, Junjie Xie, Yuhui Deng, Yongtao Zhou
Department of Computer Science, Jinan University,
Guangzhou, P.R. China
ICA3PP 2014: The 14th International Conference on Algorithms & Architectures for Parallel Processing. August 24-27, Dalian, China.
•
•
•
•
•
•
•
Motivation
Challenges
Related work
Our idea
System architecture
Evaluation
Conclusion
2
• The Explosive Growth of Data
 IDC: 1,800EB data in 2011, 40-60% annual increase
 Larger Data Center
 Google: 19 data centers > 1 million servers
 Higher traffic
 Cisco forecasts that annual traffic in global data centers will
nearly triple over the next 5 years and reach 7.7ZB by the end of
2017
Google Data Center
3
• Data Center Network
 Node increment Scalability?
 Failures are common  Fault tolerance?
 Google MapReduce in a 4,000-node cluster:
 5 nodes fail during a job
 1 disk fails every 6 hours
 Bandwidth-hungry services  Network capacity?
Infrastructure services: MapReduce, GFS, …
Network applications: Cloud disk, Video, …
• Tree-based Structure
 Traditional tree
 Bandwidth bottleneck, Single points of failure, Expensive
 Modified tree: Fat-tree
 High capacity
 Limited scalability
5
Fat-tree
Traditional Tree-based Structure
• Other novel, hybrid network structures
 Physical topology
 Level-based, but not tree-based
 Recursively defined
 Routing mechanism
 No routers, without traditional internet routing mechanism
 Put routing intelligence on servers
 Take advantage of structural properties
 Typical structures
 DCell, FiConn, BCube, Totoro…
6
• Physical structures
• DCell
• FiConn
• Totoro
• BCube
7
• Routing mechanisms
DCell
Totoro
FiConn
BCube
Core idea
Divide-and-Conquer
Correct
different
address digits
Calculation
Hop by hop
Full path
Link state
Broadcast domain
Path selection
Dijkstra + Rerouting
Greedy
Available one
Traffic-aware
No mention
Yes
No mention
Shortest
distance
No
Path probing
Yes
8
• What we achieve: Athena Routing Mechanism
 Routing algorithm
 Based on Dynamic Programming
 Find the shortest path with lower complexity than classic algorithms
 Support Multi-path
 Path probing mechanism
 Bypass the failed nodes & links
 Traffic-aware
 Properties
 More resilient, shorter latency, higher capacity, Lower complexity
9
• Athena Routing Mechanism
 Implement on the structure of Totoro
 Compare with the original Totoro Fault-tolerant Routing
Algorithm (TFR) and Shortest Path Algorithm (SPA, based
on Floyd-Warshall).
 Applicable to DCell, FiConn, BCube…
 Similar topology: level-based, recursively defined..
 Put routing intelligence on servers
10
• Totoro
 Two-port servers
 Low-end switches
 Level-based
 Recursively defined
two-port NIC
Totoro Structure of One Level
11
• Building Totoro




Connect N servers to an N-port switch
Here, N=4
Basic partition: Totoro0
Intra-switch
A Totoro0 Structure
12
• Building Totoro
 Available ports in Totoro0: c. Here, c=4
 Connect n Totoro0s to n-port switches by using c/2
ports
 Inter-switch
A Totoro1 structure consists of n Totoro0s.
13
• Building Totoro
 Connect n Totoroi-1s to n-port switches to build a
Totoroi
 Recursively defined
 Half of available ports ⇒ Open & Scalable
 The number of paths among Totorois is n/2 times of
the number of paths among Totoroi-1s ⇒ Multiredundant links ⇒ High network capacity
14
1-0,0
0,0,0 0,0,1 0,0,2 0,0,3 0,1,0 0,1,1 0,1,2 0,1,3
1-0,1
1-1,0
0,2,0 0,2,1 0,2,2 0,2,3 0,3,0 0,3,1 0,3,2 0,3,3
1-1,1
1,0,0 1,0,1 1,0,2 1,0,3 1,1,0 1,1,1 1,1,2 1,1,3 1,2,0 1,2,1 1,2,2 1,2,3 1,3,0 1,3,1 1,3,2 1,3,3
Xie, J., Deng, Y., Zhou, K.: Totoro: A scalable and fault-tolerant data center
2-0backup port.2-1
2-2 Parallel Computing.
2-3
Level-2
Link
network
by using
In: Network and
Springer
(2013) 94–105
Level-1 Link
3,0,0 3,0,1 3,0,2 3,0,3 3,1,0 3,1,1 3,1,2 3,1,3 3,2,0 3,2,1 3,2,2 3,2,3
1-3,0
1-3,1
3,3,0 3,3,1 3,3,2 3,3,3
2,0,0 2,0,1 2,0,2 2,0,3 2,1,0 2,1,1 2,1,2 2,1,3 2,2,0 2,2,1 2,2,2 2,2,3 2,3,0 2,3,1 2,3,2 2,3,3
1-2,0
Totoro2 structure with N = 4, n = 4, K = 2.
1-2,1
15
• Athena Routing Algorithm (ARA)
 Based on Dynamic Programming (DP)
 Applicable to problems which exhibit the properties of
 Overlapping subproblems
 Optimal substructure
 Recursively calculate
16
Steps of ARA:
1. Suppose src and dst
belong to two partitions.
2. Get all paths connecting
these two partitions.
3. For each path, recursively
calculate it.
4. Store all paths.
5. Sort all path by length.
6. Remove the extra paths.
This function is based
on the corresponding
structural properties.
Cartesian product
17
• Case study of ARA
 work out the path from src to dst
18
• Case study of ARA
 Step. 1: src and dst belong to two different subpartitions respectively
19
• Case study of ARA
 Step. 2: there exist two paths between these two subpartitions
20
• Case study of ARA
 Step. 3: for Path 1, recursively work out the sub-paths
in these sub-partitions, and join them for a full path
21
• Case study of ARA
 Step. 4: similarly, work out the full path for Path 2
22
• Case study of ARA
 Step. 5: add all paths into the result set
23
• Case study of ARA
 Step. 5: sort the paths by lengths
24
• Case study of ARA
 Step. 5: remove the extra paths (here, we suppose the
size of set to return is 1, i.e., it is the shortest path)
25
• Path Probing Mechanism
 Source host sends the probing request packets
 Destination host sends probing reply packets
 Intermediate servers record the link capacities in the
probing packets and forward them
26
• Path Probing Mechanism
 Detect the failed paths  No extra rerouting technique
is required
 Detect the link capacity  Support load balance…
27
28
29
• Protocol Implementation
 ARM Packet format
 Path-probing packet
 Data packet
30
• Protocol Implementation
 Protocol
2.5-layer protocol
4
TCP
3
IP
2.5
ARM
2
Ethernet
How an intermediate server determines the next hop?
 A fact: two adjacent servers in a path only differ at one “bit”
 Hence, we only store the different “bit”s in the vector.
31
• Evaluating Path Failure & Average Path Lengths
 ARM vs. TFR vs. SPA
TFR: the original Totoro Fault-tolerant Routing algorithm
SPA: Shortest Path Algorithm, Floyd-Warshall, performance bound
• Evaluating Resource Usage
32
• Evaluating Path Failure & Average Path Lengths
 Experimental parameters
Types of failures
Link, Node, Switch & Rack failures
Platform
Totoro2 (4096 servers)
Failures ratios
2% - 20%
Communication mode
All-to-all
Simulation times
20 times
33
• Evaluating Path Failure
 Path failure ratio vs. server/rack failure ratio
The performance of ARM/TFR are almost identical to that of SPA!
34
• Evaluating Path Failure
 Path failure ratio vs. switch failure ratio
The performance of ARM is almost identical to that of SPA!
But TFR isn’t.
35
• Evaluating Path Failure
 Path failure ratio vs. link failure ratio
 When a high link failure occurs:
 ARM achieves slightly better capacity than TFR.
 Performance gap between ARM and SPA still exists!
SPA traverse all feasible links in the whole
structure until finding a valid path!
This is a tradeoff that ARM makes to
facilitate algorithmic complexity and save
computation resources.
36
• Evaluating Average Path Lengths
ARM:
1. Better than TFR.
2. Almost identical to SPA.
3. Shorter than SPA, this is
because the path failure
ratio of ARM is a bit
higher than that of SPA,
thus our total path length
is shorter.
37
• Evaluating Resource Usage
 Experimental parameters
Testbed
Lenovo T350, Quad-core, 8GB memory
Platform
Totoro2 (4096 servers)
Size of each result
10 paths
Communication mode
One-to-all in 4 Totoro1
38
• Evaluating Resource Usage
CPU:
28%
+10nodes/s
1. Increase by 10 per
second
2. Peak value of 28% at
18s
3. Benefited from the
cache
Memory:
0%
For each host, it only
costs 164KB at most.
18s
39
•
•
•
•
More resilient
Shorter latency
Higher capacity
Lower complexity
• In the future work, we will focus on the
implementation of ARM in DCell, FiConn and
other structures!
40
41
ICA3PP 2014: The 14th International Conference on Algorithms & Architectures for Parallel Processing. August 24-27, Dalian, China.