PowerPoint 演示文稿
Download
Report
Transcript PowerPoint 演示文稿
Lijun Lyu, Junjie Xie, Yuhui Deng, Yongtao Zhou
Department of Computer Science, Jinan University,
Guangzhou, P.R. China
ICA3PP 2014: The 14th International Conference on Algorithms & Architectures for Parallel Processing. August 24-27, Dalian, China.
•
•
•
•
•
•
•
Motivation
Challenges
Related work
Our idea
System architecture
Evaluation
Conclusion
2
• The Explosive Growth of Data
IDC: 1,800EB data in 2011, 40-60% annual increase
Larger Data Center
Google: 19 data centers > 1 million servers
Higher traffic
Cisco forecasts that annual traffic in global data centers will
nearly triple over the next 5 years and reach 7.7ZB by the end of
2017
Google Data Center
3
• Data Center Network
Node increment Scalability?
Failures are common Fault tolerance?
Google MapReduce in a 4,000-node cluster:
5 nodes fail during a job
1 disk fails every 6 hours
Bandwidth-hungry services Network capacity?
Infrastructure services: MapReduce, GFS, …
Network applications: Cloud disk, Video, …
• Tree-based Structure
Traditional tree
Bandwidth bottleneck, Single points of failure, Expensive
Modified tree: Fat-tree
High capacity
Limited scalability
5
Fat-tree
Traditional Tree-based Structure
• Other novel, hybrid network structures
Physical topology
Level-based, but not tree-based
Recursively defined
Routing mechanism
No routers, without traditional internet routing mechanism
Put routing intelligence on servers
Take advantage of structural properties
Typical structures
DCell, FiConn, BCube, Totoro…
6
• Physical structures
• DCell
• FiConn
• Totoro
• BCube
7
• Routing mechanisms
DCell
Totoro
FiConn
BCube
Core idea
Divide-and-Conquer
Correct
different
address digits
Calculation
Hop by hop
Full path
Link state
Broadcast domain
Path selection
Dijkstra + Rerouting
Greedy
Available one
Traffic-aware
No mention
Yes
No mention
Shortest
distance
No
Path probing
Yes
8
• What we achieve: Athena Routing Mechanism
Routing algorithm
Based on Dynamic Programming
Find the shortest path with lower complexity than classic algorithms
Support Multi-path
Path probing mechanism
Bypass the failed nodes & links
Traffic-aware
Properties
More resilient, shorter latency, higher capacity, Lower complexity
9
• Athena Routing Mechanism
Implement on the structure of Totoro
Compare with the original Totoro Fault-tolerant Routing
Algorithm (TFR) and Shortest Path Algorithm (SPA, based
on Floyd-Warshall).
Applicable to DCell, FiConn, BCube…
Similar topology: level-based, recursively defined..
Put routing intelligence on servers
10
• Totoro
Two-port servers
Low-end switches
Level-based
Recursively defined
two-port NIC
Totoro Structure of One Level
11
• Building Totoro
Connect N servers to an N-port switch
Here, N=4
Basic partition: Totoro0
Intra-switch
A Totoro0 Structure
12
• Building Totoro
Available ports in Totoro0: c. Here, c=4
Connect n Totoro0s to n-port switches by using c/2
ports
Inter-switch
A Totoro1 structure consists of n Totoro0s.
13
• Building Totoro
Connect n Totoroi-1s to n-port switches to build a
Totoroi
Recursively defined
Half of available ports ⇒ Open & Scalable
The number of paths among Totorois is n/2 times of
the number of paths among Totoroi-1s ⇒ Multiredundant links ⇒ High network capacity
14
1-0,0
0,0,0 0,0,1 0,0,2 0,0,3 0,1,0 0,1,1 0,1,2 0,1,3
1-0,1
1-1,0
0,2,0 0,2,1 0,2,2 0,2,3 0,3,0 0,3,1 0,3,2 0,3,3
1-1,1
1,0,0 1,0,1 1,0,2 1,0,3 1,1,0 1,1,1 1,1,2 1,1,3 1,2,0 1,2,1 1,2,2 1,2,3 1,3,0 1,3,1 1,3,2 1,3,3
Xie, J., Deng, Y., Zhou, K.: Totoro: A scalable and fault-tolerant data center
2-0backup port.2-1
2-2 Parallel Computing.
2-3
Level-2
Link
network
by using
In: Network and
Springer
(2013) 94–105
Level-1 Link
3,0,0 3,0,1 3,0,2 3,0,3 3,1,0 3,1,1 3,1,2 3,1,3 3,2,0 3,2,1 3,2,2 3,2,3
1-3,0
1-3,1
3,3,0 3,3,1 3,3,2 3,3,3
2,0,0 2,0,1 2,0,2 2,0,3 2,1,0 2,1,1 2,1,2 2,1,3 2,2,0 2,2,1 2,2,2 2,2,3 2,3,0 2,3,1 2,3,2 2,3,3
1-2,0
Totoro2 structure with N = 4, n = 4, K = 2.
1-2,1
15
• Athena Routing Algorithm (ARA)
Based on Dynamic Programming (DP)
Applicable to problems which exhibit the properties of
Overlapping subproblems
Optimal substructure
Recursively calculate
16
Steps of ARA:
1. Suppose src and dst
belong to two partitions.
2. Get all paths connecting
these two partitions.
3. For each path, recursively
calculate it.
4. Store all paths.
5. Sort all path by length.
6. Remove the extra paths.
This function is based
on the corresponding
structural properties.
Cartesian product
17
• Case study of ARA
work out the path from src to dst
18
• Case study of ARA
Step. 1: src and dst belong to two different subpartitions respectively
19
• Case study of ARA
Step. 2: there exist two paths between these two subpartitions
20
• Case study of ARA
Step. 3: for Path 1, recursively work out the sub-paths
in these sub-partitions, and join them for a full path
21
• Case study of ARA
Step. 4: similarly, work out the full path for Path 2
22
• Case study of ARA
Step. 5: add all paths into the result set
23
• Case study of ARA
Step. 5: sort the paths by lengths
24
• Case study of ARA
Step. 5: remove the extra paths (here, we suppose the
size of set to return is 1, i.e., it is the shortest path)
25
• Path Probing Mechanism
Source host sends the probing request packets
Destination host sends probing reply packets
Intermediate servers record the link capacities in the
probing packets and forward them
26
• Path Probing Mechanism
Detect the failed paths No extra rerouting technique
is required
Detect the link capacity Support load balance…
27
28
29
• Protocol Implementation
ARM Packet format
Path-probing packet
Data packet
30
• Protocol Implementation
Protocol
2.5-layer protocol
4
TCP
3
IP
2.5
ARM
2
Ethernet
How an intermediate server determines the next hop?
A fact: two adjacent servers in a path only differ at one “bit”
Hence, we only store the different “bit”s in the vector.
31
• Evaluating Path Failure & Average Path Lengths
ARM vs. TFR vs. SPA
TFR: the original Totoro Fault-tolerant Routing algorithm
SPA: Shortest Path Algorithm, Floyd-Warshall, performance bound
• Evaluating Resource Usage
32
• Evaluating Path Failure & Average Path Lengths
Experimental parameters
Types of failures
Link, Node, Switch & Rack failures
Platform
Totoro2 (4096 servers)
Failures ratios
2% - 20%
Communication mode
All-to-all
Simulation times
20 times
33
• Evaluating Path Failure
Path failure ratio vs. server/rack failure ratio
The performance of ARM/TFR are almost identical to that of SPA!
34
• Evaluating Path Failure
Path failure ratio vs. switch failure ratio
The performance of ARM is almost identical to that of SPA!
But TFR isn’t.
35
• Evaluating Path Failure
Path failure ratio vs. link failure ratio
When a high link failure occurs:
ARM achieves slightly better capacity than TFR.
Performance gap between ARM and SPA still exists!
SPA traverse all feasible links in the whole
structure until finding a valid path!
This is a tradeoff that ARM makes to
facilitate algorithmic complexity and save
computation resources.
36
• Evaluating Average Path Lengths
ARM:
1. Better than TFR.
2. Almost identical to SPA.
3. Shorter than SPA, this is
because the path failure
ratio of ARM is a bit
higher than that of SPA,
thus our total path length
is shorter.
37
• Evaluating Resource Usage
Experimental parameters
Testbed
Lenovo T350, Quad-core, 8GB memory
Platform
Totoro2 (4096 servers)
Size of each result
10 paths
Communication mode
One-to-all in 4 Totoro1
38
• Evaluating Resource Usage
CPU:
28%
+10nodes/s
1. Increase by 10 per
second
2. Peak value of 28% at
18s
3. Benefited from the
cache
Memory:
0%
For each host, it only
costs 164KB at most.
18s
39
•
•
•
•
More resilient
Shorter latency
Higher capacity
Lower complexity
• In the future work, we will focus on the
implementation of ARM in DCell, FiConn and
other structures!
40
41
ICA3PP 2014: The 14th International Conference on Algorithms & Architectures for Parallel Processing. August 24-27, Dalian, China.