Load Balancing for Parallel Forwarding

Transcript Load Balancing for Parallel Forwarding

Load Balancing for Parallel
Forwarding
W. Shi, M.H. MacGregor, P. Gburzynski
Department of Computing Science
University of Alberta
Outline

Introduction



Parallel Forwarding
Common Scheduling Schemes
Previous Work




Highest Random Weight and Robust Hashing
Adaptive Load Sharing
Hashing Cannot Balance Workload
Scheduling Potent Flows
Outline (cont’d)

Scheduling TCP Bursts





TCP’s Bursty Transmit Pattern
Avoiding Packet Reordering
Load Balancer Design
Simulation and Results
Conclusions and Future Directions
Routing Tables are Growing
CIDR & Longest Prefix Match
Parallel Forwarding

Generations of Network Forwarding Systems




Software running on a CPU (80s)
Special purpose h/w to offload CPU; switches replace
buses. (mid 90s)
Decentralized design with ASIC; one CPU per card (late
90s)
Network Processors (NP or FE) [2,3]



Flexible to accommodate changes and updates; fast to
market; cost less …
Optimized for key forwarding functions and high-speed I/O
Parallel forwarding for performance and scalability
A Parallel Forwarding Model
Goals for the Scheduler



Distribute load evenly to achieve highest
possible throughput
Preserve packet ordering within flows
Maximize cache hit rate to obtain best
processor performance
Common Scheduling Schemes

Packet-level: Round Robin, Least Loaded, etc.




Distribute load evenly
Do not preserve packet order
Disperse flows and thus reduce cache hit ratio
Flow-level: Hashing (e.g., XOR, CRC, etc.)




Hashes on the flow ID, typically, the five-tuple of
{Source and Destination IP Addresses, Source and
Destination Ports, Protocol ID}
Preserves in-flow packet ordering
Increases temporal locality seen by each FE
May not distribute load evenly
Previous Work
HRW: Highest Random Weight [4]

N Web caches serve requests at a Web site.





obj: the name of the object in a request
sid: cache server identifier
Hash on {obj, sid}
Use the server whose sid gives the largest
hash value, i.e., the highest weight.
HRW can achieve long-term load balancing
Robust Hash Routing [5]



Goal: extend HRW to a system of
heterogeneous servers
Assign multipliers to cache servers to scale
the hash return values and then choose the
one with the highest weight
Application: the cache array routing protocol
(CARP)
Adaptive Load Sharing [6]


Dynamically adjust multipliers based on the
recent system load distribution.
Application: multi-NP forwarding systems.
IP Destination Address Popularity
Zipf-like Function Fits Popularity Data
Popularity Characterization [7]



Zipf-like distribution:
P(R) ~ 1/Rα
(1)
Slopes (α) fitted for the five traces, SDSC,
FUNET, UofA, IPLS, and Auck4, are -0.905,
-0.929, -1.04, -1.21, and -1.66, respectively.
With α values less than -1, hashing cannot
balance traffic, not even in the long term [8].
A Few Potent Flows
Rank
FUNET
UofA
Auck4
SDSC
IPLS
1
8,233
158,707
640,730
1,183,834
2,788,273
2
7,424
24,245
440,149
581,495
944,253
3
2,971
20,769
196,513
524,542
919,088
4
2,470
17,482
194,757
235,363
808,773
5
2,298
15,146
186,095
212,150
732,339
6
1,614
14,305
177,388
168,384
582,367
7
1,387
13,308
135,286
160,798
570,316
8
1,317
12,348
135,033
138,657
510,043
9
1,309
12,028
132,812
125,531
473,562
10
1,258
11,824
104,716
125,389
470,072
Scheduling Potent Flows [8,9,10]
Flowlets for Traffic Engineering [11]



Based on similar observations of the bursty
nature of TCP traffic
Developed independently in a different
context
Differs in many details and implications
Scheduling TCP Bursts
Ideal TCP Transmission
Actual (bursty) TCP transmission
Partially-filled Windows Result in Bursts
Eliminating Packet Reordering
Eliminating Packet Reordering







N: # of FEs
Pi, Pj, where j=i+1: two adjacent packets in a
flow
ti, tj arrival times of the two packets
Ti = tj – ti
L: the buffer size of each FE
ρ: the overall system utilization
Li, Lj: the numbers of packets preceding Pi
and Pj in their respective queues.
Eliminating Packet Reordering
Sufficient but not necessary:
Li – Ti * λ / ρ / N < Lj
λ is the bandwidth of the interface, in pps.
Ti > (Li – Lj) *ρ * N / λ.
Eliminating Packet Reordering

conservative choice is Ti > L * ρ / λ / N.
Ti > L * ρ / λ = L / μ

when Pj arrives Pi has already been
processed and thus reordering is not
possible.
(2)
Scheduling Bursts [11, 14]
Scheduling Bursts

Example





10 Gbps  1.25 GBps  1.25 Mpps assuming an average
packet size of 1000 bytes [12].
with 8 FEs, μi = 156,250 pps.
with L = 1000 pkts, Ti >6.4ms
this is much less than the minimum RTT observed in [13].
Even Better News


L could be further reduced, relaxing the conditions for Ti.
The scheme scales and parallelization helps.
% Packets Dropped
% Packets Reordered
Conclusions




Static hashing-only schemes cannot balance
workload.
Burst scheduling is relatively simple
Burst scheduling distributes load evenly
Reorder rates of < 0.1% can be attained

Larger buffers aren’t necessarily better
Future Work



Improve the burst-scheduling triggering policy
for better temporal locality
A better scheme is surely possible than that
mandated by Eq. 2.
We might schedule both potent flows and
bursts within flows
References
[1] Jeoff Huston, The BGP routing table, The Internet Protocol Journal (Cisco), vol. 4, no.
1, 2001.
[2] Niraj Shah, Understanding network processors, M.S. thesis, U. C. Berkeley,
September 2001.
[3] Douglas Comer, Network processors: Programmable technology for building network
systems, The Internet Protocol Journal (Cisco), vol. 7, no. 4, pp. 3--12, 2004.
[4] David G. Thaler and Chinya V. Ravishankar, Using name-based mappings to increase
hit rates, IEEE/ACM Transactions on Networking, vol. 6, no. 1, pp. 1--14, February
1998.
[5] Keith W. Ross, Hash routing for collections of shared Web caches, IEEE Network, vol.
11, no. 7, pp. 37--44, Nov-Dec 1997.
[6] Lukas Kencl and Jean-Yves Le Boudec, Adaptive load sharing for network processors,
in IEEE INFOCOM 2002, New York, NY, USA, June 2002, pp. 545--554.
[7] George K. Zipf, Human Behavior and the Principle of Least-Effort, Addison-Wesley,
Cambridge, MA,1949.
[8] Weiguang Shi, Mike H. MacGregor, and Pawel Gburzynski, Load balancing for parallel
forwarding, IEEE/ACM Transactions on Networking, vol. 13, no. 4, 2005.
References (cont’d)
[9] Ju-Yeon Jo, Yoohwan Kim, H. Jonathan Chao, and Frank Merat, Internet traffic load
balancing using dynamic hashing with flow volumes,. in Internet Performance and
Control of Network Systems III at SPIE ITCOM 2002, Boston, MA, USA, July 2002,
pp. 154--165.
[10] Anees Shaikh, Jennifer Rexford, and Kang G. Shin, Load-sensitive routing of longlived IP flows, ACM SIGCOMM Computer Communication Review, vol. 29, no. 4, pp.
215.226, October 1999.
[11] Shan Sinha, Srikanth Kandula, and Dina Katabi, Harnessing TCP's burstiness using
owlet switching, in 3rd ACM SIGCOMM Workshop on Hot Topics in Networks
(HotNets), San Diego, CA, November 2004.
[12] Craig Partridge, et al. A 50-gb/s IP router, IEEE/ACM Trans. Netw., vol. 6, no. 3, pp.
237--248, 1998.
[13] Jay Aikat, Jasleen Kaur, F. Donelson Smith, and Kevin Jeffay, Variability in TCP
round-trip times, in IMC '03, Miami Beach, FL, USA, 2003, pp. 279--284, ACM Press.
[14] Weiguang Shi, Mike H. MacGregor, and Pawel Gburzynski, A scalable load balancer
for forwarding internet traffic: Exploiting flow-level burstiness, in Symposium on
Architectures for Networking and Communications Systems, Princeton, NJ, USA,
October 2005.

Load Balancing for Parallel Forwarding

Transcript Load Balancing for Parallel Forwarding

Directory