Data Center Load Balancing

T-106.5840 Seminar
Kristian Hartikainen
Aalto University, Helsinki, Finland
9.12.2015
Load Balancing
• Efficient distribution of the workload across the
available computing resources
– Distributing computation over multiple CPU cores
– Distributing network requests across multiple servers
– And many others...
• The goal is efficient resource usage to optimize
the desired performance metrics
– Maximizing network throughput
– Minimizing latency
– And many others...
Data Center Load Balancing
• Load balancing problems arise in several
(computing) contexts
• Our focus is on data center load balancing
• Data center load balancing itself spans
several different levels
– Network traffic, CPU inside servers, servers, server
racks, server clusters, between data centers
• We studied load balancing of network traffic
and virtual servers
MOTIVATION
The Free Lunch Is Over
• Single-threaded performance has hit a wall
• The number of transistors in microprocessors is still growing
Herb Sutter: The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in
Software. Dr. Dobb’s Journal, 30(3), March 2005 (updated graph in August 2009).
Amdahl’s Law
• There are clear limits to the speedup that parallel programs can
achieve
• Thus, keeping the resources utilized is challenging
https://en.wikipedia.org/wiki/Amdahl%27s_law
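As a rough illustration of the limit above, the following sketch evaluates Amdahl's law, speedup = 1 / ((1 - p) + p / n), where p is the parallelizable fraction of the program and n is the number of workers. The function name and the 95% figure are assumptions chosen for illustration, not values from the talk.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Speedup predicted by Amdahl's law.

    parallel_fraction: share of the run time that can be parallelized (0..1).
    workers: number of parallel processing units (>= 1).
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # Assumed example: 95% of the program is parallelizable.
    for n in (2, 8, 64, 1024):
        print(n, round(amdahl_speedup(0.95, n), 2))
```

With a 95% parallel fraction the speedup can never exceed 1 / 0.05 = 20, no matter how many workers are added, which is why extra hardware alone does not keep resources well utilized.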
Towards Data Center Computing
• At the same time, mobile devices, sensors, and the
amount of data transferred in general have
proliferated
• Data center technology and virtualization have
been developing rapidly as well
• Data centers are becoming larger and larger and
more common
– Large companies are building their own data centers
– Many other companies are moving their computation,
storage and operations to the cloud
Large Scale Data Centers
• Data centers provide many advantages over
traditional computing
– Economies of scale
– Enables use of cheap commodity hardware
– Cheap hardware, cooling, electricity, network, etc...
• However, to efficiently utilize data center
resources, and to provide the required
performance guarantees, efficient load balancing
mechanisms are needed on different levels of the
data center
NETWORK TRAFFIC LOAD BALANCING
Network Traffic Load Balancing
• Today’s data centers are huge
– Hundreds of thousands of servers
– Supporting a huge number of services
• Big data apps
• Web services
• High performance computing
• Network traffic grows
– Both inter- and intra-data-center traffic
• Network bandwidth is one of the major
bottlenecks in data centers
Data Center Networks
• Traditional data center network topologies are single-rooted trees
• Limited port density, even in the highest-end switches, forces the data center
topology to take the form of a multi-rooted tree
• For example, fat-tree or leaf-spine
Data Center Networks
• The problem: How to efficiently utilize the theoretical bandwidth gains of the
multi-rooted design?
Flow Hashing
• Most of today’s load balancing mechanisms
are based on flow hashing
– E.g. Equal-Cost Multi-Path forwarding
• Basic idea: split the packet flows randomly
across multiple network paths
– E.g. by hashing the packet header (e.g. 5-tuple)
• ECMP
– Forwarding decisions made hop-by-hop
– All the routes are equal cost
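The hashing idea above can be sketched as follows. This is a minimal illustration, not a switch implementation: the `FiveTuple` type, the `ecmp_next_hop` function, and the use of SHA-256 are assumptions made for readability (hardware typically uses a much cheaper hash function).

```python
import hashlib
from typing import NamedTuple, Sequence


class FiveTuple(NamedTuple):
    """Header fields that identify a flow (hypothetical representation)."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int


def ecmp_next_hop(flow: FiveTuple, next_hops: Sequence[str]) -> str:
    """Pick one of the equal-cost next hops by hashing the 5-tuple.

    Every packet of the same flow hashes to the same next hop, so packets
    of a flow stay in order. The choice ignores link load entirely.
    """
    if not next_hops:
        raise ValueError("at least one next hop is required")
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()  # stand-in for a hardware hash
    index = int.from_bytes(digest[:8], "big") % len(next_hops)
    return next_hops[index]


if __name__ == "__main__":
    flow = FiveTuple("10.0.0.1", "10.0.1.7", 40000, 80, 6)
    print(ecmp_next_hop(flow, ["spine0", "spine1"]))
```

Because each switch applies this decision independently, the forwarding is hop-by-hop: no switch needs to know what the others chose.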
Flow Hashing
• Pros:
– Easy to implement
– Good performance in ideal system conditions
– Packets are automatically kept in order, which is
crucial for certain protocols such as TCP.
• Cons:
– Hashing decisions are purely local
– They are unaware of the congestion state of the
system
Congestion Aware Load Balancing
• Several proposals of congestion aware load
balancing have been made to overcome the
problems of hash-based methods
• Difficulties:
– How to handle packet reordering?
– Centralized vs. distributed systems?
– How to implement a fast system without specialized
hardware?
• A couple of examples: Hedera, Presto, CONGA
Hedera: Dynamic Flow Scheduling for
Data Center Networks
Flowlets
• One of the problems in non-hash-based load
balancing mechanisms is packet reordering
• Several solutions overcome this problem by making
the load balancing decisions on a per-flow basis
instead of a per-packet basis
• A flowlet is a burst of packets belonging to the
same flow, separated from other bursts in the
same flow by a gap large enough that sending
the bursts over separate paths does not cause
reordering problems
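The flowlet definition can be sketched as a small detector that starts a new flowlet whenever the idle gap within a flow exceeds a threshold. The class name, the `gap_threshold` parameter, and the timestamps below are assumptions for illustration; the talk does not specify how the gap is chosen.

```python
from typing import Dict, Hashable, Tuple


class FlowletDetector:
    """Assigns packets of each flow to flowlets based on inter-packet gaps."""

    def __init__(self, gap_threshold: float) -> None:
        # Minimum idle time (seconds) that separates two flowlets (assumed value).
        self.gap_threshold = gap_threshold
        # flow id -> (arrival time of the last packet, current flowlet id)
        self._state: Dict[Hashable, Tuple[float, int]] = {}

    def observe(self, flow_id: Hashable, arrival_time: float) -> int:
        """Record a packet arrival and return the flowlet id it belongs to."""
        previous = self._state.get(flow_id)
        if previous is None:
            flowlet_id = 0  # first packet of the flow starts flowlet 0
        else:
            last_time, flowlet_id = previous
            if arrival_time - last_time > self.gap_threshold:
                flowlet_id += 1  # gap is large enough: start a new flowlet
        self._state[flow_id] = (arrival_time, flowlet_id)
        return flowlet_id


if __name__ == "__main__":
    detector = FlowletDetector(gap_threshold=0.5)
    for t in (0.0, 0.1, 0.2, 1.0, 1.1):
        print(t, detector.observe("flow-A", t))
```

A load balancer can then pick a path once per flowlet id rather than once per flow or once per packet.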
CONGA: Distributed Congestion-Aware
Load Balancing For Datacenters
• Distributed load balancing scheme
• Maintains the congestion state of each path in
the leaf nodes
• Congestion information is carried directly in
the hardware data plane of the switches (in
the VXLAN virtualization overlay headers)
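The idea of keeping per-path congestion state at the leaf can be sketched in software as below. This is a simplified illustration only, not the CONGA hardware design: the `LeafCongestionTable` class, its methods, and the 0..1 congestion scale are all assumptions.

```python
from typing import Dict, List


class LeafCongestionTable:
    """Per-destination congestion estimates for each uplink of one leaf."""

    def __init__(self, uplinks: List[str]) -> None:
        if not uplinks:
            raise ValueError("a leaf needs at least one uplink")
        self.uplinks = list(uplinks)
        # destination leaf -> uplink -> last reported congestion (assumed 0..1)
        self._metric: Dict[str, Dict[str, float]] = {}

    def update(self, dest_leaf: str, uplink: str, congestion: float) -> None:
        """Store congestion feedback received for a path to dest_leaf."""
        if uplink not in self.uplinks:
            raise KeyError(uplink)
        self._metric.setdefault(dest_leaf, {})[uplink] = congestion

    def pick_uplink(self, dest_leaf: str) -> str:
        """Choose the least congested uplink; unknown paths count as idle."""
        metrics = self._metric.get(dest_leaf, {})
        return min(self.uplinks, key=lambda uplink: metrics.get(uplink, 0.0))


if __name__ == "__main__":
    table = LeafCongestionTable(["spine0", "spine1"])
    table.update("leaf1", "spine0", 0.9)
    table.update("leaf1", "spine1", 0.2)
    print(table.pick_uplink("leaf1"))
```

In this sketch the feedback arrives through explicit `update` calls; in the scheme described above it is carried in packet headers instead.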
CONGA: Distributed Congestion-Aware
Load Balancing For Datacenters
• Pros:
– High utilization of the network
– Reacts fast to congestion
– Fairly simple
• Cons:
– Distributed load balancing systems are often slow
– CONGA and many other distributed systems overcome
this problem by using customized networking
hardware
• Makes deployment hard
Presto: Edge-based Load Balancing for
Fast Datacenter Networks
• Load balancing mechanism implemented in
the soft network edge (virtual switches)
• Routes the flowlets through the network using
a round-robin algorithm
• Solves the problems of hash-based algorithms
– Works even in asymmetric topologies
– Elephant flows do not cause problems
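Round-robin routing of flowlets can be sketched as follows. This is a toy illustration of the scheduling policy only, not the Open vSwitch implementation; the function name and the path labels are assumptions.

```python
from itertools import cycle
from typing import List, Sequence, Tuple


def round_robin_paths(flowlets: Sequence[str],
                      paths: Sequence[str]) -> List[Tuple[str, str]]:
    """Assign consecutive flowlets to paths in strict rotation.

    Unlike hashing, the rotation spreads the pieces of a single large
    flow over all paths instead of pinning the whole flow to one path.
    """
    if not paths:
        raise ValueError("at least one path is required")
    path_cycle = cycle(paths)
    return [(flowlet, next(path_cycle)) for flowlet in flowlets]


if __name__ == "__main__":
    pieces = ["elephant#0", "elephant#1", "elephant#2", "elephant#3"]
    for flowlet, path in round_robin_paths(pieces, ["spine0", "spine1"]):
        print(flowlet, "->", path)
```

Since the pieces of one flow now travel over different paths, the receiver has to put them back in order, which is where the Generic Receive Offload changes come in.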
Presto: Edge-based Load Balancing for
Fast Datacenter Networks
• Pros:
– Deals well with network failures and asymmetry
– Fully implemented in software (~500 lines of
code in Open vSwitch and ~900 lines of code in
Linux Generic Receive Offload (GRO))
– Thus easy to deploy
• Cons:
– Too slow compared to HW solutions?
VIRTUAL SERVER LOAD BALANCING
Virtual Server Load Balancing
• Another part of the data center that needs to be
balanced
• Goals and methods differ from network load
balancing
– The goal seems to be more about energy efficiency
than pure speed-ups or scalability
Power Usage of Warehouse Scale Server
Figure: L. A. Barroso, J. Clidaras, and U. Hölzle, “The datacenter as a computer: An
introduction to the design of warehouse-scale machines, second edition”
Server Load Balancing
• Cloud data center servers are often virtualized
• Virtual machine migration allows flexible
movement of servers between physical hardware
• Migration brings overhead
• When and which virtual machine should be
migrated, and where?
• How to develop an algorithm that scales?
• How to cope with heterogeneous allocation
policies and different objectives?
Virtual Server Migration
• Distributed vs. centralized load balancing
– As with network traffic load balancing
• Dynamic vs. static load balancing
• Metrics used to make the migration decisions
– CPU, memory, and network usage, etc.
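A metric-based migration decision could look roughly like the sketch below. It is a hypothetical threshold policy for illustration only: the `HostLoad` type, the 0.8 threshold, and the choice of the least loaded host as target are assumptions, not a method described in the talk or in the cited papers.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class HostLoad:
    """Utilization of one physical server, each metric in the range 0..1."""
    name: str
    cpu: float
    memory: float
    network: float

    def peak(self) -> float:
        """The most heavily used resource decides how loaded the host is."""
        return max(self.cpu, self.memory, self.network)


def is_overloaded(host: HostLoad, upper: float = 0.8) -> bool:
    """A host should shed a VM when any resource exceeds the upper threshold."""
    return host.peak() > upper


def pick_target(hosts: List[HostLoad], source: HostLoad,
                upper: float = 0.8) -> Optional[HostLoad]:
    """Return the least loaded host below the threshold, or None if none fits."""
    candidates = [h for h in hosts if h is not source and h.peak() < upper]
    return min(candidates, key=HostLoad.peak, default=None)


if __name__ == "__main__":
    a = HostLoad("a", cpu=0.9, memory=0.5, network=0.2)
    b = HostLoad("b", cpu=0.3, memory=0.4, network=0.1)
    if is_overloaded(a):
        print("migrate a VM from", a.name, "to", pick_target([a, b], a))
```

An energy-oriented policy would likely do the opposite for lightly loaded hosts and consolidate VMs onto fewer servers; the sketch ignores this as well as the size of the migrated VM and the migration overhead.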
Example Load Balancing System
[A. Beloglazov and R. Buyya, “Energy efficient resource management in virtualized cloud data centers”]
• Decentralized
• Three level system architecture
– Dispatcher
• Distributes requests between global managers
– Global Manager
• Supervises a set of local managers
• Shares the data of its own local managers with other
global managers
– Local manager
• Runs inside each physical server node
• Responsible for continuous monitoring of the resource
utilization
Example Load Balancing System
[A. Beloglazov and R. Buyya, “Energy efficient resource management in virtualized
cloud data centers”]
EXPERIMENT
Experiment: Simulation of ECMP
• Equal-Cost Multi-Path (ECMP) experiment in a
small data center
• Simulated using Performance Simulation
Environment (PSE)
– In-house discrete event simulator
Simulation setup
Figure: http://simula.stanford.edu/~alizade/papers/conga-sigcomm14.pdf
PSE Model
Figure: PSE model of a leaf-spine network with Spine 0, Spine 1, Leaf 0, Leaf 1, two server[32] arrays, and IN/OUT ports
CONCLUSIONS
Conclusions
• Load balancing is important
• Load balancing is challenging
• Experiment is not ready
• Network traffic load balancing is more about
scalability without sacrificing latency or
throughput under unexpected network
conditions
• Server load balancing is more about efficient
utilization of the server nodes, to reduce energy
consumption