Data Center Fabrics
Forwarding Today
• Layer 3 approach:
  – Assign IP addresses to hosts hierarchically based on their directly connected switch
  – Use standard intra-domain routing protocols, e.g., OSPF
  – Large administration overhead
• Layer 2 approach:
  – Forwarding on flat MAC addresses
  – Less administrative overhead
  – Poor scalability
  – Low performance
• Middle ground between layer 2 and layer 3:
  – VLANs
  – Feasible for smaller-scale topologies
  – Resource partitioning problem
Requirements due to Virtualization
• End-host virtualization:
  – Needs to support a large number of addresses and VM migration
  – In a layer 3 fabric, migrating a VM to a different switch changes the VM’s IP address
  – In a layer 2 fabric, migrating a VM requires scaling ARP and performing routing/forwarding on millions of flat MAC addresses
Motivation
• Eliminate Over-subscription
– Solution: Commodity switch hardware
• Virtual Machine Migration
– Solution: Split IP address from location.
• Failure avoidance
– Solution: Fast scalable routing
Architectural Similarities
• Both approaches use indirection
  – The application address doesn’t change when a VM moves; only the location address changes
  – Location address (LA): specifies a location in the network
  – Application address (AA): specifies the address of the VM
• A network of commodity switches
  – Reduces energy consumption
  – Makes it affordable to deploy enough switches to eliminate over-subscription
• A central entity performs name resolution between location addresses and application addresses
  – Directory Service: VL2
  – Fabric Manager: PortLand
  – Both entities are triggered by ARP requests
  – Stores the mapping between LAs and AAs
• Gateway devices
  – Perform encapsulation/decapsulation of external traffic
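A minimal Python sketch of the shared indirection idea: a central resolver maps an application address (AA) to a location address (LA), and only the LA changes when a VM moves. The class stands in for VL2’s Directory Service or PortLand’s Fabric Manager; all names are illustrative, not from either paper.

class NameResolver:
    """Stands in for VL2's Directory Service / PortLand's Fabric Manager."""

    def __init__(self):
        self._aa_to_la = {}          # application address -> location address

    def register(self, aa, la):
        self._aa_to_la[aa] = la      # called when a VM boots or migrates

    def resolve(self, aa):
        # Triggered by an intercepted ARP request for `aa`.
        return self._aa_to_la.get(aa)

resolver = NameResolver()
resolver.register("10.0.1.5", "edge-switch-12")   # the VM keeps its AA ...
resolver.register("10.0.1.5", "edge-switch-47")   # ... only the LA changes on migration
assert resolver.resolve("10.0.1.5") == "edge-switch-47"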
Architecture Differences
• Routing
  – VL2: source-routing based
    • Each packet carries the addresses of all switches to traverse
  – PortLand: topology-based routing
    • Location addresses encode the location within the tree
    • Each switch knows how to decode location addresses
      – Forwarding is based on this intimate knowledge
• Indirection
  – VL2: indirection at L3: IP-in-IP encapsulation
  – PortLand: indirection at L2: IP-to-PMAC
• ARP functionality:
  – PortLand: ARP returns the PMAC for an IP
  – VL2: ARP returns a list of intermediate switches to traverse
PortLand
Fat-Tree
• Inter-connect racks (of servers) using a fat-tree topology
• Fat tree: a special type of Clos network (after C. Clos)
• k-ary fat tree: three-layer topology (edge, aggregation, and core)
  – each pod consists of (k/2)² servers & 2 layers of k/2 k-port switches
  – each edge switch connects to k/2 servers & k/2 aggr. switches
  – each aggr. switch connects to k/2 edge & k/2 core switches
  – (k/2)² core switches: each connects to k pods

[Figure: fat tree with K = 2]
Why Fat-Tree?
• A fat tree has identical bandwidth at any bisection
  – Each layer has the same aggregate bandwidth
• Can be built using cheap devices with uniform capacity
  – Each port supports the same speed as an end host
  – All devices can transmit at line speed if packets are distributed uniformly along the available paths
• Great scalability: a k-port switch supports k³/4 servers

[Figure: fat-tree network with K = 3 supporting 54 hosts]
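A quick Python sketch that just evaluates the counts quoted above (standard k-ary fat-tree formulas; not code from either paper):

def fat_tree_counts(k: int) -> dict:
    assert k % 2 == 0, "k-ary fat tree assumes an even port count"
    return {
        "pods": k,
        "servers_per_pod": (k // 2) ** 2,
        "servers": k ** 3 // 4,            # a k-port switch supports k^3/4 servers
        "edge_switches": k * (k // 2),     # k/2 per pod
        "aggr_switches": k * (k // 2),     # k/2 per pod
        "core_switches": (k // 2) ** 2,
    }

print(fat_tree_counts(4))    # k = 4: 16 servers, 4 core switches
print(fat_tree_counts(48))   # commodity 48-port switches: 27,648 servers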
PortLand
Assumes a fat-tree network topology for the DC
• Introduces “pseudo MAC addresses” (PMACs) to balance the pros and cons of flat vs. topology-dependent addressing
• PMACs are “topology-dependent,” hierarchical addresses
  – But used only as “host locators,” not “host identities”
  – IP addresses are used as “host identities” (for compatibility with apps)
• Pros: small switch state & seamless VM migration
• Pros: “eliminates” flooding in both the data & control planes
• But requires an IP-to-PMAC mapping and name resolution
  – a location directory service
• And a location discovery protocol & fabric manager
  – for support of “plug-&-play”
PMAC Addressing Scheme
• PMAC (48 bits): pod.position.port.vmid
  – pod: 16 bits; position: 8 bits; port: 8 bits; vmid: 16 bits
• Assigned only to servers (end hosts) – by switches
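A minimal, hedged sketch of the 48-bit pod.position.port.vmid layout above; the exact byte packing is an assumption for illustration, not PortLand’s wire format:

def encode_pmac(pod: int, position: int, port: int, vmid: int) -> str:
    assert pod < 2**16 and position < 2**8 and port < 2**8 and vmid < 2**16
    value = (pod << 32) | (position << 24) | (port << 16) | vmid
    return ":".join(f"{b:02x}" for b in value.to_bytes(6, "big"))

def decode_pmac(pmac: str) -> tuple:
    value = int(pmac.replace(":", ""), 16)
    return (value >> 32, (value >> 24) & 0xFF, (value >> 16) & 0xFF, value & 0xFFFF)

pmac = encode_pmac(pod=2, position=1, port=0, vmid=7)
print(pmac)                           # 00:02:01:00:00:07
assert decode_pmac(pmac) == (2, 1, 0, 7)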
Location Discovery Protocol
• Location Discovery Messages (LDMs) are exchanged between neighboring switches
• Switches self-discover their location on boot-up:
  – Tree level (edge, aggr., core): auto-discovery via neighbor connectivity
  – Position #: aggregation switches help edge switches decide
  – Pod #: request (by the position-0 switch only) to the fabric manager
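A rough Python sketch of the tree-level inference idea; the rules below are a simplification of the neighbor-connectivity heuristic described above, not the exact LDP:

def infer_level(ports: dict) -> str:
    """ports maps port-id -> what was heard on it: 'host' (no LDM), 'edge', 'aggr', or None."""
    heard = {v for v in ports.values() if v}
    if "host" in heard:
        return "edge"          # only edge switches face hosts directly
    if "edge" in heard:
        return "aggregation"   # aggregation switches hear LDMs from edge switches
    return "core"              # otherwise: hears only aggregation switches

print(infer_level({0: "host", 1: "host", 2: "aggr", 3: "aggr"}))   # edge
print(infer_level({0: "edge", 1: "edge", 2: None, 3: None}))       # aggregation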
PortLand: Name Resolution
• Edge switches listen to end hosts and discover new source MACs
• They install <IP, PMAC> mappings and inform the fabric manager
PortLand: Name Resolution …
• Edge switches intercept ARP messages from end hosts
• They send a request to the fabric manager, which replies with the PMAC
PortLand: Fabric Manager
• The fabric manager is a logically centralized, multi-homed server
• It maintains the topology and <IP, PMAC> mappings in “soft state”
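A small Python sketch tying the three slides above together; class and method names are invented for illustration, and the real fabric manager is a logically centralized controller rather than an in-process object:

class FabricManager:
    def __init__(self):
        self.ip_to_pmac = {}                       # soft state, rebuilt from edge-switch reports

    def register(self, ip, pmac):                  # edge switch saw a new source MAC
        self.ip_to_pmac[ip] = pmac

    def resolve_arp(self, ip):                     # edge switch forwarded an ARP request
        return self.ip_to_pmac.get(ip)             # None -> fall back (e.g. broadcast)

class EdgeSwitch:
    def __init__(self, fm, pod, position):
        self.fm, self.pod, self.position = fm, pod, position

    def on_new_host(self, ip, port, vmid=0):
        # Build the PMAC (pod.position.port.vmid), install <IP, PMAC>, inform the FM.
        pmac = (f"{self.pod >> 8:02x}:{self.pod & 0xFF:02x}:{self.position:02x}:"
                f"{port:02x}:{vmid >> 8:02x}:{vmid & 0xFF:02x}")
        self.fm.register(ip, pmac)
        return pmac

    def on_arp_request(self, target_ip):
        return self.fm.resolve_arp(target_ip)      # reply with the PMAC, not the real MAC

fm = FabricManager()
sw = EdgeSwitch(fm, pod=2, position=1)
sw.on_new_host("10.2.1.9", port=3)
print(sw.on_arp_request("10.2.1.9"))               # 00:02:01:03:00:00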
VL2
Design: Clos Network
• Same capacity at each layer
  – No oversubscription
• Many paths available
  – Low sensitivity to failures
Design: Separate Names from Locations
• Packet forwarding
– VL2 agent (at host) traps packets and encapsulates them
• Address resolution
– ARP requests converted to unicast to directory system
– Cached for performance
• Access control (security policy) via the directory system
[Figure: the VL2 agent sits in the server’s kernel below the application; it traps packets, issues LookUp(AA) to the directory system, and receives EncapInfo(AA).]
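A hedged Python sketch of the agent’s role as described above; the directory API and the encapsulation record shown are illustrative stand-ins, not VL2’s actual interfaces or wire format:

class Directory:
    def __init__(self, mappings): self.mappings = mappings
    def lookup(self, aa): return self.mappings[aa]          # AA -> LA

class VL2Agent:
    def __init__(self, directory):
        self.directory = directory
        self.cache = {}                          # AA -> LA, filled on first lookup

    def resolve(self, aa):
        if aa not in self.cache:                 # "ARP" becomes a unicast directory lookup
            self.cache[aa] = self.directory.lookup(aa)
        return self.cache[aa]

    def send(self, payload, aa):
        la = self.resolve(aa)                    # LA of the destination's ToR switch
        # Conceptually IP-in-IP: outer header carries the LA, inner header keeps the AA.
        return {"outer_dst": la, "inner_dst": aa, "payload": payload}

agent = VL2Agent(Directory({"10.1.1.7": "tor-0042"}))
print(agent.send(b"hello", "10.1.1.7"))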
Design: Valiant Load Balancing
• Each flow goes through a different random path
• Hot-spot free for the tested traffic matrices
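A sketch of per-flow path selection: hashing the flow’s 5-tuple keeps every packet of a flow on the same (pseudo-random) intermediate switch, preserving packet order while spreading different flows across all paths. The switch names are made up:

import hashlib

INTERMEDIATE_SWITCHES = [f"int-{i}" for i in range(16)]

def pick_intermediate(src_ip, dst_ip, src_port, dst_port, proto="tcp"):
    flow = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    digest = int(hashlib.sha1(flow).hexdigest(), 16)
    return INTERMEDIATE_SWITCHES[digest % len(INTERMEDIATE_SWITCHES)]

# Every packet of this flow bounces off the same intermediate switch:
print(pick_intermediate("10.0.0.1", "10.0.3.9", 45000, 80))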
Design: VL2 Directory System
• Built using servers from the data center
• Two-tiered directory system architecture:
  – Tier 1: read-optimized cache servers (directory servers)
  – Tier 2: write-optimized mapping servers (RSM)
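A sketch of the two-tier split: many read-optimized directory servers answer lookups from a cache, while a small write-optimized replicated state machine (RSM) cluster is the authority that absorbs updates. Class and method names are illustrative:

class RSM:                                       # tier 2: replicated state machine
    def __init__(self): self.mappings = {}
    def update(self, aa, la): self.mappings[aa] = la      # e.g. on VM migration
    def read(self, aa): return self.mappings.get(aa)

class DirectoryServer:                           # tier 1: read-optimized cache
    def __init__(self, rsm):
        self.rsm, self.cache = rsm, {}
    def lookup(self, aa):
        if aa not in self.cache:
            self.cache[aa] = self.rsm.read(aa)            # cache miss -> ask the RSM
        return self.cache[aa]

rsm = RSM()
rsm.update("10.1.1.7", "tor-0042")
ds = DirectoryServer(rsm)
print(ds.lookup("10.1.1.7"))                     # fetched from the RSM once, then cached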
Benefits + Drawbacks
Benefits
• VM migration
  – No need to worry about L2 broadcast
  – Location and address are decoupled
• Revisiting fault tolerance
  – Relaxed placement requirements

FT: Service allocation and worst-case survival
[Figure: network core, switches, containers, racks, power distribution]
• Worst-case survival:
  – red service: 0% -- same container, power
  – green service: 67% -- different containers, power

Loop-free Forwarding and Fault-Tolerant Routing
• Switches build forwarding tables based on their position
  – edge, aggregation, and core switches
• Use strict “up-down semantics” to ensure loop-free forwarding
  – Load balancing: use any ECMP path via flow hashing to ensure packet ordering
• Fault-tolerant routing:
  – Mostly concerned with detecting failures
  – The fabric manager maintains a logical fault matrix with per-link connectivity info and informs affected switches
  – Affected switches re-compute their forwarding tables
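A rough Python sketch of the up-down idea applied to the PMAC fields (pod.position.port.vmid): a packet travels up only until some switch owns a prefix of its destination address, then strictly down, so no forwarding loop can form. This is an approximation for illustration, not PortLand’s exact forwarding-table format:

def next_hop(level, my_pod, my_position, dst):
    """dst is a decoded PMAC tuple (pod, position, port, vmid)."""
    pod, position, port, _ = dst
    if level == "core":
        return ("down", f"pod {pod}")                       # core always forwards down
    if level == "aggregation":
        if pod == my_pod:
            return ("down", f"edge position {position}")    # prefix match: same pod
        return ("up", "any core switch (ECMP)")
    if level == "edge":
        if (pod, position) == (my_pod, my_position):
            return ("down", f"host port {port}")
        return ("up", "any aggregation switch (ECMP)")

print(next_hop("edge", 2, 1, (2, 1, 3, 0)))        # ('down', 'host port 3')
print(next_hop("edge", 2, 1, (5, 0, 3, 0)))        # ('up', 'any aggregation switch (ECMP)')
print(next_hop("core", None, None, (5, 0, 3, 0)))  # ('down', 'pod 5')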
Drawbacks
• Higher failure rates
  – Commodity switches fail more frequently
• No straightforward way to expand
  – Expansion comes in large increments (values of k)
• Lookup servers
  – Additional infrastructure servers
  – Higher upfront startup latency
• Need special gateway servers