
CS244 Lecture 15
Data Centers
VL2: A Scalable and Flexible Data Center Network
Albert Greenberg, et al.
Lisa Yan
(adapted from Jen Rexford’s COS-561
Slides at Princeton University)
A timeline review
• 2007: SDN: Ethane
• 2008: OpenFlow
• 2009: VL2 paper
• 2010: SDN: Onix
• 2012: NFV: NVP (paper in 2014)
• 2013: B4
“Is this the first paper we've read this quarter produced by a
corporation that isn't Google?” – Nick Yannacone
“The VMware paper benefits from the more sophisticated ideas
of SDN…If [VL2] had been published earlier I suspect we would
be reading quite a different paper.” – Wyatt Daviau
2
DC Origins:
Network Virtualization
3
Datacenters + Cloud Computing
• Expansive variety of services and workloads
• Multiple tenants
• Flexible service management (desired)
– Resiliency: isolate failures of servers and storage
– Agility: assign any server to any service
• Higher server utilization and lower costs
4
Data Center Costs
Amortized Cost*   Component              Sub-Components
~45%              Servers                CPU, memory, disk
~25%              Power infrastructure   UPS, cooling, power distribution
~15%              Power draw             Electrical utility costs
~15%              Network                Switches, links, transit
• Total cost varies
– Upwards of $1/4 billion for a mega data center
– Server costs dominate
– Network costs significant
The Cost of a Cloud: Research Problems in Data Center Networks.
Sigcomm CCR 2009. Greenberg, Hamilton, Maltz, Patel.
*3 yr amortization for servers, 15 yr for infrastructure; 5% cost of money
Maltz, WIOV ‘11
5
Datacenter Drawbacks
• Limited server-to-server capacity
• Fragmentation of resources
• Poor reliability and utilization
6
Performance Isolation
Maltz, WIOV ‘11
7
VL2 Paper (2009, 2011)
• Measurements
• Topology decision
• VL2
http://research.microsoft.com/en-US/news/features/datacenternetworking-081909.aspx
8
What you said
“This paper seems to complement well the Google B4 article –
it really feels like the L2 counterpart to B4, which would be an
“L3” paper.” – Alexander Schaub
"I think VL2 has one certain strength: it does not require
additional setup or infrastructure. Updating the switch and
client network interface is the only requirement to install this
system on the current data center.” – Sunkyu Lim
“I appreciated that the paper went through the trouble of
presenting typical data-center flow characteristics… I felt that
this type of in-depth analysis is often missing from the papers
that we have read.” – Tushar Paul
9
Microsoft Azure
“Network virtualization at scale, enabling
SDN and NFV”
[Figure: VL2's lineage into Azure – "Azure Clos Fabrics with 40G NICs": tiers of Rack (T0), Row Spine (T1), Data Center Spine (T2), and Regional Spine (T3) switches above the servers; a scale-out, active-active L3 fabric replacing the scale-up, active-passive L2 design with LB/FW appliances. Outcome of >10 years of history, with major revisions every six months.]
http://conferences.sigcomm.org/sigcomm/2015/pdf/papers/keynote.pdf
10
Flow Distribution
99% of flows are mice (<100 MB)
>90% of bytes come from elephants (100 MB–1 GB)
Additional info:
DCTCP paper
11
Traffic Matrices and Failure
• Highly variable traffic patterns with no short-term predictability
• Small failures common but downtimes
sometimes significant
– 50% failures involve <4 devices
– 0.09% failures last >10 days
• Network misconfigurations, firmware bugs,
faulty components
12
Traditional topology
Issues:
• Oversubscription
• Single point of failure – STP
13
Problem with STP
14
Clos topology
• Full mesh between adjacent tiers; non-blocking
FatTree-based DC Architecture
• Inter-connect racks (of servers) using a fat-tree topology
• Commodity switches
K-ary fat tree: three-layer topology (edge, aggregation, and core)
– each pod consists of (k/2)² servers & 2 layers of k/2 k-port switches
– each edge switch connects to k/2 servers & k/2 aggr. switches
– each aggr. switch connects to k/2 edge & k/2 core switches
– (k/2)² core switches: each connects to k pods
– Scale out, not scale up (counts tallied in the sketch below)
Fat-tree with K=4
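A minimal sketch (illustrative, not from any paper; `fat_tree_counts` is a made-up helper) that tallies the element counts implied by the k-ary fat-tree rules above:

```python
# Sketch: element counts of a k-ary fat tree (k even), per the rules above.
def fat_tree_counts(k: int) -> dict:
    assert k % 2 == 0, "k must be even"
    pods = k
    servers_per_pod = (k // 2) ** 2
    return {
        "pods": pods,
        "edge_switches": pods * (k // 2),
        "aggr_switches": pods * (k // 2),
        "core_switches": (k // 2) ** 2,       # each core switch reaches all k pods
        "servers": pods * servers_per_pod,    # = k**3 / 4
    }

print(fat_tree_counts(4))  # k=4: 4 pods, 8 edge, 8 aggr, 4 core, 16 servers
```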
15
More topology
• Hypercube: Bcube
– Source routing
• Random graph: Jellyfish
– ECMP + MPTCP
Additional info:
BCube paper
Jellyfish paper
16
VL2
• Why traffic analysis for VL2?
• What are Layer 2 semantics?
• Layer 2.5 Shim at Host
• Why double encapsulation?
• Why hash?
17
VL2
[Figure: a conventional data center tree of core routers (CR), aggregation routers (AR), and switches (S) above the servers, annotated with VL2's three goals: 1. L2 semantics, 2. Uniform high capacity, 3. Performance isolation.]
18
VL2 Goals and Solutions
Objective 1 – Layer-2 semantics
  Approach: employ flat addressing
  Solution: name-location separation & resolution service
Objective 2 – Uniform high capacity between servers
  Approach: guarantee bandwidth for hose-model traffic
  Solution: flow-based random traffic indirection (VLB)
Objective 3 – Performance isolation
  Approach: enforce hose model using existing mechanisms only
  Solution: TCP
“Hose”: each node has ingress/egress bandwidth constraints
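A minimal sketch (hypothetical helper and numbers) of what the hose model constrains: a traffic matrix is admissible as long as no server exceeds its ingress or egress cap, and VL2 aims to carry any such matrix.

```python
# Sketch: a traffic matrix T (Gb/s) fits the hose model if each server's total
# sent stays under its egress cap and total received under its ingress cap.
def fits_hose_model(T, egress_cap, ingress_cap):
    n = len(T)
    sends_ok = all(sum(T[i]) <= egress_cap[i] for i in range(n))
    recvs_ok = all(sum(T[i][j] for i in range(n)) <= ingress_cap[j] for j in range(n))
    return sends_ok and recvs_ok

# Three servers, each with a hypothetical 10 Gb/s hose in both directions.
T = [[0, 4, 5],
     [6, 0, 3],
     [2, 3, 0]]
print(fits_hose_model(T, egress_cap=[10] * 3, ingress_cap=[10] * 3))  # True
```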
19
VL2 Agent in Action
[Figure: the VL2 agent's double encapsulation – the original packet (src AA, dst AA, payload) is wrapped with the LA of the destination ToR, then with the anycast LA of an Intermediate switch chosen per flow via the hash H(ft); VLB selects the Intermediate address and ECMP selects among the switches sharing it. Addresses shown in the figure include 10.1.1.1 (Int), 10.0.0.4, 10.0.0.6, and 20.0.0.1.]
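A minimal sketch of the header nesting the agent produces (dict "headers", the function name, and the concrete addresses are illustrative, not the paper's implementation):

```python
# Sketch: nesting produced by the VL2 agent. Outer dst = Intermediate switch's
# anycast LA (picked per flow), inner dst = destination ToR's LA, and the
# original packet keeps its application addresses (AAs).
def vl2_encapsulate(pkt_aa, dst_tor_la, intermediate_la, flow_hash):
    inner = {"ip_dst": dst_tor_la, "payload": pkt_aa}
    outer = {"ip_dst": intermediate_la,   # chosen by VLB
             "ip_src": flow_hash,         # H(ft): 5-tuple hash carried in the src field
             "payload": inner}
    return outer

pkt = {"src_aa": "20.0.0.55", "dst_aa": "20.0.0.56", "data": b"..."}  # example AAs
outer = vl2_encapsulate(pkt, dst_tor_la="10.0.0.6",
                        intermediate_la="10.1.1.1", flow_hash=0x3A7F)
print(outer["ip_dst"], outer["payload"]["ip_dst"])  # 10.1.1.1 10.0.0.6
```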
20
VLB and ECMP
• VLB: pick IP address of Intermediate Switch
– Per flow, performed at VL2 agent
– Anycast addresses upon lookup
• ECMP: pick a switch that has the VLB-selected
IP address
– Any active Int Switch w/anycast address
• 5-tuple hash saved in src IP field to avoid
issues with multiple IP headers
anycast: many nodes have one IP address
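A minimal sketch of the two selection steps above (helper names are assumptions; in the real fabric the switches' ECMP hash, not random.choice, spreads flows across switches sharing an anycast address):

```python
# Sketch: VLB picks an anycast Intermediate address per flow at the VL2 agent;
# ECMP then forwards to any active switch advertising that address.
import hashlib
import random

def five_tuple_hash(ft) -> int:
    return int(hashlib.md5(repr(ft).encode()).hexdigest(), 16)

def vlb_pick_anycast(ft, anycast_addrs):
    # Per flow: the same five-tuple always maps to the same anycast address.
    return anycast_addrs[five_tuple_hash(ft) % len(anycast_addrs)]

def ecmp_pick_switch(anycast_addr, switches_by_addr):
    # Stand-in for the switches' ECMP hash: any live Intermediate switch
    # sharing the anycast address is a valid next hop.
    return random.choice(switches_by_addr[anycast_addr])

ft = ("20.0.0.55", "20.0.0.56", 6, 44321, 80)   # example five-tuple
addr = vlb_pick_anycast(ft, ["10.1.1.1"])
print(addr, ecmp_pick_switch(addr, {"10.1.1.1": ["Int-1", "Int-2", "Int-3"]}))
```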
21
What you said – Congestion
"The weakest part of the paper, I think, was its
reliance on TCP.” – Anna Saplitski
“[This paper] hand-waves away issues related to
workloads that the authors have decided are
non-standard… Shouldn't a well-designed
system be able to handle workloads that you
don't expect?” – Jacqueline Speiser
22
VL2 questions
• Elephant flows?
• Broadcast?
• Internet communication?
23
What you said – Directory System
"Their directory is still a bit of a mystery to me; it
seems like the entire architecture depends on it
working quite well and the paper unfortunately
doesn't really describe the specifics of how
inconsistencies are avoided within that system.”
– Bryce Taylor
24
VL2 Directory System
• End-system based address resolution and
centralized directory system
• Drawbacks?
25
VL2 Directory System
• Read-optimized Directory Servers
• Write-optimized Replicated State Machines
• Reactive cache updates
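A minimal sketch (interfaces assumed, not the paper's) of the agent-side cache that "reactive cache updates" refers to: lookups go to a read-optimized directory server on a miss, and a stale mapping is corrected when the directory system notifies the agent.

```python
# Sketch: agent-side AA -> LA cache with reactive updates.
class VL2AgentCache:
    def __init__(self, directory_server):
        self.cache = {}                  # AA -> destination ToR LA
        self.server = directory_server   # read-optimized directory server (assumed API)

    def resolve(self, aa):
        if aa not in self.cache:
            self.cache[aa] = self.server.lookup(aa)   # miss: ask the directory
        return self.cache[aa]

    def on_stale_mapping(self, aa, new_la):
        # Reactive update: the directory system corrects a stale entry after a
        # misdirected packet reveals that the cached LA is out of date.
        self.cache[aa] = new_la
```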
26
What you said
“What I think lacked in this paper was their testing. ….
They also ran a fairness test of only 2 applications, when
in reality data centers can have hundreds…However, the
testing doesn’t invalidate the ideas of the paper, which I
feel are very good.” – Evan Cheshire
“I was less moved by the 94% throughput figure that was
advertised in the abstract. First of all, they had all servers
send a uniform amount of data to all other servers, which
seems unrealistic for any real-world data shuffling
scenario.” – Blane Wilson
27
VL2 Experiments
• Uniform high capacity
– Hose model
– 94% efficiency (encap/TCP overhead)
• VLB fairness
• Performance isolation – TCP (?)
28
VLB summary
Strengths
• General topology
• No network hardware
change
• Standard routing
functions
• Encapsulation is “safe”
• Conclusions based on
real experiments
Weaknesses
• Encapsulation overhead
• Handling of off-network
traffic
• Stale mappings
29
Related work
• Agility
– SIGCOMM 2009: Portland
– Today: Network Virtualization (NVP)
• Flow Scheduling
– pFabric, DCTCP
– Hedera
• Better multipathing: MPTCP
• Topologies: FatTree, Bcube, Jellyfish
30
Next lecture: Wireless
• 1000+ citations
31
Next lecture: Wireless
• WiFi measurements
– Unlicensed spectrum
– CSMA/CA: Carrier Sense Multiple Access with Collision Avoidance
• IEEE 802.11b
– Early 2000s
– 11 Mbps, 2.4GHz band
32
Next lecture: Coding
• Binary Phase Shift Keying: encode a binary message by changing the phase of the signal
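A minimal sketch of the idea (NumPy, illustrative parameters): each bit selects a carrier phase of 0 or π, i.e. multiplies the carrier by +1 or -1.

```python
# Sketch: BPSK maps bit 1 -> phase 0 (+carrier) and bit 0 -> phase pi (-carrier).
import numpy as np

def bpsk_modulate(bits, samples_per_bit=8):
    t = np.arange(samples_per_bit) / samples_per_bit
    carrier = np.cos(2 * np.pi * t)                      # one carrier cycle per bit
    symbols = np.where(np.asarray(bits) == 1, 1.0, -1.0)
    return np.concatenate([s * carrier for s in symbols])

waveform = bpsk_modulate([1, 0, 1, 1])
print(waveform.shape)  # (32,): 4 bits x 8 samples each
```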
33
Next lecture: Interference
• SNR (S/N): Signal to Noise Ratio
• Fading: destructive interference
• Multipath fading
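A minimal sketch (illustrative numbers) of the two quantities: SNR expressed in dB, and a two-path example where a half-period echo nearly cancels the direct signal (multipath fading).

```python
# Sketch: SNR in dB, plus destructive two-path (multipath) interference.
import numpy as np

def snr_db(signal_power, noise_power):
    return 10 * np.log10(signal_power / noise_power)

print(snr_db(1.0, 0.01))  # 20.0 dB

t = np.linspace(0, 4, 1000)                    # time in carrier periods (normalized)
direct = np.cos(2 * np.pi * t)
echo = 0.9 * np.cos(2 * np.pi * (t - 0.5))     # half-period delay -> phase shift of pi
print(np.max(np.abs(direct + echo)))           # ~0.1: the echo mostly cancels the signal
```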
34