Faithful Reproduction of Network Experiments
Dimosthenis Pediaditakis, Charalampos Rotsos, Andrew W. Moore
[email protected]
Computer Laboratory, Systems Research Group
University of Cambridge, UK
http://selena-project.github.io
Research on networked systems: Present
[Figure: a present-day network, with a 40+ Gbps WAN link and a mix of 100 Mbps, 1 GbE and 10 GbE links]
Performance of widely available tools
• A simple experiment
  – 2-pod Fat-Tree
  – 1 GbE links
  – 10K 5 MB TCP flows
• Simulation (ns-3)
  – Flat model
  – 2.75x lower throughput
• Emulation (MiniNet)
  – 4.5x lower throughput
  – Skewed CDF
Why not simulation
Example: ns-2 / ns-3
[Figure: simulation placed on the fidelity / scalability / reproducibility triangle]
• Fidelity
  – Modelling abstractions
  – Real stacks or applications?
• Scalability
  – Network size
  – Network speed (10 Gbps and beyond)
  – Poor execution-time scalability
• Reproducibility
  – Replication of configuration
  – Repeatability of results (same RNG seeds)
Why not real-time emulation
Example: MiniNet
[Figure: emulation placed on the fidelity / scalability / reproducibility triangle]
• Fidelity
  – Real stacks and applications
  – Heterogeneity support
  – SDN devices
• Scalability
  – CPU bottleneck
    • Network speed
    • Network size
• Reproducibility
  – Replication of configuration
  – Repeatability of results
In an ideal world...
Our vision: what if we could achieve
[Figure: the vision at the centre of the fidelity / scalability / reproducibility triangle]
• Fidelity
  – Real stacks and applications
  – Heterogeneity support
  – Realistic SDN switch model
• Scalability
  – 10 GbE, 100 Gbps, ...
  – 100s of nodes
• Reproducibility
  – Replication of configuration
  – Repeatability of results
• High-level experiment description, automation
  – Python API (MiniNet style)
• Real OS components and applications
  – Xen-based emulation
  – Fine-grained resource control
  – Heterogeneous deployments
• Hardware resource scaling
  – Time dilation (revisiting DieCast), unmodified guests
  – Users can trade execution speed for fidelity and scalability
• Network control-plane fidelity
  – Support for unmodified SDN platforms
  – Empirical OpenFlow switch model (extensible)
Deploying an experiment with SELENA
[Figure: a Python experiment description passes through the Selena compiler and is deployed as Xen guests wired together via Dom-0 bridges and OVS]
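The deck does not show the experiment description itself; below is a rough sketch of what a MiniNet-style SELENA description could look like. Every name here (`selena`, `Experiment`, `add_switch`, `add_host`, `add_link`, `deploy`) is an assumption for illustration, not the real API.

```python
# Hypothetical sketch of a MiniNet-style SELENA experiment description.
# All names are assumed for illustration; the real API may differ.
from selena import Experiment

exp = Experiment(tdf=10)  # pick the time-dilation factor up front

# A small topology: one OVS-backed switch, two Linux guests, 1 GbE links.
s1 = exp.add_switch("s1", model="ovs")
h1 = exp.add_host("h1", os="linux", vcpus=1, mem_mb=256)
h2 = exp.add_host("h2", os="linux", vcpus=1, mem_mb=256)
exp.add_link(h1, s1, rate="1Gbps")
exp.add_link(h2, s1, rate="1Gbps")

exp.deploy()  # compile to Xen guests/bridges and launch
```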
Scaling resources via Time Dilation
• Create a scenario, choose a TDF
• Linear and symmetric scaling of the resources "perceived" by the guest OS
  – Network I/O, CPU, disk I/O
• Control independently the guest's "perception" of available resources
  – CPU → Xen Credit2
  – Network → Xen VIF QoS, NetEm/DummyNet
  – Disk I/O → within guests, via cgroups/rctl
The concept of Time-Dilation
• Real time: 1 tick lasts $1/C_{Hz}$ seconds. Transferring 10 Mbits over 6 ticks gives
  $\mathit{rate}_{REAL} = 10 / (6 \cdot 1/C_{Hz}) = (10 \cdot C_{Hz})/6$ Mbps
• 2x dilated time (TDF = 2): either deliver ticks at half rate, or report $2 \cdot C_{Hz}$ as the clock frequency. The guest now sees the same 10 Mbits arrive within 3 virtual ticks:
  $\mathit{rate}_{VIRT} = 10 / (3 \cdot 1/C_{Hz}) = (10 \cdot C_{Hz})/3$ Mbps $= 2 \cdot \mathit{rate}_{REAL}$
[Cartoon: the hypervisor telling the guest "I command you to slow down"]
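The general relationship behind this example (a restatement of the arithmetic above, not a new claim from the deck): for $D$ bits transferred over real time $t_{REAL}$,

```latex
% Dilating time by TDF shrinks the elapsed time the guest observes,
% which scales every rate it perceives up by the same factor.
\[
  t_{VIRT} = \frac{t_{REAL}}{\mathit{TDF}}
  \quad\Longrightarrow\quad
  \mathit{rate}_{VIRT} = \frac{D}{t_{VIRT}}
                       = \mathit{TDF} \cdot \frac{D}{t_{REAL}}
                       = \mathit{TDF} \cdot \mathit{rate}_{REAL}.
\]
```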
PV-guest time dilation
• Wall-clock time
  – Time since epoch
  – System time (since boot)
  – Independent clock mode (rdtsc)
• Timer interrupts
  – Scheduled timers
  – Periodic timers
  – Loop delays
[Figure: the Xen hypervisor dilates both of the guest's time sources: TSC values (rdtsc, the Xen clock source) and timer events (VIRQ_TIMER, HYPERVISOR_set_timer_op for setting the next event)]
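To make the mechanism concrete, here is a minimal sketch of the dilation arithmetic, assuming a trap-and-rescale approach. This is illustrative Python, not Xen's actual implementation.

```python
# Minimal sketch of time-dilation arithmetic (illustrative only, NOT
# Xen code). A dilated guest must see its TSC advance TDF times more
# slowly, and its one-shot timers must fire TDF times later in real time.

TDF = 2  # time-dilation factor

def dilated_tsc(host_tsc: int, boot_tsc: int) -> int:
    """TSC value exposed to the guest, e.g. when emulating rdtsc."""
    return boot_tsc + (host_tsc - boot_tsc) // TDF

def real_deadline(guest_deadline_ns: int, guest_now_ns: int,
                  host_now_ns: int) -> int:
    """Translate a guest timer deadline (as set via a one-shot timer
    hypercall) into the real time at which the hypervisor must inject
    the virtual timer interrupt (VIRQ_TIMER)."""
    return host_now_ns + (guest_deadline_ns - guest_now_ns) * TDF
```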
OpenFlow Toolstack X-Ray
[Figure: control apps on a network OS, a control channel to the switch's OF agent, the ASIC driver and the ASIC]
• Control applications: application complexity
• Network OS: available capacity, synchronicity
• OF agent: scarce co-processor resources; switch-OS scheduling is non-trivial
• ASIC driver → policy configuration: latency and semantics
• ASIC: limited PCI bus capacity
How critical is SDN control-plane performance for data-plane performance?
Building an OpenFlow switch model
• Measure an off-the-shelf switch device
  – Measure message-processing performance (OFLOPS)
  – Extract latency and loss characteristics of:
    • flow-table management
    • the packet interception / injection mechanism
    • statistics-counter extraction
• Configurable switch model
  – Replicate the measured latency and loss characteristics
  – Implementation: a Mirage-OS based switch
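As an illustration of what such an empirical model can look like, here is a minimal sketch that resamples measured flow_mod latencies and losses. The numbers are placeholders, not OFLOPS data, and the class is an assumption, not the Mirage implementation.

```python
# Sketch of an empirical switch-latency model: replay flow_mod handling
# latency/loss drawn from measured samples. Sample values below are
# placeholders, not actual OFLOPS measurements of any device.
import random

class EmpiricalSwitchModel:
    def __init__(self, flow_mod_latencies_ms, loss_rate):
        self.samples = flow_mod_latencies_ms  # measured, e.g. with OFLOPS
        self.loss_rate = loss_rate

    def flow_mod_delay(self):
        """Latency (ms) to apply one flow-table update, or None if the
        message is dropped (e.g. agent queue overflow)."""
        if random.random() < self.loss_rate:
            return None
        return random.choice(self.samples)  # resample the empirical CDF

# Placeholder numbers purely for illustration:
model = EmpiricalSwitchModel([1.2, 1.4, 2.1, 8.7, 9.3], loss_rate=0.01)
print(model.flow_mod_delay())
```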
Evaluation roadmap
Methodology:
1. Run experiment on real hardware
2. Reproduce results in: MiniNet, ns-3, SELENA
3. Compare against "real"
Dimensions of fidelity:
1. Throughput
2. Latency
3. Control plane
4. Application performance
5. Scalability
Latency fidelity
Setup: 18 nodes, 1 Gbps links, 10000 flows
[Figure: latency CDFs for ns-3, MiniNet and SELENA against real hardware]

Platform        | Execution time
----------------|---------------
MiniNet         | 120 s
ns-3            | 172 m 51 s
SELENA (TDF=20) | 40 m

Accuracy: MiniNet 32%, ns-3 44%; SELENA 71% with 5x dilation, 98.7% with 20x dilation.
SDN Control-plane Fidelity
[Figure: completion times of 1 Mb TCP flows, exponential arrivals with λ = 0.02]
• Stepping behaviour: TCP SYN and SYN-ACK loss
• The MiniNet switch model does not capture this throttling effect: it cannot reproduce the transient switch-OS scheduling effects of the real switch.
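For concreteness, a workload with these arrival statistics could be generated as follows (a sketch; the actual traffic generator used in the experiment is not shown in the deck):

```python
# Sketch: schedule 1 Mb TCP flows with exponentially distributed
# inter-arrival times (rate lambda = 0.02 arrivals per time unit).
import random

LAMBDA = 0.02

def arrival_times(n_flows, lam=LAMBDA, seed=42):
    random.seed(seed)  # fixed seed for repeatable experiments
    t = 0.0
    times = []
    for _ in range(n_flows):
        t += random.expovariate(lam)  # mean gap 1/lambda = 50 time units
        times.append(t)
    return times

print(arrival_times(5))
```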
Scalability analysis
[Figure: guests in a star topology attached through Dom-0 bridges and OVS; throughput and CPU utilisation as the guest count grows]
• Star topology, 1 GbE links, multi-Gbit sink link
• Dom-0 is allocated 4 cores
  – Why does it top out at 250% CPU utilisation?
• Near-linear scalability
Application fidelity (LAMP)
• 2-pod Fat-Tree
  – 1 GbE links
  – 10x switches
  – 4x clients
  – 4x web servers: Apache2, PHP, MySQL, Redis, WordPress
SELENA usage guidelines
• SELENA is primarily a NETWORK emulation framework
  – Perfect match: network-bound applications
  – Allows experimentation with:
    • the relative performance of CPU, disk and network
    • real applications / SDN controllers / network stacks
  – Improved fidelity and scalability
    • Outperforms common simulation / emulation tools
• Time dilation is exciting but not a panacea
  – Hardware-specific performance characteristics remain, e.g.:
    • disks, cache size, per-core lock contention, Intel DDIO
• Rule of thumb for choosing the TDF
  – Keep Dom-0 and Dom-U utilisation low
  – Observation time-scales matter
SELENA is free and open.
Give it a try: http://selena-project.github.io
Backup slides
Throughput fidelity

Platform        | Execution time
----------------|---------------
MiniNet         | 120 s
ns-3            | 175 m 24 s
SELENA (TDF=10) | 20 m

• MiniNet and ns-3 reach only 2.7 Gbps and 5.3 Gbps respectively
• SELENA with 10x dilation: 99.5% accuracy, executing 9x faster than ns-3
Scalability
• Multi-machine emulation
  – Synchronization among hosts
  – Efficient placement
• Optimize guest-to-guest Xen communications
• Auto-tuning of the TDF
A layered SDN controller hierarchy
[Figure: the layered control-plane architecture: 1st- and 2nd-layer controllers over a 4-pod Fat-Tree topology, 1 GbE links, 32 Gbps aggregate traffic]
• More layers
  – Control decisions are taken higher in the hierarchy
  – Flow-setup latency increases
    • Network, request pipelining, CPU load
  – Resilience
Question: how does a layered controller hierarchy affect performance?
Limitations of ns-3
• Layer-2 models
  – CSMA link:
    • Half duplex → lower throughput
    • The only wired model supporting Ethernet
  – Point-to-point link model:
    • IP only → cannot use switches
    • Distributed → synchronisation is not a good fit for DC experiments
    • Time scalability is similar to CSMA
• Layer-3 models
  – TCP socket model: no window scaling
Containers vs Xen
• Heterogeneity (OS, network stacks)
• OS-level time virtualization is easier
• Resource management
  – Containers: cgroups, kernel noise, convoluted tuning
  – Xen: Domain-0 / Xen / Dom-U isolation
• MiniNet can be run inside a time-dilated VM
Why not just scale network rates
• Non-uniform resource and time scaling
  – User-space applications
  – Kernel (protocols, timers, link emulation)
• Does not capture packet-level protocol effects (see the note below)
  – e.g. TCP window sizing
  – Queueing fidelity
• Lessons learned via MiniNet use cases
  – Jellyfish topology
  – TCP-incast effect
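One way to make the TCP window-sizing point concrete (a standard back-of-the-envelope argument, not taken from the slides):

```latex
% TCP throughput is window-limited: rate <= W / RTT.
% Scaling the link rate by 1/k while leaving time (RTT, timeouts)
% untouched changes the bandwidth-delay product, so window dynamics
% and queueing no longer match the original network:
\[
  \mathit{BDP} = \mathit{rate} \times \mathit{RTT},
  \qquad
  \frac{\mathit{rate}}{k} \times \mathit{RTT}
    = \frac{\mathit{BDP}}{k} \neq \mathit{BDP}.
\]
% Time dilation scales rate and time together (rate * TDF, RTT / TDF),
% which preserves the BDP the guest perceives.
```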
Related work
Research on networked systems: past, present, future
• Animation: 3 examples of networks.
  The examples show the evolution of the network characteristics on which research is conducted:
  – Past: 2-3 layers, hierarchical, ToR, 100 Mbps, bare-metal OS
  – Present: Fat-tree, 1 Gbps links, virtualization, WAN links
  – Near future: flexible architectures, 10 Gbps, elastic resource management, SDN controllers, OF switches, large scale (DC)
• The point of this slide is that real-world systems progress at a fast pace (complexity, size) but common tools have not kept up with this pace
• I will challenge the audience to think:
  – Which of the 3 illustrated networks they believe they can model with existing tools
  – What level of fidelity (incl. protocols, SDN, apps, network emulation)
  – What network sizes and link speeds they can commonly model
A simple example with NS-3
• Here I will assume a simple star topology
• 10x clients, 1x server, 1x switch (10 Gbps aggregate)
• I will provide the throughput plot and explain why performance is poor
• Point out that NS-3 is not appropriate for faster networks
• Simplicity of models + no real applications
• Using DCE: even slower, not fully POSIX-compliant
A simple example with MiniNet
• Same setup as before
• Throughput plot
• Better fidelity in terms of protocols, applications, etc.
  – Penalty in performance
• Explain what the bottleneck is, especially in relation to MiniNet's implementation
Everything is a trade-off
• Nothing comes for free when it comes to modelling and the 3 key experimentation properties
• MiniNet aims for fidelity
  – Sacrifices scalability
• NS-3 aims for scalability (many abstractions)
  – Sacrifices fidelity, and has scalability limitations of its own
• The importance of reproducibility
  – MiniNet is a pioneer, but experiments are difficult to maintain from machine to machine
  – MiniNet can only guarantee reproducibility at the level of configuration, not at the level of performance
[Figure: fidelity / scalability / reproducibility triangle]
SELENA: Standing on the shoulders of giants
• Fidelity: use emulation
  – Unmodified apps and protocols: fidelity + usability
  – XEN: support for common OSes, good scalability, great control over resources
• Reproducible experiments
  – MiniNet approach: high-level experiment descriptions, automation
• Maintain fidelity under scale
  – DieCast approach: time dilation (will talk more later on that)
• The user is the MASTER:
  – Tuning knob: experiment execution speed
SELENA Architecture
• Animation here: 3 steps show how an experiment is
  – specified (Python API)
  – compiled
  – deployed
• Explain the mapping of network entities and features to Xen emulation components
• Give hints about the optimization tweaks we use under the hood
[Figure: experiment description → Python API → Selena compiler]
Time Dilation and Reproducibility
• Explain how time dilation also FACILITATES reproducibility across different platforms
• Reproducibility
  – Replication of configuration
    • Network architecture, links, protocols
    • Applications
    • Traffic / workloads
    • How we do it in SELENA: Python API, XEN API
  – Reproduction of results and observed performance
    • Each platform should have enough resources to run the experiment faithfully
    • How we do it in SELENA: time dilation
    • An older platform/hardware will require a different minimum TDF to reproduce the same results
Demystifying Time-Dilation 1/3
• Explain the concept in high-level terms
  – Give a solid example with a timeline
    • Similar to slide 8 of http://sysnet.ucsd.edu/projects/timedilation/nsdi06-tdf-talk.pdf
• Explain that everything happens at the hypervisor level
  – Guest time sandboxing (experiment VMs)
  – Common time for kernel + user space
  – No modifications for PV guests
    • Linux, FreeBSD, ClickOS, OSv, Mirage
Demystifying Time-Dilation 2/3
• Here we explain the low-level stuff
• Give credit to DieCast, but also explain the incremental work we did
• Best to show/explain with an animation
Demystifying Time-Dilation 3/3
• Resource scaling
  – Linear and symmetric scaling for network, CPU, RAM bandwidth, disk I/O
  – The TDF only increases the perceived performance headroom of the above
  – SELENA allows configuring independently the perceived speeds of:
    • CPU
    • Network
    • Disk I/O (from within the guests at the moment, via cgroups)
• Typical workflow (a worked example follows below)
  1. Create a scenario
  2. Decide the minimum TDF necessary to support the desired fidelity (will see more later on that)
  3. Independently scale resources, based on the requirements of the users and the focus of their studies
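A worked illustration of step 2, with made-up numbers (the deck itself gives no figures here): if the experiment must emulate an aggregate network rate $R_{emu}$ but the host can faithfully sustain only $R_{host}$, the TDF has to absorb the gap.

```latex
% Minimum dilation so the hardware can keep up with the emulated rates:
\[
  \mathit{TDF} \;\ge\; \frac{R_{emu}}{R_{host}},
  \qquad\text{e.g.}\quad
  R_{emu} = 20~\text{Gbps},\ R_{host} = 2~\text{Gbps}
  \;\Rightarrow\; \mathit{TDF} \ge 10 .
\]
% With TDF = 10 the guests perceive 20 Gbps of capacity while the
% hardware only ever carries 2 Gbps in real time.
```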
Summarizing the elements of Fidelity
• Resource scaling via time dilation (already covered)
• Real stacks and other OS components
• Real applications
  – Including SDN controllers
• Realistic SDN switch models
  – Why is this important?
  – How much can it affect observed behaviours?
Inside an OF switch
• Present a model of an OF switch's internals
  – Show components
  – Show the paths and interactions that affect performance
    • Data plane (we do not model that currently)
    • Control plane
[Random image from the web, just a placeholder]
Building a realistic OF switch model
• Methodology for constructing an empirical model
  – PICA-8 switch
  – OFLOPS measurements
    • Collect, analyze, extract trends
  – Stochastic model
• Use a Mirage switch to implement the model
  – Flexible, functional, non-bloated code
  – Performant: unikernel, no context switches
  – Small footprint: scalable emulations
Evaluation methodology
1. Run experiment on real hardware
2. Reproduce results in:
   1. MiniNet
   2. NS3
   3. SELENA (for various TDFs)
3. Compare each one against "real"
• We evaluate multiple aspects of fidelity:
  – Data plane
  – Flow level
  – SDN control
  – Application
Data-Plane fidelity
• Figure from paper
• Explain the star topology
• Show comparison of MiniNet + NS3
  – Same figures as on slides 2+3, but now compared against SELENA + real
• Point out how increasing the TDF affects fidelity
Flow-Level fidelity
• Figure from paper
• Explain the Fat-tree topology
Execution Speed
• Compare against NS3, MiniNet
• Point out that SELENA executes faster than NS3
  – NS3, however, replicates the network at only half speed
    • Therefore the difference is even bigger
SDN Control-plane Fidelity
• Figure from paper
• Explain the experiment setup
• Point out the shortcomings of MiniNet
  – It is only as good as OVS is
• Point out the terrible support for SDN in NS3
Application-level fidelity
• Figure from paper
• Explain the experiment setup
• Latency aspect
• Show how CPU utilisation matters for fidelity
  – Open the dialogue about performance bottlenecks and limitations, and make a smooth transition to the next slide
Near-linear Scalability
• Figure from paper
• Explain how scalability is determined for a given TDF
Limitations discussion
• Explain the effects of running on Xen
• Explain what happens if the TDF is low and utilisation is high
• Explain that insufficient CPU:
  – compromises emulated network speeds
  – compromises the capability of guests to utilise the available bandwidth
  – skews the performance of networked applications
  – adds excessive latency
• Scheduling also contributes
A more complicated example
• Showcase the power of SELENA :P
• Use the MRC2 experiment
Work in progress
• API compatibility with MiniNet
• Further improve scalability
  – Multi-machine emulation
  – Optimize guest-to-guest Xen communications
• Features and use cases
  – SDN coupling with workload consolidation
  – Emulation of live VM migration
  – Incorporate energy models
SELENA is free and open.
Give it a try: http://selena-project.github.io