vNet-SLA – readout to vsperf
Maciek Konstantynowicz, [email protected]
Miroslav Miklus, [email protected]
14th October 2015
vNet-SLA project
• Scope
• vNet topologies
• Testing methodologies
• Sample results
• Proposed contributions to opnfv.org/vsperf
Background
Cloud network services
 VNF service topologies (“chains”).
 Delivering deeper packet and flow processing.
 Scalable, elastic, on-demand, with cloud-hosted management.
Connectivity
 IP or overlay tunnels – IPSec, GPE.
 Other overlay encapsulations.
[Diagram: CPE sites connect over IP/Internet to the SP cloud, which hosts VNFs (vR, vFW, vWSEC) behind an Internet router, with cloud-hosted management.]
Background
[Diagram: the same SP cloud picture as above - CPE sites, IP connectivity, Internet router, hosted vR/vFW/vWSEC, cloud-hosted management.]
vNet-SLA project: benchmarking of the virtual network data plane.
vNet-SLA Project
• Goal:
  • Evaluate boundaries of deterministic virtual network services performance and network service density on x86.
• Approach:
  • A systematic top-down and bottom-up experimental approach to understand virtual networking technologies and their applicability in cloud networking deployments.
  • Top-down:
    • Focus on usability – actual deployment designs and topologies.
    • Virtual network and VNF topologies as a platform for delivering cloud network services.
    • Use universal efficiency metrics for network services: e.g. CPU clock cycles per tenant, per service instance.
  • Bottom-up:
    • Focus on technology – not products, not standards, not current practices. Focus on what comes first.
    • Use universal efficiency metrics for transporting and computing bits: packet throughput, latency, latency variation, CPU clock cycles per packet.
    • [1] Experimental test -> [2] Analyse -> [3] Code-Tune -> loop to [1].
vNet-SLA Project
VNET elements identified for benchmarking
• VSs – Virtual Switches (Virtual Forwarders)
  • Cisco VPP, OVS, OVS-DPDK, Snabb, other open source
• VNFs – Virtual Network Functions
  • Cisco: CSR, ASAv, WSAv, XR9kv; open-source and 3rd party VRs and vAppliances
VNET topology benchmarking
• A defined set of virtual topologies based on service applicability: vCompX, vNetX
• Covering applicable virtual network forwarding topologies (PHY = Nx10GE, Nx40GE, Nx100GE):
  • PHY-to-PHY
  • PHY-to-VM
  • VM-to-VM
  • Combinations thereof
• L2, IPv4, IPv6, overlay packet encapsulations
• Test methodology adjusting/refactoring as we’re learning what works and what doesn’t
vNet-SLA Topologies - Summary
[Diagram: all test topologies on x86 hosts (Linux kernel, user space, guest OSes) with Nx10GE NICs driven by IXIA.]
• vComponents: vComp1-Host, vComp1-VM, vComp2-VM2VM, vComp3-PHY2VM
• vNet: vSwitch only – vNet1, vNet2
• vNet: VNF only – vNet3
• vNet: vSwitch + VNFs – vNet4-1VNF, vNet4-nVNF, vNet5-1VNF, vNet5-nVNF
• vNet: vSwitch + VNF_chains – vNet6-1VNFch, vNet6-nVNFch, vNet7-1VNFch, vNet7-nVNFch
Lots of combinations! Need to prioritize ...
vNet-SLA: Virtual Component Test Topologies
Single vDevice: vSwitch or VNF
• vComp1-Host – ref-VNF in host
• vComp1-VM – ref-VNF in guest
• vComp2-VM2VM – ref-VNF in guest
• vComp3-PHY2VM – ref-VNF in guest
Use a simple reference VNF: Intel L2FWD application.
[Diagrams: each topology runs on an x86 host (user space, guest OS, Linux kernel, NIC hardware) with Nx10GE NICs driven by IXIA.]
vNet-SLA: Virtual Network Test Topologies
Single vDevice: vSwitch or VNF
• vNet1 – vSwitch in host
• vNet2 – vSwitch in guest
• vNet3 – VNF in guest
[Diagrams: x86 host with Nx10GE NICs driven by IXIA; the vSwitch or VNF sits in host user space or in a guest OS.]
vNet-SLA: Virtual Network Test Topologies
Simple topologies: vSwitch and VNF(s)
• vNet4 – vSwitch in host, VNF(s) in guest(s): vNet4-1VNF, vNet4-nVNF
• vNet5 – vSwitch in guest, VNF(s) in guest(s): vNet5-1VNF, vNet5-nVNF
[Diagrams: x86 host with Nx10GE NICs driven by IXIA.]
vNet-SLA: Virtual Network Test Topologies
Service-chain topologies: vSwitch and VNFs
• vNet6 – vSwitch in host, VNF chain(s) in guest(s): vNet6-1VNFch, vNet6-nVNFch
• vNet7 – vSwitch in guest, VNF chain(s) in guest(s): vNet7-1VNFch, vNet7-nVNFch
[Diagrams: x86 host with Nx10GE NICs driven by IXIA; VNFs chained through the vSwitch.]
vNet-SLA: Virtual Component Test Topologies
vComp1-Host
[Diagram: ref-vnf in x86 host user space; Linux kernel, NIC hardware, Nx10GE to IXIA.]
vNet-SLA: Virtual Component Test Topologies
vComp2-VM2VM
[Diagram: two guest OSes, each running a vnf, connected through the vswitch in x86 host user space; Linux kernel, NIC hardware, Nx10GE to IXIA.]
vNet-SLA: Virtual Component Test Topologies
vComp3-PHY2VM
[Diagram: guest OSes running vnfs connected through the vswitch in x86 host user space to the physical NICs; Linux kernel, NIC hardware, Nx10GE to IXIA.]
Which system HW+SW resources to monitor and why
• CPU core utilization
• CPU cache hits/misses
• VM-EXITs
• Context switching
• Memory, especially for higher bandwidths
Which HW resources to monitor and why
• Identify x86 static and dynamic resource bottlenecks
• HW resource pinch points for vSwitches and VNFs
  • CPU cores, CPU cache levels, NUMA location (core, memory, PCI)
  • Memory is not a huge factor at 20 Gbps; for full-box tests memory channel population matters, as does memory NUMA location
• Currently measured: CPU
  • CPU pinning (not testing inefficiencies of OS process scheduling at this time)
  • Per vSwitch process/task/thread, per VNF process/task/thread
  • Overall system CPU heatmap
• Under investigation: NUMA location, CPU cache hits/misses
  • Process relocation without CPU pinning causes a full I-cache/D-cache flush -> tanks the zero-packet-drop rate
  • Avoid the QPI bus (Intel’s inter-socket bus), which handles PCI traffic between sockets and memory accesses between NUMA nodes
[Figure: topology of the x86 system with N * CSR + VPP, output of `lstopo --of pdf [filename]`.]
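As an illustration of the CPU pinning mentioned above (not the project tooling), a minimal Python sketch of pinning a vSwitch/VNF worker to specific cores on Linux; the PID and core numbers are hypothetical:

    import os

    def pin_to_cores(pid, cores):
        # Restrict the process/thread to the given cores so the scheduler
        # cannot relocate it and flush its I-cache/D-cache.
        os.sched_setaffinity(pid, cores)  # Linux-only
        print("pid", pid, "now runs on cores", sorted(os.sched_getaffinity(pid)))

    # e.g. keep a vSwitch worker on cores 2 and 3 of the local NUMA node
    pin_to_cores(12345, {2, 3})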
Which system SW resources to monitor and why
Example output of `cat /proc/stat` - provides x86 system-wide kernel/system statistics for the amount of time, measured in units of USER_HZ (1/100ths of a second on most architectures; use sysconf(_SC_CLK_TCK) to obtain the right value), that the system spent in various states:
 user - Time spent in user mode.
 nice - Time spent in user mode with low priority (nice).
 system - Time spent in system mode.
 idle - Time spent in the idle task.
 iowait - Time waiting for I/O to complete.
 irq - Time servicing interrupts.
 softirq - Time servicing softirqs (Linux specific, generated by kernel SW).
 steal - Stolen time, which is the time spent in other operating systems when running in a virtualized environment.
 guest - Time spent running a virtual CPU for guest operating systems under the control of the Linux kernel.
 guest_nice - Time spent running a niced guest.
Output of `cat /proc/stat`, processed and graphed as %cpu time over the last test run duration.
[Figure: stacked %cpu time (0-100%) per core, cpu0-cpu15, broken down into user, nice, system, idle, iowait, irq, softirq.]
Note: In a KVM setup, `guest` time is accounted in `user` time.
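A minimal sketch of that post-processing (illustrative only, not the project tooling): sample the per-CPU counters before and after a test run and turn the deltas into % time per state, with the field list matching the /proc/stat columns documented above:

    FIELDS = ["user", "nice", "system", "idle", "iowait", "irq", "softirq",
              "steal", "guest", "guest_nice"]

    def read_proc_stat():
        # Return {"cpuN": [counters...]} in USER_HZ ticks, skipping the aggregate "cpu" line.
        stats = {}
        with open("/proc/stat") as f:
            for line in f:
                if line.startswith("cpu") and line[3].isdigit():
                    name, *vals = line.split()
                    stats[name] = [int(v) for v in vals[:len(FIELDS)]]
        return stats

    def percent_per_state(before, after):
        # Turn counter deltas into a per-CPU {state: %time} map.
        result = {}
        for cpu in after:
            delta = [a - b for a, b in zip(after[cpu], before[cpu])]
            total = sum(delta) or 1
            result[cpu] = {f: 100.0 * d / total for f, d in zip(FIELDS, delta)}
        return result

    # usage: before = read_proc_stat(); <run the test>; after = read_proc_stat()
    #        print(percent_per_state(before, after)["cpu0"])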
vNet-SLA – Developed Testing Methodologies
① FrameLossRate-Sweep
  • provides an overview of system packet throughput and loss across the entire range of offered load
② ThroughputBinarySearch-SingleRun
  • measures throughput per packet size based on a single measurement per packet rate
③ ThroughputBinarySearch-BestNWorstN
  • measures throughput per packet size based on multiple measurements per packet rate
④ LatencyAndLatencyVariation
  • measures per-packet latency and latency variation and reports the values at specified percentiles of packets in the latency stream
All tests based on modified RFC2544 and RFC1242, adapted to the virtualized networking environment.
All tests realized using automated tooling, ready for use in a continuous integration devops model.
vNet-SLA – FrameLossRateSweep
• Measure packet loss % resulting from a linear increase of offered load from 1% of 10GE linerate to 100% of 10GE linerate, with a step of 1% of 10GE linerate (all values are parameterised) - see the sketch after the example below
• Capture CPU core utilization and latency as well
• Provides an overview of system packet throughput, loss, CPU utilization and latency across the entire range of offered load
• Aligned with RFC2544 “Frame Loss Rate” measurements
Example: PHY-VS-VM-VS-PHY (vComp3)
[Figure: two panels of Packet Loss [%] vs Data Transmission Rate [Gbit/s] (0-20 Gbit/s) for vSwitch1 (OVS-DPDK) and vSwitch2 (Cisco VPP); one panel spans 0-10% loss, the other zooms into 0-0.01% loss.]
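A minimal Python sketch of the sweep loop (illustrative only; run_traffic() is a hypothetical stand-in for the traffic-generator driver and returns the measured loss %):

    def frame_loss_rate_sweep(run_traffic, start_pct=1, stop_pct=100, step_pct=1,
                              duration_s=60):
        # Linearly increase offered load (in % of 10GE linerate) and record loss
        # per step; CPU-utilization and latency capture are omitted here.
        results = []
        for rate_pct in range(start_pct, stop_pct + 1, step_pct):
            loss_pct = run_traffic(rate_pct, duration_s)
            results.append((rate_pct, loss_pct))
        return results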
vNet-SLA - ThroughputBinarySearch-SingleRun
• RFC2544-like methodology
• Uses a binary search algorithm (with an initial linear phase) to find maximum throughput at a specific loss tolerance (0% for Non Drop Rate, NDR; 0.01% for Partial Drop Rate, PDR)
• Parameters: packet size (or packet sequence, e.g. IMIX), loss tolerance, traffic run duration
• Result: maximum throughput (indicated as “single run” in results)
• Issue: non-deterministic - consecutive tests with the same parameters don’t yield the same results
vNet-SLA - ThroughputBinarySearch-SingleRun
[Figure: "transmit rate and loss% during test -- loss tolerance 0" - transmit rate (% of 10GE linerate) and loss% plotted per traffic run over the run sequence (1-19).]
vNet-SLA - ThroughputBinarySearch-SingleRun
(linear phase)
1. set *rate* to *start_rate*, *step* size to *start_step*
2. start traffic run at *rate* for the specified duration, collect packet statistics after finish
3. if packet loss > loss tolerance then *rate = rate - step*; goto 5
4. if packet loss <= loss tolerance then *rate = rate + step*; goto 2
(binary phase)
5. start traffic run at *rate* for the specified duration, collect packet statistics after finish
6. if *step* < resolution then goto 11
7. *step = step/2*
8. if packet loss > loss tolerance then *rate = rate - step*
9. if packet loss <= loss tolerance then *rate = rate + step*
10. goto 5
11. throughput = *rate*
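The same algorithm as a minimal Python sketch (illustrative only; run_loss() is a hypothetical traffic-generator hook that runs traffic at the given rate for the configured duration and returns loss %; rates are in % of linerate):

    def throughput_binary_search(run_loss, start_rate, start_step,
                                 loss_tolerance=0.0, resolution=0.1):
        rate, step = start_rate, start_step
        # Linear phase: walk the rate up in fixed steps until loss exceeds the tolerance.
        while True:
            if run_loss(rate) > loss_tolerance:
                rate -= step
                break
            rate += step
        # Binary phase: halve the step each iteration until it drops below the resolution.
        while True:
            loss = run_loss(rate)
            if step < resolution:
                return rate  # reported throughput
            step /= 2.0
            if loss > loss_tolerance:
                rate -= step
            else:
                rate += step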
vNet-SLA - ThroughputBinarySearch-BestNWorstN
Addresses the SingleRun issue: non-deterministic, consecutive tests with the same parameters don’t yield the same results
• Extended RFC2544-like methodology
• Same concept as on the previous slides (binary search, loss tolerance), but more traffic runs
• Two measurements: best-of-N and worst-of-N
• Best-of-N:
  • Decrease the test traffic rate only after N traffic runs had a loss higher than the loss tolerance
• Worst-of-N:
  • Increase the test traffic rate only after N traffic runs had a loss lower than the loss tolerance
vNet-SLA - ThroughputBinarySearch-BestNWorstN
• Best-of-N indicates the highest expected throughput for (packet size, loss tolerance) when repeating the test
• Worst-of-N indicates the lowest expected throughput for (packet size, loss tolerance) when repeating the test
• Base measurements:
  • best5 for NDR and PDR (loss tolerance 0% and 0.01%)
  • worst5 for NDR and PDR (loss tolerance 0% and 0.01%)
  • The range worst5 to best5 represents a sample variance across 10 samples (10 total measurements per packet rate)
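As an illustrative sketch (not the project tooling), the best-of-N / worst-of-N pass/fail decision applied at each candidate rate inside the binary search could look like this (run_loss() is the same hypothetical traffic-generator hook as above):

    def passes_best_of_n(run_loss, rate, loss_tolerance, n=5):
        # Best-of-N: the rate is lowered only if ALL N runs exceed the loss
        # tolerance; a single passing run keeps the rate (short-circuits early).
        return any(run_loss(rate) <= loss_tolerance for _ in range(n))

    def passes_worst_of_n(run_loss, rate, loss_tolerance, n=5):
        # Worst-of-N: the rate is raised only if ALL N runs stay within the loss
        # tolerance; a single failing run pushes the search down.
        return all(run_loss(rate) <= loss_tolerance for _ in range(n))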
vNet-SLA - ThroughputBinarySearch-BestNWorstN
Making vNet throughput results statistically meaningful
• Increase the value of N (current practice is using N=5)
  • But it will still yield 2*N-sample variance results, not full variance results
  • Works for some virtual devices, does not for others that are less deterministic
• Use a linear step method starting from the bestN and worstN results (implemented, but not used often - each measurement takes too much time)
  • Collect measurements for a series of traffic rates, stepping through smartly chosen values
  • Repeat the traffic run at each traffic rate N times
  • Graph the percentiled distribution of packet loss across the range Worst5 to Best5 NDR
  • Gives a more detailed view of the level of performance within the range Worst5 to Best5
• Alternative: run SingleRun tests MANY times (MANY >> 10) (not implemented - each measurement would take too many hours)
  • Then calculate the Sample Standard Deviation from the measured throughput samples (see the sketch below)
  • The lower the Sample Standard Deviation, the more deterministic the system throughput performance
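For that alternative, the post-processing step would amount to something like this sketch (the throughput samples below are made-up illustrative values):

    import statistics

    throughput_samples_gbps = [18.0, 17.6, 18.2, 17.9, 18.1]  # hypothetical SingleRun results
    mean = statistics.mean(throughput_samples_gbps)
    sample_stdev = statistics.stdev(throughput_samples_gbps)  # Bessel-corrected (n-1) sample std dev
    print("mean = %.2f Gbit/s, sample stdev = %.2f Gbit/s" % (mean, sample_stdev))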
vNet-SLA - LatencyAndLatencyVariation
• Adds background latency streams at a low rate, e.g. 5 pps per stream
• Captures every packet to calculate per-packet latency and latency variation
• Presents per-packet latency and latency variation results at the 50th, 90th and 100th percentile for NDR/PDR and for the last linear step increase before NDR/PDR
• Latency variation calculated per the IPDV (Inter-Packet Delay Variation) definition in RFC 5481
• Other calculations, e.g. PDV (Packet Delay Variation) per RFC 5481, and statistics are possible, as all captured packet tx/rx timestamps are available for processing
• This allows picking up any anomalies in packet latency and latency variation
Note: The above is in addition to standard IXIA tooling measuring Min/Max/Avg Latency and Min/Max/Avg Latency Variation.
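A minimal sketch of the percentile post-processing (illustrative only; it assumes the capture has already been reduced to matched per-packet tx/rx timestamp lists, and takes the absolute value of IPDV for percentile reporting):

    def latency_and_ipdv_percentiles(tx_timestamps, rx_timestamps):
        # One-way delay per captured latency-stream packet.
        delays = [rx - tx for tx, rx in zip(tx_timestamps, rx_timestamps)]
        # IPDV per RFC 5481: delay difference between consecutive packets.
        ipdv = [abs(delays[i] - delays[i - 1]) for i in range(1, len(delays))]

        def pct(samples, p):
            # Nearest-rank percentile; the 100th percentile is the maximum.
            s = sorted(samples)
            idx = min(len(s) - 1, max(0, int(round(p / 100.0 * len(s))) - 1))
            return s[idx]

        return {p: {"latency": pct(delays, p), "ipdv": pct(ipdv, p)}
                for p in (50, 90, 100)}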
Sample vSwitch Results
From Recent EANTC Testing
Sample vSwitch Results
VM-to-VM
• Single vhost interface to single vhost interface
• Memory copy!
• 10 Gbit/s, 1.6 million frames/s throughput with Cisco's VPP
• 7 Gbit/s, 1.09 million frames/s throughput with Open vSwitch
• Latency acceptable and comparable between Open vSwitch and VPP
[Diagram: vComp2-VM2VM - two guest OSes, each running a vnf, connected through the vswitch in x86 host user space; Nx10GE to IXIA.]
Source: http://www.lightreading.com/nfv/nfv-tests-and-trials/validating-ciscos-nfv-infrastructure-pt-1/d/d-id/718684
Sample vSwitch Results
PHY-to-VM
• Cisco's VPP reached up to 20 Gbit/s, 2.5 million frames/second throughput with a single core, with deterministic, repeatable performance.
• Open vSwitch provided between 8-40 Gbit/s, 2-6 million frames/second throughput, varying greatly across measurements.
[Figures: "VPP Performance on Sandy Bridge, 0% loss tolerance" and "OVS-DPDK Performance on Sandy Bridge, 0% loss tolerance" - Throughput [Gbit/s] vs Packet Size [Bytes] (64, 256, IMIX, 512, 1518), showing SingleRun and Best-N/Worst-N results.]
[Diagram: vComp3-PHY2VM - guest OS vnfs connected through the vswitch in x86 host user space to the physical NICs; Nx10GE to IXIA.]
Source: http://www.lightreading.com/nfv/nfv-tests-and-trials/validating-ciscos-nfv-infrastructure-pt-1/d/d-id/718684
Sample vSwitch Results
Ethernet Forwarding
• Almost line-rate throughput with Cisco's VPP for Ethernet forwarding up to 20,000 MAC addresses
• OVS performance reduced by 81% when forwarding to 2,000 MAC addresses – unusable for 20,000 MAC addresses
[Diagram: vNet1 - vswitch in x86 host user space; Linux kernel, NIC hardware, Nx10GE to IXIA.]
Source: http://www.lightreading.com/nfv/nfv-tests-and-trials/validating-ciscos-nfv-infrastructure-pt-1/d/d-id/718684
Sample vSwitch Results
IPv4 Forwarding
• 77% of line rate with Cisco's VPP when forwarding to 20,000 IPv4 addresses
• OVS reached 19% of line-rate throughput when forwarding to 2,000 IP addresses
[Diagram: vNet1 - vswitch in x86 host user space; Linux kernel, NIC hardware, Nx10GE to IXIA.]
Source: http://www.lightreading.com/nfv/nfv-tests-and-trials/validating-ciscos-nfv-infrastructure-pt-1/d/d-id/718684
Relevance to opnfv.org/vsperf work ...
• Virtual network topologies, configurations
• Testing methodology
• Testing tools:
  • test execution drivers (python, tcl)
  • test results log processors (python, perl)
THANK YOU !