
VINI: Virtual Network Infrastructure
Nick Feamster
Georgia Tech
Andy Bavier, Mark Huang, Larry Peterson, Jennifer Rexford
Princeton University
VINI Overview
Bridge the gap between “lab experiments” and live experiments at scale.
[Figure: spectrum from small-scale experiments (simulation, emulation) to live deployment, with VINI in between]
VINI:
• Runs real routing software
• Exposes realistic network conditions
• Gives control over network events
• Carries traffic on behalf of real users
• Is shared among many experiments
Goal: Control and Realism
                 Control                    Realism
Topology         Arbitrary, emulated        Actual network
Traffic          Synthetic or traces        Real clients, servers
Network events   Inject faults, anomalies   Observed in operational network

• Control
  – Reproduce results
  – Methodically change or relax constraints
• Realism
  – Long-running services attract real users
  – Connectivity to real Internet
  – Forward high traffic volumes (Gb/s)
  – Handle unexpected events
Overview
• VINI characteristics
  – Fixed, shared infrastructure
  – Flexible network topology
  – Expose/inject network events
  – External connectivity and routing adjacencies
• PL-VINI: prototype on PlanetLab
• Preliminary Experiments
• Ongoing work
Fixed Infrastructure
Shared Infrastructure
Arbitrary Virtual Topologies
Exposing and Injecting Failures
Carry Traffic for Real End Users
[Figure: clients (c) and servers (s) sending traffic across the virtual network]
Participate in Internet Routing
[Figure: VINI nodes exchanging BGP routes with neighboring networks on behalf of clients and servers]
PL-VINI: Prototype on PlanetLab
• First experiment: Internet In A Slice
– XORP open-source routing protocol suite (NSDI ’05)
– Click modular router (TOCS ’00, SOSP ’99)
• Clarify issues that VINI must address
– Unmodified routing software on a virtual topology
– Forwarding packets at line speed
– Illusion of dedicated hardware
– Injection of faults and other events
PL-VINI: Prototype on PlanetLab
• PlanetLab: testbed for planetary-scale services
• Simultaneous experiments in separate VMs
– Each has “root” in its own VM, can customize
• Can reserve CPU, network capacity per VM
[Figure: PlanetLab node: Node Manager, Local Admin, and slices VM1…VMn running on a Virtual Machine Monitor (Linux++)]
XORP: Control Plane
[Figure: XORP routing protocols forming the control plane]
• BGP, OSPF, RIP, PIM-SM, IGMP/MLD
• Goal: run real routing protocols on virtual network topologies
User-Mode Linux: Environment
[Figure: XORP running inside a UML instance with virtual interfaces eth0–eth3]
• Interface ≈ network
• PlanetLab limitation: a slice cannot create new interfaces
• Run routing software in a UML environment
• Create virtual network interfaces in UML
Click: Data Plane
[Figure: XORP control plane in UML above a Click packet-forwarding engine with a UmlSwitch element, tunnel table, and filters]
• Performance
  – Avoid UML overhead
  – Move to kernel, FPGA
• Interfaces ↔ tunnels
  – Click UDP tunnels correspond to UML network interfaces
• Filters
  – “Fail a link” by blocking packets at the tunnel (see the sketch after this slide)
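As an illustration of the tunnel-and-filter idea only (not the actual Click configuration), a small Python sketch: each virtual interface maps to a UDP tunnel, and a link is “failed” by blocking packets headed for that tunnel. Interface names, peer addresses, and the port are made up.

```python
# Minimal sketch (not the actual Click configuration): virtual
# interfaces map to UDP tunnels between nodes, and a link is "failed"
# by filtering packets bound for that tunnel. Interface names, peer
# addresses, and the port are illustrative.
import socket

# Each virtual interface corresponds to a UDP tunnel endpoint on a peer node.
TUNNELS = {
    "eth1": ("10.0.0.2", 53001),
    "eth2": ("10.0.0.3", 53001),
}

failed_links = set()             # interfaces whose tunnel is currently "down"
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

def forward(packet: bytes, out_iface: str) -> None:
    """Encapsulate and send a packet over the tunnel for out_iface,
    unless a filter has marked that link as failed."""
    if out_iface in failed_links:
        return                   # drop: emulates a link failure
    sock.sendto(packet, TUNNELS[out_iface])

def fail_link(iface: str) -> None:
    """Inject a failure event by blocking packets at the tunnel."""
    failed_links.add(iface)

def restore_link(iface: str) -> None:
    failed_links.discard(iface)
```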
Intra-domain Route Changes
[Figure: emulated backbone topology, with per-link metrics, connecting a client (c) and a server (s)]
Ping During Link Failure
[Figure: ping RTT (ms) over a 50-second window; RTT spikes while routes converge after the link goes down, then returns to normal after the link comes back up]
Close-Up of TCP Transfer
PL-VINI enables a user-space virtual network to behave like a real network on PlanetLab.
[Figure: close-up of megabytes received vs. time (s) for the TCP transfer, showing a retransmitted lost packet and TCP slow start]
Challenge: Attracting Real Users
• Could have run experiments on Emulab
• Goal: Operate our own virtual network
– Carrying traffic for actual users
– We can tinker with routing protocols
• Attracting real users
Conclusion
• VINI: Controlled, Realistic Experimentation
• Installing VINI nodes in NLR, Abilene
• Download and run Internet In A Slice
http://www.vini-veritas.net/
TCP Throughput
[Figure: megabytes transferred vs. time (s) over a 50-second window; the transfer stalls when the link goes down and resumes after the link comes back up]
Ongoing Work
• Improving realism
– Exposing network failures and changes in the
underlying topology
– Participating in routing with neighboring networks
• Improving control
– Better isolation
– Experiment specification
Resource Isolation
• Issue: Forwarding packets in user space
– PlanetLab sees heavy use
– CPU load affects virtual network performance
Property     Depends on             Solution
Throughput   CPU% received          PlanetLab provides CPU reservations
Latency      CPU scheduling delay   PL-VINI: boost priority of the packet-forwarding process (see the sketch below)
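One way to realize “boost priority of the packet forwarding process” on Linux is to give the forwarder a real-time scheduling class. The sketch below illustrates the idea only; the policy and priority value are assumptions, not necessarily PL-VINI’s exact mechanism.

```python
# Sketch of boosting the packet-forwarding process's priority so that
# other slices' CPU load adds less scheduling delay to the data plane.
# The real-time policy and priority value are assumptions, not the
# exact PL-VINI mechanism; requires root privileges on Linux.
import os

RT_PRIORITY = 50                 # mid-range SCHED_FIFO priority (illustrative)

def boost_forwarder(pid: int = 0) -> None:
    """Give the forwarder (pid 0 = calling process) a real-time class,
    so it preempts ordinary SCHED_OTHER processes on the node."""
    os.sched_setscheduler(pid, os.SCHED_FIFO, os.sched_param(RT_PRIORITY))
```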
Performance is bad
• User-space Click: ~200 Mb/s forwarding
• VINI should use Xen
Experimental Results
• Is a VINI feasible?
  – Click in user space: 200 Mb/s forwarded
  – Latency and jitter comparable between the native network and IIAS on PL-VINI
Low latency for everyone?
• PL-VINI provided IIAS with low latency by giving
it high CPU scheduling priority
Internet In A Slice
[Figure: Internet In A Slice overlay connecting clients (C) and servers (S)]
• XORP: run OSPF, configure the FIB
• Click: FIB, tunnels, inject faults
• OpenVPN & NAT: connect clients and servers
PL-VINI / IIAS Router
[Figure: IIAS router: XORP in UML with interfaces eth0–eth3, connected through a UmlSwitch element to Click, which holds the FIB, encapsulation table, and tap0]
• Blue: topology
  – Virtual net devices
  – Tunnels
• Red: routing and forwarding
  – Data traffic does not enter UML
• Green: enter and exit the IIAS overlay
PL-VINI Summary
Flexible Network Topology
  – Virtual point-to-point connectivity: tunnels in Click
  – Unique interfaces per experiment: virtual network devices in UML
  – Exposure of topology changes: upcalls of layer-3 alarms
Flexible Routing and Forwarding
  – Per-node forwarding table: separate Click per virtual node
  – Per-node routing process: separate XORP per virtual node
Connectivity to External Hosts
  – End hosts can direct traffic through VINI: connect to OpenVPN server
  – Return traffic flows through VINI: NAT in Click on egress node
Support for Simultaneous Experiments
  – Isolation between experiments: PlanetLab VMs and network isolation; CPU reservations and priorities
  – Distinct external routing adjacencies: BGP multiplexer for external sessions
PL-VINI / IIAS Router
[Figure: XORP control plane in UML above the Click packet-forwarding engine with UmlSwitch element and tunnel table]
• XORP: control plane
• UML: environment
  – Virtual interfaces
• Click: data plane
  – Performance
    • Avoid UML overhead
    • Move to kernel, FPGA
  – Interfaces ↔ tunnels
  – “Fail a link”
Trellis
[Figure: Trellis virtual host: application in user space; kernel FIB and virtual NICs connected through bridges, shapers, and EGRE tunnels in the Trellis substrate]
• Same abstractions as PL-VINI
  – Virtual hosts and links
  – Push performance, ease of use
• Full network-stack virtualization
  – Run XORP, Quagga in a slice
• Support data plane in kernel
  – Approach native Linux kernel performance (15× PL-VINI)
• Be an “early adopter” of new Linux virtualization work
Virtual Hosts
• Use container-based virtualization
  – Xen, VMware: poor scalability, performance
• Option #1: Linux VServer
  – Containers without network virtualization
  – PlanetLab slices share a single IP address and port space
• Option #2: OpenVZ
  – Mature container-based approach
  – Roughly equivalent to VServer
  – Has full network virtualization
Network Containers for Linux
• Create multiple copies of TCP/IP stack
• Per-network container
– Kernel IPv4 and IPv6 routing table
– Physical or virtual interfaces
– Iptables, traffic shaping, sysctl.net variables
• Trellis: marry VServer + NetNS (see the sketch below)
  – Be an early adopter of the new interfaces
  – Otherwise stay close to PlanetLab
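A minimal sketch of what a per-container network stack looks like through the Linux network-namespace interface that the NetNS work fed into. This is an illustration, not Trellis code; the namespace, device, and address names are made up.

```python
# Sketch (not Trellis code) of a per-container network stack using
# Linux network namespaces: the namespace gets its own devices,
# routing tables, and iptables state. Namespace, device, and address
# names are illustrative; requires root.
import subprocess

def sh(cmd: str) -> None:
    """Run a shell command and raise if it fails."""
    subprocess.run(cmd, shell=True, check=True)

# Private copy of the network stack for one virtual host.
sh("ip netns add ve1")

# A veth pair connects the container to the host side, where Trellis
# would attach a bridge/shaper and an EGRE tunnel.
sh("ip link add veth-host type veth peer name veth-ve1")
sh("ip link set veth-ve1 netns ve1")

# Addresses and routes set inside ve1 are invisible to other containers.
sh("ip netns exec ve1 ip addr add 10.10.0.2/24 dev veth-ve1")
sh("ip netns exec ve1 ip link set veth-ve1 up")
sh("ip netns exec ve1 ip route add default via 10.10.0.1")

sh("ip addr add 10.10.0.1/24 dev veth-host")
sh("ip link set veth-host up")
```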
Virtual Links: EGRE Tunnels
[Figure: Trellis virtual host connected to the substrate through virtual NICs and EGRE tunnels]
• Virtual Ethernet links
• Make minimal assumptions about the physical network between Trellis nodes
• Trellis: tunnel Ethernet over GRE over IP (sketch below)
  – Already a standard, but no Linux implementation
• Other approaches:
  – VLANs, MPLS, other network circuits or tunnels
  – These fit into our framework
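The slide notes that Ethernet-over-GRE had no Linux implementation at the time, which is why Trellis built its own EGRE tunnels. Purely for reference, later kernels expose the same idea as the gretap device type; the sketch below uses it with illustrative addresses.

```python
# Reference sketch only: Trellis implemented EGRE itself because no
# in-kernel Ethernet-over-GRE support existed then. Later kernels
# provide the "gretap" device type, shown here with illustrative
# addresses and key; requires root.
import subprocess

def sh(cmd: str) -> None:
    subprocess.run(cmd, shell=True, check=True)

# Ethernet frames sent on egre0 are encapsulated in GRE/IP and carried
# to the matching tunnel endpoint on the remote node.
sh("ip link add egre0 type gretap local 192.0.2.1 remote 198.51.100.1 key 42")
sh("ip link set egre0 up")

# The tunnel device can then be bridged to a container's veth interface
# (see the bridging slides), giving the virtual host an Ethernet link
# whose far end sits on another physical node.
```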
Tunnel Termination
• Where is EGRE tunnel interface?
• Inside container: better performance
• Outside container: more flexibility
– Transparently change implementation
– Process and shape traffic between container and tunnel
– User cannot manipulate tunnel, shapers
• Trellis: terminate tunnel outside container
Glue: Bridging
• How to connect virtual hosts to tunnels?
– Connecting two Ethernet interfaces
• Linux software bridge
  – Ethernet bridge semantics; creates point-to-multipoint (P2M) links
  – Relatively poor performance
• Common case: point-to-point (P2P) links
• Trellis
  – Use the Linux bridge for P2M links
  – Create a new “shortbridge” for P2P links
Glue: Bridging
[Figure: Trellis virtual host whose virtual NICs connect through bridges (or shortbridges) and shapers to EGRE tunnels in the Trellis substrate]
• How to connect virtual hosts to EGRE tunnels?
  – Two Ethernet interfaces
• Linux software bridge
  – Ethernet bridge semantics
  – Supports P2M links
  – Relatively poor performance
• Common case: P2P links
• Trellis:
  – Use the Linux bridge for P2M links (see the sketch below)
  – New, optimized “shortbridge” module for P2P links
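A sketch of the “glue” step using the stock Linux software bridge, the piece Trellis replaces with its shortbridge for the common point-to-point case. Device names follow the earlier sketches and are illustrative.

```python
# Sketch (not Trellis code): glue a container's host-side veth to an
# EGRE-style tunnel with the standard Linux software bridge. Trellis
# swaps this bridge for an optimized "shortbridge" on point-to-point
# links. Device names are illustrative; requires root.
import subprocess

def sh(cmd: str) -> None:
    subprocess.run(cmd, shell=True, check=True)

sh("ip link add name br-ve1 type bridge")
sh("ip link set veth-host master br-ve1")   # container side of the veth pair
sh("ip link set egre0 master br-ve1")       # tunnel side
sh("ip link set br-ve1 up")
```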
IPv4 Packet Forwarding
[Figure: forwarding rate (kpps) for PL-VINI, Xen, Trellis (bridge), Trellis (shortbridge), and native Linux]
2/3 of native performance, 10× faster than PL-VINI
Virtualized Data Plane in Hardware
• Software provides flexibility, but poor
performance and often inadequate isolation
• Idea: Forward packets exclusively in hardware
– Platform: OpenVZ over NetFPGA
– Challenge: Share common functions, while isolating
functions that are specific to each virtual network
Accelerating the Data Plane
• Virtual environments in OpenVZ
• Interface to NetFPGA based on the Stanford reference router
Control Plane
• Virtual environments
– Virtualize the control plane by running multiple virtual
environments on the host (same as in Trellis)
– Routing table updates pass through security daemon
– Root user updates VMAC-VE table
• Hardware access control
– VMAC-VE table/VE-ID controls access to hardware
• Control register
– Used to multiplex VE to the appropriate hardware
Virtual Forwarding Table Mapping
Share Common Functions
• Common functions
  – Packet decoding
  – Calculating checksums
  – Decrementing TTLs
  – Input arbitration
• VE-specific functions (see the sketch below)
  – FIB
  – IP lookup table
  – ARP table
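As a software analogy only (not the NetFPGA design), the split can be pictured as a shared pipeline followed by a lookup in per-VE state selected by the packet’s VE ID. All tables and fields below are illustrative.

```python
# Software analogy of the hardware split (not the NetFPGA design):
# the decode/checksum/TTL steps are shared by all virtual networks,
# while each virtual environment (VE) keeps its own forwarding state,
# selected by the packet's VE ID. Tables and fields are illustrative.

# Per-VE state: each virtual network keeps its own FIB (and ARP table).
VE_FIB = {
    1: {"10.1.0.0/16": "eth1"},
    2: {"10.1.0.0/16": "eth2"},   # same prefix, different next hop per VE
}

def forward(ve_id: int, packet: dict):
    """Shared pipeline first, then a VE-specific lookup."""
    # Shared: decrement TTL and drop expired packets.
    packet["ttl"] -= 1
    if packet["ttl"] <= 0:
        return None
    # Shared: recompute the header checksum (stubbed out here).
    packet["checksum"] = 0
    # VE-specific: look up the output port in this VE's own table
    # (exact match stands in for longest-prefix match to stay short).
    return VE_FIB[ve_id].get(packet["dst_prefix"])

print(forward(1, {"ttl": 64, "checksum": 0, "dst_prefix": "10.1.0.0/16"}))  # eth1
print(forward(2, {"ttl": 64, "checksum": 0, "dst_prefix": "10.1.0.0/16"}))  # eth2
```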
Forwarding Performance
Efficiency
• 53K logic cells
• 202 units of block RAM
Sharing common elements saves up to 75% of hardware resources compared with independent physical routers.
Conclusion
• Virtualization allows physical hardware to be
shared among many virtual networks
• Tradeoffs: sharing, performance, and isolation
• Two approaches
– Trellis: Kernel-level packet forwarding
(10x packet forwarding rate improvement vs. PL-VINI)
– NetFPGA-based forwarding for virtual networks
(same forwarding rate as NetFPGA-based router, with
75% improvement in hardware resource utilization)
Accessing Services in the Cloud
[Figure: a cloud data center hosting an interactive service and a bulk-transfer service, connected to the Internet through a single data center router with upstream ISP1 and ISP2]
• Hosted services have different requirements
  – Too slow for the interactive service, or
  – Too costly for the bulk transfer!
Cloud Routing Today
• Multiple upstream ISPs
– Amazon EC2 has at least 58 routing peers in Virginia
data center
• Data center router picks one route to a
destination for all hosted services
– Packets from all hosted applications use
the same path
Route Control: “Cloudless” Solution
• Obtain connectivity to upstream ISPs
– Physical connectivity
– Contracts and routing sessions
• Obtain Internet number resources (addresses, AS numbers) from the authorities
• Expensive and time-consuming!
Routing with Transit Portal (TP)
[Figure: interactive and bulk-transfer services in a cloud data center, each behind its own virtual router (A and B), connecting through the Transit Portal to upstream ISP1 and ISP2]
Full Internet route control to hosted cloud services!
Outline
• Motivation and Overview
• Connecting to the Transit Portal
• Advanced Transit Portal Applications
• Scaling the Transit Portal
• Future Work & Summary
Connecting to the TP
• Separate Internet router for each service
– Virtual or physical routers
• Links between service router and TP
– Each link emulates connection to upstream ISP
• Routing sessions to upstream ISPs
– TP exposes standard BGP route control interface
Basic Internet Routing with TP
[Figure: an interactive cloud service's virtual BGP router connected through the Transit Portal, with BGP sessions and traffic flowing to ISP 1 and ISP 2]
• Cloud client with two upstream ISPs
  – ISP 1 is preferred
• ISP 1 exhibits excessive jitter
• Cloud client reroutes through ISP 2
Current TP Deployment
• Server with custom routing software
– 4 GB RAM, 2 × 2.66 GHz Xeon cores
• Three active sites with upstream ISPs
– Atlanta, Madison, and Princeton
• A number of active experiments
– BGP poisoning (University of Washington)
– IP Anycast (Princeton University)
– Advanced Networking class (Georgia Tech)
TP Applications: Fast DNS
• Internet services require fast name resolution
• IP anycast for name resolution
– DNS servers with the same IP address
– IP address announced to ISPs in multiple locations
– Internet routing converges to the closest server
• Available only to large organizations
TP Applications: Fast DNS
• TP allows hosted applications to use IP anycast
[Figure: name-service instances in Asia and North America, each behind a Transit Portal announcing the same anycast routes to ISP1–ISP4]
TP Applications: Service Migration
• Internet services in geographically diverse data
centers
• Operators migrate Internet users’ connections
• Two conventional methods:
– DNS name re-mapping
• Slow
– Virtual machine migration with local re-routing
• Requires globally routed network
TP Applications: Service Migration
[Figure: an active game service migrating between data centers in Asia and North America, with tunneled sessions through Transit Portals to ISP1–ISP4]
Scaling the Transit Portal
• Scale to dozens of sessions to ISPs and
hundreds of sessions to hosted services
• At the same time:
  – Present each client with sessions that have the appearance of direct connectivity to an ISP
  – Prevent clients from abusing Internet routing protocols
Conventional BGP Routing
• Conventional BGP router:
  – Receives routing updates from peers
  – Propagates a routing update about one path only
  – Selects one path for forwarding packets
• Scalable, but not transparent or flexible
[Figure: a conventional BGP router between ISP1/ISP2 and two client BGP routers, showing the flow of updates and packets]
Scaling BGP Memory Use
• Store and propagate all BGP routes from ISPs
  – Separate routing tables
• Reduce memory consumption (see the sketch below)
  – Single routing process with shared data structures
  – Reduces memory use from 90 MB/ISP to 60 MB/ISP
[Figure: one routing process with per-ISP routing tables (Table 1, Table 2) feeding virtual routers for the interactive and bulk-transfer services]
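A rough sketch of the shared-data-structure idea: per-ISP tables live in one process and reference a single interned copy of identical route attributes rather than keeping private duplicates. The structure and values are illustrative, not the TP code.

```python
# Sketch of "single routing process, shared data structures": the TP
# keeps a routing table per upstream ISP, but identical route
# attributes are interned and shared between tables instead of being
# stored once per table. Prefixes, AS paths, and addresses are
# illustrative.

_attr_cache = {}                       # shared across all per-ISP tables

def intern_attrs(as_path, next_hop):
    """Return one shared copy of a set of route attributes."""
    key = (as_path, next_hop)
    return _attr_cache.setdefault(key, key)

rib = {"ISP1": {}, "ISP2": {}}         # per-ISP tables in one process

def update(isp, prefix, as_path, next_hop):
    rib[isp][prefix] = intern_attrs(as_path, next_hop)

# Two ISPs announcing the same path end up referencing one object.
update("ISP1", "203.0.113.0/24", (3356, 15169), "10.0.1.1")
update("ISP2", "203.0.113.0/24", (3356, 15169), "10.0.1.1")
assert rib["ISP1"]["203.0.113.0/24"] is rib["ISP2"]["203.0.113.0/24"]
```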
Scaling BGP CPU Use
• Hundreds of routing sessions to clients
  – High CPU load
• Schedule and send routing updates in bundles (see the sketch below)
  – Reduces CPU load from 18% to 6% for 500 client sessions
[Figure: the same routing process and per-ISP tables, now fanning updates out to the clients' virtual routers]
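A sketch of the bundling idea: updates are queued as they arrive and flushed to all client sessions on a timer, so per-session work is paid once per bundle rather than once per update. The interval and session objects are assumptions, not the TP implementation.

```python
# Sketch of sending routing updates in bundles: queue updates as they
# arrive and flush them to every client session periodically, so the
# per-session processing cost is amortized over the whole bundle.
# Interval, update, and session objects are illustrative.
import threading

FLUSH_INTERVAL = 5.0            # seconds between bundles (assumed value)
pending = []                    # updates queued since the last flush
client_sessions = []            # objects exposing send_updates(batch)

def enqueue(update) -> None:
    """Cheap path taken for every incoming update: just queue it."""
    pending.append(update)

def flush_bundles() -> None:
    """Send the accumulated bundle to all client sessions, then re-arm."""
    global pending
    batch, pending = pending, []
    if batch:
        for session in client_sessions:
            session.send_updates(batch)
    timer = threading.Timer(FLUSH_INTERVAL, flush_bundles)
    timer.daemon = True
    timer.start()

# flush_bundles() is called once at startup to begin the periodic loop.
```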
Scaling Forwarding Memory for TP
• Connecting clients
  – Tunneling and VLANs
• Curbing memory usage (see the sketch below)
  – Separate virtual routing tables with a default route to the upstream
  – 50 MB/ISP → ~0.1 MB/ISP memory use in the forwarding table
[Figure: the Transit Portal's separate forwarding tables (Table 1, Table 2) and the virtual BGP routers of the interactive and bulk-transfer services, connected to ISP1 and ISP2]
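A sketch of why each virtual forwarding table stays tiny: it needs only the client's own prefixes plus a default route pointing at the chosen upstream, rather than a full Internet table. Prefixes and next-hop names are illustrative.

```python
# Sketch of "separate virtual routing tables with default to upstream":
# a per-client FIB holds just the client's prefixes and a default
# route, so it is tiny compared with a full Internet table.
# Prefixes and next-hop names are illustrative.
import ipaddress

# Per-client virtual forwarding table for a service that prefers ISP1.
client_fib = {
    ipaddress.ip_network("203.0.113.0/24"): "to-client",   # client's own prefix
    ipaddress.ip_network("0.0.0.0/0"): "to-ISP1",           # default to upstream
}

def lookup(dst: str) -> str:
    """Longest-prefix match over the (tiny) per-client table."""
    addr = ipaddress.ip_address(dst)
    matches = [net for net in client_fib if addr in net]
    best = max(matches, key=lambda net: net.prefixlen)
    return client_fib[best]

print(lookup("203.0.113.7"))   # -> "to-client"
print(lookup("8.8.8.8"))       # -> "to-ISP1"
```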