belllabs09 - Princeton University



Rethinking Routers in the Age of Virtualization
Jennifer Rexford
Princeton University
http://www.cs.princeton.edu/~jrex/virtual.html
Traditional View of a Router
• A big, physical device…
– Processors
– Multiple links
– Switching fabric
• … that directs Internet traffic
– Connects to other routers
– Computes routes
– Forwards packets
Times Are Changing
Backbone Links are Virtual
• Flexible underlying transport network
– Layer-3 links are multi-hop paths at layer 2
[Diagram: a layer-3 backbone link realized as a multi-hop layer-2 path across New York, Chicago, and Washington D.C.]
Routing Separate From Forwarding
• Separation of functionality
– Control plane: computes paths
– Forwarding plane: forwards packets
[Diagram: a control-plane processor sits above the data plane, which consists of line cards interconnected by a switching fabric]
Multiple Virtual Routers
• Multiple virtual routers on same physical one
– Virtual Private Networks (VPNs)
– Router consolidation for smaller footprint
[Diagram: multiple virtual routers, each with its own control plane and data plane, sharing one physical switching fabric]
Capitalizing on Virtualization
• Simplify network management
– Hide planned changes in the physical topology
• Improve router reliability
– Survive bugs in complex routing software
• Deploy new value-added services
– Customized protocols in virtual networks
• Enable new network business models
– Separate service providers from the infrastructure
What should the router “hypervisor” look like?
VROOM: Virtual Routers On the Move
With Yi Wang, Eric Keller, Brian Biskeborn, and Kobus van der Merwe
The Two Notions of “Router”
• IP-layer logical functionality, and physical equipment
Tight Coupling of Physical & Logical
• Root of many network-management challenges (and
“point solutions”)
VROOM: Breaking the Coupling
• Re-mapping logical node to another physical node
– VROOM enables this re-mapping of logical to physical through virtual router migration
Case 1: Planned Maintenance
• NO reconfiguration of VRs, NO reconvergence
[Diagram sequence: virtual router VR-1 migrates from physical node A to physical node B]
Case 2: Service Deployment/Evolution
• Move (logical) router to more powerful hardware
• VROOM guarantees seamless service to existing customers during the migration
Case 3: Power Savings
• Hundreds of millions of dollars per year in electricity bills
• Contract and expand the physical network according to the traffic volume
Virtual Router Migration: Challenges
1. Migrate an entire virtual router instance
– All control-plane processes & data-plane states
2. Minimize disruption
– Data plane: millions of packets/sec on a 10 Gbps link
– Control plane: less strict (with routing-message retransmissions)
3. Link migration
VROOM Architecture
• Data-plane hypervisor
• Dynamic interface binding
VROOM’s Migration Process
• Key idea: separate the migration of control
and data planes
1. Migrate the control plane
2. Clone the data plane
3. Migrate the links
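As a rough sketch, the three steps above could be orchestrated as follows (a toy Python model; all class and attribute names are hypothetical, not from the VROOM prototype):

```python
# Toy model of VROOM's three-phase migration (illustrative names only).

class PhysicalRouter:
    def __init__(self, name):
        self.name = name
        self.control_planes = set()   # virtual routers whose CP runs here
        self.data_planes = set()      # virtual routers whose DP runs here

class VirtualRouter:
    def __init__(self, name, links):
        self.name = name
        self.links = links            # names of attached links

def migrate(vr, src, dst):
    # 1. Migrate the control plane (router image + memory).
    src.control_planes.discard(vr.name)
    dst.control_planes.add(vr.name)
    # 2. Clone the data plane: bring up a new FIB on dst while the
    #    old FIB on src keeps forwarding ("double data planes").
    dst.data_planes.add(vr.name)
    # 3. Migrate the links one at a time, then retire the old data plane.
    moved = list(vr.links)
    src.data_planes.discard(vr.name)
    return moved

a, b = PhysicalRouter("A"), PhysicalRouter("B")
vr1 = VirtualRouter("VR-1", ["link-1", "link-2"])
a.control_planes.add(vr1.name)
a.data_planes.add(vr1.name)
moved = migrate(vr1, a, b)
print(vr1.name in b.control_planes and vr1.name in b.data_planes)  # True
```

The ordering matters: the data plane on the old router keeps forwarding until every link has been moved, which is what avoids disruption.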
Control-Plane Migration
• Leverage virtual server migration techniques
• Router image
– Binaries, configuration files, etc.
• Memory
– 1st stage: iterative pre-copy
– 2nd stage: stall-and-copy (when the control plane is “frozen”)
[Diagram: the control plane (CP) moves from physical router A to physical router B while the data plane (DP) remains on A]
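The two-stage memory copy can be modeled in a few lines (a simplified toy model of the pre-copy/stall-and-copy idea; the page contents and per-round dirty sets are made up for illustration):

```python
# Toy model of two-stage memory migration: iterative pre-copy while the
# control plane keeps running, then stall-and-copy for the remainder.

def migrate_memory(src_pages, dirty_per_round, max_rounds=5):
    dst_pages = {}
    to_copy = set(src_pages)                   # round 1: copy everything
    for round_dirty in dirty_per_round[:max_rounds]:
        for page in to_copy:
            dst_pages[page] = src_pages[page]  # copy while CP still runs
        to_copy = set(round_dirty)             # re-copy pages dirtied meanwhile
        if not to_copy:
            break
    # Stall-and-copy: freeze the control plane, copy remaining dirty pages.
    for page in to_copy:
        dst_pages[page] = src_pages[page]
    return dst_pages

src = {i: f"page-{i}" for i in range(8)}
dst = migrate_memory(src, dirty_per_round=[{1, 2}, {2}, set()])
print(dst == src)  # True
```

The pre-copy rounds shrink the set of pages that must be copied during the freeze, which is why the control-plane downtime stays short.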
Data-Plane Cloning
• Clone the data plane by repopulation
– Enable migration across different data planes
– Avoid copying duplicate information
[Diagram: the migrated CP on physical router B repopulates DP-new while DP-old on physical router A keeps forwarding]
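Repopulation can be sketched as re-installing each route from the control plane's RIB into the new data plane, rather than copying the old FIB's internal state (hypothetical names; this is what lets migration span heterogeneous data planes):

```python
# Sketch of data-plane cloning by repopulation: the migrated control
# plane re-derives the new FIB from its own RIB instead of copying the
# old data plane's internal state. Names are illustrative.

def repopulate(rib, install):
    """Install every RIB route into a fresh data plane."""
    for prefix, route in rib.items():
        install(prefix, route["next_hop"])

rib = {"12.0.0.0/8": {"next_hop": "IF 2"},
       "0.0.0.0/0":  {"next_hop": "IF 1"}}
new_fib = {}
repopulate(rib, lambda prefix, nh: new_fib.__setitem__(prefix, nh))
print(new_fib["12.0.0.0/8"])  # IF 2
```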
Remote Control Plane
• Data-plane cloning takes time
– Installing 250k routes takes over 20 seconds
• Control & old data planes need to be kept “online”
• Solution: redirect routing messages through tunnels
[Diagram: routing messages are tunneled between DP-old on physical router A and the CP on physical router B while DP-new is populated]
Double Data Planes
• At the end of data-plane cloning, both data planes are ready to forward traffic
[Diagram: the CP drives both DP-old and DP-new simultaneously]
Asynchronous Link Migration
• With the double data planes, links can be migrated independently
[Diagram: neighbor A's link already attaches to DP-new while neighbor B's link still attaches to DP-old]
Prototype Implementation
• Virtualized operating system
– OpenVZ, supports VM migration
• Routing protocols
– Quagga software suite
• Packet forwarding
– NetFPGA hardware
• Router hypervisor
– Our extensions for repopulating the data plane, remote control plane, double data planes, …
Experimental Results
• Data plane: NetFPGA
– No packet loss or extra delay
• Control plane: Quagga routing software
– All routing-protocol adjacencies stay up
– Core router migration (intradomain only)
• Inject an unplanned link failure at another router
• At most one retransmission of an OSPF message
– Edge router migration (intra and interdomain)
• Control-plane downtime: 3.56 seconds
• Within reasonable keep-alive timer intervals
Conclusions on VROOM
• Useful network-management primitive
– Break the tight coupling between physical and logical
– Simplify management, enable new applications
• Evaluation of prototype
– No disruption in packet forwarding
– No noticeable disruption in routing protocols
• Ongoing work
– Migration scheduling as an optimization problem
– Extensions to hypervisor for other applications
VERB: Virtually Eliminating Router Bugs
With Eric Keller, Minlan Yu, and Matt Caesar
Router Bugs Are Important
• Routing software is complicated
– Leads to programming errors (aka “bugs”)
– Recent string of high-profile outages
• Bugs different from traditional failures
– Byzantine failures, don’t simply crash the router
– Violate protocol, and cause cascading outages
• The problem is getting worse
– Software is getting more complicated
– Other outages becoming less common
– Vendors allowing third-party software
Exploit Software and Data Diversity
• Many sources of diversity
– Diverse code (Quagga, XORP, BIRD)
– Diverse protocols (OSPF and IS-IS)
– Diverse environment (timing, ordering, memory)
• Reasonable overhead
– Extra processor blade for hardware reliability
– Multi-core processors, separate route servers, …
• Special properties of routing software
– Clear interfaces to data plane and other routers
– Limited dependence on past history
Handling Bugs at Run Time
• Diverse replication
– Run multiple control planes in parallel
– Vote on routing messages and forwarding table
[Diagram: three protocol-daemon replicas, each with its own RIB, run above a replica manager in the hypervisor; a FIB voter reconciles forwarding-table writes and an update voter reconciles outgoing routing messages before they reach the forwarding table (FIB) and interfaces IF 1 / IF 2]
Replicating Incoming Routing Messages
[Diagram: an update for 12.0.0.0/8 arriving on IF 1 is duplicated by the replica manager and delivered to all three protocol daemons]
No need for protocol parsing – operates at socket level
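A minimal sketch of that socket-level replication, assuming the replica manager sees each routing message only as opaque bytes (in-memory queues stand in for the replicas' sockets):

```python
# Sketch of socket-level replication: the replica manager never parses
# the routing protocol; it writes the same opaque byte string to every
# replica's socket. Queues model the sockets for illustration.

import queue

replica_sockets = [queue.Queue() for _ in range(3)]

def replicate(raw_message: bytes):
    for sock in replica_sockets:
        sock.put(raw_message)   # same opaque bytes to every replica

msg = b"\x02UPDATE 12.0.0.0/8"
replicate(msg)
received = [sock.get() for sock in replica_sockets]
print(received == [msg] * 3)  # True
```

Because the manager never interprets the bytes, it stays small and protocol-agnostic, which keeps the trusted code base tiny.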
Voting: Updates to Forwarding Table
[Diagram: the FIB voter compares the forwarding-table updates proposed by the replicas and installs the agreed entry, 12.0.0.0/8 → IF 2]
Transparent by intercepting calls to “Netlink”
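The vote itself can be sketched as a majority comparison over opaque values, matching the no-parsing design (an illustration, not VERB's actual code):

```python
# Sketch of FIB-update voting: each replica proposes a next-hop for a
# prefix; the voter installs the majority answer, masking a buggy replica.

from collections import Counter

def vote(proposals):
    """Return the strict-majority proposal, or None if there is none."""
    value, count = Counter(proposals).most_common(1)[0]
    return value if count > len(proposals) // 2 else None

# Two replicas agree; a buggy third disagrees. The majority wins.
print(vote(["IF 2", "IF 2", "IF 1"]))  # IF 2
```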
Voting: Control-Plane Messages
[Diagram: the update voter compares the outgoing routing messages generated by the replicas before forwarding the agreed message to neighbors]
Transparent by intercepting socket system calls
Simple Voting and Recovery
• Tolerate transient periods of disagreement
– During routing-protocol convergence (tens of sec)
• Several different voting mechanisms
– Master-slave vs. wait-for-consensus
• Small, trusted software component
– No parsing, treats data as opaque strings
– Just 514 lines of code in our implementation
• Recovery
– Kill faulty instance, and invoke a new one
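The two voting mechanisms named above could be contrasted roughly as follows (illustrative only; the real trade-off is latency versus how directly a buggy replica is masked):

```python
# Illustrative contrast of the two voting mechanisms: master-slave
# returns the designated master's answer immediately (slaves are hot
# standbys for recovery), while wait-for-consensus waits for a majority.

from collections import Counter

def master_slave(answers, master=0):
    # Fast path: trust the master replica's output.
    return answers[master]

def wait_for_consensus(answers):
    value, count = Counter(answers).most_common(1)[0]
    return value if count > len(answers) // 2 else None

answers = ["IF 2", "IF 1", "IF 2"]    # the middle replica hit a bug
print(master_slave(answers))          # IF 2 (fast, but trusts replica 0)
print(wait_for_consensus(answers))    # IF 2 (slower, masks the buggy replica)
```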
Conclusion on Bug-Tolerant Router
• Seriousness of routing software bugs
– Cause serious outages, misbehavior, vulnerability
– Violate protocol semantics, so not handled by
traditional failure detection and recovery
• Software and data diversity
– Effective, and has reasonable overhead
• Design and prototype of bug-tolerant router
– Works with Quagga, XORP, and BIRD software
– Low overhead, and small trusted code base
Conclusions for the Talk
• Router virtualization is exciting
– Enables wide variety of new networking techniques
– … for network management & service deployment
– … and even rethinking the Internet architecture
• Fascinating space of open questions
– Other possible applications of router virtualization?
– What is the right interface to router hardware?
– What is the right programming environment for
customized protocols on virtual networks?
http://www.cs.princeton.edu/~jrex/virtual.html