iap05 - Princeton University

A Routing Control Platform for Managing IP Networks
Jennifer Rexford
Princeton University
http://www.cs.princeton.edu/~jrex
Background and Interests
• Professional background
– Joined Princeton in February 2005
– After 8.5 years at AT&T Labs—Research
• Work with AT&T’s backbone & large enterprises
• Tools in daily use in AT&T’s backbone
• Research interests: data networking
– Networks easier to design and operate
– IP routing and network measurement
– Division between routers and management
Today: Inside a Single Network
Management Plane
• Figure out what is happening in network
• Decide how to change it
Control Plane
• Multiple routing processes on each router
• Configuration on each router
• Many control knobs: link weights, access lists, policy
Data Plane
• Packet handling by routers
• Forwarding, filtering, queuing
[Figure: three stacked planes. Management plane labels: shell scripts, planning tools, databases, configs, SNMP, netflow, modems, traffic engineering. Control plane labels: OSPF and BGP processes on each router, link metrics, routing policies. Data plane labels: FIB on each router, packet filters.]
How Did We Get in This Mess?
• Initial IP architecture
– Bundled packet handling and control
– Functionality distributed across routers
– Didn’t anticipate need for management
• Rapid growth in features
– Internet’s sudden popularity and growth
– Demands for new features
– Built as incremental extensions
• Challenges of distributed algorithms
– Some tasks are hard in a distributed fashion
Solution: Wafer-Thin Control Plane
• Decision plane: outside the routers
– All decision logic and state
– Network-wide view and objectives
– Direct control over the data plane
• Discovery plane: in the routers
– Monitors the topology
– Measures the traffic
• Data plane: in the routers
– Queues, filters, and forwards data packets
– Accepts instruction from decision plane
Achieving the New Architecture Today
• Deployability: getting from here to there
– Compatible with existing routers
– Incentives for deployment
• Speed: running fast enough
– Respond quickly to network events
• Reliability: avoiding single point of failure
– Replicate to tolerate failure
– Replicas must behave consistently
Can we do it? The short answer is… yes!
Deployability: Backwards Compatibility using BGP
• Border Gateway Protocol (BGP)
– Protocol: messages sent between routers
– Decision logic: route-selection process
– Policy: configurable rules
• The key point is
– Complex decision logic and policies
– Yet simple protocol and message format
Idea: Use BGP messages to tell the routers what to do
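A minimal sketch of the split described above: the decision logic is a ranking over candidate routes, while the message that carries the result stays simple. The `Route` fields and the `rank` and `select_best` names are illustrative assumptions, not taken from the talk, and the ranking is a simplified subset of the BGP decision process (real BGP also considers origin type, prefers eBGP over iBGP routes, and compares MED only among routes from the same neighboring AS).

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Route:
    """One candidate route for a prefix (hypothetical, simplified fields)."""
    prefix: str
    next_hop: str
    local_pref: int
    as_path: Tuple[int, ...]
    med: int
    igp_cost: int


def rank(route: Route) -> tuple:
    """Sort key: a smaller tuple means a more preferred route."""
    return (
        -route.local_pref,    # higher local preference wins
        len(route.as_path),   # then the shorter AS path
        route.med,            # then the lower MED (simplified comparison)
        route.igp_cost,       # then the closer egress point
        route.next_hop,       # deterministic final tie-break
    )


def select_best(candidates: List[Route]) -> Route:
    """Pick the most preferred route; raises ValueError if the list is empty."""
    return min(candidates, key=rank)
```

Under this sketch, whatever `select_best` returns is the only thing that has to be encoded in a BGP update to the router; the policy and ranking never leave the decision logic.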
Deployability: Inside a Single AS
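Before: conventional use of BGP in backbone network
[Figure: backbone routers, with sessions labeled eBGP and iBGP]
After: RCP learns routes and sends answers to routers
[Figure: the same backbone with an RCP added; labels: eBGP, RCP, iBGP]
Only one AS has to change its architecture!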
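As a rough illustration of "learns routes and sends answers", the sketch below computes one answer per router: among the egress routers that have an equally good route to a destination, each router is assigned the one closest in IGP cost. The topology format, the function names, and the closest-egress rule are assumptions made for this example; the talk does not specify how the RCP computes its answers.

```python
import heapq
from typing import Dict, Iterable


def shortest_costs(topology: Dict[str, Dict[str, int]], source: str) -> Dict[str, int]:
    """Dijkstra over a weighted adjacency map; returns the cost to each reachable node."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if cost > dist.get(node, float("inf")):
            continue  # stale heap entry
        for neighbor, weight in topology.get(node, {}).items():
            new_cost = cost + weight
            if new_cost < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_cost
                heapq.heappush(heap, (new_cost, neighbor))
    return dist


def assign_egress(topology: Dict[str, Dict[str, int]],
                  egress_routers: Iterable[str]) -> Dict[str, str]:
    """For every router, choose the closest egress router (ties broken by name)."""
    egress = list(egress_routers)
    answers = {}
    for router in topology:
        dist = shortest_costs(topology, router)
        reachable = [e for e in egress if e in dist]
        if reachable:
            answers[router] = min(reachable, key=lambda e: (dist[e], e))
    return answers
```

For a chain A, B, C with link costs 1 (A to B) and 3 (B to C) and egress routers A and C, this assigns B to egress A, and each egress router to itself.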
Deployability: Across Multiple ASes
• Represents the AS to others
– Has complete view of all candidate routes
– Computes answers for the AS’s routers
• Communicates with other ASes
– Using BGP or a brand new protocol
– … while using BGP to talk to the routers
[Figure: AS 1, AS 2, and AS 3, each with its own RCP; labels: Inter-AS Protocol between RCPs, iBGP between an RCP and its routers, physical peering between ASes]
Routing Control Platform (RCP)
[Figure: RCP components. The Route Control Server (RCS) exchanges "Options" and "Answers" with the BGP Engine, which exchanges BGP updates with the routers. The OSPF Viewer receives OSPF link-state advertisements and supplies the network topology.]
Scalability: Standard Computing Platform
• Prototype on a high-end PC
– 3.2 GHz Pentium-4 with 8 GB of RAM
– Running the Linux 2.6.5 kernel
• Workload from the AT&T backbone
– Replay the BGP and OSPF messages
• Good RCP performance
– Memory usage: less than 2 GB
– Speed, BGP changes: less than 40 msec
– Speed, topology changes: 0.1 to 0.8 seconds
Short answer: the system can keep up
Reliability: Replication and Consistency
• Replication: avoid single point of failure
– Multiple RCPs in a network
– Connected at different places
• Consistency: no explicit coordination
– Replica has full view of each partition
– Replicas perform the same algorithm on the same data, and get the same answer
[Figure: two partitions, A and B, with replicas RCP A and RCP B; one label reads "A, B"]
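The "same algorithm on the same data" argument can be sketched as a pure function: if the computation ignores arrival order and breaks ties deterministically, two replicas that hold the same set of routes produce identical answers without talking to each other. The tuple layout `(local_pref, as_path_len, egress)` and the name `compute_answers` are hypothetical and chosen only for this illustration.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical route record: (local_pref, as_path_len, egress_router)
RouteTuple = Tuple[int, int, str]


def compute_answers(routes_by_prefix: Dict[str, Iterable[RouteTuple]]) -> Dict[str, RouteTuple]:
    """Deterministic selection: the result depends only on the set of routes,
    never on the order in which a replica happened to learn them."""
    answers = {}
    for prefix in sorted(routes_by_prefix):
        candidates = list(routes_by_prefix[prefix])
        if not candidates:
            continue
        # Higher local_pref, then shorter AS path, then egress name as tie-break.
        answers[prefix] = min(candidates, key=lambda r: (-r[0], r[1], r[2]))
    return answers


# Two replicas that learned the same routes in a different order.
replica_a = {"10.0.0.0/8": [(100, 2, "E1"), (100, 2, "E2"), (90, 1, "E3")]}
replica_b = {"10.0.0.0/8": [(90, 1, "E3"), (100, 2, "E2"), (100, 2, "E1")]}
same_answer = compute_answers(replica_a) == compute_answers(replica_b)
```

The explicit final tie-break matters: without it, two equally ranked routes could be chosen differently depending on arrival order, and the replicas would diverge.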
Example Applications
• Customer-driven route selection
– Customized load-balancing policies
– Geographic rules for route selection
• Blocking denial-of-service attacks
– “Blackhole” routes that drop traffic
– Only for routers carrying attack traffic
• Hitless maintenance
– Move traffic away from certain routers
– Before the operators bring down the routers
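Two of these applications reduce to small edits of the per-router answers, sketched below under stated assumptions. The `BLACKHOLE` marker, the `answers` layout (router name mapped to a prefix-to-next-hop table), and both function names are hypothetical and not part of the RCP design as presented.

```python
from typing import Dict, Iterable, List

BLACKHOLE = "blackhole"  # hypothetical marker for a next hop that drops traffic


def blackhole_attack(answers: Dict[str, Dict[str, str]],
                     victim_prefix: str,
                     attack_routers: Iterable[str]) -> Dict[str, Dict[str, str]]:
    """Return a copy of the answers in which only the routers carrying attack
    traffic send the victim prefix to the blackhole next hop."""
    updated = {router: dict(table) for router, table in answers.items()}
    for router in attack_routers:
        if router in updated:
            updated[router][victim_prefix] = BLACKHOLE
    return updated


def drain_router(candidates: List[str], router_under_maintenance: str) -> List[str]:
    """Hitless maintenance: drop a router from the egress candidates so that
    traffic moves away before operators bring the router down."""
    return [egress for egress in candidates if egress != router_under_maintenance]
```

Because both functions return new values instead of mutating their inputs, the previous answers remain available to restore once the attack ends or the maintenance window closes.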
Conclusion
• Network operations is too hard
– IP was not designed for management
– Complex, distributed operation of routers
• Must reduce complexity
– Network-wide views and objectives
– Direct control over the data plane
• New architecture is feasible
– RCP is deployable, scalable, and reliable
– RCP solves real, important problems
New opportunity to impact the future of IP networks.