google06 - Princeton University

A Routing Control Platform
for Managing IP Networks
Jennifer Rexford
Princeton University
http://www.cs.princeton.edu/~jrex
Outline
• Revisiting the control plane
– Complexity of today’s control plane
– Principles for a redesign
• Routing Control Platform
– Deployability
– Scalability
– Reliability
• Example applications
– DDoS blackholing, planned maintenance, and customized egress selection
• Conclusions and future work
Internet Architecture
• Smart hosts, and a dumb network
• Network delivers packets to hosts
• Services implemented on hosts
• Keep most state at the edges
[Figure labels: Edge, IP, Network, IP, Edge]
But, how should we partition function vertically?
Today: Inside a Single Network
Management Plane
• Figure out what is happening in network
• Decide how to change it
Control Plane
• Multiple routing processes on each router
• Each router with different configuration program
• Many control knobs: link weights, access lists, policy
Data Plane
• Packet handling by routers
• Forwarding, filtering, queuing
[Figure labels: Shell scripts, Planning tools, Databases, Configs, SNMP, netflow, modems, Traffic Engin., OSPF, BGP, Link metrics, Routing policies, FIB, Packet filters]
No State in the Network? Yeah, Right…
• Dynamic state
– Routing tables
– Forwarding tables
• Configuration state
– Access control lists
– Link weights
– Routing policies
• Hard-wired state
– Default values of timers
– Path-computation algorithms
Lots of state, updated in a distributed, uncoordinated way
How Did We Get in This Mess?
• Initial IP architecture
– Bundled packet handling and control logic
– Distributed the functions across routers
– Didn’t fully anticipate the need for management
• Rapid growth in features
– Sudden popularity and growth of the Internet
– Increasing demands for new functionality
– Incremental extensions to protocols & routers
• Challenges of distributed algorithms
– Some tasks are hard to do in a distributed fashion
What Does the Network Operator Want?
• Network-wide views
– Network topology (e.g., routers, links)
– Mapping to lower-level equipment
– Traffic matrix
• Network-level objectives
– Load balancing
– Survivability
– Reachability
– Security
• Direct control
– Explicit configuration of data-plane mechanisms
What Architecture Would Achieve This?
• Management plane → Decision plane
– Responsible for all decision logic and state
– Operates on network-wide view and objectives
– Directly controls the behavior of the data plane
• Control plane → Discovery plane
– Responsible for providing the network-wide view
– Topology discovery, traffic measurement, etc.
• Data plane
– Queues, filters, and forwards data packets
– Accepts direct instruction from the decision plane
Advantages of the New Approach
• Lower management complexity
– Complete, network-wide view
– Direct control over the routers
– Single specification of policies and objectives
• Simpler routers
– Much less control-plane software
– Much less configuration state
• Enabling innovation
– New algorithms for selecting paths within an AS
– New approaches to inter-AS routing
Example: Improving ISP Routing
[Figure: network of border routers and internal routers, with links labeled 6, 2, 3, 4, 3, 9, 2, 1]
1. Provide internal reachability (IGP)
2. Learn routes to external destinations (eBGP)
3. Distribute externally learned routes internally (iBGP)
4. Select closest egress (IGP)
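The last step, selecting the closest egress, can be sketched as a shortest-path computation over IGP link weights. This is a minimal illustration only: the topology, the router names, and the function names `igp_distances` and `closest_egress` are assumptions and do not come from the slides.

```python
import heapq

def igp_distances(graph, source):
    """Dijkstra over IGP link weights; graph maps router -> {neighbor: weight}."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry, a shorter path was already found
        for neighbor, weight in graph.get(node, {}).items():
            nd = d + weight
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd
                heapq.heappush(heap, (nd, neighbor))
    return dist

def closest_egress(graph, router, egresses):
    """Return the reachable egress with the smallest IGP distance (ties by name)."""
    dist = igp_distances(graph, router)
    reachable = [e for e in egresses if e in dist]
    if not reachable:
        return None
    return min(reachable, key=lambda e: (dist[e], e))

# Hypothetical topology: border routers A and B, internal routers r1 and r2.
GRAPH = {
    "A": {"r1": 2},
    "r1": {"A": 2, "r2": 3},
    "r2": {"r1": 3, "B": 1},
    "B": {"r2": 1},
}
```

With this toy graph, `closest_egress(GRAPH, "r1", ["A", "B"])` picks `A` and `closest_egress(GRAPH, "r2", ["A", "B"])` picks `B`, so each router hands traffic to its nearest border router.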
Is the New Architecture Feasible?
• Deployability: any way from here to there?
– Must be compatible with today’s routers
– Must provide incentives for deployment
• Speed: can it run fast enough?
– Must respond quickly to network events
– Needs to be as fast as a router
• Reliability: avoid single point of failure?
– Must be replicated to tolerate failure
– Replicas must behave consistently
Deployability: Don’t Change the Message Format
• Border Gateway Protocol
– Interdomain routing protocol for the Internet
– Widely implemented and used in networks
• Three main aspects of BGP
– Protocol: standard messages sent between routers
– Decision logic: multi-step route selection process
– Policy: configuration options that influence routing
• The key point is
– Although decision logic and policies are complex…
– … the protocol and message format are simple
Idea: use BGP messages to tell each router how to forward
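The multi-step decision logic mentioned above can be sketched as a lexicographic comparison of route attributes. This is a deliberately simplified ordering assumed for illustration: it omits the origin-type and eBGP-over-iBGP steps, and it compares MED across all routes, which real BGP does only for routes from the same neighboring AS. The `Route` fields and the `best_route` name are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Route:
    prefix: str
    egress: str        # router where the route was learned (assumed field)
    local_pref: int    # higher is preferred
    as_path_len: int   # shorter is preferred
    med: int           # lower is preferred
    igp_cost: int      # lower is preferred (distance to the egress)

def best_route(routes):
    """Pick one route by comparing attributes in order; egress name breaks ties."""
    return min(
        routes,
        key=lambda r: (-r.local_pref, r.as_path_len, r.med, r.igp_cost, r.egress),
    )
```

The point of the slide is that only the outcome of such a comparison has to travel in a BGP message, so a server that computes the winner itself could announce it to a router over an ordinary iBGP session.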
Phase 1: Flexible Path Selection in One AS
Before: conventional use of BGP in backbone network
[Figure labels: eBGP, iBGP]
After: RCP learns routes and sends answers to routers
[Figure labels: eBGP, RCP, iBGP]
Phase 2: AS-Wide Path Selection and Export
Before: RCP gets “best” iBGP routes (and IGP feed)
[Figure labels: eBGP, RCP, iBGP]
After: RCP gets all eBGP routes from neighbors
[Figure labels: eBGP, RCP, iBGP]
Phase 3: Direct Communication Between RCPs
Before: RCP gets all eBGP routes from neighbors
[Figure labels: eBGP, RCP, iBGP]
After: ASes exchange routes via RCP
[Figure labels: RCP (three), Inter-AS Protocol, iBGP, AS 1, AS 2, AS 3, physical peering]
RCP Architecture
[Figure: the Routing Control Platform (RCP), with components Route Control Server (RCS), IGP Viewer, and BGP Engine. Labels: available BGP routes, selected BGP routes, path cost matrix, BGP updates, IGP link-state advertisements]
Challenges and Contributions
• Reliability
– Problem: single point of failure
– Contribution: simple replication of RCP components
• Consistency
– Problem: inconsistent decisions by replicas
– Contribution: consistency without inter-replica protocol
• Scalability
– Problem: storing all routes increases CPU/memory usage
– Contribution: can support a large ISP on one computer
→ Building this system is feasible
Consistency: One RCP, One Partition
[Figure labels: RCP 1; “Use egress A”; A; “Use egress B”; B]
• Solution: Assign all routers along the shortest IGP path the same exit router
– Ensures forwarding loops don’t arise
Consistency: One RCP, Many Partitions
[Figure labels: RCP 1, Partition 1, Partition 2]
• Solution: Only use state from router’s partition in assigning its routes
– Ensures next hop is reachable
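The partition rule can be sketched as a reachability filter: before assigning a route to a router, keep only egresses that lie in the same connected component of the IGP topology. The graph representation and the names `partition_of` and `usable_egresses` are illustrative assumptions, not taken from the slides.

```python
from collections import deque

def partition_of(graph, router):
    """Set of routers reachable from `router` over live IGP links (BFS)."""
    seen = {router}
    queue = deque([router])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen

def usable_egresses(graph, router, egresses):
    """Egresses in the same partition as `router`, so the next hop is reachable."""
    part = partition_of(graph, router)
    return sorted(e for e in egresses if e in part)

# Hypothetical split network: {r1, A} and {r2, B} cannot reach each other.
SPLIT = {"r1": ["A"], "A": ["r1"], "r2": ["B"], "B": ["r2"]}
```

In the `SPLIT` example, `r1` would only ever be assigned routes through `A`, even if the route through `B` looks better on BGP attributes.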
Consistency: Many RCPs, Many Partitions
[Figure labels: RCP 1, Partition 1, RCP 2, Partition 2, Partition 3]
• Solution: RCPs receive same IGP/BGP state from each partition they can reach
– IGP provides complete visibility and connectivity
– RCS only acts on partition if it has complete state for it
No consistency protocol needed to guarantee consistency in steady state
RCS Scalability
• Eliminate redundancy
– Store only a single copy of each BGP route
• Accelerate lookup
– Quickly find routers whose routes changed
• Avoid recomputation
– Compute routes once for groups of routers
– Don’t recompute if relative ranking of egress routers unchanged
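The two "avoid recomputation" ideas can be sketched as follows, assuming that each router's choice among otherwise equal routes depends only on how it ranks the egress routers by IGP path cost. The function names and the cost-dictionary layout are hypothetical.

```python
from collections import defaultdict

def egress_ranking(costs):
    """Order egress routers by IGP path cost, breaking ties by name."""
    return tuple(sorted(costs, key=lambda egress: (costs[egress], egress)))

def needs_recompute(old_costs, new_costs):
    """An IGP change matters only if it reorders the egress routers."""
    return egress_ranking(old_costs) != egress_ranking(new_costs)

def group_by_ranking(cost_matrix):
    """Group routers that rank the egresses identically; compute once per group."""
    groups = defaultdict(list)
    for router in sorted(cost_matrix):
        groups[egress_ranking(cost_matrix[router])].append(router)
    return dict(groups)
```

For example, a cost change from `{"eg1": 5, "eg2": 9}` to `{"eg1": 7, "eg2": 9}` leaves the ranking intact, so under this assumption no routes would need to be recomputed for that router.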
Scalability: RCS Data Structures
Global route table (stores copies of routes)
RIB-Out shadow tables (points to currently used route for each router)
Egress lists (points to routes that use each egress)
[Figure labels: rtr1, rtr2, rtr3; prefixes; BGP routes; eg1, eg2, eg3; inputs: BGP updates (from egress routers), IGP updates; output: BGP updates (to routers)]
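The three structures on this slide can be sketched with plain dictionaries. This is an illustrative layout under stated assumptions, not the actual RCS implementation: the class name `RouteStore`, its method names, and the extra reverse index `users` (added here so affected routers can be found without scanning every table) are all hypothetical.

```python
from collections import defaultdict

class RouteStore:
    def __init__(self):
        # Global route table: one stored copy per (prefix, egress).
        self.routes = {}
        # RIB-Out shadow tables: router -> {prefix: key of the route in use}.
        self.rib_out = defaultdict(dict)
        # Egress lists: egress -> keys of the routes learned via that egress.
        self.egress_list = defaultdict(set)
        # Assumed reverse index: route key -> routers currently using it.
        self.users = defaultdict(set)

    def learn(self, prefix, egress, attrs):
        """Record a BGP route announced by an egress router."""
        key = (prefix, egress)
        self.routes[key] = attrs
        self.egress_list[egress].add(key)
        return key

    def assign(self, router, key):
        """Point the router's shadow table at a stored route (no copy made)."""
        prefix, _ = key
        old = self.rib_out[router].get(prefix)
        if old is not None:
            self.users[old].discard(router)
        self.rib_out[router][prefix] = key
        self.users[key].add(router)

    def routers_using_egress(self, egress):
        """Routers whose current routes would change if this egress changed."""
        affected = set()
        for key in self.egress_list.get(egress, ()):
            affected |= self.users.get(key, set())
        return sorted(affected)
```

Because the shadow tables hold keys rather than copies, memory grows with the number of distinct routes, not with routes times routers.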
Scalability: Standard Computing Platform
• Implementation platform
– 3.2 GHz Pentium-4
– 8 GB memory
– Linux 2.6.5 kernel
• Workload
– Routing/topology changes in AT&T’s network
• RCP performance
– Memory usage: less than 2GB
– Speed, BGP changes: less than 40 msec
– Speed, topology changes: 0.1-0.8 seconds
• System is able to keep up…
Application: DDoS Blackholing
• Blackholing of denial-of-service attacks
– Preconfigure a “null” route on each router
– Identify address of victim (from DoS system)
– RCP assigns a null route for the destination
[Figure: RCP sends “Use null route for 1.2.3.4/32” over iBGP; victim 1.2.3.4; attack (detected by traffic analysis)]
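The blackholing steps can be sketched as generating one route assignment per router for the victim's host prefix. The messages below are plain dictionaries rather than real BGP updates, and the `NULL_NEXT_HOP` address and the `blackhole_updates` name are hypothetical placeholders for whatever null route is preconfigured on the routers.

```python
import ipaddress

# Hypothetical next hop that every router is preconfigured to discard.
NULL_NEXT_HOP = "192.0.2.1"

def blackhole_updates(victim, routers, null_next_hop=NULL_NEXT_HOP):
    """Build one 'use the null route' assignment per router for the victim."""
    # A bare address parses as a host prefix (/32 for IPv4).
    prefix = str(ipaddress.ip_network(victim))
    return [
        {"router": router, "prefix": prefix, "next_hop": null_next_hop}
        for router in routers
    ]
```

Calling `blackhole_updates("1.2.3.4", ["rtr1", "rtr2"])` yields two assignments for `1.2.3.4/32`, matching the message shown on the slide.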
Application: Maintenance Dry-Out
• Dry-out of traffic before maintenance
– Plan to take a router temporarily out of service
– RCP assigns routes via new egress in advance
[Figure: RCP sends “Use route via s for d” over iBGP; traffic to d exits via r before and via s after; router r about to undergo maintenance]
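Dry-out can be sketched as re-running egress selection with the router under maintenance excluded. The sketch assumes each router already has a preference-ordered list of egresses for the destination; the name `dry_out` and the input layout are hypothetical.

```python
def dry_out(ranked_egresses, draining):
    """Pick each router's best egress that is not being drained.

    ranked_egresses: router -> list of egresses for one destination, best first.
    draining: set of routers scheduled for maintenance.
    """
    assignment = {}
    for router, ranking in ranked_egresses.items():
        candidates = [egress for egress in ranking if egress not in draining]
        assignment[router] = candidates[0] if candidates else None
    return assignment
```

With rankings `{"x": ["r", "s"], "y": ["s", "r"]}` and `r` draining, both routers end up using `s` before `r` is taken out of service.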
Application: Customized Egress Selection
• Customer-controlled selection of egress points
– Customer with two data centers and many sites
– Customer wants to control the load balancing
– RCP customization, not simply closest egress
[Figure: RCP sends “Use route via s for d” and “Use route via r for d” over iBGP; labels: Site #1, Site #2, s, r, d]
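Customized egress selection can be sketched as a customer-supplied override table consulted before the default closest-egress choice. The table layout, the site names, and the function name `assign_egress` are assumptions made for illustration.

```python
def assign_egress(sites, prefix, customer_policy, closest):
    """Choose an egress per site for one destination prefix.

    customer_policy: (site, prefix) -> egress chosen by the customer.
    closest: site -> default (closest) egress used when no override exists.
    """
    return {
        site: customer_policy.get((site, prefix), closest[site])
        for site in sites
    }
```

A policy of `{("site1", "d"): "s", ("site2", "d"): "r"}` reproduces the two messages on the slide, while any site without an entry keeps its closest egress.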
Conclusion
• Managing IP networks is too hard
– IP architecture not designed for management
– Complex, distributed operation of routers
• Reducing complexity is the key
– Network-wide views/objectives and direct control
– Removing control logic and state from the routers
• New architecture is feasible
– RCP is deployable, scalable, and reliable
– RCP solves important operations problems
Future Work
• Optimization
– Real-time adaptation and offline planning
– Designing the boundary to support optimization
• Security
– Securing communication between routers & RCP
– Identifying unstable and suspicious BGP routes
– Incrementally deploying a more secure protocol
• Policy
– High-level specification of routing policies
– Quantifying reductions in configuration complexity