microsoft05 - Princeton University

Evolving Toward a Self-Managing Network
Jennifer Rexford
Princeton University
http://www.cs.princeton.edu/~jrex
Why is Network Management So Darn Hard?
• Oodles and oodles of complex features
– Many protocols
– Many mechanisms
– Lots of tunable parameters
• Little guidance for network administrators
– Guidelines for selecting and composing features?
– Effective models/tools for setting parameters?
• Managing boxes, rather than networks
– Routers, switches, firewalls, IDSes, servers, etc.
– Low-level, box-specific configuration languages
The Enemy is Complexity
• Goal: raising the level of abstraction
– Network-level design and configuration
– Composition of protocols and mechanisms
• Idea #1: add abstraction on top
– Keep the same boxes and protocols
– Compile high-level spec into box configuration
– But, today’s systems have a lot of complexity
• Idea #2: design for manageability
– Identify the network-level abstractions we want
– Design boxes and protocols that support them
– But, can we ever get from here to there?
Design for manageability and incremental deployability
Example: Border Gateway Protocol
• ASes exchange reachability information
– IP prefix: block of destination IP addresses
– AS path: sequence of ASes along the path
• Configurable routing policies
– Path selection (which route to use?)
– Path export (who to tell about the route?)
[Figure: ASes 88, 1, and 7018. Route announcements: “12.34.158.0/24: path (88)” and “12.34.158.0/24: path (7018,1,88)”. Data traffic flows toward 12.34.158.5.]
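The way an AS path grows as an announcement travels can be sketched in a few lines. This is a minimal illustration, not router code: the `Route` class and the `export` and `has_loop` helpers are hypothetical names, and the chain 88, 1, 7018 is taken from the figure labels above.

```python
import ipaddress
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Route:
    prefix: str                # block of destination IP addresses
    as_path: Tuple[int, ...]   # ASes along the path, nearest AS first


def export(route: Route, local_as: int) -> Route:
    """Announcement sent to a neighbor: same prefix, own AS number prepended."""
    return Route(route.prefix, (local_as,) + route.as_path)


def has_loop(route: Route, local_as: int) -> bool:
    """True if the path already contains local_as, so accepting it would loop."""
    return local_as in route.as_path


# AS 88 originates the prefix; the announcement then passes through AS 1 and AS 7018.
origin = export(Route("12.34.158.0/24", ()), 88)
via_1 = export(origin, 1)
via_7018 = export(via_1, 7018)

# The destination address from the figure falls inside the announced prefix.
covered = ipaddress.ip_address("12.34.158.5") in ipaddress.ip_network(via_7018.prefix)
```

Data traffic follows the AS path in the opposite direction to the announcements, which is why the path doubles as a loop check.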
Some Things I Hate About BGP…
• Routers in an AS have different views
– Effect: protocol oscillation and forwarding loops
– Point fix: test sufficient conditions for problem
• Path selection and export distributed across routers
– Effect: routers do not have enough information
– Point fix: complex “tagging” of BGP routes
• Policy has only an indirect effect on traffic
– Effect: hard to know what policy changes to make
– Point fix: “what if” tools for traffic engineering
• BGP route selection depends on the IGP
– Effect: disruptions from small intradomain changes
– Point fix: configure the IGP to limit the likelihood
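The last complaint, that BGP route selection depends on the IGP, can be illustrated with a small sketch. It assumes a router that breaks ties among otherwise equal BGP routes by choosing the egress with the lowest IGP distance; the router names and distances are made up for illustration.

```python
from typing import Dict


def closest_egress(igp_distance: Dict[str, int]) -> str:
    """Pick the egress router with the smallest IGP distance (ties broken by name)."""
    return min(igp_distance, key=lambda egress: (igp_distance[egress], egress))


# Hypothetical IGP distances from one ingress router to two egress routers
# that both offer an equally good BGP route.
before = {"egress_east": 10, "egress_west": 12}
after = {"egress_east": 13, "egress_west": 12}   # one link weight rises by 3

choice_before = closest_egress(before)
choice_after = closest_egress(after)
```

A change of three units inside the AS is enough to move this router's traffic to a different egress point, which is the kind of disruption the slide refers to.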
Interdomain Routing: Design for Manageability
• Routing Control Platform
– Represents the AS to others
– Has complete view of candidate routes
– Computes answers for the AS’s routers
• Communicates with other ASes
– Using BGP or (ideally) a brand new protocol
[Figure: AS 1, AS 2, and AS 3, each with its own RCP; the RCPs communicate via an Inter-AS Protocol, alongside the physical peering between ASes.]
Advantages of RCP Approach
• Lower management complexity
– Complete, network-wide view
– Direct control over the routers
– Single specification of network policies/objectives
• Simpler routers
– Much less control-plane software
– Much less configuration state
• Enabling innovation
– New algorithms for selecting paths within an AS
– New protocols for inter-AS routing
Deployability: Backwards Compatibility using BGP
• Border Gateway Protocol (BGP)
– Protocol: messages sent between routers
– Decision logic: route-selection process
– Policy: configurable rules
• The key point is
– Complex decision logic and policies
– Yet simple protocol and message format
Use BGP messages to tell the routers what to do
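The "decision logic" bullet can be made concrete with a deliberately simplified ranking function. This is a sketch under stated assumptions, not the full BGP decision process: it keeps only local preference, AS-path length, IGP distance, and router ID, and omits the origin-type, MED, and eBGP-over-iBGP steps. The `Candidate` class and all values are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Candidate:
    as_path: Tuple[int, ...]
    local_pref: int     # set by policy; higher is preferred
    igp_distance: int   # IGP cost to the egress router; lower is preferred
    router_id: str      # final tie-breaker; compared as a string for brevity


def rank(c: Candidate):
    """Sort key for the simplified process: earlier fields dominate later ones."""
    return (-c.local_pref, len(c.as_path), c.igp_distance, c.router_id)


def select_best(candidates: List[Candidate]) -> Candidate:
    return min(candidates, key=rank)


candidates = [
    Candidate(as_path=(7018, 1, 88), local_pref=100, igp_distance=5, router_id="10.0.0.1"),
    Candidate(as_path=(1, 88), local_pref=100, igp_distance=9, router_id="10.0.0.2"),
    Candidate(as_path=(3, 2, 1, 88), local_pref=200, igp_distance=7, router_id="10.0.0.3"),
]
best = select_best(candidates)
best_without_policy = select_best(candidates[:2])
```

The route with the longest AS path wins here because policy gave it a higher local preference, which shows how much of the behavior lives in the decision logic and policy rather than in the message format.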
Phase 1: Flexible Path Selection
Before: conventional use of BGP in backbone network
[Figure labels: eBGP, iBGP]
After: RCP learns routes and sends answers to routers
[Figure labels: eBGP, RCP, iBGP]
Only one AS has to change its architecture!
Phase 2: AS-Wide Selection and Policy
Before: RCP gets “best” iBGP routes (and IGP feed)
[Figure labels: eBGP, RCP, iBGP]
After: RCP gets all eBGP routes from neighbors
[Figure labels: eBGP, RCP, iBGP]
RCP controls all path selection and export!
Phase 3: Other ASes have RCPs
Before: RCP gets all eBGP routes from neighbors
[Figure labels: eBGP, RCP, iBGP]
After: ASes exchange routes via RCP
[Figure: AS 1, AS 2, and AS 3, each with an RCP; labels: Inter-AS Protocol, iBGP, physical peering]
RCP enables creation of new inter-AS protocol!
Systems Considerations
• Reliability
– Problem: single point of failure
– Solution: replication of RCP components
• Consistency
– Problem: inconsistent decisions by replicas
– Solution: consistency without inter-replica protocol
• Scalability
– Problem: storing and computing for all routers
– Solution: store each route once and amortize work
RCP for a large ISP on a single high-end PC (NSDI’05)
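The "store each route once" idea can be sketched as a shared route store with per-router references. This is only an illustration of the general technique; the `RouteStore` class and the tuple layout are assumptions, and nothing here is taken from the actual prototype.

```python
from typing import Dict, Tuple

# (prefix, AS path, egress router); a hypothetical layout chosen for illustration
Route = Tuple[str, Tuple[int, ...], str]


class RouteStore:
    """Keeps one shared copy of each route; per-router tables hold references to it."""

    def __init__(self) -> None:
        self._routes: Dict[Route, Route] = {}
        self.assigned: Dict[str, Dict[str, Route]] = {}   # router -> prefix -> route

    def intern(self, route: Route) -> Route:
        """Return the stored copy of an equal route, storing this one if it is new."""
        return self._routes.setdefault(route, route)

    def assign(self, router: str, route: Route) -> None:
        self.assigned.setdefault(router, {})[route[0]] = self.intern(route)

    def unique_routes(self) -> int:
        return len(self._routes)


store = RouteStore()
for router in ("r1", "r2", "r3"):
    store.assign(router, ("12.34.158.0/24", (1, 88), "egress_east"))
```

Three routers share one stored route, so memory grows with the number of distinct routes rather than with routes times routers.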
Example Network Management Applications
• Customer-driven route selection
– Customized load-balancing policies
– Geographic rules for route selection
• Blocking denial-of-service attacks
– “Blackhole” routes that drop traffic
– Only for routers carrying attack traffic
• Hitless maintenance
– Move traffic away from certain routers
– Before the operators bring down the routers
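Hitless maintenance is a natural fit for a controller with a network-wide view: it can recompute every router's answer with the routers under maintenance excluded, before anyone powers them down. The sketch below assumes all egress routers offer equally good BGP routes and that the controller knows each router's IGP distances; `compute_answers`, the router names, and the distances are hypothetical.

```python
from typing import AbstractSet, Dict, Iterable


def compute_answers(
    egresses: Iterable[str],
    igp_distance: Dict[str, Dict[str, int]],
    draining: AbstractSet[str] = frozenset(),
) -> Dict[str, str]:
    """For each router, pick the closest egress that is not being drained."""
    usable = sorted(e for e in egresses if e not in draining)
    if not usable:
        raise ValueError("no egress router left to carry traffic")
    return {
        router: min(usable, key=lambda e: (dist[e], e))
        for router, dist in igp_distance.items()
    }


egresses = ["egress_east", "egress_west"]
igp = {
    "r1": {"egress_east": 3, "egress_west": 8},
    "r2": {"egress_east": 9, "egress_west": 2},
}

normal = compute_answers(egresses, igp)
during_maintenance = compute_answers(egresses, igp, draining={"egress_west"})
```

Once `during_maintenance` has been pushed to the routers, no traffic depends on `egress_west`, and the operators can bring it down.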
Conclusion
• Network operations is too hard
– IP was not designed for management
– Complex, distributed operation of routers
• Must reduce complexity
– Network-wide views and objectives
– Direct control over the data plane
• New architecture is feasible
– RCP is deployable, scalable, and reliable
– RCP solves real, important problems
• Many interesting open problems
Backup Slides
Routing Control Platform (RCP)
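[Figure: the Routing Control Platform (RCP) contains a Route Control Server (RCS), a BGP Engine, and an OSPF Viewer. The BGP Engine exchanges BGP updates with routers; the OSPF Viewer receives OSPF link-state advertisements. Other labels: Options, Answers, Network Topology.]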
Scalability: Standard Computing Platform
• Prototype on a high-end PC
– 3.2 GHz Pentium-4 with 8 GB of RAM
– Running the Linux 2.6.5 kernel
• Workload from the AT&T backbone
– Replay the BGP and OSPF messages
• Good RCP performance
– Memory usage: less than 2 GB
– Speed, BGP changes: less than 40 msec
– Speed, topology changes: 0.1-0.8 seconds
Short answer: the system can keep up
Reliability: Replication and Consistency
• Replication: avoid single point of failure
– Multiple RCPs in a network
– Connected at different places
• Consistency: no explicit coordination
– Replica has full view of each partition
– Replicas perform the same algorithm on the same data, and get the same answer
[Figure: replicas RCP A and RCP B; network regions labeled A, “A, B”, and B.]
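Consistency without an inter-replica protocol rests on the selection function being deterministic: if it depends only on the set of routes, and not on arrival order or local state, two replicas with the same data agree automatically. A minimal sketch, assuming a hypothetical route tuple compared field by field:

```python
import random
from typing import List, Tuple

# (AS path length, IGP distance, egress router); hypothetical layout
Route = Tuple[int, int, str]


def select(routes: List[Route]) -> Route:
    """Tuple comparison is a total order, so input order cannot change the result."""
    return min(routes)


routes = [(3, 5, "egress_east"), (2, 9, "egress_west"), (2, 9, "egress_north")]

# Replica B sees the same routes in a different arrival order.
shuffled = routes[:]
random.Random(0).shuffle(shuffled)

replica_a = select(routes)
replica_b = select(shuffled)
```

The final field acts as a tie-breaker; without a total order, two replicas could legitimately pick different routes among equals.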