Route Control Platform
Making the Network Act Like One Big Router
Jennifer Rexford
Princeton University
http://www.cs.princeton.edu/~jrex
http://www.cs.princeton.edu/~jrex/papers/rcp.pdf
Outline
• Internet architecture
– Complexity of network management
• Moving control from routers to servers
– Reducing complexity and increasing flexibility
• Traffic engineering example
– Today’s approach vs. the RCP
• Making the RCP real
– Deployability, scalability, and reliability
• Example applications
– Security, maintenance, and customer control
Internet Architecture
• The Internet is
– Decentralized: loose confederation of peers
– Self-configuring: no global registry of topology
– Stateless: limited information in the routers
– Connectionless: no fixed connection between hosts
• These attributes contribute
– To the success of the Internet
– To the rapid growth of the Internet
– … and to the difficulty of controlling the Internet!
[Diagram: a sender and a receiver communicating across the Internet]
A Well-Studied Architecture Question
• Smart hosts, dumb network
• Network moves IP packets between hosts
• Services implemented on hosts
• Keep state at the edges
How to partition function vertically?
[Diagram: two edge hosts connected through an IP network]
Inside a Single Network
[Diagram: a network of routers annotated with the three planes and their components]
Management Plane (shell scripts, planning tools, databases, configs, SNMP, netflow, modems)
• Figure out what is happening in the network
• Decide how to change it
Control Plane (OSPF and BGP processes, routing policies, link metrics)
• Multiple routing processes on each router
• Each router with a different configuration program
• Huge number of control knobs: metrics, ACLs, policy
Data Plane (FIBs, packet filters, traffic engineering)
• Distributed routers
• Forwarding, filtering, queuing
Inside a Single Network: State Everywhere!
[Same diagram as the previous slide, annotated with where state lives]
• Dynamic state in routing processes and forwarding tables
• Configured state in policies, settings, and packet filters
• Programmed state in magic constants and timers
• Many dependencies between the bits of state
• State updated in an uncoordinated, decentralized way!
Data Plane
• Distributed routers
• Forwarding, filtering, queuing
• Based on FIB or labels
How Did We Get in This Mess?
• Initial IP architecture
– Bundled packet handling and control logic
– Distributed the functions across routers
– Didn’t anticipate the need for management
• Rapid growth in features
– Sudden popularity and growth of the Internet
– Increasing demands for new functionality
– Incremental extensions to protocols & router software
• Challenges of distributed algorithms
– Some functions are hard to do in a distributed fashion
What Does the Operator Want?
• Network-wide views
– Network topology
– Mapping to lower-level equipment
– Traffic matrix
• Network-level objectives
– Load balancing
– Survivability
– Reachability
– Security
• Direct control
– Explicit configuration of data-plane mechanisms
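To make the network-wide views concrete, here is a minimal sketch of the inputs a decision process would consume; the topology, capacities, and demands below are invented for illustration:

```python
# Hypothetical network-wide views an operator (or decision logic)
# would work from; all names and numbers are illustrative.

# Topology: link (router_a, router_b) -> capacity in Mb/s
topology = {
    ("nyc", "chi"): 10_000,
    ("chi", "sf"): 10_000,
    ("nyc", "dc"): 2_500,
    ("dc", "sf"): 2_500,
}

# Traffic matrix: offered load between ingress and egress, in Mb/s
traffic_matrix = {
    ("nyc", "sf"): 3_200,
    ("nyc", "dc"): 800,
    ("chi", "sf"): 1_500,
}

# A network-level objective stated over these views:
# keep every link below 70% utilization.
MAX_UTILIZATION = 0.70
```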
What Architecture Would Achieve This?
• Management plane → Decision plane
– Responsible for all decision logic and state
– Operates on network-wide view and objectives
– Directly controls the behavior of the data plane
• Control plane → Discovery plane
– Responsible for providing the network-wide view
– Topology discovery, traffic measurement, etc.
• Data plane
– Queues, filters, and forwards data packets
– Accepts direct instruction from the decision plane
Example Application: Traffic Engineering
• Problem: Adapt routing to the traffic demands
– Inputs: network topology and traffic matrix
– Outputs: routing of traffic that balances load
• Three ways to solve the problem
– Extend the control plane to adapt to load
– Management plane, with today’s control plane
– Decision plane
Interior Gateway Protocol (OSPF/IS-IS)
• Routers flood information to learn the topology
– Determine “next hop” to reach other routers…
– Compute shortest paths based on the link weights
• Link weights configured by the network operator
[Diagram: an example topology with operator-configured link weights between 1 and 5 on each link]
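Each router's part of this computation is just single-source shortest paths over the configured weights. A self-contained sketch (the four-router topology and its weights are made up):

```python
import heapq

def shortest_paths(graph, source):
    """Dijkstra over configured link weights, as an OSPF/IS-IS router
    runs it: returns distances and the next hop the source uses for
    each destination."""
    dist = {source: 0}
    next_hop = {}
    # entries: (distance so far, node, first hop taken from the source)
    pq = [(0, source, None)]
    while pq:
        d, node, first = heapq.heappop(pq)
        if d > dist[node]:
            continue  # stale queue entry
        if first is not None:
            next_hop.setdefault(node, first)
        for neigh, weight in graph[node].items():
            nd = d + weight
            if nd < dist.get(neigh, float("inf")):
                dist[neigh] = nd
                heapq.heappush(pq, (nd, neigh, first or neigh))
    return dist, next_hop

# Hypothetical four-router topology; weights set by the operator.
graph = {
    "a": {"b": 2, "c": 1},
    "b": {"a": 2, "d": 3},
    "c": {"a": 1, "d": 5},
    "d": {"b": 3, "c": 5},
}
print(shortest_paths("a" in graph and graph or graph, "a") if False else shortest_paths(graph, "a"))
# dist: a=0, c=1, b=2, d=5; next hop toward d is b (2+3 beats 1+5)
```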
Control Plane: Let the Routers Adapt
• Strawman alternative: load-sensitive routing
– Link metrics based on traffic load
– Flood dynamic metrics as they change
– Adapt automatically to changes in offered load
• Reasons why this is typically not done
– Delay-based routing unsuccessful in the early days
– Oscillation as routers adapt to out-of-date information
– Most Internet transfers are very short-lived
• Research and standards work continues…
– … but operators have to do what they can today
Management Plane: Measure, Model, Control
[Diagram: the measure/model/control loop; measure the operational network to obtain the topology/configuration and offered traffic, feed the network-wide "what-if" model, optimize to choose changes to the link weights, and apply them as control back to the network]
Management Plane Approach
• Topology
– Connectivity and capacity of routers and links
• Traffic matrix
– Offered load between points in the network
• Link weights
– Configurable parameters for routing protocol
• Performance objective
– Balanced load, low latency, service agreements …
• Question: Given the topology and traffic matrix,
which link weights should be used?
Management Plane Solution
• Measure
– Topology: monitoring of the routing protocols
– Traffic matrix: widely deployed traffic measurement
• Model
– Representations of topology and traffic
– “What-if” models of the routing protocol
• Optimize
– Efficient local-search algorithms to find good settings
– Operational experience to identify key constraints
http://www.cs.princeton.edu/~jrex/papers/ieeecomm02.pdf
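A toy version of the measure/model/optimize loop, assuming single shortest-path routing (real IGPs also split traffic over equal-cost paths) and a plain hill-climbing search; `networkx` does the path computation, and the inputs match the hypothetical views sketched earlier:

```python
import itertools
import networkx as nx

def max_utilization(weights, capacities, traffic):
    """'What-if' model: route each demand on its shortest path under
    the candidate weights, then report the worst link utilization."""
    g = nx.Graph()
    for (u, v), w in weights.items():
        g.add_edge(u, v, weight=w)
    load = {link: 0.0 for link in capacities}
    for (src, dst), demand in traffic.items():
        path = nx.shortest_path(g, src, dst, weight="weight")
        for u, v in zip(path, path[1:]):
            link = (u, v) if (u, v) in load else (v, u)
            load[link] += demand
    return max(load[l] / capacities[l] for l in capacities)

def local_search(weights, capacities, traffic, rounds=50):
    """Hill-climb over the weights: try bumping one weight up or down,
    keep any change that lowers the worst utilization. A heuristic,
    since the exact problem is intractable, as the slide notes."""
    best = max_utilization(weights, capacities, traffic)
    for _ in range(rounds):
        improved = False
        for link, delta in itertools.product(capacities, (-1, 1)):
            trial = dict(weights)
            trial[link] = max(1, trial[link] + delta)
            score = max_utilization(trial, capacities, traffic)
            if score < best:
                weights, best, improved = trial, score, True
        if not improved:
            break
    return weights, best

# e.g. local_search(initial_weights, topology, traffic_matrix)
```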
This Works, But Has Some Limitations
• “What-if” model
– Repeats the logic implemented in the control plane
– Duplication of functionality, and of debugging effort
• Optimization techniques
– Local search because the problem is intractable
– Too much computation to explore all possibilities
• Network effects
– Link-weight changes are disruptive
– Routers must converge after each change
– Leads to transient packet loss and delay
Decision Plane Solution
• Measure
– Topology: monitoring of the routing protocols
– Traffic matrix: widely deployed traffic measurement
• Optimize the routing
– Compute desired forwarding paths directly
– Simpler than optimizing the link weights
• Instruct the routers
– Could change one router at a time to switch gradually to the new routes
– Avoid transient packet loss and delays
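A sketch of the direct-control rollout; `push_routes` is a hypothetical stand-in for the RCP-to-router channel (in practice the iBGP sessions described later), and `safe_order` is a placeholder for logic that keeps every intermediate step loop-free:

```python
def migrate(routers, new_routes, push_routes, safe_order=sorted):
    """Switch the network to directly computed routes one router at
    a time. push_routes(router, routes) updates a single router;
    safe_order stands in for logic that picks an update order whose
    intermediate states are loop-free, avoiding the transient packet
    loss that link-weight changes cause."""
    for router in safe_order(routers):
        push_routes(router, new_routes[router])
```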
More Network-Level Objectives
• Survivability
– Routing that can tolerate any single equipment failure
– Incorporate knowledge of shared risk groups
• Reachability policies
– Control which pairs of hosts can communicate
– Install packet filters and forwarding-table entries
• Security
– Install “blackhole” routes that drop attack traffic
– Keep routing tables within router storage limits
• Etc.
Is The Decision Plane Feasible?
• Deployability: any path from here to there?
– Must be compatible with today’s routers
– Must provide incentives for deployment
• Speed: can it run fast enough?
– Must respond quickly to network events
– Needs to be as fast as a router
• Reliability: single point of failure?
– Must be replicated to tolerate failure
– Replicas must behave consistently
Deployability
• Take a lesson from Ethernet
– Change anything but the message format
• Border Gateway Protocol (BGP)
– Interdomain routing protocol for the Internet
• Widely implemented on existing routers
• Widely used, especially in backbone networks
– Three main aspects of BGP
• Protocol: standard messages sent between routers
• Vendors: path-selection logic on individual routers
• Operators: configuration of policies for path selection
– Logic and policies are complex, but messages simple
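The per-router path-selection logic is the standard BGP decision process; the RCP can reimplement it centrally while leaving the message format untouched. A simplified sketch of that process as a Python sort key (the candidate routes are illustrative):

```python
def bgp_preference(route):
    """Simplified BGP decision process as a sort key: prefer higher
    local-pref, then shorter AS path, lower origin, lower MED,
    eBGP-learned over iBGP-learned, lower IGP metric to the next
    hop, and finally lower router ID as the tiebreak."""
    return (
        -route["local_pref"],
        len(route["as_path"]),
        route["origin"],              # IGP(0) < EGP(1) < INCOMPLETE(2)
        route["med"],
        0 if route["ebgp"] else 1,
        route["igp_metric"],
        route["router_id"],
    )

# Illustrative candidate routes for one prefix:
candidates = [
    {"local_pref": 100, "as_path": [7018, 3356], "origin": 0,
     "med": 0, "ebgp": True, "igp_metric": 10, "router_id": 1},
    {"local_pref": 100, "as_path": [7018, 2914, 3356], "origin": 0,
     "med": 0, "ebgp": True, "igp_metric": 5, "router_id": 2},
]
best = min(candidates, key=bgp_preference)  # shorter AS path wins
```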
Deployment in a Single Network
Before: conventional use of BGP in a backbone network
[Diagram: border routers speak eBGP to neighboring domains and iBGP among themselves]
After: the RCP learns the external routes and sends answers to the routers
[Diagram: eBGP stays at the border, but each router's iBGP session now goes to the RCP]
Only one AS has to change its architecture!
Longer Term, Wide-Spread Deployment
• Represents an AS as a single logical entity
– Complete view of AS’s routes
– Computes routes for all routers inside an AS
• Exchanges routing information with other ASes
– Using BGP or a new inter-AS protocol
– While still using BGP to talk to the routers
[Diagram: three ASes with physical peering; each AS runs an RCP that speaks iBGP to its own routers and exchanges routes with the other RCPs over an inter-AS protocol]
RCP Architecture
[Diagram: two RCP replicas, each pairing a Route Control Server (the "brain") with a BGP engine and an OSPF viewer (the "brawn"), attached to the network's routers]
Scalability through decomposition; reliability through replication
Scalability: Three-Part RCP Architecture
• OSPF viewer
– Continuous view of network topology
– Passive monitoring of link-state advertisements
• BGP engine
– Collecting BGP updates from border routers
– Sending chosen routes to the routers
– Lots of TCP connections, like a Web server
• Route Control Server
– Logic for computing answers for the routers
– Configuration for controlling the logic
– Operates on real-time feeds from the monitors
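At its core, the Route Control Server replays route selection for every router, but over the complete set of egress options and with that router's own IGP distances as the tiebreak. A sketch, assuming the monitor feeds are already parsed into dictionaries and using a shortened form of the decision process:

```python
def compute_answers(routers, bgp_routes, igp_dist):
    """For each router and prefix, pick the best of all routes learned
    anywhere in the AS (from the BGP engine), breaking ties by that
    router's IGP distance to the egress point (from the OSPF viewer),
    so each router still gets its closest exit. A simplified stand-in
    for the full BGP decision process."""
    answers = {}
    for r in routers:
        for prefix, candidates in bgp_routes.items():
            answers[(r, prefix)] = min(
                candidates,
                key=lambda rt: (-rt["local_pref"],
                                len(rt["as_path"]),
                                igp_dist[r][rt["egress"]]))
    return answers
```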
Scalability: Initial Prototype
• Implementation platform
– 3.2 GHz Pentium-4
– 8 GB memory
– Linux 2.6.5 kernel
• Workload
– Routing/topology changes in AT&T’s network
• RCP performance
– Memory usage: less than 2GB
– Speed, BGP changes: less than 40 msec
– Speed, topology changes: 0.1-0.8 seconds
• System is able to keep up…
Reliability
• Replication: avoid a single point of failure
– Multiple RCPs per network
– Connected at different places
• Consistency: replicas act as one
– Replicas performing the same algorithm on the same input get the same answer (eventually)
– Replica has complete view of each partition it sees
[Diagram: a network partition with regions A and B; one replica sees only A, another sees only B, and a replica connected to both sees A and B]
Application: DDoS Blackholing
• Blackholing of denial-of-service attacks
– Preconfigure a “null” route on each router
– Identify address of attack victim (from DoS system)
– RCP assigns the destination address to the null route
[Diagram: traffic analysis detects an attack on victim 1.2.3.4; the RCP announces "use null route for 1.2.3.4/32" to the routers over iBGP]
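A sketch of the blackholing step; `ibgp_send` is a hypothetical handle on the RCP's existing iBGP sessions, and the null next-hop address is illustrative:

```python
NULL_NEXT_HOP = "192.0.2.1"  # preconfigured on every router to point
                             # at a discard interface (illustrative)

def blackhole(victim_prefix, routers, ibgp_send):
    """Announce the victim's /32 with the null next hop so every
    router drops traffic for it; a high local-pref outranks the
    normal route."""
    update = {"prefix": victim_prefix,
              "next_hop": NULL_NEXT_HOP,
              "local_pref": 200}
    for r in routers:
        ibgp_send(r, update)

# e.g. blackhole("1.2.3.4/32", routers, ibgp_send)
```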
Application: Maintenance Dry-Out
• Dry-out of traffic before maintenance
– Plan to take a router out of service
– RCP assigns routes via new egress points in advance
[Diagram: router r is about to undergo maintenance; over iBGP the RCP tells the routers to "use route via s for d", shifting traffic from the old egress r to the new egress s in advance]
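A sketch of the dry-out step over the per-router answers from the earlier Route Control Server sketch; `next_best_egress` is a hypothetical helper that re-runs route selection with the draining router excluded:

```python
def dry_out(draining, answers, next_best_egress, ibgp_send):
    """Before taking router `draining` out of service, re-point every
    (router, prefix) whose chosen route exits via `draining` at its
    best alternative egress, so traffic shifts gracefully instead of
    being dropped when the router goes down."""
    for (router, prefix), route in answers.items():
        if route["egress"] == draining:
            ibgp_send(router,
                      next_best_egress(prefix, exclude=draining))
```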
Application: Customized Egress Selection
• Customer-controlled selection of egress points
– Customer with two data centers and many sites
– Customer wants to control the load balance
– RCP customization (not simply closest egress)
[Diagram: a customer with two sites; over iBGP the RCP tells site #1 to "use route via s for d" and site #2 to "use route via r for d", splitting load across the two egress points]
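A sketch of the customization hook; the per-site egress table is the customer-supplied policy, and `route_via` is a hypothetical lookup into the routes collected by the BGP engine:

```python
# Customer policy (hypothetical): pin each site to a data-center
# egress to balance load, overriding the default closest-egress rule.
site_egress = {"site1": "s", "site2": "r"}

def answer_for(site, prefix, route_via):
    """Return the route through the egress point the customer
    assigned to this site, rather than the nearest one."""
    return route_via(prefix, egress=site_egress[site])
```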
Conclusion
• Managing IP networks is too hard
– IP architecture not designed for management
– Complex, distributed operation of routers
• Reducing complexity is the key
– Network-wide views & objectives, and direct control
– Removing control logic and state from the routers
• New architecture is feasible
– RCP is deployable, scalable, reliable
– RCP solves important operations problems