20051025-network-maltz
Rethinking Network Control and Management
David A. Maltz
[email protected]
Context for Network Control and Management
Many different network environments
Access, backbone networks
Data-center networks, enterprise/campus
Many different technologies
Longest-prefix routing, label switching, circuit switching
IP, Ethernet, MPLS, optical circuits
Outsourcing of responsibility into the network
Many different policies
Middle-boxes: firewalls, network monitoring, …
Routing, reachability, transit, traffic engineering, robustness
ATT/CMU Study of 31 Production networks
Provider & enterprise networks (10-1200 routers)
Many different routing designs
Packet filters, multiple OSPF instances, multiple ASs
[Figure: lines in config file (0 to 2000) plotted against router ID (0 to 881)]
Fundamental Problem: Wrong Abstractions
Management Plane
• Figure out what is happening in network
• Decide how to change it
Control Plane
• Multiple routing processes on each router
• Each router with different configuration program
• Huge number of control knobs: metrics, ACLs, policy
Data Plane
• Distributed routers
• Forwarding, filtering, queueing
• Based on FIB or labels
[Diagram labels: shell scripts, traffic eng, planning tools, databases, configs, SNMP, netflow, modems (management plane); OSPF, BGP, link metrics, routing policies (control plane); FIB, packet filters (data plane)]
Inside a Single Network
Management Plane
• Figure out what is happening in network
• Decide how to change it
Control Plane
• Multiple routing processes on each router
• Each router with different configuration program
• Huge number of control knobs: metrics, ACLs, policy
Data Plane
• Distributed routers
• Forwarding, filtering, queueing
• Based on FIB or labels
[Diagram labels: shell scripts, traffic eng, planning tools, databases, configs, SNMP, netflow, modems (management plane); OSPF, BGP, link metrics, routing policies (control plane); FIB, packet filters (data plane)]
Control Plane: The Key Leverage Point
Great Potential: control plane determines the
behavior of the network
Reaction to events, reachability, services
Great Opportunities
Each network (administrative domain) has its own
control plane
A radical clean-slate control plane can be deployed
– Agnostic to user data format: IPv4/v6, Ethernet, circuit
– No changes to end-system software
Control plane is the nexus of network evolution
– Changing the control plane logic can smooth transitions in
network technologies and architectures
An Alternative: The 4D Architecture
Key principles
Network-level objectives
Network-wide views
Direct control
Corollaries
Predictable behavior (including overload threshold)
Zero device-specific or manual configuration
Data plane support for network-wide view
Define objectives in terms of organizationally salient
entities
Good Abstractions Reduce Complexity
[Diagram: today's stack (Management Plane, Control Plane, Data Plane, linked by configs and FIBs/ACLs) beside the 4D stack (Decision Plane, Dissemination, Data Plane, linked by FIBs/ACLs)]
All decision making logic lifted out of control plane
Eliminates duplicate logic in management plane
Dissemination plane provides robust
communication to/from data plane switches
Overview of the 4D Architecture
[Diagram: the four planes (Decision, Dissemination, Discovery, Data) annotated with the three principles: network-level objectives, network-wide views, direct control]
Decision Plane:
All management logic implemented on centralized servers making all decisions
Decision Elements use views to compute data plane state that meets objectives, then directly write this state to routers
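As a minimal sketch of the Decision Plane idea above, the following Python computes per-router forwarding tables from a network-wide view. The names `compute_fibs` and `view`, the link costs, and the use of Dijkstra's shortest-path algorithm are assumptions made for illustration; the slides do not say which algorithm the Decision Elements run.

```python
import heapq

def compute_fibs(view):
    """Compute a next-hop table for every router from a network-wide view.

    view maps each router to a dict of {neighbor: link cost}.
    Returns {router: {destination: next hop}}.
    Hypothetical helper; shortest-path routing is an assumed objective.
    """
    fibs = {}
    for src in view:
        best = {}                      # node -> (cost, first hop from src)
        heap = [(0, src, src)]         # (cost so far, node, first hop)
        while heap:
            cost, node, hop = heapq.heappop(heap)
            if node in best:
                continue               # already settled with a lower cost
            best[node] = (cost, hop)
            for nbr, weight in view[node].items():
                if nbr not in best:
                    first = nbr if node == src else hop
                    heapq.heappush(heap, (cost + weight, nbr, first))
        fibs[src] = {dst: hop for dst, (_, hop) in best.items() if dst != src}
    return fibs

# Example view: three routers, with the direct A-C link more expensive.
view = {
    "A": {"B": 1, "C": 5},
    "B": {"A": 1, "C": 1},
    "C": {"A": 5, "B": 1},
}
fibs = compute_fibs(view)

# After the B-C link fails, the view changes and the tables are recomputed.
view_after_failure = {
    "A": {"B": 1, "C": 5},
    "B": {"A": 1},
    "C": {"A": 5},
}
fibs_after_failure = compute_fibs(view_after_failure)
```

In this sketch the Decision Element would then write each `fibs[router]` table to the corresponding router; that write step is omitted because the slides do not describe its interface.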
Concerns and Challenges
Distributed Systems issues
How will communication between routers and DEs
survive failures in the network?
Latency means DE’s view of network is behind
reality. Will the control loop be stable?
What is the overhead to/from the DEs?
What happens in a network partition?
Networking issues
Does the 4D simplify control and management?
Can we create logic to meet multiple objectives?
Evaluation of the 4D Prototype
Evaluated using Emulab (www.emulab.net)
Linux PCs used as routers (650 – 800MHz)
Tested on 9 enterprise network
topologies (10-100 routers each)
Example network with
49 switches and 5 DEs
Performance of the 4D Prototype
Trivial prototype has performance comparable to well-tuned production networks
Recovers from single link failure in < 300 ms
< 1 s response considered “excellent”
Faster forwarding reconvergence possible
Survives failure of master Decision Element
New DE takes control within 1 s
No disruption unless second fault occurs
Gracefully handles complete network partitions
Less than 1.5 s of outage
Thanks!
Future Work
Scalability
Evaluate over 1-10K switches, 10-100K routes
Networks with backbone-like propagation delays
Structuring decision logic
Arbitrate among multiple, potentially competing objectives
Unify control when some logic takes longer than others
Protocol improvements
Better dissemination and discovery planes
Deployment in today’s networks
Data center, enterprise, campus, backbone (RCP)
Future Work
Expand relationships with security
Securing the infrastructure
Using 4D as mechanism for monitoring/quarantine
Formulate models that establish bounds of 4D
Scale, latency, stability, failure models, objectives
Generate evidence to support/refute principles
Themes of Network Control & Management
Holistic Design
Many different technologies – a few common problems
Find the right abstractions: exploit commonality
Clean Slate
How much autonomy do routers/switches need?
New principles for controlling networks
Separate networking issues from distributed system issues
Leverage Network Structure
Many different types of networks exist - each with different
objectives and topologies
Recent Publications
G. Xie, J. Zhan, D. A. Maltz, H. Zhang, A. Greenberg, G. Hjalmtysson, J. Rexford, “On
Static Reachability Analysis of IP Networks,” IEEE INFOCOM 2005, Orlando, FL,
March 2005.
J. Rexford, A. Greenberg, G. Hjalmtysson, D. A. Maltz, A. Myers, G. Xie, J. Zhan, H.
Zhang, “Network-Wide Decision Making: Toward a Wafer-Thin Control Plane,”
Proceedings of ACM HotNets-III, San Diego, CA, November 2004.
D. A. Maltz, J. Zhan, G. Xie, G. Hjalmtysson, A. Greenberg, H. Zhang, “Routing
Design in Operational Networks: A Look from the Inside,” Proceedings of the 2004
Conference on Applications, Technologies, Architectures, and Protocols for Computer
Communications (ACM SIGCOMM 2004), Portland, Oregon, 2004.
D. A. Maltz, J. Zhan, G. Xie, H. Zhang, G. Hjalmtysson, A. Greenberg, J. Rexford,
“Structure Preserving Anonymization of Router Configuration Data,” Proceedings
of ACM/Usenix Internet Measurement Conference (IMC 2004), Sicily, Italy, 2004.
A Clean-slate Design
What are the fundamental causes of network problems?
How to secure the network and protect the infrastructure?
What functionality needs to be distributed – what can be
centralized?
How to reduce/simplify the software in networks?
What would a “RISC” router look like?
How to leverage technology trends?
CPU and link-speed growing faster than # of switches
Three Principles for Network Control & Management
Network-level Objectives:
Express goals explicitly
Security policies, QoS, egress point selection
Do not bury goals in box-specific configuration
[Diagram labels: reachability matrix, traffic engineering rules, management logic]
Three Principles for Network Control & Management
Network-wide Views:
Design network to provide timely, accurate info
Topology, traffic, resource limitations
Give logic the inputs it needs
[Diagram labels: reachability matrix, traffic engineering rules, management logic, read state info]
Three Principles for Network Control & Management
Direct Control:
Allow logic to directly set forwarding state
FIB entries, packet filters, queuing parameters
Logic computes the desired network state, so let the logic implement it
[Diagram labels: reachability matrix, traffic engineering rules, management logic, read state info, write state]
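One way to picture direct control is management logic that turns an explicit reachability matrix into packet-filter state for each router. The sketch below is an illustration only: `compile_filters`, the `allowed` and `attachment` structures, the subnets, and the default-deny rule are all assumptions, not the prototype's actual data model.

```python
def compile_filters(allowed, attachment):
    """Turn a reachability matrix into per-router packet filter lists.

    allowed: set of (source subnet, destination subnet) pairs that may talk.
    attachment: maps each subnet to the router it attaches to.
    Returns {router: [(action, source, destination), ...]}.
    Hypothetical helper; filtering at the source's router is an assumption.
    """
    filters = {router: [] for router in set(attachment.values())}
    for src, dst in sorted(allowed):
        filters[attachment[src]].append(("permit", src, dst))
    for rules in filters.values():
        rules.append(("deny", "any", "any"))   # assumed default: deny the rest
    return filters

# Network-level objective: only the first subnet may reach the second.
allowed = {("10.0.1.0/24", "10.0.2.0/24")}
attachment = {"10.0.1.0/24": "r1", "10.0.2.0/24": "r2"}
filters = compile_filters(allowed, attachment)
```

The objective lives in one place (`allowed`), and the box-specific state is derived from it rather than written by hand.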
Overview of the 4D Architecture
Dissemination Plane:
Provides a robust communication channel to each
router – and robustness is the only goal!
May run over same links as user data, but logically
separate and independently controlled
Overview of the 4D Architecture
Discovery Plane:
Each router discovers its own resources and its local
environment
E.g., the identity of its immediate neighbors
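A minimal sketch of how per-router discovery reports could be assembled into a network-wide view follows. The function `build_view`, the report format, and the rule that a link counts only when both ends report each other are assumptions for illustration; the slides say only that each router discovers its immediate neighbors.

```python
def build_view(reports):
    """Assemble a set of links from per-router neighbor reports.

    reports maps each router to the set of neighbors it has discovered.
    A link is included only if both endpoints report each other
    (an assumed consistency rule, not taken from the slides).
    """
    links = set()
    for router, neighbors in reports.items():
        for nbr in neighbors:
            if router in reports.get(nbr, set()):
                links.add(frozenset((router, nbr)))
    return links

# r1 claims to see r3, but r3 has not reported r1, so that link is left out.
reports = {"r1": {"r2", "r3"}, "r2": {"r1"}, "r3": set()}
links = build_view(reports)
```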
Overview of the 4D Architecture
Data Plane:
Spatially distributed routers/switches
Can deploy with today’s technology
Looking at ways to unify forwarding paradigms
across technologies
Fundamental Problem: Conflation of Issues
Ideal case: all routing information flooded to
all routers inside network
Robustness achieved via flooding
Reality: routing information filtered and
aggregated extensively
Route filtering used to implement security and
resource policies
Route aggregation used to achieve scalability
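To make the scalability point concrete, here is a small illustration of route aggregation using Python's standard `ipaddress` module. The prefixes are made up for the example; the point is only that several specific routes collapse into one covering prefix, which hides the per-subnet detail from the rest of the network.

```python
import ipaddress

# Four contiguous /24 routes (hypothetical prefixes chosen for illustration).
routes = [
    ipaddress.ip_network("10.1.0.0/24"),
    ipaddress.ip_network("10.1.1.0/24"),
    ipaddress.ip_network("10.1.2.0/24"),
    ipaddress.ip_network("10.1.3.0/24"),
]

# collapse_addresses merges adjacent networks into the fewest covering prefixes.
aggregated = list(ipaddress.collapse_addresses(routes))
print(aggregated)  # [IPv4Network('10.1.0.0/22')]
```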
4D Separates Distributed Computing Issues from Networking Issues
Distributed computing issues → protocols and network architecture
Overhead
Resiliency
Scalability
Networking issues → management logic
Traffic engineering and service provisioning
Egress point selection
Reachability control (VPNs)
Precomputation of backup paths
4D Can Leverage Network Structure
Decision plane logic can be specialized for
structure of each physical network
Distributed protocols must be prepared for
arbitrary topology graphs
4D enables network logic specialized differently
for access and for backbone
E.g., creating aggregation tree in access network
Advantages
Faster route computations
Retain flexibility to evolve network as needed
Support transition to 100x100 architecture
The Feasibility of the 4D Architecture
We designed and built a prototype of the 4D Architecture
4D Architecture permits many designs – prototype is a
single, simple design point
Decision plane
Contains logic to simultaneously compute routes and enforce
reachability matrix
Multiple Decision Elements per network, using simple election
protocol to pick master
Dissemination plane
Uses source routes to direct control messages
Extremely simple & robust
Quickly route around failed data links, even multiple failures
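The source-routing idea in the dissemination plane can be sketched as follows: each control message carries its full hop list, and the sender tries another route when a link on the first one is down. The functions `deliver` and `send_with_fallback`, the list of candidate routes, and the topology are hypothetical; the slides do not describe how the prototype selects alternate routes.

```python
def deliver(route, links_up):
    """Walk an explicit hop list; return (delivered, last router reached)."""
    for here, nxt in zip(route, route[1:]):
        if frozenset((here, nxt)) not in links_up:
            return False, here          # link is down, message stops here
    return True, route[-1]

def send_with_fallback(routes, links_up):
    """Try candidate source routes in order; return the first that works."""
    for route in routes:
        delivered, _ = deliver(route, links_up)
        if delivered:
            return route
    return None                         # no candidate route is usable

# Hypothetical topology: a Decision Element "de" reaches r3 via r1 or r2.
links_up = {
    frozenset(("de", "r1")),
    frozenset(("de", "r2")),
    frozenset(("r2", "r3")),
}   # the r1-r3 link has failed and is absent
routes = [["de", "r1", "r3"], ["de", "r2", "r3"]]
chosen = send_with_fallback(routes, links_up)
```

Because the route is carried in the message, intermediate routers need no routing state of their own to forward control traffic, which is one reading of "extremely simple & robust".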