2005-maltz-job-talk

Rethinking Network Control & Management
The Case for a New 4D Architecture
David A. Maltz
Carnegie Mellon University
Joint work with
Albert Greenberg, Gisli Hjalmtysson
Andy Myers, Jennifer Rexford, Geoffrey Xie,
Hong Yan, Jibin Zhan, Hui Zhang
1
Is the Network Down Again?
You sit at your home computer, trying to access
a computer at work…
…But no data is getting through
Minutes or hours later, data flows again…
…You never find out why
Network operators aren’t much better at predicting
outages …
2
Outline
What do networks look like today?
New approach to predicting network behavior
A new architecture for controlling networks
3
Many Kinds of Networks
Each has different
• Size – generally 10-1000 routers each
• Owner – company, university, organization
• Topology – mesh, tree, ring
Examples:
• Enterprise/Campus networks
• Access networks: DSL, cable modems
• Metro networks: connect businesses within cities
• Data center networks: disk arrays & servers
• Transit/Backbone networks
4
A Conventional View of a Network
[Figure: the physical topology drawn as a graph of ten routers, A through J, connected by links]
Physical topology is a graph of nodes and links
Run Dijkstra to find the route to each node
5
Network Equipment
Picture from Internet2 Abilene Network
Boxes: router, switch
Links: Ethernet, SONET, T1, …
7
The Data Plane of a Network
[Figure: hosts/servers attached via interfaces to a router/switch]
8
Packets
[Figure: packet layout – meta-data (source address, destination address, port numbers, …) followed by user data]
For this talk, network traffic consists of packets
• A sequence of bytes processed as a unit
9
The Data Plane of a Network

Destination  NextHop
A            left
B            right
C            left

Forwarding Information Base (FIB)
• Basically a look-up table, where each entry is a route
• Tests fields of the packet and determines which interface to send the packet out
10
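To make the look-up concrete, here is a minimal sketch of a FIB in Python. The prefixes and interface names are invented for illustration; real routers select the most specific matching entry (longest-prefix match), which the sketch mimics:

```python
import ipaddress

# A toy FIB: each entry maps a destination prefix to an outgoing
# interface. All prefixes and interface names are hypothetical.
FIB = {
    ipaddress.ip_network("10.0.0.0/8"): "left",
    ipaddress.ip_network("10.1.0.0/16"): "right",
    ipaddress.ip_network("0.0.0.0/0"): "up",     # default route
}

def lookup(dst: str) -> str:
    """Return the interface for the most specific matching prefix."""
    addr = ipaddress.ip_address(dst)
    matches = [net for net in FIB if addr in net]
    best = max(matches, key=lambda net: net.prefixlen)
    return FIB[best]

print(lookup("10.1.2.3"))   # most specific match is 10.1.0.0/16 -> right
print(lookup("10.9.9.9"))   # falls back to 10.0.0.0/8 -> left
print(lookup("8.8.8.8"))    # only the default route matches -> up
```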
The Data Plane of a Network
Packet Filter (e.g., "Permit A->B; Drop C->B")
• Specific to a single interface
• Tests fields of the packet and determines whether to permit or drop it
• Finer granularity than the FIB – can test more fields, even target specific applications
11
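A packet filter can be sketched as an ordered rule list where the first matching rule wins. The subnets standing in for hosts A, B, and C below are hypothetical, as is the trailing catch-all rule:

```python
import ipaddress

# Hypothetical subnets for the hosts A, B, C from the slide.
A = ipaddress.ip_network("10.0.1.0/24")
B = ipaddress.ip_network("10.0.2.0/24")
C = ipaddress.ip_network("10.0.3.0/24")

# Ordered rules; first match wins. None means "any".
RULES = [
    ("permit", A, B),        # Permit A -> B
    ("drop",   C, B),        # Drop   C -> B
    ("permit", None, None),  # assumed catch-all: permit everything else
]

def filter_packet(src: str, dst: str) -> str:
    s, d = ipaddress.ip_address(src), ipaddress.ip_address(dst)
    for action, src_net, dst_net in RULES:
        if (src_net is None or s in src_net) and (dst_net is None or d in dst_net):
            return action
    return "drop"  # default deny if no rule matches

print(filter_packet("10.0.1.5", "10.0.2.7"))  # A -> B: permit
print(filter_packet("10.0.3.9", "10.0.2.7"))  # C -> B: drop
```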
The Data Plane of a Network
Many other mechanisms…
• Queueing discipline
• Packet transformers (e.g., address translation)
12
The Control Plane of a Network

Destination  NextHop
A            left
B            right
C            left

Where do FIB entries come from?
• A distributed system called the Control Plane
Control plane failures are responsible for many of the longest, hardest-to-debug outages!
13
The Control Plane of a Network
[Figure: a router whose routing process computes and installs its FIB]
Routers run routing processes
14
The Control Plane of a Network
[Figure: routing processes on adjacent routers exchange route advertisements (e.g., for destinations A,B and C,D) and each writes its own FIB]
Adjacent processes exchange routing information
• Information format defined by a routing protocol
• Many routing protocols: BGP, OSPF, RIP, EIGRP
• Adjacent processes must use the same protocol
15
The Control Plane of a Network
[Figure: advertisements for destination D propagate from process to process; each router installs a FIB entry such as "D -> left"]
Routing protocols define the logic for computing routes
• Combine all available information
• Pick the best route for each destination
16
Control Plane Creates Resiliency
[Figure: with all links up, each router's FIB forwards destination D out its left interface]
17
Control Plane Creates Resiliency
[Figure: after a link failure, the routing processes recompute and one router's FIB entry for D switches from "left" to "right"]
18
A Study of Operational Production Networks
How complicated/simple are real control planes?
• What is the structure of the distributed system?
Use reverse-engineering methodology
• There are few or no documents
• The ones that exist are out-of-date
Anonymized configuration files for 31 active networks
(>8,000 configuration files)
• 6 Tier-1 and Tier-2 Internet backbone networks
• 25 enterprise networks
• Sizes between 10 and 1,200 routers
• 4 enterprise networks significantly larger than the
backbone networks
19
Excerpts from a Router Configuration File
interface Ethernet0
ip address 6.2.5.14 255.255.255.128
interface Serial1/0.5 point-to-point
ip address 6.2.2.85 255.255.255.252
ip access-group 143 in
frame-relay interface-dlci 28
access-list 143 deny 1.1.0.0/16
access-list 143 permit any
route-map 8aTzlvBrbaW deny 10
match ip address 4
route-map 8aTzlvBrbaW permit 20
match ip address 7
ip route 10.2.2.1/16 10.2.1.7
router ospf 64
redistribute connected subnets
redistribute bgp 64780 metric 1 subnets
network 66.251.75.128 0.0.0.127 area 0
router bgp 64780
redistribute ospf 64 match route-map 8aTzlvBrbaW
neighbor 66.253.160.68 remote-as 12762
neighbor 66.253.160.68 distribute-list 4 in
20
Size of Configuration Files in One Network
[Figure: lines per configuration file (0 to ~2000) plotted for all 881 routers, sorted by file size]
21
Routing Processes Implement Policy
[Figure: routes for destinations A and B advertised by one routing process are filtered so that only A reaches the next router's FIB (routers R1, R2, R3)]
Extensive use of policy commands to filter routes
• Prevent some hosts from communicating: security policy
• Limit access to short-cut links: resource policy
22
Packet Filters Implement Policy
Packet filters used extensively throughout networks
• Protect routers from attack
• Implement reachability matrix
– Define which hosts can communicate
– Localize traffic, particularly multicast
23
Multiple Interacting Routing Processes
[Figure: a client and server connected across several OSPF instances and a BGP process; redistribution between them is governed by Policy1 and Policy2, with a connection to the Internet]
25
The Routing Instance Graph of an 881-Router Network
26
Take Away Points
Networks deal with both creating connectivity
and preventing it
Networks controlled by complex distributed systems
• Must understand system to understand behavior
Focusing on individual protocols is not enough
• Composition of protocols is important and complex
Developed abstractions to model routing design
• Routing Process Graph – accurately models the design
• Routing Instance – abstracts away details
• Reverse-engineered routing designs from configs
27
Outline
What do networks look like today?
New approach to predicting network behavior
• Frame the problem of reachability analysis
• Sketch algebra for predicting reachability
A new architecture for controlling networks
28
Reachability
A
B
i
j
Can A send a packet to B?
• Depends on routing protocols, advertised
routes, policies, packet filters, ...
Predicting reachability is key to network
survivability and security
29
Reachability
A
B
i
j
We focus on two types of policy:
– Survivability: Certain packets should always be
permitted, under all possible network states
– Security: Certain packets should never be
permitted, under all possible network states
30
Reachability Example
[Figure: two sites, Chicago (chi) and New York (nyc), each with a data center and a front office, connected by routers R1–R5]
• Two locations, each with a data center & front office
• All routers exchange routes over all links
31
Reachability Example
[Figure: the same topology, with the subnets labeled chi-DC, chi-FO, nyc-DC, and nyc-FO]
32
Reachability Example
[Figure: packet filters installed to enforce the security policy – one dropping nyc-FO -> * (permitting everything else), the other dropping chi-FO -> * (permitting everything else)]
33
Reachability Example
[Figure: the same topology and filters, with a new short-cut link between the two data centers]
A new short-cut link added between the data centers
• Intended for backup traffic between the centers
34
Reachability Example
[Figure: packets from a front office can now cross the short-cut link and bypass the filters]
Oops – the new link lets packets violate the security policy!
• Routing changed, but
• Packet filters don’t update automatically
35
Reachability Example
[Figure: the same topology, with additional packet filters installed]
Typical response – add more packet filters to plug the holes in the security policy
36
Reachability Example
[Figure: a link failure leaves chi-FO and nyc-FO connected only through paths blocked by the "Drop nyc-FO -> *" and "Drop chi-FO -> *" filters]
Packet filters have surprising consequences
• Consider a link failure
• chi-FO and nyc-FO are still connected
37
Reachability Example
[Figure: the same failure scenario]
Network has less survivability than the topology suggests
• chi-FO and nyc-FO are still connected
• But the packet filters mean no data can flow!
• Probing the network won’t predict this problem
38
State of the Art in Reachability Analysis
Build the network, try sending packets
• ping, traceroute, monitoring tools
Only checks paths currently selected by routing
protocols
• Cannot be used for “what if” analysis
Our goal: Static Reachability Analysis
• Predict reachability over multiple scenarios
through analysis of router configuration files
39
Predicting Reachability
How can we formalize the reachability provided
by a network?
Ri,j(s)
i
j
• The set of packets the network will carry
from router i to router j
• A function of the forwarding state s
• s represents the contents of each FIB
• Ri,j(s) is the instantaneous reachability
40
Computing Reachability

Ri,j(s) = ∪ (over all paths p from i to j) ∩ (over links (m,n) on path p) Fm,n(s)

• Fm,n(s): set of packets permitted along the link from node m to node n in network state s
• The union ranges over the set of all paths from i to j
• The intersection gives the packets allowed along path p
41
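The computation above can be sketched directly: model each link's permitted packet set, then take the union over paths of the per-path intersections. The toy topology and opaque packet labels below are invented for illustration:

```python
# F maps a directed link to the set of packets it permits in the
# current state. Routers and packet labels are hypothetical.
F = {
    ("R1", "R2"): {"p1", "p2"},
    ("R2", "R4"): {"p1"},
    ("R1", "R3"): {"p2", "p3"},
    ("R3", "R4"): {"p2"},
}

def simple_paths(links, src, dst, path=None):
    """Yield all loop-free paths from src to dst as router lists."""
    path = path or [src]
    if src == dst:
        yield path
        return
    for (a, b) in links:
        if a == src and b not in path:
            yield from simple_paths(links, b, dst, path + [b])

def reachability(src, dst):
    """Union over paths of the intersection of per-link filter sets."""
    result = set()
    for path in simple_paths(F.keys(), src, dst):
        allowed = set.intersection(*(F[(a, b)] for a, b in zip(path, path[1:])))
        result |= allowed
    return result

# Path R1-R2-R4 carries {p1}; path R1-R3-R4 carries {p2}.
print(sorted(reachability("R1", "R4")))  # ['p1', 'p2']
```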
Jointly Modeling the Effects of Packet Filters and Routing
Key Problem:
• Fi,j(s) is affected by both routing and packet filters
Key Insight:
• Treat routes as dynamic packet filters
[Figure: each router's FIB (destinations A, B, C with their next hops) is modeled as an equivalent filter, e.g., "Permit *->B; Drop *->*" on one router and "Permit *->A; Permit *->C; Drop *->*" on another]
42
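The insight can be sketched in a few lines: a FIB entry for destination D with next hop N acts as a "permit *->D" filter on the link toward N, with an implicit drop for everything else. The FIB contents below are illustrative:

```python
# A hypothetical FIB for router R2: destination -> next hop.
fib_R2 = {"A": "R1", "B": "R3", "C": "R3"}

def route_filter(fib, next_hop):
    """Destinations this FIB effectively permits on the link to next_hop.

    Everything not returned is implicitly dropped on that link,
    which is how a route behaves as a dynamic packet filter.
    """
    return {dest for dest, nh in fib.items() if nh == next_hop}

print(sorted(route_filter(fib_R2, "R3")))  # ['B', 'C']: forwarded via R3
print(sorted(route_filter(fib_R2, "R1")))  # ['A']: forwarded via R1
```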
Bounding the Instantaneous Reachability
Knowing the exact forwarding state s is impractical
Knowing Ri,j(s) doesn’t help much, anyway
• Want to predict behavior over a range of states
Luckily, predicting behavior over set of all possible
states is easier than predicting reachability for a
single state
43
Reachability Bounds
Lower bound on Reachability
Packets in this set never prohibited by network
Upper bound on Reachability
Packets not in this set always prohibited by network
44
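Given reachability sets computed for a collection of candidate network states, the two bounds fall out as a set intersection and a set union. The state names and packet labels below are toy data:

```python
# Hypothetical reachability sets Ri,j(s) for three possible states s.
states = {
    "all_links_up": {"p1", "p2", "p3"},
    "link_failed":  {"p1", "p2"},
    "backup_path":  {"p1", "p3"},
}

# Lower bound: packets reachable in every state (never prohibited).
lower = set.intersection(*states.values())
# Upper bound: packets reachable in some state (not always prohibited).
upper = set.union(*states.values())

print(sorted(lower))  # ['p1']
print(sorted(upper))  # ['p1', 'p2', 'p3']
```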
Example Upper Bound Analysis
[Figure: the chi/nyc example topology with its "Drop nyc-FO -> *" and "Drop chi-FO -> *" packet filters; the slide showed the upper-bound reachability sets before and after the short-cut link was added]
45
Example Lower Bound Analysis
[Figure: the same topology; the slide showed the lower-bound reachability sets before and after the extra packet filters were added]
46
Take Away Points
We have defined an algebra for modeling reachability
• Packet filters, routing protocols, NAT
• Griffin & Bush validated RFC 2547 VPNs
Status
• Algebra works on test cases
• Currently experimenting with production networks
Algebra’s strength and weakness is static analysis
• Can validate that network meets static objectives
• Can have false positives
• Cannot design the network to meet objectives
• Cannot control network to obey dynamic objectives
47
Outline
What do networks look like today?
New approach to predicting network behavior
A new architecture for controlling networks
• New principles for network control
• New architecture embodying those principles
• Experimental validation
48
Does Network Control Actually Matter?
YES!
• Microsoft: All services fell off the network
for 23 hours due to misconfiguration of
routers in their network (2001)
• Major ISP: 50% of outages occur during
planned maintenance (2005)
• IP networks have 2-3x the outages of circuit-switched networks (2005)
49
Three Principles for Network Control & Management
Network-level Objectives:
• Express goals explicitly
• Security policies, QoS, egress point selection
• Do not bury goals in box-specific configuration
[Figure: network-level objectives (reachability matrix, traffic engineering rules) feed the management logic]
50
Three Principles for Network Control & Management
Network-wide Views:
• Design the network to provide timely, accurate information
• Topology, traffic, resource limitations
• Give the logic the inputs it needs
[Figure: the management logic reads state information from the network to form network-wide views]
51
Three Principles for Network Control & Management
Direct Control:
• Allow the logic to directly set forwarding state
• FIB entries, packet filters, queuing parameters
• The logic computes the desired network state – let it implement it
[Figure: the management logic reads state information from the network and writes state directly to it]
52
Overview of the 4D Architecture
[Figure: the four planes – Decision, Dissemination, Discovery, Data – with network-level objectives feeding the Decision plane, network-wide views flowing up, and direct control flowing down]
Decision Plane:
• All management logic implemented on centralized servers making all decisions
• Decision Elements use views to compute the data plane state that meets the objectives, then directly write this state to the routers
53
Overview of the 4D Architecture
Dissemination Plane:
• Provides a robust communication channel to each router
• May run over the same links as user data, but logically separate and independently controlled
54
Overview of the 4D Architecture
Discovery Plane:
• Each router discovers its own resources and its local environment
• E.g., the identity of its immediate neighbors
55
Overview of the 4D Architecture
Data Plane:
• Spatially distributed routers/switches
• No need to change today’s technology
56
Control & Management Today
[Figure: management-plane tools (shell scripts, traffic engineering, planning tools, databases) interact with routers via config files, SNMP, and netflow; each router runs OSPF and BGP processes with link metrics, routing policies, packet filters, and a FIB]
Management Plane
• Figure out what is happening in the network
• Decide how to change it
Control Plane
• Multiple routing processes on each router
• Each router with a different configuration program
• Huge number of control knobs: metrics, ACLs, policy
Data Plane
• Distributed routers
• Forwarding, filtering, queueing
• Based on FIBs or labels
57
Good Abstractions Reduce Complexity
[Figure: today's stack (Management Plane -> configs -> Control Plane -> FIBs/ACLs -> Data Plane) refactored so the Decision Plane writes FIBs and ACLs through the Dissemination plane to the Data Plane]
All decision-making logic lifted out of the control plane
• Eliminates duplicate logic in the management plane
• Dissemination plane provides robust communication to/from the data plane routers
58
Three Key Questions
• Could the 4D architecture ever be
deployed?
• Is the 4D architecture feasible?
• Can the 4D architecture actually simplify
network control and management?
59
Deployment of the 4D Architecture
Pre-existing industry trend towards separating
router hardware from software
• IETF: FORCES, GSMP, GMPLS
• SoftRouter [Lakshman, HotNets’04]
Incremental deployment path exists
• Individual networks can upgrade to 4D and
gain benefits
• Small enterprise networks have most to gain
60
The Feasibility of the 4D Architecture
We designed and built a prototype of the 4D
Decision plane
• Contains logic to simultaneously compute
routes and enforce reachability matrix
• Multiple Decision Elements per network, using
simple election protocol to pick master
Dissemination plane
• Uses source routes to direct control messages
• Extremely simple, but can route around failed
data links
61
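The talk does not spell out the election protocol, but a minimal sketch of one plausible scheme (fixed priorities, highest live priority wins) looks like this; the DE names and priorities are hypothetical:

```python
def elect_master(des):
    """Pick the master Decision Element.

    des: dict mapping DE name -> (priority, alive).
    Returns the live DE with the highest priority, or None if
    no DE is reachable. Priorities are assumed fixed per DE.
    """
    live = {name: prio for name, (prio, alive) in des.items() if alive}
    return max(live, key=live.get) if live else None

des = {"DE1": (10, True), "DE2": (20, True), "DE3": (5, True)}
print(elect_master(des))   # DE2: highest priority among live DEs

des["DE2"] = (20, False)   # master fails...
print(elect_master(des))   # DE1 takes over
```

On a master failure, the surviving DEs simply rerun the election over the DEs they can still reach, which is consistent with the sub-second takeover the prototype reports.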
Performance of the 4D Prototype
Evaluated using Emulab (www.emulab.net)
• Linux PCs used as routers (650 – 800MHz)
• Tested on 9 enterprise network topologies
(10-100 routers each)
Recovers from single link failure in < 300 ms
• < 1 s response considered “excellent”
Survives failure of master Decision Element
• New DE takes control within 1 s
• No disruption unless second fault occurs
Gracefully handles complete network partitions
• Less than 1.5 s of outage
62
4D Makes Network Management & Control Error-proof
[Figure: the chi/nyc reachability example, with its subnets (chi-DC, chi-FO, nyc-DC, nyc-FO) and packet filters]
63
Prohibiting Packets from chi-FO to nyc-DC
64
4D Makes Network Management & Control Error-proof
[Figure: the same example under a link failure, with the "Drop nyc-FO -> *" and "Drop chi-FO -> *" filters in place]
65
Allowing Packets from chi-FO to nyc-FO
66
Related Work
• Driving network operation from network-wide views
– Traffic Engineering
– Traffic Matrix computation
• Centralization of decision making logic
– Routing Control Point [Feamster]
– Path Computation Element [Farrel]
– Signaling System 7 [Ma Bell]
67
Take Aways
No need for complicated distributed system in
control plane – do away with it!
4D Architecture a promising approach
Power of the solution comes from:
• Colocating all decision making in one plane
• Providing that plane with network-wide views
• Directly expressing the solution by writing forwarding state
Benefits
• Coordinated state updates → better reliability
• Separates network issues from distributed-systems issues
68
Summary
Networks must meet many different types of objectives
• Security, traffic engineering, robustness
Today, objectives met using control plane mechanisms
• Results in complicated distributed system
• Rife with opportunities to set time bombs
• Predicting static properties is possible, but difficult
Refactoring into a 4D Architecture very promising
• Separates network issues from reliability issues
• Eliminates duplicate logic and simplifies network
• Enables new capabilities, like joint control
69
Questions?
70
Backup Slides
71
Computing Reachability Bounds
• Problem reduced to estimating all routes
potentially in routing table (FIB) of each router
• Much easier than predicting exactly which
routes will be in FIB
72
How to Organize the Decision Plane?
We have exposed the network control logic
--- now what?
Need a way to structure that logic
• Mutual optimization of multiple objectives
– Potentially mutually exclusive
• Each objective has different time constants
• Multiple objectives may affect the same bit of
data-plane state
73
Future Directions
4D in different network contexts
• Ethernet networks
• Mixed networks: circuit- and packet-switched
Include services in the 4D
• Domain Name Service
• HTTP Proxies and load balancers
74
Reverse-Engineering Overview
Configuration files
→ Find links
→ Construct Layer 3 Topology
→ Find adjacent routing processes
→ Construct Routing Process Graph
→ Condense adjacent routing processes
→ Construct Routing Instance Graph
[Figure: an example instance graph – OSPF #1 and OSPF #2 joined through BGP AS1, with AS2 external]
75
Reconstruct the Layer 3 Topology
[Figure: two routers connected by a serial link, with a connection to the Internet]
Router 1 Config:
interface Serial1/0.5
ip address 1.1.1.1 255.255.255.252
….
Router 2 Config:
interface Serial2/1.5
ip address 1.1.1.2 255.255.255.252
….
76
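The matching step can be sketched by grouping interface addresses into their subnets: interfaces whose address/mask pairs fall in the same subnet are assumed to share a link. The router and interface data echo the example above:

```python
import ipaddress
from collections import defaultdict

# (router, interface, address, mask) tuples as parsed from configs.
# Router3's lone interface is added to show an unmatched subnet.
interfaces = [
    ("Router1", "Serial1/0.5", "1.1.1.1", "255.255.255.252"),
    ("Router2", "Serial2/1.5", "1.1.1.2", "255.255.255.252"),
    ("Router3", "Ethernet0",   "2.2.2.1", "255.255.255.0"),
]

# Group routers by the subnet each interface belongs to.
subnets = defaultdict(list)
for router, ifname, addr, mask in interfaces:
    net = ipaddress.ip_interface(f"{addr}/{mask}").network
    subnets[net].append(router)

# Subnets with more than one interface become inferred links.
links = [routers for routers in subnets.values() if len(routers) > 1]
print(links)  # [['Router1', 'Router2']]
```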
Abstract to a Routing Instance Graph
[Figure: routing processes (OSPF, BGP) and their route tables (RT) on each router, condensed into instances – OSPF #1 and OSPF #2 joined to BGP AS1 through Policy1 and Policy2, with AS2 external]
• Pick an unassigned Routing Process
• Flood fill along process adjacencies, labeling processes
• Repeat until all processes are assigned to an Instance
77
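The three bullets above are a textbook flood fill over the process-adjacency graph: each connected component of processes becomes one Routing Instance. The process names and adjacencies below are illustrative:

```python
from collections import deque

# Routing processes as nodes; protocol adjacencies as edges.
# Names like "ospf@R1" (process @ router) are hypothetical.
adjacencies = {
    "ospf@R1": ["ospf@R2"],
    "ospf@R2": ["ospf@R1", "ospf@R3"],
    "ospf@R3": ["ospf@R2"],
    "bgp@R1":  ["bgp@R4"],
    "bgp@R4":  ["bgp@R1"],
}

def routing_instances(adj):
    """Label each process with its Routing Instance number."""
    instance, assigned, label = {}, set(), 0
    for start in adj:                  # pick an unassigned process
        if start in assigned:
            continue
        label += 1
        queue = deque([start])         # flood fill its component
        while queue:
            proc = queue.popleft()
            if proc in assigned:
                continue
            assigned.add(proc)
            instance[proc] = label
            queue.extend(adj[proc])
    return instance

inst = routing_instances(adjacencies)
print(inst["ospf@R1"] == inst["ospf@R3"])  # True: same instance
print(inst["ospf@R1"] == inst["bgp@R1"])   # False: different instances
```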
Textbook Routing Design for Enterprise Networks
[Figure: an OSPF network (with a BGP process, AS #1) whose border routers speak eBGP to external peers AS2 and AS3]
• Border routers speak eBGP to external peers
• BGP selects a few key external routes to redistribute into OSPF
• 7 of 25 enterprise networks follow this pattern
78
Reality: A Diversity of Unusual Routing Designs
[Figure: a hub-and-spoke arrangement of small BGP compartments (AS #1 through AS #5) linked by eBGP, with AS #1 connected to the rest of the world]
• Network broken up into compartments, each with only 1 to 4 routers
• Each compartment has its own AS number
• Hub-and-spoke logical topology
• Why? Lots of control over how the spokes communicate
79
Reality: A Diversity of Unusual Routing Designs
[Figure: EIGRP compartments interconnected through BGP compartments (AS #1 through AS #4), with eBGP links to the rest of the world]
• Network broken up into many compartments, each running EIGRP, some with 400+ routers
• BGP used to filter routes passed between compartments
• Compartments themselves pass information between BGP speakers
• Why? Little need for iBGP; few routers speak BGP; lots of control over how packets move between compartments
80
Link Down
81
Reconvergence Time Under
Single Link Failure
82
Reconvergence Time When
Master DE Crashes
83
Reconvergence Time When
Network Partitions
84
Reconvergence Time When
Network Partitions
85
Slides in Progress
or Looking for a Place to go
86
Separation of Issues
The 4D Architecture separates issues
• Networking logic goes into decision plane
87
Dissemination Plane
Make clear that dissemination paths can use the same physical links, but different routing
Discovery and dissemination packets can be independent of the data plane (e.g., IP)
IP is very configuration-intensive (addresses, etc.), so we avoid it whenever possible
88
Questions
What if I want to take a bunch of hosts
and stick them together into a small
network? Haven’t you made this
common case terrifically hard?
• Today, I’d use static routes – it’s neither
common nor easy
• In the 4D model, what do I do?
– DE co-located on the host
– Doesn’t talk to any other DEs or routers
89
Problems with State of the Art
Today: Network behavior determined by multiple
interacting distributed programs, written in
assembly language
• No way to visualize or describe routing design
• Impossible to establish linkage between
configurations and network objectives
• Only a few “textbook” routing designs are
widely known
90