slides - Microsoft Research

New Directions in Enterprise Network Management
Aditya Akella
University of Wisconsin, Madison
MSR Networking Summit
June 2006
Enterprise Network Management
• Very broad topic…
– Tuning performance and availability of network-attached services
– Traffic sniffing for trouble-shooting
– Monitoring utilization
– Mapping network topology and resources, etc.
• Several tools (both commercial and free)
– Tailored to enterprises of different sizes, requirements
Outline
• Enterprises desire specific management functionalities that current tools fundamentally cannot provide
– Three examples
• Inability arises from how enterprises are designed and operated today (IP-based)
– Decentralization and no control over routing
• Thoughts on enterprise network design principles
– … Simplified management is a side-effect
So What’s Missing?
• Cumbersome or impossible to support
– What-If analysis
– Effective trouble-shooting
– Fine-grained resource management
• Some tools may provide one of these
– No tool provides all of them
1. What-If Analysis
• What will happen if I change X in my network?
– Policy/control plane level
– Reason about connectivity before installing changes
[Figure callouts: “New config stable?”, “Will bottleneck disappear?”, “Will upgrade violate policy?”; scenarios shown: new link/network upgrade, new policies for sales, alternate configuration]
→ Decentralized config specification
– Complex config/policy split across several devices/mechanisms
• Firewalls, Proxies, NATs, router ACLs, VLANs, port filtering
– … And across different network layers
– Hard to reason about cross-layer, cross-device interaction
2. Trouble-Shooting
• What is the current “status” of my network?
– Who is talking to whom and how? Resource consumption?
– Avoid overload; control plane trouble-shooting
• Information at arbitrary granularities
– Users, machines, groups…
– Ability to go back in time
– Unexpected patterns of communication; protocol usage
[Figure callouts: “How many conns from sales?”, “Who is using access link?”, “How many connections from guests?”, “Finance grp protocol usage last week?”]
2. Trouble-Shooting
• Today…
– SNMP for tracking resource consumption → Coarse-grained
– Monitoring key resources → Application specific; not network-wide
– Inference → Rely on heuristics, error prone
– Not fine-grained enough
→ Distributed decision on whether to allow flows
– Distributed and/or local to services and devices
– By default all-to-all is allowed
• Something is undesirable → local restrictions
• Use appropriate mechanism (ACLs, port filters, firewalls etc.)
– Poll to figure out what’s going on, or infer
– Hard to archive control-plane events
3. Resource Management
• Route around overloaded/failed switches and links
– Connection latency
– Availability
• Control levels of resource consumption
– Prioritize applications or users
– Restrict bandwidth consumption of “sales”
• Middle-boxes and proxies
– Placed at network choke points
– Ideally, deploy at diverse locations
– Route different classes of flows via different middleboxes
[Figure callouts: Guests → restrict b/w; Sales → virus-1 + image-filter + compression; Products → virus-2 + compression]
3. Resource Management
• Limited or no support in enterprises today
– SNMP-based/manual tuning, OSPF, load-balancing using DNS
→ Lack of tight control over routing
– Forwarding tables, hop-by-hop dst IP based routing inflexible
• Very little info used for routing
• Additional info into forwarding tables → complexity; slow look-up
• Aggregation → No control over flows or groups of flows
– Need tighter, app flow-level control
• Forwarding tables fundamentally insufficient
Desiderata
[Figure: hosts A, B, C, D; query “Should A→D be allowed?”; policy entries: A → B using HTTP; C → D using AIM via proxy; A → D using AIM via filter; …]
• Centralization:
– Of config specification (who can access what and how)
– Of enterprise-wide decision-making (should flow X be allowed)
– What-if analysis of connectivity becomes trivial
• (Offline) Analysis of a central database of policies
– Troubleshooting and forensics are simple
• Current set or complete log of accepted conn requests or active flows
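The centralization idea above can be illustrated with a minimal sketch. It assumes a hypothetical `PolicyDB` keyed on (source, destination, protocol), with the slide's three example policies as entries. The default-deny behavior, the class and function names, and the log format are all assumptions made for illustration, not part of the talk.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    src: str
    dst: str
    proto: str
    via: Optional[str] = None  # middlebox the flow must traverse, if any


class PolicyDB:
    """Hypothetical central store: who can access what, and how."""

    def __init__(self, rules):
        self.rules = {(r.src, r.dst, r.proto): r for r in rules}
        self.log = []  # every decision is recorded for later forensics

    def decide(self, src, dst, proto):
        # Assumption: anything without a matching rule is denied.
        rule = self.rules.get((src, dst, proto))
        self.log.append((src, dst, proto, rule is not None))
        return rule


def what_if(current, proposed, flows):
    """Offline what-if: list flows whose allow/deny outcome would change."""
    changed = []
    for flow in flows:
        before = flow in current.rules
        after = flow in proposed.rules
        if before != after:
            changed.append((flow, before, after))
    return changed


current = PolicyDB([
    Rule("A", "B", "HTTP"),
    Rule("C", "D", "AIM", via="proxy"),
    Rule("A", "D", "AIM", via="filter"),
])
# Proposed change: drop the A -> D AIM rule.
proposed = PolicyDB([
    Rule("A", "B", "HTTP"),
    Rule("C", "D", "AIM", via="proxy"),
])
flows = [("A", "B", "HTTP"), ("A", "D", "AIM"), ("C", "D", "AIM")]
diff = what_if(current, proposed, flows)
allowed = current.decide("A", "D", "AIM")
denied = current.decide("A", "C", "HTTP")
```

Because both the rules and the decision log live in one place, the what-if question reduces to comparing two dictionaries, and "who accessed what" reduces to reading `log`.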
Desiderata
[Figure: hosts A, B, C, D; “Route A→D (AIM) through s1→p1→p2→s2”; “Route A→D (HTTP) through s1→p1→s3→s2”]
• Tight control over routing:
– Centrally pre-ordain the path of each flow
– No more designing around choke-points
• Easy to integrate arbitrary number/type of middle-boxes
– Fine-grained resource control
– Also aids trouble-shooting and what-if analysis
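One way a central controller might pre-ordain a path is to stitch together shortest paths between the elements a flow must traverse. The sketch below is only an illustration: the topology (elements named s1, p1, p2, s3, s2 as in the figure labels, with assumed links between them), the `route_via` helper, and the choice of breadth-first search are all assumptions.

```python
from collections import deque


def shortest_path(graph, src, dst):
    """Breadth-first search; returns a list of nodes or None if unreachable."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in graph[node]:
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None


def route_via(graph, src, dst, waypoints):
    """Path from src to dst that visits the waypoints in the given order."""
    hops = [src, *waypoints, dst]
    path = [src]
    for a, b in zip(hops, hops[1:]):
        leg = shortest_path(graph, a, b)
        if leg is None:
            return None
        path.extend(leg[1:])  # skip the first node, already on the path
    return path


# Hypothetical topology; links are assumed, not taken from the slides.
graph = {
    "s1": ["p1"],
    "p1": ["s1", "p2", "s3"],
    "p2": ["p1", "s2"],
    "s3": ["p1", "s2"],
    "s2": ["p2", "s3"],
}
aim_path = route_via(graph, "s1", "s2", ["p1", "p2"])
http_path = route_via(graph, "s1", "s2", ["p1", "s3"])
```

Stitching legs this way can revisit a node when waypoints are far apart; a real controller would likely need loop handling and load awareness, which this sketch omits.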
An Architectural View
• Take all configuration and decision-making out of switches, routers
– Put all eggs in one basket
• Central entity tells switches how to forward packets
– Wire a circuit for each new flow…
– … Or hand out a source route
→ Switches have no forwarding table
– Dumb forwarding elements
– Under the direct control of the central controller (via control channels)
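The source-route option can be sketched as follows, assuming the controller hands each packet a list of output ports, one per element on the path. The `Packet` and `DumbSwitch` classes, the port numbering, and the wiring are hypothetical; the point is only that the switch consults the packet rather than a forwarding table.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    payload: str
    route: list  # output ports chosen by the controller, one per hop


class DumbSwitch:
    """Forwarding element with no table: it only knows its own wiring."""

    def __init__(self, name, links):
        self.name = name
        self.links = links  # port number -> name of the attached neighbor

    def forward(self, pkt):
        port = pkt.route.pop(0)  # consume this hop's instruction
        return self.links[port]


def deliver(switches, first, pkt):
    """Walk the packet through the network and return the nodes visited."""
    node, trace = first, [first]
    while pkt.route:
        node = switches[node].forward(pkt)
        trace.append(node)
    return trace


# Hypothetical wiring; host D hangs off port 1 of s2.
switches = {
    "s1": DumbSwitch("s1", {1: "p1"}),
    "p1": DumbSwitch("p1", {1: "p2", 2: "s3"}),
    "p2": DumbSwitch("p2", {1: "s2"}),
    "s3": DumbSwitch("s3", {1: "s2"}),
    "s2": DumbSwitch("s2", {1: "D"}),
}
aim_trace = deliver(switches, "s1", Packet("aim", [1, 1, 1, 1]))
http_trace = deliver(switches, "s1", Packet("http", [1, 2, 1, 1]))
```

Two flows between the same endpoints take different paths purely because the controller gave them different routes; no per-switch configuration changes.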
Effect on Management
• Control-plane related management or monitoring easy to do
– How many connections per user?
– Upgrades violate policy?
– Who accessed service X?
– Route different flows differently
– React to failures/overload
• “Data-plane management” harder to do
– Bandwidth related
– E.g. restrictions on users; monitoring utilization
Data Plane Management
• Switches need to be slightly less dumb
– Minimal management support to enable data plane management?
• Counters per-flow?
• Per-flow queuing?
• Up-to-date link utilization?
• Push vs pull based?
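To make the per-flow counter and push-versus-pull questions concrete, here is a small sketch of what such minimal switch support might look like. The slides leave these as open questions, so everything here is an assumption: the `CountingSwitch` class, the byte threshold that triggers a push, and the one-report-per-flow behavior.

```python
from collections import Counter


class CountingSwitch:
    """Hypothetical switch that keeps one byte counter per flow."""

    def __init__(self, push_threshold=None, on_report=None):
        self.bytes_per_flow = Counter()
        self.push_threshold = push_threshold
        self.on_report = on_report  # callback standing in for the control channel
        self.reported = set()

    def forward(self, flow_id, nbytes):
        self.bytes_per_flow[flow_id] += nbytes
        total = self.bytes_per_flow[flow_id]
        # Push: the switch notifies the controller once a flow crosses the threshold.
        if (self.push_threshold is not None
                and total >= self.push_threshold
                and flow_id not in self.reported):
            self.reported.add(flow_id)
            self.on_report(flow_id, total)

    def read_counters(self):
        # Pull: the controller polls whenever it wants a snapshot.
        return dict(self.bytes_per_flow)


reports = []
sw = CountingSwitch(push_threshold=1000,
                    on_report=lambda flow, total: reports.append((flow, total)))
sw.forward("A->D", 600)
sw.forward("A->D", 600)
sw.forward("C->D", 100)
snapshot = sw.read_counters()
```

Pull keeps the switch simpler but the controller only sees what it polls for; push adds state and logic to the switch in exchange for timely notification.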