mung-class-mar08 - Department of Computer Science

Download Report

Transcript mung-class-mar08 - Department of Computer Science

Traffic Engineering for ISP Networks
Jennifer Rexford
Computer Science Department
Princeton University
http://www.cs.princeton.edu/~jrex
Outline
 Internet
Service Provider (ISP) backbones
– Wide-area network with multiple Points of Presence
– Shortest-path link-state routing between edge routers
 Optimization: Tuning
routing to the traffic
– Optimizing routing given a topology and traffic matrix
– Local search to select the integer link weights
 Tomography:
Inferring the traffic matrix
– Estimating traffic matrix from routing and link loads
 Conclusion
and ongoing work
Example Backbone: Abilene Internet2 Newtork
Points-of-Presence (PoPs)
 Inter-PoP links
– Long distances
– High bandwidth
Inter-PoP
Intra-PoP
 Intra-PoP links
– Short cables between
racks or floors
– Aggregated bandwidth
 Links
to other networks
– Wide range of media
and bandwidth
Other networks
Routing Inside an Internet Service Provider
 Routers
flood information to learn the topology
– Routers determine “next hop” to reach other routers…
– By computing shortest paths based on the link weights
 Routers
forward packets via the “next hop” link(s)
2
3
2
1
1
1
3
5
4
3
Optimization: Tuning Routing to the Traffic
Link Weights Control the Flow of Traffic
 Routers
compute paths
– Shortest paths as sum of link weights
 Operators
set the link weights
– To control where the traffic goes
2
3
2
1
1
31
3
5
4
3
Heuristics for Setting the Link Weights
 Proportional
to physical distance
– Cross-country links have higher weights than local ones
– Minimizes end-to-end propagation delay
 Inversely
proportional to link capacity
– Smaller weights for higher-bandwidth links
– Attracts more traffic to links with more capacity
 Tuned
based on the offered traffic
– Network-wide optimization of weights based on traffic
– Directly minimizes key metrics like max link utilization
Why Are the Link Weights Static?
 Strawman
alternative: load-sensitive routing
– Link metrics based on traffic load
– Flood dynamic metrics as they change
– Adapt automatically to changes in offered load
 Reasons
why this is typically not done
– Delay-based routing unsuccessful in the early days
– Oscillation as routers adapt to out-of-date information
– Most Internet transfers are very short-lived
 Research
and standards work continues…
– … but operators have to do what they can today
Big Picture: Measure, Model, and Control
Network-wide
“what if” model
Offered
Topology/
traffic
Configuration
measure
Changes to
the network
control
Operational network
Traffic Engineering in an ISP Backbone
 Topology
– Connectivity and capacity of routers and links
 Traffic
matrix
– Offered load between points in the network
 Link
weights
– Configurable parameters for Interior Gateway Protocol
 Performance
objective
– Balanced load, low latency, service level agreements …
 Question:
Given the topology and traffic matrix in
an IP network, which link weights should be used?
Key Ingredients of Our Approach
 Measurement
– Topology: monitoring of the routing protocols
– Traffic matrix: widely deployed traffic measurement
 Network-wide
models
– Representations of topology and traffic
– “What-if” models of shortest-path routing
 Network
optimization
– Efficient algorithms to find good configurations
– Operational experience to identify key constraints
Formalizing the Optimization Problem
 Input:
graph G(R,L)
– R is the set of routers
– L is the set of unidirectional links
– cl is the capacity of link l
 Input:
traffic matrix
– Mi,j is traffic load from router i to j
 Output:
setting of the link weights
– wl is weight on unidirectional link l
– Pi,j,l is fraction of traffic from i to j traversing link l
Multiple Shortest Paths With Even Splitting
0.25
0.25
0.5
1.0
0.25
0.5
1.0
0.25
0.5
0.5
Values of Pi,j,l
Defining the Objective Function
 Computing
the link utilization
– Link load: ul = Si,j Mi,j Pi,j,l
– Utilization: ul/cl
 Objective
functions
– min(maxl(ul/cl))
f(x)
– min(Sl f(ul/cl))
x
1
Complexity of the Optimization Problem
 NP-hard
optimization problem
– No efficient algorithm to find the link weights
– Even for the simple convex objective functions
 Why
can’t we just do multi-commodity flow?
– E.g., solve the multi-commodity flow problem…
– … and the link weights pop out as the dual
– Because IP routers cannot split arbitrarily over ties
 What
are the implications?
– Have to resort to searching through weight settings
Optimization Based on Local Search
 Start
with an initial setting of the link weights
– E.g., same integer weight on every link
– E.g., weights inversely proportional to link capacity
– E.g., existing weights in the operational network
 Compute
the objective function
– Compute the all-pairs shortest paths to get Pi,j,l
– Apply the traffic matrix Mi,j to get link loads ul
– Evaluate the objective function from the ul/cl
 Generate
a new setting of the link weights
repeat
Making the Search Efficient
 Avoid
repeating the same weight setting
– Keep track of past values of the weight setting
– … or keep a small signature (e.g., a hash) of past values
– Do not evaluate a weight setting if signatures match
 Avoid
computing the shortest paths from scratch
– Explore weight settings that changes just one weight
– Apply fast incremental shortest-path algorithms
 Limit
the number of unique values of link weights
– Do not explore all 216 possible values for each weight
 Stop
early, before exploring the whole search space
Incorporating Operational Realities
 Minimize
number of changes to the network
– Changing just 1 or 2 link weights is often enough
 Tolerate
failure of network equipment
– Weights settings usually remain good after failure
– … or can be fixed by changing one or two weights
 Limit
dependence on measurement accuracy
– Good weights remain good, despite random noise
 Limit
frequency of changes to the weights
– Joint optimization for day and night traffic matrices
Application to AT&T’s Backbone Network
 Performance
of the optimized weights
– Search finds a good solution within a few minutes
– Much better than link capacity or physical distance
– Competitive with multi-commodity flow solution
 How AT&T changes
the link weights
– Maintenance done every night from midnight to 6am
– Predict effects of removing link(s) from the network
– Reoptimize the link weights to avoid congestion
– Configure new weights before disabling equipment
Example from My Visit to AT&T’s Operations Center
 Amtrak
repairing/moving part of the train track
– Need to move some of the fiber optic cables
– Or, heightened risk of the cables being cut
– Amtrak notifies us of the time the work will be done
 AT&T engineers
model the effects
– Determine which IP links go over the affected fiber
– Pretend the network no longer has these links
– Evaluate the new shortest paths and traffic flow
– Identify whether link loads will be too high
Example Continued
 If
load will be too high
– Reoptimize the weights on the remaining links
– Schedule the time for the new weights to be configured
– Roll back to the old weight setting after Amtrak is done
 Same
process applied to other cases
– Assessing the network’s risk to possible failures
– Planning for maintenance of existing equipment
– Adapting the link weights to installation of new links
– Adapting the link weights in response to traffic shifts
Conclusions on Traffic Engineering
 IP networks
do not adapt on their own
– Routers compute shortest paths based on static weights
 Service
providers need to adapt the weights
– Due to failures, congestion, or planned maintenance
 Leads
to an interesting optimization problems
– Optimize link weights based on topology and traffic
 Optimization
problem is computationally difficult
– Forces the use of efficient local-search techniques
 Results
of the local search are pretty good
– Near-optimal solutions that minimize disruptions
Ongoing Work
 Robust
link-weight assignments
– Link/node failures
– Range of traffic matrices
 More
complex routing models
– Hot-potato routing
– BGP routing policies
 Interaction
between ASes
– Inter-AS negotiation for joint optimization
– Grappling with scalability and trust issues
Tomography: Inferring the Traffic Matrix
Computing the Traffic Matrix Mi,j
 Hard
to measure the traffic matrix
– IP networks transmit data as individual packets
– Routers do not keep traffic statistics, except link
utilization on (say) a five-minute time scale
 Need
to infer the traffic matrix Mi,j from
– Current topology G(R,L)
– Current routing Pi,j,l
– Current link load ul
– Link capacity cl
Inference: Network Tomography
From link counts to the traffic matrix
Sources
5Mbps
3Mbps
4Mbps
4Mbps
Destinations
Tomography: Formalizing the Problem
 Ingress-egress
pairs
– p is a ingress-egress pair of nodes (i,j)
– xp is the (unknown) traffic volume for this pair Mi,j
 Routing
– Plp is proportion of p’s traffic that traverses l
 Links
in the network
– l is a unidirectional edge
– ul is the observed traffic volume on this link
 Relationship:
u = Px (work backwards to get x)
Tomography: One Observation Not Enough
 Linear
system of n nodes is underdetermined
– Number of links e is around O(n)
– Number of ingress-egress pairs c is O(n2)
– Dimension of solution sub-space at least c - e
 Multiple
observations are needed
– k independent observations (over time)
– Stochastic model with Poisson iid counts
– Maximum likelihood estimation to infer matrix
 Doesn’t
work all that well in practice…
Approach Used at AT&T: Tomo-gravity
 Gravitational
assumption
– Ingress point a has traffic via
– Egress point b has traffic veb
– Pair (a,b) has traffic proportional to via * veb
9
6
20
3
14
21
7
10
Approach Used at AT&T: Tomo-gravity
 Problem
with gravity model
– Gravity model ignores the load on the inside links
– Gravity assumption isn’t always 100% correct
– Resulting traffic matrix might not satisfy the link loads
 Combining
the two techniques
– Gravity: find a traffic matrix using the gravity model
– Tomography: find the family of traffic matrices
consistent with all link load statistics
– Tomo-gravity: find the tomography solution that is
closest to the output of the gravity model
 Works
extremely well (and fast) in practice
Conclusions
 Managing
IP networks is challenging
– Routers don’t adapt on their own to congestion
– Routers don’t reveal much information about traffic
 Measurement
provides a network-wide view
– Topology
– Traffic matrix
 Optimization
enables the network to adapt
– Inferring the traffic matrix from the link loads
– Optimizing the link weights based on the traffic matrix
New Research Direction: Design for Manage-ability
 Two
main parts of network management
– Control: optimization
– Measurement: tomography
 Two
research approaches
– Bottom up: do the best with what you have
– Top down: design systems that are easier to manage
 Design
for manage-ability
– “If you are both the professor and the student, you create
exam questions that are easy to answer.” – Mung Chiang