mung-class-mar08 - Department of Computer Science
Download
Report
Transcript mung-class-mar08 - Department of Computer Science
Traffic Engineering for ISP Networks
Jennifer Rexford
Computer Science Department
Princeton University
http://www.cs.princeton.edu/~jrex
Outline
Internet
Service Provider (ISP) backbones
– Wide-area network with multiple Points of Presence
– Shortest-path link-state routing between edge routers
Optimization: Tuning
routing to the traffic
– Optimizing routing given a topology and traffic matrix
– Local search to select the integer link weights
Tomography:
Inferring the traffic matrix
– Estimating traffic matrix from routing and link loads
Conclusion
and ongoing work
Example Backbone: Abilene Internet2 Newtork
Points-of-Presence (PoPs)
Inter-PoP links
– Long distances
– High bandwidth
Inter-PoP
Intra-PoP
Intra-PoP links
– Short cables between
racks or floors
– Aggregated bandwidth
Links
to other networks
– Wide range of media
and bandwidth
Other networks
Routing Inside an Internet Service Provider
Routers
flood information to learn the topology
– Routers determine “next hop” to reach other routers…
– By computing shortest paths based on the link weights
Routers
forward packets via the “next hop” link(s)
2
3
2
1
1
1
3
5
4
3
Optimization: Tuning Routing to the Traffic
Link Weights Control the Flow of Traffic
Routers
compute paths
– Shortest paths as sum of link weights
Operators
set the link weights
– To control where the traffic goes
2
3
2
1
1
31
3
5
4
3
Heuristics for Setting the Link Weights
Proportional
to physical distance
– Cross-country links have higher weights than local ones
– Minimizes end-to-end propagation delay
Inversely
proportional to link capacity
– Smaller weights for higher-bandwidth links
– Attracts more traffic to links with more capacity
Tuned
based on the offered traffic
– Network-wide optimization of weights based on traffic
– Directly minimizes key metrics like max link utilization
Why Are the Link Weights Static?
Strawman
alternative: load-sensitive routing
– Link metrics based on traffic load
– Flood dynamic metrics as they change
– Adapt automatically to changes in offered load
Reasons
why this is typically not done
– Delay-based routing unsuccessful in the early days
– Oscillation as routers adapt to out-of-date information
– Most Internet transfers are very short-lived
Research
and standards work continues…
– … but operators have to do what they can today
Big Picture: Measure, Model, and Control
Network-wide
“what if” model
Offered
Topology/
traffic
Configuration
measure
Changes to
the network
control
Operational network
Traffic Engineering in an ISP Backbone
Topology
– Connectivity and capacity of routers and links
Traffic
matrix
– Offered load between points in the network
Link
weights
– Configurable parameters for Interior Gateway Protocol
Performance
objective
– Balanced load, low latency, service level agreements …
Question:
Given the topology and traffic matrix in
an IP network, which link weights should be used?
Key Ingredients of Our Approach
Measurement
– Topology: monitoring of the routing protocols
– Traffic matrix: widely deployed traffic measurement
Network-wide
models
– Representations of topology and traffic
– “What-if” models of shortest-path routing
Network
optimization
– Efficient algorithms to find good configurations
– Operational experience to identify key constraints
Formalizing the Optimization Problem
Input:
graph G(R,L)
– R is the set of routers
– L is the set of unidirectional links
– cl is the capacity of link l
Input:
traffic matrix
– Mi,j is traffic load from router i to j
Output:
setting of the link weights
– wl is weight on unidirectional link l
– Pi,j,l is fraction of traffic from i to j traversing link l
Multiple Shortest Paths With Even Splitting
0.25
0.25
0.5
1.0
0.25
0.5
1.0
0.25
0.5
0.5
Values of Pi,j,l
Defining the Objective Function
Computing
the link utilization
– Link load: ul = Si,j Mi,j Pi,j,l
– Utilization: ul/cl
Objective
functions
– min(maxl(ul/cl))
f(x)
– min(Sl f(ul/cl))
x
1
Complexity of the Optimization Problem
NP-hard
optimization problem
– No efficient algorithm to find the link weights
– Even for the simple convex objective functions
Why
can’t we just do multi-commodity flow?
– E.g., solve the multi-commodity flow problem…
– … and the link weights pop out as the dual
– Because IP routers cannot split arbitrarily over ties
What
are the implications?
– Have to resort to searching through weight settings
Optimization Based on Local Search
Start
with an initial setting of the link weights
– E.g., same integer weight on every link
– E.g., weights inversely proportional to link capacity
– E.g., existing weights in the operational network
Compute
the objective function
– Compute the all-pairs shortest paths to get Pi,j,l
– Apply the traffic matrix Mi,j to get link loads ul
– Evaluate the objective function from the ul/cl
Generate
a new setting of the link weights
repeat
Making the Search Efficient
Avoid
repeating the same weight setting
– Keep track of past values of the weight setting
– … or keep a small signature (e.g., a hash) of past values
– Do not evaluate a weight setting if signatures match
Avoid
computing the shortest paths from scratch
– Explore weight settings that changes just one weight
– Apply fast incremental shortest-path algorithms
Limit
the number of unique values of link weights
– Do not explore all 216 possible values for each weight
Stop
early, before exploring the whole search space
Incorporating Operational Realities
Minimize
number of changes to the network
– Changing just 1 or 2 link weights is often enough
Tolerate
failure of network equipment
– Weights settings usually remain good after failure
– … or can be fixed by changing one or two weights
Limit
dependence on measurement accuracy
– Good weights remain good, despite random noise
Limit
frequency of changes to the weights
– Joint optimization for day and night traffic matrices
Application to AT&T’s Backbone Network
Performance
of the optimized weights
– Search finds a good solution within a few minutes
– Much better than link capacity or physical distance
– Competitive with multi-commodity flow solution
How AT&T changes
the link weights
– Maintenance done every night from midnight to 6am
– Predict effects of removing link(s) from the network
– Reoptimize the link weights to avoid congestion
– Configure new weights before disabling equipment
Example from My Visit to AT&T’s Operations Center
Amtrak
repairing/moving part of the train track
– Need to move some of the fiber optic cables
– Or, heightened risk of the cables being cut
– Amtrak notifies us of the time the work will be done
AT&T engineers
model the effects
– Determine which IP links go over the affected fiber
– Pretend the network no longer has these links
– Evaluate the new shortest paths and traffic flow
– Identify whether link loads will be too high
Example Continued
If
load will be too high
– Reoptimize the weights on the remaining links
– Schedule the time for the new weights to be configured
– Roll back to the old weight setting after Amtrak is done
Same
process applied to other cases
– Assessing the network’s risk to possible failures
– Planning for maintenance of existing equipment
– Adapting the link weights to installation of new links
– Adapting the link weights in response to traffic shifts
Conclusions on Traffic Engineering
IP networks
do not adapt on their own
– Routers compute shortest paths based on static weights
Service
providers need to adapt the weights
– Due to failures, congestion, or planned maintenance
Leads
to an interesting optimization problems
– Optimize link weights based on topology and traffic
Optimization
problem is computationally difficult
– Forces the use of efficient local-search techniques
Results
of the local search are pretty good
– Near-optimal solutions that minimize disruptions
Ongoing Work
Robust
link-weight assignments
– Link/node failures
– Range of traffic matrices
More
complex routing models
– Hot-potato routing
– BGP routing policies
Interaction
between ASes
– Inter-AS negotiation for joint optimization
– Grappling with scalability and trust issues
Tomography: Inferring the Traffic Matrix
Computing the Traffic Matrix Mi,j
Hard
to measure the traffic matrix
– IP networks transmit data as individual packets
– Routers do not keep traffic statistics, except link
utilization on (say) a five-minute time scale
Need
to infer the traffic matrix Mi,j from
– Current topology G(R,L)
– Current routing Pi,j,l
– Current link load ul
– Link capacity cl
Inference: Network Tomography
From link counts to the traffic matrix
Sources
5Mbps
3Mbps
4Mbps
4Mbps
Destinations
Tomography: Formalizing the Problem
Ingress-egress
pairs
– p is a ingress-egress pair of nodes (i,j)
– xp is the (unknown) traffic volume for this pair Mi,j
Routing
– Plp is proportion of p’s traffic that traverses l
Links
in the network
– l is a unidirectional edge
– ul is the observed traffic volume on this link
Relationship:
u = Px (work backwards to get x)
Tomography: One Observation Not Enough
Linear
system of n nodes is underdetermined
– Number of links e is around O(n)
– Number of ingress-egress pairs c is O(n2)
– Dimension of solution sub-space at least c - e
Multiple
observations are needed
– k independent observations (over time)
– Stochastic model with Poisson iid counts
– Maximum likelihood estimation to infer matrix
Doesn’t
work all that well in practice…
Approach Used at AT&T: Tomo-gravity
Gravitational
assumption
– Ingress point a has traffic via
– Egress point b has traffic veb
– Pair (a,b) has traffic proportional to via * veb
9
6
20
3
14
21
7
10
Approach Used at AT&T: Tomo-gravity
Problem
with gravity model
– Gravity model ignores the load on the inside links
– Gravity assumption isn’t always 100% correct
– Resulting traffic matrix might not satisfy the link loads
Combining
the two techniques
– Gravity: find a traffic matrix using the gravity model
– Tomography: find the family of traffic matrices
consistent with all link load statistics
– Tomo-gravity: find the tomography solution that is
closest to the output of the gravity model
Works
extremely well (and fast) in practice
Conclusions
Managing
IP networks is challenging
– Routers don’t adapt on their own to congestion
– Routers don’t reveal much information about traffic
Measurement
provides a network-wide view
– Topology
– Traffic matrix
Optimization
enables the network to adapt
– Inferring the traffic matrix from the link loads
– Optimizing the link weights based on the traffic matrix
New Research Direction: Design for Manage-ability
Two
main parts of network management
– Control: optimization
– Measurement: tomography
Two
research approaches
– Bottom up: do the best with what you have
– Top down: design systems that are easier to manage
Design
for manage-ability
– “If you are both the professor and the student, you create
exam questions that are easy to answer.” – Mung Chiang