Rapid Convergence in IP Networks:
Tom Scholl, AT&T Labs
NANOG 46
Convergence?
• No, this is not about the “convergence” regarding being able
to twitter, facebook, linkedin update, irc msg, sms or myspace
on a single device using RSS or XML in this web 2.0 world.
• This presentation covers convergence time in IP
networks as the result of routing changes.
This covers specifically intra-domain changes. AS-to-AS
convergence is another story altogether.
2
What are we trying to solve?
• Events occur within IP networks today that result in traffic being dropped.
• This dropping can be the result of:
• Congestion
• Forwarding errors (unable to forward out an interface that has gone
down)
• TTL expiration (packets looping)
• Network convergence dictates the amount of time it takes to resolve some
of the above conditions.
• It is often highlighted that IP networks may not be able to achieve the same
level of convergence performance as SONET transport networks. However,
IP networks can achieve similar performance with the right set of tools.
3
Who cares about rapid convergence?
• A few people actually:
• VoIP and IP-based call center applications
• Several second threshold before calls are dropped
• Video (IPTV)
• Stringent requirements: dropping any significant number of frames results in
defects
• Cellular/Mobile Networks
• SIGTRAN, Voice traffic
And anyone else who has been accustomed to rapid convergence on their existing
transport services.
4
What are the targets an operator may have to
aim for?
• Each application has its own thresholds.
• Very few customers actually know the requirements of their
applications.
• But “50ms” or “sub-second” is a common customer goal
• Where did this magic 50ms come from?
• The telco world (specifications on equipment)
• Wasn’t driven by customer application requirements
• No actual application that said “Hey, I need 50ms or I’m broken!”
5
What kind of events are involved in
convergence?
• Generally, link/node failures or IGP cost changes.
• Either of the above events will result in routers having to:
• Flood link-state-advertisements
• Recalculate the shortest-path-first (SPF) tree
• Install any applicable forwarding changes into the FIB
• Of course, the above can take a while to occur
• Can take as long as hundreds of milliseconds to several seconds,
depending on IGP scale and hardware.
• Each router performs the above steps as an independent process.
• There is no ordered execution of SPF across a network.
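To make the SPF step concrete, here is a minimal sketch of the shortest-path-first calculation that each router runs independently on its own copy of the link-state database. The topology, router names and costs are hypothetical, and real implementations add ECMP, incremental SPF and other optimizations that are left out here.

```python
import heapq

def spf(lsdb, root):
    """Dijkstra over a link-state database shaped as {router: {neighbor: cost}}.

    Returns {destination: (total_cost, first_hop)} as seen from root.
    """
    best = {root: (0, None)}
    heap = [(0, root, None)]  # (cost so far, node, first hop from root)
    while heap:
        cost, node, first_hop = heapq.heappop(heap)
        if cost > best[node][0]:
            continue  # stale heap entry, a cheaper path was already found
        for neighbor, link_cost in lsdb.get(node, {}).items():
            new_cost = cost + link_cost
            if neighbor not in best or new_cost < best[neighbor][0]:
                # Directly attached neighbors are their own first hop.
                hop = neighbor if node == root else first_hop
                best[neighbor] = (new_cost, hop)
                heapq.heappush(heap, (new_cost, neighbor, hop))
    return best

# Hypothetical four-router topology.
lsdb = {
    "A": {"B": 10, "C": 5},
    "B": {"A": 10, "D": 10},
    "C": {"A": 5, "D": 20},
    "D": {"B": 10, "C": 20},
}
before = spf(lsdb, "A")

# Simulate failure of the A-B link: both ends withdraw the adjacency.
del lsdb["A"]["B"]
del lsdb["B"]["A"]
after = spf(lsdb, "A")
```

In this sketch, router A reaches D via B before the failure and via C afterwards. Every other router has to arrive at its own new answer separately, which is why there is no ordering guarantee across the network.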
6
What happens in IGP convergence today
• 1) Failure detection
• 2) Flooding of topology changes via IGP
• Queuing, serialization and propagation of updates
• 3) SPF recalculation
• 4) Updating the RIB (routing engine)
• 5) Update the FIB (forwarding plane)
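Since the five stages above run one after another, total convergence time is roughly their sum. The sketch below adds up a per-stage budget and reports the slowest stage. All timings are made-up placeholders for illustration, not measurements from any platform.

```python
# Hypothetical per-stage timings in milliseconds (placeholders, not measurements).
STAGES_MS = {
    "failure detection": 50,
    "flooding": 20,
    "SPF recalculation": 30,
    "RIB update": 100,
    "FIB update": 250,
}

def convergence_budget(stages_ms):
    """Return (total_ms, slowest_stage) for stages that run sequentially."""
    total = sum(stages_ms.values())
    slowest = max(stages_ms, key=stages_ms.get)
    return total, slowest

total_ms, slowest_stage = convergence_budget(STAGES_MS)
```

Filling in real numbers per platform shows which stage is worth tuning first.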
7
Defaults aren’t going to cut it
• If you are looking to achieve rapid convergence, default router
settings are not going to work.
• Several knobs need to be modified (which vary from vendor to
vendor):
• SPF Intervals / SPF Throttle Timers
• Flooding Timers / Thresholds
• FIB-related knobs
• RIB-related knobs
• The use of rapid-failure-detecting protocols (lowered IGP timers, L2
keepalives or BFD).
• But wait, there’s more…
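As an illustration of the SPF throttle timers listed above, the sketch below models a common exponential-backoff scheme: a short initial delay for the first event, then a hold time that doubles on each back-to-back event up to a maximum. The parameter names and values are assumptions for illustration, and exact throttle behavior differs between vendors.

```python
def spf_delays(initial_ms, hold_ms, max_ms, events):
    """Delay applied before each of `events` back-to-back SPF runs.

    Simplified model: first run waits initial_ms, later runs wait a hold
    time that doubles each time and is capped at max_ms.
    """
    delays = []
    wait = hold_ms
    for i in range(events):
        if i == 0:
            delays.append(initial_ms)
        else:
            delays.append(min(wait, max_ms))
            wait *= 2
    return delays

# Hypothetical aggressive settings: 50 ms initial, 100 ms hold, 5 s maximum.
schedule = spf_delays(50, 100, 5000, 8)
```

A low initial delay makes a single failure converge quickly, while the backoff protects the routing engine during sustained churn.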
8
The Fast in Fast Reroute
• Even with the right knobs, some networks will still be unable to
reach sub-100ms or even sub-second goals.
• Two main options exist for operators in this space:
• IP Fast Reroute
• MPLS-TE Fast Reroute
9
IP Fast Reroute / Loop Free Alternates
• Does not require MPLS.
• Relies upon paths that would not result in a micro-loop, but does not
require explicit-path routing.
• SRLG awareness may be an issue.
• Who actually supports SRLG identification via the IGP?
• Can work well in some environments where MPLS is too much
overhead.
• Simple topologies such as edge router uplinks work well.
• Operator doesn’t have the ability to selectively control what can be
FRR protected and what cannot (unlike MPLS-TE).
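The loop-free condition can be sketched with the basic inequality from RFC 5286: a neighbor N of router S is a loop-free alternate for destination D if dist(N, D) < dist(N, S) + dist(S, D), meaning N's own shortest path to D does not lead back through S. The topology and costs below are hypothetical, and the sketch ignores node protection and SRLGs.

```python
import heapq

INF = float("inf")

def distances(graph, source):
    """Shortest-path cost from source to every reachable node (Dijkstra)."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale entry
        for v, cost in graph[u].items():
            if v not in dist or d + cost < dist[v]:
                dist[v] = d + cost
                heapq.heappush(heap, (d + cost, v))
    return dist

def loop_free_alternates(graph, s, d):
    """Neighbors of s that are usable as loop-free alternates toward d."""
    dist_s = distances(graph, s)
    alternates = []
    for n, link_cost in graph[s].items():
        dist_n = distances(graph, n)
        n_to_d = dist_n.get(d, INF)
        if link_cost + n_to_d == dist_s.get(d, INF):
            continue  # n is a primary next hop, not an alternate
        if n_to_d < dist_n.get(s, INF) + dist_s.get(d, INF):
            alternates.append(n)
    return alternates

def example_graph(b_to_d_cost):
    """Hypothetical topology: S reaches D via A (primary) or B (candidate)."""
    return {
        "S": {"A": 10, "B": 10},
        "A": {"S": 10, "D": 10},
        "B": {"S": 10, "D": b_to_d_cost},
        "D": {"A": 10, "B": b_to_d_cost},
    }

protected = loop_free_alternates(example_graph(20), "S", "D")
unprotected = loop_free_alternates(example_graph(50), "S", "D")
```

With a B-D cost of 20, B forwards directly to D and qualifies as an alternate. With a cost of 50, B's best path to D runs back through S, so using it would create a micro-loop and no alternate exists.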
10
MPLS-TE Fast Reroute
• A logical interface can have a bypass LSP assigned to be used in
the event of a failure.
• Traffic that was traversing the logical interface will then be
encapsulated in the bypass LSP.
• FIB is pre-programmed with instructions on what to do in the event
of a logical interface failure.
• Does not require any recalculation at time of failure.
• Can result in sub-optimal routing momentarily
• Traffic may backtrack the way it came in order to detour around the
failure.
11
MPLS-TE Fast Reroute (cont’d)
• When you are protecting a link, make sure the bypass LSP does not
run through the same facilities / SRLG as the link you intend to
protect!
• If you protect a Chicago-NYC link, make sure the bypass via Cleveland
doesn’t run through the same fiber, span, etc.
• Being able to determine the diversity is not easy.
• If you don’t own your own facilities (and you probably don’t), you are left
reading over circuit design layouts and comparing them.
• Even if you know what paths are diverse, building each backup LSP
by hand is not easy.
• Best to have an internal database translating IP addresses to SRLG
identifiers and auto-build backup LSPs.
• There once was a capability in IGPs to signal SRLGs.
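A diversity check against such an internal database could look like the sketch below. The link names, SRLG identifiers and database layout are all hypothetical. The slides only say that a mapping to SRLG identifiers is useful, not how it should be stored.

```python
# Hypothetical SRLG database: link -> set of shared-risk group identifiers
# (fiber spans, conduits, buildings and so on).
SRLG_DB = {
    ("CHI", "NYC"): {101, 205},
    ("CHI", "CLE"): {205, 310},
    ("CLE", "NYC"): {411},
    ("CHI", "DET"): {520},
    ("DET", "NYC"): {630},
}

def shared_srlgs(protected_link, bypass_path, srlg_db):
    """Return the SRLGs that the bypass path shares with the protected link."""
    protected = srlg_db[protected_link]
    shared = set()
    for link in bypass_path:
        shared |= protected & srlg_db[link]
    return shared

# Bypass via Cleveland shares SRLG 205 with the protected link, so it is not diverse.
via_cleveland = shared_srlgs(("CHI", "NYC"), [("CHI", "CLE"), ("CLE", "NYC")], SRLG_DB)
# Bypass via Detroit shares nothing.
via_detroit = shared_srlgs(("CHI", "NYC"), [("CHI", "DET"), ("DET", "NYC")], SRLG_DB)
```

An auto-build tool would reject any candidate bypass for which this set is non-empty.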
12
MPLS-TE Fast Reroute (cont’d)
• FRR bypass LSPs require some amount of care besides just
diversity:
• Do you have any latency limitations?
• Do you have sufficient bandwidth?
• Modeling tools are incredibly helpful here.
• Make-before-break nature of MPLS-TE LSPs will result in
reordering as packets go from a “longer path” (bypass LSP) to
a “shorter path” (re-optimized path).
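The latency and bandwidth questions above can be checked per bypass LSP before it is built. The sketch below is a rough stand-in for what a modeling tool does. The link records, field names and traffic figures are hypothetical.

```python
def check_bypass(links, rerouted_mbps, max_latency_ms):
    """Check a bypass path against bandwidth and latency limits.

    links: list of dicts with hypothetical keys name, capacity_mbps,
    load_mbps and latency_ms. Returns (overloaded link names, latency_ok).
    """
    overloaded = [
        link["name"]
        for link in links
        if link["load_mbps"] + rerouted_mbps > link["capacity_mbps"]
    ]
    total_latency = sum(link["latency_ms"] for link in links)
    return overloaded, total_latency <= max_latency_ms

# Hypothetical bypass path for a protected link carrying 5 Gbps.
bypass = [
    {"name": "CHI-CLE", "capacity_mbps": 10000, "load_mbps": 4000, "latency_ms": 8},
    {"name": "CLE-NYC", "capacity_mbps": 10000, "load_mbps": 7000, "latency_ms": 11},
]
overloaded, latency_ok = check_bypass(bypass, rerouted_mbps=5000, max_latency_ms=25)
```

Here the path meets the latency limit, but the second hop would be pushed past its capacity while the bypass is in use.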
13
Bottlenecks still exist which may be beyond your
control
• Even with the right protocols and the right knobs, your own routers
still have limitations:
• FIB programming/updates are limited by other factors (IPC, linecard
CPU utilization, etc).
• Attempt to detect failures as fast as possible:
• Reduce down-convergence where safe (carrier/link-transition delays)
• 802.3ah
• BFD
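For BFD, the failure detection time follows from the negotiated timers. Per RFC 5880 (asynchronous mode), the local detection time is the remote detect multiplier times the agreed transmit interval, which is the greater of the local required minimum receive interval and the remote desired minimum transmit interval. The timer values below are hypothetical examples.

```python
def bfd_detection_time_ms(local_required_min_rx_ms,
                          remote_desired_min_tx_ms,
                          remote_detect_mult):
    """Detection time at the local system in BFD asynchronous mode."""
    agreed_interval = max(local_required_min_rx_ms, remote_desired_min_tx_ms)
    return remote_detect_mult * agreed_interval

# Both sides configured for 50 ms with a multiplier of 3.
fast = bfd_detection_time_ms(50, 50, 3)
# The slower side wins: a 300 ms remote transmit interval dominates.
slow = bfd_detection_time_ms(50, 300, 3)
```

The second case shows why both ends need matching aggressive timers: the session runs at the slower of the two intervals.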
14
Can IP networks deliver SONET convergence
speed?
• Yes, with a properly designed and maintained network.
• It’s up to the operator to determine if they want to use MPLS-TE,
IPFRR or a combination of both.
• A combination would use MPLS-TE for the “hard” stuff (WAN links) and
IPFRR for the simple topologies (edges).
• Even if you can provide some fast convergence, who’s counting?
• Do you even have the monitoring systems to verify what you are
providing?
• Are you going to provide this for all users, or just specific
users/applications?
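One simple way to verify convergence from the data plane is to send sequence-numbered probes at a fixed rate across the network and look for gaps at the receiver. The sketch below estimates the longest outage from the sequence numbers that arrived. The probe interval and the sample data are hypothetical, and a real monitoring system would also handle duplicates across restarts and sequence wrap.

```python
def longest_outage_ms(received_seqs, interval_ms):
    """Estimate the longest outage from gaps in received probe sequence numbers.

    Probes are assumed to be sent every interval_ms with consecutive
    sequence numbers; reordering is tolerated by sorting.
    """
    longest_gap = 0
    previous = None
    for seq in sorted(set(received_seqs)):
        if previous is not None:
            longest_gap = max(longest_gap, seq - previous - 1)
        previous = seq
    return longest_gap * interval_ms

# Hypothetical capture: probes 100..111 were lost during a failure, 10 ms spacing.
received = list(range(0, 100)) + list(range(112, 200))
outage = longest_outage_ms(received, interval_ms=10)
```

The resolution of the measurement is one probe interval, so a 10 ms probe rate is needed to say anything meaningful about a 50 ms target.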
15
How you can start improving your network
today
• Cisco IOS:
• Set lower interface carrier-delays
• Enable CEF Table loadinfo force
• Modify CEF linecard timers (GSR)
• Modify CEF linecard IPC memory (GSR)
• Lower IGP SPF throttling
• Lower process-max-time
• Enable IP routing purge interface
• Juniper JUNOS:
• Enable next-hop indirection
• Lower IGP SPF timers
16
Send questions, comments, complaints to:
Tom Scholl, AT&T Labs