Micro-loop Prevention Methods

Download Report

Transcript Micro-loop Prevention Methods

Micro-loop Prevention
Methods
draft-bryant-shand-lf-conv-frmwk-00.txt
draft-zinin-microloop-analysis-00.txt
Outline of Talk
•
•
•
•
Convergence Strategy and Motivation
Solution Taxonomy
Existing Solution Space
Summary
Traditional convergence strategy
• Switch to new as fast as you can independently
• Required for failures
– Strategy optimized for this case
– traffic to failed element is lost
– so speed is essential
• Used for everything else
– common method
– traffic can be lost due to loops
Fast-Reroute prevents traffic loss due to
failure but loops can still cause loss.
Micro-Loop Properties
• Independent decisions can cause micro-loops.
• Loops may occur between pairs of nodes or cycles of
nodes.
A
Duration depends on relative
time to update FIBs.
1
–Implementation differences
S
–Number of affected destinations
1
–Propagation time
1
E
B
1
4
F
3
1
D
Loss due to Loop duration may be longer (an
order of magnitude) than Loss during the
Fast Reroute failover.
Controlled convergence
• Made feasible for failure case by fast reroute
– Traffic is not lost so can afford to take time
– Can use common method for both failure and
management change events
• Traditional convergence optimized for failure
case without fast-reroute.
We can do better…
(but keep traditional as safe fall-back for single
failure assumption violation.)
Solution taxonomy
• Controlled Information flow
– Incremental cost change
• Controlled Distributed Behavior
– Synchronized FIB installation
– Ordered FIB changes
– Path locking
Method Comparison
Name
Pros
Cons
Delay (in FIB
compute/install
Inc. Cost
Change
Only local router
needs to support
Unacceptable delay
(Bounded, but can be
hours!)
Unacceptably high
(Bounded by max
metric used)
Synch.
FIB Install
Seems simple…
Couples NTP to
Routing.
Implementation is
complicated variances may cause
very short loop.
Minimal
(1)
Ordered
SPFs
Control plane only.
SRLGs require
destination-based
decision.
High (Bounded by
network diameter)
Path
Locking
Deals with SRLGs
& uncorrelated
changes.
Various depending
on sub-method
Completeness
depends on additional
forwarding
mechanisms.
Small (3)
Incremental cost change
•
•
•
•
•
A change in a link cost of x can only cause loops
whose “cyclic” cost is <=x
Minimum cycle is 2 (1 in each direction)
Hence cost change of 1 can never cause a loop.
Where minimum cycle is larger, larger increments can
be used.
Once cost reaches cost of alternate path no more
loops possible.
No Cooperation Required –
But Can Take Hours
Synchronized FIB swap
• Network synchronized change-over at
predetermined time
– Signal/determine time to change
– Network Synchronized Time (NTP is there)
• Either Two FIBs for fast swap
– Substantial hardware implications
• Or FIB update “fast-enough” from change-over
time.
• Dependent on NTP
Conceptually simple with minimal signalling –
NTP dependency & implementation concerns
Ordering by signalling alone
• On change, tell old primary neighbors to wait for
you
• Wait for all neighbors as instructed, install FIB,
and tell your old primary neighbors.
• Assumes a single non-SRLG failure
– Otherwise communication per destination is required
No Estimation Required for FIB Compute/Install Require Reliable Fast Signalling and
Non-Trivial Protocol Extensions
Ordered FIB changes
• For any isolated link/node change
• Determine “safe” ordering for FIB
installation
– bad news: update from edge to failure,
– good news: update from change to edge
• Each router computes its “rank” with
respect to the change.
• Delays for a number of worst-case FIB
compute/install times proportional to its
rank.
Computing the ordering
• Single Reverse SPF rooted at change node
– Use old SPT to determine relevant node
• For bad news:- count maximum depth of
sub-tree below you
• For good news:- count maximum hops to
change
Delay Proportional to Network Diameter
• For Good News, rSPF gives necessary Calc Delay 0
Needed Delay 0
depth.
• For Bad News, rSPF is overly
Calc Delay N
pessimistic for some topologies.
Needed Delay 0
• Strategies to reduce unnecessary delay
– Prune rSPF by only considering the branch
across the failure – but still too pessimistic.
– Run SPF rooted at edge nodes to correctly
prune them – but doesn’t scale.
– Compare rSPFs before and after failure
Calc Delay N+1
Needed Delay 1
B
A
1
5
S
1
G
F
10
1
E
1
D
Avoids all micro-loops and requires single FIB install.
Delay dependent on network diameter so may be
unacceptable.
Signalling optimization to Reduce delay
• Use actual FIB compute/install instead of worstcase
– In many cases, actual delay is 0 b/c no change
needed.
• Signal to parents in rSPF when
– Nothing to do, or
– Completed FIB changes
• Can change FIB when received signal from all
children (or when delay expires)
• Only an optimization
– Loss of signals falls back to delay based
SRLG Concerns
• Diverse failures may require mutually
incompatible ordering
• Different orderings for individual
destination sets may help
• Need Rules to merge multiple rSPFs
Ordered SPF Summary
• No forwarding changes required.
• No signalling required at time of change.
• Complete prevention of loops for isolated node
or link changes.
• Requires cooperation from all routers
• May delay re-convergence for tens of seconds
(unless optional signalling used)
• SRLGs require per destination delays and may
delay re-convergence more.
Path Locking Framework
•
•
•
•
Obtain a fixed convergence delay regardless
of network.
Avoid ordering issue by providing transitional
paths.
Handles SRLGs
Different methods to
–
–
Determine/Create transitional paths
Direct traffic to use transitional paths
Standard trade-off of complexity versus coverage.
1.
2.
3.
4.
Tunnels for Transitional Paths
Safe Neighbors for Transitional Next-Hops
Marked Packets to Use Transitional Topology
U-turn Packets to Use New Topology
Time-Line of Convergence
• Change Discovery Time – At this point, all routers
know about the change. Routers install
transitional path support.
– For some methods, immediately start use of selfdetermined transitional paths.
• Use Transitional Paths Time (1 worst-case FIB
compute/install later) – All routers use transitional
paths, if available, and new primary next-hops
otherwise.
• Lock to New Topology Time (1 worst-case FIB
compute/install later) – All routers use new primary
next-hops.
All micro-loops avoided if a transitional path
always exists.
Create Tunnels
• Requires tunnel computation/creation at
topology change
• Old topology Locking
– Tunnel to the upstream side of the failure
– Single tunnel for all affected destinations (if link/node
failure).
• New topology locking
– Tunnel to first unaffected router on new primary path
• Tunnels provide a transitional path that can
traverse non-supporting routers.
• Non-supporting routers can only loop locally
originated traffic.
Safe Neighbors
• Find a safe neighbor to use as a transitional
next-hop.
– Safety condition is a neighbor that is loop-free on old
topology and a downstream path on new topology.
• If two neighboring routers don’t have a safe
neighbor, a micro-loop can form on that link.
• Analysis of real topologies shows pretty good
coverage.
• Local micro-loops possible with non-supporting
routers.
Typical Coverage
Packet Marking
•
•
•
•
Can mark packets to force forwarding
according to a particular topology.
Topology can be new or old.
All marking starts at the Use Transitional
Paths Time
If using new topology, traffic on new
topology after 1+ worst-case FIB
compute/install delay.
U-turn Packet
• Create transitional next-hop by directing U-turn
packets to the new primary next-hops.
• At Use Transitional Paths Time, send traffic to
new primaries (potentially explicitly marked as
U-turn packets).
• If implicitly determined U-turn packets, doesn’t
require marking.
• Explicit method for signalling support of U-turns
Lots of Possibilities…
• What are important criteria?
– Time to be converged
• Affects single failure assumption
• Network Stability
• Ballpark requirement is 10s
– Simplicity
– Support for SRLGs
– No additional mechanisms beyond IP (but
coverage may suffer…)
– Common additional mechanisms for this and
IPFRR advanced methods.
– Should also work for LDP
Conclusions & Next Steps
• Incremental Cost Change is impractical.
• Synchronized FIB Swap – what is the implementation
complexity? Implications of coupling NTP to routing?
• Ordered SPF – long delay and poor SRLG support. Is
that enough to be an issue?
• Path Locking
– Seem most promising
– Many possibilities to get similar results
Please send suggestions and comments to the list.
This solution set may not be complete.