
TRILL Routing Scalability Considerations
Alex Zinin
IETF-62
TRILL BOF
General scalability framework
  About growth functions for
    Data overhead (Adj’s, LSDB, MAC entries)
    BW overhead (Hellos, Updates, Refr’s/sec)
    CPU overhead (comp complexity, frequency)
  Scaling parameters
    N: total number of stations
    L: number of VLANs
    F: relocation frequency
  Types of devices
    Edge switch (attached to a fraction of N, and L)
    Core switch (most of L)
Scenarios for analysis
  Single stationary bcast domain
    No practical station mobility
    N = O(1K) by natural bcast limits
  Bcast domain with mobile stations
  Multiple stationary VLANs
    L = O(1K) total, O(100) visible to switch
    N = O(10K) total
  Multiple VLANs with mobile stations
Protocol params of interest
  What:
    Amount of data (topology, leaf entries)
    Number of LSPs
    LSP refresh rate
    LSP update rate
    Flooding complexity
    Route calculation complexity & frequency
  Why:
    Required memory [increase] as network grows
    Required mem & CPU to keep up with protocol dynamics
    Link BW overhead to control the network
  How:
    Absolute: big-O notation
    Relative: compare to e.g. bridging & IP routing
Why is this important
  If data-inefficient:
    Increased memory requirements
    Frequent memory upgrades as network grows
    Much more info to flood
  If computationally inefficient:
    Substantial comp power increase == marginal network size increase
    High CPU utilization
    Inability to keep up with protocol dynamics
Link-state Protocol Dynamics
  Network events are visible everywhere
  Main assumption for stationary networks:
    Network change is temporary
    Topology stabilizes within finite T
  For each node:
    Rinp: input update rate (network event frequency)
    Rprc: update process rate
    Long-term convergence condition:
      Rprc >> Rinp
    Micro bursts are buffered by queues
  What if (Rprc < Rinp)?
    Short-term (normal for stat. nets): update drops, rexmit, convergence
    Long-term/permanent: net never converges, CPU upgrade needed
  Rprc = f (proto design, CPU, implementation)
  Rinp = f (proto design, network)
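The Rprc vs. Rinp condition on this slide can be illustrated with a toy queue model. This is only a sketch under simplifying assumptions that are not in the slides: constant rates, one-second time steps, and a hypothetical queue limit of 1000 updates.

```python
def backlog_after(r_inp, r_prc, seconds, queue_limit=1000):
    """Return (queued, dropped) after `seconds` of constant-rate arrivals.

    r_inp: updates arriving per second (Rinp)
    r_prc: updates the node can process per second (Rprc)
    queue_limit: hypothetical input queue size, not a figure from the slides
    """
    queued = 0.0
    dropped = 0.0
    for _ in range(seconds):
        queued += r_inp                     # updates arriving this second
        queued = max(0.0, queued - r_prc)   # updates processed this second
        if queued > queue_limit:            # queue overflow: excess is dropped
            dropped += queued - queue_limit
            queued = queue_limit
    return queued, dropped


# Rprc >> Rinp: the queue stays empty.
stable = backlog_after(r_inp=1, r_prc=30, seconds=600)
# Rprc < Rinp for a long time: the queue fills, then updates are dropped.
overloaded = backlog_after(r_inp=40, r_prc=30, seconds=600)
print(stable, overloaded)
```

In the overloaded case the backlog never drains, which corresponds to the "net never converges" outcome as long as the input rate stays above the processing rate.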
Data-plane parameters
  Data overhead
    Number of MAC entries in CAM-table
  Why worry?
    CAM-table is expensive
    1-8K entries for small switches
    32K-128K for core switches
    Shared among VLANs
    Entries expire when stations go silent
Single Bcast domain (CP)
  Total of O(1K) MAC addresses
    Each address: 12bit VLAN tag + 48bit MAC = 60 bits
  IS-IS update packing:
    4 addrs per TLV (TLV is 255B max)
    20 addrs per LSP fragment (1470B default)
    ~5K addrs per node (256 frags total)
  LSP refresh rate:
    1K MACs = 50 LSPs
    1h renewal = 1 update every 72 secs
  MAC update rate:
    Depends on MAC learning & dead detection procedure
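The packing and refresh figures on this slide follow from simple arithmetic. The sketch below reproduces them from the slide's own per-fragment figure; the constant and function names are made up for illustration.

```python
ADDRS_PER_FRAGMENT = 20      # slide figure: addresses per LSP fragment
MAX_FRAGMENTS = 256          # slide figure: fragments per node
REFRESH_INTERVAL_S = 3600    # slide figure: 1h renewal


def lsp_fragments_needed(mac_count, per_fragment=ADDRS_PER_FRAGMENT):
    """Number of LSP fragments needed to carry mac_count addresses."""
    return -(-mac_count // per_fragment)   # ceiling division


max_addrs_per_node = ADDRS_PER_FRAGMENT * MAX_FRAGMENTS   # 5120, i.e. ~5K
fragments = lsp_fragments_needed(1000)                    # 50 LSPs for 1K MACs
refresh_spacing_s = REFRESH_INTERVAL_S / fragments        # 72.0 s between refreshes

print(max_addrs_per_node, fragments, refresh_spacing_s)
```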
MAC learning
  Traffic + expiration (5-15m):
    Announces station activity
    1K stations, 30m fluctuations = 1 update every 1.8 seconds average
    Likely bursts due to “start-of-day” phenomenon
  Reachability-based
    Start announcing MAC when first heard from station
    Assume it’s there until have seen evidence otherwise even if silent (presumption of reachability)
    Removes activity-sensitive fluctuations
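The difference between the two announcement policies can be sketched by counting the control-plane updates a single station generates. This is a simplified model, not the proposal's actual procedure: the frame timeline is hypothetical, the 600-second expiry is just one value inside the 5-15m range, and the expiry that would follow the last frame is not counted.

```python
def count_updates(frame_times, expiry_s, reachability_based):
    """Count announce/withdraw updates caused by one station.

    frame_times: sorted times (seconds) at which the station sent a frame.
    expiry_s: silence after which a traffic-based entry expires.
    reachability_based: if True, silence alone never withdraws the MAC.
    """
    updates = 0
    last_seen = None
    for t in frame_times:
        if last_seen is None:
            updates += 1     # first frame heard: announce the MAC
        elif not reachability_based and t - last_seen > expiry_s:
            updates += 2     # entry had expired (withdraw), then re-announce
        last_seen = t
    return updates


# Hypothetical station: short bursts of activity separated by long silences.
frames = [0, 60, 120, 3600, 3660, 7200]
traffic_based = count_updates(frames, expiry_s=600, reachability_based=False)
reach_based = count_updates(frames, expiry_s=600, reachability_based=True)
print(traffic_based, reach_based)
```

With the reachability-based policy the station costs one announcement regardless of how often it goes quiet, which is the "removes activity-sensitive fluctuations" point.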
Single bcast domain (DP)
  Number of entries
    Bridges: f (traffic)
      Limited by local config, location within network
    Rbridge: all attached stations
      No big change for core switches (see most MACs)
      May be a problem for smaller ones
Single bcast: summary
  With reachability-based MAC announcements…
  CP is well within the limits of current link-state routing protocols
    Can comfortably handle O(10k) routes
    Dynamics are very similar
    There’s an existence proof that this works
  CP data overhead is O(N)
    Worse than IP routing: O(log N)
    However, net size is upper-bounded by bcast limits
    Small switches will need to store & compute more
  Data-plane may require bigger MAC tables in smaller switches
Note: comfort limit
  Always possible to overload neighbor w updates
  Update flow control is employed
    Dynamic is possible, yet…
  Experience-based heuristics: pace updates at 30/sec
    Not a hard rule, ballpark
    Limits burst Rinp for neighbor
    Prevents drops during flooding storms
  Given the (Rprc >> Rinp) condition, want average to be an order of magnitude lower, e.g. O(1) upd/sec max
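Pacing at 30 updates/sec can be pictured as spreading a burst over evenly spaced send slots. The function below is a minimal sketch of that idea only; real implementations use their own timers and queues, and the function name is made up for illustration.

```python
def paced_send_times(n_updates, rate_per_s=30.0, start=0.0):
    """Send times (seconds) for a burst of n_updates paced at rate_per_s."""
    return [start + i / rate_per_s for i in range(n_updates)]


# A burst of 90 queued updates is stretched over roughly three seconds
# instead of hitting the neighbor all at once.
times = paced_send_times(90)
sent_in_first_second = sum(1 for t in times if t < 1.0)
burst_duration_s = times[-1] - times[0]
print(sent_in_first_second, round(burst_duration_s, 2))
```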
Note: protocol upper-bound
  LSP generation is paced: normally not more frequent than every 5 secs
  Each LSP frag has its own timer
  With equal distribution
    Max node origination rate == 51 upd/sec
  Does not address long-term stability
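The 51 upd/sec figure follows from the fragment count and the per-fragment generation interval, assuming the 256-fragment limit quoted on the earlier packing slide. A short check, with illustrative names:

```python
MAX_FRAGMENTS = 256          # fragments per node (from the packing slide)
MIN_LSP_INTERVAL_S = 5       # each fragment regenerated at most every 5 secs
PACING_RATE = 30             # updates/sec pacing heuristic (comfort-limit slide)

# With fragment timers spread evenly, one node can originate at most:
max_origination_rate = MAX_FRAGMENTS / MIN_LSP_INTERVAL_S   # 51.2 upd/sec

# A single node at this bound already exceeds the pacing ballpark.
exceeds_pacing = max_origination_rate > PACING_RATE
print(max_origination_rate, exceeds_pacing)
```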
Single bcast + mobility
  Same number of stations
    Same data efficiency for CP and DP
  Different dynamics
    Take IETF wireless network, worst case
      ~700 stations
      New location within 10 minutes
      Average 1 MAC every 0.86 sec or 1.17 MAC/sec
      Note: every small switch in VLAN will see updates
  How does it work now?
    Bridges (APs + switches) relearn MACs, expire old
  Summary: dynamics barely fit within comfort range
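The mobility numbers can be reproduced with the same kind of arithmetic. In this sketch the comfort target is taken as exactly 1 update/sec, which is an assumption: the comfort-limit slide only says O(1) upd/sec.

```python
def mean_update_rate(stations, relocation_period_s):
    """Average MAC updates per second if every station moves once per period."""
    return stations / relocation_period_s


COMFORT_AVG_RATE = 1.0   # assumed reading of "O(1) upd/sec"
PACING_RATE = 30.0       # burst pacing heuristic from the comfort-limit slide

rate = mean_update_rate(stations=700, relocation_period_s=10 * 60)
interval_s = 1 / rate

print(round(rate, 2), round(interval_s, 2))
print("above comfort average:", rate > COMFORT_AVG_RATE)
print("below pacing rate:", rate < PACING_RATE)
```

The average lands just above the assumed comfort target and well below the pacing rate, which matches the "barely fit" summary.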
Multiple VLANs
  Real networks have VLANs
  Assuming current proposal is used
    Standard IS-IS flooding
  Two possibilities:
    Single IS-IS instance for whole network
    Separate IS-IS instance per VLAN
      Similar scaling challenges as with VR-based L3 VPNs
VLANs: single IS-IS
  Assuming reachability-based MAC announc’t
  Adjacencies and convergence scale well
  However…
    Easily hit 5K MAC/node limit (solvable)
    Every switch sees every MAC in every VLAN
      Even if it doesn’t need it
    Clear scaling issue
VLANs: multiple instances
  MAC announcements scale well
  Good resource separation
  However…
    N adjacencies for a VLAN trunk
    N times more processing for a single topological event
    N times more data structures (neighbors, timers, etc.)
    N = 100…1000 for a core switch
    Clear scaling issue for core switches
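The multiplication described on this slide can be made concrete with a small comparison. This is a rough sketch: the trunk count is a hypothetical input, and "processing per event" is modeled simply as one route calculation per instance.

```python
def control_plane_load(vlans_on_trunk, trunks, multi_instance):
    """Rough per-switch control-plane cost for single vs. per-VLAN IS-IS.

    vlans_on_trunk: VLANs carried on each trunk (the slide's multiplier)
    trunks: number of trunk links on the switch (hypothetical input)
    """
    instances = vlans_on_trunk if multi_instance else 1
    return {
        "adjacencies": trunks * instances,        # one adjacency per instance per trunk
        "calculations_per_event": instances,      # each instance reacts to a topology event
    }


single = control_plane_load(vlans_on_trunk=100, trunks=4, multi_instance=False)
multiple = control_plane_load(vlans_on_trunk=100, trunks=4, multi_instance=True)
print(single)
print(multiple)
```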
VLANs: data plane
  Core switches
    Not big difference
    Exposed to most MACs in VLANs anyway
  Smaller switches
    Have to install all MACs even if a single port on a switch belongs to a VLAN
    May require bigger MAC tables than available today
VLANs: summary
  Control plane:
    Currently available solutions have scaling issues
  Data plane:
    Smaller switches may have to pay
VLANs + Mobility
  Assuming some VLANs will have mobile stations
  Data plane: same as stationary VLANs
  All scaling considerations for VLANs apply
  Mobility dynamics get multiplied
    Single IS-IS: updates hit same adjacency
    Multiple IS-IS: updates hit same CPU
    Activity not bounded naturally anymore
    Update rate easily goes outside comfort range
    Clear scaling issues
Resolving scaling concerns
  5K MAC/node limit in IS-IS could be solved with RFC3786
  Don’t use per-VLAN (multi-instance) routing
  Use reachability-based MAC announcement
  Scaling MAC distribution requires VLAN-aware flooding:
    Each node and link is associated with a set of VLANs
    Only information needed by the remote nbr is flooded to it
    Not present in current IS-IS framework
  Forget about mobility ;-)
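The VLAN-aware flooding idea can be sketched as a per-link filter over MAC entries. Since the slide notes this is not present in the current IS-IS framework, everything below is illustrative: the `MacEntry` type, the function name, and the per-link VLAN sets are assumptions, and the MAC addresses are documentation-range placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MacEntry:
    vlan: int
    mac: str


def entries_for_neighbor(mac_entries, neighbor_vlans):
    """Keep only the MAC entries whose VLAN the remote neighbor needs."""
    return [e for e in mac_entries if e.vlan in neighbor_vlans]


# Hypothetical database of learned MACs across three VLANs.
database = [
    MacEntry(10, "00:00:5e:00:53:01"),
    MacEntry(20, "00:00:5e:00:53:02"),
    MacEntry(30, "00:00:5e:00:53:03"),
]

edge_link_vlans = {10}           # link toward an edge switch in VLAN 10 only
core_link_vlans = {10, 20, 30}   # link toward a core switch carrying all VLANs

to_edge = entries_for_neighbor(database, edge_link_vlans)
to_core = entries_for_neighbor(database, core_link_vlans)
print(len(to_edge), len(to_core))
```

Under this filter the edge switch receives only the entries for VLANs it participates in, while the core switch still sees everything.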