Scaling beyond 10G: When what you have is never enough…
Mike Hughes <[email protected]>
CTO
London Internet Exchange
Brief History of LINX
Founded in 1994 by 5 ISPs
– Pipex (the original “Pipex”, now MCI/Uunet)
– Demon Internet
– BTnet
– UKERNA
– EUnet GB (later PSInet, now Telstra UK)
A switch (well, a 10Mb hub!) in Telehouse
– Volunteer staff
Architecture Development - 1996
An FDDI ring-based architecture
– Cisco and Plaintree switches
– FDDI, 100Mb TX and 10Mb connections
Full-time staff
Architecture Development - 1998
Gigabit Ethernet switches
– First metro GigE deployment in the EU
Multiple-site IX
Multiple vendors
– Packet Engines
– Extreme
Broke the 1G mark in Nov 1999
Cathartic Events in 2000
There was an attempt to take LINX commercial in the wake of the dot-com boom
Orchestrated by a number of LINX directors, with external backing/funding
Member reaction – “LINX is not for sale!”
– Concerns about LINX becoming open to capture
Reaffirmed the mutual, not-for-profit model
LINX Today
211 members from around 30 different countries
– Still strong UK contingent (about 50%)
– Most continents represented
7 co-locations in London Docklands
Dual LAN, Dual Vendor nx10G network
– Foundry and Extreme platforms
– Not interconnected
– Both platforms/networks in each location
Meeting the 10G Challenge
LINX was a very early adopter of 10G
– Foundry network first in late 2001
• It just worked!
– Removed the need to buy WDM equipment
• Costly at the time
That’s been upgraded to nx10G in the backbone as traffic has grown
But networks are now attaching to LINX at 10G themselves
– Presenting challenges for the backbone
10G Switches
Upgrade Process
We started upgrading our Foundry platform in 2004
– BigIron MG8 switches
– Not a trouble-free experience
– Now have 13 members connected via 10GE
Now upgrading the Extreme platform to an equivalent spec
– And then upgrade Foundry again!
We love pain!
Two networks give us lots of extra redundancy and flexibility
– Does mean we get to do things twice, though!
This year, LINX will upgrade the Extreme platform to be of an equivalent spec
– Both networks need to be roughly equal
Test as much as possible, then test it again!
– Can you be too thorough?
Agreed acceptance criteria with vendor
– Especially for the first system
Interesting packet size datapoint
[Chart: Packet Size Distribution at LINX – share of traffic by frame size bin: 0-64, 65-127, 128-255, 256-511, 512-1023 and 1024-1518 bytes]
Vendor Selection: What Matters?
10G port density
1G port density
Uniform, predictable packet performance
– Especially at smaller frame sizes! (see the pps sketch below)
Important features
– Particularly trunking/LACP
High Availability
– Hitless failover/upgrade, redundancy model
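As a rough illustration of why small-frame performance matters, the sketch below (plain Python; it assumes the usual 20 bytes of preamble and inter-frame gap per Ethernet frame) works out the worst-case packet rate a single 10GE port can present at each frame size.

```python
# Worst-case packets per second at line rate for common Ethernet frame
# sizes -- the reason "uniform performance at small frames" matters. Each
# frame carries an extra 20 bytes on the wire (8B preamble + 12B
# inter-frame gap) on top of the frame itself.

LINE_RATE_BPS = 10e9  # one 10GE port

for frame_bytes in (64, 128, 256, 512, 1024, 1518):
    wire_bits = (frame_bytes + 20) * 8
    pps = LINE_RATE_BPS / wire_bits
    print(f"{frame_bytes:>5} B frames: {pps / 1e6:6.2f} Mpps per 10G port")
```

At 64-byte frames a single 10G port can generate close to 15 Mpps, roughly eighteen times the packet rate of the same port filled with 1518-byte frames.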
Challenges to come
Scaling the network for multiple 10G connections from members
Little sign of active development in the 40G/100G arena
– Meaning nx10G is the best we can expect for now
Being able to provide a uniform service in multiple locations
Potential for massive traffic growth…
Scary Doom Curve
Scarier – 3 months later
Where’s it all coming from?
Increased access speeds
– ADSL2, WiMAX, FTTx, buzzword, buzzword…
More applications
– VoIP is a traffic red-herring – just watch the pps, though! (see the sketch below)
Industry consolidation
– Fewer people needing more & faster pipes
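To illustrate the VoIP point, here is a rough back-of-envelope sketch (the G.711 payload size, 20 ms packetisation and header sizes are assumed typical values, not LINX measurements): per bit carried, VoIP generates several times the packet rate of bulk traffic.

```python
# Rough per-call figures for G.711 VoIP with 20 ms packetisation (assumed
# typical values, purely illustrative): the bandwidth per call is tiny, but
# per bit carried VoIP generates far more packets than bulk traffic does.

PAYLOAD = 160              # bytes of G.711 audio per 20 ms packet
HEADERS = 12 + 8 + 20      # RTP + UDP + IPv4 headers
ETHERNET = 14 + 4 + 20     # Ethernet header + FCS + preamble/inter-frame gap
PPS_PER_DIRECTION = 50     # one packet every 20 ms

wire_bytes = PAYLOAD + HEADERS + ETHERNET
call_bps = wire_bytes * 8 * PPS_PER_DIRECTION
print(f"one call: ~{call_bps / 1e3:.0f} kbps, {PPS_PER_DIRECTION} pps per direction")

# Packets needed to move 1 Gbps of VoIP vs 1 Gbps of 1500-byte packets
voip_pps = 1e9 / (wire_bytes * 8)
bulk_pps = 1e9 / ((1500 + 18 + 20) * 8)
print(f"1 Gbps of VoIP        : ~{voip_pps / 1e3:.0f} kpps")
print(f"1 Gbps of 1500B frames: ~{bulk_pps / 1e3:.0f} kpps")
```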
Technologies
The sky isn’t the ethernet limit
– nx10G seems to be for the time being
– 40G or 100G are some way off (3 years)
• According to most vendors
CWDM prices are falling
Dark fibre is still relatively cheap
There may also be new technologies or ideas on the horizon
What we can do today
[Diagram: Foundry network today – a single ring of 20G trunks across all nodes, with one blocked link]
…and tomorrow
[Diagram: Foundry network evolution 1 – 2x20G rings, each with a blocked link]
…and next week
[Diagram: Foundry network evolution 2 – a 1x40G ring and a 1x20G ring, each with a blocked link]
…and next month
Install bigger switches!
[Diagram: the same 1x40G and 1x20G ring topology, now built on bigger switches]
Bigger Box: Foundry RX16
Double the density of the MG8
Up to 64 line-rate 10G ports per chassis
– Biggest on the market today
– Keeps traffic inside a single large box
We’ve just finished lab testing
Shorter Term
Bigger switches and fatter interswitch trunks can meet most needs
– 10G connections have to be “concentrated”
– But about 50% of a switch could easily be consumed by backbone connectivity (see the sketch after this list)
• With a consequent push to a hierarchical model?
Need some protocol enhancements from vendors
– e.g. EAPSv2 and MRP phase 2
– To add multiple ring support
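As a rough sketch of that port-budget pressure (the chassis size and trunk widths below are illustrative assumptions, not LINX’s actual configurations), counting the 10G ports eaten by the east and west ring trunks shows how quickly the backbone’s share of a box grows as trunks fatten.

```python
# Back-of-envelope 10G port budget for one IX node on nx10G ring trunks.
# Chassis size and trunk widths are illustrative assumptions only.

CHASSIS_10G_PORTS = 32      # e.g. an MG8-class box (an RX16-class box has 64)

scenarios = {               # 10G members per trunk, per ring
    "2 x 20G rings":   (2, 2),
    "40G + 20G rings": (4, 2),
    "80G + 40G rings": (8, 4),
}

for name, trunks in scenarios.items():
    backbone = sum(2 * members for members in trunks)  # each ring needs an east and a west trunk
    share = 100 * backbone / CHASSIS_10G_PORTS
    print(f"{name:16s}: {backbone:2d}/{CHASSIS_10G_PORTS} 10G ports ({share:.0f}%) taken by the backbone")
```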
Key Features at LINX
Moving to a “dual ring” topology
– MRP Phase 2 on Foundry
– EAPSv2 on Extreme
Allows different ring sizing
– 40G ring on larger sites
Increases effective ISL bandwidth
– Fewer “transit” flows
Low(ish) dark fibre cost – no WDM here
Foundry Network Plans
[Diagram: Foundry network plan – two MRP rings sharing nodes across THE, THN, RBX, RBS, TCX and TCM, built from RX16, BigIron 8000 and MG8 switches. MRP Ring 1 is a 40G ring; MRP Ring 2 is a 20G ring running TCX-TCM-THN-RBS-RBX. Each ring has a master node and one blocked link. Member connections are 10G/1G, with 100M aggregation.]
Extreme Network Plans
[Diagram: Extreme network plan (interim) – two EAPS rings sharing nodes across THE, THN, RBX, RBS, TCX and TCM. EAPS Ring 1 is a 40G ring; EAPS Ring 2 is a 20G ring running TCX-TCM-THN-RBS-RBX. Each ring has a master node and one blocked link.]
Fibre Network Expansion (1)
[Diagram: Foundry network fibre expansion across THE, THN, RBX, RBS, TCX and TCM, showing existing lit and unlit fibre (Fibernet, Telehouse, Thus) plus new fibre from a new supplier – 2x new 9/125 singlemode pairs from Telehouse, 2x new pairs routed via the Prestons Road route, and 4x new pairs routed diversely from the Prestons Road route (2x immediate).]
Fibre Network Expansion (2)
[Diagram: Extreme network fibre expansion across THE, THN, RBX, RBS, TCX and TCM, showing existing lit and unlit fibre (Fibernet, Telehouse, Thus) plus new fibre from a new supplier – 3x new 9/125 singlemode pairs from Telehouse (1x immediate, 2x by June 2006), 3x new pairs via the Prestons Road route (1x immediate, 2x by June 2006), 4x new pairs routed diversely from the Prestons Road route (June 2006), and 1x new singlemode fibre ring routed RBS-RBX-TCX-TCM-THN.]
So, what’s next?
At the last Seattle NANOG, a Force10 person came and asked:
– “What do you want, 40G or 100G?”
– The answer seemed to be 100G
We can do 40G now:
– Expensively @ OC768
– Cheaply @ 4x10GE
Therefore 40GE is a chocolate kettle
– It’s a waste of devel time (and cash)
Who’s watching the core?
Hey, but can’t we just…
Build fat 8x10G link-agg?
Rate limit/transfer cap users?
Implement QoS?
Throttle p2p apps?
…well, yes, you could.
But it either doesn’t scale, isn’t an option, or is costly and complex (see the link-agg sketch below).
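On the link-agg point specifically: LAG/LACP bundles normally spread traffic by hashing each flow onto a single member link, so an 8x10G bundle still caps any one flow at 10G and can load members unevenly. The sketch below is a minimal illustration of that behaviour; the hash and the MAC pairs are arbitrary stand-ins, not any vendor’s actual algorithm.

```python
# Minimal sketch of how a LAG/LACP bundle spreads traffic: each flow is
# hashed onto ONE member link, so a single large flow never exceeds one
# member's 10G and uneven hashing can leave members idle. The hash and MAC
# addresses below are arbitrary stand-ins, not any vendor's real algorithm.
import hashlib

MEMBERS = 8  # an 8x10GE bundle

def member_for(src_mac: str, dst_mac: str) -> int:
    """Pick a member link for a flow, keyed on the MAC pair."""
    digest = hashlib.md5(f"{src_mac}>{dst_mac}".encode()).digest()
    return digest[0] % MEMBERS

flows = [
    ("00:00:00:00:00:01", "00:00:00:00:00:02"),  # imagine this one is 9 Gbps on its own
    ("00:00:00:00:00:03", "00:00:00:00:00:04"),
    ("00:00:00:00:00:05", "00:00:00:00:00:06"),
]
for src, dst in flows:
    print(f"{src} -> {dst}: pinned to member link {member_for(src, dst)}")
```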
It’s easier to overprovide…
“For a number of years, we seriously explored various ‘quality of service’ schemes, including having our engineers convene a Quality of Service Working Group. Our research came to the conclusion that it was far more cost effective to simply provide more bandwidth. With enough bandwidth in the network, there is no congestion and video bits do not need preferential treatment.”
– Gary Bachula, VP Internet2
…with the right technology
We already need something faster than 10GE
(and 40GE?).
Some networks are already building 8x10GE link-agg bundles on a single span!
Common engineering sense says that your backbone has to be some multiple larger than your largest customer connection
– A LINX member asked about ordering a 2x10G port last week!
Looking Forward
Ethernet rings can have some problems
– All nodes have to be (roughly) equal
– Multiple rings solves most of this
– Still constrained by max link speed/trunk size
Is the Swedish model (unconnected switches) a better way?
– Backplane bandwidth is unrestricted/cheap
– Some redundancy/resiliency challenges
How the Swedes do it
Enabled by the fibre situation in Stockholm
– City run fibre utility/monopoly
Therefore fibre is readily available
Two disconnected switches in different locations
– You get two pairs of fibre when you connect
– One to each switch, in secure underground “cave”
Everything contained in the backplane
Traffic Management
MPLS
– The DIX-IE (Tokyo) is involved in a trial of an MPLS interconnect – using conventional routing (IS-IS) to route the network and LDP to discover endpoints – “mplsASSOCIO”
– Downside is potentially complex config
TRILL (nothing to do with Star Trek)
– IETF working group to support “L2 routing”
– “rbridge”: ISIS for Layer 2, using MAC addresses
– Would solve “wasted” redundant bandwidth (see the toy sketch below)
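As a toy illustration of the rbridge/“L2 routing” idea (a plain breadth-first search, not the actual TRILL protocol): if switches compute shortest paths over the full topology, every link can carry traffic, rather than one link per ring sitting blocked. The five-node ring below reuses the LINX site labels purely as example names.

```python
# Toy illustration of the "L2 routing" idea: compute shortest paths over
# the whole switch topology (a 5-node ring, example names only) so every
# link can carry traffic -- nothing sits blocked the way it does under
# spanning tree or ring protection. This is BFS, not TRILL/rbridge itself.
from collections import deque

ring = {
    "THE": ["THN", "TCM"],
    "THN": ["THE", "RBX"],
    "RBX": ["THN", "TCX"],
    "TCX": ["RBX", "TCM"],
    "TCM": ["TCX", "THE"],
}

def shortest_path(src: str, dst: str) -> list[str]:
    paths, queue = {src: [src]}, deque([src])
    while queue:
        node = queue.popleft()
        for neighbour in ring[node]:
            if neighbour not in paths:
                paths[neighbour] = paths[node] + [neighbour]
                queue.append(neighbour)
    return paths[dst]

# Every adjacent pair uses its direct link; no capacity is "wasted".
for src, dst in [("THE", "TCM"), ("THN", "RBX"), ("TCX", "THE")]:
    print(f"{src} -> {dst}: {' -> '.join(shortest_path(src, dst))}")
```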
What’s going where?
The challenge with a flat L2 network
– Just big broadcast domain(s)
Is it easier to take the bulk flows and give them a dedicated channel?
How to identify these flows?
– The ISP can do it (NetFlow)
– The IXP/MAN can do it (sFlow) – see the sketch below
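A minimal sketch of that identification step (the sample records and MAC addresses are made up): scale each sampled frame back up by the sampling rate and total per source/destination pair to estimate the largest flows that might deserve a dedicated channel.

```python
# Minimal sketch of turning sFlow packet samples into an estimate of the
# largest MAC-to-MAC flows: scale every sampled frame back up by the
# sampling rate and total per pair. The records and addresses are made up.
from collections import Counter

SAMPLE_RATE = 2048   # 1-in-2048 packet sampling (see the next slide)

samples = [  # (src_mac, dst_mac, frame_bytes) from sampled packets
    ("aa:00:00:00:00:01", "aa:00:00:00:00:02", 1518),
    ("aa:00:00:00:00:01", "aa:00:00:00:00:02", 1518),
    ("aa:00:00:00:00:03", "aa:00:00:00:00:04", 64),
]

bytes_by_pair = Counter()
for src, dst, size in samples:
    bytes_by_pair[(src, dst)] += size * SAMPLE_RATE   # scale the sample back up

for (src, dst), estimate in bytes_by_pair.most_common():
    print(f"{src} -> {dst}: ~{estimate / 1e6:.1f} MB estimated on the wire")
```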
sFlow @ 10G
It’s sampled, but it’s still a hell of a lot of data
– Sample rate @ 1 in 2048 packets
– Gives about 60GB per day
– Need an 850G disk to deal with 2 weeks of data
– If traffic doubles in the year, need 1.7TB (see the sketch below)
We actually become constrained by disk I/O
But we’re still deploying it anyway…
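Spelling out the storage arithmetic from this slide as a tiny sketch:

```python
# The slide's storage arithmetic, spelled out: ~60 GB of sFlow records per
# day is roughly 850 GB over a two-week window, and about 1.7 TB if traffic
# (and hence sample volume) doubles over the year.

GB_PER_DAY = 60
RETENTION_DAYS = 14

window_gb = GB_PER_DAY * RETENTION_DAYS
print(f"2 weeks of samples today : ~{window_gb} GB")
print(f"...after traffic doubles : ~{2 * window_gb / 1000:.1f} TB")
```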
Other Scalers
Passive Private Interconnect
– Fibre cross-connects to shed the largest flows
– Cheap (for the IX), easy to implement
– Can run whatever protocol the peers choose
More exchanges
– Could LINX run a third platform?
– More, smaller exchanges? What about critical mass?
“Transmission Only”
– e.g. WDM platforms, stub-sites (no switch)
Move to “Stub” Nodes
Reduce core nodes down to a small number of switches
– Minimise interswitch connectivity
Stub nodes:
– Cheap switch for 100M aggregation
– CWDM terminal for GigE/10G transport
All traffic then hauled to centre
– Pseudo-Swedish with “edges”
“Stub” overview
[Diagram: “Stub” overview – core Foundry BigIron MG8 switches, with DWDM terminals and an aggregation switch at the stub site hauling 100M and GigE connections back to the core]
Pros/Cons of Stubs
Pros
– Easy to set up
– Low commitment required
– Relatively cheap per stub
– May help break into new and “remote” locations
Cons
– Less redundancy/resiliency
– Finite (size of mux/aggr switch)
– Hauls all traffic to core (even local 1G tfc)
– Doesn’t fit ring topology of many fibre builds
Hierarchical Model
Core, Aggregation, Edge layers?
– An expansion of “stubs”, really
More interswitch connectivity needed
– Due to meshed topology
Simple ring topology no longer possible
– May work for “core”, with edge “mesh”
Probably more expensive
– More devices, increased management
Wrapping Up
Some vendors are saying that the next Ethernet standard is 5 years out. Too late!
While edge speed has increased, the core has stood still
– Don’t edge and core vendors talk to each other?
Massive parallel links and “carving off” traffic are tools for dealing with this
– But they add complexity
It seems that keeping things simple remains key
Where are we now?
Questions?