Redundancy for High-Performance Connectivity



Redundancy for High-Performance Connectivity
Dan Magorian
Director of Engineering and Operations
Mid-Atlantic Crossroads
Internet2 Member Meeting
September 21, 2005
What Do We Mean by Redundancy, Anyway?
• Hopefully not what the British mean by redundancy,
which we Americans call “laid off”.
• From the user/customer perspective, it might be:
“Whatever you net geeks need to do to keep my
connection alive when Bad Things Are Happening.
Please don’t bother me with the details.”
• From an Admin and CIO point of view: “All that
expensive stuff you keep asking us to pay for, that
we’re not convinced you really need, but since
redundancy is a sacred cow we can’t argue against.”
That’s fine, but from a Techie Perspective:
• Traditionally, most RONs/gigapops and service providers
in the industry have used layer 3 protection: Abilene has
a partial mesh across the country, MAX has a ring around our
region, most state nets have meshes, etc.
• Each segment on the ring or mesh terminates in a router.
Usually pick up customers there, and can load balance
using MPLS if needed. Not just protection, but making the
most use of expensive resources: why have paths sitting
around unused just for protection? E.g., might be able to
postpone a 10G upgrade w/ multiple OC48 paths (see the
sketch after this slide).
• So with DWDM serving these topologies, there are often more
point-to-point drops than express lambda pass-throughs.
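A minimal sketch of the L3 protection idea above, assuming Python with the networkx package and hypothetical POP names: a ring (or mesh) with edge connectivity of at least 2 survives any single segment failure, so routing can always converge onto the surviving path.

```python
# Sketch: verify a ring of POPs survives any single link failure.
# POP names are hypothetical placeholders, not the actual MAX ring.
import networkx as nx

pops = ["PopA", "PopB", "PopC", "PopD", "PopE"]
ring = nx.Graph()
for a, b in zip(pops, pops[1:] + pops[:1]):   # close the ring
    ring.add_edge(a, b)

# Edge connectivity >= 2 means no single link cut can partition the net,
# so L3 routing can always steer traffic around a failed segment.
print("edge connectivity:", nx.edge_connectivity(ring))   # 2 for a ring

# Brute force: fail each segment in turn and confirm we stay connected.
for a, b in list(ring.edges()):
    g = ring.copy()
    g.remove_edge(a, b)
    assert nx.is_connected(g), f"single failure {a}-{b} partitions the net"
print("ring survives any single segment failure")
```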
What’s Wrong with this Approach?
• Most obvious one, well known to this community, is that
not all applications are well served by best-effort IP.
Some really work better w/ dedicated lambda resources.
• Also, worked fine for years with OC48 networks, but is
proving to be uneconomic with 10G. Routers aren't getting
cheaper; optical is. And most customer circuits are now
Ethernet, so much router functionality at the edges is no
longer needed to pick up customer SONET, ATM, DS3s, etc.
• So, MAX production net has decommissioned routers at
most POPs, and uses DWDM optical backhaul to fewer
“Big Fat routers” (Juniper T640s) in the middle. Still can only
afford to give top-tier customers their own lambdas, so we use
aggregation switches & L2 protection for customers < GigE.
So in the 10G world we’re using L1/L2 protection
• Traditional L3 approach with redundant router interfaces is
just too expensive, at least with Juniper gear.
• So “switch routers” are winning: Force10, Cisco 6500, etc.
Problem is, you lose functionality with non-carrier-class
routers: e.g., can you do IPv6 multicast?
• Bigger Question: Is L2 the right layer to do protection?
• Many “light paths” are being strung together out of a hybrid
of L1 and L2: lambdas daisy-chained into 10G switches
feeding VLANs over shared paths: not very robust (see the
availability sketch after this slide).
• L1 protection can be economic; there's a tradeoff among several schemes:
• One transponder laser w/ optical protect switch
• One CPE interface, two transceiver lasers
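A minimal sketch, in Python, of why the daisy-chained light paths above aren't very robust; the per-element availability numbers are illustrative assumptions, not measured figures. End-to-end availability of elements in series is the product of the parts, so every stitched-in switch or shared-VLAN hop multiplies in another chance to fail.

```python
# Sketch: availability of a light path stitched from segments in series.
# The per-element availabilities are illustrative assumptions only.

def series_availability(parts):
    """End-to-end availability when every element must be up."""
    a = 1.0
    for p in parts:
        a *= p
    return a

lambda_seg = 0.9995   # one lambda segment (assumed)
switch_hop = 0.9990   # one 10G switch / shared-VLAN hop (assumed)

# Three lambdas daisy-chained through two switches:
light_path = [lambda_seg, switch_hop, lambda_seg, switch_hop, lambda_seg]
print(f"daisy-chained light path: {series_availability(light_path):.4%}")
# Each added hop drags the end-to-end number further below any single segment.
```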
Generally, we’ve been promoting “high-9’s”
redundancy to customers
• Two customer routers, connected over two diverse
fiber/lambda paths, to two MAX routers.
• We've had this topology for years with Univ System MD; it really
worked out well for protection against failure of either side's
router or path (see the sketch after this slide).
• Working with a lot of larger customers, e.g. NIH, NASA, JHU,
Census, HHMI, etc., to move to this topology. Problem is, it
costs money and takes time, especially procuring diverse
fiber paths to difficult locations.
• Still doesn't solve the problem of Abilene redundancy.
• MAX has actually had a redundant Abilene connection for
years because we run NGIX/E, but to the same WASH router.
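A minimal sketch of the "high-9's" arithmetic behind the two-router, two-path topology, in Python; the single-path availability figure is an illustrative assumption, not a measured MAX or customer number.

```python
# Sketch: what two diverse paths buy you, assuming independent failures.
# The single-path availability below is an illustrative assumption.

def parallel_availability(paths):
    """Availability when any one of several independent paths suffices."""
    unavail = 1.0
    for a in paths:
        unavail *= (1.0 - a)
    return 1.0 - unavail

single_path = 0.999   # customer router + fiber/lambda path + MAX router (assumed)

print(f"one path:  {single_path:.4%}")
print(f"two paths: {parallel_availability([single_path, single_path]):.6%}")
# Two independent 99.9% paths give roughly 99.9999% ("high-9's"), but only
# if the fiber routes and routers really are diverse and fail independently.
```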
So gigapops/RONs have been talking
about backing each other up
• Lots of ways to do this, with varying costs:
• Private circuits, e.g. NIH is getting an OC3 to Chicago
• RON interconnects via procured fiber connections
• Interconnects using NLR lambdas, e.g. Atlantic Wave.
• Just announced: Qwest backhaul to another Abilene
node. We've been talking with Qwest about a redundant ISP
port offering to the Quilt for minimum cost as well.
• Still lots of unanswered questions about provisioning for
transit capacity and consolidation. How does MAX pay
for increasing the Abilene connection to 10G to handle e.g. PSC
failover? Could we end up with fewer 10G-connected
gigapops, and how will this affect Abilene NG finances?
Ancient Chinese curse:
“May you live in interesting times”
We’ll see how it works out!
Thanks!
[email protected]