coms4119pres


Implementing and Maintaining an ISP Backbone
Kevin Butler
[Figure: Sprint network map. Legend: DS3, OC3, OC12 and OC48 links. Labelled locations: Seattle, Tacoma, Stockton, San Jose, Anaheim, Cheyenne, Kansas City, Fort Worth, Chicago, Roachdale, Atlanta, Orlando, Relay, Pennsauken, New York and Wash. DC. Pearl City in Hawaii is a future network location.]
Tier 1 ISP Backbones
• Comprise some of the world’s largest IP networks
• Tier 1 companies include Sprint, AT&T and PSINet
• UUNET has the world’s largest IP data network (by number of POPs), with a presence on five continents (North and South America, Europe, Asia, Australia)
Service Level Agreements
• SLAs are an important and prestigious tool for attracting and retaining customers
• They consist of uptime guarantees and bounds on latency across various geographic regions
• Most ISPs currently have a monthly average latency below 65 ms between regional hubs in the US
Current SLA latency times
• Looking at the North American backbone over the past 24 hours (ICMP tests):
– UUNET: 64.9 ms
– SprintLink: 69.3 ms
– AT&T: 68.7 ms
– Cable & Wireless: 60.8 ms
– PSINet: 80 ms
source: http://ratings.miq.net
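As a rough check, the 24-hour figures above can be compared against the 65 ms figure from the SLA slide. This Python sketch is only illustrative: a single day of ICMP samples is not the same measurement as a monthly SLA average, and the 65 ms threshold is taken from the slide, not from any carrier's actual contract.

```python
# Latencies (ms) over the past 24 hours, as listed on the slide (ICMP tests).
LATENCY_MS = {
    "UUNET": 64.9,
    "SprintLink": 69.3,
    "AT&T": 68.7,
    "Cable & Wireless": 60.8,
    "PSINet": 80.0,
}

# Assumed threshold: the "< 65 ms monthly average" figure, not a real SLA bound.
THRESHOLD_MS = 65.0


def under_threshold(latencies: dict[str, float], threshold: float) -> list[str]:
    """Return, alphabetically, the networks measured strictly below the threshold."""
    return sorted(name for name, ms in latencies.items() if ms < threshold)


if __name__ == "__main__":
    for name in under_threshold(LATENCY_MS, THRESHOLD_MS):
        print(name)
```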
Supporting the Customer
• The quality and expertise of first-line customer support vary widely between companies
• Depending on size, geographic location and company focus, some front-line support teams are outsourced to third parties
• Some in-house high-level support teams have skills equivalent or superior to those of NOC staff
Network Operations Centres
• Generally the teams responsible for backbone maintenance and support
• Trend towards consolidation into “SuperNOCs” (e.g. one for the Americas, one for Europe)
• Specialisation within the NOC by product (e.g. dial, VPN and backbone NOCs)
NOC Tools
• NOCOL (Network Operations Centre On-Line): freeware for UNIX
• Mediahouse monitoring (mainly web)
• Micromuse Netcool: used by WorldCom, PSINet and BT
Some Circuit Terminology
• DS-1 = 1.544 Mbps; “DS” stands for “digital signal” and refers to the physical-layer signal itself
• Often used interchangeably with “T1”, which refers to the carrier on the line
• DS-3 (T3) = 44.736 Mbps, carrying 28 DS-1s
• PRI (primary rate interface): equivalent to a DS-1, made up of 23 B channels plus one 64 kbps D channel (each B channel is a DS-0 circuit)
• BRI (basic rate interface): 2 B (bearer) channels plus 1 D channel (16 kbps); a B channel carries 56 or 64 kbps, depending on switching limitations
• Note: 24 DS-0s = 1.536 Mbps. The remaining 8 kbps is framing: after one byte has been transferred from each of the 24 channels, a single synchronizing frame bit is added (bit 193), and this 193-bit frame repeats 8,000 times per second
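The DS-1 arithmetic in the note can be checked directly. This Python sketch assumes the standard 8,000 frames per second (one frame every 125 microseconds) with one 8-bit sample per DS-0 per frame; the DS-3 overhead figure is simply the DS-3 line rate minus the 28 DS-1s it carries.

```python
FRAMES_PER_SECOND = 8_000    # one frame every 125 microseconds
BITS_PER_SAMPLE = 8          # each DS-0 contributes one byte per frame
CHANNELS_PER_DS1 = 24
FRAMING_BITS_PER_FRAME = 1   # the synchronizing bit (bit 193)
DS1S_PER_DS3 = 28
DS3_BPS = 44_736_000


def ds0_bps() -> int:
    """A single DS-0 channel: 8 bits x 8,000 frames/s = 64 kbps."""
    return BITS_PER_SAMPLE * FRAMES_PER_SECOND


def ds1_bits_per_frame() -> int:
    """24 one-byte samples plus one framing bit = 193 bits."""
    return CHANNELS_PER_DS1 * BITS_PER_SAMPLE + FRAMING_BITS_PER_FRAME


def ds1_bps() -> int:
    """193 bits x 8,000 frames/s = 1.544 Mbps."""
    return ds1_bits_per_frame() * FRAMES_PER_SECOND


def ds3_overhead_bps() -> int:
    """DS-3 line rate minus the 28 DS-1s it carries."""
    return DS3_BPS - DS1S_PER_DS3 * ds1_bps()


if __name__ == "__main__":
    print("DS-0:", ds0_bps())
    print("24 x DS-0:", CHANNELS_PER_DS1 * ds0_bps())
    print("DS-1 line rate:", ds1_bps())
    print("DS-1 framing:", ds1_bps() - CHANNELS_PER_DS1 * ds0_bps())
    print("BRI data (2 B channels):", 2 * ds0_bps())
    print("DS-3 overhead:", ds3_overhead_bps())
```

The 1,504,000 bps left over in a DS-3 is multiplexing overhead (framing and bit stuffing) in the DS-3 hierarchy rather than customer payload.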
Optical Carrier
• OC-x rates based on multiplexing SONET streams
• SONET – synchronous optical network: defines a
standard optical TDM system with common
standards and compatibility across continents
(devised at Bellcore) – Europe uses SDH, very
similar to SONET
• OC-3 = 155.52 Mbps; rates commonly go up in multiples of four in North America and Europe (OC-12 = 622 Mbps, OC-48 ~ 2.5 Gbps, OC-192 ~ 10 Gbps)
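SONET line rates are exact multiples of the OC-1 (STS-1) rate of 51.84 Mbps, which is where the rounded figures above come from. A small Python sketch, working in kbps so the arithmetic stays exact; the SDH mapping (STM-n carries the same line rate as OC-3n) is included for the European side.

```python
OC1_KBPS = 51_840  # OC-1 / STS-1 line rate in kbps


def oc_rate_kbps(n: int) -> int:
    """Line rate of OC-n: n times the OC-1 rate."""
    if n < 1:
        raise ValueError("OC level must be a positive integer")
    return n * OC1_KBPS


def stm_to_oc(n: int) -> int:
    """SDH STM-n has the same line rate as SONET OC-(3n)."""
    return 3 * n


if __name__ == "__main__":
    for level in (3, 12, 48, 192):
        print(f"OC-{level}: {oc_rate_kbps(level) / 1000:.2f} Mbps")
```

Running it prints 155.52, 622.08, 2488.32 and 9953.28 Mbps, which is why OC-48 and OC-192 are usually quoted as roughly 2.5 and 10 Gbps.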
Dial Access
• Dial is a major selling point, especially with customers who travel a lot or who are themselves ISPs
• Connections are made through a dial concentrating unit, e.g. the Ascend (Lucent) MAX TNT, which can support up to 720 concurrent callers
• The back end is a DS-3 into a backbone router, and routes are advertised by an IGP (e.g. RIP)
Dial-Related Technologies
• COBRA (Central Office Based Remote Access) allows virtual POPs to be built by backhauling PRIs
• RADIUS (Remote Authentication Dial In User Service) authenticates users and can provide some routing and netblock information about the customer logging in
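To make the RADIUS step more concrete, here is a Python sketch of how a dial server (NAS) could encode an Access-Request and hide the user's password, following the packet layout and User-Password algorithm in RFC 2865. It is deliberately partial: there is no UDP transport, no checking of the server's reply, and none of the attributes (such as Framed-IP-Address or Framed-Route) a server would use to return routing and netblock information. The user name, password and shared secret in the test are made up.

```python
import hashlib
import os
import struct
from typing import Optional

ACCESS_REQUEST = 1       # RADIUS packet code (RFC 2865)
ATTR_USER_NAME = 1       # attribute type numbers (RFC 2865)
ATTR_USER_PASSWORD = 2


def hide_password(password: bytes, secret: bytes, authenticator: bytes) -> bytes:
    """Obscure User-Password: XOR 16-octet blocks with chained MD5 digests."""
    # Pad with NULs to a multiple of 16 octets (at least 16).
    padded_len = max(16, (len(password) + 15) // 16 * 16)
    padded = password.ljust(padded_len, b"\x00")
    hidden = bytearray()
    previous = authenticator  # first block is keyed on the Request Authenticator
    for i in range(0, len(padded), 16):
        key = hashlib.md5(secret + previous).digest()
        block = bytes(p ^ k for p, k in zip(padded[i:i + 16], key))
        hidden += block
        previous = block      # later blocks are keyed on the previous ciphertext
    return bytes(hidden)


def reveal_password(hidden: bytes, secret: bytes, authenticator: bytes) -> bytes:
    """Inverse of hide_password, as the server would compute it (strips NUL padding)."""
    plain = bytearray()
    previous = authenticator
    for i in range(0, len(hidden), 16):
        block = hidden[i:i + 16]
        key = hashlib.md5(secret + previous).digest()
        plain += bytes(c ^ k for c, k in zip(block, key))
        previous = block
    return bytes(plain).rstrip(b"\x00")


def attribute(attr_type: int, value: bytes) -> bytes:
    """Type (1 octet), Length (1 octet, includes this header), Value."""
    return struct.pack("!BB", attr_type, len(value) + 2) + value


def access_request(identifier: int, user: bytes, password: bytes,
                   secret: bytes, authenticator: Optional[bytes] = None) -> bytes:
    """Build Code, Identifier, Length, Request Authenticator, then attributes."""
    if authenticator is None:
        authenticator = os.urandom(16)  # should be unpredictable
    attrs = (attribute(ATTR_USER_NAME, user)
             + attribute(ATTR_USER_PASSWORD,
                         hide_password(password, secret, authenticator)))
    header = struct.pack("!BBH", ACCESS_REQUEST, identifier, 20 + len(attrs))
    return header + authenticator + attrs
```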
Integrated Services Digital Network
• ISDN customers authenticate by RADIUS, similar to dial users
• Most customers use BRI (2 B channels for a 128 kbps data rate)
• The underlying architecture is similar, but dial equipment is often administered differently
• ISDN is maintained within the same AS as the backbone, whereas dial is often in its own AS
DS-1 and high-speed access
• Customer connections are usually multiplexed and come into the DSU (data service unit) as a channelised DS-3
• Gateway routers on the ISP side are usually Cisco 7500 series, increasingly Cisco 12000
• Customers connect using Cisco 1604, 2621 and some 3600 series routers; very large customers use 7500 series routers
Gateway Routers
• Obtain routes from customers usually statically, but sometimes by BGP
• Usually run a link-state IGP within the AS (e.g. OSPF, IS-IS)
• The Cisco 7513 backplane carries 1.8 Gbps, while the 12008 does 40 Gbps
Where does traffic go from here?
• Most ISPs have two levels of networks above the access router
• Metropolitan networks aggregate gateway traffic, generally city-wide if there are multiple points of presence (POPs) in the city
• Transit networks aggregate the metro networks’ traffic and are responsible for inter-city transport
The Big Picture
[Figure: three-tier diagram with TRANSIT, METRO and EDGE layers; routers labelled TR, TA, XR, HA, DR and GW]
POPs and NAPs as real estate
• Often located in the centre of cities (e.g. the Ameritech NAP in Chicago)
• 60 Hudson St, NYC is a “telco hotel”: a large number of telecoms companies have equipment there
• Industrial buildings (because of high HVAC use), often nondescript (for both cost and security reasons)
ATM Switches
• Terminate long-haul OC-12 and OC-48 circuits and metro rings
• Choice of vendor depends on the ISP; commonly Newbridge or Fore Systems (ASX-1000 and ASX-4000)
Example of an ATM interface
TR1.EG1:
interface ATM2/0
 description To HA13.BLAH1 3C1
 atm vc-per-vp 512
 atm pvc 16 0 16 ilmi
!
interface ATM2/0.195 point-to-point
 description To XR1.BLAH1 ATM6/0
 ip address 146.188.200.98 255.255.255.252
 ip router isis Net-Backbone
 atm pvc 195 0 195 aal5snap
 clns router isis Net-Backbone
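The point-to-point subinterface uses a 255.255.255.252 mask, i.e. a /30 with exactly two usable addresses. A short Python sketch using the standard ipaddress module works out the subnet and the other usable address; it assumes, as is conventional but not shown in the config, that the far end (XR1.BLAH1 ATM6/0) holds that other address.

```python
import ipaddress


def describe_p2p(addr_with_mask: str) -> tuple[str, int, str]:
    """Return (subnet, prefix length, peer address) for a /30 point-to-point link."""
    iface = ipaddress.ip_interface(addr_with_mask)  # accepts "address/netmask"
    hosts = list(iface.network.hosts())
    if len(hosts) != 2 or iface.ip not in hosts:
        raise ValueError("expected a usable address on a /30 link")
    peer = hosts[0] if hosts[1] == iface.ip else hosts[1]
    return str(iface.network), iface.network.prefixlen, str(peer)


if __name__ == "__main__":
    print(describe_p2p("146.188.200.98/255.255.255.252"))
```

For the interface above this yields the subnet 146.188.200.96/30 and a peer address of 146.188.200.97.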
Tying it all Together
• ATM devices perform switching at layer two
• Within regional areas, routers use intra-domain routing protocols
• To communicate with other regions and across peering points, an inter-domain routing protocol is used
Slash Notation
• Subnet masks can be an unwieldy thing to deal
with, eg. 255.255.255.240
• Slash notation simplifies this: the number after the slash is the number of leading 1 bits in the mask, i.e. how many high-order bits of the address are ANDed with 1s to create the network identifier
– 192.168.1.0 255.255.255.0 = 192.168.1.0/24
• Nifty trick: the number of hosts in a netblock is easy to determine with slash notation: # usable hosts in a /x = 2^(32-x) - 2 (the network and broadcast addresses cannot be assigned to hosts)
• Therefore, there are 256 addresses in a /24, 254
usable
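The mask-to-prefix conversion, the AND that produces the network identifier, and the host-count formula each fit in a few lines of Python. This sketch does the AND by hand for illustration and uses the standard ipaddress module for parsing and cross-checking; note that the 2^(32-x) - 2 formula only makes sense up to /30, since /31 and /32 are special cases.

```python
import ipaddress


def mask_to_prefix(mask: str) -> int:
    """Count the leading 1 bits of a dotted-quad mask, e.g. 255.255.255.240 -> 28."""
    return ipaddress.IPv4Network(f"0.0.0.0/{mask}").prefixlen


def network_id(address: str, prefix_len: int) -> str:
    """AND the address with a mask of prefix_len leading 1 bits."""
    mask = (0xFFFFFFFF << (32 - prefix_len)) & 0xFFFFFFFF
    return str(ipaddress.IPv4Address(int(ipaddress.IPv4Address(address)) & mask))


def usable_hosts(prefix_len: int) -> int:
    """2^(32 - x) - 2: all addresses minus the network and broadcast addresses."""
    if not 0 <= prefix_len <= 30:
        raise ValueError("formula applies to /0 through /30 only")
    return 2 ** (32 - prefix_len) - 2
```

For example, 255.255.255.240 is a /28, which leaves 2^4 - 2 = 14 usable hosts.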
Routing Protocols
• Intra-domain (IGPs)
– Distance-vector (RIP, IGRP; EIGRP is an advanced distance-vector protocol)
– Link-state (OSPF, IS-IS)
• Inter-domain (EGPs)
– Path-vector: BGP
• All else being equal, prefers routes that cross fewer autonomous systems, so each route carries a vector of AS numbers (the AS path) rather than just the next IP address
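A minimal Python sketch of the path-vector idea: each route carries its AS path, a router discards any path that already contains its own AS (which is how path-vector protocols avoid loops), prefers the shortest remaining path, and prepends its own AS when passing the route on. This models only the AS-path-length comparison; real BGP applies several other attributes (weight, local preference and so on) first. The private AS numbers 64512 to 64515 used in the test are hypothetical; 6327 and 6172 appear in this presentation's BGP table excerpt.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Route:
    prefix: str
    as_path: tuple[int, ...]  # leftmost AS is the neighbour that sent the route


def acceptable(route: Route, local_as: int) -> bool:
    """Loop prevention: reject a path that has already passed through our AS."""
    return local_as not in route.as_path


def best_route(candidates: list[Route], local_as: int) -> Optional[Route]:
    """Among loop-free candidates, pick the one crossing the fewest ASes."""
    usable = [r for r in candidates if acceptable(r, local_as)]
    if not usable:
        return None
    return min(usable, key=lambda r: len(r.as_path))


def advertise(route: Route, local_as: int) -> Route:
    """When passing a route to an external neighbour, prepend our own AS."""
    return Route(route.prefix, (local_as,) + route.as_path)
```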
Autonomous Systems
• An autonomous system (AS) is a group of routers
with a single routing policy, running under a
single administration
• Different ISPs, and large companies, can have
their own AS number
• Where to get a number? In North America, ARIN
(American Registry for Internet Numbers), in
Europe, RIPE (Réseaux IP Européens), in Asia
APNIC (Asia-Pacific Network Information
Centre) – also the places for getting IP addresses
Implementation of BGP
• BGP runs between autonomous systems and peers, as well as with multi-homed customers
• A large monolithic AS can be broken up into a BGP confederation of sub-ASes for easier management (this avoids needing a full IBGP mesh across the whole AS)
• Why BGP? Policies can be defined and routes controlled to a highly customisable degree using access lists and route maps: one can choose which routes to distribute to which neighbours
• BGP can also run inside an AS: internal BGP (IBGP) carries the routes for transit traffic across the AS (like an Interstate through a county)
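The per-neighbour control described above is usually expressed on the router with prefix lists or route maps. As a loose Python model of that idea (not any vendor's syntax or exact semantics), each neighbour gets an ordered list of (action, covering prefix, maximum prefix length) entries; the first matching entry decides, and anything unmatched is denied. The neighbour names and policies are hypothetical; the prefixes come from this presentation's BGP table excerpt.

```python
import ipaddress

# Ordered entries: (action, covering prefix, longest prefix length allowed).
PrefixList = list[tuple[str, str, int]]

# Hypothetical export policies for two neighbours.
POLICIES: dict[str, PrefixList] = {
    "customer-a": [("permit", "0.0.0.0/0", 32)],   # send every route
    "peer-b": [("permit", "24.64.0.0/14", 14)],    # send only the aggregate
}


def permitted(prefix: str, plist: PrefixList) -> bool:
    """First matching entry wins; no match means deny."""
    net = ipaddress.ip_network(prefix)
    for action, covering, max_len in plist:
        if net.subnet_of(ipaddress.ip_network(covering)) and net.prefixlen <= max_len:
            return action == "permit"
    return False


def routes_for(neighbour: str, routes: list[str]) -> list[str]:
    """Filter the routes we would advertise to one neighbour."""
    return [r for r in routes if permitted(r, POLICIES[neighbour])]
```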
BGP
Communities are groups of destinations that share common attributes (e.g. tagged through access-list filters), so that policy can be applied to them as a set
BGP table version is 23718690, local router ID is 205.150.242.2
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
*>i24.64.0.0/19     198.133.49.7                  100      0 6327 6172 i
*>i24.64.0.0/14     198.133.49.7                  100      0 6327 i
*>i24.64.32.0/19    198.133.49.7                  100      0 6327 6172 i
*>i24.64.64.0/19    198.133.49.7                  100      0 6327 6172 i
*>i24.64.96.0/19    198.133.49.7                  100      0 6327 6172 i
*>i24.64.192.0/19   198.133.49.7                  100      0 6327 6172 i
*>i24.64.224.0/19   198.133.49.7                  100      0 6327 6172 i
*>i24.65.0.0/19     198.133.49.7                  100      0 6327 6172 i
*>i24.65.96.0/19    198.133.49.7                  100      0 6327 6172 i
*>i24.65.128.0/19   198.133.49.7                  100      0 6327 6172 i
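Note the overlapping entries in this table: 24.64.0.0/14 covers every /19 listed. When forwarding, a router uses the most specific (longest) matching prefix, so traffic for an address inside one of the /19s follows that /19, and anything else in the /14 falls back to the aggregate. A Python sketch of that lookup over the prefixes and AS paths shown; the linear scan is for clarity only, whereas real routers use specialised data structures or hardware.

```python
import ipaddress
from typing import Optional

# (prefix, AS path) pairs taken from the table above.
TABLE = [
    ("24.64.0.0/19", "6327 6172"),
    ("24.64.0.0/14", "6327"),
    ("24.64.32.0/19", "6327 6172"),
    ("24.64.64.0/19", "6327 6172"),
    ("24.64.96.0/19", "6327 6172"),
    ("24.64.192.0/19", "6327 6172"),
    ("24.64.224.0/19", "6327 6172"),
    ("24.65.0.0/19", "6327 6172"),
    ("24.65.96.0/19", "6327 6172"),
    ("24.65.128.0/19", "6327 6172"),
]


def lookup(address: str) -> Optional[tuple[str, str]]:
    """Return (prefix, AS path) of the longest matching prefix, or None."""
    addr = ipaddress.ip_address(address)
    matches = [(ipaddress.ip_network(p), path) for p, path in TABLE
               if addr in ipaddress.ip_network(p)]
    if not matches:
        return None
    net, path = max(matches, key=lambda m: m[0].prefixlen)
    return str(net), path
```

For instance, 24.64.40.1 matches both the /14 and 24.64.32.0/19, and the /19 wins; 24.64.130.5 matches only the /14.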
Advantages of BGP for User
• Allows for load-sharing and redundancy
• Routes can be biased through AS path prepending (adding one's own AS number to the AS path multiple times to make the route look longer and therefore less favourable)
• The requirement is a high-quality router with close to 100% uptime, to avoid connection flaps and subsequent route dampening (BGP gets annoyed if connections go up and down frequently and will penalise the offending network)
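Route flap dampening is commonly modelled as a penalty that grows with each flap and decays exponentially: once the penalty exceeds a suppress limit the route is withheld, and it is re-advertised when the penalty decays below a reuse limit. The Python sketch below assumes a penalty of 1000 per flap, a 15-minute half-life, a suppress limit of 2000 and a reuse limit of 750; these mirror commonly cited Cisco IOS defaults but are assumptions here, and the sketch ignores details such as the maximum suppress time and different penalties for withdrawals versus attribute changes.

```python
import math

HALF_LIFE_MIN = 15.0    # assumed half-life (minutes)
FLAP_PENALTY = 1000.0   # assumed penalty added per flap
SUPPRESS_LIMIT = 2000.0
REUSE_LIMIT = 750.0


class DampenedRoute:
    """Tracks an exponentially decaying flap penalty for one prefix."""

    def __init__(self) -> None:
        self.penalty = 0.0
        self.last_update = 0.0  # minutes
        self.suppressed = False

    def _decay_to(self, now: float) -> None:
        """Halve the penalty for every half-life elapsed since the last update."""
        elapsed = now - self.last_update
        self.penalty *= 0.5 ** (elapsed / HALF_LIFE_MIN)
        self.last_update = now
        if self.suppressed and self.penalty < REUSE_LIMIT:
            self.suppressed = False  # decayed enough: the route may be reused

    def flap(self, now: float) -> None:
        """Record a flap at time `now` (minutes)."""
        self._decay_to(now)
        self.penalty += FLAP_PENALTY
        if self.penalty > SUPPRESS_LIMIT:
            self.suppressed = True

    def is_suppressed(self, now: float) -> bool:
        self._decay_to(now)
        return self.suppressed


def minutes_until_reuse(penalty: float) -> float:
    """Time for `penalty` to decay below the reuse limit."""
    if penalty <= REUSE_LIMIT:
        return 0.0
    return HALF_LIFE_MIN * math.log2(penalty / REUSE_LIMIT)
```

With flaps at 0, 1 and 2 minutes, the penalty reaches roughly 2867 and the route is suppressed; it is still suppressed at 20 minutes and usable again by 40 minutes.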
Common Customer Issues
• Static routes on the backbone: often difficult to spot, and they can cause very strange routing results (very conducive to routing loops)
• Pull-up routes for netblocks smaller than a /24 are required to avoid BGP dampening (smaller customers tend to reset their equipment more often)
• BGP recalculations: if done on a transit router, entire backbone segments can experience outages (tables are huge, currently over 103,000 prefixes)
Customer Requirements of the Backbone
• Redundancy: networks are redundant, but card failures can take down whole routers
• The physical connection from the customer to the POP is a single point of failure (SPOF)
• Low latency: massive increases in demand on the backbone make this difficult
• Over $2 million a day is spent on global backbone upgrades
DSL: low cost, high speed
• DSL might phase out ISDN connections
• Difficult to troubleshoot from a network standpoint
• Connections pass through the telco’s frame relay or ATM cloud between the DSLAM (DSL access multiplexer, which separates voice and data traffic by frequency) and the VR
• The RedBack SMS (Subscriber Management System) 1000 is commonly used as the VR, though currently the SMS 10000 is the largest “carrier-class” routing switch (it can take in 24 OC-12s)
RedBack SMS 1000
• Supports up to 4000 sessions
• OC-3 out to metro network
• traffic-shaping accomplished with profiles
atm profile samplecust
 counters
 shaping vbr-nrt pcr 1000 cdvt 100 scr 100 bt 10
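The shaping line names the usual ATM VBR parameters: peak cell rate (pcr), cell delay variation tolerance (cdvt), sustainable cell rate (scr) and burst tolerance (bt). One standard way to reason about them is the generic cell rate algorithm (GCRA) in its virtual-scheduling form, applied twice: once against PCR with CDVT and once against SCR with BT. The Python sketch below is a simplified dual-GCRA shaper that delays each cell until both tests pass. Time is in abstract integer ticks and the parameters in the test are hypothetical; the slide does not say which units the RedBack CLI uses, so they are not mapped to the samplecust values, and the formal ATM Forum conformance definitions include details omitted here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Gcra:
    """GCRA, virtual-scheduling form: increment I = 1/rate, limit L = tolerance."""
    increment: int
    limit: int
    tat: Optional[int] = None  # theoretical arrival time of the next cell

    def earliest(self, t: int) -> int:
        """Earliest time >= t at which a cell would conform."""
        if self.tat is None:
            return t
        return max(t, self.tat - self.limit)

    def commit(self, t: int) -> None:
        """Record a conforming cell sent at time t."""
        base = t if self.tat is None else max(t, self.tat)
        self.tat = base + self.increment


@dataclass
class VbrShaper:
    peak: Gcra       # GCRA(1/PCR, CDVT)
    sustained: Gcra  # GCRA(1/SCR, BT), simplified

    def send_time(self, arrival: int) -> int:
        """Delay a cell until it conforms to both the peak and sustained tests."""
        t = max(self.peak.earliest(arrival), self.sustained.earliest(arrival))
        self.peak.commit(t)
        self.sustained.commit(t)
        return t
```

With a peak of one cell per tick (no tolerance) and a sustained rate of one cell per 4 ticks (tolerance 8), five cells arriving together at tick 0 leave at ticks 0, 1, 2, 4 and 8: a short burst at the peak rate, then spacing at the sustained rate.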
Increasing Capacity
• Backbone capacity increasing at a huge rate
• Traffic engineering combined with high-capacity backplanes is becoming increasingly important
• Many ISPs are turning to Juniper routers
• UUNET rolled out production OC-192c using Juniper M160s running MPLS
Juniper Routers
• Specialises in very large routers (M160 backplane: 160 Gbps)
• JUNOS supports MPLS and RSVP
isis {
    interface all;
}
ospf {
    area 0.0.0.0 {
        interface so-0/0/0 {
            metric 15;
            retransmit-interval 10;
            hello-interval 5;
        }
    }
}
[edit]
Network Abuse
• Spam-killing: finding the sender’s IP address in the SMTP headers and null-routing it
• Open relay detection: ORBS et al.
• DDoS attacks can be very detrimental to the backbone (even causing switch crashes)
• Combated by rate-limiting ICMP on routers
• The most effective defence is community-wide egress filtering, which requires co-operation throughout the Internet
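Egress filtering means a network only forwards packets outward if their source address belongs to address space it actually serves, so spoofed-source DDoS traffic is dropped close to where it originates. On routers this is done with access lists or reverse-path checks; the Python sketch below shows only the per-packet decision at a customer edge. The netblocks are hypothetical documentation ranges.

```python
import ipaddress

# Hypothetical netblocks assigned to one customer (documentation ranges).
CUSTOMER_BLOCKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/25"),
]


def source_allowed(src: str, blocks: list = CUSTOMER_BLOCKS) -> bool:
    """Permit a packet only if its source address falls in an assigned block."""
    addr = ipaddress.ip_address(src)
    return any(addr in block for block in blocks)


def filter_packets(sources: list[str]) -> list[str]:
    """Keep packets with legitimate sources; spoofed ones are dropped."""
    return [s for s in sources if source_allowed(s)]
```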
Network Challenges, e.g. Canada
• Geographically, the population lives in virtually a straight line across the south
• The major focus is on southbound capacity to the US
• CRTC regulations on telcos create different arrangements
• Heterogeneous network to the US; integration is a big issue
Costs
• Network equipment is not cheap: a Cisco GSR can cost upwards of a quarter million dollars
• Fibre can be expensive to lay ($100K/mile near rail, over $300K/mile in the city), and transceivers add to the cost
• Interesting note: Sprint grew its all-fibre network quickly because it was laid on railway right-of-way (the SPR in Sprint initially stood for Southern Pacific Railroad)
• Costs for backbone access? Currently ~ $1300 CDN + local loop cost for a burstable 128k T1, up to ~ $50K CDN for a full T3, and much more for OC3+ (USD costs similar)
Questions?
• Anything I can clarify or expand on...
• Thank you!