Dan Golding Presentation

Crafting Confederations
An overview of the Confederation POP Approach
to Network Architecture
Dan Golding
NetRail, Inc.
Miguel Dimayuga
Earthlink, Inc.
The Old Way…
Conventional Network Routing Architectures….
• Full Mesh iBGP or Route Reflectors
• A fully meshed Network via ATM PVCs.
What’s Wrong With The Old Way?
• It’s not adapted to the New Optical Network!
• POS is here in force, ATM’s value in the core is
receding.
• It is far more fragile, and far less agile than newer
methods of Inter-domain Routing.
• The Old Way was prone to user-error. The E-Commerce Revolution demands a New Way!
A Better Way
• Emphasizes Large Scale, IP Based, Fiber Ring
Networks
• Optimized for Service Provider Needs
• Utilizes cutting edge routing technologies to
provide far greater fault tolerance and usable
traffic engineering.
• Implemented via advanced BGP techniques:
Communities and Confederations.
How the Old Way worked…
(Full Mesh iBGP)
• Every router must be fully meshed with all others.
• Works well in small systems
• Grows quadratically: n routers need n(n-1)/2 sessions
• Eventually consumes all CPU, memory, and engineering resources.
[Figure: Full iBGP Mesh, illustrating the rapid growth in sessions]
How the Old Way worked…
(Route Reflectors)
• Scaled Well
• Well suited to fully
meshed ATM
Networks – Star
Topology.
but...
• Not Survivable in
a Fiber Ring
Network.
[Figure: Peer Isolation with BGP Route Reflection. Labels: Peers, RR Client, RR Server, Peers]
How the Old Way worked…
(Filtering)
• List of IP Prefixes and/or AS numbers set on all
border routers to other ISPs. Only the access-list
contents would be advertised.
• Worked well when most customers were single-homed and didn’t run BGP.
• Changes were VERY manpower intensive.
• With multi-homed e-commerce shops, no longer
feasible.
How the New Way works…
(Confederations)
• Routers peer
with neighbors
• Highly
Survivable
• Very Scalable
• Easily
Configured
• Aids
Troubleshooting
[Figure: BGP Confederations. Routers peer with neighbors. Labels: Peers, Peers]
Confederation Overview
• BGP allows three types of peer relationships:
– iBGP (Full iBGP mesh)
– eBGP (External Peering or Transit)
– Confederation eBGP (it’s iBGP with an eBGP look!)
• Confederation eBGP is like regular eBGP, except
– Next Hop, Local Preference and MEDs are preserved
– Confederation elements in the AS-PATH are not counted for route
selection purposes
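
A minimal sketch of the last point, assuming a made-up in-memory representation of an AS path (the `Segment` tuples and the `"confed"`/`"seq"` labels are illustrative, not any router's data model). It shows that a route carried across several sub-ASes is no "longer" for selection purposes than the same route learned directly.

```python
from typing import List, Tuple

# A path is a list of segments; each segment is (kind, [AS numbers]).
# "confed" marks a confederation segment (routers show these in parentheses),
# "seq" marks an ordinary AS sequence. These labels are hypothetical.
Segment = Tuple[str, List[int]]


def selection_length(as_path: List[Segment]) -> int:
    """AS-path length used for route selection: confederation segments count as 0."""
    return sum(len(asns) for kind, asns in as_path if kind == "seq")


def render(as_path: List[Segment]) -> str:
    """Format a path the way it is usually displayed, e.g. '(65012 65000) 701 3703'."""
    parts = []
    for kind, asns in as_path:
        text = " ".join(str(asn) for asn in asns)
        parts.append(f"({text})" if kind == "confed" else text)
    return " ".join(parts)


# The same external route, seen three sub-ASes deep and seen at the border.
inside = [("confed", [65400, 65012, 65000]), ("seq", [701, 3703])]
direct = [("seq", [701, 3703])]

print(render(inside), "->", selection_length(inside))
print(render(direct), "->", selection_length(direct))
```

Both paths compare as length 2, so crossing POPs inside the confederation does not penalize a route.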
Confederation Overview
• Confederations allow groups of routers to form “sub-autonomous systems” to eliminate scaling problems with
full mesh iBGP
• All Routers within a sub-AS must be fully meshed (or
optionally in a route reflector cluster configuration)
• Confederations are most advantageous when there are few
routers per sub-AS. There is no reason to limit the number
of sub-AS’s you have – nothing is gained.
Confederation Overview
• Most confederation designs start out with only two
or three sub-ASes. This offers few advantages
over full mesh iBGP in a ring network topology.
• The more sub-ASes you add, the greater the
advantage
• The final result: One sub-AS per POP
• The upper limit on this is about 1000 sub-ASes if they are numbered from the private AS range (64512-65535) set aside by RFC 1930
The Advantages of a Confederation of POPs
• The routers within each POP need only peer with
each other, utilizing iBGP
• Neighboring POPs are peered with via POP border
routers speaking confederation eBGP
• Next Hop, Local Pref and MEDs are preserved
• More survivable than Route Reflectors
• Far more scalable than full iBGP mesh
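
To put rough numbers on the scaling claim, here is a small sketch. It assumes every POP is fully meshed internally and that the POPs sit on a single ring with one confederation eBGP session per ring segment; both assumptions, and the function names, are illustrative. Real designs with redundant border routers would have somewhat more sessions.

```python
def full_mesh_sessions(routers: int) -> int:
    """iBGP sessions needed to fully mesh `routers` routers: n(n-1)/2."""
    return routers * (routers - 1) // 2


def confederation_sessions(pops: int, routers_per_pop: int) -> int:
    """Sessions with one sub-AS per POP on a ring (assumed topology).

    Each POP is fully meshed internally; adjacent POPs share one
    confederation eBGP session.
    """
    intra_pop = pops * full_mesh_sessions(routers_per_pop)
    # A ring of 3 or more POPs has as many segments as POPs;
    # two POPs share a single link, one POP has none.
    ring_links = pops if pops >= 3 else max(pops - 1, 0)
    return intra_pop + ring_links


pops, per_pop = 20, 4
print("full mesh     :", full_mesh_sessions(pops * per_pop))
print("confederation :", confederation_sessions(pops, per_pop))
```

For 20 POPs of 4 routers each, that is 3160 sessions for a network-wide full mesh against 140 for the confederation of POPs.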
How to Make It Work
• Thoughtful use of sub-AS numbers
• Local Preference Hierarchy
• Useful and Descriptive Community Strings
• Meaningful MEDs
• Use of various policies – via access lists, community lists, etc – as building blocks
• Use of Peer Groups whenever implementation allows.
Sub-AS Assignment
• Sub-AS’s become useful tools for debugging –
show ip bgp, show route
• Suggested assignment is geographical
• Always remember to keep room for expansion!
• Put plenty of extra sub-AS’s in your configs –
don’t count on adding them later!
Geographical Region as sub-AS
• Southeast              65000-65099
• Northeast              65100-65199
• Northcentral           65200-65299
• Southcentral           65300-65399
• Western                65400-65499
• Canadian               65500-65535
• Latin/South American   64512-64599
• European               64600-64699
• Asian                  64700-64799
• Reserved               64800-64999
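
Because the sub-AS number encodes geography, a confederation path seen in `show ip bgp` can be read as a list of regions. The sketch below assumes the regions and ranges on this slide pair up in the order listed; the table and function names are hypothetical.

```python
from typing import List

# (first sub-AS, last sub-AS, region), paired in slide order (assumption).
SUB_AS_REGIONS = [
    (65000, 65099, "Southeast"),
    (65100, 65199, "Northeast"),
    (65200, 65299, "Northcentral"),
    (65300, 65399, "Southcentral"),
    (65400, 65499, "Western"),
    (65500, 65535, "Canadian"),
    (64512, 64599, "Latin/South American"),
    (64600, 64699, "European"),
    (64700, 64799, "Asian"),
    (64800, 64999, "Reserved"),
]


def region_for_sub_as(sub_as: int) -> str:
    """Return the region a sub-AS number belongs to."""
    for low, high, region in SUB_AS_REGIONS:
        if low <= sub_as <= high:
            return region
    raise ValueError(f"{sub_as} is not in any assigned sub-AS range")


def describe_confed_path(sub_as_path: List[int]) -> List[str]:
    """Translate a confederation AS path into the regions it crossed."""
    return [region_for_sub_as(sub_as) for sub_as in sub_as_path]


# A route seen in Seattle that came via Chicago from New York.
print(describe_confed_path([65405, 65200, 65100]))
```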
Sample Community Assignments
• SEA   65405
• CHI   65200
• CLE   65203
• DEN   65406
• OAK   65407
• DC    65101
• PHX   65401
• BOS   65102
• NYC   65100
• RTP   65005
• LAX   65400
• DAL   65300
• ATL   65000
[Other labels on the map: msp, 4xT1, 354]
Community Strings are the Key
• Communities are “tags” or “post-it notes” attached to
routes to help identify them.
– There can be more than one community attached to a route.
• Communities are recommended to be set at the ingress
point.
– Communities need be applied only once
– administrative burden and complexity is greatly reduced.
• When routes egress, filtering can be based on one or more
community strings.
• Sample Communities – Regional, by Peer, Customer,
Internal, Peer, Transit
Communities Set at Ingress
router bgp 4355
 network 207.69.0.0/16 route-map make-green
 network 199.174.166.0/24 route-map make-red

router bgp 4355
 neighbor a.a.a.a remote-as 701
 neighbor a.a.a.a route-map make-blue in

[Figure: AS701 (transit) advertising to AS4355]
AS701 routes:
  4.0.0.0/8          i
  5.0.0.0/8          i
AS4355 routes:
  207.69.0.0/16      i
  198.99.146.0/24    i
  4.0.0.0/8          701 i
  5.0.0.0/8          701 i
Communities Used to Filter on Egress
router bgp 4355
 neighbor b.b.b.b remote-as 3703
 neighbor b.b.b.b route-map blue-green out

[Figure: AS701 (transit) to AS4355 to AS3703 (customer)]
AS701 routes:
  4.0.0.0/8          i
  5.0.0.0/8          i
AS4355 routes:
  207.69.0.0/16      i
  198.99.146.0/24    i
  4.0.0.0/8          701 i
  5.0.0.0/8          701 i
AS3703 routes:
  4.0.0.0/8          4355 701 i
  5.0.0.0/8          4355 701 i
  207.69.0.0/16      4355 i
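
The two slides above amount to "tag once on the way in, match on the way out". A minimal sketch of that flow follows, using the colour names from the route-maps as stand-in community values. The route table structure and function names are hypothetical, and the red route is assumed to be the /24 shown in the diagram.

```python
from typing import Dict, List, Set

# prefix -> set of community "colours" attached to the route
RouteTable = Dict[str, Set[str]]


def tag_at_ingress(table: RouteTable, prefix: str, community: str) -> None:
    """Attach a community where the route enters the network."""
    table.setdefault(prefix, set()).add(community)


def egress_filter(table: RouteTable, allowed: Set[str]) -> List[str]:
    """Advertise only routes carrying at least one allowed community."""
    return sorted(prefix for prefix, tags in table.items() if tags & allowed)


table: RouteTable = {}
tag_at_ingress(table, "207.69.0.0/16", "green")    # make-green
tag_at_ingress(table, "198.99.146.0/24", "red")    # make-red (assumed prefix)
tag_at_ingress(table, "4.0.0.0/8", "blue")         # make-blue, from AS701
tag_at_ingress(table, "5.0.0.0/8", "blue")         # make-blue, from AS701

# Equivalent of "route-map blue-green out" toward the customer in AS3703.
to_customer = egress_filter(table, {"blue", "green"})
print(to_customer)
```

The red route never reaches the customer, and no per-prefix list had to be maintained on the egress router.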
Community Categories – Route Type
• Customer Routes                    4006:65150
• Private Peering                    4006:65140
• Transit                            4006:65130
• Public Peering                     4006:65120
• Internal Routes (OPN-visible)      4006:65110
• Internal Routes (Global-visible)   4006:65100
Other Peoples Networks (OPNs)
• To expand our national coverage, MindSpring used
third-party networks’ dialup facilities. We call these
networks OPNs.
• Prefixes for Core Services which we want restricted to
MindSpring customers and not visible to the rest of the
world (e.g. news, radius, smtp) are announced to our OPNs
alone.
– This has the added advantage of protecting against abuse of our
services by non-customers.
• With communities, we can tag routes for export to OPNs
alone.
Community Categories – Route Ingress Location
• Field Peering                            4006:65020
• Exchange Point Peer                      4006:65010
• Northeast Region Peering (DC)            4006:65030
• Southeast Region Peering (Atlanta)       4006:65040
• Northcentral Region Peering (Chicago)    4006:65050
• West Peering Region (Palo Alto)          4006:65060
• Southcentral Region Peering (Dallas)     4006:65070
Community Categories – Specials
• No Export to any external BGP peer: No-Export
• Do Not Advertise to any peer (Well Known): No-Advertise
• Always Prefer (proposed Well Known): Prefer-Me (65535:65519)
• Always Avoid (proposed Well Known): Avoid-Me (65535:65504)
Community Categories – Origin AS
Also add a community string for the origin AS
If the route comes from UUNet,
then add 4006:701
If the route comes from Sprint,
then add 4006:1239
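
For readers who have not worked with the `AS:value` notation: a community is a 32-bit number, conventionally written as two 16-bit halves (RFC 1997). The sketch below converts between the two forms and builds the origin-AS tag described above. The helper names are hypothetical; the two well-known values are the ones RFC 1997 defines.

```python
# Well-known communities from RFC 1997.
NO_EXPORT = 0xFFFFFF01
NO_ADVERTISE = 0xFFFFFF02


def parse_community(text: str) -> int:
    """Convert 'AS:value' notation into the 32-bit community value."""
    high_text, low_text = text.split(":")
    high, low = int(high_text), int(low_text)
    if not (0 <= high <= 0xFFFF and 0 <= low <= 0xFFFF):
        raise ValueError(f"both halves of {text!r} must fit in 16 bits")
    return (high << 16) | low


def format_community(value: int) -> str:
    """Convert a 32-bit community value back to 'AS:value' notation."""
    return f"{value >> 16}:{value & 0xFFFF}"


def origin_as_tag(local_as: int, origin_as: int) -> str:
    """Community marking which neighbor AS a route was learned from."""
    return f"{local_as}:{origin_as}"


print(origin_as_tag(4006, 701))          # route learned from UUNet
print(origin_as_tag(4006, 1239))         # route learned from Sprint
print(format_community(NO_EXPORT))
print(format_community(NO_ADVERTISE))
```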
Local Preference
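router bgp 4355
 neighbor a.a.a.a remote-as 3703
 neighbor b.b.b.b remote-as 701
 neighbor c.c.c.c remote-as 4006
 neighbor a.a.a.a route-map setlocpref100 in
 neighbor b.b.b.b route-map setlocpref60 in
 neighbor c.c.c.c route-map setlocpref90 in

[Figure: 165.200.1.0/24 originated by AS3703 (customer) and learned by AS4355 from the customer, from AS4006 (peering), and from AS701 (transit)]
AS3703 (customer):
  165.200.1.0/24    i
AS4355:
  Local Pref   Prefix            AS Path
  100          165.200.1.0/24    3703 i
  90           165.200.1.0/24    4006 3703 i
  60           165.200.1.0/24    701 3703 i
Other AS paths shown in the figure: 1239 3703 i, 1 3703 i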
Local Preference Hierarchy
• The higher the Local Preference, the more
desirable the route.
• Customers ALWAYS come first – we never want
to send their traffic to a peer, regardless of AS-Path padding
• Private Peering is always more desirable than
Public Peering
• Transit is less desirable than private peering for
economic reasons
Local Preference Hierarchy
• Always Preferred                              250
• Customer Routes                               100
• Customer Backup Routes                        90
• Private Peering                               80
• Less Preferred Private Peering (congested)    70
• Paid Transit                                  60
• Less Preferred Paid Transit (congested)       50
• Public Peering (ATM NAPs)                     40
• Less Preferred Public Peering (FDDI NAPs)     30
• Never Preferred                               1
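
The point of the hierarchy is that local preference is compared before AS-path length, so a customer route wins even when the customer has padded its path. A minimal sketch, using a subset of the values above and a deliberately simplified two-step comparison (real BGP selection has more tie-breakers); the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List

# Subset of the local-preference hierarchy from the slide.
LOCAL_PREF = {
    "customer": 100,
    "customer-backup": 90,
    "private-peering": 80,
    "transit": 60,
    "public-peering": 40,
}


@dataclass
class Route:
    prefix: str
    as_path: List[int]
    route_type: str

    @property
    def local_pref(self) -> int:
        return LOCAL_PREF[self.route_type]


def best_route(candidates: List[Route]) -> Route:
    """Highest local preference wins; shorter AS path only breaks ties."""
    return max(candidates, key=lambda r: (r.local_pref, -len(r.as_path)))


candidates = [
    # Customer has padded its own AS three extra times.
    Route("165.200.1.0/24", [3703, 3703, 3703, 3703], "customer"),
    Route("165.200.1.0/24", [4006, 3703], "private-peering"),
    Route("165.200.1.0/24", [701, 3703], "transit"),
]

winner = best_route(candidates)
print(winner.route_type, winner.local_pref, winner.as_path)
```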
Peer Types
• Local sub-AS Peer (within a POP)
• Confederation Peers (other POPs or sub-ASes)
• Transit Peers (we buy transit from them)
• Public/Private Peering
• Customer Peers
Local sub-AS Peers
• All peers within a POP are members of this group.
• The update source for these BGP sessions will be
the loopback address of the router.
• Communities must be recognized.
• Option to use full-mesh or route-reflectors.
For Each Local Sub-AS Peer
neighbor <neigh-ip A> remote-as <neighbor-as A>
neighbor <neigh-ip A> description otherlocalroutername
neighbor <neigh-ip A> update-source loopback0
neighbor <neigh-ip A> send-community
neighbor <neigh-ip A> version 4
Update-Source Loopback Address
• The routers will use loopback address as the
source of the bgp packets.
– Only one session needs to be created even with multiple
paths between routers.
• Peering between loopback addresses increases the
stability of the BGP sessions, since loopback
addresses don’t go down.
[Figure: two routers with loopbacks 192.168.128.1/32 and 192.168.128.2/32, joined by two links: 207.69.132.1/24 to 207.69.132.2/24 and 207.69.133.1/24 to 207.69.133.2/24]
Confederation Peers
• All peers that are POP border routers are members of this
group.
• The update source for these BGP sessions will be the
facing interface of the router.
• Inbound Soft Reconfiguration is not necessary.
– Outbound soft reconfiguration can be done at the remote end
• Communities must be recognized.
• Filtering is done on egress, MEDs are set on ingress.
Soft Reconfiguration
• “clear ip bgp” drops the TCP session. Soft
reconfiguration is much friendlier.
• “clear ip bgp <neighbor-ip> soft out” issues
withdrawals for all advertised routes, recomputes
and then resends the routes (low cpu)
• “clear ip bgp <neighbor-ip> soft in” reevaluates
routes received from its peers stored in memory.
(high memory requirements)
Confederation Peer Configuration
Peer-Group
neighbor internal peer-group
neighbor internal version 4
neighbor internal send-community
For Each Peer
neighbor <neigh-ip A> remote-as <neighbor-as A>
neighbor <neigh-ip A> description remotesitename
neighbor <neigh-ip A> route-map <site>-recv-<remotesite> in
neighbor <neigh-ip A> route-map <site>-send-<remotesite> out
neighbor <neigh-ip A> peer-group internal
route-map <site>-recv-<remotesite> permit 10
set metric +<metric>
route-map <site>-send-<remotesite> permit 10
match community <send-all-except-no-advertise-routes>
Confederation Peer Routes
• Don’t Send: No Advertise
• Send: Customer, Peer, Transit, Internal
Additive MEDs
• Why
– Allows a tiebreaker based on optimum routing
– Allows an alternate method to de-prefer routes in case of
transit/peering congestion
• Possible Values –
– Mileage
– delay in ms
– fixed value per hop
• Supported by
– Cisco IOS
– Feature Request in JUNOS, Riverstone, Foundry IronWare
Additive MEDs in Confederations
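[Figure: 207.69.0.0/16 propagating through four POP sub-ASes, with a MED added at each hop]
Link metrics:
  ATL to BHAM    120
  BHAM to DAL    580
  BHAM to HOU    600
  DAL to HOU     40
MED and confederation AS path for 207.69.0.0/16 at each POP:
  ATL  (65000)   0     (originated here)
  BHAM (65012)   120   (65000)
  DAL  (65400)   700   (65012 65000)
                 760   (65401 65012 65000)
  HOU  (65401)   720   (65012 65000)
                 740   (65400 65012 65000)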
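
The arithmetic behind this example can be checked with a short sketch. It assumes the four link metrics are ATL-BHAM 120, BHAM-DAL 580, BHAM-HOU 600 and DAL-HOU 40, which is the only assignment consistent with the MEDs listed on the slide; the dictionary and function names are hypothetical.

```python
from typing import List

# Per-link metric added on ingress at each confederation hop (inferred values).
LINK_METRIC = {
    frozenset(("ATL", "BHAM")): 120,
    frozenset(("BHAM", "DAL")): 580,
    frozenset(("BHAM", "HOU")): 600,
    frozenset(("DAL", "HOU")): 40,
}


def additive_med(pop_path: List[str]) -> int:
    """MED of a route after travelling pop_path, origin POP first.

    Each receiving POP adds the metric of the link the route arrived on,
    as `set metric +<metric>` does in the receive route-map.
    """
    hops = zip(pop_path, pop_path[1:])
    return sum(LINK_METRIC[frozenset(hop)] for hop in hops)


# 207.69.0.0/16 is originated in ATL with MED 0.
for path in (
    ["ATL"],
    ["ATL", "BHAM"],
    ["ATL", "BHAM", "DAL"],
    ["ATL", "BHAM", "HOU", "DAL"],
    ["ATL", "BHAM", "HOU"],
    ["ATL", "BHAM", "DAL", "HOU"],
):
    print(" > ".join(path), "MED", additive_med(path))
```

In Dallas the direct path via Birmingham (700) beats the path via Houston (760), so the MED acts as a distance-based tiebreaker between otherwise equal routes.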
Transit Peers
• The update source for these BGP sessions will be the
facing interface address of the router.
• Soft Reconfiguration should be used.
• Communities must be recognized.
• Send out only customer and internal routes.
• Apply an import ACL to the routes that prevents reception
of martian routes, and assigns proper communities and
local preference.
• Allows prepending certain subsets of routes with
additional AS numbers.
Transit Peer Config
neighbor <neighbor-ip> remote-as <neighbor-as C>
neighbor <neighbor-ip> description transitprovidername
neighbor <neighbor-ip> version 4
neighbor <neighbor-ip> send-community
neighbor <neighbor-ip> next-hop-self
neighbor <neighbor-ip> soft-reconfiguration inbound
neighbor <neighbor-ip> distribute-list martians in
neighbor <neighbor-ip> route-map <site>-recv-<provider> in
neighbor <neighbor-ip> route-map <site>-send-<provider> out

route-map <site>-send-<provider> deny 10
 match community 4
route-map <site>-send-<provider> permit 20
 match community 1
 set as-path prepend 4006 4006

route-map <site>-recv-<provider> permit 10
 set local-preference 60
 ! set metric 0 if you don’t want to listen to others’ MEDs
 set metric 0
 set community 4006:30 additive
 set community 4006:20 additive
 set community 4006:500 additive
 set community 4006:<AS#> additive
Transit Peer Config
• Don’t Send: No Exports, No Advertise, Peers or Transit
• Send: Customers, Internal
Transit Tricks
• De-prefer routes for congested outbound
– Set Local Pref normally for routes with AS-Path Length=1 or 2
– Set Local Pref Lower for all other routes
– Effect: Only most direct routes flow through that connection.
Others flow through other transit, if available
• OPN’s and sending OPN routes
– Send special routes – usually for servers and services – only to
your own network, and OPNs
– Have a special community list or policy specifying the routes.
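
The first trick can be sketched as a one-line policy decision. The two local-preference values are taken from the hierarchy slide (Paid Transit 60, congested Paid Transit 50); the function name and the default threshold parameter are illustrative.

```python
from typing import List


def transit_local_pref(
    as_path: List[int],
    normal: int = 60,
    depreferred: int = 50,
    max_direct_len: int = 2,
) -> int:
    """Local preference for a route learned over a congested transit link.

    Routes whose AS path has 1 or 2 entries keep the normal transit
    preference; everything else is de-preferred so it drains to other
    transit providers when an alternative exists.
    """
    return normal if len(as_path) <= max_direct_len else depreferred


print(transit_local_pref([701]))              # provider's own route
print(transit_local_pref([701, 3703]))        # provider's customer
print(transit_local_pref([701, 1239, 3703]))  # further away
```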
Private/Public Peers
• The update source for these BGP sessions will be the
facing interface address of the router.
• Soft Reconfiguration should be used.
• Communities must be recognized.
• Send out only customer and internal routes.
• Apply an import ACL to the routes that prevents reception
of martian routes, and assigns proper communities and
local preference.
• Option to use local preference to prefer unconditionally all
or only some routes coming from a free peer.
Peer Configuration
neighbor free-peering peer-group
neighbor free-peering version 4
neighbor free-peering send-community
neighbor free-peering next-hop-self
neighbor free-peering soft-reconfiguration inbound
neighbor free-peering distribute-list martians in
neighbor free-peering route-map <peername>-in in
neighbor free-peering route-map cust-routes out

route-map cust-routes deny 5
 match community 4
route-map cust-routes permit 10
 match community 1

route-map <peername>-in permit 10
 set local-preference 80
 set community 4006:30 additive
 set community 4006:20 additive
 set community 4006:700 additive
 set community 4006:<AS#> additive

Per-Peer
neighbor <neighbor-ip> remote-as <neighbor-as D>
neighbor <neighbor-ip> peer-group free-peering
neighbor <neighbor-ip> description Peer Name
Free Peering Routes
• Don’t Send: No Exports, No Advertise, Peers or Transit
• Send: Customers, Internal
Customer Peers
• The update source for these BGP sessions will be the
facing interface address of the router.
• Soft Reconfiguration should be used.
• Communities must be recognized. This includes
communities sent from customers.
• Send out selected routes, based on customer request.
• Apply an import ACL to the routes that prevents reception
of martian routes, and assign proper communities and local
preference.
• The import filter must also accept only specific customer
routes.
– We recommend using RtConfig to query the RADB and generate the ACLs.
What Type of Routes Can We Send?
• Full Routes
– Customer, Peers, Internals, Transit.
– AKA “A Full View”
• Customer Routes
– Customer and Internal Routes.
– Good for weaker routers (Cisco)
– AKA “A Partial View”
• Default Route
– Send only a default route - 0.0.0.0/0, pointed to the
router interface
– Limited utility
Special Considerations for Customers
• Carefully Filter routes – the farther downstream
you get, the less clueful (generally)
• Filtering can be based on AS or Prefix
• The generally accepted practice is to filter by IP
Access List at ingress (use radb tools if possible)
• Customers do not have to advertise the same
routes everywhere – peers do!
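
As an illustration of prefix-based ingress filtering, the sketch below accepts a customer announcement only when it exactly matches a registered prefix. The two prefixes are borrowed from the `atl-myco` policy elsewhere in this deck; in practice the list would be generated from the routing registry, and the names here are hypothetical.

```python
import ipaddress

# Prefixes the customer is registered to announce (example values).
ALLOWED_PREFIXES = {
    ipaddress.ip_network("209.49.143.0/24"),
    ipaddress.ip_network("199.5.0.0/16"),
}


def accept_customer_route(prefix: str, allowed=ALLOWED_PREFIXES) -> bool:
    """Accept only exact matches; more-specifics and unknowns are rejected."""
    try:
        network = ipaddress.ip_network(prefix, strict=True)
    except ValueError:
        # Malformed prefix, or host bits set: never accept.
        return False
    return network in allowed


for announced in ("209.49.143.0/24", "209.49.143.0/25", "199.5.1.0/24", "10.0.0.0/8"):
    verdict = "accept" if accept_customer_route(announced) else "reject"
    print(f"{announced:<18} {verdict}")
```

Exact matching is the strictest choice; a provider that lets customers announce more-specifics would compare with `subnet_of` and a maximum prefix length instead.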
Customer Configuration – Full Routes
bgp {
    group <location-customername> {
        type external;
        description <peer-name>;
        peer-as <neighbor AS #>;
        neighbor <ip address>;
        import <customername>-in;
    }
}
policy-options {
    policy-statement <customername>-in {
        term term1 {
            from policy <location-customername>;
            then {
                local-preference 100;
                next-hop self;
                community + customer;
                community + field;
                community + ATL;
                community + <customername>;
            }
        }
    }
    policy-statement atl-myco {
        from {
            route-filter 209.49.143.0/24 exact accept;
            route-filter 199.5.0.0/16 exact accept;
        }
        then reject;
    }
}
Customer Configuration – Partial Routes
bgp {
    group <location-customername> {
        type external;
        description <peer-name>;
        peer-as <neighbor AS #>;
        neighbor <ip address>;
        import <customername>-in;
        export custroutes;
    }
}
policy-options {
    policy-statement <customername>-in {
        term term1 {
            from policy <location-customername>;
            then {
                local-preference 100;
                next-hop self;
                community + customer;
                community + field;
                community + ATL;
                community + <customername>;
            }
        }
    }
    policy-statement atl-myco {
        from {
            route-filter 209.49.143.0/24 exact accept;
            route-filter 199.5.0.0/16 exact accept;
        }
        then reject;
    }
    policy-statement custroutes {
        term term1 {
            from community [no-export no-advertise];
            then reject;
        }
        term term2 {
            from community [internal customer custback];
            then accept;
        }
    }
}
Default Route Only
• Cisco – neighbor a.b.c.d default-originate
• Juniper - A little more complex...
bgp {
    group <location-customername> {
        type external;
        description <peer-name>;
        peer-as <neighbor AS #>;
        neighbor <ip address>;
        import <customername>-in;
        export default-originate;
    }
}
routing-options {
    static {
        route 0.0.0.0/0 {
            next-hop <loopback address>;
            no-install;
        }
    }
}
policy-options {
    policy-statement default-originate {
        from {
            route-filter 0.0.0.0/0 exact;
        }
        then {
            next-hop self;
            accept;
        }
    }
}
Question and Answer
• Confederations
• General BGP Questions
The New Way gives us…
• Less complexity
• More stability
• More flexibility for traffic management
• Greater Survivability
• Lower Engineering and Administrative costs.
• Increased Uptime
• A Scalable, Next Generation IP Network
Bibliography
• RFC 1771, A Border Gateway Protocol 4 (BGP-4)
• RFC 1965, Autonomous System Confederations for BGP
• RFC 1930, Guidelines for creation, selection, and registration of an Autonomous System (AS)
• RFC 1997, BGP Communities Attribute
• Nussbacher, Rudnev, and Hares, Global BGP Community Values, Internet Draft, 12/99
• Halabi, Bassam, Internet Routing Architectures
• Freedman, Avi, Lecture Notes: January 1999 NANOG Conference Session: “BGP 102”
In Tribute to the Memory of...
• MindSpring Enterprises, Inc.
Very Special Thanks to…
• Brandon Ross, NetRail
• Avi Freedman, Akamai
• Khalid Raza, Cisco