Transcript routing

Internetworking
BGP
From mort&tim
Internetworking
• So far we have talked about:
– Moving data between hosts
– Moving data within a network
(administrative domain)
• So what is the Internet then, really?
(Figure: the Internet as a set of interconnected networks, e.g. Verizon, BT and AT&T)
Recall: Routing vs. Forwarding
• Router receives an IP packet: what to do?
– Drop or forward via an interface
• Deciding which interface is forwarding
– IP bases this decision (almost) solely on the
destination IP address
• Building up the information to do so is routing
– Where are all the addresses at the moment?
Recall: Longest Prefix Matching
Prefix            Binary                                            Length
192.168.10.12     1100 0000 . 1010 1000 . 0000 1010 . 0000 1100    /32 – Host
192.168.0.0       1100 0000 . 1010 1000 . 0000 0000 . 0000 0000    /16
192.168.8.0       1100 0000 . 1010 1000 . 0000 1000 . 0000 0000    /21
192.168.10.0      1100 0000 . 1010 1000 . 0000 1010 . 0000 0000    /23
192.168.10.0      1100 0000 . 1010 1000 . 0000 1010 . 0000 0000    /24
192.168.4.0       1100 0000 . 1010 1000 . 0000 0100 . 0000 0000    /24
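The lookup in the table above can be sketched with Python's standard `ipaddress` module. The prefixes are the ones from the table; the next-hop labels (`if-A` and so on) are hypothetical. A linear scan is used purely for clarity; real routers use tries or TCAMs.

```python
import ipaddress

# Prefixes from the table above; the next-hop labels are hypothetical.
TABLE = {
    ipaddress.ip_network("192.168.10.12/32"): "host-route",
    ipaddress.ip_network("192.168.0.0/16"): "if-A",
    ipaddress.ip_network("192.168.8.0/21"): "if-B",
    ipaddress.ip_network("192.168.10.0/23"): "if-C",
    ipaddress.ip_network("192.168.10.0/24"): "if-D",
    ipaddress.ip_network("192.168.4.0/24"): "if-E",
}

def longest_prefix_match(dst, table=TABLE):
    """Return (prefix, next_hop) for the most specific matching entry, or None."""
    addr = ipaddress.ip_address(dst)
    matches = [net for net in table if addr in net]
    if not matches:
        return None
    best = max(matches, key=lambda net: net.prefixlen)  # longest prefix wins
    return best, table[best]

print(longest_prefix_match("192.168.10.12"))  # the /32 host route is most specific
print(longest_prefix_match("192.168.11.1"))   # only the /23, /21 and /16 contain it
```

Note that 192.168.4.0/24 never matches addresses in 192.168.10.x, while the /16 acts as the fallback for anything else in 192.168.0.0/16.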
Contents
• Routing
• The Protocol
• Decision Process
• Operations
Contents
• Routing
– Inter-domain Routing
– BGPv4
– Autonomous Systems
• The Protocol
• Decision Process
• Operations
Routing Protocols
• Distribute the data to build forwarding tables
• Examples we saw: OSPF, IS-IS, RIP
– Link-state, Distance vector
• These are intra-domain routing protocols
– Or Interior Gateway Protocols
– Source and destination inside the same network
• What happens between networks?
Inter-domain Routing
• An important distinction: local vs global
– Interior vs Exterior Gateway Protocol (IGP, EGP)
– Why is this important? Two reasons:
• Dynamics
– Need to scope information propagation (why?)
• Protection
– Need to hide information (why?)
Border Gateway Protocol, BGPv4
• The Internet inter-domain routing protocol
– RFC 4271, obsoleting RFC 1771
– Derives originally from GGP, EGP (1982)
– Updated over time (RFCs 1105, 1163, 1267)
• Deals in IP prefixes and Autonomous Systems
– ASs purely administrative
– Purpose is to enable policy to be applied
– Only prefixes matter in the data-plane
Autonomous Systems, ASs
• Internet policy domains
– Logical construct only
– No meaning outside BGP
– Do not map simply onto ISPs or networks
• Currently ~493,000 prefixes, ~46,000 ASs
(Figure: four interconnected ASs, AS1 to AS4)
Contents
• Routing
• The Protocol
– Sessions
– Updates
– Path Attributes
• Decision Process
• Operations
A Very Simple Protocol
• Exchanges prefixes
– Uses TCP/179 as transport
– OPEN, UPDATE, KEEPALIVE, NOTIFICATION
• Sessions between peers
– Simple capability negotiation
– Manage simultaneous OPEN
– Lose everything on session failure (why?)
(Figure: Peer A and Peer B exchanging UPDATEs, each carrying withdrawn routes, attributes and advertised routes)
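As a sketch of how little framing the protocol needs, the code below builds and parses the common 19-byte BGP message header defined in RFC 4271 (16-byte all-ones marker, 2-byte length, 1-byte type); a KEEPALIVE is just this header with no body. It also encodes the usual summary of simultaneous-OPEN (collision) resolution: BGP Identifiers are compared as unsigned 32-bit integers and the connection initiated by the speaker with the higher identifier survives. The function names are hypothetical.

```python
import struct

# BGP message type codes (RFC 4271, section 4.1).
OPEN, UPDATE, NOTIFICATION, KEEPALIVE = 1, 2, 3, 4
MARKER = b"\xff" * 16   # 16-byte marker, all ones
HEADER_LEN = 19         # marker (16) + length (2) + type (1)

def encode_message(msg_type, body=b""):
    """Prefix a message body with the common BGP header."""
    length = HEADER_LEN + len(body)
    return MARKER + struct.pack("!HB", length, msg_type) + body

def decode_header(data):
    """Return (total length, type) from the first 19 bytes of a message."""
    if len(data) < HEADER_LEN or data[:16] != MARKER:
        raise ValueError("not a BGP message header")
    return struct.unpack("!HB", data[16:HEADER_LEN])

def surviving_connection(local_id, remote_id):
    """Resolve an OPEN collision: keep the connection opened by the higher BGP Identifier.

    Identifiers are 32-bit unsigned integers (e.g. int(ipaddress.ip_address(...))).
    """
    return "local-initiated" if local_id > remote_id else "remote-initiated"

keepalive = encode_message(KEEPALIVE)
print(len(keepalive), decode_header(keepalive))  # 19 (19, 4)
```

Because each session runs over a single TCP connection, this deterministic tie-break lets both peers independently agree which of two simultaneously opened connections to drop.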
Sessions & RIBs
Routing Information Bases
• BGP peer typically has many sessions
– 10? 20? 100s?
• Logically, Adj-RIB-In & -Out for each session
– Advertisements received and to be sent
• Generate Loc-RIB from Adj-RIB-In
– Routes to use and potentially distribute
– Resolved into per-port forwarding tables
• Generate Adj-RIB-Out from Loc-RIB and policy
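The flow between the RIBs can be modelled with plain dictionaries. This is a hypothetical, heavily simplified sketch: the class and method names are invented, and the "decision process" here looks only at `local_pref` (the full preference order comes later). It does show why a session failure is expensive: the whole Adj-RIB-In for that peer is discarded and Loc-RIB must be recomputed.

```python
from typing import Callable, Dict, Optional, Tuple

Attrs = Dict[str, int]
# An export policy returns (possibly modified) attributes, or None to filter the route.
ExportPolicy = Callable[[str, Attrs], Optional[Attrs]]

class RibSketch:
    """Hypothetical, simplified model of one BGP speaker's RIBs."""

    def __init__(self) -> None:
        # Adj-RIB-In: peer -> prefix -> attributes, exactly as received
        self.adj_rib_in: Dict[str, Dict[str, Attrs]] = {}
        # Loc-RIB: prefix -> (peer it was learned from, attributes)
        self.loc_rib: Dict[str, Tuple[str, Attrs]] = {}

    def receive(self, peer: str, prefix: str, attrs: Attrs) -> None:
        self.adj_rib_in.setdefault(peer, {})[prefix] = attrs

    def withdraw(self, peer: str, prefix: str) -> None:
        self.adj_rib_in.get(peer, {}).pop(prefix, None)

    def session_down(self, peer: str) -> None:
        # Everything learned over the failed session is discarded.
        self.adj_rib_in.pop(peer, None)

    def run_decision(self) -> None:
        """Rebuild Loc-RIB, choosing by highest local_pref only (simplified)."""
        self.loc_rib = {}
        for peer, routes in self.adj_rib_in.items():
            for prefix, attrs in routes.items():
                current = self.loc_rib.get(prefix)
                if current is None or attrs["local_pref"] > current[1]["local_pref"]:
                    self.loc_rib[prefix] = (peer, attrs)

    def adj_rib_out(self, policy: ExportPolicy) -> Dict[str, Attrs]:
        """Generate one peer's Adj-RIB-Out by applying its export policy to Loc-RIB."""
        out: Dict[str, Attrs] = {}
        for prefix, (_, attrs) in self.loc_rib.items():
            exported = policy(prefix, dict(attrs))
            if exported is not None:
                out[prefix] = exported
        return out

rib = RibSketch()
rib.receive("peerA", "61.9.0.0/24", {"local_pref": 100})
rib.receive("peerB", "61.9.0.0/24", {"local_pref": 200})
rib.run_decision()
print(rib.loc_rib["61.9.0.0/24"][0])   # peerB
rib.session_down("peerB")
rib.run_decision()
print(rib.loc_rib["61.9.0.0/24"][0])   # peerA
```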
UPDATEs
• Incremental – indicate changes to state
– Withdrawn routes
– Path attributes, common to all advertised routes
– Advertised routes, known as NLRI
• There are ~27 path attributes defined
– Perhaps a dozen or so are in common use
– Communicate information about prefixes
– Used to apply policy in BGP decision process
Path Attributes
• Well-known, Mandatory
– Next Hop
– AS Path
– Origin
• Well-known, Discretionary
– Local Preference
– Atomic Aggregate
• Optional, Transitive
– Aggregator
– Community
– Extended Communities
• Optional, Non-transitive
– Multi-Exit Discriminator
– Originator ID
– …
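On the wire, each path attribute starts with a flags octet and a type code (RFC 4271, section 4.3). The high-order flag bits mark an attribute as optional, transitive, partial, or as using a 2-byte length. Whether a well-known attribute is mandatory or discretionary is not encoded in the flags; that is a rule of the protocol. The sketch below decodes a raw attribute block; the function names are hypothetical and only a few type codes are listed.

```python
# Attribute flag bits in the first octet (RFC 4271, section 4.3).
OPTIONAL, TRANSITIVE, PARTIAL, EXTENDED_LENGTH = 0x80, 0x40, 0x20, 0x10

# A few attribute type codes (RFC 4271, RFC 1997, RFC 4456, RFC 4360).
TYPE_NAMES = {
    1: "ORIGIN", 2: "AS_PATH", 3: "NEXT_HOP", 4: "MULTI_EXIT_DISC",
    5: "LOCAL_PREF", 6: "ATOMIC_AGGREGATE", 7: "AGGREGATOR",
    8: "COMMUNITIES", 9: "ORIGINATOR_ID", 16: "EXTENDED_COMMUNITIES",
}

def classify(flags):
    """Map the flag bits onto the categories used in the list above."""
    if not flags & OPTIONAL:
        return "well-known"
    return "optional transitive" if flags & TRANSITIVE else "optional non-transitive"

def parse_attributes(data):
    """Yield (type name, category, value bytes) for each attribute in `data`."""
    i = 0
    while i < len(data):
        flags, type_code = data[i], data[i + 1]
        if flags & EXTENDED_LENGTH:
            length = int.from_bytes(data[i + 2:i + 4], "big")  # 2-byte length
            i += 4
        else:
            length = data[i + 2]                               # 1-byte length
            i += 3
        name = TYPE_NAMES.get(type_code, f"type {type_code}")
        yield name, classify(flags), data[i:i + length]
        i += length

# ORIGIN = IGP (flags 0x40) followed by MED = 50 (flags 0x80)
raw = b"\x40\x01\x01\x00" + b"\x80\x04\x04\x00\x00\x00\x32"
for attr in parse_attributes(raw):
    print(attr)
```

A router that does not recognise an optional transitive attribute still passes it on (setting the partial bit), whereas an unrecognised optional non-transitive attribute is quietly dropped.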
An Example UPDATE
[ Thu Apr 1 04:26:25 2010 ]
MRT packet: len: 81, type: PROTOCOL_BGP4MP, subtype: MESSAGE
AS(src): 39202, AS(dst): 12654
ifc idx: 0, AFI: IP
IP(src): 195.66.225.2, IP(dst): 195.66.225.241
Update (len=65): unfeasible_len=0 path_attr_len=26
UNFEASIBLE ROUTES:
PATH ATTRIBUTES:
ORIGIN: IGP [ transitive ]
AS_PATH: (SEQUENCE)[ <- 39202 <- 3491 <- 17639 <- 6163 <- 6163 ] [ transitive ]
NEXT_HOP: 195.66.224.167 [ transitive ]
FEASIBLE ROUTES:
1: 61.9.0.0/24
2: 61.9.1.0/24
3: 61.9.62.0/24
4: 202.47.132.0/24
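The UPDATE body has a fixed layout: a 2-byte withdrawn-routes length, the withdrawn routes, a 2-byte total path-attribute length, the attributes, and finally the NLRI. Each prefix is encoded as a length octet followed by just enough octets to hold that many bits. The sketch below (function names hypothetical, IPv4 only) rebuilds the dump above with 26 placeholder attribute bytes. Its total, 19 + 2 + 0 + 2 + 26 + 4×4 = 65 bytes, is consistent with the `Update (len=65)` line in the dump.

```python
import ipaddress
import struct

def encode_prefix(prefix):
    """Length octet, then the minimum number of address octets."""
    net = ipaddress.ip_network(prefix)
    nbytes = (net.prefixlen + 7) // 8
    return bytes([net.prefixlen]) + net.network_address.packed[:nbytes]

def decode_prefixes(data):
    prefixes, i = [], 0
    while i < len(data):
        plen = data[i]
        nbytes = (plen + 7) // 8
        addr = data[i + 1:i + 1 + nbytes].ljust(4, b"\x00")  # pad to 4 octets
        prefixes.append(str(ipaddress.IPv4Network((addr, plen))))
        i += 1 + nbytes
    return prefixes

def encode_update_body(withdrawn, path_attrs, nlri):
    w = b"".join(encode_prefix(p) for p in withdrawn)
    n = b"".join(encode_prefix(p) for p in nlri)
    return struct.pack("!H", len(w)) + w + struct.pack("!H", len(path_attrs)) + path_attrs + n

def decode_update_body(body):
    wlen = struct.unpack("!H", body[:2])[0]
    withdrawn = decode_prefixes(body[2:2 + wlen])
    i = 2 + wlen
    alen = struct.unpack("!H", body[i:i + 2])[0]
    attrs = body[i + 2:i + 2 + alen]
    nlri = decode_prefixes(body[i + 2 + alen:])
    return withdrawn, attrs, nlri

example_nlri = ["61.9.0.0/24", "61.9.1.0/24", "61.9.62.0/24", "202.47.132.0/24"]
body = encode_update_body([], bytes(26), example_nlri)  # 26 placeholder attribute bytes
print(19 + len(body))  # 65: 19-byte header plus the UPDATE body
```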
Contents
• Routing
• The Protocol
• Decision Process
– Path Vectors
• Operations
Path Vectors – AS_PATH
• Distance vector – prefer lowest cost path
– Need to break loops somehow (how?)
• Path Vector
– How do we know if we’ve seen this advert before?
– Store the list of ASs through which it reached us
– The AS_PATH
• Loops can be broken:
– If our ASN appears in a received AS_PATH, drop the advert
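The loop check and the AS_PATH bookkeeping fit in a few lines. This sketch treats AS_PATH as a flat sequence (ignoring AS_SET segments) and uses the path from the example UPDATE, received by AS 12654; the function names are hypothetical. The repeated 6163 in that path is an instance of prepending (AS padding).

```python
def accept(as_path, my_asn):
    """Path-vector loop check: reject any advert whose AS_PATH already contains us."""
    return my_asn not in as_path

def export_path(as_path, my_asn, prepend=1):
    """Prepend our own ASN when advertising to an EBGP peer.

    prepend > 1 is AS padding: a longer path makes the route less preferred.
    """
    return [my_asn] * prepend + list(as_path)

path = [39202, 3491, 17639, 6163, 6163]  # AS_PATH from the example UPDATE
print(accept(path, 12654))                      # True: 12654 is not on the path
looped = export_path(path, 12654)               # as if our advert came back to us
print(accept(looped, 12654))                    # False: we see ourselves, so drop it
```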
Decision Process
• Drop prefix if:
– NEXT_HOP is unreachable via local routing table
– Local AS appears in AS_PATH
• Then (commonly) apply the following preferences, in order:
1. Highest WEIGHT (local to this router)
2. Highest LOCAL_PREF
3. Shortest AS_PATH (leads to AS padding)
4. Lowest ORIGIN
5. Lowest MED (if from same AS – why?)
6. EBGP over IBGP (hot potato)
7. Shortest internal path
8. Prefer oldest route
9. Lowest Router-ID (the router ID is usually the router's highest IP address)
10. Lowest neighbour IP address
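One way to sketch this list is as successive elimination: each step keeps only the candidates that tie for best on that criterion. A single sort key does not work, because MED is compared only between routes from the same neighbouring AS. The `Route` fields below are hypothetical stand-ins for the attributes named in the list. The sketch assumes the two "drop" checks have already been applied and simplifies step 8 to apply to all routes.

```python
from dataclasses import dataclass
from typing import List, Tuple

ORIGIN_RANK = {"IGP": 0, "EGP": 1, "INCOMPLETE": 2}  # lower is preferred

@dataclass(frozen=True)
class Route:
    as_path: Tuple[int, ...]
    local_pref: int = 100
    weight: int = 0            # local to this router
    origin: str = "IGP"
    med: int = 0
    ebgp: bool = True
    igp_cost: int = 0          # internal cost to reach NEXT_HOP
    learned_at: float = 0.0    # timestamp; smaller means older
    router_id: int = 0
    peer_ip: int = 0

def keep_best(routes, key):
    """Keep every route whose key equals the minimum key."""
    best = min(key(r) for r in routes)
    return [r for r in routes if key(r) == best]

def neighbour_as(route):
    return route.as_path[0] if route.as_path else None

def select_best(routes: List[Route]) -> Route:
    cands = list(routes)
    cands = keep_best(cands, lambda r: -r.weight)                # 1. highest WEIGHT
    cands = keep_best(cands, lambda r: -r.local_pref)            # 2. highest LOCAL_PREF
    cands = keep_best(cands, lambda r: len(r.as_path))           # 3. shortest AS_PATH
    cands = keep_best(cands, lambda r: ORIGIN_RANK[r.origin])    # 4. lowest ORIGIN
    # 5. lowest MED, compared only within each neighbouring AS
    groups = {}
    for r in cands:
        groups.setdefault(neighbour_as(r), []).append(r)
    cands = [r for g in groups.values() for r in keep_best(g, lambda x: x.med)]
    cands = keep_best(cands, lambda r: 0 if r.ebgp else 1)       # 6. EBGP over IBGP
    cands = keep_best(cands, lambda r: r.igp_cost)               # 7. shortest internal path
    cands = keep_best(cands, lambda r: r.learned_at)             # 8. oldest route
    cands = keep_best(cands, lambda r: r.router_id)              # 9. lowest router ID
    cands = keep_best(cands, lambda r: r.peer_ip)                # 10. lowest neighbour IP
    return cands[0]
```

For example, two routes from different neighbouring ASs both survive step 5 whatever their MEDs, so the choice falls through to the later, purely local tie-breakers.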
Contents
• Routing
• The Protocol
• Decision Process
• Operations
– Consistency
– Scaling
– Confederations
– Route Reflectors
Consistency
• Learn external routes on EBGP sessions
– EBGP defined as peers having different ASNs
– Must ensure every router knows all external routes (why?)
• Redistribute external routes inside network
– Via IGP – only in small networks (why?)
– Via IBGP – gives full control over route distribution
• What’s the problem with IBGP?
Scaling
• Can’t distribute IBGP routes on IBGP sessions
– Why?
• Have to maintain N.(N-1)/2 IBGP sessions
– Each carrying up to 490k routes x 2 tables
• Two standard solutions
– Route Reflectors:
supernodes, readvertising IBGP routes
– AS Confederations:
split AS up into mini-ASs
– Both tweak decision process somewhat
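The scaling problem is simple arithmetic. A full IBGP mesh of N routers needs N(N-1)/2 sessions. The route-reflector count below rests on a simplifying assumption: a single cluster where the reflectors are fully meshed with each other and every other router (a client) peers with every reflector. Function names are hypothetical.

```python
def full_mesh_sessions(n):
    """IBGP sessions needed when all n routers peer with each other."""
    return n * (n - 1) // 2

def route_reflector_sessions(n, reflectors):
    """Simplified: reflectors fully meshed, every client peering with every reflector."""
    clients = n - reflectors
    return full_mesh_sessions(reflectors) + clients * reflectors

for n in (10, 100, 500):
    print(n, full_mesh_sessions(n), route_reflector_sessions(n, 2))
```

With two reflectors this gives 45 vs 17 sessions for 10 routers, and 124,750 vs 997 for 500. The session count grows quadratically in the full mesh and linearly with reflectors, at the cost of the reflectors seeing fewer paths.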
Operations
• Handle link failures
– Bind to loopback
– Flap damping (but can make things worse!)
• Process failures
– Out of memory error due to too many routes
• Hijacking, intentional and unintentional
– “Don’t believe everything you read”
– http://www.youtube.com/watch?v=IzLPKuAOe50
• Anycast (1:1-of-N)
– Advertise same prefix in many places. Carefully.
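Flap damping (RFC 2439) assigns each route a penalty that increases on every flap and decays exponentially. A route is suppressed when the penalty crosses a suppress limit and is reused once it decays below a lower reuse limit. The constants below are assumed, vendor-style defaults (15-minute half-life, penalty 1000 per flap, suppress at 2000, reuse at 750), not values from the slides. The class is hypothetical and ignores the maximum-suppress-time cap.

```python
HALF_LIFE = 15 * 60     # seconds (assumed default)
FLAP_PENALTY = 1000     # added per flap (assumed default)
SUPPRESS_LIMIT = 2000   # suppress above this (assumed default)
REUSE_LIMIT = 750       # reuse below this (assumed default)

class DampedRoute:
    """Hypothetical per-prefix damping state."""

    def __init__(self):
        self.penalty = 0.0
        self.last_update = 0.0
        self.suppressed = False

    def _decay(self, now):
        # Exponential decay: the penalty halves every HALF_LIFE seconds.
        elapsed = now - self.last_update
        self.penalty *= 0.5 ** (elapsed / HALF_LIFE)
        self.last_update = now
        if self.suppressed and self.penalty < REUSE_LIMIT:
            self.suppressed = False

    def flap(self, now):
        self._decay(now)
        self.penalty += FLAP_PENALTY
        if self.penalty > SUPPRESS_LIMIT:
            self.suppressed = True

    def usable(self, now):
        self._decay(now)
        return not self.suppressed

r = DampedRoute()
for t in (0, 60, 120):      # three flaps, one minute apart
    r.flap(t)
print(r.usable(120), r.usable(120 + 3600))
```

The third flap pushes the penalty to about 2,870, and it takes roughly two half-lives to fall below 750. This illustrates how damping can make things worse: a route that has already stabilised may stay suppressed for tens of minutes.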
Network Interconnection
• Networks interconnect via EBGP sessions
– POPs, Points-of-Presence; or IXs, Internet eXchanges
• Multi-homing
– This is all logical – what about physical diversity?
• How does this all fit together?
– Public/Private Peering vs Transit
– Roughly hierarchical (though this is changing)
– Tier-1/core/backbone vs the rest
• As ever, business and politics
– E.g., Level3 vs Cogent de-peering
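The business relationships behind peering and transit are often summarised by the Gao-Rexford export rules. Routes from your own network and from your customers go to everyone, while routes learned from a peer or a provider go only to customers, so you never carry traffic between two non-paying neighbours for free. The sketch encodes that rule. The relationship labels and the `LOCAL_PREF` values expressing "customer > peer > provider" are hypothetical.

```python
CUSTOMER, PEER, PROVIDER, SELF = "customer", "peer", "provider", "self"

# Hypothetical LOCAL_PREF values: prefer customer routes, then peers, then providers.
LOCAL_PREF = {CUSTOMER: 200, PEER: 100, PROVIDER: 50}

def should_export(learned_from: str, export_to: str) -> bool:
    """Export rule under the customer/peer/provider model.

    Our own and our customers' routes are announced to everyone; routes
    learned from peers or providers are announced only to customers.
    """
    if learned_from in (SELF, CUSTOMER):
        return True
    return export_to == CUSTOMER

for src in (CUSTOMER, PEER, PROVIDER):
    print(src, [dst for dst in (CUSTOMER, PEER, PROVIDER) if should_export(src, dst)])
```

De-peering disputes such as Level3 vs Cogent arise precisely because a peer route is not re-exported to other peers or providers: once the peering session is gone, each side's customers may be unreachable from the other's.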
Simple Example of a Complex Graph
(Policy – example from Level3)
Contents
• Routing
– Inter-domain Routing
– BGPv4
– Autonomous Systems
• The Protocol
– Sessions
– Updates
– Path Attributes
• Decision Process
– Path Vectors
• Operations
– Consistency
– Scaling
– Confederations
– Route Reflectors
Summary
• The Internet is inter-connected networks
– The routing protocols are what hold it together
• BGPv4 is the inter-network routing protocol
– All about application of policy
– To meet business needs
• Simple protocol, can be arbitrarily complex
– Many operational matters make this hard
Quiz (1)
1. What information needs to be exchanged between
networks to route packets?
2. What constraints are different between an IGP and an
EGP?
3. Why does BGP add path attributes to prefixes?
4. What is an AS?
5. Why is simultaneous open of BGP sessions an issue,
and how is it resolved?
6. What might happen if the corresponding tables and
routes were not removed on session failure?
Load Balancing Example
(Figure: AS 1 has sessions 1—2 and 1—5 to AS 2 and AS 5, with AS 3 and AS 4 further upstream, connected by customer–provider and peer links. Each of AS 1's two links is the primary link for one prefix (P1 or P2) and the backup link for the other.)
• Simple session reset may not work!!
• Can’t un-wedge with session resets!
(Figure: state diagram over the four stable routings, INTENDED, P1 wedged, P2 wedged, and BOTH P1 & P2 wedged, with transitions labelled "1—2 down", "1—5 down", "1—2 up", "1—5 up", "1—2 & 1—5 down" and "all up".)
Note that when bringing all up we could actually land the system in any one of the 4 stable states; which one depends on message order.
Recovery
(Figure: the same state diagram, with recovery transitions out of the P1-wedged and P2-wedged states:)
• Temporarily filter P2 from the 1—5 session
• Temporarily filter P1 from the 1—2 session
Who among us could figure this one out?
When 1—2 is in New York and 1—5 is in Tokyo?
Full Wedgie Example
(Figure: AS 1 connects into a topology of AS 2, AS 3, AS 4 and AS 5 via one primary link and two backup links; the remaining links are customer–provider and peer–peer.)
• AS 1 implements backup links by sending AS 2 and AS 5 “depref me” communities.
• AS 2 implements its community so that the resulting local pref is below that of its upstream providers and its peers (AS 3 and AS 5 routes).
• AS 5 implements its community so that the resulting local pref is below its peers (AS 2) but above that of its providers (AS 3).
And the Routings are…
(Figure: two diagrams over AS 1 to AS 5, showing the Intended Routing and the Unintended Routing.)
Resetting 1—2 does not help!!
(Figure: starting from the unintended routing, bring down the AS 1-2 session, then bring up the AS 1-2 session; the unintended routing remains.)
Recovery
(Figure: bring down the AS 1-2 session AND the AS 1-5 session, then bring up the AS 1-2 session AND the AS 1-5 session, reaching the intended routing.)
A lot of “non-local” knowledge is required to arrive at this recovery strategy!
Try to convince AS 5 and AS 1 that their session has to be reset (or filtered) even though it is not associated with an active route!
That Can’t happen in MY network!!
(Figure: five regional ASes, NA, EMEA, LA, AP and AU)
A “normal” global backbone (ISP or Corporate Intranet) implemented with 5 regional ASes
The Full Wedgie Example, in a new Guise
(Figure: the regional ASes NA, AP, LA, EMEA and AU, arranged like the full wedgie example)
Intended Routing for some prefixes in AU, implemented with communities.
DOES THIS LOOK FAMILIAR??
Message: Same problems can arise with “traffic engineering” across regional networks.
Recommendations
• Be aware of BGP Wedgies
• Preference-impacting Interdomain
communities should be defined with care and
consistently implemented (this may require
translating and transiting communities).
References
• Internet Draft (grow working group):
draft-ietf-grow-bgp-wedgies-03.txt
• Long-term solution?
– Metarouting!
– http://www.acm.org/sigs/sigcomm/sigcomm2005/techprog.html#session1
Extras…
So, how do you build an IP network?
1. Buy (lease) routers: $1m? $2m? for a new, populated, backbone router!
2. Buy (lease) fibre: Wayleaves = $$$. Be a landowner!
3. Connect them all together. Correctly. For now.
4. Configure routers. Mwuhahaha.
5. Configure end-systems: someone else’s can of worms.
Multiple Router Flavours
• Core
– OC-12 (622Mbps) and up (to OC-768 ~= 40Gbps)
– Big, fat, fast, expensive
– E.g., Cisco HFR, Juniper T-640
– HFR: 1.2Tbps each, interconnect up to 72 giving
92Tbps, start at $450k
• Transit/Peering-facing
– OC-3 and up, good GigE density
– ACLs, full-on BGP, uRPF, accounting
Multiple Router Flavours
• Customer-facing
– FR/ATM/…
– Feature set as above, plus fancy queues, etc
• Broadband aggregator
– High scalability: sessions, ports, reconnections
– Feature set as above
• Customer-premises (CPE)
– 100Mbps, maybe
– NAT, DHCP, firewall, wireless, VoIP, …
– Low cost, low-end, perhaps just software on a PC
Multiple Router Flavours
Cisco CRS-1
Multi-shelf system
Network Design
• Whose network?
– ISPs, IXs, enterprise, campus
– POPs, DCs
• Many designs:
– Flat
– Hierarchical
– Hybrids
– Multiple scales
Network Design Constraints
• Business
– Backwards compatibility. Who to connect. Peering.
• Technology
– Power – directly (24x7 operation) and indirectly (cooling)
– Port density vs. raw bandwidth
– Software reliability
– Hardware/software capability
• Addressing schemes for scalability, summarization
• Can’t run feature X with feature Y on vendor C in network size N
• Connectivity/resiliency
– “All core routers connect to at least 2 other core routers”
– “All edge routers connect to at least 2 core routers”
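Connectivity rules like these are easy to check mechanically against a list of links. The sketch below is a hypothetical checker: router names and the link list are invented, and `core` and `edge` are sets of router names.

```python
from collections import defaultdict

def check_resiliency(links, core, edge):
    """Return human-readable violations of the two connectivity rules above."""
    neighbours = defaultdict(set)
    for a, b in links:
        neighbours[a].add(b)
        neighbours[b].add(a)
    problems = []
    for r in sorted(core):
        n = len(neighbours[r] & core)
        if n < 2:
            problems.append(f"core router {r} has only {n} core neighbour(s)")
    for r in sorted(edge):
        n = len(neighbours[r] & core)
        if n < 2:
            problems.append(f"edge router {r} connects to only {n} core router(s)")
    return problems

core = {"c1", "c2", "c3"}
edge = {"e1"}
links = [("c1", "c2"), ("c2", "c3"), ("c3", "c1"), ("e1", "c1")]
print(check_resiliency(links, core, edge))  # e1 is single-homed to the core
```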
Router OS Configuration
• Initialization
– Name the router, setup boot options, setup
authentication options
• Configure interfaces
– Loopback, Ethernet, fibre, ATM
– Subnet/mask, filters, static routes
– Shutdown (or not), queuing options, full/half duplex
Router Software Configuration
• Configure routing protocols (OSPF, BGP, &c)
– Process number, addresses to accept routes from,
networks to advertise
– Access lists, filters, ...
• Numeric id, permit/deny, subnet/mask, protocol, port
– Route-maps, matching routes rather than data traffic
• Other configuration aspects: traps, syslog, &c
– (Oh, and switch configuration is about as painful)
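Access-list entries are matched in order, the first match decides, and anything that matches no entry hits an implicit "deny any" at the end. The address masks are wildcard masks, where a 1 bit means "don't care". The sketch below evaluates a standard (source-address-only) list. Its entries are taken from the multicast `access-list 24` lines in the configuration fragments below; the function names are hypothetical.

```python
import ipaddress

def wildcard_match(addr, network, wildcard):
    """IOS-style match: bits set in the wildcard mask are ignored."""
    a = int(ipaddress.IPv4Address(addr))
    n = int(ipaddress.IPv4Address(network))
    care = ~int(ipaddress.IPv4Address(wildcard)) & 0xFFFFFFFF
    return (a & care) == (n & care)

def evaluate_acl(acl, addr):
    """First matching entry wins; no match means the implicit 'deny any'."""
    for action, network, wildcard in acl:
        if wildcard_match(addr, network, wildcard):
            return action
    return "deny"

# Entries from access-list 24 (a host entry has wildcard 0.0.0.0).
ACL_24 = [
    ("permit", "239.255.255.254", "0.0.0.0"),
    ("permit", "224.0.1.111", "0.0.0.0"),
    ("permit", "239.192.0.0", "0.3.255.255"),
    ("permit", "232.192.0.0", "0.3.255.255"),
    ("permit", "224.0.0.0", "0.0.0.255"),
]

for group in ("239.193.1.1", "239.196.0.1", "224.0.0.5"):
    print(group, evaluate_acl(ACL_24, group))
```

Because order matters, moving a broad entry above a narrow one can silently change which action applies.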
Router Configuration Fragments
hostname FOOBAR
!
boot system flash slot0:a-boot-image.bin
boot system flash bootflash:
logging buffered 100000 debugging
logging console informational
aaa new-model
!
aaa authentication login default tacacs local
aaa authentication login consoleport none
aaa authentication ppp default if-needed tacacs
aaa authorization network tacacs
!
ip tftp source-interface Loopback0
no ip domain-lookup
ip name-server 10.34.56.78
!
ip multicast-routing
ip dvmrp route-limit 7000
ip cef distributed
!
interface Loopback0
 description router-1.network.corp.com
 ip address 10.65.21.43 255.255.255.255
!
interface FastEthernet0/0/0
 description Link to New York
 ip address 10.65.43.21 255.255.255.128
 ip access-group 175 in
 ip helper-address 10.65.12.34
 ip pim sparse-mode
 ip cgmp
 ip dvmrp accept-filter 98 neighbor-list 99
 full-duplex
!
interface FastEthernet4/0/0
 no ip address
 ip access-group 183 in
 ip pim sparse-mode
 ip cgmp
 shutdown
 full-duplex
!
router ospf 2
 log-adjacency-changes
 passive-interface FastEthernet0/0/0
 passive-interface FastEthernet0/1/0
 passive-interface FastEthernet1/0/0
 passive-interface FastEthernet1/1/0
 passive-interface FastEthernet2/0/0
 passive-interface FastEthernet2/1/0
 passive-interface FastEthernet3/0/0
 network 10.65.23.45 0.0.0.255 area 1.0.0.0
 network 10.65.34.56 0.0.0.255 area 1.0.0.0
 network 10.65.43.0 0.0.0.127 area 1.0.0.0
!
access-list 24 remark Mcast ACL
access-list 24 permit 239.255.255.254
access-list 24 permit 224.0.1.111
access-list 24 permit 239.192.0.0 0.3.255.255
access-list 24 permit 232.192.0.0 0.3.255.255
access-list 24 permit 224.0.0.0 0.0.0.255
access-list 1011 deny 0000.0000.0000 ffff.ffff.ffff ffff.ffff.ffff 0000.0000.0000 0xD1 2 eq 0x42
access-list 1011 permit 0000.0000.0000 ffff.ffff.ffff 0000.0000.0000 ffff.ffff.ffff
!
tftp-server slot1:some-other-image.bin
tacacs-server host 10.65.0.2
tacacs-server key xxxxxxxx
rmon event 1 trap Trap1 description "CPU Utilization>75%" owner config
rmon event 2 trap Trap2 description "CPU Utilization>95%" owner config
Router Configuration
• Lots of large, fragile text files
– 00s/000s routers, 00s/000s lines per config
– Errors are hard to find and have non-obvious results
– Router configuration also editable on-line
– Order matters!
• How to keep track of them all?
– Naming schemes, directory trees, CVS, ssh upload and atomic commit to router
– Perhaps even a proper database (this counts as advanced!)
• State of the art is pretty basic
– Few tools to check consistency, design goals
– Generally, configurations are generated from templates, with a human-intensive process controlling access to running configs
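Template-based generation plus even a trivial consistency check can catch some of the errors that are otherwise hard to find. The sketch uses Python's `string.Template`. The interface template, the field names and the duplicate-address check are hypothetical illustrations, with values borrowed from the configuration fragments above.

```python
from collections import Counter
from string import Template

# Hypothetical template for one interface stanza.
INTERFACE = Template(
    "interface $name\n"
    " description $description\n"
    " ip address $address $mask\n"
    "!\n"
)

def render_interfaces(interfaces):
    """Render every interface; substitute() raises KeyError if a field is missing."""
    return "".join(INTERFACE.substitute(spec) for spec in interfaces)

def duplicate_addresses(interfaces):
    """A tiny consistency check: addresses configured on more than one interface."""
    counts = Counter(spec["address"] for spec in interfaces)
    return sorted(addr for addr, n in counts.items() if n > 1)

specs = [
    {"name": "Loopback0", "description": "router-1.network.corp.com",
     "address": "10.65.21.43", "mask": "255.255.255.255"},
    {"name": "FastEthernet0/0/0", "description": "Link to New York",
     "address": "10.65.43.21", "mask": "255.255.255.128"},
]
print(render_interfaces(specs))
print(duplicate_addresses(specs))
```

Keeping the data (`specs`) separate from the template means a check can be run over every router's inputs before any configuration is pushed.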