Routing tables
Download
Report
Transcript Routing tables
DCR Project
Internet Routing System (Model)
Material extracted from tutorial on BGP routing course
presented in training seminar (ECODE FP7 Project)
Dimitri Papadimitriou and Olivier Bonaventure
Alcatel-Lucent BELL - Universite catholique de Louvain (UCL)
Outline
Internet Routing
•
•
•
Internet Routing and Protocols
Organization of the Internet
Intra-domain routing
–
–
–
•
Static routing
Distance vector routing
Link-state routing
Inter-domain routing
–
Path vector routing
BGP basics
BGP in large networks
TCP/IP Model
Interface between
application and OS
(Sockets interface)
Application (FTP, Telnet, WWW, email)
Unreliable
datagram
delivery
User datagram protocol
(UDP)
Transmission control
protocol (TCP)
Reliable
byte stream
delivery
Network Layer: Internet Protocol (IP)
Best effort datagram
delivery
(host-host)
Data Link Layer: Ethernet
Physical interface (PHY)
Physical
connection
Internetworking - Layer 3: Routers
Telnet, FTP,
HTTP, email
application
application
transport
transport
network
network
network
Ethernet
data link
data link
data link
100BaseTX
GbE PHY
physical
physical
physical
Host on
network A
Router
(forwards IP datagrams)
Host on
network B
IP
Internet Routing
Internet domains comprises devices called routers comprising a
routing and a forwarding engine (and a management agent)
Routing engine:
•
•
•
Process routing information (exchanged between routers using a routing
protocols such as BGP) so as to compute routes (using a shortest path
algorithms)
Routes entries (composed by a destination, a next-hop interface, and a
metric) are stored in routing information bases (RIB)
Routing entries are subsequently used by the forwarding engine
Forwarding engine:
•
Transfer incoming IP datagram to an outgoing interface directed towards
a router closer (next-hop) to the traffic destination by performing a
longest match prefix lookup on forwarding entries stored in forwarding
information base (FIB) using the incoming IP datagram destination
address
What is Routing?
Routing:
•
•
•
Components: routing information exchange, algorithm, routing
information base (RIB)
RIB entries <dest.prefix,next-hop,cost>
Routing algorithm output used to build RIB entries
Router
Routing
Routing engine
Router
Routing
Protocol
Routing Table
Routing engine
Router
Routing
Protocol
Routing engine
Routing Table
Routing Table
FT
FT
Fwd’ing
The "best" paths selected from routing
table built by the routing protocols are
installed in the forwarding table (FT)
FT
Forwarding engine
I/f
Forwarding engine
Forwarding engine
Forwarding:
•
•
•
Components: processing, lookup, forwarding information base (FIB)
Directing incoming datagram to an outgoing interface
FIB entries <dest.prefix,next-hop,interface>
Architecture of a normal IP router
Routing
protocol
Routing table
The "best" paths selected from the routing table
built by the routing protocols are installed in the
forwarding table
Shap.
IP packets
Forwarding
Table
Control
IP packets
Class.
Pol
Forwarding
Shap.
Class.
Pol
Forwarding decision based on longest prefix match
Update of TTL and checksum fields in IP packets
IP packets
Simplified router model
Router:
Routing
information
Simulator
Data structures:
Router
Routing engine
Routing info
processing
Routing
Processing:
•
The "best" paths selected from routing table
built by the routing protocol are installed in
the forwarding table (FT)
I/f
IP datagram
FT
IP datagram
Forwarding engine
Routing table
information
RIB
IP datagram
•
I/f
Forwarding decision based on longest prefix match
Update of TTL and checksum fields in IP packets
RIB (or RT) = Routing Information Base
FT = Forwarding table
Note: Forwarding Information Base (that
populates FT) is derived from RIB
Routing information
processing:
–
–
–
Route selection
Route computation
Route dissemination
-- No simulation of the
forwarding table
Routing Information Base (RIB)
Repository storing in which all IP Routing
protocols place all of their routes (routing entries)
RIB is not specific to any routing protocol, rather,
it is the repository where all the routing protocols
place all of their routes
•
•
Routes are inserted into the RIB whenever a routing
protocol learns/computes a new route
When a destination becomes unreachable, the route
is first marked unusable and later removed from the
RIB as per the specifications of the routing protocol
they were learned from
Note:
•
•
RIB is NOT used for forwarding IP datagrams
RIB is also referred to as Routing Table (RT)
Routing vs Forwarding (RIB vs FIB)
Routing Information Base (RIB): table that contains
all destinations to which the router may forward IP
datagrams
•
•
RIB entries can be used to populate the FIB based on
some selection criteria. These entries can also be used as
a source to re-advertise routing information
RIB may contain multiple paths to the same destination
(possibly with the same or different degree of preference
and possibly advertised from multiple sources)
Forwarding Information Base (FIB): table containing
the information necessary to forward IP datagrams
•
•
•
Describes a database indexing network prefixes versus
router interface identifiers
At minimum, contains the interface identifier and next hop
information for each reachable destination network prefix
Note: contains unique paths only (i.e. does not contain
secondary paths)
Dynamic Routing Operations
The basic functions of dynamic routing are:
•
•
•
Processing of routing information: route computation
(algorithm) resulting in routing table entries)
Maintenance of the RIB
Distribution of routing information (routes, network
topology state, etc.)
Routing functions implemented by routing engine
Internet Routing Protocols
Interior Gateway Protocol (IGP)
•
•
Routing of IP datagrams inside each domain
Only knows topology of its own domain (all routers within
given AS managed by a single admin unit)
Domain4
Domain2
Domain1
Domain3
Exterior Gateway Protocol (EGP)
•
•
Routing of IP packets between domains
Each domain is considered as a blackbox
Inter vs Intra-domain Routing Protocols
IGP: Intra-domain routing (within AS)
•
•
•
Allow routers to transmit IP packets towards their destination along the best path
= shortest-path (metrics: #hops, link cost)
IGP routing protocols: distance vector RIPv2 or link state OSPFv2 (RFC2328)
and IS-IS (RFC1195)
All routers exchange routing information: each domain router can obtain routing
information for the whole domain (all routers within given AS are managed by a
single administrative unit)
eBGP
eBGP
IGP
eBGP
eBGP
iBGP
eBGP
eBGP
EGP: Inter-domain routing (between AS)
•
•
•
Routing policies based on business relationships
No common metrics, and limited cooperation
Policy-based, path-vector routing protocol: external/internal Border Gateway
Protocol (eBGP/iBGP)
Inter vs Intra-domain Routing Protocols
AS 16
Step 1: AS as abstract
nodes (eBGP only)
eBGP
Abstract node router
eBGP
eBGP
AS 15
iBGP
Step 2: AS with internal
structure (iBGP)
AS 76
AS 16
eBGP
AS 15
iBGP
eBGP
eBGP
iBGP
AS 76
Outline
Internet Routing
•
•
Internet Routing and Protocols
Organization of the Internet
–
–
Topology
Types of domains
–
–
–
•
Example of domain
Intra-domain routing
–
–
–
•
Transit domain
Stub domain
Static routing
Distance vector routing
Link-state routing
Inter-domain routing
–
Path vector routing
BGP basics
BGP in large networks
Organization of the Internet
Internet: infrastructure composed by an
interconnected set of (heterogeneous) networks
architected around a distributed routing system that
is partitioned into independently administrated
domains (autonomous systems)
A domain is a set of routers, links, hosts and local
area networks under the same administrative control
•
A domain can be very large...
–
•
A domain can be very small...
–
AS568: SUMNET-AS DISO-UNRRA contains 73154560 IP
addresses
AS2111: IST-ATRIUM TE Experiment a single PC running Linux...
Internet is composed of ~ 30.000 autonomous
systems (AS)
Organization of the Internet
Domains
•
•
are interconnected in various ways
The interconnection of all domains should in theory allow
packets to be sent anywhere
Usually IP datagram will need to cross a few ASes (3 to 4,
average 3.4) to reach its destination
Evolution of the Internet Topology (1)
1986: NSF builds NSFNet as backbone, links 6 supercomputer centers,
56 kbps; huge increase of connections, especially from universities
1987: 10,000 hosts - 1989: 100,000 hosts - 1992: 1 million hosts
1988: NSFNet backbone upgrades to 1.5Mbps
1991: NSF lifts restrictions on the commercial use of the Net;
1994: NSF reverts back to research network (vBNS); the backbone of the
Internet consists of multiple private backbones
Before ‘95: Strict hierarchical network with a single central
backbone
NSFNet Backbone
Regional
Campus
Campus
Regional
Campus
Regional
Campus
Evolution of the Internet Topology (2)
Between 1995-1999: increased meshedness between ISP
backbones and customers
Decentralization: from a single backbone network to a
conglomeration of 100s of backbone and 1000s ISP
Loss of hierarchy and abstraction: from hierarchical network
to increasingly meshed interconnection
Significant bandwidth increase: from T3 (45MB) and T1 (1MB)
to OC48 (2.5GB) and OC12 (622MB) link capacity
AS1
AS2
R2
R1
AS4
AS3
R3
R4
Evolution of the Internet Topology (3)
Can be viewed as structured into tiers
•
Tier-1 ISPs a.k.a backbone providers
–
–
–
•
Tier-2 ISPs
–
–
–
–
•
Dozen (12 to 20 AS) of large international or large national ISPs
interconnected by multiple private peering points (shared cost)
Provide transit service (no “upstream” provider)
Examples: AT&T, Verizon, Sprint, Level 3, etc.
Regional or National ISPs (order 1k AS)
Customer of T1 ISP(s) - at least 1 and often 2 - and Provider of T3 ISP(s)
Shared-cost with other T2 ISPs
Examples: France Telecom, BT, Belgacom
Tier-3 ISPs a.k.a stub AS
–
–
–
Smaller ISPs, Corporate Networks, Content providers (order 10k AS)
Customers of T2 or T1 ISPs (no transit service to other ISPs)
Shared-cost with other T3 ISPs
Interconnections
•
•
An ISP runs (private) Points of Presence (PoP) where its
customers and other ISPs connect to it
ISPs also connect at (public) Network Access Point (NAP)
called public peering
Tier-1 ISP
“Tier-1” ISPs (a.k.a. backbone providers e.g., AT&T, Verizon,
Sprint, Level 3, Qwest): national/ international coverage treating
each other as equals (peers)
Tier-1 providers
interconnect privately =
multiple private peering
Tier-1 providers also
interconnect at public
network access points
(NAPs) = public peering
Tier 1 ISP
NAP
Tier 1 ISP
Tier 1 ISP
Tier-2 ISP
“Tier-2” ISPs (often regional-national): ISPs that connect to one
or more Tier-1 ISPs, possibly other Tier-2 ISPs
Tier-2 ISP
Tier-2 ISP pays Tier-1
ISP for connectivity to
rest of Internet
Tier-2 ISP is customer
of Tier-1 ISP
Tier-2 ISP
Tier-2 ISPs also peer
privately with each
other, and publicly
interconnect at NAP
Tier 1 ISP
NAP
Tier 1 ISP
Tier 1 ISP
Tier-2 ISP
Tier-2 ISP
PoP
Tier-2 ISP
Tier-3 ISP
“Tier-3” ISPs: last hop (“access”) network (closest to end
systems)
Tier-3 ISP
Tier-3 ISP
Tier-2 ISP
Tier- 3 ISPs are
customers of
higher tier ISPs
connecting them
to rest of Internet
Tier-3 ISP
Tier-3 ISP
Tier-3 ISP
Tier-2 ISP
Tier 1 ISP
NAP
Tier 1 ISP
Tier 1 ISP
Tier-2 ISP
Tier-2 ISP
Tier-2 ISP
PoP
Tier-3 ISP
Tier-3 ISP
Tier-3 ISP
Tier-3 ISP
Organization of the Internet
Tier-1 ISPs
–
–
Dozen of large ISPs
interconnected by shared-cost
Provide transit service
–
Uunet, Level3, Sprint, ...
Tier-2 ISPs
–
–
–
–
Regional or National ISPs
Customer of T1 ISP(s)
Provider of T2 ISP(s)
Shared-cost with other T2 ISPs
–
France Telecom, BT, Belgacom
Tier-3 ISPs
–
–
–
Smaller ISPs, Corporate
Networks, Content providers
Customers of T2 or T1 ISPs
Shared-cost with other T3 ISPs
AS Ranking
Proposing two ranking methods:
•
•
Degree-based: ASes are ranked by their degrees
in the AS topology graph: http://as-rank.caida.org/
AS-relationship-based: ASes are ranked by their
customer cone sizes
See http://as-rank.caida.org/data
RouteViews BGP AS links annotated with inferred relationships
Dataset date: 20080818
Alpha parameter of inference algorithm: 0.01000
Format: <AS1> <AS2> <relationship>
where
<AS1> and <AS2> are AS numbers, and
<relationship> is -1 if AS1 is a customer of AS2,
0 if AS1 and AS2 are peers,
1 if AS1 is a provider of AS2, and
2 if AS1 and AS2 are siblings (the
same organization)
Summary
Based on AS connectivity and relationships, the
Internet routing infrastructure can be viewed as a
three tier hierarchy
•
•
•
Core: consisting of a dozen or so Tier-1 providers forming
the top level of the hierarchy
Middle: consisting of few thousands of ASes (Tier-2
providers) that provide transit service but are not part of
the core
Edge: 10 thousands of stub ASes that do not provide
transit service. Usually, local ISP, ASP and CSP
Outline
Internet Routing
•
•
Internet Routing and Protocols
Organization of the Internet
–
–
Topology
Types of domains
–
–
–
•
Example of domain
Intra-domain routing
–
–
–
•
Transit domain
Stub domain
Static routing
Distance vector routing
Link-state routing
Inter-domain routing
–
Path vector routing
BGP basics
BGP in large networks
Types of domains
The Internet consists of routing domains:
Autonomous Systems (AS) interconnected with
each other:
•
•
Transit domain: provider, hooking many AS together
Stub domain: smaller corporation/domain:
–
–
At least one and usually two connections to other domain
No transit service to other domains
Two-level routing:
•
•
Intra-domain: administrator responsible for choice of
routing protocol within network (usually link-state routing
protocol)
Inter-domain: standard for interdomain routing: BGP
Types of domains (1)
Transit domain
•
A transit domain allows external domains to use its
own infrastructure to send packets to other domains
S1
S2
T2
T1
T3
S4
S3
Examples
•
UUNet, OpenTransit, GEANT, Internet2, RENATER,
EQUANT, BT, Telia, Level3,...
Types of domains (2)
Stub domains
•
•
A stub domain does not allow external domains to
use its infrastructure to send packets to other domains
A stub is connected to at least one transit domain
–
–
Single-homed stub : connected to one transit domain
Dual-homed stub : connected to two transit domains
S1
S2
•
T3
S4
S3
Content stub domain (Content Service Provider)
–
•
T2
T1
Large web servers : Yahoo, Google, MSN, TF1, BBC,...
Access-rich stub domain (Access Service Provider)
–
ISPs providing Internet access via CATV, ADSL, ...
Multihomed domains
Definition: use of redundant network links/connections
to the same or different domain for the purposes of
external connectivity
Objective:
•
•
•
Robustness in case of failure (link, upstream domain)
Performance (load balancing)
Cost
Multi-homed stub AS: connectivity to multiple
immediate upstream transit domains
T2
T1
T3
Multi-homed transit AS
S3
A transit domain : Easynet
A transit domain : GEANT
A transit domain : BT/IGnite
A large transit domain : UUNet
Composition of Internet paths
Most Internet paths contain a sequence of
•
•
•
0 or more Customer->Provider relationships
0 or 1 Peer-to-Peer relationships
0 or more Provider->Customer relationships
AS1
AS2
$
$
$
Shared-cost (peering)
$
$
AS9
$
AS4
AS3
AS8
$
$
AS7
Customer-provider
Outline
Internet Routing
•
•
•
Internet Routing and Protocols
Organization of the Internet
Intra-domain routing
–
–
–
•
Static routing
Distance vector routing
Link-state routing
Inter-domain routing
–
Path vector routing
BGP basics
BGP in large networks
Intradomain Routing
Goal
•
Allow routers to transmit IP datagrams along the
best path towards their destination
–
Best usually means the shortest path
–
–
•
Shortest measured in sum of link costs or as number of hops
along the path
Sometimes best means the less loaded path
Allow to find alternate routes in case of failures
Behavior
•
All routers exchange routing information
–
–
Each domain router can obtain routing information for
the whole domain
The network operator or the routing protocol selects
the cost of each link
Intradomain Routing Protocols
Static routing
•
Distance vector routing
•
•
Only useful in very small domains
Routing Information Protocol (RIPv2 - RFC2453)
Still used in small domains despite its limitations
Link-state routing
•
•
Open Shortest Path First (OSPF - RFC2328)
Developed to address the needs of large, scalable
internetworks that RIP could not.
Widely used in enterprise networks and ISPs
Integrated Intermediate System-Intermediate System (ISIS - RFC 1195) or Dual IS-IS
Extended version of IS-IS (RFC 1142) that supports
multiple routed protocols including IP
Widely used by ISPs
Distance Vector Routing: Bellman-Ford
Distance
Vector routing algorithm (a.k.a. Bellman-Ford
algorithm)
•
Each router maintains a vector with an entry for every
destination that contains
–
–
•
Each router, periodically sends its vector to his direct
neighbors containing, for each known prefix:
–
–
•
Distance = cost to reach the destination from this router
Direction = direct link that is on that least cost path
the IP address prefix
the distance between itself and the destination
Upon receiving a vector, a router updates the local vector
based on the direct link’s cost and the received vector
Consequently
•
•
Distant Vector (DV) routing algorithm do not allow a router to
know the exact topology of an internetwork
Routers discover the best path to destinations based on
accumulated metrics from each neighbor
Distance Vector Routing: Bellman-Ford
Define distances at each node x
dx(y) = cost of least-cost path from x to y
•
Update distances based on neighbors (each node
only has a “next-hop-view”)
dx(y) = min {c(x,v) + dv(y)} over all neighbors v
•
v
3
u
y
2
1
1
x
2
Every node sends its vector to its
directly connected neighbors
4
z
1
5
w
t
3
4
s
du(z) = min{c(u,v) + dv(z), c(u,w) + dw(z)}
Upon receiving a vector, a router
updates the local vector based on the
direct link’s cost and the received
vector
After a few iterations, the routing table
converges to a consistent state
Distance Vector Routing: Count-to-Infinity
Routing loops can occur when routers propagate
reverse route (= route pointing back to the router from
which packets were received)
“Count-to-infinity” problem (loop between three or
more routers = cycle in network topology graph)
B
A-D link failure
1
A
1
D
A tells B and C that D is unreachable
1
1
C
B tells A that D is reachable via C with cost=3 (since
route is through C, split horizon doesn’t apply)
A tells C that D is reachable through A (cost=4)
Etc…
Rule of thumb: with distance vectors good news travel
quickly and bad news travel slowly (resulting in slow
convergence)
Distance Vector Routing: Count-to-Infinity
How
•
•
to avoid the count-to-infinity problem ?
Define “infinity” as a max.distance (16 is often used in
distance vector routing protocols) to prevent routing updates
from looping endlessly
Avoiding (two-hop) routing loops using Split horizon:
technique for preventing reverse routes between two routers
–
Simple Split Horizon: when sending updates out an interface, do not
send networks that were learned from an update that came in on the
same interface
Rule: do no send back vector information in the direction where it
came
–
Split Horizon with Poisoned Reverse: when sending updates out a
particular interface, mark any networks that were learned from an
update that received on the interface as unreachable (considered
safer and stronger)
Rule: do send back vector information with distance set to infinity
– Note:
Split horizon does not solve “count-to-infinity”
problem for loops (cycle) between three or more routers
Link State Routing
Assumptions: each router can discover the state of the link to
its neighbors and the cost of each link
Basic principle: If every router knows how to reach its directly
connected neighbors, one can distribute the local knowledge
of each router to all other routers so that every router can
construct a weighted graph. With this knowledge, a node can
always determine the shortest path to any other router
A
N1
R1
Process to update
this routing table
R1
Topology change in
Link State Update
R2
R3
Process to update
this routing table
N2
B
R2
R3
Routing table
Routing table
Routing table
Topological
Database
Topological
Database
Topological
Database
LSDB
SPF
LSDB
SPF
LSDB
SPF
Process to update
this routing table
Each router has its own topological database on
which the SPF algorithm is run
SPF Tree
SPF Tree
SPF Tree
Terminology (1)
Link: interface on a router
•
•
An interface has state information associated with it, obtained from the
underlying lower level protocols and the routing protocol itself
– State: the functional level of an interface that determines whether or
not full adjacencies are allowed to form over the interface
Link state (LS): description of router interface (= link)
•
•
•
Interface: connection between a router and one of its attached networks
(an interface is referred to as a link)
A single IP interface address and interface mask (unless the network is
an unnumbered point-to-point network)
Output cost(s): cost of sending data packet on the interface, expressed
in the link state metric (advertised as the interface link cost). The cost of
an interface must be greater than zero
List of neighboring routers: other routers attached through this link
Flooding: mechanism used to reliably distribute local topology
description so as keep routers Link State Database (LSDB) up
to date
Terminology (2)
Link State PDU
•
•
•
Unit of data describing the local state of router's interfaces and adjacencies
Each LS PDU is flooded throughout the “routing domain”
The collected LS advertisements of all routers and networks forms the link
state database
Link State Database (LSDB)
•
•
•
Logically separated LSDB for each area the router is connected to
Two routers interfacing the same area must have, for that area, identical
LSDBs
LSDB collection of all LS PDUs (re-)originated from the area's routers
–
–
Each router advertises directly connected networks via LS PDU
Every router has it’s own view of the network – it builds a “topologic database”
14.2.2.0/24
32.2.9.0/24
Metric = 1
R1
R3
Metric = 1
Metric = 1
Metric = 1
R2
32.2.5.0/24
Metric = 1
R4
Router
Destination
Next Hop
Cost
R1
32.2.5.0/24
R2
1
R1
32.2.9.0/24
R3
1
R1
32.2.3.0/24
R2
2
R1
32.2.3.0/24
R2
2
R2
14.2.2.0/24
R1
1
R2
32.2.9.0/24
R3
1
R2
32.2.3.0/24
R4
1
32.2.3.0/24
Router 1 is aware of 2 paths to 32.2.3.0/24 – this provides redundancy should one of the routers fail
Terminology (3)
Shortest Path computation: performed on the link state database
in order to produce a router's routing table
•
•
Shortest Path Tree (SPT)
•
•
Each router has an identical LSDB, leading to an identical representation
of the network topology graph
Each router generates its routing table from this graph by computing a
tree of shortest paths with the local router as root of this tree
Derived from the collected LS PDUs using the Dijkstra algorithm (Shortest
Path algorithm)
Shortest-path tree with local router as root, gives the shortest path to any
IP dest. network or host (only the next hop to the destination is used in the
forwarding process)
Routing table (a.k.a. Routing Information Base or RIB)
•
•
Derived from the LSDB using the SPF algorithm
Each entry of this table is indexed by a destination, and contains the
destination's cost and a set of paths (described by its type and next hop)
to use in forwarding packets to the destination
Link State Protocol and SPF Algorithm
R2
Link State PDU
R1
R4
R3
Routing table
Topological
Database
A
N1
R1
B
R2
R3
Routing table
Routing table
Routing table
Topological
Database
Topological
Database
Topological
Database
LSDB
Route
SPF
LSDB
N2
LSDB
SPF
LSDB
SPF
SPF
Algorithm
SP Tree
SP Tree
SP Tree
SP Tree
Link-state protocols
•
•
•
Flood reliably routing information to all routers of the same routing domain
Each router knows the topology of the entire routing domain
Converge more quickly than distance vector protocols (link-state protocols
are less prone to routing loops)
Link State Concept (1)
1. Flooding of link-state information
•
•
•
•
Link-state PDUs (LS
PDUs): PDU carrying
routing information
exchanged between routers
R1
Topological database:
2. Building a
collection of information
topological DB
gathered from LS PDUs
Shortest Path First (SPF) Topological
Database Entries
algorithm: performed on
the database resulting in
the SPF tree
Routing tables: A
LSDB
repository of the known
paths and interfaces
R2
Link State PDU
Routing
Table
R3
4. SP Tree
3. SPF
Algorithm
•
Routers send LS PDUs to their adjacent neighbors
•
LS PDUs are used to build a topological database
•
•
5. Routing Table
R4
SPF algorithm computes the shortest path first tree in which the root
is the individual router and then a routing table is created
Resulting routing entries populates the routing table
Route
Link State Concept (2)
1. Flooding of linkstate information
R2
R1
Link State PDU
5. Routing Table
R4
2. Building a topological DB
Routing
Table
R3
Topological
Database Entries
4. SP Tree
Route
3. SPF
LSDB
Algorithm
1. Flooding of link-state information
•
•
Each node, router, on the network announces to other all other routers on
the network its own piece of link-state information (including who their
neighboring routers are and the cost of the link between them)
– Example: “Hi, I’m Router1, and I can reach Router2 via a T1 link and I
can reach Router3 via an Ethernet link.”
Each router sends these announcements to all of the routers in the network
Link State Concept (3)
1. Flooding of linkstate information
R2
R1
Link State PDU
5. Routing Table
R4
2. Building a topological DB
Routing
Table
R3
Topological
Database Entries
4. SP Tree
Route
3. SPF
LSDB
Algorithm
2. Building a Topological Database
Each router collects link-state information from other routers into
LSDB
3. Shortest-Path First (SPF), Dijkstra’s Algorithm
Using this information, routers can recreate a network topology
graph
Path-selection model (to destination prefix): minimum hop count
or sum of link costs
•
•
•
Link State Concept (4)
1. Flooding of linkstate information
R2
R1
Link State PDU
5. Routing Table
R4
2. Building a topological DB
Routing
Table
R3
Topological
Database Entries
4. SP Tree
Route
3. SPF
LSDB
Algorithm
4. Shortest Path Tree
The SPF algorithm computes the SP tree, with the local
router as the root of the tree and the other routers and links
to other routers as the various branches of the tree
•
5. Routing Table
Using this information, the router creates a routing table
populated by routing entries
•
Link State Advertisement Processing
LSA
LSU
Is entry in
LSDB ?
No
A
Yes
Is seq. #
the same?
Ignore LSA
Yes
No
Add to database
Is seq. #
higher?
Send LS Ack
Yes
No
Flood LSA
Run SPF to calculate
new routing table
End
Send LSU with newer
information to
source
End
Go
to A
Shortest-Path Problem
Given: network topology with link costs
•
•
c(x,y): link cost from node x to node y
c(x,y) set to infinity if x and y are not direct neighbors
Compute: from a given source (u) least-cost paths to
all other nodes
•
p(i): predecessor node along path from source to i
v
3
v
y
2
1
1
u
x
2
3
u
1
I/f 2
5
x
2
3
4
s
4
z
1
5
t
w
1
1
I/f 1
z
4
y
2
w
t
3
4
s
Shortest-path tree from u
Dijsktra’s Algorithm (iterative algorithm)
Notation
c(x,y): link cost from node x to y
c(x,y) = infinity if x and y not direct neighbors
D(v): current value of cost of path from source to dest. v
S: set of nodes whose least cost path definitively known
Initialization
S = {u}
/* u is the source node
for all nodes v
if v adjacent to u
then D(v) = c(u,v)
else D(v) = ∞
Repeat
find w not in S such that D(w) is a minimum
add w to S
update D(v) for all v adjacent to w and not in S
D(v) = min( D(v), D(w) + c(w,v) )
until all nodes are in S
Shortest-Path Tree: Example
Shortest-path tree from u
v
I/f 2
1
1
I/f 1
u
y
2
3
x
2
4
z
1
5
w
t
3
4
Routing table at u
Destination
Next-hop
Cost
v
v
3
w
w
2
x
w
3
y
v
5
z
v
6
s
w
6
t
w
8
Destination
Interface
Next-hop
v
1
v
w
2
w
x
2
w
y
1
v
z
1
v
s
2
w
t
2
w
s
Forwarding table at u
Outline
Internet Routing
•
•
•
Internet Routing and Protocols
Organization of the Internet
Intra-domain routing
–
–
–
•
Static routing
Distance vector routing
Link-state routing
Inter-domain routing
–
Path vector routing
BGP basics
BGP in large networks
Inter-domain routing
Goals
•
Allow to transmit IP packets along the best path
towards their destination through several transit
domains while taking into account the routing
policies of each domain without knowing the
detailed topology of those domains
–
From an interdomain viewpoint, best path often
means cheapest path
–
Each domain is free to specify inside its routing policy
the domains for which it agrees to provide a transit
service and the method it uses to select the best path
to reach each destination
Inter-domain Routing Protocols
Border Gateway Protocol (BGP) version 4 (RFC
4271) is an inter-domain routing protocol
•
•
Exchanges routing information between AS while
guaranteeing loop-free path selection
BGP protocol is similar to Distance Vector, but called “Path
Vector” instead
–
–
•
•
BGP router advertises in its vector only reachability information
and associated path attributes to each destination (so, avoids
loops), no costs or hop counts
Unlike IGPs, such as RIPv2, and OSPFv2, BGP does not use
metrics like hop count, link cost, or delay. Instead, BGP performs
its routing decisions (best path selection) based on network
policies and route selection rules applied in sequence to various
path attributes
Supports classless inter-domain routing (CIDR) and route
aggregation
THE inter-domain routing protocol of the Internet
Routing Protocols: Comparison
Link State
Dissemination
Algorithm
Condition
Convergence
Protocols
Distance Vector
Path Vector
Flood reliably topology
information (link state
PDU) to all routers
within the routing
domain
Dijsktra’s shortest path
Best end-to-end paths
are computed locally at
each router (determine
next hop)
Works only if policy is
shared and uniform
Update distances from
neighbors’ distances
Update paths based on
neighbors’ paths
Each router knows little
about network topology
Bellman-Ford shortest
path
Best end-to-end paths
result from composition
of all next-hop choices
Notion of distance and
direction
Fast due to flooding
Responds rapidly to
topology changes
Slow convergence
(bouncing effect, countto-infinity)
Local policy to rank
paths Route update
include the entire AS
path information to
prevent count-to-infinity
Does not require any
notion of distance
Does not require
uniform policies at all
routers
Slow convergence due
to path exploration (path
OSPFv2 (RFC 2328)
IS-IS (RFC 1195)
RIPv1 (RFC 1058)
RIPv2 (RFC 2453)
hunting)
BGPv4
Summary
Types of domains
•
•
Transit domain
Stub domain
Intradomain routing
•
Selects the best route towards each destination
based on one metric
–
–
–
Static routing
Distance vector routing
Link-state routing
Outline
Internet Routing
BGP basics
•
•
•
Routing policies
The Border Gateway Protocol
How to prefer some routes over others
BGP in large networks
Domains versus
Autonomous Systems
The BGP interdomain routing protocol deals
with Autonomous Systems (AS)
•
An AS is defined as “a set of routers under a
single technical administration ... that
presents a consistent picture of what
destinations are reachable through it.”
•
Each AS is identified by its AS number
In practice
•
•
A domain is often equivalent to an AS
A domain may be composed of several ASes
–
•
Ex: Worldcom uses AS701, AS702, ...
Many domains do not have an AS number
–
Ex: small networks connected to one provider without
using BGP
Types of interdomain links
Two types of interdomain links
•
Private link
–
Usually a leased line between two routers belonging to
the two connected domains
R2
R1
DomainB
DomainA
•
Connection via a public interconnection point
–
Usually Gigabit or higher Ethernet switch that
interconnects routers belonging to different domains
R2
Physical link
Interdomain link
R3
R1
R4
Routing policies
In theory BGP allows each domain to define its
own routing policy...
In practice there are two common policies
•
Customer-provider peering
–
•
Customer c buys Internet connectivity from provider P
Shared-cost peering
–
Domains x and y agree to exchange packets by using
a direct link or through an interconnection point
Customer-provider peering
AS1
$
AS3
AS2
$
$
Customer
AS4
$
$
AS7
•
Principle
–
Customer sends to its provider its internal routes and
the routes learned from its own customers
–
–
Provider will advertise those routes to the entire Internet to
allow anyone to reach the Customer
Provider sends to its customers all known routes
–
Customer will be able to reach anyone on the Internet
Provider
Shared-cost peering
AS1
$
AS3
AS2
$
$
$
AS4
Shared-cost
Customer-provider
$
AS7
•
Principle
–
PeerX sends to PeerY its internal routes and the routes
learned from its own customers
–
–
–
PeerY will use shared link to reach PeerX and PeerX's customers
PeerX's providers are not reachable via the shared link
PeerY sends to PeerX its internal routes and the routes
learned from its own customers
–
–
PeerX will use shared link to reach PeerY and PeerY's customers
PeerY's providers are not reachable via the shared link
Routing policies
A domain specifies its routing policy by defining
on each BGP router two sets of filters for each
peer
•
Import filter
–
•
Export filter
–
Specifies which routes can be accepted by the router
among all the received routes from a given peer
Specifies which routes can be advertised by the router to
a given peer
Filters can be defined in RPSL
•
Routing Policy Specification Language
Outline
Internet Routing
BGP basics
•
•
•
Routing policies
The Border Gateway Protocol
How to prefer some routes over others
BGP in large networks
The Border Gateway Protocol
Principle
Path vector protocol
•
–
BGP router advertises its best route to each destination
l
1.0.0.0/8
l
•
AS1
prefix:1.0.0.0/8
lASPath: AS1
AS5
prefix:1.0.0.0/8
lASPath: AS1
AS2
prefix:1.0.0.0/8
lASPath: ::AS2:AS4AS1
l
prefix:1.0.0.0/8
lASPath: AS4:AS1
l
AS4
... with incremental updates
–
Advertisements are only sent when their content changes
''Origin'' of the routes announced by BGP
Where do the routes announced by a BGP
router come from ?
•
Learned from other BGP routers
–
•
Static configuration
–
–
–
•
BGP router only propagates the received routes
BGP router is configured to advertise some prefixes
Drawback : requires manual configuration
Advantage : Stable set of advertised prefixes
Learned from an Interior Gateway Protocol
–
–
The prefixes received from the IGP are advertised by
the BGP router usually as an aggregate
Advantage
–
–
BGP advertisements follow network state, prefix is
automatically withdrawn by BGP it is not reachable via IGP
Drawback
–
BGP announcements will be unstable if IGP is unstable...
Policies and BGP
Two mechanisms to support policies in BGP
•
Each domain defines itself which is the best route
to reach each destination based on the routes
learned from its peers
–
–
•
The chosen best route is not necessarily the ''shortest''
route as with IGPs
Only the best route towards each destination can be
announced to external peers
Each domain determines, on its own, which
routes can be advertised to each peer
–
An AS does not necessarily advertise to all its
neighbors all the routes that it knows
BGP Routing Information Base (RIB)
BGP RIB consists of three distinct parts:
•
Adj-RIBs-In
–
–
•
Loc-RIB
–
–
•
Stores routing information learned from inbound UPDATE
messages received from other BGP speakers
These routes are available as input to the Decision Process after
applying Import Policy rules (import filter)
Contains the local routing information the BGP speaker selects by
applying its local policies to the routing information contained in its
Adj-RIBs-In
These are the routes that will be used by the local BGP speaker
Adj-RIBs-Out
–
–
Stores routing information the local BGP speaker selected for
advertisement to its peers
This routing information will be carried in the local BGP speaker's
UPDATE messages and advertised to its peers by means of the
local speaker's UPDATE messages after applying Export Policy
rules (export filter)
Conceptual model of a BGP router (1)
BGP Adj-RIB-In
Peer[N]
BGP Msgs
from Peer[N]
BGP Msgs
from Peer[1]
All
acceptable
routes
BGP Adj-RIB-Out
Peer[N]
Peer[1]
BGP Msgs
to Peer[N]
Import filter
Attribute
manipulation
BGP Decision
Process
Import filter(Peer[i])
Determines which BGP Msgs
are acceptable from Peer[i]
One best
route selection to
each destination
BGP Loc-RIB
Peer[1]
Export filter
Attribute
manipulation
BGP Msgs
to Peer[1]
Export filter(Peer[i])
Determines which
routes can be sent to Peer[i]
BGP decision process selects
the best route towards each destination
BGP Loc-RIB: contains the routes that
have been selected by the local BGP
speaker's Decision Process
Conceptual model of a BGP router (2)
Constrained by operator’s policies and configuration language
Receive
BGP
Updates
Apply Policy =
Input filtering
routes & treat
attributes
Apply Import
Policies
Apply Policy =
Output filtering
routes & treat
attributes
Based on
Prefix Attribute
BGP Decision
process
Loc-RIB
Best routes
Best Route
Selection
Adj-RIB-Out
Adj-RIB-In
Adj-RIB-In: contains unprocessed routing
information that has been advertised to the local
BGP speaker by its peers
Loc-RIB: contains the routes that have been
selected by the local BGP speaker's Decision Process
Adj-RIB-Out: contains the routes for advertisement
to specific peers by means of the local speaker's
UPDATE messages
Apply Export
Policies
Install forwarding
entries for best routes
IP Forwarding
Table
Send
BGP
Updates
BGP Decision Process
When a BGP speaker receives more than one route for the
same IPv4 prefix, the BGP route selection rules for route
preference are used to choose which IPv4 route is installed by
BGP
Highest Local Preference
Enforce relationships
Shortest AS-PATH
Lowest MED
Traffic Engineering
iBGP < eBGP
Lowest IGP cost to BGP egress
Lowest router ID
Tie breaker
Note: not selected/unprocessed routing information is usually
maintained in case currently selected information is being
withdrawn or superseded
BGP : Principles of operation
Principles
•
BGP relies on the
incremental exchange of path vectors
BGP session established
over
TCP connection between
peers
AS3
R1
BGP
session
Each peer sends all its
active routes
BGP Msgs
R2
AS4
As long as the BGP session
remains up
Incrementally update BGP routing
tables
BGP : Principles of operation (2)
Simplified model of BGP
•
2 types of BGP path vectors
•
UPDATE
–
–
Used to announce a route towards one prefix
Content of UPDATE
–
–
–
•
Destination address/prefix
Interdomain path used to reach destination (AS-Path)
Nexthop (address of the router advertising the route)
WITHDRAW
–
–
Used to indicate that a previously announced route is
not reachable anymore
Content of WITHDRAW
–
Unreachable destination address/prefix
BGP : Session Initialization
Initialize_BGP_Session(RemoteAS, RemoteIP)
{ /* Initialize and start BGP session */
/* Send BGP OPEN Message to RemoteIP on port 179*/
/* Follow BGP state machine */
/* advertise local routes and routes learned from peers*/
foreach (destination=d inside BGP-Loc-RIB)
{
B=build_BGP_UPDATE(d);
S=apply_export_filter(RemoteAS,B);
if (S<>NULL)
{ /* send UPDATE message */
send_UPDATE(S,RemoteAS, RemoteIP)
}
}
/* entire RIB was sent */
/* new UPDATE will be sent only to reflect local or distant
changes in routes */
...
}
Events during a BGP session
1.Addition of a new route to RIB
•
A new internal route was added on local router
–
–
•
Static route added by configuration
Dynamic route learned from IGP
Reception of UPDATE message announcing a new
or modified route
2.Removal of a route from RIB
•
Removal of an internal route
–
–
•
Static route is removed from router configuration
Intradomain route declared unreachable by IGP
Reception of WITHDRAW message
3.Loss of BGP session
•
All routes learned from this peer removed from RIB
Export and Import filters
BGPMsg Apply_export_filter(RemoteAS, BGPMsg)
{ /* check if Remote AS already received route */
if (RemoteAS isin BGPMsg.ASPath)
BGPMsg==NULL;
/* Many additional export policies can be configured : */
/* Accept or refuse the BGPMsg */
/* Modify selected attributes inside BGPMsg */
}
BGPMsg apply_import_filter(RemoteAS, BGPMsg)
{ /* check that we are not already inside ASPath */
if (MyAS isin BGPMsg.ASPath)
BGPMsg==NULL;
/* Many additional import policies can be configured : */
/* Accept or refuse the BGPMsg */
/* Modify selected attributes inside BGPMsg */
}
BGP : Processing of UPDATES
Recvd_BGPMsg(Msg, RemoteAS)
{
B=apply_import_filer(Msg,RemoteAS);
if (B==NULL) /* Msg not acceptable */
exit();
if IsUPDATE(Msg)
{
Old_Route=BestRoute(Msg.prefix);
Insert_in_RIB(Msg);
Run_Decision_Process(RIB);
if (BestRoute(Msg.prefix)<>Old_Route)
{ /* best route changed */
B=build_BGP_Message(Msg.prefix);
S=apply_export_filter(RemoteAS,B);
if (S<>NULL) /* announce best route */
send_UPDATE(S,RemoteAS);
else if (Old_Route<>NULL)
send_WITHDRAW(Msg.prefix);
} ...
BGP : Processing of WITHDRAW
Recvd_Msg(Msg, RemoteAS)
...
if IsWITHDRAW(Msg)
{
Old_Route=BestRoute(Msg.prefix);
Remove_from_RIB(Msg);
Run_Decision_Process(RIB);
if (Best_Route(Msg.prefix)<>Old_Route)
{ /* best route changed */
B=build_BGP_Message(d);
S=apply_export_filter(RemoteAS,B);
if (S<>NULL) /* still one best route */
send_UPDATE(S,RemoteAS, RemoteIP);
else if(Old_Route<>NULL)/* no best route anymore */
send_WITHDRAW(Msg.prefix,RemoteAS,RemoteIP);
}
}
}
The BGP messages
Variable length messages
•
With fixed size header
32 bits
Marker ( 16 bytes ) : All 11...
Length : 16 bits
Type
Max length of BGP messages : 4096 bytes
OPEN
used to establish BGP session
UPDATE
used to send new routes and to remove
unusable routes
NOTIFICATION
used to inform the remote peer of
an error
BGP session is closed upon transmission
or reception of NOTIFICATION message
KEEPALIVE
one message must be sent at least every
30 seconds on each BGP session
ROUTE_REFRESH
used to support graceful restart
The OPEN message
Used to establish a BGP session between two
BGP peers
32 bits
Version
My AS Number
Hold Time
BGP Identifier
Opt. Len
Optional Parameters
Variable Length
Encoded in TLV Format
Currently version 4
AS # of the BGP peer sending the message
Hold Time: maximum delay between successive
KEEPALIVE, and/or UPDATE messages
BGP Id: Usually IP v4 loopback address
of BGP peer
Optional field:
Used notably for capabilities negotiation
Establishment of a BGP session
CONNECT.req
SYN(port=179)
CONNECT.ind
CONNECT.resp
CONNECT.conf
SYN+ACK(port=179)
TCP connection established
DATA.req(OPEN)
ACK(port=179)
TCP connection established
DATA(BGP OPEN)
ACK
DATA.req(OPEN)
BGP session established
DATA.req(OPEN)
DATA.req(OPEN)
DATA(BGP OPEN)
ACK
BGP session established
The UPDATE message
•
Single message type used to carry both IP v4
route announcements and route withdrawals
32 bits
# Withdrawn routes
Withdrawn routes
Variable Length
LEN
Prefix length in bits
Withdrawn prefix (1-4 octets)
Tot. Path Attr. Len
Path attributes
Variable Length
Network Layer
Reachability Information
Variable Length
LEN
Prefix length in bits
Advertised prefix (1-4 octets)
BGP Path Attributes
BGP update (announcement / withdraw) includes:
Path attribute values + Network layer reachability
information (NLRI)
•
•
Reachability information is encoded as one or more 2-tuples
of the form <length, prefix>
Path attributes
Origin
AS_Path
Next_Hop
MED
Local Pref.
….
Each path attribute is encoded in a TLV (type-length-value) format
(variable)
– New attributes can be added by simply appending a new attribute
– Attributes can be transitive / non-transitive, mandatory / optional
–
BGP Path Attributes (2)
ORIGIN: mandatory attribute that defines the origin of the path
information as generated by the BGP speaker that originates the associated
routing information. It is one of: IGP (from a network statement), EGP (from
an external peer), Unknown (from IGP redistribution)
AS_PATH: mandatory attribute identifying the AS sequence through
which routing information carried in the UPDATE message has passed.
Provides a mechanism for loop detection. Local AS added only when send
to external peer.
NEXT_HOP: mandatory attribute that identifies the (unicast) IP address
of the router that should be used as the next hop to the destination prefixes
listed in the NRLI field of the UPDATE message
MED (Multi-Exit Discriminator): optional non-transitive attribute
used on external (inter-AS) links to discriminate among multiple exit or entry
points to the same neighboring AS. Optionally used by a BGP speaker's
Decision Process to discriminate among multiple entry points to a
neighboring AS.
LOCAL_PREF: attribute included in all UPDATE messages that a given
BGP speaker sends to internal peers (iBGP). A BGP speaker uses it to
inform its other internal peers of the advertising speaker's degree of
preference for an advertised route => attribute that overrides AS_PATH,
and is transitive throughout the network. Never advertised to eBGP peer
The KEEPALIVE and NOTIFICATION
messages
The KEEPALIVE message
•
•
BGP Message containing only the default header
Every HoldTime/3 seconds, send a KEEPALIVE
message if no recent BGP message was sent
The NOTIFICATION message
•
indicates problem in processing of BGP message
–
BGP session is released upon transmission/reception of
NOTIFICATION
Err Code
SubCode
Additional data
(variable length)
Example errors :
2 : OPEN Message Error
Unsupported Version, Unsupported
Optional Parameter, ...
3 : UPDATE Message Error
Malformed Attribute List, ...
4 : Hold Timer Expired
5 : Finite State Machine Error
6 : Cease
BGP and IP
A first example
•
Initial updates
UPDATE
lprefix:194.100.0.0/24,
lNextHop:R2
lASPath: AS20:AS10
UPDATE
lprefix:194.100.0.0/24,
lNextHop:R1
lASPath: AS10
AS10
AS20
R1
BGP
194.100.0.0/24
R2
R3
194.100.1.0/24
BGP
BGP
AS30
UPDATE
lprefix:194.100.0.0/24,
lNextHop:R4
lASPath: AS40:AS10
UPDATE
lprefix:194.100.0.0/24,
lNextHop:R1
lASPath: AS10
R4
AS40
•
What happens if link AS10-AS20 goes down ?
BGP and IP
A second example
AS20
AS10
AS30
195.100.0.4/30
195.100.0.0/30
R1 195.100.0.1
195.100.0.2 R2 195.100.0.5
195.100.0.6
194.100.0.0/24
BGP
R3
194.100.1.0/24
194.100.2.0/23
UPDATE
lprefix:194.100.0.0/24,
lNextHop:195.100.0.1
lASPath: AS10
UPDATE
lprefix:194.100.2.0/23,
lNextHop:195.100.0.2
lASPath: AS20
•
Main Path attributes of UPDATE message
–
–
NextHop : IP address of router used to reach destination
ASPath : Path followed by the route advertisement
BGP and IP
A second example (2)
AS20
AS10
AS30
195.100.0.4/30
195.100.0.0/30
R1 195.100.0.1
195.100.0.2 R2 195.100.0.5
195.100.0.6
194.100.0.0/24
BGP
BGP
R3
194.100.1.0/24
194.100.2.0/23
UPDATE
lprefix:194.100.0.0/24
lNextHop:195.100.0.5
lASPath: AS20:AS10
UPDATE
lprefix:194.100.2.0/23
lNextHop:195.100.0.5
lASPath: AS20
UPDATE
lprefix:194.100.1.0/24,
lNextHop:195.100.0.2
lASPath: AS20;AS30
UPDATE
lprefix:194.100.1.0/24,
lNextHop:195.100.0.6
lASPath: AS30
BGP and IP
A second example (3)
AS20
AS10
AS30
195.100.0.4/30
195.100.0.0/30
R1 195.100.0.1
195.100.0.2 R2 195.100.0.5
195.100.0.6
194.100.0.0/24
BGP
194.100.1.0/24
194.100.2.0/23
WITHDRAW
lprefix:194.100.1.0/24
R3
Outline
Internet Routing
BGP basics
•
•
•
Routing policies
The Border Gateway Protocol
How to prefer some routes over others
BGP in large networks
How to prefer some routes over others ?
RA
RB
AS2
Backup: 2Mbps
Primary: 34Mbps
R1
AS1
–
How to ensure that packets will flow on primary link ?
RA
AS2
RB
R3
Expensive
AS1 R1
–
AS3
R5
Cheap
R2
AS4
How to prefer cheap link over expensive link ?
AS5
How to prefer some routes over others (2) ?
Peer[N]
BGP Msgs
from Peer[N]
BGP Msgs
from Peer[1]
Peer[1]
Import filter
Attribute
manipulation
BGP RIB
All
acceptable
routes
BGP Decision
Process
One best
route to each
destination
Import filter
Selection of acceptable routes
Addition of local-pref attribute
inside received BGP Msg
Normal quality route : local-pref=100
Better than normal route :local-pref=200
Worse than normal route :local-pref=50
Peer[N]
Peer[1]
BGP Msgs
to Peer[N]
Export filter
Attribute
manipulation
BGP Msgs
to Peer[1]
Simplified BGP Decision Process
Select routes with highest
local-pref
If there are several routes,
choose routes with the
shortest AS Path
lIf there are still several routes
tie-breaking rule
How to prefer some routes over others (3) ?
RA
AS2
Backup: 2Mbps
RB
Primary: 34Mbps
R1
AS1
RPSL-like policy for AS1
aut-num: AS1
import: from AS2 RA at R1 set localpref=100;
from AS2 RB at R1 set localpref=200;
accept ANY
export: to AS2 RA at R1 announce AS1
to AS2 RB at R1 announce AS1
RPSL-like policy for AS2
aut-num: AS2
import: from AS1 R1 at RA set localpref=100;
from AS1 R1 at RB set localpref=200;
accept AS1
export: to AS1 R1 at RA announce ANY
to AS2 R1 at RB announce ANY
How to prefer some routes over others (4) ?
RA
AS2
RB
R3
AS3
Expensive
R5
AS1 R1
Cheap
R2
AS5
AS4
RPSL policy for AS1
aut-num: AS1
import: from AS2 RA at R1 set localpref=100;
from AS4 R2 at R1 set localpref=200;
accept ANY
export: to AS2 RA at R1 announce AS1
to AS4 R2 at R1 announce AS1
–
–
AS1 will prefer to send packets over the cheap link
But the flow of the packets destined to AS1 will depend
on the routing policy of the other domains
Limitations of local-pref
In theory
•
Each domain is free to define its order of
preference for the routes learned from external
peers
1.0.0.0/8
AS1
Preferred paths for AS3
1. AS4:AS1
2. AS1
AS3
–
Preferred paths for AS4
1. AS3:AS1
2. AS1
AS4
How to reach 1.0.0.0/8 from AS3 and AS4 ?
Limitations of local-pref (2)
AS1 sends its UPDATE messages ...
1.0.0.0/8
UPDATE
lPrefix:1.0.0.0/8
lASPath: AS1
AS3
Preferred paths for AS3
1. AS4:AS1
2. AS1
Routing table for AS3
1.0.0.0/8 ASPath: AS1 (best)
AS1
UPDATE
lPrefix:1.0.0.0/8
lASPath: AS1
AS4
Preferred paths for AS4
1. AS3:AS1
2. AS1
Routing table for AS4
1.0.0.0/8 ASPath: AS1 (best)
Limitations of local-pref (3)
First possibility
•
AS3 sends its UPDATE first...
1.0.0.0/8
AS1
Preferred paths for AS3
1. AS4:AS1
2. AS1
AS3
Routing table for AS3
1.0.0.0/8 ASPath: AS1 (best)
–
Preferred paths for AS4
1. AS3:AS1
2. AS1
AS4
UPDATE
lPrefix:1.0.0.0/8
lASPath: AS3:AS1
Stable route assignment
Routing table for AS4
1.0.0.0/8 ASPath: AS1
1.0.0.0/8 ASPath:AS3:AS1 (best)
Limitations of local-pref (4)
Second possibility
•
AS4 sends its UPDATE first...
1.0.0.0/8
AS1
Preferred paths for AS3
1. AS4:AS1
2. AS1
AS3
Routing table for AS3
1.0.0.0/8 ASPath: AS1
1.0.0.0/8 ASPath: AS4:AS1 (best)
–
Preferred paths for AS4
1. AS3:AS1
2. AS1
AS4
UPDATE
lPrefix:1.0.0.0/8
lASPath: AS4:AS1
Routing table for AS4
1.0.0.0/8 ASPath: AS1 (best)
Another (but different) stable route assignment
Limitations of local-pref (5)
Third possibility
•
AS3 and AS4 send their UPDATE together...
1.0.0.0/8
AS1
Preferred paths for AS3
1. AS4:AS1
2. AS1
Preferred paths for AS4
1. AS3:AS1
2. AS1
AS3
UPDATE
lPrefix:1.0.0.0/8
lASPath: AS3:AS1
–
–
AS4
UPDATE
lPrefix:1.0.0.0/8
lASPath: AS4:AS1
AS3 prefers the indirect path and will thus send withdraw
since the chosen best path is via AS4
AS4 prefers the indirect path and will thus send withdraw
since the chosen best path is via AS3
Limitations of local-pref (6)
Third possibility (cont.)
•
AS3 and AS4 send their UPDATE together...
1.0.0.0/8
Preferred paths for AS3
1. AS4:AS1
2. AS1
Preferred paths for AS4
1. AS3:AS1
2. AS1
AS1
AS3
WITHDRAW
lPrefix:1.0.0.0/8
–
WITHDRAW
lPrefix:1.0.0.0/8
AS3 learns that the indirect route is not available anymore
–
–
AS4
AS3 will reannounce its direct route...
AS4 learns that the indirect route is not available anymore
–
AS4 will reannounce its direct route...
More limitations of local-pref
Unfortunately, interdomain routing may not
converge at all in some cases...
AS1
AS0
Preferred paths for AS3
1. AS4:AS0
2. AS0
AS3
–
Preferred paths for AS1
1. AS3:AS0
2. AS0
Preferred paths for AS4
1. AS1:AS0
2. AS0
AS4
How to reach a destination inside AS0 in this case ?
local-pref and economical
relationships
In practice, local-pref is often used to
enforce economical relationships
Prov1
Prov2
$
$
Peer1
Peer3
AS1
Peer4
Peer2
$
Cust1
Local-pref values used by AS1
> 1000 for the routes received from a Customer
500 – 999 for the routes learned from a Peer
< 500 for the routes learned from a Provider
$
Cust2
Shared-cost
$ Customer-provider
Consequence of this utilization of
local-pref
Which route will be used by AS1 to reach AS5 ?
AS2
$
AS3
$
$
AS4
AS8
$
$
AS6
AS7
$
•
AS5
$
Shared-cost
$ Customer-provider
$
AS1
and how will AS5 reach AS1 ?
Internet paths are often asymmetrical
Guidelines for
a safe utilisation of local-pref
The directed graph composed of the
customer->provider links is loop-free
•
An AS cannot be a customer of a provider of its
providers
$
AS1
•
$
AS2
AS3
$
An AS always prefer a route via a customer over a
route via a provider or a peer
–
With some restrictions on the graph composed of peerto-peer relationships, it is also possible to allow an AS
to give the same preference to a route via a customer
or via a peer
Summary
Routing policies
•
Two main routing policies
–
–
Customer-Provider relationship
Peer-to-Peer relationship
The Border Gateway Protocol
•
•
•
Path vector protocol with incremental updates
Import and export filters to implement routing
policies
Utilisation of local-pref
–
–
–
Influence BGP decision process
Prefer some routes over others
Be careful with possible oscillations due to bad setting
Outline
Internet Routing Protocols
BGP basics
BGP in large networks
•
•
•
•
The needs for iBGP
Confederations and Route Reflectors
The BGP decision process
Scalable routing policies
BGP and IP
Second example
194.100.2.0/23
AS10
195.100.0.2
195.100.0.0/30
R1 195.100.0.1
194.100.0.0/23
AS30
R2
BGP
AS20
195.100.0.8/30
195.100.0.9
194.100.4.0/23
195.100.0.6
195.100.0.10
R3
BGP
195.100.0.4/30
R4 195.100.0.5
Problem
•
How can R2 (resp. R4) advertise to R4 (resp. R2)
the routes learned from AS10 (resp. AS30) ?
BGP and IP
Second example (2)
194.100.2.0/23
AS10
195.100.0.2
195.100.0.0/30
R1 195.100.0.1
194.100.0.0/23
AS30
R2
BGP
IGP
AS20
194.100.4.0/23
BGP
195.100.0.4/30
R4 195.100.0.5
First solution
•
R3
195.100.0.8/30
195.100.0.9
195.100.0.6
195.100.0.10
Use IGP (OSPF/ISIS,RIP) to carry BGP routes
Drawbacks
•
•
IGP may not be able to support so many routes
IGP does not carry BGP attributes like ASPath !
The AS7007 incident
The AS7007 incident
AS7007
AS Y
AS x
RX
R1
4.0.0.0/8 : AS x:AS3:AS6
•
R2
RY
4.0.0.0/8 : AS7007 !!!!!!
A single configuration error in two routers
–
–
All routes learned from ASX on R1 were redistributed to
R2 via IGP and R2 announced them to ASY
Consequence
–
–
–
AS7007 advertised routes that almost all IP addresses were
belonging to AS7007
These routes were shorter than the real routes ...
Two hours of disruption for large parts of the Internet !
iBGP and eBGP
194.100.2.0/23
AS10
195.100.0.2
195.100.0.0/30
R1 195.100.0.1
eBGP
194.100.0.0/23
AS20
R2
195.100.0.6
195.100.0.10
R3
195.100.0.8/30
iBGP
195.100.0.9
194.100.4.0/23
AS30
eBGP
195.100.0.4/30
R4 195.100.0.5
Solution
•
Use BGP to carry routes between all routers of domain
–
–
–
Two different types of BGP sessions
eBGP between routers belonging to different ASes
iBGP between each pair of routers belonging to the same AS
–
–
Each BGP router inside ASx maintains an iBGP session with all other
BGP routers of ASx (full iBGP mesh)
Note that the iBGP sessions do not necessarily follow physical
topology
iBGP versus eBGP
Differences between iBGP and eBGP
•
•
local-pref attribute is only carried inside
messages sent over iBGP session
Over an eBGP session, a router only advertises
its best route towards each destination
–
•
Usually, import and export filters are defined for each
eBGP session
Over an iBGP session, a router advertises only
its best routes learned over eBGP sessions
–
–
A route learned over an iBGP session is never advertised
over another iBGP session
Usually, no filter is applied on iBGP sessions
iBGP and eBGP : Example
UPDATE (via eBGP)
lPrefix:194.100.0.0/23,
lNextHop:195.100.0.1
lASPath: AS10
AS10
194.100.2.0/23
195.100.0.2
195.100.0.0/30
R1 195.100.0.1
eBGP
194.100.0.0/23
–
R2
195.100.0.6
195.100.0.10
R3
195.100.0.8/30
AS20
UPDATE (via iBGP)
lPrefix:194.100.0.0/23,
lNextHop:195.100.0.1
lASPath: AS10
lLocal-pref:1000
AS30
iBGP
195.100.0.9
eBGP
195.100.0.4/30
R4 195.100.0.5
194.100.4.0/23
UPDATE (via eBGP)
lPrefix:194.100.0.0/23,
lNextHop:195.100.0.5
lASPath: AS20:AS10
Note that the next-hop and the AS-Path of BGP update
messages are only updated when sent over an eBGP session
iBGP and eBGP
Packet Forwarding
194.100.2.0/23
AS10
195.100.0.2
195.100.0.0/30
R1 195.100.0.1
eBGP
194.100.0.0/23
AS20
IGP routing table of R2
195.100.0.0/30 West
195.100.0.4/30 via 195.100.0.9
195.100.0.8/30 South
194.100.0.4/23 via 195.100.0.9
194.100.2.0/23 North
194.100.4.0/23
R2
195.100.0.6
195.100.0.10
195.100.0.8/30
iBGP
195.100.0.9
BGP routing table of R2
194.100.0.0/23 via 195.100.0.1
AS30
eBGP
195.100.0.4/30
R4 195.100.0.5
BGP routing table of R4
194.100.0.0/23 via 195.100.0.1
IGP routing table of R4
195.100.0.0/30 via 195.100.0.10
195.100.0.4/30 East
195.100.0.8/30 North
194.100.2.0/23 via 195.100.0.10
194.100.0.4/23 West
R3
iBGP and eBGP
Packet Forwarding (2)
194.100.2.0/23
AS10
195.100.0.2
195.100.0.0/30
R1 195.100.0.1
eBGP
194.100.0.0/23
AS30
R2
195.100.0.6
195.100.0.10
195.100.0.8/30
AS20
iBGP
195.100.0.9
194.100.4.0/23
eBGP
195.100.0.4/30
R4 195.100.0.5
BGP routing table of R4
194.100.0.0/23 via 195.100.0.1
IGP routing table of R4
195.100.0.0/30 via 195.100.0.10
195.100.0.4/30 East
195.100.0.8/30 North
194.100.2.0/23 via 195.100.0.10
194.100.4.0/23 West
Forwarding of R4
194.100.0.0/23 via 195.100.0.10
195.100.0.0/30 via 195.100.0.10
195.100.0.4/30 East
195.100.0.8/30 North
194.100.2.0/23 via 195.100.0.10
194.100.4.0/23 West
R3
Using non-BGP routers
194.100.2.0/23
AS10
195.100.0.2
195.100.0.0/30
R1 195.100.0.1
eBGP
194.100.0.0/23
AS20
AS30
R2
195.100.0.6
R5
iBGP
eBGP
R3
12.0.0.0/8
195.100.0.4/30
194.100.4.0/23
R4 195.100.0.5
Problem
•
What happens when there are internal backbone
routers between BGP routers inside an AS ?
–
–
iBGP session between BGP routers is easily established
when IGP is running since iBGP runs over TCP connection
How to populate the routing table of the backbone routers
to ensure that they will be able to route any IP packet ?
Using non-BGP routers (2)
194.100.2.0/23
AS10
195.100.0.2
195.100.0.0/30
R1 195.100.0.1
eBGP
194.100.0.0/23
AS20
AS30
R2
195.100.0.6
R5
iBGP
R3
eBGP
195.100.0.4/30
194.100.4.0/23
R4 195.100.0.5
First solution
•
Use tunnels between BGP routers to encapsulate
interdomain packets
–
GRE tunnel
–
–
Needs static configuration and be careful with MTU issues
MPLS tunnel
–
Can be dynamically established in MPLS enabled backbone
MPLS in large ISP networks
Only one BGP table lookup inside the AS
•
Use a hierarchy of labels
top label is used to reach egress router
second label is used to reach eBGP peer
–
–
RG
RH
RA
B4
R1
B3
RB
R2
RC
RD
R5
B6
AS1
Egress Border router
u
packets are
label switched
R7
RE
Ingress Border router
u
Maintains full BGP routing table
u
Attach two labels based on routing table
RF
Using non-BGP routers (3)
194.100.2.0/23
AS10
195.100.0.2
195.100.0.0/30
R1 195.100.0.1
eBGP
194.100.0.0/23
AS20
AS30
R2
195.100.0.6
R5
iBGP
eBGP
R3
12.0.0.0/8
195.100.0.4/30
194.100.4.0/23
R4 195.100.0.5
Second solution
•
•
Use IGP (OSPF/IS-IS - RIP) to redistribute
interdomain routes to internal backbone routers
Drawbacks
–
–
Size of BGP tables may completely overload the IGP
Make sure that BGP routes learned by R2 and injected
inside IGP will not be re-injected inside BGP by R4 !
Using non-BGP routers (4)
194.100.2.0/23
AS10
195.100.0.2
195.100.0.0/30
R1 195.100.0.1
eBGP
194.100.0.0/23
AS20
AS30
R2
iBGP
R5
iBGP
195.100.0.6
eBGP
iBGP
R3
12.0.0.0/8
195.100.0.4/30
194.100.4.0/23
R4 195.100.0.5
Classical solution
•
•
Run BGP on internal backbone routers
Internal backbone routers need to participate in
iBGP full mesh
–
Internal backbone routers receive BGP routes via iBGP
but never advertise any routes
–
Remember : a route learned over an iBGP session is never
advertised over another iBGP session
The roles of IGP and BGP
194.100.2.0/23
AS10
195.100.0.2
195.100.0.0/30
R1 195.100.0.1
eBGP
194.100.0.0/23
AS20
R2
iBGP
R5
iBGP
iBGP
AS30
195.100.0.4/30
194.100.4.0/23
195.100.0.6
R4
eBGP
•
Role of the IGP inside AS20
–
•
R3
12.0.0.0/8
Distribute internal topology and internal addresses
R2-R4-R5)
Role of BGP inside AS20
–
–
Distribute the routes towards external destinations
IGP must run to allow BGP routers to establish iBGP sessions
The iBGP full mesh
Drawback
•
N*(N-1)/2 iBGP sessions for N routers
R
R
R
R
R
R
R
R
iBGP session
R
Outline
Internet routing
BGP basics
BGP in large networks
•
•
•
•
The needs for iBGP
Confederations and Route Reflectors
The BGP decision process
Scalable routing policies
How to scale iBGP in large domains ?
Confederations
•
Divide the large domain in smaller sub-domains
–
–
Use iBGP full mesh inside each sub-domain
Use eBGP between sub-domains
Confederation : AS20
R
R
Member-AS
AS65001
R
R
R
Member-AS
AS65002
R
R
R
iBGP session
eBGP session
•
Each router is configured with two AS numbers
–
–
•
Its confederation AS number
Its Member-AS AS number
Usually, a single IGP covers the whole domain
Confederations : example
UPDATE (via eBGP)
lPrefix:1.0.0.0/8,
lASPath: AS10
AS20
eBGP
RX
R2
AS10
iBGP eBGP
iBGP
R1
R6
AS65021
iBGP
iBGP
R3
eBGP
AS65020
R5
RY
AS30
–
–
–
On the eBGP session between R2 and RX, R2 belongs to
AS20
On the eBGP session between R5 and RY, R5 belongs to
AS20
On the eBGP session between R1 and R6, R1 belongs to
AS65020 and R6 belongs to AS65021
Confederations : example (2)
UPDATE (via eBGP)
lPrefix:1.0.0.0/8,
lASPath: [AS65020]:AS10
UPDATE (via iBGP)
lPrefix:1.0.0.0/8,
lASPath: AS10
AS20
eBGP
RX
R2
AS10
iBGP eBGP
iBGP
R1
R6
AS65021
iBGP
iBGP
R3
eBGP
AS65020
R5
RY
AS30
–
When propagating an UPDATE via eBGP to another router of
the same confederation, R1 inserts its Member-AS number in
the AS_PATH
Confederations : example (3)
UPDATE (via iBGP)
lPrefix:1.0.0.0/8,
lASPath: [AS65020]:AS10
AS20
eBGP
RX
R2
AS10
iBGP eBGP
iBGP
R1
R6
AS65021
iBGP
iBGP
R3
UPDATE (via eBGP)
lPrefix:1.0.0.0/8,
lASPath: AS20:AS10
eBGP
AS65020
R5
RY
AS30
–
When propagating an UPDATE via eBGP to a router
outside its confederation, R5 removes the internal path
from the AS_Path and inserts its Confederation AS
number in the AS_PATH
Route reflectors
An alternative to confederations
Route reflectors
•
A route reflector is a special router that is allowed
to propagate the routes learned over iBGP
sessions on other iBGP sessions
Normal iBGP full mesh
eBGP
R2
iBGP with one route reflector
eBGP
iBGP
iBGP
R2
iBGP
R1
iBGP
iBGP
eBGP
R3
RR
iBGP
eBGP
R3
Route
Reflector
Behavior of a Route Reflector
Two types of iBGP peers of a route reflector
R1
R2
iBGP
iBGP
....
RN
iBGP
RR clients peers
( do not participate in
iBGP full mesh)
RR
iBGP
iBGP
RX
iBGP
iBGP
RZ
iBGP
RY
iBGP
Non-clients peers
(participate in iBGP full mesh)
Behavior of a Route Reflector
Route received from an eBGP session or a
client peer
•
•
Select best path
Advertise to
–
–
RR clients peers
....
R2
R1
All client peers
All non-client peers
iBGP
iBGP
RN
iBGP
RR
Route received from
non-client peer
•
•
Select best path
Advertise to :
–
All client peers
iBGP
iBGP
RX
iBGP
iBGP
RZ
iBGP
RY
iBGP
Non-clients peers
Fault tolerance of route reflectors
How to avoid having the RR as a single point
of failure ?
•
Solution
–
Allow each client peer to be connected at 2 RRs
R1
RR clients peers
....
R2
iBGP iBGP
•
Issue
–
RR1
iBGP
iBGP
RN
iBGP
RR2
Configuration errors may cause redistribution loops
–
–
ORIGINATOR_ID used to carry router ID of originator of route
CLUSTER_LIST contains the list of RR that sent the UPDATE
message inside the current AS
Route reflectors : an example
UPDATE (via eBGP)
lPrefix:1.0.0.0/8,
lASPath: AS10
AS20
eBGP
RX
R2
AS10
iBGP iBGP
RR1
UPDATE (via eBGP)
lPrefix:1.0.0.0/8,
lASPath: AS10
RZ
eBGP
iBGP
iBGP
R3
–
–
–
RR6
eBGP
R5
R2 and R3 are clients of Route Reflector RR1
RR1 and RR6 are in iBGP full mesh
R5 is client of Route Reflector RR6
RY
AS30
Route reflectors : an example (2)
UPDATE (via iBGP)
lPrefix:1.0.0.0/8,
lASPath: AS10
AS20
lNexthop:RX
eBGP
RX
R2
AS10
RR6
iBGP iBGP
RR1
iBGP
iBGP
R3
RZ
–
eBGP
eBGP
R5
RY
UPDATE (via iBGP)
lPrefix:1.0.0.0/8,
lASPath: AS10
lNexthop:RZ
RR1 will select its best path towards 1.0.0.0/8 and will
re-advertise it by adding the ORIGINATOR_ID and the
CLUSTERID
AS30
Route reflectors : an example (3)
UPDATE (via iBGP)
lPrefix:1.0.0.0/8,
lASPath: AS10
lNexthop:RX
lORIGINATOR_ID:R2
AS20
lCLUSTER_ID:RR1
eBGP
RX
R2
AS10
iBGP iBGP
UPDATE (via iBGP)
lPrefix:1.0.0.0/8,
lASPath: AS10
lNexthop:RX
lORIGINATOR_ID:R2
lCLUSTER_ID:RR1
RR1
eBGP
–
iBGP
iBGP
R3
RZ
RR6
eBGP
R5
RY
AS30
RR1 prefers the path to 1.0.0.0/8 via RX-R2
–
–
RR1 advertises this path to its client peer (R3)
– the path is not advertised to R2 since R2 already received it
RR1 advertises this path to its non-client peer (RR6)
Route reflectors : an example (4)
UPDATE (via iBGP)
lPrefix:1.0.0.0/8,
lASPath: AS10
lNexthop:RX
lORIGINATOR_ID:R2
lCLUSTER_ID:RR1:RR6
AS20
eBGP
RX
R2
AS10
RR6
iBGP iBGP
RR1
iBGP
iBGP
R3
eBGP
RZ
–
R5
RY
AS30
RR6 advertises the path to 1.0.0.0/8 via RX-R2
–
–
eBGP
to its client peer R5
R5 will remove ORIGINATOR_ID and CLUSTER_ID before
advertising the path to RY via eBGP
Hierarchy of route reflectors
In large domains, a hierarchy of route
reflectors can be built
R5
R4
R1
R2
R3
RR4
RR1
R6
RRA
RR5
RR2
RRC
RRB
iBGP session
Confederations versus Route reflectors
Confederations
•
•
•
•
•
•
Solves iBGP scaling
Redundancy with
iBGP full-mesh inside
each MemberAS
Possible to run one
IGP per Member AS
Requires manual
router configuration
Can be used when
merging domains
Can lead to some
routing oscillations
Route reflectors
•
•
•
•
•
Solves iBGP scaling
Redundancy by using
Redundant RRs
Usually a single IGP
for the whole AS
Requires manual
router configuration
Can lead to some
routing oscillations
Outline
Internet Routing
BGP basics
BGP in large networks
•
•
•
•
The needs for iBGP
Confederations and Route Reflectors
The BGP decision process
Scalable routing policies
The BGP decision process
Peer[N]
BGP Msgs
from Peer[N]
BGP Msgs
from Peer[1]
Peer[1]
Import filter
Attribute
manipulation
BGP RIB
All
acceptable
routes
BGP Decision
Process
One best
route to each
destination
Peer[N]
Peer[1]
BGP Msgs
to Peer[N]
Export filter
Attribute
manipulation
BGP Msgs
to Peer[1]
BGP Decision Process
– Ignore routes with unreachable nexthop
– Prefer routes with highest local-pref
– Prefer routes with shortest ASPath
– Prefer routes with smallest MED
– Prefer routes learned via eBGP over routes learned via iBGP
– Prefer routes with closest next-hop
– Tie breaking rules
– Prefer Routes learned from router with lowest router id
The shortest AS-Path step in
the BGP decision process
Motivation
•
•
BGP does not contain a real “ metric”
Use length of AS-Path as an indication of the
quality of routes
–
Not always a good indicator
R0
R1
RA
R2
Consequence
•
•
RB
R3
RC
R4
Internet paths tend to be short, 3-5 AS hops
Many paths converge at Tier-1 ISPs and those
ISPs carry lots of traffic
The prefer eBGP over iBGP step in
the BGP decision process
Motivation: hot potato routing
•
A router should try to get rid of packets sent to
external domains as soon as possible
AS1
R6's routing table
l1/8:AS2 via R2 (eBGP,best)
l1/8:AS2 via R3 (iBGP)
R8
C=50
C=1
R7
R6
UPDATE
l Prefix:1.0.0.0/8
l ASPath: AS2
l NextHop: R2
UPDATE
l Prefix:1.0.0.0/8
l ASPath: AS2
l NextHop: R3
C=98
C=1
R2
R7's routing table
l1/8:AS2 via R2 (iBGP)
l1/8:AS2 via R3 (eBGP, best)
R0
1.0.0.0/8
R3
AS2
Flow of IP packets
towards 1.0.0.0/8
The closest nexthop step in
the BGP decision process
Motivation : hot potato routing
•
A router should try to get rid of packets sent to
external domains as soon as possible
R8's routing table
l1/8:AS2 via R2 (NH=R7,best)
l1/8:AS2 via R3 (NH=R6)
Content provider
sending to 1.0.0.0/8
R9
Flow of IP packets
R8
AS1
C=1
C=50
R7
R6
UPDATE
l Prefix:1.0.0.0/8
l ASPath: AS2
l NextHop: R2
C=98
C=1
R2
UPDATE
l Prefix:1.0.0.0/8
l ASPath: AS2
l NextHop: R3
R0
1.0.0.0/8
R3
AS2
The lowest MED step in
the BGP decision process
Motivation : cold potato routing
•
In a multi-connected AS, indicate which entry
border router is closest to the advertised prefix
–
Usually MED= IGP cost
Content provider
sending to 1.0.0.0/8
R9
R8's routing table
l1/8:AS2 via R2 (MED=1,best)
l1/8:AS2 via R3 (MED=98)
AS1
Flow of IP packets
R8
C=1
C=50
R7
R6
UPDATE
l Prefix:1.0.0.0/8
l ASPath: AS2
l NextHop: R2
l MED : 1
C=98
C=1
R2
UPDATE
l Prefix:1.0.0.0/8
l ASPath: AS2
l NextHop: R3
l MED: 98
R0
1.0.0.0/8
R3
AS2
The lowest router id step in
the BGP decision process
Motivation
•
A router must be able to determine one best route
towards each destination prefix
–
A router may receive several routes with comparable
attributes towards one destination
R0
AS1
1.0.0.0/8
AS2
UPDATE
l Prefix:1.0.0.0/8
l ASPath: AS2:AS1
R3
R2
R1
AS3
UPDATE
l Prefix:1.0.0.0/8
l ASPath: AS3:AS1
Consequence
•
A router with a low IP address will be preferred
More on the MED step in the BGP
decision process
Unfortunately, the processing of the MED is more
complex than described earlier
Correct processing of the MED
•
•
MED values can only be compared between routes receiving
from the SAME neighboring AS
– Routes which do not have the MED attribute are considered
to have the lowest possible MED value.
Selection of the routes containing MED values
for m = all routes still under consideration
for n = all routes still under consideration
if (neighborAS(m) == neighborAS(n)) and
(MED(n) < MED(m))
{
remove route m from consideration
}
Why such a complex MED step ?
Content
provider
R9
Flow of IP packets
R8
AS1
C=50
C=50
R6
C=1
C=1
R7
R7b
R6b
R0:AS3:AS0, MED=21
R0:AS3:AS0, MED=20
C=1
R0:AS2:AS0, MED=0
R0:AS2:AS0, MED=9
R3
R4
AS3
C=9
R2
R5
AS2
R0
AS0
Route oscillations with MED
eBGP session
iBGP session
C=1
C=1
C=2
Physical link
R1
R0:ASX:AS0, MED=0
C=4
R2
R3
R0:ASZ:AS0, MED=1
RX
•
RR3
RR1
R0
R0:ASZ:AS0, MED=0
RZ
Consider a single prefix advertised by R0 in AS0
–
–
R1, R2 and R3 always prefer their direct eBGP path
Due to the utilization of route reflectors, RR1 and RR3
only know a subset of the three possible paths
–
This limited knowledge is the cause of the oscillations
Route oscillations with MED (2)
•
RR3's best path selection
–
–
–
If RR3 only knows the R3-RZ path, this path is preferred
and advertised to RR1
RR3 knows the R1-RX and R3-RZ paths, R1-RX is best
(IGP cost) and RR3 doesn't advertise a path to RR1
If RR3 knows the R2-RZ and R3-RZ paths, RR3 prefers
the R3-RZ path (MED) and R3-RZ is advertised to RR1
eBGP session
iBGP session
C=1
RR3
RR1
C=1
C=2
Physical link
R1
C=4
R2
R0:ASX:AS0, MED=0
R3
R0:ASZ:AS0, MED=1
RX
R0
R0:ASZ:AS0, MED=0
RZ
Route oscillations with MED (3)
•
RR1's best path selection
–
If RR1 knows the R1-RX, R2-RZ and R3-RZ paths, R1RX is preferred and RR1 advertises this path to RR3
–
–
But if RR1 advertises R1-RX, RR3 does not advertise any path !
If RR1 knows the R1-RX and R2-RZ paths, RR1 prefers
the R2-RZ path and advertises this path to RR3
–
But if RR1 advertises R2-RZ, RR3 prefers and advertises R3RZ !
eBGP session
iBGP session
C=1
RR3
RR1
C=1
C=2
Physical link
R1
C=4
R2
R0:ASX:AS0, MED=0
R3
R0:ASZ:AS0, MED=1
RX
R0
R0:ASZ:AS0, MED=0
RZ
Other problems with Route Reflectors
RR2
eBGP session
iBGP session
RR3
RR1
C=5
C=1
C=1
C=5
Physical link
C=5
C=1
Ra
RX
•
Rb
RY
Rc
RZ
Consider one prefix advertised by RX,RY,RZ
–
–
Ra, Rb, and Rc will all prefer their direct eBGP path
RR1, RR2 and RR3 will never reach an agreement
Forwarding problems with Route Reflectors
eBGP session
iBGP session
C=1
R1
R2
C=1
C=1
Physical link
RR1
C=5
RX
•
RY
Consider a prefix advertised by RX and RY
–
BGP routing will converge
–
–
BGP
RR2
RR1 (and R1) prefer path via RX, RR2 (and R2) prefer path via RY
But forwarding of IP packets will cause loop !
–
–
R1 sends packets towards prefix via R2 (to reach RX, its best path)
R2 sends packets towards prefix via R1 (to reach© RY,
its best2008
path)
O. Bonaventure,
Outline
Internet Routing
BGP basics
BGP in large networks
•
•
•
•
The needs for iBGP
AS Confederations and Route Reflectors
The BGP decision process
Scalable routing policies
The Community attribute
Principle
•
•
Optional transitive attribute containing a set of
communities
each community acts as a marker
–
–
•
Standardized communities
–
–
•
one community is represented as a 32 bits value
usually routes with same marker are treated same
manner
NO_EXPORT (0xFFFFFF01)
NO_ADVERTISE (0xFFFFFF02)
Delegated communities
–
65536 communities have been delegated to each AS
–
ASX65536 ASX:0 through ASX:65535
Scalable routing policies
with communities
Principle
•
attach same community value to all routes that
need to receive the same treatment
Prov1
Prov2
$
$
Peer1
R
R
Peer2
Route learned from
Peer
Provider
Customer
$
Cust1
R
Peer3
R
R
R
Peer4
$
Cust2
Shared-cost
$ Customer-provider
More complex routing policies
with communities
Other utilizations of communities
•
Research ISP providing two types of services
–
–
–
Access to research networks for universities
Access to the commercial Internet for universities and
government institutions
Solution
–
–
–
•
Tag routes learned from research network and commercial Internet
Only announce the universities to research network
Only advertise research network to universities
Commercial ISP providing several transit services
–
Full transit service
–
–
–
Announce all known routes to all customers
Advertise customer routes to all peers, customers, providers
Client routes only
–
–
Only advertise to those customers the routes learned from
customers, but not the routes learned from peers
Advertise the routes learned from those customers only to
customers
Other utilizations of communities
Communities used for tagging
•
Community attached by router that receives route to
indicate country where route was received
–
Example (Eunet, AS286)
–
–
–
–
Another example (C&W, AS3561)
–
•
286:1000 + countrycode for Public peer routes
286:2000 + countrycode for Private peer routes
286:3000 + countrycode for customer routes
3561:SRCC
– S : Peer or Customer
– R : Regional Code
– CC : ISO3166 country code
Community to indicate IX where route was learned
–
Example : AS12369 (Global Access Telecommunications)
–
–
–
13129:2110 : route leared at DE-CIX
13129:2120 : route learned at INXS
13129:2130 : route learned at SFINX