Transcript slides8
The Basics of BGP (Border
Gateway Protocol) Routing
and its Performance in
Today’s Internet
Presenter: Sophia Poku
Slides taken from presentation by Nina Taft
Outline
1. Highlights
2. Addressing and CIDR
3. BGP Messages and Prefix Attributes
4. BGP Decision and Filtering Processes
5. I-BGP
6. Route Reflectors
7. Multihoming
8. Aggregation
9. Routing Instability
10. BGP Table Growth
Routing Protocols
R
border router
IGP: Interior Gateway Protocol.
Examples: IS-IS, OSPF
I-BGP
internal router
R3
R2
A
AS1
E-BGP
announce B
AS2
R1
AS3
AS (Autonomous System) - a collection of
routers under the same technical and
administrative domain.
EGP (External Gateway Protocol) - used
between two AS’s to allow them to exchange
routing information so that traffic can be
forwarded across AS borders. Example: BGP
R
5
R
4
B
Routers used
Internal Router: directly connects networks
belonging to the same area
It runs a single copy of the basic routing
protocol
Border/Boundary Router: exchanges routing
information with routers belonging to other AS
Purpose: to share connectivity
information
you can reach
net A via me
AS2
BGP
AS1
R3
R2
traffic to A
R1
table at R1:
dest next hop
A
R2
A
R
border router
internal router
BGP Sessions
Primary function is to exchange network-reachability
information (includes AS #s)
Uses TCP to establish connection
Initially … node advertises ALL routes it wants
neighbor to know (could be >50K routes)
Ongoing … only
inform neighbor of changes
One router can participate in many BGP sessions.
AS1
AS2
AS3
Configuration and Policy
A BGP node has a notion of which routes to
share with its neighbor. It may only advertise
a portion of its routing table to a neighbor.
A BGP node does not have to accept every
route that it learns from its neighbor. It can
selectively accept and reject messages.
What to share with neighbors and what to
accept from neighbors is determined by the
routing policy, that is specified in a router’s
configuration file.
Addressing Schemes
Original addressing schemes (class-based):
32 bits divided into 2 parts:
0
8
0 network
Class A
0xxx or 1-126 in decimal; subnet mask:255.0.0.0
0
0 network
host
16
Class B
10xx or 128-192 in decimal
Subnet mask:255.255.0.0
0
0 network
host
24
~2 million nets
256 hosts
Class C
110x or 192-223 in decimal, Subnet Mask:255.255.255.0
host
CIDR (Classless Inter-Domain
Routing)
CIDR introduced to solve 2 problems:
exhaustion of IP address space
size and growth rate of routing table
Problem #1: Lifetime of Address
Space
Example: an organization needs 500 addresses.
A single class C address not enough (256 hosts).
Instead a class B address is allocated. (~64K hosts) That’s
overkill -a huge waste.
CIDR allows networks to be assigned on arbitrary bit
boundaries.
permits arbitrary sized masks: 178.24.14.0/23 is valid
requires explicit masks to be passed in routing protocols
CIDR solution for example above: organization is allocated
a single /23 address (equivalent of 2 class C’s).
Problem #2: Routing Table Size
Without CIDR:
232.71.0.0
232.71.1.0
232.71.2.0
…..
232.71.255.0
service
provider
232.71.0.0
232.71.1.0
232.71.2.0
…..
232.71.255.0
Global
internet
With CIDR:
232.71.0.0
232.71.1.0
232.71.2.0
…..
232.71.255.0
service
provider
232.71.0.0/16
Global
internet
CIDR: Classless Inter-Domain
Routing
Address format <IP address/prefix P>.
The prefix denotes the upper P bits of the IP address.
E.g. in CIDR address 206.13.01.48/25, the “/25”
indicates the first 25 bits are used to identify a
unique network, the remaining bits are host’s
Idea - use aggregation - provide routing for a large
number of customers by advertising one common
prefix.
This is possible because nature of addressing is
hierarchical
Summarizing routing information reduces the size of
routing tables, but allows to maintain connectivity.
Aggregation is critical to the scalability and
survivability of the Internet
Address Arithmetic: Address Blocks
The <address/prefix> pair defines an address block:
Examples:
128.15.0.0/16 => [ 128.15.0.0 - 128.15.255.255 ]
188.24.0.0/13 => [ 188.24.0.0 - 188.31.255.255 ]
consider 2nd octet in binary:
188.00011000.0.0
13th bit
settable
Address block sizes
a /13 address block has 232-13 addresses(=524288)
(/16 has 232-16 =65536)
a /13 address block is 8 times as big as a /16
address block
because 232-13 = 232-16 * 23
CIDR: longest prefix match
Because prefixes of arbitrary length allowed,
overlapping prefixes can exist.
Example:
router hears 124.39.0.0/16 from one neighbor
and 124.39.11.0/24 from another neighbor
Router forwards packet according to most
specific forwarding information, called longest
prefix match
Packet with destination 124.39.11.32 will be
forwarded using /24 entry.
Packet w/destination 124.39.22.45 will be
forwarded using /16 entry
Will CIDR work ?
For CIDR to be successful need:
address registries must assign addresses
using CIDR strategy
providers and subscribers should configure
their networks, and allocate addresses to
allow for a maximum amount of aggregation
BGP must be configured to do aggregation as
much as possible
Factors that complicate achieving
aggregation
multihoming, proxy aggregation, changing
providers
Four Basic Messages
Open:
Establishes BGP session (uses TCP port #179)
Notification:
Report unusual conditions
Update:
Inform neighbor of new routes that become active
Inform neighbor of old routes that become inactive
Keepalive:
Inform neighbor that connection is still viable
BGP Database
1.Neighbor table
List of BGP neighbors
2. BGP forwarding table
List of all networks learned from each
neighbor
3. IP routing table
List of best path to destination networks
OPEN Message
During session establishment, two BGP speakers exchange their
AS numbers
BGP identifiers (usually one of the router’s IP addresses)
Router ID
Holdtime
Open messages are confirmed using a keep-alive message sent
by a peer and must be confirmed before updates
A BGP speaker has option to refuse a session
Select the value of the hold timer:
maximum time to wait to hear something from other end
before assuming session is down.
authentication information (optional)
NOTIFICATION and
KEEPALIVE Messages
NOTIFICATION
Indicates an error
terminates the TCP session
gives receiver an indication of why BGP session
terminated
Examples: header errors, hold timer expiry, bad peer AS,
bad BGP identifier, malformed attribute list,
missing required attribute, AS routing loop, etc.
KEEPALIVE
protocol requires some data to be sent periodically.
If no UPDATE to send within the specified time period,
then send KEEPALIVE message
UPDATE Message
Updates are sent using TCP to ensure
delivery
used to either advertise and/or withdraw
unfeasible prefixes from routing table
path attributes: list of attributes that pertain to
ALL the prefixes in the Reachability Info field
FORMAT:
Withdrawn routes length (2 octets)
Withdrawn routes
(variable length)
Total path attributes length (2 octets)
Path Attributes
(variable length)
Reachability Information (variable length)
Advertising a prefix
When a router advertises a prefix to one of its
BGP neighbors:
information is valid until first router explicitly
advertises that the information is no longer
valid
BGP does not require routing information to be
refreshed
if node A advertises a path for a prefix to node
B,
then node B can be sure node A is using that
path
itself to reach the destination.
BGP Attributes
Attributes: routes learned via BGP have
•
•
•
•
•
associated properties that are used to
determine the best route to a destination
when multiple paths exist to a particular
destination
Local Preference
Multi-Exit Discriminator (MED)
Origin
AS-path
Next-hop
Attribute: ORIGIN
ORIGIN:
Who originated the announcement? Where was a
prefix injected into BGP? – indicates how BGP learned
about a particular route
IGP: route is interior to the originating AS. This value is
Value set using network router configuration command
to inject router into BGP
EGP: route learned via the External Gateway Protocol
Incomplete (often used for static routes): origin of
routes unknown or learned in some other way
Attributes: AS_PATH
a list of AS’s through
which the
announcement for a
prefix has passed
each AS prepends its
AS # to the AS-PATH
attribute when
forwarding an
announcement
useful to detect and
prevent loops
Attribute: NEXT HOP
IP address used to reach the advertising
router
For EBGP session, NEXT HOP = IP
address of neighbor that announced the
route.
For IBGP sessions, if route originated
inside AS, NEXT HOP = IP address of
neighbor that announced the route
For routes originated outside AS, NEXT
HOP of EBGP node that learned of route,
is carried unaltered into IBGP.209.15.1.0/
24
1.1.1.1 A
3.3.3.3D
C
2.2.2.2 B
IBGP
140.20.1.0/24
BGP Table at Router C:
Destination
Next Hop
140.20.1.0/24
2.2.2.2
209.15.1.0/24
1.1.1.1
IP Routing Table at Router C:
Destination
Next Hop
140.20.1.0/24
2.2.2.2
2.2.2.0/24
3.3.3.3
3.3.3.0/24
Connected
209.15.1.0/24
1.1.1.1
1.1.1.0/24
3.3.3.3
Next-Hop Cont’d
Router C advertises
172.16.1.0 with next
hop 10.1.1.1
A propagates it within
its AS
Attribute: Multi-Exit Discriminator
(MED)
when AS’s
interconnected via 2 or
more links
AS announcing prefix
sets MED
enables AS2 to indicate
its preference
AS receiving prefix uses
MED to select link
a way to specify how
close a prefix is to the
link it is announced on
Attribute: Local Preference
Used to prefer an exit point
from the local AS
Used to indicate preference
among multiple paths for the
same prefix anywhere in the
internet.
The higher the value the more
preferred
Exchanged between IBGP
peers only. Local to the AS.
Often used to select a specific
exit point for a particular
destination
Routing Process Overview
Routes
received
from
peers
accept,
deny, set
preferences
Input
policy
engine
Choose
best route
Decision
process
BGP table
forward,
not forward
set MEDs
Routes
used by
router
IP routing
table
Output
policy
engine
Routes
sent to
peers
Input Policy Engine
Inbound filtering controls outbound traffic
filters route updates received from other peers
filtering based on IP prefixes, AS_PATH, community
denying a prefix means BGP does not want to reach that
prefix via the peer that sent the announcement
accepting a prefix means traffic towards that prefix may be
forwarded to the peer that sent the announcement
Attribute Manipulation
sets attributes on accepted routes
example: specify LOCAL_PREF to set priorities among
multiple peers that can reach a given destination
BGP Decision Process
1. Choose route with highest LOCAL-PREF
2. If have more than 1 route, select route with shortest
AS-PATH
3. If have more than 1 route, select according to lowest
ORIGIN type where IGP < EGP < INCOMPLETE
4. If have more than 1 route, select route with lowest
MED value
5. Select min cost path to NEXT HOP using IGP
metrics
6. If have multiple internal paths, use BGP Router ID to
break tie.
Output Policy Engine
Outbound Filtering controls inbound traffic
forwarding a route means others may choose
to reach the prefix through you
not forwarding a route means others must use
another router to reach the prefix
may depend upon whether E-BGP or I-BGP
peer
example: if ORIGIN=EGP and you are a nontransit AS and BGP peer is non-customer,
then don’t forward
Attribute Manipulation
Transit vs. Nontransit AS
Transit traffic = traffic whose source and destination are outside the AS
Nontransit AS: does not carry transit traffic
• Advertise own routes only
• Do not propagate routes learned from other AS’s
• case 1:
r1
Transit AS: does carry transit traffic
• Advertises its own routes PLUS routes
learned from other AS’s
ISP1
ISP1
r3 ISP2
r1
r3
r2
r2
r2
r1
r2,r1
AS1
r2
r3
AS2
r1
AS1
r2
r3
r2
ISP2
r3
r2,r3
r2
AS1
r1
• case 2:
r3
r1
Internal BGP (I-BGP)
Used to distribute routes learned via EBGP to all
the routers within an AS
I-BGP and E-BGP are same protocol in that
same message types used
same attributes used
same state machine
BUT use different rules for readvertising prefixes
Rule #1: prefixes learned from an E-BGP neighbor
can be readvertised to an I-BGP neighbor, and vice
versa
Rule #2: prefixes learned from an I-BGP neighbor
cannot be readvertised to another I-BGP neighbor
I-BGP: Preventing Loops and
Setting Attributes
Why rule #2? To prevent announcements from
looping.
In E-BGP can detect via AS-PATH.
AS-PATH not changed in I-BGP
Implication of rule: a full mesh of I-BGP sessions
between each pair of routers in an AS is
required
Setting Attributes: The router that injects the
route into the I-BGP mesh is responsible for
setting the LOCAL-PREF attribute
prepending AS # to AS-PATH
Route Reflectors
Problem: requiring a full mesh of
I-BGP sessions between all pairs
of routers is hard to manage for
large AS’s.
Solution:
group routers into clusters.
Assign a leader to each
cluster, called a route
reflector (RR).
Members of a cluster are
called clients of the RR
I-BGP Peering
clients peer only with their
RR
RR’s must be fully meshed
RR
RR
RR
A
B
C
clients
clusters
Route Reflectors: Rule on
Announcements
Provides mechanisms for minimizing the # of updates
messages transmitted within an AS and reducing the
amount of data propagated in each message.
If received from RR, reflect to clients
If received from a client, reflect to RRs and clients
If received from E-BGP, reflect to all - RRs and clients
RR’s reflect only the best route to a given prefix, not
all announcements they receive.
helps size of routing table
sometimes clients don’t need to carry full table
Avoiding Loops with Route
Reflectors
Loops cannot be detected by traditional approach using
AS-PATH because AS-PATH not modified within an AS.
Announcements could leave a cluster and re-enter it.
Two new attributes introduced:
ORIGINATOR_ID: router id of route’s originator in AS
rule: announcement discarded if returns to originator
CLUSTER_LIST: a sequence of cluster id’s. set by
RRs.
rule: if an RR receives an update and the cluster list
contains its cluster id, then update is discarded.
Multihoming
Single-homed vs. Multi-homed
subscribers
A single-homed network has one connection
to the Internet (i.e., to networks outside its
domain)
A multi-homed network has multiple
connections to the Internet. Two scenarios:
can be multi-homed to a single provider
can be multi-homed to multiple providers
Why multi-home?
Reliability
Performance
Single-homed AS
Subscriber called a “stub AS”
Provider-Subscriber communication
for route advertisement:
static configuration. most common.
Provider’s router is configured with
subscriber’s prefix.
good if customer’s routes can be
represented by small set of
aggregate routes
bad if customer has many
noncontiguous subnets
can use BGP between provider and
stub AS
Provider
R1
R2
Subscriber
Multihoming to Multiple Providers
ISP 3
ISP 2
ISP 1
Customer
Multihoming Issues
Load sharing
how distribute the traffic over the multiple
links?
Reliability
if load sharing leads to preferencing certain
links for certain subnets, is reliability reduced?
Address/Aggregation
which subnet addresses should the
multihomed customer use?
how will this affect its provider’s ability to
aggregate routes?
Load sharing from ISP to Customer
using attributes
ISP
Goal: provider splits
traffic across 2 links
according to prefix
Implement this strategy
using attributes
customer sets MEDs
provider sets
LOCAL_PREF
R1
140.35/16
208.22/16
R2
R3
Customer
208.22/16
140.35/16
Load sharing from Customer to ISP
using policy
blue: announcements
red: traffic
Goal: send traffic to
ISP’s customers on one
link; send traffic to the
rest of the Internet on
2nd link
Implement using policy
to control
announcements
ISP
R1
R2
advertise
customer
routes only
advertise
default
route 0/0
R3
traffic
Customer
Address/Aggregation Issue
Where should the
customer get its
address block from?
1. From ISP1
2. From ISP2
3. From both ISP1 and
ISP2
4. Independently from
address registry
ISP 3
ISP 1
ISP 2
customer
(cases 1 and 2 are equivalent)
Case 1 & 2: Get address block
from one ISP
example: customer gets
address from ISP 1
ISP 1’s aggregation is
not broken
customer’s prefix not
aggregatable at ISP 2
longer prefix becomes a
traffic magnet
How good is load
sharing?
If all ISP’s generate same
amount of traffic for customer,
then ISP2-customer link twice
as loaded as ISP1-customer link
ISP 3
140.20/16
200.50/16
140.20.6/24
ISP 1
ISP 2
140.20/16
200.50/16
140.20.6/24
140.20.6/24
customer
140.20.6/24
Case 3: Get address block from
both ISPs
announcement policy:
announce prefix only to
its “parent”
advantage: both ISP’s
can aggregate the
prefix they receive
disadvantage: lose
reliability
load balancing good?
depends upon how
much traffic sent to
each prefix
ISP 3
ISP 1
ISP 2
140.20/16
200.50/16
140.20.1/24
200.50.1/24
customer
140.20.1/24
200.50.1/24
Case 4: obtain address block from
registry
no aggregation
possible
no traffic magnets
created
good reliability
how to achieve load
sharing?
customer breaks
address block into 2
/25 blocks, and
announce one per link
(but may lose reliability)
OR use method of
AS-PATH manipulation
ISP 3
100.20/16
200.50/16
150.55.10/24
150.55.10/24
ISP 1
ISP 2
100.20/16
200.50/16
150.55.10/24
150.55.10/24
customer
150.55.10/24
AS-PATH manipulation
Idea: prepend your AS
ISP 3
number in AS-PATH
multiple times to
discourage use of a link
makes AS-PATH seem 150.55.10/24 - 1 33 33
longer than it is
recall BGP decision
AS 1
process uses shortest ASPATH length as a criteria
for selecting best path 150.55.10/24 - 33 33
Example: ISP 3 will
customer
choose path through AS 2
150.55.10/24
AS 33
because its AS-PATH
appears shorter
150.55.10/24 - 2 33
AS 2
150.55.10/24 - 33
Aggregation
Address Arithmetic:
When is aggregation possible? Case 1
Process of combining characteristics of several different routes in
such a way that a single route is advertised.
Possible when one prefix contained in another.
Example: Two AS’s having customer-provider relationship.
Provider does the aggregation.
Provider has address block 18.0.0.0/8
Its customers have address blocks 18.6.0.0/15 and 18.9.0.0/15
Provider announces its address block only
Rule: Prefix p1 contains prefix p2 iff
length(p2) > length(p1) AND
address(p2) / 232-length(p1) = address(p1) / 232-length(p1)
Address Arithmetic:
When is aggregation possible? Case 2
Some pairs of consecutive prefixes
Example: routes within the same AS:
AS has 2 address blocks:
1.2.2.0/24 =
0000001.00000010.00000010.00000000/24
1.2.3.0/24 =
0000001.00000010.00000011.00000000/24
can announce instead 1.2.2.0/23
Rule: consecutive prefixes p1 and p2 are aggregatable
iff
length(p1)=length(p2) AND
address(p1) / 232-length(p1) +1 = address(p2) / 232-length(p2)
AND address(p1) % 233-length(p1) = 0
Black holes and cardinal sins
NAP
“The cardinal sin of BGP
routing is advertising routes
wrong !!
100.24.0.0/18
that you don't know how to
100.24/16
get to; called "black-holing"
100.24.56.0/21
someone”
[100.24.56.0-100.24.63.255]
222.2/16
If you announce part of IP
Net C
space owned by someone
else, using a more-specific
ISP 1
prefix, all their traffic flows to
ISP 2
100.24/16
222.2/16
you. Makes that address
block disconnected from the
Internet.
100.24.0.0/20
Example: black holes can
100.24.16.0/21
[100.24.0.0-100.24.15.255]
happen inadvertently by
[100.24.16.0-100.24.23.255]
Net B
non-careful aggregation
Net A
Limitations to Aggregation
Lack of hierarchical allocation of address
space prior to CIDR (before 1995)
A single AS can have noncontiguous address
blocks
Customer AS and provider AS can have noncontiguous address blocks
Reluctance of customers to renumber their
address space when they switch providers
Multi-homing
multi-homed prefixes require global visibility
may choose not to: load sharing
Rules of Thumb for Aggregation
To avoid black holes: an ISP is not allowed to
aggregate routes unless it is a supernet of the
address block it wants to aggregate
In other words, an ISP has to specifically
announce each IP address range of its
downstream customers that are not part of its
own address space.
Routing Instability
Route Stability
Routing instability: rapid fluctuation of network
reachability information
route flapping: when a route is withdrawn and
re-announced repeatedly in a short period of time
happens via UPDATE messages
because messages propagate to global Internet,
route flapping behavior can cascade and
deteriorate routing performance in many places
Effects: increased packet loss, increased network
latency, CPU overhead, loss of connectivity
Route Stability Studies by C. Labovitz, R. Malan & F.
Jahanian
Types of Routing Updates
Forwarding instability
reflects legitimate topology changes
e.g., changes in Prefix, NEXT_HOP and/or
ASPATH
affects forwarding paths used
Policy fluctuation
reflects changes in policy
e.g., changes in MED, LOCAL_PREF, etc.
may not necessarily affect forwarding paths used
Pathological
redundant messages
reflect neither topology nor policy changes
Who’s Responsible?
AS’s
No single AS dominates instability statistics
No correlation between the size of an AS and
its share of updates generated.
Prefixes
Instability is evenly distributed across routes.
Example of measurements:
75% of AADiff events come from prefixes change
less than 10 times a day.
80-90% of instability comes from prefixes that are
announced less than 50 times/day.
Sources of Instabilities in General
router configuration errors
transient physical and data link problems
software bugs
problems with leased lines (electrical timing
issues that cause false alarms of disconnect)
router failures
network upgrades and maintenance
Instability Problem and Cause.
Example 1.
Problem: 3-5 million duplicate withdrawals
Cause: stateless BGP implementation
time-space tradeoff: no state maintained on what
advertised to peers
when receive any change, transmit withdrawal to
all peers regardless of whether previously notified
or not
sent updates for both explicit and implicit
withdrawals
By 1998, most vendors had BGP implementations
with partial state.
Result: number of WWDups reduced by an order of
magnitude
Instability Problem and Cause.
Example 2
Problem: duplicate announcements
Cause: min-advertisement timer & stateless
BGP
min-adv timer: wait 30 seconds. Combine all
received updates in last 30 seconds into single
outbound update message (if possible).
within 30 seconds route can be withdrawn and
re-announced so that there is no net change
to original announcement
Solution: Have BGP keep some state about
recently sent messages to peers. Avoid
sending duplicate messages
Instability Problem and Cause.
Example 3
AS 1
Net X
R2
R8
R1
R3
4
R4
AS3
R6
3
R7
6
5
10
2
R5
AS2
R
Example: interaction IGP/BGP
policy: set MED using IGP metrics,
such as shortest path
border router
internal router
E-BGP
IGP
Controlling route instability: Route
Dampening
track number of times a route has flapped
over a period of time
routes that exhibit a high level of instability in
a short period of time should be suppressed
(not advertised)
penalize ill behaved routes proportionally to
their expected future stability
if a suppressed route stops flapping for a long
enough period of time, unsuppress it
(readvertise)
Route Dampening
penalty
suppress limit
reuse limit
time
Route Dampening Algorithm
Each time a route flaps, increase the penalty
If the route has not flapped in the last ‘half-
life’ time units, then cut penalty in half.
If the penalty > suppress limit, then suppress
the route
If the penalty < reuse limit, then free a
suppressed route
Side Effect of Route Dampening
A legitimate update may arrive and it will be
ignored because that route has been
suppressed and not yet released.
The modification needed for the legitimate
announcement is delayed
Aggregation can help route
flapping
If a more-specific route
is flapping, but provider
only announces
aggregated prefix, then not flapping
other networks don’t see
route flap.
Hence aggregation can
flapping
mask route flapping.
Aggregation helps
combat instability
because it reduces the
number of networks
visible in the core
Internet.
Tier 1 provider
140.40/16
Tier 2 provider
140.40.4/24
customer