Transcript AS 1

Slides Selected from
SIGCOMM 2001 BGP Tutorial
by Tim Griffin
Best Effort Connectivity
IP traffic
135.207.49.8
192.0.2.153
This is the fundamental service provided
by Internet Service Providers (ISPs)
All other IP services depend on connectivity:
DNS, email, VPNs, Web Hosting, …
Routing vs. Forwarding
Net
Default to
upstream
router
A
B
R
R2
R
D
R3 R
R1
R4
A
B
C
D
E
default
Nxt Hop
R1
Direct
R3
R1
R3
R1
C
R5
Net
E
Forwarding: determine next hop
Routing: establish end-to-end paths
Forwarding always works
A
B
C
D
E
default
Routing can be badly broken
Net
A
B
C
D
E
default
Nxt Hop
R2
R2
Direct
R5
R5
R2
Nxt Hop
R4
R3
R3
R4
Direct
R4
3
How Are Forwarding Tables
Populated to implement Routing?
Statically
Administrator
manually configures
forwarding table entries
+ More control
+ Not restricted to
destination-based
forwarding
- Doesn’t scale
- Slow to adapt to
network failures
Dynamically
Routers exchange network reachability
information using ROUTING PROTOCOLS.
Routers use this to compute best routes
+ Can rapidly adapt to changes
in network topology
+ Can be made to scale well
- Complex distributed algorithms
- Consume CPU, Bandwidth, Memory
- Debugging can be difficult
- Current protocols are destination-based
In practice : a mix of these.
Static routing mostly at the “edge”
4
Routers Talking to Routers
Routing info
Routing info
• Routing computation is distributed among routers within a
routing domain
• Computation of best next hop based on routing information
is the most CPU/memory intensive task on a router
• Routing messages are usually not routed, but exchanged
via layer 2 between physically adjacent routers (internal
BGP and multi-hop external BGP are exceptions)
Before We Go Any Further …
IP ROUTING PROTOCOLS DO NOT
DYNAMICALLY ROUTE AROUND
NETWORK CONGESTION
• IP traffic can be very bursty
• Dynamic adjustments in routing typically
operate more slowly than fluctuations in
traffic load
• Dynamically adapting routing to account
for traffic load can lead to wild, unstable
oscillations of routing system
Autonomous Routing Domains
A collection of physical networks glued together
using IP, that have a unified administrative
routing policy.
•
•
•
•
Campus networks
Corporate networks
ISP Internal networks
…
Autonomous Systems (ASes)
An autonomous system is an autonomous routing domain
that has been assigned an Autonomous System Number (ASN).
… the administration of an AS appears to other ASes to
have a single coherent interior routing plan and presents a
consistent picture of what networks are reachable through it.
RFC 1930: Guidelines for creation, selection,
and registration of an Autonomous System
AS Numbers (ASNs)
ASNs are 16 bit values.
64512 through 65535 are “private”
•
•
•
•
•
•
•
•
Currently over 11,000 in use.
Genuity: 1
MIT: 3
Harvard: 11
UC San Diego: 7377
AT&T: 7018, 6341, 5074, …
UUNET: 701, 702, 284, 12199, …
Sprint: 1239, 1240, 6211, 6242, …
…
ASNs represent units of routing policy
Architecture of Dynamic Routing
OSPF
BGP
AS 1
IGP = Interior Gateway Protocol
Metric based: OSPF, IS-IS, RIP,
EIGRP (cisco)
EGP = Exterior Gateway Protocol
EIGRP
AS 2
Policy based: BGP
The Routing Domain of BGP is the entire Internet
Technology of Distributed Routing
Link State
• Topology information is
flooded within the routing
domain
• Best end-to-end paths are
computed locally at each
router.
• Best end-to-end paths
determine next-hops.
• Based on minimizing some
notion of distance
• Works only if policy is shared
and uniform
• Examples: OSPF, IS-IS
Vectoring
• Each router knows little
about network topology
• Only best next-hops are
chosen by each router for
each destination network.
• Best end-to-end paths result
from composition of all nexthop choices
• Does not require any notion
of distance
• Does not require uniform
policies at all routers
• Examples: RIP, BGP
The Gang of Four
Link State
IGP
EGP
OSPF
IS-IS
Vectoring
RIP
BGP
Many Routing Processes Can Run on a Single
Router
BGP
RIP Process
BGP Process
RIP Routing tables
BGP Routing tables
OSPF Process
OSPF Routing tables
RIP
Domain
OS kernel
OSPF
Domain
Forwarding Table Manager
Forwarding Table
13
IPv4 Addresses are 32 Bit Values
11111111 00010001 10000111 00000000
255
17
134
0
255.17.134.0
Dotted quad notation
IPv6 addresses have 128 bits
14
Classful Addresses
Class A
0nnnnnnn hhhhhhhh hhhhhhhh hhhhhhhh
Class B
10nnnnnn nnnnnnnn hhhhhhhh hhhhhhhh
Class C
110nnnnn nnnnnnnn nnnnnnnn hhhhhhhh
n = network address bit
h = host identifier bit
Leads to a rigid, flat, inefficient use of address space …
15
RFC 1519: Classless Inter-Domain
Routing (CIDR)
Use two 32 bit numbers to represent a network.
Network number = IP address + Mask
IP Address : 12.4.0.0
Address
Mask
IP Mask: 255.254.0.0
00001100 00000100 00000000 00000000
11111111 11111110 00000000 00000000
Network Prefix
for hosts
Usually written as 12.4.0.0/15
16
Which IP Addresses are Covered by a Prefix?
12.5.9.16 is covered by prefix 12.4.0.0/15
12.5.9.16
00001100 00000101 00001001 00010000
00001100 00000100 00000000 00000000
12.4.0.0/15
11111111 11111110 00000000 00000000
12.7.9.16
00001100 00000111 00001001 00010000
12.7.9.16 is not covered by prefix 12.4.0.0/15
CIDR = Hierarchy in Addressing
12.0.0.0/16
12.1.0.0/16
12.2.0.0/16
12.3.0.0/16
12.0.0.0/8
:
:
:
12.253.0.0/16
12.254.0.0/16
12.3.0.0/24
12.3.1.0/24
:
:
:
:
:
12.3.254.0/24
12.253.0.0/19
12.253.32.0/19
12.253.64.0/19
12.253.96.0/19
12.253.128.0/19
12.253.160.0/19
12.253.192.0/19
18
Classless Forwarding
Destination =12.5.9.16
------------------------------payload
Prefix
OK
better
Next Hop
Interface
0.0.0.0/0
10.14.11.33
ATM 5/0/9
12.0.0.0/8
10.14.22.19
ATM 5/0/8
even better
12.4.0.0/15 10.1.3.77
Ethernet 0/1/3
best!
12.5.8.0/23 attached
Serial 1/0/7
IP Forwarding Table
IP Address Allocation and
Assignment: Internet Registries
IANA
www.iana.org
ARIN
www.arin.org
RIPE
www.ripe.org
APNIC
www.apnic.org
Allocate to National and local registries and ISPs
Addresses assigned to customers by ISPs
RFC 2050 - Internet Registry IP Allocation Guidelines
RFC 1918 - Address Allocation for Private Internets
RFC 1518 - An Architecture for IP Address Allocation with CIDR
Nontransit vs. Transit ASes
ISP 2
ISP 1
NET A
Traffic NEVER
flows from ISP 1
through NET A to ISP 2
(At least not intentionally!)
IP traffic
Internet Service
providers (often)
have transit
networks
Nontransit AS
might be a corporate
or campus network.
Could be a “content
provider”
21
Selective Transit
NET B
NET A DOES NOT
provide transit
Between NET D
and NET B
NET C
NET A
NET A provides transit
between NET B and NET C
and between NET D
and NET C
NET D
Most transit networks transit in a selective manner…
IP traffic
22
Customers and Providers
provider
provider
customer
IP traffic
customer
Customer pays provider for access to the Internet
Customer-Provider Hierarchy
provider
customer
IP traffic
The Peering Relationship
peer
provider
peer
customer
Peers provide transit between
their respective customers
Peers do not provide transit
between peers
traffic
allowed
traffic NOT
allowed
Peers (often) do not exchange $$$
Peering Provides Shortcuts
Peering also allows connectivity between
the customers of “Tier 1” providers.
peer
provider
peer
customer
Peering Wars
Peer
• Reduces upstream transit
costs
• Can increase end-to-end
performance
• May be the only way to
connect your customers to
some part of the Internet
(“Tier 1”)
Don’t Peer
• You would rather have
customers
• Peers are usually your
competition
• Peering relationships may
require periodic
renegotiation
Peering struggles are by far the most
contentious issues in the ISP world!
Peering agreements are often confidential.
BGP-4
• BGP = Border Gateway Protocol
• Is a Policy-Based routing protocol
• Is the de facto EGP of today’s global Internet
• Relatively simple protocol, but configuration is complex and the
entire world can see, and be impacted by, your mistakes.
•
1989 : BGP-1 [RFC 1105]
–
•
Replacement for EGP (1984, RFC 904)
1990 : BGP-2 [RFC 1163]
• 1991 : BGP-3 [RFC 1267]
•
1995 : BGP-4 [RFC 1771]
–
Support for Classless Interdomain Routing (CIDR)
29
BGP Operations (Simplified)
Establish session on
TCP port 179
AS1
BGP session
Exchange all
active routes
AS2
Exchange incremental
updates
While connection
is ALIVE exchange
route UPDATE messages
30
Four Types of BGP Messages
• Open : Establish a peering session.
• Keep Alive : Handshake at regular intervals.
• Notification : Shuts down a peering session.
• Update : Announcing new routes or withdrawing
previously announced routes.
announcement
=
prefix + attributes values
31
Attributes are Used to Select
Best Routes
192.0.2.0/24
pick me!
192.0.2.0/24
pick me!
192.0.2.0/24
pick me!
192.0.2.0/24
pick me!
Given multiple
routes to the same
prefix, a BGP speaker
must pick at most
one best route
(Note: it could reject
them all!)
Two Types of BGP Neighbor
Relationships
AS1
• External Neighbor (eBGP) in a
different Autonomous Systems
• Internal Neighbor (iBGP) in the
same Autonomous System
iBGP is routed (using IGP!)
eBGP
iBGP
AS2
34
iBGP Peers Must be Fully
Meshed
eBGP update
iBGP updates
iBGP neighbors do not announce
routes received via iBGP to other iBGP
neighbors.
• iBGP is needed to
avoid routing loops
within an AS
• Injecting external
routes into IGP
does not scale and
causes BGP policy
information to be
lost
• BGP does not
provide “shortest
path” routing
• Is iBGP an IGP?
NO!
35
BGP Next Hop Attribute
12.125.133.90
AS 7018
12.127.0.121
AT&T
AS 12654
AS 6431
RIPE NCC
RIS project
AT&T Research
135.207.0.0/16
Next Hop = 12.125.133.90
135.207.0.0/16
Next Hop = 12.127.0.121
Every time a route announcement crosses an AS
boundary, the Next Hop attribute is changed to the IP
address of the border router that announced the route.
36
Join EGP with IGP For Connectivity
135.207.0.0/16
Next Hop = 192.0.2.1
135.207.0.0/16
10.10.10.10
Forwarding Table
destination
next hop
192.0.2.0/30
AS 1
192.0.2.1
192.0.2.0/30
10.10.10.10
Forwarding Table
destination
next hop
+
EGP
destination
next hop
135.207.0.0/16
192.0.2.1
135.207.0.0/16
192.0.2.0/30
10.10.10.10
10.10.10.10
AS 2
Next Hop Often Rewritten to Loopback
135.207.0.0/16
Next Hop = 127.22.33.44
135.207.0.0/16
Next Hop = 192.0.2.1
135.207.0.0/16
10.10.10.10
Forwarding Table
destination
next hop
127.22.33.44
AS 1
192.0.2.1
127.22.33.44
10.10.10.10
Forwarding Table
destination
next hop
+
EGP
destination
next hop
135.207.0.0/16
127.22.33.44
135.207.0.0/16
127.22.33.44
10.10.10.10
10.10.10.10
AS 2
Implementing Customer/Provider
and Peer/Peer relationships
Two parts:
• Enforce transit relationships
– Outbound route filtering
• Enforce order of route preference
– provider < peer < customer
Import Routes
provider route
peer route
From
provider
customer route
From
provider
From
peer
From
peer
From
customer
From
customer
ISP route
Export Routes
provider route
peer route
To
provider
customer route
ISP route
From
provider
To
peer
To
peer
To
customer
To
customer
filters
block
How Can Routes be Colored?
BGP Communities!
A community value is 32 bits
Used for signally
within and between
ASes
By convention,
first 16 bits is
ASN indicating
who is giving it
an interpretation
community
number
Very powerful
BECAUSE it
has no (predefined)
meaning
Community Attribute = a list of community values.
(So one route can belong to multiple communities)
Two reserved communities
no_export = 0xFFFFFF01: don’t export out of AS
RFC 1997 (August 1996)
no_advertise 0xFFFFFF02: don’t pass to BGP neighbors
42
Communities Example
• 1:100
• To Customers
– Customer routes
• 1:200
– 1:100, 1:200, 1:300
• To Peers
– Peer routes
• 1:300
– 1:100
• To Providers
– Provider Routes
– 1:100
Import
Export
AS 1
Mars Attacks!
Martian list often includes
•
•
•
•
•
•
•
•
•
0.0.0.0/0: default
10.0.0.0/8: private
172.16.0.0/12: private
192.168.0.0/16: private
127.0.0.0/8: loopbacks
128.0.0.0/16: IANA reserved
192.0.2.0/24: test networks
224.0.0.0/3: classes D and E
…..
Import Routes (Revisited)
provider route
peer route
customer route
ISP route
potential
backhole
Martian
From
provider
xxxxxx
Customer
address
filters
xxxxxx
xxxxxx
cccccc
xxxxxx
From
peer
From
provider
xxxxxx
cccccc
From
customer
xxxxxx
cccccc
From
customer
From
peer
xxxxxx
filters
martians
So Many Choices
peer
provider
peer
customer
AS 4
Frank’s
Internet Barn
AS 3
AS 2
AS 1
Which route should
Frank pick to 13.13.0.0./16?
13.13.0.0/16
47
BGP Route Processing
Open ended programming.
Constrained only by vendor configuration language
Receive Apply Policy =
filter routes &
BGP
Updates tweak attributes
Apply Import
Policies
Based on
Attribute
Values
Best
Routes
Best Route
Selection
Best Route
Table
Apply Policy =
filter routes &
tweak attributes
Transmit
BGP
Updates
Apply Export
Policies
Install forwarding
Entries for best
Routes.
IP Forwarding Table
48
Tweak Tweak Tweak
• For inbound traffic
– Filter outbound routes
– Tweak attributes on
outbound routes in the
hope of influencing your
neighbor’s best route
selection
inbound
traffic
outbound
routes
• For outbound traffic
– Filter inbound routes
– Tweak attributes on
inbound routes to
influence best route
selection
In general, an AS has more
control over outbound traffic
outbound
traffic
inbound
routes
Route Selection Summary
Highest Local Preference
Enforce relationships
Shortest ASPATH
Lowest MED
i-BGP < e-BGP
traffic engineering
Lowest IGP cost
to BGP egress
Lowest router ID
Throw up hands and
break ties
Back to Frank …
peer
provider
peer
Local preference only used in iBGP
customer
AS 4
local pref = 80
local pref = 90
AS 3
local pref = 100
AS 2
Higher Local
preference values
are more preferred
AS 1
13.13.0.0/16
51
Implementing Backup Links with
Local Preference (Outbound Traffic)
AS 1
primary link
Set Local Pref = 100
for all routes from AS 1
backup link
AS 65000
Set Local Pref = 50
for all routes from AS 1
Forces outbound traffic to take primary link, unless link is down.
We’ll talk about inbound traffic soon …
52
Multihomed Backups
(Outbound Traffic)
AS 1
AS 3
provider
provider
primary link
backup link
Set Local Pref = 100
for all routes from AS 1
Set Local Pref = 50
for all routes from AS 3
AS 2
Forces outbound traffic to take primary link, unless link is down.
53
ASPATH Attribute
AS 1129
135.207.0.0/16
AS Path = 1755 1239 7018 6341
135.207.0.0/16
AS Path = 1239 7018 6341
AS 1239
Sprint
AS 1755
135.207.0.0/16
AS Path = 1129 1755 1239 7018 6341
Ebone
AS 12654
AS 6341
AT&T Research
RIPE NCC
RIS project
135.207.0.0/16
AS Path = 7018 6341
AS7018
135.207.0.0/16
AS Path = 6341
Global Access
135.207.0.0/16
AS Path = 3549 7018 6341
AT&T
135.207.0.0/16
AS Path = 7018 6341
AS 3549
Global Crossing
135.207.0.0/16
Prefix Originated
54
Interdomain Loop
Prevention
AS 7018
BGP at AS YYY will
never accept a
route with ASPATH
containing YYY.
Don’t Accept!
12.22.0.0/16
ASPATH = 1 333 7018 877
AS 1
55
Traffic Often Follows ASPATH
135.207.0.0/16
ASPATH = 3 2 1
AS 1
AS 2
AS 3
AS 4
135.207.0.0/16
IP Packet
Dest =
135.207.44.66
… But It Might Not
135.207.0.0/16
ASPATH = 1
AS 1
135.207.44.0/25
ASPATH = 5
AS 2
AS 2 filters all
subnets with masks
longer than /24
135.207.0.0/16
ASPATH = 3 2 1
AS 3
AS 4
135.207.0.0/16
IP Packet
Dest =
135.207.44.66
AS 5
135.207.44.0/25
From AS 4, it
may look like this
packet will take
path 3 2 1, but it
actually takes
path 3 2 5
Shorter Doesn’t Always Mean Shorter
In fairness:
could you do
this “right” and
still scale?
Mr. BGP says that
path 4 1 is better
than path 3 2 1
Duh!
AS 4
AS 3
Exporting internal
state would
dramatically
increase global
instability and
amount of routing
state
AS 2
AS 1
Shedding Inbound Traffic with
ASPATH Padding Hack
AS 1
provider
192.0.2.0/24
ASPATH = 2 2 2
192.0.2.0/24
ASPATH = 2
primary
backup
customer
AS 2
192.0.2.0/24
Padding will (usually)
force inbound
traffic from AS 1
to take primary link
59
Padding May Not Shut Off All Traffic
AS 1
AS 3
provider
provider
192.0.2.0/24
ASPATH = 2
192.0.2.0/24
ASPATH = 2 2 2 2 2 2 2 2 2 2 2 2 2 2
primary
backup
customer
AS 2
192.0.2.0/24
AS 3 will send
traffic on “backup”
link because it prefers
customer routes and local
preference is considered
before ASPATH length!
Padding in this way is often
used as a form of load
60
balancing
COMMUNITY Attribute to the Rescue!
AS 1
AS 3
provider
provider
AS 3: normal
customer local
pref is 100,
peer local pref is 90
192.0.2.0/24
ASPATH = 2
COMMUNITY = 3:70
192.0.2.0/24
ASPATH = 2
primary
backup
customer
AS 2
192.0.2.0/24
Customer import policy at AS 3:
If 3:90 in COMMUNITY then
set local preference to 90
If 3:80 in COMMUNITY then
set local preference to 80
If 3:70 in COMMUNITY then
set local preference to 70
61
Hot Potato Routing: Go for the
Closest Egress Point
192.44.78.0/24
egress 2
egress 1
15
56
IGP distances
This Router has two BGP routes to 192.44.78.0/24.
Hot potato: get traffic off of your network as
Soon as possible. Go for egress 1!
62
Getting Burned by the Hot Potato
2865
High bandwidth
Provider backbone
17
SFF
Low bandwidth
customer backbone
Heavy
Content
Web Farm
NYC
15
56
San Diego
Many customers want
their provider to
carry the bits!
tiny http request
huge http reply
63
Cold Potato Routing with MEDs
(Multi-Exit Discriminator Attribute)
Prefer lower
MED values
2865
17
Heavy
Content
Web Farm
192.44.78.0/24
MED = 56
192.44.78.0/24
MED = 15
15
56
192.44.78.0/24
This means that MEDs must be considered BEFORE
IGP distance!
Note1 : some providers will not listen to MEDs
Note2 : MEDs need not be tied to IGP distance
64
Route Selection Summary
Highest Local Preference
Enforce relationships
Shortest ASPATH
Lowest MED
i-BGP < e-BGP
traffic engineering
Lowest IGP cost
to BGP egress
Lowest router ID
Throw up hands and
break ties
This is somewhat simplified. Hey, what happened to ORIGIN??
Policies Can Interact Strangely
(“Route Pinning” Example)
backup
customer
1
3
2
Disaster strikes primary link
and the backup takes over
4
Install backup link using community
Primary link is restored but some
traffic remains pinned to backup
News At 11:00
• BGP is not guaranteed to converge on a
stable routing. Policy interactions could lead
to “livelock” protocol oscillations.
See “Persistent Route Oscillations in Inter-domain Routing” by K. Varadhan, R.
Govindan, and D. Estrin. ISI report, 1996
• Corollary: BGP is not guaranteed to recover
from network failures.
A Wee Bit of Theory
Underlying problem
Distributed means of
computing a solution.
Shortest Paths
RIP, OSPF, IS-IS
X?
BGP
What Problem is BGP solving?
X could
•
•
•
•
aid in the design of policy analysis algorithms and heuristics
aid in the analysis and design of BGP and extensions
help explain some BGP routing anomalies
provide a fun way of thinking about the protocol
Separate dynamic and static semantics
static
semantics
BGP Policies
dynamic
semantics
BGP
Stable Paths
SPVP
Problem (SPP)
See [Griffin, Shepherd, Wilfong]
Booo Hooo,
Many, many
complications...
SPVP = Simple Path
Vector Protocol = a
distributed
algorithm for
solving SPP
An instance of the Stable Paths Problem
(SPP)
• A graph of nodes and edges,
• Node 0, called the origin,
• For each non-zero node, a set
or permitted paths to the origin.
This set always contains the
“null path”.
• A ranking of permitted paths at
each node. Null path is always
least preferred. (Not shown in
diagram)
1
When modeling BGP : nodes represent
BGP speaking routers, and 0 represents
a node originating some address block
210
2
20
5
5210
2
4
420
430
3
30
0
1
130
10
most preferred
…
least preferred (not null)
Yes, the translation
gets messy!
A Solution to a Stable Paths Problem
2
210
20
A solution is an assignment of
permitted paths to each node
such that
• node u’s assigned path is either the
null path or is a path uwP, where wP
is assigned to node w and {u,w} is
an edge in the graph,
• each node is assigned the highest
1
ranked path among those consistent
with the paths assigned to its
neighbors.
5
5210
2
4
420
430
3
30
0
1
130
10
A Solution need not represent
a shortest path tree, or
a spanning tree.
An SPP may have multiple solutions
120
10
120
10
1
120
10
1
0
0
2
210
20
DISAGREE
1
2
210
20
First solution
0
2
210
20
Second solution
BAD GADGET : No Solution
2
210
20
4
0
130
10
1
3
3
320
30
SURPRISE : Beware of Backup
Policies
210
20
BGP is not robust :
it is not guaranteed
to recover from
network failures.
1
130
10
2
Becomes a BAD GADGET if link
(4, 0) goes down.
4
40
420
430
0
3
3420
30
PRECARIOUS
Has a solution, but can get “trapped”
4
310
3120
5
5310
563120
53120
4310
453120
43120
1
3
120
10
0
6
2
6310
643120
63120
This part has a solution only
when node 1 is assigned the
direct path (1 0).
210
20
As with DISAGREE, this part
has two distinct solutions