Efficient Algorithms for Large-Scale Topology Discovery
Internet Routing
Renata Teixeira
Laboratoire LIP6
CNRS and Université Pierre et Marie Curie
What is routing?
A famous
quotation from RFC 791
“A name indicates what we seek.
An address indicates where it is.
A route indicates how we get there.”
-- Jon Postel
1
Challenges for Internet routing
Scale: over 1 billion destinations
– Can’t store all destinations in routing tables!
– Routing table exchange would swamp links!
Administrative autonomy
– Internet = network of networks
– Each network admin may want to control routing in its own network
2
Scalability: Classless InterDomain Routing (CIDR)
IP prefix represents a network (group of destinations)
– Represented by two 32-bit numbers (address + mask)
Hierarchy in address allocation
– Address allocation by ARIN/RIPE/APNIC and by ISPs
– Today, routing tables contain ~150,000-200,000 prefixes
Example:
IP address: 12.4.0.0   00001100 00000100 00000000 00000000
IP mask: 255.254.0.0   11111111 11111110 00000000 00000000
The first 15 bits are the network prefix; the remaining bits identify hosts
Usually written as 12.4.0.0/15
3
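The prefix test is just a bitwise AND against the mask. Below is a minimal Python sketch (illustrative, not part of the slides; the function name and sample addresses are made up) of how an address is matched against 12.4.0.0/15:

import ipaddress

# Does an address fall inside a CIDR prefix? (address AND mask == network)
def matches(addr: str, prefix: str) -> bool:
    return ipaddress.ip_address(addr) in ipaddress.ip_network(prefix)

print(matches("12.5.200.7", "12.4.0.0/15"))   # True: the first 15 bits match
print(matches("12.6.0.1", "12.4.0.0/15"))     # False: 12.6.x.x is outside the /15

# The same test with explicit mask arithmetic, as in the 12.4.0.0 / 255.254.0.0 example
addr = int(ipaddress.ip_address("12.5.200.7"))
net  = int(ipaddress.ip_address("12.4.0.0"))
mask = int(ipaddress.ip_address("255.254.0.0"))
print((addr & mask) == net)                   # True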
Reduce routing table size
Without CIDR: the ISP announces each prefix (232.71.0.0, 232.71.1.0, 232.71.2.0, …, 232.71.255.0) to the Internet separately.
With CIDR: Big ISP aggregates them and announces a single prefix, 232.71.0.0/16.
4
Autonomy: network of networks
[Figure: interconnection of ASes, e.g., DT (AS 1), AS 2, AS 3, and the LIP6 network]
Internet = interconnection of Autonomous Systems (ASes)
– Distinct regions of administrative control
– Routers/links managed by a single “institution”
– Service provider, company, university, etc.
5
Hierarchical routing
Inter-AS routing (Border Gateway Protocol)
– Determines the AS path and the egress point
Intra-AS routing (Interior Gateway Protocol)
– Most common: OSPF, IS-IS
– Determines the path from ingress to egress
[Figure: AS 1 (LIP6 network), AS 2 (DT), AS 3]
6
Outline
Intra-domain routing
– Shortest-path routing
– Link state: OSPF/IS-IS
– Distance vector: RIP
Inter-domain routing
– Challenges and goals
– AS relationships
– Path vector: Border Gateway Protocol (BGP)
7
Intra-domain Routing
8
Shortest-path routing
Path-selection model
– Destination-based
– Load-insensitive (e.g., static link weights)
– Minimum hop count or sum of link weights
[Figure: example topology with link weights]
9
Two types of routing algorithms
Link-state algorithm
– Uses global information
– Based on Dijkstra’s algorithm
Distance-vector algorithm
– Information is distributed
– Based on Bellman-Ford
10
Link-state routing
Each router keeps track of its incident links
– Whether the link is up or down and the cost of the link
Each router broadcasts the link state
– To give every router a complete view of the graph
Each router runs Dijkstra’s algorithm
– To compute the shortest paths and construct the forwarding table
Example protocols
– Open Shortest Path First (OSPF)
– Intermediate System – Intermediate System (IS-IS)
11
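To make the “runs Dijkstra’s algorithm” step concrete, here is a minimal Python sketch (an assumed example, not a router implementation) that computes shortest-path distances and the first hop to install in the forwarding table:

import heapq

def dijkstra(graph, source):
    # graph: {node: {neighbor: link_cost}}; returns (distance, first_hop) per node
    dist = {source: 0}
    first_hop = {source: source}
    pq = [(0, source)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:
            continue                          # stale queue entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                # first hop toward v: v itself if u is the source, else inherited from u
                first_hop[v] = v if u == source else first_hop[u]
                heapq.heappush(pq, (nd, v))
    return dist, first_hop

# Example link-state database (weights chosen arbitrarily)
g = {"A": {"B": 2, "C": 5}, "B": {"A": 2, "C": 1}, "C": {"A": 5, "B": 1}}
print(dijkstra(g, "A"))   # ({'A': 0, 'B': 2, 'C': 3}, {'A': 'A', 'B': 'B', 'C': 'B'})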
Detecting topology changes
Beaconing
– Periodic “hello” messages in both directions
– Detect a failure after a few missed “hellos”
Performance trade-offs
– Detection speed
– Overhead on link bandwidth and CPU
– Likelihood of false detection
12
Broadcasting the link state
Flooding
– Node sends link-state information out its links
– And then the next node sends out all of its links
– … except the one where the information arrived
[Figure: panels (a)-(d) show X’s link state spreading through nodes A, B, C, D]
13
Broadcasting the link state
Reliable flooding
– Ensure all nodes receive the link-state information
– … and that they use the latest version
Challenges
– Packet loss
– Out-of-order arrival
Solutions
– Acknowledgments and retransmissions
– Sequence numbers
– Time-to-live for each packet
14
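As an illustration of how sequence numbers keep flooding reliable and loop-free, here is a minimal Python sketch (hypothetical structure, not a real OSPF implementation):

class Node:
    def __init__(self, name):
        self.name = name
        self.links = {}                       # {neighbor_name: Node}
        self.lsdb = {}                        # origin -> (seq, link_state)

    def receive_lsa(self, origin, seq, link_state, from_neighbor):
        stored_seq = self.lsdb.get(origin, (-1, None))[0]
        if seq <= stored_seq:
            return                            # old or duplicate LSA: drop it
        self.lsdb[origin] = (seq, link_state)
        for name, neighbor in self.links.items():
            if name != from_neighbor:         # don't send it back where it came from
                neighbor.receive_lsa(origin, seq, link_state, self.name)

a, b, c = Node("A"), Node("B"), Node("C")
a.links = {"B": b}; b.links = {"A": a, "C": c}; c.links = {"B": b}
a.receive_lsa("A", seq=1, link_state={"B": 2}, from_neighbor=None)
print(c.lsdb)   # {'A': (1, {'B': 2})}: A's link state reached C without looping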
When to initiate flooding
Topology change
– Link or node failure
– Link or node recovery
Configuration change
– Link cost change
Periodically
– Refresh the link-state information
– Typically (say) every 30 minutes
– Corrects for possible corruption of the data
15
Convergence
Getting consistent routing information to all nodes
– E.g., all nodes having the same link-state database
Consistent forwarding after convergence
– All nodes have the same link-state database
– All nodes forward packets on shortest paths
– The next router on the path forwards to the next hop
[Figure: example topology with link weights]
16
Transient disruptions
Detection delay
– A node does not detect a failed link immediately
– … and forwards data packets into a “blackhole”
– Depends on the timeout for detecting lost hellos
[Figure: example topology where traffic keeps flowing toward a failed link]
17
Transient disruptions
Inconsistent link-state database
– Some routers know about the failure before others
– The shortest paths are no longer consistent
– Can cause transient forwarding loops
[Figure: two routers with different views forward to each other, forming a loop]
18
Convergence delay
Sources of convergence delay
– Detection latency
– Flooding of link-state information
– Shortest-path computation
– Creating the forwarding table
Performance during the convergence period
– Lost packets due to blackholes and TTL expiry
– Looping packets consuming resources
– Out-of-order packets reaching the destination
Very bad for VoIP, online gaming, and video
19
Reducing convergence delay
Faster detection
– Smaller hello timers
– Link-layer technologies that can detect failures
Faster flooding
– Flooding immediately
– Sending link-state packets with high priority
Faster computation
– Faster processors on the routers
– Incremental Dijkstra algorithm
Faster forwarding-table update
– Data structures supporting incremental updates
20
Bellman-ford algorithm
Define distances at each node x
– dx(y) = cost of the least-cost path from x to y
Update distances based on neighbors
– dx(y) = min {c(x,v) + dv(y)} over all neighbors v
Example: du(z) = min{c(u,v) + dv(z), c(u,w) + dw(z)}
[Figure: example topology with nodes u, v, w, x, y, z, s, t and link weights]
21
Distance vector algorithm
c(x,v) = cost of the direct link from x to v
– Node x maintains the costs of its direct links c(x,v)
Dx(y) = estimate of the least cost from x to y
– Node x maintains its distance vector Dx = [Dx(y): y ∈ N]
Node x maintains its neighbors’ distance vectors
– For each neighbor v, x maintains Dv = [Dv(y): y ∈ N]
Each node v periodically sends Dv to its neighbors
– And neighbors update their own distance vectors
– Dx(y) ← minv{c(x,v) + Dv(y)} for each node y ∈ N
Over time, the distance vector Dx converges
22
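A minimal Python sketch of the update rule Dx(y) ← minv{c(x,v) + Dv(y)} (the costs and vectors below are invented for illustration):

def update_distance_vector(link_costs, neighbor_vectors, all_nodes, self_name):
    # link_costs: {v: c(x,v)} for x's neighbors
    # neighbor_vectors: {v: {y: Dv(y)}} as last reported by each neighbor
    INF = float("inf")
    dx = {}
    for y in all_nodes:
        if y == self_name:
            dx[y] = 0
        else:
            dx[y] = min((link_costs[v] + neighbor_vectors[v].get(y, INF)
                         for v in link_costs), default=INF)
    return dx

# Node u with neighbors v (cost 2) and w (cost 5)
dx = update_distance_vector(
    link_costs={"v": 2, "w": 5},
    neighbor_vectors={"v": {"u": 2, "v": 0, "w": 3, "z": 3},
                      "w": {"u": 5, "v": 3, "w": 0, "z": 1}},
    all_nodes=["u", "v", "w", "z"],
    self_name="u")
print(dx)   # {'u': 0, 'v': 2, 'w': 5, 'z': 5}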
Distance vector algorithm
Iterative, asynchronous:
Each node:
each local iteration caused
by:
Local link cost change
Distance vector update
message from neighbor
wait for (change in local link
cost or message from neighbor)
recompute estimates
Distributed:
Each node notifies
neighbors only when its DV
changes
Neighbors then notify their
neighbors if necessary
if DV to any destination has
changed, notify neighbors
23
Changes in a link cost
[Figure: three-node network x, y, z with link costs c(y,x) = 4 → 60, c(y,z) = 1, c(z,x) = 50]
When c(y,x) increases from 4 to 60:
Dy(x) = min{c(y,x) + Dx(x), c(y,z) + Dz(x)} = min{60 + 0, 1 + 5} = 6
– But Dz(x) = 5 still relies on the old path through y, so y and z route through each other (a routing loop) and raise their estimates a little at a time
This process will continue for 44 iterations!
Usually called the count-to-infinity problem.
24
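The count-to-infinity behaviour can be reproduced with a tiny simulation of the slide’s three-node example (illustrative only; the exact iteration count depends on how updates are scheduled):

# c(y,x) jumps from 4 to 60 while c(y,z) = 1 and c(z,x) = 50
c_yx, c_yz, c_zx, c_zy = 60, 1, 50, 1
D_y, D_z = 4, 5                              # stale estimates of the least cost to x
exchanges = 0
while True:
    new_y = min(c_yx, c_yz + D_z)            # y recomputes using z's last report
    new_z = min(c_zx, c_zy + new_y)          # z recomputes using y's new report
    if (new_y, new_z) == (D_y, D_z):
        break                                # nothing changed: converged
    D_y, D_z = new_y, new_z
    exchanges += 2                           # one update from y, one from z
# Dozens of exchanges (on the order of the slide's 44 iterations) before
# z falls back to its direct 50-cost link: D_y ends at 51, D_z at 50.
print(exchanges, D_y, D_z)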
Routing Information Protocol (RIP)
Distance-vector protocol
– Nodes send distance vectors every 30 seconds
– … or when an update causes a change in routing
Link costs in RIP
– All links have cost 1
– Valid distances of 1 through 15
– … with 16 representing infinity
– A small “infinity” means a smaller counting-to-infinity problem
RIP is limited to fairly small networks
– E.g., used in the Princeton campus network
25
Link state vs. distance vector
Message complexity
– LS: with n nodes and E links, O(nE) messages sent
– DV: exchange between neighbors only
Speed of convergence
– LS: O(n²) algorithm, requires O(nE) messages
– DV: convergence time varies
– May have routing loops
– Count-to-infinity problem
Robustness: what happens when a router malfunctions?
– LS: a node can advertise an incorrect link cost; each node computes only its own table
– DV: a node can advertise an incorrect path cost; each node’s table is used by others, so errors propagate
26
Conclusions: Intra-domain routing
Routing is a distributed algorithm
– React to changes in the topology
– Compute the shortest paths
Two main shortest-path algorithms
– Dijkstra link-state routing (e.g., OSPF and IS-IS)
– Bellman-Ford distance-vector routing (e.g., RIP)
Convergence process
– Changing from one topology to another
– Transient periods of inconsistency across routers
27
Inter-domain routing
Outline
Inter-domain routing
– AS relationships
– Path-vector routing
Border Gateway Protocol (BGP)
– Incremental, prefix-based, path-vector protocol
– Internal vs. external BGP
– Business relationships and traffic engineering with BGP
– BGP convergence delay
29
Relationship between Autonomous Systems
30
Hierarchy of autonomous systems
[Figure: Large and Big at the top, Medium1 and Medium2 below, Univ at the bottom]
Large, tier-1 provider with a nationwide backbone
– At the “core” of the Internet; doesn’t have providers
Medium-sized regional provider with a smaller backbone
Small network run by a single company or university
31
Connections between networks
[Figure: access routers and gateway routers connect Small, Medium, Big, and Large networks via dial-in access, Internet exchange points (IXPs), private peering, and commercial customers]
32
Single-homed customers
[Figure: Univ connects only to Medium1]
A single-homed customer has only one connection to the Internet
33
Multi-homed customers
[Figure: Medium2 connects to both Big and Large]
Same provider: e.g., Medium2 has two connections to Big
Different providers: e.g., Medium2 connects to both Big and Large
34
Customer-provider relationship
Customer needs to be reachable from everyone
– Provider exports routes learned from the customer to everyone
Customer does not want to provide transit service
– Customer does not export routes from one provider to another
[Figure: Univ is a customer of Medium1; Medium2 is a customer of Big and Large; traffic to/from Univ flows through its provider, but transit traffic through a customer is not allowed]
35
Peer-peer relationship
Peers exchange traffic between their customers
– An AS exports only customer routes to a peer
– An AS exports a peer’s routes only to its customers
[Figure: Big and Large are peers; Big and Medium1 are peers; their customers exchange traffic, but Big doesn’t provide transit for its peers]
36
Peering provides shortcuts
Peering also allows connectivity between the customers of “Tier 1” providers
[Figure: peer, provider, and customer links showing the shortcut]
37
How are peering decisions made?
Peer
– Reduces upstream transit costs
– Can increase end-to-end performance
– May be the only way to connect your customers to some part of the Internet (“Tier 1”)
Don’t peer
– You would rather have customers
– Peers are usually your competition
– Peering relationships may require periodic renegotiation
38
Inter-domain Routing
39
Interconnected ASes
[Figure: AS 1 (routers 1a-1d), AS 2 (2a-2c), and AS 3 (3a-3c) interconnected; each router combines the intra-AS and inter-AS routing algorithms to build its forwarding table]
The forwarding table is configured by both the intra- and inter-AS routing algorithms
– Intra-AS routing sets entries for internal destinations
– Inter-AS and intra-AS routing set entries for external destinations
40
Inter-AS tasks
Suppose a router in AS1 receives a datagram whose destination is outside AS1
– The router should forward the packet towards one of the gateway routers, but which one?
AS1 needs:
1. To learn which destinations are reachable through AS2 and which through AS3
2. To propagate this reachability info to all routers in AS1
This is the job of inter-AS routing!
[Figure: AS 1, AS 2, and AS 3 with their gateway routers]
41
Example: Setting the forwarding table in router 1d
Suppose AS1 learns (via the inter-AS protocol) that prefix x is reachable via AS3 (gateway 1a) but not via AS2.
The inter-AS protocol propagates the reachability info to all internal routers.
Router 1d determines from intra-AS routing info that its interface I is on the least-cost path to 1a.
It puts the entry (x, I) in its forwarding table.
[Figure: AS 1, AS 2, and AS 3; router 1d reaches gateway 1a over interface I]
42
Example: Choosing among multiple ASes
Now suppose AS1 learns from the inter-AS protocol that prefix x is reachable from AS3 and from AS2.
To configure its forwarding table, router 1d must determine towards which gateway it should forward packets for destination x.
This is also the job of the inter-AS routing protocol!
[Figure: AS 1, AS 2, and AS 3 with gateway routers]
43
Challenges for inter-domain routing
Scale
– Prefixes: 150,000-200,000, and growing
– ASes: 30,000 visible ones, and growing
Privacy
– ASes don’t want to divulge internal topologies
– … or their business relationships with neighbors
Policy
– No Internet-wide notion of a link-cost metric
– Need control over where you send traffic
– … and who can send traffic through you
44
Shortest-path routing is restrictive
All traffic must travel on shortest paths
All nodes need a common notion of link costs
Incompatible with commercial relationships
[Figure: National ISP1/ISP2, Regional ISP1-3, and customers Cust1-3; some shortest paths (marked NO) would violate commercial relationships while others (YES) do not]
45
Link-state routing is problematic
Topology information is flooded
– High bandwidth and storage overhead
– Forces nodes to divulge sensitive information
Entire path computed locally per node
– High processing overhead in a large network
Minimizes some notion of total distance
– Works only if policy is shared and uniform
Typically used only inside an AS
– E.g., OSPF and IS-IS
46
Distance vector is on the right track
Advantages
– Hides details of the network topology
– Nodes determine only the “next hop” toward the destination
Disadvantages
– Minimizes some notion of total distance, which is difficult in an inter-domain setting
– Slow convergence due to the counting-to-infinity problem (“bad news travels slowly”)
Idea: extend the notion of a distance vector
47
Path-vector routing
Extension of distance-vector routing
– Supports flexible routing policies
– Avoids the count-to-infinity problem
Key idea: advertise the entire path
– Distance vector: send a distance metric per destination d
– Path vector: send the entire path for each destination d
[Figure: node 1 advertises “d: path(1)”, node 2 advertises “d: path(2,1)” to node 3, and data traffic flows in the reverse direction toward d]
48
Faster loop detection
A node can easily detect a loop
– Look for its own node identifier in the path
– E.g., node 1 sees itself in the path “3, 2, 1”
A node can simply discard paths with loops
– E.g., node 1 simply discards the advertisement
[Figure: node 3 advertises “d: path(3,2,1)” back toward node 1]
49
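A minimal Python sketch of this check (the message format and names are assumptions, not BGP’s actual encoding):

def accept_advertisement(self_id, dest, path, rib):
    # path: tuple of node ids, most recent sender first, e.g. (2, 1)
    if self_id in path:
        return False                         # loop: the path already went through us
    rib.setdefault(dest, []).append(path)
    return True

rib = {}
print(accept_advertisement(3, "d", (2, 1), rib))      # True: 3 is not in the path
print(accept_advertisement(1, "d", (3, 2, 1), rib))   # False: node 1 sees itself and discards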
Flexible policies
Each node can apply local policies
– Path selection: which path to use?
– Path export: which paths to advertise?
Examples
– Node 2 may prefer the path “2, 3, 1” over “2, 1”
– Node 1 may not let node 3 hear the path “1, 2”
50
Border Gateway Protocol (BGP)
Inter-domain routing protocol for the Internet
– Prefix-based path-vector protocol
– Policy-based routing based on AS paths
Evolved during the past 15 years
– 1989: BGP-1 [RFC 1105], replacement for EGP (1984, RFC 904)
– 1990: BGP-2 [RFC 1163]
– 1991: BGP-3 [RFC 1267]
– 1995: BGP-4 [RFC 1771], support for Classless Interdomain Routing (CIDR)
– 2006: BGP-4 [RFC 4271]
51
BGP session operation
Establish a session on TCP port 179
Exchange all active routes
Exchange incremental updates
While the connection is alive, exchange route UPDATE messages
[Figure: BGP session between a router in AS1 and a router in AS2]
52
Incremental protocol
A node learns multiple paths to a destination
– Stores all of the routes in a routing table
– Applies policy to select a single active route
– May advertise the route to its neighbors
Incremental updates
– Announcement: upon selecting a new active route, add the node id to the path and advertise it
– Withdrawal: if the active route is no longer available
53
BGP route
Destination prefix (e.g., 128.112.0.0/16)
Route attributes, including
– AS path (e.g., “2 1”)
– Next-hop IP address (e.g., 12.127.0.121)
[Figure: AS 1 originates 128.112.0.0/16 and advertises it to AS 2 with AS path = 1, next hop = 192.0.2.1; AS 2 advertises it to AS 3 with AS path = 2 1, next hop = 12.127.0.121]
54
An AS is not a single node
AS path length can be misleading
– An AS may have many router-level hops
[Figure: BGP says that path “4 1” is better than path “3 2 1”, even though the router-level path through AS 4 may have many more hops]
55
Two types of BGP neighbor relationships
External BGP (eBGP)
– Session between routers in different ASes
Internal BGP (iBGP)
– Needed to distribute BGP information within the AS
– iBGP sessions are routed using the IGP
[Figure: an eBGP session between AS1 and AS2, and iBGP sessions among the routers inside AS1]
56
iBGP mesh doesn’t scale
[Figure: an eBGP update propagated over a full mesh of iBGP sessions]
Configuration overhead
– N border routers means N(N-1)/2 sessions
– One new router requires configuring all the others
Routing overhead
– Each router has to listen to updates from all neighbors
– Larger routing tables, because of alternate routes
57
Route reflectors
[Figure: route reflectors (RR) relay an eBGP update to their clients as iBGP updates]
A route reflector acts like a route server
– Routes from clients: distribute to other RRs
– Routes from other RRs: distribute to clients
– Only sends the best route
58
BGP path selection
Simplest case
– Shortest AS path
– Arbitrary tie break
Example
– A three-hop AS path is preferred over a four-hop AS path
– AS 7 prefers the path through AS 6
But BGP is not limited to shortest-path routing
– Policy-based routing
[Figure: AS 7 learns 128.112.0.0/16 with AS path “5 4 3 2 1” from AS 5 and with AS path “6 2 1” from AS 6]
59
BGP policy: Applying policy to routes
Import policy
– Filter unwanted routes from a neighbor
– E.g., a prefix that your customer doesn’t own
– Manipulate attributes to influence path selection
– E.g., assign local preference to favored routes
Export policy
– Filter routes you don’t want to tell your neighbor
– E.g., don’t tell a peer a route learned from another peer
– Manipulate attributes to control what they see
– E.g., make a path look artificially longer than it is
60
BGP route processing
[Figure: BGP route processing pipeline]
Receive BGP updates → apply import policies (filter routes and tweak attributes) → best-route selection based on attribute values → best-route table → apply export policies (filter routes and tweak attributes) → transmit BGP updates
Install forwarding entries for the best routes in the IP forwarding table
61
Import policy: Filtering
Discard some route announcements on the session to a customer
– Detect configuration mistakes and attacks
Examples
– Discard a route if the prefix is not owned by the customer
– Discard a route that contains another large ISP in the AS path
[Figure: Big’s customer (Tier-3/Univ) announces 128.112.0.0/16, a prefix it does not own, and Big discards it]
62
Export policy: Filtering
Discard some route announcements
– Limit propagation of routing information
Examples
– Don’t announce routes from one peer to another
– Don’t announce routes for network-management hosts
[Figure: Big does not announce routes learned from one peer (e.g., Huge) to another (e.g., Large), nor routes for the network operator’s management hosts]
63
BGP Policy Configuration
Routing policy languages are vendor-specific
– Not part of the BGP protocol specification
– Different languages for Cisco, Juniper, etc.
Still, all languages have some key features
– Policy as a list of clauses
– Each clause matches on route attributes
– … and either discards or modifies the matching routes
Configuration is done by human operators
– Implementing the policies of their AS
– Business relationships, traffic engineering, security, …
64
Implementing business relationships with BGP policies
65
Best route selection: Simplified BGP decision process
1. Ignore routes whose next hop is unreachable
2. Highest local preference
3. Lowest AS path length
4. Lowest MED (among routes with the same next-hop AS)
5. Prefer eBGP-learned over iBGP-learned routes
6. Lowest IGP cost to the egress router
7. Lowest router ID of the egress router
66
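The ordering above can be expressed as a sort key. The sketch below is a simplified Python illustration with made-up attribute names (real BGP implementations have more steps, and the MED comparison assumes routes from the same next-hop AS):

def best_route(routes):
    candidates = [r for r in routes if r["reachable"]]
    return min(candidates, key=lambda r: (
        -r["local_pref"],                    # 1. highest local preference
        len(r["as_path"]),                   # 2. shortest AS path
        r["med"],                            # 3. lowest MED (same next-hop AS)
        not r["ebgp"],                       # 4. prefer eBGP-learned over iBGP-learned
        r["igp_cost"],                       # 5. lowest IGP cost to the egress router
        r["router_id"],                      # 6. lowest router ID of the egress router
    ))

r1 = dict(reachable=True, local_pref=100, as_path=["2", "1"], med=0,
          ebgp=True, igp_cost=10, router_id=1)
r2 = dict(reachable=True, local_pref=100, as_path=["3", "2", "1"], med=0,
          ebgp=True, igp_cost=5, router_id=2)
print(best_route([r1, r2])["as_path"])       # ['2', '1']: the shorter AS path wins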
Import policy: Local preference
Favor one path over another
– Override the influence of AS path length
– Apply local policies to prefer a path
Example: prefer customer over peer
[Figure: Big assigns local-pref = 100 to the route learned from its customer (Tier-2, toward Univ) and local-pref = 90 to the route learned from its peer Large]
67
Example: Customer to provider
router | import policies | route selection | export policies
A | local pref = 100 | select Univ route | send to other iBGP neighbors
B | | select A’s route | send to other eBGP neighbors
[Figure: Univ announces 132.239.17.0/24 to its provider Medium1 at router A; Medium1’s router B exports the route to Big, Medium2, and Large]
68
Example: Peers
router | import policies | route selection | export policies
A | local pref = 90 | select M1 route | send to other iBGP routers
B | | select A’s route | don’t send
C | | select A’s route | send to customers
How do B and C know that A’s route is a peer route?
[Figure: Medium1 announces 132.239.0.0/16; Medium1, Big, and Large are peers; Big’s routers A, B, and C apply the policies above]
69
Community Attribute
The community attribute is used to tag routes
– List of four-byte values
– Format = <AS number>:<value>
Import policy: set community
– E.g., Internet2 uses it to tag commercial routes: community add 11537:2001
Export policy: match community, apply policy
– E.g., don’t export commercial routes: from community 11537:2001 then reject
70
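Actual policies are written in vendor configuration languages; the Python sketch below only illustrates the idea of tagging on import and matching on export (all names are hypothetical):

COMMERCIAL = "11537:2001"

def import_policy(route):
    # On import, tag routes learned from commercial peers (assumed classifier)
    if route.get("from_commercial_peer"):
        route.setdefault("communities", set()).add(COMMERCIAL)
    return route

def export_policy(route):
    # On export, reject routes carrying the commercial tag
    if COMMERCIAL in route.get("communities", set()):
        return None                          # "from community 11537:2001 then reject"
    return route

r = import_policy({"prefix": "128.112.0.0/16", "from_commercial_peer": True})
print(export_policy(r))                      # None: the tagged route is filtered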
Example: Customers vs. peers
router | import policies | route selection | export policies
A | local pref (M1) = 100, local pref (L) = 80 | select M1 route | send to other iBGP and eBGP neighbors
Suppose:
– M1 is a customer of Big and Large
– Big and Large are peers
[Figure: Medium1 announces 132.239.0.0/16; Big’s router A learns it directly from Medium1 and via the peer Large]
71
Engineering traffic with BGP policies
72
Traffic engineering with BGP
For inbound traffic
– Filter outbound routes
– Tweak attributes in outbound routes in the hope of influencing the neighbor’s selection
For outbound traffic
– Filter inbound routes
– Tweak attributes in inbound routes to influence best-route selection
[Figure: inbound traffic corresponds to outbound routes, and outbound traffic corresponds to inbound routes]
73
Implementing backup paths for outbound traffic with local prefs
[Figure: AS 2 connects to AS 1 over a primary link and a backup link]
Set local-pref = 100 for all routes from AS 1 received on the primary link
Set local-pref = 50 for all routes from AS 1 received on the backup link
This forces outbound traffic to take the primary link, unless the link is down
74
Implementing backup paths for inbound traffic with AS prepending
[Figure: AS 2 announces 132.239.17.0/24 with AS path = “2” on the primary link and with AS path = “2 2 2” (prepended) on the backup link to AS 1]
Usually, this forces inbound traffic to take the primary link, unless the link is down
75
AS prepending may not work
[Figure: AS 2 announces 132.239.17.0/24 with AS path = “2” to AS 1 (primary) and with AS path = “2 2 2 2 2 2 2” to AS 0 (backup)]
Suppose AS 2 is a customer of AS 1 and AS 0; then AS 0 still prefers the backup route, because local preference is considered before AS path length
76
BGP communities can help
[Figure: AS 2 tags the announcement of 132.239.17.0/24 on the backup link with community = 3:70]
Use the community attribute to signal which local-pref value should be used. If the customer local pref is 100 and the peer local pref is 90, setting local pref to 70 on the backup route solves the problem.
77
Example: Multiple egress points
router | import policies | route selection | export policies
A | local pref = 80 | select Big route | send to other iBGP routers
B | local pref = 80 | select Big route | send to other iBGP routers
C | local pref = 80 | select Large route | send to other iBGP routers
What will router D choose?
[Figure: Medium2 (routers A, B, C, D) learns a route to UPMC via Big (egresses A and B) and via Large (egress C)]
78
Hot-potato (early-exit) routing
[Figure: Medium2 learns a route to Univ via Big (egress routers A and B) and via Large (egress router C); the IGP distances from router D are D-A = 10, D-B = 8, D-C = 7]
Hot-potato routing = route to the closest egress point when there is more than one route to the destination
79
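Hot-potato selection is just a minimum over IGP distances. A minimal Python sketch (with made-up distances) of what router D does:

def hot_potato_egress(igp_distance, egresses_with_route):
    # igp_distance: {egress: IGP cost from this router}
    return min(egresses_with_route, key=lambda e: igp_distance[e])

igp_from_D = {"A": 10, "B": 8, "C": 7}
print(hot_potato_egress(igp_from_D, ["A", "B", "C"]))   # 'C': the closest egress
# If an internal change raises the distances to B and C, the choice flips:
igp_after = {"A": 10, "B": 11, "C": 12}
print(hot_potato_egress(igp_after, ["A", "B", "C"]))    # 'A'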
Asymmetric routing
[Figure: Big and Large interconnect in Paris and London; with hot-potato routing in each AS, traffic from A to B and traffic from B to A take different interconnection points]
80
Some routers don’t need BGP
A customer that connects to a single upstream ISP
– The ISP can introduce the prefixes into BGP
– … and the customer can simply default-route to the ISP
[Figure: Big nails up routes for 130.132.0.0/16 pointing to Univ; Univ nails up a default route 0.0.0.0/0 pointing to Big]
81
Some routers don’t need BGP
Routers inside a “stub” network
– The border router may speak BGP to upstream ISPs
– But internal routers can simply use a default route
[Figure: Univ (130.132.0.0/16) speaks BGP to Big and Large only at its border router]
82
Joining BGP and IGP information
Border Gateway Protocol (BGP)
– Announces reachability to external destinations
– Maps a destination prefix to an egress point
– E.g., 128.112.0.0/16 reached via 192.0.2.1
Interior Gateway Protocol (IGP)
– Used to compute paths within the AS
– Maps an egress point to an outgoing link
– E.g., 192.0.2.1 reached via 10.1.1.1
83
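A minimal sketch (with invented tables) of how the two protocols compose into a forwarding entry:

bgp_table = {"128.112.0.0/16": "192.0.2.1"}  # BGP: prefix -> egress point
igp_table = {"192.0.2.1": "10.1.1.1"}        # IGP: egress point -> outgoing link / next hop

def forwarding_entry(prefix):
    egress = bgp_table[prefix]               # 128.112.0.0/16 reached via 192.0.2.1
    return igp_table[egress]                 # 192.0.2.1 reached via 10.1.1.1

print(forwarding_entry("128.112.0.0/16"))    # 10.1.1.1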
BGP convergence
84
Causes of BGP routing changes
Topology changes
– Equipment going up or down
– Deployment of new routers or sessions
BGP session failures
– Due to equipment failures, maintenance, etc.
– Or due to congestion on the physical path
Changes in routing policy
– Reconfiguration of preferences
– Reconfiguration of route filters
Persistent protocol oscillation
– Conflicts between policies in different ASes
85
BGP session failure
BGP runs over TCP
– BGP only sends updates when changes occur
– TCP doesn’t detect lost connectivity on its own
Detecting a failure
– Keep-alive: 60 seconds
– Hold timer: 180 seconds
Reacting to a failure
– Discard all routes learned from the neighbor
– Send new updates for any routes that change
[Figure: BGP session between AS1 and AS2]
86
Routing change: Before and after
[Figure: before the change, AS 1 uses (1,0), AS 2 uses (2,0), and AS 3 uses (3,1,0); after the change, AS 1 uses (1,2,0) and AS 3 uses (3,2,0)]
87
Routing change: Path exploration
AS 1
– Deletes the route (1,0)
– Switches to the next route (1,2,0)
– Sends the route (1,2,0) to AS 3
AS 3
– Sees (1,2,0) replace (1,0)
– Compares it to the route (2,0)
– Switches to using AS 2
[Figure: destination 0 with neighboring ASes 1, 2, and 3]
88
Routing change: Path exploration
Initial situation
– Destination 0 is alive
– All ASes use the direct path
When the destination dies
– All ASes lose the direct path
– All switch to longer paths
– Eventually the routes are withdrawn
E.g., AS 2 explores
– (2,0) → (2,1,0)
– (2,1,0) → (2,3,0)
– (2,3,0) → (2,1,3,0)
– (2,1,3,0) → null
[Figure: AS 1 explores (1,0), (1,2,0), (1,3,0) and AS 3 explores (3,0), (3,1,0), (3,2,0)]
89
BGP converges slowly, if at all
Path vector avoids count-to-infinity
– But ASes still must explore many alternate paths
– … to find the highest-ranked path that is still available
Fortunately, in practice
– Most popular destinations have very stable BGP routes
– And most instability lies in a few unpopular destinations
Still, lower BGP convergence delay is a goal
– It can be tens of seconds to tens of minutes
– Too high for important interactive applications
– … or even conventional applications, like Web browsing
90
Why different intra- and inter-AS routing?
Policy
– Inter-AS: the admin wants control over how its traffic is routed and who routes through its network
– Intra-AS: single admin, so no policy decisions needed
Scale
– Hierarchical routing saves table size and reduces update traffic
Performance
– Intra-AS: can focus on performance
– Inter-AS: policy may dominate over performance
91
Conclusions
BGP is solving a hard problem
– A routing protocol operating at a global scale
– With tens of thousands of independent networks
– That each have their own policy goals
– And all want fast convergence
Key features of BGP
– Prefix-based path-vector protocol
– Incremental updates (announcements and withdrawals)
– Policies applied at import and export of routes
– Internal BGP to distribute information within an AS
– Interaction with the IGP to compute forwarding tables
92
Acknowledgements
Many of the slides presented here are
borrowed from Tim Griffin, Jim Kurose,
and Jennifer Rexford.
93
Routing measurements
Renata Teixeira
Laboratoire LIP6
CNRS and Université Pierre et Marie Curie
Outline
Measurements in an AS
– IGP routing measurements
– iBGP measurements
– Impact of IGP changes on BGP
BGP measurements from multiple ASes
– Public BGP data: RIPE and RouteViews
– Diagnosis of inter-domain routing changes
95
Impact of hot-potato routing changes
in IP networks
with
Aman Shaikh (AT&T)
Tim Griffin (University of Cambridge)
Jennifer Rexford (Princeton)
Internet routing architecture
[Figure: ASes Big, Medium1, Medium2, Univ, and Large; inter-domain routing (BGP) runs between ASes, intra-domain routing (OSPF, IS-IS) runs within each AS]
Changes in one AS may impact traffic and routing in other ASes
End-to-end performance depends on all ASes along the path
97
Inter-domain routing
Border Gateway Protocol (BGP)
– Exchanges reachability info between ASes
– IP prefix = a block of IP addresses
– Uses policies to determine which paths to use and which to export to neighbors
[Figure: a neighboring AS announces “I can reach 132.239.17.0/24”; the receiving AS has several choices of egress point and BGP selects the egress]
98
Intra-domain routing
Interior Gateway Protocol (IGP)
– Most common: OSPF and IS-IS
– Floods link-state information
– Computes shortest paths based on link weights
– Link weights are set by network operators
[Figure: example topology with routers A-F; the cost of the path from A to F = 1 + 2 + 1 + 3]
99
Interaction between IGP and BGP
[Figure: routers A, B, and C are egress points with routes to Univ via Big and Large; the IGP distances from router D are D-A = 9, D-B = 8, D-C = 7; traffic to Univ leaves through the closest egress]
Hot-potato routing = route to the closest egress point when there is more than one route to the destination
100
Hot-potato routing changes
[Figure: an internal change raises the IGP distances from router D: D-B goes from 8 to 11 and D-C from 7 to 10, so D switches to egress A (distance 9)]
Causes: failure, planned maintenance, traffic engineering
Consequences:
– Transient forwarding instability
– Traffic shift
– BGP routing changes
101
Impact of hot-potato routing changes?
How often do hot-potato changes happen?
How many destinations do they affect?
What are the convergence delays?
How much traffic do they affect?
102
Outline
Measurement methodology
– Collection of OSPF and BGP data from AT&T
– Identification of hot-potato routing changes
BGP impact
Traffic impact
Minimizing hot-potato disruptions
103
Routing measurements
Passive measurement of routing protocols
– A monitor acts as a regular router
– Stores all messages with a timestamp
IS-IS or OSPF monitoring
– Flooding gives a complete view of topology dynamics
BGP monitoring
– Only the best route for each prefix from a router
– Need to collect data from multiple vantage points
104
Collecting input data
[Figure: a BGP monitor collects BGP updates from nine vantage points in the AT&T backbone, and an OSPF monitor collects OSPF messages by monitoring the flooding of link-state advertisements]
Replay the routing decisions from vantage points A and B to identify hot-potato changes
105
Challenges for identifying hot-potato changes
A single event may cause multiple messages
– Group related routing messages in time
Router implementation affects message timing
– Controlled experiments of routers in the lab
Many BGP updates are caused by external events
– Classify BGP routing changes by possible causes
106
Algorithm for correlating routing changes
Compute distance changes
– Group OSPF messages close in time
– Compute distance changes from each vantage point
Classify BGP changes by possible OSPF cause
– Group updates close in time
– Compare old and new routes according to the decision process
Determine the causal relationship
– Consistent BGP and OSPF changes
– Close in time
[Figure: Dallas sends a BGP update switching from the SF egress to the NY egress; did the OSPF distance change (9 vs. 10) cause it?]
107
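The time-based grouping and matching can be sketched as follows (a simplified illustration; the thresholds, record formats, and function names are assumptions, not the actual monitoring code):

def group_by_time(events, gap):
    # events: time-sorted list of (timestamp, payload); start a new group when the gap is exceeded
    groups, current = [], []
    for t, payload in events:
        if current and t - current[-1][0] > gap:
            groups.append(current)
            current = []
        current.append((t, payload))
    if current:
        groups.append(current)
    return groups

def correlate(bgp_changes, ospf_groups, window):
    # Attribute a BGP change to an OSPF group that ended within `window` seconds before it
    matches = {}
    for t_bgp, change in bgp_changes:
        for group in ospf_groups:
            t_end = group[-1][0]
            if 0 <= t_bgp - t_end <= window:
                matches[(t_bgp, change)] = group   # the latest matching group wins
    return matches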
Outline
Measurement methodology
BGP impact
– How often do hot-potato changes happen?
– Which fraction of prefixes do they affect?
Traffic impact
Minimizing hot-potato disruptions
108
BGP impact of an OSPF change
[Figure: fraction of BGP prefixes affected by each OSPF change, for routers A and B]
The vast majority of OSPF changes have no impact on these routers
… but a few have a very big impact
109
Variation across routers
[Figure: two designs — a router at IGP distances 9 and 10 from the two egress points (SF and NY), where small changes make router A switch egress points to dst, versus a router at distances 1 and 1000, which is more robust to intra-domain routing changes]
The significance of hot-potato routing depends on network design and router location.
110
Outline
Measurement methodology
BGP impact
Traffic impact
– How long are convergence delays?
– What is the impact on traffic?
Minimizing hot-potato disruptions
111
Delay for BGP routing change
Steps between an OSPF change and a BGP update
– OSPF message flooded through the network (t0), observed by the OSPF monitor
– OSPF updates the distance information
– BGP decision process rerun (timer driven)
– First BGP update sent (t1), observed by the BGP monitor
– BGP updates sent to other routers for the remaining prefixes, up to time t
Metrics
– t1 - t0: time for BGP to revisit its decision
– t - t1: time to update the other prefixes
112
BGP reaction time
[Figure: CDF of the BGP reaction time, for the first BGP update and for all BGP updates; the first update is roughly uniform over 5 - 80 sec, plus transfer delay]
Worst-case scenario:
– 0 - 80 sec to revisit the BGP decision
– 50 - 110 sec to send multiple updates
– The last prefix may take 3 minutes to converge!
113
Transient data disruptions
[Figure: routers R1 and R2 inside the AS, with egress points E1 and E2 toward dst]
1 – The BGP decision process runs in R2
2 – R2 starts using E1 to reach dst
3 – R1’s BGP decision can take up to 60 seconds to run, so packets can loop between R1 and R2 in the meantime
Disastrous for interactive applications (VoIP, gaming, web)
114
Challenges for Active Measurements
Problem: single-homed probe machines
– Probes do not experience the loop
– Probes do not illustrate the customer experience
[Figure: customer traffic loops between R1 and R2, but the operator’s probes P1 and P2 do not traverse the loop]
115
Traffic shifts
[Figure: traffic (MB per second) from ingress i to prefix p shifts from egress e1 to egress e2, causing a decrease in traffic at e1 on top of normal load variation]
ΔTM(i,e1,t) = TM(i,e1,t) - TM(i,e1,t-1) = L(i,e1,t) + R(i,e1,t)
L(i,e1,t): variation of the traffic that still uses e1 (load variation)
R(i,e1,t): traffic that moved to e1 minus traffic that moved out of e1 (routing shift)
116
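The decomposition can be illustrated with per-flow records (invented data; real traffic matrices are built from sampled flow measurements):

def decompose(flows_prev, flows_now, egress):
    # flows_*: {flow_id: (egress, bytes)} for two consecutive measurement intervals
    L = R = 0
    for f in set(flows_prev) | set(flows_now):
        e_prev, b_prev = flows_prev.get(f, (None, 0))
        e_now, b_now = flows_now.get(f, (None, 0))
        if e_prev == egress and e_now == egress:
            L += b_now - b_prev              # same egress: pure load variation
        elif e_now == egress:
            R += b_now                       # flow moved onto this egress
        elif e_prev == egress:
            R -= b_prev                      # flow moved off this egress
    return L, R

prev = {"f1": ("e1", 100), "f2": ("e1", 50)}
now  = {"f1": ("e1", 120), "f2": ("e2", 50)}  # f2 shifted from e1 to e2
print(decompose(prev, now, "e1"))             # (20, -50), so dTM(e1) = -30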
R relative to normal variations
[Figure: distribution of ΔTM relative to normal variations, separating ΔTM caused by load from ΔTM caused by routing]
– The vast majority (99.88%) of ΔTM lies within [-4, 4] times the normal variation
– Large shifts are caused by routing changes: a routing shift can be 70 times the normal variations
117
Hot-potato vs. external BGP routing changes
[Figure: distribution of ΔTM relative to normal variations for hot-potato changes vs. eBGP-triggered changes]
Hot-potato changes have the biggest impact on ΔTM
118
Summary of measurement analysis
Convergence can take minutes
– Forwarding loops lead to packet loss and delay
– Fixes: event-driven implementations or tunnels
The frequency of hot-potato changes depends on location
– Once a week on average for the more affected routers
Internal events can have a big impact
– Some events affect over half of a BGP table
– They are responsible for the largest traffic variations
119
Implications of hot-potato routing changes
To end users
– Transient disruptions
– New end-to-end path characteristics
To network administrators
– Instability in the traffic matrix
120
Outline
Measurement methodology
BGP impact
Traffic impact
Minimizing hot-potato disruptions
– What can operators do today?
121
What can operators do today?
Network design
– Design networks that minimize hot-potato changes
– Implement a fixed ranking of egress points (e.g., MPLS tunnels injected into the IGP)
Maintenance
– Plan maintenance activities considering the impact of changes on BGP routes
122
Comparison of Network Designs
[Figure: three designs for the Dallas router with egress points in SF and NY; with IGP distances of 9 and 10, small changes make Dallas switch egress points to dst, while with distances of 10 and 1000 the design is more robust to intra-domain routing changes]
123
Careful Cost in/out Links
[Figure: example of carefully setting the costs of links in and out of the Dallas PoP so that internal changes do not flip the choice between the SF and NY egress points]
124
Conclusion
Hot-potato routing is too disruptive
– Small changes inside an AS can lead to big disruptions of BGP and transit traffic
In addition, hot potato is…
– Too restrictive: the egress-selection mechanism dictates a policy
– Too convoluted: IGP metrics determine BGP egress selection
Introduce a more flexible egress-selection mechanism
– TIE: Tunable Interdomain Egress selection
125
More Info
http://rp.lip6.fr/~teixeira
BGP impact
– R. Teixeira, A. Shaikh, T. Griffin, and J. Rexford, “Dynamics of Hot-Potato Routing in IP Networks”, in Proc. of ACM SIGMETRICS, June 2004.
Traffic impact
– R. Teixeira, N. Duffield, J. Rexford, and M. Roughan, “Traffic Matrix Reloaded: Impact of Routing Changes”, in Proc. of PAM, March 2005.
Model of network sensitivity to IGP changes
– R. Teixeira, T. Griffin, A. Shaikh, and G.M. Voelker, “Network Sensitivity to Hot-Potato Disruptions”, in Proc. of ACM SIGCOMM, August 2004.
New egress selection mechanism
– R. Teixeira, T. Griffin, M. Resende, and J. Rexford, “TIE Breaking: Tunable Interdomain Egress Selection”, IEEE/ACM Transactions on Networking, August 2007.
126
Pin-pointing routing changes
with
Jennifer Rexford (Princeton)
Why diagnose routing changes?
Routing changes cause service disruptions
– Convergence delay
– Traffic shifts
– Changes in path properties
– RTT, available bandwidth, or lost connectivity
Operators need to know
– Why: for diagnosing and fixing problems
– Where: for accountability
– Need to guarantee service-level agreements
128
Limitations of active measurements
Active measurements: traceroute-like tools
– Can’t probe in the past
– Show the effect, not the cause
[Figure: a traceroute path crossing AS 1, AS 2, AS 3, and AS 4]
129
Passive BGP measurements
Passive measurements: public BGP data
– RIPE RIS: http://www.ripe.net/ris/
– RouteViews: http://www.routeviews.org/
[Figure: eBGP update feeds flow into the data collection infrastructure (RouteViews, RIPE)]
130
Many uses for public BGP data
Build the AS-level topology of the Internet
Inference of AS relationships
Origin of IP prefixes
Prediction of AS-level paths
Map IP addresses to ASes
Understand routing instabilities
Locate the origin of routing changes
Detect prefix hijacks
…
131
Appealing to locate the origin of routing changes
[Figure: BGP update feeds flow into data collection (RouteViews, RIPE), then data mining across time, views, and prefixes points to the root cause]
132
Each router’s view is unique
Myth: The BGP updates from a single router accurately represent the AS.
[Figure: AS 1 has border routers A, B, C, and D with different IGP distances toward the egress to dst in AS 2; the BGP data collection point attached to one router sees no change even when other routers change routes]
The measurement system needs to capture the BGP routing changes from all border routers
133
Internal routing changes matter
Myth: Routing changes visible in eBGP have greater end-to-end impact than changes with local scope.
[Figure: an internal IGP change inside AS 1 shifts traffic toward dst between egress points, while the BGP data collection point sees no change]
The measurement system needs to capture internal changes inside an AS
134
Route aggregation hides information
Myth: BGP data from a router accurately represents changes on that router.
[Figure: router A aggregates 12.1.1.0/24 into 12.1.0.0/16, so the BGP data collection point sees no change when the more-specific route changes]
The measurement system needs to know all the routes the router knows.
135
Policy obscures event location
Myth: The AS responsible for the change appears in the old or the new AS path.
[Figure: the old AS path is 1,2,8,9,10 and the new AS path is 1,4,5,6,7,10; the AS that caused the change may appear in neither path]
Accurate troubleshooting may require measurement data from each AS
136
Conclusion
Routing monitoring is essential for network operation
– Diagnose and fix network failures
Both BGP and IGP monitoring are passive
– Listen to routing messages
– Little overhead
Public BGP data can be misleading
– Only a few vantage points per AS
– Inaccuracies depend on the goal of the study
137