Routing: Hierarchies

Download Report

Transcript Routing: Hierarchies

Global Internet & BGP
Acknowledgement(texts & figs)
•
•
•
•
•
Kurose
Govindan
Zahid
Peterson & Davie
Kevin
Hierarchies
• What?
– Logical structure overlaid on collections of
nodes
• Why?
– Together with information abstraction, the only
known solution to scaling issues
Routing Hierarchies
• Flat routing doesn’t scale
– Each node cannot be expected to have routes to every
destination (or destination network)
• Key observation
– Need less information with increasing distance to
destination
• Two radically different approaches for routing
– The area hierarchy
– The landmark hierarchy
Inter-AS routing
Autonomous systems
• What is an AS?
– A set of routers under a single technical
administration, using an interior gateway
protocol (IGP) and common metrics to route
packets within the AS and using an exterior
gateway protocol (EGP) to route packets to
other AS’s.
– sometimes AS’s use multiple IGPs and metrics,
but appear as single AS’s to other AS’s.
Subnetting & CIDR
Global Addresses
• Properties
– IPv4 uses 32 bit address space
– globally unique
– hierarchical: network + host
A:
0
7
24
Network
Host
• Dot Notation
– 10.3.2.4
– 128.96.33.81
– 192.12.69.77
• Assigning authority
– Jon Postel ran IANA ‘til ‘98
– Assigned by ICANN
B:
1 0
14
16
Network
Host
21
8
Network
Host
C:
1 1 0
D:
1 1 1 0
Multicast
E:
1 1 1 1
Experimental
How to Make Routing Scale
• Flat (Ethernet) versus Hierarchical (Internet) Addresses
– All hosts attached to same network have same network address
• Problem: inefficient use of Hierarchical Address Space
– class C with 2 hosts (2/255 = 0.78% efficient)
– class B with 256 hosts (256/65535 = 0.39% efficient)
• Problem: still Too Many Networks
– routing tables do not scale
• Big tables make routers expensive
– route propagation protocols do not scale
Today’s Internet
• Consists of ISP’s (Internet Service Providers) who run
AS’s (Autonomous Systems)
• All you need to become an ISP is some address space,
an AS number and a peer or two
– Easier said than done
• Getting addresses and AS number is the tricky part
• There are public peering points (MAE East, Central and West)
– NAP’s run by MCI where peering can take place
• Most peering points are private
• Number of connections have been doubling for some
time – how do we deal with this kind of scaling?
Subnetting - 1985
• Original intent was for network to identify one physical network
– Lots of small networks are what we actually have – how do we handle this?
• Solution: add another level to address/routing hierarchy: subnet
• Subnet masks define variable partition of host part
– 1’s identify subnet, 0’s identify hosts within the subnet
– Mechanism for sharing a single network number among multiple networks
• Subnets visible only within a site
Network number
Host number
Class B address
111111111111111111111111
00000000
Subnet mask (255.255.255.0)
Network number
Subnet ID
Subnetted address
Host ID
Subnetting
• Subnetting is the process of creating multiple
segments within a single IP network address space
• From the perspective of a node outside the
network, all nodes on any of the subnetworks
appear to be on the original single network
• Internet routing tables are not affected by
subnetting, I.e. routing tables need not be
overloaded with information about routes to all
internal subnets, just information to the access
router/gateway
Subnetting (Continued)
• Classes A, B and C in IP addressing are designed with
two levels of hierarchy (netid & hostid)
• Problem: An organization with a class B address can
not have more than network and all 216 hosts are
attached to that network  A nightmare in managing
this network, single broadcast domain, security issues,
etc…
• Subnetting create another level of hierarchy (netid,
subnetid and hostid). Delivery of IP packets involves
three steps; delivery to the site router, delivery to the
subnet router, delivery to the host
Subnet Example
Subnet mask: 255.255.255.128
Subnet number: 128.96.34.0
128.96.34.15
128.96.34.1
H1
R1
Subnet mask: 255.255.255.128
Subnet number: 128.96.34.128
128.96.34.130
128.96.34.139
128.96.34.129
H2
R2
H3
128.96.33.14
128.96.33.1
Subnet mask: 255.255.255.0
Subnet number: 128.96.33.0
Forwarding table at router R1
Subnet Number
Hop
128.96.34.0
interface 0
128.96.34.128
interface 1
128.96.33.0
Subnet Mask
Next
255.255.255.128
255.255.255.128
255.255.255.0
R2
A network with two levels of hierarchy
141.14.2.20
141.14.2.21
141.14.2.105
141.14.7.96
To the internet
141.14.0.0
R
141.14.7.95
141.14.22.64
141.14.22.8
141.14.7.44
A network with three levels of
hierarchy
141.14.2.20
141.14.2.21
141.14.2.105
141.14.2.0
R
141.14.7.96
To the internet
141.14.7.0
141.14.4.45
The previous network is
divided into 3 subnets
141.14.22.0
141.14.22.9
141.14.22.64
Subnet Masking
• Subnetting is achieved by “stealing” some bits from
the hostid field to represent the subnet portion of the
address
• Those bits that are used for the subnetid are identified
through the use of a subnet mask
• Masking is the process of extracting the address of the
physical network (if subnetting is not used) or the
subnet address (if subnetting is used) from an IP
address
• A subnet mask is a 32-bit pattern having a “1” in
every netid and subnetid locations and a “0” in every
hostid location
Subnet masking (Continued)
• Subnet masking is performed (both at the host and
at the router) by applying “bit-wise-and” operation
between the IP address and the subnet mask
• Example 1: Class B network without subnetting
– 141.14.2.21 is 10001101.00001110.00000010.00010101
– 255.255.0.0 is 11111111.11111111.00000000.00000000
– “Bit-wise and”10001101.00001110.00000000.00000000
IP address
141.14.2.21
Mask
255.255.0.0
Network address
141.14.0.0
Subnet Masking (Continued)
• Example 2: Class B network with subnetting
– 141.14.2.21 is
10001101.00001110.00000010.00010101
– 255.255.255.0 is
11111111.11111111.11111111.00000000
– “Bit-wise and” is
10001101.00001110.00000010.00000000
– The subnet address is hence 141.14.2.0
IP address
141.14.2.21
Mask
255.255.255.0
Subnet address
141.14.2.0
Forwarding Algorithm
D = destination IP address
for each entry (SubnetNum,
SubnetMask, NextHop)
D1 = SubnetMask & D
if D1 = SubnetNum
if NextHop is an interface
deliver datagram directly to
D
else
deliver datagram to NextHop
Forwarding
• Use a default router if nothing matches
• Not necessary for all 1s in subnet mask to be
contiguous
• Can put multiple subnets on one physical network
• Subnets not visible from the rest of the Internet
• This is a simple, toy example!!
Subnets
• Subnetting is not the only way to solve scalability problems
• Additional router support is necessary to include netmask and forwarding
functionality
• Non-contiguous netmask numbers can be used
– They make administration more difficult
• Multiple subnets can reside on a single network
– Requires routers within the network
• Subnets help solve scalability problems
– Do not require us to use class B or C address for each physical network
– Help us to aggrigate information
• Chief advantage of IP addresses: routers could keep one entry per network
instead of one per destination host
Continued Problems with IPv4
Addresses
• Problem:
– Potential exhaustion of IPv4 address space (due to
inefficiency)
• Class B network numbers are highly prized
• Lots of class C addresses but no one wants them
– Growth of back bone routing tables
• We don’t want lots of small networks since this causes large routing
tables
• Solution:
– Allow addresses assigned to a single entity to span multiple
classed prefixes
– Enhance route aggregation
Supernetting
• Assign block of contiguous network numbers to nearby
networks
• Called CIDR: Classless Inter-Domain Routing
– Breaks rigid boundries between address classes
– If ISP needs 16 class C addresses, make them contiguous
• Eg.192.4.16 to 192.4.31 enables a 20-bit network number
• Represent blocks (number of class C networks) with a
single pair
(first_network_address, count)
• Restrict block sizes to powers of 2
• Use a bit mask (CIDR mask) to identify block size
• All routers must understand CIDR addressing
IP addressing: CIDR
• CIDR: Classless InterDomain Routing
– network portion of address of arbitrary length
– address format: a.b.c.d/x, where x is # bits in
network portion of address
network
part
host
part
11001000 00010111 00010000 00000000
200.23.16.0/23
CIDR (continued)
• Why?
– Reduce amount of global routing information
via aggregation
204.71.0.0
204.71.1.0
204.71.2.0
204.71.3.0
204.71.4.0
Service
Provider
204.71.0.0/16
Global
Internet
Routing
Mesh
CIDR Addresses
• Identifying a CIDR block requires both an address and a
mask
– Slash notation
– 128.211.168.0/21 for addresses 128.211.168.0 – 128.211.175.255
• Here the /21 indicates a 21 bit mask
– All possible CIDR masks can easily be generated
• /8, /16, /24 correspond to traditional class A, B, C categories
• IP addresses are now arbitrary integers, not classes
• Raises interesting questions about lookups
– Routers cannot determine the division between prefix and suffix
just by looking at the address
• Hashing does not work well
• Interesting lookup algorithms have been developed and analyzed
CIDR – A Couple Details
• ISP’s can further subdivide their blocks of
addresses using CIDR
• Some prefixes are reserved for private
addresses
– 10/8, 172.16/12, 192.168/16, 169.254/16
– These are not routable in the Internet
Inter-domain routing
BGP
AS Numbers (ASNs)
ASNs are 16 bit values.
64512 through 65535 are “private”
•
•
•
•
•
•
•
•
Currently over 11,000 in use.
Genuity: 1
MIT: 3
Harvard: 11
UC San Diego: 7377
AT&T: 7018, 6341, 5074, …
UUNET: 701, 702, 284, 12199, …
Sprint: 1239, 1240, 6211, 6242, …
…
ASNs represent units of routing policy
How Many ASNs are there?
Thanks to Geoff Huston. http://www.telstra.net/ops on June 23, 2001
When will we run out of ASNs?
64,511
2005?
2007?
Internet inter-AS routing: BGP
• BGP (Border Gateway Protocol): the de facto standard
• Path Vector protocol:
– similar to Distance Vector protocol
– each Border Gateway broadcast to neighbors (peers)
entire path (I.e, sequence of ASs) to destination
– E.g., Gateway X may send its path to dest. Z:
Path (X,Z) = X,Y1,Y2,Y3,…,Z
Internet inter-AS routing: BGP
Suppose: gateway X send its path to peer gateway W
• W may or may not select path offered by X
– cost, policy (don’t route via competitors AS), loop
prevention reasons.
• If W selects path advertised by X, then:
Path (W,Z) = w, Path (X,Z)
• Note: X can control incoming traffic by controling it
route advertisements to peers:
– e.g., don’t want to route traffic to Z -> don’t
advertise any routes to Z
Internet inter-AS routing: BGP
• BGP messages exchanged using TCP.
• BGP messages:
– OPEN: opens TCP connection to peer and
authenticates sender
– UPDATE: advertises new path (or withdraws old)
– KEEPALIVE keeps connection alive in absence of
UPDATES; also ACKs OPEN request
– NOTIFICATION: reports errors in previous msg;
also used to close connection
Why different Intra- and Inter-AS routing ?
Policy:
• Inter-AS: admin wants control over how its traffic
routed, who routes through its net.
• Intra-AS: single admin, so no policy decisions needed
Scale:
• hierarchical routing saves table size, reduced update
traffic
Performance:
• Intra-AS: can focus on performance
• Inter-AS: policy may dominate over performance
Architecture of Dynamic Routing
OSPF
BGP
AS 1
IGP = Interior Gateway Protocol
Metric based: OSPF, IS-IS, RIP,
EIGRP (cisco)
EGP = Exterior Gateway Protocol
EIGRP
AS 2
Policy based: BGP
The Routing Domain of BGP is the entire Internet
Many Routing Processes Can Run on a Single Router
BGP
RIP Process
BGP Process
RIP Routing tables
BGP Routing tables
OSPF Process
OSPF Routing tables
RIP
Domain
OS kernel
OSPF
Domain
Forwarding Table Manager
Forwarding Table
38
AS categories
– Stub: an AS that has only a single connection
to one other AS - carries only local traffic.
– Multihomed: an AS that has connections to
more than one AS, but refuses to carry transit
traffic
– Transit: an AS that has connections to more
than one AS, and carries both transit and local
traffic (under certain policy restrictions)
Nontransit vs. Transit ASes
ISP 2
ISP 1
NET A
Traffic NEVER
flows from ISP 1
through NET A to ISP 2
(At least not intentionally!)
IP traffic
Internet Service
providers (often)
have transit
networks
Nontransit AS
might be a corporate
or campus network.
Could be a “content
provider”
40
Selective Transit
NET B
NET A DOES NOT
provide transit
Between NET D
and NET B
NET C
NET A
NET A provides transit
between NET B and NET C
and between NET D
and NET C
NET D
Most transit networks transit in a selective manner…
IP traffic
41
Choices for global routing
• Link state or distance vector?
– no universal metric - policy decisions
• Problems with distance-vector:
– Bellman-Ford algorithm may not converge
• Problems with link state:
– metric used by routers not the same - loops
– LS database too large - entire Internet
– may expose policies to other AS’s
Solution: Path Vectors
• Each routing update carries the entire path
• Loops are detected as follows:
– when AS gets route check if AS already in path
– if yes, reject route
– if no, add self and advertise route further
• Advantage:
– metrics are local - AS chooses path, protocol
ensures no loops
ASPATH Attribute
AS 1129
135.207.0.0/16
AS Path = 1755 1239 7018 6341
135.207.0.0/16
AS Path = 1239 7018 6341
AS 1239
Sprint
AS 1755
135.207.0.0/16
AS Path = 1129 1755 1239 7018 6341
Ebone
AS 12654
AS 6341
AT&T Research
RIPE NCC
RIS project
135.207.0.0/16
AS Path = 7018 6341
AS7018
135.207.0.0/16
AS Path = 6341
Global Access
135.207.0.0/16
AS Path = 3549 7018 6341
AT&T
135.207.0.0/16
AS Path = 7018 6341
AS 3549
Global Crossing
135.207.0.0/16
Prefix Originated
44
Interdomain Loop Prevention
AS 7018
BGP at AS YYY will
never accept a
route with ASPATH
containing YYY.
Don’t Accept!
12.22.0.0/16
ASPATH = 1 333 7018 877
AS 1
45
Problems
• Routing table size
– need an entry for all paths to all networks
• Required memory= O(N + M*A) * K)
–
–
–
–
N: number of networks
M: mean AS distance
A: number of AS’s
K: number of BGP peers
• Problem reduced with CIDR
Routing information bases
(RIB)
• Routes are stored in RIBs
• Adj-RIBs-In: routing info that has been
learned from other routers (unprocessed
routing info)
• Loc-RIB: local routing information selected
from Adj-RIBs-In (routes selected locally)
• Adj-RIBs-Out: info to be advertised to
peers (routes to be advertised)
Conceptual Mode of Operation
• RIB = Routing information base
Adj-RIB-In
Adj-RIB-Out
Loc-RIB
per BGP neighbor
per BGP neighbor
Routing table size
networks
(NLRI)
2,100
mean AS
distance
5
# of AS’s
4,000
memory
59
BGP
peers/net
3
10
100
6
108,000
10,000
15
300
10
490,000
100,000
20
3,000
20
1,040,000
27,000
Policy with BGP
• BGP provides capability for enforcing
various policies
• Policies are not part of BGP: they are
provided to BGP as configuration
information
• BGP enforces policies by choosing paths
from multiple alternatives and controlling
advertisement to other AS’s
So Many Choices
peer
provider
peer
customer
AS 4
Frank’s
Internet Barn
AS 3
AS 2
AS 1
Which route should
Frank pick to 13.13.0.0./16?
13.13.0.0/16
51
Implementing Backup Links with Local
Preference (Outbound Traffic)
AS 1
primary link
Set Local Pref = 100
for all routes from AS 1
backup link
AS 65000
Set Local Pref = 50
for all routes from AS 1
Forces outbound traffic to take primary link, unless link is down.
We’ll talk about inbound traffic soon …
52
Back to Frank …
peer
provider
peer
Local preference only used in iBGP
customer
AS 4
local pref = 80
local pref = 90
AS 3
local pref = 100
AS 2
Higher Local
preference values
are more preferred
AS 1
13.13.0.0/16
53
CIDR and BGP
AS X
197.8.2.0/24
AS T (provider)
197.8.0.0/23
AS Z
AS Y
197.8.3.0/24
What should T announce to Z?
Options
• Advertise all paths:
– Path 1: through T can reach 197.8.0.0/23
– Path 2: through T can reach 197.8.2.0/24
– Path 3: through T can reach 197.8.3.0/24
• But this does not reduce routing tables! We
would like to advertise:
– Path 1: through T can reach 197.8.0.0/22
Sets and Sequences
• Problem: what do we list in the route?
– list T: omitting information- not acceptable,
may lead to loops
– list T, X, Y: misleading, appears as 3-hop path
• Solution: restructure AS Path attribute as:
– Path: (Sequence (T), Set (X, Y))
– if Z wants to advertise path:
• Path: (Sequence (Z, T), Set (X, Y))
Routing Areas
• Address areas hierarchically
– sequentially number top-level areas
– sub-areas of area are labeled relative to that
area
– nodes are numbered relative to the smallest
containing area
• nodes can have multiple addresses
Routing
• Within area
– each node has routes to every other node
• Outside area
– each node has routes for other top-level areas
only
– inter-area packets are routed to nearest border
router
• Can result in sub-optimal paths
Path Suboptimality
1
1.1
2
2.1
2.2.1
1.2
3
3 hop red path
vs
2 hop green path
2.2
3.1
3.2
Shorter Doesn’t Always Mean Shorter
In fairness:
could you do
this “right” and
still scale?
Mr. BGP says that
path 4 1 is better
than path 3 2 1
Duh!
AS 4
AS 3
Exporting internal
state would
dramatically
increase global
instability and
amount of routing
state
AS 2
AS 1
BGP limitations
Delayed Internet Routing
Convergence
BGP Limitations: Policy
B
A
E
D
C
F
BGP Limitations: Oscillations
A
D
B
C
Daily Update Count
What is the Sound of One Route Flapping?
Implementation Does Matter!
stateless withdraws
widely deployed
stateful withdraws
widely deployed
Thanks to Abha Ahuja and Craig Labovitz for this plot.
A Few Bad Apples …
Most prefixes are
stable most of the time.
On this day, about 83% of the prefixes
were not updated.
Typically, 80% of
the updates are
for less than 5%
Of the prefixes.
Percent of BGP table prefixes
Thanks to Madanlal Musuvathi for this plot.
Data source: RIPE NCC
30 Second Bursts
How Long Does BGP Take to Adapt
to Changes?
100
Cumulative Percentage of Events
90
80
70
60
Tup
Tshort
50
Tlong
40
Tdow n
30
20
10
0
0
20
40
60
80
100
120
140
160
Seconds Until Convergence
Thanks to Abha Ahuja and Craig Labovitz for this plot.