Routing
Outline
Algorithms
Scalability
Overview
• Forwarding vs Routing
– forwarding: to select an output port based on
destination address and routing table
– routing: process by which routing table is built
• Network as a Graph
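[Figure: weighted graph with nodes A, B, C, D, E, F; the assignment of edge costs to links is not recoverable from this transcript]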
• Problem: Find lowest cost path between two nodes
• Factors
– static: topology
– dynamic: load
Distance Vector
• Each node maintains a set of triples
– (Destination, Cost, NextHop)
• Directly connected neighbors exchange updates
– periodically (on the order of several seconds)
– whenever table changes (called triggered update)
• Each update is a list of pairs:
– (Destination, Cost)
• Update the local table on receiving a “better” route, that is, when
– the advertised route has a smaller cost, or
– the update came from the current next hop for that destination
• Refresh existing routes; delete if they time out
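The update rule above can be sketched in a few lines of Python. This is a minimal illustration, not a full protocol implementation: the function name `merge_update`, the table layout (destination mapped to a `(cost, next_hop)` pair), and the sample costs are assumptions made for the example. The value 16 for infinity follows the loop-breaking heuristic listed later in these slides.

```python
INFINITY = 16  # "unreachable" cost, as in the later "Set infinity to 16" heuristic


def merge_update(table, neighbor, link_cost, update):
    """Merge a neighbor's (Destination, Cost) pairs into the local table.

    table maps destination -> (cost, next_hop). Returns True if the table
    changed, which is when a triggered update would be sent.
    """
    changed = False
    for dest, cost in update:
        new_cost = min(cost + link_cost, INFINITY)
        current = table.get(dest)
        # Case 1: no route yet, or the advertised route is cheaper.
        better = current is None or new_cost < current[0]
        # Case 2: the update came from our current next hop, so we must
        # accept its new cost even if it is worse.
        from_next_hop = current is not None and current[1] == neighbor
        if better or (from_next_hop and new_cost != current[0]):
            table[dest] = (new_cost, neighbor)
            changed = True
    return changed
```

For example, a node that reaches G in 3 hops via A and then hears from neighbor C (link cost 1) that C is 1 hop from G would switch to the 2-hop route via C. Route timeouts are left out of the sketch.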
Example
[Figure: example network with nodes A, B, C, D, E, F, G]
Routing table at node B:
Destination   Cost   NextHop
A             1      A
C             1      C
D             2      C
E             2      A
F             2      A
G             3      A
Routing Loops
• Example 1
– F detects that link to G has failed
– F sets distance to G to infinity and sends update to A
– A sets distance to G to infinity since it uses F to reach G
– A receives periodic update from C with 2-hop path to G
– A sets distance to G to 3 and sends update to F
– F decides it can reach G in 4 hops via A
• Example 2
– link from A to E fails
– A advertises distance of infinity to E
– B and C advertise a distance of 2 to E
– B decides it can reach E in 3 hops; advertises this to A
– A decides it can reach E in 4 hops; advertises this to C
– C decides that it can reach E in 5 hops…
Loop-Breaking Heuristics
• Set infinity to 16
• Split horizon
• Split horizon with poison reverse
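The slide only names the two split-horizon variants. As commonly defined, split horizon means a node does not advertise a route back to the neighbor it learned that route from, and poison reverse means it does advertise the route back, but with infinite cost. A minimal sketch under those definitions follows; the function name `build_update` and the table layout are assumptions made for the example.

```python
INFINITY = 16  # the "infinity" value from the first heuristic above


def build_update(table, neighbor, poison_reverse=False):
    """Return the (Destination, Cost) pairs to advertise to `neighbor`.

    table maps destination -> (cost, next_hop).
    """
    update = []
    for dest, (cost, next_hop) in sorted(table.items()):
        if next_hop == neighbor:
            # This route was learned from `neighbor`.
            if poison_reverse:
                # Poison reverse: advertise it back as unreachable.
                update.append((dest, INFINITY))
            # Plain split horizon: say nothing about it.
            continue
        update.append((dest, min(cost, INFINITY)))
    return update
```

With a table of `{"E": (2, "A"), "C": (1, "C")}`, the update sent to A omits E under split horizon and lists E with cost 16 under poison reverse. Both variants only break loops that involve two nodes.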
Link State
• Strategy
– send to all nodes (not just neighbors)
information about directly connected links (not
entire routing table)
• Link State Packet (LSP)
– id of the node that created the LSP
– cost of link to each directly connected neighbor
– sequence number (SEQNO)
– time-to-live (TTL) for this packet
Link State (cont)
• Reliable flooding
– store most recent LSP from each node
– forward LSP to all neighbors except the one that sent it
– generate new LSP periodically
• increment SEQNO
– start SEQNO at 0 on reboot
– decrement TTL of each stored LSP
• discard when TTL=0
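The flooding rules above can be sketched as follows. This is an illustration under simplifying assumptions: the names `LSP`, `receive_lsp`, and `age_store` are hypothetical, acknowledgments and retransmission (the "reliable" part) are omitted, and sequence-number wraparound is ignored.

```python
from dataclasses import dataclass


@dataclass
class LSP:
    node_id: str   # id of the node that created the LSP
    seqno: int     # sequence number (SEQNO)
    ttl: int       # time-to-live for this packet
    links: dict    # neighbor id -> link cost


def receive_lsp(store, lsp, from_neighbor, neighbors):
    """Store the LSP if it is newer; return the neighbors to forward it to."""
    stored = store.get(lsp.node_id)
    if stored is not None and lsp.seqno <= stored.seqno:
        return []  # old or duplicate copy: discard, do not forward
    store[lsp.node_id] = lsp
    # Forward to every neighbor except the one that sent it.
    return [n for n in neighbors if n != from_neighbor]


def age_store(store):
    """Decrement the TTL of each stored LSP and discard those that reach 0."""
    for node_id in list(store):
        store[node_id].ttl -= 1
        if store[node_id].ttl <= 0:
            del store[node_id]
```

Comparing sequence numbers is what stops the flood: once every node has stored a given LSP, further copies are dropped rather than forwarded.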
Route Calculation
• Dijkstra’s shortest path algorithm
• Let
– N denotes set of nodes in the graph
– l(i, j) denotes non-negative cost (weight) for edge (i, j)
– s denotes this node
– M denotes the set of nodes incorporated so far
– C(n) denotes cost of the path from s to node n

M = {s}
for each n in N - {s}
    C(n) = l(s, n)
while (N != M)
    M = M union {w} such that C(w) is the minimum for all w in (N - M)
    for each n in (N - M)
        C(n) = MIN(C(n), C(w) + l(w, n))
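A direct Python transcription of the pseudocode above is sketched below. It keeps the slide's symbols (N, l, s, M, C) and its simple set-based form rather than using a priority queue. The function names and the four-node test graph are hypothetical, chosen only for illustration; a missing edge is treated as infinite cost.

```python
import math


def shortest_path_costs(N, l, s):
    """Return C, the cost of the cheapest path from s to every node in N.

    l maps a directed edge (i, j) to its non-negative cost.
    """
    def cost(i, j):
        return l.get((i, j), math.inf)

    M = {s}
    C = {n: cost(s, n) for n in N if n != s}
    C[s] = 0
    while M != set(N):
        # Pick the node outside M with the smallest known cost.
        w = min((n for n in N if n not in M), key=lambda n: C[n])
        M.add(w)
        # Relax the cost of every remaining node via w.
        for n in N:
            if n not in M:
                C[n] = min(C[n], C[w] + cost(w, n))
    return C


def undirected(edges):
    """Expand {(i, j): cost} into a table holding both directions."""
    l = {}
    for (i, j), c in edges.items():
        l[(i, j)] = c
        l[(j, i)] = c
    return l
```

A real router would also record the next hop for each destination while relaxing costs; that bookkeeping is left out to stay close to the slide.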
Metrics
• Original ARPANET metric
– measured the number of packets queued on each link
– took neither latency nor bandwidth into consideration
• New ARPANET metric
– stamp each incoming packet with its arrival time (AT)
– record departure time (DT)
– when link-level ACK arrives, compute
Delay = (DT - AT) + Transmit + Latency
– if timeout, reset DT to departure time for retransmission
– link cost = average delay over some time period
• Fine Tuning
– compressed dynamic range
– replaced Delay with link utilization
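The delay formula of the new ARPANET metric is simple enough to write out. The sketch below assumes all quantities are in the same time unit; the function names and sample values are hypothetical, and the later fine-tuning steps (compressed range, utilization) are not modeled.

```python
def packet_delay(arrival_time, departure_time, transmit, latency):
    """Delay = (DT - AT) + Transmit + Latency for one acknowledged packet."""
    return (departure_time - arrival_time) + transmit + latency


def link_cost(delays):
    """Link cost as the average delay over the packets seen in one period."""
    return sum(delays) / len(delays)
```

Here `(departure_time - arrival_time)` is the time the packet spent queued at the node, so the metric reflects load as well as the link's fixed transmit time and latency.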
How to Make Routing Scale
• Flat versus Hierarchical Addresses
• Inefficient use of Hierarchical Address Space
– class C with 2 hosts (2/255 = 0.78% efficient)
– class B with 256 hosts (256/65535 = 0.39% efficient)
• Still Too Many Networks
– routing tables do not scale
– route propagation protocols do not scale
Internet Structure
Recent Past
[Figure: NSFNET backbone connecting the BARRNET, MidNet, and Westnet regional networks, which in turn connect sites including Stanford, ISU, Berkeley, PARC, UNM, NCAR, UNL, KU, and UA]
Internet Structure
Today
[Figure: backbone service providers interconnected at peering points, with “consumer” ISPs, large corporations, and small corporations attached to them]
Subnetting
• Add another level to address/routing hierarchy: subnet
• Subnet masks define variable partition of host part
• Subnets visible only within site
Class B address:              | Network number | Host number |
Subnet mask (255.255.255.0):  | 111111111111111111111111 | 00000000 |
Subnetted address:            | Network number | Subnet ID | Host ID |
Subnet Example
[Figure: three subnets joined by routers R1 and R2]
– Subnet number 128.96.34.0, subnet mask 255.255.255.128; addresses shown: 128.96.34.15, 128.96.34.1 (labels H1, R1)
– Subnet number 128.96.34.128, subnet mask 255.255.255.128; addresses shown: 128.96.34.130, 128.96.34.139, 128.96.34.129 (labels H2, R2)
– Subnet number 128.96.33.0, subnet mask 255.255.255.0; addresses shown: 128.96.33.14, 128.96.33.1 (label H3)

Forwarding table at router R1
Subnet Number    Subnet Mask        Next Hop
128.96.34.0      255.255.255.128    interface 0
128.96.34.128    255.255.255.128    interface 1
128.96.33.0      255.255.255.0      R2
Forwarding Algorithm
D = destination IP address
for each entry (SubnetNum, SubnetMask, NextHop)
    D1 = SubnetMask & D
    if D1 = SubnetNum
        if NextHop is an interface
            deliver datagram directly to D
        else
            deliver datagram to NextHop
• Use a default router if nothing matches
• Not necessary for all 1s in subnet mask to be contiguous
• Can put multiple subnets on one physical network
• Subnets not visible from the rest of the Internet
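The forwarding algorithm above can be run against the forwarding table of router R1 from the subnet example. The sketch below uses Python's standard `ipaddress` module to turn dotted-quad strings into integers; the function names, the tuple return values, and the convention that a next hop beginning with "interface" means direct delivery are assumptions made for the example.

```python
import ipaddress


def to_int(dotted):
    """Convert a dotted-quad IPv4 string to a 32-bit integer."""
    return int(ipaddress.IPv4Address(dotted))


# (SubnetNum, SubnetMask, NextHop) entries of router R1 in the subnet example.
R1_TABLE = [
    ("128.96.34.0", "255.255.255.128", "interface 0"),
    ("128.96.34.128", "255.255.255.128", "interface 1"),
    ("128.96.33.0", "255.255.255.0", "R2"),
]


def forward(dest, table, default_router=None):
    """Return ("direct", interface) or ("via", router) for destination D."""
    d = to_int(dest)
    for subnet_num, subnet_mask, next_hop in table:
        d1 = to_int(subnet_mask) & d
        if d1 == to_int(subnet_num):
            if next_hop.startswith("interface"):
                return ("direct", next_hop)   # deliver directly to D
            return ("via", next_hop)          # hand off to the next-hop router
    return ("via", default_router)            # nothing matched
```

For instance, `forward("128.96.34.139", R1_TABLE)` masks the address with 255.255.255.128, obtains 128.96.34.128, and so selects interface 1.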
Supernetting
• Assign block of contiguous network numbers to
nearby networks
• Called CIDR: Classless Inter-Domain Routing
• Represent blocks with a single pair
(first_network_address, count)
• Restrict block sizes to powers of 2
• Use a bit mask (CIDR mask) to identify block size
• All routers must understand CIDR addressing
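To make the (first_network_address, count) idea concrete, the sketch below aggregates a run of contiguous networks with the standard `ipaddress` module. The function name `aggregate` and the 192.4.16.0/24 starting network are hypothetical values chosen for illustration.

```python
import ipaddress


def aggregate(first_network, count):
    """Collapse `count` contiguous networks, starting at `first_network`,
    into the smallest list of CIDR blocks that covers exactly them."""
    first = ipaddress.ip_network(first_network)
    size = first.num_addresses
    base = int(first.network_address)
    nets = [
        ipaddress.IPv4Network((base + i * size, first.prefixlen))
        for i in range(count)
    ]
    return list(ipaddress.collapse_addresses(nets))
```

Sixteen contiguous /24 networks starting at 192.4.16.0 collapse into the single block 192.4.16.0/20, whereas three of them need two blocks (a /23 and a /24). That is why block sizes are restricted to powers of 2.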
IP Router
• Forwarding Equivalence Classes (FEC)
– e.g., 172.200.0.0/16
• Forwarding table: FEC → <next_hop, port>
– match address to FEC with longest prefix
– forward to “smarter” router by default
• Core routers have ~100,000 FECs
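Longest-prefix matching can be sketched with a linear scan over the table. Real routers use tries or hardware lookup, so this only illustrates the matching rule. The function name, the table layout, and the 172.200.8.0/24 entry are assumptions; 172.200.0.0/16 is the FEC given on the slide.

```python
import ipaddress


def longest_prefix_match(dest, table, default=None):
    """Return the <next_hop, port> of the most specific FEC containing dest.

    table maps a prefix string such as "172.200.0.0/16" to (next_hop, port).
    """
    addr = ipaddress.ip_address(dest)
    best = None
    for prefix, hop in table.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, hop)
    # Fall back to the "smarter" default router when nothing matches.
    return best[1] if best else default
```

With entries for both 172.200.0.0/16 and 172.200.8.0/24, the address 172.200.8.9 matches both prefixes and is forwarded according to the longer /24 entry.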
Route Propagation
• Know a smarter router
– hosts know local router
– local routers know site routers
– site routers know core router
– core routers know everything
• Autonomous System (AS)
– corresponds to an administrative domain
– examples: University, company, backbone network
– assign each AS a 16-bit number
• Two-level route propagation hierarchy
– interior gateway protocol (each AS selects its own)
– exterior gateway protocol (Internet-wide standard)
Popular Interior Gateway Protocols
• RIP: Routing Information Protocol
– developed for XNS
– distributed with Unix
– distance-vector algorithm
– based on hop-count
• OSPF: Open Shortest Path First
– recent Internet standard
– uses link-state algorithm
– supports load balancing
– supports authentication
EGP: Exterior Gateway Protocol
• Overview
– designed for tree-structured Internet
– concerned with reachability, not optimal routes
• Protocol messages
– neighbor acquisition: one router requests that another
be its peer; peers exchange reachability information
– neighbor reachability: one router periodically tests if
the other is still reachable; exchange HELLO/ACK
messages; uses a k-out-of-n rule
– routing updates: peers periodically exchange their
routing tables (distance-vector)
BGP-4: Border Gateway Protocol
• AS Types
– stub AS: has a single connection to one other AS
• carries local traffic only
– multihomed AS: has connections to more than one AS
• refuses to carry transit traffic
– transit AS: has connections to more than one AS
• carries both transit and local traffic
• Each AS has:
– one or more border routers
– one BGP speaker that advertises:
• local networks
• other reachable networks (transit AS only)
• gives path information
BGP Example
• Speaker for AS2 advertises reachability to P and Q
– networks 128.96, 192.4.153, 192.4.32, and 192.4.3 can be reached
directly from AS2
[Figure: backbone network (AS 1) connected to regional provider A (AS 2) and regional provider B (AS 3), which serve the customer networks below]
Customer      AS      Networks
Customer P    AS 4    128.96, 192.4.153
Customer Q    AS 5    192.4.32, 192.4.3
Customer R    AS 6    192.12.69
Customer S    AS 7    192.4.54, 192.4.23
• Speaker for backbone advertises
– networks 128.96, 192.4.153, 192.4.32, and 192.4.3 can be reached
along the path (AS1, AS2).
• Speaker can cancel previously advertised paths
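The way a path such as (AS1, AS2) is built up can be sketched as follows: each AS that passes an advertisement on prepends its own AS number. The sketch also discards any advertisement whose path already contains the local AS, which is the standard BGP loop-avoidance rule and is not stated on the slide. The function name and the dictionary layout are assumptions made for the example.

```python
def readvertise(my_as, advert):
    """Return the advertisement `my_as` would pass on, or None if the path
    already contains `my_as` (accepting it would create a loop)."""
    if my_as in advert["path"]:
        return None
    return {
        "networks": advert["networks"],
        "path": (my_as,) + advert["path"],  # prepend our own AS
    }
```

Starting from AS2's advertisement with path `("AS2",)`, the backbone produces the path `("AS1", "AS2")`; if that advertisement ever came back to AS2, AS2 would drop it.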
IP Version 6
• Features
– 128-bit addresses (classless)
– multicast
– real-time service
– authentication and security
– autoconfiguration
– end-to-end fragmentation
– protocol extensions
• Header
– 40-byte “base” header
– extension headers (fixed order, mostly fixed length)
• fragmentation
• source routing
• authentication and security
• other options
4.4 Multicast
Outline
4.4.1 Multicast Addresses
4.4.2 Multicast Routing (DVMRP, PIM, MSDP)
Routing protocol
Unicast
• Intra domain
– OSPF (Open Shortest Path First)
– IS-IS
– RIP
– EIGRP
– …
• Inter domain
– BGP v4 (Border Gateway Protocol)
– EGP (Exterior Gateway Protocol)
• Autonomous System (AS)
– Group of networks, single administrative authority
• Policy and connectivity
Routing protocol
Multicast
• Intra domain
– MOSPF
• Extension to OSPF
– DVMRP
• Distance Vector Multicast Routing Protocol
• The mrouted implementation (Flood & Prune)
• Inter domain
– MBGP + MSDP
• Currently used
– BGMP + MASC
– PIM
• Protocol Independent Multicast
– Routing protocol independent
• Sparse mode
• Dense mode
Addressing
• Each multicast group in the Internet has its own Class D address
– looks like a host address, but isn’t
– Class D addresses in the IP address space are used as multicast destination addresses
– 224.0.0.0 to 239.255.255.255; 28 bits can be used, so over 250 million groups are possible
– A multicast address can appear only as a destination address, never as a source address
– A packet sent to a multicast address reaches all hosts that currently belong to that group
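The Class D range and the group count can be checked with a few lines of Python. A Class D address is one whose top four bits are 1110, which leaves 28 bits for the group. The helper name `is_class_d` is hypothetical; `is_multicast` is an attribute of the standard `ipaddress` module.

```python
import ipaddress


def is_class_d(addr):
    """True if the top four bits of the IPv4 address are 1110."""
    return (int(ipaddress.IPv4Address(addr)) >> 28) == 0b1110


# 28 free bits give the number of possible groups.
GROUP_COUNT = 2 ** 28  # 268,435,456, i.e. "over 250 million"
```

The standard library agrees with the bit test: `ipaddress.IPv4Address("224.0.0.1").is_multicast` is `True`.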
Unicast one-to-one
Multicast one-to-many
Multicast routing
• Broadcast and prune (DVMRP, PIM-DM)
– Reverse shortest path tree
– Routers do reverse path forwarding (RPF) check
• Explicit join (CBT, PIM-SM)
– Receivers send join to rendezvous point (RP)
– Senders send multicast data to RP, up the tree
– RP fans out multicast data (it is a meeting point)
– Optimizations in PIM-SM to short-cut the RP
– Shared tree versus source specific tree
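The reverse path forwarding (RPF) check mentioned under broadcast and prune is usually defined as follows: a router accepts a multicast packet only if it arrived on the interface the router itself would use to send unicast traffic back to the source. A minimal sketch under that definition is shown below. The function name, the interface names, and the routing-table layout are assumptions, and pruning is not modeled.

```python
def rpf_forward(unicast_iface, source, arrival_iface, ifaces):
    """Return the interfaces to flood a multicast packet onto.

    unicast_iface maps a source to the interface this router uses to
    reach it by unicast. A packet that fails the RPF check is dropped.
    """
    if unicast_iface.get(source) != arrival_iface:
        return []  # not on the reverse shortest path: drop
    # Passed the check: forward on every interface except the incoming one.
    return [i for i in ifaces if i != arrival_iface]
```

Because each router accepts a given source's packets on exactly one interface, flooded copies cannot circulate forever, and the accepted paths form the reverse shortest path tree rooted at the source.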
DVMRP
• DVMRP (Distance Vector Multicast Routing Protocol)
– Very similar to RIP
• distance vector
• hop count metric
– Uses reverse-path forwarding
– Used in conjunction with flood-and-prune (to determine memberships)
– Prunes store per-source and per-group information
– Each router stores prune information for reverse-path multicasting, i.e., selective forwarding (per source, per group, for each interface)
– Explicit join messages (unlike pure flood-and-prune) reduce join latency, but they carry no source information, so flooding is still needed
Internet Multicast Protocol
• Multicast version of OSPF
– In link-state routing, each router monitors its directly connected links and broadcasts to all other routers whenever a change in link state occurs
– The extension required to support multicasting is as follows:
• the link state also lists all multicast groups for which the link has members
• with this information, each router can compute the shortest-path multicast tree for each source of each group
– Since each router has to store this tree for each source of each group, the overhead is high, so the approach does not scale
MOSPF
• MOSPF (Multicast OSPF)
– Multicast extension to OSPF
– Routers flood group membership information with LSPs (LSP extended)
– Each router independently computes a shortest-path tree that includes only multicast-capable routers
• no need to flood and prune
• Group join and leave information is updated in all routers through Link State Updates
– Complex
• needs storage per group per link
• needs to compute a shortest-path tree per source and group
– Since each router has to store this tree for each source of each group, the overhead is high, so the approach does not scale
Core based tree Multicasting
• DVMRP and MOSPF build source-based multicast trees
– Each source uses a different source-specific shortest-path tree for data forwarding
– Cost of group formation with these schemes: join/prune information is stored per source, per group, per interface in each router
– Both suffer from scaling problems. Building trees installs state in the routers, and neither scheme scales well when only a relatively small proportion of routers (sparse mode) wants to receive packets for a particular group. CBT and PIM (see next slides) are aimed primarily at the sparse-mode situation.
Core based tree Multicasting
• Key idea with core-based tree
– coordinate multicast with a core router
– host sends a join request to core router
– routers along path mark incoming interface for forwarding
PIM
• PIM Dense mode
– Flood & prune
• PIM Sparse mode
– Shared tree (Core Based Tree, CBT)
– Switches to SPT
– Root called Rendezvous Point (RP)
• PIM SSM
– Source specific multicast
– IGMP v3
• PIM Bidir
– implements shared sparse trees with bidirectional flow of data
Protocol independent multicast – sparse mode (PIM-SM)
• Underlying unicast routing protocol is used
• Receivers must explicitly join groups (no flooding)
• Everyone meets at a rendezvous point (RP)
– RP is the core of a uni-directional tree
– First-hop routers encapsulate multicast to RP
– RP can join source to the tree to avoid encapsulation
• State and reliability issues
PIM-SM illustrated
Multi-protocol BGP (MBGP)
• BGP extension to carry other routes (e.g., multicast)
• Provides for route aggregation and policy
• Used between ASes
• Carries information about the sources of multicast
MBGP
• MBGP, Multiprotocol Extensions for BGP v4 (BGP4+), RFC 2283
• Extended BGP peering
• Allows different unicast and multicast paths
MSDP
• MSDP, Multicast Source Discovery Protocol
– Interconnects RPs and exchanges information about active sources
– Peers over TCP and sends Source Active messages
– Gives PIM the information it needs to join the source at the exchange point
– One entry per active source!
BGMP
• Border Gateway Multicast Protocol
– Each group has a predefined root (or use MASC)
– BGMP builds a bi-directional, shared tree of domains
– Domains can run any multicast IGP internally
– Still under development
http://www.ietf.org/html.charters/bgmp-charter.html
4.5 Multiprotocol Label Switching
Outline
4.5.1 Destination-Based Forwarding
4.5.2 Explicit Routing
4.5.3 Virtual Private Networks and Tunnels
Bell Existing Network
New Bell MPLS Network
MPLS Network Model
[Figure: an MPLS domain of LSRs with LERs at its edge, connecting IP networks and the Internet]
LSR = Label Switched Router
LER = Label Edge Router
Basic Idea
• MPLS is a hybrid model adopted by the IETF to incorporate the best properties of both packet routing and circuit switching
              Control               Forwarding
IP Router     IP Router Software    Longest-match Lookup
MPLS          IP Router Software    Label Swapping
ATM Switch    ATM Forum Software    Label Swapping
Basic Idea (Cont.)
• Packets are switched, not routed, based on labels
• Labels are carried in the packet header
• Basic operation:
– Ingress LER (Label Edge Router) pushes a label in front of the IP header
– LSR (Label Switch Router) does label swapping
– Egress LER removes the label
• The key: establish the forwarding table
– Link state routing protocols
• Exchange network topology information for path selection
• OSPF-TE, IS-IS-TE
– Signaling/label distribution protocols
• Set up LSPs (Label Switched Paths)
• LDP, RSVP-TE, CR-LDP
MPLS Operation
1a. Routing protocols (e.g. OSPF-TE, IS-IS-TE) exchange reachability to destination networks
1b. Label Distribution Protocol (LDP) establishes label mappings to destination network
2. Ingress LER receives packet and “label”s packets
3. LSR forwards packets using label swapping
4. LER at egress removes label and delivers packet
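The data-plane steps of MPLS operation (label at the ingress LER, swap at each LSR, remove at the egress LER) can be sketched with dictionaries standing in for packets and tables. All names, label values, and port names below are hypothetical. The label mappings are assumed to have been installed already by a label distribution protocol, and label stacking is not modeled.

```python
def ingress(packet, fec_to_label):
    """Ingress LER: classify the packet and push a label in front of it."""
    return {"label": fec_to_label[packet["dest"]], "payload": packet}


def swap(labeled, label_table):
    """LSR: look up the incoming label only, swap it, pick the output port.

    label_table maps in_label -> (out_label, out_port).
    """
    out_label, out_port = label_table[labeled["label"]]
    return out_port, {"label": out_label, "payload": labeled["payload"]}


def egress(labeled):
    """Egress LER: remove the label and deliver the original packet."""
    return labeled["payload"]
```

Note that `swap` never inspects the IP header: inside the MPLS domain, forwarding is an exact-match lookup on the label rather than a longest-prefix match on the destination address.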
Main features
• Label swapping:
– Bring the speed of layer 2 switching to layer 3
• Separation of forwarding plane and control plane
• Forwarding hierarchy via Label stacking
– Increase the scalability
• Constraint-based routing
– Traffic Engineering
– Fast reroute
• Facilitate the virtual private networks (VPNs)
• Provide class of service
– Provides an opportunity for mapping DiffServ fields onto an MPLS label
• Facilitate the elimination of multiple layers