ECE544Lec8-5DR07

Download Report

Transcript ECE544Lec8-5DR07

ECE544: Communication
Networks-II, Spring 2007
Sanjoy Paul
Lecture 8
Includes teaching materials from L. Peterson, J. Kurose, K. Almeroth
Today’s Lecture
• Recap
– Addressing
•
•
•
•
IP Address
Sub-netting
Super-netting (CIDR)
Route Aggregation Examples
– Routing Protocols
• Introduction
• Intra-domain (RIP, OSPF)
• Inter-domain (BGP)
• Multicast
– Introduction
– Internet Group Management Protocol (IGMP)
– Routing Protocols
• Intra-domain (DVMRP, MOSPF, PIM)
• Inter-domain (MSDP, SSM)
– Reliable Multicast
IP Address
“class-full” addressing:
class
A
0 network
B
10
C
110
D
1110
1.0.0.0 to
127.255.255.255
host
network
128.0.0.0 to
191.255.255.255
host
network
multicast address
32 bits
host
192.0.0.0 to
223.255.255.255
224.0.0.0 to
239.255.255.255
How to Make Routing Scale
• Flat versus Hierarchical Addresses
• Inefficient use of Hierarchical Address Space
– class C with 2 hosts (2/255 = 0.78% efficient)
– class B with 256 hosts (256/65535 = 0.39%
efficient)
• Still Too Many Networks
– routing tables do not scale
– route propagation protocols do not scale
Today’s Lecture
• Recap
– Addressing
•
•
•
•
IP Address
Sub-netting
Super-netting (CIDR)
Route Aggregation Examples
– Routing Protocols
• Introduction
• Intra-domain (RIP, OSPF)
• Inter-domain (BGP)
• Multicast
– Introduction
– Internet Group Management Protocol (IGMP)
– Routing Protocols
• Intra-domain (DVMRP, MOSPF, PIM)
• Inter-domain (MSDP, SSM)
– Reliable Multicast
Sub-netting
• Add another level to address/routing hierarchy: subnet
• Subnet masks define variable partition of host part
• Subnets visible only within site
Network number
Host number
Class B address
111111111111111111111111
00000000
Subnet mask (255.255.255.0)
Network number
Subnet ID
Subnetted address
Host ID
Subnet Example
Subnet mask: 255.255.255.128
Subnet number: 128.96.34.0
128.96.34.15
128.96.34.1
H1
R1
Subnet mask: 255.255.255.128
Subnet number: 128.96.34.128
128.96.34.130
128.96.34.139
128.96.34.129
H2
R2
H3
128.96.33.14
128.96.33.1
Subnet mask: 255.255.255.0
Subnet number: 128.96.33.0
Forwarding table at router R1
Subnet Number
128.96.34.0
128.96.34.128
128.96.33.0
Subnet Mask
255.255.255.128
255.255.255.128
255.255.255.0
Next Hop
interface 0
interface 1
R2
Today’s Lecture
• Recap
– Addressing
•
•
•
•
IP Address
Sub-netting
Super-netting (CIDR)
Route Aggregation Examples
– Routing Protocols
• Introduction
• Intra-domain (RIP, OSPF)
• Inter-domain (BGP)
• Multicast
– Introduction
– Internet Group Management Protocol (IGMP)
– Routing Protocols
• Intra-domain (DVMRP, MOSPF, PIM)
• Inter-domain (MSDP, SSM)
– Reliable Multicast
Super-netting (CIDR)
• Class addressing doesn’t match real
needs:
– Class C is 255 addresses, too small
– Clsss B is 64K addresses, too big
• Need method of allocating addresses in
multiple sizes
• Assign block of contiguous network
numbers to nearby networks
• Called CIDR: Classless Inter-Domain
Routing
Classless Inter Domain Routing (CIDR)
Net ID
Host ID
Class B:
Class C:
Net ID
Host ID
 Problem: Class B addresses are running out
 Solution: Allocate multiple Class C addresses
 Problem: Random allocation of Class C addresses need
multiple routing table entries
 Solution: Allocate “contiguous” Class C addresses
 Routing entry:
[IP Address of Network and Net Mask]
IP Address:
195.201.3.5 = 11000011 11001001 00000011 00000101
Net Mask:
254.0.0.0 = 11111110 00000000 00000000 00000000
----------------------------------------------------------------------------------------Network IP:
194.0.0.0 = 11000010 00000000 00000000 00000000
CIDR (continued)
 How many Class C addresses ?
Organization’s Requirements
Assignment
Fewer than
256 addresses
1 Class C network
Fewer than
512 addresses
2 Contiguous Class C networks
Fewer than 1024 addresses
4 Contiguous Class C networks
Fewer than 2048 addresses
8 Contiguous Class C networks
Fewer than 4096 addresses
16 Contiguous Class C networks
Fewer than 8192 addresses
32 Contiguous Class C networks
Fewer than 16384 addresses
64 Contiguous Class C networks
 Contiguous Class C network addresses allow a “single”
entry in the routing table for all the above organizations
Coordinated Address Allocation
 Address aggregation using Geographic scope
Multi-regional
192.0.0.0 -- 193.255.255.255
Europe
194.0.0.0 -- 195.255.255.255
Others
196.0.0.0 -- 197.255.255.255
North America
198.0.0.0 -- 199.255.255.255
Central/South America
200.0.0.0 -- 201.255.255.255
Pacific Rim
202.0.0.0 -- 203.255.255.255
Others
204.0.0.0 -- 205.255.255.255
Others
206.0.0.0 -- 207.255.255.255
European networks will have a single entry in routing tables of routers in
other continents: [Network IP =194.0.0.0; mask = 254.0.0.0]
194.0.0.0
= 10110010 00000000 00000000 00000000
195.255.255.255 = 10110011 11111111 11111111 11111111
Same 7 high-order bits implies
Mask = 11111110 00000000 00000000 00000000 = 254.0.0.0
Coordinated Address Allocation
 Networks within Europe can be allocated an address block within the
European “slab”
 Europe’s address range is [194.0.0.0-195.255.255.255]
 An European corporation may be allocated
[194.0.16.0 – 194.0.31.255]
 This creates 2 entries in the routing table:
[Network IP = 194.0.16.0; Mask = 255.255.240.0]
[Network IP = 194.0.0.0; Mask = 254.0.0.0]
[Network IP =194.0.16.0; mask = 255.255.240.0]
194.0.16.0
= 10110010 00000000 00010000 00000000
194.0.31.255
= 10110011 00000000 00011111 11111111
Same 20 high-order bits implies
Mask = 11111111 11111111 11110000 00000000 = 255.255.240.0
[Network IP =194.0.0.0; mask = 254.0.0.0]
194.0.0.0
= 10110010 00000000 00000000 00000000
195.255.255.255 = 10110011 11111111 11111111 11111111
Same 7 high-order bits implies
Mask = 11111110 00000000 00000000 00000000 = 254.0.0.0
• Recap
Today’s Lecture
– Addressing
•
•
•
•
IP Address
Sub-netting
Super-netting (CIDR)
Route Aggregation Examples
– Routing Protocols
• Introduction
• Intra-domain (RIP, OSPF)
• Inter-domain (BGP)
• Multicast
– Introduction
– Internet Group Management Protocol (IGMP)
– Routing Protocols
• Intra-domain (DVMRP, MOSPF, PIM)
• Inter-domain (MSDP, SSM)
– Reliable Multicast
Route Aggregation Examples
Q: How does network get network part of
IP addr?
A: gets allocated portion of its provider
ISP’s address space
ISP's block
11001000 00010111 00010000 00000000
200.23.16.0/20
Organization 0
Organization 1
Organization 2
...
11001000 00010111 00010000 00000000
11001000 00010111 00010010 00000000
11001000 00010111 00010100 00000000
…..
….
200.23.16.0/23
200.23.18.0/23
200.23.20.0/23
….
Organization 7
11001000 00010111 00011110 00000000
200.23.30.0/23
Hierarchical addressing: route
aggregation
Hierarchical addressing allows efficient advertisement of routing
information:
Organization 0
200.23.16.0/23
Organization 1
200.23.18.0/23
Organization 2
200.23.20.0/23
Organization 7
.
.
.
.
.
.
Fly-By-Night-ISP
“Send me anything
with addresses
beginning
200.23.16.0/20”
Internet
200.23.30.0/23
ISPs-R-Us
“Send me anything
with addresses
beginning
199.31.0.0/16”
Hierarchical addressing: more
specific routes
ISPs-R-Us has a more specific route to Organization 1
Organization 0
200.23.16.0/23
Organization 2
200.23.20.0/23
Organization 7
.
.
.
.
.
.
Fly-By-Night-ISP
“Send me anything
with addresses
beginning
200.23.16.0/20”
Internet
200.23.30.0/23
ISPs-R-Us
Organization 1
200.23.18.0/23
“Send me anything
with addresses
beginning 199.31.0.0/16
or 200.23.18.0/23”
Route Aggregation with
CIDR
Corporation X
(11000000000001000001)
Border gateway
(advertises path to
11000000000001)
Regional network
Corporation Y
(11000000000001000000)
Chapter 4, Figure 26
Address Matching in CIDR
• Routing table uses “longest prefix match”
–
–
–
–
171.69 (16 bit prefix) = routing table entry #1
171.69.10 (24 bit prefix) = routing table entry #2
then DA=171.69.10.5 matches routing table entry #2
and DA = 171.69.20.3 matches routing table entry #1
CIDR (Summary)
• Continuous block of 2N addresses
• [Base address, Mask]
• Lookup algorithm:
– Masks destination address against mask in routing
table entry
– Match means route is found
– May be multiple matchings!
– Longest mask breaks “ties” (longest prefix match)
IP addressing (Summary)
• Classful addressing:
– inefficient use of address space, address space exhaustion
• e.g., class B net allocated enough addresses for 65K hosts, even if
only 2K hosts in that network
• CIDR: Classless InterDomain Routing
– network portion of address of arbitrary length
– address format: a.b.c.d/x, where x is # bits in network
portion of address
network
part
host
part
11001000 00010111 00010000 00000000
200.23.16.0/23
• Recap
Today’s Lecture
– Addressing
•
•
•
•
IP Address
Sub-netting
Super-netting (CIDR)
Route Aggregation Examples
– Routing Protocols
• Introduction
• Intra-domain (RIP, OSPF)
• Inter-domain (BGP)
• Multicast
– Introduction
– Internet Group Management Protocol (IGMP)
– Routing Protocols
• Intra-domain (DVMRP, MOSPF, PIM)
• Inter-domain (MSDP, SSM)
– Reliable Multicast
Introduction: Routing Problem
• Network as a Graph
A
6
1
3
4
C
2
1
B
9
E
F
1
D
Problem: Find lowest cost path between
two nodes
• Factors
– static: topology
– dynamic: load
Two main approaches
• DV: Distance-vector protocols
• LS: Link state protocols
• Variations of above methods applied to:
– Intra-domain routing (small/med networks)
• RIP, OSPF
– Inter-domain routing (large/global networks)
• BGP-4
Routing in the Internet
• The Global Internet consists of Autonomous Systems
(AS) interconnected with each other:
– Stub AS: small corporation: one connection to other AS’s
– Multihomed AS: large corporation (no transit): multiple
connections to other AS’s
– Transit AS: provider, hooking many AS’s together
• Two-level routing:
– Intra-AS: administrator responsible for choice of routing
algorithm within network
– Inter-AS: unique standard for inter-AS routing: BGP
Internet AS Hierarchy
Inter-AS border (exterior gateway) routers
Intra-AS (interior gateway) routers
Intra-AS Routing
• Also known as Interior Gateway Protocols (IGP)
• Most common Intra-AS routing protocols:
– RIP: Routing Information Protocol
– OSPF: Open Shortest Path First
– IGRP: Interior Gateway Routing Protocol
(Cisco proprietary)
Today’s Lecture
• Recap
– Addressing
•
•
•
•
IP Address
Sub-netting
Super-netting (CIDR)
Route Aggregation Examples
– Routing Protocols
• Introduction
• Intra-domain (RIP, OSPF)
• Inter-domain (BGP)
• Multicast
– Introduction
– Internet Group Management Protocol (IGMP)
– Routing Protocols
• Intra-domain (DVMRP, MOSPF, PIM)
• Inter-domain (MSDP, SSM)
– Reliable Multicast
RIP ( Routing Information Protocol)
• Distance vector algorithm
• Included in BSD-UNIX Distribution in 1982
• Distance metric: # of hops (max = 15 hops)
– Can you guess why?
• Distance vectors: exchanged among
neighbors every 30 sec via Response
Message (also called advertisement)
• Each advertisement: list of up to 25
destination nets within AS
RIP: Example
z
D
w
A
x
y
B
C
Destination Network
w
y
z
x
….
Next Router
Num. of hops to dest.
….
....
A
B
B
--
Routing table in D
2
2
7
1
RIP: Example
Dest
w
x
z
….
Next
C
…
w
hops
4
...
A
Advertisement
from A to D
z
x
Destination Network
w
y
z
x
….
D
y
B
C
Next Router
Num. of hops to dest.
….
....
A
B
B A
--
Routing table in D
2
2
7 5
1
RIP: Link Failure and Recovery
If no advertisement heard after 180 sec -->
neighbor/link declared dead
– routes via neighbor invalidated
– new advertisements sent to neighbors
– neighbors in turn send out new advertisements
(if tables changed)
– link failure info quickly propagates to entire net
– poison reverse used to prevent ping-pong loops
(infinite distance = 16 hops)
RIP Table processing
• RIP routing tables managed by applicationlevel process called route-d (daemon)
• advertisements sent in UDP packets,
periodically repeated
routed
routed
Transprt
(UDP)
network
(IP)
link
physical
Transprt
(UDP)
forwarding
table
forwarding
table
network
(IP)
link
physical
RIP Table example (continued)
Router: giroflee.eurocom.fr
Destination
-------------------127.0.0.1
192.168.2.
193.55.114.
192.168.3.
224.0.0.0
default
•
•
•
•
•
Gateway
Flags Ref
Use
Interface
-------------------- ----- ----- ------ --------127.0.0.1
UH
0 26492 lo0
192.168.2.5
U
2
13 fa0
193.55.114.6
U
3 58503 le0
192.168.3.5
U
2
25 qaa0
193.55.114.6
U
3
0 le0
193.55.114.129
UG
0 143454
Three attached class C networks (LANs)
Router only knows routes to attached LANs
Default router used to “go up”
Route multicast address: 224.0.0.0
Loopback interface (for debugging)
OSPF (Open Shortest Path First)
• “open”: publicly available
• Uses Link State algorithm
– LS packet dissemination
– Topology map at each node
– Route computation using Dijkstra’s algorithm
• OSPF advertisement carries one entry per
neighbor router
• Advertisements disseminated to entire AS (via
flooding)
– Carried in OSPF messages directly over IP (rather
than TCP or UDP
OSPF “advanced” features (not in RIP)
• Security: all OSPF messages authenticated (to
prevent malicious intrusion)
• Multiple same-cost paths allowed (only one path
in RIP)
• For each link, multiple cost metrics for different
TOS (e.g., satellite link cost set “low” for best
effort; high for real time)
• Integrated uni- and multicast support:
– Multicast OSPF (MOSPF) uses same topology
data base as OSPF
• Hierarchical OSPF in large domains.
Hierarchical OSPF
• Two-level hierarchy: local
area, backbone.
– Link-state advertisements
only in area
– each nodes has detailed area
topology; only know
direction (shortest path) to
nets in other areas.
• Area border routers:
“summarize” distances to
nets in own area, advertise to
other Area Border routers.
• Backbone routers: run
OSPF routing limited to
backbone.
• Boundary routers: connect
to other AS’s.
Today’s Lecture
• Recap
– Addressing
•
•
•
•
IP Address
Sub-netting
Super-netting (CIDR)
Route Aggregation Examples
– Routing Protocols
• Introduction
• Intra-domain (RIP, OSPF)
• Inter-domain (BGP)
• Multicast
– Introduction
– Internet Group Management Protocol (IGMP)
– Routing Protocols
• Intra-domain (DVMRP, MOSPF, PIM)
• Inter-domain (MSDP, SSM)
– Reliable Multicast
Inter-AS routing in the Internet: BGP
R4
R5
R3
BGP
AS1
AS2
(RIP intra-AS
routing)
(OSPF
intra-AS
routing)
BGP
R1
R2
Figure 4.5.2-new2: BGP use for inter-domain routing
AS3
(OSPF intra-AS
routing)
Internet inter-AS routing: BGP
• BGP (Border Gateway Protocol): the de facto standard
• Path Vector protocol:
– similar to Distance Vector protocol
– each Border Gateway broadcast to neighbors
(peers) entire path (i.e., sequence of AS’s) to
destination
– BGP routes to networks (ASs), not individual hosts
– E.g., Gateway X may send its path to dest. Z:
Path (X,Z) = X,Y1,Y2,Y3,…,Z
Internet inter-AS routing: BGP
Suppose: gateway X send its path to peer gateway W
• W may or may not select path offered by X
– cost, policy (don’t route via competitors AS), loop
prevention reasons.
• If W selects path advertised by X, then:
Path (W,Z) = w, Path (X,Z)
• Note: X can control incoming traffic by controlling its
route advertisements to peers:
– e.g., don’t want to route traffic to Z -> don’t
advertise any routes to Z
BGP: controlling who routes to you
legend:
B
W
provider
network
X
A
customer
network:
C
Y
Figure 4.5-BGPnew: a simple BGP scenario
• A,B,C
are provider networks
• X,W,Y are customer (of provider networks)
• X is dual-homed: attached to two networks
– X does not want to route from B via X to C
– .. so X will not advertise to B a route to C
BGP: controlling who routes to you
legend:
B
W
provider
network
X
A
customer
network:
C
Y
Figure 4.5-BGPnew: a simple BGP scenario
• A advertises
to B the path AW
• B advertises to X the path BAW
• Should B advertise to C the path BAW?
– No way! B gets no “revenue” for routing CBAW since
neither W nor C are B’s customers
– B wants to force C to route to w via A
– B wants to route only to/from its customers!
BGP operation
Q: What does a BGP router do?
• Receiving and filtering route advertisements
from directly attached neighbor(s).
• Route selection.
– To route to destination X, which path (of
several advertised) will be taken?
• Sending route advertisements to neighbors.
BGP Operations (Simplified)
Establish session on
TCP port 179
AS1
BGP session
Exchange all
active routes
AS2
Exchange incremental
updates
While connection
is ALIVE exchange
route UPDATE messages
45
Four Types of BGP Messages
• Open : Establish a peering session.
• Keep Alive : Handshake at regular intervals.
• Notification : Shuts down a peering session.
• Update : Announcing new routes or withdrawing
previously announced routes.
announcement
=
prefix + attributes values
46
BGP Attributes
Value
----1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
...
255
Code
--------------------------------ORIGIN
AS_PATH
NEXT_HOP
MULTI_EXIT_DISC
LOCAL_PREF
ATOMIC_AGGREGATE
AGGREGATOR
COMMUNITY
ORIGINATOR_ID
CLUSTER_LIST
DPA
ADVERTISER
RCID_PATH / CLUSTER_ID
MP_REACH_NLRI
MP_UNREACH_NLRI
EXTENDED COMMUNITIES
Reference
--------[RFC1771]
[RFC1771]
[RFC1771]
[RFC1771]
[RFC1771]
[RFC1771]
[RFC1771]
[RFC1997]
[RFC2796]
[RFC2796]
[Chen]
[RFC1863]
[RFC1863]
[RFC2283]
[RFC2283]
[Rosen]
Most
important
attributes
reserved for development
From IANA: http://www.iana.org/assignments/bgp-parameters
Not all attributes
need to be present in
every announcement
Most Important BGP attributes
• LocalPREF
– Local preference policy to choose “most”
preferred route
• Multi-exit Discriminator
– Which peering point to choose?
• Import Rules
– What route advertisements do I accept?
• Export Rules
– Which routes do I forward to whom?
Attributes are Used to
Select Best Routes
192.0.2.0/24
pick me!
192.0.2.0/24
pick me!
192.0.2.0/24
pick me!
192.0.2.0/24
pick me!
Given multiple
routes to the same
prefix, a BGP speaker
must pick at most
one best route
(Note: it could reject
them all!)
ASPATH Attribute
AS 1129
135.207.0.0/16
AS Path = 1755 1239 7018 6341
135.207.0.0/16
AS Path = 1239 7018 6341
AS 1239
Sprint
AS 1755
135.207.0.0/16
AS Path = 1129 1755 1239 7018 6341
Ebone
AS 12654
AS 6341
AT&T Research
RIPE NCC
RIS project
135.207.0.0/16
AS Path = 7018 6341
AS7018
135.207.0.0/16
AS Path = 6341
Global Access
135.207.0.0/16
AS Path = 3549 7018 6341
AT&T
135.207.0.0/16
AS Path = 7018 6341
AS 3549
Global Crossing
135.207.0.0/16
Prefix Originated
50
AS Graphs Do Not Show Topology!
BGP was designed to
throw away information!
The AS graph
may look like this.
Reality may be closer to this…
AS Graphs Depend on Point of View
peer
peer
provider
customer
1
3
2
4
1
3
5
1
2
4
5
6
3
1
6
4
2
2
6
4
5
3
5
6
This explains why there is no UUNET (701) Sprint (1239) link on previous slide!
Shorter Doesn’t Always Mean Shorter
In fairness:
could you do
this “right” and
still scale?
AS 3
Exporting internal
state would
dramatically
increase global
instability and
amount of routing
state
Mr. BGP says that
path 4 1 is better
than path 3 2 1
Duh!
AS 4
AS 2
AS 1
Route Selection Summary
Highest Local Preference
Enforce relationships
Shortest ASPATH
Lowest MED
i-BGP < e-BGP
traffic engineering
Lowest IGP cost
to BGP egress
Lowest router ID
Throw up hands and
break ties
Implementing Customer/Provider
and Peer/Peer relationships
Two parts:
• Enforce transit relationships
– Outbound route filtering
• Enforce order of route preference
– provider < peer < customer
Import Routes
provider route
peer route
From
provider
customer route
From
provider
From
peer
From
peer
From
customer
From
customer
ISP route
Export Routes
provider route
peer route
To
provider
customer route
ISP route
From
provider
To
peer
To
peer
To
customer
To
customer
filters
block
Why different Intra- and Inter-AS routing ?
Policy:
• Inter-AS: admin wants control over how its traffic
routed, who routes through its net.
• Intra-AS: single admin, so no policy decisions
needed
Scale:
• hierarchical routing saves table size, reduced
update traffic
Performance:
• Intra-AS: can focus on performance
• Inter-AS: policy may dominate over performance
Today’s Lecture
• Recap
– Addressing
•
•
•
•
IP Address
Sub-netting
Super-netting (CIDR)
Route Aggregation Examples
– Routing Protocols
• Introduction
• Intra-domain (RIP, OSPF)
• Inter-domain (BGP)
• Multicast
– Introduction
– Internet Group Management Protocol (IGMP)
– Routing Protocols
• Intra-domain (DVMRP, MOSPF, PIM)
• Inter-domain (MSDP, SSM)
– Reliable Multicast
Multicast: one sender to many receivers
• Multicast: act of sending datagram to multiple
receivers with single “transmit” operation
– analogy: one teacher to many students
• Question: how to achieve multicast
Multicast via unicast
• source sends N
unicast datagrams,
one addressed to
each of N receivers
Routers forward
unicast datagrams
multicast receiver (red)
not a multicast receiver
Multicast: one sender to many receivers
• Multicast: act of sending datagram to multiple
receivers with single “transmit” operation
– analogy: one teacher to many students
• Question: how to achieve multicast
Network multicast
Multicast routers (red) duplicate and
forward multicast datagrams
• Routers actively
participate in multicast,
making copies of packets
as needed and
forwarding towards
multicast receivers
Multicast: one sender to many receivers
• Multicast: act of sending datagram to multiple
receivers with single “transmit” operation
– analogy: one teacher to many students
• Question: how to achieve multicast
Application-layer
multicast
• end systems involved
in multicast copy and
forward unicast
datagrams among
P2P hosts duplicate and
forward multicast datagrams themselves
Internet Multicast Service Model
128.59.16.12
128.119.40.186
multicast
group
226.17.30.197
128.34.108.63
128.34.108.60
multicast group concept: use of indirection
– host addresses IP datagram to multicast group
– routers forward multicast datagrams to hosts
that have “joined” that multicast group
Multicast groups
class D Internet addresses reserved for multicast:
host group semantics:
o anyone can “join” (receive) multicast group
o anyone can send to multicast group
o no network-layer identification of hosts members
 needed: infrastructure to deliver mcast-addressed
datagrams to all hosts that have joined that
multicast group
Joining a mcast group: two-step process
• local: host informs local mcast router of desire to
join group: IGMP (Internet Group Management
Protocol)
• wide area: local router interacts with other
routers to receive mcast datagram flow
– many protocols (e.g., DVMRP, MOSPF, PIM)
IGMP
IGMP
wide-area
multicast
routing
IGMP
Today’s Lecture
• Recap
– Addressing
•
•
•
•
IP Address
Sub-netting
Super-netting (CIDR)
Route Aggregation Examples
– Routing Protocols
• Introduction
• Intra-domain (RIP, OSPF)
• Inter-domain (BGP)
• Multicast
– Introduction
– Internet Group Management Protocol (IGMP)
– Routing Protocols
• Intra-domain (DVMRP, MOSPF, PIM)
• Inter-domain (MSDP, SSM)
– Reliable Multicast
IGMP: Internet Group Management Protocol
• host: sends IGMP report when application joins mcast
group
– IP_ADD_MEMBERSHIP socket option
– host need not explicitly “unjoin” group when leaving
• router: sends IGMP query at regular intervals
– host belonging to a mcast group must reply to
query
query
report
How IGMP Works
routers:
Q
hosts:
• on each link, one router is elected the “querier”
• querier periodically sends a Membership Query message
to the all-systems group (224.0.0.1), with TTL = 1
• on receipt, hosts start random timers (between 0 and 10
seconds) for each multicast group to which they belong
How IGMP Works (cont.)
Q
G
G
G
G
• when a host’s timer for group G expires, it sends a
Membership Report to group G, with TTL = 1
• other members of G hear the report and stop their timers
• routers hear all reports, and time out non-responding
groups
IGMP
IGMP v2: additions include
IGMP version 1
• group-specific Query
• router: Host
Membership Query msg • Leave Group msg
broadcast on LAN to all
– last host replying to Query
hosts
can send explicit Leave
Group msg
• host: Host Membership
– router performs groupReport msg to indicate
specific query to see if any
group membership
hosts left in group
– RFC 2236
– randomized delay before
responding
– implicit leave via no
IGMP v3:
reply to Query
– Join/Leave specific S in G
• RFC 1112
– RFC 3376
Today’s Lecture
• Recap
– Addressing
•
•
•
•
IP Address
Sub-netting
Super-netting (CIDR)
Route Aggregation Examples
– Routing Protocols
• Introduction
• Intra-domain (RIP, OSPF)
• Inter-domain (BGP)
• Multicast
– Introduction
– Internet Group Management Protocol (IGMP)
– Routing Protocols
• Intra-domain (DVMRP, MOSPF, PIM)
• Inter-domain (MSDP, SSM)
– Reliable Multicast
Multicast Routing: Problem Statement
• Goal: find a tree (or trees) connecting
routers having local mcast group members
– tree: not all paths between routers used
– source-based: different tree from each sender to rcvrs
– shared-tree: same tree used by all group members
Source-based trees
Shared tree
Approaches for building mcast trees
Approaches:
• source-based tree: one tree per source
– shortest path trees
– reverse path forwarding
• group-shared tree: group uses one tree
– minimal spanning (Steiner)
– center-based trees
…we first look at basic approaches, then specific
protocols adopting these approaches
Shortest Path Tree
• mcast forwarding tree: tree of shortest
path routes from source to all receivers
– Dijkstra’s algorithm
S: source
LEGEND
R1
1
2
R4
R2
3
R3
router with attached
group member
5
4
R6
router with no attached
group member
R5
6
R7
i
link used for forwarding,
i indicates order link
added by algorithm
Reverse Path Forwarding
 rely on router’s knowledge of unicast
shortest path from it to sender
 each router has simple forwarding behavior:
if (mcast datagram received on incoming link
on shortest path back to source)
then flood datagram onto all outgoing links
else ignore datagram
Building the Reverse Path
source
Building a Reverse Path Tree
source
Reverse Path Forwarding: example
S: source
LEGEND
R1
R4
router with attached
group member
R2
R5
R3
R6
R7
router with no attached
group member
datagram will be
forwarded
datagram will not be
forwarded
• result is a source-specific reverse SPT
– may be a bad choice with asymmetric links
Reverse Path Forwarding: pruning
• forwarding tree contains subtrees with no
mcast group members
– no need to forward datagrams down subtree
– “prune” msgs sent upstream by router with
no downstream group members
LEGEND
S: source
R1
router with attached
group member
R4
R2
P
R5
R3
R6
P
R7
P
router with no attached
group member
prune message
links with multicast
forwarding
Shared-Tree: Steiner Tree
• Steiner Tree: minimum cost tree connecting
all routers with attached group members
• problem is NP-complete
• excellent heuristics exists
• not used in practice:
– computational complexity
– information about entire network needed
– monolithic: rerun whenever a router needs to
join/leave
Center-based trees
• single delivery tree shared by all
• one router identified as “center” of tree
• to join:
– edge router sends unicast join-msg addressed to
center router
– join-msg “processed” by intermediate routers and
forwarded towards center
– join-msg either hits existing tree branch for this
center, or arrives at center
– path taken by join-msg becomes new branch of
tree for this router
Center-based trees: an example
Suppose R6 chosen as center:
LEGEND
R1
3
R2
router with attached
group member
R4
2
R5
R3
1
R6
R7
1
router with no attached
group member
path order in which join
messages generated
Today’s Lecture
• Recap
– Addressing
•
•
•
•
IP Address
Sub-netting
Super-netting (CIDR)
Route Aggregation Examples
– Routing Protocols
• Introduction
• Intra-domain (RIP, OSPF)
• Inter-domain (BGP)
• Multicast
– Introduction
– Internet Group Management Protocol (IGMP)
– Routing Protocols
• Intra-domain (DVMRP, MOSPF, PIM)
• Inter-domain (MSDP, SSM)
– Reliable Multicast
The First Intra-Domain
Routing Protocol: DVMRP
Distance-Vector Multicast
Routing Protocol (DVMRP)
DVMRP consists of two major components:
(1) a conventional distance-vector routing protocol
(like RIP) which builds, in each router, a routing
table like this:
subnet
shortest dist
via interface
a
1
i1
b
5
i1
c
…
3
…
i2
…
(2) a protocol for determining how to forward
multicast packets, based on the routing table and
routing messages of (1)
Example Topology
g
g
s
g
Phase 1: Truncated Broadcast
g
g
s
g
Phase 2: Pruning
g
g
prune (s,g)
s
prune (s,g)
g
Steady State
g
g
g
s
g
Grafting on New Receivers
g
g
g
report (g)
graft (s,g)
s
graft (s,g)
g
Steady State after Grafting
g
g
g
s
g
Current IP Multicast Routing
Protocols
DVMRP — Distance-Vector Multicast Routing Protocol
broadcast-and-prune,
unidirectional per-source trees,
builds own routing table
MOSPF — Multicast Extensions to Open Shortest-Path
First Protocol
broadcast membership,
unidirectional per-source trees,
uses OSPF routing table
Current IP Multicast
Routing Protocols (cont.)
PIM-DM — Protocol-Independent Multicast, Dense-Mode
broadcast-and-prune,
unidirectional per-source trees,
uses unicast routing table (Protocol Independent)
PIM-SM — Protocol-Independent Multicast, Sparse-Mode
uses meeting places (“rendezvous points”),
unidirectional per-group or shared trees,
uses unicast routing table (Protocol Independent)
CBT — Core-Based Trees
uses meeting places (“cores”),
bidirectional shared trees,
uses unicast routing table
Multicast Routing: MOSPF
Multicast OSPF (MOSPF)
• an extension to OSPF (Open Shortest-Path First),
a link-state, intra-domain routing protocol
specified in RFCs 1584 & 1585
• multicast-capable routers indicate that capability
with a flag in their link-state messages
• routers include in their link-state messages a list
of all groups that have members on the router’s
directly-attached links (as learned through IGMP)
Link state: each router floods link-state advertisement
Multicast: add membership information to “link state”
Each router then has a complete map of the topology, including
which links have members of which multicast groups
S1
Z
X
R1
Y
R2
Z has network map, including membership at X and Y
Z computes shortest path tree from S1 to X and Y
Z builds multicast entry with one outgoing interface
W, Q, R, each build multicast entries
S1
Z
W
X
Q
R
R1
Y
R2
Link-state advertisement with new topology (may be due to link failure)
may require recomputation of tree and forwarding entry.
Link WZ failed in the diagram below.
S1
Z
W
Q
X
R
R1
Y
R2
Link state advertisement (T) with new membership (R3) may require
incremental computation and addition of interface to outgoing interface
list (Z) (Similarly, disappearance of a membership may cause deletion
an interface from an outgoing interface list). Link WZ is back to normal.
S1
Z
R3
W
T
Q
X
R
R1
Y
R2
Multicast Routing: PIM
Protocol Independent
Multicast (PIM)
• “Protocol Independent”
– does not perform its own routing information exchange
– uses unicast routing table made by any of the existing unicast
routing protocols
• PIM-DM (Dense Mode) - similar to DVMRP, but:
– without the routing information exchange part
– differs in some minor details
• PIM-SM (Sparse Mode), or just PIM - instead of
directly building per-source, shortest-path trees:
– initially builds a single (unidirectional) tree per group ,
shared by all senders to that group
– once data is flowing, the shared tree can be converted to a persource, shortest-path tree if needed
PIM Protocol Overview
• Basic protocol steps
– routers with local members send Join messages
towards a Rendezvous Point (RP) to join shared tree
– routers with local sources encapsulate data to RP
– routers with local members may initiate data-driven
switch to source-specific, shortest-path tree
Phase 1: Build Shared Tree
Shared tree after
R1,R2,R3 join
Join message
toward RP
RP
R1
Join G
R4
R2
R3
Phase 2: Sources Send to RP
S1
unicast encapsulated
data packet to RP
S2
RP decapsulates,
forwards down
Shared tree
RP
R1
R4
R2
R3
Phase 3: Stop Encapsulation
S1
(S1,G)
S2
Join G for S2
(S1,G)
(S2,G)
Join G for S1
RP
(*.G)
R1
R4
R2
R3
Phase 4: Switch to Shortest Path Tree
S1
shared tree
Join messages
toward S2
S2
RP
R1
R4
R2
R3
Phase 5: Prune (S2 off) Shared Tree
S1
Prune S2 off Shared tree
where iif of S2 and
RP entries differ
S2 distribution tree
Shared tree
S2
RP
R1
R4
R2
R3
RP Mechanism
• end-systems only need multicast address to
send or receive
• routers use algorithmic mapping of group
address to RP from manageably-small set of
RPs known throughout region
• consistent RP mapping and adaptation to
failures is CRITICAL
– all routers (within PIM region) must associate a
single active RP with a multicast group
• optimal RP location not necessary
RP Mechanisms — Overview
• Each candidate RP periodically indicates liveness
to Bootstrap Router in the PIM region
• Bootstrap Router periodically distributes set of
reachable candidate RPs to all PIM routers in
region
– like unicast routing—track liveness continuously, not on
demand
• Each PIM router uses the same hash
function and set of RPs to map a particular
multicast group address to that group’s RP.
Bootstrap Router
• Bootstrap Router function
– construct set of RPs (RP set) based on
Candidate RP advertisements
– periodically distribute RP set in Bootstrap
messages to all routers in region by hopby-hop flooding
• Bootstrap Router should be wellconnected for stability, and
dynamically elected for robustness
Bootstrap Router Election
• simple bridge-like spanning-tree election
algorithm
• candidate Bootstrap Routers originate PIM
hop-by-hop Bootstrap messages with IP
address and configurable preference value.
• Bootstrap messages exchanged by all PIM
routers within region
• most preferred (or highest numbered)
reachable candidate Bootstrap Router elected
• sent periodically and triggered
All routers use hash function to
map Group Address to RP
• hash function
– input: group address G and address of each
candidate RP in RP set (optional Mask)
– output: Value computed per candidate RP in RP set
– RP with highest value is the RP for G
• desirable characteristics
– minimize remapping when RP reachability changes
— remap only those that lost RP
– load spreading of groups across RPs
Today’s Lecture
• Recap
– Addressing
•
•
•
•
IP Address
Sub-netting
Super-netting (CIDR)
Route Aggregation Examples
– Routing Protocols
• Introduction
• Intra-domain (RIP, OSPF)
• Inter-domain (BGP)
• Multicast
– Introduction
– Internet Group Management Protocol (IGMP)
– Routing Protocols
• Intra-domain (DVMRP, MOSPF, PIM)
• Inter-domain (MSDP, SSM)
– Reliable Multicast
What Exactly is Needed?
• inter-domain route exchange protocol
• mechanism for connecting domains
– two models:
• discover sources using “source announcing” protocol
• know the source(s) a priori
Inter-Domain Route Exchange
• Exchange multicast reachability between
Autonomous Systems (AS)
– Just like unicast routes are exchanged with BGP
– Protocol is “Multiprotocol extensions to BGP” (RFC 2283)
• Also known as “Multicast” BGP (MBGP)
• Also known as BGP4+
• MBGP is available and deployed today.
– Multiple vendors: Juniper, Cisco, Nortel, etc.
What Exactly is Needed?
• inter-domain route exchange protocol
• mechanism for connecting domains
– two models:
• discover sources using “source announcing” protocol
• know the source(s) a priori
The Internet Solution
• Re-use existing protocols/solutions
– Use PIM-SM in the inter-domain
• The challenge is to avoid “root dependencies”
– A root/RP/core is one domain but no active group
participants (sources or receivers) in the domain
– Root dependencies can lead to political problems
and inefficiencies
The Internet Solution (cont)
• The key: Establish a root/RP/core per domain
– No “root dependencies”
• Remember the problem:
– Connecting sources and receivers
– Solution: Multicast Source Discovery Protocol
(MSDP)
• MSDP is the last piece of the puzzle; is simple to
implement; and yields an interim solution to interdomain multicast
MSDP -- Basic Idea
• MSDP advertises multicast sources to other
domains
• Other domains decide if group members
are active and find a way to get the data
• “MSDP connects shared-trees together”
• MSDP typically runs in the RP
MSDP - Elements of Operation
• Receivers in a domain join the shared-tree
• The RP is known only to routers in the domain
• When a source goes active in a domain, it’s
packets get to the RP in that domain
• The RP sends a Source-Active (SA) message
identifying the source and group it sends to
MSDP - Elements of
Operation (cont)
• How to get SA messages to all MSDP peers?
– Need MSDP topology flooding protocol
– The RP’s address is also in the SA message to
accommodate “peer-RPF” like flooding
– Each MSDP peer receives SA message and
forwards away from the originating RP
MSDP - Elements of
Operation (cont)
• Each MSDP speaking RP will examine SA
message to see if any local members are
joined to the group
• If so, the RP joins to source described in
SA message
• Otherwise, the SA message is ignored
(Flood-and-Join model)
How MSDP works with PIM-SM
A
SA
RP
Source
C
D
SA
Join
B
Join
RP
RP
Join
Join
Join
RP
SA
MSDP peer
PIM message
Physical link
MSDP message
Receiver
What Exactly is Needed?
• inter-domain route exchange protocol
• mechanism for connecting domains
– two models:
• discover sources using “source announcing” protocol
• know the source(s) a priori—SSM model
Source Specific Multicast (SSM)
• Basic idea:
– Assumes receiver knows the source(s)
– Reverse SPT join to source
• No RPs or MSDP
– About as straightforward as you can get!
How SSM Works
A
Source
C
D
Join
B
Join
Join
Join
PIM message
Physical link
Join
Join
Receiver
Source Specific Multicast
• Advantages
– Minor changes to existing infrastructure—still use PIM-SM
– No PIM-SM RP, or MSDP
• Limitations
– Requires modifications (last hop routers) and IGMPv3
– May be difficult to support some applications
• Thoughts
– Works for 9x% of killer-apps -- need mechanism
(WWW) to let receivers know who sources are
– Success will depend on seamless migration strategy
Today’s Lecture
• Recap
– Addressing
•
•
•
•
IP Address
Sub-netting
Super-netting (CIDR)
Route Aggregation Examples
– Routing Protocols
• Introduction
• Intra-domain (RIP, OSPF)
• Inter-domain (BGP)
• Multicast
– Introduction
– Internet Group Management Protocol (IGMP)
– Routing Protocols
• Intra-domain (DVMRP, MOSPF, PIM)
• Inter-domain (MSDP, SSM)
– Reliable Multicast
Reliable Multicast
 Fundamental Problems
• Scalable Reliable Multicast (SRM)
• Reliable Multicast Transport Protocol (RMTP)
• Forward Error Correction (FEC) and Reliable Multicast
• Pretty Good Multicast (PGM)
Multicasting Applications
Interactive
Conferencing
Reliability
Multimedia
distribution
Document distribution
100%
200 ms
2s
20 s
End-to-end Latency --->
•Video and audio
•Audio/video messages
conferencing
•Distance learning
•Interactive simulation •Multimedia entertainment
•Website concurrency
•Document distribution
•Software distribution
•Financial news
•Database concurrency
•Employee communications
•Communications with branches,
•dealerships, retail outlets, suppliers
Reliable Multicast
• Enhancements to IP Multicast
– Full Reliability (no packets lost)
– Maximum reliability given a latency bound (some packets may be
lost)
Unicast Appl
Multicast Appl
RMTP
TCP
IP
UDP
UDP
IP Multicast
Fundamental Problems
S = Sender
R = Receiver
rt = Router
• Ack-implosion
– all receivers send ACK to
sender
– sender becomes bottleneck
– increased end-to-end delay
– reduced throughput
Acknowledgments
S
Acknowledgments
rt
rt
rt
R
• Minimizing end-to-end latency
rt
rt
rt
R
• Flow/Congestion control
rt
rt
• Scalability
rt
R
R
rt
R
R
R
rt
R
R
Fundamental Problems
S = Sender
R = Receiver
rt = Router
• Ack-implosion
• Minimizing end-to-end
rt
Nack
latency
Nack
rt
rt
– packets lost in far
receivers need to be
recovered quickly
R
rt
rt
rt
R
• Flow/Congestion control
• Scalability
S
Retransmit
lost packets
rt
rt
rt
R
rt
R
R
rt
R
R
R
R
R
Fundamental Problems
S = Sender
R = Receiver
rt = Router
• Ack-implosion
S
• Minimizing end-to-end latency
Congested
links
• Flow/Congestion control
– feedback implosion
– what window size at
sender?
rt
rt
rt
R
rt
rt
rt
R
• Scalability
rt
rt
rt
R
R
rt
R
R
R
rt
R
R
Reliable Multicast
• Fundamental Problems
 Scalable Reliable Multicast (SRM)
• Reliable Multicast Transport Protocol (RMTP)
• Forward Error Correction (FEC) and Reliable Multicast
• Pretty Good Multicast (PGM)
Scalable Reliable Multicast (SRM)
• Based on Floyd, Jacobsen, McCanne’s
work (SIGCOMM ‘95)
Missing Parts
• Many-to-many reliable multicast
• Key ideas:
– Application-level Framing (ALF)
– Out-of-order reliable delivery
– NAK-based
– Receiver-based reliability
– NAKs and Repairs are both multicast
• Protocol:
– Sender transmits packets
– Receiver detects loss and multicasts NAK
– Any receiver with the message multicasts
the retransmission
Illustration of shared whiteboard
How does SRM solve Ack-implosion?
• Does not use “Acks”
NAK implosion
• Uses NAKs
– generated on packet loss only
• What about NAK-implosion?
– NAK implosion happens at all
receivers in addition to the
sender
• Need for NAK suppression
Worst Scenario: everyone NAKs
Ideal NAK Suppression
• Single NAK should suppress all others
• Randomly delay NAKs and shut up on
receiving the same NAK
NAKs
• How long should one wait before
sending NAK?
– Uniform distribution in the interval
[c1*ds,a, (c1+c2)*ds,a] where “s” is the
source of data and “a” is the receiver
which missed a packet and ds,a is the
one-way delay between “s” and “a”.
• Each receiver needs one-way delay
from every sender
Ideal Scenario: single NAK
Real-world NAK Suppression
NAK implosion
• Delay estimates are not
accurate
• Constants “c1” and “c2”
depend on the actual
topology of the network over
which the receivers are
distributed
• Imperfect NAK suppression
Better Scenario: some members NAK
How does SRM reduce end-to-end latency?
REPAIR implosion
• Any receiver with the repair
packet can do the multicast
retransmission
• May lead to “repair
implosion”
– repair implosion happens at all
Worst Scenario: everyone sends REPAIR
receivers in addition to the
sender
• Need for repair suppression
Ideal Repair Suppression
• Single “repair” should suppress all
others
REPAIRs
• Schedule repair timers and retransmit
when timer expires unless someone
has already done the retransmission.
• How long should one wait before
sending repair?
– Uniform distribution in the interval []
d1*da,b, (d1+d2)*da,b where “a” is the
NAK generator and “b” is a receiver
with the repair packet and da,b is the
one-way delay from “b” to “a”.
• Each receiver needs one-way delay
from every other receiver
Ideal Scenario: single REPAIR
Real-world Repair Suppression
REPAIR implosion
• Delay estimates are not
accurate
• Constants “d1” and “d2”
depend on the actual
topology of the network over
which the receivers are
distributed
• Imperfect “repair”
suppression
Better Scenario: some send REPAIR
Scalability of SRM
REPAIR implosion
• All pairs round-trip time
estimation needed for
effective NAK suppression
and “repair” suppression
– Not a scalable solution
• “self organization” of
receivers such that each
receiver needs to maintain
round-trip time estimates for
a small subset provides
scalability
Self-organization of receivers
Reliable Multicast
• Fundamental Problems
• Scalable Reliable Multicast (SRM)
 Reliable Multicast Transport Protocol (RMTP)
• Forward Error Correction (FEC) and Reliable Multicast
• Pretty Good Multicast (PGM)
Reliable Multicast Transport Protocol (RMTP)
DR = Designated Receiver
• Groups receivers into “local regions”
with a DR in each region
Acknowledgments
S
S = Sender
R = Receiver
rt = Router
• Organizes the DRs in a logical
rt
Acknowledgments
D
DR
hierarchy
D
D
• Acks from R to DR
• Acks from DR to S or DR
D
rt
rt
D
D
• Retransmission by DR
R
DR
D
rt
D
R
DR
rt
D
rt
R
D
rt
rt
DR
rt
rt
D
• Transmission by Sender
D
R
rt
D
D
R
R
R
R
How does RMTP solve Ack-implosion?
• Divide and Conquer
S = Sender
R = Receiver
rt = Router
DR = Designated
Receiver
S
Acknowledgments
— Acks from R to DR
rt
D
— Acks from DR to S or DR
DR
• Sender receives as many
level DRs
D
D
D
rt
rt
D
D
R
DR
D
rt
D
R
DR
rt
D
rt
R
D
rt
rt
Acks as there are top-
rt
rt
D
DR
D
R
rt
D
D
R
R
R
R
How does RMTP minimize end-to-end latency?
S = Sender
R = Receiver
rt = Router
DR = Designated
Receiver
• Local Recovery
S
• Retransmission by a DR
rt
D
DR
• Retransmission request
D
D
D
rt
rt
D
D
Retransmission
R
R
DR
D
rt
D
R
DR
rt
D
rt
Nack
D
rt
rt
DR
rt
rt
D
(Nack) from R to DR
D
R
rt
D
D
R
R
R
R
Recovery Strategy and Robustness of RMTP
• Receivers switch to next-
S = Sender
R = Receiver
rt = Router
DR = Designated
Receiver
S
Acknowledgments
level DR if current DR fails
rt
D
DR
• Primary and Backup
D
D
D
rt
rt
D
R
D
R
DR
D
rt
D
R
DR
rt
D
rt
DR Failure
D
rt
rt
DR
rt
rt
D
Sender
D
R
rt
D
D
R
R
R
R
Scalability of RMTP
S = Sender
R = Receiver
rt = Router
DR = Designated
Receiver
• RMTP is scalable because
of:
S
– hierarchical organization
rt
D
– “local regions” can be split to
DR
accommodate more receivers
– local recovery keeps end-to-
“logical tree” automatically?
D
rt
rt
D
R
D
R
DR
D
rt
D
– how do you construct the
DR
rt
D
rt
R
D
rt
D
group size
• One caveat:
D
rt
DR
rt
rt
D
end latency low regardless of
D
R
rt
D
D
R
R
R
R
State Diagram of RMTP Sender
Retransmit
Transmit
Process Ack
Retransmit
Transmit
Process Ack
Retransmit Timeout
Process Ack
Retransmit Timeout
Transmit Timeout
Transmit Timeout
Tx_ON
{Retx_Done &
Tx_Timeout}
{Tx_Timeout}
{TxDone}
IRTx_ON
RTx_ON
{Retx_Timeout}
{RetxDone}
{Retx_Done &
NOT Tx_Timeout}
{Imm_Retx_Req}
ACK_ON
Process Ack
Basic Operation of RMTP
• Designed for bulk data transfer
• Packet stream model
– each packet assigned a sequence number
• Use bitmap for error control (L, V)
0 1 1 1 0 0 1 ….. V
4 5 6 7 8 9 10 seq_no
next packet to receive L
Basic Operation of RMTP
• Send packets at fixed intervals (t_send)
• Upper bound for sending rate:
t_send
Time
packets
max. rate = (packet_size * send_win) / t_send
Example to Illustrate RMTP Execution
Retransmit (5,10)
R1
Retransmit (8,10)
1, . . ., 16
R2
S
Retransmit (8,14)
ReXmit
Queue
14
10
8
5
Packet#
1
2
2
1
# of Retransmission Request
R2, R3
R1
R3 R1, R2
1 2
Send
Window
Slide
Window
R3
3
4
5 6
7 8
Address of Requesting Receivers
9 10 11 12 13 14 15 16
Avail_window
Send Window = 16,
Mcast_Thresh = 1
RMTP Implementation Architecture
• User-level protocol process (rmtpd)
Sender
TCP
rmtpd
UDP
UDP
rmtpd
rmtpd
TCP
Receiver
TCP
Receiver
Reliable Multicasting of a File
Application feeds equal-sized data units to RMTP daemon except for the last chunk
Application Data Unit (ADU)
Application Data Unit
Application Data Unit
< ADU size
Application feeding RMTP daemon
< T_dally T_dally
t=0
Time
Retransmission Request
Resets T_dally Timer
Reliable Multicasting of a Continuous
Stream
Application chops a stream into “blocks” and RMTP uses T_dally at the end of each block
BLOCK # 1
EOB
BLOCK # 4
BLOCK # 3
BLOCK # 2
EOB
EOB
EOB
Application feeding RMTP daemon
< T_dally T_dally
T_dally
...
t=0
Time
Retransmission Request
resets T_dally Timer
DRAWBACK:
Sender waits for T_dally at the end of “every” block leading to low throughput
WHY? Neither the sender nor the DRs keep track of “membership” info
Reliable Multicasting of a Continuous
Stream
Sender and DRs explicitly keep track of their children in a dynamic manner
BLOCK # 1
EOB
BLOCK # 4
BLOCK # 3
BLOCK # 2
EOB
EOB
EOB
Application feeding RMTP daemon
t=0
Wait for ACK
from “all” children
DRAWBACK:
Wait for ACK
from “all” children
Additional processing at Sender and DRs
ADVANTAGE: High throughput
Wait for ACK
from “all” children
RMTP Status
- Used in AT&T’s billing network since 7/96
- Licensed the technology to GlobalCast Inc. (start-up in California) 7/97
- Used in distance learning system called IRI (Interactive Remote Instruction)
- Being used by Dow Jones Telerate for market data distribution
- Being proposed as an Internet standard (first Internet draft -- March 1998)
- Being integrated with Microsoft DCOM (alternative to CORBA)
- RMTP technology used in next generation Push systems
- RMTP used in Web Caching solution from Lucent Technologies (IPWorX)
Reliable Multicast
• Fundamental Problems
• Scalable Reliable Multicast (SRM)
• Reliable Multicast Transport Protocol (RMTP)
 Forward Error Correction (FEC) and Reliable Multicast
• Pretty Good Multicast (PGM)
Forward Error Correction and
Reliable Multicast Transmission
• Nonnenmacher, Biersack and Towsley in SIGCOMM’97 showed
how reliable multicast can be made scalable by incorporating
forward error correction (FEC)
• Key idea:
– proactively send parity packets with regular data packets
– loss of limited number of packets can be recovered using the
redundant packets
– reduces retransmissions
– improves latency (useful for delay-sensitive traffic)
How Forward Error Correction
works
Loss in FEC Block
D2
D3
D2 D1
D3
P2
FEC
Encoder
D1
P2 P1
P1
D3
D2
D2 D1
D1
P2
FEC
Decoder
D3
• Parameters:
– k original data packets form a transmission group (TG)
– h parity packets derived from the k data packets
– any k received out of k+h are sufficient
Why FEC for Reliable Multicast
Data Retransmission
D3
D3
D2 D1
D3
S
D3
D2 D1
D2 D1
D3
S
First Transmission
R1
D3
R2
D2 D1
D2 D1
R1
R2
R3
Parity Retransmission
P
R3
P = D1 xor D2 xor D3
D2 D1
S
P
P
R1
R2
R3
• A single parity packet can repair the loss of different data
packets at different receivers
Where to put FEC?
Application
Application
RM
RM/FEC
FEC
Network
Network
Data Link
Data Link
Layered FEC
Integrated FEC
RM: Reliable Multicast
Integrated FEC
• At Sender:
– Send k original packets
• At Receiver:
– If k-l packets (l > 0) have been received, send NAK(l)
requesting l parities
• At Sender:
– Receive NAK(L1), NAK(L2), …, NAK(LR) from the receivers
– Send Lmax = max{L1, L2, …, LR} parity packets
Cost of FEC Computation
• Network benefits from reduced number of transmissions
due to integrated FEC
•
But FEC is not for free
• Processing cost
– How fast can the coding/decoding be done?
– What is the throughput of a protocol based on integrated FEC?
Summary of FEC & RM
• Integrated FEC
– dramatically reduces the number of transmissions
– achieves scalability for large number of receivers (up to 10^6)
– reduces the feedback
• Software FEC for Reliable Multicast is feasible today
• From Nonnenmacher:
¬ FEC is like a wonder under the Christmas tree:
•All children missing different packets are satisfied with a
single common packet
Multicast Layered Recovery (MLR)
• Rhee et.al. At NCSU
• Nonnemacher et.al. Show that FEC+RM induces much less total
traffic compared to retransmission alone
• Question: How many repair packets should be sent?
¬ Worst case receivers?
– Introduces repair locality problem
• Solution: MLR suggests sending FEC packets in multiple layers
Multicast Layered Recovery (MLR)
Data
• Partitions f FEC repair
packets into K groups: F =
{f1, f2,…,fK}
G0
G1
• Transmits each group fi using
a different multicast address
• Receivers join or leave
multicast groups to match
their loss rates
G2
G3
FEC repairs
Reliable Multicast
• Fundamental Problems
• Scalable Reliable Multicast (SRM)
• Reliable Multicast Transport Protocol (RMTP)
• Forward Error Correction (FEC) and Reliable Multicast
 Pretty Good Multicast (PGM)
Pretty Good Multicast (PGM)
S = Sender
R = Receiver
rt = Router
DR = Designated
Receiver
• Router-assisted reliable
multicast
S
• Avoids drawbacks of SRM
rt
D
while maintains the
R
advantages
NAK
R
subnet
– network-level multicast
tree leveraged to do NAK
– subtree multicast is
supported for
retransmissions
rt
R
D
NCF
NCF
NCF
NAK
R
R
D
rt
rt
rt
R
D
rt
NCF
NAK
NCF
suppression
NCF D
rt
NAK
R
rt
rt
NAK
– one NAK generated per
D
R
rt
D
D
rt
NCF
R
R
R
Pretty Good Multicast (PGM)
S = Sender
R = Receiver
rt = Router
DR = Designated
Receiver
• Subtree multicast from
Designated Local
S
Retransmitter (DLR)
rt
DLR
– minimal redundant
retransmissions
R
D
rt
rt
Retransmissions
D
– router maintains per lost
D
R
R
D
rt
D
D
rt
rt
R
R
R
D
rt
rt
R
R
rt
rt
rt
packet state
D
R
R
R
Summary of Reliable Multicast
• One size does not fit all
• SRM with self-organization + local recovery is best suited for
many-to-many applications
• RMTP is good for one-to-many reliable multicast applications
• FEC is a powerful mechanism which can be combined with either
SRM or RMTP to further improve scalability and efficiency
• Pretty Good Multicast (PGM) leverages state maintained in the
routers to improve the efficiency of reliable multicast protocols
Useful References
• Books:
– Multicasting on the Internet and its Applications by Sanjoy Paul
(Kluwer Academic Publisher)
• Urls:
– http://catarina.usc.edu/multicast/srm.html
– http://www.east.isi.edu/RMRG/
– http://www.tascnets.com/mist/doc/mcpCompare.html
– http://research.ivv.nasa.gov/RMP/links.html
– http://info.internet.isi.edu:80/in-notes/rfc/files/rfc1889.txt
– http://info.internet.isi.edu:80/in-notes/rfc/files/rfc2326.txt
– http://www.eurecom.fr/~erbi/Bib/bib.html
How to Dig Deeper
• ftp://ftpeng.cisco.com/ipmulticast/
– directory of lots of useful documents
• briefings, guides, tutorials, command descriptions
• recommended configurations and settings
• interoperability notes, deployment strategies (ATM)
• http://www.cisco.com/warp/public/732/multicast/
• 3Com info: do site search
– http://www.3com.com/nsc/501303.html (Maufer book)
Multicast Textbooks
• Beau Williamson, Developing IP Multicast
Networks (The Cisco Press Design and
Implementation Series), 2000.
• Sanjoy Paul, Multicasting on the Internet and its
Applications, (Kluwer Academic Publishers), 1998
• Tom Maufer, Deploying IP Multicast in the
Enterprise, Prentice Hall, 1997.
• Ken Miller, Multicast Networking and Applications,
Addison Wesley, 1998.