Active networks (and reliable multicast)


An Active Reliable Multicast
Framework for the Grids
M. Maimour & C. Pham
ICCS 2002, Amsterdam
Network Support and Services for Computational Grids
Sunday, April 21st, 2002
Action INRIA-RESO
http://www.ens-lyon.fr/LIP/RESAM
Outline
• Motivations behind (reliable) multicast
• Use of active networks: the DyRAM protocol
• DyRAM main services
• Simulation results
• Conclusion
From unicast…
Problem: sending the same data to many receivers via unicast is inefficient.
[Figure: a sender transmits a separate copy of the data to each of three receivers.]
…to multicast on the Internet.
Problem: sending the same data to many receivers via unicast is inefficient.
Solution: using multicast is more efficient.
[Figure: the sender transmits a single copy of the data, which the network duplicates to reach the three receivers.]
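The efficiency argument of these two slides can be made concrete by counting link transmissions. The sketch below is a minimal illustration on a hypothetical tree: the sender S reaches one router A that serves receivers R1 to R3 (these node names are ours, not from the slides). Unicast pays for every link of every sender-to-receiver path; multicast pays once per link of the distribution tree.

```python
# Hypothetical topology, stored as child -> upstream neighbour, rooted at "S".
PARENT = {
    "A": "S",   # router A hangs directly off the sender
    "R1": "A",
    "R2": "A",
    "R3": "A",
}


def path_links(node, parent):
    """Return the set of (upstream, downstream) links from the root to node."""
    links = set()
    while node in parent:
        links.add((parent[node], node))
        node = parent[node]
    return links


def unicast_cost(receivers, parent):
    """One separate copy per receiver: each path's links are paid again."""
    return sum(len(path_links(r, parent)) for r in receivers)


def multicast_cost(receivers, parent):
    """One copy per link of the union of all paths (the multicast tree)."""
    tree = set()
    for r in receivers:
        tree |= path_links(r, parent)
    return len(tree)


if __name__ == "__main__":
    rs = ["R1", "R2", "R3"]
    print("unicast:", unicast_cost(rs, PARENT), "multicast:", multicast_cost(rs, PARENT))
```

On this tree each additional receiver behind A costs unicast two more link transmissions but multicast only one, so the gap grows with the number of receivers sharing a path.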
Reliable multicast
At the routing level, IP Multicast efficiently delivers packets to all the receivers subscribed to a multicast session, but without any reliability guarantees.
Reliability (including flow and congestion control) must therefore be addressed at the transport level.
Reliable multicast: a big win for grids
• Data replication
• Database updates
• Code & data transfers
• Data communications for distributed applications (collective & gather operations, sync. barrier)
[Figure: grid resources joined in multicast address group 224.2.0.1: SDSC IBM SP (1024 procs, 5x12x17 = 1020), NCSA Origin Array (256+128+128, 5x12x(4+2+2) = 480), CPlant cluster (256 nodes).]
Reliable multicast strategies
End-to-end solutions: only the end hosts (the source and/or the receivers) are involved.
Problem: the end hosts lack topology information.
In-network solutions: some intermediate nodes (routers/servers) are involved in the recovery process.
Active networking solutions
Active routers are able to perform customized computations on incoming packets:
• data caching,
• feedback aggregation,
• filtering, subcasting,
• …
The DyRAM framework for grids
(Dynamic Replier Active Reliable Multicast)
To enable distributed grid applications, the main design goals are:
• low recovery latency, using local recovery;
• low memory usage in routers: local recovery is performed from the receivers (no cache in routers);
• low processing overhead in routers: light active services.
DyRAM loss recovery strategy: main active services
DyRAM is NACK-based …
• Global NACK suppression
• Early packet loss detection
• Subcast of repair packets
• Dynamic replier election
Global NACK suppression
[Figure: packet data4 is lost; only one NACK is forwarded to the source.]
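A router-side view of NACK suppression can be sketched as follows. This is a minimal, hypothetical illustration (the class and method names are ours, not DyRAM's): the router forwards only the first NACK it sees for a given sequence number and records which downstream links asked for it, so the repair can later be sent back toward them. The timers a real protocol needs, so that a NACK can be re-sent when no repair arrives, are not modelled.

```python
class NackSuppressor:
    """Forward one NACK per lost packet upstream; remember who asked."""

    def __init__(self):
        # seq -> set of downstream link ids that sent a NACK for it
        self.pending = {}

    def on_nack(self, seq, link):
        """Record a NACK; return True only if it must be forwarded upstream."""
        first = seq not in self.pending
        self.pending.setdefault(seq, set()).add(link)
        return first

    def on_repair(self, seq):
        """Clear state when the repair arrives; return the requesting links."""
        return self.pending.pop(seq, set())


if __name__ == "__main__":
    router = NackSuppressor()
    for link in (1, 2, 1):
        print("NACK 4 on link", link, "forwarded:", router.on_nack(4, link))
    print("repair 4 goes to links", sorted(router.on_repair(4)))
```

Whatever the number of receivers that lose data4, the source sees a single NACK from this router; the per-sequence state is dropped as soon as the repair passes through.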
Early packet loss detection
The repair latency can be reduced if the lost packet is requested as soon as possible.
[Figure: stream of packets data3, data4, data5; when data4 is lost, a NACK is sent by the router, and the receivers' NACKs for it are ignored.]
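Early detection can be sketched as a sequence-number gap check inside the router. In this hypothetical sketch (names and structure are assumptions, not DyRAM's implementation), the router tracks the next expected sequence number; when a packet arrives beyond it, the router immediately NACKs the missing numbers upstream itself and then treats the receivers' NACKs for those numbers as redundant.

```python
class LossDetector:
    """Router-side gap detection on one multicast stream (illustrative)."""

    def __init__(self):
        self.next_expected = 0
        self.requested = set()  # seqs this router has already NACKed upstream

    def on_data(self, seq):
        """Process a data packet; return the seqs to NACK upstream right away."""
        missing = []
        if seq > self.next_expected:
            # Gap: every seq between next_expected and seq - 1 was lost upstream.
            missing = [s for s in range(self.next_expected, seq)
                       if s not in self.requested]
            self.requested.update(missing)
        self.next_expected = max(self.next_expected, seq + 1)
        self.requested.discard(seq)  # a late or repaired copy fills the hole
        return missing

    def on_receiver_nack(self, seq):
        """Return True if a receiver's NACK still has to be forwarded upstream."""
        return seq not in self.requested


if __name__ == "__main__":
    d = LossDetector()
    for seq in (0, 1, 2, 3, 5):
        print("data", seq, "-> router NACKs", d.on_data(seq))
    print("forward receiver NACK for 4?", d.on_receiver_nack(4))
```

The router sees the gap one hop earlier than the receivers, so its NACK leaves sooner, which is where the latency gain comes from.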
Replier election
A receiver is elected to be a replier for each lost packet (one recovery tree per packet).
Load balancing can be taken into account for the replier election.
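How load balancing could enter the election is sketched below. This is a hypothetical policy, not necessarily DyRAM's: among the receivers assumed to hold the lost packet (how a router learns this set is left out), it picks the one that has served the fewest repairs so far, breaking ties by name, and charges the new repair to it.

```python
def elect_replier(candidates, load):
    """Elect the replier for one lost packet (illustrative policy).

    candidates: receivers assumed to hold the packet.
    load: receiver -> number of repairs it has already served (updated here).
    Returns the least-loaded candidate, or None if there is no candidate.
    """
    if not candidates:
        return None  # no local copy: the request has to go further upstream
    replier = min(candidates, key=lambda r: (load.get(r, 0), r))
    load[replier] = load.get(replier, 0) + 1
    return replier


if __name__ == "__main__":
    served = {}
    for lost_seq in range(4):
        print("packet", lost_seq, "->", elect_replier(["R1", "R2"], served))
    print(served)
```

Because a new replier is chosen per lost packet, successive repairs rotate across receivers instead of always landing on the same host.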
Replier election and repair subcast
[Figure: two DyRAM routers (D0, D1) on top of IP multicast, with receivers R1 to R7. Labels: "NAK 2 from link 1", "NAK 2 from link 2", "NAK 2,@", "Repair 2".]
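The difference between flooding a repair and subcasting it can be shown in a few lines. In this hypothetical sketch (function and parameter names are ours), the router keeps a table of which downstream links NACKed each packet; a repair is then sent only on those links, never back on the link it came from, whereas plain IP multicast would send it on every downstream link.

```python
def forward_repair(seq, arrival_link, all_links, nack_table):
    """Return the downstream links a repair for seq is subcast on.

    nack_table: seq -> set of downstream links that NACKed seq (consumed here).
    arrival_link: link the repair arrived on, or None if it came from upstream.
    """
    requested = nack_table.pop(seq, set())
    return sorted(l for l in all_links if l in requested and l != arrival_link)


if __name__ == "__main__":
    table = {2: {1, 2}}
    # Repair 2 comes from a replier on link 0; links 1 and 2 asked for it.
    print(forward_repair(2, 0, [0, 1, 2], table))
```

Receivers that already hold the packet never see the repair, which saves bandwidth on the links below the router.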
The DyRAM framework for grids
The backbone is very fast, so its routers perform nothing other than fast forwarding functions.
Any receiver can be elected as a replier for a lost packet.
A hierarchy of active routers can be used to process specific functions at different layers of the hierarchy.
[Figure: the source is attached over 1000 Base FX to an active router running NACK suppression, subcast and loss detection; the core network runs at Gbit/s rates; further active routers, attached over 100 Base FX, run NACK suppression, subcast and replier election.]
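One way to read the figure's placement of services is as a per-tier table. The sketch below is assumption-laden: the tier names "core", "source_side" and "receiver_side" are hypothetical labels for the fast backbone, the router next to the source, and the routers closer to the receivers; the service sets follow the figure.

```python
from enum import Flag, auto


class Service(Flag):
    NACK_SUPPRESSION = auto()
    SUBCAST = auto()
    LOSS_DETECTION = auto()
    REPLIER_ELECTION = auto()


# Hypothetical tier names; service sets as read from the figure.
TIER_SERVICES = {
    "core": Service(0),  # fast forwarding only, no active service
    "source_side": Service.NACK_SUPPRESSION | Service.SUBCAST | Service.LOSS_DETECTION,
    "receiver_side": Service.NACK_SUPPRESSION | Service.SUBCAST | Service.REPLIER_ELECTION,
}


def runs(tier, service):
    """True if routers in the given tier run the given active service."""
    return bool(TIER_SERVICES[tier] & service)


if __name__ == "__main__":
    for tier in TIER_SERVICES:
        active = [s.name for s in Service if runs(tier, s)]
        print(tier, active)
```

Keeping the core tier empty reflects the slide's point that backbone routers should only forward, while each active tier carries just the light services it needs.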
Some simulation results
• Network model and metrics used
• Local recovery from the receivers
• DyRAM vs. ARM (cache in routers)
• DyRAM: early lost packet detection
Network model
10 MByte file transfer.
[Figure: simulated network topology, with the source router marked.]
Metrics
• Load at the source: the number of retransmissions from the source.
• Load at the network: the consumed bandwidth.
• Completion time per packet (latency).
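The three metrics can be computed from a simulation trace along these lines. This is a hypothetical sketch: the Send record and its fields are assumptions, consumed bandwidth is approximated as bytes multiplied by the number of links each copy traverses, and the per-packet completion time is averaged over packets for illustration.

```python
from dataclasses import dataclass


@dataclass
class Send:
    node: str             # who transmitted: "source" or a receiver id
    seq: int              # packet sequence number
    links: int            # number of links this copy traversed
    retransmission: bool  # False for the original transmission


def source_load(trace):
    """Load at the source: number of retransmissions sent by the source."""
    return sum(1 for s in trace if s.node == "source" and s.retransmission)


def network_load(trace, packet_bytes):
    """Load at the network: bytes carried, counted once per link traversed."""
    return sum(s.links * packet_bytes for s in trace)


def completion_time(first_sent, last_received):
    """Mean per-packet latency: last delivery time minus first send time."""
    delays = [last_received[seq] - first_sent[seq] for seq in first_sent]
    return sum(delays) / len(delays)


if __name__ == "__main__":
    trace = [Send("source", 0, 4, False), Send("source", 0, 2, True),
             Send("R1", 1, 1, True)]
    print(source_load(trace), network_load(trace, 1000))
```

A repair served by a receiver (like the R1 entry) adds network load but does not count against the source, which is how local recovery shows up in the first metric.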
Local recovery from the receivers (1)
4 receivers/group, #grp: 6…24, p = 0.25
Local recovery reduces the end-to-end delay, especially for high loss rates and a large number of receivers.
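A back-of-the-envelope estimate suggests why the gain grows with the loss rate and the number of receivers. Assuming, purely for illustration, that each of R receivers loses a given packet independently with probability p (reading the slide's p = 0.25 as such a loss rate is an assumption), the probability that at least one receiver needs a repair is:

```latex
P_{\text{repair}} = 1 - (1 - p)^{R}
```

With p = 0.25 and R = 24 (the smallest setting, 6 groups of 4 receivers), (0.75)^24 is about 0.001, so nearly every packet triggers at least one recovery; each recovery served from a nearby receiver instead of the source avoids a round trip to the source.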
Local recovery from the receivers (2)
48 receivers distributed in g groups, #grp: 2…24
As the group size increases, performing the recoveries from the receivers greatly reduces the bandwidth consumption.
DyRAM vs ARM
ARM performs better than DyRAM only for very low loss rates, and at the cost of considerable caching requirements.
DyRAM: early lost packet detection
4 receivers/group, #grp: 6…24
The end-to-end latency decreases when early lost packet detection is enabled.
Conclusions
• Reliability in large-scale multicast is difficult.
• Active services can provide more efficient solutions to reliable multicast problems.
• DyRAM's main design goal is to reduce end-to-end latency using active services, which are kept as light as possible, making DyRAM more suitable for grid applications.