Application Layer Multicast

Application Layer Multicasting
Yash Sheth (42961722)
Swapnil Tiwari (63242307)
Vishal Sahu (19489633)
Application Layer Multicast
Multicasting is increasingly used as an efficient and scalable mechanism for disseminating data across heterogeneous systems on the Internet.
Today's IP multicast is restricted to separate "islands" under singular control, and ISPs are reluctant to enable it in order to reduce router load.
To overcome this dependence on routers, there is a need for efficient data delivery without modifying the network. This is where the application layer comes in!
All multicast activity is implemented by the peers instead of the routers.
The goal of such a multicast protocol is to construct and maintain an efficient overlay network of nodes for data transmission.
Overlay networks are formed by the peers that need to transmit or receive the data. They self-organize into logical topologies that facilitate multi-point message forwarding.
Paper Study:
1. C.K. Yeo, B.S. Lee, M.H. Er, "A Survey of Application Level Multicast Techniques"
2. S. Banerjee, B. Bhattacharjee, C. Kommareddy, "Scalable Application Layer Multicast"
3. B. Zhang, S. Jamin, L. Zhang, "Host Multicast: A Framework for Delivering Multicast to End Users"
4. Kyungbaek Kim, Sharad Mehrotra, Nalini Venkatasubramanian, "FaReCast: Fast, Reliable ALM for Flash Dissemination"
Paper 1: A Survey of Application Level Multicast Techniques
In this paper, we come across various kinds of overlay networks and their characteristics, and compare their performance on a few standard metrics.
Classification of Application Layer Multicast
1. Overlay Topology Design
2. Service Model
3. Architecture
Classification by Overlay Topology
Control Topology and Data Topology together determine the overall structure of the overlay network.
Tree-Based Topology:
Application Layer Multicast Architecture (ALMA)
Self-configuring, source-specific trees in which the data topology is contained within the control topology. A centralized Directory Server (DS) maintains an updated record of the tree.
New members query the DS for a list of potential parents and inform it whenever they change their parent.
Each member evaluates the candidates in its ready-list using two performance metrics, loss rate and RTT, reported by its peers.
This is similar to a gossip algorithm, where each member shares its end-to-end metric data with its gossip candidates.
Members switch their parent to the one with the best distance metric among their ancestors and siblings; this is called spiraling.
Loop detection in this dynamic tree topology is implemented by maintaining a level number for each member, which is updated relative to the new parent's level whenever the member adopts a new parent (see the sketch below).
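To make the level-number idea concrete, here is a minimal Python sketch (class and field names are ours, not ALMA's): a member only accepts a parent whose level is strictly smaller than its own, and re-derives its level from the new parent. This is one simple way such a level check can keep the dynamic tree loop-free, not the paper's exact algorithm.

```python
class Member:
    """Toy overlay member with a level number (the root has level 0)."""
    def __init__(self, name, level):
        self.name = name
        self.level = level
        self.parent = None

    def switch_parent(self, candidate):
        """Adopt `candidate` as parent only if it cannot be our descendant.

        With up-to-date levels, every descendant has a strictly larger level
        than we do, so rejecting candidates whose level is >= ours avoids
        creating a loop.  On success we re-derive our own level.
        """
        if candidate.level >= self.level:
            return False
        self.parent = candidate
        self.level = candidate.level + 1
        return True


root = Member("root", 0)
a = Member("A", 1); a.parent = root
b = Member("B", 2); b.parent = a

print(b.switch_parent(root))   # True: the root (level 0) is a safe parent
print(a.switch_parent(b))      # False: B's level is not smaller than A's, so it is rejected
```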

Application Level Multicast Infrastructure (ALMI)
Provides multicast middleware implemented above the socket layer, tailored to small groups in a many-to-many service model.
A central controller creates a minimum spanning tree (MST) of unicast connections between the end hosts.
The latency between members is used as the link cost of the MST, thereby optimizing the ALMI overlay for low-latency data delivery.
Dynamic changes can create different versions of the MST, which the controller tracks with a version number; this prevents loops and partitions in the new tree.
The new MST is periodically sent to members holding the previous version to update their knowledge of the connections.
A packet carrying a newer version number is not propagated by a member holding an older version until that member is updated (see the sketch below).
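A hedged sketch of that version check (class and field names are illustrative, not ALMI's actual message format): a member holds back packets stamped with a newer tree version than its own until the controller's updated MST arrives.

```python
from dataclasses import dataclass, field

@dataclass
class Packet:
    payload: bytes
    tree_version: int          # version of the MST the sender was using

@dataclass
class AlmiMember:
    tree_version: int = 0      # version of the MST this member knows
    children: list = field(default_factory=list)

    def on_update(self, new_version, new_children):
        """The controller pushed a newer MST: adopt it."""
        if new_version > self.tree_version:
            self.tree_version = new_version
            self.children = new_children

    def forward(self, pkt: Packet):
        """Forward only if our tree knowledge is at least as new as the packet's.

        Holding back packets with a newer version keeps us from pushing data
        along links that may no longer exist in the new tree (loops/partitions).
        """
        if pkt.tree_version > self.tree_version:
            return []            # stall until we receive the updated MST
        return list(self.children)


m = AlmiMember(tree_version=3, children=["B", "C"])
print(m.forward(Packet(b"data", tree_version=4)))  # [] -> member is stale
m.on_update(4, ["B", "D"])
print(m.forward(Packet(b"data", tree_version=4)))  # ['B', 'D']
```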
Banana Tree Protocol (BTP)
Uses a receiver-based, self-organizing approach to build a shared group data tree, designed mainly for distributed file-sharing applications.
The first host to join the group becomes the root; subsequent members learn of the root and join the tree.
The tree is incrementally optimized for low delay by allowing a node to switch to a sibling if it is closer than its parent. The sibling information is provided to each node by its new parent.
Loop prevention is handled by fulfilling these conditions (see the sketch below):
A node rejects all switch attempts while it is itself in the process of switching parents.
A node must include its current parent information when switching, so the potential parent can validate that it really is a sibling.
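The sketch below illustrates those two conditions in Python (class names and method signatures are assumptions, not BTP's actual messages): a prospective parent refuses the switch while it is itself switching, and only accepts a requester whose declared parent matches its own, i.e. a genuine sibling.

```python
class BtpNode:
    """Toy BTP node illustrating the two switch-safety rules (simplified)."""
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = set()
        self.switching = False          # am I in the middle of a parent switch?

    def request_switch(self, new_parent, claimed_parent):
        """Ask `new_parent` to adopt us, telling it who our current parent is."""
        return new_parent.accept_switch(self, claimed_parent)

    def accept_switch(self, child, childs_parent):
        # Rule 1: refuse while we ourselves are switching parents.
        if self.switching:
            return False
        # Rule 2: only accept a node that really is our sibling,
        # i.e. its current parent must be the same as ours.
        if childs_parent is not self.parent:
            return False
        self.children.add(child)
        child.parent = self
        return True


root = BtpNode("root")
a = BtpNode("A", parent=root); b = BtpNode("B", parent=root)
print(a.request_switch(b, claimed_parent=root))   # True: A and B are siblings
```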
Host Multicast (HM)
Provides best-effort delivery to applications while remaining compatible with IP multicast to the furthest extent possible.
Automates the interconnection of IP multicast islands and provides multicast to "incapable" end hosts via unicast tunnels.
Multicast islands are connected via UDP tunnels between the Designated Members (DMs) of the islands.
It has a distributed tree-building protocol that scales to O(N).
A new member discovers the root of the shared tree through a Rendezvous Point (RP). It then sets the root as a potential parent and retrieves its list of children.
A DFS-style search locates the closest parent (see the sketch below).
Loop detection is easy, as each member knows its entire path from the root.
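A minimal sketch of the DFS-style descent (the data structures and the `rtt` probe are assumptions, not HMTP's real message exchange): starting from the root learned via the RP, the joining host keeps moving to the closest child until the current candidate is closer than all of its children.

```python
class Node:
    """Toy tree node: `children` is the list a queried member would return."""
    def __init__(self, name, children=()):
        self.name, self.children = name, list(children)

def find_parent(root, rtt):
    """Greedy, DFS-style descent toward the closest parent (sketch only)."""
    candidate = root
    while True:
        children = candidate.children
        if not children:
            return candidate
        closest_child = min(children, key=rtt)
        # stop when no child is closer than the current candidate parent
        if rtt(candidate) <= rtt(closest_child):
            return candidate
        candidate = closest_child

# toy topology and RTTs measured by the joining host (numbers invented)
leaf1, leaf2 = Node("C"), Node("D")
root = Node("A", [Node("B", [leaf1, leaf2])])
rtt_ms = {"A": 80, "B": 30, "C": 25, "D": 90}
print(find_parent(root, lambda n: rtt_ms[n.name]).name)   # "C"
```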
Overlay Multicast Network Infrastructure (OMNI)
Targeted at media streaming applications.
A single-source overlay tree aimed at minimizing the average end-to-end latency.
Each member maintains information about its neighbors and ancestors.
Storing the root path aids in loop detection.

Mesh Tree
A two-step approach:
1. Group members self-organize into a control topology called a mesh.
2. A routing protocol runs across this control topology to define a unique path to each member.
Advantage over the tree approach:
Lower construction and maintenance overhead, as the routing protocol itself takes care of loop avoidance and detection.
Narada
First, a mesh control topology is built across the member nodes; the nodes then self-organize into source-rooted multicast trees to form the data topology, using a DVMRP-style routing protocol.

Each member maintains state information about all other members, which is periodically exchanged, leading to a large number of messages.

The overhead of member discovery and membership maintenance is considerable and limits Narada to small and medium groups.

The utility of each link is periodically evaluated, and links may be deleted or added as needed.

Kudos

An extension of Narada with a hierarchical topology. It partitions nodes into clusters, each with a unique representative (cluster head). Within each cluster an independent instance of Narada is run; at the top level, Narada runs on an overlay comprising all the cluster heads.

The data topology comprises independent trees at the bottom level and a single tree at the top level.

The architecture is complex due to cluster migration, diffusion and splitting.

However, it provides superior scalability and low management overhead.

Scattercast
Similar to Narada but uses ScatterCast proxies (SCXs), which are infrastructure service agents.

Uses Gossamer to self-organize into a mesh control topology. Source-rooted distribution trees are then constructed on top using a routing algorithm.

Gossip-style member discovery: each member periodically picks random nodes with which to exchange membership information.

Latency is used as the cost metric.
The mesh is optimized using a cost function, namely the cost of routing to the different sources via a member's neighbors.

Embedded Structure
Members of the overlay network are assigned logical addresses from some abstract coordinate space.
The overlay is built using these logical addresses.
Topology agnostic, with some exceptions.
Highly scalable.

ALM - CAN
Designed to scale to large groups with a multiple-source service model.
Uses the underlying CAN architecture: a virtual d-dimensional space divided into zones.
Each node owns its zone and maintains a routing table consisting only of its neighbors.
Control topology: the d-dimensional coordinate space.
Data topology: directed flooding over the control topology.
Dissemination follows a data-forwarding rule (see the sketch below):
◦ Forward only to neighbors, except the one the packet was received from.
◦ Do not forward once the packet has travelled half the space forward from the source.
When a node joins, its zone is split; when a node leaves, the zones are re-merged.
Long data distribution paths.
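A one-dimensional simplification of that forwarding rule, with invented coordinates (the real CAN rule is applied per dimension of the d-dimensional torus):

```python
def should_forward(src, here, nxt, space_size):
    """1-D simplification of CAN's directed flooding rule (sketch only).

    Forward from `here` to neighbor `nxt` only while the packet keeps moving
    away from the source `src` and has not yet covered half of the wrapped
    coordinate space."""
    def dist(a, b):                        # distance on a wrapped coordinate axis
        d = abs(a - b)
        return min(d, space_size - d)
    return dist(src, nxt) < space_size / 2 and dist(src, nxt) > dist(src, here)

print(should_forward(src=0, here=2, nxt=3, space_size=8))   # True: still spreading out
print(should_forward(src=0, here=2, nxt=1, space_size=8))   # False: back toward the source
print(should_forward(src=0, here=3, nxt=4, space_size=8))   # False: reached half the space
```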

ALM-DT
Designed to support large multicast group sizes.
Established and maintained in a distributed fashion.
Each member is assigned (x, y) coordinates.
The distance metric is the neighbor test.
The control topology consists of combinations of Delaunay triangulations (DTs).
Nodes only maintain information about their neighbors.
A source-rooted data topology is embedded in the DT.
Uses compass routing.
New nodes join via the DT server.
A set of alternative routes is available in case of failure.
Suboptimal mapping of the overlay onto the physical network.
Compass Routing
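Compass routing forwards a packet, at each hop, to the neighbor whose edge points most directly toward the destination's coordinates. A generic sketch of one such step (textbook compass routing, not ALM-DT's actual code):

```python
import math

def compass_next_hop(current, neighbors, dest):
    """Pick the neighbor whose edge makes the smallest angle with the straight
    line from `current` to `dest`.  Points are (x, y) tuples."""
    def angle_to_dest(nbr):
        ax, ay = dest[0] - current[0], dest[1] - current[1]
        bx, by = nbr[0] - current[0], nbr[1] - current[1]
        dot = ax * bx + ay * by
        norm = math.hypot(ax, ay) * math.hypot(bx, by)
        return math.acos(max(-1.0, min(1.0, dot / norm)))
    return min(neighbors, key=angle_to_dest)

# on a Delaunay-style neighborhood, repeating the compass hop moves the
# packet toward the destination's coordinates
print(compass_next_hop((0, 0), [(1, 2), (2, 0), (-1, -1)], dest=(5, 1)))  # (2, 0)
```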
Bayeux
Designed primarily to be fault tolerant.
Control topology: a Tapestry overlay.
Data topology:
◦ Built explicitly over the control overlay.
◦ Consists of nodes acting both as software routers and as multicast receivers.
Routing is done using local routing maps stored at each node.
Any destination can be reached in log_b N steps (see the sketch below).
Topology aware.
Joining and leaving the group generate more traffic, since more group membership information is maintained.
The root is a single point of failure.
◦ This can be avoided by a multicast tree partitioning mechanism.
Routing in Bayeux
Join Request in Bayeux
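The log_b N bound mentioned above comes from resolving the destination's identifier one digit at a time: each overlay hop fixes one more digit, so identifiers of log_b N digits need at most that many hops. A simplified illustration (ignoring Tapestry's actual routing-map details):

```python
def route_hops(src_id, dest_id, base=16, digits=8):
    """Count hops under idealized digit-by-digit (Plaxton-style) routing:
    each hop fixes one more digit of the destination ID, so any node is
    reached in at most `digits` = log_base(N) hops.  Sketch only."""
    hops = 0
    current = src_id
    for i in reversed(range(digits)):          # fix digits from most significant
        divisor = base ** i
        cur_digit = (current // divisor) % base
        dst_digit = (dest_id // divisor) % base
        if cur_digit != dst_digit:
            # "hop" to a node that agrees with the destination on this digit
            current += (dst_digit - cur_digit) * divisor
            hops += 1
    return hops

print(route_hops(0x1A2B3C4D, 0x1A2B9999))   # 4: only the last four hex digits differ
```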
NICE
Designed for low-bandwidth data streaming applications with large receiver sets.
The distance metric is the latency between members.
Control topology: a multilevel hierarchy.
A distributed clustering protocol at each layer chooses a cluster leader, the graph-theoretic center of the cluster in layer Li, to join layer Li+1.
The cluster size at each layer is between k and 3k-1, where k is a constant, and each cluster comprises members that are close to each other.
Joining operation:
◦ A new member is bootstrapped by the RP to the hosts in the highest layer (Li) of the hierarchy.
◦ The joining host contacts the members of this layer to find the closest one by latency.
◦ The selected node tells the joining node about its peers in layer Li-1.
◦ The process continues iteratively until the node locates its cluster in layer L0.
Multilayered hierarchy in NICE
NICE (Contd)
Topology aware, since proximity is based on the latency between nodes.
In the control topology, all members of a cluster peer with each other and exchange refresh messages incorporating the latencies between them.
Members also probe peers in their supercluster to identify a new cluster leader and adapt to changing network conditions.
Cluster leaders are responsible for splitting and merging clusters to maintain the size bounds.
Worst-case control overhead: O(k log N).
Number of application-level hops between any pair of members: O(log N).
NICE (Contd)

Data topology: an embedded source-specific tree with a simple forwarding rule:
◦ The source sends the packet to all its peers in the control topology.
◦ A receiver forwards the packet only if it is a cluster leader.
Loops and partitions may occur due to stale cluster membership or node failures.
There are no special mechanisms for handling them; the protocol relies on cluster members and leaders to restore the hierarchical relationships and reconcile the cluster view for all members.

Flat Topology vs Hierarchical Topology
In a flat topology, there is only a single-layer overlay sitting atop the physical network.
In a hierarchical topology, a hierarchy is introduced into the overlay, forming a multi-layer overlay.
Hierarchical topologies are highly scalable and incur low management overhead.
However, child nodes cannot form overlay links across clusters, and additional overhead is required for cluster maintenance.
Service Models
The mechanism by which data is delivered to multiple destinations.
Best-effort vs. reliable delivery.
Source-specific vs. any-source delivery.
Determines the type of application the various multicast techniques can support (e.g., video conferencing vs. data replication).

Best Effort vs Reliable Transfer
Reliability here refers to end-to-end reliability across the overlay, unlike per-hop TCP connections, which only provide hop-to-hop reliability.
Best-effort delivery must be augmented with data duplication and retransmission to become reliable.
This puts pressure on the source to handle retransmission and rate control.
Receivers can implement buffering and rate-limiting capabilities to help the source node.

Best Effort vs Reliable Transfer (contd.)
ALMI uses a direct connection to the source for retransmission during error recovery.

Scattercast uses Scalable Reliable Multicast (SRM) to recover lost data: a recovery request is forwarded toward the source through intermediate proxy nodes (SCXs) until one of the SCXs can recover the data.

Some protocols use heuristics to choose the best outgoing path or to predict the shortest and most reliable link to the next hop.

Source-Specific vs Any-Source
Source-specific:
Builds a data topology tree rooted at a single source.
Simple, and grows linearly as nodes are added.
Incurs the overhead of maintaining and optimizing multiple trees.

Any-source:
Any member can be the root of the tree.
Less overhead in tree building and management.
But not optimized for any single source.

Source-Specific vs Any-Source (contd.)
The type of model adopted by a protocol depends on its targeted applications.

Scattercast uses the single-source model, as it is heavily used for media streaming and broadcasting.

ALMI, which primarily targets collaborative applications, uses the any-source model.

Architecture
Peer-to-Peer (P2P)
All functionality is vested in the end users.
Simple to set up and deploy.
Provides redundancy and resource sharing.
Proxy Support
Dedicated servers are strategically deployed within the network to provide efficient data distribution and value-added services.
Better equipped than individual hosts to provide efficient services.
Usually static in nature, and hence less responsive to changing conditions.
Deployment can be problematic.
Peer-to-Peer vs Proxy (contd.)
OMNI deploys proxies called Multicast Service Nodes (MSNs). Prior to data transfer, the MSNs organize themselves into a data delivery tree.
Overcast organizes proxies called Overcast nodes into a distribution tree rooted at a single source. The proxies also provide storage capability in the network.
Scattercast uses its proxies (SCXs) to build an application-aware framework (providing, for example, rate adaptation). SCXs are also used to repair partitions in the overlay network.
Host Multicast (HM) uses Designated Members (DMs) to build a shared data tree. DMs are not proxies, as they are not dedicated server nodes.
Architecture
Centralized
All responsibility for group management, overlay computation and optimization rests with a central controller.
This simplifies the routing algorithm and allows efficient, simple group management.
However, scalability is limited and the controller is a central point of failure.

Distributed
Group management and overlay computation are vested in the individual nodes of the network.
Requires some mechanism for a host to learn about some nodes in the multicast group.
More robust to failure, and more scalable.
But it incurs excessive overhead and is not optimal or efficient in building the overlay network.

Architecture (contd…)
Hybrid
A lightweight controller combined with a distributed algorithm.
The controller facilitates group management, error recovery and some of the overlay construction.
The actual overlay construction is done by the individual nodes.

Centralized, Distributed and Hybrid (contd.)
ALMI is the only protocol that uses a completely centralized approach, where the session controller has complete knowledge of the membership.
The Overcast protocol maintains a global registry through which proxies learn about the overlay network.
SCRIBE requires a contact node in the overlay network that is responsible for bootstrapping the new node.
NICE and TBCP need a rendezvous point to learn about higher-level/root nodes.
ALMA and HM are examples of the hybrid approach: they both use a centralized server as a rendezvous point for new members, and the RP maintains the membership information.

Performance Comparison
Scalability
◦ Limited by protocol efficiency.
◦ The number of receivers supported can be a good metric.
◦ Another metric is the protocol overhead.
◦ A system that scales further is not necessarily better than one that does not scale to the same extent!
Performance Comparison
Efficiency of protocols
◦ Depends on factors such as:
 the quality of the data paths generated,
 the protocol overhead,
 the time and resources used to construct the application layer multicast overlay, and
 the amount of state information required at each node to maintain the overlay.
Performance Comparison
Quality of the data path:
◦ Stretch
 Also called the Relative Delay Penalty.
 A measure of the increase in latency that applications incur when using overlay routing.
 Stretch = (overlay routing distance used by the protocol) / (IP unicast routing distance).
◦ Stress
 A measure of the effectiveness of the protocol in distributing the network load across different physical links.
 Defined on a per-node and per-link basis.
 Counts the number of identical packets sent by the protocol over that link.
◦ All comparisons are made against IP multicast.
◦ As these factors are difficult to measure, other factors such as path length (for stretch) and out-degree (for stress) can be used for analysis.
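A toy numeric illustration of the two metrics (all numbers invented):

```python
# Stretch: overlay path S -> A -> R costs 30 + 40 = 70 ms, unicast S -> R is 50 ms.
overlay_path_ms = [30, 40]
unicast_ms = 50
stretch = sum(overlay_path_ms) / unicast_ms
print(f"stretch = {stretch:.2f}")            # 1.40

# Stress: count identical copies of a packet crossing each physical link.
# Suppose overlay copies S->A and S->B both traverse physical link L1.
copies_per_link = {"L1": 2, "L2": 1, "L3": 1}
print("max stress =", max(copies_per_link.values()))   # 2 (IP multicast would be 1)
```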
Performance Comparison
Protocol overhead
◦ Measured as the total bytes of non-data traffic entering the network.
◦ Includes control and maintenance traffic, probe traffic, etc.
◦ State information maintained by a node also contributes to the protocol overhead.
Failure tolerance
◦ Depends on the mechanisms put in place for error and failure recovery.
◦ Whether the protocol is prone to a single point of failure, or whether failures can be contained and localized.
Paper 2: Host Multicast: A Framework for Delivering Multicast to End Users
In this paper, we study the technique of Host Multicast (HM), an application layer technique.
Along with overcoming the deployment and control pitfalls of IP multicast, HM is compatible with the existing IP multicast infrastructure.
The paper goes on to discuss the Host Multicast Tree Protocol (HMTP), suggests some performance tuning, and provides a performance evaluation of the technique.
Host Multicast
A hybrid approach combining application layer multicast and IP multicast.
The Host Multicast framework provides best-effort delivery to applications.
Compatibility with IP multicast allows it to accept and forward packets from existing IP multicast networks, so it is deployable on the existing Internet.
No support is required from the OS, routers or servers. Hence, it supports transmission of data across heterogeneous systems irrespective of the underlying network.

HM Architecture
[Figure: IP multicast islands of hosts, each with a Designated Member (DM); the DMs are connected to each other by unicast tunnels, alongside normal members and a Host Multicast Rendezvous Point (HMRP/RP).]
Each cloud in the figure is an existing IP multicast island, linked to the others by a channel of communication. HM is used to distribute data over such a network.

HM Architecture
Every island has a Designated Member (DM) assigned to communicate with the other islands.
Any two DMs are connected via a unicast tunnel over UDP.
All DMs run HMTP to self-organize their islands into a bi-directional shared tree, with a special node assigned as the root.
From any member to the root there is only one loop-free path.
Incoming data is replicated and sent to the remaining neighbors, and is thus capable of reaching any node on the tree.
Each multicast group has a Group ID and a session directory entry stored at the Host Multicast Rendezvous Point (HMRP).
Host Multicast Tree Protocol (HMTP)
Builds a bi-directional shared tree connecting all the islands.
The tree should be congruent to the physical network topology to be efficient; the current design uses member-to-member round-trip time as the distance metric.
The tree should be robust: it must handle node failures, dynamic joins and leaves, etc.
In HMTP, members maintain the tree structure by periodically exchanging messages with their neighbors.
The mechanisms used to detect node failures and loop formation are described below.
Join
The root of the tree, along with its directory entry, is always known to the HMRP.
A new member performs a DFS-style search to find the closest parent.
Clustering nearby members makes the tree congruent to the actual physical network topology.
[Figure: example tree of members A–H; the joining member asks the RP "Where is my group?" and the RP replies "Root of your group is A".]
Maintenance and Member Leaving
When a node leaves, its parent and children are automatically notified.
Each member keeps its children list and root path up to date by exchanging REFRESH and PATH messages with its neighbors.
The root sends REFRESH messages to the HMRP.

Loop Detection and Resolution
While building the bi-directional shared tree connecting all islands, a loop is possible when multiple conflicting joins happen at the same time.
Loop detection: a member's root path contains the member itself (see the sketch below).
Resolution: leave the current parent and re-join the tree from the root.
Since simultaneous conflicting joins are uncommon, loops are rare.
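A sketch of the root-path check (data layout assumed): each member learns its parent's path to the root via PATH messages, so a loop is detected the moment the member's own address appears in that path, at which point it re-joins from the root.

```python
def detect_loop(member_addr, parent_root_path):
    """HMTP-style check (sketch): the parent advertises its path to the root;
    if our own address already appears in it, keeping this parent would
    close a loop, so we must leave it and re-join from the root."""
    return member_addr in parent_root_path

# parent F advertises root path [A, C, F]; member C sees itself -> loop
print(detect_loop("C", ["A", "C", "F"]))   # True  -> re-join from the root
print(detect_loop("H", ["A", "C", "F"]))   # False -> safe to keep this parent
```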
Performance Metrics
Tree cost: the sum of all tree link delays.
◦ Reported as the cost ratio relative to IP multicast.
Tree delay: the delay from one member to another along the tree.
◦ Reported as the delay ratio relative to unicast delay.
Link load
◦ The number of duplicate packets carried by a physical link.
All metrics are compared against a unicast star and IP multicast.
Performance Comparisons
Conclusion
The authors discuss the technique of Host Multicast and compare its performance in various test scenarios, mainly against unicast and IP multicast.
This demonstrates the efficiency of Host Multicast and the HMTP protocol.
There has been a lot of related work around HM, including ALMI, Narada, Gossamer, Yoid and BTP.
Suggested future work includes reducing HMTP delay, supporting multiple DMs on a single island, etc.

Paper 3: Scalable Application Layer Multicast

Proposal:
The paper proposes NICE, a highly scalable application layer multicast protocol specially designed for low-bandwidth data streaming applications with large receiver sets.
Examples: news and sports ticker services, real-time stock updates, etc.

NICE is a recursive acronym for "NICE is the Internet Cooperative Environment".
Application Layer Multicast
End nodes of the overlay network replicate data instead of the routers.
Intuitively less efficient than native multicast.
We therefore need a protocol that:
◦ delivers data efficiently along the data paths,
◦ is robust to failures, and
◦ introduces minimal control overhead.
Hierarchical Arrangement
Members are assigned to different layers.
Layers are sequentially numbered, with the lowest layer being layer zero (L0).
All hosts belong to L0.
Hosts in each layer are partitioned into clusters.
A cluster consists of hosts separated by low latencies.
The size of a cluster is between k and (3k - 1), where k is a constant.
Each cluster has a leader: the host at minimum-maximum distance from all other hosts in the cluster.
The cluster leaders of all clusters in layer Li-1 join layer Li.
There are at most log_k N layers, and the highest layer has a single member (see the illustration below).
A host maintains information about all the clusters it belongs to and about its supercluster.
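A quick numeric illustration of these bounds (the values of k and N are chosen arbitrarily):

```python
import math

k, N = 3, 10_000                        # protocol constant k and group size N
print(math.ceil(math.log(N, k)))        # at most ceil(log_k N) = 9 layers
print(k, 3 * k - 1)                     # every cluster holds between 3 and 8 members
```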
Control Topology
Neighbors in the control topology exchange soft-state refreshes.
Members do not generate high volumes of traffic, since refreshes stay within clusters rather than going to all remaining members of the group.
Each member of a cluster exchanges soft-state refreshes with its cluster peers.
This allows all the cluster members to identify changes in cluster membership.
The control topology within a cluster is a clique.
The worst-case control overhead, at a node in the highest layer, is O(k log N).
Clusters in NICE
Data Topology
The data path is a source-specific tree implicitly defined by the control topology.
The source forwards data to all the members of its clusters.
Receiving nodes forward the data only if they are cluster leaders (see the sketch below).
Within each cluster, the resulting data topology is a star.
The number of application-level hops between any pair of members is O(log N).
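A sketch of that forwarding rule (the Cluster and Node classes are ours, not the paper's): a node forwards a packet into every cluster it belongs to except the one it arrived from; since only cluster leaders belong to clusters in higher layers, ordinary members stop forwarding automatically, which matches the "only leaders forward" rule stated above.

```python
from dataclasses import dataclass, field

@dataclass
class Cluster:
    members: list = field(default_factory=list)

@dataclass
class Node:
    name: str
    clusters: list = field(default_factory=list)
    def deliver(self, packet):
        print(f"{self.name} received {packet}")

def forward(node, packet, from_cluster=None):
    """Forward into every cluster of `node` except the arrival cluster."""
    for cluster in node.clusters:
        if cluster is from_cluster:
            continue                                  # never echo back
        for peer in cluster.members:
            if peer is not node:
                peer.deliver(packet)
                forward(peer, packet, from_cluster=cluster)

# two L0 clusters led by A and C, plus the L1 cluster {A, C}
a, b, c, d = Node("A"), Node("B"), Node("C"), Node("D")
c0a, c0b, c1 = Cluster([a, b]), Cluster([c, d]), Cluster([a, c])
a.clusters, b.clusters, c.clusters, d.clusters = [c0a, c1], [c0a], [c0b, c1], [c0b]
forward(b, "pkt")          # B is the source: A, C and D each get exactly one copy
```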
Protocol
Join (a sketch follows the cluster-maintenance notes below):
◦ The new node contacts the Rendezvous Point (RP).
◦ The RP responds with a list of hosts in the highest layer.
◦ The node chooses the member closest to itself.
◦ The process continues iteratively down to layer 0.
◦ The cluster leader may change during the join operation.
Cluster maintenance:
◦ Each member of a cluster periodically sends HeartBeat messages containing its distance estimates to the other peers in the cluster.
◦ If a member does not send a message within the timeout interval, it is assumed to no longer be part of the group.
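A sketch of the join descent (the `layers` data layout and `rtt` probe are assumptions for illustration, not NICE's real API): starting from the highest layer, the joining host repeatedly probes the candidate cluster, picks the closest member, and descends into that member's cluster one layer down until it reaches layer 0.

```python
def nice_join(layers, rtt):
    """Sketch of NICE's iterative join.  `layers[i]` maps each member of
    layer i to the cluster it leads one layer down (a list of members);
    `rtt(m)` is a latency probe run by the joining host."""
    candidates = list(layers[-1].keys())          # members of the highest layer
    for layer in range(len(layers) - 1, 0, -1):
        closest = min(candidates, key=rtt)
        candidates = layers[layer][closest]       # its cluster one layer down
    return min(candidates, key=rtt)               # closest L0 member -> join its cluster

# toy 3-layer hierarchy: A is at the top; A and B lead the two L0 clusters
layers = [
    None,                                         # L0 entries are not needed
    {"A": ["A", "C", "D"], "B": ["B", "E", "F"]}, # L1 members -> their L0 clusters
    {"A": ["A", "B"]},                            # L2 (highest) -> its L1 cluster
]
rtt_ms = {"A": 90, "B": 40, "C": 80, "D": 70, "E": 20, "F": 60}
print(nice_join(layers, lambda m: rtt_ms[m]))     # "E"
```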
Protocol (Contd…)
Cluster split:
◦ A cluster leader periodically checks the size of its cluster.
◦ A split occurs if the size exceeds 3k - 1.
◦ The cluster leader initiates the split operation.
◦ It divides the cluster into two equal-sized clusters.
◦ It elects leaders for the new clusters and transfers leadership using LeaderTransfer messages.

Cluster merge:
◦ The cluster leader initiates the merge operation.
◦ It chooses its closest cluster peer in the supercluster.
◦ It performs the merge via a ClusterMergeRequest message.
◦ The leaders of both clusters inform their respective peers about the change.
Protocol (Contd…)
Host departure:
◦ A host can depart gracefully by sending a Remove message to all its peers.
◦ A host can also depart abruptly due to failure; in this case, the non-receipt of HeartBeat messages indicates that the peer has left.
Leader selection (see the sketch below):
◦ If the departing host was a leader, its departure triggers leader selection.
◦ Each host chooses a leader independently and communicates the choice to the other peers using HeartBeat messages.
◦ Conflicts between two candidate leaders are resolved in this phase.
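A sketch of the HeartBeat timeout and the independent leader choice (the timeout value and function names are ours, not NICE's constants): a peer that stays silent past the timeout is declared gone, and each remaining member independently picks as leader the peer whose worst-case distance to the others, taken from the HeartBeat distance estimates, is smallest.

```python
import time

HEARTBEAT_TIMEOUT = 15.0        # seconds; an illustrative value only

def alive_peers(last_heartbeat, now=None):
    """Peers whose last HeartBeat arrived within the timeout are considered alive."""
    now = time.time() if now is None else now
    return {p for p, t in last_heartbeat.items() if now - t <= HEARTBEAT_TIMEOUT}

def pick_leader(peers, dist):
    """Pick the graph-theoretic center: minimize the worst-case distance to the rest."""
    return min(peers, key=lambda p: max(dist(p, q) for q in peers if q != p))

# distance estimates carried in HeartBeats (numbers invented)
d = {("A", "B"): 10, ("A", "C"): 50, ("B", "C"): 20}
dist = lambda x, y: d.get((x, y), d.get((y, x)))
print(pick_leader({"A", "B", "C"}, dist))   # "B": its worst-case distance is only 20
```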
Simulations
Conclusion
The NICE protocol provides a low-overhead control structure.
It also provides low-latency data distribution paths.
It is well suited to low-bandwidth applications with large receiver sets.