
Failure Recovery for Priority Progress Multicast
Jung-Rung Han
Supervisor: Charles Krasic
1
Multicast?

- One-to-Many delivery
- Scalable: conserves bandwidth (e.g. Digital TV)
- IP multicast
  - Many issues: security, billing, money
- Application level multicast
  - Dedicated content distribution network
  - Peer-to-Peer / End system multicast

2
QStream

- Priority Progress Streaming (PPS)
  - Adaptive to network conditions
  - Uses TCP
  - SPEG: scalable MPEG, a “progressive” codec
- Priority Progress Multicast (PPM)
  - Builds a tree of PPS sessions
3
Priority Progress Multicast

- Store-and-Forward
- Fragments data
- Flow control
- Single tree multicast
4
Presentation Outline

- Motivation
- Background
- Description of Approach
- Evaluation
- Conclusions and Future Work
5
Motivation:
Single Tree vs. Graph Multicast

Single tree
- Advantages
  - Simpler
  - Less overhead
- Disadvantages
  - Vulnerable to failure
  - Unutilized bandwidth

Graph
- Advantages
  - Resilient to failure
  - Higher bandwidth utilization
- Disadvantages
  - More overhead
  - Complex
  - Hard to implement
  - Hidden issues
6
Presentation Outline

- Motivation
- Background
- Description of Approach
- Evaluation
- Conclusions and Future Work
7
Background:
Some multicast “streaming” systems

- QStream: single tree based
- Bullet: single tree as backbone; peering connections form a mesh
- SplitStream: multiple trees
8
SplitStream
9
Presentation Outline

- Motivation
- Background
- Description of Approach
- Evaluation
- Conclusions and Future Work
10
Distributed Tree Management

- Tree Join operation, which also deals with failure
- Key issues: scalability, delay, security…
- Our approach: use a Distributed Hash Table (DHT)
- Bamboo DHT and the ReDiR hierarchy (Recursive Distributed Rendezvous)
- Reuse one DHT for all multicast sessions; this removes the hotspot when many nodes join at the same time
11
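The join path on this slide can be sketched with a toy in-memory table standing in for the DHT. The key scheme, node records, and slot counts below are illustrative assumptions only, and the real ReDiR hierarchy partitions the key space recursively rather than keeping one flat list per session:

```python
# Toy DHT-based tree join. Assumption: a dict stands in for the Bamboo
# DHT; keys, record fields, and slot counts are hypothetical, not the
# actual QStream/ReDiR protocol.

dht = {}  # key -> list of node records


def register_parent(session, node, free_slots):
    """Advertise `node` as an available parent for a multicast session."""
    key = f"parents:{session}"
    dht.setdefault(key, []).append({"node": node, "slots": free_slots})


def join(session, node):
    """Join a session by claiming a slot at any advertised parent."""
    key = f"parents:{session}"
    for rec in dht.get(key, []):
        if rec["slots"] > 0:
            rec["slots"] -= 1
            # The new node also becomes a potential parent for later joiners.
            register_parent(session, node, free_slots=2)
            return rec["node"]
    return None  # no parent found; a real client would retry or re-query
```

Because every joiner re-advertises itself, later joins spread across the tree instead of all landing on the root, which is the hotspot the ReDiR hierarchy addresses more systematically.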
Failure Recovery

- Failure: a node in the multicast tree disappears
- Important to the single-tree approach; less important to multi-source approaches
- Our goal: hide the impact of failure
  - Ultimately, no pause in video playback
- Our approach: prepare for failure pre-emptively by selecting the replacement with the highest eligibility
12
“Eligibility”

- A node’s capability to be a good forwarding node
- Bandwidth, delay, uptime, distance from the root, etc.
- Predicting and evaluating the quality of an inherently variable connection is a research sub-area of its own (e.g. Vivaldi, iPlane)
13
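One way to make the factor list concrete is a single weighted score. The weights and normalization below are purely illustrative assumptions, not the metric the system actually uses:

```python
# Hypothetical eligibility score combining the factors named on the slide.
# Weights are illustrative assumptions; higher score means a better
# candidate forwarding node.

def eligibility(bandwidth_kbps, delay_ms, uptime_s, depth):
    """Reward bandwidth and uptime; penalize delay and distance from root."""
    return (bandwidth_kbps / 1000.0   # Mbps of available bandwidth
            + uptime_s / 3600.0       # hours of uptime
            - delay_ms / 100.0        # delay penalty
            - depth)                  # distance from the root
```

In practice the slide's point stands: estimating these inputs for an inherently variable connection is what systems like Vivaldi and iPlane try to do.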
Eligibility propagation

- Goal: every node holds a replacement for its parent, based on eligibility information
- Only leaf nodes are allowed to be replacement candidates
- A leaf node’s eligibility propagates up the tree
- Each internal node keeps track of the highest-eligibility leaf reported from downstream and selects a replacement for itself
- It then reports the chosen node to its direct children
14
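The propagation rule above can be sketched as a single bottom-up pass over the tree. The data structures and names here are illustrative, not the actual QStream implementation:

```python
# Bottom-up eligibility propagation sketch. Assumptions: the tree is a
# dict of node -> children, and leaf eligibilities are given up front.

def propagate(tree, elig, node, replacements):
    """Return the best (eligibility, leaf) pair in `node`'s subtree,
    recording each internal node's chosen replacement on the way up."""
    children = tree.get(node, [])
    if not children:                  # leaves are the only candidates
        return (elig[node], node)
    best = max(propagate(tree, elig, c, replacements) for c in children)
    replacements[node] = best[1]      # reported to the direct children
    return best


tree = {"root": ["a", "b"], "a": ["x", "y"], "b": ["z"]}
elig = {"x": 0.9, "y": 0.4, "z": 0.7}
repl = {}
propagate(tree, elig, "root", repl)
# repl now maps each internal node to the highest-eligibility leaf
# reported from its subtree, e.g. repl["root"] == "x"
```

After the pass, each node's children know which leaf steps in if that node fails, which is what makes pre-emptive recovery possible.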
Now we know what to do when a failure occurs
But we still need something…
15
Failure Detection

- TCP’s failure detection is inadequate
- Application-level heartbeat instead
- The heartbeat interval is the major concern
  - False positives vs. delay
  - TCP vs. UDP heartbeat
16
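A minimal application-level heartbeat monitor might look like the sketch below, with an explicit clock value passed in so no real timers are needed; the class and parameter names are hypothetical. It makes the interval trade-off visible: a short timeout detects failures quickly but risks false positives when heartbeats are merely delayed:

```python
# Toy application-level heartbeat failure detector. Assumption: callers
# supply the current time explicitly; names and values are illustrative.

class HeartbeatMonitor:
    def __init__(self, timeout):
        self.timeout = timeout   # short timeout: fast detection,
        self.last_seen = {}      # but more false positives under jitter

    def heartbeat(self, node, now):
        """Record a heartbeat received from `node` at time `now`."""
        self.last_seen[node] = now

    def failed(self, now):
        """Return nodes whose last heartbeat is older than the timeout."""
        return [n for n, t in self.last_seen.items()
                if now - t > self.timeout]
```

Whether the heartbeat rides on TCP or UDP changes the failure modes: a TCP heartbeat can stall behind retransmissions of data, while UDP heartbeats can be lost and trigger false positives.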
Presentation Outline

- Motivation
- Background
- Description of Approach
- Evaluation
- Conclusions and Future Work
17
Evaluation

Multi-dimensional test space:
- Round-trip time
- Heartbeat interval
- Competing traffic
- Wide vs. narrow tree
- Long vs. short tree
- Failure rate
- Adaptation window size
- Different video quality metrics
18
Emulab

- www.emulab.net
- Network testbed with hundreds of machines
- Allows users a high degree of freedom
  - Network topology
  - Traffic shaping: bandwidth, delay, loss rate
  - OS modifications
- All done through a web interface and SSH
19
Minimum Tree – Emulab Topology
20
Minimum Tree – Multicast Tree
21
Minimum Tree – BW graph
22
Medium Size Tree – Emulab Topology
23
Medium Size Tree – Multicast
24
Medium Size Tree – BW graph
25
Presentation Outline

- Motivation
- Background
- Description of Approach
- Evaluation
- Conclusions and Future Work
26
Conclusions

- A single tree approach can deal with failures (probably)
  - Video playback is not interrupted
  - The impact of failure is a second-order concern relative to TCP dynamics
- Many other evaluations can be done
  - Different bandwidth and RTT
  - Bigger trees
  - Varying degrees of competing traffic
  - Higher failure rates
27
Future Work

- Evaluation of the Distributed Tree Management approach
- Continued evaluation of failure recovery under different conditions
- Self-adjusting tree to optimize bandwidth usage
- Scaling window size
28
Final Comment

- Evaluating the system is hard
  - Many variables
  - Unexpected results
- Using Emulab
  - Availability is affected by time of day and paper submission deadlines
  - Nodes do malfunction: run linktest often, but it takes significantly longer with bigger experiments!
  - One run of an experiment takes 25 minutes
  - Tip: use a lot of scripts!
29
ReDiR
30