Failure Recovery for Priority Progress Multicast
Jung-Rung Han
Supervisor: Charles Krasic
1
Multicast?
One-to-many delivery
Scalable
Conserves bandwidth
E.g. digital TV
IP multicast
Many issues: security, billing, money
Application-level multicast
Dedicated content distribution network
Peer-to-peer / end-system multicast
2
QStream
Priority Progress Streaming (PPS)
Adaptive to network conditions
Uses TCP
SPEG: scalable MPEG, a “progressive” codec
Priority Progress Multicast (PPM)
Builds a tree of PPS connections
3
Priority Progress Multicast
Store-and-Forward
Fragments data
Flow control
Single tree multicast
4
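The store-and-forward behavior above can be sketched as a priority queue that drains highest-priority-first within a send budget. This is only an illustration of the idea, not QStream's code; the class name, the unit labels, and the budget-based flow control are invented for the example.

```python
import heapq

class PriorityProgressForwarder:
    """Toy priority-progress store-and-forward node (names illustrative).

    Data units arrive tagged with a priority; within each adaptation
    window the forwarder sends as many units as the downstream link
    accepts, highest priority first, and drops the remainder.
    """

    def __init__(self):
        self._heap = []  # min-heap; priorities negated for max-first order

    def store(self, priority, unit):
        heapq.heappush(self._heap, (-priority, unit))

    def forward(self, budget):
        """Send up to `budget` units, highest priority first; drop the rest."""
        sent = []
        while self._heap and len(sent) < budget:
            _, unit = heapq.heappop(self._heap)
            sent.append(unit)
        self._heap.clear()  # window over: unsent low-priority data is dropped
        return sent

fw = PriorityProgressForwarder()
for prio, unit in [(3, "I-frame"), (1, "enh-2"), (2, "enh-1")]:
    fw.store(prio, unit)
print(fw.forward(budget=2))  # ['I-frame', 'enh-1']
```

Dropping instead of queueing is what keeps a slow link from stalling the whole tree: late low-priority data is worthless for live playback.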
Presentation Outline
Motivation
Background
Description of Approach
Evaluation
Conclusions and Future Work
5
Motivation:
Single Tree vs. Graph Multicast
Single tree
Advantages: simpler, less overhead
Disadvantages: vulnerable to failure, unutilized bandwidth
Graph
Advantages: resilient to failure, higher bandwidth utilization
Disadvantages: more overhead, complex, hard to implement, hidden issues
6
Background:
Some multicast “streaming” systems
QStream: single-tree based
Bullet: single tree as backbone; peering connections form a mesh
SplitStream: multiple trees
8
SplitStream
9
Distributed Tree Management
Tree join operation, also dealing with failure
Key issues: scalability, delay, security…
Our approach: use a Distributed Hash Table (DHT)
Bamboo DHT with a ReDiR (Recursive Distributed Rendezvous) hierarchy
Reuse one DHT for all multicast sessions; this removes the hotspot that forms when many nodes join at the same time
11
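The rendezvous idea can be illustrated with a toy in-memory DHT standing in for Bamboo. `ToyDHT`, the session name, and the node addresses are all invented for the sketch, and ReDiR's recursive hierarchy is not modeled here.

```python
import hashlib

class ToyDHT:
    """In-memory stand-in for a DHT such as Bamboo (put/get only)."""
    def __init__(self):
        self.store = {}
    def put(self, key, value):
        self.store.setdefault(key, []).append(value)
    def get(self, key):
        return self.store.get(key, [])

def session_key(session_id):
    """Hash the session name into the DHT key space."""
    return hashlib.sha1(session_id.encode()).hexdigest()

# One shared DHT serves every multicast session: a joining node looks up
# the session key to find existing members, instead of contacting a
# single well-known rendezvous host (the hotspot).
dht = ToyDHT()
dht.put(session_key("movie-42"), "node-A:4000")
dht.put(session_key("movie-42"), "node-B:4000")
print(dht.get(session_key("movie-42")))  # ['node-A:4000', 'node-B:4000']
```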
Failure Recovery
Failure: a node in the multicast tree disappears
Critical for a single-tree approach; less important for multi-source approaches
Our goal: hide the impact of failure
Ultimately, no pause in video playback
Our approach: deal with failure pre-emptively by selecting the replacement with the highest eligibility
12
“Eligibility”
A node’s capability to be a good forwarding node
Bandwidth, delay, uptime, distance from the root, etc.
Predicting and evaluating the quality of an inherently variable connection is a research sub-area of its own, e.g. Vivaldi and iPlane
13
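Since the talk lists the inputs but not a formula, a hypothetical scoring function shows how the factors might combine; the weights and functional form are pure assumptions, chosen only so that more bandwidth and uptime raise the score while delay and depth lower it.

```python
def eligibility(bandwidth_mbps, delay_ms, uptime_s, root_distance):
    """Illustrative eligibility score (higher is better).

    The inputs match the slide (bandwidth, delay, uptime, distance
    from the root); the formula itself is an invented example.
    """
    return (bandwidth_mbps * uptime_s) / ((1.0 + delay_ms) * (1.0 + root_distance))

# A long-lived, well-connected node near the root scores higher:
print(eligibility(10, 20, 3600, 1) > eligibility(2, 80, 600, 3))  # True
```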
Eligibility propagation
Goal: each node has a replacement ready for its parent, chosen using eligibility information
Only leaf nodes are allowed to be replacement candidates
A leaf node’s eligibility propagates up the tree
Each internal node tracks the highest-eligibility leaf reported from downstream and selects it as its own replacement
It then reports the chosen node to its direct children
14
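The propagation rule can be sketched as one recursive pass over the tree. `Node` and the example tree are illustrative; in the real protocol the information flows via upstream reports rather than a single traversal, and each node would also report its choice down to its children.

```python
class Node:
    def __init__(self, name, eligibility=0.0):
        self.name = name
        self.eligibility = eligibility  # meaningful for leaf nodes
        self.children = []
        self.replacement = None         # best leaf found below this node

def propagate(node):
    """Return the highest-eligibility leaf in node's subtree.

    Each internal node records that leaf as its replacement candidate;
    the real protocol would then report the choice to direct children.
    """
    if not node.children:
        return node  # leaves are the only replacement candidates
    best = max((propagate(c) for c in node.children),
               key=lambda leaf: leaf.eligibility)
    node.replacement = best
    return best

root = Node("root")
mid = Node("mid")
root.children = [mid, Node("leaf-b", 0.5)]
mid.children = [Node("leaf-x", 0.9), Node("leaf-y", 0.2)]
propagate(root)
print(root.replacement.name, mid.replacement.name)  # leaf-x leaf-x
```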
Now we know what to do
when a failure occurs
But we still need something…
15
Failure Detection
TCP’s failure detection is inadequate
Application-level heartbeat
The heartbeat interval is the major concern:
False positives vs. detection delay
TCP vs. UDP heartbeat
16
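A minimal sketch of such a heartbeat monitor; the interval and miss-threshold values are illustrative, not the thesis's. The trade-off on the slide is visible in the timeout: a short interval detects failure quickly but risks declaring a merely slow peer dead, while a long interval delays recovery.

```python
import time

class HeartbeatMonitor:
    """Application-level heartbeat detector (parameter values invented)."""

    def __init__(self, interval=1.0, misses_allowed=3, clock=time.monotonic):
        # Declare failure after `misses_allowed` consecutive silent intervals.
        self.timeout = interval * misses_allowed
        self.clock = clock
        self.last_seen = clock()

    def beat(self):
        """Call whenever a heartbeat arrives from the peer."""
        self.last_seen = self.clock()

    def peer_failed(self):
        """True once no heartbeat has arrived for a full timeout."""
        return self.clock() - self.last_seen > self.timeout
```

Injecting the clock makes the false-positive/delay trade-off easy to explore in tests without real waiting.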
Evaluation
Multi-dimensional test space:
Roundtrip Time
Heartbeat interval
Competing traffic
Wide vs. Narrow tree
Long vs. Short tree
Failure rate
Adaptation window size
Different video quality metrics
18
Emulab
www.emulab.net
Network testbed
Hundreds of machines
Allows users a high degree of freedom:
Network topology
Traffic shaping: bandwidth, delay, loss rate
OS modifications
All done through a web interface and SSH
19
Minimum Tree – Emulab Topology
20
Minimum Tree – Multicast Tree
21
Minimum Tree – BW graph
22
Medium Size Tree – Emulab Topology
23
Medium Size Tree – Multicast Tree
24
Medium Size Tree – BW graph
25
Conclusions
A single-tree approach can deal with failures (probably)
Video playback is not interrupted
The impact of failure is a second-order concern relative to TCP dynamics
Many other evaluations can be done:
Different bandwidths and RTTs
Bigger trees
Varying degrees of competing traffic
Higher failure rates
27
Future Work
Evaluation of the distributed tree management approach
Continued evaluation of failure recovery under different conditions
Self-adjusting tree to optimize bandwidth usage
Scaling the window size
28
Final Comment
Evaluating the system is hard:
Many variables
Unexpected results
Using Emulab:
Availability is affected by time of day and paper-submission deadlines
Nodes do malfunction: run linktest often, but it takes significantly longer with bigger experiments!
One run of an experiment takes 25 minutes
Tip: use a lot of scripts!
29
ReDiR
30