Multimedia conferencing

Raphael Coeffic ([email protected])
Based partly on slides of Ofer Hadar, Jon Crowcroft
Which Applications?
• Conferencing:
  – Audio/video communication and application sharing.
  – First multicast session at the IETF, 1992.
  – Many-to-many scenario.
• Media broadcast:
  – Internet TV and radio.
  – One-to-many scenario.
• Gaming:
  – Many-to-many scenario.
What is needed?
• Efficient transport:
  – Enable real-time transmission.
  – Avoid sending the same content more than once.
  – The best transport depends on available bandwidth and technology.
• Audio processing:
  – How to ensure audio/video quality?
  – How to mix the streams?
• Conference setup:
  – Who is allowed to start a conference?
  – How fast can a conference be initiated?
• Security and privacy:
  – How to prevent unwanted people from joining?
  – How to secure the exchanged content?
• Floor control:
  – How to maintain some talking order?
How to Realize? Centralized
• All participants register at a central point.
• All send to the central point.
• The central point forwards to the others.
• Simple to implement.
• Single point of failure.
• High bandwidth consumption at the central point:
  – Must receive N flows.
  – With no mixing, the central point would send N×(N−1) flows.
• High processing overhead at the central point:
  – Must decode N flows, mix the flows, and encode N flows.
• Appropriate for small to medium-sized conferences.
• Simple to manage and administer:
  – Allows access control and secure communication.
  – Allows usage monitoring.
  – Supports floor control.
• Most widely used scenario.
• No need to change end systems.
• Tightly coupled: some instances know all information about all participants at all times.
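The flow counts above can be checked with a back-of-the-envelope sketch. This is illustrative only: the function name is mine, and the 64 kb/s per-flow figure is the simple voice session used elsewhere in these slides.

```python
# Load at a centralized conference point: N participants, each sending
# one flow to the center. With mixing, each gets one mixed flow back;
# without mixing, every flow is forwarded to the N-1 others.

def central_point_load(n, flow_kbps=64, mixing=True):
    """Return (incoming flows, outgoing flows, total bandwidth in kb/s)."""
    incoming = n
    outgoing = n if mixing else n * (n - 1)
    return incoming, outgoing, (incoming + outgoing) * flow_kbps

print(central_point_load(10))               # mixer: (10, 10, 1280)
print(central_point_load(10, mixing=False)) # forwarder: (10, 90, 6400)
```

The N×(N−1) term is why a pure forwarder only scales to small groups, while a mixer keeps the center's traffic linear in N.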
How to Realize? Full Mesh
• All participants establish a connection to each other.
• All can send directly to the others.
• Each host needs to maintain N connections.
• Outgoing bandwidth:
  – Send N copies of each packet.
  – A simple voice session at 64 kb/s translates to 64×N kb/s.
  – If silence suppression is used, only active speakers send data.
• Incoming bandwidth:
  – With video, a lot of bandwidth may be consumed, unless only active speakers send video.
• Floor control is only possible with cooperating users.
• Security: simple! Do not send data to members you do not trust.
• End systems need to mix the traffic, making the end systems more complex.
How to Realize? End-Point Based
• All participants establish a connection to the chosen mixer.
• Outgoing bandwidth at the mixer end point:
  – Send N copies of each packet.
  – A simple voice session at 64 kb/s translates to 64×N kb/s.
  – If silence suppression is used, only active speakers send data.
• Incoming bandwidth:
  – With video, a lot of bandwidth may be consumed, unless only active speakers send video.
• One of the end systems needs to mix the traffic, making that end system more complex.
• The most commonly used solution for three-way conferencing.
How to Realize? Peer-to-Peer
• Mixing is done at the end systems.
• Increases processing overhead at the end systems.
• Increases overall delay:
  – A stream is possibly mixed multiple times.
• If central points leave a conference, the conference is dissolved.
• Security: must trust all members.
  – Any member could send all data to non-trusted users.
• Access control: must trust all members.
  – Any member can invite new members.
• Floor control: requires cooperating users.
Transport considerations
• Transport layer:
  – Most group communication systems run on top of unicast sessions.
  – Very popular in the past: multicast.
• Application layer:
  – RTP over UDP.
  – Why not TCP?
    – Better NAT traversal capabilities (used by Skype as a last resort).
    – But not really suitable for real-time feedback (why?).
• Control protocol:
  – Interactive conferencing: SIP, H.323, Skype, etc.
  – Webcast: RTSP, RealAudio and other flavours.
• Session description:
  – SDP (Session Description Protocol).
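To make "RTP over UDP" concrete, here is a sketch of the fixed 12-byte RTP header from RFC 3550. The sequence number, timestamp and SSRC values below are arbitrary example numbers, not from the slides; payload type 0 is PCMU (G.711 μ-law).

```python
import struct

# Fixed RTP header (RFC 3550): V=2, P, X, CC | M, PT | seq | timestamp | SSRC.
# No CSRC list or header extension in this minimal sketch.

def rtp_header(seq, timestamp, ssrc, payload_type=0, marker=0):
    byte0 = 2 << 6                               # version=2, P=0, X=0, CC=0
    byte1 = (marker << 7) | (payload_type & 0x7F)
    return struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                       timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)

hdr = rtp_header(seq=1, timestamp=160, ssrc=0x1234ABCD)
print(len(hdr), hdr[0] >> 6)   # 12 bytes, version 2
```

The sender would append one codec frame (e.g. 20 ms of G.711, 160 bytes) after this header and hand the result to a UDP socket; the sequence number and timestamp let the receiver detect loss and reconstruct timing, which TCP's retransmissions would only delay.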
IP Multicast
• Why?
  – Most group communication applications are built on top of unicast sessions.
  – With unicast, each single packet has a unique recipient.
• How?
  – Enhance the network with support for group communication.
  – Optimal distribution is delegated to the network routers instead of the end systems.
  – Receivers inform the network of their wish to receive the data of a communication session.
  – Senders send a single copy, which is distributed to all receivers.
Multicast vs. Unicast
[Figure: five nodes A–E; file transfer from C to A, B, D and E. Unicast: multiple copies of each packet traverse the network; multicast: a single copy.]
IP Multicast
• True N-way communication:
  – Any participant can send at any time, and everyone receives the message.
• Unreliable delivery:
  – Based on UDP: why?
  – Avoids hard problems (e.g., ACK explosion).
• Efficient delivery:
  – Packets only traverse network links once (i.e., tree delivery).
• Location-independent addressing:
  – One IP address per multicast group.
• Receiver-oriented service model:
  – Receivers can join/leave at any time.
  – Senders do not know who is listening.
IP Multicast addresses
• Reserved IP addresses:
  – Special IP addresses (class D): 224.0.0.0 through 239.255.255.255.
  – Class D: the prefix 1110 plus 28 bits, i.e., about 268 million groups (plus scoping for additional reuse).
  – 224.0.0.x: local network only.
  – 224.0.0.1: all hosts.
  – Static addresses for popular services (e.g., SAP, the Session Announcement Protocol).
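The class D range above is exactly what Python's standard `ipaddress` module treats as multicast, which makes for a quick sanity check:

```python
from ipaddress import ip_address

# 224.0.0.0/4 (1110 prefix + 28 bits) is the multicast range;
# 2**28 gives the ~268 million possible groups from the slides.
print(2 ** 28)  # 268435456

for addr in ("224.0.0.1", "239.255.255.255",
             "223.255.255.255", "240.0.0.0"):
    print(addr, ip_address(addr).is_multicast)
```

Note that 240.0.0.0 is already outside the range: class E (reserved) starts there, just as 223.255.255.255 is still ordinary class C unicast space.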
Alternatives to Multicast
• Use application-level multicast:
  – Multicast routing is done using end hosts.
  – Hosts build multicast routing tables and act as multicast routers (but at the application level).
  – Users request content using unicast.
  – Content is distributed over unicast to the final users.
Application level Multicast vs. unicast
[Figure: a content source distributing to receivers; traditional unicast delivery compared with application-level multicast.]
Conference mixer architecture
• Main components of a centralized conference mixer:
  – Coder/decoder (plus quality-ensuring components).
  – Synchronization.
  – Mixer.
• Processing pipeline:
Audio Mixing
[Diagram: participants A (G.711), B (G.729) and C (GSM) each pass through a decoder (D); driven by a periodic timer, the decoded streams are summed into the mix X = A + B + C; each participant then receives the mix minus its own signal (X − A = B + C, X − B = A + C, X − C = B + A), re-encoded (E) into that participant's own codec.]
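The mix-minus step in the pipeline can be sketched in a few lines. This is a simplification I wrote for illustration: streams are plain lists of PCM samples of equal length, while real code would first decode G.711/G.729/GSM, run on the periodic timer, and clip the sums to the sample range.

```python
# Decode-sum-subtract mixing: compute X = sum of all streams, then
# send each participant X minus its own signal, so nobody hears
# an echo of themselves.

def mix_minus(streams):
    """streams: dict name -> list of samples (equal length).
    Returns dict name -> mixed output without that participant."""
    total = [sum(frame) for frame in zip(*streams.values())]   # X = A+B+C
    return {name: [t - s for t, s in zip(total, samples)]
            for name, samples in streams.items()}

out = mix_minus({"A": [1, 2], "B": [10, 20], "C": [100, 200]})
print(out["A"])   # B + C = [110, 220]
```

Computing the full sum X once and subtracting per participant is what keeps the mixer's work linear in N instead of quadratic.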
Audio Quality
• Mostly based on "best effort" networks:
  – No guarantees at all.
  – Packets get lost and/or delayed depending on the congestion status of the network.
• Depending on the codec, different quality levels can be reached:
  – Mostly reducible to a "needed bandwidth vs. quality" trade-off.
  – Wanted properties: loss resistance, low complexity (easy to implement in embedded hardware).
• Audio data has to be played out at the same rate it was sampled:
  – Different buffering techniques have to be considered, depending on the application.
  – Pure streaming (radio/TV) is not interactive and thus not affected by delay. Quality is everything.
  – Interactive conferencing needs short delays to guarantee the real-time property. Delay is experienced as "very annoying" by users in such applications.
Codec quality measurements
• Codecs: Mean Opinion Score (MOS) measurements.
• Codecs: loss resistance.
• Codecs: complexity.
Audio quality: packet loss
• Packet loss:
  – The impact on voice quality depends on many factors:
    – Average rate: rates under 2–5% (depending on the codec) are almost inaudible. Over 15% (highly depending on the burstiness), most calls are experienced as unintelligible.
    – Burstiness: depending on the loss distribution, the impairment can vary from small artifacts hidden by packet loss concealment to really annoying quality loss.
  – Modern codecs like iLBC, which are exclusively focused on VoIP, are much more loss-resistant and should thus be preferred to PSTN-based low-bitrate codecs.
  – Considering media servers and especially conferencing bridges, we should concentrate on receiver-based methods, as every other method would not be compatible with the customers' phones.
  – Solutions: support appropriate codecs, assert a minimal link quality, and implement a reasonable PLC (packet loss concealment) algorithm.
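The simplest receiver-based PLC can be sketched as follows. This is a deliberately naive illustration: a lost frame is replaced by an attenuated repetition of the last good frame. Real PLC (e.g., the algorithm in G.711 Appendix I) uses pitch-period repetition and smoother fading; the 0.5 attenuation factor here is an arbitrary example value.

```python
# Naive receiver-based packet loss concealment: fill each gap with
# the previous good frame, faded down so repeated losses decay to
# silence instead of producing a buzzing artifact.

def conceal(frames):
    """frames: list of sample lists, with None marking lost packets."""
    out, last = [], None
    for f in frames:
        if f is None and last is not None:
            f = [int(s * 0.5) for s in last]   # repeat previous, attenuated
        out.append(f if f is not None else [])
        last = f
    return out

print(conceal([[4, 8], None, [2, 2]]))   # [[4, 8], [2, 4], [2, 2]]
```

Because everything happens at the receiver, this works with any sender, which is the compatibility argument made above for conferencing bridges.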
Audio quality: jitter
• Delay variation (jitter):
  – Why?
    – Varying buffering time at the routers on the packets' way.
    – Inherent to the transmission medium (e.g., WiFi).
  – Depending on the buffering algorithm, quality impairments are mostly caused by a too-high ear-to-mouth delay or by late loss.
  – Ear-to-mouth delay:
    – Whereas delays under 100 ms are not noticeable, values over 400 ms make a natural conversation very difficult.
  – Late loss:
    – If the buffering delay is smaller than the actual delay, some packets arrive after their playout schedule. This effect is called "late loss".
  – Delivering good voice quality means, apart from packet loss concealment, minimizing delay and late loss.
Jitter: example
Adaptive playout
• Static buffer:
  – Playout is delayed by a fixed value.
  – The buffer size is computed once and kept for the rest of the call.
  – Some clients implement a panic mode, increasing the buffer size dramatically (×2) if the late loss rate is too high.
  – Advantages:
    – Very low complexity.
  – Drawbacks:
    – High delay.
    – Performs poorly if the jitter is too high.
    – Does not solve the clock skew problem.
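The static-buffer trade-off between delay and late loss is easy to quantify. The delay values below are made-up example numbers (milliseconds) with two jitter spikes, not measurements from the slides:

```python
# With a fixed playout delay, every packet whose network delay exceeds
# the buffer delay arrives after its playout schedule: a late loss.

def late_loss_rate(delays_ms, buffer_ms):
    late = sum(1 for d in delays_ms if d > buffer_ms)
    return late / len(delays_ms)

delays = [20, 25, 30, 90, 22, 140, 28, 24]     # two jitter spikes
for buf in (50, 100, 150):
    print(buf, late_loss_rate(delays, buf))    # 0.25, 0.125, 0.0
```

Driving late loss to zero here requires a 150 ms buffer, well above the 100 ms noticeability threshold mentioned earlier, which is exactly why adaptive schemes are worth the extra complexity.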
Adaptive playout (2)
• Dynamic buffer: talk-spurt based.
  – Within a phone call, a speaker is rarely active all the time, so it is possible to distinguish between voiced and unvoiced segments.
  – Adjusting the buffering delay within unvoiced segments has no negative impact on the voice quality.
  – Using a delay prediction algorithm on the previous packets, we then try to calculate the appropriate buffering delay for the next voiced segment.
  – Advantages:
    – Low complexity.
    – Solves the clock skew problem.
  – Drawbacks:
    – Needs Voice Activity Detection (VAD), either at the sender or at the receiver.
    – High delay.
    – Performs poorly if the jitter varies fast (within a voiced segment).
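A common delay prediction algorithm of the kind mentioned above (the textbook estimator going back to Ramjee et al., not something specified on these slides) keeps exponentially weighted averages of the delay and its variation and sets the next talk spurt's playout delay to their combination. The constants and sample values below are conventional illustrations:

```python
# Per-packet update of smoothed delay d and delay variation v;
# at each talk-spurt boundary the playout delay becomes d + 4*v.

ALPHA = 0.998                       # conventional smoothing factor

def update(d, v, n):
    """d: smoothed delay, v: smoothed variation, n: new delay sample (ms)."""
    d = ALPHA * d + (1 - ALPHA) * n
    v = ALPHA * v + (1 - ALPHA) * abs(n - d)
    return d, v

d, v = 30.0, 2.0
for sample in (30, 32, 95, 31, 29):   # one jitter spike at 95 ms
    d, v = update(d, v, sample)
playout_delay = d + 4 * v             # delay for the next talk spurt
print(round(playout_delay, 1))
```

Because the averages move slowly, a single spike barely shifts the estimate; this is both the strength (stability) and the weakness noted above (poor reaction to fast-varying jitter within a spurt).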
Adaptive playout (3)
• Dynamic buffer: packet based.
  – Based on Waveform Similarity Overlap-Add (WSOLA) time-scale modification:
    – Enables packet scaling without pitch distortion.
    – Very good voice quality: scaling factors from 0.5 to 2.0 are mostly inaudible if done locally.
    – But: high processing complexity.
WSOLA: how does it work?
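A heavily simplified sketch of the WSOLA idea, written for illustration (pure Python, toy frame sizes, linear cross-fade instead of a proper window; the frame, hop and tolerance values are arbitrary): the output is built hop by hop, and for each hop we search the input, within a small tolerance around the nominal analysis position, for the segment most similar to the signal we just wrote, then overlap-add it. Picking the best-matching segment is what preserves the waveform's pitch while the analysis position advances faster or slower than real time.

```python
import math

FRAME, HOP, TOL = 32, 16, 8
OV = FRAME - HOP                     # overlap length between segments

def wsola(x, rate):
    """Time-scale signal x by `rate` (>1 = shorter output, same pitch)."""
    out = list(x[:FRAME])
    pos = 0.0                        # nominal analysis position in input
    while True:
        pos += HOP * rate
        c = int(pos)
        if c - TOL < 0 or c + TOL + FRAME > len(x):
            break
        tail = out[-OV:]             # signal the new segment must continue
        # waveform-similarity search: best cross-correlation with tail
        best = max(range(c - TOL, c + TOL + 1),
                   key=lambda k: sum(a * b for a, b in zip(x[k:k + OV], tail)))
        seg = x[best:best + FRAME]
        for i in range(OV):          # linear cross-fade over the overlap
            w = (i + 1) / (OV + 1)
            out[-OV + i] = (1 - w) * out[-OV + i] + w * seg[i]
        out.extend(seg[OV:])
    return out

tone = [math.sin(0.2 * i) for i in range(320)]
half = wsola(tone, 2.0)              # play twice as fast: ~half the samples
print(len(tone), len(half))
```

The similarity search over all candidate offsets for every output hop is where the "high processing complexity" noted on the previous slide comes from; production implementations vectorize the correlation and use proper tapered windows.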