Transcript 980617

Videoconferencing over Packet
Switched Networks
1. Introduction
2. Components
3. Network Architecture
4. Software Architecture
5. Performance
6. Videoconferencing over ATM
1. Introduction
The International Telecommunication Union (ITU) has addressed videoconferencing standards.
H.320 was defined for a circuit-switched narrow-band ISDN environment at bandwidths ranging from 64 kbit/s to over 2 Mbit/s. It defines:
1) a central conference server, called a multipoint control unit (MCU), to enable multiparty calls. Each participant is directly linked to the MCU, which then controls the conference;
2) the H.261 video compression standard (the first of the series).
Related standards are H.324 for POTS, H.323 for LANs, and H.310 for ATM.
The circuit-switched network (narrow-band ISDN or the switched 56 kbit/s phone line) has the following characteristics:
1. All connections flow over dedicated point-to-point links. Once established, a connection is dedicated between the endpoints and bandwidth is guaranteed for the duration of the call.
2. A centralised MCU has to be used to achieve a multiparty conference.
3. The transmission rate is ensured by synchronization of the codec and the network clock in both sender and receiver stations.
4. The video bit rate is forced to be constant to efficiently utilize the connection bandwidth, which causes variations in video quality and additional delay in the compression engine.
1. Introduction
The packet-based network (e.g., Ethernet and Token Ring, or more generally, an IP network) solution strives to carry real-time traffic over existing computer communications networks:
1. The nodes send discrete blocks of data to each other.
2. Connections may be established between endpoints, but are only used when there is data to be sent.
3. Routers and switches can multicast without going through a centralised MCU.
4. Video/audio processing can be performed in each end station.
5. Packet-based networks are becoming increasingly popular.
The existing applications over packet-based networks are:
the Xerox PARC Network Video tool nv,
the INRIA Video Conferencing System ivs, and
the LBL/UCB flexible videoconferencing tool vic.
All of them provide video in real time over the MBone, and they share the goal of supporting low-rate multicast video over the Internet.
2. Components
Fig.1. A Model of a Videoconferencing System: digitizer, encoder, packetization, network adaptor, network, decoder, and display/audio output.
 The hardware configuration of a videoconferencing system consists of:
a video/audio capture and display card,
a network communication adaptor,
a compression/decompression card (or software),
a video camera,
a microphone/speaker, and
a high performance computer.
3. Network Architecture
1) Multicast
The purpose of videoconferencing: a large group of people, spread over a large geographic region, have some reason to hold a conference, and need to send and receive data within a scalable subgroup.
Multicast is a special form of broadcast in which packets are delivered to a specified subgroup of network hosts.
There are two styles of multicasting:
___ multicasting via the Multicast Backbone (MBone), in which the sender doesn't know who will receive the packets: the sender just sends to an address, and it is up to the receivers to join that group (used by nv, ivs and vic; a minimal receiver-side join is sketched after this list);
___ the sender specifies who will receive the packets. This gives more control over the distribution, but it doesn't scale well: it is practically impossible to handle thousands of receivers this way.
A prototype named Multimedia Multiparty Teleconferencing (MMT), proposed by IBM, uses a predetermined multicast address to set up the multicast connections. Network directory services are required to easily assign addresses and provide confidentiality.
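To make the receiver-join model concrete, here is a minimal Python sketch of a host joining a multicast group and waiting for packets. The group address and port are hypothetical illustrations, not values used by nv, ivs or vic.

    import socket
    import struct

    GROUP = "224.2.0.1"   # hypothetical multicast group address
    PORT = 5004           # hypothetical UDP port

    # Create a UDP socket and allow several receivers on one host.
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", PORT))

    # Join the group: the sender never learns who has joined.
    mreq = struct.pack("4s4s", socket.inet_aton(GROUP), socket.inet_aton("0.0.0.0"))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)

    while True:
        data, addr = sock.recvfrom(2048)  # blocks until a multicast packet arrives
        print(f"received {len(data)} bytes from {addr}")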
Fig.2. MBone topology: islands, tunnels, mrouted.
MBone is a virtual network on "top" of the Internet, providing a multicasting facility to the Internet. It is composed of networks (islands) that support multicast.
In Fig.2, the MBone consists of three islands. Each island consists of a local network connecting a number of client hosts ("C") and one host running mrouted ("M"). The mrouted hosts are connected with each other via unicast tunnels.
Each mrouted has routing tables for deciding the forwarding tunnel. Each tunnel has a metric and a threshold. The metric specifies a routing cost that is used in the Distance Vector Multicast Routing Protocol (DVMRP). The threshold is the minimum time-to-live (TTL) that a multicast datagram needs in order to be forwarded into a given tunnel; the TTL is specified for each multicast packet. This scoping check is sketched below.
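As an illustration only (not mrouted's actual code; the type and field names are mine), the per-tunnel TTL/threshold rule can be expressed as:

    from dataclasses import dataclass

    @dataclass
    class Tunnel:
        name: str
        metric: int      # DVMRP routing cost for this tunnel
        threshold: int   # minimum TTL a datagram needs to cross it

    def eligible_tunnels(tunnels, packet_ttl):
        """Return the tunnels a multicast datagram may be forwarded into."""
        # A datagram enters a tunnel only if its remaining TTL meets the
        # tunnel's threshold; this limits how far the datagram propagates.
        return [t for t in tunnels if packet_ttl >= t.threshold]

    tunnels = [Tunnel("to-island-2", metric=1, threshold=16),
               Tunnel("to-island-3", metric=3, threshold=64)]
    print([t.name for t in eligible_tunnels(tunnels, packet_ttl=32)])  # ['to-island-2']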
All traffic in the MBone uses the User Datagram Protocol (UDP) rather than TCP.
2) Protocols
Because videoconferencing needs real-time video/audio transmission, TCP is not suitable for this requirement.
Both ivs and vic use the Real-time Transport Protocol (RTP) over UDP/IP to transmit the streams across the Internet.
RTP was developed by the Internet Engineering Task Force (IETF) as an application-level protocol. It aims to provide a very thin transport layer, which is most often integrated into the application processing rather than implemented as a separate layer.
In the IP protocol stack, RTP is layered over UDP; in the ATM stack, it runs over the ATM Adaptation Layer. (Basically, every connection that can be established using existing end-to-end protocols can be used for RTP.)
Fig.3. RTP and the Protocol Stack: RTP/RTCP runs over UDP and IP in the Internet stack, and over the AAL and ATM in the ATM stack.
Real Time Transport Protocol (RTP)
RTP describes two protocols: the data transfer protocol (RTP) and the control protocol (RTCP). RTP/RTCP concentrates on the transmission of real-time, non-reliable streaming data and is specialised to work with both unicast and multicast delivery.
Each RTP packet consists of an RTP header and an RTP payload.
In ivs, the RTP payload = RTP-H.261 header + H.261 video stream.
RTCP manages control information like sender identification, receiver feedback, and cross-media synchronisation. RTCP packets are transmitted periodically to all participants in the session, and the period is adjusted according to the size of the session.
The RTP header mainly consists of the following items:
Marker (M): 1 bit; 1 = the last packet of a video frame, 0 = otherwise.
Payload type (PT): 7 bits; gives the media type and encoding/compression format of the RTP payload. At any given time an RTP sender is supposed to send only a single type of payload.
The RTP header:
Sequence number: 16 bits; increments by one for each RTP data packet sent, and may be used by the receiver to detect packet loss and to restore packet sequence.
Timestamp: 32 bits; encodes the sampling instant of the first data octet in the RTP data packet. The sampling instant must be based on a clock. If a video image occupies more than one packet, the timestamp will be the same on all of those packets; packets from different video images must have different timestamps.
Synchronization source identifier (SSRC): 32 bits; identifies the synchronization source. It is a randomly chosen value meant to be unique within a particular RTP session.
Contributing source (CSRC) list: 0 to 15 items, 32 bits each. The list identifies the contributing sources for the payload contained in this packet. CSRC lists are inserted by mixers. A mixer is an intermediate system that receives RTP packets from one or more sources, possibly changes the data format, combines the packets in some manner, and then forwards a new RTP packet. Parsing of these fields is sketched below.
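As a minimal sketch, assuming the standard RTP fixed-header layout (the function name is mine), the fields above can be pulled out of a received packet like this:

    import struct

    def parse_rtp_header(packet: bytes) -> dict:
        """Parse the 12-byte fixed RTP header plus any CSRC list."""
        if len(packet) < 12:
            raise ValueError("packet too short for an RTP header")
        b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
        cc = b0 & 0x0F                      # CSRC count (0-15 items)
        csrcs = list(struct.unpack(f"!{cc}I", packet[12:12 + 4 * cc]))
        return {
            "version": b0 >> 6,             # RTP version, should be 2
            "marker": (b1 >> 7) & 0x01,     # M bit: last packet of a frame
            "payload_type": b1 & 0x7F,      # PT: media/encoding format
            "sequence_number": seq,
            "timestamp": ts,
            "ssrc": ssrc,
            "csrc_list": csrcs,
            "payload": packet[12 + 4 * cc:],
        }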
The main goals of RTP:
Synchronisation of various streams
RTP provides functionality suited for carrying real-time content, e.g. a timestamp for synchronising different streams with timing properties. Synchronization has to happen at the application level, by either:
1) using a playback buffer large enough to compensate for most of the jitter of all participating streams, or
2) introducing an additional RTCP-like management stream, which only gives feedback about relative arrival times between the independent streams and hence uses relatively low bandwidth.
Flow and congestion control
The basis for flow and congestion control is provided by the RTCP Sender and Receiver Reports. Congestion is classified as transient congestion or persistent congestion. By analysing the interarrival jitter field of the RTCP reports, the jitter over a certain interval can be measured and used to indicate congestion before it becomes persistent (the jitter estimator is sketched after this list).
• A global service provider can detect local or global network congestion, and react in time to prevent heavy packet loss, by using a monitor program that receives and evaluates only RTCP packets.
• The receivers of a stream detect decreasing packet rates and inform the sender by using Receiver Reports. The sender can then change the format/compression of the media.
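RTP's interarrival jitter is a running estimate of the variance in packet transit time. A small sketch of the standard estimator (variable names are mine) is:

    def update_jitter(jitter, prev_transit, transit):
        """One step of RTP's interarrival jitter estimate.

        transit = arrival time - RTP timestamp, in timestamp units.
        The estimate moves 1/16 of the way toward each new sample,
        smoothing noise while still tracking persistent congestion.
        """
        d = abs(transit - prev_transit)
        return jitter + (d - jitter) / 16.0

    # Example: transit times drift upward, so the estimate rises.
    jitter, prev = 0.0, 100
    for transit in [103, 98, 110, 125, 140]:
        jitter = update_jitter(jitter, prev, transit)
        prev = transit
    print(round(jitter, 2))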
The main goals of RTP:
Support of different payload types
The PT field of the RTP header identifies the payload media type:
1) dynamic payload types: in the 96 to 127 range;
2) popular payload types: given detailed standardized descriptions.
Packet source tracing after arrival
1) the SSRC field of each RTP header can be used to trace its origin;
2) the CSRC list of the RTP header identifies the contributing sources when mixers and translators are used.
Reliability
RTP is a non-reliable protocol: it does not itself provide error correction or guaranteed packet ordering.
1) The sequence number of each RTP header can be used to reorder packets and to estimate local or overall packet loss rates.
2) RTCP Receiver Reports are used to give feedback to the sender about the received packet rate and network influences such as jitter.
3) Congestion control
The congestion control scheme used in ivs is an end-to-end control model, which consists of a network sensor, a throughput controller and an output rate controller.
1) To implement the network sensor, two approaches are provided:
__ let each receiver send a negative acknowledgment (NACK) packet whenever it detects a loss;
__ periodically send a QoS measure of the packet loss rate observed by receivers over a time interval covering 100 packets.
The QoS approach is more efficient than the NACK approach if the packet loss rate is higher than 1%.
After receiving this information, the sender computes a median loss rate med_loss.
2) In the throughput controller, ivs adjusts the maximum output rate of the coder, max_rate, so that med_loss < a tolerable loss rate tol_loss.
The control algorithm is as follows:
if ( med_loss > tol_loss )
    max_rate = max(max_rate/2, min_rate)
else
    max_rate = gain * max_rate
where min_rate = 10 kbit/s, gain = 1.5, tol_loss = 10%, and the maximum value of max_rate is 100 kbit/s.
The resulting max_rate is used to control the output rate. A runnable sketch of this loop follows.
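A minimal runnable sketch of this decrease/increase rule, using the constants quoted above (the function name is mine, not from ivs; the 100 kbit/s figure is read here as a cap on max_rate):

    MIN_RATE = 10_000    # 10 kbit/s floor, in bit/s
    MAX_CAP = 100_000    # 100 kbit/s ceiling, in bit/s
    GAIN = 1.5
    TOL_LOSS = 0.10      # 10% tolerable median loss

    def adjust_max_rate(max_rate: float, med_loss: float) -> float:
        """One control step: halve on excess loss, otherwise grow by GAIN."""
        if med_loss > TOL_LOSS:
            return max(max_rate / 2, MIN_RATE)
        return min(GAIN * max_rate, MAX_CAP)

    rate = 50_000
    for loss in [0.02, 0.15, 0.01]:      # simulated median loss reports
        rate = adjust_max_rate(rate, loss)
        print(f"med_loss={loss:.0%} -> max_rate={rate / 1000:.0f} kbit/s")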
3) In ivs, two methods are used to control the output rate.
__ In privilege quality (PQ) mode, the value of the quantizer and the motion detection threshold are constant and match the maximal visual quality; the frame rate is changed so that the output rate stays below max_rate.
__ In privilege frame rate (PFR) mode, the frame rate is constant and the output rate is controlled using different quantizer and motion detection threshold values.
In MMT, only frame rate control is used in the rate control implementation, because compression is done by a hardware JPEG chip. The frame rate is calculated as:
frame rate = target bandwidth / size of the current compressed frame,
and it is recomputed only when the difference between the current compressed frame size and the previous size is larger than a threshold, as sketched below.
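A small sketch of this rule (the threshold value and function name are hypothetical; MMT's actual code is not given in the source):

    def mmt_frame_rate(target_bw: float, frame_size: float,
                       prev_size: float, current_rate: float,
                       rel_threshold: float = 0.2) -> float:
        """Recompute the frame rate only on significant size changes.

        target_bw in bit/s, frame sizes in bits, rate in frames/s;
        rel_threshold is a hypothetical 20% relative-change trigger.
        """
        if prev_size and abs(frame_size - prev_size) / prev_size <= rel_threshold:
            return current_rate          # size barely changed: keep the rate
        return target_bw / frame_size    # otherwise re-derive the frame rate

    print(mmt_frame_rate(128_000, frame_size=8_000, prev_size=4_000,
                         current_rate=16.0))   # frame size doubled -> 16.0 frames/s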
4) Error control
ivs uses two methods for error control.
1) The receiver identifies missing blocks by using the timestamp, as all GOBs belonging to a given image have the same timestamp. When the receiver notices that some packets have not been received, it sends a NACK packet to the sender, and the sender then sends INTRA-encoded data of a new frame to the receiver.
This is a forced replenishment, not a retransmission procedure.
The drawbacks are:
a) feedback explosion when receivers are too numerous;
b) regular hardware H.261 codecs are not designed to handle NACK packets.
2) Periodically refreshing the image in INTRA encoding mode. The H.261 recommendation requires INTRA encoding of each MB at least once every 131 times it is transmitted, to control the accumulation of decoder mismatch error.
In ivs, when the number of receivers is less than 10, NACK packets are used; otherwise INTRA refreshment is used. This policy is sketched below.
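As an illustration (the names are hypothetical), the switch between the two error-control methods can be expressed as:

    NACK_RECEIVER_LIMIT = 10   # below this, per-loss NACKs stay cheap

    def error_control_mode(num_receivers: int) -> str:
        """Pick ivs's error-control strategy for the session size.

        Few receivers: NACK-triggered INTRA replenishment of lost blocks.
        Many receivers: periodic INTRA refresh, avoiding feedback explosion.
        """
        if num_receivers < NACK_RECEIVER_LIMIT:
            return "nack-replenishment"
        return "periodic-intra-refresh"

    print(error_control_mode(4))    # nack-replenishment
    print(error_control_mode(200))  # periodic-intra-refresh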
4. Software Architecture
The software of a videoconferencing system consists of: 1) user interface, 2) capture path, 3) compression path, 4) packetization path, 5) encryption path, 6) the conference control bus, 7) decryption path, 8) depacketization path, 9) decompression path, 10) rendering path. nv, ivs and vic are developed in C++ and Tcl/Tk in a UNIX environment.
User Interface
It should include the conference, compression and transmission information, and other useful information.
Capture path
It converts analog video to digital signals by software or hardware.
---The MIT Laboratory for Computer Science has developed a video capture board, the Vidboard, for a real-time distributed multimedia system centered around an ATM network.
---nv, ivs and vic do not provide software capture; they use a video capture board.
---vic optimised the capture paths by supporting each video format.
Rendering path
It converts video from the YUV pixel representation used by most compression schemes to a format suitable for the output device (an external video device or an X window). vic supports several operations for X windows: colour-mapped display; simple conversion of pixels from the YUV color space to RGB; and dithering only the regions of the image that change. A basic YUV-to-RGB conversion is sketched below.
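For concreteness, a per-pixel YUV-to-RGB conversion using the common BT.601 full-range coefficients (a generic sketch, not vic's optimised code):

    def yuv_to_rgb(y: int, u: int, v: int) -> tuple[int, int, int]:
        """Convert one 8-bit YUV (YCbCr) pixel to 8-bit RGB."""
        d = u - 128                      # centre the chrominance components
        e = v - 128
        r = y + 1.402 * e
        g = y - 0.344136 * d - 0.714136 * e
        b = y + 1.772 * d
        clamp = lambda x: max(0, min(255, int(round(x))))
        return clamp(r), clamp(g), clamp(b)

    print(yuv_to_rgb(128, 128, 128))  # mid-gray -> (128, 128, 128)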
User Interface: screenshots of the ivs and vic user interfaces.
Compression path
It performs video compression to reduce data redundancy.
1) nv uses a Haar wavelet compression scheme, in which each 8x8 image block is transformed by a Haar wavelet. Resulting coefficients smaller than a threshold are set to zero, and the coefficients are then run-length coded.
2) vic provides several compression schemes, including MPEG, Motion-JPEG, H.261 and Intra-H.261. Intra-H.261 uses the intraframe coding of H.261 without H.261's interframe coding.
3) ivs uses H.261.
The H.261 standard, commonly called px64, is optimised to achieve a very high compression ratio for full-color, real-time motion video transmission. The px64 (p = 1, 2, ..., 30) compression algorithm combines:
intraframe coding ___ DCT-based intraframe compression,
interframe coding ___ predictive interframe coding based on DPCM (Differential Pulse Code Modulation) and motion estimation.
The px64 algorithm operates with two picture formats adopted by the CCITT, Common Intermediate Format (CIF) and Quarter-CIF (QCIF), where 1 CIF = 12 GOBs (Groups of Blocks), 1 QCIF = 3 GOBs, 1 GOB = 3x11 MBs (Macroblocks) and 1 MB = 4Y + Cb + Cr. This block structure is worked out below.
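To make the arithmetic explicit, a short worked check of the block structure from the definitions above (not code from any of the tools):

    # 1 GOB = 3 x 11 macroblocks; 1 CIF = 12 GOBs, 1 QCIF = 3 GOBs.
    MBS_PER_GOB = 3 * 11          # 33 macroblocks
    CIF_MBS = 12 * MBS_PER_GOB    # 396 macroblocks
    QCIF_MBS = 3 * MBS_PER_GOB    # 99 macroblocks

    # Each MB covers 16x16 luminance pixels (4 Y blocks of 8x8) plus one
    # 8x8 Cb and one 8x8 Cr block (4:2:0 chrominance subsampling).
    print(CIF_MBS * 16 * 16)      # 101376 = 352 x 288 pixels (CIF)
    print(QCIF_MBS * 16 * 16)     # 25344 = 176 x 144 pixels (QCIF)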
Packetization path
In the packetization path, the compressed data are fragmented or collected into transmission units, then interlaced with the audio stream and transmitted over the network.
_Both ivs and vic use the Real-time Transport Protocol (RTP) to transmit the video/audio flow.
_The latest version of ivs takes the MB as the unit of fragmentation in the packetization scheme, instead of the GOB used in the first version.
1) Packets must start and end on an MB boundary; that is, an MB cannot be split across multiple packets (a fragmentation sketch follows this list).
2) The fragmentation units are carried as payload data within the RTP protocol.
This packetization scheme is currently proposed as a standard by the audio-video transport working group (AVT-WG) at the IETF.
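A minimal sketch of MB-boundary fragmentation (the payload limit and function name are hypothetical): consecutive macroblocks are packed into packets without ever splitting one.

    def fragment_mbs(mb_sizes: list[int], max_payload: int = 1400) -> list[list[int]]:
        """Group macroblock sizes (in bytes) into packets on MB boundaries.

        A macroblock is never split across packets; a packet is flushed
        when adding the next MB would exceed the payload limit.
        """
        packets, current, used = [], [], 0
        for size in mb_sizes:
            if current and used + size > max_payload:
                packets.append(current)
                current, used = [], 0
            current.append(size)
            used += size
        if current:
            packets.append(current)
        return packets

    # MBs of varying compressed size grouped into <= 1400-byte packets.
    print(fragment_mbs([400, 600, 500, 300, 900, 200]))
    # [[400, 600], [500, 300], [900, 200]]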
Encryption/decryption path
This is for network security. Encryption is implemented as the last step in the transmission path, and decryption as the first step in the reception path.
_vic employs the Data Encryption Standard (DES) in cipher block chaining mode, as illustrated below.
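For illustration, DES in CBC mode can be reproduced with the third-party PyCryptodome library (an assumption: this is not vic's code, the key and IV are throwaway examples, and DES is far too weak for modern use):

    from Crypto.Cipher import DES
    from Crypto.Util.Padding import pad, unpad

    key = b"8bytekey"            # DES keys are 8 bytes (56 effective bits)
    iv = b"\x00" * 8             # example IV; real sessions need a fresh one

    plaintext = b"compressed video payload"
    cipher = DES.new(key, DES.MODE_CBC, iv)
    ciphertext = cipher.encrypt(pad(plaintext, DES.block_size))

    # Decryption is the first step on the receive path.
    decipher = DES.new(key, DES.MODE_CBC, iv)
    assert unpad(decipher.decrypt(ciphertext), DES.block_size) == plaintext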
The conference bus
It is provided only by vic, as a mechanism for coordination among the separate processes.
Each application can broadcast a typed message on the bus, and all applications that are registered to receive that message type will get a copy.
Conference buses are implemented as multicast datagram sockets bound to the loopback interface (a sketch follows this list).
vic uses the conference bus to provide the following functions:
• Floor Control: the moderator can give the floor to a participant by multicasting a takes-floor directive with that participant's RTP CNAME. Locally, each receiver then mutes all participants except the one that holds the floor.
• Synchronization: each real-time application induces a buffering delay, called the playback point, to adapt to packet delay variations. By broadcasting "synchronise" messages across the conference bus, the different media can compute the maximum of all advertised playout delays. This maximum is then used in the delay-adaptation algorithm.
• Voice-switched Windows: current-speaker messages are broadcast by vat over the conference bus, indicating the speaker's RTP CNAME. vic monitors these messages and switches the viewing window to that person.
• Device Access: applications sharing a common device issue claim-device and release-device messages on the global bus to coordinate ownership of an exclusive-access device.
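A rough sketch of such a bus (the group address, port and message format are hypothetical, and loopback multicast behaviour varies by platform): processes on one host exchange typed messages over a host-local multicast socket.

    import socket
    import struct

    BUS_GROUP = "239.255.0.1"   # hypothetical administratively-scoped group
    BUS_PORT = 4000             # hypothetical bus port

    def open_bus() -> socket.socket:
        """Open a multicast datagram socket confined to the local host."""
        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        s.bind(("", BUS_PORT))
        mreq = struct.pack("4s4s", socket.inet_aton(BUS_GROUP),
                           socket.inet_aton("127.0.0.1"))
        s.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
        # TTL 0 keeps bus traffic from ever leaving the host.
        s.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 0)
        return s

    def broadcast(sock: socket.socket, msg_type: str, body: str) -> None:
        """Send a typed message; listeners filter on the message type."""
        sock.sendto(f"{msg_type} {body}".encode(), (BUS_GROUP, BUS_PORT))

    bus = open_bus()
    broadcast(bus, "takes-floor", "cname=alice@example.net")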
 19/06/98
19
5. Performance
Performance was tested on an SS10/20 (41 MIPS) platform with the SunVideo board.
Results:
1) only nv can reach 20 QCIF frames/s when the video is very animated;
2) the ivs H.261 coder gives a 30% higher compression rate than the vic H.261 coder;
3) the vic H.261 coder is less greedy for CPU than the ivs H.261 coder.
nv is strong in:
1) the low complexity of its compression algorithm, and hence a higher frame rate; its compression rate, however, is low.
ivs is strong in:
1) network control.
vic is strong in:
1) compression methods.
6. Videoconferencing over ATM
ATM versus Internet:
• ATM provides all sorts of services, including audio, video and data applications.
• The Internet is not well suited to real-time transmission.
• ATM provides the high performance required by live video and audio.
• ATM seems better able to provide quality-of-service guarantees suitable for high-quality voice and video.
• The CCITT pursues a cell architecture for future telephone networks, since cells have the potential to reduce the number of transmission networks, provide easier support for multicasting, and offer a better multiplexing scheme than ISDN at high speeds. Being a particular form of cell networking, ATM is argued to be the only rational choice for future networks.
• A prototype of videoconferencing over ATM
Fig.4. The network architecture of existing videoconferencing tools (left: Application / H.261 standard / RTP / UDP / IP / network) and the proposed one (right: Application / Adapted H.261 /? / RTP / ? protocol / AAL / ATM).