
Electrical Engineering
National Central University
Overview of H.264/AVC
2003.9.x
M.K.Tsai
Video-Audio Processing Laboratory
Electrical Engineering
National Central University
Outline
 Abstract
 Applications
 Network Abstraction Layer (NAL)
 Conclusion—(I)
 Design feature highlight
 Conclusion—(II)
 Video Coding Layer (VCL)
 Profiles and potential applications
 Conclusion—(III)
Abstract
 H.264/AVC is the newest video coding standard
 Its main goals have been enhanced compression and provision of a
"network-friendly" representation addressing "conversational"
(video telephony) and "nonconversational" (storage, broadcast, or
streaming) applications
 H.264/AVC has achieved a significant improvement in
rate-distortion efficiency
 The scope of standardization is illustrated below
Applications
 Broadcast over cable, cable modem …
 Interactive or serial storage on optical media, DVD …
 Conversational services over LAN, modem …
 Video-on-demand or streaming services over ISDN, wireless
networks …
 Multimedia messaging service (MMS) over DSL, mobile networks …
How to handle this variety of applications and networks?
Applications
 To address this need for flexibility and customizability, the
H.264/AVC design defines a VCL and a NAL; the structure of an
H.264/AVC encoder is shown below
Applications
 The VCL (video coding layer) is designed to efficiently represent
the video content
 The NAL (network abstraction layer) formats the VCL
representation of the video and provides header information in a
manner appropriate for conveyance by a variety of transport layers
or storage media
Network Abstraction Layer
 To provide “network friendliness” to enable simple and
effective customization of the use of the VCL
 To facilitate the ability to map H.264/AVC data to transport
layers such as :
 RTP/IP for real-time Internet services
 File formats, e.g., ISO MP4, for storage
 H.32X for conversational services
 MPEG-2 systems for broadcasting services
 The design of the NAL anticipates a variety of such
mappings
Network Abstraction Layer
 Some key concepts of the NAL are NAL units, byte-stream and
packet-format uses of NAL units, parameter sets, and access units …
 NAL units
 A packet that contains an integer number of bytes
- The first byte is a header byte indicating the type of data in the
NAL unit
- The remaining bytes contain payload data
- Payload data is interleaved as necessary with emulation prevention
bytes, preventing a start-code prefix from being generated inside
the payload
 Specifies a format for use in both packet- and bitstream-oriented
transport systems
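As an illustration, the one-byte NAL unit header layout and the emulation-prevention rule described above can be sketched in Python (a simplified sketch; a real encoder also handles the trailing RBSP stop bit):

```python
def parse_nal_header(first_byte: int) -> dict:
    """Split the one-byte NAL unit header into its three fields."""
    return {
        "forbidden_zero_bit": first_byte >> 7,
        "nal_ref_idc": (first_byte >> 5) & 0x3,  # reference importance
        "nal_unit_type": first_byte & 0x1F,      # kind of payload
    }

def add_emulation_prevention(rbsp: bytes) -> bytes:
    """Insert 0x03 after every 0x00 0x00 pair that would otherwise be
    followed by a byte in 0x00..0x03, so no start-code prefix can
    appear inside the payload."""
    out = bytearray()
    zeros = 0
    for b in rbsp:
        if zeros >= 2 and b <= 0x03:
            out.append(0x03)          # emulation prevention byte
            zeros = 0
        out.append(b)
        zeros = zeros + 1 if b == 0x00 else 0
    return bytes(out)
```

For example, `parse_nal_header(0x67)` reports `nal_unit_type` 7 (a sequence parameter set), and a payload containing `00 00 01` is escaped to `00 00 03 01`.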
Network Abstraction Layer
 NAL units in byte-stream format use
 Each NAL unit is prefixed with a unique start code to identify its
boundary
 Some systems require delivery of the NAL unit stream as an ordered
stream of bytes (e.g., H.320 and MPEG-2/H.222.0 systems)
 NAL units in packet-transport system use
 The coded data is carried in packets framed by the system
transport protocol
 NAL units can be carried in data packets without start-code
prefixes
 In such systems, including start-code prefixes in the data would
be a waste
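A minimal sketch of the byte-stream use: split an incoming byte stream on 0x000001 start-code prefixes (an assumption-laden sketch — it presumes well-formed input, and trimming the trailing zero left by a four-byte start code would also trim legitimate trailing zero bytes in rare cases):

```python
def split_annexb(stream: bytes):
    """Split a byte-stream-format NAL unit stream on start-code
    prefixes (0x000001, optionally preceded by an extra zero byte)."""
    units, i, start = [], 0, None
    while i + 2 < len(stream):
        if stream[i] == 0 and stream[i + 1] == 0 and stream[i + 2] == 1:
            if start is not None:
                end = i
                # a 4-byte start code leaves a trailing zero; trim it
                while end > start and stream[end - 1] == 0:
                    end -= 1
                units.append(stream[start:end])
            i += 3
            start = i
        else:
            i += 1
    if start is not None:
        units.append(stream[start:])
    return units
```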
Network Abstraction Layer
 VCL and non-VCL NAL units
 VCL NAL units contain data that represents the values of the
samples in the video pictures
 Non-VCL NAL units contain associated extra data such as parameter
sets and supplemental enhancement information (SEI)
- Parameter sets: important header data that applies to a large number
of VCL NAL units
- SEI: timing information and other supplemental data that enhance the
usability of the decoded video signal but are not necessary for
decoding the sample values in the pictures
Network Abstraction Layer
 Parameter sets
 Contain information that is expected to rarely change and that
applies to the decoding of a large number of VCL NAL units
 Divided into two types
- Sequence parameter sets, which apply to a series of consecutive
coded video pictures
- Picture parameter sets, which apply to the decoding of one or more
individual pictures within a coded video sequence
 These two mechanisms decouple the transmission of infrequently
changing information from the transmission of coded sample data
 Parameter sets can be sent well ahead of the VCL NAL units and
repeated to provide robustness against data loss
 A small amount of data (an identifier) can be used to refer to a
larger amount of information (the parameter set)
 In some applications, parameter sets may be sent within the
channel that carries the VCL NAL units (termed "in-band"
transmission)
 In other applications, it can be advantageous to convey the
parameter sets "out of band" using a reliable transport
mechanism
Network Abstraction Layer
 Access units
 The format of an access unit is shown below
 An access unit contains a set of VCL NAL units that together
compose a primary coded picture
 It may be prefixed with an access unit delimiter to aid in
locating the start of the access unit
 SEI contains data such as picture timing information
 The primary coded picture consists of VCL NAL units containing
slices that represent the samples of the video picture
 Redundant coded pictures are available for use by a decoder in
recovering from loss of data
 For the last coded picture of a coded video sequence, an
end-of-sequence NAL unit may be present to indicate the end of the
sequence
 For the last coded picture in the entire NAL unit stream, an
end-of-stream NAL unit may be present to indicate that the stream
is ending
 Decoders are not required to decode redundant coded pictures if
they are present
 Decoding of each access unit results in one decoded picture
Network Abstraction Layer
 Coded video sequences
 A coded video sequence consists of a series of access units and
uses only one sequence parameter set
 It can be decoded independently of any other coded video sequence,
given the necessary parameter sets
 An instantaneous decoding refresh (IDR) access unit is at the
beginning of each coded video sequence and contains an intra
picture
 The presence of an IDR access unit indicates that no subsequent
picture will reference pictures prior to the intra picture it
contains
Conclusion—(I)
 H.264/AVC represents a number of advances in standard video coding
technology in terms of flexibility for effective use over a broad
variety of network types and application domains
Design feature highlight
 Variable block-size motion compensation with small block sizes
 Minimum luma block sizes as small as 4x4
 The matching chroma block is half the length and width
Design feature highlight
 Quarter-sample-accurate motion compensation
 Half-sample positions are generated using a 6-tap FIR filter
 First found in the advanced profile of MPEG-4, but H.264/AVC
further reduces the complexity
 Multiple reference picture motion compensation
 Extends upon the enhanced reference picture selection technique
found in H.263++
 The encoder selects among a large number of pictures decoded and
stored in the decoder for prediction
 The same extension applies to bi-prediction, which is restricted
in MPEG-2 to using two specific pictures
Design feature highlight
 Decoupling of referencing order from display order
 Prior standards imposed a strict dependency between the ordering
of pictures for referencing and for display
 The encoder is allowed to choose the ordering of pictures for
referencing and display purposes with a high degree of flexibility
 This flexibility is constrained only by the total memory capacity
 Removing the restriction also enables removing the extra delay
associated with bi-predictive coding
Design feature highlight
 Motion vectors over picture boundaries
 Motion vectors are allowed to point outside the picture
 Especially useful for small pictures and camera movement
 Decoupling of picture representation methods from picture
referencing capability
 Bi-predictively-coded pictures could not be used as references in
prior standards
 The encoder gains the flexibility to use for referencing a picture
that is closer to the picture being coded
Design feature highlight
 Weighted prediction
 Allows the motion-compensated prediction signal to be weighted
and offset by amounts specified by the encoder
 Improves coding efficiency for scenes containing fades
(in the accompanying figure, one grid cell represents one pixel)
Design feature highlight
 Improved "skipped" and "direct" motion inference
 In prior standards, a "skipped" area of a predictively-coded
picture could not represent motion in the scene content, which is
detrimental when coding video containing global motion
 H.264/AVC instead infers motion in "skipped" areas
 For bi-predictively coded areas, it further improves on the prior
"direct" prediction found in H.263+ and MPEG-4
Design feature highlight
 Directional spatial prediction for intra coding
 Extrapolating the edges of previously decoded parts of the current
picture is applied in intra-coded regions of a picture
 Improves the quality of the prediction signal
 Also allows prediction from neighboring areas that were not
intra-coded
Design feature highlight
 In-the-loop deblocking filtering
 Block-based video coding produces artifacts known as blocking
artifacts, originating from both the prediction and residual
difference coding stages of the decoding process
 Because the filter operates within the motion-compensation loop,
the improvement in quality can be used in inter-picture prediction
to improve the ability to predict other pictures
Design feature highlight
In addition to improved prediction methods, coding efficiency is
also enhanced by the following
 Small block-size transform
 All major prior video coding standards used a transform block
size of 8x8, while the new design is based primarily on 4x4
 Allows the encoder to represent the signal in a more
locally-adaptive fashion and reduces artifacts
 Short word-length transform
 Arithmetic processing requires only 16-bit instead of 32-bit
operations
Design feature highlight
 Hierarchical block transform
 Extends the effective block size for low-frequency chroma
information to an 8x8 array and, for intra coding, luma to a 16x16
array
Design feature highlight
 Exact-match inverse transform
 Previously, the inverse transform was specified only within an
error tolerance bound, due to the impracticality of obtaining an
exact match to the ideal inverse transform
 As a result, each decoder design would produce slightly different
decoded video, causing "drift" between encoder and decoder
 Arithmetic entropy coding
 Previously found as an optional feature of H.263
 H.264/AVC uses a powerful "context-adaptive binary arithmetic
coding" (CABAC)
Design feature highlight
 Context-adaptive entropy coding
 Both CAVLC (context-adaptive variable-length coding) and CABAC
use context-based adaptivity to improve performance
Design feature highlight
Robustness to data errors/losses and flexibility for operation over
a variety of network environments are enabled by the following
 Parameter set structure
 Key header information is separated for handling in a more
flexible and specialized manner
 Provides for robust and efficient conveyance of header information
 Flexible slice size
 The rigid slice structure of MPEG-2 reduces coding efficiency by
increasing the quantity of header data and decreasing the
effectiveness of prediction
Design feature highlight
 NAL unit syntax structure
 Each syntax structure in H.264/AVC is placed into a logical data
packet called a NAL unit
 Allows greater customization of the method of carrying the video
content in a manner appropriate for each specific network
 Redundant pictures
 Enhance robustness to data loss
 Enable a representation of regions of pictures for which the
primary representation has been lost
Design feature highlight
 Flexible macroblock ordering (FMO)
 Partitions a picture into regions called slice groups, with each
slice becoming an independently decodable subset of a slice group
 Can significantly enhance robustness by managing the spatial
relationship between the regions that are coded in each slice
 Arbitrary slice ordering (ASO)
 Enables sending and receiving the slices of a picture in any
order relative to each other, as found in H.263+
 Can improve end-to-end delay in real-time applications,
particularly on networks with out-of-order delivery behavior
Design feature highlight
 Data partitioning
 Allows the syntax of each slice to be separated into up to three
different partitions (header data, intra-slice data, and
inter-slice data), depending on a categorization of syntax elements
 SP/SI synchronization/switching pictures
 Allow exact synchronization of the decoding process of some
decoders with an ongoing video stream
 Enable switching a decoder between video streams that use
different data rates, switching between different kinds of video
streams, and recovery from data losses or errors
Conclusion—(II)
 H.264/AVC represents a number of advances in standard video coding
technology, in terms of both coding efficiency enhancement and
flexibility for effective use over a broad variety of network types
and application domains
Video Coding Layer
 Pictures, frames, and fields
 A picture can represent either an entire frame or a single field
 If the two fields of a frame were captured at different time
instants, the frame is referred to as an interlaced frame;
otherwise it is referred to as a progressive frame
Video Coding Layer
 YCbCr color space and 4:2:0 sampling
 Y represents brightness (luma)
 Cb and Cr represent the extent to which the color deviates from
gray toward blue and red, respectively
 Division of the picture into macroblocks
 Slices and slice groups
 Slices are sequences of macroblocks processed in raster-scan
order when not using FMO
 Some information from other slices may be needed to apply the
deblocking filter across slice boundaries
Video Coding Layer
 A picture may be split into one or more slices without FMO, as
shown below
 FMO modifies the way pictures are partitioned into slices and
macroblocks by using slice groups
 A slice group is a set of macroblocks defined by a
macroblock-to-slice-group map, which is specified by the picture
parameter set and some information from the slice headers
Video Coding Layer
 A slice group can be partitioned into one or more slices, such
that a slice is a sequence of macroblocks within the same slice
group, processed in raster-scan order
 By using FMO, a picture can be split into many macroblock
scanning patterns, such as those below
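As an illustration only (not the exact map formula from the standard), a "dispersed"-style macroblock-to-slice-group map that yields a checkerboard when there are two slice groups can be sketched as:

```python
def dispersed_slice_group_map(width_mbs: int, height_mbs: int,
                              num_groups: int):
    """Illustrative MB-to-slice-group map: assign each macroblock
    (x, y) to group (x + y) mod num_groups, interleaving the groups
    spatially (a checkerboard for two groups)."""
    return [[(x + y) % num_groups for x in range(width_mbs)]
            for y in range(height_mbs)]
```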
Video Coding Layer
 Each slice can be coded using one of the following types
 I slice
- A slice in which all macroblocks are coded using intra prediction
 P slice
- In addition to intra prediction, macroblocks can be coded with inter
prediction using at most one motion-compensated prediction signal
per prediction block
 B slice
- In addition to the coding types of a P slice, macroblocks can be
coded with inter prediction using two motion-compensated prediction
signals per prediction block
 SP (switching P) slice
- Enables efficient switching between different pre-coded pictures
 SI (switching I) slice
- Allows an exact match of a macroblock in an SP slice, for random
access and error recovery
Video Coding Layer
 If all slices in stream B are P slices, the decoder will not have
the correct reference frame after switching; one solution is to
code the switching frame as an I slice, as below
 An I slice results in a peak in the coded bit rate at each
switching point
Video Coding Layer
 SP slices are designed to support switching without the increased
bit-rate penalty of I slices
 Unlike for a "normal" P slice, the subtraction occurs in the
transform domain
Video Coding Layer
 A simplified diagram of the encoding and decoding processing for
SP slices A2, B2, and AB2 is shown below (A' means a reconstructed
frame)
Video Coding Layer
 If streams A and B are versions of the same original sequence
coded at different bit rates, the SP slice AB2 should be efficient
Video Coding Layer
 A further use of SP slices is to provide random access and
"VCR-like" functionalities (e.g., the decoder can fast-forward
from A0 directly to frame A10 by first decoding A0 and then
decoding SP slice A0-10)
 A second type of switching slice, the SI slice, may be used to
switch from one sequence to a completely different sequence
Video Coding Layer
 Encoding and decoding process for macroblocks
 All luma and chroma samples of a macroblock are either spatially
or temporally predicted
 Each color component of the prediction residual is subdivided into
4x4 blocks, each of which is transformed using an integer transform
and then quantized and encoded by entropy coding methods
 The input video signal is split into macroblocks, and the
association of macroblocks with slice groups and slices is selected
 Efficient parallel processing of macroblocks is possible when
there are several slices in the picture
Video Coding Layer
 Encoding and decoding process for macroblocks
 The block diagram of the VCL for a macroblock is shown in the
following
Video Coding Layer
 Adaptive frame/field coding operation
 In regions with moving objects or camera motion, two vertically
adjacent rows show a reduced degree of statistical dependency in
interlaced frames compared with progressive frames
 To provide high coding efficiency, H.264/AVC allows the following
decisions when coding a frame
 Combine the two fields and code them as one single frame
(frame mode)
 Do not combine the two fields, and code them as separate coded
fields (field mode)
 Combine the two fields and compress them as a single frame, but
before coding split each pair of vertically adjacent macroblocks
into either a pair of field macroblocks or a pair of frame
macroblocks
Video Coding Layer
 The choice among the three options can be made adaptively for
each frame; the first two are referred to as picture-adaptive
frame/field (PAFF) coding
 When a frame is coded as two fields, each field is coded in a way
similar to a frame, except for the following
 Motion compensation utilizes reference fields rather than
reference frames
 The zig-zag scan of transform coefficients is different
 Strong deblocking is not used for filtering the horizontal edges
of macroblocks in fields
 If a frame consists of mixed regions, it can be efficient to code
the nonmoving regions in frame mode and the moving regions in
field mode
Video Coding Layer
 The frame/field encoding decision can also be made independently
for each vertical pair of macroblocks; this coding option is
referred to as macroblock-adaptive frame/field (MBAFF) coding. The
figure below shows the MBAFF macroblock-pair concept.
Video Coding Layer
 An important distinction between PAFF and MBAFF is that in MBAFF,
one field cannot use macroblocks in the other field of the same
frame for motion-compensated prediction
 Sometimes PAFF coding can be more efficient than MBAFF coding,
particularly in the case of rapid global motion, scene changes, or
intra picture refresh
Video Coding Layer
 Intra-frame prediction
 In all slice coding types, Intra_4x4 or Intra_16x16 prediction is
supported, together with chroma prediction and the I_PCM
prediction mode
 Intra_4x4 mode is based on 4x4 luma blocks and is suited for parts
of a picture with significant detail
 When using this mode, each 4x4 block is predicted from the
neighboring samples, as below
Video Coding Layer
 Intra-frame prediction
 4x4 block prediction modes
 Except for the "DC" prediction mode, the modes are suited to
predict textures with structure in the specified direction
Video Coding Layer
 Intra-frame prediction
 In earlier drafts, the four samples below L were also used for
some prediction modes. They were dropped due to the need to reduce
memory accesses
 The intra modes of neighboring 4x4 blocks are highly correlated.
For example, if previously-encoded 4x4 blocks A and B were
predicted using mode 2, it is likely that the best mode for block
C is also mode 2.
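The standard exploits this correlation by predicting the most probable mode as the smaller of the two neighbouring modes and signalling only a one-bit flag when the prediction is right; a sketch (neighbour-availability handling omitted):

```python
def predict_intra4x4_mode(mode_a: int, mode_b: int) -> int:
    """Most probable mode for the current 4x4 block: the smaller of
    the modes of the left (A) and upper (B) neighbours."""
    return min(mode_a, mode_b)

def signal_mode(actual: int, mode_a: int, mode_b: int):
    """Return what is conceptually sent: a set flag when the
    prediction matches, else a cleared flag plus the remaining
    mode index (the actual mode with the predicted one skipped)."""
    pred = predict_intra4x4_mode(mode_a, mode_b)
    if actual == pred:
        return (1, None)      # prev_intra4x4_pred_mode_flag = 1
    rem = actual if actual < pred else actual - 1
    return (0, rem)           # flag = 0 plus rem_intra4x4_pred_mode
```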
Video Coding Layer
 Intra-frame prediction
 Intra_16x16 mode is suited for smooth areas of a picture
 This mode provides vertical, horizontal, DC, and plane prediction
 Plane prediction works well in areas of smoothly-varying luminance
Video Coding Layer
 Intra-frame prediction
 The chroma of a macroblock is predicted with a technique similar
to Intra_16x16 (the same four modes)
 I_PCM mode allows the encoder to bypass the prediction and
transform coding processes and instead directly send the values of
the encoded samples
 I_PCM mode serves the following purposes
 Allows the encoder to precisely represent the values of the
samples
 Provides a way to accurately represent the values of anomalous
picture content
 Enables placing a hard limit on the number of bits a decoder must
handle for a macroblock without harming coding efficiency
Video Coding Layer
 Intra-frame prediction
 A constrained intra coding mode allows prediction only from
intra-coded neighboring macroblocks
 Intra prediction across slice boundaries is not used
 Referring to neighboring samples of previously-coded blocks may
incur error propagation in environments with transmission errors
Video Coding Layer
 Inter-frame prediction
 In P slices
 Each P macroblock type is partitioned as shown below
 This method of partitioning macroblocks is known as
tree-structured motion compensation
Video Coding Layer
 Inter-frame prediction
 Choosing a larger partition size means
- A small number of bits is required to signal the choice of motion
vector and the type of partition
- The motion-compensated residual may contain a significant amount of
energy in frame areas with high detail
 Choosing a smaller partition size means
- A lower-energy residual after motion compensation
- A larger number of bits is required to signal the motion vectors
and the type of partition
 The accuracy of motion compensation is in units of one quarter of
the distance between two luma samples
Video Coding Layer
 Inter-frame prediction
 Half-sample values are obtained by applying a one-dimensional
6-tap FIR filter vertically and horizontally
 The 6-tap interpolation filter is relatively complex but produces
a more accurate fit to the integer-sample data and hence better
motion compensation performance
 Quarter-sample values are generated by averaging samples at
integer- and half-sample positions
Video Coding Layer
 The above illustrates the half-sample interpolation (the names
refer to the sample positions in the figure):
b1 = E - 5F + 20G + 20H - 5I + J
h1 = A - 5C + 20G + 20M - 5R + T
b = (b1 + 16) >> 5
h = (h1 + 16) >> 5
j1 = cc - 5dd + 20h1 + 20m1 - 5ee + ff
j = (j1 + 512) >> 10
a = (G + b + 1) >> 1
e = (b + h + 1) >> 1
Video Coding Layer
 Inter-frame prediction
 The following illustrates the luma quarter-sample positions
a = round((G + b)/2)
d = round((G + h)/2)
e = round((h + b)/2)
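The half- and quarter-sample rules above can be sketched as follows (a sketch of the per-position arithmetic only; picture-boundary clamping of the sample coordinates is ignored):

```python
def six_tap(a: int, b: int, c: int, d: int, e: int, f: int) -> int:
    """6-tap FIR filter with coefficients (1, -5, 20, 20, -5, 1)."""
    return a - 5 * b + 20 * c + 20 * d - 5 * e + f

def half_sample(E: int, F: int, G: int, H: int, I: int, J: int) -> int:
    """Half-sample value: filter, normalize by 32 with rounding,
    and clip to the 8-bit sample range."""
    v = (six_tap(E, F, G, H, I, J) + 16) >> 5
    return max(0, min(255, v))

def quarter_sample(p: int, q: int) -> int:
    """Quarter-sample value: average of two neighbouring
    integer/half-sample values with upward rounding."""
    return (p + q + 1) >> 1
```

On a flat area all interpolated values reproduce the surrounding samples, e.g. `half_sample(10, 10, 10, 10, 10, 10)` returns 10.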
Video Coding Layer
 The predictions for the chroma components are obtained by
bilinear interpolation
 The displacements used for chroma have one-eighth-sample position
accuracy
a = round([(8 - dx)(8 - dy)A + dx(8 - dy)B + (8 - dx)dyC + dx dy D]/64)
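The bilinear formula can be written directly; dx and dy are the eighth-sample fractional displacements (0..7), and the division by 64 with rounding is implemented as an add-and-shift:

```python
def chroma_interp(A: int, B: int, C: int, D: int,
                  dx: int, dy: int) -> int:
    """Bilinear chroma interpolation between the four surrounding
    integer samples A (top-left), B (top-right), C (bottom-left),
    D (bottom-right), with eighth-sample accuracy."""
    v = ((8 - dx) * (8 - dy) * A + dx * (8 - dy) * B
         + (8 - dx) * dy * C + dx * dy * D)
    return (v + 32) >> 6   # round(v / 64)
```

With zero displacement the result is sample A itself, and a flat neighbourhood is reproduced unchanged for any (dx, dy).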
Video Coding Layer
 Inter-frame prediction
 Motion prediction using full-, half-, and quarter-sample accuracy
improves on the previous standards for two reasons
- More accurate motion representation
- More flexibility in prediction filtering
 Motion vectors are allowed to point over picture boundaries
 No motion vector prediction takes place across slice boundaries
 Motion compensation for regions smaller than 8x8 uses the same
reference index for the prediction of all blocks within the 8x8
region
Video Coding Layer
 Inter-frame prediction
 The choices of neighboring partitions of the same and different
sizes are shown below
- For transmitted partitions, excluding the 16x8 and 8x16 partition
sizes: MVp is the median of the motion vectors for partitions A, B,
and C
Video Coding Layer
- For 16x8 partitions: MVp for the upper 16x8 partition is predicted
from B; MVp for the lower 16x8 partition is predicted from A
- For 8x16 partitions: MVp for the left 8x16 partition is predicted
from A; MVp for the right 8x16 partition is predicted from C
- For skipped macroblocks: a 16x16 vector MVp is generated as in
case (1) above
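The median rule for the first case can be sketched as follows; `mv_a`, `mv_b`, `mv_c` are the (x, y) motion vectors of neighbours A, B, and C, and the standard's special cases for unavailable neighbours are ignored here:

```python
def median_mv(mv_a, mv_b, mv_c):
    """Median motion-vector predictor over the three neighbouring
    partitions, computed independently per component."""
    def med(x, y, z):
        return sorted([x, y, z])[1]   # middle of the three values
    return (med(mv_a[0], mv_b[0], mv_c[0]),
            med(mv_a[1], mv_b[1], mv_c[1]))
```

For example, `median_mv((1, 2), (3, 4), (2, 0))` yields `(2, 2)`.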
Video Coding Layer
 A P macroblock can also be coded in the P_Skip type; this is
useful because large areas with no change or constant motion, such
as slow panning, can be represented with very few bits
 Multi-picture motion compensation is supported, as below
Video Coding Layer
 In B slices
 Intra coding is also supported
 Four other types are supported: list 0, list 1, bi-predictive,
and direct prediction
 For the bi-predictive mode, the prediction signal is formed by a
weighted average of the motion-compensated list 0 and list 1
prediction signals
 The direct mode can be list 0 or list 1 prediction or
bi-predictive
 Multi-frame motion compensation is supported
Video Coding Layer
 Transform, scaling, and quantization
 The transform is applied to 4x4 blocks
 Instead of the DCT, a separable integer transform with similar
properties to the 4x4 DCT is used
 Inverse-transform mismatches are thereby avoided
 At the encoder: transform, zig-zag scanning, scaling, and
rounding as part of quantization, followed by entropy coding
 At the decoder: the inverse process is performed, except for the
rounding
 The inverse transform is implemented using only additions and
bit-shifting operations on 16-bit values
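A sketch of the forward 4x4 core transform W = Cf·X·Cf^T, using the integer matrix Cf of the standard (the post-scaling factors are folded into quantization and omitted here):

```python
# Integer core transform matrix of the 4x4 H.264 transform
Cf = [[1,  1,  1,  1],
      [2,  1, -1, -2],
      [1, -1, -1,  1],
      [1, -2,  2, -1]]

def matmul(A, B):
    """Plain 4x4 integer matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(4))
             for j in range(4)] for i in range(4)]

def transpose(A):
    return [list(row) for row in zip(*A)]

def forward_core_transform(X):
    """W = Cf . X . Cf^T, exact in integer arithmetic."""
    return matmul(matmul(Cf, X), transpose(Cf))
```

A constant residual block concentrates all its energy in the DC position: a block of all ones transforms to 16 at W[0][0] and zeros elsewhere.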
Video Coding Layer
 There are several reasons for using a smaller transform size
 It removes statistical correlation efficiently
 It has visual benefits, resulting in less noise around edges
 It requires fewer computations and a smaller processing word
length
 The quantization parameter (QP) can take 52 values
 Qstep doubles in size for every increment of 6 in QP
 An increase of 1 in QP means an increase of approximately 12.5%
in Qstep
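This QP-to-Qstep relationship can be sketched as follows; the base table values below are the ones commonly quoted for the design (e.g., QP 0 maps to 0.625 and QP 51 to 224), taken here as an assumption:

```python
# Qstep for QP 0..5; every further increment of 6 doubles Qstep
BASE_QSTEP = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]

def qstep(qp: int) -> float:
    """Quantizer step size for quantization parameter qp (0..51)."""
    return BASE_QSTEP[qp % 6] * (2 ** (qp // 6))
```

Each step up the table is roughly a 12.5% increase, so six steps compound to a doubling: `qstep(6)` is exactly `2 * qstep(0)`.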
Video Coding Layer
 The wide range of quantizer step sizes makes it possible for the
encoder to control the trade-off between bit rate and quality
accurately and flexibly
 The values of QP may be different for luma and chroma; QPchroma
is derived from QPY via a user-defined offset
 4x4 luma DC coefficient transform and quantization (Intra_16x16
mode only)
 The DC coefficients of the 16 4x4 blocks are transformed again
using a 4x4 Hadamard transform
 In an intra-coded macroblock, much of the energy is concentrated
in the DC coefficients, and this extra transform helps to
de-correlate the 4x4 luma DC coefficients
Video Coding Layer
 2x2 chroma DC coefficient transform and quantization: as with the
intra luma DC coefficients, the extra transform helps to
de-correlate the 2x2 chroma DC coefficients and improves
compression performance
 The complete process
Encoding:
 Input: 4x4 residual samples X
 Forward "core" transform: W = Cf · X · Cf^T
( followed by the forward transform for chroma DC or Intra-16 luma
DC coefficients )
 Post-scaling and quantization: Z = round(W · PF / Qstep),
implemented in integer arithmetic with a scaling factor of 2^qbits
( modified for chroma DC or Intra-16 luma DC coefficients )
Video Coding Layer
Decoding:
( inverse transform first for chroma DC or Intra-16 luma DC coefficients )
 Re-scaling ( incorporating inverse transform pre-scaling ):
W' = Z · Qstep · PF · 64
( modified for chroma DC or Intra-16 luma DC coefficients )
 Inverse “core” transform: X' = Ci^T · W' · Ci
 Post-scaling: X'' = round( X' / 64 )
 Output: 4x4 residual samples X''
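The forward "core" transform can be sketched with the H.264 integer matrix Cf; the inversion here is a floating-point check exploiting the orthogonality of Cf's rows, and the standard's PF scaling and quantization steps are omitted:

```python
# H.264 forward "core" transform matrix.
Cf = [[1,  1,  1,  1],
      [2,  1, -1, -2],
      [1, -1, -1,  1],
      [1, -2,  2, -1]]

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transpose(m):
    return [list(r) for r in zip(*m)]

def forward_core(x):
    """W = Cf X Cf^T on a 4x4 residual block."""
    return matmul(matmul(Cf, x), transpose(Cf))

def inverse_core(w):
    """Undo forward_core. Rows of Cf are orthogonal with squared norms
    4, 10, 4, 10, so Cf^-1 = Cf^T * diag(1/4, 1/10, 1/4, 1/10)."""
    norms = [4, 10, 4, 10]
    cf_inv = [[Cf[j][i] / norms[j] for j in range(4)] for i in range(4)]
    return matmul(matmul(cf_inv, w), transpose(cf_inv))
```

In the real codec the 1/norm factors are folded into the PF/Qstep scaling so that the decoder needs only additions and shifts, as noted earlier.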
Video Coding Layer
 Flow chart
 An additional 2x2 transform is also applied to the DC
coefficients of the four 4x4 chroma blocks
Video Coding Layer
 Entropy coding
 The simpler method uses a single infinite-extent codeword table
for all syntax elements except the residual data
 The mapping to the codeword table is customized according to the
data statistics
 The chosen codeword table is an exp-Golomb code with simple
and regular decoding properties
 In CAVLC, VLC tables for various syntax elements are
switched depending on already-transmitted syntax elements
 In CAVLC, the number of non-zero quantized coefficients and the
actual magnitudes and positions of the coefficients are coded
separately
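The exp-Golomb code mentioned above can be sketched for the unsigned ue(v) case; the bit strings here are plain Python strings for illustration:

```python
def exp_golomb_encode(code_num: int) -> str:
    """Return the unsigned Exp-Golomb bit string for code_num >= 0."""
    value = code_num + 1
    m = value.bit_length() - 1          # number of leading zeros in the prefix
    return "0" * m + bin(value)[2:]     # prefix zeros + binary of (code_num + 1)

def exp_golomb_decode(bits: str) -> int:
    """Decode one Exp-Golomb codeword from a bit string."""
    m = bits.index("1")                 # count leading zeros
    return int(bits[m:2 * m + 1], 2) - 1
```

The "simple and regular decoding property" is visible here: a decoder only counts leading zeros, then reads that many more bits.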
Video Coding Layer
 Entropy coding
 VLC tables are designed to match the corresponding
conditioned statistics
 CAVLC encoding of a block of transform coefficients
proceeds as follows
 Encode the number of non-zero coefficients and “trailing 1s”
- Encode the total number of non-zero coefficients (TotalCoeffs) and
trailing ±1 values (T1s) → coeff_token
- TotalCoeffs: 0~16, T1s: 0~3
- There are 4 look-up tables for coeff_token (3 VLC and 1 FLC)
 Encode the sign of each T1
- Coded in reverse order, starting with the highest-frequency T1
Video Coding Layer
 Entropy coding
 Encode the levels of the remaining non-zero coefficients
- Coded in reverse order
- There are 7 VLC tables to choose from
- The choice of table adapts depending on the magnitude of each coded level
 Encode the total number of zeros before the last coefficient
- TotalZeros is the count of all zeros preceding the highest non-zero
coefficient in the reordered array
- Coded with a VLC
 Encode each run of zeros
- Encoded in reverse order
- The VLC is chosen depending on ZerosLeft and run_before
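The quantities the steps above encode (TotalCoeffs, trailing ±1s, TotalZeros) can be computed with a small sketch; the actual codeword-table lookups are omitted:

```python
def cavlc_stats(coeffs):
    """Compute (TotalCoeffs, T1s, TotalZeros) for a zig-zag-ordered block.

    coeffs: list of quantized coefficients in scan order.
    """
    nz = [i for i, c in enumerate(coeffs) if c != 0]
    total_coeffs = len(nz)
    # Trailing ones: consecutive +/-1 values at the high-frequency end, max 3
    t1 = 0
    for i in reversed(nz):
        if abs(coeffs[i]) == 1 and t1 < 3:
            t1 += 1
        else:
            break
    # Zeros preceding the highest non-zero coefficient
    total_zeros = nz[-1] + 1 - total_coeffs if nz else 0
    return total_coeffs, t1, total_zeros
```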
Video Coding Layer
 Entropy coding
 example
Video Coding Layer
 CABAC allows the assignment of a non-integer number of
bits to each symbol of an alphabet
 The use of adaptive codes permits adaptation to non-stationary
symbol statistics
 Statistics of already-coded syntax elements are used to
estimate the conditional probabilities used for switching between
several estimated probability models
 The arithmetic coding core engine and its associated probability
estimation are specified as multiplication-free, low-complexity
methods using only shifts and table look-ups
Video Coding Layer
 Coding a data symbol involves the following stages (taking
MVDx as an example)
 Binarization
- For |MVDx| < 9 it is carried out by the following table; larger values
are binarized with an Exp-Golomb codeword
- The first bit is bin 1, the second bit is bin 2
Video Coding Layer
 Coding a data symbol involves the following stages (taking
MVDx as an example)
 Context model selection
- Selected according to the following table
 Arithmetic encoding
- The selected context model supplies two probability estimates (for “1”
and “0”) that determine the sub-range the arithmetic coder uses
Video Coding Layer
 Coding a data symbol involves the following stages (taking
MVDx as an example)
 Probability update
- If the value of bin 1 is “0”, the frequency count of “0” is
incremented
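The probability-update stage above can be sketched with a simple frequency-count context model; note this is a didactic simplification, since the standard actually uses a finite-state probability table rather than raw counts:

```python
class ContextModel:
    """Toy CABAC-style context: frequency counts for a binary symbol."""

    def __init__(self):
        self.count = [1, 1]  # initial counts for symbols 0 and 1

    def prob(self, symbol: int) -> float:
        """Current probability estimate for the given symbol."""
        return self.count[symbol] / sum(self.count)

    def update(self, symbol: int):
        """After coding a bin, increment the count of its value."""
        self.count[symbol] += 1
```

Each coded bin sharpens the estimate used for subsequent bins, which is how the coder adapts to non-stationary statistics.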
Video Coding Layer
 In-loop deblocking filter
 Applied between the inverse transform and the reconstruction of the MB
 A particular characteristic of block-based coding is the
accidental production of visible block structures
 Block edges are reconstructed with less accuracy than
interior pixels, and “blocking” is among the most visible artifacts
 It has two benefits
 Block edges are smoothed
 Smoothing results in smaller residuals after prediction
 In the adaptive filter, the strength of filtering is controlled by several
syntax elements
Video Coding Layer
 In-loop deblocking filter
 The basic idea is that if a relatively large absolute difference
between samples near a block edge is measured, it is quite
likely a blocking artifact and should be reduced
 If the magnitude of the difference is too large to be explained by the
coarseness of the quantization, it more likely reflects the actual
behavior of the source picture
 Filtering is applied to the edges of 4x4 blocks
Video Coding Layer
 In-loop deblocking filter
 Filtering is applied to the edges of 4x4 blocks
 The choice of filtering outcome depends on the boundary strength
and the gradient of the image across the boundary
Video Coding Layer
 In-loop de-blocking filter
 The boundary strength Bs is chosen according to the following table
 Filter implementation
 Bs ∈ {1,2,3}: a 4-tap linear filter is applied
 Bs = 4: 3-, 4-, or 5-tap linear filters may be used
Video Coding Layer
 Below, the principle is shown using a one-dimensional edge
 Whether samples p0 and q0, as well as p1 and q1, are filtered is determined
using quantization parameter (QP) dependent thresholds α(QP)
and β(QP), where β(QP) is smaller than α(QP)
Video Coding Layer
 Filtering of p0 and q0 takes place if all of the following are satisfied
1. |p0 – q0| < α(QP)
2. |p1 – p0| < β(QP)
3. |q1 – q0| < β(QP)
 Filtering of p1 and q1 takes place if either of the following is satisfied
1. |p2 – p0| < β(QP)
or
2. |q2 – q0| < β(QP)
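The filtering decisions above can be sketched directly; alpha and beta stand in for the QP-dependent threshold tables of the standard, and the sample layout is p2 p1 p0 | q0 q1 q2 across the edge:

```python
def filter_p0_q0(p, q, alpha, beta):
    """True if boundary samples p0 and q0 are filtered.

    p = (p0, p1, p2) and q = (q0, q1, q2), ordered away from the edge.
    """
    return (abs(p[0] - q[0]) < alpha and
            abs(p[1] - p[0]) < beta and
            abs(q[1] - q[0]) < beta)

def filter_p1_q1(p, q, beta):
    """True if p1 and q1 are additionally filtered (condition as stated
    on the slide)."""
    return abs(p[2] - p[0]) < beta or abs(q[2] - q[0]) < beta
```

A small gradient across the edge passes the test (likely a blocking artifact), while a large step fails it and is left untouched as real picture content.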
Video Coding Layer
Foreman.qcif 10 Hz
Foreman.cif 30 Hz
Video Coding Layer
 Hypothetical reference decoder (HRD)
 For a standard, it is not sufficient to provide only a coding
algorithm
 In real-time systems it is also important to specify how bits are fed
to the decoder and how decoded pictures are removed
from the decoder
 This is done by specifying input and output buffer models and
developing an implementation-independent model of a receiver,
called the HRD
 It specifies the operation of two buffers
 Coded picture buffer (CPB)
 Decoded picture buffer (DPB)
Video Coding Layer
 The CPB models the arrival and removal times of the coded bits
 The HRD is flexible in supporting the sending of video at a variety
of bit rates without excessive delay
 The HRD specifies DPB management to ensure that
excessive memory capacity is not needed
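A minimal sketch of a CPB-style buffer model, assuming a constant-rate leaky-bucket arrival and one picture removed per frame interval; the function name and parameters are illustrative, not from the standard:

```python
def cpb_fullness(bit_rate, buffer_size, picture_sizes, fps):
    """Simulate CPB fullness after each picture removal.

    Bits arrive at bit_rate (capped at buffer_size); each picture's bits
    are removed at its decoding time. Returns the fullness trace, or
    None if the buffer underflows (a conformance violation).
    """
    fullness = 0.0
    levels = []
    for size in picture_sizes:
        fullness = min(fullness + bit_rate / fps, buffer_size)  # arrival
        fullness -= size                                        # removal
        if fullness < 0:
            return None                                         # underflow
        levels.append(fullness)
    return levels
```

An encoder (or HRD checker) can use such a trace to verify that a chosen bit rate and buffer size never starve the decoder.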
Profile and potential application
 Profiles
 Three profiles are defined: the Baseline, Main, and
Extended profiles
 Baseline supports all features except the following two sets
 B slices, weighted prediction, CABAC, field coding, and picture-
or MB-adaptive switching between frame/field coding
 SP/SI slices and slice data partitioning
 The Main profile supports the first set above, but not FMO, ASO, or
redundant pictures
 The Extended profile supports all Baseline features and both sets
above, except for CABAC
Profile and potential application
 Application areas where profiles of the new standard may be used
 A list of possible application areas is given below
 Conversational services
- H.323 conversational video services over circuit-switched,
ISDN-based video conferencing
- H.323 conversational services over the Internet with best-effort
IP/RTP protocols
 Entertainment video applications
- Broadcast via satellite, cable, or DSL
- DVD for standard
- VOD (video on demand) via various channels
Profile and potential application
 Streaming services
- 3GPP streaming using IP/RTP for transport and RTSP for
session setup
- Streaming over the wired Internet using the IP/RTP protocol and
RTSP for session setup
 Other services
- 3GPP multimedia messaging services
- Video mail
Conclusion—(III)
 Its VCL design is based on conventional block-based
hybrid video coding concepts, but with some differences
relative to prior standards, as listed below
 Enhanced motion-prediction capability
 Use of a small block-size exact-match transform
 Adaptive in-loop de-blocking filter
 Enhanced entropy coding methods