Overview of H.264/AVC
2003.9.x
M.K.Tsai
Video-Audio Processing Laboratory
Electrical Engineering
National Central University
Outline
Abstract
Applications
Network Abstraction Layer (NAL)
Conclusion (I)
Design feature highlight
Conclusion (II)
Video Coding Layer (VCL)
Profile and potential applications
Conclusion (III)
Abstract
H.264/AVC is the newest video coding standard
Its main goals have been enhanced compression performance and
provision of a “network-friendly” representation addressing
“conversational” (video telephony) and “nonconversational”
(storage, broadcast, or streaming) applications
H.264/AVC has achieved a significant improvement in
rate-distortion efficiency
The scope of standardization is illustrated below
Applications
Broadcast over cable, cable modem …
Interactive or serial storage on optical devices, DVD …
Conversational services over LAN, modem …
Video-on-demand or streaming services over ISDN, wireless networks …
Multimedia messaging services (MMS) over DSL, mobile networks …
How can this variety of applications and networks be handled?
Applications
To address this need for flexibility and customizability, the
H.264/AVC design consists of a VCL and a NAL; the structure of the
H.264/AVC encoder is shown below
Applications
VCL (video coding layer): designed to efficiently represent the
video content
NAL (network abstraction layer): formats the VCL representation of
the video and provides header information in a manner appropriate
for conveyance by a variety of transport layers or storage media
Network Abstraction Layer
To provide “network friendliness”, enabling simple and effective
customization of the use of the VCL
To facilitate the ability to map H.264/AVC data to transport
layers such as:
RTP/IP for real-time Internet services
File formats, e.g. ISO MP4, for storage
H.32X for conversational services
MPEG-2 systems for broadcasting services
The design of the NAL anticipates a variety of such mappings
Network Abstraction Layer
Some key concepts of the NAL are NAL units, byte-stream and
packet-format uses of NAL units, parameter sets, and access units
NAL units
A packet that contains an integer number of bytes
- The first byte is a header byte containing an indication of the type of data
- The remaining bytes contain payload data
- Payload data is interleaved as necessary with emulation
prevention bytes, preventing a start code prefix from being
generated inside the payload
Specifies a format for use in both packet- and bitstream-oriented transport systems
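The one-byte NAL unit header described above can be sketched as a tiny parser. The field layout (forbidden_zero_bit, nal_ref_idc, nal_unit_type) follows the standard; the function name and return structure are illustrative only.

```python
def parse_nal_header(first_byte: int) -> dict:
    """Split the NAL unit header byte into its three fields."""
    return {
        "forbidden_zero_bit": (first_byte >> 7) & 0x01,  # must be 0
        "nal_ref_idc": (first_byte >> 5) & 0x03,         # reference importance
        "nal_unit_type": first_byte & 0x1F,              # e.g. 7 = SPS
    }

# 0x67 is the customary first byte of a sequence parameter set NAL unit.
print(parse_nal_header(0x67))
```

For example, 0x67 decodes to nal_unit_type 7 (a sequence parameter set) with nal_ref_idc 3.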
Network Abstraction Layer
NAL units in byte-stream format use
Each NAL unit is prefixed by a unique start code to identify its boundary
Some systems require delivery of the NAL unit stream as an ordered
stream of bytes (like H.320 and MPEG-2/H.222)
NAL units in packet-transport system use
Coded data is carried in packets framed by the system transport
protocol
NAL units can be carried in data packets without start code prefixes
In such systems, inclusion of start code prefixes in the data would
be wasteful
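The two framings above can be sketched as follows: emulation prevention escaping for the payload, and start-code prefixing for byte-stream delivery. The helper names are mine; this is an illustrative sketch, not a conformant encoder.

```python
def escape_payload(rbsp: bytes) -> bytes:
    """Insert emulation prevention bytes (0x03) so that no byte pattern
    0x000000..0x000003 (which could emulate a start code) appears
    inside the payload."""
    out = bytearray()
    zeros = 0
    for b in rbsp:
        if zeros >= 2 and b <= 0x03:
            out.append(0x03)   # emulation prevention byte
            zeros = 0
        out.append(b)
        zeros = zeros + 1 if b == 0x00 else 0
    return bytes(out)

def to_byte_stream(nal_units) -> bytes:
    """Byte-stream format: prefix each NAL unit with a start code."""
    return b"".join(b"\x00\x00\x00\x01" + escape_payload(n) for n in nal_units)

print(escape_payload(b"\x00\x00\x01").hex())  # 00000301
```

In a packet transport, the escaped payload would be handed to the packetizer directly with no start codes, matching the "wasteful" argument above.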
Network Abstraction Layer
VCL and non-VCL NAL units
VCL NAL units contain data that represents the values of the
samples in the video pictures
Non-VCL NAL units contain additional data such as parameter sets
and supplemental enhancement information (SEI)
- Parameter sets: important header data applying to a large number
of VCL NAL units
- SEI: timing information and other supplemental data that enhance the
usability of the decoded video signal but are not necessary for decoding
the values in the pictures
Network Abstraction Layer
Parameter sets
Contain information that is expected to rarely change and that applies to
the decoding of a large number of VCL NAL units
Divided into two types
- Sequence parameter sets, which apply to a series of consecutive coded
video pictures
- Picture parameter sets, which apply to the decoding of one or more
individual pictures within a coded video sequence
These two mechanisms decouple the transmission of
infrequently changing information from the coded sample values
Parameter sets can be sent well ahead of the VCL NAL units and repeated to
provide robustness against data loss
Network Abstraction Layer
Parameter sets
A small amount of data (an identifier) can be used to refer to a
larger amount of information (the parameter set)
In some applications, parameter sets may be sent within the channel
(termed “in-band” transmission)
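The identifier mechanism can be sketched as follows. The dictionary, field names, and parameter contents are hypothetical; the point is only that each slice carries a small ID that refers to a larger, previously delivered parameter set.

```python
picture_parameter_sets = {}   # pps_id -> decoded parameter set (hypothetical)

def store_pps(pps_id, params):
    # Parameter sets may arrive early (in-band or out-of-band) and be
    # repeated for robustness; later copies simply overwrite the entry.
    picture_parameter_sets[pps_id] = params

def params_for_slice(slice_header):
    # The slice header carries only the small identifier.
    return picture_parameter_sets[slice_header["pps_id"]]

store_pps(0, {"entropy_coding_mode": "CAVLC", "num_ref_idx_l0": 2})
print(params_for_slice({"pps_id": 0})["entropy_coding_mode"])  # CAVLC
```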
Network Abstraction Layer
Parameter sets
In other applications, it can be advantageous to convey the
parameter sets “out of band” using a reliable transport
mechanism
Network Abstraction Layer
Access units
The format of an access unit is shown below
Network Abstraction Layer
Access units
Contains a set of VCL NAL units that together compose a primary
coded picture
May be prefixed with an access unit delimiter to aid in locating the
start of the access unit
SEI contains data such as picture timing information
The primary coded picture consists of VCL NAL units containing
slices that represent the samples of the video picture
Redundant coded pictures are available for use by a decoder in
recovering from loss of data
Network Abstraction Layer
Access units
For the last coded picture of a coded video sequence, an end-of-sequence
NAL unit may be present to indicate the end of the sequence
For the last coded picture in the entire NAL unit stream, an end-of-stream
NAL unit may be present to indicate that the stream is ending
Decoders are not required to decode redundant coded pictures
if they are present
Decoding of each access unit results in one decoded picture
Network Abstraction Layer
Coded video sequences
A coded video sequence consists of a series of access units and uses only
one sequence parameter set
Can be decoded independently of any other coded video
sequence, given the necessary parameter sets
An instantaneous decoding refresh (IDR) access unit appears at the
beginning and contains an intra picture
The presence of an IDR access unit indicates that no subsequent
picture will reference pictures prior to the intra picture it contains
Conclusion (I)
H.264/AVC represents a number of advances in standard
video coding technology in terms of flexibility for effective
use over a broad variety of network types and application
domains
Design feature highlight
Variable block-size motion compensation with small block
sizes
Minimum luma block size as small as 4x4
The matching chroma block is half the length and width
Design feature highlight
Quarter-sample-accurate motion compensation
Half-sample positions are generated using a 6-tap FIR filter
First found in the advanced profile of MPEG-4, but H.264/AVC
further reduces the complexity
Multiple reference picture motion compensation
Extends upon the enhanced technique found in H.263++
Selects among a large number of pictures, decoded and
stored in the decoder, for prediction
The same applies to bi-prediction, which is restricted in MPEG-2
Design feature highlight
Decoupling of referencing order from display order
Prior standards imposed a strict dependency between the ordering
of pictures for referencing and for display
Allows the encoder to choose the ordering of pictures for referencing
and display purposes with a high degree of flexibility
Flexibility is constrained only by total memory capacity
Removal of the restriction enables removing the extra delay
associated with bi-predictive coding
Design feature highlight
Motion vectors over picture boundaries
Motion vectors are allowed to point outside the picture
Especially useful for small pictures and camera movement
Decoupling of picture representation methods from picture
referencing capability
In prior standards, bi-predictively encoded pictures could not be
used as references
Provides the encoder more flexibility to use a reference picture
that is closer to the picture being coded
Design feature highlight
Weighted prediction
Allows the motion-compensated prediction signal to be weighted
and offset by amounts specified by the encoder
Improves coding efficiency for scenes containing fades
Design feature highlight
Improved “skipped” and “direct” motion inference
In prior standards, “skipped” areas of a predictively coded
picture could not represent motion in the scene content, which is
detrimental for video containing global motion
H.264/AVC instead infers motion in “skipped” areas
For bi-predictively coded areas, it improves further on the prior
“direct” prediction found in H.263+ and MPEG-4
Design feature highlight
Directional spatial prediction for intra coding
Extrapolation of the edges of previously decoded parts of the current
picture is applied in intra-coded regions of the picture
Improves the quality of the prediction signal
Also allows prediction from neighboring areas that were not intra-coded
Design feature highlight
In-the-loop deblocking filtering
Block-based video coding produces artifacts known as
blocking artifacts, originating from both the prediction and
residual difference coding stages of the decoding process
The improvement in quality from the in-loop filter is used in
inter-picture prediction, improving the ability to predict other pictures
Design feature highlight
In addition to improved prediction methods, coding efficiency
is also enhanced by the following features
Small block-size transform
All major prior video coding standards used a transform
block size of 8x8, while the new design is based primarily on 4x4
Allows the encoder to represent the signal in a more locally
adaptive fashion and reduces artifacts
Short word-length transform
Arithmetic processing requires only 16-bit word lengths rather
than the 32-bit processing of prior designs
Design feature highlight
Hierarchical block transform
Extends the effective block size for low-frequency chroma signals
to an 8x8 array and for luma to a 16x16 array
Design feature highlight
Exact-match inverse transform
Previously, the inverse transform was specified within an error tolerance
bound, due to the impracticality of obtaining an exact match to the ideal
inverse transform
As a result, each decoder design would produce slightly different decoded
video, causing “drift” between encoder and decoder
Arithmetic entropy coding
Previously found as an optional feature of H.263
H.264/AVC uses a powerful “context-adaptive binary arithmetic
coding” (CABAC)
Design feature highlight
Context-adaptive entropy coding
Both “CAVLC (context-adaptive variable-length coding)” and
“CABAC” use context-based adaptivity to improve
performance
Design feature highlight
Robustness to data errors/losses and flexibility for operation
over a variety of network environments are enabled by the
following features
Parameter set structure
Key information is separated for handling in a more
flexible and specialized manner
Provides for robust and efficient conveyance of header
information
Flexible slice size
In MPEG-2, the rigid slice structure reduces coding efficiency by
increasing the quantity of header data and decreasing the
effectiveness of prediction
Design feature highlight
NAL unit syntax structure
Each syntax structure in H.264/AVC is placed into a logical
data packet called a NAL unit
Allows greater customization of the method of carrying the
video content in a manner appropriate to each specific network
Redundant pictures
Enhance robustness to data loss
Enable a representation of regions of pictures for which the
primary representation has been lost
Design feature highlight
Flexible macroblock ordering (FMO)
Partitions the picture into regions called slice groups, with each
slice becoming an independently decodable subset of a slice
group
Can significantly enhance robustness by managing the spatial
relationship between the regions coded in each slice
Arbitrary slice ordering (ASO)
Enables sending and receiving the slices of a picture in any
order relative to each other, as found in H.263+
Can improve end-to-end delay in real-time applications,
particularly on networks with out-of-order delivery behavior
Design feature highlight
Data partitioning
Allows the syntax of each slice to be separated into up to
three different partitions (header data, intra-slice, and inter-slice
partitions), depending on a categorization of syntax elements
SP/SI synchronization/switching pictures
Allow exact synchronization of the decoding process of
some decoders with an ongoing video stream
Enable switching a decoder between video streams that use
different data rates, and recovery from data loss or error
Conclusion (II)
H.264/AVC represents a number of advances in standard
video coding technology in terms of both coding efficiency
enhancement and flexibility for effective use over a broad
variety of network types and application domains
Video Coding Layer
Pictures, Frames, and Fields
A picture can represent either an entire frame or a single field
If the two fields of a frame were captured at different time
instants, the frame is referred to as an interlaced frame;
otherwise it is referred to as a progressive frame
Video Coding Layer
YCbCr color space and 4:2:0 sampling
Y represents brightness (luma)
Cb and Cr represent the extent to which the color deviates from gray
toward blue and red, respectively
Division of the picture into macroblocks
Slices and slice groups
Slices are a sequence of macroblocks processed in
raster-scan order when not using FMO
Some information from other slices may be needed to apply
the deblocking filter across slice boundaries
Video Coding Layer
A picture may be split into one or more slices without FMO, as
shown below
FMO modifies the way pictures are partitioned into
slices and MBs by using slice groups
A slice group is a set of MBs defined by a MB-to-slice-group
map, which is specified by the picture parameter set and some
information from the slice headers
Video Coding Layer
A slice group can be partitioned into one or more slices, such
that a slice is a sequence of MBs within the same slice group,
processed in raster-scan order
By using FMO, a picture can be split into many macroblock
scanning patterns, such as those shown below
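As an illustration of a slice-group map, the following sketch computes the "dispersed" pattern (formula as commonly described for the FMO dispersed map type; the variable names are mine):

```python
def dispersed_map(pic_width_in_mbs, pic_height_in_mbs, num_groups):
    """Assign each macroblock (in raster order) to a slice group using the
    'dispersed' pattern; for two groups this yields a checkerboard."""
    n = pic_width_in_mbs * pic_height_in_mbs
    return [((i % pic_width_in_mbs) +
             ((i // pic_width_in_mbs) * num_groups) // 2) % num_groups
            for i in range(n)]

# A 4x2-macroblock picture with 2 slice groups:
print(dispersed_map(4, 2, 2))  # [0, 1, 0, 1, 1, 0, 1, 0]
```

Because every MB is surrounded by MBs of the other group, a lost slice can be concealed from spatially adjacent, correctly received MBs.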
Video Coding Layer
Each slice can be coded using different coding types
I slice
- A slice in which all MBs are coded using intra prediction
P slice
- In addition to intra prediction, MBs can be coded using inter prediction
with at most one motion-compensated prediction signal per block
B slice
- In addition to the coding types of a P slice, MBs can be coded using inter
prediction with two motion-compensated prediction signals
SP (switching P) slice
- Enables efficient switching between different pre-coded pictures
SI (switching I) slice
- Allows an exact match of a macroblock in an SP slice for random
access and error recovery
Video Coding Layer
If all slices in stream B are P slices, the decoder will not have the
correct reference frame after switching; one solution is to code the
switching frame as an I slice, as shown below
I slices result in a peak in the coded bit rate at each switching
point
Video Coding Layer
SP-slices are designed to support switching without the increased
bit-rate penalty of I-slices
Unlike a “normal” P-slice, the subtraction occurs in the transform
domain
Video Coding Layer
A simplified diagram of the encoding and decoding process for
SP-slices A2, B2, and AB2 is shown below (A' denotes a reconstructed frame)
Video Coding Layer
If streams A and B are versions of the same original sequence
coded at different bit rates, the SP-slice AB2 should be efficient
Video Coding Layer
SP-slices can also provide random access and “VCR-like”
functionality (e.g. the decoder can fast-forward from frame A0 directly to
frame A10 by first decoding A0 and then decoding SP-slice A0-10)
A second type of switching slice, the SI-slice, may be used to switch
from one sequence to a completely different sequence
Video Coding Layer
Encoding and decoding process for macroblocks
All luma and chroma samples of a MB are either spatially or
temporally predicted
Each color component of the prediction residual is subdivided into 4x4
blocks; each block is transformed using an integer transform, and the
coefficients are quantized and encoded using entropy coding methods
The input video signal is split into MBs, and the association of
MBs with slice groups and slices is selected
Efficient parallel processing of MBs is possible when there
are multiple slices in the picture
Video Coding Layer
Encoding and decoding process for macroblocks
The block diagram of the VCL for a MB is shown in the following
Video Coding Layer
Adaptive frame/field coding operation
In regions with moving objects or camera motion, two
adjacent rows show a reduced degree of dependency in
interlaced frames compared with progressive frames
To provide high coding efficiency, H.264/AVC allows the
following decisions when coding a frame
To combine the two fields and code them as one single frame
(frame mode)
To not combine the two fields and to code them as separate
coded fields (field mode)
To combine the two fields and compress them as a single
frame, but, before coding, to split each pair of vertically
adjacent MBs into a pair of field or frame MBs
Video Coding Layer
The choice among the three options can be made adaptively; the first
two are referred to as picture-adaptive frame/field (PAFF)
coding
When a frame is coded as two fields, each field is coded in a way
similar to a frame, except for the following
Motion compensation utilizes reference fields rather than frames
The zig-zag scan is different
Strong deblocking is not used for filtering horizontal edges of
MBs in fields
When a frame consists of mixed regions, it is efficient to code the
nonmoving regions in frame mode and the moving regions in field
mode
Video Coding Layer
A frame/field encoding decision can be made
independently for each vertical pair of MBs; this coding
option is referred to as macroblock-adaptive frame/field
(MBAFF) coding. The below shows the MBAFF MB-pair
concept.
Video Coding Layer
An important distinction between PAFF and MBAFF is
that in MBAFF, one field cannot use MBs in the other field of
the same frame
Sometimes PAFF coding can be more efficient than
MBAFF coding, particularly in cases of rapid global
motion, scene changes, or intra picture refresh
Video Coding Layer
Intra frame prediction
All slice coding types support Intra_4x4 and Intra_16x16 modes,
together with chroma prediction and the I_PCM prediction mode
Intra_4x4 mode is based on 4x4 luma blocks and is suited for
parts of a picture with significant detail
When it is used, each 4x4 block is predicted from the
neighboring samples, as shown below
Video Coding Layer
Intra frame prediction
4x4 block prediction mode
Suited to predicting textures with structure in the specified
direction, with the exception of the “DC” prediction mode
Video Coding Layer
Intra frame prediction
In earlier drafts, the four samples below L were also used for
some prediction modes; they were dropped due to the need
to reduce memory access
Intra modes for neighboring 4x4 blocks are highly correlated.
For example, if previously encoded 4x4 blocks A and B
were predicted using mode 2, it is likely that the best mode for
block C is also mode 2.
Video Coding Layer
Intra frame prediction
Intra_16x16 mode is suited for smooth areas of a picture
This mode provides vertical, horizontal, DC,
and plane prediction
Plane prediction works well in areas of smoothly varying
luminance
Video Coding Layer
Intra frame prediction
The chroma of a MB is predicted similarly to
Intra_16x16 (the same four modes)
I_PCM mode allows the encoder to bypass the prediction
and transform coding processes and instead directly send the
values of the encoded samples
I_PCM mode serves the following purposes
Allows the encoder to precisely represent the values of the samples
Provides a way to accurately represent the values of
anomalous picture content
Enables placing a hard limit on the number of bits a decoder
must handle for a MB without harm to coding efficiency
Video Coding Layer
Intra frame prediction
The constrained intra coding mode allows prediction only from
intra-coded neighboring MBs
Intra prediction across slice boundaries is not used
Referring to neighboring samples of previously coded
blocks may incur error propagation in environments with
transmission errors
Video Coding Layer
Inter frame prediction
In P slices
Each P MB type is partitioned into partitions as shown below
This method of partitioning MBs is known as tree-structured
motion compensation
Video Coding Layer
Inter frame prediction
Choosing a larger partition size means
- A small number of bits is required to signal the choice of MV and
the type of partition
- The motion-compensated residual may contain a significant amount of
energy in frame areas with high detail
Choosing a smaller partition size means
- A lower-energy residual after motion compensation
- A larger number of bits required to signal the MVs and the type of partition
The accuracy of motion compensation is in units of one
quarter of the distance between two luma samples
Video Coding Layer
Inter frame prediction
Half-sample values are obtained by applying a one-dimensional 6-tap FIR filter vertically and horizontally
The 6-tap interpolation filter is relatively complex but produces a
more accurate fit to the integer-sample data and hence better
motion compensation performance
Quarter-sample values are generated by averaging samples at
integer- and half-sample positions
Video Coding Layer
Video Coding Layer
The above illustrates the half-sample interpolation (>> denotes a right
shift; Clip limits the result to the valid sample range):
b1 = E - 5F + 20G + 20H - 5I + J
h1 = A - 5C + 20G + 20M - 5R + T
b = Clip((b1 + 16) >> 5)
h = Clip((h1 + 16) >> 5)
j1 = cc - 5dd + 20h1 + 20m1 - 5ee + ff
j = Clip((j1 + 512) >> 10)
a = (G + b + 1) >> 1
e = (b + h + 1) >> 1
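A minimal sketch of the half-sample computation described above (the Clip here assumes 8-bit samples; helper names are mine):

```python
def clip255(v):
    """Clip to the 8-bit sample range."""
    return max(0, min(255, v))

def six_tap(e, f, g, h, i, j):
    # Intermediate value, e.g. b1 = E - 5F + 20G + 20H - 5I + J
    return e - 5 * f + 20 * g + 20 * h - 5 * i + j

def half_sample(e, f, g, h, i, j):
    # Final half-sample value, e.g. b = Clip((b1 + 16) >> 5)
    return clip255((six_tap(e, f, g, h, i, j) + 16) >> 5)

print(half_sample(100, 100, 100, 100, 100, 100))  # flat area stays 100
```

The tap weights (1, -5, 20, 20, -5, 1) sum to 32, so the +16 followed by >> 5 implements rounding division by the filter gain.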
Video Coding Layer
Inter frame prediction
The following illustrates the luma quarter-pel positions
a = round ((G+b)/2)
d = round ((G+h)/2)
e = round ((h+b)/2)
Video Coding Layer
The predictions for the chroma components are obtained by bilinear
interpolation
The displacements used for chroma have one-eighth-sample
accuracy
a = round([(8-dx)(8-dy)A + dx(8-dy)B + (8-dx)dyC + dxdyD]/64)
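The same formula in integer form (the +32 before the right shift by 6 is the usual integer implementation of the round(...); the helper name is mine):

```python
def chroma_sample(A, B, C, D, dx, dy):
    """Bilinear interpolation between the four neighboring chroma samples
    A, B, C, D, with eighth-sample displacements dx, dy in 0..7."""
    return ((8 - dx) * (8 - dy) * A + dx * (8 - dy) * B +
            (8 - dx) * dy * C + dx * dy * D + 32) >> 6

print(chroma_sample(0, 64, 0, 64, 4, 0))  # halfway between 0 and 64 -> 32
```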
Video Coding Layer
Inter frame prediction
Motion prediction using full-, half-, and quarter-sample accuracy shows
improvements over the previous standards for two reasons
- More accurate motion representation
- More flexibility in prediction filtering
MVs are allowed to point over picture boundaries
No MV prediction takes place across slice boundaries
Motion compensation for regions smaller than 8x8 uses the
same reference index for prediction of all blocks within the 8x8 region
Video Coding Layer
Inter frame prediction
Choices of neighboring partitions of the same and different sizes are
shown below
- For transmitted partitions, excluding the 16x8 and 8x16 partition
sizes: MVp is the median of the MVs for partitions A, B, and C
Video Coding Layer
- For 16x8 partitions: MVp for the upper 16x8 partition is predicted
from B, and MVp for the lower 16x8 partition is predicted from A
- For 8x16 partitions: MVp for the left 8x16 partition is predicted
from A, and MVp for the right 8x16 partition is predicted from C
- For skipped macroblocks: a 16x16 vector MVp is generated as in
case (1) above
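The median rule of case (1) can be sketched component-wise (illustrative helper; motion vectors as (x, y) tuples):

```python
def median_mv(mv_a, mv_b, mv_c):
    """MVp as the component-wise median of the three neighboring MVs."""
    def med(x, y, z):
        return sorted((x, y, z))[1]
    return (med(mv_a[0], mv_b[0], mv_c[0]),
            med(mv_a[1], mv_b[1], mv_c[1]))

print(median_mv((0, 0), (4, 2), (2, 8)))  # (2, 2)
```

Taking the median per component makes the predictor robust to one outlier neighbor, so only the small difference MV - MVp needs to be transmitted.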
Video Coding Layer
A P MB can be coded in P_Skip mode; this is useful because large areas
with no change or with constant motion, such as slow panning, can be
represented with very few bits
Multi-picture motion compensation is supported, as shown below
Video Coding Layer
In B slices
Intra coding is also supported
Four other types are supported: list 0, list 1, bi-predictive, and
direct prediction
For the bi-predictive mode, the prediction signal is formed by a
weighted average of the motion-compensated list 0 and list 1
prediction signals
The direct mode can be list 0 or list 1 prediction or bi-predictive
Multi-frame motion compensation is supported as well
Video Coding Layer
Transform, scaling, and quantization
The transform is applied to 4x4 blocks
Instead of a DCT, a separable integer transform with similar
properties to a DCT is used
Inverse-transform mismatches are avoided
At the encoder: transform, scanning, scaling, and rounding as
quantization, followed by entropy coding
At the decoder: the inverse of the encoding process is performed,
except for the rounding
The inverse transform is implemented using only additions and
bit-shifting operations on 16-bit values
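The forward "core" transform step can be sketched with the widely published integer matrix Cf; scaling and quantization are omitted, so this shows only W = Cf X Cf^T:

```python
CF = [[1, 1, 1, 1],
      [2, 1, -1, -2],
      [1, -1, -1, 1],
      [1, -2, 2, -1]]

def matmul(a, b):
    """4x4 integer matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def core_transform(x):
    # W = Cf X Cf^T, computed with integer adds/multiplies only
    cf_t = [list(r) for r in zip(*CF)]
    return matmul(matmul(CF, x), cf_t)

# A constant residual block puts all its energy in the DC coefficient:
w = core_transform([[1] * 4 for _ in range(4)])
print(w[0][0])  # 16; every other coefficient is 0
```

In practice the small +/-1 and +/-2 entries are implemented with additions and shifts, which is what keeps the transform exact and cheap.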
Video Coding Layer
There are several reasons for using a smaller transform size
It removes statistical correlation efficiently
It has visual benefits, resulting in less noise around edges
It requires fewer computations and a smaller processing word length
The quantization parameter (QP) can take 52 values
Qstep doubles in size for every increment of 6 in QP
An increase of 1 in QP means an increase of approximately 12.5% in Qstep
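The QP-to-Qstep relationship can be sketched as below. The base values for QP 0..5 follow the commonly published H.264 design; treat the exact numbers as an assumption of this sketch.

```python
QSTEP_BASE = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]  # QP 0..5 (assumed)

def qstep(qp):
    """Quantizer step size for qp in 0..51: the base value for qp % 6 is
    doubled once for every full increment of 6 in QP."""
    return QSTEP_BASE[qp % 6] * (2 ** (qp // 6))

print(qstep(4), qstep(10), qstep(16))  # 1.0 2.0 4.0 (doubles every +6)
```

Note that qstep(5) / qstep(4) = 1.125, i.e. the ~12.5% per-unit increase mentioned above.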
Video Coding Layer
The wide range of quantizer step sizes makes it possible for the
encoder to control the trade-off between bit rate and quality
accurately and flexibly
The values of QP may be different for luma and chroma;
QPchroma is derived from QPY through a user-defined offset
4x4 luma DC coefficient transform and quantization (16x16 intra mode
only)
The DC coefficient of each 4x4 block is transformed again
using a 4x4 Hadamard transform
In an intra-coded MB, much of the energy is concentrated in the DC
coefficients, and this extra transform helps to de-correlate
the 4x4 luma DC coefficients
Video Coding Layer
2x2 chroma DC coefficient transform and quantization: as
with the intra luma DC coefficients, the extra transform helps to
de-correlate the 2x2 chroma DC coefficients and improves
compression performance
The complete process
Encoding:
Input: 4x4 residual samples X
Forward “core” transform: W = Cf X Cf^T
(followed by the forward transform for chroma DC or Intra-16 luma
DC coefficients)
Post-scaling and quantization: Z = round(W · PF / Qstep)
(in integer arithmetic, the factor PF / Qstep is applied as a
multiplication followed by a right shift of qbits bits, i.e. a division
by 2^qbits; modified for chroma DC or Intra-16 luma DC)
Video Coding Layer
Decoding:
(inverse transform for chroma DC or Intra-16 luma DC coefficients)
Re-scaling (incorporating inverse transform pre-scaling): W' = Z · Qstep · PF · 64
(modified for chroma DC or Intra-16 luma DC coefficients)
Inverse “core” transform: X' = Ci^T W' Ci
Post-scaling: X'' = round(X' / 64)
Output: 4x4 residual samples X''
Video Coding Layer
Flow chart
An additional 2x2 transform is also applied to the DC
coefficients of the four 4x4 chroma blocks
Video Coding Layer
Entropy coding
The simpler method uses a single infinite-extent codeword table
for all syntax elements except the residual data
The mapping of the codeword table is customized according to the
data statistics
The chosen codeword table is an Exp-Golomb code with simple
and regular decoding properties
In CAVLC, VLC tables for various syntax elements are
switched depending on already-transmitted syntax elements
In CAVLC, the number of non-zero quantized coefficients and the
actual size and position of the coefficients are coded
separately
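The Exp-Golomb code mentioned above maps code number v to a prefix of M zeros followed by the (M+1)-bit binary representation of v+1; a minimal encoder:

```python
def exp_golomb(v):
    """Unsigned Exp-Golomb codeword (as a bit string) for code number v >= 0."""
    info = bin(v + 1)[2:]            # binary of v+1 without the '0b' prefix
    return '0' * (len(info) - 1) + info
```

The first few codewords are 1, 010, 011, 00100, 00101, ..., so smaller code numbers receive shorter codewords, which is why the table mapping is customized to put the most frequent values first.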
VLC tables are designed to match the corresponding
conditioned statistics
CAVLC encoding of a block of transform coefficients
proceeds as follows
Encode the number of non-zero coefficients and "trailing 1s"
- The total number of non-zero coefficients (TotalCoeffs) and the
number of trailing +/-1 values (T1s) are jointly coded as coeff_token
- TotalCoeffs: 0~16, T1s: 0~3
- There are 4 look-up tables for coeff_token (3 VLC and 1 FLC)
Encode the sign of each T1
- Coded in reverse order, starting with the highest-frequency T1
Encode the levels of the remaining non-zero coefficients
- Coded in reverse order
- There are 7 VLC tables to choose from
- The choice of table adapts depending on the magnitude of each coded level
Encode the total number of zeros before the last coefficient
- TotalZeros is the count of all zeros preceding the highest non-zero
coefficient in the reordered array
- Coded with a VLC
Encode each run of zeros
- Encoded in reverse order
- The VLC table for each run_before is chosen depending on ZerosLeft
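A sketch of how the first few CAVLC syntax elements could be derived from a zigzag-ordered coefficient list (illustrative only; the actual VLC table selection and bit writing are omitted):

```python
def cavlc_elements(coeffs):
    """Return (TotalCoeffs, T1s, TotalZeros) for a zigzag-ordered block."""
    nonzero = [c for c in coeffs if c != 0]
    total_coeffs = len(nonzero)
    if total_coeffs == 0:
        return 0, 0, 0
    # Trailing 1s: up to three final non-zero coefficients with magnitude 1
    t1 = 0
    for c in reversed(nonzero):
        if abs(c) == 1 and t1 < 3:
            t1 += 1
        else:
            break
    # TotalZeros: zeros preceding the last non-zero coefficient
    last = max(i for i, c in enumerate(coeffs) if c != 0)
    total_zeros = sum(1 for c in coeffs[:last] if c == 0)
    return total_coeffs, t1, total_zeros
```

For the block [0, 3, 0, 1, -1, -1, 0, 1, 0, ...] this yields TotalCoeffs = 5, T1s = 3, TotalZeros = 3, matching the separation of "how many coefficients" from "where they sit" described above.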
Entropy coding: worked CAVLC examples (shown as figures on the
original slides)
CABAC allows the assignment of a non-integer number of
bits to each symbol of an alphabet
The use of adaptive codes permits adaptation to non-stationary
symbol statistics
Statistics of already-coded syntax elements are used to
estimate the conditional probabilities that switch between several
estimated probability models
The arithmetic coding core engine and its associated probability
estimation are specified as multiplication-free, low-complexity
methods using only shifts and table look-ups
Coding a data symbol involves the following stages (taking
MVDx as an example)
Binarization
- For |MVDx| < 9, binarization is carried out with the following table;
larger values are binarized with an Exp-Golomb codeword
- The first bit is bin 1, the second bit is bin 2
Coding a data symbol involves the following stages (taking
MVDx as an example)
Context model selection
- The context model is selected according to the following table
Arithmetic encoding
- The selected context model supplies two probability estimates (for
"1" and "0") that determine the sub-range the arithmetic coder uses
Coding a data symbol involves the following stages (taking
MVDx as an example)
Probability update
- If the value of bin 1 is "0", the frequency count of "0" is
incremented
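The probability-update stage can be pictured with a toy frequency-count context model (a teaching simplification: real CABAC uses a finite-state table of probability states and shift-based updates, not floating-point counts):

```python
class BinContext:
    """Toy adaptive context model for one bin of a binarized symbol."""

    def __init__(self):
        self.count = [1, 1]   # occurrence counts for bin values 0 and 1

    def p(self, bin_val):
        """Current probability estimate for bin_val."""
        return self.count[bin_val] / sum(self.count)

    def update(self, bin_val):
        """After coding a bin, increment the frequency count of its value."""
        self.count[bin_val] += 1
```

Each coded bin nudges the model toward the observed statistics, which is what lets CABAC track non-stationary symbol distributions.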
In-loop deblocking filter
Applied between the inverse transform and the reconstruction of
the MB, inside the prediction loop
A particular characteristic of block-based coding is the
accidental production of visible block structures
Block edges are reconstructed with less accuracy than
interior pixels, and "blocking" is among the most visible artifacts
The filter has two benefits
Block edges are smoothed, improving subjective quality
The filtered frames are used for prediction, resulting in
smaller residuals after prediction
In the adaptive filter, the strength of filtering is controlled by
several syntax elements
The basic idea is that if a relatively large absolute difference
between samples near a block edge is measured, it is quite
likely a blocking artifact and should be reduced
If, however, the magnitude of the difference is so large that it cannot
be explained by the coarseness of quantization, it more likely
reflects actual behavior of the picture and should be preserved
Filtering is applied to the edges of each 4x4 block
The choice of filtering outcome depends on the boundary strength
and on the gradient of the image samples across the boundary
The boundary strength Bs is chosen according to the following table
Filter implementation
Bs in {1,2,3}: a 4-tap linear filter is applied
Bs = 4: 3-, 4-, or 5-tap linear filters may be used
The principle is shown below using a one-dimensional edge
Whether samples p0 and q0, as well as p1 and q1, are filtered is
determined using quantization-parameter-dependent thresholds α(QP)
and β(QP), where β(QP) is considerably smaller than α(QP)
Filtering of p0 and q0 takes place only if all of the following are satisfied
1. |p0 – q0| < α(QP)
2. |p1 – p0| < β(QP)
3. |q1 – q0| < β(QP)
Additional filtering of p1 or q1 takes place if the corresponding
condition holds
1. |p2 – p0| < β(QP)
or
2. |q2 – q0| < β(QP)
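The threshold tests above translate directly into a small decision function (a sketch; the lookup of α and β from QP is omitted):

```python
def filter_decision(p, q, alpha, beta):
    """p[i] and q[i] are samples at distance i from the block edge.
    Returns (filter_edge, also_filter_p1, also_filter_q1)."""
    filter_edge = (abs(p[0] - q[0]) < alpha and
                   abs(p[1] - p[0]) < beta and
                   abs(q[1] - q[0]) < beta)
    also_p1 = filter_edge and abs(p[2] - p[0]) < beta
    also_q1 = filter_edge and abs(q[2] - q[0]) < beta
    return filter_edge, also_p1, also_q1
```

A large step across the edge (|p0 – q0| ≥ α) is treated as real image content and left untouched, which is exactly the "cannot be explained by coarse quantization" case described earlier.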
Rate-distortion comparison plots: Foreman.qcif at 10 Hz and
Foreman.cif at 30 Hz (figures)
Hypothetical reference decoder (HRD)
For a standard, it is not sufficient to provide only a coding
algorithm
In real-time systems it is also important to specify how bits are fed
to the decoder and how decoded pictures are removed
from it
Input and output buffer models are specified through an
implementation-independent model of a receiver called the HRD
It specifies the operation of two buffers
Coded picture buffer (CPB)
Decoded picture buffer (DPB)
The CPB models the arrival and removal times of the coded bits
The HRD is flexible in supporting delivery of video at a variety
of bit rates without excessive delay
The HRD also specifies DPB management to ensure that
excessive memory capacity is not needed
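The CPB behaves like a leaky bucket; a toy constant-bit-rate model illustrates the idea (a hypothetical simplification: bits arrive at a constant rate and each picture is removed instantaneously at its decode time):

```python
def cpb_trace(frame_bits, bitrate, fps, buffer_size):
    """Track CPB fullness after each frame removal; a negative value
    signals underflow (the decoder would have to wait for bits)."""
    fullness = 0.0
    arrive_per_frame = bitrate / fps
    levels = []
    for bits in frame_bits:
        fullness = min(fullness + arrive_per_frame, buffer_size)  # channel fill
        fullness -= bits            # instantaneous removal at decode time
        levels.append(fullness)
    return levels
```

A frame much larger than the per-frame budget drives the fullness negative, which is the kind of buffer violation the HRD constraints are designed to rule out.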
Profile and potential application
Profiles
Three profiles are defined: the Baseline, Main, and
Extended profiles
The Baseline profile supports all features except the following
two sets
Set 1: B slices, weighted prediction, CABAC, field coding, and picture-
or MB-adaptive switching between frame and field coding
Set 2: SP/SI slices and slice data partitioning
The Main profile supports the first set above, but not FMO, ASO, or
redundant pictures
The Extended profile supports all Baseline features plus both
sets above, except for CABAC
Areas where profiles of the new standard may be used
A list of possible application areas is given below
Conversational services
- H.323 conversational video services over circuit-switched
ISDN-based video conferencing
- H.323 conversational services over the Internet with best-effort
IP/RTP protocols
Entertainment video applications
- Broadcast via satellite, cable, or DSL
- DVD for standard
- VOD (video on demand) via various channels
Streaming services
- 3GPP streaming using IP/RTP for transport and RTSP for
session setup
- Streaming over the wired Internet using IP/RTP and RTSP
for session setup
Other services
- 3GPP multimedia messaging services
- Video mail
Conclusion—(III)
Its VCL design is based on conventional block-based
hybrid video coding concepts, but with some differences
relative to prior standards, as summarized below
Enhanced motion-prediction capability
Use of a small block-size exact-match transform
Adaptive in-loop de-blocking filter
Enhanced entropy coding methods