Overview of the H.264/AVC Video Coding Standard
T. Wiegand, G. Sullivan, G. Bjontegaard, and A. Luthra
Overview of the H.264/AVC Video Coding Standard,
IEEE Transactions on Circuits and Systems for Video Technology,
Vol. 13, No. 7, pp. 560-576, July 2003.
CMPT 820: Multimedia Systems
Outline






Overview
Network Abstraction Layer (NAL)
Video Coding Layer (VCL)
Profiles and Applications
Feature Highlights
Conclusions
Evolution of Video Compression Standards
H.261 (ITU-T): video telephony
MPEG-1 (MPEG): Video-CD
H.262/MPEG-2 (ITU-T and MPEG): digital TV/DVD
H.263 (ITU-T): video conferencing
MPEG-4 Visual (MPEG): object-based coding
H.264/MPEG-4 AVC (ITU-T and MPEG)
H.264/AVC Coding Standard

Various Applications






Broadcast: cable, satellites, terrestrial, and DSL
Storage: DVDs (HD DVD and Blu-ray)
Video Conferencing: over different networks
Multimedia Streaming: live and on-demand
Multimedia Messaging Services (MMS)
Challenge:


How to handle all these applications and networks
Flexibility and customizability
Structure of H.264/AVC Codec
Layered design
Network Abstraction Layer (NAL): formats video and metadata for a variety of networks
Video Coding Layer (VCL): represents video in an efficient way
Scope of H.264 standard
Network Abstraction Layer

Provide network friendliness for sending
video data over various network transports,
such as:




RTP/IP for Internet applications
MPEG-2 streams for broadcast services
ISO File formats for storage applications
We present a few NAL concepts
NAL Units
Packets consist of video data
short packet header: one byte
Support two types of transports
stream-oriented: no free unit boundaries -> use a 3-byte start code prefix
packet-oriented: start code prefix is a waste
Can be classified into:
VCL units: data for video pictures
Non-VCL units: metadata and additional info
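A minimal Python sketch of the two ideas above: splitting a stream-oriented byte stream at 3-byte start code prefixes, and reading the one-byte NAL unit header. The header bit layout (1-bit forbidden_zero_bit, 2-bit nal_ref_idc, 5-bit nal_unit_type) and the rule that types 1 to 5 are VCL units come from the H.264 specification, not from these slides. The sample bytes are made up, and emulation prevention bytes are ignored for brevity.

```python
START_CODE = b"\x00\x00\x01"  # 3-byte start code prefix


def split_byte_stream(data):
    """Return the NAL units found after each start code prefix."""
    units = []
    pos = data.find(START_CODE)
    while pos != -1:
        begin = pos + len(START_CODE)
        nxt = data.find(START_CODE, begin)
        end = len(data) if nxt == -1 else nxt
        # A 4-byte start code (00 00 00 01) leaves a zero byte at the end
        # of the previous unit; a NAL unit never ends in 0x00, so strip it.
        units.append(data[begin:end].rstrip(b"\x00"))
        pos = nxt
    return units


def parse_nal_header(first_byte):
    """Decode the one-byte NAL unit header."""
    nal_unit_type = first_byte & 0x1F
    return {
        "forbidden_zero_bit": first_byte >> 7,
        "nal_ref_idc": (first_byte >> 5) & 0x03,
        "nal_unit_type": nal_unit_type,
        "is_vcl": 1 <= nal_unit_type <= 5,  # coded slice data
    }


# Made-up stream: SPS (type 7), PPS (type 8), IDR slice (type 5).
stream = b"\x00\x00\x00\x01\x67\x42" + b"\x00\x00\x01\x68\xce" + b"\x00\x00\x01\x65\x88"
headers = [parse_nal_header(unit[0]) for unit in split_byte_stream(stream)]
```

On a packet-oriented transport such as RTP, the packet boundaries already delimit the units, so only `parse_nal_header` would be needed.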
Non-VCL NAL Units
Two types of non-VCL NAL units
Parameter sets: headers shared by a large number of VCL NAL units
a VCL NAL unit has a pointer to its picture parameter set
a picture parameter set points to its sequence parameter set
Supplemental enhancement info (SEI): optional info for higher-quality reconstruction and/or better application usability
Sent over in-band or out-of-band channels
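The pointer chain from a VCL NAL unit to its parameter sets can be pictured as two lookups by ID. The sketch below is only an illustration: the dictionary layout and the field names (`pps_id`, `sps_id`, `width_mbs`, and so on) are hypothetical, not the syntax element names of the standard.

```python
# Hypothetical tables filled from parameter set NAL units, which may have
# arrived in-band or over a separate out-of-band channel.
sps_table = {0: {"width_mbs": 22, "height_mbs": 18}}
pps_table = {0: {"sps_id": 0, "entropy_coding": "CAVLC"},
             1: {"sps_id": 0, "entropy_coding": "CABAC"}}


def resolve_parameter_sets(slice_header, pps_table, sps_table):
    """Follow slice -> picture parameter set -> sequence parameter set."""
    pps = pps_table[slice_header["pps_id"]]
    sps = sps_table[pps["sps_id"]]
    return pps, sps


# Each slice carries only a small ID instead of repeating the full headers.
pps, sps = resolve_parameter_sets({"pps_id": 1}, pps_table, sps_table)
```

Because many slices share one entry, the headers are transmitted rarely and can be protected more strongly than the slice data.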
Access Units



A set of NAL units
Decoding an access unit results in one picture
Structure:
Delimiter: for seeking in a stream
SEI: timing and other info
primary coded picture: VCL
redundant coded picture: for error recovery
Video Sequences and IDR Frames

Sequence: an independently decodable NAL unit stream -> doesn't need NAL units from other sequences
with one sequence parameter set
starts with an instantaneous decoding refresh (IDR) access unit
IDR frames: random access points
Intra-coded frames
no picture following an IDR frame requires reference to pictures prior to the IDR frame
decoders mark buffered reference pictures as unusable once they see an IDR frame
Video Coding Layer (VCL)

Like other hybrid video coders, H.264/AVC represents pictures in macroblocks
Motion compensation: exploits temporal redundancy
Transform: exploits spatial redundancy
Small improvements add up to a huge gain
Combining many coding tools together
Pictures/Frames/Fields



Fields: the top/bottom field contains the even/odd rows
Interlaced: the two fields are captured at different times
Progressive: otherwise
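As a small illustration of the frame/field relationship, the following Python sketch splits a frame into its two fields and interleaves them back. It assumes a frame is stored as a list of rows, with row 0 belonging to the top field; the function names are made up for this example.

```python
def split_fields(frame):
    """Split a frame (list of rows) into (top_field, bottom_field)."""
    top = frame[0::2]     # even-numbered rows: 0, 2, 4, ...
    bottom = frame[1::2]  # odd-numbered rows: 1, 3, 5, ...
    return top, bottom


def merge_fields(top, bottom):
    """Interleave two fields back into a frame."""
    frame = []
    for top_row, bottom_row in zip(top, bottom):
        frame.append(top_row)
        frame.append(bottom_row)
    return frame


# A 6x4 toy frame whose sample values encode (row, column).
frame = [[row * 10 + col for col in range(4)] for row in range(6)]
top, bottom = split_fields(frame)
```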
Macroblocks and Slices


Fixed size MBs: 16x16 for luma, and 8x8 for chroma
Slice: a set of MBs that can be decoded without use of other slices
I slice: intra-prediction (I-MBs)
P slice: possibly one inter-prediction signal (I- and P-MBs)
B slice: up to two inter-prediction signals (I- and B-MBs)
SP slice: efficient switch among streams
SI slice: used in conjunction with SP slices
Flexible Macroblock Ordering (FMO)




MBs in a slice: in raster order
Slice group: more flexible
Each slice group contains one or several slices
Possible usages:
Region-of-interest (ROI)
Checkerboard for video conferencing
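The checkerboard usage can be sketched as a map from macroblock position to slice group. This is an illustrative Python sketch with two slice groups, not the slice group map syntax of the standard; the function name is made up.

```python
def checkerboard_slice_groups(width_mbs, height_mbs):
    """Assign each macroblock to slice group 0 or 1 in a checkerboard."""
    return [[(x + y) % 2 for x in range(width_mbs)]
            for y in range(height_mbs)]


group_map = checkerboard_slice_groups(4, 2)
```

If the packets of one slice group are lost, every missing macroblock still has its left, right, upper, and lower neighbors in the other group, which helps error concealment.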
Adaptive Field Coding

Two fields of a frame can be coded as:
A single frame (frame mode)
Two separate fields (field mode) -> suitable for interlaced and high-motion content
A single frame with adaptive mode (mixed mode)
Picture-adaptive frame/field (PAFF)
frame/field decision is made at frame level
16% - 20% bit rate reduction over frame only
Macroblock-adaptive frame/field (MBAFF)
frame/field decision is made at MB level
14% - 16% bit rate reduction over PAFF
[Ref] http://scien.stanford.edu/2005projects/ee398/projects/presentations/Guerrero Chan Tsang - Project Presentation - Fast Macroblock Adaptive Coding in H264.ppt
Intra-frame Prediction


In the spatial domain, samples to the left of and/or above a MB are used to predict the samples in the MB
Types of intra-frame prediction:
Intra_4x4: detailed luma blocks
Intra_16x16: smooth luma blocks
Chroma_8x8: similar to Intra_16x16, as chroma components are smooth
I_PCM: bypass prediction/transform, send samples
useful for anomalous pictures, lossless coding, and a predictable bit rate
Intra_4x4 Prediction



Samples in a 4x4 block are predicted using 13 neighboring samples
9 prediction modes: 1 DC and 8 directional
Sample D is used if samples E-H are not available
Sample Intra_4x4 Prediction

Interpolation is used in some modes
[Ref] Foreman sequence, http://www.vcodex.com/files/h264_intrapred.pdf
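Three of the nine Intra_4x4 modes are simple enough to sketch in a few lines of Python: vertical, horizontal, and DC. The sketch assumes all neighbors are available; `above` holds the four samples directly above the block and `left` the four samples to its left. The remaining directional modes, which interpolate between neighbors, are omitted, and the function name is made up.

```python
def intra4x4_predict(mode, above, left):
    """Return a 4x4 prediction block (list of rows) for a simple mode."""
    if mode == "vertical":
        # Each column repeats the sample above it.
        return [list(above) for _ in range(4)]
    if mode == "horizontal":
        # Each row repeats the sample to its left.
        return [[left[row]] * 4 for row in range(4)]
    if mode == "dc":
        # Rounded mean of the eight neighboring samples.
        dc = (sum(above) + sum(left) + 4) >> 3
        return [[dc] * 4 for _ in range(4)]
    raise ValueError(f"unsupported mode: {mode}")


above = [10, 20, 30, 40]
left = [50, 60, 70, 80]
dc_block = intra4x4_predict("dc", above, left)
```

An encoder would typically try every mode and keep the one with the smallest residual cost; how that choice is made is outside the scope of the standard.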
Intra_16x16 Prediction

4 modes




Vertical
Horizontal
DC
Plane (Diagonal)
Inter-Prediction in P Slices

Two-level segmentation of MBs
Luma
MBs are divided into at most 4 partitions (as small as 8x8)
8x8 partitions are divided into at most 4 sub-partitions (as small as 4x4)
Chroma: half size horizontally and vertically
Maximum of 16 motion vectors for each MB
Examples of Segmentation
[Ref] http://www.vcodex.com/files/h264_interpred.pdf
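The limit of 16 motion vectors follows directly from the two-level segmentation: four 8x8 partitions, each split into four 4x4 sub-partitions. The Python sketch below counts motion vectors for a given segmentation; the partition shapes are those of the standard, while the string labels and function name are made up for illustration.

```python
# Number of partitions produced by each luma macroblock mode.
MB_PARTITIONS = {"16x16": 1, "16x8": 2, "8x16": 2, "8x8": 4}
# Number of sub-partitions produced by each mode of one 8x8 partition.
SUB_PARTITIONS = {"8x8": 1, "8x4": 2, "4x8": 2, "4x4": 4}


def motion_vector_count(mb_mode, sub_modes=()):
    """Count motion vectors in a P macroblock (one per partition)."""
    if mb_mode != "8x8":
        return MB_PARTITIONS[mb_mode]
    if len(sub_modes) != 4:
        raise ValueError("an 8x8 macroblock mode needs four sub-modes")
    return sum(SUB_PARTITIONS[mode] for mode in sub_modes)


worst_case = motion_vector_count("8x8", ["4x4"] * 4)
```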
Inter-Prediction Accuracy


¼-pixel accuracy for luma, 1/8-pixel for chroma
Half-pixel samples: 6-tap FIR filter
$b_1 = E - 5F + 20G + 20H - 5I + J$
$b = \mathrm{round}(b_1 / 32)$
Quarter-pixel samples: average of neighbors
$a = \mathrm{round}((G + b) / 2)$
Chroma predictions: bilinear interpolation
[Figure: grid of integer-position samples (upper-case letters) and fractional-position samples (lower-case letters) used in the formulas above]
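The luma interpolation can be sketched in Python with integer arithmetic only. The 6-tap weights are (1, -5, 20, 20, -5, 1), so the half-pixel sample between G and H is round((E - 5F + 20G + 20H - 5I + J) / 32). The sketch assumes 8-bit video, hence the clip to 0..255, and implements rounding as add-then-shift; the function names are made up.

```python
def clip255(value):
    """Clip to the 8-bit sample range (assumes 8-bit video)."""
    return max(0, min(255, value))


def half_sample(e, f, g, h, i, j):
    """Half-pixel sample between g and h from six integer samples."""
    b1 = e - 5 * f + 20 * g + 20 * h - 5 * i + j
    return clip255((b1 + 16) >> 5)  # round(b1 / 32)


def quarter_sample(p, q):
    """Quarter-pixel sample: rounded average of its two neighbors."""
    return (p + q + 1) >> 1


# A step edge between G = 0 and H = 255.
b = half_sample(0, 0, 0, 255, 255, 255)
a = quarter_sample(0, b)
```

The filter weights sum to 32, so a flat region is reproduced exactly, while the negative taps can overshoot near sharp edges, which is why the clip is needed.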
Multiframe Inter-Prediction in P Slices


More than one prior reference picture can be used (different MBs may use different ones)
Encoders/decoders buffer the same reference pictures for inter-prediction
Reference index is used when coding MVs
MVs for regions smaller than 8x8 use the same index for all MVs in the 8x8 region
P_skip mode:
Send no residual signal, MVs, or reference index
Use buffered frame 0 as the reference picture
Use neighbors' MVs
Large areas with no change or constant motion, like slow panning, can be represented with very few bits.
Multiframe Inter-Prediction in B Slices




Weighted average of 2 predictions
B-slices can be used as reference
Two reference picture lists are used
One out of four pred. methods for each partition:
list 0
list 1
bi-predictive
direct prediction: inferred from prior MBs
The MB can be coded in B_skip mode (similar to P_skip)
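The bi-predictive case can be sketched as a per-sample weighted average of one prediction block from list 0 and one from list 1. This Python sketch is a simplification: it uses plain integer weights with rounding and does not reproduce the fixed-point weight and offset syntax of the standard. With the default weights it reduces to the rounded average of the two blocks.

```python
def bi_predict(pred0, pred1, w0=1, w1=1):
    """Weighted average of two prediction blocks (lists of rows)."""
    total = w0 + w1
    return [
        [(w0 * s0 + w1 * s1 + total // 2) // total
         for s0, s1 in zip(row0, row1)]
        for row0, row1 in zip(pred0, pred1)
    ]


list0_block = [[100, 200]]
list1_block = [[110, 100]]
averaged = bi_predict(list0_block, list1_block)
weighted = bi_predict(list0_block, list1_block, w0=3, w1=1)
```

Unequal weights are useful for fades, where the brightness changes steadily between the two reference pictures.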
4x4 Integer Transform

Why smaller transform:
Only uses add and shift, so an exact inverse transform is possible -> no decoding mismatch
Not too much residue to code
Less noise around edges (ringing or mosquito noise)
Less computation and shorter data type (16-bit)
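An approximation to the 4x4 DCT: $Y = H X H^T$, where
$H = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix}$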
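A Python sketch of the forward core transform Y = H X H^T, using the integer matrix H with rows (1, 1, 1, 1), (2, 1, -1, -2), (1, -1, -1, 1), (1, -2, 2, -1) given in the cited overview paper. The rows of H are orthogonal with squared norms 4, 10, 4, 10, so the input can be recovered exactly. The inverse shown here uses exact fractions purely to demonstrate that property; it is an illustration, not the standard's inverse transform, which stays in integers and folds the scaling into quantization.

```python
from fractions import Fraction

H = [[1, 1, 1, 1],
     [2, 1, -1, -2],
     [1, -1, -1, 1],
     [1, -2, 2, -1]]
ROW_NORMS = [4, 10, 4, 10]  # squared norm of each row of H


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def forward_core_transform(x):
    """Y = H X H^T, computed with integer arithmetic only."""
    return matmul(matmul(H, x), transpose(H))


def inverse_core_transform(y):
    """X = H^T D^-1 Y D^-1 H with D = diag(ROW_NORMS) (illustration only)."""
    scaled = [[Fraction(y[i][j], ROW_NORMS[i] * ROW_NORMS[j])
               for j in range(4)] for i in range(4)]
    return matmul(matmul(transpose(H), scaled), H)


residual = [[5, 11, 8, 10], [9, 8, 4, 12], [1, 10, 11, 4], [19, 6, 15, 7]]
coeffs = forward_core_transform(residual)
```

The residual values are arbitrary test data. A flat block puts all of its energy into the DC coefficient `coeffs[0][0]`, as with the DCT.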
2nd Transform and Quantization Parameter
2nd Transform: Intra_16x16 and Chroma modes are for smooth areas
DC coefficients are transformed again to cover the whole MB
Quantization step (QS) is adjusted by an exponential function of the quantization parameter (QP) -> to cover a wider range of QS
QP increases by 6 => QS doubles
QP increases by 1 => QS increases by about 12% => bit rate decreases by about 12%
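The QP-to-step-size relation can be checked numerically. The Python sketch below uses the commonly quoted base step sizes for QP 0 to 5 and doubles them every 6 QP values; the table values and the QP range 0 to 51 (8-bit video) are taken from the H.264 literature rather than from these slides, and the function name is made up.

```python
# Commonly quoted step sizes for QP = 0..5; they double every 6 QP values.
BASE_QSTEP = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]


def qstep(qp):
    """Quantization step size for a QP in 0..51."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in 0..51")
    return BASE_QSTEP[qp % 6] * 2 ** (qp // 6)


doubling_ratio = qstep(28) / qstep(22)  # QP + 6
average_step_ratio = 2 ** (1 / 6)       # about 1.12 per QP increment
```

The individual table entries are rounded, so the ratio between neighboring QP values varies slightly around the average of about 12%.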
Entropy Coding


Syntax elements other than transform coefficients: a single infinite-extent codeword table
Transform coefficients:
Context-Adaptive Variable Length Coding (CAVLC)
several VLC tables are switched depending on previously transmitted data -> better than a single VLC table
Context-Adaptive Binary Arithmetic Coding (CABAC)
more flexible symbol probability models than CAVLC -> 5-15% rate reduction
efficient: multiplication-free
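The infinite-extent codeword table used for syntax elements other than transform coefficients is, according to the cited overview paper, an Exp-Golomb code. A Python sketch of the unsigned variant follows: a value v is written as the binary form of v + 1, preceded by one zero for every bit after the first. Bit strings are used instead of packed bytes to keep the example short; the function names are made up.

```python
def ue_encode(value):
    """Unsigned Exp-Golomb codeword for value >= 0, as a bit string."""
    code = bin(value + 1)[2:]            # binary form of value + 1
    return "0" * (len(code) - 1) + code  # prefix of leading zeros


def ue_decode(bits, pos=0):
    """Decode one codeword starting at pos; return (value, next_pos)."""
    zeros = 0
    while bits[pos + zeros] == "0":
        zeros += 1
    end = pos + 2 * zeros + 1
    return int(bits[pos + zeros:end], 2) - 1, end


codewords = [ue_encode(v) for v in range(4)]
```

Small values get short codewords, and any non-negative integer can be coded without a stored table, which is what makes the table "infinite-extent".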
In-loop Deblocking Filter



Operate within coding loop
Use filtered frames as ref. frames -> improve coding efficiency
Adaptive deblocking, need to determine



Blocking effects or object edges
Strong or weak deblocking
Intuitions


Large difference near a block edge -> likely a block artifact
If the difference is too large to be explained by the coarseness of quantization (QP) -> likely a real edge
E.g., filter p0 and q0 if |p0 - q0| < α(QP), |p1 - p0| < β(QP), and |q1 - q0| < β(QP)
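The edge decision can be sketched as a three-part threshold test on the samples p1, p0 | q0, q1 across a block edge: filter only if |p0 - q0| < alpha, |p1 - p0| < beta, and |q1 - q0| < beta, as described in the cited overview paper. In the standard, alpha and beta are looked up from QP-dependent tables; in this Python sketch they are plain parameters, and the numbers in the example are hypothetical.

```python
def should_filter_edge(p1, p0, q0, q1, alpha, beta):
    """Decide whether p0 and q0 across a block edge should be filtered."""
    return (abs(p0 - q0) < alpha      # step across the edge is small enough
            and abs(p1 - p0) < beta   # left side is smooth
            and abs(q1 - q0) < beta)  # right side is smooth


# Hypothetical thresholds for some mid-range QP.
ALPHA, BETA = 20, 6
blocking_artifact = should_filter_edge(60, 62, 70, 71, ALPHA, BETA)
real_edge = should_filter_edge(60, 62, 150, 151, ALPHA, BETA)
```

Larger QP values give larger thresholds, so coarsely quantized pictures are filtered more aggressively.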
Hypothetical Reference Decoder (HRD)


Standard receiver buffer models -> encoders must produce bit streams that are decodable by the HRD
Two buffers
Coded picture buffer (CPB)
models the bit arrival and removal time
Decoded picture buffer (DPB)
models the frame decoding and output times in reference frame lists
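The role of the coded picture buffer can be illustrated with a simplified constant-bit-rate leaky bucket in Python. This is a sketch under stated assumptions, not the normative HRD: bits arrive at a fixed number per frame interval, each frame is removed instantaneously at its decoding time, and all parameter names are made up.

```python
def simulate_cpb(frame_bits, bits_per_interval, delay_intervals, buffer_size):
    """Return buffer fullness just before each frame is removed.

    Raises ValueError if the buffer would overflow, or if a frame has not
    fully arrived by its removal time (underflow).
    """
    total = sum(frame_bits)
    removed = 0
    log = []
    for n, bits in enumerate(frame_bits):
        # Bits delivered by the channel up to this frame's removal time.
        arrived = min(bits_per_interval * (delay_intervals + n), total)
        fullness = arrived - removed
        if fullness > buffer_size:
            raise ValueError(f"overflow before frame {n}")
        if fullness < bits:
            raise ValueError(f"underflow at frame {n}")
        log.append(fullness)
        removed += bits
    return log


# One large frame followed by three small ones.
fullness = simulate_cpb([4000, 1000, 1000, 1000], 3000, 2, 8000)
```

An encoder that targets such a model must keep its frame sizes within what the buffer size and initial delay allow, for example by raising QP on large frames.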
Profiles and Applications



Defines a set of coding tools and algorithms
Conformance points for interoperability
3 Profiles for different applications




Baseline – video conferencing
Main – broadcast, media storage, digital cinema
Extended – streaming over IP (wired/wireless)
15 Levels




pic size
decoding rate (MB/s)
bit rate
buffer size
[Ref] J. Ostermann, J. Bormans, P. List, D. Marpe, M. Narroschke,
F. Pereira, T. Stockhammer, and T. Wedi
Video coding with H.264/AVC: tools, performance and complexity
IEEE Circuits and Systems Magazine 4(1) pp. 7 - 28 May 2004
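Two of the quantities a level constrains, picture size and decoding rate, can be computed from the resolution and frame rate. The Python sketch below counts 16x16 macroblocks per picture and per second; it deliberately does not reproduce the per-level limit tables of the standard, and the function name is made up.

```python
def macroblock_load(width, height, fps):
    """Return (macroblocks per picture, macroblocks per second)."""
    mbs_wide = (width + 15) // 16  # round up to whole macroblocks
    mbs_high = (height + 15) // 16
    per_picture = mbs_wide * mbs_high
    return per_picture, per_picture * fps


cif = macroblock_load(352, 288, 30)
full_hd = macroblock_load(1920, 1080, 30)
```

A decoder that conforms to a given level must handle any stream whose picture size and macroblock rate stay within that level's limits.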
Feature Highlights -- Prediction










Variable blocksize MCs
Quarter-sample accurate MCs
MVs over picture boundaries
Multiple reference pictures
Weighted Bi-directional prediction
Decoupling of referencing from display orders
Decoupling prediction mode from reference
capability (uses B frames as reference)
Improved Skip/Direct modes
Intra prediction in Spatial domain
In-loop deblocking filter
Feature Highlights -- Transform






Small block-size transform
2-level block transform (repeated DC
transform)
Short data-type transform (16-bit)
Exact inverse transform
Context-adaptive entropy coding
Arithmetic entropy coding
Feature Highlights -- Network








Parameter set structure (efficient)
NAL unit syntax structure (flexibility)
Flexible slice size
Flexible macroblock ordering (FMO)
Arbitrary slice ordering (ASO)
Redundant pictures
Data partitioning (unequal error protection)
SP/SI switching pictures
Conclusions

Key improvements




Enhanced prediction (intra- and inter-)
Small block size, exact match transform
Adaptive in-loop deblocking filter
Enhanced entropy coding method
[Ref] G Sullivan and T. Wiegand, Video Compression—From Concepts to the H.264/AVC Standard,
Proc. of IEEE, 93(1), Jan 2005