Transcript proposal

Low complexity H.264 Encoder
using machine learning
H.264 encoder
[Figure: block diagram of the H.264 encoder. Blocks: Video Input, Transform & Quantization, Entropy Coding, Bitstream Output, Inverse Quantization & Inverse Transform, Deblocking Filter, Picture Buffering, Motion Estimation, Motion Compensation, Intra Prediction, Intra/Inter Mode Decision.]
• Block diagram for H.264 Decoder
[Figure: block diagram of the H.264 decoder. Blocks: Bitstream Input, Entropy Decoding, Inverse Quantization & Inverse Transform, Deblocking Filter, Video Output, Intra/Inter Mode Selection, Picture Buffering, Intra Prediction, Motion Compensation.]
• H.264 can achieve considerably higher coding
efficiency than earlier standards such as MPEG-2.
• This efficiency comes at the cost of considerably
increased complexity at the encoder, mainly
due to motion estimation and mode decision.
• The aim is to reduce the complexity of the H.264
encoder using machine learning techniques.
• The idea behind using machine learning is to
exploit structural similarities in video.
• In the H.264 standard, the MB mode decision
in Inter frames is the most computationally
expensive process.
• Variable block sizes, motion estimation,
quarter-pixel motion compensation, etc. all
contribute to this complexity.
Inter-prediction modes in H.264
• It is important to emphasize that the most
computationally expensive process is motion
estimation (ME).
• For example, assuming full search (FS), M
block types, N reference frames, and a search
range of ±W for each reference frame and block
type, we need to examine N × M × (2W + 1)^2
positions, compared to only (2W + 1)^2 positions
for a single reference frame and block type.
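As a rough illustration of the count above, the following Python sketch evaluates N × M × (2W + 1)^2. The function name and the values N = 5, M = 7 and W = 16 are illustrative assumptions, not figures taken from this proposal.

```python
def full_search_positions(num_refs: int, num_block_types: int, search_range: int) -> int:
    """Candidate positions examined by full-search motion estimation.

    num_refs is N, num_block_types is M and search_range is W in the
    notation used above; the search window spans +/- W in each direction.
    """
    window = (2 * search_range + 1) ** 2
    return num_refs * num_block_types * window


# Illustrative values only (assumed, not taken from the proposal).
single = full_search_positions(1, 1, 16)   # one reference frame, one block type
multi = full_search_positions(5, 7, 16)    # N = 5, M = 7
print(single, multi, multi // single)      # prints: 1089 38115 35
```

The ratio is simply N × M, which is why restricting the set of block types to be searched has a direct effect on encoder complexity.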
Machine learning
• Machine learning is a subfield of artificial
intelligence.
• The major focus of machine learning research
is to extract information from data
automatically, by computational and statistical
methods.
• Beware of ‘over-fitting’: fitting the model to
noise in the training data.
C4.5 Classifier
• C4.5 (known as J48 in Weka) is a system that
constructs classifiers.
• With learnt data, a classifier accurately
predicts the class to which a new case
belongs.
• C4.5 first grows an initial Tree using divide-and-conquer.
• Basic idea: grow a tree and reduce entropy in
the subtrees.
• Decisions are made on the basis of metrics.
• For each frame and each MB of pixels, the
following metrics were calculated.
• The metrics that can be used are:
MB mean, MB variance and edge detection.
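The per-MB metrics listed above could be computed along the following lines. This is a minimal sketch in plain Python: the function name `mb_metrics` is hypothetical, and the edge measure used here (mean absolute difference between neighbouring pixels) is an assumed stand-in, since the slides do not specify which edge detector is used.

```python
def mb_metrics(mb):
    """Return (mean, variance, edge_strength) for one macroblock.

    mb is a list of rows of pixel values, e.g. 16 rows of 16 integers.
    edge_strength is an assumed, simplified edge measure: the mean absolute
    difference between horizontally and vertically adjacent pixels.
    """
    rows, cols = len(mb), len(mb[0])
    n = rows * cols
    mean = sum(sum(row) for row in mb) / n
    variance = sum((p - mean) ** 2 for row in mb for p in row) / n

    diff_sum = 0
    diff_count = 0
    for y in range(rows):
        for x in range(cols):
            if x + 1 < cols:  # right-hand neighbour
                diff_sum += abs(mb[y][x + 1] - mb[y][x])
                diff_count += 1
            if y + 1 < rows:  # neighbour below
                diff_sum += abs(mb[y + 1][x] - mb[y][x])
                diff_count += 1
    edge_strength = diff_sum / diff_count if diff_count else 0.0
    return mean, variance, edge_strength


flat = [[128] * 16 for _ in range(16)]            # uniform block
step = [[0] * 8 + [255] * 8 for _ in range(16)]   # one vertical edge
print(mb_metrics(flat))   # (128.0, 0.0, 0.0)
print(mb_metrics(step))   # (127.5, 16256.25, 8.5)
```

A flat block yields zero variance and zero edge strength, while a block containing a sharp edge scores high on both, which is the kind of separation a classifier can exploit.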
Training methods
• The process of obtaining data for training is
done offline.
• In this supervised learning approach, we used
the data of the first four frames of the video.
• The residual and current MB metrics and the
MB mode selected by the standard Intel® IPP
H.264 encoder are saved in a file.
• Trees are discovered through the C4.5 (J48)
classifier algorithm.
• Then, these Trees are implemented as if-else
statements in the Intel® H.264 encoder.
• The purpose of these Trees is to replace the
original complex Inter mode decision.
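To make the tree-discovery step more concrete, the sketch below picks the threshold on a single metric that most reduces the entropy of the mode labels. It is a simplification under stated assumptions: it uses plain information gain, whereas C4.5 itself uses the gain ratio and adds pruning. The function names, the toy variance values and the mode labels are all hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def best_threshold(values, labels):
    """Find the threshold on one metric with the highest information gain.

    Returns (threshold, gain). Samples with value <= threshold go to the
    left subtree, the rest to the right subtree.
    """
    base = entropy(labels)
    best = (None, 0.0)
    candidates = sorted(set(values))
    # Try the midpoint between each pair of consecutive distinct values.
    for lo, hi in zip(candidates, candidates[1:]):
        t = (lo + hi) / 2
        left = [lab for v, lab in zip(values, labels) if v <= t]
        right = [lab for v, lab in zip(values, labels) if v > t]
        weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
        gain = base - weighted
        if gain > best[1]:
            best = (t, gain)
    return best


# Toy training data (made up): MB variance and the mode chosen by the encoder.
variances = [5, 8, 12, 300, 450, 600]
modes = ["SKIP", "SKIP", "SKIP", "8x8", "8x8", "8x8"]
print(best_threshold(variances, modes))   # (156.0, 1.0)
```

Applying this split recursively to each subtree, over all available metrics, is the divide-and-conquer idea behind growing the tree.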
The C4.5 system consists of four
principal programs:
1) decision tree generator
2) production rule generator: forms production rules from an unpruned tree
3) decision tree interpreter: classifies items using a decision tree
4) production rule interpreter: classifies items using a rule set
C4.5 algorithm demo: training data
Decision tree: can be implemented
using if-else statements.
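A learnt tree translates directly into nested if-else statements. The sketch below shows the general shape only: the function name, the tree structure, every threshold and the choice of returned modes are hypothetical and are not the tree produced in this work. The actual encoder integration would be written in the encoder's own language rather than Python.

```python
def inter_mode_decision(residual_mean, residual_variance, edge_strength):
    """Toy decision tree for the Inter MB mode.

    All thresholds and the tree structure are hypothetical placeholders;
    a real tree would be generated by C4.5 (J48) from training data.
    """
    if residual_variance <= 20.0:
        if residual_mean <= 2.0:
            return "SKIP"       # very little residual energy
        return "16x16"          # smooth block, one large partition
    if edge_strength <= 10.0:
        return "16x8"           # moderate detail
    return "8x8"                # strong edges, smaller partitions


print(inter_mode_decision(1.0, 5.0, 0.5))      # SKIP
print(inter_mode_decision(6.0, 150.0, 25.0))   # 8x8
```

Each call costs only a handful of comparisons, which is the point of replacing the exhaustive Inter mode search with a tree.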
Next step
• While evaluating the machine learning
algorithm, also check results for the
following schemes:
1. Intra directional mask approach.
2. Only-intra spatial-temporal prediction
scheme.
3. Intra mode selection using edges.
4. Inter spatial-temporal prediction scheme.