
520.443 Digital Multimedia Coding
& Processing: A Review
Trac D. Tran
ECE Department
The Johns Hopkins University
Baltimore, MD 21218
General Information I

Instructor





Prof. Trac D. Tran
Office: Barton 215. Phone: 410-516-7416.
Email: [email protected]
Office Hour: Wed 10-12 or by appointment
TA
 Yi Chen. Office: Barton 322. Email: [email protected]
 Office Hour: Tues 2-4 or by appointment

Lectures
 Wednesday 2:30 – 5:00, Barton 225

Course Web Page
 http://thanglong.ece.jhu.edu/Course/443/
General Information II

Homework Assignments
 Around 5-6, most with computer assignments

Final project
 Team of 2 or 3 on a topic of choice
 Topic can be chosen from a list of suggestions
 A final project report and a 15-minute oral presentation

Grading
 Homework/Class Participation: 50%. Project: 50%.
General Information III

Prerequisites
 520.435 Digital Signal Processing
 Some prior experience with Matlab and/or C/C++
 Basic knowledge in linear algebra and probability

Programming
 Emphasizes hands-on learning with a lot of computer
assignments and projects. There will not be any exams!
 The use of Matlab and C/C++ is encouraged
 You need to bring a laptop with Matlab installed to lectures
Recommended Textbooks









K. Sayood, Introduction to Data Compression, 3rd Edition, Morgan
Kaufmann, 2005. ISBN 012620862X.
J. W. Woods, Multidimensional Signal, Image, and Video Processing and
Coding, Academic Press, 2006. ISBN 0120885166.
K. R. Rao and J. J. Hwang, Techniques and Standards for Image Video and
Audio Coding, Prentice Hall, Upper Saddle River, NJ. ISBN 0133099075.
A. M. Tekalp, Digital Video Processing, Prentice Hall, Upper Saddle River,
NJ. ISBN 0131900757.
B. G. Haskell, A. Puri, and A. N. Netravali, Digital Video : An Introduction to
MPEG-2, Chapman & Hall, New York, NY. ISBN 0412084112.
V. Bhaskaran and K. Konstantinides, Image and Video Compression Standards
Algorithms and Architectures, Kluwer Academic Publishers, Boston, MA.
ISBN 0792399528.
J. L. Mitchell (Editor), W. B. Pennebaker (Editor), C. E. Fogg, and D. J.
LeGall, MPEG Video: Compression Standard, Chapman & Hall, New York,
NY. ISBN 0412087715.
W. B. Pennebaker and Joan L. Mitchell, JPEG: Still Image Data Compression
Standard, Van Nostrand Reinhold, New York, NY. ISBN 0442012721.
Y. Wang, J. Ostermann, and Y.-Q. Zhang, Video Processing and
Communications, Prentice Hall, Englewood Cliffs, NJ, 2002. ISBN
0130175471.
Course Overview

Audio/Image/Video Compression and Communications
 Fundamentals: motivation, signal properties & formats,
information theory, variable-length coding, quantization
 Transform coding framework, JPEG, JPEG2000, MP3
 Video coding and international video standards
 Multimedia communications

Goals
 Focus on big pictures, key concepts, elegant ideas, no
rigorous treatment
 Provide hands-on experience with simple Matlab exercises
 Illustrate applications of digital signal processing
 Hopefully lead to future research and developments
!!!Fun Fun Fun Fun Fun!!!
Tentative Syllabus I








Jan 27: Introduction. Motivation. Main Principles. Review.
Feb 3: Information Measures. Lossless Coding Techniques.
Entropy Coding. Huffman and Arithmetic Coding.
Feb 10: Quantization. Optimal Conditions. Quantizer Design.
Feb 17: Multirate System Fundamentals. Polyphase. Filter
Banks. Transforms. Basis Functions.
Feb 24: KLT. DFT. FFT. DCT. MLT. Wavelet Transform.
Mar 3: Audio Coding Standards. MP3. AAC. Image
Compression Standards. JPEG.
Mar 10: Zerotree Coding. Embedded Coding. JPEG2000.
Project Proposal Due.
Mar 17: Spring Vacation. No Lecture.
Tentative Syllabus II







Mar 24: Video Coding Fundamentals. Motion Estimation
and Compensation.
Mar 31: Popular Video Coding Standards. MPEG Family.
H.26 Family.
Apr 7: Latest Video Compression Standard: H.264 or
MPEG-4 Part 10 or MPEG-4 AVC.
Apr 14: Multimedia Processing in the Compressed Domain.
Communication and Networking Issues. Error Resilience.
Apr 21: Multimedia Streaming. Packet Video.
Apr 28: Final Project Oral Presentations.
May 12: Final Project Report Due.
Outline

Introduction to multimedia coding & processing





Multimedia is everywhere!
The need for compression & efficient representation
Multimedia signals: properties & formats, color spaces
General multimedia compression framework
A review





Probability
Random variables
Random processes
Statistical modeling of audio/image/video signals
Error & similarity measurements
Multimedia Everywhere!













Fax machines: transmission of binary images
Digital cameras: still images
iPod / iPhone & MP3
Digital camcorders: video sequences with audio
Digital television broadcasting
Compact disk (CD), Digital video disk (DVD)
Personal video recorder (PVR, TiVo)
Images on the World Wide Web
Video streaming & conferencing
Video on cell phones, PDAs
High-definition televisions (HDTV)
Medical imaging: X-ray, MRI, ultrasound, telemedicine
Military imaging: multi-spectral, satellite, infrared, microwave
Digital Bit Rates


A picture is worth a thousand words?
Size of a typical color image
 For display
 640 x 480 x 24 bits = 7372800 bits = 921600 bytes
 For current mainstream digital cameras (5 Mega-pixel)
 2560 x 1920 x 24 bits = 117964800 bits = 14745600 bytes
 For an average word
 4-5 characters/word, 7 bits/character: 32 bits ~= 4 bytes

Bit rate: bits per second for transmission
 Raw digital video (DVD format)
 720 x 480 x 24 x 24 frames: ~200 Mbps
 CD Music
 44100 samples/second x 16 bits/sample x 2 channels ~ 1.4 Mbps
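The arithmetic on this slide can be reproduced with a few lines of Python. This is only a sketch: the function names are hypothetical helpers introduced for illustration, and the numbers are the ones quoted above.

```python
def raw_image_bits(width, height, bits_per_pixel=24):
    """Size of an uncompressed image in bits."""
    return width * height * bits_per_pixel


def raw_video_bps(width, height, frames_per_second, bits_per_pixel=24):
    """Bit rate of uncompressed video in bits per second."""
    return raw_image_bits(width, height, bits_per_pixel) * frames_per_second


def pcm_audio_bps(sample_rate_hz, bits_per_sample, channels):
    """Bit rate of uncompressed PCM audio in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


display_bits = raw_image_bits(640, 480)          # 640 x 480 display image
camera_bits = raw_image_bits(2560, 1920)         # 5 Mega-pixel camera image
dvd_bps = raw_video_bps(720, 480, 24)            # raw DVD-format video
cd_bps = pcm_audio_bps(44100, 16, 2)             # CD music

print(display_bits, "bits =", display_bits // 8, "bytes")
print(camera_bits, "bits =", camera_bits // 8, "bytes")
print(round(dvd_bps / 1e6), "Mbps raw video")
print(round(cd_bps / 1e6, 1), "Mbps CD audio")
```

Dividing the bit counts by 8 gives the byte counts, which makes the gap between a single image and a single word (about 4 bytes) easy to see.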
Reasons for Compression

Digital bit rates








Terrestrial TV broadcasting channel:  ~20 Mbps
DVD:                                  10...20 Mbps
Ethernet/Fast Ethernet:               <10/100 Mbps
Cable modem downlink:                 1-3 Mbps
DSL downlink:                         384...2048 kbps
Dial-up modem:                        56 kbps max
Wireless cellular data:               9.6...384 kbps
Compression = Efficient data representation!
 Data need to be accessed at a different time or location
 Limited storage space and transmission bandwidth
 Improve communication capability
Personal Video Recorder (PVR)
MPEG2 Quality   Bit rate
Best            7.7 Mbps
High            5.4 Mbps
Medium          3.6 Mbps
Basic           2.2 Mbps
Continuous & Discrete Representations
Continuous-Time (Space), Continuous-Amplitude, x(t):
 Local telephone, cassette-tape recording & playback, phonograph, photograph
Continuous-Time (Space), Discrete-Amplitude, x(t):
 Telegraph
Discrete-Time (Space), Continuous-Amplitude, x[n]:
 Switched capacitor filter, speech storage chip, half-tone photography
Discrete-Time (Space), Discrete-Amplitude, x[n]:
 CD, DVD, cellular phones, digital camera & camcorder, digital television, inkjet printer
Sound Fundamentals



Sound waves: vibrations of air particles
Fluctuations in air pressure are picked up by the eardrums
Vibrations from the eardrums are then interpreted by the brain as sounds
Sound Waves: 1-D signals




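Each component is a sinusoid: $x_i(t) = A_i \cos(\omega_i t + \phi_i)$
Frequency $\omega_i$
 How fast the air pressure fluctuates
 High pitch, low pitch
Volume $A_i$
 Amplitude of the sound wave
 How loud the sound is
Phase $\phi_i$
 Determines temporal and spatial localization of the sound wave
A sound wave is the sum of its components: $x(t) = \sum_i x_i(t)$
[Figure: waveform annotated with frequency, volume, phase, and envelope]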
Frequency Spectrum for Audio
Human Auditory System: 20 Hz - 20 kHz
FM Radio Signals: 100 Hz - 12 kHz
AM Radio Signals: 100 Hz - 5 kHz
Telephone Speech: 300 Hz - 3.5 kHz ($f_{max} = 3.3$ kHz, so $f_{sampling} = 6.6$ kHz)
[Figure: the four spectra plotted against f (Hz), axis from 0 to 20k]
Speech Signals
[Figure: waveform of the spoken word "phonetician", segmented as ph-o-n-e-t-i-c-ia-n]
Main useful frequency range of human voice:
300 Hz – 3.4 kHz
Music Signals
$\omega = 2\pi f$, where $f$ is the fundamental frequency
$x(t) = \cos(\omega t) + 0.75\cos(3\omega t) + 0.5\cos(5\omega t) + 0.14\cos(7\omega t) + 0.5\cos(9\omega t) + 0.12\cos(11\omega t) + 0.17\cos(13\omega t)$
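A signal of this form can be synthesized and checked numerically. The sketch below assumes all terms are added with a positive sign, and picks a hypothetical fundamental of 100 Hz and a sampling rate of 8 kHz purely for illustration. `cosine_amplitude` is a hypothetical helper that measures one frequency component by correlating with a cosine and a sine (a single DFT bin).

```python
import math

# Relative amplitudes of harmonics 1, 3, ..., 13, taken from the slide.
HARMONICS = {1: 1.0, 3: 0.75, 5: 0.5, 7: 0.14, 9: 0.5, 11: 0.12, 13: 0.17}


def synthesize(f0_hz, sample_rate_hz, num_samples):
    """Sample x(t) = sum_k a_k cos(k * omega * t) with omega = 2*pi*f0."""
    omega = 2.0 * math.pi * f0_hz
    return [
        sum(a * math.cos(k * omega * n / sample_rate_hz)
            for k, a in HARMONICS.items())
        for n in range(num_samples)
    ]


def cosine_amplitude(x, freq_hz, sample_rate_hz):
    """Amplitude of the sinusoidal component of x at freq_hz."""
    re = sum(v * math.cos(2.0 * math.pi * freq_hz * n / sample_rate_hz)
             for n, v in enumerate(x))
    im = sum(v * math.sin(2.0 * math.pi * freq_hz * n / sample_rate_hz)
             for n, v in enumerate(x))
    return 2.0 * math.hypot(re, im) / len(x)


# Assumed values: 100 Hz fundamental, 8 kHz sampling, 10 full periods.
F0, FS, N = 100.0, 8000.0, 800
x = synthesize(F0, FS, N)

for k in range(1, 14):
    print(k, "x f0:", round(cosine_amplitude(x, k * F0, FS), 3))
```

Because the window holds a whole number of periods, the measured amplitudes at the odd harmonics match the coefficients and the even harmonics come out at zero, up to rounding error.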
Harmonics in Music Signals



The spectrum of a single note from a musical instrument usually has a set of peaks at harmonic ratios
If the fundamental frequency is f, there are peaks at f, and also at (about) 2f, 3f, 4f…
Best basis functions to capture speech & music: cosines & sines
Multi-Dimensional Digital Signals

Images: 2-D digital signals
 Each sample is a pixel (or pel)
 Gray levels: p=0 black, p=128 gray, p=255 white
 Colors: combination of RGB

Video Sequences: 3-D digital signals,
a collection of 2-D images called
frames
[Figure: stack of frames with axes x, y, t]
Color Spaces: RGB & YCrCb

RGB
 Red Green Blue, typically 8-bit per sample for each color plane

YCrCb
 Y: luminance, gray-scale component
 Cr & Cb: chrominance, color components, less energy than Y
 Chrominance components can be down-sampled without much
aliasing
 YCrCb, also known as YPrPb, is used in component video
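$\begin{bmatrix} Y \\ C_R \\ C_B \end{bmatrix} = \begin{bmatrix} 0.257 & 0.504 & 0.098 \\ 0.439 & -0.368 & -0.071 \\ -0.148 & -0.291 & 0.439 \end{bmatrix} \begin{bmatrix} R \\ G \\ B \end{bmatrix} + \begin{bmatrix} 16 \\ 128 \\ 128 \end{bmatrix}$
[Figure: sampling grid marking Y sample positions and Cr, Cb sample positions]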
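The RGB to YCrCb conversion on this slide is a matrix multiply followed by an offset. The sketch below uses the slide's coefficients; the minus signs are an assumption, chosen so that each chrominance row sums to zero and gray inputs give Cr = Cb = 128. The function name is hypothetical, and no rounding or clipping to 8 bits is applied.

```python
def rgb_to_ycrcb(r, g, b):
    """Convert one 8-bit RGB sample to (Y, Cr, Cb) with the slide's matrix."""
    y = 0.257 * r + 0.504 * g + 0.098 * b + 16
    cr = 0.439 * r - 0.368 * g - 0.071 * b + 128
    cb = -0.148 * r - 0.291 * g + 0.439 * b + 128
    return y, cr, cb


for name, rgb in [("black", (0, 0, 0)),
                  ("white", (255, 255, 255)),
                  ("red", (255, 0, 0))]:
    y, cr, cb = rgb_to_ycrcb(*rgb)
    print(name, round(y, 2), round(cr, 2), round(cb, 2))
```

Black and white both land on Cr = Cb = 128, which illustrates why the chrominance planes carry less energy than Y for mostly gray content.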
Another Color Space: YUV

YUV is another popular color space, similar to YCrCb
 Y: luminance component
 UV: color components
 YUV is used in PAL/NTSC broadcasting
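$\begin{bmatrix} Y \\ U \\ V \end{bmatrix} = \begin{bmatrix} 0.299 & 0.587 & 0.114 \\ -0.147 & -0.289 & 0.436 \\ 0.615 & -0.515 & -0.100 \end{bmatrix} \begin{bmatrix} R \\ G \\ B \end{bmatrix}$
[Figure: example planes, Y: 176 x 144, U: 88 x 72, V: 88 x 72]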
Popular Signal Formats

CIF: Common Intermediate Format
 Y resolution: 352 x 288
 CrCb/UV resolution: 176 x 144
 Frame rate: 30 frames/second progressive
 8 bits/pixel (sample)

QCIF: Quarter Common Intermediate Format
 Y resolution: 176 x 144
 CrCb/UV resolution: 88 x 72
 Frame rate: 30 frames/second progressive
 8 bits/pixel (sample)

TV – NTSC
 Resolution: 704 x 480, 30 frames/second interlaced

DVD – NTSC
 Resolution: 720 x 480, 24 – 30 frames/second progressive

[Figure: frames n and n+1, each made of Y, Cr, and Cb planes]
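The raw size of a CIF or QCIF frame follows directly from the plane resolutions: one full-resolution Y plane plus two down-sampled chrominance planes. A minimal sketch, with hypothetical function names:

```python
def frame_bytes(y_width, y_height, c_width, c_height, bits_per_sample=8):
    """Bytes in one frame: one Y plane plus two chrominance planes."""
    samples = y_width * y_height + 2 * c_width * c_height
    return samples * bits_per_sample // 8


def raw_bps(bytes_per_frame, frames_per_second):
    """Uncompressed bit rate in bits per second."""
    return bytes_per_frame * 8 * frames_per_second


cif = frame_bytes(352, 288, 176, 144)
qcif = frame_bytes(176, 144, 88, 72)

print("CIF :", cif, "bytes/frame,", raw_bps(cif, 30) / 1e6, "Mbps at 30 fps")
print("QCIF:", qcif, "bytes/frame,", raw_bps(qcif, 30) / 1e6, "Mbps at 30 fps")
```

With the chrominance planes at half resolution in each direction, a frame needs 1.5 samples per pixel instead of 3, so down-sampling alone halves the raw bit rate.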
High-Definition Television (HDTV)

720i
 Resolution: 1280 x 720, interlaced

720p
 Resolution: 1280 x 720, progressive

1080i
 Resolution: 1920 x 1080, interlaced

1080p
 Resolution: 1920 x 1080, progressive
[Figure: an interlaced video frame split into an odd field and an even field]
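Splitting a frame into its two fields amounts to taking every other line. The sketch below assumes lines are numbered from 1, so the odd field holds lines 1, 3, 5, ... (list indices 0, 2, 4, ...); the function names are hypothetical.

```python
def split_fields(frame):
    """Split a frame (list of rows) into (odd_field, even_field)."""
    odd_field = frame[0::2]    # lines 1, 3, 5, ... (1-based numbering)
    even_field = frame[1::2]   # lines 2, 4, 6, ...
    return odd_field, even_field


def weave(odd_field, even_field):
    """Interleave two fields back into one frame."""
    frame = []
    for i, row in enumerate(odd_field):
        frame.append(row)
        if i < len(even_field):
            frame.append(even_field[i])
    return frame


# Tiny 6-line test frame: every sample in row r has value r.
frame = [[r] * 4 for r in range(6)]
odd, even = split_fields(frame)
print("odd field rows :", [row[0] for row in odd])
print("even field rows:", [row[0] for row in even])
print("weave restores frame:", weave(odd, even) == frame)
```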
Examples of Still Images
Examples of Video Sequences
[Figure: sample frames 1, 51, 71, 91, and 111 from a video sequence]
Observations of Visual Data
 There is a lot of redundancy, correlation, strong structure within
natural image/video
 Images
 Spatial correlation: a lot of smooth areas with occasional edges
 Video
 Temporal correlation: neighboring frames seem to be very similar
Image/Video Compression Framework
original signal -> T -> Q -> E -> compressed bit-stream -> Channel
Channel -> E^{-1} -> Q^{-1} -> T^{-1} -> reconstructed signal
T: Prediction, Transform, De-correlation
Q: Quantization
E: Information theory, VLC, Huffman, Arithmetic, Run-length
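The three-stage structure can be illustrated with a toy codec. The choices below are assumptions made for brevity, not the design of any standard: T is a 2-point average/difference (Haar-like) transform, Q is a uniform quantizer with a hypothetical step size, and E is plain run-length coding. All function names are hypothetical.

```python
def transform(x):
    """T: pairwise averages followed by pairwise differences (even length)."""
    sums = [(x[i] + x[i + 1]) / 2 for i in range(0, len(x), 2)]
    diffs = [(x[i] - x[i + 1]) / 2 for i in range(0, len(x), 2)]
    return sums + diffs


def inverse_transform(c):
    """T^-1: rebuild each pair from its average s and difference d."""
    half = len(c) // 2
    x = []
    for s, d in zip(c[:half], c[half:]):
        x.extend([s + d, s - d])
    return x


def quantize(c, step):
    """Q: map each coefficient to the index of the nearest multiple of step."""
    return [round(v / step) for v in c]


def dequantize(q, step):
    """Q^-1: map indices back to reconstruction levels."""
    return [v * step for v in q]


def run_length_encode(q):
    """E: collapse runs of equal symbols into (value, count) pairs."""
    runs = []
    for v in q:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return [tuple(r) for r in runs]


def run_length_decode(runs):
    """E^-1: expand (value, count) pairs back into a symbol list."""
    return [v for v, count in runs for _ in range(count)]


def encode(x, step):
    return run_length_encode(quantize(transform(x), step))


def decode(runs, step):
    return inverse_transform(dequantize(run_length_decode(runs), step))


signal = [100, 102, 101, 99, 150, 152, 151, 149]
STEP = 4  # assumed quantizer step size
bitstream = encode(signal, STEP)
reconstructed = decode(bitstream, STEP)
print("runs         :", bitstream)
print("reconstructed:", reconstructed)
print("max error    :", max(abs(a - b) for a, b in zip(signal, reconstructed)))
```

Only Q loses information: each coefficient moves by at most half a step, so each reconstructed sample is off by at most one step. The smooth input makes the differences quantize to zero, which is what gives the run-length coder long runs to exploit.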
Deterministic versus Random

Deterministic
 Signals whose values can be specified explicitly
 Example: a sinusoid

Random
 Digital signals in practice can be treated as a
collection of random variables or a random process
 The symbols which occur randomly carry information

Probability theory
 The study of random outcomes/events
 Use mathematics to capture behavior of random
outcomes and events
Random Variable

Random variable (RV)

 A random variable X is a mapping which
assigns a real number x to each possible
outcome of a random experiment 
 A random variable X takes on a value x from a
given set. Thus it is simply an event whose
outcomes have numerical values
 Examples
 X in coin toss, X=1 for Head, X=0 for Tail
 The temperature outside our lecture hall at
any moment t
 The pixel value at location x, y in frame n
of a future Hollywood blockbuster

x

Probability Density Function

Probability density function (PDF) of a RV X
 Function $f_X(x)$ defined such that:
 $P(x_1 < X \le x_2) = \int_{x_1}^{x_2} f_X(x)\,dx$
 Histogram of X !!!
 Main properties:
 $\int_{-\infty}^{\infty} f_X(x)\,dx = 1$
 $f_X(x) \ge 0, \ \forall x$
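PDF Examples
Uniform PDF
$f_X(x) = \begin{cases} 1/(b-a), & a \le x \le b \\ 0, & \text{otherwise} \end{cases}$
Gaussian PDF
$f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-(x-\mu)^2 / (2\sigma^2)}$
Laplacian PDF
$f_X(x) = \frac{1}{\sqrt{2}\,\sigma} e^{-\sqrt{2}\,|x| / \sigma}$
[Figure: plots of the three PDFs against x; the uniform PDF has height 1/(b-a) between a and b]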
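The three example PDFs can be written down directly and checked against the main PDF property that the total area is 1. The sketch below uses a simple midpoint-rule integrator (a hypothetical helper, not a library routine) and assumes the zero-mean Laplacian form with standard deviation sigma.

```python
import math


def uniform_pdf(x, a, b):
    """Uniform density on [a, b]."""
    return 1.0 / (b - a) if a <= x <= b else 0.0


def gaussian_pdf(x, mu, sigma):
    """Gaussian density with mean mu and standard deviation sigma."""
    return math.exp(-(x - mu) ** 2 / (2.0 * sigma ** 2)) / (math.sqrt(2.0 * math.pi) * sigma)


def laplacian_pdf(x, sigma):
    """Zero-mean Laplacian density with standard deviation sigma."""
    return math.exp(-math.sqrt(2.0) * abs(x) / sigma) / (math.sqrt(2.0) * sigma)


def integrate(f, lo, hi, steps=100000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))


print("uniform area  :", round(integrate(lambda x: uniform_pdf(x, 2.0, 5.0), 0.0, 7.0), 4))
print("gaussian area :", round(integrate(lambda x: gaussian_pdf(x, 0.0, 1.0), -10.0, 10.0), 4))
print("laplacian area:", round(integrate(lambda x: laplacian_pdf(x, 1.0), -30.0, 30.0), 4))

# P(x1 < X <= x2) is the area under the PDF between x1 and x2.
p = integrate(lambda x: gaussian_pdf(x, 0.0, 1.0), -1.0, 1.0)
print("P(-1 < X <= 1) for a standard Gaussian:", round(p, 4))
```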
Discrete Random Variable



RV that takes on discrete values only
PDF of discrete RV = discrete histogram
Example: how many Heads in 3 independent
coin tosses?
 $P_X(0) = 1/8$, $P_X(1) = 3/8$, $P_X(2) = 3/8$, $P_X(3) = 1/8$
$f_X(x) = \sum_k P_X(x_k)\,\delta(x - x_k)$ with $P_X(x_k) = P(X = x_k)$
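The probabilities in the coin-toss example can be obtained by enumerating all equally likely outcomes. A minimal sketch, assuming fair and independent tosses; `heads_pmf` is a hypothetical name.

```python
from fractions import Fraction
from itertools import product


def heads_pmf(num_tosses):
    """Probability of each possible number of Heads in num_tosses fair tosses."""
    outcomes = list(product("HT", repeat=num_tosses))
    pmf = {}
    for outcome in outcomes:
        k = outcome.count("H")
        pmf[k] = pmf.get(k, 0) + Fraction(1, len(outcomes))
    return pmf


pmf = heads_pmf(3)
for k in sorted(pmf):
    print("P(X = %d) = %s" % (k, pmf[k]))
print("total probability:", sum(pmf.values()))
```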
Expectation

Expected value
 Let g(X) be a function of RV X. The expected value of
g(X) is defined as
 $E[g(X)] = \int_{-\infty}^{\infty} g(x) f_X(x)\,dx$
 Expectation is linear!
 Expectation of a deterministic constant is itself: $E[C] = C$
Mean
 $\mu_X = E[X] = \int_{-\infty}^{\infty} x f_X(x)\,dx$
Mean-square value
 $E[X^2]$
Variance
 $\sigma_X^2 = E[(X - \mu_X)^2]$
 $E[X^2] = \sigma_X^2 + \mu_X^2$
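For a discrete RV the integral becomes a sum over the possible values. The sketch below applies the definitions to the number of Heads in three fair coin tosses and checks the relation between mean-square value, variance, and mean. The names `PMF` and `expectation` are hypothetical.

```python
from fractions import Fraction

# PMF of the number of Heads in 3 fair coin tosses.
PMF = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}


def expectation(g, pmf):
    """E[g(X)] for a discrete RV: sum of g(x) * P(X = x)."""
    return sum(g(x) * p for x, p in pmf.items())


mean = expectation(lambda x: x, PMF)
mean_square = expectation(lambda x: x * x, PMF)
variance = expectation(lambda x: (x - mean) ** 2, PMF)

print("mean        :", mean)
print("mean-square :", mean_square)
print("variance    :", variance)
print("E[X^2] == variance + mean^2:", mean_square == variance + mean ** 2)
```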

Cross Correlation & Covariance

Cross correlation
 X, Y: 2 jointly distributed RVs
 Joint PDF: $P(x_1 < X \le x_2,\ y_1 < Y \le y_2) = \int_{y_1}^{y_2} \int_{x_1}^{x_2} f_{XY}(x, y)\,dx\,dy$
 Expectation: $E[g(X, Y)] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(x, y) f_{XY}(x, y)\,dx\,dy$
 Cross-correlation: $R_{XY} = E[XY]$
Cross covariance
 $C_{XY} = E[(X - \mu_X)(Y - \mu_Y)]$
 $R_{XY} = C_{XY} + \mu_X \mu_Y$
Independence & Correlation


Marginal PDF: $f_X(x) = \int_{-\infty}^{\infty} f_{XY}(x, y)\,dy$, $f_Y(y) = \int_{-\infty}^{\infty} f_{XY}(x, y)\,dx$

Statistically independent: $f_{XY}(x, y) = f_X(x) f_Y(y)$

Uncorrelated: $E[XY] = E[X]E[Y]$, i.e. $C_{XY} = 0$

Orthogonal: $E[XY] = 0$

With 0-mean RVs, uncorrelated and orthogonal coincide
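Uncorrelated does not imply independent. A small hypothetical example makes the distinction concrete: take X uniform on {-1, 0, 1} and Y = X^2. The sketch below computes the marginals, the cross-correlation, and the covariance from the joint distribution; all names are introduced for illustration.

```python
from fractions import Fraction

# Joint distribution of (X, Y) with X uniform on {-1, 0, 1} and Y = X^2.
JOINT = {(x, x * x): Fraction(1, 3) for x in (-1, 0, 1)}


def marginal(joint, axis):
    """Marginal distribution of X (axis=0) or Y (axis=1)."""
    result = {}
    for pair, p in joint.items():
        result[pair[axis]] = result.get(pair[axis], 0) + p
    return result


def expectation(g, joint):
    """E[g(X, Y)] for a discrete joint distribution."""
    return sum(g(x, y) * p for (x, y), p in joint.items())


px = marginal(JOINT, 0)
py = marginal(JOINT, 1)
e_x = expectation(lambda x, y: x, JOINT)
e_y = expectation(lambda x, y: y, JOINT)
r_xy = expectation(lambda x, y: x * y, JOINT)   # cross-correlation
c_xy = r_xy - e_x * e_y                          # cross covariance

independent = all(JOINT.get((x, y), 0) == px[x] * py[y] for x in px for y in py)

print("R_XY =", r_xy, " C_XY =", c_xy)
print("uncorrelated:", c_xy == 0)
print("independent :", independent)
```

Here X and Y are uncorrelated and, since X has zero mean, also orthogonal, yet Y is completely determined by X.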
Random Process

Random process (RP)





A collection of RVs
A time-dependent RV
Denoted {X[n]}, {X(t)} or simply X[n], X(t)
We need N-dimensional joint PDF to characterize X[n]!
Note: the RVs that make up an RP may be dependent or
correlated
 Examples:
 Temperature X(t) outside campus
 A sequence of binary numbers transmitted over a
communication channel
 Speech, music, image, video signals
Wide-Sense Stationary

Wide-sense stationary (WSS) random process (RP)
 A WSS RP is one for which E[X[n]] is independent of n
and $R[m, n] = E[X[m]X[n]]$ only depends on the
difference (m – n)
 Mean: $m_X = E[X[n]]$
 Auto-correlation sequence: $R_{XX}[k] = E[X[n]X[n+k]]$
 Energy: $E[X^2[n]] = R_{XX}[0]$
 Variance: $\sigma_X^2 = E[(X[n] - m_X)^2] = R_{XX}[0] - m_X^2$
 Co-variance: $C_{XX}[k] = E[(X[n] - m_X)(X[n+k] - m_X)]$
What happens if the WSS RP has 0-mean?
White Random Process

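Power spectral density
 The power spectrum of a WSS RP is defined as the
Fourier transform of its auto-correlation sequence
 $S_{XX}(e^{j\omega}) = \sum_{k=-\infty}^{\infty} R_{XX}[k]\, e^{-j\omega k}$
White RP
 A RP is said to be white if any pair of samples are
uncorrelated, i.e., $E[X[n]X[m]] = E[X[n]]E[X[m]],\ m \ne n$
White WSS RP
 $R_{XX}[k] = \begin{cases} \sigma_X^2 + m_X^2, & k = 0 \\ m_X^2, & k \ne 0 \end{cases}$
White 0-mean WSS RP
 $R_{XX}[k] = \sigma_X^2\, \delta[k]$, $S_{XX}(e^{j\omega}) = \sigma_X^2$
[Figure: R_XX[k] is a single impulse of height $\sigma_X^2$ at k = 0; S_XX is flat at $\sigma_X^2$ over $\omega$]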
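The auto-correlation of a white WSS process can be checked empirically by replacing the expectation with a time average. The sketch below assumes independent Gaussian samples with a hypothetical mean of 1 and standard deviation of 2, so the estimate should be near 5 at lag 0 and near 1 at any other lag. `autocorrelation` is a hypothetical helper.

```python
import random


def autocorrelation(x, k):
    """Time-average estimate of R_XX[k] = E[X[n] X[n+k]] for k >= 0."""
    n = len(x) - k
    return sum(x[i] * x[i + k] for i in range(n)) / n


rng = random.Random(0)          # fixed seed so the run is repeatable
M_X, SIGMA_X = 1.0, 2.0         # assumed mean and standard deviation
x = [rng.gauss(M_X, SIGMA_X) for _ in range(100000)]

r0 = autocorrelation(x, 0)      # expected near sigma^2 + m^2
r5 = autocorrelation(x, 5)      # expected near m^2

print("R_XX[0] estimate:", round(r0, 2))
print("R_XX[5] estimate:", round(r5, 2))
```

Setting the mean to zero leaves only the lag-0 term, which is the impulse-shaped auto-correlation of the white 0-mean case.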

Stochastic Signal Model
w[n] (white 0-mean WSS Gaussian noise) -> H(z) -> x[n] (AR(N) signal)
$H(z) = \frac{1}{1 - \sum_{n=1}^{N} a_n z^{-n}}$
 For speech: N = 10 to 20
 For images: N = 1! and $a_1 = \rho = 0.95$
AR(1) Signal
$X(z) = H(z) W(z) = \frac{W(z)}{1 - \rho z^{-1}}$
$\Rightarrow X(z) - \rho z^{-1} X(z) = W(z)$
$\Rightarrow X(z) = W(z) + \rho z^{-1} X(z)$
$\Rightarrow x[n] = w[n] + \rho\, x[n-1]$
Error or Similarity Measures

Mean Square Error (MSE)
$L_2$-norm error: $MSE = \frac{1}{N} \sum_{i=0}^{N-1} E[(X_i - \hat{X}_i)^2]$
Mean Absolute Difference (MAD)
$L_1$-norm error: $MAD = \frac{1}{N} \sum_{i=0}^{N-1} E[|X_i - \hat{X}_i|]$
Max Error
$L_\infty$-norm error: $MaxError = \max_i E[|X_i - \hat{X}_i|]$
Peak Signal-to-Noise Ratio (PSNR)
$PSNR = 10 \log_{10} \frac{M^2}{MSE}$; $M$ = maximum peak-to-peak value
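In practice these measures are computed on one observed signal and its reconstruction, so the expectation is replaced by the observed sample values. A minimal sketch under that assumption, with hypothetical function names and M = 255 assumed for 8-bit data:

```python
import math


def mse(x, x_hat):
    """Mean square error between a signal and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def mad(x, x_hat):
    """Mean absolute difference."""
    return sum(abs(a - b) for a, b in zip(x, x_hat)) / len(x)


def max_error(x, x_hat):
    """Largest absolute difference."""
    return max(abs(a - b) for a, b in zip(x, x_hat))


def psnr(x, x_hat, peak=255):
    """Peak signal-to-noise ratio in dB; infinite for a perfect match."""
    err = mse(x, x_hat)
    return math.inf if err == 0 else 10.0 * math.log10(peak ** 2 / err)


original = [0, 255, 128, 64]
decoded = [0, 250, 130, 64]

print("MSE      :", mse(original, decoded))
print("MAD      :", mad(original, decoded))
print("MaxError :", max_error(original, decoded))
print("PSNR (dB):", round(psnr(original, decoded), 2))
```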

Summary

Introduction to audio/image/video signals
 Audio-visual information is everywhere in our
everyday life
 Efficient representation (compression) of
audio/image/video facilitates information storage,
archival, communications and even processing
 Compression is achievable since visual data contains a
lot of redundancy, both spatially and temporally

Review




Random variables, PDF, mean, variance, correlation
Random processes, wide-sense stationary RP, white
Simple stochastic signal models via AR processes
Error or similarity measures