520.443 Digital Multimedia Coding
& Processing: A Review
Trac D. Tran
ECE Department
The Johns Hopkins University
Baltimore, MD 21218
General Information I
Instructor
Prof. Trac D. Tran
Office: Barton 215. Phone: 410-516-7416.
Email: [email protected]
Office Hour: Wed 10-12 or by appointment
TA
Yi Chen. Office: Barton 322. Email: [email protected]
Office Hour: Tues 2-4 or by appointment
Lectures
Wednesday 2:30 – 5:00, Barton 225
Course Web Page
http://thanglong.ece.jhu.edu/Course/443/
General Information II
Homework Assignments
Around 5-6, most with computer assignments
Final project
Team of 2 or 3 on a topic of choice
Topic can be chosen from a list of suggestions
A final project report and a 15-minute oral presentation
Grading
Homework/Class Participation: 50%. Project: 50%.
General Information III
Prerequisites
520.435 Digital Signal Processing
Some prior experience with Matlab and/or C/C++
Basic knowledge in linear algebra and probability
Programming
Emphasizes hands-on learning with many computer
assignments and projects. There will not be any exam!
The use of Matlab and C/C++ is encouraged
You need to bring a laptop with Matlab installed to lectures
Recommended Textbooks
K. Sayood, Introduction to Data Compression, 3rd Edition, Morgan
Kaufmann, 2005. ISBN 012620862X.
J. W. Woods, Multidimensional Signal, Image, and Video Processing and
Coding, Academic Press, 2006. ISBN 0120885166.
K. R. Rao and J. J. Hwang, Techniques and Standards for Image Video and
Audio Coding, Prentice Hall, Upper Saddle River, NJ. ISBN 0133099075.
A. M. Tekalp, Digital Video Processing, Prentice Hall, Upper Saddle River,
NJ. ISBN 0131900757.
B. G. Haskell, A. Puri, and A. N. Netravali, Digital Video : An Introduction to
MPEG-2, Chapman & Hall, New York, NY. ISBN 0412084112.
V. Bhaskaran and K. Konstantinides, Image and Video Compression Standards
Algorithms and Architectures, Kluwer Academic Publishers, Boston, MA.
ISBN 0792399528.
J. L. Mitchell (Editor), W. B. Pennebaker (Editor), C. E. Fogg, and D. J.
LeGall, MPEG Video: Compression Standard, Chapman & Hall, New York,
NY. ISBN 0412087715.
W. B. Pennebaker and Joan L. Mitchell, JPEG: Still Image Data Compression
Standard, Van Nostrand Reinhold, New York, NY. ISBN 0442012721.
Y. Wang, J. Ostermann, and Y.-Q. Zhang, Video Processing and
Communications, Prentice Hall, Englewood Cliffs, NJ, 2002. ISBN
0130175471.
Course Overview
Audio/Image/Video Compression and Communications
Fundamentals: motivation, signal properties & formats,
information theory, variable-length coding, quantization
Transform coding framework, JPEG, JPEG2000, MP3
Video coding and international video standards
Multimedia communications
Goals
Focus on big pictures, key concepts, elegant ideas, no
rigorous treatment
Provide hands-on experience with simple Matlab exercises
Illustrate applications of digital signal processing
Hopefully lead to future research and developments
!!!Fun Fun Fun Fun Fun!!!
Tentative Syllabus I
Jan 27: Introduction. Motivation. Main Principles. Review.
Feb 3: Information Measures. Lossless Coding Techniques.
Entropy Coding. Huffman and Arithmetic Coding.
Feb 10: Quantization. Optimal Conditions. Quantizer Design.
Feb 17: Multirate System Fundamentals. Polyphase. Filter
Banks. Transforms. Basis Functions.
Feb 24: KLT. DFT. FFT. DCT. MLT. Wavelet Transform.
Mar 3: Audio Coding Standards. MP3. AAC. Image
Compression Standards. JPEG.
Mar 10: Zerotree Coding. Embedded Coding. JPEG2000.
Project Proposal Due.
Mar 17: Spring Vacation. No Lecture.
Tentative Syllabus II
Mar 24: Video Coding Fundamentals. Motion Estimation
and Compensation.
Mar 31: Popular Video Coding Standards. MPEG Family.
H.26 Family.
Apr 7: Latest Video Compression Standard: H.264 or
MPEG-4 Part 10 or MPEG-4 AVC.
Apr 14: Multimedia Processing in the Compressed Domain.
Communication and Networking Issues. Error Resilience.
Apr 21: Multimedia Streaming. Packet Video.
Apr 28: Final Project Oral Presentations.
May 12: Final Project Report Due.
Outline
Introduction to multimedia coding & processing
Multimedia is everywhere!
The need for compression & efficient representation
Multimedia signals: properties & formats, color spaces
General multimedia compression framework
A review
Probability
Random variables
Random processes
Statistical modeling of audio/image/video signals
Error & similarity measurements
Multimedia Everywhere!
Fax machines: transmission of binary images
Digital cameras: still images
iPod / iPhone & MP3
Digital camcorders: video sequences with audio
Digital television broadcasting
Compact disk (CD), Digital video disk (DVD)
Personal video recorder (PVR, TiVo)
Images on the World Wide Web
Video streaming & conferencing
Video on cell phones, PDAs
High-definition televisions (HDTV)
Medical imaging: X-ray, MRI, ultrasound, telemedicine
Military imaging: multi-spectral, satellite, infrared, microwave
Digital Bit Rates
A picture is worth a thousand words?
Size of a typical color image
For display
640 x 480 x 24 bits = 7372800 bits = 921600 bytes
For current mainstream digital cameras (5 Mega-pixel)
2560 x 1920 x 24 bits = 117964800 bits = 14745600 bytes
For an average word
4-5 characters/word, 7 bits/character: 32 bits ~= 4 bytes
Bit rate: bits per second for transmission
Raw digital video (DVD format)
720 x 480 pixels x 24 bits/pixel x 24 frames/second: ~200 Mbps
CD Music
44100 samples/second x 16 bits/sample x 2 channels ~ 1.4 Mbps
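The raw sizes and bit rates on this slide are simple products, so they are easy to check by machine. The following is a minimal Python sketch of that arithmetic; the function names are illustrative and not part of the course material.

```python
# Raw (uncompressed) sizes and bit rates for the examples on this slide.

def image_bits(width, height, bits_per_pixel=24):
    """Number of bits in one uncompressed image."""
    return width * height * bits_per_pixel

def video_bps(width, height, bits_per_pixel, frames_per_second):
    """Bits per second for uncompressed video."""
    return image_bits(width, height, bits_per_pixel) * frames_per_second

def audio_bps(sample_rate, bits_per_sample, channels):
    """Bits per second for uncompressed PCM audio."""
    return sample_rate * bits_per_sample * channels

display_bits = image_bits(640, 480)      # display-size color image
camera_bits = image_bits(2560, 1920)     # 5 Mega-pixel camera image
dvd_bps = video_bps(720, 480, 24, 24)    # raw DVD-format video
cd_bps = audio_bps(44100, 16, 2)         # CD music

print(display_bits, "bits =", display_bits // 8, "bytes")
print(camera_bits, "bits =", camera_bits // 8, "bytes")
print("raw video:", dvd_bps / 1e6, "Mbps")
print("CD audio:", cd_bps / 1e6, "Mbps")
```

Dividing the bit counts by 8 gives the sizes in bytes; the video and audio results come out near 200 Mbps and 1.4 Mbps respectively.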
Reasons for Compression
Digital bit rates
Terrestrial TV broadcasting channel: ~20 Mbps
DVD: 10...20 Mbps
Ethernet/Fast Ethernet: <10/100 Mbps
Cable modem downlink: 1-3 Mbps
DSL downlink: 384...2048 kbps
Dial-up modem: 56 kbps max
Wireless cellular data: 9.6...384 kbps
Compression = Efficient data representation!
Data need to be accessed at a different time or location
Limited storage space and transmission bandwidth
Improve communication capability
Personal Video Recorder (PVR)
MPEG2 Quality and bit rate
Best: 7.7 Mbps
High: 5.4 Mbps
Medium: 3.6 Mbps
Basic: 2.2 Mbps
Continuous & Discrete Representations
Continuous-time (space), continuous-amplitude x(t): local telephone, cassette-tape recording & playback, phonograph, photograph
Continuous-time (space), discrete-amplitude x(t): telegraph
Discrete-time (space), continuous-amplitude x[n]: switched capacitor filter, speech storage chip, half-tone photography
Discrete-time (space), discrete-amplitude x[n]: CD, DVD, cellular phones, digital camera & camcorder, digital television, inkjet printer
Sound Fundamentals
Sound waves: vibrations of air particles
Fluctuations in air pressure are picked up by the eardrums
Vibrations from the eardrums are then interpreted by the brain as sounds
Sound Waves: 1-D signals
Frequency
How fast the air pressure fluctuates
High pitch, low pitch
Volume
Amplitude of the sound wave
How loud the sound is
Phase
Determines temporal and spatial localization of the sound wave
Each component: $x_i(t) = A_i \cos(\omega_i t + \phi_i)$, with volume (envelope) $A_i$, frequency $\omega_i$ and phase $\phi_i$
Sound wave: $x(t) = \sum_i x_i(t)$
Frequency Spectrum for Audio
[Figure: four spectra, each on an axis f (Hz) running from 0 to 20k]
Human Auditory System: 20Hz-20kHz
FM Radio Signals: 100Hz-12kHz
AM Radio Signals: 100Hz-5kHz
Telephone Speech: 300Hz-3.5kHz ($f_{max}$ = 3.3kHz, $f_{sampling}$ = 6.6kHz)
Speech Signals
[Figure: speech waveform segmented into the sounds ph-o-n-e-t-i-c-ia-n]
Main useful frequency range of human voice:
300 Hz – 3.4 kHz
Music Signals
$\omega = 2\pi f$: fundamental frequency
$x(t) = \cos\omega t + 0.75\cos 3\omega t + 0.5\cos 5\omega t + 0.14\cos 7\omega t + 0.5\cos 9\omega t + 0.12\cos 11\omega t + 0.17\cos 13\omega t$
Harmonics in Music Signals
The spectrum of a single note from a musical instrument usually has a set of peaks at harmonic ratios
If the fundamental frequency is f, there are peaks at f, and also at (about) 2f, 3f, 4f…
Best basis functions to capture speech & music: cosines & sines
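To see the harmonic peaks concretely, one can synthesize a note from a few cosines and measure the amplitude at each multiple of the fundamental. The sketch below is a minimal Python illustration: the amplitudes are the ones from the "Music Signals" slide with all signs assumed positive, while the fundamental (100 Hz), sampling rate (8 kHz) and function names are hypothetical choices for the example.

```python
import math

# Relative amplitudes of harmonics 1, 3, ..., 13 (from the "Music Signals"
# slide; the signs are an assumption, taken here as all positive).
AMPLITUDES = {1: 1.0, 3: 0.75, 5: 0.5, 7: 0.14, 9: 0.5, 11: 0.12, 13: 0.17}

def synthesize(f0, fs, num_samples):
    """Sample x(t) = sum_k A_k cos(2*pi*k*f0*t) at sampling rate fs."""
    return [
        sum(a * math.cos(2 * math.pi * k * f0 * n / fs)
            for k, a in AMPLITUDES.items())
        for n in range(num_samples)
    ]

def amplitude_at(x, freq, fs):
    """Amplitude of the sinusoidal component of x at frequency freq,
    obtained by correlating x with a cosine and a sine (one DFT bin)."""
    re = sum(v * math.cos(2 * math.pi * freq * n / fs) for n, v in enumerate(x))
    im = sum(v * math.sin(2 * math.pi * freq * n / fs) for n, v in enumerate(x))
    return 2 * math.hypot(re, im) / len(x)

f0, fs = 100.0, 8000.0        # hypothetical fundamental and sampling rate
x = synthesize(f0, fs, 800)   # exactly 10 periods of the fundamental
peaks = {k: amplitude_at(x, k * f0, fs) for k in range(1, 14)}

for k, a in peaks.items():
    print(f"{k} x f0: amplitude {a:.3f}")
```

Because the analysis window holds a whole number of periods, the measured amplitudes match the synthesis amplitudes at the odd harmonics and are essentially zero at the even ones.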
Multi-Dimensional Digital Signals
Images: 2-D digital signals
Each sample is called a pixel or pel
Gray levels: black p=0, gray p=128, white p=255
Colors: combination of RGB
Video Sequences: 3-D digital signals, a collection of 2-D images called frames (axes x, y, t)
Color Spaces: RGB & YCrCb
RGB
Red Green Blue, typically 8-bit per sample for each color plane
YCrCb
Y: luminance, gray-scale component
Cr & Cb: chrominance, color components, less energy than Y
Chrominance components can be down-sampled without much aliasing
YCrCb, also known as YPrPb, is used in component video
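$\begin{bmatrix} Y \\ C_R \\ C_B \end{bmatrix} = \begin{bmatrix} 0.257 & 0.504 & 0.098 \\ 0.439 & -0.368 & -0.071 \\ -0.148 & -0.291 & 0.439 \end{bmatrix} \begin{bmatrix} R \\ G \\ B \end{bmatrix} + \begin{bmatrix} 16 \\ 128 \\ 128 \end{bmatrix}$
[Figure: positions of Y samples and of Cr, Cb samples]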
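The conversion above is one matrix multiply plus an offset per pixel, and chroma down-sampling can be as simple as averaging 2 x 2 blocks. The following is a minimal Python sketch under two assumptions: the minus signs in the matrix follow the standard 8-bit studio-range convention (they did not survive in the transcript), and 2 x 2 averaging is only one possible down-sampling filter. The function names are illustrative.

```python
def rgb_to_ycrcb(r, g, b):
    """Convert one 8-bit RGB pixel to (Y, Cr, Cb).
    Coefficients are the slide's; the signs are assumed (standard convention)."""
    y = 0.257 * r + 0.504 * g + 0.098 * b + 16
    cr = 0.439 * r - 0.368 * g - 0.071 * b + 128
    cb = -0.148 * r - 0.291 * g + 0.439 * b + 128
    return round(y), round(cr), round(cb)

def subsample_2x2(plane):
    """Down-sample a chroma plane (list of rows) by averaging 2x2 blocks.
    Assumes even width and height."""
    return [
        [
            (plane[i][j] + plane[i][j + 1]
             + plane[i + 1][j] + plane[i + 1][j + 1]) / 4
            for j in range(0, len(plane[0]), 2)
        ]
        for i in range(0, len(plane), 2)
    ]

print(rgb_to_ycrcb(255, 255, 255))   # white
print(rgb_to_ycrcb(0, 0, 0))         # black
print(rgb_to_ycrcb(255, 0, 0))       # pure red
print(subsample_2x2([[1, 3], [5, 7]]))
```

Gray pixels (R = G = B) map to Cr = Cb = 128, which is why the chrominance planes carry less energy than Y for typical images.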
Another Color Space: YUV
YUV is another popular color space, similar to YCrCb
Y: luminance component
UV: color components
YUV is used in PAL/NTSC broadcasting
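$\begin{bmatrix} Y \\ U \\ V \end{bmatrix} = \begin{bmatrix} 0.299 & 0.587 & 0.114 \\ -0.147 & -0.289 & 0.436 \\ 0.615 & -0.515 & -0.100 \end{bmatrix} \begin{bmatrix} R \\ G \\ B \end{bmatrix}$
[Figure: example planes, Y: 176 x 144, U: 88 x 72, V: 88 x 72]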
Popular Signal Formats
CIF: Common Intermediate Format
Y resolution: 352 x 288
CrCb/UV resolution: 176 x 144
Frame rate: 30 frames/second progressive
8 bits/pixel (sample)
QCIF: Quarter Common Intermediate Format
Y resolution: 176 x 144
CrCb/UV resolution: 88 x 72
Frame rate: 30 frames/second progressive
8 bits/pixel (sample)
TV – NTSC
Resolution: 704 x 480, 30 frames/second interlaced
DVD – NTSC
Resolution: 720 x 480, 24 – 30 frames/second progressive
[Figure: frames n and n+1, each made of a Y plane and smaller Cr and Cb planes]
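The raw bit rate of CIF and QCIF video follows directly from the plane sizes, frame rate and bit depth listed on this slide. A minimal Python sketch of that calculation, with an illustrative function name, might look like this:

```python
def raw_bps(y_width, y_height, c_width, c_height, fps, bits_per_sample=8):
    """Raw bits per second for one Y plane plus two chroma planes per frame."""
    samples_per_frame = y_width * y_height + 2 * c_width * c_height
    return samples_per_frame * bits_per_sample * fps

cif_bps = raw_bps(352, 288, 176, 144, 30)
qcif_bps = raw_bps(176, 144, 88, 72, 30)

print("CIF :", cif_bps / 1e6, "Mbps")
print("QCIF:", qcif_bps / 1e6, "Mbps")
```

Even the small QCIF format needs roughly 9 Mbps uncompressed, far above the dial-up and cellular rates quoted earlier in the lecture.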
High-Definition Television (HDTV)
720i
Resolution: 1280 x 720, interlaced
720p
Resolution: 1280 x 720, progressive
1080i
Resolution: 1920 x 1080, interlaced
1080p
Resolution: 1920 x 1080, progressive
Interlaced Video
Each frame is split into an odd field and an even field
Examples of Still Images
Examples of Video Sequences
[Figure: frames 1, 51, 71, 91 and 111 of an example sequence]
Observations of Visual Data
There is a lot of redundancy, correlation, strong structure within
natural image/video
Images
Spatial correlation: a lot of smooth areas with occasional edges
Video
Temporal correlation: neighboring frames seem to be very similar
Image/Video Compression Framework
Encoder: original signal -> T -> Q -> E -> compressed bit-stream -> Channel
Decoder: Channel -> $E^{-1}$ -> $Q^{-1}$ -> $T^{-1}$ -> reconstructed signal
T: Prediction, Transform, De-correlation
Q: Quantization
E: Information theory (VLC, Huffman, Arithmetic, Run-length)
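The three-stage chain of de-correlation, quantization and entropy coding can be illustrated with a toy 1-D codec. The Python sketch below is only an illustration under loud assumptions: previous-sample prediction stands in for T, a uniform quantizer for Q, and plain run-length coding for E. All function names are hypothetical, and the predictor works open-loop, so with a coarse step the quantization error accumulates; practical coders predict from reconstructed samples instead.

```python
def forward_transform(x):
    """T: predict each sample from the previous one, keep the residual."""
    return [x[0]] + [x[n] - x[n - 1] for n in range(1, len(x))]

def inverse_transform(d):
    """T^-1: undo the prediction with a running sum."""
    x = [d[0]]
    for v in d[1:]:
        x.append(x[-1] + v)
    return x

def quantize(d, step):
    """Q: map each residual to an integer index (uniform quantizer)."""
    return [round(v / step) for v in d]

def dequantize(q, step):
    """Q^-1: map each index back to a reconstruction level."""
    return [i * step for i in q]

def run_length_encode(symbols):
    """E: collapse runs of equal symbols into [symbol, count] pairs."""
    runs = []
    for s in symbols:
        if runs and runs[-1][0] == s:
            runs[-1][1] += 1
        else:
            runs.append([s, 1])
    return runs

def run_length_decode(runs):
    """E^-1: expand [symbol, count] pairs back into a symbol list."""
    return [s for s, count in runs for _ in range(count)]

def encode(x, step):
    return run_length_encode(quantize(forward_transform(x), step))

def decode(runs, step):
    return inverse_transform(dequantize(run_length_decode(runs), step))

signal = [100, 100, 100, 100, 102, 104, 106, 106, 106]  # smooth toy "scan line"
bitstream = encode(signal, step=1)
print(bitstream)
print(decode(bitstream, step=1))
```

The smooth input turns into a residual with long runs of identical values, which is exactly the kind of redundancy the entropy coder exploits.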
Deterministic versus Random
Deterministic
Signals whose values can be specified explicitly
Example: a sinusoid
Random
Digital signals in practice can be treated as a
collection of random variables or a random process
The symbols which occur randomly carry information
Probability theory
The study of random outcomes/events
Use mathematics to capture behavior of random
outcomes and events
Random Variable
Random variable (RV)
A random variable X is a mapping which
assigns a real number x to each possible
outcome of a random experiment
A random variable X takes on a value x from a
given set. Thus it is simply an event whose
outcomes have numerical values
Examples
X in coin toss, X=1 for Head, X=0 for Tail
The temperature outside our lecture hall at
any moment t
The pixel value at location x, y in frame n
of a future Hollywood blockbuster
Probability Density Function
Probability density function (PDF) of a RV X
Function $f_X(x)$ defined such that:
$P[x_1 \le X \le x_2] = \int_{x_1}^{x_2} f_X(x)\,dx$
Histogram of X!!!
Main properties:
$\int_{-\infty}^{\infty} f_X(x)\,dx = 1$
$f_X(x) \ge 0, \ \forall x$
PDF Examples
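Uniform PDF
$f_X(x) = \begin{cases} 1/(b-a), & a \le x \le b \\ 0, & \text{otherwise} \end{cases}$
Gaussian PDF
$f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x-\mu)^2 / 2\sigma^2}$
Laplacian PDF
$f_X(x) = \frac{1}{\sqrt{2}\,\sigma}\, e^{-\sqrt{2}\,|x| / \sigma}$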
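A quick numerical check confirms that each of the three example PDFs integrates to 1 and that the Laplacian, written with scale $\sigma$ as $\frac{1}{\sqrt{2}\sigma} e^{-\sqrt{2}|x|/\sigma}$, has variance $\sigma^2$. The Python sketch below uses a simple midpoint rule; the parameter values and function names are arbitrary choices for illustration.

```python
import math

def uniform_pdf(x, a, b):
    """Uniform PDF on [a, b]."""
    return 1.0 / (b - a) if a <= x <= b else 0.0

def gaussian_pdf(x, mu, sigma):
    """Gaussian PDF with mean mu and standard deviation sigma."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def laplacian_pdf(x, sigma):
    """Zero-mean Laplacian PDF with standard deviation sigma."""
    return math.exp(-math.sqrt(2) * abs(x) / sigma) / (math.sqrt(2) * sigma)

def integrate(f, lo, hi, steps=100000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))

area_uniform = integrate(lambda x: uniform_pdf(x, 1.0, 3.0), 0.0, 4.0)
area_gaussian = integrate(lambda x: gaussian_pdf(x, 0.0, 2.0), -40.0, 40.0)
area_laplacian = integrate(lambda x: laplacian_pdf(x, 2.0), -40.0, 40.0)
var_laplacian = integrate(lambda x: x * x * laplacian_pdf(x, 2.0), -40.0, 40.0)

print(area_uniform, area_gaussian, area_laplacian, var_laplacian)
```

The integration limits of -40 to 40 are wide enough that the truncated tails are negligible for $\sigma = 2$.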
Discrete Random Variable
RV that takes on discrete values only
PDF of discrete RV = discrete histogram
Example: how many Heads in 3 independent
coin tosses?
$f_X(x)$: 1/8 at x = 0, 3/8 at x = 1, 3/8 at x = 2, 1/8 at x = 3
$f_X(x) = \sum_k P_X(x_k)\,\delta(x - x_k)$ with $P_X(x_k) = P[X = x_k]$
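The coin-toss histogram can be reproduced by enumerating all equally likely outcomes and counting Heads. The following minimal Python sketch does so with exact fractions; it assumes a fair coin, and the function name is illustrative.

```python
from fractions import Fraction
from itertools import product

def heads_pmf(num_tosses):
    """PMF of the number of Heads in num_tosses independent fair coin tosses."""
    outcomes = list(product([0, 1], repeat=num_tosses))   # 1 = Head, 0 = Tail
    pmf = {k: Fraction(0) for k in range(num_tosses + 1)}
    for outcome in outcomes:
        pmf[sum(outcome)] += Fraction(1, len(outcomes))
    return pmf

pmf = heads_pmf(3)
for k, p in pmf.items():
    print(k, "Heads:", p)
```

The resulting probabilities sum to 1, as required of any PDF or discrete histogram.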
Expectation
Expected value
Let g(X) be a function of RV X. The expected value of
g(X) is defined as
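$E[g(X)] = \int_{-\infty}^{\infty} g(x)\, f_X(x)\,dx$
Expectation is linear!
Expectation of a deterministic constant is itself: $E[C] = C$
Mean: $\mu_X = E[X] = \int_{-\infty}^{\infty} x\, f_X(x)\,dx$
Mean-square value: $E[X^2]$
Variance: $\sigma_X^2 = E[(X - \mu_X)^2]$
$E[X^2] = \sigma_X^2 + \mu_X^2$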
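For a discrete RV the integrals above become sums over the possible values. As a small worked example, the Python sketch below computes the mean, mean-square value and variance for the number of Heads in 3 fair coin tosses and checks that the mean-square value equals variance plus squared mean. The helper name `expect` is illustrative.

```python
from fractions import Fraction

# PMF of the number of Heads in 3 independent fair coin tosses.
PMF = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

def expect(g, pmf):
    """E[g(X)] for a discrete RV: sum of g(x) * P[X = x]."""
    return sum(g(x) * p for x, p in pmf.items())

mean = expect(lambda x: x, PMF)
mean_square = expect(lambda x: x * x, PMF)
variance = expect(lambda x: (x - mean) ** 2, PMF)

print("mean =", mean)
print("mean-square =", mean_square)
print("variance =", variance)
print("linearity: E[2X + 1] =", expect(lambda x: 2 * x + 1, PMF), "= 2 E[X] + 1")
```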
Cross Correlation & Covariance
Cross correlation
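X, Y: 2 jointly distributed RVs
Joint PDF: $P[x_1 \le X \le x_2,\ y_1 \le Y \le y_2] = \int_{y_1}^{y_2}\int_{x_1}^{x_2} f_{XY}(x,y)\,dx\,dy$
Expectation: $E[g(X,Y)] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x,y)\, f_{XY}(x,y)\,dx\,dy$
Cross-correlation: $R_{XY} = E[XY]$
Cross covariance
$C_{XY} = E[(X - \mu_X)(Y - \mu_Y)]$
$R_{XY} = C_{XY} + \mu_X \mu_Y$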
Independence & Correlation
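Marginal PDF: $f_X(x) = \int_{-\infty}^{\infty} f_{XY}(x,y)\,dy$, $f_Y(y) = \int_{-\infty}^{\infty} f_{XY}(x,y)\,dx$
Statistically independent: $f_{XY}(x,y) = f_X(x)\, f_Y(y)$
Uncorrelated: $E[XY] = E[X]E[Y]$, i.e. $C_{XY} = 0$
Orthogonal: $E[XY] = 0$ (same as uncorrelated with 0-mean RVs)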
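Independence implies uncorrelatedness, but not the other way round. A standard counterexample, sketched below in Python as an illustration (it is not from the slides), takes X uniform on {-1, 0, 1} and Y = X^2: the pair is uncorrelated and orthogonal, yet the joint PMF does not factor into the marginals. The helper names are illustrative.

```python
from fractions import Fraction

# Joint PMF of (X, Y) with X uniform on {-1, 0, 1} and Y = X**2.
JOINT = {(x, x * x): Fraction(1, 3) for x in (-1, 0, 1)}

def expect(g, joint):
    """E[g(X, Y)] for a discrete joint PMF."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

def marginal(joint, axis):
    """Marginal PMF of X (axis=0) or Y (axis=1)."""
    out = {}
    for pair, p in joint.items():
        out[pair[axis]] = out.get(pair[axis], Fraction(0)) + p
    return out

mean_x = expect(lambda x, y: x, JOINT)
mean_y = expect(lambda x, y: y, JOINT)
r_xy = expect(lambda x, y: x * y, JOINT)   # cross-correlation
c_xy = r_xy - mean_x * mean_y              # cross covariance

p_x, p_y = marginal(JOINT, 0), marginal(JOINT, 1)
independent = all(
    JOINT.get((x, y), Fraction(0)) == p_x[x] * p_y[y]
    for x in p_x for y in p_y
)

print("R_XY =", r_xy, " C_XY =", c_xy, " independent:", independent)
```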
Random Process
Random process (RP)
A collection of RVs
A time-dependent RV
Denoted {X[n]}, {X(t)} or simply X[n], X(t)
We need N-dimensional joint PDF to characterize X[n]!
Note: the RVs that make up an RP may be dependent or
correlated
Examples:
Temperature X(t) outside campus
A sequence of binary numbers transmitted over a
communication channel
Speech, music, image, video signals
Wide-Sense Stationary
Wide-sense stationary (WSS) random process (RP)
A WSS RP is one for which $E[X[n]]$ is independent of n and $R[m,n] = E[X[m]X[n]]$ only depends on the difference (m – n)
Mean: $m_X = E[X[n]]$
Auto-correlation sequence: $R_{XX}[k] = E[X[n]X[n+k]]$
Energy: $E[X^2[n]] = R_{XX}[0]$
Variance: $\sigma_X^2 = E[(X[n] - m_X)^2] = R_{XX}[0] - m_X^2$
Co-variance: $C_{XX}[k] = E[(X[n] - m_X)(X[n+k] - m_X)]$
What happens if the WSS RP has 0-mean?
White Random Process
Power spectral density
The power spectrum of a WSS RP is defined as the
Fourier transform of its auto-correlation sequence
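$S_{XX}(e^{j\omega}) = \sum_{k=-\infty}^{\infty} R_{XX}[k]\, e^{-j\omega k}$
White RP
A RP is said to be white if any pair of samples are uncorrelated, i.e., $E[X[n]X[m]] = E[X[n]]\,E[X[m]],\ m \ne n$
White WSS RP
$R_{XX}[k] = \begin{cases} \sigma_X^2 + m_X^2, & k = 0 \\ m_X^2, & k \ne 0 \end{cases}$
White 0-mean WSS RP
$R_{XX}[k] = \sigma_X^2\,\delta[k]$, $S_{XX}(e^{j\omega}) = \sigma_X^2$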
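The auto-correlation sequence of a white 0-mean WSS process should be close to $\sigma_X^2$ at lag 0 and close to 0 elsewhere. The Python sketch below estimates it from one realization by time averaging, which assumes the process is ergodic; the sample count, seed and function name are arbitrary illustrative choices.

```python
import random

def autocorrelation(x, max_lag):
    """Estimate R_XX[k] = E[X[n] X[n+k]] by time averaging, for k = 0..max_lag."""
    n_total = len(x)
    return [
        sum(x[n] * x[n + k] for n in range(n_total - k)) / (n_total - k)
        for k in range(max_lag + 1)
    ]

rng = random.Random(0)          # fixed seed so the run is repeatable
sigma = 2.0
w = [rng.gauss(0.0, sigma) for _ in range(50000)]   # white 0-mean Gaussian noise
r = autocorrelation(w, 3)

for k, value in enumerate(r):
    print(f"R_XX[{k}] ~ {value:.3f}")
```

With $\sigma = 2$ the lag-0 estimate should land near 4 and the other lags near 0, up to the statistical fluctuation of a finite sample.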
Stochastic Signal Model
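w[n] (white 0-mean WSS Gaussian noise) -> H(z) -> x[n] (AR(N) signal)
$H(z) = \frac{1}{1 - \sum_{n=1}^{N} a_n z^{-n}}$
For speech: N = 10 to 20
For images: N = 1! and $a_1 = \rho \approx 0.95$
AR(1) Signal
$X(z) = H(z)\,W(z) = \frac{W(z)}{1 - \rho z^{-1}}$
$X(z) - \rho z^{-1} X(z) = W(z)$
$X(z) = W(z) + \rho z^{-1} X(z)$
$x[n] = w[n] + \rho\, x[n-1]$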
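The AR(1) recursion is easy to simulate: feed white Gaussian noise through x[n] = w[n] + rho x[n-1] and look at how strongly neighboring samples are correlated. The Python sketch below is an illustration with hypothetical names and an arbitrary seed and sample count; it relies on the known result that a stable AR(1) process has normalized auto-correlation rho^|k|, and it estimates that quantity by time averaging.

```python
import random

def ar1(rho, num_samples, sigma_w=1.0, seed=0):
    """Generate x[n] = w[n] + rho * x[n-1] driven by white Gaussian noise w[n]."""
    rng = random.Random(seed)
    x, prev = [], 0.0
    for _ in range(num_samples):
        prev = rng.gauss(0.0, sigma_w) + rho * prev
        x.append(prev)
    return x

def normalized_autocorr(x, lag):
    """Estimate R_XX[lag] / R_XX[0] by time averaging."""
    n = len(x) - lag
    r0 = sum(v * v for v in x) / len(x)
    rk = sum(x[i] * x[i + lag] for i in range(n)) / n
    return rk / r0

x = ar1(0.95, 100000)
r1 = normalized_autocorr(x, 1)
r2 = normalized_autocorr(x, 2)

print(f"lag 1: {r1:.3f} (theory 0.95)")
print(f"lag 2: {r2:.3f} (theory {0.95 ** 2:.4f})")
```

A lag-1 correlation near 0.95 is the statistical counterpart of the earlier observation that natural images consist mostly of smooth areas.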
Error or Similarity Measures
Mean Square Error (MSE)
$L_2$-norm error: $MSE = \frac{1}{N}\sum_{i=0}^{N-1} E[(X_i - \hat{X}_i)^2]$
Mean Absolute Difference (MAD)
$L_1$-norm error: $MAD = \frac{1}{N}\sum_{i=0}^{N-1} E[|X_i - \hat{X}_i|]$
Max Error
$L_\infty$-norm error: $MaxError = \max_i E[|X_i - \hat{X}_i|]$
Peak Signal-to-Noise Ratio (PSNR)
$PSNR = 10\log_{10}\frac{M^2}{MSE}$; M = maximum peak-to-peak value
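In practice these measures are computed on one original signal and its reconstruction, with the expectation replaced by the observed sample values. The Python sketch below is a minimal version under that assumption; the default peak value M = 255 assumes 8-bit samples, and the function names are illustrative.

```python
import math

def mse(x, x_hat):
    """Mean square error (L2-norm error)."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

def mad(x, x_hat):
    """Mean absolute difference (L1-norm error)."""
    return sum(abs(a - b) for a, b in zip(x, x_hat)) / len(x)

def max_error(x, x_hat):
    """Maximum absolute error (L-infinity-norm error)."""
    return max(abs(a - b) for a, b in zip(x, x_hat))

def psnr(x, x_hat, peak=255):
    """Peak signal-to-noise ratio in dB; peak = 255 assumes 8-bit samples."""
    err = mse(x, x_hat)
    return math.inf if err == 0 else 10 * math.log10(peak ** 2 / err)

original = [10, 20, 30, 40]
reconstructed = [12, 18, 30, 44]

print("MSE      =", mse(original, reconstructed))
print("MAD      =", mad(original, reconstructed))
print("MaxError =", max_error(original, reconstructed))
print("PSNR     = %.2f dB" % psnr(original, reconstructed))
```

A perfect reconstruction has zero MSE, so the sketch reports an infinite PSNR in that case rather than dividing by zero.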
Summary
Introduction to audio/image/video signals
Audio-visual information is everywhere in our
everyday life
Efficient representation (compression) of
audio/image/video facilitates information storage,
archival, communications and even processing
Compression is achievable since visual data contains a
lot of redundancy, both spatially and temporally
Review
Random variables, PDF, mean, variance, correlation
Random processes, wide-sense stationary RP, white
Simple stochastic signal models via AR processes
Error or similarity measures