Multimedia Systems
Chapter 3: Audio and Video Technology
Audio
• Audio is a wave resulting from an air-pressure disturbance that reaches our eardrum, generating the sound we hear.
– Humans can hear frequencies in the range 20–20,000 Hz.
• ‘Acoustics’ is the branch of physics that studies
sound
Facsimile Technology
All modes of mass communication are
based on the process of facsimile
technology. That is, sounds from a
speaker and pictures on a TV screen are
merely representations, or facsimiles, of
their original form.
In general, the more faithful the reproduction or facsimile is to the original, the greater its fidelity. High-fidelity audio, or hi-fi, is a close approximation of the original speech or music it represents. And a videocassette recorder marketed as high fidelity boasts better picture quality than a VCR without it (video high fidelity is known as H-Q, to distinguish it from its audio counterpart).
The second point about facsimile technology is that
in creating their facsimiles, radio and TV are not
limited to plaster of Paris, crayon, oils, or even
photographic chemicals and film. Instead, unseen
elements such as radio waves, beams of light, and
digital bits and bytes are utilized in the process.
Bear in mind that the engineer’s goal in radio, TV, and cable is to:
• create the best possible facsimile of our original sound or image,
• transport that image without losing too much fidelity (fidelity loss is known as signal loss), and
• re-create that sound or image as closely as possible to its original form.
Today, engineers use both analog and digital systems to transport images and sounds, but more and more we are switching to digital transmission.
Transduction
• Another basic concept is transduction, the process of changing one form of energy into another: when the telephone operator says “the number is 555-2796” and you write it down on a sheet of notepaper, sound energy has been transduced into written form.
Why does this matter?
Getting a sound or picture from a TV studio or concert hall to your home usually involves at least three or four transductions. At each phase, loss of fidelity is possible and must be controlled. With our current system of broadcasting, at any phase the whole process may break down into noise (unwanted interference), rendering the communication impossible.
Sound and audio
• Sound refers to vibrations that pass through and reflect off a medium; audio refers to sound that is created or reproduced digitally, using electronic equipment.
Sound is a continuous wave that travels through air. The wave itself is composed of pressure differences. Sound is detected by measuring these pressure levels and their succession in time. The human ear does this detection naturally when the wave, with its pressure differences, impinges on the eardrum.
• The properties of sound include: frequency, wavelength, wave number, amplitude, sound pressure, sound intensity, speed of sound and direction. The speed of sound is an important property; it differs depending on the medium through which the sound travels.
The frequency refers to the rate at which the wave repeats. It is expressed in cycles per second, or hertz (Hz). The human ear is capable of perceiving wave frequencies in the range 20 Hz to 20 kHz, which is the audio range. The amplitude is a measure of the displacement of the wave from the mean. For human perception this is related to, but not the same as, loudness.
[Figure: one particular frequency component shown as air pressure versus time, with the amplitude and one period labelled.]
The wavelength of a sound is the distance the disturbance
travels in one cycle and is related to the sound’s speed and
frequency.
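As a quick worked check (using the speed of sound of about 344 m/s quoted later in this chapter):

$$\lambda = \frac{v}{f}, \qquad \lambda_{20\,\mathrm{Hz}} = \frac{344\ \mathrm{m/s}}{20\ \mathrm{Hz}} \approx 17\ \mathrm{m}, \qquad \lambda_{20\,\mathrm{kHz}} = \frac{344\ \mathrm{m/s}}{20\,000\ \mathrm{Hz}} \approx 1.7\ \mathrm{cm}$$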
However, in order to store this input in a computer one has to convert it to a digital form, that is, into 0s and 1s. Further, a continuous wave has infinite resolution, which cannot be represented in a computer.
Waveform Representation
[Block diagram: Audio Source → Audio Capture → Sampling & Digitization → Storage or Transmission → Receiver → Digital to Analog → Playback (speaker) → Human Ear]
Audio Generation and Playback
SIGNAL GENERATION
• This step involves the creation of the
necessary oscillations, or detectable
vibrations of electrical energy, which
correspond to the frequencies of their
original counterparts in nature. In plain
language, signal generation involves
getting the sound vibrations into a
microphone, or the bits and bytes onto a
CD, a DVD, or an MP3 player.
Audio Signal Generation
• Sound signals are generated by two main transduction processes: mechanical and electronic. Mechanical methods, like microphones, phonograph records, and tape recorders, have been in use for many years.
Mechanical Methods
• Mechanical means are used to translate
sound waves into a physical form, one you
can hold in your hand, like a phonograph
record or an audiocassette.
Inside the microphone
• One place where speech or music is
mechanically re-created to produce electrical
signals is inside a microphone. There are
three basic types of microphones: dynamic,
velocity, and condenser. Each produces the
waveforms required for transmission in a
different manner.
Dynamic microphone
• In the center of the microphone is a coil of
electrical wire, called a voice coil. Sound pressure
vibrates the diaphragm, which moves the voice
coil up and down between the magnetic poles.
Digitization
• Digitization is achieved by recording or sampling the continuous sound wave at discrete points. The more frequently one samples, the closer one gets to capturing the continuity of the wave.
Principles of Digitization
• Sampling: Divide the horizontal axis (time) into discrete pieces
• The other aspect of digitization is the measurement of the voltages at these discrete sampling points. As it turns out, these values may be of arbitrary precision; that is, we could have values containing small fractions or decimal numbers that take many bits to represent.
• Quantization: Divide the vertical axis (signal strength, voltage) into pieces. For example, 8-bit quantization divides the vertical axis into 256 levels; 16-bit gives you 65,536 levels. The lower the quantization, the lower the quality of the sound.
Coding
• The process of representing quantized
values digitally
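A minimal Python sketch of these three steps together (the 440 Hz sine source, the 8 kHz rate and the 8-bit depth are illustrative assumptions, not values from the slides):

```python
import math

RATE = 8000          # sampling rate in Hz (assumed for illustration)
BITS = 8             # quantization depth (assumed for illustration)
LEVELS = 2 ** BITS   # 8-bit quantization -> 256 levels

def sample(signal, seconds, rate=RATE):
    """Sampling: evaluate the continuous signal at discrete instants."""
    return [signal(i / rate) for i in range(int(seconds * rate))]

def quantize(x):
    """Quantization: snap an amplitude in [-1, 1] to one of LEVELS levels."""
    return round((x + 1) / 2 * (LEVELS - 1))

# Coding: represent each quantized level digitally (here, as ints 0..255).
tone = lambda t: math.sin(2 * math.pi * 440 * t)   # a 440 Hz sine source
codes = [quantize(x) for x in sample(tone, 0.01)]  # 80 samples of 10 ms
print(codes[:8])   # first eight 8-bit codes
```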
Sampling and Quantization
[Figure: a waveform sampled at discrete points in time, with each sample quantized to the nearest level.]
Sampling
• Sampling rate: Number of
samples per second (measured in
Hz)
• E.g., CD standard audio uses a
sampling rate of 44,100 Hz (44100
samples per second)
3-bit quantization
3-bit quantization gives 8 possible
sample values
E.g., CD standard audio uses 16-bit
quantization giving 65536 values.
Why Quantize?
To Digitize!
Quantizing
• Instead of sending the actual sample, the sampled signal is first mapped onto a known number of levels, which is agreed with the receiver.
• Suppose that instead of sending a whole range of voltages, the source informs the destination that it is going to send only 4 voltage levels, say 0-3 V. For example, if the sample is 2.7 V, the source will first convert it into a 3 V sample, which is then sent through the transmission medium. Suppose the destination receives a sample of 3.3 V. It immediately knows that this is not an agreed level, hence the sent value has been changed in transit, and it converts the 3.3 V sample back into 3 V.
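A tiny sketch of that round trip (the 0.3 V of added noise is an illustrative value):

```python
LEVELS = [0.0, 1.0, 2.0, 3.0]   # the four agreed voltage levels (0-3 V)

def nearest_level(v):
    """Snap a voltage to the closest agreed level."""
    return min(LEVELS, key=lambda lvl: abs(lvl - v))

sent = nearest_level(2.7)            # source quantizes 2.7 V -> 3.0 V
received = sent + 0.3                # the medium adds noise: 3.3 V arrives
recovered = nearest_level(received)  # destination snaps 3.3 V back to 3.0 V
print(sent, received, recovered)     # 3.0 3.3 3.0
```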
Linear quantization
• With linear quantization, every increment in the sampled value corresponds to a fixed-size analogue increment. E.g. an 8-bit A-D or D-A with a 0-1 V analogue range has 1 V / 256 ≈ 3.9 mV per bit, regardless of the actual signal amplitude.
Non-linear quantization
• With non-linear quantization you normally have some sort of logarithmic encoding, so that the increment for small sample values is much smaller than the increment for large sample values. Ideally the step size should be roughly proportional to the sample size.
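One common logarithmic scheme is μ-law companding; a simplified sketch (μ = 255 is the value used in North American 8-bit telephony, but this cut-down form is only for illustration):

```python
import math

MU = 255.0  # mu-law parameter

def compress(x):
    """Map x in [-1, 1] so small amplitudes get finer quantization steps."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def expand(y):
    """Inverse mapping: recover the linear amplitude."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

# A quiet sample uses a large share of the compressed range:
print(round(compress(0.01), 3))          # ~0.228
print(round(compress(0.5), 3))           # ~0.876
print(round(expand(compress(0.01)), 3))  # ~0.01 (round trip)
```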
Quality of voice
• The quality of voice transmission is measured by the signal-to-noise (S/N) ratio: the ratio of the original signal value to the change made when quantizing.
• S/N ratios calculated for a strong and a weak signal with the same quantization noise illustrate the problem discussed next.
Linear quantizing is not used. Why?
• It is noticeable that even though the quantization noise is the same for both signals, the S/N ratios are very different: linear quantizing gives high S/N ratios for strong signals and low S/N ratios for weak signals.
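This can be made quantitative with the standard signal-to-quantization-noise formula (a textbook result, not derived on these slides): for an N-bit linear quantizer driven by a full-scale sine wave,

$$\mathrm{SQNR} \approx 6.02\,N + 1.76\ \mathrm{dB}$$

and every 6 dB a signal sits below full scale costs roughly one effective bit, which is exactly why weak signals fare badly under linear quantization.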
Nyquist Theorem
• Any analog signal consists of components at various frequencies. The simplest case is the sine wave, in which all the signal energy is concentrated at one frequency. In practice, analog signals usually have complex waveforms, with components at many frequencies. The highest frequency component in an analog signal determines the bandwidth of that signal. The higher the frequency, the greater the bandwidth, if all other factors are held constant.
• Suppose the highest frequency component, in
hertz, for a given analog signal is fmax. According to
the Nyquist Theorem, the sampling rate must be at
least 2fmax, or twice the highest analog frequency
component. The sampling in an analog-to-digital
converter is actuated by a pulse generator (clock).
If the sampling rate is less than 2fmax, some of the
highest frequency components in the analog input signal
will not be correctly represented in the digitized output.
Nyquist Theorem
• Consider a sine wave:
– Sampling once a cycle: it appears as a constant signal.
– Sampling 1.5 times each cycle: it appears as a low-frequency sine signal.
• For lossless digitization, the sampling rate should be at least twice the maximum frequency component in the signal.
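A tiny sketch of the aliasing that results when the rule is violated (the 8 kHz rate and 5 kHz tone are illustrative assumptions):

```python
import math

def sample_sine(freq_hz, rate_hz, n):
    """Sample a sine of the given frequency at the given rate."""
    return [math.sin(2 * math.pi * freq_hz * i / rate_hz) for i in range(n)]

RATE = 8000
# A 5 kHz tone violates Nyquist at 8 kHz (it needs >= 10 kHz). Its samples
# are exactly those of a phase-flipped 3 kHz tone (8 kHz - 5 kHz = 3 kHz):
high = sample_sine(5000, RATE, 8)
alias = sample_sine(-3000, RATE, 8)
assert all(abs(a - b) < 1e-9 for a, b in zip(high, alias))
print("5 kHz sampled at 8 kHz is indistinguishable from a 3 kHz alias")
```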
Characteristics of Audio
• Audio has normal wave
properties
– Reflection
– Refraction
– Diffraction
• A sound wave has several
different properties:
– Amplitude (loudness/intensity)
– Frequency (pitch)
– Envelope (waveform)
• Refraction occurs when a wave crosses a boundary from one medium to another. A wave entering a medium at an angle will change direction.
• Diffraction refers to the "bending of waves around an edge" of an object. Diffraction depends on the size of the object relative to the wavelength of the wave.
Decibel (dB)
• The decibel (dB) is a logarithmic unit used to describe a ratio. The ratio may be power, voltage, intensity, or several other things.
• Suppose we have two loudspeakers, the first
playing a sound with power P1, and another
playing a louder version of the same sound
with power P2, but everything else (how far
away, frequency) kept the same.
• The difference in decibels between the two
is given by 10 log (P2/P1) dB
• If the second produces twice as much power as the first, the difference in dB is 10 log (P2/P1) = 10 log 2 = 3 dB.
• If the second had 10 times the power of the
first, the difference in dB would be 10 log
(P2/P1)= 10 log 10 = 10 dB.
• If the second had a million times the power of
the first, the difference in dB would be 10 log
(P2/P1) = 10 log 1000000 = 60 dB.
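The same three cases as a quick Python check:

```python
import math

def db_gain(p2, p1):
    """Level difference in decibels between two powers."""
    return 10 * math.log10(p2 / p1)

print(round(db_gain(2, 1), 1))          # 3.0  -> doubling power adds ~3 dB
print(round(db_gain(10, 1), 1))         # 10.0 -> 10x power adds 10 dB
print(round(db_gain(1_000_000, 1), 1))  # 60.0 -> a million-fold is 60 dB
```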
What happens when you halve the
sound power?
• The log of 2 is 0.3, so the log of 1/2 is -0.3. So, if you halve the power, you reduce the sound level by 3 dB. Halve it again (down to 1/4 of the original power) and you reduce the level by another 3 dB.
For example, take a sample of white noise (a mix of all audible frequencies, just as white light is a mix of all visible frequencies) and reduce its voltage by a factor of the square root of 2. Since 2^-0.5 is approximately 0.7, -3 dB corresponds to reducing the voltage or the pressure to about 70% of its original value.
How big is a decibel?
• One decibel is close to the Just Noticeable Difference (JND) for sound level. Listening to a series of sounds, each 1 dB quieter than the last, you will notice that the last is quieter than the first, but it is rather less clear to the ear that each sound is quieter than its predecessor. Since 10 log10(1.26) ≈ 1, to increase the sound level by 1 dB the power must be increased by 26%, or the voltage by 12%.
Standard reference levels ("absolute" sound level)
• When the decibel is used to give the sound level for a single sound rather than a ratio, a reference level must be chosen. For sound pressure (in air), the reference level is usually chosen as 20 micropascals, or 0.02 mPa.
• This is very low: about two ten-billionths of an atmosphere. Nevertheless, this is about the limit of sensitivity of the human ear, in its most sensitive range of frequency.
Usually this sensitivity is only found in
rather young people or in people who
have not been exposed to loud music or
other loud noises.
• Personal music systems with in-ear
speakers ('walkmans') are capable of very
high sound levels in the ear, and are
believed by some to be responsible for
much of the hearing loss in young adults
in developed countries.
• So if you read of a sound pressure level of 86 dB, it means that
20 log (p2/p1) = 86 dB
• where p1 is the sound pressure of the reference level, and p2 that of the sound in question. Divide both sides by 20:
log (p2/p1) = 4.3
p2/p1 = 10^4.3
• 4 is the log of 10 thousand, 0.3 is the log
of 2, so this sound has a sound pressure
20 thousand times greater than that of
the reference level (p2/p1 = 20,000). 86 dB
is a loud but not dangerous level of
sound, if it is not maintained for very long.
What does 0 dB mean?
• This level occurs when the measured
intensity is equal to the reference level.
i.e., it is the sound level corresponding to
0.02 mPa. In this case we have sound
level =
20 log (pmeasured/preference) = 20 log 1 = 0 dB
• So 0 dB does not mean no sound, it means a
sound level where the sound pressure is equal
to that of the reference level. This is a small
pressure, but not zero. It is also possible to
have negative sound levels: - 20 dB would
mean a sound with pressure 10 times smaller
than the reference pressure, i.e. 2
micropascals.
Audio Amplitude
• In microphones, audio is captured as analog signals
(continuous amplitude and time) that respond
proportionally to the sound pressure, p.
• The power in a sound wave, all else equal, goes as
the square of the pressure.
– Expressed in dynes/cm2.
• The difference in sound pressure level between two
sounds with p1 and p2 is therefore 20 log10 (p2/p1) dB
• The “acoustic amplitude” of sound is measured in
reference to p1 = pref = 0.0002 dynes/cm2.
– The human ear is insensitive to sound pressure levels
below pref.
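A small helper tying these slides together (pref and the 86 dB example come from this chapter; the function names are my own):

```python
import math

P_REF = 0.0002  # reference pressure in dynes/cm^2 (= 20 micropascals)

def spl_db(p):
    """Sound pressure level in dB relative to P_REF."""
    return 20 * math.log10(p / P_REF)

def pressure_at(db):
    """Inverse: pressure (dynes/cm^2) for a given dB SPL."""
    return P_REF * 10 ** (db / 20)

print(round(pressure_at(86) / P_REF))  # ~19953: the ~20,000x example above
print(spl_db(P_REF))                   # 0.0 -> 0 dB is the reference itself
print(round(spl_db(P_REF / 10)))       # -20 -> 10x below the reference
```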
Audio Amplitude

Intensity       Typical Examples
0 dB            Threshold of hearing
20 dB           Rustling of paper
25 dB           Recording studio (ambient level)
40 dB           Residence (ambient level)
50 dB           Office (ambient level)
60 - 70 dB      Typical conversation
80 dB           Heavy road traffic
90 dB           Home audio listening level
120 - 130 dB    Threshold of pain
140 dB          Rock singer screaming into microphone
Audio Frequency
• Audio frequency is the number of high-to-low
pressure cycles that occurs per second.
– In music, frequency is referred to as pitch.
• Different living organisms have different abilities to hear high-frequency sounds:
– Dogs: up to 50 kHz
– Cats: up to 60 kHz
– Bats: up to 120 kHz
– Dolphins: up to 160 kHz
– Humans: 20 Hz up to 20 kHz
• The human range is called the audible band.
• The exact audible band differs from one person to another and deteriorates with age.
Audio Frequency
• The frequency range of sounds can be divided into:
– Infrasound: 0 Hz – 20 Hz
– Audible sound: 20 Hz – 20 kHz
– Ultrasound: 20 kHz – 1 GHz
– Hypersound: 1 GHz – 10 GHz
• Sound waves propagate at a speed of around 344 m/s in humid air at room temperature (20 °C).
– Hence, audio wavelengths typically vary from 17 m (corresponding to 20 Hz) to 1.7 cm (corresponding to 20 kHz).
• Sound can be divided into periodic sound (e.g. whistling wind, bird songs, sound from music) and nonperiodic sound (e.g. speech, sneezes and rushing water).
Audio Frequency
• Most sounds are combinations of different frequencies and wave shapes. Hence, the spectrum of a typical audio signal contains one or more fundamental frequencies, their harmonics, and possibly a few cross-modulation products.
– Fundamental frequency
– Harmonics
• The harmonics and their amplitudes determine the tone quality, or timbre.
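A sketch of how harmonic content shapes timbre (the 440 Hz fundamental and the two harmonic recipes are made-up values for illustration):

```python
import math

def tone(t, fundamental, harmonics):
    """Sum a fundamental and its harmonics; 'harmonics' maps
    harmonic number -> relative amplitude."""
    return sum(amp * math.sin(2 * math.pi * n * fundamental * t)
               for n, amp in harmonics.items())

RATE = 44100
# Two timbres with the same 440 Hz pitch but different harmonic content:
flute_like = {1: 1.0, 2: 0.2}                  # few, weak harmonics
brass_like = {1: 1.0, 2: 0.7, 3: 0.5, 4: 0.3}  # rich in harmonics

wave = [tone(i / RATE, 440, brass_like) for i in range(441)]
print(len(wave))  # same pitch as flute_like, very different waveform
```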
Audio Envelope
• When sound is generated, it does not last
forever. The rise and fall of the intensity
of the sound is known as the envelope.
• A typical envelope consists of four
sections: attack, decay, sustain and
release.
Audio Envelope
• Attack: The intensity of a note increases from silence to
a high level
• Decay: The intensity decreases to a middle level.
• Sustain: The middle level is sustained for a short period
of time
• Release: The intensity drops from the sustain level to
zero.
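A minimal piecewise-linear ADSR envelope generator (all timings and levels are illustrative assumptions):

```python
def adsr(t, attack=0.05, decay=0.1, sustain_level=0.6,
         sustain_time=0.5, release=0.2):
    """Envelope value (0..1) at time t seconds after the note starts."""
    if t < attack:                     # attack: silence -> peak
        return t / attack
    t -= attack
    if t < decay:                      # decay: peak -> middle level
        return 1.0 - (1.0 - sustain_level) * t / decay
    t -= decay
    if t < sustain_time:               # sustain: hold the middle level
        return sustain_level
    t -= sustain_time
    if t < release:                    # release: middle level -> zero
        return sustain_level * (1.0 - t / release)
    return 0.0

# Multiply any waveform by the envelope: amplitude(t) = adsr(t) * tone(t)
print([round(adsr(x / 10), 2) for x in range(9)])
```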
Audio Envelope
• Different instruments have different
envelope shapes
– Violin notes have slower attacks but a longer
sustain period.
– Guitar notes have quick attacks and a slower
release