Media Processing – Audio Part
Dr Wenwu Wang
Centre for Vision Speech and Signal Processing
Department of Electronic Engineering
[email protected]
http://personal.ee.surrey.ac.uk/Personal/W.Wang/teaching.html
1
Approximate outline
 Week 6: Fundamentals of audio
 Week 7: Audio acquiring, recording, and standards
 Week 8: Audio processing, coding, and standards
 Week 9: Audio production and reproduction
 Week 10: Audio perception and audio quality assessment
2
Audio recording
Concepts and topics to be covered:
 Microphones
 Directional response of microphones
 Omni pattern
 Figure-eight pattern
 Cardioid pattern
 A/D conversion
 Sampling
 Quantisation
 D/A conversion
 Digital audio recording formats/standards
 Free audio recording software
3
Audio recording / processing /
production chain
Recording/acquisition system
Sound source
Production system (speaker)
Listener
4
Microphone
 A microphone, whose functionality is opposite to a loudspeaker, is a
transducer that converts acoustical sound energy into electrical form.
 The three most common operating principles for microphones are the
dynamic (i.e. moving coil), the ribbon, and the capacitor (or condenser).
5
Dynamic microphone
 It consists of a rigid diaphragm, typically
20-30 mm in diameter, suspended in
front of a magnet.
 The coil, which is attached to the
diaphragm, sits in the gap of a strong
permanent magnet.
 Sound waves cause the diaphragm to
vibrate, and hence the coil to move
in the magnet’s gap. This induces an
alternating current in the coil,
producing the electrical output.
 Such a microphone is useful in
applications such as drums and handheld vocals because it is
robust. Its disadvantage is its
limited frequency response (fairly rapid
fall-off in response above 8 or 10 kHz).
Source: Francis Rumsey and
Tim McCormick (1994)
6
Ribbon microphone
 The ribbon microphone consists of a
long thin strip of conductive metal
foil, magnetic poles, and a
transformer.
 The foil is pleated to give it ‘spring’,
which is lightly tensioned between
two end clamps. The magnetic poles
create a magnetic field across the
ribbon. When it is excited by the
sound waves, a current is
generated.
 A transformer is used to step up the
electrical output of the ribbon, which
is very small. Note that the standard
output impedance is 200 ohms, as in
the dynamic microphone.
Source: Francis Rumsey and
Tim McCormick (1994)
7
Capacitor microphone
 The capacitor consists of a flexible diaphragm and a rigid back plate, separated by
an insulator. The diaphragm is free to vibrate with the sound waves. When one plate
(i.e. diaphragm) is free to move with respect to the other (i.e. the earthed back
plate), then the capacitance, i.e. the ability to hold electrical charge, will vary.
 The DC phantom power charges the capacitor via a very high resistance. A DC
blocking capacitor prevents the phantom power from entering the head amplifier,
allowing only audio signals to pass.
 Sound waves cause the diaphragm to move, which changes the
capacitance and, proportionally, the voltage across the capacitor (as the high
resistance only allows very slow leakage of charge from the diaphragm).
 The head amplifier converts the very high impedance voltage output of the capacitor
to a much lower impedance. The transformer further balances the signal for output.
Source: Francis Rumsey and
Tim McCormick (1994)
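The constant-charge principle can be illustrated with a rough numerical sketch. It assumes an ideal parallel-plate capsule (C = epsilon_0 * A / d), and the plate area, gap and 48 V polarising voltage are hypothetical values chosen only for illustration, not taken from the slides.

```python
EPSILON_0 = 8.854e-12  # permittivity of free space, F/m

def capacitance(area_m2, gap_m):
    """Ideal parallel-plate capacitance (assumption: air gap, no fringing)."""
    return EPSILON_0 * area_m2 / gap_m

def voltage_at_gap(charge_c, area_m2, gap_m):
    """Voltage across the capsule when the stored charge is held constant."""
    return charge_c / capacitance(area_m2, gap_m)

# Hypothetical capsule: 1 cm^2 plate area, 25 micrometre gap, 48 V polarising voltage.
area, rest_gap, v_polarising = 1e-4, 25e-6, 48.0
charge = capacitance(area, rest_gap) * v_polarising  # held nearly fixed by the high resistance

for displacement in (-1e-6, 0.0, 1e-6):  # diaphragm moves by +/- 1 micrometre
    v = voltage_at_gap(charge, area, rest_gap + displacement)
    print(f"gap change {displacement * 1e6:+.1f} um -> {v:.2f} V")
```

With the charge held fixed, the voltage follows the plate spacing linearly, which is why the output tracks the diaphragm movement.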
8
Directional responses of
microphone and polar diagrams
 Microphones are designed to have a particular directional response pattern,
often described by a ‘polar diagram’, which is a two-dimensional contour
map showing the magnitude of the microphone’s output at different angles
of incidence of a sound wave.
 The distance of the polar plot from the centre of the diagram is usually
measured in decibels (dB). The further the plot is from the centre, the
greater the output of the microphone at that angle.
 A nominal 0 dB is usually marked for the response at zero degrees at 1
kHz.
9
Omnidirectional pattern
 An omnidirectional microphone picks up sound equally from all directions,
i.e., a response of 1.
 The polar diagram of an ideal omnidirectional microphone is shown below,
where the microphone response is omnidirectional for all frequencies.
 Such a pattern can be achieved by leaving the microphone diaphragm
open at the front, but completely enclosed at the rear, so that it responds
only to the change of air pressure caused by the sound waves.
Source: Francis Rumsey and
Tim McCormick (1994)
10
Omnidirectional pattern (cont.)
 The polar diagram of a (typical) real omnidirectional microphone at a
number of frequencies is shown in the figure below.
Source: Francis Rumsey and
Tim McCormick (1994)
11
Omnidirectional pattern (cont.)
 For this microphone, the response is perfectly omnidirectional for
frequencies up to 2 kHz. For frequencies between 3 and 6 kHz, its sensitivity
at 180 degrees (i.e. at the rear of the microphone) drops by about 6 dB
compared with the lower frequencies (up to 2 kHz). For frequencies above
8 kHz, the response at 180 degrees can drop by as much as 15 dB.
Therefore, sounds picked up from the rear by such a microphone will lose much of
the treble (high-frequency) content of the signal.
 The smaller the dimensions of the microphone, the better the polar
response at high frequencies. Mics with quarter-inch diaphragms,
for example, maintain a good response up to 10 kHz.
 Omni microphones are usually the most immune to mic movement and
wind noise (as compared to the other types discussed later), as they are
only sensitive to the absolute sound pressure.
12
Bidirectional (figure-eight) pattern
 A bidirectional (or figure-eight) microphone has a polar response proportional
to the cosine of the angle of incidence of the sound waves.
 At 90 degrees, no sound is picked up. At 0 degrees, the sound is picked up
by a front lobe, and at 180 degrees by a rear lobe, whose output is 180 degrees
out of phase with that of the front lobe.
Source: Francis Rumsey and
Tim McCormick (1994)
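As a small sketch of the cosine law for the figure-eight pattern, the snippet below evaluates the ideal response at a few angles and expresses it in dB relative to the on-axis output. The function names and the -120 dB floor used to represent the null are illustrative choices, not part of the lecture.

```python
import math

def figure_eight(angle_deg):
    """Ideal bidirectional response: cosine of the angle of incidence."""
    return math.cos(math.radians(angle_deg))

def level_db(response, floor_db=-120.0):
    """Magnitude in dB relative to the on-axis (0 degree) output."""
    magnitude = abs(response)
    if magnitude < 1e-6:
        return floor_db  # treat a null as the floor instead of minus infinity
    return 20.0 * math.log10(magnitude)

for angle in (0, 45, 90, 135, 180):
    r = figure_eight(angle)
    polarity = "inverted" if r < -1e-9 else "normal"
    print(f"{angle:3d} deg: {level_db(r):7.1f} dB, polarity {polarity}")
```

The negative sign of the cosine behind 90 degrees is what the polar diagram marks as the out-of-phase rear lobe.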
13
Bidirectional pattern (cont.)
 In such a microphone (for example, the traditional ribbon microphone), the
diaphragm operates on the pressure-gradient principle, i.e. it responds to
the difference in pressure between the front and the rear of the
microphone. Therefore, for a sound arriving 90 degrees off axis, the
sound pressure will be of equal magnitude on both sides of the diaphragm,
and hence causes no movement of the diaphragm, giving no output.
 For a sound arriving at the microphone from the front (0 degrees), a phase
difference arises between the front and the rear of the diaphragm, due to
the small additional distance travelled by the wave. The resulting difference
in pressure moves the diaphragm and hence gives an
output (or response).
 For very low frequencies, the phase difference between the front and rear
becomes very small (due to the long wavelengths), and the output
becomes lower.
 The polar response of a bidirectional mic tends to be very uniform at all
frequencies, except for a slight narrowing above approximately 10 kHz.
 In practice, such microphones must be oriented correctly in use.
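The pressure-gradient behaviour can be sketched numerically. This is a simplified model that assumes a plane wave, a speed of sound of 343 m/s, and a hypothetical 2 cm front-to-rear path length; none of these values come from the slides. For a phase difference phi between front and rear, the relative pressure difference is 2*sin(phi/2).

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at about 20 C (assumed)

def gradient_magnitude(freq_hz, path_m, angle_deg=0.0):
    """Relative front-to-rear pressure difference for a plane wave."""
    # The extra distance travelled shrinks with the cosine of the incidence angle.
    effective_path = path_m * math.cos(math.radians(angle_deg))
    phase_difference = 2.0 * math.pi * freq_hz * effective_path / SPEED_OF_SOUND
    return abs(2.0 * math.sin(phase_difference / 2.0))

path = 0.02  # hypothetical 2 cm path between front and rear of the diaphragm
for freq in (50, 500, 5000):
    print(f"{freq:5d} Hz on axis: {gradient_magnitude(freq, path):.3f}")
print(f" 1000 Hz at 90 deg: {gradient_magnitude(1000, path, 90):.3f}")
```

The on-axis value shrinks roughly in proportion to frequency at the low end, and vanishes at 90 degrees for any frequency, matching the two observations above.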
14
Unidirectional (cardioid) pattern
 The unidirectional (also known as cardioid) pattern is described mathematically
as 1+cos(phi), where phi is the angle of incidence of the sound signal.
 An idealised polar diagram of a unidirectional microphone is shown in the
figure below.
Source: Francis Rumsey and
Tim McCormick (1994)
15
Unidirectional pattern (cont.)
 The response of the unidirectional microphone can be regarded as a
combination of the omnidirectional and bidirectional responses, as shown
in the figure below.
 At 0 degrees, both polar responses are of equal amplitude and phase, so when
added together they produce a total output which is twice that of either
separately. At 180 degrees, they cancel each other because they are of opposite
phase.
Source: Francis Rumsey and
Tim McCormick (1994)
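A minimal sketch of this combination, assuming ideal omni and figure-eight components of equal on-axis sensitivity (the function names are illustrative):

```python
import math

def omni(angle_deg):
    """Ideal omnidirectional response: 1 at every angle."""
    return 1.0

def figure_eight(angle_deg):
    """Ideal bidirectional response: cos(phi)."""
    return math.cos(math.radians(angle_deg))

def cardioid(angle_deg):
    """Sum of the two components, i.e. 1 + cos(phi)."""
    return omni(angle_deg) + figure_eight(angle_deg)

for angle in (0, 90, 180):
    print(f"{angle:3d} deg: omni {omni(angle):+.2f}, "
          f"figure-eight {figure_eight(angle):+.2f}, sum {cardioid(angle):+.2f}")
```

At 0 degrees the two components add to twice either one, and at 180 degrees the figure-eight term is -1 and cancels the omni term.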
16
Unidirectional pattern (cont.)
 Such microphones can be obtained by leaving
the diaphragm open at the front, but
introducing various acoustic labyrinths at the
rear which cause sound to reach the back of
the diaphragm in various combinations of
amplitude and phase, resulting in a cardioid
response.
 A typical polar diagram of a unidirectional
microphone at low (LF), middle (MF) and high
frequencies (HF) is shown in the figure on the right.
 The polar response at mid-frequencies is very
good, but tends to degenerate towards omni at
the low frequencies (which are picked up quite
uniformly), and becomes more directional than
is desirable at high frequencies (sounds
arriving from the rear will not be completely
attenuated).
Source: Francis Rumsey and
Tim McCormick (1994)
17
Hypercardioid pattern
 The hypercardioid response is described mathematically as
0.5+cos(phi), where phi is the angle of incidence of the sound signal.
 It can be considered as a combination of an omni response (attenuated
by 6 dB), and a figure-eight response. The shape of the response lies
in between the cardioid and figure-eight patterns, having a relatively
small rear lobe which is out of phase with the front lobe.
 The hypercardioid microphone has the highest direct-to-reverberant
ratio of the patterns, implying that the ratio between the level of on-axis
sound and the level of reflected sounds picked up from other angles is
very high. As a result, it is good for excluding unwanted sounds (such
as room reverberations or unwanted noise).
Demo for polar patterns:
http://www.youtube.com/watch?v=_MMHi8bQVv0
http://www.youtube.com/watch?v=TUHpLqvw9AA
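The numbers behind the hypercardioid description can be checked with a short sketch based on the 0.5+cos(phi) form given above. The variable names are illustrative, and the pattern is treated as ideal and frequency independent.

```python
import math

def hypercardioid(angle_deg):
    """0.5 + cos(phi): an attenuated omni term plus a figure-eight term."""
    return 0.5 + math.cos(math.radians(angle_deg))

# Level of the omni term relative to the figure-eight term.
omni_attenuation_db = 20.0 * math.log10(0.5)
# Angle at which the response passes through zero (cos(phi) = -0.5).
null_angle_deg = math.degrees(math.acos(-0.5))
# Rear lobe level relative to the front lobe.
rear_lobe_db = 20.0 * math.log10(abs(hypercardioid(180)) / hypercardioid(0))

print(f"omni term: {omni_attenuation_db:.2f} dB")
print(f"null at about {null_angle_deg:.0f} degrees")
print(f"rear lobe: {rear_lobe_db:.2f} dB, polarity inverted: {hypercardioid(180) < 0}")
```

The omni term sits about 6 dB below the figure-eight term, and the rear lobe is both smaller than the front lobe and negative in sign, i.e. out of phase.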
18
Examples of microphones:
switchable polar patterns
 Two identical diaphragms are used and placed on each side of a central rigid plate.
Perforations in the central plate give both diaphragms a cardioid response.
 When the polarising voltage of one side is the opposite of the other, the combined
output gives a figure-eight response, as the cardioids are out of phase. When the
polarising voltage of both sides is the same, the combined output gives an
omnidirectional response, as the cardioids are in phase. Intermediate combinations
give cardioid and supercardioid polar responses.
A typical double-diaphragm microphone with switchable polar patterns: AKG C414B-ULS
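A simplified sketch of the double-diaphragm idea: two ideal back-to-back cardioids are summed, with the rear one scaled by a gain between -1 and +1. The `rear_gain` parameter is a hypothetical stand-in for the effect of the rear polarising voltage, not a description of the actual circuit.

```python
import math

def front_cardioid(angle_deg):
    """Cardioid facing forwards: 1 + cos(phi)."""
    return 1.0 + math.cos(math.radians(angle_deg))

def back_cardioid(angle_deg):
    """Identical cardioid facing backwards: 1 - cos(phi)."""
    return 1.0 - math.cos(math.radians(angle_deg))

def combined(angle_deg, rear_gain):
    """rear_gain +1: same polarity; -1: opposite polarity; 0: rear capsule off."""
    return front_cardioid(angle_deg) + rear_gain * back_cardioid(angle_deg)

for name, gain in (("omni", 1.0), ("cardioid", 0.0), ("figure-eight", -1.0)):
    values = ", ".join(f"{combined(a, gain):+.2f}" for a in (0, 90, 180))
    print(f"{name:12s} (rear gain {gain:+.0f}): 0/90/180 deg -> {values}")
```

In-phase addition gives a constant 2 (omni), opposite polarity gives 2*cos(phi) (figure-eight), and intermediate gains give the in-between patterns.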
19
Examples of microphones: stereo
microphones
 Two microphones are built into a single casing where one capsule is
rotatable with respect to the other so that the angle between the two can be
adjusted.
 Each capsule can be switched to give desired polar response, such as a
pair of figure-eight microphones, or a pair of cardioids.
A typical stereo microphone: the Neumann SM69
20
Examples of microphones: stereo
microphones
 The sum-and-difference microphone is another type of stereo microphone, in which the
sum (middle, i.e. (L+R)/2 of the conventional stereo microphone) and
difference (side, i.e. (L-R)/2) signals are combined in a matrix box to produce a
left-right stereo signal.
A typical sum-and-difference stereo microphone: the Shure VP88
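The matrixing step can be sketched in a few lines, using the (L+R)/2 and (L-R)/2 definitions above. The function names and sample values are illustrative only.

```python
def encode_mid_side(left, right):
    """Sum (middle) and difference (side) signals from a left/right pair."""
    return (left + right) / 2.0, (left - right) / 2.0

def decode_mid_side(mid, side):
    """Matrix the middle and side signals back to left and right."""
    return mid + side, mid - side

# Hypothetical sample values for one instant of a stereo signal.
left, right = 0.75, -0.25
mid, side = encode_mid_side(left, right)
print("mid, side:", mid, side)
print("decoded left, right:", decode_mid_side(mid, side))
```

Since M + S = L and M - S = R, the matrix recovers the conventional left-right pair exactly.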
21
Examples of microphones: stereo
microphones
 An example of a sophisticated stereo microphone is the AMS Soundfield
microphone, shown below.
 In this microphone, each channel is fully adjustable from omni through
cardioid to figure-eight, and angles between the capsules are also fully
adjustable. These are controlled electronically by a remotely sited control
unit.
Second generation AMS
First generation AMS
Source: Francis Rumsey and
Tim McCormick (1994)
22
A/D Conversion
 An A/D converter is used to convert the analogue audio signal (a time-varying
electrical voltage, say, the output of a microphone) into a series of
‘samples’, which are ‘snapshots’ of the analogue signal taken at periodic
intervals (known as the sampling period).
 The conversion usually consists of a sampling step and a quantisation step.
23
Sampling
 In this process, measurements (i.e. samples) are taken from the analogue
audio signal (shown on the left sub-plot below) at regular intervals in time.
This is usually achieved by a sample and hold circuit.
 To represent the fine detail of the signal (or to reconstruct the analogue
signal perfectly from the samples), it is necessary to take a large number of
samples per second. As dictated by the Shannon sampling theorem, at
least two samples must be taken per audio cycle (i.e. period). In other
words, the sampling frequency should be at least twice the
frequency of the highest frequency component within the signal.
Sample period: T (in seconds)
Sample frequency: f = 1/T (in Hz)
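A minimal sketch of the sampling step: a sine wave is measured every T = 1/f seconds. The 1 kHz tone, the 8 kHz rate, and the function names are hypothetical choices for illustration.

```python
import math

def min_sample_rate(highest_freq_hz):
    """Lowest sampling frequency allowed by the two-samples-per-cycle rule."""
    return 2.0 * highest_freq_hz

def sample_sine(freq_hz, sample_rate_hz, num_samples):
    """Values of a unit-amplitude sine taken at multiples of the sample period T."""
    period = 1.0 / sample_rate_hz  # T = 1/f
    return [math.sin(2.0 * math.pi * freq_hz * n * period) for n in range(num_samples)]

# Hypothetical example: a 1 kHz tone sampled at 8 kHz (8 samples per cycle).
samples = sample_sine(1000.0, 8000.0, 8)
print([round(s, 3) for s in samples])
print("minimum rate for a 20 kHz audio band:", min_sample_rate(20000.0), "Hz")
```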
24
Aliasing effect due to under sampling
 In subplot (a) of the figure below, enough samples have been taken
and the signal can be perfectly reconstructed from the samples.
 In subplot (b), fewer than two samples per cycle are taken from the
wave. As a result, the signal may be reconstructed as another signal
(denoted by the dashed curve) instead of the signal that was originally
sampled (the solid curve). This is known as the aliasing effect.
Source: Francis Rumsey and
Tim McCormick (1994)
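The effect can be reproduced numerically. The sketch below uses a hypothetical 7 kHz tone sampled at only 10 kHz (values chosen for illustration, not taken from the figure) and shows that its samples are indistinguishable, apart from polarity, from those of a 3 kHz tone.

```python
import math

def alias_frequency(freq_hz, sample_rate_hz):
    """Frequency at which a sampled sine appears after reconstruction."""
    folded = freq_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)

def sample_sine(freq_hz, sample_rate_hz, num_samples):
    """Samples of a unit-amplitude sine wave."""
    return [math.sin(2.0 * math.pi * freq_hz * n / sample_rate_hz)
            for n in range(num_samples)]

# Hypothetical example: a 7 kHz tone sampled at only 10 kHz.
fs, tone = 10_000.0, 7_000.0
alias = alias_frequency(tone, fs)
original = sample_sine(tone, fs, 20)
impostor = sample_sine(alias, fs, 20)
# The two sample sets differ only in polarity, so the samples cannot tell them apart.
identical = all(abs(a + b) < 1e-9 for a, b in zip(original, impostor))
print(f"{tone:.0f} Hz sampled at {fs:.0f} Hz looks like {alias:.0f} Hz: {identical}")
```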
25
Frequency domain interpretation of
sampling
 The sampling process can also be considered as a modulation process,
called pulse amplitude modulation (PAM), in which a series of pulses of
constant amplitude is amplitude modulated by the analogue audio
waveform. In other words, the amplitudes of the pulses are modified by
the instantaneous amplitude of the analogue audio signal.
26
Frequency domain interpretation of
sampling (cont.)
Source: Francis Rumsey and
Tim McCormick (1994)
27
Frequency domain interpretation of
sampling (cont.)
 (a) The unmodulated sample pulses display a typical harmonic series of
components at integer multiples of fs (fs = 30 kHz in this case).
 (b) When a 1 kHz sine wave is sampled at fs = 30 kHz, it generates
sideband components on either side of fs (i.e. 29
= fs-1 and 31 = fs+1 kHz) and of its multiples (i.e. 59 = 2fs-1 and 61 = 2fs+1
kHz).
 (c) When a 17 kHz sine wave is sampled at fs = 30 kHz, it generates
sideband components on either side of fs (i.e. 13
= fs-17 and 47 = fs+17 kHz) and of its multiples (i.e. 43 = 2fs-17 and 77 =
2fs+17 kHz). As the 13 kHz sideband lies within the frequency range of
the baseband (i.e. the spectrum of the original audio signal), it will also be
audible.
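The sideband arithmetic in (b) and (c) can be reproduced with a short sketch. The function names are illustrative, and only the first two multiples of fs are listed, as on the slide.

```python
def sideband_components(signal_khz, fs_khz, max_multiple=2):
    """Lower and upper sidebands around each multiple of the sampling frequency."""
    components = []
    for k in range(1, max_multiple + 1):
        components.append((k * fs_khz - signal_khz, k * fs_khz + signal_khz))
    return components

def aliases_into_baseband(signal_khz, fs_khz):
    """True when the lowest sideband (fs - f) falls below half the sampling frequency."""
    return fs_khz - signal_khz < fs_khz / 2

for tone in (1, 17):
    print(f"{tone} kHz at fs = 30 kHz: sidebands {sideband_components(tone, 30)}, "
          f"aliasing: {aliases_into_baseband(tone, 30)}")
```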
28
Anti-aliasing
 One way to avoid the aliasing effect is to make sure that the sampling frequency
is at least twice the highest frequency in the signal.
 An alternative is to use an anti-aliasing filter to remove the
components of the signal whose frequencies are higher than half of the
sampling frequency (usually called the Nyquist frequency), as shown below.
Source: Francis Rumsey and
Tim McCormick (1994)
Demo for aliasing effects and anti-aliasing:
http://www.youtube.com/watch?v=YB9nALmwSL8
http://www.youtube.com/watch?v=EQ-ovLnVTIM
29
Quantisation
 In the quantisation process, each sample is assigned a value from a range of fixed
possibilities, as shown in an example below, where a scale from 1 to 10 is used for
both positive and negative ranges (i.e. a decimal system). Each sample is
represented by an integer number on this scale, and hence if the amplitude of the
sample obtained from the sampling process is a fraction or decimal, it will be
rounded to the nearest integer number during quantisation.
Source: Francis Rumsey and
Tim McCormick (1994)
The quantised sequence: -3, 1, 5, 7, …, -5, -7, -9
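A minimal sketch of the rounding step on the integer scale described above. The input amplitudes are hypothetical values chosen for illustration; they are not read from the figure.

```python
import math

def quantise(sample):
    """Round a sample to the nearest integer level (exact halves round upwards)."""
    return math.floor(sample + 0.5)

# Hypothetical sampled amplitudes on the -10..10 scale (not taken from the figure).
samples = [-3.2, 1.4, 4.6, 7.1, -4.8, -7.3, -9.0]
quantised = [quantise(s) for s in samples]
print(quantised)
```

`math.floor(sample + 0.5)` is used rather than Python's built-in `round`, which rounds exact halves to the nearest even integer.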
30
Quantisation (cont.)
 The difference between the
sample amplitude represented
by the numbers and the
original amplitude of the
sample is called quantisation
error.
 The maximum quantisation
error will be half of a
quantisation step size, Q.
 In subplot (a), there are
fewer quantisation
steps, and therefore the
quantisation error is bigger
than in subplot (b).
Source: Francis Rumsey and Tim McCormick (1994)
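The Q/2 bound can be checked with a short sketch. The two step sizes and the test grid of sample values are hypothetical; the quantiser is assumed to be uniform and to round to the nearest step.

```python
import math

def quantise_to_step(sample, step):
    """Round a sample to the nearest multiple of the step size Q."""
    return step * math.floor(sample / step + 0.5)

def max_error(samples, step):
    """Largest quantisation error over a set of samples."""
    return max(abs(s - quantise_to_step(s, step)) for s in samples)

# Hypothetical test signal: amplitudes from -1 to 1 in steps of 0.001.
samples = [n / 1000.0 for n in range(-1000, 1001)]
for step in (0.5, 0.125):  # a coarse and a fine quantiser
    print(f"Q = {step}: max error {max_error(samples, step):.4f} (Q/2 = {step / 2})")
```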
31
Quantisation (cont.)
 In digital audio systems, a binary number (instead of decimal) system is
used to quantise the samples, as shown below: (a) a binary number
consists of a number of bits; (b) each bit represents a power of two; (c)
binary numbers can be represented electrically in pulse-code modulation
(PCM) by a string of high and low voltages.
Source: Francis Rumsey and Tim
McCormick (1994)
32
Quantisation (cont.)
 A 4-bit binary quantisation scale: two’s complement. The leftmost bit is the
most significant bit (MSB) which determines whether the number is positive
or negative.
Source: Francis Rumsey and
Tim McCormick (1994)
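A small sketch of the 4-bit two's complement scale. The helper names are illustrative; the encoding itself is the standard two's complement rule.

```python
def to_twos_complement(value, bits=4):
    """Return the two's complement bit string of a signed integer."""
    low, high = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not low <= value <= high:
        raise ValueError(f"{value} does not fit in {bits} bits")
    return format(value & ((1 << bits) - 1), f"0{bits}b")

def from_twos_complement(word):
    """Decode a bit string; a leading 1 (the MSB) marks a negative number."""
    raw = int(word, 2)
    return raw - (1 << len(word)) if word[0] == "1" else raw

for value in (7, 1, 0, -1, -8):
    print(f"{value:3d} -> {to_twos_complement(value)}")
```

With 4 bits the scale runs from -8 (1000) to +7 (0111).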
33
Quantisation (cont.)
 The quantisation error (noise) can be considerably reduced by oversampling (sampling at a
rate whose Nyquist frequency is well above the upper limit of the audio band), which essentially
spreads the quantisation noise over a wide range of frequencies, giving about 3
dB of noise reduction per octave of oversampling (i.e. per doubling of the sampling frequency).
Oversampling is therefore key to improving digital audio quality in both A/D and D/A
converters. ‘Decimation’ is then performed to reduce the sampling rate and increase the
bit depth of the quantised samples obtained at the high sampling rate.
Source: Francis Rumsey and
Tim McCormick (1994)
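The 3 dB per octave figure follows from spreading a fixed noise power over a wider band. A rough sketch, assuming white quantisation noise and no noise shaping (the function name is illustrative):

```python
import math

def noise_reduction_db(oversampling_ratio):
    """In-band quantisation noise reduction, assuming white noise and no noise shaping."""
    if oversampling_ratio < 1:
        raise ValueError("oversampling ratio must be at least 1")
    return 10.0 * math.log10(oversampling_ratio)

for ratio in (1, 2, 4, 8, 64):
    octaves = math.log2(ratio)
    print(f"{ratio:3d}x ({octaves:.0f} octaves): {noise_reduction_db(ratio):5.2f} dB")
```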
34
Quantisation (cont.)
 The dynamic range of the digital audio is limited by the high-level end of
the quantisation scale. Any amplitude of the samples that is out of this
range will be clipped, and, as a result, the signal will be distorted.
Source: Francis Rumsey and
Tim McCormick (1994)
Demos for quantisation noise:
http://www.youtube.com/watch?v=_cRFBBnUFug
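Clipping at the ends of the quantisation scale can be sketched as follows. The 16-bit two's complement word length is an assumed example, not something specified on the slide.

```python
def clip(sample, bits=16):
    """Limit an integer sample to the range of a two's complement word."""
    high = (1 << (bits - 1)) - 1
    low = -(1 << (bits - 1))
    return max(low, min(high, sample))

# Hypothetical sample values, two of which exceed the 16-bit range.
for value in (123, 40000, -40000):
    print(f"{value:7d} -> {clip(value):7d}")
```

Any sample outside the range is flattened to the largest or smallest available value, which is the source of the distortion.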
35
D/A conversion
 The audio sample words are converted back into a staircase-like chain of
electrical levels corresponding to the sample values.
 Resampling is used to reduce the width of the pulses, in order to reduce
the so-called aperture effect (equalisation is required to correct for the
aperture effect).
 Finally, a low-pass smoothing filter is used to reconstruct the audio signal.
Source: Francis Rumsey and
Tim McCormick (1994)
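The aperture effect can be estimated with a short sketch. It assumes ideal rectangular output pulses, whose frequency response follows a sin(x)/x curve, and a 44.1 kHz sampling rate chosen as an example. `pulse_fraction` is a hypothetical parameter giving the pulse width as a fraction of the sample period.

```python
import math

def aperture_loss_db(freq_hz, sample_rate_hz, pulse_fraction=1.0):
    """Roll-off, relative to DC, of rectangular pulses of the given relative width."""
    x = math.pi * freq_hz * pulse_fraction / sample_rate_hz
    if x == 0.0:
        return 0.0
    return 20.0 * math.log10(abs(math.sin(x) / x))

fs = 44100.0
for fraction in (1.0, 0.5, 0.25):
    loss = aperture_loss_db(fs / 2.0, fs, fraction)
    print(f"pulse width {fraction:.2f} T: {loss:.2f} dB at the Nyquist frequency")
```

Narrower pulses lose less at high frequencies, which is why resampling to shorter pulses reduces the amount of equalisation needed.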
36
Earlier digital audio recording formats
 Digital tape recording (a magnetic tape data storage format
introduced by Sony, 1980s)
 Hard-disk recording (a digital magnetic data storage format,
introduced by IBM in 1956, used for audio recording in 1976 by
Sony)
 Compact-disc (CD) recording (an optical disc used originally to
store digital audio data, commercially available in 1982)
 DVD recording (an optical disc storage format, invented by Philips,
Sony, Toshiba and Panasonic in 1995, offering higher storage
capacity than CD while having the same dimensions)
37
Examples of digital audio recorder
 Sony’s PCM-F1, digital tape recording, sampling rate 44.1 kHz.
Source: Francis Rumsey and
Tim McCormick (1994)
38
Examples of digital audio recorder
(cont.)
 Sony’s PCM-1610 digital tape recorder, sampling rate 44.1 kHz.
Source: Francis Rumsey and
Tim McCormick (1994)
39
Examples of digital audio recorder
(cont.)
 A Sony portable DAT digital tape recorder, sampling rates 44.1 and 48 kHz.
Source: Francis Rumsey and
Tim McCormick (1994)
40
Examples of digital audio recorder
(cont.)
 Lynx Digital Audio Recorder (containing the A/D, D/A converters)
Demos:
http://www.youtube.com/watch?v=gwLTr8v01AI
http://www.youtube.com/watch?v=OVauM51sLYw
41
Recent developments in digital
audio recording (since the 2000s)
 Super audio CD (high resolution, optical disc for audio storage)
 DVD-A (a digital format for delivering high-fidelity audio contents on DVD)
 Blu-ray Disc (an optical disc storage medium, a competitor of HD DVD)
 HD DVD (a high-density optical disc format using a blue-violet laser for recording)
 Internet radio webcasting (audio service transmitted over the internet)
 Podcasting (non-streamed webcast, audio downloaded from a web feed (a
remote server) through client software called a podcatcher)
42
Free digital audio recording and
editing software
 Audacity
Audacity is free, open source software for recording and editing sounds. It allows you to
record live audio, convert tapes and records into digital recordings or CDs, and edit Ogg
Vorbis, MP3, WAV or AIFF sound files. You can also cut, copy, split or mix sounds together
with Audacity. Built-in effects are provided to remove static, hiss, hum or other constant
background noises.
 Power Sound Editor
Power Sound Editor is a visual audio editing and recording solution, which supports many
advanced and powerful operations with audio data.
 MP3DirectCut
mp3DirectCut is a fast and extensive audio editor and recorder for compressed mp3. You
can directly cut, copy, paste or change the volume with no need to decompress your files
for audio editing. Using Cue sheets, pause detection or Auto cue you can easily divide long
files.
43
Free digital audio recording and
editing software (cont.)
 Music Editor Free
Music Editor Free (MEF) is a multi-award winning music editor software tool. MEF helps
you to record and edit music and sounds. It lets you make and edit music, voice and other
audio recordings. When editing audio files you can cut, copy and paste parts of recordings
and, if required, add effects like echo, amplification and noise reduction.
 Wavosaur
Wavosaur is a free sound editor, audio editor, wav editor software for editing, processing
and recording sounds, wav and mp3 files. Wavosaur has all the features to edit audio (cut,
copy, paste, etc.) produce music loops, analyze, record, batch convert. Wavosaur supports
VST plugins, ASIO driver, multichannel wav files, real time effect processing. The program
has no installer and doesn’t write in the registry. Use it as a free mp3 editor, for mastering,
sound design.
 Ardour
Ardour is a digital audio workstation. You can use it to record, edit and mix multi-track
audio. You can produce your own CDs, mix video soundtracks, or just experiment with new
ideas about music and sound.
Source: http://www.hongkiat.com/blog/25-free-digital-audio-editors/, which lists more
free audio recording software.
44
Reference
 F. Rumsey and T. McCormick, Sound and Recording: an Introduction, 2nd
Edition, 1994.
45