Transcript (a) (b)

E.M. Bakker
LML Audio Processing and Indexing 2010
Overview
 The Physics of Sound
 The Production of Speech
 The Perception of Speech and Audio
 Frequency Masking
 Noise Masking
 Temporal Masking
 Phonetics and Phonology
 Analog and Digital Signals
The Physics of Sound
Speed of Sound
• Air (at sea level, 20 °C):  343 m/s  (1230 km/h)
• Water (at 20 °C):           1482 m/s (5335 km/h)
• Steel:                      5960 m/s (21460 km/h)
• Tendon:                     1650 m/s
In air: v = (331 + 0.6·T) m/s, with T the temperature in °C.
• The speed of sound is proportional to sqrt(elastic modulus / density).
• There is a second-order dependency on the amplitude of the
sound => nonlinear propagation effects.
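The slide's rule of thumb for air can be checked directly (a minimal sketch; the function name is ours):

```python
# Speed of sound in air from the slide's rule of thumb: v = (331 + 0.6*T) m/s.
def speed_of_sound_air(temp_celsius):
    """Approximate speed of sound in air (m/s) at a given temperature (Celsius)."""
    return 331.0 + 0.6 * temp_celsius

print(speed_of_sound_air(20.0))  # 343.0 m/s, matching the table entry for air at 20 °C
```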
The Production of Speech
Speech Signal
Speech
Recognition
Words
“How are you?”
How is SPEECH produced?
 Characteristics of Acoustic Signal
Human Speech Production
 Physiology
 Schematic and X-ray Sagittal View
 Vocal Cords at Work
 Transduction
 Spectrogram
 Acoustics
 Acoustic Theory
 Wave Propagation
(Picture by Y.Mrabet from Wikipedia)
Sagittal Plane View of
the Human Vocal Apparatus
Characterization of
English Phonemes
English Phonemes
• Bet / Debt / Get: the initial consonants are distinct phonemes.
• Pin / Spin: the two p-sounds are allophones of the phoneme /p/.
The Vowel Space
 We can characterize a
vowel sound by the
locations of the first and
second spectral
resonances, known as
formant frequencies.
 Some voiced sounds,
such as diphthongs, are
transitional sounds that
move from one vowel
location to another.
Phonetics
Formant Frequency Ranges
Vocal Cords
Models for Speech Production
Vocal Tract Workshop
Articulatory Speech Synthesis using TractSyn (P. Birkholz,
2004).
Reading material:
P. Birkholz, D. Jackel, A Three-Dimensional Model of the
Vocal Tract for Speech Synthesis. In Proceedings of the 15th
International Congress of Phonetic Sciences, pp. 2597-2600, Barcelona, Spain, 2003.
See: http://www.vocaltractlab.de/
The Perception of Speech
Speech Signal
Speech
Recognition
Words
“How are you?”
How is SPEECH perceived?
The Perception of Speech
Sound Pressure
 The ear is the most sensitive human
organ. Vibrations on the order of
angstroms are used to transduce
sound. It has the largest dynamic
range (~140 dB) of any organ in
the human body.
 The lower portion of the curve is an
audiogram - hearing sensitivity. It
can vary up to 20 dB across listeners.
 Above 120 dB corresponds to a nice
pop-concert (or standing under a
Boeing 747 when it takes off).
 Typical ambient office noise is about
55 dB.
Intensity level in dB: L = 10·log10(I/I0), where I0 is the
intensity of a 1 kHz tone that is just audible.
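The logarithmic dB scale above can be sketched in a few lines (function name is ours):

```python
import math

def intensity_level_db(intensity, ref_intensity):
    """Intensity level in dB relative to a reference intensity (the hearing threshold)."""
    return 10.0 * math.log10(intensity / ref_intensity)

# A sound 1,000,000 times more intense than the hearing threshold sits at 60 dB,
# roughly the slide's office/restaurant level.
print(intensity_level_db(1e6, 1.0))
```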
dB (SPL)  Source (with distance)
194       Theoretical limit for a sound wave at 1 atmosphere environmental pressure;
          pressure waves with a greater intensity behave as shock waves.
188       Space Shuttle liftoff as heard from the launch tower (less than 100 feet)
          (source: acoustical studies).
180       Krakatoa volcano explosion at 1 mile (1.6 km) in air.
160       M1 Garand being fired at 1 meter (3 ft); Space Shuttle liftoff as heard from
          the launch pad perimeter (approx. 1500 feet) (source: acoustical studies).
150       Jet engine at 30 m (100 ft).
140       Low-calibre rifle being fired at 1 m (3 ft); the engine of a Formula One car
          at 1 meter (3 ft).
130       Threshold of pain; civil defense siren at 100 ft (30 m).
120       Space Shuttle from the three-mile mark, the closest one can view a launch;
          train horn at 1 m (3 ft). Many foghorns produce around this volume.
110       Football stadium during kickoff at the 50-yard line; chainsaw at 1 m (3 ft).
100       Jackhammer at 2 m (7 ft); inside a discothèque.
90        Loud factory; heavy truck at 1 m (3 ft); kitchen blender.
80        Vacuum cleaner at 1 m (3 ft); curbside of a busy street.
70        Busy traffic at 5 m (16 ft).
60        Inside an office or restaurant.
50        Inside a quiet restaurant.
40        Residential area at night.
30        Theatre, no talking.
20        Whispering.
10        Human breathing at 3 m (10 ft).
0         Threshold of human hearing (with healthy ears); sound of a mosquito flying
          3 m (10 ft) away.
The Perception of Speech: The Ear
 Three main sections, outer, middle,
and inner ear.
 The outer and middle ears reproduce
the analog signal (impedance
matching).
The inner ear transduces the pressure
wave into an electrical signal.
 The outer ear consists of the external
visible part and the auditory canal.
The tube is about 2.5 cm long.
 The middle ear consists of the
eardrum and three bones (malleus,
incus, and stapes). It converts the
sound pressure wave to displacement
of the oval window (entrance to the
inner ear).
The Perception of Speech: The Ear
 The inner ear primarily consists of
a fluid-filled tube (cochlea) which
contains the basilar membrane.
Fluid movement along the basilar
membrane displaces hair cells,
which generate electrical signals.
 There is a discrete number of hair
cells (~30,000). Each hair cell is
tuned to a different frequency.
 Place vs. Temporal Theory: firings
of hair cells are processed by two
types of neurons (onset chopper
units for temporal features and
transient chopper units for spectral
features).
Perception
Psychoacoustics
 Psychoacoustics: the branch of science
dealing with hearing and the sensations
produced by sounds.
 A basic distinction must be made
between the perceptual attributes of
a sound vs measurable physical
quantities:
 Many physical quantities are
perceived on a logarithmic scale (e.g.
loudness). Our perception is often a
nonlinear function of the absolute
value of the physical quantity being
measured (e.g. equal loudness).
 Timbre can be used to describe why
musical instruments sound different.
 What factors contribute to speaker
identity?
Physical Quantity                      Perceptual Quality
Intensity                              Loudness
Fundamental Frequency                  Pitch
Spectral Shape                         Timbre
Onset/Offset Time                      Timing
Phase Difference (Binaural Hearing)    Location
Perception
Equal Loudness
 Just Noticeable Difference
(JND): The acoustic value
at which 75% of responses
judge stimuli to be
different (limen)
 The perceptual loudness
of a sound is specified via
its relative intensity above
the threshold. A sound's
loudness is often defined
by how intense a
reference 1 kHz tone must
be to sound equally loud.
Perception, Non-Linear Frequency Warping:
Bark and Mel Scale
 Critical Bandwidths: correspond to approximately 1.5 mm
spacings along the basilar membrane, suggesting a set of 24
bandpass filters.
 Critical Band: can be related to a bandpass filter whose
frequency response corresponds to the tuning curves of
auditory neurons. A frequency range over which two
sounds will sound like they are fusing into one.
 Bark Scale:
 Mel Scale:
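The Bark and Mel formulas on the original slide were images lost in this transcript. As a stand-in, the commonly used approximations (O'Shaughnessy's mel curve, Zwicker's Bark curve) can be sketched; these are our assumptions, not necessarily the exact formulas the slide showed:

```python
import math

def hz_to_mel(f):
    # Common mel-scale approximation (O'Shaughnessy): mel = 2595 * log10(1 + f/700)
    return 2595.0 * math.log10(1.0 + f / 700.0)

def hz_to_bark(f):
    # Common Bark-scale approximation (Zwicker):
    # z = 13 * arctan(0.00076*f) + 3.5 * arctan((f/7500)^2)
    return 13.0 * math.atan(0.00076 * f) + 3.5 * math.atan((f / 7500.0) ** 2)

# 1 kHz lands near 1000 mel and roughly Bark band 8.5 under these approximations
print(hz_to_mel(1000.0), hz_to_bark(1000.0))
```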
Perception
Bark and Mel Scale
 The Bark scale
implies a nonlinear
frequency mapping
Perception
Bark and Mel Scale
 Filter Banks used in ASR:
 The Bark scale implies a
nonlinear frequency
mapping
Comparison of
Bark and Mel Space Scales
Perception: Tone-Masking Noise
• Frequency masking: one sound cannot be perceived if another
sound close in frequency has a high enough level. The first
sound masks the second.
• Tone-masking noise: noise with energy EN (dB) at Bark
frequency g masks a tone at Bark frequency b if the tone's energy
is below the threshold:
    TT(b) = EN - 6.025 - 0.275·g + Sm(b - g)   (dB SPL)
where the spread-of-masking function Sm(b) is given by:
    Sm(b) = 15.81 + 7.5·(b + 0.474) - 17.5·sqrt(1 + (b + 0.474)^2)   (dB)
• Temporal masking: onsets of sounds are masked in the time
domain through a similar masking process.
• Thresholds are frequency and energy dependent.
• Thresholds depend on the nature of the sound as well.
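The masking-threshold formulas above translate directly into code (a sketch; the function names are ours):

```python
import math

def spread_of_masking(db):
    """Sm(b) from the slide, with db = b - g the Bark-frequency distance in Barks."""
    return 15.81 + 7.5 * (db + 0.474) - 17.5 * math.sqrt(1.0 + (db + 0.474) ** 2)

def tone_masking_threshold(noise_db, g, b):
    """TT(b): maximum tone energy (dB SPL) at Bark frequency b that is still
    masked by noise of energy noise_db (dB) at Bark frequency g."""
    return noise_db - 6.025 - 0.275 * g + spread_of_masking(b - g)

# 60 dB of noise at Bark frequency 5 masks a tone at the same frequency
# up to roughly 52.6 dB SPL
print(tone_masking_threshold(60.0, 5.0, 5.0))
```

Note that Sm(0) is essentially 0 dB, so at zero Bark distance the threshold is set by the EN - 6.025 - 0.275·g term alone.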
Perception: Noise-Masking Tone
 Noise-masking tone: a tone at Bark frequency g with energy ET (dB)
masks noise at Bark frequency b if the noise energy is below the
threshold:
TN(b) = ET - 2.025 - 0.17·g + Sm(b - g)   (dB SPL)
 Masking thresholds are commonly referred to as Bark scale functions of
just noticeable differences (JND).
 Thresholds are not symmetric.
 Thresholds depend on the nature of the noise and the sound.
Masking
Perceptual Noise Weighting
 Noise-weighting: shaping the
spectrum to hide noise
introduced by imperfect analysis
and modeling techniques
(essential in speech coding).
 Humans are sensitive to noise
introduced in low-energy areas
of the spectrum.
 Humans tolerate more additive
noise when it falls under high
energy areas of the spectrum.
The amount of noise tolerated is
greater if it is spectrally shaped to
match perception.
 We can simulate this phenomenon
using "bandwidth-broadening".
Perceptual Noise Weighting
Simple Z-Transform interpretation:
• Noise weighting can be implemented
by evaluating the Z-Transform
around a contour closer to the
origin in the z-plane:
Hnw(z) = H(az).
 Used in many speech
compression systems (Code
Excited Linear Prediction).
 Analysis performed on
bandwidth-broadened speech;
synthesis performed using
normal speech. Effectively
shapes noise to fall under the
formants.
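One common way to realize this contour scaling for an all-pole LPC filter (as in CELP-style perceptual weighting) is to scale the k-th LPC coefficient by gamma^k, which moves every pole radius r to gamma·r and so broadens the formant bandwidths. A minimal sketch with a hypothetical LPC polynomial:

```python
import numpy as np

def bandwidth_broaden(lpc, gamma):
    """Replace A(z) by A(z/gamma): scale the k-th coefficient by gamma**k."""
    return np.array([c * gamma**k for k, c in enumerate(lpc)])

# Hypothetical LPC polynomial with a double pole at z = 0.9:
# A(z) = (1 - 0.9 z^-1)^2 = 1 - 1.8 z^-1 + 0.81 z^-2
a = np.array([1.0, -1.8, 0.81])
b = bandwidth_broaden(a, 0.9)
print(np.abs(np.roots(b)))  # pole radii are now 0.9 * 0.9 = 0.81
```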
Perception: Echo and Delay
• Humans are used to hearing their own voice while they speak:
real-time feedback (side tone).
• When we place headphones over our ears, which dampens
this feedback, we tend to speak louder.
• Lombard Effect: humans speak louder in the presence of
ambient noise.
• When this side tone is delayed, it interrupts our cognitive
processes and degrades our speech.
• This effect begins at delays of approximately 250 ms.
• Modern telephony systems have been designed to maintain
delays lower than this value (e.g., long-distance phone calls
routed over satellites).
• Digital speech processing systems can introduce large
amounts of delay due to non-real-time processing.
Perception: Adaptation
 Adaptation refers to changing sensitivity in response to a continued
stimulus, and is likely a feature of the mechano-electrical
transformation in the cochlea.
 Neurons tuned to a frequency where energy is present do not change
their firing rate drastically for the next sound.
 Additive broadband noise does not significantly change the firing rate
for a neuron in the region of a formant.
Visual Adaptation
 The McGurk Effect is an auditory illusion which results from
combining a face pronouncing a certain syllable with the sound of a
different syllable. The illusion is stronger for some combinations than
for others. For example, an auditory 'ba' combined with a visual 'ga' is
perceived by some percentage of people as 'da'. A larger proportion will
perceive an auditory 'ma' with a visual 'ka' as 'na'. Some researchers
have measured evoked electrical signals matching the "perceived"
sound.
Perception: Timing
• Temporal resolution of the ear is crucial.
• Two clicks are perceived monaurally as one unless they
are separated by at least 2 ms.
• 17 ms of separation is required before we can reliably
determine the order of the clicks (~58 bps or ~3530 bpm).
• Sounds with onsets faster than 20 ms are perceived as
"plucks" rather than "bows".
• Short sounds near the threshold of hearing must exceed a
certain intensity-time product to be perceived.
• Humans do not perceive individual "phonemes" in fluent
speech - they are simply too short. We somehow integrate
the effect over intervals of approximately 100 ms.
• Humans are very sensitive to long-term periodicity (ultra-
low frequency) - this has implications for random noise
generation.
Analog and Digital Signals
1. From Analog to Digital Signal
2. Sampling & Aliasing
Analog and Digital Signals
Analog Signals
Continuous function F of a continuous variable t (t can be time, space, etc.): F(t)
Digital Signals
Discrete function Fk of a discrete (sampling) variable tk, with k an
integer: Fk = F(tk).
Uniform (periodic) sampling with sampling frequency fS = 1/tS,
e.g., tS = 0.001 sec => fS = 1000 Hz.
Digital System Implementation
Analog Input -> Antialiasing Filter -> A/D Conversion -> Digital Processing -> Digital Output
Important issues:
• Analysis bandwidth, dynamic range
• Pass/stop bands
• Sampling rate, number of bits, and further parameters
• Digital format
Sampling
How fast must we sample a continuous signal to preserve its information content?
Example: turning wheels in a movie
 25 frames per second, i.e., 25 samples/sec.
• Train starts => wheels appear to go clockwise.
• Train accelerates => wheels appear to go counterclockwise.
Frequency misidentification due to a too-low sampling frequency.
Sampling: independent variable (for example time) continuous -> discrete
Quantisation: dependent variable (for example voltage) continuous -> discrete.
Here we’ll talk about uniform sampling.
Sampling
[Figure: s(t) = sin(2π f0 t) with f0 = 1 Hz, sampled at fS = 3 Hz; the aliases
s1(t) = sin(8π t) and s2(t) = sin(14π t) pass through the same sample points.]
s(t) sampled at fS represents exactly all sine waves sk(t) defined by:
    sk(t) = sin( 2π (f0 + k·fS) t ),  k ∈ N
The sampling theorem
Theorem
A signal s(t) with maximum frequency fMAX can be
recovered if sampled at frequency fS > 2 fMAX.*
* Multiple proposers: Whittaker(s), Nyquist, Shannon, Kotel'nikov.
Nyquist frequency (rate): fN = 2 fMAX
Example
    s(t) = 3·cos(50π t) + 10·sin(300π t) - cos(100π t)
with terms F1, F2, F3: F1 = 25 Hz, F2 = 150 Hz, F3 = 50 Hz.
Condition on fS? fMAX = F2 = 150 Hz, so fS > 300 Hz.
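The aliasing behind the theorem can be demonstrated numerically: with f0 = 1 Hz sampled at fS = 3 Hz, the samples of sin(2π·f0·t) and of its alias sin(2π·(f0 + fS)·t) coincide exactly (a sketch):

```python
import math

# f0 = 1 Hz sampled at fS = 3 Hz: the alias at f0 + fS = 4 Hz is indistinguishable.
fs, f0 = 3.0, 1.0
times = [n / fs for n in range(12)]
original = [math.sin(2 * math.pi * f0 * t) for t in times]
alias = [math.sin(2 * math.pi * (f0 + fs) * t) for t in times]
print(max(abs(a - b) for a, b in zip(original, alias)))  # 0 up to rounding error
```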
Frequency Domain
 Time and Frequency are two complementary signal
descriptions. Signal can be seen as projected onto the time
domain or frequency domain.
 Bandwidth indicates the rate of change of a signal: High
bandwidth => signal changes fast
 Passband Bandwidth => lower and upper cutoff
frequencies
Example
Ear + brain act as a frequency analyser: the audio spectrum is
split into many narrow bands, and low-power sounds are
detected out of a loud background.
Sampling Low-Pass Signals
(a) Band-limited signal: continuous spectrum with frequencies in [-B, B] (fMAX = B).
(b) Time sampling => frequency repetition: discrete spectrum. If fS > 2B,
there is no aliasing.
(c) If fS < 2B, the spectral replicas overlap: aliasing & corruption.
Aliasing: signal ambiguity in the frequency domain.
Note: s(t) at fS represents all sine waves sk(t) defined by:
    sk(t) = sin( 2π (f0 + k·fS) t ),  k ∈ N
Antialiasing Filter
(a), (b) Out-of-band noise can alias into the band of interest [-B, B]:
filter it before sampling!
(c) Antialiasing filter: its passband depends on the bandwidth of interest.
Under-sampling
Bandpass signal centered on fC: use spectral replications to reduce the
sampling-frequency fS requirements:
    (2·fC + B) / (m + 1)  <=  fS  <=  (2·fC - B) / m
with m ∈ N selected so that fS > 2B.
Example
fC = 20 MHz, B = 5 MHz.
Without under-sampling: fS > 40 MHz.
With under-sampling: fS = 22.5 MHz (m=1); 17.5 MHz (m=2); 11.66 MHz (m=3).
Advantages
• Slower ADCs / electronics needed.
• Simpler antialiasing filters.
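The valid fS ranges from the inequality above can be enumerated in a few lines (a sketch; the helper name is ours). For the slide's example it reproduces the 22.5 MHz, 17.5 MHz and 11.66 MHz choices as range endpoints:

```python
def bandpass_sampling_ranges(fc, b, m_max=4):
    """Valid ranges (2*fc + b)/(m+1) <= fS <= (2*fc - b)/m for m = 1..m_max,
    keeping only ranges that also satisfy fS > 2B."""
    ranges = []
    for m in range(1, m_max + 1):
        lo = (2 * fc + b) / (m + 1)
        hi = (2 * fc - b) / m
        if lo <= hi and lo > 2 * b:
            ranges.append((m, lo, hi))
    return ranges

for m, lo, hi in bandpass_sampling_ranges(20e6, 5e6):
    print(m, lo / 1e6, hi / 1e6)  # m=1: [22.5, 35], m=2: [15, 17.5], m=3: [11.25, 11.67] MHz
```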
Over-sampling
Oversampling: sampling at frequencies fS >> 2 fMAX.
Over-sampling & averaging may improve ADC resolution:
    fOS = 4^w · fS
where fOS = over-sampling frequency and w = number of additional bits required.
Each additional bit requires over-sampling by a factor of four.
(Some) ADC parameters
1. Number of bits N (~resolution)
2. Data throughput (~speed)
3. Signal-to-noise ratio (SNR)
4. Signal-to-noise-&-distortion ratio (SINAD)
5. Effective Number of Bits (ENOB)
6. ...
Different applications have different needs: radar systems, static
distortion, communication, imaging / video.
NB: Definitions may be slightly manufacturer-dependent!
ADC - Number of bits N
A continuous input signal is digitized into 2^N levels
(uniform, bipolar transfer function).
Quantisation step:
    q = Vmax / 2^N
Example: Vmax = 1 V, N = 12  =>  q = 244.1 µV.
The step q can also be expressed as a scale factor (1 / 2^N) or as a
percentage (100 / 2^N) of the full-scale range VFSR.
The quantisation error lies within [-q/2, +q/2].
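The quantisation-step formula reproduces the slide's example directly (function name is ours):

```python
def quantisation_step(vmax, nbits):
    """q = Vmax / 2^N."""
    return vmax / 2 ** nbits

# The slide's example: Vmax = 1 V, N = 12 bits
print(quantisation_step(1.0, 12) * 1e6)  # 244.140625 µV, i.e. the slide's 244.1 µV
```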
ADC - Quantisation error
[Figure: sampled voltage and quantisation error |eq| over the sampling times tk,
for N = 12, VFS = 1 V.]
• Quantisation error eq lies in [-0.5·q, +0.5·q].
• eq limits the ability to resolve small signals.
• Higher resolution means lower eq.
SNR of ideal ADC
    SNRideal = 20·log10( RMS(input) / RMS(eq) )   (1)
Also called SQNR (signal-to-quantisation-noise ratio). (RMS = root mean square.)
Assumptions:
• Ideal ADC: only quantisation error eq (p(e) constant, no stuck bits, ...).
• eq uncorrelated with the signal.
• ADC performance constant in time.
• Sampling frequency fS = 2 fMAX.
For a full-scale sine input, input(t) = ½·VFSR·sin(ωt):
    RMS(input) = sqrt( (1/T) ∫0..T (½·VFSR·sin(ωt))² dt ) = VFSR / (2·sqrt(2))
With a uniform quantisation-error probability density p(eq) = 1/q on [-q/2, q/2]:
    RMS(eq) = sqrt( ∫-q/2..q/2 eq²·p(eq) deq ) = q / sqrt(12) = VFSR / (2^N·sqrt(12))
SNR of ideal ADC
Substituting in (1) =>
    SNRideal = 6.02·N + 1.76  [dB]   (2)
One additional bit => SNR increased by 6 dB.
Real SNR is lower because:
- Real signals have noise.
- Forcing the input to full scale is unwise.
- Real ADCs have additional noise (aperture jitter, non-linearities, etc.).
Actually (2) needs a correction factor depending on the ratio between the sampling
frequency & the Nyquist frequency: processing gain due to oversampling.
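The rule of thumb (2) follows from (1): the VFSR factors cancel, leaving 20·log10(2^N·sqrt(12)/(2·sqrt(2))). A quick numerical cross-check (function names are ours):

```python
import math

def snr_ideal_db(nbits):
    """Rule of thumb for an ideal ADC: 6.02*N + 1.76 dB."""
    return 6.02 * nbits + 1.76

def snr_from_definition(nbits):
    """20*log10( (VFSR/(2*sqrt(2))) / (VFSR/(2^N*sqrt(12))) ); VFSR cancels."""
    return 20.0 * math.log10(2 ** nbits * math.sqrt(12) / (2 * math.sqrt(2)))

print(snr_ideal_db(12), snr_from_definition(12))  # both ≈ 74 dB for a 12-bit ADC
```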
ADC selection dilemma
Speed & resolution: a tradeoff (Effective Number of Bits, ENOB).
High resolution (bit #):
- Higher cost & dissipation.
- Tailored to the DSP word width.
High speed:
- Large amount of data to store/analyse.
- Lower accuracy & input impedance.
The DIFFICULT area moves down & right every year.
Rule of thumb: 1 bit improvement every 3 years.
Example: sound cards.
Oversampling & averaging (see (2)) and dithering (= adding small random
noise before quantisation) may increase SNR.
Finite word-length effects
Overflow: arises when an arithmetic operation result has one too
many bits to be represented in a certain format.
    Dynamic range (dB) = 20·log10( largest value / smallest value )
Fixed point: ~180 dB; floating point: ~1500 dB.
High dynamic range => wide data set representation with no overflow.
Note: Different applications have different needs. For example:
• Telecommunication: 50 dB
• HiFi audio: 90 dB.
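The dynamic-range formula is easy to evaluate for a concrete word length (a sketch; 16-bit fixed point is our example, not the slide's):

```python
import math

def dynamic_range_db(largest, smallest):
    """Dynamic range in dB: 20*log10(largest/smallest)."""
    return 20.0 * math.log10(largest / smallest)

# A 16-bit fixed-point format spans 2^16 levels
print(dynamic_range_db(2 ** 16, 1))  # ≈ 96.3 dB, comfortably above HiFi's 90 dB need
```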
Erwin M. Bakker
Overview
 Fourier Transforms
 DFT Topics
 Material slightly adapted from lectures by
 Dr M.E. Angoletta at DISP2003,
 a DSP course given by CERN and University of
Lausanne (UNIL)
http://humanresources.web.cern.ch/humanresources/external/training/tech/special/DISP2003.asp
Fourier Transforms
• Frequency analysis: a powerful tool
• A tour of Fourier Transforms
• Continuous Fourier Series (FS)
• Discrete Fourier Series (DFS)
Frequency Analysis
• Fast & efficient insight into a signal's building blocks.
• Simplifies the original problem - e.g., solving Partial Differential Equations (PDEs).
• Powerful & complementary to time-domain analysis techniques.
• Several transforms: Fourier, Discrete Cosine, Laplace, z, Wavelet, etc.
General Transform as a problem-solving tool:
    time domain s(t)  --analysis-->   frequency domain S(f) = F[s(t)]
    S(f)              --synthesis-->  s(t)
s(t), S(f): Transform Pair
Bases of Vector Spaces
    v = x·e1 + y·e2 + z·e3
    f(t) = k1·e^(it) + k2·e^(2it) + k3·e^(3it)
v is a linear combination of the basis vectors ei (i = 1, 2, 3);
f is a linear combination of the basis functions e^(it), e^(2it), e^(3it).
Fourier Coefficients
Let <.,.> be an inner product for our vector space V. Then the Fourier
coefficient of v in V with respect to a (basis) vector w is:
    c = <v, w> / <w, w>
c·w is the component of v along the direction of w
(v = c·w + (v - c·w), with v - c·w orthogonal to w).
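The projection formula works in any inner-product space; a two-line numeric sketch (names are ours):

```python
def inner(v, w):
    return sum(a * b for a, b in zip(v, w))

def fourier_coefficient(v, w):
    """c = <v, w> / <w, w>: the component of v along the direction of w."""
    return inner(v, w) / inner(w, w)

v, w = [2.0, 3.0, 0.0], [1.0, 0.0, 0.0]
c = fourier_coefficient(v, w)
residual = [a - c * b for a, b in zip(v, w)]
print(c, inner(residual, w))  # c = 2.0 and v - c*w is orthogonal to w
```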
Fourier Analysis - Tools

Input time signal                 Transform    Frequency spectrum
Continuous, periodic (period T)   FS           discrete coefficients:
                                               ck = (1/T) ∫0..T s(t)·e^(-j k ω t) dt
Continuous, aperiodic             FT           continuous:
                                               S(f) = ∫-∞..+∞ s(t)·e^(-j 2π f t) dt
Discrete, periodic (period T)     DFS/DFT **   discrete, periodic:
                                               c~k = (1/N) Σn=0..N-1 s[n]·e^(-j 2π k n / N)
Discrete, aperiodic               DTFT         continuous, periodic:
                                               S(f) = Σn=-∞..+∞ s[n]·e^(-j 2π f n)

Note: j² = -1, ω = 2π/T, s[n] = s(tn), N = number of samples.
** Calculated using the FFT.
History Fourier Transform
• 1669: Newton stumbles upon light spectra (specter = ghost) but fails to
recognise the "frequency" concept (corpuscular theory of light, no waves).
• 18th century: two outstanding problems
  • celestial bodies' orbits: Lagrange, Euler & Clairaut approximate observation data
with linear combinations of periodic functions; Clairaut, 1754(!), first DFT formula.
  • vibrating strings: Euler describes vibrating-string motion by sinusoids (wave
equation). BUT peers' consensus is that a sum of sinusoids only represents smooth
curves. Big blow to the utility of such sums for all but Fourier ...
• 1807: Fourier presents his work on heat conduction => Fourier analysis born.
Diffusion equation => series (infinite) of sines & cosines. Strong criticism by peers
blocks publication. Work published in 1822 ("Théorie analytique de la chaleur").
History Fourier Transform - 2
• 19th/20th century: two paths for Fourier analysis - continuous & discrete.
CONTINUOUS
• Fourier extends the analysis to arbitrary functions (Fourier Transform).
• Dirichlet, Poisson, Riemann, Lebesgue address FS convergence.
• Other FT variants born from varied needs (e.g., Short-Time FT - speech analysis).
DISCRETE: fast calculation methods (FFT)
• 1805 - Gauss, first usage of FFT (manuscript in Latin went unnoticed!!!
Published 1866).
• 1965 - IBM's Cooley & Tukey "rediscover" the FFT algorithm ("An algorithm for
the machine calculation of complex Fourier series").
• Other DFT variants for different applications (e.g., Warped DFT - filter design &
signal compression).
• FFT algorithm refined & modified for most computer platforms.
Another Space, Another Base
F(t) = sin(2π.t)
F(t) = sin(2π.2t)
F(t) = sin(2π.3t)
F(t) = cos(2π.t)
F(t) = cos(2π.2t)
F(t) = cos(2π.3t)
{ cos(2π.kt), sin(2π.kt) }k forms an orthogonal basis
Linear Combination of Functions
F(t) = sin(2π.t)
F(t) = sin(2π.2t)
F(t) = sin(2π.3t)
F(t) = sin(2π.t) + sin(2π.2t)
F(t) = sin(2π.t) + sin(2π.2t) + sin(2π.3t)
F(t) = sin(2π.t) + sin(2π.2t) + sin(2π.4t)
F(t) = sin(2π.t) + sin(2π.2t) + sin(2π.30t)
Linear Combination of Functions
Phase Shift:
F(t) = sin(2π.t)
F(t) = cos(2π.t) – 2.sin(2π.2t)
Low Band Pass Filters
F(t) = sin(2π.t) + sin(2π.2t) +      <- low frequencies
       sin(2π.30t) + sin(2π.120t)    <- high frequencies
After low-pass filtering: F(t) = sin(2π.t) + sin(2π.2t)
Fourier Transforms
The values h of some process can be described in the time domain:
• h(t): values as a function of time t (-∞ < t < ∞)
The same process can be described as amplitudes and phases (complex values) H
in the frequency domain:
• H(f): values as a function of frequency f (-∞ < f < ∞)
One can transform the time-domain representation into the frequency-domain
representation by using the Fourier Transform equation:
    H(f) = ∫-∞..+∞ h(t)·e^(2π i f t) dt
And back, using the inverse FT equation:
    h(t) = ∫-∞..+∞ H(f)·e^(-2π i f t) df
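A discrete stand-in for this transform pair is the DFT/inverse-DFT round trip, which recovers the original samples exactly (a sketch using NumPy's FFT):

```python
import numpy as np

# h -> H (analysis) -> h (synthesis): the round trip reproduces the samples.
h = np.sin(2 * np.pi * 3 * np.arange(64) / 64)
H = np.fft.fft(h)        # analysis: time -> frequency
h_back = np.fft.ifft(H)  # synthesis: frequency -> time
print(np.allclose(h, h_back.real))  # True
```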
Fourier Series (FS)
A periodic function s(t) satisfying Dirichlet's conditions* can be expressed
as a Fourier series, with harmonically related sine/cosine terms:
    s(t) = a0 + Σk=1..∞ [ ak·cos(k ω t) - bk·sin(k ω t) ]
(valid for all t except at discontinuities)
a0, ak, bk: Fourier coefficients; k: harmonic number; T: period; ω = 2π/T.
    a0 = (1/T) ∫0..T s(t) dt   (signal average over a period, i.e. DC term &
                                zero-frequency component)
    ak = (2/T) ∫0..T s(t)·cos(k ω t) dt
    -bk = (2/T) ∫0..T s(t)·sin(k ω t) dt
Note: {cos(kωt), sin(kωt)}k form an orthogonal base of the function space.
* see next slide
Fourier Series Convergence
Dirichlet conditions - in any period:
(a) s(t) piecewise-continuous;
(b) s(t) piecewise-monotonic;
(c) s(t) absolutely integrable:  ∫0..T |s(t)| dt < ∞
Example: the square wave satisfies (a), (b) and (c).
Rate of convergence: if s(t) is discontinuous, then |ak| < M/k for large k (M > 0).
Fourier Series Analysis - 1
FS of an odd* function: the square wave with period T = 2π (ω = 1),
equal to +1 on (0, π) and -1 on (π, 2π).
    a0 = (1/2π)·[ ∫0..π 1 dt + ∫π..2π (-1) dt ] = 0           (zero average)
    ak = (1/π)·[ ∫0..π cos kt dt - ∫π..2π cos kt dt ] = 0     (odd function)
    -bk = (1/π)·[ ∫0..π sin kt dt - ∫π..2π sin kt dt ] = ... = (2/(kπ))·(1 - cos kπ)
        = 4/(kπ)   for k odd
        = 0        for k even
    sw(t) = (4/π)·sin t + (4/(3π))·sin 3t + (4/(5π))·sin 5t + ...
* Even & odd functions: even: s(-x) = s(x); odd: s(-x) = -s(x).
Fourier Series Analysis - 2
Fourier spectrum representations:
    s(t) = Σk≥0 vk(t)
Rectangular form:  vk = ak·cos(k ω t) - bk·sin(k ω t)
Polar form:        vk = rk·cos(k ω t + θk),  zk = (rk, θk)
    rk = sqrt(ak² + bk²)   (amplitude)
    θk = arctan(bk/ak)     (phase)
Fourier spectrum of the square wave: amplitudes rk = 4/π, 4/3π, ... at the odd
harmonics f1, 3f1, 5f1, ... (fk = kω/2π); phases θk = -π/2 at the odd harmonics.
Fourier Series Synthesis
Square wave reconstruction from spectral terms:
    swK(t) = Σk=1..K -bk·sin(kt),   shown for K = 1, 3, 5, 7, 9, 11.
Convergence may be slow (~1/k) - ideally, infinitely many terms are needed.
Practically, the series is truncated when the remainder falls below the
computer tolerance (=> error). BUT ... Gibbs' phenomenon.
Gibbs Phenomenon
    sw79(t) = Σk=1..79 -bk·sin(kt)
An overshoot exists at each discontinuity.
• First observed by Michelson, 1898. Explained by Gibbs.
• Max peak-to-peak overshoot = 8.95% of the discontinuity magnitude.
Just a minor annoyance.
• The FS converges to (-1+1)/2 = 0 at the discontinuities, in this case.
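The overshoot can be measured numerically from the partial sum above: the square wave jumps by 2, so an 8.95% overshoot means the partial sum peaks near 1 + 0.179 ≈ 1.18 (a sketch):

```python
import numpy as np

# Partial Fourier sum of the unit square wave: sum of (4/(k*pi))*sin(k*t)
# over odd k <= 79, evaluated on a fine grid over half a period.
t = np.linspace(1e-4, np.pi - 1e-4, 200_000)
sw = sum(4.0 / (k * np.pi) * np.sin(k * t) for k in range(1, 80, 2))
print(sw.max())  # ≈ 1.18, i.e. ~9% of the jump of 2 above the true value 1
```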
Fourier Series Time Shifting
FS of an even function: the π/2-advanced square wave.
    a0 = 0   (zero average)
    ak = 4/(kπ)    for k odd, k = 1, 5, 9, ...
       = -4/(kπ)   for k odd, k = 3, 7, 11, ...
       = 0         for k even
    -bk = 0   (even function)
Note: the amplitudes rk are unchanged BUT the phases advance by kπ/2.
Complex Fourier Series
Euler's notation ("phasor"): z = r·e^(jθ), with
    e^(-jωt) = (e^(jωt))* = cos(ωt) - j·sin(ωt)
    cos(ωt) = (e^(jωt) + e^(-jωt)) / 2,   sin(ωt) = (e^(jωt) - e^(-jωt)) / (2j)
Complex form of the FS (Laplace, 1782). Harmonics ck are separated by
Δf = 1/T on the frequency plot:
    ck = (1/T) ∫0..T s(t)·e^(-j k ω t) dt
    s(t) = Σk=-∞..+∞ ck·e^(j k ω t)
Note: c-k = (ck)*
Link to the real FS coefficients:
    c0 = a0,   ck = ½·(ak + j·bk),   c-k = ½·(ak - j·bk)
    r = sqrt(a² + b²),   θ = arctan(b/a)
Fourier Series Properties
Property             Time                              Frequency
Homogeneity          a·s(t)                            a·S(k)
Additivity           s(t) + u(t)                       S(k) + U(k)
Linearity            a·s(t) + b·u(t)                   a·S(k) + b·U(k)
Time reversal        s(-t)                             S(-k)
Multiplication *     s(t)·u(t)                         Σm=-∞..+∞ S(k-m)·U(m)
Convolution *        (1/T) ∫0..T s(t-τ)·u(τ) dτ        S(k)·U(k)
Time shifting        s(t - Δt)                         e^(-j 2π k Δt / T)·S(k)
Frequency shifting   e^(j 2π m t / T)·s(t)             S(k - m)
Fourier Series – 'Oddities'
Orthonormal base
The Fourier components {uk} form an orthonormal base of the signal space:
    uk = (1/T)·e^(j k ω t)   (|k| = 0, 1, 2, ... +∞)
    <uk, um> = ∫0..T uk·um* dt = δk,m   (1 if k = m, 0 otherwise)
(Remember: (e^(jt))* = e^(-jt).)
Then ck = (1/T)·<s(t), uk>, i.e. (1/T) times the projection of the signal s(t)
on the component uk.
Negative frequencies & time reversal
k = -∞, ..., -2, -1, 0, 1, 2, ..., +∞;  ωk = kω;  θk = ωk·t; the phasor turns
anti-clockwise. Negative k => the phasor turns clockwise (negative phase θk),
equivalent to negative time t => time reversal.
Fourier Series - Power
Average power W:
    W = (1/T) ∫0..T |s(t)|² dt = <s(t), s(t)>
Parseval's Theorem:
    W = Σk=-∞..+∞ |ck|² = a0² + ½·Σk=1..∞ (ak² + bk²)
Example: pulse train of height sMAX, duty cycle δ = 2·Δt/T
    bk = 0,  a0 = δ·sMAX,  ak = 2·δ·sMAX·sync(kδ),   with sync(u) = sin(πu)/(πu)
    W0 = (δ·sMAX)²,  Wk = 2·W0·sync²(kδ)
• FS convergence ~1/k => the lower-frequency terms Wk = |ck|² carry most power.
• Wk vs. ωk: power density spectrum.
    W = W0·( 1 + Σk≥1 Wk/W0 )
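Parseval's theorem can be checked numerically for the square wave analysed earlier (a0 = 0, ak = 0, -bk = 4/(kπ) for odd k): the coefficient power ½·Σ (4/(kπ))² should equal the signal power |sw|² = 1 (a sketch):

```python
import math

# Parseval check for the unit square wave: sum 0.5*(4/(k*pi))^2 over odd k -> 1.
power = sum(0.5 * (4.0 / (k * math.pi)) ** 2 for k in range(1, 200_001, 2))
print(power)  # ≈ 1.0, the average power of a ±1 square wave
```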
Fourier Series of Some Waveforms
Discrete Fourier Series (DFS)
Band-limited signal s[n], period = N. The DFS is defined as:
    c~k = (1/N) Σn=0..N-1 s[n]·e^(-j 2π k n / N)
Synthesis (a finite sum <=> band-limited s[n]):
    s[n] = Σk=0..N-1 c~k·e^(j 2π k n / N)
Note: c~(k+N) = c~k, same period N, i.e. time periodicity propagates to the
frequencies! The DFS generates periodic ck with the same period as the signal.
Orthogonality in the DFS (Kronecker's delta):
    (1/N) Σn=0..N-1 e^(j 2π n (k-m) / N) = δk,m
N consecutive samples of s[n] completely describe s in the time
or frequency domains.
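The DFS sum above can be computed directly and checked against the FFT, which returns N·c~k under NumPy's default normalization (a sketch):

```python
import numpy as np

# Direct DFS of one period, checked against the FFT (np.fft.fft computes N * c~k).
N = 8
s = np.sin(2 * np.pi * np.arange(N) / N) + 0.5
n = np.arange(N)
c = np.array([(s * np.exp(-2j * np.pi * k * n / N)).sum() / N for k in range(N)])
print(np.allclose(c, np.fft.fft(s) / N))  # True
```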
Discrete Fourier Series Analysis
DFS of a periodic discrete 1-Volt square wave s[n]: period N, duty factor L/N.
    c~k = L/N                                                  for k = 0, ±N, ±2N, ...
    c~k = (1/N)·( sin(πkL/N) / sin(πk/N) )·e^(-jπk(L-1)/N)     otherwise
Discrete signals <=> periodic frequency spectra.
Compare to the continuous rectangular function (slide #20).
[Figure: s[n] (period N, L consecutive ones) and the magnitude of c~k over one
period.]
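The closed-form DFS above is just the geometric sum Σn=0..L-1 e^(-j2πkn/N) divided by N; a direct numerical check (a sketch):

```python
import numpy as np

# Closed-form DFS of a square wave (period N, duty factor L/N) vs. direct sum.
N, L = 10, 5
s = np.array([1.0] * L + [0.0] * (N - L))
n = np.arange(N)
direct = np.array([(s * np.exp(-2j * np.pi * k * n / N)).sum() / N for k in range(N)])

def closed_form(k):
    if k % N == 0:
        return L / N
    return (np.sin(np.pi * k * L / N) / np.sin(np.pi * k / N)
            * np.exp(-1j * np.pi * k * (L - 1) / N)) / N

formula = np.array([closed_form(k) for k in range(N)])
print(np.allclose(direct, formula))  # True
```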
Discrete Fourier Series Properties
Property             Time                              Frequency
Homogeneity          a·s[n]                            a·S(k)
Additivity           s[n] + u[n]                       S(k) + U(k)
Linearity            a·s[n] + b·u[n]                   a·S(k) + b·U(k)
Multiplication       s[n]·u[n]                         (1/N) Σh=0..N-1 S(h)·U(k-h)
Convolution          Σm=0..N-1 s[m]·u[n-m]             S(k)·U(k)
Time shifting        s[n - m]                          e^(-j 2π k m / N)·S(k)
Frequency shifting   e^(j 2π h n / N)·s[n]             S(k - h)
TOPICS
 DFT windows
 DFT resolution - improvement
 Efficient DFT calculation: FFT
 Hints on system spectral analysis
DFT – Window Characteristics
• Finite discrete sequence => spectrum convoluted with the rectangular
window spectrum.
• The amount of leakage depends on the chosen window & on how the signal
fits into the window.
(1) Resolution: capability to distinguish different tones. Inversely
proportional to the main-lobe width. Wish: as high as possible.
(2) Peak-sidelobe level: maximum response outside the main lobe.
Determines whether small signals are hidden by nearby stronger ones.
Wish: as low as possible.
(3) Sidelobe roll-off: sidelobe decay per decade. Trade-off with (2).
Several windows are used (application-dependent): Hamming, Hanning,
Blackman, Kaiser, ...
DFT of Main Windows

Windowing reduces leakage by minimising the sidelobe magnitudes.
In time it reduces the endpoint discontinuities.

[Figure: a sampled sequence, non-windowed vs. windowed, and some window functions.]
DFT - Window Choice

Common window characteristics:

Window type | -3 dB mainlobe width [bins] | -6 dB mainlobe width [bins] | Max sidelobe level [dB] | Sidelobe roll-off [dB/decade]
Rectangular | 0.89 | 1.21 | -13.2 | 20
Hamming     | 1.3  | 1.81 | -41.9 | 20
Hanning     | 1.44 | 2    | -31.6 | 60
Blackman    | 1.68 | 2.35 | -58   | 60
Observed signal → window wish list:
• Far & strong interfering components ⇒ high roll-off rate.
• Near & strong interfering components ⇒ small max sidelobe level.
• Accurate measurement of a single tone ⇒ wide main lobe.
NB: Strong DC component can shadow nearby small signals. Remove it!
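The max-sidelobe column of the table can be reproduced numerically; a sketch (the `peak_sidelobe_db` helper is mine) for the rectangular and Hanning windows:

```python
import numpy as np

def peak_sidelobe_db(window, pad_factor=64):
    """Peak sidelobe level of a window, in dB relative to the mainlobe peak."""
    n = len(window)
    spectrum = np.abs(np.fft.rfft(window, n * pad_factor))
    spectrum /= spectrum.max()
    # Walk down the mainlobe to its first local minimum, then take the
    # maximum of everything beyond it (the highest sidelobe).
    i = 1
    while i < len(spectrum) - 1 and spectrum[i + 1] < spectrum[i]:
        i += 1
    return 20 * np.log10(spectrum[i:].max())

N = 64
print(peak_sidelobe_db(np.ones(N)))       # rectangular: about -13.3 dB
print(peak_sidelobe_db(np.hanning(N)))    # Hanning: about -31.5 dB
```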
DFT - Window Loss Remedial

Smooth data-tapering windows cause information loss near the edges.
Solution: sliding (overlapping) DFTs.
• Attenuated inputs get the next window's full gain, and leakage is reduced.
• Usually 50% or 75% overlap (depends on the main-lobe width).
Drawback: increased total processing time.
Zero Padding
Improves DFT frequency inter-sampling spacing (“resolution”).
[Figure: a short sampled tone and its DFT, before and after zero-padding; padding increases the number of frequency samples.]

After padding, the bins fall at frequencies f_m = m · f_S / (N_S + L),
where N_S = number of original samples and L = number of padded zeros.
Zero Padding - 2

• DFT spectral resolution, i.e. the capability to distinguish two closely-spaced frequencies: not improved by zero-padding!
• Frequency inter-sampling spacing: made finer by zero-padding (the DFT frequency span is unchanged, since the sampling frequency is the same).
• Zero-padding in the frequency domain increases the sampling rate in the time domain. Note: it works only if the sampling theorem is satisfied!
• An additional reason for zero-padding: to reach a power-of-two number of input samples (see FFT).

NOTE: Apply zero-padding after windowing (if any)! Otherwise the stuffed zeros will partially distort the window function.
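A small sketch of the spacing effect (fs and f0 are assumed values): zero-padding does not sharpen the underlying spectrum, but the finer bin grid lets the measured peak land much closer to the true off-bin tone frequency:

```python
import numpy as np

fs, f0, N = 100.0, 20.7, 64        # assumed sampling rate and true tone frequency
x = np.sin(2 * np.pi * f0 * np.arange(N) / fs)

estimates = {}
for nfft in (N, 16 * N):           # plain DFT vs. 16x zero-padded DFT
    spec = np.abs(np.fft.rfft(x, nfft))
    estimates[nfft] = spec.argmax() * fs / nfft   # peak bin -> frequency

print(estimates)                   # the padded estimate is much closer to 20.7 Hz
```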
DFT - Scalloping Loss (SL)

An input frequency f0 between bin centres causes a magnitude loss:

f0 = (kmax + r) · fS / N,   |r| ≤ ½

SL = 20 · log10( |c̃_{kmax+r}| / |c̃_{kmax}| )

The worst case occurs when f0 falls exactly midway between two successive bins (|r| = ½).

Frequency error: Δf = r · fS / N;
relative error: R = Δf / f0 = r / (kmax + r),   |R| ≤ 1 / (1 + 2·kmax).
This may impact data interpretation (wrong f0!).

[Figure: DFT bins around kmax; when f0 sits on a bin centre there is no loss ("we're lucky here!"); when it sits between bins the peak magnitude drops by SL.]

Note: non-rectangular windows broaden the DFT main lobe ⇒ SL is less severe.
The correction depends on the window used.
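A sketch measuring the worst case for the rectangular window (fs, N and the bin index are assumed values; the theoretical value is 20·log10(2/π) ≈ −3.9 dB):

```python
import numpy as np

fs, N = 100.0, 64
bin_hz = fs / N
k = 16
mags = []
for f0 in (k * bin_hz, (k + 0.5) * bin_hz):   # on-bin vs. worst case, |r| = 1/2
    x = np.cos(2 * np.pi * f0 * np.arange(N) / fs)
    mags.append(np.abs(np.fft.rfft(x)).max())

sl_db = 20 * np.log10(mags[1] / mags[0])
print(sl_db)   # roughly -3.9 dB for the rectangular window
```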
DFT - SL Example

[Figure, top: spectrum after DC bias correction, rectangular window, zero padding, FFT.
Figure, bottom: spectrum after DC bias correction, Hanning window, zero padding, FFT.]

SL remedial:
• increasing N (?),
• improved windowing,
• zero-padding,
• interpolation around kmax.
DFT - Parabolic Interpolation

[Figure: zoom on a spectral peak for the rectangular window (left) and the Hanning window (right), with a parabola fitted through the samples around the maximum.]

• Parabolic interpolation is often enough to find the position of the peak (i.e. the frequency).
• Other algorithms are available, depending on the data.
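A minimal sketch of the technique (fs and f0 are assumed values; `parabolic_peak` is my helper): fit a parabola through the three magnitude samples around the maximum bin and read off the fractional peak position:

```python
import numpy as np

def parabolic_peak(mag, k):
    """Refine integer peak bin k by fitting a parabola through
    mag[k-1], mag[k], mag[k+1]; returns the fractional bin offset."""
    a, b, c = mag[k - 1], mag[k], mag[k + 1]
    return 0.5 * (a - c) / (a - 2 * b + c)

fs, N = 100.0, 64
f0 = 20.7                                  # true frequency, between bin centres
x = np.sin(2 * np.pi * f0 * np.arange(N) / fs) * np.hanning(N)

mag = np.abs(np.fft.rfft(x))
k = int(mag.argmax())
f_est = (k + parabolic_peak(mag, k)) * fs / N
print(f_est)                               # close to 20.7 Hz, despite 1.56 Hz bins
```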
Efficient DFT Calculation: FFT

c̃_k = (1/N) Σ_{n=0}^{N−1} s[n] · e^{−j 2π k n / N} = (1/N) Σ_{n=0}^{N−1} s[n] · W_N^{kn},   k = 0, 1, ..., N−1

W_N^{kn} = twiddle factors.

Direct DFT calculation ⇒ redundancy: the periodic function W_N^{kn} is calculated many times.

[Figure: the eight twiddle factors W_8^0 ... W_8^7 on the unit circle.]

A direct DFT calculation requires ~N² complex multiplications: complexity O(N²).

Algorithms (= Fast Fourier Transform) were developed to compute the N-point DFT with ~N·log2 N multiplications: complexity O(N·log2 N).
FFT Advantages

Number of operations (complex multiplications):

N    | DFT (~N²) | Radix-2 FFT (~(N/2)·log2 N)
4    | 16        | 4
32   | 1024      | 80
128  | 16384     | 448
1024 | 1048576   | 5120

[Figure: operation count versus number of samples N; DFT ∝ N², FFT ∝ N·log2 N.]
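The table rows follow directly from the two counting formulas; a one-liner check:

```python
import math

# Complex-multiplication counts: direct DFT (N^2) vs. radix-2 FFT ((N/2)·log2 N)
rows = [(N, N ** 2, (N // 2) * int(math.log2(N))) for N in (4, 32, 128, 1024)]
for N, dft_mults, fft_mults in rows:
    print(N, dft_mults, fft_mults)
```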
DSPs & PLDs influenced algorithm design. In the '60s & '70s the multiplication count was the "quality factor". Now the number of additions & memory accesses (s/w) and the communication costs (h/w) are also important.
FFT Philosophy

General philosophy (to be applied recursively): divide & conquer.

cost(sub-problems) + cost(mapping) < cost(original problem)

Different algorithms balance the costs differently.

[Figure: time-domain decomposition feeding a frequency-domain synthesis; only the first decomposition is shown.]

Example: the decimation-in-time algorithm.
Step 1: Time-domain decomposition. The N-point signal → N 1-point signals (interlace decomposition). Shuffled input data (bit reversal). log2 N stages.
Step 2: Calculation of the 1-point input spectra. (Nothing to do!)
Step 3: Frequency-domain synthesis: the N spectra are synthesised into one.
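The three steps can be sketched as a minimal recursive radix-2 decimation-in-time FFT (note: this returns the unnormalised DFT; the slides' c̃_k carry an extra 1/N factor):

```python
import cmath

def fft(x):
    """Radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)             # Step 2: a 1-point spectrum is the sample itself
    even = fft(x[0::2])            # Step 1: interlace (time-domain) decomposition
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):        # Step 3: frequency-domain synthesis
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]   # twiddle factor W_n^k
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

X = fft([0, 1, 2, 3, 4, 5, 6, 7])
```

The explicit slicing plays the role of the bit-reversal shuffle; production FFTs do the same reordering in place.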
FFT Family Tree

Divide & conquer, two branches:

N with GCD(*)(N1, N2) = 1, i.e. N1, N2 co-prime. Ex: 240 = 16·3·5.
• Cost: SUB-PROBLEMS.
• No twiddle-factor calculations.
• Easier mapping (permutations).
• Some algorithms: Good-Thomas, Kolba, Parks, Winograd.

N with GCD(N1, N2) <> 1. Ex: N = 2^n.
• Cost: MAPPING.
• Twiddle-factor calculations.
• Easier sub-problems.
• Some algorithms: Cooley-Tukey, decimation-in-time / in-frequency, Radix-2, Radix-4, Split radix.

(*) GCD = Greatest Common Divisor
References

Papers
1. On bandwidth, David Slepian, IEEE Proceedings, Vol. 64, No. 3, pp. 291-300.
2. The Shannon sampling theorem - Its various extensions and applications: a tutorial review, A. J. Jerri, IEEE Proceedings, Vol. 65, No. 11, pp. 1565-1598.
3. What every computer scientist should know about floating-point arithmetic, David Goldberg.
4. IEEE Standard for radix-independent floating-point arithmetic, ANSI/IEEE Std 854-1987.

Books
1. Understanding digital signal processing, R. G. Lyons, Addison-Wesley, 1996.
2. The scientist and engineer's guide to digital signal processing, S. W. Smith, at http://www.dspguide.com.
3. Discrete-time signal processing, A. V. Oppenheim & R. W. Schafer, Prentice Hall, 1999.
References - 1

Papers
1. Tom, Dick and Mary discover the DFT, J. R. Deller Jr., IEEE Signal Processing Magazine, pp. 36-50, April 1994.
2. On the use of windows for harmonic analysis with the Discrete Fourier Transform, F. J. Harris, IEEE Proceedings, Vol. 66, No. 1, January 1978.
3. Some windows with a very good sidelobe behaviour, A. H. Nuttall, IEEE Trans. on Acoustics, Speech and Signal Processing, Vol. ASSP-29, No. 1, February 1981.
4. Some novel windows and a concise tutorial comparison of windows families, N. C. Geckinli, D. Yavuz, IEEE Trans. on Acoustics, Speech and Signal Processing, Vol. ASSP-26, No. 6, December 1978.
5. Study of the accuracy and computation time requirements of a FFT-based measurement of the frequency, amplitude and phase of betatron oscillations in LEP, H. J. Schmickler, LEP/BI/Note 87-10.
6. Causes et corrections des erreurs dans la mesure des caracteristiques des oscillations betatroniques obtenues a partir d'une transformation de Fourier, E. Asseo, CERN PS 85-9 (LEA).
References - 2

Papers (continued)
7. Precise measurements of the betatron tune, R. Bartolini et al., Particle Accel., Vol. 55, pp. 247-256, 1996.
8. How the FFT gained acceptance, J. W. Cooley, IEEE Signal Processing Magazine, January 1992.
9. A comparative analysis of FFT algorithms, A. Ganapathiraju et al., IEEE Trans. on Signal Processing, December 1997.

Books
1. The Fourier Transform and its applications, R. N. Bracewell, McGraw-Hill, 1986.
2. A History of scientific computing, edited by S. G. Nash, ACM Press, 1990.
3. Introduction to Fourier analysis, N. Morrison, John Wiley & Sons, 1994.
4. The DFT: An owner's manual for the Discrete Fourier Transform, W. L. Briggs, SIAM, 1995.
5. The FFT: Fundamentals and concepts, R. W. Ramirez, Prentice Hall, 1985.