European SR Engine for Navi

Acoustic transduction
Katedra Multimediów
- Speech sounds: rapid variations of air pressure and velocity around their normal values
- sound field: variations of air density and pressure are functions of time and space and propagate as an acoustic wave
- let us assume the air in a room to be homogeneous
- the speed of acoustic wave propagation depends on the temperature T (in K):
$$c \approx 331.45\sqrt{\frac{T}{273}}\ \mathrm{m/s}$$
- the wave equation describes the propagation of sound if pressure is represented by a scalar field p(a,t), a = [x y z]^T:
$$\nabla^2 p(\mathbf{a},t) = \frac{\partial^2 p}{\partial x^2} + \frac{\partial^2 p}{\partial y^2} + \frac{\partial^2 p}{\partial z^2} = \frac{1}{c^2}\,\frac{\partial^2 p(\mathbf{a},t)}{\partial t^2}$$
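The temperature dependence of the speed of sound can be checked numerically. This is a minimal Python sketch that uses the constants 331.45 m/s and 273 K as stated in the formula and assumes T is given in kelvin; the function name `speed_of_sound` is illustrative only.

```python
import math

def speed_of_sound(t_kelvin: float) -> float:
    """Speed of sound in air (m/s) from c = 331.45 * sqrt(T / 273), T in kelvin."""
    if t_kelvin <= 0:
        raise ValueError("temperature must be positive (kelvin)")
    return 331.45 * math.sqrt(t_kelvin / 273.0)

if __name__ == "__main__":
    # 0 degC, about room temperature (20 degC) and body temperature (37 degC)
    for t in (273.0, 293.0, 310.0):
        print(f"T = {t:.0f} K -> c = {speed_of_sound(t):.1f} m/s")
```

At room temperature (293 K) this gives roughly 343 m/s, the value commonly used for speech acoustics.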
Krzysztof Marasek
Summer 2002
PJWSTK
Wave propagation (2)
- one of the solutions of the wave equation is the monochromatic plane wave of frequency f = ω/2π:
$$p(\mathbf{a},t) = A\,e^{j(\omega t - \mathbf{k}^T\mathbf{a})}$$
- where A is the wave amplitude and k = [kx, ky, kz]^T is the wavenumber vector, whose direction is normal to the propagating wavefront
- the distance λ = 2π/|k| = c/f is called the wavelength and describes the spatial period of the propagating wave
- in spherical coordinates (r, φ, θ), the sound pressure of a point source depends only on the distance r from the source:
$$\frac{\partial^2 p}{\partial r^2} + \frac{2}{r}\,\frac{\partial p}{\partial r} = \frac{1}{c^2}\,\frac{\partial^2 p}{\partial t^2}, \qquad p(r,t) = \frac{A}{r}\,e^{j\omega(t - r/c)}$$
- any sound field can be expressed as a superposition of elementary plane and spherical waves
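A short Python sketch of the quantities above: wavelength λ = c/f, the wavenumber magnitude |k| = 2πf/c (so that λ = 2π/|k|), and the level drop implied by the 1/r amplitude of the spherical wave. The value c = 343 m/s (air at about 20 °C) and the function names are assumptions made for illustration.

```python
import math

def wavelength(f_hz: float, c: float = 343.0) -> float:
    """Wavelength lambda = c / f, in metres."""
    return c / f_hz

def wavenumber(f_hz: float, c: float = 343.0) -> float:
    """Magnitude of the wavenumber vector |k| = 2*pi*f / c = omega / c, in rad/m."""
    return 2 * math.pi * f_hz / c

def spherical_level_drop_db(r1: float, r2: float) -> float:
    """Pressure-level drop (dB) from distance r1 to r2 when |p| = A / r."""
    return 20 * math.log10(r2 / r1)

if __name__ == "__main__":
    for f in (100.0, 1000.0, 4000.0):
        lam = wavelength(f)
        # lambda = c/f and lambda = 2*pi/|k| should agree
        print(f"f = {f:6.0f} Hz: lambda = {lam:.3f} m, 2*pi/|k| = {2 * math.pi / wavenumber(f):.3f} m")
    print(f"doubling the distance: -{spherical_level_drop_db(1.0, 2.0):.2f} dB")
```

Over the speech range the wavelength spans from several metres down to a few centimetres, and each doubling of the distance from a point source lowers the pressure level by about 6 dB.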
Formants
Room acoustics
- reflections from surfaces, and diffusion and diffraction by objects inside the room, cause the reverberation effect
- T60 (reverberation time): the time needed for the acoustic power of the signal to decay by 60 dB after the sound source is abruptly stopped
- T60 is nearly independent of the listening position in a given enclosure; it can be approximated by the Sabine formula:
$$T_{60} \approx 0.163\,\frac{V}{\alpha S}$$
where V is the room volume in m³, S is the total surface area of the room in m² and α is the average absorption coefficient of the surfaces
- reverberation times up to 1 s (for frequencies of 500-1000 Hz) do not cause any loss in speech intelligibility
- impulse response h(t): describes the path between source and receiver, including all reflections
- reflections arriving more than about 50 ms after the direct sound are perceived as separate echoes; early reflections with shorter delays are perceived as part of the direct sound
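Sabine's formula can be evaluated for a simple case. This Python sketch assumes a rectangular (shoebox) room and a single average absorption coefficient; the room dimensions and α values are hypothetical.

```python
def sabine_t60(volume_m3: float, surface_m2: float, alpha: float) -> float:
    """Reverberation time T60 (s) from Sabine's formula T60 = 0.163 * V / (alpha * S)."""
    if not 0 < alpha <= 1:
        raise ValueError("average absorption coefficient must be in (0, 1]")
    return 0.163 * volume_m3 / (alpha * surface_m2)

def box_room(length: float, width: float, height: float) -> tuple[float, float]:
    """Volume (m^3) and total surface area (m^2) of a rectangular room."""
    volume = length * width * height
    surface = 2 * (length * width + length * height + width * height)
    return volume, surface

if __name__ == "__main__":
    v, s = box_room(5.0, 4.0, 3.0)  # hypothetical 5 m x 4 m x 3 m office
    for alpha in (0.1, 0.2, 0.4):
        print(f"alpha = {alpha}: T60 = {sabine_t60(v, s, alpha):.2f} s")
```

For this room (V = 60 m³, S = 94 m²), α = 0.2 gives T60 ≈ 0.52 s, while a very reflective room with α = 0.1 already reaches about 1.04 s.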
Room acoustics (2)
- speech intelligibility measures: “Deutlichkeit” index (D50), centre of gravity (centre time), modulation index
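The two energy-based measures named above can be sketched in Python. The sketch assumes the common definitions: D50 is the fraction of impulse-response energy that arrives within the first 50 ms, and the centre time (centre of gravity) is the first moment of the squared impulse response. The test response is a hypothetical, purely exponential decay rather than a measured room, and the modulation index is not covered.

```python
import math

def synthetic_rir(t60: float, fs: int, duration: float = 1.0) -> list[float]:
    """Hypothetical impulse-response envelope whose energy decays by 60 dB in t60 seconds.

    Energy ~ exp(-k t) with k = ln(10**6) / t60, so the amplitude is exp(-k t / 2).
    Real room responses also change sign randomly; both measures only use h**2.
    """
    k = math.log(1e6) / t60
    return [math.exp(-0.5 * k * n / fs) for n in range(int(duration * fs))]

def deutlichkeit_d50(h: list[float], fs: int) -> float:
    """Fraction of the impulse-response energy arriving within the first 50 ms."""
    energy = [x * x for x in h]
    n50 = int(round(0.050 * fs))
    return sum(energy[:n50]) / sum(energy)

def centre_time(h: list[float], fs: int) -> float:
    """Centre of gravity of the squared impulse response, in seconds."""
    energy = [x * x for x in h]
    return sum(n / fs * e for n, e in enumerate(energy)) / sum(energy)

if __name__ == "__main__":
    fs = 16000
    for t60 in (0.3, 0.5, 1.0):
        h = synthetic_rir(t60, fs)
        print(f"T60 = {t60} s: D50 = {deutlichkeit_d50(h, fs):.2f}, "
              f"centre time = {1000 * centre_time(h, fs):.0f} ms")
```

Longer reverberation lowers D50 and moves the centre time later, which is the direction in which speech intelligibility degrades.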
Room Impulse Response
- simplest method: apply an impulse excitation and observe the response of the system (balloon popping, gunshots); however, this may not guarantee sufficient SNR or a flat frequency response, and overload is possible
- to overcome these difficulties: excitation using maximum length pseudo-random sequences (Schroeder, 1979), which have a flat spectrum; the autocorrelation of a sequence of length L becomes a close approximation of a delta function when L is large:
$$\phi_{pp}(k) = \begin{cases} 1 & \text{if } k = 0 \\ -1/L & \text{if } k \neq 0 \end{cases}$$
- the room impulse response can then be obtained simply by reproducing the acoustic signal corresponding to the sequence and cross-correlating the excitation sequence p(n) with the signal y(n) acquired by the sensor
[Figure: sound ray concept; rays are diffracted by edges and scattered by small obstacles]
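A minimal Python sketch of the MLS measurement, under simplifying assumptions. The 127-sample sequence comes from a linear feedback shift register with the primitive polynomial x^7 + x^6 + 1 (chosen here for illustration). The "room" is a short, noise-free, hypothetical FIR filter, and playback is assumed periodic and in steady state, so the recording is a circular convolution. In unnormalised form, the periodic autocorrelation of a ±1 MLS is L at zero lag and -1 at every other lag. The recovery step relies on that exact two-level property. All function names are hypothetical.

```python
def mls_127() -> list[int]:
    """One period (L = 127) of a maximum length sequence mapped to +1/-1.

    Bits follow a[k+7] = a[k+6] XOR a[k] (primitive polynomial x^7 + x^6 + 1);
    any nonzero seed yields the full period of 2**7 - 1 samples.
    """
    a = [1] * 7
    for k in range(127 - 7):
        a.append(a[k + 6] ^ a[k])
    return [1 - 2 * b for b in a]  # bit 0 -> +1, bit 1 -> -1

def circular_convolve(h: list[float], p: list[int]) -> list[float]:
    """Steady-state response y(n) of a filter h to periodic playback of p (len(h) <= len(p))."""
    L = len(p)
    return [sum(h[m] * p[(n - m) % L] for m in range(len(h))) for n in range(L)]

def circular_xcorr(y: list[float], p: list[int]) -> list[float]:
    """R(k) = sum_n y(n) p(n - k), with indices taken modulo L."""
    L = len(p)
    return [sum(y[n] * p[(n - k) % L] for n in range(L)) for k in range(L)]

def recover_impulse_response(y: list[float], p: list[int]) -> list[float]:
    """Estimate h(k) from one period of the recorded signal y(n).

    Because the autocorrelation is L at lag 0 and -1 elsewhere, R(k) = (L + 1) h(k) - sum(h)
    and sum_k R(k) = sum(h), so h(k) = (R(k) + sum_k R(k)) / (L + 1) in the noise-free case.
    """
    L = len(p)
    r = circular_xcorr(y, p)
    total = sum(r)
    return [(rk + total) / (L + 1) for rk in r]

if __name__ == "__main__":
    p = mls_127()
    h_true = [1.0, 0.0, 0.5, -0.25, 0.1]  # hypothetical short room response
    y = circular_convolve(h_true, p)
    h_est = recover_impulse_response(y, p)
    print([round(v, 6) for v in h_est[:8]])
```

For a real room, the sequence period must be longer than the impulse response (roughly longer than T60). Otherwise the circular cross-correlation wraps the reverberant tail around onto the start.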
Impulse response measurement
- How can it be measured?
[Figure (Speecon, 2001): impulse response measurement setup; labelled components: signal source (tape), small active loudspeaker box, 'speech' mic, reference microphone, preamplifiers, adapter, output to recording equipment]
Microphones
- converts the acoustic energy of sound into corresponding electrical energy; usually realized with a diaphragm whose movements, produced by sound pressure, vary the parameters of an electrical system (resistance, capacitance, etc.)
- characterized by:
  - frequency response (flatness over the speech frequency range)
  - signal-to-noise ratio (SNR)
  - impedance (preferably low: a low-impedance microphone connected to a low-impedance amplifier picks up less hum and electrical noise)
  - sensitivity: output voltage (in millivolts) or power (in dBm), usually specified for 94 dB SPL
  - directional pattern: cardioid (supercardioid, hypercardioid, shotgun, etc.), bidirectional (figure of eight) or omnidirectional (circle)
- mountings: hand-held, head-mounted, table stand (desktop), lavalier
- small or large diaphragm
[Figure: microphone polar response]
0 dB SPL = 20 µPa = 0.0002 µbar (threshold of hearing); 0 dBm corresponds to 0 dB referenced to 1 mW
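The reference levels above translate into simple logarithmic conversions. This Python sketch uses the 20 µPa SPL reference and the 1 mW dBm reference. It also expresses sensitivity in dBV re 1 V/Pa, which is a common datasheet convention assumed here rather than one stated on the slide. The function names are illustrative.

```python
import math

P_REF = 20e-6  # 0 dB SPL reference pressure: 20 micropascals (RMS)

def spl_db(pressure_pa: float) -> float:
    """Sound pressure level in dB SPL for an RMS pressure in pascals."""
    return 20 * math.log10(pressure_pa / P_REF)

def dbm(power_w: float) -> float:
    """Power level in dBm, i.e. dB relative to 1 mW."""
    return 10 * math.log10(power_w / 1e-3)

def sensitivity_dbv(mv_per_pa: float) -> float:
    """Microphone sensitivity in dBV re 1 V/Pa, from a value in mV/Pa."""
    return 20 * math.log10(mv_per_pa / 1000.0)

if __name__ == "__main__":
    print(f"1 Pa RMS = {spl_db(1.0):.2f} dB SPL")  # the usual sensitivity test level
    print(f"1 mW = {dbm(1e-3):.1f} dBm")
    print(f"10 mV/Pa = {sensitivity_dbv(10.0):.1f} dBV re 1 V/Pa")
```

The conversions show why sensitivity is usually specified at 94 dB SPL: that level corresponds to an RMS pressure of 1 Pa, so the output voltage at 94 dB SPL is directly the sensitivity in mV/Pa.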
Microphones: basic transduction categories

- passive: converts sound directly into electrical energy; active: needs an additional energy source (battery, phantom power)
- electromagnetic and electrodynamic microphones:
  - ribbon: duralumin ribbon moving in a permanent magnetic field
  - moving-coil: the inverse of a loudspeaker; bigger than a ribbon, so a higher voltage is induced
  - widely used, good frequency and transient response, moderate cost
  - rather old technology
- electrostatic microphones:
  - condenser: a capacitor with a dielectric between the plates, one of which can move; pre-polarization needed, very high output impedance; excellent frequency and transient response, low distortion
  - electret: built-in permanent pre-polarization (about 100 V); a power supply is still needed for the built-in preamplifier; good frequency and transient response, low distortion, but lower dynamic range and sensitivity than condenser microphones
- piezoresistive and piezoelectric microphones:
  - piezoresistive: variation of resistance
  - carbon: small cylinder filled with carbon granules; vibrations separate the granules, changing the electric resistance of the cylinder; low quality
  - crystal and ceramic: piezoelectric materials such as Rochelle salt generate a voltage when deformed by the diaphragm; low quality
- special microphones: pressure-zone (PZM, for speech reinforcement), pressure-gradient (for directional acquisition), noise-canceling, micromechanical silicon microphones, optical waveguide
Ribbon microphones
- Principle of operation: a duralumin ribbon moving in a permanent magnetic field
- Can be very good and expensive (e.g. Royer Labs)
- Features:
  - Very high overload capability: max SPL > 135 dB
  - Extremely low noise
  - Absence of high-frequency phase distortion
  - Excellent phase linearity
  - Equal sensitivity from front and back
  - Consistent frequency response regardless of distance
  - No power supply required
  - Strong proximity effect
  - Highly sensitive to wind
Moving coil
- A moving-coil microphone contains a diaphragm exposed to sound waves. The diaphragm carries a coil placed in a magnetic field. The voltage induced in the coil is proportional to its vibration velocity, which in turn depends on the sound pressure.
- Moving-coil microphones are cheap and robust, making them good for the rigors of live performance and touring. They are especially suited for close miking of bass and guitar speaker cabinets and drum kits.
- They are also good for live vocals, as their resonance peak at around 5 kHz provides a built-in presence boost that improves speech/singing intelligibility.
- However, the inertia of the coil reduces the high-frequency response. Hence they are NOT best suited to studio applications where quality and subtlety are important, such as high-quality vocal recording or acoustic instrument miking.
Condenser microphone
A condenser microphone incorporates a stretched metal diaphragm
that forms one plate of a capacitor. A metal disk placed close to the
diaphragm acts as a backplate. When a sound field excites the
diaphragm, the capacitance between the two plates varies
according to the variation in the sound pressure. A stable DC
voltage is applied to the plates through a high resistance to keep
the electrical charge on the plates constant. The change in capacitance
then generates an AC output proportional to the sound pressure. In order
to convert ultralow-frequency pressure variations, a high-frequency
voltage (carrier) is applied across the plates. The output signal is
the modulated carrier.
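The constant-charge principle can be illustrated with an idealized parallel-plate model. With the charge fixed at Q = C0·V0, the plate voltage is V = Q/C = Q·d/(ε0·A), which is linear in the gap d. The resulting AC output is therefore ΔV = V0·x/d0 for a diaphragm displacement x. This Python sketch ignores stray capacitance, preamplifier loading and diaphragm mechanics, and the bias, plate area and gap values are hypothetical.

```python
EPS0 = 8.854e-12  # permittivity of vacuum (close to that of air), F/m

def plate_capacitance(area_m2: float, gap_m: float) -> float:
    """Ideal parallel-plate capacitance C = eps0 * A / d."""
    return EPS0 * area_m2 / gap_m

def output_voltage(v_bias: float, area_m2: float, gap0_m: float, displacement_m: float) -> float:
    """Plate voltage when the charge set at rest (Q = C0 * V_bias) is held constant.

    V = Q / C = Q * d / (eps0 * A), so V is linear in the gap d.
    A positive displacement widens the gap.
    """
    q = plate_capacitance(area_m2, gap0_m) * v_bias
    return q / plate_capacitance(area_m2, gap0_m + displacement_m)

if __name__ == "__main__":
    v0, area, gap0 = 48.0, 1e-4, 25e-6  # hypothetical: 48 V bias, 1 cm^2 plates, 25 um gap
    for x in (-2e-9, -1e-9, 1e-9, 2e-9):
        dv = output_voltage(v0, area, gap0, x) - v0
        print(f"displacement {x * 1e9:+.0f} nm -> AC output {dv * 1e3:+.3f} mV")
```

In this model the output doubles when the displacement doubles, which is why holding the charge constant through a high resistance gives an output proportional to the diaphragm motion.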
[Figure: Condenser microphone. AP = acoustic pressure, C = variable capacitance, 1 = metal diaphragm, 2 = metal disk, 3 = insulator, 4 = case.]
- Are the best, need ...
Electret microphone
- An electret-type microphone is a condenser microphone in which the electrical charges are created by a thin layer of polarized ceramic or plastic film (electret). Because the electret retains its charge, no high-voltage polarization source is needed.
[Figure: Electret-type microphone. AP = acoustic pressure, Uo = output voltage, 1 = diaphragm, 2 = electret, 3 = case.]
- Output impedance is relatively high (typically about 1 kΩ to 5 kΩ)
- Signal output is limited (relatively low sensitivity)
- Noise is relatively high
- Sound level handling ability is low (typically < 90 dB SPL)
- They are normally available from retail outlets very cheaply
Piezoresistive mics
- In a carbon-button microphone, the sound field acts upon an electroconductive diaphragm that presses on a packet of carbon granules. The contact resistance between the granules depends on the pressure. When a DC voltage is applied across the packet, the varying resistance produces an AC voltage drop that follows the sound pressure.
[Figure: Carbon-button microphone. AP = acoustic pressure, R = variable resistance, 1 = electroconductive particles, 2 = diaphragm, 3 = electrode.]
Microphone arrays
- selective acquisition of speech in the spatial domain: automatic detection, tracking and selective acquisition of a speaker
- beamforming (spatial filtering), filter-and-sum approach: compensate for the difference in path length from the source to each of the microphones; a delay in the time domain corresponds to a linear phase shift in the frequency domain
- dereverberation, talker location: time difference of arrival, power field scanning, MUSIC

Delay-and-sum:
$$z(t) = \sum_n w_n\, s_n(t - \tau_n), \qquad Z(f) = \sum_n w_n\, S_n(f)\, e^{-j 2\pi f \tau_n}$$
Filter-and-sum, with a filter h_n(t) in channel n (∗ denotes convolution):
$$z(t) = \sum_n w_n\, h_n(t) \ast s_n(t - \tau_n)$$
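A narrowband Python sketch of delay-and-sum beamforming for a uniform linear array. It assumes a far-field plane wave, equal weights w_n = 1/N, angles measured from broadside, and c = 343 m/s. The array geometry and talker direction are hypothetical. Each channel S_n(f) carries the arrival delay a_n. The applied delays τ_n compensate the path differences expected for the steering direction and are offset so that they are non-negative. This is the special case of filter-and-sum in which every h_n(t) is a unit impulse.

```python
import cmath
import math

def plane_wave_delays(n_mics: int, spacing_m: float, angle_deg: float, c: float = 343.0) -> list[float]:
    """Arrival delay at each mic of a uniform linear array for a far-field plane wave.

    angle_deg is measured from broadside; mic 0 is the reference.
    """
    s = math.sin(math.radians(angle_deg))
    return [n * spacing_m * s / c for n in range(n_mics)]

def delay_and_sum_response(f_hz: float, n_mics: int, spacing_m: float,
                           source_deg: float, steer_deg: float, c: float = 343.0) -> float:
    """|Z(f) / S(f)| for Z(f) = sum_n w_n S_n(f) exp(-j 2 pi f tau_n) with w_n = 1 / N.

    S_n(f) = S(f) exp(-j 2 pi f a_n) is the wave reaching mic n after arrival delay a_n;
    tau_n = max(e) - e_n compensates the delays e_n expected for steer_deg.
    """
    arrival = plane_wave_delays(n_mics, spacing_m, source_deg, c)
    expected = plane_wave_delays(n_mics, spacing_m, steer_deg, c)
    t_max = max(expected)
    w = 1.0 / n_mics
    z = sum(w * cmath.exp(-2j * math.pi * f_hz * (a + t_max - e))
            for a, e in zip(arrival, expected))
    return abs(z)

if __name__ == "__main__":
    # hypothetical array: 8 mics, 5 cm apart, talker at 30 degrees, 2 kHz component
    for steer in (-60, -30, 0, 30, 60):
        g = delay_and_sum_response(2000.0, 8, 0.05, 30.0, steer)
        print(f"steer {steer:+3d} deg -> gain {20 * math.log10(max(g, 1e-12)):6.1f} dB")
```

Steering to the talker's direction aligns all channels and gives unit gain, while other directions attenuate the 2 kHz component. The 5 cm spacing is below half a wavelength at 2 kHz (about 8.6 cm at 343 m/s), which avoids spatial aliasing (grating lobes).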
Microphones in speech recognition
- training and testing condition mismatch: the same microphone is preferred
- microphone normalization: multichannel recording and matching of signals
- noise-canceling headsets are preferred in ASR, but users don't like them
- influence of room acoustics on recording and ASR
- ASR in the car:
  - non-homogeneous acoustic environment: dependence on microphone position
- Speecon project: consumer-device environment
- gradient microphones in adverse conditions: aircraft cockpit
- feature selection: filtering
- cochlear model and binaural processing: special microphones and filtering methods
- use of microphone arrays
- active noise cancelling: new buzzword