Transcript (770k ppt)

What can we hear?
James D. Johnston
home.comcast.net/~retired_old_jj
3/23/2002
Copyright James D. Johnston 2003. Permission granted for any
educational use.
How do our ears work?
And what can we detect in a
natural soundfield?
How One Ear Works - Short
Form! We aren’t going to talk
about binaural today.
(this being the short-form of what should occupy a
semester’s examination and discussion)
The ear is usually divided into 3 parts: the
outer, middle, and inner ears. The outer ear consists
of the head, the pinna, and the ear canal. The middle
ear consists of the eardrum, the 3 small bones, and the
connection to the cochlea. Finally, the inner ear consists
of the cochlea, containing the organ of Corti,
basilar membrane, tectorial membrane, and the
associated fluids and spaces.
The Outer Ear
The outer ear provides frequency-dependent directivity via shadowing,
shaping, diffraction, and the like. It is different (by enough
to matter) for different individuals, but can be summarized
by the “Head Related Transfer Functions” (HRTFs) or
“Head Related Impulse Responses” (HRIRs) mentioned in the
literature, at least on the average or for a given listener.
The HRTFs or HRIRs describe the effect on
a sound arriving from a given direction at a given ear.
The ear canal adds a resonance roughly 1 octave wide at about
1 to 4 kHz, depending on the individual.
The Middle Ear
The middle ear carries out several functions, the most
important of which, for levels and frequencies that
are normally (or wisely) experienced, is matching the
impedance of the air to that of the fluid in the cochlea.
There are several other functions related to overload
protection and such, which are not particularly germane
under comfortable conditions.
The primary effect of the middle ear is to provide a 1-zero
high pass function, with a matching pole at approximately
700Hz or so, depending on the individual.
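As a rough illustration of that high-pass behaviour, the sketch below evaluates a first-order filter H(s) = s / (s + 2*pi*700). Placing the single zero at DC is an assumption made for illustration, and the 700Hz corner is only the approximate figure given above; the function name is hypothetical.

```python
import math

POLE_HZ = 700.0  # approximate corner from the slide; varies by individual


def highpass_gain_db(freq_hz, pole_hz=POLE_HZ):
    """Gain in dB of H(s) = s / (s + 2*pi*pole_hz), evaluated at freq_hz.

    Assumption: one zero at DC and one real pole, i.e. a plain
    first-order high-pass. This is a reading of the slide, not a
    measured middle-ear response.
    """
    ratio = freq_hz / pole_hz
    magnitude = ratio / math.sqrt(1.0 + ratio * ratio)
    return 20.0 * math.log10(magnitude)


for f in (70.0, 700.0, 7000.0):
    print(f"{f:7.0f} Hz: {highpass_gain_db(f):7.2f} dB")
```

Under these assumptions the gain is about 3dB down at the pole frequency, falls at roughly 20dB per decade below it, and is nearly flat well above it.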
The Inner Ear
A complicated subject at best, the inner ear can be thought of
as having two membranes, each a travelling wave filter,
one a high-pass, and the other a low-pass filter. Between the
two membranes are two sets of hair cells, the inner hair cells,
and the outer hair cells. The inner hair cells are primarily
detectors. They fire when the movements of the two membranes
differ. The outer hair cells are primarily a system that
controls the exact points of the very steep low pass filters and
high pass filters. The outer hair cells can polarize and depolarize,
and change both their length and stiffness. This polarization is
how they affect the relative tunings of the two membranes.
[Figure: two panels on a frequency axis, “Outer Hair Cells Fully Polarized” and “Outer Hair Cells Fully Depolarized”]
An example (not a human subject)
The exact magnitude and shape of those
curves are under a great deal of discussion and
examination, but it seems clear that, in fact, the
polarization of the outer hair cells creates the
compression exhibited in the difference between
applied intensity (the external power) and the
internal loudness (the actual sensation level
experienced by the listener).
There is at least 60dB of compression available.
Fortunately, the shape of the resulting curve does
not change very much, except at the tails, between
the compressed and uncompressed state, leading to
a set of filter functions known as the cochlear filters.
Critical Bands and
Cochlear Filters
The overall effect of this filter structure is time/frequency
analysis of a particular sort, described by critical band (Bark
Scale) or equivalent rectangular bandwidth (ERB)
filter functions. Note that this
is not a discrete set of filters, but rather a continuum of filters,
with bandwidths varying according to
the center frequency.
Roughly speaking, critical bandwidths are about 100Hz up to
700Hz, and 1/3 octave thereafter. ERBs are usually
a bit narrower, especially at higher frequencies.
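The rule of thumb above can be written out directly. The sketch below is only an illustration: `critical_bandwidth_hz` applies the rough rule literally (so it has a step at 700Hz that the real ear does not), and `erb_hz` uses the commonly quoted Glasberg and Moore approximation, which is an outside assumption and does not come from these slides.

```python
# Width of a 1/3-octave band as a fraction of its center frequency.
THIRD_OCTAVE_FACTOR = 2 ** (1 / 6) - 2 ** (-1 / 6)  # about 0.2316


def critical_bandwidth_hz(center_hz):
    """Slide's rough rule: 100 Hz wide up to 700 Hz, 1/3 octave above."""
    if center_hz <= 700.0:
        return 100.0
    return THIRD_OCTAVE_FACTOR * center_hz


def erb_hz(center_hz):
    """ERB approximation 24.7 * (4.37 * f_kHz + 1).

    Assumption: Glasberg-Moore style formula, not taken from the slides.
    """
    return 24.7 * (4.37 * center_hz / 1000.0 + 1.0)


for fc in (200.0, 1000.0, 4000.0):
    print(f"{fc:6.0f} Hz: critical band {critical_bandwidth_hz(fc):6.1f} Hz, "
          f"ERB {erb_hz(fc):6.1f} Hz")
```

With these two formulas the ERB comes out narrower than the critical bandwidth at each of the printed center frequencies.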
A discussion of which is right, and which should be used,
is, by itself, well beyond the range of a one-hour seminar.
The basic point that must come out of this discussion is that
the sound arriving in an ear will be analyzed in something
approximating 100Hz bandwidth filters at low frequencies,
and at something like 1/3 octave bandwidths at higher
frequencies, and that the system will detect either the
signal waveform itself (below 500Hz) or the signal
envelope (above 4000 Hz), or a bit of both (in the
range between 500Hz and 4000 Hz). Exactly what is
detected is likewise, by itself, well beyond a one hour
seminar, and furthermore, a consensus is yet to emerge.
For a given cochlear filter bandwidth, there is a corresponding
time width of the main lobe of the filter. For the auditory
system, these filter lengths vary by a factor of approximately
40:1, from about 10 milliseconds down to 0.25
milliseconds.
This means that at low frequencies, the time resolution available
to the ear is quite poor, but that at high frequencies, it is quite
fine, on the order of a dozen or so samples at 48kHz.
Over any time extent longer than this, the ear, because of its
compression effects, cannot be considered a linear
transducer. This can create problems, such as pre-echo, in
filterbanks or even in simple filters under some situations.
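The arithmetic behind those figures is short enough to check directly. This sketch simply converts the two filter durations quoted above into sample counts at 48kHz; the constant and function names are hypothetical.

```python
SAMPLE_RATE_HZ = 48_000
LONGEST_FILTER_S = 10e-3     # low-frequency end, from the text above
SHORTEST_FILTER_S = 0.25e-3  # high-frequency end, from the text above


def duration_in_samples(seconds, sample_rate_hz=SAMPLE_RATE_HZ):
    """Number of samples (rounded) spanned by a duration in seconds."""
    return round(seconds * sample_rate_hz)


ratio = LONGEST_FILTER_S / SHORTEST_FILTER_S
print("longest filter :", duration_in_samples(LONGEST_FILTER_S), "samples")
print("shortest filter:", duration_in_samples(SHORTEST_FILTER_S), "samples")
print("length ratio   :", round(ratio))
```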
[Figure: 2.25kHz filter and 750Hz filter]
Schematic Cochlea
[Figure: block diagram with the labels Audio In, Filters (HF ... LF), Feedback, Detectors, Auditory Nerve, and CNS Feedback]
How about the detectors?
• Below 500Hz, the detectors fire on the
positive going edge of the filtered
waveform.
• Above 2kHz, the detectors fire
synchronously with the ENVELOPE of the
filtered waveform
• Between 500Hz and 2kHz, the detectors
function on a mix of the two mechanisms.
What does this mean, in practical
terms?
• Below 500Hz, distorting the waveform itself, and
moving zero-crossings of the filtered waveform
(due to distortion, phase shifts, etc.) will be audible.
• Above 2kHz, the same effects happen on the
signal envelope. Again, phase shifts can radically
change the signal envelope, as can distortions.
• Between 500Hz and 2kHz, both mechanisms will
operate to some extent, with each favored toward
its end of the frequency spectrum.
So?
• At low frequencies, don’t change zero
crossings or the signal waveform.
• At high frequencies, don’t change the signal
envelope.
• Things like jitter, distortions, and phase
shift can cause either of these problems.
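To make the point about phase shift and envelope concrete, here is a minimal sketch. It builds a carrier with two sidebands and computes the envelope directly from the complex baseband sum, then repeats the calculation with only the carrier phase rotated by 90 degrees. The component amplitudes, the 100Hz modulation rate, and the function name are illustrative assumptions, not values from the talk.

```python
import cmath
import math

MOD_HZ = 100.0            # hypothetical spacing between carrier and sidebands
SAMPLE_RATE_HZ = 48000.0  # sampling rate used to step through time


def envelope_range(carrier_phase_rad, n_samples=4800):
    """Return (min, max) of the envelope of carrier + two sidebands.

    The envelope is the magnitude of the complex baseband signal, so the
    carrier frequency itself drops out. Component magnitudes (1.0, 0.5,
    0.5) are the same for every phase; only the carrier phase changes.
    """
    lowest, highest = float("inf"), 0.0
    for i in range(n_samples):
        t = i / SAMPLE_RATE_HZ
        baseband = (cmath.exp(1j * carrier_phase_rad)
                    + 0.5 * cmath.exp(2j * math.pi * MOD_HZ * t)
                    + 0.5 * cmath.exp(-2j * math.pi * MOD_HZ * t))
        level = abs(baseband)
        lowest = min(lowest, level)
        highest = max(highest, level)
    return lowest, highest


in_phase = envelope_range(0.0)           # ordinary amplitude modulation
rotated = envelope_range(math.pi / 2.0)  # same magnitudes, carrier shifted
print("carrier at 0 degrees :", in_phase)
print("carrier at 90 degrees:", rotated)
```

Both signals have the same magnitude spectrum, yet the first envelope swings between roughly 0 and 2 while the second stays between roughly 1 and 1.41, which is the kind of envelope change the bullets above warn about at high frequencies.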
What are the hard level limits?
• The atmosphere, due to the discrete nature of air
molecules, has a noise level. At the eardrum, it is
approximately white noise at a level of 6dB SPL.
• The ear’s lowest detection level is about -6dB
SPL, which nearly matches the energy in the
critical band near the ear canal resonance due to
basic atmospheric noise.
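A back-of-the-envelope check of that match is sketched below. It rests on assumptions that are not stated in the slides: that the 6dB SPL figure is the total over a 20kHz bandwidth, that the noise is flat across it, and that the critical band of interest is 1/3 octave wide and centered at 4kHz (the upper end of the ear canal resonance range given earlier).

```python
import math

BROADBAND_NOISE_DB_SPL = 6.0           # from the slide
ASSUMED_TOTAL_BANDWIDTH_HZ = 20_000.0  # assumption, not from the slide
ASSUMED_CENTER_HZ = 4_000.0            # assumption, near the canal resonance
THIRD_OCTAVE_FACTOR = 2 ** (1 / 6) - 2 ** (-1 / 6)


def band_level_db(total_db, band_hz, total_hz):
    """Level of flat (white) noise inside a sub-band of width band_hz."""
    return total_db + 10.0 * math.log10(band_hz / total_hz)


band_hz = THIRD_OCTAVE_FACTOR * ASSUMED_CENTER_HZ
level = band_level_db(BROADBAND_NOISE_DB_SPL, band_hz,
                      ASSUMED_TOTAL_BANDWIDTH_HZ)
print(f"band width {band_hz:.0f} Hz, noise in band {level:.1f} dB SPL")
```

Under these assumptions the in-band noise lands within a couple of dB of the -6dB SPL detection level; different bandwidth or center frequency choices shift the result by a few dB.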
Fletcher’s loudness plot goes here.
(From Fletcher)
What about the loud end of things?
• Anything above 120dB SPL is bad for the
auditory system.
• Anything above 140dB SPL is in a regime where
the atmosphere is very nonlinear. Some signals
(percussion, natural sounds, shuttle takeoffs) may
reach these levels.
• 70-80dB of instantaneous dynamic
range across frequency in a 20 millisecond period
is approximately the largest spectral tilt that is
audible.
What does extreme loudness mean?
• 194dB SPL in a sine wave represents a sine wave
that goes from zero to two atmospheres. This can
not be physically realized.
• Above that level, the proper term is “shock wave”,
as the air is propagating in a very nonlinear
fashion.
• 32 bits of uniform PCM dynamic range takes us
from the noise level of the atmosphere (6dB SPL)
to 198dB SPL, or 4dB above 1 atmosphere. This
level is usually experienced in catastrophic
military situations.
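The numbers in these bullets can be reproduced with two small formulas. The sketch below assumes the usual 20 micropascal reference for 0dB SPL and a standard atmosphere of 101325 Pa, and, as the slide appears to, treats the converted value loosely as the pressure swing of the sine wave; the function names are hypothetical.

```python
import math

REF_PRESSURE_PA = 20e-6    # conventional 0 dB SPL reference (assumption)
ATMOSPHERE_PA = 101_325.0  # standard atmosphere (assumption)
NOISE_FLOOR_DB_SPL = 6.0   # atmospheric noise level from the slides


def spl_to_pascals(spl_db):
    """Sound pressure in pascals for a level in dB SPL."""
    return REF_PRESSURE_PA * 10.0 ** (spl_db / 20.0)


def pcm_dynamic_range_db(bits):
    """Ratio of full scale to one step of uniform PCM, in dB."""
    return 20.0 * math.log10(2.0 ** bits)


pressure_194 = spl_to_pascals(194.0)
top_of_32_bits = NOISE_FLOOR_DB_SPL + pcm_dynamic_range_db(32)
print(f"194 dB SPL = {pressure_194:.0f} Pa "
      f"({pressure_194 / ATMOSPHERE_PA:.2f} atmospheres)")
print(f"32-bit PCM spans {pcm_dynamic_range_db(32):.1f} dB, "
      f"reaching {top_of_32_bits:.1f} dB SPL from a 6 dB SPL floor")
```

The pressure at 194dB SPL comes out at very nearly one atmosphere, and 32 bits at about 6dB per bit reaches the 198dB SPL region quoted above.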
What about high and low
frequencies?
• Frequencies in the lowest audio octave are sensed
substantially by the body. The hearing apparatus has a
high-pass filter, which is fortunate, because otherwise the
“weather” would be deafeningly loud.
• 20kHz is not a firm “cutoff” for human hearing. Children
appear to hear above 20kHz, as do some teens who haven’t
been noise-exposed.
• Age and noise exposure reduce high-frequency hearing
ability.
• At high power levels, ultrasonic signals are perceived on
the skin. These levels are approached in sonar and the like;
however, the only musical occurrences may be from
percussion at close range.
What about nonlinear effects?
• The ear analyzes on a time-scale much like
that of the cochlear filters. If a long-term
signal or signal-processing process is longer
than the shortest cochlear filter, the effects
of the nonuniform time/frequency scaling
and detection must be considered.
An extreme example, pre-echo in
audio codecs.
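A toy version of the pre-echo mechanism can be sketched as follows. This is not how a real codec works: it swaps in a plain DFT and a uniform quantizer with a hypothetical step size in place of a codec's filterbank and psychoacoustic model. It only shows that quantization error introduced in the frequency domain spreads across the whole block, including the silent part before the attack.

```python
import cmath
import math

BLOCK_LEN = 64  # hypothetical transform block length
ONSET = 48      # sample index where the attack starts
STEP = 0.5      # hypothetical (deliberately coarse) quantizer step


def dft(x):
    """Naive forward DFT of a real or complex sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n)
                for t in range(n)) for k in range(n)]


def idft_real(spectrum):
    """Naive inverse DFT, keeping only the real part of the result."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n for t in range(n)]


def quantize(value, step=STEP):
    """Uniform quantization of real and imaginary parts separately."""
    return complex(round(value.real / step) * step,
                   round(value.imag / step) * step)


# Silence followed by a decaying burst (a crude stand-in for an attack).
block = [0.0] * ONSET + [math.cos(0.5 * math.pi * m) * 0.9 ** m
                         for m in range(BLOCK_LEN - ONSET)]

decoded = idft_real([quantize(c) for c in dft(block)])

energy_before_original = sum(v * v for v in block[:ONSET])
energy_before_decoded = sum(v * v for v in decoded[:ONSET])
print("energy before onset, original:", energy_before_original)
print("energy before onset, decoded :", energy_before_decoded)
```

The original block is exactly silent before the onset, while the decoded block is not; that leaked energy ahead of the attack is the pre-echo.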
A potential, but unproven, issue with
pre-echo.
Some conclusions
• Audible effects must be considered as
analyzed by critical band filters. These
filters determine both time and frequency
sensitivity to artifacts.
• Altering waveform at low frequencies, or
signal envelope at high frequencies, will
create audible differences.
• 0dB SPL is a more than reasonable minimum level
for presentations. More low-level response is only
useful before the ear is involved.
• Recording engineers may meet levels peaking
above 150dB or so, but they may not be either
accurately recordable or reproducible.
• 20kHz is a reasonable limit for adult human
beings, but is not a “hard limit”. A young
individual may be able to hear above 20kHz.
Other sensory modes are not generally active at
high frequencies at levels that we hope to be
exposed to.
• A variety of nonlinear effects may create
audible differences due to small time or
frequency changes in signals.
• In general, the farther removed from the
original frequency that an artifact occurs,
the more audible it will be, if it creates
sensation or changes sensation at a point
where signal energy on the basilar
membrane is lower.