Perception of mid frequency and high
frequency intermodulation distortion in
loudspeakers, and its relationship to
high-definition audio.
(A physicist meets the twilight zone)
David Griesinger
Lexicon
3 Oak Park
Bedford, MA 01730
Why Bother?
• What is the relationship between high frequency intermodulation
distortion and recordings with frequency response above 20kHz?
• Why do my choral recordings sound fuzzy – particularly when played
at high level?
– And does the perception of “fuzziness” grow more obvious with long-term
listening?
• Is the short-term A/B test the ultimate “Gold Standard” for audio
reproduction?
– Is it possible that long-term listening can reveal flaws that go undetected
in a short-term A/B test?
• This talk will examine the physics and physiology that these
questions involve.
– We may even suggest a few answers!
High Frequency Intermodulation
Distortion and Ultrasonic reproduction
• THE essential paper on this subject is: Karou and Shogo
“Detection of Threshold for tones above 22kHz.” –
Convention paper 5401 presented at the 110th Convention,
May 12-15 2001, Amsterdam.
– The authors presented 13 subjects with a test signal consisting of a
2kHz tone combined with odd order harmonics, both sonic and
ultrasonic.
• The ultrasonic harmonics were switched on and off at a 2Hz rate.
– ALL subjects could discriminate the ultrasonics when the
combined signal was presented through a single loudspeaker.
– NONE of the subjects could discriminate the ultrasonics when
each ultrasonic harmonic was reproduced from a separate speaker.
Spectrum at the listener position – from
Karou and Shogo
Note that an essential feature of this experiment is that the stimulus harmonics
are all ODD. Asymmetric non-linearity produces both odd and EVEN
harmonics – which are immediately visible (and audible.)
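The point about odd and even harmonics can be checked numerically. The following is a minimal sketch, not the stimulus actually used in the paper: the list of harmonics, the sample rate, and the two polynomial non-linearities are illustrative assumptions. It passes an odd-harmonics-only signal through a symmetric and an asymmetric non-linearity and measures the 4kHz (even) component.

```python
import math

FS = 96000                        # sample rate in Hz (assumed)
N = 960                           # 10 ms, a whole number of 2 kHz periods
ODD = [1, 3, 5, 7, 9, 11, 13]     # hypothetical odd harmonics of 2 kHz, up to 26 kHz


def tone_amplitude(samples, freq_hz, fs_hz):
    """Amplitude of one frequency component via a single-bin DFT."""
    n = len(samples)
    w = 2.0 * math.pi * freq_hz / fs_hz
    re = sum(v * math.cos(w * i) for i, v in enumerate(samples))
    im = sum(v * math.sin(w * i) for i, v in enumerate(samples))
    return 2.0 * math.hypot(re, im) / n


def symmetric(v):
    """Odd-symmetric non-linearity (illustrative): treats + and - alike."""
    return v - 0.1 * v ** 3


def asymmetric(v):
    """Asymmetric non-linearity (illustrative): the squared term breaks symmetry."""
    return v + 0.1 * v * v


stimulus = [
    sum(math.sin(2.0 * math.pi * 2000.0 * h * i / FS) for h in ODD) / len(ODD)
    for i in range(N)
]

even_4k_sym = tone_amplitude([symmetric(v) for v in stimulus], 4000.0, FS)
even_4k_asym = tone_amplitude([asymmetric(v) for v in stimulus], 4000.0, FS)
print(f"4 kHz after symmetric distortion:  {even_4k_sym:.2e}")
print(f"4 kHz after asymmetric distortion: {even_4k_asym:.2e}")
```

Under these assumptions the symmetric case leaves the even harmonic at numerical zero, while the asymmetric case produces a clearly measurable 4kHz component.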
Karou and Shogo’s result is strong
• Their choice of source signal MAXIMIZES the (possible)
audibility of an ultrasonic signal.
– The sound pressure of the ultrasonic harmonics is equal to the
sound pressure of the harmonics below 20kHz.
• For almost all common sound sources the ultrasonics are weaker.
• Thus if the ultrasonics are perceived directly through some effect of
their presence on nerve firings for the lower harmonics, this signal
should produce a positive result.
– The use of only odd harmonics for the source signal maximizes the
chance that the ultrasonics will be perceived if ANY part of human
physiology is (asymmetrically) non-linear.
• The basilar membrane is inherently asymmetrically non-linear. The
hair cells are half-wave rectifiers.
• The probability of finding asymmetric non-linearities in other parts
of the system is large.
There was a null result!
• When the EXTERNAL non-linearities were
eliminated, there was NO ultrasonic perception.
– We can conclude that:
• The mechanical conduction of ultrasonics to the basilar
membrane is either effectively zero, or symmetrically linear.
• If significant ultrasonic energy actually reaches the basilar
membrane, there are NO hair cells that respond to it.
• We will present evidence from other experiments
that supports these conclusions.
DG experiment – Modulated Harmonics
• We wanted to measure the distortion generating mechanism observed
by Karou and Shogo by inducing distortion in common tweeters.
• Seemed like a simple experiment:
– Make some sweep signals in MATLAB
– Record a few common instruments with a B&K 4133 microphone
– Filter out the frequencies below 20kHz
– Play them back at various levels, and listen.
ENTER – the TWILIGHT ZONE
• All I needed was a good quality sound card that would record and
play back at 96kHz.
– Many manufacturers seem to offer such a thing at reasonable prices. I
chose one by the most popular manufacturer… let’s call them “C”.
• After the usual frustrating two hours to purchase the board, disassemble the
computer, install the drivers, call customer service when the drivers crash,
download new drivers from the web, etc.:
• Although you could set the device to record at 96kHz, signals above 23kHz
would not record.
• Matlab sweeps generated at 96kHz would not play back above 23kHz.
• More calls to customer service. Typical conversation: What interrupts are the
card using? What video card are you using? What operating system are you
using? (Windows 2000 pro.) OH! That’s your problem – try XP.
– So I decided to bite the bullet and upgrade to XP. I also upgraded to the
most expensive version of this manufacturer’s sound boards.
• But the XP upgrade took longer than expected at the computer shop.
More twilight
• I decided to buy a USB based device that did 96/24, as this
would be useful for demos from my laptop – such as at this
lecture.
– But the drivers would not load in my regular laptop, and the
customer service had no advice at all.
– I tried a different laptop. The drivers loaded fine. Same operating
system. (Windows ME.)
– But this external board would also not record or play above 23kHz.
– Customer service was again not very helpful. “What interrupts are
the USB service routine using? You are actually looking at the
output with an oscilloscope!?”
– It’s a good thing I am sometimes known as “Mr. Matrix!”
A bit of light
• So I just kept calling customer service until I got an
operator who was willing to answer the question “Are
ANY of your boards actually capable of operating at
96kHz?”
– After a great deal of waiting on hold the answer came back.
– “NO”
– “In fact, none of our competitors do either… We all have 96/24
converters, but we run them at 48kHz.”
– What do you suggest that I do…
• “Don’t buy consumer – go to pro.”
– So I did. No problem, just much more money.
So – finally to the experiments
• A C language program was written to generate a twin-tone frequency
sweep, broken into tone bursts (so as not to burn out the tweeter.)
• The sweep is repeated with 6dB increases in level, so the level
dependence of any distortion could be measured.
Spectrum of the burst signal
• The two frequencies are relatively constant during each burst, but
sweep slowly upwards in frequency.
• The start frequencies and end frequencies of the sweep in each tone
can be specified, along with the sweep rate.
• This allows different types of non-linearity to be tested.
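The burst signal described above can be sketched as follows. This is not the original C program: the function names, burst and gap durations, the Hann window, and the example frequencies are all assumptions made for illustration. One function plans the frequency pair and level of each burst, the other renders a single burst followed by silence.

```python
import math


def burst_plan(fa_start, fa_end, fb_start, fb_end, n_bursts, n_levels, top_db=0.0):
    """List (level_db, f_a, f_b) per burst; n_bursts must be at least 2.

    Each pass steps both tones upward geometrically, and passes repeat
    in 6 dB steps ending at top_db.
    """
    plan = []
    for level in range(n_levels):
        level_db = top_db - 6.0 * (n_levels - 1 - level)
        for k in range(n_bursts):
            frac = k / (n_bursts - 1)
            f_a = fa_start * (fa_end / fa_start) ** frac
            f_b = fb_start * (fb_end / fb_start) ** frac
            plan.append((level_db, f_a, f_b))
    return plan


def render_burst(level_db, f_a, f_b, fs=96000, on_s=0.05, off_s=0.15):
    """One twin-tone burst followed by silence (a 25% duty cycle here)."""
    gain = 0.5 * 10.0 ** (level_db / 20.0)      # 0.5 keeps the two-tone sum within +/-1
    n_on = int(round(on_s * fs))
    n_off = int(round(off_s * fs))
    burst = []
    for i in range(n_on):
        window = 0.5 * (1.0 - math.cos(2.0 * math.pi * i / (n_on - 1)))  # Hann, avoids clicks
        tones = math.sin(2.0 * math.pi * f_a * i / fs) + math.sin(2.0 * math.pi * f_b * i / fs)
        burst.append(gain * window * tones)
    return burst + [0.0] * n_off


plan = burst_plan(22000.0, 40000.0, 24000.0, 42000.0, n_bursts=5, n_levels=3)
first = render_burst(*plan[0])
print(len(plan), "bursts,", len(first), "samples in the first one")
```

Keeping the two start frequencies and end frequencies independent mirrors the slide's remark that different choices probe different types of non-linearity.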
Result – nothing significant is heard.
• Subharmonics of the burst signal can be heard with
difficulty…
– In a quiet room…
• When the ultrasonic level is above 80dB SPL at one meter the
subharmonics are below 30dB SPL, at 15cm from the tweeter.
– The levels of the sub harmonics are consistent with the distortion
in the amplifier - ~0.1 %.
• The LOUDSPEAKER seems blameless.
– 4 different loudspeakers with different tweeters were tried, with the
same result.
– The observed harmonics were produced by the amplifier.
DG experiment 2: Rattling Keys
• A set of three house keys on a plastic key ring was shaken in front of a B&K
4133 microphone, and the output was recorded at 96kHz.
• The resulting signal has an enormous crest factor: 28dB.
– This means it is ~16dB quieter than non-compressed music with the same peak
signal voltage.
• And very high ultrasonic content.
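Crest factor is the ratio of peak to RMS level, expressed in dB. A minimal sketch of the calculation follows; the two test signals are illustrative and are not the key recording.

```python
import math


def crest_factor_db(samples):
    """Peak-to-RMS ratio in dB."""
    peak = max(abs(v) for v in samples)
    rms = math.sqrt(sum(v * v for v in samples) / len(samples))
    return 20.0 * math.log10(peak / rms)


# A sine wave has a crest factor of about 3 dB.
sine = [math.sin(2.0 * math.pi * i / 1000.0) for i in range(1000)]

# One full-scale sample in 1000 is an extreme, impulsive case.
click = [1.0] + [0.0] * 999

print(f"sine:  {crest_factor_db(sine):.2f} dB")
print(f"click: {crest_factor_db(click):.2f} dB")
```

The figures on the slide imply a crest factor of roughly 28 - 16 = 12dB for the non-compressed music used as the comparison.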
Keys – spectrum
Experiment:
• Reproduce the signal with and without the
ultrasonic component.
– This was done by low-pass filtering the original
signal at 20kHz, and alternating it with the full
bandwidth signal.
• Reproduce only the components above 20kHz
and listen for ANY audible sound.
Keys – ultrasonics
• Waveform of the frequencies above 20kHz – same scale as
previous slide.
Result – the same as for bursts
• No difference could be heard with and
without the ultrasonics
– (but the one subject was rather old…)
• When the ultrasonic signals only were
played at high levels, intermodulation
products from the input signals were easily
heard
– at levels consistent with amplifier distortion.
Conclusions from DG ultrasonic
tests
• The various tweeters tested – 3 metal dome
tweeters and one soft dome tweeter – produce
insignificant amounts of intermodulation products
below 20kHz when driven by ultrasonic signals.
• Amplifier distortion can produce distortion
products below 20kHz that are audible (with
difficulty) in the absence of other signals below
20kHz.
– But with a high quality amplifier these distortion
products are not audible in the presence of even
extraordinary ultrasonic sources such as rattling keys.
– Unless the amplifier is driven into clipping.
Ultrasonic content of musical
instruments
• Trumpet – spectrum of the note with the highest harmonics:
Sopranino Recorder
• Spectrum of highest note – 3200Hz
Sopranino Recorder 2
• Although the highest note of the particular sopranino
recorder I own produces ultrasonic harmonics –
• These harmonics are AT MAXIMUM 40dB below the
level of the fundamental.
– Compare this to the levels used by Karou and Shogo, where the
ultrasonic harmonics were equal in level to the fundamentals
• Notice also that both even and odd harmonics are present
in the sopranino.
– So any even order harmonic products will be masked.
Percussion
• DG lacks a home drum set. So he went looking for recordings of
drums…
• MORE TWILIGHT ZONE
• I have many samplers and examples of SACD and DVD audio disks.
• Very few had any popular music that was not resampled from 48kHz.
• So I borrowed three more samplers and five SACD disks from John
Newton.
– None of the popular music samples had anything at all above 23kHz.
SACD examples – Sting
Steely Dan – Two Against Nature
Diana Krall
Jazz at 192kHz – test DVD – (the spot with
the highest harmonics)
Notice the ultrasonic harmonics are lower than the fundamentals by more than
42dB.
John Eargle, Schnittke SACD
Note the ultrasonic harmonics disappear into the SACD noise at about 27kHz. Remember
that the SACD noise is believed by everyone to be inaudible.
Timing accuracy and information theory.
• It is widely believed that the assumed superiority of DVD audio and
SACD is improved “resolution” due to improved “timing accuracy.”
• It is well known that human binaural hearing can distinguish timing
differences between the ears of as little as two microseconds
– This is often taken to imply that the frequency response of the
physiological system must extend to 500kHz.
• However it is easy to demonstrate that a 1kHz sine tone – modulated
with a raised cosine – can be accurately localized, even though the
waveform contains NO frequencies above 1400Hz!
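The bandwidth claim can be checked with a single-bin DFT. This is a minimal sketch assuming a periodic raised-cosine envelope at 400Hz, a value chosen here so that the highest component lands at the 1400Hz mentioned above; the slide does not give the envelope actually used.

```python
import math

FS = 48000            # sample rate in Hz (assumed)
N = 4800              # 0.1 s, so every component falls exactly on a 10 Hz bin
CARRIER_HZ = 1000.0
MOD_HZ = 400.0        # assumed envelope rate


def tone_amplitude(samples, freq_hz, fs_hz):
    """Amplitude of one frequency component via a single-bin DFT."""
    n = len(samples)
    w = 2.0 * math.pi * freq_hz / fs_hz
    re = sum(v * math.cos(w * i) for i, v in enumerate(samples))
    im = sum(v * math.sin(w * i) for i, v in enumerate(samples))
    return 2.0 * math.hypot(re, im) / n


signal = []
for i in range(N):
    t = i / FS
    envelope = 0.5 * (1.0 - math.cos(2.0 * math.pi * MOD_HZ * t))   # raised cosine
    signal.append(envelope * math.sin(2.0 * math.pi * CARRIER_HZ * t))

spectrum = {f: tone_amplitude(signal, f, FS) for f in (600, 1000, 1400, 1800, 2200)}
for freq, amp in spectrum.items():
    print(f"{freq:5d} Hz: {amp:.4f}")
```

Only the carrier and the two sidebands at 600Hz and 1400Hz carry energy; the bins above 1400Hz are empty to within rounding error.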
Bandwidth and Signal to Noise Ratio
• In Physics, the accuracy of timing is not determined by the bandwidth, but
roughly by the product of the bandwidth and the signal to noise ratio.
– Audio systems have low bandwidth but very high signal to noise ratio.
• The hair cells in the basilar membrane fire when movement causes an ion
channel to open.
– Firings maximize at positive zero crossings of the membrane motion
Timing difference and signal to noise
• The timing difference between two waveforms can be determined as long as
the signal to noise ratio is high enough to allow accurate determination of the
zero crossing.
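A rough worked estimate of this trade-off: near a zero crossing a sine of amplitude A and frequency f has slope 2*pi*f*A, so noise of RMS value sigma shifts the crossing by about sigma/(2*pi*f*A). The sketch below applies that first-order approximation; defining signal to noise ratio as peak amplitude over RMS noise is an assumption made here for simplicity.

```python
import math


def zero_crossing_jitter_s(freq_hz, snr_db):
    """Approximate RMS timing error of a sine zero crossing.

    snr_db is peak signal amplitude over RMS noise, in dB (assumed definition).
    """
    snr_amplitude = 10.0 ** (snr_db / 20.0)
    return 1.0 / (2.0 * math.pi * freq_hz * snr_amplitude)


def snr_db_needed(freq_hz, jitter_s):
    """SNR in dB needed to reach a given zero-crossing accuracy."""
    return 20.0 * math.log10(1.0 / (2.0 * math.pi * freq_hz * jitter_s))


needed = snr_db_needed(1000.0, 2.0e-6)
print(f"SNR for 2 microseconds at 1 kHz: {needed:.1f} dB")
print(f"Jitter at 1 kHz with 80 dB SNR: {zero_crossing_jitter_s(1000.0, 80.0) * 1e9:.1f} ns")
```

On this estimate a 1kHz tone needs under 40dB of signal to noise ratio for a single zero crossing to be located within two microseconds, with no ultrasonic bandwidth involved.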
Sampling Theory and Timing
• Sampling theory proves that as long as the sampling rate is at least twice
the bandwidth of a signal ALL the information content of that signal will
be retained after sampling, and can be exactly reconstructed.
Sampling Theory and interchannel Timing
• The timing of any zero-crossing is exactly preserved if SR > 2*BW. Extra
samples are wasted.
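As an illustration of timing finer than one sample period, the sketch below delays a 1kHz tone by 2 microseconds, samples it at 48kHz (a sample period of about 20.8 microseconds), and recovers the delay from the samples by correlating against sine and cosine references. The frequencies and the estimator are assumptions chosen for a compact demonstration; it is not a model of any particular converter.

```python
import math

FS = 48000       # sample rate in Hz
F = 1000.0       # tone frequency in Hz
N = 480          # 10 ms, exactly ten periods of the tone


def sampled_tone(delay_s):
    """Samples of a 1 kHz sine delayed by delay_s seconds."""
    return [math.sin(2.0 * math.pi * F * (i / FS - delay_s)) for i in range(N)]


def estimate_delay_s(samples):
    """Recover the delay from the phase of the sampled tone."""
    w = 2.0 * math.pi * F / FS
    s = sum(v * math.sin(w * i) for i, v in enumerate(samples))
    c = sum(v * math.cos(w * i) for i, v in enumerate(samples))
    # sin(w*i - phi) = sin(w*i)cos(phi) - cos(w*i)sin(phi),
    # so s is proportional to cos(phi) and c to -sin(phi).
    phase = math.atan2(-c, s)
    return phase / (2.0 * math.pi * F)


left = sampled_tone(0.0)
right = sampled_tone(2.0e-6)
difference = estimate_delay_s(right) - estimate_delay_s(left)
print(f"sample period:   {1e6 / FS:.2f} microseconds")
print(f"recovered delay: {difference * 1e6:.4f} microseconds")
```

In this noise-free case the interchannel delay is recovered to within rounding error, even though it is about a tenth of the sample period.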
Sampling and reconstruction in practice
• 10 or more years ago it was difficult to sample signals with sufficient accuracy
to approach the theoretical ideals.
• Similarly, reconstruction with practical D/A converters and filters could cause
(barely) audible artifacts.
• The artifacts present in practical converters were reduced at higher sampling
rates.
– So raising the sampling rate above 48kHz seemed reasonable for the highest quality
audio.
• Presently integrated A/D and D/A converters are available that sample and
reconstruct signals to the theoretical limits (at 18 or 20 bit accuracy).
• These converters are inexpensive and in wide use. For these converters, there
is no advantage to higher sampling rates – unless we can prove that ultrasonic
frequencies somehow contribute to human perception.
– The author is unaware of any experiment meeting double blind standards that
supports this claim.
Absolute timing and Human Physiology
• “The human brain is a computer of great sophistication and complexity, with a
clock frequency of 1kHz.” (Hiroshi Riquimaroux)
• The author is unaware of any experiment that shows timing accuracy
in speech or music that is better than about 1ms.
• Physiological processes do exist in binaural hearing with interchannel timing
accuracies down to 2 microseconds, but this is not the same thing as long term
timing accuracy over fractions of a second.
• These interchannel timing differences are exactly preserved at common
sampling rates.
• Hiroshi’s comment pretty much sums it up!
Ultrasonic Directivity
• The directivity of a tweeter depends on the diameter of the diaphragm and the
frequency. As written in Matlab, using a Bessel function of order 1:
– If a is the diaphragm radius, lambda is the wavelength, theta is the angle off axis, and p is the sound pressure:
– mu = 2*pi*a/lambda;
– p = 2*besselj(1,mu*sin(theta))./(mu*sin(theta));
To actually hear ultrasonics the listener must be very carefully aligned with the driver,
both horizontally and vertically.
This precision of alignment is unlikely in music listening.
From Philip Morse: Vibration and Sound, Second edition, McGraw-Hill, 1948
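The piston directivity expression can be evaluated without MATLAB. The sketch below is an illustration under stated assumptions: a rigid circular piston, a hypothetical 12.5mm radius (a 25mm dome), and a speed of sound of 343 m/s. The Bessel function is computed from its integral form because the Python standard library has no Bessel routine.

```python
import math


def bessel_j1(x, steps=400):
    """J1(x) from the integral (1/pi) * int_0^pi cos(tau - x*sin(tau)) dtau."""
    h = math.pi / steps
    total = 0.5 * (math.cos(0.0) + math.cos(math.pi - x * math.sin(math.pi)))
    for k in range(1, steps):
        tau = k * h
        total += math.cos(tau - x * math.sin(tau))
    return total * h / math.pi


def directivity(theta_rad, radius_m, freq_hz, c_m_s=343.0):
    """Relative pressure 2*J1(mu*sin(theta)) / (mu*sin(theta)) for a piston."""
    mu = 2.0 * math.pi * radius_m * freq_hz / c_m_s    # 2*pi*a/lambda
    x = mu * math.sin(theta_rad)
    if abs(x) < 1e-9:
        return 1.0                                     # on-axis limit
    return 2.0 * bessel_j1(x) / x


RADIUS_M = 0.0125                  # hypothetical dome radius
OFF_AXIS = math.radians(30.0)
for freq in (3000.0, 10000.0, 30000.0):
    p = abs(directivity(OFF_AXIS, RADIUS_M, freq))
    print(f"{freq / 1000:5.0f} kHz, 30 degrees off axis: {20.0 * math.log10(p):6.1f} dB")
```

For these assumed dimensions the response 30 degrees off axis is nearly flat at 3kHz and far down at 30kHz, which is the narrowing the slide describes.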
The frequency transmission of the pinnae and
middle ear
From: B. C. J. Moore, B. R. Glasberg and T. Baer, “A model for the prediction of thresholds,
loudness and partial loudness,” J. Audio Eng. Soc., vol. 45, pp. 224-240 (1997).
Note that the external pinnae structures and the mechanics of the middle ear
severely attenuate sound transmission above 4kHz. The slope of this curve would
predict a transmission factor of –40dB at 30kHz.
Conclusions for “High Definition” Audio:
• Adding ultrasonics to a recording technique does NOT improve time
resolution of typical signals – either for imaging or precision of tempo. The
presumption that it does is based on a misunderstanding of both information
theory and human physiology.
• Karou and Shogo have shown that ultrasonic harmonics of a 2kHz signal are
NOT audible in the absence of external (non-human) intermodulation
distortion.
• Their experiments put a limit on the possibility that a physiological non-linearity
can make ultrasonic harmonics perceptible. They find that such a
non-linearity does not exist at ultrasonic sound pressure levels below 80dB.
• All commercial recordings tested by the author as of 6/1/03 contained either
no ultrasonic information, or ultrasonic harmonics at levels more than 40dB
below the fundamentals.
• Our experiments suggest that the most important source of audible
intermodulation for ultrasonics is in the electronics, not in the transducers.
• Some consumer grade equipment makes a tacit admission of the inaudibility of
frequencies above 22kHz by simply not reproducing them. Yet the advertising
for these products claims the benefits of “higher resolution.”
• Even assuming ultrasonics are audible, loudspeaker directivity creates an
unusually tiny sweet spot, both horizontally and vertically.
Mid-Frequency Intermodulation distortion
or “why do my loudspeakers sound fuzzy?”
• Problem:
– Loudspeaker reproduction of massed chorus and orchestra is often
perceived (by the author) as harsher than the live chorus.
– The degree of harshness seemed to depend on the loudspeaker
type, and on the playback level.
• It seemed worthwhile to investigate whether this
perception could be related to intermodulation distortion.
The validity of short duration A/B tests
• Floyd Toole has established a protocol for loudspeaker evaluation that
allows rapid comparison of two loudspeakers in the identical acoustical
location.
• Tests using this protocol have proven to be reliable, in that they
consistently rank-order loudspeakers in a way that is robust for
different listeners and for repeated tests.
• But it is not clear that rapid A/B tests are the ideal way to test for
intermodulation distortion, which may require a period of time to be
perceived.
– In a rapid A/B test the primary perception is loudness.
– Once loudness has been controlled, differences in frequency response and
timbre dominate the result.
• Human hearing adapts to errors in spectrum and timbre over a period
of 10 to 20 minutes.
– Even an old fashioned phonograph sounds pretty good once you get used
to it!
– And many of the monitoring loudspeakers in common use are severely
colored. The people who use them say they sound fine.
• Is it possible that after adapting to spectrum and timbre
intermodulation problems might become more apparent?
Test Signals
We wanted test signals that would mimic the levels and frequencies found in
choral and orchestral music.
This music typically has fundamentals in the range of 100Hz to 1000Hz, with
substantial high-order harmonics.
The fundamentals and harmonics in the mix are related by common musical intervals.
A sweep signal consisting of two or more pitches was chosen, where the pitch
intervals were held constant. Preliminary tests showed that the harmonic
content of the chosen pitches did not significantly affect either the measured or
perceived amount of intermodulation distortion.
Eventually a minor triad was chosen as a test signal. The signal consists of three
equal amplitude sine waves: a root, the minor third above, and the fifth above
the root. The minor third was chosen as even tempered, and the fifth was
chosen to be perfect.
Thus a sweep would consist of a frequency f0 which sweeps from 250Hz to 4kHz,
in combination with f1 = 1.1225*f0, in combination with f2 = 1.5*f0.
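The interval ratios can be checked by converting them to cents (1200 cents per octave, 100 per equal-tempered semitone). This is a small arithmetic sketch; the function name is an illustrative choice.

```python
import math


def cents(ratio):
    """Size of a frequency ratio in cents (1200 cents = one octave)."""
    return 1200.0 * math.log2(ratio)


quoted_third = cents(1.1225)                  # ratio quoted for f1
quoted_fifth = cents(1.5)                     # ratio quoted for f2
equal_tempered_third = 2.0 ** (3.0 / 12.0)    # three equal-tempered semitones

print(f"1.1225 -> {quoted_third:.1f} cents")
print(f"1.5    -> {quoted_fifth:.1f} cents")
print(f"equal-tempered minor third ratio: {equal_tempered_third:.4f}")
```

By this arithmetic 1.5 is a perfect fifth of about 702 cents, and 1.1225 comes out at about 200 cents, which is two equal-tempered semitones; three semitones would be a ratio of about 1.1892. Which value the original test program used is not stated, so the later sketches treat the ratios as parameters.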
Sweep rate and waveform
The sweep rate was set at 10 seconds for a four octave sweep.
The sweep is then repeated at a 6dB higher amplitude, until the maximum
level is reached.
Each block in the waveform below is 10 seconds long, and sweeps f0 over
four octaves.
In this case the signal has been formed by summing the sines of f0, f1, and
f2, resulting in a symmetric output signal.
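One way to generate such a sweep is to integrate an exponentially rising frequency into a phase and scale that phase by each interval ratio. The sketch below is an illustration, not the original program: the function names and default sample rate are assumptions, and the default ratios are the ones quoted on the slide.

```python
import math


def instantaneous_f0(t, f_start=250.0, octaves=4.0, duration_s=10.0):
    """Root frequency at time t for an exponential (constant octaves per second) sweep."""
    return f_start * 2.0 ** (octaves * t / duration_s)


def triad_sweep(fs=48000, f_start=250.0, octaves=4.0, duration_s=10.0,
                ratios=(1.0, 1.1225, 1.5), level_db=0.0):
    """Sum of equal-amplitude sines whose frequencies keep fixed ratios while sweeping."""
    gain = 10.0 ** (level_db / 20.0) / len(ratios)   # normalise so the sum stays within +/-1
    k = octaves * math.log(2.0) / duration_s          # exponential growth rate, 1/s
    n = int(duration_s * fs)
    out = []
    for i in range(n):
        t = i / fs
        # Phase of the root: 2*pi times the integral of f_start*exp(k*t).
        phase0 = 2.0 * math.pi * f_start * (math.exp(k * t) - 1.0) / k
        out.append(gain * sum(math.sin(r * phase0) for r in ratios))
    return out


# Short, low-rate example so it runs quickly; each later block would add 6 dB.
block = triad_sweep(fs=8000, duration_s=0.5, level_db=-12.0)
print(len(block), "samples; f0 ends at", instantaneous_f0(10.0), "Hz in the full sweep")
```

Multiplying one shared phase by each ratio keeps the intervals exactly constant throughout the sweep, and using sines that all start at zero phase matches the slide's description of summing the sines of f0, f1, and f2.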
Fine waveform and spectrum
Result – speaker 1
Typical output spectrum at the highest level.
Analysis
• Results were analyzed with a C language program that
eliminated the source tones with a tracking filter.
• This program outputs files that can be plotted with
MATLAB.
Notice the total distortion is not strongly level dependent in this loudspeaker.
Nor is it strongly dependent on frequency.
Is this possibly a source of “fuzziness?”
Analysis of distortion
• The tracking filter selects two harmonics
– One characteristic of symmetric distortion,
– And one characteristic of asymmetric distortion.
Symmetric distortion
Asymmetric distortion
Notice that this loudspeaker has a symmetric distortion characteristic almost
independent of level, at least below 1kHz. Asymmetric distortion rises with level.
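The slides do not say which distortion products the tracking filter selected. As background, a symmetric (odd) non-linearity generates only odd-order products, while an asymmetric one also generates even-order products such as difference tones. The sketch below, with a hypothetical helper name and example frequencies, lists the product frequencies of a given order for a set of tones.

```python
from itertools import product


def intermod_products(freqs, order):
    """Frequencies |c1*f1 + c2*f2 + ...| with integer c and sum of |c| equal to order."""
    found = set()
    for coeffs in product(range(-order, order + 1), repeat=len(freqs)):
        if sum(abs(c) for c in coeffs) != order:
            continue
        f = abs(sum(c * f for c, f in zip(coeffs, freqs)))
        if f > 0:
            found.add(round(f, 6))
    return sorted(found)


tones = [1000.0, 1500.0]
second_order = intermod_products(tones, 2)   # characteristic of asymmetric distortion
third_order = intermod_products(tones, 3)    # produced by symmetric distortion
print("2nd order:", second_order)
print("3rd order:", third_order)
```

A tracking filter could follow one product from each list as the tones sweep; the choice of product is left open here because the source does not specify it.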
Is the observed distortion audible?
• The observed distortion is audible on the test signal, particularly on the
high level segment. But only if the test subject wears earplugs.
• Distortion is audible on the lower level portions also, and it seems
relatively independent of level.
• But is the perceived distortion in the speaker – or is it in the listener?
• As a test, the same signal was reproduced through three loudspeakers,
one for each frequency, f0, f1, and f2.
– The perception was subtly different when the signals were combined and
reproduced through a single loudspeaker.
– But these differences could be ascribed to the non-anechoic conditions of
the test.
Human hearing is inherently non-linear
Hair cells fire when the ion channel controlled by the hair opens.
This causes a burst of neural activity at the zero-crossings of the pressure waveform.
This process is similar to a half-wave rectifier followed by a differentiator.
All the sounds we hear pass through this asymmetric non-linear system.
We perceive the signals as undistorted only through the action of the filters in
the basilar membrane.
These filters are not particularly effective at low frequencies!
Hair cell firing
• Hair cells act as a half-wave rectifier. We are unaware of the
(negative) half of the waveform.
Result of the half-wave rectification:
• The pitch of low frequencies is determined not through the basilar
membrane filters, but through the time intervals between nerve firings.
• Consequently we cannot distinguish between real frequencies and
subharmonics generated through the half-wave rectification process.
– This leads to the well-known phenomenon of “false bass”
• Listening to two tones that are harmonically related will often produce the
perception of the fundamental. For example, a tone at 50Hz will be heard
when 100Hz and 150Hz are played together.
• Complex low frequency signals, such as a minor triad, are heard as an
un-interpretable mix of fundamentals and harmonics.
– Composers – outside of grunge rock – tend to avoid them!
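The 100Hz plus 150Hz example can be reproduced with an ideal half-wave rectifier, used here as a crude stand-in for the hair cells and not as a model of hearing. The sample rate and duration are assumptions chosen so that every component falls on a DFT bin.

```python
import math

FS = 8000     # sample rate in Hz (assumed)
N = 1600      # 0.2 s, ten periods of 50 Hz


def tone_amplitude(samples, freq_hz, fs_hz):
    """Amplitude of one frequency component via a single-bin DFT."""
    n = len(samples)
    w = 2.0 * math.pi * freq_hz / fs_hz
    re = sum(v * math.cos(w * i) for i, v in enumerate(samples))
    im = sum(v * math.sin(w * i) for i, v in enumerate(samples))
    return 2.0 * math.hypot(re, im) / n


two_tones = [
    math.sin(2.0 * math.pi * 100.0 * i / FS) + math.sin(2.0 * math.pi * 150.0 * i / FS)
    for i in range(N)
]
rectified = [max(v, 0.0) for v in two_tones]    # ideal half-wave rectifier

before_50 = tone_amplitude(two_tones, 50.0, FS)
after_50 = tone_amplitude(rectified, 50.0, FS)
print(f"50 Hz before rectification: {before_50:.2e}")
print(f"50 Hz after rectification:  {after_50:.3f}")
```

The input contains nothing at 50Hz, yet the rectified signal has a substantial 50Hz component, consistent with the perceived fundamental described above.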
Example – a low frequency triad
• We can generate a minor triad sweep from 80Hz to 320Hz.
– The lower frequencies simply sound as if the loudspeaker is broken.
• Only above 250Hz do we begin to resolve the pitches that make up the
harmony.
Distortion Models - symmetric
• We need a mathematical model for loudspeaker distortion that will
allow us to find the “just noticeable level” at which distortion becomes
perceivable.
• After a great deal of head-scratching two simple models were chosen:
• 1. Symmetric compression – where delta is typically ~0.1 or less:
– If the signal voltage is positive:
Vout = exp((1-delta)*log(Vin))
– If the signal voltage is negative:
Vout = -exp((1-delta)*log(abs(Vin)))
• This distortion is identical to a uniform compression of the waveform.
If delta = 0.1, then the waveform is compressed by 1dB for every 10dB
of level increase.
• The advantage of this model is that the percentage distortion is
independent of the signal level and spectrum.
– There is a small dependence on crest factor.
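The symmetric model is equivalent to raising the magnitude of the signal to the power (1-delta) while keeping its sign. A minimal sketch follows; the function name and the zero-input handling are additions made here, since log(0) is undefined.

```python
import math


def compress(v_in, delta):
    """Symmetric compression: sign(v) * |v| ** (1 - delta)."""
    if v_in == 0.0:
        return 0.0                      # log(0) is undefined, so handle it explicitly
    magnitude = math.exp((1.0 - delta) * math.log(abs(v_in)))
    return math.copysign(magnitude, v_in)


DELTA = 0.1
quiet = compress(0.1, DELTA)
louder = compress(0.1 * 10.0 ** (10.0 / 20.0), DELTA)   # input raised by 10 dB
rise_db = 20.0 * math.log10(louder / quiet)
print(f"output rises {rise_db:.2f} dB for a 10 dB input increase")
```

With delta = 0.1 the output rises 9dB for a 10dB increase at the input, which is the 1dB of compression per 10dB described above.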
Distortion Models – asymmetric
• A simple asymmetric distortion model can be created by using a different gain
for the positive and negative signal voltages.
• If the signal voltage is positive:
– Vout = (1-delta_a)*Vin;
• If the signal voltage is negative:
– Vout = Vin;
• This model also produces a distortion that is independent of level and
spectrum.
• A C language program was written that applies these two distortions to a
stereo input file, measures the resulting percentage of distortion, and outputs
the distortion only as a stereo file.
• This program allows us to listen to the result of the distortion process on a
wide variety of input signals. Since the distortion alone is the output, various
distortion percentages can be created by simply mixing the distortion with the
original signal.
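The asymmetric model and a distortion-only output can be sketched as below. How the original C program separated the distortion from the signal is not described, so the least-squares subtraction and the RMS-based percentage used here are assumptions, and the sketch works on a mono list of samples in place of a stereo file.

```python
import math


def asymmetric(v_in, delta_a):
    """Lower gain for positive voltages, unity gain for negative ones."""
    return (1.0 - delta_a) * v_in if v_in > 0.0 else v_in


def rms(samples):
    return math.sqrt(sum(v * v for v in samples) / len(samples))


def distortion_only(clean, processed):
    """Subtract the best-fitting scaled copy of the clean signal (assumed method)."""
    gain = sum(c * p for c, p in zip(clean, processed)) / sum(c * c for c in clean)
    residual = [p - gain * c for c, p in zip(clean, processed)]
    return residual, gain


def distortion_percent(clean, delta_a):
    processed = [asymmetric(v, delta_a) for v in clean]
    residual, gain = distortion_only(clean, processed)
    return 100.0 * rms(residual) / (gain * rms(clean))


loud = [math.sin(2.0 * math.pi * i / 100.0) for i in range(1000)]
quiet = [0.1 * v for v in loud]
print(f"loud:  {distortion_percent(loud, 0.03):.3f} %")
print(f"quiet: {distortion_percent(quiet, 0.03):.3f} %")
```

The two percentages agree, illustrating the level independence claimed for this model; scaling the residual before adding it back to the clean signal gives any desired distortion percentage.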
Distortion test with Cool-Edit
Here is a segment from the Faure Requiem, repeated once.
Here is the same signal with delta = 0.05 and delta_a = 0.03. The distortion
has been amplified 20dB and the original signal removed.
Sum of signal and distortion
Examples at 30%, 15%, and 7% distortion.
The first section is distorted – the repeat is clean.
15% distortion is quite difficult to hear with this signal!!!
So why does it sound fuzzy?
Filter the short segment of Faure at 1kHz with a 100Hz bandwidth.
Playing this signal produces a shattering perception, particularly at high levels.
Considerable energy in the 100Hz region is perceived, even though there is no energy
in that frequency range.
Result – hair-cell distortion produces the
perception of shattering
• Explanation:
– A 100Hz bandwidth noise-like signal at 1kHz creates intermodulation
products in the 100Hz region when passed through an asymmetric
detector.
– These subharmonics may excite the neural sensors for low frequencies
directly through motion of the basilar membrane.
– They also might be directly perceived by correlation detectors in the 1kHz
neural channels.
• As the frequency is raised above 1kHz both detection mechanisms will
be less active.
– In fact, a 4kHz signal with a 100Hz bandwidth produces very little
shattering, and few perceptual subharmonics.
– A 4kHz signal with a 400Hz bandwidth sounds pretty bad.
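The first step of this explanation can be illustrated with three tones inside a 100Hz band around 1kHz, standing in for the noise-like signal, passed through an ideal half-wave rectifier. The tone frequencies, sample rate, and rectifier are assumptions; the sketch shows only where the intermodulation products land, not how they are perceived.

```python
import math

FS = 16000                        # sample rate in Hz (assumed)
N = 4000                          # 0.25 s, so 40 Hz and the tones fall on 4 Hz bins
BAND = [960.0, 1000.0, 1040.0]    # stand-in for a narrow band of noise


def tone_amplitude(samples, freq_hz, fs_hz):
    """Amplitude of one frequency component via a single-bin DFT."""
    n = len(samples)
    w = 2.0 * math.pi * freq_hz / fs_hz
    re = sum(v * math.cos(w * i) for i, v in enumerate(samples))
    im = sum(v * math.sin(w * i) for i, v in enumerate(samples))
    return 2.0 * math.hypot(re, im) / n


narrowband = [
    sum(math.sin(2.0 * math.pi * f * i / FS) for f in BAND) for i in range(N)
]
detected = [max(v, 0.0) for v in narrowband]    # asymmetric (half-wave) detector

low_before = tone_amplitude(narrowband, 40.0, FS)
low_after = tone_amplitude(detected, 40.0, FS)
print(f"40 Hz before detection: {low_before:.2e}")
print(f"40 Hz after detection:  {low_after:.3f}")
```

The difference frequencies between components of the band are at most as wide as the band itself, so a 100Hz-wide signal produces products in the region at or below 100Hz.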
Shattering at high frequencies
• Try an 8kHz signal with a 250Hz bandwidth:
• How about 18kHz and a 500Hz bandwidth?
• How about 15kHz and 500Hz bandwidth?
Converter Intermodulation
• Very inexpensive converters can have high
intermodulation distortion at high frequencies
– For example, the converters in this laptop.
• Example – 15kHz 500Hz bandwidth as output from this
laptop.
Conclusions – fuzzy speakers
• The loudspeakers tested have intermodulation distortion
lower than the threshold of detection for complex tones.
• Non-linear distortion in human hearing appears to account
for the audible distortion in full chorus.
– The non-linearities – particularly at high frequencies – may be a
form of age-related hearing loss.
– This type of distortion may be well understood by researchers in
the fields of hearing and speech, or hearing pathology.
Conclusions: A/B tests
• No evidence was uncovered in this study that would
invalidate rapid, blind, A/B tests as the gold standard for
audio research.
• But the possibility remains…
– Particularly in the study of room acoustics
• Intelligibility, muddiness, and envelopment all may depend on the
time period devoted to listening to a particular acoustic signal.