JND in Sound Intensity


Psychoacoustics and
MP3 audio
encoding
Physics
of Music
PHY103
MP3
• MPEG is the Moving Picture Experts Group,
set up by ISO (International Organization for
Standardization). Every few years it issues a
standard: MPEG-1 (1992), MPEG-2 (1994), ...
• MP3 stands for MPEG audio layer III
• Longer history – age of photo-video
compression – in part started with audio
compression experiments in the late ’80s
Auditory Coding
1) Time frequency decomposition – divide
the signal into pieces, obtain the spectrum
of each piece
2) Use psycho-acoustic masking model to
determine what information to keep
3) Store the information in the most compact
way possible – minimize the bitrate and
maximize the audible auditory content
4) System of synchronization
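Step 1 above can be sketched in a few lines. This is only an illustration: the 8 kHz sample rate, the 1 kHz test tone and the 64-sample pieces are assumed values, and a plain DFT stands in for the filter bank and MDCT that MP3 actually uses.

```python
import cmath
import math

def frames(signal, size, hop):
    """Split a signal into overlapping pieces of `size` samples."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, hop)]

def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 (fine for a short demo frame)."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]

sample_rate = 8000   # Hz, assumed for the demo
tone_hz = 1000       # test tone, assumed for the demo
frame_size = 64      # samples per piece, assumed

signal = [math.sin(2 * math.pi * tone_hz * t / sample_rate) for t in range(256)]

pieces = frames(signal, size=frame_size, hop=frame_size // 2)
spectra = [magnitude_spectrum(p) for p in pieces]

# Strongest bin in each piece, converted back to Hz.
peak_bins = [s.index(max(s)) for s in spectra]
peak_hz = [b * sample_rate / frame_size for b in peak_bins]
```

Every piece reports its peak at the bin that holds the test tone, which is the kind of per-piece spectral information the later steps decide how precisely to store.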
Encoding and Decoding
Encoding:
• Auditory signal (from a recording) is coded into
an mp3 file containing carefully stored spectral
information
Decoding:
• mp3 file is turned back into an auditory file that
can be output to your speakers
Streaming:
• This can be done in real time even if you don’t
have the entire file
Lossy vs Lossless compression
• Compression: Store in a very compact format, more
compact than the original audio file
• Lossless compression means no information is removed
• MP3 is a lossy type of compression. Information is lost
during compression. Only inaudible information should be
removed. Whether expert listeners can hear differences,
and how much is enough, is a topic of current research.
• MP3 achieves a 10:1 compression ratio!
• This enables bit-streaming, makes storing audio very
compact
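The compression ratio can be checked with simple arithmetic. The sketch below assumes CD-style stereo audio (16 bit, 44.1 kHz, 2 channels) and a 128 kbit/s MP3 stream; the 128 kbit/s figure is a common choice used here for illustration, not a value taken from the slides.

```python
def pcm_bitrate_kbps(bits_per_sample, sample_rate_hz, channels):
    """Bitrate of uncompressed PCM audio in kbit/s."""
    return bits_per_sample * sample_rate_hz * channels / 1000.0

cd_kbps = pcm_bitrate_kbps(16, 44100, 2)   # uncompressed stereo
mp3_kbps = 128.0                           # assumed MP3 bitrate
ratio = cd_kbps / mp3_kbps

print(f"PCM: {cd_kbps:.1f} kbit/s, MP3: {mp3_kbps:.0f} kbit/s, ratio {ratio:.1f}:1")
```

With these assumed numbers the ratio comes out close to the 10:1 quoted above.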
Adding noise
• Rather than removing information MP3
adds noise. This is done by describing the
signal with degraded digital precision.
• If you fail to digitize something sufficiently
accurately, this is equivalent to adding noise
• The added noise should be inaudible if it is
below the masking threshold
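The link between digital precision and noise can be measured directly. The sketch below rounds a sine wave to a given number of bits and computes the resulting signal to noise ratio; the uniform quantizer, the test frequency and the sample count are all assumptions chosen for the demo, not the quantizer MP3 actually uses.

```python
import math

def quantize(x, bits):
    """Round x (expected in [-1, 1]) to a uniform grid with 2**bits steps."""
    step = 2.0 / (2 ** bits)
    return step * round(x / step)

def snr_db(bits, n=10000):
    """Signal to noise ratio (dB) of a full-scale sine stored with `bits` bits."""
    # Test frequency in cycles per sample, chosen so the sine does not
    # repeat exactly on the quantization grid (assumed value).
    signal = [math.sin(2 * math.pi * 0.01237 * t) for t in range(n)]
    noise = [quantize(s, bits) - s for s in signal]
    p_signal = sum(s * s for s in signal) / n
    p_noise = sum(e * e for e in noise) / n
    return 10 * math.log10(p_signal / p_noise)

for bits in (4, 8, 12, 16):
    print(bits, "bits ->", round(snr_db(bits), 1), "dB SNR")
```

Each bit removed raises the noise floor by roughly 6 dB, so storing a signal with fewer bits is the same as adding noise to it.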
Easy chops:
• Don’t bother storing information outside the
range of hearing (outside 40Hz-15kHz)
• Stereo info not stored for low frequencies
Bad ways to compress an audio file
• Reduce the total number of bits per sample (e.g.
32 bit to 16 or 16 to 8 bit). This gives you a
factor of 2 in compression. However, you get a
noisier signal.
• Reduce the sampling rate (44kHz to 22kHz or
22kHz to 11kHz). Total loss of all high frequency
information. Again only a gain of a factor of 2 in
size. Equivalent to a low pass filter.
• A factor 10:1 in compression cannot be achieved
using linear compression schemes
Masking
[Figure: masking by a dominant tone; labels: 13 dB, critical band]
If a dominant tone is
present then noise
can be added at
frequencies next to it
and this noise will not
be heard. Less
precision is required to
store nearby
frequencies.
Definition of masking
• The process by which the threshold of
audibility for one sound is raised by the
presence of another (masking) sound
• The amount by which the threshold is raised
by the masker (in dB).
critical band
The wider the noise
bandwidth the more
the signal (sine wave)
is masked.
critical band
A sine (signal) in the
presence of noise that
has a band width (in
frequency) centered
around the signal.
Past a particular
frequency width the
masking doesn’t
increase.
Critical band width as a function of frequency
Size of critical
band is
typically one
tenth of the
frequency
Critical band concept
• Only a narrow band of frequencies surrounding
the tone – those within the critical band contribute
to masking of the tone
• When the noise just masks the tone, the power of
the tone divided by the power of the noise inside
the band is a constant.
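The constant power ratio gives a quick way to estimate a masked threshold. The sketch below assumes flat noise described by a spectral density level in dB per Hz, takes the critical band to be one tenth of the frequency as on the earlier slide, and sets the constant ratio to 0 dB; the 0 dB value and the function names are assumptions made for illustration.

```python
import math

def critical_bandwidth_hz(freq_hz):
    """Rule of thumb from the slides: about one tenth of the frequency."""
    return 0.1 * freq_hz

def masked_threshold_db(noise_density_db, freq_hz, ratio_db=0.0):
    """Tone level (dB) at which flat noise just masks a tone at freq_hz.

    noise_density_db: noise level per 1 Hz of bandwidth.
    ratio_db: tone power / in-band noise power at threshold (assumed 0 dB).
    """
    # Only the noise inside the critical band counts towards masking.
    band_noise_db = noise_density_db + 10 * math.log10(critical_bandwidth_hz(freq_hz))
    return band_noise_db + ratio_db

print(masked_threshold_db(30.0, 1000.0))   # 100 Hz band adds 20 dB
print(masked_threshold_db(30.0, 4000.0))   # wider band, higher threshold
```

Noise outside the band is ignored entirely in this sketch, which is exactly the simplification the critical band concept makes.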
The nature of the auditory filter
• The auditory filter is not necessarily square –
actually it is more like a triangle shape
• Critical band width is sometimes referred to as
ERB (equivalent rectangular bandwidth)
• Shape is difficult to measure in psychoacoustic
experiments because of side band listening effects.
Some innovative experiments (notch filtered
noise + signal) were designed to measure the actual
shape of the filter.
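For a numerical feel, the ERB can be compared with the one tenth rule of thumb. The formula below is the widely used Glasberg and Moore approximation, ERB = 24.7 (4.37 f/1000 + 1) Hz; it is brought in here as an outside reference and does not appear in the slides.

```python
def erb_hz(freq_hz):
    """Equivalent rectangular bandwidth (Glasberg and Moore approximation)."""
    return 24.7 * (4.37 * freq_hz / 1000.0 + 1.0)

def tenth_rule_hz(freq_hz):
    """Rule of thumb from the slides: one tenth of the frequency."""
    return 0.1 * freq_hz

for f in (250, 1000, 4000):
    print(f, "Hz: ERB", round(erb_hz(f), 1), "Hz, one tenth rule", tenth_rule_hz(f), "Hz")
```

The two estimates have the same order of magnitude at mid and high frequencies, while the ERB formula stays near 25 Hz at very low frequencies instead of shrinking to zero.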
Physiological reasons for the
masking
• Basilar membrane? The critical bandwidths
at different frequencies correspond to fixed
distances along the basilar membrane.
• However the masking could be a result of
feedback in the neuron firing instead.
Negative reinforcement or suppression of
signals. Or swamping of signals.
Pitch perception
Ability to
discriminate
between a change
in frequency as a
function of pulse
duration
DLF (Difference
Limen for
Frequency) given
in % of central
Pitch perception vs masking
• Note our ability to detect pitch changes is at
the level of 0.25%, well below the width of
the critical band.
• This precision requires active hair cell/basilar
membrane interactions in the cochlea
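To see how far apart the two scales are, the 0.25% figure can be converted to Hz and to cents and set beside the critical band. The sketch assumes a 1 kHz tone and the one tenth rule for the critical band width from the earlier slide.

```python
import math

def frequency_jnd_hz(freq_hz, fraction=0.0025):
    """Smallest detectable frequency change, taking the 0.25% figure."""
    return freq_hz * fraction

def fraction_to_cents(fraction):
    """Express a fractional frequency change in cents (1200 per octave)."""
    return 1200 * math.log2(1 + fraction)

freq = 1000.0                          # assumed example frequency
jnd = frequency_jnd_hz(freq)           # Hz
band = 0.1 * freq                      # critical band, one tenth rule
print(f"JND {jnd} Hz ({fraction_to_cents(0.0025):.1f} cents), critical band {band} Hz")
```

At 1 kHz the detectable pitch change is a few Hz, tens of times narrower than the critical band.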
Temporal effects - nonsimultaneous masking
• The peak ratio of the masker is important -- that
means its variations in volume as a function of
time compared to its rms value. Short loud peaks
don’t necessarily contribute to the masking as
much as a continuous noise.
• Both forward and backward masking - masking
can occur if a loud masker is played just after the
signal!
• Masking decays to 0 after 100-200ms
Physiological explanations for
temporal masking
• Basilar membrane is ringing, preventing
detection in that region for a particular time
• Neurons take a while to recover - neural
fatigue
Comodulation masking release
• A masked signal if comodulated with
frequencies outside the critical band can be
detected below the masking threshold
• In the same way that the
overtones/spectrum is used to identify a
sound. Sounds outside the critical band,
since they are modulated the same as the
signal, are used to pull it out (detect it) from
more than one critical band region.
Perception of loudness
Just noticeable difference
• JND in Sound Intensity
• A useful general reference is that the just
noticeable difference in sound intensity for the
human ear is about 1 decibel.
• JND = 1 decibel
• In fact, the use of the factor of 10 in the definition
of the decibel is to create a unit which is about the
least detectable change in sound intensity.
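A short calculation shows what a 1 decibel step means in terms of intensity, using the standard definition of the decibel as 10 times the base 10 logarithm of an intensity ratio. The function name is illustrative.

```python
def intensity_ratio(delta_db):
    """Intensity ratio corresponding to a level change in decibels."""
    return 10 ** (delta_db / 10.0)

one_db = intensity_ratio(1.0)
percent_increase = (one_db - 1.0) * 100.0
print(f"1 dB = intensity ratio {one_db:.3f} (about {percent_increase:.0f}% more intensity)")
```

So the just noticeable difference of about 1 dB corresponds to an intensity increase of roughly a quarter.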
JND as a function of loudness
• There are some
variations. The JND is
about 1 dB for soft
sounds around 30-40
dB at low and midrange
frequencies. It may drop
to 1/3 to 1/2 a decibel
for loud sounds.
• Caution must be used in
applying the "one
decibel" criterion. It
presumes that you are
increasing the same
sound by one decibel.
Loudness and the Critical Band
• When two sounds of equal loudness when sounded
separately are close together in pitch, their combined
loudness when sounded together will be only slightly
louder than one of them alone. They may be said to be in
the same critical band where they are competing for the
same nerve endings on the basilar membrane of the inner
ear. According to the place theory of pitch perception,
sounds of a given frequency will excite the nerve cells of
the organ of Corti only at a specific place. The available
receptors show saturation effects which lead to the general
rule of thumb for loudness by limiting the increase in
neural response.
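As a rough physical check on "only slightly louder", the sketch below adds the intensities of two equal sounds and reports the combined level. It assumes the two sounds are incoherent so that their intensities simply add, and it computes sound level in dB, not perceived loudness.

```python
import math

def combined_level_db(levels_db):
    """Level of several incoherent sounds together (intensities add)."""
    total_intensity = sum(10 ** (level / 10.0) for level in levels_db)
    return 10 * math.log10(total_intensity)

single = 60.0                                  # assumed example level in dB
both = combined_level_db([single, single])
print(f"one sound: {single} dB, two together: {both:.1f} dB")
```

Doubling the intensity raises the level by only about 3 dB, a few just noticeable differences above a single sound.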
Outside the critical band
• If the two sounds are widely separated in pitch, the
perceived loudness of the combined tones will be
considerably greater because they do not overlap on the
basilar membrane and compete for the same hair cells.
Pitch information area for
complex tones
Pitch depends on partial pitches
• Butler 3.5b: the second tone of each pair has
partials 10% sharp. Perceived pitch change
depends on frequency
Timbre depends on frequency
• First tone has partials 1,2,3,4,5
• Second tone has partials 1,3,5,7,9
• Difference in timbre depends on frequency
of fundamental
MP3 schematic
• Input: 16 bit at 44.1kHz sampling is about 706kbit/s per channel (768kbit/s at 48kHz)
• Filter bank: band pass filter into 32 sub-bands
each centered at a different frequency
• MDCT: Modified Discrete Cosine Transform–
each sub-band is divided into time windows.
• Windows overlap to get rid of a problem called
aliasing (high frequencies are confused with low
ones). Overlap needed for MDCT
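The role of the overlap can be demonstrated with a bare MDCT. The sketch below is not the MP3 filter bank: it skips the 32 sub-bands, uses a direct (slow) transform, and assumes a sine window and a small block size picked for the demo. It shows that each block of 2N overlapping samples yields only N coefficients, and that overlapping and adding the inverse transforms cancels the aliasing.

```python
import math

def sine_window(N):
    """Window of length 2N with w[n]**2 + w[n+N]**2 == 1."""
    return [math.sin(math.pi * (n + 0.5) / (2 * N)) for n in range(2 * N)]

def mdct(block, N):
    """2N windowed samples -> N coefficients."""
    n0 = 0.5 + N / 2
    return [
        sum(block[n] * math.cos(math.pi / N * (n + n0) * (k + 0.5)) for n in range(2 * N))
        for k in range(N)
    ]

def imdct(coeffs, N):
    """N coefficients -> 2N samples that still contain aliasing."""
    n0 = 0.5 + N / 2
    return [
        (2.0 / N) * sum(coeffs[k] * math.cos(math.pi / N * (n + n0) * (k + 0.5)) for k in range(N))
        for n in range(2 * N)
    ]

def mdct_roundtrip(x, N):
    """Analyse x in half-overlapping blocks, then overlap-add the inverses."""
    w = sine_window(N)
    pad = (-len(x)) % N
    padded = [0.0] * N + list(x) + [0.0] * (pad + N)
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * N + 1, N):
        block = [padded[start + n] * w[n] for n in range(2 * N)]
        y = imdct(mdct(block, N), N)
        for n in range(2 * N):
            out[start + n] += y[n] * w[n]   # aliasing cancels between neighbours
    return out[N:N + len(x)]

N = 8                                           # coefficients per block (assumed)
x = [math.sin(0.3 * t) + 0.5 * math.cos(1.1 * t) for t in range(40)]
y = mdct_roundtrip(x, N)
max_error = max(abs(a - b) for a, b in zip(x, y))
print("max reconstruction error:", max_error)
```

A single block on its own does not reconstruct its input; only the sum of neighbouring overlapped blocks does, which is why the overlap is required.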
13 dB miracle
• If the signal is 13 dB louder than the noise
then the noise can’t be heard.
• Each sub-band is quantized differently
depending upon the masking threshold
estimated in that band
• FFT is used to compute the masking thresholds
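The 13 dB figure translates into a surprisingly small number of bits. The sketch below assumes the common rule of thumb that each bit of precision buys about 6 dB of signal to noise ratio; that rule, the function name and the example levels are assumptions for illustration, and a real encoder's bit allocation is more involved.

```python
import math

DB_PER_BIT = 6.02   # assumed rule of thumb: SNR gained per bit of precision

def bits_needed(signal_db, allowed_noise_db):
    """Bits required to keep quantization noise at or below allowed_noise_db."""
    required_snr_db = signal_db - allowed_noise_db
    if required_snr_db <= 0:
        return 0                      # the band is already below the allowed noise
    return math.ceil(required_snr_db / DB_PER_BIT)

signal_db = 70.0                      # hypothetical sub-band level
allowed_noise_db = signal_db - 13.0   # noise may sit 13 dB below the signal
print(bits_needed(signal_db, allowed_noise_db), "bits instead of 16")
```

Under these assumptions a band dominated by a strong tone needs only a few bits per sample, and a band that lies below the allowed noise level needs none.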
Pushing MP3 to its limits
-uncompressed
-over compressed mp3
• Above compressing to 60kbps
• Using home.c4.scale.AIFF show mp3 options DEMO
with Adobe to experiment
Limits of MP3
• Above ~80kbps (kilo bits per second) and
22kHz sampling I find I get reasonable
sound.
• Compressing beyond this can do pretty
weird things – I found that noise sounded
weird and lack of high frequencies led to
lost brilliance in timbre - also attacks
suffered pitch and timbre changes
Auditory illusions
• Descending pitch illusion
• A melody of silences
http://asa.aip.org/demo27.html
http://www.kyushuid.ac.jp/~ynhome/ENG/Demo/illusions.html