Transcript 19-Audition

Motor Theory Remnants
April 3, 2012
Dirty Work
• Project Reports #5 to turn in.
• On Thursday, we’ll talk about the muscles that control
articulation…
• And do a slightly messy static palatography demo
• At the end of today, we’ll do the USRI evaluations.
Another Piece of the Puzzle
• Another interesting finding which has been used to
argue for the “speech is special” theory is duplex
perception.
• Take an isolated F3 transition:
and present it to one ear…
Do the Edges First!
• While presenting this spectral frame to the other ear:
Two Birds with
One Spectrogram
• The resulting combo is perceived in duplex fashion:
• One ear hears the F3 “chirp”;
• The other ear hears the combined stimulus as “da”.
Duplex Interpretation
• Check out the spectrograms in Praat.
• Mann and Liberman (1983) found:
• Discrimination of the F3 chirps is gradient when
they’re in isolation…
• but categorical when combined with the spectral
frame.
• (Compare with the F3 discrimination experiment with
Japanese and American listeners)
• Interpretation: the “special” speech processor puts
the two pieces of the spectrogram together.
fMRI data
• Benson et al. (2001)
• Non-Speech stimuli = notes, chords, and chord
progressions on a piano
fMRI data
• Benson et al. (2001)
• Difference in activation for natural speech stimuli versus
activation for sinewave speech stimuli
Mirror Neurons
• In the 1990s, researchers in Italy discovered what they
called mirror neurons in the brains of macaques.
• Macaques had been trained to make grasping motions
with their hands.
• Researchers recorded the activity of single neurons
while the monkeys were making these motions.
• Serendipity:
• the same neurons fired when the monkeys saw the
researchers making grasping motions.
• This points to a neurological link between perception and action.
• Motor theory claim: same links exist in the human brain,
for the perception of speech gestures
Moving On…
• One important lesson to take from the motor theory
perspective is:
• The dynamics of speech are generally more
important to perception than static acoustic cues.
• Note: visual chimerism and March Madness.
Auditory Chimeras
• Speech waveform + music spectrum (frequency bands: 1, 2, 4, 8, 16, 32)
• Music waveform + speech spectrum (frequency bands: 1, 2, 4, 8, 16, 32)
Originals:
Source: http://research.meei.harvard.edu/chimera/chimera_demos.html
Auditory Chimeras
• Speech1 waveform + speech2 spectrum (frequency bands: 1, 2, 4, 6, 8, 16)
• Speech2 waveform + speech1 spectrum (frequency bands: 1, 2, 4, 6, 8, 16)
Originals:
Motor Theory, in a nutshell
• The big idea:
• We perceive speech as abstract “gestures”, not
sounds.
• Evidence:
1. The perceptual interpretation of speech differs
radically from the acoustic organization of speech
sounds
2. Speech perception is multi-modal
3. Direct (visual, tactile) information about gestures can
influence/override indirect (acoustic) speech cues
4. Limited top-down access to the primary, acoustic
elements of speech
Audition
(or, how we hear things)
April 3, 2012
How Do We Hear?
• The ear is the organ of hearing. It converts sound waves
into electrical signals in the brain.
• the process of “audition”
• The ear has three parts:
• The Outer Ear
• sound is represented acoustically (in the air)
• The Middle Ear
• sound is represented mechanically (in solid bone)
• The Inner Ear
• sound is represented in a liquid
The Ear
Outer Ear Fun Facts
• The pinna, or auricle, is a bit more receptive to sounds
from the front than sounds from the back.
• It functions primarily as “an earring holder”.
• Sound travels down the ear canal, or auditory meatus.
• Length ≈ 2-2.5 cm
• Sounds at ≈ 3500-4000 Hz resonate in the ear
canal
• The tragus protects the opening to the ear canal.
• Optionally provides loudness protection.
• The outer ear dead ends at the eardrum, or tympanic
membrane.
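The ear-canal resonance figure above can be roughly checked by treating the canal as a simple tube, open at the pinna and closed at the eardrum. This quarter-wavelength model and the speed of sound used here are assumptions for illustration, not something stated in the slides:

```python
# Toy model (assumption): the ear canal as a uniform tube closed at one end,
# whose lowest resonance has a wavelength four times the tube length.
SPEED_OF_SOUND_M_PER_S = 343.0  # assumption: air at roughly 20 degrees C


def quarter_wave_resonance_hz(length_m, speed=SPEED_OF_SOUND_M_PER_S):
    """Lowest resonant frequency of a tube open at one end, closed at the other."""
    return speed / (4.0 * length_m)


# Canal lengths taken from the slide: about 2 to 2.5 cm.
for length_cm in (2.0, 2.5):
    freq = quarter_wave_resonance_hz(length_cm / 100.0)
    print(f"{length_cm} cm canal -> roughly {freq:.0f} Hz")
```

With these assumed values the estimate spans roughly 3400 to 4300 Hz, which brackets the 3500-4000 Hz range quoted above.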
The Middle Ear
[Figure: the middle ear, showing the eardrum, the hammer (malleus), the anvil (incus), and the stirrup (stapes)]
The Middle Ear
• The bones of the middle ear are known as the ossicles.
• They function primarily as an amplifier.
• = increase sound pressure by about 20-25 dB
• Works by focusing sound vibrations into a smaller area
• area of eardrum = 0.55 cm²
• area of footplate of stapes = 0.032 cm²
• Think of a thumbtack...
Concentration
• Pressure (on any given area) = Force / Area
• Pushing on a cylinder provides
no gain in pressure at the other end...
• Areas are equal on both sides.
• Pushing on a thumbtack provides
a gain in pressure equal to A1 / A2.
• For the middle ear,
pressure gain ≈ 0.55 / 0.032 ≈ 17
Leverage
• The middle ear also exerts a lever action on the inner
ear.
• Think of a crowbar...
• Force difference is
proportional to ratio of
handle length to end length.
• For the middle ear:
• malleus length /
stapes length
• ratio ≈ 1.3
Conversions
• Total amplification of middle ear ≈ 17 × 1.3 ≈ 22
• increases sound pressure by 20-25 dB
• Note: people who have lost their middle ear bones can
still hear...
• With a 20-25 dB loss in sensitivity.
• (Fluid in inner ear absorbs 99.9% of acoustic energy)
• For loud sounds (> 85-90 dB), a reflex kicks in to
attenuate the vibrations of the middle ear.
• this helps prevent damage to the inner ear.
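The middle-ear amplification arithmetic above can be sketched in a few lines. The areas and lever ratio come from the slides; the conversion to decibels uses the standard 20 · log10 rule for pressure ratios, and the function name is just a label chosen here:

```python
import math

EARDRUM_AREA_CM2 = 0.55           # from the slides
STAPES_FOOTPLATE_AREA_CM2 = 0.032  # from the slides
LEVER_RATIO = 1.3                  # from the slides


def pressure_ratio_to_db(ratio):
    """Express a pressure ratio in decibels (20 * log10 of the ratio)."""
    return 20.0 * math.log10(ratio)


# Concentration: same force on a smaller area.
area_gain = EARDRUM_AREA_CM2 / STAPES_FOOTPLATE_AREA_CM2
# Leverage multiplies the concentration gain.
total_gain = area_gain * LEVER_RATIO

print(f"area gain  = {area_gain:.1f}")
print(f"total gain = {total_gain:.1f}")
print(f"in decibels = {pressure_ratio_to_db(total_gain):.1f} dB")
```

This idealized calculation lands near 27 dB, a little above the 20-25 dB range quoted above. It only multiplies the two ratios and ignores any losses, so the range on the slide is the figure to rely on.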
The Attenuation Reflex
• Requires 50-100 msec of reaction time.
• Poorly attenuates sudden loud noises
• Muscles fatigue after 15 minutes or so
• Also triggered by speaking
[Figure: the muscles involved, the tensor tympani and the stapedius]
The Inner Ear
• In the inner ear there is a
snail-shaped structure
called the cochlea.
• The cochlea:
• is filled with fluid
• consists of several
different membranes
• terminates in membranes
called the oval window and
the round window.
Cochlea Cross-Section
• The inside of the cochlea is divided into three sections.
• In the middle of them all is the basilar membrane.
Contact
• On top of the
basilar membrane
are rows of hair
cells.
• We have about 3,500 “inner” hair cells...
• and 15,000-20,000 “outer” hair cells.
How does it work?
• On top of each hair cell
is a set of about 100 tiny
hairs (stereocilia).
• Upward motion of the
basilar membrane
pushes these hairs into
the tectorial membrane.
• The deflection of the hairs opens up channels in the hair
cells.
• ...allowing the electrically charged endolymph to flow
into them.
• This sends a neurochemical signal to the brain.
An Auditory Fourier Analysis
• Individual hair cells in the cochlea respond best to particular frequencies.
• General limits: 20 Hz - 20,000 Hz
• Cells at the base respond to high frequencies;
• Cells at the apex respond to low.
[Figure: tonotopic organization of the cochlea]
How does this work?
• Hermann von Helmholtz (again!) first proposed the place
theory of cochlear organization.
• Original idea: one hair cell for each frequency.
• a.k.a. the “resonance theory”
• But...we can perceive more frequencies than we have
hair cells for.
• The rate theory emerged as an alternative:
• Frequency of cell firing encodes frequencies in the
acoustic signal.
• a.k.a. the “frequency theory”
• Problem: cell firing rate is limited to 1000 Hz...
Synthesis
• The volley theory attempted to salvage the frequency
rate proposal.
• Idea: frequency rates higher than 1000 Hz are “volleyed”
back and forth between individual hair cells.
• There is apparently considerable evidence for this
proposal.
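The volley idea above can be illustrated with a toy round-robin scheme: no single cell fires faster than 1000 Hz, but the pooled spikes of several cells still mark every cycle of a higher-frequency tone. This is a deliberately simplified sketch, not a physiological model, and the function name is hypothetical:

```python
import math

MAX_RATE_HZ = 1000.0  # per-cell firing limit quoted in the slides


def volley_spike_times(freq_hz, n_cycles):
    """Assign each cycle of a tone to one cell in round-robin order.

    Returns (n_cells, spikes), where spikes[k] lists the firing times
    (in seconds) of cell k. Toy illustration only.
    """
    # Smallest number of cells that keeps each one at or below the limit.
    n_cells = max(1, math.ceil(freq_hz / MAX_RATE_HZ))
    spikes = [[] for _ in range(n_cells)]
    for cycle in range(n_cycles):
        spikes[cycle % n_cells].append(cycle / freq_hz)
    return n_cells, spikes


tone_hz = 3000.0
n_cells, spikes = volley_spike_times(tone_hz, 12)
pooled = sorted(t for cell in spikes for t in cell)

print(f"{n_cells} cells, each firing at {tone_hz / n_cells:.0f} Hz")
print(f"pooled inter-spike interval: {pooled[1] - pooled[0]:.6f} s")
```

In this sketch a 3000 Hz tone is shared among three cells, each firing on every third cycle, while the pooled spike train keeps the period of the tone itself.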
Traveling Waves (in the ear!)
• Last but not least, there is the traveling wave theory.
• Idea: waves of different frequencies travel to a different
extent along the cochlea.
• Like wavelength:
• Higher frequency waves are shorter
• Lower frequency waves are longer
The Traveling Upshot
• Lower frequency waves travel the length of the
cochlea...
• but higher frequencies cut off after a short distance.
• All cells respond to lower frequencies (to some extent),
• but fewer cells respond to high frequency waves.
• Individual hair cells thus function like low-pass filters.
Hair Cell Bandwidth
• Each hair cell responds to a range of frequencies,
centered around an optimal characteristic frequency.
Frequency Perception
• In reality, there is (unfortunately?) more than one truth:
• Place-encoding (traveling wave theory) is probably
more important for frequencies above 1000 Hz;
• Rate-encoding (volley theory) is probably more
important for frequencies below 1000 Hz.
• Interestingly, perception of frequencies above 1000 Hz
is much less precise than perception of frequencies below
1000 Hz.
• Match this tone:
• To the tone that is twice the frequency:
Higher Up
• Now try it with this tone:
• Compared to these tones:
• Idea: listeners interpret pitch differences as (absolute)
distances between hair cells in the cochlea.
• Perceived pitch is expressed in units called mels.
• Twice the number of mels = twice as high a
perceived pitch.
• Mels = 1127.01048 * ln (1 + F/700)
• where acoustic frequency (F) is expressed in Hertz.
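The mel formula above translates directly into code. The forward conversion follows the slide; the inverse is simply that formula solved for F, and both function names are labels chosen here:

```python
import math

MEL_SCALE_FACTOR = 1127.01048  # constant from the formula above
MEL_BREAK_HZ = 700.0           # constant from the formula above


def hz_to_mel(freq_hz):
    """Convert acoustic frequency in Hertz to perceived pitch in mels."""
    return MEL_SCALE_FACTOR * math.log(1.0 + freq_hz / MEL_BREAK_HZ)


def mel_to_hz(mels):
    """Inverse of hz_to_mel, obtained by solving the formula for F."""
    return MEL_BREAK_HZ * (math.exp(mels / MEL_SCALE_FACTOR) - 1.0)


for freq in (250, 500, 1000, 2000, 4000):
    print(f"{freq:5d} Hz -> {hz_to_mel(freq):7.1f} mels")
```

With these constants 1000 Hz comes out at about 1000 mels, and doubling the frequency above that point adds well under double the mels.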
The Mel Scale
Equal Loudness Curves
• Perceived loudness also depends on frequency.