Models of the auditory system


The auditory system
Romain Brette ([email protected])
Ecole Normale Supérieure
What is sound?
Hearing vs. seeing

Hearing:
- Acoustical waves, 20 – 20,000 Hz (wavelengths 1.7 cm – 17 m)
- Information about volumes
- Sounds are produced by sources
- The source is transient; sounds are "events"
- Sounds from different locations are mixed at the ear

Seeing:
- Electromagnetic waves, 380 – 740 nm
- Information about surfaces
- Light is reflected by sources
- The source is persistent; one can "look around" a visual object
- Light rays from different locations are separated in the eye
The information in sound
Spatial location
Vision:
1) Direction of an object is mapped to
place on the retina.
2) Place on the retina varies
systematically with self-generated
movements.
Hearing:
1) Direction is mapped to relationships
between binaural signals, among
other cues
2) Relationships vary systematically
with self-generated movements,
3) but only if sounds are repeated
More about this: http://briansimulator.org/category/romains-blog/what-is-sound/
The information in sound
Shape
Vision: the way the visual field changes with
viewpoint determines the visual shape
Hearing: the sound does not change with viewpoint.
But there is information about shape in the spectrum:
a larger object produces lower frequencies (equivalent to a rescaling of spatial units).
In speech, the shape of the vocal tract carries linguistic information.
M. Kac (1966) Can one hear the shape of a drum? Am. Math. Monthly 73 (4)
W.W. Gaver (1993) What in the world do we hear? Ecological Psychology 5(1)
The information in sound
Pitch
In voiced vowels, the glottis opens and closes at a fast rate, producing a periodic sound
(typically about 100 Hz for men, 200 Hz for women).
[Waveform of the vowel 'o']
The repetition rate carries information about intonation and speaker identity (used for grouping).
The information in sound
Summary: what the auditory system needs to process
- Precise temporal and intensity relationships between binaural signals
- Frequency spectrum
- Temporal information
- More generally: spectrotemporal information at different scales
The time-frequency trade-off: Δt·Δω ≥ 1/2 (Gabor)
Anatomy and physiology of the
auditory system
The ear
[Diagram: outer ear, middle ear, inner ear. The inner ear contains the cochlea (hearing) and the vestibular system (head movements).]
The basilar membrane
Hair cells
[Diagram: tectorial membrane, outer hair cells, inner hair cells, basilar membrane, auditory nerve.]
K+ channels open when the stereocilia are deflected.
Auditory nerve fibers
Response curves; tuning curves (threshold).
Phase locking (barn owl)
Response to a tone (multiple trials): spikes occur at a fixed phase of each cycle.
"Phase locking": neurons fire at preferred phases of the input tone.
Vector strength
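Phase locking is commonly quantified by the vector strength: each spike time is mapped to a unit phasor at the stimulus phase, and the phasors are averaged; 1 means perfect locking, 0 means no locking. A minimal sketch in Python (the spike trains below are synthetic illustrations):

```python
import cmath
import math
import random

def vector_strength(spike_times, freq):
    """Vector strength relative to a tone of frequency freq (Hz):
    |mean of exp(2*pi*i*f*t)| over spikes. 1 = perfect locking, 0 = none."""
    phasors = [cmath.exp(2j * math.pi * freq * t) for t in spike_times]
    return abs(sum(phasors) / len(phasors))

# Perfectly locked spikes: one spike per cycle of a 1 kHz tone, same phase
locked = [0.25e-3 + k * 1e-3 for k in range(100)]
print(vector_strength(locked, 1000.0))    # ≈ 1

# Spikes at random times give a vector strength near 0
random.seed(0)
jittered = [random.random() * 0.1 for _ in range(1000)]
print(vector_strength(jittered, 1000.0))
```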
A simple model of auditory nerve fibers
sound → bank of band-pass filters → half-wave rectification
(+ possibly low-pass filtering, to account for the decrease of phase locking with frequency)
+ random spikes (Poisson)
NB: does not capture nonlinear effects
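The pipeline above can be sketched for a single fiber. This is a toy implementation: a biquad resonator stands in for cochlear (gammatone) filtering, and all parameter values are illustrative, not fitted to data:

```python
import math
import random

def bandpass(signal, dt, fc, q=5.0):
    """Second-order resonator (biquad) centred on fc (Hz): a crude
    stand-in for basilar-membrane/gammatone filtering."""
    w0 = 2 * math.pi * fc * dt
    alpha = math.sin(w0) / (2 * q)
    b0, b1, b2 = alpha, 0.0, -alpha
    a0, a1, a2 = 1 + alpha, -2 * math.cos(w0), 1 - alpha
    y, x1, x2, y1, y2 = [], 0.0, 0.0, 0.0, 0.0
    for x in signal:
        out = (b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2) / a0
        y.append(out)
        x1, x2, y1, y2 = x, x1, out, y1
    return y

def an_fiber(sound, dt, fc, rate_scale=5000.0, tau_lp=1e-3):
    """Filter -> half-wave rectify -> low-pass -> Poisson spikes."""
    filtered = bandpass(sound, dt, fc)
    rect = [max(v, 0.0) for v in filtered]        # half-wave rectification
    lp, y = [], 0.0
    for v in rect:                                # one-pole low-pass (phase-locking roll-off)
        y += dt / tau_lp * (v - y)
        lp.append(y)
    return [i * dt for i, v in enumerate(lp)
            if random.random() < rate_scale * v * dt]   # inhomogeneous Poisson

random.seed(1)
dt = 1e-5
tone = [math.sin(2 * math.pi * 500 * i * dt) for i in range(int(0.2 / dt))]
spikes = an_fiber(tone, dt, fc=500.0)
print(len(spikes), "spikes in 200 ms")
```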
The rest of the auditory system
[Diagram of the ascending auditory pathway, with structures on both sides: auditory nerve (N.VIII) → cochlear nucleus (AVCN, PVCN, DCN) → superior olivary complex (MSO, LSO, MNTB, LNTB, SPN) and NCAT → nuclei of the lateral lemniscus (VNLL, INLL, DNLL) → inferior colliculus (ICC, DC, LN) → thalamus (VMGB, DMGB, MMGB, SGN) and superior colliculus (SC) → cortex (AI, AII, InsC, PF).]
Sound localization: acoustical cues

3D localization
θ = azimuth, δ = elevation

Acoustical cues for sound localization
Head-related transfer functions (HRTFs), or head-related impulse responses (HRIRs) in the time domain: the signal at each ear is the source convolved with the corresponding HRIR.
Other cues for elevation:
• pinna filters out specific frequencies
depending on elevation
Other cues for distance:
• level is distance-dependent
• high frequencies are more
filtered with distance
• reverberation correlates with
distance
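The acoustical filtering can be illustrated with toy HRIRs reduced to a pure delay plus attenuation (illustrative numbers, not measured HRIRs): each ear's signal is the source convolved with that ear's impulse response.

```python
import math

def convolve(signal, kernel):
    """Direct convolution: out[n] = sum_k kernel[k] * signal[n - k]."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for n, s in enumerate(signal):
        for k, h in enumerate(kernel):
            out[n + k] += s * h
    return out

dt = 1e-5                                  # 100 kHz sampling
source = [math.sin(2 * math.pi * 200 * n * dt) for n in range(2000)]

# Toy HRIRs for a source on the right: the left ear receives the sound
# 0.5 ms later and attenuated (illustrative numbers)
itd_samples = 50                           # 0.5 ms at 100 kHz
hrir_right = [1.0]
hrir_left = [0.0] * itd_samples + [0.6]

right = convolve(source, hrir_right)       # direct path
left = convolve(source, hrir_left)         # delayed, attenuated path
```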
HRTFs and HRIRs in the rabbit
[Figure: example HRIR and HRTF.] Kim et al., JARO (2010)
Interaural time differences (ITDs)
For a distant sound source (plane wave) and a spherical head of radius r, the path length difference is r(sin θ + θ), so that
ITD = (r/c)(sin θ + θ) (Woodworth formula)
This is valid when the wavelength is much smaller than the head width.
At low frequencies: ITD ≈ (3r/c) sin θ
Kuhn, JASA 62(1), 157-167 (1977)
(c = 340 m/s)
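A quick sketch of the two formulas; the head radius is a typical illustrative value, not a measurement:

```python
import math

C = 340.0      # speed of sound (m/s)
R = 0.0875     # typical human head radius (m), illustrative value

def itd_woodworth(azimuth_deg, r=R, c=C):
    """High-frequency ITD for a distant source (Woodworth): (r/c)(sin θ + θ)."""
    theta = math.radians(azimuth_deg)
    return (r / c) * (math.sin(theta) + theta)

def itd_lowfreq(azimuth_deg, r=R, c=C):
    """Low-frequency limit (Kuhn 1977): (3r/c) sin θ."""
    theta = math.radians(azimuth_deg)
    return (3 * r / c) * math.sin(theta)

for az in (0, 30, 60, 90):
    print(az, round(itd_woodworth(az) * 1e6), round(itd_lowfreq(az) * 1e6), "µs")
```

Note that the low-frequency ITD is larger than the high-frequency ITD for the same direction, consistent with the frequency dependence described below.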
Frequency-dependence of ITDs
[Figure: ITD vs. frequency for different directions, with the relevant range for ITDs marked.]
Maximum human ITD: about 700 µs at high frequencies, up to 1000 µs at low frequencies.
ILDs for sinusoidal stimuli
Large ILDs at high frequencies for sources on the side (head shadowing).
Very small ILDs at low frequencies (for distant sources).
Adapted from Feddersen et al. (1957)
Duplex theory
- At low frequencies, ILDs are very small.
- At high frequencies, ITDs (for pure tones) are ambiguous, i.e., when the tone period becomes comparable to or smaller than the maximum ITD.
- Duplex theory (Lord Rayleigh, 1907): ITDs are used at low frequencies, ILDs at high frequencies (threshold around 1500 Hz).
- Confirmed by psychophysical experiments using conflicting cues (Wightman & Kistler, 1992).
Monaural spectral cues
The pinna introduces elevation-dependent spectral notches.
Hofman et al., Nature (1998)
Sound localization: anatomy and
physiology
The first binaural structures
In the superior olivary complex (SOC), in the brainstem:
- the lateral superior olive (LSO): ILD-sensitive neurons
- the medial superior olive (MSO): ITD-sensitive neurons
Golgi stainings in cat by Ramón y Cajal, 1907
ITD and ILD pathways (mammals)
Cochlear nucleus
Bushy cells are temporally more precise than auditory nerve fibers!
Likely reason: averaging (several AN inputs per cell), plus perhaps gap junctions.
The medial superior olive (MSO)
[Figure: firing rate vs. ITD for left/right inputs; the peak defines the "best delay".]
Neuron responses are consistent with cross-correlation of the monaural inputs.
Cross-correlation, ITD and coincidence detection
Two monaural signals:
SL(t)
SR(t) = a·SL(t − ITD)
Cross-correlation: C(s) = ⟨SL(t) SR(t + s)⟩, maximal when s = ITD.
Coincidence rate between two Poisson processes = cross-correlation (at s = 0).
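Since the cross-correlation peaks at the ITD, the ITD of a binaural signal can be estimated by scanning lags, which is what an array of coincidence detectors implements. A sketch with synthetic delayed noise:

```python
import random

random.seed(0)
fs = 100000                     # sampling rate (Hz)
true_delay = 35                 # samples, i.e., 350 µs
noise = [random.gauss(0, 1) for _ in range(5000)]
sL = noise
sR = [0.0] * true_delay + noise[:-true_delay]   # right ear lags the left

def xcorr(a, b, lag):
    """C(lag) = <a(t) b(t + lag)> over the overlapping samples."""
    if lag >= 0:
        pairs = list(zip(a, b[lag:]))
    else:
        pairs = list(zip(a[-lag:], b))
    return sum(x * y for x, y in pairs) / len(pairs)

lags = range(-100, 101)
best = max(lags, key=lambda s: xcorr(sL, sR, s))
print("estimated ITD:", best / fs * 1e6, "µs")
```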
The Jeffress model
ITD is encoded by the activation pattern of neurons with heterogeneous tunings: ITD is mapped to a pattern of neural activation.
Delay lines: "best delay" = difference between the monaural delays.
(Movie by Tom Yin)
Theoretical appeal
The firing rate of a cross-correlator neuron with best delay d is maximal when d = ITD, for any sound S.
Estimators based on the Jeffress model:
- peak coding
- centroid estimator (Colburn/Stern)
[Figure: population firing rate vs. best delay, peaking at the stimulus ITD = −0.3 ms.]
Origin of internal delays
Observed: greater delays at low frequencies.
The hemispheric model
Testing the Jeffress model in small mammals
For each neuron, one measures firing rate vs. ITD; the peak defines the "best delay" (e.g., 400 µs).
Gerbil MSO (Day & Semple 2011)
Observations in many species:
1) contralateral bias;
2) best delay is inversely correlated with best frequency;
3) a number of large best delays, outside the range of "natural" ITDs.
This looks like a contradiction of the place code hypothesis!
The hemispheric model of ITD processing
In small mammals, best interaural phases cluster around ±π/4 (e.g., guinea pig).
Two-channel model: in each frequency band, two neural populations are tuned to symmetrical best delays outside the physiological range of ITDs. The relative activity indicates the ITD (ratio of activities, for level independence).
(McAlpine et al., 2001; Harper & McAlpine, 2004)
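The two-channel readout can be sketched with idealized cosine tuning curves (a simplification of cross-correlator responses to a tone; all numbers illustrative): the normalized difference of the two populations' rates varies monotonically with ITD over the physiological range and does not depend on overall level.

```python
import math

FREQ = 500.0                      # tone frequency (Hz)
BD = 1.0 / (8 * FREQ)             # best phase ±π/4 → best delay ±250 µs

def rate(itd, best_delay):
    """Idealized cross-correlator tuning to a tone: 1 + cos(2πf(ITD − BD))."""
    return 1.0 + math.cos(2 * math.pi * FREQ * (itd - best_delay))

def decode(itd):
    """Normalized difference of the two hemispheric channels (level-independent)."""
    left, right = rate(itd, -BD), rate(itd, +BD)
    return (right - left) / (right + left)

# The readout varies monotonically with ITD over the physiological range
readout = [decode(itd_us * 1e-6) for itd_us in range(-300, 301, 100)]
print(readout)
```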
Conceptual problems with the hemispheric model
- The ITD code is ambiguous at high frequency.
- ITD estimation is not robust to noise.
- ITD estimation is not robust to the sound spectrum.
- Many best delays lie within the physiological range.
Sub-optimality of the hemispheric model:
Brette R (2010) On the interpretation of sensitivity analyses of neural responses. JASA 128(5), 2965-2972.
The synchrony field model
Puzzling observations
Gerbil MSO (Day & Semple 2011): for some cells, the "best delay" depends on input frequency.
Best phase
For a pure delay: best phase (BP) = best delay (BD) × frequency (f).
In real cells (cat IC), BP is approximately a linear function of frequency.
Linear regression: BP = CP + CD·f (CD = characteristic delay, CP = characteristic phase).
Nonzero CP: not a pure delay! Nonzero CD (ms): not a pure phase!
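The regression BP = CP + CD·f can be sketched with ordinary least squares on a hypothetical cell (synthetic data with known CP and CD, not recorded values):

```python
def fit_cp_cd(freqs, best_phases):
    """Least-squares line BP = CP + CD * f.
    Returns (CP in cycles, CD in seconds)."""
    n = len(freqs)
    mf = sum(freqs) / n
    mp = sum(best_phases) / n
    cd = sum((f - mf) * (p - mp) for f, p in zip(freqs, best_phases)) \
         / sum((f - mf) ** 2 for f in freqs)
    cp = mp - cd * mf
    return cp, cd

# Synthetic cell: CP = 0.1 cycles, CD = 300 µs
freqs = [200.0, 400.0, 600.0, 800.0]
bps = [0.1 + 300e-6 * f for f in freqs]
cp, cd = fit_cp_cd(freqs, bps)
print(cp, cd)
```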
ITDs in real life
FR, FL = location-dependent acoustical filters (HRTFs/HRIRs).
[Figure: ITD (ms) vs. frequency for front and back sources; the delay differs between high and low frequencies.]
Binaural structure and synchrony receptive fields
FR, FL = HRTFs/HRIRs (location-dependent); NA, NB = neural filters (e.g. basilar membrane filtering).
Input to neuron A: NA*FR*S (convolution); input to neuron B: NB*FL*S.
Synchrony when NA*FR = NB*FL, independently of the source signal S.
"Synchrony receptive field" of (A,B):
SRF(A,B) = set of filter pairs (FL,FR) = set of source locations = spatial receptive field.
Brette (2012), Computing with neural synchrony. PLoS Comput Biol
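The synchrony condition NA*FR = NB*FL can be checked numerically: if the composite filters match, the two neurons receive identical inputs for any source S. A sketch with short hypothetical FIR filters (coefficients chosen by hand so the condition holds):

```python
import random

def convolve(x, h):
    """Direct convolution of two sequences."""
    out = [0.0] * (len(x) + len(h) - 1)
    for n, xn in enumerate(x):
        for k, hk in enumerate(h):
            out[n + k] += xn * hk
    return out

# Hypothetical neural filters NA, NB and acoustical filters FR, FL,
# chosen so that NA*FR == NB*FL (the synchrony condition)
NA, FR = [1.0, 0.5], [0.2, 0.0, 0.8]
NB, FL = [0.5, 0.25], [0.4, 0.0, 1.6]
assert convolve(NA, FR) == convolve(NB, FL)

random.seed(2)
S = [random.gauss(0, 1) for _ in range(200)]   # arbitrary source signal
inA = convolve(convolve(NA, FR), S)            # input to neuron A
inB = convolve(convolve(NB, FL), S)            # input to neuron B
synchronous = all(abs(a - b) < 1e-9 for a, b in zip(inA, inB))
print(synchronous)
```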
The hypothesis
Each binaural neuron encodes an element of binaural structure.
[Diagram: source S filtered into FL*S and FR*S at the ears, then into NB*FL*S and NA*FR*S by the neural filters.]
Best phase of a neuron vs. frequency = interaural phase difference vs. frequency for the preferred source location.
Experimental prediction
[Figure: best phase vs. input frequency (Hz); CP and CD of cells (cat IC) compared with predictions from HRTFs.]
Dendrites and coincidence
detection
Dendrites of binaural neurons
Avian NL neurons
Mammalian MSO neurons
Coincidence detection with dendrites
The problem: the neuron responds to both monaural and binaural coincidences.
With dendrites: the neuron is more ITD-selective, because it responds better to binaural coincidences.
(Agmon-Snir et al., Nature 1998)
Mechanism
Monaural coincidence (two inputs on the same dendrite): a nonlinear effect makes the second spike less effective, because the synaptic current is proportional to (Esyn − V).
Binaural coincidence (one input on each dendrite): the two inputs sum at the soma.
Consequence: the binaural conductance threshold is lower than gTh, the monaural conductance threshold.
A simplified model
With 3 compartments: a soma and two dendrites (left, right).
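Under the assumption of a passive three-compartment model with constant synaptic conductances (all parameters illustrative, not fitted to MSO neurons), one can check the mechanism above: the same total conductance depolarizes the soma more when split across the two dendrites (binaural) than when concentrated on one (monaural), because each dendritic input saturates toward Esyn.

```python
def soma_depolarization(g_left, g_right, t_end=5e-3, dt=1e-5):
    """Passive 3-compartment neuron (left dendrite - soma - right dendrite),
    driven by constant synaptic conductances toward Esyn on each dendrite.
    Returns the somatic voltage after t_end (near steady state).
    All parameters are illustrative, in arbitrary units."""
    Esyn, g_leak, g_c, tau = 1.0, 1.0, 2.0, 1e-3   # reversal, leak, coupling, 1 ms
    vL = vS = vR = 0.0
    for _ in range(int(round(t_end / dt))):
        dvL = -g_leak * vL + g_left * (Esyn - vL) + g_c * (vS - vL)
        dvR = -g_leak * vR + g_right * (Esyn - vR) + g_c * (vS - vR)
        dvS = -g_leak * vS + g_c * (vL - vS) + g_c * (vR - vS)
        vL += dt / tau * dvL
        vR += dt / tau * dvR
        vS += dt / tau * dvS
    return vS

g = 4.0
binaural = soma_depolarization(g, g)        # one input on each dendrite
monaural = soma_depolarization(2 * g, 0.0)  # both inputs on the same dendrite
print(binaural, monaural)                   # binaural > monaural
```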