Hearing and Speech
Cognitive Architectures
Based on the book Cognition, Brain and Consciousness, ed. Bernard J. Baars
Janusz A. Starzyk
Sound and hearing basics
Figure: a time-domain sinewave signal and the same signal in the time-frequency domain.
Complex sound signals can be decomposed into a series of sinewave signals
of various frequencies.
The human auditory system detects sounds in the range of 20 Hz to 20 kHz; bats and whales can hear up to 100 kHz.
Musicians can detect the difference between 1000 Hz and 1001 Hz.
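As a rough illustration of this decomposition, a minimal NumPy sketch (the frequencies and amplitudes are invented for the example):

```python
import numpy as np

fs = 8000                         # assumed sampling rate, Hz
t = np.arange(0, 1.0, 1.0 / fs)

# A "complex" sound built from three sinewaves of different frequencies.
x = (1.00 * np.sin(2 * np.pi * 440 * t)
     + 0.50 * np.sin(2 * np.pi * 880 * t)
     + 0.25 * np.sin(2 * np.pi * 1320 * t))

# The FFT decomposes the signal back into its sinewave components.
spectrum = np.abs(np.fft.rfft(x))
freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
print(sorted(freqs[np.argsort(spectrum)[-3:]]))  # [440.0, 880.0, 1320.0]
```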
Sound and hearing basics
20 ms is needed for the onset of a consonant.
200 ms is the duration of an average syllable.
2000 ms is needed for a sentence.
These various time scales, and other parameters of the sound like timbre or intensity, must be properly processed to recognize speech or music.
Figure: a spectrogram of a speech signal – frequency is represented on the y-axis.
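A spectrogram like this is computed with a short-time Fourier transform; a sketch using SciPy (the signal here is a random placeholder for real speech):

```python
import numpy as np
from scipy.signal import spectrogram

fs = 16000
x = np.random.randn(fs)  # stand-in for one second of recorded speech

# 20 ms analysis windows are short enough to resolve consonant onsets.
f, t, Sxx = spectrogram(x, fs=fs, nperseg=int(0.020 * fs))
print(Sxx.shape)  # (frequency bins, time frames)
```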
Sound and hearing basics
Near total silence - 0 dB
A whisper - 15 dB
Normal conversation - 60 dB
A lawnmower - 90 dB
A car horn - 110 dB
A rock concert - 120 dB
A gunshot - 140 dB
Figure: human and cat hearing sensitivity.
The dynamic range of the human hearing system is very broad: from an intensity ratio of 1 (the sound pressure level where hearing begins) to 10^15, i.e., 150 dB SPL.
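The decibel values in the list above follow from the logarithmic definition of sound intensity level; a quick sketch of the conversion:

```python
import numpy as np

def intensity_to_db(ratio):
    """Sound intensity level: dB = 10 * log10(I / I0), where I0 is
    the intensity at the threshold of hearing."""
    return 10 * np.log10(ratio)

print(intensity_to_db(1))     # 0 dB   - near-total silence
print(intensity_to_db(1e6))   # 60 dB  - normal conversation
print(intensity_to_db(1e15))  # 150 dB - top of the dynamic range
```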
Sound and hearing basics
There are two cochlear windows – oval and round.
The stapes conveys sound vibrations through the oval window to the inner-ear fluids.
A sound wave caused by vibrating objects moves through the air and enters the external auditory canal, reaching the tympanic membrane, or eardrum.
Vibrations propagate through the middle ear by the mechanical action of three bones: the hammer, anvil, and stirrup (or malleus, incus, and stapes).
Because of the length of the ear canal, it is capable of amplifying sounds with frequencies of approximately 3000 Hz.
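This figure is consistent with modeling the canal as a tube closed at one end (a quarter-wave resonator); a sketch, assuming a canal length of about 2.9 cm:

```python
c = 343.0   # speed of sound in air, m/s
L = 0.029   # assumed ear canal length, m (~2.9 cm)

# Quarter-wave resonance of a tube closed at one end: f = c / (4 * L).
print(round(c / (4 * L)))  # ~2957 Hz, near the 3000 Hz quoted above
```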
Sound and hearing basics
The cochlea and the semicircular canals are filled with a water-like fluid.
The cochlea in the inner ear contains the basilar membrane.
A traveling wave of sound moves across the basilar membrane, moving the small hair-like nerve cells.
Sound and hearing basics
Figure: pathways at the auditory brainstem.
The inner surface of the cochlea is lined with over 16,000 hair-like nerve cells, which perform one of the most critical roles in our ability to hear.
Each hair cell has a natural sensitivity to a particular frequency of vibration.
The brain decodes sound frequencies based on which hair cells along the basilar membrane are activated; this is known as the place principle.
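A widely used quantitative model of this place-to-frequency map is the Greenwood function; a sketch with commonly cited human parameters (the constants are not from the slides):

```python
def greenwood_freq(x, A=165.4, a=2.1, k=0.88):
    """Greenwood place-frequency map for the human cochlea.
    x: position along the basilar membrane as a fraction
    (0 = apex, 1 = base). Returns characteristic frequency in Hz."""
    return A * (10 ** (a * x) - k)

for x in (0.0, 0.5, 1.0):
    print(f"x = {x:.1f} -> {greenwood_freq(x):7.0f} Hz")
# Roughly 20 Hz at the apex up to ~20 kHz at the base,
# matching the range of human hearing.
```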
Inner ear details
Figures (including Figure 30-5) from E. R. Kandel et al., “Principles of Neural Science”, McGraw-Hill, 2000.
The central auditory system
The auditory system has many stages: from the ear, to the brainstem, to subcortical nuclei, and to cortex.
Ascending (afferent) pathways transmit information from the periphery to cortex.
Neural signals travel from the auditory nerve to the lower (ventral) cochlear nucleus.
The signal then travels through the lateral lemniscus, inferior colliculus, and thalamus to auditory cortex.
A key task of the ascending pathway is to localize sound in space.
The central auditory system
The descending (efferent) pathways go from auditory cortex down to the periphery under cortical control.
This control extends all the way to the hair cells in the cochlea.
The descending pathway provides 'top-down' information critical for selective attention and perception in a noisy environment.
Besides the ascending and descending pathways, there are connections between the left and right auditory pathways through the corpus callosum and other brain regions.
Auditory cortex
Auditory cortex specializes in sound processing.
It serves as a hub for sound processing and interacts with other systems within cortex, and back down the descending path to the cochlea.
These processes provide a wide range of perceptual abilities, like selecting a single person's voice in a crowded space or recognizing a melody even when it is played off-key.
Auditory cortex
In humans, primary auditory cortex is located within Heschl's gyrus.
Heschl's gyrus corresponds to Brodmann's area 41.
Another important region in auditory cortex is the planum temporale, located posterior to Heschl's gyrus.
The planum temporale is much larger in the left hemisphere (up to 10 times) in right-handed individuals.
It plays an important role in language understanding.
Posterior to the planum temporale is Brodmann's area 22, which Carl Wernicke associated with speech comprehension (Wernicke's area).
Auditory cortex
There are several types of neurons in the auditory system.
They have different response properties for coding frequency, intensity, and timing information in sounds, as well as for encoding spatial information for localizing sounds in space.
Figure: main cell types of the cochlear nucleus and their corresponding post-stimulus time (PST) histograms.
The sound stimulus is typically a 25 ms tone burst at the cell's center frequency, at a sound level 30 dB above threshold.
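A PST histogram simply counts spikes in fixed time bins after stimulus onset, pooled over stimulus repetitions; a sketch with invented spike times:

```python
import numpy as np

# Hypothetical spike times (seconds after tone-burst onset),
# pooled over many repetitions of the stimulus.
rng = np.random.default_rng(0)
spike_times = rng.uniform(0.0, 0.050, size=500)

# PST histogram: spike counts in fixed bins after stimulus onset.
bin_width = 0.001  # 1 ms bins
edges = np.arange(0.0, 0.050 + bin_width, bin_width)
counts, _ = np.histogram(spike_times, bins=edges)
print(counts[:10])  # spike counts in the first 10 ms
```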
Auditory cortex
Receptive fields of auditory neurons have different sensitivities to the location of the sound source (in azimuth angle) and to its loudness (in dB).
The top neuron is sensitive to a broad range of sound intensities located to the right, with greater sensitivity to louder signals.
The lower neuron is more narrowly tuned, to sound levels of 30-60 dB located slightly to the left of center.
Broadly tuned neurons are useful for detecting a sound source, while narrowly tuned neurons give the more precise information needed to locate it, such as a more precise direction and loudness level.
Auditory cortex
Auditory tonotopic cortical fields of a cat:
a) lateral view
b) lateral view 'unfolded' to show parts hidden within sulci.
The four tonotopic fields are:
Anterior (A)
Primary (AI)
Posterior (P)
Ventroposterior (VP)
Positions of the lowest and highest center frequencies in these fields are indicated in (b).
Other cortical areas have little tonotopy: secondary (AII), ventral (V), temporal (T), and dorsoposterior (DP).
Functional mapping of auditory processing
The planum temporale (PT) is located close to Wernicke's area for speech comprehension, which points towards its role as the site for auditory speech and language processing.
However, neuroimaging studies provide evidence that the functional role of PT is not limited to speech.
PT is a hub for auditory scene analysis, decoding sensory inputs and comparing them to memories and past experiences.
PT further directs cortical processing to decode spatial location and to identify auditory objects.
Figure: the planum temporale and its major associations: lateral superior temporal gyrus (STG), superior temporal sulcus (STS), middle temporal gyrus (MTG), parieto-temporal operculum (PTO), and inferior parietal lobe (IPL).
Functional mapping of auditory processing
PT serves as a hub for auditory and spatial analysis.
In a crowded environment it is important to decode auditory objects such as a friend's voice, an alarm signal, or a squeaking wheel.
To do so, the auditory system must determine where sounds are occurring in space and what they represent.
All of these are associated with other sensory inputs, like vision, smell, or touch, and with memory associations.
Functional mapping of auditory processing
Figure: neurons' responses to interaural time difference (ITD) and interaural level difference (ILD).
Abbreviations: CN – cochlear nucleus; MSO – medial superior olive; LSO – lateral superior olive; MNTB – medial nucleus of the trapezoid body.
To determine where a sound is coming from, two cues are used (see the sketch below):
Interaural (between-ear) time difference
Interaural level difference
Sensitivity to the time difference must be finer than a millisecond.
The head produces a 'sound shadow', so the sound reaching the farther ear is slightly weaker.
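A rough sketch of the ITD cue using Woodworth's spherical-head, far-field approximation (the head radius is an assumed average value):

```python
import numpy as np

def itd_seconds(azimuth_deg, head_radius=0.0875, c=343.0):
    """Woodworth far-field approximation of the interaural time
    difference: ITD = (r / c) * (theta + sin(theta))."""
    theta = np.radians(azimuth_deg)
    return (head_radius / c) * (theta + np.sin(theta))

for az in (0, 30, 60, 90):
    print(f"{az:3d} deg -> {itd_seconds(az) * 1e6:5.0f} us")
# The maximum ITD (source at 90 deg) is ~650 microseconds,
# which is why sub-millisecond neural sensitivity is required.
```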
Functional mapping of auditory processing
It was demonstrated that musical conductors were better able to locate sound sources in a musical score.
They showed higher sensitivity to sounds presented in peripheral listening than other groups, including other musicians.
Functional mapping of auditory processing
Auditory objects are categorized into human voices, musical instruments, animal sounds, etc.
Auditory objects are learned over our lifetime, and their associations are stored in memory.
Auditory areas in superior temporal cortex are activated both by recognized and unrecognized sounds.
Recognized sounds also activate the superior temporal sulcus and the middle temporal gyrus (MTG).
Figure (c) shows the difference between activations for recognized and unrecognized sounds.
Functional mapping of auditory processing
Binder and colleagues propose that the middle temporal gyrus (MTG) is the region that associates sounds and images.
This is in agreement with case studies of patients who suffered from auditory agnosia (the inability to recognize sounds).
Research results showed that auditory object perception is a complex process involving multiple brain regions in both hemispheres.
Figure: brain activity in auditory processing – cross-sections at different depths.
Cocktail party effect
How does the auditory system separate sounds coming from different sources?
Bregman (1990) proposed a model for such segregation.
It contains four elements:
The source
The stream
Grouping
Stream segregation
The source is the sound signal. It represents physical features like frequency, intensity, and spatial location.
The stream is the percept of the sound and represents psychological aspects that depend on the individual.
Grouping creates a stream:
Simultaneous grouping, e.g., instruments in an orchestra
Sequential grouping, e.g., grouping sounds across time
Stream segregation separates the streams into objects (a toy sketch follows below).
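A toy illustration of sequential grouping by temporal proximity (the gap threshold and onset times are invented for the example):

```python
def group_by_proximity(onsets, max_gap=0.15):
    """Group sound onset times (seconds) into streams: an onset
    within max_gap of the previous one joins the same stream."""
    streams = [[onsets[0]]]
    for t in onsets[1:]:
        if t - streams[-1][-1] <= max_gap:
            streams[-1].append(t)
        else:
            streams.append([t])
    return streams

# Two bursts of clicks separated by a long pause form two streams.
print(group_by_proximity([0.0, 0.1, 0.2, 1.0, 1.1]))
# [[0.0, 0.1, 0.2], [1.0, 1.1]]
```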
Cocktail party effect
Bregman's grouping principles:
Proximity: sounds that are close in time are grouped.
Closure: sounds that do not belong to the stream (like a cough during a lecture) are excluded.
Good continuation: sounds that follow each other smoothly are grouped (similar to proximity).
Common fate: sounds that come from the same location or coincide in time (an orchestra).
Exclusive allocation: selective listening (focus on one stream).
Figure: cortical areas of auditory stream analysis – the intraparietal sulcus (IPS) is involved in binding multimodal information (vision, touch, sound).
Cocktail party effect
There is growing evidence that, as in the visual system, cortical networks for decoding 'what' and 'where' information in sound are organized in separate but highly interactive processing streams.
Figure: auditory (blue) and visual (pink) processing areas in the macaque brain, with the 'what' and 'where' auditory processing streams.
Human brain processing:
Blue – language-specific phonological structure
Lilac – phonetic cues and speech features
Purple – intelligible speech
Pink – verbal short-term memory
Green – auditory spatial tasks
Speech perception
There is no agreement on how speech is coded in the brain.
What are the speech 'building blocks'?
A natural way would be to code words based on phonemes.
The word 'dig' would be obtained by identifying a sequence of phonemes.
Perhaps a syllable is the appropriate unit?
We must decode not only 'what' but also 'who' and 'when', to understand the temporal order of phonemes, syllables, words, and sentences.
The speech signal must be evaluated on time scales from 20 ms to 2000 ms, independently of pitch (high for a child, low for a man), loud or quiet, fast or slow.
Speech perception
Early attempts at simplifying speech processing were made at Bell Labs by Homer Dudley, who developed the vocoder:
The vocoder (voice + coder) was able to reduce the speech signal for transmission over long telephone circuits by analyzing and recoding speech.
Cochlear implants that stimulate the auditory system are based on vocoder technology for some types of hearing loss.
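A minimal sketch of the channel-vocoder idea: split the signal into frequency bands, keep only each band's slow amplitude envelope, and re-excite the bands with a carrier. The band edges and filter order are illustrative choices, and the noise carrier mirrors cochlear-implant simulations rather than Dudley's original design:

```python
import numpy as np
from scipy.signal import butter, sosfilt, hilbert

fs = 16000
x = np.random.randn(fs)  # stand-in for one second of speech

bands = [(100, 500), (500, 1500), (1500, 4000)]  # illustrative channels
out = np.zeros_like(x)
for lo, hi in bands:
    sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
    band = sosfilt(sos, x)
    envelope = np.abs(hilbert(band))         # slow amplitude envelope
    carrier = np.random.randn(len(x))        # per-channel noise carrier
    out += envelope * sosfilt(sos, carrier)  # re-excited band
```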
Speech perception
A second invention, the spectrograph, developed at Bell Labs during World War II, produced a voice picture with frequency on the y-axis, time on the x-axis, and intensity as a level of grey.
Problems in analyzing spectrograms:
Gaps or silences do not mark where a word begins and ends.
Individual phonemes change depending on which phonemes come before and after them.
What is wrong with the short-term spectrum?
It is inconsistent: the same message can have different representations.
Shannon (1998) showed that the minimum information needed for speech decoding is contained in the overall shape of the speech signal, called the temporal envelope.
Figure: the short-term spectrum (frequency representation) of a speech signal.
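A temporal envelope can be extracted by rectifying the waveform and low-pass filtering it; a sketch (the 50 Hz cutoff is an assumed choice):

```python
import numpy as np
from scipy.signal import butter, sosfilt

fs = 16000
x = np.random.randn(fs)  # stand-in for recorded speech

# Rectify, then low-pass filter to keep only the slow amplitude
# modulations that make up the temporal envelope.
sos = butter(4, 50, btype="lowpass", fs=fs, output="sos")
envelope = sosfilt(sos, np.abs(x))
```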
Speech perception
The lack of invariant features in speech spectrograms forced researchers to look for other accounts of speech perception.
The motor theory developed by Liberman (1985) assumes a domain-specific approach to speech.
This theory suggests that speech perception is tightly coupled with speech production.
While the acoustics of phonemes lack invariance, the motor gestures that produce the speech are invariant and can be accessed in speech perception.
Another theory, developed by Tallal, assumes that speech and language are domain-general.
In this theory, left-hemisphere language organization is not the result of domain-specific development, but results from a domain-general bias of the left hemisphere for decoding rapidly changing sounds (such as those contained in speech).
It is likely that the neural system uses a combination of domain-specific and domain-general processing for speech perception.
Speech perception
Figure: a process model for word comprehension; language areas.
Speech perception
Figure: brain responses to words, pseudowords, and reversed speech.
Binder and colleagues (1997) studied the activation of brain areas by words, reversed speech, and pseudowords, and found that Heschl's gyrus and the planum temporale were activated similarly by all stimuli.
This supports the notion of hierarchical processing of sounds, with Heschl's gyrus representing early sensory analysis.
Speech signals activated a larger portion of auditory cortex than non-speech sounds in the posterior superior temporal gyrus and superior temporal sulcus, but there was no difference in activation between words, pseudowords, and reversed speech.
The conclusion is that these regions do not reflect semantic processing of the words but reflect phonological processing of the speech sounds.
Speech perception and production
Speech perception and production are tightly coupled.
One explanation is that when we speak, we hear our own voice.
Wernicke proposed a model for language processing that links a pathway from auditory speech perception to motor speech production.
The verbal signal enters primary auditory cortex (A) and then Wernicke's area (WA).
The response is formulated in Broca's area (B) and the primary motor cortex (M).
We can listen and respond to our own speech using the same brain regions.
Producing an internal response to a question results in silent speech to ourselves.
Damage to speech perceptual system
Figure: phonetic foils.
Damage to the speech perceptual system may be caused by strokes that block blood flow to a brain area and cause the death of neurons.
When a stroke impairs language functions, the condition is called aphasia.
Paul Broca linked one form of aphasia to a region in the frontal lobe important for speech production.
Carl Wernicke discovered a region in the temporal lobe important for speech perception.
Experiments by Blumstein tested phonetic and semantic deficits by giving patients four choices in a test: the correct word, a semantic foil, a phonetic foil, and an unrelated foil (e.g., peas, carrots, keys, and calculator).
Learning and plasticity
An important theme in studying human cognition is to find out how new information is encoded during learning and how the brain adapts – plasticity.
Much of what is known about the plasticity of the auditory system comes from deprivation studies in animals.
Both the cochlea and the brainstem are organized tonotopically, and this organization is reflected in auditory cortex.
After the cochlea or brainstem is lesioned, some frequencies are no longer transmitted to auditory cortex, and the cortex is then studied for changes reflecting neural plasticity.
Changes in neural responses in auditory cortex have also been observed in humans after sudden hearing loss.
Children with hearing loss showed some maturational lag compared to typical development; however, after receiving cochlear implants, their auditory systems continued to mature in a typical fashion.
This indicates plasticity of the auditory cortex.
Learning and plasticity
Plasticity due to learning was observed in laboratory animals using classical conditioning: presented tones were paired with a mild electrical shock, so the animal learned the sounds most relevant to survival (avoiding the shock).
Plasticity-related changes were more pronounced at higher motivational levels.
Trained tones were 4.1-8 kHz, and motivational levels were high (red), medium (black), and low (blue).
Figure: change in cortical area for the trained signal frequency at different motivational levels, untrained vs. trained.
Auditory awareness
The auditory system is the last to fall asleep and the first to wake up.
People in sleep respond to their own names better than to other sounds.
The figure compares responses in auditory cortex during the awake and sleep states.
Auditory imagery
Figure: brain areas active for imagined sounds.
Sounds play in our heads all day, even when we do not hear them.
Some are involuntary and uncalled for, like a melody or your inner voice.
Some are planned, like when you rehearse a verse or a telephone number in your head.
Halpern and colleagues (2004) showed that non-primary auditory cortex is active during imagined (but not heard) sounds.
Auditory imagery
Related results were obtained by Jancke and colleagues (2005).
They used fMRI to compare neural responses to real sounds and to imagined sounds.
Imagined sounds activate similar regions in auditory cortex as real ones.
Summary
We discussed the organization of the acoustic system:
Learned sound and hearing basics
Traced the auditory pathways
Analyzed the organization of auditory cortex
Observed the functional mapping of auditory processing
Discussed sound and music perception
Reviewed the effects of learning on sound processing
Research on animals confirmed the existence of 'what' and 'where' pathways in the auditory system; however, these pathways may be organized differently in humans.
When you hear an uncalled-for melody in your head, think about which of your brain areas are activated.