20-Perception
Perception + Vocal Tract Physiology
November 25, 2014
Stuff to Remember
• The final homework is due on Thursday!
• Categorical perception
• The final interim course project report is due on Tuesday
of next week.
• We’ll do our palatography demo on Tuesday of next week,
as well.
• You won’t have to put up with me on Thursday!
• We have a mystery spectrogram to solve!
More Evidence for Modularity
• It has also been observed that speech is perceived
multi-modally.
• i.e., we can perceive it through vision as well as
hearing (or some combination of the two).
• We’re perceiving “gestures”
• …and the gestures are abstract.
• Interesting evidence: McGurk Effect
McGurk Effect, revealed
Audio      Visual     Perceived
ba     +   ga         da
ga     +   ba         bga
• Some interesting facts:
• The McGurk Effect is exceedingly robust.
• Adults show the McGurk Effect more than children.
• Americans show the McGurk Effect more than
Japanese.
Original McGurk Data
• Stimulus: Auditory ba-ba + Visual ga-ga
• Response types:
Auditory: ba-ba
Visual: ga-ga
Fused: da-da
Combo: gabga, bagba
Age      Auditory   Visual   Fused   Combo
3-5      19%        0%       81%     0%
7-8      36%        0%       64%     0%
18-40    2%         0%       98%     0%
Original McGurk Data
• Stimulus: Auditory ga-ga + Visual ba-ba
• Response types:
Auditory: ga-ga
Visual: ba-ba
Fused: da-da
Combo: gabga, bagba
Age      Auditory   Visual   Fused   Combo
3-5      57%        10%      0%      19%
7-8      36%        21%      11%     32%
18-40    11%        31%      0%      54%
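The age trend in the auditory ga-ga + visual ba-ba data can be checked with a little arithmetic. The sketch below is only an illustration, not part of the original analysis: it takes the percentages listed above and adds up every response type other than "Auditory" as a rough index of visual influence. The names `responses` and `visual_influence` are made up for this example.

```python
# Response percentages for auditory ga-ga + visual ba-ba,
# copied from the data above (the rows as listed do not all sum to 100).
responses = {
    "3-5":   {"auditory": 57, "visual": 10, "fused": 0,  "combo": 19},
    "7-8":   {"auditory": 36, "visual": 21, "fused": 11, "combo": 32},
    "18-40": {"auditory": 11, "visual": 31, "fused": 0,  "combo": 54},
}

def visual_influence(row):
    """Sum of the response types that reflect the visual signal in some way."""
    return row["visual"] + row["fused"] + row["combo"]

influence = {age: visual_influence(row) for age, row in responses.items()}
for age, pct in influence.items():
    print(f"{age:>5} years: {pct}% of responses influenced by vision")
```

On this rough index the share of visually influenced responses rises with age, which is the pattern behind "adults show the McGurk Effect more than children."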
Audio-Visual Sidebar
• Visual cues affect the perception of speech in non-mismatched conditions, as well.
• Scientific studies of lipreading date back to the early
twentieth century
• The original goal: improve the speech perception
skills of the hearing-impaired
• Note: visual speech cues often complement audio
speech cues
• In particular: place of articulation
• However, training people to become better lipreaders
has proven difficult…
• Some people get it; some people don’t.
Sumby & Pollack (1954)
• First investigated the influence of visual information on the
perception of speech by normal-hearing listeners.
• Method:
• Presented individual word tokens to listeners in noise,
with simultaneous visual cues.
• Task: identify spoken word
• Example conditions: clear, +10 dB SNR, +5 dB SNR, 0 dB SNR
Sumby & Pollack data
[Figure: word identification, Auditory-Only vs. Audio-Visual]
• Visual cues provide an intelligibility boost equivalent to
a 12 dB increase in signal-to-noise ratio.
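To get a feel for what a 12 dB boost means, it helps to convert decibels back to linear ratios. This is a minimal sketch, assuming the standard definitions (10·log10 for power, 20·log10 for amplitude); the function names are illustrative only.

```python
def db_to_power_ratio(db):
    """Convert a level difference in dB to a power (intensity) ratio."""
    return 10 ** (db / 10)

def db_to_amplitude_ratio(db):
    """Convert a level difference in dB to an amplitude (pressure) ratio."""
    return 10 ** (db / 20)

boost_db = 12  # intelligibility boost stated on the slide
power_ratio = db_to_power_ratio(boost_db)
amplitude_ratio = db_to_amplitude_ratio(boost_db)
print(f"{boost_db} dB = {power_ratio:.1f}x power, {amplitude_ratio:.1f}x amplitude")
```

In other words, seeing the talker is worth roughly a sixteen-fold increase in signal power relative to the noise.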
Tadoma Method
• Some deaf-blind people learn to perceive speech
through the tactile modality, by using the Tadoma
method.
Audio-Tactile Perception
• Fowler & Dekle: tested ability of (naive) college students
to perceive speech through the Tadoma method.
• Presented synthetic stops auditorily
• Combined with mismatched tactile information:
• Ex: audio /ga/ + tactile /ba/
• Also combined with mismatched orthographic information:
• Ex: audio /ga/ + orthographic /ba/
• Task: listeners reported what they “heard”
• Tactile condition biased listeners more towards “ba”
responses
Fowler & Dekle data
[Figure: responses in the orthographic mismatch condition (read “ba”) and the tactile mismatch condition (felt “ba”)]
Another Piece of the Puzzle
• Another interesting finding which has been used to
argue for the “speech is special” theory is duplex
perception.
• Take an isolated F3 transition:
and present it to one ear…
Do the Edges First!
• While presenting this spectral frame to the other ear:
Two Birds with One Spectrogram
• The resulting combo is perceived in duplex fashion:
• One ear hears the F3 “chirp”;
• The other ear hears the combined stimulus as “da”.
Duplex Interpretation
• Check out the spectrograms in Praat.
• Mann and Liberman (1983) found:
• Discrimination of the F3 chirps is gradient when
they’re in isolation…
• but categorical when combined with the spectral
frame.
• (Compare with the F3 discrimination experiment with
Japanese and American listeners)
• Interpretation: the “special” speech processor puts
the two pieces of the spectrogram together.
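For anyone who wants to build a duplex-style stimulus by hand, here is a rough sketch of the dichotic layout: the isolated F3 transition goes to one channel and the rest of the syllable (the spectral frame) goes to the other. Every number in it is an assumption for illustration. The sample rate, durations, and frequencies are placeholders rather than the values used by Mann and Liberman, and the frame is reduced to two steady tones standing in for F1 and F2.

```python
import math

SAMPLE_RATE = 16000    # Hz (assumed)
TRANSITION_DUR = 0.05  # seconds (assumed)
FRAME_DUR = 0.25       # seconds (assumed)

def chirp(f_start, f_end, dur, rate=SAMPLE_RATE):
    """Linear frequency sweep from f_start to f_end, built by accumulating phase."""
    n = round(dur * rate)
    samples, phase = [], 0.0
    for i in range(n):
        freq = f_start + (f_end - f_start) * i / n
        phase += 2 * math.pi * freq / rate
        samples.append(math.sin(phase))
    return samples

def tone(freq, dur, rate=SAMPLE_RATE):
    """Steady sine tone."""
    n = round(dur * rate)
    return [math.sin(2 * math.pi * freq * i / rate) for i in range(n)]

# Left ear: the isolated F3 transition (hypothetical 2700 -> 2500 Hz), then silence.
f3 = chirp(2700, 2500, TRANSITION_DUR)
left = f3 + [0.0] * (round(FRAME_DUR * SAMPLE_RATE) - len(f3))

# Right ear: the spectral frame, here just two tones standing in for F1 and F2.
f1 = tone(700, FRAME_DUR)
f2 = tone(1200, FRAME_DUR)
right = [0.5 * (a + b) for a, b in zip(f1, f2)]

# One (left, right) pair per sample frame.
stereo = list(zip(left, right))
```

The `stereo` frames could be scaled to 16-bit integers and written out with Python's standard `wave` module for listening over headphones. Real duplex stimuli use synthetic formant patterns, so this only shows how the two pieces are split across the ears.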
fMRI data
• Benson et al. (2001)
• Non-Speech stimuli = notes, chords, and chord
progressions on a piano
• Difference in activation for natural speech stimuli versus
activation for sinewave speech stimuli
Mirror Neurons
• In the 1990s, researchers in Italy discovered what they
called mirror neurons in the brains of macaques.
• Macaques had been trained to make grasping motions
with their hands.
• Researchers recorded the activity of single neurons
while the monkeys were making these motions.
• Serendipity:
• the same neurons fired when the monkeys saw the
researchers making grasping motions.
• a neurological link between perception and action.
• Motor theory claim: same links exist in the human brain,
for the perception of speech gestures
Moving On…
• One important lesson to take from the motor theory
perspective is:
• The dynamics of speech are generally more
important to perception than static acoustic cues.
• Note: visual chimerism and March Madness.
Auditory Chimeras
• Speech waveform + music spectrum:
frequency bands: 1, 2, 4, 8, 16, 32
• Music waveform + speech spectrum:
frequency bands: 1, 2, 4, 8, 16, 32
Source: http://research.meei.harvard.edu/chimera/chimera_demos.html
Auditory Chimeras
• Speech1 waveform + speech2 spectrum:
frequency bands: 1, 2, 4, 6, 8, 16
• Speech2 waveform + speech1 spectrum:
frequency bands: 1, 2, 4, 6, 8, 16
Motor Theory, in a nutshell
• The big idea:
• We perceive speech as abstract “gestures”, not sounds.
• Evidence:
1. The perceptual interpretation of speech differs radically from the acoustic organization of speech sounds
2. Speech perception is multi-modal
3. Direct (visual, tactile) information about gestures can influence/override indirect (acoustic) speech cues
4. Limited top-down access to the primary, acoustic elements of speech
Vocal Tract Physiology
November 25, 2014
The Toolkit
• There are four primary active articulators in speech.
• (articulators we can move around)
1. The lips
2. The lower jaw (mandible)
3. The tongue
4. The velum
• The pharynx can also be constricted, to some extent.
• Separate sets of muscles control each articulator...
Articulatory Speed
• The gold medal goes to the tongue tip...
• which is capable of 7.2 - 9.6 movements per
second.
• The rest:
• Mandible: 5.9 - 8.4 movements per second
• Back of tongue: 5.4 - 8.9
• Velum: 5.2 - 7.8
• Lips: 5.7 - 7.7
• Note: lips can be raised and lowered faster than they
can be protruded and rounded.
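Rates in movements per second are easier to relate to segment durations when they are inverted. This is a minimal sketch using the ranges listed above; the names `rates` and `ms_per_movement` are just for this example.

```python
# (slowest, fastest) movements per second, from the figures above.
rates = {
    "tongue tip":     (7.2, 9.6),
    "mandible":       (5.9, 8.4),
    "back of tongue": (5.4, 8.9),
    "velum":          (5.2, 7.8),
    "lips":           (5.7, 7.7),
}

def ms_per_movement(rate_per_second):
    """Time taken by one movement, in milliseconds."""
    return 1000.0 / rate_per_second

# The fastest rate gives the shortest duration, and vice versa.
durations = {
    name: (ms_per_movement(fast), ms_per_movement(slow))
    for name, (slow, fast) in rates.items()
}
for name, (shortest, longest) in durations.items():
    print(f"{name:<15} {shortest:4.0f} - {longest:.0f} ms per movement")
```

Even the tongue tip needs on the order of 100 ms or more per movement at these rates.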
1. The Lips
• The orbicularis oris
muscle surrounds the lips.
• Contraction compresses
and rounds the lips.
• A muscle called the
mentalis also protrudes
the lips.
• Contraction of the
risorius muscle retracts
the corners of the lips...
• and spreads them.
By the way...
• The vowel [i] is typically produced with active lip
spreading.
• “Say cheese!”
• What acoustic effect would this have?
• Lips Normal:
• Lips Spread:
• Check ‘em out in Praat.
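One hedged way to reason about the question before opening Praat: spreading the lips effectively shortens the vocal tract, and a shorter tube has higher resonances. The sketch below assumes the simplest possible model, a uniform tube closed at the glottis and open at the lips, with resonances F_n = (2n - 1)c / 4L. The tube lengths are hypothetical, and a uniform tube is a schwa-like approximation rather than a model of [i], so only the direction of the shift is meant to carry over.

```python
SPEED_OF_SOUND = 35000.0  # cm/s, a common textbook value

def tube_resonances(length_cm, n_formants=3):
    """Resonances of a uniform tube closed at one end: F_n = (2n - 1) * c / (4 * L)."""
    return [
        (2 * n - 1) * SPEED_OF_SOUND / (4 * length_cm)
        for n in range(1, n_formants + 1)
    ]

normal = tube_resonances(17.5)  # assumed vocal tract length with neutral lips
spread = tube_resonances(16.5)  # hypothetical: spreading shortens the tube by 1 cm

for n, (f_normal, f_spread) in enumerate(zip(normal, spread), start=1):
    print(f"F{n}: {f_normal:.0f} Hz -> {f_spread:.0f} Hz")
```

Under these assumptions every resonance moves up when the tube gets shorter, which is the kind of change to look for when comparing the two recordings.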
2. The Jaw
• Several different muscles are used to both lower and
raise the mandible.
• Primary raisers:
• Masseter
• Temporalis
• Internal pterygoid
• Lowerers:
• Anterior belly of the digastricus
• Geniohyoid
• Mylohyoid
• Note: in lowering, the mandible also retracts.