Speech Perception
April 3, 2014
Perceptual Fun!
• We have one last homework, due next Thursday!
• I’ll explain how it works after we walk through the
mechanics of categorical perception.
• On Tuesday we’ll do some palatography.
• Let’s watch a video of some dogs playing the piano!
Identification
• In Categorical Perception:
• All stimuli within a category boundary should be
labeled the same.
Discrimination
• Original task: ABX discrimination
• Stimuli across category boundaries should be 100% discriminable.
• Stimuli within category boundaries should not be discriminable at all.
In practice, categorical perception means: the
discrimination function can be determined from the
identification function.
Identification → Discrimination
• Let’s consider a case where the two sounds in a
discrimination pair are the same.
• Example: the pair is stimulus 3 followed by stimulus 3
• Identification data: stimulus 3 is identified as:
• [b] 95% of the time
• [d] 5% of the time
• The discrimination pair will be perceived as:
• [b] - [b]: .95 * .95 = .9025
• [d] - [d]: .05 * .05 = .0025
• Probability of same response is predicted to be:
• (.9025 + .0025) = .905 = 90.5%
Identification → Discrimination
• Let’s consider a case where the two sounds in a
discrimination pair are different.
• Example: the pair is stimulus 9 followed by stimulus 11
• Identification data:
• Stimulus 9: [d] 80% of the time, [g] 20% of the time
• Stimulus 11: [d] 5% of the time, [g] 95% of the time
• The discrimination pair will be perceived as:
• [d] - [d]: .80 * .05 = .04
• [g] - [g]: .20 * .95 = .19
• Probability of same response is predicted to be:
• (.04 + .19) = 23%
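The two worked examples above follow the same recipe, which can be sketched in a few lines of Python. This is only an illustration of the slide arithmetic: the function name `prob_same_label` and the dictionary layout are made up for this sketch, and it assumes (as the slide calculations implicitly do) that the two sounds in a pair are labeled independently of each other.

```python
def prob_same_label(ident_a, ident_b):
    """Predicted probability that two stimuli receive the same label.

    ident_a and ident_b map each label to the proportion of trials on which
    the stimulus was identified with that label. The two labeling decisions
    are assumed to be independent (an assumption of this sketch).
    """
    labels = set(ident_a) | set(ident_b)
    # Multiply the two identification probabilities for each label,
    # then add up the products across labels.
    return sum(ident_a.get(lab, 0.0) * ident_b.get(lab, 0.0) for lab in labels)


# Identification data from the two examples above
stim3 = {"b": 0.95, "d": 0.05}
stim9 = {"d": 0.80, "g": 0.20}
stim11 = {"d": 0.05, "g": 0.95}

same_pair = prob_same_label(stim3, stim3)        # stimulus 3 followed by stimulus 3
different_pair = prob_same_label(stim9, stim11)  # stimulus 9 followed by stimulus 11

print(f"3 then 3:  {same_pair:.3f}")
print(f"9 then 11: {different_pair:.3f}")
```

The first value matches the 90.5% worked out for the same-stimulus pair, and the second matches the 23% for the pair that straddles the category boundary.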
Discrimination
• In this discrimination graph:
• Solid line is the observed data
• Dashed line is the predicted data (on the basis of the identification scores)
Note: the actual listeners did a little bit better than the
predictions.
Categorical, Continued
• Categorical Perception was also found for VOT
distinctions.
• And for stop/glide/vowel distinctions:
• 10 ms transitions: [b] percept
• 60 ms transitions: [w] percept
• 200 ms transitions: [u] percept
Interpretation
• Main idea: in categorical perception, the mind translates
an acoustic stimulus into a phonemic label (a category).
• The acoustic details of the stimulus are discarded in
favor of an abstract representation.
• A continuous acoustic signal is thus transformed into a series of linguistic units.
The Next Level
• Interestingly, categorical perception is not found for
non-speech stimuli.
• Miyawaki et al.: tested perception of an F3 continuum
between /r/ and /l/.
The Next Level
• They also tested perception of the F3 transitions in
isolation.
• Listeners did not perceive these transitions categorically.
The Implications
• Interpretation: we do not perceive speech in the same
way we perceive other sounds.
• “Speech is special”…
• and the perception of speech is modular.
• A module is a special processor in our minds/brains
devoted to interpreting a particular kind of environmental
stimulus.
Module Characteristics
• You can think of a module as a “mental reflex”.
• A module of the mind is defined as having the following characteristics:
1. Domain-specific
2. Automatic
3. Fast
4. Hard-wired in brain
5. Limited top-down access (you can’t “unperceive”)
• Example: the sense of vision operates modularly.
A Modular Mind Model
• central processes: judgment, imagination, memory, attention
• modules: vision, hearing, touch, speech
• transducers: eyes, ears, skin, etc.
• external, physical reality
Remember this stuff?
• Speech is a “special” kind of sound because it exhibits
spectral change over time.
• → it’s processed by the speech module, not by the
auditory module.
SWS Findings
• The uninitiated either hear sinewave speech as speech or
as “whistles”, “chirps”, etc.
• Claim: once you hear it as speech, you can’t go back.
• The speech module takes precedence
• (Limited top-down access)
• Analogy: it’s impossible to not perceive real speech as
speech.
• We can’t hear the individual formants as whistles,
chirps, etc.
• Motor theory says: we don’t perceive the “sounds”, we
perceive the gestures which shape the spectrum.
More Evidence for Modularity
• It has also been observed that speech is perceived
multi-modally.
• i.e.: we can perceive it through vision, as well as
hearing (or some combination of the two).
• → We’re perceiving “gestures”
• …and the gestures are abstract.
• Interesting evidence: McGurk Effect
McGurk Effect, revealed
Audio + Visual → Perceived
ba + ga → da
ga + ba → gba
• Some interesting facts:
• The McGurk Effect is exceedingly robust.
• Adults show the McGurk Effect more than children.
• Americans show the McGurk Effect more than
Japanese.
Original McGurk Data
• Stimulus: Auditory ba-ba, Visual ga-ga
• Response types:
Auditory: ba-ba
Visual: ga-ga
Fused: da-da
Combo: gabga, bagba
• Responses by age group (%):
Age     Auditory  Visual  Fused  Combo
3-5     19        0       81     0
7-8     36        0       64     0
18-40   2         0       98     0
Original McGurk Data
• Stimulus: Auditory ga-ga, Visual ba-ba
• Response types:
Auditory: ga-ga
Visual: ba-ba
Fused: da-da
Combo: gabga, bagba
• Responses by age group (%):
Age     Auditory  Visual  Fused  Combo
3-5     57        10      0      19
7-8     36        21      11     32
18-40   11        31      0      54
Audio-Visual Sidebar
• Visual cues affect the perception of speech in non-mismatched conditions, as well.
• Scientific studies of lipreading date back to the early
twentieth century
• The original goal: improve the speech perception
skills of the hearing-impaired
• Note: visual speech cues often complement audio
speech cues
• In particular: place of articulation
• However, training people to become better lipreaders
has proven difficult…
• Some people get it; some people don’t.
Sumby & Pollack (1954)
• First investigated the influence of visual information on the
perception of speech by normal-hearing listeners.
• Method:
• Presented individual word tokens to listeners in noise,
with simultaneous visual cues.
• Task: identify spoken word
• Clear:
• +10 dB SNR:
• +5 dB SNR:
• 0 dB SNR:
Sumby & Pollack data
[Graph: Auditory-Only vs. Audio-Visual conditions]
• Visual cues provide an intelligibility boost equivalent to
a 12 dB increase in signal-to-noise ratio.
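To get a feel for how large a 12 dB boost is, the standard decibel definitions can be applied directly. The short Python sketch below is only an illustration; the function names are made up here, and the only number taken from the slide is the 12 dB figure.

```python
def db_to_power_ratio(db):
    """Convert a level difference in dB to a ratio of powers."""
    return 10 ** (db / 10)


def db_to_amplitude_ratio(db):
    """Convert a level difference in dB to a ratio of amplitudes."""
    return 10 ** (db / 20)


boost_db = 12  # intelligibility boost reported on the slide

print(round(db_to_power_ratio(boost_db), 1))      # roughly 15.8
print(round(db_to_amplitude_ratio(boost_db), 2))  # roughly 3.98
```

In other words, seeing the talker helps about as much as making the speech signal roughly 16 times more powerful (about 4 times greater in amplitude) relative to the noise.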
Tadoma Method
• Some deaf-blind people learn to perceive speech
through the tactile modality, by using the Tadoma
method.
Audio-Tactile Perception
• Fowler & Dekle: tested ability of (naive) college students
to perceive speech through the Tadoma method.
• Presented synthetic stops auditorily
• Combined with mismatched tactile information:
• Ex: audio /ga/ + tactile /ba/
• Also combined with mismatched orthographic information:
• Ex: audio /ga/ + orthographic /ba/
• Task: listeners reported what they “heard”
• Tactile condition biased listeners more towards “ba”
responses
Fowler & Dekle data
[Graph labels: read “ba” = orthographic mismatch condition; felt “ba” = tactile mismatch condition]
Another Piece of the Puzzle
• Another interesting finding which has been used to
argue for the “speech is special” theory is duplex
perception.
• Take an isolated F3 transition and present it to one ear…
Do the Edges First!
• While presenting this spectral frame to the other ear:
Two Birds with One Spectrogram
• The resulting combo is perceived in duplex fashion:
• One ear hears the F3 “chirp”;
• The other ear hears the combined stimulus as “da”.
Duplex Interpretation
• Check out the spectrograms in Praat.
• Mann and Liberman (1983) found:
• Discrimination of the F3 chirps is gradient when
they’re in isolation…
• but categorical when combined with the spectral
frame.
• (Compare with the F3 discrimination experiment with
Japanese and American listeners)
• Interpretation: the “special” speech processor puts
the two pieces of the spectrogram together.
fMRI data
• Benson et al. (2001)
• Non-Speech stimuli = notes, chords, and chord
progressions on a piano
fMRI data
• Benson et al. (2001)
• Difference in activation for natural speech stimuli versus
activation for sinewave speech stimuli
Mirror Neurons
• In the 1990s, researchers in Italy discovered what they
called mirror neurons in the brains of macaques.
• Macaques had been trained to make grasping motions
with their hands.
• Researchers recorded the activity of single neurons
while the monkeys were making these motions.
• Serendipity:
• the same neurons fired when the monkeys saw the
researchers making grasping motions.
• → a neurological link between perception and action.
• Motor theory claim: same links exist in the human brain,
for the perception of speech gestures
Motor Theory, in a nutshell
• The big idea:
• We perceive speech as abstract “gestures”, not sounds.
• Evidence:
1. The perceptual interpretation of speech differs radically from the acoustic organization of speech sounds
2. Speech perception is multi-modal
3. Direct (visual, tactile) information about gestures can influence/override indirect (acoustic) speech cues
4. Limited top-down access to the primary, acoustic elements of speech