15-Motor-Theory


Motor Theory +
Signal Detection Theory
March 23, 2010
Oh Yeahs.
• Nasometer labs due!
• Dental vs. alveolar vs. bilabial release bursts.
• Examples from Yanyuwa.
• Examples from Hindi.
• Creating synthetic formant transitions: KlattTalk.
KlattTalk
• KlattTalk has since become the standard for formant synthesis (e.g., DECTalk).
http://www.asel.udel.edu/speech/tutorials/synthesis/vowels.html
Categorical Perception
• Categorical perception = continuous physical distinctions are perceived in discrete categories.
• In the in-class perception experiment:
• There were 11 different syllable stimuli
• They only differed in the locus of their F2 transition
• F2 Locus range = 726 - 2217 Hz
Source: http://www.ling.gu.se/~anders/KatPer/Applet/index.eng.html
[Figure: example stimuli from the in-class experiment: Stimulus #1, Stimulus #6, Stimulus #11]
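As a side note, here is a minimal sketch of how such a continuum could be laid out, assuming the 11 F2 locus values are spaced evenly across the stated range (the slides give only the endpoints and the stimulus count; even spacing is my assumption):

```python
import numpy as np

# Hypothetical reconstruction of the stimulus continuum: 11 F2 locus
# values spread evenly over the range reported above (726-2217 Hz).
# Even spacing is an assumption; only the endpoints and the count of
# 11 stimuli come from the lecture.
f2_loci = np.linspace(726, 2217, 11)

for i, f2 in enumerate(f2_loci, start=1):
    print(f"Stimulus #{i:2d}: F2 locus = {f2:6.1f} Hz")
```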
Identification
• In Categorical Perception:
• All stimuli within a category boundary should be
labeled the same.
Discrimination
• Original task: ABX discrimination.
• Stimuli across category boundaries should be 100% discriminable.
• Stimuli within category boundaries should not be discriminable at all.
In practice, categorical perception means: the discrimination function can be determined from the identification function.
Identification → Discrimination
• Let’s consider a case where the two sounds in a discrimination pair are the same.
• Example: the pair is stimulus 3 followed by stimulus 3.
• Identification data: stimulus 3 is identified as:
• [b] 95% of the time
• [d] 5% of the time
• The discrimination pair will be perceived as:
• [b] - [b]: .95 * .95 = .9025
• [d] - [d]: .05 * .05 = .0025
• Probability of a “same” response is predicted to be:
• .9025 + .0025 = .905 = 90.5%
Identification → Discrimination
• Let’s consider a case where the two sounds in a discrimination pair are different.
• Example: the pair is stimulus 9 followed by stimulus 11.
• Identification data:
• Stimulus 9: [d] 80% of the time, [g] 20% of the time
• Stimulus 11: [d] 5% of the time, [g] 95% of the time
• The discrimination pair will be perceived as:
• [d] - [d]: .80 * .05 = .04
• [g] - [g]: .20 * .95 = .19
• Probability of a “same” response is predicted to be:
• .04 + .19 = .23 = 23%
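Both examples are the same computation: multiply the identification probabilities for each shared label, then sum. A minimal Python sketch (the function and its name are mine, not from the lecture) that reproduces both numbers:

```python
def predicted_same(ident_a, ident_b):
    """Predicted probability that both members of a discrimination
    pair receive the same label, given each stimulus's identification
    probabilities (dicts mapping label -> probability)."""
    shared = set(ident_a) & set(ident_b)
    return sum(ident_a[label] * ident_b[label] for label in shared)

# Same pair: stimulus 3 followed by stimulus 3.
stim3 = {"b": 0.95, "d": 0.05}
print(predicted_same(stim3, stim3))    # 0.905 -> 90.5%

# Different pair: stimulus 9 followed by stimulus 11.
stim9 = {"d": 0.80, "g": 0.20}
stim11 = {"d": 0.05, "g": 0.95}
print(predicted_same(stim9, stim11))   # 0.23 -> 23%
```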
Discrimination
• In this discrimination graph:
• Solid line is the observed data.
• Dashed line is the predicted data (on the basis of the identification scores).
Note: the actual listeners did a little bit better than the predictions.
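To see where a predicted (dashed) curve like this comes from, the same arithmetic can be run along the whole continuum. The identification probabilities and the adjacent-pair design below are invented for illustration; only the method of prediction follows the lecture:

```python
# Invented identification function: P(labeled "b") at each of the 11
# continuum steps. In the real experiment these values are measured.
p_b = [0.99, 0.98, 0.95, 0.90, 0.75, 0.50, 0.25, 0.10, 0.05, 0.02, 0.01]

def p_same(i, j):
    """Predicted probability that stimuli i and j get the same label."""
    return p_b[i] * p_b[j] + (1 - p_b[i]) * (1 - p_b[j])

# Predicted discriminability (1 - P(same), a simplification of the
# full ABX prediction formula) peaks at the category boundary:
for i in range(len(p_b) - 1):
    print(f"pair {i + 1}-{i + 2}: predicted different = {1 - p_same(i, i + 1):.2f}")
```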
Categorical, Continued
• Categorical Perception was also found for
stop/glide/vowel distinctions:
• 10 ms transitions: [b] percept
• 60 ms transitions: [w] percept
• 200 ms transitions: [u] percept
Interpretation
• Main idea: in categorical perception, the mind translates an acoustic stimulus into a phonemic label (category).
• The acoustic details of the stimulus are discarded in favor of an abstract representation.
• A continuous acoustic signal is thus transformed into a series of linguistic units.
The Next Level
• Interestingly, categorical perception is not found for
non-speech stimuli.
• Miyawaki et al.: tested perception of an F3 continuum between /r/ and /l/.
The Next Level
• They also tested perception of the F3 transitions in
isolation.
• Listeners did not perceive these transitions categorically.
The Implications
• Interpretation: we do not perceive speech in the same
way we perceive other sounds.
• “Speech is special”…
• and the perception of speech is modular.
• A module is a special processor in our minds/brains devoted to interpreting a particular kind of environmental stimulus.
Module Characteristics
• You can think of a module as a “mental reflex”.
• A module of the mind is defined as having the following characteristics:
1. Domain-specific
2. Automatic
3. Fast
4. Hard-wired in brain
5. Limited top-down access (you can’t “unperceive”)
• Example: the sense of vision operates modularly.
A Modular Mind Model
[Diagram, from top to bottom:]
• Central processes: judgment, imagination, memory, attention
• Modules: vision, hearing, touch, speech
• Transducers: eyes, ears, skin, etc.
• External, physical reality
Remember this stuff?
• Speech is a “special” kind of sound because it exhibits
spectral change over time.
• → It’s processed by the speech module, not by the auditory module.
SWS Findings
• The uninitiated hear sinewave speech either as speech or as “whistles”, “chirps”, etc.
• Claim: once you hear it as speech, you can’t go back.
• The speech module takes precedence
• (Limited top-down access)
• Analogy: it’s impossible to not perceive real speech as
speech.
• We can’t hear the individual formants as whistles,
chirps, etc.
• Motor theory says: we don’t perceive the “sounds”; we perceive the gestures which shape the spectrum.
McGurk Videos
McGurk Effect explained
Audio + Visual → Perceived
ba + ga → da
ga + ba → ba (bga)
• Some interesting facts:
• The McGurk Effect is exceedingly robust.
• Adults show the McGurk Effect more than children.
• Americans show the McGurk Effect more than
Japanese.
Original McGurk Data
• Stimulus: Auditory = ba-ba, Visual = ga-ga
• Response types:
• Auditory: ba-ba
• Visual: ga-ga
• Fused: da-da
• Combo: gabga, bagba

Age     Auditory   Visual   Fused   Combo
3-5     19%        0        81      0
7-8     36         0        64      0
18-40   2          0        98      0
Original McGurk Data
• Stimulus: Auditory = ga-ga, Visual = ba-ba
• Response types:
• Auditory: ga-ga
• Visual: ba-ba
• Fused: da-da
• Combo: gabga, bagba

Age     Auditory   Visual   Fused   Combo
3-5     57%        10       0       19
7-8     36         21       11      32
18-40   11         31       0       54
Audio-Visual Sidebar
• Visual cues affect the perception of speech in non-mismatched conditions, as well.
• Scientific studies of lipreading date back to the early
twentieth century
• The original goal: improve the speech perception
skills of the hearing-impaired
• Note: visual speech cues often complement audio
speech cues
• In particular: place of articulation
• However, training people to become better lipreaders
has proven difficult…
• Some people get it; some people don’t.
Sumby & Pollack (1954)
• First investigated the influence of visual information on the
perception of speech by normal-hearing listeners.
• Method:
• Presented individual word tokens to listeners in noise,
with simultaneous visual cues.
• Task: identify spoken word
• Conditions: clear speech, +10 dB SNR, +5 dB SNR, 0 dB SNR
Sumby & Pollack data
[Figure: word identification accuracy, Auditory-Only vs. Audio-Visual]
• Visual cues provide an intelligibility boost equivalent to
a 12 dB increase in signal-to-noise ratio.
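For scale, a 12 dB change in signal-to-noise ratio is substantial; a quick back-of-the-envelope calculation (the dB formula is standard; nothing here comes from the study itself):

```python
# SNR in decibels is 10 * log10(signal power / noise power), so a
# 12 dB gain corresponds to these signal ratios at fixed noise:
power_ratio = 10 ** (12 / 10)      # ~15.8x the signal power
amplitude_ratio = 10 ** (12 / 20)  # ~4.0x the signal amplitude

print(f"12 dB = {power_ratio:.1f}x power, {amplitude_ratio:.1f}x amplitude")
```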
Tadoma Method
• Some deaf-blind people learn to perceive speech
through the tactile modality, by using the Tadoma
method.
Audio-Tactile Perception
• Fowler & Dekle: tested ability of (naive) college students
to perceive speech through the Tadoma method.
• Presented synthetic stops auditorily
• Combined with mismatched tactile information:
• Ex: audio /ga/ + tactile /ba/
• Also combined with mismatched orthographic information:
• Ex: audio /ga/ + orthographic /ba/
• Task: listeners reported what they “heard”
• Tactile condition biased listeners more towards “ba”
responses
Fowler & Dekle data
[Figure: proportion of “ba” responses in the orthographic mismatch condition (read “ba”) vs. the tactile mismatch condition (felt “ba”)]
fMRI data
• Benson et al. (2001)
• Non-Speech stimuli = notes, chords, and chord
progressions on a piano
fMRI data
• Benson et al. (2001)
• Difference in activation for natural speech stimuli versus activation for sinewave speech stimuli.
Mirror Neurons
• In the 1990s, researchers in Italy discovered what they
called “mirror neurons” in the brains of macaques.
• Macaques had been trained to make grasping motions
with their hands.
• Researchers recorded the activity of single neurons
while the monkeys were making these motions.
• Serendipity:
• the same neurons fired when the monkeys saw the
researchers making grasping motions.
• → A neurological link between perception and action.
• Motor theory claim: the same links exist in the human brain, for the perception of speech gestures.
Motor Theory, in a nutshell
• The big idea:
• We perceive speech as abstract “gestures”, not
sounds.
• Evidence:
• The perceptual interpretation of speech differs
radically from the acoustic organization of speech
sounds
• Speech perception is multi-modal
• Direct (visual, tactile) information about gestures can
influence/override indirect (acoustic) speech cues
• Limited top-down access to the primary, acoustic
elements of speech