Perception and Phonotactics


Speech Perception
Richard Wright
Linguistics 453
Class Overview
• Physiology
• Auditory Shaping of the Signal
• Auditory Cues
• Normalization and Context
• Experiment Types

Physiology 1: The Ear
• Outer: Pinna, Ear Canal, Ear Drum
• Middle: Ossicles, Oval Window
• Inner: Cochlea (Basilar Membrane, Tectorial Membrane, Hair Cells)
Physiology 1: The Outer Ear
• Pinna: directional hearing
• Ear Canal: high frequency emphasis (very short resonator closed at one end)
• Ear Drum: the membrane's vibrations convert pressure fluctuations to mechanical movement
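
The ear canal's high frequency emphasis follows from treating it as a tube closed at one end (the eardrum), which resonates at odd multiples of a quarter wavelength. A minimal sketch, assuming a canal length of about 2.5 cm and a speed of sound of 350 m/s (neither value is given on the slide):

```python
def quarter_wave_resonances(length_m, speed_of_sound=350.0, n_modes=3):
    """Resonances (Hz) of a tube closed at one end: f_n = (2n - 1) * c / (4 * L)."""
    return [
        (2 * n - 1) * speed_of_sound / (4.0 * length_m)
        for n in range(1, n_modes + 1)
    ]

# Assumed values for illustration: 2.5 cm canal, c = 350 m/s.
canal_resonances = quarter_wave_resonances(0.025)
print(canal_resonances)
```

Under these assumptions the lowest resonance falls near 3.5 kHz. A longer canal would lower it and a shorter one would raise it.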

Physiology 1: The Middle Ear
• Ossicles (Malleus, Incus, Stapes): convert eardrum movement to movement of the oval window, overcoming the air-to-fluid impedance mismatch
• Lower frequency emphasis (500-4000 Hz)
• Lessen the impact of very loud noises by stiffening (damping)
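
A rough sense of how much the ossicles help with the air-to-fluid mismatch comes from the area ratio of eardrum to oval window multiplied by the lever action of the ossicles. The sketch below uses approximate figures often quoted in textbooks (about 55 mm² effective eardrum area, about 3.2 mm² stapes footplate, lever ratio about 1.3). These numbers are assumptions and do not appear on the slide.

```python
import math

def middle_ear_pressure_gain_db(eardrum_area_mm2=55.0,
                                oval_window_area_mm2=3.2,
                                lever_ratio=1.3):
    """Pressure gain (dB) from area ratio times ossicular lever ratio.

    Default values are approximate textbook figures, assumed for illustration.
    """
    gain = (eardrum_area_mm2 / oval_window_area_mm2) * lever_ratio
    return 20.0 * math.log10(gain)

gain_db = middle_ear_pressure_gain_db()
print(round(gain_db, 1))
```

With these assumed values the gain comes out in the high twenties of dB. The exact figure depends on which anatomical measurements are used.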

Physiology 1: The Inner Ear

• Cochlea: fluid-filled cavity; movement of the oval window causes wave propagation in the fluid
• Basilar Membrane: stiff and narrow at the base, wide and flaccid at the apex, so base = high frequencies and apex = low frequencies (acts like a series of band-pass filters). Most of the membrane is devoted to sounds below 5000 Hz.
• Shearing between the Basilar and Tectorial membranes displaces hair cells, exciting cochlear nerve endings
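
One common way to make the base-to-apex frequency map concrete is Greenwood's place-frequency function. The slide does not name it, so treat this as an outside approximation; the constants are the values commonly quoted for the human cochlea and are assumptions here.

```python
import math

# Commonly quoted human constants for Greenwood's function (assumed, not from the slide).
A, ALPHA, K = 165.4, 2.1, 0.88

def place_to_frequency(x):
    """Characteristic frequency (Hz) at relative position x (0 = apex, 1 = base)."""
    return A * (10.0 ** (ALPHA * x) - K)

def frequency_to_place(freq_hz):
    """Relative position (0 = apex, 1 = base) whose characteristic frequency is freq_hz."""
    return math.log10(freq_hz / A + K) / ALPHA

# Fraction of membrane length, measured from the apex, tuned below 5000 Hz.
share_below_5k = frequency_to_place(5000.0)
print(round(share_below_5k, 2))
```

Under these constants roughly 70% of the membrane length lies below 5000 Hz, which agrees with the statement that most of the membrane is devoted to that range.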
Physiology 2: Neural Pathway
• Cochlear Nerve
• Cochlear Nucleus
• Lateral Lemniscus
• Auditory Cortex

[Figure: diagram of the auditory pathway. Labels: cochlear nerve, cochlear nucleus, Held, Monakow, superior olive, lateral lemniscus, Probst, inferior colliculus, CIC, medial geniculate, auditory radiations, cortex, mid-line]
Auditory Shaping of the Signal
• Frequency Selectivity: changes in the frequency of the stimulus do not result in equivalent changes in sensitivity
• Non-linear loudness sensitivity
• Phase Locking and noise reduction
• Lateral Inhibition and Tuning
• Onsets and neural spikes

Frequency Selectivity
Onset Advantage
Delgutte and Kiang (1984)
What are Cues?

Cues: information in the signal that listeners use in recovering the segmental content of the utterance
– Place cues
– Manner cues
– Voicing cues
– Vowel quality cues
Distribution of Cues
Place cues (labels from a schematic spectrogram showing F1, F2, F3):
– stop release burst
– fricative noise
– F2 transitions
– nasal pole and zero
Distribution of Cues
Manner cues (labels from a schematic spectrogram showing F1, F2, F3):
– stop release burst
– slope of formant transitions
– nasalization of vowel
– abruptness and degree of attenuation
– fricative noise
– nasal pole and zero
Distribution of Cues
Voicing cues (labels from a schematic spectrogram showing F1, F2, F3):
– release burst amplitude
– vowel duration
– aspiration noise
– VOT
– stricture duration
– periodicity
Distribution of Cues

• Stop release bursts are very brief and difficult to recover: stops rely on formant transition cues
• Fricative noise, particularly sibilant noise, contains robust cues: fricatives may be recovered in the absence of formant transitions
• Nasals contain strong manner cues but weak place cues

Onset Advantage
Redundancy advantage:
– Onset stops automatically have both a release burst and a set of formant transitions
– Coda stops may be unreleased and therefore have less cue redundancy
Onset Advantage
Onset consonant with flanking vowels
Experimental Tasks
• Identification
• Discrimination
• Rating
• Method of Adjustment (MOA)

Exp.Tasks 1: Identification
Listeners are asked to identify stimuli as speech sounds...
• Open set: response options are open
• Forced choice: listeners' choices are constrained
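
Identification responses are usually summarized as proportion correct plus a table of confusions. A minimal sketch of that tally; the function name and the trial format are hypothetical, and the toy responses are made up purely to show the call and are not data from any experiment.

```python
from collections import Counter

def score_identification(trials):
    """trials: list of (intended, response) label pairs.

    Returns (proportion_correct, confusion_counts), where confusion_counts
    maps each (intended, response) pair to how often it occurred.
    """
    confusions = Counter(trials)
    total = sum(confusions.values())
    correct = sum(n for (intended, response), n in confusions.items()
                  if intended == response)
    return correct / total, confusions

# Made-up responses for illustration only.
toy_trials = [("b", "b"), ("d", "d"), ("g", "d"),
              ("b", "b"), ("d", "g"), ("g", "g")]
accuracy, confusions = score_identification(toy_trials)
print(accuracy, confusions[("g", "d")])
```

A forced-choice task keeps the set of response labels closed, so a table like this can be compared across subjects. Open-set responses would first need to be coded into comparable categories.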

Experiment 1: Onset vs Coda

Stimuli
– male speaker of American English
– /ba, da, ga, ab, ad, ag/, bursts excised
– 16 bit, 22 kHz
– mixed in three levels of white noise:
• no noise
• noise at 2 dB above RMS of signal
• noise at 2 dB below RMS of signal
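
Setting noise "at 2 dB above or below the RMS of the signal" amounts to scaling the noise so that its RMS is the signal RMS times 10^(dB/20). A minimal sketch using only the standard library; the sine wave stands in for a recorded syllable, and the 22050 Hz rate is an assumed reading of "22 kHz".

```python
import math
import random

def rms(samples):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def mix_with_white_noise(signal, noise_db_re_signal, seed=0):
    """Add Gaussian white noise whose RMS sits noise_db_re_signal dB
    above (positive) or below (negative) the RMS of the signal."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    target_rms = rms(signal) * 10.0 ** (noise_db_re_signal / 20.0)
    scale = target_rms / rms(noise)
    scaled_noise = [n * scale for n in noise]
    mixed = [s + n for s, n in zip(signal, scaled_noise)]
    return mixed, scaled_noise

# Stand-in signal (hypothetical): one second of a 100 Hz sine at 22050 Hz.
sample_rate = 22050
signal = [math.sin(2 * math.pi * 100 * t / sample_rate) for t in range(sample_rate)]
mixed_above, noise_above = mix_with_white_noise(signal, +2.0)
mixed_below, noise_below = mix_with_white_noise(signal, -2.0)
```

The sketch ignores practical details such as rescaling the mix to avoid clipping at 16 bits.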
Experiment 1: Onset vs Coda

Task
– onsets & codas mixed and randomized
– presented binaurally over headphones
– 3-way forced choice task: “B D G”
– labeled button press
– self paced
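
Mixing and randomizing onsets and codas can be sketched as crossing the six syllables with the three noise conditions and shuffling the result. The number of repetitions and the random seed are hypothetical, since the slide does not state them.

```python
import itertools
import random

SYLLABLES = ["ba", "da", "ga", "ab", "ad", "ag"]
NOISE_LEVELS = ["none", "+2 dB", "-2 dB"]  # noise RMS relative to signal RMS

def build_trial_list(repetitions=1, seed=0):
    """Cross syllables with noise levels, repeat, and shuffle so that
    onset and coda items are interleaved at random."""
    trials = list(itertools.product(SYLLABLES, NOISE_LEVELS)) * repetitions
    random.Random(seed).shuffle(trials)
    return trials

def correct_response(syllable):
    """Button label (B, D or G) matching the consonant of a CV or VC syllable."""
    consonant = syllable[0] if syllable[0] != "a" else syllable[1]
    return consonant.upper()

trials = build_trial_list(repetitions=2)
```

Because the task is self paced, presentation code would wait for a button press before moving to the next item in `trials`.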
Exp.Tasks 2: Discrimination
Listeners are asked to respond “same” or “different” to presented sets of stimuli
• AX discrimination: fixed initial stimulus, variable second stimulus (same/different)
• ABX discrimination: two fixed initial stimuli, variable third stimulus (same as A or same as B)
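
The two designs differ in what each trial contains and what counts as a correct answer. A minimal sketch that builds both kinds of trial from a list of stimulus labels. Pairing every stimulus with every other is one possible design, assumed here for illustration, and the labels are hypothetical.

```python
import random

def ax_trials(stimuli, rng):
    """Every ordered pair (A, X); the correct answer is 'same' when X is A."""
    trials = [(a, x, "same" if a == x else "different")
              for a in stimuli for x in stimuli]
    rng.shuffle(trials)
    return trials

def abx_trials(stimuli, rng):
    """For each pair A != B, X matches A once and B once;
    the correct answer names which one X matches."""
    trials = []
    for a in stimuli:
        for b in stimuli:
            if a != b:
                trials.append((a, b, a, "A"))
                trials.append((a, b, b, "B"))
    rng.shuffle(trials)
    return trials

rng = random.Random(0)
steps = ["step1", "step2", "step3"]  # hypothetical stimulus labels
ax = ax_trials(steps, rng)
abx = abx_trials(steps, rng)
```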

Experiment 2: vowel discrimination

Stimuli
– Synthetic vowel continuum
– Equal steps: 2.37 Bark along F1-F2
dimension
– 16 bit, 11 kHz
– variable AX design
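
Spacing a continuum in equal Bark steps means converting the endpoints to Bark, stepping linearly there, and converting back to Hz. The sketch below uses Traunmüller's approximation of the Bark scale without its corrections at the extremes of the range, and shows a single formant dimension. The endpoint frequencies and the number of points are hypothetical and are not the values used in the experiment.

```python
def hz_to_bark(freq_hz):
    """Traunmüller's approximation of the Bark scale (no end corrections)."""
    return 26.81 * freq_hz / (1960.0 + freq_hz) - 0.53

def bark_to_hz(bark):
    """Algebraic inverse of hz_to_bark."""
    return 1960.0 * (bark + 0.53) / (26.28 - bark)

def equal_bark_continuum(start_hz, end_hz, n_points):
    """Frequencies (Hz) evenly spaced on the Bark scale between two endpoints."""
    z0, z1 = hz_to_bark(start_hz), hz_to_bark(end_hz)
    step = (z1 - z0) / (n_points - 1)
    return [bark_to_hz(z0 + i * step) for i in range(n_points)]

# Hypothetical F1 endpoints, for illustration only.
f1_continuum = equal_bark_continuum(300.0, 700.0, 5)
```

Equal steps in Bark are unequal in Hz: the steps get wider in Hz as frequency rises, which is why the continuum is specified on the auditory scale.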
Experiment 2: vowel discrimination

Task
– same/different response to vowel pairs
– presented binaurally over headphones
– labeled button press
– speeded (limited time to decide)
Exp.Tasks 3: Ratings
• Listeners are asked to rate a stimulus in some way: goodness, similarity, accentedness
• Example (effect of intonational contour on naturalness): listeners hear sentences with and without an f0 contour and rate naturalness on a 1-5 scale.

Exp.Tasks 4: MOA

Listeners are asked to adjust a stimulus along some dimensions until it fits some criterion: matches another stimulus, sounds most natural, matches a category, etc. (can be an identification, discrimination, or rating experiment)
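
The adjustment loop can be sketched as repeatedly asking the listener whether to raise, lower, or accept the current value of one stimulus parameter. Everything named here is hypothetical: the simulated listener, the F1 example, the step size, and the tolerance are illustrative stand-ins for a real participant and real stimuli.

```python
def run_adjustment(start_value, listener, step, max_moves=200):
    """Let a listener nudge one stimulus parameter until they accept it.

    listener(value) returns +1 (raise), -1 (lower) or 0 (accept).
    Returns the final value and the number of moves made.
    """
    value = start_value
    for moves in range(max_moves):
        direction = listener(value)
        if direction == 0:
            return value, moves
        value += direction * step
    return value, max_moves

def simulated_listener(target, tolerance):
    """Hypothetical listener who accepts anything within tolerance of a target."""
    def respond(value):
        if abs(value - target) <= tolerance:
            return 0
        return 1 if value < target else -1
    return respond

# Hypothetical example: adjusting F1 (Hz) toward a 500 Hz reference.
final_f1, n_moves = run_adjustment(300.0, simulated_listener(500.0, 10.0), step=25.0)
print(final_f1, n_moves)
```

The `max_moves` cap is there because a step larger than the listener's tolerance could make the value oscillate around the target without ever being accepted.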
Advantages and shortcomings 1

Open identification
– Good: most natural, subjects understand
– Bad: time consuming, little control of variables, stats difficult (non-comparable responses across subjects)

Forced choice identification
– Good: less time consuming, control of response variables
– Bad: not as natural
Advantages and shortcomings 2

Discrimination
– Good: allows experimenter to map relationship between
classification and discrimination
– Bad: very time consuming, not at all natural, unintuitive to
subjects
Advantages and shortcomings 3

Rating
– Good: allows experimenter to map preferences in a
multidimensional space, allows for correlation between one or
more aspects of stimulus
– Bad: hard to control interactions between preferences and
stimulus variables, not that natural
Advantages and shortcomings 4

Method of adjustment (MOA)
– Good: much quicker method of mapping a multidimensional perceptual space
– Bad: not natural, complex interaction of stimulus variables