Auditory Perception
April 9, 2009
Auditory vs. Acoustic
• So far, we’ve seen two different auditory measures:
1. Mels (unit of perceived pitch)
   • Auditory correlate of Hertz (frequency)
2. Sones (unit of perceived loudness)
   • Auditory correlate of decibels (intensity)
• Both were derived from pitch and loudness estimation experiments…
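• Neither conversion formula appears on the slides, but both scales have standard textbook approximations. A minimal numeric sketch (the mel formula is O’Shaughnessy’s variant; the sone rule assumes loudness levels of at least 40 phons):

```python
# Hedged sketch of the standard mel and sone conversions (not from the slides).
import math

def hz_to_mel(f_hz):
    """Perceived pitch in mels for a frequency in Hz (O'Shaughnessy's formula)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def phon_to_sone(level_phon):
    """Perceived loudness in sones for a loudness level in phons (valid >= ~40)."""
    return 2.0 ** ((level_phon - 40.0) / 10.0)

print(hz_to_mel(1000))    # ~1000 mels: 1000 Hz is the scale's reference point
print(phon_to_sone(50))   # 2.0 sones: +10 phons doubles perceived loudness
```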
Masking
• Another scale for measuring auditory frequency
emerged in the 1960s.
• This scale was inspired by the phenomenon of auditory
masking.
• One sound can “mask”, or obscure, the perception of
another.
• Unmasked:
• Masked:
• Q: How narrow can we make the bandwidth of the masking noise before the sine wave becomes perceptible?
• A: Masking bandwidth is narrower at lower frequencies.
Critical Bands
• Using this methodology, researchers eventually
determined that there were 24 critical bands of hearing.
• The auditory system integrates all acoustic energy
within each band.
• Two tones within the same critical band of
frequencies sound like one tone
• Ex: critical band #9 ranges from 920-1080 Hz
• F1 and F2 for … might merge together
• Each critical band spans about 0.9 mm on the basilar membrane.
• The auditory system thus behaves like a series of 24 band-pass filters.
• Each filter corresponds to one unit on the Bark scale.
Bark Scale of Frequency
• The Bark scale converts acoustic frequencies into critical-band numbers: one for each of the 24 bands.
Bark Table
Band   Center (Hz)   Bandwidth (Hz)   |   Band   Center (Hz)   Bandwidth (Hz)
1      50            20-100           |   13     1850          1720-2000
2      150           100-200          |   14     2150          2000-2320
3      250           200-300          |   15     2500          2320-2700
4      350           300-400          |   16     2900          2700-3150
5      450           400-510          |   17     3400          3150-3700
6      570           510-630          |   18     4000          3700-4400
7      700           630-770          |   19     4800          4400-5300
8      840           770-920          |   20     5800          5300-6400
9      1000          920-1080         |   21     7000          6400-7700
10     1170          1080-1270        |   22     8500          7700-9500
11     1370          1270-1480        |   23     10500         9500-12000
12     1600          1480-1720        |   24     13500         12000-15500
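• The table can also be approximated by a closed-form Hz-to-Bark conversion; a common one is Traunmüller’s (1990) formula, sketched below (an assumed supplement, since the slides give only the table):

```python
# Traunmüller's (1990) Hz-to-Bark approximation; deviations grow somewhat at
# the extremes of the scale, so treat results as approximate.
def hz_to_bark(f_hz):
    """Approximate critical-band rate (in Bark) for a frequency in Hz."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53

print(hz_to_bark(1000))    # ~8.5: 1000 Hz sits at the center of band 9
print(hz_to_bark(13500))   # ~22.9: near the top of the 24-band scale
```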
Your Grandma’s Spectrograph
• Originally, spectrographic analyzing filters were
constructed to have either wide or narrow bandwidths.
Spectral Differences
• Acoustic vs. auditory spectra of …, with F1 and F2 marked.
Cochleagrams
• Cochleagrams are spectrogram-like representations which incorporate auditory transformations for both pitch and loudness perception.
• Acoustic spectrogram vs. auditory cochleagram representation of a Cantonese word
• Check out Peter’s vowels in Praat.
Cochlear Implants
• Cochlear implants transmit sound directly to the
cochlea through a series of band-pass filters…
• like the critical bands in our native auditory system.
• These devices can benefit profoundly deaf listeners with
nerve deafness.
• = loss of working hair cells in the inner ear.
• Contrast with: a hearing aid, which is simply an amplifier.
• Old style: amplifies all frequencies
• New style: amplifies specific frequencies, based on a
listener’s particular hearing capabilities.
Cochlear Implants
A cochlear implant artificially stimulates the nerves connected to the cochlea.
Nuts and Bolts
• The cochlear implant chain of events:
1. Microphone
2. Speech processor
3. Electrical stimulation
• What the CI user hears is entirely determined by the code in the speech processor.
• The number of electrodes stimulating the cochlea ranges from 8 to 22.
   • = poor frequency resolution
• Also: cochlear implants cannot stimulate the low-frequency regions of the auditory nerve.
Noise Vocoding
• The speech processor operates like a series of critical
bands.
• It divides up the frequency scale into 8 (or 22) bands and
stimulates each electrode according to the average
intensity in each band.
• This results in what sounds (to us) like a highly degraded version of natural speech.
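• A rough sketch of that vocoding scheme (assuming numpy/scipy and mono float audio; the band edges, filter order, and envelope method here are illustrative choices, not any real implant’s processing):

```python
# Minimal noise-vocoder sketch: each band's envelope modulates band-limited noise.
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def noise_vocode(x, fs, n_bands=8, lo=100.0, hi=8000.0):
    """Replace each band's fine structure with envelope-modulated noise.
    Assumes fs > 2 * hi so the top band stays below the Nyquist frequency."""
    edges = np.geomspace(lo, hi, n_bands + 1)   # log-spaced "electrode" bands
    rng = np.random.default_rng(0)
    out = np.zeros(len(x))
    for low, high in zip(edges[:-1], edges[1:]):
        sos = butter(4, [low, high], btype="bandpass", fs=fs, output="sos")
        band = sosfiltfilt(sos, x)              # isolate one band of the speech
        env = np.abs(hilbert(band))             # that band's intensity envelope
        noise = sosfiltfilt(sos, rng.standard_normal(len(x)))
        out += env * noise                      # envelope drives band-limited noise
    return out / np.max(np.abs(out))            # normalize to avoid clipping
```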
What CIs Sound Like
• Check out some nursery rhymes which have been
processed through a CI simulator:
CI Perception
• One thing that is missing from vocoded speech is F0.
• …It only encodes spectral change.
• Last year, Aaron Byrnes put together an experiment
testing intonation perception in CI-simulated speech for
his honors thesis.
• Tested: discrimination of questions vs. statements
• And identification of most prominent word in a
sentence.
• 8 channels:
• 22 channels:
The Findings
• CI User:
• Excellent identification of the most prominent word.
• At chance (50%) when distinguishing between
statements and questions.
• Normal-hearing listeners (hearing simulated speech):
• Good (90-95%) identification of the prominent word.
• Not too shabby (75%) at distinguishing statements
and questions.
• Conclusion 1: F0 information doesn’t get through the CI.
• Conclusion 2: Noise-vocoded speech might not be a
completely accurate CI simulation.
Mitigating Factors
• Success with cochlear implants is highly variable.
• Works best for those who had hearing before they
became deaf.
• The earlier a person receives an implant, the better they
can function with it later in life.
• Works best for (in order):
• Environmental Sounds
• Speech
• Speaking on the telephone (bad)
• Music (really bad)
Practical Considerations
• It is largely unknown how well anyone will perform with a
cochlear implant before they receive it.
• Possible predictors:
• lipreading ability
   • rapid cues for place are largely obscured by the noise-vocoding process, so lipreading can fill them in.
• fMRI scans of brain activity during presentation of
auditory stimuli.
Infrared Implants?
• Some very recent research has shown that cells in the inner ear can be activated through stimulation by infrared light.
• This may enable the eventual development of cochlear
implants with very precise frequency and intensity tuning.
• Another research strategy is that of trying to regrow hair
cells in the inner ear.
One Last Auditory Thought
• Frequency coding of sound is found all the way up in the auditory cortex.
• Also: some neurons only fire when sounds change.
A Philosophical Interlude
• Q: What’s a category?
• A classical answer: a category is defined by properties.
   • All members of the category exhibit the same properties.
   • No non-members of the category exhibit all of those properties.
• The properties of any member of the category may be split into:
   • Definitive properties
   • Incidental properties
Classical Example
• A rectangle (in Euclidean geometry) may be defined as having the following properties:
1. Four-sided, two-dimensional figure (quadrilateral)
2. Four right angles
This is a rectangle.
Classical Example
• Adding a third property gives the figure a different category classification:
1. Four-sided, two-dimensional figure (quadrilateral)
2. Four right angles
3. Four equally long sides
This is a square.
Classical Example
• Altering other properties does not change the category classification:
1. Four-sided, two-dimensional figure (quadrilateral)
2. Four right angles
3. Four equally long sides
   (1-3 = definitive properties)
A. Is red.
   (A = incidental property)
This is still a square.
Classical Linguistic Categories
• Formal phonology traditionally defined all possible speech
sounds in terms of a limited number of properties, known
as “distinctive features”. (Chomsky & Halle, 1968)
[d] = [CORONAL, +voice, -continuant, -nasal, etc.]
[n] = [CORONAL, +voice, -continuant, +nasal, etc.]
…
• Similar approaches have been applied in syntactic
analysis. (Chomsky, 1974)
Adjectives   = [+N, +V]
Prepositions = [-N, -V]
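• A toy sketch of what “categories defined by properties” looks like computationally, using the feature bundles above (the encoding is my own illustration, not anyone’s proposal):

```python
# Classical categories as feature bundles: membership = having every definitive property.
segments = {
    "d": {"CORONAL": True, "voice": True, "continuant": False, "nasal": False},
    "n": {"CORONAL": True, "voice": True, "continuant": False, "nasal": True},
}

def in_category(segment, definition):
    """True iff the segment exhibits all of the category's definitive properties."""
    return all(segments[segment].get(feat) == val for feat, val in definition.items())

print(in_category("n", {"CORONAL": True, "nasal": True}))  # True: [n] is a coronal nasal
print(in_category("d", {"CORONAL": True, "nasal": True}))  # False: [d] is not nasal
```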
Prototypes
• The psychological reality of classical categories was called into question by a series of studies conducted by Eleanor Rosch in the 1970s.
• Rosch claimed that categories were organized around privileged category members, known as prototypes.
   • (instead of being defined by properties)
• Evidence for this theory initially came from linguistic tasks:
1. Semantic verification (Rosch, 1975)
   • Is a robin a bird?
   • Is a penguin a bird?
2. Category member naming.
Prototype Category Example:
“Bird”
Exemplar Categories
• Cognitive psychologists in the late ‘70s (e.g., Medin &
Schaffer, 1978) questioned the need for prototypes.
• Phenomena explained by prototype theory could be
explained without recourse to a category prototype.
• The basic idea:
• Categories are defined by extension.
• Neither prototypes nor properties are necessary.
• Categorization works by comparing new tokens to all
exemplars in memory.
• Generalization happens on the fly.
A Category, Exemplar-style
“square”
Back to Perception
• When people used to talk about categorical perception,
they meant perception of classical categories.
• A stop is either a [b] or a [g]
• (no in between)
• Remember: in classical categories, there are:
• definitive properties
• incidental properties
• Q: What are the properties that define a stop category?
• The definitive properties must be invariant.
• (shared by all category members)
• So…what are the invariant properties of stop categories?
The Acoustic Hypothesis
• People have looked long and hard for invariant acoustic
properties of stops, with little success.
• (and some people are still looking)
• Frequency values of compact (synthetic) bursts cueing different places of articulation, in various vowel contexts. (Liberman et al., 1952)
Theoretical Revision
• Since invariant acoustic properties could not be found
(especially for velars)…
• It was assumed that listeners perceived (articulatory)
gestures, not (acoustic) sounds.
• Q: What invariant articulatory properties define stop
categories?
• A: If they exist, they’re hard to find.
• Motor Theory Revision #2: Listeners perceive “intended”
gestures.
• Note: “intentions” are kind of impossible to observe.
• But they must be invariant…right?
Another Brick in the Wall
• Another problem for motor theory:
• Perception of speech sounds isn’t always categorical.
• In particular: vowels are perceived in a more gradient
fashion than stops.
• However, vowel perception becomes more categorical
when the vowels are extremely short.
• It’s also hard to identify any invariant acoustic properties for vowels.
• Variation is rampant across:
   • tokens
   • speakers
   • genders
   • dialects
   • age groups, etc.
• Variability = a huge problem for speech perception.
More Problems
• Also: infants exhibit categorical perception, too…
• Even though they don’t know category labels.
• Chinchillas can do it, too!
An Alternative
• It has been proposed that phoneme categories are
defined by prototypes…
• which we use to identify vowels in speech.
• One relevant finding: the perceptual magnet effect.
• Part 1: play listeners a continuum of synthetic vowels in
the neighborhood of [i].
• Task: judge how much each one sounds like [i].
• Some are better = prototypes
• Others are worse = non-prototypes
Perceptual Magnets
(Listener prompt: “Same? Different?”)
• Part 2: define either a prototype or a non-prototype as a category center.
• Task: determine whether other vowels on the continuum belong to those categories.
• Result: more same responses when the category center
is a prototype.
• Prototype = a “perceptual magnet”
Prototypes, continued
• The perceptual magnet prototypes are usually located at
a listener’s average F1 and F2 values for [i].
• 4-month olds exhibit the perceptual magnet effect…
• but monkeys do not.
• Note: the prototype is the only thing that has to be
“invariant” about the category.
• particular properties aren’t important.
• Testing a prototype model on the Peterson & Barney data
yielded 51% correct classification.
• (Human listeners got 94% correct)
• Variability is still hard to deal with.
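• A sketch of what such a prototype classifier might look like: one prototype per vowel (the mean F1/F2 of its tokens), with each probe assigned to the nearest prototype. The token values below are made-up stand-ins, not the actual Peterson & Barney measurements:

```python
# Nearest-prototype vowel classification over toy (F1, F2) data in Hz.
import numpy as np

tokens = {  # hypothetical training tokens for two vowel categories
    "i": [(270, 2290), (300, 2240), (310, 2190)],
    "a": [(730, 1090), (750, 1120), (710, 1060)],
}
# Each category is reduced to a single prototype: its mean (F1, F2).
prototypes = {v: np.mean(ts, axis=0) for v, ts in tokens.items()}

def classify(probe):
    """Assign the probe to the category whose prototype is closest."""
    return min(prototypes, key=lambda v: np.linalg.norm(probe - prototypes[v]))

print(classify(np.array([290.0, 2260.0])))  # "i"
```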
Flipping the Script
• Another approach to speech perception is to preserve all
variability that we hear…
• Rather than boiling it down to properties or prototypes.
• In this model, speech categories are defined by
extension.
• = consist of exemplars
• So, your mental representation of /b/ consists of every token of /b/ you’ve ever heard in your life.
• …rather than any particular acoustic or articulatory
properties.
• Analogy: phonetics field project notes
• (your mind is a pack rat)
Exemplar Categorization
1. Stored memories of speech experiences are known as traces.
   • Each trace is linked to a category label.
2. Incoming speech tokens are known as probes.
3. A probe activates the traces it is similar to.
   • Note: the amount of activation is proportional to the similarity between trace and probe.
   • Traces that closely match a probe are activated a lot;
   • Traces that have no similarity to a probe are not activated much at all.
• A (pretend) example: traces = vowels from the Peterson & Barney data set.
• Activation of each trace falls off with its distance (in vowel space) from the probe.
[figure: a probe (*) in vowel space; nearby traces are highly activated, distant traces get low activation]
Echoes from the Past
• The activations of all exemplars in memory are summed to create an echo, which is returned to the perceptual system.
• This echo has more general features than either the individual traces or the probe.
• Inspiration: Francis Galton
Exemplar Categorization II
• For each category label…
• The activations of the traces linked to it are summed
up.
• The category with the most total activation wins.
• Note: we use all exemplars in memory to help us
categorize new tokens.
• Also: any single trace can be linked to different kinds of
category labels.
• Test: Peterson & Barney vowel data
• Exemplar model classified 81% of vowels correctly.
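• For contrast with the prototype sketch above, a minimal exemplar-style classifier under one common assumption: a trace’s activation decays exponentially with its distance from the probe, and the summed activation per label decides the category. The trace values are toy stand-ins again:

```python
# Exemplar categorization: every stored trace votes, weighted by similarity.
import numpy as np

traces = np.array([(270, 2290), (300, 2240), (730, 1090), (750, 1120)], float)
labels = np.array(["i", "i", "a", "a"])  # category label linked to each trace

def categorize(probe, c=0.01):
    """Sum similarity-weighted activations per label; the largest total wins."""
    dists = np.linalg.norm(traces - probe, axis=1)
    acts = np.exp(-c * dists)   # close traces fire strongly, distant ones barely
    totals = {lab: acts[labels == lab].sum() for lab in set(labels)}
    return max(totals, key=totals.get)

print(categorize(np.array([290.0, 2260.0])))  # "i"
```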
Exemplar Predictions
• Point: all properties of all exemplars play a role in
categorization…
• Not just the “definitive” ones.
• Prediction: non-invariant properties of speech categories
should have an effect on speech perception.
• E.g., the voice in which a [b] is spoken.
• Or even the room in which a [b] is spoken.
• Is this true?
• Let’s find out…
Another Experiment!
• Circle whether each word is a new or old word in the list.
[response sheet: items 1-24, each with “new / old” to circle]
Another Experiment!
• Circle whether each word is a new or old word in the list.
[response sheet: items 25-40]
Continuous Word Recognition
• In a “continuous word recognition” task, listeners hear a
long sequence of words…
• some of which are new words in the list, and some of
which are repeats.
• Task: decide whether each word is new or a repeat.
• Twist: some repeats are presented in a new voice;
• others are presented in the old (same) voice.
• Finding: repetitions are identified more quickly and more
accurately when they’re presented in the old voice.
(Palmeri et al., 1993)
• Implication: we store voice + word info together in
memory.