Development of speech categories


Acoustic Continua and
Phonetic Categories
Frequency - Tones
Frequency - Complex Sounds
Frequency - Vowels
• Vowels combine acoustic energy at a number of different
frequencies
• Different vowels ([a], [i], [u] etc.) contain acoustic energy
at different frequencies
• Listeners must perform a ‘frequency analysis’ of vowels in
order to identify them
(Fourier Analysis)
Any function can be decomposed in terms of sinusoidal (=
sine wave) functions (‘basis functions’) of different
frequencies that can be recombined to obtain the original
function. [Wikipedia entry on Fourier Analysis]
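A minimal sketch of the 'frequency analysis' idea, assuming a made-up signal built from two sine waves (300 Hz and 2300 Hz are illustrative values, not measured vowel frequencies). A plain discrete Fourier transform recovers the two component frequencies and their amplitudes. The function name `dft_magnitudes` and all constants are hypothetical.

```python
import cmath
import math

def dft_magnitudes(samples):
    """Amplitude at each frequency bin from 0 up to half the sample rate."""
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):
        total = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))
        mags.append(abs(total) * 2 / n)  # rescale so a unit sine gives 1.0
    return mags

SAMPLE_RATE = 8000   # samples per second (assumed)
N = 400              # 50 ms of signal, so each bin is 20 Hz wide

# Hypothetical two-component sound: (frequency in Hz, amplitude)
components = [(300, 1.0), (2300, 0.5)]
signal = [sum(a * math.sin(2 * math.pi * f * t / SAMPLE_RATE)
              for f, a in components)
          for t in range(N)]

mags = dft_magnitudes(signal)
bin_width = SAMPLE_RATE / N

# The two strongest bins correspond to the two sine waves we put in.
peaks = sorted(range(len(mags)), key=lambda k: mags[k], reverse=True)[:2]
peak_freqs = sorted(k * bin_width for k in peaks)
print(peak_freqs)
```

Real vowels contain energy at many more frequencies, but the principle is the same: the spectrum shows where the energy is.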
[Figure axes: Time, Amplitude, Frequency]
Joseph Fourier (1768-1830)
Frequency - Male Vowels
Frequency - Female Vowels
Synthesized Speech
• Allows for precise control of sounds
• Valuable tool for investigating perception
Timing - Voicing
Voice Onset Time (VOT)
60 msec
English VOT production
• Not uniform
• 2 categories
Perceiving VOT
‘Categorical Perception’
Discrimination
Same/Different judgments on pairs of sounds:
• 0ms vs. 60ms
• 0ms vs. 10ms
• 40ms vs. 40ms
Why is this pair difficult?
(i) Acoustically similar?
(ii) Same Category?
A More Systematic Test
• 0ms (D) vs. 20ms (D)
• 20ms (D) vs. 40ms (T)
• 40ms (T) vs. 60ms (T)
Within-Category Discrimination is Hard
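A rough sketch of why equal 20ms steps can differ in difficulty, under one simple assumption: listeners can tell two sounds apart only to the extent that they label them differently. The logistic labeling function, the 30ms boundary, and the slope are all assumed values chosen for illustration, not data.

```python
import math

BOUNDARY_MS = 30.0  # assumed D/T category boundary
SLOPE = 0.4         # assumed steepness of the labeling function

def p_label_t(vot_ms):
    """Probability of labeling a sound as T (step-like in VOT)."""
    return 1.0 / (1.0 + math.exp(-SLOPE * (vot_ms - BOUNDARY_MS)))

def predicted_discrimination(vot_a, vot_b):
    """Predicted proportion correct if only the labels are available.

    0.5 is chance; the score rises as the two labeling
    probabilities move apart.
    """
    return 0.5 + 0.5 * (p_label_t(vot_a) - p_label_t(vot_b)) ** 2

pairs = [(0, 20), (20, 40), (40, 60)]
predictions = {pair: predicted_discrimination(*pair) for pair in pairs}

for pair, score in predictions.items():
    print(pair, round(score, 3))
```

With these assumed values, only the 20ms vs. 40ms pair, which straddles the boundary, is predicted to be well above chance.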
Cross-Language Differences
English vs. Japanese R-L
Cross-Language Differences
English vs. Hindi
alveolar [d]
retroflex [D]
?
Russian
VOT continuum: -40ms, -30ms, -20ms, -10ms, 0ms, 10ms
Kazanina et al., 2006, Proceedings of the National Academy of Sciences, 103, 11381-6
Quantifying Sensitivity
• Response bias
• Two measures of discrimination
– Accuracy: how often is the judge correct?
– Sensitivity: how well does the judge distinguish the categories?
• Quantifying sensitivity
– Four outcomes: Hits, False Alarms, Misses, Correct Rejections
– Compare p(H) against p(FA)
Quantifying Sensitivity
• Is one of these more impressive?
– p(H) = 0.75, p(FA) = 0.25
– p(H) = 0.99, p(FA) = 0.49
• A measure that amplifies small percentage differences at extremes
z-scores
Normal Distribution
Dispersion around mean
Standard Deviation: a measure of dispersion around the mean.
$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{n}}$
Mean (µ)
Carl Friedrich Gauss (1777-1855)
The Empirical Rule
1 s.d. from mean: 68% of data
2 s.d. from mean: 95% of data
3 s.d. from mean: 99.7% of data
Normal Distribution
Heights of American Females, aged 18-24
Mean (µ) = 65.5 inches
Standard deviation (σ) = 2.5 inches
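A quick check of the Empirical Rule on the heights example, assuming heights follow an exactly normal distribution with the mean and standard deviation given above. The helper name `share_within` is hypothetical.

```python
from statistics import NormalDist

# Heights example: mean 65.5 inches, s.d. 2.5 inches (normality assumed)
heights = NormalDist(mu=65.5, sigma=2.5)

def share_within(k):
    """Proportion of the population within k s.d. of the mean."""
    low = heights.mean - k * heights.stdev
    high = heights.mean + k * heights.stdev
    return heights.cdf(high) - heights.cdf(low)

shares = {k: share_within(k) for k in (1, 2, 3)}
for k, share in shares.items():
    print(f"{k} s.d.: {share:.1%}")
```

So about 95% of this population would fall between 60.5 and 70.5 inches.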
Quantifying Sensitivity
• A z-score is a reexpression of a data point in units of standard
deviations.
(Sometimes also known as standard score)
• In z-score data, µ = 0, σ = 1
• Sensitivity score
d’ = z(H) - z(FA)
See Excel worksheet
sensitivity.xls
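As a sketch, the sensitivity score can also be computed outside Excel. This applies d’ = z(H) - z(FA) to the two cases from the earlier slide, where both judges have the same 50-point gap between p(H) and p(FA). The function name `d_prime` is an assumption for illustration.

```python
from statistics import NormalDist

def d_prime(p_hit, p_false_alarm):
    """Sensitivity: d' = z(H) - z(FA), with z from the standard normal."""
    z = NormalDist().inv_cdf  # converts a proportion to a z-score
    return z(p_hit) - z(p_false_alarm)

judge_a = d_prime(0.75, 0.25)  # about 1.35
judge_b = d_prime(0.99, 0.49)  # about 2.35

print(round(judge_a, 2), round(judge_b, 2))
```

The second judge scores higher because the z transform stretches differences near 0 and 1. Note that proportions of exactly 0 or 1 have no finite z-score, so they need adjusting before this formula can be used.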
Quantifying Differences
(Näätänen et al. 1997)
(Aoshima et al. 2004)
(Maye et al. 2002)
• If we observe 1 individual, how likely is it
that his score is at least 2 s.d. from the
mean?
• Put differently, if we observe somebody
whose score is 2 s.d. or more from the
population mean, how likely is it that the
person is drawn from that population?
• If we observe 2 people, how likely is it that
they both fall 2 s.d. or more from the mean?
• …and if we observe 10 people, how likely
is it that their mean score is 2 s.d. from the
group mean?
• If we do find such a group, they’re probably
from a different population
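The questions above can be given rough numbers, assuming scores are normally distributed, observations are independent, and "2 s.d. from the mean" means 2 or more in either direction. The function name `p_at_least` is hypothetical.

```python
import math
from statistics import NormalDist

std_normal = NormalDist()  # z-scores: mean 0, s.d. 1

def p_at_least(k):
    """Chance of landing k or more s.d. from the mean, in either direction."""
    return 2 * std_normal.cdf(-k)

# One individual at least 2 s.d. out: roughly 1 in 22
p_one = p_at_least(2)

# Two independent individuals both at least 2 s.d. out
p_two = p_one ** 2

# Mean of 10 people: the mean of n scores has s.d. sigma / sqrt(n),
# so 2 population s.d. is 2 * sqrt(10) s.d. of the group mean.
p_group = p_at_least(2 * math.sqrt(10))

print(p_one, p_two, p_group)
```

The group result is vanishingly small, which is why such a group is probably from a different population.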

• Standard Error is the Standard Deviation of sample means: $SE = \frac{\sigma}{\sqrt{n}}$
• If we observe a group whose mean differs
from the population mean by 2 s.e., how
likely is it that this group was drawn from
the same population?
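A small worked case, reusing the heights population (mean 65.5, s.d. 2.5 inches). The group of 25 with a mean of 66.5 inches is invented for illustration, and normality is assumed.

```python
import math
from statistics import NormalDist

POP_MEAN = 65.5  # inches
POP_SD = 2.5     # inches

def standard_error(sd, n):
    """Standard deviation of the means of samples of size n."""
    return sd / math.sqrt(n)

def two_sided_p(sample_mean, n):
    """Chance of a sample mean at least this far from the population mean."""
    z = (sample_mean - POP_MEAN) / standard_error(POP_SD, n)
    return 2 * NormalDist().cdf(-abs(z))

# Hypothetical group: 25 people with a mean height of 66.5 inches
se_25 = standard_error(POP_SD, 25)   # 0.5 inches
p_value = two_sided_p(66.5, 25)      # the group mean is 2 s.e. away

print(se_25, round(p_value, 4))
```

A group mean 2 s.e. from the population mean would arise by chance a little under 5% of the time.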
Development of Speech
Perception in Infancy
Abstraction
• Representations
– Sound encodings - clearly non-symbolic, but otherwise unclear
– Phonetic categories
– Memorized symbols: /k/ /æ/ /t/
• Behaviors
– Successful discrimination
– Unsuccessful discrimination
– ‘Step-like’ identification functions
– Grouping different sounds
Let’s Learn Inuktitut!
Video: Nunavik: Building on the Knowledge of Ancestors
Vowels
Consonants
Three Classics
Development of Speech Perception
• Unusually well described in past 30 years
• Learning theories exist, and can be tested…
• Jakobson’s suggestion: children add feature contrasts to
their phonological inventory during development
Roman Jakobson, 1896-1982
Kindersprache, Aphasie und allgemeine Lautgesetze, 1941
Developmental Differentiation
Stages: Universal Phonetics, Native Lg. Phonetics, Native Lg. Phonology
Timeline: 0 months, 6 months, 12 months, 18 months
#1 - Infant Categorical Perception
Eimas, Siqueland, Jusczyk &
Vigorito, 1971
high amplitude sucking
non-nutritive sucking
English VOT Perception
To Test 2-month olds
High Amplitude
Sucking
Eimas et al. 1971
General Infant Abilities
• Infants show Categorical Perception of
speech sounds - at 2 months and earlier
• Discriminate a wide range of speech
contrasts (voicing, place, manner, etc.)
• Discriminate Non-Native speech contrasts
e.g., Japanese babies discriminate r-l
e.g., Canadian babies discriminate d-D
[these findings based mostly on looking/headturn studies w/ 6 month olds]
Universal Listeners
• Infants may be able to discriminate all
speech contrasts from the languages of the
world!
How can they do this?
• Innate speech-processing capacity?
• General properties of auditory system?
What About Non-Humans?
• Chinchillas show categorical perception of
voicing contrasts!
PK Kuhl & JD Miller, Science, 190, 69-72 (1975)
Suitability of Animal Models
More recent findings…
1. Auditory perceptual abilities in macaque
monkeys and humans differ in various ways
2. Discrimination sensitivity for b-p continua is
more fine-grained in (adult) humans (Sinnott &
Adams, JASA, 1987)
3. Sensitivity to cues to r-l distinctions is
different, although trading relations are
observed in humans and macaques alike
(Sinnott & Brown, JASA, 1997)
4. Some differences in vowel sensitivity…
Joan Sinnott, U. of S. Alabama
#2 - Becoming a Native Listener
Werker & Tees, 1984
When does Change Occur?
• About 10 months
• Hindi and Salish contrasts tested on English kids
Janet Werker, U. of British Columbia
Conditioned Headturn Procedure
What do Werker’s results show?
• Is this the beginning of efficient memory
representations (phonological categories)?
• Are the infants learning words?
• Or something else?
Korean has [l] & [r]
[rupi] “ruby”
[kiri] “road”
[saram] “person”
[irumi] “name”
[ratio] “radio”
[mul] “water”
[pal] “big”
[s\ul] “Seoul”
[ilkop] “seven”
[ipalsa] “barber”
#3 - What, no minimal pairs?
Stager & Werker, 1997
A Learning Theory…
• How do we find out the contrastive phonemes of a
language?
• Minimal Pairs
Word Learning
• Stager & Werker 1997: ‘bih’ vs. ‘dih’ and ‘lif’ vs. ‘neem’
• Procedure: PRETEST, HABITUATION, TEST (SAME vs. SWITCH)
Word learning results
• Exp 2 vs 4
Why Yearlings Fail on Minimal Pairs
• They fail specifically when the task requires
word-learning
• They do know the sounds
• But they fail to use the detail needed for
minimal pairs to store words in memory
• !!??
One-Year Olds Again
• One-year olds know the surface sound
patterns of the language
• One-year olds do not yet know which
sounds are used contrastively in the
language…
• …and which sounds simply reflect
allophonic variation
• One-year olds need to learn contrasts
Maybe not so bad after all...
• Children learn the feature contrasts of their
language
• Children may learn gradually, adding features
over the course of development
• Phonetic knowledge does not entail
phonological knowledge
Roman Jakobson, 1896-1982
Werker et al. 2002
[Figure: results at 14, 17, and 20 months; axis values 0, 60, 300, 600]
Swingley & Aslin, 2002
• 14-month olds did recognize mispronunciations of familiar
words
Dan Swingley, UPenn
Alternatives to Reviving Jakobson
• Word-learning is very hard for younger children, so detail
is initially missed when they first learn words
• Many exposures are needed to learn detailed word forms at
early stages of word-learning
• Success on the Werker/Stager task seems to be related to
the vocabulary spurt, rapid growth in vocabulary after ~50
words
(Dietrich, Swingley, & Werker 2007)
Exp 1: tam - ta:m
Exp 2: tæm - tæ:m
Exp 3: ta/æm - tem
Length factor ~1.8-2.0
Questions about Development
6-12 Months: What Changes?
Structure Changing
Patricia Kuhl
U. of Washington
Structure Adding
• Evidence for Structure Adding
(i) Some discrimination retained when sounds presented
close together (e.g. Hindi d-D contrast)
(ii) Discrimination abilities better when people hear sounds
as non-speech
(iii) Adults do better than 1-year olds on some sound
contrasts
• Evidence for Structure Changing
(i) No evidence of preserved non-native category
boundaries in vowel perception
Sources of Evidence
• Structure-changing: mostly from vowels
• Structure-adding: mostly from consonants
• Conjecture: structure-adding is correct in domains where
there are natural articulatory (or acoustic) boundaries
[cf. Phillips 2001, Cogn. Sci., 25, 711-731]
So how do infants learn…?
Slides: Swingley 2006, ICIS
5 hours’ exposure to Mandarin
± human interaction
[2003, Proceedings of the National Academy of Sciences]
Alveo-palatals
affricate
fricative
Jessica Maye, Northwestern U.
• Infants at age 6-8 months are still ‘universal listeners’, cf.
Pegg & Werker (1997)
• Infants trained on bi-modal distribution show ‘novelty
preference’ for test sequence with fully alternating
sequence
• How could the proposal scale up?
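One way to picture the bi-modal vs. uni-modal idea: two sound categories, each a normal distribution along some acoustic dimension, pooled together. This sketch counts the peaks in the pooled distribution. All numbers (category means, spread, range) and the function name `count_modes` are assumptions for illustration, not values from the study.

```python
from statistics import NormalDist

def count_modes(mean_a, mean_b, sd, low=0.0, high=100.0, steps=1000):
    """Count peaks in an equal mix of two normal categories."""
    cat_a = NormalDist(mean_a, sd)
    cat_b = NormalDist(mean_b, sd)
    xs = [low + (high - low) * i / steps for i in range(steps + 1)]
    density = [0.5 * cat_a.pdf(x) + 0.5 * cat_b.pdf(x) for x in xs]
    # A peak is a grid point higher than both of its neighbors.
    return sum(1 for i in range(1, steps)
               if density[i - 1] < density[i] > density[i + 1])

well_separated = count_modes(30, 70, 10)  # categories far apart
overlapping = count_modes(45, 55, 10)     # categories close together

print(well_separated, overlapping)
```

A learner who only sees the pooled distribution gets a clear two-category signal in the first case but not in the second. For two equally frequent categories with the same spread, two peaks appear only when the means are more than 2 s.d. apart.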
[Figure: distributions for E and ee, and their sum; horizontal axis 0 to 400; conditions p(a) = p(b) and p(a) = 2 x p(b); values 1.0, .5, .25, .1]
Slides: Swingley 2006, ICIS
Fenson et al. 2000
bird
dog
duck
doll
bread
candy
head
dish
radio
outside
feed
today
dark
toast
hat
ants
tooth
table
television
blanket
outside
plant
wait
today
fast
hurt
soft
out
MacArthur Short CDI - 89 items
stroller
kitty
water
babysitter
pretty
patty cake
bottle
kitchen
don’t
night (night)
Fei Xu, UBC
Xu & Carey 1996
10 mo.: no surprise
12 mo.: surprise
--> “10 month olds do not
represent basic sortal/kind
concepts”
Xu 2002
Add words!
9 mo.:
Fulkerson & Waxman 2007
            Words                Tones
12 months   µ = .59, p = .007    µ = .53, p = .2
6 months    µ = .63, p < .001    µ = .54, p = .2
Yeung & Werker 2008
Naturally produced Hindi syllables
Dental vs. retroflex
A. Familiarize sound-object links
B. Test sound discrimination only
Exp1: consistent links
Exp2: inconsistent links
Effect of Type (±alternating)
Exp1: F(1,18) = 5.74, p < .05
Exp2: F(1,18) = 0.53, p = .47