CTT and TMH

Processing the Prosody of Oral Presentations
Rebecca Hincks
KTH, The Royal Institute of Technology
Department of Speech, Music and Hearing
The Unit for Language and Communication
[1]
English in Sweden
• A second language rather
than a foreign language
• Nearly all beginners are
children
• ASR (automatic speech recognition) not appropriate or
necessary for acquisition
of sounds
[2]
Support for advanced L2 users?
• Vision: Speech checker analogous to a
spellchecker or grammar checker
• Practice an oral presentation, get feedback on:
– Lexicon
– Pronunciation
– Prosody
• Making a presentation can be difficult in a native
language, and is even more difficult in an L2
• Standard advice for how to deliver a presentation:
use a lively voice, don’t speak too fast, take
pauses
• These qualities can be processed automatically
using speech analysis
[3]
What is a lively voice?
• A voice that varies in pitch and rhythm
• A voice that shows enthusiasm
• Difficult for native speakers, but more difficult for
non-native speakers
• Studies have shown that non-natives use a
narrower pitch range than natives (Pickering 2004)
• Tools for helping speakers increase their liveliness
should be welcomed
• Research Question: How can we measure
liveliness automatically?
[4]
Corpus of student speech
• Audio recordings of 35 ten-minute presentations in
English made by engineering students
• Recordings made in the classroom
• Selected 10 women and 10 men
– Varied levels of ability in English
– All native speakers of Swedish
• Written feedback on the presentations from
teachers and classmates
• In preparation: listener ratings of liveliness and
fluency
[5]
Pitch dynamism quotient, PDQ
$$\mathrm{PDQ} = \frac{\text{standard deviation of } F_0 \text{ in Hertz}}{\text{mean } F_0 \text{ in Hertz}}$$
• F0 = Fundamental frequency = pitch
• Necessary to normalize the standard
deviation in order to compare voices that are
naturally high or naturally low
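The quotient defined on this slide can be sketched in a few lines of Python. This is only an illustration, not the tooling used in the study: the function name is hypothetical, unvoiced frames are assumed to be coded as 0, and the sample standard deviation is assumed because the slide does not say which estimator was used.

```python
import statistics


def pitch_dynamism_quotient(f0_hz):
    """Return the standard deviation of F0 divided by the mean F0.

    Assumption: values <= 0 mark unvoiced frames and are ignored.
    Assumption: the sample standard deviation is used.
    """
    voiced = [f for f in f0_hz if f > 0]
    if len(voiced) < 2:
        raise ValueError("need at least two voiced F0 values")
    return statistics.stdev(voiced) / statistics.fmean(voiced)


# The same contour shape at a low and at a high register.
low_voice = [100.0, 110.0, 120.0, 90.0, 80.0]
high_voice = [2 * f for f in low_voice]
print(pitch_dynamism_quotient(low_voice))
print(pitch_dynamism_quotient(high_voice))
```

Both calls print the same value, which is the point of dividing by the mean: a naturally high voice and a naturally low voice with the same relative pitch movement receive the same PDQ.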
[6]
Time, frequencies and editing
• Between 7 and 10 minutes per person
• Divided into intervals of (1 min, 30 s, 15 s,) 10
seconds
• WaveSurfer’s ESPS settings: 60-400 Hz men, 75-600 Hz women
• Have also analyzed at 25-400 Hz men, 25-500 Hz
women
• Visually inspected every contour and edited away
as many errors as possible
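The ten-second windowing described above can be sketched as follows. Everything except the frequency ranges is an assumption made for illustration: the function name, the frame rate of 100 F0 values per second, and the convention that unvoiced frames are 0. Discarding values outside the tracker range is only a crude stand-in for the manual editing of contours mentioned on the slide.

```python
import statistics

# Pitch-tracker search ranges quoted on the slide, in Hz.
F0_RANGE_HZ = {"male": (60.0, 400.0), "female": (75.0, 600.0)}


def windowed_pdq(f0_hz, sex, frames_per_second=100, window_seconds=10):
    """Return one PDQ value per complete window (None if fewer than 2 usable frames).

    Assumption: f0_hz holds one F0 estimate per frame, 0 for unvoiced frames.
    A trailing partial window is dropped.
    """
    lo, hi = F0_RANGE_HZ[sex]
    frames_per_window = int(frames_per_second * window_seconds)
    results = []
    last_start = len(f0_hz) - frames_per_window
    for start in range(0, last_start + 1, frames_per_window):
        window = f0_hz[start:start + frames_per_window]
        usable = [f for f in window if lo <= f <= hi]
        if len(usable) < 2:
            results.append(None)
        else:
            results.append(statistics.stdev(usable) / statistics.fmean(usable))
    return results


# Synthetic track: 10 s of alternating pitch, 10 s unvoiced, 10 s flat pitch.
track = [100.0, 120.0] * 500 + [0.0] * 1000 + [200.0] * 1000
print(windowed_pdq(track, "male"))
```

In this synthetic track the first window has some pitch movement, the second has no voiced frames and yields None, and the third is perfectly flat and yields 0.0.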
[7]
Mean pitch dynamism quotient for 7-10 minutes of speech
[Figure: mean PDQ (y-axis, 0.05 to 0.25) per student, females and males marked separately. X-axis: student, by placement test: 90MN4, 89EH4, 88TO4, 88KS4, 85OM4, 80NB4, 70JH3, 69TM2, 68HÖ3, 64MN2, 63NW3, 63AJ3, 58EL2, 58CN3, 56PT2, 54ML3, 52VJ2, 50TN2, 45GK1, 42CS1.]
[8]
Three proficient speakers
[Figure: pitch dynamism quotient (y-axis, 0.00 to 0.35) over consecutive time periods of 10 seconds (x-axis, 1 to 59) for speakers 85OM, 88TO, and 89EH.]
[9]
Lively speaker 1
Speaker 85OM4
• Mean PDQ: .23
• The presentation was “well-structured,” “confident,” “easy to follow,” and “very coherent,” and the speech “well-modulated” with “varied intonation.”
[Figure: PDQ (y-axis, 0.00 to 0.35) over consecutive time periods of 10 seconds (x-axis); one point is labeled “the divergence”.]
[10]
Lively speaker 2
Speaker 88TO4
• Mean PDQ: .21
• Her presentation was “well-rehearsed” and “professional.”
[Figure: PDQ (y-axis, 0.00 to 0.35) over consecutive time periods of 10 seconds (x-axis).]
[11]
Monotone speaker
Speaker 89EH4
• Mean PDQ: .12
• Delivery was “a little deadpan,” “more animated facial expressions would be good,” and the presentation would be improved by “showing more enthusiasm.”
[Figure: PDQ (y-axis, 0.00 to 0.35) over consecutive time periods of 10 seconds (x-axis); one point is labeled “why is voice over IP interesting?”]
[12]
Selection of files for listening test
Test values: 9 per speaker
• 3 lowest PDQ
• 3 closest to mean
• 3 highest
[Figure: PDQ (y-axis, 0.00 to 0.40) for each individual ten-second segment (x-axis, 0 to 90), males and females marked separately.]
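The selection rule on this slide (three lowest, three closest to the mean, three highest per speaker) can be sketched as below. The function name and the tie-handling are assumptions for illustration; the slides do not say how ties or overlaps between the three groups were resolved, so here the middle group is drawn only from segments not already chosen as lowest or highest.

```python
import statistics


def select_test_segments(pdq_values, n=3):
    """Return segment indices for the n lowest, n closest-to-mean, and n highest PDQ values.

    Assumption: the three groups must not overlap, so at least 3 * n segments are required.
    """
    if len(pdq_values) < 3 * n:
        raise ValueError("need at least 3 * n segments")
    mean = statistics.fmean(pdq_values)
    by_value = sorted(range(len(pdq_values)), key=lambda i: pdq_values[i])
    lowest = by_value[:n]
    highest = by_value[-n:]
    taken = set(lowest) | set(highest)
    remaining = [i for i in by_value if i not in taken]
    middle = sorted(remaining, key=lambda i: abs(pdq_values[i] - mean))[:n]
    return {"lowest": lowest, "middle": middle, "highest": highest}


# Made-up PDQ values for eleven ten-second segments of one speaker.
example = [0.05, 0.30, 0.10, 0.20, 0.15, 0.25, 0.08, 0.33, 0.18, 0.12, 0.22]
print(select_test_segments(example))
```

The example values are invented for the demonstration and are not taken from the corpus.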
[13]
Conclusions
• Normalized standard deviation can be used as a
measure of liveliness in speaking styles used for
oral presentations
• Hypothesis: PDQ values over .15 are lively and over .30
very lively; between .20 and .25 is a good target
– Different preferences depending on personality and culture?
• Unclear effect of Swedish L1 and of proficiency in
English
• Applications: teaching, presentation skills
• Appropriate feedback: not values but a talking
head that moves from alert to sleepy
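The hypothesized thresholds above could drive such feedback with a simple mapping from a PDQ value to a label. This is a minimal sketch under stated assumptions: the function name is hypothetical, the thresholds are the untested hypothesis from this slide, and the label "monotone" for values at or below .15 is borrowed from the slide title used for the speaker with a mean PDQ of .12.

```python
def liveliness_label(pdq):
    """Map a PDQ value to a coarse label using the hypothesized thresholds.

    Assumption: over .30 is "very lively", over .15 is "lively",
    and anything else is labeled "monotone".
    """
    if pdq > 0.30:
        return "very lively"
    if pdq > 0.15:
        return "lively"
    return "monotone"


# Mean PDQ values reported for the three proficient speakers.
for speaker, mean_pdq in [("85OM4", 0.23), ("88TO4", 0.21), ("89EH4", 0.12)]:
    print(speaker, liveliness_label(mean_pdq))
```

A feedback display such as the talking head would presumably consume the label, or the underlying value, rather than show the number to the speaker.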
[14]
Thank you for your attention…
[15]