Enhancing Conversational Speech Quality of VoIP in a Wired/Wireless Environment
Illinois Center for Wireless Systems
Batu Sat and Benjamin W. Wah
Goals
Design of VoIP End Clients
• Achieving high and consistent perceptual conversational quality
• Enabling natural and efficient conversation among users
• Real-time adaptation to changing network delay & loss conditions
• Suitable for any communication device using any IP network
Background
VoIP
• Providing interactive speech among multiple users
• Utilizing public and private wired/wireless IP networks
• Independent of the users' locations and the devices used
[Figure: VoIP clients connected through wireless networks, the public Internet, and private IP networks]
IP Networks
• Long-haul, WAN, LAN wired/wireless networks
• Non-stationary real-time packet arrivals and losses
• Large disparity in delay and loss behavior among clients
• Complex QoS with multiple IP providers and without cost model
• Quality measured and maintained at end-points
• Better scalability with end-to-end strategies
Conversational Dynamics & Quality
Conversational Dynamics
• Different network delays among clients
• Multiple realities in VoIP (each party perceives a different timeline), in contrast to the single shared reality of face-to-face conversation
• Perception of delays and efficiency affected by the conversational switching (turn-taking) frequency
[Figure: Timelines of a two-party conversation. Face-to-face setting: A and B share a common perspective. VoIP setting: A's and B's perspectives differ by the mouth-to-ear delays MED(AB) and MED(BA). Legend: A speaks, A thinks, B speaks, B thinks.]
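The asymmetry between the two perspectives can be made concrete with simple arithmetic for a single hand-over of the floor from A to B. This is an illustrative sketch only: it assumes constant MEDs and a fixed think time for B, and the names `TurnSwitch`, `silence_seen_by_a`, and `silence_seen_by_b` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TurnSwitch:
    """One hand-over of the floor from A to B (times in ms, assumed constant)."""
    med_ab: float   # mouth-to-ear delay from A to B, MED(AB)
    med_ba: float   # mouth-to-ear delay from B to A, MED(BA)
    think_b: float  # B's think time before replying


def silence_seen_by_b(s: TurnSwitch) -> float:
    # B starts replying think_b after *hearing* A stop, so B only
    # experiences its own think time as mutual silence.
    return s.think_b


def silence_seen_by_a(s: TurnSwitch) -> float:
    # A's speech reaches B after MED(AB); B thinks, then replies;
    # B's first words reach A after a further MED(BA).
    return s.med_ab + s.think_b + s.med_ba


if __name__ == "__main__":
    s = TurnSwitch(med_ab=150.0, med_ba=250.0, think_b=600.0)
    print(silence_seen_by_a(s), silence_seen_by_b(s))  # 1000.0 600.0
```

In a face-to-face setting both MEDs are effectively zero, so both parties perceive the same gap. In VoIP, A's perceived gap grows by MED(AB) + MED(BA) at every switch, which is why the turn-taking frequency affects the perceived efficiency of the conversation.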
Conversational Speech Quality
• Multiple dimensions in user perception of quality
• Quality of one-way speech segments
• Naturalness and rhythm of conversation, mutual-silence durations
Trade-Offs
• Trade-offs among mouth-to-ear delay (MED), redundancy, and the rate of packets not received in time for play-out (UCFLR)
Trace #   Jitter   Loss rate
2         Low      1.7%
5         -        17%
9         High     0.1%
10        -        33%
• Difficulty under dynamic delay spikes and bursty losses
• With longer MED:
  - Improved one-way speech quality
  - Degraded symmetry and efficiency of interactive conversation
• Trade-off between minimizing pair-wise MED and maintaining a balance among the MEDs perceived by users in a conversation
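A trace-driven calculation illustrates the trade-off among MED, redundancy, and packets that miss their play-out deadline. This is a minimal sketch under assumed conditions, not the poster's scheme: each packet is assumed to piggyback copies of the previous `redundancy - 1` frames, `playout` is the network portion of the MED (measured from each frame's send time), the toy delay trace is invented, and `unrecovered_rate` is a hypothetical name.

```python
from typing import Optional, Sequence


def unrecovered_rate(delays: Sequence[Optional[float]], period: float,
                     playout: float, redundancy: int = 1) -> float:
    """Fraction of frames with no copy available by their play-out deadline.

    delays[j] is the one-way network delay of packet j in ms (None = lost).
    Packet j is sent at j * period and (assumed) carries frame j plus
    piggybacked copies of frames j-1, ..., j-redundancy+1.
    Frame i is played at i * period + playout.
    """
    n = len(delays)
    if n == 0:
        return 0.0
    missed = 0
    for i in range(n):
        deadline = i * period + playout
        on_time = False
        # Packets i .. i+redundancy-1 each carry a copy of frame i.
        for j in range(i, min(i + redundancy, n)):
            d = delays[j]
            if d is not None and j * period + d <= deadline:
                on_time = True
                break
        if not on_time:
            missed += 1
    return missed / n


if __name__ == "__main__":
    # Toy trace (invented): 20 ms packet period, two losses, a delay spike.
    trace = [40, 45, None, 60, 120, 42, None, 41, 44, 43]
    for playout in (50, 80):
        for r in (1, 2):
            print(playout, r, unrecovered_rate(trace, 20, playout, r))
    # 50 1 0.4 / 50 2 0.4 / 80 1 0.3 / 80 2 0.0
```

On this toy trace, adding redundancy does nothing at a 50 ms play-out delay because the redundant copies arrive too late, but it recovers every frame at 80 ms. Redundancy pays off only when the MED is long enough to wait for the copies, which in turn degrades interactivity.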
Challenges
Quality Metrics
• No objective metrics for quantifying conversational speech quality
• Subjective tests are costly, non-repeatable, and require a full implementation
Design of Play-out Scheduling and Loss Concealment
• Under dynamic packet delays and losses
Proposed Solutions
Collection of Traces on Delays and Losses
• Using PlanetLab nodes to collect end-to-end traces
• With packet periods and payloads typical of VoIP applications
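The trade-off table characterizes traces by jitter and loss rate. The poster does not state how jitter was measured; one common choice, assumed here, is the interarrival jitter estimator of RFC 3550. The sketch summarizes one trace of send and receive timestamps in milliseconds, with None marking a lost packet; `trace_stats` is a hypothetical helper, and processing packets in sequence order rather than arrival order is a simplification.

```python
from typing import Optional, Sequence, Tuple


def trace_stats(send_ms: Sequence[float],
                recv_ms: Sequence[Optional[float]]) -> Tuple[float, float]:
    """Return (loss rate, interarrival jitter in ms) for one trace.

    recv_ms[i] is None when packet i was lost. Jitter follows the
    RFC 3550 estimator J += (|D| - J) / 16, where D is the change in
    transit time between consecutive received packets. A constant clock
    offset between sender and receiver cancels out in D.
    """
    if len(send_ms) != len(recv_ms) or not send_ms:
        raise ValueError("need equal-length, non-empty timestamp lists")
    lost = sum(1 for r in recv_ms if r is None)
    jitter = 0.0
    prev_transit: Optional[float] = None
    # Simplification: packets are processed in sequence order, not arrival order.
    for s, r in zip(send_ms, recv_ms):
        if r is None:
            continue
        transit = r - s
        if prev_transit is not None:
            jitter += (abs(transit - prev_transit) - jitter) / 16.0
        prev_transit = transit
    return lost / len(send_ms), jitter


if __name__ == "__main__":
    print(trace_stats([0, 20, 40, 60], [50, 86, None, 110]))  # (0.25, 1.9375)
```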
Modeling of Two-Party and Multi-Party Conversations
• Utilizing human psychological models when possible
• Subjective tests to obtain parameters for simulating dynamics
Evaluation of Conversational Speech Quality (CSQ)
• Identification of human-observable and system-measurable metrics
• Modeling CSQ as function of these metrics
• Designing human subjective tests
Designing Play-out Scheduling (POS) and Loss-Concealment Schemes (LCS)
• Trade-offs between system-measurable and human-observable metrics
• Schemes for real-time collection and relay of network statistics
• Schemes for real-time adaptive POS and LCS
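For reference, a classic adaptive play-out estimator from the literature (the exponentially weighted delay and variation estimates of Ramjee et al.) is sketched below. It is not the authors' scheme; it only shows what real-time adaptive play-out scheduling looks like in its simplest form. The class name `EwmaPlayout`, the default `alpha`, and the safety factor `beta = 4` are assumptions for this sketch.

```python
from typing import Optional


class EwmaPlayout:
    """Exponentially weighted play-out delay estimator (classic baseline).

    d_hat tracks the mean network delay, v_hat its mean deviation; the
    play-out delay is d_hat + beta * v_hat. Parameter values are assumed.
    """

    def __init__(self, alpha: float = 0.998, beta: float = 4.0) -> None:
        self.alpha = alpha
        self.beta = beta
        self.d_hat: Optional[float] = None  # smoothed delay estimate (ms)
        self.v_hat = 0.0                    # smoothed delay variation (ms)

    def observe(self, delay_ms: float) -> None:
        """Update the estimates with the delay of one received packet."""
        if self.d_hat is None:
            self.d_hat = delay_ms  # seed with the first sample
            return
        a = self.alpha
        self.d_hat = a * self.d_hat + (1 - a) * delay_ms
        self.v_hat = a * self.v_hat + (1 - a) * abs(self.d_hat - delay_ms)

    def playout_delay(self) -> float:
        """Play-out delay to apply, e.g. at the start of the next talkspurt."""
        if self.d_hat is None:
            raise ValueError("no delay samples observed yet")
        return self.d_hat + self.beta * self.v_hat


if __name__ == "__main__":
    est = EwmaPlayout(alpha=0.9)
    for d in [40, 42, 41, 40, 120, 43, 41]:
        est.observe(d)
    print(round(est.playout_delay(), 1))
```

Because alpha is close to 1, the estimate reacts slowly to sudden changes, which is one reason simple estimators struggle with the dynamic delay spikes noted under Trade-Offs.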
Results
None of the previous algorithms provides a consistent balance between one-way speech quality and conversational interactivity.
Our scheme
• Closely hugging the network delay curve
• Minimizing delay degradations
• Providing good one-way quality
• Maximizing human quality perception