QoS Measurement and Management for VoIP

Wenyu Jiang
IRT Lab
March 5, 2003
Introduction to VoIP &
IP Telephony

Transport of voice packets over IP networks
 Cost savings
– Consolidates voice and data networks
– Avoids leased lines, long-distance toll calls

Smart and new services
– Call management (filtering, TOD forwarding): CPL
– Better than PSTN quality: wide-band codecs

Protocols and Standards
– Signaling: SIP (IETF), H.323 (ITU-T)
– Transport: RTP/RTCP (IETF)
Practical Issues in VoIP

Quality of Service (QoS)
– Internet is a best-effort network
 Loss, delay and jitter
 Users expect at least PSTN quality for VoIP!

Ease of deployment
– Requires seamless integration with legacy
networks (PSTN/PBX)
– Security is a must

High yardstick of service availability
– Can your network achieve 99.999% up time?
Outline

QoS measurement
– Objective vs. subjective metrics
– Automated measurement of subjective quality

QoS management: improving your quality
– End-to-End: FEC, LBR, PLC
– Network provisioning: voice traffic aggregation

Reality check
– Performance of end-points (IP phones, …)
– Deployment issues in VoIP
– Evaluation of VoIP service availability through
Internet measurement
Workings of a VoIP Client

Audio is packetized, encoded and transmitted
 Forward error correction (FEC) may be used
to recover lost packets
 Playout control smoothes out jitter to
minimize late losses; coupled with FEC
 Packet loss concealment (PLC)
– Last line of “defense” after FEC and playout
[Figure: receive path of a VoIP client. Multimedia packets with FEC cross the Internet (added loss, jitter), then pass through FEC recovery (leaving losses unrecoverable by FEC), playout delay control (added late losses; FEC affects playout control), and loss concealment & decoding.]
LBR: An Alternative to FEC
An (n,k) block FEC code can recover up to n-k losses per block
 Low Bit-rate Redundancy (LBR)

– Transmit a lower bit-rate version of original audio
– No notion of “blocks”
– Not bit-exact recovery
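
To make the LBR idea concrete, here is a minimal sketch. It assumes a simple layout in which packet i+1 piggybacks a low bit-rate copy of frame i; the one-packet offset and the name `lbr_playout` are illustrative assumptions, not details taken from the slides.

```python
def lbr_playout(received):
    """Classify each frame given per-packet arrival flags.

    Assumed layout: packet i carries frame i plus a low bit-rate copy
    of frame i-1, so a lost frame is repairable only if the next
    packet arrives.
    """
    status = []
    for i, ok in enumerate(received):
        if ok:
            status.append("full")
        elif i + 1 < len(received) and received[i + 1]:
            status.append("lbr")   # repaired from the redundant copy
        else:
            status.append("lost")  # left to packet loss concealment
    return status


# Packets A..F with B, D and E lost in transit.
print(lbr_playout([True, False, True, False, False, True]))
```

Unlike block FEC, there is no block boundary here: each frame depends only on its own packet and the next one, and a repaired frame is a lower-quality version rather than a bit-exact copy.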
[Figure: transmission timelines. FEC: data packets A-F are sent in FEC block 1 and FEC block 2, each block followed by FEC data. LBR: packets A-F are sent with LBR data a', b', c', d' piggybacked on later packets.]
Objective QoS Metrics: Loss

Internet packet loss is often bursty
– May degrade voice quality more than random (Bernoulli) loss

Characterization of packet loss
– 2-state Markov (Gilbert) model: conditional loss prob.
   States: 0 (non-loss), 1 (loss)
   Transitions: 0→1 with prob. p, 0→0 with 1-p, 1→0 with q, 1→1 with 1-q = p_c
– More detailed models, but more states!
   Extended Gilbert model, nth order Markov model
   Hidden Markov model, Gilbert-Elliot model, inter-loss distance
– More states → larger test set and loss of big picture
   Adaptive applications can trade off model accuracy for fast feedback
   Gilbert model provides an acceptable compromise
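
A minimal sketch of the two-state model follows. It assumes p is the non-loss to loss transition probability and q is the loss to non-loss probability; the function names and example values are illustrative only. The stationary loss rate p/(p+q) and the geometric burst length 1/q are standard results for a two-state Markov chain.

```python
import random


def gilbert_stats(p, q):
    """Return (unconditional loss rate, conditional loss prob, mean burst length).

    p: P[non-loss -> loss], q: P[loss -> non-loss].
    """
    p_u = p / (p + q)      # stationary probability of the loss state
    p_c = 1.0 - q          # P[loss | previous packet lost]
    mean_burst = 1.0 / q   # burst lengths are geometric
    return p_u, p_c, mean_burst


def gilbert_trace(p, q, n, seed=0):
    """Generate n loss flags (True = lost) from the 2-state chain."""
    rng = random.Random(seed)
    lost = rng.random() < p / (p + q)   # start in the stationary distribution
    trace = []
    for _ in range(n):
        trace.append(lost)
        # Stay lost with prob. 1-q, or enter loss with prob. p.
        lost = rng.random() < ((1.0 - q) if lost else p)
    return trace


p_u, p_c, burst = gilbert_stats(p=0.05, q=0.7)
trace = gilbert_trace(0.05, 0.7, 100_000)
print(p_u, p_c, burst, sum(trace) / len(trace))
```

Setting p + q = 1 makes the next state independent of the current one, which gives Bernoulli loss as a special case.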
Effect of Gilbert Loss Model

Loss burst distribution of a packet trace
– Roughly, though not exactly exponential

Loss burstiness on FEC performance
– FEC less efficient under bursty loss
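
The effect of burstiness on FEC can be checked with a small exact calculation. The sketch below assumes an erasure code that repairs a block whenever at most n-k of its n packets are lost, with the k data packets sent first and the chain in steady state at the start of each block; `fec_final_loss` is a hypothetical helper, not the method used to produce the slide's results.

```python
from itertools import product


def fec_final_loss(n, k, p_u, p_c):
    """Residual data-packet loss rate after an (n,k) block code.

    Loss follows a Gilbert model with unconditional loss rate p_u and
    conditional loss probability p_c (p_c == p_u gives Bernoulli loss).
    """
    q = 1.0 - p_c                  # P[loss -> non-loss]
    p = p_u * q / (1.0 - p_u)      # P[non-loss -> loss], from p_u = p/(p+q)
    expected_lost = 0.0
    for pattern in product((0, 1), repeat=n):   # 1 = packet lost
        prob = p_u if pattern[0] else 1.0 - p_u
        for prev, cur in zip(pattern, pattern[1:]):
            if prev:
                prob *= p_c if cur else q
            else:
                prob *= p if cur else 1.0 - p
        if sum(pattern) > n - k:                # block not repairable
            expected_lost += prob * sum(pattern[:k])
    return expected_lost / k


print(fec_final_loss(3, 2, p_u=0.10, p_c=0.10))  # Bernoulli loss
print(fec_final_loss(3, 2, p_u=0.10, p_c=0.30))  # bursty loss, same p_u
```

The enumeration is exponential in n, so it is only practical for the small block sizes typical of voice FEC.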
[Figures: loss burst distribution (number of occurrences vs. loss burst length), packet trace vs. Gilbert model; final loss % after FEC (p_f) vs. conditional loss p_c (%), Gilbert vs. Bernoulli]
Objective QoS Metrics: Delay
Complementary Conditional CDF (C3DF)
$f(t) = P[d_i > t \mid d_{i-l} > t]$, lag $l = 1, 2, 3, \ldots$, $d_i$: delay of packet $i$
– More descriptive than auto-correlation function (ACF)
– Delay correlation rises rapidly beyond a threshold
– Approximates conditional late loss probability
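
One way to estimate the C3DF from a measured delay trace is sketched below. It takes the definition as P[d_i > t | d_{i-l} > t]; the function names and the short delay list are made-up illustrations, not measurement data.

```python
def c3df(delays, t, lag=1):
    """Estimate P[d_i > t | d_{i-lag} > t] from a list of packet delays.

    Returns None when no packet exceeds t at the conditioning
    position, since the estimate is then undefined.
    """
    cond = 0   # packets i with d_{i-lag} > t
    both = 0   # of those, packets that also have d_i > t
    for i in range(lag, len(delays)):
        if delays[i - lag] > t:
            cond += 1
            if delays[i] > t:
                both += 1
    return both / cond if cond else None


def ccdf(delays, t):
    """Unconditional P[d_i > t], for comparison."""
    return sum(d > t for d in delays) / len(delays)


# Hypothetical delays in seconds, with one delay spike.
delays = [0.05, 0.06, 0.20, 0.22, 0.21, 0.05, 0.06, 0.05]
print(ccdf(delays, 0.1), c3df(delays, 0.1, lag=1))
```

If t is read as the playout deadline, the conditional value is the chance that a packet is late given that the packet `lag` positions earlier was late, which is why the C3DF approximates the conditional late loss probability.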
[Figure: C3DF, probability vs. delay (sec), for the unconditional case and lag = 1, 2, 3, 5, 10, 20]
Subjective QoS Metrics

Perceived quality
– Mean Opinion Score (MOS)
   ITU-T P.800/830
   Obtained via listening tests
– MOS variations
   DMOS (Degradation)
   CMOS (Comparison)
   MOSc (Conversational): considers delay
   A/B preference

Pros: more meaningful to end users
Cons: time consuming, labor intensive

MOS Grade   Score
Excellent   5
Good        4
Fair        3
Poor        2
Bad         1
Effect of Loss Model on
Perceived Quality

Codec: G.729 (8kb/s ITU std)
 Random (Bernoulli) vs. bursty (Gilbert) loss
– Bursty loss → lower MOS
– True even when FEC or LBR is used
[Figures: effect of random (Bernoulli) vs. bursty (Gilbert) loss on MOS quality (MOS vs. loss probability); random vs. bursty loss on FEC (G.723.1) quality, FEC (3,2) under Gilbert and Bernoulli loss (MOS vs. loss probability)]
Going Further: Bridging
Objective and Subjective Metrics

The E-model (ITU-T G.107/108)
– Originally for telephone network planning
– Considers various impairments
– Reduces to delay and loss impairment when adapted for
VoIP

Objective quality estimation algorithms
– Suitable when network statistics are not available, e.g.,
phone-to-phone service with IP in between.
– Speech recognition performance may be used as a
quality predictor, by comparing with original text
The E-model
Map from loss and delay to
impairment scores (Ie, Id)
 Compute a gross score (R
value) and map to MOSc
 Limited number of codec
loss impairment mappings
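
The last step of this chain, from R value to MOS, can be sketched as follows. The cubic mapping is the one given in ITU-T G.107. The reduced form R = r0 - Id - Ie and the default r0 = 93.2 are stated here as assumptions. The Ie and Id inputs must come from codec- and delay-specific mappings that this sketch does not reproduce.

```python
def r_to_mos(r):
    """Map an E-model R value to an estimated MOS (ITU-T G.107 mapping)."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6


def voip_mos(ie, id_, r0=93.2):
    """Reduced E-model for VoIP: R = r0 - Id - Ie, then map to MOS.

    r0 is an assumed default R value with no delay or loss impairment.
    """
    return r_to_mos(r0 - id_ - ie)


print(r_to_mos(60), voip_mos(ie=20, id_=10))
```

Because only a limited set of codec loss impairment mappings exists, any Ie value passed in should be taken from a published or measured curve for the codec in question.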
[Figures: Ie (loss impairment) vs. average loss probability, G.729, T=20ms, random loss; E-model Id (delay impairment) vs. delay (ms); R to MOS mapping (MOS vs. R value)]
Using Speech Recognition to
Predict MOS

Evaluation of automatic speech recognition
(ASR) based MOS prediction
– IBM ViaVoice Linux version
– Codec used: G.729
– Performance metric
   absolute word recognition ratio:
   $R_{abs} = \frac{\text{\# of correctly recognized words}}{\text{total \# of spoken words}}$
   relative word recognition ratio:
   $R_{rel}(p) = \frac{R_{abs}(p)}{R_{abs}(0\%)}$, where $p$ is the loss probability
Recognition Ratio vs. MOS
Both MOS and R_abs decrease w.r.t. loss
Then, eliminate middle variable p
[Figures (G.729 codec): impact of packet loss on audio quality (MOS vs. loss rate %); impact of packet loss on automatic speech recognition (word recognition ratio % vs. loss rate %); mapping from speech recognition performance to MOS (MOS vs. word recognition ratio %)]
Speaker Dependency
Absolute performance is speaker-dependent
But relative word recognition ratio is not
 Suitable for MOS prediction
[Figures (Speakers A, B, C): word recognition ratio vs. packet loss probability p (%); relative word recognition ratio R_rel vs. packet loss probability p (%); MOS vs. relative word recognition ratio R_rel]
Summary of QoS
Measurement

Loss burstiness:
– Affects (generally worsens) perceived quality as well
as FEC performance
– May be described with, e.g., a Gilbert model

Delay correlation:
– Increases rapidly beyond a threshold, revealed through
Complementary Conditional CDF (C3DF)
– Late losses are also bursty

Perceived quality (MOS) estimation
– Analytical: the E-model
– If network statistics N/A: relative word recognition
ratio can provide speaker-independent MOS prediction
Outline

QoS measurement
– Objective vs. subjective metrics
– Automated measurement of subjective quality

QoS management: improving your quality
– End-to-End: FEC, LBR, PLC
– Network provisioning: voice traffic aggregation

Reality check
– Performance of VoIP end-points (IP phones, …)
– Deployment issues in VoIP
– Evaluation of VoIP service availability through Internet
measurement
Quality of FEC vs. LBR

FEC is substantially and consistently better
– At comparable bandwidth overhead
– Across all codec configurations tested
[Figures: FEC vs. LBR based on G.723.1 (MOS vs. loss probability; curves J: FEC (2,1), I: G.723.1 LBR; label G.729+G.723.1 LBR); FEC vs. LBR based on AMR (MOS vs. loss probability; curves N: AMR12.2+FEC (3,2), M: AMR12.2+6.7 LBR; label AMR LBR)]
Quality of FEC under Bursty
Loss
Packet interval T has a stronger effect on
MOS with FEC than without FEC
[Figure: MOS (Mean Opinion Score) vs. p_u (overall loss rate) at conditional loss probability p_c = 30%, for T=20ms and T=40ms, with and without FEC; annotated gap of 0.5-0.6 MOS]
FEC MOS Optimization
Considering Delay Effect
Larger T → higher FEC efficiency, but higher delay
 Optimizing T with the E-model

– Calculate final loss probability after FEC, apply delay impairment
of FEC, map to MOSc

Prediction close to FEC MOS test results
– Suitable for analytical perceived quality prediction
[Figures: FEC MOS prediction, p_c=30% (MOS_c vs. original loss rate %, E-model prediction T=40ms vs. real MOS test T=40ms); FEC MOS optimization, Id != 0, d=3*T (MOS_c vs. packet interval T (ms), for p_u = 4%, 8%, 12%, 16%)]
Trade-off Analysis between
Codec Robustness and FEC
3 loss repair options
– FEC, LBR, PLC

Loss-resilient codec
– Better PLC
 iLBC (IETF)
– But higher bit-rate
– Better than FEC?
[Figure: MOS vs. average loss probability for iLBC 14kb/s, G.729 8kb/s and G.723.1 6.3kb/s]
Observations and Results

When considering delay:
– iLBC is usually preferred in low loss conditions
– G.729 or G.723.1 + FEC better for high loss

Example: max bandwidth 14 kb/s
– Consider delay impairment (use MOSc)
[Figure: MOS_c vs. average loss probability at max BW 14 kb/s, for iLBC (no FEC), G.729+(5,3) and G.723.1+(2,1) with T=60ms]
Effect of Max Bandwidth on
Achievable Quality

14 to 21 kb/s: significant improvement in MOSc
 From 21 to 28 kb/s: marginal change due to
increasing delay impairment by FEC
[Figure: MOS_c vs. average loss probability for max BW of 14 kb/s, 21 kb/s and 28 kb/s]
Provisioning a VoIP Network

Silence detection/suppression
– Transmit only during On period, saves bandwidth
– Allows traffic aggregation through statistical multiplexing

Characteristics of On/Off patterns in VoIP
– Traditionally found to be exponentially distributed
– Modern silence detectors (G.729B VAD, NeVoT SD) produce
different patterns
[Figures: talk-spurt/gap distribution for G.729B VAD and for NeVoT SD (default setting); complementary CDF vs. spurt/gap duration (in 10 ms frames), real vs. exponential spurt CDF and real vs. exponential gap CDF]
Traffic Aggregation Simulation



Token bucket filter with N sources, R: reserved to peak BW ratio
CDF model resembles trace model in most cases
Exponential (traditional) model
– Under-predicts out-of-profile packet probability;
– Under-prediction ratio  as token buffer size B 

Similar results for NeVoT SD
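
The token bucket check at the heart of such a simulation can be sketched as below. This is a slot-based simplification with hypothetical parameter names. Generating the per-slot arrivals from N On/Off sources (exponential, CDF-based or trace-driven) is left out. In this sketch the reserved-to-peak ratio R would enter only through the `rate` argument.

```python
def out_of_profile_fraction(arrivals, rate, bucket_size):
    """Fraction of packets that find no token in a token bucket filter.

    arrivals: packets arriving in each time slot (aggregate of N sources).
    rate: tokens added per slot (the reserved bandwidth, in packets).
    bucket_size: maximum number of stored tokens (B).
    """
    tokens = bucket_size          # start with a full bucket
    total = out = 0
    for n in arrivals:
        tokens = min(bucket_size, tokens + rate)
        in_profile = min(n, int(tokens))   # whole tokens only
        tokens -= in_profile
        total += n
        out += n - in_profile
    return out / total if total else 0.0


# Same hypothetical arrival pattern, two bucket sizes.
print(out_of_profile_fraction([2, 0, 3, 1], rate=1, bucket_size=2))
print(out_of_profile_fraction([2, 0, 3, 1], rate=1, bucket_size=3))
```

Feeding the same filter with arrivals drawn from different On/Off models is one way to compare their predicted out-of-profile probabilities.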
Summary of QoS
Management

End-to-End
– FEC is superior in quality to LBR
– Codec robustness is better than FEC in low loss
conditions


Combining both schemes brings the best of both sides
Network provisioning
– Observation: New silence detectors (G.729B, NeVoT SD)
→ non-exponential voice On/Off patterns
– Result: performance of voice traffic aggregation  under
new On/Off patterns
– Important in traffic engineering and Service Level
Agreement (SLA) validation
Outline

QoS measurement
– Objective vs. subjective metrics
– Automated measurement of subjective quality

QoS management: improving your quality
– End-to-End: FEC, LBR, PLC
– Network provisioning: voice traffic aggregation

Reality check
– Performance of end-points (IP phones, …)
– Deployment issues in VoIP
– Assessment of VoIP service availability through Internet
measurement
Mouth-to-ear Delay of VoIP
End-points



All receivers can adjust M2E delay adaptively whenever it
is too low or too high
M2E delay depends mainly on receiver (esp. RAT)
HW phones have relatively low delay (~45-90ms)
[Figures: M2E delay (ms) vs. time (sec) for experiments 1-1 and 1-2, with silence gaps marked; effect of sender and receiver on M2E delay (ms), by receiver (3Com, Cisco, Mediatrix, Pingtel, RAT), one curve per sender (3Com, Cisco, Mediatrix, Pingtel, RAT)]
But Adaptiveness ≠ Perfection

Symptom of
playout buffer
underflow
 Waveforms are
dropped
 Occurred at
point of delay
adjustment
 Bugs in
software?
 LAN  perfect
quality?
Major Observations



Overall: end-points matter a lot!
HW IP phones: 45-90ms average M2E delay
SW clients:
– Messenger 2000 lowest (68ms), XP (96-120ms)
 cf. GSM-PSTN: 110ms in either direction
– NetMeeting very bad (> 400ms)

PLC robustness
– Acceptable in all 3 IP phones tested, Cisco phone more robust

Silence detection/suppression
– Works for speech input
– Often fails for non-speech (e.g., music) input
 Generates many unnatural gaps
 Not good for customer support center (on-hold music)!

Acoustic echo cancellation (AEC):
– Good on most IP phones (Echo Return Loss > 40 dB)
– But some do not implement AEC at all
Reality Check #2: IP
Telephony Deployment

Localized deployment at Columbia Univ.
[Figure: deployment architecture. Components: regular phones, telephone switch/PBX, T1/E1 link, SIP/PSTN gateway, RTP/SIP, IP phones, sipd (SIP proxy, redirect server), conference server, voicemail server, SQL database, web server (web-based configuration), core server, server status monitoring.]
Issues and Lessons Learned

PSTN/PBX integration
– Requires full understanding of legacy networks

Lower layer (e.g., T1 line configuration)
– Parameters must match on both PSTN/PBX and gateway!

PBX access configurations
– To ensure calls go through in both directions

Address translation (dial-plan) in both directions
– Previous lessons/experiences can help greatly


E.g., second gateway installed in weeks instead of months
Security
– Issue: SIP/PSTN gateway has no authentication feature
– Solution:


Use gateway’s access control lists to block direct calls
SIP proxy server handles authentication using record-route
Reality Check #3: VoIP
Service Availability

Focus on availability rather than traditional QoS
– Delay is a minor issue; FEC recovers most isolated losses
– Ability to make a call is vital, especially in emergency

Internet measurement sites:
– 14 nodes worldwide, not just Internet2 and alike

Definitions:
– Availability = MTBF / (MTBF + MTTR)
– Availability = successful calls / first call attempts




Equipment availability: 99.999% (“5 nines”) → about 5 minutes of downtime per year
AT&T: 99.98% availability (1997)
IP frame relay SLA: 99.9%
UK mobile phone survey: 97.1-98.8%
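
The two definitions and the downtime figures can be checked with a few lines of arithmetic. This is a minimal sketch; the function names are illustrative and a 365-day year is assumed.

```python
MINUTES_PER_YEAR = 365 * 24 * 60


def availability_from_mtbf(mtbf, mttr):
    """Availability = MTBF / (MTBF + MTTR), both in the same time unit."""
    return mtbf / (mtbf + mttr)


def call_availability(successful_calls, attempts):
    """Availability = successful calls / first call attempts."""
    return successful_calls / attempts


def downtime_minutes_per_year(availability):
    """Expected unavailable minutes in a 365-day year."""
    return (1.0 - availability) * MINUTES_PER_YEAR


for a in (0.99999, 0.9998, 0.999):
    print(f"{a:.3%} -> {downtime_minutes_per_year(a):.1f} min/year")
```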
First Look of Availability

Call success probability:
– 62,027 calls succeeded, 292 failed → 99.53% availability
– Roughly constant across I2, I2+, commercial ISPs: 99.39-99.58%

Overall network loss
– PSTN: once connected, call usually of good quality
   exception: mobile phones
– Compute % time below loss threshold
   5% loss causes degradation for many codecs
   others acceptable till 20%
% time below loss threshold:
loss      0%     5%      10%     20%
All       82.3   97.48   99.16   99.75
ISP       78.6   96.72   99.04   99.74
I2        97.7   99.67   99.77   99.79
I2+       86.8   98.41   99.32   99.76
US        83.6   96.95   99.27   99.79
Int.      81.7   97.73   99.11   99.73
US ISP    73.6   95.03   98.92   99.79
Int. ISP  81.2   97.60   99.10   99.71
Network Outages
Sustained packet losses
– arbitrarily defined at 8 packets
– far beyond recoverable (FEC, interpolation)
23% of packet losses are outages
Make up a significant part of the 0.25% unavailability
Symmetric: A→B ⇔ B→A
Spatially correlated: A→B ⇒ A→X
Not correlated across networks (e.g., I2 and commercial)
Mostly short (a few seconds), but some are very long (100’s of seconds) and make up the majority of outage time
[Figures: complementary CDF vs. outage duration (sec), US domestic paths vs. international paths; complementary CDF vs. outage duration (sec), all paths vs. Internet2]
Outage-induced Call Abortion
Probability





Long interruption → user likely to abandon call
From E.855 survey: $P[\text{holding}] = e^{-t/17.26}$ (t in seconds)
→ half the users will abandon call after 12s
2,566 calls have at least one outage
946 of 2,566 expected to be dropped → 1.53% of all calls
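
The holding-time model can be turned into an expected abandonment count as sketched below. The sketch uses P[holding] = e^(-t/17.26) from the slide and assumes, for simplicity, a single outage duration per affected call; the function names are hypothetical.

```python
import math

MEAN_HOLD = 17.26   # seconds, from the survey-based model on the slide


def p_holding(t):
    """Probability a user is still holding after t seconds of interruption."""
    return math.exp(-t / MEAN_HOLD)


def expected_dropped(outage_durations):
    """Expected number of abandoned calls, one outage duration per call."""
    return sum(1.0 - p_holding(t) for t in outage_durations)


# Time at which half the users have given up (about 12 s).
print(MEAN_HOLD * math.log(2))
```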
Expected call abortion probability:
all       1.53%
I2        1.16%
I2+       1.15%
ISP       1.82%
US        0.99%
Int.      1.78%
US ISP    0.86%
Int. ISP  2.30%
Summary of Service
Availability

Through several metrics, one can translate from
network loss to VoIP service availability (no
Internet dial-tone)
 Current results show availability far below five
9’s, but comparable to mobile telephony
– Outage statistics are similar in research and ISP
networks

Working on identifying fault sources and locations
 Additional measurement sites are welcome
Conclusions

Measuring QoS
– Loss burstiness and delay correlation affect (generally worsen)
perceived quality
– Bridging objective and subjective metrics: the E-model, or speech
recognition based MOS prediction
– Performance of real products: IP phones and soft clients

Ensuring/improving QoS
– Network provisioning (voice traffic aggregation)
 Efficient, but may be expensive to deploy and manage
– End-to-End (FEC > LBR, PLC)
 Easier to deploy, but must control overhead of FEC

Reality Check
– Good implementation at the end-point (e.g., IP phones) is vital
– VoIP deployment requires PSTN integration and security
– Service availability is crucial for VoIP, but still far from 99.999%
over the Internet
Ongoing and Future Work

Sampling Internet performance
– Where do the problems reside?
 Access networks (Cable, DSL), or
 International paths?
– How can we solve these problems?
 Can adaptive FEC react fast enough to changes in
network conditions?

Playout delay behaviors of VoIP end-points
– How well do they react to jitter, delay spikes?
