QoS Measurement and Management for VoIP
Wenyu Jiang
IRT Lab
March 5, 2003
Introduction to VoIP & IP Telephony
Transport of voice packets over IP networks
Cost savings
– Consolidates voice and data networks
– Avoids leased lines, long-distance toll calls
Smart and new services
– Call management (filtering, time-of-day forwarding): CPL
– Better than PSTN quality: wide-band codecs
Protocols and Standards
– Signaling: SIP (IETF), H.323 (ITU-T)
– Transport: RTP/RTCP (IETF)
Practical Issues in VoIP
Quality of Service (QoS)
– Internet is a best-effort network
Loss, delay and jitter
Users expect at least PSTN quality for VoIP!
Ease of deployment
– Requires seamless integration with legacy
networks (PSTN/PBX)
– Security is a must
High bar for service availability
– Can your network achieve 99.999% uptime?
Outline
QoS measurement
– Objective vs. subjective metrics
– Automated measurement of subjective quality
QoS management: improving your quality
– End-to-End: FEC, LBR, PLC
– Network provisioning: voice traffic aggregation
Reality check
– Performance of end-points (IP phones, …)
– Deployment issues in VoIP
– Evaluation of VoIP service availability through
Internet measurement
Workings of a VoIP Client
Audio is packetized, encoded and transmitted
Forward error correction (FEC) may be used
to recover lost packets
Playout control smooths out jitter to
minimize late losses; coupled with FEC
Packet loss concealment (PLC)
– Last line of “defense” after FEC and playout
[Figure: receiver pipeline. Multimedia packets with FEC added cross the Internet (loss, jitter) → FEC recovery → playout delay control (FEC affects playout control; adds late losses) → loss concealment & decoding of losses unrecoverable by FEC.]
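To make the playout-control step concrete, here is a minimal sketch of one classic adaptive playout estimator: an exponentially weighted delay mean plus four times a weighted mean deviation (after Ramjee et al.). It is an assumption for illustration, not the algorithm used by any particular client. Delays are network delays in ms; the playout delay changes only at the start of a talk-spurt, and packets arriving after their playout deadline count as late losses.

```python
from typing import List, Tuple

ALPHA = 0.998002  # smoothing weight of the classic estimator (assumed here)


def adaptive_playout(delays: List[float], spurt_start: List[bool]) -> Tuple[List[float], float]:
    """Return the playout delay applied to each packet and the late-loss fraction.

    delays[i]: network delay of packet i (ms)
    spurt_start[i]: True if packet i begins a new talk-spurt
    """
    d_hat = delays[0]  # smoothed delay estimate
    v_hat = 0.0        # smoothed delay variation
    playout = d_hat
    playouts, late = [], 0
    for d, start in zip(delays, spurt_start):
        if start:
            # Adjust only between talk-spurts so that silence absorbs the change.
            playout = d_hat + 4 * v_hat
        playouts.append(playout)
        if d > playout:
            late += 1  # arrived after its scheduled playout time
        # Update the estimates with every received packet.
        d_hat = ALPHA * d_hat + (1 - ALPHA) * d
        v_hat = ALPHA * v_hat + (1 - ALPHA) * abs(d_hat - d)
    return playouts, late / len(delays)
```

A delay spike in the middle of a talk-spurt cannot be absorbed until the next spurt begins, which is why late losses remain even with adaptive playout.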
LBR: An Alternative to FEC
An (n,k) block FEC code can recover n-k losses
Low Bit-rate Redundancy (LBR)
– Transmit a lower bit-rate version of original audio
– No notion of “blocks”
– Not bit-exact recovery
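[Figure: packet timelines along transmission time. Top: block FEC, where each FEC block (block 1, block 2) carries data packets followed by FEC data. Bottom: LBR, where each packet carries a primary frame (A, B, C, …) plus a low-bit-rate copy of the previous frame (a', b', c', d').]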
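The difference between block FEC and LBR can be illustrated by counting which frames stay missing for a given loss pattern. The sketch below assumes an idealized (n,k) erasure code (any k of the n packets recover the block, as with Reed-Solomon) and an LBR scheme in which packet i+1 carries a low-bit-rate copy of frame i. It only counts missing frames and ignores the lower quality of LBR-repaired frames; the function names are illustrative.

```python
from typing import List


def fec_unrecovered(lost: List[bool], n: int, k: int) -> int:
    """Count data packets still missing after (n,k) block FEC.

    lost[] is laid out block by block: k data packets, then n - k FEC packets.
    A trailing partial block is ignored.
    """
    missing = 0
    for start in range(0, len(lost) - n + 1, n):
        block = lost[start:start + n]
        if sum(block) > n - k:         # more erasures than the code can repair
            missing += sum(block[:k])  # only lost data packets reach the listener
    return missing


def lbr_unrecovered(lost: List[bool]) -> int:
    """Count frames missing after LBR, where packet i+1 carries a copy of frame i."""
    missing = 0
    for i, is_lost in enumerate(lost):
        next_lost = i + 1 >= len(lost) or lost[i + 1]
        if is_lost and next_lost:      # primary and redundant copy both gone
            missing += 1
    return missing
```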
Objective QoS Metrics: Loss
Internet packet loss is often bursty
– May degrade voice quality more than random (Bernoulli) loss
Characterization of packet loss
– 2-state Markov (Gilbert) model: conditional loss prob.
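[Figure: Gilbert model, a Markov chain with states 0 (non-loss) and 1 (loss); 0 → 1 with probability p, 1 → 0 with probability q; self-loops 1-p on state 0 and 1-q = p_c on state 1.]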
– More detailed models, but more states!
Extended Gilbert model, nth order Markov model
Hidden Markov model, Gilbert-Elliot model, inter-loss distance
– More states → larger test set and loss of the big picture
Adaptive applications can trade off model accuracy for fast feedback
Gilbert model provides an acceptable compromise
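A minimal sketch of the 2-state Gilbert model: p is the probability of moving from the non-loss state to the loss state and q the probability of leaving the loss state, so the conditional loss probability is p_c = 1 - q and the long-run loss rate is p_u = p/(p+q). The estimator fits p and q to a 0/1 loss trace by counting state transitions. Function names are illustrative.

```python
import random
from typing import List, Tuple


def gilbert_trace(p: float, q: float, length: int, seed: int = 1) -> List[bool]:
    """Generate a loss trace (True = lost) from a 2-state Gilbert model."""
    rng = random.Random(seed)
    lost = rng.random() < p / (p + q)  # start in the stationary distribution
    trace = []
    for _ in range(length):
        trace.append(lost)
        if lost:
            lost = rng.random() >= q   # stay in the loss state with prob. 1 - q = p_c
        else:
            lost = rng.random() < p    # enter the loss state with prob. p
    return trace


def fit_gilbert(trace: List[bool]) -> Tuple[float, float]:
    """Estimate (p, q) from transition counts in a loss trace."""
    n01 = n0 = n10 = n1 = 0
    for prev, cur in zip(trace, trace[1:]):
        if prev:
            n1 += 1
            n10 += not cur   # loss followed by a received packet
        else:
            n0 += 1
            n01 += cur       # received packet followed by a loss
    p = n01 / n0 if n0 else 0.0
    q = n10 / n1 if n1 else 1.0
    return p, q
```

Two parameters are cheap to re-estimate from recent packets, which is what makes the model usable for adaptive applications.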
Effect of Gilbert Loss Model
Loss burst distribution of a packet trace
– Roughly, though not exactly exponential
Loss burstiness on FEC performance
– FEC less efficient under bursty loss
[Figure, two panels. Left: loss burst distribution, number of occurrences (log scale) vs. loss burst length, packet trace vs. Gilbert model. Right: p_f (final loss % after FEC) vs. conditional loss p_c (%), Gilbert vs. Bernoulli loss.]
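The effect of burstiness on FEC can be checked with a small Monte Carlo sketch. It applies an idealized (n,k) erasure code to Gilbert loss with a given average loss rate p_u and conditional loss probability p_c, and reports the final loss rate p_f of data packets; setting p_c = p_u reduces the model to independent (Bernoulli) loss. The parameter values in the usage are illustrative assumptions, not those behind the figure.

```python
import random


def residual_loss(p_u: float, p_c: float, n: int, k: int, blocks: int, seed: int = 7) -> float:
    """Final data-packet loss rate p_f after (n,k) FEC under Gilbert loss.

    p_c = p_u gives Bernoulli (independent) loss as a special case.
    """
    rng = random.Random(seed)
    q = 1 - p_c                  # prob. of leaving the loss state
    p = p_u * q / (1 - p_u)      # chosen so that p / (p + q) == p_u
    lost = rng.random() < p_u
    data_lost = 0
    for _ in range(blocks):
        block = []
        for _ in range(n):
            block.append(lost)
            lost = (rng.random() >= q) if lost else (rng.random() < p)
        if sum(block) > n - k:   # undecodable block
            data_lost += sum(block[:k])
    return data_lost / (blocks * k)


bernoulli = residual_loss(0.05, 0.05, 3, 2, 100000)
bursty = residual_loss(0.05, 0.5, 3, 2, 100000)
```

At the same 5% average loss, the bursty trace leaves several times more unrecovered packets after (3,2) FEC, because consecutive losses tend to fall into the same block.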
Objective QoS Metrics: Delay
Complementary Conditional CDF (C3DF)
$f_l(t) = P[d_i > t \mid d_{i-l} > t]$, lag $l = 1, 2, 3, \ldots$, where $d_i$ is the delay of packet $i$
– More descriptive than auto-correlation function (ACF)
– Delay correlation rises rapidly beyond a threshold
– Approximates conditional late loss probability
[Figure: C3DF of a delay trace, probability vs. delay (sec, 0-0.3), for the unconditional complementary CDF and lags 1, 2, 3, 5, 10, 20.]
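Given a trace of per-packet delays, the C3DF at lag l is the fraction of packets whose delay exceeds t among those whose l-th predecessor exceeded t; comparing it with the unconditional complementary CDF exposes the delay correlation. A minimal sketch (function names are illustrative):

```python
from typing import List


def c3df(delays: List[float], t: float, lag: int) -> float:
    """Empirical f_l(t) = P[d_i > t | d_{i-l} > t] from a delay trace."""
    cond = hit = 0
    for i in range(lag, len(delays)):
        if delays[i - lag] > t:
            cond += 1
            hit += delays[i] > t
    return hit / cond if cond else float("nan")


def ccdf(delays: List[float], t: float) -> float:
    """Unconditional P[d_i > t], the baseline curve."""
    return sum(d > t for d in delays) / len(delays)
```

If t is read as the playout delay, a packet with d_i > t is late, so c3df(delays, t, lag) approximates the conditional late loss probability.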
Subjective QoS Metrics
Perceived quality
– Mean Opinion Score (MOS): ITU-T P.800/830
Obtained via listening tests
– MOS variations
DMOS (Degradation)
CMOS (Comparison)
MOSc (Conversational): considers delay
A/B preference
Pros: more meaningful to end users
Cons: time consuming, labor intensive
MOS Grade   Score
Excellent   5
Good        4
Fair        3
Poor        2
Bad         1
Effect of Loss Model on Perceived Quality
Codec: G.729 (8kb/s ITU std)
Random (Bernoulli) vs. bursty (Gilbert) loss
– Bursty loss → lower MOS
– True even when FEC or LBR is used
[Figure, two panels, MOS vs. loss probability (0.02-0.12). Left: effect of random vs. bursty loss on MOS quality, random (Bernoulli) vs. bursty (Gilbert) loss. Right: random vs. bursty loss on FEC (G.723.1) quality, FEC (3,2) under Gilbert vs. Bernoulli loss.]
Going Further: Bridging Objective and Subjective Metrics
The E-model (ITU-T G.107/108)
– Originally for telephone network planning
– Considers various impairments
– Reduces to delay and loss impairments when adapted for VoIP
Objective quality estimation algorithms
– Suitable when network statistics are not available, e.g., a phone-to-phone service with IP in between
– Speech recognition performance may be used as a
quality predictor, by comparing with original text
The E-model
Map from loss and delay to impairment scores (Ie, Id)
Compute a gross score (R value) and map it to MOSc
Only a limited number of codec loss-impairment mappings are available
[Figure, three panels. Ie (loss impairment) vs. average loss probability (0-0.18) for G.729, T=20ms, random loss. E-model Id (delay impairment) vs. delay (0-400 ms). R to MOS mapping: MOS vs. R value (0-100).]
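As a rough illustration of the pipeline on this slide (loss and delay → Ie and Id → R → MOSc), the sketch below uses the simplified form R = 93.2 - Id - Ie, the R-to-MOS formula from ITU-T G.107, a commonly used linear-plus-knee approximation for Id, and the G.107 Ie-eff formula for packet loss. The codec constants (Ie = 11, Bpl = 19, roughly the values tabulated for G.729A in ITU-T G.113) are assumptions for illustration, not the mapping used in the figure.

```python
R_DEFAULT = 93.2  # R0 - Is with all other G.107 parameters at their defaults


def delay_impairment(d_ms: float) -> float:
    """Id from one-way mouth-to-ear delay in ms (linear-plus-knee fit, an assumption)."""
    knee = 177.3
    return 0.024 * d_ms + (0.11 * (d_ms - knee) if d_ms > knee else 0.0)


def loss_impairment(ppl: float, ie: float = 11.0, bpl: float = 19.0,
                    burst_r: float = 1.0) -> float:
    """Effective equipment impairment Ie-eff; ppl is the packet loss in percent.

    ie and bpl defaults are assumed example codec values; burst_r = 1 means random loss.
    """
    return ie + (95.0 - ie) * ppl / (ppl / burst_r + bpl)


def r_to_mos(r: float) -> float:
    """ITU-T G.107 mapping from the R value to an estimated MOS."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1 + 0.035 * r + 7e-6 * r * (r - 60) * (100 - r)


def mos_c(loss_prob: float, delay_ms: float) -> float:
    """Conversational MOS estimate from loss probability (0-1) and one-way delay (ms)."""
    r = R_DEFAULT - delay_impairment(delay_ms) - loss_impairment(100 * loss_prob)
    return r_to_mos(r)
```

Because Id and Ie simply subtract from R, the same function can score a loss-repair scheme by feeding it the residual loss after repair and the total delay including any FEC buffering.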
Using Speech Recognition to Predict MOS
Evaluation of automatic speech recognition
(ASR) based MOS prediction
– IBM ViaVoice Linux version
– Codec used: G.729
– Performance metric
absolute word recognition ratio: $R_{abs} = \frac{\text{number of correctly recognized words}}{\text{total number of spoken words}}$
relative word recognition ratio: $R_{rel}(p) = \frac{R_{abs}(p)}{R_{abs}(0\%)}$, where $p$ is the loss probability
Recognition Ratio vs. MOS
Both MOS and R_abs decrease w.r.t. loss
Then eliminate the intermediate variable p to map R_abs directly to MOS
[Figure, three panels, G.729 codec. Impact of packet loss on audio quality: MOS vs. loss rate (%). Impact of packet loss on automatic speech recognition: word recognition ratio (%) vs. loss rate (%). Mapping from speech recognition performance to MOS: MOS vs. word recognition ratio (%).]
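Eliminating p amounts to pairing the MOS and recognition measurements taken at the same loss rates and reading MOS off the resulting curve. A minimal sketch, assuming linear interpolation between measured points and clamping outside the measured range (both choices are assumptions); the same mapping applies when R_rel is used as the input.

```python
from bisect import bisect_left
from typing import Sequence


def relative_ratio(r_abs_p: float, r_abs_0: float) -> float:
    """R_rel(p) = R_abs(p) / R_abs(0%)."""
    return r_abs_p / r_abs_0


def recognition_to_mos(r: float, rec: Sequence[float], mos: Sequence[float]) -> float:
    """Map a recognition ratio to MOS by eliminating the loss rate p.

    rec[j] and mos[j] must be measured at the same loss rate p_j. Both decrease
    with p, so the pairs (rec[j], mos[j]) trace a curve MOS(R); interpolate
    linearly on it and clamp outside the measured range.
    """
    pts = sorted(zip(rec, mos))  # ascending recognition ratio
    xs = [x for x, _ in pts]
    if r <= xs[0]:
        return pts[0][1]
    if r >= xs[-1]:
        return pts[-1][1]
    j = bisect_left(xs, r)       # first point with xs[j] >= r, so j >= 1
    (x0, y0), (x1, y1) = pts[j - 1], pts[j]
    return y0 + (y1 - y0) * (r - x0) / (x1 - x0)
```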
Speaker Dependency
Absolute performance is speaker-dependent
But the relative word recognition ratio is not
→ Suitable for MOS prediction
[Figure, three panels for Speakers A, B and C. Word recognition ratio vs. packet loss probability p (%); relative word recognition ratio R_rel vs. p (%); MOS vs. R_rel.]
Summary of QoS Measurement
Loss burstiness:
– Affects (generally worsens) perceived quality as well
as FEC performance
– May be described with, e.g., a Gilbert model
Delay correlation:
– Increases rapidly beyond a threshold, revealed through
Complementary Conditional CDF (C3DF)
– Late losses are also bursty
Perceived quality (MOS) estimation
– Analytical: the E-model
– If network statistics are unavailable: the relative word recognition ratio can provide speaker-independent MOS prediction
Outline
QoS measurement
– Objective vs. subjective metrics
– Automated measurement of subjective quality
QoS management: improving your quality
– End-to-End: FEC, LBR, PLC
– Network provisioning: voice traffic aggregation
Reality check
– Performance of VoIP end-points (IP phones, …)
– Deployment issues in VoIP
– Evaluation of VoIP service availability through Internet
measurement
Quality of FEC vs. LBR
FEC is substantially and consistently better
– At comparable bandwidth overhead
– Across all codec configurations tested
[Figure, two panels, MOS vs. loss probability (0.02-0.12). FEC vs. LBR based on G.723.1: J: FEC (2,1) vs. I: G.723.1 LBR (G.729+G.723.1 LBR). FEC vs. LBR based on AMR: N: AMR12.2+FEC (3,2) vs. M: AMR12.2+6.7 LBR.]
Quality of FEC under Bursty Loss
Packet interval T has a stronger effect on
MOS with FEC than without FEC
[Figure: MOS (Mean Opinion Score) vs. p_u (overall loss rate, 0.02-0.18) at conditional loss probability p_c = 30%, for T=40ms with FEC, T=20ms with FEC, T=20ms and T=40ms; annotated difference: 0.5-0.6 MOS.]
FEC MOS Optimization Considering Delay Effect
Larger T → higher FEC efficiency, but longer delay
Optimizing T with the E-model
– Calculate final loss probability after FEC, apply delay impairment
of FEC, map to MOSc
Prediction close to FEC MOS test results
– Suitable for analytical perceived quality prediction
[Figure, two panels. FEC MOS optimization (Id ≠ 0, d = 3T): MOS_c vs. packet interval T (20-180 ms) for p_u = 4%, 8%, 12%, 16%. FEC MOS prediction (p_c = 30%): MOS_c vs. original loss rate (0-16%), E-model prediction vs. real MOS test, both at T=40ms.]
Trade-off Analysis between Codec Robustness and FEC
3 loss repair options
– FEC, LBR, PLC
Loss-resilient codec
– Better PLC
iLBC (IETF)
– But higher bit rate
– Better than FEC?
[Figure: MOS vs. average loss probability (0-0.15) for iLBC 14kb/s, G.729 8kb/s and G.723.1 6.3kb/s.]
Observations and Results
When considering delay:
– iLBC is usually preferred in low loss conditions
– G.729 or G.723.1 + FEC better for high loss
Example: max bandwidth 14 kb/s
– Consider delay impairment (use MOSc)
[Figure: MOS_c vs. average loss probability (0-0.15) at max bandwidth 14 kb/s, for iLBC without FEC, G.729+(5,3) FEC, and G.723.1+(2,1) FEC with T=60ms.]
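The bandwidth budget in this example is simple arithmetic: (n,k) block FEC multiplies the codec rate by n/k. A small sketch, ignoring IP/UDP/RTP header overhead (an assumption; the real packet rate also depends on T). The helper names are illustrative.

```python
from typing import List, Tuple


def fec_rate(codec_kbps: float, n: int, k: int) -> float:
    """Payload bit rate (kb/s) of a codec protected by (n,k) block FEC.

    Each block carries k data packets plus n - k equal-size FEC packets.
    """
    return codec_kbps * n / k


def fits(codec_kbps: float, n: int, k: int, max_kbps: float) -> bool:
    return fec_rate(codec_kbps, n, k) <= max_kbps


def feasible_codes(codec_kbps: float, max_kbps: float, max_n: int = 6) -> List[Tuple[int, int]]:
    """All (n,k) codes with n <= max_n that stay within the bandwidth budget."""
    return [(n, k) for n in range(2, max_n + 1) for k in range(1, n)
            if fits(codec_kbps, n, k, max_kbps)]
```

This gives 8 × 5/3 ≈ 13.3 kb/s for G.729 + (5,3) and 6.3 × 2 = 12.6 kb/s for G.723.1 + (2,1), both within the 14 kb/s that iLBC uses without FEC.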
Effect of Max Bandwidth on Achievable Quality
From 14 to 21 kb/s: significant improvement in MOSc
From 21 to 28 kb/s: marginal change, due to increasing delay impairment from FEC
[Figure: MOS_c vs. average loss probability (0-0.15) for max bandwidth 14, 21 and 28 kb/s.]
Provisioning a VoIP Network
Silence detection/suppression
– Transmit only during On period, saves bandwidth
– Allows traffic aggregation through statistical multiplexing
Characteristics of On/Off patterns in VoIP
– On/Off durations traditionally found to be exponentially distributed
– Modern silence detectors (G.729B VAD, NeVoT SD) produce
different patterns
[Figure, two panels: talk-spurt/gap distribution for G.729B VAD and for NeVoT SD (default setting). Complementary CDF (log scale) vs. spurt/gap duration (in 10 ms frames), real vs. exponential spurt and gap CDFs.]
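One way to check whether measured talk-spurt and gap durations are exponential, as the traditional model assumes, is to compare their empirical complementary CDF with that of an exponential distribution of the same mean. A minimal sketch; durations are in 10 ms frames and the function names are illustrative.

```python
import math
from typing import List, Sequence


def empirical_ccdf(durations: Sequence[float], x: float) -> float:
    """Fraction of spurts (or gaps) longer than x frames."""
    return sum(d > x for d in durations) / len(durations)


def exponential_ccdf(durations: Sequence[float], x: float) -> float:
    """CCDF at x of an exponential distribution with the same mean as the data."""
    mean = sum(durations) / len(durations)
    return math.exp(-x / mean)


def ccdf_ratio(durations: Sequence[float], xs: Sequence[float]) -> List[float]:
    """Empirical-to-exponential CCDF ratio; values far from 1 flag a non-exponential shape."""
    return [empirical_ccdf(durations, x) / exponential_ccdf(durations, x) for x in xs]
```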
Traffic Aggregation Simulation
Token bucket filter with N sources; R: ratio of reserved to peak bandwidth
CDF model resembles trace model in most cases
Exponential (traditional) model
– Under-predicts out-of-profile packet probability
– Under-prediction ratio grows as token buffer size B grows
Similar results for NeVoT SD
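The aggregation experiment can be sketched as a slotted simulation: N on/off sources each send one packet per 10 ms frame while on, the token bucket refills at R × N tokens per frame up to B tokens, and a packet that finds no token is out of profile. The on/off samplers are pluggable so that exponential durations can be swapped for trace-driven ones. All names and the exponential sampler are assumptions for illustration, not the setup behind the results above.

```python
import math
import random
from typing import Callable

Sampler = Callable[[random.Random], int]  # returns a period length in frames (>= 1)


def exp_frames(mean: float) -> Sampler:
    """Exponentially distributed durations, rounded up to whole frames."""
    return lambda rng: max(1, math.ceil(rng.expovariate(1.0 / mean)))


def out_of_profile(n: int, r: float, bucket: float, on: Sampler, off: Sampler,
                   frames: int, seed: int = 3) -> float:
    """Fraction of packets that find no token in the bucket."""
    rng = random.Random(seed)
    src = []  # per source: [currently on?, frames left in the current period]
    for _ in range(n):
        is_on = rng.random() < 0.5
        src.append([is_on, on(rng) if is_on else off(rng)])
    rate = r * n          # tokens per frame; peak rate is 1 packet/frame/source
    tokens = bucket       # start with a full bucket
    sent = nonconforming = 0
    for _ in range(frames):
        tokens = min(bucket, tokens + rate)
        for s in src:
            if s[0]:
                sent += 1
                if tokens >= 1:
                    tokens -= 1
                else:
                    nonconforming += 1
            s[1] -= 1
            if s[1] == 0:  # period over: switch between on and off
                s[0] = not s[0]
                s[1] = on(rng) if s[0] else off(rng)
    return nonconforming / sent if sent else 0.0
```

Comparing out_of_profile with exponential samplers against samplers that draw from measured spurt/gap durations reproduces the kind of comparison described on this slide.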
Summary of QoS Management
End-to-End
– FEC is superior in quality to LBR
– Codec robustness is better than FEC in low loss conditions
Combining both schemes brings the best of both
Network provisioning
– Observation: new silence detectors (G.729B, NeVoT SD) → non-exponential voice On/Off patterns
– Result: performance of voice traffic aggregation under the new On/Off patterns
– Important in traffic engineering and Service Level
Agreement (SLA) validation
Outline
QoS measurement
– Objective vs. subjective metrics
– Automated measurement of subjective quality
QoS management: improving your quality
– End-to-End: FEC, LBR, PLC
– Network provisioning: voice traffic aggregation
Reality check
– Performance of end-points (IP phones, …)
– Deployment issues in VoIP
– Assessment of VoIP service availability through Internet
measurement
Mouth-to-ear Delay of VoIP End-points
All receivers can adjust M2E delay adaptively whenever it
is too low or too high
M2E delay depends mainly on receiver (esp. RAT)
HW phones have relatively low delay (~45-90ms)
[Figure, two panels. Effect of Sender and Receiver: M2E delay (ms) for each receiver (3Com, Cisco, Mediatrix, Pingtel, RAT) and each sender (3Com, Cisco, Mediatrix, Pingtel, RAT). M2E delay (ms) vs. time (sec) for experiments 1-1 and 1-2, with silence gaps marked.]
But Adaptiveness ≠ Perfection
Symptom of playout buffer underflow: waveforms are dropped
Occurred at the point of delay adjustment
Bugs in software?
Does a LAN guarantee perfect quality?
Major Observations
Overall: end-points matter a lot!
HW IP phones: 45-90ms average M2E delay
SW clients:
– Messenger 2000 lowest (68 ms), XP (96-120 ms)
cf. GSM ↔ PSTN: 110 ms in either direction
– NetMeeting very bad (> 400ms)
PLC robustness
– Acceptable on all 3 IP phones tested; the Cisco phone is more robust
Silence detection/suppression
– Works for speech input
– Often fails for non-speech (e.g., music) input
Generates many unnatural gaps
Not good for customer support center (on-hold music)!
Acoustic echo cancellation (AEC):
– Good on most IP phones (Echo Return Loss > 40 dB)
– But some do not implement AEC at all
Reality Check #2: IP Telephony Deployment
Localized deployment at Columbia Univ.
[Figure: deployment architecture. Regular phones on the telephone switch/PBX, connected via T1/E1 to a SIP/PSTN gateway; IP phones using RTP/SIP; core server with sipd (SIP proxy, redirect server), an SQL database, and a Web server for web-based configuration; conference server; voicemail server; server status monitoring.]
Issues and Lessons Learned
PSTN/PBX integration
– Requires full understanding of legacy networks
Lower layer (e.g., T1 line configuration)
– Parameters must match on both PSTN/PBX and gateway!
PBX access configurations
– To ensure calls go through in both directions
Address translation (dial-plan) in both directions
– Previous lessons/experiences can help greatly
E.g., second gateway installed in weeks instead of months
Security
– Issue: SIP/PSTN gateway has no authentication feature
– Solution:
Use gateway’s access control lists to block direct calls
SIP proxy server handles authentication using record-route
Reality Check #3: VoIP Service Availability
Focus on availability rather than traditional QoS
– Delay is a minor issue; FEC recovers most isolated losses
– Ability to make a call is vital, especially in emergency
Internet measurement sites:
– 14 nodes worldwide, not just Internet2 and similar networks
Definitions:
– Availability = MTBF / (MTBF + MTTR)
– Availability = successful calls / first call attempts
Equipment availability: 99.999% (“5 nines”) ≈ 5 minutes of downtime per year
AT&T: 99.98% availability (1997)
IP frame relay SLA: 99.9%
UK mobile phone survey: 97.1-98.8%
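The two availability definitions and the "5 nines" figure reduce to a few lines of arithmetic. A minimal sketch, assuming a 365-day year; the function names are illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 (non-leap year)


def availability_mtbf(mtbf: float, mttr: float) -> float:
    """Availability = MTBF / (MTBF + MTTR), both in the same time unit."""
    return mtbf / (mtbf + mttr)


def availability_calls(successful: int, first_attempts: int) -> float:
    """Call-based availability = successful calls / first call attempts."""
    return successful / first_attempts


def downtime_minutes_per_year(availability: float) -> float:
    return (1 - availability) * MINUTES_PER_YEAR
```

For example, 99.999% availability leaves about 5.3 minutes of downtime per year, 99.98% about 105 minutes, and 99.9% about 526 minutes (8.8 hours).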
First Look at Availability
Call success probability:
– 62,027 calls succeeded, 292 failed → 99.53% availability
– Roughly constant across I2, I2+ and commercial ISPs: 99.39-99.58%
Overall network loss
– PSTN: once connected, a call is usually of good quality (exception: mobile phones)
– Compute % of time below a loss threshold
5% loss causes degradation for many codecs
Others are acceptable up to 20% loss
% of time below loss threshold
Paths      0%     5%      10%     20%
All        82.3   97.48   99.16   99.75
ISP        78.6   96.72   99.04   99.74
I2         97.7   99.67   99.77   99.79
I2+        86.8   98.41   99.32   99.76
US         83.6   96.95   99.27   99.79
Int.       81.7   97.73   99.11   99.73
US ISP     73.6   95.03   98.92   99.79
Int. ISP   81.2   97.60   99.10   99.71
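The table entries can be computed from per-interval loss measurements. A minimal sketch, treating "below" as "at or below" so that the 0% column counts loss-free intervals (an assumption about how the table was computed); the function name is illustrative.

```python
from typing import Dict, Sequence


def percent_time_within(loss_rates: Sequence[float],
                        thresholds: Sequence[float]) -> Dict[float, float]:
    """Percentage of measurement intervals whose loss rate is at or below each threshold."""
    n = len(loss_rates)
    return {th: 100.0 * sum(r <= th for r in loss_rates) / n for th in thresholds}
```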
Network Outages
Sustained packet losses
– Arbitrarily defined at 8 consecutive packets
– Far beyond what is recoverable (FEC, interpolation)
23% of packet losses are outages
Make up a significant part of the 0.25% unavailability
Symmetric: A→B and B→A
Spatially correlated: A→B and A→X
Not correlated across networks (e.g., I2 and commercial)
Mostly short (a few seconds), but some are very long (100's of seconds) and make up the majority of outage time
[Figure, two panels: complementary CDF (log scale) of outage duration (sec, 0-400); US domestic vs. international paths, and all paths vs. Internet2.]
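Detecting outages in a packet trace means finding runs of consecutive losses at least as long as the threshold (8 packets here). A minimal sketch; converting run lengths to seconds by multiplying by the packet interval is an assumption, and the function names are illustrative.

```python
from typing import List, Sequence


def find_outages(lost: Sequence[bool], min_run: int = 8) -> List[int]:
    """Lengths (in packets) of loss runs with at least min_run packets."""
    outages, run = [], 0
    for is_lost in list(lost) + [False]:  # sentinel flushes a trailing run
        if is_lost:
            run += 1
        else:
            if run >= min_run:
                outages.append(run)
            run = 0
    return outages


def outage_loss_share(lost: Sequence[bool], min_run: int = 8) -> float:
    """Fraction of all lost packets that belong to outages."""
    total = sum(lost)
    return sum(find_outages(lost, min_run)) / total if total else 0.0


def outage_durations_s(lost: Sequence[bool], interval_s: float, min_run: int = 8) -> List[float]:
    """Outage durations in seconds, assuming a fixed packet interval."""
    return [n * interval_s for n in find_outages(lost, min_run)]
```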
Outage-induced Call Abortion Probability
Long interruption → user likely to abandon the call
From the E.855 survey: $P[\text{holding}] = e^{-t/17.26}$ (t in seconds)
→ Half the users will abandon the call after 12 s
2,566 calls have at least one outage
946 of these 2,566 are expected to be dropped → 1.53% of all calls
Expected share of calls dropped due to outages
Paths      Dropped
All        1.53%
I2         1.16%
I2+        1.15%
ISP        1.82%
US         0.99%
Int.       1.78%
US ISP     0.86%
Int. ISP   2.30%
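The expected number of dropped calls follows from the holding-time model: an outage of t seconds is survived with probability e^(-t/17.26). A minimal sketch, assuming each outage in a call is survived independently (an assumption, not necessarily the exact procedure used for the numbers above); the function names are illustrative.

```python
import math
from typing import Sequence

HOLD_TIME_CONSTANT = 17.26  # seconds, from P[holding] = exp(-t / 17.26)


def abandon_probability(outage_s: float) -> float:
    """Probability that a user hangs up during an outage lasting outage_s seconds."""
    return 1.0 - math.exp(-outage_s / HOLD_TIME_CONSTANT)


def expected_dropped_calls(calls: Sequence[Sequence[float]]) -> float:
    """Expected number of abandoned calls; each call lists its outage durations (s)."""
    total = 0.0
    for outages in calls:
        survive = 1.0
        for t in outages:
            survive *= 1.0 - abandon_probability(t)
        total += 1.0 - survive
    return total
```

The half-life 17.26 · ln 2 ≈ 12 s matches the observation that half of the users abandon the call after about 12 s.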
Summary of Service Availability
Through several metrics, one can translate network loss into VoIP service availability (i.e., whether there is an "Internet dial tone")
Current results show availability far below five
9’s, but comparable to mobile telephony
– Outage statistics are similar in research and ISP
networks
Working on identifying fault sources and locations
Additional measurement sites are welcome
Conclusions
Measuring QoS
– Loss burstiness and delay correlation affect (generally worsen) perceived quality
– Bridging objective and subjective metrics: the E-model, or speech
recognition based MOS prediction
– Performance of real products: IP phones and soft clients
Ensuring/improving QoS
– Network provisioning (voice traffic aggregation)
Efficient, but may be expensive to deploy and manage
– End-to-End (FEC > LBR, PLC)
Easier to deploy, but must control overhead of FEC
Reality Check
– Good implementation at the end-point (e.g., IP phones) is vital
– VoIP deployment requires PSTN integration and security
– Service availability is crucial for VoIP, but still far from 99.999%
over the Internet
Ongoing and Future Work
Sampling Internet performance
– Where do the problems reside?
Access networks (Cable, DSL), or
International paths?
– How can we solve these problems?
Can adaptive FEC react fast enough to changes in
network conditions?
Playout delay behaviors of VoIP end-points
– How well do they react to jitter, delay spikes?