Carrier-grade vs. Internet VoIP

Henning Schulzrinne
(with Wenyu Jiang)
Columbia University
FCC Technical Advisory Council III
Washington, DC – October 20, 2003
Overview
- Previous talk: interactive communication services
  - signaling & media
- Now focus on overall architecture:
  - network & service availability
    - signaling services: SIP, H.323
    - supporting services: DNS, DHCP, LDAP, …
    - network transport
  - network quality-of-service
    - packet loss, delay, jitter
Overview
(on-going work, preliminary results, still looking for measurement sites, …)
- Service availability
- Measurement setup
- Measurement results
  - call success probability
  - overall network loss
  - network outages
  - outage-induced call abortion probability
Service availability
- Users do not care about QoS
  - at least not about packet loss, jitter, delay
  - rather, it’s service availability → how likely is it that I can place a call and not get interrupted?
- availability = MTBF / (MTBF + MTTR)
  - MTBF = mean time between failures
  - MTTR = mean time to repair
- availability = successful calls / first call attempts
- equipment availability: 99.999% (“5 nines”) → 5 minutes/year
- Sprint IP frame relay SLA: 99.5%
- AT&T (2003):

  Service               Availability
  Long-distance voice   99.978%
  ATM data              99.999%
  Frame relay data      99.998%
  IP                    99.991%
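
As a rough illustration of the two quantities on this slide, the sketch below computes availability from MTBF and MTTR and converts an availability figure into expected downtime per year. The function names and the 365-day year are assumptions made for this example, not part of the talk.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; assumes a non-leap year


def availability(mtbf, mttr):
    """Steady-state availability; mtbf and mttr must use the same time unit."""
    return mtbf / (mtbf + mttr)


def downtime_minutes_per_year(avail):
    """Expected minutes of unavailability per year for a given availability."""
    return (1.0 - avail) * MINUTES_PER_YEAR


if __name__ == "__main__":
    # Availability figures quoted on the slide.
    for label, avail in [("5 nines", 0.99999),
                         ("AT&T long-distance voice", 0.99978),
                         ("Sprint IP frame relay SLA", 0.995)]:
        print(f"{label}: {downtime_minutes_per_year(avail):.1f} min/year")
```

With these definitions, 99.999% works out to a little over 5 minutes per year, matching the slide, while a 99.5% SLA allows roughly 44 hours per year.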
Availability – PSTN metrics
- PSTN metrics (World Bank study):
  - fault rate
    - “should be less than 0.2 per main line”
  - fault clearance (~ MTTR)
    - “next business day”
  - call completion rate
    - during network busy hour
    - “varies from about 60% - 75%”
  - dial tone delay
Example PSTN statistics
[Figure not reproduced. Source: World Bank]
Measurement setup
Node name        Location                            Connectivity   Network
columbia         Columbia University, NY             >= OC3         I2
wustl            Washington U., St. Louis                           I2
unm              Univ. of New Mexico                                I2
epfl             EPFL, Lausanne, CH                                 I2+
hut              Helsinki University of Technology                  I2+
rr               NYC                                 cable modem    ISP
rrqueens         Queens, NY                          cable modem    ISP
njcable          New Jersey                          cable modem    ISP
newport          New Jersey                          ADSL           ISP
sanjose          San Jose, California                cable modem    ISP
suna             Kitakyushu, Japan                   3 Mb/s         ISP
sh               Shanghai, China                     cable modem    ISP
Shanghaihome     Shanghai, China                     cable modem    ISP
Shanghaioffice   Shanghai, China                     ADSL           ISP
Measurement setup
- Active measurements
- call duration 3 or 7 minutes
- UDP packets:
  - 36 bytes alternating with 72 bytes (FEC)
  - 40 ms spacing
- September 10 to December 6, 2002
- 13,500 call hours
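
The probe pattern described above can be sketched as a send schedule. This is only an illustration: the function names are hypothetical, and the 36 and 72 byte figures are assumed here to be payload sizes, which the slide does not specify.

```python
from itertools import cycle, islice

SPACING_S = 0.040          # 40 ms between packets (from the slide)
PAYLOAD_SIZES = (36, 72)   # bytes, alternating; assumed to be payload sizes


def probe_schedule(call_minutes):
    """Return a list of (send_time_seconds, size_bytes) for one test call."""
    n_packets = round(call_minutes * 60 / SPACING_S)
    sizes = islice(cycle(PAYLOAD_SIZES), n_packets)
    return [(i * SPACING_S, size) for i, size in enumerate(sizes)]


def payload_bitrate(schedule):
    """Average bit rate in bits per second over the whole schedule."""
    total_bits = 8 * sum(size for _, size in schedule)
    return total_bits / (len(schedule) * SPACING_S)


if __name__ == "__main__":
    for minutes in (3, 7):
        sched = probe_schedule(minutes)
        print(minutes, "min:", len(sched), "packets,",
              round(payload_bitrate(sched)), "bit/s")
```

Under these assumptions a 3-minute call carries 4,500 probe packets and a 7-minute call 10,500, at an average of about 10.8 kb/s.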
Call success probability
- 62,027 calls succeeded, 292 failed → 99.53% availability
- roughly constant across I2, I2+, commercial ISPs

Path category              Call success
All                        99.53%
Internet2                  99.52%
Internet2+                 99.56%
Commercial                 99.51%
Domestic (US)              99.45%
International              99.58%
Domestic commercial        99.39%
International commercial   99.59%
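
The headline figure follows from the definition given earlier, availability = successful calls / first call attempts. A minimal sketch, with a function name chosen for this example:

```python
def call_success_rate(succeeded, failed):
    """Fraction of call attempts that succeeded."""
    attempts = succeeded + failed
    if attempts == 0:
        raise ValueError("no call attempts")
    return succeeded / attempts


if __name__ == "__main__":
    # Counts from the slide: 62,027 succeeded, 292 failed.
    rate = call_success_rate(62_027, 292)
    print(f"{100 * rate:.2f}%")
```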
Overall network loss
- PSTN: once connected, call usually of good quality
  - exception: mobile phones
- compute periods of time below loss threshold
  - 5% causes degradation for many codecs
  - others acceptable till 20%

Percentage of time at or below each loss threshold:

Paths      0%     5%      10%     20%
All        82.3   97.48   99.16   99.75
ISP        78.6   96.72   99.04   99.74
I2         97.7   99.67   99.77   99.79
I2+        86.8   98.41   99.32   99.76
US         83.6   96.95   99.27   99.79
Int.       81.7   97.73   99.11   99.73
US ISP     73.6   95.03   98.92   99.79
Int. ISP   81.2   97.60   99.10   99.71
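
The per-threshold percentages can be thought of as the share of measurement intervals whose loss rate stays within a threshold. The sketch below assumes equal-length intervals and an inclusive comparison (so the 0% column counts loss-free intervals); the sample values are made up for illustration and are not measurement data.

```python
def fraction_within_loss(loss_rates, threshold):
    """Share of intervals whose loss rate is at or below the threshold."""
    if not loss_rates:
        raise ValueError("need at least one measurement interval")
    within = sum(1 for rate in loss_rates if rate <= threshold)
    return within / len(loss_rates)


if __name__ == "__main__":
    # Hypothetical per-interval loss rates (fraction of packets lost).
    sample = [0.0, 0.0, 0.02, 0.0, 0.08, 0.0, 0.25, 0.0, 0.0, 0.12]
    for threshold in (0.0, 0.05, 0.10, 0.20):
        share = fraction_within_loss(sample, threshold)
        print(f"loss <= {threshold:.0%}: {share:.0%} of intervals")
```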
Network outages
- sustained packet losses
  - arbitrarily defined at 8 packets
  - far beyond any recoverable loss (FEC, interpolation)
- 23% outages
- make up significant part of 0.25% unavailability
- symmetric: A→B ⇒ B→A
- spatially correlated: A→B ⇒ A→X
- not correlated across networks (e.g., I2 and commercial)
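
One way to extract outages from a packet trace under the definition above is to look for runs of consecutive losses. This sketch reads "8 packets" as "at least 8 consecutive lost packets" and uses the 40 ms probe spacing to convert run lengths to seconds; both the interpretation and the function name are assumptions for this example.

```python
from itertools import groupby


def find_outages(received, min_run=8, spacing_s=0.040):
    """Find runs of consecutive lost packets.

    received: sequence of booleans, one per expected packet (True = arrived).
    Returns a list of (start_index, length_in_packets, duration_seconds).
    """
    outages = []
    index = 0
    for arrived, group in groupby(received):
        run = len(list(group))
        if not arrived and run >= min_run:
            outages.append((index, run, run * spacing_s))
        index += run
    return outages


if __name__ == "__main__":
    # Hypothetical trace: a 3-packet loss burst (ignored) and a 10-packet outage.
    trace = [True] * 5 + [False] * 3 + [True] * 2 + [False] * 10 + [True] * 4
    print(find_outages(trace))
```

Shorter loss bursts are deliberately ignored here, since they may still be recoverable by FEC or interpolation.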
Network outages
[Figure not reproduced: two plots of the complementary CDF (log scale) of outage duration in seconds, 0 to 400 s, with curves for US domestic paths, international paths, all paths, and Internet2]
Network outages

Paths   no. of    %           mean duration   median duration   total       outages > 1000
        outages   symmetric   (packets)       (packets)         (all, h:m)  packets (h:m)
all     10,753    30%         145             25                17:20       10:58
I2      819       14.5%       360             25                3:17        2:33
I2+     2,708     10%         259             26                7:47        5:37
ISP     8,045     37%         107             24                9:33        4:58
US      1,777     18%         269             20                5:18        3:53
Int.    8,976     33%         121             26                12:02       6:42
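
As a consistency check on the table, the total outage time should be roughly the number of outages times the mean outage duration. The sketch below assumes the mean duration is expressed in packets at the 40 ms probe spacing; that unit is an inference from the numbers, not something the slide states, and the function name is hypothetical.

```python
def total_outage_hm(n_outages, mean_packets, spacing_s=0.040):
    """Approximate total outage time as an 'h:mm' string."""
    total_min = round(n_outages * mean_packets * spacing_s / 60)
    return f"{total_min // 60}:{total_min % 60:02d}"


if __name__ == "__main__":
    # (paths, number of outages, mean duration) taken from the table above.
    rows = [("all", 10_753, 145), ("I2", 819, 360), ("I2+", 2_708, 259),
            ("ISP", 8_045, 107), ("US", 1_777, 269), ("Int.", 8_976, 121)]
    for name, count, mean in rows:
        print(name, total_outage_hm(count, mean))
```

Under that assumption the computed totals land within a minute or two of the tabulated values, with the small differences attributable to the rounded means.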
Outage-induced call abortion probability
- Long interruption → user likely to abandon call
- from E.855 survey: P[holding] = e^{-t/17.26} (t in seconds)
  - → half the users will abandon call after 12 s
- 2,566 have at least one outage
- 946 of 2,566 expected to be dropped → 1.53% of all calls

Paths      Calls expected to be dropped
all        1.53%
I2         1.16%
I2+        1.15%
ISP        1.82%
US         0.99%
Int.       1.78%
US ISP     0.86%
Int. ISP   2.30%
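
The holding-time model above lends itself to a short worked example. The sketch evaluates P[holding], derives the 12-second half-abandonment point, and estimates expected dropped calls from a list of outage durations. Treating each call's abandonment probability as determined by its single longest outage is a simplifying assumption of this example; the slide does not say how multiple outages per call were combined.

```python
import math

TAU_S = 17.26  # time constant from the E.855 survey fit quoted on the slide


def p_holding(t_seconds):
    """Probability that a user is still holding after an interruption of t seconds."""
    return math.exp(-t_seconds / TAU_S)


def median_abandon_time():
    """Interruption length at which half the users have abandoned the call."""
    return TAU_S * math.log(2)


def expected_dropped_calls(longest_outage_s_per_call):
    """Expected number of abandoned calls, one outage duration per call (assumed)."""
    return sum(1.0 - p_holding(t) for t in longest_outage_s_per_call)


if __name__ == "__main__":
    print(f"half of users abandon after {median_abandon_time():.1f} s")
    # Hypothetical longest-outage durations (seconds) for four calls.
    print(f"{expected_dropped_calls([1.0, 5.0, 12.0, 60.0]):.2f} calls expected dropped")
```

For scale, 946 expected drops against the 62,027 successful calls reported earlier gives the 1.53% figure.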
Conclusions from measurement
- Availability in space is (mostly) solved → availability in time restricts usability for new applications
- initial investigation into service availability for VoIP
- need to define metrics for, say, web access
  - unify packet loss and “no Internet dial tone”
- far less than “5 nines”
- working on identifying fault sources and locations
- looking for additional measurement sites
What’s next?
- Existing SLAs are mostly useless
  - too many exceptions
  - wrong time scales: month vs. minutes
  - no guarantees for interconnects
- Existing measurements similarly dubious
- Limited ability to learn from mistakes
  - what are the primary causes of service unavailability?
  - what can I do to protect myself – multi-homing via same fiber? diverse access mechanisms?
- Consumers of services have no good ways to compare service availability
  - only some very large customers may get access to carrier-internal data
- Thus, market failure
- Need published metrics
  - similar to switch availability reporting