Understanding VoIP
Dr. Jonathan Rosenberg
Chief Technology Strategist
Skype
What is this course about?
Getting “under the hood” and understanding
how VoIP works
An exploration of the protocols and
technologies behind VoIP
Conveying an understanding of the various
problems that need to be solved for VoIP to
work
What this course is not about
A general introduction to telephony
A detailed cookbook or deployment guide to
VoIP
A product survey of VoIP and IP telephony
products
In particular, Cisco or Skype products are not
discussed except in passing
Ground Rules
Ask Questions ANY TIME!
I will be bored if this is a one way
conversation
No question is too stupid
Laughing at or mocking anyone's questions is
unacceptable
Please ask off-the-wall or exploratory
questions – there is a lot that is not in
here!
Agenda
Breaking up the problem
Voice and Video coding
Voice and Video Transport
Quality of Service
Signaling
Security
NAT Traversal
Non-Agenda
Programming APIs
Emergency Services, Lawful Intercept
Numbering, Routing, Naming (ENUM, TRIP)
PSTN Interworking
Billing, Provisioning, OAM
Conferencing, IVR, Applications
Breaking Up the Problem
Directories
Databases
Accounting
Billing
LDAP,
ENUM
IP
RADIUS
DIAMETER
Application
Server
SIP
Signaling
Servers
Presence
Servers
Media
Servers
OAM
SIP, H.323,
MGCP,H.248 IP Network
Endpoint
SIMPLE,
XMPP
RTP
Endpoint
Voice Coding
Voice Endpoint Model
No Speech
+
Hybrid
DTMF/
Tone
Detection
Nonlinear
Processing
Echo
Canceller
2-wire interface
Packetizer
Speech
Decoding
Unpacker
Silence
Detection
Loss
Admin
DTMF/
Tone
Generation
Speech
Encoding
Comfort
Noise
Generation
Speech
Codecs
Waveform codecs:
Directly encode speech in an efficient way by
exploiting temporal and/or spectral
characteristics
Attempt to reproduce input signal’s waveform
by minimizing error between input and coded
signals
Source codecs / vocoders:
Estimate and efficiently encode a parametric
representation of speech
CELP
Minimizes perceptually
weighted error
similar to waveform coders
Short-term predictor is LP
(vocal tract) filter
Excitation is obtained
from codebook and longterm pitch predictor
Closed-loop search is
MIPS intensive
Codec Comparison
Codec     Sampling                  Bitrate          Latency   Comments
G.711     8 kHz                     64 kbps          125 us    PSTN Codec
G.729     8 kHz                     8 kbps           10 ms     CS-ACELP
G.723.1   8 kHz                     5.3/6.3 kbps     37.5 ms
AMR       8 kHz                     4.75-12 kbps     25 ms     GSM codec
G.722.1   16 kHz                    24/32 kbps       40 ms     Polycom SIREN
AMR-WB    16 kHz                    6.6-23.85 kbps   25 ms     GSM Wideband, encumbered
SILK      8, 12, 16, 24 kHz (SWB)   6-40 kbps        25 ms     Skype codec
Listen at: http://www.voiceage.com/listeningroom.php
Echo Cancellation
ERL: Echo Return
Loss (dB)
ERLE: Echo Return
Loss Enhancement
Double-talk
Convergence time
Analog
+ ERLE Non-Linear
Processor
Reflection
ERL
2-4-wire
Hybrid
Echo
Path
Estimation
Packet
Network
Echo Canceller
Digital
This echo canceller cancels
‘local’ echoes from the hybrid
reflection
Echo Canceller Specifics
The voice echo path is like an electrical circuit
If a ‘break’ (cancellation) is made anywhere in the ‘circuit’, you will
eliminate the echo
The easiest place to make the break is with a canceller ‘looking
into’ the local analog/digital telephony network, NOT the packet
network (which has much longer and variable delays)
The echo canceller at the other end of the call eliminates the
echoes that YOU hear, and vice versa
Echo canceller coverage (e.g. 32 ms) is the maximum length of
echo impulse response that can be cancelled from the local
analog/digital network (the packet network delay does not matter)
The non-linear processor is used to ‘clean-up’ any residual echo
left over from the canceller
Voice Activity Detection
Speech Magnitude (dB)
Speech Detected
Speech Detected
Hang-Over
Hang-Over
Typically fixed
at 200 ms
Sentence 1
Signal-to-Noise
Threshold
Sentence 2
Noise Floor
time
Front-end
Speech Clipping
Front-end
Speech Clipping
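The hang-over behaviour in the figure can be sketched in a few lines. This is a minimal illustration, not a production VAD: it assumes each frame has already been reduced to a single level in dB, and the function name, the threshold, and the frame values are all hypothetical.

```python
def vad_with_hangover(levels_db, threshold_db, hangover_frames):
    """Return one speech/no-speech decision per frame.

    A frame counts as speech when its level exceeds the threshold.
    After speech ends, the detector keeps reporting speech for
    `hangover_frames` more frames so word endings are not clipped.
    """
    decisions = []
    remaining = 0  # hang-over frames still available
    for level in levels_db:
        if level > threshold_db:
            decisions.append(True)
            remaining = hangover_frames  # re-arm the hang-over timer
        elif remaining > 0:
            decisions.append(True)
            remaining -= 1
        else:
            decisions.append(False)
    return decisions


# Hypothetical frame levels: silence, two speech frames, then silence.
frames = [-60, -20, -18, -60, -60, -60]
print(vad_with_hangover(frames, threshold_db=-40, hangover_frames=2))
```

With 20 ms frames (an assumed frame size), the 200 ms hang-over from the slide would correspond to hangover_frames=10. Front-end clipping is not addressed here: the first frame of a talkspurt is only detected once its level crosses the threshold.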
Comfort Noise Generation
Silence isn’t golden…it’s annoying
Simple techniques:
When speech stops…what do you play to the
listener?
Play white/pink noise
Replay last receiver packet over and over
Fancier technique:
Transmitter measures local “noise environment”
Transmitter sends special “comfort noise” packet
as last packet before silence
Receiver generates noise based on the CN packet.
Voice Quality:
Mean Opinion Scores
[Figure: MOS test setup with blocks Source, Channel Simulation, Impairment, and Codec ‘X’; listeners rate each sample from 1 to 5. Test sentence: “Nowadays, a chicken leg is a rare dish”]
Rating   Speech Quality   Distortion
5        Excellent        Imperceptible
4        Good             Just perceptible but not annoying
3        Fair             Perceptible and slightly annoying
2        Poor             Annoying but not objectionable
1        Unsatisfactory   Very annoying and objectionable
MOS of 4.0 = Toll Quality
Clear Channel MOS’s
Codec                              Mean Opinion Score
G.711 (64 kbit/s PCM)              4.1
G.726 (32 kbit/s ADPCM)            3.8
G.723.1 (6.4 kbit/s MP-MLQ)        3.9
G.729 (8 kbit/s CS-ACELP)          3.9
IS-54 (8 kbit/s NA Dig Cellular)   3.4
MOS Under Varying Conditions
G.729
Avg Speech Level (-20 dBm0)
Low Input Level (-30 dBm0)
2 Tandem codings
3 Tandem codings
1% Frame Erasure Rate
5% Bit Error Rate
5% FER
10% FER
20% FER
3.85
3.54
3.46
2.68
3.24
3.02
Video Coding
Key Terms
Term: Description
Frame: An individual picture in a sequence that makes up the video
Frame Rate: The number of frames per second in video. 30 is excellent (TV quality)
Resolution: The number of horizontal and vertical pixels. VGA=640x480.
Interlacing: A mechanism for transmitting video by splitting a frame into two fields, one field representing the odd lines and the other the even lines. This is the “i” in 1080i
Progressive: As opposed to interlaced, a method for transmitting video by sending each frame as a whole.
HD: High Def resolutions. 720p is 1280x720 at 60fps. 1080i is 1920x1080 at 30fps
Key Concept: Macroblocks
Rectangular block in
an image which is
a basic unit of
compression. Typically
16x16 pixels.
Key Concept: Inter-Frame Prediction
Encode
Predict information in the current frame by looking at previous frames,
possibly taking into account motion.
Key Concept: Discrete Cosine
Transform (DCT)
Increasing vertical frequencies
Increasing horizontal frequencies
A technique for representing a
macroblock by its component
frequencies. Discarding the higher
frequencies throws away the finer
details without losing the core image.
Video Encoder Block Diagram
Key Codec Comparisons
Codec (Timeline): Applications
H.261 (1990): ISDN at multiples of 64kbps
H.263 (1996): Early Flash using Sorenson Spark implementation. Original RealVideo codec. Required in IMS.
H.264 AVC (2003): YouTube, iTunes, Blu-ray; most modern video conferencing. The current primary video codec for real-time. Typical VGA 15fps bitrate = 500kbps
H.264 SVC (2007): “Layered” video that provides improved quality and resilience; ideal for multiparty video conferencing.
VP7 (2005): On2 Technologies codec; Skype, successor to H.263 in Flash
Voice and Video Transport:
RTP
RTP: What is it?
Real Time Transport Protocol
RFC 3550
product of avt working group
1996 proposed standard –
RFC1889
2004 full standard
What does it do
e2e transport of real time media
optimized for multicast
provides sequencing, timing,
framing, loss detection
provides feedback on reception
quality
What does it do (cont)
provides information on
group members
provides data to correlate
audio and video and
other media
Works with any codec
need payload format for
each codec
Flexible
RTP: What isn’t it?
Doesn’t guarantee quality of
service
doesn’t reserve network
resources
doesn’t guarantee no loss or
bounded delay
can work with QoS protocols
(RSVP)
Doesn’t provide signaling
other protocols must be used
to set up RTP (like SIP or
H.323)
Not a specific protocol
type
Does not run directly
on top of IP
Runs on top of UDP
No fixed port number
RTP Stack
RTP
RTCP
UDP
IP
Big Picture: RTP, SDP and SIP
c=IN IP4 123.1.2.3
m=audio 1122 RTP/AVP 0 1
m=video 1130 RTP/AVP 98
a=rtpmap:98 h263
SIP w/ SDP
Proxy
Proxy
End
End
User
IP Network
User
RTP
RTP Components: Data + Control
Data aka RTP
very confusing
Usually on an even UDP
port (NATs change this –
later)
Provides
sequencing
timing
framing
content labeling
User identification
Control = Real Time
Control Protocol (RTCP)
Same address as data,
but one higher port
usually
Provides
reception quality
sender statistics
participant information
(multicast)
synchronization
information
Real Time Data Transport
Originator breaks stream into
packets (segmentation)
application layer framing
(ALF)!!!
RTP Source
Packets sent; network may
lose, delay, reorder packets
Must, at receiver:
reorder
recover
resegment
resynchronize
clock synchronization!
RTP
Packets
RTP Sink
Transport System
Source
Digitize Audio from mike
Silence Suppression
Echo cancellation
Compress Audio
G.711: 64 kbps
G.729: 8 kbps
G.723.1: 5.3/6.3 kbps
Packetize Audio in RTP
Send
Sink
Receive packets
Un-packetize
decompress
comfort noise
generation
reorder
recover loss
jitter buffer
D/A conversion to
speakers
Jitter Buffer
Packets delayed
differently
Must play them out
periodically
pkts
Packets may arrive after
designated playout time
-> loss
Insert extra delay to
compensate
May need to adapt this
amount
time
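A fixed jitter buffer can be sketched as follows. This is a simplified illustration under stated assumptions: the first packet's arrival anchors the playout clock, the buffer depth never adapts, and the function name and sample times are hypothetical.

```python
def playout_times(packets, buffer_ms):
    """Schedule playout for packets given as (send_ms, arrival_ms) pairs.

    The first packet is held for `buffer_ms` after it arrives. Every
    later packet is played at the same offset from the first as it was
    sent, so playout is periodic even though arrivals are not.
    Returns a list of (playout_ms, is_late) tuples; a late packet
    arrived after its playout time and is effectively lost.
    """
    first_send, first_arrival = packets[0]
    base = first_arrival + buffer_ms
    schedule = []
    for send_ms, arrival_ms in packets:
        playout = base + (send_ms - first_send)
        schedule.append((playout, arrival_ms > playout))
    return schedule


# Hypothetical 20 ms packets; the third one is delayed by the network.
packets = [(0, 50), (20, 75), (40, 135), (60, 112)]
print(playout_times(packets, buffer_ms=40))  # third packet misses its slot
print(playout_times(packets, buffer_ms=60))  # deeper buffer absorbs the jitter
```

The trade-off is visible in the two calls: the deeper buffer avoids the late packet, at the cost of 20 ms more end-to-end delay on every packet. An adaptive buffer would adjust buffer_ms between talkspurts.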
RTP Packet Header
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|V=2|P|X|  CC   |M|     PT      |       sequence number         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           timestamp                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           synchronization source (SSRC) identifier            |
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
|            contributing source (CSRC) identifiers             |
|                             ....                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
RTP Header Fields
Version: 2
P: indicates padding (for
encryption)
X: extension bit
CSRC count: for mixers
(later)
M: Marker Bit: indicates
framing
audio codecs: first packet
in talkspurt
video: last packet in frame
Payload Type: indicates
encoding
in RTP packet allows
changes per-packet
Useful for:
adaptation
DTMF codec
silence codecs
SN: defines ordering of
packets
Timestamp: when packet
was generated
SSRC: identifier
CSRC: list of mixed users
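The fixed 12-byte header described above can be packed and parsed with Python's struct module. This is a minimal sketch: it handles the CSRC list but ignores padding and header extensions, and the function names and field values are illustrative only.

```python
import struct


def pack_rtp_header(payload_type, seq, timestamp, ssrc, marker=False):
    """Build a 12-byte RTP header with V=2, P=0, X=0, CC=0."""
    byte0 = 2 << 6  # version 2 in the top two bits
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    return struct.pack("!BBHII", byte0, byte1,
                       seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


def parse_rtp_header(data):
    """Decode the fixed header and any CSRC identifiers that follow it."""
    byte0, byte1, seq, timestamp, ssrc = struct.unpack("!BBHII", data[:12])
    csrc_count = byte0 & 0x0F
    csrcs = struct.unpack("!%dI" % csrc_count, data[12:12 + 4 * csrc_count])
    return {
        "version": byte0 >> 6,
        "padding": bool(byte0 & 0x20),
        "extension": bool(byte0 & 0x10),
        "marker": bool(byte1 & 0x80),
        "payload_type": byte1 & 0x7F,
        "sequence_number": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "csrc": csrcs,
    }


# First packet of a talkspurt (marker set), payload type 0 (PCMU).
header = pack_rtp_header(payload_type=0, seq=1, timestamp=160, ssrc=0x1234, marker=True)
print(parse_rtp_header(header))
```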
RTP Timestamp
Tick units are
dependent on codec
For speech: 125
microseconds (standard
8 kHz sampling rate)
For video: 90 kHz
For audio: 44.1 kHz (CD
rate)
Gaps in TS, but not in
SN mean silence
Initial value random for
security
Video
Timestamp represents
time at beginning of
frame
Many packets may
have same timestamp
Speech
Time per packet may
vary
Depends on
packetization: 20-100ms typical
Payload Formats
Each codec needs a way to
be encapsulated in RTP
RFC3551 defines
mechanisms for many
common codecs
G.711, G.729, G.723.1,
G.722, etc.
Some simple video
More complex codecs have
their own payload format
documents
MPEG
H.263 and H.261
Payload format defines
How to break frame into
packets
extra fields needed below
main RTP header
Advanced Topics
DTMF and Tones
RFC 2833
Special codecs for
encoding touch tones
(DTMF) and other
signals
Can send either the
waveform (frequency,
amplitude)
Or the actual signal (#,
8, 0)
Compressed RTP
RFC 2508
For dialup links
Don’t send header, just
send index
Far side uses index to
retrieve header, and
then increments certain
fields
Quality of Service
Quality of Service
The problem we are trying to solve is to give
“better” service to some at the expense of giving
worse service to others; QoS fantasies to the
contrary, it’s a zero sum game
- Van Jacobson
Quality of Service
So, what’s the problem?
[Figure: Usability of Voice Circuit as a Function of End-to-End Delay. Utility (0.0 to 1.0) plotted against time (0 to 800 msec). Regions: Toll Quality, Satellite Zone, CB Zone, Fax Relay/Broadcast. Curves: Private Network VoFR & VoIP Technology; Early I-Phone Technology]
Improving I-Phone means:
• Lower PC Delay
• Lower Network Latency
• Tighten Network Jitter
Delay Budget
Device sample capture
Encode delay (algorithmic delay + processing delay)
Packetization/framing
Move to output queue/queueing delay
Access (up) link transmission
Backbone network transmission
Access (down) link transmission
Input queue to application
Jitter buffer
Decode processing delay
Device playout delay
Some Techniques to Improve “Network
QoS”
RED: Random Early Drop (or “Detect”)
WFQ: Weighted Fair Queuing
Intserv/RSVP: ReSerVation Protocol
IP Precedence / DiffServ
CRTP: Compressed Realtime Transport Protocol
MCML: Multi-Class Multi-Link PPP
Random Early Detect (RED)
this is Basic Hygiene!
Objectives
Keep average queue size
low – good for voice
Fairness – bigger streams
punished more
Avoid synchronization
Only works with loss
responsive transport
protocols
Algorithm – probabilistic
dropping of packets
Drop Probability
1
Min
Max
Queue Size
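The drop-probability curve can be written as a small function. This is a sketch of the classic linear ramp, assuming the average queue size is computed elsewhere; the parameter names and the max_p ceiling are illustrative choices, not values from the slide.

```python
def red_drop_probability(avg_queue, min_th, max_th, max_p):
    """Probability of dropping an arriving packet under RED.

    Below min_th nothing is dropped; between min_th and max_th the
    probability rises linearly up to max_p; at max_th and above every
    arriving packet is dropped.
    """
    if avg_queue < min_th:
        return 0.0
    if avg_queue >= max_th:
        return 1.0
    return max_p * (avg_queue - min_th) / (max_th - min_th)


# Hypothetical thresholds, in packets.
for q in (5, 20, 30):
    print(q, red_drop_probability(q, min_th=10, max_th=30, max_p=0.1))
```

Because the drop decision is probabilistic, flows with more packets in the queue are hit more often, which is the fairness property listed above.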
Poll: Will RED Help Voice?
Yes
No
• Voice not loss responsive
• Mixing voice and data in
same queue bad
• Voice queues usually not
congested
Weighted Fair Queueing
Each flow “sees” a
dedicated amount of
bandwidth Bj
A packet arriving at
time t is transmitted at
time t+size/Bj
B
B1
B2
B3
B = B1 + B2 + B3
What’s the Problem?
WFQ is unrealizable
because
Variable packet sizes
Causality
Example:
Link speed 100Kbps
Flow 1: 10Kbps
Flow 2: 90Kbps
Packet sizes: 1500 bytes and 100 bytes
Theory: 8.8ms
Actual: 128ms
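The two delay figures in this example can be reproduced with simple arithmetic. The sketch below assumes, as the numbers suggest, that the 100-byte packet belongs to Flow 2 and arrives just after a 1500-byte packet from Flow 1 has started transmission; that reading is an inference from the values, not something the slide states.

```python
def tx_time_ms(size_bytes, rate_bps):
    """Time to serialize a packet of the given size at the given rate."""
    return size_bytes * 8 * 1000 / rate_bps


LINK_BPS = 100_000   # link speed
FLOW2_BPS = 90_000   # Flow 2's share under ideal WFQ

# Ideal WFQ: Flow 2 behaves as if it had a dedicated 90 kbps link.
theory_ms = tx_time_ms(100, FLOW2_BPS)

# Real link: the 1500-byte packet cannot be preempted once it has
# started, so the 100-byte packet waits for it, then is sent at link speed.
actual_ms = tx_time_ms(1500, LINK_BPS) + tx_time_ms(100, LINK_BPS)

print(f"theory: {theory_ms:.1f} ms, actual: {actual_ms:.1f} ms")
```

The computed ideal value is about 8.9 ms, which the slide shows as 8.8 ms; the 128 ms figure matches exactly.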
Approximations of WFQ
Many PhDs written with
approximate and
implementable algorithms
Algorithms differ in their
delay bound
How much worse than
perfect WFQ is this?
Delay bounds a function of
bandwidth, number of
queues, other params
Algorithms
SCFQ: Self-Clocked Fair Queueing
WF2Q: Worst-Case Fair
Weighted Fair Queueing
FBFQ: Frame-Based Fair Queueing
PGPS:
DRR:
WFQ Voice Configuration
How to pick allocated bandwidth?
Consider G.711, 30ms framing (74.6Kbps)
If Bi = 74.6kbps, delay is at least 30ms
If Bi = 149.2Kbps, delay at least 15ms
Must set voice queue bandwidth at least 2x actual
voice usage to keep delays down!
Unused bandwidth will go to data
Need an accurate WFQ Implementation
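The 74.6 Kbps figure and the two delay values follow from the packet size. The sketch below assumes 40 bytes of IP/UDP/RTP headers per packet and no link-layer overhead; the function names are illustrative.

```python
def stream_rate_kbps(codec_kbps, frame_ms, header_bytes=40):
    """On-the-wire rate of one voice stream, including IP/UDP/RTP headers."""
    payload_bytes = codec_kbps * frame_ms / 8  # kbit/s * ms = bits, / 8 = bytes
    return (payload_bytes + header_bytes) * 8 / frame_ms


def serialization_delay_ms(codec_kbps, frame_ms, allocated_kbps, header_bytes=40):
    """Time for one packet to drain from a queue served at allocated_kbps."""
    packet_bits = (codec_kbps * frame_ms / 8 + header_bytes) * 8
    return packet_bits / allocated_kbps


rate = stream_rate_kbps(64, 30)  # G.711 with 30 ms framing
print(round(rate, 1))                                    # stream rate in kbps
print(round(serialization_delay_ms(64, 30, rate), 1))    # allocation = usage
print(round(serialization_delay_ms(64, 30, 2 * rate), 1))  # allocation = 2x usage
```

Doubling the allocation halves the per-packet delay, which is why the voice queue is provisioned well above the actual voice rate.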
Priority Queueing
Emulates the familiar
“elite airport line”
experience
Voice and data packets
in separate queues
If there are any packets
in voice queue, they
are serviced
Server
Voice
Data
Priority Queueing Considerations
Easy to configure – no bandwidth values
required
Main problem – data starvation
Need to police voice queue
Doesn’t work as well when there is other non-voice high-priority traffic (video)
Head-of-Line Blocking from data queue
Intserv: Integrated Services
Guaranteed Service (RFC 2212)
Mathematically provable bounds on end-to-end datagram
queuing delay/bandwidth
Controlled Load Service (RFC 2211)
Approximate QoS from an unloaded network for
delay/bandwidth
Describe traffic with a “TSPEC”
r= token bucket rate
b= token bucket depth
p= peak transmission rate
m= minimum (policed) packet size
M= maximum packet size
Describe endpoints with a “FlowSpec”
Source/Destination IP addresses, ports, protocol
RSPEC/FSPEC provides the policy to the
queuing/scheduling algorithms
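The r and b parameters of the TSPEC describe a token bucket. A minimal conformance check might look like the sketch below; it ignores the peak rate p and the packet size bounds m and M, and the class name and the numbers are hypothetical.

```python
class TokenBucket:
    """Token bucket with rate r (bytes per second) and depth b (bytes)."""

    def __init__(self, rate_bytes_per_s, depth_bytes):
        self.rate = rate_bytes_per_s
        self.depth = depth_bytes
        self.tokens = depth_bytes  # bucket starts full
        self.last = 0.0            # time of the previous check, in seconds

    def conforms(self, now_s, packet_bytes):
        """Refill for elapsed time, then try to spend tokens on this packet."""
        elapsed = now_s - self.last
        self.tokens = min(self.depth, self.tokens + elapsed * self.rate)
        self.last = now_s
        if packet_bytes <= self.tokens:
            self.tokens -= packet_bytes
            return True
        return False  # out of profile: candidate for dropping or remarking


bucket = TokenBucket(rate_bytes_per_s=1000, depth_bytes=500)
print(bucket.conforms(0.0, 400))  # burst fits in the initial bucket
print(bucket.conforms(0.1, 400))  # too soon, not enough tokens yet
print(bucket.conforms(0.5, 400))  # bucket has refilled
```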
RSVP Design
Signaling distinct from routing (modularity,
deployability, evolvability)
Soft state (robustness, simplicity)
Transparent operation across non-RSVP routers
(deployability)
Support shared and distinct reservations
Applies to unicast & multicast applications
Simplex & receiver-oriented.
RSVP protocol
PATH: Source → Destination
Traffic parameters of source
Collects info on network capabilities
Detects current route
RESV: Source ← Destination
Receiver selected Int-Serv service
Traffic parameters of receiver selected reservation
Follows route detected by PATH
Reservation actually nailed in network
RSVP messages carried over IP
Can also be carried over UDP but few people do that
RSVP: Admission Control
Flow Request
Routing
Routing
Protocol
Routing Database
Switching
Packets In
Reservation
Protocol
Admission
Control
Resource Utilization
Database
Interface 1
Packet Scheduler
Queuing Policy
Database
Packets Out
Route Selection
Interface N
Packet Scheduler
Packets Out
Intserv/RSVP Acceptance
Enthusiasm
Intserv/RSVP will solve
the world’s QoS
Cool thing to say:
“RSVP does not scale”
vBNS RSVP over ATM
transparently transport RSVP
Real
value
RSVP for VoIP in Enterprise
Today
ISP
Today
Enterprise
Time
IP Precedence & Diffserv
“Poor man’s” approach to QoS
Set IP Precedence/DSCP higher on voice packets
This puts them in a different queue, resulting in isolation from
best effort traffic
Scales better than RSVP
Can be done by endpoint, proxy, or in routers through
heuristics
Keeps QoS control “local”
Pushes work to the edges and boundaries
Can provide bulk QoS by customer or network
No admission control
Too much high-precedence traffic can still swamp the
network
Diffserv Architectural Model
Clouds — regions of relative
homogeneity:
Within a cloud, QoS managed
by local rules
Hard work confined to
boundaries of clouds:
Administrative control
Technology
Bandwidth
Classification
Conditioning/Policing
QoS information exchange
limited to boundaries
Bi-lateral, not multi-lateral
Not necessarily symmetric
Me
Not Me
Also
Not
Me
Far
Away
Diffserv Scalability
Fundamental assumptions:
Group packets explicitly by the “Per-hop
behavior (PHB)” they are to get
Relatively small number of feasible
queuing/scheduling algorithms for high link
speeds
Number of individual flows is large
Many different rules, often policy driven
Queue service
Shaping/policing
Nodes in the middle of a cloud only have
to deal with traffic aggregates
Diffserv Forwarding via PHBs
PHBs map to DSCPs (Diffserv Code
Points)
Values chosen for backward-compatibility with
IPv4 TOS byte including IP Precedence (RFC
2474)
Packets with different DSCPs may be reordered
Forwarding resources partitioned by
PHB/DSCP
Assured Forwarding PHB
(AF*)
Four independent classes
Within each class, three levels of drop
precedence
A congested AF node discards packets with
higher drop precedence first
Packets with lowest drop precedence must be
within the subscribed profile
*RFC2597
Expedited Forwarding PHB
(EF*)
Targeted at VoIP and “virtual leased lines”
Roughly equivalent to priority queuing,
with a safety measure to prevent
starvation
Implications:
No more than 50% of a link can be EF
see RFC3247,3248 for interesting mathematical
analyses
Worst case jitter at each hop is max of:
number of EF microflows in the aggregate, or
a single MTU packet of some other aggregate
*RFC3246
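An endpoint can mark its own voice packets by setting the TOS byte on its socket. The sketch below assumes a platform that exposes socket.IP_TOS (Linux and macOS do); the helper names are hypothetical. The DSCP occupies the upper six bits of the old TOS byte, the recommended EF codepoint is 46, and AF codepoints follow the pattern 8 x class + 2 x drop precedence.

```python
import socket

EF_DSCP = 46  # recommended Expedited Forwarding codepoint (binary 101110)


def af_dscp(af_class, drop_precedence):
    """DSCP for Assured Forwarding class 1-4 and drop precedence 1-3."""
    if not (1 <= af_class <= 4 and 1 <= drop_precedence <= 3):
        raise ValueError("AF class must be 1-4 and drop precedence 1-3")
    return 8 * af_class + 2 * drop_precedence


def dscp_to_tos(dscp):
    """Shift a 6-bit DSCP into the upper bits of the 8-bit TOS byte."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must fit in 6 bits")
    return dscp << 2


def open_marked_udp_socket(dscp):
    """Open a UDP socket whose outgoing packets carry the given DSCP."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(dscp))
    return sock


print(dscp_to_tos(EF_DSCP))  # TOS byte value for EF
print(af_dscp(4, 1))         # AF41, often used for video
```

Marking at the endpoint only helps if the network trusts it; routers at a Diffserv boundary may reclassify and rewrite the DSCP.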
Diffserv Traffic Conditioner
Meter
Shaped
Packets
Classifier
Marker
Shaper /
Dropper
Dropped
Classifier: selects a packet in a traffic stream based on the
content of some portion of the packet header
Meter: checks compliance to traffic parameters (e.g. Token
Bucket) and passes result to marker and shaper/dropper to
trigger particular action for in/out-of-profile packets
Marker: writes/rewrites DSCP
Shaper: delays some packets so that they comply with
the profile
Diffserv Acceptance
Enthusiasm
Diffserv will solve
the world’s QoS
Diffserv Engineering?
Diffserv SLA ?
Internet e2e SLA?
Real
value
Inter-SP Diffserv and end-to-end
Internet QoS need further
standardisation and commercial
arrangements
Diffserv Design & Deployment
intra Domain
today
Time
Mixing Intserv & Diffserv:
Aggregation
Host signals with RSVP
Edge or transit domains
Edge
In transit domains
Aggregate reservations mark
packets using DSCP
Blindly transfer end to end
reservations using another IP
Protocol Number - change at
edge
Routers detect egress of
reservation (deaggregation) on
transfer from an interior or
aggregator interface to an
exterior (deaggregating)
interface
Aggregate reservation size
varies with load
Backbone
Edge
RTP Compression
20ms @ 8kbit/s yields
20 byte payload
IP header 20; UDP
header 8; RTP header
12
Twice size of
payload!
Header compression:
40 bytes to 2-4 most
of the time
Hop-by-hop: use only
on the slow links
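The overhead numbers above translate directly into link bandwidth. A rough calculation, assuming no link-layer framing and a compressed header of 2 bytes in the best case (names are illustrative):

```python
IP_HEADER, UDP_HEADER, RTP_HEADER = 20, 8, 12  # bytes


def payload_bytes(bitrate_bps, frame_ms):
    """Codec payload carried in one packet."""
    return bitrate_bps * frame_ms // 8000


def wire_kbps(payload, header, frame_ms):
    """Bandwidth used on the link for one stream, in kbps."""
    return (payload + header) * 8 / frame_ms


payload = payload_bytes(8000, 20)             # 8 kbit/s codec, 20 ms frames
headers = IP_HEADER + UDP_HEADER + RTP_HEADER

print(payload, headers)                       # payload and header sizes in bytes
print(wire_kbps(payload, headers, 20))        # uncompressed headers
print(wire_kbps(payload, 2, 20))              # headers compressed to 2 bytes
```

An 8 kbps codec therefore consumes 24 kbps with full headers but under 9 kbps with header compression, which is why it pays off on slow access links.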
Sample Delay Budget
(G.711 - 64kbps)
Delay Source (G.711)                                   Budget (ms)
Device Sample Capture                                  0.1
Encode Delay (Algorithmic Delay + Processing Delay)    2.5
Packetization/Framing                                  10
Move to Output Queue/Queue Delay                       0.5
Access (up) Link Transmission                          30
Backbone Network Transmission                          5
Access (down) Link Transmission                        10
Input Queue to Application                             0.5
Jitter Buffer                                          35
Decode Processing Delay                                0.5
Device Playout Delay                                   0.5
Total                                                  94.6
Sample Delay Budget
(G.729 - 8kbps)
Delay Source (G.729)                                   Budget (ms)
Device Sample Capture                                  0.1
Encode Delay (Algorithmic Delay + Processing Delay)    17.5
Packetization/Framing                                  20
Move to Output Queue/Queue Delay                       0.5
Access (up) Link Transmission                          30
Backbone Network Transmission                          5
Access (down) Link Transmission                        10
Input Queue to Application                             0.5
Jitter Buffer                                          35
Decode Processing Delay                                5
Device Playout Delay                                   0.5
Total                                                  119.1
Signaling: SIP
SIP is one of Many
ITU H.323
Originally for video conferencing
The first standard protocol for VoIP
Still in wide usage, but negative growth
MGCP
Dumb phones controlled by smart server
“Softswitch” - PSTN emulation view
Megaco/H.248
Standard version of MGCP
Core SIP Functions
Establishment of peer to peer sessions
Management of peer to peer sessions
Keepalives
Graceful and Non-graceful termination
Rendezvous
Forking
Search
Policy Based Routing
Loose Routing
Mobility
Limited terminal mobility
Device Mobility
Core SIP Functions
Secure User Identification
Exchange and Management of Media
Session data
User registration
Capability declaration
Capability query
Reliability
SIP Technology Community
RTP
SDP
ROHC
STUN
O/A
3264
Events
3265
SIMPLE
SIP
RFC3261
MIDCOM
DNS
3263
ENUM
Rel
3262
SigComp
SIP Extensions
SIP Design Philosophy
Patterned after other
Successful Internet
Standards
HTTP
Don’t Reinvent the PSTN
General Purpose
Functionality
Do Not Dictate
Architectures or Services
It needs to work on any IP
Network
Leverage the Best of
Existing Standards
URLs
MIME
RFC822
Scalability
Push state to the edge
Basic Design
Request/Response Protocol
SIP is a Peer Protocol – all
entities send requests and
receive requests
Modelled after HTTP
Each request invokes
method
Main purpose of request
Messages contain bodies
request
Agent
Agent
response
Transactions
Fundamental unit of
messaging exchange
Request
Zero or more provisional
responses
Usually one final response
Maybe ACK
All signaling composed of
independent transactions
Identified by Cseq
Sequence number
Method tag
INVITE
100
200
Cseq: 1
ACK
First Transaction
BYE
200
Second Transaction
Cseq: 2
Session Independence
Body of SIP message
used to establish call
describes the session
Session could be
Audio
Video
Game
SIP operation is
independent of type of
session
SIP Bodies are MIME
objects
MIME = Multipurpose
Internet Mail Extensions
Mechanisms for
describing and carrying
opaque content
Used with HTTP and
email
Protocol Components
User Agent
End systems
Hard and soft phones
PSTN Gateways
Phone Adaptors
Media Servers
Anything that originates or terminates SIP calls
Proxy
SIP server responsible for relaying and processing requests between user agents
Main job: where to send request next?
Back-to-Back User Agent (B2BUA)
SIP server that terminates and re-originates SIP
SBCs, Call Agents, etc.
SIP Addressing
SIP addresses are URLs
URL contains several
components
Scheme (sip)
Username
Hostname
Optional port
Parameters
Headers and Body
SIP allows any URI type
tel URIs
http URLs for redirects
mailto URLs
leverage vast URI
infrastructure
sip:[email protected]:5061;
user=host?Subject=foo
The SIP Trapezoid
b.com
a.com
SIP
RTP
SIP Methods
INVITE
Invites a participant to a session
idempotent - reINVITEs for session modification
BYE
Ends a client’s participation in a session
CANCEL
Terminates a search
OPTIONS
Queries a participant about their media capabilities, and finds them, but doesn’t invite
ACK
For reliability and call acceptance
REGISTER
Informs a SIP server about the location of a user
SIP Architecture
[Figure: numbered message flow, steps 1 to 14, across the sp.com, a.com, and b.com domains and a Corp DB. Legend: Request, Response, Media]
SIP Message Syntax
Many header fields
from http
Payload contains a
media description
SDP - Session
Description Protocol
INVITE sip:[email protected] SIP/2.0
From: J. Rosenberg <sip:[email protected]>
;tag=76ah
Subject: Conference Call
To: John Smith <sip:[email protected]>
Via: SIP/2.0/UDP 1.2.3.4;branch=z9hG4bK74bf9
Call-ID: [email protected]
Content-type: application/sdp
CSeq: 4711 INVITE
Content-Length: 187
v=0
o=user1 53655765 2353687637 IN IP4 1.2.3.4
s=Sales
c=IN IP4 1.2.3.4
t=0 0
m=audio 3456 RTP/AVP 0
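The message layout above (start line, header fields, blank line, body) can be split apart with a few string operations. This is a deliberately naive sketch: it ignores header folding, compact header names, and repeated headers, and the addresses in the sample message are made-up placeholders.

```python
def parse_sip_request(text):
    """Split a SIP request into method, Request-URI, version, headers, body."""
    head, _, body = text.partition("\r\n\r\n")
    lines = head.split("\r\n")
    method, request_uri, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # names are case-insensitive
    return method, request_uri, version, headers, body


# Hypothetical request; all addresses are placeholders.
message = "\r\n".join([
    "INVITE sip:bob@b.example.com SIP/2.0",
    "Via: SIP/2.0/UDP 192.0.2.4;branch=z9hG4bK74bf9",
    "From: Alice <sip:alice@a.example.com>;tag=76ah",
    "To: Bob <sip:bob@b.example.com>",
    "Call-ID: 2963313058@192.0.2.4",
    "CSeq: 4711 INVITE",
    "Content-Length: 0",
    "",
    "",
])

method, uri, version, headers, body = parse_sip_request(message)
print(method, uri, headers["cseq"])
```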
SIP Address Fields
Request-URI
Contains address of next hop server
Rewritten by proxies based on result of Location Service
To
Address of original called party
Contains optional display name
From
Address of calling party
Optional display name
INVITE sip:[email protected] SIP/2.0
From: J. Rosenberg <sip:[email protected]>
;tag=76ah
Subject: Conference Call
To: John Smith <sip:[email protected]>
Via: SIP/2.0/UDP 1.2.3.4;branch=z9hG4bK74bf9
Call-ID: [email protected]
Content-type: application/sdp
CSeq: 4711 INVITE
Content-Length: 187
v=0
o=user1 53655765 2353687637 IN IP4 1.2.3.4
s=Sales
c=IN IP4 1.2.3.4
t=0 0
m=audio 3456 RTP/AVP 0
SIP Responses
Look much like requests
Headers, bodies
Differ in top line
Status Code
Numeric, 100 - 699
Meant for computer processing
Protocol behavior based on 100s digit
Other digits give extra info
Reason Phrase
Text phrase for humans
Can be anything
Status Code Classes
100 - 199 (1XX): Informational
200 - 299 (2XX): Success
300 - 399 (3XX): Redirection
400 - 499 (4XX): Client Error
500 - 599 (5XX): Server Error
600 - 699 (6XX): Global Failure
Two groups
100 - 199: Provisional
Not reliable
200 - 699: Final, Definitive
Example
200 OK
180 Ringing
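Because protocol behavior depends only on the hundreds digit, a client can classify any response, including codes it has never seen, with integer division. A small sketch (the function name is illustrative):

```python
STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
    6: "Global Failure",
}


def classify_status(code):
    """Return (class name, is_final) for a SIP status code."""
    if not 100 <= code <= 699:
        raise ValueError("SIP status codes range from 100 to 699")
    is_final = code >= 200  # 1XX responses are provisional
    return STATUS_CLASSES[code // 100], is_final


print(classify_status(180))  # provisional
print(classify_status(200))  # final
print(classify_status(486))  # final
```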
Example SIP Response
Note how only
difference is top line
Rules for generating
responses
Call-ID, To, From, Cseq
are mirrored in
response
Branch parameter
used as transaction
ID
Tag added to To field to
identify dialog
SIP/2.0 200 OK
From: J. Rosenberg <sip:[email protected]>
;tag=76ah
To: John Smith <sip:[email protected]>
;tag=112
Via: SIP/2.0/UDP 1.2.3.4;branch=z9hG4bK74bf9
Call-ID: [email protected]
Content-type: application/sdp
CSeq: 4711 INVITE
SIP Transport
SIP Messages over UDP or
TCP/TLS or SCTP
Reliability mechanisms
defined for UDP
UDP More Widely Used
Faster
No connection state
TCP preferred these days
NAT
Larger SIP messages
Reliability mechanisms
depend on SIP request
method
INVITE
anything except INVITE
Reason: optimized for
phone calls
Registrations
REGISTER creates mapping in
server from one URI to another
sip:[email protected]
to
sip:[email protected]
REGISTER properties
UA location in Contact
Registrar identified in Request
URI
Identifies registered user in To
and From field
Expires header indicates desired
lifetime
Can be different for each
Contact
Registrations are soft-state
REGISTER sip:example.com SIP/2.0
To: sip:[email protected];user=phone
From: sip:[email protected];user=phone
Call-ID: [email protected]
CSeq: 123 REGISTER
Contact: sip:[email protected]
Expires: 3600
Registration Handling
Registrar is logical
function handling
REGISTER
Registrar steps:
Authenticate
Authorize
Add Binding
Lower expiration
Return all currently
registered UA (can be
more than one)
SIP/2.0 200 OK
To: sip:[email protected];user=phone
From: sip:[email protected];user=phone
Call-ID: [email protected]
CSeq: 123 REGISTER
Contact: sip:[email protected];expires=3600
Contact: sip:[email protected];expires=524
Forking
A proxy may have more than one
address for a user
Happens when more than one SIP
URL is registered for a user
Can happen based on static routing
configuration
In this case, proxy may fork
Forking is when proxy sends
request to more than one proxy at
once
First 200 OK that is received is
forwarded upstream
All other unanswered requests
cancelled
Routing of Subsequent Requests
Initial SIP request sent through
many proxies
No need per se for subsequent
requests to go through proxies
Each proxy can decide whether it
wants to receive subsequent
requests
Inserts Record-Route header
containing its address
For subsequent requests, users
insert Route header
Contains sequence of proxies
(and final user) that should
receive request
[Figure: UA1, UA2, and three proxies, showing the paths taken by the INVITE and the later BYE]
Setting up the Session
INVITE contains the Session
Description Protocol (SDP)
in the body
SDP conveys the desired
session from the caller’s
perspective
Session consists of a
number of media streams
Each stream can be audio,
video, text, application, etc.
Also contains information
needed about the session
codecs
addresses and ports
SDP also conveys other
information about session
Time it will take place
Who originated the
session
subject of the session
URL for more information
SDP origins are multicast
sessions on the mbone
Originator of INVITE is not
originator of session
Anatomy of SDP
SDP contains informational
headers
version (v)
origin (o) - unique ID
information (i)
Time of the session
Followed by a sequence of
media streams
Each media stream contains an
m line defining
port
transport
codecs
Media stream also contains c
line
Address information
v=0
o=user1 53655765 2353687637 IN IP4 128.3.4.5
s=Mbone Audio
i=Discussion of Mbone Engineering Issues
[email protected]
t=0 0
m=audio 3456 RTP/AVP 0 78
c=IN IP4 1.2.3.4
a=rtpmap:78 G723
m=video 4444 RTP/AVP 86
c=IN IP4 1.2.3.4
a=rtpmap:86 H263
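The media section of a description like the one above can be pulled apart line by line. The sketch below only looks at m= and a=rtpmap lines and skips everything else; it is an illustration of the structure, not a complete SDP parser, and the function name is hypothetical.

```python
def parse_sdp_media(sdp):
    """Return one dict per m= line, with any rtpmap entries that follow it."""
    media = []
    for line in sdp.splitlines():
        kind, _, value = line.partition("=")
        if kind == "m":
            name, port, proto, *formats = value.split()
            media.append({"media": name, "port": int(port), "proto": proto,
                          "formats": formats, "rtpmap": {}})
        elif kind == "a" and value.startswith("rtpmap:") and media:
            payload_type, _, encoding = value[len("rtpmap:"):].partition(" ")
            media[-1]["rtpmap"][payload_type] = encoding
    return media


sdp = "\n".join([
    "v=0",
    "t=0 0",
    "m=audio 3456 RTP/AVP 0 78",
    "c=IN IP4 1.2.3.4",
    "a=rtpmap:78 G723",
    "m=video 4444 RTP/AVP 86",
    "c=IN IP4 1.2.3.4",
    "a=rtpmap:86 H263",
])

for stream in parse_sdp_media(sdp):
    print(stream["media"], stream["port"], stream["formats"], stream["rtpmap"])
```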
Negotiating the Session
Called party receives SDP offered
by caller
Each stream can be
accepted
rejected
Accepting involves generating an
SDP listing same stream
port number and address of called
party
subset of codecs from SDP in
request
Rejecting indicated by setting port
to zero
Resulting SDP returned in 200 OK
Media can now be exchanged
v=0
o=user2 16255765 8267374637 IN IP4 4.3.2.1
t=0 0
m=audio 3456 RTP/AVP 0
c=IN IP4 4.3.2.1
m=video 0 RTP/AVP 86
c=IN IP4 4.3.2.1
Audio stream accepted, PCMU only.
Video stream rejected
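The accept/reject rules can be sketched as a function that builds the answer's m= lines from an offer. This is a simplification: it matches codecs by payload type number only (dynamic payload types would need their rtpmap names compared), and the data layout and names are hypothetical.

```python
def build_answer(offer, supported, local_ports):
    """Answer each offered stream: keep common codecs, or reject with port 0.

    offer:       list of {"media", "port", "formats"} dicts, in offer order
    supported:   media name -> set of payload types the callee can handle
    local_ports: media name -> port the callee will receive on
    """
    answer = []
    for stream in offer:
        name = stream["media"]
        common = [pt for pt in stream["formats"] if pt in supported.get(name, ())]
        if common:
            answer.append({"media": name, "port": local_ports[name], "formats": common})
        else:
            # Rejected streams stay in the answer, with the port set to zero.
            answer.append({"media": name, "port": 0, "formats": stream["formats"]})
    return answer


def m_line(stream):
    """Render one stream as an SDP m= line."""
    return "m=%s %d RTP/AVP %s" % (stream["media"], stream["port"], " ".join(stream["formats"]))


offer = [
    {"media": "audio", "port": 3456, "formats": ["0", "78"]},
    {"media": "video", "port": 4444, "formats": ["86"]},
]
answer = build_answer(offer, supported={"audio": {"0"}}, local_ports={"audio": 3456})
for stream in answer:
    print(m_line(stream))
```

The answer keeps the same number of streams in the same order as the offer, which is how the caller matches each answered stream to the one it offered.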
Changing Session Parameters
Once call is started, session can be
modified
Possible changes
Add a stream
Remove a stream
Change codecs
Change address information
Call hold is basically a session
change
Accomplished through a re-INVITE
Same session negotiation as
INVITE, except in middle of call
Rejected re-INVITE - call still
active!
INVITE
200
ACK
INVITE
200
reINVITE
ACK
Hanging Up
How to hang up depends on
when and who
After call is set up
either party sends BYE request
From caller, before call is
accepted
send CANCEL
BYE is bad since it may not
reach the same set of users that
got INVITE
If call is accepted after CANCEL,
then send BYE
From callee, before accepted
Reject with 486 Busy Here
[Figure: ladder diagram between C and S with messages INVITE, 100, CANCEL (hangup), 200 OK, 200 OK (accept), ACK, BYE, 200 OK]
Call Flow for basic call: UA to proxy to UA
Call setup
100 trying hop by hop
180 ringing
200 OK acceptance
Call parameter modification
re-INVITE
Same as initial INVITE,
updated session description
Termination
BYE method
[Figure: ladder diagram with INVITE, 100 Trying, 180 Ringing, and 200 OK on each hop, followed by ACK, RTP media, re-INVITE, BYE, and 200 OK]
Privacy and Identity
RFC 3325: A Private Extension for Asserted
Identity in Trusted Networks
RFC 3323: A Privacy Mechanism for SIP
RFC 4474: SIP Identity
RFC3325 Asserted Identity
Trust Domain
INVITE
P-Asserted-Identity:
sip:[email protected]
Authenticates
Caller and verifies
identity. Adds PAID.
RFC3323 – SIP Privacy
Trust Domain
INVITE
P-Asserted-Identity:
sip:[email protected]
From: anonymous
INVITE
Privacy: id
From: anonymous
Anonymous
Caller
INVITE
From: anonymous
4474: SIP Identity
INVITE
From:
sip:[email protected]
INVITE
From:
sip:[email protected]
Identity: asd87f7as66sda8z
Authenticates
Caller and verifies
identity. Signs Request.
Verifies
Signature
Only useful for user@domain addresses!
Transfers and Dialog Movement: REFER
(RFC 3515)
Alice
3
1
REFER
Refer-To: Bob
INVITE Bob
Referred-By: Joe
4
2
Joe
Bob
Third Party Call Control (3pcc): RFC 3725
INVITE
no SDP
3
1
ACK
SDP B
2
200
SDP A
5
4
200
SDP B
6
RTP
INVITE
SDP A
SIP and Quality of Service
RFC 3312: Integration of Resource
Management with SIP
Problem
How to make sure phone doesn’t
ring unless resources are reserved
INVITE w. Preconditions
183 Progress
QoS Reservations
Solution
SIP does not do resource
reservation!
SIP INVITE tells far side not to ring
Both sides do regular QoS
reservations
RSVP
PDP context activation
UPDATE to change state
UPDATE w. Preconditions
180 Ringing
200 OK
ACK
Security
VoIP Security
The only totally secure system I know of is a rock
- Tony Lauck, circa 1985
But Even Rocks can be Insecure..
It Had a Great User Interface
But it had a serious security vulnerability…
VoIP Attacks
Attack -> Solution
Free Calls aka Toll Fraud -> User Authentication
Impersonation -> User Authentication, Secure Caller ID
Learning Private Information (calling patterns, PIN codes) -> SIP Encryption, Media Encryption
Steal Calls -> SIP Encryption, Media Encryption
DoS -> ICE, Others
SIP User Authentication
RTP
We want this SIP server to authenticate
this user
and this SIP server to authenticate
this user
SIP Digest Authentication
Digest= Hash(joe, a7szh1,
myPassword) = z0v88a6
Hi, I’d like
to SIP
REGISTER
401 –
OK, try
again.
Nonce=a7szh1
REGISTER
Nonce=a7szh1
Username=joe
Digest=z0v88a6
Digest= Hash(joe, a7szh1,
myPassword)
OK, done!
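The slide simplifies the hash to Hash(username, nonce, password). A sketch closer to the HTTP Digest computation that SIP reuses (RFC 2617, without the qop extension) is shown here; the realm, method and URI values are assumptions added for illustration.

```python
import hashlib
import hmac


def md5_hex(text):
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username, realm, password, method, uri, nonce):
    """response = MD5(HA1:nonce:HA2), the RFC 2617 form without qop."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")
    ha2 = md5_hex(f"{method}:{uri}")
    return md5_hex(f"{ha1}:{nonce}:{ha2}")


def server_accepts(stored_password, received, username, realm, method, uri, nonce):
    """The server recomputes the digest from its own copy of the password."""
    expected = digest_response(username, realm, stored_password, method, uri, nonce)
    return hmac.compare_digest(expected, received)


# Client answers the 401 challenge; realm and URI are hypothetical.
resp = digest_response("joe", "a.com", "myPassword", "REGISTER", "sip:a.com", "a7szh1")
print(resp)
print(server_accepts("myPassword", resp, "joe", "a.com", "REGISTER", "sip:a.com", "a7szh1"))
```

The password itself never crosses the wire; only the nonce and the resulting digest do.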
Offline Dictionary Attack
Digest= Hash(joe, a7szh1,
alligator) =
REGISTER
Nonce=a7szh1
Username=joe
Digest=z0v88a6
Word -> Hash(joe, a7szh1, word)
Aardvark -> 9z8v77a
Abacus -> lkf88z7
Abate -> 8z77x
…….
Alligator -> z0v88a6
Digest= Hash(joe, a7szh1,
alligator)
OK, done!
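The attack works because everything except the password is visible to an eavesdropper. A minimal sketch, assuming a SHA-256 stand-in for the slide's Hash(username, nonce, password) and a hypothetical four-word list:

```python
import hashlib


def slide_hash(username, nonce, password):
    """Stand-in for the slide's Hash(); not the real SIP digest formula."""
    return hashlib.sha256(f"{username}:{nonce}:{password}".encode()).hexdigest()


def dictionary_attack(username, nonce, observed_digest, wordlist):
    """Hash every candidate word offline until one matches the sniffed digest."""
    for word in wordlist:
        if slide_hash(username, nonce, word) == observed_digest:
            return word
    return None


# Values an eavesdropper would capture from the REGISTER exchange.
observed = slide_hash("joe", "a7szh1", "alligator")
words = ["aardvark", "abacus", "abate", "alligator"]
print(dictionary_attack("joe", "a7szh1", observed, words))
```

No further messages to the server are needed, so rate limiting on the server does not help.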
Solution: Digest over TLS
Digest= Hash(joe, a7szh1,
alligator) =
TLS
Armor
This is how
Web Security works!
Digest= Hash(joe, a7szh1,
alligator)
Even Stronger: Mutual TLS for Devices
a.com
TLS
Armor
MAC
8x7a6
Phone has a
Certificate
which identifies
it
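A sketch of how the two TLS endpoints could be configured with Python's standard ssl module, assuming certificate file paths are supplied by the caller. The function names and parameters are hypothetical; nothing here is specific to any SIP product.

```python
import ssl


def server_context(certfile=None, keyfile=None, client_ca=None):
    """SIP server side: present a certificate and require one from the phone."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.verify_mode = ssl.CERT_REQUIRED  # this is what makes the TLS mutual
    if certfile:
        ctx.load_cert_chain(certfile, keyfile)
    if client_ca:
        ctx.load_verify_locations(cafile=client_ca)
    return ctx


def phone_context(certfile=None, keyfile=None):
    """Phone side: verify the server's name and offer the device certificate."""
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)
    if certfile:
        ctx.load_cert_chain(certfile, keyfile)
    return ctx


srv = server_context()
phone = phone_context()
print(srv.verify_mode, phone.check_hostname)
```

With plain server-side TLS the server context would leave verify_mode at its default and rely on Digest for the user; requiring a client certificate authenticates the device as well.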
SIP Encryption
RTP
We want each SIP hop to be encrypted so only the SIP servers and endpoints see the signaling.
SIP Encryption: TLS
a.com
RTP
b.com
Mutual TLS
Authentication
Media Encryption
Countermeasure against:
Eavesdropping
Barge-in
Modification
Two useful techniques
IPSEC
SRTP
Complications
Key management
Legal intercept (who has the keys)
Firewall and NAT issues (covered later)
Alternative: Secure RTP
Authentication and encryption of RTP and RTCP
packets
V | P | X | CC | M | PT | sequence number
timestamp
synchronization source (SSRC) identifier
contributing source (CSRC) identifiers
…
RTP extension (optional)
RTP payload
SRTP MKI -- 0 bytes for voice
Authentication tag -- 4 bytes for voice
Encrypted portion
Authenticated portion
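The layout above can be illustrated with a sketch that packs the fixed RTP header and appends a 4-byte authentication tag. It assumes HMAC-SHA1 truncated to 32 bits and includes the rollover counter (ROC) in the authenticated data; payload encryption and SRTP key derivation are deliberately left out, so this is not a working SRTP implementation.

```python
import hashlib
import hmac
import struct

TAG_LEN = 4  # the slide's "4 bytes for voice" authentication tag


def rtp_header(seq, timestamp, ssrc, payload_type=0, marker=False):
    """Pack the fixed 12-byte RTP header with V=2, P=0, X=0, CC=0."""
    first = 2 << 6
    second = (int(marker) << 7) | (payload_type & 0x7F)
    return struct.pack("!BBHII", first, second, seq, timestamp, ssrc)


def _tag(auth_key, packet, roc):
    mac = hmac.new(auth_key, packet + struct.pack("!I", roc), hashlib.sha1)
    return mac.digest()[:TAG_LEN]


def add_auth_tag(auth_key, packet, roc=0):
    """Append the truncated tag computed over header, payload and ROC."""
    return packet + _tag(auth_key, packet, roc)


def check_auth_tag(auth_key, protected, roc=0):
    packet, tag = protected[:-TAG_LEN], protected[-TAG_LEN:]
    return hmac.compare_digest(tag, _tag(auth_key, packet, roc))


key = b"hypothetical-session-auth-key"
packet = rtp_header(seq=1, timestamp=160, ssrc=0x1234) + bytes(160)  # 20 ms of PCMU
protected = add_auth_tag(key, packet)
print(len(packet), len(protected), check_auth_tag(key, protected))
```

The tag is the only per-packet expansion in this sketch, which is the "very little bandwidth overhead" point made for SRTP.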
SRTP
Advantages
Provides both privacy via encryption and authentication via message integrity check
Very little bandwidth overhead
Uses modern strong crypto suites: AES counter mode for encryption and HMAC for message integrity
Does not break header compression schemes like cRTP
For very low-rate channels (e.g. cellular) can sacrifice authentication and have no packet expansion
Disadvantages
Needs key management
End-to-end versus hop-by-hop trust tradeoffs in protecting keys
Yet another security mechanism to ensure is implemented and deployed correctly
NAT Traversal
What is NAT?
Network Address Translation
(NAT)
Creates address binding
between internal private and
external public address
Modifies IP Addresses/Ports in
Packets
Benefits
Avoids network renumbering on
change of provider
Allows multiplexing of multiple
private addresses into a single
public address ($$ savings)
Maintains privacy of internal
addresses
S: 10.0.1.1:6554
D: 67.22.3.1:80
IP Pkt
Client
S: 1.2.3.4:8877
D: 67.22.3.1:80
IP Pkt
NAT
Binding Table
Internal
External
10.0.1.1:6554 -> 1.2.3.4:8877
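The binding table can be sketched as a pair of dictionaries. This is a toy model using the slide's addresses; the class name, the sequential port allocation and the drop-on-miss policy are assumptions, and real NATs differ in how they allocate and filter.

```python
class Nat:
    """Toy NAT: one public IP, sequentially allocated external ports."""

    def __init__(self, public_ip, first_port=8877):
        self.public_ip = public_ip
        self.next_port = first_port
        self.out = {}   # (internal ip, port) -> (public ip, port)
        self.back = {}  # reverse mapping for inbound packets

    def outbound(self, src, dst):
        """Rewrite the source of a packet leaving the private network."""
        if src not in self.out:
            ext = (self.public_ip, self.next_port)
            self.next_port += 1
            self.out[src] = ext
            self.back[ext] = src
        return self.out[src], dst

    def inbound(self, src, dst):
        """Rewrite the destination of a returning packet, or drop it."""
        internal = self.back.get(dst)
        if internal is None:
            return None  # no binding exists, so the packet is dropped
        return src, internal


nat = Nat("1.2.3.4")
print(nat.outbound(("10.0.1.1", 6554), ("67.22.3.1", 80)))
print(nat.inbound(("67.22.3.1", 80), ("1.2.3.4", 8877)))
```

A binding only exists after the client has sent something outward, which is the root of the SIP and RTP problems discussed next.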
Problem: Getting SIP Through NATs
RTP to 10.0.1.1
NAT
INVITE sip:[email protected]
m=audio 3456 RTP/AVP 0
c=IN IP4 10.0.1.1
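The problem can be made concrete with a short sketch that extracts the media address from the SDP and checks whether a peer on the public Internet could send to it. It assumes a single audio stream with a session-level c= line, as on the slide; the function names are hypothetical.

```python
import ipaddress


def media_target(sdp):
    """Return the (ip, port) the far end is being told to send RTP to."""
    ip, port = None, None
    for line in sdp.splitlines():
        if line.startswith("c=IN IP4 "):
            ip = line.split()[2]
        elif line.startswith("m=audio "):
            port = int(line.split()[1])
    return ip, port


def unreachable_from_internet(sdp):
    """True when the advertised address is private and so not routable publicly."""
    ip, _ = media_target(sdp)
    return ip is not None and ipaddress.ip_address(ip).is_private


offer = "m=audio 3456 RTP/AVP 0\nc=IN IP4 10.0.1.1"
print(media_target(offer), unreachable_from_internet(offer))
```

The NAT rewrites the IP and UDP headers of the INVITE but not its body, so the private address survives inside the SDP.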
Solution Space
Application Layer Gateways (ALGs)
Session Border Controllers (SBC)
Simple Traversal of UDP Through NAT
(STUN)
Traversal Using Relay NAT (TURN)
Interactive Connectivity Establishment (ICE)
Application Layer Gateway
RTP to 10.0.1.1
INVITE sip:[email protected]
m=audio 3456 RTP/AVP 0
c=IN IP4 10.0.1.1
NAT
ALG
INVITE sip:[email protected]
m=audio 1234 RTP/AVP 0
c=IN IP4 19.1.3.2
NAT also modifies SIP
messages to fix them up!
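A sketch of the fix-up an ALG performs on the SDP body, using the slide's addresses. It assumes a single audio stream and a hypothetical function name; a real ALG would also have to update Content-Length and other SIP headers, and it cannot do any of this if the message is encrypted.

```python
def alg_rewrite(sdp, public_ip, public_port):
    """Replace the private media address and port with the NAT's public binding."""
    out = []
    for line in sdp.splitlines():
        if line.startswith("c=IN IP4 "):
            line = f"c=IN IP4 {public_ip}"
        elif line.startswith("m=audio "):
            fields = line.split()
            fields[1] = str(public_port)
            line = " ".join(fields)
        out.append(line)
    return "\n".join(out)


before = "m=audio 3456 RTP/AVP 0\nc=IN IP4 10.0.1.1"
print(alg_rewrite(before, "19.1.3.2", 1234))
```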
ALG Benefits and Drawbacks
Drawbacks
Doesn’t work when security
turned on
Hard to diagnose problems
Requires network upgrade to
support new app
Frequent implementation
problems (lack of expertise)
Incentives mismatched
Benefits
No change to clients or
servers
Session Border Controller
9.8.7.6
INVITE sip:[email protected]
m=audio 3456 RTP/AVP 0
c=IN IP4 10.0.1.1
INVITE sip:[email protected]
NAT
SBC
SBC relays
RTP back to
source
m=audio 3225 RTP/AVP 0
c=IN IP4 9.8.7.6
RTP to
9.8.7.6
SBC Benefits and Drawbacks
Drawbacks
Expensive media relaying
Interferes with some SIP
extensions
Breaks more advanced SIP
security
Benefits
No change to clients or
NATs
Works with basic SIP
security mechanisms
Easier to diagnose
Simple Traversal of UDP Through NAT
(STUN)
9.8.7.6
What is my IP address
and port please?
It's 1.2.3.4:3472
1.2.3.4
NAT
STUN
Server
INVITE sip:[email protected]
m=audio 3472 RTP/AVP 0
c=IN IP4 1.2.3.4
RTP to
1.2.3.4
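The question-and-answer exchange can be sketched at the byte level. This assumes the message format of the later STUN revision (RFC 5389), where the answer is carried in an XOR-MAPPED-ADDRESS attribute; the original STUN specification used a plain MAPPED-ADDRESS. No network I/O is performed, and the server response is built locally for illustration.

```python
import os
import socket
import struct

MAGIC_COOKIE = 0x2112A442
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101
XOR_MAPPED_ADDRESS = 0x0020


def binding_request():
    """20-byte header: type, length 0, magic cookie, random transaction ID."""
    txid = os.urandom(12)
    return struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE) + txid, txid


def binding_response(txid, ip, port):
    """What a server would return: the source address it saw, XOR-encoded."""
    xport = port ^ (MAGIC_COOKIE >> 16)
    xaddr = struct.unpack("!I", socket.inet_aton(ip))[0] ^ MAGIC_COOKIE
    value = struct.pack("!BBHI", 0, 0x01, xport, xaddr)  # 0x01 = IPv4
    attr = struct.pack("!HH", XOR_MAPPED_ADDRESS, len(value)) + value
    return struct.pack("!HHI", BINDING_SUCCESS, len(attr), MAGIC_COOKIE) + txid + attr


def parse_mapped_address(message, txid):
    """Client side: recover (ip, port) from a binding success response."""
    msg_type, length, cookie = struct.unpack("!HHI", message[:8])
    if msg_type != BINDING_SUCCESS or cookie != MAGIC_COOKIE or message[8:20] != txid:
        return None
    pos = 20
    while pos + 4 <= 20 + length:
        attr_type, attr_len = struct.unpack("!HH", message[pos:pos + 4])
        value = message[pos + 4:pos + 4 + attr_len]
        if attr_type == XOR_MAPPED_ADDRESS and value[1] == 0x01:
            xport, xaddr = struct.unpack("!HI", value[2:8])
            ip = socket.inet_ntoa(struct.pack("!I", xaddr ^ MAGIC_COOKIE))
            return ip, xport ^ (MAGIC_COOKIE >> 16)
        pos += 4 + attr_len + (-attr_len % 4)  # attributes are padded to 4 bytes
    return None


request, txid = binding_request()
reply = binding_response(txid, "1.2.3.4", 3472)
print(parse_mapped_address(reply, txid))
```

The client then places the learned address and port into the c= and m= lines of its SDP.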
STUN Benefits and Drawbacks
Drawbacks
Doesn’t always work
Benefits
No change to servers or
NATs
Works with all SIP
security mechanisms
Can support non-VoIP
apps (e.g., games)
Traversal Using Relay NAT (TURN)
9.8.7.6
Give me an IP address
and port please?
9.8.7.6:2376
1.2.3.4
TURN
Server
RTP to
1.2.3.4
NAT
INVITE sip:[email protected]
m=audio 2376 RTP/AVP 0
c=IN IP4 9.8.7.6
TURN Benefits and Drawbacks
Drawbacks
Expensive Media Relaying
Benefits
No change to servers or
NATs
Works with all SIP
security mechanisms
Can support non-VoIP
apps (e.g., games)
Interactive Connectivity Establishment
(ICE)
Hybrid of STUN and
TURN
P2P NAT Traversal
Widely Deployed on
Internet
Popular with
Application Providers
ICE Step 1: Allocation
Before Making a Call, the
Client Gathers
Candidates
Each candidate is a
potential address for
receiving media
Three different types of
candidates
Host Candidates
Server Reflexive
Candidates (STUN)
Relayed Candidates
(TURN)
TURN candidates
reside on a TURN
server
STUN
Host
Candidates reside
on the agent itself
TURN
STUN candidates
are addresses residing
on a NAT
NAT
NAT
ICE Step 2: Create Offer
Each candidate is
placed into an
a=candidate attribute
of the offer
Each candidate line
has IP address and
port plus other info
needed for ICE
c=IN IP4 192.0.2.3
t=0 0
m=audio 45664 RTP/AVP 0
a=rtpmap:0 PCMU/8000
a=candidate:1 1 UDP 2130706178 10.0.1.1 8998 typ host
a=candidate:2 1 UDP 1694498562 192.0.2.3 45664 typ srflx raddr 10.0.1.1 rport 8998
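A sketch of how an agent could parse the two candidate lines above. The field names in the returned dict are hypothetical. Reading the top 8 bits of the priority as the type preference follows the priority layout in the ICE specification; whether the slide's values were produced with exactly that formula is an assumption.

```python
def parse_candidate(line):
    """Split an a=candidate line into its fields."""
    fields = line.split(":", 1)[1].split()
    cand = {
        "foundation": fields[0],
        "component": int(fields[1]),
        "transport": fields[2],
        "priority": int(fields[3]),
        "address": fields[4],
        "port": int(fields[5]),
        "type": fields[7],  # fields[6] is the literal token "typ"
    }
    extras = fields[8:]  # optional name/value pairs such as raddr and rport
    cand.update(zip(extras[0::2], extras[1::2]))
    cand["type_preference"] = cand["priority"] >> 24
    return cand


host = parse_candidate("a=candidate:1 1 UDP 2130706178 10.0.1.1 8998 typ host")
srflx = parse_candidate(
    "a=candidate:2 1 UDP 1694498562 192.0.2.3 45664 typ srflx raddr 10.0.1.1 rport 8998"
)
print(host["type"], host["type_preference"])
print(srflx["type"], srflx["type_preference"], srflx["raddr"])
```

The host candidate carries the higher priority, so a direct path is tried before the address learned through STUN.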
ICE Step 3: Send INVITE
Caller sends a SIP
INVITE as normal
No ICE processing by
SIP servers
SIP
Server
INVITE
ICE Step 4: Allocation
Called party does
exactly same
processing as caller
and obtains its
candidates
Recommended to not
yet ring the phone!
STUN
TURN
NAT
NAT
ICE Step 5: Provisional Response
Callee sends a
provisional response
containing its SDP with
candidates
As with INVITE, no
processing by proxies
Phone has still not rung
SIP
Proxy
1xx
ICE Step 6: Verification
Each agent pairs up its candidates (local) with its peer's (remote) to form candidate pairs
Each agent sends a
STUN-based ping on
each pair, starting at
highest priority
If a response is received
the check has succeeded
and we know media can
flow on that pair!
TURN
Server
TURN
Server
5
4
NAT
NAT
2
3
NAT
NAT
1
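The pairing and checking step can be sketched as follows. The pair priority formula is the one given in the ICE specification (RFC 5245); the reachable callback is a hypothetical stand-in for sending a STUN check and waiting for the response, and the relay priority value is made up for the example.

```python
from itertools import product


def pair_priority(controlling_prio, controlled_prio):
    """2^32 * min(G, D) + 2 * max(G, D) + (1 if G > D else 0)."""
    g, d = controlling_prio, controlled_prio
    return (1 << 32) * min(g, d) + 2 * max(g, d) + (1 if g > d else 0)


def run_checks(local, remote, reachable, controlling=True):
    """Try pairs from highest priority down; return the first that succeeds."""
    pairs = []
    for loc, rem in product(local, remote):
        if controlling:
            prio = pair_priority(loc["priority"], rem["priority"])
        else:
            prio = pair_priority(rem["priority"], loc["priority"])
        pairs.append((prio, loc, rem))
    pairs.sort(key=lambda p: p[0], reverse=True)
    for _, loc, rem in pairs:
        if reachable(loc, rem):
            return loc, rem
    return None


def candidates():
    return [
        {"type": "host", "priority": 2130706178},
        {"type": "srflx", "priority": 1694498562},
        {"type": "relay", "priority": 1000},  # hypothetical value
    ]


# Assume both agents sit behind NATs, so host addresses never answer.
behind_nat = lambda loc, rem: loc["type"] != "host" and rem["type"] != "host"
chosen = run_checks(candidates(), candidates(), behind_nat)
print(chosen[0]["type"], chosen[1]["type"])
```

Relayed pairs sort last, so the TURN server is used only when nothing more direct passes its check.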
ICE Benefits and Drawbacks
Drawbacks
Requires client changes
Requires other side to
support it
Benefits
Always Works
No change to servers or
NATs
Works with all SIP security
mechanisms
Minimum Media Relaying
Can support non-VoIP apps
(e.g., games)
Built-In Anti-DoS
Eliminates Ghost Rings
That’s it!
Questions?
Glossary
AIN Advanced Intelligent Network
ADPCM Adaptive Differential PCM
BGP Border Gateway Protocol
CALEA Communications Assistance for Law Enforcement Act
CBR Constant Bit Rate
CELP Code Excited Linear Prediction
CODEC Coder/Decoder
COPS Common Open Policy Service
CRTP Compressed RTP
CSRC Contributing Source
CTI Computer-Telephony Integration
DSCP Diffserv Code Point
DSL Digital Subscriber Line
DSP Digital Signal Processor
DTMF Dual Tone Multi-Frequency
ERL Echo Return Loss
ERLE ERL Enhancement
HFC Hybrid Fiber/Coax
IN Intelligent Network
ISDN Integrated Services Digital Network
ISUP ISDN User Part
JTAPI Java Telephony API
LDAP Lightweight Directory Access Protocol
MCML Multi-class Multi-link PPP
MGCP Media Gateway Control Protocol
MOS Mean Opinion Score
MPLS Multi-protocol Label Switching
NLP Non-linear Processing
NTP Network Time Protocol
PCM Pulse Code Modulation
PPP Point-to-point Protocol
PHB Per-hop Behavior
PQ Priority Queueing
PSTN Public Switched Telephone Network
Glossary (2)
QoS Quality of Service
RED Random Early Detect (or Drop)
RTCP Realtime Transport Control Protocol
RTP Realtime Transport Protocol
SCP Service Control Point
SIP Session Initiation Protocol
SS7 Signaling System Number 7
SSRC Synchronization Source
TAPI Telephony API
TDM Time Division Multiplexed
TRIP Telephony Routing over IP
TSPEC Traffic Specification
WFQ Weighted Fair Queueing
Thanks
Enjoy Interop!
to contact me: [email protected]