Columbia University Department of Computer Science


Scaling SIP Servers
Sankaran Narayanan
Joint work with CINEMA team
IRT Group Meeting – April 17, 2002
Agenda

Introduction
Issues in scaling
Facets of sipd architecture
Some results
Conclusion and Future Work
Introduction – SIP servers

SIP signaling – proxy, redirect
Proxies
Call routing by contact location
UDP/TCP/TLS
Stateful or stateless
Programmable scripts
User location – registrars
[Figure: SQL database]
What is scale?

Large call volumes, commodity hardware [Schu0012:Industrial]
Response times (mean, deviation), turnaround time
Goals
Delay budget [SIPstone]
R1 < 500 ms
R2 < 2 s
Class-5 switches handle > 750K BHCA
[Figure: message sequence diagrams with the labels REGISTER, 200 OK, INVITE, 180, 200, ACK, and the intervals R1 and R2]
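
As a rough aid for reading the BHCA (busy-hour call attempts) figure, an hourly count can be converted to an average per-second rate by dividing by 3600. The helper name below is illustrative only and is not part of sipd or SIPstone.

```python
def bhca_to_calls_per_second(bhca: float) -> float:
    """Convert busy-hour call attempts to an average per-second rate."""
    return bhca / 3600.0


# The Class-5 switch figure quoted on the slide.
class5_rate = bhca_to_calls_per_second(750_000)
print(f"{class5_rate:.1f} call attempts/s")
```

This is an average over the busy hour; it says nothing about burstiness within the hour.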
Limits to scaling

Not CPU bound
Network I/O – blocking
Wait for responses
Latency: contact, DNS lookups
OS resource limits
Open files (<= 1024 on Unix)
LWPs (Solaris) vs. user-kernel threads (Linux, Windows)
Try not to…
Customize and recompile the OS
Move (parts of) the server into the kernel (khttpd, AFPA, …)
The problem

Scaling CPU-bound jobs (throughput = 1/delay)
Hardware: CPU speed, RAM, …
Software: better OS, scheduler, …
Algorithm: optimize protocol processing
Blocking (network, disk I/O) is expensive
Hypothesis
I/O-bound → CPU-bound; reduce blocking
Optimized resource usage – stability at high loads
Facets of sipd architecture

Blocking
Process models
Socket management
Protocol processing
Blocking

Mutex, event (socket, timeout), fread
Queue builds up
Potentially high variability
Tandem queue system
Easy to fix
Non-blocking calls (event driven, later!)
Move queue to different thread (lazy logger)
Logger
{
lock;
write;
unlock;
}
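
The "lazy logger" idea above can be sketched as follows: request threads only enqueue a line, and a single background thread performs the blocking write. This is a minimal illustration under assumed names (`LazyLogger`, `log`, `close`), not sipd's actual logger.

```python
import io
import queue
import threading


class LazyLogger:
    """Request threads enqueue lines; one background thread does the blocking write."""

    def __init__(self, sink):
        self._sink = sink                      # any object with write(str)
        self._lines = queue.Queue()            # unbounded, so put() does not wait for space
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def log(self, line: str) -> None:
        # Only enqueues; the caller never waits on the sink itself.
        self._lines.put(line)

    def _drain(self) -> None:
        while True:
            line = self._lines.get()
            if line is None:                   # sentinel: stop the worker
                break
            self._sink.write(line + "\n")

    def close(self) -> None:
        # Lines queued before the sentinel are written first (FIFO order).
        self._lines.put(None)
        self._worker.join()


buf = io.StringIO()
logger = LazyLogger(buf)
logger.log("INVITE received")
logger.log("200 OK sent")
logger.close()
```

The lock/write/unlock sequence still exists inside the worker, but it no longer sits on the request path, so a slow disk delays log output rather than call processing.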
Blocking (2)

Call routing involves (≥ 1) contact lookups
10 ms per query (approx.)
Cache
Works well for sipd-style servers
Fetch-on-demand with replacement (harder)
Loading the entire database is easy
Needs refresh for long-lived servers
Potentially useful for DNS SRV lookups (?)
[Figure: SQL database, periodic refresh, cache; cache lookup < 1 ms]
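
A minimal sketch of the "load the entire database, refresh periodically" cache described above. The loader callback, class name, and refresh interval are assumptions for illustration; the real sipd cache and its SQL schema are not shown in the slides.

```python
import threading


class ContactCache:
    """In-memory copy of the whole contact table, reloaded periodically."""

    def __init__(self, load_all, refresh_seconds=60.0):
        self._load_all = load_all              # hypothetical: returns {user: [contacts]}
        self._refresh_seconds = refresh_seconds
        self._lock = threading.Lock()
        self._table = load_all()               # initial full load
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._refresh_loop, daemon=True)
        self._thread.start()

    def lookup(self, user):
        # Served from memory; no database round trip on the call-routing path.
        with self._lock:
            return list(self._table.get(user, []))

    def refresh(self):
        fresh = self._load_all()               # the slow query runs outside the lock
        with self._lock:
            self._table = fresh

    def _refresh_loop(self):
        # Event.wait returns False on timeout and True once stop() is called.
        while not self._stop.wait(self._refresh_seconds):
            self.refresh()

    def stop(self):
        self._stop.set()
        self._thread.join()


# A dict stands in for the SQL database in this example.
db = {"alice": ["sip:alice@host1.example.com"]}
cache = ContactCache(lambda: {k: list(v) for k, v in db.items()}, refresh_seconds=3600)
db["bob"] = ["sip:bob@host2.example.com"]      # change made in the "database"
before = cache.lookup("bob")                   # not visible until a refresh
cache.refresh()
after = cache.lookup("bob")
cache.stop()
```

The trade-off is staleness: a registration written to the database is invisible to lookups until the next refresh, which is why the refresh period matters for long-lived servers.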
REGISTER performance
Single CPU Sun Ultra10
Response time is constant for Cache (FastSQL)
Process models (1)

One thread per request
Doesn’t scale
Too many threads over a short timescale
Stateless proxy: 2-4 threads per transaction
High load affects throughput
[Figure: incoming requests R1-R4, each handled by its own thread; plot of throughput vs. load]
Process models (2)

Thread pool + queue
Fixed number of threads
Less thread overhead; more useful processing
Overload management
Drop requests in preference to responses; drop tail
Not enough if holding time is high
Each request holds (blocks) a thread
[Figure: plot of throughput vs. load]
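
The thread pool, queue, and overload policy above might look like the sketch below. The drop rule is one possible reading of the slide: a new request is tail-dropped when the queue is full, while a new response may evict the most recently queued request. All names are hypothetical.

```python
import threading
from collections import deque


class BoundedWorkQueue:
    """Bounded queue that sheds requests before responses when full."""

    def __init__(self, capacity):
        self._capacity = capacity
        self._items = deque()                  # entries are (message, is_response)
        self._cond = threading.Condition()
        self._closed = False
        self.dropped = 0

    def put(self, message, is_response):
        with self._cond:
            if len(self._items) >= self._capacity:
                # Full: tail-drop a request; a response tries to evict a request.
                if not is_response or not self._evict_newest_request():
                    self.dropped += 1
                    return False
            self._items.append((message, is_response))
            self._cond.notify()
            return True

    def _evict_newest_request(self):
        for i in range(len(self._items) - 1, -1, -1):
            if not self._items[i][1]:
                del self._items[i]
                self.dropped += 1
                return True
        return False

    def get(self):
        # Blocks until an item arrives; returns None once closed and empty.
        with self._cond:
            while not self._items and not self._closed:
                self._cond.wait()
            return self._items.popleft() if self._items else None

    def close(self):
        with self._cond:
            self._closed = True
            self._cond.notify_all()


def start_pool(work_queue, handler, num_threads):
    """Start a fixed number of worker threads that drain the queue."""
    def worker():
        while True:
            item = work_queue.get()
            if item is None:
                return
            handler(item[0])

    threads = [threading.Thread(target=worker, daemon=True) for _ in range(num_threads)]
    for t in threads:
        t.start()
    return threads


q = BoundedWorkQueue(capacity=2)
q.put("INVITE a", is_response=False)
q.put("INVITE b", is_response=False)
accepted_request = q.put("INVITE c", is_response=False)   # queue full: tail-dropped
accepted_response = q.put("200 OK", is_response=True)     # evicts "INVITE b"

handled = []
pool = start_pool(q, handled.append, num_threads=2)
q.close()
for t in pool:
    t.join()
```

As the slide notes, this only bounds the queue. If each handler blocks for a long holding time, all workers can still be tied up regardless of the drop policy.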
Stateless proxy (Solaris)
Turnaround time is almost constant for stateless proxy
The sudden increase in response time is a client problem
UDP losses on Ultra10 @ (120 * 6 * 500 * 8) bps = 2.88 Mb/s
Stateless proxy (Linux)
Request turnaround time breaks down
Response turnaround time is constant
Effect of high holding times and thread scheduling
How to set queue size – investigate?
Queue evolution for sipd
Number of requests (y-axis) waiting in the queue
for a free thread on Solaris (left) and Linux (right)
over a period of up-time (x-axis).
Process models (3)

Blocking thread model needs “too many” threads
Stateful transaction stays for 30 s
Return thread to free pool instead of blocking
Event-driven architectures
State transition triggered by a global event scheduler
OnIncoming1xx(), OnInviteTimeout(), …
SIP-CGI: pre-forked multiple processes
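
The event-driven model above can be illustrated with a small scheduler that delivers events in time order, so that a 30 s transaction occupies a heap entry instead of a blocked thread. The handler names echo the slide's OnIncoming1xx() and OnInviteTimeout(); the states are deliberately simplified and do not reproduce the full SIP transaction state machine. A virtual clock is used so the example runs instantly.

```python
import heapq
import itertools


class EventScheduler:
    """Single loop that delivers events in time order; no thread blocks per transaction."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()          # tie-breaker for events at the same time
        self.now = 0.0                         # virtual clock, in seconds

    def schedule(self, delay, callback, *args):
        heapq.heappush(self._heap, (self.now + delay, next(self._seq), callback, args))

    def run(self):
        while self._heap:
            self.now, _, callback, args = heapq.heappop(self._heap)
            callback(*args)


class InviteTransaction:
    """Simplified transaction: calling -> proceeding -> completed, or timed_out."""

    def __init__(self, scheduler, timeout=30.0):
        self.state = "calling"
        scheduler.schedule(timeout, self.on_invite_timeout)

    def on_incoming_1xx(self, code):
        if self.state == "calling":
            self.state = "proceeding"

    def on_incoming_2xx(self, code):
        if self.state in ("calling", "proceeding"):
            self.state = "completed"

    def on_invite_timeout(self):
        # The timer always fires; it only matters if no final response arrived.
        if self.state != "completed":
            self.state = "timed_out"


sched = EventScheduler()
answered = InviteTransaction(sched)
unanswered = InviteTransaction(sched)
sched.schedule(0.1, answered.on_incoming_1xx, 180)   # stands in for a network event
sched.schedule(2.0, answered.on_incoming_2xx, 200)
sched.run()
```

In a real server the loop would also wait on sockets and use wall-clock time; the point here is only that pending transactions cost memory, not threads.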
Socket management

Problem: open sockets limit (1024), “liveness” detection, retransmission
One socket per transaction does not scale
Global socket if downstream server is alive, soft state – works for UDP
Hard for TCP/TLS – connections
Worse for Java servers – no select, poll
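
One way to share a single UDP socket across many transactions is to demultiplex incoming datagrams by the branch parameter of the topmost Via header. The sketch below assumes that approach; the slides do not say how sipd matches responses, and the class and function names are hypothetical. Compact header forms and folded headers are ignored for brevity.

```python
import re
import socket

# Branch parameter of the first Via header line in the message.
BRANCH_RE = re.compile(rb"^Via:.*?;branch=([^;\s,]+)", re.IGNORECASE | re.MULTILINE)


class TransactionDemux:
    """Maps Via branch IDs to per-transaction handlers for one shared socket."""

    def __init__(self):
        self._by_branch = {}

    def register(self, branch, on_message):
        self._by_branch[branch] = on_message

    def unregister(self, branch):
        self._by_branch.pop(branch, None)

    def dispatch(self, datagram: bytes) -> bool:
        match = BRANCH_RE.search(datagram)
        if match is None:
            return False
        handler = self._by_branch.get(match.group(1).decode("ascii", "replace"))
        if handler is None:
            return False                       # unknown or already finished transaction
        handler(datagram)
        return True


def open_shared_socket(host="127.0.0.1", port=0):
    """Create the single UDP socket used for all transactions."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def serve(sock, demux):
    """Receive loop: one descriptor, however many transactions are in progress."""
    while True:
        datagram, _addr = sock.recvfrom(65535)
        demux.dispatch(datagram)


demux = TransactionDemux()
seen = []
demux.register("z9hG4bKabc", seen.append)
response = b"SIP/2.0 200 OK\r\nVia: SIP/2.0/UDP proxy.example.com;branch=z9hG4bKabc\r\n\r\n"
stray = b"SIP/2.0 200 OK\r\nVia: SIP/2.0/UDP proxy.example.com;branch=z9hG4bKzzz\r\n\r\n"
matched = demux.dispatch(response)
unmatched = demux.dispatch(stray)
```

This works for UDP because datagrams carry no connection state; for TCP and TLS each downstream peer still needs its own connection, which is the harder case the slide points to.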
Optimizing protocol processing

Not too useful if CPU is not the bottleneck
Text protocol – parsing, formatting overheads
Order of headers matters (Via)
Other optimizations (parse-on-demand, date formatting)
...
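
Parse-on-demand, mentioned above, can be sketched as a message object that splits header lines once and extracts a header's values only when they are first requested. The class is an illustration, not sipd's parser; it ignores header folding, compact forms, and whitespace before the colon.

```python
class LazyMessage:
    """Splits header lines once; extracts a header's values only on first access."""

    def __init__(self, raw: str):
        head, _, self.body = raw.partition("\r\n\r\n")
        lines = head.split("\r\n")
        self.start_line = lines[0]
        self._raw_headers = lines[1:]          # kept unparsed
        self._parsed = {}                      # header name (lowercase) -> list of values
        self.parse_count = 0                   # number of header scans actually performed

    def header(self, name):
        key = name.lower()
        if key not in self._parsed:
            self.parse_count += 1
            prefix = key + ":"
            self._parsed[key] = [
                line[len(prefix):].strip()
                for line in self._raw_headers
                if line.lower().startswith(prefix)
            ]
        return self._parsed[key]


msg = LazyMessage(
    "INVITE sip:bob@example.com SIP/2.0\r\n"
    "Via: SIP/2.0/UDP a.example.com;branch=z9hG4bK1\r\n"
    "Via: SIP/2.0/UDP b.example.com;branch=z9hG4bK2\r\n"
    "Call-ID: 42@a.example.com\r\n"
    "\r\n"
)
vias = msg.header("Via")
vias_again = msg.header("via")                 # served from the cache, no second scan
```

A stateless proxy that only needs Via and the request URI would never pay for parsing the remaining headers. As the slide cautions, this helps only when CPU is actually the bottleneck.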
Conclusion

Unlike web servers: can be stateful, less disk I/O, less impact of TCP stack/behavior, …
Pros: UDP, stateless routing, load-balancing using DNS, …
Challenges: scaling state machine
Towards 2.5M BHCA (3600 messages/s)
Event-driven architecture (SEDA?)
Resource management (file limits, threads)
Tuning operating system (scheduler, …)
Future work

Stateful proxy performance
Evaluate event-driven architecture
Effect of request forking (> 1 contacts) on server behavior
Programmable scripts
Queue management and overload control
Other types of servers (conference servers, media servers, etc.)
References

CINEMA web page. http://www.cs.columbia.edu/IRT/cinema
H. Schulzrinne. “Industrial strength internet telephony,” Presentation at 6th SIP bakeoff, Dec. 2000.
H. Schulzrinne et al. “SIPstone – Benchmarking SIP server performance,” CS Technical report, Columbia University.