Services in CINEMA
SIP Server Scalability
IRT Internal Seminar
Kundan Singh, Henning Schulzrinne
and Jonathan Lennox
May 10, 2005
Agenda
Why do we need scalability?
Scaling the server
  SIP express router (Iptel.org)
  SIPd (Columbia University)
  Threads/Processes/Events
Scaling using load sharing
  DNS-based, Identifier-based
  Two stage architecture
Conclusions
27 slides
2
Internet telephony
(SIP: Session Initiation Protocol)
[Figure: SIP call setup between yahoo.com and example.com. Labels: two user addresses ([email protected]), REGISTER, INVITE (twice), hosts 129.1.2.3 and 192.1.2.4, DB, DNS.]
3
Scalability Requirements
Depends on role in the network architecture
  Cybercafe
  Enterprise server: 1000 customers
  Edge ISP server: 10,000 customers
  Carrier (3G): 10 million customers
[Figure: network architecture. Labels: IP phones, IP network, ISP, SIP/MGC, SIP/PSTN, carrier network, GW, MG, PBX, PSTN phones, T1 PRI/BRI, PSTN.]
4
Scalability Requirements
Depends on traffic type
Registration (uniform)
Call routing (Poisson)
Instant message, presence (including sensors), device control
Stateful calls (Poisson arrival, exponential call duration)
stateful vs stateless proxy, redirect, programmable scripts
Beyond telephony (Don’t know)
Authentication, mobile users
Firewall, conference, voicemail
Transport type
UDP/TCP/TLS (cost of security)
5
SIPstone
SIP server performance metrics
[Figure: test setup with Loader, Server (with SQL database) and Handler. Message flows are labelled R1 (REGISTER, 200 OK) and R2 (INVITE, 100 Trying, 180 Ringing, 200 OK, ACK, BYE, 200 OK).]
Steady state rate for successful registration, forwarding and unsuccessful call attempts, measured using 15 min test runs.
Measure: #requests/s with given delay constraint.
Performance = f(#user, #DNS, UDP/TCP, g(request), L)
  where g = type and arrival pdf (#request/s), L = logging?
  For register, outbound proxy, redirect, proxy480, proxy200.
Parameters
  Measurement interval, transaction response time, RPS (registers/s), CPS (calls/s), transaction failure probability < 5%
  Delay budget: R1 < 500 ms, R2 < 2000 ms
Shortcomings:
  Does not consider forking, scripting, Via header, packet size, different call rates, SSL. Is there a linear combination of results?
  Whitebox measurements: turnaround time
  Extend to SIMPLEstone
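The pass criteria above (delay budget plus failure probability below 5%) can be sketched as a small check over one test run. This is only an illustration, not the SIPstone tool: the `Transaction` record, the function names, and the choice to count a late response as a failure are assumptions made for the example.

```python
from dataclasses import dataclass

R1_BUDGET_MS = 500        # delay budget R1 from the slide
R2_BUDGET_MS = 2000       # delay budget R2 from the slide
MAX_FAILURE_PROB = 0.05   # transaction failure probability < 5%


@dataclass
class Transaction:
    kind: str          # "register" (checked against R1) or "invite" (checked against R2)
    delay_ms: float    # measured response time
    succeeded: bool    # expected final response was received


def failed(tx: Transaction) -> bool:
    """Assumption: a transaction fails if it errored or missed its delay budget."""
    budget = R1_BUDGET_MS if tx.kind == "register" else R2_BUDGET_MS
    return (not tx.succeeded) or tx.delay_ms >= budget


def rate_is_sustainable(transactions: list[Transaction]) -> bool:
    """True if the failure fraction of this run stays under 5%."""
    if not transactions:
        return False
    failures = sum(failed(tx) for tx in transactions)
    return failures / len(transactions) < MAX_FAILURE_PROB
```

A load generator would raise the offered rate step by step and report the highest rate for which `rate_is_sustainable` still holds.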
6
SIP server
What happens inside a proxy?
[Figure: flow chart of message processing in a proxy. Boxes: recvfrom or accept/recv; parse; Request / Response branch; match transaction (stateful, found); modify response; stateless proxy; REGISTER: update DB; other: lookup DB; redirect/reject: build response; proxy: modify request, DNS; sendto, send or sendmsg. Legend: (blocking) I/O; critical section (lock); critical section (r/w lock).]
7
Lessons Learnt (sipd)
In-memory database
  Call routing involves ( 1) contact lookups
    10 ms per query (approx)
  Cache (FastSQL)
    Loading entire database is easy
    Periodic refresh
    < 1 ms
  Potentially useful for DNS lookups
[Figure: Web config writes to the SQL database; the cache is filled from the SQL database by periodic refresh.]
[Figure: turnaround time vs RPS, single CPU Sun Ultra10. [2002:Narayanan]]
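The "load the entire database, refresh periodically" idea can be sketched as below. This is not sipd's cache: `ContactCache`, `load_all` and the 60-second default are hypothetical names and values chosen for the example.

```python
import threading
import time


class ContactCache:
    """In-memory copy of the contact table, reloaded as a whole."""

    def __init__(self, load_all, refresh_interval_s=60.0):
        self._load_all = load_all          # callable returning {user: [contact, ...]}
        self._interval = refresh_interval_s
        self._lock = threading.Lock()
        self._contacts = {}
        self.refresh()

    def refresh(self):
        snapshot = dict(self._load_all())  # slow database read happens outside the lock
        with self._lock:
            self._contacts = snapshot      # swap in the new snapshot

    def lookup(self, user):
        with self._lock:                   # short critical section, no database I/O
            return self._contacts.get(user, [])

    def start(self):
        """Run the periodic refresh in a background thread."""
        def loop():
            while True:
                time.sleep(self._interval)
                self.refresh()
        threading.Thread(target=loop, daemon=True).start()
```

The trade-off is staleness: a registration written to the database is invisible to `lookup` until the next refresh, so the interval bounds how old a routing decision can be.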
8
Lessons Learnt (sipd)
Thread-per-request does not scale
  One thread per message
    Doesn't scale
    Too many threads over a short timescale
    Stateless: 2-4 threads per transaction
    Stateful: 30s holding time
  Thread pool + queue
    Thread overhead less; more useful processing
    Pre-fork processes for SIP-CGI
  Overload management
    Graceful failure, drop requests over responses
  Not enough if holding time is high
    Each request holds (blocks) a thread
[Figure: incoming requests R1-4 handled by one thread per request vs a fixed number of threads. Plot of throughput vs load for "thread per request" and "thread pool with overload control".]
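A thread pool with a bounded queue that sheds new requests before responses might look like the following sketch. The two thresholds and all names (`OverloadQueue`, `offer`, `start_workers`) are assumptions for illustration; sipd's actual overload policy is not shown on the slide.

```python
import queue
import threading


class OverloadQueue:
    """Bounded work queue that drops new requests before it drops responses."""

    def __init__(self, request_limit, hard_limit):
        self._q = queue.Queue(maxsize=hard_limit)
        self._request_limit = request_limit

    def offer(self, msg, is_response):
        # Responses complete work already in progress, so they are admitted up
        # to the hard limit; new requests are refused once the queue is busy.
        if not is_response and self._q.qsize() >= self._request_limit:
            return False
        try:
            self._q.put_nowait((msg, is_response))
            return True
        except queue.Full:
            return False

    def take(self):
        return self._q.get()


def start_workers(work_queue, handler, n_threads):
    """Fixed number of pool threads, each processing one message at a time."""
    def worker():
        while True:
            msg, is_response = work_queue.take()
            handler(msg, is_response)
    for _ in range(n_threads):
        threading.Thread(target=worker, daemon=True).start()
```

The receiver thread calls `offer` for every parsed message and can answer a refused request cheaply instead of queueing it. As the slide notes, this does not help when each request blocks a pool thread for a long holding time.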
9
What is the best architecture?
1. Event-based
  Reactive system
2. Process pool
  Each pool process receives and processes to the end (SER)
3. Thread pool
  Receive and hand-over to pool thread (sipd)
  Each pool thread receives and processes to the end
  Staged event-driven: each stage has a thread pool
[Figure: the proxy message-processing flow chart from "What happens inside a proxy?", repeated.]
10
Stateless proxy
UDP, no DNS, six messages per call
[Figure: the proxy message-processing flow chart from "What happens inside a proxy?", repeated.]
11
Stateless proxy
UDP, no DNS, six messages per call
[Figure: bar chart comparing Event, Th/msg, Th-pool1, Th-pool2 and Proc-pool on 1xP/Linux, 4xP/Linux, 1xS/Solaris and 2xS/Solaris; vertical axis from 0 to 4.]
Calls per second (CPS) by architecture and hardware

Hardware
  1xP/Linux:   1 PentiumIV 3GHz, 1GB, Linux2.4.20
  4xP/Linux:   4 pentium, 450MHz, 512 MB, Linux2.4.20
  1xS/Solaris: 1 ultraSparc-IIi, 300 MHz, 64MB, Solaris
  2xS/Solaris: 2 ultraSparc-II, 300 MHz, 256MB, Solaris

Architecture    1xP/Linux   4xP/Linux   1xS/Solaris   2xS/Solaris
Event-based     1650        370         150           190
Thread/msg      1400        TBD         100           TBD
Thread-pool1    1450        600 (?)     110           220 (?)
Thread-pool2    1600        1150 (?)    152           TBD
Process-pool    1700        1400        160           350
12
Stateful proxy
UDP, no DNS, eight messages per call
Event-based
  single thread: socket listener + scheduler/timer
Thread-per-message
  pool_schedule => pthread_create
Thread-pool1 (sipd)
Thread-pool2
  N event-based threads
  Each handles specific subset of requests (hash(call-id))
  Receive & hand over to the correct thread
  poll in multiple threads => bad on multi-CPU
Process pool
  Not finished yet
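The thread-pool2 hand-over rule, "each thread handles a specific subset of requests (hash(call-id))", can be sketched as follows. CRC32 stands in for whatever hash the server uses, and `Dispatcher` and `worker_index` are hypothetical names.

```python
import queue
import zlib


def worker_index(call_id: str, n_workers: int) -> int:
    """Map a Call-ID to a worker; CRC32 is an assumed, stable stand-in hash."""
    return zlib.crc32(call_id.encode("utf-8")) % n_workers


class Dispatcher:
    """Receiver side: one queue per event-based worker thread."""

    def __init__(self, n_workers: int):
        self.queues = [queue.Queue() for _ in range(n_workers)]

    def dispatch(self, call_id: str, message) -> int:
        idx = worker_index(call_id, len(self.queues))
        self.queues[idx].put(message)   # hand over to the owning worker
        return idx
```

Because every message of a call lands on the same worker, transaction state for that call is touched by one thread only and needs no cross-thread locking.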
13
Stateful proxy
UDP, no DNS, eight messages per call
[Figure: bar chart comparing Event, Th/msg, Th-pool1 and Th-pool2 on 1xP/Linux, 4xP/Linux, 1xS/Solaris and 2xS/Solaris; vertical axis from 0 to 1.8.]
Calls per second (CPS) by architecture and hardware

Hardware
  1xP/Linux:   1 PentiumIV 3GHz, 1GB, Linux2.4.20
  4xP/Linux:   4 pentium, 450MHz, 512 MB, Linux2.4.20
  1xS/Solaris: 1 ultraSparc-IIi, 360MHz, 256 MB, Solaris5.9
  2xS/Solaris: 2 ultraSparc-II, 300 MHz, 256 MB, Solaris5.8

Architecture    1xP/Linux   4xP/Linux    1xS/Solaris   2xS/Solaris
Event-based     1200        300          160           160
Thread/msg      650         175          90            120
Thread-pool1    950         340 (p=4)    120           120 (p=4)
Thread-pool2    1100        500 (p=4)    155           200 (p=4)
Process-pool    -           -            -             -
14
Lessons Learnt
What is the best architecture?
Stateless
CPU is bottleneck
Memory is constant
Process pool is the best
Event-based not good for multi-CPU
Thread/msg and thread-pool similar
Thread-pool2 close to process-pool
Stateful
Memory can become bottleneck
Thread-pool2 is good
But not N x CPU
Not good if P
CPU
Process pool may be better (?)
15
Lessons Learnt (sipd)
Avoid blocking function calls
DNS
10-25 ms (29 queries)
Cache
Lazy logger as a separate thread
Date formatter
non-blocking
Logger
110 to 900 CPS
Internal vs external
Logger:
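/* Fragment only: log_lock and write_all_pending() are placeholder names. */
while (1) {
    pthread_mutex_lock(&log_lock);    /* lock */
    write_all_pending();              /* writeall: flush buffered log lines */
    pthread_mutex_unlock(&log_lock);  /* unlock */
    sleep(1);                         /* sleep until the next flush */
}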
Strftime() 10% REG processing
Update date variable every second
random32()
Cache gethostid()- 37s
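The lazy logger can be sketched as a buffer that request threads append to and a separate thread flushes. Unlike the slide's loop, this variant swaps the buffer under the lock and writes outside it, which is a design choice of the example; `LazyLogger` and the one-second interval are assumptions.

```python
import threading
import time


class LazyLogger:
    """Request threads only append in memory; a separate thread does the I/O."""

    def __init__(self, sink, interval_s=1.0):
        self._sink = sink              # any object with a write(str) method
        self._interval = interval_s
        self._lock = threading.Lock()
        self._pending = []

    def log(self, line):
        with self._lock:               # short critical section, no I/O
            self._pending.append(line)

    def flush(self):
        with self._lock:               # take the whole batch, leave an empty buffer
            batch, self._pending = self._pending, []
        for line in batch:             # slow write happens without the lock held
            self._sink.write(line + "\n")
        return len(batch)

    def start(self):
        def loop():
            while True:
                self.flush()
                time.sleep(self._interval)
        threading.Thread(target=loop, daemon=True).start()
```

The same pattern applies to the date string: one thread refreshes a shared, preformatted value once per second so that request threads never call the formatter themselves.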
16
Lessons Learnt (sipd)
Resource management
Socket management
  Problems: OS limit (1024), "liveness" detection, retransmission
  One socket per transaction does not scale
    Global socket if downstream server is alive, soft state; works for UDP
    Hard for TCP/TLS; apply connection reuse
  Socket buffer size
    64KB to 128KB; tradeoff: memory per socket vs number of sockets
Memory management
  Problems: too many malloc/free, leaks
  Memory pool
    Transaction specific memory, free once; also, less memcpy
    About 30% performance gain
    Stateful: 650 to 800 CPS; Stateless: 900 to 1200 CPS

Stateless processing time (s)
                  INV   180   200   ACK   BYE   200   REG   200
W/o mempool       155   67    67    95    139   62    237   70
W/ mempool        111   49    48    64    106   41    202   48
Improvement (%)   28    27    28    33    24    34    15    31
17
Lessons Learnt (SER)
Optimizations
Reduce copying and string operations
  Data lumps, counted strings (+5-10%)
Reduce URI comparison to local
  User part as a keyword, use r2 parameters
Parser
  Lazy parsing (2-6x), incremental parsing
  32-bit header parser (2-3.5x)
    Use padding to align
    Fast for general case (canonicalized)
  Case compare
    Hash-table, sixth bit
Database
  Cache is divided into domains for locking
[2003:Jan Janak] SIP proxy server effectiveness, Master’s thesis, Czech Technical University
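The "32-bit header parser" and "sixth bit" items can be illustrated together: read the first four bytes of a header line as one integer, set the sixth bit (0x20) of every byte to fold ASCII letters to lower case, and compare against precomputed keys. This is a sketch of the technique, not SER's parser; the table contents and names are assumptions.

```python
LOWER_MASK = 0x20202020   # sets bit 0x20 (the "sixth bit") in each of four bytes


def _key(four: bytes) -> int:
    """Pack four bytes into one 32-bit integer and fold letters to lower case."""
    return int.from_bytes(four, "big") | LOWER_MASK


# Hypothetical table covering a few headers in their canonical form.
KNOWN = {
    _key(b"from"): "From",
    _key(b"via:"): "Via",
    _key(b"cseq"): "CSeq",
    _key(b"call"): "Call-ID",
}


def classify(line: bytes):
    """Fast path: one integer comparison instead of a per-character string compare."""
    if len(line) < 4:
        return None
    return KNOWN.get(_key(line[:4]))
```

Setting the bit also changes a few non-letters (for example `@` becomes a backquote), so a full parser would confirm the remainder of the name and fall back to a slower path for unusual spellings such as a space before the colon.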
18
Lessons Learnt (SER)
Protocol bottlenecks and other scalability concerns
Protocol bottlenecks
  Parsing
    Order of headers
    Host names vs IP address
    Line folding
    Scattered headers (Via, Route)
  Authentication
    Reuse credentials in subsequent requests
  TCP
    Message length unknown until Content-Length
Other scalability concerns
  Configuration:
    broken digest client, wrong password, wrong expires
  Overuse of features
    Use stateless instead of stateful if possible
    Record route only when needed
    Avoid outbound proxy if possible
19
Load Sharing
Distribute load among multiple servers
Single server scalability
There is a maximum capacity limit
Multiple servers
DNS-based
Identifier-based
Network address translation
Same IP address
20
Load Sharing (DNS-based)
Redundant proxies and databases
[Figure: proxies P1, P2 and P3 share databases D1 and D2. REGISTER: write to D1 & D2. INVITE: read from D1 or D2.]
Database write/synchronization traffic becomes bottleneck
21
Load Sharing (Identifier-based)
Divide the user space
  Proxy and database on the same host
First-stage proxy may get overloaded
  Use many
Hashing
  Static vs dynamic
[Figure: user space split across P1 (a-h) with D1, P2 (i-q) with D2, and P3 (r-z) with D3.]
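Dividing the user space can be sketched in two ways: the static letter ranges from the figure, or a hash of the user name. The fallback for names outside a-z and the use of CRC32 are assumptions of this example.

```python
import zlib

# Static split from the figure: first letter of the user name selects the server.
RANGES = [("a", "h", "P1"), ("i", "q", "P2"), ("r", "z", "P3")]


def pick_by_range(user: str) -> str:
    first = user[:1].lower()
    for low, high, server in RANGES:
        if low <= first <= high:
            return server
    return RANGES[0][2]   # assumption: anything outside a-z goes to the first group


def pick_by_hash(user: str, servers: list[str]) -> str:
    """Hash-based split; CRC32 is a stand-in for the server's real hash function."""
    return servers[zlib.crc32(user.lower().encode("utf-8")) % len(servers)]
```

Letter ranges are easy to publish in DNS but follow the uneven distribution of names; a hash spreads users more evenly, at the cost of remapping users when the server list changes.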
22
Load Sharing
Comparison of the two designs
DNS-based
  [Figure: P1, P2 and P3 each use both D1 and D2.]
  Total time per DB = ((tr/D) + 1) TN = (A/D) + B
Identifier-based
  [Figure: P1 (a-h) with D1, P2 (i-q) with D2, P3 (r-z) with D2.]
  High scale, low reliability
  Total time per DB = ((tr + 1)/D) TN = (A/D) + (B/D)
where
  D = number of database servers
  N = number of writes (REGISTER)
  r = #reads/#writes = (INV+REG)/REG
  T = write latency
  t = read latency/write latency
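The two expressions differ only in whether the write term is divided by D: replicated databases each absorb every write, partitioned ones absorb 1/D of them. A short numeric check, with made-up values for D, N, r, t and T:

```python
def dns_based_time(D, N, r, t, T):
    """Per-database time when every write goes to all D replicas: ((t*r/D) + 1) * T * N."""
    return ((t * r / D) + 1) * T * N


def identifier_based_time(D, N, r, t, T):
    """Per-database time when users are partitioned over D databases: ((t*r + 1)/D) * T * N."""
    return ((t * r + 1) / D) * T * N


# Hypothetical workload: 1000 writes, 3 reads per write,
# reads cost half a write, a write takes 10 ms.
N, r, t, T = 1000, 3, 0.5, 10.0
for D in (1, 2, 4):
    print(D, dns_based_time(D, N, r, t, T), identifier_based_time(D, N, r, t, T))
```

With these numbers the replicated design never drops below the write floor B = TN = 10000 ms per database however many servers are added, while the partitioned design keeps shrinking with D.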
23
Scalability (and Reliability)
Two stage architecture for CINEMA
First stage: s1, s2, s3, ex
  example.com
  _sip._udp
    SRV 0 40 s1.example.com
    SRV 0 40 s2.example.com
    SRV 0 20 s3.example.com
    SRV 1 0 ex.backup.com
Second stage, a*@example.com: a1 (Master), a2 (Slave)
  a.example.com
  _sip._udp
    SRV 0 0 a1.example.com
    SRV 1 0 a2.example.com
Second stage, b*@example.com: b1 (Master), b2 (Slave)
  b.example.com
  _sip._udp
    SRV 0 0 b1.example.com
    SRV 1 0 b2.example.com
Example request URIs: sip:[email protected], sip:[email protected]
Request-rate = f(#stateless, #groups)
Bottleneck: CPU, memory, bandwidth?
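How a client might use the example.com records can be sketched as: take the lowest-priority group that is still reachable, then pick within it in proportion to weight. The record tuples copy the slide (which omits the port field); `pick_server` and the `down` set are hypothetical, and the weighted draw is a simplification of the ordering procedure in the SRV specification.

```python
import random

# (priority, weight, target) for example.com, as listed on the slide.
EXAMPLE_COM = [
    (0, 40, "s1.example.com"),
    (0, 40, "s2.example.com"),
    (0, 20, "s3.example.com"),
    (1, 0, "ex.backup.com"),
]


def pick_server(records, down=frozenset(), rng=random):
    """Choose a target: lowest priority first, weighted random within that priority."""
    alive = [rec for rec in records if rec[2] not in down]
    if not alive:
        return None
    best = min(priority for priority, _, _ in alive)
    group = [rec for rec in alive if rec[0] == best]
    weights = [weight for _, weight, _ in group]
    if sum(weights) == 0:
        return rng.choice(group)[2]        # all weights zero: pick uniformly
    return rng.choices(group, weights=weights, k=1)[0][2]
```

With all three first-stage servers up, s1 and s2 each receive about 40% of new requests and s3 about 20%; ex.backup.com is used only when none of them responds.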
24
Load Sharing
Result (UDP, stateless, no DNS, no mempool)
S   P   CPS
3   3   2800
2   3   2100
2   2   1800
1   2   1050
0   1   900
25
Lessons Learnt
Load sharing
  Non-uniform distribution
    Identifier distribution (bad hash function)
    Call distribution => dynamically adjust
  Stateless proxy
    S=1050, P=900 CPS
    S3P3 => 10 million BHCA (busy hour call attempts)
  Stateful proxy
    S=800, P=650 CPS
  Registration (no auth)
    S=2500, P=2400 RPS
    S3P3 => 10 million subscribers (1 hour refresh)
  Memory pool and thread-pool2/event-based further increase the capacity (approx 1.8x)
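The two "10 million" figures follow from simple rate conversions, shown here as a short check. The 2800 CPS input is the S3P3 result from the load-sharing table; the function names are made up for the example.

```python
SECONDS_PER_HOUR = 3600


def busy_hour_call_attempts(cps):
    """Calls per second sustained for one hour."""
    return cps * SECONDS_PER_HOUR


def registrations_per_second(subscribers, refresh_interval_s):
    """Steady REGISTER rate if every subscriber refreshes once per interval."""
    return subscribers / refresh_interval_s


print(busy_hour_call_attempts(2800))                        # 10080000
print(round(registrations_per_second(10_000_000, 3600)))    # 2778
```

So 2800 CPS corresponds to roughly 10 million busy hour call attempts, and 10 million subscribers refreshing hourly generate about 2800 registrations per second.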
26
Conclusions and future work
Server scalability
  Non-blocking, process/events/thread, resource management, optimizations
Load sharing
  DNS, Identifier, two-stage
Current and future work:
  Measure process pool performance for stateful
  Optimize sipd
    Use thread-pool2/event-based (?)
    Memory - use counted strings; clean after 200 (?)
    CPU - use hash tables
  Presence, call stateful and TLS performance (Vishal and Eilon)
27
Backup slides
Telephone scalability
(PSTN: Public Switched Telephone Network)
[Figure: PSTN hierarchy with capacity figures.]
  local telephone switch (class 5 switch): 10,000 customers, 20,000 calls/hour
  regional telephone switch (class 4 switch): 100,000 customers, 150,000 calls/hour
  signaling router (STP): 1 million customers, 1.5 million calls/hour
  database (SCP) for freephone, calling card, …: 10 million customers, 2 million lookups/hour
  Other labels: signaling network (SS7), "bearer" network, telephone switch (SSP)
29
SIP server
Comparison with HTTP server
Signaling (vs data) bound
  No File I/O (exception: scripts, logging)
  No caching; DB read and write frequency are comparable
Transactions
  Stateful wait for response
DNS, SQL database
  Depends on external entities
Transport
  UDP in addition to TCP/TLS
Goals
  Carrier class scaling using commodity hardware
  Try not to customize/recompile OS or implement (parts of) server in kernel (khttpd, AFPA)
30
Related work
Scalability for (web) servers
Existing work
  Connection dispatcher
  Content/session-based redirection
  DNS-based load sharing
HTTP vs SIP
  UDP+TCP, signaling not bandwidth intensive, no caching of response, read/write ratio is comparable for DB
SIP scalability bottleneck
  Signaling (chapter 4), real-time media data, gateway
  302 redirect to less loaded server, REFER session to another location, signal upstream to reduce
31
Related work
3GPP (release 5)’s IP Multimedia core network Subsystem uses SIP
Proxy-CSCF (call session control function)
  First contact in visited network. 911 lookup. Dialplan.
Interrogating-CSCF
  First contact in operator's network.
  Locate S-CSCF for register
Serving-CSCF
  User policy and privileges, session control service
  Registrar
Connection to PSTN
  MGCF and MGW
32
Server-based vs peer-to-peer
Reliability, failover latency
  Server-based: DNS-based. Depends on client retry timeout, DB replication latency, registration refresh interval.
  Peer-to-peer: DHT self organization and periodic registration refresh. Depends on client timeout, registration refresh interval.
Scalability, number of users
  Server-based: Depends on number of servers in the two stages.
  Peer-to-peer: Depends on refresh rate, join/leave rate, uptime.
Call setup latency
  Server-based: One or two steps.
  Peer-to-peer: O(log(N)) steps.
Security
  Server-based: TLS, digest authentication, S/MIME.
  Peer-to-peer: Additionally needs a reputation system, working around spy nodes.
Maintenance, configuration
  Server-based: Administrator: DNS, database, middle-box.
  Peer-to-peer: Automatic: one time bootstrap node addresses.
PSTN interoperability
  Server-based: Gateways, TRIP, ENUM.
  Peer-to-peer: Interact with server-based infrastructure or co-locate peer node with the gateway.
33
Comparison of sipd and SER
sipd
  Thread pool
  Events (reactive system)
  Memory pool
  PentiumIV 3GHz, 1GB, 1200 CPS, 2400 RPS (no auth)
SER
  Process pool
  Custom memory management
  PentiumIII 850 MHz, 512 MB => 2000 CPS, 1800 RPS
34