Transcript threads

Combining Events and Threads
for Scalable Network Services
Peng Li and Steve Zdancewic
University of Pennsylvania
PLDI 2007, San Diego
1
Overview
 A Haskell framework for massively concurrent network
applications
 Servers, P2P systems, load generators
 Massive concurrency ::=
1,000 threads?
(easy)
A lazy, purely
functional
language
| 10,000 programming
threads? (common)
| 100,000http://www.haskell.org
threads? (challenging)
| 1,000,000 threads? (20 years later?)
| 10,000,000 threads? (in 15 minutes)
 How to write such programs?
 The very first decision to make: the programming model
Shall we use threads or events?
2
Threads vs. Events
 The event-driven model:
The multithreaded model
 One thread ↔ 10000 clients
 Asynchronous I/O
 Scheduling: programmer
 One thread ↔ one client
 Synchronous I/O
 Scheduling: OS/runtime libs
“Why threads
a bad
int send_data(int
fd1, are
int fd2)
{ idea
(for most purposes)”
while (!EOF(fd1))
{
[USENIX
ATC 1999]
size =
read_chunk(fd,
buf, count);
write_chunk(fd, buf, size);
}
events are a bad idea
while(1)“Why
{
(for high-concurrency
servers)”
nfds=epoll_wait(kdpfd,
events,
MAXEVT,-1);
for(n=0; n<nfds;
++n) 2003]
[HotOS
handle_event(events[n]);
…
…
Threads
Events
Expressiveness and
Abstraction
(for programming each client)
Synchronous I/O + intuitive Finite state machines /
control flow primitives
Continuation-passing style
(CPS) programming
Flexibility and Control
(for resource scheduling)
Baked into OS/runtime,
difficult to customize

Programmer has complete
control – tailored to each
application’s needs
3
Can we get the best of both worlds?
Programming with each client: threads
• Synchronous I/O
• Intuitive control-flow primitives
One
application
program
The bridge between threads/events?
(some kind of “continuation” support)
Resource scheduling: events
•Written as part of the application
• Tailored to application’s needs
4
Roads to lightweight,
application-level concurrency
 Direct language support for continuations:
 Good if you have them
 Source-to-source CPS translations
 Requires hacking on compiler/runtime
 Often not very elegant
 Other solutions?
 (no language support)
 (no compiler/runtime hacks)
5
The poor man’s concurrency monad
 “A poor man’s concurrency monad” by Koen Claessen,
JFP 1999. (Functional Pearl)
 The thread interface:
 The CPS monad
 The event interface:
 A lazy, tree-like data structure called “trace”
Internal
Representation
Multithreaded code
Scheduler code
SYS_NBIO(accept)
CPS
Monad
Event Abstraction
SYS_EPOLL_WAIT(s)
Thread Abstraction
server_loop s = do {
sock <- sock_accept s;
sys_fork (client sock);
server_loop;
}
client_loop sock = do {
sock_send sock data;
sock_close sock;
}
sock_send sock data = do {
...
n<-sys_nbio (write_nb ...);
...;
sys_epoll_wait sock EPOLL_READ;
...
foo;
...
n<-sys_nbio (write_nb ...);
...
}
Trace
SYS_NBIO(accept)
SYS_FORK
SYS_NBIO(write_nb)
SYS_EPOLL_WAIT(sock)
SYS_NBIO(write_nb)
scheduler = do {
...
trace <- fetch_thread;
execute trace;
...
}
execute trace =
case trace of
SYS_NBIO c ->
do { cont <- c;
execute cont;
}
SYS_FORK t1 t2 ->
...
6
Questions on the poor man’s approach
Does it work for high-performance
network services?
(using a pure, lazy, functional language?)
 How does the design scale up to real systems?
 Symmetrical multiprocessing? Synchronization? I/O?
 How cheap is it?
 How much does a poor man’s thread cost?
 How poor is it?
 Does it offer acceptable performance?
7
Our experiment
A high-performance Haskell framework for massivelyconcurrent network services!!!
 Supported features:





Linux Asynchronous IO (AIO)
epoll() and nonblocking IO
OS thread pools
SMP support
Thread synchronization primitives
 Applications developed




IO benchmarks on FIFO pipes / Disk head scheduling
A simple web server for static files
HTTP load generator
Prototype of an application-level TCP stack
We used the Glasglow Haskell Compiler (GHC)
8
Multithreaded code example
Nested function calls
Exception handling
Conditional branches
Synchronous call to
I/O lib
Recursion
9
Event-driven code example
An event loop running in a
separate OS thread
A wrapper function to the C
library call using the Haskell
Foreign Function Interface (FFI)
Put events in queues for
processing in other OS threads
10
A complete event-driven I/O
One “virtual
subsystem
processor” event
loop for each
ready_queue
CPU
System call completion, thread ready to run
Fetch threads
OS thread pool for CPS thread execution
worker_main
OS thread pool for blocking I/O
Context switch
/ fork
worker_blio
blio_queue
Fetch thread
worker_main
worker_main
worker_main
Haskell Foreign
Function
Inteface (FFI)
worker_blio
SYS_BLIO
Register event handler
Submit AIO request
with event handler
Epoll
interface
AIO interface
Event notification
AIO completion
worker_epoll
worker_aio
Each event loop
runs in a
separate OS
thread
11
Modular and customizable I/O system
(add a TCP stack if you like)
ready_queue
System call completion, thread ready to run
Fetch threads
OS thread pool for CPS thread execution
worker_main
OS thread pool for blocking I/O
Context switch
/ fork
worker_blio
blio_queue
SYS_BLIO
Fetch thread
worker_main
Define
/ interpret Register
TCPevent handler
worker_main
syscalls (22 lines)
Epoll
interface
Event
Event notification
worker_blio
worker_epoll
loop for incoming
packets (7 lines)
worker_main
Submit AIO request
with event handler
AIO interface
worker_main
Blocking
TCP User requests
TCP stack
states
AIO completion
Request Completion
Request Completion
Event loop for timers
(9 lines)
worker_aio
worker_tcp_input
worker_tcp_timer
12
How cheap is a poor man’s thread?
 Minimal memory consumption:
48 bytes
48 bytes
 Each thread just loops and does nothing
 Actual size determined by thread-local states
 Even an ethernet packet can be >1,000 bytes…
 Pay as you go --- only pay for things needed
In contrast:
 A Linux POSIX thread’s stack has
2MB by default
 The state-of-the-art user-level thread system (Capriccio) use at
least a few KBs for each thread
Observation:
The poor man’s thread is extremely memory-efficient
(Challenging most event-driven systems)
13
I/O scalability test
 Comparison against the Linux POSIX Thread Library
(NPTL)
 Highly optimized OS thread implementation
 Each NPTL thread’s stack limited to 32KB
 Mini-benchmarks used:
 Disk head scheduling (all threads running)
 FIFO pipe scalability with idle threads (128 threads running)
14
A simple web server
15
How poor is the poor man’s monad?
 Not too shabby
 Benchmarks shows comparable (if not higher)
performance to existing, optimized systems
 An elegant design is more important than 10%
performance improvement
type safety for
 Added benefit:
many dangerous things
 Continuations, thread queues, schedulers, asynchronous I/O
16
Related Work
 We are motivated by two projects:
 Twisted: the python event-driven framework for
scalable internet applications
- The programmer must write code in CPS
 Capriccio: a high-performance user-level thread
system for network servers
- Requires C compiler hacks
- Difficult to customize (e.g. adding SMP support)
 Continuation-based concurrency
 [Wand 80], [Shivers 97], …
 Other languages and programming models:
 CML, Erlang, …
17
Conclusion
 Haskell and The Poor Man’s Concurrency
Monad are a promising solution for highperformance, massively-concurrent networking
applications:
Get the best of both threads and events!
 This poor man’s approach is actually very
cheap, and not so poor!
http://www.cis.upenn.edu/~lipeng/homepage/unify.html
18