Rethinking OS support for high-speed networking


OS support for (flexible)
high-speed networking
u = uspace, k = kspace, n = nspace
Herbert Bos
Vrije Universiteit Amsterdam
• monitoring
• network processors
• intrusion detection
• multiple concurrent apps
• packet processing
(NATs, firewalls, etc.)
• optical infrastructure
we’re hurting because of
● hardware problems
● software problems
● language problems
● lack of flexibility
● network speed may grow too fast anyway
● multiple apps problem
[diagram: CPU, MEM, NIC]
What are we building?
[diagram: flows crossing different barriers]
What are we building?
if (pkt.proto == TCP) then
if (http) then
scan webattacks;
else if (smtp)
scan spam;
scan viruses
fi
else if (pkt.proto == UDP) then
• reconfigure when load changes
if
(NFS
traffic)
• dynamic optical infrastructure:
= statistics
(pkt);
-mem
configure
at startup
time
else
scan SLAMMER;
- construct
datapaths
fi
reduce copying
● FFPF avoids both ‘horizontal’ and ‘vertical’ copies
[diagram: Application A and Application B share a buffer across the userspace (U) / kernel (K) boundary through a ‘filter’]
- no ‘vertical’ copies
- no ‘horizontal’ copies within flow group
- more than ‘just filtering’ in kernel (e.g., statistics)
minimise copying, context switching
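How an application might get at the shared packet buffer without ‘vertical’ copies: a minimal sketch, assuming a hypothetical /dev/ffpf device and buffer size (neither is the actual FFPF interface).

/* sketch: map the kernel-side packet buffer into userspace so packets can
 * be read without copying them up; the device name and BUF_SIZE are
 * assumptions, not the real FFPF interface */
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define BUF_SIZE (4 * 1024 * 1024)   /* assumed size of the shared buffer */

int main(void)
{
    int fd = open("/dev/ffpf", O_RDONLY);            /* hypothetical device */
    if (fd < 0) { perror("open"); return 1; }

    /* read-only, shared mapping: the kernel/NIC writes, we only read */
    void *pkts = mmap(NULL, BUF_SIZE, PROT_READ, MAP_SHARED, fd, 0);
    if (pkts == MAP_FAILED) { perror("mmap"); close(fd); return 1; }

    /* ... walk the buffer here ... */

    munmap(pkts, BUF_SIZE);
    close(fd);
    return 0;
}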
- different copying regimes
- different buffer regimes
- endianness
- new languages
- control / dataplane
- flowgraph = DAG
- combine multiple requests
Software structure (top to bottom)
- applications
- FFPF toolkit: libpcap / FFPF / MAPI (userspace)
- userspace-kernel boundary
- FFPF (kernelspace)
- host-NIC boundary
- FFPF (ixp2xxx)
current status
● working for Linux PC with
  – NICs
  – IXP1200s and IXP2400s (plugged in)
  – remote IXP2850 (with poetic license)
● rudimentary support for distributed packet processor
● speed comparable to tailored solutions
Thank you!
http://ffpf.sourceforge.net/
bridging barriers
Three flavours of packet processing on IXP & host
• Regular
  - copy only when needed
  - may be slow depending on access pattern
• Zero copy
  - never copy
  - may be slow depending on access pattern
• Copy once
  - copy always
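A hedged sketch of the three flavours in C; every name here (pkt_handle, nic_mem, host_copy) is invented purely to show when the copy across the host-NIC boundary happens.

/* illustrative sketch of the three flavours; all names are invented */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    const uint8_t *nic_mem;   /* packet as it sits in NIC/IXP memory */
    uint8_t       *host_copy; /* scratch space in host memory */
    size_t         len;
    int            fetched;   /* has the packet been copied to the host yet? */
} pkt_handle;

/* copy once: always copy the whole packet to host memory up front */
static const uint8_t *access_copy_once(pkt_handle *p)
{
    memcpy(p->host_copy, p->nic_mem, p->len);
    p->fetched = 1;
    return p->host_copy;
}

/* zero copy: never copy; every access crosses the host-NIC boundary,
 * which may be slow depending on the access pattern */
static const uint8_t *access_zero_copy(pkt_handle *p)
{
    return p->nic_mem;
}

/* regular (on-demand): copy only when the packet is actually touched */
static const uint8_t *access_regular(pkt_handle *p)
{
    if (!p->fetched) {
        memcpy(p->host_copy, p->nic_mem, p->len);
        p->fetched = 1;
    }
    return p->host_copy;
}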
combining filters
(dev,expr=/dev/ixp0) > [ [ (BPF,expr=“…”) > (PKTCOUNT) ] | (FPL2,expr=“…”) ] > (PKTCOUNT)
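Tying this to the API of Example 1 below: the combined expression is just a string handed to ffpf_open(); the prototypes here are assumptions inferred from that example.

/* sketch: pass the combined filter expression to the FFPF toolkit */
#include <stddef.h>
#include <string.h>

extern void *ffpf_open(char *expr, size_t len);   /* assumed prototype */
extern int   ffpf_close(void *handle);            /* assumed prototype */

static void *open_combined_filter(void)
{
    static char expr[] =
        "(dev,expr=/dev/ixp0) > [ [ (BPF,expr=\"...\") > (PKTCOUNT) ] "
        "| (FPL2,expr=\"...\") ] > (PKTCOUNT)";
    return ffpf_open(expr, strlen(expr));
}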
Example 1: writing applications
int main(int argc, char** argv) {
    void *opaque_handle;
    int count, returnval;

    /* pass the string to the subsystem. It
       initializes and returns an opaque pointer */
    opaque_handle = ffpf_open(argv[1], strlen(argv[1]));

    if (start_listener(asynchronous_listener, NULL))
    {
        printf("WARNING: spawn failed\n");
        ffpf_close(opaque_handle);
        return -1;
    }

    count = ffpf_process(FFPF_PROCESS_FOREVER);
    stop_listener(1);
    returnval = ffpf_close(opaque_handle);
    printf("processed %d packets\n", count);
    return returnval;
}
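The string in argv[1] is a flowgraph expression like those on the ‘combining filters’ and ‘Extensible’ slides, so a hypothetical invocation (binary name invented) could be:

./grabber '(device,eth0) -> (sampler,2) -> (BPF,"...") -> (packetcount)'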
Example 1: writing applications
void* asynchronous_listener(void* arg) {
    int i;
    while (!should_stop()) {
        sleep(5);
        for (i = 0; i < export_len; i++) {
            printf("flowgrabber %d: read %d, written %d\n", i,
                   get_stats_read(exported[i]),
                   get_stats_processed(exported[i]));
            while ((pkt = get_packet_next(exported[i]))) {
                dump_pkt(pkt);
            }
        }
    }
    return NULL;
}
Example 2: FPL-2
• new pkt processing language: FPL-2
• for IXP, kernel and userspace
• simple, efficient and flexible
• simple example: filter all webtraffic

  IF ( (PKT.IP_PROTO == PROTO_TCP) && (PKT.TCP_PORT == 80)) THEN RETURN 1;

• more complex example: count pkts in all TCP flows

  IF (PKT.IP_PROTO == PROTO_TCP) THEN
      R[0] = Hash[ 14, 12, 1024];
      M[ R[0] ]++;
  FI
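For comparison, roughly what the compiled web-traffic filter does on a raw packet: a C sketch, assuming pkt points at a plain IPv4 header (the function name is invented).

/* sketch: C equivalent of the FPL-2 web-traffic filter above */
#include <stddef.h>
#include <stdint.h>

static int is_web_traffic(const uint8_t *pkt, size_t len)
{
    if (len < 20)
        return 0;
    size_t  ihl   = (size_t)(pkt[0] & 0x0f) * 4;   /* IPv4 header length */
    uint8_t proto = pkt[9];                        /* PKT.IP_PROTO */
    if (proto != 6 /* PROTO_TCP */ || len < ihl + 4)
        return 0;
    uint16_t dport = (uint16_t)((pkt[ihl + 2] << 8) | pkt[ihl + 3]); /* PKT.TCP_PORT */
    return dport == 80;
}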
FPL-2
● all common arithmetic and bitwise operations
● all common logical ops
● all common integer types
● memory access
  – for packet
  – for buffer (useful for keeping state!)
● statements
  – Hash
  – External functions
    ● to call hardware implementations
    ● to call fast C implementations
  – If … then … else
  – For … break; … Rof
  – Return
Example application: dynamic ports

// R[0] stores no. of dynports found (initially 0)
IF (PKT.IP_PROTO == PROTO_TCP) THEN
    IF (PKT.TCP_DPORT == 554) THEN
        M[R[0]] = EXTERN("GetDynTCPDPortFromRTSP", 0, 0);
        R[0]++;
    ELSE  // compare pkt’s dst port to all ports in array – if match, return pkt
        FOR (R[1]=0; R[1] < R[0]; R[1]++)
            IF (PKT.TCP_DPORT == M[ R[1] ]) THEN
                RETURN TRUE;
            FI
        ROF
    FI
FI
RETURN FALSE;
efficient
● reduced copying and context switches
● sharing data
● flowgraphs: sharing computations
● “push filtering tasks as far down the processing hierarchy as possible”
[diagram: filters placed in userspace, kernel, and on the network card]
Network Monitoring
● Increasingly important
  – traffic characterisation, security
  – traffic engineering, SLAs, billing, etc.
● Existing solutions:
  – designed for slow networks or traffic engineering/QoS
  – not very flexible
We’re hurting because of
– hardware (bus, memory)
– software (copies, context switches)
[figure: spread of SAPPHIRE in 30 minutes]
→ demand for solution:
  - scales to high link rates
  - scales in no. of apps
  - flexible
  - process at lowest possible level
  - minimise copying
  - minimise context switching
  - freedom at the bottom
generalised notion of flow
Flow: “a stream of packets that match arbitrary user criteria”
[diagram: example flowgraph for UID 0, with nodes such as eth0, IP, TCP, UDP, TCP SYN, HTTP, RTSP, RTP, “UDP with CodeRed”, “contains worm”, bytecount]
reduce copying
● FFPF avoids both ‘horizontal’ and ‘vertical’ copies
[diagram: Application A and Application B share a buffer across the userspace (U) / kernel (K) boundary through a ‘filter’]
Extensible
(device,eth0) | (device,eth1) -> (sampler,2) -> (FPL-2,”..”) | (BPF,”..”) -> (bytecount)
(device,eth0) -> (sampler,2) -> (BPF,”..”) -> (packetcount)
✔ modular framework
✔ language agnostic
✔ plug-in filters
Buffers
● PacketBuf
  – circular buffer with N fixed-size slots
  – large enough to hold packet
● IndexBuf
  – circular buffer with N slots
  – contains classification result + pointer
[diagram: circular buffers with read (R) and write (W) pointers]
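A rough C rendering of the two buffers; only “N fixed-size slots” and “classification result + pointer” come from the slide, all field names and sizes are assumptions.

/* sketch: the two circular buffers described above (layout assumed) */
#include <stdint.h>

#define N_SLOTS   1024   /* assumed number of slots */
#define SLOT_SIZE 2048   /* assumed: large enough to hold a full packet */

/* PacketBuf: circular buffer with N fixed-size slots */
struct packet_buf {
    uint8_t  slot[N_SLOTS][SLOT_SIZE];
    uint64_t write_idx;              /* writer position (W in the figure) */
};

/* IndexBuf: circular buffer with N slots; each entry holds the
 * classification result plus a reference to the packet */
struct index_entry {
    uint32_t classification;         /* result of the filter/classifier */
    uint32_t pkt_slot;               /* index into packet_buf */
};

struct index_buf {
    struct index_entry slot[N_SLOTS];
    uint64_t write_idx;
    uint64_t read_idx[8];            /* assumed: one read pointer per reader */
};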
R1
Buffer managementO
 what to do if writer catches
up with slowest reader?
●
–
●
fast reader preference
–
application responsible for keeping up
●
●
O
O
O
can check that packets have been overwritten
different drop rates for different apps
O
O
overall speed determined by slowest reader
overwrite existing packets
O
O
drop new packets
(traditional way of dealing with this)
–
O
O
slow reader preference
–
O
O
O
O
O
O
W
R1
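What “can check that packets have been overwritten” might look like under fast-reader preference: a sketch that reuses the (assumed) structs from the Buffers sketch above and a monotonically increasing write index.

/* sketch: reader side under fast-reader preference; a reader detects
 * overwritten slots by comparing its position with the writer's index
 * (struct index_buf / index_entry as in the Buffers sketch) */
#include <stdint.h>

static int read_next(struct index_buf *ib, int reader, struct index_entry *out)
{
    uint64_t r = ib->read_idx[reader];

    if (r == ib->write_idx)
        return 0;                         /* nothing new yet */

    if (ib->write_idx - r > N_SLOTS)      /* writer lapped this reader:     */
        r = ib->write_idx - N_SLOTS;      /* those packets were overwritten */

    *out = ib->slot[r % N_SLOTS];
    /* a real implementation would re-check write_idx here to make sure the
     * slot was not overwritten while it was being copied */
    ib->read_idx[reader] = r + 1;
    return 1;
}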
Languages
● FFPF is language neutral
● Currently we support:
  – BPF
  – C
  – OKE Cyclone
  – FPL

IF (PKT.IP_PROTO == PROTO_TCP) THEN
    // reg.0 = hash over flow fields
    R[0] = Hash (14,12,1024)
    // increment pkt counter at this location in MBuf
    MEM[ R[0] ]++
FI

FPL:
• simple to use
• compiles to optimised native code
• resource limited (e.g., restricted FOR loop)
• access to persistent storage (scratch memory)
• calls to external functions (e.g., fast C functions or hardware assists)
• compiler for uspace, kspace, and nspace (ixp1200)
packet sources
● currently three kinds implemented:
  - netfilter
  - net_if_rx()
  - IXP1200
[diagram: processing hierarchy spanning uspace, kspace and nspace]

implementation on IXPs: NIC-FIX
- bottom of the processing hierarchy
- eliminates mem & bus bottlenecks
- network processors: a “programmable NIC”
- copy strategies: zero copy, copy once, on-demand copy
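For the netfilter packet source, the kernel side presumably registers a hook along these lines; a sketch against the 2.6-era netfilter API, with ffpf_handle_packet() as an invented placeholder for the FFPF core.

/* sketch: kernel-side netfilter packet source (2.6-era API);
 * ffpf_handle_packet() is an invented placeholder */
#include <linux/init.h>
#include <linux/module.h>
#include <linux/netfilter.h>
#include <linux/netfilter_ipv4.h>
#include <linux/skbuff.h>

extern void ffpf_handle_packet(struct sk_buff *skb);   /* placeholder */

static unsigned int ffpf_nf_hook(unsigned int hooknum,
                                 struct sk_buff **pskb,
                                 const struct net_device *in,
                                 const struct net_device *out,
                                 int (*okfn)(struct sk_buff *))
{
    ffpf_handle_packet(*pskb);     /* hand the packet to the FFPF core */
    return NF_ACCEPT;              /* never steal the packet, just tap it */
}

static struct nf_hook_ops ffpf_ops = {
    .hook     = ffpf_nf_hook,
    .pf       = PF_INET,
    .hooknum  = NF_IP_PRE_ROUTING,
    .priority = NF_IP_PRI_FIRST,
};

static int __init ffpf_src_init(void)  { return nf_register_hook(&ffpf_ops); }
static void __exit ffpf_src_exit(void) { nf_unregister_hook(&ffpf_ops); }

module_init(ffpf_src_init);
module_exit(ffpf_src_exit);
MODULE_LICENSE("GPL");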
Performance results
pkt loss: FFPF < 0.5%, LSF 2-3%

Performance results
pkt loss: FFPF 10-15%, LSF 64-75%
Performance
[chart: copy strategies – percentage of packets processed (reference, drop, accept) for regular copy, copy once, and zero copy]
Summary
concept of ‘flow’ → generalised
copying and context switching → minimised
processing in kernel/NIC → complex programs + ‘pipes’
FPL: FFPF Packet Languages → fast + flexible
persistent storage → flow-specific state
authorisation + third-party code → any user
flow groups → applications sharing packet buffers
Thank you!
http://ffpf.sourceforge.net/
microbenchmarks
we’re hurting because of
● hardware problems
  [diagram: CPU, MEM, NIC – the bus/memory bottleneck]
● software problems
  • kernel: too little, too often
  • vertical + horizontal copies
  • context switching
● language problems
  • popular ones: not expressive
  • expressive ones: not popular
  • no way to mix and match
  • abstractions that are too hi-level
● lack of flexibility
  • heterogeneous hardware (programmable cards, routers, etc.), same for speed
  • hard to use as ‘generic’ packet processor
● network speed may grow too fast anyway
  • 10 → 40Gbps: what will you do in 10 ns?
● multiple apps problem
  • no easy way to combine
  • build beyond capacity of single processor
NIC