Insights Into RouterVM’s Flexibility and Performance
Mel Tsai
[email protected]
Outline
Network Appliance Convergence
Brief Overview of RouterVM & GPFs
GPF Flexibility
GPF Performance
Demo
2
New Requirements in the Enterprise
[Figure: enterprise network diagram. Labeled elements: ISP, Edge Router, Firewall / VPN, Link Compressor, Intrusion Detection, Server Load Balancer, Content Cache, IP Storage Gateway, SAN, Server Blades, Client Workstations, Offsite, and several Switches. Link labels: 40 Mbps, 200 Mbps, 1 Gbps, 1-2.5 Gbps, 2.5 Gbps, 2.5 - 10 Gbps.]
3
Network Appliance Convergence
Recent strong trend towards cascading multiple functions
into one appliance
NetScaler, F5, Redline, Tasman, Inkra
The hardware is coming… We are slowly reaching the
point where we can do almost anything to packet flows
at line rate
But how do you manage multiple devices/functions in your
network?
What about configurability and ease-of-deployment?
Can end-users or administrators program the device?
What about the user interface?
4
RouterVM Overview
RouterVM turns the concept of a “packet filter” into a high-level,
programmable building-block for network appliance applications
RouterVM Generalized Packet Filter (type L7)
Traditional Filter
FILTER 19 SETUP
NAME     SIP  SMASK            DIP       DMASK          PROTO    SRC PORT  DST PORT  VLAN     ACTION
example  any  255.255.255.255  10.0.0.0  255.255.255.0  tcp,udp  any       80        default  drop
(The columns between NAME and ACTION are the classification parameters; the last column is the action.)
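To make the traditional filter row concrete, here is a minimal Python sketch of how such an entry might be evaluated against a packet. The `Packet` class, the dictionary encoding of the row, and the "allow" fall-through are assumptions made for illustration; they are not RouterVM's actual data structures. The SIP and VLAN columns are left out because the row sets them to "any" and "default".

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class Packet:
    src_ip: str
    dst_ip: str
    proto: str
    src_port: int
    dst_port: int


# Hypothetical encoding of the "example" row; None stands for "any".
EXAMPLE_FILTER = {
    "name": "example",
    "dst_net": ipaddress.ip_network("10.0.0.0/255.255.255.0"),  # DIP / DMASK
    "protos": {"tcp", "udp"},
    "src_port": None,
    "dst_port": 80,
    "action": "drop",
}


def apply_filter(flt, pkt, default_action="allow"):
    """Return the filter's action if every classification field matches."""
    if ipaddress.ip_address(pkt.dst_ip) not in flt["dst_net"]:
        return default_action
    if pkt.proto not in flt["protos"]:
        return default_action
    if flt["src_port"] is not None and pkt.src_port != flt["src_port"]:
        return default_action
    if flt["dst_port"] is not None and pkt.dst_port != flt["dst_port"]:
        return default_action
    return flt["action"]
```

A packet to 10.0.0.7 on TCP port 80 is dropped; anything that misses one of the fields falls through to the assumed default.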
5
RouterVM HTTP Switch Example
6
Trade-offs for GPF Flexibility
Less flexibility, easier to use               Greater flexibility, more difficult to use
…and generally lower performance?             …and generally higher performance?

fewer      # of classification fields              more
shallower  classification depth                    deeper
fewer      # of actions                            more
fewer      # of programmatic elements              more
fewer      # of packet tagging options             more
fewer      # of control flow options               more
fewer      Extent and variety of per-flow state    more
(cont.)
7
Trade-offs for GPF Flexibility
[Figure: the same trade-off chart as on the previous slide, from “less flexibility, easier to use” to “greater flexibility, more difficult to use”.]
Where is the sweet spot? Depends on the application and usage scenario!
8
Trade-offs for GPF Flexibility
[Figure: the same trade-off chart, with the right-hand end now labeled “greater flexibility, (somewhat) more difficult to use”.]
In addition, a complexity-hiding intelligent interface and the use of smart defaults
can shift the sweet spot towards greater flexibility, without decreasing ease of use.
9
How many GPF types are enough?
Not a simple question, since the number of applications and usage
scenarios supported by a library of GPFs is not equal to the number of
available GPFs
By virtue of a common set of available actions, any GPF can support the
following features:
Programmatic decision making (“if dest_ip == 127.0.0.0 then drop;”)
Server load balancing (“loadbalance table SLB_Table;”)
Packet field rewriting (“rewrite dest_ip 192.168.0.1;”)
Packet duplication (“copy;”)
QoS (“ratelimit 1 Mbps;”)
Packet logging (“log intrusion_log.txt;”)
Network address translation (“nat dir=forward, table=NAT_table;”)
Server health monitoring (“if 192.168.0.5 is alive”)
…and others
In practice, actions serve to multiply the base-level functionality of a
given GPF to a much higher level than suggested by its name
“A server load-balancing, bandwidth throttling, health monitoring, and
statistics-gathering ‘L7 filter’”
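The idea that a shared set of actions multiplies what any one GPF can do can be sketched in a few lines of Python. Everything here is hypothetical: the `GPF` and `Packet` classes and the action helpers are stand-ins named after the examples on this slide, not RouterVM's real API or action language.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Packet:
    dest_ip: str
    payload: bytes = b""
    dropped: bool = False


Action = Callable[[Packet], None]


def drop_if_dest(ip: str) -> Action:
    """Roughly: if dest_ip == <ip> then drop;"""
    def action(pkt: Packet) -> None:
        if pkt.dest_ip == ip:
            pkt.dropped = True
    return action


def rewrite_dest_ip(new_ip: str) -> Action:
    """Roughly: rewrite dest_ip <new_ip>;"""
    def action(pkt: Packet) -> None:
        pkt.dest_ip = new_ip
    return action


def log_to(log: List[str]) -> Action:
    """Roughly: log <target>; (an in-memory list stands in for a file)."""
    def action(pkt: Packet) -> None:
        log.append(f"dest_ip={pkt.dest_ip} len={len(pkt.payload)}")
    return action


@dataclass
class GPF:
    name: str
    classify: Callable[[Packet], bool]  # the filter-specific part
    actions: List[Action] = field(default_factory=list)  # the shared part

    def process(self, pkt: Packet) -> Packet:
        if self.classify(pkt):
            for act in self.actions:
                act(pkt)
                if pkt.dropped:  # later actions are skipped once dropped
                    break
        return pkt


intrusion_log: List[str] = []
l7_filter = GPF(
    name="L7 filter",
    classify=lambda p: p.payload.startswith(b"GET "),
    actions=[
        drop_if_dest("127.0.0.0"),
        log_to(intrusion_log),
        rewrite_dest_ip("192.168.0.1"),
    ],
)
```

Only `classify` is specific to the filter type; the same action list could be attached to a basic filter or a NAT filter unchanged.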
10
Planned/Implemented GPF Library
for RouterVM .NET
Basic Filter
Simple L2-L4 header classifications
Any RouterVM actions
L7 Filter
Adds regular expressions & ADU reconstruction
NAT Filter
Adds a few more capabilities beyond the simple NAT action that is
available to all GPFs
Content Caching
Builds on the L7 filter functionality
WAN Link Compression
Relatively simple to specify, but requires lots of computation
IP-to-FC Gateway
Requires its own table format & processing
XML Preprocessing
Not very well documented, and difficulty is unknown…
11
GPF Flexibility by OSI Layer
…As expected, GPF flexibility at the application layers starts to depend heavily on the
breadth of the GPF library and the availability of GPFs for specific applications
12
GPF Performance: Basic Filters
Performance of filters has been measured on RouterVM for
.NET using Win32 performance counters
Accurate to roughly 0.5 microseconds
Measured on an Athlon XP 2000 system, Win2k
A basic filter with simple actions (no payload processing)
requires roughly 3000 CPU cycles to perform its processing
This is mostly independent of packet size
This works out to ~284 Mbps for 64-byte packets and ~6.7 Gbps for
1500-byte packets (theoretically, of course)
If the average packet size is ~240 bytes, a packet stream
can traverse 10 basic filters and still maintain 100 Mbps
…Keep in mind, this is with no optimization (yet)!
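The throughput figures on this slide follow from simple arithmetic, sketched below in Python. The clock rate is an assumption: the slide only says "Athlon XP 2000", and 1.667 GHz is the rate at which that part is usually listed. The function name is made up for illustration.

```python
CLOCK_HZ = 1.667e9          # assumed CPU clock; not stated on the slide
CYCLES_PER_FILTER = 3000    # from the slide: cycles per packet per basic filter


def throughput_bps(packet_bytes, n_filters=1,
                   clock_hz=CLOCK_HZ, cycles=CYCLES_PER_FILTER):
    """Theoretical bits per second if the CPU does nothing but filtering."""
    packets_per_second = clock_hz / (cycles * n_filters)
    return packets_per_second * packet_bytes * 8


small = throughput_bps(64)                   # one filter, 64-byte packets
large = throughput_bps(1500)                 # one filter, 1500-byte packets
chain = throughput_bps(240, n_filters=10)    # ten filters, 240-byte packets
```

Because the per-packet cost is roughly constant, throughput scales linearly with packet size and inversely with the number of filters traversed.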
13
GPF Performance: Complex Filters
What about complex L7 filters that search packet payloads with regular
expressions?
Benchmark setup… Let’s hand-craft a packet stream of 256-byte packets:
L2-L4 Headers
“Retreat”
25 bytes of char ‘X’
“Retreat”
25 bytes of char ‘X’
“Retreat”
Padding with ‘X’
Create three different L7 filters, which search for three different patterns:
^Retreat
^Retreat.*Retreat
^Retreat.*Retreat.*Retreat
Although this is instructive, the setup is a little artificial
We’re searching every bit of every packet payload, whereas a real L7 filter
would stop when it identifies a flow matching the expression
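The benchmark setup can be reproduced in spirit with a short Python sketch. This uses Python's `re` module, not the .NET regular expression engine that was measured, so absolute timings will differ. The 54-byte header length (Ethernet + IPv4 + TCP, no options) is an assumption, since the slide only says "L2-L4 Headers"; only the payload is built and searched here.

```python
import re
import timeit

PACKET_LEN = 256
HEADER_LEN = 54  # assumption: 14 (Ethernet) + 20 (IPv4) + 20 (TCP)


def build_payload():
    """'Retreat', 25 X's, 'Retreat', 25 X's, 'Retreat', then X padding."""
    body = b"Retreat" + b"X" * 25 + b"Retreat" + b"X" * 25 + b"Retreat"
    padding = PACKET_LEN - HEADER_LEN - len(body)
    return body + b"X" * padding


PATTERNS = [
    rb"^Retreat",
    rb"^Retreat.*Retreat",
    rb"^Retreat.*Retreat.*Retreat",
]
COMPILED = [re.compile(p) for p in PATTERNS]


def time_patterns(payload, repeats=10000):
    """Average seconds per search for each pattern."""
    return {
        rx.pattern: timeit.timeit(lambda rx=rx: rx.search(payload),
                                  number=repeats) / repeats
        for rx in COMPILED
    }


if __name__ == "__main__":
    for pattern, seconds in time_patterns(build_payload()).items():
        print(pattern.decode(), f"{seconds * 1e6:.2f} us per packet")
```

The greedy `.*` forces the engine to scan to the end of the payload and backtrack, which is why the second and third patterns would be expected to cost more than the anchored prefix alone.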
14
GPF Performance: Complex Filters
15
GPF Performance: Complex Filters
Lesson: try to use start-of-buffer indicators ^ and avoid *’s…
Many apps can be identified with simple start-of-buffer expressions
.NET Regex also involves payload copying, which might be avoidable
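As a small illustration of the lesson, the Python sketch below identifies a flow from the first bytes of its payload using only start-of-buffer expressions. The signature table and the `identify` function are hypothetical; the two prefixes rely on the facts that HTTP requests begin with a method name followed by a space and that SSH identification strings begin with "SSH-".

```python
import re

# Hypothetical signature table: anchored prefixes only, no wildcards.
SIGNATURES = {
    "http": re.compile(rb"^(GET|POST|HEAD) "),
    "ssh": re.compile(rb"^SSH-"),
}


def identify(first_payload: bytes):
    """Return the first application whose prefix matches, else None."""
    for name, rx in SIGNATURES.items():
        if rx.match(first_payload):
            return name
    return None
```

Each expression inspects only a handful of leading bytes, so the cost does not grow with payload length.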
16
Thread Optimization
The choice of thread boundaries, thread scheduling, and packet
FIFO implementations has a tremendous impact on overall
performance
My current choice of four threads per module/port is too many…
Too difficult to optimally schedule the CPU, and overall performance
is at least 10X slower than should be possible
Also, threads waste a lot of time waiting for locks on the packet
FIFOs, which also can be avoided by reducing the # of threads
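One way to cut down on FIFO locking, sketched below in Python, is a run-to-completion loop: a single thread per port takes a packet off one input FIFO and carries it through every filter before touching a queue again. This is an illustrative assumption about how the thread count could be reduced, not a description of RouterVM's planned design; `port_worker` and `STOP` are made-up names.

```python
import queue
import threading

STOP = object()  # sentinel that tells the worker to exit


def port_worker(rx_fifo, filters, tx_out):
    """Run each packet through all filters on one thread.

    Each filter is a callable that returns the (possibly modified)
    packet, or None to drop it. Only rx_fifo is lock-protected.
    """
    while True:
        pkt = rx_fifo.get()
        if pkt is STOP:
            return
        for flt in filters:
            pkt = flt(pkt)
            if pkt is None:   # dropped: skip the remaining filters
                break
        else:
            tx_out.append(pkt)  # reached only if no filter dropped it


if __name__ == "__main__":
    rx = queue.Queue()
    out = []
    chain = [lambda p: None if p == b"bad" else p, lambda p: p.upper()]
    worker = threading.Thread(target=port_worker, args=(rx, chain, out))
    worker.start()
    for p in (b"a", b"bad", b"b"):
        rx.put(p)
    rx.put(STOP)
    worker.join()
    print(out)
```

With this shape there is one lock acquisition per packet instead of one per stage, at the cost of losing parallelism between stages of the same port.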
17
Performance Conclusions
RouterVM for .NET is just one possible implementation of RouterVM, and
is only a demonstration of functionality, not performance
Many other performance aspects haven’t been mentioned, such as
maintaining shared tables and per-flow state.
…Left for future presentations
Porting RouterVM to higher-performance parallel hardware should
drastically increase performance
RouterVM’s 3000 cycles per packet per basic filter using .NET would be a
terrible result for a network processor!
Dedicated search hardware is severely needed…
It is trivial to come up with regular expression searches that require
200,000+ cycles per packet using .NET’s regular expression engine
Other regular expression libraries may be faster, but a software-only approach
will rarely be good enough for high-performance datacenter apps
18
Backup
19
Comments on GPF Flexibility
We can show that GPFs are flexible by examining the following
GPF properties:
Classification capabilities
Header fields only vs. headers + payloads
Stateless classifications vs. stateful, individual packets vs. specific flows
Simple field searches vs. complex general search expressions
Layer support: L1 through L7
Action capabilities
Packet handling (allow, drop, packet generation/copying)
Packet rewriting (header field rewrites, truncation, header
stripping/adding, checksum recalculations)
Control flow (filter jump/skip via tags, messaging to downstream filters &
RouterVM elements such as the routing engine)
QoS support (e.g. rate limiting, WFQ, etc.)
(cont )
20
Comments on GPF Flexibility (cont)
Maintaining shared state and GPF interaction
Efficient state sharing mechanism through tables or message passing
Maintaining per-flow state within a filter, and between filters
Mass storage capability (e.g. for content caching)
Computational Power
Simple, low-latency computations vs. complex, high-latency computations (e.g. NIDS,
in-network antivirus scanning)
Specification Flexibility
Specific Application Support
Storage, XML, Wireless, etc.
21