
Implementation of IXP1200 Network
Processor Packet Filtering Software and
Parameterization for Higher Performance
Network Processors
Shyamal H. Pandya
Shyamal Pandya
Implementation of Network Processor Packet Filtering and Parameterization for Higher Performance Network Processors
Agenda
• Introduction and Goal of the Thesis
• Brief description of IXP1200 Network
Processor and the ENP-2505 ESB
• Software Environment
• Packet Filter Design
• Implementation
• Tests, Results and Parameterization
• Conclusion
Introduction
• Network Processors
– A class of programmable processors designed for networking
applications
– A flexible and efficient alternative to ASICs and general-purpose
processors
– Employ several architectural features to achieve their
design goals:
• A number of processing elements
• Intelligent and fast memory units and buses
• Instruction set architecture specifically tailored for
packet processing operations
– Examples: Intel IXP1200, IBM PowerNP series,
Vitesse IQ2200
IXP1200
• Belongs to the IXP family of Network Processors from
Intel (IXP1200, IXP2400, IXP2800)
• Major Components
– Intel StrongARM core processor
– Six programmable RISC microengines
• 4 hardware contexts per microengine
• Instruction set tailored to suit network applications
– Memory Units
• 32-bit SRAM unit supporting up to 8 MB
• 64-bit SDRAM unit supporting up to 256 MB
• 8 KB of 32-bit Scratchpad Memory
Goal
• Network Processors are targeted at network applications, e.g.
routers, VoIP, intrusion detection and packet filtering.
• These applications must process packets at extremely high rates
to keep up with the speed of network traffic.
• Goal: to investigate the programmability of the IXP1200 through
the design and implementation of a packet filter.
• Linux IP Tables, the Linux packet filtering framework, was chosen
as the basis of our packet filter.
• Parameterization: based on the experience of implementing the packet
filter on the IXP1200, the architectural enhancements of the IXP2400
and other higher performance network processors of the same family
are analyzed to estimate their benefits.
IXP1200 in more Detail
IXP1200 in Operation
ENP-2505
• ESB based on IXP1200
• Pluggable in a PCI slot of a host computer
• Supports 4 10/100 Mbps Ethernet ports
• 8 MB SRAM, 256 MB SDRAM
• StrongARM core processor and Microengines operate at 232 MHz
• 8 MB of flash memory that holds a RAM disk
(Figure: ENP-2505 and Host Setup)
Programming Model
• The ACE framework: a software framework for designing
applications that consist of isolated software components,
each performing a well-defined task
– An ACE encapsulates the tasks or modules performing
independent packet processing functions
– Each ACE has one or more input targets and one or more output
targets
– Packets arrive at the input targets, are processed within
the ACE and are transmitted through one of its output
targets
– An ACE can be bound to another by binding its output
target to the other’s input target
– An application consists of several ACEs bound to
each other
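The ACE binding model described above can be sketched in C. This is a hypothetical, single-threaded illustration, not the IXA SDK's ACE API: struct ace, ace_bind, ace_deliver and the two example processing functions are names made up for the sketch, and a packet that leaves through an unbound output target simply stops, standing in for transmission.

```c
#include <stddef.h>

/* Hypothetical model of the ACE idea (not the IXA SDK API): each ACE has
 * a processing function and a set of output targets; binding an output
 * target means pointing it at another ACE's input. */

#define MAX_TARGETS 4

struct packet {
    int dst_port;  /* illustrative field used by the example filter */
    int dropped;   /* set when an ACE discards the packet */
};

struct ace;

/* Returns the index of the output target to use, or -1 to drop the packet. */
typedef int (*ace_process_fn)(struct ace *self, struct packet *pkt);

struct ace {
    const char *name;
    ace_process_fn process;
    struct ace *targets[MAX_TARGETS];  /* output target -> bound ACE (NULL = unbound) */
    int hits;                          /* packets seen by this ACE */
};

/* Bind output target `idx` of `from` to the input target of `to`. */
static void ace_bind(struct ace *from, int idx, struct ace *to) {
    from->targets[idx] = to;
}

/* Example processing function: always use output target 0. */
static int pass_through(struct ace *self, struct packet *pkt) {
    (void)self;
    (void)pkt;
    return 0;
}

/* Example processing function: drop packets to odd destination ports. */
static int drop_odd_ports(struct ace *self, struct packet *pkt) {
    (void)self;
    return (pkt->dst_port % 2) ? -1 : 0;
}

/* Deliver a packet to `a` and follow the bindings until the packet is
 * dropped or leaves through an unbound output target. */
static void ace_deliver(struct ace *a, struct packet *pkt) {
    while (a != NULL) {
        a->hits++;
        int out = a->process(a, pkt);
        if (out < 0 || out >= MAX_TARGETS) {
            pkt->dropped = 1;
            return;
        }
        a = a->targets[out];
    }
}
```

Binding an ingress ACE to a filter ACE and the filter to an egress ACE with ace_bind, then calling ace_deliver on the ingress ACE, pushes each packet down that chain; packets to odd destination ports stop at the filter.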
Example ACE Application (Packet Forwarder)
MicroACE
• An extension to the ACE model: part of the ACE is
implemented on the core processor, the other part on the
microengines
• The microblock performs fast-path packet processing
• The core component is a conventional ACE that manages the
microblock
• The MicroACE model can be exploited to divide tasks
between the microengines and the core processor
(Figure: Forwarding Application using MicroACEs)
Packet Filter Design
• IP Tables
– Packet filtering Infrastructure for the Linux OS
– A set of modules that maintain tables of rules
– A rule contains a specification, in terms of values that the fields
of a header must match, and a target (ACCEPT/DROP)
– Tables correspond to the kind of manipulation a packet
undergoes, e.g. the filter table, NAT table etc.
– A table contains a number of chains, each traversed at a
particular point in the packet's path, e.g. INPUT,
OUTPUT, FORWARD
– Extensibility: each rule has, at a minimum, specs for IP
header matching. Further examination can be specified by
adding match structures, e.g. the tcp_match structure holds
specifications for matching a packet's TCP header.
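As an illustration of such an extension match, the sketch below checks a TCP header against port ranges and a flag mask. The fields are loosely modelled on the Linux ipt_tcp match (inclusive source and destination port ranges plus a flag mask/compare pair); the struct and function names are hypothetical, and inversion flags and TCP options are omitted.

```c
#include <stdint.h>

/* Hypothetical TCP match specification, loosely modelled on Linux ipt_tcp:
 * inclusive port ranges plus a flag mask/compare pair. Inversion flags and
 * TCP options are left out of this sketch. */
struct tcp_match_spec {
    uint16_t spts[2];   /* source port range [min, max] */
    uint16_t dpts[2];   /* destination port range [min, max] */
    uint8_t flg_mask;   /* which TCP flag bits to examine */
    uint8_t flg_cmp;    /* required values of the examined bits */
};

/* The TCP header fields the match needs, already in host byte order. */
struct tcp_fields {
    uint16_t sport;
    uint16_t dport;
    uint8_t flags;      /* FIN=0x01 SYN=0x02 RST=0x04 PSH=0x08 ACK=0x10 URG=0x20 */
};

static int port_in_range(uint16_t port, const uint16_t range[2]) {
    return port >= range[0] && port <= range[1];
}

/* Returns 1 if the TCP header satisfies every field of the spec, else 0. */
static int tcp_match(const struct tcp_match_spec *m, const struct tcp_fields *t) {
    return port_in_range(t->sport, m->spts)
        && port_in_range(t->dport, m->dpts)
        && (t->flags & m->flg_mask) == m->flg_cmp;
}
```

For example, a spec with destination range 22..22, flg_mask = SYN|ACK and flg_cmp = SYN matches only the initial SYN of connections to port 22, not the SYN-ACK reply.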
Packet Filter Design - Data Structures
Packet Filter Design - Algorithm
• For each rule in the chain of interest
– match packet IP header against the specs in the rule. If the
match succeeds, look for other match structures in the rule.
– match the packet against each match structure found in the
rule. If the packet satisfies all matches, the packet has
successfully matched the rule.
– For a successful match, look at the target of the rule
• if the target is ACCEPT, let the packet pass
• if the target is DROP, drop the packet and free its
resources
– For unsuccessful match, go to the next rule and repeat the
process
• The last rule matches all packets; its target is the chain's default policy
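The rule-traversal algorithm above can be sketched in C. The rule layout is a simplified, hypothetical stand-in for the IP Tables structures (masked source/destination addresses, protocol, a list of extension match predicates, and a target). The sketch also assumes a DROP fallback for a chain with no catch-all rule; with the catch-all last rule the algorithm describes, that fallback is never reached.

```c
#include <stddef.h>
#include <stdint.h>

enum verdict { VERDICT_ACCEPT, VERDICT_DROP };

/* Minimal packet view: IPv4 addresses (host order), protocol, ports. */
struct pkt {
    uint32_t saddr, daddr;
    uint8_t proto;          /* 6 = TCP, 17 = UDP */
    uint16_t sport, dport;
};

/* An extension match: a predicate plus its private specification. */
struct match {
    int (*fn)(const void *spec, const struct pkt *p);
    const void *spec;
};

/* Hypothetical simplified rule: IP spec, extension matches, target. */
struct rule {
    uint32_t src, smsk, dst, dmsk;  /* src/dst must already be masked */
    uint8_t proto;                  /* 0 = any protocol */
    const struct match *matches;
    size_t nmatches;
    enum verdict target;
};

/* Example extension match: destination port equals *spec. */
static int dport_eq(const void *spec, const struct pkt *p) {
    return p->dport == *(const uint16_t *)spec;
}

/* Step 1: match the packet's IP header against the rule's IP spec. */
static int ip_match(const struct rule *r, const struct pkt *p) {
    return (p->saddr & r->smsk) == r->src
        && (p->daddr & r->dmsk) == r->dst
        && (r->proto == 0 || r->proto == p->proto);
}

/* Walk the chain in order; the first rule whose IP spec and all extension
 * matches succeed decides the verdict. */
static enum verdict filter_chain(const struct rule *chain, size_t n,
                                 const struct pkt *p) {
    for (size_t i = 0; i < n; i++) {
        const struct rule *r = &chain[i];
        if (!ip_match(r, p))
            continue;                       /* try the next rule */
        size_t j = 0;
        while (j < r->nmatches && r->matches[j].fn(r->matches[j].spec, p))
            j++;
        if (j == r->nmatches)
            return r->target;               /* all matches satisfied */
    }
    return VERDICT_DROP;                    /* assumed fallback, see lead-in */
}
```

A chain of "drop TCP to port 23", "accept from 10.0.0.0/8" and a catch-all DROP rule would drop telnet from 10.1.2.3, accept SSH from the same host, and drop anything from 192.168.1.1.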
Implementation
• Task division between the core processor and the microengines
– Data Plane (Microengines): ingress, filtering, forwarding,
egress
– Control Plane (Core): filter table and route table management
– Management Plane (Core): user interface, deployment
• Chains: INPUT, OUTPUT, FORWARD
– INPUT and OUTPUT chains are traversed infrequently
– The FORWARD chain is used most frequently, hence it is implemented
on the microengines
• Software Components
– Ingress, Egress, Forwarder MicroACEs and Stack ACE,
provided as part of the SDK
– PacketFilter MicroACE, designed and implemented as part of
the thesis
Implementation
Application Design in terms of MicroACEs
Implementation
• User Interface - iptables command
– used to manipulate filter table by adding, deleting,
inserting, replacing rules
– an executable and libraries implement the user interface
– Algorithm
• parse the command line, validate all the options and
arguments
• obtain a local copy of the filter table by making a cross-call to the PacketFilter core component
• modify the local copy according to the command
• make a cross-call to the PacketFilter core component to
replace the old filter table with the new one, passing the
modified filter table as an argument
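The read-modify-write cycle of the iptables command can be modelled as below. This is a sketch under stated assumptions: the core component is simulated by a static table, core_get_table and core_do_replace are hypothetical stand-ins for the two cross-calls (the second playing the role of do_replace), rules are reduced to integers, and positions are 0-based.

```c
#include <stddef.h>
#include <string.h>

#define MAX_RULES 64

/* Hypothetical filter table: rules reduced to ints for brevity. */
struct filter_table {
    size_t n;
    int rules[MAX_RULES];
};

/* Stands in for the filter table held by the PacketFilter core component. */
static struct filter_table core_table;

/* Simulated cross-call 1: hand back a private copy of the current table. */
static void core_get_table(struct filter_table *out) { *out = core_table; }

/* Simulated cross-call 2 (role of do_replace): swap in the whole new table. */
static void core_do_replace(const struct filter_table *in) { core_table = *in; }

/* Insert `rule` at 0-based position `pos` in a local copy. */
static int table_insert(struct filter_table *t, size_t pos, int rule) {
    if (t->n >= MAX_RULES || pos > t->n)
        return -1;                          /* validation failed */
    memmove(&t->rules[pos + 1], &t->rules[pos],
            (t->n - pos) * sizeof t->rules[0]);
    t->rules[pos] = rule;
    t->n++;
    return 0;
}

/* One insert command: fetch a copy, modify it locally, replace the table. */
static int cmd_insert(size_t pos, int rule) {
    struct filter_table local;
    core_get_table(&local);
    if (table_insert(&local, pos, rule) != 0)
        return -1;                          /* core table never touched */
    core_do_replace(&local);
    return 0;
}
```

Because the modification happens on a private copy, a command that fails validation returns before the second cross-call, so the table used for filtering is never left half-edited.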
Implementation
• PacketFilter Core Component
– Initialization
• Set up control data structures, allocate the filter table in
SRAM and patch the filter table address into the microcode
– Cross-call Interface
• Function do_replace, used by the user interface to replace
the current filter table in SRAM with a new filter table
Implementation
• Microcode - Each microengine can run more than one
microblock
• Flow of control is governed by a dispatch loop running on
each enabled microengine
• Microblock partitioning across microengines
Implementation
• Dispatch Loops - Microengine 0
– Initialize the Ingress and PacketFilter Microblocks
– In an infinite loop, do the following
• Call Ingress Microblock
• If a packet has arrived, call the PacketFilter Microblock, else if there is an
exception, queue the packet for Ingress core component, else continue from
beginning of the loop
• If PacketFilter microblock returns ACCEPT, queue the packet for Microengine
2, running the Forwarder
• If PacketFilter microblock returns DROP, drop the packet
– Every SA_CONSUME_NUM times around the loop, poll the Core to ME
packet queue for packets from core components. If there is a packet,
determine its source (Ingress core or PacketFilter Core) and call the
corresponding microblock
– SA_CONSUME_NUM - tunable parameter to control frequency of
memory accesses w.r.t. Core to ME packet queue
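The control flow of the Microengine 0 dispatch loop can be simulated in plain C. This sketch is single-threaded and hypothetical: queues are reduced to counters, the Ingress and PacketFilter microblocks are stubs that replay a scripted sequence of events, the value 8 for SA_CONSUME_NUM is arbitrary, and the loop runs a fixed number of iterations instead of forever.

```c
#include <stddef.h>

#define SA_CONSUME_NUM 8   /* arbitrary value for the sketch */

enum ingress_rc { NO_PACKET, GOT_PACKET, EXCEPTION };
enum filter_rc { ACCEPT, DROP };

struct me0_stats {
    unsigned to_forwarder;  /* packets queued for Microengine 2 */
    unsigned dropped;       /* packets dropped by the filter */
    unsigned to_core;       /* exceptions queued for the Ingress core component */
    unsigned core_polls;    /* reads of the Core to ME packet queue */
};

/* Script driving the stub microblocks. */
struct script {
    const enum ingress_rc *events;   /* one Ingress result per loop trip */
    const enum filter_rc *verdicts;  /* one verdict per arrived packet */
    size_t nevents, next_event, next_verdict;
};

/* Stub Ingress microblock: replays scripted events, then reports no packet. */
static enum ingress_rc ingress_microblock(struct script *s) {
    return s->next_event < s->nevents ? s->events[s->next_event++] : NO_PACKET;
}

/* Stub PacketFilter microblock: replays scripted verdicts. */
static enum filter_rc packetfilter_microblock(struct script *s) {
    return s->verdicts[s->next_verdict++];
}

/* Run `iterations` trips around the loop (the real loop never exits). */
static void me0_dispatch(struct script *s, unsigned iterations,
                         struct me0_stats *st) {
    for (unsigned i = 1; i <= iterations; i++) {
        enum ingress_rc rc = ingress_microblock(s);
        if (rc == EXCEPTION) {
            st->to_core++;               /* queue for the Ingress core component */
        } else if (rc == GOT_PACKET) {
            if (packetfilter_microblock(s) == ACCEPT)
                st->to_forwarder++;      /* queue for the Forwarder on ME 2 */
            else
                st->dropped++;           /* drop and free the buffer */
        }
        if (i % SA_CONSUME_NUM == 0)
            st->core_polls++;            /* would pop a core packet and dispatch
                                            it to the Ingress or PacketFilter
                                            microblock by source */
    }
}
```

Raising SA_CONSUME_NUM makes the Core to ME queue poll, a memory access, rarer, at the cost of slower service for packets sent back down from the core components.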
Implementation
• Dispatch Loops - Microengine 2
– Initialize the Forwarder Microblock
– In an infinite loop, do the following
• Poll the packet queue from Microengine 0 to see if there is a packet.
• If packet available, call the Forwarder microblock, else continue from the
beginning
• If Forwarder microblock returns success, queue the packet for microengine 5
to be scheduled for output, else if it returns an exception, queue the packet for
the core component, else drop the packet
– Every SA_CONSUME_NUM times around the loop, poll the Core to ME packet queue,
and if there is a packet from the core component, call the Forwarder microblock
Implementation
• Dispatch Loops - Microengine 5
– Initialize Egress microblock
– 4 output queues, each containing the packets for one output port
– Context 0 polls the 4 output queues in a round-robin manner
– Contexts 1-3 fill up the TFIFO with data from the current packet to be
transmitted
• PacketFilter Microblock macros
– PacketFilter() - main macro
– ip_packet_match() - called from PacketFilter()
– ipt_tcp_match() - TCP extension to core packet filtering code
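The round-robin scheduling performed by context 0 on Microengine 5 can be sketched as below. This is a hypothetical model: queue depths are plain counters, and handing a packet to contexts 1-3 for TFIFO filling is represented only by returning the chosen port.

```c
#define NUM_PORTS 4

/* Hypothetical egress scheduler state: one output queue per port. */
struct egress_sched {
    unsigned depth[NUM_PORTS];  /* packets waiting per output port */
    unsigned next;              /* port to examine first on the next call */
};

/* Returns the port whose head packet should be transmitted next,
 * or -1 if all four queues are empty. */
static int egress_pick(struct egress_sched *s) {
    for (unsigned k = 0; k < NUM_PORTS; k++) {
        unsigned port = (s->next + k) % NUM_PORTS;
        if (s->depth[port] > 0) {
            s->depth[port]--;                  /* dequeue the head packet */
            s->next = (port + 1) % NUM_PORTS;  /* resume after the served port */
            return (int)port;
        }
    }
    return -1;
}
```

Resuming the scan after the last served port keeps one busy port from starving the others: with queue depths {2, 0, 1, 1}, successive calls serve ports 0, 2, 3 and then 0 again.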
Implementation - Microengine Re-tasking
• Triggered when the first rule specifying TCP match specs is added to the
table
• Implementation
– Core component sends inter-thread signals to all threads of microengine 0
– Each time around the dispatch loop, each thread checks for a signal
– If signal is present, the thread stops its execution and sends interrupt to the
StrongARM
– Interrupt handler: once an interrupt has been received from each of the 4 threads of
microengine 0, it wakes up the process sleeping on the interrupt (the PacketFilter
core component)
– The core component disables microengine 0, reloads it with a new image
containing the ipt_tcp_match() macro and re-enables the microengine
• The above design ensures that microengines are not interrupted while
processing a packet, thus preventing packet loss
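The re-tasking handshake can be modelled with plain flags. In this hypothetical sketch, signals, interrupts and the sleeping core process are simulated by booleans and a bitmask, and none of the names come from the IXA SDK. It shows the key property: the core component is woken only after all four threads have stopped at the top of their dispatch loop, where no packet is in flight.

```c
#include <stdbool.h>

#define THREADS_PER_ME 4
#define ALL_THREADS ((1u << THREADS_PER_ME) - 1)

/* Hypothetical handshake state for Microengine 0. */
struct retask {
    bool signal[THREADS_PER_ME];   /* inter-thread signal set by the core */
    bool stopped[THREADS_PER_ME];  /* thread has parked itself */
    unsigned irq_mask;             /* threads that have interrupted the StrongARM */
    bool core_woken;               /* PacketFilter core component woken up */
};

/* Core component: signal every thread of Microengine 0 to stop. */
static void core_request_retask(struct retask *r) {
    for (int t = 0; t < THREADS_PER_ME; t++)
        r->signal[t] = true;
}

/* Interrupt handler: record the thread, wake the core once all have reported. */
static void irq_handler(struct retask *r, int thread) {
    r->irq_mask |= 1u << thread;
    if (r->irq_mask == ALL_THREADS)
        r->core_woken = true;      /* safe to disable and reload the microengine */
}

/* Top of one thread's dispatch loop, where no packet is being processed.
 * Returns true if the thread should go on to process the next packet. */
static bool thread_loop_top(struct retask *r, int thread) {
    if (r->stopped[thread])
        return false;
    if (r->signal[thread]) {
        r->stopped[thread] = true;
        irq_handler(r, thread);    /* models raising the interrupt */
        return false;
    }
    return true;
}
```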
Tests and Results
• Test setup
• Packets sent from the host machine to the notebook
• Libnet library used to build packets
• The host machine runs tcpdump and the Windows laptop runs Ethereal
Tests and Results
• Experiment 1 - Code size
• Experiment 2 - Packet filtering operations
– various commands to add and delete rules in the filter table
– packet filtering operations were confirmed to be performed correctly
by observing packet transmission and reception with
tcpdump and Ethereal
Tests and Results
• Experiment 3 - performance penalty due to task partitioning
across microengines
• Experiment 4 - Microengine Re-tasking
– a command was issued to add a rule with TCP match specs to the filter table
– Microengine 0 was re-tasked successfully and packet
filtering operations continued
Parameterization
• IXP2400 Network Processor
– Higher performance network processor of same family,
with significant architectural enhancements
• Microstore (4 KB vs. 16 KB)
– IXP1200: 1K-instruction limit, so tasks had to be split across 2 microengines
– IXP2400: 4K instructions, so splitting is not necessary and the performance
penalty is avoided
Parameterization
– Total number of microwords for
Ingress + PacketFilter + Forwarder = 1156
– The extra instruction store space can be used for other
components, e.g. UDP match, limit match, NAT, connection
tracking
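The instruction-store figures above can be checked with simple arithmetic, reading "1K" and "4K" as 1,024 and 4,096 instructions (an assumption about the units) and ignoring any code besides the three microblocks:

```latex
\begin{aligned}
\text{IXP1200:}\quad & 1156 > 1024 &&\Rightarrow \text{Ingress, PacketFilter and Forwarder do not fit in one microengine} \\
\text{IXP2400:}\quad & 4096 - 1156 = 2940 &&\Rightarrow \text{about 2940 microwords left for further components}
\end{aligned}
```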
• Number of Microengines and Contexts
– The IXP1200 serves 8 ports with 16 contexts for input and 8
contexts for output to forward packets
– The number of contexts per microengine is doubled, so each
microengine can serve 4 ports for the input process (2
contexts per port, as in the IXP1200)
– With 5 microengines for input and 3 for output, the number
of ports serviced could be 20
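The port estimate follows from the context counts, taking the IXP2400 as having 8 microengines with 8 contexts each (so 5 input plus 3 output microengines use all of them):

```latex
\begin{aligned}
\text{IXP1200:}\quad & \frac{16\ \text{input contexts}}{8\ \text{ports}} = 2\ \text{contexts per port} \\
\text{IXP2400:}\quad & \frac{8\ \text{contexts per microengine}}{2\ \text{contexts per port}} = 4\ \text{ports per microengine} \\
& 5\ \text{input microengines} \times 4\ \text{ports} = 20\ \text{ports} \\
& 3\ \text{output microengines} \times 8\ \text{contexts} = 24\ \text{output contexts} \ge 20\ \text{ports}
\end{aligned}
```

The last line shows the output side still provides at least one context per port, as in the IXP1200 (8 output contexts for 8 ports).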
Parameterization
• Next neighbor register set
– Data sharing is very fast, avoiding memory accesses
– On the IXP1200, task partitioning between microengines requires packet
queues; inter-microengine data communication goes through SRAM accesses,
a performance penalty
– IXP2400: packet queues are avoided and buffer handles are shared
through next neighbor registers, so the performance penalty is avoided
• Memory
– ENP-2505 has 48 MB DRAM and 3 MB SRAM accessible to
microengines
– SRAM could accommodate 9K rules of average size, so memory
was sufficient for the PacketFilter application
– The increased memory of the IXP2400 could benefit simultaneous
execution of many memory-hungry applications
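The 9K-rule figure implies an average rule size of roughly 350 bytes, assuming the whole 3 MB of microengine-accessible SRAM were available to the filter table. Because SRAM also holds other data, this is an upper bound on the average size implied:

```latex
\frac{3 \times 2^{20}\ \text{bytes}}{9 \times 10^{3}\ \text{rules}} = \frac{3\,145\,728}{9\,000} \approx 350\ \text{bytes per rule}
```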
Conclusions
• Successfully implemented Packet filter core code and TCP
header match extension
• Had to split filtering and forwarding across 2 microengines
due to instruction store size limits
• MicroACE software framework was ideal for the design of
the packet filter
• Microengine re-tasking was complicated by the lack of a smooth
interface to microengine signals and interrupt handling
• Future work: investigating simultaneous operation of more
than one application, and adding more IP Tables extensions to the
packet filter
• Future work: incorporating an interface to inter-thread signals
and call-backs into the MicroACE framework