
Paper Review
Building a Robust Software-based Router Using Network Processors
ABSTRACT
Need for more services -> software-based routers
Router:
IXP1200 Network Processor development board
PC

3.47 Mpps (minimum-size packets), or 1.77 Gbps aggregate
Hierarchical Architecture:
Guarantees line speed for forwarding of simple packets
Extra capacity for exceptional packets on the P3 (310 Kpps, with 1510 cycles for each packet)

INTRODUCTION
Most Network Processors use parallelism.
IXP1200:
6 Micro Engines each supporting up to 4 hardware contexts.
Router with a data plane (MEs) and a control plane (P3).
Processor Hierarchy:
OSPF, Updating Routing tables, …[More cycles]
Missed packets from cache
Minimum packet processing, forwarding,…[Fewer cycles]
ARCHITECTURE-Software
•Classifier
•Forwarder
•Scheduler
Two default forwarders:
•Minimal IP forwarding fast path.
•Full IP protocol (IP options)
Two main attributes:
Explicit support for adding new forwarders at run time
Does not specify where in the processor hierarchy a forwarder runs
ARCHITECTURE-Hardware
IXP Evaluation System (200MHz):
•32MB DRAM (64-bit 100MHz)
•2MB SRAM (32-bit 100MHz)
•4KB on-chip scratch
•64-bit 66MHz IX bus
•Ethernet ports (8*100M + 2*1G)
•32-bit 100MHz PCI bus
•4KB ISTORE for each ME
•4KB I-cache for StrongARM
•A pair of FIFOs (16 slots * 64 bytes each)
Transfer rate of DRAM = 6.4 Gbps
Send/receive BW = 2*(8*100M + 2*1G) = 5.6 Gbps
Capacity of IX bus = 4 Gbps
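The derived rates on this slide follow from bus width times clock rate. The sketch below reproduces that arithmetic; it assumes one transfer per clock cycle (a peak rate, not measured throughput), and the helper name `bus_gbps` is made up for illustration.

```python
def bus_gbps(width_bits, clock_mhz):
    """Peak rate of a bus in Gbps, assuming one transfer per clock cycle."""
    return width_bits * clock_mhz / 1000.0


dram_gbps = bus_gbps(64, 100)        # 64-bit DRAM interface at 100 MHz
ix_bus_gbps = bus_gbps(64, 66)       # 64-bit IX bus at 66 MHz
port_gbps = 8 * 0.1 + 2 * 1.0        # 8 x 100 Mbps ports + 2 x 1 Gbps ports
send_receive_gbps = 2 * port_gbps    # every port both sends and receives

print(f"DRAM peak rate:    {dram_gbps:.1f} Gbps")
print(f"IX bus peak rate:  {ix_bus_gbps:.2f} Gbps")
print(f"Send + receive BW: {send_receive_gbps:.1f} Gbps")
```

Under these assumptions the IX bus peak rate (about 4 Gbps) is below the 5.6 Gbps that the ports could demand in total, which is why the bus capacity appears alongside the port bandwidth.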
Forwarding Pipeline
The common unit = 64-byte MAC-packet (MP)
The MAC breaks each packet into MPs and tags each one as the first, intermediate, last, or only MP of the packet
Software allocates FIFO slots to MACs, drains the input FIFO, and fills the output FIFO
Can MEs move an MP from the input FIFO to the output FIFO in a single step?
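As an illustration of the MP tagging described on this slide, here is a minimal sketch that splits a packet into 64-byte MPs and labels each one. On the real hardware the MAC does this; the function name and the string tags are assumptions made for the example.

```python
MP_SIZE = 64  # bytes per MAC-packet (MP)


def split_into_mps(packet: bytes):
    """Split a packet into MPs and tag each as first/intermediate/last/only."""
    chunks = [packet[i:i + MP_SIZE] for i in range(0, len(packet), MP_SIZE)]
    tagged = []
    for index, chunk in enumerate(chunks):
        if len(chunks) == 1:
            tag = "only"
        elif index == 0:
            tag = "first"
        elif index == len(chunks) - 1:
            tag = "last"
        else:
            tag = "intermediate"
        tagged.append((tag, chunk))
    return tagged


# A minimum-size (64-byte) packet is a single MP; a 200-byte packet needs four.
small_tags = [tag for tag, _ in split_into_mps(bytes(64))]
large_tags = [tag for tag, _ in split_into_mps(bytes(200))]
print(small_tags)
print(large_tags)
```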
2 stage pipeline:
Input Processing
INPUT_LOOP:
    acquire_input_mutex()
    if (!port_rdy(p)) goto INPUT_LOOP
    load IN_FIFO[c]
    release_input_mutex()
    mp_addr = calculate_mp_addr()
    copy reg_mp_data <- IN_FIFO[c]
    state = protocol_processing(reg_mp_data)
    copy reg_mp_data -> DRAM[mp_addr]
    if (at_start_of_packet(state))
        enqueue(state, state.queue)
    goto INPUT_LOOP
Strict FIFO slots and context binding
For IP:
Validating header
Updating TTL
Re-computing checksum
Set source and dest MACs
Destination Queue
Minimum Forwarder:
one-cycle hardware hash
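The IP steps listed above (validate the header, update the TTL, recompute the checksum) can be sketched in a few lines. This is a plain-Python illustration, not the MicroEngine code: the function names are made up, the checksum is the standard Internet one's-complement sum, and MAC rewriting and queue selection are left out.

```python
import struct
from typing import Optional


def ip_checksum(header: bytes) -> int:
    """One's-complement Internet checksum over an even-length IPv4 header."""
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
    while total >> 16:                      # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def forward_ipv4(header: bytes) -> Optional[bytes]:
    """Return the updated header, or None if the packet must not be forwarded."""
    if len(header) < 20:
        return None
    version, ihl = header[0] >> 4, (header[0] & 0x0F) * 4
    if version != 4 or ihl < 20 or len(header) < ihl:
        return None
    if ip_checksum(header[:ihl]) != 0:      # a valid header sums to zero
        return None
    if header[8] <= 1:                      # TTL would expire at this hop
        return None
    out = bytearray(header[:ihl])
    out[8] -= 1                             # update TTL
    out[10:12] = b"\x00\x00"                # clear the old checksum
    struct.pack_into("!H", out, 10, ip_checksum(bytes(out)))
    return bytes(out)


# Example header: TTL 0x40, protocol UDP, 192.168.0.1 -> 192.168.0.199.
original = bytes.fromhex("450000730000400040 11b861c0a80001c0a800c7".replace(" ", ""))
forwarded = forward_ipv4(original)
```

A fast path would more likely update the checksum incrementally rather than recompute it; the full recomputation is used here only to keep the sketch short.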
Scheduling & Buffering
A Queue that is serviced by StrongARM
Statically allocates a set of contexts to run input loop
16 input contexts
Token passing (hardware signaling mechanism) to serialize DMA access.
Buffer scheduling:
16MB of DRAM (8192 buffers of 2KB) consumed in a circular fashion
A shared state variable
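The circular buffer scheme above needs only a single shared index. A minimal sketch follows, assuming that a buffer is simply reused once the index wraps around; the class and method names are hypothetical.

```python
BUFFER_SIZE = 2 * 1024   # 2KB per buffer
NUM_BUFFERS = 8192       # 8192 buffers -> 16MB of DRAM


class CircularBufferPool:
    """Hands out fixed-size buffers in circular order."""

    def __init__(self, base_addr=0, num_buffers=NUM_BUFFERS, buffer_size=BUFFER_SIZE):
        self.base_addr = base_addr
        self.num_buffers = num_buffers
        self.buffer_size = buffer_size
        self.next_index = 0   # the shared state variable

    def allocate(self):
        """Return the address of the next buffer and advance the shared index."""
        addr = self.base_addr + self.next_index * self.buffer_size
        self.next_index = (self.next_index + 1) % self.num_buffers
        return addr


# With only three buffers, the fourth allocation wraps back to the first.
pool = CircularBufferPool(num_buffers=3)
addresses = [pool.allocate() for _ in range(4)]
print(addresses)
```

In the router the index is shared among the input contexts, so the update would have to be serialized (for example by the same token passing that serializes DMA access); the sketch is single-threaded and ignores that.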
Output Processing
Scheduling: select_queue() selects a non-empty queue from that port's queues.
OUTPUT_LOOP:
    acquire_output_mutex()
    release_output_mutex()
    if (finished_last_packet)
        qid = select_queue()
        state = dequeue(qid)
        mp_addr = first_mp(state)
    else
        mp_addr = next_mp(state)
    fifo_addr = calculate_fifo_addr()
    copy DRAM[mp_addr] -> OUT_FIFO[fifo_addr]
    enable OUT_FIFO[fifo_addr]
    finished_last_packet = at_end_of_packet(state)
    goto OUTPUT_LOOP
Queuing
Queues: Circular arrays of 32-bit entries in SRAM.
Queues are assigned statically to output contexts:
Output context saves queues in 16 registers not in scratch memory.
Multiple queues. Which one next? By prioritizing queues.
Contention:
1. Use mutexes.
2. Give each input its own queue at each output -> single priority level
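To make the queue structure and the "which one next?" decision concrete, here is a small sketch of circular-array queues with strict-priority selection. The class, its fields, and the convention that index 0 is the highest priority are assumptions for illustration; in the router these queues are arrays in SRAM, not Python objects.

```python
class CircularQueue:
    """Fixed-capacity FIFO stored in a circular array of 32-bit entries."""

    def __init__(self, capacity):
        self.entries = [0] * capacity
        self.head = 0     # next entry to dequeue
        self.tail = 0     # next free slot
        self.count = 0

    def enqueue(self, entry):
        if self.count == len(self.entries):
            return False                          # queue full, caller must drop
        self.entries[self.tail] = entry & 0xFFFFFFFF
        self.tail = (self.tail + 1) % len(self.entries)
        self.count += 1
        return True

    def dequeue(self):
        if self.count == 0:
            return None
        entry = self.entries[self.head]
        self.head = (self.head + 1) % len(self.entries)
        self.count -= 1
        return entry


def select_queue(queues):
    """Return the index of the highest-priority non-empty queue, or None."""
    for qid, queue in enumerate(queues):
        if queue.count > 0:
            return qid
    return None


queues = [CircularQueue(4) for _ in range(3)]
queues[2].enqueue(7)
queues[1].enqueue(5)
first_choice = select_queue(queues)   # queue 1 outranks queue 2
```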
Queuing [cont]
I.2 + O.1
I.2 + O.3 : Maximum flexibility
I.1 + O.3 : Slower rate
Evaluation
For one MP:
280 cycles for register operations
180(DRAM) + 90(SRAM) + 160(Scratch) = 430 cycles for memory
Sum = 710 cycles = 3550 ns (for 200 MHz)
3.47 Mpps -> one packet is forwarded every 288 ns
Result: The system can forward 12 packets in parallel
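The figures on this slide can be checked with a few lines of arithmetic: the per-MP latency divided by the time between packets gives the number of packets that must be in flight at once. Variable names are chosen for the example.

```python
CLOCK_MHZ = 200
RATE_MPPS = 3.47

register_cycles = 280
memory_cycles = 180 + 90 + 160                       # DRAM + SRAM + scratch
total_cycles = register_cycles + memory_cycles       # cycles per MP

ns_per_cycle = 1000 / CLOCK_MHZ                      # 5 ns at 200 MHz
latency_ns = total_cycles * ns_per_cycle             # time to process one MP
interarrival_ns = 1000 / RATE_MPPS                   # time between packets at line rate
packets_in_flight = latency_ns / interarrival_ns     # required parallelism

print(f"{total_cycles} cycles = {latency_ns:.0f} ns per MP")
print(f"one packet every {interarrival_ns:.0f} ns")
print(f"about {packets_in_flight:.1f} packets in flight")
```

The ratio comes out slightly above 12, which matches the slide's result of 12 packets processed in parallel.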
Switching Paths
Path C: Forwards packets at 534 Kpps (500 cycles per packet)
StrongARM is involved too.
No additional tasks for MEs.
Path B: Forwards packets at 526 Kpps
Path A: Forwards packets at the maximum rate of 3.47 Mpps
PRIORITY
StrongARM
Complicated to decide forwarders:
It supports Pentium
It shares resources with MEs and can act like them
OS on StrongARM:
1. Acts as a bridge that forwards packets to the P3
2. Supports a small collection of local forwarders
Simple priority scheme:
Gives priority to packets being passed to the P3 over packets that are to be processed locally.
Virtual Router Processor
MEs statically have 2 tasks:
•A router infrastructure (RI) that is able to forward
minimum-sized packets
•A virtual router processor (VRP) that runs
additional code on behalf of each packet
protocol_processing runs on this abstract machine.
Interfacing & Implementation
StrongARM interacts with MEs:
fid = install(key, fwdr, size, where)
remove(fid)
data = getdata(fid)
setdata(fid, data)
Key:
(src addr, src port, dst addr, dst port)
Where:
ME: Loads from the StrongARM into the ME's ISTORE
SA: Loads into DRAM
PE: Loads into the Pentium jump table
install() installs the forwarder fwdr for packets that match key,
with the specified flow size; where indicates the processor
on which the forwarder runs.
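The four calls above can be mimicked with an in-memory table to show how they fit together. This is a toy model, not the router's implementation: the class name, the `lookup` helper, and the reading of `size` as the number of bytes of per-flow state are all assumptions made for the example.

```python
import itertools


class ForwarderTable:
    """Toy model of the install/remove/getdata/setdata interface."""

    LEVELS = ("ME", "SA", "PE")   # MicroEngine, StrongARM, Pentium

    def __init__(self):
        self._next_fid = itertools.count(1)
        self._entries = {}

    def install(self, key, fwdr, size, where):
        if where not in self.LEVELS:
            raise ValueError(f"unknown processor level: {where}")
        fid = next(self._next_fid)
        # Assumption: size is the amount of state kept for the flow.
        self._entries[fid] = {"key": key, "fwdr": fwdr, "where": where,
                              "data": bytearray(size)}
        return fid

    def remove(self, fid):
        del self._entries[fid]

    def getdata(self, fid):
        return bytes(self._entries[fid]["data"])

    def setdata(self, fid, data):
        state = self._entries[fid]["data"]
        if len(data) != len(state):
            raise ValueError("data must match the installed size")
        state[:] = data

    def lookup(self, key):
        """Hypothetical helper: find the fid installed for an exact key."""
        for fid, entry in self._entries.items():
            if entry["key"] == key:
                return fid
        return None


table = ForwarderTable()
key = ("10.0.0.1", 1234, "10.0.0.2", 80)   # (src addr, src port, dst addr, dst port)
fid = table.install(key, fwdr=lambda packet: packet, size=4, where="ME")
table.setdata(fid, b"\x00\x00\x00\x01")
```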
Interfacing
Some data forwarders:
Conclusions
•Shows how to program the processor hierarchy with a fixed forwarding infrastructure
that fully exploits the parallelism available on the IXP1200 MicroEngines.
•Demonstrates how new functionality can be injected into all three levels of the
processor hierarchy.
•Statically partitions the processing capacity of the MicroEngines into a fixed
routing infrastructure and a programmable VRP.
•Can be used in many designs.