Commercial Network Processor Architectures
Agere PayloadPlus
Vahid Tabatabaee
Fall 2007
ENTS689L: Packet Processing and Switching
References
 Panos C. Lekkas, Network Processors: Architectures, Protocols, and
Platforms, McGraw-Hill.
 Agere PayloadPlus Family White Papers.
 David Kramer, Roger Bailey, David Brown, Sean Mcgee, Jim Greene,
Robert Corley, David Sonnier (Agere Systems), Payload+: Fast Pattern
Matching & Routing for OC-48, in Hot Chips: A Symposium on High
Performance Chips, Aug. 19-21, 2001.
 Agere Product Brief documents for FPP, RSP, ASI and FPL.
 Agere White Paper: “The case for a classification Language”, Feb.
2003.
General Information
 Agere PayloadPlus is a comprehensive network processor
solution for OC-48.
 It was later expanded to support OC-192 through the NP10/TM10
(renamed APP750NP and APP750TM).
 This product has since been discontinued.
 Originally this was a 3-chip solution; it was later integrated
into a single chip.
 We review the original solution and the single-chip APP550,
both of which are documented on the Agere website.
The Big Picture
The network processor family has a pipeline architecture
and includes (in the original 3 chip solution):
 Fast Pattern Processor (FPP)
 Takes data from PHY chip
 Protocol recognition
 Classification
 based on layer 2 to 7
 Table lookup with millions of entries and
variable lengths
 Reassembly
 Routing Switch Processor (RSP)
 Queueing
 Packet Modification
 Traffic Shaping
 QoS processes
 Segmentation
 Agere System Interface (ASI)
 Management
 Tracks state information
 Support for RMON (Remote Monitoring)
The 3 Chip Solution
 POS-PHY: Packet Over SONET – PHYsical
 UTOPIA: Universal Test and Operations PHY Interface for ATM
 FBI: Functional Bus Interface
[Figure: block diagram of the 3-chip solution. Blocks shown: Physical
Interface, FPP, RSP, Fabric Interface Controller, Switch Fabric, ASI,
microP. Labeled connections: 8-bit POS-PHY (two), FBI, Configuration
Bus, PCI to Host CPU.]
Source: http://nps.agere.com/support/non-nda/docs/FPP_Product_Brief.pdf
Main Responsibilities and Interfaces
 The FPP receives data from the PHY over a standard interface, which
can be POS-PHY Level 3 (POS-PL3) or UTOPIA 2 or 3.
 The FPP classifies traffic based on the information contained at
layers 2 to 7.
 The FPP sends packets to the RSP over POS-PL3.
 The RSP is responsible for
 Queueing, packet modification, shaping, QoS tagging, and
segmentation.
 The ASI chip is responsible for
 Handling exceptions, maintaining state information, interfacing to
the host processor, and configuring the FPP and RSP over the CBI
interface.
 The Management-Path Interface (MPI) enables the FPP to
receive management frames from the local host.
 The Functional Bus Interface (FBI) connects the FPP to the ASI so
that function calls can be processed externally.
Memory
 64 bit standard PC-133 synchronous dynamic random
access memory (SDRAM)
 133 MHz pipelined zero bus turnaround (ZBT)
synchronous static random access memory (SSRAM).
 PayloadPlus can use standard off-the-shelf DRAM for table
lookups and does not need expensive, power-hungry
Content Addressable Memory (CAM).
 Typical power limit for a line card is 150 W.
FPP Features
 Programmable classification from layer 2 to 7
 Pipelined multi-threaded processing of PDUs
 High-level Functional Programming Language (FPL) that implicitly
takes care of multiple threads
 ATM re-assembly at OC-48 rates (eliminates external SAR)
 Table lookup with millions of entries
 Eliminates need for external CAMs
 Deterministic performance regardless of the table size
 Configurable UTOPIA/POS interfaces
FPP Protocol Data Unit (PDU)
 FPP is a pipelined multithreaded processor that can simultaneously
analyze and classify up to 64 protocol data units (PDU).
 Each incoming PDU is assigned its own processing thread which is
called a context.
 Each PDU consists of one or multiple 64-byte blocks
 The context is a processing path that keeps track of:
 All blocks of the PDU
 The input port number of the PDU
 The data offset for the PDU
 The last-block information
 Program variables associated with the PDU
 Classification information for the PDU
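The per-PDU context described above can be pictured as a small record plus a block splitter. The following is only a sketch in Python: the 64-byte block size and the limit of 64 simultaneous PDUs come from the slide, while the `Context` field names and the `open_context` helper are hypothetical and do not reflect the actual FPP register layout.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 64    # bytes per block, as stated on the slide
MAX_CONTEXTS = 64  # PDUs that can be in flight at once, as stated on the slide


@dataclass
class Context:
    """Hypothetical per-PDU record; field names are illustrative only."""
    input_port: int
    data_offset: int = 0
    blocks: list = field(default_factory=list)          # all blocks of the PDU
    last_block_len: int = 0                             # last-block information
    variables: dict = field(default_factory=dict)       # program variables
    classification: dict = field(default_factory=dict)  # classification results


def split_into_blocks(pdu: bytes) -> list:
    """Cut a PDU into 64-byte blocks; the final block may be shorter."""
    return [pdu[i:i + BLOCK_SIZE] for i in range(0, len(pdu), BLOCK_SIZE)]


def open_context(active: dict, ctx_id: int, input_port: int, pdu: bytes) -> Context:
    """Assign a context to a PDU, refusing when all 64 contexts are in use."""
    if len(active) >= MAX_CONTEXTS:
        raise RuntimeError("no free context")
    blocks = split_into_blocks(pdu)
    ctx = Context(input_port=input_port, blocks=blocks,
                  last_block_len=len(blocks[-1]) if blocks else 0)
    active[ctx_id] = ctx
    return ctx


active = {}
ctx = open_context(active, ctx_id=0, input_port=3, pdu=bytes(150))
print(len(ctx.blocks), ctx.last_block_len)  # 3 blocks; the last holds 22 bytes
```

A 150-byte PDU occupies two full blocks and one partial block, which is why the context has to remember information about the last block separately.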
FPP Block Diagram
Source: http://www.hotchips.org/archives/hc13/3_Tue/13agere.pdf
Source: http://nps.agere.com/support/non-nda/docs/FPP_Product_Brief.pdf
FPP Functional Description
 The input framer frames incoming data into 64 byte blocks.
 It writes blocks into the data buffer (SDRAM) and into block buffers
and context memory.
 The block buffer holds the data being processed, together with the
associated context data that the FPP needs to operate on the
incoming data.
 The output interface sends the PDUs and their classification
information to the RSP.
 The Pattern Process Engine (PPE) performs pattern matching to
determine how the incoming PDUs are classified.
 The Queue Engine manages FPP replay contexts, provides addresses
for block buffers, and maintains information on blocks, PDUs, and
connection queues.
FPP Functional Description (two pass)
 The FPP processes bit streams in two passes.
 In the first pass, the PDU blocks are read into the queue engine memory:
 The data is split into separate 64-byte blocks.
 The data offset of each block is determined.
 Links between the individual blocks that compose a PDU are established.
 The PDU type is identified.
 In the second pass (the replay phase), the PDU is replayed from the queue
engine:
 The PDU is processed as a whole entity.
 Pattern matching is executed.
 At the same time, the PDU is transmitted toward the output unit.
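One way to see why the replay pass matters is a pattern that straddles a block boundary. The sketch below is a toy model in Python, not the FPP's actual mechanism: the function names, the dictionary layout of the queue memory, and the first-byte type check are all assumptions made for illustration.

```python
def first_pass(stream_blocks):
    """Pass 1: store blocks, record offsets and links, identify the PDU type.

    stream_blocks is a list of (pdu_id, payload) tuples in arrival order;
    blocks that belong to different PDUs may be interleaved.
    """
    queue_memory = {}  # pdu_id -> {"blocks": [...], "offsets": [...], "type": ...}
    for pdu_id, payload in stream_blocks:
        entry = queue_memory.setdefault(
            pdu_id, {"blocks": [], "offsets": [], "type": None})
        # Offset of this block within its PDU.
        entry["offsets"].append(sum(len(b) for b in entry["blocks"]))
        # List order stands in for the links between blocks.
        entry["blocks"].append(payload)
        if entry["type"] is None:
            # Toy type check on the first byte of the first block (assumption).
            entry["type"] = "ipv4" if payload[:1] == b"\x45" else "other"
    return queue_memory


def second_pass(queue_memory, pdu_id, pattern):
    """Pass 2 (replay): treat the PDU as a whole and run pattern matching."""
    entry = queue_memory[pdu_id]
    whole = b"".join(entry["blocks"])
    return {"type": entry["type"], "matched": pattern in whole, "pdu": whole}


# PDU "A" arrives as two blocks with a block of PDU "B" in between.
a1 = b"\x45" + b"a" * 61 + b"GE"   # exactly 64 bytes, ends mid-pattern
a2 = b"T /index"
memory = first_pass([("A", a1), ("B", b"\x00" * 10), ("A", a2)])
result = second_pass(memory, "A", b"GET")
print(result["type"], result["matched"], memory["A"]["offsets"])
# ipv4 True [0, 64]
```

Neither block of PDU "A" contains `GET` on its own; the match only succeeds once the linked blocks are replayed as one unit.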
FPP Top Level Flow
Source: http://nps.agere.com/support/non-nda/docs/FPL_Product_Brief.pdf
RSP (Traffic Manager) Features
 64K queues
 Programmable shaping (such as VBR, UBR, CBR)
 Programmable discard policies (RED, WRED, EPD)
 Programmable QoS (CBR, VBR, UBR)
 Programmable CoS (Fixed Priority, Round Robin, WRR, WFQ,
GFR)
 Programmable packet modification
 Support for multicast
 Generates required checksums/CRC
RSP overview
Source: http://www.hotchips.org/archives/hc13/3_Tue/13agere.pdf
http://nps.agere.com/support/non-nda/docs/RSP_Product_Brief.pdf
RSP Functional Description
 The RSP acts on the FPP's classification and analysis results for
each incoming PDU.
 It supports up to 64 logical input ports.
 For each PDU there is a command from the FPP that instructs
RSP how to handle the PDU.
 The PDU is added to a queue and stored in the PDU SDRAM.
 RSP supports up to 64K programmable queues.
 Processed data is output on a configurable 32-bit interface
 There is also an 8-bit POS-PHY level 3 management interface.
 The RSP uses custom logic and three Very Long Instruction Word
(VLIW) compute engines to process PDUs.
VLIW Compute Engines
 The compute engines operate in a pipelined fashion.
 Each compute engine is dedicated to a processing function:
 The Traffic Management Engine enforces discard policies and
keeps queue statistics.
 The Traffic Shaper Engine ensures QoS and CoS for each queue.
 The Stream Editor Engine performs the necessary PDU modifications.
 Each queue definition in the RSP includes a destination, scheduling
information, and pointers to programs for each of the three VLIW
compute engines.
 Because the programs are selected per queue, the RSP can run multiple
protocols at the same time.
 The external CPU can also add queue definitions, for example to set up
ATM virtual circuits.
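The idea that each queue definition carries its own three programs can be sketched as follows. This is an illustrative Python model only: `QueueDefinition`, its field names, and the toy policies are assumptions, and real RSP programs are VLIW code rather than Python callables.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class QueueDefinition:
    """Hypothetical queue definition: destination, scheduling, three programs."""
    destination: int
    scheduling: dict
    traffic_manager: Callable  # discard policy; returns False to drop the PDU
    traffic_shaper: Callable   # derives a scheduling decision
    stream_editor: Callable    # returns the modified PDU bytes


def process(pdu: bytes, queue_len: int, qdef: QueueDefinition) -> Optional[tuple]:
    """Run a PDU through the three stages in pipeline order."""
    if not qdef.traffic_manager(pdu, queue_len):
        return None  # discarded by the traffic management stage
    priority = qdef.traffic_shaper(qdef.scheduling)
    return qdef.destination, priority, qdef.stream_editor(pdu)


# Two queues running different "protocols" side by side.
ip_queue = QueueDefinition(
    destination=1,
    scheduling={"priority": 2},
    traffic_manager=lambda pdu, qlen: qlen < 100,   # simple tail drop
    traffic_shaper=lambda sched: sched["priority"],
    stream_editor=lambda pdu: b"\xaa\xbb" + pdu,    # prepend a toy header
)
atm_queue = QueueDefinition(
    destination=7,
    scheduling={"priority": 0},
    traffic_manager=lambda pdu, qlen: True,         # never discards
    traffic_shaper=lambda sched: sched["priority"],
    stream_editor=lambda pdu: pdu,                  # pass through unchanged
)

print(process(b"data", 10, ip_queue))    # (1, 2, b'\xaa\xbbdata')
print(process(b"data", 100, ip_queue))   # None
print(process(b"cell", 100, atm_queue))  # (7, 0, b'cell')
```

Adding a new queue definition at run time, as the external CPU does when it sets up a virtual circuit, amounts to creating one more such record without touching the existing ones.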
RSP Data Flow
The RSP has 3 major processing stages:
1. Prepares and queues the PDU for scheduling
   1. Assembles the blocks into a PDU in SDRAM
   2. Determines the destination queue
   3. Determines if the PDU should be queued. If it should, it is added to the
      appropriate queue for scheduling
2. Selects the next PDU block to be scheduled
   1. Selects the physical port
   2. Selects the logical port
   3. Selects the scheduler
   4. Selects the QoS queue and the CoS queue
3. Modifies and transmits the PDU on the appropriate output ports
   1. Adjusts the QoS transmit intervals and CoS priority
   2. Performs PDU modifications
   3. Performs AAL5 CRC if necessary
http://nps.agere.com/support/non-nda/docs/RSP_Product_Brief.pdf
Hierarchical Scheduling
(Internal Scheduling Logic)
 Channels: The output interface has a 32-bit data channel that
carries 1 to 4 POS-PHY or UTOPIA channels. It also has an 8-bit
management output.
 Physical ports: Physical output ports are assigned to channels.
There are up to 32 physical ports, since there are 32
back-pressure signals.
 Logical ports: The RSP supports up to 256 logical output ports.
 Schedulers: A set of schedulers is defined for each logical port.
The RSP supports CBR, VBR, and UBR schedulers.
 QoS queues: Each QoS queue is assigned to a single scheduler.
 CoS queues: Up to 16 CoS queues feed a single QoS queue.
http://nps.agere.com/support/non-nda/docs/RSP_Product_Brief.pdf
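The hierarchy above (physical port, logical port, scheduler, QoS queue, CoS queue) can be walked top-down to pick the next block to send. The Python sketch below is a deliberate simplification: it uses strict first-match selection, serves schedulers in an assumed CBR, VBR, UBR order, and treats CoS index 0 as the highest priority. None of these choices is taken from the RSP documentation, and `select_next` is a hypothetical name.

```python
from collections import deque

SCHEDULER_ORDER = ("CBR", "VBR", "UBR")  # assumed service order


def select_next(ports, backpressure):
    """Walk physical port -> logical port -> scheduler -> QoS queue -> CoS queue.

    ports: {phys: {logical: {sched: {qos: [deque, ...]}}}}
    backpressure: {phys: bool}; True means the port cannot accept data.
    Returns (selection path, dequeued block), or None when nothing is eligible.
    """
    for phys, logicals in ports.items():
        if backpressure.get(phys, False):
            continue  # skip ports that assert back pressure
        for logical, scheds in logicals.items():
            for sched in SCHEDULER_ORDER:
                for qos, cos_queues in scheds.get(sched, {}).items():
                    for cos, queue in enumerate(cos_queues):
                        if queue:
                            return (phys, logical, sched, qos, cos), queue.popleft()
    return None


ports = {
    0: {10: {"UBR": {"q0": [deque(), deque([b"low"])]}}},
    1: {20: {"CBR": {"q1": [deque([b"voice"])]},
             "UBR": {"q2": [deque([b"bulk"])]}}},
}

first = select_next(ports, {0: True})  # physical port 0 is back-pressured
second = select_next(ports, {})        # no back pressure anywhere
print(first)   # ((1, 20, 'CBR', 'q1', 0), b'voice')
print(second)  # ((0, 10, 'UBR', 'q0', 1), b'low')
```

A real scheduler would interleave ports and honor per-queue transmit intervals; the point here is only the order in which the five levels are resolved.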
ASI
 ASI seamlessly integrates FPP and RSP with the host processor.
 It makes it possible for the designer to do the following:
 Centrally initialize and configure the NP system
and its physical interfaces.
 Send routing and VPI/VCI updates to the system.
 Implement various routing and management protocols.
 Handle any occurring exceptions.
 ASI enables high speed flow-oriented state maintenance:
 Gathering Remote Network Monitoring (RMON) statistics
 Time stamping packets
 Checking Packet Sequence
 Policing ATM and frame relay up to OC-48 rates
 8-bit POS-PHY interface over which the ASI sends packets to
the FPP and receives them from RSP
How Does ASI Work?
 A PCI interface for communication with the host processor.
 A 32-bit high-speed interface (FBI) that receives function calls
from the FPP.
 Two ALUs that process the FPP's external function requests for:
 Maintaining state and statistics
 Policing (leaky bucket)
 Two SSRAM interfaces, so that different tasks can access memory
without contention.
http://nps.agere.com/support/non-nda/docs/ASIProductBrief.pdf
ASI Configuration Capabilities
 The ASI enables the host processor to configure up to 8 devices.
 The configuration bus is compatible with both Intel and
Motorola bus formats.
 It is used to:
 Initialize and configure the FPP and RSP
 Load the program code for the FPP and RSP
 Load dynamic updates to the FPP tables and RSP queues
 Configure third-party external framers and physical interfaces
Policy and Conformance Checking
 ASI performs conformance checking or policing
for up to 64k connections at OC-48 rate.
 It only does marking, not scheduling or shaping
 Several variations of GCRA (leaky-bucket)
algorithm can be used
 For the dual leaky bucket case, the ASI
indicates whether cells or frames are compliant
or not and from which bucket the
nonconformance was derived.
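The dual leaky bucket check can be sketched with the virtual-scheduling form of the GCRA. This is a generic Python illustration rather than the ASI's implementation: the class and function names, the "peak" and "sustained" labels, and the parameter values are assumptions. In line with the slide, the function only marks each arrival and reports which bucket failed; it does not drop, schedule, or shape.

```python
class Gcra:
    """Virtual-scheduling form of GCRA(T, tau)."""

    def __init__(self, increment, limit):
        self.increment = increment  # T: nominal spacing between cells
        self.limit = limit          # tau: how early a cell may arrive
        self.tat = 0.0              # theoretical arrival time of the next cell

    def conforms(self, t):
        return t >= self.tat - self.limit

    def update(self, t):
        self.tat = max(t, self.tat) + self.increment


def police(arrivals, peak, sustained):
    """Mark each arrival as conforming or not, naming the failing bucket."""
    results = []
    for t in arrivals:
        if not peak.conforms(t):
            results.append((t, False, "peak"))
        elif not sustained.conforms(t):
            results.append((t, False, "sustained"))
        else:
            # Bucket state advances only for cells that pass both checks.
            peak.update(t)
            sustained.update(t)
            results.append((t, True, None))
    return results


marks = police([0, 0.5, 1, 2, 3, 10],
               peak=Gcra(increment=1, limit=0),
               sustained=Gcra(increment=4, limit=6))
for mark in marks:
    print(mark)
# (0, True, None)
# (0.5, False, 'peak')
# (1, True, None)
# (2, True, None)
# (3, False, 'sustained')
# (10, True, None)
```

The cell at 0.5 violates the peak spacing, while the cell at 3 respects the peak spacing but exhausts the burst tolerance of the sustained bucket.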
FPL
 FPL is a functional language for classification.
 In a functional language, the programmer tells the
computing resources what to do rather than how to do it.
 In FPL you describe the protocols and the actions used to
process them.
 In C you have to say how to process the protocols.
 FPL code is much shorter, and easier to debug and
modify.
FPL Main Features
 Fast pattern matching and classification of the data stream.
 Defining functions for the FPP to execute based on the recognized
patterns
 Easy to read semantics
 Dynamic updating of the code in the FPP
 Software development tool set
Two Pass Processing
 Recall the two-pass processing in the FPP.
 The first pass performs preliminary processing, such as
identifying the PDU type.
 In the second pass (replay), the FPP can simply transmit the
PDU and its conclusions, or process a higher-level
protocol.
 The queue engine allows you to process PDUs
embedded in higher-layer protocols in the replay phase.
Sample FPL Program Flow
Source: http://nps.agere.com/support/non-nda/docs/FPL_Product_Brief.pdf
FPL code example
Source: http://www.hotchips.org/archives/hc13/3_Tue/13agere.pdf
Dynamic FPL Program Changes
 You can dynamically add and delete certain types of FPL statements
in the image code.
 FPL supports two types of pattern statement structures:
 Single-rule patterns have a single pattern to match, with one or
two functions to perform.
 These are called flows.
 They cannot be added or removed dynamically.
 Multiple-rule pattern statements allow you to define tables.
 These are used, for example, to define IP routing tables.
 These are called trees.
 You can add statements to, or delete them from, existing trees.
 You cannot add a tree dynamically.
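The update rules above (flows fixed, trees fixed in number but editable in content) can be modeled with a small Python class. This is not FPL syntax and not how the FPP stores its tables: `FplImage` and its methods are hypothetical, and the linear longest-prefix scan is chosen for brevity, using only the standard `ipaddress` module.

```python
import ipaddress


class FplImage:
    """Toy model of a loaded image: flows and the set of trees are fixed."""

    def __init__(self, flows, tree_names):
        self._flows = dict(flows)                         # fixed at load time
        self._trees = {name: {} for name in tree_names}   # tree set is fixed too

    def add_rule(self, tree, prefix, action):
        """Add a statement to an existing tree."""
        if tree not in self._trees:
            raise KeyError("trees cannot be added dynamically")
        self._trees[tree][ipaddress.ip_network(prefix)] = action

    def delete_rule(self, tree, prefix):
        """Delete a statement from an existing tree."""
        del self._trees[tree][ipaddress.ip_network(prefix)]

    def lookup(self, tree, address):
        """Return the action of the longest matching prefix, or None."""
        addr = ipaddress.ip_address(address)
        matches = [net for net in self._trees[tree] if addr in net]
        if not matches:
            return None
        best = max(matches, key=lambda net: net.prefixlen)
        return self._trees[tree][best]


image = FplImage(flows={"is_ipv4": "call route"}, tree_names=["route"])
image.add_rule("route", "10.0.0.0/8", "port 1")
image.add_rule("route", "10.1.0.0/16", "port 2")
before = image.lookup("route", "10.1.2.3")
image.delete_rule("route", "10.1.0.0/16")
after = image.lookup("route", "10.1.2.3")
print(before, "->", after)  # port 2 -> port 1
```

Route updates therefore change only the contents of the `route` tree; an attempt to add a rule to an undeclared tree raises an error, mirroring the restriction that trees cannot be created dynamically.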
Performance of the Network Processor
 Drop in the performance due to the N+1 problem
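The slide does not define the N+1 problem. A common reading, assumed here, is that a packet one byte longer than the fixed block size N needs a second, almost empty block, so the useful fraction of the transferred data drops sharply. The Python sketch below works through that arithmetic for 64-byte blocks; `block_efficiency` is a hypothetical helper and the numbers are not measurements of the Agere devices.

```python
import math


def block_efficiency(packet_len, block_size=64):
    """Fraction of the transferred block capacity that carries packet data."""
    blocks = math.ceil(packet_len / block_size)
    return packet_len / (blocks * block_size)


for length in (64, 65, 128, 129):
    print(length, round(block_efficiency(length), 3))
# 64 1.0
# 65 0.508
# 128 1.0
# 129 0.672
```

Under this assumption the worst case is a stream of 65-byte packets, where roughly half of every second block is padding.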
Network Processor Performance


Performance evaluation for a mixture of packet sizes
Performance drops when the number of computations per packet increases