
ECE 526 – Network Processing Systems Design
Network Processor Tradeoffs and Examples
Chapter: D. E. Comer
Outline
• Network Processor design tradeoffs
• Sample Network Processor
NP Architecture
• Numerous different design goals
─ Performance
─ Cost
─ Functionality
─ Programmability
• Numerous different system choices
─ Use of parallelism
─ Types of memories
─ Types of interfaces
─ Etc.
• We consider
─ Design tradeoffs on high level (qualitative tradeoffs)
─ Commercial Network Processors
Processor Topologies
• How can processors be arranged on an NP? (see the loop-structure sketch after this list)
─ Consider heterogeneity of processing resources and workload
• Multiprocessor
─ Parallel processors with shared interconnect
─ Problems?
• Pipeline
─ Multiple processors per data path
─ Problems?
• Data Flow Architecture
─ Extreme form of pipelining
─ Problems?
• Heterogeneous Architectures
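
The loop-structure sketch referenced above: a minimal, single-threaded C sketch contrasting the two main organizations. The three-stage path and the stage names are illustrative assumptions, not from the slides; a real NP would run these loops on separate cores or microengines.

```c
/* Minimal sketch (hypothetical stage names) of the two organizations. */
#include <stdio.h>

typedef struct { int id; int ttl; } packet_t;

static void parse(packet_t *p)   { (void)p; /* header parsing would go here */ }
static void lookup(packet_t *p)  { p->ttl--; /* route lookup + TTL decrement */ }
static void enqueue(packet_t *p) { printf("pkt %d queued (ttl=%d)\n", p->id, p->ttl); }

/* Multiprocessor / run-to-completion: each processor runs the whole path;
 * different packets are handled by different processors in parallel. */
static void worker(packet_t *p) {
    parse(p);
    lookup(p);
    enqueue(p);
}

/* Pipeline: each processor owns one stage and passes the packet on.
 * Here the hand-off is just the next loop; on hardware it would be a
 * queue or next-neighbor register between cores. */
static void pipeline(packet_t *pkts, int n) {
    for (int i = 0; i < n; i++) parse(&pkts[i]);    /* stage 1 processor */
    for (int i = 0; i < n; i++) lookup(&pkts[i]);   /* stage 2 processor */
    for (int i = 0; i < n; i++) enqueue(&pkts[i]);  /* stage 3 processor */
}

int main(void) {
    packet_t batch[3] = { {1, 64}, {2, 64}, {3, 64} };
    worker(&batch[0]);        /* run-to-completion on one packet */
    pipeline(&batch[1], 2);   /* staged processing of the rest   */
    return 0;
}
```

The run-to-completion form keeps all per-packet state on one processor; the pipeline form needs an explicit hand-off between stages, which is where the "Problems?" of each arrangement tend to appear.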
Design Tradeoffs (1)
• Low development cost vs. performance
─ ASICs give higher performance, but take time to develop
─ NPs allow faster development, but might give lower performance
• Programmability vs. processing speed
─ Similar to tradeoff between ASIC and NP
─ Co-processors pose the same tradeoffs
─ Complexity of instruction set
• Performance: packet rate, data rate, and bursts
─ Difficult to assess the performance of a system
─ Even more difficult to compare different systems
• Per-interface rate vs. aggregate data rate
─ NP usually limited to one port
Design Tradeoffs (2)
• NP speed vs. bandwidth
─ How much processing power per bandwidth is necessary?
─ Depends on application complexity
• Coprocessor design: look aside vs. flow-through
─ Look aside: “called” from main processor, need state transfer
─ Flow-through: all traffic streams through coprocessor
• Pipelining: uniform vs. synchronized
─ Pipeline stages can take different amounts of time
─ Tradeoff between slowing every stage to the slowest one or adding synchronization (see the worked numbers after this list)
• Explicit parallelism vs. cost and programmability
─ Hidden parallelism is easier to program
─ Explicit parallelism is cheaper to implement
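
The worked numbers for the pipelining tradeoff above, with assumed stage times (not from the slides): whether stages are forced to a uniform clock or decoupled by synchronization, steady-state throughput is set by the slowest stage, and the faster stages sit partly idle.

```c
/* Back-of-the-envelope pipeline throughput with assumed stage times. */
#include <stdio.h>

int main(void) {
    double stage_ns[] = { 40.0, 60.0, 80.0 };   /* assumed per-stage service times */
    double slowest = 0.0;
    for (int i = 0; i < 3; i++)
        if (stage_ns[i] > slowest) slowest = stage_ns[i];

    /* One packet leaves per 'slowest' interval in steady state. */
    printf("throughput = %.1f Mpps\n", 1e3 / slowest);   /* 1000/80 = 12.5 Mpps */

    /* Faster stages waste capacity waiting on the slowest one. */
    for (int i = 0; i < 3; i++)
        printf("stage %d utilization = %.0f%%\n", i, 100.0 * stage_ns[i] / slowest);
    return 0;
}
```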
Design Tradeoffs (3)
• Parallelism: scale vs. packet ordering
─ Why is packet order important?
─ Relaxing the packet-order constraint gives better throughput (a reorder-buffer sketch follows this list)
• Parallelism: speed vs. stateful classification
─ Shared state requires synchronization
─ Limits parallelism
• Memory: speed vs. programmability
─ Using different memory types improves performance
─ But it increases the difficulty of programming
• I/O performance vs. pin count
─ Packaging can be a major cost factor
─ More pins give higher performance
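
The reorder-buffer sketch referenced above: packets are tagged with a sequence number when dispatched to a parallel pool, and a small window at the output releases them in order even when workers finish out of order. The rob_* names and the window size are illustrative, not an NP API.

```c
/* Minimal reorder buffer restoring input order at the output of a parallel pool. */
#include <stdio.h>
#include <stdbool.h>

#define MAX_INFLIGHT 8                 /* bound on packets processed in parallel */

typedef struct {
    bool     done[MAX_INFLIGHT];       /* slot i: has packet (head+i) completed? */
    unsigned head;                     /* sequence number of next packet to release */
} rob_t;

/* Called when a worker finishes the packet with sequence number 'seq'. */
static void rob_complete(rob_t *rob, unsigned seq) {
    rob->done[seq % MAX_INFLIGHT] = true;
    /* Release every completed packet now at the head of the window. */
    while (rob->done[rob->head % MAX_INFLIGHT]) {
        rob->done[rob->head % MAX_INFLIGHT] = false;
        printf("transmit packet %u\n", rob->head);
        rob->head++;
    }
}

int main(void) {
    rob_t rob = { .head = 0 };
    /* Workers finish out of order; the wire still sees 0,1,2,3,4. */
    rob_complete(&rob, 2);
    rob_complete(&rob, 0);   /* releases 0; 2 is held until 1 completes */
    rob_complete(&rob, 1);   /* releases 1 and 2 */
    rob_complete(&rob, 4);
    rob_complete(&rob, 3);   /* releases 3 and 4 */
    return 0;
}
```

The cost of keeping order is exactly the buffering and head-of-line waiting shown here, which is why relaxing the ordering constraint improves throughput.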
Design Tradeoffs (4)
• Programming languages
─ Ease of programming vs. functionality vs. speed
• Multithreading: throughput vs. programmability
─ Threads improve performance
─ Threads require more complex programs and synchronization
• Traffic management vs. blind forwarding at low cost
─ Traffic management is desirable but requires processing
• Generality vs. specific architecture role
─ NPs can be specialized for access, edge, core
─ NPs can be specialized towards certain protocols
• Memory type: special-purpose vs. general-purpose
─ SRAM and DRAM vs. CAM
Design Tradeoffs (5)
• Backward compatibility vs. architectural advances
─ On the component level: e.g., memory interfaces such as DDR DRAM
─ On the system level: the NP needs to fit into the overall router system
• Parallelism vs. pipelining
─ Depends on usage of NP
• Summary:
─ Lots of choices
─ Most decisions require some insight into the expected NP usage
─ Tradeoffs are all qualitative
• Let's look at commercial designs
Novel Areas of NP Use
• TCP/IP offloading on high-performance servers
• Security processing: SSL offloading
• Storage area networks
• Many others: IDSs, etc.
Performance Bottlenecks
• Memory
─ Bandwidth is available, but access time is too slow
─ Delay to off-chip memory keeps increasing (see the latency-budget example after this list)
• I/O
─ High-speed interfaces available
─ Cost problem with optical interfaces
─ Otherwise no problem
• Processing power
─ Individual cores are getting more complex
─ Problems with access to shared resources
─ Control processor can become bottleneck
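
The latency-budget example referenced above, with all numbers assumed for illustration (link rate, packet size, memory latency, references per packet): at line rate a minimum-size packet arrives faster than a single off-chip memory access completes, so many packets must be in flight at once, via threads or parallel engines, to keep the link busy.

```c
/* Rough latency-hiding arithmetic; every number is an assumption. */
#include <stdio.h>
#include <math.h>

int main(void) {
    double line_rate_gbps   = 10.0;   /* assumed link rate             */
    double pkt_bytes        = 64.0;   /* minimum-size packet           */
    double dram_latency_ns  = 100.0;  /* assumed off-chip access time  */
    double mem_refs_per_pkt = 3.0;    /* assumed lookups per packet    */
    double compute_ns       = 50.0;   /* assumed instruction time      */

    double arrival_ns = pkt_bytes * 8.0 / line_rate_gbps;                 /* 51.2 ns */
    double per_pkt_ns = compute_ns + mem_refs_per_pkt * dram_latency_ns;  /* 350 ns  */

    /* Packets (threads) needed in flight so one packet's memory stalls
     * overlap with useful work on others. */
    printf("packet arrives every %.1f ns, needs %.0f ns of service\n",
           arrival_ns, per_pkt_ns);
    printf("=> need about %.0f packets in flight\n", ceil(per_pkt_ns / arrival_ns));
    return 0;
}
```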
Limitations on Scalability
• What are the limitations on how fast NPs need to get?
─ Link rates (optical bandwidth limits)
─ Application complexity (core vs. edge)
• What are the limitations on how fast NPs can get?
─ Parallelism in networks
─ Power consumption
─ Chip area
Commercial Network Processors
• Commercial NPs
─ Large variety of architectures
─ Different applications and performance spaces
─ Lots of implementation details and practical issues
• General Themes
─ Type and number of processors
• Homogeneous vs. heterogeneous
─ Type and size of memories
─ Internal and external communication channels
─ Mechanisms of scalability: parallelism and pipelining
─ Generality vs. specialization
Intel IXP1200: external connection (figure)
Intel IXP1200: internal architecture (figure)
Cisco PXF (figure)
Motorola C-Port: conceptual design (figure)
Motorola C-Port: internal architecture (figure)
Motorola C-Port: channel processor (figure)
IXP2400
• XScale (ARM-compliant) embedded control processor
─ Instruction and data caches
• 8 microengines
─ 400 or 600 MHz
• 8 threads per microengine
• Multiple instruction stores with 4K instructions
• 256 general-purpose registers
• 512 transfer registers
• 2 GB addressable DDR-DRAM memory (19.2 Gbps) (see the bandwidth check below)
• 32 MB addressable QDR-SRAM memory (12 Gbps read + write)
• 16 words of Next Neighbor Registers
• 16 KB scratchpad
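
A rough sanity check of the DRAM figure above, assuming store-and-forward buffering and an OC-48-class (2.5 Gbps) duplex line rate; the line rate and the "one write plus one read per buffered byte" model are assumptions, not stated on the slide.

```c
/* Rough DRAM bandwidth budget under assumed store-and-forward buffering. */
#include <stdio.h>

int main(void) {
    double dram_gbps  = 19.2;  /* DDR DRAM bandwidth from the slide       */
    double line_gbps  = 2.5;   /* assumed per-direction line rate         */
    double directions = 2.0;   /* ingress + egress                        */
    double touches    = 2.0;   /* one write + one read per buffered byte  */

    double needed = line_gbps * directions * touches;   /* 10 Gbps */
    printf("buffering traffic needs %.1f Gbps of the %.1f Gbps DRAM,\n"
           "leaving %.1f Gbps for tables, queues, and other state\n",
           needed, dram_gbps, dram_gbps - needed);
    return 0;
}
```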
IXP2400
• Interconnects
─ Coprocessor bus added (incl. access to T-CAM)
─ Flow control bus for two-chip configurations (e.g., ingress and egress)
• Switch Fabrics
─ No IX bus
─ Utopia 1, 2, 3
─ CSIX-L1
─ SPI-3 (POS-PHY 2/3)
Two-Chip Configurations
• Flow control needed between ingress and egress chips
─ 1 Gbps over the flow control bus (not shown)
IXP2400 Internal Architecture (figure)
IXP2400 Microengine
• Enhancements over IXP1200 microengines:
─ Multiplier unit
─ Pseudo-random number generator
─ CRC calculator (software reference below)
─ Four 32-bit timers and timer signaling
─ 16-entry CAM for inter-thread communication
─ Time-stamping unit
─ Generalized thread signaling
─ 640 words of local memory
─ Simultaneous access to packet queues without mutual exclusion
─ Functional units for ATM segmentation and reassembly
─ Automated byte alignment
─ Microengines divided into two clusters with independent command and SRAM buses
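
The software reference mentioned at the CRC item: a plain C bitwise CRC-32 (the Ethernet polynomial, reflected form) showing the computation a hardware CRC unit performs; on a microengine this would be done by the dedicated unit, not a byte-and-bit loop like the one below.

```c
/* Reference bitwise CRC-32 (Ethernet polynomial, reflected form). */
#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

static uint32_t crc32_bitwise(const uint8_t *data, size_t len) {
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int b = 0; b < 8; b++)
            crc = (crc >> 1) ^ (0xEDB88320u & (uint32_t)-(int32_t)(crc & 1));
    }
    return ~crc;
}

int main(void) {
    const uint8_t msg[] = "123456789";
    /* The standard CRC-32 check value for "123456789" is 0xCBF43926. */
    printf("crc32 = 0x%08X\n", crc32_bitwise(msg, 9));
    return 0;
}
```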
Software
• Support for software pipelining
─ “Reflector Mode Pathways” for communication between pipeline stages
─ Next Neighbor Registers as a programming abstraction (see the hand-off sketch below)
• SDK 4.0
─ Simulator, debugger, profiler, traffic generator
─ Portable modules
─ Provides better infrastructure support
─ C compiler
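
The hand-off sketch referenced above: a minimal model of the next-neighbor idea as a tiny single-producer, single-consumer ring between adjacent pipeline stages. The nn_ring_t type and the nn_put/nn_get names are illustrative, not the Intel SDK API; the depth is modeled on the 16-word Next Neighbor Register file.

```c
/* Sketch of a next-neighbor style hand-off between adjacent pipeline stages. */
#include <stdio.h>
#include <stdint.h>

#define NN_DEPTH 16                     /* modeled on the 16-word NN register file */

typedef struct {
    uint32_t word[NN_DEPTH];
    unsigned put, get;                  /* producer / consumer indices */
} nn_ring_t;

static int nn_put(nn_ring_t *r, uint32_t v) {        /* stage N side   */
    if (r->put - r->get == NN_DEPTH) return 0;        /* ring full      */
    r->word[r->put++ % NN_DEPTH] = v;
    return 1;
}

static int nn_get(nn_ring_t *r, uint32_t *v) {       /* stage N+1 side */
    if (r->put == r->get) return 0;                   /* ring empty     */
    *v = r->word[r->get++ % NN_DEPTH];
    return 1;
}

int main(void) {
    nn_ring_t ring = {0};
    uint32_t handle;
    nn_put(&ring, 0x100);               /* stage N hands off a packet handle */
    nn_put(&ring, 0x104);
    while (nn_get(&ring, &handle))      /* stage N+1 drains in FIFO order    */
        printf("stage N+1 got packet handle 0x%X\n", handle);
    return 0;
}
```

The point of the abstraction is that the hand-off touches no shared memory and needs no locking, since each ring has exactly one producer and one consumer.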
Summary
• Network Processor design space is big due to
─ Varying design goals
─ Varying implementation choices
• Qualitative tradeoffs
• Survey of commercial NPs
• Network processors are gaining more features
• The main architectural characteristic is still parallelism
• Software support is becoming more important