Introduction to Network Processors

CS 162 Computer Architecture
Lecture 8: Introduction to Network Processors (II)
Instructor: L.N. Bhuyan
www.cs.ucr.edu/~bhuyan/cs162
2003 ©UCR
Outline
°Introduction to NP Systems
°Relevant Applications
°Design Issues and Challenges
°Relevant Software and Benchmarks
°A case study: Intel IXP network processors
What are Network Processors
°Any device that executes programs to
handle packets in a data network
°Examples
• Processors on router line cards
• Processors in network access equipment
Why Network Processors
° Current Situation
• Data rates are increasing
• Protocols are becoming more dynamic and sophisticated
• Protocols are being introduced more rapidly
° Processing Elements
• GP (General-Purpose Processor)
- Programmable, but not optimized for networking applications
• ASIC (Application-Specific Integrated Circuit)
- High processing capacity, but long development time and little flexibility
• NP (Network Processor)
- Achieves high processing performance
- Offers programming flexibility
- Cheaper than GP
Outline
°Introduction to NP Systems
°Relevant Applications
°Design Issues and Challenges
°Relevant Software and Benchmarks
°A case study: Intel IXP network processors
Organizing Processor Resources
°Design decisions:
• High-level organization
• ISA and micro architecture
• Memory and I/O integration
°Today’s commercial NPs:
• Chip multiprocessors
• Most are multithreaded
• Exploit little ILP (Cisco does)
• No cache
• Micro-programmed
Architectural Comparisons
°High-level organizations
• Aggressive superscalar (SS)
• Fine-grained multithreaded (FGMT)
• Chip multiprocessor (CMP)
• Simultaneous multithreaded (SMT)
Ref: [NPRD]
Architectural Comparisons (cont.)
[Figure: issue slots over time (processor cycles) for Superscalar, Fine-Grained, Coarse-Grained, Multiprocessing, and Simultaneous Multithreading; legend: Thread 1 to Thread 5 and idle slot]
Tasks and Services
Three Benchmarks used in the experiment
Ref: [NPRD]
Performance Evaluation
[Figure: performance of SS, FGMT, CMP, and SMT on three workloads. Forwarding: IP Forward; Authentication: MD5; Encryption: 3DES]
• Workloads have little ILP
• Need to exploit packet-level parallelism
• CMP and SMT do just that
° Systems must support some form of packet-level parallelism
° SMT and CMP are nearly equivalent, with SMT always coming out ahead
Ref: [NPRD]
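To make packet-level parallelism concrete, here is a minimal Python sketch that is not taken from the slides. Independent packets are handed to a pool of workers, loosely mirroring how CMP or SMT hardware keeps several packets in flight at once. The payloads, the `authenticate` helper, and the worker count are illustrative assumptions; MD5 stands in for the authentication workload.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

def authenticate(packet: bytes) -> str:
    """Hypothetical per-packet task: compute an MD5 digest of the payload."""
    return hashlib.md5(packet).hexdigest()

def process_packets(packets, workers=4):
    """Process independent packets concurrently; results keep input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(authenticate, packets))

# Made-up payloads standing in for received packets.
packets = [f"packet-{i}".encode() for i in range(8)]
digests = process_packets(packets)
```

No packet depends on another, so the work can be split across workers without any coordination beyond collecting the results.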
Example Toaster System: Cisco 10000
° Almost all data plane operations execute on the programmable XMC
° Pipeline stages are assigned tasks – e.g. classification, routing, firewall, MPLS
• Classic SW load balancing problem
° External SDRAM shared by common pipe stages
Ref: [NPT]
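The load-balancing problem in a stage-per-task pipeline can be sketched with a little arithmetic: the slowest stage sets the packet rate, and every faster stage idles part of the time. The stage costs and the clock rate below are hypothetical values chosen for illustration, not measurements of the Cisco 10000.

```python
def pipeline_throughput(stage_cycles, clock_hz):
    """Packets per second when each stage holds one packet at a time.

    The slowest stage limits the rate of the whole pipeline.
    """
    return clock_hz / max(stage_cycles.values())

def stage_utilization(stage_cycles):
    """Fraction of time each stage is busy when the bottleneck is saturated."""
    bottleneck = max(stage_cycles.values())
    return {name: cycles / bottleneck for name, cycles in stage_cycles.items()}

# Hypothetical per-packet costs in cycles for the tasks named on the slide.
stages = {"classification": 40, "routing": 100, "firewall": 60, "MPLS": 50}
rate = pipeline_throughput(stages, clock_hz=100_000_000)
util = stage_utilization(stages)
```

With these assumed numbers the routing stage is the bottleneck and the classification stage is busy only 40% of the time, which is the imbalance a better task assignment would try to remove.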
IBM PowerNP
° 16 pico-processors and 1 PowerPC
° Each pico-processor
• Supports 2 hardware threads
• 3-stage pipeline: fetch/decode/execute
° Dyadic Processing Unit
• Two pico-processors
• 2KB shared memory
• Tree search engine
° Focus is layers 2-4
° PowerPC 405 for control plane operations
• 16K I and D caches
° Target is OC-48
Ref: [NPT]
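An OC-48 target implies a per-packet time budget, which the following sketch works out. It assumes the nominal OC-48 line rate of 2.48832 Gbps, back-to-back 40-byte packets, and no framing overhead. The 133 MHz clock is a hypothetical value used only to convert time into cycles.

```python
OC48_BITS_PER_SECOND = 2_488_320_000  # nominal OC-48 line rate (2.48832 Gbps)

def packet_time_ns(packet_bytes, line_rate_bps=OC48_BITS_PER_SECOND):
    """Time one packet occupies the link, ignoring framing overhead."""
    return packet_bytes * 8 / line_rate_bps * 1e9

def cycles_per_packet(packet_bytes, clock_hz, line_rate_bps=OC48_BITS_PER_SECOND):
    """Cycles a single engine could spend per packet at line rate."""
    return packet_time_ns(packet_bytes, line_rate_bps) * clock_hz / 1e9

# Assumed worst case: back-to-back 40-byte packets, hypothetical 133 MHz clock.
budget_ns = packet_time_ns(40)
cycles = cycles_per_packet(40, clock_hz=133_000_000)
```

Under these assumptions a 40-byte packet leaves roughly 129 ns, or about 17 cycles on one engine, which suggests why the work is spread across many pico-processors and hardware threads.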
C-Port C-5 Chip Architecture
[Figure: block diagram. Clusters of channel processors CP-0 through CP-15, each connected to a PHY; Executive Processor, Fabric Processor, Table Lookup Unit, Buffer Management Unit, and Queue Management Unit on 60 Gbps buses; external connections to SRAM, PROM, switch fabric, PCI, and control]
Ref: [NPT]
Some Challenges
° Intelligent Design
• Given a selection of programs and a target network link speed, find the ‘best’ design for the processor
- Least area
- Least power
- Most performance
° Write efficient multithreaded programs
• NPs have
- Heterogeneous compute resources
- Non-uniform memory
- Multiple interacting threads of execution
- Real-time constraints
• Make use of resources
- How to use special instructions and hardware assists
– Compilers
– Hand-coded
• Multithreaded programs
- Manage access to shared state
- Synchronization between threads
Ref: [NPRD]
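The shared-state problem can be illustrated with a small Python sketch in which several threads update per-flow packet counts behind a single lock. The `FlowCounter` class, the flow names, and the thread count are illustrative assumptions. A real NP would more likely rely on hardware atomic operations or signals than on a software lock.

```python
import threading

class FlowCounter:
    """Shared per-flow packet counts, guarded by one lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._counts = {}

    def record(self, flow_id):
        # The read-modify-write must be atomic with respect to other threads.
        with self._lock:
            self._counts[flow_id] = self._counts.get(flow_id, 0) + 1

    def snapshot(self):
        with self._lock:
            return dict(self._counts)

def worker(counter, flow_ids):
    """One thread of execution: record every packet it is given."""
    for flow_id in flow_ids:
        counter.record(flow_id)

counter = FlowCounter()
flows = ["a", "b", "a", "c"] * 1000  # made-up packet-to-flow mapping
threads = [threading.Thread(target=worker, args=(counter, flows)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
totals = counter.snapshot()
```

A single lock is the simplest correct choice, but it serializes every update; finer-grained locking or per-thread counters trade simplicity for less contention.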
Outline
°Introduction to NP Systems
°Relevant Applications
°Design Issues and Challenges
°Relevant Software and Benchmarks
°A case study: Intel IXP network processors
NP Software
° Teja
• NPU vendor-neutral software tools
• Key is a GUI-based state-machine tool
° CLICK router
• From MIT, supports a specialized development model
° Zebra
• Open source routing environment
• Supporting most of the key IP routing protocols in SW
• IP Fusion Inc. is providing commercial support
° LVL7
• Closed source – i.e. traditional commercial – complete IP solutions
Ref: [NPT]
Benchmarks for Network Processors
• NetBench
- 10 applications
- http://cares.icsl.ucla.edu/NetBench
• CommBench
- 8 networking and communications applications
- http://ccrc.wustl.edu/~wolf/cb/
• EEMBC
- http://www.eembc.org/benchmark
• MediaBench
- Transcoders
- Some communications applications
Ref: [NPT]
Outline
°Introduction to NP Systems
°Relevant Applications
°Design Issues and Challenges
°Relevant Software and Benchmarks
°A case study: Intel IXP network processors
IXP1200 Block Diagram
° StrongARM processing core
° Microengines introduce new ISA
° I/O
• PCI
• SDRAM
• SRAM
• IX: PCI-like packet bus
° On-chip FIFOs
• 16 entries, 64B each
Ref: [NPT]
IXP1200 Microengine
° 4 hardware contexts
• Single issue processor
• Explicit optional context switch on SRAM access
° Registers
• All are single ported
• Separate GPR
• 256*6 = 1536 registers total
° 32-bit ALU
• Can access GPR or XFER registers
° Shared hash unit
• 1/2/3 values – 48b/64b
• For IP routing hashing
° Standard 5 stage pipeline
° 4KB SRAM instruction store – not a cache!
° Barrel shifter
Ref: [NPT]
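The idea of several hardware contexts that switch explicitly on a memory access can be modeled with Python generators. This is a toy model under stated assumptions, not microengine code. Each context yields when it issues a read, the `run_round_robin` scheduler is hypothetical, and the read is assumed to have completed by the time the context runs again.

```python
from collections import deque

def context(name, packets, log):
    """One hardware context: issue a read, switch out, then finish the packet."""
    for pkt in packets:
        log.append((name, "issue SRAM read", pkt))
        yield  # explicit context switch while the read is outstanding
        log.append((name, "process", pkt))

def run_round_robin(contexts):
    """Hypothetical scheduler: resume ready contexts in round-robin order."""
    ready = deque(contexts)
    while ready:
        ctx = ready.popleft()
        try:
            next(ctx)          # run until the context's next switch point
            ready.append(ctx)  # it will be resumed on a later turn
        except StopIteration:
            pass               # context has no packets left

log = []
run_round_robin([context(f"ctx{i}", [f"p{i}a", f"p{i}b"], log) for i in range(4)])
```

In the resulting log all four contexts issue their reads before any of them resumes, so one context's memory wait is overlapped with work from the other three.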
IXP 2400 Block Diagram
° XScale core replaces StrongARM
° Microengines
• Faster
• More: 2 clusters of 4 microengines each
° Local memory
° Next neighbor routes added between microengines
° Hardware to accelerate CRC operations and random number generation
° 16 entry CAM
[Figure: block diagram. Microengines ME0 to ME7 in two clusters, XScale core, DDR DRAM controller, QDR SRAM controller, Scratch/Hash/CSR unit, MSF unit, PCI]
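To show the kind of per-bit work that CRC hardware takes off the software path, here is a bit-at-a-time CRC-32 sketch in Python. The slides do not say which CRC polynomials the hardware supports; using the common CRC-32 (the one `zlib.crc32` computes) is an assumption made for illustration.

```python
import zlib

CRC32_POLY_REFLECTED = 0xEDB88320  # reflected form of the CRC-32 polynomial

def crc32_bitwise(data: bytes) -> int:
    """Compute CRC-32 one bit at a time (eight shift/XOR steps per byte)."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ CRC32_POLY_REFLECTED
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF

# Cross-check against the library implementation on a sample payload.
payload = b"123456789"
software_crc = crc32_bitwise(payload)
library_crc = zlib.crc32(payload)
```

The inner loop runs eight times per payload byte, which is the cost a dedicated CRC unit avoids.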
Different Types of Memory
Type             Width (bytes)  Size (bytes)  Approx. unloaded latency (cycles)  Notes
Local            4              2560          1                                  Indexed addressing, post incr/decr
On-chip Scratch  4              16K           60                                 Atomic ops
SRAM             4              256M          150                                Atomic ops
DRAM             8              2G            300                                Direct path to/from MSF
Ref: [NPRD]
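The latencies on this slide suggest why microengines are multithreaded. The sketch below uses a deliberately simple model that is an assumption rather than something from the slides: each thread computes for a fixed number of cycles, then waits for one memory access, and context switches are free. The 50-cycle compute figure is hypothetical.

```python
import math

# Approximate unloaded latencies in cycles, as listed on the slide.
LATENCY_CYCLES = {"local": 1, "scratch": 60, "sram": 150, "dram": 300}

def threads_to_hide(latency_cycles, compute_cycles):
    """Threads needed so that some thread always has work while others wait."""
    return math.ceil(latency_cycles / compute_cycles) + 1

def utilization(threads, compute_cycles, latency_cycles):
    """Fraction of cycles the engine is busy under this simple model."""
    return min(1.0, threads * compute_cycles / (compute_cycles + latency_cycles))

# Hypothetical workload: 50 cycles of computation per memory access.
needed_for_sram = threads_to_hide(LATENCY_CYCLES["sram"], 50)
needed_for_dram = threads_to_hide(LATENCY_CYCLES["dram"], 50)
busy_with_4_threads = utilization(4, 50, LATENCY_CYCLES["dram"])
```

Under this model four threads are enough to cover SRAM latency, while DRAM would need seven, and four threads against DRAM keep the engine busy a little under 60% of the time.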
IXA Software Framework
[Figure: layers of the IXA software framework]
° External processors: Control Plane Protocol Stack, Control Plane PDK
° XScale core (C/C++ language): Core Components, Core Component Library, Resource Manager Library
° Microengine pipeline (Microengine C language): Microblocks, Microblock Library, Protocol Library, Utility Library, Hardware Abstraction Library
Ref: [NPRD]
References
° [NPT] W. H. Mangione-Smith, G. Memik, Network Processor Technologies
° [NPRD] Patrick Crowley, Raj Yavatkar, An Introduction to Network Processor Research & Design, HPCA-9 Tutorial