ECE 526 – Network Processing Systems Design
Network Processor Introduction
Chapters 11 and 12: D. E. Comer
Goal
• Understanding the inefficiency of 1st, 2nd and 3rd generation network processing systems
─ Need: scalability plus flexibility
• Recognizing the need for a new solution: the 4th generation (network processor technology)
• Learning
─ courage to appreciate the challenges
─ skill to characterize the “real” problem
─ art to propose an engineering solution
• Be aware that “network processor” is currently a conceptual and general term
Ning Weng
ECE 526
Recall: 1st Generation
• 1st generation network processing system
• Feasibility study
─ Design a software router
• Data rate: 10 Gbps
• Assume small packets (64 B)
• Assume each packet needs 10,000 instructions to process
─ Can an Intel 80986@2007 do the job?
• CPU: 24 GHz
• MIPS: 125,000 (million instructions per second)
• 1 billion transistors ….
─ Conclusion: not feasible (10 Gbps of 64 B packets is about 19.5 million packets/s, so about 195,000 MIPS are needed vs. 125,000 available)
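To try other link rates, packet sizes or per-packet instruction budgets, the feasibility check can be written as a small calculator. This is a minimal sketch that treats the 64 B packets as arriving back-to-back with no framing overhead (real Ethernet adds 20 B of preamble and inter-frame gap per frame, which lowers the packet rate); the function name `required_mips` is hypothetical.

```python
def required_mips(link_gbps, packet_bytes, instr_per_packet):
    """Instruction rate (in MIPS) needed to process every packet at line rate.

    Assumes fixed-size, back-to-back packets and ignores framing overhead
    such as the Ethernet preamble and inter-frame gap.
    """
    packets_per_sec = link_gbps * 1e9 / (packet_bytes * 8)
    return packets_per_sec * instr_per_packet / 1e6


need = required_mips(link_gbps=10, packet_bytes=64, instr_per_packet=10_000)
have = 125_000  # MIPS of the (hypothetical) CPU on the slide
print(f"need {need:,.1f} MIPS, have {have:,} MIPS, feasible: {need <= have}")
```

Halving the per-packet budget or doubling the packet size halves the requirement, which is why worst-case (minimum-size) packets are the usual design point.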
• What is the real problem here?
The Real Problem
• Technology push: uneven
─ Link bandwidth scales much faster than CPU and memory technology
─ Transistor scaling and VLSI technology help, but not enough
• Application pull: harder
─ More complex applications are required
─ Processing complexity is defined as the number of instructions and the number of memory accesses needed to process one packet
[Figure: growth from 1x to 10^7x over 1975 to 2005: link bandwidth 2x / year, CPU 2x / two years, memory latency improvement 10% / year]
[Figure: processing complexity grows from hundreds of instructions per packet (Layer 2 switching, IPv4 routing) to thousands of instructions per packet (flow classification, encryption, intrusion detection)]
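The uneven "technology push" follows directly from compounding the three rates in the figure. The sketch below assumes each rate holds steadily year after year (a simplification) and prints how far link bandwidth pulls ahead of CPU performance; the helper names are hypothetical.

```python
def growth(factor_per_period, period_years, years):
    """Cumulative improvement after `years` at a fixed compound rate."""
    return factor_per_period ** (years / period_years)


def gap_after(years):
    link = growth(2.0, 1, years)   # link bandwidth: 2x per year
    cpu = growth(2.0, 2, years)    # CPU: 2x every two years
    mem = growth(1.10, 1, years)   # memory latency: 10% better per year
    return link, cpu, mem


for years in (5, 10, 20):
    link, cpu, mem = gap_after(years)
    print(f"{years:2d} years: link {link:,.0f}x, CPU {cpu:,.1f}x, "
          f"memory {mem:.2f}x, link/CPU gap {link / cpu:,.1f}x")
```

Because link bandwidth doubles twice as often as CPU performance, the link/CPU gap itself doubles every two years, and memory latency falls behind even faster.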
What is the ideal platform?
• Structured ASIC
• Network Processor
• Reconfigurable Coprocessors
• FPGA
2nd and 3rd Generations
• 2nd generation: offloading and decentralization
• 3rd generation: further offloading using specialized devices (ASICs + embedded processors)
• Problems: loss of flexibility and very high cost. Why?
Why not ASIC?
• High development cost
─ Network processing is a moderate-volume market
• Long time to market
─ Network services change quickly
• Difficult to simulate
─ Complex protocols
• Expensive and time-consuming to change
• Little reuse across products
• Limited reuse across versions
• No consensus on framework or supporting chips
• Requires expertise
Network Processors
• Question: where does an NP gain its higher performance, compared with a conventional processor?
Instruction Set: minimality
• Not as general as RISC and CISC processors
─ E.g., no floating-point instructions
─ Optimized for packet processing functions only
• Not specific to a protocol or part of a protocol
• Seek a minimal set of instructions sufficient to handle arbitrary protocols
─ plus specific instructions for protocol processing
• Example: atomic operations
─ A hard problem; covered later
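To make "atomic operation" concrete: several cores updating one shared per-flow packet counter need each read-modify-write to happen indivisibly. The sketch below only emulates the semantics of an atomic fetch-and-add in software, using a Python `threading.Lock`; on an NP this would be a single hardware instruction or memory operation. `AtomicCounter`, `fetch_and_add` and `count_packets` are hypothetical names, not any NP's instruction set.

```python
import threading


class AtomicCounter:
    """Software emulation of an atomic fetch-and-add (hypothetical helper).

    The lock makes the read-modify-write indivisible, which is what a
    hardware atomic instruction guarantees in a single operation.
    """

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def fetch_and_add(self, delta=1):
        with self._lock:
            old = self._value
            self._value = old + delta
            return old  # return the value seen before the update

    @property
    def value(self):
        with self._lock:
            return self._value


def count_packets(counter, n):
    # Each thread models one processing core updating a shared per-flow counter.
    for _ in range(n):
        counter.fetch_and_add(1)


counter = AtomicCounter()
workers = [threading.Thread(target=count_packets, args=(counter, 10_000)) for _ in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(counter.value)  # 40000
```

With a plain `+=` on a shared integer instead, Python does not guarantee the update is atomic across threads, so increments could in principle be lost; that lost-update risk is what atomic instructions remove.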
Architecture: multiprocessor
• Parallelism
─ Network processing workloads are highly parallel
  • Flow-level
  • Queue-level
  • Packet-level
  • Protocol-level
• Pipelining
─ Pipelining improves system throughput at the cost of longer per-packet delay
─ Is this acceptable?
• System-on-chip
─ Processing: RISC cores
─ Memory: registers, cache, instruction store, scratchpad, SRAM and SDRAM
─ I/O: network and switch fabric interfaces
• Question: how hard is it to build and use these NPs?
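The throughput-versus-delay trade-off of pipelining can be checked with two formulas: a pipeline's latency is the sum of its stage times while its throughput is set by the slowest stage, whereas independent parallel cores keep the single-core latency and multiply throughput. The workload below (200 ns per packet, 4 stages, a 10 ns hand-off between stages) is entirely hypothetical.

```python
def pipeline_metrics(stage_ns):
    """Latency (ns) and throughput (packets/s) of a packet pipeline."""
    latency_ns = sum(stage_ns)            # every packet passes through every stage
    throughput_pps = 1e9 / max(stage_ns)  # the slowest stage sets the rate
    return latency_ns, throughput_pps


def parallel_metrics(task_ns, cores):
    """Latency and throughput when each core processes whole packets independently."""
    return task_ns, cores * 1e9 / task_ns


# Hypothetical workload: 200 ns of work per packet. As a pipeline, each of the
# 4 stages does 50 ns of work plus an assumed 10 ns hand-off to the next stage.
single = parallel_metrics(200, cores=1)
pipe = pipeline_metrics([60, 60, 60, 60])
par = parallel_metrics(200, cores=4)

for name, (lat, tput) in (("1 core", single), ("4-stage pipeline", pipe), ("4 parallel cores", par)):
    print(f"{name:17s} latency {lat:4.0f} ns  throughput {tput / 1e6:5.1f} Mpps")
```

Under these assumptions the pipeline more than triples throughput but adds 40 ns of latency. The parallel version looks better on paper only because it assumes packets are fully independent; per-flow ordering and shared state make that harder in practice.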
Typical Processing
Case Study: IPv4 Packet Forwarding
• 2-port router (2 Gbps) on a Xilinx Virtex-II Pro FPGA (2VP30)
• Tasks: From (0), From (1), Lookup, IPRoute, To (0), To (1)
• IP lookup: longest prefix match (trie lookup algorithm)
[Figure: example lookup trie with its prefix table (hex : binary), annotated with the memory access made at each step of a lookup]
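A longest prefix match over a trie can be sketched in a few lines. This version is a unibit (one bit per level) binary trie, chosen for clarity; the figure's trie may use wider strides. Each node visited is counted as one memory access, which is why deeper matches cost more. The route table, port numbers and class names are hypothetical.

```python
import ipaddress


class TrieNode:
    __slots__ = ("children", "next_hop")

    def __init__(self):
        self.children = [None, None]  # child for bit 0 and for bit 1
        self.next_hop = None          # set if a prefix ends at this node


def to_bits(addr):
    """Dotted-quad IPv4 address -> 32-character string of '0'/'1'."""
    return format(int(ipaddress.IPv4Address(addr)), "032b")


class BinaryTrie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, prefix, length, next_hop):
        node = self.root
        for bit in to_bits(prefix)[:length]:
            i = int(bit)
            if node.children[i] is None:
                node.children[i] = TrieNode()
            node = node.children[i]
        node.next_hop = next_hop

    def lookup(self, addr):
        """Return (next_hop, nodes_read); each node read models one memory access."""
        node = self.root
        best = node.next_hop
        nodes_read = 1  # reading the root
        for bit in to_bits(addr):
            node = node.children[int(bit)]
            if node is None:
                break
            nodes_read += 1
            if node.next_hop is not None:
                best = node.next_hop  # longest match seen so far
        return best, nodes_read


trie = BinaryTrie()
trie.insert("0.0.0.0", 0, 0)    # default route -> port 0
trie.insert("10.0.0.0", 8, 1)   # 10.0.0.0/8    -> port 1
trie.insert("10.1.0.0", 16, 0)  # 10.1.0.0/16   -> port 0
for addr in ("10.1.2.3", "10.2.0.1", "192.168.0.1"):
    print(addr, trie.lookup(addr))
```

A /16 match costs 17 node reads here, which illustrates why lookup dominates memory traffic and why real designs use multibit strides or split the lookup into several stages.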
Multiprocessor for Header Processing
[Figure: Packet Reception feeds four parallel processing pipelines, each with Verify, Lookup-1, Lookup-2 and Transmit stages, ahead of Packet Transmission; also shown: FIFO queues, BRAM blocks, FSL and OPB interconnect, RS232, timer and LEDs]
Typical Use of NPs
[Figure: a router whose ports are connected by a switching fabric; each router port contains a network interface and a network processor built from processor cores, coprocessors, an interconnect and I/O]
System Implementation Space
Memory Architecture
• Memory access bottleneck
• Memory consumes significant chip area
─ Limited on-chip memory
─ Limited bandwidth to off-chip memory: pin and package cost
─ Off-chip memory access is slow: about 100 cycles
• Possible solutions
─ Profile application memory access patterns
─ Propose heterogeneous memory architectures
─ Memory-aware mapping
─ Transactional memory (project topic)
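Profiling and memory-aware mapping fit together naturally: a profile of accesses per packet tells you which data structures deserve scarce on-chip memory. The sketch below uses a simple greedy heuristic (most accesses per KB first, a knapsack-style approximation, not a method from the text) and reports the access-weighted average latency. The data structures, sizes, access counts and the 3-cycle on-chip latency are all hypothetical; only the 100-cycle off-chip figure comes from the slide.

```python
def avg_access_cycles(placement, profile, latency):
    """Access-weighted average latency (cycles) per memory access for a layout."""
    total = sum(profile.values())
    return sum(count * latency[placement[name]] for name, count in profile.items()) / total


def greedy_placement(profile, sizes_kb, onchip_kb):
    """Place data with the most accesses per KB on chip first, until it is full."""
    placement, free = {}, onchip_kb
    for name in sorted(profile, key=lambda n: profile[n] / sizes_kb[n], reverse=True):
        if sizes_kb[name] <= free:
            placement[name] = "onchip"
            free -= sizes_kb[name]
        else:
            placement[name] = "offchip"
    return placement


# Hypothetical profile: memory accesses per packet and footprint of each structure.
profile = {"trie_top_levels": 6, "flow_table": 2, "packet_buffer": 4}
sizes_kb = {"trie_top_levels": 16, "flow_table": 512, "packet_buffer": 256}
latency = {"onchip": 3, "offchip": 100}  # cycles; 100 from the slide, 3 assumed

layout = greedy_placement(profile, sizes_kb, onchip_kb=64)
all_off = {name: "offchip" for name in profile}
print(layout)
print(avg_access_cycles(all_off, profile, latency), "->", avg_access_cycles(layout, profile, latency))
```

Moving only the small, hot top levels of the lookup trie on chip roughly halves the average access latency in this example, even though most bytes stay off chip.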
Application Mapping
Current approach: fixed topology, assembly coding & hand-tuning
Basic Steps for Mapping
• Application description (tasks: From (0), From (1), Lookup, IPRoute, To (0), To (1))
• High-level optimizations
• Task graph and profile
• Architecture configuration (platform specific): PEs, FPGAs and memories (MEM)
• HW / SW partitioning
• Task allocation
• Data layout
• Communication assignment
• Compilation / Synthesis
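The task allocation step can be illustrated with a classic greedy heuristic, longest-processing-time-first: sort tasks by profiled cost and give each one to the currently least-loaded processing element (PE). For a pipelined NP, the most heavily loaded PE bounds throughput. This is a minimal sketch; the cycle counts are hypothetical, and it ignores task dependencies and communication cost, which the communication assignment step would have to handle.

```python
def allocate_tasks(costs, num_pes):
    """Greedy longest-processing-time-first allocation of tasks to PEs."""
    loads = [0] * num_pes
    assignment = {}
    # Most expensive tasks first (sorted with reverse=True keeps ties in input order).
    for task in sorted(costs, key=costs.get, reverse=True):
        pe = min(range(num_pes), key=loads.__getitem__)  # least-loaded PE
        assignment[task] = pe
        loads[pe] += costs[task]
    return assignment, loads


# Hypothetical profiled cycles per packet for the IPv4 forwarding tasks.
costs = {"From(0)": 40, "From(1)": 40, "Lookup": 120, "IPRoute": 60, "To(0)": 30, "To(1)": 30}
assignment, loads = allocate_tasks(costs, num_pes=3)
print(assignment)
print("PE loads:", loads, "bottleneck:", max(loads), "cycles/packet")
```

Here the single Lookup task alone sets the 120-cycle bottleneck, so adding PEs would not help until Lookup itself is split, as the Lookup-1 / Lookup-2 stages in the FPGA case study do.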
Summary
• Network Processor
─ Special-purpose, programmable hardware device
─ Optimized for network processing
─ Building blocks of network processing systems
─ Fundamental ideas
  • Flexibility through programmability
  • Scalability with parallelism and pipelining
• Here, NP is a concept
─ We will learn about example network processors soon
For Next Class & Announcement
• Read Comer: chapters 13 and 14
• Lab 1 total grade reduced to 82
• HW 1 due Wed.
• Project topic will be announced after Wed.