Transcript Slide 1
ECE 526 – Network
Processing Systems Design
Hardware Architecture for Protocol Processing
Chapter 8: D. E. Comer
Goal
• Understand the hardware architecture of protocol processing
• Learn the key metric of protocol processing systems:
─ Aggregate packet rate
• Learn the key requirements of protocol processing
systems
─ High throughput
─ Scalability
• Survey mechanisms to design scalable protocol
processing systems
Ning Weng
ECE 526
2
Outline
• First-generation network processing system
architecture
• Figures of merit for network processing systems
• Possible ways to improve performance of network
processing systems
1st Generation Network System
• Traditional software-based router
• Using conventional hardware
─ Single general-purpose processor handles most tasks
─ Single shared memory
─ I/O over a shared bus
─ NICs use the same design as other I/O devices
Cheap but performance is poor!
Figures of Merit of Network Systems
• Interface data rate
─ Rate at which data enters/leaves an interface
• Aggregate data rate
─ Sum of interface rates
─ Measure of the total data rate the system can handle
─ Note: aggregate rate is crucial if one CPU handles traffic from all interfaces
─ Can be misleading when packet sizes vary but per-packet processing cost is constant
• Aggregate packet rate
─ Total number of packets entering/leaving the system per second
─ More important for protocol processing (which does not touch the payload)
─ Why?
• Packet rate vs. data rate
─ CPU metric: per-packet rate
─ Interface hardware metric: per-bit (data) rate
Data Rate vs. Packet Rate
• Packet size: smallest 64 bytes; largest 1518 bytes
• For protocol processing at the same data rate, which is harder
for a network processing system to handle?
─ Smallest packets, or
─ Largest packets
• How to calculate the packet rate?
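One way to answer the question above is to divide the data rate by the packet size in bits. The sketch below does this for the two packet sizes on this slide. It assumes a 10 Gbps link as an illustrative rate and ignores framing overhead (preamble, inter-frame gap); the function name `packet_rate_pps` is hypothetical.

```python
def packet_rate_pps(data_rate_bps: float, packet_size_bytes: int) -> float:
    """Packets per second at a given data rate, ignoring framing overhead."""
    return data_rate_bps / (packet_size_bytes * 8)


# Illustrative link rate (an assumption for this example).
LINK_RATE_BPS = 10e9

for size in (64, 1518):
    rate = packet_rate_pps(LINK_RATE_BPS, size)
    print(f"{size:>5} B packets at 10 Gbps: {rate:,.0f} packets/s")
```

Under these assumptions, 64-byte packets arrive roughly 24 times more often than 1518-byte packets (1518 / 64 ≈ 23.7), so the smallest packets are the harder case for per-packet protocol processing.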
Aggregate Packet Rate
Time per Packet
• Aggregate packet rate determines time per packet
• Processing each packet requires on the order of hundreds to
thousands of instructions
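The relationship between packet rate and time per packet can be sketched as follows. The example rate of 19,531,250 packets/s corresponds to 64-byte packets on a 10 Gbps link with framing overhead ignored; the CPU speed of one billion instructions per second is purely an assumption for illustration, and both function names are hypothetical.

```python
def time_per_packet_ns(packet_rate_pps: float) -> float:
    """Time available to handle one packet, in nanoseconds."""
    return 1e9 / packet_rate_pps


def instruction_budget(packet_rate_pps: float, instructions_per_second: float) -> float:
    """Instructions a processor can spend on each packet at this packet rate."""
    return instructions_per_second / packet_rate_pps


rate = 19_531_250   # packets/s: 64 B packets at 10 Gbps, overhead ignored
cpu_ips = 1e9       # assumed CPU speed: 1 billion instructions per second

print(f"time per packet: {time_per_packet_ns(rate):.1f} ns")
print(f"instruction budget: {instruction_budget(rate, cpu_ips):.1f} per packet")
```

With these assumed numbers the budget is about 51 instructions per packet, well below the hundreds to thousands of instructions that packet processing typically needs.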
Feasibility Analysis
• Design a software router
─ Data rate: 10 Gbps
─ Assuming small packets (64 B)
─ Assuming each packet needs 10,000 instructions to process
• Can Intel 80986@2009 do the job?
─ CPU: 24 GHz
─ 1 billion transistors
─ Address bus width: 64 bits
─ CPU is a RISC machine that can execute one instruction per clock cycle
• Hint:
─ What is the packet rate?
─ What is the processing requirement in MIPS?
• A single-CPU router lacks scalability. How about multi-core?
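The hints above can be worked through numerically. This is a minimal sketch using only the numbers on this slide; it ignores framing overhead, memory and I/O bottlenecks, and assumes ideal linear scaling when estimating a core count, so the result is a lower bound rather than a design.

```python
import math

# Figures taken from the slide.
DATA_RATE_BPS = 10e9              # 10 Gbps
PACKET_SIZE_BYTES = 64            # small packets
INSTRUCTIONS_PER_PACKET = 10_000
CPU_CLOCK_HZ = 24e9               # 24 GHz
INSTRUCTIONS_PER_CYCLE = 1        # RISC, one instruction per clock cycle

# Packet rate, ignoring framing overhead (a simplifying assumption).
packet_rate = DATA_RATE_BPS / (PACKET_SIZE_BYTES * 8)

# Processing requirement versus what the CPU can deliver, both in MIPS.
required_mips = packet_rate * INSTRUCTIONS_PER_PACKET / 1e6
available_mips = CPU_CLOCK_HZ * INSTRUCTIONS_PER_CYCLE / 1e6

feasible = available_mips >= required_mips

# Lower bound on identical cores, assuming perfect linear speedup.
cores_needed = math.ceil(required_mips / available_mips)

print(f"packet rate:    {packet_rate:,.0f} packets/s")
print(f"required MIPS:  {required_mips:,.1f}")
print(f"available MIPS: {available_mips:,.1f}")
print(f"feasible on one CPU: {feasible}; cores needed (ideal): {cores_needed}")
```

Under these assumptions the router needs about 195,000 MIPS while the CPU supplies 24,000 MIPS, so a single processor falls short by roughly a factor of eight.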
Scalability
• The ability of a system to be easily extended in
“size” and performance
─ E.g., a computer with more memory slots and disk slots; a router that can add
more ports or faster links
• Why do we care about scalability?
─ Designing a new system is time-consuming and expensive
─ Performance requirements increase quickly
─ Others
• How can we make a network system more scalable?
─ Optimized processing engines
─ Intelligent NICs
─ Parallel processing by duplicating processing engines + NICs
Processing Power
• Overcoming processing bottlenecks:
─ Specialized hardware (ASICs)
─ Fine-grained parallelism
─ Symmetric coarse-grain parallelism
─ Asymmetric coarse-grain parallelism
─ Special-purpose coprocessors
• Other improvements
─ NICs with onboard processing
─ Smart NICs
─ => basically same as per-port processing engines
Parallelism in Processors
• Fine-grained parallelism
─ Exploits instruction-level parallelism
─ Examples: VLIW, SMT, etc.
─ Limited by the amount of parallelism available in the workload
• Symmetric coarse-grain parallelism
─ Multiple parallel identical CPUs
─ Inter-processor communication can limit performance
• Asymmetric coarse-grain parallelism
─ Multiple parallel different CPUs
─ E.g., one processor for layer 2, one for layer 3
• Special-purpose coprocessors
─ Custom logic for lookups, checksums, etc.
─ High-performance but not (fully) programmable
• Key question: how can such a system be programmed?
• Duplicating processing engines leads to the advanced router architecture
Advanced Router Architecture
S. Keshav et al., IEEE Communications Magazine, 1998
• Port: point of attachment for a physical link
• Switching Fabric (SF): interconnects input and output ports
• Line Card: the device that connects a link to the SF
• Routing Processor: creates forwarding tables using routing protocols
• Queues: buffers between an input port and the SF, or between the SF and an output port
Advanced Router Architecture
• Changing requirements
─ Increasing link speed
─ Increasing number of ports
─ Increasing routing tables
─ Increasing processing complexity
• Scalable system design:
─ Exploit parallelism wherever possible
• Per-port, per-flow, per-packet, instruction-level
─ One processing engine per port (instead of a single CPU)
─ Multiple processors per port
─ “Better” processors
Reminder
• Read Comer: chapters 11 & 12
• Sep. 19: project group leaders, email me your group
members for the project
• Sep. 24: homework 1 due