Mapping of scalable RDMA protocols to ASIC/FPGA platforms
Yosef Gavriel Tirat-Gefen, PhD
Senior Member IEEE
Chief Scientist
Castel Systems Inc.
& Dept. of Physics and Astronomy
George Mason University
Fairfax, VA
2004 MAPLD/205
Tirat-Gefen
Presentation Overview
• Motivation
• TCP Off-loading
• Zero-copying
• RDMA protocol
• RDMA protocol stack
• Structure of an RDMA card
• Results
• Conclusion
Motivation
[Figure: two supercomputers or server farms, two terabyte storage systems, and a workstation, all connected through a WAN.]
Enabling high-bandwidth WAN applications
Applications
• Distributed command and control.
• Signal processing (e.g. RADAR).
• Real-time sharing of intelligence data.
• Distributed large-scale computation/simulation of aerospace problems.
• Extension of storage area networks over a wide area network (WAN).
• Enabling technology for modern supercomputing installations.
Traditional TCP/IP Networking
[Figure: two end hosts, each with a stack of Application/O.S., TCP, Layer 3 (IP), Layer 2 (MAC), and Layer 1 (PHY), communicating through a router that implements Layer 3 (IP), Layer 2 (MAC), and Layer 1 (PHY).]
Standard Data Flow on TCP/IP
[Figure: on each host, data moves between the application memory space (Application A, Application B) and the TCP buffer/stack memory space, then through L3, L2, and L1 to the WAN/LAN.]
Standard Data Flow on TCP/IP
• Traditional TCP/IP copies data from the application to a TCP memory buffer
• Buffer copying wastes CPU cycles
• The CPU is overwhelmed at rates above 2.5 Gbps
• TCP/IP off-loading helps, but it does not solve the problem on the receiver side
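The buffer copy described above can be illustrated with a small Python sketch. It only models the idea: `app_buffer` and `stack_buffer` are hypothetical names, and a real TCP/IP stack performs this copy in kernel space rather than in Python.

```python
# Hypothetical application buffer holding the data to be sent.
app_buffer = bytearray(b"payload" * 4)

# Traditional path: the protocol stack takes its own copy of the data.
# Every byte is duplicated, which is the CPU work the slide refers to.
stack_buffer = bytes(app_buffer)

# For contrast: a view shares the same memory, so nothing is duplicated.
shared_view = memoryview(app_buffer)

# Changing the application buffer shows which object owns separate storage.
app_buffer[0:7] = b"CHANGED"
copy_is_stale = stack_buffer[:7] == b"payload"      # the copy kept the old bytes
view_is_live = bytes(shared_view[:7]) == b"CHANGED"  # the view sees the update
```

The copy keeps the old contents because it lives in separate memory, while the view reflects the change because it refers to the application buffer itself.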
TCP/IP off-load processing
[Figure: protocol stack with TCP/IP offload. Application/O.S. remains on the host; TCP, Layer 3 (IP), Layer 2 (MAC), and Layer 1 (PHY) are mapped to hardware in the TCP/IP offload processor (TOE).]
Zero-copying and TCP offloading processing
[Figure: a TOE/NIC card with a TCP off-load processor and a network buffer sits between the WAN/LAN and the host. Host side: host CPU, host CPU cache memory, and a receive buffer in host main memory.]
Zero-copying and TCP offloading processing
• Zero-copying is still not achieved, because the receive buffer is still copied back to the application memory space
• TCP/IP off-loading is not scalable
• RDMA protocols provide a solution
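One way to picture how an RDMA-style receiver avoids the final copy: the application registers a buffer in advance, and each incoming segment names that buffer and an offset, so the NIC can write the payload straight into application memory. The Python sketch below is a toy model of that idea only. `RdmaNicModel`, `register`, `place`, and the integer tag are hypothetical names chosen for illustration; they are not taken from the presentation or from any RDMA library.

```python
class RdmaNicModel:
    """Toy model of a receiving NIC that places data directly (hypothetical)."""

    def __init__(self):
        self._regions = {}   # tag -> writable view of application memory
        self._next_tag = 1

    def register(self, buffer: bytearray) -> int:
        """Register an application buffer and return a tag that names it."""
        tag = self._next_tag
        self._next_tag += 1
        self._regions[tag] = memoryview(buffer)  # a view, not a copy
        return tag

    def place(self, tag: int, offset: int, payload: bytes) -> None:
        """Write payload into the registered buffer at the given offset."""
        region = self._regions[tag]
        if offset < 0 or offset + len(payload) > len(region):
            raise ValueError("placement outside registered region")
        region[offset:offset + len(payload)] = payload


app_memory = bytearray(16)          # application memory space
nic = RdmaNicModel()
tag = nic.register(app_memory)

# Each segment carries its own destination, so arrival order does not matter.
nic.place(tag, 8, b"WORLD!!!")
nic.place(tag, 0, b"HELLO...")
```

In this model there is no intermediate receive buffer: the payload lands in `app_memory` directly, which is the step that TCP off-loading alone does not remove.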
RDMA data-flow for WAN applications
[Figure: two hosts (Host CPU A and Host CPU B), each with host memory containing an application memory space and an RDMA NIC card; the two RDMA NIC cards are connected through the WAN.]
Scalable WAN-RDMA for bandwidths above 10 Gbps
[Figure: RDMA NIC card for WAN. Labels: Host, DMA channel, > 10 Gbps, RDMA engine, Tx buffer, Rx buffer, MAC, PHY, 10 Gbps links, WAN.]
The RDMA protocol layers and our prototype
[Figure: protocol stack, top to bottom]
• ULP (e.g. iSCSI, NFS)
• RDMA
• DDP
• MPA
• SCTP / TCP
• Layer 3 (e.g. IP)
• Layer 2 (MAC)
• Layer 1 (PHY)
[Figure annotations, top to bottom: running on host CPU; FPGA implementation; FPGA and off-the-shelf MAC/PHY chips]
Overall Hardware/Firmware Organization of the WAN RDMA card
[Figure: block diagram of the card, from host side to line side]
• PCI-Express/Hyper-transport interface
• IP/Firmware module
• RDMA protocol engine
• SCTP protocol engine
• Layer 3 (IP) processor
• Rx memory controller and Tx memory controller, each with a memory bank
• Data stream split/join unit
• Four parallel line paths, each consisting of SAR, 10GE/OC-192 framer, and PHY
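The slide does not say how the data stream split/join unit distributes traffic across the four line paths. As a minimal sketch, assuming simple round-robin striping of fixed-size chunks (an assumption made here for illustration, not the author's stated method), splitting and rejoining could look like this in Python. The names `split_stream` and `join_stream` and the chunk size are hypothetical.

```python
from typing import List


def split_stream(data: bytes, lanes: int, chunk: int) -> List[List[bytes]]:
    """Stripe data over `lanes` queues in round-robin order, `chunk` bytes at a time."""
    queues: List[List[bytes]] = [[] for _ in range(lanes)]
    for start in range(0, len(data), chunk):
        lane = (start // chunk) % lanes
        queues[lane].append(data[start:start + chunk])
    return queues


def join_stream(queues: List[List[bytes]]) -> bytes:
    """Rebuild the original stream by reading the queues in the same round-robin order."""
    out = bytearray()
    depth = max((len(q) for q in queues), default=0)
    for row in range(depth):
        for q in queues:
            if row < len(q):
                out += q[row]
    return bytes(out)


stream = bytes(range(10))
lane_queues = split_stream(stream, lanes=4, chunk=2)  # four lanes, as on the slide
rebuilt = join_stream(lane_queues)
```

Round-robin striping keeps the join side simple because the chunk order is implied by the lane order; a real design would also have to handle lanes that deliver chunks with different delays, which this sketch ignores.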
Present Results
• Currently using Xilinx Virtex-II/Virtex-II Pro as target devices for our cores
• Data indicate that most of the key cores will fit in one FPGA device (Virtex-II)
• The aggregate of all cores spans several FPGAs
• Inter-device communication is an issue; the PCB design needs care.
• We are currently trying to accommodate most of the cores in one FPGA.
• Most of the cores will be made available free of charge to researchers in non-profit or government organizations.
Conclusion
• The advent of Hyper-transport/PCI-Express and VITA (embedded computing) standards will enable I/O bandwidths above 10 Gbps locally
• Extending the RDMA protocol enables large bandwidths over wide area networks
• The proposed cores will meet the natural growth of bandwidth requirements in commercial/defense/aerospace applications.