Why DDP & RDMA In IETF?


Remote Direct Memory Access
(RDMA) over IP
PFLDNet 2003, Geneva
Stephen Bailey, Sandburst Corp., [email protected]
Allyn Romanow, Cisco Systems, [email protected]
RDDP Is Coming Soon
“ST [RDMA] Is The Wave Of The Future” – S. Bailey & C. Good, CERN 1999
• Need:
  – standard protocols
  – host software
  – accelerated NICs (RNICs)
  – faster host buses (for > 1 Gb/s)
• Vendors are finally serious:
Broadcom, Intel, Agilent, Adaptec, Emulex, Microsoft, IBM,
HP (Compaq, Tandem, DEC), Sun, EMC, NetApp, Oracle,
Cisco & many, many others
Overview
• Motivation
• Architecture
• Open Issues
CFP SigComm Workshop
• NICELI SigComm 03 Workshop
Workshop on Network-I/O Convergence:
Experience, Lessons, Implications
• http://www.acm.org/sigcomm/sigcomm2003/workshop/niceli/index.html
High Speed Data Transfer
• Bottlenecks
– Protocol performance
– Router performance
– End station performance, host processing
• CPU Utilization
• The I/O Bottleneck
– Interrupts
– TCP checksum
– Copies
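The TCP checksum is one of the per-byte costs listed above. As an illustration (not part of the talk), here is a software version of the Internet checksum in the style of RFC 1071 – every byte of the payload must be read once just to checksum it, which is exactly the per-byte pass that NIC checksum offload removes:

```python
def internet_checksum(data: bytes) -> int:
    """Ones'-complement sum over 16-bit words, as TCP/IP uses (RFC 1071 style)."""
    if len(data) % 2:
        data += b"\x00"                            # pad odd-length input
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]      # touch every byte once
        total = (total & 0xFFFF) + (total >> 16)   # fold the carry back in
    return ~total & 0xFFFF

print(hex(internet_checksum(b"\x45\x00\x00\x3c")))
```

The loop body is trivial, but it scales with payload size, not packet count – which is why per-byte costs dominate at high link rates.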
What is RDMA?
• Avoids copying by allowing network adapter
under control of application to steer data
directly into application buffers
• Bulk data transfer or kernel bypass for small
messages
• Grid, cluster, supercomputing, data centers
• Historically, special-purpose fabrics – Fibre Channel, VIA, InfiniBand, Quadrics, ServerNet
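The steering idea can be sketched in a few lines. This is a toy model, not the RDMA/DDP wire format: the receiver pre-registers an application buffer, and each arriving segment carries a target offset, so the adapter can place data directly – even out of order – with no intermediate packet buffer:

```python
# Toy model of direct data placement (names and layout are illustrative).
registered_buffer = bytearray(16)   # app buffer "registered" with the NIC

def place(segment_offset: int, payload: bytes) -> None:
    """Steer payload directly to its final offset in the registered buffer."""
    registered_buffer[segment_offset:segment_offset + len(payload)] = payload

# Segments may arrive out of order; placement is still direct.
place(8, b"world!  ")
place(0, b"hello,  ")
print(bytes(registered_buffer))
```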
Traditional Data Center
Storage
Network
(Fibre Channel)
The World
Ethernet/
IP
application
Servers
A Machine
Database
Intermachine
Network
(VIA, IB,
Proprietary)
Why RDMA over IP? The Business Case
• TCP/IP is not used for high-bandwidth interconnection – host processing costs are too high
• High-bandwidth transfer will become more prevalent – 10 GE, data centers
• Special-purpose interfaces are expensive
• IP NICs are cheap and high-volume
The Technical Problem – The I/O Bottleneck
• With TCP/IP, host processing can’t keep up with link bandwidth on receive
• Per-byte costs dominate – Clark (89)
• Well researched by the distributed systems community in the mid-1990s; industry experience
• Memory bandwidth doesn’t scale; processor–memory performance gap – Hennessy (97), D. Patterson, T. Anderson (97)
• STREAM benchmark
Copying
Using IP transports (TCP & SCTP) requires data copying:
[Diagram: data arrives at the NIC, is copied (1) into a kernel packet buffer, then copied (2) into the user buffer.]
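The receive-side copy is visible at the sockets API. A small sketch (plain Python sockets, standing in for the kernel path on the slide): `recv()` hands back a freshly allocated bytes object on every call, while `recv_into()` at least reuses one preallocated user buffer – though the kernel-to-user copy itself remains, and that is the copy RDMA is designed to eliminate:

```python
import socket

# A connected pair of sockets standing in for sender and receiver.
a, b = socket.socketpair()
a.sendall(b"payload")

user_buffer = bytearray(64)     # preallocated user buffer
n = b.recv_into(user_buffer)    # kernel -> user copy, but no per-call allocation
print(bytes(user_buffer[:n]))

a.close()
b.close()
```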
Why Is Copying Important?
• Heavy resource consumption at high speed (1 Gb/s and up)
  – Uses a large % of available CPU
  – Uses a large fraction of available bus bandwidth – min 3 trips across the bus

  Test                     Throughput (Mb/s)   Tx CPUs   Rx CPUs
  1 GbE, TCP               769                 0.5       1.2
  1 Gb/s RDMA SAN – VIA    891                 0.2       0.2

  64 KB window, 64 KB I/Os, 2P 600 MHz PIII, 9000 B MTU
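A back-of-envelope check of the “min 3 trips across the bus” claim, using the measured TCP throughput above. The bus peak here is an assumption for illustration – a 32-bit/33 MHz PCI bus, typical for the 600 MHz PIII era of the testbed:

```python
link_rate_mbps = 769      # measured 1 GbE TCP throughput from the table
bus_trips = 3             # DMA into memory, then a read and a write for the copy
bus_traffic_mbps = link_rate_mbps * bus_trips

pci_peak_mbps = 32 * 33   # 32-bit x 33 MHz PCI ~= 1056 Mb/s peak
print(bus_traffic_mbps, pci_peak_mbps)
```

Even at 769 Mb/s of goodput, three trips generate ~2.3 Gb/s of bus traffic – more than double the assumed bus peak, which is why the copies, not the link, become the ceiling.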
What’s In RDMA For Us?
Network I/O becomes ‘free’ (still have latency, though):
2500 machines using 30% CPU for I/O = 1750 machines using 0% CPU for I/O
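The slide’s arithmetic, spelled out: if network I/O consumes 30% of every CPU, a 2500-machine cluster delivers only the application work of 1750 machines – freeing that 30% is worth 750 machines.

```python
machines = 2500
io_cpu_fraction = 0.30

# Application capacity left after I/O overhead.
effective = round(machines * (1 - io_cpu_fraction))
print(effective)
```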
Approaches to Copy Reduction
• On-host – Special purpose software and/or
hardware e.g., Zero Copy TCP, page flipping
– Unreliable, idiosyncratic, expensive
• Memory to memory copies, using network
protocols to carry placement information
– Satisfactory experience – Fibre Channel,
VIA, Servernet
• FOR HARDWARE, not software
RDMA over IP Standardization
• IETF RDDP (Remote Direct Data Placement) WG
  – http://ietf.org/html.charters/rddp-charter.html
• RDMAC (RDMA Consortium)
  – http://www.rdmaconsortium.org/home
RDMA over IP Architecture
Two layers:
• DDP – Direct Data Placement
• RDMA – control
[Layer stack, top to bottom: ULP, RDMA control, DDP, Transport, IP]
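The key property of the DDP layer is that each segment carries its own placement information, so the receiver can place payload without buffering a whole ULP message. A sketch of that framing – the field layout here is illustrative, not the real DDP/RDMAP wire format:

```python
import struct

# Hypothetical segment header: buffer identifier ("steering tag"),
# target offset in that buffer, and payload length.
HDR = struct.Struct("!IQH")

def make_segment(stag: int, offset: int, payload: bytes) -> bytes:
    """Frame one DDP-style segment: placement info + payload."""
    return HDR.pack(stag, offset, len(payload)) + payload

def parse_segment(segment: bytes):
    """Recover placement info and payload from a segment."""
    stag, offset, length = HDR.unpack_from(segment)
    return stag, offset, segment[HDR.size:HDR.size + length]

seg = make_segment(stag=7, offset=4096, payload=b"data")
print(parse_segment(seg))
```

Because the placement information is self-contained per segment, segments can be placed as they arrive – which is also why TCP’s byte-stream ordering and framing appear later under Open Issues.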
Upper and Lower Layers
• ULPs – SDP (Sockets Direct Protocol), iSCSI, MPI
• DAFS is standardized NFSv4 on RDMA
• SDP provides the SOCK_STREAM API
• Runs over a reliable transport – TCP, SCTP
Open Issues
• Security
• TCP order processing, framing
• Atomic ops
• Ordering constraints – performance vs. predictability
• Other transports – SCTP, TCP, unreliable
• Impact on network & protocol behaviors
• Next performance bottleneck?
• What new applications?
• Eliminates the need for large MTUs (jumbos)?