iWARP Ethernet: Key to Driving Ethernet into the Future

Transcript

iWARP Ethernet
Key to Driving Ethernet into the Future
Brian Hausauer
Chief Architect
NetEffect, Inc.
Agenda
Situation overview
Performance considerations
Networking
Applications
New generation of adapters
Performance discussion and demo
Wrap up
Data Center Evolution
Separate Fabrics for Networking, Storage,
and Clustering
[Slide diagram: servers running networking, storage, and clustering applications, each connected through a separate adapter and switch: an Ethernet adapter to the LAN and NAS for networking, a Fibre Channel adapter to the SAN for block storage, and a Quadrics, Myrinet, InfiniBand, etc. adapter for clustering, with an iWARP Ethernet adapter shown as an alternative for each fabric.]
Single Adapter for All Traffic
Converged Fabric for Networking, Storage,
and Clustering
Smaller footprint
Lower complexity
Higher bandwidth
Lower power
Lower heat dissipation
[Slide diagram: server blades, each with a single converged iWARP Ethernet adapter, connected through one switch to users, NAS, and SAN; networking, storage, and clustering traffic all share the converged iWARP Ethernet fabric.]
Networking Performance Barriers
Packet Processing
Intermediate Buffer Copies
Command Context Switches
[Slide diagram: with a standard Ethernet adapter, an I/O command and its data pass from the application buffer through the I/O library (user space), the OS TCP/IP stack and device driver (kernel software), and the adapter buffer (hardware), with an intermediate buffer copy at each step before the standard Ethernet TCP/IP packet reaches the wire. The resulting CPU overhead breaks down roughly as 40% application-to-OS context switches, 20% intermediate buffer copies, and 40% transport (TCP/IP) processing.]
Eliminate Networking Performance
Barriers With iWARP
Packet Processing
Intermediate Buffer Copies
Command Context Switches
[Slide diagram: the NetEffect NE010 iWARP Ethernet adapter removes each source of CPU overhead in turn: transport (TCP) offload eliminates the ~40% spent on transport processing, RDMA/DDP eliminates the ~20% spent on intermediate buffer copies, and user-level direct access/OS bypass eliminates the ~40% spent on application-to-OS context switches, while the wire still carries standard Ethernet TCP/IP packets.]
Application Performance Barriers
In Today’s Data Center
Non-overlapped socket send() usually means data is copied before it is transmitted on the wire
On receive, transaction control info and data payloads are usually multiplexed on a single byte stream
To avoid an additional buffer copy on receive, the application often does not pre-post receive buffers
Application Performance Solutions
in Tomorrow’s Data Center
Windows already provides overlapped I/O to solve the copy-on-transmit problem
Eliminating copy-on-receive requires the application to be RDMA-aware for typical transaction protocols
Legacy Sockets App Performance Barrier
Non-Overlapped Socket send()
[Slide timeline: the app performs socket send #1 and blocks; the local OS builds TCP/IP data packets and the NIC transmits them; the remote OS receives the data packets and builds ACK packets; only when the local OS receives the ACKs is the app unblocked and able to perform socket send #2.]
OSes typically eliminate application
blocking by copying application
socket send data into kernel buffers
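
For illustration only (not from the original slides), the blocking pattern looks roughly like this in plain C sockets, where sock is assumed to be a connected socket: send() returns only once the kernel has accepted the data, typically by copying it into a kernel socket buffer, and it blocks the caller whenever that buffer is full.

/* Sketch of a legacy blocking send loop (illustrative only; error
 * handling trimmed). The application cannot reuse buf until each
 * send() call has returned. */
#include <sys/types.h>
#include <sys/socket.h>

int send_all(int sock, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = send(sock, buf, len, 0);   /* may block the caller */
        if (n < 0)
            return -1;
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}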
Enhanced Sockets App Performance Fix
Winsock2 Overlapped Socket send()
[Slide timeline: the app issues Winsock2 overlapped socket send #1 and immediately issues overlapped send #2 without blocking; the local OS builds TCP/IP data packets for both and the NIC transmits them; the remote OS receives the data packets and builds ACK packets; as the ACKs arrive, the local OS notifies the app of each send completion.]
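
A minimal sketch (assumed code, not from the slides) of the Winsock2 overlapped pattern: WSASend() paired with a WSAOVERLAPPED structure returns right away, typically with WSA_IO_PENDING, while the transfer continues in the background, and WSAGetOverlappedResult() later reports completion. Here sock and data are placeholders for a connected socket and the application buffer, which must stay untouched until the completion is signaled.

/* Sketch of a Winsock2 overlapped send (illustrative; error handling trimmed). */
#include <winsock2.h>

int overlapped_send(SOCKET sock, char *data, ULONG len)
{
    WSABUF buf = { len, data };
    WSAOVERLAPPED ov = { 0 };
    DWORD sent = 0, flags = 0;

    ov.hEvent = WSACreateEvent();              /* signaled when the send completes */

    /* Returns 0 on immediate completion, or SOCKET_ERROR with
     * WSA_IO_PENDING when the send proceeds in the background. */
    if (WSASend(sock, &buf, 1, &sent, 0, &ov, NULL) == SOCKET_ERROR &&
        WSAGetLastError() != WSA_IO_PENDING) {
        WSACloseEvent(ov.hEvent);
        return -1;
    }

    /* The app is free to do other work here; it only waits (or polls)
     * when it actually needs the result. */
    WSAGetOverlappedResult(sock, &ov, &sent, TRUE /* wait */, &flags);
    WSACloseEvent(ov.hEvent);
    return (int)sent;
}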
Legacy Sockets App Performance Barrier
No Pre-Posted Socket recv()
// Pseudocode showing legacy sockets app receive algorithm
while (1) {
    post socket recv() to obtain transaction control message;
    identify pre-allocated app buffer pertaining to received control message;
    post socket recv() to move transaction data payload into identified buffer;
}
[Slide diagram: application buffers in host memory and a single byte stream carrying interleaved control messages (Ctrl Msg #1 through #4) and data payloads. Transaction protocols such as iSCSI multiplex control info and data payloads on a single byte stream, so software must parse each control message before it can steer the following payload into the right buffer.]
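
A concrete sketch of the pseudocode above in plain C sockets (illustrative only; struct ctrl_msg, its fields, and app_buffer_for() are hypothetical stand-ins for a transaction protocol's control header and the application's buffer lookup):

/* Legacy two-step receive: the payload cannot be placed until software
 * has read and decoded the preceding control message. */
#include <stdint.h>
#include <sys/types.h>
#include <sys/socket.h>

struct ctrl_msg {            /* assumed control header carried in the byte stream */
    uint32_t buffer_tag;     /* identifies the pre-allocated application buffer   */
    uint32_t payload_len;    /* length of the data payload that follows           */
};

extern char *app_buffer_for(uint32_t tag);   /* hypothetical app-side lookup */

void legacy_receive_loop(int sock)
{
    struct ctrl_msg ctrl;
    for (;;) {
        /* 1) read and parse the control message in software ...          */
        recv(sock, &ctrl, sizeof(ctrl), MSG_WAITALL);
        /* 2) ... decide which application buffer the data belongs in ... */
        char *buf = app_buffer_for(ctrl.buffer_tag);
        /* 3) ... and only then pull the payload into that buffer.        */
        recv(sock, buf, ctrl.payload_len, MSG_WAITALL);
    }
}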
RDMA Aware Sockets App Performance Fix
Use Direct Data Placement (DDP)
Intelligent NIC uses iWARP headers embedded in the packets to
directly place data payloads in pre-allocated app buffers
Eliminates software latency loop from legacy sockets apps
[Slide diagram: the iWARP receive queue holds pre-posted buffers that absorb the control messages, while the intelligent NIC uses the iWARP headers to place each data payload directly into its pre-allocated application buffer in host memory, with no intermediate software copy.]
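
As a rough model of what direct data placement buys (an assumption-laden, host-side sketch, not NetEffect's hardware implementation): the application advertises buffers under steering tags, and each tagged iWARP/DDP segment carries the tag, offset, and length that let the receiver drop the payload straight into its final location; untagged segments such as control messages would instead consume the pre-posted receive-queue buffers shown on the slide.

/* Host-side model of DDP tagged placement (illustrative; on an iWARP
 * adapter this placement happens in hardware as packets arrive). */
#include <stdint.h>
#include <string.h>

#define MAX_STAGS 16

struct registered_buffer {          /* an app buffer advertised under a steering tag */
    uint8_t *base;
    size_t   length;
};

static struct registered_buffer stag_table[MAX_STAGS];

/* The application "registers" a buffer under a steering tag (an index here). */
static void register_buffer(uint32_t stag, uint8_t *base, size_t length)
{
    stag_table[stag % MAX_STAGS] = (struct registered_buffer){ base, length };
}

/* What a tagged segment conveys: which buffer, where in it, and how much. */
struct tagged_segment {
    uint32_t stag;       /* steering tag of the destination buffer */
    uint64_t offset;     /* tagged offset within that buffer       */
    uint32_t length;     /* payload bytes in this segment          */
    const uint8_t *payload;
};

/* Direct placement: the payload goes straight to its final location,
 * with no intermediate copy into an OS or driver buffer. */
static int place_tagged(const struct tagged_segment *seg)
{
    struct registered_buffer *rb = &stag_table[seg->stag % MAX_STAGS];
    if (seg->offset + seg->length > rb->length)
        return -1;                               /* refuse out-of-bounds placement */
    memcpy(rb->base + seg->offset, seg->payload, seg->length);
    return 0;
}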
Networking Performance Continuum
Application Characteristics / Networking Offloads / Availability
Legacy Sockets App: Layer 2 traditional NIC only (now)
Legacy or Enhanced Sockets App: RDMA-enabled NIC supporting WSD (now)
Enhanced or RDMA-aware Sockets App: RDMA-enabled NIC supporting RDMA Chimney (future Windows Server release)
Ethernet Adapters Are
Evolving To Require...
Networking offloads defined by RDMAC and IETF iWARP extensions to
TCP/IP
Transport (TCP) offload
RDMA / DDP
User-Level Direct Access/OS Bypass
Ability to eliminate both networking and
application performance barriers
Simultaneous support for traditional sockets
and RDMA-aware applications
Industry standard h/w and s/w interfaces
Performance
> 1 million messages per second
< 10% CPU utilization
< 10us end-to-end application latency
Scalability
100k’s of simultaneous connections
Architecture that scales to multiple 10 Gb Ports
NE010 10 Gb iWARP
Ethernet Channel Adapter
iWARP Demonstration
Enhanced sockets application running on iWARP hardware through Winsock Direct
RDMA-enabled application running on iWARP hardware through iWARP Verbs, emulating an RDMA-aware sockets application
[Slide diagram: demonstration setup with servers connected over Ethernet through NE010 iWARP Ethernet adapters.]
Network Application Performance
Unidirectional B/W vs. Message Size
[Chart: unidirectional bandwidth in Gb/s (0 to 8) versus message size in KB (1 to 1000) for NetEffect WSD Overlapped I/O, NetEffect WSD Non-Overlapped I/O, NetEffect RDMA-aware App, Host Stack Overlapped I/O, and Host Stack Non-Overlapped I/O, with the PCI-X bus bandwidth limit indicated.]
Network Application
CPU Utilization
GBits per CPU GHz versus Message Size
[Chart: Gbits per CPU GHz (0 to 90) versus message size in KB (1 to 1000) for Host Stack Overlapped I/O, NetEffect WSD Overlapped I/O, and NetEffect RDMA-aware App. Annotation: conventional wisdom is that a traditional NIC with the host stack is capable of about 1 Gb per x86 CPU GHz.]
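
The slides do not define the chart's metric precisely; one plausible formulation (an assumption, not taken from the deck) is throughput divided by the CPU gigahertz actually consumed, which also reproduces the "1 Gb per x86 CPU GHz" conventional-wisdom point quoted above.

/* Plausible (assumed) formulation of the "Gbits per CPU GHz" metric. */
#include <stdio.h>

static double gbits_per_cpu_ghz(double throughput_gbps,  /* measured wire throughput  */
                                double cpu_utilization,  /* 0.0..1.0 across all cores */
                                int    num_cores,
                                double clock_ghz)
{
    double ghz_consumed = cpu_utilization * num_cores * clock_ghz;
    return throughput_gbps / ghz_consumed;
}

int main(void)
{
    /* Conventional-wisdom example: a host stack saturating one 3 GHz core
     * while moving about 3 Gb/s works out to roughly 1 Gb per CPU GHz. */
    printf("%.1f Gb per CPU GHz\n", gbits_per_cpu_ghz(3.0, 1.0, 1, 3.0));
    return 0;
}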
Takeaways
iWARP Ethernet Channel Adapters
Eliminate networking barriers
Support Microsoft’s advanced APIs enabling
application evolution for performance
NetEffect iWARP Ethernet
Channel Adapters
Industry-leading 10 Gb Ethernet throughput, CPU utilization, and latency
Available now
Call To Action
Deploy Winsock Direct with iWARP RDMA
to boost performance of
existing applications
Plan for convergence of networking, storage, and clustering enabled by 10 Gb iWARP Ethernet Channel Adapters
Develop RDMA-aware applications for
optimal performance
Additional Resources
Web Resources
NetEffect: www.neteffect.com
iWARP Consortium:
www.iol.unh.edu/consortiums/iwarp/
Specs
RDMA Consortium: www.rdmaconsortium.org
IETF RDDP WG: www.ietf.org/html.charters/rddp-charter.html
White Papers
Asynchronous Zero-copy Communication for
Synchronous Sockets
nowlab.cse.ohio-state.edu/publications/confpapers/2006/balaji-cac06.pdf
Contact info
bh2006@neteffect.com