High-Performance Networking With NDIS 6.0, TCP Chimney Offload, and RSS
Vik Desai
Program Manager
Windows Networking
Microsoft Corporation
Appropriate Audience
Who should attend this session?
Networking product builders
Product decision makers
Hardware and software engineers
Architects
Network designers and deployers
IT Managers
IT Consultants
Venture Capitalists and Private Investors
Industry analysts
Agenda
Networking stack challenges
Scalable networking goals
Scalable networking architecture
Receive Side Scaling (RSS)
TCP Chimney Offload
Scalable networking demo
NetXen Demo – Vikram Karvat
Broadcom Demo – Uri Elzur
Offload roadmap
Summary and Call to Action
Networking Challenges
Receive processing limited to a single CPU on a multi-processor system
CPU utilization in protocol processing increases with physical-layer speeds
Data movement between network and application buffers is a bottleneck
Large number of interrupts, even with interrupt moderation
Scalable Networking Goals
Boost application scalability on 1 Gb and 10 Gb Ethernet with an integrated architecture
That preserves standard infrastructure (1500-byte MTU)
That maintains standard network and server management practices
That does not compromise security, server reliability, or application compatibility
Enable Ethernet fabric convergence
Robustly support a new class of protocol-offload NICs in Microsoft Windows
Receive Side Scaling
Networking Challenge
Receive processing limited to a single CPU on a multi-processor system
Solution
Parallelize receive processing by queuing incoming packets to multiple CPUs
Implementing the Solution via RSS
The NIC manages multiple hardware queues
The NIC hashes incoming TCP segments to different hardware queues
The NIC driver requests DPCs on the appropriate CPUs
RSS Description – Non-RSS-Capable NIC
[Diagram: a regular NIC with a single receive FIFO; interrupt logic raises an ISR on Processor 0, and the DPC, TCP/IP, NDIS, and application processing for every incoming packet all run on that one CPU.]
RSS Description – RSS-Capable NIC
[Diagram: an RSS-capable NIC applies a Toeplitz hash to each incoming packet and queues it to one of several receive FIFOs; interrupt logic schedules DPCs on Processors 0, 1, and 2, so TCP/IP, NDIS, and application processing proceed in parallel across CPUs.]
TCP Chimney Offload
Networking Challenges
Data movement between network and application buffers is a bottleneck
Large number of interrupts, even with interrupt moderation
CPU utilization in protocol processing increases with physical-layer speeds
Solution
Provide a zero-copy path for pre-posted buffers
Change interrupts from a per-packet basis to a per-segment basis
Offload protocol processing to hardware
TCP Chimney Architecture
[Diagram: applications sit above a switch that feeds both the host stack and the TCP Chimney interfaces; the host path runs through the transport layer (TCP), the path layer (IPv4 or IPv6), and the framing layer (Ethernet) over NDIS 5.2/6.0, exchanging state updates with the chimney, while data transfer flows directly through the NDIS miniport driver to TCP Chimney Offload-capable hardware.]
TCP Chimney Interface Details
TCP/IP state is divided into
Const state – does not change for the connection's lifetime
Cached state – controlled by the host stack and propagated to the offload target as it changes
Delegated state – controlled by the offload target
NDIS supports
Offload capability advertisement
An interface to transfer and update state information
An interface to query statistics
An interface to transfer data
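The three-way split can be pictured with a hypothetical C sketch; none of these struct or field names are the actual NDIS 6.0 definitions, they only illustrate which side owns each piece of state:

```c
#include <stdint.h>

/* Const state: fixed for the connection's lifetime; handed to the
 * offload target once, at offload time. */
struct tcp_const_state {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
};

/* Cached state: owned by the host stack; pushed down to the offload
 * target again whenever the host changes it (e.g. an MSS update). */
struct tcp_cached_state {
    uint32_t mss;
    uint32_t ttl;
};

/* Delegated state: owned by the offload target while the connection is
 * offloaded; read back by the host on upload or termination. */
struct tcp_delegated_state {
    uint32_t snd_nxt;   /* next sequence number to send */
    uint32_t rcv_nxt;   /* next sequence number expected */
    uint32_t cwnd;      /* congestion window */
};

/* Cached-state updates flow one way: host stack -> offload target. */
static void push_cached_update(struct tcp_cached_state *target_copy,
                               const struct tcp_cached_state *host_copy)
{
    *target_copy = *host_copy;
}
```

The ownership rules are what make upload and offload cheap: only the delegated state needs to move back to the host when a connection is uploaded.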
TCP Chimney Initialization
The Offload Manager determines a connection's suitability for offload
State from each layer is captured and transferred to the offload target
Incoming data packets and outgoing sends are queued meanwhile
Queued packets are replayed to the offload target when the offload attempt succeeds
Queued packets are processed by the host stack when the offload attempt fails
Data transfer begins
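The queue-and-replay rule above can be sketched as a small state machine (hypothetical names; the real offload handshake carries much more state):

```c
#include <stdbool.h>

/* While the per-layer state transfer to the offload target is in flight,
 * traffic is queued; the hardware's answer decides who replays the queue. */
enum conn_state { HOST_PROCESSING, OFFLOAD_PENDING, OFFLOADED };

struct conn {
    enum conn_state state;
    int queued_pkts;       /* packets parked during the handshake */
    int replayed_to_hw;    /* replayed to the offload target */
    int replayed_to_host;  /* handed back to the host stack */
};

/* The Offload Manager chose this connection: capture per-layer state
 * (not modeled here) and start queuing traffic. */
static void begin_offload(struct conn *c)
{
    c->state = OFFLOAD_PENDING;
}

static void rx_packet(struct conn *c)
{
    if (c->state == OFFLOAD_PENDING)
        c->queued_pkts++;   /* park until the hardware answers */
    /* otherwise the packet is processed immediately (host or hardware) */
}

static void complete_offload(struct conn *c, bool accepted)
{
    if (accepted) {
        c->state = OFFLOADED;
        c->replayed_to_hw += c->queued_pkts;    /* replay to the target */
    } else {
        c->state = HOST_PROCESSING;
        c->replayed_to_host += c->queued_pkts;  /* stack takes over */
    }
    c->queued_pkts = 0;     /* data transfer begins */
}
```

Queuing during the handshake is what lets the host attempt an offload without dropping or reordering in-flight traffic.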
TCP Chimney Data Transfer
Sends
Segments are passed to the offload target for completion
Sends complete after the end-to-end TCP ACK
Receives
If no receive buffers are posted, the data is indicated to the host
If receive buffers are posted, indication occurs as appropriate
OOB/urgent data is passed to the host stack
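The send-completion rule (a send completes only when the peer's cumulative ACK covers it, not when the segment reaches the wire) can be sketched as follows; this is a simplified model that ignores sequence-number wraparound:

```c
#include <stdint.h>

/* A send handed to the offload target completes only once the peer's
 * cumulative ACK covers its last byte.
 * (Sequence-number wraparound is ignored for clarity.) */
struct pending_send {
    uint32_t end_seq;   /* sequence number just past the send's last byte */
};

/* Count how many outstanding sends the incoming ACK completes. */
static int sends_completed(const struct pending_send *sends, int n,
                           uint32_t ack_seq)
{
    int done = 0;
    for (int i = 0; i < n; i++)
        if (sends[i].end_seq <= ack_seq)   /* fully acknowledged */
            done++;
    return done;
}
```

Completing on the end-to-end ACK is what turns interrupts from a per-packet into a per-send event: one completion can cover many wire segments.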
TCP Chimney Connection Teardown
Connections can be uploaded or offloaded at any time
The Heuristics Manager tracks connections appropriate for upload or offload
Half-closed connections are not uploaded
An upload request can be initiated by the offload target
The offload target provides its delegated state to the host stack
The offload target keeps connection state until the host sends the upload call
TCP Chimney Implications
IPsec Chimney is required for IPsec traffic
Will not work with
Intermediate (IM) drivers that do not understand the Chimney interfaces
Hooking firewalls
Best benefits for
Long-lived connections
Pre-posted receive buffers
Large application I/O sizes
10GbE Chimney Offload
Vikram Karvat
VP Marketing
[email protected]
Faisal Latif
Principal Software Engineer
[email protected]
NetXen
Next generation Ethernet silicon provider
focused on server OEMs
Chips, Boards, S/W
Founded February 2002
Top tier investors
Accel, Benchmark, Integral Capital
Expertise in semiconductor, software, systems
and servers
Intelligent NIC™ product line
Launched March 27, 2006
REAL products, REAL customers
Intelligent NIC Architecture
[Diagram: a single-chip design with dual 10GbE and quad GbE interfaces, a flow classifier, a protocol processing engine, CAM, queue manager (QM), L2 caches, DDR and QDR memory, and an 8X PCI-E host interface, all connected by a core interconnect fabric.]
Protocol Features
TCP/IP
RDMA
iSCSI
Virtualization
Security
Native 8X PCI Express (1X/4X/8X operation)
NetXen 10GbE Chimney
[Demo setup: two servers with 3.4 GHz Xeon CPUs running Windows Server 2003 SP1 with SNP, connected through a 10GbE switch, one transmitting (Tx) and one receiving (Rx).]
10GbE Chimney Results
[Chart: throughput (Mb/s, 0–10,000) and CPU utilization (%, 0–70) compared for the plain NIC at a 1500-byte MTU, the plain NIC with jumbo frames, and Chimney at a 1500-byte MTU; Chimney delivers about 60% more throughput with roughly 800% better processor efficiency. Configuration: DP Xeon, 3.4 GHz, HT off, 2 GB.]
Demo Conclusion
10GbE is happening NOW
Chimney enables
Scalability with balanced system design
Increased datacenter power efficiency
The Agile Datacenter requires
Adaptability, Scalability, Intelligence
Broadcom
Uri Elzur
Director, Advanced Technology
Broadcom
Gururaj Ananthateerta
Senior Staff Engineer
Broadcom
Scalable TCP Chimney enables Convergence over Ethernet
Scalable TCP Chimney is the basis for convergence over Ethernet
TCP-based: sockets applications, iSCSI, iSCSI boot, iWARP (RDMA)
Microsoft's SNP enables convergence over Ethernet
Sockets applications require a secure (network-based security), robust, and standards-compliant Windows Socket Switch implementation
Ethernet requires Layer 2 functionality – VLAN, WoL, power management – and integrated management
[Diagram: the C-NIC consolidates three roles – a plain NIC (NDIS miniport below TCP/IP and NDIS), a storage HBA (iSCSI miniport below the iSCSI port driver iscsiprt.sys, class driver, partition, and file system), and an RNIC (kernel-mode RDMA driver below a user-mode RDMA provider for applications) – with an NDIS IM driver tying the kernel-mode stacks together.]
Broadcom’s C-NIC 2.5G/S
NTTTCP over 2.5 Gb/s TCP Chimney
[Demo setup: two HP DL380 G4 servers (3.4 GHz Intel Xeon CPU, 1 GB RAM, Windows Server 2003 SP1-SNP build 2670), each with two BCM5708S C-NICs running Broadcom miniport driver v2.6.14, linked by fiber through a Broadcom 2.5G switch (BCM56580 StrataXGS III); NTTTCP streams S1 and S2 run TX/RX in both directions while Perfmon records CPU utilization.]
TCP Chimney scales…
[Charts: two Broadcom C-NICs, TOE versus L2 at 2.5G – more throughput (BW in Gb/s, higher is better) and lower CPU utilization (%, lower is better) for TOE.]
2.5G/S offers more bandwidth than non-TOE, at 1/6 of the CPU utilization
Microsoft’s SNP combined with the BCM5708 provides 7.5 times better P/E
Performance Efficiency (P/E) is network throughput divided by CPU utilization
At Gigabit and beyond, TCP Chimney is critical to free up cycles for the applications
Demo: NTTTCP
RSS Improves SMP Scalability
[Chart, Demo: WebBench 5.0 – requests/sec (0–60,000) versus number of connections, with RSS enabled and RSS disabled.]
With RSS, web traffic is more evenly distributed across multiple CPUs
WebBench delivers up to 50% more requests/sec
Demo Conclusion
Broadcom’s C-NIC with Microsoft’s TCP
Chimney is here TODAY
TCP Chimney scales to accommodate the
needs of the server and applications
TCP Chimney is the basis for the future of
Networking in Windows
The architecture allows for IPsec-based security
RSS provides better load spreading on SMP servers
Scalable Networking Pack Partners
Future Chimney Offloads
IPsec Chimney
RDMA Chimney
SSL Chimney
Call To Action
Develop low-cost TCP Chimney Offload and RSS hardware for Windows Vista and Windows Server codenamed “Longhorn”
Deploy TCP Chimney Offload and RSS hardware in enterprise and personal computing environments
Additional Resources
Web Resources
Documentation, White Papers, and software bits available
today for TCP Chimney Offload and RSS:
http://support.microsoft.com/?kbid=912222
Specs: DDK and documentation will be available on:
www.microsoft.com/whdc
White Paper: http://www.microsoft.com/whdc/device/network/scale.mspx
Other Resources:
www.microsoft.com/snp
http://www.microsoft.com/whdc/device/network/netintro.mspx
Related Sessions
Net088 – Technical Overview of Microsoft’s NetDMA Architecture
Please send e-mail to ndis6fb@microsoft.com with questions
© 2006 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.
The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market
conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation.
MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.