VIA and Its Extension To
TCP/IP Network
Yingping Lu
Based on Paper “Queue Pair IP, …”
by Philip Buonadonna
Outline
Motivation
VIA Overview
QP/IP Architecture
QP/IP Performance
Summary
Motivation
High performance computing, clustering applications
require high-throughput, low-latency communications
facility
Traditional TCP/IP is not designed for high-throughput, low-latency communications
Application software has not kept pace with the
increase of I/O speed
Memory copy
Checksum Computation
Interrupt
Context Switching
Typical Communication Data
Path
Bandwidth Comparison
[Figure: throughput (MB/s) versus message length (bytes) for TCP/IP and VIA]
VIA Solution
VIA is an industry standard developed by
Microsoft, Compaq, and Intel.
Key features of VIA:
Reduce memory copy (Zero-copy)
Direct user level access to NIC hardware
Eliminate OS kernel from critical path
Collapse ISO/OSI model
Offload CPU processing to intelligent NIC
VIA Architecture
VIA Components
Consumer
The end entity that uses VIA functions to communicate; can be
a user-level or kernel-level process
Use VIPL for programming
VI User Agent
Implements OS bypassing agent
Kernel Agent
Device driver; handles security and OS-related issues
VIA-capable NIC (Channel Adapter)
Implements VIA communications
Programming Abstraction
Queue Pairs
Components
Send queue
Receive queue
Completion queue (status)
Data Movement Operations
Send/Receive
RDMA Read
RDMA Write
Virtual Interface (Queue Pair)
Memory Access
Memory Registration
Memory must be registered before use
The system pins the memory region
The NIC uses DMA to transfer data from memory to the NIC
Memory Protection
Registered memory is associated with a VI
consumer and is valid only for that consumer
Gather/Scatter list
Gather list: a list of registered source data buffers
(read)
Scatter List: a list of registered destination data
buffers (write)
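The registration, protection, and gather/scatter ideas above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `MemoryRegistry`, `register`, `lookup`, `gather`, and `scatter` are hypothetical names, not VIPL calls, and page pinning and DMA are replaced by plain byte arrays.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryRegistry:
    """Toy stand-in for registration state (hypothetical, not VIPL)."""
    regions: dict = field(default_factory=dict)  # handle -> (owner, buffer)
    next_handle: int = 1

    def register(self, owner: str, buf: bytearray) -> int:
        # A real system would pin the pages here; we only record ownership.
        handle = self.next_handle
        self.next_handle += 1
        self.regions[handle] = (owner, buf)
        return handle

    def lookup(self, owner: str, handle: int) -> bytearray:
        # Memory protection: a handle is valid only for its registering consumer.
        region = self.regions.get(handle)
        if region is None or region[0] != owner:
            raise PermissionError("buffer not registered by this consumer")
        return region[1]


def gather(reg: MemoryRegistry, owner: str, handles: list) -> bytes:
    """Read the registered source buffers in order and concatenate them."""
    return b"".join(bytes(reg.lookup(owner, h)) for h in handles)


def scatter(reg: MemoryRegistry, owner: str, handles: list, data: bytes) -> int:
    """Write data across the registered destination buffers in order."""
    offset = 0
    for h in handles:
        buf = reg.lookup(owner, h)
        chunk = data[offset:offset + len(buf)]
        buf[:len(chunk)] = chunk
        offset += len(chunk)
    return offset  # number of bytes actually placed
```

A sender would call `gather` over its source buffers, and a receiver would call `scatter` over its destination buffers; an unregistered or foreign handle is rejected before any data moves.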
Memory Model
[Figure: a registered memory region in virtual memory space, mapped to pages 0 through n-1 of physical memory]
Descriptor
A work queue element to be placed into
queue pair (send or receive queue)
Contains a control segment and a list of
address segments
Specifies operation command, memory
address, size
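One way to picture a descriptor is as a small record with a control segment and a list of address segments. The sketch below is an assumption-laden illustration: the field names (`opcode`, `status`, `mem_handle`, `offset`, `length`) and the helper `make_descriptor` are hypothetical and do not reproduce the actual VIA descriptor layout.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Operation names taken from the slides; the string encoding is an assumption.
VALID_OPCODES = {"SEND", "RECV", "RDMA_READ", "RDMA_WRITE"}


@dataclass
class AddressSegment:
    mem_handle: int  # handle from memory registration (assumed field)
    offset: int      # start offset inside the registered region
    length: int      # number of bytes covered by this segment


@dataclass
class ControlSegment:
    opcode: str
    status: str = "PENDING"  # updated when the operation completes


@dataclass
class Descriptor:
    control: ControlSegment
    segments: List[AddressSegment]

    def total_length(self) -> int:
        # Size of the whole transfer described by this work queue element.
        return sum(seg.length for seg in self.segments)


def make_descriptor(opcode: str, segments: List[Tuple[int, int, int]]) -> Descriptor:
    """Build a descriptor from (mem_handle, offset, length) tuples."""
    if opcode not in VALID_OPCODES:
        raise ValueError(f"unknown opcode: {opcode}")
    return Descriptor(ControlSegment(opcode), [AddressSegment(*s) for s in segments])
```

For example, `make_descriptor("SEND", [(1, 0, 64), (2, 0, 128)])` describes a 192-byte send gathered from two registered buffers.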
Door Bell
An asynchronous
mechanism to notify VI
NIC of a new work
queue post
Door Bell can be a
register in NIC accessed
by both CPU and NIC
[Figure: VIPL, descriptor, and VI NIC doorbell]
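The doorbell idea can be sketched as a counter that the host increments and the NIC drains. This is a simplified model, not the hardware mechanism: `Doorbell`, `WorkQueue`, `post`, and `nic_service` are hypothetical names, and a real doorbell would be a memory-mapped NIC register rather than a Python attribute.

```python
from collections import deque


class Doorbell:
    """Toy doorbell: a counter written by the host and read by the NIC."""

    def __init__(self):
        self.pending = 0

    def ring(self) -> None:
        self.pending += 1

    def drain(self) -> int:
        # Return the number of announced posts and reset the counter.
        count, self.pending = self.pending, 0
        return count


class WorkQueue:
    def __init__(self):
        self.entries = deque()
        self.doorbell = Doorbell()

    def post(self, descriptor) -> None:
        # Host side: enqueue first, then ring, so the NIC never sees a
        # doorbell for a descriptor that is not yet in the queue.
        self.entries.append(descriptor)
        self.doorbell.ring()

    def nic_service(self) -> list:
        # NIC side: process exactly as many entries as the doorbell announced.
        return [self.entries.popleft() for _ in range(self.doorbell.drain())]
```

The host never waits for the NIC in `post`, which is what makes the notification asynchronous.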
Operation Example – Send/Receive
Sender
Consumer:
Register send buffer
Post a send work queue element
Channel Adapter:
Send out the data and header; data are retrieved directly from consumer memory
Receiver
Consumer:
Register receive buffer
Post a receive buffer in the receive queue
Channel Adapter:
Receive packets from sender
Find a receive queue element in the receive queue
Move data directly to the buffer specified in the receive queue element
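The send/receive steps above can be simulated in Python as follows. This is a sketch under assumptions: `QueuePair`, `post_send`, `post_recv`, and `adapter_transfer` are hypothetical names, both channel adapters are folded into one function, and the network is replaced by a direct copy.

```python
from collections import deque


class QueuePair:
    def __init__(self):
        self.send_q = deque()       # posted send buffers
        self.recv_q = deque()       # posted receive buffers
        self.completions = deque()  # (operation, byte count) status entries


def post_send(qp: QueuePair, buf: bytes) -> None:
    qp.send_q.append(buf)


def post_recv(qp: QueuePair, buf: bytearray) -> None:
    qp.recv_q.append(buf)


def adapter_transfer(sender: QueuePair, receiver: QueuePair) -> None:
    """Move one posted send into the receiver's next posted receive buffer."""
    if not sender.send_q:
        raise RuntimeError("no send posted")
    if not receiver.recv_q:
        # The receive must be posted before the data arrives.
        raise RuntimeError("no receive posted")
    if len(sender.send_q[0]) > len(receiver.recv_q[0]):
        raise ValueError("receive buffer too small")
    data = sender.send_q.popleft()
    buf = receiver.recv_q.popleft()
    buf[:len(data)] = data  # placed directly in the consumer's buffer
    sender.completions.append(("SEND", len(data)))
    receiver.completions.append(("RECV", len(data)))
```

Note that the data lands in the buffer the receiving consumer posted, with no intermediate copy, and that a send with no matching posted receive is an error in this model.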
Operation Example - RDMA Write
Initiator
Consumer:
Register sending buffer address
Get receiver’s address
Post an RDMA Write
Channel Adapter:
Send out data with header (the operation, receiving address); data are retrieved directly from sender buffer
Receiver
Consumer:
Register receiving buffer address
Send the address, R-key and length to initiator
Channel Adapter:
Receive data
Check the validity of address in RDMA header
Move data directly to the memory specified in the RDMA header
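The receiver-side check in an RDMA Write can be sketched as below. The names `RemoteRegion` and `rdma_write` are hypothetical, the R-key is modeled as a plain integer, and the header is reduced to the three values the receiver advertised (address, R-key, and an implied length); this is an illustration, not the actual wire format.

```python
from dataclasses import dataclass


@dataclass
class RemoteRegion:
    base: int          # advertised start address of the registered buffer
    r_key: int         # key the receiver handed to the initiator
    memory: bytearray  # the registered receive buffer itself


def rdma_write(region: RemoteRegion, r_key: int, addr: int, data: bytes) -> None:
    """Receiver channel adapter: validate the RDMA header, then place the data."""
    if r_key != region.r_key:
        raise PermissionError("R-key mismatch")
    offset = addr - region.base
    if offset < 0 or offset + len(data) > len(region.memory):
        raise ValueError("address range outside registered region")
    # The receiving consumer posts no descriptor; data goes straight to memory.
    region.memory[offset:offset + len(data)] = data
```

Unlike send/receive, the receiving consumer takes no per-message action here: once it has advertised the region, the initiator alone decides where inside it the data goes.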
Summary of VIA
Goal: low-latency, high-throughput by
offering direct access to NIC, Zero copy
Architecture components: consumer
(VIPL), UA, KA, VI-NIC
Main concepts: queue pairs, memory
pinning, gather/scatter, descriptor, doorbell
Operations: Send/Receive, RDMA Read,
RDMA Write
Why QP/IP
TCP/IP network is robust, ubiquitous
However, TCP/IP is not designed for high-performance, low-latency use
Queue Pair abstraction provides a way to
offload CPU processing, reduce the critical
data path, provide memory zero copy
The Integration of QP and IP may be able to
reduce the latency, improve the throughput
between end-end node applications
connected through TCP/IP network
Challenges to QP/IP
Provide a VIPL supporting QP/IP
Integration of connection setup
Handle message segmentation
Implement TCP/IP mechanism at NIC
Handle message boundary for TCP
Handle zero-copy in the event of packet
loss
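The message-boundary and segmentation challenges arise because TCP delivers a byte stream, while queue pairs deliver whole messages. One common way to recover boundaries is a length prefix; the sketch below uses that technique purely as an illustration and does not claim to be the mechanism QP/IP uses. The 4-byte header and the names `frame`, `segment`, and `Reassembler` are assumptions.

```python
import struct

HEADER = struct.Struct("!I")  # 4-byte big-endian length prefix (assumed framing)


def frame(message: bytes) -> bytes:
    """Prefix a message with its length so the receiver can find its end."""
    return HEADER.pack(len(message)) + message


def segment(stream: bytes, mss: int) -> list:
    """Cut the byte stream into segments of at most mss bytes."""
    return [stream[i:i + mss] for i in range(0, len(stream), mss)]


class Reassembler:
    def __init__(self):
        self.buffer = bytearray()

    def feed(self, seg: bytes) -> list:
        """Append one segment; return every message it completes."""
        self.buffer += seg
        messages = []
        while len(self.buffer) >= HEADER.size:
            (length,) = HEADER.unpack_from(self.buffer)
            end = HEADER.size + length
            if len(self.buffer) < end:
                break  # message still incomplete; wait for more segments
            messages.append(bytes(self.buffer[HEADER.size:end]))
            del self.buffer[:end]
        return messages
```

Segment boundaries and message boundaries are independent here: a segment may carry the tail of one message and the head of the next, and the reassembler still returns each message whole.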
QP/IP Architecture
QPIP Components
FSM:
Doorbell FSM
Sched/XMT FSM
RECV FSM
Mgmt FSM
Major Data Abstractions
QPs
CQs
TCP Control Block (TCB)
QP/IP State Machines
QPIP Prototype
Three components
Application Library
PostSend(), PostRecv(), Poll(), Wait()
Kernel driver
Initialization
Address mapping mechanism
Interrupt service
Network interface firmware
Implements TCP, UDP, and IPv6 protocols
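The slides list Poll() and Wait() without describing their semantics. A plausible reading, assumed here, is that Poll() checks the completion queue without blocking and Wait() blocks until a completion arrives. The sketch below models that assumption with Python's standard `queue` module; `CompletionQueue` and `complete` are hypothetical names, not the prototype's API.

```python
import queue


class CompletionQueue:
    def __init__(self):
        self._q = queue.Queue()

    def complete(self, entry) -> None:
        # Called by the (simulated) network interface when work finishes.
        self._q.put(entry)

    def poll(self):
        # Non-blocking check: return a completion, or None if there is none.
        try:
            return self._q.get_nowait()
        except queue.Empty:
            return None

    def wait(self, timeout=None):
        # Blocking check: sleep until a completion arrives.
        # Returns None if the optional timeout expires first.
        try:
            return self._q.get(timeout=timeout)
        except queue.Empty:
            return None
```

Under this reading, polling trades CPU cycles for latency, while waiting frees the CPU at the cost of a wake-up, typically driven by an interrupt.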
Application-Application RTT
Application Throughput & CPU
Utilization
Network Interface Processing
Cost
QPIP Based on NBD
NBD Client Throughput and
CPU Effectiveness
Summary
Integrate the QP concept from VIA with the
ubiquitous TCP/IP network
Provide low-latency, high throughput for SAN
QP/IP contains doorbell FSM, Sched/XMT FSM, RECV
FSM, Mgmt FSM. It also contains QPs, CQs, TCB data
structure.
Demonstrate comparable performance, much lower
CPU utilization with modest hardware.
The programmability also adds flexibility to adapt
to the evolution of TCP/IP and scheduling
requirements.
Issues
How to integrate TOE in the mechanism?
How to effectively handle message boundaries in
TCP to support upper-level applications, e.g.
iSCSI? How to handle segmentation?
How to support zero-copy in the case of packet
loss?
How to extend this into a WAN environment
(more unpredictability, fluctuation of latency,
available bandwidth, congestion, LFN)?
How to effectively support OSD communication?
Questions?