Transcript lesson21

Checksum ‘offloading’
A look at how the Pro1000 NICs
can be programmed to compute
and insert TCP/IP checksums
Network efficiency
• Last time (in our ‘nictcp.c’ demo) we saw
the amount of work a CPU would need to
do when setting up an ethernet packet for
transmission with TCP/IP protocol format
• In a busy network this amount of packet-computation becomes a ‘bottleneck’ that
degrades overall system performance
• But a lot of that work can be ‘offloaded’!
The ‘loops’ are costly
• To prepare for a packet-transmission, the
device-driver has to execute a few dozen
assignment-statements, to set up fields in
the packet’s ‘headers’ and in the Transmit
Descriptor that will be used by the NIC
• Most of these assignments involve simple
memory-to-memory copying of parameters
• But the ‘checksum’ fields require ‘loops’
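To make the cost of those ‘loops’ concrete, here is a minimal sketch of the usual one's-complement checksum loop over 16-bit words. The function name ‘inet_checksum’ is our own choice for illustration and is not taken from ‘nictcp.c’; the packet bytes are assumed to be in network (big-endian) order.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative helper (hypothetical name): returns the complemented
   one's-complement sum of 'nbytes' bytes, read as big-endian 16-bit words. */
uint16_t inet_checksum(const uint8_t *data, size_t nbytes)
{
    uint32_t sum = 0;
    size_t i;

    /* one iteration per 16-bit word: this is the costly loop */
    for (i = 0; i + 1 < nbytes; i += 2)
        sum += ((uint32_t)data[i] << 8) | data[i + 1];

    /* an odd trailing byte is padded with a zero low byte */
    if (nbytes & 1)
        sum += (uint32_t)data[nbytes - 1] << 8;

    /* fold the carries back into the low 16 bits */
    while (sum >> 16)
        sum = (sum & 0xFFFF) + (sum >> 16);

    return (uint16_t)~sum;
}
```

The number of iterations depends on the packet length, which is why the CPU cost grows with every byte transmitted.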
Can’t ‘unroll’ checksum-loops
• One programming technique for speeding
up loop-execution is known as ‘unrolling’,
to avoid the ‘test-and-branch’ inefficiency:
int sum = 0;
sum += wp[0];
sum += wp[1];
sum += wp[2];
…
sum += wp[99];
• But it requires knowing in advance what
number of loop-iterations will be needed
The ‘offload’ solution
• Modern network controllers can be built to
perform TCP/IP checksum calculations on
packet-data as it is being fetched from ram
• This relieves a CPU from having to do the
most intense portion of packet preparation
• But ‘checksum offloading’ is an optional
capability that has to be ‘enabled’ – and
‘programmed’ for a specific packet-layout
‘Context’ descriptors
• Intel’s Pro1000 network controllers employ
special ‘Context’ Transmit-Descriptors for
enabling and configuring the ‘checksum-offloading’ capability
• Two kinds of Context Descriptor are used:
– An ‘Offload’ Context Descriptor (Type 0)
– A ‘Data’ Context Descriptor (Type 1)
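Context descriptor (type 0)
First quadword (bytes 0-7):
 63      48 47    40 39    32 31      16 15     8 7      0
|  TUCSE   | TUCSO  | TUCSS  |  IPCSE   | IPCSO  | IPCSS  |
Second quadword (bytes 8-15):
 63      48 47    40 39 36 35 32 31    24 23    20 19      0
|   MSS    | HDRLEN | RSV  | STA | TUCMD  | DTYP=0 | PAYLEN |
DEXT=1 (Extended Descriptor)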
Legend:
IPCSS (IP CheckSum Start)
IPCSO (IP CheckSum Offset)
IPCSE (IP CheckSum Ending)
PAYLEN (Payload Length)
TUCMD (TCP/UDP Command)
HDRLEN (Header Length)
TUCSS (TCP/UDP CheckSum Start)
TUCSO (TCP/UDP CheckSum Offset)
TUCSE (TCP/UDP CheckSum Ending)
DTYP (Descriptor Type)
STA (TCP/UDP Status)
MSS (Maximum Segment Size)
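As a way of checking one's reading of the field positions, the Type 0 descriptor can be sketched as two 64-bit words built with shifts. The helper names ‘ctx0_low’ and ‘ctx0_high’ are hypothetical, and the bit positions in the second word are inferred from the field widths used in this lesson's structure definition (20-bit PAYLEN, 4-bit DTYP, 8-bit TUCMD).

```c
#include <stdint.h>

/* First quadword: IPCSS[7:0] IPCSO[15:8] IPCSE[31:16]
                   TUCSS[39:32] TUCSO[47:40] TUCSE[63:48] */
uint64_t ctx0_low(uint8_t ipcss, uint8_t ipcso, uint16_t ipcse,
                  uint8_t tucss, uint8_t tucso, uint16_t tucse)
{
    return  (uint64_t)ipcss
         | ((uint64_t)ipcso << 8)
         | ((uint64_t)ipcse << 16)
         | ((uint64_t)tucss << 32)
         | ((uint64_t)tucso << 40)
         | ((uint64_t)tucse << 48);
}

/* Second quadword: PAYLEN[19:0] DTYP[23:20] TUCMD[31:24]
                    STA[35:32] HDRLEN[47:40] MSS[63:48]
   (bits 39:36 are reserved and left as zero) */
uint64_t ctx0_high(uint32_t paylen, uint8_t tucmd, uint8_t sta,
                   uint8_t hdrlen, uint16_t mss)
{
    const uint64_t dtyp = 0;    /* Type 0 descriptor */

    return  (uint64_t)(paylen & 0xFFFFF)
         | (dtyp << 20)
         | ((uint64_t)tucmd << 24)
         | ((uint64_t)(sta & 0xF) << 32)
         | ((uint64_t)hdrlen << 40)
         | ((uint64_t)mss << 48);
}
```

Packing by hand like this avoids any dependence on how a particular compiler lays out bit-fields, at the price of more verbose code.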
The TUCMD byte
   7      6       5         4        3      2     1     0
| IDE | SNAP | DEXT (=1) | reserved (=0) | RS | TSE | IP | TCP |
Legend:
IDE (Interrupt Delay Enable)
SNAP (Sub-Network Access Protocol)
DEXT (Descriptor Extension)
RS (Report Status)
TSE (TCP-Segmentation Enable)
IP (Internet Protocol)
TCP (Transmission Control Protocol)
Key: some bits are always valid; others are valid only when TSE=1
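The bit positions above can be given names so that a command byte reads more clearly than a string of shifts. The constant names below are our own (hypothetical), and the helper simply reproduces the combination this lesson uses, DEXT=1 and RS=1.

```c
#include <stdint.h>

/* Hypothetical names for the TUCMD bit positions shown above */
enum {
    TUCMD_TCP  = 1 << 0,
    TUCMD_IP   = 1 << 1,
    TUCMD_TSE  = 1 << 2,
    TUCMD_RS   = 1 << 3,
    /* bit 4 is reserved (=0) */
    TUCMD_DEXT = 1 << 5,
    TUCMD_SNAP = 1 << 6,
    TUCMD_IDE  = 1 << 7
};

/* The combination used in this lesson: extended descriptor, report status */
uint8_t tucmd_for_offload(void)
{
    return (uint8_t)(TUCMD_DEXT | TUCMD_RS);
}
```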
Context descriptor (type 1)
First quadword (bytes 0-7):
 63                                                      0
|                        ADDRESS                          |
Second quadword (bytes 8-15):
 63      48 47    40 39 36 35 32 31    24 23    20 19      0
|   VLAN   | POPTS  | RSV  | STA |  DCMD  | DTYP=1 | DTALEN |
DEXT=1 (Extended Descriptor)
Legend:
DTALEN (Data Length)
DTYP (Descriptor Type)
DCMD (Descriptor Command)
STA (Status)
RSV (Reserved)
POPTS (Packet Options)
VLAN (VLAN tag)
The DCMD byte
   7     6       5         4        3      2      1      0
| IDE | VLE | DEXT (=1) | reserved (=0) | RS | TSE | IFCS | EOP |
Legend:
IDE (Interrupt Delay Enable)
VLE (VLAN Enable)
DEXT (Descriptor Extension)
RS (Report Status)
TSE (TCP-Segmentation Enable)
IFCS (Insert Frame CheckSum)
EOP (End Of Packet)
Key: some bits are always valid; others are valid only when EOP=1
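When debugging, it can help to turn a DCMD value back into bit names. The following is only a sketch: ‘dcmd_describe’ is a hypothetical helper, and it assumes the caller supplies an output buffer of at least one byte.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical debugging helper: writes the names of the DCMD bits
   that are set into 'out', highest bit first, separated by spaces. */
void dcmd_describe(uint8_t dcmd, char *out, size_t outlen)
{
    static const char *names[8] = {
        "EOP", "IFCS", "TSE", "RS", "reserved", "DEXT", "VLE", "IDE"
    };
    int bit;

    out[0] = '\0';
    for (bit = 7; bit >= 0; bit--) {
        if (dcmd & (1u << bit)) {
            if (out[0] != '\0')
                strncat(out, " ", outlen - strlen(out) - 1);
            strncat(out, names[bit], outlen - strlen(out) - 1);
        }
    }
}
```

For the value (1<<0)|(1<<3)|(1<<5)|(1<<6), the helper produces the names VLE, DEXT, RS and EOP.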
Our usage example
• We’ve created a module named ‘offload.c’
which demonstrates the NIC’s checksum-offloading capability for TCP/IP packets
• It’s a modification of our earlier ‘nictcp.c’
character-mode device-driver module
• We have excerpted the main changes in a
class-handout – the full version is online
Data-type definitions
// Our type-definition for the ‘Type 0’ Context-Descriptor
typedef struct
{
    unsigned char   ipcss;
    unsigned char   ipcso;
    unsigned short  ipcse;
    unsigned char   tucss;
    unsigned char   tucso;
    unsigned short  tucse;
    unsigned int    paylen:20;
    unsigned int    dtyp:4;
    unsigned int    tucmd:8;
    unsigned char   status;
    unsigned char   hdrlen;
    unsigned short  mss;
} TX_CONTEXT_OFFLOAD;
Definitions (continued)
// Our type-definition for the ‘Type 1’ Context-Descriptor
typedef struct
{
    unsigned long long  base_addr;
    unsigned int        dtalen:20;
    unsigned int        dtyp:4;
    unsigned int        dcmd:8;
    unsigned char       status;
    unsigned char       pkt_opts;
    unsigned short      vlan_tag;
} TX_CONTEXT_DATA;

typedef union
{
    TX_CONTEXT_OFFLOAD  off;
    TX_CONTEXT_DATA     dat;
} TX_DESCRIPTOR;
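Because the NIC expects each descriptor to occupy exactly 16 bytes, a compile-time size check is a cheap safeguard. The sketch below restates the lesson's types and adds ‘_Static_assert’ lines of our own; bit-field layout is compiler-dependent, so the check is assumed to pass with GCC on x86 and may need packing attributes elsewhere.

```c
/* The lesson's descriptor types, restated so this check compiles on its own */
typedef struct {
    unsigned char   ipcss;
    unsigned char   ipcso;
    unsigned short  ipcse;
    unsigned char   tucss;
    unsigned char   tucso;
    unsigned short  tucse;
    unsigned int    paylen:20;
    unsigned int    dtyp:4;
    unsigned int    tucmd:8;
    unsigned char   status;
    unsigned char   hdrlen;
    unsigned short  mss;
} TX_CONTEXT_OFFLOAD;

typedef struct {
    unsigned long long  base_addr;
    unsigned int        dtalen:20;
    unsigned int        dtyp:4;
    unsigned int        dcmd:8;
    unsigned char       status;
    unsigned char       pkt_opts;
    unsigned short      vlan_tag;
} TX_CONTEXT_DATA;

typedef union {
    TX_CONTEXT_OFFLOAD  off;
    TX_CONTEXT_DATA     dat;
} TX_DESCRIPTOR;

/* Our addition: fail the build if any descriptor is not 16 bytes */
_Static_assert(sizeof(TX_CONTEXT_OFFLOAD) == 16, "Type 0 descriptor must be 16 bytes");
_Static_assert(sizeof(TX_CONTEXT_DATA) == 16, "Type 1 descriptor must be 16 bytes");
_Static_assert(sizeof(TX_DESCRIPTOR) == 16, "descriptor union must be 16 bytes");
```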
Our packets’ layout
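| Ethernet Header |  IP Header   |  TCP Header  |   Packet-Data   |
|   (14 bytes)    |  (20 bytes)  |  (20 bytes)  | (length varies) |
|                 | (no options) | (no options) |                 |
The IP Header begins 14 bytes into the packet
HDR CKSUM lies 10 bytes into the IP Header
TCP CKSUM lies 16 bytes into the TCP Header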
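The offsets that the Type 0 descriptor needs follow directly from this layout. The sketch below derives them with named constants; all of the constant names are our own (hypothetical), chosen only to show the arithmetic.

```c
/* Hypothetical constant names; the values come from the packet layout above */
enum {
    ETH_HDR_LEN     = 14,
    IP_HDR_LEN      = 20,   /* no options */
    TCP_HDR_LEN     = 20,   /* no options */
    IP_CKSUM_FIELD  = 10,   /* checksum position within the IP header  */
    TCP_CKSUM_FIELD = 16,   /* checksum position within the TCP header */

    OFF_IPCSS = ETH_HDR_LEN,                    /* 14 */
    OFF_IPCSO = OFF_IPCSS + IP_CKSUM_FIELD,     /* 24 */
    OFF_TUCSS = ETH_HDR_LEN + IP_HDR_LEN,       /* 34 */
    OFF_TUCSO = OFF_TUCSS + TCP_CKSUM_FIELD,    /* 50 */

    ALL_HDRS_LEN = ETH_HDR_LEN + IP_HDR_LEN + TCP_HDR_LEN   /* 54 */
};
```

If IP or TCP options were ever added, only the two header-length constants would need to change.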
How we use contexts
• Our ‘offload.c’ driver will send a ‘Type 0’
Context Descriptor within ‘module_init()’
txring[ 0 ].off.ipcss = 14;    // IP-header CheckSum Start
txring[ 0 ].off.ipcso = 24;    // IP-header CheckSum Offset
txring[ 0 ].off.ipcse = 34;    // IP-header CheckSum Ending
txring[ 0 ].off.tucss = 34;    // TCP/UDP-segment CheckSum Start
txring[ 0 ].off.tucso = 50;    // TCP/UDP-segment CheckSum Offset
txring[ 0 ].off.tucse = 0;     // TCP/UDP-segment CheckSum Ending
txring[ 0 ].off.dtyp = 0;      // Type 0 Context Descriptor
txring[ 0 ].off.tucmd = (1<<5)|(1<<3);    // DEXT=1, RS=1
iowrite32( 1, io + E1000_TDT );    // give ownership to NIC
Using contexts (continued)
• Our ‘offload.c’ driver will then use a Type 1
context descriptor every time its ‘write()’
function is called to transmit user-data
• The network controller ‘remembers’ the
checksum-offloading parameters that we
sent during module-initialization, and so it
continues to apply them to every outgoing
packet (we keep our same packet-layout)
Sequence of ‘write()’ steps
• Adjust the ‘len’ argument (if necessary)
• Copy ‘len’ bytes from the user’s ‘buf’ array
• Prepend the packet’s TCP Header
• Insert the pseudo-header’s checksum
• Prepend the packet’s IP Header
• Prepend the packet’s Ethernet Header
• Initialize the Data-Context Tx-Descriptor
• Give descriptor-ownership to the NIC
The TCP pseudo-header
• We do initialize the TCP Checksum field (but
this only needs a short computation)
| Zero | Protocol-ID (= 6) | TCP Segment-length |
| Source IP-address                             |
| Destination IP-address                        |
• The one’s complement sum of these six 16-bit words is
placed into ‘TCP Checksum’
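The short computation might look like the sketch below. The function name ‘tcp_pseudo_sum’ is hypothetical, the IP-addresses are assumed to be passed as plain 32-bit numbers (for example 0xC0A80001 for 192.168.0.1), and the result would still have to be stored into the header in big-endian byte order.

```c
#include <stdint.h>

/* Hypothetical helper: one's-complement sum (not complemented) of the
   six 16-bit words in the TCP pseudo-header. */
uint16_t tcp_pseudo_sum(uint32_t src_ip, uint32_t dst_ip, uint16_t tcp_len)
{
    uint32_t sum = 0;

    sum += 6;                   /* Zero byte followed by Protocol-ID (6 = TCP) */
    sum += tcp_len;             /* TCP Segment-length (header plus data)       */
    sum += src_ip >> 16;        /* Source IP-address, high word                */
    sum += src_ip & 0xFFFF;     /* Source IP-address, low word                 */
    sum += dst_ip >> 16;        /* Destination IP-address, high word           */
    sum += dst_ip & 0xFFFF;     /* Destination IP-address, low word            */

    /* fold the carries back into the low 16 bits */
    while (sum >> 16)
        sum = (sum & 0xFFFF) + (sum >> 16);

    return (uint16_t)sum;
}
```

Only six additions are involved, whatever the packet length, so this part stays cheap for the CPU while the NIC handles the per-byte work.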
Setting up the Type-1 Context
int txtail = ioread32( io + E1000_TDT );
txring[ txtail ].dat.base_addr = tx_desc + (txtail * TX_BUFSIZ);
txring[ txtail ].dat.dtalen = 54 + len;
txring[ txtail ].dat.dtyp = 1;
txring[ txtail ].dat.dcmd = 0;
txring[ txtail ].dat.status = 0;
txring[ txtail ].dat.pkt_opts = 3;       // IXSM=1, TXSM=1
txring[ txtail ].dat.vlan_tag = vlan_id;
txring[ txtail ].dat.dcmd |= (1<<0);     // EOP (End-Of-Packet)
txring[ txtail ].dat.dcmd |= (1<<3);     // RS (Report Status)
txring[ txtail ].dat.dcmd |= (1<<5);     // DEXT (Descriptor Extension)
txring[ txtail ].dat.dcmd |= (1<<6);     // VLE (VLAN Enable)
txtail = (1 + txtail) % N_TX_DESC;
iowrite32( txtail, io + E1000_TDT );
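Two small pieces of arithmetic in this code are worth trying in isolation: the 32-bit length/type/command word and the wrap-around of the tail index. In the sketch below the helper names are hypothetical and the ring size of 8 is an assumed value, since the real ‘N_TX_DESC’ is defined in ‘offload.c’.

```c
#include <stdint.h>

enum { N_TX_DESC_ASSUMED = 8 };     /* assumed ring size, for illustration only */

/* DTALEN in bits 19:0, DTYP=1 in bits 23:20, DCMD in bits 31:24 */
uint32_t data_desc_cmd_word(uint32_t len)
{
    uint32_t dtalen = (54 + len) & 0xFFFFF;     /* all headers plus user data */
    uint32_t dtyp   = 1;                        /* Type 1 descriptor          */
    uint32_t dcmd   = (1u << 0) | (1u << 3) | (1u << 5) | (1u << 6);

    return dtalen | (dtyp << 20) | (dcmd << 24);
}

/* The tail index advances by one and wraps at the end of the ring */
unsigned int next_tail(unsigned int txtail)
{
    return (1 + txtail) % N_TX_DESC_ASSUMED;
}
```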
In-class demonstration
• We can demonstrate checksum-offloading
by using our ‘dram.c’ device-driver to look
at the packet that is being transmitted from
one of our ‘anchor’ machines, and to look
at the packet that gets received by another
‘anchor’ machine
• The checksum-fields (at offsets 24 and 50)
do get modified by the network hardware!
In-class exercise
• The NIC can also deal with packets having
the UDP protocol-format – but you need to
employ different parameters in the Type 0
Context Descriptor and arrange a ‘header’
for the UDP segment that has a different
length and arrangement of parameters
• Also the UDP protocol-ID is 17 (=0x11)
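UDP Header
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-------------------------------+-------------------------------+
|          Source Port          |       Destination Port        |
+-------------------------------+-------------------------------+
|            Length             |           Checksum            |
+-------------------------------+-------------------------------+
|                           Data :::                            |
+---------------------------------------------------------------+
Traditional ‘Big-Endian’ representation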
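As a starting point for the exercise, the sketch below writes the four UDP header fields in big-endian order and works out where the UDP checksum would sit in a packet with our 14-byte Ethernet header and 20-byte IP header. All names are hypothetical, and the constants are one plausible reading of the layout above rather than a tested solution.

```c
#include <stdint.h>

/* Hypothetical constants for a UDP version of our packet layout */
enum {
    UDP_PROTOCOL_ID  = 17,              /* = 0x11 */
    UDP_HDR_LEN      = 8,
    UDP_CKSUM_FIELD  = 6,               /* checksum position within UDP header */
    UDP_TUCSS        = 14 + 20,         /* 34: start of the UDP segment        */
    UDP_TUCSO        = 14 + 20 + 6,     /* 40: checksum offset in the packet   */
    UDP_ALL_HDRS_LEN = 14 + 20 + 8      /* 42 */
};

/* Stores the 8-byte UDP header at 'p', most significant byte first */
void udp_header_write(uint8_t *p, uint16_t sport, uint16_t dport,
                      uint16_t length, uint16_t cksum)
{
    p[0] = (uint8_t)(sport >> 8);   p[1] = (uint8_t)(sport & 0xFF);
    p[2] = (uint8_t)(dport >> 8);   p[3] = (uint8_t)(dport & 0xFF);
    p[4] = (uint8_t)(length >> 8);  p[5] = (uint8_t)(length & 0xFF);
    p[6] = (uint8_t)(cksum >> 8);   p[7] = (uint8_t)(cksum & 0xFF);
}
```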