R&D on data transmission FPGA → PC
using UDP over 10-Gigabit Ethernet
Domenico Galli
Università di Bologna and INFN, Sezione di Bologna
XII SuperB Project Workshop, Annecy-les-Vieux, 18th March, 2010
Commodity Links
• Increasingly used in HEP for DAQ, Event Building and
High Level Trigger Systems:
– Limited costs;
– Maintainability;
– Upgradability.
• Demand for data throughput in HEP is increasing, driven by:
– Physical event rate;
– Number of electronic channels;
– Reduction of the on-line event filter (trigger) stages.
• Industry has moved on since the design of the DAQ for the
LHC experiments:
– 10 Gigabit Ethernet well established;
– 4x DDR Infiniband (16 Gb/s) ready;
– 100 Gigabit Ethernet is being actively worked on.
DOMENICO GALLI - R&D on data transmission
FPGA → PC using UDP over 10-Gigabit
Ethernet
Evaluation of New Commercial
Link Technologies
• Bologna group, in its spare time, is constantly
evaluating new commodity link technologies:
– With a view to their possible use in DAQ/EB/HLT.
• Evaluated parameters:
– Maximum throughput;
– Maximum datagram rate;
– CPU load;
– Datagram loss rate.
• Recently tested links:
– Gigabit Ethernet (presented at IEEE RT-05);
– 10-Gigabit Ethernet (presented at IEEE RT-09);
– Infiniband (2010).
• Choice of technology for the experiment must be
delayed as much as possible.
10-GbE Point-to-Point Tests
• We start technology evaluation from PC-to-PC tests.
– NICs mounted on the PCI-E bus of commodity PCs acting as
transmitters and receivers.
• In real operating conditions, the maximum transfer rate is
limited not only by the capacity of the link itself, but
also:
– by the capacity of the data busses (PCI and FSB/QPI);
– by the ability of the CPUs and of the OS to handle packet
processing and interrupt rates raised by the network interface
cards in due time.
10GBase-SR
10-GbE Network I/O
• “Fast network, slow host” scenario:
– Already seen in the transition to 1 Gigabit Ethernet.
• 3 major system bottlenecks may limit the
efficiency of high-performance I/O adapters:
– The peripheral bus bandwidth:
• PCI-X (peak throughput 8.5 Gbit/s in 133 MHz flavor)
replaced by PCI-E (20 Gbit/s peak throughput in x8
flavor).
– The memory bandwidth:
• FSB clock raised from 533 MHz to 1600 MHz, then
replaced by AMD HyperTransport and Intel
QuickPath Interconnect.
– The CPU utilization:
• Multi-core architectures.
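The peak figures quoted for the peripheral bus can be reproduced with simple arithmetic. The sketch below is only illustrative: the function names are hypothetical, and it assumes a 64-bit PCI-X bus and first-generation PCI-E lanes (2.5 GT/s per lane, 8b/10b line encoding). Under these assumptions the 20 Gbit/s quoted for PCI-E x8 is the raw line rate per direction, while the rate usable for data after encoding is lower.

```python
def pci_x_peak_gbps(bus_width_bits=64, clock_mhz=133.33):
    """Peak PCI-X throughput in Gbit/s: bus width times clock frequency."""
    return bus_width_bits * clock_mhz * 1e6 / 1e9


def pcie_raw_gbps(lanes, gt_per_s=2.5):
    """Raw PCI-E line rate in Gbit/s, per direction (assumes gen-1 lanes)."""
    return lanes * gt_per_s


def pcie_effective_gbps(lanes, gt_per_s=2.5, encoding=8 / 10):
    """PCI-E rate left for data after 8b/10b encoding, per direction."""
    return lanes * gt_per_s * encoding


if __name__ == "__main__":
    print(f"PCI-X 64 bit @ 133 MHz: {pci_x_peak_gbps():.2f} Gbit/s")
    print(f"PCI-E x8 raw:           {pcie_raw_gbps(8):.1f} Gbit/s")
    print(f"PCI-E x8 after 8b/10b:  {pcie_effective_gbps(8):.1f} Gbit/s")
```

Protocol overheads on the bus (transaction headers, flow control) reduce the usable rate further and are not modelled here.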
CPU Affinity Settings
10-GbE Receiver
[Diagram: cores 0 and 1 with L2 cache 0. Tasks placed on these cores: (IRQ + softIRQ) from Ethernet NIC; receiver process.]
CPU Affinity Settings (II)
10-GbE Sender
[Diagram: cores 0 to 3 with L2 caches 0 and 1. Tasks placed on these cores: (IRQ + softIRQ) from Ethernet NIC; sender process; second sender process (2-sender tests only).]
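As an illustration of how such affinity settings can be applied on Linux, the sketch below pins the calling process to one core with `os.sched_setaffinity` and builds the hexadecimal CPU bitmask that Linux expects in `/proc/irq/<n>/smp_affinity`. It is a minimal example rather than the configuration used in these tests: the helper names are hypothetical, the IRQ number of the NIC is not known from the slides, and nothing is written to `/proc` here.

```python
import os


def cpu_mask_hex(cores):
    """Return the hexadecimal CPU bitmask for a collection of core indices.

    This is the format expected by /proc/irq/<n>/smp_affinity on Linux.
    """
    mask = 0
    for core in cores:
        mask |= 1 << core
    return format(mask, "x")


def pin_current_process(core):
    """Pin the calling process to a single core (Linux only).

    Returns the affinity set reported by the kernel afterwards.
    """
    os.sched_setaffinity(0, {core})
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    original = os.sched_getaffinity(0)
    target = min(original)            # first core this process may use
    print("pinned to:", pin_current_process(target))
    print("IRQ mask for that core:", cpu_mask_hex([target]))
    os.sched_setaffinity(0, original)  # restore the previous affinity
```

Keeping the interrupt handling and the sender or receiver process on separate cores lets their loads be read independently in the CPU-load plots.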
UDP protocol
• UDP/IP is the simplest IP-based protocol that can
be implemented in an FPGA.
– It does not hide the network problems at lower layers.
– SCTP/IP (Stream Control Transmission Protocol) could be
an alternative.
– TCP/IP is too complex:
• Needs thousands of connections (and buffers) to be kept open on
the FPGA side.
• Too many mechanisms that slow down the data flow and must be tuned:
– Congestion control, slow start, sliding windows, retransmission timer,
Nagle’s algorithm, etc.
• Large protocol overhead.
• Retransmission timer must be tuned to keep the latency low.
• Experience in DAQ shows that a protocol stack as
complete as possible is very useful to simplify
debugging during the commissioning phase:
– Including ARP, RARP, ICMP (ping), etc.
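Because UDP does not hide lower-layer problems, datagram loss has to be detected by the application. One common approach, sketched below over the loopback interface, is to prefix each datagram with a sequence number and count the gaps on the receiving side. The 4-byte header layout, the payload size and the function names are assumptions made for this example, not the format used in the tests presented here.

```python
import socket
import struct

HEADER = struct.Struct("!I")  # 32-bit sequence number, network byte order


def send_datagrams(sock, addr, count, payload_size=64):
    """Send `count` datagrams, each starting with its sequence number."""
    padding = bytes(payload_size)
    for seq in range(count):
        sock.sendto(HEADER.pack(seq) + padding, addr)


def receive_and_count(sock, expected, timeout=0.5):
    """Receive until all datagrams arrived or the socket times out.

    Returns (received, lost), counting distinct sequence numbers.
    """
    sock.settimeout(timeout)
    seen = set()
    while len(seen) < expected:
        try:
            data, _ = sock.recvfrom(65535)
        except socket.timeout:
            break
        (seq,) = HEADER.unpack_from(data)
        seen.add(seq)
    return len(seen), expected - len(seen)


def loopback_demo(count=100):
    """Send and receive on 127.0.0.1 using an ephemeral port."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as rx, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as tx:
        rx.bind(("127.0.0.1", 0))
        send_datagrams(tx, rx.getsockname(), count)
        return receive_and_count(rx, count)


if __name__ == "__main__":
    received, lost = loopback_demo()
    print(f"received={received} lost={lost}")
```

On a real link the sender and receiver would run on different hosts, and the loss count divided by the number of datagrams sent gives the datagram loss rate.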
UDP – Standard Frames
• 1500 B MTU (Maximum Transmission Unit).
• UDP datagrams sent as fast as they can be sent.
• Bottleneck: sender CPU core 2 (sender process 100% system load).
[Plots; CPU-load legend: User, System, IRQ, Soft IRQ, Total. Annotations: ~ 4.8 Gb/s; ~ 440 kHz; 2, 3 and 4 frames; softIRQ (4/5); IRQ (1/5); 100% (bottleneck); fake softIRQ; softIRQ (~50%); system (~50%).]
UDP – Jumbo Frames
• 9000 B MTU.
• Appreciable improvement with respect to the 1500 B MTU.
[Plots; CPU-load legend: User, System, IRQ, Soft IRQ, Total. Annotations: ~ 9.7 Gb/s; ~ 440 kHz; 2, 3 and 4 frames; 2 and 3 PCI-E frames; softIRQ (4/5); IRQ (1/5); 100% (bottleneck); fake softIRQ; softIRQ (~50%); system (~50%).]
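For comparison with the measured values, the theoretical ceiling imposed by framing overhead alone can be estimated. The sketch below assumes standard Ethernet overheads (8 B preamble and start-of-frame delimiter, 14 B MAC header, 4 B FCS, 12 B inter-frame gap), IPv4 and UDP headers without options (20 B + 8 B), no VLAN tag, and one full-size datagram per frame. The function names are illustrative.

```python
ETH_OVERHEAD = 8 + 14 + 4 + 12   # preamble+SFD, MAC header, FCS, inter-frame gap (bytes)
IP_UDP_HEADERS = 20 + 8          # IPv4 header without options + UDP header (bytes)


def max_frame_rate_hz(mtu, link_bps=10e9):
    """Maximum number of full-size frames per second on the wire."""
    return link_bps / ((mtu + ETH_OVERHEAD) * 8)


def max_udp_goodput_bps(mtu, link_bps=10e9):
    """Maximum UDP payload throughput with one full-size datagram per frame."""
    return max_frame_rate_hz(mtu, link_bps) * (mtu - IP_UDP_HEADERS) * 8


if __name__ == "__main__":
    for mtu in (1500, 9000):
        print(f"MTU {mtu}: {max_frame_rate_hz(mtu) / 1e3:.1f} kHz, "
              f"{max_udp_goodput_bps(mtu) / 1e9:.2f} Gb/s")
```

Under these assumptions the ceiling is about 9.57 Gb/s at 1500 B MTU and about 9.93 Gb/s at 9000 B MTU, so framing overhead alone does not account for the ~ 4.8 Gb/s seen with standard frames; that is consistent with the sender CPU being the bottleneck.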
UDP – Jumbo Frames
2 Sender Processes
• CPU cycles available on the sender PC doubled.
• 10GbE fully saturated.
• Receiver (playing against 2 senders) not yet saturated.
[Plots; CPU-load legend: User, System, IRQ, Soft IRQ, Total. Annotations: ~ 10 Gb/s; ~ 600 kHz; ~3 KiB; ~5 KiB; 2, 3 and 4 frames; softIRQ (4/5); IRQ (1/5); no more CPU bottleneck; softIRQ (25-75%); 200% (bottleneck); fake softIRQ; system (75-90%).]
R&D Project
• An R&D project (PRIN) has been funded by the
Italian Ministry of Education and Research
(MIUR):
– TeraDAQ:
• prototype demonstrator of a high-performance data
acquisition system based on a PC cluster and using
ultra-high-speed networking standards. The project targets
particle physics experiments on next-generation
accelerators of very high luminosity.
– INFN Bologna, Bologna University and Roma Tor
Vergata University.
– 51,700 €.
Electronics
• Evaluation kit Xilinx ML605:
– Equipped with a latest-generation Xilinx Virtex-6 FPGA;
– FPGA Mezzanine Connector (FMC).
• Connectivity board FMC XM104:
– 10-GbE CX4 connector.
[Diagram: Xilinx Virtex-6 FPGA ML605 evaluation board. Blocks inside the Xilinx Virtex-6 FPGA: VHDL software; UDP/IP software core; 10-GbE MAC software core; XAUI SERDES. FMC mezzanine connector (10 Gb/s) to the Xilinx FMC XM104 connectivity card; CX4 connector; 10 GbE 10GBASE-CX4 link (max 10 m) to the PC.]
Electronics (II)
• FMC XM104 Connectivity Card:
– designed to provide access to eight
serial transceivers on the FMC HPC
connector found on Xilinx FMC-supported
boards, including the Virtex-6 ML605.
[Figure: ML605 board]
Software
• XAUI SERDES and 10-GbE MAC:
– Available free of charge as evaluation software.
• UDP/IP:
– Evaluating possible solutions.
Domenico Galli
Dipartimento di Fisica, Alma Mater Studiorum - Università di Bologna and INFN, Sezione di Bologna
http://www.unibo.it/docenti/domenico.galli
Test Platform
Motherboard: IBM X3650
Processor type: Intel Xeon E5335
Processors x cores x clock (GHz): 2 x 4 x 2.00
L2 cache (MiB): 8
L2 speed (GHz): 2.00
FSB speed (MHz): 1333
Chipset: Intel 5000P
RAM: 4 GiB
NIC: Myricom 10G-PCIE-8A-S
NIC DMA speed (Gbit/s), ro / wo / rw: 10.44 / 14.54 / 19.07
Settings
net.core.rmem_max (B): 16777216
net.core.wmem_max (B): 16777216
net.ipv4.tcp_rmem (B): 4096 / 87380 / 16777216
net.ipv4.tcp_wmem (B): 4096 / 65536 / 16777216
net.core.netdev_max_backlog: 250000
Interrupt coalescence (μs): 25
PCI-E speed (Gbit/s): 2.5
PCI-E width: x8
Write combining: enabled
Interrupt type: MSI
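The `net.core.rmem_max` setting only raises the upper limit; a receiving application still has to request a large socket buffer itself. The sketch below shows one way to do so with `SO_RCVBUF` and to read back what the kernel actually granted. It assumes Linux behaviour, where the request is capped at `net.core.rmem_max` and the reported value is doubled to cover kernel bookkeeping; the helper name is hypothetical.

```python
import socket


def request_rcvbuf(sock, size_bytes):
    """Ask for a receive buffer of `size_bytes` and return the granted size.

    On Linux the request is silently capped at net.core.rmem_max, and the
    value read back is twice the size actually applied.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, size_bytes)
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        granted = request_rcvbuf(sock, 16777216)  # 16 MiB, as in rmem_max above
        print(f"kernel reports a receive buffer of {granted} bytes")
```

If the reported size is far below the request, the system limit is probably still at its default and the receiver is more exposed to datagram loss during bursts.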