Eventbuilding with Smart NICs

Event Building With Smart NICs
Jean-Pierre Dufey, Beat Jost, Niko Neufeld & Marianna Zuin
DAQ 2000
Lyon, October 20, 2000
Recap: LHCb DAQ System
[Diagram: Front End Links → Front-End Multiplexers (FEM) → Read-out units (RU), 6 GB/s → Read-out Network (RN), 6 GB/s → Sub-Farm Controllers (SFC) → CPUs running Trigger Level 2 & 3 and the Event Filter (variable latency: L2 ~10 ms, L3 ~200 ms) → Storage, 50 MB/s. Also shown: Timing & Fast Control, Throttle, and Control & Monitoring (LAN).]
Niko NEUFELD
CERN, EP
Event Building Components
• Readout units (RU): multiplexing of front-end links, destination assignment
• Switching read-out network
• Sub-farm controllers (SFC): event building and event dispatching
Event Building Properties
• Static load balancing among the SFCs
– RUs send round robin to destinations: destination = f(event_number), f being the same for all RUs
• Pure push protocol
– congestion handled via flow control and, more importantly, by throttling
• Distributes the event data flow of 6 GB/s from m sources to n destinations, each of which has to handle O(1 kB) fragments at 80 kHz
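The static load balancing above can be illustrated with a minimal sketch. The slides only state that destination = f(event_number) with the same f on every RU; the modulo form used here, and the names `destination` and `N_SFC`, are assumptions chosen for illustration.

```python
def destination(event_number: int, n_destinations: int) -> int:
    """Hypothetical f(event_number): plain round robin over the SFCs."""
    return event_number % n_destinations


# Every RU evaluates the same function, so all fragments of one event
# end up at the same SFC without any communication between the RUs.
N_SFC = 4  # illustrative number of destinations, not a figure from the talk
for event_number in range(6):
    print(event_number, "->", destination(event_number, N_SFC))
```

Because the mapping depends only on the event number, no RU needs to know what the others are doing, which is what makes a pure push protocol possible.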
Why Use Smart NICs?
• Modern Smart NICs are powerful embedded computers
• Off-load general purpose CPU
• Take advantage of cheap CPU power on the NIC
• Facilitate hardware design of the RU
• (Yet) limited CPU power compared to commodity PC
• No guarantee that high-end NIC development will continue in this direction (firmware/CPU vs. ASIC/FPGA)
Alteon Tigon 2
• Features
– Dual R4000-class processor running at 88 MHz
– Up to 2 MB memory
– GigE MAC+link-level interface
– PCI interface
• Development environment
– GNU C cross compiler with few special features to support the hardware
– Source-level remote debugger
Test Setup
[Diagram: two PC/Linux hosts, each with CPU, Mem, PCI, a GbE NIC and a second NIC; CERN Network.]
NIC 2 NIC Performance
[Plot: NIC-to-NIC throughput (bytes/μs, 0 to 140) vs. frame size (bytes, logarithmic axis from 1 to 10000). Series: Data, Fit, Extrapolation w/o min frame size.]
Fit: $y = \dfrac{x}{a + \max(x, c)/b}$, with x the frame size in bytes, y the throughput in bytes/μs, a = 0.2 μs, b = 125 bytes/μs, c = 64.0 bytes
Performance of Alteon NIC
• Can fill the wire at any given frame size (from 64 to 9000 bytes)
• Can send out frames at frequencies of up to 1.4 MHz
• For frames bigger than 512 bytes, more than 95% of the nominal bandwidth is available for data (practically 100% for jumbo frames larger than 8000 bytes)
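These figures can be cross-checked against the fit parameters quoted with the throughput plot (a = 0.2 μs, b = 125 bytes/μs, c = 64 bytes). The sketch below assumes the fit has the form throughput = x / (a + max(x, c)/b), i.e. a fixed per-frame overhead plus the time on the wire, with frames shorter than c padded to c. That form and all function names are assumptions, not something the slides spell out.

```python
A_US = 0.2                # a: fixed per-frame overhead in microseconds
B_BYTES_PER_US = 125.0    # b: Gigabit Ethernet wire speed in bytes/microsecond
C_BYTES = 64.0            # c: minimum frame size in bytes


def frame_time_us(frame_size: float) -> float:
    """Time to send one frame: overhead plus wire time of the padded frame."""
    return A_US + max(frame_size, C_BYTES) / B_BYTES_PER_US


def throughput_bytes_per_us(frame_size: float) -> float:
    """Payload throughput predicted by the assumed fit form."""
    return frame_size / frame_time_us(frame_size)


def frame_rate_mhz(frame_size: float) -> float:
    """Frames per microsecond, which is numerically the rate in MHz."""
    return 1.0 / frame_time_us(frame_size)


for size in (64, 512, 9000):
    share = throughput_bytes_per_us(size) / B_BYTES_PER_US
    print(f"{size:5d} bytes: {frame_rate_mhz(size):.3f} MHz, "
          f"{100 * share:.1f}% of wire speed")
```

With these parameters the model gives roughly 1.4 MHz for 64-byte frames and about 95% of the wire speed at 512 bytes, in line with the numbers on the slide.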
Event Building Algorithm
• Assembles events out of fragments from a known number of sources
• Handles an adjustable number of events concurrently (limited only by buffer space)
• Implements “Implicit + Time-out Completion”
• Uses “scatter/gather” capabilities of the NIC’s DMA engine to concatenate the fragments into the host’s memory
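The scatter/gather idea in the last bullet can be sketched in a few lines. This is only a software illustration, not the Tigon firmware interface: the descriptor layout `(offset, length)` and the function `gather` are hypothetical, and a real DMA engine would perform the copy in hardware.

```python
from typing import List, Tuple

Descriptor = Tuple[int, int]  # hypothetical: (offset into NIC buffer, length)


def gather(nic_buffer: bytes, descriptors: List[Descriptor]) -> bytes:
    """Copy the described pieces back to back into one contiguous block."""
    host = bytearray()
    for offset, length in descriptors:
        host += nic_buffer[offset:offset + length]
    return bytes(host)


# Three fragments of one event sit at unrelated places in NIC memory.
nic_buffer = b"....AAAA..BB......CCC."
event = gather(nic_buffer, [(4, 4), (10, 2), (18, 3)])
print(event)
```

The point of the technique is that the fragments never have to be copied together on the NIC itself; a list of descriptors is enough for the transfer to deliver one contiguous event to the host.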
Algorithm
[Flowchart. Steps: Start; Polling procedure; decision "New fragment?"; decision "New event fragment?"; decision "Event still in the table?"; Fragment out of time; Add new event descriptor; Collect the fragment; Decrement sources; Check for missing fragments in previous events. The decisions branch on YES/NO.]
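The flowchart steps can be sketched as follows. This is one plausible reading, not the authors' firmware: it assumes an event is finished either when all expected sources have delivered or when a time-out since its first fragment expires, and that a fragment arriving for an already finished event counts as "out of time". The class and field names are hypothetical, and the `closed` set grows without bound, which a real implementation would have to avoid.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class EventDescriptor:
    first_seen: float                     # arrival time of the first fragment
    remaining: int                        # sources still missing
    fragments: Dict[int, bytes] = field(default_factory=dict)


class EventBuilder:
    def __init__(self, n_sources: int, timeout: float) -> None:
        self.n_sources = n_sources
        self.timeout = timeout
        self.table: Dict[int, EventDescriptor] = {}
        self.closed: Set[int] = set()     # events already shipped
        self.late_fragments = 0

    def _ship(self, event_number: int) -> Tuple[int, bytes, bool]:
        """Remove the event from the table; flag tells whether it is complete."""
        desc = self.table.pop(event_number)
        self.closed.add(event_number)
        data = b"".join(desc.fragments[s] for s in sorted(desc.fragments))
        return event_number, data, desc.remaining == 0

    def on_fragment(self, event_number: int, source: int, payload: bytes,
                    now: float) -> List[Tuple[int, bytes, bool]]:
        shipped = []
        if event_number in self.closed:
            self.late_fragments += 1      # fragment out of time
        else:
            desc = self.table.get(event_number)
            if desc is None:              # new event: add a descriptor
                desc = EventDescriptor(now, self.n_sources)
                self.table[event_number] = desc
            if source not in desc.fragments:
                desc.fragments[source] = payload   # collect the fragment
                desc.remaining -= 1                # decrement sources
            if desc.remaining == 0:
                shipped.append(self._ship(event_number))
        # Check for missing fragments in previous events (time-out).
        expired = [n for n, d in self.table.items()
                   if now - d.first_seen > self.timeout]
        for number in expired:
            shipped.append(self._ship(number))
        return shipped
```

Each call returns the events that finished as a result of this fragment, with a flag telling complete events apart from those closed by the time-out.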
PC Test Implementation
400 MHz PIII, VC++ 5.0
[Plot: time per fragment, T/frg (microseconds, 0 to 2.5), vs. number of generated fragments (500000 to 100500000). Legend: Simple time-out, Event # on top. Data labels: 2, 2, 1.4, 1.64, 1.6, 1.61.]
Performance NIC 2 NIC
[Plot: events/second (logarithmic axis, 1000 to 100000) vs. number of sources (2 to 32).]
Average time per fragment: 11.65 μs
Summary
• Event building on a smart NIC at a frequency of incoming fragments of almost 100 kHz has been demonstrated
• Event building at Gigabit speed for fragments bigger than ~1100 bytes
• Code optimization ongoing (9 μs/fragment has already been achieved)
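A small calculation shows how a per-fragment processing time relates to the kind of figures quoted here. The slides do not say how their numbers were derived, so the relations below are assumptions: fragment rate = 1/t, break-even fragment size = wire speed × t (above it the link, not the NIC processor, is the limit), and events per second = 1/(number of sources × t). All function names are hypothetical.

```python
WIRE_SPEED_BYTES_PER_US = 125.0   # Gigabit Ethernet: 1 Gbit/s = 125 bytes/us


def fragment_rate_khz(t_frag_us: float) -> float:
    """Fragments per millisecond, numerically the rate in kHz."""
    return 1000.0 / t_frag_us


def break_even_size_bytes(t_frag_us: float) -> float:
    """Fragment size whose wire time equals the processing time."""
    return WIRE_SPEED_BYTES_PER_US * t_frag_us


def events_per_second(n_sources: int, t_frag_us: float) -> float:
    """Event rate when each event needs one fragment from every source."""
    return 1e6 / (n_sources * t_frag_us)


for t in (11.65, 9.0):
    print(f"{t} us/fragment: {fragment_rate_khz(t):.1f} kHz, "
          f"break-even at {break_even_size_bytes(t):.0f} bytes")
```

Under these assumptions, 9 μs per fragment corresponds to about 111 kHz and a break-even size of 1125 bytes, while 11.65 μs corresponds to about 86 kHz and roughly 1456 bytes.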
Program of Work
• Evaluate impact of interrupt coalescence on SFC performance
• Study possibility of handling some amount of TCP/IP traffic on the outgoing link of the SFC (events to storage)
• “Real world” tests on a Gigabit Ethernet switching network
• Use measured parameters in a detailed simulation of the readout network