Transcript protocols
UNIC, a Linux Framework to Reach
Wire Speed Performances
on Ethernet Networks
Alain NINANE for the CMS DAQ Group
University of Louvain
DESY, 20/09/04
Outline
• Introduction to CMS DAQ
– Trigger and DAQ architecture
– Network requirements
• Review internals of modern OS architecture
– Memory management & network protocols
• The UNIC architecture
– User level access to Network Interface Card
• Measurements
• Conclusions & Prospects
DESY 20 Sept 2004
UNIC - Wire Speed Performances on Ethernet
2 of 45
Experiments at LHC
• CMS and ATLAS
– pp collisions
• LHCb
– CP violation in B-meson decay
• ALICE
– Heavy-ion collisions
Compact Muon Solenoid
[Figure: view of the CMS detector. Labels: Tracker (Pixel, Silicon), Calorimeter (Electromagnetic, Hadron), Muon Detector, Inner, Outer]
Diameter: 15 m
Length: 21 m
Weight: 12500 t
CMS Physics Rates
40 MHz bunch crossing frequency
10^34 cm^-2 s^-1 luminosity
20 pp interactions every 25 ns
10^9 Hz pp collision rate
Powerful event selection of 1 in 10^13
“Interesting” physics ...
new particles 10^-4 Hz
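The orders of magnitude on this slide can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers quoted above; the variable names are illustrative.

```python
# Back-of-the-envelope check of the rates quoted on the slide.
bunch_crossing_hz = 40e6          # 40 MHz bunch crossing frequency
interactions_per_crossing = 20    # ~20 pp interactions every 25 ns
new_physics_rate_hz = 1e-4        # rate of "interesting" new-particle events

crossing_period_ns = 1e9 / bunch_crossing_hz
collision_rate_hz = bunch_crossing_hz * interactions_per_crossing
selectivity = new_physics_rate_hz / collision_rate_hz

print(f"crossing period  : {crossing_period_ns:.0f} ns")
print(f"pp collision rate: {collision_rate_hz:.1e} Hz")
print(f"selection        : 1 in {1 / selectivity:.1e}")
```

The collision rate comes out just under 10^9 Hz and the selection close to 1 in 10^13, consistent with the rounded values on the slide.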
CMS Event Data
Sub-detector                   Event data [kB]
Tracker pixel                  72
Tracker silicon                300
Preshower                      110
Electromagnetic calorimeter    100
Hadronic calorimeter           64
Muon system                    22
Trigger                        10
~ 1 MB of data every 25 ns, i.e. ~ 40 000 000 MB/s
Powerful data rate reduction of 1 in 400 x 10^3
Disk/tape storage capacity: 100 MB/s
Offline computing power
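As a cross-check of the figures on this slide, the per-sub-detector sizes can be summed and the data rates recomputed. This is a minimal sketch; it assumes decimal units (1 MB = 1000 kB) and rounds the event size to 1 MB as the slide does.

```python
# Event size per sub-detector, in kB, as listed on the slide.
fragment_kb = {
    "Tracker pixel": 72,
    "Tracker silicon": 300,
    "Preshower": 110,
    "Electromagnetic calorimeter": 100,
    "Hadronic calorimeter": 64,
    "Muon system": 22,
    "Trigger": 10,
}
total_kb = sum(fragment_kb.values())

event_size_mb = 1.0                  # slide rounds the event size to ~1 MB
crossing_rate_hz = 40e6              # one bunch crossing every 25 ns
raw_rate_mb_s = event_size_mb * crossing_rate_hz
reduction_factor = 400e3             # "1 in 400 x 10^3"
storage_rate_mb_s = raw_rate_mb_s / reduction_factor

print(f"sum of fragments: {total_kb} kB")
print(f"raw data rate   : {raw_rate_mb_s:.0f} MB/s")
print(f"rate to storage : {storage_rate_mb_s:.0f} MB/s")
```

The reduction factor of 400 000 is exactly what brings the raw rate down to the 100 MB/s storage capacity.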
CMS Trigger and DAQ
• Level 1 Trigger
– Filtering by custom hardware
– 3.2 µs processing time
– Data stored in pipeline memories
– Maximum output rate 100 kHz
• Event Builder (EVB)
• High Level Trigger
– Filtering by “COTS” computers
– Near offline software algorithms
– Full event data
– ~1 s processing time
– Data stored in RAM
– Maximum output rate 100 Hz
[Figure: trigger chain with rate labels 40 MHz, 100 kHz and 100 Hz]
CMS Event Builder Architecture
• Event data fragments from subdetectors are read and stored in
~650 FED memory systems
• Switching network connecting
data sources to data destinations
• Full data set of one event stored
in the memory system of a single
unit for HLT processing
CMS Event Builder Throughput
• 100 kHz x 1 MByte
– From ~ 500 readout links
– Links at 200 MByte/s
• 100 GByte/s
(1 Tbit/s)
– ~500 links at 200 MByte/s
– To HLT filtering system
• 100 Hz x 1 MByte to storage by
computing services
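The throughput figures above follow from simple products. The sketch below recomputes them, assuming decimal units (1 GByte = 1000 MByte); the names are illustrative.

```python
# Event builder throughput, recomputed from the slide's numbers.
level1_rate_hz = 100e3        # Level 1 output rate
event_size_mb = 1.0           # ~1 MByte per event
n_links = 500                 # readout links
link_rate_mb_s = 200.0        # per-link rate

evb_input_gb_s = level1_rate_hz * event_size_mb / 1000
links_capacity_gb_s = n_links * link_rate_mb_s / 1000
evb_input_tbit_s = evb_input_gb_s * 8 / 1000
storage_mb_s = 100 * event_size_mb      # 100 Hz HLT output

print(f"event builder input: {evb_input_gb_s:.0f} GByte/s "
      f"({evb_input_tbit_s:.1f} Tbit/s)")
print(f"capacity of links  : {links_capacity_gb_s:.0f} GByte/s")
print(f"rate to storage    : {storage_mb_s:.0f} MByte/s")
```

With these units the aggregate is 0.8 Tbit/s, which the slide rounds to 1 Tbit/s.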
CMS Event Builder Baseline
• Host capabilities
– Receive, send, process data at 200 MByte/s
• Network switch capabilities
– Handle 500 data sources and 500 data sinks
– Aggregate throughput of 100 GByte/s
Technological Choice (I)
• Commercial solution
– Use commercially available hardware & software
– Use widely accepted standards
• Private solution
– Custom-made hardware/software
– Application-dedicated protocol
Technological Choice (II)
[Table: Public (Commercial/Standard) vs. Private/Custom, compared by topic; the per-cell ratings are not present in the transcript]
Topics: Reliability, Flexibility, Performance, Evolvability, Vendor Independence
Intermediate solution
• Use widely available hardware like Ethernet and publicly available,
Open Source software like Linux
– Open Source …
Source code available
– Widely documented
Can be modified/adapted to fit particular needs
Avoids reinventing the wheel
Review of Internals of
Modern Operating Systems
• Memory Management
• Network Protocols
Modern OS Architecture
[Figure: block diagram of a modern OS.
User Level: user programs, libraries.
Kernel Level: socket, network protocols, network interface drivers; plain file, filesystem, cooked disk interface, raw disk interface, block buffer cache, block device drivers; cooked tty, raw tty interface, line discipline, character device drivers; process control subsystem (scheduler, memory management, inter-process communication); hardware control.
Hardware Level: hardware.]
Kernel / User Mode
• CPU in user mode
– Unprivileged CPU instruction set
– Code written by users and software programmers (libraries)
– User process owned memory space
• CPU in kernel mode
– Full (privileged) CPU instruction set
– Code written by kernel developers and privileged users
– Kernel and user memory space
Roles of a Device Driver
• In the bottom half of the kernel
– Control and command the hardware
• In the top half of the kernel
– Manage data transfer between applications
(user space) and devices (kernel space)
Memory Management in Linux
[Figure: Linux address spaces and their mapping onto physical addresses. Panels: User Virtual Addresses (Process 123, Process 345), Kernel Logical Addresses, Kernel Virtual Addresses, Physical Memory, Device Memory. Addresses shown include 0x0000'0000, 0x01F0'0000, 0x0238'0000, 0xC000'0000, 0xC1F0'0000, 0xC238'0000, 0xF000'0000 and 0xF100'0000.]
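Kernel logical addresses differ from physical addresses by a constant offset. The sketch below mimics that conversion in Python, assuming the usual 3 GB / 1 GB split of 32-bit x86 Linux, where the offset (PAGE_OFFSET) is 0xC000'0000. The pairing of the example addresses is an assumption suggested by the values on the slide, not something the transcript states.

```python
PAGE_OFFSET = 0xC000_0000   # assumed: 32-bit x86 Linux, 3 GB / 1 GB split

def logical_to_physical(kaddr: int) -> int:
    """Mimic the kernel's __pa() for a kernel logical address."""
    if kaddr < PAGE_OFFSET:
        raise ValueError("not a kernel logical address")
    return kaddr - PAGE_OFFSET

def physical_to_logical(paddr: int) -> int:
    """Mimic the kernel's __va() for directly mapped physical memory."""
    return paddr + PAGE_OFFSET

for kaddr in (0xC1F0_0000, 0xC238_0000):
    print(f"{kaddr:#010x} -> {logical_to_physical(kaddr):#010x}")
```

User virtual and kernel virtual addresses have no such fixed relation to physical addresses; they are resolved through page tables, which is why they need a separate mapping step.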
Data Transfer Overhead (I)
• Problem 1
– Copy of the data between the user and kernel address spaces
– Can’t be avoided easily
Synchronous for the user application
Asynchronous in the kernel
• Solution
– Use the capability of device drivers to remap memory spaces (ioremap)
– Requires careful programming
Memory Mapping in Linux
[Figure: memory mapping in Linux. Panels: User Virtual Addresses (Process 345), Kernel Virtual Addresses, Kernel Logical Addresses, Physical Memory, Device Memory. Addresses shown include 0x0000'0000, 0x00F0'0000, 0x0126'0000, 0x1000'0000, 0x1200'0000, 0x2126'0000, 0x4000'0000, 0xC000'0000, 0xC1F0'0000, 0xF000'0000 and 0xF100'0000.]
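The effect of a shared mapping can be illustrated from user space. In the sketch below a temporary file stands in for the device node a driver would expose (an assumption made so the example runs anywhere); two mappings of the same pages play the roles of the application and the driver.

```python
import mmap
import os
import tempfile

AREA_SIZE = mmap.PAGESIZE   # one page standing in for a frame area

# Assumption: a temporary file replaces the driver's device node.
fd, path = tempfile.mkstemp()
try:
    os.ftruncate(fd, AREA_SIZE)
    app_view = mmap.mmap(fd, AREA_SIZE)   # "application" mapping
    drv_view = mmap.mmap(fd, AREA_SIZE)   # "driver" mapping of the same pages

    app_view[0:5] = b"hello"              # plain memory store, no write() call
    seen_by_driver = bytes(drv_view[0:5])
    print(seen_by_driver)

    app_view.close()
    drv_view.close()
finally:
    os.close(fd)
    os.remove(path)
```

Once the mapping exists, data written by one side is visible to the other without a system call per transfer, which is the property the remapping approach relies on.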
Network Protocols
• Role of network protocols
– Provide communication and interoperability
between different applications running on
different computers and operating systems
– Provide communication reliability, even for
applications running on top of unreliable
network layers
– Isolate network details from application
A Real Life Example
[Figure: path of a mail text file between a Linux DEC Alpha workstation (a) and an MS Hotmail web server (b); the steps are labelled 1a to 4c]
Transmission Control Protocol
• TCP - a reliable stream transport service
– Stream oriented
– Virtual circuit connection
– Buffered transfer
– Unstructured stream
– Full duplex connection
• Reliability
– Provided by a positive acknowledgement with
retransmission method
• TCP itself is based on top of another protocol
– Best known as IP, the Internet Protocol
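Positive acknowledgement with retransmission can be sketched with a toy simulation. The code below is a stop-and-wait model over a lossy link, not TCP itself (real TCP uses sliding windows and adaptive timers); the loss probability, seed and function name are illustrative assumptions.

```python
import random

def send_reliably(segments, loss_probability=0.3, max_tries=50, seed=1):
    """Stop-and-wait: resend each segment until its acknowledgement arrives."""
    rng = random.Random(seed)
    delivered = []        # what the receiver hands to its application
    expected = 0          # next sequence number the receiver accepts
    transmissions = 0
    for seq, payload in enumerate(segments):
        for _ in range(max_tries):
            transmissions += 1
            if rng.random() < loss_probability:
                continue              # segment lost: sender times out, resends
            if seq == expected:       # receiver accepts new data only once
                delivered.append(payload)
                expected += 1
            # the receiver acknowledges seq, also for duplicates
            if rng.random() < loss_probability:
                continue              # acknowledgement lost: sender resends
            break                     # acknowledgement received
        else:
            raise RuntimeError("too many retransmissions")
    return delivered, transmissions

data = [b"frag-%d" % i for i in range(10)]
received, sent = send_reliably(data)
print(received == data, sent)
```

The receiver discards duplicates by sequence number, so the application sees every segment exactly once and in order, at the cost of extra transmissions.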
Protocols Layering
Application: SMTP, HTTP, NFS, …
Reliable Stream (TCP): connected stream
User Datagram (UDP): connectionless
Network: Internet Protocol (IP), IP datagram
Physical Medium: Ethernet, ATM, VMEbus, …
Protocols Headers
Encapsulation, from the application down to the wire:
Application data
TCP Header | Application data
IP Header | TCP Header | Application data
Ethernet Header | IP Header | TCP Header | Application data
TCP Header
– Data fragment number
– Acknowledgement number
– Source/destination port numbers
– Checksum
IP Header
– ‘Next’ protocol number
– Size and checksum
– Time to live
– Source/destination IP addresses
Ethernet Header
– ‘Next’ protocol number
– Size and CRC
– Source/destination Ethernet addresses
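To make the Ethernet layer concrete, the sketch below packs and unpacks the 14-byte Ethernet header with Python's struct module and compares the payload left in a 1500-byte MTU with and without minimal TCP/IP headers. The MAC addresses are made up, and the EtherType 0x88B5 (an IEEE value reserved for local experimental use) is an assumption chosen for illustration.

```python
import struct

ETH_HEADER = struct.Struct("!6s6sH")   # destination MAC, source MAC, EtherType
ETHERTYPE_EXPERIMENTAL = 0x88B5        # assumed: IEEE local experimental type
MTU = 1500
IP_HEADER_MIN = 20                     # IPv4 header without options
TCP_HEADER_MIN = 20                    # TCP header without options

def build_frame(dst: bytes, src: bytes, payload: bytes) -> bytes:
    """Prepend the Ethernet header; the CRC is normally added by the NIC."""
    return ETH_HEADER.pack(dst, src, ETHERTYPE_EXPERIMENTAL) + payload

def parse_frame(frame: bytes):
    """Split a frame into (dst, src, ethertype, payload)."""
    dst, src, ethertype = ETH_HEADER.unpack_from(frame)
    return dst, src, ethertype, frame[ETH_HEADER.size:]

dst = bytes.fromhex("020000000002")    # made-up, locally administered MACs
src = bytes.fromhex("020000000001")
frame = build_frame(dst, src, b"event fragment")
print(parse_frame(frame))

payload_layer2 = MTU
payload_tcp = MTU - IP_HEADER_MIN - TCP_HEADER_MIN
print(f"payload per frame: {payload_layer2} B (layer 2), {payload_tcp} B (TCP/IP)")
```

The header bytes are a small part of the cost; the per-layer processing and copying behind each header matter at least as much.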
Data Transfer Overhead (II)
• Problem 2
– Protocols have been designed to be general
Few buttons to tune to get higher performance
– Overhead of network protocols
Headers
Checksumming
Relies on the quality of the software implementation of the protocol
Copy of the data between different layers
• Solution
– Be less general
No need for a flexible addressing system if the domain of application is local
Benefit from the homogeneity of your hardware
– Implement an application-specific protocol
Avoid copying the data between many layers
Be fault tolerant
The UNIC Framework
UNIC - User Level Access to NIC
• Avoid useless overhead in data copy
– Between user and kernel spaces
– Inside protocols
• Avoid overhead by protocols
– Allow the event builder task to access the Ethernet
frames directly
• The UNIC solution
– Use memory mapping between the event builder
task and the Ethernet frames in the kernel.
– Patch the Ethernet device driver to use the
memory mapped frames.
Network Subsystem in Linux
[Figure: network subsystem in Linux, standard vs. UNIC architecture.
STANDARD Arch.: Application; kernel boundary (system calls); kernel top half layer with protocols (TCP, UDP, IP); kernel bottom half layer with Ethernet drivers (e1000, acenic, syskonnect, eepro100); hardware (NICs).
UNIC Arch.: Application + Protocol; kernel boundary (system calls); UNIC device driver (glue code to patched Ethernet device driver: e1000, acenic, syskonnect, eepro100); hardware (NICs).]
Zero-Copy Layer 2 Device Driver
[Figure: STANDARD Arch. vs. UNIC Arch.]
Patched Device Drivers
• Problem!
– Need to patch a device driver
• However
– The network subsystem is standardized
– Task is nearly repetitive on existing drivers
• “Augment” the standard control structure
– Socket buffer (skbuff) -> unic slot
• Work done for:
– Becker’s driver for Intel 100 Mbit (eepro100)
– Syskonnect Gigabit (sk98lin)
– Sorensen’s driver for Alteon Gigabit (acenic)
– Intel 1000 Mbit (e1000)
Standard Ethernet Device Driver
Frames allocated “on the fly” by the device driver
Control structures are called socket buffers (skbuff)
• Tx: sendto()
• Rx: recvfrom()
Patched Ethernet Device Driver
Frames allocated statically and mapped by the application
Control structures are called unicslots
• Tx: ioctl()
• Rx: polling thread
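The idea of statically allocated frames with per-slot control structures can be sketched as a ring buffer. The transcript does not describe the actual unicslot layout, so everything below (class name, states, sizes, methods) is hypothetical; the copy in produce() merely stands in for the NIC writing a received frame into memory.

```python
FRAME_SIZE = 1514     # assumed: 14-byte header + 1500-byte payload
N_SLOTS = 8           # assumed ring size
FREE, READY = 0, 1    # hypothetical slot states

class SlotRing:
    """Frames allocated once, plus one state flag and length per slot."""

    def __init__(self, n_slots=N_SLOTS, frame_size=FRAME_SIZE):
        self.frame_size = frame_size
        self.frames = bytearray(n_slots * frame_size)   # static allocation
        self.lengths = [0] * n_slots
        self.state = [FREE] * n_slots
        self.head = 0     # next slot the producer fills
        self.tail = 0     # next slot the consumer polls

    def slot(self, i):
        """Window on slot i inside the shared area; nothing is copied."""
        start = i * self.frame_size
        return memoryview(self.frames)[start:start + self.frame_size]

    def produce(self, data):
        """Driver side (receive): fill the next free slot and mark it READY."""
        i = self.head
        if self.state[i] != FREE or len(data) > self.frame_size:
            return False                     # ring full or frame too large
        self.slot(i)[:len(data)] = data      # stands in for the NIC's DMA
        self.lengths[i] = len(data)
        self.state[i] = READY
        self.head = (i + 1) % len(self.state)
        return True

    def poll(self):
        """Application side: view of the next READY frame, or None."""
        i = self.tail
        if self.state[i] != READY:
            return None
        return i, self.slot(i)[:self.lengths[i]]

    def release(self, i):
        """Give the slot back once the application has used the frame."""
        self.state[i] = FREE
        self.tail = (i + 1) % len(self.state)

ring = SlotRing()
ring.produce(b"event fragment")
index, frame = ring.poll()
content = bytes(frame)
ring.release(index)
print(content)
```

Because the frames are never reallocated, a polling thread only has to inspect the state flags, and the application reads each frame in place.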
Measurements & Performances
Event Builder Demonstrator
• 64 PCs - Supermicro 370 DLE - Serverworks LE chipset
• Pentium III 750 MHz, 1000 MHz
• PCI 64 bit/66 MHz
• Linux kernel 2.4
• Gigabit Ethernet
– NIC: Alteon AceNIC (Copper UTP)
– Switch: 64 ports, FastIron-8000 from Foundry Networks
Streaming Tests
• 1-way point-to-point streaming
– 1 host sender to 1 host receiver
– 1 rail: 1 NIC / host
– 2 rails: 2 NICs / host
– varying packet size up to MTU
• Drivers and protocols
– Standard
TCP/IP
Layer 2 sockets
– Patched
Layer 2 zero-copy
• Measurements
– total saturation throughput measured at the receiver side
– bottleneck is the receiver
– packet losses: ~10 % with Layer 2 protocols (standard and patched drivers)
Streaming - 1 rail
[Figure: streaming throughput vs. packet size, 1 rail. Curves: TCP/IP, Layer 2 sockets, Layer 2 zero-copy. x axis: packet size [bytes], 0 to 1600; y axis: throughput [MB/s], 0 to 140.]
Streaming - UNIC - 1 & 2 rails
[Figure: time per packet for the Layer 2 zero-copy driver, 1 rail and 2 rails. x axis: packet size [bytes], 0 to 1536; y axis: time/packet [µs], 0 to 14. Annotations: 116 MB/s (1 rail), 230 MB/s (2 rails).]
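For reference, the theoretical payload rate of one Gigabit Ethernet link can be estimated and compared with the measured 116 MB/s. The sketch assumes that "packet size" means the Ethernet payload, that 1 MB = 10^6 bytes, and the standard per-frame overhead on the wire (preamble, header, CRC, inter-frame gap).

```python
LINE_RATE_BPS = 1_000_000_000     # Gigabit Ethernet
# Bytes on the wire per frame that carry no payload:
# preamble + start delimiter (8), header (14), CRC (4), inter-frame gap (12)
WIRE_OVERHEAD = 8 + 14 + 4 + 12
MIN_PAYLOAD = 46                  # shorter payloads are padded

def max_payload_rate_mb_s(payload_bytes: int) -> float:
    """Upper bound on payload throughput for a given payload size."""
    wire_bytes = max(payload_bytes, MIN_PAYLOAD) + WIRE_OVERHEAD
    frames_per_s = LINE_RATE_BPS / 8 / wire_bytes
    return frames_per_s * payload_bytes / 1e6

def time_per_packet_us(payload_bytes: int, throughput_mb_s: float) -> float:
    """Time per packet in microseconds at a given payload throughput."""
    return payload_bytes / throughput_mb_s

for size in (64, 512, 1500):
    print(f"{size:5d} B: at most {max_payload_rate_mb_s(size):6.1f} MB/s")
print(f"1500 B at 116 MB/s: {time_per_packet_us(1500, 116):.1f} us per packet")
```

Under these assumptions the bound for 1500-byte payloads is about 122 MB/s, so 116 MB/s per rail is close to wire speed.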
EVB Protocol
Protocol with destination-based traffic shaping
Builder units request events from the event manager
Builder units read fragments from readout units sequentially
Builder units process several events simultaneously
Application-level reliability
accounts for packet losses, ...
Acronyms
RU : readout unit
BU : builder unit
EVM : event manager
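The request/read sequence described above can be sketched as a small single-threaded model. It shows only the message flow between EVM, RUs and BUs; concurrency, traffic shaping and loss recovery are left out, and all class and method names are hypothetical.

```python
from collections import deque

class EventManager:
    """EVM: hands out event identifiers to builder units on request."""
    def __init__(self, n_events):
        self.pending = deque(range(n_events))

    def request_event(self):
        return self.pending.popleft() if self.pending else None

class ReadoutUnit:
    """RU: holds one fragment of every event."""
    def __init__(self, ru_id):
        self.ru_id = ru_id

    def read_fragment(self, event_id):
        return f"ev{event_id}-ru{self.ru_id}".encode()

class BuilderUnit:
    """BU: pulls the fragments of one event, one RU after the other."""
    def __init__(self, evm, rus):
        self.evm = evm
        self.rus = rus

    def build_one(self):
        event_id = self.evm.request_event()
        if event_id is None:
            return None
        fragments = [ru.read_fragment(event_id) for ru in self.rus]
        return event_id, b"|".join(fragments)

evm = EventManager(n_events=4)
rus = [ReadoutUnit(i) for i in range(3)]
bus = [BuilderUnit(evm, rus) for _ in range(2)]

built = []
while True:
    results = [bu.build_one() for bu in bus]
    done = [r for r in results if r is not None]
    if not done:
        break
    built.extend(done)
print(built)
```

Since each builder unit decides when to pull its fragments, the destination controls the traffic arriving at its own port, which is the sense of destination-based shaping.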
EVB - TCP/IP Performances
Event building performance measurements
N x N setup (31 x 31)
Fragment sizes generated according to a log-normal distribution
average 16 kB, rms 8 kB
Performance results
75 MB/s for 16 kB
Scalable with N
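Fragment sizes with a given average and rms can be drawn from a log-normal distribution once its parameters are derived. The sketch assumes that "rms" denotes the standard deviation of the fragment size; the seed and sample count are arbitrary.

```python
import math
import random

def lognormal_params(mean, rms):
    """Return (mu, sigma) of the underlying normal distribution."""
    sigma2 = math.log(1.0 + (rms / mean) ** 2)
    mu = math.log(mean) - sigma2 / 2.0
    return mu, math.sqrt(sigma2)

mu, sigma = lognormal_params(mean=16.0, rms=8.0)    # sizes in kB

rng = random.Random(2004)
sizes_kb = [rng.lognormvariate(mu, sigma) for _ in range(100_000)]
sample_mean = sum(sizes_kb) / len(sizes_kb)
sample_rms = math.sqrt(
    sum((s - sample_mean) ** 2 for s in sizes_kb) / len(sizes_kb)
)
print(f"mu={mu:.3f} sigma={sigma:.3f}")
print(f"sample mean={sample_mean:.2f} kB, sample rms={sample_rms:.2f} kB")
```

The relations used are mean = exp(mu + sigma^2/2) and variance = (exp(sigma^2) - 1) * mean^2, solved for mu and sigma.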
EVB - UNIC performances
Event building performance
N x N setup (31 x 31)
Fragment sizes generated according to a log-normal distribution
average 16 kB, rms 8 kB
Performance results
Maximum between 8-20 kB
1 rail: 115 MB/s
2 rails: 220 MB/s
Scalable with N
Conclusion &
(Current & Future)
Developments
Conclusion
• Goal of 200 MB/s is reached!
– However, maintenance will be needed
for many years
• TCP/IP is still being investigated together with
the UNIC driver
Current Developments
• TCP/IP implementations are improving
– Kernel 2.4 -> 2.6
zero copy inside the protocol
– More “buttons”
Linux Advanced Routing & Traffic Control
– Jumbo frames (MTU now up to 9 kB)
Support of jumbo frames in switches ???
• UNIC has been ported to Intel e1000
– Tests on a NIC with 4 rails
327 MByte/s with standard frames
364 MByte/s with jumbo frames
The Complete Story
• Ethernet is not the only networking
technology planned for use in the Event
Builder
– Myrinet
Higher performance (native 250 MByte/s)
Cheaper switches but expensive NICs
Depends on a single manufacturer/vendor
– This Ethernet and TCP/IP study shows that
they are still valuable candidates … maybe just
as a backup solution!