
Farm infrastructure
for the
Real Time Trigger Challenge
Umberto Marconi
INFN Bologna
CERN, 16th May 2004
1
Introduction and outline

- Target: simulate the full software trigger chain as realistically as possible
  - Use Monte Carlo data (filtered by the L0 algorithm) to feed the farm prototype
  - Ideas on how to feed the sub-farm modules to emulate the L1 & HLT trigger data sources
- By mid-2005, 10% of the network (farm) infrastructure should be in place
  - The talk will focus on L1, but the ideas can be generalized to include HLT quite naturally
  - However, all the ideas discussed in this talk refer to the use of just one sub-farm module
  - Everything can then be scaled up according to the available resources
- Monitoring and control issues
- How to efficiently employ the hardware resources available in Bologna to be prepared for this challenge?
2
Sub-farm Configuration

- 1 Service PC (SPC), providing network boot services, central syslog, time synchronization, NFS exports, etc.
- 1 diskless SFC
- n diskless SFNs
- Root filesystem mounted as a RAM disk (kernel and compressed RAM disk image downloaded from the network at boot time)
- Application software directories (e.g. /usr and /home) mounted via NFS from the service PC
[Diagram: the SPC, the SFC and the SFNs connected through a network switch]
3
Data Sizes

- Suppose we employ 150×10⁶ simulated minimum bias events
- After the L0 selection we will have of the order of 9×10⁶ events
  - Assuming an L0 efficiency of 0.06 on simulated minimum bias events
- The data sample to feed the L1 trigger has a size of 4.5 kB × 9×10⁶ ≈ 41 GB
  - L1 event size ~4.5 kB
- The data sample to feed the HLT trigger has a size of 45 kB × 9×10⁶ ≈ 410 GB
  - HLT event size ~45 kB
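As a quick cross-check of these numbers, the short Python sketch below recomputes the sample sizes from the inputs quoted on this slide (the variable names are mine, chosen for illustration):

    # Cross-check of the data sample sizes (illustrative only)
    N_MINBIAS = 150e6        # simulated minimum bias events
    EPS_L0 = 0.06            # assumed L0 efficiency on minimum bias
    L1_EVENT_SIZE = 4.5e3    # bytes (~4.5 kB)
    HLT_EVENT_SIZE = 45e3    # bytes (~45 kB)

    n_after_l0 = N_MINBIAS * EPS_L0                  # ~9e6 events
    l1_sample = n_after_l0 * L1_EVENT_SIZE / 1e9     # ~41 GB
    hlt_sample = n_after_l0 * HLT_EVENT_SIZE / 1e9   # ~405 GB (quoted as ~410 GB)

    print(f"events after L0: {n_after_l0:.1e}")
    print(f"L1 sample: {l1_sample:.0f} GB, HLT sample: {hlt_sample:.0f} GB")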
4
Feeding a sub-farm module
UDP broadcast

- Suppose we want to use commodity PCs as data sources
- A Control Node (CN) broadcasts UDP packets to the sender nodes to start MEP transmissions to the SFC
- This can be scaled up to more than one sub-farm module

[Diagram: the CN and the senders S1…Sn on one switch, feeding the sub-farm module (SFC, SFNs, SPC) through a second switch]
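A minimal Python sketch of this UDP broadcast trigger is given below; the broadcast address, port and payload format are my own assumptions, not taken from the talk, and the MEP send itself is left as a stub:

    # Control node (CN) pacing the senders with UDP broadcast sync messages
    import socket, struct, time

    BCAST_ADDR = ("192.168.1.255", 5000)   # assumed broadcast address and port
    RATE_HZ = 460                          # one sync message per MEP period

    def control_node():
        """Broadcast one sync datagram per MEP period (~2.14 ms)."""
        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        seq = 0
        while True:
            s.sendto(struct.pack("!I", seq), BCAST_ADDR)   # 4-byte sequence number
            seq += 1
            time.sleep(1.0 / RATE_HZ)

    def send_mep_to_sfc(seq):
        pass                               # stub: actual MEP transmission to the SFC

    def sender_node():
        """Each received sync message triggers one MEP transmission to the SFC."""
        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        s.bind(("", BCAST_ADDR[1]))
        while True:
            payload, _ = s.recvfrom(64)
            (seq,) = struct.unpack("!I", payload)
            send_mep_to_sfc(seq)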
5
L1 running conditions
for one sub-farm controller

- ν_L0 = 1.1 MHz: L0 event output rate
- m = 25: L1 MEP packing factor
- N_SFC = 94: number of sub-farms
- L1ES = 4.5 kB: L1 event size
- λ: SFC L1 input rate
- t: L1 data transfer time
- IT: average SFC L1 input throughput

λ = ν_L0 / (m · N_SFC) = 1.1 MHz / (25 × 94) ≈ 460 Hz,  T = 1/λ ≈ 2.14 ms

t = m · L1ES / (1 Gb/s) = (25 × 4.5 × 8 × 10³) / 10⁹ s ≈ 0.9 ms

IT = λ · m · L1ES ≈ 52 MB/s
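The running conditions above follow directly from the quoted parameters; the sketch below simply recomputes them (variable names are mine):

    # SFC L1 running conditions, recomputed from the slide's parameters
    NU_L0 = 1.1e6     # L0 output rate [Hz]
    M = 25            # L1 MEP packing factor
    N_SFC = 94        # number of sub-farms
    L1_ES = 4.5e3     # L1 event size [bytes]
    LINK = 1e9        # Gigabit Ethernet line rate [bit/s]

    lam = NU_L0 / (M * N_SFC)    # SFC L1 input rate, ~460 Hz
    T = 1.0 / lam                # MEP period, ~2.14 ms
    t = M * L1_ES * 8 / LINK     # transfer time of one MEP, ~0.9 ms
    it = lam * M * L1_ES         # average input throughput, ~52 MB/s

    print(f"lambda = {lam:.0f} Hz, T = {T*1e3:.2f} ms")
    print(f"t = {t*1e3:.2f} ms, IT = {it/1e6:.1f} MB/s")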
6
Data structure

Steps to prepare the Monte Carlo data:
- Strip the events with the L0 trigger, reducing the data sample
- Rearrange the files into a proper structure to be stored on each sender (one sequential event file per L1 data source; 126 files = number of L1 data sources)

[Diagram: events 1, 2, 3, …, 25 packed into MEPs of 25 fragments; the 126 files distributed over the sender PCs S1…Sn]

- Each L1 data file will have a size of ~325 MB (41 GB / 126 files)
- These files should be stored on the sender nodes, grouped according to the number of sender nodes
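The bookkeeping described above can be summarized in a few lines of Python; the round-robin assignment of files to senders is only an illustration of one possible grouping:

    # Distribution of the 126 L1 data files over the available sender PCs
    N_EVENTS = 9_000_000   # events after L0
    N_SOURCES = 126        # L1 data sources -> one sequential event file each
    MEP_FACTOR = 25        # events packed into one MEP
    L1_ES = 4.5e3          # bytes
    N_SENDERS = 10         # sender PCs assumed available for the test

    file_size_mb = N_EVENTS * L1_ES / N_SOURCES / 1e6   # ~320 MB (quoted as ~325 MB)
    n_meps = N_EVENTS // MEP_FACTOR                     # 360,000 MEPs in total

    files_on_sender = {s: [] for s in range(N_SENDERS)}
    for f in range(N_SOURCES):                          # round-robin assignment
        files_on_sender[f % N_SENDERS].append(f)

    print(f"{file_size_mb:.0f} MB per file, "
          f"{len(files_on_sender[0])} files on the first sender")   # ~13 files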
7
Number of senders
per sub-farm module

- I assume we won't have 126 senders next year
- We can nevertheless effectively test the system with a reduced number of sender PCs
- How to evaluate the number of senders?
  - A modern Ultra ATA IDE disk can easily provide a sustained throughput of about 40 MB/s (single sequential read)
  - Allowing for a large safety factor, a sender can send data at a rate of v_s = 10 MB/s; to feed the sub-farm module one then needs a minimum number of senders n_s satisfying the inequality

n_s · v_s ≥ λ · m · L1ES = [ν_L0 / (m · N_SFC)] · m · L1ES  ⟹  n_s ≥ 5.2

- Assuming a buffer of the order of 1 GB in the sender's memory, the access and seek time of the disk is completely negligible
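With the numbers above, the bound can be evaluated in a few lines (the 5.2 quoted on the slide uses the rounded rate of 460 Hz):

    # Minimum number of senders for one sub-farm module
    NU_L0, M, N_SFC, L1_ES = 1.1e6, 25, 94, 4.5e3
    V_S = 10e6                                    # assumed per-sender rate [B/s]
    n_s_min = NU_L0 / (M * N_SFC) * M * L1_ES / V_S
    print(f"n_s >= {n_s_min:.1f}")                # ~5.3 (quoted as 5.2)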
8
Example
with n_s = 10 and 1 sub-farm

A simple model: 3 threads running on each sender (a code sketch follows the diagram below)
- One thread reading from disk and writing to a double-buffered shared memory
- A second thread reading from the shared memory and sending the data to the network
- A third thread controlling the other two
  - This model could be simplified if the memory of the sender were sufficient to store the whole data sample, but as it stands it also handles larger event samples, independently of the sender RAM
  - The arbitration of the shared memory has to be defined
[Diagram: sender PC holding 13 files × 325 MB on disk; Thread 1 reads from disk (sustaining 10 MB/s) into two 1 GB shared memory buffers (77 MB × 13 per fill); Thread 2 reads from the shared memory and sends to the network at λ·m·L1ES/n_s ≈ 5.2 MB/s, triggered by the UDP broadcast sync message (λ ≈ 460 Hz); Thread 3 controls the other two threads]
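A minimal Python sketch of the 3-thread sender model is given below; the sync port, SFC address and chunk/MEP sizes are my own illustrative assumptions, and a production implementation would more likely use C/C++ with a real shared-memory segment rather than a Python queue:

    # Sketch of the 3-thread sender: reader, network sender, controller
    import queue, socket, struct, threading

    CHUNK = 77 * 1024 * 1024          # 77 MB read unit; 13 chunks fill a 1 GB buffer
    SYNC_PORT = 5000                  # hypothetical UDP broadcast sync port
    SFC_ADDR = ("sfc.example", 6000)  # hypothetical SFC address and port
    MEP_BYTES = 13 * 25 * 36          # ~11.7 kB per MEP for this sender (illustrative)

    stop = threading.Event()
    buffers = queue.Queue(maxsize=2)  # stands in for the two 1 GB shared buffers

    def reader(files):
        """Thread 1: read the data files sequentially and fill the buffers."""
        for path in files:
            with open(path, "rb") as f:
                while not stop.is_set():
                    chunk = f.read(CHUNK)
                    if not chunk:
                        break
                    buffers.put(chunk)       # blocks while both buffers are full

    def sender():
        """Thread 2: on each sync message, ship one MEP worth of data to the SFC."""
        sync = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sync.bind(("", SYNC_PORT))
        out = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        chunk, offset = buffers.get(), 0
        while not stop.is_set():
            sync.recvfrom(64)                        # paced at ~460 Hz by the CN
            if offset + MEP_BYTES > len(chunk):
                chunk, offset = buffers.get(), 0     # swap to the other buffer
            out.sendto(chunk[offset:offset + MEP_BYTES], SFC_ADDR)
            offset += MEP_BYTES

    def controller(files):
        """Thread 3: start and supervise the other two threads."""
        t1 = threading.Thread(target=reader, args=(files,), daemon=True)
        t2 = threading.Thread(target=sender, daemon=True)
        t1.start(); t2.start()
        t1.join()                                    # all files read
        stop.set()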
9
Monitoring and control
issues

- A PVSS-DIM based project has been started to allow monitoring and control of the farm nodes
- All the relevant quantities useful to diagnose hardware or configuration problems should be traced:
  - CPU fans and temperatures
  - Memory occupancy
  - RAM disk filesystem occupancy
  - CPU load
  - Network interface statistics, counters, errors
  - TCP/IP stack counters
  - Status of relevant processes
  - Network switch statistics (via the SNMP-PVSS interface)
  - … plus many other things to be learnt by experience (and discussions!)
- Information should be viewable both as actual values and as historical trends
- Alarms should be issued whenever relevant quantities fall outside their allowed ranges
- By the middle of next year the project should be mature enough to be used efficiently in such a production-like environment
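As an illustration only, the sketch below reads a few of these quantities from /proc on a Linux node and prints them; in the real project they would be published to PVSS through DIM, and the interface name and refresh interval here are arbitrary:

    # Periodically sample CPU load, free memory and network counters
    import os, time

    def meminfo_kb(field):
        """Return one field (e.g. 'MemFree') from /proc/meminfo, in kB."""
        with open("/proc/meminfo") as f:
            for line in f:
                if line.startswith(field + ":"):
                    return int(line.split()[1])
        return None

    def net_counters(iface="eth0"):
        """Return (rx_bytes, tx_bytes) for an interface from /proc/net/dev."""
        with open("/proc/net/dev") as f:
            for line in f:
                if line.strip().startswith(iface + ":"):
                    fields = line.split(":")[1].split()
                    return int(fields[0]), int(fields[8])
        return None

    while True:
        load1, _, _ = os.getloadavg()
        print("load:", load1,
              "MemFree [kB]:", meminfo_kb("MemFree"),
              "eth0 rx/tx [B]:", net_counters("eth0"))
        time.sleep(10)    # refresh interval, chosen arbitrarily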
10
Bologna Testbed Farm
Hardware

- 2 Gigabit Ethernet switches
  - 2 × 3Com 2824, 2 × 24 ports
- 16 1U rack-mounted PCs
  - Dual Intel Xeon 2.4 GHz with Hyper-Threading
  - 2 GB of RAM
  - 160 GB IDE disk (but the machines operate diskless)
  - 1 Fast Ethernet and 3 Gigabit Ethernet adapters
  - 64 bit / 133 MHz PCI-X bus
- 1 TB RAID5 disk array with an Adaptec RAID controller and 10 Ultra320 SCSI disks

We should discuss and agree on a roadmap to use such a system efficiently and to perform useful measurements in the preparation phase towards the Real Time Trigger Challenge!
11