Transcript: DAQ-ACES-07

DAQ upgrades at SLHC
S. Cittolin CERN/CMS, 22/03/07
DAQ architecture. TDR-2003
DAQ evolution and upgrades
DAQ baseline structure: parameters and TDR design implementation
Collision rate: 40 MHz
Level-1 maximum trigger rate: 100 kHz
Average event size: ≈ 1 Mbyte
Data production: ≈ Tbyte/day
No. of In-Out units: 512
Readout network bandwidth: ≈ 1 Terabit/s
Event filter computing power: ≈ 10^6 SI95
No. of PC motherboards: ≈ thousands
(SLHC: × 10)
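As a quick cross-check of these TDR figures, a back-of-envelope sketch (the uniform ×10 SLHC scaling on the last line is applied for illustration only):

# Back-of-envelope check of the TDR design figures (illustrative only).
lv1_rate_hz  = 100e3   # Level-1 maximum trigger rate
event_size_b = 1e6     # average event size, ~1 Mbyte

readout_bw = lv1_rate_hz * event_size_b * 8   # bits/s into the event builder
print(f"Readout network bandwidth ~ {readout_bw / 1e12:.1f} Tbit/s")   # ~0.8, i.e. ~1 Terabit/s

# SLHC: the slide quotes a factor ~10 on bandwidth, processing and storage.
print(f"SLHC readout bandwidth   ~ {10 * readout_bw / 1e12:.0f} Tbit/s")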
Evolution of DAQ technologies and structures
Data flow chain: detector readout, event building, on-line processing, off-line data store.
PS, 1970-80: minicomputers. Custom readout design; first standard: CAMAC. Throughput: kByte/s.
LEP, 1980-90: microprocessors. HEP standards (Fastbus); embedded CPUs, industry standards (VME). Throughput: MByte/s.
LHC, 200X: networks/Grids. IT commodities, PCs, clusters; Internet, Web, etc. Throughput: GByte/s.
DAQ-TDR layout
Architecture: distributed DAQ
An industry-based Readout Network with Tbit/s aggregate bandwidth between the readout and event filter DAQ nodes (FED & FU) is achieved by two stages of switches interconnected by layers of intermediate data concentrators (FRL, RU & BU).
Myrinet optical links driven by custom hardware are used as the FED Builder and to transport the data to the surface (D2S, 200 m).
Gb Ethernet switches driven by standard PCs are used in the event builder layer (DAQ slices and Terabit/s switches).
Filter Farm and Mass Storage are scalable and expandable clusters of services.
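A minimal sketch of how such a two-stage builder can assign destinations; the round-robin rules and the counts below are illustrative assumptions, not the TDR specification:

# Illustrative two-stage destination assignment (assumed round-robin rules;
# the counts below are placeholders, not the TDR numbers).
N_SLICES       = 8    # assumed number of RU-builder (DAQ) slices
N_BU_PER_SLICE = 64   # assumed builder units per slice

def fed_builder_slice(event_number: int) -> int:
    # Stage 1 (FED Builder, Myrinet): choose the DAQ slice for this event.
    return event_number % N_SLICES

def ru_builder_bu(event_number: int) -> int:
    # Stage 2 (RU Builder, Gb Ethernet): choose the BU inside the slice,
    # as the Event Manager of that slice would allocate it.
    return (event_number // N_SLICES) % N_BU_PER_SLICE

for evt in range(4):
    print(f"event {evt}: slice {fed_builder_slice(evt)}, BU {ru_builder_bu(evt)}")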
Equipment replacements (M+O):
- FU PCs: every 3 years
- Mass storage: every 4 years
- DAQ PCs & networks: every 5 years
LHC --> SLHC
• Same Level-1 trigger rate; 20 MHz crossing rate and higher occupancy.
• A factor of 10 (or more) in readout bandwidth, processing and storage.
• The factor of 10 will come with technology; the architecture has to exploit it.
• New digitizers are needed (and a new detector-DAQ interface).
• More flexibility is needed in handling the event data and in operating the experiment.
Computing and communication trends
(Plot: computing and communication density trends, from 10 Mbit/in² to 1000 Gbit/in², extrapolated to 2015 (SLHC) and 2020.)
Industry Tbit/s (2004)
http://www.force10networks.com/products/reports.asp
•Above is a photograph of the configuration used to test the performance and resiliency metrics of the Force10
TeraScale E-Series. During the tests, the Force10 TeraScale E-Series demonstrated 1 billion packets per second
throughput, making it the world's first Terabit switch/router, and ZERO packet loss hitless fail-over at
Terabit speeds. The configuration required 96 Gigabit Ethernet and eight 10 Gigabit Ethernet ports from
Ixia, in addition to cabling for 672 Gigabit Ethernet and 56 Ten Gigabit Ethernet.
LHC-DAQ: trigger loop & data flow
BU: Builder Unit
DCS: Detector Control System
EVB: Event Builder
EVM: Event Manager
FU: Filter Unit
FRL: Front-end Readout Link
FB: FED Builder
FES: Front End System
FEC: Front End Controller
FED: Front End Driver
GTP: Global Trigger Processor
LV1: Level-1 trigger
RTP: Regional Trigger Processor
TPG: Trigger Primitive Generator
TTC: Timing, Trigger and Control
TTS: Trigger Throttle System
RCMS: Run Control and Monitor System
RU: Readout Unit
Signal paths and scales (acronyms as in the glossary above):
- TTC (1 to N): from the GTP to the FED/FES, distributed by an optical tree (~1000 leaves). It carries the Event/LV1 (20 MHz rep rate), Reset and B commands (1 MHz rep rate) and front-end control data (1 Mbit/s link).
- sTTS (N to 1): from the FEDs to the GTP, collected by an FMM tree (~700 roots).
- aTTS: from the DAQ nodes to the TTS, via a service network (e.g. Ethernet, ~1000s of nodes); transition in a few BXs.
- FRL & DAQ networks (N×N).
- Data to Surface (D2S), FRL-FB: 1000 Myrinet links and switches, up to 2 Tb/s.
- Readout Builders (RB), RU-BU/FU: 3000-port Gb Ethernet switches, 1 Tb/s sustained.
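As an illustration of the sTTS collection path, a sketch of the state merging an FMM-like tree performs; the state set and priority order below are simplified assumptions:

# Simplified sketch of sTTS state collection in an FMM-like tree: each node
# forwards the most severe state of its inputs (state set/order assumed here).
PRIORITY = {"READY": 0, "WARNING": 1, "BUSY": 2, "OUT_OF_SYNC": 3, "ERROR": 4}

def merge(states):
    # One FMM node: report the worst state among its inputs.
    return max(states, key=lambda s: PRIORITY[s])

# Two leaf FMMs merging FED states; the root result throttles the GTP.
fed_groups = [["READY", "READY", "BUSY"], ["READY", "WARNING", "READY"]]
to_gtp = merge(merge(group) for group in fed_groups)
print("state to GTP:", to_gtp)   # -> BUSY: LV1 accepts are throttled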
LHC-DAQ: event description
- FED-FRL-RU: each FED data fragment is tagged by the TTCrx with the (local) event number and orbit/BX.
- Readout Unit: receives from the EVM the EVB token and the BU destination, and holds the event fragments (FED data fragments).
- Filter Unit: from the GT data, the built event carries the (local) event number, the absolute time, the run number, the event type, the HLT info and the full event data.
- Storage Manager: built events are stored with the (central) event number, the absolute time, the run number, the event type, the HLT info and the full event data.
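The same event description written out as one record per stage; the field names follow the slide, while the Python types are an illustrative assumption:

# Event description per DAQ stage (field names from the slide, types assumed).
from dataclasses import dataclass, field
from typing import List

@dataclass
class FedFragment:            # FED-FRL-RU: tagged by the TTCrx
    event_number_local: int
    orbit_bx_local: int
    payload: bytes = b""

@dataclass
class BuiltEvent:             # Filter Unit: assembled at the BU destination
    event_number_local: int
    time_abs: float
    run_number: int
    event_type: int
    hlt_info: dict = field(default_factory=dict)
    fragments: List[FedFragment] = field(default_factory=list)  # full event data

@dataclass
class StoredEvent:            # Storage Manager record
    event_number_central: int
    time_abs: float
    run_number: int
    event_type: int
    hlt_info: dict = field(default_factory=dict)
    full_event_data: bytes = b""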
SLHC: event handling
• TTC: the full event info (central event number, absolute time, run number, event type, etc.) is present in each event fragment, distributed from the GTP-EVM via the TTC.
• FED-FRL interface: e.g. PCI.
• FED-FRL system able to handle high-level data communication protocols (e.g. TCP/IP).
• FRL-DAQ commercial interface, e.g. Ethernet (10 GbE or multi-GbE), to be used for data readout as well as for FED configuration, control, local DAQ, etc.
• A new FRL design can be based on an embedded PC (e.g. ARM-like) or an advanced FPGA, etc.
• The network interface can be used in place of the VME backplane to configure and control the new FED.
• Current FRL-RU systems can be used (with some limitations) for those detectors not upgrading their FED design.
(Network: distributed clusters and computing services.)
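A minimal sketch of an FRL that uses one commercial Ethernet interface both for data readout and for FED configuration/control; the hosts, ports and framing below are hypothetical, not an existing CMS protocol:

# Sketch of an FRL with a single network interface serving readout and control.
import socket
import struct

RU_HOST, RU_PORT = "ru-node.example", 9000   # hypothetical readout destination

def push_fragment(sock: socket.socket, event_number: int, payload: bytes) -> None:
    # Minimal TCP framing: 64-bit event number + 32-bit length + FED payload.
    sock.sendall(struct.pack("!QI", event_number, len(payload)) + payload)

def handle_config(conn: socket.socket) -> None:
    # The same link replaces the VME backplane for FED register access.
    register, value = struct.unpack("!II", conn.recv(8))
    print(f"configure FED: reg 0x{register:04x} <- 0x{value:08x}")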
SLHC DAQ upgrade (architecture evolution)
Another step toward a uniform network architecture:
• All DAQ sub-systems (FED-FRL included) are interfaced to a single multi-Terabit/s network. The same network technology (e.g. Ethernet) is used to read out and control all DAQ nodes (FED, FRL, RU, BU, FU, eMS, DQM, ...).
• In addition to the Level-1 Accept and the other synchronous commands, the Trigger & Event Manager-TTC transmits to the FEDs the complete event description, such as the event number, event type, orbit, BX and the event destination address, i.e. the processing system (CPU, cluster, Tier, ...) where the event has to be built and analyzed.
• The event fragment delivery, and therefore the event building, will be guaranteed by the network protocols and by (commercial) network internal resources (buffers, multi-path, network processors, etc.). FED-FU push/pull protocols can be employed.
• Real-time buffers of PBytes of temporary storage disks will cover a real-time interval of days, allowing the event selection tasks to better exploit the available distributed processing power.
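A sketch of the destination assignment this implies: the GTP/EVM picks a destination per Level-1 accept and broadcasts it in the event descriptor over the TTC tree, so the FEDs can push their fragments directly. The node names and the round-robin policy are hypothetical:

# GTP/EVM side: attach a destination to every Level-1 accept (illustrative).
import itertools

BUILDER_NODES = ["fu-cluster-01", "fu-cluster-02", "tier0-farm"]  # hypothetical
_destinations = itertools.cycle(BUILDER_NODES)                    # simple round-robin

def on_level1_accept(event_number: int, orbit: int, bx: int, event_type: int) -> dict:
    return {
        "event_number": event_number,
        "orbit": orbit, "bx": bx,
        "event_type": event_type,
        "destination": next(_destinations),  # where the event is built and analyzed
    }

print(on_level1_accept(1, 123, 45, 0))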
SLHC: trigger info distribution
1. Trigger synchronous data (20 MHz, ~100 bits/LV1)
- Clock distribution & Level-1 accept
- Fast commands (resets, orbit0, re-sync, switch context, etc.)
- Fast parameters (e.g. trigger/readout type, mask, etc.)
2. Event synchronous data packet (~1 Gb/s, a few thousand bits/LV1)
- Event number: 64 bits (as before, local counters will be used to synchronize and check)
- Orbit/time and BX number: 64 bits
- Run number: 64 bits
- Event type: 256 bits
- Event destination(s): 256 bits (e.g. a list of IP indexes: main EVB + DQMs, ...)
- Event configuration: 256 bits (e.g. information for the FU event builder)
Each event fragment will contain the full event description; events can be built in push or pull mode. Categories 1 and 2 will allow many TTC operations to be done independently per TTC partition (e.g. local re-sync, local reset, partial readout, etc.).
3. Asynchronous control information
- Configuration parameters (delays, registers, contexts, etc.)
- Maintenance commands, auto-test, statistics, etc.
Additional features? A readout control signal collector (à la TTS)?
- sTTS replacement: collect the FED status (at the LV1 rate, from >1000 sources?)
- Collect local information (e.g. statistics) on demand?
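A sketch of the event-synchronous packet using the field widths quoted above; the byte layout itself is an assumption, not a defined format:

# Pack the event-synchronous packet with the quoted field widths
# (64+64+64+256+256+256 = 960 bits per LV1); the layout is illustrative.
def pack_event_packet(event_number: int, orbit_bx: int, run_number: int,
                      event_type: int, destinations: int, configuration: int) -> bytes:
    def field(value: int, bits: int) -> bytes:
        return value.to_bytes(bits // 8, "big")
    packet = (field(event_number, 64) + field(orbit_bx, 64) +
              field(run_number, 64) + field(event_type, 256) +
              field(destinations, 256) + field(configuration, 256))
    assert len(packet) * 8 == 960   # about one thousand bits per LV1 accept
    return packet

print(len(pack_event_packet(1, 2, 3, 4, 5, 6)), "bytes per LV1")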
Summary
DAQ design
Architecture: it will be upgraded to enhance scalability and flexibility, exploiting commercial network and computing technologies to the maximum (via M+O).
Software: configuring, controlling and monitoring a large set of heterogeneous and distributed computing systems will continue to be a major issue.
Hardware developments
Increased performance and additional functionality are required for the event data handling (use of standard protocols, selective actions, etc.):
- A new timing, trigger and event-info distribution system
- A general front-end DAQ interface (for the new detector readout electronics) handling high-level network protocols via commercial network cards
The best DAQ R&D programme for the next years is the implementation of the 2003 DAQ-TDR.
Also required: permanent contact between SLHC developers and LHC-DAQ implementers, to understand and define the scope, requirements and specifications of the new systems.
DAQ data flow and computing model
(Diagram: Level-1 event rate into a multi-Terabit/s Event Builder Network with FRL direct network access; Filter Farms & Analysis, DQM & Storage servers; HLT output.)