Transcript Document

Slide 1: A DAQ Architecture for the Agata Experiment
Gaetano Maron
INFN – Laboratori Nazionali di Legnaro
GM, Agata Meeting, Padova, May 2003
Slide 2: Outline
• On-line Computing Requirements
• Event Builder
• Technologies for the On-line System
• Run Control and Slow Control
• Agata Demonstrator 2003-2007
• Off-line Infrastructure
• Conclusions
Slide 3: Agata Global On-line Computing Requirements
Figure: processing chain with the data rate leaving each stage and the computing power it requires.
• Front-end electronics and pre-processing -> 1000 Gbps (4 Gbps x 200)
• Pulse Shape Analysis: 1.5 x 10^6 SI95 (?) (present algorithm) -> max 10 Gbps (50 Mbps x 200)
• Event Builder: 5 x 10^3 SI95 -> 10 Gbps
• Tracking: 3 x 10^5 SI95 (no GLT), 3 x 10^4 SI95 (GLT @ 30 kHz) -> 1 Gbps
• Storage
Notes: SI95 = SpecInt95; 1 SI95 = 10 CERN Units = 40 MIPS; GLT = Global Level Trigger.
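The figures above are essentially a dataflow budget. As a quick cross-check, a minimal Python sketch (restating only the numbers quoted on this slide; nothing here is part of the actual DAQ software) that prints the rate reduction obtained at each stage:

    # Dataflow budget taken from this slide: (stage, output rate in Gbps, required SI95).
    # "None" means the slide quotes no SI95 figure for that stage.
    STAGES = [
        ("Front-end + pre-processing", 1000.0, None),   # 4 Gbps x 200 detectors
        ("Pulse Shape Analysis",         10.0, 1.5e6),  # max 50 Mbps x 200, present algorithm
        ("Event Builder",                10.0, 5.0e3),
        ("Tracking (no GLT)",             1.0, 3.0e5),
        ("Storage",                       1.0, None),
    ]

    def print_budget(stages):
        """Print each stage with the data-rate reduction relative to the previous stage."""
        prev = None
        for name, gbps, si95 in stages:
            cut = "" if prev is None else f"  (x{prev / gbps:.0f} rate reduction)"
            cpu = "" if si95 is None else f"  [{si95:.1e} SI95]"
            print(f"{name:28s} {gbps:7.1f} Gbps{cpu}{cut}")
            prev = gbps

    print_budget(STAGES)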
Slide 4: Pulse Shape Analysis Farm
Figure: per-detector readout chain (detector i = 1-200): preamplifier -> FADCs -> DSPs -> FPGA -> multiplexer (Mux) -> PC of the PSA farm. Link rates: 1 Gbps x 200 into the PSA farm, 50 Mbps x 200 out. Required PSA computing power: 1.5 MSI95 (now).
Slide 5: Event Building, Simple Case
Figure: a common clock defines consecutive time slots (Time Slot 1: T01-T04, Time Slot 2: T05-T08, Time Slot 3: T09-T12). The PSA farms (R1-R200) hold the time-stamped fragments belonging to each slot and send them over the Builder Network (10 Gbps) to the Builder Units BU1-BUn, one time slot per BU.
n could range (now) from 10 to 15, according to the efficiency of the event-builder algorithm and to the communication protocol used. In the final configuration (after 2005) we could imagine having a single 10 Gbps output link and a single BU.
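As a rough illustration of the time-slot scheme above (not the Agata builder code: the slot length and the slot-to-BU rule are assumptions made for the sketch), fragments can be grouped by time slot and each complete slot handed to one builder unit:

    from collections import defaultdict

    SLOT_LENGTH = 4       # time-stamp ticks per time slot (illustrative value)
    N_BUILDER_UNITS = 12  # n, in the 10-15 range quoted on the slide

    def slot_of(timestamp):
        """Time-slot index of a time-stamped fragment."""
        return timestamp // SLOT_LENGTH

    def builder_unit_of(slot):
        """Assumed rule: whole time slots are spread over the BUs round-robin."""
        return slot % N_BUILDER_UNITS

    def build(fragments):
        """fragments: iterable of (detector_id, timestamp, payload) from the PSA farms.
        Returns {builder_unit: {timestamp: [payloads]}}: fragments sharing a time stamp
        end up in the same builder unit and are merged into one event."""
        events = defaultdict(lambda: defaultdict(list))
        for _det, ts, payload in fragments:
            events[builder_unit_of(slot_of(ts))][ts].append(payload)
        return events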
Slide 6: Time Slot Assignment
Figure: dataflow of one detector slice (Slice 1 ... Slice 200): detector i -> MUX -> PSA farm -> EB farm.
• The TTC assigns an event number (time stamp) to each event of detector i.
• The MUX buffers events according to a given rule and thereby defines the time slot; it assigns a buffer # to this collection of events and distributes the buffers to the PSA farm according to their buffer #.
• The PSA farm shrinks the incoming buffers, so a further buffering stage is needed; the PSA assigns an EB (Event Builder) # to the new buffers and distributes them to the Event Builder farm according to that number.
All this is synchronous for all the detectors; event merging in the EB farm is then feasible.
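The routing chain above can be summarised in a few lines. The "given rule" used by the MUX is not specified on the slide, so the sketch below simply assumes that the buffer number is the time-slot index and that buffers are spread over the farm nodes by modulo arithmetic:

    N_PSA_NODES = 5    # PSA processors per detector slice (illustrative)
    N_EB_NODES = 12    # event-builder nodes (illustrative)
    SLOT_LENGTH = 4    # time-stamp ticks per time slot (assumed buffering rule)

    def buffer_number(timestamp):
        """MUX: buffer # given to the collection of events of one time slot."""
        return timestamp // SLOT_LENGTH

    def psa_node(buf_no):
        """MUX -> PSA farm: buffers are distributed according to their buffer #."""
        return buf_no % N_PSA_NODES

    def eb_number(buf_no):
        """PSA: EB # attached to the shrunk output buffer (kept equal to the buffer # here)."""
        return buf_no

    def eb_node(eb_no):
        """PSA farm -> EB farm: buffers are distributed according to their EB #."""
        return eb_no % N_EB_NODES

Because the same rule is applied synchronously in every detector slice, the buffers of a given time slot from all 200 slices converge on the same EB node, which is what makes the merging feasible.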
Slide 7: Agata Event Builder, Some More Requirements
Figure: as in the simple case, the clock defines time slots (T01-T12) collected by the readout farms (R1-R200) and sent over the Builder Network (10 Gbps) to the Builder Units BU1-BU3. Two complications:
• delayed coincidences can span more than one time slot;
• fragments of the same event are in different BUs.
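One way to picture the first point: a delayed-coincidence window opened near the end of a time slot extends into the next slot, whose fragments sit in a different BU, so those BUs have to exchange or forward data before the event is complete. A small sketch (slot length and window are illustrative values) listing the builder units such a window touches:

    SLOT_LENGTH = 4       # time-stamp ticks per time slot (illustrative)
    N_BUILDER_UNITS = 3   # BU1-BU3 as drawn on the slide

    def units_touched(timestamp, window):
        """Builder units that may hold fragments falling inside the delayed-coincidence
        window [timestamp, timestamp + window]."""
        first_slot = timestamp // SLOT_LENGTH
        last_slot = (timestamp + window) // SLOT_LENGTH
        return sorted({s % N_BUILDER_UNITS for s in range(first_slot, last_slot + 1)})

    # units_touched(7, 3) -> [1, 2]: the window crosses a slot boundary and involves two BUs.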
Slide 8: HPCC for Event Builder
A High Performance Computing and Communication (HPCC) system provides the Builder Network:
• high-speed links (> 2 Gbps)
• low-latency switch
• fast inter-processor communication
• low-latency message passing
Slide 9: Agata On-line System
Figure: overall dataflow. Front-End (F1-F200) -> 1000 Gbps -> PSA Farm (R1-R200) -> 10 Gbps -> HPCC builder (Builder Network) -> Event Builder (B1-B20) -> 10 Gbps -> Tracking Farm -> 1 Gbps -> Data Servers (ds1-ds4) -> Storage (1000 TB). Other link speeds shown: 1 Gbps, 100 Mbps, > 1 Gbps.
Slide 10: Technologies for Agata On-line System
• Networking trends
• Event Builder
• CPU trends
• Building blocks for the Agata farms
• Storage systems
Slide 11: Networking Trends - I
• Local networking is not an issue. Already now Ethernet fits the future needs of the NP experiments:
– link speed max 1 Gbps
– switch aggregate bandwidth O(100) Gbps
– O(100) Gbit ports per switch
– O(1000) FastEthernet ports per switch
• If HPCC is requested (e.g. the Agata builder farm), the options are Myrinet and Infiniband (= 4 x Myrinet).
Figure: aggregate bandwidth within a single switch, 2000-2003, for GigaEthernet (128 Gbps), Myrinet (192 Gbps) and Infiniband / 10 GbEth (256 Gbps); Myrinet throughput ~250 MB/s and one-way latency of the order of 10 µs.
Slide 12: Networking Trends II - Infiniband
Figure: an Infiniband fabric with Host Channel Adapters (HCA) on the servers (CPU / memory controller), Target Channel Adapters (TCA) on storage and network targets, switches and a router towards Internet/Intranet; channel-based message passing; 1000's of nodes per subnet.
• The same network transports low-latency IPC, storage I/O and network I/O.
• New server form factor (about 110 mm x 222 mm): roughly 300-400 boxes per rack.
• Link speeds: 1x = 2.5 Gbps, 4x = 10.0 Gbps, 12x = 30 Gbps.
Slide 13: Event Builder and Switch Technologies
Slide 14: CMS EVB Demonstrator 32x32 (CMS measurement plot)
Slide 15: Myrinet EVB with Barrel Shifter (CMS measurement plot)
Slide 16: Raw GbEth EVB (CMS measurement plot)
Slide 17: GbEth, Full Standard TCP/IP (CMS measurement plot; CPU load 100%)
Slide 18: TCP/IP CPU Off-loading - iSCSI
• Internet SCSI (iSCSI) is a standard protocol for encapsulating SCSI commands into TCP/IP packets, enabling block I/O data transport over IP networks.
• iSCSI adapters combine NIC and HBA functions.
Figure: layer-by-layer comparison (application, driver, link layer) of a Network Interface Card (file access, IP packets on Ethernet, towards an IP server), a storage HBA (block access, FC packets, towards FC servers/storage) and an iSCSI adapter (block access, IP packets on Ethernet, towards an IP server), e.g. the Intel GE 1000 T IP Storage Adapter with an Intel 80200 processor. The iSCSI adapter:
1. takes the data in block form,
2. handles the segmentation and processing with a TCP/IP processing engine,
3. sends IP packets across the IP network.
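From the data server's point of view, the practical consequence of iSCSI is that the remote disk array shows up as an ordinary block device (or a file system mounted on it), so archiving code needs nothing more exotic than buffered file I/O. A minimal sketch; the mount point is a hypothetical name, not a path defined anywhere in this talk:

    import os

    # Hypothetical mount point of a file system living on an iSCSI-attached RAID volume.
    ISCSI_MOUNT = "/data/iscsi0"

    def write_run_file(run_number, buffers):
        """Append event buffers (bytes objects) to one file per run on the iSCSI volume."""
        path = os.path.join(ISCSI_MOUNT, f"run{run_number:06d}.dat")
        with open(path, "ab") as f:
            for buf in buffers:
                f.write(buf)
            f.flush()
            os.fsync(f.fileno())  # make sure the data really reached the (remote) blocks
        return path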
Slide 19: Comments on the Agata Event Builder
• The Agata Event Builder is not an issue (even now). The CMS experiment has already shown the ability to work at an order of magnitude beyond the Agata requirements.
• Agata could work, also in the prototype, fully on standard TCP/IP.
• Agata could require an HPCC-based Event Builder. Technologies for that already exist, although they have never been applied to event-builder problems; it should not be a big issue:
– Myrinet (now)
– Infiniband (soon)
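To make the "fully standard TCP/IP" option concrete, a builder unit can simply accept length-prefixed fragment buffers over ordinary sockets. The framing below (a 4-byte big-endian length followed by the payload) and the port number are assumptions for the sketch, not the Agata protocol:

    import socket
    import struct

    def recv_exact(conn, n):
        """Read exactly n bytes from a connected socket."""
        chunks = []
        while n > 0:
            chunk = conn.recv(n)
            if not chunk:
                raise ConnectionError("peer closed the connection")
            chunks.append(chunk)
            n -= len(chunk)
        return b"".join(chunks)

    def builder_unit(port=9000):
        """Accept one PSA-farm connection and yield fragment buffers forever."""
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
            srv.bind(("", port))
            srv.listen(1)
            conn, _addr = srv.accept()
            with conn:
                while True:
                    (length,) = struct.unpack("!I", recv_exact(conn, 4))
                    yield recv_exact(conn, length)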
Slide 20: Processors and Storage Trends
• CPU: 80 SI95 now; 250 SI95 in 2007; 700 SI95 in 2010.
• Disk: 250 GB now; 1 disk = 1 TByte in 2007.
Slide 21: Building Blocks for the Agata Farms
1U CPU box with 2 processors; 40 boxes per rack.
SI95 per box: 2004: 200; 2007: 500; 2010: 1500.

Configurations (Nr. boxes / Nr. racks):

Farm type           2004 (1 detector)   2007 (15 detectors)   2010 (200 detectors)
PSA Farm            35 / 1              200 / 5               1000 / 25
Builder Farm        - / -               2 / 1/20              10 / 1/4
Track. Farm no GLT  - / -               40 / 1                200 / 5
Track. Farm GLT     - / -               4 / 1/10              20 / 1/2
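The box and rack counts in the table follow from the required SI95 divided by the SI95 per box of the corresponding year, with 40 boxes per rack. A small sketch for the PSA row (the per-detector requirement is scaled from the 1.5 MSI95 quoted for 200 detectors; the result is close to, but not exactly, the rounded table entries):

    import math

    BOXES_PER_RACK = 40                   # 1U dual-processor boxes, as above
    SI95_PER_BOX = {2004: 200, 2007: 500, 2010: 1500}
    PSA_SI95_PER_DETECTOR = 1.5e6 / 200   # from the 1.5 MSI95 total for 200 detectors

    def psa_farm_size(year, n_detectors):
        """Number of boxes and racks needed for the PSA farm of a given year and size."""
        boxes = math.ceil(n_detectors * PSA_SI95_PER_DETECTOR / SI95_PER_BOX[year])
        return boxes, boxes / BOXES_PER_RACK

    # psa_farm_size(2004, 1)   -> (38, ~1 rack)     table: 35 boxes, 1 rack
    # psa_farm_size(2007, 15)  -> (225, ~6 racks)   table: 200 boxes, 5 racks
    # psa_farm_size(2010, 200) -> (1000, 25 racks)  table: 1000 boxes, 25 racks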
Slide 22: Blade-Based Farms
1 blade box with 2 processors; 14 boxes per crate (7U); 6 blade crates per rack = 108 boxes; power = 16 kW per rack. Each crate has a 30 Gbps backplane with two switches (SW1, SW2) and 2 x 4 x 1 Gbps uplinks.

Configurations (Nr. blades / Nr. racks):

Farm type           2004 (1 detector)   2007 (15 detectors)   2010 (200 detectors)
PSA Farm            35 / 1/3            200 / 2               1000 / 10
Builder Farm        - / -               2 / < crate           10 / < crate
Track. Farm no GLT  - / -               40 / 2/5              200 / 2
Track. Farm GLT     - / -               4 / < crate           20 / 1/5
Slide 23: On-line Storage
• On-line storage needs:
– 1-2 week experiments
– max 100 TByte per experiment (no GLT)
– max 1000 TByte per year
– in 2010, 1 disk = 4 TByte
– Agata storage system: 250 disks (+ 250 for mirroring)
• Archiving:
– O(1000) TB per year cannot be handled as normal flat files
– not only physics data are stored: run conditions, calibrations
– correlations between physics data, calibrations and run conditions are important for off-line analysis
– database technology already plays an important role in physics data archiving (BaBar, LHC experiments, etc.); Agata can exploit their experience and developments
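The disk count quoted above is just the yearly volume divided by the expected 2010 disk size, doubled for mirroring; as a one-line check using only the figures on this slide:

    yearly_tbyte, disk_tbyte = 1000, 4        # 1000 TByte/year, 1 disk = 4 TByte in 2010
    data_disks = yearly_tbyte // disk_tbyte   # 250 disks
    total_disks = 2 * data_disks              # 500 disks including the mirror copies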
Slide 24: Storage Technologies Trends
Figure: application servers connected through GEth/iSCSI or Infiniband (via a gateway) to the data servers and a SAN-enabled disk array.
A commodity Storage Area Network is shared by all the farm nodes. Technologies of interest for us are:
• iSCSI over Gigabit (or 10 Gigabit) Ethernet
• Infiniband
Full integration between the SAN and the farm is achieved if a cluster file system is used. Examples of cluster file systems are:
• LUSTRE (www.lustre.org)
• STORAGE TANK (IBM)
Slide 25: Example of an iSCSI SAN Available Today
Figure: application servers connect over GEth/iSCSI to the data servers and to an LSI Logic iMegaRAID iSCSI controller (2 x GE) driving a RAID-SATA controller with 16 SATA disks, i.e. about 5 TByte per controller (SATA = Serial ATA).
Host adapters: Intel GE 1000 T, Adaptec ASA-7211, LSI 5201, etc.
Slide 26: Data Archiving
Figure: the input stream passes through a load-balancing switch to the data servers, which share a data cache (Oracle) over a low-latency interconnection (e.g. HPCC) in front of a Storage Area Network; the system is accessible from Internet/Intranet and scales by adding data servers.
Slide 27: Run Control and Slow Control
Figure: the Run Control and Monitor System and the Slow Control span the whole chain: front-end electronics and pre-processing, Pulse Shape Analysis, Event Builder, Tracking, Storage.
Slide 28: Run Control and Slow Control, Technological Trends
• Virtual counting room
• Web-based technologies:
– SOAP
– Web Services and Grid Services (Open Grid Service Architecture)
– databases
• Demonstrators in operation at the CMS test beam facilities
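As an illustration of the SOAP-based approach, a run-control client can post a small XML envelope over plain HTTP using only the Python standard library. The service URL, namespace and command name below are hypothetical, not the actual RCMS interface:

    import urllib.request

    # Hypothetical run-control endpoint; the real demonstrators expose SOAP services
    # inside Java Tomcat containers, as sketched on the next slide.
    RC_URL = "http://rc-node.example.org:8080/runcontrol/services/Executive"

    SOAP_ENVELOPE = """<?xml version="1.0" encoding="UTF-8"?>
    <soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
      <soap:Body>
        <startRun xmlns="urn:runcontrol">
          <runNumber>{run}</runNumber>
        </startRun>
      </soap:Body>
    </soap:Envelope>"""

    def start_run(run_number):
        """Send a startRun command and return the raw SOAP response."""
        req = urllib.request.Request(
            RC_URL,
            data=SOAP_ENVELOPE.format(run=run_number).encode("utf-8"),
            headers={"Content-Type": "text/xml; charset=utf-8",
                     "SOAPAction": "urn:runcontrol#startRun"},
        )
        with urllib.request.urlopen(req) as resp:
            return resp.read().decode("utf-8")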
Slide 29: RCMS Present Demonstrators
Figure: the present demonstrators are built in Java, with Java Tomcat containers (or Grid Services) communicating via SOAP and a MySQL database behind them.
Slide 30: Slow Control Trends
• Ethernet everywhere.
• Agata could be fully controlled through Ethernet connections, including the front-end electronics.
• This leads to a homogeneous network, avoiding bridges between buses, the software drivers needed to perform the bridging, etc.
• TINI system: embedded web server and embedded Java virtual machines on the electronics.
• Xilinx Virtex-II Pro.
• Embedded Java should guarantee a homogeneous development environment, portability, etc.
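With an embedded web server on every front-end board, slow control reduces to ordinary HTTP requests over the same Ethernet network. A minimal sketch; the board address and register naming are hypothetical, since no such interface is defined in this talk:

    import urllib.request

    def read_register(board_host, register):
        """Read one slow-control register from the board's embedded web server."""
        url = f"http://{board_host}/registers/{register}"
        with urllib.request.urlopen(url, timeout=2.0) as resp:
            return resp.read().decode("ascii").strip()

    def write_register(board_host, register, value):
        """Write a register value through the same hypothetical HTTP interface."""
        url = f"http://{board_host}/registers/{register}"
        req = urllib.request.Request(url, data=str(value).encode("ascii"), method="POST")
        with urllib.request.urlopen(req, timeout=2.0) as resp:
            return resp.status == 200

    # e.g. read_register("fee-slice-007.agata.example", "hv_setpoint")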
Slide 31: Agata Demonstrator (2003-2007)
Figure: scaled-down version of the on-line system for 15 detectors.
• Front-End F1-F15 and PSA Farm P1-P15 (Blade Center), connected through a 15 x 2 Ethernet switch.
• Builder Network and Event Builder B1-B2: 2 dual-processor servers + Myrinet (HPCC builder).
• Tracking Farm T1-T2 (Blade Center).
• Storage Area Network = iSCSI SAN, with data servers and an iSCSI disk array holding 16 SATA disks (8 + 8 TByte).
• Link speeds: 1 Gbps, 100 Mbps and >= 10 Gbps links.
Slide 32: Off-line Infrastructure
Data Production Center:
– on-line system
– on-line storage (1000 TByte)
– central archive?
Regional Computing Facilities:
– computing power for their own analyses
– on-line storage for 1-2 experiments
– local archive
• Exploit the LHC off-line infrastructure based on regional computing centers.
• Regional computing facilities (e.g. country-bounded) are linked to the data center via 10 Gbps links.
• All the computing facilities are based on PC farms.
• A typical experiment takes about 1 day to copy the entire data set.
• No tape copy.
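The "about 1 day" figure follows directly from the experiment size and the link speed quoted above; a quick check, taking the 100 TByte/experiment maximum from the on-line storage slide and an effective 10 Gbps end-to-end rate:

    data_bytes = 100e12   # 100 TByte per experiment (upper limit)
    link_bps = 10e9       # 10 Gbps link between data center and regional facility
    hours = data_bytes * 8 / link_bps / 3600
    print(f"{hours:.0f} hours")   # ~22 hours, i.e. about one day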
Slide 33: World Wide Farming: the GRID
• GRID is an emerging model of large-scale distributed computing. Its main aims are:
– transparent access to multi-petabyte distributed databases
– easy plug-in
– hidden complexity of the infrastructure
• GRID tools can significantly help the computing of the experiments, promoting the integration between the data centers and the regional computing facilities.
• GRID technology fits the Agata experiment off-line requests.
HEP-focused GRID initiatives: DataGrid (EU), GriPhyN (USA), PPDG (USA).
Slide 34: Summary
• Full adoption of digital electronics:
– increased on-line computing power needed to perform pulse shape analysis
– Digital Signal Processors (DSP) embedded in the front-end electronics
– commodity components such as PCs and networking devices
• Trigger-less system (dead-time free):
– time-stamp techniques are used to obtain dead-time-free operation
– on-line computing power is needed to correlate data by applying prompt or delayed coincidences (event building)
• On-line analysis:
– the tracking system needs O(10^5) SI95
• Storage:
– O(100) MB/s on the storage devices
– use of databases for archiving; advanced parallel servers are needed to follow the rate
• Off-line analysis using GRID techniques:
– data storage of O(1000) TB per year
– international collaborations: data distribution, regional computing centers
– GRID architecture
Slide 35: Conclusions
• No fundamental technological issues for the final Agata on-line system:
– The experiment requirements and the present understanding of the PSA algorithm fit with a final (2010) on-line system of moderate size (O(1000) machines). Even just a factor-3 improvement in the PSA calculations would lead to a much more manageable system (3 racks).
– Both the network and the event-builder requirements already fit with today's available technologies.
– The storage requirements (1000 TByte) fit with the evolution of storage technologies.
– On-line storage staging, high-bandwidth network data transfer and GRID technologies allow data distribution over the WAN; tape is used only for backup.
• Demonstrator:
– Same architecture as the final system, only scaled down to the foreseen number of detectors.