Computing in the PHENIX experiment and our
experience in Korea
Presented by H.J. Kim
Yonsei University
On behalf of the HEP Data
Grid Working Group
The Second International Workshop on HEP Data Grid
CHEP, KNU Aug. 22-23, 2003
RHIC
• Configuration: two concentric superconducting magnet rings
(3.8 km circumference) with 6 interaction regions
• Ion beams:
Au + Au (or d + A)
√s = 200 GeV/nucleon
luminosity = 2 × 10²⁶ cm⁻² s⁻¹
• Polarized protons:
p + p
√s = 500 GeV
luminosity = 1.4 × 10³¹ cm⁻² s⁻¹
• Experiments: PHENIX, STAR, PHOBOS, BRAHMS
PHENIX Experiment
4 spectrometer arms
12 Detector subsystems
350,000 detector channels
Event size typically 90 KB
Event rate 1.2-1.6 kHz typical, 2 kHz observed
100-130 MB/s typical data rate
Expected duty cycle ~50% → ~4 TB/day
We store the data in STK tape silos
with HPSS
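As a rough cross-check of these figures, the quoted ~4 TB/day follows from the event size, the event rate, and the ~50% duty cycle; a minimal sketch of the arithmetic (using 1.2 kHz, the lower end of the quoted range):

```python
# Back-of-the-envelope check of the raw-data figures quoted above.
EVENT_SIZE_KB = 90        # typical event size
EVENT_RATE_HZ = 1200      # lower end of the 1.2-1.6 kHz range
DUTY_CYCLE = 0.5          # expected fraction of time spent taking data

rate_mb_s = EVENT_SIZE_KB * EVENT_RATE_HZ / 1024.0            # ~105 MB/s, within 100-130 MB/s
daily_tb = rate_mb_s * DUTY_CYCLE * 86400 / (1024.0 ** 2)     # seconds per day -> TB/day
print(f"~{rate_mb_s:.0f} MB/s instantaneous, ~{daily_tb:.1f} TB/day at 50% duty cycle")
```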
Physics Goals
Search for Quark-Gluon Plasma
Hard Scattering Processes
Spin Physics
RCF
• The RHIC Computing Facility (RCF) provides computing facilities for
the four RHIC experiments (PHENIX, STAR, PHOBOS, BRAHMS).
• RCF typically receives ~30 MB/s (a few TB/day) from the PHENIX
counting house alone over a Gigabit network, so it needs
sophisticated data storage and data handling systems.
• RCF has established an AFS cell for sharing files with remote
institutions; NFS is the primary means by which data are made
available to users at the RCF.
• A similar facility has been established at RIKEN (CC-J) as a
regional computing center for PHENIX.
Grid Configuration concept in RCF
[Diagram: grid job requests arrive from the Internet at a gatekeeper/job
manager, which submits grid jobs to an LSF grid cluster (LSF servers 1
and 2); the cluster is backed by HPSS and ~620 GB of disk, with data
coming in at ~30 MB/s.]
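For illustration: a job arriving at the gatekeeper is ultimately handed to LSF, which from the user's side looks like an ordinary batch submission. A minimal sketch, assuming LSF's standard bsub command is available; the queue name and job script below are hypothetical:

```python
import subprocess

# Hypothetical LSF submission, roughly what the gatekeeper's job manager
# would do on behalf of an incoming grid request.  The queue "phenix_cas"
# and the script "run_reco.sh" are invented for illustration.
subprocess.run(
    ["bsub", "-q", "phenix_cas", "-o", "reco_%J.log", "./run_reco.sh"],
    check=True,
)
```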
PHENIX Computing Environment
• Linux OS with the ROOT framework
• PHOOL (PHenix Object Oriented Library), a C++ class library
[Diagram: raw data flows from the counting house into HPSS; mining and
staging bring raw data onto a large disk for the reconstruction farm,
which reads calibrations and run info from the database, writes DST data
to a big disk, and fills a tag DB; analysis jobs then run over the DSTs
using local disks.]
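The analysis jobs at the end of this chain are ROOT-based. A minimal sketch of reading a DST with ROOT, here through ROOT's Python bindings; the file name, tree name ("T"), and branch ("ntrack") are hypothetical, and a real PHENIX analysis would go through the PHOOL node tree rather than raw TTree access:

```python
import ROOT  # ROOT's Python bindings (PyROOT); assumed to be installed

# Hypothetical DST file and tree/branch names, for illustration only.
f = ROOT.TFile.Open("dst_run12345.root")
tree = f.Get("T")

h = ROOT.TH1F("h_ntrack", "tracks per event", 100, 0, 500)
for event in tree:          # iterate over the entries of the TTree
    h.Fill(event.ntrack)    # fill histogram from the (hypothetical) branch

h.Draw()
```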
PHENIX DAQ Room
A picture from the PHENIX Webcam
PHENIX Computing Resources
Locally in the counting house:
• ~110 Linux machines of different ages (26 dual 2.4 GHz, lots of
900-1600 MHz machines), loosely based on RedHat 7.3
• Configuration, reinstallation, very low maintenance
• 8 TB of local disk space
At the RHIC Computing Facility:
• 40 TB of disk (PHENIX's share)
• 4 tape silos with 420 TB each (no danger of running out of space here)
• ~320 fast dual-CPU Linux machines (PHENIX's share)
Run ‘03 highlights
• Replaced essentially all previous Solaris DAQ machines with Linux PCs
• Went to Gigabit Ethernet for all the core DAQ machines
• Tripled the CPU power in the counting house for online calibrations
and analysis
• Boosted connectivity to the HPSS storage system to 2 × 1000 Mbit/s
(190 MB/s observed)
• Data taken: ~¼ PB/year
• Raw data in Run-II: Au+Au 100 TB, d+Au 100 TB (100 days), p+p 10 TB;
Run-III p+p: 35 TB
• Micro-DST (MDST), ~15% of the raw data: Au+Au and d+Au 15 TB;
p+p 1.5 TB (Run-II), 5 TB (Run-III)
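As a quick consistency check of these figures (a sketch using only the numbers quoted above): the Run-II raw-data samples add up to roughly the ~¼ PB/year quoted, and two gigabit links give a theoretical ceiling of ~250 MB/s, of which the observed 190 MB/s is about three quarters.

```python
# Consistency check of the Run-II data volume and HPSS link figures above.
raw_tb = {"Au+Au": 100, "d+Au": 100, "p+p": 10}     # Run-II raw data, TB
total_pb = sum(raw_tb.values()) / 1024.0
print(f"Run-II raw data: ~{total_pb:.2f} PB (quoted as ~1/4 PB/year)")

link_mb_s = 2 * 1000 / 8.0                          # 2 x 1000 Mbit/s -> 250 MB/s ceiling
print(f"HPSS link ceiling ~{link_mb_s:.0f} MB/s; "
      f"190 MB/s observed (~{190 / link_mb_s:.0%} of wire speed)")
```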
PHENIX upgrade plans
Locally in the counting house:
• Go to Gigabit (almost) everywhere (replace ATM)
• Use compression
• Replace older machines, reclaim floor space, gain another factor
of 2 in CPU and disk
• Get cheap commodity disk space
In general:
• Move to gcc 3.2, loosely based on some RedHat 8.x flavor
• We have already augmented Objectivity with convenience databases
(Postgres, MySQL), which will play a larger role in the future
• Network restructuring
Network data transfer at CHEP, KNU
● Real-time data transfer between CHEP (Daegu, Korea) and CCJ (RIKEN,
Japan): 2 TB of physics data has been transferred in a single bbftp
session, and we observed
maximum: 200 GB/day
typical: 100 GB/day
● 200 GB/day between CCJ and RCF (BNL, USA) is known; a comparable
speed is expected between CHEP and RCF.
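For scale (a sketch using only the rates quoted above): 100-200 GB/day corresponds to an average of roughly 1-2 MB/s, so moving the 2 TB sample takes on the order of 10-20 days.

```python
# What the observed bbftp rates between CHEP and CCJ mean in practice.
SECONDS_PER_DAY = 86400.0
SAMPLE_TB = 2.0                                   # data set quoted above

for label, gb_per_day in [("typical", 100), ("maximum", 200)]:
    avg_mb_s = gb_per_day * 1024.0 / SECONDS_PER_DAY
    days = SAMPLE_TB * 1024.0 / gb_per_day
    print(f"{label}: {gb_per_day} GB/day ~ {avg_mb_s:.1f} MB/s average, "
          f"2 TB in ~{days:.0f} days")
```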
Mass storage at KNU for PHENIX
Mass storage (HSM at KNU):
Used efficiently by PHENIX.
2.4 TB assigned to PHENIX out of a total of 50 TB;
slightly over 2 TB used by PHENIX so far.
Experience:
Reliable storage up to 2 TB (5 TB possible?).
Optimized usage is under study.
Computing nodes at KT, Daejeon
10 nodes (CPU: Intel Xeon 2.0 GHz, memory: 2 GB each)
Centralized management of the analysis software is possible by
NFS-mounting the CHEP disk on the cluster, supported by the fast
network between KT and CHEP.
CPU usage for the PHENIX analysis is always above 50%.
Experience:
The network between CHEP and KT is satisfactory; no difference
from computing on a local cluster is seen.
[Diagram: the KT nodes (cluster0-cluster9) NFS-mount scripts, the
software library, and data from the CHEP17 server and the 2.5 TB HSM
at CHEP, over a network carrying ~100 GB/day.]
Yonsei Computing Resources for PHENIX
• Linux (RedHat 7.3)
• ROOT framework
[Diagram: raw data and DSTs come from CHEP over a 100 Mbps link onto a
1.3 TB disk (Linux RAID tools); Pentium 4 machines running the PHENIX
library perform reconstruction and analysis jobs, using calibrations,
run info, and a tag DB from the database.]
IDE RAID storage system R&D at Yonsei
● IDE RAID5 system with Linux
● 8 disks (1 for parity) × 180 GB = 1.3 TB of storage
● Total cost $3000 (interface card + hard disks + PC)
● Write ~100 MB/s and read ~150 MB/s
● So far no serious problems have been experienced
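The usable capacity and cost per gigabyte follow directly from the RAID5 layout (one disk's worth of parity spread across the array); a minimal sketch of the arithmetic:

```python
# Usable capacity and cost of the Yonsei IDE RAID5 array described above.
n_disks, disk_gb, total_cost_usd = 8, 180, 3000
usable_gb = (n_disks - 1) * disk_gb               # RAID5 keeps one disk's worth of parity
print(f"{n_disks} x {disk_gb} GB in RAID5 -> {usable_gb} GB "
      f"(~{usable_gb / 1000.0:.1f} TB usable), "
      f"~${total_cost_usd / usable_gb:.2f} per usable GB")
```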
Summary
* The computing facility of the PHENIX experiment and its current
status have been reported.
It handles large data volumes (~¼ PB/year) well.
There is a movement toward the HEP Grid.
* We investigated parameters relevant to HEP data Grid computing for
PHENIX at CHEP:
1. Network data transfer: real-time transfer speeds of ~100 GB/day
between the major research facilities at CHEP (Korea), CCJ (Japan),
and RCF (USA)
2. Mass storage (2.5 TB, HSM at CHEP)
3. 10 computing nodes (at KT, Daejeon)
Data analysis is in progress (Run-II p-p MDST).
The overall analysis experience has been satisfactory.
* IDE RAID storage R&D is under way at Yonsei University.