PHENIX Computing Center in Japan (CC-J)


CC-J : Computing Center in Japan
for RHIC Physics
Takashi Ichihara
(RIKEN and RIKEN BNL Research Center)
Presented on 28/09/2000 at RBRC Review at BNL
Contents
1. Overview
2. Concept of the system
3. System requirements
4. Other requirements as a regional center
5. Project plan and current status
6. Activity since last review
7. Plans for this year
8. Current Configuration of the CC-J
9. Components of the CC-J (photo)
10. CC-J Operation
11. Summary
PHENIX CC-J : Overview

PHENIX Regional Computing Center in Japan (CC-J) at RIKEN Wako

Scope
• Principal site of computing for PHENIX simulation
• The PHENIX CC-J aims to cover most of the simulation tasks of the whole PHENIX experiment
• Regional Asian computing center
• Center for the analysis of RHIC spin physics

Architecture
• Essentially follows the architecture of the RHIC Computing Facility (RCF) at BNL

Construction
• R&D for the CC-J started in April '98 at RBRC
• Construction began in April '99 over a three-year period
• The CC-J started operation at 1/3 scale in June 2000
Concept of the CC-J System
[Diagram: data flow between the PHENIX CC-J at RIKEN and the RCF at BNL.
At the CC-J: HPSS servers with an STK tape robot, a 15 TB big disk, PC farms (~10k SPECint95) for simulation and physics analysis, SMP servers, and a duplicating facility with tape drive units.
At the RCF: HPSS servers with an STK tape robot, a 40 TB big disk, SMP servers, and the CAS (physics analysis) and CRS (track reconstruction, 20 MB/s of PHENIX raw data) farms, with a matching duplicating facility.
DSTs and simulated data are imported/exported on tapes (50 GB/volume) between the duplicating facilities, with additional transfer over the APAN/ESnet WAN.]
System Requirements for the CC-J

Annual data amount
• DST: 150 TB
• micro-DST: 45 TB
• Simulated data: 30 TB
• Total: 225 TB

Hierarchical storage system
• Handles a data volume of 225 TB/year
• Total I/O bandwidth: 112 MB/s
• HPSS system

Disk storage system
• 15 TB capacity
• All-RAID system
• I/O bandwidth: 520 MB/s

CPU (SPECint95)
• Simulation: 8200
• Sim. reconstruction: 1300
• Sim. analysis: 170
• Theor. model: 800
• Data analysis: 1000
• Total: 11470 SPECint95 (= 120k SPECint2000)

Data duplication facility
• Export/import of DSTs and simulated data
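
As a quick consistency check (an illustrative sketch added here, not part of the original slides), the figures above can be tied together: the data volumes sum to 225 TB/year, the CPU estimates sum to 11470 SPECint95, and the planned tape bandwidth leaves ample headroom over the average ingest rate. The ~11 MB/s per-drive figure is an assumption inferred from 10 drives delivering 112 MB/s.

```python
# Illustrative consistency check of the CC-J sizing figures above.
# All inputs are taken from the slide; the interpretation is an assumption.

TB = 1e12  # bytes

# Annual data amount: DST + micro-DST + simulated data
annual_bytes = (150 + 45 + 30) * TB
print(f"Total data: {annual_bytes / TB:.0f} TB/year")        # 225 TB/year

# Average rate needed just to absorb 225 TB/year:
seconds_per_year = 365 * 24 * 3600
ingest = annual_bytes / seconds_per_year
print(f"Average ingest rate: {ingest / 1e6:.1f} MB/s")       # ~7.1 MB/s

# The 112 MB/s total tape I/O bandwidth (10 drives at ~11 MB/s each at
# full scale) leaves room to re-read the archive many times per year:
print(f"Tape bandwidth headroom: {112e6 / ingest:.0f}x")     # ~16x

# The CPU requirement is the sum of the per-task estimates:
tasks = {"simulation": 8200, "sim. reconstruction": 1300,
         "sim. analysis": 170, "theor. model": 800, "data analysis": 1000}
print(f"Total CPU: {sum(tasks.values())} SPECint95")         # 11470
```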
Other Requirements as a Regional Computing Center

Software environment
• The software environment of the CC-J should be compatible with the PHENIX offline software environment at the RHIC Computing Facility (RCF) at BNL
• AFS accessibility (/afs/rhic)
• Objectivity/DB accessibility

Data accessibility
• Needs to exchange 225 TB of data per year with the RCF
• Most of the data exchange will be done with SD3 tape cartridges (50 GB/volume)
• Some of the data exchange will be done over the WAN
• The CC-J will use the Asia-Pacific Advanced Network (APAN) for the US-Japan connection
• http://www.apan.net/
• APAN currently has 70 Mbps of bandwidth for the Japan-US connection
• 10-30% of the APAN bandwidth (7-21 Mbps) is expected to be usable for this project:
• 75-230 GB/day (27-82 TB/year) would be transferred over the WAN (see the sketch below)
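
The transfer estimates in the last two bullets are plain unit conversions; a minimal sketch (the 10-30% APAN share and the 50 GB SD3 cartridge capacity are the slide's own assumptions):

```python
# WAN and tape transfer estimates for the CC-J.
# Pure unit conversion; the 10-30% APAN share is the assumption
# stated on the slide, as is the 50 GB SD3 cartridge capacity.

APAN_MBPS = 70.0  # current Japan-US APAN bandwidth

def gb_per_day(mbps: float) -> float:
    """Sustained GB/day at a given Mbps rate."""
    return mbps / 8 * 86400 / 1000   # Mbps -> MB/s -> MB/day -> GB/day

for share in (0.10, 0.30):
    mbps = APAN_MBPS * share
    daily = gb_per_day(mbps)
    yearly = daily * 365 / 1000      # TB/year
    print(f"{mbps:4.1f} Mbps -> {daily:5.0f} GB/day -> {yearly:4.0f} TB/year")
#  7.0 Mbps ->    76 GB/day ->   28 TB/year
# 21.0 Mbps ->   227 GB/day ->   83 TB/year   (slide rounds to 75-230 GB/day)

# The remaining volume travels on SD3 cartridges:
cartridges = 225e3 / 50              # 225 TB/year at 50 GB/volume
print(f"{cartridges:.0f} cartridges/year (~{cartridges / 365:.0f} per day)")
```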
Project plan and current status of the CC-J
[Timeline chart; recoverable milestones:]
• Apr. 1998: R&D for the CC-J started at RBRC (BNL), covering the CC-J front end at BNL, a prototype of the CPU farms, and the data duplication facility
• Oct. 1998: CC-J Working Group formed
• Dec. 1998: CC-J review at BNL
• Mar. 1999: HPSS software/hardware installation (supplementary budget)
• Apr. 1999: CC-J construction started at RIKEN Wako (Phase 1)
• June 2000: CC-J starts operation at 1/3 scale
• Mar. 2001: Phase 2 complete (2/3 scale)
• Mar. 2002: Phase 3 complete (full-scale CC-J); CC-J operation continues alongside the PHENIX experiment at RHIC
Planned buildup:

                          Sep. 2000    Mar. 2001    Mar. 2002
CPU farm (number)              128          192          288
CPU farm (SPECint95)          4100         7100        13000
CPU farm (SPECint2000)         41k          71k         130k
Tape storage size (TB)         100          100          100
Disk storage size (TB)         3.5           10           15
Tape drives (number)             4            7           10
Tape I/O (MB/s)                 45           78          112
Disk I/O (MB/s)                100          400          600
SUN SMP server units             2            4            6
HPSS server units                5            5            5
Activity since the last review (the past 15 months)

Construction of the CC-J (Phases 1 and 2)
• Phase 1 and 1/3 of the Phase 2 hardware installed
• High Performance Storage System (HPSS) with a 100 TB tape library
• CPU farm of 128 processors (about 4000 SPECint95)
• 3.5 TB RAID disk, two Gigabit Ethernet switches, etc.
• A tape drive and a workstation installed at the BNL RCF for tape duplication
• Software environment for PHENIX computing: established
• AFS local mirroring, Linux software environment, LSF, etc.: ready to use

CC-J operation
• CC-J started operation in June 2000.
• 40 user accounts created so far.
• 100K-event simulation (130 GeV/nucleon, Au+Au min. bias) in progress for five configurations: (1) retracted geometry, zero field; (2) standard geometry, zero field; (3) full 2D field; (4) full 3D field; (5) half 3D field.
• Real raw data (2 TB) of the PHENIX experiment transferred to the CC-J via the WAN.
• Large ftp performance (641 KB/s = 5 Mbps) was obtained (RTT = 170 ms); see the sketch below.
• Data analysis by regional users is in progress.
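
To put the ftp figure in context (an illustrative calculation, not from the original slides): a single TCP stream is limited to roughly window/RTT, so the observed 641 KB/s over a 170 ms round trip implies a TCP window of about 110 KB, larger than the classic 64 KB limit and hence requiring window scaling (RFC 1323) or multiple parallel streams.

```python
# Why 641 KB/s is already a good single-stream rate across the Pacific.
# Illustrative only: assumes one TCP stream limited by window/RTT.

rtt = 0.170          # measured round-trip time in seconds (RIKEN <-> BNL)
throughput = 641e3   # observed ftp rate: 641 KB/s (~5 Mbps)

# Bandwidth-delay product: bytes that must be "in flight" to fill the pipe.
window = throughput * rtt
print(f"Implied TCP window: {window / 1e3:.0f} KB")          # ~109 KB

# Without window scaling (RFC 1323), TCP caps the window at 64 KB,
# which over this RTT would limit a single stream to:
print(f"64 KB window limit: {64e3 / rtt / 1e3:.0f} KB/s")    # ~376 KB/s
```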
Plans for this year

Hardware upgrade
• 6 TB RAID disk; tape drives and cache disks for HPSS
• 64 Linux CPU farm nodes; SUN servers for data mining and NFS service

System development
• Data duplicating facility (to be made operational)
• Objectivity/DB access method (to be established)

Simulation production
• 100k Au+Au simulations (continuing)
• Other simulations to be proposed by the PHENIX PWGs (spin, etc.)

Data analysis
• Official Data Summary Tapes (DSTs) will be produced at the RCF soon
• DSTs: transferred to the CC-J via the duplication facility (by tape)
• Micro-DST production (data mining) for data analysis
Current configuration of the CC-J
[System diagram, updated 25 Aug 2000, T. Ichihara (RIKEN/RBRC). Recoverable details:]
• CPU farm: 8 Alta cluster boxes (plus a control workstation) running RedHat 6.1: 32 Pentium II (450 MHz) + 32 Pentium III (600 MHz) + 32 Pentium III (700 MHz) + 32 Pentium III (850 MHz), 256 MB memory per CPU, on private addresses behind a Catalyst 2948G Gigabit switch with 100BaseT links to the nodes
• Data servers: two SUN E450s: an NFS server (4 CPUs, 1 GB memory; 100 GB disk, 288 GB RAID, 1.6 TB RAID x 2) and a G.C.E. server (2 CPUs, 1 GB memory; 100 GB disk)
• HPSS: servers on an IBM SP2 (AIX 4.3.2) with tape movers and disk movers, HPSS caches (150 GB RAID and 288 GB RAID x 2), SUN ACSLS control, and the STK tape robot (100 TB, 4 RedWood drives) on a HIPPI switch (EPS-1000); a 288 GB work RAID and an SP router (Ascend GRF) link to the RIKEN supercomputer
• Network: two Alteon 180 L3 Gigabit switches joined by 1000BaseSX jumbo-frame links (9 kB MTU, x 5 on the HPSS side), spanning floors 1F and 2F, with uplinks to the RIKEN LAN and the WAN
• AFS: an AFS server (AFS01) on a Compaq DS20 with FC-attached disk
Components of the PHENIX CC-J at RIKEN (photo)
• StorageTek (STK) tape robot (100 TB installed; expandable to 240-250 TB)
• HPSS server (IBM RS-6000/SP)
• 3.2 TB RAID5 disk
• Two SUN E450 data servers
• Uninterruptible power supply (UPS)
• CPU farm of 128 CPUs
CC-J Operation

Operation, maintenance, and development of the CC-J are carried out under the charge of the CC-J Planning and Coordination Office (PCO).

Planning and Coordination Office
• Manager: T. Ichihara (RIKEN and RBRC)
• Technical manager: Y. Watanabe (RIKEN and RBRC)
• Scientific programming coordinators: H. En'yo (Kyoto and RBRC, PHENIX-EC), H. Hamagaki (CNS-U-Tokyo, PHENIX-EC)
• PHENIX liaison: N. Saito (RIKEN and RBRC)
• Computer scientists: N. Hayashi (RIKEN), S. Yokkaichi (RIKEN), Y. Goto (RBRC), S. Sawada (KEK)

Technical Management Office
• Manager, front end at BNL: Y. Watanabe (RIKEN and RBRC)
• System engineer: N. Otaki (IBM Japan)
• Tape duplication operator: (TBD)
Summary

• The construction of the PHENIX Computing Center in Japan (CC-J) at the RIKEN Wako campus, which will extend over a three-year period, began in April 1999.
• The CC-J is intended as the principal site of computing for PHENIX simulation, a regional PHENIX Asian computing center, and a center for the analysis of RHIC spin physics.
• The CC-J will handle about 220 TB of data per year, and the total CPU performance is planned to reach 10k SPECint95 (100k SPECint2000) in 2002.
• The CPU farm of 128 processors (RedHat 6.1, kernel 2.2.14/16 with NFSv3) is stable.
• Copying data over the WAN: large ftp performance (641 KB/s = 5 Mbps) was obtained across the Pacific (RTT = 170 ms).
• The CC-J started operation in June 2000 at 1/3 scale; 39 user accounts have been created.
• The 100K-event simulation project started in September 2000.
• Part of the real raw data (about 2 TB) of the PHENIX experiment has been transferred to the CC-J via the WAN, and data analysis by regional users is in progress.