ILC_3: Distributed Computing Toward ILC

ILC_3: DISTRIBUTED COMPUTING TOWARD ILC (PROPOSAL)
CC-IN2P3 and KEK Computing Research Center (KEK-CRC)
Hiroyuki Matsunaga (KEK)
2014 Joint Workshop of the TYL/FJPPL and FKPPL
26-28 May, 2014, in Bordeaux
Background
• CC-IN2P3 and KEK-CRC worked together on Grid computing within FJPPL
  • 2007-2012: Comp_3: Grid Interoperability and Data Management
    • EGEE – EGI (gLite) in Europe, and NAREGI – RENKEI in Japan
    • SAGA (Simple API for Grid Applications) as a Grid standard to address the interoperability problem
    • SRB / iRODS Data Grid
  • The project was successful
    • Useful to improve development and deployment
    • Exchanging ideas and experiences is valuable
• Grid computing works well in particle physics and other fields
  • CC-IN2P3 is a Tier-1 center for WLCG
  • KEK-CRC will be the “Tier-0” center for the Belle II experiment, which starts taking data soon
Background (cont.)
• The ILC project is progressing well
  • The Technical Design Report (TDR) was completed last year
  • The candidate site (Kitakami) has been decided in Japan
• It is not too early to start considering the computing design for the ILC
  • It should be sustainable, to support a long-term project that will last for more than a decade (like the LHC)
  • It should adopt “standard” technologies as much as possible
    • Grid is being replaced with Cloud: agile and (ideally) lower deployment cost
    • The network is still improving, but it is not easy to exploit the available bandwidth efficiently
    • More virtualization: Software-Defined Networking (SDN), Storage (SDS), and Data Center (SDDC)?
The Aim of this project
• R&D on distributed computing for future large projects like the ILC
  • Evaluate and gain experience with emerging technologies such as cloud
  • More studies on data storage and data transfer
  • Might be useful for WLCG and Belle II
• Exchange ideas and knowledge between the two centers
  • Re-establish the collaboration beyond the previous Comp_3 project
  • Long-term collaboration has benefits, especially when the ILC is realized
Members
French Group (CC-IN2P3): P-E. Macchi, B. Delaunay, L. Caillat-Vallet, J-Y. Nief, V. Hamar, M. Puel
Japanese Group (KEK-CRC): H. Matsunaga, G. Iwai, K. Murakami, T. Sasaki, W. Takase, Y. Watase, S. Suzuki
Cloud computing
• Widely used in the commercial sector
• Some HEP labs have already deployed IaaS (Infrastructure as a Service) clouds
• An IaaS cloud allows us to
  • Provision (compute) resources more dynamically (see the sketch after this list)
  • Use resources more flexibly
    • The system environment can be adjusted according to use cases
  • Optimize resource usage (in a rather opportunistic way)
  • Ease system administration
• Cloud management itself would not be easy
  • Quite different from conventional system management
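A minimal sketch of what "provisioning compute resources dynamically" looks like against an IaaS cloud, using the openstacksdk Python client as one possible tool (the cloud entry, image, flavor, and network names are placeholders, not the actual KEK or CC-IN2P3 setup):

```python
# Minimal sketch (not the KEK or CC-IN2P3 setup): provisioning a worker VM on an IaaS
# cloud through the OpenStack API, using the openstacksdk Python client as one possible
# tool. The cloud entry, image, flavor and network names are placeholders.
import openstack

conn = openstack.connect(cloud="test-cloud")        # hypothetical clouds.yaml entry

image = conn.compute.find_image("sl6-worker")       # placeholder VM image
flavor = conn.compute.find_flavor("m1.large")       # placeholder flavor (CPU/RAM size)
network = conn.network.find_network("private")      # placeholder tenant network

server = conn.compute.create_server(
    name="batch-worker-001",
    image_id=image.id,
    flavor_id=flavor.id,
    networks=[{"uuid": network.id}],
)
server = conn.compute.wait_for_server(server)       # block until the VM is ACTIVE
print(server.name, server.status)
```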
Cloud experience at KEK
• No strong needs from users so far
• Set up a small test system and performed tests
  • ISF + LSF7 from Platform / IBM
    • We chose a commercial solution for better support and integration with the batch scheduler
    • Will migrate to PCMAE (the successor of ISF) + LSF9 this fall
  • Another small instance using OpenStack
    • The most popular free software, actively developed
    • Strong support by major IT companies
    • Many HEP sites try to deploy OpenStack
• KEK’s main computer system will be replaced next summer
  • Cloud will be integrated into the next system for production use
Cloud at CC-IN2P3
• Already deployed using OpenStack
  • For compute and testing/development systems
  • Hundreds of CPU cores are provisioned
• Target projects for compute resources
  • Large Synoptic Survey Telescope (LSST, http://www.lsst.org): Astronomy
  • Euclid (http://www.euclid-ec.org/): Astronomy / Astrophysics (Dark Energy)
  • ATLAS MC simulation, HammerCloud validation
How to use the cloud
• Integrate with a batch scheduler
  • No more work needed for ISF / PCMAE & LSF
  • OpenStack & Univa Grid Engine
• Direct use of cloud interfaces
  • Amazon EC2, OpenStack Nova
• Use middleware that has cloud interfaces
  • DIRAC (VMDIRAC)
  • ILC and Belle II go in this direction (see the sketch after this list)
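As a rough illustration of the DIRAC route (an assumed workflow, not taken from the slides), a job can be described and submitted through the DIRAC Python API, and VMDIRAC can then dispatch it to cloud-provisioned VMs; the job name and executable below are placeholders:

```python
# Minimal sketch (an assumed workflow, not taken from the slides): describing and
# submitting a job with the DIRAC Python API; VMDIRAC can then run such jobs on
# cloud-provisioned VMs. Requires a configured DIRAC client and a valid proxy.
from DIRAC.Core.Base import Script
Script.parseCommandLine(ignoreErrors=True)   # initialize the DIRAC environment

from DIRAC.Interfaces.API.Dirac import Dirac
from DIRAC.Interfaces.API.Job import Job

job = Job()
job.setName("ilc-sim-test")              # placeholder job name
job.setExecutable("run_simulation.sh")   # placeholder user script shipped with the job
job.setCPUTime(3600)                     # requested CPU time in seconds

result = Dirac().submitJob(job)
print(result)                            # {'OK': True, 'Value': <job ID>} on success
```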
Cloud R&D in this project
• Exchange experience and knowledge
  • VM image creation and management, customizations
  • VM-job scheduling
  • Network configuration
    • IP address management
      • IPv6 migration/addition in the near future: many addresses available
    • NaaS (Network as a Service)
  • Storage integration
  • Authentication/Authorization, Accounting, Information system
• Federation of the cloud resources
  • Submitting jobs using DIRAC and the direct APIs
  • Interoperability between heterogeneous cloud resources (see the sketch after this list)
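One possible way to hide the differences between heterogeneous cloud resources is a common client library. The sketch below (an illustration, not a tool choice made in the slides) uses Apache Libcloud to talk to an OpenStack site and Amazon EC2 through the same interface; credentials, endpoints, and regions are placeholders:

```python
# Minimal sketch (an illustration, not a tool choice made in the slides): Apache Libcloud
# as one way to drive heterogeneous cloud resources (here an OpenStack site and Amazon EC2)
# through a single Python interface. Credentials, endpoints and regions are placeholders.
from libcloud.compute.types import Provider
from libcloud.compute.providers import get_driver

# OpenStack site (e.g. a local test cloud)
OpenStackCls = get_driver(Provider.OPENSTACK)
openstack_site = OpenStackCls(
    "demo_user", "demo_password",                          # placeholder credentials
    ex_force_auth_url="https://cloud.example.org:5000",    # placeholder Keystone endpoint
    ex_force_auth_version="2.0_password",
    ex_tenant_name="demo",
)

# Amazon EC2 site
EC2Cls = get_driver(Provider.EC2)
ec2_site = EC2Cls("ACCESS_KEY", "SECRET_KEY", region="us-east-1")   # placeholder keys

# The same code path then works against either provider.
for site in (openstack_site, ec2_site):
    print(site.name, [node.name for node in site.list_nodes()])
```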
iRODS
• iRODS (integrated Rule-Oriented Data System) is a lightweight data storage system
  • Rule-based data management with a metadata catalogue (basic usage sketched after this list)
  • A versatile system that can be used by many communities according to their needs
    • Various policies for data management and replication can be implemented
  • Easier to administer than SRM storage systems
    • iRODS currently lacks an SRM interface
  • Developed by Prof. Reagan Moore of the Univ. of North Carolina and his team
• Both CC-IN2P3 and KEK have expertise with iRODS (and also its predecessor, SRB)
  • Deployed as an additional data storage system
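For orientation, the sketch below shows the kind of day-to-day operations iRODS handles, driven through the standard iCommands; the zone, path, and resource names are placeholders, not the actual KEK or CC-IN2P3 configuration:

```python
# Minimal sketch (illustration only, with placeholder zone, path and resource names):
# upload a file into iRODS, attach catalogue metadata, and replicate it to a second
# resource, using the standard iCommands.
import subprocess

def run(*cmd):
    """Run an iCommand and raise if it fails."""
    subprocess.run(cmd, check=True)

local_file = "run001.dat"                        # placeholder local data file
logical_path = "/demoZone/home/ilc/run001.dat"   # placeholder iRODS logical path

run("iput", local_file, logical_path)                    # upload into the zone
run("imeta", "add", "-d", logical_path, "run", "001")    # add metadata to the iCAT catalogue
run("irepl", "-R", "archiveResc", logical_path)          # replicate to a second resource
run("ils", "-l", logical_path)                           # list the replicas
```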
iRODS service at KEK
• 4 iRODS servers
• 2 DB (iCAT) servers
  • PostgreSQL, active-standby
• Data storage: GPFS + HPSS
• Clients
  • Command line (iCommands)
  • GUI (JUX, Java Universal eXplorer)
  • Web (rodsweb)
• Supported projects
  • MLF (Materials and Life Science Experimental Facility at J-PARC): 140 TB, 10M files
  • T2K (Tokai-to-Kamioka) experiment: 10 TB, 800 files
    • For data quality at the near detectors
    • Federation with Queen Mary University of London (QMUL)
Use Case in T2K
[Diagram: data flow from the client through the iRODS server and its DB to local disk, GPFS, and HPSS; small files are bundled into tar files]
• Rules (a manual equivalent is sketched below)
  • Bundle small files into large tar files
  • Copy each tar file to GPFS, then replicate it to HPSS
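In production these steps run as server-side iRODS rules; the sketch below just reproduces the same workflow by hand with iCommands, using placeholder collection, bundle, and resource names:

```python
# Minimal sketch (a manual equivalent of the server-side rules, with placeholder
# collection, bundle and resource names): bundle the small files of one collection
# into a tar file with "ibun", then replicate the bundle to the HPSS-backed resource.
import subprocess

def run(*cmd):
    subprocess.run(cmd, check=True)

collection = "/demoZone/home/t2k/run123"      # placeholder collection of small files
bundle = "/demoZone/bundles/run123.tar"       # placeholder tar object in iRODS

run("ibun", "-c", "-Dtar", bundle, collection)   # create the tar bundle on the default (GPFS) resource
run("irepl", "-R", "hpssResc", bundle)           # replicate the bundle to the HPSS resource
run("ils", "-l", bundle)                         # show both replicas
```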
iRODS at CC-IN2P3
• Used in a multidisciplinary environment
  • High Energy Physics: BaBar (~2 PB), Double Chooz (600 TB)
  • Astrophysics: AMS
  • Biology and biomedical applications
  • Arts and Humanities
• 10+ data servers + 2 iCAT servers
  • Oracle as the backend DB
• Mass storage: HPSS
• Backup system: TSM
iRODS federation
• Federation of iRODS between KEK and CC-IN2P3
  • Basically worked in the previous FJPPL project
• Possible new federation with gravitational wave experiments
  • The KAGRA project will start operation in the near future
    • At Kamioka, Japan
    • KEK helps set up the distributed computing environment
  • CC-IN2P3 supports Virgo, a European experiment, using iRODS
  • For gravitational wave detection, it is very important to cross-check results between the experiments
    • Data should be shared by the experiments for that
  • Good opportunity for iRODS federation (see the sketch after this list)
    • Could be a precursor for the ILC
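The sketch below (with placeholder zone and path names, not the real KEK / CC-IN2P3 setup) shows what federation buys at the user level: once two zones are federated by their administrators, ordinary iCommands can address logical paths in the remote zone directly, for example to share a data set across sites:

```python
# Minimal sketch (placeholder zone and path names, not the real KEK / CC-IN2P3 setup):
# once two iRODS zones are federated, ordinary iCommands can address logical paths in
# the remote zone directly, e.g. to share a data set across sites.
import subprocess

def run(*cmd):
    subprocess.run(cmd, check=True)

local_obj = "/KekZone/home/kagra/segment001.gwf"   # placeholder object in the local zone
remote_col = "/In2p3Zone/home/kagra#KekZone"       # placeholder collection in the federated zone

run("ils", "/In2p3Zone/home")        # browse the remote zone through the federation
run("icp", local_obj, remote_col)    # copy the data object across zones
```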
Data transfer
• High-performance data transfer over the WAN is a long-standing issue
  • Necessary to tune the OS/application configuration at both ends (see the buffer-size sketch after this list)
• As for the France-Japan network connection:
  • High latency (300 ms RTT), going through the US
  • Academic link shared by many users
    • Directional asymmetry in the throughput
    • Varying in time
• Network technology is improving
  • Available bandwidth is increasing
    • 40 Gbit or 100 Gbit foreseen for the WAN in the next few years
  • SDN
• Important to check network performance continually
  • Beneficial for other experiments too
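To make the 300 ms figure concrete, the sketch below works out the bandwidth-delay product that the TCP buffers at both ends must cover on such a path; the per-stream target rates are assumptions for illustration:

```python
# Minimal sketch (the target rates are assumptions for illustration): on a 300 ms RTT
# path, the TCP socket buffers at both ends must cover at least the bandwidth-delay
# product, otherwise a single stream cannot fill the link.
def bandwidth_delay_product(bandwidth_gbps, rtt_ms):
    """Return the bandwidth-delay product in bytes."""
    return int(bandwidth_gbps * 1e9 / 8 * rtt_ms / 1e3)

RTT_MS = 300.0                 # France-Japan round-trip time via the US
for gbps in (1, 10):           # assumed per-stream target rates
    bdp = bandwidth_delay_product(gbps, RTT_MS)
    print(f"{gbps:>2} Gbit/s x {RTT_MS:.0f} ms -> buffer >= {bdp / 1e6:.1f} MB")
# 10 Gbit/s x 300 ms already needs a ~375 MB TCP window, far above typical Linux
# defaults (net.core.rmem_max / wmem_max, net.ipv4.tcp_rmem / tcp_wmem).
```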
Summary
• This project aims at studying aspects of distributed computing beyond the current Grid environment
• Cloud is maturing, and we want to evaluate it for future large-scale deployment
• Data storage and data transfer will remain important issues for a long time
• Of particular importance for distributed computing is to work together and communicate with each other
  • Face-to-face meetings are important