WAN RAW/ESD Data Distribution for LHC

27.05.2004
Bernd Panzer-Steindel, CERN/IT
T0  T1 dataflow
T0 Mass Storage recording of the RAW data from the 4 LHC experiments
T0 First ESD production
RAW data and ESD export to the Tier1 centers
one copy of the RAW data spread over the T1 centers of an experiment
several copies of the ESD data sets (3-6), experiment dependent
ESD size ~= 0.5 * RAW data (each T1 2/3 or one copy of the ESD)
 ~10PB per year
(requirements from the latest discussions with the 4 experiments)
T1T0 Data import (new ESD versions, MC data, AOD, etc.)
near real time export to the Tier1 centers during LHC running (200 days per
year) + ALICE heavy ion data during the remaining 100 days
data transfers are between mass storage systems
near real time == from disk cache  sizing of the tape system == minimal data recall from tape
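The export figures above can be turned into a sustained rate with simple arithmetic. A minimal sketch (the headroom factor at the end is an assumption, not from the slides) of what ~10 PB over 200 running days implies:

```python
# Back-of-envelope sketch of the sustained T0 export rate implied by the
# figures above (~10 PB/year, exported over ~200 LHC running days).
# The headroom factor applied at the end is an assumption.

PB = 1e15                      # bytes per petabyte (decimal convention)
volume_bytes = 10 * PB         # ~10 PB of RAW + ESD exported per year
running_seconds = 200 * 86400  # 200 running days, in seconds

avg_rate_gbit = volume_bytes * 8 / running_seconds / 1e9
print(f"average export rate: {avg_rate_gbit:.1f} Gbit/s")  # ~4.6 Gbit/s

# Real provisioning must also cover peaks, retransmits, protocol overhead
# and concurrent reprocessing traffic; the (assumed) headroom factor shows
# how the average grows toward the aggregate figures discussed below.
headroom = 10                  # assumed, not from the slides
print(f"with headroom: {avg_rate_gbit * headroom:.0f} Gbit/s")
```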
Network
There are currently 7+ Tier1 centers
(RAL, Fermilab, Brookhaven, Karlsruhe, IN2P3, CNAF, PIC,…)
The T0 export requirements need at least a 10 Gbit/s link per Tier1
(plus more if one includes Tier1-Tier2 communication)
The CERN T0 therefore needs at least a 70 Gbit/s aggregate connection
the achievable link efficiency is still unknown
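The aggregate figure follows directly from the per-T1 requirement; a small sketch, where the efficiency factor is an assumption (the slide notes it is still unknown):

```python
# Sketch of the T0 aggregate uplink requirement from the per-T1 figures
# on this slide. The efficiency value is an assumption.

n_tier1 = 7        # RAL, Fermilab, Brookhaven, Karlsruhe, IN2P3, CNAF, PIC
per_t1_gbit = 10   # required per-T1 link, Gbit/s

nominal_total = n_tier1 * per_t1_gbit
print(f"minimum T0 aggregate: {nominal_total} Gbit/s")  # 70 Gbit/s

# If only a fraction of nominal capacity is usable in practice, the
# provisioned capacity must grow accordingly.
efficiency = 0.5   # assumed usable fraction of nominal link capacity
print(f"at {efficiency:.0%} efficiency: {nominal_total / efficiency:.0f} Gbit/s")
```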
We need to start Service Data Challenges, which should test/stress all necessary layers for these large, continuous data transfers:
- network hardware: circuit switching versus packet switching, QoS
- transport: TCP/IP parameters, new implementations
- transfer mechanisms: GridFTP
- mass storage systems: ENSTORE, CASTOR, HPSS, etc.
- coupling to the mass storage systems: SRM 1.x
- replication system
- data movement service (control and bookkeeping layer)
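The "TCP/IP parameters" item above largely comes down to buffer sizing for long, fat pipes. A minimal bandwidth-delay-product sketch, where the link speed and round-trip time are illustrative assumptions:

```python
# Bandwidth-delay product: the TCP window needed to keep a long, fat
# pipe full. Link speed and RTT below are illustrative assumptions.

link_gbit = 10   # assumed 10 Gbit/s T0-T1 link
rtt_ms = 100     # assumed transatlantic round-trip time

bdp_bytes = link_gbit * 1e9 / 8 * (rtt_ms / 1000)
print(f"TCP window needed: {bdp_bytes / 1e6:.0f} MB")  # 125 MB
```

Default TCP buffers of the era were orders of magnitude smaller than this, which is why parameter tuning and new TCP implementations appear as an explicit test item.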
Key points :
resilience and error-recovery !!
resilience and error-recovery !!
resilience and error-recovery !!
modular layers
simplicity
performance
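A minimal sketch of what the resilience/error-recovery emphasis means in practice: retrying a failed transfer with exponential backoff and jitter. The `do_transfer` callable is hypothetical (standing in for, e.g., one GridFTP invocation); this is an illustration, not the actual data-movement service.

```python
import random
import time

def transfer_with_retry(do_transfer, max_attempts=5, base_delay=1.0):
    """Call `do_transfer` until it succeeds, backing off between failures.

    `do_transfer` is a hypothetical callable wrapping a single transfer
    attempt; it must raise an exception on failure.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return do_transfer()
        except Exception:
            if attempt == max_attempts:
                raise  # permanent failure: hand off to the bookkeeping layer
            # exponential backoff with jitter avoids synchronized retry storms
            delay = base_delay * 2 ** (attempt - 1) * (1 + random.random())
            time.sleep(delay)
```

Keeping this layer separate from the transfer mechanism itself is one way to get the modularity and simplicity the slide asks for.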
Proposed timescales and scheduling (mid-year and end-of-year milestones, 2004-2006)

2004
- 10 Gbit "end-to-end" tests with Fermilab
- First version of the LHC Community Network proposal
- 10 Gbit "end-to-end" test completed with a European partner

2005
- Measure performance variability and understand H/W and S/W issues at ALL sites
- Document circuit-switched options and costs; first real test if possible
- Circuit/packet switch design completed
- LHC Community Network proposal completed
- All T1 fabric architecture documents completed
- LCG TDR completed
- Sustained throughput test achieved to some sites: 2-4 Gb/s for 2 months; H/W and S/W problems solved

2006
- Sustained throughput tests achieved to most sites
- All CERN bandwidth provisioned
- All T1 bandwidth in production (10 Gbit links)
- Verified performance to all sites for at least 2 months
These WAN service data challenges need dedication of
- material: CPU servers, disk servers, tape drives, etc.
- services: HSM, load-balanced GridFTP, network
- personnel: for running the DC, debugging, tuning, software selection and tests
at the T0 and the different T1 centers

Material and personnel must be dedicated for longer time periods:
months, not weeks!
This is important for gaining the necessary experience; only 2 years remain to reach a reliably working system worldwide (T1 - T0 network).

To be watched: interference with ongoing productions (HSM, WAN capacity, etc.)

We need to start now with a more detailed plan, and start to 'fill' the network right now!
Challenging, interesting and very important.
Who is participating, when, and how?
Discussion...