CHEP 04 conference report
Judith Katzy, DESY
Conference Contributions
522 participants
25 plenary talks
228 parallel talks organized in 7 parallel tracks:
Online Computing
Event Processing
Core Software
Distributed Computing Services
Distributed Computing Systems + Experiences
Computer Fabrics
Grid Security
3 Poster sessions
Industrial program
1 Birthday Party
All talks and papers available at http://www.cern.ch/chep04
Outline
Online computing
Event processing & core software
• Simulation
• Software Frameworks
• Developments in languages and tools
Distributed Computing in the Grid century
• Wide area networking
• Computer Fabrics
• Grid Middleware
• Experiences
Summary
Online Computing
Online computing track: very direct reports of findings from BES, D0, PHENIX, H1, ZEUS, ALICE, ATLAS, LHCb, ...
Major shift of technology: pervasive adoption of
commodity computing and networking in ALL
areas of trigger and DAQ
Growing importance of control software and
simulation
Pierre VANDE VYVRE – CERN/PH
Online Operating Systems
Online computing used to be a zoo of operating systems
Special OSs now tend to disappear from the landscape
The present generation of experiments is still using special kernels:
• But even there, Linux is becoming a credible competitor
Transition away from Windows
• PHENIX EB (Event Builder): Windows -> Linux for performance
New developments (even extremely demanding ones such as BTeV or the LHCb Level-1 trigger) plan to use a standard OS (Linux)
Pierre VANDE VYVRE – CERN/PH
Online @ HERA2
H1 HLT
- HERA luminosity upgrade
- Transition from VME SBCs to commodity hardware
- CORBA as transport layer for control and data
A.Campbell
ZEUS TRG L2
- HERA luminosity upgrade
- ZEUS detector upgrade: combined trigger of the CTD (Central Tracking Detector) with the MVD (Micro Vertex Detector) and the STT (Straw Tube Tracker)
- Transition from transputers to commodity hardware
M.Sutton
Outline
Online computing
Event processing & core software
• Simulation
• Software Frameworks
• Developments in languages and tools
Distributed Computing in the Grid century
• Wide area networking
• Computer Fabrics
• Grid Middleware
• Grid experiences
Summary
Simulation
Geant3, Geant4 and FLUKA are currently used for HEP and other applications
• 3 LHC experiments use G4; ALICE and STAR are considering FLUKA
Virtual Monte Carlo:
• an interface that allows an easy switch between different MCs (G3, G4, FLUKA); see the sketch below
Validation of simulation is done on test beams:
• Macroscopic tests:
measure the response of LHC detector components
• Microscopic tests:
typically thin-target setups to measure single interactions or cross sections
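The Virtual Monte Carlo idea can be illustrated with a minimal sketch: user code is written against one abstract transport interface, and the concrete engine is chosen at run time. The class and method names below (VirtualMC, ProcessEvent, Geant4MC, FlukaMC) are hypothetical stand-ins for illustration, not the actual ROOT TVirtualMC API.

// vmc_sketch.cxx -- conceptual sketch of the Virtual Monte Carlo pattern
#include <iostream>
#include <string>

// Hypothetical abstract transport interface (stand-in, not the real TVirtualMC).
class VirtualMC {
public:
  virtual ~VirtualMC() {}
  virtual void ProcessEvent() = 0;               // transport all particles of one event
  virtual std::string EngineName() const = 0;
};

// Concrete bindings: each wraps one transport engine behind the same interface.
class Geant4MC : public VirtualMC {
public:
  void ProcessEvent() { /* would call Geant4 transport here */ }
  std::string EngineName() const { return "Geant4"; }
};

class FlukaMC : public VirtualMC {
public:
  void ProcessEvent() { /* would call FLUKA transport here */ }
  std::string EngineName() const { return "FLUKA"; }
};

// The experiment framework only ever sees the abstract interface.
void RunSimulation(VirtualMC& mc, int nEvents) {
  for (int i = 0; i < nEvents; ++i) mc.ProcessEvent();
  std::cout << nEvents << " events simulated with " << mc.EngineName() << "\n";
}

int main() {
  Geant4MC g4;
  FlukaMC fluka;
  RunSimulation(g4, 10);      // switching engines requires no change to the framework code
  RunSimulation(fluka, 10);
  return 0;
}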
Test beam at BESSY I
Advanced Concepts and Science Payloads
Pure material samples:
• Cu
• Si
• Fe
• Al
• Ti
• Stainless steel
Monochromatic photon beam
HPGe detector
(Setup sketch: monochromatic photon beam incident on the material samples, viewed by an HPGe detector at 45° to the beam.)
A. Owens, A. Peacock
Comparison of Simulation
• Most data/Geant3/Geant4 comparisons favor G4
• Geant4 electromagnetic physics has already been validated at the percent level; in the near future, efforts to reach the permil level are foreseen.
• A reasonable agreement between data and both Geant4 and FLUKA was found in hadronic physics; the shape of the hadronic shower needs further improvement.
Parameterized Simulations
Fast simulation is a MUST for the LHC, to get from O(min) to <1 s/event
Software frameworks
Reports on data models, reconstruction and analysis frameworks from ATLAS, BaBar, CMS, H1, STAR
Component-based architectures
• More modularity and flexibility in assembling the application
OS: much of the software only works on Linux
• Alternatives?
Solaris
• Still seen in “server” environments
Mac OS X
• Growing popularity (supported by LCG)
• Compiler is the same as on Linux
Objects Persistency
ROOT I/O used by many experiments (all LHC, H1, GLAST, BaBar, ...)
Hybrid solution: a relational DB for the catalogue plus ROOT I/O for event data proved to be very successful
POOL: implements the hybrid solution; widely used, grid-enabled
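As an illustration of the event-data half of the hybrid approach, here is a minimal ROOT macro that writes toy events to a ROOT file as a TTree; in the hybrid/POOL scheme the logical file name and its physical replicas would additionally be registered in a relational file catalogue (not shown here). The branch names and values are made up for the example.

// write_events.C -- minimal ROOT I/O sketch (run with: root -l -b -q write_events.C)
#include "TFile.h"
#include "TTree.h"
#include "TRandom.h"

void write_events() {
  TFile f("events.root", "RECREATE");          // the "event data" part of the hybrid store
  TTree t("events", "toy events");

  Float_t px, py;                              // made-up event quantities
  Int_t   nhits;
  t.Branch("px", &px, "px/F");
  t.Branch("py", &py, "py/F");
  t.Branch("nhits", &nhits, "nhits/I");

  for (int i = 0; i < 10000; ++i) {            // fill a few toy events
    px = gRandom->Gaus(0, 1);
    py = gRandom->Gaus(0, 1);
    nhits = gRandom->Poisson(20);
    t.Fill();
  }

  t.Write();                                   // in the hybrid scheme, a relational catalogue
  f.Close();                                   // would now record "events.root" and its replicas
}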
Improvements of ROOT I/O
TFile improvements
• Large files and trees, Double32_t, XML output format
• Support for non-instrumented classes
Enhancements in I/O and tree queries for collections
• Split collections
• Fast histogramming of (potentially) any collection
• Lift restrictions on STL I/O
Nested containers
Reading without compiled code
TTree
• Remove stringent requirements on CloneTree
• Add support for auto-loading of referenced objects
• Support for RDBMS database back-ends coming soon
TTree Queries (see the example sketch below)
• Can call any function taking numerical arguments
• Can use arbitrary C++ and still use branch names as variables
• TTree friends linked by index
Talk by P.Canal
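A minimal sketch of the kind of TTree query meant above: branch names are used directly as variables inside an arbitrary expression, and numeric functions can be called in both the expression and the selection. It assumes the toy events.root file and branch names from the earlier sketch.

// query_events.C -- run with: root -l query_events.C
#include "TFile.h"
#include "TTree.h"

void query_events() {
  TFile f("events.root");                      // file written by the previous sketch
  TTree* t = (TTree*)f.Get("events");

  // Branch names act as variables; arbitrary numeric expressions are allowed.
  t->Draw("sqrt(px*px + py*py)", "nhits > 15");

  // The same expression machinery works for simple scans as well.
  t->Scan("px:py:nhits", "nhits > 30");
}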
Languages
Most experiments use more than one language
• Use the best tool for each application
Mostly C++
Only legacy FORTRAN was discussed
• Still some being replaced (e.g. ZOOM replacing MINUIT)
• The majority of what remains is in event generators
Java (still) mostly confined to graphics and event viewing
• Used where performance isn’t the limiting factor
• Used for event processing by the North American ILC work: estimated 20-30% slower than C++
Python becoming more popular
• One big use is quick prototyping
Need for bridges
• I/O exchanges (LCIO, Java, ROOT I/O, XML data files)
• Interpreter bindings (PyROOT, Java JNI)
• Language-independent APIs (AIDA)
Standard C++ for HEP
FNAL is a member of the ANSI C++ standards committee
Likely to become standard soon:
• Improve performance by
Enhanced function declarations
Move semantics (10-20-fold speed increase; sketched below)
• Improve domain support with
A random number toolkit
Mathematical special functions (math.h hasn’t changed in 30 years)
• Improve the core language with
Compile-time reflection
Dynamic libraries
For a long list of improvements to the standard library and the core language, please check the talk and
http://www.open-std.org/jtc1/sc22/wg21
M.Paterno,W.Brown, FNAL
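A minimal sketch of two of the proposed features, written in the syntax that was eventually standardized (it needs a C++11-capable compiler); the container size and distribution parameters are arbitrary illustration values.

// cxx_proposals_sketch.cxx -- random number toolkit + move semantics in a nutshell
#include <iostream>
#include <random>      // the proposed random number toolkit
#include <utility>     // std::move
#include <vector>

// Returning a large container by value: with move semantics the buffer is
// handed over instead of copied, which is where the big speed-ups come from.
std::vector<double> generate_hits(std::size_t n) {
  std::mt19937 engine(12345);                        // engine and distribution are
  std::normal_distribution<double> gauss(0.0, 1.0);  // separate concepts in the toolkit
  std::vector<double> hits(n);
  for (std::size_t i = 0; i < n; ++i) hits[i] = gauss(engine);
  return hits;                                       // moved, not copied
}

int main() {
  std::vector<double> hits = generate_hits(1000000);
  std::vector<double> owner = std::move(hits);       // explicit move: steal the buffer
  std::cout << "generated " << owner.size() << " hits\n";
  return 0;
}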
Outline
Online computing
Event processing & core software
• Simulation
• Software Frameworks
Distributed Computing in the Grid century
• Wide area networking
• Computer Fabrics
• Grid Middleware
• Experiences
Summary
Grid Vision
What is a Grid?
“A Grid provides an abstraction for resource sharing and collaboration across multiple administrative domains …”
(Source: NGG Expert Group, 16 June 2003, “European Grid Research 2005-2010”)
• Benefits
Increased productivity by reducing Total Cost of Ownership
Any-type, anywhere, anytime services by/for all
• Industry & Business: Grids as infrastructure for dynamic virtual organisations
• e-Science: Grids as the backbone for next-generation Internet services
M.Lemke, European Commission
Grid Reality
Many different grids:
• LCG-2: 30 sites, 3200 CPUs
• Grid3: 25 universities, 4 national labs, 2800 CPUs
LHC Data Grid Hierarchy
(Diagram: the online system at the experiment sends ~100-1500 MBytes/s to the CERN Tier 0+1 centre (PBs of disk, tape robot); Tier 1 centres (IN2P3, INFN, RAL, FNAL) are connected at 10-40 Gbps; Tier 2 centres at ~1-10 Gbps; Tier 3 institutes with physics data caches and Tier 4 workstations at 1-10 Gbps. CERN/outside resource ratio ~1:2; Tier0 : Tier1 : Tier2 ~1:1:1. Tens of petabytes by 2007-8, an exabyte ~5-7 years later.)
Network for Grid
(The same LHC data grid hierarchy diagram as above, here emphasising the network links and the aggregate ~TByte/s rate at the online system.)
Existing Networks
Europe: GÉANT, USA: ESnet (DOE Energy Sciences Network), Japan: SINET
All currently at 10 Gbit/s, with upgrades in the near future; internationally linked
(Map of the ESnet IP backbone connecting DOE laboratories and partner sites, e.g. BNL, FNAL, SLAC, LBNL, NERSC, LLNL, PPPL, MIT; 22 sites sponsored by the Office of Science.)
Network development
Network backbones and major links used by HENP and
other fields are advancing rapidly
To the 10 Gbps range in < 3 years; much faster than Moore’s Law
New HENP and DOE roadmaps: a factor of ~1000 bandwidth growth per decade
We are learning to use long-distance 10 Gbps networks effectively
2004 developments: up to 7.5 Gbps TCP flows over 16,000 km
(H. Newman, Caltech)
The near future
Global Ring Network for Advanced Applications Development
GLORIAD: a 5-year proposal (with the US NSF) for expansion to 2.5-10G
Moscow-Amsterdam-Chicago-Pacific-Hong Kong-Pusan-Beijing in early 2005; a 10G ring around the northern hemisphere in 2007;
multi-wavelength hybrid service from ~2008-9
Network status & outlook
Today’s R&D networks are excellent, reliable, and responsive to research needs
(today they are underloaded)
In the future we will see a greater diversity of requirements from “high demand” applications, including the Grid
Today there is a global movement to develop new ways of networking for grids and for research in general
HEP must undertake network data challenges now to demonstrate that it really needs ~100 Gbit/s in the LHC era
Talk by P.Clarke
(The LHC data grid hierarchy diagram is shown once more as an introduction to the computer fabrics discussion.)
Computer Fabrics
Scale (CERN, BNL, GridKa, Belle, FNAL, ...)
• 500-1300 nodes
• O(100) TB of disk, O(100) disk servers
• O(1) PB of tape, O(100) tape servers
Monitoring is a maturing field
• Ganglia: widespread use; LEMON: similar
Installation and maintenance philosophies
• Rocks: reproducible installations (reinstall to update)
• Quattor: actively manage the running environment
• Grid underware is complex to install
… including the grid installation server itself
Storage management:
• CASTOR and dCache in full growth
• SRM proliferating to support all major storage managers
OS: still a large demand for OS variety
• Move to RHES3 / Scientific Linux
Computer Fabrics on the Grid
Tiers 0,1:
• Turnkey solutions
• Low maintenance central services
“Challenge is to run cheap HW with minimal staff and
moderate expertise”
Tiers 2 and beyond:
• Large variety: different HEP experiments, different Tier 1/0s
• Large teams want tools to coordinate large and diverse services
• Emphasis on coordinated information
• Central servers to orchestrate the automation
Data management
POOL widely used
Interplay of database technology and native grid services for data distribution and replication
• FronTier (FNAL, running experiments); see the sketch below
Decouples development and user data access
Scalable
Many commodity tools and techniques (SQUID)
Simple to deploy
Sustainable infrastructure
• LCG 3D (CERN, LHC experiments); convergence foreseen
• SAM, BaBar DM
Experience with running experiments
• gLite data management
New technology; experience still to be gained
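The key point of the FronTier/SQUID approach, that clients fetch database payloads as plain HTTP objects which any commodity web cache can serve, can be sketched in a few lines of libcurl; the server name, proxy address and query path below are made-up placeholders, not FronTier’s actual URL scheme.

// frontier_style_fetch.cxx -- conceptual sketch only; build with: g++ frontier_style_fetch.cxx -lcurl
#include <curl/curl.h>

int main() {
  curl_global_init(CURL_GLOBAL_DEFAULT);
  CURL* handle = curl_easy_init();

  // A database query encoded as an ordinary HTTP URL (placeholder address).
  curl_easy_setopt(handle, CURLOPT_URL,
                   "http://dbserver.example.org/query?table=calib&run=1234");

  // Routing the request through a local SQUID cache is what decouples the
  // central database from the many user jobs (placeholder proxy address).
  curl_easy_setopt(handle, CURLOPT_PROXY, "http://squid.example.org:3128");

  curl_easy_perform(handle);     // response body is printed to stdout by default
  curl_easy_cleanup(handle);
  curl_global_cleanup();
  return 0;
}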
Workload management
BNL STAR SUMS system
• Used for distributed / Grid simulation job submission
• Emphasis on stability
Running experiments, lots of users!
EGEE gLite WMS is being released
Optimization by
• “Phenomenological” estimates based on a few parameters such as CPU power and event size (J. Huth et al.); a toy sketch follows below
• Workload monitoring (Sphinx)
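As an illustration of what such a phenomenological estimate looks like, here is a toy formula combining CPU power and event size into an expected job duration; the functional form and all numbers are invented for the example and are not the model of J. Huth et al.

// job_estimate_sketch.cxx -- toy job-duration estimate from a few parameters
#include <iostream>

double estimate_job_seconds(long   nEvents,
                            double kSI2kSecPerEvent,   // CPU cost per event at 1 kSI2k
                            double nodeKSI2k,          // CPU power of the worker node
                            double eventSizeMB,        // input size per event
                            double ioRateMBperSec) {   // effective staging bandwidth
  double cpuTime = nEvents * kSI2kSecPerEvent / nodeKSI2k;  // compute component
  double ioTime  = nEvents * eventSizeMB / ioRateMBperSec;  // data-movement component
  return cpuTime + ioTime;
}

int main() {
  // Illustrative values: 1000 events, 10 kSI2k-s/event, 1.5 kSI2k node, 2 MB/event, 20 MB/s.
  double t = estimate_job_seconds(1000, 10.0, 1.5, 2.0, 20.0);
  std::cout << "estimated job duration: " << t << " s\n";
  return 0;
}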
Grid Security
Pre-Grid services like ssh on Grid machines
are already under attack!
People are developing tools to look for
attacks.
Grids still need to interface to the security systems used by pre-Grid services such as Kerberos, AFS and the WWW
We are developing tools to manage 1000s of
users in big experiments.
Application-level software developers are
starting to interface to Grid security systems.
Andrew McNab
Grid Middleware Status
M.Lamanna
New generation of middleware becoming available:
• LCG-2 -> gLite
LCG-2: focus on production and large-scale data handling; tested in the 2004/5 LHC data challenges
gLite: focus on analysis
• GAE
Interactivity as a goal, as opposed to “production” mode
RPC-based web service framework (Clarens)
Compatibility with gLite to be explored
• DIRAC
XML-RPC: no need for WSDL…
Instant-messaging protocol for inter-service/agent communication
Interacts with other experiment-specific services (cf. “File-Metadata Management System for the LHCb Experiment”)
Some commonality of technology
• Service-oriented architecture; web services
The dynamics of the evolution of the middleware are very complex
• Experience is injected into the projects via a large user community / data challenges
Grid Middleware – General remarks
How many middleware flavours?
Many opposing forces at work: innovation, standardisation, sources of funding, national/regional services, functionality, security, stability, maturity & experience, and HEP goals (worldwide collaborations, reliability, ease of access)
On the timescale of LHC startup we are going to have to live with a few different middleware implementations/standards
But we really should try to avoid two solutions that do the same thing slightly differently!
Les Robertson, CERN
Grid experiences at LHC…
All 4 LHC experiments performed 2004 data challenges:
• Based on LCG-2 software
• 30-60 TBytes (~30M events) in 6-100k jobs and 972 MSI2k-hours of total CPU
• Focus on production and large-scale data handling
Result:
• ~65% job efficiency
• Stability is the largest source of problems
Summary by I.Bird
…pre-Grid experience at CDF
CDF: Global User Analysis Computing
• Computing environment for user analysis with elements of
global computing and use of standard components
• Consists of commodity Linux-based systems and file servers
• Presently 300 TB of data + 1500 CPUs available to physicists
• 770 registered users, 100% utilization
Alan Sill
Summary
The conference covered a large variety of aspects of HEP computing
Component-based architectures in all areas of software development
Move to commodity hardware and the Linux OS
The GRID is coming!