Technical Evolution in LHC Computing
(with an ATLAS slant)
Torre Wenaus
Brookhaven National Laboratory
July 18, 2012
GRID’2012 Conference
JINR, Dubna
Image: Fabiola Gianotti, ATLAS Spokesperson, Higgs search seminar, July 4 2012, CERN
This Talk
LHC computing – an outstanding success!
Experiments, Grids, Facilities, Middleware providers, CERN IT
collaborated closely and delivered
Where do we go from here? Drawing on what we’ve learned,
and the technical evolution around us, what tools and
technologies will best enable continued success?
A hot topic as we approach the long shutdown in 2013-14
Spring 2011: ATLAS established R&D programs across key
technical areas
Fall 2011: WLCG established technical evolution groups
(TEGs) across similar areas
This talk draws on both in surveying LHC computing technical
evolution activities and plans
Subjectively, not comprehensively
Computing Model Evolution in ATLAS
Originally:
Static, strict hierarchy
Multi-hop data flows
Lesser demands on
Tier 2 networking
Virtue of simplicity
Today:
Flatter, more fluid, mesh-like
Sites contribute according to capability
Greater flexibility and efficiency
More fully utilize available resources
Principal enabler for the evolution:
the network
Excellent bandwidth, robustness,
reliability, affordability
Requirements 2013+
In Light of Experience and Practical Constraints
Scaling up: processing and storage growth with higher
luminosity, energy, trigger rate, analysis complexity, …
With ever increasing premium on efficiency and full
utilization as resources get tighter and tighter
Greater resiliency against site problems, especially in
storage/data access, the most common failure mode
Effectively exploit new computer architectures, dynamic
networking technologies, emergent technologies for large
scale processing (NoSQL, clouds, whatever’s next…)
Maximal commonality: common tools, avoid duplicative efforts,
consolidate where possible. Efficiency of cost and manpower
All developments must respect continuity of operations; life
goes on even in shutdowns
WLCG Technical Evolution Groups (TEGs)
TEGs established Fall 2011 in the areas of databases, data
management and storage, operations, security, and workload
management
Mandate: survey present practice, lessons learned, future
needs, technology evolution landscape, and recommend a 2-5
year technical evolution strategy
Consistent with continuous smooth operations
Premium on commonality
Reports last winter, follow-on activities recently defined
Here follows a non-comprehensive survey across the technical
areas of
Principal TEG findings (relevant to technology evolution)
and other technical developments/trends
Some ATLAS examples of technical evolution in action
Databases
Great success with the CORAL generic C++ interface to back-end DBs (Oracle, Frontier, MySQL, …). Support will be ensured, together with the overlaid COOL conditions DB
Conditions databases evolved early away from direct reliance on
RDBMS. Highly scalable thanks to Frontier web service interfacing
Oracle to distributed clients served by proxy caches via a RESTful
http protocol
Establishing Frontier/Squid as a full WLCG service
Oracle as the big iron DB will not change; use ancillary services like
Frontier, NoSQL to offload and protect it
NoSQL: R&D and prototyping have led to convergence on Hadoop as a NoSQL ‘standard’, at least for the moment
CERN supporting a small instance now in pre-production
ATLAS is consolidating NoSQL apps around Hadoop
Hadoop in ATLAS
Distributed data processing framework: filesystem (HDFS), processing
framework (MapReduce), distributed NoSQL DB (Hbase), SQL front end
(Hive), parallel data-flow language (Pig), …
Hadoop in ATLAS data management system:
Aggregation and mining of log files – HDFS (3.5TB) and MapReduce
(70min processing at 70 MB/s I/O)
Usage trace mining – HBase (300 insertions/sec, 80 GB/mo,
MapReduced in 2 min)
Migrated from Cassandra in two days
File sharing service – copied to HDFS and served via Apache
Wildcard file metadata search – HDFS and MapReduce
Accounting – Pig data pipeline MapReduces 7GB to 500MB of
output summaries for Apache publication in 8min
Stable, reliable, fast, easy to work with, robust against hardware failure
Complements Oracle by reducing load and expanding mining capability
Future potential: storage system, parallel access to distributed data
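As a hedged illustration of the log-mining pattern above, here is a minimal Hadoop Streaming mapper/reducer in Python. The log format, with a site name in field 3 and a byte count in field 7, is invented for the sketch; it is not the actual ATLAS DDM log schema.

```python
#!/usr/bin/env python
# Minimal Hadoop Streaming sketch for log aggregation: sum bytes per site.
# The whitespace-separated log format (site in field 3, bytes in field 7)
# is invented; it is not the ATLAS DDM log schema.
import sys

def mapper():
    for line in sys.stdin:
        fields = line.split()
        if len(fields) > 7:
            print("%s\t%s" % (fields[3], fields[7]))   # key: site, value: bytes

def reducer():
    current, total = None, 0
    for line in sys.stdin:                             # input arrives sorted by key
        site, nbytes = line.rstrip("\n").split("\t")
        if site != current:
            if current is not None:
                print("%s\t%d" % (current, total))
            current, total = site, 0
        total += int(nbytes)
    if current is not None:
        print("%s\t%d" % (current, total))

if __name__ == "__main__":
    if len(sys.argv) > 1 and sys.argv[1] == "map":
        mapper()
    else:
        reducer()
```

Hadoop Streaming would run such a script as both mapper and reducer (e.g. -mapper "logagg.py map" -reducer "logagg.py reduce"), with HDFS supplying the input splits.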
Data Management and Storage Evolution
Federated storage based on xrootd for easy to use,
homogeneous, single-namespace data access
NFS 4.1 is an option in principle but nowhere near ready
Support for HTTP storage endpoints and data access protocol
Wide area direct data access to simplify data management
and support simpler, more efficient workflow
Caching data dynamically and intelligently for efficient storage usage, minimal latencies, and simplified storage management
Point to point protocols needed: GridFTP, xrootd, HTTP, (S3?)
Without SRM overheads
Adapt FTS managed transfer service to new requirements:
endpoint based configuration, xrootd and HTTP support (FTS3)
On their way out: LFC file catalog, SRM storage interface
(except for archive)
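As a small sketch of the plain-HTTP data access the TEGs recommend supporting: a partial (byte-range) read from a storage endpoint, using the third-party requests library. The URL is a placeholder; any endpoint honouring standard HTTP Range requests would behave the same way.

```python
# Sketch: partial (byte-range) read from an HTTP storage endpoint.
# The URL is a placeholder, not a real storage service.
import requests

url = "https://storage.example.org/atlas/data/some.file.root"   # placeholder endpoint
resp = requests.get(url, headers={"Range": "bytes=0-1023"}, timeout=30)
print("HTTP %d: received %d bytes" % (resp.status_code, len(resp.content)))
```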
Data Placement with Federation
Diagram from the storage & data management TEGs
Why xrootd as Basis for Storage Federation?
Xrootd is mature, proven for over a decade, tuned to HEP requirements
Stable, responsive support
Commonality (ALICE, ATLAS, CMS)
ALICE: xrootd in production for years
CMS: federation in pre-production
ATLAS: federation emerging from R&D
Very efficient at file discovery
Transparent use of redundant replicas to overcome file unavailability
Can be used to repair placed data
Works seamlessly with ROOT data formats used for event data
Close ROOT/xrootd collaboration
Supports efficient WAN data access
Interfaces to hierarchical caches, stores
Uniform interface to back-end storage: dCache, DPM, GPFS, HDFS, xrootd…
Good monitoring tools
Map: ATLAS xrootd federation today, now expanding to the EU
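As an illustration of what federation gives the end user, a sketch that copies a file through a federation redirector with xrdcp; the redirector host and logical file name are placeholders.

```python
# Sketch: fetch a file through an xrootd federation redirector with xrdcp.
# Redirector host and logical file name are placeholders.
import subprocess

redirector = "redirector.example.org"            # placeholder federation redirector
lfn = "/atlas/user/example/NTUP.example.root"    # placeholder logical file name
dest = "/tmp/NTUP.example.root"

# The redirector locates a working replica transparently; the client never
# needs to know which site actually serves the file.
rc = subprocess.call(["xrdcp", "root://%s/%s" % (redirector, lfn), dest])
print("xrdcp exit code: %d" % rc)
```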
WAN Data Access and Caching
I/O optimization work in ROOT and experiments in recent years makes
WAN data access efficient and viable
Caching at source and destination further improves efficiency
Source: migrate frequently used data to a fast (e.g. SSD) cache
Destination: addressable shared cache reduces latency for jobs
sharing/reusing data and reduces load on source. Requires cache
awareness in brokerage to drive re-use
Asynchronous pre-fetch at the client, now supported by ROOT, improves this further; WAN latencies don't impede processing (see the sketch below)
Major simplification to data and workflow management; no more
choreography in moving data to processing or vice versa
Demands ongoing I/O performance analysis and optimization,
knowledge of access patterns (e.g. fraction of file read), I/O monitoring
Collaborative work between ROOT team and experiments
Will benefit from integrating network monitoring into workflow brokerage
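A minimal PyROOT sketch of direct WAN reading with a TTree cache and asynchronous prefetching. The redirector URL, file path and tree name are placeholders, the 30 MB cache size is illustrative, and the AsyncPrefetching switch is the rootrc option as understood here, not a prescription.

```python
# Sketch: direct WAN access to a ROOT file over xrootd, with a TTree cache
# and asynchronous prefetching. URL, tree name and cache size are
# placeholders / illustrative values.
import ROOT

# rootrc switch asking ROOT to prefetch upcoming baskets in the background
ROOT.gEnv.SetValue("TFile.AsyncPrefetching", 1)

url = "root://redirector.example.org//atlas/user/example/NTUP.example.root"
f = ROOT.TFile.Open(url)
tree = f.Get("physics")                 # placeholder tree name

tree.SetCacheSize(30 * 1024 * 1024)     # 30 MB TTreeCache batches WAN reads
tree.AddBranchToCache("*", True)        # cache all branches (or only those used)

n = min(int(tree.GetEntries()), 10000)  # bounded loop for this sketch
for i in range(n):
    tree.GetEntry(i)                    # baskets arrive in large, cached reads

print("bytes read over the WAN: %d" % f.GetBytesRead())
```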
Operations & Tools
Strong desire for “Computing as a Service” (CaaS) especially at
smaller sites
Easy to install, easy to use, standards-compliant packaged
services managed and sustained with very low manpower
Graceful reactions to fluctuations in load and user behavior
Straightforward and rapid problem diagnosis and remedy
“Cloud is the ultimate CaaS”
Being acted on in ATLAS: cloud-based Tier 3s currently being prototyped
More commonality among tools
Availability monitoring, site monitoring, network monitoring
Carefully evaluate how much middleware has a long term future
Adopt CVMFS for use as a shared software area at all WLCG sites
In production for ATLAS and LHCb, in deployment for CMS
CERNVM File System (CVMFS)
Caching HTTP file system
optimized for software
delivery
Efficient: compression over
the wire, duplicate detection
Scalable: works hand in
hand with proxy caches
Originally developed as lightweight distributed file
system for CERNVM virtual machines
Keep the VM footprint small and provision
software transparently via HTTP and FUSE,
caching exactly what you need
Adopted well beyond CERNVM community and very
successful for general software (and other file data)
distribution
ATLAS, CMS, LHCb, Geant4, LHC@Home 2.0, …
Currently 75M objects, 5TB in CERN repositories
Chart: ATLAS CVMFS deployment status (aim is 100% by end 2012)
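A small sketch of how a job might verify CVMFS availability before using it; /cvmfs/atlas.cern.ch is the conventional mount point, but the release directory layout shown is an assumption.

```python
# Sketch: a job checking that the ATLAS CVMFS repository is visible before
# setting up software from it. The release directory layout is an assumption.
import os
import sys

repo = "/cvmfs/atlas.cern.ch"
if not os.path.isdir(repo):
    sys.exit("CVMFS repository %s not mounted on this node" % repo)

# Listing a directory makes CVMFS fetch, and locally cache, only the
# metadata actually touched, over HTTP via the site proxy.
releases = os.listdir(os.path.join(repo, "repo", "sw"))   # assumed layout
print("visible software entries: %d" % len(releases))
```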
Security
No longer any security through obscurity
And lack of obscurity makes security incidents all the
worse (press and publicity)
Need fine-grained traceability to quickly contain and
investigate incidents, prevent recurrence
Bring logging up to standard across all services
Address in the future through private clouds? VM is
isolated and is itself fully traceable
Incorporate identity federations (next slide)
Centralized identity management also enables easy
banning
Identity Management
Identity management today:
Complex for users, admins, VOs, developers
ID management is fragmented and generally non-local
Grid identity not easily used in other contexts
Federated ID management:
Common trust/policy framework
Single sign-on; ID managed in one place
ID provider manages attributes; service provider consumes attributes
Support services besides grid: collab tools, wikis, mail lists, …
Diagrams: traditional vs. federated access to grid services (R. Wartel)
WLCG pilot project getting started
Workload Management
Pilots, pilots everywhere!
Approach spread from ALICE and LHCb to all experiments
Experiment-level workload management systems successful
No long-term need for the WMS as a middleware component
But is WMS commonality still possible then? Yes…
DIRAC (LHCb) used by other experiments, as is AliEn (ALICE)
PanDA (ATLAS) has new (DOE) support to generalize as an
exascale computing community WMS
ATLAS, CMS, CERN IT collaborating on common analysis
framework drawing on PanDA, glideinWMS (pilot layer)
Pilot submission and management works well on foundation of Condor,
glideinWMS (which is itself a layer over Condor)
Can the CE be simplified? Under active discussion
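A schematic of the pilot pattern itself, not the actual PanDA, DIRAC or AliEn protocol: the server endpoint, message format and job fields below are invented for illustration.

```python
# Schematic pilot: fetch a job from a central workload manager, run the
# payload, report back. Endpoint, message format and job fields are invented.
import subprocess
import requests   # third-party HTTP library

SERVER = "https://wms.example.org"     # placeholder workload manager endpoint

def run_pilot(site):
    # 1. After validating its environment, the pilot asks for matching work.
    job = requests.get(SERVER + "/getjob", params={"site": site}).json()
    if not job:
        return                         # nothing to do; the pilot exits quietly

    # 2. Run the payload exactly as handed down by the experiment framework.
    rc = subprocess.call(job["command"], shell=True)

    # 3. Report the outcome so the server can update the job state.
    requests.post(SERVER + "/status", data={"jobid": job["id"], "exitcode": rc})

if __name__ == "__main__":
    run_pilot("EXAMPLE_SITE")
```

The late binding is the point of the pattern: the job is matched to the resource only once a validated pilot is actually running there.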
ATLAS/CMS Common Analysis Framework
Diagram: proposed architecture (workflow engine)
Status: CMS able to submit PanDA jobs to the ATLAS PanDA server
Workload Management 2
Multi-core and whole node support
Support is pretty much there for simple usage modes – and simplicity is the aim – in particular whole-node scheduling
Extended environmental info (e.g. HS06, job lifetime)
Environment variables providing homogeneous information access across batch system flavors (could have done with this years ago); see the sketch at the end of this slide
Virtualization and cloud computing
Virtual CE: better support for “any” batch system. Essential.
Virtualization clearly a strong interest for sites, both for service
hosting and WNs
Cloud computing interest/activity levels vary among experiments
ATLAS is well advanced in integrating and testing cloud
computing in real analysis and production workflows, thanks
to robust R&D program since spring 2010
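A sketch of how a payload might size itself from such an environment variable; the variable name NUM_CORES is hypothetical, standing in for whatever uniform interface is agreed across batch flavors.

```python
# Sketch: a payload sizing itself to its allocation from an environment
# variable published by the batch system. NUM_CORES is a hypothetical name.
import multiprocessing
import os

def process_event_range(rng):
    first, last = rng
    # ... reconstruct or simulate events in [first, last) ...
    return last - first

if __name__ == "__main__":
    ncores = int(os.environ.get("NUM_CORES", multiprocessing.cpu_count()))
    pool = multiprocessing.Pool(ncores)
    ranges = [(i * 1000, (i + 1) * 1000) for i in range(ncores)]
    done = sum(pool.map(process_event_range, ranges))
    print("processed %d events on %d cores" % (done, ncores))
```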
Cloud Computing in ATLAS
Several use cases prototyped or in production
PanDA based data processing and workload
management
Centrally managed queues in the cloud
Elastically expand resources transparently to users (see the sketch below)
Institute managed Tier 3 analysis clusters
Hosted locally or (more efficiently) at
shared facility, T1 or T2
Personal analysis queues
User managed, low complexity (almost
transparent), transient
Data storage
Transient caching to accelerate cloud
processing
Object storage and archiving in the cloud
Figures: Personal PanDA Amazon analysis site prototype; concurrent jobs at Canadian sites (PanDA production in the cloud); Amazon analysis cloud status (personal pilot factory, ActiveMQ, monitoring, EC2 analysis cloud)
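A sketch, using the boto EC2 API, of elastically adding cloud worker VMs that start a pilot at boot. The AMI, key pair, queue name and user-data script are placeholders, not ATLAS's actual cloud configuration.

```python
# Sketch: elastically add cloud worker VMs that start a pilot at boot,
# via the boto EC2 API. AMI, key pair and user-data script are placeholders.
import boto.ec2

USER_DATA = """#!/bin/bash
# placeholder contextualization: start a pilot pointed at the cloud queue
/opt/pilot/run_pilot.sh --queue EXAMPLE_CLOUD_QUEUE
"""

conn = boto.ec2.connect_to_region("us-east-1")   # credentials from environment
reservation = conn.run_instances(
    "ami-00000000",                 # placeholder worker-node image
    min_count=1, max_count=10,      # expand by up to 10 workers
    instance_type="m1.large",
    key_name="panda-worker",        # placeholder key pair
    user_data=USER_DATA)

print("started %d cloud workers" % len(reservation.instances))
```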
Networking
The network has been key to the evolution to date, and will also be key in the future
Dynamic intelligent networks, comprehensive monitoring,
experiment workflow and dataflow systems integrated with
network monitoring and controls
Network-aware brokerage (see the sketch below)
Recent WLCG agreement to establish a working group on better integrating networking into LHC computing
LHCONE: progressively coming online to provide entry points
for T1, T2, T3 sites into a dedicated LHC network fabric
Complementing LHCOPN as the network that serves T0-T1
traffic
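A toy illustration of network-aware brokerage: choose the replica source with the best recently measured throughput. Site names and numbers are invented; a real broker would also weigh queue depths, storage load, and more.

```python
# Toy illustration of network-aware brokerage: pick the replica whose recent
# measured throughput to the destination is best. All values are invented.
def choose_source(replicas, throughput, minimum=10.0):
    """Return the replica site with the best recent throughput, or None."""
    best_rate, best_site = max((throughput.get(s, 0.0), s) for s in replicas)
    return best_site if best_rate >= minimum else None

replicas = ["BNL", "TRIUMF", "CERN"]                      # sites holding the dataset
measured = {"BNL": 420.0, "TRIUMF": 95.0, "CERN": 180.0}  # invented recent measurements
print("broker to: %s" % choose_source(replicas, measured))
```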
LHCONE and Dynamic Networks
LHCONE initiative established 2010 to
(first and foremost) better support and
integrate T2 networking given the strong
role of T2s in the evolving LHC computing
models
Managed, segregated traffic for
guaranteed performance
Traffic engineering, flow management
to fully utilize resources
Leverage advanced networking
LHCONE services being constructed:
Multipoint, virtual network (logical traffic separation)
Static & dynamic point to point circuits for guaranteed bandwidth, high-throughput
data movement
Comprehensive end-to-end monitoring and diagnostics across all sites, paths
Software Defined Networking (OpenFlow) is the probable technology for LHCONE in the long term; under investigation as R&D, aiming to converge on a 2-year timescale (end of the shutdown)
In the near term, establish a point-to-point dynamic circuits pilot
Conclusions
Left out of the focus areas: scalability
Of course it is a principal concern, but the success of LHC computing to date demonstrates that we have a handle on scalability
Thanks to that success, the future will be evolutionary, not revolutionary
For the future, focus on
The technical tools and approaches most key to having achieved scalability and manageability
New tools and technologies with the greatest promise to do the same in the future
So as to ensure sustainability in the face of ever more challenging requirements and shrinking manpower
On the foundation of powerful networking that we are still learning to exploit to its fullest
And on the foundation of the Grid, which has served us well and is evolving with us
Focus for the future:
Leverage the network
Monitoring, diagnostics
Robustness, reliability
Simplicity, ease of use
Support, sustainability
Pilots, Condor
Federation
Oracle + NoSQL
Caching, WAN access
HTTP, RESTful APIs
Virtualization
Commonality
Open standards
Leverage beyond HEP
Farewells to:
LFC, WMS, SRM (mostly)
Thanks
This talk drew on presentations, discussions, comments,
input from many
Thanks to all, including those I’ve missed
Artur Barczyk, Doug Benjamin, Ian Bird, Jakob Blomer,
Simone Campana, Dirk Duellmann, Andrej Filipcic, Rob
Gardner, Alexei Klimentov, Mario Lassnig, Fernando
Megino, Dan Van Der Ster, Romain Wartel, …
Related Talks at GRID’12
I. Bird (Monday)
A. Klimentov, Distributed computing challenges, HEP computing
beyond the grid (Tuesday)
A. Vaniachine, Advancements in Big Data Processing (Tuesday)
O. Smirnova, Implementation of common technologies in grid
middleware (Tuesday)
A. Tsaregorodtsev, DIRAC middleware for distributed computing
systems (Tuesday)
K. De, Status and evolution of ATLAS workload management system
PanDA (Tuesday)
ATLAS computing workshop (Tuesday)
D. Duellmann, Storage strategy and cloud storage evaluations at
CERN (Wednesday)
More Information
WLCG Technical Evolution Groups https://twiki.cern.ch/twiki/bin/view/LCG/WebHome
Frontier talks at CHEP 2012
Comparison to NoSQL, D. Dykstra http://goo.gl/n0fcp
CMS experience, D. Dykstra http://goo.gl/dYovr
ATLAS experience, A. Dewhurst http://goo.gl/OIXsH
NoSQL & Hadoop in ATLAS DDM, M. Lassnig, CHEP 2012 http://goo.gl/0ab4I
Xrootd http://xrootd.slac.stanford.edu/
Next generation WLCG File Transfer Service (FTS), Z. Molnar, CHEP 2012 http://goo.gl/mZWtp
Evolution of ATLAS PanDA System, T. Maeno, CHEP 2012 http://goo.gl/lgDfU
AliEn http://alien2.cern.ch/
DIRAC http://diracgrid.org/
CVMFS http://cernvm.cern.ch/portal/filesystem
Status and Future Perspectives of CVMFS, J. Blomer, CHEP 2012 http://goo.gl/82csr
LHCONE http://lhcone.web.cern.ch/
Supplementary
Frontier/Squid for Scalable Distributed DB Access
LHC experiments keep time-dependent data on detector conditions, alignment, etc. in Oracle at CERN
ATLAS alone runs 120,000+ jobs concurrently around the world. Many of these need
conditions data access
How to provide scalable access to centrally Oracle-resident data?
The answer, first developed by CDF/CMS at FNAL, and adopted for LHC: Frontier
Frontier is a web service that translates DB queries into HTTP
Far fewer round trips between client and DB than using Oracle directly; fast over WAN
Because it is HTTP-based, caching web proxies (typically Squid) can provide hierarchical, highly scalable cache-based data access
Reuse of data among jobs is high, so caching is effective at off-loading Oracle
Figure: CMS Frontier deployment
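A sketch of the access pattern only: an encoded query sent as an HTTP GET through the site squid. The Frontier URL, proxy address and query encoding below are placeholders; the real Frontier client libraries handle the actual wire format and response decoding.

```python
# Sketch of the Frontier access pattern: an encoded DB query sent as an HTTP
# GET through the local squid cache. URLs and encoding are placeholders.
import base64
import requests   # third-party HTTP library

FRONTIER = "http://frontier.example.org:8000/Frontier"     # placeholder server
SQUID    = {"http": "http://squid.example-site.org:3128"}  # local site proxy

sql = "SELECT payload FROM conditions WHERE iov_since <= :run AND iov_until > :run"
encoded = base64.urlsafe_b64encode(sql.encode())           # stand-in for Frontier's encoding

resp = requests.get(FRONTIER, params={"p1": encoded}, proxies=SQUID, timeout=60)
# Identical queries from other jobs are answered from the squid cache and
# never reach Oracle at CERN; that is what makes the access scalable.
print("HTTP %d, %d bytes" % (resp.status_code, len(resp.content)))
```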