Technical Evolution in LHC Computing
(with an ATLAS slant)
Torre Wenaus
Brookhaven National Laboratory
July 18, 2012
GRID’2012 Conference
JINR, Dubna
[Image: Fabiola Gianotti, ATLAS Spokesperson, at the Higgs search seminar, July 4 2012, CERN]
This Talk
 LHC computing – an outstanding success!
 Experiments, Grids, Facilities, Middleware providers, CERN IT
collaborated closely and delivered
 Where do we go from here? Drawing on what we’ve learned,
and the technical evolution around us, what tools and
technologies will best enable continued success?
 A hot topic as we approach the long shutdown in 2013-14
 Spring 2011: ATLAS established R&D programs across key
technical areas
 Fall 2011: WLCG established technical evolution groups
(TEGs) across similar areas
 This talk draws on both in surveying LHC computing technical
evolution activities and plans
 Subjectively, not comprehensively
Computing Model Evolution in ATLAS
Originally:
Static, strict hierarchy
Multi-hop data flows
Lesser demands on
Tier 2 networking
Virtue of simplicity
Today:
Flatter, more fluid, mesh-like
Sites contribute according to capability
Greater flexibility and efficiency
More fully utilize available resources
Principal enabler for the evolution:
the network
Excellent bandwidth, robustness,
reliability, affordability
Requirements 2013+
In Light of Experience and Practical Constraints
 Scaling up: processing and storage growth with higher
luminosity, energy, trigger rate, analysis complexity, …
 With ever increasing premium on efficiency and full
utilization as resources get tighter and tighter
 Greater resiliency against site problems, especially in
storage/data access, the most common failure mode
 Effectively exploit new computer architectures, dynamic
networking technologies, emergent technologies for large
scale processing (NoSQL, clouds, whatever’s next…)
 Maximal commonality: common tools, avoid duplicative efforts,
consolidate where possible. Efficiency of cost and manpower
 All developments must respect continuity of operations; life
goes on even in shutdowns
WLCG Technical Evolution Groups (TEGs)
 TEGs established Fall 2011 in the areas of databases, data
management and storage, operations, security, and workload
management
 Mandate: survey present practice, lessons learned, future
needs, technology evolution landscape, and recommend a 2-5
year technical evolution strategy
 Consistent with continuous smooth operations
 Premium on commonality
 Reports last winter, follow-on activities recently defined
 Here follows a non-comprehensive survey across the technical
areas of
 Principal TEG findings (relevant to technology evolution)
and other technical developments/trends
 Some ATLAS examples of technical evolution in action
Databases
 Great success with CORAL generic C++ interface to back end DBs
(Oracle, Frontier, MySQL, …). Will ensure support, together with the
overlaid COOL conditions DB
 Conditions databases evolved early away from direct reliance on
RDBMS. Highly scalable thanks to Frontier web service interfacing
Oracle to distributed clients served by proxy caches via a RESTful
http protocol
 Establishing Frontier/Squid as a full WLCG service
 Oracle as the big iron DB will not change; use ancillary services like
Frontier, NoSQL to offload and protect it
 NoSQL: R&D and prototyping has led to convergence on Hadoop as
a NoSQL ‘standard’ at least for the moment
 CERN supporting a small instance now in pre-production
 ATLAS is consolidating NoSQL apps around Hadoop
Hadoop in ATLAS
 Distributed data processing framework: filesystem (HDFS), processing
framework (MapReduce), distributed NoSQL DB (Hbase), SQL front end
(Hive), parallel data-flow language (Pig), …
 Hadoop in ATLAS data management system:
 Aggregation and mining of log files – HDFS (3.5 TB) and MapReduce (70 min processing at 70 MB/s I/O); a streaming sketch follows this list
 Usage trace mining – HBase (300 insertions/sec, 80 GB/mo,
MapReduced in 2 min)
 Migrated from Cassandra in two days
 File sharing service – copied to HDFS and served via Apache
 Wildcard file metadata search – HDFS and MapReduce
 Accounting – Pig data pipeline MapReduces 7GB to 500MB of
output summaries for Apache publication in 8min
 Stable, reliable, fast, easy to work with, robust against hardware failure
 Complements Oracle by reducing load and expanding mining capability
 Future potential: storage system, parallel access to distributed data
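A minimal streaming sketch of the log-mining pattern above, in Python (the log format, field positions, and paths are assumptions, not the actual ATLAS DDM schema): the mapper emits a (site.operation, 1) pair per log line, Hadoop sorts by key between the phases, and the reducer sums the counts.

#!/usr/bin/env python
# Toy Hadoop Streaming job: count operations per site from tab-separated log lines
# of the assumed form "timestamp<TAB>site<TAB>operation<TAB>...".
# Example invocation (HDFS paths are assumptions):
#   hadoop jar hadoop-streaming.jar -input /logs/ddm -output /logs/ddm-summary \
#       -mapper "logmine.py map" -reducer "logmine.py reduce" -file logmine.py
import sys

def mapper():
    for line in sys.stdin:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue                                  # skip malformed lines
        print("%s.%s\t1" % (fields[1], fields[2]))    # emit key<TAB>count

def reducer():
    current, total = None, 0
    for line in sys.stdin:                            # reduce input arrives sorted by key
        key, count = line.rstrip("\n").split("\t")
        if key != current:
            if current is not None:
                print("%s\t%d" % (current, total))
            current, total = key, 0
        total += int(count)
    if current is not None:
        print("%s\t%d" % (current, total))

if __name__ == "__main__":
    mapper() if sys.argv[1] == "map" else reducer()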
Data Management and Storage Evolution
 Federated storage based on xrootd for easy to use,
homogeneous, single-namespace data access
 NFS 4.1 an option in principle, but nowhere near ready
 Support for HTTP storage endpoints and data access protocol
 Wide area direct data access to simplify data management
and support simpler, more efficient workflow
 Caching data dynamically and intelligently for efficient storage usage, minimal latencies, and simplified storage management
 Point to point protocols needed: GridFTP, xrootd, HTTP, (S3?)
 Without SRM overheads
 Adapt FTS managed transfer service to new requirements:
endpoint based configuration, xrootd and HTTP support (FTS3)
 On their way out: LFC file catalog, SRM storage interface
(except for archive)
Data Placement with Federation
Diagram from the storage & data management TEGs
Why xrootd as Basis for Storage Federation?
Xrootd is mature, proven for over a decade,
tuned to HEP requirements
Stable, responsive support
Commonality (Alice, ATLAS, CMS)
Alice: xrootd in production for years
CMS: federation in pre-production
Very efficient at file discovery
ATLAS: federation emerging from R&D
Transparent use of redundant replicas to
overcome file unavailability
 Can be used to repair placed data
Works seamlessly with ROOT data formats used
for event data
 Close ROOT/xrootd collaboration
Supports efficient WAN data access
Interfaces to hierarchical caches, stores
Uniform interface to back end storage
 dCache, DPM, GPFS, HDFS, xrootd…
Good monitoring tools
[Map: the ATLAS xrootd federation, currently expanding to the EU]
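To illustrate what federated access looks like from the client side, a minimal PyROOT sketch follows (the redirector host and file path are invented): the client names the file once, and the federation redirects the open to a site that actually holds a working replica.

# Open a file by its global name through an assumed xrootd federation redirector.
# If the replica at one site is unavailable, the redirection falls through to another.
import ROOT

url = "root://federation-redirector.example.org//atlas/data/sample.root"   # hypothetical
f = ROOT.TFile.Open(url)                 # TFile.Open understands root:// URLs directly
if not f or f.IsZombie():
    raise IOError("could not open %s through the federation" % url)
print("Opened %s (%d bytes)" % (f.GetName(), f.GetSize()))
f.Close()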
WAN Data Access and Caching
 I/O optimization work in ROOT and experiments in recent years makes
WAN data access efficient and viable
 Caching at source and destination further improve efficiency
 Source: migrate frequently used data to a fast (e.g. SSD) cache
 Destination: addressable shared cache reduces latency for jobs
sharing/reusing data and reduces load on source. Requires cache
awareness in brokerage to drive re-use
 Asynchronous pre-fetch at the client improves things further, so WAN latencies don't impede processing. Now supported by ROOT (sketched after this list)
 Major simplification to data and workflow management; no more
choreography in moving data to processing or vice versa
 Demands ongoing I/O performance analysis and optimization,
knowledge of access patterns (e.g. fraction of file read), I/O monitoring
 Collaborative work between ROOT team and experiments
 Will benefit from integrating network monitoring into workflow brokerage
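A PyROOT sketch of the client-side measures described above (the file URL, tree name and cache settings are assumptions): the TTreeCache coalesces many small reads into a few large vector reads, and ROOT's asynchronous prefetching overlaps the remaining WAN latency with processing.

import ROOT

# Enable ROOT's asynchronous prefetching (overlaps network reads with computation)
ROOT.gEnv.SetValue("TFile.AsyncPrefetching", 1)

f = ROOT.TFile.Open("root://storage.example.org//data/events.root")   # hypothetical URL
tree = f.Get("events")                                                 # hypothetical tree name

tree.SetCacheSize(100 * 1024 * 1024)     # 100 MB TTreeCache for vectorized reads
tree.AddBranchToCache("*", True)         # cache all branches (or only those actually read)

for i in range(int(tree.GetEntries())):
    tree.GetEntry(i)                     # reads are served from the cache / prefetch buffers
f.Close()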
Operations & Tools
 Strong desire for “Computing as a Service” (CaaS) especially at
smaller sites
 Easy to install, easy to use, standards-compliant packaged
services managed and sustained with very low manpower
 Graceful reactions to fluctuations in load and user behavior
 Straightforward and rapid problem diagnosis and remedy
 “Cloud is the ultimate CaaS”
Being acted on in ATLAS: cloud-based Tier 3s currently being prototyped
 More commonality among tools
 Availability monitoring, site monitoring, network monitoring
 Carefully evaluate how much middleware has a long term future
 Adopt CVMFS for use as a shared software area at all WLCG sites
 In production for ATLAS and LHCb, in deployment for CMS
CERNVM File System (CVMFS)
Caching HTTP file system
optimized for software
delivery
Efficient: compression over
the wire, duplicate detection
Scalable: works hand in
hand with proxy caches
Originally developed as lightweight distributed file
system for CERNVM virtual machines
 Keep the VM footprint small and provision
software transparently via HTTP and FUSE,
caching exactly what you need
Adopted well beyond CERNVM community and very
successful for general software (and other file data)
distribution
 ATLAS, CMS, LHCb, Geant4, LHC@Home 2.0, …
Currently 75M objects, 5TB in CERN repositories
[Map: ATLAS CVMFS deployment status; the aim is 100% of sites by end 2012]
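To make the caching idea concrete, a toy sketch follows; it is not CVMFS itself, and the repository URL and one-line catalog format are invented. A catalog maps paths to content hashes, objects are fetched over plain HTTP only on first access, and the local cache is content-addressed, so identical content is downloaded and stored only once.

# Toy content-addressed HTTP cache illustrating the CVMFS idea (not CVMFS itself).
# Python 2 standard library; repository URL and catalog format are invented.
import os
import urllib2

REPO_URL = "http://software-repo.example.org"        # hypothetical HTTP repository
CACHE_DIR = os.path.expanduser("~/.toy_cvmfs_cache")

def load_catalog():
    """Catalog: one 'path<TAB>sha1' line per file (a deliberately simplified format)."""
    catalog = {}
    for line in urllib2.urlopen(REPO_URL + "/catalog"):
        path, digest = line.strip().split("\t")
        catalog[path] = digest
    return catalog

def fetch(path, catalog):
    """Return a local cached copy of repository file `path`, downloading only on first use."""
    digest = catalog[path]
    cached = os.path.join(CACHE_DIR, digest)
    if not os.path.exists(cached):                    # only content not yet cached is fetched
        if not os.path.isdir(CACHE_DIR):
            os.makedirs(CACHE_DIR)
        data = urllib2.urlopen(REPO_URL + "/data/" + digest).read()   # in practice via a squid proxy
        with open(cached, "wb") as out:
            out.write(data)
    return cached                                     # identical files share one cache object

# Example: local_copy = fetch("atlas/software/setup.sh", load_catalog())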
Security
 No longer any security through obscurity
 And lack of obscurity makes security incidents all the
worse (press and publicity)
 Need fine-grained traceability to quickly contain and
investigate incidents, prevent recurrence
 Bring logging up to standard across all services
 Address in the future through private clouds? VM is
isolated and is itself fully traceable
 Incorporate identity federations (next slide)
 Centralized identity management also enables easy
banning
Identity Management
[Diagram (R. Wartel): traditional access to grid services vs. federated access to grid services]
Identity management today:
 Complex for users, admins, VOs, developers
 ID management is fragmented and generally non-local
 Grid identity not easily used in other contexts
Federated ID management:
 Common trust/policy framework
 Single sign-on, with identity managed in one place
 ID provider manages attributes; service provider consumes attributes
 Supports services besides the grid: collab tools, wikis, mail lists, …
WLCG pilot project getting started
Workload Management
 Pilots, pilots everywhere!
 Approach spread from ALICE and LHCb to all experiments (the pattern is sketched after this list)
 Experiment-level workload management systems successful
 No long term need for WMS as middleware component
 But is WMS commonality still possible then? Yes…
 DIRAC (LHCb) used by other experiments, as is AliEn (ALICE)
 PanDA (ATLAS) has new (DOE) support to generalize as an
exascale computing community WMS
 ATLAS, CMS, CERN IT collaborating on common analysis
framework drawing on PanDA, glideinWMS (pilot layer)
 Pilot submission and management works well on foundation of Condor,
glideinWMS (which is itself a layer over Condor)
 Can the CE be simplified? Under active discussion
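A toy sketch of the pilot pattern itself, not the actual PanDA, DIRAC or AliEn pilot (server URL, endpoints and job description format are invented): the grid or batch system only delivers the pilot to a worker node; the pilot then pulls real payloads from the experiment's central queue until none remain.

# Toy pilot: pull payloads from a central (hypothetical) workload server and run them.
import json
import subprocess
import urllib2

SERVER = "https://workload-server.example.org"        # hypothetical central job queue

def get_job():
    """Ask the server for a payload matched to this node; returns None if the queue is empty."""
    reply = urllib2.urlopen(SERVER + "/getjob?site=EXAMPLE_SITE").read()
    return json.loads(reply) if reply else None

def report(job_id, state):
    urllib2.urlopen(SERVER + "/status?job=%s&state=%s" % (job_id, state)).read()

def run_pilot():
    while True:
        job = get_job()
        if job is None:                                # nothing to do: exit and free the batch slot
            break
        report(job["id"], "running")
        rc = subprocess.call(job["command"], shell=True)    # execute the payload
        report(job["id"], "finished" if rc == 0 else "failed")

if __name__ == "__main__":
    run_pilot()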
ATLAS/CMS Common Analysis Framework
[Diagram: proposed architecture, built around a common workflow engine]
Status: CMS able to submit PanDA jobs to the ATLAS PanDA server
Workload Management 2
 Multi-core and whole node support
 Support is pretty much there for simple usage modes – and
simplicity is the aim – in particular whole node scheduling
 Extended environmental info (e.g. HS06, job lifetime)
 Environment variables providing homogeneous information access across batch system flavors (we could have done with this years ago); see the sketch after this list
 Virtualization and cloud computing
 Virtual CE: better support for “any” batch system. Essential.
 Virtualization clearly a strong interest for sites, both for service
hosting and WNs
 Cloud computing interest/activity levels vary among experiments
 ATLAS is well advanced in integrating and testing cloud
computing in real analysis and production workflows, thanks
to robust R&D program since spring 2010
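A sketch of the environment-information idea above (the variable names are invented, standing in for whatever the batch system exports): the payload reads a uniform description of its slot and adapts to it, here by matching a multiprocessing pool to the number of granted cores.

# Whole-node / multi-core payload sizing from (hypothetical) batch environment variables.
import multiprocessing
import os

ncores = int(os.environ.get("SLOT_NCORES", "1"))              # hypothetical variable names
hs06_per_core = float(os.environ.get("SLOT_HS06_PER_CORE", "10.0"))
walltime_s = int(os.environ.get("SLOT_WALLTIME_SEC", "86400"))

def process_chunk(events):
    return sum(events)                                        # placeholder for per-core work

if __name__ == "__main__":
    print("Slot: %d cores, %.1f HS06/core, %d s wall time" % (ncores, hs06_per_core, walltime_s))
    pool = multiprocessing.Pool(processes=ncores)             # one worker per granted core
    chunks = [range(i * 1000, (i + 1) * 1000) for i in range(ncores)]
    results = pool.map(process_chunk, chunks)
    pool.close()
    pool.join()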
Cloud Computing in ATLAS
Several use cases prototyped or in production
PanDA based data processing and workload
management
 Centrally managed queues in the cloud
 Elastically expand resources
transparently to users
 Institute managed Tier 3 analysis clusters
 Hosted locally or (more efficiently) at
shared facility, T1 or T2
 Personal analysis queues
 User managed, low complexity (almost
transparent), transient
Data storage
 Transient caching to accelerate cloud
processing
 Object storage and archiving in the cloud
[Figures: personal PanDA Amazon analysis site prototype; PanDA production in the cloud (concurrent jobs at Canadian sites); EC2 analysis cloud status, with a personal pilot factory and ActiveMQ monitoring]
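A sketch of elastic expansion with the boto library of that era (the AMI, key pair, security group and pilot startup script are assumptions): worker VMs are booted with user-data contextualization that launches a pilot at boot, so the new capacity simply appears as additional job slots behind the centrally managed queue.

# Boot cloud worker nodes that start a pilot at boot (all identifiers hypothetical).
import boto.ec2

USER_DATA = "#!/bin/sh\n/opt/pilot/run_pilot.sh\n"     # assumed startup script launching a pilot

conn = boto.ec2.connect_to_region("us-east-1")          # credentials from the usual boto config
reservation = conn.run_instances(
    "ami-12345678",                                      # hypothetical worker-node image
    min_count=1,
    max_count=10,                                        # scale out by up to 10 nodes
    instance_type="m1.large",
    key_name="cloud-worker-key",
    security_groups=["worker-nodes"],
    user_data=USER_DATA,
)
print("Started %d worker instances" % len(reservation.instances))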
Networking
 Network has been key to the evolution to date, and will be key
also in the future
 Dynamic intelligent networks, comprehensive monitoring,
experiment workflow and dataflow systems integrated with
network monitoring and controls
 Network-aware brokerage
 Recent WLCG agreement to establish a working group on
better integrating networking as an integral component of LHC
computing
 LHCONE: progressively coming online to provide entry points
for T1, T2, T3 sites into a dedicated LHC network fabric
 Complementing LHCOPN as the network that serves T0-T1
traffic
LHCONE and Dynamic Networks
LHCONE initiative established 2010 to
(first and foremost) better support and
integrate T2 networking given the strong
role of T2s in the evolving LHC computing
models
 Managed, segregated traffic for
guaranteed performance
 Traffic engineering, flow management
to fully utilize resources
 Leverage advanced networking
LHCONE services being constructed:
 Multipoint, virtual network (logical traffic separation)
 Static & dynamic point to point circuits for guaranteed bandwidth, high-throughput
data movement
 Comprehensive end-to-end monitoring and diagnostics across all sites, paths
Software Defined Networking (OpenFlow) is the probable technology for LHCONE in the long term; under investigation as R&D, aiming to converge on a 2-year timescale (the end of the shutdown)
In the near term, establish a point-to-point dynamic circuits pilot
Conclusions
Left out of the focus areas: scalability
Of course it is a principal one, but the success of LHC computing to date demonstrates that we have a handle on it
 Thanks to success to date, the future will be
evolutionary, not revolutionary
For the future, focus on
 Technical tools and approaches most key to
having achieved scalability and manageability
 New tools and technologies with the greatest
promise to do the same in the future
So as to ensure sustainability in the face of ever more challenging requirements and shrinking manpower
On the foundation of powerful networking that we
are still learning to exploit to its fullest
And on the foundation of the Grid, which has served
us well and is evolving with us
Focus for the future:
 Leverage the network
 Monitoring, diagnostics
 Robustness, reliability
 Simplicity, ease of use
 Support, sustainability
 Pilots, Condor
 Federation
 Oracle + NoSQL
 Caching, WAN access
 HTTP, RESTful APIs
 Virtualization
 Commonality
 Open standards
 Leverage beyond HEP
Farewells to:
 LFC, WMS, SRM (mostly)
Thanks
 This talk drew on presentations, discussions, comments,
input from many
 Thanks to all, including those I’ve missed
 Artur Barczyk, Doug Benjamin, Ian Bird, Jakob Blomer,
Simone Campana, Dirk Duellmann, Andrej Filipcic, Rob
Gardner, Alexei Klimentov, Mario Lassnig, Fernando
Megino, Dan Van Der Ster, Romain Wartel, …
Related Talks at GRID’12
 I. Bird (Monday)
 A. Klimentov, Distributed computing challenges, HEP computing
beyond the grid (Tuesday)
 A. Vaniachine, Advancements in Big Data Processing (Tuesday)
 O. Smirnova, Implementation of common technologies in grid
middleware (Tuesday)
 A. Tsaregorodtsev, DIRAC middleware for distributed computing
systems (Tuesday)
 K. De, Status and evolution of ATLAS workload management system
PanDA (Tuesday)
 ATLAS computing workshop (Tuesday)
 D. Duellmann, Storage strategy and cloud storage evaluations at
CERN (Wednesday)
More Information
WLCG Technical Evolution Groups https://twiki.cern.ch/twiki/bin/view/LCG/WebHome
Frontier talks at CHEP 2012:
 Comparison to NoSQL, D. Dykstra http://goo.gl/n0fcp
 CMS experience, D. Dykstra http://goo.gl/dYovr
 ATLAS experience, A. Dewhurst http://goo.gl/OIXsH
NoSQL & Hadoop in ATLAS DDM, M. Lassnig, CHEP 2012 http://goo.gl/0ab4I
Xrootd http://xrootd.slac.stanford.edu/
Next generation WLCG File Transfer Service (FTS), Z. Molnar, CHEP 2012 http://goo.gl/mZWtp
Evolution of ATLAS PanDA System, T. Maeno, CHEP 2012 http://goo.gl/lgDfU
AliEn http://alien2.cern.ch/
DIRAC http://diracgrid.org/
CVMFS http://cernvm.cern.ch/portal/filesystem
Status and Future Perspectives of CVMFS, J. Blomer, CHEP 2012 http://goo.gl/82csr
LHCONE http://lhcone.web.cern.ch/
Supplementary
Frontier/Squid for Scalable Distributed DB Access
LHC experiments keep time-dependent data on detector conditions, alignment, etc. in Oracle at CERN
ATLAS alone runs 120,000+ jobs concurrently around the world. Many of these need
conditions data access
How to provide scalable access to centrally Oracle-resident data?
The answer, first developed by CDF/CMS at FNAL, and adopted for LHC: Frontier
Frontier is a web service that translates DB queries into HTTP
Far fewer round trips between client and DB than using Oracle directly; fast over WAN
Because it is HTTP based, caching web proxies (typically squid) can provide hierarchical,
highly scalable cache based data access
Reuse of data among jobs is high, so caching is effective at off-loading Oracle
[Figure: CMS Frontier deployment]
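A sketch of the access pattern only, not the real Frontier client (which encodes SQL queries in its own URL scheme); the server, proxy and query parameters here are invented. The point is that conditions requests are plain HTTP GETs routed through a site-local squid, so identical queries from many jobs are answered from the cache instead of reaching Oracle.

# Conditions lookup through a caching proxy (all names and parameters hypothetical).
import urllib2

SQUID_PROXY = "http://local-squid.example.org:3128"           # site-local caching proxy
FRONTIER_URL = "http://frontier-server.example.org/query"     # hypothetical Frontier endpoint

opener = urllib2.build_opener(
    urllib2.ProxyHandler({"http": SQUID_PROXY})                # route requests through squid
)

def get_conditions(tag, run):
    """Fetch a conditions payload; identical (tag, run) requests are served from the squid cache."""
    url = "%s?tag=%s&run=%d" % (FRONTIER_URL, tag, run)         # invented query parameters
    return opener.open(url).read()

# Example: payload = get_conditions("EXAMPLE-COND-TAG", 12345)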