LCG POOL Development Status and Production Experience



• Project outlook
• Production experiences
• New developments
• Conclusions
Giacomo Govi
CERN IT/DB
On behalf of the POOL project
The POOL Project
• Pool Of persistent Objects for LHC
– Develops a framework for object persistency
– Started in April 2002 within the LHC Computing Grid (LCG) Application Area
– Joint project of the LHC experiments and the CERN IT/DB group
• Purpose: storage and retrieval of experiment data and associated metadata in a distributed, Grid-enabled environment
– Event data, physics and detector simulation
– Detector data and bookkeeping data
– Metadata
• This challenge is addressed with a hybrid technology approach
– C++ object streaming technology for bulk data
• Using the ROOT framework
– Transaction-safe services for catalogs, collections and metadata
• Using relational databases such as Oracle, MySQL, …
POOL architecture
• Storage technology neutral API
• Built from software components
– Implement pure abstract C++ interfaces
• Experiment framework user code is insulated from
concrete implementation details and technologies
– Expose minimal dependencies
• Weak coupling is ensured because components interact only via their abstract interfaces
– Loaded on demand
• Using a plug-in manager and a component model
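The component model above can be illustrated with a small C++ sketch. Everything here is hypothetical: `IStorageSvc`, `PluginRegistry`, `InMemoryStorageSvc` and the technology names `"ROOT_IO"` and `"RDBMS"` are illustrative stand-ins, not POOL's actual classes, and the loading of shared libraries is replaced by in-process registration of factories (C++14).

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Abstract interface: the only type experiment code depends on.
class IStorageSvc {
public:
    virtual ~IStorageSvc() = default;
    virtual std::string technology() const = 0;
    // Stores a payload in the named container; returns the container's new size.
    virtual std::size_t write(const std::string& container, const std::string& payload) = 0;
};

// Hypothetical registry standing in for a plug-in manager: maps a
// technology name to a factory, so implementations are created on demand.
class PluginRegistry {
public:
    using Factory = std::function<std::unique_ptr<IStorageSvc>()>;

    static PluginRegistry& instance() {
        static PluginRegistry registry;
        return registry;
    }

    void declare(const std::string& name, Factory factory) {
        factories_[name] = std::move(factory);
    }

    // Throws if no plug-in has been declared for the requested technology.
    std::unique_ptr<IStorageSvc> create(const std::string& name) const {
        auto it = factories_.find(name);
        if (it == factories_.end())
            throw std::runtime_error("no plug-in registered for technology " + name);
        return it->second();
    }

private:
    std::map<std::string, Factory> factories_;
};

// Toy in-memory implementation; a real one would wrap ROOT I/O or an RDBMS
// and live in its own shared library.
class InMemoryStorageSvc : public IStorageSvc {
public:
    explicit InMemoryStorageSvc(std::string tech) : tech_(std::move(tech)) {}
    std::string technology() const override { return tech_; }
    std::size_t write(const std::string& container, const std::string& payload) override {
        assert(!container.empty());
        auto& objects = containers_[container];
        objects.push_back(payload);
        return objects.size();
    }

private:
    std::string tech_;
    std::map<std::string, std::vector<std::string>> containers_;
};

// Hypothetical technology names; in a real system each plug-in library
// would declare itself when it is loaded.
void registerExamplePlugins() {
    auto& registry = PluginRegistry::instance();
    registry.declare("ROOT_IO", [] { return std::make_unique<InMemoryStorageSvc>("ROOT_IO"); });
    registry.declare("RDBMS", [] { return std::make_unique<InMemoryStorageSvc>("RDBMS"); });
}
```

Calling code asks the registry for a technology by name and only ever holds an `IStorageSvc` pointer, so swapping the back end does not touch that code.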
POOL domains
• Storage Manager
– Streams transient C++ objects to and from storage
– Resolves a logical object reference into a physical object
• File Catalog
– Maintains the information about POOL-accessible data files
– Resolves a logical reference into a physical data source
– Helps the Storage Manager to resolve the physical location
of the data
• Collections
– Provides the tools to manage potentially large ensembles of objects stored via POOL persistence services
• Explicit: server-side selection of objects from queryable collections
• Implicit: defined by physical containment of the objects
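How the Storage Manager and the File Catalog cooperate to turn a logical reference into a physical location can be sketched as follows. This is a minimal sketch assuming a reference is a (file ID, container, entry) triple; `ObjectRef`, `FileCatalog::registerReplica`, `lookupPfn` and `resolve` are hypothetical names, not the POOL API (C++17 for `std::optional`).

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical logical reference: identifies an object without naming a physical file.
struct ObjectRef {
    std::string fileId;     // logical file identifier, e.g. a GUID
    std::string container;  // container inside that file
    long entry;             // position of the object in the container
};

// Where the storage manager should actually read from.
struct PhysicalAddress {
    std::string pfn;        // physical file name
    std::string container;
    long entry;
};

// Minimal file catalog: file ID -> known physical replicas.
class FileCatalog {
public:
    void registerReplica(const std::string& fileId, const std::string& pfn) {
        replicas_[fileId].push_back(pfn);
    }

    // Picks the first replica; a Grid-aware catalog could choose by site instead.
    std::optional<std::string> lookupPfn(const std::string& fileId) const {
        auto it = replicas_.find(fileId);
        if (it == replicas_.end() || it->second.empty()) return std::nullopt;
        return it->second.front();
    }

private:
    std::map<std::string, std::vector<std::string>> replicas_;
};

// Storage-manager side: the catalog maps the logical file ID to a physical
// file, while container and entry are carried over unchanged.
std::optional<PhysicalAddress> resolve(const ObjectRef& ref, const FileCatalog& catalog) {
    assert(ref.entry >= 0);
    std::optional<std::string> pfn = catalog.lookupPfn(ref.fileId);
    if (!pfn) return std::nullopt;
    return PhysicalAddress{*pfn, ref.container, ref.entry};
}
```

Because references carry only the logical file ID, moving or replicating a file requires updating the catalog, not the stored references.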
Components Interaction
[Diagram] The POOL API exposes three services, each with interchangeable implementations:
• Storage Service: ROOT I/O Storage Svc, RDBMS Storage Svc
• FileCatalog: XML Catalog, MySQL Catalog, Relational Catalog, EDG Replica Location Service
• Collections: Explicit Collection, Implicit Collection
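Explicit collections can be pictured as tables of queryable attributes in which each row points to a stored object. The sketch below is hypothetical (`CollectionRow` and `ExplicitCollection::select` are illustrative names) and runs the selection in memory, whereas the slides describe explicit collections as being queried on the server side.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical collection row: queryable attributes plus a reference to the object.
struct CollectionRow {
    std::map<std::string, double> attributes;  // e.g. "nTracks", "missingET"
    std::string ref;                           // stringified object reference
};

class ExplicitCollection {
public:
    void add(CollectionRow row) { rows_.push_back(std::move(row)); }

    // Returns the references of rows whose attribute satisfies the predicate.
    // Rows lacking the attribute are skipped.
    std::vector<std::string> select(const std::string& attribute,
                                    const std::function<bool(double)>& predicate) const {
        assert(predicate);
        std::vector<std::string> refs;
        for (const CollectionRow& row : rows_) {
            auto it = row.attributes.find(attribute);
            if (it != row.attributes.end() && predicate(it->second))
                refs.push_back(row.ref);
        }
        return refs;
    }

    std::size_t size() const { return rows_.size(); }

private:
    std::vector<CollectionRow> rows_;
};
```

An implicit collection, by contrast, needs no such table: iterating over every object in a container already defines it.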
POOL in production
• POOL has been adopted by three experiments
– Integrated in the ATLAS, CMS and LHCb offline frameworks
– Integration had limited impact thanks to the close collaboration with the experiments
• Intensive usage in production (Data Challenges)
– ROOT-based object streaming for event data
– File Catalog in XML, MySQL and EDG/RLS
– Collections in various usage patterns
• Positive experience gained
– No major POOL-related problems
– Data volume stored: ~400 TB
Development Focus This Year
Enable RDBMS technology
• For pure relational data management
– Provide technology-neutral RDBMS connectivity for the relational components of POOL and for user code
– Provide middleware for data distribution in RDBMS
• For object storage in relational tables
– Implementing the POOL StorageSvc interfaces (object streaming)
– Accommodating existing table schema and data (Online)
– Storage of condition-like objects (Offline)
Upgrade/enhance current features
• Move to ROOT4
• File Catalog, Collection issues
Relational Abstraction Layer (RAL)
Functionality
• Database Schema Access and Manipulation
– Describing existing and creating new tables
– Support for primary keys, foreign keys and indices
• Data Manipulation
– Insertion, update and deletion of table rows
– Bulk insertions to minimise database server round trips
– SQL-free API
• Queries
– Nested queries involving one or more tables
– Ordering and limiting the result set
– Database cursors
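The benefit of bulk insertion is easiest to see by counting round trips. In this sketch `MockServer` and `BulkInserter` are hypothetical stand-ins, not the RAL interface; the inserter buffers rows on the client and sends them in fixed-size batches, flushing any remainder when it goes out of scope.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical row type: column values kept as strings for brevity.
using Row = std::vector<std::string>;

// Stand-in for a database session; counts how often the server is contacted.
class MockServer {
public:
    void executeBatch(const std::vector<Row>& batch) {
        ++roundTrips_;
        storedRows_ += batch.size();
    }
    std::size_t roundTrips() const { return roundTrips_; }
    std::size_t storedRows() const { return storedRows_; }

private:
    std::size_t roundTrips_ = 0;
    std::size_t storedRows_ = 0;
};

// Buffers rows on the client and ships them in batches of `batchSize`,
// so N rows cost ceil(N / batchSize) round trips instead of N.
class BulkInserter {
public:
    BulkInserter(MockServer& server, std::size_t batchSize)
        : server_(server), batchSize_(batchSize) {
        assert(batchSize_ > 0);
        buffer_.reserve(batchSize_);
    }
    BulkInserter(const BulkInserter&) = delete;
    BulkInserter& operator=(const BulkInserter&) = delete;
    ~BulkInserter() { flush(); }  // send any partially filled batch

    void insert(Row row) {
        buffer_.push_back(std::move(row));
        if (buffer_.size() == batchSize_) flush();
    }

    void flush() {
        if (buffer_.empty()) return;
        server_.executeBatch(buffer_);
        buffer_.clear();
    }

private:
    MockServer& server_;
    std::size_t batchSize_;
    std::vector<Row> buffer_;
};
```

With a batch size of 100, inserting 250 rows costs three round trips (100 + 100 + 50) instead of 250.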
Current Plug-ins
• Oracle 9i/10g
• SQLite
• MySQL/ODBC
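Part of what each back-end plug-in has to hide is vendor-specific SQL. The sketch below, assuming a neutral table description supplied by user code, generates a `CREATE TABLE` statement for three of the listed back ends; the enum and function names, and column sizes such as `VARCHAR2(255)`, are illustrative assumptions rather than what the RAL plug-ins actually emit.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

enum class Vendor { Oracle, SQLite, MySQL };
enum class ColumnType { Integer, String };

// Vendor-neutral column description, as user code would supply it.
struct Column {
    std::string name;
    ColumnType type;
    bool primaryKey = false;
};

// Each back end maps neutral types onto its own SQL types
// (the chosen sizes are illustrative assumptions).
std::string sqlType(Vendor vendor, ColumnType type) {
    const bool isInt = (type == ColumnType::Integer);
    switch (vendor) {
    case Vendor::Oracle: return isInt ? "NUMBER(10)" : "VARCHAR2(255)";
    case Vendor::SQLite: return isInt ? "INTEGER" : "TEXT";
    case Vendor::MySQL:  return isInt ? "INT" : "VARCHAR(255)";
    }
    return "";
}

// Builds the DDL statement so that user code never writes SQL itself.
std::string createTableSql(Vendor vendor, const std::string& table,
                           const std::vector<Column>& columns) {
    assert(!columns.empty());
    std::string sql = "CREATE TABLE " + table + " (";
    std::vector<std::string> keys;
    for (std::size_t i = 0; i < columns.size(); ++i) {
        if (i > 0) sql += ", ";
        sql += columns[i].name + " " + sqlType(vendor, columns[i].type);
        if (columns[i].primaryKey) keys.push_back(columns[i].name);
    }
    if (!keys.empty()) {
        sql += ", PRIMARY KEY (";
        for (std::size_t i = 0; i < keys.size(); ++i) {
            if (i > 0) sql += ", ";
            sql += keys[i];
        }
        sql += ")";
    }
    sql += ")";
    return sql;
}
```

User code builds the same `std::vector<Column>` regardless of the database; only the `Vendor` value, which here plays the role of the loaded plug-in, changes the generated SQL.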
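POOL relational layer
[Diagram] Components of the POOL relational layer:
• POOL::FileCatalog implemented by RelationalCatalog
• POOL::Collection implemented by RelationalCollection
• POOL::StorageService implemented by RelationalStorageSvc, built on ObjectRelationalAccess
• Common base: RelationalAccess, with the plug-ins OracleAccess, SQLiteAccess, MySQLAccess and ODBCAccess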
Object streaming to Relational DB
• How to map classes ↔ tables?
– C++ and SQL describe data layout with very different constraints/aims
• Objects need a unique identifier (persistent address)
– Allows fast navigation
– Requires a unique index for addressable objects
– Part of the mapping definition
• Mapping has to be stored with the object data
– More mapping versions may be needed
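The three requirements above (a class-to-table mapping, a unique persistent address per object, and a mapping stored and versioned with the data) can be combined in a toy store. All names (`Calibration`, `Mapping`, `RelationalStore`) are hypothetical, the "table" is an in-memory map and each data member is stored as a string column purely for brevity; this is not the design of the Relational Storage Service.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical transient class to be persisted.
struct Calibration {
    int channel;
    double gain;
};

// Hypothetical class <-> table mapping; the version keeps old data readable.
struct Mapping {
    std::string className;
    std::string tableName;
    std::string idColumn;              // unique index: the object's persistent address
    std::vector<std::string> columns;  // columns for `channel` and `gain`, in that order
    int version;
};

using DbRow = std::map<std::string, std::string>;  // column name -> value

class RelationalStore {
public:
    // Mappings are kept with the data so rows can be interpreted later.
    void registerMapping(const Mapping& mapping) { mappings_[mapping.version] = mapping; }

    // Writes one object as one row; returns its persistent address.
    long write(const Calibration& calib, int mappingVersion) {
        const Mapping& m = mappings_.at(mappingVersion);
        assert(m.columns.size() == 2);
        long oid = nextOid_++;
        DbRow row{{m.idColumn, std::to_string(oid)},
                  {m.columns[0], std::to_string(calib.channel)},
                  {m.columns[1], std::to_string(calib.gain)}};
        rows_.emplace(oid, StoredRow{mappingVersion, std::move(row)});
        return oid;
    }

    // Navigation by address: one lookup through the unique id, then the
    // mapping version recorded with the row says which columns to read.
    Calibration read(long oid) const {
        const StoredRow& stored = rows_.at(oid);
        const Mapping& m = mappings_.at(stored.mappingVersion);
        return Calibration{std::stoi(stored.values.at(m.columns[0])),
                           std::stod(stored.values.at(m.columns[1]))};
    }

private:
    struct StoredRow {
        int mappingVersion;
        DbRow values;
    };
    std::map<int, Mapping> mappings_;
    std::map<long, StoredRow> rows_;
    long nextOid_ = 1;
};
```

Rows written under mapping version 1 stay readable after version 2 renames the columns, because each row records which mapping produced it.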
Summary
• The LCG POOL project provides a hybrid store integrating object streaming (ROOT I/O) with RDBMS technology (Oracle/MySQL/SQLite)
– POOL has been integrated into the experiment software
framework by CMS, ATLAS and LHCb
– Successfully deployed as baseline persistency mechanism
and used in production activities at the scale of ~400TB
• POOL continues the LCG component approach by
abstracting relational database access in a vendor
neutral way
– The Relational Abstraction Layer has been released and is being picked up by several experiments
– Relational Storage Service for object persistency in relational
tables is being developed