Databases & Object Persistency
POOL Development
Status and Plans
K. Karr, D. Malon, A. Vaniachine (Argonne National Laboratory)
R. Chytracek, D. Duellmann, M. Frank, M. Girone, G. Govi, J. Moscicki, I. Papadopoulos, H. Schmuecker (CERN)
Z. Xie (Princeton University)
T. Barrass (University of Bristol)
C. Cioffi (University of Oxford)
W. Tanenbaum (Fermi National Accelerator Laboratory)
CHEP 2004, Interlaken, Switzerland
CHEP 2004, POOL Development Status & Plans
The LCG Persistency Framework
• The LCG persistency framework project consists of two parts
– Common project with CERN IT and strong experiment involvement
• POOL
– Hybrid object persistency integrating object streaming (ROOT I/O) with
Relational Database technology
– Established baseline for three LHC experiments
– Has been successfully integrated into the software frameworks of ATLAS,
CMS and LHCb
• See also G. Govi’s talk (382)
– Being successfully deployed in three large scale data challenges
• See also M. Girone’s talk (383)
• Conditions Database
– Conditions DB was moved into the scope of the LCG project
• To consolidate different independent developments
– Should share storage of complex objects in ROOT I/O and RDBMS
back-ends with POOL
• See the talks of A. Valassi (447) and A. Amorim (262) about this work
D.Duellmann, CERN
POOL Project Evolution
• POOL is entering its third year of active development
– During the last 2 years we managed to follow the proposed work plan and
met the rather aggressive schedule to move POOL into the experiment
production
– This year POOL has been proven in the LCG data challenges with volumes
of ~400 TB
• Changing from pure development mode to support, deployment and
maintenance
– Several developers moved their effort into experiment integration or back-end
services
• This is a healthy move and ensures proper coupling between software and
deployment!
• Affects the available development manpower
– Task profile changing from design and debugging to user support and reengineering
• Need to maintain stable and focused manpower from CERN and the
experiments
– This close contact has made POOL a successful project
– Both Experiments and CERN have confirmed their commitment to the project
Development Focus This Year
• Move to ROOT4 (POOL2.0 Line)
– To take advantage of automatic schema evolution and simplified streaming of STL
containers
• Need to ensure backward compatibility for POOL 1.x files
– Currently undergoing validation by the experiments
• Will release two branches until POOL 2 is fully certified
• File Catalog deployment issues
– DC productions showed some weaknesses of grid catalog implementations
• Several new/enhanced catalogs coming up
• Changes in the experiment computing models need to be taken into account
– POOL tries to generalise from specific implementations and provides an open
interface to accommodate upcoming components
• Collections
– Several implementations of POOL collections exist
– Collection cataloguing has been added in response to experiment requests
• Similar to file catalogs
• re-use of catalog implementation and commandline tools
– Experiment analysis models are still being concretized
– Expect experience from concrete analysis challenges
Why a Relational Abstraction Layer (RAL)?
• Goal: Vendor independence for the relational components of POOL,
ConditionsDB and user code
– Continuation of the component architecture as defined in the LCG
Blueprint
– File catalog, collections and object storage run against all available
RDBMS plug-ins
• To reduce code maintenance effort
– All RDBMS client components can use all supported back-ends
– Bug fixes can be applied once centrally
• To minimise risk of vendor binding
– New RDBMS flavours can be added later or used in parallel, and are
picked up by all RDBMS clients
– RDBMS market is still in flux
• To address the problem of distributing data in RDBMS of different
flavours
– Common mapping of application code to tables simplifies distribution of RDBMS
data in a generic application independent way
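The vendor-independence argument can be illustrated with a minimal sketch. All names here (IRelationalBackend, OracleLikeBackend, SqliteLikeBackend, makeBackend, createKeyedTableSql) are hypothetical and are not the POOL RAL API; the point is only that client code written once against an abstract interface works with whichever plug-in is selected.

```cpp
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical interface: a vendor-neutral view of one RDBMS plug-in.
class IRelationalBackend {
public:
    virtual ~IRelationalBackend() {}
    virtual std::string flavour() const = 0;
    // Each plug-in hides its own SQL type name behind this call.
    virtual std::string integerType() const = 0;
};

// Assumed plug-in for an Oracle-style dialect.
class OracleLikeBackend : public IRelationalBackend {
public:
    std::string flavour() const override { return "oracle"; }
    std::string integerType() const override { return "NUMBER(10)"; }
};

// Assumed plug-in for an SQLite-style dialect.
class SqliteLikeBackend : public IRelationalBackend {
public:
    std::string flavour() const override { return "sqlite"; }
    std::string integerType() const override { return "INTEGER"; }
};

// Stand-in for plug-in loading: pick a back-end by name at run time.
std::unique_ptr<IRelationalBackend> makeBackend(const std::string& flavour) {
    if (flavour == "oracle")
        return std::unique_ptr<IRelationalBackend>(new OracleLikeBackend());
    if (flavour == "sqlite")
        return std::unique_ptr<IRelationalBackend>(new SqliteLikeBackend());
    throw std::invalid_argument("unknown RDBMS flavour: " + flavour);
}

// Client code (for example a catalog) is written once against the interface.
std::string createKeyedTableSql(const IRelationalBackend& db,
                                const std::string& table,
                                const std::string& keyColumn) {
    return "CREATE TABLE " + table + " (" + keyColumn + " " +
           db.integerType() + " PRIMARY KEY)";
}
```

A bug fix in createKeyedTableSql would reach every back-end at once, and adding a new flavour only requires one more class derived from the interface.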
Relational Access functionality
• Database Schema Access and Manipulation
– Describing existing and creating new tables
– Support for primary, foreign keys and indices
• Formed by one or more table columns
• Data Manipulation Language
– Insertion, update and deletion of table rows
– Bulk insertions to minimise database server roundtrips
• Queries
– Nested queries involving one or more tables
– Ordering and limiting the result set
– Control of client cache for the result set
– Database cursors
• scalable iteration through large query results
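Why bulk insertion reduces server roundtrips can be shown with a small sketch. FakeServer and BulkInserter are hypothetical names invented for this illustration, with an in-memory object standing in for a real database connection; this is not the POOL interface.

```cpp
#include <cstddef>
#include <string>
#include <vector>

typedef std::vector<std::string> Row;

// Hypothetical stand-in for a database connection that counts roundtrips.
class FakeServer {
public:
    FakeServer() : roundtrips_(0) {}
    // One call corresponds to one client/server roundtrip.
    void sendBatch(const std::vector<Row>& rows) {
        ++roundtrips_;
        stored_.insert(stored_.end(), rows.begin(), rows.end());
    }
    std::size_t roundtrips() const { return roundtrips_; }
    std::size_t rowCount() const { return stored_.size(); }
private:
    std::size_t roundtrips_;
    std::vector<Row> stored_;
};

// Buffers rows on the client and ships them in batches of batchSize.
class BulkInserter {
public:
    BulkInserter(FakeServer& server, std::size_t batchSize)
        : server_(server), batchSize_(batchSize) {}
    void insert(const Row& row) {
        buffer_.push_back(row);
        if (buffer_.size() >= batchSize_) flush();
    }
    // Sends whatever is still buffered; a no-op when the buffer is empty.
    void flush() {
        if (buffer_.empty()) return;
        server_.sendBatch(buffer_);
        buffer_.clear();
    }
private:
    FakeServer& server_;
    std::size_t batchSize_;
    std::vector<Row> buffer_;
};
```

Inserting ten rows with a batch size of four costs three roundtrips in this model instead of ten; the caller has to call flush() once at the end so the last partial batch is not lost.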
Domain Decomposition
• Pure relational data management
– Provide technology neutral RDBMS connectivity
– Encapsulate main differences, e.g. table creation options
– Direct clients: File catalog, Collections and Object relational
mapping
• Object-relational mapping and storage
– Bridges the differences between relational and object world (object
identity resolution, object associations)
– Provide guided object storage
– Direct client: POOL Relational Storage Service
• POOL Relational Storage Service
– Adapter implementing the POOL StorageSvc interfaces
– Direct client: experiment framework
Software design
[Diagram: layered component view. Boxes: Experiment framework; FileCatalog, Collection, StorageSvc; RelationalCatalog, RelationalCollection, RelationalStorageSvc; ObjectRelationalAccess; RelationalAccess; MySQL, Oracle, SQLite; Seal reflection. Legend: "uses" and "implements" arrows; box styles distinguish implementation, abstract interface and technology dependent plugin.]
Relational Access Layer Design
• Interface and implementation design driven by software
requirement document
– Co-authored by main users and POOL developers
• Simple key-value pair interface (AttributeList) used for the
handling and the description of the relational data
• Clean standard C++ interface
– No special SQL types exposed for data elements
– Type converter responsible for default and user-defined type
conversion between C++ and SQL data types
– Can take advantage of vendor specific SQL type extensions
• Exposed SQL fragments are used only in SQL WHERE
clauses
– Most non-standard SQL extensions (e.g. in create table) are well
encapsulated
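The design points above can be sketched as follows. This is a simplified illustration under stated assumptions, not the real POOL AttributeList: the class layout, the helper names sqlTypeFor, columnDefinitions and selectSql, and the default C++ to SQL type table are all invented for the example. It shows a key-value description of the data, a default type conversion, and an SQL fragment that appears only in the WHERE clause.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical key-value description of relational data:
// each attribute is a (column name, C++ type name) pair.
class AttributeList {
public:
    void add(const std::string& name, const std::string& cppType) {
        attributes_.push_back(std::make_pair(name, cppType));
    }
    const std::vector<std::pair<std::string, std::string> >& attributes() const {
        return attributes_;
    }
private:
    std::vector<std::pair<std::string, std::string> > attributes_;
};

// Assumed default type conversion; a real converter would also allow
// user-defined and vendor-specific mappings.
std::string sqlTypeFor(const std::string& cppType) {
    if (cppType == "int") return "INTEGER";
    if (cppType == "float") return "REAL";
    if (cppType == "double") return "DOUBLE PRECISION";
    if (cppType == "std::string") return "VARCHAR(255)";
    throw std::invalid_argument("no default SQL type for " + cppType);
}

// Column definitions are derived from the C++ description alone.
std::string columnDefinitions(const AttributeList& list) {
    std::string sql;
    for (std::size_t k = 0; k < list.attributes().size(); ++k) {
        if (k > 0) sql += ", ";
        sql += list.attributes()[k].first + " " +
               sqlTypeFor(list.attributes()[k].second);
    }
    return sql;
}

// Only the WHERE clause is passed through as a raw SQL fragment.
std::string selectSql(const std::string& table, const AttributeList& output,
                      const std::string& whereFragment) {
    std::string sql = "SELECT ";
    for (std::size_t k = 0; k < output.attributes().size(); ++k) {
        if (k > 0) sql += ", ";
        sql += output.attributes()[k].first;
    }
    sql += " FROM " + table;
    if (!whereFragment.empty()) sql += " WHERE " + whereFragment;
    return sql;
}
```

The caller never names an SQL type; it describes columns in C++ terms and supplies at most a condition such as "X > 5".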
RDBMS plug-ins in POOL
• Oracle 9i/10g
– Based on OCI
– Supports Oracle instant client
– Fully supports the POOL RAL interfaces
– Available for the Linux platforms (win32 will follow)
• SQLite
– A light-weight embeddable SQL database engine
– File-based (zero configuration, administration)
– Available for the Linux and Win32 platforms
• MySQL
– Implementation based on the MyODBC driver
– Prototype released with POOL 1.8
Object to Relational Mapping
• How to map classes ↔ tables ?
– Both C++ and SQL allow describing the data layout
– But with very different constraints/aims
• no single unique mapping
• Need for fast object navigation and a unique object
identity (persistent address)
– requires unique index for addressable objects
– part of mapping definition
• POOL stores mapping with the object data
– need to store mapping versions
A Mapping Example
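#include <string>
#include <vector>

class A {
  int x;                  // primitive member
  float y;                // primitive member
  std::vector<double> v;  // variable-length array
  class B {               // embedded object
    int i;
    std::string s;
  } b;
};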
A Mapping Example
T_A (ID: p.k.)
ID   X    Y     B_I   B_S
1    10   1.4   3     “Hello”
2    22   2.2   3     “Hi”
.    .    .     .     .

T_A_V (ID: f.k. constraint on T_A.ID)
ID   POS   V
1    1     0.12
1    2     12.2
1    3     4.1
1    4     5.452
2    1     32.1
2    2     0.1
2    3     0.1

This is only one of the possible mappings!
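As a sketch of this particular mapping, the code below flattens one object of class A into one T_A row plus one T_A_V row per vector element. The row structs and the functions toRowTA and toRowsTAV are hypothetical helpers written for this illustration, and A is redeclared as a struct so that its members are accessible; POOL itself derives the mapping from the dictionary rather than from hand-written code.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Mirror of the example class A, as a struct so members are accessible.
struct A {
    int x;
    float y;
    std::vector<double> v;
    struct B {
        int i;
        std::string s;
    } b;
};

// One row of T_A: primitives and the embedded object b share the table.
struct RowTA {
    long id;
    int x;
    float y;
    int b_i;
    std::string b_s;
};

// One row of T_A_V: one vector element, keyed by the owning object's ID.
struct RowTAV {
    long id;
    int pos;
    double v;
};

RowTA toRowTA(long id, const A& a) {
    RowTA row;
    row.id = id;
    row.x = a.x;
    row.y = a.y;
    row.b_i = a.b.i;
    row.b_s = a.b.s;
    return row;
}

std::vector<RowTAV> toRowsTAV(long id, const A& a) {
    std::vector<RowTAV> rows;
    for (std::size_t k = 0; k < a.v.size(); ++k) {
        RowTAV row;
        row.id = id;                        // foreign key back to T_A.ID
        row.pos = static_cast<int>(k) + 1;  // POS starts at 1, as in the table
        row.v = a.v[k];
        rows.push_back(row);
    }
    return rows;
}
```

The embedded object b needs no table of its own in this mapping, while the variable-length vector does, which is why the same class could equally be mapped in other ways.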
Mapping Elements
• A complete mapping consists of
– A mapping version per object
– A hierarchical tree of mapping elements per version
• Each mapping element contains
– Element type (“Object”, “Primitive”, “Array”, “POOL reference”,
“Pointer”)
– Database table and column names
– C++ member name and type
– Lower level associated mapping elements
• POOL stores these persistently in 3 (hidden) relational
tables
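A hierarchical tree of mapping elements could look like the sketch below. The struct layout, the helper names and the concrete mapping for class A are assumptions made for illustration; the slides do not give POOL's internal representation or the schema of its three hidden tables.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical in-memory form of one mapping element.
struct MappingElement {
    std::string elementType;  // "Object", "Primitive", "Array", "POOL reference", "Pointer"
    std::string table;        // database table name
    std::string column;       // database column name
    std::string memberName;   // C++ member name
    std::string memberType;   // C++ member type
    std::vector<MappingElement> children;  // lower level mapping elements
};

MappingElement makeElement(const std::string& elementType,
                           const std::string& table,
                           const std::string& column,
                           const std::string& memberName,
                           const std::string& memberType) {
    MappingElement e;
    e.elementType = elementType;
    e.table = table;
    e.column = column;
    e.memberName = memberName;
    e.memberType = memberType;
    return e;
}

// Assumed mapping tree for the example class A (tables T_A and T_A_V).
MappingElement mappingForA() {
    MappingElement root = makeElement("Object", "T_A", "ID", "", "A");
    root.children.push_back(makeElement("Primitive", "T_A", "X", "x", "int"));
    root.children.push_back(makeElement("Primitive", "T_A", "Y", "y", "float"));
    root.children.push_back(
        makeElement("Array", "T_A_V", "V", "v", "std::vector<double>"));
    MappingElement b = makeElement("Object", "T_A", "", "b", "A::B");
    b.children.push_back(makeElement("Primitive", "T_A", "B_I", "i", "int"));
    b.children.push_back(makeElement("Primitive", "T_A", "B_S", "s", "std::string"));
    root.children.push_back(b);
    return root;
}

// Depth-first walk collecting the columns of primitives stored in one table.
void collectColumns(const MappingElement& e, const std::string& table,
                    std::vector<std::string>& out) {
    if (e.elementType == "Primitive" && e.table == table) out.push_back(e.column);
    for (std::size_t k = 0; k < e.children.size(); ++k)
        collectColumns(e.children[k], table, out);
}
```

Walking the tree for table T_A yields the data columns X, Y, B_I and B_S, which is the information a storage service would need to build its insert and select statements.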
Generating a Mapping
• Two use cases need to be supported
1) Starting from existing table schema and data
• Give access to RDBMS data with minimal changes to existing data
• POOL generates default header and mapping from the DB schema
2) Starting from existing C++ header file
• Implement existing class with minimal changes to user C++ code
• POOL generates default DB schema and mapping from the LCG
dictionary entry
• In both cases the user can override a default mapping via an XML
steering file
• Select the C++ classes which are mapped
• Override default mapping rules (e.g. member names and types)
• Define the mapping version
– Mapping then gets “materialized” - e.g. stored in the database with a
command line tool
– Need to support copies and
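The interplay of a default mapping rule and user overrides can be sketched as below. The rule shown (upper-case the member path and join nested members with an underscore, so x becomes X and b.i becomes B_I) is an assumption read off the earlier mapping example, not POOL's documented default, and a std::map stands in for the steering file, whose real format is not given here. The function names are hypothetical.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <string>

// Assumed default rule: upper-case the member path, '.' becomes '_'.
std::string defaultColumnName(const std::string& memberPath) {
    std::string column;
    for (std::size_t k = 0; k < memberPath.size(); ++k) {
        const char c = memberPath[k];
        if (c == '.')
            column += '_';
        else
            column += static_cast<char>(
                std::toupper(static_cast<unsigned char>(c)));
    }
    return column;
}

// Overrides stand in for steering-file entries, keyed by member path.
std::string columnNameFor(const std::string& memberPath,
                          const std::map<std::string, std::string>& overrides) {
    std::map<std::string, std::string>::const_iterator it =
        overrides.find(memberPath);
    return it != overrides.end() ? it->second : defaultColumnName(memberPath);
}
```

Members without an override keep the generated name, so a user only has to list the columns that should differ from the default.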
POOL Summary
• The LCG POOL project provides a hybrid store integrating
object streaming (ROOT I/O) with RDBMS technology
(Oracle/MySQL/SQLite)
– POOL has been integrated into LHC experiments software frameworks
and is used for the pre-production activities in CMS
– Successfully deployed as baseline persistency mechanism for CMS,
ATLAS and LHCb at the scale of ~400TB
• POOL continues the LCG component approach by abstracting
relational database access in a vendor neutral way
– POOL Relational Abstraction has been released and is being picked up
by several experiments
– Minimised risk of vendor binding, simplified maintenance and data
distribution are the main motivations
• POOL as a project is (slowly) migrating to a support and
maintenance phase
– Need to keep remaining manpower focused in order to finish remaining
developments and to provide relevant support to user community