Conditions Database mySQL Implementation Status Report

Download Report

Transcript Conditions Database mySQL Implementation Status Report

Experience with the Open Source
based implementation for ATLAS
Conditions Data Management
System
A.Amorim, J.Lima, C.Oliveira, L.Pedro, N.Barros
ATLAS-DAQ LISBON COLLABORATION
CHEP 2003
Jorge Lima - FCUL
CHEP 2003
21 March 2003
Outline
Goals
Conditions Data, what is it?
Open Source Choice
Runtime environment
Design and Implementation
Performance tests
Conclusions
Jorge Lima - FCUL
CHEP 2003
21 March 2003
GOALS
●
Study the feasibility of a Database Management
System for Conditions Data based in an Open Source
RDBMS
●
●
●
Using only standard SQL features
Keep implementation dependent optimizations in a self
contained set of classes to improve portability
Provide example implementation and evaluate them
●
MySQL based
●
Postgres based
Jorge Lima - FCUL
CHEP 2003
21 March 2003
Conditions Data
●
What is Conditions data?
●
●
●
●
The conditions data reflect the conditions in which the experiment was
performed and the actual physics data was taken.
Calibration and alignment; robustness (DCS); detector description.
Slowly evolving data associated with the experiment besides the event
data itself.
The Conditions DB
●
Data management system for storing and retrieving conditions data
●
Hides details of the underlying implementation or implementations
●
Exposes a commonly agreed data model
●
Provides one or several C++ API for application programmers and a set
of administration tools
Jorge Lima - FCUL
CHEP 2003
21 March 2003
Why Open Source?
●
The panorama ten years ago
●
Reliable DBMS were only available from vendors like ORACLE.
●
RDBMS expertise was concentrated.
●
●
A common way of storing physics data was directly using the file
system.
The situation evolved dramatically in the last years
●
●
●
Robust, high performance, open source implementations of RDBMS
appeared.
A growing community of programmers has a reasonable
understanding of DB technologies.
Open Source RDBMS are being used very successfully in many fields
and the HEP community should not loose this opportunity.
Jorge Lima - FCUL
CHEP 2003
21 March 2003
Why Open Source?
●
Open Source also offers the following
advantages
●
●
Low cost (frequently free of charge)
Code availability allows fine tuning and specific
optimizations when necessary.
●
Available for most common platforms.
●
Users can build and test their own setup
Jorge Lima - FCUL
CHEP 2003
21 March 2003
Runtime environment
LVL1
ROD
ROS/ROB
LVL2
EF
ConfDB
Online
Offline
Online Interface
DCS
Clib/Align/Rob DB
Offline Analysis
Promp Recon
Calibrating node/process
Client
Provider
Sub-system
Control
Jorge Lima - FCUL
CHEP 2003
21 March 2003
Starting point
●
●
●
The legacy from CERN IT
●
API specification (and example implementation in Objectivity).
●
The data model (some similarities with the model for the Conditions Data
found in the BABAR experiment).
Nice but far from complete.
●
No knowledge whatsoever about the object format (BLOBs)
●
Insufficient tag mechanism (only HEAD could be tagged – potential data loss)
There are new challenges posed by the ATLAS complex triggering system
●
Many clients (both online and offline) must be able access data efficiently
●
They must understand (know how to read) each others data.
●
Objects with different granularities (how to deal efficiently with extreme
cases?)
Jorge Lima - FCUL
CHEP 2003
21 March 2003
Implementation
●
The usual development cycle…
●
●
●
●
Devise a relational database schema to cope with the data model.
Devise clustering/replication model to cope with efficiency and
scalability.
Code it and evaluate its performance.
On the other hand, we had to understand in what extent the
current Interface Specification was appropriate:
●
●
Does it fulfil the requirements collected so far amongst the people
working on the several detectors and subsystems?
If not, we’ll need to extend or redesign the Interface…
Jorge Lima - FCUL
CHEP 2003
21 March 2003
Data Model
●
●
Three different variation axis
●
Type (supported by the folder concept)
●
Interval of validity
●
Revision
Tags for grouping collections of objects
Jorge Lima - FCUL
CHEP 2003
21 March 2003
File system like hierarchical structure
Jorge Lima - FCUL
CHEP 2003
21 March 2003
Database schema
Jorge Lima - FCUL
CHEP 2003
21 March 2003
Clustering Model
●
●
Similar to Oracle’s Table Spaces
●
Optimized for the conditions data problem
●
Reconfigurable as the system grows
The DB server doesn’t know nothing about it
●
●
●
The model doesn’t rely on special features of the
underlying technology.
Can be used with many different backends in a
heterogeneous environment.
Replication can also be used
●
Mainly in the top level databases which can became a
bottleneck
Jorge Lima - FCUL
CHEP 2003
21 March 2003
Tagging mechanism
version
Examples: insertion; tagging; retrieving
2-
Object 2
1+
Object 1
1-
HEAD
2+
version
time
HEAD
341-
Object 4
4+
Object 1
Object 3 3+
2Object 2
1+
2+
TAG1
time
Browse objects in HEAD
4
1
3
2
Browse objects in tag TAG1
1
Jorge Lima - FCUL
2
CHEP 2003
21 March 2003
Layered software layout
Extentions to the API
CondDB API (specs)
Upper layer - Deals with the data
Upper Layer
Postgres
Devel. version
MySQL
Stable version
MySQL client library
MySQL server
MyISAM
Jorge Lima - FCUL
model and clustering issues
Bottom layer implementation specific
Postgress client library
Some features found on
the Postgress server are
Postgress server being investigated
InnoDB
CHEP 2003
21 March 2003
What’s wrong?...
From the requirements collected so far by Luis Pedro we can
state that:
●
●
●
●
●
●
Objects with different granularities and different data rates must be stored and
retrieved efficiently
To cope with the object granularity problem efficiently extensions to the API
specification are required.
The object schema must be stored within the database and should be accessible
to the users
Data is presently stored as a BLOB and the current API specification doesn’t
provide means to do it differently.
The tagging and versioning mechanisms are not adequate and can lead to data
loss, specially in multi-client environments.
Only head can be tagged, and the version or revision is meaningless to the user.
Should be possible group objects in more flexible ways. What if someone insert
a new object, thus changing the head while tagging is in progress?
Jorge Lima - FCUL
CHEP 2003
21 March 2003
…What’s wrong?
●
●
●
We need a more flexible tagging mechanism
●
Probably its enough to allow tagging based in some other criteria. For
instance, based in insertion time
●
Version is useless. What about labelling objects at insertion time for later
referencing?
Current model is inappropriate for DCS objects
●
The model is not efficient dealing with many and small objects.
●
DCS objects don’t need revision.
Storing data only as BLOBs will force users to implement their
own solutions to store object schema.
●
Some will use XML data (online); some will use the Athena Store Gate services
(offline); others plain integers.
●
Data will not be interchangeable.
Jorge Lima - FCUL
CHEP 2003
21 March 2003
We need to revise the API
API extensions can solve the problem…
●
Extend the tagging mechanism to support other tagging criteria.
●
Bind object schema to the folder at folder creation time.
●
●
Support for a limited number of built in object types. Basic types like
integers, doubles as well as arrays of integers or doubles
●
Allow extension to user defined types??
Use the most efficient internal representation for a given object type.
●
All types shall support the same interface, though with different degrees of
effectiveness (orthogonality).
●
This effectively means some sort of embedded conversion service.
Jorge Lima - FCUL
CHEP 2003
21 March 2003
Results from performance tests
Tests details
Client Computers
Intel PIII 1GHz dual processor / Linux 2.4.18-18.7.x.cernsmp / 376 MByte RAM /
on-line software release 00-18-01 / gcc 2.96 /Gigabit network between the clients
Server Computer
Intel PIV 2GHz / Linux 2.4.18 / 1 GByte RAM / gcc 2.95.4 / MySQL Distrib 3.23.49 /
Network?????
All tests were performed using the on-line software infrastructure for
synchronization
Each client (controller) establishes 3 connections to the database server when
performing a test
Databases sizes:
100 objects - 37.8KByte
1.000 objects - 180.1 KByte
10.000 objects - 1.6 MByte
100.000 objects - 15.4 MByte
Jorge Lima - FCUL
CHEP 2003
21 March 2003
Retrieve performance (Single-DB)
Read times
(All controllers reading from the same DB)
00:57:36
00:50:24
00:43:12
Time
00:36:00
10 Controllers
00:28:48
50 Controllers
00:21:36
00:14:24
00:07:12
00:00:00
0
10000
20000
30000
40000
50000
60000
70000
80000
90000
100000
Nº of objects
Jorge Lima - FCUL
CHEP 2003
21 March 2003
Retrieve performance (multi-DB)
Read times
(Each controller reads from a different DB)
00:23:02
00:20:10
00:17:17
Time
00:14:24
10 Controllers
20 Controllers
00:11:31
50 Controllers
80 Controllers
100 Controllers
00:08:38
00:05:46
00:02:53
00:00:00
0
10000
20000
30000
40000
50000
60000
70000
80000
90000
100000
Nº of objects
Jorge Lima - FCUL
CHEP 2003
21 March 2003
Comparisons (single client)
Oracle (local)
createFolderx
storeDatax
Oracle (remote)
MySQL (remote)
0m15.173s
0m0.034s
0m11.857s
0m0.072s
10
0m4.973s
0m1.447s
0m5.434s
0m1.127s
100
0m9.749s
0m1.368s
0m15.820s
0m2.345s
10.000
9m22.103s
0m23.175s
11m12.554s
0m56.929s
100.000
109m40.878s
3m49.184s
04m13.283s
8m16.510s
1.000.000
readDatax
MySQL (local)
23m12.563s
40m48.256s
10
0m0.324s
0m0.025s
0m0.955s
0m0.058s
100
0m1.403s
0m0.050s
0m3.061s
0m0.135s
10.000
2m1.919s
0m2.851s
4m37.315s
0m7.926s
100.000
25m46.423s
0m27.273s
46m53.846s
1m18.850s
1.000.000
Jorge Lima - FCUL
4m40.315s
CHEP 2003
10m26.124s
21 March 2003
Conclusions
●
The feasibility of an Open Source based RDBMS for the ConditionsDB was
proven to be possible, furthermore:
●
●
●
●
●
The clustering model devised, which should scale well over data volume and
time, can be used with almost any kind of RDBMS backend.
Porting between different backend implementations doesn’t constitute a major
effort as verified while porting the original implementation to Postgress.
The MySQL implementation outperformed the Oracle’s based one by a factor of
50 on the most critical operations while being slightly slower in store
operations.
Further tests are needed, mainly in multi server and many clients
environments.
Side effects…
●
An important effort for collecting requirements was deployed and several
lacking features were identified.
●
The community is adopting the MySQL ConditionsDB. (they can easily play
around with it).
Jorge Lima - FCUL
CHEP 2003
21 March 2003
Where to fetch the latest release
●
Software project for the MySQL's Conditions Database
implementation, available at the ATLAS offline software repository.
●
ATLAS offline software repository
CVSROOT=:kserver:atlas-sw.cern.ch:/atlascvs
Packages: offline/Database/ConditionsDBMySQL
offline/Database/ConditionsDBTests
offline/Database/IConditionsDB
●
The same package and an another test implementation based in
Postgres are available from FCUL to compile without CMT.
●
Available through CVS and FTP
ftp://kdataserv.fis.fc.ul.pt/pub/Software
CVSROOT=:ext:kdataserv.fis.fc.ul.pt:/usr/local/cvsroot
CVS_RSH=ssh
Packages: ConditionsDB-MySQL, ConditionsDB-PgSQL
Jorge Lima - FCUL
CHEP 2003
21 March 2003