CHEP12_Poster_COMA


Conditions and configuration metadata
for the ATLAS experiment
E J Gallas1, S Albrand2, J Fulachier2, F Lambert2, K E Pachal1, J C L Tseng1, Q Zhang3
1. Introduction to COMA
• The COMA system (Conditions/Configuration Metadata for ATLAS) has
been developed to make globally important run-level metadata more
readily accessible. It is based on a relational database storing directly
extracted, refined, reduced, and derived information from system-specific
data sources as well as information from non-database sources.
• This information facilitates a variety of unique dynamic interfaces and
provides information to enhance the functionality of other systems.
• COMA is one of the three dedicated metadata repositories in ATLAS,
fitting nicely between the AMI and TAG databases, which store metadata
at the dataset and event levels, respectively. A generalization of the data
sources of each of these metadata repositories is shown below:
[Diagram: ProdSys, Tier 0, DDM, the TAG Catalog, Conditions, Trigger,
TAG files, and “other” sources feeding the AMI DB, COMA DB, and TAG DB.]
• COMA data sources include:
– Conditions database: A wide variety of configuration information
and measured conditions at the Run and Luminosity Block (LB) levels.
An ATLAS Run is an interval of data taking (generally lasting many
hours) with a fixed configuration, with selected configurations allowed
to change at the sub-Run (or LB) level.
– Trigger database: Trigger specific information not readily
accessible or available via the Conditions database.
– Tier-0 and TAG Catalog databases: Information about the
processing of Runs and for filtering Runs into COMA: which
Runs are of “analysis” interest and available in the TAG database.
– AMI database: AMI and COMA systems work symbiotically to
store a variety of information about collections of Runs and their
processing, making the data available to both systems. For
Monte Carlo datasets, COMA gets keys from AMI to identify the trigger
configurations in the Trigger database used for MC simulation.
– Other: Information from a variety of non-database sources:
TWiki and other documentation, text and XML files, and human entry.
2. The AMI Framework Task Server
• Data is entered by a set of specialized tasks controlled by the task server.
• Information becomes available from the different sources at
unpredictable times.
• The task server imposes time-sharing.
– A sudden peak of finishing production tasks must not be allowed to
let a backlog of input from Tier-0 develop.
– Processing a little, very often, is best.
• Some tasks must also store a “stop point”, usually the data-source
timestamp of the last successful AMI read.
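The stop-point mechanism can be sketched as follows. This is a minimal illustration, not the real AMI code: the table names (`stop_points`, `source_rows`) and SQLite backend are assumptions made for the example.

```python
import sqlite3

def run_task(db: sqlite3.Connection, task_name: str) -> int:
    """Process source rows newer than the stored stop point, and
    advance the stop point only after a successful read.
    Table names are illustrative, not the real AMI/COMA schema."""
    row = db.execute(
        "SELECT last_ts FROM stop_points WHERE task = ?",
        (task_name,)).fetchone()
    last_ts = row[0] if row else 0

    # Read only what is new since the last successful run.
    rows = db.execute(
        "SELECT ts, payload FROM source_rows WHERE ts > ? ORDER BY ts",
        (last_ts,)).fetchall()
    for ts, payload in rows:
        pass  # refine/derive/insert the record here

    if rows:
        # Persist the new stop point: the timestamp of the last row read.
        db.execute(
            "INSERT OR REPLACE INTO stop_points (task, last_ts) VALUES (?, ?)",
            (task_name, rows[-1][0]))
        db.commit()
    return len(rows)
```

Running the task repeatedly processes each row exactly once, which is what makes the "little and very often" scheduling policy safe.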
3. Overview of the COMA Schema:
[Diagram annotation: speed of treating ActiveMQ messages.]
• T0M (Tier 0 Management) determines which datasets go to AMI.
• AMI polls every 60 seconds.
4. Production System : Timestamp mechanism
• Read everything greater than lastUpdateTime and TaskNumber.
– Reader (AMI) must decide what is relevant.
– "Secure programming" = be ready for surprises.
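The cursor read described above can be sketched in a few lines. The field names (`updateTime`, `taskNumber`) are assumptions for illustration; the point is the compound cursor and the defensive handling of unexpected input.

```python
def read_new_tasks(rows, last_update_time, last_task_number):
    """Return rows beyond the (lastUpdateTime, TaskNumber) cursor.
    Field names are illustrative. Malformed rows are skipped rather
    than crashing the reader ("secure programming": expect surprises)."""
    cursor = (last_update_time, last_task_number)
    out = []
    for row in rows:
        try:
            key = (row["updateTime"], row["taskNumber"])
        except (KeyError, TypeError):
            continue  # surprise input: ignore it, keep reading
        if key > cursor:  # strictly beyond the compound cursor
            out.append(row)
    return sorted(out, key=lambda r: (r["updateTime"], r["taskNumber"]))
```

The reader, not the producer, decides what is relevant, so it must tolerate anything the source emits.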
5. DDM : Publish/Subscribe
• Registration and deletion of data using the ActiveMQ/STOMP
“publish/subscribe” protocol.
• Very reliable; a few problems occur when the production of messages
peaks.
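The publish/subscribe pattern, and why a peak in message production can cause trouble, can be illustrated with a toy in-memory broker (the real system uses ActiveMQ over STOMP; the `Broker` class and bounded queue below are illustrative assumptions, not the production setup).

```python
from collections import defaultdict
from queue import Queue, Full

class Broker:
    """Toy in-memory publish/subscribe broker. Each topic has a
    bounded queue; when producers outpace the consumer, messages
    overflow, which is the failure mode seen during message peaks."""

    def __init__(self, maxsize=1000):
        self.queues = defaultdict(lambda: Queue(maxsize=maxsize))
        self.dropped = 0

    def publish(self, topic, message):
        try:
            self.queues[topic].put_nowait(message)
        except Full:
            self.dropped += 1  # peak in production: queue overflowed

    def drain(self, topic):
        """Consumer side: take everything currently queued."""
        q = self.queues[topic]
        while not q.empty():
            yield q.get_nowait()
```

Decoupling producer and consumer through the queue is what makes the protocol reliable in normal operation; the bounded capacity is what makes peaks the one weak spot.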
6. COMA : Symbiosis
• AMI and COMA “think” they are part of the same application; parts of
COMA were rendered “AMI compliant”.
• COMA has benefited from the AMI infrastructure, in particular pyAMI, the
web service client. AMI writes some aggregated quantities in the COMA
DB.
• AMI has benefited from the access to Conditions Data.
7. Is AMI loading scalable?
• Insertion into AMI takes longer than pure SQL insert operations on a database.
– Many coherence checks,
– derivation of quantities etc.
• Although we have spare capacity almost all of the time, we have
observed backlogs from time to time, usually when massive numbers of
finished production jobs arrive within a short period.
• Some obvious optimisations have not yet been attempted.
• Scalable in the medium term.
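The gap between an AMI insertion and a raw SQL insert can be sketched as follows. This is a stand-in, not the real AMI loader: the table names, the particular checks, and the derived running total are all illustrative assumptions.

```python
import sqlite3

def checked_insert(db, dataset, n_files, n_events):
    """Insert a dataset record only after coherence checks, then
    update a derived quantity; both steps add cost compared with a
    bare INSERT. Schema and checks are illustrative only."""
    # Coherence checks (examples).
    if n_files < 0 or n_events < 0:
        raise ValueError("incoherent record: negative counts")
    if db.execute("SELECT 1 FROM datasets WHERE name = ?",
                  (dataset,)).fetchone():
        raise ValueError("incoherent record: duplicate dataset")

    db.execute("INSERT INTO datasets VALUES (?, ?, ?)",
               (dataset, n_files, n_events))
    # Derived quantity kept up to date on every insert.
    db.execute("UPDATE totals SET events = events + ?", (n_events,))
    db.commit()
```

Each record thus triggers extra reads and writes, which is why a burst of finishing production jobs can build a backlog even though the average load leaves spare capacity.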
8. Some Examples of Derived Quantities
• Complete dataset provenance. (T0 & ProdDB)
• The number of files and events available in the dataset is updated
every time fresh information arrives. (T0, ProdDB, DDM)
• Production Status. (T0, ProdDB, DDM)
• Average, minimum, and maximum cross sections recorded for the
simulation, transported down the production chain. (ProdDB)
• Lost luminosity blocks in reprocessed data. (T0, ProdDB, DDM)
• Run-period reprocessing errors. (ProdDB & COMA)
• Datasets in run periods. (COMA)
(1)
Department of Physics,
Oxford University,
Denys Wilkinson Building, Keble Road,
Oxford OX1 3RH, UNITED KINGDOM
(2)
Laboratoire de Physique Subatomique et Corpusculaire,
Université Joseph Fourier Grenoble 1,
CNRS/IN2P3, INPG,
53 avenue des Martyrs,
38026 Grenoble, FRANCE
(3)
Argonne National Laboratory,
High Energy Physics Division,
Building 360, 9700 S. Cass Avenue,
Argonne, IL 60439, United States of America