LCG Accounting
John Gordon
Grid Deployment Board
13th January 2004
[email protected]
Overview
• Aim is to gather accounting information at the job level from each site and store in a central database so that arbitrary queries can be done against it.
• Four parts
  • Schema
  • Gathering
  • Sending
  • Querying
Schema
• Defined our own
• Then discovered GGF Usage Schema
  • http://www.psc.edu/~lfm/Grid/UR-WG/
Accounting Schema
LocalGroup - the group that the job ran as on the system
LocalJobID - the ID of the job on the system on which it ran **
LCGJobID - the job ID assigned by the RB (null for jobs submitted locally or via globus) **
ExecutingSite - the site at which the job ran **
ExecutingCE - the queue from which the job ran
SubmittingRB - the RB used to submit the job
LocalUserID - the ID of the user on the system on which the job ran **
LCGUserDN - the DN of the user within LCG **
LCGUserVO - the virtual organisation of this user for this job **
LocalStartTime - the local wallclock date/time at which the job commenced execution
LocalStopTime - the local wallclock date/time at which the job ceased execution
UTCStartTime - the wallclock date/time at which the job commenced execution in UTC (GMT) **
UTCStopTime - the wallclock date/time at which the job ceased execution in UTC **
ElapsedTime - StopTime minus StartTime (calculated, in seconds)
BaseCpuTime - the actual CPU time recorded by the system on which the job ran (seconds)
BaseCpuPower - the power of the system on which the job ran in SpecInts (needs more precise definition)
SpecIntSecs - the computing power consumed by the job in SpecInt-Seconds (calculated from previous two fields) **
DiskSpace - the maximum aggregated amount of local disk space used by the job (MB)
DiskIO - the aggregated volume of disk data read/written by the job (MB)
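The two calculated fields can be sketched as follows. This is a minimal illustration, not the project's implementation: the class name `JobRecord` and the sample values are hypothetical, only a subset of the fields is shown, and it assumes SpecIntSecs is the product of BaseCpuTime and BaseCpuPower.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class JobRecord:
    """Hypothetical container for a subset of the accounting schema fields."""
    local_job_id: str
    executing_site: str
    lcg_user_vo: str
    utc_start_time: datetime
    utc_stop_time: datetime
    base_cpu_time: float   # CPU seconds recorded by the batch system
    base_cpu_power: float  # power of the system in SpecInts

    @property
    def elapsed_time(self) -> float:
        """ElapsedTime: StopTime minus StartTime, in seconds."""
        return (self.utc_stop_time - self.utc_start_time).total_seconds()

    @property
    def spec_int_secs(self) -> float:
        """SpecIntSecs: assumed here to be BaseCpuTime * BaseCpuPower."""
        return self.base_cpu_time * self.base_cpu_power


# Made-up sample job: 90 minutes of wallclock, 5000 CPU seconds.
record = JobRecord(
    local_job_id="1234.ce.example.org",
    executing_site="EXAMPLE-SITE",
    lcg_user_vo="atlas",
    utc_start_time=datetime(2004, 1, 13, 9, 0, 0, tzinfo=timezone.utc),
    utc_stop_time=datetime(2004, 1, 13, 10, 30, 0, tzinfo=timezone.utc),
    base_cpu_time=5000.0,
    base_cpu_power=400.0,
)
print(record.elapsed_time)   # 5400.0
print(record.spec_int_secs)  # 2000000.0
```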
Gathering
• PBS doesn’t produce job accounts, only logs events
• Four solutions
  • RAL processes event logs and populates a database
  • NIKHEF does something similar
  • pbsacct on SourceForge
  • Job info is available in the PBS prolog, so it can be recorded directly
• Still considering
• Assume Unix group = VO name
• Several solutions for mapping Unix username to DN
• Not yet chosen
• Initial version probably VO only
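As an illustration of the event-log approach, the sketch below turns one job-end event into schema fields. It assumes the semicolon-separated layout commonly seen in PBS accounting logs (`timestamp;event;jobid;key=value ...`) with `E` marking job end; the sample line, the function names and the choice of fields are hypothetical, and this is not the RAL or NIKHEF code.

```python
def hms_to_seconds(text: str) -> int:
    """Convert an HH:MM:SS duration string to seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def parse_end_record(line: str):
    """Return schema fields for a job-end ('E') event, or None for other events."""
    _timestamp, event, job_id, attributes = line.rstrip("\n").split(";", 3)
    if event != "E":
        return None
    # Attributes are space-separated key=value pairs.
    fields = dict(item.split("=", 1) for item in attributes.split() if "=" in item)
    return {
        "LocalJobID": job_id,
        "LocalUserID": fields["user"],
        "LocalGroup": fields["group"],
        "ExecutingCE": fields["queue"],
        "UTCStartTime": int(fields["start"]),  # seconds since the epoch
        "UTCStopTime": int(fields["end"]),
        "BaseCpuTime": hms_to_seconds(fields["resources_used.cput"]),
    }


# Made-up sample event line in the assumed layout.
sample = (
    "01/13/2004 10:30:00;E;1234.ce.example.org;"
    "user=atlas001 group=atlas queue=short start=1073984400 end=1073989800 "
    "resources_used.cput=01:23:20 resources_used.walltime=01:30:00"
)
parsed = parse_end_record(sample)
print(parsed["LocalGroup"], parsed["BaseCpuTime"])
```

With the "Unix group = VO name" assumption, `LocalGroup` can be copied straight into `LCGUserVO` for an initial VO-only version.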
• LSF will be easier as it has job accounts
• Site ultimately responsible.
• They can filter out certain info (e.g. non-LHC VOs)
• Or remap groups to VOs
• Or replace the whole thing with calls to their own internal database.
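A site-side filter and remap step might look like the sketch below. The VO set, the group-to-VO table and the function name are hypothetical examples of site policy, not part of the accounting system itself.

```python
# Hypothetical site policy: which VOs to publish and how local groups map to VOs.
LHC_VOS = {"alice", "atlas", "cms", "lhcb"}
GROUP_TO_VO = {"atlasprd": "atlas", "cmsuser": "cms"}


def apply_site_policy(records):
    """Remap local groups to VOs and drop records for non-LHC VOs."""
    published = []
    for record in records:
        group = record["LocalGroup"]
        # Default follows the "Unix group = VO name" assumption.
        vo = GROUP_TO_VO.get(group, group)
        if vo not in LHC_VOS:
            continue  # filtered out by the site
        published.append({**record, "LCGUserVO": vo})
    return published


jobs = [
    {"LocalJobID": "1", "LocalGroup": "atlasprd"},
    {"LocalJobID": "2", "LocalGroup": "lhcb"},
    {"LocalJobID": "3", "LocalGroup": "biomed"},
]
for job in apply_site_policy(jobs):
    print(job["LocalJobID"], job["LCGUserVO"])
```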
Sending
• Send schema info over grid to central database
• Considered
  • Spitfire
  • Develop our own web service
  • R-GMA
    • R-GMA Archiver is a web service which takes a row of information and inserts it in a persistent database
• Chose R-GMA
• Stream producer and general archiver
• Retain records in local Mon to provide resilience against network failure.
• Sites could also run archivers to keep a copy of their data local.
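The idea of retaining records locally until they have been delivered can be sketched generically. The code below does not use the R-GMA API: `LocalSpool`, `fake_send` and the in-memory queue are hypothetical stand-ins that only show the retry pattern, and a real deployment would need persistent local storage.

```python
from collections import deque


class LocalSpool:
    """Hold records locally and forward them when the send call succeeds."""

    def __init__(self, send):
        self._send = send          # callable that raises ConnectionError on failure
        self._pending = deque()

    def publish(self, record):
        """Queue a record, then try to deliver everything that is pending."""
        self._pending.append(record)
        return self.flush()

    def flush(self):
        """Send pending records in order; return how many are still retained."""
        while self._pending:
            try:
                self._send(self._pending[0])
            except ConnectionError:
                break  # keep the record and retry on a later flush
            self._pending.popleft()
        return len(self._pending)


# Stand-in for the central database and a network that is initially down.
archive = []
network = {"up": False}


def fake_send(record):
    if not network["up"]:
        raise ConnectionError("central database unreachable")
    archive.append(record)


spool = LocalSpool(fake_send)
spool.publish({"LocalJobID": "1"})
retained = spool.publish({"LocalJobID": "2"})  # both records are retained
network["up"] = True
remaining = spool.flush()                      # both records are delivered
print(retained, remaining, len(archive))
```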
Querying
• Simple SQL query to start
• Canned queries with plotting
• General queries later
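A simple starting query could total usage per VO. The sketch below uses an in-memory SQLite database purely as a stand-in for the central database; the table name `JobRecords`, the reduced column set and the rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE JobRecords ("
    "LocalJobID TEXT, ExecutingSite TEXT, LCGUserVO TEXT, "
    "BaseCpuTime REAL, SpecIntSecs REAL)"
)
# Made-up sample rows.
rows = [
    ("1", "RAL", "atlas", 100.0, 40000.0),
    ("2", "RAL", "cms", 50.0, 20000.0),
    ("3", "NIKHEF", "atlas", 10.0, 5000.0),
]
conn.executemany("INSERT INTO JobRecords VALUES (?, ?, ?, ?, ?)", rows)

# Job count and total SpecInt-Seconds consumed by each VO.
query = """
SELECT LCGUserVO, COUNT(*) AS Jobs, SUM(SpecIntSecs) AS TotalSpecIntSecs
FROM JobRecords
GROUP BY LCGUserVO
ORDER BY LCGUserVO
"""
summary = conn.execute(query).fetchall()
for vo, jobs, total in summary:
    print(vo, jobs, total)
```

Adding `ExecutingSite` to the `GROUP BY` clause would give a per-site breakdown, which is the kind of result a canned query could feed to a plot.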
Progress
• Evaluated different components separately
• Now integrating
  • Our Schema
  • PBS prolog
  • Calling R-GMA stream producer
  • With general archiver consuming
• Prototype by 23rd.