
DIRAC Job submission
A. Tsaregorodtsev, CPPM, Marseille
LHCb-ATLAS GANGA Workshop, 21 April 2004
1
DIRAC Job structure

 Job consists of one or more steps; a job can be split into subjobs corresponding to its steps.
 Step consists of one or more modules; a step is indivisible w.r.t. job scheduling.
 Module is the smallest unit of execution:
  Standard modules are used in production;
  User-defined modules can be used in analysis.

Production Job (example):
 Gauss Step
  SoftwareInstallation module
  GaussApplication module
  BookkeepingUpdate module
 Boole+Brunel Step
  SoftwareInstallation module
  BooleApplication module
  BrunelApplication module
  BookkeepingUpdate module
2
Job class diagram

[Class diagram: Job, Step, and Module each carry n TypedParameters; a Job contains n Steps and a Step contains n Modules; Module is specialized as Shell, Scriptlet, Script, or Application (created via an ApplicationFactory).]
3
Job object structure

 Three-level structure as a result of quite a lot of experimentation:
  Need for very complex workflows for production jobs;
  Can be reduced to a single-step, single-module job for analysis.
 Simple Job class structure allows for easy formalization:
  For example – a simple XML schema for the persistent representation.
4
Job object contract (subset)

 Job('job.xml') – constructor from an XML file
 addParameter()
 addOption() – just a special type of parameter
 addStep()
 getPackages() – packages required by all the steps
 inputData()
 outputData()
 execute() – recursively executes all the steps and modules
 toXML() – save as an XML file
 toJDL() – generate a JDL file suitable for submission
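The contract above could be sketched as a minimal Python skeleton. This is an illustration only, not the actual DIRAC implementation; the internal attribute names and the XML layout produced by toXML() are assumptions:

```python
import xml.etree.ElementTree as ET

class Job:
    """Minimal sketch of the Job contract (illustrative, not DIRAC code)."""

    def __init__(self, xml_file=None):
        self.parameters = {}   # name -> (type, value)
        self.steps = []
        if xml_file is not None:
            self._load(xml_file)

    def _load(self, xml_file):
        # constructor-from-XML: read back parameters written by toXML()
        for p in ET.parse(xml_file).getroot().findall('Parameter'):
            self.parameters[p.get('name')] = (p.get('type'), p.text)

    def addParameter(self, name, ptype, value):
        self.parameters[name] = (ptype, value)

    def addOption(self, owner, name, value):
        # an option is just a special type of parameter
        self.addParameter('%s.%s' % (owner, name), 'OPTION', value)

    def addStep(self, step):
        self.steps.append(step)

    def execute(self):
        # recursively execute all steps (each step executes its modules)
        for step in self.steps:
            step.execute()

    def toXML(self):
        root = ET.Element('Job')
        for name, (ptype, value) in self.parameters.items():
            ET.SubElement(root, 'Parameter',
                          name=name, type=ptype).text = str(value)
        return ET.tostring(root, encoding='unicode')
```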
5
Job API example

module = Module()
module.addParameter('NAME', 'PARAM', 'DaVinci')
module.addParameter('TYPE', 'PARAM', 'APPLICATION')
module.addParameter('VERSION', 'PARAM', '@{DAVINCI_VERSION}')
opt = '={DATAFILE="@{INPUT_FILE}" TYP="ROOT"}'
module.addOption('EventSelector', 'Input', opt)
step = Step()
step.addModule(module)
step.addParameter('INPUT_FILE', 'INPUTDATA',
                  'lfn:/lhcb/production/DST/xxxx.dst')
job = Job()
job.addStep(step)
job.addParameter('DAVINCI_VERSION', 'PARAM', 'v12r3')
job.addParameter('MaxCPUTime', 'JDL', '100000')
job.addHeader('Author', 'atsareg')
xml_string = job.toXML()
jdl_string = job.toJDL()
6
Steps

 Main purpose – container of modules:
  Execute modules;
  Handle module failures;
  Report job progress.
 Steps are indivisible from the Workload Management perspective;
 Steps can be dependent on each other:
  A Job object can represent a DAG of arbitrary complexity.
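Executing such a step DAG amounts to running steps in topological order. The following is a sketch only, under the assumption that dependencies are tracked per step name; it is not the DIRAC scheduling code:

```python
from collections import deque

def run_steps(steps, deps):
    """Execute the steps of a job DAG in topological order.

    steps: dict name -> callable (the step's execute());
    deps:  dict name -> list of prerequisite step names.
    Illustrative sketch only -- not the DIRAC implementation.
    """
    indeg = {s: len(deps.get(s, [])) for s in steps}
    followers = {s: [] for s in steps}
    for s, reqs in deps.items():
        for r in reqs:
            followers[r].append(s)
    ready = deque(s for s, d in indeg.items() if d == 0)
    order = []
    while ready:
        s = ready.popleft()
        steps[s]()          # execute the step (i.e. its modules)
        order.append(s)
        for f in followers[s]:
            indeg[f] -= 1
            if indeg[f] == 0:
                ready.append(f)
    if len(order) != len(steps):
        raise RuntimeError('cycle in step dependencies')
    return order
```

With the two-step production job of slide 2, `run_steps({'gauss': g, 'boole_brunel': b}, {'boole_brunel': ['gauss']})` runs Gauss before Boole+Brunel.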
7
Module types

 The actual work is done in Modules:
  Shell
   • Any shell script, either preinstalled or provided as a module parameter;
  Scriptlet
   • Python code executed within the job process;
  Script
   • A Python class with an execute() method – a Functor;
   • Either shipped with the job or preinstalled.
  Application
   • A more involved Python script for invoking a (Gaudi) application;
   • Preinstalled as part of the DIRAC software.
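A Script-type module in the Functor style could look like the following sketch; the class name and everything besides the execute() method are hypothetical, not actual DIRAC modules:

```python
class Pi0Selection:
    """Sketch of a Script-type module: a Python class with an
    execute() method (a Functor). Hypothetical example only."""

    def __init__(self, input_file):
        self.input_file = input_file

    def execute(self):
        # a real module would process self.input_file here
        return 'processed %s' % self.input_file
```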
8
Module parameters

 Modules have access to the parameters of the containing Step and to the parameters of the Job;
 Module parameters can be defined in terms of the Step or Job parameters:
  Allows very complex workflows to be defined and reused (as templates with parameter placeholders);
  Only a few parameters at the Job level have to be defined in order to instantiate the workflow.
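The @{NAME} placeholders seen in the API example on slide 6 suggest a resolution scheme along these lines. This is a sketch under the assumption that placeholders are looked up in module, step, and job parameters in that order; it is not the actual DIRAC code:

```python
import re

def resolve(value, module_params, step_params, job_params):
    """Substitute @{NAME} placeholders from module, then step, then job
    parameters. Illustrative sketch, not the DIRAC implementation."""
    def lookup(match):
        name = match.group(1)
        for scope in (module_params, step_params, job_params):
            if name in scope:
                # resolve recursively, so a parameter may reference others
                return resolve(scope[name],
                               module_params, step_params, job_params)
        raise KeyError('undefined workflow parameter: %s' % name)
    return re.sub(r'@\{(\w+)\}', lookup, value)
```

For instance, the module parameter '@{DAVINCI_VERSION}' from slide 6 would resolve to 'v12r3' once the job-level parameter is set.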
9
Job further enhancements

 Possibility to generate complete job steering code from interconnected modules.
 Advantages:
  Job object execution does not depend on the local DIRAC software installation;
  More flexibility when connecting steps: complex objects can be passed from step to step.
 Already used inside the DIRAC Production Console:
  Generating a simpler XML job representation to submit to the DIRAC WMS.
10
Workflow definition

 A workflow is an object of the same Job class:
  Possibly not fully defined.
 The DIRAC Console provides a graphical interface to assemble workflows of any complexity from standard building blocks (modules):
  Very useful for production workflow definitions;
  May be too powerful (for the moment) for a user job.
11
Job Submission: UI

 First, the DIRAC UI should be installed:
  Single-script installation;
  Requires only a local Python interpreter;
  Will be installed on lxplus for common use.
 Job submission CLI:
  dirac-job-submit <jdl>
   • Returns a jobID (number);
  dirac-job-status <jobID>
   • Returns a Python dictionary;
  dirac-job-get-output <jobID>
   • Retrieves the OutputSandbox to the local directory.
 A Python API is provided as well.
12
Job Submission: API

 Job submission API to be used in user scripts or UI applications (GANGA).
 An example with the simplified Job definition API:

from Dirac import *
dirac = Dirac()
job = Job()
job.setApplication('DaVinci', 'v12r11')
job.setOption('DV_Pi0Calibr.opts')
job.setInputSandbox(['DV_Pi0Calibr.opts', 'Application_DaVinci_v12r11/lib'])
job.setInputData(['/lhcb/production/DC04/v2/DST/00000743_00008447_9.dst',
                  '/lhcb/production/DC04/v2/DST/00000743_00008479_9.dst'])
job.setOutputSandbox(['DVHistos.root', 'DVNtuples.root'])
jobid = dirac.submit(job)
print "Job ID = ", jobid
13
Example JDL file

 A JDL file is generated behind the scenes to be sent to the DIRAC WMS;
 Similar to LCG JDL:

Requirements = ( member("Lyon_HPSS", other.LocalSE) ) &&
               ( member("CERN_Castor", other.LocalSE) );
Executable = "$LHCBPRODROOT/DIRAC/scripts/jobexec";
Arguments = "jobDescription.xml";
JobName = "DaVinci_1";
SoftwarePackages = { "DaVinci.v12r11" };
JobType = "user";
Owner = "ibelyaev";
InputSandbox = { "lib.tar.gz", "jobDescription.xml", "DV_Pi0Calibr.opts" };
StdOutput = "std.out";
StdError = "std.err";
OutputSandbox = { "std.out", "std.err", "pi0calibr.hbook", "pi0histos.hbook" };
InputData = { "LFN:/lhcb/production/DC04/v2/DST/00000743_00008447_9.dst",
              "LFN:/lhcb/production/DC04/v2/DST/00000743_00008479_9.dst" };
OutputData = { "/lhcb/test/DaVinci/v1r0/LOG/DaVinci_v12r11.alog" };
parameters = [ STEPS = "1"; STEP_1_NAME = "0_0_1" ];
ProductionId = "00000000";
JobId = 1347;
14
Job execution sequence

[Diagram: the User Interface produces a Job JDL, which goes either directly or via the DIRAC WMS and a Site Agent to a CE handler; the CE handler submits a Job wrapper through the CE and the batch system to the Worker Nodes, where each wrapper runs inside a VO (A, B, C) execution environment.]

 Remarks:
  The CE handler can be the same both with WMS-like and direct job submission;
  The Job wrapper does not depend on the CE back-end;
  The CE depends on the VO execution environment:
   • VO software;
   • VO services (DBs, monitoring, etc.).
15
Job Wrapper

 The wrapper script is what is submitted to the CE and is in fact executed on the Worker Node:
  Automatically generated by the WMS (Agent);
  Sets up the environment for the job in place.
 It wraps the executable provided by the user:
  Brings in the input data;
  Brings in the InputSandbox;
  Invokes the user analysis executable;
  Reports the job status;
  Uploads the resulting output data.
16
Job Wrapper

 The wrapper uses external (standard) services via plug-ins:
  Configuration Service;
  File Catalog
   • Plug-ins for the BK, AliEn, LFC catalogs;
  Data moving with the Replica Manager
   • Plug-ins for gridftp, xxxtp, file, rfio accessible storages;
  Job status/parameter reporting
   • Plug-ins for DIRAC monitoring, MonALISA;
   • E.g. providing an R-GMA plug-in would be easy.
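Such a plug-in scheme might be wired up in Python along the following lines. The registry, interface, and the LFCCatalog example are all assumptions for illustration, not the actual DIRAC classes:

```python
class FileCatalogPlugin:
    """Common interface every file-catalog plug-in implements
    (hypothetical sketch, not the DIRAC class names)."""
    def getReplicas(self, lfn):
        raise NotImplementedError

_catalogs = {}

def register_catalog(name, plugin_cls):
    """Register a plug-in class under a catalog name."""
    _catalogs[name] = plugin_cls

def get_catalog(name):
    """Instantiate the plug-in selected, e.g., by configuration."""
    return _catalogs[name]()

class LFCCatalog(FileCatalogPlugin):
    """Toy plug-in; a real one would query the LFC service."""
    def getReplicas(self, lfn):
        # hypothetical replica location for illustration only
        return ['gridftp://example.host/%s' % lfn]

register_catalog('LFC', LFCCatalog)
```

The wrapper then only depends on the interface, so adding, say, an R-GMA reporting plug-in means registering one more class.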
17
Job Wrapper

 The same wrappers are used on all the DIRAC CE back-ends:
  If necessary, back-end specific wrappers can be generated by the specific CE clients.
 Other VO execution environments can be adapted by providing VO-specific plug-ins.
18
DIRAC CE

 CE interface (back-end):
  submitJob(wrapper, jdl, batchname=None)
   • Returns a localJobID;
  getJobStatus(localJobID)
  killJob(localJobID)
  getDynamicInfo()
   • Info on the numbers of queuing/running jobs.
 Other useful functions (currently missing):
  getTimeLeft(localJobID)
  getLimits()
   • Scratch, memory, spool, etc.
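The interface above can be sketched as a Python class; the method names follow the slide, while the in-memory bookkeeping below is purely an illustration of the expected behaviour, not a real back-end:

```python
class ComputingElement:
    """Sketch of the DIRAC CE back-end interface from this slide.
    The in-memory job table is an illustration only."""

    def __init__(self):
        self._jobs = {}      # localJobID -> status string
        self._next_id = 0

    def submitJob(self, wrapper, jdl, batchname=None):
        """Submit a wrapper script; returns a localJobID."""
        self._next_id += 1
        self._jobs[self._next_id] = 'Queued'
        return self._next_id

    def getJobStatus(self, localJobID):
        return self._jobs[localJobID]

    def killJob(self, localJobID):
        self._jobs[localJobID] = 'Killed'

    def getDynamicInfo(self):
        """Numbers of queuing/running jobs."""
        return {'Queued': sum(1 for s in self._jobs.values() if s == 'Queued'),
                'Running': sum(1 for s in self._jobs.values() if s == 'Running')}
```

A concrete back-end (LSF, PBS, Condor, ...) would implement the same four methods against its batch system.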
19
DIRAC CE

 Implementation:
  A special class used by the agents running close to the actual CE;
  Can easily be transformed into a client-server pair for remote job submission:
   • Example: ComputingElementEDG was done this way;
   • No changes to the Agent code.

[Diagram: an Agent talking directly to a CE in front of a batch system, versus an Agent talking to a CE client that forwards to a remote CE server in front of the batch system.]
20