
Extension of DIRAC to enable distributed
computing using Windows resources
3rd EGEE User Forum
11-14 February 2008, Clermont-Ferrand
J. Coles, Y. Y. Li, K. Harrison, A. Tsaregorodtsev,
M. A. Parker, V. Lyutsarev
Overview
Why port to Windows and who is involved?
DIRAC overview
Porting process
 Client (job creation/submission)
 Agents (job processing)
 Resources
Successes/usage
 Deployment
Summary
13th Feb 2008
University of Cambridge
Motivation
 Aim:
 Enable Windows computing resources in the LHCb workload and data management system DIRAC
 Allow whatever can be done under Linux to also be possible under Windows
 Motivation:
 To increase the number of CPU resources available to LHCb for production and analysis
 To offer a service to Windows users
 To allow transparent job submission and execution on Linux and Windows
 Who’s involved:
 Cambridge, Cavendish – Ying Ying Li, Karl Harrison, Andy
Parker
 Marseilles, CPPM - Andrei Tsaregorodtsev (DIRAC Architect)
 Microsoft Research – Vassily Lyutsarev
DIRAC Overview
 Distributed Infrastructure with
Remote Agent Control
 LHCb’s distributed production
and analysis workload and data
management system
 Written in Python
 4 sections
 Client
 User interface
 Services
DIRAC Workload Management
System (WMS), based on the
main Linux server
 Agents
 Resources
 CPU resources and Data
storage
DISET security module
 DIRAC Security Transport module – underlying security module of
DIRAC
 Provides grid authentication and encryption (using X509
certificates and grid proxies) between the DIRAC components
 Uses OpenSSL with pyOpenSSL (DIRAC’s modified version)
wrapped around it.
 Standard: implements Secure Sockets Layer and Transport
Layer Security, and contains cryptographic algorithms
 Additional: grid proxy support
 Pre-built OpenSSL and pyOpenSSL libraries are shipped with
DIRAC
 Windows libraries are provided alongside Linux libraries,
allowing appropriate libraries to be loaded at run time
 Proxy generation under Windows
 Multi-platform command: dirac-proxy-init
 Generated proxies are validated under both Windows
and Linux
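The run-time selection of platform libraries described above can be sketched as follows; this is an illustrative example using only the Python standard library, not actual DIRAC code, and the directory layout is an assumption:

```python
# Illustrative sketch (not actual DIRAC code): pick the pre-built
# OpenSSL/pyOpenSSL library directory matching the host platform at
# run time, so the same DIRAC code runs on Linux and on Windows.
import os
import platform

def platform_library_dir(base_dir="pyOpenSSL"):
    """Return the library directory for the current platform."""
    system = platform.system()  # e.g. 'Windows' or 'Linux'
    return os.path.join(base_dir, system)
```

On a Linux host this returns e.g. pyOpenSSL/Linux, and the Windows build of the same libraries would be loaded from pyOpenSSL/Windows.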
Client – job submissions
 Submissions are made with a valid grid proxy
 Three ways:
 JDL (Job Description Language)
 DIRAC API
 Ganga
 Built on DIRAC API commands
 Currently being ported to Windows
 Successful job submission returns a job ID, provided by the Job Monitoring Service

JDL (submitted under Windows with: > dirac-job-submit.py myjob.jdl):

SoftwarePackages =
{
    "DaVinci.v12r15"
};
InputSandbox =
{
    "DaVinci.opts"
};
InputData =
{
    "LFN:/lhcb/production/DC04/v2/00980000/DST/Presel_00980000_00001212.dst"
};
JobName = "DaVinci_1";
Owner = "yingying";
StdOutput = "std.out";
StdError = "std.err";
OutputSandbox =
{
    "std.out",
    "std.err",
    "DaVinci_v12r15.log",
    "DVhbook.root"
};
JobType = "user";

DIRAC API (run with: > myjob.py, or entered directly in Python under Windows):

import DIRAC
from DIRAC.Client.Dirac import *

dirac = Dirac()
job = Job()
job.setApplication('DaVinci', 'v12r15')
job.setInputSandbox(['DaVinci.opts'])
job.setInputData(['LFN:/lhcb/production/DC04/v2/00980000/DST/Presel_00980000_00001212.dst'])
job.setOutputSandbox(['DaVinci_v12r15.log', 'DVhbook.root'])
dirac.submit(job)
DIRAC Agent under Windows
 Python installation script
 Downloads and installs DIRAC software, and sets up DIRAC Agent
 Agents are initiated on free resources
 Agent job retrieval:
 Run the DIRAC Agent to check whether there are any suitable jobs on the server
 The Agent retrieves any matched jobs
 The Agent reports the job status to the Job Monitoring Service
 The Agent downloads and installs the applications required to run the job
 The Agent retrieves any required data (see next slide)
 The Agent creates a Job Wrapper to run the job (the wrapper is platform aware)
 Output is uploaded to storage if requested
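The retrieval steps above can be sketched as a single polling cycle; every name in this sketch is illustrative, not DIRAC's actual API:

```python
# Hypothetical sketch of one Agent job-retrieval cycle (illustrative
# names, not DIRAC's actual code). The Agent asks the server for a
# matched job, then reports progress at each stage.

def agent_cycle(request_job, report_status):
    """One polling cycle: fetch a matched job and walk it through the stages."""
    job = request_job()                      # ask the server for a suitable job
    if job is None:
        return "no suitable job"
    report_status(job["id"], "matched")      # tell the Job Monitoring Service
    for stage in ("install applications",    # download/install required software
                  "download input data",     # retrieve any required data
                  "run job wrapper",         # platform-aware Job Wrapper
                  "upload output"):          # upload output to storage if requested
        report_status(job["id"], stage)
    return "done"

# Stand-in services for illustration:
events = []
result = agent_cycle(lambda: {"id": 1234},
                     lambda jid, s: events.append((jid, s)))
```

With stand-in services the cycle runs to "done" and records one status report per stage, mirroring the list above.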
(Diagram: DIRAC Agents running at both Linux sites and Windows sites)
Data access
 Data access to LHCb’s distributed data storage system requires:
 Access to LFC (LCG File Catalogue, maps LFNs (Logical File Names) to the
PFNs (Physical File Names))
 Access to the Storage Element
 On Windows a catalogue client is provided via the DIRAC portal service
 Uses DIRAC’s security module DISET and a valid user’s grid proxy
 Authenticates to Proxy server, and proxy server contacts File catalogue on user’s
behalf with its own credentials
 Uses the .NetGridFTP client 1.5.0, provided by the University of Virginia
 Based on GridFTP v1; tests suggest it is compatible with the GridFTP server
used by LHCb (edg uses GridFTP client 1.2.5-1 and Globus GT2)
 Client contains functions needed for file transfers
 get, put, mkdir
 And a batch tool that mimics the command flags of globus-url-copy
 Requirements:
 .Net v2.0
 .NetGridFTP binaries are shipped with DIRAC
 Allows full data registration and transfer to any Storage Element supporting
GridFTP
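The two-step access described above (catalogue lookup, then GridFTP transfer) might be sketched like this; the dictionary stands in for the LFC, the functions are hypothetical names, and a real setup would query the catalogue through the DIRAC proxy service and transfer with the .NetGridFTP client:

```python
# Hypothetical sketch of LFN -> PFN resolution followed by a GridFTP
# "get". All names here are illustrative stand-ins.

def resolve_lfn(catalogue, lfn):
    """Map a Logical File Name to its Physical File Name (None if absent)."""
    return catalogue.get(lfn)

def fetch(pfn, transfers):
    """Stand-in for a GridFTP get: record the requested transfer."""
    transfers.append(("get", pfn))

catalogue = {  # stand-in for the LFC
    "LFN:/lhcb/example.dst": "gsiftp://se.example.org/lhcb/example.dst",
}
transfers = []
pfn = resolve_lfn(catalogue, "LFN:/lhcb/example.dst")
if pfn is not None:
    fetch(pfn, transfers)
```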
DIRAC CE backends
 DIRAC provides a variety of Compute Element
backends under Linux:
 Inprocess (standalone machine), LCG, Condor etc…
 Windows:
 Inprocess
 Agent loops at preset intervals, assessing the status of the resource
 Microsoft Windows Compute Cluster
 Additional Windows specific CE backend
 Requires one shared installation of DIRAC and applications on
the Head node of the cluster
 Agents are initiated from the Head node and communicate with
the Compute Cluster Services
 Job outputs are uploaded to the Sandboxes directly from the
worker nodes
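The Inprocess backend's polling behaviour described above might look like the following sketch; the function names and the interval default are illustrative assumptions, not DIRAC's actual code:

```python
# Illustrative sketch of the Inprocess backend loop: the Agent wakes
# at a preset interval and runs a cycle only when the local resource
# is free. Names are hypothetical, not DIRAC's code.
import time

def inprocess_loop(resource_is_free, run_agent_cycle,
                   interval=60.0, max_cycles=None):
    """Poll the local resource; run an agent cycle whenever it is free."""
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        if resource_is_free():
            run_agent_cycle()
        cycles += 1
        time.sleep(interval)
    return cycles
```

With max_cycles=None the loop runs indefinitely, as the real Agent does on a standalone machine; the parameter is only there to make the sketch testable.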
LHCb applications
 Five main LHCb applications (C++: Gauss, Boole, Brunel, DaVinci; Python: Bender)
Production job: Gauss (event generation and detector simulation) → Sim → Boole (digitisation) → RAWmc → Brunel (reconstruction) → DST
Analysis job: DST → DaVinci (analysis) / Bender → MC statistics
(RAW data flowing from the detector enters the chain at Brunel, in place of RAWmc.)

Sim – simulation data format
RAWmc – RAW Monte Carlo, equivalent to the RAW data format from the detector
DST – Data Storage Tape
Gauss
 Most LHCb applications are compiled for both Linux and Windows
 For historical reasons, we use Microsoft Visual Studio .NET 2003
 Gauss was the only application not previously compiled under Windows
 Gauss relies on three major pieces of software not developed by LHCb:
 Pythia6: simulation of particle production – legacy Fortran code
 EvtGen: simulation of particle decays – C++
 Geant4: simulation of the detector – C++
 Gauss needs each of the above to run under Windows
 Work strongly supported by the LHCb and LCG software teams
 All third-party software now successfully built under Windows
 Most build errors resulted from the Windows compiler being less tolerant of "risky coding" than gcc
 Insists that arguments passed to functions are of the correct type
 Stricter about memory management
 Good for forcing code improvements!
 Able to fully build Gauss under Windows, with both Generator and Simulation parts
 We can produce full Gauss jobs of BBbar events, with distributions comparable to those produced under Linux
 Gauss v30r4 installed and tested on the Cambridge cluster
 Latest release, Gauss v30r5:
 First fully Windows-compatible release
 Contains pre-built Geant4 and Generator Windows binaries
Cross-platform job submissions
 Job creation and submission process is the same under both Linux and
Windows (i.e. uses the same DIRAC API commands, and the same steps)
 Two main types of LHCb grid jobs at present:
 MC production jobs – CPU intensive, no input required; potentially ideal for 'CPU scavenging' jobs
 Recent efforts (Y.Y. Li, K. Harrison) allowed Gauss to compile under Windows (see previous slide)
 A full MC production chain is still to be demonstrated on Windows
 Analysis jobs – require input (data, private algorithms, etc.)
 DaVinci, Brunel, Boole
 Note: requires a C++ compiler for customised user algorithms
 Jobs submitted with libraries are bound to the same platform for processing
 Platform requirements can be added during job submission
 Bender (Python)
 Note: no compiler, linker or private library required
 Allows cross-platform analysis jobs to be performed
 Results are retrieved to the local computer via:
> dirac_job_get_output.py 1234 (results in the output sandbox)
> dirac-rm-get(LFN) (uses GridFTP to retrieve output data from a Grid SE)
DIRAC Windows usage
 DIRAC is supported on two Windows platforms:
 Windows XP
 Windows Server 2003
 Use of DIRAC to run LHCb physics analysis under Windows:
 Comparison between DC04 and DC06 data on the B±→D0(Ksπ+π-)K± channel
 917,000 DC04 events processed under Windows per selection run
 ~48 hours total CPU time on 4 nodes
 A further ~200 jobs (totalling ~4.7 million events) submitted from Windows to DIRAC, processed on LCG, and retrieved on Windows
 Further selection background studies are currently being carried out with the system
 Processing-speed comparisons between Linux and Windows:
 Difficult, as the Windows binaries are currently built in debug mode by default
DIRAC deployment
DIRAC has been deployed at Cambridge, Bristol, Oxford and Birmingham, on hardware ranging from a laptop to compute clusters:

 Windows XP Professional; Intel Pentium 4 CPU 2.00GHz, 504MB RAM; 4 CPUs; Inprocess backend
 Windows XP Professional; Dell Optiplex GX745, Intel Core 2 CPU 6400 @ 2.13GHz, 2.99GB RAM; 2 CPUs; Inprocess backend
 Windows Server 2003 x64 + Compute Cluster Pack 2006; AMD Athlon 64 X2 Dual Core Processor 4400+, 2.21GHz, 2.00GB RAM; 4 nodes (8 CPUs in total); Compute Cluster backend
 Windows XP Tablet; Intel Pentium M processor 2.00GHz, 512MB RAM; 2 CPUs; Inprocess backend
 Windows Server 2003 x64 + Compute Cluster Pack 2006; Intel Xeon CPU 2.66GHz, 31.9GB RAM; 22 nodes (100 CPUs in total); 208GB on mapped disk; Compute Cluster backend
 Windows Server 2003; Intel Xeon CPU 2.80GHz, 2.00GB RAM; 2 CPUs; 136GB on local C: drive; Inprocess backend
 Windows Server 2003 + Compute Cluster Pack 2006; 16 machines, 4 cores each; Compute Cluster backend

Other disk listings: 37.2GB on a C: drive; mapped drives can be linked to Cambridge HEP group storage disks.
Windows wrapping
 The bulk of the DIRAC Python code was already platform independent
 However, not all Python modules are platform independent
 Three types of code modifications/additions:
 Platform-specific libraries and binaries (e.g. OpenSSL, pyOpenSSL, .NetGridFTP)
 Additional Windows-specific code (e.g. Windows Compute Cluster CE backend, .bat files to match Linux shell scripts)
 Minor Python code modifications (e.g. changing process forks to threads)
 DIRAC installation: ~60MB
 Per LHCb application: ~7GB
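A minimal illustration of the third kind of change, a process fork replaced by a thread; the function name and job ID are made up for the example, and this is not actual DIRAC code:

```python
# Illustrative example of replacing a Unix-only process fork with a
# thread, which works on both Linux and Windows (os.fork does not
# exist on Windows; threading is platform independent).
import threading

results = []

def run_job_wrapper(job_id):
    # stand-in for the work the forked child process would have done
    results.append(job_id)

# Unix-only original (schematic):
#   pid = os.fork()
#   if pid == 0:
#       run_job_wrapper(1234)

# Cross-platform replacement:
worker = threading.Thread(target=run_job_wrapper, args=(1234,))
worker.start()
worker.join()
```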
(Chart: Windows port modifications, by file size of the DIRAC code used – 60% unmodified, 34% modified for cross-platform compatibility, 6% Windows specific)
Summary
 Working DIRAC v2r11, able to integrate both Windows standalone and cluster CPUs into the existing Linux system
 Porting: replacement of Linux-specific Python code, and provision of Windows equivalents where platform independence is not possible (e.g. pre-compiled libraries, secure file transfers)
 Windows platforms tested:
 Windows XP
 Windows Server 2003
 Cross-platform job submissions and retrievals
 Little change to syntax for the user
 Full analysis job cycle on Windows, from algorithm development to results analysis (Bender → running on Linux → getting results)
 Continued use for further physics studies
 All applications for MC production jobs tested
 Deployment extended to three sites so far, totalling 100+ Windows CPUs
 Two Windows Compute Cluster sites

Requirements:
 Python 2.4
 PyWin32 (Windows-specific Python module)
 Grid certificate

Future plans:
 Test the full production chain
 Deploy on further systems/sites, e.g. Birmingham
 Larger-scale tests
 Continued usage for physics studies
 Provide a useful tool when LHC data arrives
Backup slides
Cross-platform compatibility
LHCb applications:

            Language   Binaries available
Ganga       Python     -
DIRAC       Python     Linux/Windows compatible
Gauss       C++        SLC3, SLC4, Win32
Boole       C++        SLC3, SLC4, Win32
Brunel      C++        SLC3, SLC4, Win32
DaVinci     C++        SLC3, SLC4, Win32
Bender      Python     Linux/Windows compatible
(Diagram: WMS architecture – Job Management Service, Sandbox Service, Job Monitoring Service, Job Watchdog, Job Matcher, LFC Service and Proxy Server within the DIRAC WMS; job submission by the user; an Agent on the Head Node running DaVinci from the software repository; DISET connections, with local access to the SE.)