keep strong points, work on weaker ones - LHCb Computing


Grid services based architectures
Some buzz words, sorry…
• Growing consensus that Grid services are the right
concept for building computing grids;
• Recent ARDA work has provoked quite a lot of
interest:
– In the experiments;
– In the SC2 GTA group;
– In EGEE.
• Personal opinion - this is the right concept arriving at
the right moment:
– Experiments need practical systems;
– EDG is not capable of providing one;
– Need for a pragmatic, scalable solution without having to
start from scratch.
Tentative ARDA architecture

[Diagram: the tentative ARDA architecture as a set of interconnected Grid
services - API and User Interface; Authentication, Authorisation, Auditing
and Accounting; Information Service; Job Provenance; Metadata Catalogue and
File Catalogue (with DB Proxy); Package Manager; Workload Management; Data
Management; Computing Element; Storage Element; Grid Monitoring and Job
Monitor. The parts discussed in the following are highlighted.]
Metadata catalogue (Bookkeeping database) (1)
LHCb Bookkeeping:
• Very flexible schema;
• Stores objects (jobs, qualities, others?);
• Available as a service (XML-RPC interface);
• Basic schema is not efficient for generic queries:
– need to build predefined views (nightly?);
– views fit the queries posed by the web form;
what about generic queries?
– data is not available immediately after
production.
Needs further development, thinking, searching…
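The predefined-views point above can be illustrated with a toy sketch (all attribute names and values are invented): a flexible triple store answers any query by scanning everything, while a materialised view serves the common web-form query shape directly.

```python
from collections import defaultdict

# Flexible schema, as in the Bookkeeping: (job, attribute, value) triples.
# All attribute names and values below are invented for illustration.
triples = [
    (1, "evtype", "bb_incl"),  (1, "year", "2003"),
    (2, "evtype", "min_bias"), (2, "year", "2003"),
    (3, "evtype", "bb_incl"),  (3, "year", "2002"),
]

def generic_query(**conds):
    """Answer an arbitrary attribute query by scanning every triple."""
    by_job = defaultdict(dict)
    for job, attr, val in triples:
        by_job[job][attr] = val
    return sorted(j for j, attrs in by_job.items()
                  if all(attrs.get(k) == v for k, v in conds.items()))

# Predefined view, rebuilt e.g. nightly: the common web-form query
# (event type + year) becomes a direct lookup instead of a scan.
view = defaultdict(list)
for job in (1, 2, 3):
    attrs = {a: v for j, a, v in triples if j == job}
    view[(attrs["evtype"], attrs["year"])].append(job)
```

`generic_query(evtype="bb_incl", year="2003")` scans all triples; `view[("bb_incl", "2003")]` answers the same question in one lookup, but only for the query shape the view was built for - which is exactly the limitation noted above for generic queries.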
Metadata catalogue (Bookkeeping database) (2)
Possible evolution: keep strong points, work
on weaker ones
• Introduce a hierarchical structure:
– HEPCAL recommendation for an eventual DMC;
– AliEn experience;
• Study further the sharing of parameters between
the job and file objects;
• Other possible ideas.
This is a critical area - worth investigating!
But… (see next slide)
Metadata catalogue (Bookkeeping database) (3)
• Manpower problems:
– For development, but also for maintenance;
• We need the best possible solution:
– Evaluate other solutions:
• AliEn, eventually DMC;
• Contribute to the DMC development;
– Keep the LHCb Bookkeeping as a standard service:
• Replaceable if necessary;
• Allows a fair test of other solutions.
Metadata catalogue (Bookkeeping database) (4)
• Some work has started in Marseille:
– AliEn FileCatalogue installed and populated with
information from the LHCb Bookkeeping;
– Some query efficiency measurements done;
– The results are not yet conclusive:
• Clearly fast if searching within the hierarchy of directories;
• Not so fast if more tags are included in the query;
• Very modest machine used at CPPM - not fair to compare with the
CERN Oracle server.
– Work has started on providing a single interface to both the
AliEn FileCatalogue and the LHCb Bookkeeping.
• How to continue:
– CERN group - possibilities to contribute?
– The CPPM group will continue to follow this line, but
resources are limited;
– Collaboration with other projects is essential.
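A possible shape for that single interface, sketched below. Class and method names are invented; real back-ends would talk XML-RPC to the Bookkeeping or search the AliEn hierarchy, and are stubbed here with fixed results.

```python
class MetadataCatalogue:
    """Common front for metadata queries, whichever back-end answers."""
    def find_files(self, **tags):
        raise NotImplementedError

class BookkeepingCatalogue(MetadataCatalogue):
    """Would call the Bookkeeping XML-RPC service; stubbed for the sketch."""
    def find_files(self, **tags):
        return ["lfn:/lhcb/prod/00001.dst"]

class AliEnCatalogue(MetadataCatalogue):
    """Would search the AliEn directory hierarchy; stubbed for the sketch."""
    def find_files(self, **tags):
        return ["lfn:/lhcb/prod/00001.dst", "lfn:/lhcb/prod/00002.dst"]

def query(catalogues, **tags):
    """Clients issue one call; results are merged across back-ends."""
    results = set()
    for cat in catalogues:
        results.update(cat.find_files(**tags))
    return sorted(results)
```

With such a front, either back-end is replaceable, and the two can be run side by side for a fair comparison, as suggested on the previous slide.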
File Catalogue (Replica database) (1)
• The LHCb Bookkeeping was not conceived with
replica management in mind - it was added later;
• A File Catalog is needed for many purposes:
– Data;
– Software distribution;
– Temporary files (job logs, stdout, stderr, etc.);
– Input/Output sandboxes;
– Etc., etc.
• Absolutely necessary for DC2004;
• The File Catalog must provide controlled access to its
data (private group and user directories).
In fact we need a full analogue of a distributed file
system
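The controlled-access requirement can be sketched as a logical namespace with per-directory ownership, the file-system analogue just mentioned. Paths, owners and the ACL model below are all invented for illustration.

```python
import posixpath

# Logical namespace: LFN -> list of physical replicas.
catalogue = {}

# Per-directory access control, as a distributed file system would have.
acl = {
    "/lhcb/user/joel": {"owner": "joel"},
    "/lhcb/prod":      {"owner": "production"},
}

def may_write(user, lfn):
    """Walk up the path to the nearest directory with an ACL entry."""
    path = posixpath.dirname(lfn)
    while path not in acl and path not in ("/", ""):
        path = posixpath.dirname(path)
    entry = acl.get(path)
    return entry is not None and entry["owner"] == user

def register(user, lfn, pfn):
    """Register a replica, enforcing the directory ownership check."""
    if not may_write(user, lfn):
        raise PermissionError(f"{user} may not write under {lfn}")
    catalogue.setdefault(lfn, []).append(pfn)
```

A user may register replicas under their own directory but not under the production area - the "private group, user directories" behaviour required above.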
File Catalogue (Replica database) (2)
• We should look around for possible solutions:
– Existing ones (AliEn, RLS):
• Will have Grid services wrapping soon;
• Will eventually comply with the ARDA architecture;
• Large development teams behind them (RLS, EGEE?).
• This should be coupled with the whole range of
data management tools:
– Browsers;
– Data transfers, both scheduled and on demand;
– I/O API (POOL, user interface).
This is a huge enterprise, and we should rely on
using one of the available systems
File Catalogue (Replica database) (3)
• The suggestion is to start with the deployment of the
AliEn FileCatalogue and data management tools:
– Partly done;
– Pythonify the AliEn API:
• This will allow developing GANGA and other application
plugins;
• Should be easy, as the C++ API (almost) exists.
– Should be interfaced with the DIRAC workload
management (see below);
– Who? The CPPM group; others are very welcome;
– Where? Install the server at CERN.
• Follow the evolution of the File Catalogue Grid
services (the RLS team will not yield easily!).
This is a huge enterprise, and we should rely on
using one of the available systems
Workload management (1)
• The present production service is OK for
simulation production tasks;
• We need more:
– Data reprocessing in production (planned);
– User analysis (sporadic);
– Flexible policies:
• Quotas;
• Accounting;
– Flexible job optimizations (splitting, input prefetching,
output merging, etc.);
– Flexible job preparation (UI) tools;
– Various job monitors (web portals, GANGA plugins,
report generators, etc.);
– Job interactivity;
– …
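One of the optimisations named above, job splitting, in miniature (function and parameter names are invented for the sketch):

```python
def split_job(input_files, files_per_job=2):
    """Split one logical job into sub-jobs taking slices of the inputs."""
    for i in range(0, len(input_files), files_per_job):
        yield {"inputs": input_files[i:i + files_per_job]}
```

Five input files with `files_per_job=2` yield three sub-jobs; output merging would be the inverse step once the sub-jobs finish.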
Workload management (2)
• Possibilities to choose from:
1. Develop the existing service;
2. Use another existing service;
3. Start developing a new one.
• Suggestion – a mixture of all of these choices:
– Start developing the new workload management
service using the existing agent-based infrastructure and
borrowing some ideas from the AliEn workload
management:
• Already started, actually (V. Garonne);
• First prototype expected next week;
• Will also try an OGSI wrapper for it (Ian Stokes-Rees);
– Keep the existing service as a jobs provider for the new
one.
Workload management architecture

[Diagram: jobs enter the Workload Management service through the Job
Receiver from GANGA, the Production service or a command line UI; they
pass through a chain of Optimizers (Optimizer 1, …) into the Job queue
and Job DB; the Match Maker dispatches them to Site Agents (Agent 1–3),
each serving a Computing Element (CE 1–3).]
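The Site Agents in the diagram pull work rather than having jobs pushed to them: each agent periodically asks the Match Maker for a job fitting its CE. A minimal sketch of one such cycle follows; the function names and the requirements format are invented.

```python
import queue

# The central Job queue that Optimizers feed and the Match Maker serves.
job_queue = queue.Queue()

def request_job(ce_capabilities):
    """One Match Maker cycle: hand out a queued job if the CE fits it."""
    try:
        job = job_queue.get_nowait()
    except queue.Empty:
        return None
    fits = all(ce_capabilities.get(k) == v
               for k, v in job["requirements"].items())
    if fits:
        return job
    job_queue.put(job)  # no match: leave the job for another agent
    return None
```

An agent would loop: request a job, run it on its CE, report status to the Job Monitor, and ask again, so load adapts to whatever capacity each site actually offers.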
Workload management (3)
• Technology:
– JDL job description;
– Condor ClassAd library for matchmaking;
– MySQL for the Job DB and Job Queues;
– SOAP (OGSI) external interface;
– SOAP and/or Jabber internal interfaces;
– Python as the development language;
– Linux as the deployment platform.
• Dependencies:
– File catalog and data management tools:
• Input/Output sandboxes;
– CE:
• DIRAC CE;
• EDG CE wrapper.
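The ClassAd matchmaking named above, in miniature: job and resource each publish an attribute set plus a Requirements expression evaluated against the other side's ad. Real Condor ClassAds use their own expression language; a restricted Python eval stands in for it here, and all attribute names are invented.

```python
def matches(job_ad, ce_ad):
    """Symmetric match: each ad's Requirements must hold for the other."""
    def holds(ad, other):
        expr = ad.get("Requirements", "True")
        # Evaluate the expression with the other ad's attributes in scope.
        return bool(eval(expr, {"__builtins__": {}}, dict(other)))
    return holds(job_ad, ce_ad) and holds(ce_ad, job_ad)

job = {"Owner": "lhcb",
       "Requirements": "Memory >= 512 and OpSys == 'LINUX'"}
ce = {"Memory": 1024, "OpSys": "LINUX",
      "Requirements": "Owner == 'lhcb'"}
```

Because both sides state requirements, site policies (quotas, owner restrictions) and job needs are expressed in the same mechanism, which is what makes ClassAds attractive for the flexible policies listed earlier.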
Conclusions
• Most experiment-dependent services are to be
developed within the DIRAC project:
– MetaCatalog (Job Metadata Catalog);
– Workload management (with experiment-specific policies
and optimizations);
– These can eventually be our contribution to the common pool
of services.
• Get other services from the emerging Grid services
market:
– Security/Authentication/Authorization, FileCatalog,
DataMgmt, SE, CE, Information,…
• Aim at having DC2004 done with the new (ARDA)
services-based architecture:
– Should be ready for deployment in January 2004.