
EGEE Kickoff, Cork, Ireland, 19 April 2004
www.eu-egee.org
D0 Experiences (or not) on LCG-2
J. A. Templon
Undecided (NIKHEF)
EGEE is a project funded by the European Union under contract IST-2003-508833
Contents
• What do we want to do on LCG-2 (EDG++)?
• What do we have to do to get there?
• Other assorted remarks
Information I intend to transfer
• How did we get the data into the EDG?
• What did we do to reprocess it?
• How did we get it back out?
• Problems and plans for the future
How we got the data into EDG
[Diagram components: Replica Location Service; SAM Station; EDG Storage Element “classic”; EDG UI machine; NFS mounts; back-end RAID disk array. Packages shown: D0 dist (mcc), mc_runjob, python 2.1, pyxml, d0rcpy_m.nn.q]
Generic launcher script
• D0 core software is double-wrapped
• Submissions are generated by a Python script; for each submission:
• d0job.sh is submitted with the arguments:
  - version string for the d0rcpy util package
  - name (LFN) of the data file to be reprocessed
  - location to store the output
• d0job.sh uses RLS to pick up the corresponding version of the d0rcpy Python utils
• Untars the d0rcpy utils and launches (another) Python script
• d0job.sh is responsible only for the following:
  - Showing up on the WN
  - Getting the D0/EDG software and installing it
  - Passing typical run-time parameters
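A minimal sketch of the submission-generation step, assuming one EDG-style JDL description per data file that runs d0job.sh with the three arguments listed above. The JDL attribute names are standard EDG/LCG JDL; the sandbox file names, the version string, the LFNs and the output location are hypothetical placeholders, and the actual submission command is not shown.

```python
# Hypothetical sketch: build one EDG-style JDL description per input file,
# each running d0job.sh with (version, LFN, output location).
# File names, LFNs and the output location are illustrative only.


def make_jdl(version, lfn, output_location, script="d0job.sh"):
    """Return a JDL string that runs `script` with the three job arguments."""
    args = " ".join([version, lfn, output_location])
    lines = [
        'Executable    = "%s";' % script,
        'Arguments     = "%s";' % args,
        'StdOutput     = "d0job.out";',
        'StdError      = "d0job.err";',
        'InputSandbox  = {"%s"};' % script,
        'OutputSandbox = {"d0job.out", "d0job.err"};',
    ]
    return "\n".join(lines) + "\n"


def generate_submissions(version, lfns, output_location):
    """Map each data-file LFN to the JDL text for its own d0job.sh submission."""
    return {lfn: make_jdl(version, lfn, output_location) for lfn in lfns}


if __name__ == "__main__":
    jobs = generate_submissions(
        "m.nn.q",  # placeholder version string for the d0rcpy package
        ["lfn:example_raw_file_001", "lfn:example_raw_file_002"],  # hypothetical LFNs
        "example-se.example.org/d0/output",  # hypothetical output location
    )
    for lfn, jdl in sorted(jobs.items()):
        print("# job for", lfn)
        print(jdl)
```

Keeping the per-file JDL generation separate from submission makes it easy to inspect or regenerate job descriptions without touching the grid.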
Python script
• Contains all the grid stuff. Don’t modify the D0 software unless absolutely necessary!
  - Remove a few of the many duplicate system libs
  - Change a few of the env vars, linker (py) options, etc.
• Takes care of
  - Setting up the D0 environment
  - Getting data files
  - Publishing status and diagnostics
  - Running the reprocessing
  - Basic checking
  - Storing output and registering it in the EDG RLS
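The responsibilities above can be sketched as a list of named stages run in order, with a status record published around each one. This is an illustrative sketch, not the actual D0 script: the stage names mirror the slide, `publish` is a hypothetical stand-in for the R-GMA status publisher, and the stage bodies are placeholders.

```python
# Hypothetical sketch of the wrapper's control flow: run each stage in order,
# publish a status record around it, and stop at the first failing stage.


def run_stages(stages, publish):
    """Run (name, callable) pairs in order; return True only if all succeed.

    `publish(name, state)` stands in for the status/diagnostics publisher.
    A stage signals failure by returning a false value or raising.
    """
    for name, stage in stages:
        publish(name, "start")
        try:
            ok = stage()
        except Exception as exc:  # record the diagnostic, then stop
            publish(name, "error: %s" % exc)
            return False
        if not ok:
            publish(name, "failed")
            return False
        publish(name, "done")
    return True


if __name__ == "__main__":
    log = []

    def publish(name, state):
        log.append((name, state))

    # Placeholder stages named after the slide; real ones would do the work.
    stages = [
        ("setup_d0_environment", lambda: True),
        ("get_data_files", lambda: True),
        ("run_reprocessing", lambda: True),
        ("basic_checks", lambda: True),
        ("store_and_register_output", lambda: True),
    ]
    print(run_stages(stages, publish))
    print(log)
```

Stopping at the first failure keeps the published status trail short and points directly at the stage that broke.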
Step by step
• Publishes “start” record in db (later)
• Makes a temporary dir (uses $TMPDIR if defined)
• Gets D0 tarballs (mcc, mc_runjob, python, pyxml) and installs them
• Uses the m4 preprocessor to insert instance-specific info (file name) into the job-control macro
• Publishes “start processing” record
• Runs d0
• Tars up the output files, transfers them to the SE, registers them
• Publishes end record and exits (returns to the shell script)
• Shell script erases all trace of the job on disk
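A minimal Python sketch of the scratch-directory, macro and packaging steps. Two assumptions: `string.Template` stands in for the m4 preprocessor the script actually used, and the `$INPUT_FILE` placeholder, macro text and tarball name are hypothetical. The `finally` clause plays the role of the shell script’s clean-up.

```python
import os
import shutil
import tarfile
import tempfile
from string import Template


def make_workdir():
    """Create a scratch directory under $TMPDIR if it is set, else the system default."""
    return tempfile.mkdtemp(prefix="d0job_", dir=os.environ.get("TMPDIR") or None)


def fill_macro(template_text, input_file):
    """Insert the instance-specific file name into a job-control macro.

    Stand-in for m4: string.Template replaces the hypothetical $INPUT_FILE placeholder.
    """
    return Template(template_text).substitute(INPUT_FILE=input_file)


def tar_outputs(workdir, output_names, tar_name="output.tar.gz"):
    """Pack the named output files from workdir into one gzipped tarball."""
    tar_path = os.path.join(workdir, tar_name)
    with tarfile.open(tar_path, "w:gz") as tar:
        for name in output_names:
            tar.add(os.path.join(workdir, name), arcname=name)
    return tar_path


if __name__ == "__main__":
    workdir = make_workdir()
    try:
        macro = fill_macro("input_file = $INPUT_FILE\n", "example_raw_file_001")
        with open(os.path.join(workdir, "job.macro"), "w") as f:
            f.write(macro)
        # ... run the reprocessing here; it would write its outputs into workdir ...
        print(tar_outputs(workdir, ["job.macro"]))
        # ... transfer the tarball to the SE and register it (not shown) ...
    finally:
        shutil.rmtree(workdir)  # erase all trace of the job on disk
```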
Data back into SAM
When the project finished, I made a list of the names of all the output files and e-mailed it to Willem van Leeuwen
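Monitoring
• From within the Python script (jstabl is the status-table object; start_time is set earlier in the script):
    import socket, sys
    worker_node = socket.getfqdn()
    # site = domain part of the worker node's fully qualified name
    site = worker_node[worker_node.find('.') + 1:]
    jstabl.set_val('site', site)
    jstabl.set_val('start_time', start_time)
    # record the full command line the job was started with
    cmdline = ' '.join(sys.argv)
    jstabl.set_val('command', cmdline)
    jstabl.insert()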
• Under the hood: R-GMA (EDG product)
• Easy to replace, as long as the replacement needs no more than “set_val” and “insert”; R-GMA has an SQL-like structure
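Because only `set_val` and `insert` are used, a replacement backend can be very small. The sketch below is hypothetical: `JobStatusTable` is an illustrative name, and it writes rows to a local SQLite table (keeping the SQL-like flavour) instead of publishing through R-GMA.

```python
import sqlite3


class JobStatusTable:
    """Hypothetical stand-in for the R-GMA-backed table: set_val() + insert() only.

    Values are collected per row and written to a local SQLite table;
    the table and column names are illustrative.
    """

    COLUMNS = ("site", "start_time", "command")

    def __init__(self, conn):
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS job_status "
            "(site TEXT, start_time TEXT, command TEXT)"
        )
        self.values = {}

    def set_val(self, column, value):
        """Stage one column value for the next row."""
        if column not in self.COLUMNS:
            raise KeyError("unknown column: %s" % column)
        self.values[column] = value

    def insert(self):
        """Write the staged values as one row (unset columns become NULL), then reset."""
        row = tuple(self.values.get(c) for c in self.COLUMNS)
        self.conn.execute(
            "INSERT INTO job_status (site, start_time, command) VALUES (?, ?, ?)", row
        )
        self.conn.commit()
        self.values = {}


if __name__ == "__main__":
    jstabl = JobStatusTable(sqlite3.connect(":memory:"))
    jstabl.set_val("site", "example.org")
    jstabl.set_val("start_time", "2004-04-19 10:00")
    jstabl.set_val("command", "d0job.sh m.nn.q lfn:example out")
    jstabl.insert()
    print(jstabl.conn.execute("SELECT * FROM job_status").fetchall())
```

The monitoring calls in the job script stay unchanged; only the object behind `jstabl` is swapped.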
Unpleasant EDG stuff
• Single storage machine => bottleneck
• “WP5” SEs
• R-GMA stability
• Software distribution reliable but inefficient
• Poor submission command throughput
What about LCG-2?
• No RLS services for D0 => no data management
• No standard R-GMA distribution => no monitoring
• No standard mechanism for getting new VOs into LCG-2 or EGEE
• D0 is “accepted” but that is a long way from “supported”!