CDF UK MINI-GRID

Download Report

Transcript CDF UK MINI-GRID

CDF-UK
MINI-GRID
Ian McArthur
Oxford University, Physics Department
[email protected]
3rd Nov 2000
HEPiX/HEPNT 2000
1
Background


CDF collaborators in the UK applied for JIF grant
for IT equipment in 1998. Awarded £1.67M in
summer 2000.
First half of grant will buy
– Multiprocessor systems plus 1TB of disk for 4
Universities
– 2 multiprocessors plus 2.5 TB of disk for RAL
– A 32 CPU farm for RAL
– 5 TB of disk and 8 high end workstations for FNAL


Emphasis on high IO throughput ‘superworkstations’.
A dedicated network link from London to FNAL
3rd Nov 2000
HEPiX/HEPNT 2000
2
CDF-UK Equipment Bid
3rd Nov 2000
HEPiX/HEPNT 2000
3
Hardware and Network


Tender document is written and schedule is on
target for equipment delivery in May 2001.
Second phase starts June 2002
Developed a scheme for transparent access to
CDF systems via the US link.
– Each system CDF-UK requires to use the link has an
alternative IP name and address to allow the data to be
sent down the dedicated link.
– A Network Address Translation scheme ensures that
return traffic takes the same path (symmetric routing)
– Demonstrated the scheme working with 2 Cisco routers
on a local network.
– Starting to talk to network providers to implement
physical link.
– Must try to make Kerberos work across this link
3rd Nov 2000
HEPiX/HEPNT 2000
4
3rd Nov 2000
HEPiX/HEPNT 2000
5
Software Project

JIF proposal only covered hardware but in the
meantime GRID has arrived !
 Aim to provide a scheme to allow efficient use of
the new equipment and other distributed
resources.
 Concentrate on solving real-user issues.
 Develop an architecture for locating data, data
transfer and job submission within a distributed
environment
 Based on the GRID architecture initially on top of
the Globus toolkit. Gives us experience in this
rapidly developing field.
3rd Nov 2000
HEPiX/HEPNT 2000
6
Some Requirements

Want an efficient environment: so automate
routine tasks as much as possible
 With few resources available must make best use
of the existing packages and require few or no
modifications to existing software.
 To make best use of the systems available:
– data may need to be moved to where these is available
CPU,
– or a job may need to be submitted to a remote site to
avoid moving the data.

Produce a simple but useful system ASAP.
3rd Nov 2000
HEPiX/HEPNT 2000
7
Design principles



All sites are equal
All sites hold meta-data describing only local data
Use LDAP to publish meta-data kept in:
– Oracle - at FNAL
– msql - at most other places
• may go to MySQL
– Can introduce caching but keep it simple at first


Use local intelligence at each end of data transfer
– allows us to take account of local
idiosyncrasies e.g. use of near-line storage, disk
space management
Use existing Disk Inventory Manager
3rd Nov 2000
HEPiX/HEPNT 2000
8
CDF Data

Dataset: a primary dataset contains all the
processed data from a specific physics channel.
– Secondary datasets by event selection
– Datasets will grow over time as more data is taken and data
continues to be processed.

Fileset: smallest collection of data which can be
requested from the data handling system. At
Fermilab, a fileset is mapped to a single partition on
a tape and contains a few files.

File: A member of a fileset. The smallest unit of data
known to a filesystem, typically 1GB.
Metadata: Stores relationships between files, filesets
and datasets, run conditions, luminosity etc.

3rd Nov 2000
HEPiX/HEPNT 2000
9
Data Location/Copy
3rd Nov 2000
HEPiX/HEPNT 2000
10
Layers
User
Interface
Job
Submission
Dataset
maintainer
Data
locator
...
Data
copier
Globus toolkit
3rd Nov 2000
HEPiX/HEPNT 2000
11
Functionality at a site

A mechanism to allow jobs from participating sites
to be run.
 Publication of the local metadata
 Publication of information about other system
resources (CPU, Disk, Batch queues etc).
 Transmission of data via network.
– This may involve staging of data from tape to disk before
transmission.

Receive data from the network or from tapes.

Copy or construct metadata

Some sites may have reduced functionality
3rd Nov 2000
HEPiX/HEPNT 2000
12
Scope

Plan to install at
– 4 UK universities (Glasgow, Liverpool, Oxford,
UCL)
– RAL
– FNAL (although this would be reduced
functionality, data and metadata exporter)
– More non-UK sites could be included

Intend to have basic utilities in place at
time of equipment installation (May 2001)
3rd Nov 2000
HEPiX/HEPNT 2000
13
Work so far



Project plan under development – once finished
additional resources will be requested.
Globus installed at a number of sites. Remote
execution of shell commands checked.
Some bits demonstrated:
– LDAP to Oracle via Python script
• Python convenient scripting language for the job
• May use a daemon to hold connection to ORACLE
• LDAP only implement search - and even this is quite tricky
because your script should support filter, base and scope.
• LDAP schema will not reflect full SQL schema but just
what is needed.
– Java to LDAP (via JNDI)
• JNDI (Java Naming and Directory Interface) gives very
elegant interface to LDAP
3rd Nov 2000
HEPiX/HEPNT 2000
14
Longer Term Goals


User Interface to be implemented as Java
application to give platform independence.
UI to automate or suggest strategies for moving
data/submitting jobs
– Need to include cost/elapsed time estimates for task
completion
– Need to look up dataset sizes, network health, time to
copy from tape or disk, cpu load etc.



Look for more generic solutions
Evaluate any new GRID tools which might
standardize any parts we’ve implemented
ourselves.
Consolidation with other GRID projects
3rd Nov 2000
HEPiX/HEPNT 2000
15