Physics with SAM

SAM middleware components
Stefan Stonjek
University of Oxford
7th GridPP Meeting
GridPP7 - 02 July 2003
02nd July 2003
Stefan Stonjek
Oxford
Slide 1
Outline
Introduction to SAM
Internals of a SAM station
Design of a SAM station
SAM-Grid Architecture
D0 reconstruction effort
Outlook and Summary
Introduction to SAM
SAM: Sequential data Access via Meta-data
SAM is a distributed data handling system
One SAM station per processing node/cluster/site
D0: RAL, IC, Manchester, Lancaster
CDF: RAL, Oxford, Glasgow, Scotgrid, UCL, Liverpool
SAM – central vs. decentral
Each SAM station has a local file cache
Files are transferred from station to station (no central storage, peer to peer)
Central database keeps track of all files, metadata, users, etc. in the SAM system
Not fully peer to peer yet: peer to peer with a central database
The SAM Station
Each station runs one station master process
This communicates with the outside world
Local SAM processes talk to the station master
Station master talks with the central database
A SAM Analysis Project
For every new analysis job a new project is created
A project corresponds to a list of files
Project-Master process keeps track of the status of each file in this project
A project can have multiple consumers
Every file goes to only one consumer
Allows easy processing on farms
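The bookkeeping described on this slide can be illustrated with a minimal sketch. It is not SAM code: the class name `ProjectMaster`, the three status values and the round-robin driver `run_project` are all assumptions made for illustration. It only shows the stated rule that a project is a list of files and each file is handed to exactly one consumer.

```python
from enum import Enum


class FileStatus(Enum):
    PENDING = "pending"      # not yet handed to any consumer
    DELIVERED = "delivered"  # handed to exactly one consumer
    CONSUMED = "consumed"    # that consumer reported it finished


class ProjectMaster:
    """Hypothetical stand-in for a project master: tracks per-file status."""

    def __init__(self, files):
        self.status = {name: FileStatus.PENDING for name in files}
        self.owner = {}  # file name -> consumer id

    def next_file(self, consumer_id):
        """Hand the next pending file to a consumer, or None if none are left."""
        for name, state in self.status.items():
            if state is FileStatus.PENDING:
                self.status[name] = FileStatus.DELIVERED
                self.owner[name] = consumer_id
                return name
        return None

    def release(self, consumer_id, name):
        """Mark a file as consumed; only the consumer that received it may do so."""
        if self.owner.get(name) != consumer_id:
            raise ValueError(f"{name} was not delivered to {consumer_id}")
        self.status[name] = FileStatus.CONSUMED


def run_project(files, consumer_ids):
    """Let consumers take turns asking for files until none are pending."""
    master = ProjectMaster(files)
    processed = {cid: [] for cid in consumer_ids}
    active = list(consumer_ids)
    while active:
        for cid in list(active):
            name = master.next_file(cid)
            if name is None:
                active.remove(cid)
                continue
            processed[cid].append(name)
            master.release(cid, name)
    return master, processed
```

Because every consumer pulls from the same project, adding farm nodes only adds consumers; no file list has to be split by hand.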
SAM File transfers
Station initiates file transfers
Station keeps track of the needs of all projects and transfers files accordingly
Stager can use different transfer protocols, depending on local and remote configuration
Cache content of each station is kept in the central database
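As a rough illustration of a station transferring files according to the needs of all its projects, the sketch below computes which files are missing from the local cache. The function name `plan_transfers` and the ordering policy (files wanted by the most projects first) are assumptions for this example; the slides do not say how SAM prioritises transfers.

```python
from collections import Counter


def plan_transfers(project_files, cache):
    """Return files missing from the local cache, most-requested first.

    project_files: dict mapping project name -> list of file names
    cache: set of file names already in the station's cache
    The ordering policy is an assumption made for illustration.
    """
    demand = Counter()
    for files in project_files.values():
        demand.update(set(files))  # count each project at most once per file
    missing = [name for name in demand if name not in cache]
    # Sort by descending demand, then by name for a stable result.
    return sorted(missing, key=lambda name: (-demand[name], name))
```

For example, with two projects that both need `f2` and a cache that already holds `f3`, the file `f2` is fetched first and `f3` is not fetched at all.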
SAM Station to database communication
Station talks to a db-server (= CORBA to SQL translator)
ORACLE database
Just one client for the database
Reduces load on the database
Station to Station Transfer
File transfer is done station to station
Several possible transfer protocols, negotiated between stations
Each station has its own cache
Location information comes from the central database
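A minimal sketch of the two steps named on this slide: look up which stations hold a file, then agree on a transfer protocol with one of them. The data layout, the function names and the protocol names used in the usage note are placeholders chosen for illustration, not SAM's actual configuration or negotiation scheme.

```python
def negotiate_protocol(local_prefs, remote_supported):
    """Pick the first locally preferred protocol the remote station also supports."""
    for protocol in local_prefs:
        if protocol in remote_supported:
            return protocol
    return None


def choose_source(file_name, catalogue, stations, local_prefs):
    """Find a station that caches the file and shares a protocol with us.

    catalogue: dict file name -> set of station names (stands in for the
               location information held in the central database)
    stations:  dict station name -> set of protocols that station supports
    Returns (station, protocol), or None if no usable source exists.
    """
    for station in sorted(catalogue.get(file_name, ())):
        protocol = negotiate_protocol(local_prefs, stations[station])
        if protocol is not None:
            return station, protocol
    return None
```

With a catalogue entry `{"raw_001": {"fnal", "ral"}}`, stations `{"fnal": {"rcp"}, "ral": {"gridftp", "rcp"}}` and local preferences `["gridftp", "rcp"]`, the call `choose_source("raw_001", ...)` returns `("fnal", "rcp")`, because stations are tried in name order and the first shared protocol wins.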
Grid Job and Information Management (JIM)
Counterpart for the data handling system (SAM)
Based on existing tools (Globus, Condor, etc.)
Allows brokering based on information from the data-handling system
SAM-Grid Architecture
Job Handling
Condor for submission and brokering

Decision making is based on:
  - Resource information (general and job specific)
  - Job information

Decision making is interfaced with data handling middleware
  - not just static resource information
  - allows brokering to include data handling considerations

Decision making is entirely in the Condor framework
  - strong promotion of standards
  - interoperability
GRAM protocol to transfer job to execution site
Authentication via GSI (Grid Security Infrastructure)
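The idea of brokering with data handling information, rather than static resource information alone, can be sketched as a ranking function. This is plain Python and does not use or imitate Condor's matchmaking; the site dictionary layout, the field names `free_slots` and `cached_files`, and the ranking rule are assumptions made for illustration.

```python
def rank_sites(job_files, sites):
    """Rank candidate execution sites for a job.

    job_files: iterable of file names the job will read
    sites: dict site name -> {"free_slots": int, "cached_files": set of names}
    Sites without free slots are dropped (resource information). The rest are
    ordered by how many of the job's files they already cache (data handling
    information), then by free slots, then by name.
    """
    wanted = set(job_files)
    ranked = []
    for name, info in sites.items():
        if info["free_slots"] <= 0:
            continue
        cached = len(wanted & info["cached_files"])
        ranked.append((cached, info["free_slots"], name))
    ranked.sort(key=lambda t: (-t[0], -t[1], t[2]))
    return [name for _, _, name in ranked]
```

A site that caches most of a job's input can outrank a larger but empty site, which is the kind of decision static resource descriptions cannot express.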
Job Management
JIM Monitoring
Information Management
Resource description for brokering
Infrastructure for monitoring
Monitors sites, resources and jobs
Distributed knowledge
Web-based information retrieval
SAM-Grid Logistics
Outlook: D0 Reprocessing Challenge
D0 will reprocess all Run II data
1st Sep 2003 – 25th Nov 2003 (86 days), conference deadline
Lion's share at D0 remote computing facilities, including:
RAL, IC, Manchester, Lancaster
Karlsruhe, Wuppertal, Lyon, Michigan, NIKHEF, etc.
SAM to move data, runjob for site job management
JIM for submission and monitoring
Outlook: D0 Reprocessing Challenge (2)
150 million events / 22.5 TByte input data
Second level to second level
25 TByte output data
SAM routinely handles this data volume
Currently mainly on site at Fermilab
First large-scale, large-volume “real” data challenge
First HEP experiment to reprocess data in distributed fashion
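To put these figures in perspective, the short calculation below derives average per-event sizes and sustained rates. It assumes decimal units (1 TByte = 10^12 bytes) and perfectly even processing over the full 86-day window; neither assumption comes from the slides. The result is about 150 kB per input event and roughly 20 events per second across all sites.

```python
EVENTS = 150e6        # events to reprocess (from the slide)
INPUT_BYTES = 22.5e12  # 22.5 TByte input, assuming 1 TByte = 1e12 bytes
OUTPUT_BYTES = 25e12   # 25 TByte output, same assumption
DAYS = 86              # 1 Sep 2003 to 25 Nov 2003, inclusive


def reprocessing_rates(events=EVENTS, input_bytes=INPUT_BYTES,
                       output_bytes=OUTPUT_BYTES, days=DAYS):
    """Average sizes and rates, assuming uniform processing over the window."""
    seconds = days * 24 * 3600
    return {
        "input_kb_per_event": input_bytes / events / 1e3,
        "output_kb_per_event": output_bytes / events / 1e3,
        "events_per_day": events / days,
        "events_per_second": events / seconds,
        "input_mb_per_second": input_bytes / seconds / 1e6,
    }


if __name__ == "__main__":
    for key, value in reprocessing_rates().items():
        print(f"{key}: {value:,.2f}")
```

These are averages only; real throughput would vary with site availability and job efficiency.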
Summary
SAM is a distributed data handling system
It is used in production
JIM allows jobs to be brokered based on job-specific information and dynamic resources
GridPP plays a vital role in the development of SAM-Grid