Storage Resource Managers


Computing Sciences Directorate, LBNL
Storage Resource Managers: Essential Components for the Grid
Arie Shoshani
Staff: Alex Sim, Junmin Gu, Alex Romosan, Viji Natarajan
Scientific Data Management Group
Lawrence Berkeley National Laboratory
http://sdm.lbl.gov/srm
SC 2003
Outline
• What are Storage Resource Managers - Motivation
• General Analysis Scenario and the use of SRMs
• SRM functionality
• Real examples of working SRMs
• Advantages of using SRMs
• Conclusions and Future Work
Motivation
• Grid architecture needs to include reservation & scheduling of:
• Compute resources
• Storage resources
• Network resources
• Role of Storage Resource Managers (SRMs) in the data grid architecture
• Shared storage resource allocation & scheduling
• Especially important for data-intensive applications
• Often files are archived on a mass storage system (MSS)
• Large scientific collaborations (100s of clients) – opportunities for file sharing
• File replication and caching may be used
• Need to support non-blocking (asynchronous) requests
Types of SRMs
• Types of storage resource managers
• Disk Resource Manager (DRM)
• Manages one or more disk resources
• Tape Resource Manager (TRM)
• Manages access to a tertiary storage system (e.g. HPSS)
• Hierarchical Resource Manager (HRM=TRM + DRM)
• An SRM that stages files from tertiary storage into its disk cache
• SRMs and File transfers
• SRMs DO NOT perform file transfer
• SRMs DO invoke file transfer service if needed
(GridFTP, FTP, HTTP, …)
• SRMs DO monitor transfers and recover from failures
• TRM: from/to MSS
• DRM: from/to network
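The relationship HRM = TRM + DRM can be sketched as composition: an HRM answers from its disk cache and stages from tape only on a miss. This is a minimal, hypothetical Python sketch; the class names, the `get`/`put` methods and the `/tape` and `/cache` paths are illustrative assumptions, not the SRM interface, and no data actually moves, since real SRMs delegate transfers to a service such as GridFTP.

```python
from abc import ABC, abstractmethod


class StorageResourceManager(ABC):
    """Hypothetical common interface shared by every SRM type."""

    @abstractmethod
    def get(self, name: str) -> str:
        """Return a path from which the named file can be read."""


class TapeResourceManager(StorageResourceManager):
    """Stands in for a TRM in front of a tertiary store such as HPSS."""

    def __init__(self, archive: set[str]) -> None:
        self.archive = archive
        self.stage_count = 0  # how many (slow) tape reads were requested

    def get(self, name: str) -> str:
        if name not in self.archive:
            raise FileNotFoundError(name)
        self.stage_count += 1
        return f"/tape/{name}"


class DiskResourceManager(StorageResourceManager):
    """Stands in for a DRM managing one disk cache."""

    def __init__(self) -> None:
        self.cache: dict[str, str] = {}

    def get(self, name: str) -> str:
        if name not in self.cache:
            raise FileNotFoundError(name)
        return self.cache[name]

    def put(self, name: str) -> str:
        # Record the file as resident; the copy itself would be done by a transfer service.
        self.cache[name] = f"/cache/{name}"
        return self.cache[name]


class HierarchicalResourceManager(StorageResourceManager):
    """HRM = TRM + DRM: on a cache miss, stage the file from tape into the disk cache."""

    def __init__(self, trm: TapeResourceManager, drm: DiskResourceManager) -> None:
        self.trm = trm
        self.drm = drm

    def get(self, name: str) -> str:
        try:
            return self.drm.get(name)  # already in the disk cache
        except FileNotFoundError:
            self.trm.get(name)         # locate the file on tape (raises if absent)
            return self.drm.put(name)  # now served from the disk cache
```

A second `get` for the same file is served from the disk cache without another tape read.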
A multi-file request to a Disk Resource Manager
[Figure: clients send multi-file requests to a DRM (client-SRM communication) and access files in its disk cache; the DRM issues file transfer requests over the network to file transfer services at remote sites with disk caches and a tape system.]
Accessing Remote Storage Resource Managers
[Figure: clients send multi-file requests to a local DRM with a disk cache; through SRM-SRM communication and file transfer requests over the network, the DRM reaches remote SRMs: an HRM with a disk cache in front of a tape system, and another DRM with a disk cache.]
General Analysis Scenario: Uniform SRM Interface
[Figure: components shown: at the client's site, clients, a Storage Resource Manager with a disk cache, and a Request Interpreter that turns a logical query into a set of logical files (using a metadata catalog); request planning (using a replica catalog and the Network Weather Service) produces an execution plan (execution DAG) with site-specific files; a Request Executer issues requests for data placement and remote computation over the network to Sites 1 to N, each with a Storage Resource Manager and disk cache, some with a Compute Resource Manager and compute engine, and Site N with an MSS; result files return to the client's site.]
SRM is a Service (OGSA, CORBA, C++, Java, …)
• SRM functionality
• Manage space
• Negotiate and assign space to users
• Manage “lifetime” of spaces
• Manage files on behalf of a user
• Pin files in storage until they are released
• Manage “lifetime” of files
• Manage action when pins expire (depends on file types)
• Manage file sharing
• Policies on what should reside on a storage resource at any one time
• Policies on what to evict when space is needed
• Get files from remote locations when necessary
• Purpose: to simplify client’s task
• Manage multi-file requests
• A brokering function: queue file requests, pre-stage when possible
• Provide grid access to/from mass storage systems
• HPSS (LBNL, ORNL, BNL), Enstore (Fermi), JASMine (JLab), Castor (CERN), MSS (NCAR), …
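The pinning, sharing and eviction policies listed above can be illustrated with a small disk-cache model. This is a sketch under stated assumptions: `PinnedCache` and its methods are hypothetical, sizes are abstract units, and least-recently-used eviction of unpinned files is just one possible eviction policy, not necessarily the one any SRM implements.

```python
from collections import OrderedDict


class PinnedCache:
    """Hypothetical SRM disk cache: pinned files are never evicted, and
    unpinned files are evicted least-recently-used first when space is needed."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.files: OrderedDict[str, int] = OrderedDict()  # name -> size, oldest first
        self.pins: dict[str, int] = {}                      # name -> pin count

    def used(self) -> int:
        return sum(self.files.values())

    def _make_room(self, size: int) -> bool:
        # Evict unpinned files, oldest first, until `size` more units fit.
        for name in list(self.files):
            if self.used() + size <= self.capacity:
                break
            if self.pins.get(name, 0) == 0:
                del self.files[name]
        return self.used() + size <= self.capacity

    def bring_in(self, name: str, size: int) -> bool:
        """Pin a file, bringing it into the cache first if it is not resident.
        Returns False when no room can be made (the request would stay queued)."""
        if name in self.files:
            self.files.move_to_end(name)  # shared copy: no second transfer
        elif self._make_room(size):
            self.files[name] = size
        else:
            return False
        self.pins[name] = self.pins.get(name, 0) + 1
        return True

    def release(self, name: str) -> None:
        # Releasing unpins the file; it stays cached until it is evicted.
        if self.pins.get(name, 0) > 0:
            self.pins[name] -= 1
```

While a file is pinned, a competing request for space waits; once it is released, the same space can be reused without any explicit reservation.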
SRM works with other SRMs as well as legacy systems by using GridFTP
[Figure: a client's logical request passes through a Request Interpreter to a Request Manager and a DRM with a disk cache; files are obtained via GridFTP and FTP from servers at Chicago, Livermore and Berkeley, including another DRM and an HRM, each with a disk cache. Legend: GridFTP, control path, data path.]
Earth System Grid
[Figure: ESG components at LBNL, ANL, NCAR, LLNL, USC-ISI and ORNL: HRMs (Storage Resource Management) with disk in front of HPSS (High Performance Storage System) and NCAR-MSS (Mass Storage System); DRMs with disk; gridFTP servers, including a striped server; an openDAPg server; CAS (Community Authorization Services); a MyProxy server; a Tomcat servlet engine with MCS, MyProxy, RLS and CAS clients; a GRAM gatekeeper; MCS (Metadata Cataloguing Services) and RLS (Replica Location Services). Connections use gridFTP, SOAP and RMI.]
Uniformity of Interface and Compatibility of SRMs
[Figure: users/applications (clients) access, through grid middleware, the same SRM interface in front of Enstore, JASMine, dCache, CASTOR and a disk cache.]
Where do SRMs belong in the Grid architecture?
[Figure: layered Grid architecture diagram (Fabric, Connectivity, Resource, Collective 1: general services for coordinating multiple resources, Collective). Components shown: mass storage system (HPSS), other storage systems, compute systems, networks; communication protocols (e.g., TCP/IP stack), authentication and authorization protocols (e.g., GSI); Storage Resource Manager, compute resource management, file transfer service (GridFTP), database management services, resource monitoring/auditing; data transport services, community authorization services, general data discovery services, storage management (brokering), compute scheduling (brokering), monitoring/auditing services; request interpretation and planning services, workflow or request management services, data federation services, application-specific data discovery services, data filtering or transformation services, consistency services (e.g., update subscription, versioning, master copies).]
This figure is based on the Grid Architecture paper by the Globus team.
SRMs provide a brokering service by supporting multi-file requests
[Figure: the same layered Grid architecture diagram (Fabric, Connectivity, Resource: sharing single resources, Collective 1: general services for coordinating multiple resources, Collective), including Storage Management (Brokering) and the Storage Resource Manager.]
This figure is based on the Grid Architecture paper by the Globus team.
DataMover: Use of SRMs in ESG and PPDG for Robust Multi-file Replication
[Figure: the DataMover (command-line interface; can run anywhere) gets a list of files from a directory at NCAR and issues an HRM-COPY request for thousands of files to the HRM at LBNL/ORNL, which performs the writes; that HRM issues SRM-GET requests, one file at a time, to the HRM at NCAR, which performs the reads and stages files from NCAR-MSS into its disk cache. Files move over the network with GridFTP GET (pull mode) and are then archived. The DataMover recovers from staging, file transfer and archiving failures; a web-based file monitoring tool tracks progress.]
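The recovery behavior can be sketched as retrying each step (stage, transfer, archive) within a total retry-time budget, in the spirit of the optional "total retry time" of a multi-file request. `with_retries` is a hypothetical helper; treating `OSError` as a transient failure and doubling the delay between attempts are assumptions made for this sketch, not a description of the DataMover's actual logic.

```python
import time


def with_retries(action, total_retry_time, initial_delay=1.0,
                 clock=time.monotonic, sleep=time.sleep):
    """Hypothetical helper: run `action` (e.g. a stage, transfer or archive step),
    retrying transient failures until it succeeds or the retry budget runs out."""
    deadline = clock() + total_retry_time
    delay = initial_delay
    while True:
        try:
            return action()
        except OSError:                   # assumed to signal a transient failure
            if clock() + delay > deadline:
                raise                     # budget exhausted: report the failure
            sleep(delay)
            delay *= 2                    # back off before the next attempt
```

Passing `clock` and `sleep` as parameters keeps the helper testable without real waiting.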
Concepts: Types of Files
• Volatile: temporary files with a lifetime guarantee
• Files are “pinned” and “released”
• Files can be removed by SRM when released or when
lifetime expires
• Permanent
• No lifetime
• Files can only be removed by creator (owner)
• Durable: files with a lifetime that CANNOT be
removed by SRM
• Files are “pinned” and “released”
• Files can only be removed by creator (owner)
• If lifetime expires – invoke administrative action (e.g. notify
owner, archive and release)
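The removal rules for the three file types can be condensed into two small functions. This is a hypothetical encoding of the rules listed above; the function names and the returned action strings are illustrative only.

```python
from enum import Enum


class FileType(Enum):
    VOLATILE = "volatile"
    DURABLE = "durable"
    PERMANENT = "permanent"


def srm_may_remove(ftype: FileType, pinned: bool, lifetime_expired: bool) -> bool:
    """May the SRM itself (not the owner) remove this file?"""
    if ftype is FileType.VOLATILE:
        # Removable once released (no longer pinned) or when its lifetime expires.
        return (not pinned) or lifetime_expired
    # Durable and permanent files can only be removed by their creator (owner).
    return False


def on_lifetime_expired(ftype: FileType) -> str:
    """Action taken when a file's lifetime runs out."""
    if ftype is FileType.VOLATILE:
        return "eligible for removal by the SRM"
    if ftype is FileType.DURABLE:
        return "administrative action (e.g. notify owner, archive and release)"
    return "none (permanent files have no lifetime)"
```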
Concepts: Types of Spaces
• Types
• Volatile
• Space can be reclaimed by SRM when lifetime expires
• Durable
• Space can be reclaimed by SRM only if it does NOT contain files
• Can choose to archive files and release space
• Permanent
• Space can only be released by owner or administrator
• Assignment of files to spaces
• Files can only be assigned to spaces of the same type
• Spaces can be reserved
• No limit on number of spaces
• Space reference handle is returned to client
• Total space of each type is subject to SRM and/or VO policies
• Default spaces
• Files can be put into SRM spaces without explicit reservation
• Defaults are not visible to client
• Compacting space
• Release all unused space – space that has no files or files whose
lifetime expired
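The space rules can be sketched as a simple data model. Names such as `Space`, `srm_can_reclaim` and `compact` are hypothetical; in particular, requiring a durable space's lifetime to have expired in addition to being empty is an assumption, since the slide only states that an empty space is required.

```python
from dataclasses import dataclass, field


@dataclass
class SrmFile:
    name: str
    size: int
    expired: bool = False  # the file's lifetime has run out


@dataclass
class Space:
    space_type: str        # "volatile", "durable" or "permanent"
    size: int
    lifetime_expired: bool = False
    files: list[SrmFile] = field(default_factory=list)

    def add_file(self, f: SrmFile, file_type: str) -> None:
        # Files can only be assigned to spaces of the same type.
        if file_type != self.space_type:
            raise ValueError(f"cannot put a {file_type} file in a {self.space_type} space")
        self.files.append(f)

    def srm_can_reclaim(self) -> bool:
        """May the SRM itself (not the owner) reclaim this space?"""
        if self.space_type == "volatile":
            return self.lifetime_expired
        if self.space_type == "durable":
            # Assumption: lifetime must also have expired; the slide only requires no files.
            return self.lifetime_expired and not self.files
        return False  # permanent: only the owner or an administrator

    def compact(self) -> int:
        """Drop expired files, shrink to the space still in use, return the amount released."""
        self.files = [f for f in self.files if not f.expired]
        in_use = sum(f.size for f in self.files)
        released = self.size - in_use
        self.size = in_use
        return released
```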
Concepts: Directory Management
• Usual Unix semantics
• srmLs, srmMkdir, srmMv, srmRm, srmRmdir
• A single directory for all file types
• No directories for each type
• File assignment to types is virtual
• Files can be placed in SRM-managed directories by maintaining a mapping to the client’s directory
• Access control services
• Support owner/group/world permission
• Can only be assigned by owner
• When a file is requested by a user, the SRM should check permission with the source site
Examples of Directory Structures
(user defined)
[Figure: two user-defined directory trees (D1 to D4) holding files F1 to F8 of types volatile (V), durable (D) and permanent (P).]
(1) Mixed file types: files of different types in the same directory
(2) By file type: one directory per file type
• Supported function: ChangeFileType
• Advantage of (1): no need to move files when file types are changed
Concepts: Space Reservations
• Negotiation
• Client asks for space: C-guaranteed, MaxDesired
• SRM returns: S-guaranteed <= C-guaranteed, best effort <= MaxDesired
• Type of space
• Can be specified
• Subject to limits per client (SRM or VO policies)
• Default: volatile
• Lifetime
• Negotiated: C-lifetime requested
• SRM returns: S-lifetime <= C-lifetime
• Reference handle
• SRM returns space reference handle
• User can provide: srmSpaceTokenDescription to recover handles
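The negotiation amounts to the SRM capping each client request by its own limits and handing back a reference handle. In this hypothetical sketch, `free` and `max_lifetime` stand in for SRM or VO policy, the handle is a random UUID, and a `description` argument plays the role of srmSpaceTokenDescription for recovering handles; none of these names are taken from the SRM specification.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpaceGrant:
    token: str        # space reference handle returned to the client
    guaranteed: int   # S-guaranteed <= C-guaranteed
    best_effort: int  # <= MaxDesired
    lifetime: int     # S-lifetime <= C-lifetime, in seconds
    space_type: str


class SpaceManager:
    """Hypothetical SRM-side negotiation; `free` and `max_lifetime` stand in for policy."""

    def __init__(self, free: int, max_lifetime: int) -> None:
        self.free = free
        self.max_lifetime = max_lifetime
        self.by_description: dict[str, list[str]] = {}

    def reserve(self, c_guaranteed: int, max_desired: int, c_lifetime: int,
                space_type: str = "volatile",
                description: Optional[str] = None) -> SpaceGrant:
        guaranteed = min(c_guaranteed, self.free)
        best_effort = min(max_desired, self.free)
        lifetime = min(c_lifetime, self.max_lifetime)
        grant = SpaceGrant(str(uuid.uuid4()), guaranteed, best_effort, lifetime, space_type)
        self.free -= guaranteed  # only the guaranteed part is set aside
        if description is not None:
            self.by_description.setdefault(description, []).append(grant.token)
        return grant

    def tokens_for(self, description: str) -> list[str]:
        # Recover space handles from a user-supplied description.
        return list(self.by_description.get(description, []))
```

For example, with 500 units free and a one-hour policy limit, a request for 200 guaranteed, 800 desired and two hours is granted 200 guaranteed, 500 best effort and one hour.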
Concepts: Transfer Protocol Negotiation
• Negotiation
• Client provides an ordered list
• SRM returns: the first protocol in the client’s list that it supports
• Example
• Protocols list: bbftp, gridftp, ftp
• SRM returns: gridftp
• Advantages
• Easy to introduce new protocols
• User controls which protocol to use
• Default – SRM policy choice
• How is it returned?
• As the protocol of the Transfer URL (TURL)
• Example: bbftp://dm.slac.edu/temp/run11/File678.txt
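Following the example above, the SRM walks the client's ordered list and picks the first protocol it supports, falling back to its own policy choice when the client gives no list. A hypothetical Python sketch; the function names and the `gridftp` default are assumptions.

```python
def negotiate_protocol(client_prefs: list[str], supported: set[str],
                       default: str = "gridftp") -> str:
    """Return the first protocol in the client's ordered list that the SRM supports.
    `default` is a hypothetical SRM policy choice used when the list is empty."""
    if not client_prefs:
        return default
    for protocol in client_prefs:
        if protocol in supported:
            return protocol
    raise ValueError("no transfer protocol in common with the client")


def make_turl(protocol: str, host: str, path: str) -> str:
    # The negotiated protocol is returned as the scheme of the Transfer URL (TURL).
    return f"{protocol}://{host}{path}"
```

With the list `["bbftp", "gridftp", "ftp"]` and an SRM that does not support bbftp, the result is `gridftp`, and the TURL carries that scheme.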
Concepts: Multi-file requests
• srmRequestToGet can request multiple files
• Required: Files URLs
• Optional: space file type, space handle, Protocol list
• Optional: total retry time
• Provide: Site URL (SURL)
• URL known externally – e.g. in replica catalogs
• e.g. srm://sleepy.lbl.gov:4000/tmp/foo-123
• Get back: transfer URL (TURL)
• Path can be different than in SURL – SRM internal mapping
• Protocol chosen by SRM
• e.g. gridftp://dm.lbl.gov:4000/home/level1/foo-123
• Managing request queue
• Allocate space according to policy, system load, etc.
• Bring in as many files as possible
• Provide information on each file brought in or pinned
• Bring additional files as soon as files are released
• Support file streaming
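The request-queue behavior (bring in as many files as allowed, then bring the next file as soon as one is released) can be sketched as follows. `MultiFileGetRequest`, the fixed number of `slots`, and a SURL-to-TURL mapping that keeps only the file name are all hypothetical; a real SRM applies its own internal mapping and space policy.

```python
from collections import deque
from urllib.parse import urlparse


class MultiFileGetRequest:
    """Hypothetical queue behind a multi-file get request."""

    def __init__(self, surls: list[str], slots: int, turl_prefix: str) -> None:
        self.pending = deque(surls)      # files not yet brought in
        self.ready: dict[str, str] = {}  # SURL -> TURL for files pinned and available
        self.slots = slots               # files the SRM policy lets this request hold at once
        self.turl_prefix = turl_prefix
        self._fill()

    def _to_turl(self, surl: str) -> str:
        # Hypothetical internal mapping: keep the file name, place it under the SRM's own path.
        name = urlparse(surl).path.rsplit("/", 1)[-1]
        return f"{self.turl_prefix}/{name}"

    def _fill(self) -> None:
        # Bring in as many files as the allocated space allows.
        while self.pending and len(self.ready) < self.slots:
            surl = self.pending.popleft()
            self.ready[surl] = self._to_turl(surl)

    def release(self, surl: str) -> None:
        # The client is done with a file: unpin it and bring in the next one (streaming).
        self.ready.pop(surl, None)
        self._fill()
```

For instance, a request for `srm://sleepy.lbl.gov:4000/tmp/foo-0` with prefix `gridftp://dm.lbl.gov:4000/home/level1` yields the TURL `gridftp://dm.lbl.gov:4000/home/level1/foo-0`.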
SRM functionality
• Space reservation
• Negotiate and assign space to users
• Manage “lifetime” of spaces
• Release and compact space
• File management
• Assign space for putting files into SRM
• Pin files in storage when requested until they are released
• Manage “lifetime” of files
• Manage action when pins expire (depends on file types)
• Get files from remote locations when necessary
• Purpose: to simplify client’s task
• srmCopy: in “pull” and “push” modes
SRM functionality (Cont’d)
• Space management policies and file sharing
• Policies on what should reside on a storage resource at any one time
• Policies on what to evict when space is needed
• Share files to avoid getting them from remote locations
• Manage multi-file requests
• Queues file requests, pre-stage when possible
• Status functions
• Files: lifetime remaining, what’s available locally
• Requests: what files are available (needed in lieu of callbacks)
• Request summary: for progress report
• Space metadata: space in use, space available, lifetime
• Provide grid access to/from mass storage systems
• HPSS (LBNL, ORNL, BNL), Enstore (Fermi), JASMine (JLab), Castor (CERN), MSS (NCAR), SE (RAL) …
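Because request status is obtained by polling rather than through callbacks, a client can loop on the status and start processing each file as soon as it is reported available. `FakeSrm` and `get_request_status` are hypothetical stand-ins for a real server and for a call such as srmGetRequestStatus; the states "ready" and "queued" are assumed for the sketch.

```python
import time


class FakeSrm:
    """Hypothetical stand-in server: each file becomes ready after a number of status calls."""

    def __init__(self, polls_until_ready: dict[str, int]) -> None:
        self.remaining = dict(polls_until_ready)

    def get_request_status(self, request_id: str) -> dict[str, str]:
        status = {}
        for name in list(self.remaining):
            self.remaining[name] = max(self.remaining[name] - 1, 0)
            status[name] = "ready" if self.remaining[name] == 0 else "queued"
        return status


def wait_for_files(srm, request_id, poll_interval=1.0, max_polls=100):
    """Poll the request status (no callbacks) and yield each file once it is available."""
    seen = set()
    for _ in range(max_polls):
        status = srm.get_request_status(request_id)
        for name, state in status.items():
            if state == "ready" and name not in seen:
                seen.add(name)
                yield name
        if len(seen) == len(status):
            return
        time.sleep(poll_interval)
    raise TimeoutError(f"request {request_id} not complete after {max_polls} polls")
```

Files are yielded in the order they become available, not the order they were requested, which is what lets a client stream through a large request.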
SRM Methods
File Movement
• srm(Prepare)Get
• srm(Prepare)Put
• srmReplicate
Lifetime management
• srmReleaseFiles
• srmPutDone
• srmExtendFileLifeTime
Terminate/resume
• srmAbortRequest
• srmAbortFile
• srmSuspendRequest
• srmResumeRequest
Space management
• srmReserveSpace
• srmReleaseSpace
• srmUpdateSpace
• srmCompactSpace
• srmGetCurrentSpace
FileType management
• srmChangeFileType
Status/metadata
• srmGetRequestStatus
• srmGetFileStatus
• srmGetRequestSummary
• srmGetRequestID
• srmGetFilesMetaData
• srmGetSpaceMetaData
Summary: advantages of using SRMs
• Synchronization between storage resources
• Pinning file, releasing files
• Allocating space dynamically on an “as needed” basis
• Insulate clients from storage and network system failures
• Transient MSS failure
• Network failures
• Interruption of large file transfers
• Facilitate file sharing
• Eliminate unnecessary file transfers
• Support “streaming model”
• Use space allocation policies by SRMs: no reservations needed
• Use explicit release by client for reuse of space
• Control number of concurrent file transfers
• From/to MSS – avoid flooding MSS and thrashing
• From/to network – avoid flooding and packet loss
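Controlling the number of concurrent transfers can be done with a counting semaphore shared by all requests, so the cap holds however many worker threads are busy. A minimal sketch; `TransferLimiter` and `fake_transfer` are hypothetical, and the sleep stands in for a real GridFTP transfer.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class TransferLimiter:
    """Hypothetical cap on simultaneous transfers (e.g. to/from an MSS or the network)."""

    def __init__(self, max_concurrent: int) -> None:
        self._slots = threading.BoundedSemaphore(max_concurrent)
        self._lock = threading.Lock()
        self.active = 0
        self.peak = 0  # highest number of transfers observed running together

    def run(self, transfer, *args):
        with self._slots:                  # blocks while the cap is reached
            with self._lock:
                self.active += 1
                self.peak = max(self.peak, self.active)
            try:
                return transfer(*args)
            finally:
                with self._lock:
                    self.active -= 1


def fake_transfer(name: str) -> str:
    time.sleep(0.01)  # stands in for a real GridFTP transfer
    return name


if __name__ == "__main__":
    limiter = TransferLimiter(max_concurrent=3)
    files = [f"file{i}" for i in range(20)]
    with ThreadPoolExecutor(max_workers=10) as pool:  # more workers than the cap
        done = list(pool.map(lambda f: limiter.run(fake_transfer, f), files))
    print(len(done), "transfers, peak concurrency", limiter.peak)
```

The semaphore, rather than the pool size, enforces the limit, so separate pools serving different requests can share one limiter for the same MSS.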
Web-Based File Monitoring Tool
Shows:
• Files already transferred
• Files being transferred
• Files still to be transferred
Also shows, for each file:
• Source URL
• Target URL
• Transfer rate
File tracking helps to identify bottlenecks
Shows that archiving is the bottleneck
File tracking shows recovery from transient failures
Total: 45 GB
Ongoing and Future Work
• Ongoing work
• Developing Standard SRM interfaces
• Particle Physics Data Grid (PPDG) project
• LBNL, TJNAF, FNAL
• European Data Grid (EDG) project
• WP2 - data management
• WP5 – mass storage
• Deployment
• LBNL, BNL, ORNL, TJNAF, FNAL, CERN, (SE-England)
• Use of SRM by other agents
• Storage Resource Broker (SDSC) calling HRM to stage files from HPSS
• GridFTP invoking HRM
• New spec completed (SRM V2.1)
• directory management
• File/directory file movement
• dynamic space management
• Future work
• Access authorization – Community Authorization Service (CAS)
• “On-demand” space allocation, accounting, and charging
• Replica management – invoke SRMs and RLS as a single service
• Request executer (e.g. DAGMan) to invoke SRMs
• SRMs over NeST (Network STorage)