Argonne National Laboratory

GridFTP Roadmap
ALCF: Argonne Leadership Computing Facility
Bill Allcock (on behalf of the GridFTP team)
Argonne National Laboratory
– Usability & Performance
Packaging GridFTP as RPM
GWFTP
GridFTP GUI
Automatic Firewall Traversal
Sync feature for globus-url-copy
Packaging GridFTP as RPM
• Modify packaging of GridFTP and its dependencies
• Make it suitable for packaging as an RPM
• Make it compatible with major Linux distribution standards
• Eventually some distribution might pick it up
• GridFTP available as part of a standard Linux distribution
– Attract a whole new set of users
– Put it on par with scp and standard ftp in terms of availability
GridFTP Where there’s FTP (GWFTP)
• GridFTP has been in existence for some time and has proven to be quite robust and useful
• Only a few GridFTP clients are available
• FTP has innumerable clients
• GWFTP was created to leverage the FTP clients
• A proxy between FTP clients and GridFTP servers
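GWFTP
[Figure: GWFTP as a proxy. The FTP client logs in to GWFTP with USER <GWFTP username>::gsiftp://wiggum.mcs.anl.gov:2811/ followed by PASS, then sends a get request. GWFTP, which holds the GSI credential, performs GSI authentication with the GridFTP server on wiggum.mcs.anl.gov (port 2811), forwards the get request, and relays the data back to the client. A GUI client is also shown.]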
08/14/2008
GridFTP GUI
• A Java Web Start application
– Updates automatically
– Users always use the latest release
• Transfer files and directories
• Third-party transfer
• Multiple concurrent transfers
• Support authentication through MyProxy
• Manage local and remote files and directories
– Browse
– Create and delete
Automatic Firewall Traversal
• Control channel port is statically assigned
• Data channel ports are dynamically assigned
• GridFTP protocol changes
– New commands to communicate the 4-tuple (src ip, src port, dst ip, dst port) to both ends of the transfer
– Use simultaneous open/TCP splicing, or use a broker to open ports temporarily
– Hooks in GridFTP to contact a broker at the right time
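The broker option described above can be illustrated with a small Python sketch. It is not the GridFTP implementation: `FourTuple`, `ConnectionBroker`, `open_hole`, `is_allowed`, and the 30-second lifetime are hypothetical names and values chosen only to show how a 4-tuple could identify a temporary hole.

```python
import time
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class FourTuple:
    """The (src ip, src port, dst ip, dst port) of one data channel."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


class ConnectionBroker:
    """Hypothetical broker that opens firewall holes for a limited time."""

    def __init__(self, lifetime_s: float = 30.0) -> None:
        self.lifetime_s = lifetime_s          # assumed hole lifetime
        self._holes: Dict[FourTuple, float] = {}  # 4-tuple -> expiry time

    def open_hole(self, conn: FourTuple, now: Optional[float] = None) -> None:
        """Record a temporary hole for exactly this 4-tuple."""
        now = time.time() if now is None else now
        self._holes[conn] = now + self.lifetime_s

    def is_allowed(self, conn: FourTuple, now: Optional[float] = None) -> bool:
        """A connection passes only while its hole has not expired."""
        now = time.time() if now is None else now
        expiry = self._holes.get(conn)
        return expiry is not None and now < expiry


# Example addresses are placeholders from documentation ranges.
data_channel = FourTuple("198.51.100.5", 50000, "192.0.2.7", 50001)
broker = ConnectionBroker(lifetime_s=30.0)
broker.open_hole(data_channel, now=0.0)
```

Keying the hole on the full 4-tuple, rather than on a port alone, means only the one announced data connection is admitted, and only until the hole expires.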
[Figure: Firewall. The client holds control channels (TCP 2811) to the GridFTP source server and the GridFTP destination server, while the DATA channel between the two servers has to cross a firewall.]
[Figure: Automatic traversal using a connection broker. The client again holds control channels (TCP 2811) to both servers. Each server passes the IP 4-tuple to a connection broker (CB), which opens a temporary hole so that the DATA channel can be established.]
Sync feature for globus-url-copy
• Check for the existence of a file at the destination before transferring
• If it exists, determine whether the source version differs from the destination version
• Based on how much the source has changed, optimize the transfer
• Research into developing logic that does not involve any changes to the GridFTP protocol
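The decision steps above can be sketched as follows. The slides do not say how differences are detected, so the size and SHA-256 comparison here is an assumption, `sync_action` and `file_digest` are hypothetical names, and both paths are local files purely for illustration; a real client would need the remote side to report comparable information.

```python
import hashlib
import os


def file_digest(path: str, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in chunks to bound memory use."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def sync_action(src: str, dst: str) -> str:
    """Return 'transfer', 'skip', or 'update' for one source/destination pair."""
    if not os.path.exists(dst):
        return "transfer"            # nothing at the destination yet
    same_size = os.path.getsize(src) == os.path.getsize(dst)
    if same_size and file_digest(src) == file_digest(dst):
        return "skip"                # destination already matches the source
    return "update"                  # differs: candidate for an optimized transfer
```

The size check runs first because it is cheap and avoids hashing files that obviously differ.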
– Reliability & Security
Improved restart mechanism
Improved memory management algorithm
Load balancing
Data channel security for SSH based GridFTP
GUMS authorization callout
Improved Restart Mechanism
• globus-url-copy can recover from server and network failures
• It cannot recover from its own failure
• A number of users, including ESG, APS and SNS, use this client to transfer large data sets with complex directory structures
• Develop methods to enable globus-url-copy to recover from its own failure
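One common way for a client to survive its own failure is to checkpoint progress to disk and skip completed work on restart. The sketch below illustrates that general idea only; it is not how globus-url-copy works, and `transfer_all`, `load_done`, and the JSON state file are hypothetical.

```python
import json
import os
from typing import Callable, Iterable, List, Set


def load_done(state_path: str) -> Set[str]:
    """Read the set of already-transferred files, if a checkpoint exists."""
    if not os.path.exists(state_path):
        return set()
    with open(state_path) as f:
        return set(json.load(f))


def transfer_all(files: Iterable[str],
                 transfer: Callable[[str], None],
                 state_path: str) -> List[str]:
    """Transfer each file once, checkpointing after every success."""
    done = load_done(state_path)
    moved = []
    for name in files:
        if name in done:
            continue                      # finished in an earlier run
        transfer(name)                    # may raise; checkpoint stays valid
        done.add(name)
        moved.append(name)
        tmp = state_path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(sorted(done), f)
        os.replace(tmp, state_path)       # atomic swap of the checkpoint
    return moved
```

Writing the checkpoint to a temporary file and renaming it keeps the state file readable even if the client dies in the middle of an update.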
Gfork architecture
[Figure: On the server host, GFork runs with the GridFTP server plugin and accepts the clients' control channel connections. For each client, GFork forks a GridFTP server instance that inherits the client link, and each instance keeps a state sharing link to the plugin.]
Memory Management
• Optimistic memory provisioning by the operating system
– Under heavy load, the GridFTP server can consume all of the system's memory resources
• Gfork: an xinetd-like super server daemon
– Allows state to be maintained across connections
• GridFTP plugin for Gfork has a simple memory limiting option
– 90% of the memory goes to the first 10% of the allowed connections
– Remaining connections receive half of what is available
• Develop an improved memory management algorithm
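The simple memory limiting rule above is stated only briefly, so the following sketch encodes one possible reading of it: the first 10% of the allowed connections split 90% of the memory evenly, and every later connection is granted half of whatever is still unallocated. The function name `memory_limits` and that interpretation are assumptions, not the plugin's actual algorithm.

```python
from typing import List


def memory_limits(total_bytes: int, max_conns: int, n_conns: int) -> List[int]:
    """Per-connection memory grants under the assumed reading of the rule."""
    early = max(1, max_conns // 10)                 # first 10% of allowed connections
    early_grant = (total_bytes * 9 // 10) // early  # they split 90% evenly
    remaining = total_bytes
    limits = []
    for i in range(n_conns):
        # Later connections get half of what is still available.
        grant = early_grant if i < early else remaining // 2
        limits.append(grant)
        remaining -= grant
    return limits


print(memory_limits(1000, 20, 4))  # [450, 450, 50, 25]
```

Under this reading late connections shrink geometrically, which keeps the total bounded but starves later clients; that may be part of why an improved algorithm is planned.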
Load balancing capabilities
• The separation of processes buys the ability to proxy
– Allows for load balancing
– Frontend can choose from a pool of DPIs to service a client request
[Figure: The client connects to the frontend, which communicates over IPC with a pool of DPIs (data protocol interpreters).]
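A frontend choosing from a pool of DPIs can be sketched as below. The slide does not state a selection policy, so least-active-transfers is an assumption, and `DPI`, `choose_dpi`, and the host names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DPI:
    """One back-end data protocol interpreter known to the frontend."""
    host: str
    active_transfers: int = 0


def choose_dpi(pool: List[DPI]) -> DPI:
    """Pick the DPI with the fewest active transfers (assumed policy)."""
    if not pool:
        raise ValueError("no DPIs available")
    chosen = min(pool, key=lambda d: d.active_transfers)
    chosen.active_transfers += 1   # account for the request being assigned
    return chosen


pool = [DPI("dpi1", 2), DPI("dpi2", 0), DPI("dpi3", 1)]
first = choose_dpi(pool)           # the idle back end, dpi2
```

Any other policy (round robin, weighted by bandwidth) would slot into `choose_dpi` without changing the frontend's interface to clients.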
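SSH based GridFTP (GridFTPLite)
[Figure: The client runs ssh, which connects to sshd (running as root) on port 22. sshd starts a GridFTP server as the user, and the control channel runs over the ssh stdin/stdout. The diagram also shows port 2811.]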
Data Channel Security for SSH based GridFTP
• SSH based GridFTP does not have data channel security
• Investigate and prototype a way to let a client send a shared secret to both source and destination GridFTP servers
– Used to secure the data channel(s) between the two servers
– Shared secret can be used to authenticate, integrity-protect and encrypt the data channel
– This feature will increase the adoption of SSH based GridFTP
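As a rough illustration of what a shared secret makes possible, the sketch below authenticates and integrity-protects data blocks with HMAC-SHA256 from the Python standard library. The framing (tag followed by payload) and the names `protect` and `verify` are assumptions for this example; encryption is left out, and nothing here reflects the actual GridFTP design.

```python
import hashlib
import hmac
import secrets

TAG_LEN = hashlib.sha256().digest_size  # 32 bytes


def protect(secret: bytes, block: bytes) -> bytes:
    """Prefix a data block with an HMAC tag computed from the shared secret."""
    tag = hmac.new(secret, block, hashlib.sha256).digest()
    return tag + block


def verify(secret: bytes, message: bytes) -> bytes:
    """Return the payload if its tag matches, else raise ValueError."""
    tag, block = message[:TAG_LEN], message[TAG_LEN:]
    expected = hmac.new(secret, block, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):   # constant-time comparison
        raise ValueError("data block failed integrity check")
    return block


# The client would generate the secret and send it to both servers.
shared_secret = secrets.token_bytes(32)
wire = protect(shared_secret, b"file contents")
```

Only a holder of the same secret can produce a valid tag, so the receiving server learns both that the block is unmodified and that it came from the peer the client designated.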
GUMS Authorization Callout
• GUMS – Grid User Management System
– Grid identity mapping service
– Maps grid identity to local site identity
– Used in OSG
1. Authentication (client to GridFTP)
2. Data transfer operations
3. Obtain local identity from GUMS server through the GUMS callout:
/DC=org/DC=doegrids/OU=People/CN=John Bresnahanz maps to bresnaha
4. Access data on disk as local identity
GUMS Authorization Callout
• Role-based authorization using VOMS extended proxy
1. Authentication (client to GridFTP)
2. Data transfer operations
3. Obtain local identity from GUMS server through the GUMS callout:
/DC=org/DC=doegrids/OU=People/CN=John Bresnahanz with /VO=ATLAS/Group=USATLAS/Role=developer maps to usatlasdev
4. Access data on disk as local identity
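The two mapping cases on these slides can be pictured as a lookup keyed on the grid identity and an optional role attribute. In this sketch an in-memory table stands in for the GUMS server; `MAPPINGS` and `map_identity` are hypothetical and are not the GUMS API. The identities and local accounts are the ones shown on the slides.

```python
from typing import Dict, Optional, Tuple

DN = "/DC=org/DC=doegrids/OU=People/CN=John Bresnahanz"
ROLE = "/VO=ATLAS/Group=USATLAS/Role=developer"

# (grid identity, optional VOMS attribute) -> local site account
MAPPINGS: Dict[Tuple[str, Optional[str]], str] = {
    (DN, None): "bresnaha",
    (DN, ROLE): "usatlasdev",
}


def map_identity(dn: str, fqan: Optional[str] = None) -> str:
    """Return the local account for a grid identity, or refuse access."""
    try:
        return MAPPINGS[(dn, fqan)]
    except KeyError:
        raise PermissionError(f"no local mapping for {dn!r} with {fqan!r}")


print(map_identity(DN))        # bresnaha
print(map_identity(DN, ROLE))  # usatlasdev
```

The same person maps to different local accounts depending on the role carried in the proxy, which is the point of role-based authorization.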
– Quality of Service
Information provider
Provision end-point GridFTP resources
Integrate network provisioning
Integrate storage provisioning
Co-schedule data transfer resources
Information Provider
• GridFTP information provider service
– Max connections
– Open connections
– Load
• Higher level services can utilize this information for scheduling data transfers
– Help with selecting the appropriate replica of data
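A higher-level service could use the three published values to choose among replicas, for example as sketched below. The ranking rule (skip full servers, then prefer lowest load, then lowest connection utilization), the `ServerInfo` type, and the example URLs are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ServerInfo:
    """Values a GridFTP information provider is described as publishing."""
    url: str
    max_connections: int
    open_connections: int
    load: float


def pick_replica(servers: List[ServerInfo]) -> ServerInfo:
    """Choose a server that still has free connections, preferring low load."""
    candidates = [s for s in servers if s.open_connections < s.max_connections]
    if not candidates:
        raise RuntimeError("all replica servers are at their connection limit")
    return min(
        candidates,
        key=lambda s: (s.load, s.open_connections / s.max_connections),
    )


replicas = [
    ServerInfo("gsiftp://a.example.org:2811/data", 100, 100, 0.2),  # full
    ServerInfo("gsiftp://b.example.org:2811/data", 100, 40, 1.5),
    ServerInfo("gsiftp://c.example.org:2811/data", 50, 10, 0.7),
]
best = pick_replica(replicas)
```

Here the least loaded server is passed over because it has no free connections, and the choice falls to the next best candidate.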
Provision end-point resources
[Figure: The Data Movement Service (RFT replacement) is linked to a data point by a Provision link and the GridFTP control channel. The data point contains the GFTP resource broker, the GridFTP info provider, and the GridFTP server with a resource limiter for CPU, memory, and bandwidth (BW).]
Integrate Network Provisioning
[Figure: As in the previous diagram, the Data Movement Service is linked to a data point (GFTP resource broker, GridFTP info provider, GridFTP server, resource limiter for CPU, memory, and BW) by a Provision link and the GridFTP control channel. In addition, the Data Movement Service reserves bandwidth with a Network Reservation Service, and a bandwidth token is shown.]
Integrate Storage Provisioning
[Figure: The network provisioning diagram (Data Movement Service, Provision Bandwidth link to the Network Reservation Service, bandwidth token, Provision link and GridFTP control channel to the data point) extended with storage: the data point now also contains Lotman and the file system, alongside the GFTP resource broker, GridFTP info provider, GridFTP server, and resource limiter for CPU, memory, and BW.]
Co-schedule Data Transfer Resources
[Figure: The Data Movement Service provisions bandwidth through the Network Reservation Service and coordinates a source data point and a destination data point.]