Argonne National Laboratory
GridFTP Roadmap
ALCF: Argonne Leadership Computing Facility
Bill Allcock (on behalf of the GridFTP team)
Argonne National Laboratory
– Usability & Performance
Packaging GridFTP as RPM
GWFTP
GridFTP GUI
Automatic Firewall Traversal
Sync feature for globus-url-copy
Packaging GridFTP as RPM
Modify packaging of GridFTP and its
dependencies
Make it suitable for packaging as an RPM
Make it compatible with major Linux
distribution standards
Eventually some distribution might pick it up
GridFTP available as part of standard Linux
distribution
– Attract a whole new set of users
– Put it on par with scp and standard ftp in terms of
availability
GridFTP Where there’s FTP
(GWFTP)
GridFTP has been in existence for some
time and has proven to be quite robust and
useful
Only a few GridFTP clients are available
FTP has innumerable clients
GWFTP was created to leverage existing FTP
clients
It acts as a proxy between FTP clients and GridFTP
servers
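GWFTP
[Figure: An FTP client (GUI client) logs in to GWFTP with "USER <GWFTP username>::gsiftp://wiggum.mcs.anl.gov:2811/" followed by PASS, then sends a Get request. GWFTP, holding the GSI credential, performs GSI authentication to the GridFTP server on wiggum.mcs.anl.gov (port 2811), forwards the Get request, and relays the data back to the FTP client.]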
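The login string in the figure packs two things into the FTP USER argument: the GWFTP username and the GridFTP server to reach. A minimal sketch of how a proxy might split it, assuming the `<username>::gsiftp://host:port/` layout shown on the slide; the function name, the `GwftpLogin` type, and the default port fallback are illustrative assumptions, not GWFTP's actual code.

```python
from typing import NamedTuple
from urllib.parse import urlsplit


class GwftpLogin(NamedTuple):
    """Hypothetical container for the parts of a GWFTP USER argument."""
    username: str
    host: str
    port: int


def parse_gwftp_user(user_arg: str) -> GwftpLogin:
    """Split '<username>::gsiftp://host:port/' into its components."""
    username, sep, url = user_arg.partition("::")
    if not sep or not username:
        raise ValueError("expected '<username>::gsiftp://host:port/'")
    parts = urlsplit(url)
    if parts.scheme != "gsiftp" or not parts.hostname:
        raise ValueError("target must be a gsiftp:// URL")
    # Assumption: fall back to 2811, the GridFTP port shown on the slide.
    return GwftpLogin(username, parts.hostname, parts.port or 2811)
```

For example, `parse_gwftp_user("alice::gsiftp://wiggum.mcs.anl.gov:2811/")` yields the username `alice` and the target `wiggum.mcs.anl.gov` on port 2811.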
Computation Institute
08/14/2008
GridFTP GUI
A Java Web Start Application
– Updates automatically
– Users always use the latest release
Transfer files and directories
Third-party transfer
Multiple concurrent transfers
Support authentication through MyProxy
Manage local and remote files and directories
– Browse
– Create and delete
Automatic Firewall Traversal
• Control channel port is statically assigned
• Data channel ports are dynamically assigned
GridFTP Protocol Changes
• New commands to communicate the 4-tuple
(src ip, src port, dst ip, dst port) to both ends
of the transfer
• Use simultaneous open (TCP splicing), or use a
broker to open ports temporarily
• Hooks in GridFTP to contact the broker at the right
time
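The 4-tuple above is the only state both ends (or a broker) need in order to open the right hole. A minimal sketch of that record, assuming a simple comma-separated text encoding; the class name, the encoding, and the example addresses are hypothetical and are not the proposed GridFTP commands.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FourTuple:
    """(src ip, src port, dst ip, dst port) for one data channel."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int

    def peer_view(self) -> "FourTuple":
        """The same connection as seen from the other end."""
        return FourTuple(self.dst_ip, self.dst_port, self.src_ip, self.src_port)

    def encode(self) -> str:
        """Hypothetical wire form: 'src_ip,src_port,dst_ip,dst_port'."""
        return f"{self.src_ip},{self.src_port},{self.dst_ip},{self.dst_port}"

    @classmethod
    def decode(cls, text: str) -> "FourTuple":
        src_ip, src_port, dst_ip, dst_port = text.split(",")
        return cls(src_ip, int(src_port), dst_ip, int(dst_port))


# Example with documentation-range addresses (illustrative only).
channel = FourTuple("192.0.2.10", 50000, "198.51.100.20", 50001)
```

Sending `channel.encode()` to the source and `channel.peer_view().encode()` to the destination gives each side the view it needs for a simultaneous open or a broker request.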
[Figure: A client holds control connections (TCP 2811) to a GridFTP source server and a GridFTP destination server; the DATA channel between the two servers has to cross a firewall.]
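Automatic traversal using a
connection broker
[Figure: Same topology with a connection broker (CB) at each end. Each CB is given the IP 4-tuple and opens a temporary hole for the DATA channel between the GridFTP source server and the GridFTP destination server; the client control connections stay on TCP 2811.]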
Sync feature for globus-url-copy
Check for the existence of a file at the
destination before transferring
If it exists, determine whether the source
version differs from the destination
version
Based on how much the source has
changed, optimize the transfer
Research logic that does
not involve any changes to the GridFTP
protocol
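The decision sequence above (exists? differs? by how much?) can be sketched as a small planner. This is an illustration under stated assumptions, not the globus-url-copy logic: local paths stand in for the source and destination, SHA-256 is an arbitrary choice of comparison, and the only optimization modeled is sending the tail when the destination is an unchanged prefix of the source.

```python
import hashlib
import os
from typing import Tuple


def _digest(path: str, length: int) -> str:
    """SHA-256 of the first `length` bytes of a file."""
    h = hashlib.sha256()
    remaining = length
    with open(path, "rb") as f:
        while remaining > 0:
            chunk = f.read(min(1 << 20, remaining))
            if not chunk:
                break
            h.update(chunk)
            remaining -= len(chunk)
    return h.hexdigest()


def plan_sync(src: str, dst: str) -> Tuple[str, int]:
    """Return ("full", 0), ("skip", 0) or ("append", offset)."""
    if not os.path.exists(dst):
        return ("full", 0)              # nothing at the destination yet
    src_size = os.path.getsize(src)
    dst_size = os.path.getsize(dst)
    # Destination matches the leading bytes of the source.
    if dst_size <= src_size and _digest(src, dst_size) == _digest(dst, dst_size):
        if dst_size == src_size:
            return ("skip", 0)          # identical content
        return ("append", dst_size)     # send only the new tail
    return ("full", 0)                  # contents diverged
```

A real implementation would need the comparison to run on the remote side, which is where the "no protocol changes" constraint becomes the hard part.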
– Reliability & Security
Improved restart mechanism
Improved memory management algorithm
Load balancing
Data channel security for SSH based
GridFTP
GUMS authorization callout
Improved Restart Mechanism
• globus-url-copy can recover from server and
network failures
• It cannot recover from its own failure
• A number of users, including ESG, APS and SNS,
use this client to transfer large data sets with
complex directory structures
• Develop methods to enable globus-url-copy
to recover from its own failure
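One common way for a client to survive its own crash is to journal completed work to disk and skip it on restart. A minimal sketch of that idea, assuming a JSON file of finished URLs; the `TransferJournal` class and its file layout are hypothetical and are not how globus-url-copy stores restart state.

```python
import json
import os
from typing import Iterable, List


class TransferJournal:
    """Hypothetical on-disk record of transfers that already finished."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.done = set()
        if os.path.exists(path):
            with open(path) as f:
                self.done = set(json.load(f))

    def mark_done(self, url: str) -> None:
        """Record a finished transfer, replacing the journal atomically."""
        self.done.add(url)
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(sorted(self.done), f)
        os.replace(tmp, self.path)  # rename so a crash never leaves a half-written journal

    def pending(self, urls: Iterable[str]) -> List[str]:
        """URLs that still need to be transferred, in the given order."""
        return [u for u in urls if u not in self.done]
```

After a client crash, a new process constructs `TransferJournal` on the same path and transfers only what `pending()` returns, which matters most for large directory trees.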
GFork architecture
[Figure: On the server host, GFork accepts client control channel connections and forks one GridFTP server instance per client. The GridFTP server plugin keeps links to the forked instances, which inherit them; these form the state sharing link.]
Memory Management
Optimistic memory provisioning by the operating
system
– Under heavy load the GridFTP server can
consume all of the system's memory resources
GFork – an xinetd-like super-server daemon
– Allows state to be maintained across connections
GridFTP plugin for GFork has a simple memory
limiting option
– 90% of the memory to the first 10% of the allowed
connections
– Remaining connections receive half of what is
available
Develop an improved memory management
algorithm
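The slide states the current limiting rule only in outline, so the following is one possible reading rather than the plugin's actual algorithm: the first 10% of allowed connections split 90% of the limit evenly, and each later connection is granted half of whatever is still unallocated. The function name and the integer rounding are assumptions.

```python
from typing import List


def memory_grants(total_bytes: int, max_connections: int) -> List[int]:
    """Per-connection memory grants, in arrival order (illustrative reading)."""
    # First 10% of the allowed connections, rounded up, at least one.
    early = max(1, (max_connections + 9) // 10)
    # They share 90% of the limit evenly.
    early_share = (total_bytes * 9 // 10) // early
    grants = [early_share] * early
    available = total_bytes - early_share * early
    # Every later connection receives half of what is still available.
    for _ in range(max_connections - early):
        grant = available // 2
        grants.append(grant)
        available -= grant
    return grants
```

With a limit of 1000 bytes and 20 connections this reading gives 450 bytes to each of the first two connections, then 50, 25, 12, and so on, which shows why late connections are starved and why an improved algorithm is on the roadmap.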
Load balancing capabilities
The separation of processes
buys the ability to proxy
– Allows for load balancing
– Frontend can choose from
a pool of DPIs to service a
client request
[Figure: A client connects to the frontend, which talks over IPC to a pool of DPIs.]
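How the frontend picks a DPI is not specified on the slide. A minimal sketch assuming a least-active-connections policy; the `Dpi` and `Frontend` classes and the policy itself are illustrative assumptions, and round-robin or load-based choices would fit the same shape.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Dpi:
    """Hypothetical record for one backend DPI process."""
    name: str
    active: int = 0  # client requests currently being serviced


class Frontend:
    """Chooses a DPI from the pool for each client request."""

    def __init__(self, dpis: Iterable[Dpi]) -> None:
        self.dpis = list(dpis)

    def assign(self) -> Dpi:
        # Assumed policy: least active requests, ties broken by pool order.
        dpi = min(self.dpis, key=lambda d: d.active)
        dpi.active += 1
        return dpi

    def release(self, dpi: Dpi) -> None:
        dpi.active -= 1
```

With three idle DPIs, four consecutive `assign()` calls land on the first, second, third, and first DPI again.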
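SSH based GridFTP (GridFTPLite)
[Figure: The client runs ssh to sshd (port 22, running as root) on the server host. sshd starts a GridFTP server running as the user, and the ssh stdin/stdout stream serves as the control channel. Port 2811 is also labeled in the diagram.]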
Data Channel Security for SSH
based GridFTP
SSH based GridFTP does not have data
channel security
Investigate and prototype a way to let a client
send a shared secret to both source and
destination GridFTP servers
• Used to secure the data channel(s) between the
two servers
• Shared secret can be used to authenticate,
integrity-protect and encrypt the data channel
• This feature will increase the adoption of SSH
based GridFTP
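To illustrate what a shared secret buys, the sketch below covers only the authentication and integrity part, using an HMAC tag on each data block. It assumes the client has already delivered the same secret to both servers; the function names and framing are hypothetical, encryption is left out because it needs a cipher outside the Python standard library, and none of this is the mechanism GridFTP adopted.

```python
import hashlib
import hmac
import secrets

TAG_LEN = 32  # SHA-256 digest size in bytes


def new_shared_secret() -> bytes:
    """Secret the client would hand to both source and destination."""
    return secrets.token_bytes(32)


def protect(secret: bytes, block: bytes) -> bytes:
    """Prefix a data block with an HMAC-SHA256 tag."""
    tag = hmac.new(secret, block, hashlib.sha256).digest()
    return tag + block


def verify(secret: bytes, message: bytes) -> bytes:
    """Check the tag and return the block, or raise ValueError."""
    tag, block = message[:TAG_LEN], message[TAG_LEN:]
    expected = hmac.new(secret, block, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("data block failed authentication")
    return block
```

A peer without the secret cannot produce a valid tag, so the receiving server can reject blocks that were forged or altered in transit.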
GUMS Authorization Callout
• GUMS – Grid User Management System
– Grid identity mapping service
– Maps grid identity to local site identity
– Used in OSG
1. Authentication (client to GridFTP)
2. Data transfer operations (client to GridFTP)
3. Obtain local identity from GUMS server (GridFTP GUMS callout to GUMS server)
/DC=org/DC=doegrids/OU=People/CN=John Bresnahanz maps to bresnaha
4. Access data on disk as local identity
GUMS Authorization Callout
• Role based authorization using VOMS
extended proxy
1. Authentication (client to GridFTP)
2. Data transfer operations (client to GridFTP)
3. Obtain local identity from GUMS server (GridFTP GUMS callout to GUMS server)
/DC=org/DC=doegrids/OU=People/CN=John Bresnahanz
/VO=ATLAS/Group=USATLAS/Role=developer
maps to usatlasdev
4. Access data on disk as local identity
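The two mappings shown on these slides can be mimicked with a lookup keyed on the grid identity and an optional VOMS attribute. This is a toy stand-in with an in-memory table, assuming exact-match lookup; it is not the GUMS service interface, and `map_identity` is a hypothetical name.

```python
from typing import Dict, Optional, Tuple

DN = "/DC=org/DC=doegrids/OU=People/CN=John Bresnahanz"
ROLE = "/VO=ATLAS/Group=USATLAS/Role=developer"

# (grid identity, VOMS attribute or None) -> local site identity,
# using the two examples from the slides.
MAPPINGS: Dict[Tuple[str, Optional[str]], str] = {
    (DN, None): "bresnaha",
    (DN, ROLE): "usatlasdev",
}


def map_identity(dn: str, fqan: Optional[str] = None) -> str:
    """Return the local account for a grid identity, or refuse access."""
    try:
        return MAPPINGS[(dn, fqan)]
    except KeyError:
        raise PermissionError(f"no local mapping for {dn!r}") from None
```

`map_identity(DN)` returns `bresnaha`, while presenting the VOMS role as well returns `usatlasdev`, so the same person lands in a different local account depending on the role in the proxy.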
– Quality of Service
Information provider
Provision end-point GridFTP resources
Integrate network provisioning
Integrate storage provisioning
Co-schedule data transfer resources
Information Provider
GridFTP information provider service
– Max connections
– Open connections
– Load
Higher level services can utilize this
information for scheduling data transfers
– Help with selecting the appropriate
replica of data
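A higher-level service could use the three published values to choose among replicas. A minimal sketch, assuming the service has already fetched the values for each server; the `ServerInfo` record, the ranking rule, and the example URLs are illustrative assumptions, not the information provider's schema.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class ServerInfo:
    """Values a GridFTP information provider might publish for one server."""
    url: str
    max_connections: int
    open_connections: int
    load: float


def pick_replica(servers: Iterable[ServerInfo]) -> Optional[ServerInfo]:
    """Pick the least loaded server that still has a free connection slot."""
    candidates = [s for s in servers if s.open_connections < s.max_connections]
    if not candidates:
        return None
    # Assumed ranking: lowest load first, then lowest connection utilization.
    return min(
        candidates,
        key=lambda s: (s.load, s.open_connections / s.max_connections),
    )
```

A full server is never selected even when its load is low, because a transfer sent there would be refused.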
Provision end-point resources
[Figure: A Data Movement Service (RFT replacement) provisions a data point over the GridFTP control channel. The data point contains a GridFTP info provider, a GridFTP server, and a GFTP resource broker with a resource limiter for CPU, memory and bandwidth (BW).]
Integrate Network Provisioning
[Figure: The Data Movement Service reserves bandwidth through a Network Reservation Service (labels: Reserve Bandwidth, Bandwidth Token) and provisions the data point over the GridFTP control channel. The data point contains a GridFTP info provider, a GridFTP server, and a GFTP resource broker with a resource limiter for CPU, memory and BW.]
Integrate Storage Provisioning
[Figure: As in the network provisioning diagram (Data Movement Service, Network Reservation Service, Provision Bandwidth, Bandwidth Token, provisioning over the GridFTP control channel), with Lotman added to the data point alongside the GridFTP info provider, GridFTP server, GFTP resource broker and resource limiter (CPU, memory, BW), in front of the file system.]
Co-schedule Data Transfer
Resources
[Figure: The Data Movement Service provisions bandwidth through the Network Reservation Service and coordinates a source data point and a destination data point.]