1st Generation of Grid portals

The first generation of Grid portals mainly used a three-tier
architecture:
Properties
• A three-tiered architecture, consisting of an interface tier of a Web
browser, a middle tier of Web servers, and a third tier of backend
services and resources, such as databases, high performance computers,
disk storage, and specialized devices.
• A user makes a secure connection from their browser to a Web server.
• The Web server then obtains a proxy credential from a proxy credential
server and uses that to authenticate the user.
• When the user completes defining the parameters of the task they want
to execute, the portal Web server launches an application manager,
which is a process that controls and monitors the actual execution of
Grid task(s).
• The Web server delegates the user’s proxy credential to the application
manager, so that it may act on the user’s behalf.
• In some systems, the application manager publishes an event/message
stream to a persistent event channel-archive, which describes the state
of an application’s execution and can be monitored by the user through
their browser.
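The flow described above can be sketched in a few lines of Python. This is a toy simulation under stated assumptions, not portal code: the class names (CredentialServer, PortalWebServer, ApplicationManager), the user "alice", and the task name are all hypothetical, and the credential is a plain object rather than a real proxy certificate.

```python
from dataclasses import dataclass, field


@dataclass
class ProxyCredential:
    """Stand-in for a short-lived proxy credential (hypothetical type)."""
    owner: str


class CredentialServer:
    """Third tier: hands out proxy credentials for known users."""

    def __init__(self, passwords):
        self._passwords = dict(passwords)

    def get_proxy(self, user, password):
        if self._passwords.get(user) != password:
            raise PermissionError("authentication failed")
        return ProxyCredential(owner=user)


@dataclass
class ApplicationManager:
    """Controls one Grid task and records its state changes as events."""
    credential: ProxyCredential
    events: list = field(default_factory=list)

    def run(self, task):
        # A real manager would submit the task to Grid resources here.
        for state in ("SUBMITTED", "ACTIVE", "DONE"):
            self.events.append((task, state, self.credential.owner))


class PortalWebServer:
    """Middle tier: authenticates the user and launches application managers."""

    def __init__(self, credential_server):
        self._credential_server = credential_server
        self._sessions = {}

    def login(self, user, password):
        self._sessions[user] = self._credential_server.get_proxy(user, password)

    def launch(self, user, task):
        # Delegate the user's proxy credential to the application manager.
        manager = ApplicationManager(credential=self._sessions[user])
        manager.run(task)
        return manager


portal = PortalWebServer(CredentialServer({"alice": "s3cret"}))
portal.login("alice", "s3cret")
manager = portal.launch("alice", "render-frames")
print(manager.events[-1])  # ('render-frames', 'DONE', 'alice')
```

The events list plays the role of the event stream: a browser-facing page could poll it to show the state of the task.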
Grid Services Provided
• Authentication: When users access the Grid via a portal, the portal can
authenticate users with their usernames and passwords. Once
authenticated, a user can request the portal to access Grid resources on
the user’s behalf.
• Job Management: A portal provides users with the ability to manage
their job tasks (serial or parallel), i.e., launching their applications via the
Web browser in a reliable and secure way, monitoring the status of tasks
and pausing or cancelling tasks if necessary.
• Data Transfer: A portal allows users to upload input data sets required
by tasks that are to be executed on remote resources. Similarly, the
portal allows result sets and other data to be downloaded via a Web
browser to a local desktop.
• Information Services: A portal uses discovery mechanisms to find the
resources that are needed and available for a particular task.
Information that can be collected about resources includes static and
dynamic information such as OS or CPU type, current CPU load, free
memory or file space, and network status. In addition, other details such
as job status and queue information can also be retrieved.
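As an illustration of the Information Services item, the sketch below selects resources by combining a static attribute (OS type) with dynamic ones (CPU load, free memory). The records, host names, and the find_resources function are invented for this example; a real portal would obtain such data from a Grid information service rather than a hard-coded list.

```python
# Hypothetical resource records mixing static (os) and dynamic (load, memory) data.
resources = [
    {"host": "hpc-a", "os": "Linux", "cpu_load": 0.20, "free_memory_mb": 4096},
    {"host": "hpc-b", "os": "Linux", "cpu_load": 0.95, "free_memory_mb": 8192},
    {"host": "hpc-c", "os": "Solaris", "cpu_load": 0.10, "free_memory_mb": 2048},
]


def find_resources(records, os_name, max_load, min_free_memory_mb):
    """Return matching host names, least loaded first."""
    matches = [
        r for r in records
        if r["os"] == os_name
        and r["cpu_load"] <= max_load
        and r["free_memory_mb"] >= min_free_memory_mb
    ]
    return [r["host"] for r in sorted(matches, key=lambda r: r["cpu_load"])]


print(find_resources(resources, "Linux", max_load=0.5, min_free_memory_mb=1024))
# ['hpc-a']
```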
Implementation
The first generation Grid portals mainly use Globus Toolkit version 2 (GT2)
to provide Grid services. One main reason for this is that Globus provides a
complete package and a standard way for building Grid-enabled services.
• The dynamic graphical user interface (GUI) is based on HTML pages, with JSP
(Java Server Pages) or JavaScript. Common Gateway Interface (CGI) and
Perl are also used by some portals. CGI is an alternative to JSP for
dynamically generating Web content.
• The secure connection from a browser to the backend server is via Transport
Layer Security (TLS) and Secure HTTP (S-HTTP).
• Typically, a Java Servlet or Java Bean on the Web server services
requests from a user and accesses backend resources.
• MyProxy and GT2 GSI are used for user authentication. MyProxy
provides credential delegation in a secure manner.
• GT2 GRAM is used for job submission.
• GT2 MDS is used for gathering information on various resources.
• GT2 GSIFTP or GT2 GridFTP is used for data transfer.
• The Java CoG provides access to the corresponding
Globus services for Java programs.
MyProxy
MyProxy is an online credential management system for the Grid.
It is used to delegate a user’s proxy credential to Grid portals
which can be authenticated to access Grid resources on the
user’s behalf. Storing your Grid credentials in a MyProxy
repository allows you to retrieve a proxy credential whenever
and wherever you need one. You can also allow trusted servers to
renew your proxy credentials using MyProxy, so, for example,
long-running tasks do not fail due to an expired proxy credential.
The figure shows the steps to securely access the Grid via a Grid
portal with MyProxy.
myproxy.grid-support.ac.uk
Using MyProxy
1. Execute the myproxy-init command on the computer where your Grid
credential is located to delegate a proxy credential to a MyProxy server.
The delegated proxy credential normally has a lifetime of one week. The
communication between the computer and the MyProxy server is
secured by TLS. You need to supply a user name and the pass
phrase for the identity of your Grid credential. Then you need to supply
a separate MyProxy pass phrase to secure the delegated proxy
credential on the MyProxy server.
2. Log into the Grid portal with the same username and MyProxy pass
phrase used for delegating the proxy credential.
3. The portal uses the myproxy-get-delegation command to retrieve a
delegated proxy credential from the MyProxy server using your
username and MyProxy pass phrase.
4. The portal accesses Grid resources with the proxy credential on your
behalf.
5. Logging out of the portal deletes your delegated
proxy credential on the portal. If you forget to log out,
the proxy credential expires at the end of its specified lifetime.
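The five steps can be modelled with a small in-memory simulation. This is only a sketch of the credential lifecycle: MyProxyRepository and Portal are hypothetical classes, no real MyProxy protocol or certificate handling is involved, and the injected clock exists purely so that expiry can be demonstrated without waiting.

```python
import time


class MyProxyRepository:
    """In-memory stand-in for a MyProxy server (not the real protocol)."""

    ONE_WEEK = 7 * 24 * 3600  # seconds; the usual lifetime mentioned in step 1

    def __init__(self, clock=time.time):
        self._clock = clock
        self._store = {}

    def init(self, username, passphrase, lifetime=ONE_WEEK):
        # Step 1: the user delegates a credential, protected by a pass phrase.
        self._store[username] = (passphrase, self._clock() + lifetime)

    def get_delegation(self, username, passphrase):
        # Step 3: the portal retrieves a proxy with the user's name and pass phrase.
        entry = self._store.get(username)
        if entry is None or entry[0] != passphrase:
            raise PermissionError("unknown user or wrong pass phrase")
        expires_at = entry[1]
        if self._clock() >= expires_at:
            raise PermissionError("delegated credential has expired")
        return {"owner": username, "expires_at": expires_at}


class Portal:
    def __init__(self, repository):
        self._repository = repository
        self._proxies = {}

    def login(self, username, passphrase):
        # Step 2: the user logs in; the portal fetches the proxy (step 3).
        self._proxies[username] = self._repository.get_delegation(username, passphrase)

    def logout(self, username):
        # Step 5: logging out deletes the portal's copy of the proxy.
        self._proxies.pop(username, None)

    def has_proxy(self, username):
        return username in self._proxies


now = [0.0]  # fake clock so the example is deterministic
repo = MyProxyRepository(clock=lambda: now[0])
repo.init("alice", "portal-pass")
portal = Portal(repo)
portal.login("alice", "portal-pass")
print(portal.has_proxy("alice"))  # True
portal.logout("alice")
print(portal.has_proxy("alice"))  # False
```

Advancing the fake clock by MyProxyRepository.ONE_WEEK makes a later get_delegation call fail, which mirrors the expiry behaviour described in step 5.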
Java CoG Kit
The Java Commodity Grid (CoG) Kit provides access to GT2 services through
Java APIs. The goal of the Java CoG Kit is to let Grid developers
use much of the Globus functionality, as well as the
numerous additional libraries and frameworks developed
by the Java community. Currently GT3 integrates part of the Java CoG, e.g.,
many of the command-line tools in GT3 are implemented with the Java
CoG.
The Java CoG has been focused on client-side issues. Grid services that can
be accessed by the toolkit include:
• An information service compatible with the GT2 MDS implemented with
the Java Naming and Directory Interface (JNDI);
• A security infrastructure compatible with the GT2 GSI implemented with
the IAIK security library;
• A data transfer mechanism compatible with a subset of the GT2 GridFTP
and/or GSIFTP;
• Resource management and job submission with the GT2 GRAM
Gatekeeper;
• Advance reservation compatible with GT2 GARA;
• A MyProxy server managing user credentials.
GridPort and HPCPortal
GridPort 2.0 (GP2) is a Perl-based Grid portal toolkit. The purpose
of GP2 was to facilitate the easy development of application-
specific portals. GP2 is a collection of services, scripts and tools
that allow developers to connect Web-based interfaces to
backend Grid services. The scripts and tools provide consistent
interfaces between the underlying infrastructure, which is
based on Grid technologies such as GT2, and standard Web
technologies such as CGI.
GridPort Layers
• Client Layer: represents the consumers of Grid portals, typically Web browsers, PDAs, or
even applications capable of pulling data from a Web server. Clients interact with a GP2
portal via HTML form elements and use secure HTTP to submit requests.
• Portal Layer: consists of portal-specific code. Application portals run on standard Web
servers, handle client requests, and provide responses to those requests. One
instance of GP2 can support multiple concurrent application portals, but they must exist
on the same Web server, where they share the same instance of the GP2 libraries. This
allows the application portals to share portal-related user and account data and thereby
makes a single-login environment possible. GP2 portals can also share libraries, file
space, and other services.
• Portal Services Layer: GP2 and other portal toolkits or libraries reside at the portal
services layer. GP2 performs common services for application portals including the
management of session state, portal accounts, and Grid information services with GT2
MDS.
• Grid Services Layer: consists of those software components and services that are
needed to handle user requests to access the Grid. GP2 employs simple, reusable
middleware technologies, e.g., GT2 GRAM for job submission to remote resources; GT2
GSI and MyProxy for security and authentication; GT2 GridFTP and the San Diego
Supercomputer Center (SDSC) Storage Resource Broker (SRB) for distributed file
collection and management [56]; and Grid Information Services based primarily on
proprietary GP2 information provider scripts and the GT2 MDS.
Distributed Component Architecture
GridPort, HPCPortal and GROWL
GP2 can be used in two ways. The first approach requires that GT2 be
installed, because GP2 scripts wrap the GT2 command line tools in the
form of Perl scripts executed from cgi-bin. GT2 GRAM, GSIFTP, and MyProxy
are used to access backend Grid services. The second approach does not
require GT2, but relies on the CGI scripts that have been configured to
use a primary GP2 Portal as a proxy for accessing GP2 services, such as
user authentication, job submission, and file transfer. The second
approach allows a user to quickly deploy a Web server configured with a
set of GP2 CGI scripts to perform generic portal operations.
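The first approach, wrapping command line tools behind a Web form, can be sketched as follows. GP2 itself does this in Perl; Python is used here only for illustration. The form field names, the handle_form function, and the host name are assumptions made up for this example, and the sketch only builds the argument lists for globus-job-run and globus-url-copy rather than executing them.

```python
import shlex


def job_run_command(contact, executable, arguments=()):
    """Build the argument list for running a program through a GRAM gatekeeper."""
    return ["globus-job-run", contact, executable, *arguments]


def url_copy_command(source_url, destination_url):
    """Build the argument list for copying a file between two URLs."""
    return ["globus-url-copy", source_url, destination_url]


def handle_form(form):
    """Translate submitted form fields into a command; reject unknown actions."""
    action = form.get("action")
    if action == "run":
        args = shlex.split(form.get("args", ""))
        return job_run_command(form["host"], form["executable"], args)
    if action == "copy":
        return url_copy_command(form["source"], form["destination"])
    raise ValueError(f"unsupported action: {action!r}")


cmd = handle_form({
    "action": "run",
    "host": "hpc.example.org",        # hypothetical gatekeeper contact
    "executable": "/bin/echo",
    "args": "hello 'grid portal'",
})
print(cmd)
# ['globus-job-run', 'hpc.example.org', '/bin/echo', 'hello', 'grid portal']
```

Keeping the command as a list, rather than one string, means it could later be handed to subprocess.run without going through a shell, so form input is never interpreted as shell syntax.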
HPCPortal uses the C API in the Globus toolkit for MDS, GridFTP and GRAM.
A front-end CGI script passes data from the user’s form
interface to the back-end Globus code, which in turn submits the remote
job. The front-end and back-end services can be connected using a Web
service call, so they do not need to be located on the same server (previous
figure).
GROWL uses the same back-end services but provides a C programming API
to the user in the form of a function library.
GPDK and the Java™ World
GPDK is another Grid portal toolkit that uses Java Server Pages
(JSP) for portal presentation and Java Beans to access back end
Grid resources via GT2. Beans in GPDK are mostly derived from
the Java CoG kit.
Java Services in GPDK
Grid service beans in GPDK can be classified as follows. These
beans can be used for the implementation of Grid portals.
• Security: The security bean, MyproxyBean, is responsible for
obtaining delegated credentials from a MyProxy server. The
MyproxyBean has a method for setting the username,
password, and designated lifetime of a delegated credential on
the Web server. In addition, it allows delegated credentials to
be uploaded securely to the Web server.
• User Profiles: User profiles are controlled by three beans:
UserLoginBean, UserAdminBean and the UserProfileBean.
– The UserLoginBean provides an optional service to authenticate
users to a portal. Currently, it only sets a username/password and
checks a password file on the Web server to validate user access.
– The UserAdminBean provides methods for serializing a
UserProfileBean and validating a user's profile.
– The UserProfileBean maintains user information including
preferences, credential information, submitted job history, and
computational resources used. The UserProfileBean is generally
instantiated with session scope to persist for the duration of the
user's transactions on the portal.
• Job Submission: The JobBean contains all the necessary information used in submitting a
job, including memory requirements, name of the executable code, arguments, number of
processors, maximum wall clock or CPU time, and the submission queue. A JobBean is
passed to a JobSubmissionBean that is responsible for actually launching the job. Two
varieties of the JobSubmissionBean currently exist. The GramSubmissionBean submits a
job to a GT2 GRAM gatekeeper that can either run the job interactively or submit it to a
scheduling system if one exists. The JobInfoBean can be used to retrieve job-related,
timestamped information including the job ID, status, and outputs. The JobHistoryBean
uses multiple JobInfo beans to provide a history of information about jobs that have
been submitted. The history information can be stored in the user's profile.
• File Transfer: The FileTransferBean provides methods for transferring files. Both the
GSIFTPTransferBean and the GSISCPTransferBean can be used to securely copy files from
source to destination hosts using a user's delegated credential. The GSISCPTransferBean
requires that GSI-enabled SSH [57] be deployed on the machines to which files are
transferred via the GSI-enhanced “scp”. The GSIFTPTransferBean implements a GSI-enhanced
FTP for third-party file transfers.
• Information Services: The MDSQueryBean provides methods for querying a Lightweight
Directory Access Protocol (LDAP) server by setting and retrieving object classes and
attributes such as OS type, memory, and CPU load for various resources. LDAP is a
standard for accessing information directories on the Internet. Currently, the
MDSQueryBean makes use of the Mozilla Directory SDK [27] for interacting with an LDAP
server.
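Because the information service is queried over LDAP, a query ultimately comes down to an LDAP search filter built from an object class and attribute values. The sketch below builds such a filter string with the value escaping required by the LDAP filter syntax. It does not use GPDK or the Mozilla Directory SDK; build_filter and escape_value are hypothetical helpers, and the object class and attribute names (MdsHost, Mds-Os-name) are illustrative assumptions rather than a verified schema.

```python
# Characters that must be escaped inside an LDAP filter value.
_ESCAPES = {"\\": r"\5c", "*": r"\2a", "(": r"\28", ")": r"\29", "\0": r"\00"}


def escape_value(value):
    """Escape a value so it is matched literally by the filter."""
    return "".join(_ESCAPES.get(ch, ch) for ch in str(value))


def build_filter(object_class, attributes):
    """Combine an object class and attribute equality tests with a logical AND."""
    clauses = [f"(objectclass={escape_value(object_class)})"]
    for name, value in attributes.items():
        clauses.append(f"({name}={escape_value(value)})")
    return "(&" + "".join(clauses) + ")"


# Attribute names below are assumed for illustration only.
print(build_filter("MdsHost", {"Mds-Os-name": "Linux"}))
# (&(objectclass=MdsHost)(Mds-Os-name=Linux))
```

Escaping matters because values such as OS or queue names may come from user input on a portal form, and an unescaped "*" or parenthesis would change the meaning of the query.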
Comparison of 1st Generation Portals
What are the Restrictions?
First generation Grid portals have been focused on providing basic task-oriented services,
such as user authentication, job submission, monitoring, and data transfer. However, they
are typically tightly coupled with Grid middleware tools such as Globus. The main
limitations of first generation portals can be summarized as follows.
• Lack of Customization: Portals are normally built by portal developers rather than portal
users, because the knowledge and expertise required to use the portal toolkits, as described in
this chapter, is beyond the capability of most Grid end users. When end users access the
Grid via a portal, it is almost impossible for them to customize the portal to meet their
specific needs, e.g., to add or remove some portal services.
• Restricted Grid Services: First generation Grid portals are tightly coupled with specific
Grid middleware technologies such as Globus, which results in restricted portal services.
It is hard to integrate Grid services provided by different Grid middleware technologies
via a portal of this generation.
• Static Grid Services: A Grid environment is dynamic in nature, with more and more Grid
services being developed. However, first generation portals can only provide static
Grid services in that they lack a facility to easily expose newly created Grid services to
users.