1.1 - Globus Packages


Distributed Programming in Computational Grids Using CoG Kits
Prepared by
Gregor von Laszewski
Argonne National Laboratory
Keith R. Jackson, Jason Novotny
Lawrence Berkeley National Laboratory

Outline

• Introduction to Grids
  – What is a Grid?
  – What is the Globus Toolkit?
  – What is a Commodity Grid Kit?
• Using and programming Grids with the Java and Python CoG Kits
  – Secure access to remote resources
  – Remote job submission and monitoring
  – Distributed Grid information management and remote data access
  – Graphical components to access a Grid
• Conclusion for this part of the presentation
• Presentation on GPDK

Integration of High-end Resources
[Figure: software catalogs, supercomputers, sensor nets, colleagues, and data archives]
On-demand creation of powerful virtual computing systems

Grid Computing and Existing Technologies
• Commonalities between “Grid computing” and major industrial thrusts
  – Business-to-business, peer-to-peer, application service providers, storage service providers, distributed computing, Internet computing
• Differences between Grid computing and existing technologies
  – Complicated requirements: “Run program X at site Y subject to community policy P, providing access to data at Z according to policy Q”
  – High performance: unique demands of advanced and high-performance systems

What challenges do we have to solve?
• Authenticate once
• Specify simulation (code, resources, etc.)
• Locate resources
• Negotiate authorization, acceptable use, etc.
• Acquire resources
• Initiate computation
• Steer computation
• Access remote datasets
• Collaborate on results
• Account for usage
[Figure: Domain 1 and Domain 2]

An Example Application: An Advanced Scientific Instrument
[Figure: Advanced Photon Source, supercomputer, electronic library and databases, computing portal, clients, and a scientist avatar in a virtual reality CAVE]

The Globus Toolkit
• The Globus Toolkit provides a range of basic Grid services
  – Security, information, fault detection, communication, resource management
• These services are simple and orthogonal
  – Use them independently: mix and match
  – Programming model independence
• For each service there is generally a well-defined API
• Standards are used extensively
  – E.g., LDAP, GSS-API, X.509

Globus Approach
• A toolkit and collection of services addressing key technical problems
  – Modular “bag of services” model
  – Not a vertically integrated solution
  – General infrastructure tools (aka middleware) that can be applied to many application domains
• Interdomain issues, rather than clustering
  – Integration of intradomain solutions
• Distinction between local and global services

Globus Hourglass
• Focus on architecture issues
  – Propose a set of core services as basic infrastructure
  – Use them to construct high-level, domain-specific solutions
• Design principles
  – Keep participation cost low
  – Enable local control
  – Support adaptation
• “IP hourglass” model
[Figure: hourglass with layers, top to bottom: Applications, Diverse global services, Core Globus services, Local OS]

Layered Grid Architecture
[Figure: Grid protocol layers (Application, Collective, Resource, Connectivity, Fabric) alongside the Internet Protocol Architecture (Application, Transport, Internet, Link)]

Production Grids & Testbeds
GUSTO Testbed
NASA’s Information Power Grid
The Alliance National Technology Grid

Commodity Grid Kits
• A Commodity Grid (CoG) Kit defines and implements a set of general components that map Grid functionality into a higher-level environment.
  – Java, Python, Perl, CORBA, DCOM, ...
  – CoG Kits harness the strengths of object-oriented languages and component frameworks to:
    • Enable code reuse.
    • Allow rapid application development.
    • Improve code maintainability.
  – CoG Kits help us build applications, Problem Solving Environments, and portals.

Motivation: Java & Python CoG Kits
• Use and leverage existing technologies for Grid programming
  – The capabilities of the framework onto which Grid services are mapped can be exploited: objects, events, exceptions, components, ...
  – Objects such as jobs/tasks can be defined.
  – XML support is provided.
  – GUIs, IDEs, and similar tools can be used (e.g., Forte, Boa Constructor).
• Maximize software flexibility, extensibility, and reusability
• Reduce development and maintenance cost
• Enable application developers to use tools and programming languages they are familiar with
• Use as glue for many technologies
• Python is well suited to tying together many different languages and technologies.

What is the Java CoG Kit?
• The Java CoG Kit provides a mapping between Java and the Globus Toolkit. It extends the use of Globus by enabling access to advanced Java features such as events and objects for Grid programming.
• The Java CoG Kit is implemented in pure Java and speaks the Grid protocols directly.
• It is not a wrapper around the C Globus Toolkit.
• This allows integration within applets.
• Mostly client-side support

What is the Python CoG Kit?
• Similarly, the Python CoG Kit provides a mapping between Python and the Globus Toolkit. It extends the use of Globus by enabling access to advanced Python features such as events and objects for Grid programming.
• The Python CoG Kit is implemented as a series of Python extension modules that wrap the Globus C code.
• It uses SWIG (http://www.swig.org) to help generate the interfaces.

Status: Java CoG Kit
• Modified core Globus components (protocols)
• Basic services are provided for accessing:
  – Security (GSI)
  – Remote job submission and monitoring (GRAM)
  – Quality of service (GARA)
  – Remote data access (GSIFTP)
  – Information service access (MDS)
  – Certificate store (MyProxy)
• Currently 100% client-side components
• Reusable Grid GUI components

Status: Python CoG Kit
• Basic services are provided for accessing (module name in parentheses):
  – Security (security)
  – Remote job submission and monitoring (gramClient)
  – Secure high-performance network IO (io)
  – Protocol-independent data transfers (gassCopy)
  – High-performance Grid FTP transfers (ftpClient)
  – Support for building Grid FTP servers (ftpControl)
  – Remote file IO (gassFile)

Common Grid Operations
• On which computers can I perform my task?
  – Use a Grid information service (MDS) to answer this.
• How can I execute a job on a remote machine?
  – Use the job submission API or GUI.
• How can I access data on a remote machine?
  – Use Grid FTP.
  – Use the protocol-independent gassCopy module.
• Secure high-performance network IO

Tiny Information Query Program
MDS mds = new MDS("mds.globus.org", 389, "o=Grid");
MDSresult result = mds.search(
    "(&(objectclass=GridComputeResource)(freenodes=64))",
    "contact freenodes totalnodes");
String contact = result.get("contact");
System.out.println(result.print());
This can also be done with JNDI or the Netscape SDK; we provide this layer for portability.

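The filter passed to `mds.search` uses LDAP search-filter syntax, in which an AND of several equality tests is written `(&(a=x)(b=y))` with balanced parentheses. The Python sketch below builds such a filter and escapes special characters in values following RFC 4515. The helpers `escape_ldap_value` and `and_filter` are hypothetical and not part of either CoG Kit.

```python
# Characters that must be escaped inside an LDAP filter value (RFC 4515).
_LDAP_ESCAPES = {"\\": r"\5c", "*": r"\2a", "(": r"\28", ")": r"\29", "\0": r"\00"}

def escape_ldap_value(value):
    """Escape one assertion value so it cannot alter the filter structure."""
    return "".join(_LDAP_ESCAPES.get(ch, ch) for ch in str(value))

def and_filter(**conditions):
    """Combine attribute=value equality tests into a single (&...) filter."""
    terms = "".join("(%s=%s)" % (attr, escape_ldap_value(val))
                    for attr, val in conditions.items())
    return "(&%s)" % terms

# Reproduces the query used on this slide.
print(and_filter(objectclass="GridComputeResource", freenodes=64))
```

Escaping values, rather than pasting them into the filter string, keeps a value such as `a*(b)` from being read as a wildcard or as extra filter clauses.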
Java CoG LDAP Browser

Remote Job Submission
• Uses the Globus GRAM protocol
• Uses GSI for mutual authentication
  – Can perform delegation to the remote process
• Supports access to heterogeneous resources
  – Supports uni-processor and parallel job submissions
  – Supports various scheduling systems, e.g., LSF and PBS
• Provides a job description language
  – Uses the Globus Resource Specification Language (RSL)

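The RSL strings in the GRAM examples are conjunctions of `(attribute=value)` relations prefixed with `&`. A small Python helper can assemble them from ordinary arguments, as sketched below. The helper names are hypothetical, and the quoting rule (wrap values containing spaces or RSL metacharacters in double quotes and double any embedded quotes) is an assumption based on GT2-era RSL that should be checked against the RSL documentation.

```python
# Assumption: characters that force a value to be quoted in RSL.
RSL_SPECIAL = set(' \t\n()=<>!"&|+^$#')

def rsl_value(value):
    """Render one RSL value, quoting it when it contains special characters."""
    text = str(value)
    if text and not any(ch in RSL_SPECIAL for ch in text):
        return text
    # Assumed quoting: wrap in double quotes and double any embedded quotes.
    return '"' + text.replace('"', '""') + '"'

def build_rsl(executable, arguments=(), **attributes):
    """Build a conjunctive (&) RSL job description."""
    relations = ["(executable=%s)" % rsl_value(executable)]
    if arguments:
        relations.append("(arguments=%s)" % " ".join(rsl_value(a) for a in arguments))
    for name, value in attributes.items():
        relations.append("(%s=%s)" % (name, rsl_value(value)))
    return "&" + "".join(relations)

print(build_rsl("/bin/sleep", ["15"]))
print(build_rsl("/bin/echo", ["Hello World"], count=2))
```

The first call reproduces the `&(executable=/bin/sleep)(arguments=15)` description used for GRAM jobs. The second shows a multi-word argument being quoted.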
GRAM
• Creating a job
import org.globus.gram.GramJob;
import org.globus.gram.GramJobListener;

GramJob job = new GramJob("&(executable=/bin/sleep)(arguments=15)");
• Listening to state changes via listeners
job.addListener(new GramJobListener() {
    public void statusChanged(GramJob job) {
        System.out.println("Job [" + job.getIDAsString() + "]:" +
                           " Status: " + job.getStatusAsString());
    }
});

gramClient
import threading

• Callback for state changes (defined before it is registered):
def func(cv, contact, state, error):
    if state == GramClient.JOB_STATE_PENDING:
        print "Job is pending"
    elif state == GramClient.JOB_STATE_ACTIVE:
        print "Job is active"

• Creating a job:
condV = threading.Condition()  # condition variable handed to the callback as cv
try:
    gramClient = GramClient.GramClient()
    callbackContact = gramClient.set_callback(func, condV)
    jobContact = gramClient.submit_request("clipper.lbl.gov",
                                           "&(executable=/bin/sleep)(arguments=15)",
                                           GramClient.JOB_STATE_ALL)
except GramClient.GramClientException, ex:
    print ex.msg

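The Python example passes a condition variable (`condV`) into the callback, which lets the submitting thread block until the job finishes. A self-contained sketch of that pattern with `threading.Condition` follows. The `JobMonitor` class and the state names `PENDING`, `ACTIVE`, `DONE`, and `FAILED` are illustrative stand-ins, not the `GramClient` API, and a simulated notification thread replaces the real GRAM callback.

```python
import threading

class JobMonitor:
    """Collects job-state callbacks; lets the submitting thread wait for a final state."""

    FINAL_STATES = {"DONE", "FAILED"}  # hypothetical state names

    def __init__(self):
        self.cond = threading.Condition()
        self.history = []

    def callback(self, state):
        # Runs on the notification thread for every state change.
        with self.cond:
            self.history.append(state)
            if state in self.FINAL_STATES:
                self.cond.notify_all()

    def _finished(self):
        return bool(self.history) and self.history[-1] in self.FINAL_STATES

    def wait_until_finished(self, timeout=None):
        """Block until a final state arrives; return it, or None on timeout."""
        with self.cond:
            if self.cond.wait_for(self._finished, timeout):
                return self.history[-1]
            return None

monitor = JobMonitor()

def simulated_notifications():
    # Stand-in for the thread that delivers GRAM state callbacks.
    for state in ("PENDING", "ACTIVE", "DONE"):
        monitor.callback(state)

worker = threading.Thread(target=simulated_notifications)
worker.start()
final = monitor.wait_until_finished(timeout=5)
worker.join()
print(final, monitor.history)
```

The predicate passed to `wait_for` is checked under the lock. As a result, the waiter returns correctly even if every notification arrives before it starts waiting.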
GRAM Submission Demo
[Figure: demo job output, including “Hello World”, “A test to stderr”, and “Bang”]

Remote Data Transfer
• Provides both protocol-dependent and protocol-independent transfer mechanisms
  – Directly use the Grid FTP protocol to transfer the data.
    • Uses GSI for mutual authentication.
    • Supports partial file transfers and restarts.
    • Supports tunable network parameters.
    • Provides extensions to the standard FTP protocol to support striped and parallel data transfer.
  – Use the gassCopy module for protocol-independent transfers.
    • It currently supports the ftp, gsiftp, http, and https protocols.

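Restartable and parallel transfers both depend on tracking which byte ranges of a file have already arrived. The Python sketch below shows that bookkeeping in isolation: given the ranges recorded before an interruption, it computes what still has to be fetched. The function names and the half-open `(start, end)` representation are assumptions made for illustration. They do not reproduce the GridFTP restart-marker format.

```python
def merge_ranges(ranges):
    """Merge overlapping or adjacent half-open byte ranges [start, end)."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)  # extend the previous range
        else:
            merged.append([start, end])
    return [tuple(r) for r in merged]

def missing_ranges(received, file_size):
    """Return the byte ranges still to be transferred after a restart."""
    gaps, cursor = [], 0
    for start, end in merge_ranges(received):
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < file_size:
        gaps.append((cursor, file_size))
    return gaps

# Ranges can arrive out of order when several streams write in parallel.
print(missing_ranges([(0, 100), (200, 300), (100, 150)], 400))
```

On restart, only the gaps need to be requested again. In this example those are bytes 150 to 200 and 300 to 400.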
GSIFTPClient (not GridFTP, which is still an alpha release and not yet part of Globus)
import org.globus.io.ftp.*;

GSIFTPClient ftp = new GSIFTPClient(host, port);
ftp.authenticate();
ftp.setType(FTPClient.BINARY);
// convenience calls such as
ftp.makeDir(dir);
ftp.deleteDir(dir3);
// the listener for the transfers
TransferProgressListener listener = new TransferProgressListener() {
    public void transfer(int total, int current, String from, String to) {
        System.out.println(total + " " + current + " " + from + " " + to);
    }
    public void transferError(String from, String to, Exception e) {
        System.out.println("transfer failed: " + from + " " + to);
    }
};

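Transfer listeners like the one above receive raw byte counts on every callback. A client that only wants occasional updates can convert those counts into percentage milestones, as in this Python sketch. `ProgressReporter` is hypothetical. Its `transfer(total, current, source, dest)` method only mirrors the shape of the Java `transfer` callback and is not part of either CoG Kit.

```python
class ProgressReporter:
    """Turns raw (total, current) byte counts into coarse percentage updates."""

    def __init__(self, step=25):
        self.step = step          # report every `step` percent
        self.next_mark = step
        self.reported = []

    def transfer(self, total, current, source, dest):
        if total <= 0:
            return                # size unknown: nothing sensible to report
        percent = current * 100 // total
        # A single large jump may cross several milestones at once.
        while percent >= self.next_mark:
            self.reported.append(self.next_mark)
            print("%s -> %s: %d%%" % (source, dest, self.next_mark))
            self.next_mark += self.step

    def transfer_error(self, source, dest, error):
        print("transfer failed: %s %s (%s)" % (source, dest, error))

reporter = ProgressReporter(step=25)
for done in (0, 300, 512, 1000, 1024):
    reporter.transfer(1024, done, "Makefile", "Makefile.bak")
```

This keeps the transfer callback cheap while still reporting progress. Each milestone is printed once, however often the underlying transfer calls back.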
Simple Examples
• Retrieve a Makefile from a remote site and save it to Makefile.bak
  – File dst = new File("Makefile.bak");
  – ftp.get("Makefile", dst, listener);
• Retrieve all files in the remote directory to a directory called “Destination”
  – File destinationDir = new File("Destination");
  – ftp.get("*", destinationDir, true, listener);
• Disconnect
  – ftp.disconnect();
• Examples in the source code: apis/examples, UrlCopy, FTPClient

ftpClient
• Use Grid FTP to transfer a file.
def data_func(cv, handle, buffer, bufHandle, bufLen, offset, eof, error):
    g_dest.write(buffer)
    if not eof:
        try:
            # request the next buffer until the end of the file is reached
            handle = g_ftpClient.register_read(g_buffer, data_func, 0)
        except Exception, e:
            pass  # error handling not shown

handleAttr = FtpClientHandleAttr.FtpClientHandleAttr()
opAttr = FtpClientOperationAttr.FtpClientOperationAttr()
marker = FtpClientRestartMarker.FtpClientRestartMarker()
g_ftpClient = FtpClient(handleAttr)
g_ftpClient.get(url, opAttr, marker, done_func, condV)
handle = g_ftpClient.register_read(g_buffer, data_func, 0)

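The ftpClient example reads data asynchronously. Each `register_read` call asks for one buffer, and the callback writes that buffer and registers the next read until `eof` is set. The sketch below reproduces that control flow with the standard library only. `FakeAsyncReader` and `make_data_func` are hypothetical stand-ins for the Grid FTP client and `data_func`, and an in-memory byte string replaces the remote file.

```python
import io
from collections import deque

class FakeAsyncReader:
    """Stand-in for an asynchronous data channel: each register_read queues one read."""

    def __init__(self, data, chunk_size):
        self.source = io.BytesIO(data)
        self.size = len(data)
        self.chunk_size = chunk_size
        self.pending = deque()

    def register_read(self, callback):
        self.pending.append(callback)

    def run(self):
        # Deliver queued reads one at a time, as an event loop would.
        while self.pending:
            callback = self.pending.popleft()
            chunk = self.source.read(self.chunk_size)
            eof = self.source.tell() >= self.size
            callback(self, chunk, eof)

def make_data_func(dest):
    """Analogue of data_func: write the chunk, then re-register until EOF."""
    def data_func(reader, chunk, eof):
        dest.write(chunk)
        if not eof:
            reader.register_read(data_func)
    return data_func

dest = io.BytesIO()
reader = FakeAsyncReader(b"0123456789", chunk_size=4)
reader.register_read(make_data_func(dest))
reader.run()
print(dest.getvalue())
```

Only one read is outstanding at a time, so a single reusable buffer is enough. This is why the slide passes the same `g_buffer` to every `register_read` call.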
Java UrlCopy (protocol-independent file transfer)
import org.globus.io.urlcopy.*;

UrlCopy c = new UrlCopy();
c.setSourceUrl(from);
c.setDestinationUrl(to);
c.setUseThirdPartyCopy(true); // hint to enable third-party transfer
// register a transfer listener
c.setListener(new UrlCopyListener() {
    public void transfer(int total, int current) {
        System.out.println(total + " " + current);
    }
    public void transferError(Exception e) {
        System.out.println("transfer failed: " + e.getMessage());
    }
});
c.copy(); // this starts the copy

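One way to structure protocol independence of this kind is to choose a reader for the source URL's scheme and a writer for the destination URL's scheme, then stream data between them. The sketch below shows that dispatch. The registries, the toy `mem` scheme, and the in-memory store are hypothetical. Real `gsiftp` or `https` back ends would be registered in the same way.

```python
from urllib.parse import urlparse

# Hypothetical registries: URL scheme -> reader / writer function.
READERS = {}
WRITERS = {}

# Toy in-memory store standing in for remote servers, keyed by (host, path).
MEMORY = {}

def _key(url):
    parts = urlparse(url)
    return (parts.netloc, parts.path)

def mem_reader(url, chunk_size=4):
    """Yield the stored bytes for url in fixed-size chunks."""
    data = MEMORY[_key(url)]
    for i in range(0, len(data), chunk_size):
        yield data[i:i + chunk_size]

def mem_writer(url, chunks):
    """Consume chunks and store the result at url."""
    MEMORY[_key(url)] = b"".join(chunks)

READERS["mem"] = mem_reader
WRITERS["mem"] = mem_writer

def url_copy(src, dst):
    """Copy between any registered source and destination protocols."""
    src_scheme, dst_scheme = urlparse(src).scheme, urlparse(dst).scheme
    if src_scheme not in READERS:
        raise ValueError("no reader for protocol %r" % src_scheme)
    if dst_scheme not in WRITERS:
        raise ValueError("no writer for protocol %r" % dst_scheme)
    WRITERS[dst_scheme](dst, READERS[src_scheme](src))

MEMORY[("siteA", "/data.txt")] = b"hello grid"
url_copy("mem://siteA/data.txt", "mem://siteB/copy.txt")
print(MEMORY[("siteB", "/copy.txt")])
```

Because the reader is a generator, data moves chunk by chunk from source to destination and is never buffered whole by the dispatcher itself.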
gassCopy
• Provides a protocol-independent API to transfer remote files.
srcAttr = GassCopyAttr()
handleAttr = GassCopyHandleAttr()
destAttr = GassCopyAttr()
ftpSrcAttr = FtpOperationAttr()
ftpDestAttr = FtpOperationAttr()
srcAttr.set_ftp(ftpSrcAttr)
destAttr.set_ftp(ftpDestAttr)
copy = GassCopy(handleAttr)
copy.copy_url_to_url(srcUrl, srcAttr, destUrl, destAttr)

Secure High-Performance IO
• Uses the Grid Security Infrastructure (GSI) to provide authentication.
• Provides access to the underlying network parameters for tuning performance.

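Tuning usually starts from the bandwidth-delay product. To keep a long, fast path full, TCP needs send and receive buffers of at least bandwidth × round-trip time. The sketch below computes that value and applies it with the standard `SO_SNDBUF`/`SO_RCVBUF` socket options. It uses plain Python sockets rather than the GSI-wrapped `GSITCPSocket`, and the helper names are hypothetical.

```python
import socket

def bdp_bytes(bandwidth_bits_per_s, rtt_ms):
    """Bandwidth-delay product: bytes that must be in flight to fill the path."""
    return bandwidth_bits_per_s * rtt_ms // (8 * 1000)

def tuned_tcp_socket(bandwidth_bits_per_s, rtt_ms):
    """Create a TCP socket whose buffers are sized to the path's BDP."""
    size = bdp_bytes(bandwidth_bits_per_s, rtt_ms)
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # The OS may clamp (or, on Linux, double) these values; check with getsockopt.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, size)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, size)
    return sock

print(bdp_bytes(1_000_000_000, 60))
```

For a 1 Gbit/s path with a 60 ms round-trip time, this gives 1,000,000,000 × 0.060 / 8 = 7,500,000 bytes. That is far more than typical default socket buffers, which is why wide-area transfers benefit from explicit tuning.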
TCP Server example
attr = NetIOAttr.TCPIOAttr()
attr.set_authentication_mode(
    io.GLOBUS_IO_SECURE_AUTHENTICATION_MODE_GSS_API)
authData = AuthData.AuthData()
authData.set_callback(auth_callback, None)
attr.set_authorization_mode(
    io.GLOBUS_IO_SECURE_AUTHORIZATION_MODE_CALLBACK,
    authData)
attr.set_channel_mode(
    io.GLOBUS_IO_SECURE_CHANNEL_MODE_GSI_WRAP)
soc = GSITCPSocket.GSITCPSocket()
port = soc.create_listener(attr)
soc.listen()
childSoc = soc.accept(attr)
buf = Buffer.Buffer(size)
bytesRead = childSoc.read(buf, size, size)
We will develop a similar example for Java; one is already available as part of the GASS server.

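With `GLOBUS_IO_SECURE_AUTHORIZATION_MODE_CALLBACK`, the server decides whether to accept each authenticated peer. One natural policy is to admit only identities listed in a Globus grid-mapfile, which maps a quoted certificate subject to one or more local account names. The sketch below parses that format and builds such a check. The callback signature `(arg, peer_subject)` is an assumption for illustration, not the contract of `AuthData.set_callback`, and the subjects shown are made-up examples.

```python
import shlex

def parse_grid_mapfile(text):
    """Map each quoted certificate subject to its list of local account names."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = shlex.split(line)  # honours the double quotes around the subject
        if len(fields) >= 2:
            mapping[fields[0]] = fields[1].split(",")
    return mapping

def make_auth_callback(mapping):
    """Build an authorization check that admits only listed subjects."""
    def auth_callback(arg, peer_subject):
        # Hypothetical signature: (user argument, peer's certificate subject).
        return peer_subject in mapping
    return auth_callback

GRID_MAPFILE = '''
# subject                               local account(s)
"/O=Grid/OU=Example/CN=Jane Doe"        jdoe
"/O=Grid/OU=Example/CN=John Smith"      jsmith,guest
'''

mapping = parse_grid_mapfile(GRID_MAPFILE)
auth_callback = make_auth_callback(mapping)
print(auth_callback(None, "/O=Grid/OU=Example/CN=Jane Doe"))
print(auth_callback(None, "/O=Grid/OU=Other/CN=Mallory"))
```

Authentication (GSS-API) proves who the peer is. This callback only decides whether that proven identity is allowed in.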
TCP Client example
attr = NetIOAttr.TCPIOAttr()
attr.set_authentication_mode(
    io.GLOBUS_IO_SECURE_AUTHENTICATION_MODE_GSS_API)
authData = AuthData.AuthData()
attr.set_authorization_mode(
    io.GLOBUS_IO_SECURE_AUTHORIZATION_MODE_SELF,
    authData)
attr.set_channel_mode(
    io.GLOBUS_IO_SECURE_CHANNEL_MODE_GSI_WRAP)
soc = GSITCPSocket.GSITCPSocket()
soc.connect(host, port, attr)
nBytes = soc.write(str, len(str))

Java CoG Requirements
• JDK 1.2.2
• Security
  – Needs a security package such as IAIK
• We are able to replace security packages based on a couple of common functions with public-domain packages.
  – We hope that these packages will reach the maturity of IAIK.

Python CoG Requirements
• Python 2.0+
• SWIG 1.3.6+
• The gsi-ftp-alpha release of Globus
• Support for dynamic libraries

Future Directions
• Develop a uniform CoG API to be used by Java and Python
• Investigate and prototype web services
  – WSDL, SOAP, UDDI
    • Integration with existing Grid protocols
  – Investigate integrating GSI security
• Investigate XML-based GUI description languages for the rapid development of language-independent interfaces
• Improve documentation and general organization of the various CoG Kit projects

Review
• CoG Kits can be used to integrate with COTS (commercial off-the-shelf) software.
• Take advantage of existing software libraries (network, LDAP, ...).
• CoG Kits provide greater support for scripting and GUIs.
• Grid programming in Java and Python is easier and faster!

References
• Java CoG Kit
  – http://www.globus.org/cog
  – Gregor von Laszewski, Ian Foster, Jarek Gawor, Peter Lane. A Java Commodity Grid Kit. Concurrency and Computation: Practice and Experience, 13(8-9):643-662, 2001.
• Python CoG Kit
  – http://www-itg.lbl.gov/Grid/projects/pyGlobus/
• Globus
  – http://www.globus.org

The Grid Portal Development Kit
• A "Grid portal" is a customizable, personalized web interface for harnessing Grid services and resources.
• The Grid Portal Development Kit (GPDK) provides a core set of modular, reusable components for accessing Grid services in the form of Java beans.
• GPDK uses the Java CoG Kit to access Globus Grid services.
• GPDK takes advantage of the Tomcat servlet container, the latest open-source reference implementation of the Sun Java Servlet and JavaServer Pages specifications.
• GPDK provides a complete development environment, including template projects that can be easily extended to support additional services or customized problem solving environments.

GPDK Architecture

Core Grid Services
• GSI is used for mutual authentication to remote resources.
• Portals use the MyProxy online credential repository to access users' delegated credentials.
• Grid FTP provides secure file transfer and browsing capabilities, including third-party file transfers.
• LDAP provides access to Grid information services.
• The Globus GRAM protocol provides job submission capabilities to Globus gatekeepers.

GPDK Design
• Contains core Grid services expressed as Java beans.
• Provides template JSP pages to demonstrate use of the Grid service beans.
• User profiles, stored as session data, are represented as serializable beans.
• Follows the "Model 2" MVC JSP/Servlet architecture.
  – A controller servlet forwards control to Page objects for business logic and then to JSP view pages.

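In the Model 2 pattern, a single controller receives every request, hands it to a page object that performs the business logic, and then forwards to a view. GPDK does this with a Java servlet, Page objects, and JSPs. The Python sketch below mirrors only that control flow: the page classes, action names, and view functions are hypothetical, and plain functions stand in for JSP templates.

```python
class Page:
    """Business-logic step for one action; returns the name of the view to render."""
    def handle(self, request, session):
        raise NotImplementedError

class LoginPage(Page):
    def handle(self, request, session):
        session["user"] = request.get("username")
        return "welcome"

class SubmitJobPage(Page):
    def handle(self, request, session):
        if "user" not in session:
            return "login"            # not logged in: send back to the login view
        session.setdefault("jobs", []).append(request.get("rsl"))
        return "job_status"

VIEWS = {  # stand-ins for JSP view pages
    "login": lambda s: "Please log in",
    "welcome": lambda s: "Welcome, %s" % s["user"],
    "job_status": lambda s: "%d job(s) submitted" % len(s["jobs"]),
}

class Controller:
    """Front controller: maps the request's action to a Page, then to a view."""
    def __init__(self, pages):
        self.pages = pages

    def dispatch(self, request, session):
        page = self.pages.get(request.get("action"))
        if page is None:
            return VIEWS["login"](session)
        view_name = page.handle(request, session)
        return VIEWS[view_name](session)

controller = Controller({"login": LoginPage(), "submitJob": SubmitJobPage()})
session = {}
first = controller.dispatch({"action": "submitJob", "rsl": "&(executable=/bin/date)"}, session)
second = controller.dispatch({"action": "login", "username": "jdoe"}, session)
third = controller.dispatch({"action": "submitJob", "rsl": "&(executable=/bin/date)"}, session)
print(first, "|", second, "|", third)
```

Keeping the routing in one controller means page objects never render output and views never contain business logic. That separation is what lets GPDK ship reusable beans alongside replaceable template pages.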
GPDK Core Service Beans
• Job submission
  – JobBean, JobSubmissionBean, etc.
• Security
  – MyproxyBean for retrieval of credentials
• File transfer
  – GSIFTPServiceBean, GSIFTPViewBean
• Information services
  – MDSQueryBean, MDSResultsBean
• User session information
  – UserProfileBean

GPDK for Portal Development
• GPDK allows developers to create self-contained portal projects complete with build scripts and preprocessed documentation.
• Provides subclassed GPDK Java beans for project-specific user profile beans and portal initialization/shutdown routines.
• A build script compiles and deploys the portal to the Tomcat server.

Projects Using GPDK
• Several portal projects are making use of GPDK:
  – The ASC Collaboratory: an astrophysical application portal
  – The NCSA Chemical Engineering Workbench portal
  – NASA IPG Launchpad portal
  – Two NASA application portals in the areas of data mining and computational astrophysics
  – NCSA is working on providing portal access to computational chemistry packages.
  – A CERN portal project is investigating GPDK for managing HEP data.

Future Directions
• Support for additional Grid services, including secure LDAP, databases, etc.
• Taking advantage of new Java CoG developments, e.g., the improved GRAM protocol and GridFTP checkpoint/restart ability
• GPDK administrative pages and tools to enable simple creation of new portal pages
• Preliminary investigation into migrating GPDK beans to a "web services" model

WebServer-SG
• Provides a turn-key solution for building and deploying a secure web server
• Installs the following software:
  – OpenSSL
  – Apache
  – mod_ssl, mod_dav
  – Tomcat
• Preprocesses Apache and Tomcat configuration files
• http://www-itg.lbl.gov/Grid/projects/WebServer-SG.html

More Info
• GPDK project page (developer slides are also online)
  http://www-itg.lbl.gov/Grid/projects/GPDK/index.html
• Get the source from CVS:
  cvs -d :pserver:gpdk@palomar.extreme.indiana.edu:/work/cvs login
  (press return at the password prompt)
  cvs -d :pserver:gpdk@palomar.extreme.indiana.edu:/work/cvs co gpdk
• Subscribe to the mailing list
  http://mailman.cs.indiana.edu/mailman/listinfo/extreme-portals