Relational data - University of Kansas
Download
Report
Transcript Relational data - University of Kansas
Here is a Perl command-line client program that accesses this
CLSDservice:
Making relational data available on the Grid:
A survey of methods from CGI to OGSA-DAI
Michael Grobe
Indiana University
1 Introduction
This poster presents several alternative approaches to making
(read-only) relational data sources available over a computational
Grid as an aid for information providers who may wish to make
their data available. The various alternatives help define what it
means to “put data on the Grid.”
2 JDBC
Java DataBase Connectivity (JDBC) is used within each approach
presented here, but there exist several alternative ways in which
to use it. The major question are: From where should the JDBC
calls be made, and what user interfaces can be used?
The target database for demonstration examples in this poster is
the Centralized Life Sciences Data (CLSD) service at Indiana
University [5]. CLSD presents a collection of life science data
converted to relational form and/or federated into a single relational
database managed by an IBM DB2 database management system
(DBMS). Figure 0 shows shows a Java code fragment that will
interact with CLSD via JDBC.
3 CGI and Java Servlet Engines
There exists several platforms that can be used to pass dynamic
queries to database systems, two of the most important being the
Common Gateway Interface (CGI) and Java Servlet “engines”,
such as Apache Tomcat. Both of these interfaces rely upon HTTP
POST. Here is an example POST request to an Apache HTTP
server running on a computer housing a file named “/cgibin/getdata.pl” in its filesystem:
POST /cgi-bin/getdata.pl HTTP/1.0
Host: my.host.edu:8080
Accept:text/html,text/plain,image/*
Accept-Encoding:gzip,compress
Accept-Language:en
Pragma:no-cache
Cache-Control:no-cache
User-Agent:Lynx/2.8.5dev.7
Content-type:application/x-www-form-urlencoded
Content-length:83
..blank line..
Query=select+tabschema,tabname+from+syscat.tables
&account=testaccount&password=test
This request defines three variables to be passed to the script
within /cgi-bin/getdata.pl: an SQL select command, an account
name, and an account password. When the server receives this
request, it will start the script, and arrange for script output to be
passed back to the browser.
4 Web Services, JAX-RPC and SOAP
There is a separate set of technologies for performing so-called
"remote procedure calls," which allow a program running on one
computer to call functions running on another computer.
There have been several approaches to remote procedure calls in
the Web world. In general they allow the definition of remote
procedure calls as "Web Services". (This is NOT the same as
accessing an arbitrary Web page or using a CGI or Servlet engine
script to build pages on-the-fly.)
The most important such approach for purposes of this discussion
is the Java API over XML for Remote Procedure Calls (JAX-RPC).
Here, the Extensible Markup Langage (XML) is used to exchange
data between a program and its (remote) functions.
Figure 1 shows an outline for a Web Service called “CLSDservice”
that “exposes” a method named “queryCLSD”. queryCLSD sends
an SQL command to CLSD via DB2 and returns all or part of the
resulting table.
queryCLSD requires 6 parameters: the SQL command, the number
of the first row of the resulting table to return, the total number of
rows to return, an account name, that account’s password, and the
format in which to return the result.
#!perl -w
use SOAP::Lite;
# Set up the call to CLSD using SOAP.
$host = “discover.uits.indiana.edu”;
$service = SOAP::Lite -> service(
“http://$host:8421/axis/CLSDservice.jws?wsdl” );
# Make the call to CLSD.
$result = $service->queryCLSD(
“select tabschema,tabname from syscat.tables”,
1, 5, "DB2account", "password“, “CSV“ );
print $result;
This program does NOT need an interface to the database, since it
doesn’t interact with the database directly, but rather through the
“queryCLSD” method provided by the CLSDservice.
This program DOES, however, require the SOAP::Lite Perl module to
orchestrate interaction with CLSDservice, which then uses JDBC to
interact with DB2. SOAP::Lite will embed the function call in an HTTP
request within a SOAP “envelope”, transmit the request, receive and
decode the response, which will also be embedded in a SOAP
envelope, and return the result to the calling routine.
Here is an example JAX-RPC POST generated by the above call to
queryCLSD (highly edited for readability):
POST http://localhost:8421/axis/CLSDservice.jws
Accept: text/xml
Accept: multipart/*
Content-Length: 1009
Content-Type: text/xml; charset=utf-8
SOAPAction: ""
..blank line..
<?xml version="1.0" encoding="UTF-8"?>
<SOAP-ENV:Envelope
SOAP-ENV:encodingStyle=
"http://schemas.xmlsoap.org/soap/encoding/">
<SOAP-ENV:Body>
<namesp1:queryCLSD">
<query xsi:type="ns:string">
select tabschema,tabname from syscat.tables
</query>
<startingRowToPrint xsi:type="ns:string">
1
</startingRowToPrint>
<maxRows xsi:type="ns:string">
5
</maxRows>
<account xsi:type="ns:string">
DB2account
</account>
<password xsi:type="ns:string">
password
</password>
<format xsi:type=”ns:string”>
CSV
</format>
</namesp1:queryCLSD>
</SOAP-ENV:Body>
</SOAP-ENV:Envelope>
On the server side, the Apache Axis Web Services container [1] will
arrange for parameters to be delivered to the queryCLSD method
within the CLSDservice.
This approach allows the creation of lightweight desktop, Servlet,
Portlet, and AJAX applications that can access CLSD or other
relational data resources from anywhere on the Internet.
5 WSRF
The Web Services Resource Framework (WSRF) extends Web
Services by defining a “Web Services Resource” [7] that provides:
- access to a Web Service, typically one that can interact with a realworld computational resource, plus
- a persistent property list containing information relevant to that
service, and
- a standardized list of methods that clients can use to manipulate
those properties.
For example, a WS-Resource may provide the capability to submit a
batch job to a supercomputer queue, along with a property list that
describes that queue, such as queue length, mean expected wait time,
max number of processors available, all within the Grid Security
Interface (GSI) environment, etc.
WSRF is actually a collection of specifications that describe different
aspects of managing Resource services and properties:
WS-Resource
WS-ServiceGroup (WS-SG)
WS-ResourceProperties (WS-RP) WS-BaseFaults (WS-BF)
WS-ResourceLifetime (WS-RL)
WSRF-RP, in particular, defines operations (also known as
“PortTypes”) that can be provided to manipulate properties, such as:
GetResourceProperty
SetResourceProperties
GetMultipleResourceProperties
QueryResourceProperties
The simplest client code for accessing CLSD will be very similar to the
WebServices code. However, WSRF opens up other options, such as
the ability to build data pipelines that pass EPRs identifying semipermanent storage among applications, much as a file name is passed
to a program subroutine in common programming practice. For
example, one WSRF version of the CLSD service accepts an EPR as
well as an SQL command, stores query results in the resource
identified by the EPR, and returns control to the client.
6 OGSA and OGSA-DAI
The Open Grid Services Architecture (OGSA) “…defines a set of core
capabilities and behaviors that address key concerns in Grid systems.”
[2] It does not, however, implement or define how to implement such
core capabilities.
In general, “Grid systems aim to integrate, virtualize, and manage
resources and services within distributed, heterogenous, dynamic
virtual organizations.” [2] Virtualization brings multiple technologies to
bear to insulate users and developers from the physical locations,
network addresses, administrative details, and underlying software
infrastructures that enable access to higher-level services and data.
OGSA Data Access and Integration (OGSA-DAI) [6] is a very flexible
and powerful data access framework that can be used within an OGSA
grid environment. It provides various data movement, virtualization,
and manipulation services that transform the use of data into a higherlevel workflow.
OGSA-DAI has been implemented over WSRF as a higher-level OGSA
service. It supports the construction of configurable WSRF-based
services that take client requests for both access to, and transformation
of, data which can be returned to clients in a variety of ways.
An OGSA-DAI service can replace the CLSD Web Service and the
WSRF-based web services just described; it can accept SQL queries,
relay them to DB2, and return DB2 responses to the user. Users do
not have to write their own Web Services to access their data sources,
but can simply install the OGSA-DAI service over a WSRF container
and create clients for that OGSA-DAI service.
The OGSA-DAI Client Toolkit [6] provides an interface to many OGSADAI services. The Java client shown in Figure 2 uses the Client Toolkit
to send a hard-coded query to CLSD (here known as the
“DB2Resource”) and retrieve an OGSA-DAI WebRowSet. It then
creates a JDBC ResultSet object from the WebRowSet, and returns
data to the user via JDBC calls to retrieve individual fields within each
returned row.
This script displays only a small part of the functionality provided by
OGSA-DAI. In addition, an OGSA-DAI service can be configured to:
- operate on XML or text data sources, as well as relational data
sources,
- perform a series of operations (also known as “activities”) as
part of a single request,
- deliver results to a third party (via FTP, GridFTP, SMTP, etc.)
or to another data service,
- deliver results asynchronously, which can be very useful
for long-running requests, and
- utilize authentication methods supported by WSRF to provide
grid-based security.
Also, exposing a database via OGSA-DAI makes it available for OGSA
Distributed Query Processing (OGSA-DQP), so that its use may be
further virtualized within the DQP model. (In some cases, however,
OGSA-DAI and DQP may introduce performance penalties.)
Fig. 0. Using Java to access CLSD (without try/catch).
// Declare JDBC drivers and connect to DB2.
Class.forName( "com.ibm.db2.jcc.DB2Driver" );
Connection con = DriverManager.getConnection(
"jdbc:db2://hostname.uits.iu.edu:50000/clsd2",
account, password );
// Prepare a JDBC statement containing an SQL query,
// submit it to DB2, and capture the returned result set.
Statement statement = con.createStatement();
ResultSet resultSet = statement.executeQuery( query );
// Query result set metadata for column names and types
// to return as the first row, and...
ResultSetMetaData rsmd = resultSet.getMetaData();
int numcols = rsmd.getColumnCount();
String resultLine =“”;
for( int colCount = 1; colCount <= numcols; colCount++ )
{ String result = rsmd.getColumnLabel( colCount );
resultLine += “ “ + result;
}
System.out.println( resultLine );
// ...then collect and print the contents of each data row.
while ( resultSet.next() )
{
resultLine = “”;
for( int colCount = 1; colCount <= numcols; colCount++ )
{
String returned = resultSet.getString( colCount );
resultLine += “ “ + returned;
}
System.out.println( resultLine );
}
Fig. 1. Outline of queryCLSD method in CLSDservice class.
public class CLSDservice
{
public String queryCLSD( String query,
String startingRowToPrint,String maxRows,
String account, String password, String format )
{ // Process input parameters.
// Access CLSD as shown in Fig. 0.
}
Fig. 2. Example client to access an OGSA-DAI resource.
public class queryCLSD
{
public static void main(String[] args) throws Exception
{ // Create an instance of the data service.
String handle = http://localhost/wsrf/ +
“services/ogsadai/DataService";
String id = "DB2Resource";
DataService service =
GenericServiceFetcher.getInstance().\
getDataService(handle, id);
// Define a request composed of one activity.
SQLQuery query = new SQLQuery(
"select tabschema,tabname from syscat.tables");
WebRowSet rowset = new
WebRowSet( query.getOutput() );
ActivityRequest request = new ActivityRequest();
request.add( query );
request.add( rowset );
// Submit the request and retrieve results.
Response response = service.perform( request );
ResultSet result = rowset.getResultSet();
ResultSetMetaData rsmd = result.getMetaData();
int numCols = rsmd.getColumnCount();
Acknowledgments
Thanks to Lisa Childers, et al. for a great tutorial on WSRF at
TeraGrid 2006. Thanks to Andy Arenson and Scott McCaulay for
providing the opportunity to prepare this poster.
References
[1] Apache Foundation, “Axis User’s Guide”.
[2] Foster, Ian, et al. “The Open Grid Systems Architecture, Version
1.5”, 2006.
[3] Globus Alliance, “How to Build a Service Using GT4”.
[4] Globus Alliance, The Globus Grid Toolkit
[5] Indiana University, The Centralized Life Sciences Data (CLSD)
Service.
[6] Open Middleware Infrastructure Institute, “The OGSA-DAI Project”
[7] Sotomayer, Boria and Lisa Childers, Globus Toolkit 4:
Programming Java Services, Morgan-Kaufmann Publishers, San
Francisco, 2006.
// Return results as single String.
} // end queryCLSD
// end Class CLSDservice
}
// Display each column from each row.
while( result.next() )
{
for( int colCount = 1; colCount <= numCols;
colCount++ )
{
out.print( “ “ + result.getString(colCount) );
}
out.println();
}
} // end main
// end Class queryCLSD