Transcript Slide 1

The Centralized Life Sciences Data (CLSD) service
Michael Grobe
Scientific Data Services
Research Computing
University Information Technology Services
Indiana University at Indianapolis
January 2007
1
Outline
Basic genome science processes and vocabulary
Basic relational algebra
Simple SQL as an expression of the relational algebra
DB2 and the Federated Server
CLSD data sources: “relationalized”, mirrored, and federated
Accessing CLSD
Directions for possible future work:
Adding data sources
Integrating more completely with the TeraGrid
Integrating with other Grids
Questions, suggestions
2
Some chemistry
A “polymer” is a chemical composed of many similar units, e.g. polyvinyl
chloride, starches, etc.
DNA is a (usually double-stranded) polymer composed of nucleotides:
Thymine, Adenine, Cytosine, and Guanine
DNA carries genetic information. Individual units of genetic information
are stored in individual (possibly quite long) segments of DNA.
RNA is a (usually single-stranded) polymer composed of nucleotides:
Uracil, Adenine, Cytosine, Guanine
There are many varieties of RNA (mRNA, snRNA, rRNA, snoRNA, etc.),
and they serve different functions within a cell. For example, RNA
“transfers” genetic information, catalyses reactions, and otherwise assists
or interferes with reactions.
3
Some more chemistry
Polymers are synthesized by catalysts called “polymerases” in a process
called “polymerization.”
Proteins are polymers composed of (over 20 different kinds of) amino
acids, such as:
Methionine (M), Isoleucine (I), Cysteine (C), Histidine (H), Alanine (A),
Glutamic acid (E), Leucine (L), etc.
Proteins:
•provide structure:
•microfilaments (polymers of actin),
•microtubules (polymers of tubulins),
•channels through the cell membrane, etc.
•catalyse and co-catalyse reactions, as “enzymes,”
•bind with DNA to enhance or inhibit “transcription” and “translation”,
•are sometimes marked for transport or degradation.
Protein primary, secondary and tertiary structures are important.
Proteins are degraded within proteasomes.
4
Genetic material: 2 meters of DNA packaged into less than 1.4 microns
From Atherly, et al., 1999
5
The central model of molecular genetics
DNA can be reliably replicated during the process of cell division, by DNA-dependent DNA polymerases.
DNA can be “transcribed” to messenger RNA (mRNA) by DNA-dependent
RNA polymerases. Transcription takes place in the nucleus (or equivalent).
mRNA is transported to the cytoplasm where it is used as a template for
creating proteins by “ribosomes” in a process called “translation.”
The translation process adds 1 amino acid for each 3 bases (a “triplet,” or
codon) of the mRNA sequence.
The function mapping each of the 64 possible triplets to an amino acid (or,
for 3 of them, to a stop signal) is the “genetic code.”
Ribosomes are complexes of RNA and protein.
6
The central model within the cell
Diagram from: http://www.ncbi.nih.gov/About/primer/images/proteinsynth4.GIF
(Don’t forget about degradation and recycling of AAs.)
7
The central model in more detail
(Graphics of DNA and RNA from Atherly, et al. 1999)
8
Mutations and polymorphisms
                Nucleotide sequence     Translated AA sequence
Wildtype:       ACTGAACTGATT            Thr-Glu-Leu-Ile
Substitution:   ACTGACCTGATT            Thr-Asp-Leu-Ile
Deletion:       ACTCTGATT               Thr-Leu-Ile
Insertion:      ACTGAACCTGAACTGATT      Thr-Glu-Pro-Glu-Leu-Ile
If mutations like these occur in the genetic material of germ cells (e.g.
oocytes), they may be transmitted to offspring, and define “polymorphic”
gene variations.
A Single Nucleotide Polymorphism (SNP) is a variation where one
base is changed and passed on to offspring (and occurs with sufficient
frequency).
A Deletion/Insertion Polymorphism (DIP) is a variation where multiple
bases have been removed or inserted into a sequence.
dbSNP is a database of SNPs and DIPs containing millions of entries,
and over 120K unique sequences that are inserted or deleted.
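The mutation examples on this slide can be checked mechanically. The sketch below builds the standard genetic code as a lookup table and translates each of the four nucleotide sequences, printing one-letter amino acid codes (T = Thr, E = Glu, D = Asp, L = Leu, I = Ile, P = Pro). The function and variable names are illustrative, and the sequences are read as coding-strand DNA starting in frame at the first base, which is an assumption.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
# "*" marks the three stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna):
    """Translate coding-strand DNA three bases at a time (frame 0)."""
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


EXAMPLES = {
    "Wildtype": "ACTGAACTGATT",
    "Substitution": "ACTGACCTGATT",
    "Deletion": "ACTCTGATT",
    "Insertion": "ACTGAACCTGAACTGATT",
}

for label, sequence in EXAMPLES.items():
    print(f"{label:14s}{sequence:20s}{translate(sequence)}")
```

Note that the deletion and insertion shown here remove or add whole codons, so the reading frame downstream is preserved.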
9
Scale of human genome data
Total number of bases: 3.2Gbp
(DNA from one half of one chromosome (chromatid) from each of 24
chromosomes: 22 autosomal chromosome pairs plus the sex
chromosomes.)
Percentage of genome consisting of protein coding genes: < 2%
Average gene length: ~3Kbp (but up to 2.4Mbp)
Average exon length: 200bp
Average protein length: 500-600AA
Percentage of “junk” DNA: often said to be ~50%
Percentage of “junk” DNA now suspected to be transcribed (the “dark
matter” of the genome): ~50 to 100%
Some of that junk is miRNA (microRNA) that negatively regulates translation.
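A few back-of-the-envelope numbers follow from the figures on this slide. The sketch below is only rough arithmetic: the 0.34 nm rise per base pair is the usual figure for B-form DNA and is an assumption not taken from the slides, and 550 AA is simply the midpoint of the quoted protein length range.

```python
GENOME_BP = 3.2e9        # haploid genome size, from the slide
CODING_FRACTION = 0.02   # "< 2%", so this is an upper bound
RISE_PER_BP_NM = 0.34    # assumption: B-form DNA, not from the slides
EXON_BP = 200            # average exon length, from the slide
PROTEIN_AA = 550         # midpoint of the 500-600 AA range

# Upper bound on the number of protein-coding bases.
coding_bp = GENOME_BP * CODING_FRACTION

# Stretched-out length of one haploid copy, and of the two copies in a cell.
haploid_length_m = GENOME_BP * RISE_PER_BP_NM * 1e-9
diploid_length_m = 2 * haploid_length_m

# Coding bases needed for an average protein, and the exons that implies.
coding_bases_per_protein = PROTEIN_AA * 3
exons_per_protein = coding_bases_per_protein / EXON_BP

print(f"coding bases (upper bound): {coding_bp:.3g}")
print(f"DNA length per diploid cell: {diploid_length_m:.2f} m")
print(f"exons per average protein: {exons_per_protein:.2f}")
```

The diploid length comes out close to the "2 meters of DNA" quoted on the earlier packaging slide.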
10
Process control: cancer-related reaction pathways from Hanahan, et al.
11
Basic relational algebra
The relational algebra operates on relations, which are sets of tuples of
the same arity, which is to say, collections of lists of the same length.
Here are two 4-tuples:
( 1, 2, 3, 4 )
( 8, 7, 9, 4 )
Relations are commonly represented as tables.
There are 5 primitive operations within the relational algebra:
Projection: extract specific columns from a relation
Selection: extract specific rows
Set union: create a new table composed of all the rows of
two other tables
Set difference: remove the rows in one relation that appear in
another
Cartesian product: “multiply” two tables to create a third
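The five primitives can be tried directly on small relations. The sketch below models a relation as a Python set of equal-length tuples; the helper names `project` and `select` and the sample relations are illustrative only.

```python
from itertools import product

# Relations are sets of tuples of the same arity.
r1 = {(8, 7, 9, 1), (1, 2, 3, 4), (7, 6, 2, 3)}
r2 = {(1, 2, 3, 4), (5, 5, 5, 5)}
r3 = {(3, 4, 7), (1, 9, 8)}


def project(relation, columns):
    """Projection: keep only the listed column positions."""
    return {tuple(row[c] for c in columns) for row in relation}


def select(relation, predicate):
    """Selection: keep only the rows satisfying the predicate."""
    return {row for row in relation if predicate(row)}


projection = project(r1, [0, 3])                  # first and last columns
selection = select(r1, lambda row: row[3] > 2)    # rows whose 4th value > 2
union = r1 | r2                                   # rows in either relation
difference = r1 - r2                              # rows of r1 not in r2
cartesian = {a + b for a, b in product(r1, r3)}   # every r1 row with every r3 row

print(sorted(projection))
print(sorted(selection))
print(sorted(union))
print(sorted(difference))
print(sorted(cartesian))
```

Because relations are sets, the union drops the duplicate tuple (1, 2, 3, 4) automatically.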
12
Cartesian product in more detail
Relation1 (arity 4; length 3)
8  7  9  1
1  2  3  4
7  6  2  3
Relation 2 (arity 3; length 2)
3  4  7
1  9  8
Cartesian product (arity: 4 + 3; length: 3 * 2)
8  7  9  1  3  4  7
8  7  9  1  1  9  8
1  2  3  4  3  4  7
1  2  3  4  1  9  8
7  6  2  3  3  4  7
7  6  2  3  1  9  8
13
Relational databases and query languages
Database management systems based on the relational algebra were
described by Edgar F. Codd working for IBM in the early 1970s.
Codd’s formulation included:
•indexes and keys,
•decomposition into normal forms, and
•integrity constraints.
Multiple languages and interfaces were developed to query and modify
collections of relations, among them the Structured English Query
Language, SEQUEL, developed by Chamberlin and Boyce.
14
SQL as an implementation of the relational algebra
The most successful such language, SQL, was based on SEQUEL.
SQL requires that each relation has a tablename, and each tuple position
has a “fieldname”:
Players (arity 4; length 3)
Player  Innings  Hits  Teamnumber
8       7        9     1
1       2        3     8
7       6        2     3
Teams (arity 3; length 2)
t_num  games  rank
3      4      7
1      9      8
15
SQL as an implementation of the relational algebra
SQL commands map to the relational primitives as follows, where
“*” stands for all fields in a table:
Projection
  select fieldname_list from tablename
  ex: select t_num,rank from Teams
Selection
  select * from tablename where <logical expression>
  ex: select * from Players where Teamnumber = 1
Union
  (select fieldname_list from tablename1)
  union
  (select fieldname_list from tablename2)
  use UNION ALL to keep duplicates
Set difference
  (select * from tablename1)
  except
  (select * from tablename2)
Cartesian product
  select * from tablename1, tablename2
Note that SQL does not specify how to perform a query; only what the
result should be. It is a “declarative,” rather than “procedural,” language.
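The mapping above can be exercised without a DB2 installation. The sketch below loads the Players and Teams tables into an in-memory SQLite database, which stands in for DB2 here purely as an assumption of convenience. SQLite does not accept parentheses around the operands of union and except, so they are omitted.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table Players (Player int, Innings int, Hits int, Teamnumber int)")
con.execute("create table Teams (t_num int, games int, rank int)")
con.executemany("insert into Players values (?, ?, ?, ?)",
                [(8, 7, 9, 1), (1, 2, 3, 8), (7, 6, 2, 3)])
con.executemany("insert into Teams values (?, ?, ?)",
                [(3, 4, 7), (1, 9, 8)])


def run(sql):
    """Run one query and return all rows as a list of tuples."""
    return con.execute(sql).fetchall()


projection = run("select t_num, rank from Teams")
selection = run("select * from Players where Teamnumber = 1")
union = run("select Teamnumber from Players union select t_num from Teams")
union_all = run("select Teamnumber from Players union all select t_num from Teams")
difference = run("select Teamnumber from Players except select t_num from Teams")
cartesian = run("select * from Players, Teams")

for name, rows in [("projection", projection), ("selection", selection),
                   ("union", union), ("union all", union_all),
                   ("difference", difference), ("cartesian", cartesian)]:
    print(f"{name:11s}{rows}")
```

Each query states only what rows are wanted; the order in which SQLite (or DB2) scans the tables is left to the query planner.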
16
The relational join operation
An SQL “join” is a Cartesian product followed by a selection, as in:
select * from Players, Teams
where Players.Teamnumber = Teams.t_num
which keeps only the 2 rows (marked *) of the Cartesian product table in which
Teamnumber equals t_num:
Player  Innings  Hits  Teamnumber  t_num  games  rank
8       7        9     1           3      4      7
8       7        9     1           1      9      8     *
1       2        3     4           3      4      7
1       2        3     4           1      9      8
7       6        2     3           3      4      7     *
7       6        2     3           1      9      8
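The equivalence between a join and "Cartesian product followed by a selection" can be checked in code. The sketch below uses an in-memory SQLite database as a stand-in for DB2 (an assumption) and compares the comma-style join from this slide with an explicit JOIN ... ON and with a hand-built product-then-filter over the Players and Teams rows from the earlier slide.

```python
import sqlite3

players = [(8, 7, 9, 1), (1, 2, 3, 8), (7, 6, 2, 3)]
teams = [(3, 4, 7), (1, 9, 8)]

con = sqlite3.connect(":memory:")
con.execute("create table Players (Player int, Innings int, Hits int, Teamnumber int)")
con.execute("create table Teams (t_num int, games int, rank int)")
con.executemany("insert into Players values (?, ?, ?, ?)", players)
con.executemany("insert into Teams values (?, ?, ?)", teams)

# Join written as a Cartesian product plus a where clause.
implicit = con.execute(
    "select * from Players, Teams where Players.Teamnumber = Teams.t_num"
).fetchall()

# The same join written with explicit JOIN ... ON syntax.
explicit = con.execute(
    "select * from Players join Teams on Players.Teamnumber = Teams.t_num"
).fetchall()

# The same result built by hand: product first, selection second.
manual = [p + t for p in players for t in teams if p[3] == t[0]]

print(sorted(implicit))
print(sorted(explicit) == sorted(manual) == sorted(implicit))
```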
17
IBM’s DB2 and WebSphere Federated Server,
nee Information Integrator, nee DiscoveryLink
DB2 is a fully-featured relational database system that can house and serve
large databases.
Data is usually imported in relational form, structured as rows composed of
individual data values, possibly identified by unique IDs (keys).
DB2 can also access data in tables managed by other, usually physically
remote, database management systems, such as Oracle, MySQL or DB2.
This process is known as “data federation.”
DB2 can also federate some external resources that are not normally
accessed as relational tables (e.g. Blast). Such resources are transformed,
or “relationalized” on-the-fly by “wrappers”.
Once these resources have been registered with their wrappers they may be
referred to within SQL queries as is any other resource.
18
WFS diagram from Del Prete
19
Some WFS jargon
Wrapper: a library to access a particular class of data sources or
protocols.
Each wrapper contains information about data source characteristics.
There are BLAST and PubMed wrappers, and now a “generic Script
wrapper” that talks to user scripts.
Server: represents a specific data source (user mappings may be required
for authentication)
Nickname: a local table name (alias) for data on a server (mapped to
rows and columns)
A nickname looks like a table, but links to a server, which links to a
wrapper/data source, where the wrapper knows how to process the data
from the source.
20
Using NCBI data within DB2: More than just mirroring
Mirroring usually implies maintaining exact copies of data sources.
Most data mirrored by CLSD must not only be copied, but also inserted
into the CLSD relational structure.
This is accomplished by a series of scripts that:
•Download the data from its external site,
•Convert it to a form that can be used to update CLSD tables,
•Insert the data into tables, and
•Monitor the overall process to identify and log errors.
These scripts are run regularly from crontab entries, and monitoring results
are examined after every run.
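The four steps above (download, convert, insert, monitor) can be sketched as a small pipeline. Everything in this sketch is hypothetical: the "id|name" flat-file layout, the sample lines, the `enzyme` table, and the function names are invented for illustration, SQLite stands in for DB2, and the download step returns a canned string instead of contacting an external site. The actual CLSD scripts are not shown in these slides.

```python
import logging
import sqlite3

log = logging.getLogger("clsd_mirror_sketch")

# Illustrative sample only: a made-up "id|name" flat-file layout.
SAMPLE_DOWNLOAD = (
    "1.1.1.1|Alcohol dehydrogenase\n"
    "1.1.1.2|Alcohol dehydrogenase (NADP+)\n"
    "this line is malformed\n"
)


def download():
    """Step 1: stand-in for fetching a file from an external site."""
    return SAMPLE_DOWNLOAD


def convert(text):
    """Step 2: turn raw lines into (id, name) rows; collect bad lines."""
    rows, errors = [], []
    for line_number, line in enumerate(text.splitlines(), start=1):
        fields = line.split("|")
        if len(fields) == 2 and all(fields):
            rows.append(tuple(fields))
        else:
            errors.append((line_number, line))
    return rows, errors


def load(con, rows):
    """Step 3: replace the table contents inside one transaction."""
    with con:
        con.execute("create table if not exists enzyme "
                    "(ec_number text primary key, name text)")
        con.execute("delete from enzyme")
        con.executemany("insert into enzyme values (?, ?)", rows)


def run(con):
    """Step 4: run the pipeline and log anything that went wrong."""
    rows, errors = convert(download())
    load(con, rows)
    for line_number, line in errors:
        log.error("line %d could not be parsed: %r", line_number, line)
    return len(rows), len(errors)


con = sqlite3.connect(":memory:")
loaded, failed = run(con)
print(f"loaded {loaded} rows, {failed} errors")
```

A script like this could be started from a crontab entry, with the log output reviewed after each run.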
21
CLSD “relationalized” data sources
BIND -- Pathways, Gene interactions
ENZYME -- Enzyme nomenclature
ePCR -- ePCR results of UniSTS vs Homo sapiens
KEGG data sources:
LIGAND -- Pathways, Reactions, & Compounds
PATHWAY -- Pathway map coordinates
NCBI data sources:
LocusLink -- Genetic Loci. (LocusLink has been inactive since
July 1, 2005 when it was retired in favor of Entrez Gene.)
UniGene -- Gene clusters
SGD -- Saccharomyces Genome Database
22
KEGG datasource info
PATHWAY: 42,273 pathways generated from 306 reference pathways
LIGAND: 14,238 compounds, 4,111 drugs, 10,951 glycans, 6,810 reactions,
7,127 reactant pairs
23
CLSD federated data sources
Federated NCBI data sources (subject to hit rate throttling):
Nucleotide -- Nucleotide sequences
PubMed -- Journal abstracts
Federated local mirrors of NCBI data sources (not throttled):
Blast (updated monthly) is mirrored by UITS
dbSNP (updated at major builds) is mirrored by IUSM
Some KEGG resources are federated via the FS KEGG user-defined
functions
24
Examples from the CLSD web site
http://scidata.iu.edu/CLSD/sql-in-db2.shtml
•To get a list of genes containing "brain" in their LOCUS_NAME in
dbSNP126_shared:
  select * from DBSNP126_SHARED.GENEIDTONAME
  where locus_name like '%brain%'
•To get a list of Bind Genes and their species:
  select GeneNameA,Organism from bind.bind_interaction
•To get a list of genes mentioning "HUMAN" in their descriptions in KEGG:
  select * from KEGG.GENE where description like '%HUMAN%'
•To get some info from PubMed:
  select PMID, ArticleTitle FROM NCBI.pmarticles
  where entrez.contains (ArticleTitle, 'granulation') = 1
  AND entrez.contains (PubDate, '1992') = 1
25
BLAST: Both mirrored and federated
NCBI Blast is typically accessed via a web page at NCBI, or some mirrored
site.
Data is returned in a typical web interface format suitable for users.
Within CLSD, BLAST is accessed via an SQL query and data is returned as
a table that can be manipulated as is any other DB2 table.
For example, here is an SQL query that invokes a blastall process running
on libra00 from within DB2:
select GB_ACC_NUM, description, e_value from
ncbi.BLASTN_NT where BlastSeq =
'AGTACTAGCTAGCTAGCTACTAGCTGACTGACTGACTGATGCATCGATGATGC'
The local version of blastall conducts the search and returns results
encoded within XML (by specifying the -m 7 parameter).
26
The DB2 federation software converts the XML encoded results into something
like this:
GB_ACC_NUM (VARCHAR) | DESCRIPTION (VARCHAR) | E_VALUE (DOUBLE)
AE003644 | Drosophila melanogaster chromosome 2L, section 53 of 83 of the complete sequence | 0.00666475
AE003410 | Drosophila melanogaster, chromosome 2L, region 34C4-36A7 (Adh region), section 4 of 10 of the comple | 0.00666475
AC092228 | Drosophila melanogaster, chromosome 2L, region 35X-35X, BAC clone BACR21J17, complete sequence | 0.00666475
AP008207 | Oryza sativa (japonica cultivar-group) genomic DNA, chromosome 1, complete sequence | 0.0263349
AP003197 | Oryza sativa (japonica cultivar-group) genomic DNA, chromosome 1, BAC clone:B1015E06 | 0.0263349
AP003105 | Human DNA sequence from chromosome 1, putative argumentativeness gene GROBE1 | 0.0263349
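The "relationalizing" step can be illustrated with a toy converter. The sketch below is not the DB2 wrapper; it is a minimal stand-in that parses a hand-written, heavily trimmed fragment in the style of BLAST XML output and flattens it into (accession, description, e-value) rows. The fragment keeps only a few elements and is an assumption about what the relevant part of the XML looks like.

```python
import xml.etree.ElementTree as ET

# Hand-written, trimmed fragment for illustration; real output has many more fields.
SAMPLE_XML = """
<BlastOutput>
  <BlastOutput_iterations><Iteration><Iteration_hits>
    <Hit>
      <Hit_accession>AE003644</Hit_accession>
      <Hit_def>Drosophila melanogaster chromosome 2L, section 53 of 83 of the complete sequence</Hit_def>
      <Hit_hsps><Hsp><Hsp_evalue>0.00666475</Hsp_evalue></Hsp></Hit_hsps>
    </Hit>
    <Hit>
      <Hit_accession>AP008207</Hit_accession>
      <Hit_def>Oryza sativa (japonica cultivar-group) genomic DNA, chromosome 1, complete sequence</Hit_def>
      <Hit_hsps><Hsp><Hsp_evalue>0.0263349</Hsp_evalue></Hsp></Hit_hsps>
    </Hit>
  </Iteration_hits></Iteration></BlastOutput_iterations>
</BlastOutput>
"""


def relationalize(xml_text):
    """Flatten each hit/HSP pair into one (accession, description, e_value) row."""
    rows = []
    for hit in ET.fromstring(xml_text).iter("Hit"):
        accession = hit.findtext("Hit_accession")
        description = hit.findtext("Hit_def")
        for hsp in hit.iter("Hsp"):
            rows.append((accession, description, float(hsp.findtext("Hsp_evalue"))))
    return rows


rows = relationalize(SAMPLE_XML)
for accession, description, e_value in rows:
    print(accession, e_value, description[:40])
```

Once the results are rows of typed values, they can be filtered, sorted, and joined like any other table.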
27
Modifying BLAST search settings via SQL
Parameters sent to blastall can be set by using equality comparisons as
assignment statements within SQL conditionals, as in:
select Score, E_Value, HSP_Info, HSP_Q_Seq, HSP_H_Seq
from ncbi.BLASTN_NT
where BlastSeq = 'gagttgtcaatggcgagg'
and gapcost=8 and E_Value < .0005
which will pass gapcost and e-value settings on to blastall.
28
BLAST data sources available via CLSD
Here is a list showing which search types are supported by the DB2 BLAST
wrapper within CLSD.
BLAST search type: Data sources
BLASTN: NT, EST_HUMAN, EST_MOUSE, and EST_OTHER
A nucleotide sequence is compared with the contents of a
nucleotide sequence database.
BLASTP: NR, SP
An amino acid sequence is compared with the contents of an amino
acid database.
BLASTX: NR, SP
A nucleotide sequence is compared with the contents of an amino
acid sequence database. Query is translated in all six reading frames.
29
Examples from IBM
Query 1: Given a search sequence, search nucleotide (NT), and return the hits
for only those sequences not associated with a Cloning Vector. For each hit,
display the Cluster ID and Title from Unigene, in addition to the Accession
Number and E-Value. Only show the top 5 hits, based on the ones with the
lowest E-values.
Select nt.GB_ACC_NUM, nt.DESCRIPTION, nt.E_VALUE,
useq.CLUSTER_ID, ugen.TITLE
From ncbi.BLASTN_NT nt, unigene.SEQUENCE useq,
unigene.GENERAL ugen
Where BLASTSEQ =
'GGCCGGGCGCGGTGGCTCACGCCTGTAATCCCAGCACTTTGGGAGGC
CGAGGCGGGCGGATCACGAGGTCAGGAGATCGAGACCATCCTGGCTAACACG
GTGAAACCCCGTC'
And nt.DESCRIPTION not like '%cloning vector%'
And nt.GB_ACC_NUM = useq.ACC
And useq.CLUSTER_ID = ugen.CLUSTER_ID
Order by E_VALUE FETCH FIRST 5 ROWS ONLY
30
User-defined functions (supplied by IBM)
There exist special functions for manipulating sequence patterns:
•LSPatternMatch
•LSPrositePattern
To get a list of (aspartate aminotransferase) BLAST results filtered by a
(pyridoxal phosphate attachment site) pattern specified in PROSITE pattern
language:
select gb_acc_num, HSP_H_SEQ from ncbi.blastp_nr where
blastseq='MSQICKRGLLISNRLAPAALRCKSTWFSEVQMGPPDAILGVTE\
AFKKDTNPKKINLGAGAYRDDNTQPFVLPSVREAEKRVVSRSLDKEYATIIGI\
PEFYNKAIELALGKGSKRLAAKHNVTAQSISGTGALRIGAAFLAKFWQGNREI\
YIPSPSWGNHVAIFEHAGLPVNRYRYYDKDT'
and DB2LS.LSPatternMatch(HSP_H_SEQ,
DB2LS.LSPrositePattern(
'[GS]-[LIVMFYTAC]-[GSTA]-K-x(2)-[GSALVN].' ) ) > 0
Note the use of the period (.) to terminate the PROSITE pattern, and that the
LSPatternMatch function returns the character position of the left-most
substring matching the pattern, or zero if there is no match.
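The behaviour described for LSPatternMatch can be mimicked with a regular expression. The sketch below is not IBM's implementation; it is a minimal translator that handles a common subset of PROSITE syntax (single residues, x, [...] sets, {...} exclusions, (n) and (n,m) repeats, and the < and > anchors) and returns the 1-based position of the left-most match, or 0 when there is none. The function names and the short test sequence are made up for illustration.

```python
import re

# One PROSITE element: optional "<", a residue spec, optional repeat, optional ">".
ELEMENT = re.compile(
    r"(<?)(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?(>?)"
)


def prosite_to_regex(pattern):
    """Convert a PROSITE pattern (ending in '.') to a Python regex string."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        match = ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        start, core, low, high, end = match.groups()
        if core == "x":
            core = "."                          # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"      # any residue except these
        repeat = ""
        if low is not None:
            repeat = "{" + low + ("," + high if high else "") + "}"
        parts.append(("^" if start else "") + core + repeat + ("$" if end else ""))
    return "".join(parts)


def pattern_match(sequence, pattern):
    """Return the 1-based position of the left-most match, or 0 if none."""
    found = re.search(prosite_to_regex(pattern), sequence)
    return found.start() + 1 if found else 0


PLP_SITE = "[GS]-[LIVMFYTAC]-[GSTA]-K-x(2)-[GSALVN]."
print(prosite_to_regex(PLP_SITE))
print(pattern_match("AAGLGKAAGAA", PLP_SITE))
print(pattern_match("AAAA", PLP_SITE))
```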
31
Accessing CLSD: getting an account
To access CLSD you must have an account on the Libra Cluster at IU (aka
libra00.uits.iu.edu).
If you don’t have an account and are associated with Indiana University,
request an account by filling out a Research Systems Account Application at
http://rac.uits.iu.edu/rats/forms/application.php.
In the comments section of the account request, add that you need a local
and persistent password for use with CLSD.
Once you have a Libra account, send email to SDS at data @ indiana.edu
and request instructions for defining a local and persistent password for use
with CLSD.
TeraGrid users should send e-mail to SDS at data @ indiana.edu explaining
how CLSD will be used, and describing their TeraGrid activities. SDS will
then arrange for an appropriate Libra account and send instructions for
defining a suitable password.
32
Accessing CLSD: options
DB2 can be accessed in a variety of ways:
•DB2 Command Line Processor (Unix, Windows)
•DB2 Control Center (wherever JRE is running)
•DB2 driver for Perl DBI
•DB2 drivers for the Java Database Connectivity (JDBC) Application Program
Interface (API), especially the JDBC Universal Driver
•Demonstration Web page (invokes a Java servlet that uses JDBC):
http://discover.uits.indiana.edu:8421/access/
•Demonstration WebService (invoked as a function call via JAX-RPC):
http://discover.uits.indiana.edu:8421/axis/CLSDservice.jws?wsdl
•Demonstration Web page (invokes a Java servlet that invokes the CLSD
WebService):
http://discover.uits.indiana.edu:8421/access/index-for-service.html
•Experimental WSRF Resource (using WSRF within a GT4 container)
•Experimental OGSA-DAI service (running within a GT4 container)
33
JDBC access
Connect to the CLSD:
Class.forName( "com.ibm.db2.jcc.DB2Driver" );
con = DriverManager.getConnection(
"jdbc:db2://libra00.uits.iu.edu:50000/clsd2",
accountName, accountPassword );
Prepare a query, send it to the db, and receive a result:
statement = con.createStatement();
resultSet = statement.executeQuery( query );
Get some query meta-data (column labels and column data types):
ResultSetMetaData rsmd = resultSet.getMetaData();
result = rsmd.getColumnLabel( colCount );
result2 = rsmd.getColumnTypeName( colCount );
34
JDBC access (continued)
Get a row of data:
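// Assumes resultSet.next() has returned true and
// numcols = rsmd.getColumnCount().
for( int colCount = 1; colCount <= numcols; colCount++ )
{
// getString() may return null; appending "" yields the text "null".
String returnedString = resultSet.getString( colCount ) + "";
out.println( "<td>" + returnedString + "</td>\n" );
}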
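The same connect, query, metadata, and row-by-row pattern exists in other database APIs. The sketch below shows it with Python's DB-API, using an in-memory SQLite table as a stand-in for CLSD (an assumption, since connecting to DB2 would need a driver and an account); the table and its contents are the Teams example from the earlier slides.

```python
import sqlite3

# Connect (SQLite in memory here, in place of the DB2 connection).
con = sqlite3.connect(":memory:")
con.execute("create table Teams (t_num int, games int, rank int)")
con.executemany("insert into Teams values (?, ?, ?)", [(3, 4, 7), (1, 9, 8)])

# Send a query and receive a result cursor.
cursor = con.execute("select * from Teams order by t_num")

# Query metadata: the first item of each description entry is the column label.
labels = [column[0] for column in cursor.description]

# Fetch rows one at a time and wrap each value in a table cell.
html_rows = []
for row in cursor:
    cells = "".join(f"<td>{value}</td>" for value in row)
    html_rows.append(f"<tr>{cells}</tr>")

print(labels)
print("\n".join(html_rows))
```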
35
Accessing CLSD thru a WebService (JAX-RPC)
The Java API for XML-based Remote Procedure Calls, or JAX-RPC, is a
specification that defines a system for building distributed services (so-called
“WebServices”) within the client-server model.
JAX-RPC makes it possible for a function invocation in a client like:
a_variable = function_name( parameter_list)
to cause the function, “function_name,” to run on a remote server and return
a response containing the value to be assigned to the variable “a_variable”,
and a function invocation in a client like:
returnString = queryCLSD( "select * from syscat.tables",
"1", "5", "accountName", "accountPassword", "table" )
will return a (possibly very long) string containing the response to the query
(given that various linkages have been prearranged).
36
Outline of the CLSDservice
public class CLSDservice
{ // Full source at:
// http://scidata.iu.edu/CLSD/examples/CLSDservice.jws.txt
public String queryCLSD( String query, String startingRowToPrint,
String maxRows, String account, String password,
String format )
{
// Get a query string, etc. from the command line or Web
// browser.
// Declare JDBC drivers and connect to DB2.
// Prepare a JDBC statement containing the SQL query, submit
// it to DB2, and capture the returned JDBC result set.
// Query result set metadata for column names and types to
// return as the first row, and then collect the contents of
// each data row.
return theResponse;
} // end queryCLSD
} // end class CLSDservice
37
SOAP and WSDL
JAX-RPC uses SOAP and WSDL to establish the various linkages required to
implement remote procedure calls.
SOAP messages are usually encoded as XML messages within HTTP requests
where:
• A SOAP request is an HTTP POST request with an XML body.
• A SOAP response is an HTTP response header followed by an XML body.
Such RPC functions are “exposed” as “operations” when described within
documents written in the Web Services Description Language (WSDL).
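To make "an HTTP POST request with an XML body" concrete, the sketch below builds such a body for a queryCLSD call without sending it anywhere. The SOAP 1.1 envelope namespace is standard. The operation namespace is the one used in the Java client in this deck. The parameter element names (arg0, arg1, ...) are an assumption and may not match what the real service expects.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "http://soapinterop.org/"                  # namespace used by the Java client


def build_request(*arguments):
    """Build the XML body of a SOAP request that calls queryCLSD."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    operation = ET.SubElement(body, f"{{{SERVICE_NS}}}queryCLSD")
    for index, value in enumerate(arguments):
        # Hypothetical parameter names: arg0, arg1, ...
        ET.SubElement(operation, f"arg{index}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")


# HTTP headers that would accompany the POST (SOAP 1.1 conventions).
headers = {"Content-Type": "text/xml; charset=utf-8", "SOAPAction": '""'}

body_xml = build_request("select * from syscat.tables",
                         "1", "5", "accountName", "accountPassword", "table")
print(headers)
print(body_xml)
```

Toolkits such as Axis or SOAP::Lite generate this envelope from the WSDL description, so client code normally never builds it by hand.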
38
Java command-line client to access CLSD via CLSDservice
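import javax.xml.namespace.QName;
import org.apache.axis.client.Call;
import org.apache.axis.client.Service;
public class testCLSDClient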
{
public static void main(String [] args) {
try
{
String endpoint =
"http://discover.uits.indiana.edu:8421/axis/CLSDservice.jws";
Service service = new Service();
Call call = (Call) service.createCall();
call.setTargetEndpointAddress( new java.net.URL( endpoint ) );
call.setOperationName(
new QName("http://soapinterop.org/", "queryCLSD" ) );
String returnString = (String) call.invoke( new Object[]
{ "select * from syscat.tables",
"1", "5", "accountName", "accountPassword", "table" } );
System.out.println( returnString );
}
catch (Exception e)
{
System.err.println(e.toString());
}
}
}
39
Perl command-line client to access CLSD via CLSDservice
#!perl -w
use SOAP::Lite;
# Set up the call to CLSD using SOAP.
$host = "discover.uits.indiana.edu";
$service = SOAP::Lite -> service(
"http://$host:8421/axis/CLSDservice.jws?wsdl" );
# Make the call to CLSD.
$result = $service->queryCLSD(
"select tabschema,tabname from syscat.tables",
1, 5, "DB2account", "password", "table" );
print $result;
40
OGSA
The Open Grid Services Architecture (OGSA) is an “architecture” for building
computational grids.
In particular, OGSA “…defines a set of core capabilities and behaviors that
address key concerns in Grid systems.” [2] It does not, however, implement
or define how to implement such core capabilities.
OGSA is NOT layered or object oriented.
However, both will be exploited naturally in some implementations.
OGSA provides an architecture for building services such as:
•“Service-Based distributed query processing,”
•“Grid Workflow”,
•“Grid Monitoring Architecture”
•etc.
41
OGSA-DAI
OGSA-Data Access and Integration (OGSA-DAI) is a very flexible and
powerful data access framework that can be used within an OGSA grid
environment.
It provides various data movement, virtualization, and manipulation services
that transform the use of data into a higher-level workflow.
The OGSA-DAI client shown in the next slide uses the OGSA-DAI Client
Toolkit to send a hard-coded query to CLSD (here known as the
“DB2Resource”).
The Toolkit allows clients to use JDBC by creating a JDBC ResultSet object
from an OGSA-DAI WebRowSet.
The response is encoded using XML and may be retrieved as a single string,
or as individual fields by using individual JDBC calls as shown below.
42
Java command-line client to access CLSD via OGSA-DAI
public class queryCLSD
{
public static void main(String[] args) throws Exception
{
// Create an instance of the data service.
String handle =
"http://localhost:8080/wsrf/services/ogsadai/DataService";
String id = "DB2Resource";
DataService service =
GenericServiceFetcher.getInstance().getDataService(
handle, id);
// Define a request composed of one activity.
SQLQuery query = new SQLQuery(
"select tabschema,tabname from syscat.tables");
WebRowSet rowset = new WebRowSet( query.getOutput() );
ActivityRequest request = new ActivityRequest();
request.add( query );
request.add( rowset );
43
Java command-line client to access CLSD via OGSA-DAI 2
// Submit the request and retrieve results.
Response response = service.perform( request );
ResultSet result = rowset.getResultSet();
ResultSetMetaData rsmd = result.getMetaData();
int numCols = rsmd.getColumnCount();
// Display each column from each row.
while( result.next() )
{
for( int colCount = 1; colCount <= numCols; colCount++ )
{
out.print( " " + result.getString( colCount ) );
}
out.println();
}
}
}
44
This client displays a small part of the functionality provided by OGSA-DAI. In
addition, an OGSA-DAI service can be configured to:
•operate on XML or text data sources, as well as relational data sources,
•perform a series of operations (also known as “activities”) as part of a
single request,
•deliver results to a third party (via FTP, GridFTP, SMTP, etc.) or to another
data service,
•deliver results asynchronously, which can be very useful for long-running
requests, and
•utilize authentication methods supported by WSRF to provide grid-based
security.
Also, exposing a database via OGSA-DAI makes it available for OGSA
Distributed Query Processing (OGSA-DQP), so that its use may be further
virtualized within the DQP model.
In some cases, however, OGSA-DAI and DQP may introduce performance
penalties.
45
Current and possible directions
Adding data sources: mirrored and federated
•Requests for mirroring or federating will be gladly entertained
•DB2 now provides a user-configurable script wrapper that connects to a
remote DB2 daemon that can start any co-located arbitrary script and return
data encoded in XML (restricted to one foreign key per table)
Such a script could be built to relay any web resource that returns XML
meeting key restrictions.
Wrappers could be constructed to relay some OGSA-DAI resources.
Implementing the OGSA-DAI service in production mode.
Integrating with the TeraGrid
CLSD is currently accessible from the TeraGrid, but authentication is local.
It may be possible to enforce TeraGrid based X.509 authentication, using
either WSRF or OGSA-DAI interfaces.
46
References:
– Atherly, Alan G, et al., The Science of Genetics, 1999.
– Apache Foundation, AXIS User’s Guide,
http://ws.apache.org/axis/java/user-guide.html
– Codd, Edward F., A Relational Model of Data for Large Shared Data
Banks, http://www.acm.org/classics/nov95/toc.html
(See also: http://en.wikipedia.org/wiki/Edgar_F._Codd)
– CLSD web page: http://rac.uits.iu.edu/clsd/
– Del Prete, Doug, Efficient access to Blast using IBM DB2 Information
Integrator,
http://www-03.ibm.com/industries/healthcare/doc/content/bin/blast.pdf
– Foster, Ian, et al. “The Open Grid Services Architecture, Version 1.5”.
– Sotomayor, Borja and Lisa Childers, Globus Toolkit 4: Programming
Java Services
– Sundaram, Babu, Understanding WSRF,
http://www-128.ibm.com/developerworks/edu/gr-dw-gr-wsrf1-i.html
Questions, comments, suggestions?
47