AM403 - Bioinformatics 1
Remote Procedure Calling
Dr. Andrew C.R. Martin
http://www.bioinf.org.uk/
Aims and objectives
Understand the concepts of remote procedure calling and web services
Be able to describe different methods of remote procedure calls
Understand the problems of 'screen scraping'
Know how to write code using LWP and SOAP
What is RPC?
[Diagram: RPC across a network to a web service]
Web service: a network-accessible interface to application functionality using standard Internet technologies
Why do RPC?
distribute the load between computers
access to other people's methods
access to the latest data
Ways of performing RPC
screen scraping
simple CGI scripts (REST)
custom code to work across networks
standardized methods (e.g. CORBA, SOAP, XML-RPC)
Web services
RPC methods which work across the internet are often called "Web Services"
Web Services can also
be self-describing (WSDL)
provide methods for discovery (UDDI)
Screen scraping
[Diagram: with a web service, the client talks across the network to a web service; with screen scraping, a screen scraper talks across the network to an ordinary web server]
Extracting content from a web page
Fragile procedure...
[Diagram: the provider turns its data into a web page with visual markup, so the semantics are lost; the consumer extracts data from the page, and the extraction is partial and error-prone]
Fragile procedure...
Trying to interpret semantics from display-based markup
If the presentation changes, the screen scraper breaks
Web servers…
[Diagram: the web browser sends a request for a page to the web server. Diagram labels on the server side: pages, CGI script, RDBMS, external programs]
Screen scraping
Straightforward in Perl
Perl LWP module
easy to write a web client
Pattern matching and string handling routines
Example scraper…
A program for
secondary structure prediction
Want a program that:
specifies an amino acid sequence
provides a secondary structure prediction
Example scraper...
#!/usr/bin/perl -w
use LWP::UserAgent;
use strict;
my($seq, $ss);
# Build the sequence by concatenation so that no newlines end up inside it
$seq = "KVFGRCELAAAMKRHGLDNYRGYSLGNWVCAAKFESNFNTQATNRNTDGSTDY" .
       "GILQINSRWWCNDGRTPGSRNLCNIPCSALLSSDITASVNCAKKIVSDGNGMNAWVAWRNR" .
       "CKGTDVQAWIRGCRL";
if(($ss = PredictSS($seq)) ne "")
{
print "$seq\n";
print "$ss\n";
}
Example scraper…
NNPREDICT web server at
http://alexander.compbio.ucsf.edu/~nomi/nnpredict.html
http://alexander.compbio.ucsf.edu/cgi-bin/nnpredict.pl
Example scraper…
Program must:
connect to web server
submit the sequence
obtain the results and extract data
Examine the source for the page…
<form method="POST"
action="http://alexander.compbio.ucsf.edu/cgi-bin/nnpredict.pl">
<b>Tertiary structure class:</b>
<input TYPE="radio" NAME="option"
VALUE="none" CHECKED> none
<input TYPE="radio" NAME="option"
VALUE="all-alpha"> all-alpha
<input TYPE="radio" NAME="option"
VALUE="all-beta"> all-beta
<input TYPE="radio" NAME="option"
VALUE="alpha/beta"> alpha/beta
<b>Name of sequence</b>
<input name="name" size="70">
<b>Sequence</b>
<textarea name="text" rows=14 cols=70></textarea>
</form>
Example scraper…
option
'none', 'all-alpha', 'all-beta', or 'alpha/beta'
name
optional name for the sequence
text
the sequence
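The three fields above are sent to the CGI script as a single form-encoded string. The following is a minimal sketch of how such a string could be built safely; the helper names UrlEncode and BuildPostBody and the sequence name 'hen lysozyme' are illustrative assumptions, not part of the original handout. It assumes plain ASCII values.

```perl
use strict;
use warnings;

# Percent-encode one form value (assumes ASCII input).
# Letters, digits and - _ . ~ are left alone; spaces become '+'.
sub UrlEncode
{
    my($value) = @_;
    $value =~ s/([^A-Za-z0-9\-_.~ ])/sprintf("%%%02X", ord($1))/ge;
    $value =~ tr/ /+/;
    return($value);
}

# Join name=value pairs with '&'; keys are sorted so the output is stable.
sub BuildPostBody
{
    my(%fields) = @_;
    return(join('&', map { "$_=" . UrlEncode($fields{$_}) } sort keys %fields));
}

my $post = BuildPostBody(option => 'alpha/beta',
                         name   => 'hen lysozyme',      # hypothetical name
                         text   => 'KVFGRCELAAAMKRHG');
print "$post\n";
```

Encoding matters for values such as 'alpha/beta' or names containing spaces, which would otherwise be sent unescaped.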
Example server...
sub PredictSS
{
    my($seq) = @_;
    my($url, $post, $webproxy, $ua, $req, $result, $ss);

    # If behind a firewall, set the web proxy here
    # $webproxy = 'http://user:[email protected]:8080';
    $webproxy = "";

    # CGI script to access
    $url = "http://alexander.compbio.ucsf.edu/cgi-bin/nnpredict.pl";

    # Values passed to CGI script
    $post = "option=none&name=&text=$seq";

    # Create a LWP-based connection
    $ua = CreateUserAgent($webproxy);

    # Create the post request
    $req = CreatePostRequest($url, $post);

    # Connect and get the returned page
    $result = GetContent($ua, $req);

    if(defined($result))
    {
        $ss = GetSS($result);
        return($ss);
    }
    else
    {
        print STDERR "connection failed\n";
    }
    return("");
}
<HTML><HEAD>
<TITLE>NNPREDICT RESULTS</TITLE>
</HEAD>
<BODY bgcolor="F0F0F0">
<h1 align=center>Results of nnpredict query</h1>
<p><b>Tertiary structure class:</b> alpha/beta
<p><b>Sequence</b>:<br>
<tt>
MRSLLILVLCFLPLAALGKVFGRCELAAAMKRHGLDNYRGYSLGNWVCAAKFESNFNTQA<br>
TNRNTDGSTDYGILQINSRWWCNDGRTPGSRNLCNIPCSALLSSDITASVNCAKKIVSDG<br>
NGMNAWVAWRNRCKGTDVQAWIRGCRL<br>
</tt>
<p><b>Secondary structure prediction
<i>(H = helix, E = strand, - = no prediction)</i>:<br></b>
<tt>
----EEEEEEE-H---H--EE-HHHHHHHHHH--------------HHHHHH--------<br>
------------HHHHE-------------------------------HH-----EE---<br>
---HHHHHHH--------HHHHHHH--<br>
</tt>
</body></html>
Example server…
sub GetSS
{
    my($html) = @_;
    my($ss);

    # Remove return characters
    $html =~ s/\n//g;

    # Match the last <tt>...</tt>; give up if the page has no such block
    if($html =~ /^.*<tt>(.*)<\/tt>.*$/)
    {
        # Grab the text within the <tt> tags
        $ss = $1;
    }
    else
    {
        return("");
    }

    # Remove <br> tags
    $ss =~ s/\<br\>//g;
    return($ss);
}
If authors changed presentation of results, this might break!
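A small sketch of how this can go wrong. It uses the same "last <tt> block" pattern as GetSS on two shortened, made-up result pages: one in the current layout and one where the prediction has moved into a <pre> block. The LooksLikePrediction check is an added assumption, not part of the original scraper.

```perl
use strict;
use warnings;

# Same extraction idea as GetSS: contents of the last <tt>...</tt> block.
sub GetSS
{
    my($html) = @_;
    $html =~ s/\n//g;
    return("") unless($html =~ /^.*<tt>(.*)<\/tt>.*$/);
    my $ss = $1;
    $ss =~ s/<br>//g;
    return($ss);
}

# Sanity check (an assumption): a prediction contains only H, E and '-'.
sub LooksLikePrediction
{
    my($ss) = @_;
    return($ss =~ /^[HE-]+$/ ? 1 : 0);
}

# Shortened page in the current layout (made-up data)
my $original = <<'HTML';
<p><b>Sequence</b>:<br>
<tt>
KVFGRCELAA<br>
</tt>
<p><b>Secondary structure prediction</b>:<br>
<tt>
--HHHH--EE<br>
</tt>
</body></html>
HTML

# Hypothetical restyled page: the prediction is now in <pre>, not <tt>
my $restyled = <<'HTML';
<p><b>Sequence</b>:<br>
<tt>
KVFGRCELAA<br>
</tt>
<p><b>Secondary structure prediction</b>:<br>
<pre>--HHHH--EE</pre>
</body></html>
HTML

foreach my $page ($original, $restyled)
{
    my $ss = GetSS($page);
    printf("%-12s %s\n", $ss, LooksLikePrediction($ss) ? "accepted" : "rejected");
}
```

On the restyled page the scraper does not fail outright: it silently returns the amino acid sequence instead of the prediction, which is why some validation of the scraped text is worth having.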
Wrappers to LWP
CreateUserAgent()
CreatePostRequest()
GetContent()
CreateGetRequest()
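The slides name these four wrappers but do not show their bodies. The following is one possible sketch using standard LWP::UserAgent and HTTP::Request calls; the agent string and timeout are assumptions, and the original wrappers may differ. It requires the LWP distribution to be installed.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;

# Create a user agent, optionally routed through a web proxy
sub CreateUserAgent
{
    my($webproxy) = @_;
    my $ua = LWP::UserAgent->new;
    $ua->agent("AM403-example/0.1");     # hypothetical agent string
    $ua->timeout(30);                    # seconds; an arbitrary choice
    $ua->proxy(['http', 'ftp'], $webproxy) if($webproxy ne "");
    return($ua);
}

# Build a POST request carrying form-encoded data
sub CreatePostRequest
{
    my($url, $post) = @_;
    my $req = HTTP::Request->new(POST => $url);
    $req->content_type('application/x-www-form-urlencoded');
    $req->content($post);
    return($req);
}

# Build a GET request; any parameters are already part of the URL
sub CreateGetRequest
{
    my($url) = @_;
    return(HTTP::Request->new(GET => $url));
}

# Send the request; return the page content, or undef on failure
sub GetContent
{
    my($ua, $req) = @_;
    my $res = $ua->request($req);
    return($res->is_success ? $res->content : undef);
}
```

Returning undef on failure matches the if(defined($result)) test used in PredictSS.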
Pros and cons
Advantages
'service provider' doesn't do anything special
Disadvantages
screen scraper will break if format changes
may be difficult to determine semantic content
Simple CGI scripts
REST: Representational State Transfer
http://en.wikipedia.org/wiki/REST
Simple CGI scripts
Extension of screen scraping
relies on service provider to provide a script designed specifically for remote access
Client identical to screen scraper
but guaranteed that the data will be parsable (plain text or XML)
Simple CGI scripts
Server's point of view
provide a modified CGI script which returns plain text
May be an option given to the CGI script
Simple CGI scripts
'Entrez programming utilities'
http://eutils.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html
I have provided a script you can try to extract papers from PubMed
Simple CGI scripts
Search using EUtils is performed in 2 stages:
specified search string returns a set of PubMed IDs
fetch the results for each of these PubMed IDs in turn
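A rough sketch of the two stages. The base URL and the esearch.fcgi / efetch.fcgi parameters reflect the current NCBI E-utilities interface and are not taken from the slides; the reply text and the PubMed IDs in it are made up so that the example runs without a network connection.

```perl
use strict;
use warnings;

# Base URL of the E-utilities (an assumption; not given on the slides)
my $base = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils';

# Stage 1: URL that searches PubMed and returns a list of IDs
sub ESearchUrl
{
    my($term, $retmax) = @_;
    (my $query = $term) =~ s/([^A-Za-z0-9\-_.~])/sprintf("%%%02X", ord($1))/ge;
    return("$base/esearch.fcgi?db=pubmed&retmax=$retmax&term=$query");
}

# Pull the PubMed IDs out of the XML returned by stage 1
sub ParseIds
{
    my($xml) = @_;
    my @ids = ($xml =~ /<Id>(\d+)<\/Id>/g);
    return(@ids);
}

# Stage 2: URL that fetches the record for one PubMed ID
sub EFetchUrl
{
    my($id) = @_;
    return("$base/efetch.fcgi?db=pubmed&retmode=xml&id=$id");
}

print ESearchUrl('protein folding', 5), "\n";

# Canned stage 1 reply with made-up IDs
my $reply = '<eSearchResult><Count>2</Count><IdList>'
          . '<Id>11111111</Id><Id>22222222</Id>'
          . '</IdList></eSearchResult>';

foreach my $id (ParseIds($reply))
{
    print EFetchUrl($id), "\n";
}
```

In a real client each URL would be fetched with a web client such as LWP, exactly as in the screen scraper; the difference is that the reply is XML intended for programs rather than HTML intended for display.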
Custom code
Custom code
Generally used to distribute tasks on a local network
Code is complex
low-level OS calls
sample on the web
Custom code
Link relies on IP address and a 'port'
Ports tied to a particular service
port 80 : HTTP
port 22 : ssh
See /etc/services
Custom code
Generally a client/server model:
server listens for messages
client makes requests of the server
[Diagram: the client sends a request message to the server; the server sends back a response message]
Custom code: server
Server creates a 'socket' and 'binds' it to a port
Listens for a connection from clients
Responds to messages received
Replies with information sent back to the client
Custom code: client
Client creates a socket
Connects it to the IP address and port of the server
Sends data to the server
Waits for a response
Does something with the returned data
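The server and client steps can be sketched in one script using the core IO::Socket::INET module. This is only an illustration: the server runs in a forked child on the local machine, the port is chosen by the operating system, and the one-line "LENGTH" reply is an invented protocol.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Server: create a socket, bind it to a port and listen.
# LocalPort 0 asks the OS for any free port (a choice made for this demo).
my $server = IO::Socket::INET->new(
    LocalAddr => '127.0.0.1',
    LocalPort => 0,
    Proto     => 'tcp',
    Listen    => 5,
) or die "cannot create server socket: $!";
my $port = $server->sockport;

my $pid = fork();
die "fork failed: $!" unless defined($pid);

if($pid == 0)
{
    # Child process is the server: accept one client, reply, then exit
    my $conn = $server->accept or exit(1);
    my $request = <$conn>;
    $request = "" unless defined($request);
    $request =~ s/\r?\n\z//;
    print $conn "LENGTH " . length($request) . "\n";
    close($conn);
    exit(0);
}

# Parent process is the client; it does not need the listening socket
close($server);

# Client: create a socket connected to the server's address and port
my $client = IO::Socket::INET->new(
    PeerAddr => '127.0.0.1',
    PeerPort => $port,
    Proto    => 'tcp',
) or die "cannot connect: $!";

print $client "KVFGRCELAAAMKRHG\n";   # send data to the server
my $response = <$client>;             # wait for the response
close($client);
waitpid($pid, 0);

print "Server replied: $response";    # do something with the returned data
```

Even this toy version has to invent its own message format, which is the gap the standardized methods fill.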
Standardized methods
Standardized methods
Various methods, e.g.
CORBA
XML-RPC
SOAP
Will now concentrate on SOAP...
Advantages of SOAP
[Diagram labels: application client; web service; application code (platform and language specific code); platform and language independent code]
Advantages of SOAP
[Diagram: two applications exchange an XML message]
Information encoded in XML
Language independent
All data are transmitted as simple text
Advantages of SOAP
HTTP POST carries the SOAP request
HTTP response carries the SOAP response
Normally uses HTTP for transport
Firewalls allow access to the HTTP protocol
Same systems will allow SOAP access
Advantages of SOAP
W3C standard
Libraries available for many programming languages
XML encoding
Which of these is correct?
<phoneNumber>01234 567890</phoneNumber>
<phoneNumber>
<areaCode>01234</areaCode>
<number>567890</number>
</phoneNumber>
<phoneNumber areaCode='01234' number='567890' />
<phoneNumber areaCode='01234'>567890</phoneNumber>
SOAP XML encoding
[Diagram: the format of the message data is defined by SOAP; how the message is carried is defined by the various transport protocols]
Must define a standard way of encoding
Type of data being exchanged
How it will be expressed in XML
How the information will be exchanged
SOAP messages
SOAP Envelope
SOAP Header (optional)
Header block
Header block
SOAP Body
Message body
SOAP Envelope
<s:Envelope xmlns:s="http://www.w3.org/2001/06/soap-envelope">
  <!-- SOAP Header -->
  <s:Header>
    <!-- Header block -->
    <m:transaction xmlns:m="soap-transaction"
                   s:mustUnderstand="true">
      <transactionID>1234</transactionID>
    </m:transaction>
  </s:Header>
  <!-- SOAP Body -->
  <s:Body>
    <!-- Message body -->
    <n:predictSS xmlns:n="urn:SequenceData">
      <sequence id='P01234'>
        SARTASCWIPLKNMNTYTRSFGHSGHRPLKMNSGDGAAREST
      </sequence>
    </n:predictSS>
  </s:Body>
</s:Envelope>
Example SOAP message
Header block
Specifies data must be handled as a single 'transaction'
Message body
contains a sequence simply encoded in XML
Perfectly legal, but more common to use special RPC encoding
The RPC ideal
Ideal situation:
$ss = PredictSS($id, $sequence);
[Diagram: the client sends a request message to the server; the server sends back a response message]
Subroutine calls
Only important factors
the type of the variables
the order in which they are handed to the subroutine
SOAP type encoding
SOAP provides standard encoding for variable types:
integers
floats
strings
arrays
hashes
structures
…
Encoded SOAP message
<s:Envelope
    xmlns:s="http://www.w3.org/2001/06/soap-envelope">
  <s:Body>
    <n:predictSS xmlns:n="urn:SequenceData">
      <id xsi:type='xsd:string'>
        P01234
      </id>
      <sequence xsi:type='xsd:string'>
        SARTASCWIPLKNMNTYTRSFGHSGHRPLKMNSGDGAAREST
      </sequence>
    </n:predictSS>
  </s:Body>
</s:Envelope>
Response
<s:Envelope
    xmlns:s="http://www.w3.org/2001/06/soap-envelope">
  <s:Body>
    <n:predictSSResponse
        xmlns:n="urn:SequenceData">
      <ss xsi:type='xsd:string'>
        ---HHHHHHH-----EEEEEEEE----EEEEEEE-------</ss>
    </n:predictSSResponse>
  </s:Body>
</s:Envelope>
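To make the request and response messages concrete, here is a hand-rolled sketch of what a SOAP library does on the client side: package the subroutine name and its ordered, typed arguments into an envelope, then pull the return value out of the reply. The helper names are hypothetical, every argument is treated as a string, and the envelope uses the SOAP 1.1 and XML Schema namespaces rather than the draft namespace shown on the slides. Extracting the result with a regular expression is a shortcut for illustration only; a real client would use an XML parser or a SOAP toolkit.

```perl
use strict;
use warnings;

# Escape characters that are special in XML text
sub XmlEscape
{
    my($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return($text);
}

# Build an RPC-style request. @params is a list of [name, value] pairs;
# their order is preserved because argument order matters in RPC.
sub BuildRequest
{
    my($method, $urn, @params) = @_;
    my $args = join('', map {
        sprintf('<%s xsi:type="xsd:string">%s</%s>',
                $_->[0], XmlEscape($_->[1]), $_->[0])
    } @params);
    return('<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/"'
         . ' xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"'
         . ' xmlns:xsd="http://www.w3.org/2001/XMLSchema">'
         . "<s:Body><n:$method xmlns:n=\"$urn\">$args</n:$method></s:Body>"
         . '</s:Envelope>');
}

# Return the text content of the first element called $name, or undef
sub ExtractResult
{
    my($xml, $name) = @_;
    return($1) if($xml =~ /<\Q$name\E\b[^>]*>\s*(.*?)\s*<\/\Q$name\E>/s);
    return(undef);
}

my $request = BuildRequest('predictSS', 'urn:SequenceData',
                           ['id',       'P01234'],
                           ['sequence', 'SARTASCWIPLKNMNTYTRSFGHSGHRPLKMNSGDGAAREST']);
print "$request\n";

# Shortened, made-up reply in the same shape as the response message
my $response = "<s:Body><n:predictSSResponse xmlns:n=\"urn:SequenceData\">"
             . "<ss xsi:type='xsd:string'>\n---HHHHHHH-----EEEEEEEE</ss>"
             . "</n:predictSSResponse></s:Body>";
print ExtractResult($response, 'ss'), "\n";
```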
SOAP transport
SOAP is a packaging protocol
Layered on networking and transport
SOAP doesn't care what these are
SOAP transport
Generally uses HTTP, but may also use:
FTP, raw TCP, SMTP, POP3, Jabber, etc.
HTTP is pervasive across the Internet.
Request-response model of RPC matches HTTP
Web service components
[Diagram: a web application server acts as the service listener; a service proxy sits between the listener and the application-specific code]
SOAP::Lite
You need to know very little of this!
Simply need a good SOAP library
SOAP::Lite for Perl
Apache SOAP
Toolkits available for many languages
Java, C#, C++, C, PHP, Python, …
A simple SOAP server
[Diagram: HTTPD is the service listener (web application server); a simple SOAP-specific CGI script is the service proxy; the application code is the application-specific code]
SOAP-specific CGI script
use SOAP::Transport::HTTP;
SOAP::Transport::HTTP::CGI
->dispatch_to('/home/httpd/cgi-bin/SOAPTEST')
->handle;
Directory is where application-specific code is stored
Any Perl module stored there will be accessible via SOAP
(Can limit to individual modules and routines)
Application code
Lives in a Perl module:
Filename with extension .pm
Starts with a package statement:
package mymodule;
Filename must match package name
mymodule.pm
Must return 1. Generally end file with
1;
Application code
package hello;
sub sayHello {
my($class, $user) = @_;
return "Hello $user from the SOAP server";
}
1;
Simply place this file (hello.pm) in the directory specified in the SOAP proxy
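Before deploying, the module can be exercised locally. The sketch below puts the package in the same file purely for convenience (the slide keeps it in hello.pm) and calls sayHello as a class method, which is consistent with the my($class, $user) signature: Perl passes the package name as the first argument.

```perl
use strict;
use warnings;

# Same code as hello.pm, inlined here so the example is self-contained
package hello;

sub sayHello {
    my($class, $user) = @_;
    return "Hello $user from the SOAP server";
}

package main;

# Class-method call: 'hello' arrives in $class and 'Andrew' in $user
my $greeting = hello->sayHello('Andrew');
print "$greeting\n";
```

Calling hello::sayHello('Andrew') as a plain function would instead put 'Andrew' in $class and leave $user undefined, so the arrow form is the one to test with.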
A simple SOAP client
#!/usr/bin/perl
use SOAP::Lite;
my $name = shift;
print "\nCalling the SOAP server...\n";
print "The SOAP server says:\n";
$s = SOAP::Lite
->uri('urn:hello')
->proxy('http://localhost/cgi-bin/SOAPTEST.pl');
print $s->sayHello($name)->result;
print "\n\n";
Calling the SOAP server...
The SOAP server says:
Hello Andrew from the SOAP server
SOAP::Lite
The code is very simple!
None of the hard work to package or unpack the request in XML
All the hard work is hidden…
Query
<s:Envelope
    xmlns:s="http://schemas.xmlsoap.org/soap/envelope"
    xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance"
    xmlns:xsd="http://www.w3.org/1999/XMLSchema">
  <s:Body>
    <m:sayHello xmlns:m="urn:hello">
      <name xsi:type='xsd:string'>Andrew</name>
    </m:sayHello>
  </s:Body>
</s:Envelope>
Response
<s:Envelope
    xmlns:s="http://schemas.xmlsoap.org/soap/envelope"
    xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance"
    xmlns:xsd="http://www.w3.org/1999/XMLSchema">
  <s:Body>
    <n:sayHelloResponse xmlns:n="urn:hello">
      <return xsi:type='xsd:string'>Hello Andrew from the SOAP server</return>
    </n:sayHelloResponse>
  </s:Body>
</s:Envelope>
An even simpler SOAP client
#!/usr/bin/perl
use SOAP::Lite +autodispatch =>
    uri   => "urn:hello",
    proxy => "http://localhost/cgi-bin/SOAPTEST.pl";
my $name = shift;

print "\nCalling the SOAP server...\n";
print "The SOAP server says:\n";
print sayHello($name);
print "\n\n";
Summary - RPC
RPC allows access to methods and data on remote computers
Four main ways of achieving this
Screen scraping
Special CGI scripts
Custom code
Standardized methods (SOAP, etc.)
Summary - SOAP
Platform and language independent
Uses XML to wrap RPC data and requests
Various transport methods
generally use HTTP
Good toolkits make coding VERY easy
all complexity hidden
Related technologies allow
service discovery (UDDI)
self-describing services (WSDL)