A data retrieval workflow using
NCBI E-Utils + Python
Part II: Jinja2 / Flask
John Pinney
Tech talk Tue 19th Nov
My tasks
✓1. Produce a list of human genes that are associated with
at least one resolved structure in PDB
AND
at least one genetic disorder in OMIM
2. Make an online table to display them
Workflow for gene list
[Flow diagram; grey background = using WebEnv]
Structure branch:
ESearch (db=structure, term=9606[TAXID]) -> structure IDs
ELink (dbfrom=structure, db=protein) -> protein IDs
ELink (dbfrom=protein, db=gene) -> gene IDs
OMIM branch:
ESearch (db=omim, term="omim medgen"[Filter]) -> OMIM IDs
ELink (dbfrom=omim, db=gene) -> gene IDs
Result:
gene IDs & gene IDs -> gene IDs
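As a rough illustration of the calls in this workflow, the sketch below builds E-utility request URLs by hand and intersects the two gene ID sets. It is not the PyCogent approach used in part 1; the helper names (eutils_url, elink_url, shared_gene_ids) are made up for this example, and it is written for Python 3, where URL encoding lives in urllib.parse. No request is actually sent.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities (host as used in the throttle table).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"

def eutils_url(tool, **params):
    """Return the URL for one E-utility call, e.g. tool='esearch'."""
    # Parameters are sorted only so that the resulting URL is deterministic.
    return EUTILS_BASE + tool + ".fcgi?" + urlencode(sorted(params.items()))

def elink_url(dbfrom, db, ids):
    """ELink call mapping a list of IDs in dbfrom to linked records in db."""
    return eutils_url("elink", dbfrom=dbfrom, db=db, id=",".join(ids))

def shared_gene_ids(structure_genes, omim_genes):
    """The final '&' step: genes reached from both PDB and OMIM."""
    return sorted(set(structure_genes) & set(omim_genes))

# usehistory=y asks the server to keep the result set and return a WebEnv.
structure_search = eutils_url("esearch", db="structure",
                              term="9606[TAXID]", usehistory="y")
```

Each URL would then be fetched (with throttling) and the returned XML parsed for IDs before moving on to the next step.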
Python modules used in part 1
PyCogent
Simple request handling for the main EUtils.
pycogent.org
urllib2
General HTTP request handler.
docs.python.org/2/library/urllib2.html
BeautifulSoup
Amazingly easy to use object model for XML/HTML.
www.crummy.com/software/BeautifulSoup/bs4/doc/
Some REST services need API keys
The OMIM server requires a license agreement but is free for
academic use.
They provide a personal API key which must be submitted with each
HTTP request.
OMIM_APIKEY = 'E835870B16FBAF479E826FA5168CB2615EDA0F11'
result = urllib2.urlopen( \
"http://api.europe.omim.org/api/entry?mimNumber=" + \
omimid + "&apiKey=" + OMIM_APIKEY \
).read()
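One possible variation, sketched here for Python 3: build the query string with urlencode rather than string concatenation, and read the key from the environment so it does not sit in the script. The function name omim_entry_url and the environment variable name OMIM_APIKEY are assumptions for this example; the endpoint is the one shown above.

```python
import os
from urllib.parse import urlencode

OMIM_ENTRY_URL = "http://api.europe.omim.org/api/entry"  # endpoint from the slide

def omim_entry_url(mim_number, api_key=None):
    """Build the request URL for a single OMIM entry."""
    if api_key is None:
        # Assumed convention: the personal key is stored in $OMIM_APIKEY.
        api_key = os.environ.get("OMIM_APIKEY", "")
    if not api_key:
        raise ValueError("no OMIM API key supplied")
    query = urlencode([("mimNumber", mim_number), ("apiKey", api_key)])
    return OMIM_ENTRY_URL + "?" + query
```

The returned URL can be passed to urlopen exactly as in the snippet above.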
Throttling queries
Most bioinformatics web servers have limits on the number of
queries that can be sent from the same IP address (per day / per
second etc.)
They will ban you from accessing the site if you attempt too
many requests.
This can have serious consequences (e.g. the whole institution
being blocked from NCBI).
Throttling queries
To ensure compliance with usage limits, implement a simple
throttle:
def omim_info(omimid):
    checktime('api.europe.omim.org')
    result = urllib2.urlopen(...
Throttling queries
import time
lastRequestTime = {}
# minimum number of seconds between requests to each host
throttleDelay = {'eutils.ncbi.nlm.nih.gov': 0.25,
                 'api.europe.omim.org': 0.5}
def checktime(host):
    # sleep for whatever remains of the delay since the last request to this host
    if host in lastRequestTime:
        elapsed = time.time() - lastRequestTime[host]
        if elapsed < throttleDelay[host]:
            time.sleep(throttleDelay[host] - elapsed)
    lastRequestTime[host] = time.time()
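The same idea can be wrapped in a small class, which makes the throttle easy to check without actually waiting. This is only a sketch: the Throttle class and its clock and sleep parameters are names invented for this example, not part of the original script.

```python
import time

class Throttle(object):
    """Enforce a minimum interval between requests to each host."""

    def __init__(self, delays, clock=time.time, sleep=time.sleep):
        self.delays = delays   # host -> minimum seconds between requests
        self.last = {}         # host -> time of the previous request
        self.clock = clock     # replaceable for testing
        self.sleep = sleep     # replaceable for testing

    def wait(self, host):
        """Sleep if needed, record the request time, return seconds slept."""
        pause = 0.0
        if host in self.last:
            elapsed = self.clock() - self.last[host]
            pause = max(0.0, self.delays[host] - elapsed)
        if pause > 0:
            self.sleep(pause)
        self.last[host] = self.clock()
        return pause
```

Usage would mirror checktime: create one Throttle with the per-host delays and call wait(host) immediately before each request.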
HTML templating
I need to produce an HTML table containing basic information
about the genes I have collected.
The Jinja2 templating engine is an easy way to generate these
kinds of documents.
I will use web services at NCBI and OMIM to assemble the
information I need.
Jinja2
Using Jinja2 as an HTML templating engine, we need to split the
work between two files:
a normal Python script (in which I call the web services).
an HTML template with embedded Python-like expressions.
Not all Python functions are available within the template, so it
makes sense to do as much work as possible within the script
before passing the data over.
Jinja2 (script)
from jinja2 import Template
template = Template(open("gene_row_template.html").read())
fout = open("gene_list.html",'w')
...
for g in sorted_genes:
    # variables are passed to the template as kwargs
    fout.write( template.render(
        g=g,
        gene=gene_info(g),
        omim=[omim_info(x) for x in omim_links(g)],
        struc=[struc_info(x) for x in struc_links(g)]
        )
    )
Jinja2 (template)
<tr>
<td><a href='http://www.ncbi.nlm.nih.gov/gene/?term={{g}}[uid]'>
{{gene.find('Gene-ref_locus').text}}
</a></td>
<td>{{gene.find('Gene-ref_desc').text}}</td>
<td>{% for m in omim %}
<a href='http://omim.org/entry/{{m.mimNumber.text}}'>
{{m.preferredTitle.text}}
</a><br>
{% endfor %}</td>
<td>{% for s in struc -%}
<a
href='http://www.rcsb.org/pdb/explore/explore.do?structureId={{s.find('Item',attrs={'Name':'PdbAcc'}).text}}'>
{{s.find('Item',attrs={'Name':'PdbAcc'}).text}}
</a><br>
{%- endfor %}</td>
</tr>
Jinja2 (template)
Annotations on the template:
{{ }} = print statement
{% %} = other command
I can access the methods of an object from within the template, so I can
make use of all the nice BeautifulSoup shortcuts.
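To see both kinds of tag and a method call in isolation, here is a minimal sketch that renders a template from a string. The Structure class is a hypothetical stand-in for a parsed BeautifulSoup record, and render_cell is a name chosen for this example.

```python
from jinja2 import Template

class Structure(object):
    """Hypothetical stand-in for a parsed record with a lookup method."""
    def __init__(self, fields):
        self.fields = fields

    def find(self, name):
        return self.fields[name]

# {% ... %} drives the loop; {{ ... }} prints the result of a method call.
CELL = Template(
    "<td>{% for s in struc %}{{ s.find('PdbAcc') }}<br>{% endfor %}</td>"
)

def render_cell(accessions):
    records = [Structure({'PdbAcc': acc}) for acc in accessions]
    # Keyword arguments to render() become variables inside the template.
    return CELL.render(struc=records)
```

Calling render_cell(['4FAO', '3MY0']) produces one table cell listing both accessions.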
Jinja2 (output)
<tr>
<td><a href='http://www.ncbi.nlm.nih.gov/gene/?term=94[uid]'>
ACVRL1
</a></td>
<td>activin A receptor type II-like 1</td>
<td>
<a href='http://omim.org/entry/600376'>
TELANGIECTASIA, HEREDITARY HEMORRHAGIC, TYPE 2; HHT2
</a><br>
</td>
<td><a
href='http://www.rcsb.org/pdb/explore/explore.do?structureId=4FAO'>
4FAO
</a><br><a
href='http://www.rcsb.org/pdb/explore/explore.do?structureId=3MY0'>
3MY0
</a><br></td>
</tr>
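The structure cell in this output is more compact than the OMIM cell because its loop tags carry Jinja2's whitespace-control marker: a minus sign inside a tag strips the whitespace next to it. The short sketch below compares the two forms; the template strings and the render_both helper are made up for illustration.

```python
from jinja2 import Template

# Plain tags keep the newlines that surround the loop body.
plain = Template("<td>{% for x in ids %}\n{{ x }}<br>\n{% endfor %}</td>")

# "-%}" strips whitespace after the tag, "{%-" strips whitespace before it.
trimmed = Template("<td>{% for x in ids -%}\n{{ x }}<br>\n{%- endfor %}</td>")

def render_both(ids):
    """Return the (plain, trimmed) renderings of the same list."""
    return plain.render(ids=ids), trimmed.render(ids=ids)
```

With the trimmed form, all links end up on one line directly inside the td element.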
Something more interactive
What if I need to produce a report on-the-fly?
Flask is a ‘micro’ web development framework for Python, which
is useful for putting together a simple webserver.
For anything more substantial (e.g. if database queries are
needed), consider using Django.
Flask uses Jinja2 as its template engine.
A simple webapp in Flask
from flask import Flask, request, render_template, Response
app = Flask(__name__)
@app.route('/report/')
def report_handler():
    gene = request.args.get('gene')
    if gene is None:
        return render_template('report_form.html', unfound=None)
    else:
        return report_for_gene_name(gene)
if __name__ == '__main__':
    app.run(debug=True)
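The app above depends on a template file and on report_for_gene_name, neither of which is shown here. For trying out the routing on its own, a self-contained variant might look like the sketch below; the inline FORM and REPORT templates are placeholders invented for this example, not the real report.

```python
from flask import Flask, request, render_template_string

app = Flask(__name__)

# Hypothetical inline templates standing in for report_form.html
# and for whatever report_for_gene_name would return.
FORM = "<form action='/report/'><input name='gene'><input type='submit'></form>"
REPORT = "<h1>Report for {{ gene }}</h1>"

@app.route('/report/')
def report_handler():
    gene = request.args.get('gene')
    if gene is None:
        # No query parameter yet: show the search form.
        return render_template_string(FORM)
    return render_template_string(REPORT, gene=gene)

if __name__ == '__main__':
    app.run(debug=True)
```

Flask's built-in test client (app.test_client()) can exercise both branches without starting a server.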
Summary
Some web services may be more fiddly than others to set up,
especially if they involve
API keys
Request limits (requires throttling)
Combining web services with an HTML template (either offline
or on-the-fly via a webserver) is an easy way to generate
user-friendly reports.
Python modules used in part 2
Jinja2
An elegant and highly versatile templating engine.
http://jinja.pocoo.org/
Flask
Python ‘micro’ web development framework.
http://flask.pocoo.org