The Structured Advanced Query Page

Mario Latendresse
Tomer Altman
Bioinformatics Research Group
SRI International
March, 2014
SRI International Bioinformatics
Introduction
• Structured Advanced Query Page (SAQP)
  - Web page for interactively constructing advanced and precise queries to PGDBs
  - The SAQP is not available on the Ptools desktop
  - Queries are translated to BioVelo and sent to the server for processing
  - Top menu bar command: Search -> Advanced Search
  - http://biocyc.org/query.shtml
  - Documentation: http://biocyc.org/webQueryDoc.shtml
• BioVelo is a query language
  - Like SQL, but simpler and with no updates allowed
  - Documentation: http://biocyc.org/bioveloLanguage.shtml
• The Free-Form Advanced Query Page (FFAQP) allows Web submission of BioVelo queries
Why a query interface?
• Allows structured access to the rich data representation stored in a PGDB.
• Most advanced databases have a high-level, declarative method of access (e.g., SQL).
• Provides an intermediate level of access between graphically browsing the PGDB and programmatically processing the data using Lisp.
The Structured Advanced Query Page
• 'Structured': it is a dynamic HTML form that makes queries easier to craft, but gives up some of the flexibility and power of the FFAQP in exchange for simplicity.
• 'Advanced': it lets you write more precise queries than the basic search interface.
• 'Page': it is accessed via the Web interface for BioCyc (www.biocyc.org/query.shtml), or from your own Pathway Tools web server.
SAQP Architecture
• The SAQP is built on top of a high-level, functional, declarative language called BioVelo, which is built on top of Pathway Tools.
  - BioVelo was designed at SRI
• On every result page, you will see the equivalent BioVelo code that was generated from the SAQP and that, in turn, generated the results.
• You don't need to know anything about BioVelo to use the SAQP, but it might be helpful later if you need to write even more complicated queries using the Free-Form Advanced Query Page (FFAQP).
How to Use the SAQP
 1.
Database and class selection then adding
conditions
 2.
Selection of attributes to output (columns)
 3.
Select the output data format (HTML vs TXT)
 4.
Click the “Submit Query” button
 Documentation
about each attribute is displayed by
mousing over its name once selected
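The slides do not describe how the SAQP turns form choices into BioVelo, so the following is only a rough sketch of the idea, assuming the comprehension shape `[ output : generator, condition, ... ]` shown elsewhere in this deck. The function `build_biovelo` and all of its parameter names are hypothetical and are not part of Pathway Tools.

```python
def build_biovelo(database, cls, outputs, conditions=(), var="x"):
    """Assemble a BioVelo-style query string from form-like choices.

    Hypothetical helper: only the overall query shape comes from the slides.
    """
    # Step 2 of the form: the selected attributes become the output expression.
    cols = [f"{var}^{attr}" for attr in outputs]
    output = cols[0] if len(cols) == 1 else "(" + ", ".join(cols) + ")"

    # Step 1 of the form: database and class form the generator,
    # and each added condition is appended after it.
    parts = [f"{var} <- {database}^^{cls}"]
    parts += [f"{var}^{attr} {op} {value}" for attr, op, value in conditions]

    return f"[{output} : " + ", ".join(parts) + "]"


# Proteins of E. coli with a DNA footprint size below 10 (illustrative only).
query = build_biovelo(
    "ecoli",
    "proteins",
    outputs=["?name", "dna-footprint-size"],
    conditions=[("dna-footprint-size", "<", 10)],
    var="p",
)
print(query)
```

The point of the sketch is the mapping: class and database become the generator, conditions follow it, and the chosen columns become the output tuple.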
Example #1:
• A simple query usually consists of querying a particular database about a particular class
• Find all the proteins in E. coli K-12
• Display the protein names
Structure of the Results
• A line that shows the equivalent BioVelo expression that the SAQP generated to answer the query
• A button to create a SmartTable from the result
• An HTML table of the results, with the corresponding entries hyperlinked to the matching Pathway Tools Web pages
  - Sorting can be applied on each column
• If a text data format was requested, then a tab-delimited text file is generated, with just the table data
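For readers who want to post-process the text output, here is a minimal sketch of reading tab-delimited rows with Python's standard `csv` module. The sample content is made up, and the sketch assumes the file has one result per line and no header row; the actual layout of a downloaded file may differ.

```python
import csv
import io

# Hypothetical sample of a tab-delimited result (name, DNA footprint size).
# These rows are invented for illustration and do not come from EcoCyc.
sample = "protein A\t8\nprotein B\t6\n"


def read_rows(text):
    """Parse tab-delimited text into a list of rows (lists of strings)."""
    return list(csv.reader(io.StringIO(text), delimiter="\t"))


rows = read_rows(sample)
for name, size in rows:
    # Every field arrives as a string; convert numeric columns explicitly.
    print(name, int(size))
```

For a saved file, the same reader can be given an open file object instead of the `io.StringIO` wrapper.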
Example #2:
• We will add a condition to example #1
• Find all the proteins of E. coli K-12 for which the DNA-FOOTPRINT-SIZE is smaller than 10
• Display the protein name and the DNA footprint size
Example #3:
• In EcoCyc, display polypeptides constrained by experimentally determined molecular weight and isoelectric point (pI)
• The experimental molecular weight should be between 50 and 100 kD
• The pI should be less than 7
• Display the polypeptide name, the experimental molecular weight, and the pI
Example #4:
• The SAQP allows quantifiers to be specified on relations between PGDB classes
• Extending example #3: keep only the proteins for which at least one of the genes that encode the protein lies within the first 500 kilobases of the E. coli chromosome
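The "at least one" wording is an existential quantifier over a relation (protein to encoding genes). As a loose Python analogy, not a description of how the SAQP or BioVelo evaluates it, the same test can be written with `any()`. The records, field names, and positions below are invented for illustration.

```python
# Hypothetical records: each protein lists the start positions (in base pairs)
# of the genes that encode it. None of these values come from EcoCyc.
proteins = [
    {"name": "P1", "gene_positions": [120_000, 2_400_000]},
    {"name": "P2", "gene_positions": [900_000]},
    {"name": "P3", "gene_positions": []},
]

LIMIT = 500_000  # the first 500 kilobases of the chromosome

# Keep a protein if at least one of its genes starts before the limit.
selected = [
    p["name"]
    for p in proteins
    if any(pos < LIMIT for pos in p["gene_positions"])
]
print(selected)  # ['P1']
```

P1 qualifies because one of its two genes is inside the window; P3 has no genes at all, so no gene can satisfy the condition.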
Exercises
1) Find all genes of E. coli that contain “trp” in their name.
2) Find all genes in MetaCyc that have more than one product. Output the gene names and product names.
3) Find all reactions in E. coli which have the reactant (i.e., the left side) “acetaldehyde”.
4) Find all monomers in E. coli. A monomer has no components.
5) Find all reactions in MetaCyc that have more than 4 reactants.
6) Find all metabolic pathways, in MetaCyc, that have more than 5 reactions. Output the reaction lists as well as the pathway names.
Introduction to BioVelo
• BioVelo is based on set and list comprehensions.
• In mathematics, a set comprehension describes a set of values, as in: {x | x in Prime, x > 100}
• The output is 'x'; the body has a generator 'x in Prime' and a condition 'x > 100'. Several generators and several conditions can be used.
• BioVelo uses a concise syntax:
  1) [ output-expression : generator, condition, ... ]
  2) a generator has the form v <- database^^class
  3) a condition uses logical and relational operators
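Python has the same comprehension idea, so the mathematical example can be sketched there to make the roles of output, generator, and condition concrete. Because the set of primes is infinite, the sketch assumes a finite stand-in (primes below 120); `is_prime` is a helper written for this example.

```python
def is_prime(n):
    """Trial-division primality test, adequate for small n."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


# Finite stand-in for the set Prime (an assumption of this sketch).
primes = {n for n in range(2, 120) if is_prime(n)}

# {x | x in Prime, x > 100}: output x, generator "x in primes", condition "x > 100".
result = {x for x in primes if x > 100}
print(sorted(result))  # [101, 103, 107, 109, 113]
```

Reading the comprehension left to right gives the same three parts as the BioVelo form: what to output, where values come from, and which values to keep.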
Examples of BioVelo Queries
• [r : r <- ecoli^^reactions]
• [p^name : p <- ecoli^^proteins]
• [p^?name : p <- ecoli^^proteins]
• [p^?name : p <- ecoli^^proteins, p^dna-footprint-size < 10]
• [(g^?name, g^left-end-position) : g <- ecoli^^genes, g^left-end-position < 153000]
• [(g^?name, k) : g <- ecoli^^genes, k := abs(g^left-end-position - g^right-end-position) + 1, k < 200]
• [(r^?name, c^?name) : r <- ecoli^^reactions, c <- r^left, c in r^right]
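The last two queries use a local binding (`k := ...`) and a second generator (`c <- r^left`). As an analogy only, here is how the same two patterns look as Python list comprehensions over small in-memory records. The data and dictionary keys are invented, and this does not reproduce BioVelo's evaluation on a real PGDB.

```python
# Hypothetical gene records with left and right end positions.
genes = [
    {"name": "geneA", "left": 1000, "right": 1150},
    {"name": "geneB", "left": 5000, "right": 4000},
]

# Analogue of: [(g^?name, k) : g <- genes, k := abs(left - right) + 1, k < 200]
# The single-element list binds k once per gene, like the := binding.
short_genes = [
    (g["name"], k)
    for g in genes
    for k in [abs(g["left"] - g["right"]) + 1]
    if k < 200
]
print(short_genes)  # [('geneA', 151)]

# Hypothetical reactions with compounds on each side.
reactions = [
    {"name": "rxn1", "left": ["A", "B"], "right": ["B", "C"]},
    {"name": "rxn2", "left": ["D"], "right": ["E"]},
]

# Analogue of: [(r^?name, c^?name) : r <- reactions, c <- r^left, c in r^right]
# The second generator iterates over the left side of each reaction.
on_both_sides = [
    (r["name"], c)
    for r in reactions
    for c in r["left"]
    if c in r["right"]
]
print(on_both_sides)  # [('rxn1', 'B')]
```

The second pattern returns every compound that appears on both sides of the same reaction, one output row per (reaction, compound) pair.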