SAQP - SRI International

The Structured Advanced Query Page
Tomer Altman &
Mario Latendresse
Bioinformatics Research Group
SRI International
August 18, 2009
SRI International Bioinformatics
Introduction
- BioVelo is a query language
  - Like SQL, but simpler and easier to learn
  - Documentation: http://biocyc.org/bioveloLanguage.html
  - The Free-Form Advanced Query Page allows Web submission of BioVelo queries
- Structured Advanced Query Page (SAQP)
  - Web page for interactively constructing precise queries to PGDBs
  - Queries are translated to BioVelo and sent to the server for processing
  - SAQP: http://biocyc.org/query.html
  - Documentation: http://biocyc.org/webQueryDoc.html
Why a query interface?
- Allows structured access to the rich data representation stored in a PGDB.
- Most advanced databases have a high-level, declarative method of access (e.g., SQL).
- Provides an intermediate level of access between graphically browsing the PGDB and programmatically querying the data using an API or BioVelo.
The Structured Advanced Query Page
- 'Advanced', in that it allows you to ask more advanced and complicated queries than the basic search interface
  - In other words, the SAQP allows you to search for data that satisfy a precise set of conditions
- 'Structured', in that it is a dynamic HTML form that guides you in creating a well-formed query
- 'Page', in that it is accessed via the Web interface for Pathway Tools
The Structure of the SAQP:
- Database specification
- Class specification
- Conditions on attributes of classes
- Output attributes description
- Data format (HTML vs TXT)
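
The five parts of the form map naturally onto a small record. The Python sketch below is only an illustration: the class `SaqpQuery`, its field names, and the `to_biovelo` method are hypothetical and are not part of Pathway Tools. The string it builds follows the `[ output : generator, condition ]` shape of the BioVelo queries shown in these slides.

```python
from dataclasses import dataclass, field


@dataclass
class SaqpQuery:
    """Hypothetical model of the five parts of the SAQP form."""
    database: str                                    # database specification, e.g. "ecoli"
    class_name: str                                  # class specification, e.g. "proteins"
    conditions: list = field(default_factory=list)   # conditions on attributes
    outputs: list = field(default_factory=list)      # attributes to display
    data_format: str = "HTML"                        # "HTML" or "TXT"

    def to_biovelo(self, var: str = "p") -> str:
        """Render a BioVelo-style string (illustrative only)."""
        # With no output attributes, display the object itself.
        outs = [f"{var}^{a}" for a in self.outputs] or [var]
        head = outs[0] if len(outs) == 1 else "(" + ", ".join(outs) + ")"
        # One generator, followed by each condition on the same variable.
        parts = [f"{var} <- {self.database}^^{self.class_name}"]
        parts += [f"{var}^{c}" for c in self.conditions]
        return f"[{head} : " + ", ".join(parts) + "]"


query = SaqpQuery("ecoli", "proteins",
                  conditions=["dna-footprint-size < 10"],
                  outputs=["?name"])
print(query.to_biovelo())
```

The real SAQP performs this translation on its own; the sketch only shows how each form section could contribute one piece of the final expression.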
Example #1:
- A simple query usually consists of querying a particular database about a particular class.
  - Find all the proteins in E. coli K-12.
  - Display the protein names.
Structure of the Results
- A line that shows the equivalent BioVelo expression that the SAQP generated to answer the query.
- An HTML table of the results, with the corresponding entries hyperlinked to the matching Pathway Tools Web pages.
  - If a text data format was requested, then a tab-delimited text file is generated, with just the table data.
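
For the text format, the output is just the table rows with tab-separated columns. A minimal Python sketch of producing such text follows; the helper `to_tab_delimited` and the row values are hypothetical, and the actual server-side code is not shown in these slides.

```python
import csv
import io


def to_tab_delimited(rows):
    """Serialize rows (sequences of values) as tab-delimited text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


# Placeholder values, not real query results.
rows = [("protein A", 8), ("protein B", 5)]
text = to_tab_delimited(rows)
print(text)
```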
Example #2:
- Find all the proteins of E. coli K-12 for which the DNA-FOOTPRINT-SIZE is smaller than 10.
- Display the protein name and the DNA footprint size.
Example #3:
- In EcoCyc, display polypeptides constrained by experimentally determined molecular weight and isoelectric point.
  - The experimental molecular weight should be between 50 and 100 kD.
  - The pI should be less than 7.
- Display the polypeptide name, the experimental molecular weight, and the pI.
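
Several conditions on the same class act as a conjunction: an object is kept only if all of them hold. The Python sketch below mimics this on toy records. The field names `mw_exp` and `pi` and all values are invented for illustration and are not actual EcoCyc attribute names or data; treating the 50 to 100 kD range as inclusive is also an assumption.

```python
# Toy polypeptide records; names and values are invented for illustration.
polypeptides = [
    {"name": "poly-1", "mw_exp": 62.0, "pi": 5.4},
    {"name": "poly-2", "mw_exp": 120.0, "pi": 6.1},   # too heavy
    {"name": "poly-3", "mw_exp": 75.5, "pi": 8.2},    # pI too high
    {"name": "poly-4", "mw_exp": None, "pi": 6.0},    # no experimental weight
]

# All conditions must hold for a record to be kept.
# Inclusive bounds are an assumption of this sketch.
hits = [
    (p["name"], p["mw_exp"], p["pi"])
    for p in polypeptides
    if p["mw_exp"] is not None
    and 50 <= p["mw_exp"] <= 100
    and p["pi"] < 7
]
print(hits)
```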
Example #4:
- The SAQP allows for specifying quantifiers on relations between PGDB objects.
- Extend example #3 to select only proteins whose encoding gene is situated within the first 500 kilobases of the E. coli chromosome.
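
A quantifier states how many related objects must satisfy a condition. The slides do not list which quantifiers the SAQP offers, so the Python sketch below simply contrasts the two classical ones, "some" and "all", on toy data. All names and coordinates are invented, and testing only the left-end position of a gene is a simplification.

```python
# Toy data: each protein lists its encoding genes with a left-end
# position in base pairs. Everything here is invented for illustration.
proteins = [
    {"name": "prot-1", "genes": [{"name": "gene-a", "left": 120_000}]},
    {"name": "prot-2", "genes": [{"name": "gene-b", "left": 2_300_000}]},
    {"name": "prot-3", "genes": []},  # no known gene
]

LIMIT = 500_000  # first 500 kilobases

# Existential quantifier: at least one encoding gene lies within the limit.
some_gene_near = [p["name"] for p in proteins
                  if any(g["left"] < LIMIT for g in p["genes"])]

# Universal quantifier: every encoding gene lies within the limit.
# all() is True for an empty list, so prot-3 is included here.
all_genes_near = [p["name"] for p in proteins
                  if all(g["left"] < LIMIT for g in p["genes"])]

print(some_gene_near, all_genes_near)
```

The two results differ only for the protein without a gene, which is the usual reason to pick the quantifier deliberately.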
Example #5: Queries with Several Components
- A second search component searches, potentially in another database and another class of objects, once for each element found in the first search component.
  - It is called a 'cross-product' search.
  - Any number of search components can be added. In general, each new search component is evaluated for each set of objects found in the previous components.
  - Some restraint is needed to avoid building a query that takes too long to answer. (The server imposes a limit of a few minutes per query.)
  - Example: Search for MetaCyc pathways in the taxonomic range of Bacteria that also exist in E. coli K-12, using the common-name attribute.
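
A cross-product search behaves like nested loops: the second component runs once per result of the first. The Python sketch below imitates the pathway example on toy data. The dictionaries, field names, and pathway names are placeholders, not real MetaCyc or EcoCyc content.

```python
# Toy stand-ins for two databases; pathway names are placeholders.
metacyc_pathways = [
    {"common_name": "pathway X", "taxonomic_range": ["Bacteria"]},
    {"common_name": "pathway Y", "taxonomic_range": ["Eukaryota"]},
    {"common_name": "pathway Z", "taxonomic_range": ["Bacteria", "Archaea"]},
]
ecoli_pathways = [
    {"common_name": "pathway X"},
    {"common_name": "pathway W"},
]

# First component: MetaCyc pathways whose range includes Bacteria.
# Second component: evaluated once per element of the first component.
shared = [
    (m["common_name"], e["common_name"])
    for m in metacyc_pathways
    if "Bacteria" in m["taxonomic_range"]
    for e in ecoli_pathways
    if e["common_name"] == m["common_name"]
]
print(shared)
```

The work grows with the product of the component sizes, which is why restrictive conditions in the first component help keep a query within the server time limit.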
Introduction to BioVelo
- BioVelo is based on set and list comprehensions.
  - In mathematics, a set comprehension describes a set of values, as in: {x | x in Prime, x > 100}
  - The output is 'x'; the body has a generator 'x in Prime' and a condition 'x > 100'. Several generators and several conditions can be used.
  - BioVelo uses a concise syntax:
    1) [ output-expression : generator, condition, ... ]
    2) a generator has the form v <- database^^class
    3) a condition uses logical and relational operators
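
Python comprehensions have the same three ingredients (output expression, generator, condition), so the set {x | x in Prime, x > 100} can be sketched as below. The helper `is_prime` and the upper bound of 130 are additions of this sketch; the bound is needed only to keep the set finite.

```python
def is_prime(n: int) -> bool:
    """Trial-division primality test, adequate for small n."""
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n ** 0.5) + 1))


# {x | x in Prime, x > 100}, restricted to x < 130 so the set is finite.
# Output expression: x; generator: x in range(130); conditions after "if".
big_primes = {x for x in range(130) if is_prime(x) and x > 100}
print(sorted(big_primes))
```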
Examples of BioVelo Queries
- [r : r <- ecoli^^reactions]
- [p^name : p <- ecoli^^proteins]
- [p^?name : p <- ecoli^^proteins]
- [p^?name : p <- ecoli^^proteins, p^dna-footprint-size < 10]
- [(g^?name, g^left-end-position) : g <- ecoli^^genes, g^left-end-position < 153000]
- [(g^?name, k) : g <- ecoli^^genes, k := abs(g^left-end-position - g^right-end-position) + 1, k < 200]
- [(r^?name, c^?name) : r <- ecoli^^reactions, c <- r^left, c in r^right]
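
The last two queries use a local binding (k := ...) and a second generator that ranges over an attribute of the first variable. As a rough analogy only, the Python sketch below reproduces both patterns on toy data. The records and values are invented, and Python semantics are not claimed to match BioVelo semantics in every detail.

```python
# Toy genes and reactions; all names and coordinates are invented.
genes = [
    {"name": "gene-a", "left": 1000, "right": 1149},   # length 150
    {"name": "gene-b", "left": 5000, "right": 5899},   # length 900
]
reactions = [
    {"name": "rxn-1", "left": ["A", "B"], "right": ["B", "C"]},
    {"name": "rxn-2", "left": ["D"], "right": ["E"]},
]

# Analog of the gene-length query: bind k, then filter on k < 200.
# The walrus operator (Python 3.8+) plays the role of the binding k := ...
short_genes = [
    (g["name"], k)
    for g in genes
    if (k := abs(g["left"] - g["right"]) + 1) < 200
]

# Analog of the reaction query: c ranges over the left side of r
# and is kept only if it also appears on the right side.
both_sides = [
    (r["name"], c)
    for r in reactions
    for c in r["left"]
    if c in r["right"]
]
print(short_genes, both_sides)
```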