The Structured Advanced Query Page

Tomer Altman & Mario Latendresse
Bioinformatics Research Group
SRI International
August 18, 2009
SRI International Bioinformatics
Introduction
• BioVelo is a query language
  • Like SQL but simpler and easier to learn
  • Documentation: http://biocyc.org/bioveloLanguage.html
  • The Free Form Advanced Query Page (FFAQP) allows Web submission of BioVelo queries
• Structured Advanced Query Page (SAQP)
  • Web page for interactively constructing advanced and precise queries to Pathway/Genome Databases (PGDBs)
  • Queries are translated to BioVelo and sent to the server for processing
  • SAQP: http://biocyc.org/query.html
  • Documentation: http://biocyc.org/webQueryDoc.html
Why a query interface?
• Allows structured access to the rich data representation stored in a PGDB.
• Most advanced databases have a high-level, declarative method of access (e.g., SQL).
• Provides an intermediate level of access between graphically browsing the PGDB and programmatically processing the data using Lisp.
The Structured Advanced Query Page
• 'Advanced', in that it allows you to ask more advanced and complicated queries than the basic search interface.
  • In other words, the SAQP allows you to search for a precise set of answers given simple or complex conditions.
• 'Structured', in that it is a dynamic HTML form that makes queries easier to craft, trading the flexibility and power of the Free Form Advanced Query Page (FFAQP) for simplicity.
• 'Page', in that it is accessed via the Web interface for BioCyc (www.biocyc.org/query.html), or from your own Pathway Tools Web server.
SAQP Architecture
• The SAQP is built on top of BioVelo, a high-level, functional, declarative language, which is itself built on top of Pathway Tools.
• Every result page shows the equivalent BioVelo code that the SAQP generated and that, in turn, produced the results.
• You don't need to know anything about BioVelo to use the SAQP, but it may help later if you need to write even more complicated queries using the Free Form Advanced Query Page (FFAQP).
The Structure of the SAQP:
• Database specification
• Class specification
• 'Where' constraints on attributes of classes
• Output attributes description
• Data format (HTML vs TXT)
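As a rough illustration of how the parts of the form could map onto a BioVelo expression, here is a minimal Python sketch. The helper `build_biovelo` and its parameters are hypothetical; this is not how the SAQP is implemented, only a way to see the database, class, 'where' constraints, and output attributes come together in one expression. The data format choice (HTML vs TXT) is assumed to affect only presentation and is left out.

```python
def build_biovelo(database, class_name, conditions, outputs, var="p"):
    """Assemble a BioVelo list comprehension from form-like parts.

    Hypothetical helper for illustration only.
    """
    # Generator: var <- database^^class
    generator = f"{var} <- {database}^^{class_name}"
    # Each output attribute becomes var^?attribute, as in the deck's examples.
    heads = [f"{var}^?{attr}" for attr in outputs]
    head = heads[0] if len(heads) == 1 else "(" + ", ".join(heads) + ")"
    # Each 'where' constraint is an (attribute, operator, value) triple.
    conds = [f"{var}^{attr} {op} {value}" for attr, op, value in conditions]
    body = ", ".join([generator] + conds)
    return f"[{head} : {body}]"


query = build_biovelo(
    database="ecoli",
    class_name="proteins",
    conditions=[("dna-footprint-size", "<", 10)],
    outputs=["name"],
)
print(query)  # [p^?name : p <- ecoli^^proteins, p^dna-footprint-size < 10]
```

With more than one output attribute, the head becomes a tuple such as (p^?name, p^?dna-footprint-size).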
Example #1:
• A simple query usually consists of querying a particular database about a particular class.
  • Find all the proteins in E. coli K-12.
  • Display the protein names.
Structure of the Results
• A line that shows the equivalent BioVelo expression that the SAQP generated to answer the query.
• An HTML table of the results, with the corresponding entries hyperlinked to the matching Pathway Tools Web pages.
• If a text data format was requested, a tab-delimited text file is generated instead, containing just the table data.
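A tab-delimited result file can be read with a few lines of Python. The sketch below uses placeholder content in place of a real download, and it assumes one row per result with no header line; check an actual file before relying on that layout.

```python
import csv
import io

# Placeholder content standing in for a downloaded result file.
# These rows are invented for illustration, not real query output.
sample = "protein-A\t8\nprotein-B\t6\n"

# csv.reader splits each line on tabs when delimiter="\t".
rows = list(csv.reader(io.StringIO(sample), delimiter="\t"))

for name, size in rows:
    print(name, int(size))
```

For a file on disk, replace `io.StringIO(sample)` with `open(path, newline="")`, where `path` is wherever the file was saved.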
Example #2:
• Find all the proteins of E. coli K-12 for which the DNA-FOOTPRINT-SIZE is smaller than 10.
• Display the protein name and the DNA footprint size.
Example #3:
• In EcoCyc, display polypeptides constrained by experimentally determined molecular weight and isoelectric point (pI).
  • The experimental molecular weight should be between 50 and 100 kD.
  • The pI should be less than 7.
  • Display the polypeptide name, the experimental molecular weight, and the pI.
Example #4:
• The SAQP allows for specifying quantifiers on relations between PGDB classes.
• Extending example #3, we now want only proteins for which at least one of the genes encoding the protein lies within the first 500 kilobases of the E. coli chromosome.
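The "at least one" quantifier corresponds to an existential test over a related set of objects. A minimal Python sketch of that idea follows, using toy records with hypothetical field names and invented positions; it mirrors the logic of the query, not the SAQP itself.

```python
# Toy data: each protein lists the genes that encode it (values are invented).
proteins = [
    {"name": "P1", "genes": [{"name": "g1", "left_end": 120_000}]},
    {"name": "P2", "genes": [{"name": "g2", "left_end": 2_400_000},
                             {"name": "g3", "left_end": 310_000}]},
    {"name": "P3", "genes": [{"name": "g4", "left_end": 900_000}]},
    {"name": "P4", "genes": []},  # no encoding gene recorded
]

LIMIT = 500_000  # first 500 kilobases, in base pairs

# any() is true when at least one gene satisfies the condition,
# and false for an empty gene list.
selected = [
    p["name"]
    for p in proteins
    if any(g["left_end"] < LIMIT for g in p["genes"])
]
print(selected)  # ['P1', 'P2']
```

Replacing `any` with `all` would express the "every gene" quantifier, with the caveat that `all` is true for an empty list.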
Example #5: Queries with Several Components
• A second search component searches a potentially different database and a different class of objects for each element found in the first search component.
  • It is called a 'cross-product' search.
  • Any number of search components can be added. In general, each new search component is evaluated for each set of objects found in the previous components.
  • Some restraint is needed to avoid building a query that takes too long to answer. (The server gives a limit of a few minutes for a query.)
• Example: Search for MetaCyc pathways in the taxonomic range of Bacteria that also exist in E. coli K-12, using the common-name attribute.
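A cross-product search behaves like nested generators in a comprehension: the second component is evaluated once for every element that the first one yields. The Python sketch below shows this with toy pathway records; the field names and pathway names are hypothetical.

```python
# Toy stand-ins for two databases (contents invented for illustration).
metacyc_pathways = [
    {"common_name": "pathway A", "taxonomic_range": ["Bacteria"]},
    {"common_name": "pathway B", "taxonomic_range": ["Eukaryota"]},
    {"common_name": "pathway C", "taxonomic_range": ["Bacteria", "Archaea"]},
]
ecoli_pathways = [
    {"common_name": "pathway A"},
    {"common_name": "pathway D"},
]

matches = [
    (m["common_name"], e["common_name"])
    for m in metacyc_pathways                # first search component
    if "Bacteria" in m["taxonomic_range"]
    for e in ecoli_pathways                  # second component, run per element of the first
    if e["common_name"] == m["common_name"]
]
print(matches)  # [('pathway A', 'pathway A')]
```

The work grows with the product of the component sizes, which is why filtering early in the first component helps keep a query within the server's time limit.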
Introduction to BioVelo
• BioVelo is based on set and list comprehension.
  • In mathematics, a set comprehension describes a set of values, as in: {x | x in Prime, x > 100}
  • The output is 'x'; the body has a generator 'x in Prime' and a condition 'x > 100'. Several generators and several conditions can be used.
• BioVelo uses a concise syntax:
  1) [ output-expression : generator, condition, ... ]
  2) a generator has the form v <- database^^class
  3) a condition uses logical and relational operators
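The same output / generator / condition shape appears in Python comprehensions, which makes them a convenient way to get a feel for the idea. In this sketch the set of primes is bounded below 130 (an assumption made so the result is finite), and `is_prime` is a small helper written for the example.

```python
def is_prime(n):
    """Trial-division primality test, adequate for small n."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


# {x | x in Prime, x > 100}, restricted to x < 130:
# output 'x', generator 'for x in range(...)', conditions after 'if'.
primes_over_100 = {x for x in range(2, 130) if is_prime(x) and x > 100}
print(sorted(primes_over_100))  # [101, 103, 107, 109, 113, 127]
```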
Syntax of BioVelo (the big picture)
1. [ head-output : generators, conditions, … ]
2. { head-output : generators, conditions, … }
3. The comma can be read as "and".
4. Head-output is a single expression or a tuple of expressions: (exp1, exp2, …, expn)
5. To get objects from a database: orgid ^^ class-name
6. Typical generator: var <- ecoli^^proteins
7. To access an attribute value: Object ^ attribute
8. Conditions are formed with variables, constants, and logical and relational operators.
9. Special biological functions: reaction-to-genes, enzyme-to-genes, pathway-to-reactions, etc.
Examples of BioVelo Queries
• [r : r <- ecoli^^reactions]
• [p^name : p <- ecoli^^proteins]
• [p^?name : p <- ecoli^^proteins]
• [p^?name : p <- ecoli^^proteins, p^dna-footprint-size < 10]
• [(g^?name, g^left-end-position) : g <- ecoli^^genes, g^left-end-position < 153000]
• [(g^?name, k) : g <- ecoli^^genes, k := abs(g^left-end-position - g^right-end-position) + 1, k < 200]
• [(r^?name, c^?name) : r <- ecoli^^reactions, c <- r^left, c in r^right]
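Two of these queries use features worth a closer look: a local binding (k := ...) and a second generator that ranges over an attribute of the first variable (c <- r^left). The Python sketch below reproduces both patterns on toy data; the records and field names are hypothetical, and the comprehensions are analogues rather than translations.

```python
# Toy data with hypothetical field names; positions are invented for illustration.
genes = [
    {"name": "gA", "left": 1000, "right": 1150},
    {"name": "gB", "left": 5000, "right": 5900},
    {"name": "gC", "left": 7300, "right": 7201},
]

# Analogue of the gene-length query: bind k, then filter on it.
# The one-element list binds k once per gene, playing the role of ':='.
short_genes = [
    (g["name"], k)
    for g in genes
    for k in [abs(g["left"] - g["right"]) + 1]
    if k < 200
]

reactions = [
    {"name": "R1", "left": ["A", "B"], "right": ["B", "C"]},
    {"name": "R2", "left": ["D"], "right": ["E"]},
]

# Analogue of the last query: compounds found on both sides of a reaction.
both_sides = [
    (r["name"], c)
    for r in reactions
    for c in r["left"]       # second generator depends on the first variable
    if c in r["right"]
]

print(short_genes)  # [('gA', 151), ('gC', 100)]
print(both_sides)   # [('R1', 'B')]
```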
BioVelo Grammar in EBNF
BioVelo, Table of Operators (1)
BioVelo, Table of Operators (2)
BioVelo, Special Functions
BioVelo and the Free Form Advanced Query Page (FFAQP)
• Any BioVelo query can be entered at the FFAQP.
• Interactive online documentation is available at the FFAQP.
• The FFAQP can be reached from the Search->Advanced command in the menu bar: open the SAQP, then click the button "Switch to Free Form Advanced Query Page".
• Here is a demo…
BioVelo from Pathway Tools (Desktop)
• BioVelo queries can be executed from the Lisp prompt by using the bv Lisp function.
• The query is given as a Lisp string to bv.
• The result is displayed and put on the answer list.
• Examples:
1. (bv "[ r : r <- ecoli^^reactions, #r^left > 4]")
2. (bv "[(g,s) : g <- ecoli^^genes, s <- g^synonyms, s ~= \"b0[0-9]\"]")
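Because the query is passed as a Lisp string, any double quote inside the query must be escaped with a backslash, as in the second example. If such calls are generated from a script, the escaping can be automated. The sketch below is a hypothetical Python helper (`to_bv_call` is not part of Pathway Tools) that only produces the text of the call.

```python
def to_bv_call(query):
    """Wrap a BioVelo query as the text of a (bv "...") Lisp call.

    Hypothetical helper. In a Lisp string literal, backslash and
    double quote are each escaped with a preceding backslash.
    """
    # Escape backslashes first so the ones added for quotes are not doubled.
    escaped = query.replace("\\", "\\\\").replace('"', '\\"')
    return f'(bv "{escaped}")'


call = to_bv_call('[(g,s) : g <- ecoli^^genes, s <- g^synonyms, s ~= "b0[0-9]"]')
print(call)
# (bv "[(g,s) : g <- ecoli^^genes, s <- g^synonyms, s ~= \"b0[0-9]\"]")
```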
BioVelo from Pathway Tools (Desktop)
• The result can be further stored or manipulated using Lisp.
• Objects are always returned as frame structures, not frame IDs, so that multiple databases can be handled without worrying about orgids (database identifiers).
• However, not all functions in Pathway Tools take frame structures at "face value": the currently selected organism must match the database of the frame structure.