Genome Informatics, CSHL, Nov. 2011


THE EUPATHDB / GUS-WDK
SEARCH STRATEGY SYSTEM
Cristina Aurrecoechea¹, Brian P. Brunk², Steve Fischer², Xin Gao², Omar S. Harb²,
Mark Heiges¹, Jessica C. Kissinger¹, Eileen T. Kraemer¹, Cary Pennington¹, David S.
Roos², Chris Ross¹, Christian J. Stoeckert² & Charles Treatman²
¹Univ. Georgia, Athens GA, & ²Univ. Pennsylvania, Philadelphia PA
The EuPathDB suite of genome database web sites recently introduced a graphical
search interface that motivates users to undertake dynamic computational
experiments, exploring relationships across datasets to identify biologically
meaningful genes and other entities. For example, users seeking novel therapeutic
targets may wish to prioritize putative enzymes that distinguish pathogens from
their hosts, and are expressed during appropriate developmental stages. Strategies
are initiated by running one of 80+ queries, and extended by adding further
searches, linked via Boolean operators represented graphically as Venn diagrams.
Sub-strategies allow modular construction and tree structures, and searches may be
extended using filters (e.g. by strain or species) and transforms (e.g. orthologs). A
graphical display makes the overall logic obvious, and facilitates revision of
individual steps, with changes propagated forward through the strategy. Users may
name and save their strategies, creating protocols that can be shared with
colleagues. (See, e.g., http://plasmodb.org/plasmo/im.do?s=2aa0454db6a6cca0.)
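The Boolean logic behind a strategy can be sketched in a few lines, assuming each search returns a set of gene IDs. The gene IDs, the `combine` helper and the operator names (including `minus`) are hypothetical illustrations, not the WDK API.

```python
# Hypothetical result sets for two searches (gene IDs are made up).
enzymes = {"gene_A", "gene_B", "gene_C", "gene_D"}
expressed = {"gene_B", "gene_C", "gene_E"}


def combine(left, right, operator):
    """Combine two result sets with a Boolean operator (assumed names)."""
    if operator == "intersect":
        return left & right
    if operator == "union":
        return left | right
    if operator == "minus":
        return left - right
    raise ValueError(f"unknown operator: {operator}")


# Step 2 of a strategy: enzymes that are also expressed.
step2 = combine(enzymes, expressed, "intersect")
print(sorted(step2))
```

Each operator corresponds to one region of the two-circle Venn diagram shown between steps.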
The strategy system has been subjected to extensive usability studies, and
deployed on all EuPathDB databases (CryptoDB, GiardiaDB, PlasmoDB, ToxoDB,
TrichDB and TriTrypDB). Although these sites have offered text-based Boolean
operations for many years, usability analysis indicated that most users were not
taking full advantage of that feature. Following release of the graphical Search
Strategy system, the number of searches per visit dramatically increased.
Response from our user community has been extremely positive, as investigators
have discovered the power of combining datasets and making dynamic
adjustments to define optimal parameters and highlight biologically relevant
relationships. With the accelerating growth in diversity and scale of available
datasets, the potential for exploiting interrelationships increases dramatically, and
we expect this interface to have a significant impact in bringing “genomic
thinking” to a broad audience.
This system was developed using the GUS Web Development Kit (WDK), a
schema-independent middleware system for generating genomics websites.
Challenge: exploit the power of integrated genome annotation, expression data, proteomics data, SNPs, etc.
Solution: Strategies… A Graphical Query Interface for Genomics Databases
The EuPathDB suite of databases covers genomic and functional
genomics datasets for a variety of eukaryotic pathogens.
Shown here is PlasmoDB, which covers the genus Plasmodium,
including P. falciparum, a malaria parasite.
Build a
Use Case
Use data in PlasmoDB to find parasite (Plasmodium) drug target genes
This panel shows a schematic of a strategy, using queries and booleans. The actual strategy is built below.
Main strategy:
  Transferases (E.C.)
  [union] Kinase activity (GO)
  [intersect] Nested Strategy:
      P.f. transcript expr. at 24 hours +/- 8
      [union] P.f. transcript expr. in Trophozoites
      [union] P.f. protein expr. in Trophozoites
  [intersect] present in Haemosporida, not Mammals
  [intersect] not under diversifying selection (SNPs)
  [transform] orthology to any Plasmodium genes
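One plausible reading of the schematic is a small tree: leaf queries joined by union and intersect steps, with the expression searches grouped as a nested strategy and an orthology transform applied last. The sketch below evaluates such a tree. The class names, the gene IDs and the ortholog table are all hypothetical, and the arrangement of the steps is an assumption rather than the exact strategy on the poster.

```python
from dataclasses import dataclass


@dataclass
class Query:
    """A leaf step: a named search with its (toy) result set."""
    name: str
    genes: set


@dataclass
class Combine:
    """An internal step: two sub-results joined by a Boolean operator."""
    operator: str  # "union" or "intersect"
    left: object
    right: object


def evaluate(node):
    """Recursively evaluate a strategy tree to a set of gene IDs."""
    if isinstance(node, Query):
        return set(node.genes)
    left, right = evaluate(node.left), evaluate(node.right)
    if node.operator == "union":
        return left | right
    if node.operator == "intersect":
        return left & right
    raise ValueError(f"unknown operator: {node.operator}")


def transform(genes, orthologs):
    """Map each gene to its orthologs using a hypothetical lookup table."""
    result = set()
    for gene in genes:
        result |= orthologs.get(gene, set())
    return result


# Toy data: all gene IDs below are invented.
enzymes = Combine("union",
                  Query("Transferases (E.C.)", {"g1", "g2", "g3"}),
                  Query("Kinase activity (GO)", {"g3", "g4"}))
nested = Combine("union",
                 Combine("union",
                         Query("transcript expr. at 24 hours", {"g2", "g4"}),
                         Query("transcript expr. in trophozoites", {"g3"})),
                 Query("protein expr. in trophozoites", {"g5"}))
strategy = Combine("intersect",
                   Combine("intersect",
                           Combine("intersect", enzymes, nested),
                           Query("in Haemosporida, not Mammals", {"g2", "g3", "g4"})),
                   Query("not under diversifying selection", {"g3", "g4", "g6"}))

candidates = evaluate(strategy)
ORTHOLOGS = {"g3": {"pv_g3", "pk_g3"}, "g4": {"pv_g4"}}
targets = transform(candidates, ORTHOLOGS)
print(sorted(candidates), sorted(targets))
```

Because the nested strategy is just another subtree, it can be built and tested on its own before being attached to the main strategy.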
It’s Easy to Build a Strategy…
Strategy
1 Run a query (choose from menu)
2 Add a step (another query)
3 Add more steps…
…Strategies are Powerful
4 Dynamically revise, add or delete steps.
A strategy can integrate data from genome
annotation, expression, SNPs, proteomics,
etc.
Save and browse strategies.
Different types of strategies: Genes,
Isolates, SNPs, Transcript assemblies,
Chromosomes, Array Elements, ORFs, etc.
Email a strategy link to colleagues.
Use orthology to transform results to other
species.
Revise steps at any time…
Changes propagate forward.
Download customized reports of results.
Nest strategies to add complexity.
Choose from many available columns.
Sort and move columns.
View results from all or any species.
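The statement that changes propagate forward can be illustrated with a linear strategy whose running results are recomputed from the first step whenever a step is revised. The `Strategy` class and its methods are hypothetical. Recomputing everything on each call is a simplification chosen for clarity and says nothing about how the WDK caches results.

```python
class Strategy:
    """A linear chain of steps; each later step combines with the result so far."""

    def __init__(self):
        # Each entry is (operator, gene_set); the first step has operator None.
        self.steps = []

    def add_step(self, genes, operator=None):
        self.steps.append((operator, set(genes)))

    def revise_step(self, index, genes):
        """Replace the result set of one step, keeping its operator."""
        operator, _ = self.steps[index]
        self.steps[index] = (operator, set(genes))

    def results(self):
        """Return the running result after every step, first to last."""
        running = []
        current = set()
        for i, (operator, genes) in enumerate(self.steps):
            if i == 0:
                current = set(genes)
            elif operator == "intersect":
                current = current & genes
            elif operator == "union":
                current = current | genes
            else:
                raise ValueError(f"unknown operator: {operator}")
            running.append(current)
        return running


s = Strategy()
s.add_step({"a", "b", "c"})
s.add_step({"b", "c", "d"}, "intersect")
s.add_step({"e"}, "union")
before = s.results()

# Revising the first step changes every downstream result.
s.revise_step(0, {"a", "b"})
after = s.results()
print(before, after)
```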
View (web)
WDK Implementation
• Runs on any relational database schema
• Model: configured by you in XML
  • Abstracts the DB to high-level Records (Genes, ORFs, etc.)
  • Also specifies queries and returned columns
• Automated sanity testing
• Can talk to processes (e.g., BLAST) via a Web Services framework
• View: Tomcat, JSP, tag library, JavaScript, Ajax, CSS
  • You embed JSP tags in your site and style them with CSS
• Controller: Struts
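To make the idea of an XML-configured model concrete, here is a minimal sketch that loads record classes, queries and returned columns from an XML string. The element names, attribute names, table name and SQL are invented for illustration and are not the real WDK model schema, which should be taken from the WDK documentation.

```python
import xml.etree.ElementTree as ET

# Hypothetical model file: tag and attribute names are assumptions.
MODEL_XML = """
<model>
  <recordClass name="Gene">
    <query name="GenesByEcNumber"
           sql="SELECT gene_id, product FROM gene_ec WHERE ec_number LIKE :ec">
      <column name="gene_id"/>
      <column name="product"/>
    </query>
  </recordClass>
</model>
"""


def load_model(xml_text):
    """Return {record class: {query name: {"sql": ..., "columns": [...]}}}."""
    root = ET.fromstring(xml_text.strip())
    model = {}
    for record_class in root.findall("recordClass"):
        queries = {}
        for query in record_class.findall("query"):
            queries[query.get("name")] = {
                "sql": query.get("sql"),
                "columns": [c.get("name") for c in query.findall("column")],
            }
        model[record_class.get("name")] = queries
    return model


model = load_model(MODEL_XML)
print(model["Gene"]["GenesByEcNumber"]["columns"])
```

Keeping queries and columns in configuration rather than code is what lets the same engine sit on top of different database schemas.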
WDK Upcoming features
• Add genes to a “basket” to generate a report, add to a strategy as a
  step, or send to a tool (e.g., multiple sequence alignment)
• Web services access to queries
• Assign weights to results from individual steps for improved filtering
• Transform a set of one type into another type based on genome span
  relations
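The poster does not say how step weights will be combined, so the following is only one possible sketch of weighted filtering: each step contributes its weight to every gene it returns, weights are summed per gene, and genes below a threshold are dropped. The function names, weights and gene IDs are hypothetical.

```python
from collections import defaultdict


def weighted_union(steps):
    """steps: iterable of (weight, gene_ids). Return gene -> summed weight."""
    scores = defaultdict(int)
    for weight, genes in steps:
        for gene in genes:
            scores[gene] += weight
    return dict(scores)


def filter_by_weight(scores, minimum):
    """Keep genes whose summed weight reaches the minimum (assumed rule)."""
    return {gene for gene, score in scores.items() if score >= minimum}


# Toy example: three steps with decreasing importance.
steps = [
    (10, {"a", "b"}),
    (5, {"b", "c"}),
    (1, {"c"}),
]
scores = weighted_union(steps)
kept = filter_by_weight(scores, minimum=10)
print(scores, sorted(kept))
```

Summation rewards genes found by several steps, giving a graded alternative to a strict intersect.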
[Figure: WDK architecture diagram (Model, View, Controller). Components shown:
WDK Model (XML); WDK Sanity Test; JSP and CSS; Genomics Database (genomics data,
denormalized for query speed); WDK Query Engine (Java); WDK Engine Query Cache;
WDK Model JavaBeans (Java objects, JSP compatible); JSP Tag Library; Web Services
Framework to processes (e.g., BLAST); Struts controller; User Login and Search
History. Legend: you provide / WDK provides / optional.]
User perspectives on Strategies
• Computer-human interaction (CHI) studies during
prototyping drove the design, and showed high user
enthusiasm.
• Usage stats show 3-fold increase in use of Booleans in two
months since release.
• User feedback very positive.
Strategies Web Dev Kit (WDK)
www.gusdb.org/wdk
EuPathDB is an NIAID Bioinformatics Resource Center
Supported by NIAID Contract No. HHSN266200400037C and
The Bill & Melinda Gates Foundation