GUS: A Functional Genomics Data Management System
Chris Stoeckert, Ph.D.
Center for Bioinformatics and Dept. of Genetics
University of Pennsylvania
ASM Conference on Functional Genomics and Bioinformatics Approaches to Infectious Disease Research
October 8, 2004, Portland, Oregon
Database Options for Integrated Functional Genomics
Requirements
Covers genomics and functional genomics
Active and open developer community
Options
GUS: Genomics Unified Schema
Chado: generic model organism database (GMOD, http://www.gmod.org)
A Few GUS Web Sites
U. Penn
Sanger Institute
U. Georgia
Java Servlets
DoTS
RAD
TESS
SRES
U. Toronto
Core
U. Chicago
Oracle RDBMS
Object Layer for Data Loading
GUS
Flora
Centromere
Database
Phytophthora sojae genome
Virginia Bioinformatics Institute
GUS (Genomics Unified Schema)
http://www.gusdb.org
Namespace   Domain                    Features
DoTS        Sequence and annotation   EST clusters; Gene models
RAD         Gene Expression           MIAME/MAGE-OM
TESS        Gene Regulation           TFBS organization
SRES        Shared Resources          Ontologies
Core        Data Provenance           Documentation
BioMaterial annotation
SRES
EST clustering
and assembly
RAD
Identify shared
TF binding sites
DoTS
Genomic alignment
and comparative
sequence analysis
TESS
Examples of GUS users
Large sequencing center
Lightly staffed genomics project
Multiple plant species: Brett Tyler, Virginia Bioinformatics Institute and collaborators
Expression based project
CryptoDB: Kissinger Lab, University of Georgia
Data mining project
GeneDB: Pathogen Sequencing Unit at the Sanger Institute
dbDirt: Allen Okey, University of Toronto
Bioinformatics Core Facility
University of Pennsylvania Bioinformatics Core Facility
GUS Project Goals
Provide:
A platform for broad genomics data integration
An infrastructure system for functional genomics
Support:
Websites with advanced query capabilities
Research driven queries and mining
GUS components
Your data
GenBank
NRDB
dbEST
SNPs
Genetraps
MicroArrays
Phenotypes
Pathways
Orthologs
Taxonomy
GO
SO
EC
More…
Pipeline API
Plugins (data loaders)
Data Load API
Web
Development
Kit
Perl Object Layer
Queries
And
analysis
Warehouse
(Oracle or PostgreSQL)
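The component list above pairs plugins (data loaders) with a warehouse, and the Core namespace is described as covering data provenance. A minimal sketch of that idea, assuming a toy loader that tags every row with the load run that created it. The table names (`load_run`, `taxon`), the `TaxonLoader` class and the `rows_loaded_by` helper are hypothetical illustrations written in Python with SQLite; they are not the GUS Perl object layer, plugin API or schema.

```python
import sqlite3

# Hypothetical, heavily simplified stand-ins for a warehouse table and a
# provenance table. These are NOT the real GUS table definitions.
SCHEMA = """
CREATE TABLE load_run (
    load_run_id INTEGER PRIMARY KEY,
    plugin_name TEXT NOT NULL,
    source_name TEXT NOT NULL
);
CREATE TABLE taxon (
    taxon_id    INTEGER PRIMARY KEY,
    name        TEXT NOT NULL UNIQUE,
    load_run_id INTEGER NOT NULL REFERENCES load_run(load_run_id)
);
"""


class TaxonLoader:
    """Toy plugin: loads taxon names and tags each row with its load run."""

    name = "TaxonLoader"

    def run(self, conn, source_name, records):
        # Record the run first so every loaded row can point back to it.
        cur = conn.execute(
            "INSERT INTO load_run (plugin_name, source_name) VALUES (?, ?)",
            (self.name, source_name),
        )
        run_id = cur.lastrowid
        conn.executemany(
            "INSERT INTO taxon (name, load_run_id) VALUES (?, ?)",
            [(record, run_id) for record in records],
        )
        conn.commit()
        return run_id


def rows_loaded_by(conn, run_id):
    """Return the taxon names created by one load run, sorted by name."""
    query = "SELECT name FROM taxon WHERE load_run_id = ? ORDER BY name"
    return [row[0] for row in conn.execute(query, (run_id,))]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
run_id = TaxonLoader().run(
    conn, "example taxonomy file", ["Plasmodium falciparum", "Toxoplasma gondii"]
)
print(rows_loaded_by(conn, run_id))
```

Because each row carries the identifier of the run that wrote it, a bad load can in principle be traced and undone as a unit, which is the kind of bookkeeping a provenance layer is meant to support.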
Functional genomics with GUS
Expression (RAD)
Proteomics
Immunohistochemistry
Sequence
& Features
MIAME
Study
Sample
MIAPE Study
MISFISHIE Study
Sample
Sample
In Situ
Hybridization
Central Dogma
Image Analysis
Image Analysis
Image Analysis
Statistical Processing
Statistical Processing
Statistical Processing
Regulation (TESS)
Interaction
Functional Annotation of the Genome
www.mged.org
psidev.sf.net
www.scgap.org
GUS versus Chado
GUS represents biology in the database tables
Forces applications to load and retrieve data consistently
Chado represents biology in the applications
Allows flexibility in what can be stored, but applications may not be consistent
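The contrast above can be illustrated with a small sketch, assuming two toy schemas in SQLite: one with a dedicated table per biological concept, where the database itself rejects a gene feature that points at a missing sequence, and one with a single generic feature table, where the same mistake is accepted and left for applications to catch. All table and column names here are invented for illustration; neither schema is the actual GUS or Chado definition (Chado, for instance, types its features through controlled vocabulary terms rather than a free-text column).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
-- Style A: biology encoded in the tables (one table per concept).
CREATE TABLE na_sequence (
    na_sequence_id INTEGER PRIMARY KEY,
    residues       TEXT NOT NULL
);
CREATE TABLE gene_feature (
    gene_feature_id INTEGER PRIMARY KEY,
    na_sequence_id  INTEGER NOT NULL REFERENCES na_sequence(na_sequence_id),
    name            TEXT NOT NULL
);
-- Style B: one generic table; the application decides what a row means.
CREATE TABLE generic_feature (
    feature_id INTEGER PRIMARY KEY,
    type       TEXT NOT NULL,
    parent_id  INTEGER,
    name       TEXT NOT NULL
);
""")


def try_insert(sql, params):
    """Return True if the insert succeeds, False on a constraint violation."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


# A gene feature that refers to a sequence which was never loaded.
typed_ok = try_insert(
    "INSERT INTO gene_feature (na_sequence_id, name) VALUES (?, ?)",
    (999, "geneX"),
)
generic_ok = try_insert(
    "INSERT INTO generic_feature (type, parent_id, name) VALUES (?, ?, ?)",
    ("gene", 999, "geneX"),
)
print("typed schema accepted the row:  ", typed_ok)
print("generic schema accepted the row:", generic_ok)
```

The typed schema pays for its consistency with more tables and less room for new data types; the generic one is easy to extend but relies on every loader applying the same conventions.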
Central dogma and sequences
Gene
Feature
RNA
Feature
NA Sequence
Protein
Feature
AA Sequence
Central dogma and sequences
Gene
RNA
Protein
Gene
Feature
RNA
Feature
Protein
Feature
NA Sequence
AA Sequence
Central dogma and sequences
Gene
RNA
Protein
RNA
Multiple
genes
Gene 1
Gene 2
Multiple
sequences
(experimental
variety)
genome
NA Sequence
AA Sequence
Central dogma and sequences
Gene
RNA
Protein
Gene
Instance
RNA
Instance
Protein
Instance
Gene
Feature
RNA
Feature
Protein
Feature
NA Sequence
AA Sequence
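One way to read the central dogma slides above: gene, RNA and protein instances are the abstract biological entities, features are their observed locations on particular sequences, and an instance may have several features when it is seen on more than one sequence. A minimal sketch under that reading, using Python dataclasses. The class names, fields and example residues are assumptions made for illustration and do not reproduce the DoTS table definitions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Sequence:
    kind: str      # "NA" or "AA"
    source: str    # e.g. "genome" or "cDNA clone"
    residues: str


@dataclass
class Feature:
    kind: str          # "gene", "RNA" or "protein"
    sequence: Sequence
    start: int         # 0-based, inclusive
    end: int           # exclusive

    def residues(self) -> str:
        """Return the stretch of the underlying sequence this feature covers."""
        return self.sequence.residues[self.start:self.end]


@dataclass
class Instance:
    kind: str
    name: str
    features: List[Feature] = field(default_factory=list)     # where it is observed
    products: List["Instance"] = field(default_factory=list)  # gene -> RNA -> protein


# Invented example data: one gene seen on the genome, its RNA on a clone,
# and the encoded protein on an amino acid sequence.
genome = Sequence("NA", "genome", "CCATGGCCTAAGG")
clone = Sequence("NA", "cDNA clone", "ATGGCCTAA")
peptide = Sequence("AA", "conceptual translation", "MA")

protein = Instance("protein", "protein1", [Feature("protein", peptide, 0, 2)])
rna = Instance("RNA", "rna1", [Feature("RNA", clone, 0, 9)], [protein])
gene = Instance("gene", "gene1", [Feature("gene", genome, 2, 11)], [rna])

print(gene.features[0].residues())                          # region on the genome
print(gene.products[0].features[0].residues())              # same region on the clone
print(gene.products[0].products[0].features[0].residues())  # protein residues
```

Keeping instances separate from features lets the same gene be attached to a genomic sequence and to experimental sequences without duplicating the gene itself.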
Obtaining and Using GUS
www.gusdb.org
More info at www.gusdb.org/documentation
Active gusdev mailing list
Relatively straightforward to install
Loading data a struggle for new users
Growing number of tools available
Addressing how to use and write tools with visits
Web Development Kit (WDK) to generate web sites on GUS
Current GUS Developers
At Penn
Steve Fischer: Project manager, WDK
Elisabetta Manduchi: RAD project manager, RAD study annotator
Angel Pizarro: Schema development, proteomics, MAGE export
Mike Saffitz: DBA, web services, Postgres
Dave Barkan: WDK, GO pipeline, Apollo interface
Thomas Gan: WDK, genomic alignments pipeline
John Iodice: ApiDoTS pipeline, data loading
Li Li: OrthoMCL pipeline
Junmin Liu: RAD websites, expression displays
Debbie Pinney: Data loaders, Hum and MusDoTS pipeline
Jonathan Schug: TESS, architecture and schema development
Trish Whetzel: Data loading, RAD, schema development
Plus rest of group contributes through various GUS-based projects
Pathogen Sequencing Unit, Sanger Institute
Kissinger Group, U. of Georgia
Terry Clark, U. of Chicago
WDKTestSite
Developed in collaboration with Adrian Tivey and Marie-Adele Rajandream (PSU, Sanger Institute)
The PlasmoDB Team
Shailesh Date
Kobby Essien
Martin Fraunholz
Bindu Gajria
Greg Grant
John Iodice
Jessie Kissinger
Philip Labo
Li Li
Jules Milgram
David Roos
Chris Stoeckert
Trish Whetzel
NIAID grant:
R01 AI058515
GUS supports a wide variety of queries
Suppose you want to find all kinases in P. falciparum
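A minimal sketch of what such a query might look like, assuming a toy warehouse with one taxon table and one gene table and a plain text match on the product description. The table names, the `find_genes` helper and the `EXAMPLE_` identifiers are invented for illustration; a real query could also draw on GO terms, protein domains or other annotation, and this is not the query the PlasmoDB site actually runs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE taxon (
    taxon_id INTEGER PRIMARY KEY,
    name     TEXT NOT NULL
);
CREATE TABLE gene (
    gene_id   INTEGER PRIMARY KEY,
    source_id TEXT NOT NULL,
    taxon_id  INTEGER NOT NULL REFERENCES taxon(taxon_id),
    product   TEXT NOT NULL
);
""")

# Invented rows; the identifiers are placeholders, not real gene IDs.
conn.executemany(
    "INSERT INTO taxon (taxon_id, name) VALUES (?, ?)",
    [(1, "Plasmodium falciparum"), (2, "Toxoplasma gondii")],
)
conn.executemany(
    "INSERT INTO gene (source_id, taxon_id, product) VALUES (?, ?, ?)",
    [
        ("EXAMPLE_0001", 1, "protein kinase, putative"),
        ("EXAMPLE_0002", 1, "hypothetical protein"),
        ("EXAMPLE_0003", 1, "calcium-dependent protein kinase"),
        ("EXAMPLE_0004", 2, "protein kinase, putative"),
    ],
)


def find_genes(conn, organism, keyword):
    """Return (source_id, product) for genes of one organism matching a keyword."""
    query = """
        SELECT g.source_id, g.product
        FROM gene g
        JOIN taxon t ON t.taxon_id = g.taxon_id
        WHERE t.name = ? AND g.product LIKE ?
        ORDER BY g.source_id
    """
    return conn.execute(query, (organism, "%" + keyword + "%")).fetchall()


for source_id, product in find_genes(conn, "Plasmodium falciparum", "kinase"):
    print(source_id, product)
```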
Gene Report Pages Integrate Genomics and Functional Genomics
RAD Study-Annotator
Covers the MIAME checklist and exploits the MGED Ontology
Allows entering of very specific details of an experiment
Web-based forms:
Modular structure
Written in PHP
Front-end data integrity checks using JavaScript
Manages data privacy based on Project/Group selections present in GUS schema
Manduchi et al. 2004 Bioinformatics 20:452-459.
Vision for GUS
Installable for every lab
Extendable to all areas of functional genomics
Improve install scripts, documentation
Postgres version
Sequence, array-based expression experiments
Array CGH, 2-D gel electrophoresis, mass spectrometry, yeast 2-hybrids
In situ hybridizations, metabolites
Interoperable with other GUS installations and with common tools
Exchange files and scripts, MAGE-ML (use community standards)
Web services (exchange objects)
Interface with open source tools such as GBrowse, Artemis, Apollo
Standards and Ontologies for Functional Genomics 2
October 23-26, 2004
held at the University of Pennsylvania Medical School
www.jax.org/courses/events
Co-Hosted by The Jackson Laboratory, University of Pennsylvania, European Bioinformatics Institute
Student Scholarships Available
Funded in part by
NHGRI
NCRR
NERC
GSK
Photo by R. Kennedy, B Trist, R. Tarver, for GPTMC