GUS_Web_Applications

GUS
We have created the Genomics Unified Schema (GUS), a relational database that warehouses and integrates biological sequence, sequence annotation, and gene expression data from a large number of heterogeneous sources. User-friendly web interfaces present slices of the GUS database and allow researchers to execute structured queries for information about gene structure, function, and expression.
Please visit poster #146A for details of the Genomics Unified Schema (GUS).
GUS Supports Multiple Projects
AllGenes
AllGenes is based on a comprehensive mouse and human gene index. The genes are approximated by transcripts predicted from EST and mRNA clustering.
PlasmoDB
PlasmoDB is the official database of the Plasmodium falciparum genome project. It provides an integrated view of genome sequence data, including expression data from EST, SAGE, and microarray projects.
EPConDB
EPConDB is an index of genes expressed in
endocrine pancreas. Expression is defined
either through microarray experiments or
sequence annotation.
allgenes.org query:
"Is my cDNA similar to any mouse genes that are
predicted to encode transcription factors and have
been localized to mouse chromosome 5?"
This query illustrates several aspects of the GUS database including:
Data Integration
Data Analysis
Tools
•RH map
•GO function
•Sequence
•GO function assignments
•Boolean function
•History function
•BLAST
http://www.allgenes.org/
Select the allgenes.org boolean query page
Click on the "AND" button
Choose the RH map and GO function queries
Select mouse chromosome 5 and "transcription factor"
There are 22 mouse RNAs (assemblies) that meet these criteria:
This query result set now appears on the query "history" page:
Now use the BLAST page to identify RNAs similar to my cDNA
The results of the BLAST search appear in the query history
Intersect ("AND") the BLAST search with the previous query:
And we have our answer (the third row on the query history page):
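The walk-through above treats each query as a saved result set and the "AND" button as a set intersection stored as a new history row. The sketch below illustrates that idea in Python. It is a minimal sketch under stated assumptions, not the allgenes.org implementation: the class name `QueryHistory`, its methods, and the assembly IDs are all hypothetical.

```python
"""Hypothetical sketch of a query-history page: every query's result set is
saved as a numbered row, and a boolean AND intersects two saved rows.
Class name, methods and assembly IDs are invented for illustration."""


class QueryHistory:
    """Stores query result sets (sets of IDs) in the order they were run."""

    def __init__(self):
        self.steps = []  # list of (description, frozenset of IDs)

    def add(self, description, ids):
        """Record a result set and return its 1-based row number."""
        self.steps.append((description, frozenset(ids)))
        return len(self.steps)

    def result(self, step):
        """Return the saved result set for a 1-based row number."""
        return self.steps[step - 1][1]

    def intersect(self, a, b):
        """Boolean AND: keep IDs present in both rows, saved as a new row."""
        return self.add(f"#{a} AND #{b}", self.result(a) & self.result(b))


history = QueryHistory()
# Row 1: RH map (mouse chromosome 5) AND GO function ("transcription factor").
s1 = history.add("chr5 AND transcription factor", {"DT.101", "DT.102", "DT.103"})
# Row 2: assemblies with BLAST hits to the user's cDNA.
s2 = history.add("BLAST hits to my cDNA", {"DT.103", "DT.250"})
# Row 3: the intersection of rows 1 and 2.
s3 = history.intersect(s1, s2)
print(s3, sorted(history.result(s3)))  # 3 ['DT.103']
```

Storing every result (including each intersection) as its own row mirrors how the answer appears as the third row of the history page, and lets any saved row be reused in a later boolean combination.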
The allgenes report for an assembly includes:
•Other transcripts from the same gene
•Predicted GO function(s) (some manually reviewed)
•External links
•Mapping information
•Protein/motif hits
•Gene trap insertions, etc.
•Predicted protein
•CAP4 assembly
•EST expression profile
•UCSC BLAT
PlasmoDB query:
"List all genes whose proteins are predicted to
contain a signal peptide and for which there is
evidence that they are expressed in Plasmodium
falciparum's late schizont stage."
This query illustrates several aspects of the GUS database including:
Data Integration
•Predicted genome translation
•Microarray expression
Data Analysis
•Spot intensity
Tools
•History function
http://plasmodb.org/
Select Text Search from the PlasmoDB homepage
Choose signal peptide
Choose the chromosome and Gene/prediction type, then submit
There are 1952 genes with predicted signal peptides
Choose gene expression (microarray) from the homepage
Next, choose an experiment, chromosome, and Gene/prediction type, then submit
There are 12170 gene predictions that satisfy this query
Go to the history page and choose which simple
queries to combine. Select intersect.
We have an answer.
There are 949 predicted genes that satisfy our complex query
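Because GUS is a relational database, the history-page intersection of two simple queries corresponds naturally to a SQL `INTERSECT`. The sketch below uses Python's built-in `sqlite3` module to show that correspondence for this PlasmoDB question. The tables, column names, gene IDs, and the spot-intensity cutoff are hypothetical stand-ins, not the actual GUS schema or PlasmoDB's rule for calling a gene expressed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical toy tables; these are NOT the real GUS tables.
cur.executescript("""
CREATE TABLE protein_feature (gene_id TEXT, feature TEXT);
CREATE TABLE array_expression (gene_id TEXT, stage TEXT, spot_intensity REAL);
""")
cur.executemany("INSERT INTO protein_feature VALUES (?, ?)", [
    ("PF_A", "signal peptide"),
    ("PF_B", "signal peptide"),
    ("PF_C", "transmembrane domain"),
])
cur.executemany("INSERT INTO array_expression VALUES (?, ?, ?)", [
    ("PF_A", "late schizont", 850.0),
    ("PF_B", "late schizont", 40.0),
    ("PF_C", "late schizont", 900.0),
    ("PF_B", "ring", 700.0),
])

MIN_INTENSITY = 100.0  # hypothetical cutoff for calling a gene "expressed"

# Simple query 1 (signal peptide) INTERSECT simple query 2 (late schizont).
query = """
SELECT gene_id FROM protein_feature WHERE feature = 'signal peptide'
INTERSECT
SELECT gene_id FROM array_expression
WHERE stage = 'late schizont' AND spot_intensity >= ?
ORDER BY gene_id
"""
answer = [row[0] for row in cur.execute(query, (MIN_INTENSITY,))]
print(answer)  # ['PF_A']
conn.close()
```

`PF_B` drops out because its only strong spot is in the ring stage, and `PF_C` drops out because it has no predicted signal peptide, so only genes satisfying both simple queries remain.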
Click on a gene to get a full report
The report page provides a variety of information, including gene models predicted using several approaches, and mRNA and protein predictions
EPConDB query:
"Which DoTS assemblies (RNA) represented on
the Endocrine Pancreas Consortium’s chip 2.0 are
constituents of the insulin-initiated signal
transduction pathway?"
Data Integration
•Sequence
•Microarray experiment
•Transduction pathway
Data Analysis
•BLAST
Tools
•History function
http://www.cbil.upenn.edu/EPConDB
Go to the gene information query page and click on
“DOTS assemblies involved in a pathway”
Choose the insulin pathway, a p-value, pancreas, the species, and whether an assembly must include an mRNA, then submit
There are 59 DoTS assemblies that are constituents of the insulin pathway
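The pathway step above combines a p-value with an optional "must include an mRNA" requirement. The poster does not say how the p-value is applied. The sketch below assumes, for illustration only, that it is a BLAST similarity cutoff linking each assembly to a known pathway member. The `Hit` record, the `pathway_assemblies` function, the assembly IDs, and the p-values are hypothetical. The three-gene set is a small illustrative subset of insulin-signaling genes, not the curated pathway.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """Hypothetical BLAST hit from an assembly to a known pathway gene."""
    assembly_id: str
    pathway_gene: str
    p_value: float
    has_mrna: bool  # whether the assembly contains at least one mRNA


def pathway_assemblies(hits, pathway_genes, max_p, require_mrna):
    """Return sorted assembly IDs with a hit to a pathway gene at p <= max_p,
    optionally keeping only assemblies that include an mRNA."""
    return sorted({
        h.assembly_id
        for h in hits
        if h.pathway_gene in pathway_genes
        and h.p_value <= max_p
        and (h.has_mrna or not require_mrna)
    })


# Illustrative subset of insulin-signaling genes (not the curated pathway).
INSULIN_PATHWAY = {"INSR", "IRS1", "PIK3R1"}
hits = [
    Hit("DT.1", "INSR", 1e-80, True),
    Hit("DT.2", "IRS1", 1e-3, True),     # too weak a hit at p <= 1e-10
    Hit("DT.3", "PIK3R1", 1e-40, False),  # EST-only assembly
    Hit("DT.4", "GCG", 1e-90, True),     # gene outside the pathway set
]
print(pathway_assemblies(hits, INSULIN_PATHWAY, 1e-10, require_mrna=True))   # ['DT.1']
print(pathway_assemblies(hits, INSULIN_PATHWAY, 1e-10, require_mrna=False))  # ['DT.1', 'DT.3']
```

Tightening the p-value or requiring an mRNA can only shrink the result set, which is why these choices are offered as parameters on the query form rather than fixed.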
Return to the gene information query page and select clone sets. Choose chip 2.0, then submit
There are 3242 assemblies represented on chip 2.0
Go to the history page, select the queries to combine, select intersect, and view the results
There are 8 assemblies that satisfy the complex query.
Clicking on an RNA retrieves an allgenes report.
Acknowledgements and References
The Plasmodium Genome Consortium
Sanger http://www.sanger.ac.uk/Projects/P_falciparum
TIGR/NMRC http://www.tigr.org/tdb/edb2/pfal/htmls
Stanford http://sequence-www.stanford.edu/group/malaria/
The many researchers who have contributed data and software to the database
Funding Agencies
National Institutes of Health, Wellcome Trust, US Dep’t of Defense,
Burroughs Wellcome Fund, World Health Organization, etc
The research community that has supported these large-scale ventures for the benefit of all
References
1. Davidson, S.B., Crabtree, J., Brunk, B.P., Schug, J., Tannen, V., Overton, G.C., Stoeckert, C.J. Jr. (2001) K2/Kleisli and GUS: Experiments in integrated access to genomic data sources. IBM Systems Journal 40(2):512-531.
2. Stoeckert, C., Pizarro, A., Manduchi, E., Gibson, M., Brunk, B., Crabtree, J., Schug, J., Shen-Orr, S., Overton, G.C. (2001) A relational schema for both array-based and SAGE gene expression experiments. Bioinformatics 17(4):300-308.
3. The GUS schema is available at http://www.allgenes.org/cgi-bin/schemaBrowser.pl
4. The RAD schema is available at http://www.cbil.upenn.edu/cgi-bin/RAD2/schemaBrowserRAD.pl
Acknowledgements
EPConDB is part of the NIDDK-sponsored consortium on "Functional
Genomics of the Developing Endocrine Pancreas". We gratefully
acknowledge support through NIDDK 56947 and 56954 with
cosponsorship from the JDFI.
allgenes.org
Funding for allgenes.org is provided by NIH grant R01-HG01539-03 and DOE grant DE-FG02-00ER62893
References
Bahl, A., Brunk, B., Coppel, R.L., Crabtree, J., Diskin, S.J., Fraunholz, M.J., Grant, G.R., Gupta, D., Huestis, R.L., Kissinger, J.C., Labo, P., Li, L., McWeeney, S.K., Milgram, A.J., Roos, D.S., Schug, J., Stoeckert, C.J. (2002) PlasmoDB: The Plasmodium Genome Resource. An integrated database providing tools for accessing and analyzing mapping, expression and sequence data (both finished and unfinished). Nucleic Acids Res. 30:87-90.
Davidson, S.B., Crabtree, J., Brunk, B.P., Schug, J., Tannen, V., Overton, G.C., Stoeckert, C.J. Jr. (2001) K2/Kleisli and GUS: Experiments in integrated access to genomic data sources. IBM Systems Journal 40(2):512-531.
Scearce, L.M., Brestelli, J.E., McWeeney, S.K., Lee, C.S., Mazzarelli, J., Pinney, D.F., Pizarro, A., Stoeckert, C.J. Jr., Clifton, S., Permutt, M.A., Brown, J., Melton, D.A., Kaestner, K.H. (2002) Functional genomics of the endocrine pancreas: the pancreas clone set and PancChip, new resources for diabetes research. Diabetes 51:1997-2004.
The Plasmodium Genome Database Collaborative (2001) PlasmoDB: An integrative database of the Plasmodium falciparum genome. Tools for accessing and analyzing finished and unfinished sequence data. Nucleic Acids Res. 29(1):66-69.