Transcript Slide 1
PIR and caBIG™
Baris E. Suzek, Hongzhan Huang, Hsing-Kuo Hua, Peter McGarvey and Cathy H. Wu
Protein Information Resource, Georgetown University, Washington, DC 20007
The cancer Biomedical Informatics Grid, or caBIG™, is a voluntary network
or grid connecting individuals and institutions to enable the sharing of data and
tools, creating a World Wide Web of cancer research.
https://cabig.nci.nih.gov/
The goal is to speed the delivery of innovative approaches for the prevention and treatment of cancer. The
infrastructure and tools created by caBIG™ also have broad utility outside the cancer community. caBIG™ is being
developed under the leadership of the National Cancer Institute's Center for Bioinformatics.
Grid-Enablement of Protein Information Resource (gridPIR)
PIR developed one of four reference projects for the first year of caBIG.
The initial development phase of gridPIR has been completed, and UniProtKB data is currently available and
searchable on the first release of caGrid.
[Figure: gridPIR Development Process Flow. Labels: Enterprise Vocabulary System Annotations; Common Data Elements; Object Model; caBIG Query Language. Example shown: search proteins for a gene using the caBIG Query Language. caGrid browser: http://cagrid-browser.nci.nih.gov/]
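The "search proteins for a gene" example can be sketched as a query document. The snippet below is a minimal illustration only: it assumes the CQL 1 layout (a Target element with nested Association and Attribute elements) and the CQL 1 namespace, and the class names, role name, and attribute name are hypothetical placeholders, not the published gridPIR object model. Check all of them against the caGrid schema and the registered model before use.

```python
import xml.etree.ElementTree as ET

# Assumed CQL 1 namespace; verify against the caGrid schema.
CQL_NS = "http://CQL.caBIG/1/gov.nih.nci.cagrid.CQLQuery"

# Hypothetical class, role, and attribute names, used for illustration only.
PROTEIN_CLASS = "edu.georgetown.pir.domain.Protein"
GENE_CLASS = "edu.georgetown.pir.domain.Gene"
GENE_ROLE = "geneCollection"
GENE_ATTRIBUTE = "name"


def build_protein_by_gene_query(gene_name: str) -> str:
    """Return a CQL-style XML query for proteins associated with a gene name."""
    ET.register_namespace("", CQL_NS)  # serialize with a default namespace
    query = ET.Element(f"{{{CQL_NS}}}CQLQuery")
    # Target: the class of objects the query should return.
    target = ET.SubElement(query, f"{{{CQL_NS}}}Target", {"name": PROTEIN_CLASS})
    # Association: restrict targets to those linked to a matching gene.
    association = ET.SubElement(
        target,
        f"{{{CQL_NS}}}Association",
        {"name": GENE_CLASS, "roleName": GENE_ROLE},
    )
    # Attribute: the condition the associated gene must satisfy.
    ET.SubElement(
        association,
        f"{{{CQL_NS}}}Attribute",
        {"name": GENE_ATTRIBUTE, "value": gene_name, "predicate": "EQUAL_TO"},
    )
    return ET.tostring(query, encoding="unicode")


if __name__ == "__main__":
    print(build_protein_by_gene_query("BRCA1"))
```

Building the query with an XML library rather than string concatenation keeps the gene name properly escaped; the resulting document would then be submitted to a grid data service, which is outside the scope of this sketch.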
[Figure: System Overview of iProClass, the Integrated Protein Knowledgebase, and the database categories it integrates. Lists are abbreviated with "…" in the figure.]
• Protein Sequence: UniProt, UniRef, UniParc, RefSeq, GenPept
• Family: PIRSF, InterPro, Pfam, Prosite, COG
• Structure: PDB, SCOP, CATH, PDBSum, MMDB
• Function/Pathway: EC-IUBMB, KEGG, BioCarta, EcoCyc, WIT
• Gene/Genome: GenBank/EMBL/DDBJ, LocusLink, UniGene, MGI, TIGR
• Gene Expression: GEO, GXD, ArrayExpress, CleanEx, SOURCE
• Protein Expression: Swiss-2DPAGE, PMG
• Modification: RESID, PhosphoBase
• Interaction: DIP, BIND
• Disease/Variation: OMIM, HapMap
• Ontology: GO
• Taxonomy: NCBI Taxon, NEWT
• Literature: PubMed
SEED (http://theseed.uchicago.edu/FIG/index.cgi) is a powerful tool for the analysis and annotation of genomes. The University of Chicago and Argonne National Laboratory have modified SEED for caBIG to better serve the cancer research community. PIR, as the caBIG adopter for this project, worked with the developers to help design and test the SEED modifications.
[Figure label: Object Model]
GeneConnect
GeneConnect is an identifier mapping service designed to facilitate data integration and semantic interoperability. PIR is the caBIG adopter for this project, working with the developers to help design and test the GeneConnect software.
[Figure: GeneConnect identifier mappings, with panels labeled Proposed and Revised. Database IDs are linked by Direct Annotation, Inferred Annotation, or Sequence Alignment. Thick lines indicate links where exact and inexact matches are possible.]
• Gene: Ensembl Gene, Entrez Gene, UniGene
• mRNA: Ensembl Transcript, RefSeq mRNA, GenBank mRNA (no RefSeq)
• Protein: Ensembl Protein, RefSeq Protein, GenBank Protein (no RefSeq), UniProtKB
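The idea of mapping one GeneConnect identifier type to another through intermediate types can be sketched as a shortest-path search over a graph. This is an illustration under stated assumptions, not the GeneConnect implementation: the node names come from the figure, but the links in `HYPOTHETICAL_LINKS` are invented for the example, and the function name is likewise hypothetical.

```python
from collections import deque

# Identifier types are taken from the figure; the links between them are
# HYPOTHETICAL and chosen only to make the example runnable.
HYPOTHETICAL_LINKS = {
    "Ensembl Gene": ["Ensembl Transcript", "Entrez Gene"],
    "Entrez Gene": ["Ensembl Gene", "UniGene", "RefSeq mRNA"],
    "UniGene": ["Entrez Gene", "GenBank mRNA"],
    "Ensembl Transcript": ["Ensembl Gene", "Ensembl Protein"],
    "RefSeq mRNA": ["Entrez Gene", "RefSeq Protein"],
    "GenBank mRNA": ["UniGene", "GenBank Protein"],
    "Ensembl Protein": ["Ensembl Transcript", "UniProtKB"],
    "RefSeq Protein": ["RefSeq mRNA", "UniProtKB"],
    "GenBank Protein": ["GenBank mRNA", "UniProtKB"],
    "UniProtKB": ["Ensembl Protein", "RefSeq Protein", "GenBank Protein"],
}


def shortest_mapping_path(source, target, links=HYPOTHETICAL_LINKS):
    """Breadth-first search for the shortest chain of identifier types."""
    if source == target:
        return [source]
    queue = deque([[source]])  # each queue entry is a path from source
    seen = {source}
    while queue:
        path = queue.popleft()
        for neighbor in links.get(path[-1], []):
            if neighbor in seen:
                continue
            if neighbor == target:
                return path + [neighbor]
            seen.add(neighbor)
            queue.append(path + [neighbor])
    return None  # no chain of links connects the two identifier types


if __name__ == "__main__":
    print(shortest_mapping_path("Entrez Gene", "UniProtKB"))
```

With these illustrative links, a gene-level identifier reaches UniProtKB by passing through an mRNA-level and a protein-level identifier, mirroring the Gene, mRNA, and Protein tiers in the figure.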
Vocabulary and Common Data Elements (VCDE) Workspace Participation
• Mentoring and reviewing caBIG-funded software projects to ensure caBIG compatibility guidelines are met
• Development and/or review of data and vocabulary standards to enhance semantic interoperability
[Figure labels: SEED Web Interface; System Overview]
http://pir.georgetown.edu/pirwww/cabig.shtml
Contact
[email protected]