Outline - Gene Ontology

Joined up ontologies: incorporating the Gene Ontology into the UMLS
The Gene Ontology (GO)
Controlled vocabulary for describing molecular biology
• hierarchical
• multiple parentage allowed
• defined terms
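
A minimal Python sketch of what "hierarchical, multiple parentage allowed" can mean in practice. The parent links below are illustrative assumptions, not a copy of the GO graph, and `PARENTS` and `ancestors` are hypothetical names.

```python
# Toy directed acyclic graph: each term maps to the list of its parents.
# The links are illustrative assumptions, not the actual GO structure.
PARENTS = {
    "molecular function": [],
    "binding": ["molecular function"],
    "catalytic activity": ["molecular function"],
    "nucleic acid binding": ["binding"],
    "DNA binding": ["nucleic acid binding"],
    "helicase activity": ["catalytic activity"],
    # A term with two parents: this is what "multiple parentage" allows.
    "DNA helicase activity": ["DNA binding", "helicase activity"],
}


def ancestors(term, parents=PARENTS):
    """Return every term reachable by following parent links upward."""
    seen = set()
    stack = list(parents[term])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


if __name__ == "__main__":
    print(sorted(ancestors("DNA helicase activity")))
```

Because a term may have several parents, the structure is a graph rather than a strict tree, so the traversal tracks visited terms to avoid counting a shared ancestor twice.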
Structure of GO
(Created using the tool GenNav, developed at NLM)
The ontologies
For a gene product:
• What does it do? → molecular function
• What processes is it involved in? → biological process
• Where does it act? → cellular component
Gene annotation: assigning GO terms to gene products
• Genes or gene products
• GO terms “linked” to gene products
• Gene products annotated to all 3 ontologies
• May be linked to more than one term in each ontology
Example GO terms: nucleus; regulation of transcription; ATP dependent helicase; DNA binding
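
The annotation model on this slide can be sketched as a small data structure. This is an illustration under stated assumptions: `Annotation`, `annotate`, and the gene product name `geneX` are hypothetical, and only the four term names come from the slide.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    gene_product: str   # hypothetical identifier
    ontology: str       # one of the three GO ontologies
    term: str           # GO term name


ONTOLOGIES = ("molecular function", "biological process", "cellular component")


def annotate(annotations):
    """Group term names by gene product, then by ontology."""
    grouped = defaultdict(lambda: {o: [] for o in ONTOLOGIES})
    for a in annotations:
        if a.ontology not in ONTOLOGIES:
            raise ValueError(f"unknown ontology: {a.ontology}")
        grouped[a.gene_product][a.ontology].append(a.term)
    return dict(grouped)


# "geneX" is a made-up gene product; the term names come from the slide.
EXAMPLE = [
    Annotation("geneX", "cellular component", "nucleus"),
    Annotation("geneX", "biological process", "regulation of transcription"),
    Annotation("geneX", "molecular function", "ATP dependent helicase"),
    Annotation("geneX", "molecular function", "DNA binding"),
]

if __name__ == "__main__":
    for ontology, terms in annotate(EXAMPLE)["geneX"].items():
        print(ontology, "->", terms)
```

Storing a list per ontology reflects the point that a gene product may be linked to more than one term in each ontology.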
Queries across databases
[Figure: gene products in the fly, rat, mouse and yeast databases, annotated with GO terms such as nuclease; signal transducer; DNA binding; helicase; regulation of transcription; membrane; osmosensory signaling pathway; cytoplasm; toxin catabolism; nucleus; mitotic cell cycle]
Find me all gene products with ‘DNA binding activity’…
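
A query of this kind might look like the sketch below. The gene product names and their term assignments are invented for illustration (the slide does not say which terms belong to which organism); `DATABASES` and `find_gene_products` are hypothetical names.

```python
# Hypothetical per-database annotations: gene product -> set of GO terms.
# Gene product names and their term assignments are invented for illustration.
DATABASES = {
    "fly": {"fly_gene_1": {"nuclease", "DNA binding", "nucleus"}},
    "rat": {"rat_gene_1": {"signal transducer", "membrane"}},
    "mouse": {"mouse_gene_1": {"DNA binding", "helicase", "nucleus"}},
    "yeast": {"yeast_gene_1": {"toxin catabolism", "cytoplasm"}},
}


def find_gene_products(term, databases=DATABASES):
    """Return sorted (database, gene product) pairs annotated with `term`."""
    hits = []
    for db_name, products in databases.items():
        for product, terms in products.items():
            if term in terms:
                hits.append((db_name, product))
    return sorted(hits)


if __name__ == "__main__":
    print(find_gene_products("DNA binding"))
```

The query works across organisms only because every database uses the same shared vocabulary for its annotations.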
Associating with different levels of ontology
(Created using the tool GenNav, developed at NLM)
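
One consequence of annotating at different levels is that a query for a general term should also return gene products annotated to more specific terms. A minimal sketch, assuming an invented fragment of the hierarchy and made-up gene products (`PARENTS`, `DIRECT`, `is_a`, and `products_for` are hypothetical names):

```python
# Illustrative child -> parents links (assumed, not copied from GO).
PARENTS = {
    "binding": [],
    "nucleic acid binding": ["binding"],
    "DNA binding": ["nucleic acid binding"],
    "RNA binding": ["nucleic acid binding"],
}

# Hypothetical gene products, each annotated at a different level of detail.
DIRECT = {
    "product_a": "nucleic acid binding",  # little is known
    "product_b": "DNA binding",           # more specific evidence
    "product_c": "RNA binding",
}


def is_a(term, target, parents=PARENTS):
    """True if `term` is `target` or has `target` among its ancestors."""
    if term == target:
        return True
    return any(is_a(p, target, parents) for p in parents[term])


def products_for(target, direct=DIRECT):
    """Gene products annotated to `target` or to any more specific term."""
    return sorted(p for p, term in direct.items() if is_a(term, target))


if __name__ == "__main__":
    print(products_for("nucleic acid binding"))
    print(products_for("DNA binding"))
```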
GO and other systems
• Useful to equate GO with other systems
• Mappings files
    e.g. ec2go
• References in GO
    as dbxrefs e.g. BioCyc
• References in other systems
    e.g. BRENDA (in process)
    UMLS Metathesaurus
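
A mappings file such as ec2go can be read with a few lines of Python. The sketch below assumes a line layout of the form `EC:… > GO:term name ; GO:id` with `!` marking comment lines; that layout, the sample line in the comment, and the names `LINE_RE` and `parse_mapping` are assumptions for illustration, so check them against the actual file before relying on this.

```python
import re

# Assumed line layout of an external2go-style mapping file, e.g.:
#   EC:1.1.1.1 > GO:alcohol dehydrogenase (NAD+) activity ; GO:0004022
# Lines starting with "!" are treated as comments.
LINE_RE = re.compile(
    r"^(?P<ext>\S+)\s+>\s+GO:(?P<name>.+?)\s+;\s+(?P<go_id>GO:\d{7})$"
)


def parse_mapping(lines):
    """Return {external id: [(GO id, GO term name), ...]}."""
    mapping = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("!"):
            continue
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip anything that does not fit the assumed layout
        mapping.setdefault(match["ext"], []).append(
            (match["go_id"], match["name"])
        )
    return mapping


if __name__ == "__main__":
    sample = [
        "! illustrative sample, not copied from a released file",
        "EC:1.1.1.1 > GO:alcohol dehydrogenase (NAD+) activity ; GO:0004022",
    ]
    print(parse_mapping(sample))
```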

GO into UMLS
Unified Medical Language System
• Long-term project at NLM
• Three parts: Specialist Lexicon; Semantic Network; Metathesaurus
• Metathesaurus interrelates biomedical vocabularies
• Includes ~60 vocabularies, including SNOMED and MeSH
Inserting GO into UMLS
• inversion: converting GO to correct format for UMLS
• insertion: inserting GO using matching algorithms
• editing: all concepts containing GO term reviewed by hand
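
The slides do not describe the matching algorithms used at insertion. As a rough illustration of the general idea only, the sketch below uses naive lexical normalization (lowercase, strip punctuation, ignore word order) to propose matches between GO term names and names already in a target vocabulary. The concept identifier and the names `normalize` and `match_terms` are hypothetical, and any proposed match would still need the hand review the editing step calls for.

```python
import re


def normalize(name):
    """Lowercase, drop punctuation, and sort words so word order is ignored."""
    words = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(sorted(words))


def match_terms(go_terms, existing_concepts):
    """Split GO terms into those matching an existing concept and the rest.

    go_terms: {GO id: term name}
    existing_concepts: {concept id: [names already in the target vocabulary]}
    """
    index = {}
    for concept_id, names in existing_concepts.items():
        for name in names:
            index.setdefault(normalize(name), concept_id)

    matched, unmatched = {}, []
    for go_id, name in go_terms.items():
        concept_id = index.get(normalize(name))
        if concept_id is None:
            unmatched.append(go_id)
        else:
            matched[go_id] = concept_id
    return matched, unmatched


if __name__ == "__main__":
    # "CONCEPT_1" and "GO:9999999" are placeholders, not real identifiers.
    go = {"GO:0003678": "DNA helicase", "GO:9999999": "made-up term"}
    existing = {"CONCEPT_1": ["Helicase, DNA"]}
    print(match_terms(go, existing))
```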
Statistics
% of GO in sources with other concepts, by source
• CSP2002 (Computer Retrieval of Information on Scientific Projects Thesaurus): 7.34 %
• SNMI98 (Systematized Nomenclature of Human and Veterinary Medicine): 11.05 %
• MSH2003_2002_08_14 (Medical Subject Headings): 19.74 %
[Figure: overlap of GO with SNOMED, CRISP and MeSH]
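
A percentage of this kind can be computed as the share of GO concepts that also contain a term from another source. The sketch below assumes concepts are represented as sets of identifiers; the function name `percent_shared` and the toy identifiers are hypothetical and unrelated to the figures on the slide.

```python
def percent_shared(go_concepts, source_concepts):
    """Percentage of GO concepts that also contain a term from the source."""
    if not go_concepts:
        return 0.0
    shared = go_concepts & source_concepts
    return round(100.0 * len(shared) / len(go_concepts), 2)


if __name__ == "__main__":
    # Toy identifiers only; these are not real concept IDs or real counts.
    go = {"c1", "c2", "c3", "c4"}
    other_source = {"c2", "c9"}
    print(percent_shared(go, other_source))
```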
Potential applications
Mining abstracts using GO terms:
DNA helicase ; GO:0003678 → UMLS (GO <-> MeSH) → MeSH term
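
The idea of going from a GO term through a shared concept to a MeSH term, and from there to abstracts, can be sketched as follows. The concept identifier is a placeholder, the MeSH heading is an assumed example, and the substring search stands in for whatever indexing a real system would use; `CONCEPTS`, `mesh_terms_for_go`, and `abstracts_mentioning` are hypothetical names.

```python
# Placeholder concept identifier; the slides give no real Metathesaurus ID.
CONCEPTS = {
    "CONCEPT_1": {
        "GO": [("GO:0003678", "DNA helicase")],
        "MSH": ["DNA Helicases"],  # assumed MeSH heading, for illustration
    },
}


def mesh_terms_for_go(go_id, concepts=CONCEPTS):
    """Return MeSH terms that share a concept with the given GO id."""
    terms = []
    for atoms in concepts.values():
        if any(gid == go_id for gid, _ in atoms.get("GO", [])):
            terms.extend(atoms.get("MSH", []))
    return terms


def abstracts_mentioning(go_id, abstracts, concepts=CONCEPTS):
    """Indices of abstracts containing any MeSH term linked to the GO id."""
    needles = [t.lower() for t in mesh_terms_for_go(go_id, concepts)]
    return [
        i for i, text in enumerate(abstracts)
        if any(n in text.lower() for n in needles)
    ]


if __name__ == "__main__":
    docs = ["We studied DNA helicases in yeast.", "Unrelated text."]
    print(abstracts_mentioning("GO:0003678", docs))
```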
Status of GO into UMLS
• Molecular function ontology already inserted
• Hope to insert other two ontologies by April
• Release GO with UMLS by end of year
www.geneontology.org
• FlyBase & Berkeley Drosophila Genome Project
• Saccharomyces Genome Database
• PomBase (Sanger Institute)
• Rat Genome Database
• Genome Knowledge Base (CSHL)
• The Institute for Genomic Research
• Compugen, Inc
• The Arabidopsis Information Resource
• WormBase
• DictyBase
• Mouse Genome Informatics
• Swiss-Prot/TrEMBL/InterPro
• Pathogen Sequencing Unit (Sanger Institute)
• National Library of Medicine
    • Alexa McCray
    • Stuart Nelson
    • Bill Hole
• Oak Ridge Institute for Science and Education
• National Library of Medicine
• U. S. Department of Energy
The Gene Ontology Consortium is supported by an R01 grant from the National Human Genome Research Institute (NHGRI) [grant HG02273]. SGD is supported by a
P41, National Resources, grant from the NHGRI [grant HG01315]; MGD by a P41 from the NHGRI [grant HG00330]; GXD by the National Institute of Child Health and
Human Development [grant HD33745]; FlyBase by a P41 from the NHGRI [grant HG00739] and by the Medical Research Council, London. TAIR is supported by the
National Science Foundation [grant DBI-9978564]. WormBase is supported by a P41, National Resources, grant from the NHGRI [grant HG02223]; RGD is supported by
an R01 grant from the NHLBI [grant HL64541]; DictyBase is supported by an R01 grant from the NIGMS [grant GM064426].