
The Gene Ontology Project:
Content for the Semantic Web
GO Project Goals
• Compile structured vocabularies describing
aspects of molecular biology
• Describe gene products using vocabulary terms
(annotation)
• Develop tools:
• to query and modify the vocabularies and
annotations
• annotation tools for curators
GO Data
GO provides two bodies of data:
• Terms with definitions and cross-references
• Gene product annotations with
supporting data
The Three Ontologies
• Molecular Function: elemental activity or task
e.g. nuclease, DNA binding, transcription factor
• Biological Process: broad objective or goal
e.g. mitosis, signal transduction, metabolism
• Cellular Component: location or complex
e.g. nucleus, ribosome, origin recognition complex
DAG Structure
Directed acyclic graph: each child
may have one or more parents
Relationship Types
• is-a
subclass; a is a type of b
• part-of
physical part of (component)
subprocess of (process)
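A minimal sketch of how a DAG with typed relationships might be held in memory, assuming a plain list of (child, relationship, parent) edges. The term names are taken from examples in this talk, but the specific edges, the EDGES layout and the helper names are illustrative assumptions, not the GO project's own data model.

```python
from collections import defaultdict

# Hypothetical edges, written as (child, relationship, parent).
# The exact links between these terms are assumed for illustration.
EDGES = [
    ("chitin biosynthesis", "is-a", "chitin metabolism"),
    ("cuticle chitin metabolism", "is-a", "chitin metabolism"),
    ("cuticle chitin biosynthesis", "is-a", "chitin biosynthesis"),
    ("cuticle chitin biosynthesis", "is-a", "cuticle chitin metabolism"),
    ("mitotic chromosome condensation", "part-of", "mitosis"),
]


def build_parents(edges):
    """Map each child term to a list of (relationship, parent) pairs."""
    parents = defaultdict(list)
    for child, relationship, parent in edges:
        parents[child].append((relationship, parent))
    return parents


def ancestors(term, parents):
    """Collect every term reachable by following parent links upward."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for _relationship, parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


PARENTS = build_parents(EDGES)
```

In this sketch "cuticle chitin biosynthesis" has two parents, which is what distinguishes a DAG from a simple tree; ancestors() follows is-a and part-of links alike.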
The True Path Rule
Every path from a node back to the
root must be biologically accurate
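Whether a path is biologically accurate is a curator's judgment, so code cannot check the rule itself. What code can do is list every path that needs reviewing. The sketch below assumes a simple child-to-parents dictionary; the fragment uses the chitin terms that appear elsewhere in this talk, but the links between them are an assumption, and the fragment's top terms stand in for the root.

```python
def paths_to_root(term, parents):
    """Return every upward path from term as a list of term names."""
    term_parents = parents.get(term, [])
    if not term_parents:
        # No recorded parent: treat this term as the top of the fragment.
        return [[term]]
    paths = []
    for parent in term_parents:
        for path in paths_to_root(parent, parents):
            paths.append([term] + path)
    return paths


# Hypothetical fragment; the edges are assumed for illustration only.
PARENTS = {
    "chitin biosynthesis": ["chitin metabolism"],
    "chitin metabolism": ["cuticle synthesis", "cell wall biosynthesis"],
}

for path in paths_to_root("chitin biosynthesis", PARENTS):
    print(" -> ".join(path))
```

Under these assumed edges, a gene product annotated to chitin biosynthesis would inherit both a cuticle path and a cell wall path. If either is wrong for the organism in question, the structure breaks the rule and needs revising.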
GO Terms: Associated Data
• ID
• Text string
• Definition with source
• Synonyms (optional)
• Cross-references (optional)
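The fields listed above could be captured in a small record type. This is a sketch only: the class name, field names and validation are assumptions, and the values in the usage lines are placeholders rather than real GO data.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class GOTerm:
    term_id: str             # ID
    name: str                # text string
    definition: str          # definition
    definition_source: str   # source of the definition
    synonyms: List[str] = field(default_factory=list)  # optional
    xrefs: List[str] = field(default_factory=list)     # optional

    def __post_init__(self):
        # The four non-optional fields must be non-empty.
        required = {
            "term_id": self.term_id,
            "name": self.name,
            "definition": self.definition,
            "definition_source": self.definition_source,
        }
        for label, value in required.items():
            if not value:
                raise ValueError(f"{label} is required")


# Placeholder values, not a real GO entry.
example = GOTerm(
    term_id="GO:0000000",
    name="example term",
    definition="placeholder definition",
    definition_source="placeholder source",
)
```

Synonyms and cross-references default to empty lists, mirroring their optional status on the slide.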
GO Terms: Cross-References
• Enzyme Commission (EC)
• Transport Commission (TC)
• University of Minnesota Biocatalysis/
Biodegradation Database (UM-BBD)
• MetaCyc
GO Annotation
• Association between gene product and
applicable GO terms
• Provided by member databases
• Made by manual or automated methods
GO Annotation: Data
• Database object: gene or gene product
• GO term ID
• Reference
• publication or computational method
• Evidence supporting annotation
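As an illustration, the four items above can be treated as one record per annotation. The sketch assumes a hypothetical four-column, tab-separated line; this is not the format of the annotation files the member databases actually distribute, and all names and values here are placeholders.

```python
from typing import NamedTuple


class Annotation(NamedTuple):
    db_object: str   # gene or gene product
    term_id: str     # GO term ID
    reference: str   # publication or computational method
    evidence: str    # evidence supporting the annotation


def parse_annotation(line: str) -> Annotation:
    """Parse one hypothetical tab-separated annotation line."""
    fields = [part.strip() for part in line.rstrip("\n").split("\t")]
    if len(fields) != 4:
        raise ValueError(f"expected 4 tab-separated fields, got {len(fields)}")
    if any(not part for part in fields):
        raise ValueError("all four fields are required")
    return Annotation(*fields)


# Placeholder values only; not a real annotation.
record = parse_annotation("NNF1\tGO:0000000\tREF:placeholder\tEVIDENCE:placeholder\n")
print(record.db_object, record.term_id)
```

Rejecting lines with a missing field reflects the slide's point that every annotation carries both a reference and supporting evidence.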
DAG Structure
Annotate to any level within DAG
The Future of GO:
• Improve coverage:
• Developmental processes
• Physiological processes
• Relational database
• Support ontology development for
additional domains of biology
Terms outside the Scope of GO
• Names of gene products
• Protein domains
• Protein sequence features
• Phenotypes; diseases
• Anatomical terms
(except as part of terms
generated by cross-products)
The GOBO Proposal
• Global Open Biology Ontologies
• Umbrella site for shared genomics and
proteomics vocabularies
• Present incarnation: subdirectory within
GO repository:
ftp://ftp.geneontology.org/pub/go/gobo/README
www.geneontology.org
• FlyBase & Berkeley Drosophila Genome Project
• Saccharomyces Genome Database
• Mouse Genome Informatics
• The Arabidopsis Information Resource
• Swiss-Prot/TrEMBL/InterPro
• WormBase
• DictyBase
• Compugen, Inc
• Pathogen Sequencing Unit (Sanger Institute)
• PomBase (Sanger Institute)
• Rat Genome Database
• Genome Knowledge Base (CSHL)
• The Institute for Genomic Research
The Gene Ontology Consortium is
supported by NHGRI grant HG02273
(R01). The Gene Ontology project
thanks AstraZeneca for financial
support. The Stanford group
acknowledges a gift from Incyte
Genomics.
Conference:
Standards and Ontologies for
Functional Genomics (SOFG)
Towards unified ontologies for describing biology
and biomedicine
17 – 20 November 2002
Hinxton Hall Conference Centre
Hinxton, Cambridge, UK
www.ebi.ac.uk/SOFG/
First Standards and Ontologies
for Functional Genomics
(SOFG)
17-20 November 2002,
Hinxton, UK
Keynote Speakers
Ken Buetow, NCI, USA
Win Hide, SANBI, South Africa
Peter Karp, SRI International, USA
Aims and Objectives
• Bring together scientists developing
standards and ontologies, including biologists,
bioinformaticians and computer scientists
Topics
• Introduction to ontologies
• Tools for building ontologies
• GO and related ontologies
• Species-specific ontologies
• Implementation
• Inter-ontology mapping
• Ontologies for pathology, toxicology
• Chemical ontologies
Structure
• 3 keynote speakers
• ~20 invited talks
• 10 short talks selected from poster abstracts
• Panel discussion
• Parallel working groups/tutorials
Programme Committee
Michael Ashburner, University of Cambridge, UK (Chair)
Cathy Ball, Stanford University, USA
Mike Bittner, NHGRI, USA
Alvis Brazma, EMBL-EBI, UK
Catherine Brooksbank, EMBL-EBI, UK
Duncan Davidson, MRC HGU, Edinburgh, UK
Liz Ford, EMBL-EBI, UK
Midori Harris, EMBL-EBI, UK
Victor Markowitz, Gene Logic, USA
Helen Parkinson, EMBL-EBI, UK
John Quackenbush, TIGR, USA
Martin Ringwald, The Jackson Laboratories, USA
Steffen Schulze-Kremer, RZPD, Germany
Paul Spellman, U.C. Berkeley, USA
Robert Stevens, University of Manchester, UK
Chris Stoeckert, University of Pennsylvania, USA
URL
http://www.ebi.ac.uk/microarray/General/Events/SOFG/SOFG.html
The True Path Rule
chitin metabolism: before revision
[Figure: DAG fragment with the terms cell wall biosynthesis, cuticle synthesis, chitin metabolism, chitin biosynthesis, chitin catabolism]
The True Path Rule
chitin metabolism: after revision
[Figure: DAG fragment with the terms chitin metabolism, chitin biosynthesis, cuticle synthesis, cuticle chitin metabolism, cuticle chitin biosynthesis]
GOBO Criteria
• Open source
• Can be instantiated in DAML+OIL
or GO syntax
• Orthogonal
• Shared ID space
• Defined terms
DAG Cross-Products
hexose
glucose
fructose
metabolism
biosynthesis
catabolism
hexose metabolism
hexose biosynthesis
glucose biosynthesis
fructose biosynthesis
hexose catabolism
glucose catabolism
fructose catabolism
glucose metabolism
... etc.
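The combined terms listed above can be generated mechanically from the two smaller vocabularies. The sketch below assumes that glucose and fructose are children of hexose and that biosynthesis and catabolism are children of metabolism; the slide's ordering suggests this but does not state it, and the function names are illustrative.

```python
from itertools import product

SUBSTANCES = ["hexose", "glucose", "fructose"]
PROCESSES = ["metabolism", "biosynthesis", "catabolism"]

# Assumed parent links within each of the two small vocabularies.
SUBSTANCE_PARENT = {"glucose": "hexose", "fructose": "hexose"}
PROCESS_PARENT = {"biosynthesis": "metabolism", "catabolism": "metabolism"}


def cross_product_terms(substances, processes):
    """Combine every substance with every process into one term name."""
    return [f"{s} {p}" for s, p in product(substances, processes)]


def cross_product_parents(substance, process):
    """Derive the parents of a combined term from its two components."""
    parents = []
    if substance in SUBSTANCE_PARENT:
        parents.append(f"{SUBSTANCE_PARENT[substance]} {process}")
    if process in PROCESS_PARENT:
        parents.append(f"{substance} {PROCESS_PARENT[process]}")
    return parents


TERMS = cross_product_terms(SUBSTANCES, PROCESSES)
print(len(TERMS))
print(cross_product_parents("glucose", "biosynthesis"))
```

Three substances and three processes give nine combined terms. A term such as glucose biosynthesis picks up one parent from each axis, so the result is again a DAG rather than a tree.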
Some GOBO Ontologies
gene
gene_attribute
gene_structure SO
gene_variation ME
gene_product
gene_product_attribute
molecular_function GO
protein_family INTERPRO
phenotype
mutant phenotype
anatomy
For complete current draft see
ftp://ftp.geneontology.org/pub/go/gobo/README
What GO is NOT:
• Not a way to unify biological databases
• Not a dictated standard
• Does not define evolutionary relationships
• Additional ontologies needed to model
biology and experimentation
DAG Structure
mitosis
S.c. NNF1
mitotic chromosome condensation
S.c. BRN1, D.m. barren
Annotate to any level within DAG
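One consequence of annotating at any level is that a query on a general term should also return gene products annotated to its more specific descendants. The sketch below uses the two terms and three gene products from this slide; the dictionary layout and function names are assumptions made for illustration.

```python
# Parent term -> child terms, as drawn on the slide.
CHILDREN = {"mitosis": ["mitotic chromosome condensation"]}

# Direct annotations, as shown on the slide.
ANNOTATIONS = {
    "mitosis": ["S.c. NNF1"],
    "mitotic chromosome condensation": ["S.c. BRN1", "D.m. barren"],
}


def descendants(term, children):
    """Return every term below the given term, each listed once."""
    found = []
    seen = set()
    stack = list(children.get(term, []))
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        found.append(current)
        stack.extend(children.get(current, []))
    return found


def gene_products(term, children, annotations):
    """Gene products annotated to the term or to any of its descendants."""
    results = list(annotations.get(term, []))
    for descendant in descendants(term, children):
        for gene_product in annotations.get(descendant, []):
            if gene_product not in results:
                results.append(gene_product)
    return results


print(gene_products("mitosis", CHILDREN, ANNOTATIONS))
```

Querying "mitosis" returns all three gene products, while querying "mitotic chromosome condensation" returns only BRN1 and barren.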
Using GO Annotation:
Example Workflow
[Figure: labels shown are text, ID, synonyms, definition, cross-reference]