Gene Ontology (GO)
Bioinformatics master course
DNA/Protein structure-function analysis and prediction
Lecture 13: Protein Function
Centre for Integrative Bioinformatics VU (IBIVU)
Faculty of Sciences / Faculty of Earth & Life Sciences
Sequence-Structure-Function
[Diagram: Sequence, Structure, Function. Labels: Threading; Ab initio (folding: impossible but for the smallest structures); BLAST; function prediction from structure is very difficult]
Experimental
• Structural genomics
• Functional genomics
• Protein-protein interaction
• Metabolic pathways
• Expression data
Protein function categories
• Catalysis (enzymes)
• Binding – transport (active/passive)
– Protein-DNA/RNA binding (e.g. histones, transcription factors)
– Protein-protein interactions (e.g. antibody-lysozyme), determined experimentally by yeast two-hybrid (Y2H) or bacterial two-hybrid (B2H) screening
– Protein-fatty acid binding (e.g. apolipoproteins)
– Protein – small molecules (drug interaction, structure decoding)
• Structural component (e.g. crystallins)
• Regulation
• Signalling
• Transcription regulation
• Immune system
• Motor proteins (actin/myosin)
E + S ⇌ ES → E + P   (Km labels the binding step, kcat the catalytic step)
E = enzyme
S = substrate
ES = enzyme-substrate complex
P = product
Km = Michaelis constant
kcat = catalytic rate constant (turnover number)
kcat/Km = specificity constant (useful for comparison)
Catalytic properties of enzymes
[Plot: reaction rate V (moles/s) against substrate concentration [S]; the curve levels off at Vmax, and V = Vmax/2 at [S] = Km]
V = Vmax × [S] / (Km + [S])   (Michaelis-Menten equation)
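The Michaelis-Menten relation above can be tried out numerically. The following is a minimal sketch, not part of the lecture: the function names and the values chosen for Vmax, Km and kcat are hypothetical and serve only to show that the rate reaches Vmax/2 at [S] = Km and approaches Vmax at high [S].

```python
def michaelis_menten(s, vmax, km):
    """Reaction rate V at substrate concentration s (same units as km)."""
    if s < 0 or km <= 0:
        raise ValueError("s must be >= 0 and km must be > 0")
    return vmax * s / (km + s)


def specificity_constant(kcat, km):
    """kcat/Km, used to compare enzymes or substrates."""
    return kcat / km


if __name__ == "__main__":
    # Hypothetical parameters, chosen only for illustration.
    vmax, km = 10.0, 2.0
    for s in (0.5, 2.0, 20.0, 200.0):
        v = michaelis_menten(s, vmax, km)
        print(f"[S] = {s:6.1f}  V = {v:.3f}")
```

At [S] = Km the printed rate is half of Vmax, which is how Km is read off the saturation curve.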
Protein interaction domains
http://pawsonlab.mshri.on.ca/html/domains.html
Energy difference upon binding
Examples of protein interactions (and functional importance) include:
• Protein – protein (pathway analysis);
• Protein – small molecules (drug interaction, structure decoding);
• Protein – peptides, DNA/RNA (function analysis)
The change in Gibbs free energy upon protein-ligand binding can be monitored and expressed as follows:
ΔG = ΔH – TΔS
(H = enthalpy, S = entropy, T = temperature)
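The free-energy relation above is simple to evaluate. The sketch below is illustrative only: the function name and the enthalpy, entropy and temperature values are hypothetical, and the units (kJ/mol, kJ/(mol·K), K) are an assumption made so that the terms are consistent.

```python
def gibbs_free_energy_change(delta_h, delta_s, temperature):
    """Return delta G = delta H - T * delta S.

    delta_h in kJ/mol, delta_s in kJ/(mol*K), temperature in K (assumed units).
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive (kelvin)")
    return delta_h - temperature * delta_s


if __name__ == "__main__":
    # Hypothetical binding event: favourable enthalpy, unfavourable entropy.
    dg = gibbs_free_energy_change(delta_h=-50.0, delta_s=-0.1, temperature=298.15)
    print(f"delta G = {dg:.2f} kJ/mol")
    print("binding is favourable" if dg < 0 else "binding is unfavourable")
```

A negative result means binding is thermodynamically favourable under the assumed conditions; with these numbers the enthalpy gain outweighs the entropy loss.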
Protein function
• Many proteins combine functions
• Some immunoglobulin structures are thought to have more than 100 different functions (and active/binding sites)
• Alternative splicing can generate (partially) alternative structures
Protein function
Protein-protein interaction
Active site / binding cleft
Shape complementarity
Protein function evolution
Chymotrypsin
How to infer function
• Experiment
• Deduction from sequence
– Multiple sequence alignment – conservation patterns
– Homology searching
• Deduction from structure
– Threading
– Structure-structure comparison
– Homology modelling
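Conservation patterns in a multiple sequence alignment, listed above as one route to function, can be quantified per column. The sketch below is a simple illustration and not the lecture's method: the toy alignment is made up, and the score (fraction of the most common residue, gaps ignored) is just one of many possible conservation measures.

```python
from collections import Counter


def column_conservation(alignment):
    """Per-column fraction of the most common residue, ignoring gaps ('-')."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for i in range(length):
        column = [seq[i] for seq in alignment if seq[i] != "-"]
        if not column:
            scores.append(0.0)  # column consists only of gaps
            continue
        top_count = Counter(column).most_common(1)[0][1]
        scores.append(top_count / len(column))
    return scores


if __name__ == "__main__":
    # Hypothetical toy alignment, for illustration only.
    toy_alignment = ["GDSGGP", "GDSGGA", "GDSGGP", "GNSGGS"]
    for position, score in enumerate(column_conservation(toy_alignment), start=1):
        print(position, f"{score:.2f}")
```

Columns that score close to 1.0 are candidates for functionally important positions, such as active-site residues, and would normally be checked against structure or experiment.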
Cholesterol biosynthesis occurs primarily in eukaryotic cells. Cholesterol is necessary for membrane synthesis and is a precursor for steroid hormones and vitamin D. The pathway was previously assumed to be localized in the cytosol and ER, but more recent evidence suggests that many of its enzymes exist largely, if not exclusively, in the peroxisome (the enzymes listed in blue in the pathway to the left are thought to be at least partly peroxisomal). Patients with peroxisome biogenesis disorders (PBDs) have a variable deficiency in cholesterol biosynthesis.
Mevalonate plays a role in epithelial cancers: it can inhibit EGFR
Epidermal Growth Factor as a Clinical Target in Cancer
Introduction:
A malignant tumour is the product of uncontrolled cell proliferation. Cell growth is controlled by a delicate balance between growth-promoting and growth-inhibiting factors. In normal tissue the production and activity of these factors result in differentiated cells growing in a controlled and regulated manner that maintains the normal integrity and functioning of the organ. The malignant cell has evaded this control; the natural balance is disturbed (via a variety of mechanisms) and unregulated, aberrant cell growth occurs. A key driver for growth is the epidermal growth factor (EGF), and the receptor for EGF (the EGFR) has been implicated in the development and progression of a number of human solid tumours, including those of the lung, breast, prostate, colon, ovary, head and neck.
Energy housekeeping:
Adenosine diphosphate (ADP) – Adenosine triphosphate (ATP)
Metabolic networks
Glycolysis and Gluconeogenesis
KEGG database (Japan)
Gene Ontology (GO)
• Not a genome sequence database
• Developing three structured, controlled vocabularies (ontologies) to describe gene products
in terms of:
– biological process
– cellular component
– molecular function
in a species-independent manner
The GO ontology
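Each GO vocabulary is organised as a directed acyclic graph in which a term can have more than one parent. The sketch below shows how such a structure can be traversed to collect all ancestors of a term. It is a simplified illustration: the graph is a toy example rather than actual GO content, the last term is explicitly hypothetical, and the function and variable names are assumptions.

```python
from collections import deque

# Toy is_a graph: each term maps to its direct parents.
# Simplified for illustration; not taken from the real GO files.
IS_A = {
    "molecular_function": [],
    "catalytic activity": ["molecular_function"],
    "binding": ["molecular_function"],
    "hydrolase activity": ["catalytic activity"],
    "ATP binding": ["binding"],
    "ATPase-like term (hypothetical)": ["hydrolase activity", "ATP binding"],
}


def ancestors(term, is_a):
    """Return the set of all terms reachable from `term` via is_a links."""
    seen = set()
    queue = deque(is_a[term])
    while queue:
        parent = queue.popleft()
        if parent not in seen:
            seen.add(parent)
            queue.extend(is_a[parent])
    return seen


if __name__ == "__main__":
    for name in sorted(ancestors("ATPase-like term (hypothetical)", IS_A)):
        print(name)
```

Because a term inherits every ancestor, a gene product annotated with a specific term is implicitly annotated with all of its more general parents as well.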
Gene Ontology Members
• FlyBase - database for the fruitfly Drosophila melanogaster
• Berkeley Drosophila Genome Project (BDGP) - Drosophila informatics; GO database & software, Sequence Ontology development
• Saccharomyces Genome Database (SGD) - database for the budding yeast Saccharomyces cerevisiae
• Mouse Genome Database (MGD) & Gene Expression Database (GXD) - databases for the mouse Mus musculus
• The Arabidopsis Information Resource (TAIR) - database for the brassica family plant Arabidopsis thaliana
• WormBase - database for the nematode Caenorhabditis elegans
• EBI GOA project - annotation of UniProt (Swiss-Prot/TrEMBL/PIR) and InterPro databases
• Rat Genome Database (RGD) - database for the rat Rattus norvegicus
• DictyBase - informatics resource for the slime mold Dictyostelium discoideum
• GeneDB S. pombe - database for the fission yeast Schizosaccharomyces pombe (part of the Pathogen Sequencing Unit at the Wellcome Trust Sanger Institute)
• GeneDB for protozoa - databases for Plasmodium falciparum, Leishmania major, Trypanosoma brucei, and several other protozoan parasites (part of the Pathogen Sequencing Unit at the Wellcome Trust Sanger Institute)
• Genome Knowledge Base (GK) - a collaboration between Cold Spring Harbor Laboratory and EBI
• TIGR - The Institute for Genomic Research
• Gramene - A Comparative Mapping Resource for Monocots
• Compugen (with its Internet Research Engine)
• The Zebrafish Information Network (ZFIN) - reference datasets and information on Danio rerio