Information Encoding in Biological Molecules: DNA and

Protein Pathways and Pathway Databases
Shan Sundararaj
University of Alberta
Edmonton, AB
[email protected]
Lecture 4.3
1
Interactions → Networks → Pathways
• A collection of interactions defines a network
• Pathways are a subset of networks
– All pathways are networks of interactions,
however not all networks are pathways!
– Difference in the level of annotation/understanding
• We can define a pathway as a biological
network that relates to a known physiological
process or phenotype
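The relationship between interactions, networks, and pathways can be sketched in a few lines of Python. This is only an illustration: the node names, the `build_network` and `is_subnetwork` helpers, and the "process" annotation are all hypothetical and do not come from any pathway database.

```python
from collections import defaultdict

# Hypothetical interactions, written as (source, target) pairs.
interactions = [("A", "B"), ("B", "C"), ("C", "D"), ("B", "E")]


def build_network(pairs):
    """Collect pairwise interactions into an adjacency map (a network)."""
    network = defaultdict(set)
    for source, target in pairs:
        network[source].add(target)
        network[target]  # touch the key so every node appears in the map
    return dict(network)


def is_subnetwork(pathway_edges, network):
    """A pathway is a subset of a network: each of its edges must exist there."""
    return all(t in network.get(s, set()) for s, t in pathway_edges)


network = build_network(interactions)

# Hypothetical annotation: the chain A -> B -> C -> D is tied to a known process.
pathway = {
    "process": "hypothetical process X",
    "edges": [("A", "B"), ("B", "C"), ("C", "D")],
}

print(is_subnetwork(pathway["edges"], network))  # True
print(is_subnetwork([("A", "D")], network))      # False: no such interaction
```

The only thing that separates the pathway from the rest of the network here is the annotation attached to it, which mirrors the point that the difference lies in the level of understanding.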
Pathways
• However, there is no precise biological
definition of a pathway
• Our partitioning of networks into pathways is
somewhat arbitrary
– We choose the start/finish points based on
“important” or easily understood compounds
– Gives us the ability to conceptualize the mapping
of genotype → phenotype
Biological pathways
• There are three types of interactions that can be
mapped to pathways:
1) enzyme – ligand
• metabolic pathways
2) protein – protein
• cell signaling pathways
• complexes for cell processes
3) gene regulatory elements – gene products
• genetic networks
Pathways are inter-linked
[Figure labels: STIMULUS, signalling pathway, genetic network, metabolic pathway]
Metabolic Pathways
1993 Boehringer Mannheim GmbH - Biochemica
What the pathway represents
• Metabolites involved
• Enzymes/transport proteins
• Order of reactions
• General biological function
• Reaction rates
• Expression data
• Inhibitors, activators, alternate pathways
• Genetic regulatory information
Describing metabolic networks
• Classical biochemical pathways
– glycolysis, TCA cycle, etc.
• Stoichiometric modeling
– flux balance analysis, extreme pathways
• Kinetic modeling (CyberCell, E-cell, …)
– Need to accumulate comprehensive kinetic
information
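Stoichiometric modeling rests on the steady-state condition S·v = 0, where S is the stoichiometric matrix (metabolites × reactions) and v is the vector of reaction fluxes. The sketch below only checks that condition for a made-up three-reaction chain. Flux balance analysis would additionally optimize an objective under flux bounds, which is not shown. The toy network and the function names are assumptions for illustration.

```python
# Toy network (hypothetical): uptake -> A, A -> B, B -> export.
# Rows are metabolites (A, B); columns are reactions (v1, v2, v3).
S = [
    [1, -1, 0],   # A: produced by v1, consumed by v2
    [0, 1, -1],   # B: produced by v2, consumed by v3
]


def net_production(S, v):
    """Compute S·v: the net production rate of each metabolite."""
    return [sum(s_ij * v_j for s_ij, v_j in zip(row, v)) for row in S]


def is_steady_state(S, v, tol=1e-9):
    """At steady state, no metabolite accumulates or is depleted."""
    return all(abs(rate) <= tol for rate in net_production(S, v))


print(net_production(S, [2, 2, 2]), is_steady_state(S, [2, 2, 2]))
print(net_production(S, [2, 1, 1]), is_steady_state(S, [2, 1, 1]))
```

In the second flux vector, metabolite A is produced faster than it is consumed, so the distribution is not a steady state. Note that nothing here needs kinetic parameters, which is the appeal of the stoichiometric approach when comprehensive kinetic information is missing.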
Complexity
• Pathways involve multiple enzymes, which may have
multiple subunits, alternate forms, alternate
specificities
• Enzymes may be involved in multiple pathways
• Malate dehydrogenase appears in 6 different metabolic
pathways in some databases
Metabolic Pathway Reconstruction
• Given a genomic sequence, we can infer
what metabolic pathways are available to an
organism
• Used to design culture medium for
Tropheryma whipplei by seeing what nutrients
were essential for growth (Renesto et al.,
Lancet, 362, 447-449, 2003)
Co-expression within pathways
• Tempting thought: genes that occur within the same
pathway will show similar expression profiles
• Reality: depends greatly on how you identify your
pathways; KEGG pathways show at best 50% co-expression in a survey of available yeast expression
data (Ihmels et al., Nat Biotechnol. 22, 86-92, 2004).
• Expression levels do not correlate very well with
protein interactions (unless they are “stable”
complexes, maintained in many different conditions)
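One simple way to test the "tempting thought" is to compute pairwise correlations between the expression profiles of genes assigned to the same pathway. The sketch below uses Pearson correlation and an arbitrary cutoff of 0.8. The profiles, gene names, and threshold are hypothetical, and this is not the method used by Ihmels et al.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (must not be constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def coexpressed_fraction(profiles, threshold=0.8):
    """Fraction of gene pairs whose correlation reaches the threshold."""
    pairs = list(combinations(profiles, 2))
    hits = sum(
        1 for g, h in pairs if pearson(profiles[g], profiles[h]) >= threshold
    )
    return hits / len(pairs)


# Hypothetical expression profiles for three genes in one pathway.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
}

print(coexpressed_fraction(profiles))  # one of three pairs is co-expressed
```

Here geneA and geneB rise together while geneC falls, so only one pair out of three passes the cutoff even though all three genes sit in the same pathway.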
Pathway Databases
• KEGG
• BioCyc
• Reactome
• GenMAPP
• BioCarta
• TransPATH
• …175 more at Pathway Resource List
http://www.cbio.mskcc.org/prl/index.php
BioPAX
(www.biopax.org)
• Collaborative effort to create a data exchange
format for biological pathway data
KEGG
• 5904 chemical reactions
• 15,037 pathways
• 229 reference pathways
• 85 ortholog tables
• 181 organisms
http://www.genome.ad.jp/kegg/
KEGG
• GENES Database
– The universe of genes and proteins in complete
genomes
• LIGAND Database
– The universe of chemical reactions involving
metabolites and other biochemical compounds
• Pathway Database
– Molecular interaction networks, metabolic and
regulatory pathways, and molecular complexes
Connection between KEGG and other
Databases
Pathways
• Represented as
diagrams, manually
created, stored as gifs
• Easy to link to, highlight
genes of interest
• Generate orthologous
pathways in other
organisms
[Figure: pathway diagram; EC numbers shown: 2.7.2.4, 1.2.1.11, 1.1.1.3, 2.3.1.46, 2.5.1.48, 4.4.1.8, 2.1.1.13, 2.5.1.6]
http://www.biocyc.org/
BioCyc
• The primary database was EcoCyc (E. coli)
• 21 more curated pathway/genome databases (PGDB),
each focusing on one organism (e.g. HumanCyc)
– Also 142 more non-curated (computationally generated) pathways
• MetaCyc database contains non-redundant reference
pathways from more than 240 organisms
• Supports “Pathway Tools” software suite to analyze
PGDBs, and “PathoLogic” pathway prediction program
for new genomes
BioCyc
• Each PGDB includes info about:
– Pathways, reactions, substrates
– Enzymes, transporters
– Genes, replicons
– Transcription factors, promoters, operons, DNA binding sites
• MetaCyc and EcoCyc are literature-based, the others are computationally derived
[Figure labels: Pathways; Reactions; Compounds; Proteins; Genes; Operons, Promoters, DNA Binding Sites; Chromosomes, Plasmids]
• 164 datasets
• Query by protein, gene, compound, reaction, pathway
• BLAST sequence if protein name unknown
MetaCyc Statistics
EcoCyc Statistics
BioCyc: Pathway Tools
(Adapted from Pathway Tools tutorial, http://bioinformatics.ai.sri.com/ptools/)
• Full Metabolic Map
– Paint gene expression data on
metabolic network; compare
metabolic networks
• Pathways
– Pathway prediction (PathoLogic)
• Reactions
– Balance checker
• Compounds
– Chemical substructure comparison
• Enzymes, Transcription Factors
• Genes: Blast search
• Operons
– Operon prediction
PathoLogic – Making PGDBs
Completeness of Pathways
Issues with predicting pathways
• Predicting metabolic pathways from genome:
– Predict genes
– Assign enzymatic function to genes
– Look for enzymes unique to pathway
– Check if pathway is “balanced” (no holes)
– Try to fill holes by re-searching genome
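The middle steps of this list (unique enzymes, then holes) can be sketched as set operations once genes have been assigned enzymatic functions. This is a simplified illustration, not the PathoLogic algorithm. The pathway names, the placeholder enzyme labels such as `EC_A`, and all function names are hypothetical.

```python
from collections import Counter

# Hypothetical reference pathways: name -> enzyme labels of its reactions.
reference_pathways = {
    "pathway_1": ["EC_A", "EC_B", "EC_C"],
    "pathway_2": ["EC_B", "EC_D"],
}

# Hypothetical result of gene prediction plus function assignment.
gene_functions = {"gene1": "EC_A", "gene2": "EC_C", "gene3": "EC_B"}


def unique_enzymes(pathways):
    """Enzymes that occur in exactly one reference pathway."""
    counts = Counter(ec for ecs in pathways.values() for ec in set(ecs))
    return {ec for ec, n in counts.items() if n == 1}


def predict_pathways(pathways, found):
    """Call a pathway present if an enzyme unique to it was found."""
    unique = unique_enzymes(pathways)
    return sorted(
        name
        for name, ecs in pathways.items()
        if any(ec in unique and ec in found for ec in ecs)
    )


def find_holes(pathways, found):
    """For each pathway, list the reactions with no enzyme in the genome."""
    return {
        name: [ec for ec in ecs if ec not in found]
        for name, ecs in pathways.items()
    }


found = set(gene_functions.values())
print(predict_pathways(reference_pathways, found))
print(find_holes(reference_pathways, found))
```

In this toy case `pathway_2` is not called, because the shared enzyme `EC_B` alone is not evidence for it, and `EC_D` is reported as a hole. That hole is the kind of gap the last step would try to fill by searching the genome again.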
Reactome
http://www.reactome.org/
Reactome
• Joint venture of CSHL and EBI (supersedes
the Genome Knowledgebase project)
• Curated database of biological processes in
humans
– Also rat, mouse, fugu, zebrafish, chicken
• Everything referenced by curators to literature
citation or inference based on sequence
similarity
Reactome model
• Model reactions: (input_entities) → (output_entities)
• Distinguishes between modified/unmodified
proteins (modification is an explicit reaction)
• Highly annotated at every step, very
micromanaged, hope to find interesting links
between reactions
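A reaction-centred model of this kind can be sketched with two small record types. This is not Reactome's actual schema. The class names, field names, and the example protein are hypothetical, and the sketch only illustrates that a modified protein is a distinct entity produced by an explicit reaction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    name: str
    modification: str = ""  # empty string means unmodified


@dataclass(frozen=True)
class Reaction:
    name: str
    inputs: tuple
    outputs: tuple
    evidence: str  # literature citation or sequence-similarity inference


# The modified and unmodified forms are two different entities.
protein = Entity("ProteinX")
phospho_protein = Entity("ProteinX", "phosphorylated")

# The modification itself is modelled as an explicit reaction.
phosphorylation = Reaction(
    name="phosphorylation of ProteinX",
    inputs=(protein, Entity("ATP")),
    outputs=(phospho_protein, Entity("ADP")),
    evidence="hypothetical literature citation",
)

print(protein == phospho_protein)                  # False
print(phospho_protein in phosphorylation.outputs)  # True
```

Because every reaction carries its own evidence field, each step can be annotated separately, which is what makes links between reactions searchable later.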
Reactome: PathFinder
• Pathfinding
between distant
processes
• Enter two
molecules or
events and see if
they can be
joined together
by reactions
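Joining two molecules by reactions is a graph search problem. The sketch below uses a breadth-first search over a made-up reaction set. It is an assumption about how such a tool could work, not a description of PathFinder's implementation, and it simplifies by treating a reaction as usable from any single one of its inputs.

```python
from collections import deque

# Hypothetical reactions: name -> (input molecules, output molecules).
reactions = {
    "R1": ({"A"}, {"B"}),
    "R2": ({"B"}, {"C"}),
    "R3": ({"C"}, {"D"}),
    "R4": ({"X"}, {"Y"}),
}


def find_route(start, goal, reactions):
    """Breadth-first search; returns a shortest list of reaction names, or None."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        molecule, route = queue.popleft()
        if molecule == goal:
            return route
        for name, (inputs, outputs) in reactions.items():
            if molecule not in inputs:
                continue
            for product in sorted(outputs):
                if product not in seen:
                    seen.add(product)
                    queue.append((product, route + [name]))
    return None


print(find_route("A", "D", reactions))  # ['R1', 'R2', 'R3']
print(find_route("A", "Y", reactions))  # None: the two cannot be joined
```

Breadth-first search returns a route with the fewest reactions, which is a reasonable default when the goal is simply to see whether two distant processes connect.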
Reactome: SkyPainter
• Find all reactions that contain a molecule or
event
– Very flexible input, any one or more of:
• protein/gene ID (UniProt, Genbank or others)
• protein/gene sequence
• GO or OMIM identifier
• time series from a gene expression study
Reactome: SkyPainter
• Starry sky output
• If expression data used, you get different colours for
different levels of expression
• If time series available, you can make an
animation
GenMAPP
(www.genmapp.org)
• Designed to rapidly analyze gene profiling
data in the context of known biochemical
pathways
• Pathways (MAPPs) are authored by experts;
several are also adapted from KEGG
• Pathways easily web-queryable
• Free for all users
• But… Windows platform only
GenMAPP
• Easy to draw/edit pathways
• Color genes from user-imported expression data
MAPPFinder – maps to GO ontology
BioCarta
(www.biocarta.com)
BioCarta
• Not a public database, but offers free,
clickable, graphics-rich pathway database
and gene information
– Community annotation
• Easy-to-use glyph system for genes
• 355 pathways
– mostly human/mouse metabolic and signaling
pathways
TransPATH
TransPATH
• Part of larger BioBase package (commercial)
• PathwayBuilder package for network
visualization
• Highly integrated with signaling networks and
transcription factor networks (TransFAC)
• Linked to extensive enzyme information in
BRENDA (www.brenda.uni-koeln.de/)
• 28,456 molecules; 52,007 reactions; 54 hand-drawn pathways
Pathway Database Comparison
| | KEGG | BioCyc | GenMAPP | Reactome | BioCarta | TransPATH |
|---|---|---|---|---|---|---|
| Organisms | 181 (varied) | E. coli, human (20 others) | Human, mouse, rat, fly, yeast | Human, rat, mouse, chicken, fugu, zebrafish | Human, mouse | Human, mouse |
| Pathway types | Metabolic, genetic, signaling, complexes | Metabolic, complexes | Metabolic, signaling, complexes | Metabolic, signaling, complexes | Metabolic, signaling, complexes | Signaling, genetic |
| Tools/visualization | linked to from many | Pathway Tools | GenMAPP | PathView applets | none | Pathway Builder |
| Images | Static box flow diagrams | Detailed flow diagrams | Static box flow diagrams | “starry sky” | “Graphics rich” cell diagrams | Graphics rich cell diagrams |
| Download formats | KGML XML | BioPax, SBML | MAPP format | SBML, MySQL | Just images | Proprietary XML files |
Conclusion
• Pathway databases are continually evolving,
and provide an important mid-level abstraction for
expressing data, between genes/proteins and
observable phenotypes
• Metabolic pathways are the best
studied/modeled
• Many different formats of storage and display,
but moving towards standards (PSI-MI,
BioPAX)