From Databases to Dynamics
Dr. Raquell M Holmes
Center for Computational Science
• Computational Biology Bioinformatics
– More than sequences, database searches,
statistics or image analysis.
• A part of Computational Science
– Using mathematical modeling, simulation and
– Complementing theory and experiment
– Publicly available
• Why so many?
– Metabolic Pathway databases
• How do we find and visualize information?
• How did they get it?
• Beyond information to behaviors
• Expression data
• Dynamic behaviors
Biologists search for function at all levels
RNA secondary structures
Genomic information/mapping: RicBase
Center for biotechnology
folds, motifs, active sites
Jeong et al 2001. Nature 411,41
• Databases enable biologists to access large
amounts of research data.
• Commercial, publicly available
• Downloads, web accessible
• 100’s to 1000’s
Publicly available databases
Search methods: Navigation
Over 11 Data Categories
Sequence: DNA, RNA, protein
Structure: genomics, protein, carbohydrate
Networks: metabolic enzymes and pathways (signaling)
Organisms: human/vertebrate, human genes and diseases
Expression: microarray and gene expression, proteomics
• Nucleic Acids Research
– Dedicated to review of databases (548-’04, 386-’03)
• Most discussion focuses on sequence databases.
Why so many?
Where do they get their data?
Many types of Protein Databases
• Sequence (103):
– publication, organism, sequence type, function
• Protein properties (80):
– Motifs, active sites, individual families, localization
• Structure (15)
– Resolution, experiment type, type of chain, space
Nucl. Ac. Res. 32, D3 2004
Where data comes from…Exp.
Why do we care?
– Sequencing data: Gene or protein identity
– Enzymatic assays: Biochemical properties
– Expression studies: Localization, putative
cellular function, regulation patterns
– Protein interactions: complexes and networks
Where data comes from…Comp.
Why do we care?
• Annotated sequences
– Align sequences (full/partial)
• Homology to other genes, Identity
– Pattern recognition, property predictions
• Biochemical properties
– Motifs, profiles, families
– Enzyme activity and structure prediction
– Homologous function
• From protein sequence/name
• To Metabolic pathway databases
• Metabolic behaviors
• Modeling glycolysis
Discussing a familiar pathway
Karp: Cell and Molecular Biology Textbook
Generic Protein Seq. Record
E.C. #: 220.127.116.11
Publication: Stachelek et al 1986
Function: main glucose phosphorylating enzyme
Links: other databases or tools
– families, folds…
• Swissprot example for enzyme.
From protein (enzyme) sequence
• Conserved domains, active sites, folding
• How do we find the related pathway?
What is a metabolic pathway?
Contains a series of reactions.
Reactions: Metabolites (substrate, product),
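The pathway/reaction/metabolite relationship above can be sketched as a small data structure. This is only an illustration: the class and field names are assumptions, not the schema of any database named in these slides, and the example uses the first two steps of glycolysis.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Reaction:
    """One enzyme-catalyzed step: substrates are converted to products."""
    enzyme: str
    substrates: List[str]
    products: List[str]


@dataclass
class Pathway:
    """A metabolic pathway as an ordered series of reactions."""
    name: str
    reactions: List[Reaction] = field(default_factory=list)

    def metabolites(self) -> List[str]:
        """Return every metabolite once, in order of first appearance."""
        seen: List[str] = []
        for rxn in self.reactions:
            for metabolite in rxn.substrates + rxn.products:
                if metabolite not in seen:
                    seen.append(metabolite)
        return seen


# First two steps of glycolysis, used here as a familiar example.
glycolysis_start = Pathway(
    name="glycolysis (first two steps)",
    reactions=[
        Reaction("hexokinase",
                 ["D-glucose", "ATP"],
                 ["D-glucose 6-phosphate", "ADP"]),
        Reaction("glucose-6-phosphate isomerase",
                 ["D-glucose 6-phosphate"],
                 ["D-fructose 6-phosphate"]),
    ],
)

print(glycolysis_start.metabolites())
```

Keeping the enzyme on the reaction, rather than on the pathway, mirrors the question the slides ask next: how to get from an enzyme to its reaction and then to a pathway.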
From enzyme to reaction
• Are there databases of metabolites?
• How can we get from enzyme to pathways?
Databases on Molecular Networks
Metabolic Pathways from NAR (5):
• EcoCyc: http://www.ecocyc.org/
– Began with E. coli Genes and Metabolism
– BioCyc includes additional genomes
– Encyclopedia of genes and genomes
• Others: WIT2, PathDB, UMBDD
• Metabolism pathway databases
– Search by name or sequence
• Compounds, Reactions, Pathway, Genes
– Associated information
• Formulas, Names, Synonyms, Links to other
KEGG: search for compound or enzyme by key word.
Glucose, 82 hits
1. cpd:C00029 UDPglucose; UDP-D-glucose; UDP-glucose; Uridine diphosphate
2. cpd:C00031 D-Glucose; Grape sugar; Dextrose
3. cpd:C00092 D-Glucose 6-phosphate; Glucose 6-phosphate; Robison ester
79. cpd:C11911 dTDP-D-desosamine; dTDP-3-dimethylamino-3,4,6-trideoxy-D-glucose
80. cpd:C11915 dTDP-3-methyl-4-oxo-2,6-dideoxy-L-glucose
81. cpd:C11922 dTDP-4-oxo-2,6-dideoxy-D-glucose
82. cpd:C11925 dTDP-3-amino-3,6-dideoxy-D-glucose
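A hit list like the one above is easier to work with once each line is split into a rank, a compound identifier, and its synonyms. The sketch below parses lines in the layout shown on this slide. The function and type names are hypothetical, and the layout is assumed from the slide rather than from a KEGG specification.

```python
import re
from typing import NamedTuple, Tuple


class CompoundHit(NamedTuple):
    rank: int                # position in the hit list
    kegg_id: str             # e.g. "cpd:C00031"
    names: Tuple[str, ...]   # synonyms, split on ";"


# Layout assumed from the slide: "<rank>. cpd:C<5 digits> <name>; <name>; ..."
_HIT_PATTERN = re.compile(r"^\s*(\d+)\.\s+(cpd:C\d{5})\s+(.*)$")


def parse_hit(line: str) -> CompoundHit:
    """Split one hit line into rank, compound id, and synonyms."""
    match = _HIT_PATTERN.match(line)
    if match is None:
        raise ValueError(f"not a compound hit line: {line!r}")
    rank, kegg_id, rest = match.groups()
    names = tuple(n.strip() for n in rest.split(";") if n.strip())
    return CompoundHit(int(rank), kegg_id, names)


hit = parse_hit("2. cpd:C00031 D-Glucose; Grape sugar; Dextrose")
print(hit.kegg_id, hit.names)
```

With the hits parsed, the "hunt and peck" step becomes a filter, for example keeping only the hits whose first name is exactly "D-Glucose".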
NAME D-Glucose grape sugar Dextrose
List of reactions involving compound
List of enzymes acting on compound
– list of categories
Hunt and peck
• a sugar phosphate
• Keyword search
– Results are a list of data
• Proteins, compounds, reactions, pathways
• Hunt and peck
Visualization: Data levels
EcoCyc draws multi-level views
Based on the pathway
E. coli K-12 Reaction: 18.104.22.168
Gene reaction schematic
• Protein -> enzyme -> pathway database
• Pathway Database Content:
– Species specific (BioCyc), general (KEGG)
– Known and proposed enzymes, co-factors,
– Searches provide similar results (compounds,
reactions…) with different appearance.
Populating metabolic databases
Where do the pathways come from?
Databases: KEGG, WIT, BioCyc
Experimental data based on literature
Genomic data from other databases
Determination of metabolic pathway
Comparison to known pathways.
Karp et al 1999, TIBT
Linking Genome to Reactions: EC
EC number assignment
EC # provides information on catalyzed reaction and synonyms.
Search for EC# in EcoCyc database.
Assign reactions and possible pathway association.
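The three steps above (assign an EC number, look it up, attach reactions and candidate pathways) can be sketched with two lookup tables. This is a toy illustration, not the EcoCyc or PathoLogic implementation: the tables hold only three glycolytic enzymes, and the gene names and the EC number "9.9.9.9" are made-up placeholders.

```python
from typing import Dict, Set

# Toy reference tables standing in for a pathway database lookup.
EC_TO_REACTION: Dict[str, str] = {
    "2.7.1.1": "ATP + D-hexose -> ADP + D-hexose 6-phosphate",
    "5.3.1.9": "D-glucose 6-phosphate <=> D-fructose 6-phosphate",
    "2.7.1.11": "ATP + D-fructose 6-phosphate -> ADP + D-fructose 1,6-bisphosphate",
}
EC_TO_PATHWAYS: Dict[str, list] = {
    "2.7.1.1": ["glycolysis"],
    "5.3.1.9": ["glycolysis", "gluconeogenesis"],
    "2.7.1.11": ["glycolysis"],
}


def assign_pathways(gene_to_ec: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each candidate pathway to the EC numbers found in the genome."""
    candidates: Dict[str, Set[str]] = {}
    for gene, ec in gene_to_ec.items():
        if ec not in EC_TO_REACTION:
            continue  # EC number not in the reference tables: no assignment
        for pathway in EC_TO_PATHWAYS.get(ec, []):
            candidates.setdefault(pathway, set()).add(ec)
    return candidates


# Hypothetical annotated genome: gene names and "9.9.9.9" are placeholders.
example = assign_pathways(
    {"geneA": "2.7.1.1", "geneB": "5.3.1.9", "geneC": "9.9.9.9"}
)
print(example)
```

Note that one EC number can support more than one pathway, which is why a pathway assignment made this way still needs to be scored.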
How correct is the pathway assignment?
PathoLogic/EcoCyc scoring new species pathway
X = # reactions for pathway: 4
Y = # reactions found: 2
Z = # found in other pathways: 1
Karp et al., 1999. TIBT
Probably, possibly, not
• Probability score depends on the Y:X ratio (reactions found : reactions in pathway)
• 4,2,1 has a 2:4 ratio which equals 0.5
0 < Y:X < 0.5
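The 4,2,1 example can be checked with a few lines of code. This sketch only computes the fraction of a pathway's reactions that were found. The function name is hypothetical, and the cut-offs separating "probably", "possibly" and "not" are not reproduced here because they are not recoverable from the slide.

```python
def evidence_fraction(x_in_pathway: int, y_found: int) -> float:
    """Fraction of a pathway's reactions found in the new species.

    x_in_pathway: X, the number of reactions in the reference pathway.
    y_found:      Y, the number of those reactions found.
    """
    if x_in_pathway <= 0:
        raise ValueError("a pathway needs at least one reaction")
    if not 0 <= y_found <= x_in_pathway:
        raise ValueError("Y must lie between 0 and X")
    return y_found / x_in_pathway


# The slide's example: X=4, Y=2 (Z=1 is reported alongside but not used here).
print(evidence_fraction(4, 2))
```

Z is left out of the calculation on purpose: the slides report it next to X and Y but do not say how it enters the score.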
Pathway Evidence Glyph
Homo sapiens glycolysis pathway
Key to edge colors:
• green: reactions in which the enzyme is present
• black: reactions for which the enzyme is not identified in this
• orange: reactions in which the enzyme is unique to this pathway
• magenta: reactions that are spontaneous, or edges that do not
represent reactions at all (e.g. in polymerization pathways)
• Genomes used to create database of
• EC# link gene product to enzyme in
• Pathways in the database vary in degrees of
From Data to Dynamics
• Database information is static
– Just the facts
• What is the behavior of the pathway?
– Expression data