From Databases to Dynamics

Dr. Raquell M Holmes
Center for Computational Science
Boston University
Computational Biology
• Computational Biology ≠ Bioinformatics
– More than sequences, database searches,
statistics or image analysis.
• A part of Computational Science
– Using mathematical modeling, simulation and
– Complementing theory and experiment
• Databases:
– Publicly available
• Why so many?
– Metabolic Pathway databases
• How do we find and visualize information?
• How did they get it?
• Beyond information to behaviors
• Expression data
• Dynamic behaviors
Biologists search for function at all levels
Chromosomal structures
Genomic organization
RNA secondary structures
Genomic information/mapping: RicBase
Center for biotechnology
Protein properties:
folds, motifs, active sites
Protein interactions:
network control
Jeong et al. 2001. Nature 411, 41
• Databases enable biologists to access large
amounts of research data.
• Commercial, publicly available
• Downloads, web accessible
• 100s to 1000s
Publicly available databases
Web Interfaces
Search methods: Navigation
Analysis tools
Database Content
Over 11 Data Categories
Sequence: DNA, RNA, protein
Structure: genomics, protein, carbohydrate
Networks: metabolic enzymes and pathways (signaling)
Organisms: human/vertebrate, human genes and diseases
Expression: microarray and gene expression, proteomics
• Nucleic Acids Research
– Dedicated to review of databases (548 in '04, 386 in '03)
• Most discussion focuses on sequence databases.
Protein databases
Why so many?
Where do they get their data?
Many types of Protein Databases
• Sequence (103):
– publication, organism, sequence type, function
• Protein properties (80):
– Motifs, active sites, individual families, localization
• Structure (15)
– Resolution, experiment type, type of chain, space
Nucleic Acids Res. 32, D3 (2004)
Where data comes from…Exp.
Why do we care?
• Experiments
– Sequencing data: Gene or protein identity
– Enzymatic assays: Biochemical properties
– Expression studies: Localization, putative
cellular function, regulation patterns
– Protein interactions: complexes and networks
Where data comes from…Comp.
Why do we care?
• Annotated sequences
– Align sequences (full/partial)
• Homology to other genes, Identity
– Pattern recognition, property predictions
• Biochemical properties
– Motifs, profiles, families
– Enzyme activity and structure prediction
• Pathways
– Homologous function
Today’s journey
• From protein sequence/name
• To Metabolic pathway databases
• Metabolic behaviors
• Modeling glycolysis
Discussing a familiar pathway
Karp: Cell and Molecular Biology Textbook
Generic Protein Seq. Record
Sequence: hexokinase
E.C. #:
Publication: Stachelek et al 1986
Organism: Yeast
Function: main glucose phosphorylating enzyme
Links: other databases or tools
Pre-determined Properties:
– families, folds…
• Swiss-Prot example for enzyme.
From protein (enzyme) sequence
• Conserved domains, active sites, folding
• How do we find the related pathway?
What is a metabolic pathway?
Contains a series of reactions.
Reactions: Metabolites (substrate, product),
enzyme, co-factors.
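The definition above maps onto a small data structure. A minimal sketch in Python, assuming hypothetical class names `Reaction` and `Pathway` (they are not taken from any database schema) and using the first two steps of glycolysis as sample data:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Reaction:
    """One step: substrates -> products, catalyzed by an enzyme."""
    enzyme: str
    substrates: List[str]
    products: List[str]
    cofactors: List[str] = field(default_factory=list)


@dataclass
class Pathway:
    """A named, ordered series of reactions."""
    name: str
    reactions: List[Reaction] = field(default_factory=list)

    def metabolites(self) -> List[str]:
        """All distinct metabolites, in order of first appearance."""
        seen: List[str] = []
        for rxn in self.reactions:
            for m in rxn.substrates + rxn.products:
                if m not in seen:
                    seen.append(m)
        return seen


# Illustrative sample data: the first two reactions of glycolysis.
glycolysis_start = Pathway(
    name="glycolysis (first two steps)",
    reactions=[
        Reaction("hexokinase", ["D-glucose", "ATP"],
                 ["D-glucose 6-phosphate", "ADP"], cofactors=["Mg2+"]),
        Reaction("glucose-6-phosphate isomerase",
                 ["D-glucose 6-phosphate"], ["D-fructose 6-phosphate"]),
    ],
)
```

Calling `glycolysis_start.metabolites()` lists each metabolite once, which is the view a metabolite-centered database search starts from.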
From enzyme to reaction
• Are there databases of metabolites?
• How can we get from enzyme to pathways?
Databases on Molecular Networks
Metabolic Pathways from NAR (5):
• EcoCyc:
– Began with E. coli Genes and Metabolism
– BioCyc includes additional genomes
• KEGG:
– Kyoto Encyclopedia of Genes and Genomes
• Others: WIT2, PathDB, UM-BBD
Starting out…
• Metabolism pathway databases
– Search by name or sequence
• Compounds, Reactions, Pathway, Genes
– Associated information
• Formulas, Names, Synonyms, Links to other
Example results
KEGG: search for compound or enzyme by key word.
Glucose, 82 hits
Grape sugar:
1. cpd:C00029 UDPglucose; UDP-D-glucose; UDP-glucose; Uridine diphosphate
2. cpd:C00031 D-Glucose; Grape sugar; Dextrose
3. cpd:C00092 D-Glucose 6-phosphate; Glucose 6-phosphate; Robison ester
79. cpd:C11911 dTDP-D-desosamine; dTDP-3-dimethylamino-3,4,6-trideoxy-D-glucose
80. cpd:C11915 dTDP-3-methyl-4-oxo-2,6-dideoxy-L-glucose
81. cpd:C11922 dTDP-4-oxo-2,6-dideoxy-D-glucose
82. cpd:C11925 dTDP-3-amino-3,6-dideoxy-D-glucose
NAME D-Glucose grape sugar Dextrose
List of reactions involving compound
List of enzymes acting on compound
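A keyword search returns a flat list like the one above, so finding an entry by one of its synonyms means parsing each line. A minimal sketch in Python, assuming each hit has the form shown in the listing (identifier, a space, then names separated by semicolons) with the leading rank number already dropped. The names `CompoundHit`, `parse_hit` and `find_by_synonym` are hypothetical helpers, not part of any KEGG tool:

```python
from typing import List, NamedTuple


class CompoundHit(NamedTuple):
    kegg_id: str      # e.g. "C00031"
    names: List[str]  # primary name first, then synonyms


def parse_hit(line: str) -> CompoundHit:
    """Parse one listing line such as 'cpd:C00031 D-Glucose; Grape sugar'."""
    identifier, _, rest = line.strip().partition(" ")
    if not identifier.startswith("cpd:"):
        raise ValueError(f"not a compound hit: {line!r}")
    names = [n.strip() for n in rest.split(";") if n.strip()]
    return CompoundHit(identifier[len("cpd:"):], names)


def find_by_synonym(hits: List[CompoundHit], synonym: str) -> List[CompoundHit]:
    """Return hits whose name list contains the synonym (case-insensitive)."""
    wanted = synonym.lower()
    return [h for h in hits if wanted in (n.lower() for n in h.names)]


# Two lines copied from the example results above.
sample = [
    "cpd:C00031 D-Glucose; Grape sugar; Dextrose",
    "cpd:C00092 D-Glucose 6-phosphate; Glucose 6-phosphate; Robison ester",
]
hits = [parse_hit(s) for s in sample]
```

With these two hits, `find_by_synonym(hits, "grape sugar")` returns only the entry for C00031, matching the "Grape sugar" result in the listing.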
Navigation bar
– list of categories
• Compounds
Hunt and peck
• a sugar
• a sugar phosphate
• a sugar-1-phosphate
• Keyword search
– Results are a list of data
• Proteins, compounds, reactions, pathways
• Hunt and peck
Visualizing Data
Visualization: Data levels
EcoCyc draws multi-level views
Based on the pathway
Metabolite perspective
Additional detail
E. coli K-12 Reaction:
Gene reaction schematic
Different species
Summary I
• Protein -> enzyme -> pathway database
• Pathway Database Content:
– Species specific (BioCyc), general (KEGG)
– Known and proposed enzymes, co-factors,
– Searches provide similar results (compounds,
reactions…) with different appearance.
Populating metabolic databases
Where do the pathways come from?
Databases: KEGG, WIT, BioCyc
Experimental data based on literature
Genomic data from other databases
Determination of metabolic pathway
Comparison to known pathways.
Karp et al. 1999, TIBTECH
Linking Genome to Reactions: EC
Gene structure
Protein sequence
EC number assignment
EC # provides information on catalyzed reaction and synonyms.
Search for EC# in EcoCyc database.
Assign reactions and possible pathway association.
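The chain from gene product to EC number to reactions to candidate pathways can be sketched as a series of table lookups. This is a minimal illustration in Python, not the EcoCyc implementation: the three dictionaries are toy tables invented for the example, and `candidate_pathways` is a hypothetical helper. The only outside fact used is that hexokinase carries EC number 2.7.1.1.

```python
from typing import Dict, List

# Toy lookup tables. Contents are illustrative, not exported from EcoCyc.
GENE_TO_EC: Dict[str, str] = {"hexokinase gene": "2.7.1.1"}
EC_TO_REACTIONS: Dict[str, List[str]] = {
    "2.7.1.1": ["ATP + D-glucose -> ADP + D-glucose 6-phosphate"],
}
REACTION_TO_PATHWAYS: Dict[str, List[str]] = {
    "ATP + D-glucose -> ADP + D-glucose 6-phosphate": ["glycolysis"],
}


def candidate_pathways(gene: str) -> List[str]:
    """Follow gene -> EC number -> reactions -> pathways.

    Returns an empty list when the gene has no EC assignment.
    """
    ec = GENE_TO_EC.get(gene)
    if ec is None:
        return []
    pathways: List[str] = []
    for rxn in EC_TO_REACTIONS.get(ec, []):
        for pathway in REACTION_TO_PATHWAYS.get(rxn, []):
            if pathway not in pathways:
                pathways.append(pathway)
    return pathways
```

The result is only a list of possible pathway associations. Whether a pathway is actually present in the organism is a separate question that depends on how many of its reactions were found.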
How correct is the pathway assignment?
PathoLogic/EcoCyc scoring new species pathway
X = # reactions in the pathway (example: 4)
Y = # reactions found (example: 2)
Z = # found in other pathways (example: 1)
Karp et al. 1999, TIBTECH
Probably, possibly, not
• Probability score depends on the ratio of Y to X (Y/X)
• X, Y, Z = 4, 2, 1 gives Y/X = 2/4 = 0.5
• Probable: Y/X >= 0.5
• Possible: 0 < Y/X < 0.5
• Not: otherwise
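The three labels can be written as a small classification function. This is a sketch in Python under stated assumptions, not the PathoLogic code: the ratio is read as reactions found over reactions in the pathway (Y/X), which is the reading under which the example 4, 2, 1 gives 0.5; "not" is assumed to mean that no reactions were found; and Z is left out because neither surviving threshold uses it. The function name `pathway_evidence` is hypothetical.

```python
def pathway_evidence(x_total: int, y_found: int) -> str:
    """Label a pathway by the fraction of its reactions that were found.

    x_total: number of reactions in the reference pathway (X)
    y_found: number of those reactions found in the new genome (Y)
    """
    if x_total <= 0:
        raise ValueError("pathway must contain at least one reaction")
    if not 0 <= y_found <= x_total:
        raise ValueError("y_found must be between 0 and x_total")
    ratio = y_found / x_total
    if ratio >= 0.5:
        return "probable"
    if ratio > 0:
        return "possible"
    # Assumption: a pathway with no reactions found is labeled "not".
    return "not"
```

For the example on the slide, `pathway_evidence(4, 2)` falls exactly on the 0.5 boundary and is labeled probable.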
Pathway Evidence Glyph
Homo sapiens glycolysis pathway
E. coli pathway
Key to edge colors:
• green: reactions in which the enzyme is present
• black: reactions for which the enzyme is not identified in this
• orange: reactions in which the enzyme is unique to this pathway
• magenta: reactions that are spontaneous, or edges that do not
represent reactions at all (e.g. in polymerization pathways)
Summary II
• Genomes used to create database of
• EC# link gene product to enzyme in
• Pathways in the database vary in degrees of
From Data to Dynamics
• Database information is static
– Just the facts
• What is the behavior of the pathway?
– Expression data
– Dynamics