Automated Genome-scale Metabolic Network Generation and

Genome-scale Metabolic
Reconstruction and Modeling
of Microbial Life
Aaron Best, Biology
Matthew DeJongh, Computer Science
Nathan Tintle, Mathematics
Hope College, Holland, Michigan
Timeline of Collaboration

Fall 2004/Spring 2005
Best, DeJongh brainstorming
Sabbatical planning for DeJongh

Summer 2005
HHMI Faculty Development Grant to Best, DeJongh
Cultivate collaboration with Argonne National Lab
Student research support (NSF REU)

Fall 2005/Spring 2006
DeJongh on one-year sabbatical
Project-based bioinformatics course (CS/Bio/Chem students)

Summer 2006
HHMI Faculty Development Grant to Best, DeJongh, Tintle
Student research support (NSF REU, HHMI)

Fall 2006
Bioinformatics course runs a second time
Microbiology: wet-lab projects to test bioinformatics hypotheses

And so it begins…
Introduce the Big Picture -- Aaron
Bioinformatics Tools to Implement Reconstruction and Modeling -- Matt
Statistical Methods to Integrate Reconstructions in Data Analyses -- Nathan
Incorporate into the Curriculum
Reflections on Interdisciplinary Experience
The Genomics Era

Why Microbial Life?
Diversity: majority of life on earth
Tractable:
~400 complete genomes
Genome size range: 1 million to 10 million bases
Explore, Enrich, Exploit
Why Metabolic Modeling?
Links genotype with phenotype
Allows rational engineering of organisms:
Amino acid production in Corynebacterium
Bioremediation of toxic wastes from the environment
Alternative energy sources -- bioenergy
Metabolic Modeling
Genome Sequence Annotation
Genome-scale Metabolic Reconstruction (Qualitative Framework)
Genome-scale Metabolic Modeling (Quantitative Analyses)
Covert et al. (2001) Trends Biochem. Sci. 26:179-186.
Research Method
Reverse-engineer existing metabolic models that have been created by hand
Develop software for automating genome-scale metabolic reconstructions
Verify that our software regenerates the existing metabolic models accurately
Generate metabolic reconstructions for new organisms
Use metabolic reconstructions for quantitative analysis of phenotypic data
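
The verification step above can be illustrated with a small sketch. It is not the project's software: it simply assumes that both the hand-built model and the automatically generated network can be reduced to sets of reaction identifiers, and it reports which reactions are shared, missing, or extra. The function name and the identifiers R1 to R5 are hypothetical.

```python
def compare_reaction_sets(generated, curated):
    """Compare an automatically generated reaction set with a hand-curated one."""
    generated, curated = set(generated), set(curated)
    shared = generated & curated
    return {
        "shared": sorted(shared),
        "missing": sorted(curated - generated),  # only in the curated model
        "extra": sorted(generated - curated),    # only in the generated network
        # Fraction of curated reactions that were regenerated.
        "recall": len(shared) / len(curated) if curated else 1.0,
        # Fraction of generated reactions that the curated model confirms.
        "precision": len(shared) / len(generated) if generated else 1.0,
    }


# Hypothetical reaction identifiers, for illustration only.
curated = {"R1", "R2", "R3", "R4"}
generated = {"R1", "R2", "R3", "R5"}
report = compare_reaction_sets(generated, curated)
print(report["missing"], report["extra"], report["recall"])
```

Reactions listed under "missing" point to gaps in the automated reconstruction, while those under "extra" may be either errors or genuine additions worth reviewing by hand.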
Mapping Metabolic Pathways
Finding Paths through Networks
Linking Metabolic Subsystems
Reconstructing Networks for Other Organisms
Capitalizing on Common Aspects of Metabolism: Reuse of Scenarios

Category                Subsystems  Scenarios  E. coli  H. pylori  L. lactis
Amino Acids                     23         34       25         10         15
Carbohydrates                   15         39       35          6         23
Cell Wall                        3          8        6          4          7
Lipids                           3          9        9          2          1
Nitrogen Metabolism              1          1        1          0          0
Nucleotide Metabolism            6         22       21         14         19
One Carbon                       2          5        3          1          3
Redox                            5          3        3          1          1
Sulfur                           1          1        1          0          0
Vitamins and Cofactors           6         11        7          1          5
Totals                          65        133      111         40         74

To this point…
Created process to automate generation of metabolic networks from genome annotations
Currently extending tools to create metabolic networks for new organisms
Metabolic networks as resources
Interpretation of gene expression data
Interpretation of other “omics” data (large-scale data sets)
Gene Expression Data

Gene expression data from microarrays can give insight into
biological processes at work in specific organisms

Each location (probe) on the microarray corresponds to a particular
gene.

A typical microarray will produce data for tens of thousands of
genes under defined environmental conditions
Gene Expression Data
Typical analysis:


Examine all probes (locations) on the microarray for
over- and under-expressed (differentially expressed)
genes

Use statistical methods (e.g. Fisher’s exact test) to see
which biological processes are statistically overrepresented among the differentially expressed genes

This assumes we know which gene is involved in which
biological process
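
As a rough illustration of the overrepresentation step, the sketch below computes a one-sided Fisher's exact test p-value from the hypergeometric distribution, using only the Python standard library. The function name and the gene counts are made up for the example; they are not taken from the project's data.

```python
from math import comb


def overrepresentation_p(k, n, K, N):
    """One-sided Fisher's exact test p-value.

    N: genes on the array
    K: genes annotated to the biological process
    n: differentially expressed genes
    k: differentially expressed genes annotated to the process

    Returns P(X >= k) where X is hypergeometric with parameters (N, K, n).
    """
    upper = min(n, K)
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Made-up counts: 20 genes on the array, 5 in the process,
# 4 differentially expressed, 3 of those in the process.
p = overrepresentation_p(k=3, n=4, K=5, N=20)
print(round(p, 4))  # 155/4845, about 0.032
```

A small p-value suggests the process contains more differentially expressed genes than chance alone would explain. In practice the test is run for every process, so a multiple-testing correction would also be needed.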
Problems

Gene Ontology (GO) terms for biological processes
Attempt to standardize terminology for gene annotations
Use of GO terms is not consistent

Dimensionality
Microarray data have few replicates
Many standard statistical methods fail because of small sample sizes

Loss of Statistical Power
Statistical power (the ability to find genes that are truly differentially expressed) is lost as a result of these problems
One solution

First, impose a biological structure (e.g.,
metabolic reconstruction) on the microarray data

Then, look for over- and under-represented
groups of genes

Result: gain statistical power by grouping
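
One way to picture the gain from grouping is the sketch below. It is only an assumption about how grouping might be scored, not the statistical method the project adopted: genes are assigned to groups taken from a metabolic reconstruction, and each group gets a mean log-ratio and a t-like statistic (mean divided by its standard error). Gene names, group names, and values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def group_statistics(log_ratios, groups):
    """Score each gene group by its mean log-ratio and a t-like statistic.

    log_ratios: {gene: log expression ratio}
    groups: {group name: list of genes}, e.g. from a metabolic reconstruction
    Groups with fewer than two measured genes are skipped.
    """
    results = {}
    for name, genes in groups.items():
        values = [log_ratios[g] for g in genes if g in log_ratios]
        if len(values) < 2:
            continue
        m = mean(values)
        standard_error = stdev(values) / sqrt(len(values))
        # The statistic is undefined when all values in the group are identical.
        t = m / standard_error if standard_error > 0 else None
        results[name] = (m, t)
    return results


# Hypothetical data, for illustration only.
log_ratios = {"geneA": 1.0, "geneB": 2.0, "geneC": 3.0, "geneD": -0.5, "geneE": 0.5}
groups = {"scenario_1": ["geneA", "geneB", "geneC"], "scenario_2": ["geneD", "geneE"]}
scores = group_statistics(log_ratios, groups)
print(scores)
```

Because the standard error of a group mean shrinks with the square root of the group size, a consistent shift across several genes in one pathway can stand out even when no single gene does.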
Where we go from here…

Step 1. Validation of metabolic reconstruction
using gene expression data

Step 2. Implementation of currently available
statistical methods that capitalize on an imposed
data structure

Step 3. Refinement of statistical methods
Incorporation of Research into Curriculum
Address open scientific questions in systems biology using
bioinformatics and targeted experimentation, while training
undergraduates for careers in the sciences, mathematics,
engineering and technology fields.
Microbiology
Bioinformatics
Genome Annotation
Predicted Function
Created automated pipeline that uses the SEED
Standard genetics, biochemistry and molecular biology
Tool generation and curation by students
Experimentation by students in classroom lab
Genome-scale Metabolic Network
Validation of Function
The Projects thus far:
Bioinformatics
Toward the automatic reconstruction of genome-scale metabolic networks in the SEED. BMC Bioinformatics (2007), in review
4 undergraduate co-authors
Microbiology
1. Examination of a predicted L-threonine kinase required for coenzyme B12 biosynthesis in Streptomyces coelicolor and Salmonella typhimurium.
2. Validation of missing gene functions in the rhamnose metabolic pathway of Bacillus, Streptomyces, and Salmonella.
3. Predicted alternative N-formylglutamate deformylase in histidine catabolism.
Collaboration with Dr. Andrei Osterman, The Burnham Institute, San Diego
Linking the Bioinformatics and Experimental Pieces:
Annotation
Prediction/Validation
Network Generation
Identification of candidates for missing genes
Preliminary hypotheses (network analysis)
Validation of networks in gene expression data
Ranking via tools (e.g., functional variants, phylogenetic distribution, which parts of pathways are present)
Leverage networks to interpret gene expression data
Microbiology/Statistics Students
Modeling
Bioinformatics students
Future directions…
Spring 2008
First offering of revamped statistics course
Research Program
Publications
Continued incorporation into curriculum
Funding Opportunities
DOE, NSF, NIH
Acknowledgements
HHMI Faculty Research Development Grants
NSF REU to Computer Science Department
Argonne National Laboratory
Fellowship for Interpretation of Genomes (FIG)
The Burnham Institute
Hope College Students:
Bioinformatics classes 2005-2006
Microbiology class Fall 2006
