Automated Genome-scale Metabolic Network Generation and
Genome-scale Metabolic
Reconstruction and Modeling
of Microbial Life
Aaron Best, Biology
Matthew DeJongh, Computer Science
Nathan Tintle, Mathematics
Hope College, Holland, Michigan
Timeline of Collaboration
Fall 2004/Spring 2005
Summer 2005
DeJongh on 1 year sabbatical
Project-based bioinformatics course (CS/Bio/Chem students)
Summer 2006
HHMI Faculty Development Grant to Best, DeJongh
Cultivate collaboration with Argonne National Lab
Student research support (NSF REU)
Fall 2005/Spring 2006
Best, DeJongh brainstorming
Sabbatical planning for DeJongh
HHMI Faculty Development Grant to Best, DeJongh, Tintle
Student research support (NSF REU, HHMI)
Fall 2006
Bioinformatics course runs a second time
Microbiology - Wet-lab projects to test bioinformatics hypotheses
And so it begins…
Introduce the Big Picture -- Aaron
Bioinformatics Tools to Implement
Reconstruction and Modeling -- Matt
Statistical Methods to Integrate Reconstructions
in data analyses -- Nathan
Incorporate into the curriculum
Reflections on Interdisciplinary experience
The Genomics Era
Why Microbial Life?
Diversity: majority of life on earth
Tractable:
You are here
~400 complete genomes
Genome size range: 1 million to 10 million bases
Explore, Enrich, Exploit
Why Metabolic Modeling?
Links genotype to phenotype
Allows rational engineering of organisms
Amino acid production in Corynebacterium
Bioremediation of toxic wastes from environment
Alternative energy sources -- Bioenergy
Metabolic Modeling
Genome Sequence Annotation
Genome-scale Metabolic
Reconstruction
(Qualitative Framework)
Genome-scale Metabolic
Modeling
(Quantitative Analyses)
Covert et al. (2001) Trends Biochem. Sci. 26:179-186.
Research Method
Reverse-engineer existing metabolic models that have
been created by hand
Develop software for automating genome-scale metabolic
reconstructions
Verify that our software regenerates the existing
metabolic models accurately
Generate metabolic reconstructions for new organisms
Use metabolic reconstructions for quantitative analysis of
phenotypic data
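The verification step above can be sketched as a set comparison between a hand-built model and a regenerated one. This is a minimal illustration, not the project's software: the function name and the reaction identifiers are hypothetical, and it assumes each model can be reduced to a set of reaction IDs.

```python
def compare_reaction_sets(curated, generated):
    """Compare a hand-curated reaction set with an automatically generated one."""
    curated, generated = set(curated), set(generated)
    shared = curated & generated
    return {
        "missing": sorted(curated - generated),   # in the curated model only
        "extra": sorted(generated - curated),     # in the generated model only
        "recall": len(shared) / len(curated) if curated else 1.0,
        "precision": len(shared) / len(generated) if generated else 1.0,
    }

# Hypothetical reaction IDs, for illustration only.
report = compare_reaction_sets({"R1", "R2", "R3", "R4"}, {"R2", "R3", "R4", "R5"})
```

Reactions listed under "missing" point to gaps in the automated reconstruction; those under "extra" may be errors or candidate additions to the hand-built model.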
Mapping Metabolic Pathways
Finding Paths through Networks
Linking Metabolic Subsystems
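One way to picture path finding through a metabolic network is a breadth-first search over compounds connected by reactions. The sketch below is only an illustration under simplifying assumptions: the toy network and the function name are hypothetical, and any substrate-to-product link counts as a step, whether or not the reaction's other substrates are available.

```python
from collections import deque

# Hypothetical toy network: reaction name -> (substrates, products).
REACTIONS = {
    "hexokinase": ({"glucose", "ATP"}, {"G6P", "ADP"}),
    "pgi": ({"G6P"}, {"F6P"}),
    "pfk": ({"F6P", "ATP"}, {"FBP", "ADP"}),
}

def find_path(reactions, source, target):
    """Breadth-first search for a shortest chain of reactions from source to target."""
    queue = deque([(source, [])])
    seen = {source}
    while queue:
        compound, path = queue.popleft()
        if compound == target:
            return path
        for name, (substrates, products) in reactions.items():
            if compound in substrates:
                for product in sorted(products):
                    if product not in seen:
                        seen.add(product)
                        queue.append((product, path + [name]))
    return None  # no chain of reactions connects the two compounds

path = find_path(REACTIONS, "glucose", "FBP")
```

In a real network, ubiquitous compounds such as ATP would create misleading shortcuts, so they are usually excluded or handled separately.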
Reconstructing Networks for
Other Organisms
Capitalizing on Common Aspects of Metabolism: Reuse of Scenarios
Category
Amino Acids
Carbohydrates
Cell Wall
Lipids
Nitrogen Metabolism
Nucleotide Metabolism
One Carbon
Redox
Sulfur
Vitamins and Cofactors
Totals
Subsystems
23
15
3
3
Scenarios
34
39
8
9
E. coli
25
35
6
9
H. pylori
10
6
4
2
L. lactis
15
23
7
1
1
1
1
0
0
6
22
21
14
19
2
5
1
5
3
1
3
3
1
1
1
0
3
1
0
6
11
7
1
5
65
133
111
40
74
To this point…
Created process to automate generation of
metabolic networks from genome annotations
Currently extending tools to create metabolic
networks for new organisms
Metabolic networks as resources
Interpretation of gene expression data
Interpretation of other “omics” data (large-scale
data sets)
Gene Expression Data
Gene expression data from microarrays can give insight into
biological processes at work in specific organisms
Each location (probe) on the microarray corresponds to a particular
gene.
A typical microarray will produce data for tens of thousands of
genes under defined environmental conditions
Gene Expression Data
Typical analysis:
Examine all probes (locations) on the microarray for
over- and under-expressed (differentially expressed)
genes
Use statistical methods (e.g. Fisher’s exact test) to see
which biological processes are statistically overrepresented among the differentially expressed genes
This assumes we know which gene is involved in which
biological process
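The overrepresentation test mentioned above can be written as a one-sided Fisher's exact test, which is the upper tail of a hypergeometric distribution. The sketch below uses only the Python standard library; the function and parameter names are illustrative assumptions, not part of any tool named in this talk.

```python
from math import comb

def enrichment_p_value(n_genes, n_de, n_in_process, n_overlap):
    """One-sided Fisher's exact test.

    n_genes      -- genes measured on the array
    n_de         -- differentially expressed genes
    n_in_process -- genes annotated to the biological process
    n_overlap    -- differentially expressed genes annotated to the process
    Returns P(overlap >= n_overlap) if the n_de genes were drawn at random.
    """
    max_overlap = min(n_de, n_in_process)
    total = comb(n_genes, n_de)
    tail = sum(
        comb(n_in_process, k) * comb(n_genes - n_in_process, n_de - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total

# Toy numbers: 10 genes, 3 differentially expressed, all 3 in a 3-gene process.
p = enrichment_p_value(10, 3, 3, 3)
```

A small p-value means the process contains more differentially expressed genes than random selection would explain. The result is only as good as the gene-to-process annotation behind it.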
Problems
Gene Ontology (GO) terms for biological processes
Attempt to standardize terminology for gene annotations
Use of GO terms is not consistent
Dimensionality
Microarray data have few replicates
Many standard statistical methods fail because of small
sample size problems
Loss of Statistical Power
Statistical power (the ability to find genes that are
truly differentially expressed) is lost as a result of
these problems
One solution
First, impose a biological structure (e.g.,
metabolic reconstruction) on the microarray data
Then, look for over- and under-represented
groups of genes
Result: statistical power is gained by grouping
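The grouping idea can be illustrated with a simple permutation test on a group-level score. This is one generic approach, not necessarily the method used in this project: the gene names, scores, and function name are hypothetical, and the mean per-gene score stands in for whatever group statistic is chosen.

```python
import random
from statistics import mean

def group_permutation_test(scores, group, n_perm=2000, seed=0):
    """Return (observed mean score of the group, permutation p-value).

    scores -- dict mapping gene -> expression score (e.g. log fold change)
    group  -- genes assigned to one unit of the metabolic reconstruction
    """
    rng = random.Random(seed)
    genes = sorted(scores)
    observed = mean(scores[g] for g in group)
    hits = 0
    for _ in range(n_perm):
        # Draw a random gene set of the same size as the group.
        sample = rng.sample(genes, len(group))
        if abs(mean(scores[g] for g in sample)) >= abs(observed):
            hits += 1
    # Add-one correction keeps the estimated p-value above zero.
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical data: 20 genes, 4 of which form a group with elevated scores.
toy_scores = {f"gene{i}": 0.0 for i in range(20)}
toy_group = ["gene0", "gene1", "gene2", "gene3"]
for g in toy_group:
    toy_scores[g] = 2.0
observed, p_value = group_permutation_test(toy_scores, toy_group)
```

Testing one statistic per group rather than one per gene reduces the number of tests and pools evidence across related genes, which is where the gain in power comes from.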
Where we go from here…
Step 1. Validation of metabolic reconstruction
using gene expression data
Step 2. Implementation of currently available
statistical methods that capitalize on an imposed
data structure
Step 3. Refinement of statistical methods
Incorporation of Research into Curriculum
Address open scientific questions in systems biology using
bioinformatics and targeted experimentation, while training
undergraduates for careers in the sciences, mathematics,
engineering and technology fields.
Microbiology
Bioinformatics
Genome Annotation
Predicted Function
Created automated
pipeline that uses
the SEED
Standard genetics,
biochemistry and
molecular biology
Tool generation
and curation by
students
Experimentation
by students in
classroom lab
Genome-scale Metabolic
Network
Validation of Function
The Projects thus far:
Bioinformatics
Toward the automatic reconstruction of genome-scale metabolic networks
in the SEED. BMC Bioinformatics (2007), in review
4 undergraduate co-authors
Microbiology
1. Examination of a predicted L-threonine kinase required for coenzyme B12 biosynthesis in Streptomyces coelicolor and Salmonella typhimurium.
2. Validation of missing gene functions in the rhamnose metabolic pathway of Bacillus, Streptomyces, and Salmonella.
3. Predicted alternative N-formylglutamate deformylase in histidine catabolism.
Collaboration with Dr. Andrei Osterman, The Burnham Institute, San Diego
Linking the Bioinformatics and Experimental
Pieces:
Annotation
Prediction/Validation
Network Generation
Identification of candidates
for missing genes
Preliminary hypotheses
(network analysis)
Validation of networks in
gene expression data
Ranking via tools (e.g.,
functional variants,
phylogenetic distribution,
which parts of pathways
present)
Leverage networks to
interpret gene expression data
Microbiology/Statistics
Students
Modeling
Bioinformatics students
Future directions…
Spring 2008
First offering of revamped statistics course
Research Program
Publications
Continued incorporation into curriculum
Funding Opportunities
DOE, NSF, NIH
Acknowledgements
HHMI Faculty Research Development Grants
NSF REU to Computer Science Department
Argonne National Laboratory
Fellowship for the Interpretation of Genomes
(FIG)
The Burnham Institute
Hope College Students:
Bioinformatics classes 2005-2006
Microbiology class Fall 2006