U54 GM093342: “Enzyme Function Initiative” (EFI)
Enzyme Function Initiative Overview
John A. Gerlt, PI
Enzyme Function Initiative (EFI)
Advisory Committee Meeting
November 30, 2011
The number of protein sequences is “exploding”!
At least one-half have unknown/uncertain functions
Functional assignment: high-throughput computation?
U54 GM093342: “Enzyme Function Initiative” (EFI)
[Figure: sequence » structure » reaction » function, labeled bioinformatics, x-ray / computation, enzymology, biology. Panel labels: Group 1, Group 2, Group 3, Group 4, Outliers; plot of initial velocity vs. [substrate] marking V max and KM]
Illinois: John Gerlt, John Cronan, Jonathan Sweedler
Texas A&M: Frank Raushel
UCSF: Patricia Babbitt, Matthew Jacobson, Andrej Sali, Brian Shoichet
Boston University: Karen Allen
University of New Mexico: Debra Dunaway-Mariano
Albert Einstein: Steven Almo
University of Utah: C. Dale Poulter
University of Virginia: Wladek Minor
Vanderbilt University: Richard Armstrong
EFI: Deliverables
1. Develop a robust sequence/structure-based strategy
for facilitating discovery of in vitro enzymatic and in vivo
metabolic/physiological functions of unknown enzymes
discovered in genome projects.
2. Disseminate to the community the intellectual,
computational, and experimental tools, protocols,
materials, and guidelines for determining in vitro and in
vivo functions of unknown enzymes.
3. Collaborate with the community to facilitate
sequence/superfamily analyses as well as homology
modeling and in silico docking of ligand libraries to
unknown members of other enzyme superfamilies.
EFI’s “funnel” for functional discovery
Scientific Cores
• Superfamily/Genome (Patsy Babbitt, UCSF): Sequences,
genome context, operons
• Protein (Steve Almo, AECOM): Gene cloning/synthesis,
protein purification, ligand binding
• Structure (Steve Almo, AECOM): Crystallization and
structure determination (50 new structures/year)
• Computation (Matt Jacobson, Andrej Sali, Brian Shoichet;
UCSF): Functional prediction by homology modeling and in
silico ligand docking
• Microbiology (John Cronan and Jonathan Sweedler, UIUC):
Genetics, transcriptomics, metabolomics
• Data/Dissemination (Heidi Imker, UIUC, Wladek Minor, UVa,
and Patsy Babbitt, UCSF): EFI website, EFI-DB/LabDB, SFLD
Bridging Projects: targets from diverse superfamilies
• Amidohydrolase (Frank Raushel, TAMU): large/diverse
superfamily, single substrate, single domain
• Enolase (John Gerlt, UIUC): small/“simple” (?) superfamily,
single substrate, catalytic and specificity domains
• Glutathione Transferase (Richard Armstrong, Vanderbilt):
large/diverse superfamily, bisubstrate (“always” glutathione),
small molecule and protein substrates
• Haloalkanoic Acid Dehalogenase (Karen Allen, BU, and
Debra Dunaway-Mariano, UNM): large/diverse superfamily,
phosphomonoesterases, catalytic and specificity domains
• Isoprenoid Synthase (C. Dale Poulter, UU): one (cyclases) or
two (isoprenyl transfer) substrates, limited number of
substrates, product determined by active site shape
EFI pipeline: develop assignment strategy
Target Selection (Bioinformatics)
Ligand Binding (Experimental)
Homology Modeling (Computation)
Structure Determination (Experimental)
Ligand Docking (Computation)
Refinement/Rescoring (Computation)
Library Synthesis (Experimental)
Activity Measurement (Experimental)
EFI pipeline: if correct, functional assignment
Target Selection (Bioinformatics)
Ligand Binding (Experimental)
Homology Modeling (Computation)
Structure Determination (Experimental)
Ligand Docking (Computation)
Refinement/Rescoring (Computation)
Library Synthesis (Experimental)
Activity Measurement (Experimental)
High activity: in vitro Function
in vivo Testing (Experimental)
Phenotype: in vivo Function
Annotation Transfer (Bioinformatics)
EFI pipeline: if incorrect, inform and improve strategy
Target Selection (Bioinformatics)
Ligand Binding (Experimental)
Homology Modeling (Computation)
Structure Determination (Experimental)
Ligand Docking (Computation)
Refinement/Rescoring (Computation)
Library Synthesis (Experimental)
Activity Measurement (Experimental)
No/low activity
High activity: in vitro Function
in vivo Testing (Experimental)
Phenotype: in vivo Function
Annotation Transfer (Bioinformatics)
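The branching shown across the three pipeline slides above can be sketched in a few lines of Python. This is only an illustration of the flow, not EFI software: the `Outcome` class, the `route` function, and the numeric `cutoff` are hypothetical, since the slides distinguish only "high" from "no/low" activity without giving a threshold.

```python
from dataclasses import dataclass

# Stages up to the activity measurement, in the order they appear on the slides.
PIPELINE = [
    ("Target Selection", "Bioinformatics"),
    ("Ligand Binding", "Experimental"),
    ("Homology Modeling", "Computation"),
    ("Structure Determination", "Experimental"),
    ("Ligand Docking", "Computation"),
    ("Refinement/Rescoring", "Computation"),
    ("Library Synthesis", "Experimental"),
    ("Activity Measurement", "Experimental"),
]


@dataclass
class Outcome:
    assignment_supported: bool  # True when the predicted function shows high activity
    next_steps: list


def route(activity, cutoff):
    """Choose the branch that follows the activity measurement.

    `activity` and `cutoff` are hypothetical numbers in arbitrary units;
    the slides do not define what counts as "high" activity.
    """
    if activity >= cutoff:
        # High activity: in vitro function, then in vivo testing and annotation transfer.
        return Outcome(True, ["in vivo Testing", "Annotation Transfer"])
    # No/low activity: the result feeds back into the strategy.
    return Outcome(False, ["Inform and improve strategy"])


for stage, kind in PIPELINE:
    print(f"{stage} ({kind})")
print(route(activity=12.0, cutoff=1.0))
print(route(activity=0.0, cutoff=1.0))
```

The point of the sketch is that a failed prediction is not a dead end: both branches return something useful, either a functional assignment to propagate or feedback for the earlier stages.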
Criteria for Target Selection
Specificity Boundaries: As sequence diverges within a superfamily, the
substrate specificity (function) changes. An important test of substrate
specificity predictions by the Computation Core is whether changes in the
substrate specificity of homologous enzymes can be predicted.
Sequence/Function Diversity: Sequence similarity networks allow facile
identification of divergent families that have not been experimentally or
structurally characterized, and such divergent families likely will have new
substrate specificities. An important test of the Computation Core’s
algorithms is whether novel specificities can be predicted for targets
selected from divergent families.
Structures with No Functions (SNFs): The goal of the Protein Structure
Initiative (PSI-1 and PSI-2) was to explore sequence space in order to
define “fold space.” To meet that goal, structures were determined for many
functionally uncharacterized enzymes. A challenge is to “rescue” these
targets by testing Computation Core-generated predictions of substrate
specificities.
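To make the idea of a sequence similarity network concrete, here is a minimal Python sketch, assuming pairwise similarity scores are already available. Sequences are nodes, an edge is kept when the score meets a chosen threshold, and clusters are the connected components. The sequence names, scores, threshold, and function names are all made up for illustration; this is not the Superfamily/Genome Core's actual tooling.

```python
from collections import defaultdict


def cluster_network(nodes, scores, threshold):
    """Group nodes into connected components of the thresholded network.

    `scores` maps (a, b) pairs to a similarity score (higher = more similar);
    an edge is kept only when the score is at least `threshold`.
    """
    parent = {n: n for n in nodes}

    def find(n):
        # Walk up to the root, shortening the path along the way.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for (a, b), score in scores.items():
        if score >= threshold:
            parent[find(a)] = find(b)

    groups = defaultdict(set)
    for n in nodes:
        groups[find(n)].add(n)
    # Sort for a deterministic order of clusters.
    return sorted(groups.values(), key=lambda g: sorted(g))


def uncharacterized_clusters(clusters, characterized):
    """Return clusters with no experimentally characterized member."""
    return [c for c in clusters if not (c & characterized)]


# Hypothetical toy data: five sequences with invented pairwise scores.
nodes = ["seqA", "seqB", "seqC", "seqD", "seqE"]
scores = {
    ("seqA", "seqB"): 80,
    ("seqB", "seqC"): 35,
    ("seqC", "seqD"): 75,
    ("seqD", "seqE"): 20,
}
characterized = {"seqA"}

clusters = cluster_network(nodes, scores, threshold=50)
candidates = uncharacterized_clusters(clusters, characterized)
print(clusters)
print(candidates)
```

With these invented scores, the cluster containing the characterized sequence is set aside and the remaining clusters are the divergent, uncharacterized families that the selection criteria above would prioritize. Raising or lowering the threshold splits or merges clusters, which is how specificity boundaries could be explored in such a network.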
EFI: enzymefunction.org
EFI: Data Access
SFLD (“dry data”)
EFI-DB (“wet” data)
LabDB (internal LIMS)
Collaborations
Challenges
1. Yr 1 budget was reduced 18% for the initial award
2. Yr 2 budget was reduced 3.9% for the first noncompeting
renewal
3. Yr 2-Yr 5 budgets are projected to be flat, but the out-years
may be subject to additional reductions
4. Given the available resources, funds for synthetic genes
are limited, restricting most of the targets to organisms for
which gDNAs are available (currently 559 from ATCC)
5. How does the EFI move forward: reduction in scope
and/or reallocation (number of targets, number of Bridging
Projects, Core activities)?
Is the EFI doing something important?
RFI from the White House Office of Science
and Technology Policy
http://www.whitehouse.gov/blog/2011/10/12/building-bioeconomy
RFI for National Bioeconomy Blueprint
(4) The speed of DNA sequencing has outstripped advances
in the ability to extract information from genomes given the
large number of genes of unknown function in genomes; as
many as 70% of genes in a genome have poorly or unknown
functions. All areas of scientific inquiry that utilize genome
information could benefit from advances in this area. What
new multidisciplinary funding efforts could revolutionize
predictions of protein function for genes?
What the EFI is doing is important!
Perspective
• The EFI is a “once in a lifetime opportunity” to define the
“new enzymology” that allows the potential provided by
genome projects to be fully realized.
• The PIs cannot be “independent operators”, the hallmark of
P01/R01 projects. Instead, with the support of the EFI, the
PIs must be dedicated to collaborations that will ensure that
the EFI will be greater than the sum of its parts.
• The EFI is receiving significant support from NIGMS to
achieve its deliverables: let’s make it work and be a success!