Structural Genomics and the
Protein Folding Problem
George N. Phillips, Jr.
University of Wisconsin-Madison
February 15, 2006
From DNA to biological function
Modeling & Inference
High-throughput DNA Sequencing
Gene Model
Functional Assignments
Structure Determination & Experimental Analysis
Basic Understanding/Applications (e.g. therapeutics)
Developing a gene model
Glimmer (Gene Locator and Interpolated Markov ModelER)
GlimmerHMM for eukaryotic genomes (more advanced)
Genome sequencing
Genome assembly
Regulatory elements
Identification of ORFs
All but the simplest genomes are works in progress.
It is estimated that 80% of gene models have errors at present!
Comparative genomics should help the process, as will sequencing of expressed sequence tags and other genomics projects.
Efficient implementation of a generalized pair hidden Markov model for comparative gene finding.
W.H. Majoros, M. Pertea, and S.L. Salzberg. Bioinformatics 21:9 (2005), 1782-88.
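As a rough illustration of the "identification of ORFs" step, the sketch below scans the three forward reading frames for ATG...stop stretches. It is a deliberately naive toy and not how Glimmer or GlimmerHMM work (those score candidates with Markov models); the function name `find_orfs` and the `min_codons` parameter are assumptions made for this example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(dna, min_codons=3):
    """Return (start, end, frame) for each ATG...stop ORF on the forward strand.

    `end` is exclusive and includes the stop codon; `min_codons` counts
    the start and stop codons too. Reverse strand is not scanned.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Walk the frame one codon at a time.
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs

print(find_orfs("CCATGAAATTTTAGGG"))
```

A scan like this produces many false positives and misses introns entirely, which is one reason real gene models remain error-prone.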
The “sequence-space” of proteins
PSI-BLAST
HMM
Pfam
Many others…
Universe of all protein sequences
HYSIELNASLLERGV…
HLNIEDNPSCNAMGV…
PLNIELNASLNEPGV…
WERIELNASLNER--…
HQRIEL--SLMMRG-…
PFAM “domains”
Alex Bateman, Lachlan Coin, Richard Durbin, Robert D. Finn, Volker Hollich, Sam Griffiths-Jones, Ajay Khanna,
Mhairi Marshall, Simon Moxon, Erik L. L. Sonnhammer, David J. Studholme, Corin Yeats and Sean R. Eddy
Nucleic Acids Research (2004) Database Issue 32:D138-D141
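Family databases such as Pfam summarize an alignment as a position-specific model. The sketch below shows only the simplest version of that idea, a per-column residue frequency table and a majority consensus, using the first 15 columns of the example alignment shown on the slide. It is not a profile HMM, and the names `column_profile` and `consensus` are assumptions for this example.

```python
from collections import Counter

# First 15 columns of the five aligned sequences shown on the slide.
ALIGNMENT = [
    "HYSIELNASLLERGV",
    "HLNIEDNPSCNAMGV",
    "PLNIELNASLNEPGV",
    "WERIELNASLNER--",
    "HQRIEL--SLMMRG-",
]

def column_profile(alignment):
    """Per-column residue frequencies, ignoring gap characters."""
    profile = []
    for column in zip(*alignment):
        counts = Counter(res for res in column if res != "-")
        total = sum(counts.values())
        profile.append({res: n / total for res, n in counts.items()} if total else {})
    return profile

def consensus(alignment):
    """Most frequent residue per column ('-' if a column is all gaps)."""
    return "".join(
        max(col, key=col.get) if col else "-"
        for col in column_profile(alignment)
    )

print(consensus(ALIGNMENT))
```

Columns where one residue dominates (for example the conserved I-E-L stretch) are the ones that carry most of the signal when searching for new family members.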
Flow of information from DNA
to functional understanding
X-ray Laboratory
Crystallography reveals locations
of electron ‘clouds’ of the atoms:
And the polypeptide chain can
be traced through space
The “fold-space” of proteins
SCOP
CATH
Universe of all protein structures
Murzin et al. http://scop.mrc-lmb.cam.ac.uk/scop/data/scop.b.html
Glimpses of the “fold space” of proteins
Hou, Sims, Zhang, and Kim, PNAS 100:2386 (2003)
Flow of information from DNA
to functional understanding
Connections between
sequence and structure
?
Universe of sequences
Universe of structures
At what level of homology can
one trust a structural inference?
Redfern, Orengo et al., J. Chromatography B 815:97 (2005)
What is structural genomics?
• Experimental determination of key structures
(target selection is a key part of the idea)
• Modeling of family members
• Inferring function (note “infer”)
• Making direct use of the new structures
Protein Sequences and Folds
• ~100,000 families of proteins that cannot be
reliably modeled at present (modeling
families: <30% identity over large fraction to a
known structure)
• ~50% of all domain families can be assigned
to a structure under CATH
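The 30% identity figure above can be made concrete with a small calculation. The sketch below computes percent identity over the columns where both aligned sequences have a residue and compares it with a cutoff. The function names and the choice to ignore gapped columns are assumptions for this example; the slide's "over large fraction" condition (alignment coverage) is not modeled here.

```python
def percent_identity(a, b):
    """Percent identity over columns where both aligned sequences have a residue."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    paired = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not paired:
        return 0.0
    matches = sum(x == y for x, y in paired)
    return 100.0 * matches / len(paired)

def above_modeling_cutoff(a, b, cutoff=30.0):
    """True if the pair meets the (assumed) identity cutoff for modeling."""
    return percent_identity(a, b) >= cutoff

pid = percent_identity("HYSIELNASLLERGV", "PLNIELNASLNEPGV")
print(round(pid, 1), above_modeling_cutoff("HYSIELNASLLERGV", "PLNIELNASLNEPGV"))
```

Identity values depend on how the denominator is defined (shorter sequence, alignment length, or ungapped columns as here), so cutoffs from different sources are not always directly comparable.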
Protein Structure Initiative (PSI)
Mission Statement
“To make the three-dimensional atomic
level structures of most proteins easily
available from knowledge of their
corresponding DNA sequences.”
Generation of new structures
Chandonia and Brenner, Science 311:347 (2006).
Center for Eukaryotic Structural Genomics
Exclusively eukaryotic targets
• 60% fold-space targets (emphasis on eukaryote-only families)
• 20% disease relevant
• 20% outreach – targets from the community
The overall goal is to reduce the cost of determining structures of proteins from eukaryotes by refining all steps in the pipeline.
Supported by National Institutes of Health
John Markley, PI; George Phillips and Brian Fox, Co-PIs
University of Wisconsin’s Center for
Eukaryotic Structural Genomics
(~75 total, 3/4 unique)
How does one clone, express,
purify, and solve structures not
previously studied?
An industry-style pipeline
Pipeline details: cell-based and cell-free protein production for X-ray and NMR
Construct design
PCR cloning -> DNA
Flexi® Vector plasmids
1-5 mg scale
Protein from E. coli cells
Fluidigm chip crystallization screening (+)
10-100 mg scale: 13C,15N for NMR, Se-Met for X-ray
Protein from cell-free
Screening: yield, MS, functional assays
Protein from E. coli cells
NMR 15N-1H HSQC or 1H screening (+)
2-10 mg scale: 13C,15N for NMR, Se-Met for X-ray
Protein from cell-free
Note: project involves sequencing, which aids gene modeling!
Sesame—integrated LIMS in use at CESG
Open access to the public—structures, protocols, reagents, progress…
http://www.uwstructuralgenomics.org
Zolnai et al., J. Struct. Func. Genomics 4:11 (2003)
At5g18200
Mis-annotated prior to our work, but
structure led to discovery of function.
Pfam B:
13 and 136 matches to #’s 7198 and 11634
>>Alignment of GalP_UDP_transf vs 1Z84:A|PDBID|CHAIN|SEQUENCE/15-196
              *->kkfsplDhvhrrynpLtlvwilVsphrakRPikqsqsLidlkkeLwq
                 ++ ++ + +r p t +w+ sp+rakRP
1Z84:A|PDB   15  GDSVENQSPELRKDPVTNRWVIFSPARAKRP---------------- 45

                 gavetpkvptdplhdp.dcysakLcpg........atratgevNPdyest
                 + ++k p+ p p++c+ c g++++ ++ r++ ++ P +
1Z84:A|PDB   46  -TDFKSKSPQNPNPKPsSCP---FCIGreqecapeLFRVP-DHDPNWKLR 90

                 yvLkspkkftndFyalseDnpyikvsvSNeaIaknplfqlksvrGhelci
                 + +n ++als+ +++ +++++ G +++
1Z84:A|PDB   91  VI-------ENLYPALSRN---LETQ------------STQPETG--TSR 116

                 VI...CF......SKPehDptlpalakeeirevvdaWqlcteelGyegre
                 +I + F++ +S P h+ l + i+ ++ a + +
1Z84:A|PDB  117  TIvgfGFhdvvieS-PVHSIQLSDIDPVGIGDILIAYKKRINQIA----- 160

                 nhpayqnvqIFEmNkGaemGcsnpHPYaYFnEHGQvwatsfiP<-*
                 h + + q+F N Ga G s H H Q a++ +P
1Z84:A|PDB  161  QHDSINYIQVFK-NQGASAGASMSHS------HSQMMALPVVP 196
http://www.sanger.ac.uk/Software/Pfam/
Blind prediction of structure:
CASP and At5g18200
Flow of information from DNA
to functional understanding
Function space of proteins
KEGG = Kyoto Encyclopedia of Genes and Genomes
The Gene Ontology project (GO)
Metabolism
Cellular Processes
Enzymes
Signal Processing
Don’t forget protein-protein interactions exist also!
At2g17340
Related to a human protein
associated with Hallervorden-Spatz
syndrome, a neurological disorder?
Parallel Enzyme Activity Testing
(Collaboration with University of Toronto)
81 protein samples sent to Toronto:
8 solved CESG structures, 73 randomly chosen
Generalized assays for:
phosphatase, esterase, phosphodiesterase, protease, amino acid dehydrogenase, alcohol dehydrogenase, organic acid dehydrogenase, amino acid oxidase, alcohol oxidase, organic acid oxidase, beta-lactamase, beta-galactosidase, arylsulfatase, lipase.
Results:
- Solid hits: 3 phosphatases, 5 esterases
- Weaker hits: 9 more esterases, 6 phosphodiesterases
- No hits: all others
A. Yakunin et al., Current Opinion in Chemical Biology 8:42 (2004)
Target: At2g17340/JR5670
Initial Assay: Wide-spectrum
Activity Assay       Substrate       JR5670
Phosphodiesterase    bis-pNPP         0.016
Dehydrogenase        Amino Acids      0.032
Dehydrogenase        Acids            0.016
Dehydrogenase        Alcohols         0.022
Dehydrogenase        Aldehyde        -0.045
Dehydrogenase        Sugars           0.003
Thioesterase         palmitoyl-CoA    0.108
Oxidase              NAD(P)H Ox      -0.115
Protease             Protease Mix     0.118
Phosphatase          pNPP             >1
• Absorbance >0.25 is a tentative signal, >0.5 is a strong signal.
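The thresholds quoted for this screen (absorbance >0.25 tentative, >0.5 strong) can be applied mechanically. The sketch below does so for the JR5670 readings as read from the slide. The pairing of activities, substrates, and values is this example's reading of the flattened table, the ">1" phosphatase reading is entered as 1.0 (a lower bound), and the name `classify` is an assumption.

```python
# (activity, substrate, absorbance) as read from the slide for JR5670.
READINGS = [
    ("Phosphodiesterase", "bis-pNPP", 0.016),
    ("Dehydrogenase", "Amino Acids", 0.032),
    ("Dehydrogenase", "Acids", 0.016),
    ("Dehydrogenase", "Alcohols", 0.022),
    ("Dehydrogenase", "Aldehyde", -0.045),
    ("Dehydrogenase", "Sugars", 0.003),
    ("Thioesterase", "palmitoyl-CoA", 0.108),
    ("Oxidase", "NAD(P)H Ox", -0.115),
    ("Protease", "Protease Mix", 0.118),
    ("Phosphatase", "pNPP", 1.0),  # slide reports ">1"; 1.0 is a lower bound
]

def classify(absorbance, tentative=0.25, strong=0.5):
    """Map an absorbance reading to 'strong', 'tentative', or 'none'."""
    if absorbance > strong:
        return "strong"
    if absorbance > tentative:
        return "tentative"
    return "none"

hits = [
    (activity, substrate, classify(value))
    for activity, substrate, value in READINGS
    if classify(value) != "none"
]
print(hits)
```

Under these thresholds only the phosphatase assay with pNPP registers, which is consistent with the slide's later description of the protein as an enzyme of unknown specificity.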
Flow of information from DNA
to functional understanding
At2g17340
Enzyme of unknown specificity.
A functional annotation lesson
Functional Annotation by
Inference
From raw DNA sequences, one looks for genomic features
such as promoters, alternative splicing of mRNAs,
retrotransposons, pseudogenes, tandem duplications,
synteny, and homology.
It is homology, both from sequence and from structure, that allows functional inferences to be made.
Prosite, Dali, VAST, FFAS03
Some tools integrate knowledge from many sources into one place, acting as meta-servers of clues.
Connections between
structure and function
Universe of functions
Universe of structures
Connections between structure
and function
Convergent evolution
Universe of functions
Universe of structures
Connections between structure
and function
Divergent evolution
Universe of functions
Universe of structures
At5g18200
Misleading annotation prior to our
work, but structure led to
discovery of function.
Flow of information from DNA
to functional understanding
Summary
Structural genomics efforts are gaining momentum and helping to assign new functions to ORFs and to fill in the space of all possible protein folds.
The Center for Eukaryotic Structural Genomics
(supported by NIH GM64598 and GM074901)
Administration: Madison (Primm, Troestler, Markley, Phillips, Fox)
Cloning/sequencing pipeline: Madison (Wrobel, Fox)
Expression pipeline: Madison (Frederick, Fox, Riters)
E. coli cell growth pipeline: Madison (Sreenath, Burns, Seder, Fox)
Cell-Free System: Madison (Vinarov, Markley, Newman)
Protein purification pipeline: Madison (Vojtik, Phillips, Fox, Ellefson, Jeon)
Mass spectrometry: Madison (Aceti, Sabat, Sussman)
NMR spectroscopy: Madison NMRFAM (Song, Tyler, Cornilescu, Markley); Milwaukee MCW (Peterson, Volkman, Lytle)
Crystallization / crystallography: Madison (Bingman, Phillips, Bitto, Han, Bae, Meske); Argonne (Advanced Photon Source)
Bioinformatics: Madison (Bingman, Sun, Phillips, Wesenberg); Indianapolis (Dunker); Milwaukee MCW (Twigger, de la Cruz)
Computational support: Madison (Bingman, Ramirez, Phillips)
Sesame: Madison (Zolnai, Markley, Lee)