Seminar - Jarek Meller


Knowledge-based protocols for protein structure prediction:
from protein threading to solvent accessibility prediction and
back to protein structure prediction by threading
Jarek Meller
Division of Biomedical Informatics,
Children’s Hospital Research Foundation
& Department of Biomedical Engineering, UC
JM - http://folding.chmcc.org
Outline of the talk

• Protein structure and complexity of conformational
search: from de novo structure prediction to
similarity-based methods
• Protein structure prediction by sequence-to-structure
matching (threading and fold recognition)
• Secondary structure and solvent accessibility prediction
• Improving fold recognition and de novo simulations with
accurate solvent accessibility prediction
• A story from our backyard: predicting the interaction
between pVHL and RNA Pol II
Polypeptide chains: backbone and side-chains
[Figure: polypeptide chain with labeled termini (N-ter, C-ter)]
Distinct chemical nature of amino acid side-chains
[Figure: peptide chain with labeled termini (N-ter, C-ter) and side-chains PHE, CYS, VAL, ARG, GLU]
Hydrogen bonds and secondary structures
[Figure: β-strand and α-helix]
Tertiary structure and long range contacts: annexin
Domains, interactions, complexes: VHL
[Figure labels: β, HIF-1α, VHL, Elongin C, α, Elongin B]
Multiple alignment and PSSM
Protein folding problem

• The protein folding problem consists of
predicting the three-dimensional structure of a
protein from its amino acid sequence
• Hierarchical organization of protein structures
helps to break the problem into secondary
structure, tertiary structure and protein-protein
interaction predictions
• Computational approaches for protein
structure prediction: similarity-based and de
novo methods
Ab initio (or de novo) folding simulations




• Ab initio folding simulations consist of a conformational
search guided by an empirical scoring function (“force field”)
that is minimized (or maximized, depending on the sign convention)
• Computational bottleneck: exponential search space
and the sampling problem (global optimization!)
• Fundamental problem: inaccuracy of empirical force
fields and scoring functions (folding potentials)
• Importance of mixed protocols, such as Rosetta by D.
Baker and colleagues (Monte Carlo fragment
assembly)
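A minimal sketch of "conformational search with a scoring function": a Metropolis Monte Carlo search on a toy one-dimensional landscape. The function `toy_energy`, the step size and the temperature are illustrative assumptions; this is neither a real force field nor the Rosetta protocol.

```python
import math
import random


def toy_energy(x):
    """Hypothetical rugged 1D 'energy landscape' with many local minima."""
    return 0.1 * x * x + math.sin(5.0 * x)


def metropolis_search(energy, x0, n_steps=5000, step=0.5, temperature=0.5, seed=0):
    """Minimize `energy` by Metropolis Monte Carlo; return (best_x, best_e)."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for _ in range(n_steps):
        x_new = x + rng.uniform(-step, step)  # random local move
        e_new = energy(x_new)
        # Always accept downhill moves; accept uphill moves with
        # Boltzmann probability exp(-dE / T) to escape local minima.
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / temperature):
            x, e = x_new, e_new
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e


if __name__ == "__main__":
    best_x, best_e = metropolis_search(toy_energy, x0=4.0)
    print(f"best x = {best_x:.3f}, best energy = {best_e:.3f}")
```

Even in this toy case the result depends on temperature and step size, which is a small-scale picture of the sampling problem named above.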
Similarity based approaches to structure prediction:
from sequence alignment to fold recognition




• High level of redundancy in biology: sequence similarity is often
sufficient to use the “guilt by association” rule: if similar sequence
then similar structure and function
• Multiple alignments and family profiles can detect evolutionary
relatedness at much lower sequence similarity, which is hard to detect
with pairwise sequence alignments: PSI-BLAST by S. Altschul
et al.
• Many structures are already known (see PDB) and one can
match sequences directly with structures to enhance structure
recognition: fold recognition (not for new folds!)
• For both fold recognition and de novo simulation, prediction of
intermediate attributes such as secondary structure or solvent
accessibility helps to achieve better sensitivity and specificity
Why “fold recognition”?

• Divergent (common ancestor) vs. convergent
(no common ancestor) evolution
• PDB: virtually all protein pairs with at least 30% sequence
identity have similar structures; however, most
pairs of similar structures share only up to about 10%
sequence identity!
Going beyond sequence similarity:
threading and fold recognition
When sequence similarity is not
detectable, use a library of known
structures and match your query
against the target structures.
One needs a scoring (“energy”) function
that measures compatibility
between sequences and structures.
Scoring alternative conformations with
empirical (knowledge-based) folding potentials
Ideally, each misfolded structure should have
an energy higher than the native energy, i.e.:
$E_{\text{misfolded}} - E_{\text{native}} > 0$
[Figure: energy (E) levels of the misfolded and native structures]
Simple contact model for protein structure prediction
Each amino acid is represented by a point in 3D space and two amino acids are
said to be in contact if their distance is smaller than a cutoff distance, e.g. 7 Å.
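The contact definition above can be sketched in a few lines. The coordinates below are made up for illustration, and the `min_separation` parameter is an added assumption (practical contact models often skip residues that are adjacent in sequence; the slide does not say whether this one does).

```python
import math


def contact_map(coords, cutoff=7.0, min_separation=1):
    """Return index pairs (i, j), i < j, whose points are closer than `cutoff` (in Å).

    coords: one (x, y, z) point per amino acid.
    min_separation: minimum sequence distance j - i for a pair to be considered.
    """
    contacts = set()
    n = len(coords)
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(coords[i], coords[j]) < cutoff:
                contacts.add((i, j))
    return contacts


if __name__ == "__main__":
    # Hypothetical coordinates for a four-residue fragment.
    points = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (3.8, 5.0, 0.0)]
    print(sorted(contact_map(points)))
    print(sorted(contact_map(points, min_separation=2)))
```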
Sequence-to-structure matching with contact models

• Generalized string matching problem: aligning a string
of amino acids against a string of “structural sites”
characterized by other residues in contact
• Finding an optimal alignment with gaps using
inter-residue pairwise models,
$E = \sum_{k<l} e_{kl}$,
is NP-hard because of the non-local character of scores
at a given structural site (the identity of the interaction
partners may change depending on the location of gaps in
the alignment)
R.H. Lathrop, Protein Eng. 7 (1994)
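To see why the pairwise score is non-local, it helps to evaluate the sum over contacting site pairs for two alignments that differ only in where a gap is placed. The sketch below assumes a toy HP pair energy (-1 for an H-H contact, 0 otherwise) and a made-up template with two contacts; all names and values are hypothetical.

```python
def hp_pair_energy(a, b):
    """Toy pair energy: hydrophobic-hydrophobic contacts are favorable."""
    return -1.0 if a == "H" and b == "H" else 0.0


def threading_energy(sequence, site_to_residue, contacts, pair_energy):
    """E = sum over contacting site pairs (k, l), k < l, of e(a_k, a_l).

    site_to_residue[k] is the index of the query residue placed at site k,
    or None if the alignment leaves that site empty (a gap).
    """
    total = 0.0
    for k, l in contacts:
        i, j = site_to_residue[k], site_to_residue[l]
        if i is None or j is None:
            continue  # an empty site contributes nothing
        total += pair_energy(sequence[i], sequence[j])
    return total


if __name__ == "__main__":
    query = "HPHPH"
    template_contacts = [(0, 3), (1, 4)]  # hypothetical template contact list
    ungapped = [0, 1, 2, 3, 4]
    gapped = [0, 1, None, 2, 3]  # site 2 empty; query residue 4 left unaligned
    print(threading_energy(query, ungapped, template_contacts, hp_pair_energy))
    print(threading_energy(query, gapped, template_contacts, hp_pair_energy))
```

Moving the gap changes which residue sits at site 3, so the score of the residue at site 0 changes even though that residue did not move. This dependence is what defeats ordinary dynamic programming.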
Hydrophobic contact model and
sequence-to-structure alignment
[Figure: hydrophobic contact model, example sequence HPHPP]
Approaches to this, yet another instance of the global optimization problem:
a) Heuristics (e.g. the frozen environment approximation)
b) “Profile” or local scoring functions (folding potentials)
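With a local ("profile") score, where the cost of placing a residue at a site depends only on that residue and that site, the optimal gapped alignment can be found by standard dynamic programming. The sketch below assumes a toy environment alphabet (`b` = buried, `e` = exposed), toy scores and a linear gap penalty; none of these values come from LOOPP or any published potential.

```python
def hp_site_score(residue, env):
    """Toy local score: H prefers buried sites, P is penalized there."""
    if env == "b":
        return -1.0 if residue == "H" else 1.0
    return 0.0  # exposed sites are neutral in this sketch


def align_energy(sequence, site_env, site_score, gap=0.5):
    """Minimum energy of a global alignment of `sequence` to template sites."""
    n, m = len(sequence), len(site_env)
    # dp[i][j]: best energy aligning the first i residues to the first j sites
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = min(
                dp[i - 1][j - 1] + site_score(sequence[i - 1], site_env[j - 1]),
                dp[i - 1][j] + gap,  # residue i left unaligned
                dp[i][j - 1] + gap,  # site j left empty
            )
    return dp[n][m]


if __name__ == "__main__":
    print(align_energy("HPHPP", "bebee", hp_site_score))
    print(align_energy("HPHPP", "ebebe", hp_site_score))
```

The price of this tractability is that the score no longer sees who the actual contact partners are.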
Implementing threading protocols: LOOPP
LOOPP in CAFASP4
• About average for all fold recognition targets
(missing some easy targets recognized by PSI-BLAST)
• Third best server in the category of difficult targets
• Best predictions among the servers for 3 difficult
targets
• Further improvements necessary to make the
predictions more robust
Joint work with Ron Elber
Using sequence similarity, predicted secondary structures
and contact potentials: fold recognition protocols
In practice fold recognition methods are often mixtures
of sequence matching and threading, with compatibility
between a sequence and a structure measured by:
i) sequence alignment
ii) contact potentials
iii) predicted secondary structures (compared to the
secondary structure of a template)
Predicting 1D protein profiles from sequences:
secondary structures and solvent accessibility
a) Multiple alignment and family profiles improve prediction of local
structural propensities
b) Use of advanced machine learning techniques, such as Neural
Networks or Support Vector Machines, improves results as well
B. Rost and C. Sander were the first to achieve more than 70%
accuracy in three-state (H, E, C) classification, applying a) and b).
SABLE server
http://sable.cchmc.org
POLYVIEW server
http://polyview.cchmc.org
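For reference, three-state accuracy is simply the fraction of residues whose predicted state (H, E or C) matches the observed one. A minimal sketch follows; the two strings are invented examples, not output from SABLE or any other server.

```python
def q3_accuracy(predicted, observed):
    """Fraction of positions where the predicted state equals the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must have equal length")
    if not observed:
        raise ValueError("empty input")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return correct / len(observed)


if __name__ == "__main__":
    observed = "CHHHHCCEEEC"   # hypothetical assignment derived from a structure
    predicted = "CHHHCCCEEEC"  # hypothetical prediction
    print(f"Q3 = {q3_accuracy(predicted, observed):.3f}")
```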
Predicting 1D protein profiles from sequences:
secondary structures and solvent accessibility
[Figure: PDB-derived profile compared with predictions by Sable, PsiPred and Prof]
Relative solvent accessibility prediction is typically cast as a classification problem
Variability in surface exposure for structurally
equivalent residues does not support classification
Neural Network-based regression for relative
solvent accessibility (RSA) prediction
[Figure: network architecture with input layer, hidden layers, output layer with values in [0,1], and context units (Elman)]
$SSE(z) = \sum_i \left( y_i(z) - o_i \right)^2$
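A minimal sketch of an Elman-style forward pass and the sum-of-squares error, in plain Python. It assumes a single hidden layer (the slide shows several), no bias terms, and sigmoid units so that outputs fall in (0, 1). It also reads y_i(z) as the network output for residue i given weights z and o_i as the observed RSA, which is an interpretation of the slide's notation. Weights and inputs are random placeholders, not SABLE's.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def elman_forward(inputs, w_in, w_ctx, w_out):
    """Run a one-hidden-layer Elman network along a sequence of input vectors.

    w_in[h][k]:  weight from input feature k to hidden unit h
    w_ctx[h][c]: weight from context unit c to hidden unit h
    w_out[h]:    weight from hidden unit h to the single output unit
    Returns one output in (0, 1) per sequence position.
    """
    n_hidden = len(w_out)
    context = [0.0] * n_hidden  # context units start at zero
    outputs = []
    for x in inputs:
        hidden = [
            sigmoid(
                sum(w * xi for w, xi in zip(w_in[h], x))
                + sum(w * c for w, c in zip(w_ctx[h], context))
            )
            for h in range(n_hidden)
        ]
        outputs.append(sigmoid(sum(w * hv for w, hv in zip(w_out, hidden))))
        context = hidden  # hidden state is copied into the context units
    return outputs


def sse(outputs, targets):
    """Sum of squared differences between outputs and targets."""
    return sum((y - o) ** 2 for y, o in zip(outputs, targets))


if __name__ == "__main__":
    rng = random.Random(0)
    n_features, n_hidden = 3, 2
    w_in = [[rng.uniform(-1, 1) for _ in range(n_features)] for _ in range(n_hidden)]
    w_ctx = [[rng.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(n_hidden)]
    w_out = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    features = [[rng.random() for _ in range(n_features)] for _ in range(4)]
    observed_rsa = [0.1, 0.4, 0.8, 0.3]  # hypothetical targets scaled to [0, 1]
    predicted = elman_forward(features, w_in, w_ctx, w_out)
    print(predicted, sse(predicted, observed_rsa))
```

Training would adjust the weights to reduce this error; that step is omitted here.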
Accuracy of predictions depends on the level of
surface exposure: error measures and fine tuning
Overall accuracy of different regression models
Method     S163                 S156                 S135                 S149
           cc / MAE / RMSE      cc / MAE / RMSE      cc / MAE / RMSE      cc / MAE / RMSE
SABLE-a    0.65 / 15.6 / 20.8   0.64 / 15.9 / 21.0   0.66 / 15.3 / 20.5   0.64 / 16.0 / 21.0
SABLE-wa   0.66 / 15.5 / 21.2   0.64 / 15.7 / 21.3   0.67 / 15.3 / 20.9   0.65 / 15.8 / 21.4
LS         0.63 / 16.3 / 21.0   0.62 / 16.5 / 21.1   0.65 / 15.9 / 20.5   0.62 / 16.5 / 21.2
SVR1       0.62 / 15.9 / 21.3   0.61 / 16.1 / 21.4   0.64 / 15.6 / 20.8   0.62 / 16.2 / 21.5
SVR2       0.62 / 16.6 / 22.8   0.61 / 16.7 / 22.7   0.64 / 16.4 / 22.5   0.61 / 16.9 / 23.0
Non-linear models: Rafal Adamczak; Linear models: Michael Wagner;
Datasets and servers: Aleksey Porollo and Rafal Adamczak
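The three error measures can be computed as below, assuming that "cc" is the Pearson correlation coefficient and that MAE and RMSE are the mean absolute and root mean square errors in percent RSA. The slide does not spell these out, and the numbers in the example are invented.

```python
import math


def regression_metrics(predicted, observed):
    """Return (cc, MAE, RMSE) for paired predicted and observed values."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    # Undefined (division by zero) if either list is constant.
    cc = cov / math.sqrt(var_p * var_o)
    mae = sum(abs(p - o) for p, o in zip(predicted, observed)) / n
    rmse = math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)
    return cc, mae, rmse


if __name__ == "__main__":
    observed = [10.0, 20.0, 30.0, 40.0]   # hypothetical RSA values in %
    predicted = [20.0, 30.0, 40.0, 50.0]  # perfectly correlated, but offset by 10
    cc, mae, rmse = regression_metrics(predicted, observed)
    print(f"cc = {cc:.2f}, MAE = {mae:.1f}, RMSE = {rmse:.1f}")
```

The example illustrates why all three are reported: a prediction can correlate perfectly with the observations and still carry a systematic error.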
Regression vs. two-class classification
Method              S163           S156           S135           S149
ACCpro server 25%   70.4% / 0.41   69.8% / 0.41   70.6% / 0.42   71.1% / 0.43
SABLE-wa BS62       71.7% / 0.43   71.1% / 0.42   72.2% / 0.44   72.2% / 0.44
SABLE-wa binary     71.4% / 0.42   70.9% / 0.41   71.9% / 0.43   72.1% / 0.44
SABLE-2c 25%        76.7% / 0.53   75.8% / 0.52   77.1% / 0.54   76.4% / 0.53
SABLE-wa            77.3% / 0.54   76.5% / 0.52   77.3% / 0.54   76.6% / 0.53
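A regression output can be turned into a two-class (buried / exposed) prediction by thresholding, which is one way to compare it with a classifier. The sketch below assumes a 25% RSA threshold, assumes that values at or above the threshold count as exposed, and assumes that the two numbers in each table cell are accuracy and Matthews correlation coefficient; the slide states none of these explicitly, and the data are invented.

```python
import math


def two_class_metrics(predicted_rsa, observed_rsa, threshold=25.0):
    """Threshold both RSA lists (in %) and return (accuracy, MCC)."""
    if len(predicted_rsa) != len(observed_rsa) or not observed_rsa:
        raise ValueError("need two non-empty lists of equal length")
    tp = tn = fp = fn = 0
    for p, o in zip(predicted_rsa, observed_rsa):
        pred_exposed, obs_exposed = p >= threshold, o >= threshold
        if pred_exposed and obs_exposed:
            tp += 1
        elif not pred_exposed and not obs_exposed:
            tn += 1
        elif pred_exposed:
            fp += 1
        else:
            fn += 1
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, mcc


if __name__ == "__main__":
    observed = [5.0, 10.0, 30.0, 60.0, 80.0, 20.0]   # hypothetical RSA in %
    predicted = [8.0, 30.0, 35.0, 50.0, 20.0, 15.0]  # hypothetical predictions
    acc, mcc = two_class_metrics(predicted, observed)
    print(f"accuracy = {acc:.1%}, MCC = {mcc:.2f}")
```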
Predicting transmembrane domains
Now back to threading and folding simulations

• Applications in filtering out incorrect models in
both de novo simulations and fold recognition
• Domain structure prediction, protein-protein
interactions
• Better sensitivity in finding correct matches in
threading: one story as an example
Modeling the RNA Polymerase II Interaction with
the von Hippel-Lindau Protein: from experimental
clues to structure prediction and back to experiment.
Jarek Meller
Children’s Hospital Research Foundation
Joint work with M. Czyzyk-Krzeska and her group,
College of Medicine, University of Cincinnati
A play of life (script and beyond):


• Stage: protein society, or proteome
• Rules of life: proteins are assembled and degraded:
nursery (ribosome) vs. police and guillotine (ubiquitination and
proteasome)
• Social order: one look at the equilibrium in the system:
[Diagram labels: Army of scribes (middle class proteins); Transcription; Translation; Law and oppression; Holy scriptures (DNA); Temple priests (selected proteins); “I think we need to adjust the interpretation of the script …” (regulation of replication and transcription)]
Hypoxia-induced stabilization of HIF-1α
Graphics from R.K. Bruick and S.L. McKnight, Science 295
Experimental clues:

• Observation: correlation between pVHL levels
and transcript elongation of the tyrosine
hydroxylase gene (M. Czyzyk-Krzeska)
• Could pVHL influence transcription by
interacting with elongation complex co-factors?
• Where to start? An experiment without a model is
usually not a very good idea. Could an in silico
study and bioinformatics help?
Searching for pVHL interaction targets:


• The HIF-1α ODD interacts with pVHL, so other pVHL
targets should have domains structurally
resembling the HIF-1α ODD
• Use the HIF-1α ODD sequence as a query in
order to find other structures that are compatible
with it
[Figure labels: Rpb1, Rpb6, HIF-1α ODD, pVHL Pro-OH]
RNA Polymerase II in the act of transcription,
Gnatt, Kornberg et al., Science 292 (2001)
The C-terminal fragment of Rpb1 and Rpb6 form a pocket on the surface of
the RNA Polymerase II complex. The C-ter of Rpb1 and Rpb6 are represented as cartoons.
[Figure labels: C-ter Rpb1, Rpb6]
Could the HIF ODD fragment resemble the C-terminal
fragment of RNA Polymerase II?
• A motif similar to that of the ODD was found, but that could occur by chance.
We used sequence alignments and threading to measure the similarity
between these fragments.
• Sequences are about 25% identical over a short fragment of about 50 aa:
not significant.
• Predicted secondary structures are similar.
• Suggestive, but still not significant, similarity.
• However, a weak match between the adjacent Rpb6 and the
consecutive part of the HIF-1α sequence was observed in threading
(3D-PSSM, LOOPP).
• Prediction: the ODD shares its 3D structure with the C-ter fragment of
Rpb1 and Rpb6.
• Implication: VHL is likely to interact with Rpb1/Rpb6!
Experimental results (MCK):




• RNA Pol II peptides suggested by the
computational analysis do bind to pVHL, and
this binding is controlled by hydroxylation of the
critical PRO residue.
• Co-immunoprecipitation of hyperphosphorylated RNA Pol II and pVHL observed:
interaction confirmed.
• Ubiquitination of Rpb1 confirmed.
• Biological meaning?