Protein Structure Prediction
Protein Structure
Amino-acid chains can fold to form 3-dimensional structures
Proteins are sequences that have (more or less) stable 3-dimensional configurations
Why Is Structure Important?
The structure a protein takes is crucial for its function
Forms “pockets” that can recognize an enzyme substrate
Positions the side chains of specific residues so that they co-locate, forming areas with the desired chemical/electrical properties
Creates firm structures such as collagen, keratins, fibroins
Determining Structure
X-ray crystallography and NMR methods allow us to determine the structure of proteins and protein complexes
These methods are expensive and difficult
Solving the structure of a single protein can take several months of work
A centralized database, the Protein Data Bank (PDB), contains all solved protein structures
XYZ coordinates of atoms, within a specified precision
~19,000 solved structures
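To make "XYZ coordinates of atoms" concrete, here is a minimal sketch that reads atom coordinates from the fixed-column ATOM/HETATM records of the legacy PDB text format. It assumes a well-formed `.pdb` file (not mmCIF); the names `Atom` and `read_atoms` are illustrative, not part of any library.

```python
from typing import List, NamedTuple


class Atom(NamedTuple):
    name: str      # atom name, e.g. "CA" for the alpha carbon
    res_name: str  # residue name, e.g. "MET"
    chain: str     # chain identifier
    res_seq: int   # residue sequence number
    x: float       # coordinates in Angstroms
    y: float
    z: float


def read_atoms(lines: List[str]) -> List[Atom]:
    """Parse ATOM/HETATM records using the fixed PDB column layout."""
    atoms = []
    for line in lines:
        if not line.startswith(("ATOM  ", "HETATM")):
            continue  # skip HEADER, REMARK, END, ...
        atoms.append(Atom(
            name=line[12:16].strip(),      # columns 13-16
            res_name=line[17:20].strip(),  # columns 18-20
            chain=line[21],                # column 22
            res_seq=int(line[22:26]),      # columns 23-26
            x=float(line[30:38]),          # columns 31-38
            y=float(line[38:46]),          # columns 39-46
            z=float(line[46:54]),          # columns 47-54
        ))
    return atoms
```

For example, the record `ATOM      2  CA  MET A   1      38.198  19.582  28.998` yields an `Atom` named `CA` in residue `MET` 1 of chain `A` at (38.198, 19.582, 28.998).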
Growth of the Protein Data Bank
Structure is Sequence Dependent
Experiments show that for many proteins, the 3-dimensional structure is a function of the sequence
Force the protein to lose its structure by introducing agents that change the environment
After the protein is put back in water, its original conformation/activity is restored
However, for complex proteins, there are cellular processes that “help” in folding
Amino Acids
What Forces Hold the Structure?
Structure is supported by several types of chemical bonds/forces
Hydrogen Bonds
Charge-charge interactions
Positively charged groups prefer to be situated next to negatively charged groups
Disulfide bonds
S-S bonds between cysteine residues
These form during folding
Hydrophobic effect
Levels of structure
Secondary Structure
α-helix
β-strands
Hydrogen Bonds in α-Helices
β-Strands Form Sheets
Parallel
Antiparallel
These sheets are held together by hydrogen bonds across strands
Angular Coordinates
Secondary structures force specific angles between residues
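The backbone angles in question are dihedral (torsion) angles, each defined by four consecutive backbone atoms (φ by C(i-1), N, Cα, C; ψ by N, Cα, C, N(i+1)). A minimal pure-Python sketch of the signed dihedral angle in degrees; the helper names are illustrative.

```python
import math
from typing import Sequence

Vec = Sequence[float]


def _sub(a: Vec, b: Vec) -> list:
    return [a[0] - b[0], a[1] - b[1], a[2] - b[2]]


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> list:
    return [a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]]


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Signed dihedral angle p0-p1-p2-p3 in degrees, in the range (-180, 180]."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = [c / norm for c in b1]  # unit vector along the central bond
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = [b0[i] - _dot(b0, b1) * b1[i] for i in range(3)]
    w = [b2[i] - _dot(b2, b1) * b1[i] for i in range(3)]
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


print(dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 1, 0)))  # 0.0 (cis)
```

Applied along a chain of backbone atom coordinates, this gives the φ/ψ series for each residue.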
Ramachandran Plot
We can relate the backbone angles to types of secondary structure
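As a rough illustration of reading structure types off a Ramachandran plot, the sketch below maps a (φ, ψ) pair to a region. The rectangular box boundaries are illustrative assumptions (real allowed regions are irregular), and `ramachandran_region` is a hypothetical name.

```python
def ramachandran_region(phi: float, psi: float) -> str:
    """Very rough (phi, psi) -> region assignment, angles in degrees.
    Box boundaries are ASSUMED for illustration, not a standard definition."""
    if -100.0 <= phi <= -30.0 and -80.0 <= psi <= -10.0:
        return "alpha"   # right-handed helix region (around -60, -45)
    if -180.0 <= phi <= -45.0 and (psi >= 90.0 or psi <= -150.0):
        return "beta"    # extended-strand region (around -120, +130)
    return "other"


angles = [(-57, -47), (-120, 130), (60, 45)]
print([ramachandran_region(phi, psi) for phi, psi in angles])  # ['alpha', 'beta', 'other']
```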
Labeling Secondary Structure
Using both hydrogen-bond patterns and angles, we can assign secondary structure labels from the XYZ coordinates of the amino acids
These labels do not amount to an absolute definition of secondary structure
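One widely used labeling program, DSSP (Kabsch and Sander, 1983), decides whether a backbone C=O and N-H form a hydrogen bond with an electrostatic energy, E = 0.084 × 332 × (1/r_ON + 1/r_CH - 1/r_OH - 1/r_CN) kcal/mol (distances in Å), counting a bond when E < -0.5 kcal/mol. An H-bond from C=O(i) to N-H(i+4) is a "4-turn", and consecutive 4-turns mark an α-helix. The sketch below assumes backbone hydrogens are already placed (X-ray structures usually lack them); the function names and the per-residue dict layout are hypothetical.

```python
import math

Q1Q2_F = 0.084 * 332.0  # partial charges 0.42e * 0.20e, times f = 332 (DSSP)


def hbond_energy(c, o, n, h) -> float:
    """DSSP electrostatic energy (kcal/mol) between C=O of one residue and N-H of another."""
    return Q1Q2_F * (1 / math.dist(o, n) + 1 / math.dist(c, h)
                     - 1 / math.dist(o, h) - 1 / math.dist(c, n))


def is_hbond(c, o, n, h, cutoff: float = -0.5) -> bool:
    return hbond_energy(c, o, n, h) < cutoff


def four_turns(backbone) -> list:
    """backbone[i] is a dict with 'C', 'O', 'N', 'H' coordinates of residue i.
    Returns every i whose C=O is H-bonded to the N-H of residue i + 4."""
    turns = []
    for i in range(len(backbone) - 4):
        acceptor, donor = backbone[i], backbone[i + 4]
        if "H" not in donor:
            continue  # e.g. proline has no backbone N-H
        if is_hbond(acceptor["C"], acceptor["O"], donor["N"], donor["H"]):
            turns.append(i)
    return turns
```

For a linear C=O···H-N arrangement with O···H = 1.9 Å and N-H = 1.0 Å, `hbond_energy` comes out near -2.9 kcal/mol, well below the cut-off.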
Prediction of Secondary Structure
Input: amino-acid sequence
Output: annotation sequence of three classes:
alpha
beta
other (sometimes called coil/turn)
Measure of success: percentage of residues that were correctly labeled
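This measure is usually called Q3. A minimal sketch, assuming the three classes are written as the letters H (alpha), E (beta) and C (other), a common but here assumed encoding:

```python
def q3(predicted: str, observed: str) -> float:
    """Percentage of residues whose 3-state label (H, E or C) is predicted correctly."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


print(q3("HHHHCCEEEC", "HHHCCCEEEE"))  # 80.0
```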
Protein Folds: sequential, spatial and topological arrangement of secondary structures
The Globin fold
Approaches for structure prediction
Homology modeling (25-30% identity as a predictor)
Fold recognition
Remote homology
Ab initio Prediction
Heavy computations
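The 25-30% rule of thumb refers to the sequence identity between the query and a template of known structure. Definitions of percent identity vary (mainly in the denominator); this sketch assumes identity is counted over aligned columns where neither sequence has a gap (`-`), and `percent_identity` is a hypothetical name.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identical residues over gap-free aligned columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    same = sum(x == y for x, y in pairs)
    return 100.0 * same / len(pairs)


identity = percent_identity("ACDEFG-HIK", "ACDQFGWHLK")
print(f"{identity:.1f}% identity")
```

Here 7 of the 9 gap-free columns match (about 77.8%), comfortably above the 25-30% range in which homology modeling is usually attempted.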
Newly Determined Structures: Fraction of New Folds
Fraction of new folds (PDB new entries in 1998)
Koppensteiner et al., 2000, JMB 296:1139-1152.
A Finite Number of Protein Folds
Aim: recognize the fold that “matches” a given sequence
Approaches:
PSI-BLAST, profile HMMs, etc.
Threading
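Threading: Essential components
• structural template
• neighbor definition
• energy function
Example: the sequence ACCECADAAC threaded onto a 10-position structural template (position 1: A, 2: C, 3: C, 4: E, 5: C, 6: A, 7: D, 8: A, 9: A, 10: C)
The energy of a threading is summed over all pairs of neighboring template positions i, j, where $a_i$ is the residue placed at position i:
$E = \sum_{\text{neighbors } i,j} E_{a_i a_j}$
Pairwise contact energies $E_{ab}$ (excerpt):
      A    C    D    E   …
A    -3   -1    0    0
C    -1   -4    1    2
D     0    1    5    6
E     0    2    6    7
…
Score of the example threading: -3 - 1 - 4 - 4 - 1 - 4 - 3 - 3 = -23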
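The threading score is just a table lookup summed over neighbor pairs. In the sketch below, the energies are the example's A/C/D/E contact energies, but the neighbor list is HYPOTHETICAL: the slide does not say which template positions touch, so eight contacts were chosen that reproduce the slide's total of -23. The names `E_AB` and `threading_energy` are illustrative.

```python
from typing import Dict, List, Tuple

# Example contact energies E_ab (only residue types A, C, D, E are defined).
_ROWS = {"A": (-3, -1, 0, 0), "C": (-1, -4, 1, 2), "D": (0, 1, 5, 6), "E": (0, 2, 6, 7)}
E_AB: Dict[Tuple[str, str], int] = {
    (a, b): e for a, row in _ROWS.items() for b, e in zip("ACDE", row)
}


def threading_energy(seq: str, contacts: List[Tuple[int, int]]) -> int:
    """Sum E_{a_i a_j} over neighboring template positions i, j (1-based)."""
    return sum(E_AB[(seq[i - 1], seq[j - 1])] for i, j in contacts)


# HYPOTHETICAL neighbor definition for the 10-position template.
contacts = [(1, 8), (1, 9), (6, 9), (2, 5), (2, 10), (3, 10), (1, 3), (6, 10)]
print(threading_energy("ACCECADAAC", contacts))  # -23
```

Threading a different sequence onto the same template only changes which table entries are looked up; the neighbor definition stays fixed.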
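Find the best fold for a protein sequence: fold recognition (threading)
Figure: the query sequence MAHFPGFGQSLLFGYPVYVFGD... is threaded onto each fold in a library (1), …, (56), …, (n); each threading receives a score (e.g., -10, -123, 20.5), and the best-scoring template is reported as the potential fold.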
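The library scan itself is a loop: score the query against every template and keep the best. This sketch makes several simplifying assumptions: threading is ungapped (the template slides along the query; real threading methods allow gaps), the pair potential `pair_energy` is a HYPOTHETICAL toy that rewards hydrophobic-hydrophobic contacts, lower energy is treated as better, and contacts are 0-based positions within the template.

```python
from typing import Dict, List, Tuple

Contact = Tuple[int, int]

HYDROPHOBIC = set("AVILMFWYC")


def pair_energy(a: str, b: str) -> float:
    """HYPOTHETICAL toy potential: hydrophobic-hydrophobic contacts are favorable."""
    return -1.0 if a in HYDROPHOBIC and b in HYDROPHOBIC else 0.0


def best_threading(query: str, template_len: int, contacts: List[Contact]) -> Tuple[float, int]:
    """Ungapped threading: try every offset, return (lowest energy, offset)."""
    best = (float("inf"), -1)
    for offset in range(len(query) - template_len + 1):
        window = query[offset:offset + template_len]
        e = sum(pair_energy(window[i], window[j]) for i, j in contacts)
        best = min(best, (e, offset))
    return best


def recognize_fold(query: str, library: Dict[str, Tuple[int, List[Contact]]]) -> str:
    """Score the query against every template; the lowest energy wins."""
    scores = {name: best_threading(query, length, contacts)[0]
              for name, (length, contacts) in library.items()}
    return min(scores, key=scores.get)


library = {"fold1": (4, [(0, 3)]), "fold2": (4, [(0, 1), (2, 3)])}
print(recognize_fold("VLIVKK", library))  # fold2
```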
GenTHREADER
(Jones, 1999, JMB 287:797-815)
for each template, provide an MSA
align the query sequence with the MSA
assess the alignment by the sequence alignment score
assess the alignment by pairwise potentials
assess the alignment by a solvation function
record the lengths of the alignment, the query and the template
Essentials of GenTHREADER
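In GenTHREADER, the recorded scores and lengths are fed to a trained neural network that outputs a confidence for each template. The sketch below only illustrates turning those numbers into one confidence value: it swaps the network for a single logistic unit whose weights (`W_*`, `BIAS`) are made up, and the names `ThreadingScores` and `confidence` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class ThreadingScores:
    """Quantities recorded for one query/template pair."""
    alignment_score: float   # sequence alignment score against the template MSA
    pair_energy: float       # pairwise-potential energy (lower = better)
    solvation_energy: float  # solvation-function energy (lower = better)
    alignment_length: int
    query_length: int
    template_length: int


# HYPOTHETICAL weights for illustration only (not GenTHREADER's trained network).
W_ALIGN, W_PAIR, W_SOLV, W_QCOV, W_TCOV, BIAS = 0.05, -0.02, -0.02, 1.0, 1.0, -3.0


def confidence(s: ThreadingScores) -> float:
    """Combine the scores into a value in (0, 1) with a single logistic unit."""
    z = (BIAS
         + W_ALIGN * s.alignment_score
         + W_PAIR * s.pair_energy
         + W_SOLV * s.solvation_energy
         + W_QCOV * s.alignment_length / s.query_length      # fraction of query aligned
         + W_TCOV * s.alignment_length / s.template_length)  # fraction of template aligned
    return 1.0 / (1.0 + math.exp(-z))
```

A template with a high alignment score, low energies and good coverage of both sequences ends up with a confidence near 1; a short, poorly scoring alignment ends up near 0.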
Ab-initio Structure Prediction
Goal: predict structure from “first principles”
Benefits:
Works for novel folds
Shows that we understand the process
Approaches to Ab-initio Prediction
Molecular Dynamics
Simulates the forces that govern the protein in water
Since proteins fold naturally, this should lead to the solved structure
Problems:
Thousands of atoms
Huge number of time steps to reach the folded protein
Intractable problem
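At its core, molecular dynamics repeatedly integrates Newton's equations with a tiny time step. The sketch below shows one standard integrator, velocity Verlet, on a toy one-dimensional "bond" (a unit spring in reduced units, an assumption for illustration), plus the arithmetic behind "huge number of time steps", assuming a ~2 fs step (typical for all-atom MD) and 1 ms of simulated time (many proteins need milliseconds or longer to fold).

```python
import math
from typing import Callable, List, Tuple


def velocity_verlet(x0: float, v0: float, mass: float,
                    force: Callable[[float], float],
                    dt: float, n_steps: int) -> Tuple[List[float], float]:
    """Integrate one coordinate with velocity Verlet; return positions and final velocity."""
    x, v = x0, v0
    f = force(x)
    trajectory = [x]
    for _ in range(n_steps):
        x = x + v * dt + 0.5 * (f / mass) * dt * dt  # position update
        f_new = force(x)                              # force at the new position
        v = v + 0.5 * (f + f_new) / mass * dt         # velocity update with averaged force
        f = f_new
        trajectory.append(x)
    return trajectory, v


def steps_needed(simulated_time_s: float, dt_s: float = 2e-15) -> float:
    """Number of integration steps needed to cover a given simulated time."""
    return simulated_time_s / dt_s


# Toy system: unit spring (k = 1, m = 1); one period is 2*pi time units.
traj, v_final = velocity_verlet(1.0, 0.0, 1.0, lambda x: -1.0 * x, dt=0.01, n_steps=628)
print(f"x after ~one period: {traj[-1]:.4f}")
print(f"steps for 1 ms at 2 fs/step: {steps_needed(1e-3):.0e}")  # 5e+11
```

Each of those ~5×10^11 steps requires evaluating forces on thousands of atoms, which is why the approach is impractical for folding.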
Approaches to Ab-initio Prediction
Minimal Energy
Assumption: the folded form is the minimal-energy conformation of the protein
Decomposition:
Define an energy function
Search for the 3-D conformation that minimizes the energy
Energy Function
Accounts for the forces that act on the molecule
Van der Waals forces
Covalent bonds
Hydrogen bonds
Charges
Hydrophobic effects
Issues:
Estimating parameters
Computational cost: a naive evaluation over all atom pairs is O((# atoms)^2)
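To see where the quadratic cost comes from, the sketch below evaluates just two of the listed terms, van der Waals forces (as a Lennard-Jones potential) and charges (as Coulomb's law), over every atom pair. The `epsilon`, `sigma` and `dielectric` defaults are placeholders, not parameters of any real force field (which is exactly the parameter-estimation issue); 332 is the Coulomb constant in kcal·Å/(mol·e²).

```python
import itertools
import math
from typing import List, Sequence


def pairwise_energy(coords: List[Sequence[float]], charges: List[float],
                    epsilon: float = 0.2, sigma: float = 3.5,
                    coulomb_k: float = 332.0, dielectric: float = 1.0) -> float:
    """Naive all-pairs energy (kcal/mol): Lennard-Jones + Coulomb.
    Loops over n(n-1)/2 pairs, hence O(n^2) in the number of atoms."""
    total = 0.0
    for i, j in itertools.combinations(range(len(coords)), 2):
        r = math.dist(coords[i], coords[j])
        sr6 = (sigma / r) ** 6
        total += 4.0 * epsilon * (sr6 * sr6 - sr6)                       # van der Waals
        total += coulomb_k * charges[i] * charges[j] / (dielectric * r)  # charge-charge
    return total
```

Practical force fields avoid the full quadratic loop with distance cut-offs and neighbor lists, or with Ewald-type summation for long-range electrostatics.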
Simplified Energy Functions
Different levels of granularity
Residue-Residue energy function (Bead model)
Partial model
Backbone as a bead
Side chain as a rigid body that can move with respect to the backbone
Many other variants
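A well-known residue-level bead model, not covered in these slides, is Dill's HP lattice model. Each residue is a bead labeled H (hydrophobic) or P (polar) on a 2-D square lattice, and each pair of H beads that touch without being chain neighbors contributes -1. The sketch below evaluates that energy for one conformation; the move letters U/D/L/R and the function name are assumed conventions.

```python
from typing import Dict, Tuple

MOVES: Dict[str, Tuple[int, int]] = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}


def hp_energy(sequence: str, moves: str) -> int:
    """Energy of an HP chain on a 2-D square lattice (-1 per non-bonded H-H contact).
    Raises ValueError if the walk is not self-avoiding."""
    if len(moves) != len(sequence) - 1:
        raise ValueError("need exactly len(sequence) - 1 moves")
    coords = [(0, 0)]
    for m in moves:
        dx, dy = MOVES[m]
        x, y = coords[-1]
        coords.append((x + dx, y + dy))
    if len(set(coords)) != len(coords):
        raise ValueError("conformation collides with itself")
    index = {c: i for i, c in enumerate(coords)}
    energy = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        for dx, dy in ((1, 0), (0, 1)):  # look right and up only, so each contact counts once
            j = index.get((x + dx, y + dy))
            if j is not None and abs(i - j) > 1 and sequence[j] == "H":
                energy -= 1
    return energy


print(hp_energy("HPPH", "RUL"))  # -1: the chain folds into a square and the two H ends touch
```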
Search Strategy
High-dimensional search problem
How do we represent partial solutions?
Position of each atom (too detailed!)
Position of each residue (too coarse!)
Intermediate solutions (e.g., backbone and side chain)
Search Strategy
Representation tradeoffs
X,Y,Z coordinates
Easy to compute distances between residues
Might represent infeasible solutions
Angles between successive residues
Easy to ensure a “legal” protein
Harder to compute distances
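Why angles make distances harder: to measure any distance, the chain must first be rebuilt in 3-D, one point at a time. The sketch below uses the NeRF construction (natural extension reference frame) to turn a list of torsion angles into points. It assumes a Cα-only trace with 3.8 Å virtual bonds (the typical Cα-Cα distance) and a fixed 110° virtual bond angle, the latter an assumption for illustration; function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]


def _sub(a: Sequence[float], b: Sequence[float]) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a: Sequence[float], b: Sequence[float]) -> Vec:
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def _unit(a: Sequence[float]) -> Vec:
    n = math.sqrt(a[0] ** 2 + a[1] ** 2 + a[2] ** 2)
    return (a[0] / n, a[1] / n, a[2] / n)


def place_next(a: Vec, b: Vec, c: Vec, bond: float, angle_deg: float, torsion_deg: float) -> Vec:
    """Place d so that |cd| = bond, angle(b, c, d) = angle_deg, dihedral(a, b, c, d) = torsion_deg."""
    theta, phi = math.radians(angle_deg), math.radians(torsion_deg)
    bc = _unit(_sub(c, b))
    n = _unit(_cross(_sub(b, a), bc))  # normal of the plane a-b-c
    m = _cross(n, bc)                  # completes the local orthonormal frame
    d2 = (-bond * math.cos(theta),
          bond * math.sin(theta) * math.cos(phi),
          bond * math.sin(theta) * math.sin(phi))
    return tuple(c[k] + d2[0] * bc[k] + d2[1] * m[k] + d2[2] * n[k] for k in range(3))


def build_chain(torsions: List[float], bond: float = 3.8, angle_deg: float = 110.0) -> List[Vec]:
    """Rebuild 3-D points from torsion angles (three seed points, then one per torsion)."""
    theta = math.radians(angle_deg)
    pts = [(0.0, 0.0, 0.0), (bond, 0.0, 0.0),
           (bond - bond * math.cos(theta), bond * math.sin(theta), 0.0)]
    for tau in torsions:
        pts.append(place_next(pts[-3], pts[-2], pts[-1], bond, angle_deg, tau))
    return pts
```

Every point built this way automatically has a legal bond length and angle, but a single torsion change moves every downstream point, so all affected distances must be recomputed.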
Search Strategy
Typical approach:
Secondary structure prediction
Try different conformations while keeping the secondary structure fixed
Finer moves relaxing secondary structure
Use
Greedy search
Simulated annealing
…
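Simulated annealing can be sketched generically: propose a neighboring conformation, always accept downhill moves, accept uphill moves with probability exp(-ΔE/T), and lower T over time. Greedy search is the special case that never accepts uphill moves. The geometric cooling schedule, the parameter defaults and the toy 1-D energy in the usage line are assumptions for illustration.

```python
import math
import random
from typing import Callable, Tuple, TypeVar

S = TypeVar("S")


def simulated_annealing(initial: S,
                        energy: Callable[[S], float],
                        neighbor: Callable[[S, random.Random], S],
                        t_start: float = 10.0, t_end: float = 0.01,
                        n_steps: int = 5000, seed: int = 0) -> Tuple[S, float]:
    """Metropolis acceptance with geometric cooling; returns the best state seen and its energy."""
    rng = random.Random(seed)
    state, e = initial, energy(initial)
    best, best_e = state, e
    alpha = (t_end / t_start) ** (1.0 / max(1, n_steps - 1))  # per-step cooling factor
    t = t_start
    for _ in range(n_steps):
        cand = neighbor(state, rng)
        ce = energy(cand)
        # Downhill moves are always accepted; uphill moves with probability exp(-dE / T).
        if ce <= e or rng.random() < math.exp(-(ce - e) / t):
            state, e = cand, ce
            if e < best_e:
                best, best_e = state, e
        t *= alpha
    return best, best_e


# Toy usage: minimize (x - 3)^2 over integers with +/-1 moves.
best, best_e = simulated_annealing(-20, lambda x: (x - 3) ** 2, lambda x, rng: x + rng.choice((-1, 1)))
```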
Rosetta Method
Idea:
“Structural” signatures recur within protein structures
Use these as cues during structure search
Local structure motifs
I-sites Library = a catalog of local sequence-structure correlations
diverging type-2 turn
Frayed helix
Serine hairpin
Proline helix C-cap
Type-I hairpin
alpha-alpha corner
glycine helix N-cap
Example: Non-polar Alpha-helix
Example: Non-polar beta-strand
Example: Gly alpha-C-cap Type 1
Construction of I-sites library
Construct profiles (PSI-BLAST-like) for each solved structure
Collect all possible segments of fixed length (len = 3, 9, 15)
Perform k-means clustering of the segments
Check each cluster for a “coherent” structure (in terms of dihedral angles)
Prune incoherent structures
Iteratively refine the remaining clusters by removing structurally different segments, redefining cluster membership, etc.
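The segment-collection and clustering steps can be sketched as follows: slide a fixed-length window along a per-residue profile, flatten each window into a vector, and run plain k-means (Lloyd's algorithm). The toy profiles, Euclidean distance and first-k-points initialization are assumptions; real I-sites profiles have 20 columns per residue, and the dihedral-coherence check and pruning steps are not shown.

```python
import math
from typing import List, Sequence, Tuple


def segments(profile: Sequence[Sequence[float]], length: int) -> List[List[float]]:
    """All overlapping windows of `length` profile rows, each flattened into one vector."""
    return [[v for row in profile[i:i + length] for v in row]
            for i in range(len(profile) - length + 1)]


def kmeans(points: Sequence[Sequence[float]], k: int,
           iterations: int = 20) -> Tuple[List[int], List[List[float]]]:
    """Plain Lloyd's k-means, initialized with the first k points (deterministic)."""
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center (Euclidean distance).
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers
```

For example, `segments([[1, 2], [3, 4], [5, 6]], 2)` gives `[[1, 2, 3, 4], [3, 4, 5, 6]]`, and those vectors are what `kmeans` groups.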
All proteins can be constructed from
fragments
Recent experiment:
For representative proteins, backbones were assembled from a library of 1000 different 5-residue fragments.
Rosetta: a folding simulation program
Fragment insertion Monte Carlo, operating on the backbone torsion angles with a library of fragments:
Choose a fragment
Change the backbone angles
Convert to 3D
Evaluate the energy function
Accept or reject
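A minimal sketch of the fragment-insertion loop: pick a window, copy a fragment's (φ, ψ) angles into it, score, then accept or reject with the Metropolis rule. Real Rosetta converts the angles to 3-D and scores with knowledge-based terms; here a HYPOTHETICAL toy energy acts directly on the angles, and the two-fragment library (ideal helix and extended strand) is invented for illustration.

```python
import math
import random
from typing import Callable, Dict, List, Tuple

Angles = List[Tuple[float, float]]  # (phi, psi) per residue, in degrees


def fragment_insertion_mc(start: Angles, library: Dict[int, List[Angles]],
                          energy: Callable[[Angles], float], frag_len: int = 3,
                          n_steps: int = 2000, kt: float = 1.0, seed: int = 0) -> Tuple[Angles, float]:
    rng = random.Random(seed)
    current = list(start)
    e = energy(current)
    best, best_e = list(current), e
    for _ in range(n_steps):
        pos = rng.randrange(len(current) - frag_len + 1)              # choose a window
        frag = rng.choice(library[pos])                               # choose a fragment for it
        trial = current[:pos] + list(frag) + current[pos + frag_len:] # change backbone angles
        te = energy(trial)                                            # evaluate (toy energy, no 3-D)
        if te <= e or rng.random() < math.exp(-(te - e) / kt):        # accept or reject
            current, e = trial, te
            if e < best_e:
                best, best_e = list(current), e
    return best, best_e


HELIX, EXTENDED = (-57.0, -47.0), (-120.0, 130.0)


def toy_energy(angles: Angles) -> float:
    """HYPOTHETICAL energy: squared deviation (in 30-degree units) from ideal helix angles."""
    return sum(((phi - HELIX[0]) / 30.0) ** 2 + ((psi - HELIX[1]) / 30.0) ** 2 for phi, psi in angles)


start = [EXTENDED] * 6
library = {pos: [[HELIX] * 3, [EXTENDED] * 3] for pos in range(4)}
best, best_e = fragment_insertion_mc(start, library, toy_energy)
```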
Rosetta’s energy function
Sequence dependent features
Residue-residue contact energies are derived from the database
Rosetta’s energy function
Sequence-independent features
Current structure
vector representation
Probabilities from the database
The energy score for a contact between secondary structures
is summed using database statistics.
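"Derived from the database" typically means a knowledge-based (log-odds, inverse-Boltzmann style) potential: pairs observed in contact more often than chance get negative energies. The sketch below is a generic version of that idea, E(a, b) = -ln(observed pair frequency / expected frequency), not Rosetta's exact functional form; the input is assumed to be a list of residue-type pairs already extracted as contacts from known structures.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Tuple


def contact_potential(contacts: Iterable[Tuple[str, str]]) -> Dict[Tuple[str, str], float]:
    """Log-odds contact energies for every observed unordered residue pair."""
    pair_counts = Counter(tuple(sorted(p)) for p in contacts)
    n_pairs = sum(pair_counts.values())
    res_counts: Counter = Counter()
    for (a, b), c in pair_counts.items():
        res_counts[a] += c
        res_counts[b] += c
    n_res = sum(res_counts.values())  # equals 2 * n_pairs
    energies = {}
    for (a, b), c in pair_counts.items():
        p_a, p_b = res_counts[a] / n_res, res_counts[b] / n_res
        expected = p_a * p_b if a == b else 2 * p_a * p_b  # unordered pair probability
        energies[(a, b)] = -math.log((c / n_pairs) / expected)
    return energies
```

Pairs never observed get no entry here; practical potentials add pseudocounts so that every pair has a finite energy.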
Rosetta prediction results
61% “topologically correct”
60% “locally correct”
73% secondary structure (Q3) correct
http://www.bioinfo.rpi.edu/~bystrc/hmmstr/server.php
Evaluation of partially correct predictions
Tertiary structure %correct is the fraction of the sequence that is in a 30-residue window with RMSD < 6.0 Å
Local structure %correct is the fraction of the sequence that has MDA < 90°
MDA = maximum deviation in backbone angles over an 8-residue window
Figure: RMSD (tertiary structure) and MDA (local structure) plotted along the sequence for window sizes L = 30, 20 and 8, with thresholds at 6.0 Å and 90°
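The local-structure score can be sketched as follows. Angles are compared with wrap-around (170° and -170° differ by 20°), MDA is the largest φ or ψ deviation in each 8-residue window, and, as an assumption about the exact definition, a residue counts as correct if at least one window containing it has MDA < 90°. The tertiary (RMSD) score would additionally need optimal superposition, which is not shown; function names are hypothetical.

```python
from typing import List, Sequence, Tuple

PhiPsi = Tuple[float, float]


def angle_diff(a: float, b: float) -> float:
    """Smallest absolute difference between two angles in degrees (handles wrap-around)."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def mda_windows(pred: Sequence[PhiPsi], true: Sequence[PhiPsi], window: int = 8) -> List[float]:
    """Maximum phi/psi deviation within each window of consecutive residues."""
    devs = [max(angle_diff(p[0], t[0]), angle_diff(p[1], t[1])) for p, t in zip(pred, true)]
    return [max(devs[i:i + window]) for i in range(len(devs) - window + 1)]


def local_percent_correct(pred: Sequence[PhiPsi], true: Sequence[PhiPsi],
                          window: int = 8, cutoff: float = 90.0) -> float:
    """ASSUMPTION: a residue is correct if any window containing it has MDA < cutoff."""
    covered = [False] * len(true)
    for start, mda in enumerate(mda_windows(pred, true, window)):
        if mda < cutoff:
            for i in range(start, start + window):
                covered[i] = True
    return 100.0 * sum(covered) / len(true)
```

For a 10-residue helix whose last residue is predicted as extended, the first two windows pass and the third fails, so 9 of 10 residues (90%) count as locally correct.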
T0116 262-322 (61 residues): prediction vs. true structure
Topologically correct (RMSD = 5.9 Å), but a helix is mispredicted as a loop.
T0121 126-199 (66 residues): prediction vs. true structure
Topologically correct (RMSD = 5.9 Å), but a loop is mispredicted as a helix.
T0122 57-153 (97 residues): prediction vs. true structure
...contains a 53-residue stretch with maximum deviation = 96°
T0112 153-213: prediction vs. true structure
Low RMSD (5.6 Å) and all angles correct (MDA = 84°), but topologically wrong! (This is rare.)