Secondary Structure & Solvent Accessible Surface Calculation
Lecture 6
Structural Bioinformatics
Dr. Avraham Samson
81-871
DSSP
Dictionary of protein secondary structure: Pattern
recognition of hydrogen-bonded and geometrical features
Wolfgang Kabsch, Christian Sander
Biopolymers, Volume 22, Issue 12, pages 2577–2637, December
1983
2012
Avraham Samson - Faculty of
Medicine - Bar Ilan University
[Figure: amino acids; secondary structure; solvent accessibility]
Hydrogen bond donors and acceptors
the carbonyl oxygen: main-chain hydrogen bond acceptor
the amide nitrogen: main-chain hydrogen bond donor
there are also side-chain acceptors and donors
[Figure: chemical structure of a peptide backbone segment with labels on the carbonyl oxygen and the amide nitrogen]
Hydrogen bonded turns
Hydrogen bonded bridges
Bend
Chirality
Dihedral angle calculation
The book "Crystal Structure Analysis for Chemists and Biologists" by Jenny P. Glusker gives four different ways of calculating the dihedral angle (pp. 465-469). Probably the most direct is the following.
Consider the four-atom chain 1 - 2 - 3 - 4.
The distance between atoms i and j is denoted d(ij). For example, d13 is the distance between atoms 1 and 3. Since you already have Cartesian coordinates, this is easily calculated as
d13 = SQRT( SQ(x3-x1) + SQ(y3-y1) + SQ(z3-z1) )
The dihedral angle about the 2 - 3 bond is then given by
cos(angle) = P / SQRT(Q)
where
P = SQ(d12) * ( SQ(d23) + SQ(d34) - SQ(d24) )
  + SQ(d23) * ( -SQ(d23) + SQ(d34) + SQ(d24) )
  + SQ(d13) * ( SQ(d23) - SQ(d34) + SQ(d24) )
  - 2 * SQ(d23) * SQ(d14)
and
Q = (d12 + d23 + d13) * (d12 + d23 - d13) * (d12 - d23 + d13) * (-d12 + d23 + d13)
  * (d23 + d34 + d24) * (d23 + d34 - d24) * (d23 - d34 + d24) * (-d23 + d34 + d24)
Because only distances enter, the formula gives the magnitude of the angle, not its sign.
A test case: d12 = 2.38, d23 = 1.48, d34 = 1.48, d13 = 3.56, d14 = 3.61, d24 = 2.40
P = 20.83, SQRT(Q) = 21.40, angle = 13.3 degrees
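The distance-based formula can be sketched in Python as follows. This is a minimal illustration, not code from the lecture: the function name dihedral_from_distances is made up here, and the second term of P is taken with a minus sign on SQ(d23), which is what expanding the dot product of the two plane normals gives. With the rounded test distances above it returns roughly 14 degrees, near the quoted 13.3; the small gap presumably reflects rounding of the listed distances.

```python
import math


def dihedral_from_distances(d12, d23, d34, d13, d14, d24):
    """Unsigned dihedral angle (degrees) about the 2-3 bond of the chain
    1-2-3-4, computed from the six interatomic distances only."""
    s12, s23, s34 = d12 ** 2, d23 ** 2, d34 ** 2
    s13, s14, s24 = d13 ** 2, d14 ** 2, d24 ** 2

    # Numerator: four times the dot product of the two plane normals.
    p = (s12 * (s23 + s34 - s24)
         + s23 * (-s23 + s34 + s24)
         + s13 * (s23 - s34 + s24)
         - 2.0 * s23 * s14)

    # Denominator squared: product of two Heron-type terms, one for
    # triangle 1-2-3 and one for triangle 2-3-4.
    q = ((d12 + d23 + d13) * (d12 + d23 - d13)
         * (d12 - d23 + d13) * (-d12 + d23 + d13)
         * (d23 + d34 + d24) * (d23 + d34 - d24)
         * (d23 - d34 + d24) * (-d23 + d34 + d24))
    if q <= 0.0:
        raise ValueError("atoms 1-2-3 or 2-3-4 are collinear or distances are inconsistent")

    # Clamp against round-off before taking the arccosine.
    cos_angle = max(-1.0, min(1.0, p / math.sqrt(q)))
    return math.degrees(math.acos(cos_angle))


if __name__ == "__main__":
    print(dihedral_from_distances(2.38, 1.48, 1.48, 3.56, 3.61, 2.40))
```

To recover the sign of the angle (needed for handedness), one would have to go back to the coordinates, since distances alone cannot distinguish mirror images.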
Helices
Ladders and sheets
More details
• SS-bonds
• Chain breaks
• Handedness (chirality)
• PyMOL and MOLMOL assign secondary structure using DSSP or DSSP-like algorithms
Because of the repetitive nature of secondary structures, and particularly beta-sheets, proteins can form fibrillar structures and aggregates
[Figure: amyloid-like fibril (left) of peptide GNNQNNY from the yeast prion protein Sup35, and its atomic structure (right), with the fibril axis and amide stacks labeled]
in the case of this fibril the side chains also hydrogen bond to each other
Nelson et al (Eisenberg lab), Nature 435:773 (2005).
for background on “polar zippers”: Perutz et al. PNAS 91:5355 (1991)
These types of fibrils are important in Huntington’s disease etc.
Fibrillar helical structures: the leucine zipper
[Figure: GCN4 “leucine zipper” (green) bound as a dimer (two copies of the polypeptide) to target DNA, with leucines (Leu) labeled]
The GCN4 dimer is formed through hydrophobic interactions between leucines (red) in the two polypeptide chains
DSSP Code:
H = alpha helix
G = 3-helix (3/10 helix)
I = 5 helix (pi helix)
B = residue in isolated beta-bridge
E = extended strand, participates in beta ladder
T = hydrogen bonded turn
S = bend
Blank = loop
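As a small illustration of working with these codes, the sketch below maps each one-letter DSSP code to its description and counts how often each state occurs in a per-residue assignment string. The names DSSP_CODES and summarize_dssp and the example string are made up for illustration.

```python
from collections import Counter

# One-letter DSSP codes as listed above; a blank means loop.
DSSP_CODES = {
    "H": "alpha helix",
    "G": "3-helix (3/10 helix)",
    "I": "5 helix (pi helix)",
    "B": "residue in isolated beta-bridge",
    "E": "extended strand, participates in beta ladder",
    "T": "hydrogen bonded turn",
    "S": "bend",
    " ": "loop",
}


def summarize_dssp(ss_string):
    """Count residues per secondary-structure state in a DSSP string
    (one character per residue). Unknown characters are reported as-is."""
    counts = Counter(ss_string)
    return {DSSP_CODES.get(code, "unknown code %r" % code): n
            for code, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical assignment string, not taken from a real structure.
    for state, n in summarize_dssp("  EEEEE TT  HHHHHHHH S ").items():
        print("%4d  %s" % (n, state))
```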
• Question: How would you assign structural neighbors (< 5 Å) from a PDB file?
• Answer: Parse the PDB file and report the pairs of atoms separated by less than 5 Å.
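The answer above can be sketched in a few lines of Python. This is a minimal sketch under some assumptions: it reads only ATOM records (HETATM is ignored), relies on the fixed-column layout of the PDB format, and reports residue pairs rather than atom pairs. The function names parse_atoms and residue_neighbors and the file name in the usage lines are hypothetical.

```python
import math


def parse_atoms(pdb_text):
    """Extract atom name, residue identity and coordinates from ATOM records,
    using the fixed column positions of the PDB format."""
    atoms = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM"):
            atoms.append({
                "name": line[12:16].strip(),
                "resname": line[17:20].strip(),
                "chain": line[21],
                "resseq": int(line[22:26]),
                "xyz": (float(line[30:38]),
                        float(line[38:46]),
                        float(line[46:54])),
            })
    return atoms


def residue_neighbors(atoms, cutoff=5.0):
    """Return the set of residue pairs that have at least one pair of atoms
    closer than cutoff (in Angstroms). Simple O(N^2) loop over atom pairs."""
    pairs = set()
    for i, a in enumerate(atoms):
        res_a = (a["chain"], a["resseq"])
        for b in atoms[i + 1:]:
            res_b = (b["chain"], b["resseq"])
            if res_a != res_b and math.dist(a["xyz"], b["xyz"]) < cutoff:
                pairs.add((res_a, res_b))
    return pairs


if __name__ == "__main__":
    # "structure.pdb" is a placeholder file name.
    with open("structure.pdb") as handle:
        found = residue_neighbors(parse_atoms(handle.read()))
    for res_a, res_b in sorted(found):
        print(res_a, res_b)
```

The all-pairs loop is fine for small proteins; for large structures a spatial index such as a grid or k-d tree would avoid comparing every atom with every other atom.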
Contact maps of protein structures
-both axes are the sequence of the protein
-near diagonal: local contacts in the sequence
-off-diagonal: long-range (nonlocal) contacts
[Figure: map of Cα-Cα distances < 6 Å beside a rainbow ribbon diagram (blue to red: N to C) of 1avg, the structure of triabin]
[Figure: map of Cα-Cα distances < 6 Å beside a rainbow ribbon diagram (blue to red: N to C) of the structure of N15 Cro]
[Figure: map of all heavy atom distances < 6 Å (includes side chains) beside a rainbow ribbon diagram (blue to red: N to C) of the structure of N15 Cro]
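A Cα contact map like the ones described here can be sketched as a boolean matrix. The example below is an illustration only: the function names and the min_separation parameter (how far apart in sequence two residues must be to count as nonlocal) are assumptions, and the coordinates in the usage lines are an idealized hairpin, not a real structure.

```python
import math


def contact_map(ca_coords, cutoff=6.0):
    """Return an n x n matrix of booleans; entry [i][j] is True when the
    C-alpha atoms of residues i and j are closer than cutoff (Angstroms)."""
    n = len(ca_coords)
    return [[math.dist(ca_coords[i], ca_coords[j]) < cutoff
             for j in range(n)]
            for i in range(n)]


def nonlocal_contacts(cmap, min_separation=3):
    """List contacts (i, j) with j - i >= min_separation, i.e. the
    off-diagonal part of the map."""
    n = len(cmap)
    return [(i, j)
            for i in range(n)
            for j in range(i + min_separation, n)
            if cmap[i][j]]


if __name__ == "__main__":
    # Idealized two-stranded hairpin with 3.8 Angstrom spacing (made-up data).
    coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0),
              (7.6, 3.8, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]
    cmap = contact_map(coords)
    for row in cmap:
        print("".join("#" if contact else "." for contact in row))
    print(nonlocal_contacts(cmap))
```

Passing the coordinates of all heavy atoms grouped by residue, instead of Cα atoms only, would give the side-chain-inclusive variant of the map.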
Surface and interior of globular proteins
solvent accessible surface
molecular surface
residue fractional accessibility
pockets and cavities
“hydrophobic core”
ordered waters in protein structures
“Accessible Surface”
represent atoms as spheres w/appropriate radii and eliminate overlapping parts...
mathematically roll a sphere all around that surface...
the sphere’s center traces out a surface as it rolls...
Lee & Richards, 1971
Shrake & Rupley, 1973
Now look at a cross-section (slice) of a protein structure:
Inner surfaces here are van der Waals surfaces. The outer surface is the one traced out by the center of the sphere as it rolls around the van der Waals surface. If any part of the arc around a given atom is traced out, that atom is accessible to solvent. The solvent accessible surface of an atom is defined as the sum of the arcs traced around that atom.
there’s not much solvent accessible surface in the middle
[Figure: cross-section from Lee & Richards, 1971, showing the van der Waals surface, the solvent accessible surface, and an arc traced around an atom]
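A numerical version of the rolling-sphere idea, in the spirit of Shrake & Rupley, can be sketched as follows: place test points on each atom's expanded sphere (van der Waals radius plus probe radius) and count the points that do not fall inside any other atom's expanded sphere. This is a rough sketch, not the published implementation; the probe radius of 1.4 Å, the number of test points, the golden-spiral point layout and the function names are all assumptions made for illustration.

```python
import math


def sphere_points(n):
    """n roughly evenly spread points on the unit sphere (golden spiral)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for k in range(n):
        z = 1.0 - (2.0 * k + 1.0) / n
        r = math.sqrt(1.0 - z * z)
        points.append((r * math.cos(golden_angle * k),
                       r * math.sin(golden_angle * k),
                       z))
    return points


def shrake_rupley(atoms, probe=1.4, n_points=200):
    """atoms: list of ((x, y, z), vdw_radius) tuples.
    Returns the estimated solvent accessible area of each atom (A^2)."""
    unit = sphere_points(n_points)
    areas = []
    for i, (center_i, radius_i) in enumerate(atoms):
        big_r = radius_i + probe  # radius of the surface traced by the probe center
        exposed = 0
        for ux, uy, uz in unit:
            point = (center_i[0] + big_r * ux,
                     center_i[1] + big_r * uy,
                     center_i[2] + big_r * uz)
            # The point is accessible if the probe center placed there
            # does not penetrate any other atom.
            if all(math.dist(point, center_j) >= radius_j + probe
                   for j, (center_j, radius_j) in enumerate(atoms) if j != i):
                exposed += 1
        areas.append(4.0 * math.pi * big_r * big_r * exposed / n_points)
    return areas


if __name__ == "__main__":
    # Two overlapping carbon-like spheres (radius 1.8 is an illustrative value).
    print(shrake_rupley([((0.0, 0.0, 0.0), 1.8), ((3.0, 0.0, 0.0), 1.8)]))
```

More test points give a smoother estimate at higher cost; production tools also restrict the inner loop to nearby atoms rather than testing every atom.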
“Accessible surface”/“Molecular surface”
note: these are alternative ways of representing the same reality:
the surface which is essentially in contact with solvent
• molecular and accessible surfaces are both useful
representations, but molecular surface is more
closely related to the actual atomic surfaces. This
makes it somewhat better for visualizing the texture
of the outer surface, as well as for assessing the
shape and volume of any internal cavities.
• you will hear the term Connolly surface used often,
after Michael Connolly. A Connolly surface is a
particular way of calculating the molecular surface.
The accessible surface is also occasionally called the
Richards surface, after Fred Richards.
Molecular surface of proteins
depiction of heavy atoms (O, N, C, S) in a protein as van der Waals spheres
depiction of the corresponding “molecular surface”: the volume contained by this surface is the vdW volume plus the “interstitial volume” (the spaces in between)
The irregular surface of proteins: pockets and cavities
• a pocket is an empty concavity on a protein surface which is accessible to solvent from the outside.
• a cavity or void in a protein is a pocket which has no opening to the outside. It is an interior empty space inside the protein.
Pockets and cavities can be critical features of proteins in terms of their binding behavior, and identifying them is usually a first step in structure-based ligand design etc.
Fractional accessibility
• calculate the total solvent accessible surface of the protein structure (one can also calculate the solvent accessible surface for individual residues/sidechains within the protein)
• one can also model the accessible surface area in a disordered or unfolded protein using accessible surface area calculations on model tripeptides such as Ala-X-Ala or Gly-X-Gly.
• from these we can calculate what fraction of the surface is buried (inaccessible to solvent) by virtue of being within the folded, native structure of the protein.
• this is done by dividing the accessible surface area in the native protein structure by the accessible surface in the modelled unfolded protein. That’s the fractional accessibility; the fraction buried is one minus this ratio. The residue fractional accessibility and side chain fractional accessibility refer to the same thing calculated for individual residues/sidechains within the structure.
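The ratio described above is a one-line calculation once both areas are available. The sketch below assumes the caller supplies the native-state area and the reference area from a tripeptide model; the function names, the 5% cutoff default and the numbers in the usage lines are illustrative only (the reference value of 100 is made up, not a measured Gly-X-Gly area).

```python
def fractional_accessibility(asa_native, asa_reference):
    """Accessible area in the folded structure divided by the accessible
    area of the same residue in an extended model (e.g. Gly-X-Gly)."""
    if asa_reference <= 0.0:
        raise ValueError("reference area must be positive")
    return asa_native / asa_reference


def is_buried(asa_native, asa_reference, cutoff=0.05):
    """True when the fractional accessibility is at or below the cutoff
    (cutoff=0.0 corresponds to the 0% ASA definition of buried)."""
    return fractional_accessibility(asa_native, asa_reference) <= cutoff


if __name__ == "__main__":
    # 1.4 A^2 in the folded protein against a made-up reference of 100 A^2.
    print(fractional_accessibility(1.4, 100.0), is_buried(1.4, 100.0))
```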
Accessible surface area in globular protein structures
Accessible surface area As in native states of proteins is a non-linear function of molecular weight (Miller, Janin, Lesk & Chothia, 1987):
As = 6.3 * Mr^0.73
where Mr is the molecular weight
This is an empirical correlation but it comes close to the expected two-thirds power law relating surface area to volume or mass for a set of bodies of similar shape and density.
How much surface area is buried when a protein adopts its native structure in solution?
• estimate the total accessible surface area At of the extended/disordered polypeptide chain using the accessible surface areas in Gly-X-Gly or Ala-X-Ala models. This is a linear function of molecular weight:
At = 1.48 * Mr + 21
• the total fractional accessibility is As/At, and the fraction of surface area buried is 1 - As/At
• What is the total fractional surface area buried for a protein of molecular weight 10,000? 20,000? Is the fraction higher for small proteins or large?
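The question can be worked through directly from the two empirical relations quoted in these slides, As = 6.3 * Mr^0.73 for the native state and At = 1.48 * Mr + 21 for the extended chain. The sketch below simply evaluates them; the function names are made up, and the areas are assumed to be in Å².

```python
def native_asa(mr):
    """Empirical accessible surface area of a native protein (Miller et al.)."""
    return 6.3 * mr ** 0.73


def extended_asa(mr):
    """Accessible surface area estimated for the extended chain."""
    return 1.48 * mr + 21.0


def fraction_buried(mr):
    """Fraction of the extended-chain surface buried on folding: 1 - As/At."""
    return 1.0 - native_asa(mr) / extended_asa(mr)


if __name__ == "__main__":
    for mr in (10000, 20000):
        print(mr, round(native_asa(mr)), round(extended_asa(mr)),
              round(fraction_buried(mr), 2))
```

With these relations the buried fraction comes out at roughly 0.65 for Mr = 10,000 and roughly 0.71 for Mr = 20,000, so larger proteins bury a larger share of their surface, as expected when area grows more slowly than mass.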
Distribution of residue fractional accessibilities
[Figure: distribution of residue fractional accessibilities, from Miller et al, 1987]
note that a sizeable group are completely buried (hatched) or nearly completely buried
note broad distribution among non-buried residues, and mean fractional accessibility for non-buried residues of around 0.5
note that few residues are completely exposed to solvent, but that fractional accessibility of >1 is possible
Buried residues in proteins
• the fraction of buried residues (defined by 0% or 5% ASA cutoffs) increases as a function of molecular weight: for your average protein around 25% of the residues will be buried. These form the core.

size class   mean Mr   fraction of buried residues
                       0% ASA    5% ASA
small         8000     0.070     0.154
medium       16000     0.107     0.240
large        25000     0.139     0.309
XL           34000     0.155     0.324
all                    0.118     0.257
Residue fractional accessibility correlates with free
energies of transfer for amino acids between water
and organic solvents
• (Miller, Janin, Lesk & Chothia, 1987)
• (Fauchere & Pliska, 1983)
• the interior of a protein is akin to a
nonpolar solvent in which the nonpolar
sidechains are buried. Polar sidechains,
on the other hand, are usually on the
surface. However, some polar side chains
do get buried, and it must also be
remembered that the backbone for every
residue is polar, including those with
nonpolar side chains. So a lot of polar
moieties do get buried in proteins.
The hydrophobic core of a small protein: N15 Cro
0% ASA: Pro 3, Leu 6, Ala 16, Val 27, Ile 36, Ile 44
< 5% ASA: Met 1, Ala 17, Val 20, Gln 41, Ser 54
note that some polar residues are buried
11 of 66 ordered residues have less than 5% ASA
The outer surface: water in protein structures
Structures of water-soluble
proteins determined at
reasonably high resolution
will be decorated on their
outer surfaces with water
molecules (cyan balls) with
relatively well-defined
positions, and waters may
also occur internally
Water is not just surrounding
the protein--it is interacting
with it
Water interacts with protein surfaces
Most waters visible in crystal structures make hydrogen bonds to each other and/or to the protein, as donor/acceptor/both
first shell waters: in contact with/hydrogen bonded to protein
second shell water: only contacts other waters
DSSP Web Service
http://mrs.cmbi.ru.nl/hsspsoap/
STRIDE web service
http://webclu.bio.wzw.tum.de/cgibin/stride/stridecgi.py
REM  --------------- Detailed secondary structure assignment-------------
REM
REM  |---Residue---|    |--Structure--|   |-Phi-|   |-Psi-|  |-Area-|
ASG  ILE A    1    1    C          Coil    360.00    168.01      69.6
ASG  VAL A    2    2    E        Strand    -97.71    163.93      42.5
ASG  CYS A    3    3    E        Strand   -164.52    149.74       1.4
ASG  HIS A    4    4    E        Strand    -98.82    174.84      39.5
ASG  THR A    5    5    E        Strand   -171.97    161.21      25.5
ASG  THR A    6    6    E        Strand   -119.23     98.92      13.1
ASG  ALA A    7    7    C          Coil   -159.51    -46.53      10.0
ASG  THR A    8    8    T          Turn    -76.14   -145.16      41.5
ASG  SER A    9    9    T          Turn    -67.19    -64.98      58.7
ASG  PRO A   10   10    T          Turn    -98.83   -165.54      75.7
ASG  ILE A   11   11    E        Strand    -63.95    136.61      71.6
ASG  SER A   12   12    E        Strand    -95.58    151.90       4.8
ASG  ALA A   13   13    E        Strand   -149.03    116.85      55.7
ASG  VAL A   14   14    E        Strand   -140.58    165.04      77.2
ASG  THR A   15   15    E        Strand    -95.72    140.63      82.1
ASG  CYS A   16   16    C          Coil    -90.67    106.54      11.5
ASG  PRO A   17   17    C          Coil    -62.41    -47.14     122.3
ASG  PRO A   18   18    T          Turn    -71.40   -166.42      60.1
ASG  GLY A   19   19    T          Turn    -28.03    -69.07      66.1
ASG  GLU A   20   20    T          Turn    -76.00     94.17      91.2
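STRIDE output of this kind is easy to read back into a program. The sketch below splits each ASG line on whitespace, which is an assumption that holds for records shaped like the ones shown here (ten whitespace-separated fields); it is not an official parser, and the function name and dictionary keys are made up for illustration.

```python
def parse_stride_asg(text):
    """Parse STRIDE 'ASG' lines into dictionaries with residue name, chain,
    residue numbers, secondary structure, phi, psi and accessible area."""
    records = []
    for line in text.splitlines():
        fields = line.split()
        # Skip REM lines and anything that is not a complete ASG record.
        if len(fields) < 10 or fields[0] != "ASG":
            continue
        records.append({
            "resname": fields[1],
            "chain": fields[2],
            "pdb_resnum": fields[3],   # kept as text; may carry an insertion code
            "ordinal": int(fields[4]),
            "ss_code": fields[5],
            "ss_name": fields[6],
            "phi": float(fields[7]),
            "psi": float(fields[8]),
            "area": float(fields[9]),
        })
    return records


if __name__ == "__main__":
    sample = """\
REM  |---Residue---|    |--Structure--|   |-Phi-|   |-Psi-|  |-Area-|
ASG  ILE A    1    1    C          Coil    360.00    168.01      69.6
ASG  VAL A    2    2    E        Strand    -97.71    163.93      42.5
ASG  CYS A    3    3    E        Strand   -164.52    149.74       1.4
"""
    records = parse_stride_asg(sample)
    print("".join(r["ss_code"] for r in records))
    print([r["resname"] for r in records if r["area"] < 5.0])
```

The per-residue areas parsed this way can be divided by reference areas to obtain fractional accessibilities.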
Structure Analysis
• Assign secondary structure for amino acids from
3D structure
• Generate solvent accessible area for amino
acids from 3D structure
• Most widely used tool: DSSP (Dictionary of
Protein Secondary Structure: Pattern
Recognition of Hydrogen-Bonded and
Geometrical Features. Kabsch and Sander,
1983)
2D: Contact Map Prediction
[Figure: a 3D structure and its n x n 2D contact map, with residue index i (1 ... n) on one axis and j (1 ... n) on the other]
Distance threshold = 8 Å
Cheng, Randall, Sweredoski, Baldi. Nucleic Acids Research, 2005
3D Structure Prediction Tools
• MULTICOM (http://sysbio.rnet.missouri.edu/multicom_toolbox/index.html)
• I-TASSER (http://zhang.bioinformatics.ku.edu/I-TASSER/)
• HHpred (http://protevo.eb.tuebingen.mpg.de/toolkit/index.php?view=hhpred)
• Robetta (http://robetta.bakerlab.org/)
• 3D-Jury (http://bioinfo.pl/Meta/)
• FFAS (http://ffas.ljcrf.edu/ffas-cgi/cgi/ffas.pl)
• Pcons (http://pcons.net/)
• Sparks (http://phyyz4.med.buffalo.edu/hzhou/anonymous-foldsp3.html)
• FUGUE (http://wwwcryst.bioc.cam.ac.uk/%7Efugue/prfsearch.html)
• FOLDpro (http://mine5.ics.uci.edu:1026/foldpro.html)
• SAM (http://www.cse.ucsc.edu/research/compbio/sam.html)
• Phyre (http://www.sbg.bio.ic.ac.uk/~phyre/)