PowerPoint slides - George Mason University


A Statistical Geometry
Approach to the Study of
Protein Structure
Majid Masso
Bioinformatics and Computational Biology
George Mason University
Protein Basics
proteins are formed by linearly linking amino acid residues (aa's are the building blocks of proteins)
20 distinct aa types: A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W, Y
[Figure: structure of leucine (Leu or L). The backbone, +H3N-Cα(H)-C(=O)-O-, is identical for all amino acids; the side chain (R group), here CH2-CH(CH3)2, is unique for each amino acid.]
[Figure: peptide bond formation. Two amino acids with side chains R1 and R2 are joined by a C-N peptide bond, releasing H2O.]

Protein Basics




genes: code, or “blueprint”
proteins: product, or “building”
protein structure gives rise to
function
why do “things go wrong”?



mistakes in “blueprint”
incorrectly built, or nonexistent
“buildings”
Protein Data Bank (PDB):
repository of protein structural
data, including 3D coords. of all
atoms (www.rcsb.org/pdb/)
PDB ID: 1REZ
Structure reference: Muraki M., Harata K., Sugita N., Sato K.,
Origin of carbohydrate recognition specificity of human
lysozyme revealed by affinity labeling, Biochemistry 35 (1996)
Computational Geometry Approach to
Protein Structure Prediction
Tessellation






protein structure represented as a set
of points in 3D, using Cα coordinates
Voronoi tessellation: convex polyhedra,
each contains one Cα , all interior points
closer to this Cα than any other
Delaunay tessellation: connect four Cα
whose Voronoi polyhedra meet at a
common vertex
vertices of Delaunay simplices
objectively define a set of four nearest-neighbor residues (quadruplets)
5 classes of Delaunay simplices
Quickhull algorithm (qhull program),
Barber et al., UMN Geometry Center
Voronoi/Delaunay tessellation in 2D space. Voronoi
tessellation-dashed line, Delaunay tessellation-solid
line (Adapted from Singh R.K., et al. J. Comput. Biol.,
1996, 3, 213-222.)
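
The tessellation step can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the slides: it assumes NumPy and SciPy are available, uses `scipy.spatial.Delaunay` (which wraps the Qhull library) in place of the standalone qhull program, and substitutes random points for real Cα coordinates. The helper name `delaunay_quadruplets` is hypothetical.

```python
import numpy as np
from scipy.spatial import Delaunay


def delaunay_quadruplets(ca_coords):
    """Return the Delaunay simplices as sorted 4-tuples of point indices."""
    tri = Delaunay(np.asarray(ca_coords, dtype=float))
    # In 3D every simplex is a tetrahedron, i.e. four vertex indices.
    return [tuple(sorted(int(v) for v in simplex)) for simplex in tri.simplices]


# Hypothetical stand-in for C-alpha coordinates: 30 random points in a 20 A box.
rng = np.random.default_rng(0)
ca_coords = rng.random((30, 3)) * 20.0

quadruplets = delaunay_quadruplets(ca_coords)
print(len(quadruplets), "simplices; first:", quadruplets[0])
```

With real data, `ca_coords` would hold one row per residue, so each index in a quadruplet identifies a residue position in the sequence.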
[Figure: the five simplex classes, labeled {1,1,1,1}, {2,1,1}, {2,2}, {3,1}, and {4}; vertices are marked i, j, k, l, with sequence-consecutive residues marked i+1, i+2, i+3, and j+1.]
Five classes of Delaunay simplices. (Adapted from
Singh R.K., et al. J. Comput. Biol., 1996, 3, 213-222.)
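
A short sketch of how a simplex could be assigned to one of the five classes, assuming the class labels {1,1,1,1}, {2,1,1}, {2,2}, {3,1}, {4} give the sizes of the runs of sequence-consecutive residues among the four vertices. The function name `simplex_class` and the residue numbers are illustrative only.

```python
def simplex_class(residue_numbers):
    """Return the run lengths of sequence-consecutive residues, largest first."""
    nums = sorted(residue_numbers)
    runs = [1]
    for prev, cur in zip(nums, nums[1:]):
        if cur == prev + 1:
            runs[-1] += 1   # extends the current run of consecutive residues
        else:
            runs.append(1)  # starts a new run
    return tuple(sorted(runs, reverse=True))


print(simplex_class((3, 17, 40, 88)))  # (1, 1, 1, 1)
print(simplex_class((3, 4, 40, 88)))   # (2, 1, 1)
print(simplex_class((3, 4, 40, 41)))   # (2, 2)
print(simplex_class((3, 4, 5, 88)))    # (3, 1)
print(simplex_class((3, 4, 5, 6)))     # (4,)
```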
Counting Quadruplets

assuming order independence among residues comprising
Delaunay simplices, the maximum number of all possible
combinations of quadruplets forming such simplices is 8855
C D E F: $\binom{20}{4}$
C C D E: $20\binom{19}{2}$
C C D D: $\binom{20}{2}$
C C C D: $20 \cdot 19$
C C C C: $20$
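
The count of 8855 can be checked directly. The sketch below evaluates the five composition counts from the slide and, as an independent check, enumerates all unordered quadruplets (repetition allowed) over the 20 amino acid letters; the variable names are illustrative.

```python
from itertools import combinations_with_replacement
from math import comb

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Number of quadruplet types for each composition pattern.
by_composition = {
    "C D E F": comb(20, 4),       # four distinct residue types
    "C C D E": 20 * comb(19, 2),  # one pair plus two other distinct types
    "C C D D": comb(20, 2),       # two pairs
    "C C C D": 20 * 19,           # one triple plus one other type
    "C C C C": 20,                # four identical residues
}
total = sum(by_composition.values())

# Independent check: enumerate every order-independent quadruplet.
enumerated = sum(1 for _ in combinations_with_replacement(AMINO_ACIDS, 4))

print(by_composition)
print(total, enumerated)  # 8855 8855
```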
Residue Environment Scores

log-likelihood: $q_{ijkl} = \log\left(f_{ijkl} / p_{ijkl}\right)$

$f_{ijkl}$ = normalized frequency of quadruplets containing residues i,j,k,l in a representative training set of high-resolution protein structures with low primary sequence identity

i.e., $f_{ijkl}$ = total number of quadruplets in dataset containing only residues i,j,k,l divided by total number of observed quadruplets

$p_{ijkl}$ = frequency of random occurrence of the quadruplet (multinomial)

i.e., $p_{ijkl} = c\, a_i a_j a_k a_l$
$a_i$ = total number of occurrences of residue i divided by total number of residues in the dataset
$c = \dfrac{4!}{\prod_{i=1}^{n} t_i!}$, where n = number of distinct residue types in the quadruplet, and $t_i$ is the number of residues of type i.
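
The scoring formula can be illustrated with a small Python sketch. It assumes a base-10 logarithm (the slide only says "log"), scores only quadruplet types that are actually observed (an unobserved type has f = 0, so its log is undefined), and uses a made-up two-letter toy dataset; the function names are hypothetical.

```python
from collections import Counter
from math import factorial, log10


def multinomial_coefficient(quad):
    """c = 4! divided by the product of t_i! over the distinct residue types."""
    c = factorial(4)
    for t in Counter(quad).values():
        c //= factorial(t)
    return c


def log_likelihoods(quadruplets, residues):
    """Score each observed quadruplet type as log10(f / p)."""
    # Order independence: represent each quadruplet by its sorted letters.
    quad_counts = Counter("".join(sorted(q)) for q in quadruplets)
    n_quads = sum(quad_counts.values())
    res_counts = Counter(residues)
    n_res = sum(res_counts.values())
    scores = {}
    for quad, count in quad_counts.items():
        f = count / n_quads                # observed frequency
        p = multinomial_coefficient(quad)  # expected frequency: c * a_i a_j a_k a_l
        for aa in quad:
            p *= res_counts[aa] / n_res
        scores[quad] = log10(f / p)
    return scores


# Toy data (not real): two residue types, one observed quadruplet.
toy_scores = log_likelihoods([("A", "L", "A", "L")], "AALL")
print(toy_scores)
```

In the toy case f = 1 and p = 6 x 0.5^4 = 0.375, so the single score is log10(8/3).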
Residue Environment Scores


total statistical potential (topological score) of protein: sum the log-likelihoods of all quadruplets forming the Delaunay simplices
individual residue potentials: sum the log-likelihoods of all quadruplets in which the residue participates (yields a 3D-1D potential profile)
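
Both sums can be sketched as one pass over the simplices. The example assumes the simplices are given as 4-tuples of zero-based sequence positions and the scores as a dictionary keyed by sorted residue letters; treating an unseen quadruplet type as contributing 0 is an assumption of this sketch, and the sequence, simplices, and scores are made-up toy values.

```python
def potential_profile(sequence, simplices, scores):
    """Return (total potential, per-residue potential profile)."""
    profile = [0.0] * len(sequence)
    total = 0.0
    for simplex in simplices:
        key = "".join(sorted(sequence[v] for v in simplex))
        q = scores.get(key, 0.0)  # assumption: unseen quadruplet types score 0
        total += q                # each simplex counts once toward the total
        for v in simplex:
            profile[v] += q       # and once for each of its four residues
    return total, profile


# Toy inputs (not real data).
sequence = "ALGAL"
simplices = [(0, 1, 2, 3), (1, 2, 3, 4)]
scores = {"AAGL": 0.5, "AGLL": -0.2}

total, profile = potential_profile(sequence, simplices, scores)
print(total, profile)
```

Because every simplex contributes to four residues, the profile values sum to four times the total potential.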
3phv Potential Profile
PDB ID: 3phv
HIV-1 Protease Monomer
99 amino acids
(total potential 27.93)
[Figure: potential profile of 3phv. x-axis: Residue Number (0 to 100); y-axis: Potential (-2 to 12).]
Structure reference: R. Lapatto, T. Blundell, A. Hemmings, et al., X-ray analysis of HIV-1 proteinase at 2.7 Å resolution confirms structural homology among retroviral enzymes, Nature 342 (1989) 299-302.
HIV-1 Protease Comprehensive Mutational Profile (CMP)



replace the residue at each of the 99 positions in the primary sequence with each of the 19 other amino acids
compute the total potential and potential profile of each artificially created mutant protein
create a 20x99 matrix containing the total potentials of all the single-residue mutants

columns labeled with residues in the primary sequence of wild-type (WT) HIV-1
protease monomer, and rows labeled with the 20 naturally occurring amino acids
subtract WT total potential (TP) from each cell, then average columns to get CMP
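$\mathrm{CMP}_j = \frac{1}{20}\sum_{i=1}^{20}\left[(\text{mutant TP})_{ij} - (\text{WT TP})\right] = \frac{1}{20}\sum_{i=1}^{20}\left[(\text{mutant TP})_{ij} - 27.93\right], \quad j = 1, \ldots, 99$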
3phv Comprehensive Mutational Profile
[Figure: comprehensive mutational profile of 3phv. x-axis: Residue Number (0 to 100); y-axis: Mean Change in Total Protein Potential (-8 to 4).]
3phv Comprehensive Mutational Profile vs. Potential Profile
[Figure: plot of the 99 residues of 3phv, each point labeled by residue type and position (e.g., D25, I50, I85). x-axis: Individual Residue Potentials of Wild-Type Protein (potential of residue j in WT HIV-1 protease), -2 to 12; y-axis: Mean Change in Total Protein Potential (CMPj), -8 to 4.]
Structure-Function Correlations



536 single point missense mutations

336 published mutants: Loeb D.D., Swanstrom R., Everitt L., Manchester M., Stamper S.E., Hutchison III C.A. Complete mutagenesis of the HIV-1 protease. Nature, 1989, 340, 397-400
200 mutants provided by R. Swanstrom (UNC)
each mutant placed in one of 3 phenotypic
categories, positive, negative, or intermediate,
based on activity
mutant activity compared with change in
sequence-structure compatibility elucidated by
potential data
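
One way to make this comparison concrete is to average the change in total potential within each phenotypic category. The sketch below assumes each mutant is recorded as a (phenotype, mutant total potential) pair; the four mutants are made-up toy values, and only the WT total potential of 27.93 comes from the slides.

```python
from collections import defaultdict


def mean_change_by_phenotype(mutants, wt_tp):
    """Average (mutant TP - WT TP) within each phenotypic category."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for phenotype, mutant_tp in mutants:
        sums[phenotype] += mutant_tp - wt_tp
        counts[phenotype] += 1
    return {p: sums[p] / counts[p] for p in sums}


# Toy mutants (not real data): (phenotype, total potential of the mutant).
toy_mutants = [
    ("positive", 27.83),
    ("positive", 27.73),
    ("intermediate", 27.23),
    ("negative", 26.43),
]

means = mean_change_by_phenotype(toy_mutants, 27.93)
print({p: round(m, 2) for p, m in means.items()})
```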
3phv Structure-Function Correlations
[Figure: HIV-1 Protease Mutagenesis Data. Average Change in Potential (axis from 0.00 to -1.80) by HIV-1 Protease Assay phenotype, for the mutant groups ALL, C, and NC.]

        Positive   Intermediate   Negative
ALL     -0.23      -0.74          -1.39
C       -0.14      -0.75          -0.23
NC      -0.29      -0.73          -1.65
Observations



mutants with unaffected protease activity exhibit a minimal (negative) change in potential
mutants that inactivate protease exhibit a large negative change in potential, weighted heavily by NC
mutants with intermediate phenotypes exhibit a moderate negative change in potential (similar among C and NC); the intermediate phenotype covers a wide range in the experiments
Acknowledgements



Iosif Vaisman (Ph.D. advisor, first to
apply Delaunay to protein structure)
Zhibin Lu (Java programs for calculating
statistical potentials from tessellations)
Ronald Swanstrom (experimental HIV-1
protease mutants and activity measure)