Chapter 12
Protein Structure Basics
•20 naturally occurring amino acids
•Free amino group (-NH2)
•Free carboxyl group (-COOH)
•Both groups linked to a central carbon (Cα)
Dihedral Angles
Ramachandran plot
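The Ramachandran plot places each residue by its two backbone dihedral angles, φ (atoms C of residue i-1, then N, Cα, C of residue i) and ψ (N, Cα, C of residue i, then N of residue i+1). As a minimal sketch in plain Python, with no structure library assumed, a signed dihedral angle can be computed from four atom positions. The function names are hypothetical.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed torsion angle (degrees, -180..180) about the p1-p2 bond."""
    b0 = _sub(p0, p1)                      # bond p1 -> p0
    b1 = _sub(p2, p1)                      # central bond p1 -> p2
    b2 = _sub(p3, p2)                      # bond p2 -> p3
    n = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / n, b1[1] / n, b1[2] / n)
    # Project the outer bonds onto the plane perpendicular to the central bond
    k0, k2 = _dot(b0, b1), _dot(b2, b1)
    v = _sub(b0, (k0 * b1[0], k0 * b1[1], k0 * b1[2]))
    w = _sub(b2, (k2 * b1[0], k2 * b1[1], k2 * b1[2]))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))

# phi(i) = dihedral(C[i-1], N[i], CA[i], C[i])
# psi(i) = dihedral(N[i], CA[i], C[i], N[i+1])
print(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (0, 1, 1)))  # about -90
```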
Hierarchy
•Primary structure
•Linear sequence of amino acids
•Secondary structure
•Local conformation of the peptide chain
•Stabilized by H-bonds between the NH and C=O groups of different residues
•Tertiary structure
•Three-dimensional arrangement of all secondary structure elements and connecting regions
•Quaternary structure
•Assembly of several polypeptide chains into a protein complex
Stabilizing forces
Secondary through quaternary structures are maintained by non-covalent forces
Electrostatic interactions
Excess negative charge balanced by positive charge in another region
Salt bridge
Van der Waals forces
Induced dipole
Hydrogen bonding
Sharing of a proton between two electronegative atoms
Short distance (<3Å)
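A rough geometric reading of the <3 Å criterion: the sketch below flags donor/acceptor heavy-atom pairs closer than a cutoff. It ignores bond angles and hydrogen positions, so it is only a first filter, not a full H-bond definition. The function name, atom labels and coordinates are hypothetical.

```python
import math
from itertools import product

def candidate_hbonds(donors, acceptors, cutoff=3.0):
    """Return (donor, acceptor, distance) for pairs closer than cutoff (Å).

    donors/acceptors map a label to the (x, y, z) position of the heavy atom
    (e.g. backbone N as donor, backbone O as acceptor). Angles are ignored.
    """
    hits = []
    for (d_name, d_xyz), (a_name, a_xyz) in product(donors.items(), acceptors.items()):
        dist = math.dist(d_xyz, a_xyz)
        if dist < cutoff:
            hits.append((d_name, a_name, round(dist, 2)))
    return hits

# Hypothetical atoms: only the first acceptor lies within 3 Å of the donor
donors = {"N:res5": (0.0, 0.0, 0.0)}
acceptors = {"O:res1": (2.9, 0.0, 0.0), "O:res9": (4.0, 0.0, 0.0)}
print(candidate_hbonds(donors, acceptors))
```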
α-Helices
•3.6 aa per turn
•φ ≈ -60º
•ψ ≈ -45º
•A, Q, L, M frequent
•P, G, Y scarce
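The 3.6 residues per turn fix the angular step at 360°/3.6 = 100° per residue. As a sketch, an ideal Cα trace can be generated from that step plus two textbook approximations that are not on the slide and are assumed here: a rise of 1.5 Å per residue and a Cα distance from the helix axis of about 2.3 Å. The resulting pitch (3.6 × 1.5 = 5.4 Å) and the spacing of consecutive Cα atoms (about 3.8 Å, the usual value for trans peptides) serve as sanity checks. Names are hypothetical.

```python
import math

RESIDUES_PER_TURN = 3.6   # from the slide
RISE_PER_RESIDUE = 1.5    # Å along the helix axis (assumed textbook value)
CA_RADIUS = 2.3           # Å from the axis to each Cα (assumed, approximate)

def ideal_helix_ca(n):
    """Cα coordinates of an ideal α-helix with n residues, axis along z."""
    step = 2 * math.pi / RESIDUES_PER_TURN   # 100° per residue
    return [(CA_RADIUS * math.cos(i * step),
             CA_RADIUS * math.sin(i * step),
             i * RISE_PER_RESIDUE) for i in range(n)]

ca = ideal_helix_ca(8)
print(round(math.degrees(2 * math.pi / RESIDUES_PER_TURN)))  # 100
print(round(RESIDUES_PER_TURN * RISE_PER_RESIDUE, 2))         # 5.4 (pitch, Å)
print(round(math.dist(ca[0], ca[1]), 2))                      # about 3.8 Å
```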
β-Sheet
•H-bonded β-strands
•Parallel
•Anti-parallel
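Whether two H-bonded strands are parallel or anti-parallel depends only on their relative N-to-C direction. A minimal sketch, assuming each strand is given as its Cα coordinates in sequence order and that the two strands are already known to pair: compare the strand direction vectors by the sign of their dot product. Names and coordinates are hypothetical.

```python
import math

def strand_direction(ca_coords):
    """Unit vector from the first to the last Cα of a strand (N- to C-terminus)."""
    (x0, y0, z0), (x1, y1, z1) = ca_coords[0], ca_coords[-1]
    v = (x1 - x0, y1 - y0, z1 - z0)
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def strand_pairing(strand_a, strand_b):
    """'parallel' if the two strands run the same way, else 'anti-parallel'."""
    cos_angle = sum(p * q for p, q in zip(strand_direction(strand_a),
                                          strand_direction(strand_b)))
    return "parallel" if cos_angle > 0 else "anti-parallel"

# Toy strands along x, 4.8 Å apart in y (illustrative coordinates only)
s1 = [(0.0, 0.0, 0.0), (3.3, 0.0, 0.0), (6.6, 0.0, 0.0)]
s2 = [(0.0, 4.8, 0.0), (3.3, 4.8, 0.0), (6.6, 4.8, 0.0)]
print(strand_pairing(s1, s2), strand_pairing(s1, s2[::-1]))
```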
Coiled-coil
1KD8
Tertiary Structures
Globular proteins
Compact
Polar and hydrophilic aa on the outside
Hydrophobic amino acids on the inside
Integral Membrane Proteins
Exist in lipid bilayers
Transmembrane helix segments span the bilayer
Connecting loops lie in the aqueous phase
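The helix segments that cross the bilayer are mostly hydrophobic, which is why they are often located with a hydropathy plot. The sketch below swaps in one common approach that the slide does not name: a sliding-window average of Kyte-Doolittle hydropathy values, with a 19-residue window and a 1.6 threshold assumed as typical settings. Function names and the test sequence are hypothetical.

```python
# Kyte-Doolittle hydropathy scale
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy(seq, window=19):
    """Mean hydropathy of each full window, indexed by the window's start position."""
    scores = [KD[aa] for aa in seq.upper()]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]

def tm_candidates(seq, window=19, threshold=1.6):
    """0-based window starts whose mean hydropathy exceeds the threshold."""
    return [i for i, h in enumerate(hydropathy(seq, window)) if h > threshold]

# Hypothetical sequence: charged flanks around a 19-residue hydrophobic stretch
seq = "DKERS" * 4 + "LLIVALLIVAGLLIVALLV" + "DKERS" * 4
print(tm_candidates(seq))
```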
X-ray crystallography
•Protein crystallized
•Illuminated with X-ray beam, and diffraction pattern recorded
•Diffraction pattern converted to electron density map by Fourier
transformation
•Computing the 3D electron density map from the diffraction patterns requires phase information, which the recorded intensities do not contain
•Molecular replacement
•Use a homologous protein structure as a template
•Multiple isomorphous replacement
•Compare electron density changes in protein crystals containing strongly diffracting heavy metals
•Build a model with the amino acid residues that best fit the density map
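A one-dimensional toy model can make the phase problem concrete. Assuming a 16-point grid holding three point "atoms", the sketch computes discrete structure factors, keeps only their amplitudes (measured intensities are proportional to |F|², so phases are lost), and inverts the Fourier transform with and without the true phases. Only the version with phases recovers the density. This illustrates the principle only and is not crystallographic software; all names and values are hypothetical.

```python
import cmath

def dft(values):
    """Discrete Fourier transform: F(h) = sum_x f(x) * exp(-2*pi*i*h*x/N)."""
    n = len(values)
    return [sum(values[x] * cmath.exp(-2j * cmath.pi * h * x / n) for x in range(n))
            for h in range(n)]

def inverse_dft(coeffs):
    """Inverse transform; returns the real part of the reconstructed density."""
    n = len(coeffs)
    return [(sum(coeffs[h] * cmath.exp(2j * cmath.pi * h * x / n) for h in range(n)) / n).real
            for x in range(n)]

# Toy 1D "electron density": three atoms of different weight on a 16-point grid
density = [0.0] * 16
density[2], density[7], density[11] = 6.0, 8.0, 16.0

F = dft(density)
amplitudes = [abs(f) for f in F]        # recoverable from measured intensities
phases = [cmath.phase(f) for f in F]    # not measured in the experiment

with_phases = inverse_dft([a * cmath.exp(1j * p) for a, p in zip(amplitudes, phases)])
without_phases = inverse_dft(amplitudes)   # every phase set to zero
print([round(v, 2) for v in with_phases])
print([round(v, 2) for v in without_phases])
```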
NMR
•Proteins labeled with 13C or 15N
•Radiofrequency radiation used to induce nuclear spin state transitions
in a magnetic field
•Interactions between pairs of nuclear spins produce radio signal peaks whose intensities correlate with the distance between them
•Information on the distances between all pairs allows a protein model to be derived
•NMR determines structure in solution
•Because conformations are dynamic, 20-40 structures typically satisfy the distance constraints
•Only practical for small proteins (<200 aa)
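The distance dependence is steep: under the commonly used isolated spin-pair approximation (an assumption here, not stated on the slide), NOE intensity scales as r^-6, so a peak can be converted to a distance by calibrating against a pair of known separation. A second helper checks one model of the structure ensemble against upper-bound restraints. Names, the tolerance and all numbers are hypothetical.

```python
def noe_distance(intensity, ref_intensity, ref_distance):
    """Distance (Å) from NOE intensity, assuming intensity is proportional to r**-6."""
    return ref_distance * (ref_intensity / intensity) ** (1 / 6)

def violations(model_distances, upper_bounds, tolerance=0.5):
    """Atom pairs whose model distance exceeds the restraint upper bound + tolerance."""
    return [pair for pair, upper in upper_bounds.items()
            if model_distances[pair] > upper + tolerance]

# A peak 64 times weaker than a 2.5 Å reference pair implies twice the distance
print(noe_distance(1 / 64, 1.0, 2.5))  # close to 5.0

bounds = {("H1", "H2"): 5.0, ("H1", "H3"): 5.5}
model = {("H1", "H2"): 4.0, ("H1", "H3"): 6.5}
print(violations(model, bounds))
```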
Protein Structure Database
x,y,z position of each atom in crystal
http://www.rcsb.org/pdb/
[Figure: growth of the PDB from 1972 to 2008, plotting total proteins and total folds]
PDB File Format
HEADER    STRUCTURAL PROTEIN                      19-JAN-00   1DXX
TITLE     N-TERMINAL ACTIN-BINDING DOMAIN OF HUMAN DYSTROPHIN
COMPND    MOL_ID: 1;
COMPND   2 MOLECULE: DYSTROPHIN;
COMPND   3 CHAIN: A, B, C, D;
COMPND   4 FRAGMENT: ACTIN-BINDING;
COMPND   5 ENGINEERED: YES;
COMPND   6 MUTATION: YES
....
ATOM      1  N   ASP A   9      12.508 -13.297 -10.855  1.00 72.03           N
ATOM      2  CA  ASP A   9      13.095 -13.021  -9.506  1.00 73.14           C
ATOM      3  C   ASP A   9      12.436 -11.836  -8.798  1.00 73.18           C
ATOM      4  O   ASP A   9      12.528 -11.643  -7.564  1.00 73.10           O
ATOM      5  CB  ASP A   9      14.604 -12.820  -9.611  1.00 73.74           C
ATOM      6  N   SER A  10      11.786 -10.979  -9.601  1.00 70.17           N
ATOM      7  CA  SER A  10      11.064  -9.874  -8.982  1.00 65.94           C
ATOM      8  C   SER A  10       9.584 -10.270  -8.884  1.00 62.93           C
ATOM      9  O   SER A  10       9.105 -10.327  -7.742  1.00 65.59           O
ATOM     10  CB  SER A  10      11.170  -8.536  -9.692  1.00 64.61           C
ATOM     11  OG  SER A  10      12.228  -8.489 -10.623  1.00 66.53           O
ATOM     12  N   TYR A  11       8.923 -10.531 -10.021  1.00 55.09           N
ATOM     13  CA  TYR A  11       7.469 -10.665 -10.022  1.00 47.65           C
ATOM     14  C   TYR A  11       7.021  -9.267  -9.544  1.00 47.76           C
ATOM     15  O   TYR A  11       6.507  -9.012  -8.432  1.00 43.11           O
ATOM     16  CB  TYR A  11       6.902 -11.787  -9.161  1.00 49.31           C
ATOM     17  N   GLU A  12       7.465  -8.308 -10.384  1.00 46.50           N
ATOM     18  CA  GLU A  12       7.227  -6.877 -10.295  1.00 40.38           C
ATOM     19  C   GLU A  12       6.129  -6.708 -11.389  1.00 35.66           C
ATOM     20  O   GLU A  12       6.474  -6.721 -12.555  1.00 32.67           O
ATOM     21  CB  GLU A  12       8.238  -5.854 -10.720  1.00 43.57           C
ATOM     22  CG  GLU A  12       9.467  -5.254 -10.159  1.00 45.68           C
ATOM     23  CD  GLU A  12       9.287  -4.625  -8.796  1.00 51.24           C
ATOM     24  OE1 GLU A  12       8.844  -3.454  -8.787  1.00 54.08           O
ATOM     25  OE2 GLU A  12       9.501  -5.315  -7.773  1.00 52.85           O
ATOM     26  N   ARG A  13       4.898  -6.585 -10.978  1.00 32.78           N
ATOM     27  CA  ARG A  13       3.758  -6.423 -11.854  1.00 25.88           C
ATOM     28  C   ARG A  13       3.458  -4.954 -11.964  1.00 24.51           C
ATOM     29  O   ARG A  13       2.709  -4.478 -11.111  1.00 30.56           O
ATOM     30  CB  ARG A  13       2.546  -7.147 -11.212  1.00 23.91           C
ATOM     31  CG  ARG A  13       2.797  -8.664 -11.236  1.00 27.29           C
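PDB coordinate records are fixed-width, so fields are read by column position rather than by splitting on spaces (neighbouring fields can run together, e.g. with large negative coordinates). A minimal sketch using the standard ATOM column layout (serial 7-11, atom name 13-16, residue name 18-20, chain 22, residue number 23-26, x/y/z 31-54, occupancy 55-60, B-factor 61-66, element 77-78), applied to the two Cα records of ASP 9 and SER 10 from the 1DXX excerpt. The function names are hypothetical.

```python
import math

def parse_atom(line):
    """Parse one fixed-column PDB ATOM/HETATM record (0-based slices)."""
    return {
        "serial": int(line[6:11]),
        "name": line[12:16].strip(),
        "res_name": line[17:20].strip(),
        "chain": line[21],
        "res_seq": int(line[22:26]),
        "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
        "occupancy": float(line[54:60]),
        "b_factor": float(line[60:66]),
        "element": line[76:78].strip(),
    }

def ca_atoms(lines):
    """Cα atoms, in file order, from the ATOM records of a PDB file."""
    atoms = (parse_atom(ln) for ln in lines if ln.startswith("ATOM"))
    return [a for a in atoms if a["name"] == "CA"]

records = [
    "ATOM      2  CA  ASP A   9      13.095 -13.021  -9.506  1.00 73.14           C",
    "ATOM      7  CA  SER A  10      11.064  -9.874  -8.982  1.00 65.94           C",
]
ca = ca_atoms(records)
print(round(math.dist(ca[0]["xyz"], ca[1]["xyz"]), 2))  # 3.78
```

The roughly 3.8 Å spacing between consecutive Cα atoms is a quick sanity check that the columns were read correctly.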
Other structure file formats
mmCIF
•Macromolecular crystallographic information file
•Similar to relational database
•Each field assigned a tag and linked to another field
MMDB
•Molecular modeling database
•ASN.1 format
•Nested hierarchy
Chapter 13
Protein structure visualization, comparison and classification
Download and install Jmol
http://jmol.sourceforge.net/
wireframe
CPK (Corey, Pauling and Koltun)
Ball-and-stick
Cartoon
Rendered in POV-Ray: http://www.povray.org/
Protein structure comparisons
Comparing two protein structures is a fundamental
technique in protein analysis
Finding remote homologs
Protein structures can be very similar even when sequence identity is very low (<20%)
Intermolecular method
Identify equivalent residues
Translate one structure relative to the other until both occupy the same space
Rotate one structure relative to the other, continuously calculating distances between equivalent residues
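Root mean square deviation (RMSD):
$$\mathrm{RMSD} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} D_i^{2}}$$
D_i = distance between the i-th pair of equivalent residues; N = number of equivalent pairs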
Larger proteins have larger RMSD
Equivalent residues can be difficult to identify; common strategies:
Discard regions outside secondary structures
Work with 6-9 residue fragments
Dynamic programming, starting with few equivalent residues
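The RMSD formula and the translation step of the intermolecular method can be sketched directly. This minimal version assumes the equivalent residues are already paired in the same order; it centers both structures on their centroids but leaves out the rotation search (in practice an optimal-rotation method such as the Kabsch algorithm, which the slides do not name). Names and coordinates are hypothetical.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD = sqrt(sum(D_i**2) / N) over N pairs of equivalent atoms."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need the same, non-zero number of equivalent atoms")
    total = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))

def centered(coords):
    """Translate a structure so that its centroid lies at the origin."""
    n = len(coords)
    cx, cy, cz = (sum(c[k] for c in coords) / n for k in range(3))
    return [(x - cx, y - cy, z - cz) for x, y, z in coords]

# Hypothetical equivalent Cα atoms; B is A shifted by (10, -5, 2)
a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0)]
b = [(x + 10.0, y - 5.0, z + 2.0) for x, y, z in a]
print(rmsd(a, b))                      # large: structures not yet superposed
print(rmsd(centered(a), centered(b)))  # near 0 after translation alone
```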
Intramolecular method
•Calculate, separately for each protein, a matrix of all pairwise residue distances
•Slide the two matrices relative to each other until differences are minimal
•Good for identifying similar secondary structure regions in two proteins
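Intramolecular methods compare distance matrices instead of superposing coordinates. The sketch below is a deliberately simplified stand-in (not the DALI scoring function): it builds a Cα distance matrix per protein and exhaustively finds the pair of fixed-length fragments whose sub-matrices differ least. Because internal distances do not change under rotation or translation, no superposition is needed. Function names, the fragment length and the toy coordinates are hypothetical.

```python
import math

def distance_matrix(ca):
    """All pairwise Cα-Cα distances (Å) within one structure."""
    return [[math.dist(p, q) for q in ca] for p in ca]

def fragment_difference(dm_a, dm_b, i, j, length):
    """Mean |dA - dB| between the sub-matrices of A[i:i+length] and B[j:j+length]."""
    diffs = [abs(dm_a[i + p][i + q] - dm_b[j + p][j + q])
             for p in range(length) for q in range(length)]
    return sum(diffs) / len(diffs)

def best_fragment_pair(dm_a, dm_b, length=6):
    """Exhaustively find the fragment pair (i, j) with the smallest difference."""
    score, i, j = min(
        (fragment_difference(dm_a, dm_b, i, j, length), i, j)
        for i in range(len(dm_a) - length + 1)
        for j in range(len(dm_b) - length + 1)
    )
    return i, j, score

# Toy data: B is A rotated 90° about z and shifted, so its matrix is the same
ca_a = [(3.8 * k, 1.0 * (k % 2), 0.0) for k in range(8)]
ca_b = [(-y + 5.0, x - 2.0, z + 1.0) for x, y, z in ca_a]
print(best_fragment_pair(distance_matrix(ca_a), distance_matrix(ca_b)))
```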
Multiple structure alignment
Compare structures pairwise, generating a matrix of RMSD scores
Construct a phylogenetic tree
The two most similar structures are realigned
A median structure is created, to which other, more distant structures are systematically aligned
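The first steps of this progressive scheme can be sketched as a greedy ordering over a precomputed pairwise RMSD matrix. This is a simplification assumed for illustration: it picks the closest pair, then adds the remaining structures by increasing mean RMSD to those already placed, instead of building a full phylogenetic tree or a median structure. Names and RMSD values are hypothetical.

```python
def progressive_order(names, rmsd):
    """Greedy alignment order from a symmetric dict-of-dicts of pairwise RMSDs (Å)."""
    # Closest pair first
    _, first, second = min((rmsd[a][b], a, b)
                           for i, a in enumerate(names) for b in names[i + 1:])
    order = [first, second]
    remaining = [n for n in names if n not in order]
    # Then the structure with the lowest mean RMSD to the already-aligned set
    while remaining:
        nxt = min(remaining, key=lambda n: sum(rmsd[n][m] for m in order) / len(order))
        order.append(nxt)
        remaining.remove(nxt)
    return order

pairs = {("P1", "P2"): 1.0, ("P1", "P3"): 2.5, ("P1", "P4"): 4.0,
         ("P2", "P3"): 2.0, ("P2", "P4"): 3.5, ("P3", "P4"): 3.0}
rmsd = {n: {} for n in ("P1", "P2", "P3", "P4")}
for (a, b), value in pairs.items():
    rmsd[a][b] = rmsd[b][a] = value
print(progressive_order(["P1", "P2", "P3", "P4"], rmsd))
```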
DALI
Distances calculated from intramolecular Cα distance matrices
Matrices are aligned to find local structural similarities
Calculate Z-score
CE Combinatorial Extension
Like DALI, but uses every 8th residue
VAST Vector Alignment Search Tool
Uses intra- and intermolecular approaches
SSAP
Intramolecular based methods
Dynamic programming to find residue path with optimal score
STAMP
Intermolecular approach, using dynamic programming
Protein structure classification
•Classification systems allow identification of relationships between structures
•Provide an evolutionary view of all structures
•Newly solved structures can be fitted into the hierarchy, suggesting possible functions
SCOP (Structural Classification of Proteins)
Manual examination of structures
Classes, folds, families and superfamilies
Families share high sequence similarity
Superfamilies may share a common ancestral protein
Folds are defined by the order and connectivity of secondary structures; members may not be evolutionarily related
Classes: folds with similar core structures: all-α, all-β, α/β, α+β, etc.
CATH (Class, Architecture, Topology and Homologous superfamily)
Uses automatic assignment with SSAP as well as manual comparison
Class similar to SCOP
Architecture: intermediate between SCOP fold and class; overall packing and arrangement of secondary structures without regard to connectivity
Topology = SCOP fold
Homologous superfamily and homologous family are equivalent to SCOP superfamily and family
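Because CATH is strictly hierarchical, a domain's position can be written as a dotted code with one number per level (class.architecture.topology.homologous superfamily). A minimal sketch, assuming that four-level dotted format, of finding how deep two domains' classifications agree. The codes below only illustrate the format and are not claims about specific proteins; the function name is hypothetical.

```python
CATH_LEVELS = ("Class", "Architecture", "Topology", "Homologous superfamily")

def shared_cath_levels(code_a, code_b):
    """CATH levels shared by two dotted classification codes, from the top down."""
    shared = []
    for level, a, b in zip(CATH_LEVELS, code_a.split("."), code_b.split(".")):
        if a != b:
            break  # hierarchical: once a level differs, lower levels cannot match
        shared.append(level)
    return shared

print(shared_cath_levels("3.40.50.300", "3.40.50.720"))
```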