Computing for Bioinformatics
Lecture 8: protein folding
Problem description
● Definition: Given the amino acid sequence of a protein, what is the protein's structure in three dimensions?
● Importance: The structure of a protein provides a key to understanding its biological function.
● Assumption: The amino acid sequence contains all the information about the native 3-D structure.
● Thermodynamic principle (Christian Anfinsen's denaturation-renaturation experiments on ribonuclease): If the solvent conditions are changed, the protein undergoes a transition from the native state to an unfolded state and becomes inactive. When the solvent conditions are changed back, the protein refolds and becomes active again.
Methods of 3D structure determination
● Experimental approaches: expensive, slow
➢ Nuclear magnetic resonance (NMR)
➢ X-ray crystallography
● Today there are many more sequenced proteins than solved protein structures, and the gap is growing rapidly.
● Protein structure prediction is therefore becoming increasingly important.
Protein Structure
• Primary structure – sequence of amino acids constituting the polypeptide chain
• Secondary structure – local organization of the polypeptide chain into secondary structures such as α-helices and β-sheets
• Tertiary structure – three-dimensional arrangement of the amino acids, which results from their polarity and the interactions between side chains
• Quaternary structure – interaction of several protein subunits
Amino acids
• Hydrophobic: Glycine (G), Alanine (A), Valine (V), Phenylalanine (F), Proline (P), Methionine (M), Isoleucine (I), Leucine (L), Tryptophan (W)
• Charged: Aspartic acid (D), Glutamic acid (E), Lysine (K), Arginine (R), Histidine (H)
• Polar: Serine (S), Threonine (T), Tyrosine (Y), Histidine (H), Cysteine (C), Asparagine (N), Glutamine (Q), Tryptophan (W)
• Histidine and tryptophan are listed twice because each has properties of both groups.
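The grouping above can be expressed as a lookup table. The following is a minimal Python sketch, not part of the original lecture: the names `CLASSES` and `composition` are illustrative, and residues that the list places in two groups (H and W) are simply counted in both.

```python
from collections import Counter

# Residue groups as listed above, using one-letter codes.
# H and W appear in two groups in the list, so they are counted in both.
CLASSES = {
    "hydrophobic": set("GAVFPMILW"),
    "charged": set("DEKRH"),
    "polar": set("STYHCNQW"),
}

def composition(sequence):
    """Return the fraction of residues that fall in each group."""
    seq = sequence.upper()
    counts = Counter()
    for residue in seq:
        for name, members in CLASSES.items():
            if residue in members:
                counts[name] += 1
    total = len(seq)
    return {name: (counts[name] / total if total else 0.0) for name in CLASSES}

if __name__ == "__main__":
    # Arbitrary four-residue example: Val, Leu, Asp, Lys
    print(composition("VLDK"))
```

Because of the double-counted residues, the three fractions can sum to more than 1 for sequences that contain H or W.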
Types of Secondary Structures
● β-Sheets
● α-Helices
● Loops
– Image source: http://www.ebi.ac.uk/microarray/biology_intro.html
α Helix
– Most abundant secondary structure
– 3.6 amino acids per turn
– Hydrogen bonds form between every fourth residue (residue i and residue i+4)
– Average length: 10 amino acids, or about 3 turns
– Varies from 5 to 40 amino acids
α Helix (continued)
• Every third amino acid tends to be hydrophobic
• Rich in alanine (A), glutamic acid (E), leucine (L), and methionine (M)
• Poor in proline (P), glycine (G), tyrosine (Y), and serine (S)
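The helix numbers above imply some simple arithmetic: at 3.6 residues per turn, each residue advances 360/3.6 = 100 degrees around the helix axis, and a 10-residue helix spans a little under 3 turns. The short Python sketch below works this out and lists the i to i+4 hydrogen-bond partners. The function names are illustrative and not from the lecture.

```python
RESIDUES_PER_TURN = 3.6  # value quoted in the lecture

def turns(n_residues):
    """Number of helical turns spanned by n_residues."""
    return n_residues / RESIDUES_PER_TURN

def hbond_pairs(n_residues):
    """1-based pairs (i, i+4) of residues joined by backbone hydrogen bonds."""
    return [(i, i + 4) for i in range(1, n_residues - 3)]

def wheel_angles(n_residues):
    """Angular position in degrees of each residue, viewed down the helix axis."""
    step = 360.0 / RESIDUES_PER_TURN  # 100 degrees per residue
    return [round((i * step) % 360.0, 1) for i in range(n_residues)]

if __name__ == "__main__":
    print(round(turns(10), 2))
    print(hbond_pairs(10))
    print(wheel_angles(5))
```

For an average 10-residue helix this gives six i to i+4 pairs, from (1, 5) to (6, 10).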
β Sheet
Image sources: http://broccoli.mfn.ki.se/pps_course_96/ss_960723_12.html; http://www4.ocn.ne.jp/~bio/biology/protein.htm
• Hydrogen bonds form between 5-10 consecutive amino acids in one portion of the chain and another 5-10 farther down the chain
• The interacting regions may be adjacent, joined by a short loop, or far apart, with other structures in between
– Strand directions:
• Same: parallel sheet
• Opposite: anti-parallel sheet
• Mixed: mixed sheet
– The pattern of hydrogen bond formation differs between parallel and anti-parallel sheets
[Figure: Interactions in Helices and Sheets]
Loops
• Regions between α helices and β sheets
• Various lengths and three-dimensional configurations
• Located on the surface of the structure
• More variable in sequence and structure
• Tend to have charged and polar amino acids
• Frequently a component of active sites
Classes of Protein Structure
Classes are defined by the percentages of secondary structure components.
1) Class α: bundles of α-helices connected by loops on the surface of the protein
2) Class β: antiparallel β sheets, usually two sheets in close contact forming a sandwich
3) Class α/β: mainly parallel β sheets with intervening α helices; may also have mixed β sheets (metabolic enzymes)
4) Class α+β: mainly segregated α-helices and antiparallel β sheets
α Class Protein (hemoglobin)
β Class Protein (T-Cell CD8)
α/β Class Protein (tryptophan synthase)
α+β Class Protein (1RNB)
Protein structure database
• Databases of three-dimensional protein structures that have been solved using X-ray crystallography or NMR
• Protein Databases:
– PDB
– SCOP
– Swiss-Prot
– PIR
• The most extensive database of 3-D structures is the Protein Data Bank (PDB).
• Current release of PDB (April 8, 2003) has 20,622 structures
Partial PDB File
ATOM      1  N   VAL A   1       6.452  16.459   4.843  7.00 47.38      3HHB 162
ATOM      2  CA  VAL A   1       7.060  17.792   4.760  6.00 48.47      3HHB 163
ATOM      3  C   VAL A   1       8.561  17.703   5.038  6.00 37.13      3HHB 164
ATOM      4  O   VAL A   1       8.992  17.182   6.072  8.00 36.25      3HHB 165
ATOM      5  CB  VAL A   1       6.342  18.738   5.727  6.00 55.13      3HHB 166
ATOM      6  CG1 VAL A   1       7.114  20.033   5.993  6.00 54.30      3HHB 167
ATOM      7  CG2 VAL A   1       4.924  19.032   5.232  6.00 64.75      3HHB 168
ATOM      8  N   LEU A   2       9.333  18.209   4.095  7.00 30.18      3HHB 169
ATOM      9  CA  LEU A   2      10.785  18.159   4.237  6.00 35.60      3HHB 170
ATOM     10  C   LEU A   2      11.247  19.305   5.133  6.00 35.47      3HHB 171
ATOM     11  O   LEU A   2      11.017  20.477   4.819  8.00 37.64      3HHB 172
Description of PDB File
• Second column: atom serial number
• Fourth column: amino acid to which the atom belongs
• Sixth column: amino acid position in the polypeptide chain
• Columns 7, 8, and 9: x, y, and z coordinates (in angstroms)
• Column 11: temperature factor, which can be used as a measure of uncertainty in the atom's position
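The column description above can be checked against the excerpt with a few lines of Python. This is a minimal sketch rather than a general PDB parser: it splits each ATOM record on whitespace, which works for the 3HHB excerpt but not for every PDB file, because the format is defined by fixed character positions and adjacent fields can run together. The names `parse_atom` and `ca_distance` are illustrative.

```python
import math

# Three ATOM records taken from the 3HHB excerpt above.
RECORDS = """\
ATOM      2  CA  VAL A   1       7.060  17.792   4.760  6.00 48.47      3HHB 163
ATOM      8  N   LEU A   2       9.333  18.209   4.095  7.00 30.18      3HHB 169
ATOM      9  CA  LEU A   2      10.785  18.159   4.237  6.00 35.60      3HHB 170
"""

def parse_atom(line):
    """Split one ATOM record into named fields (whitespace-separated columns)."""
    f = line.split()
    return {
        "serial": int(f[1]),       # column 2: atom serial number
        "atom": f[2],              # column 3: atom name
        "residue": f[3],           # column 4: amino acid
        "chain": f[4],             # column 5: chain identifier
        "position": int(f[5]),     # column 6: residue position in the chain
        "xyz": (float(f[6]), float(f[7]), float(f[8])),  # columns 7-9, angstroms
        "b_factor": float(f[10]),  # column 11: temperature factor
    }

def ca_distance(a, b):
    """Euclidean distance in angstroms between two parsed atoms."""
    return math.dist(a["xyz"], b["xyz"])

atoms = [parse_atom(line) for line in RECORDS.splitlines() if line.startswith("ATOM")]

if __name__ == "__main__":
    print(f"CA-CA distance: {ca_distance(atoms[0], atoms[2]):.2f} angstroms")
```

For the two Cα atoms in the excerpt (VAL 1 and LEU 2) the distance comes out at about 3.78 Å, close to the usual spacing of roughly 3.8 Å between consecutive Cα atoms.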
Visualization of Proteins
• The most popular program for viewing three-dimensional structures is RasMol
Rasmol: http://www.umass.edu/microbio/rasmol/
Chime: http://www.umass.edu/microbio/chime/
Cn3D: http://www.ncbi.nlm.nih.gov/Structure/
Mage: http://kinemage.biochem.duke.edu/website/kinhome.html
Swiss 3D viewer: http://www.expasy.ch/spdbv/mainpage.html