Protein Folding
Proteins
What is a protein?
• A protein is a molecule consisting of
amino acids linked in a linear chain
through peptide bonds.
Protein primary structure
Peptide formation
There are many kinds of
proteins.
• Structural--determine shape and
function of cells
• Enzymes--speed up chemical reactions
• Ligand-binding--bind small molecules
and transport them to other locations
Cells
• muscle
• nerve
Structural proteins
• collagen -- in connective tissue such as
cartilage
• elastin -- in connective tissue such as
cartilage
• keratin--in hair and nails
• actin -- in muscle
• myosin -- in muscle to generate mechanical
forces
Enzymes
• glucose isomerase--converts glucose into
fructose
• rennin--used to make cheese
• cellulase--breaks down cellulose into sugars to
make ethanol
• amylase--used in detergents for machine dishwashing
Ligand-binding proteins.
• hemoglobin--transports oxygen from the
lungs to the rest of the body
• antibodies--bind foreign substances for
destruction
The string of amino acids
tends to “fold” into a shape.
Hemoglobin structure
Heart of Steel (Hemoglobin)
by Julian Voss-Andreae
Protein views (Triose
phosphate isomerase)
Visualizing proteins
Amino acids
• There are 20 different standard amino
acids
• The different amino acids differ in
chemical properties.
Amino Acid      3-Letter  1-Letter  Polarity  Acidity    Hydrophobicity index
Alanine         Ala       A         nonpolar  neutral     1.8
Arginine        Arg       R         polar     basic (s)  -4.5
Asparagine      Asn       N         polar     neutral    -3.5
Aspartic acid   Asp       D         polar     acidic     -3.5
Cysteine        Cys       C         nonpolar  neutral     2.5
Glutamic acid   Glu       E         polar     acidic     -3.5
Glutamine       Gln       Q         polar     neutral    -3.5
Glycine         Gly       G         nonpolar  neutral    -0.4
Histidine       His       H         polar     basic (w)  -3.2
Isoleucine      Ile       I         nonpolar  neutral     4.5
Leucine         Leu       L         nonpolar  neutral     3.8
Lysine          Lys       K         polar     basic      -3.9
Methionine      Met       M         nonpolar  neutral     1.9
Phenylalanine   Phe       F         nonpolar  neutral     2.8
Proline         Pro       P         nonpolar  neutral    -1.6
Serine          Ser       S         polar     neutral    -0.8
Threonine       Thr       T         polar     neutral    -0.7
Tryptophan      Trp       W         nonpolar  neutral    -0.9
Tyrosine        Tyr       Y         polar     neutral    -1.3
Valine          Val       V         nonpolar  neutral     4.2
Hydrophobicity index.
• The larger the index, the stronger the
tendency to be internal in the protein;
the lower the index, the stronger the
tendency to appear near the protein
surface.
• Amino acids with high index are called
hydrophobic; with low index are called
hydrophilic.
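The rule of thumb above can be sketched in a few lines of Python. The index values are copied from the amino acid table. The cutoff of 0 between "high" and "low" is an assumption made only for this illustration, since the slides do not give a threshold, and the names HYDROPHOBICITY and classify are hypothetical.

```python
# Hydrophobicity index for each 1-letter code, copied from the table.
HYDROPHOBICITY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "E": -3.5, "Q": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def classify(amino_acid, cutoff=0.0):
    """Label one amino acid by its index.

    The cutoff is an assumption for illustration only.
    """
    if HYDROPHOBICITY[amino_acid] > cutoff:
        return "hydrophobic"
    return "hydrophilic"

# The two extremes of the table.
most_hydrophobic = max(HYDROPHOBICITY, key=HYDROPHOBICITY.get)
most_hydrophilic = min(HYDROPHOBICITY, key=HYDROPHOBICITY.get)
print(most_hydrophobic, classify(most_hydrophobic))
print(most_hydrophilic, classify(most_hydrophilic))
```

With this table, isoleucine (I) is the amino acid most likely to be buried inside the protein and arginine (R) the one most likely to sit near the surface.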
What is the shape of the
protein?
• This is the “protein folding problem.”
• The geometry and chemistry of the
parts of the protein determine how it
behaves in the cell.
DNA
• DNA is deoxyribonucleic acid.
• It occurs as long molecules in a double
helix.
DNA is a long
molecule in a
double helix
What makes DNA?
• DNA consists of sequences of
nucleotides.
• There are 4 kinds of nucleotide, named for the bases they carry:
Adenine (A), Cytosine (C), Guanine (G), and Thymine (T).
Matching
• Each A has weak (“hydrogen”) bonds
with T on the other chain.
• Each C has weak (“hydrogen”) bonds
with G on the other chain.
A single chain carries the
information
• For example, the two strings might be
ACGGTCAG
TGCCAGTC
• Hence all the information is in the order
of A, C, G, T in one of the chains.
• We write DNA as a (long) string of A, C,
G, T for example AGGCTACATAG…
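Because A always pairs with T and C with G, one chain can be computed from the other. A minimal Python sketch of that pairing rule follows; the function name complement is hypothetical, and the sketch pairs bases position by position, exactly as the two strings are written on the slide.

```python
# Pairing rule from the "Matching" slide: A with T, C with G.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement(chain):
    """Return the matching chain, base by base."""
    return "".join(PAIR[base] for base in chain)

print(complement("ACGGTCAG"))  # prints TGCCAGTC
```

Applying the function twice gives back the original string, which is another way of saying that a single chain carries all the information.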
Human DNA
• Humans have 46 chromosomes.
• Each chromosome is essentially a
double helix of DNA, with variable
numbers of nucleotides, from
50,000,000 to 250,000,000 base pairs.
• There are a total of about
2,860,000,000 nucleotide pairs.
Genes
• A gene is a portion of the DNA that tells
how to make a protein.
DNA for beta hemoglobin
• ATGGTGCATCTGACTCCTGAGGAGAAGTCTGCCGTTACTG
CCCTGTGGGGCAAGGTGAACGTGGATGAAGTTGGTGGTGA
GGCCCTGGGCAGGCTGCTGGTGGTCTACCCTTGGACCCA
GAGGTTCTTTGAGTCCTTTGGGGATCTGTCCACTCCTGATG
CTGTTATGGGCAACCCTAAGGTGAAGGCTCATGGCAAGAA
AGTGCTCGGTGCCTTTAGTGATGGCCTGGCTCACCTGGAC
AACCTCAAGGGCACCTTTGCCACACTGAGTGAGCTGCACT
GTGACAAGCTGCACGTGGATCCTGAGAACTTCAGGCTCCT
GGGCAACGTGCTGGTCTGTGTGCTGGCCCATCACTTTGGC
AAAGAATTCACCCCACCAGTGCAGGCTGCCTATCAGAAAG
TGGTGGCTGGTGTGGCTAATGCCCTGGCCCACAAGTATCA
CTAA
DNA determines the order of
amino acids
• ATGGTGCATCTGACTCCTGAGGAGAAGTCTGCCGTTACTG
CCCTGTGGGGCAAGGTGAACGTGGATGAAGTTGGTGGTGA
GGCCCTGGGCAGGCTGCTGGTGGTCTACCCTTGGACCCA
GAGGTTCTTTGAGTCCTTTGGGGATCTGTCCACTCCTGATG
CTGTTATGGGCAACCCTAAGGTGAAGGCTCATGGCAAGAA
AGTGCTCGGTGCCTTTAGTGATGGCCTGGCTCACCTGGAC
AACCTCAAGGGCACCTTTGCCACACTGAGTGAGCTGCACT
GTGACAAGCTGCACGTGGATCCTGAGAACTTCAGGCTCCT
GGGCAACGTGCTGGTCTGTGTGCTGGCCCATCACTTTGGC
AAAGAATTCACCCCACCAGTGCAGGCTGCCTATCAGAAAG
TGGTGGCTGGTGTGGCTAATGCCCTGGCCCACAAGTATCA
CTAA
Primary structure for beta
hemoglobin--the order
• MVHLTPEEKSAVTALWGKVNVDEVG
GEALGRLLVVYPWTQRFFESFGDLSTP
DAVMGNPKVKAHGKKVLGAFSDGLA
HLDNLKGTFATLSELHCDKLHVDPEN
FRLLGNVLVCVLAHHFGKEFTPPVQA
AYQKVVAGVANALAHKYH
Hemoglobin structure
How does DNA determine the
order of amino acids?
• Three successive nucleotides form a
“codon.”
• Different codons stand for different
amino acids.
Translating codons
Amino acid  Codons
Ala/A       GCT, GCC, GCA, GCG
Arg/R       CGT, CGC, CGA, CGG, AGA, AGG
Asn/N       AAT, AAC
Asp/D       GAT, GAC
Cys/C       TGT, TGC
Gln/Q       CAA, CAG
Glu/E       GAA, GAG
Gly/G       GGT, GGC, GGA, GGG
His/H       CAT, CAC
Ile/I       ATT, ATC, ATA
Leu/L       TTA, TTG, CTT, CTC, CTA, CTG
Lys/K       AAA, AAG
Met/M       ATG
Phe/F       TTT, TTC
Pro/P       CCT, CCC, CCA, CCG
Ser/S       TCT, TCC, TCA, TCG, AGT, AGC
Thr/T       ACT, ACC, ACA, ACG
Trp/W       TGG
Tyr/Y       TAT, TAC
Val/V       GTT, GTC, GTA, GTG
START       ATG
STOP        TAG, TGA, TAA
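The codon table can be applied mechanically: read the DNA three letters at a time, look each codon up, and stop at a STOP codon. The Python sketch below does this for the first 30 nucleotides of the beta hemoglobin gene. The names CODON_TABLE and translate are hypothetical, and the sketch assumes reading starts at the first letter of the string.

```python
# Codon table from the slide, keyed by 1-letter code ("*" marks STOP).
CODON_TABLE = {
    "A": "GCT GCC GCA GCG",
    "R": "CGT CGC CGA CGG AGA AGG",
    "N": "AAT AAC",
    "D": "GAT GAC",
    "C": "TGT TGC",
    "Q": "CAA CAG",
    "E": "GAA GAG",
    "G": "GGT GGC GGA GGG",
    "H": "CAT CAC",
    "I": "ATT ATC ATA",
    "L": "TTA TTG CTT CTC CTA CTG",
    "K": "AAA AAG",
    "M": "ATG",
    "F": "TTT TTC",
    "P": "CCT CCC CCA CCG",
    "S": "TCT TCC TCA TCG AGT AGC",
    "T": "ACT ACC ACA ACG",
    "W": "TGG",
    "Y": "TAT TAC",
    "V": "GTT GTC GTA GTG",
    "*": "TAG TGA TAA",
}

# Invert the table: one entry per codon.
CODON_TO_AMINO = {
    codon: amino
    for amino, codons in CODON_TABLE.items()
    for codon in codons.split()
}

def translate(dna):
    """Translate DNA codon by codon, stopping at the first STOP codon."""
    protein = []
    for start in range(0, len(dna) - 2, 3):
        amino = CODON_TO_AMINO[dna[start:start + 3]]
        if amino == "*":
            break
        protein.append(amino)
    return "".join(protein)

# First 30 nucleotides of the beta hemoglobin gene.
print(translate("ATGGTGCATCTGACTCCTGAGGAGAAGTCT"))  # prints MVHLTPEEKS
```

The inverted table has 64 entries, one for each possible triple of A, C, G, T, so every codon is accounted for.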
The order of amino acids is
important
• Consider what may happen when the
“wrong” amino acid is in a certain
position.
Primary structure for beta
hemoglobin
• MVHLTPEEKSAVTALWGKVNVDEVG
GEALGRLLVVYPWTQRFFESFGDLSTP
DAVMGNPKVKAHGKKVLGAFSDGLA
HLDNLKGTFATLSELHCDKLHVDPEN
FRLLGNVLVCVLAHHFGKEFTPPVQA
AYQKVVAGVANALAHKYH
Sickle cell anemia beta
hemoglobin
• MVHLTPVEKSAVTALWGKVNVDEVG
GEALGRLLVVYPWTQRFFESFGDLSTP
DAVMGNPKVKAHGKKVLGAFSDGLA
HLDNLKGTFATLSELHCDKLHVDPEN
FRLLGNVLVCVLAHHFGKEFTPPVQA
AYQKVVAGVANALAHKYH
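The two sequences differ in a single position, which is easy to miss by eye. A short Python sketch that finds the difference is given below. It compares only the first ten amino acids of each sequence, the helper name differences is hypothetical, and the two index values are taken from the amino acid table.

```python
# First ten amino acids of the normal and sickle cell sequences.
NORMAL = "MVHLTPEEKS"
SICKLE = "MVHLTPVEKS"

# Hydrophobicity index of the two amino acids involved (from the table).
HYDROPHOBICITY = {"E": -3.5, "V": 4.2}

def differences(first, second):
    """List (position, old, new) wherever two sequences differ.

    Positions count from 1.
    """
    return [
        (index + 1, old, new)
        for index, (old, new) in enumerate(zip(first, second))
        if old != new
    ]

for position, old, new in differences(NORMAL, SICKLE):
    print(position, old, HYDROPHOBICITY[old], "->", new, HYDROPHOBICITY[new])
```

The only change is at position 7, where glutamic acid (E, index -3.5) is replaced by valine (V, index 4.2): a strongly hydrophilic amino acid becomes a strongly hydrophobic one.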
Simple model
• Pretend there are only 2 kinds of amino
acid--H and P.
• H stands for “hydrophobic”; P stands for “polar” (hydrophilic).
• Pretend that they must be placed on a
grid.
• Example: HHPPPPPPPHH
A folding of HHPPPPPPPHH
[Figure: the chain HHPPPPPPPHH laid out on a square grid.]
Another folding of
HHPPPPPPPHH
[Figure: a different arrangement of the chain HHPPPPPPPHH on the square grid.]
Energy
• HH has energy -1.
• PP has energy 0.
• HP has energy 0.
• PH has energy 0.
• The protein folds so as to minimize the energy.
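The energy rule can be turned into a short Python function. This is a sketch under one assumption that the slides do not state outright: every pair of H's sitting on neighbouring grid points counts as -1, including two H's that are next to each other along the chain. That reading is consistent with the energies -2 and -4 quoted for foldings of HHPPPPPPPHH. The function name hp_energy and the two example foldings are made up for illustration and are not necessarily the ones drawn on the slides.

```python
def hp_energy(sequence, coords):
    """Energy of a folding: -1 for each pair of H's on neighbouring grid points.

    sequence is a string of H and P; coords lists one (x, y) grid point
    per letter, in chain order.
    """
    if len(sequence) != len(coords):
        raise ValueError("need one grid point per amino acid")
    if len(set(coords)) != len(coords):
        raise ValueError("two amino acids share a grid point")
    for (x1, y1), (x2, y2) in zip(coords, coords[1:]):
        if abs(x1 - x2) + abs(y1 - y2) != 1:
            raise ValueError("chain neighbours must be grid neighbours")

    energy = 0
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            (x1, y1), (x2, y2) = coords[i], coords[j]
            adjacent = abs(x1 - x2) + abs(y1 - y2) == 1
            if adjacent and sequence[i] == "H" and sequence[j] == "H":
                energy -= 1
    return energy

CHAIN = "HHPPPPPPPHH"

# The chain stretched out in a straight line.
straight = [(x, 0) for x in range(len(CHAIN))]

# A loop that brings the two ends together so the four H's form a square.
loop = [(1, 0), (0, 0), (-1, 0), (-1, 1), (-1, 2), (0, 2),
        (1, 2), (2, 2), (2, 1), (1, 1), (0, 1)]

print(hp_energy(CHAIN, straight))  # prints -2
print(hp_energy(CHAIN, loop))      # prints -4
```

Bringing the two hydrophobic ends together lowers the energy from -2 to -4, which is the sense in which the folded shape is preferred.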
A folding of HHPPPPPPPHH
with energy -2
[Figure: a folding of the chain on the square grid with energy -2.]
A folding of HHPPPPPPPHH
with energy -4
[Figure: a folding of the chain on the square grid with energy -4.]
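For a chain this short, one way to check that no folding does better than -4 is to try every folding. The Python sketch below enumerates all self-avoiding paths on the square grid and keeps the one with the lowest energy. It assumes that every pair of H's on neighbouring grid points counts as -1, chain neighbours included, and the names hp_energy and best_folding are hypothetical. This is exhaustive search for illustration only, not a practical folding method.

```python
MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]

def hp_energy(sequence, coords):
    """-1 for each pair of H's on neighbouring grid points."""
    position = {point: index for index, point in enumerate(coords)}
    energy = 0
    for (x, y), i in position.items():
        if sequence[i] != "H":
            continue
        # Look only right and up so that each pair is counted once.
        for dx, dy in ((1, 0), (0, 1)):
            j = position.get((x + dx, y + dy))
            if j is not None and sequence[j] == "H":
                energy -= 1
    return energy

def best_folding(sequence):
    """Try every self-avoiding grid path; return (lowest energy, its path)."""
    best_energy, best_path = 1, None
    path = [(0, 0)]
    used = {(0, 0)}

    def extend():
        nonlocal best_energy, best_path
        if len(path) == len(sequence):
            energy = hp_energy(sequence, path)
            if energy < best_energy:
                best_energy, best_path = energy, list(path)
            return
        x, y = path[-1]
        # The first step always goes right: rotating a folding
        # does not change its energy.
        moves = MOVES[:1] if len(path) == 1 else MOVES
        for dx, dy in moves:
            point = (x + dx, y + dy)
            if point not in used:
                used.add(point)
                path.append(point)
                extend()
                path.pop()
                used.remove(point)

    extend()
    return best_energy, best_path

energy, path = best_folding("HHPPPPPPPHH")
print(energy)  # prints -4
```

The number of paths grows exponentially with the length of the chain, so this approach is only feasible for very short chains.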
A folding of HHPPPPPPPHH
with ? energy
[Figure: a third arrangement of the chain on the square grid; finding its energy is left as an exercise.]
The real problem
• There are 20 amino acids.
• Pairs have different energies.
• Typically a protein has about 100 amino
acids.
• The protein is in 3 dimensions.
• It does not need to be on a grid.
• It must be worked on a computer.
The Direct Approach
• Write down a formula for the energy E,
taking into account the (variable)
locations of all amino acids, all charges
and electrostatic attractions and
repulsions, and all constraints.
• Minimize E.
Indirect Methods
• Statistics of amino acids in known
structures
• Neural network models
• Nearest neighbor methods
• Hidden Markov models
Does a method work?
• We want to be able to check some
answers, to see whether a method
appears to work.
• Professor Zhijun Wu works on some
problems related to this.
NMR
• NMR is Nuclear Magnetic Resonance.
• Using NMR one can often find the
distances between some particular
atoms in a protein.
Distances
[Figure: four atoms A1, A2, A3, A4, with the distances d(2,3) and d(1,4) marked.]
• Here d(1,4) is the distance between the first and fourth atoms.
Locations
• A1 is at $(x_{11}, x_{12}, x_{13})$.
• A2 is at $(x_{21}, x_{22}, x_{23})$.
• A3 is at $(x_{31}, x_{32}, x_{33})$.
• A4 is at $(x_{41}, x_{42}, x_{43})$.
• Once you know all the locations, you know the shape of the protein.
Position Matrix
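• Form the matrix X, with one row for each atom:
$$X = \begin{pmatrix} x_{11} & x_{12} & x_{13} \\ x_{21} & x_{22} & x_{23} \\ x_{31} & x_{32} & x_{33} \\ x_{41} & x_{42} & x_{43} \end{pmatrix}$$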
Matrix Equation
• It turns out that $X X^T = D$, where D is a matrix that can be obtained just using all the numbers d(i,j).
The matrix D
• If there are n atoms and the last is at the origin, then the entry of D in the ith row and jth column is
$$D_{ij} = \frac{d(i,n)^2 - d(i,j)^2 + d(j,n)^2}{2}$$
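The formula for D can be checked numerically. The Python sketch below starts from four made-up atom positions, with the last at the origin, computes all the distances d(i,j), builds D from the distances alone, and compares it with X times its transpose. The coordinates and the function name gram_from_distances are assumptions chosen for illustration.

```python
import math

def gram_from_distances(d):
    """Build D from pairwise distances, taking the last atom as the origin."""
    n = len(d)
    last = n - 1
    return [
        [(d[i][last] ** 2 - d[i][j] ** 2 + d[j][last] ** 2) / 2
         for j in range(n)]
        for i in range(n)
    ]

# Made-up positions of four atoms; each row of X is one atom.
X = [
    (1.0, 0.0, 0.0),
    (1.0, 2.0, 0.0),
    (0.0, 1.0, 3.0),
    (0.0, 0.0, 0.0),  # the last atom sits at the origin
]

# All pairwise distances d(i,j).
d = [[math.dist(p, q) for q in X] for p in X]

# D from the distances only, and X X^T from the positions.
D = gram_from_distances(d)
XXT = [[sum(a * b for a, b in zip(p, q)) for q in X] for p in X]

largest_gap = max(
    abs(D[i][j] - XXT[i][j]) for i in range(len(X)) for j in range(len(X))
)
print(largest_gap < 1e-9)  # prints True
```

The two matrices agree up to rounding error, even though D was computed without looking at the coordinates.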
Solving the matrix equation
• Professor Zhijun Wu studies ways to solve such matrix equations rapidly.
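For a handful of atoms the equation can be solved with a textbook factorization. The Python sketch below uses a pivoted Cholesky factorization to find some X with X times its transpose equal to D, and then checks that the recovered positions reproduce the original distances. This is a generic technique swapped in for illustration; it is not a description of Professor Wu's methods. All names and the test coordinates are made up. The recovered shape can differ from the original by a rotation or reflection, which leaves every distance unchanged.

```python
import math

def gram_from_distances(d):
    """Build D from pairwise distances, taking the last atom as the origin."""
    n = len(d)
    last = n - 1
    return [
        [(d[i][last] ** 2 - d[i][j] ** 2 + d[j][last] ** 2) / 2
         for j in range(n)]
        for i in range(n)
    ]

def coordinates_from_gram(D, tol=1e-9):
    """Find X with X X^T = D by pivoted Cholesky factorization.

    Row i of the result is a position for atom i.
    """
    n = len(D)
    residual = [row[:] for row in D]
    columns = []
    for _ in range(n):
        # Pick the largest remaining diagonal entry as the pivot.
        p = max(range(n), key=lambda i: residual[i][i])
        if residual[p][p] <= tol:
            break  # nothing left to explain: all dimensions found
        scale = math.sqrt(residual[p][p])
        column = [residual[i][p] / scale for i in range(n)]
        columns.append(column)
        # Subtract this column's contribution from the residual.
        for i in range(n):
            for j in range(n):
                residual[i][j] -= column[i] * column[j]
    return [[column[i] for column in columns] for i in range(n)]

# Made-up "true" positions, used only to generate distances.
true_positions = [
    (1.0, 0.0, 0.0),
    (1.0, 2.0, 0.0),
    (0.0, 1.0, 3.0),
    (0.0, 0.0, 0.0),
]
d = [[math.dist(p, q) for q in true_positions] for p in true_positions]

X = coordinates_from_gram(gram_from_distances(d))
worst = max(
    abs(math.dist(X[i], X[j]) - d[i][j])
    for i in range(len(X)) for j in range(len(X))
)
print(len(X[0]), worst < 1e-9)  # prints 3 True
```

The sketch assumes every distance is known exactly. NMR usually supplies only some of the distances, which is part of what makes the real problem hard.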
Energy
• HH has energy -1.
• PP has energy 0.
• HP has energy 0.
• PH has energy 0.
• The protein folds so as to minimize the energy.
What is the best folding of
• HPPHPPHPHPPHPHPHHH
• (Careful: answer is on the next slide)
HPPHPPHPHPPHPHPHHH
[Figure: a folding of this chain on the square grid, with energy -11.]