Biological Computation


Chapter 7
Protein and RNA Structure
Prediction
Department of Computer Science and Information Engineering, National Chi Nan University
黃光璿
2004/05/24
Proteins

Built from a repertoire of 20 amino acids
7.1 Amino Acids
Amino Acid

Central carbon
Amino group (NH2)
Carboxyl group (COOH)
Hydrogen (H)
Side chain (R)
Isomers
Fig. 7.2
pH, pKa, and pI

pH
-log [H+]

pKa
The pH at which half of the amino acid residues dissociate (release H+).

pI
The isoelectric point: the pH at which the protein carries no net charge.
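The pH and pKa definitions can be checked numerically. The sketch below assumes the Henderson-Hasselbalch relation for a single ionizable group. The function names `ph_from_concentration` and `fraction_dissociated` are hypothetical, chosen only for illustration.

```python
import math

def ph_from_concentration(h_conc):
    """pH = -log10 [H+], with [H+] given in mol/L."""
    return -math.log10(h_conc)

def fraction_dissociated(ph, pka):
    """Fraction of groups that have released H+ at the given pH.

    Assumes the Henderson-Hasselbalch relation for one ionizable group.
    """
    return 1.0 / (1.0 + 10.0 ** (pka - ph))
```

When pH equals pKa the second function returns 0.5, which matches the definition of pKa as the pH at which half of the groups are dissociated.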
7.2 Polypeptide Composition
7.3 Secondary Structure
7.3.1 Backbone Flexibility
Conformation of Polypeptide Chain
Ramachandran Plot
Atom colors: N, blue; C, black; O, red; H, white
Secondary Structure

Alpha helix

Beta sheet

Beta turn

Loop
7.3.2 Accuracy of Prediction

Computational methods
neural networks
discrete-state models
hidden Markov models
nearest neighbor classification
evolutionary computation

PHD and Predator are structure prediction algorithms with accuracies in the range of 70% to 75%.
7.3.3 Chou-Fasman Method
Identifying Alpha Helices
1. Find all regions where four out of six contiguous residues have P(a) > 100.
2. Extend the regions in both directions until four contiguous residues with P(a) < 100 are reached.
3. If ΣP(a) > ΣP(b) and the stretch is longer than 5 residues, then it is identified as a helix.
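The three helix steps can be sketched in a few lines. This is a minimal sketch rather than the published procedure: the function name `find_helices` is hypothetical, the propensity tables cover only four residues with illustrative values, extension stops at the sequence ends, and overlapping hits are merged as an added convenience.

```python
# Illustrative propensities for a few residues only (assumed values for this
# sketch; substitute the full published Chou-Fasman table in practice).
P_A = {"A": 142, "E": 151, "L": 121, "G": 57}
P_B = {"A": 83, "E": 37, "L": 130, "G": 75}

def find_helices(seq, p_a=P_A, p_b=P_B, window=6, hits=4, run=4, cutoff=100):
    """Return merged half-open intervals (start, end) predicted as helix."""

    def blocked(segment):
        # True when every residue in the segment (up to `run` long) is below
        # the cutoff; an empty segment (sequence end) also stops extension.
        return all(p_a[r] < cutoff for r in segment)

    regions = []
    for start in range(len(seq) - window + 1):
        # Step 1: nucleation window with at least `hits` helix formers.
        if sum(p_a[r] > cutoff for r in seq[start:start + window]) < hits:
            continue
        lo, hi = start, start + window
        # Step 2: extend in both directions until a weak run is reached.
        while not blocked(seq[hi:hi + run]):
            hi += 1
        while not blocked(seq[max(0, lo - run):lo]):
            lo -= 1
        # Step 3: keep the stretch if it is long enough and favors helix.
        segment = seq[lo:hi]
        if len(segment) > 5 and sum(p_a[r] for r in segment) > sum(p_b[r] for r in segment):
            regions.append((lo, hi))

    # Merge overlapping stretches (not part of the steps above).
    merged = []
    for lo, hi in sorted(regions):
        if merged and lo <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged
```

For the toy sequence `"GGGGAEAEALGGGG"` this returns `[(2, 12)]`. The predicted stretch includes two glycines on each side because a nucleation window may contain up to two weak residues.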
Identifying Beta Sheets
1. Find all regions where four out of six contiguous residues have P(b) > 100.
2. Extend the regions in both directions until four contiguous residues with P(b) < 100 are reached.
3. If ΣP(b) > ΣP(a) and the average value of P(b) over the stretch is > 100, then it is identified as a sheet.
Resolving Overlapping Regions
1. An overlapping region is identified as a helix if ΣP(a) > ΣP(b), and as a sheet if ΣP(b) > ΣP(a), over the overlapping region.
Identifying Turns
Let P(t) = f(i) × f(i+1) × f(i+2) × f(i+3) for each position i.
Identify as a turn if
1. P(t) > 0.000075;
2. the average of P(turn) over the four residues > 100;
3. ΣP(a) < ΣP(turn) > ΣP(b) over the four residues.
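The turn test can be written as a single predicate over four consecutive residues. The sketch below assumes one per-residue bend frequency table `F_BEND` for all four positions, which is a simplification. The function name `is_turn` and all table values are hypothetical and only illustrative.

```python
# Hypothetical, illustrative tables for a few residues (not published values).
F_BEND = {"N": 0.10, "P": 0.10, "G": 0.10, "A": 0.05}
P_TURN = {"N": 156, "P": 152, "G": 156, "A": 66}
P_A = {"N": 67, "P": 57, "G": 57, "A": 142}
P_B = {"N": 89, "P": 55, "G": 75, "A": 83}

def is_turn(seq, i, f=F_BEND, p_turn=P_TURN, p_a=P_A, p_b=P_B, cutoff=0.000075):
    """Apply the three turn conditions to residues i .. i+3 of seq."""
    four = seq[i:i + 4]
    if len(four) < 4:
        return False  # not enough residues left for a four-residue turn
    p_t = 1.0
    for r in four:
        p_t *= f[r]  # P(t) = f(i) x f(i+1) x f(i+2) x f(i+3)
    sum_turn = sum(p_turn[r] for r in four)
    sum_a = sum(p_a[r] for r in four)
    sum_b = sum(p_b[r] for r in four)
    return (
        p_t > cutoff               # condition 1
        and sum_turn / 4 > 100     # condition 2
        and sum_a < sum_turn > sum_b  # condition 3
    )
```

With these toy tables, `is_turn("NPGN", 0)` is true and `is_turn("AAAA", 0)` is false.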
7.3.4 GOR Method

Each prediction is based on a window of 17 residues.
7.4 Tertiary and Quaternary Structure
Tertiary Structure

Folding into a three-dimensional shape
Quaternary Structure

Several tertiary structures combine into a functional macromolecule.
Example: human hemoglobin
Driving Forces for Folding
electrostatic forces
hydrogen bonds
van der Waals forces
disulfide bonds
solvent interactions
7.4.1 Hydrophobicity

hydrophobic collapse
Folded proteins tend to keep polar, charged residues on the surface.
The class of membrane-integral proteins is an exception.

sickle-cell anemia
human hemoglobin: 2 alpha and 2 beta globins
A charged glutamic acid residue is replaced by a hydrophobic valine residue.
7.4.2 Disulfide Bonds
7.4.3 Active Structures vs Most Stable Structures

Natural selection favors proteins that are both active and robust.
Levinthal Paradox

Proposed in 1968.
Consider 100 residues, each assuming 3 different conformations:
3^100 ≈ 5 × 10^47 possibilities.
Suppose it takes 10^-13 s for one trial.
Proteins fold by progressive stabilization of intermediates rather than by random search.
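The arithmetic behind the paradox is easy to reproduce. The sketch below counts the conformations for 100 residues with 3 states each and converts an exhaustive search at 10^-13 s per trial into years. The function name `random_search_years` and the year length of 365.25 days are assumptions made for illustration.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumed year length

def random_search_years(residues=100, states=3, seconds_per_trial=1e-13):
    """Return (number of conformations, years needed to try them all)."""
    conformations = states ** residues  # exact integer arithmetic
    years = conformations * seconds_per_trial / SECONDS_PER_YEAR
    return conformations, years

count, years = random_search_years()
```

Here `count` is about 5.15 × 10^47 and `years` is on the order of 10^27, far longer than any protein takes to fold, which is why a random search cannot be the folding mechanism.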
7.5 Algorithms for Modeling Protein Folding

Lattice Models
Off-Lattice Models
7.5.1 Lattice Models
Reduce the search space and make the computation tractable.
Search for the conformation with minimum free energy.
HP-model

hydrophobic-polar model
Scoring is based on hydrophobic contacts.
Maximize the H-to-H contacts.
Fig. 7.8
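The HP scoring rule can be illustrated by counting H-to-H contacts for one given conformation. This is a minimal sketch assuming a 2D square lattice and a conformation supplied as a list of integer `(x, y)` tuples. The function name `hp_contacts` is hypothetical, and the sketch only scores a conformation; it does not search for the best one.

```python
def hp_contacts(seq, coords):
    """Count H-H lattice contacts between residues not adjacent in the chain.

    seq    -- string over {"H", "P"}
    coords -- list of (x, y) integer tuples, one per residue
    """
    if len(seq) != len(coords):
        raise ValueError("one coordinate is needed per residue")
    index_at = {xy: i for i, xy in enumerate(coords)}
    if len(index_at) != len(coords):
        raise ValueError("conformation is not self-avoiding")
    for (x1, y1), (x2, y2) in zip(coords, coords[1:]):
        if abs(x1 - x2) + abs(y1 - y2) != 1:
            raise ValueError("consecutive residues must be lattice neighbors")

    contacts = 0
    for i, (x, y) in enumerate(coords):
        if seq[i] != "H":
            continue
        # Look only right and up so that each pair is counted once.
        for dx, dy in ((1, 0), (0, 1)):
            j = index_at.get((x + dx, y + dy))
            if j is not None and seq[j] == "H" and abs(i - j) > 1:
                contacts += 1
    return contacts
```

For example, `hp_contacts("HPPH", [(0, 0), (1, 0), (1, 1), (0, 1)])` gives 1, while the same sequence laid out in a straight line gives 0. A folding search would try to maximize this count.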
7.5.2 Off-Lattice Models


Use RMSD (root mean square deviation) to measure the accuracy.
Determine Φ and Ψ in the allowable region of the Ramachandran plot.
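RMSD between a predicted and a reference structure can be computed directly from paired atom coordinates. The sketch below assumes the two structures are already superimposed and list the same atoms in the same order; it performs no optimal alignment. The function name `rmsd` is hypothetical.

```python
import math

def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equal-length lists of (x, y, z)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(coords_a, coords_b):
        # Squared distance between the paired atoms.
        total += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(total / len(coords_a))
```

Identical structures give 0, and a single atom displaced from (0, 0, 0) to (3, 4, 0) gives 5.0. Smaller values mean the model is closer to the reference.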
7.5.3 Energy Functions and Optimization

Problems
The exact forces that drive the folding process are not well understood.
Exact energy calculations are too computationally expensive.
Summary
model
representation
scoring function
search (optimization)

Folding@Home (V. Pande, Stanford)
7.6 Structure Prediction

very high accuracy

< 3.0 Å
7.6.1 Comparative Modeling


Also called homology modeling
Relies on the robustness of the folding code
1. Identify a set of protein structures related to the target protein.
2. Align the sequence of the target with the sequence of the template.
3. Construct the model.
4. Model the loops.
5. Model the side chains.
6. Evaluate the model.
7.6.2 Threading

Given a conformation and a protein sequence, measure how favorable the conformation is for the sequence.
7.7 Predicting RNA Secondary Structures
Nearest Neighbor Energy Rules

Zuker’s Mfold program
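Nearest-neighbor energy rules need tables of measured stacking and loop energies, which are not reproduced here. As a much simpler stand-in, the sketch below uses Nussinov-style base-pair maximization, which is a different and cruder scoring scheme than the one Mfold uses. It only shows the dynamic-programming shape that such predictors share. The function name `max_base_pairs` and the minimum hairpin loop of 3 unpaired bases are assumptions.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def max_base_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs (Nussinov-style, not energy-based)."""
    n = len(seq)
    if n == 0:
        return 0
    # best[i][j] = maximum number of pairs within seq[i..j], inclusive.
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # case: position j is unpaired
            # case: j pairs with k, leaving at least min_loop bases between
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1]
```

For the hairpin-like sequence `"GGGAAAUCC"` the function returns 3 (G-C, G-C, and G-U pairs around an AAA loop). An energy-based method would replace the pair count with a sum of stacking and loop energies to be minimized.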
Why study RNA secondary structures?
To understand
gene regulation
expression of protein products
References and Image Sources
1. Dan E. Krane and Michael L. Raymer, Fundamental Concepts of Bioinformatics, Benjamin/Cummings, 2003.
2. J. M. Berg, J. L. Tymoczko, and L. Stryer, Biochemistry, Fifth Edition, 2001.