MolecularViewers
Protein structure prediction
May 15, 2001
Quiz#4 postponed
Writing assignment
Learning objectives: Understand the basis of secondary structure
prediction programs. Understand neural networks. Become familiar
with manipulating known protein structures with Cn3D.
Workshop: Manipulation of the PTEN protein structure with Cn3D.
What is secondary structure?
Two major types:
Alpha Helical Regions
Beta Sheet Regions
Other classification schemes:
Turns
Transmembrane regions
Internal regions
External regions
Antigenic regions
Some Prediction Methods
ab initio methods
Based on physical properties of aa’s and
bonding patterns
Statistics of amino acid distributions
Chou-Fasman
Position of amino acid and distribution
Garnier-Osguthorpe-Robson (GOR)
Neural networks
Chou-Fasman Rules (Mathews, Van Holde, Ahern)
Amino Acid   α-Helix   β-Sheet   Turn   Favors
Ala          1.29      0.90      0.78   α-Helix
Cys          1.11      0.74      0.80   α-Helix
Leu          1.30      1.02      0.59   α-Helix
Met          1.47      0.97      0.39   α-Helix
Glu          1.44      0.75      1.00   α-Helix
Gln          1.27      0.80      0.97   α-Helix
His          1.22      1.08      0.69   α-Helix
Lys          1.23      0.77      0.96   α-Helix
Val          0.91      1.49      0.47   β-Sheet
Ile          0.97      1.45      0.51   β-Sheet
Phe          1.07      1.32      0.58   β-Sheet
Tyr          0.72      1.25      1.05   β-Sheet
Trp          0.99      1.14      0.75   β-Sheet
Thr          0.82      1.21      1.03   β-Sheet
Gly          0.56      0.92      1.64   Turn
Ser          0.82      0.95      1.33   Turn
Asp          1.04      0.72      1.41   Turn
Asn          0.90      0.76      1.23   Turn
Pro          0.52      0.64      1.91   Turn
Arg          0.96      0.99      0.88
Chou-Fasman
First widely used procedure
If propensity in a window of six residues (for a
helix) is above a certain threshold the helix is
chosen as secondary structure.
If propensity in a window of five residues (for a
beta strand) is above a certain threshold then beta
strand is chosen.
The segment is extended until the average
propensity in a 4 residue window falls below a
cutoff value.
Output: helix, strand, or turn.
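The nucleate-and-extend procedure above can be sketched in a few lines of Python. This is a simplified illustration and not the published algorithm: the slide only says "a certain threshold", so the cutoffs `nucleate=1.05` and `extend=1.00` are assumed values, and the function name `find_segments` is made up for this example. The propensities are the α-helix column of the table, keyed by one-letter code.

```python
# Helix propensities from the Chou-Fasman table in these notes (one-letter codes).
P_HELIX = {
    "A": 1.29, "C": 1.11, "L": 1.30, "M": 1.47, "E": 1.44,
    "Q": 1.27, "H": 1.22, "K": 1.23, "V": 0.91, "I": 0.97,
    "F": 1.07, "Y": 0.72, "W": 0.99, "T": 0.82, "G": 0.56,
    "S": 0.82, "D": 1.04, "N": 0.90, "P": 0.52, "R": 0.96,
}


def mean_propensity(segment, table):
    """Average propensity of the residues in a segment."""
    return sum(table[aa] for aa in segment) / len(segment)


def find_segments(seq, table, window, nucleate=1.05, extend=1.00):
    """Return half-open (start, end) segments predicted for one structure type.

    nucleate and extend are assumed thresholds, not values from the slides.
    """
    segments = []
    i = 0
    while i + window <= len(seq):
        # Nucleation: the whole window must average above the threshold.
        if mean_propensity(seq[i:i + window], table) <= nucleate:
            i += 1
            continue
        start, end = i, i + window
        # Extend to the right while the 4-residue window ending at the
        # candidate residue still averages at or above the cutoff.
        while end < len(seq) and mean_propensity(seq[end - 3:end + 1], table) >= extend:
            end += 1
        # Extend to the left in the same way.
        while start > 0 and mean_propensity(seq[start - 1:start + 3], table) >= extend:
            start -= 1
        # Merge with the previous segment if the left extension reached it.
        if segments and start <= segments[-1][1]:
            start = segments.pop()[0]
        segments.append((start, end))
        i = end
    return segments


print(find_segments("GPGPAEELLKKMGPGP", P_HELIX, window=6))
```

For beta strands the same function would be called with `window=5` and a table built from the β-sheet column. Because the extension step averages over four residues, a weak residue next to strong helix formers can still be pulled into the segment.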
GOR
Position-dependent propensities for helix, sheet, or turn are
calculated for each amino acid. For each position j in the
sequence, eight residues on either side of aa_j are considered.
It uses a PSSM.
A helix propensity table contains information about the propensity for
certain residues at 17 positions when the conformation of
residue j is helical. The helix propensity tables have 20 x
17 entries.
The predicted state of aa_j is calculated as the sum of the
position-dependent propensities of all residues around aa_j.
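The summation over the 17-position window can be sketched as follows. The 20 x 17 tables here are toy values invented for the example (a flat +1 for a few residues), not real GOR parameters, and the names `empty_table`, `toy_table`, and `gor_predict` are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
HALF_WINDOW = 8  # eight residues on either side of position j -> 17 positions


def empty_table():
    """A 20 x 17 table of zeros: one row per amino acid, one column per offset."""
    return {aa: [0.0] * (2 * HALF_WINDOW + 1) for aa in AMINO_ACIDS}


def toy_table(favoured):
    """Toy table (assumption): +1 at every offset for the favoured residues."""
    table = empty_table()
    for aa in favoured:
        table[aa] = [1.0] * (2 * HALF_WINDOW + 1)
    return table


def gor_predict(seq, tables):
    """For each position j, sum the position-dependent propensities of the
    residues from j-8 to j+8 and pick the state with the highest total."""
    prediction = []
    for j in range(len(seq)):
        scores = {}
        for state, table in tables.items():
            total = 0.0
            for offset in range(-HALF_WINDOW, HALF_WINDOW + 1):
                k = j + offset
                if 0 <= k < len(seq):  # skip positions beyond the chain ends
                    total += table[seq[k]][offset + HALF_WINDOW]
            scores[state] = total
        prediction.append(max(scores, key=scores.get))
    return "".join(prediction)


tables = {"H": toy_table("AELM"), "E": toy_table("VIFY"), "C": toy_table("GPSN")}
print(gor_predict("A" * 20 + "G" * 20, tables))
```

With real parameters each column of a table would differ, so a residue four positions upstream could contribute differently from the same residue four positions downstream.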
Neural networks
• Computer neural networks are based on simulation of adaptive
learning in networks of real neurons.
• Neurons connect to each other via synaptic junctions which are either
stimulatory or inhibitory.
• Adaptive learning involves the formation or suppression of the right
combinations of stimulatory and inhibitory synapses so that a set
of inputs produces an appropriate output.
Neural Networks (cont. 1)
• The computer version of the neural network takes a set of
inputs (the amino acids in the sequence), which are
transmitted through a network of connections.
• At each layer, inputs are numerically
weighted and the combined result is passed to the next
layer.
• Ultimately a final output, a decision (helix, sheet, or
coil), is produced.
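A minimal sketch of one pass through such a network is shown below. It assumes a one-hot encoding of a 7-residue window, a single hidden layer of 5 units, and small random weights; all of these choices, and the function names, are illustrative assumptions. With untrained weights the decision is meaningless; the point is only how inputs are weighted, combined, and passed on.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
STATES = ["helix", "sheet", "coil"]


def one_hot(window):
    """Encode each residue as 20 input units, exactly one of which is 1."""
    units = []
    for aa in window:
        units.extend(1.0 if aa == candidate else 0.0 for candidate in AMINO_ACIDS)
    return units


def dense(inputs, weights, biases):
    """Weighted sum of the inputs for every unit in a layer."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def softmax(values):
    """Turn raw output scores into probabilities that sum to 1."""
    top = max(values)
    exps = [math.exp(v - top) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def forward(window, params):
    """Input layer -> hidden layer (tanh) -> output layer -> decision."""
    hidden = [math.tanh(v) for v in dense(one_hot(window), params["w1"], params["b1"])]
    probs = softmax(dense(hidden, params["w2"], params["b2"]))
    return STATES[probs.index(max(probs))], probs


def random_params(n_inputs, n_hidden, rng):
    """Small random starting weights (untrained)."""
    def matrix(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
    return {"w1": matrix(n_hidden, n_inputs), "b1": [0.0] * n_hidden,
            "w2": matrix(len(STATES), n_hidden), "b2": [0.0] * len(STATES)}


window = "AEELLKK"
params = random_params(len(window) * len(AMINO_ACIDS), 5, random.Random(1))
decision, probs = forward(window, params)
print(decision, [round(p, 3) for p in probs])
```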
Neural Networks (cont. 2)
90% of the set of known structures was used for training.
The remaining 10% was used to evaluate the performance of the neural
network during the training session.
Neural Networks (cont. 3)
• During the training phase, selected sets of proteins of known
structure are scanned, and if the decisions are incorrect, the
input weightings are adjusted by the software to produce the
desired result.
• Training runs are repeated until the success rate is maximized.
• Careful selection of the training set is an important aspect of
this technique. The set must contain as wide a range of
different fold types as possible, but without duplications of
structural types that may bias the decisions.
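The idea of adjusting weights only when a decision is wrong, and checking progress on a held-out 10%, can be illustrated with a perceptron-style update. This is a deliberately simpler rule than the backpropagation a real multi-layer network would use, and the training data are toy windows generated for the example (three residues drawn from one of three made-up residue classes), not real protein structures.

```python
import random

# Toy residue classes (assumption): each window is drawn from a single class.
POOLS = {"H": "AELM", "E": "VIFY", "C": "GPSN"}
LETTERS = "".join(POOLS.values())
STATES = "HEC"
WINDOW = 3


def make_examples(n, rng):
    """Generate n toy (window, state) pairs."""
    examples = []
    for _ in range(n):
        state = rng.choice(STATES)
        window = "".join(rng.choice(POOLS[state]) for _ in range(WINDOW))
        examples.append((window, state))
    return examples


def features(window):
    """Indices of the active one-hot input units (position, residue)."""
    return [pos * len(LETTERS) + LETTERS.index(aa) for pos, aa in enumerate(window)]


def predict(weights, window):
    active = features(window)
    return max(STATES, key=lambda s: sum(weights[s][i] for i in active))


def accuracy(weights, examples):
    return sum(predict(weights, w) == s for w, s in examples) / len(examples)


def train(examples, max_epochs=50):
    weights = {s: [0.0] * (WINDOW * len(LETTERS)) for s in STATES}
    for _ in range(max_epochs):
        errors = 0
        for window, state in examples:
            guess = predict(weights, window)
            if guess != state:  # incorrect decision: adjust the input weightings
                errors += 1
                for i in features(window):
                    weights[state][i] += 1.0
                    weights[guess][i] -= 1.0
        if errors == 0:  # no further improvement possible on the training set
            break
    return weights


rng = random.Random(0)
examples = make_examples(200, rng)
rng.shuffle(examples)
split = len(examples) * 9 // 10  # 90% for training, 10% held out
training_set, held_out = examples[:split], examples[split:]
weights = train(training_set)
print("training accuracy:", accuracy(weights, training_set))
print("held-out accuracy:", accuracy(weights, held_out))
```

The held-out score is the more honest estimate of performance, since the network has never seen those examples. A gap between the two numbers is the usual sign of a training set that is too narrow or too redundant.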
Neural Networks (cont. 5)
• An additional component of the PSIPRED procedures involves
sequence alignment with similar proteins.
• The rationale is that some amino acid positions in a sequence
contribute more to the final structure than others. (This has been
demonstrated by systematic mutation experiments in which each
consecutive position in a sequence is substituted by a spectrum of
amino acids. Some positions are remarkably tolerant of
substitution, while others have unique requirements.)
• To predict secondary structure accurately, one should place little
weight on the tolerant positions, which clearly contribute little to
the structure, and strongly emphasize the intolerant positions.
PSIPRED
Uses multiple aligned sequences for prediction.
Uses training set of proteins with known structure.
Uses a two-stage neural network to predict
structure based on position specific scoring
matrices generated by PSI-BLAST (Jones, 1999)
First network converts a window of 15 aa’s into a raw
score of h (helix), e (strand), c (coil), or terminus
Second network filters the first output. For example, an
output of hhhhehhhh might be converted to hhhhhhhhh.
Can obtain a Q3 value of 70-78% (may be the
highest achievable)
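Q3 is the percentage of residues whose predicted three-state assignment (helix, strand, coil) matches the observed one. The sketch below computes it and also includes a simple rule-based stand-in for the filtering step; PSIPRED uses a second neural network for that, so the `smooth` function here is only an illustrative substitute, and both function names are made up for this example.

```python
def q3(predicted, observed):
    """Percentage of residues with the correct three-state assignment."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


def smooth(states):
    """Rule-based stand-in for the filtering network: a single residue
    flanked on both sides by the same state takes on that state."""
    result = list(states)
    for i in range(1, len(states) - 1):
        if states[i - 1] == states[i + 1] != states[i]:
            result[i] = states[i - 1]
    return "".join(result)


print(smooth("hhhhehhhh"))
print(q3("HHHHEEEECC", "HHHHEEECCC"))
```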
[Figure: PSIPRED neural network. Each column of the input specifies a position within the protein and provides info on tolerant or intolerant positions. The input layer has 15 groups of 21 units (1 unit for each aa plus one specifying the end). A filtering network follows; its three outputs are helix, strand, or coil.]
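The layout of the input layer, 15 groups of 21 units, can be sketched as follows. This example uses a one-hot encoding for simplicity, which is an assumption made here: PSIPRED itself fills the 20 amino acid units of each group with values from the PSI-BLAST position specific scoring matrix. The function name `encode_window` is hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
WINDOW = 15                    # residues per input window
UNITS = len(AMINO_ACIDS) + 1   # 20 amino acid units plus one end-of-chain unit


def encode_window(seq, center):
    """Return 15 x 21 = 315 input values for the window centred on `center`.

    Window positions that fall beyond either end of the chain switch on
    the 21st unit of their group instead of an amino acid unit.
    """
    half = WINDOW // 2
    values = []
    for k in range(center - half, center + half + 1):
        group = [0.0] * UNITS
        if 0 <= k < len(seq):
            group[AMINO_ACIDS.index(seq[k])] = 1.0
        else:
            group[-1] = 1.0
        values.extend(group)
    return values


inputs = encode_window("KDIQLLNVSYDPTRELYEQY", center=0)
print(len(inputs))
```

For the first residue of a chain, the seven groups to the left of the centre all fall beyond the N-terminus, so only their end-of-chain units are set.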
Example of Output from PSIPRED
PSIPRED PREDICTION RESULTS
Key
Conf: Confidence (0=low, 9=high)
Pred: Predicted secondary structure (H=helix, E=strand, C=coil)
AA: Target sequence
Conf: 923788850068899998538983213555268822788714786424388875156215
Pred: CCEEEEEEEHHHHHHHHHHCCCCCCHHHHHHCCCCCEEEEECCCCCCHHHHHHHCCCCCC
  AA: KDIQLLNVSYDPTRELYEQYNKAFSAHWKQETGDNVVIDQSHGSQGKQATSSVINGIEAD
              10        20        30        40        50        60
3D structure prediction: Threading
Threading, alluded to earlier, is a mechanism to address the
alignment of two sequences that have <30% identity and are
typically considered non-homologous. Essentially, one fits—or
threads—the unknown sequence onto the known structure and
evaluates the resulting structure’s fitness using environment- or
knowledge-based potentials.
Helical Wheel
If you can predict an alpha helix, it is sometimes useful
to be able to tell whether the helix is amphipathic. This would
indicate whether one face of the helix faces the solvent or
perhaps another protein. Helical wheels have been particularly
useful in predicting a “super-secondary” structure known
as coiled coils.
The helical wheel is based on the ideal alpha helix, placing an
amino acid every 100° around the circumference of the helix cylinder.
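The wheel positions follow directly from the 100° step: residue i sits at (i x 100) mod 360 degrees. A minimal sketch, with the function name `helical_wheel` chosen for this example:

```python
def helical_wheel(seq, step=100.0):
    """Angle in degrees of each residue around the axis of an ideal alpha helix."""
    return [(aa, (i * step) % 360.0) for i, aa in enumerate(seq)]


for aa, angle in helical_wheel("LKELLKKL"):
    print(aa, angle)
```

Because 18 x 100° is exactly five full turns, the pattern of angles repeats every 18 residues. Residues that are 3 or 4 positions apart land within 60° of each other, which is why hydrophobic residues spaced this way line up on one face of an amphipathic helix.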
Coiled-coil predictors
The alpha-helical coiled-coil structure has a strong signature
heptad pattern abcdefg, where a and d are typically nonpolar
(leucine rich) and e and g are often charged. This makes
scoring from a sequence scale plot relatively easy.
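A very rough illustration of heptad scoring: try each of the seven possible registers and count how often positions a and d are nonpolar and positions e and g are charged. The residue sets below are assumptions made for this sketch, and real coiled-coil predictors use position-specific residue frequencies rather than a yes/no match.

```python
HEPTAD = "abcdefg"
NONPOLAR = set("LIVMFA")  # assumed set of nonpolar residues
CHARGED = set("DEKR")     # assumed set of charged residues


def heptad_score(seq, register):
    """Fraction of a/d/e/g positions that match the expected residue type.

    register is the index in 'abcdefg' of the first residue of seq.
    """
    matches = 0
    counted = 0
    for i, aa in enumerate(seq):
        position = HEPTAD[(i + register) % 7]
        if position in "ad":
            counted += 1
            matches += aa in NONPOLAR
        elif position in "eg":
            counted += 1
            matches += aa in CHARGED
    return matches / counted if counted else 0.0


def best_register(seq):
    """Register (0 = 'a') that gives the highest heptad score."""
    return max(range(7), key=lambda register: heptad_score(seq, register))


ideal = "LSQLENK" * 4
print(best_register(ideal), heptad_score(ideal, 0))
```

A sequence that starts partway through a heptad is handled by the register search: dropping the first two residues of the idealized repeat shifts the best register from 0 (a) to 2 (c).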
3D structure data
The largest 3D structure database is the
Protein Data Bank (PDB).
It contains over 15,000 records.
Each record contains 3D coordinates for
macromolecules.
80% of the records were obtained from X-ray
diffraction studies, 16% from NMR, and the rest
from other methods and theoretical calculations.
Part of a record from the PDB
ATOM      1  N   ARG A  14      22.451  98.825  31.990  1.00 88.84           N
ATOM      2  CA  ARG A  14      21.713 100.102  31.828  1.00 90.39           C
ATOM      3  C   ARG A  14      22.583 101.018  30.979  1.00 89.86           C
ATOM      4  O   ARG A  14      22.105 101.989  30.391  1.00 89.82           O
ATOM      5  CB  ARG A  14      21.424 100.704  33.208  1.00 93.23           C
ATOM      6  CG  ARG A  14      20.465 101.880  33.215  1.00 95.72           C
ATOM      7  CD  ARG A  14      20.008 102.147  34.637  1.00 98.10           C
ATOM      8  NE  ARG A  14      18.999 103.196  34.718  1.00100.30           N
ATOM      9  CZ  ARG A  14      18.344 103.507  35.833  1.00100.29           C
ATOM     10  NH1 ARG A  14      18.580 102.835  36.952  1.00 99.51           N
ATOM     11  NH2 ARG A  14      17.441 104.479  35.827  1.00100.79           N
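ATOM records are fixed-width: each field occupies specific columns, so adjacent values can run together (for example the occupancy and temperature factor in `1.00100.30`). The sketch below therefore slices by column rather than splitting on whitespace. The column positions follow the PDB file format; the `Atom` type and `parse_atom` function are names invented for this example.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    serial: int
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float
    occupancy: float
    b_factor: float
    element: str


def parse_atom(line):
    """Parse one ATOM record by column position (0-based slices)."""
    return Atom(
        serial=int(line[6:11]),
        name=line[12:16].strip(),
        res_name=line[17:20].strip(),
        chain=line[21],
        res_seq=int(line[22:26]),
        x=float(line[30:38]),
        y=float(line[38:46]),
        z=float(line[46:54]),
        occupancy=float(line[54:60]),
        b_factor=float(line[60:66]),
        element=line[76:78].strip(),
    )


record = "ATOM      8  NE  ARG A  14      18.999 103.196  34.718  1.00100.30           N"
atom = parse_atom(record)
print(atom.name, atom.x, atom.y, atom.z, atom.b_factor)

# Splitting on whitespace would fuse the last two numeric fields:
print(record.split()[9])
```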
Molecular Modeling DB
(MMDB)
Relies on PDB for data.
It contains over 10,000 structure records.
Links connect the records to MEDLINE and NCBI’s
taxonomy database.
Sequence “neighbors” of the structures are provided
by BLAST.
Structure “neighbors” are provided by the VAST algorithm.
Cn3D is a molecular graphics viewer that allows one to
view the three-dimensional structure.