ProteinPrediction


Protein Structure Prediction and
Structural Genomics
Computer Science Department
North Dakota State University
Fargo, ND
Outline



Structure of Protein
Prediction Methods
CASP Cup
Protein
Polypeptide sequence

Proteins are synthesized as linear
chains of amino acids, but they quickly
fold into a compact, globular
structure.
Peptide bond formation:



Each amino acid has two parts, a backbone and a side chain.
The side chain, R, distinguishes the different amino acids.
The backbone is the same for all 20 amino acids. It consists of an
amino (-NH2) group, an alpha carbon, and a carboxylic acid (-COOH) group.
The Amino Acids list

ID   Code   Name
1    Gly    Glycine
2    Ala    Alanine
3    Val    Valine
4    Leu    Leucine
5    Ile    Isoleucine
6    Ser    Serine
7    Thr    Threonine
8    Cys    Cysteine
9    Met    Methionine
10   Pro    Proline
11   Asp    Aspartic acid
12   Asn    Asparagine
13   Glu    Glutamic acid
14   Gln    Glutamine
15   His    Histidine
16   Lys    Lysine
17   Arg    Arginine
18   Phe    Phenylalanine
19   Tyr    Tyrosine
20   Trp    Tryptophan
Protein Primary structure:

Protein Primary Sequences can be
written with a 3-letter code for the 20
amino acids (above) or with a 1-letter
code:
Ex: Human Insulin
A-Chain: GIVEQCCTSICSLYQLENYCN
B-Chain: FVNQHLCGSHLVEALYLVCGERGFFYTPKT
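
As a small illustration of the two notations, the sketch below converts a 1-letter sequence into 3-letter codes. The dictionary uses the standard 1-letter codes for the 20 amino acids; the function name `to_three_letter` is made up for this example.

```python
# Standard one-letter to three-letter codes for the 20 amino acids.
ONE_TO_THREE = {
    "G": "Gly", "A": "Ala", "V": "Val", "L": "Leu", "I": "Ile",
    "S": "Ser", "T": "Thr", "C": "Cys", "M": "Met", "P": "Pro",
    "D": "Asp", "N": "Asn", "E": "Glu", "Q": "Gln", "H": "His",
    "K": "Lys", "R": "Arg", "F": "Phe", "Y": "Tyr", "W": "Trp",
}


def to_three_letter(sequence: str) -> str:
    """Convert a one-letter sequence to hyphen-separated three-letter codes.

    Raises KeyError for any character that is not one of the 20 codes.
    """
    return "-".join(ONE_TO_THREE[residue] for residue in sequence.upper())


insulin_a = "GIVEQCCTSICSLYQLENYCN"
print(len(insulin_a))                  # number of residues in the A-chain
print(to_three_letter(insulin_a[:5]))  # first five residues, 3-letter form
```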
Protein Secondary structure


Protein secondary structure refers to
regular, repeated patterns of folding of
the protein backbone.
These patterns result from regular hydrogen
bonding between backbone atoms.
Protein Secondary Structure

The two most common folding patterns
are the alpha helix and the beta sheet.

Two elements of secondary structure are alpha helices (φ ≈ ψ ≈ -60°) and beta strands (φ ≈ -135°, ψ ≈ 135°), which associate with
other beta strands to form parallel or anti-parallel beta sheets.
α-helix
antiparallel β-sheet
Secondary Structure

Only two bonds per residue in the
protein backbone rotate freely:

The bond between the amide
nitrogen and the alpha carbon,
whose rotation is the φ (phi) angle.
The bond between the alpha
carbon and the carbonyl carbon,
whose rotation is the ψ (psi) angle.
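
Both angles are dihedral (torsion) angles defined by four consecutive backbone atoms: φ by C(i-1), N(i), Cα(i), C(i), and ψ by N(i), Cα(i), C(i), N(i+1). The sketch below shows one common way to compute a signed dihedral from four 3-D points using only the standard library. The function name and the sample coordinates are illustrative and are not taken from a real structure.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    length = math.sqrt(_dot(b1, b1))
    axis = tuple(x / length for x in b1)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, axis) * x for x in axis))
    w = _sub(b2, tuple(_dot(b2, axis) * x for x in axis))
    return math.degrees(math.atan2(_dot(_cross(axis, v), w), _dot(v, w)))


# Made-up points: the outer atoms lie at right angles around the z axis.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

For φ, pass the coordinates of C(i-1), N(i), Cα(i), C(i) in that order; for ψ, pass N(i), Cα(i), C(i), N(i+1).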
Protein Tertiary Structure

The final shape of a protein is determined and
stabilized by chemical bonds and forces,
including weak interactions such as hydrogen bonds,
ionic bonds, van der Waals interactions, and
hydrophobic interactions.
Tertiary structure of ribonuclease A,
a globular protein:
alpha helices, beta sheets,
and turns all contribute to the
ribonuclease A tertiary structure.
Protein Quaternary Structure

The arrangement of the individual
subunits of a protein with multiple
polypeptide subunits gives the protein a
quaternary structure


Ex: Hemoglobin has 2 alpha and 2 beta
subunits.
Only proteins with multiple polypeptide
subunits can have quaternary structure.
Different protein structure formation:
The Goal of Protein Structure
Prediction


“The goal of fold assignment and comparative
modeling is to assign, using computational
methods, each new genome sequence to the
known protein fold or structure that it most
closely resembles.”
In other words, to classify structures into families
that share similar folds or motifs and to
construct phylogenies.
Significance



Identifying these shared structural motifs can
provide significant insight into the functional
mechanisms of the protein family.
“The key to understanding the inner workings
of cells is to learn the structure of Proteins
that form their architecture and carry out
their metabolism.”
Comparing proteomics with genomics, it is
fair to say that “genes were easy” and the
real work of bioinformatics has just begun.
Protein Classification: Families
and superfamilies


By definition, proteins that are more than
50% identical in amino acid sequence across
their entire length are said to be members of
a single family.
Superfamilies are groups of protein families
that are related by lower but still detectable
levels of sequence similarity (and therefore
have a common but more ancient
evolutionary origin).
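
The 50% rule can be sketched as a percent-identity check over two sequences that have already been aligned, with gaps written as `-`. This is only an illustration: the function names are invented here, the alignment step itself is not shown, and the way gap columns are counted is an assumption, since conventions for the denominator differ.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of aligned columns holding the same residue in both sequences.

    Columns that are gaps in both sequences are ignored; a gap opposite a
    residue counts as a mismatch (an assumption made for this sketch).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def same_family(aligned_a: str, aligned_b: str, threshold: float = 50.0) -> bool:
    """Apply the 'more than 50% identical' family rule from the slide."""
    return percent_identity(aligned_a, aligned_b) > threshold


print(same_family("ACDEFGHIKL", "ACDEFGHIKV"))  # nine of ten columns match
```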
Protein Classification: Folds


Proteins are said to have a common fold if
they have the same major secondary
structures in the same arrangement and with
the same topological connections. For example,
all alpha proteins, all beta proteins,
alpha/beta proteins, membrane and cell
surface proteins, etc.
In many respects, the term fold is used
synonymously with structural motif but
generally refers to larger combinations of
secondary structures.
Protein Classification: Enzyme
nomenclature

Each enzyme can be assigned a
numerical code, such as 3.2.1.14,
where the first number specifies the
main class, the second and third
numbers correspond to specific
subclasses, and the final number
represents the serial listing of the
enzyme in its subclass.
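
A minimal sketch of how such a code can be split into its four fields is shown below. The type and function names are invented for this example. The table of main classes lists the top-level Enzyme Commission classes; class 7 was added to the scheme after these slides were written.

```python
from typing import NamedTuple

# Top-level Enzyme Commission classes.
EC_MAIN_CLASSES = {
    1: "Oxidoreductases", 2: "Transferases", 3: "Hydrolases",
    4: "Lyases", 5: "Isomerases", 6: "Ligases", 7: "Translocases",
}


class ECNumber(NamedTuple):
    main_class: int
    subclass: int
    sub_subclass: int
    serial: int


def parse_ec(code: str) -> ECNumber:
    """Split a fully specified EC code such as '3.2.1.14' into four integers."""
    parts = code.strip().split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated fields, got {code!r}")
    return ECNumber(*(int(part) for part in parts))


ec = parse_ec("3.2.1.14")
print(ec, EC_MAIN_CLASSES[ec.main_class])
```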
Experimental Techniques





X-ray Crystallography
NMR Spectroscopy
2D electrophoresis
Mass spectrometry
Protein microarrays
Two Prediction Methods

Protein Folding Model

Aims to simulate the protein folding process at various
levels of abstraction, which provides insight into
the forces that determine protein structure and
the folding process.
No algorithm developed to date can accurately determine the
native structure of a protein.
Comparative Modeling

Sometimes called homology modeling, this approach seeks to
predict the structure of a target protein by
comparison with the structures of related proteins.
Comparative Modeling
Algorithms






DALI (Holm1993)
STRUCTAL (Gerstein1996)
VAST (Gibrat1996)
MINAREA (Falicov1996)
LOCK (Singh1997)
3dSEARCH (Singh1998)
Prediction Algorithm:
3dSEARCH



Designed to compute fast but approximate
alignments of protein structures based on
secondary structure elements alone.
The fundamental idea is to represent all
secondary structure vectors from all target
proteins in a large, highly redundant hash
table. Each secondary structure vector from a
given query structure can be simultaneously
compared to the entire table.
It performed surprisingly well given the
simplicity of its technique.
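
The hash-table idea can be illustrated with a much simplified sketch. It is not the published 3dSEARCH procedure: it assumes each secondary structure element (SSE) is given as `(kind, start_xyz, end_xyz)`, and it hashes every pair of SSEs in a protein by element types, quantized midpoint distance and quantized inter-axis angle. The bin sizes and all names are hypothetical.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def pair_key(sse_a, sse_b, dist_bin=4.0, angle_bin=30.0):
    """Quantized, order-independent key describing a pair of SSE vectors."""
    (kind_a, start_a, end_a), (kind_b, start_b, end_b) = sse_a, sse_b
    vec_a = [e - s for s, e in zip(start_a, end_a)]
    vec_b = [e - s for s, e in zip(start_b, end_b)]
    mid_a = [(s + e) / 2 for s, e in zip(start_a, end_a)]
    mid_b = [(s + e) / 2 for s, e in zip(start_b, end_b)]
    cos_angle = sum(x * y for x, y in zip(vec_a, vec_b)) / (
        math.hypot(*vec_a) * math.hypot(*vec_b))
    angle = math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))
    kinds = tuple(sorted((kind_a, kind_b)))
    return kinds, int(math.dist(mid_a, mid_b) // dist_bin), int(angle // angle_bin)


def build_table(targets):
    """Index every SSE pair of every target protein under its quantized key."""
    table = defaultdict(set)
    for name, sses in targets.items():
        for a, b in combinations(sses, 2):
            table[pair_key(a, b)].add(name)
    return table


def query(table, sses):
    """Vote for targets that share quantized SSE-pair geometry with the query."""
    votes = Counter()
    for a, b in combinations(sses, 2):
        for name in table.get(pair_key(a, b), ()):
            votes[name] += 1
    return votes


# Two made-up targets: a helix packed against a strand, and two parallel helices.
targets = {
    "p1": [("H", (0, 0, 0), (10, 0, 0)), ("E", (0, 5, 0), (0, 5, 8))],
    "p2": [("H", (0, 0, 0), (10, 0, 0)), ("H", (0, 20, 0), (10, 20, 0))],
}
table = build_table(targets)
print(query(table, targets["p1"]).most_common(1))
```

Because the keys depend only on distances and angles, they do not change when a structure is rotated or translated. The price of quantization is that near-boundary values can fall into different bins, which fits the slide's description of the alignments as fast but approximate.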
Prediction Algorithm: VAST


Aligns secondary structure elements using
graph theory.
Steps of VAST Algorithm

All element pairs (one from each protein) that
have the same type are represented as nodes.
Two nodes are connected if the distances and angles
between their elements agree within some threshold.
Find the maximum fully connected subgraph
(clique), which gives the pairwise alignment.
Compute the alignment score as well as a P-value.
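
The first three steps can be sketched as a correspondence graph followed by a clique search. This is an illustration of the idea, not the VAST implementation: SSEs are assumed to be `(kind, start_xyz, end_xyz)` tuples, the tolerances are arbitrary, the clique search is a plain Bron-Kerbosch enumeration, and the scoring and P-value step is omitted. All names are invented for this example.

```python
import math
from itertools import combinations


def geometry(sse_a, sse_b):
    """Midpoint distance and inter-axis angle (degrees) between two SSEs."""
    (_, start_a, end_a), (_, start_b, end_b) = sse_a, sse_b
    vec_a = [e - s for s, e in zip(start_a, end_a)]
    vec_b = [e - s for s, e in zip(start_b, end_b)]
    mid_a = [(s + e) / 2 for s, e in zip(start_a, end_a)]
    mid_b = [(s + e) / 2 for s, e in zip(start_b, end_b)]
    cos_angle = sum(x * y for x, y in zip(vec_a, vec_b)) / (
        math.hypot(*vec_a) * math.hypot(*vec_b))
    angle = math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))
    return math.dist(mid_a, mid_b), angle


def correspondence_graph(sses_p, sses_q, dist_tol=2.0, angle_tol=20.0):
    """Nodes pair same-type SSEs; edges join geometrically compatible nodes."""
    nodes = [(i, j) for i, a in enumerate(sses_p)
             for j, b in enumerate(sses_q) if a[0] == b[0]]
    adj = {node: set() for node in nodes}
    for (i1, j1), (i2, j2) in combinations(nodes, 2):
        if i1 == i2 or j1 == j2:
            continue  # an SSE may be matched at most once
        dist_p, angle_p = geometry(sses_p[i1], sses_p[i2])
        dist_q, angle_q = geometry(sses_q[j1], sses_q[j2])
        if abs(dist_p - dist_q) <= dist_tol and abs(angle_p - angle_q) <= angle_tol:
            adj[(i1, j1)].add((i2, j2))
            adj[(i2, j2)].add((i1, j1))
    return adj


def max_clique(adj):
    """Largest fully connected node set, found by Bron-Kerbosch enumeration."""
    best = []

    def expand(current, candidates, excluded):
        nonlocal best
        if not candidates and not excluded:
            if len(current) > len(best):
                best = list(current)
            return
        for node in list(candidates):
            expand(current + [node], candidates & adj[node], excluded & adj[node])
            candidates = candidates - {node}
            excluded = excluded | {node}

    expand([], set(adj), set())
    return sorted(best)


# Made-up example: protein q is protein p shifted by (5, 5, 5).
protein_p = [("H", (0, 0, 0), (10, 0, 0)),
             ("E", (0, 5, 0), (0, 5, 8)),
             ("H", (0, 0, 30), (0, 9, 30))]
protein_q = [(kind, tuple(c + 5 for c in start), tuple(c + 5 for c in end))
             for kind, start, end in protein_p]
print(max_clique(correspondence_graph(protein_p, protein_q)))
```

Each node in the returned clique is a pair of indices `(i, j)`, meaning element `i` of the first protein is aligned with element `j` of the second. Exhaustive clique enumeration is exponential in the worst case, which is tolerable here only because proteins have few secondary structure elements.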
Prediction Algorithm: DALI


Attempts to find the optimal set of similar
contact patterns in the 2-D distance
matrices of the two proteins.
Uses a branch-and-bound algorithm to find
an approximate solution.
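
The 2-D distance matrix that this comparison starts from is simple to build. The sketch below computes it from a list of Cα coordinates and shows why it is a convenient representation: it does not change when the whole structure is moved. The coordinates are made up, and the search for matching contact patterns is not shown.

```python
import math


def distance_matrix(ca_coords):
    """Symmetric matrix of pairwise distances between Cα positions."""
    return [[math.dist(a, b) for b in ca_coords] for a in ca_coords]


# Made-up Cα positions for a three-residue fragment.
fragment = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 12.0)]
shifted = [(x + 10.0, y - 2.0, z + 1.0) for x, y, z in fragment]

matrix = distance_matrix(fragment)
print(matrix[0][1], matrix[1][2], matrix[0][2])

# Rigid translation leaves every pairwise distance unchanged.
same = all(
    math.isclose(a, b)
    for row_a, row_b in zip(matrix, distance_matrix(shifted))
    for a, b in zip(row_a, row_b)
)
print(same)
```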
Prediction Algorithm:
STRUCTAL


Aims to minimize the root-mean-square
deviation (RMSD) between two protein
backbones.
Uses dynamic programming to perform the minimization.
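
The quantity being minimized is RMSD = sqrt((1/N) Σ dᵢ²), where dᵢ is the distance between the i-th pair of matched atoms. A minimal sketch of that formula follows. It assumes the two coordinate lists are already paired residue by residue and already superposed; the dynamic programming and superposition steps are not shown, and the function name is illustrative.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long coordinate lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (x - y) ** 2
        for atom_a, atom_b in zip(coords_a, coords_b)
        for x, y in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(coords_a))


# Made-up backbones: the second is the first displaced by (3, 4, 0).
backbone_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
backbone_b = [(x + 3.0, y + 4.0, z) for x, y, z in backbone_a]
print(rmsd(backbone_a, backbone_b))
```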
Prediction Algorithm:
MINAREA


Computes a triangulation between the
Cα atoms of the two proteins in order
to minimize the stretched surface area
between their backbones.
Uses dynamic programming (DP) to find
the minimum.
Prediction Algorithm: LOCK


Attempts to find the optimal rigid-body
superposition of two structures such
that the root-mean-square deviation
(RMSD) between the aligned Cα atoms
is minimized.
An iterative approach that performs a
greedy search to the nearest local
minimum in alignment space.
Gold Standard for Evaluation


The SCOP database is widely used and has
been recognized as a current standard in
structural classification.
(http://pdb.wehi.edu.au/scop)
It has been constructed by visual inspection
of all structures in the Protein Data Bank (PDB).
It has four levels: ‘class’, ‘fold’, ‘superfamily’, and
‘family’. A ‘class’ groups folds that have similar
overall secondary structure content.
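
When SCOP serves as the reference, a predicted structural match is usually judged by the deepest level at which the two domains share a classification. The sketch below assumes each domain is labeled with a dotted string of the form class.fold.superfamily.family, as in SCOP's concise classification strings. The function names and the example labels are illustrative only.

```python
from typing import NamedTuple, Optional

# The four main structural classes and their letter codes.
SCOP_CLASSES = {"a": "all alpha", "b": "all beta",
                "c": "alpha/beta", "d": "alpha+beta"}
LEVELS = ("class", "fold", "superfamily", "family")


class ScopPath(NamedTuple):
    scop_class: str
    fold: int
    superfamily: int
    family: int


def parse_sccs(sccs: str) -> ScopPath:
    """Split a string such as 'a.1.1.2' into its four hierarchy levels."""
    scop_class, fold, superfamily, family = sccs.split(".")
    return ScopPath(scop_class, int(fold), int(superfamily), int(family))


def shared_level(sccs_a: str, sccs_b: str) -> Optional[str]:
    """Deepest level at which two classified domains agree, or None."""
    deepest = None
    for level, x, y in zip(LEVELS, parse_sccs(sccs_a), parse_sccs(sccs_b)):
        if x != y:
            break
        deepest = level
    return deepest


print(shared_level("a.1.1.2", "a.1.1.3"))
```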
CASP Competition


CASP competition (Critical Assessment
of Techniques for Protein Structure
Prediction)
http://predictioncenter.llnl.gov/
Its goal is to help advance the
methods of identifying protein structure
from sequence.