Protein Structure Prediction


Dr. G.P.S. Raghava
[Figure: protein sequence -> structure]
Protein Structure Prediction
• Experimental Techniques
– X-ray Crystallography
– NMR
• Limitations of Current Experimental Techniques
– Protein Data Bank (PDB) -> 23,000 protein structures
– SwissProt -> 100,000 proteins
– Non-Redundant (NR) -> 1,000,000 proteins
• Importance of Structure Prediction
– Fill gap between known sequence and structures
– Protein engineering to alter the function of a protein
– Rational Drug Design
Different Levels of Protein Structure
Protein Architecture
• Proteins consist of amino acids linked by peptide bonds
• Each amino acid consists of:
– a central carbon atom
– an amino group
– a carboxyl group
– a side chain
• Differences in side chains distinguish the various amino acids
Amino Acid Side Chains
Vary in:
• Size
• Shape
• Polarity
Peptide Bond
Peptide Bonds
Dihedral Angles
Conformation Flexibility
• Backbone conformation (the backbone is the main chain of atoms
in peptide bonds, minus side chains):
– Torsion (rotation) angles around:
• Cα-N bond (φ)
• Cα-C bond (ψ)
– Steric hindrance:
• Most: Pro
• Least: Gly
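The torsion angles above can be computed from four consecutive backbone atom positions. The following is a minimal sketch, not part of the original slides: the function name `dihedral` and the coordinates in the usage lines are illustrative assumptions. For φ the four atoms would be C(i-1), N(i), Cα(i), C(i); for ψ they would be N(i), Cα(i), C(i), N(i+1).

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def dihedral(p0, p1, p2, p3):
    """Signed torsion angle (degrees) about the p1-p2 bond, in (-180, 180]."""
    b0 = _sub(p1, p0)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    n1 = _cross(b0, b1)          # normal of the plane p0-p1-p2
    n2 = _cross(b1, b2)          # normal of the plane p1-p2-p3
    b1_len = math.sqrt(_dot(b1, b1))
    b1_hat = tuple(c / b1_len for c in b1)
    x = _dot(n1, n2)
    y = _dot(b1_hat, _cross(n1, n2))
    return math.degrees(math.atan2(y, x))

# Illustrative (made-up) coordinates: a planar cis and a planar trans arrangement.
cis = dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (1, 1, 0))
trans = dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (-1, 1, 0))
```

Plotting the (φ, ψ) pair of every residue obtained this way gives the Ramachandran plot of the next slide.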
Ramachandran Plot
Protein Secondary Structure
Secondary Structure
• Regular secondary structure (α-helices, β-sheets)
• Irregular secondary structure (tight turns, random coils, bulges)
Secondary Structure:
Helices
ALPHA HELIX: a result of H-bonding between every fourth peptide
bond (via amino and carbonyl groups) along the length of the
polypeptide chain
Helix formation is local: the C=O of residue i bonds to the N-H of
residue i+4
[Figure: thyroid hormone receptor (2nll), showing an individual amino
acid and an H-bond]
Secondary Structure:
Beta Sheets
BETA PLEATED SHEET: a result of H-bonding between polypeptide chains,
or between distant segments of the same chain
β-sheet formation is NOT local
Erabutoxin b (3ebx)
Definition of β-turn
A β-turn is defined by four consecutive residues i, i+1, i+2 and i+3
that do not form a helix and have a Cα(i)-Cα(i+3) distance of less than
7 Å; the turn leads to a reversal in the direction of the protein chain
(Richardson, 1981).
The conformation of a β-turn is defined in terms of the φ and ψ angles
of the two central residues, i+1 and i+2, and β-turns can be classified
into different types on the basis of φ and ψ.
[Figure: β-turn with residues i, i+1, i+2 and i+3, the H-bond, and the
distance D < 7 Å marked]
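The distance part of this definition is easy to check numerically. The sketch below is an illustration only: the function name, the `in_helix` flags and the example coordinates are assumptions, and "do not form a helix" is interpreted here as "the four residues are not all helical", which is one possible reading of the definition.

```python
import math

def is_beta_turn_candidate(ca_coords, in_helix, i, cutoff=7.0):
    """Check residues i..i+3 against the distance-based beta-turn criteria.

    ca_coords: list of (x, y, z) C-alpha positions in Angstrom.
    in_helix:  list of booleans, True where a residue is assigned to a helix.
    """
    if i < 0 or i + 3 >= len(ca_coords):
        return False                      # not enough residues for a 4-residue turn
    if all(in_helix[i:i + 4]):
        return False                      # assumption: all four helical means "forms a helix"
    distance = math.dist(ca_coords[i], ca_coords[i + 3])
    return distance < cutoff

# Made-up coordinates: a chain that folds back on itself, and an extended chain.
folded = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (5.0, 3.0, 0.0), (1.0, 4.0, 0.0)]
extended = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
no_helix = [False, False, False, False]
```

Assigning a turn type would additionally require the φ and ψ angles of residues i+1 and i+2, which this sketch does not cover.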
Tight turns
Type      No. of residues   H-bonding
δ-turn    2                 NH(i)-CO(i+1)
γ-turn    3                 CO(i)-NH(i+2)
β-turn    4                 CO(i)-NH(i+3)
α-turn    5                 CO(i)-NH(i+4)
π-turn    6                 CO(i)-NH(i+5)
Secondary Structure
shortcuts
Tertiary Structure: Hexokinase
(6000 atoms, 48 kD, 457 amino acids)
Polypeptides with a tertiary level of structure are usually referred to as
globular proteins, since they fold into compact, irregular, roughly
spherical shapes
Quaternary Structure:
Haemoglobin
What determines fold?
• Anfinsen’s experiments in 1957
demonstrated that proteins can fold
spontaneously into their native
conformations under physiological
conditions. This implies that primary
structure does indeed determine
folding or 3-D structure.
• Some exceptions exist
– Chaperone proteins assist folding
– Abnormally folded Prion proteins can
catalyze misfolding of normal prion
proteins that then aggregate
Levels of Description of
Structural Complexity
• Primary Structure (AA sequence)
• Secondary Structure
– Spatial arrangement of a polypeptide’s backbone atoms
without regard to side-chain conformations
• α, β, coil, turns (Venkatachalam, 1968)
– Super-Secondary Structure
• α, β, α/β, α+β (Rao and Rossmann, 1973)
• Tertiary Structure
– 3-D structure of an entire polypeptide
• Quaternary Structure
– Spatial arrangement of subunits (2 or more polypeptide
chains)
Techniques of Structure Prediction
• Computer simulation based on energy calculation
– Based on physico-chemical principles
– Thermodynamic equilibrium with a minimum free energy
– Global minimum of the protein's free-energy surface
• Knowledge Based approaches
– Homology Based Approach
– Threading Protein Sequence
– Hierarchical Methods
Energy Minimization Techniques
Energy minimization based methods, in their pure form, make
no a priori assumptions and attempt to locate the global minimum.
• Static Minimization Methods
– Many classical potentials can be constructed
– Assume that the atoms in the protein are static
– Problems (large number of variables and minima; validity of
potentials)
• Dynamical Minimization Methods
– Motions of atoms are also considered
– Monte Carlo simulation (stochastic in nature; time is not considered)
– Molecular Dynamics (time-dependent; quantum-mechanical or classical
equations)
• Limitations
– Large number of degrees of freedom; CPU power is not adequate
– Interaction potentials are not good enough for modelling
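The Monte Carlo idea can be illustrated with a Metropolis search on a toy one-dimensional energy function. This is a sketch under stated assumptions, not a protein force field: the double-well `toy_energy`, the step size, the temperature and the seed are all invented for illustration. It shows why a stochastic search can escape a local minimum that a purely downhill (static) minimization started in the same place would stay in.

```python
import math
import random

def toy_energy(x):
    """Hypothetical double-well energy: local minimum near x = +1, global near x = -1."""
    return (x * x - 1.0) ** 2 + 0.3 * x

def metropolis_search(energy_fn, x0, steps=5000, step_size=0.5,
                      temperature=0.5, seed=0):
    """Metropolis Monte Carlo: random moves, accepted with the Boltzmann criterion."""
    rng = random.Random(seed)             # fixed seed so runs are repeatable
    x, e = x0, energy_fn(x0)
    best_x, best_e = x, e
    for _ in range(steps):
        x_new = x + rng.uniform(-step_size, step_size)
        e_new = energy_fn(x_new)
        # Always accept downhill moves; accept uphill moves with probability exp(-dE/T).
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / temperature):
            x, e = x_new, e_new
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e

# Start in the higher (local) well and let the sampler look for a lower one.
best_x, best_e = metropolis_search(toy_energy, x0=1.0)
```

Note that the moves carry no notion of time: the sequence of accepted states is a sample of low-energy conformations, not a trajectory.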
Molecular Dynamics
• Provides a way to observe the motion of
large molecules such as proteins at the
atomic level – dynamic simulation
• Newton’s second law applied to molecules
• Potential energy function
– Molecular coordinates
– Force on all atoms can be calculated, given this
function
– Trajectory of motion of molecule can be
determined
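These steps (potential energy function -> forces -> trajectory) can be sketched for the simplest possible system, a single particle on a harmonic spring. The velocity Verlet integrator used here is a common choice in molecular dynamics, but the slides do not say which integrator is meant; the spring constant, mass and time step are illustrative assumptions.

```python
def harmonic_force(x, k=1.0):
    """Force from the potential U(x) = 0.5 * k * x**2, i.e. F = -dU/dx = -k * x."""
    return -k * x

def velocity_verlet(x, v, mass, dt, n_steps, force_fn):
    """Integrate Newton's second law (a = F / m) and return the trajectory."""
    trajectory = [x]
    a = force_fn(x) / mass
    for _ in range(n_steps):
        x = x + v * dt + 0.5 * a * dt * dt       # advance the position
        a_new = force_fn(x) / mass               # force at the new position
        v = v + 0.5 * (a + a_new) * dt           # advance the velocity
        a = a_new
        trajectory.append(x)
    return trajectory, v

# One particle released at x = 1 with zero velocity (arbitrary units).
traj, v_final = velocity_verlet(x=1.0, v=0.0, mass=1.0, dt=0.01,
                                n_steps=1000, force_fn=harmonic_force)
total_energy = 0.5 * v_final ** 2 + 0.5 * traj[-1] ** 2
```

For a protein the same loop runs over the three coordinates of every atom, with the force obtained from the gradient of the full potential energy function; the total energy should stay close to its initial value if the time step is small enough.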
Knowledge Based Approaches
• Homology Modelling
– Needs homologues of known protein structure
– Backbone modelling
– Side chain modelling
– Fails in the absence of homology
• Threading Based Methods
– New way of fold recognition
– Sequence is fitted to known structures
– Motif recognition
– Loop & side chain modelling
– Fails in the absence of a known example
Homology Modeling
• Simplest, reliable approach
• Basis: proteins with similar sequences tend
to fold into similar structures
• It has been observed that even proteins with
25% sequence identity fold into similar
structures
• Does not work for remote homologs (< 25%
pairwise identity)
Homology Modeling
• Given:
– A query sequence Q
– A database of known protein structures
• Find protein P such that P has high sequence
similarity to Q
• Return P’s structure as an approximation to
Q’s structure
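The template search described above can be sketched as follows. This is a simplified illustration: it assumes the query has already been aligned to each candidate (with "-" for gaps), the template names and sequences are made up, and the 25% cutoff is taken from the preceding slide. A real pipeline would compute the alignments itself and then build backbone and side chains on the chosen template.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of identical residues over the aligned, gap-free columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if a != "-" and b != "-"]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)

def best_template(alignments, min_identity=25.0):
    """Pick the template P most similar to the query Q.

    alignments: dict mapping template name -> (aligned_query, aligned_template).
    Returns (name, identity), or None if no template reaches min_identity.
    """
    scored = {name: percent_identity(q, t) for name, (q, t) in alignments.items()}
    if not scored:
        return None
    name = max(scored, key=scored.get)
    if scored[name] < min_identity:
        return None                      # remote homolog: homology modelling not applicable
    return name, scored[name]

# Hypothetical pre-aligned query/template pairs.
example_alignments = {
    "templateA": ("MKTAYIAK", "MKTAYLAK"),
    "templateB": ("MKTAYIAK", "GHS-PLWQ"),
}
hit = best_template(example_alignments)
```

Returning `None` below the cutoff mirrors the slide's point that the approach does not work for remote homologs.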
Threading
• Given:
– sequence of protein P with unknown structure
– Database of known folds
• Find:
– Most plausible fold for P
– Evaluate quality of such arrangement
• Places the residues of unknown P along the
backbone of a known structure and determines
stability of side chains in that arrangement
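A toy version of this procedure is sketched below. Everything in it is an assumption made for illustration: each known fold is reduced to a string of per-position environments ("B" buried, "E" exposed), the hydrophobic residue set is a rough choice, the scoring is +1 for a compatible placement and -1 otherwise, and gaps are not allowed. Real threading methods use much richer energy functions and alignments.

```python
# Rough, illustrative set of hydrophobic residues (one-letter codes).
HYDROPHOBIC = set("AVLIMFWC")

def thread_score(sequence, environments):
    """Score a gapless placement of the sequence onto a fold's environment string."""
    if len(sequence) != len(environments):
        raise ValueError("sequence and fold must have equal length")
    score = 0
    for residue, env in zip(sequence, environments):
        is_hydrophobic = residue in HYDROPHOBIC
        is_buried = env == "B"
        # Hydrophobic residues are rewarded when buried, polar ones when exposed.
        score += 1 if is_hydrophobic == is_buried else -1
    return score

def most_plausible_fold(sequence, fold_library):
    """Return (fold_name, score) for the best-scoring fold of matching length, or None."""
    scores = {name: thread_score(sequence, env)
              for name, env in fold_library.items()
              if len(env) == len(sequence)}
    if not scores:
        return None
    name = max(scores, key=scores.get)
    return name, scores[name]

# Hypothetical fold library and query sequence.
folds = {"fold1": "BEBEBE", "fold2": "EEEBBB"}
result = most_plausible_fold("LKVEIS", folds)
```

The returned score serves as the quality estimate for the arrangement; a low best score suggests that none of the known folds fits.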
Hierarchical Methods
Intermediate structures are predicted, instead of predicting the
tertiary structure of a protein directly from its amino acid sequence
• Prediction of backbone structure
– Secondary structure (helix, sheet, coil)
– Beta turn prediction
– Super-secondary structure
• Tertiary structure prediction
• Limitations
– Accuracy is only 75-80%
– Only three-state prediction
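An accuracy figure of this kind is usually a per-residue, three-state score (often called Q3): the percentage of residues whose predicted state, helix (H), sheet (E) or coil (C), matches the observed one. The slides do not state how the 75-80% was measured, so the sketch below is an assumption about the metric, and the two strings are made-up examples.

```python
def q3_accuracy(predicted, observed):
    """Percentage of residues whose predicted state (H, E or C) matches the observed one."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("state strings must be non-empty and of equal length")
    correct = sum(1 for p, o in zip(predicted, observed) if p == o)
    return 100.0 * correct / len(observed)

# Hypothetical prediction versus observed assignment for an 8-residue segment.
accuracy = q3_accuracy("HHHEECCC", "HHHEEECC")
```

Because only three states are scored, the measure says nothing about turn types or other irregular structures.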