How to classify proteins on basis of structure?

Applied Bioinformatics
Week 12
Bioinformatics &
Functional Proteomics
• How to classify proteins into functional classes?
• How to compare one proteome with another?
• How to include functional/activity/pathway
information in databases?
• How to extract functional motifs from sequence
data?
• How to predict phenotype from proteotype?
Bioinformatics &
Expressional Proteomics
• How to correlate changes in protein expression with
disease?
• How to distinguish important from unimportant
changes in expression?
• How to compare, archive, retrieve gel data?
• How to rapidly, accurately identify proteins from
MS and 2D gel data?
• How to include expression info in databases?
Bioinformatics &
Structural Proteomics
• How to predict 3D structure from 1D sequence?
• How to determine function from structure?
• How to classify proteins on basis of structure?
• How to recognize 3D motifs and patterns?
• How to use bioinformatics databases to help in 3D
structure determination?
• How to predict which proteins will express well or
produce stable, folded molecules?
Protein Folding Problem
“Predict a three-dimensional structure of a
protein from its amino acid sequence.”
“How does a protein fold into the structure?”
This question has remained unsolved
for more than half a century.
Proteins Can Fold into 3D
Structures Spontaneously
The three-dimensional structure of a protein is
self-organized in solution.
The structure corresponds to the state with the lowest free
energy of the protein-solvent system. (Anfinsen’s dogma)
If we can calculate the energy of the system precisely, it is
possible to predict the structure of the protein!
Levinthal Paradox
Assume that each amino acid can adopt three conformations (e.g.,
α-helix, β-sheet and random coil). If a protein is made up of 100 amino
acid residues, the total number of conformations is
3^100 = 515377520732011331036461129765621272702107522001
≒ 5 x 10^47.
If 100 psec (10^-10 sec) were required to convert from one conformation to
another, a random search of all conformations would require
5 x 10^47 x 10^-10 sec ≒ 1.6 x 10^30 years.
However, proteins fold on a timescale of msec to sec.
Therefore, proteins fold not via a random search but via a more sophisticated
search process.
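The arithmetic above can be checked with a few lines of Python. This is only a sketch of the estimate on the slide; the constant names and the choice of a 365.25-day year are assumptions made for illustration.

```python
# Levinthal estimate: 3 conformations per residue, 100 residues,
# 100 psec (1e-10 sec) per conformational transition.
STATES_PER_RESIDUE = 3
N_RESIDUES = 100
SECONDS_PER_STEP = 1e-10
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumption: Julian year


def levinthal_years(states=STATES_PER_RESIDUE, n=N_RESIDUES, dt=SECONDS_PER_STEP):
    """Return (number of conformations, years needed for an exhaustive search)."""
    conformations = states ** n        # exact Python integer
    seconds = conformations * dt       # total search time in seconds
    return conformations, seconds / SECONDS_PER_YEAR


conformations, years = levinthal_years()
print(f"{conformations:.2e} conformations, about {years:.1e} years")
```

Changing the number of residues or the time per transition shows how quickly the exhaustive search time grows with chain length.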
We want to watch the folding process of a protein using molecular
simulation techniques.
Why is the “Protein Folding” so
Important?
• Proteins play important roles in living organisms.
• Some proteins are closely related to diseases. Structural
information about a protein is necessary to explain and predict its gene
function as well as to design molecules that bind to the protein in
drug design.
• Today, whole genome sequences (the complete set of genes) of
various organisms have been deciphered, and we realize that the functions
of many genes are unknown and that some are related to diseases.
• Therefore, understanding protein folding helps us to investigate the
functions of these genes and to design useful drugs against the
diseases efficiently.
• In addition, this understanding opens the door to designing
proteins with novel functions as new nano machines.
Forces Involved in the Protein
Folding
• Electrostatic interactions
• van der Waals interactions
• Hydrogen bonds
• Hydrophobic interactions
The Energy Function
• Calculate the energy for each particle
• Since long-range interactions are important, the pair-wise
interactions should be calculated for each pair of particles
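$$
E_{\text{pair}} = \sum_{\text{bonds}} K_r (r - r_{eq})^2
+ \sum_{\text{angles}} K_\theta (\theta - \theta_{eq})^2
+ \sum_{\text{torsions}} \frac{V_n}{2} \left[ 1 + \cos(n\phi - \gamma) \right]
+ \sum_{i<j} \left[ \frac{A_{ij}}{R_{ij}^{12}} - \frac{B_{ij}}{R_{ij}^{6}} + \frac{q_i q_j}{\varepsilon R_{ij}} \right]
$$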
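As a rough illustration of the pair-wise (non-bonded) part of the energy function, the sketch below sums a 12-6 van der Waals term and a Coulomb term over all pairs i < j. The function name, the argument layout (nested lists for A and B) and the default dielectric constant are assumptions for this example; units are left arbitrary and no cutoff or periodic boundary is applied.

```python
import math


def nonbonded_energy(coords, charges, A, B, eps=1.0):
    """Sum A_ij/R^12 - B_ij/R^6 + q_i*q_j/(eps*R) over all pairs i < j.

    coords  : list of (x, y, z) tuples
    charges : list of partial charges q_i
    A, B    : nested lists with the pair coefficients A[i][j], B[i][j]
    eps     : dielectric constant (assumed to be uniform)
    """
    total = 0.0
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            r = math.dist(coords[i], coords[j])   # inter-particle distance R_ij
            vdw = A[i][j] / r**12 - B[i][j] / r**6
            coulomb = charges[i] * charges[j] / (eps * r)
            total += vdw + coulomb
    return total
```

The double loop visits every pair once, so the cost grows with the square of the number of particles, which is why the non-bonded term dominates the computing time.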
System for Folding Simulations
Without water molecules: # of atoms: 304
With water molecules: # of atoms: 304 + 7,377 = 7,681
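To see why adding water makes the simulation so much more expensive, one can count the atom pairs for both systems. This is a minimal sketch that assumes every pair is evaluated (no cutoff); the helper name `n_pairs` is hypothetical.

```python
def n_pairs(n_atoms):
    """Number of distinct atom pairs i < j among n_atoms atoms."""
    return n_atoms * (n_atoms - 1) // 2


dry = n_pairs(304)           # protein only
wet = n_pairs(304 + 7377)    # protein plus water molecules
print(dry, wet, round(wet / dry))
```

The atom count grows by a factor of about 25, but the number of pair-wise interactions grows by a factor of roughly 640.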
Much Faster, Much Larger!
• Special-purpose computer
– Non-bonded interactions are calculated on a special chip
developed solely for this purpose.
– Examples:
• MDM (Molecular Dynamics Machine) or MD-Grape: RIKEN
• MD Engine: Taisho Pharmaceutical Co., and Fuji Xerox Co.
• Parallelization
– A single job is divided into several smaller ones, which are
calculated on multiple CPUs simultaneously.
– Today, almost all MD programs for biomolecular simulations (e.g.,
AMBER, CHARMm, GROMOS, NAMD, MARBLE, etc.) can
run on parallel computers.
• Folding@home
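The idea of dividing one job into smaller ones can be sketched as follows. The rows of the pair loop are split among workers and the partial energies are added at the end. This is only an illustration of the decomposition: it uses Python threads, which will not give a real speed-up for this kind of loop, whereas production MD programs use their own parallel schemes. The function names and the unit 12-6 coefficients are assumptions for the example.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def pair_energy_rows(coords, rows):
    """Partial sum of 1/R^12 - 1/R^6 over pairs (i, j) with i in rows and j > i."""
    total = 0.0
    for i in rows:
        for j in range(i + 1, len(coords)):
            r = math.dist(coords[i], coords[j])
            total += 1.0 / r**12 - 1.0 / r**6
    return total


def parallel_energy(coords, n_workers=4):
    """Split the rows among workers, then add up the partial energies."""
    # Interleaved rows (0, 4, 8, ...), (1, 5, 9, ...) balance the triangular loop.
    chunks = [range(w, len(coords), n_workers) for w in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = pool.map(lambda rows: pair_energy_rows(coords, rows), chunks)
        return sum(partials)
```

Because each worker only needs the coordinates and its own list of rows, the partial sums are independent and can be computed in any order.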
Homology Modeling
• Template Selection
and Fold Assignment
• Target – Template
Alignment
• Model Building
– Loop Modeling
– Sidechain Modeling
• Model Evaluation
Fold Assignment and Template
Selection
• Identify all protein structures with sequences
related to the target, then select templates
• 3 main classes of comparison methods
– Compare the target sequence with each database
sequence independently, pair-wise sequence – sequence
comparison, BLAST and FASTA
– Multiple sequence comparisons to improve sensitivity,
PSI-BLAST
– Threading or 3-D template matching methods
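The first class of methods, independent pair-wise comparison of the target with each database sequence, can be sketched with a simple global alignment score. Note that this uses the Needleman-Wunsch algorithm with a flat match/mismatch/gap scoring, not the heuristics or substitution matrices of BLAST and FASTA; the scoring values and function names are assumptions for illustration.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score, computed with two rolling rows."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def rank_templates(target, templates):
    """Return (name, score) pairs sorted best first; templates maps name -> sequence."""
    scored = [(name, global_alignment_score(target, seq))
              for name, seq in templates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

For example, `rank_templates("ACDEFG", {"t1": "ACDEFG", "t2": "AAAAAA"})` puts the identical sequence `t1` first. Real template searches run against the whole structure database and use statistically calibrated scores.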
Target – Template Alignment
• Most important step in Homology Modeling
• A specialized method should be used for
alignment
– Above 40% sequence identity, the alignment is likely to be correct.
– Regions of low local sequence similarity become
common when overall sequence identity is under 40%.
(Saqi et al., Protein Eng. 1999)
– The alignment becomes difficult below 30% sequence
identity. (Rost, Protein Eng. 1999)
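The identity thresholds above can be applied to an existing alignment as sketched below. Percent identity is computed here over the aligned columns without gaps, which is one of several conventions and is an assumption of this example; the function names and the wording of the returned labels are also illustrative.

```python
def percent_identity(aligned_target, aligned_template):
    """Percent identical residues over aligned, gap-free columns ('-' marks a gap)."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(a, b) for a, b in zip(aligned_target, aligned_template)
             if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    identical = sum(a == b for a, b in pairs)
    return 100.0 * identical / len(pairs)


def alignment_outlook(identity):
    """Map percent identity to the rough zones quoted on the slide."""
    if identity > 40:
        return "alignment likely correct"
    if identity >= 30:
        return "expect regions of low local similarity"
    return "alignment difficult"
```

For instance, `alignment_outlook(percent_identity("ACDE", "ACFF"))` gives "alignment likely correct" because two of four aligned residues (50%) are identical.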
Model Building
• Construct a 3-D model of the target sequence
based on its alignment to template structures
• Three different model building approaches
– Modeling by rigid body assembly
– Modeling by segment matching
– Modeling by satisfaction of spatial restraints
• The accuracies of the three approaches are similar
• Template selection and alignment have a larger
impact on the model
Screenshots from the Homology
Modeling Server Swiss-Model
• Construct a framework using known
protein structures
• Generate the location of the target
amino acids on the framework
• If loop regions are not determined, use an
additional database search or short
simulations
Swiss-MOD Web Server
Procedure of the MODELLER
program
• After obtaining restraints, run a
geometry optimization or real-space optimization to satisfy them
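The idea of optimizing coordinates until restraints are satisfied can be sketched with plain gradient descent on harmonic distance restraints. This is a strong simplification and not MODELLER's actual objective function or optimizer; the function names, the step size and the restraint format (i, j, target distance) are assumptions for illustration.

```python
import math


def restraint_violation(coords, restraints):
    """Sum of squared deviations (d_ij - target)^2 over all distance restraints."""
    return sum((math.dist(coords[i], coords[j]) - d0) ** 2
               for i, j, d0 in restraints)


def optimize(coords, restraints, step=0.05, n_iter=500):
    """Move the points downhill on the violation function; returns new coordinates."""
    coords = [list(p) for p in coords]          # work on a mutable copy
    for _ in range(n_iter):
        grads = [[0.0, 0.0, 0.0] for _ in coords]
        for i, j, d0 in restraints:
            d = math.dist(coords[i], coords[j])
            if d == 0.0:
                continue                         # direction undefined, skip
            factor = 2.0 * (d - d0) / d          # derivative of (d - d0)^2
            for k in range(3):
                g = factor * (coords[i][k] - coords[j][k])
                grads[i][k] += g
                grads[j][k] -= g
        for p, g in zip(coords, grads):
            for k in range(3):
                p[k] -= step * g[k]
    return coords
```

A small usage example: starting from three points that are too close together and restraining each pair to 3.8 Å (a typical distance between consecutive Cα atoms), `optimize` spreads the points out until the violation is close to zero.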
Errors in Homology Models
a. Errors in side chain packing
b. Distortions and shifts in correctly aligned regions
c. Errors in regions without a template
d. Errors due to misalignment
e. Incorrect templates
Model Building Programs
(Type: P = program, S = server)

Program      Type  URL
COMPOSER     P     www-cryst.bioc.cam.ac.uk
CONGEN       P     www.congenomics.com/congen/congen.html
CPH models   S     www.cbs.dtu.dk/services/CPHmodels/
DRAGON       P     www.nimr.mrc.ac.uk/~mathbio/a-aszodi/dragon.html
ICM          P     www.molsoft.com
InsightII    P     www.msi.com
MODELLER     P     guitar.rockefeller.edu/modeller/modeller.html
LOOK         P     www.mag.com
QUANTA       P     www.msi.com
SYBYL        P     www.tripos.com
SCWRL        P     www.cmpharm.ucsf.edu/~bower/scrwl/scrwl.html
SWISS-MOD    S     www.expasy.ch/swissmod
WHAT IF      P     www.sander.embl-heidelberg.de/whatif/
Applications
End Theory
• Mind mapping
• 10 min break
Practice
3D Structure Prediction?
• Get a protein sequence
• Go to: http://bioinf.cs.ucl.ac.uk/psipred
– Use threading
• Go to: http://www.rcsb.org/pdb
– Find known structure
• Folding@home
– Ab initio prediction
• FoldIt (http://fold.it/portal/)
– Crystal structure of a monomeric retroviral protease solved by protein folding game players.
– Increased Diels-Alderase activity through backbone remodeling guided by Foldit players.
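Once a known structure has been found in the PDB, its coordinates can be read from the downloaded file. The sketch below extracts the Cα atoms from PDB-format text using the fixed column positions of ATOM records. The function names are hypothetical, and the download URL pattern should be checked against the RCSB site before use; no network access is performed here.

```python
def pdb_download_url(pdb_id):
    """Build a download URL for a PDB entry (URL pattern assumed; verify on the RCSB site)."""
    return f"https://files.rcsb.org/download/{pdb_id.upper()}.pdb"


def ca_coordinates(pdb_text):
    """Return [(chain, residue_number, residue_name, (x, y, z)), ...] for C-alpha atoms."""
    atoms = []
    for line in pdb_text.splitlines():
        # ATOM records use fixed columns: atom name 13-16, residue name 18-20,
        # chain 22, residue number 23-26, x/y/z in columns 31-54.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            atoms.append((line[21], int(line[22:26]), line[17:20], xyz))
    return atoms
```

With the text of a PDB file in a string `pdb_text`, `len(ca_coordinates(pdb_text))` gives the number of residues that have a Cα atom, which is a quick check against the length of the query sequence.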
