Powerpoint slides - School of Engineering and Applied Science

CS 177
Proteins I (Structure-function relationships)
Review of protein structures
Computational modeling
Three-dimensional structural analysis in laboratory
Structure-function relationships
Recommended readings (from very basic to advanced)
- A science primer: Molecular modeling
http://www.ncbi.nlm.nih.gov/About/primer/molecularmod.html
- Brown, S.M. (2000) Bioinformatics, Eaton Publishing, pp. 99-119
- Veeramalai, M. & Gilbert, D.: Bioinformatics Tools for protein structure
visualisation and analysis
http://www.brc.dcs.gla.ac.uk/~mallika/Publications/scwbiw-article.htm
- Mount, D.W. (2001) Bioinformatics,
Cold Spring Harbor Lab Press, pp. 382-478
Review of protein structure
Primary structure
Proteins are chains of amino acids joined by peptide bonds
Polypeptide chain
The structure of two amino acids
The N-Cα-C sequence is repeated throughout the protein, forming the backbone
The bonds on each side of the Cα atom are free to rotate within spatial constraints;
the angles of these bonds determine the conformation of the protein backbone
The R side chains also play an important structural role
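The backbone rotation angles mentioned above are torsion (dihedral) angles, each defined by four consecutive atoms. As a minimal sketch, assuming atom positions are available as (x, y, z) tuples, such an angle could be computed as follows. The function name and the test coordinates are illustrative only and do not come from the slides.

```python
import math

def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees for four points; rotation is about the p1-p2 bond."""
    def sub(a, b):
        return tuple(x - y for x, y in zip(a, b))

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])

    b1 = sub(p1, p0)
    b2 = sub(p2, p1)   # the central bond
    b3 = sub(p3, p2)
    n1 = cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = cross(b2, b3)  # normal of the plane through p1, p2, p3
    x = dot(n1, n2)
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    return math.degrees(math.atan2(y, x))

# Illustrative geometry: cis gives 0 degrees, trans gives 180 degrees.
cis = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
trans = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
quarter = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
```

For a real protein, Phi would use the atoms C(i-1), N(i), Cα(i), C(i) and Psi the atoms N(i), Cα(i), C(i), N(i+1).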
Review of protein structure
Secondary structure:
Interactions that occur between the C=O and N-H groups on amino acids
Much of the protein core comprises α helices and β sheets, folded into a three-dimensional configuration:
- regular patterns of H bonds are formed between neighboring amino acids
- the amino acids have similar backbone torsion angles
- the formation of these structures neutralizes the polar groups on each amino acid
- the secondary structures are tightly packed in a hydrophobic environment
Each R side group has a limited volume to occupy and a limited number of interactions
with other R side groups
α helix
β sheet
Secondary structure
Other Secondary structure elements
(no standardized classification)
- random coil
- loop
- others (e.g. 3₁₀ helix, β-hairpin, paperclip)
Super-secondary structure
- In addition to secondary structure elements that apply to all proteins
(e.g. helix, sheet) there are some simple structural motifs in some proteins
- These super-secondary structures (e.g. transmembrane domains, coiled
coils, helix-turn-helix, signal peptides) can give important hints about
protein function
Review of protein structure
Q: If we have all the Psi and Phi angles in a protein, do we then have enough
information to describe the 3-D structure?
A: No, because the detailed packing of the amino acid side chains is not
revealed from this information. However, the Psi and Phi angles do
determine the entire secondary structure of a protein
Tertiary structure
The tertiary structure describes the organization in three dimensions
of all the atoms in the polypeptide
The tertiary structure is determined by a combination of different types of bonding:
- Ionic interactions between oppositely charged residues can pull them together
- Hydrogen bonds: hydrogens carrying a partial positive charge are attracted to partially
negative oxygens (weaker)
- van der Waals interactions: hydrophobic residues are attracted to each other when forced
together by exclusion from the aqueous surroundings (weakest)
Many of these bonds are very weak and easy to break, but hundreds or thousands working
together give the protein structure great stability
If a protein consists of only one polypeptide chain, this level then describes the
complete structure
Tertiary structure
Proteins can be divided into two general classes based on their tertiary structure:
- Fibrous proteins have elongated structures with the polypeptide chains arranged
in long strands. This class of proteins serves as a major structural component of cells
Examples: silk, keratin, collagen
- Globular proteins have more compact, often irregular structures.
This class of proteins includes most enzymes and most proteins involved
in gene expression and regulation
Quaternary structure
The quaternary structure defines the conformation assumed by a multimeric protein.
The individual polypeptide chains that make up a multimeric protein are often referred to
as protein subunits. Subunits are joined by ionic interactions, hydrogen bonds, and hydrophobic interactions
Example: Haemoglobin (4 subunits)
Summary protein structure
Primary structure: Sequence of amino acids
Secondary structure: Interactions that occur between the C=O and N-H groups on amino acids
Tertiary structure: Organization in three dimensions of all the atoms in the polypeptide
Quaternary structure: Conformation assumed by a multimeric protein
The four levels of protein structure are hierarchical:
each level of the build process is dependent upon the one below it
Structure displays
Common displays are (among others) cartoon, spacefill, and backbone
cartoon
spacefill
backbone
Need for analyses of protein structures
A protein performs metabolic, structural, or regulatory functions in a cell.
Cellular biochemistry works based on interactions between 3-D molecular
structures
The 3-D structure of a protein determines its function
Therefore, the relationship of sequence to function is primarily concerned with
understanding the 3-D folding of proteins and inferring protein functions from these
3-D structures
(e.g. binding sites, catalytic activities, interactions with other molecules)
The study of protein structure is not only of fundamental scientific interest in
terms of understanding biochemical processes, but also produces very
valuable practical benefits
- Medicine: The understanding of enzyme function allows the design of new and improved drugs
- Agriculture: Therapeutic proteins and drugs for veterinary purposes and for treatment of plant diseases
- Industry: Protein engineering has potential for the synthesis of enzymes to carry out various industrial
processes on a mass scale
Sources of protein structure information
3-D macromolecular structures stored in databases
The most important database: the Protein Data Bank (PDB)
The PDB is maintained by the Research Collaboratory for Structural Bioinformatics
(RCSB) and can be accessed at three different sites (plus a number of mirror sites
outside the USA):
- http://rcsb.rutgers.edu/pdb (Rutgers University)
- http://www.rcsb.org/pdb/ (San Diego Supercomputer Center)
- http://tcsb.nist.gov/pdb/ (National Institute for Standards and Technology)
It is the very first “bioinformatics” database ever built
Sources of protein structure information
Experimental structure determination
In practice, most biomolecular structures (>99% of structures in PDB) are
determined using three techniques:
- X-ray crystallography (low to very high resolution)
Problem: requires crystals; it is difficult to crystallize proteins while maintaining their
native conformation; not all proteins can be crystallized
- Nuclear magnetic resonance (NMR) spectroscopy of proteins in solution
(medium to high resolution)
Problem: Works only with small and medium size proteins (~50% of proteins
cannot be studied with this method); requires high solubility
- Electron microscopy and crystallography (low to medium resolution)
Problem: (still) relatively low resolution
Experimental methods are still very time consuming and expensive;
in most cases the experimental data will contain errors and/or be
incomplete. Thus the initial model needs to be refined and rebuilt
Sources of protein structure information
Computational Modeling
Researchers have been working for decades to develop procedures for
predicting protein structure that are not so time consuming and not hindered
by size and solubility constraints.
As protein sequences are encoded in DNA, in principle, it should therefore be
possible to translate a gene sequence into an amino acid sequence, and to
predict the three-dimensional structure of the resulting chain from this amino
acid sequence
Computational modeling
How to predict the protein structure?
Ab initio prediction of protein structure from sequence: not yet.
Problem: the information contained in protein structures lies essentially in the
conformational torsion angles. Even if we only assume that every amino-acid residue
has three such torsion angles, and that each of these three can only assume one
of three "ideal" values (e.g., 60, 180 and -60 degrees), this still leaves us with 27
possible conformations per residue.
For a typical 200-amino acid protein, this
would give 27^200 (roughly 1.87 x 10^286)
possible conformations!
Q: Can’t we just generate all these
conformations, calculate their energy
and see which conformation has the
lowest energy?
If we were able to evaluate 10^9 conformations per second, this would still keep us
busy 4 x 10^259 times the current age of the universe
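The arithmetic behind these figures can be reproduced with a short calculation. This is only a sketch: the age of the universe (about 1.38 x 10^10 years) and the length of a year in seconds are assumptions that the slides do not state, and logarithms are used so the very large numbers stay manageable.

```python
import math

STATES_PER_ANGLE = 3          # three "ideal" values per torsion angle
ANGLES_PER_RESIDUE = 3        # three torsion angles per residue
RESIDUES = 200                # typical protein length used on the slide
RATE_PER_SECOND = 1e9         # conformations evaluated per second
AGE_OF_UNIVERSE_YEARS = 1.38e10   # assumption, not given on the slide
SECONDS_PER_YEAR = 3.156e7        # assumption, not given on the slide

per_residue = STATES_PER_ANGLE ** ANGLES_PER_RESIDUE        # 27
log10_total = RESIDUES * math.log10(per_residue)            # log10 of 27^200

age_seconds = AGE_OF_UNIVERSE_YEARS * SECONDS_PER_YEAR
log10_ages = log10_total - math.log10(RATE_PER_SECOND) - math.log10(age_seconds)

def as_scientific(log10_value):
    """Split a base-10 logarithm into (mantissa, exponent)."""
    exponent = math.floor(log10_value)
    return 10 ** (log10_value - exponent), exponent

total_mantissa, total_exponent = as_scientific(log10_total)
ages_mantissa, ages_exponent = as_scientific(log10_ages)
print(f"conformations: {total_mantissa:.2f} x 10^{total_exponent}")
print(f"universe ages needed: {ages_mantissa:.1f} x 10^{ages_exponent}")
```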
There are optimized ab initio prediction algorithms available, as well as fold recognition
algorithms that use threading (comparing protein folds with known fold structures from
databases), but the results are still very poor
Computational modeling
Solution: homology modeling
Homology (comparative) modeling attempts to predict structure on the strength
of a protein’s sequence similarity to another protein of known structure
Basic idea: a significant alignment of the query sequence with a target sequence from
PDB is evidence that the query sequence has a similar 3-D structure (current threshold
~ 40% sequence identity). Then multiple sequence alignment and pattern analysis can
be used to predict the structure of the protein
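To make the ~40% identity threshold concrete, the following sketch computes percent identity from two already-aligned sequences. It assumes that gaps are written as "-" and that identity is counted over columns where neither sequence has a gap. Other definitions of the denominator exist, so this is one possible convention rather than the one used by any particular tool. The two sequences are invented for illustration.

```python
IDENTITY_THRESHOLD = 40.0  # percent, the approximate threshold quoted above

def percent_identity(aligned_query, aligned_target, gap="-"):
    """Percent of identical residues over aligned columns without gaps."""
    if len(aligned_query) != len(aligned_target):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(q, t) for q, t in zip(aligned_query, aligned_target)
             if q != gap and t != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for q, t in pairs if q == t)
    return 100.0 * matches / len(pairs)

# Hypothetical aligned pair: 9 gap-free columns, 8 of them identical.
identity = percent_identity("MKT-AYIAKQR", "MKTLAYLAK-R")
suitable_template = identity >= IDENTITY_THRESHOLD
```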
Computational modeling
Flow chart for protein structure prediction (from Mount, 2001)
Computational modeling
Protein sequence
- partial or full sequences; predicted through gene finding
Computational modeling
Database similarity search
- sequence is used as a query in a database similarity search against proteins in PDB
Computational modeling
Does the sequence align with a protein of known structure?
- Yes: if the database similarity search reveals a significant alignment between the query
sequence and a PDB target sequence, the alignment can be used to position the
amino acids of the query sequence in the same approximate 3-D structure
- No: proceed to protein family analysis
Computational modeling
Protein family analysis/relationship to known structure
- Family (structural context): structures that have a significant level of structural similarity
but not necessarily significant sequence similarity
- the goal is to exploit these structure-sequence relationships; two questions: 1) is the new
protein a member of a family, 2) does the family have a predicted structural fold?
- analyze the sequence for family-specific profiles and patterns (available databases: 3D-Ali,
3D-PSSM, BLOCKS, eMOTIF, INTERPRO, Pfam …)
- if the family analysis reveals that the query protein is a member of a family with a
predicted structural fold, multiple alignment can be used for structural modeling
Computational modeling
Protein family analysis/relationship to known structure
- if the family analysis is unsuccessful, proceed to structural analyses
Computational modeling
Structural analysis
- several different types of analyses to infer structural information
- the presence of small amino acid motifs in a protein can be an indicator of a biochemical
function associated with a particular structure. Motifs are available from the Prosite catalog
- spacing and arrangement of amino acids (e.g. hydrophobic amino acids) provide
important structural clues that can be used for modeling
- certain amino acid combinations can occur in certain types of secondary structure
- These structural analyses can provide clues as to the presence of active sites and regions
of secondary structure. This information can help to identify a new protein as a member
of a known structural class
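As an illustration of motif searching, the sketch below translates a simple Prosite-style pattern into a regular expression and scans a sequence for it. It handles only a subset of the pattern syntax (single residues, x, [..], {..} and repeat counts). The helper names and the test sequence are hypothetical. The pattern N-{P}-[ST]-{P} is the commonly cited N-glycosylation site consensus.

```python
import re

ELEMENT = re.compile(r"([A-Zx]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")

def prosite_to_regex(pattern):
    """Translate a simple Prosite-style pattern into a Python regex string."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        match = ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = match.groups()
        if core == "x":                      # x means any residue
            core = "."
        elif core.startswith("{"):           # {P} means any residue except P
            core = "[^" + core[1:-1] + "]"
        if low:                              # x(2) or x(2,4) repeat counts
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)

def find_motif(sequence, pattern):
    """Return (1-based position, matched text) for every hit, overlaps included."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]

# Hypothetical sequence; the N at position 7 is followed by P and must not match.
hits = find_motif("MNGSAANPTQNATL", "N-{P}-[ST]-{P}")
```

A hit only suggests a possible site. As the slides note, such motifs are indicators that still need to be interpreted in structural context.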
Computational modeling
3-D structural analysis in lab
- proteins that fail to show any relationship to proteins of known structure are candidates for
structural analyses (X-ray crystallography, NMR). There are about 600 known fold families,
and new structures are frequently found to have an already known structural fold.
Accordingly, protein families with no relatives of known structure may represent a novel fold
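The decision points of the flow chart walked through above can be summarized as a small function. This is a simplified sketch of the order of the questions only. The function name, argument names and returned labels are made up for illustration, and each True/False input stands for the outcome of an analysis that would be done with real tools.

```python
def next_step(aligns_with_pdb_structure,
              in_family_with_predicted_fold,
              structural_clues_found):
    """Return the action suggested by the structure-prediction flow chart."""
    if aligns_with_pdb_structure:
        # Significant alignment with a protein of known structure
        return "position query residues on the aligned 3-D structure"
    if in_family_with_predicted_fold:
        # Family analysis succeeded
        return "use multiple alignment for structural modeling"
    if structural_clues_found:
        # Motifs, residue spacing, secondary-structure preferences
        return "assign to a known structural class"
    # No relationship to any protein of known structure
    return "laboratory structural analysis (X-ray crystallography, NMR)"

# Example: no PDB hit, but the family has a predicted fold.
step = next_step(False, True, False)
```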
Computational modeling: summary
- Partial or full sequences predicted through gene finding
- Similarity search against proteins in PDB
- Find structures that have a significant level of structural similarity (but not
necessarily significant sequence similarity)
- Alignment can be used to position the amino acids of the query sequence in
the same approximate 3-D structure
- If member of a family with a predicted structural fold, multiple alignment can be used
for structural modeling
- Structural analyses in the lab (X-ray crystallography, NMR)
- Inferring structural information (e.g. presence of small amino acid motifs; spacing and
arrangement of amino acids; certain typical amino acid combinations associated with
certain types of secondary structure) can provide clues as to the presence of active sites and
regions of secondary structure
Computational modeling: summary
How to predict the protein structure?
Ab initio prediction of protein structure from sequence
Homology (comparative) modeling attempts to predict structure on the strength
of a protein’s sequence similarity to another protein of known structure
Experimental structure determination
Computational modeling
Viewing protein structures
A number of molecular viewers are freely available and run on most computer platforms
and operating systems
Examples:
Cn3D 4.1 (stand-alone)
Rasmol (stand-alone)
Chime (Web browser plug-in based on Rasmol)
Swiss 3D viewer Spdbv (stand-alone)
All these viewers can use the PDB identification code or the structural file from PDB
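To give an idea of what such a structural file contains, the sketch below reads the Cα atoms (the ones a backbone display connects) from text in the legacy fixed-column PDB format. It assumes the standard ATOM record layout: atom name in columns 13-16, residue name in 18-20, chain in 22, residue number in 23-26, and x, y, z in 31-54. The helper that builds sample records and all coordinates are invented for illustration and are not taken from a real PDB entry.

```python
def format_atom(serial, name, res_name, chain, res_seq, x, y, z):
    """Build a minimal fixed-width ATOM record (columns 1-54 only)."""
    return (f"ATOM  {serial:>5} {name:<4} {res_name:>3} {chain}{res_seq:>4}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")

def parse_ca_atoms(pdb_text):
    """Return residue name, chain, residue number and coordinates of each C-alpha."""
    atoms = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            atoms.append({
                "res_name": line[17:20].strip(),
                "chain": line[21],
                "res_seq": int(line[22:26]),
                "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
            })
    return atoms

# Hypothetical two-residue fragment (coordinates are made up).
SAMPLE = "\n".join([
    format_atom(1, " N", "MET", "A", 1, 27.340, 24.430, 2.614),
    format_atom(2, " CA", "MET", "A", 1, 26.266, 25.413, 2.842),
    format_atom(3, " N", "LYS", "A", 2, 26.850, 27.700, 3.900),
    format_atom(4, " CA", "LYS", "A", 2, 27.886, 29.056, 4.200),
])
backbone = parse_ca_atoms(SAMPLE)
```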