PowerPoint プレゼンテーション

Download Report

Transcript PowerPoint プレゼンテーション

Approaches to Protein Structure
Prediction & Their Applications
Dr. S. Selvaraj
Department of Bioinformatics
Bharathidasan University
Tiruchirappalli 620 024
2015/7/16
1
Introduction to proteins
one of the major biomacromolecules
structural proteins (viral coat proteins, horny outer
layer of human and animal skin)
carrying out a variety of biological functions:
enzymatic catalysis, transportation, immune response,
hormones, storage, control of genetic transcription
made up of 20 different kinds of amino acids
linked together in a long string
each string fold into a compact, unique threedimensional structure – to perform a specific function
2015/7/16
2
Copyright 1996-98 © Dale Carnegie & Associates, Inc.
Levels of Description of Structural
Complexity
• Primary Structure (AA sequence)
• Secondary Structure
– Spatial arrangement of a polypeptide’s backbone atoms without
regard to side-chain conformations - hydrogen bonding pattern
of the main chain
• , , coil, turns
– Super-Secondary Structure
• -helix hairpin, -hairpin, -- unit
• Tertiary Structure
– 3-D structure of an entire polypeptide (assembly and interaction
of the helices and sheets
• Quaternary Structure
– Spatial arrangement of subunits (2 or more polypeptide
2015/7/16
chains)
3
Different structural classes
4MBN
all-
Dominated
by -helices
4LYZ
+
Helices and
strands tend to
segregate
3CNA
1TIM
all-
/
Dominated by
-strands
Helices and
strands mix
each other
2015/7/16
4
Protein Folding
• Proteins need to maintain their tertiary structure to perform
their specific function. This structure is stabilized by many
non-covalent interactions such as electrostatic, hydrogenbonding, hydrophobic interaction etc.
• Chemical agents such as urea (8M) or guanidinium chloride
can unfold (denature) the proteins
• C.B. Anfinsen in the late 50’s discovered that some proteins
such as ribonuclease A and staphylococcal nuclease could
be reversibly denatured i.e. they spontaneously refold to
their native structures after denaturation
• This observation led Anfinsen to conclude that the amino
acid sequences contain the necessary information to encode
2015/7/16
the tertiary structure of proteins
5
Techniques to study protein
structures / Databases
•
Primary structure – protein chemistry, cDNA sequencing
•
Secondary structure – ORD/CD
•
Tertiary (3-D) structure – x-ray crystallography, NMR
•
Information obtained from sequencing studies stored in databases such as
Swiss-prot, PIR, NCBI etc
•
www.expasy.ch ; select swiss-prot
•
www-nbrf.georgetown.edu/pir
•
www.ncbi.nlm.nih.gov/Entrez
2015/7/16
6
3-D Structure Determination
• Two methods for revealing positions of atoms in 3D:
– X-Ray Crystallography
• X-ray diffraction pattern + mathematical construction
• Good resolution of diffraction needed
• First, protein crystal needed
– Nuclear Magnetic Resonance
• Small proteins only (< 250 residues)
• Inter-proton distances + geometric
constraints
2015/7/16
7
DATABASE FOR PROTEIN
STRUCTURES
• Archive, annotate and distribute sets of atomic
coordinates
• The best-established data base for biological
macromolecular structures is the Protein Data
Bank (PDB)
• Contains structures of proteins, nucleic acids
and a few carbohydrates
• web-site: www.rcsb.org/pdb
2015/7/16
8
Why so much of interest in
bioinformatics these days ?
• Human genome project – seeks to map every gene and
spell out letter by letter the thread of life
• complete DNA sequencing of more and more
organisms will answer many important questions
• how organisms evolved
• how to treat a wide range of medical disorders
2015/7/16
9
The key issue ………
• is the need to annotate the vast amount of
DNA sequence to give it meaning. This will
come from the understanding of proteins
encoded by the genes.
2015/7/16
10
Sequence-Structure Gap
• Swissprot: Release 46.1 as of 15-Feb-
2005, contains 168297entries
• PDB has about 26864 known structures of
proteins and viruses (8 February 2005)
• only about 15-20% of structures have been
determined for known protein sequences
• Can we shorten this gap using prediction
techniques?
2015/7/16
11
The problem
• Number of amino acid sequences available in
sequence database far exceeds the number of
structures known
• Structure determination a time-consuming task
• Hence efforts have been developed over the years to
i) predict structure from sequence information
ii) build models by homology
2015/7/16
12
Protein Structure Prediction
Approaches
• Secondary Structure Prediction
• Homology Modeling
• Threading
• Ab initio prediction
2015/7/16
13
Secondary Structure
Prediction Task
• Given an amino acid sequence
• Predict a secondary structure state (, , coil) for
each residue in the sequence
• Common approach in the past:
– Make prediction for a given residue by considering a
window of n (13 – 21) neighboring residues
– Learn model that performs mapping from window of
residues to secondary structure state
2015/7/16
14
Secondary Structure
Prediction Task
• Chou and Fasman (1978)
– certain residues not only have high propensity
for a particular secondary structure, but they
tend to disrupt or break other secondary
structures
• Based on analysis of known structures:
– table giving propensities of amino acid
residues for Helical and Sheet conformation
• Possible to predict secondary structure
from sequence information using such
empirically determined information
2015/7/16
15
Secondary Structure
Prediction
• Recent methods utilize evolutionary information
(e.g., PHD system – Rost & Sander, 1993)
• Main idea is to consider related sequences when
making prediction
• Why? Conservation of residues better in secondary
structural regions such as helices and strands
2015/7/16
16
Applications of secondary
structure prediction
• Identification of remote homologs
• Using predicted structure to discriminate between related
and unrelated proteins in the range of 10-30% sequence
identity (e.g. 1hbr-a and 2vhb-b vs 1ai7-a and 2vhb-a
16%id)
• Fold recognition
• Structural clustering
2015/7/16
17
Homology Modeling
• Simplest, reliable approach
• Basis: proteins with similar sequences tend
to fold into similar structures
• Has been observed that even proteins with
25% sequence identity fold into similar
structures
• Does not work for remote homologs (< 25%
pair wise identity)
2015/7/16
18
Homology Modeling
• Given:
– A query sequence Q
– A database of known protein structures
• Find protein P such that P has high
sequence similarity to Q
• Return P’s structure as an approximation to
Q’s structure
2015/7/16
19
Steps in homology modeling
• Alignment and template selection
• Generate multiple alignments
• Construct initial models
• Constructing variable side chains and main
chains (loops, insertions and deletions)
• Selecting the most native-like conformations
2015/7/16
20
Fold recognition
(sequences with no sequence identity (<= 30%)
to sequences of known structure)
•Given the sequence, and a set of folds
observed in PDB, see if any of the sequences
could adopt one the known folds.
•Takes advantage of knowledge of existing
structures, and principles by which they are
stabilized (favorable interactions).
2015/7/16
21
New sequence:
•MLDTNMKTQLKAYLEKLTKPVELIATLDDSAKS
AEIKELL…
•Library of known folds:
2015/7/16
22
2015/7/16
23
Ab Initio or De novo
Structure Prediction
• Assumption - that the native state of a protein is at the global
free energy minimum
• large-scale search of conformational space for protein tertiary
structures that are particularly low in free energy for the given
amino acid sequence.
2015/7/16
24
Rosetta
• A Particularly successful method - based on a
picture of protein folding in which short
segments of the protein chain flicker between
different local structures consistent with their
local sequence, and folding to the native state
occurs when these local segments are oriented
such that low free energy interactions are made
throughout the protein.
2015/7/16
25
2015/7/16
26
2015/7/16
27
Web-sites for secondary structure
prediction
• PHD –
http://cubic.bioc.columbia.edu/predictpro
tein
• PSI-PRED
http://insulin.brunel.ac.uk/psipred
• JPRED – http://jura.ebi.ac.uk:8888/
• NPSA – http://npsa-pbil.ibcp.fr
2015/7/16
28
Web-sites for homology modeling
• RAMP
• http://software.compbio.washington.edu/ramp
• SWISS-MODEL
www.expasy.ch/swissmod/SWISSMODEL.html
2015/7/16
29
Web-sites for fold recognition
• 3D-PSSM Protein Fold Recognition
(Threading) Server
• www.sbg.bio.ic.ac.uk/~3dpssm
• UCLA-DOE Fold Server
• fold.doe-mbi.ucla.edu/
2015/7/16
30
Critical Assessment of
Structure Prediction (CASP)
• Judging techniques for predicting structures
requires blind tests
• Initiated by J. Moult
• Crystallographers and NMR Spectroscopists in the
process of determining a protein structure are invited
to publish the amino acid sequences several months
before the expected date of completion and to keep
the results secret
• Predictors submit models • Predictions and experimental results are compared
2015/7/16
31
Applications of homology models in
drug discovery
• Steps in Drug Discovery
• Target Identification > Target Validation >
Lead Identification > Lead Optimization >
Development
• Homology Models are useful in all of the
above steps of drug discovery
2015/7/16
32
Target Identification & Validation
• Complementary nature of drug molecules
and their corresponding target proteins
help to distinguish good target proteins
from others
• Assessment of target ‘drugability’ by
analysis of ligand binding sites
2015/7/16
33
Lead Identification and Optimization
• High-throughput docking
• Design of selective or broad spectrum
Compounds
• Prediction of binding characteristics of
lead compounds
• Prediction of metabolism, toxicity and
drug-drug interactions
2015/7/16
34
Summary and Conclusions
• Brief overview of protein structure
• Methods for structure prediction
• Their accuracy and application
• Homology modelling - Threading - Ab Initio
• Application of homology modelling in the
drug discovery process
2015/7/16
35