PLPTH 890 Introduction to Genomic Bioinformatics
Lecture 20
Protein Structure Analysis - I
Liangjiang (LJ) Wang
April 8, 2005
Outline
• Basic concepts.
• How are protein structures determined?
– X-ray crystallography.
– NMR spectroscopy.
• Protein structure databases (PDB, MMDB).
• Protein structure visualization (RasMol,
Cn3D, etc).
• Protein structure classification (SCOP and
CATH).
Structural Bioinformatics
• A subdiscipline of bioinformatics that
focuses on the representation, storage,
visualization, prediction and evaluation of
structural information.
• References:
– Baxevanis and Ouellette. 2005. Bioinformatics - A
practical guide to the analysis of genes and proteins.
3rd edition. Chapter 9 and part of chapter 8.
– Pevsner. 2003. Bioinformatics and functional
genomics. Chapter 9.
– Bourne and Weissig. 2003. Structural bioinformatics.
Protein Primary Structures
• Amino acid sequence of a
polypeptide chain.
• 20 amino acids, each with a
different side chain (R).
• Peptide units are building
blocks of protein structures.
• The angle of rotation around
the N−Cα bond is called phi
(φ), and the angle around the
Cα−C′ bond from the same
Cα atom is called psi (ψ).
(Figure: peptide units with side chains labeled R; Branden and Tooze, 1998)
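The definition of phi and psi as rotations around backbone bonds can be made concrete with a small torsion-angle calculation. The sketch below is illustrative only: the helper names are hypothetical, and it assumes phi is measured over the atoms C(i-1), N(i), Cα(i), C′(i) and psi over N(i), Cα(i), C′(i), N(i+1), with the usual convention that a clockwise rotation viewed along the central bond is positive.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees (between -180 and 180) about the p1-p2 bond."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal to the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal to the plane through p1, p2, p3
    y = _dot(b2, _cross(n1, n2))
    x = math.sqrt(_dot(b2, b2)) * _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

def phi(c_prev, n, ca, c):
    """Rotation around the N-CA bond of a residue."""
    return dihedral(c_prev, n, ca, c)

def psi(n, ca, c, n_next):
    """Rotation around the CA-C' bond of the same residue."""
    return dihedral(n, ca, c, n_next)
```

With four arbitrary test points (not real protein coordinates), `dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))` gives a quarter turn of +90 degrees.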
Protein Secondary Structures
• Local substructures as a result of hydrogen
bond formation between neighboring amino
acids (backbone interactions).
• The amino acid side chains affect secondary
structure formation.
• Types of secondary structures:
– α helix,
– β sheet,
– Loop or random coil.
α Helix
• Most abundant secondary structure.
• 3.6 amino acids per turn; a hydrogen bond forms between the
backbone C=O of residue i and the N−H of residue i+4.
• Often found on the surface of proteins.
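The helix numbers above translate directly into simple arithmetic: 3.6 residues per turn means each residue is rotated 360/3.6 = 100 degrees from the previous one, and the hydrogen bonds pair residue i with residue i+4. The sketch below assumes an ideal, regular helix and 1-based residue numbering; the function names are hypothetical.

```python
RESIDUES_PER_TURN = 3.6

def helix_hbond_pairs(n_residues):
    """(i, i+4) residue pairs, 1-based, for an ideal helix of n_residues."""
    return [(i, i + 4) for i in range(1, n_residues - 3)]

def angular_position(i):
    """Angle in degrees of residue i around the helix axis (residue 1 at 0)."""
    return ((i - 1) * 360.0 / RESIDUES_PER_TURN) % 360.0

if __name__ == "__main__":
    for donor_acceptor in helix_hbond_pairs(8):
        print(donor_acceptor)
    print([round(angular_position(i)) for i in range(1, 6)])
```

Plotting the angular positions for a real sequence is the idea behind a helical wheel, which shows which side chains end up on the same face of the helix.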
β Sheet
• Hydrogen bonds form between adjacent
polypeptide strands.
• The strand directions can be the same (parallel
sheet), opposite (anti-parallel), or mixed.
Loop or Coil
• Regions between α helices and β sheets.
• Various lengths and 3-D configurations.
• Often functionally significant (e.g., part of
an active site).
(Figure: the active site of open α/β-barrel structures is in a crevice
outside the carboxy ends of the β strands; Branden and Tooze, 1998)
Protein Tertiary Structure
• The 3-D structure of a protein is assembled from
different secondary structure components.
• Tertiary structure is determined primarily by
hydrophobic interactions between side chains.
• Different classes of protein structures:
– All α: Hemoglobin (3HHB)
– All β: T cell CD8 (1CD8)
– Mixed: Thermolysin (7TLN)
Protein Tertiary Structure (Cont’d)
• Fold: a certain type of 3-D arrangement of
secondary structures.
• Protein structures evolve more slowly
than primary amino acid sequences.
(Figure: four-helix bundles, E. coli cytochrome b562 (256B) and human
growth hormone (1HUW); three-helix bundle, Drosophila engrailed
homeodomain (1ENH))
Protein Quaternary Structure
• Two or more independent tertiary structures
are assembled into a larger protein complex.
• Important for understanding protein-protein
interactions.
(Figure: horse spleen ferritin (1IES); E. coli ribosome (1ML5))
Biological Knowledge from Structures
(Bourne, 2004)
X-Ray Crystallography
• Basic steps:
Gene targets → Expression, purification → Proteins →
Crystallization → X-ray diffraction → Structure solution
• Advantages:
– High-resolution structures.
– Large protein complexes or membrane proteins.
• Disadvantages:
– Molecules in a solid-state (crystal) environment.
– Requirement for crystals.
Nuclear Magnetic Resonance (NMR)
• NMR reveals the neighborhood information of
atoms in a molecule, and the information can
be used to construct a 3-D model of the
molecule.
• Advantages:
– No requirement for crystals.
– Proteins in a liquid state (near physiological state).
• Disadvantages:
– Limited by molecule size (up to 30 kD).
– Membrane proteins may not be studied.
– Inherently less precise than X-ray crystallography.
Protein Data Bank (PDB)
• The primary repository for protein structures.
• Established in 1971 (the first bioinformatics
database, set up with 7 protein structures).
• Contained 30,179 structures as of March 22, 2005.
• Supports services for structure submission,
search, retrieval, and visualization.
• Search options:
– SearchLite: PDB ID and key word search.
– SearchFields: advanced search.
(PDB can be accessed at http://www.rcsb.org/pdb/)
PDB Content Growth
(Figure: number of structures in PDB by year, 1972 to 2005, on a scale
of 5,000 to 30,000 structures. Last updated: 06-Mar-2005)
Access to Structures through NCBI
• MMDB (Molecular Modeling Database):
– Structures obtained from PDB.
– Data in NCBI’s ASN.1 format.
– Integrated into NCBI’s Entrez system.
• Cn3D (“see in 3D”): NCBI’s 3-D protein
structure viewer.
• VAST (Vector Alignment Search Tool): for
direct comparison of 3-D protein structures.
(NCBI at http://www.ncbi.nlm.nih.gov/)
Ramachandran Plot
• Used to assess the quality of structures.
• Good structures show tight clustering patterns.
(Figure: Ramachandran plot of thioredoxin (2TRX), PSI (ψ) versus PHI (φ),
with the β sheet and α helix regions labeled; Baxevanis and Ouellette, 2005)
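One way to see how a Ramachandran plot supports quality assessment is to count how many residues fall in the α helix and β sheet regions. The sketch below uses rough rectangular boxes chosen purely for illustration; the boundaries are assumptions, not values from the lecture, and real validation software relies on empirically derived regions rather than boxes.

```python
# Coarse, illustrative (phi, psi) boxes in degrees. These boundaries are
# assumptions made for this sketch only.
ALPHA_BOX = {"phi": (-100.0, -30.0), "psi": (-80.0, -10.0)}
BETA_BOX = {"phi": (-180.0, -45.0), "psi": (90.0, 180.0)}

def _inside(box, phi, psi):
    lo_phi, hi_phi = box["phi"]
    lo_psi, hi_psi = box["psi"]
    return lo_phi <= phi <= hi_phi and lo_psi <= psi <= hi_psi

def ramachandran_region(phi, psi):
    """Label a (phi, psi) pair as 'alpha helix', 'beta sheet' or 'other'."""
    if _inside(ALPHA_BOX, phi, psi):
        return "alpha helix"
    if _inside(BETA_BOX, phi, psi):
        return "beta sheet"
    return "other"

def fraction_in_core(angles):
    """Fraction of (phi, psi) pairs that fall in either box."""
    if not angles:
        return 0.0
    hits = sum(1 for phi, psi in angles
               if ramachandran_region(phi, psi) != "other")
    return hits / len(angles)
```

A structure whose residues cluster tightly would give a fraction close to 1; many points scattered outside the boxes would lower it.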
3-D Visualization Tool - RasMol
• An open source software package, and the
most popular tool for viewing 3-D structures.
• RasMol represented a major breakthrough in
software-driven 3-D structure visualization.
• Structure file formats supported by RasMol:
– PDB file format: outdated but human-readable.
– mmCIF: a new and robust data representation,
but supported by few software tools.
• RasTop: provides a user-friendly graphical
interface to RasMol. RasTop is available at
http://www.geneinfinity.org/rastop/.
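The PDB file format is human-readable because each record is a fixed-width text line. As a minimal sketch, the code below pulls a few fields out of an ATOM record by column position, following the published PDB format column layout. The function name is hypothetical, the sample line holds illustrative values only, and only a subset of the fields is parsed.

```python
# An illustrative ATOM line, assembled field by field so the widths are explicit.
SAMPLE_ATOM = (
    "ATOM  "      # columns 1-6: record name
    "    1"       # columns 7-11: atom serial number
    " "
    " N  "        # columns 13-16: atom name
    " "           # column 17: alternate location
    "THR"         # columns 18-20: residue name
    " "
    "A"           # column 22: chain identifier
    "   1"        # columns 23-26: residue sequence number
    " "           # column 27: insertion code
    "   "
    "  17.047"    # columns 31-38: x
    "  14.099"    # columns 39-46: y
    "   3.625"    # columns 47-54: z
    "  1.00"      # columns 55-60: occupancy
    " 13.79"      # columns 61-66: temperature factor
    "          "
    " N"          # columns 77-78: element symbol
)

def parse_atom_line(line):
    """Return selected fields of an ATOM/HETATM record, or None otherwise."""
    if not line.startswith(("ATOM", "HETATM")):
        return None
    return {
        "serial": int(line[6:11]),
        "name": line[12:16].strip(),
        "res_name": line[17:20].strip(),
        "chain": line[21],
        "res_seq": int(line[22:26]),
        "x": float(line[30:38]),
        "y": float(line[38:46]),
        "z": float(line[46:54]),
        "element": line[76:78].strip(),
    }

if __name__ == "__main__":
    print(parse_atom_line(SAMPLE_ATOM))
```

Splitting on whitespace is tempting but unreliable here, since adjacent fields can run together; slicing by column is the safer habit.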
Cn3D: NCBI’s Structure Viewer
• Cn3D (“see in 3D”): allows interactive
exploration of 3-D structures, sequences
and alignments.
• Can be used to produce high-quality
molecular images.
• Limitation: only accepts structure files in
NCBI’s ASN.1 format (from MMDB).
• Cn3D is available at
http://www.ncbi.nlm.nih.gov/Structure/CN3D/cn3d.shtml.
Other 3-D Visualization Tools
• Chime: a Netscape plug-in for 3-D structure
visualization; based on RasMol source code.
• Protein Explorer (http://www.proteinexplorer.org/):
– A Chime-based software package.
– Particularly user friendly and feature-rich.
• Swiss-Pdb Viewer (Deep View, available at
http://us.expasy.org/spdbv/):
– Probably the most powerful, freely available
molecular modeling and visualization package.
– Supports homology modeling, site-directed
mutagenesis, structure superposition, etc.
Protein Structure Comparison
• Why is structure comparison important?
– To understand structure-function relationship.
– To study the evolution of many key proteins
(structure is more conserved than sequence).
• Comparing 3-D structures is much more
difficult than sequence comparison.
• Protein structure classification:
– SCOP: Structure Classification Of Proteins.
– CATH: Class, Architecture, Topology and
Homology.
• Protein structure alignment: DALI and VAST.
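Part of what makes 3-D comparison harder than sequence comparison is that coordinates depend on how each molecule is positioned in space. One common way around this is to compare internal distances instead, as the DALI method does with distance matrices. The sketch below is a simplified illustration, not the DALI or VAST algorithm: it assumes two structures with the same number of already-matched Cα atoms, and the function names are hypothetical.

```python
import math

def distance_matrix(coords):
    """All pairwise distances between points given as (x, y, z) tuples."""
    return [[math.dist(a, b) for b in coords] for a in coords]

def mean_abs_difference(m1, m2):
    """Average absolute difference between two equal-sized distance matrices."""
    n = len(m1)
    if n == 0 or n != len(m2):
        raise ValueError("matrices must be non-empty and of equal size")
    total = sum(abs(m1[i][j] - m2[i][j]) for i in range(n) for j in range(n))
    return total / (n * n)

if __name__ == "__main__":
    # Arbitrary points, and the same points shifted in space.
    original = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 12.0)]
    shifted = [(x + 10.0, y + 10.0, z + 10.0) for x, y, z in original]
    print(mean_abs_difference(distance_matrix(original),
                              distance_matrix(shifted)))
```

Because the distances inside a molecule do not change when it is moved or rotated, the two matrices agree even though the raw coordinates differ.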
SCOP
• SCOP is based on expert definition of protein
structural similarities, and is manually curated.
• Classification hierarchy:
Class → Fold → Superfamily → Family
• SCOP has 7 major classes: all α, all β, α/β, α+β,
multi-domain proteins (α and β), membrane and
cell surface proteins, and small proteins.
• Domain is the base unit of the SCOP hierarchy,
and proteins with multiple domains may
appear at different places in the hierarchy.
• SCOP at http://scop.mrc-lmb.cam.ac.uk/scop/.
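The four-level hierarchy can be modeled as a simple record per domain, which makes it easy to ask how closely two domains are related. This is a minimal sketch: the class and method names are hypothetical, and the labels in the example are placeholders rather than real SCOP entries.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ScopLineage:
    """Position of one domain in Class -> Fold -> Superfamily -> Family."""
    scop_class: str
    fold: str
    superfamily: str
    family: str

    def levels(self):
        return [self.scop_class, self.fold, self.superfamily, self.family]

    def shared_depth(self, other):
        """Number of levels, counted from Class down, that two domains share."""
        depth = 0
        for mine, theirs in zip(self.levels(), other.levels()):
            if mine != theirs:
                break
            depth += 1
        return depth

if __name__ == "__main__":
    # Placeholder labels, not real SCOP classifications.
    domain_a = ScopLineage("class-1", "fold-1", "superfamily-1", "family-1")
    domain_b = ScopLineage("class-1", "fold-1", "superfamily-2", "family-9")
    print(domain_a.shared_depth(domain_b))
```

Since the domain is the base unit, a multi-domain protein would simply carry one such record per domain, which is how it can appear at several places in the hierarchy.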
An Example of the SCOP Hierarchy
SCOP fold definition:
• Same major
secondary structures.
• Same arrangement.
• Same topology.
(Bourne, 2004)
CATH
• Classification hierarchy:
Class (C) → Architecture (A) → Topology (T)
→ Homologous superfamily (H)
• Based on secondary structure content (for C),
literature (for A), structure connectivity and
general shape (for T, using the SSAP
algorithm), and sequence similarity (for H).
• Multi-domain proteins are partitioned into their
constituent domains before classification.
• CATH at http://www.biochem.ucl.ac.uk/bsm/cath/.
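CATH entries are commonly written as dotted numbers that follow the C.A.T.H order, so each prefix of the number names one level of the hierarchy. The sketch below splits such a code into its levels. It assumes the class numbering 1 to 4 (mainly α, mainly β, mixed α and β, few secondary structures); the function name is hypothetical, and the example identifier is used only as an input string.

```python
CATH_CLASSES = {
    1: "mainly alpha",
    2: "mainly beta",
    3: "mixed alpha and beta",
    4: "few secondary structures",
}

def parse_cath_code(code):
    """Split a dotted C.A.T.H code into its four hierarchy levels."""
    parts = code.split(".")
    if len(parts) != 4 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected four numeric fields (C.A.T.H), got {code!r}")
    c, a, t, h = (int(p) for p in parts)
    return {
        "class": str(c),
        "class_name": CATH_CLASSES.get(c, "unknown"),
        "architecture": f"{c}.{a}",
        "topology": f"{c}.{a}.{t}",
        "homologous_superfamily": f"{c}.{a}.{t}.{h}",
    }

if __name__ == "__main__":
    for level, value in parse_cath_code("3.40.50.720").items():
        print(level, value)
```

Two domains whose codes share the first three fields have the same topology (fold) but belong to different homologous superfamilies.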
An Example of the CATH Hierarchy
CATH classes:
• mainly α.
• mainly β.
• mixed α and β.
• Few secondary
structures.
(Pevsner, 2003)
Summary
• Protein structures are important for
addressing many biological questions.
• Protein Data Bank (PDB) is the primary
repository for protein structures.
• Powerful software tools (e.g., RasMol) are
available for viewing 3-D protein structures.
• SCOP and CATH are two manually curated
databases for structure classification.
• Next: structure alignment and prediction.