Transcript lecture-5b

Part II: Introduction to Protein Structure
Kong Lesheng
Victor Tong Joo Chuan
National University of Singapore
1. Structure Visualization
1.1 Where can we find structure information?
1.2 Program for Visualization
1.3 Different representations of structure
1.4 Different coloring schemes
1.1 Where can we find structure information?

Protein Data Bank (PDB): maintained by the Research Collaboratory for Structural Bioinformatics (RCSB)
http://www.rcsb.org/pdb/
More than 19,000 protein structures (19-Aug-2003)
Also contains structures of protein/nucleic acid complexes, nucleic acids, and carbohydrates

Each entry in the PDB is identified by a unique 4-character code, such as 1SHA.
[Screenshots: looking up entry 1SHA on the PDB website and downloading the structure file (1SHA.pdb)]
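As a small illustration of the 4-character identifier, the Python sketch below checks a code and derives the structure file name used above. The function names are hypothetical. The rule "one digit followed by three letters or digits" is the classic PDB convention rather than something stated on the slide, and the download URL pattern is an assumption about the present-day RCSB site, not taken from the lecture.

```python
import re

# Assumed convention: a digit followed by three letters or digits (e.g. 1SHA).
_PDB_ID = re.compile(r"[0-9][A-Za-z0-9]{3}")


def is_pdb_id(code: str) -> bool:
    """Return True if `code` looks like a classic 4-character PDB identifier."""
    return _PDB_ID.fullmatch(code) is not None


def structure_file_name(code: str) -> str:
    """Return the structure file name for an entry, e.g. '1SHA.pdb'."""
    if not is_pdb_id(code):
        raise ValueError(f"not a 4-character PDB code: {code!r}")
    return f"{code.upper()}.pdb"


def download_url(code: str) -> str:
    """Build a download URL (assumed pattern, not from the lecture)."""
    return "https://files.rcsb.org/download/" + structure_file_name(code)


print(structure_file_name("1sha"))
print(download_url("1SHA"))
```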
The PDB file

The structure file 1SHA.pdb is a plain text file.

It has two parts:

HEADER

Data – 3D coordinates
PDB Header details

The header identifies the molecule, any modifications, and the date of release of the PDB entry:

HEADER    PHOSPHOTRANSFERASE                      18-AUG-92   1SHA      1SHA   2
COMPND    V-SRC TYROSINE KINASE TRANSFORMING PROTEIN (PHOSPHOTYROSINE   1SHA   3
COMPND   2 RECOGNITION DOMAIN SH2) (E.C.2.7.1.112) COMPLEX WITH         1SHA   4
COMPND   3 PHOSPHOPEPTIDE A (TYR-VAL-PRO-MET-LEU, PHOSPHORYLATED TYR)   1SHA   5
SOURCE    ROUS SARCOMA VIRUS (SCHMIDT-RUPPIN STRAIN A)                  1SHA   6
AUTHOR    G.WAKSMAN,J.KURIYAN                                           1SHA   7

The header also gives:
Organism, keywords, method
Authors, reference, resolution if X-ray structure (the smaller the number, the better the structure)
Sequence, heterogen group
The data itself

Coordinates for each heavy (non-hydrogen) atom from the first residue to the last:

ATOM      1  N   ALA A   2      40.757  22.808  12.014  1.00 61.89      1SHA  65
ATOM      2  CA  ALA A   2      39.528  23.448  12.431  1.00 59.98      1SHA  66
ATOM      3  C   ALA A   2      38.513  23.693  11.308  1.00 56.31      1SHA  67
ATOM      4  O   ALA A   2      37.607  24.536  11.413  1.00 64.00      1SHA  68
ATOM      5  CB  ALA A   2      39.882  24.777  13.140  1.00 56.35      1SHA  69
ATOM      6  N   GLU A   3      38.694  22.905  10.238  1.00 40.05      1SHA  70
-----
ATOM    878  OXT LEU B 205      61.380  28.054   2.998  1.00 62.30      1SHA 942
TER     879      LEU B 205                                              1SHA 943
Any ligands (records starting with HETATM) follow the biomacromolecule.
The O atoms of water molecules come at the end.
Usually, the resolution is not high enough to locate H atoms; hence only heavy atoms are shown in the data.
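The coordinate records are laid out in fixed columns, so they can be read by slicing each line rather than splitting on whitespace. The sketch below is a minimal reader for ATOM and HETATM lines, assuming the standard PDB column positions (serial in columns 7-11, atom name in 13-16, residue name in 18-20, chain in 22, residue number in 23-26, x/y/z in 31-54, occupancy in 55-60, temperature factor in 61-66). The `Atom` type and `parse_atom_line` function are hypothetical names introduced for illustration; the sample lines are taken from the 1SHA records shown above.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    serial: int
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float
    occupancy: float
    b_factor: float


def parse_atom_line(line: str) -> Atom:
    """Parse one ATOM/HETATM record using fixed column slices (0-based)."""
    return Atom(
        serial=int(line[6:11]),
        name=line[12:16].strip(),
        res_name=line[17:20].strip(),
        chain=line[21],
        res_seq=int(line[22:26]),
        x=float(line[30:38]),
        y=float(line[38:46]),
        z=float(line[46:54]),
        occupancy=float(line[54:60]),
        b_factor=float(line[60:66]),
    )


RECORDS = [
    "ATOM      1  N   ALA A   2      40.757  22.808  12.014  1.00 61.89",
    "ATOM      2  CA  ALA A   2      39.528  23.448  12.431  1.00 59.98",
    "ATOM    878  OXT LEU B 205      61.380  28.054   2.998  1.00 62.30",
    "TER     879      LEU B 205",
]

# Keep only coordinate records; TER and header lines are skipped.
atoms = [parse_atom_line(r) for r in RECORDS if r.startswith(("ATOM", "HETATM"))]
for atom in atoms:
    print(atom.serial, atom.name, atom.res_name, atom.chain, atom.res_seq,
          atom.x, atom.y, atom.z)
```

Slicing by column matters because adjacent fields may touch with no space between them (for example, a four-digit residue number next to the chain letter), which would break a whitespace split.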
1.2 Program for Visualization
RasMol (authored by Roger A. Sayle) is one of the most frequently used programs.
Downloadable at http://www.OpenRasMol.org/
Available for most computer systems: PC/Windows, Macintosh, Unix
Easy to operate and generates nice pictures.

Swiss PDB Viewer (authored by Nicolas Guex et al.)
Downloadable at http://tw.expasy.org/spdbv/
More complex, but provides more computational functions.
1.3 Different representations of structure

There is a variety of representation
methods for structure, which are suitable
for different purposes.




Spacefill
Ball and stick
Cartoons
Others
[Figures: 1SHA shown in spacefill, ball and stick, and cartoon representations. Protein: phosphotyrosine recognition domain SH2. Ligand: tyrosine-phosphorylated peptide.]
1.4 Different coloring schemes

By CPK
Color by atom type.

By structure
Color based on secondary structure.

By group
CPK: a conventional assignment of colors to the most commonly used element types.
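A minimal sketch of CPK coloring as a lookup table, assuming the common convention (carbon grey, nitrogen blue, oxygen red, sulfur yellow, phosphorus orange, hydrogen white). Exact shades differ between programs, so only color names are given. The helper name `cpk_color` and the "unknown" fallback are hypothetical, and the element is guessed from the first letter of the atom name, which is an assumption that holds for standard protein atoms (N, CA, CB, O, OXT) but not for every ligand atom.

```python
# Common CPK convention; shades vary by program, so names only.
CPK_COLORS = {
    "C": "grey",
    "N": "blue",
    "O": "red",
    "S": "yellow",
    "P": "orange",
    "H": "white",
}


def cpk_color(atom_name: str) -> str:
    """Return the CPK color name for a protein atom name such as 'CA'."""
    # Drop padding and any leading digits (older hydrogen names like '1HB').
    letters = atom_name.strip().lstrip("0123456789")
    element = letters[:1].upper()
    return CPK_COLORS.get(element, "unknown")


for name in ("N", "CA", "C", "O", "CB", "OXT"):
    print(name, cpk_color(name))
```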
[Figures: 1SHA colored by CPK, by group, and by structure.]
2. Protein structure prediction

2.1 Why is protein structure prediction important?

2.2 Secondary structure prediction

2.3 Tertiary structure prediction
2.1 Why protein structure prediction?

Structure determination methods
Most structures were determined by X-ray diffraction; most of the remainder by NMR (Nuclear Magnetic Resonance), and a few by EM (Electron Microscopy).
The experimental methods are both difficult and time-consuming.

The rapid growth in the number of protein sequences is far beyond the capacity of experimental structure determination methods.
SWISS-PROT (16-Aug-2003) contains 132,675 protein sequence entries, while PDB (19-Aug-2003) has 19,953 protein structures.

Structures are relatively conserved and adopt only a limited number of folds, so it is possible to model 3-D structures based on known structures.
2.2 Secondary structure prediction

Each residue in a protein structure is assigned one of three possible states: α (α-helix), β (β-strand), t (others).
Prediction maps the amino acid sequence to a secondary structure sequence.

Currently the accuracy of secondary structure prediction methods is nearly 80% (CASP4, 2000).

Secondary structure prediction can provide useful
information to improve other sequence and structure
analysis methods, such as sequence alignment and 3-D
modeling.
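Accuracy figures of this kind are usually per-residue: the fraction of residues whose predicted state matches the observed one (often called Q3 for three states). The sketch below computes that fraction for two state strings written with the α, β, t symbols used above. The function name and the example strings are made up for illustration and do not come from any real prediction.

```python
def three_state_accuracy(predicted: str, observed: str) -> float:
    """Fraction of residues whose predicted state equals the observed state."""
    if len(predicted) != len(observed):
        raise ValueError("state strings must have the same length")
    if not observed:
        raise ValueError("state strings must not be empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


# Hypothetical 14-residue example: one helix, one strand, loops elsewhere.
observed = "t" + "α" * 6 + "tt" + "β" * 4 + "t"
predicted = "t" + "α" * 5 + "ttt" + "β" * 3 + "tt"

accuracy = three_state_accuracy(predicted, observed)
print(f"{accuracy:.1%} of residues predicted correctly")
```

In this made-up example the prediction misses the last residue of the helix and the last residue of the strand, so 12 of 14 states agree.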
Secondary structure prediction methods

SAM-T02: Kevin Karplus
(http://www.cse.ucsc.edu/research/compbio/HMMapps/T02-query.html)

PSIPRED: David T. Jones
(http://bioinf.cs.ucl.ac.uk/psipred/)

PHD or PredictProtein: Rost and Sander
(http://dodo.cpmc.columbia.edu/predictprotein)
2.3 Tertiary structure prediction

Tertiary structure prediction methods can be divided into three groups: comparative modeling, threading (fold recognition), and ab initio modeling.

Currently the most accurate and reliable 3-D structure prediction method is comparative modeling, which is based on a known homologous structure.
Comparative modeling



Structure is conserved: homologous sequences are likely to have similar structures.
If the sequence identity is > 40% and a template structure is available, using comparative modeling is reasonable.
If the sequence identity is > 70%, very high quality models can be obtained.
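The identity thresholds above can be applied once the target has been aligned to a candidate template. The sketch below counts identical residues over the aligned (non-gap) positions and maps the result onto the 40% and 70% guidelines. Both function names are hypothetical, the example alignment is invented, and dividing by the number of non-gap columns is only one of several definitions of sequence identity in use.

```python
def percent_identity(aligned_target: str, aligned_template: str, gap: str = "-") -> float:
    """Percent of identical residues over columns where neither sequence has a gap."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(a, b) for a, b in zip(aligned_target, aligned_template)
             if a != gap and b != gap]
    if not pairs:
        return 0.0
    identical = sum(a == b for a, b in pairs)
    return 100.0 * identical / len(pairs)


def modeling_outlook(identity: float) -> str:
    """Map a percent identity onto the guidelines given in the lecture."""
    if identity > 70.0:
        return "very high quality model can be expected"
    if identity > 40.0:
        return "comparative modeling is reasonable"
    return "below the 40% guideline"


# Invented alignment: one gap column and one mismatch.
identity = percent_identity("ACDE-FG", "ACDQWFG")
print(f"{identity:.1f}% identity: {modeling_outlook(identity)}")
```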
Procedure for comparative modeling
Template selection
Target-template alignment
Model building
Evaluation

Useful programs and servers

Program

MODELLER: Andrej Sali
(http://www.salilab.org/modeller/modeller.html)

Server

SWISS-MODEL server: Peitsch and Guex
(http://www.expasy.ch/swissmod/SWISS-MODEL.html)
Reading materials

If you want to know more about protein structure, you can refer to the following website and book.
http://www.paccd.cc.ca.us/instadmn/physcidv/chem_dp/chemweb/protein/intro.htm
Carl Branden and John Tooze. Introduction to Protein Structure, Second Edition. Garland Publishing, Inc., 1999.

For protein structure prediction
http://www.bmm.icnet.uk/people/rob/CCP11BBS/
http://www.salilab.org/modeller/modeller.html
Summary

Introduction to protein structure: basics, levels of protein structure, structural classification.
Structure visualization: different representations and coloring schemes.
Protein structure prediction: secondary structure prediction and tertiary structure prediction.
End of Lecture