Transcript BMI 731

Protein Structures and Related Database Searches
Biology … Protein…
DNA (genotype) → protein
A single amino acid substitution in a protein (hemoglobin) causes sickle-cell disease…
What the.....!?
Why do we care about structure?
• In the factory of living cells, proteins are the workers, performing a variety of biological tasks.
• Each protein has a particular 3-D structure that determines its function.
• Protein structure is more conserved than protein sequence, and more closely related to function.
• Sequence -> Structure -> Function
Structural Information
• Protein Data Bank: maintained by the Research Collaboratory for Structural Bioinformatics (RCSB)
– http://www.rcsb.org/pdb/
– > 15,000 structures of proteins
– Also contains structures of protein/nucleic acid complexes, nucleic acids and carbohydrates
• Most structures are determined by X-ray crystallography. Other methods are NMR and electron microscopy (EM). Some structures are also theoretically predicted.
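PDB entries are distributed as text files in which each atom is one fixed-column ATOM (or HETATM) record. A minimal parser for the coordinate fields might look like this, assuming the classic PDB format column layout (it does not handle the newer mmCIF format). The class and function names are hypothetical, and the two ATOM lines are invented for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Atom:
    name: str      # atom name, e.g. "CA"
    res_name: str  # three-letter residue name, e.g. "MET"
    chain: str     # chain identifier
    res_seq: int   # residue sequence number
    x: float
    y: float
    z: float


def parse_atom_line(line: str) -> Atom:
    """Parse one fixed-column ATOM/HETATM record (1-based PDB columns -> 0-based slices)."""
    return Atom(
        name=line[12:16].strip(),      # columns 13-16
        res_name=line[17:20].strip(),  # columns 18-20
        chain=line[21],                # column 22
        res_seq=int(line[22:26]),      # columns 23-26
        x=float(line[30:38]),          # columns 31-38
        y=float(line[38:46]),          # columns 39-46
        z=float(line[46:54]),          # columns 47-54
    )


def ca_atoms(pdb_text: str) -> List[Atom]:
    """Return the C-alpha atoms found in the ATOM records of PDB-format text."""
    atoms = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM"):
            atom = parse_atom_line(line)
            if atom.name == "CA":
                atoms.append(atom)
    return atoms


# Two invented ATOM records for one methionine residue.
example = (
    "ATOM      1  N   MET A   1      38.000  20.000  28.000  1.00 43.52           N\n"
    "ATOM      2  CA  MET A   1      38.198  19.582  28.343  1.00 43.52           C\n"
)
print(ca_atoms(example))
```

Keeping only the Cα atoms is a common simplification: most structure comparison methods discussed later work on one point per residue.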
PDB Content Growth
Protein?
• Proteins are linear heteropolymers: one or more polypeptide chains
• Building blocks: 20(?) amino acid residues
• Length ranges from a few tens to thousands of residues
• The three-dimensional shapes (“folds”) they adopt vary enormously.
Structure…
Structure cont…
Basic measurements on structures…
• Bond lengths
• Bond angles
• Dihedral (torsion) angles
Bond Length
• The distance between two bonded atoms is essentially constant
• Depends on the “type” of the bond
• Varies from about 1.0 Å (C-H) to 1.5 Å (C-C)
• BOND LENGTH IS A FUNCTION OF THE POSITIONS OF TWO ATOMS.
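Since a bond length depends only on the positions of two atoms, it is simply their Euclidean distance. A minimal sketch, assuming coordinates are (x, y, z) tuples in Å; the function name is hypothetical.

```python
import math


def bond_length(a, b):
    """Euclidean distance between two atoms given as (x, y, z) tuples, e.g. in Å."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


# Two hypothetical carbon positions 1.5 Å apart along x (a typical C-C single bond).
print(bond_length((0.0, 0.0, 0.0), (1.5, 0.0, 0.0)))  # 1.5
```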
Bond Angle…
• All bond angles are determined by the chemical makeup of the atoms involved, and are essentially constant.
• Depends on the type of atom and the number of electrons available for bonding.
• Ranges from 100° to 180°
• A BOND ANGLE IS A FUNCTION OF THE POSITIONS OF THREE ATOMS.
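A bond angle A-B-C needs three positions: it is the angle between the vectors B→A and B→C, obtained from their dot product. A minimal sketch, assuming (x, y, z) tuples; the function name is hypothetical.

```python
import math


def bond_angle(a, b, c):
    """Angle A-B-C in degrees, with B the central atom."""
    ba = [p - q for p, q in zip(a, b)]
    bc = [p - q for p, q in zip(c, b)]
    dot = sum(x * y for x, y in zip(ba, bc))
    norm = math.sqrt(sum(x * x for x in ba)) * math.sqrt(sum(x * x for x in bc))
    # Clamp to [-1, 1] so rounding error cannot push the value outside acos's domain.
    cos_theta = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_theta))


# Tetrahedral geometry around a central atom at the origin.
print(bond_angle((1.0, 1.0, 1.0), (0.0, 0.0, 0.0), (1.0, -1.0, -1.0)))  # ≈ 109.47
```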
Dihedral Angles
• These are usually variable
• Range from 0° to 360° (equivalently, -180° to +180°)
• Most famous are φ, ψ, ω and χ
• DIHEDRAL ANGLES ARE A FUNCTION OF THE POSITIONS OF FOUR ATOMS.
http://www.colby.edu/chemistry/OChem/DEMOS/dihedral.html
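A dihedral (torsion) angle A-B-C-D is the angle between the plane through A, B, C and the plane through B, C, D, measured about the B-C bond. One common way to compute it uses the atan2 of two dot products, which gives a signed result in the range -180° to +180°; adding 360° to negative values (or taking the result modulo 360) gives the 0-360° convention. A minimal sketch; the helper names are hypothetical.

```python
import math


def _sub(u, v):
    return [a - b for a, b in zip(u, v)]


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _cross(u, v):
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]


def dihedral(a, b, c, d):
    """Signed torsion angle A-B-C-D in degrees, in the range (-180, 180]."""
    b1, b2, b3 = _sub(b, a), _sub(c, b), _sub(d, c)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)   # normals of planes ABC and BCD
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# B-C lies along z; D is rotated 60 degrees about that axis relative to A.
d_atom = (math.cos(math.radians(60)), math.sin(math.radians(60)), 1.0)
print(dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), d_atom))  # ≈ 60.0
```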
Dihedral Angles
A torsion angle is defined by four atoms, A, B, C and D. When A, B, C and D are main-chain atoms (i.e. the carbonyl carbon, C1; the alpha carbon, C2 or Cα; and the amide nitrogen, N), there are three repeating torsion angles along the backbone chain, called phi (φ), psi (ψ) and omega (ω).
http://bmbiris.bmb.uga.edu/wampler/tutorial/prot2.html
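The three backbone torsions correspond to fixed four-atom patterns along the chain: φ(i) uses C(i-1), N(i), Cα(i), C(i); ψ(i) uses N(i), Cα(i), C(i), N(i+1); ω uses Cα(i), C(i), N(i+1), Cα(i+1). The sketch below only enumerates those quadruples for a chain of n residues numbered from 0; passing each quadruple's coordinates to a dihedral-angle routine would give the angles. Conventions differ on whether ω is assigned to residue i or i+1 (here: i), and the function and type names are hypothetical.

```python
from typing import Dict, List, Tuple

# An atom is identified by (residue index, atom name); residues are numbered from 0.
AtomRef = Tuple[int, str]
Quad = Tuple[AtomRef, AtomRef, AtomRef, AtomRef]


def backbone_torsion_atoms(n_residues: int) -> Dict[str, List[Quad]]:
    """List the four atoms defining each backbone torsion angle of a chain."""
    torsions: Dict[str, List[Quad]] = {"phi": [], "psi": [], "omega": []}
    for i in range(n_residues):
        if i > 0:
            # phi(i): rotation about the N(i)-CA(i) bond; undefined for the first residue
            torsions["phi"].append(((i - 1, "C"), (i, "N"), (i, "CA"), (i, "C")))
        if i < n_residues - 1:
            # psi(i): rotation about the CA(i)-C(i) bond; undefined for the last residue
            torsions["psi"].append(((i, "N"), (i, "CA"), (i, "C"), (i + 1, "N")))
            # omega: rotation about the peptide bond C(i)-N(i+1)
            torsions["omega"].append(((i, "CA"), (i, "C"), (i + 1, "N"), (i + 1, "CA")))
    return torsions


for name, quads in backbone_torsion_atoms(3).items():
    print(name, quads)
```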
Ramachandran / phi-psi plot
http://www.biochem.ucl.ac.uk/~roman/procheck/manual/examples/plot_01.html
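A Ramachandran plot places each residue at its (φ, ψ) pair, and helices and strands fall in characteristic areas of the plot. The sketch below labels a (φ, ψ) pair with three rectangular boxes. The box limits are rough assumptions chosen for illustration only; programs such as PROCHECK derive their regions from the observed distribution of high-resolution structures, and those regions are not rectangles.

```python
# Illustrative (phi, psi) boxes in degrees. These bounds are rough assumptions,
# not the regions used by PROCHECK or any published Ramachandran analysis.
REGIONS = [
    ("alpha_R", (-160.0, -20.0), (-120.0, 50.0)),  # right-handed helix
    ("beta", (-180.0, -45.0), (90.0, 180.0)),      # extended strand / sheet
    ("alpha_L", (30.0, 100.0), (0.0, 100.0)),      # left-handed helix
]


def normalize(angle):
    """Map any angle in degrees onto (-180, 180]."""
    a = (angle + 180.0) % 360.0 - 180.0
    return 180.0 if a == -180.0 else a


def ramachandran_region(phi, psi):
    """Return the name of the first box containing (phi, psi), else 'other'."""
    phi, psi = normalize(phi), normalize(psi)
    for name, (phi_lo, phi_hi), (psi_lo, psi_hi) in REGIONS:
        if phi_lo <= phi <= phi_hi and psi_lo <= psi <= psi_hi:
            return name
    return "other"


print(ramachandran_region(-60.0, -45.0))   # alpha_R
print(ramachandran_region(-120.0, 130.0))  # beta
```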
Levels of Structure…
1. Primary structure
2. Secondary structure
3. Tertiary structure
4. Quaternary structure
Primary structure…
• This is simply the amino acid sequence of the polypeptide chain(s)
Secondary structure
• Local organization of protein backbone: -
helix, -strand (which assemble into -sheet),
turn and interconnecting loop.
The α-helix
• One of the most closely packed arrangements of residues.
• 3.6 residues per turn
• Pitch: 5.4 Å per turn
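The two numbers above fix the rest of the helix geometry: the rise per residue is 5.4 / 3.6 = 1.5 Å along the axis, each residue turns the helix by 360° / 3.6 = 100°, and an 18-residue helix makes 5 turns and spans roughly 27 Å. A small sketch of that arithmetic; the function name is hypothetical and the length is the simple approximation n × rise.

```python
RESIDUES_PER_TURN = 3.6  # residues per turn of an alpha-helix
PITCH = 5.4              # Å of helix axis per turn


def helix_geometry(n_residues):
    """Return (rise per residue, twist per residue, number of turns, axial length)."""
    rise = PITCH / RESIDUES_PER_TURN    # Å per residue along the axis
    twist = 360.0 / RESIDUES_PER_TURN   # degrees of rotation per residue
    turns = n_residues / RESIDUES_PER_TURN
    length = n_residues * rise          # approximate axial length in Å
    return rise, twist, turns, length


print(helix_geometry(18))  # about (1.5, 100.0, 5.0, 27.0), up to floating-point rounding
```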
The β-sheet
• Backbone almost fully extended; loosely packed arrangement of residues.
Ramachandran/phi-psi plot
Tertiary structure…
• Packing of the secondary structure elements into a compact spatial unit
• “Fold” or domain: this is the level to which structure prediction is currently possible.
Quaternary structure…
• Assembly of homo- or heteromeric protein chains.
• Usually the functional unit of a protein, especially for enzymes
Classification…
• Class
• Fold/Architecture
• Superfamily
Databases of structural classification
• SCOP
– Murzin AG, Brenner SE, Hubbard T, Chothia C
– Structural Classification of Proteins
– Manual assembly by inspection
– All nodes are annotated (e.g. all-α, α/β)
– Structural similarity search using 3dSearch (Singh and Brutlag)
• CATH
– Dr. C.A. Orengo, Dr. A.D. Michie, etc.
– Class-Architecture-Topology-Homologous superfamily
– Manual classification at the Architecture level
– Automated topology classification using the SSAP algorithm
– No structural similarity search
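CATH labels each domain's place in its hierarchy with a four-part numeric code (Class.Architecture.Topology.Homologous superfamily), so the relationship between two domains can be read off as the number of leading levels they share. A minimal sketch; the function names are hypothetical and the example codes are chosen only to illustrate the comparison.

```python
from typing import NamedTuple

# The four main CATH classes (top level of the hierarchy).
CATH_CLASSES = {1: "Mainly alpha", 2: "Mainly beta", 3: "Alpha beta", 4: "Few secondary structures"}


class CathCode(NamedTuple):
    class_: int
    architecture: int
    topology: int
    homologous_superfamily: int


def parse_cath_code(code: str) -> CathCode:
    """Split a code such as '3.40.50.300' into its four hierarchy levels."""
    parts = code.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected C.A.T.H, got {code!r}")
    return CathCode(*(int(p) for p in parts))


def shared_levels(a: CathCode, b: CathCode) -> int:
    """Number of leading hierarchy levels two domains have in common (0-4)."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count


a = parse_cath_code("3.40.50.300")
b = parse_cath_code("3.40.50.720")
print(CATH_CLASSES[a.class_], shared_levels(a, b))  # Alpha beta 3
```

Three shared levels means the two domains have the same class, architecture and topology (fold) but sit in different homologous superfamilies.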
• FSSP
– L.L. Holm and C. Sander
– Fully automated using the DALI algorithm (Holm and Sander)
– No internal node annotations
– Structural similarity search using DALI
• Pclass
– A. Singh, X. Liu, J. Chang, D. Brutlag
– Fully automated using the LOCK and 3dSearch algorithms
– All internal nodes automatically annotated with common terms
– Java-based classification browser
– Structural similarity search using 3dSearch
Why Structure Alignment?
• For homologous proteins (common ancestry), this provides the “gold standard” for sequence alignment: it elucidates the common ancestry of the proteins.
• For nonhomologous proteins, allows us to
identify common substructures of interest.
• Allows us to classify proteins into clusters, based
on structural similarity.
How do we recognize structural similarities?
• By eye (Alexei Murzin)
– SCOP: gold standard for structure classification!
• Algorithmically
– Growth of the PDB demands automated techniques for classification and fold detection
Algorithms for Structure Alignment
• Distance based methods
– DALI (Holm and Sander): Aligning scalar distance plots
– STRUCTAL (Gerstein and Levitt): Dynamic programming using pairwise inter-molecular distances
– SSAP (Orengo and Taylor): Dynamic programming using intra-molecular vector distances
• Vector based methods
– VAST (Bryant): Graph-theory-based secondary structure alignment
– 3dSearch (Singh and Brutlag): Fast secondary structure index lookup
• Both vector and distance based
– LOCK (Singh and Brutlag): Hierarchically uses both secondary structure vectors and atomic distances
DALI
• Based on aligning 2-D intra-molecular distance matrices
• Computes the best subset of corresponding residues from the two proteins such that the similarity between the 2-D distance matrices is maximized.
• Searches the space of possible residue alignments using a Monte Carlo algorithm
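The core idea, comparing two proteins through their intra-molecular distance matrices, can be sketched as follows. Because the matrices contain only internal distances, no superposition of the two structures is needed. The scoring function is a simplified version modeled on the elastic similarity score described by Holm and Sander; the constants θ = 0.2 and α = 20 Å, and all function names, are assumptions of this sketch. The Monte Carlo search over alignments is omitted, so the residue correspondence `pairs` is supplied by hand.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def distance_matrix(coords: Sequence[Coord]) -> List[List[float]]:
    """Intra-molecular distance matrix, e.g. between all C-alpha atoms of one protein."""
    return [[math.dist(p, q) for q in coords] for p in coords]


def elastic_score(dm_a, dm_b, pairs, theta=0.2, alpha=20.0):
    """Similarity of two distance matrices under a residue correspondence.

    pairs: list of (i, j), meaning residue i of protein A is matched to residue j of B.
    """
    score = 0.0
    for i, j in pairs:
        for k, l in pairs:
            if i == k:
                score += theta  # self term: constant bonus per matched residue
                continue
            d_a, d_b = dm_a[i][k], dm_b[j][l]
            d_mean = (d_a + d_b) / 2.0
            deviation = abs(d_a - d_b) / d_mean       # relative distance mismatch
            weight = math.exp(-(d_mean / alpha) ** 2)  # down-weights long distances
            score += (theta - deviation) * weight
    return score


square = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]
moved = [(x + 10.0, y, z) for x, y, z in square]        # rigidly translated copy
distorted = square[:3] + [(0.0, 10.0, 0.0)]             # last atom displaced
identity = [(i, i) for i in range(4)]
print(elastic_score(distance_matrix(square), distance_matrix(moved), identity))
print(elastic_score(distance_matrix(square), distance_matrix(distorted), identity))  # lower
```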
VAST: Vector Alignment Search Tool
• Aligns only secondary structure elements (SSEs)
• Represents each SSE as a vector
• Finds all possible pairs of vectors from the two structures that are similar
• Uses a graph theory algorithm to find the maximal subset of similar vectors
• The overall alignment score is based on the number of similar pairs of vectors between the two structures.
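A toy version of the graph-theoretic step: each node pairs an SSE of protein A with an SSE of the same type in protein B; two nodes are joined by an edge when the angle between the two SSE vectors in A matches (within a tolerance) the angle between their partners in B; the largest clique, found here with the Bron-Kerbosch algorithm, is the largest mutually consistent set of SSE pairs. Using only inter-vector angles, the 15° tolerance and all names are assumptions of this sketch; VAST's actual compatibility criteria and scoring are more detailed.

```python
import math
from itertools import combinations
from typing import Dict, Set, Tuple

Node = Tuple[int, int]  # (index of SSE in A, index of SSE in B)


def angle_between(u, v) -> float:
    """Angle in degrees between two direction vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norms))))


def compatibility_graph(sses_a, sses_b, tol=15.0) -> Dict[Node, Set[Node]]:
    """Nodes pair same-type SSEs; edges join pairs whose inter-vector angles agree."""
    nodes = [(i, j) for i, (ta, _) in enumerate(sses_a)
             for j, (tb, _) in enumerate(sses_b) if ta == tb]
    adj: Dict[Node, Set[Node]] = {n: set() for n in nodes}
    for (i, j), (k, l) in combinations(nodes, 2):
        if i == k or j == l:
            continue  # an SSE may be matched only once
        angle_a = angle_between(sses_a[i][1], sses_a[k][1])
        angle_b = angle_between(sses_b[j][1], sses_b[l][1])
        if abs(angle_a - angle_b) <= tol:
            adj[(i, j)].add((k, l))
            adj[(k, l)].add((i, j))
    return adj


def max_clique(adj: Dict[Node, Set[Node]]) -> Set[Node]:
    """Largest clique, found by enumerating maximal cliques (Bron-Kerbosch)."""
    best: Set[Node] = set()

    def expand(r: Set[Node], p: Set[Node], x: Set[Node]) -> None:
        nonlocal best
        if not p and not x:  # r is a maximal clique
            if len(r) > len(best):
                best = set(r)
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return best


# Each SSE: (type, direction vector); "H" = helix, "E" = strand. Toy data:
# protein_b is protein_a rotated by 90 degrees about the z axis.
protein_a = [("H", (1.0, 0.0, 0.0)), ("H", (1.0, 1.0, 0.0)), ("E", (0.0, 0.0, 1.0))]
protein_b = [("H", (0.0, 1.0, 0.0)), ("H", (-1.0, 1.0, 0.0)), ("E", (0.0, 0.0, 1.0))]
aligned = max_clique(compatibility_graph(protein_a, protein_b))
print(len(aligned))  # 3
```

Maximum clique is NP-hard in general, which is tolerable here only because proteins have few SSEs compared with residues.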
LOCK
• Define local secondary structures
• Find an initial superposition by using dynamic programming (DP) to align secondary structure vectors.
• Use a greedy algorithm to find nearest neighbors and minimize the RMSD between the Cα atoms of the query and target.
• Find the core of aligned Cα atoms and minimize the RMSD between them.
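The greedy nearest-neighbor step and the RMSD it tries to minimize can be sketched as below, assuming the two Cα sets have already been superposed. The 3 Å cutoff and the function names are assumptions; the optimal rotation between rounds (for example with the Kabsch method) and the iteration that LOCK performs are omitted.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def greedy_nearest_pairs(query: Sequence[Coord], target: Sequence[Coord],
                         cutoff: float = 3.0) -> List[Tuple[int, int]]:
    """Pair already-superposed C-alpha atoms greedily, closest pairs first."""
    candidates = sorted(
        (math.dist(q, t), i, j)
        for i, q in enumerate(query)
        for j, t in enumerate(target)
    )
    used_q, used_t, pairs = set(), set(), []
    for d, i, j in candidates:
        if d > cutoff:
            break  # all remaining candidates are even farther apart
        if i in used_q or j in used_t:
            continue  # each atom is matched at most once
        used_q.add(i)
        used_t.add(j)
        pairs.append((i, j))
    return sorted(pairs)


def rmsd(query, target, pairs) -> float:
    """Root-mean-square deviation over the matched atom pairs (no re-superposition)."""
    total = sum(math.dist(query[i], target[j]) ** 2 for i, j in pairs)
    return math.sqrt(total / len(pairs))


query = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
target = [(0.5, 0.0, 0.0), (3.8, 0.5, 0.0), (20.0, 0.0, 0.0)]
pairs = greedy_nearest_pairs(query, target)
print(pairs, rmsd(query, target, pairs))  # [(0, 0), (1, 1)] 0.5
```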
Where is the data?
• GenBank
– DBs are equivalent: NCBI Reference Sequences (RefSeq), GenPept Database
– http://www.ncbi.nlm.nih.gov/LocusLink/refseq.html
– http://inn.weizmann.ac.il/databanks/genpept.html
• Swiss-Prot: http://www.expasy.org/sprot/
– Statistics: http://www.expasy.org/sprot/relnotes/relstat.html
• NCBI Entrez Protein: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Protein
• PDB: http://www.rcsb.org/pdb/
• PIR International Protein Sequence Database: http://pir.georgetown.edu/pirwww/search/textpsd.shtml
• MMDB by NCBI…
A flow chart for structure prediction
[Figure: flow chart leading from a protein sequence to a predicted three-dimensional structure. Steps: database similarity search; "Does sequence align with protein of known 3D structure?" (yes/no); 3D comparative modeling; protein family, domain, cluster analysis; "Relationship to known structure?"; structural analysis; "Is there a predicted structure?"; 3D analysis in laboratory.]
Images…
• 3-dimensional model showing the electron density in a molecule of buckminsterfullerene, an allotrope of carbon (C60).
Images…
Computer-generated image showing the 3-D structure of uteroglobin, a protein secreted in the uterus of mammals.
Images… (NMR… EPR…)
A computer image of the charge density over the chymosin molecule, an important enzyme in cheese making. Overall negative charge is depicted in red, overall positive charge in blue.
X-ray crystallography.
Thanks
Thanks to Selnur Erdal for preparing initial versions of these slides.