מצגת של PowerPoint - Tel Aviv University

Download Report

Transcript מצגת של PowerPoint - Tel Aviv University

Structural Bioinformatics
Workshop
•Max Shatsky
•Email: [email protected]
Workshop home page:
http://bioinfo3d.cs.tau.ac.il/Education/Workshop/
Schedule
•Introduction to protein structure.
•Introduction to pattern matching.
•Protein structure alignment (comparison).
•Protein Docking
•GAMB++ library.
Grade Ingredients
Presentation and Design Review
Final Project
–Software Engineering
–Efficiency of Solution
–Working Examples and Test Cases
–Documentation
–Knowledge of all project aspects
Bioinformatics - Computational
Genomics
DNA mapping.
Protein or DNA sequence comparisons.
Exploration of huge textual databases.
In essence one- dimensional methods and
intuition.
Structural Bioinformatics Structural Genomics
Elucidation of the 3D structures of
biomolecules.
Analysis and comparison of biomolecular
structures.
Prediction of biomolecular recognition.
Handles three-dimensional (3-D) structures.
Geometric Computing. (a methodology shared by
Computational Geometry, Computer Vision, Computer Graphics,
Pattern Recognition etc.)
Protein Structural Comparison
ApoAmicyanin - 1aaj
Pseudoazurin - 1pmy
Algorithmic Solution
About 1 sec. Fischer, Nussinov, Wolfson ~ 1990.
Introduction to Protein Structure
The central dogma
DNA
--->
{A,C,G,T}
Guanine-Cytosine
mRNA
{A,C,G,U}
--->
Protein
{A,D,..Y}
T->U
Thymine-Adenine
4 letter alphabets
20 letter alphabet
Sequence of nucleic acids
seq of amino acids
When genes are expressed, the genetic information (base sequence) on DNA is first
transcribed (copied) to a molecule of messenger RNA in a process similar to DNA replication.
The mRNA molecules then leave the cell nucleus and enter the cytoplasm, where triplets of
(codons) forming the genetic code specify the particular amino acids that make up an ( bases
individual protein.
This process, called translation, is accomplished by ribosomes (cellular components composed
of proteins and another class of RNA) that read the genetic code from the mRNA, and
transfer RNAs (tRNAs) that transport amino acids to the ribosomes for attachment to the
growing protein. (From www.ornl.gov/hgmis/publicat/primer/ )
Amino acids and the peptide bond
Cα atoms
Cb – first side chain carbon (except for glycine).
Wire-frame or ribbons display
Geometric Representation
3-D Curve
{vi}, i=1…n
Secondary structure
b strands and sheets
Hydrogen bonds.
The Holy Grail - Protein Folding
From Sequence to Structure.
Relatively primitive computational
folding models have proved to be NP
hard even in the 2-D case.
Determination of protein structures
X-ray Crystallography
NMR (Nuclear Magnetic Resonance)
EM (Electron microscopy)
An NMR result is an ensemble of models
Cystatin (1a67)
The Protein Data Bank (PDB)
International repository of 3D molecular
data.
Contains x-y-z coordinates of all atoms of
the molecule and additional data.
http://pdb.tau.ac.il
http://www.rcsb.org/pdb/
Why bother with structures
when we have sequences ?
In evolutionary related proteins
structure is much better preserved
than sequence.
Structural motifs may predict similar
biological function .
Getting insight into protein folding.
Recovering the limited (?) number of protein
folds.
Applications
Classification of protein databases by
structure.
Search of partial and disconnected
structural patterns in large databases.
Extracting Structure information is
difficult, we want to extract “new” folds.
Applications (continued)
Speed up of drug discovery.
Detection of structural pharmacophores
in an ensemble of drugs (similar
substructures in drugs acting on a given
receptor – pharmacophore).
Comparison and detection of drug
receptor active sites (structurally similar
receptor cavities could bind similar
drugs).
Object Recognition
Model Database
Scene
Recognition
Lamdan, Schwartz, Wolfson, “Geometric Hashing”,1988.
Protein Alignment =
Geometric Pattern Discovery
Protein Alignment
• The superimposition pattern is not known apriori – pattern discovery .
• The matching recovered can be inexact.
• We are looking not necessarily for the
largest superimposition, since other
matchings may have biological meaning.
Geometric Task :
Given two configurations of points in the
three dimensional space,
T
find those rotations and translations of one of the
point sets which produce “large” superimpositions
of corresponding 3-D points.
Geometric Task (continued)
Aspects:
•Object representation (points, vectors, segments)
•Object resemblance (distance function)
•Transformation (translations, rotations, scaling)
-> Optimization technique
Transformations
Translation
  
x  x t
Translation and Rotation

Rigid Motion (Euclidian Trans.)


 
x  R  x  Ux  t
Translation, Rotation + Scaling


 
x  Tx  s(Ux  t )
Inexact Alignment.
Simple case – two closely related proteins with the same number of
amino acids.
T
Question: how to measure
alignment error?
Superposition - best least squares
(RMSD – Root Mean Square Deviation)
Given two sets of 3-D points :
P={pi}, Q={qi} , i=1,…,n;
rmsd(P,Q) = √
S i|pi - qi |2 /n
Find a 3-D rigid transformation T* such that:
rmsd( T*(P), Q ) = minT
√ S i|T*pi - qi |2 /n
A closed form solution exists for this task.
It can be computed in O(n) time.
Problem statement with RMSD metric.
Given two configurations of points in the
three dimensional space, and ε threshold
T
find the largest alignment, a set of matched
elements and transformation, with RMSD less
than ε.
(belong to NP, is it in NPC?)
Distance Functions
Two point sets: A={ai} i=1…n
B={bj} j=1…m
• Pairwise Correspondence:
(ak1,bt1) (ak2,bt2)… (akN,btN)
(1) Exact Matching: ||aki – bti||=0
(2) RMSD (Root Mean Square Distance)
Sqrt( Σ||aki – bti||2/N) < ε
(3) Bottleneck max ||aki – bti||
•
• Hausdorff distance: h(A,B)=maxaєA minbєB ||a– b||
H(A,B)=max( h(A,B), h(B,A))