PowerPoint presentation - Tel Aviv University


Structural Bioinformatics
Seminar
•Dina Schneidman
•Email: [email protected]
Outline
• Seminar requirements
• Biological Introduction
• How to prepare a seminar lecture?

Seminar Requirements

• No prior knowledge in Biology is assumed or required!
• Attend ALL lectures
• Prepare one of the lectures
Seminar Goals

• Learn how to study a new subject from articles
• Learn how to present work in Computer Science
Biological Introduction
Schedule
• Introduction to molecular structure.
• Introduction to pattern matching.
• Introduction to protein structure alignment (comparison).
• Protein docking.

Small Ligands
• Small organic molecules, composed of tens of atoms.
• Highly flexible: can have many torsional degrees of freedom.

DNA – The code of life
• DNA is a polymer.
• The monomer units of DNA are nucleotides: A, T, C, G.
• DNA is normally a double-stranded macromolecule.

RNA
• RNA is a polymer too.
• The monomer units of RNA are nucleotides: A, U (instead of T), C, G.
• DNA serves as the template for the synthesis of RNA.

Protein
• A protein is a polymer too.
• The monomer units of a protein are the 20 amino acids.
• Each amino acid is encoded by 3 RNA nucleotides (a codon).

Hemoglobin sequence:
VHLTPEEKSAVTALWGKVNVDEVGGEAL
GRLLVVYPWTQRFFESFGDLSTPDAVMG
NPKVKAHGKKVLGA
FSDGLAHLDNLKGTFATLSELHXDKLHVD
PENFRLLGNVLVCVLAHHFGKEFTPPVQ
AAYQKVVAGVANALAHKYH
The Central Dogma
[Figure: Gene (DNA) -> Transcription -> mRNA -> Translation -> Protein -> Symptoms (Phenotype)]
Cells express different subsets of the genes in different tissues and under different conditions.
The central dogma
DNA {A,C,G,T} ---> mRNA {A,C,G,U} ---> Protein {A,D,..Y}
DNA to mRNA: T->U
Base pairs: Guanine-Cytosine, Thymine-Adenine
DNA, mRNA: 4 letter alphabets, sequence of nucleic acids
Protein: 20 letter alphabet, sequence of amino acids
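As a rough illustration of the two arrows above, the following sketch transcribes a DNA string to mRNA (T->U) and translates it three nucleotides at a time. The function names, the input string, and the deliberately partial codon table are assumptions made for this example; the DNA string was chosen so that the result matches the first residues of the hemoglobin sequence shown earlier, and it is not claimed to be the actual gene sequence.

```python
# Partial codon table from the standard genetic code (only the codons used
# in this example; "*" marks a stop codon). A full table has 64 entries.
CODON_TABLE = {
    "GUG": "V", "CAC": "H", "CUG": "L", "ACU": "T",
    "CCU": "P", "GAG": "E", "AAG": "K", "UAA": "*",
}


def transcribe(dna: str) -> str:
    """Coding-strand DNA -> mRNA: replace every T with U."""
    return dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Read the mRNA codon by codon (3 nucleotides each) until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Illustrative input (an assumption, not taken from the slides).
dna = "GTGCACCTGACTCCTGAGGAGAAGTAA"
mrna = transcribe(dna)
protein = translate(mrna)
print(mrna)
print(protein)
```

The 4 letter alphabet is mapped to the 20 letter alphabet purely by table lookup, which is why sequence-level methods are described as one-dimensional.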
Bioinformatics - Computational Genomics
• DNA mapping.
• Protein or DNA sequence comparisons.
• Exploration of huge textual databases.
• In essence, one-dimensional methods and intuition.
Structural Bioinformatics - Structural Genomics
• Elucidation of the 3D structures of biomolecules.
• Analysis and comparison of biomolecular structures.
• Prediction of biomolecular recognition.
• Handles three-dimensional (3-D) structures.
• Geometric Computing (a methodology shared by Computational Geometry, Computer Vision, Computer Graphics, Pattern Recognition, etc.).
Protein Structural Comparison
ApoAmicyanin - 1aaj
Pseudoazurin - 1pmy
Algorithmic Solution
About 1 sec. Fischer, Nussinov, Wolfson ~ 1990.
Introduction to Protein Structure
Amino acids and the peptide bond
Cα atoms
Cβ – first side chain carbon (except for glycine).
Backbone or Secondary
structure display
Wire-frame or ribbons display
Spacefill model
Geometric Representation
3-D Curve
{vi}, i=1…n
Secondary structure
β strands and sheets
Hydrogen bonds.
The Holy Grail - Protein Folding
• From Sequence to Structure.
• Relatively primitive computational folding models have proved to be NP-hard even in the 2-D case.

Determination of protein structures
• X-ray Crystallography
• NMR (Nuclear Magnetic Resonance)
• EM (Electron microscopy)

An NMR result is an ensemble of models
Cystatin (1a67)
The Protein Data Bank (PDB)

• International repository of 3D molecular data.
• Contains x-y-z coordinates of all atoms of the molecule and additional data.
• http://pdb.tau.ac.il
• http://www.rcsb.org/pdb/
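To make the "x-y-z coordinates of all atoms" concrete, here is a minimal sketch that pulls the Cα coordinates out of PDB-format ATOM records, giving the 3-D curve {vi} used as the geometric representation. It relies on the fixed-column layout of ATOM records (atom name in columns 13-16, x, y, z in columns 31-54). The helper `atom_line` and the coordinates are hypothetical and exist only to build a synthetic input; they are not taken from any real PDB entry.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def ca_coordinates(pdb_text: str) -> List[Point]:
    """Return (x, y, z) for every C-alpha atom found in ATOM records."""
    coords = []
    for line in pdb_text.splitlines():
        # Columns 13-16 hold the atom name; slices below are 0-based.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            x = float(line[30:38])  # columns 31-38
            y = float(line[38:46])  # columns 39-46
            z = float(line[46:54])  # columns 47-54
            coords.append((x, y, z))
    return coords


def atom_line(serial: int, name: str, res: str, chain: str,
              resseq: int, x: float, y: float, z: float) -> str:
    """Hypothetical helper: format one synthetic ATOM record (name is 4 chars)."""
    return (f"ATOM  {serial:5d} {name:<4s} {res:>3s} {chain}{resseq:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")


# Synthetic two-residue fragment (made-up coordinates).
pdb_text = "\n".join([
    atom_line(1, " N  ", "VAL", "A", 1, 0.0, 0.0, 0.0),
    atom_line(2, " CA ", "VAL", "A", 1, 1.0, 2.0, 3.0),
    atom_line(3, " N  ", "HIS", "A", 2, 3.0, 3.0, 3.0),
    atom_line(4, " CA ", "HIS", "A", 2, 4.5, 5.5, 6.5),
])
coords = ca_coordinates(pdb_text)
print(coords)
```

Real files also contain HETATM records, alternate locations and multiple models (for example NMR ensembles), which this sketch ignores.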
Why bother with structures when we have sequences?
• In evolutionarily related proteins, structure is much better preserved than sequence.
• Structural motifs may predict similar biological function.
• Getting insight into protein folding. Recovering the limited (?) number of protein folds.

Applications

• Classification of protein databases by structure.
• Search of partial and disconnected structural patterns in large databases.
• Extracting structure information is difficult; we want to extract “new” folds.
Applications (continued)

• Speed up of drug discovery.
• Detection of structural pharmacophores in an ensemble of drugs (similar substructures in drugs acting on a given receptor – pharmacophore).
• Comparison and detection of drug receptor active sites (structurally similar receptor cavities could bind similar drugs).
Object Recognition
Model Database
Scene
Recognition
Lamdan, Schwartz, Wolfson, “Geometric Hashing”,1988.
Protein Alignment =
Geometric Pattern Discovery
Protein Alignment
• The superimposition pattern is not known a priori – pattern discovery.
• The matching recovered can be inexact.
• We are not necessarily looking for the largest superimposition, since other matchings may have biological meaning.
Geometric Task:
Given two configurations of points in three-dimensional space, find those rotations and translations of one of the point sets which produce “large” superimpositions of corresponding 3-D points.
Geometric Task (continued)
Aspects:
•Object representation (points, vectors,
segments)
•Object resemblance (distance function)
•Transformation (translations, rotations,
scaling)
-> Optimization technique
Transformations
Translation
$\vec{x} \to \vec{x} + \vec{t}$
Translation and Rotation
Rigid Motion (Euclidean Trans.)
$\vec{x} \to R(\vec{x}) = U\vec{x} + \vec{t}$
Translation, Rotation + Scaling
$\vec{x} \to T(\vec{x}) = s(U\vec{x} + \vec{t})$
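The three transformation classes can be sketched with NumPy as follows. This is only an illustration under stated assumptions: points are stored as rows of an (n, 3) array, U is a 3x3 rotation matrix, t a translation vector and s a scalar scale factor; the function names and the sample values are hypothetical.

```python
import numpy as np


def rotation_z(theta: float) -> np.ndarray:
    """Rotation matrix U for a rotation by theta radians about the z axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0],
                     [s, c, 0.0],
                     [0.0, 0.0, 1.0]])


def translate(points: np.ndarray, t: np.ndarray) -> np.ndarray:
    """x -> x + t, applied to every row of points."""
    return points + t


def rigid_motion(points: np.ndarray, U: np.ndarray, t: np.ndarray) -> np.ndarray:
    """x -> U x + t, applied to every row (rows are points, hence U.T)."""
    return points @ U.T + t


def similarity(points: np.ndarray, U: np.ndarray, t: np.ndarray, s: float) -> np.ndarray:
    """x -> s (U x + t): rigid motion followed by uniform scaling."""
    return s * rigid_motion(points, U, t)


# Sample values (assumptions for the example).
points = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])
U = rotation_z(np.pi / 2)
t = np.array([1.0, 2.0, 3.0])

moved = rigid_motion(points, U, t)
scaled = similarity(points, U, t, 2.0)
print(moved)
print(scaled)
```

A rigid motion preserves all pairwise distances, which is why it is the natural transformation class for comparing molecular structures; scaling does not.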
Inexact Alignment.
Simple case – two closely related proteins with the same number of amino acids.
Question: how to measure alignment error?
Superposition - best least squares
(RMSD – Root Mean Square Deviation)
Given two sets of 3-D points:
$P = \{p_i\}$, $Q = \{q_i\}$, $i = 1, \dots, n$;
$\mathrm{rmsd}(P, Q) = \sqrt{\frac{1}{n} \sum_i |p_i - q_i|^2}$
Find a 3-D rigid transformation $T^*$ such that:
$\mathrm{rmsd}(T^*(P), Q) = \min_T \sqrt{\frac{1}{n} \sum_i |T p_i - q_i|^2}$
A closed form solution exists for this task.
It can be computed in O(n) time.
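The slides do not say which closed form is meant. One well-known option is the SVD-based (Kabsch-style) procedure, sketched below as an assumption: both point sets are centered, a 3x3 covariance matrix is accumulated in O(n), and the optimal rotation is read off its SVD, which costs constant time. The function names and the test points are hypothetical.

```python
import numpy as np


def rmsd(P: np.ndarray, Q: np.ndarray) -> float:
    """Root mean square deviation between corresponding rows of P and Q."""
    return float(np.sqrt(np.mean(np.sum((P - Q) ** 2, axis=1))))


def superpose(P: np.ndarray, Q: np.ndarray):
    """Return (U, t) such that rows of P mapped by x -> U x + t best fit Q."""
    p_mean, q_mean = P.mean(axis=0), Q.mean(axis=0)
    # 3x3 covariance of the centered point sets: O(n) work.
    H = (P - p_mean).T @ (Q - q_mean)
    V, _, Wt = np.linalg.svd(H)
    # Flip the last axis if needed so that U is a proper rotation (det = +1),
    # not a reflection.
    d = 1.0 if np.linalg.det(Wt.T @ V.T) > 0 else -1.0
    U = Wt.T @ np.diag([1.0, 1.0, d]) @ V.T
    t = q_mean - U @ p_mean
    return U, t


# Hypothetical test data: Q is P moved by a known rigid motion.
P = np.array([[0.0, 0.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 2.0, 0.0],
              [0.0, 0.0, 3.0]])
angle = 0.7
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle), np.cos(angle), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([5.0, -1.0, 2.0])
Q = P @ R_true.T + t_true

U, t = superpose(P, Q)
before = rmsd(P, Q)
after = rmsd(P @ U.T + t, Q)
print(before, after)
```

With exact correspondences and no noise, the RMSD after superposition drops to numerical zero; with real proteins it measures the residual alignment error.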
Problem statement with RMSD metric.
Given two configurations of points in three-dimensional space and a threshold ε, find the largest alignment (a set of matched elements and a transformation) with RMSD less than ε.
(belongs to NP)
Docking Problem:
• Given two molecules, find their correct association.
[Figure: molecule + molecule = ?]
How to present a paper in
Computer Science
Lecture Preparation
• The lecture should cover a given slot of time (~90 minutes).
• Use PowerPoint slides for presentation.
• Each slide usually spans 1-2 minutes.
• The slides should not be overloaded.
• Use mouse or pointer.
• Use colors, pictures, tables and animation, but don’t exaggerate.
What to say and how

• Communicate the key ideas during your lecture.
• Don’t get lost in technical details.
• Structure your talk.
• Use a top-down approach.
Lecture Structure
• Introduction – general description of the paper.
• Body - abstract of the current method.
• Technical details.
• Conclusions and discussion.

Introduction
• Most important part of your talk!
• Title + short explanation about the presented topic.
• Lecture outline.
• Problem definition, input and output. Don’t forget to define the problem!
• Problem motivation.
• Introduce terminology of the field.
• Short review of existing approaches (don’t forget to add references!).
Body

• Abstract of the major results presented in the paper.
• Significance of the results.
• Sketch of the method.
Technicalities

• Extended presentation of the method.
• Present key algorithmic ideas clearly and carefully.
• Complexity of the method.
• Experimental results.
Conclusions and Discussion
• Summarize major contributions of the work.
• You can highlight points based on technical details you couldn’t discuss in the introduction.
• Present related open problems.
• Don’t forget to thank the audience!!!
• Questions.
Getting to the Audience

• Use repetitions: “Tell them what you're going to tell them. Tell them. Then tell them what you told them.”
• Remind, don’t assume
• Maintain eye contact
• Control your voice and motion

Thanks!!!
and Good Luck with your lectures!