Transcript PPT

bio-modeling course
layout
introduction
molecular biology
biotechnology
bioMEMS
bioinformatics
bio-modeling
cells and e-cells
transcription and regulation
cell communication
neural networks
dna computing
fractals and patterns
the birds and the bees ….. and ants
introduction
far and away in the past
• Newton’s equations of motion (17th-18th century)
  - Molecular dynamics (MD)
• Boltzmann’s statistics (19th century)
  - Monte Carlo (MC)
• Schrödinger/Heisenberg’s quantum mechanics (20th century)
birth of simulation in chemistry
• 1950s: do it by hand (or with a mechanical calculator)!
  - Tried to solve Newton’s equations of motion for small systems (e.g. a three-atom system)
  - It didn’t take long before computers were adopted
• 1970s: age of punch cards
• 1980s: better I/O devices
  - Workstations dominated as research platforms
first generation (1980s-1990s)
• Gas-phase reactions
  - e.g. H + H2 → H2 + H
  - MD
  [Figure: potential energy surface with coordinates R(A-B) and R(B-C)]
• Liquid simulation
  - e.g. Lennard-Jones fluid
  - MD/MC
• Proteins on a lattice
  - MC
• Quantum mechanical structure calculation (semi-empirical, ab initio, …)
revolution (~1995)
• Workstation-like PCs
  - 100 hr of Cray time → 64 MB / 150 MHz Pentium
  - “Cheap and fast”
• Impacts
  - Two directions
    1) More accurate methods
    2) Larger systems
  - Start of bio-simulations
impact on “non-bio” simulations
• Better potential energy surfaces
  - Revisions of existing surfaces
• Dynamics on quantum mechanical surfaces
  - Time-dependent Schrödinger equation instead of Newton’s equation
  - Totally quantum (can’t be more accurate)
  - Some people still do this for hydride/proton transfer in enzyme dynamics
[Figure: quantum wavepacket dynamics on a potential energy surface with coordinates R(A-B) and R(B-C)]
solvent models
• Implicit solvent
  - Solvent accessible surface area (SASA) → solvation free energy
  - Cheaper than explicit solvent
  - Discrete nature of the solvent not included
  - Different methods for SASA/free-energy calculation
    - Generalized Born model (GB/SA)
    - Poisson-Boltzmann model (PB/SA)
    - Distance-dependent dielectric (DD/SA)
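The SASA → solvation free energy step can be sketched with the Shrake-Rupley test-point method and atomic solvation parameters, ΔG_solv = Σ σ_i·SASA_i. This is a minimal illustration, not how GB/SA or PB/SA codes work. The probe radius (1.4 Å), the atomic radius and the σ value in the usage lines are illustrative assumptions, and the function names are hypothetical.

```python
import math

def sphere_points(n):
    """Roughly uniform points on a unit sphere (golden-spiral construction)."""
    golden = math.pi * (3.0 - math.sqrt(5.0))
    pts = []
    for k in range(n):
        z = 1.0 - 2.0 * (k + 0.5) / n
        r = math.sqrt(1.0 - z * z)
        pts.append((r * math.cos(golden * k), r * math.sin(golden * k), z))
    return pts

def shrake_rupley_sasa(coords, radii, probe=1.4, n_points=960):
    """Per-atom SASA (Å^2): the fraction of test points on each probe-expanded sphere
    that lie outside every other expanded sphere, times that sphere's area."""
    unit = sphere_points(n_points)
    areas = []
    for i, (ci, ri) in enumerate(zip(coords, radii)):
        big_r = ri + probe
        exposed = 0
        for ux, uy, uz in unit:
            p = (ci[0] + big_r * ux, ci[1] + big_r * uy, ci[2] + big_r * uz)
            buried = any(math.dist(p, cj) < rj + probe
                         for j, (cj, rj) in enumerate(zip(coords, radii)) if j != i)
            if not buried:
                exposed += 1
        areas.append(4.0 * math.pi * big_r ** 2 * exposed / n_points)
    return areas

def solvation_free_energy(areas, sigmas):
    """Delta G_solv = sum_i sigma_i * SASA_i (sigma_i: atomic solvation parameters)."""
    return sum(s * a for s, a in zip(sigmas, areas))

# Two atoms 3.0 Å apart; radius 1.7 Å and sigma = 0.012 kcal/(mol Å^2) are illustrative only.
areas = shrake_rupley_sasa([(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)], [1.7, 1.7])
dg = solvation_free_energy(areas, [0.012, 0.012])
```

The test-point count trades accuracy for speed. Each point represents 1/960 of the expanded sphere's area.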
• Explicit solvent
  - Water as individual molecules
  - Expensive calculation
  - Periodic boundary conditions usually necessary
  - Rigid/flexible, polarizable/non-polarizable
  - SPC, TIP3P, TIP4P, TIP5P, …
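Periodic boundary conditions are usually combined with the minimum image convention: each pair interaction uses the nearest periodic copy of the partner particle. Below is a minimal sketch for a rectangular (orthorhombic) box. The function name and the 10 Å box are hypothetical.

```python
def minimum_image_distance(r1, r2, box):
    """Distance between two particles in an orthorhombic periodic box,
    using the nearest periodic image of r2 (minimum image convention)."""
    d2 = 0.0
    for a, b, length in zip(r1, r2, box):
        dx = a - b
        dx -= length * round(dx / length)  # shift by whole box lengths into [-L/2, L/2]
        d2 += dx * dx
    return d2 ** 0.5

# Hypothetical 10 Å cubic box: particles at x = 1 and x = 9 are 2 Å apart across the boundary.
d = minimum_image_distance((1.0, 0.0, 0.0), (9.0, 0.0, 0.0), (10.0, 10.0, 10.0))
```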
impacts on bio-simulations
• Proteins got free from the lattice!
  - Off-lattice model (each residue as a bead)
  - United-atom approach (e.g. CH3 → one atom)
  - All-atom approach
    - With water (explicit solvent)
    - Without water (implicit solvent)
• What to look at?
  - Kinetics: dynamic characteristics (e.g. folding simulation)
  - Thermodynamics: equilibrium characteristics (e.g. binding affinity of protein & drug)
• Remember, proteins are still big!
off-lattice Go model
• Developed from the lattice model: “funnel concept”
  - Nature has developed proteins to fold (evolution)
  - Proteins can be modeled to fold
[Figure: native contacts; energy surface]
• Matches with experimental observations
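In an off-lattice Go model, only native contacts are rewarded. The sketch below assumes one bead per residue, an 8 Å contact cutoff and a minimum sequence separation of 3 (all illustrative choices). It computes the fraction of native contacts Q, a common folding order parameter, and a simplified Go energy of -ε per formed native contact. The six-bead "native" hairpin is made up for the example, and the function names are hypothetical.

```python
import math

def native_contacts(coords, cutoff=8.0, min_sep=3):
    """Residue pairs (i, j) with j - i >= min_sep whose beads are closer than cutoff (Å)."""
    n = len(coords)
    return {(i, j) for i in range(n) for j in range(i + min_sep, n)
            if math.dist(coords[i], coords[j]) < cutoff}

def fraction_native(coords, contacts, cutoff=8.0):
    """Q: fraction of the native contacts that are formed in this conformation."""
    formed = sum(math.dist(coords[i], coords[j]) < cutoff for i, j in contacts)
    return formed / len(contacts)

def go_energy(coords, contacts, epsilon=1.0, cutoff=8.0):
    """Simplified Go-type energy: -epsilon per formed native contact; non-native pairs ignored."""
    return -epsilon * sum(math.dist(coords[i], coords[j]) < cutoff for i, j in contacts)

native = [(0, 0, 0), (3.8, 0, 0), (7.6, 0, 0), (7.6, 5, 0), (3.8, 5, 0), (0, 5, 0)]
extended = [(3.8 * k, 0, 0) for k in range(6)]
contacts = native_contacts(native)
```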
united-atom/implicit-solvent model folding
• “Statistical folding”
  - Starts from many independent trajectories
  - Lucky trajectories fold
  - $N_{folded} / N_{total} \approx k_{fold} \times t$ (valid while $N_{folded} \ll N_{total}$)
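The linear relation follows from single-exponential kinetics, N_folded/N_total = 1 - exp(-k_fold·t), when k_fold·t ≪ 1. The sketch below compares the two estimators. The trajectory counts are hypothetical and the function names are illustrative.

```python
import math

def k_fold_linear(n_folded, n_total, t):
    """k_fold from N_folded / N_total = k_fold * t (short-time, linear regime)."""
    return n_folded / (n_total * t)

def k_fold_single_exponential(n_folded, n_total, t):
    """k_fold assuming N_folded / N_total = 1 - exp(-k_fold * t)."""
    return -math.log(1.0 - n_folded / n_total) / t

# Hypothetical run: 10,000 independent 20 ns trajectories, 8 of which fold.
k_lin = k_fold_linear(8, 10_000, 20e-9)              # s^-1
k_exp = k_fold_single_exponential(8, 10_000, 20e-9)  # s^-1
```

With only 8 folding events out of 10,000 trajectories, the two estimates agree to better than 0.1%.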
all-atom unfolding
• Folding inferred from unfolding
• At high T, unfolding is fast (~1 ns)
• Full atomistic detail from folded state to unfolded state
binding free energy: docking
• “Molecular modeling”
• Binding free energy is estimated from the shapes of the ligand and the protein
• Drug design
binding free energy: more accurate versions
• Free energy: potential energy + entropy term
• P + L ⇌ PL
  - Thermodynamic integration (TI)
  - Free energy perturbation (FEP)
  - Jarzynski’s equality
• Extremely expensive calculations
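The three estimators listed above can be sketched as follows (hypothetical helper names, kcal/mol units assumed, k_B = 0.0019872 kcal/(mol·K)):

- FEP uses Zwanzig's exponential average of the energy difference ΔU, sampled in the initial state.
- Jarzynski's equality has the same form, with the nonequilibrium work W in place of ΔU.
- TI integrates the ensemble average ⟨∂U/∂λ⟩ over λ.

Generating the samples, which is the expensive part, is not shown.

```python
import math

KB_KCAL = 0.0019872  # Boltzmann constant, kcal/(mol*K)

def exp_average(values, temperature=300.0):
    """-kT ln <exp(-x / kT)>: Zwanzig FEP when x = Delta U sampled in the initial state,
    Jarzynski's equality when x = nonequilibrium work W."""
    kt = KB_KCAL * temperature
    x_min = min(values)  # shift the exponent to avoid overflow/underflow
    avg = sum(math.exp(-(x - x_min) / kt) for x in values) / len(values)
    return x_min - kt * math.log(avg)

def thermodynamic_integration(lambdas, mean_dudl):
    """Delta F = integral over lambda of <dU/dlambda>, by the trapezoidal rule."""
    return sum(0.5 * (l1 - l0) * (g0 + g1)
               for l0, l1, g0, g1 in zip(lambdas, lambdas[1:], mean_dudl, mean_dudl[1:]))

# Illustrative inputs only.
lambdas = [i / 10 for i in range(11)]
df_ti = thermodynamic_integration(lambdas, [2.0 * lam for lam in lambdas])
df_fep = exp_average([0.0, 1.0, 2.0])
```

By Jensen's inequality, the exponential average is never larger than the plain mean. It is also dominated by rare low-energy samples, which is one reason these calculations need extensive sampling.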
free energy landscape method
• Kinetic information is inferred from the free energy surface
  - A rough free energy surface can be obtained faster by parallelization
• “Trajectory by intuition”
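The free energy surface is related to the equilibrium probability of a coordinate x by F(x) = -kT ln P(x). The minimal sketch below estimates it from a histogram of sampled values, in units of kT. The function name and the data are hypothetical.

```python
import math
from collections import Counter

def free_energy_profile(samples, bin_width, kt=1.0):
    """F(x) = -kT ln P(x) from a histogram, shifted so that min F = 0.
    Keys are the left edges of the bins; empty bins are omitted (F would be infinite)."""
    counts = Counter(math.floor(x / bin_width) for x in samples)
    total = sum(counts.values())
    f = {b * bin_width: -kt * math.log(c / total) for b, c in counts.items()}
    f_min = min(f.values())
    return {x: v - f_min for x, v in sorted(f.items())}

# Hypothetical samples: the coordinate is found 10x more often near 0 than near 1.
profile = free_energy_profile([0.1] * 100 + [1.1] * 10, bin_width=1.0)
```

A tenfold drop in population corresponds to a free-energy rise of kT ln 10 ≈ 2.3 kT.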

current limitations
• Accuracy of models
  - Force field
  - Solvent models
• Speed
  - For small proteins (<50 amino acids): 1 ns ~ 1 day
  - Biologically relevant event timescale > 1 ms
• Size
  - Many proteins are not just large: they are huge!
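As a rough estimate from the numbers above (1 ns of simulated time per day, events slower than 1 ms), a single brute-force trajectory would need:

```latex
\frac{1\ \text{ms}}{1\ \text{ns/day}} = \frac{10^{-3}\ \text{s}}{10^{-9}\ \text{s/day}} = 10^{6}\ \text{days} \approx 2.7 \times 10^{3}\ \text{years}
```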
responses to the challenges
• Accuracy: blend with quantum mechanical calculations
  - QM/MM, QM-trajectory methods (e.g. CPMD)
• Speed
  - e.g. compute on graphics cards (GPUs)
• Size
  - e.g. umbrella sampling
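Umbrella sampling works in two steps:

1. Add a harmonic restraint w(ξ) = ½k(ξ - ξ₀)² along a coordinate ξ, so that a chosen region is sampled well.
2. Remove the bias afterwards: F(ξ) = -kT ln P_biased(ξ) - w(ξ) + const.

The sketch below handles a single window and uses hypothetical names. Real applications combine many overlapping windows, for example with WHAM, which is not shown.

```python
import math
from collections import Counter

def harmonic_bias(xi, xi0, k):
    """Umbrella restraint w(xi) = 1/2 k (xi - xi0)^2."""
    return 0.5 * k * (xi - xi0) ** 2

def unbias_window(samples, xi0, k, bin_width, kt=1.0):
    """One window: F(xi) = -kT ln P_biased(xi) - w(xi) + const, with const chosen so min F = 0.
    Keys are bin centres."""
    counts = Counter(math.floor(x / bin_width) for x in samples)
    total = sum(counts.values())
    f = {}
    for b, c in counts.items():
        centre = (b + 0.5) * bin_width
        f[centre] = -kt * math.log(c / total) - harmonic_bias(centre, xi0, k)
    f_min = min(f.values())
    return {x: v - f_min for x, v in sorted(f.items())}

# Hypothetical window on a flat landscape: the restraint alone makes bin 1.5 about e times rarer.
profile = unbias_window([0.5] * 1000 + [1.5] * 368, xi0=0.5, k=2.0, bin_width=1.0)
```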
computational biology
Biological systems are complex; thus, a combination of experimental and computational approaches is needed.
• Computational biology ≠ bioinformatics
  - More than sequences, database searches, statistics or image analysis
• A part of computational science
  - Using mathematical modeling, simulation and visualization
  - Complementing theory and experiment
simplest chemical reaction: A → B
• irreversible, one-molecule reaction
• examples: all sorts of decay processes, e.g. radioactive decay, fluorescence, an activated receptor returning to the inactive state
• any metabolic pathway can be described by a combination of processes of this type (including reversible reactions and, in some respects, multimolecular reactions)
various levels of description:
• homogeneous system, large numbers of molecules = ordinary differential equations, kinetics
• small numbers of molecules = probabilistic equations, stochastics
• spatial heterogeneity = partial differential equations, diffusion
• small number of heterogeneously distributed molecules = single-molecule tracking (e.g. cytoskeleton modelling)
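In the small-number regime (probabilistic equations, stochastics), A → B can be simulated exactly with Gillespie's stochastic simulation algorithm. With n molecules of A, the waiting time to the next event is exponentially distributed with rate k·n. The function name and seed handling below are illustrative.

```python
import math
import random

def gillespie_decay(n0, k, t_end, seed=0):
    """Exact stochastic simulation of A -> B (rate constant k) up to time t_end.
    Returns the event times and the number of A molecules after each event."""
    rng = random.Random(seed)
    t, n = 0.0, n0
    times, counts = [t], [n]
    while n > 0:
        rate = k * n                                   # total propensity
        t += -math.log(1.0 - rng.random()) / rate      # exponential waiting time
        if t > t_end:
            break
        n -= 1                                         # one A converts to B
        times.append(t)
        counts.append(n)
    return times, counts

times, counts = gillespie_decay(n0=100, k=1.0, t_end=1e6)
```

Averaged over many seeds, the trajectories approach the deterministic curve N₀e^(-kt). Individual runs fluctuate, and the fluctuations matter most when few molecules remain.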
kinetic description
• Imagine a box containing N molecules. How many will decay during a short time Δt? k N Δt
• Imagine two boxes containing N/2 molecules each. How many decay in total? k N Δt
• Imagine two boxes containing N molecules each. How many decay in total? 2 k N Δt
• In general:
$$\frac{dn(t)}{dt} = -k\,n(t) \qquad\Longrightarrow\qquad n(t) = N_0\,e^{-kt}$$
  - left: differential equation (ordinary, linear, first-order)
  - right: exact solution (in more complex cases replaced by a numerical approximation)
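When no exact solution is available, the equation is integrated numerically. Below is a minimal forward-Euler sketch, compared against the exact solution n(t) = N₀e^(-kt). The step sizes are illustrative and the function names are hypothetical.

```python
import math

def euler_decay(n0, k, t_end, dt):
    """Forward Euler for dn/dt = -k n: n <- n + dt * (-k n)."""
    n = float(n0)
    for _ in range(round(t_end / dt)):
        n += dt * (-k * n)
    return n

def exact_decay(n0, k, t):
    """Exact solution n(t) = N0 exp(-k t)."""
    return n0 * math.exp(-k * t)

exact = exact_decay(1000, 0.5, 2.0)
fine = euler_decay(1000, 0.5, 2.0, dt=1e-4)
coarse = euler_decay(1000, 0.5, 2.0, dt=0.5)
```

The Euler error shrinks roughly in proportion to the step size.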
what is bio-modeling?
biological building blocks
DNA      GAA GTT GAA AAT CAG GCG AAC CCA CGA CTG
RNA      GAA GUU GAA AAU CAG GCG AAC CCA CGA CUG
PROTEIN  GLU VAL GLU ASN GLN ALA ASN PRO ARG LEU
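The DNA → RNA → protein rows above can be reproduced in a few lines. The codon table is deliberately partial: it contains only the codons in this example, taken from the standard genetic code. The function names are hypothetical.

```python
# Partial standard codon table: only the codons that occur in the example above.
CODON_TABLE = {
    "GAA": "GLU", "GUU": "VAL", "AAU": "ASN", "CAG": "GLN", "GCG": "ALA",
    "AAC": "ASN", "CCA": "PRO", "CGA": "ARG", "CUG": "LEU",
}

def transcribe(dna):
    """Coding-strand DNA -> mRNA (T replaced by U)."""
    return dna.replace("T", "U")

def translate(rna):
    """Read space-separated codons and map each to its amino acid."""
    return [CODON_TABLE[codon] for codon in rna.split()]

dna = "GAA GTT GAA AAT CAG GCG AAC CCA CGA CTG"
rna = transcribe(dna)
protein = translate(rna)
```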
protein folding
[Figure: the chain GLU VAL GLU ASN GLN ALA ASN PRO ARG LEU ... folding into a three-dimensional structure]
some fundamental questions
• Question #1: Given a protein or DNA molecule, what is the geometric structure of the molecule?
• Question #2: Why and how does a protein fold into a unique three-dimensional structure?
• Question #3: Given a set of distances between pairs of atoms, how can we determine the coordinates of the atoms?
• Question #4: Given the magnitudes of the structure factors of a protein, how can we determine the phases of the structure factors?
• Question #5: Given two proteins, how can we compare their geometric structures?
• Question #6: …
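One standard answer to Question #5 is the root-mean-square deviation (RMSD) after optimal superposition, computed with the Kabsch algorithm. The NumPy sketch below assumes both structures have the same number of atoms in corresponding order. The test coordinates are random and the function name is hypothetical.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD between two (N, 3) coordinate arrays after optimal superposition (Kabsch)."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    P = P - P.mean(axis=0)                      # remove translation
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q                                 # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))      # -1 would indicate a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T     # optimal proper rotation taking P onto Q
    diff = P @ R.T - Q
    return float(np.sqrt((diff ** 2).sum() / len(P)))

rng = np.random.default_rng(0)
P = rng.normal(size=(10, 3))
angle = np.radians(30.0)
Rz = np.array([[np.cos(angle), -np.sin(angle), 0.0],
               [np.sin(angle),  np.cos(angle), 0.0],
               [0.0, 0.0, 1.0]])
Q = P @ Rz.T + np.array([1.0, -2.0, 0.5])      # rotated and translated copy of P
```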
methods for structure prediction and determination
• Protein X-ray Crystallography
• Nuclear Magnetic Resonance
• Potential Energy Minimization
• Molecular Dynamics Simulation
• Homology Modeling
• Fold Recognition
• Inverse Protein Folding
empirical structure determination
• Two major experimental methods for determining protein structure
• X-ray crystallography
  - Requires growing a crystal of the protein (impossible for some, never easy)
  - The diffraction pattern can be inverse-Fourier transformed to characterize electron densities (phase problem)
• Nuclear Magnetic Resonance (NMR) spectroscopy
  - Provides distance constraints, but it can be hard to find a corresponding structure
  - Works only for relatively small proteins
X-ray crystallography
• X-rays are used because their wavelength is close to the distance between bonded carbon atoms
• Maps electron density, not atoms directly
• A crystal provides a large number of spatially aligned molecules
• The Fourier transform has to be inverted to get the structure, but only the amplitudes are measured, not the phases
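The phase problem can be illustrated in one dimension with NumPy's FFT: the inverse transform recovers the density only when both amplitudes and phases are known. The two-"atom" density is a made-up toy, not real diffraction data.

```python
import numpy as np

# Toy 1-D "electron density": two Gaussian atoms on a periodic grid (illustrative only).
x = np.linspace(0.0, 1.0, 64, endpoint=False)
density = np.exp(-((x - 0.30) / 0.03) ** 2) + 0.6 * np.exp(-((x - 0.55) / 0.03) ** 2)

F = np.fft.fft(density)       # complex "structure factors"
amplitudes = np.abs(F)        # what the diffraction experiment measures
phases = np.angle(F)          # what the experiment does not record

with_phases = np.fft.ifft(amplitudes * np.exp(1j * phases)).real
zero_phases = np.fft.ifft(amplitudes).real    # naive guess: all phases set to zero
```

The zero-phase map is symmetric about the origin and does not show the two atoms at their true positions. The amplitudes alone do not determine the structure.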
X-ray crystallography computing
• In X-ray crystallography, the protein first needs to be purified and crystallized, which may take months or years, or may fail altogether.
• The protein crystal is then placed in an X-ray diffractometer to record a diffraction image, which can be used to determine the three-dimensional structure of the protein.
• The process is time consuming, and some proteins cannot be crystallized at all.
• A mathematical problem, called the phase problem, must be solved before any crystal structure can be fully determined from the diffraction data.
• 80% of the structures in the Protein Data Bank (PDB) were determined by X-ray crystallography.
NMR structure determination
• The NMR approach is based on the fact that nuclei spin and generate magnetic fields. When two nuclei are close, their spins interact, and the intensity of the interaction depends on the distance between the nuclei. Therefore, the distances between certain pairs of atoms can be estimated by measuring the intensities of the spin-spin couplings.
• The distance data obtained from the NMR experiment can be used to deduce structural information about the molecule. One way of achieving this is based on molecular distance geometry.
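In practice, the distance dependence is usually exploited through the nuclear Overhauser effect (NOE), whose intensity falls off approximately as r⁻⁶. Below is a minimal calibration sketch against a reference proton pair of known distance. The function name and the numbers are hypothetical.

```python
def noe_distance(intensity, ref_intensity, ref_distance):
    """Distance from an NOE intensity, assuming I ∝ r^-6 and a reference pair of known distance."""
    return ref_distance * (ref_intensity / intensity) ** (1.0 / 6.0)

# Hypothetical calibration: reference pair at 1.8 Å; a cross-peak 64x weaker than the reference.
r = noe_distance(1.0, 64.0, 1.8)
```

Because of the sixth root, a 64-fold weaker signal only doubles the estimated distance. This is also why NOE data give useful bounds only up to roughly 5 Å.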
• Not all distances between pairs of atoms can be detected; in practice, only lower and upper bounds on the distances can be obtained.
• The structure can be determined by solving a distance geometry problem with the distance data from the NMR experiments.
• 15% of the structures in the PDB were determined by NMR spectroscopy.
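When all pairwise distances are known exactly, coordinates can be recovered (up to rotation, reflection and translation) by classical multidimensional scaling. Real NMR data give only partial, bounded distances and need more elaborate distance geometry methods. The NumPy sketch below covers only the exact-data case, and its function name is hypothetical.

```python
import numpy as np

def coords_from_distances(D, dim=3):
    """Classical multidimensional scaling: coordinates from a complete, exact
    matrix of pairwise distances (unique only up to rotation/reflection/translation)."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    G = -0.5 * J @ (D ** 2) @ J                  # Gram matrix of centred coordinates
    vals, vecs = np.linalg.eigh(G)               # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:dim]           # keep the `dim` largest
    return vecs[:, idx] * np.sqrt(np.clip(vals[idx], 0.0, None))

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))                      # made-up "atoms"
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
Y = coords_from_distances(D)
D_rec = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)
```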
potential energy minimization
Hypothesis: the native structure of a protein has the lowest (or almost the lowest) potential energy, so it can be located at the global minimum of the protein’s potential energy.
• A reasonably accurate potential energy function needs to be constructed.
• Given such a function, a local minimum is easy to find, but the global one is hard to find, especially if the function has many local minima. No completely satisfactory algorithm has yet been developed for minimizing proteins.
• Potential energy minimization has, however, been used successfully for structure refinement.
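The difference between local and global minimization shows up even on a one-dimensional toy potential with two minima: steepest descent simply slides into the nearest minimum. The potential, the step size and the function names are hypothetical.

```python
def energy(x):
    """Toy 1-D potential with two minima (illustrative, not a protein force field)."""
    return x ** 4 - 3 * x ** 2 + x

def gradient(x):
    return 4 * x ** 3 - 6 * x + 1

def steepest_descent(x0, step=0.01, n_steps=10_000, tol=1e-10):
    """Move downhill along -gradient until the gradient (almost) vanishes."""
    x = x0
    for _ in range(n_steps):
        g = gradient(x)
        if abs(g) < tol:
            break
        x -= step * g
    return x

left = steepest_descent(-2.0)    # ends in the global minimum near x = -1.30
right = steepest_descent(2.0)    # ends in the local minimum near x = 1.13
```

Both runs converge, but only the one started on the left finds the global minimum. For a protein with an astronomical number of local minima, the starting point cannot be chosen this easily.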
molecular dynamics
Folding can be simulated by following the movement of the atoms in the protein according to Newton’s second law of motion.
• The time step has to be small (femtoseconds) to achieve accuracy.
• Current computing technology can simulate only picoseconds to microseconds, while protein folding may take seconds or even longer.
• Molecular dynamics simulation has been used successfully to study other types of dynamical behavior of proteins.
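Following atoms according to Newton's second law is done with an integrator such as velocity Verlet. This sketch uses one particle in a harmonic well, in reduced units, with hypothetical names. It shows why the time step must be small compared with the fastest vibration. With a short step the energy is conserved; with a step too large for the oscillation period, the trajectory blows up.

```python
def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Integrate m d2x/dt2 = F(x) (1-D) with velocity Verlet; returns [(x, v), ...]."""
    f = force(x)
    traj = [(x, v)]
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass
        x = x + dt * v_half
        f = force(x)
        v = v_half + 0.5 * dt * f / mass
        traj.append((x, v))
    return traj

def harmonic_force(x):
    return -x  # spring constant 1, reduced units (period 2*pi)

def total_energy(x, v):
    return 0.5 * v * v + 0.5 * x * x

small = velocity_verlet(1.0, 0.0, harmonic_force, 1.0, dt=0.01, n_steps=10_000)
large = velocity_verlet(1.0, 0.0, harmonic_force, 1.0, dt=2.5, n_steps=50)
```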
limitations of MD simulations
• Full atomic representation → noise → difficulty in discerning the dominant mechanisms of motion → need for methods that filter out the noise, such as Essential Dynamics.
• Empirical force fields → limited by the accuracy of the potentials.
• Time steps are constrained by the fastest motion (bond-length vibrations occur in the femtosecond (fs) time range and necessitate time steps of 1-5 fs).
• Inefficient sampling of the complete conformational space.
• Limited to small proteins (100s of residues) and/or short times (subnanoseconds).
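Essential Dynamics is a principal component analysis of the covariance matrix of atomic fluctuations. The few modes with the largest eigenvalues capture the dominant collective motions. The NumPy sketch below assumes the frames have already been superposed on a reference structure. The synthetic trajectory is made up and the function name is hypothetical.

```python
import numpy as np

def essential_modes(traj):
    """PCA of atomic fluctuations. traj: (n_frames, 3 * n_atoms), frames pre-superposed.
    Returns eigenvalues (descending) and eigenvectors (columns)."""
    X = np.asarray(traj, dtype=float)
    X = X - X.mean(axis=0)                 # fluctuations about the mean structure
    C = X.T @ X / (len(X) - 1)             # covariance matrix
    vals, vecs = np.linalg.eigh(C)
    order = np.argsort(vals)[::-1]
    return vals[order], vecs[:, order]

rng = np.random.default_rng(1)
direction = np.array([1.0, 1.0, 0.0, 0.0, 0.0, 0.0]) / np.sqrt(2.0)  # 2 atoms x 3 coordinates
amplitude = 2.0 * rng.normal(size=500)
traj = np.outer(amplitude, direction) + 0.05 * rng.normal(size=(500, 6))
vals, vecs = essential_modes(traj)
```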
sequence structure alignment
[Figure: Homology Modeling (sequence to sequence), Fold Recognition (structure to sequence) and Inverse Protein Folding (sequence to structure) as sequence-structure alignment against known sequences/structures, followed by ranking of sequences/structures]
• Scoring functions may not be able to distinguish between good and bad matches.
• Computing the best alignment is NP-hard in general when gaps are allowed.
• The results are not exact and carry only a certain level of confidence.
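For the sequence-to-sequence case (homology modeling), the best alignment can be computed exactly by dynamic programming. The NP-hardness noted above arises when a sequence is aligned to a structure with pairwise interaction terms (threading), which this sketch does not handle. It is a minimal Needleman-Wunsch global alignment with illustrative scores (match +1, mismatch -1, linear gap -2) rather than a substitution matrix such as BLOSUM.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment (linear gap penalty). Returns (score, aligned_a, aligned_b)."""
    def s(x, y):
        return match if x == y else mismatch

    n, m = len(a), len(b)
    S = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = i * gap
    for j in range(1, m + 1):
        S[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            S[i][j] = max(S[i - 1][j - 1] + s(a[i - 1], b[j - 1]),
                          S[i - 1][j] + gap,    # gap in b
                          S[i][j - 1] + gap)    # gap in a
    # Trace back one optimal path from the bottom-right corner.
    out_a, out_b, i, j = [], [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and S[i][j] == S[i - 1][j - 1] + s(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and S[i][j] == S[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return S[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))

result = needleman_wunsch("ACGT", "AGT")
```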
what is biomolecular modeling?
• Application of computational models to understand the structure, dynamics, and thermodynamics of biological molecules
• The models must be tailored to the question at hand: the Schrödinger equation is not the answer to everything! A reductionist view is bound to fail!
• This implies that biomolecular modeling must be both multidisciplinary and multiscale
an odd remark
"Every attempt to employ mathematical methods in the study of chemical questions must be considered profoundly irrational and contrary to the spirit in chemistry. If mathematical analysis should ever hold a prominent place in chemistry - an aberration which is happily almost impossible - it would occasion a rapid and widespread degeneration of that science."
A. Comte (1830)
a Nobel remark
1992 Nobel Prize in Chemistry
Rudolph Marcus (Theory of Electron Transfer)
1998 Nobel Prize in Chemistry
John Pople (ab initio)
Walter Kohn (DFT, density functional theory)
growth of biological databases
3D structures growth
http://www.rcsb.org/pdb/holdings.html
molecular modeling
structure-property relationships
• “First principles”
  - $\hat{H}\Psi = E\Psi$ (QM)
  - $-\,dE/d\mathbf{r}_i = m_i\,d^2\mathbf{r}_i/dt^2$ (MD)
  - Folding simulations
• Molecular model → mathematical model → predictions: structure, properties
• Empirical correlations: {property} = k {descriptors}
  - $E = E_{bonded} + E_{nonbonded}$ (MM)
  - $\log(1/C) = -k\pi^2 + k'\pi + \rho\sigma + k''$ (QSAR)
  - Fold recognition
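The Hansch-type QSAR equation above is linear in its coefficients, so k, k', ρ and k'' can be fitted by least squares. The data in this NumPy sketch are synthetic, generated from known coefficients only to show that the fit recovers them. The function name is hypothetical.

```python
import numpy as np

def fit_hansch(pi, sigma, log_inv_c):
    """Least-squares fit of log(1/C) = -k*pi^2 + k'*pi + rho*sigma + k''.
    Returns (k, k_prime, rho, k_dprime)."""
    pi = np.asarray(pi, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    A = np.column_stack([-pi ** 2, pi, sigma, np.ones_like(pi)])
    coef, *_ = np.linalg.lstsq(A, np.asarray(log_inv_c, dtype=float), rcond=None)
    return tuple(float(c) for c in coef)

# Synthetic, noise-free data from known coefficients (not experimental values).
pi = np.linspace(-1.0, 3.0, 12)
sigma = np.sin(np.arange(12))
true = (0.2, 1.0, 0.5, 2.0)
y = -true[0] * pi ** 2 + true[1] * pi + true[2] * sigma + true[3]
fitted = fit_hansch(pi, sigma, y)
```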
molecular geometry and molecular properties
• Conformational energy (potential energy)
  $E_{total} = E_{valence} + E_{nonbond}$
  - $E_{valence} = E_{bond} + E_{angle} + E_{torsion} + E_{oop}$
    - bond stretching ($E_{bond}$)
    - valence angle bending ($E_{angle}$)
    - dihedral angle torsion ($E_{torsion}$)
    - out-of-plane interactions ($E_{oop}$)
  - $E_{nonbond} = E_{vdW} + E_{Coulomb} + E_{hbond}$
    - van der Waals ($E_{vdW}$)
    - electrostatic ($E_{Coulomb}$)
    - hydrogen bond ($E_{hbond}$)
F.Melani
Molecular Modeling in Chimica Farmaceutica
force field
• Σ force-field terms = conformational energy (potential energy)
• defined by:
  - atom types
  - atomic charges
  - force constants, equilibrium values
  - energy equations
[Figure: standard force field]
bond stretching ($E_{bond}$)
• quadratic: $k(r - r_0)^2$
• quartic: $k_2(r - r_0)^2 + k_3(r - r_0)^3 + k_4(r - r_0)^4$
• Morse: $k\left[1 - e^{-\alpha(r - r_0)}\right]^2$
valence angle bending ($E_{angle}$)
• quadratic: $k(\theta - \theta_0)^2$
• quartic: $k_2(\theta - \theta_0)^2 + k_3(\theta - \theta_0)^3 + k_4(\theta - \theta_0)^4$
dihedral angle torsion ($E_{torsion}$)
• $k\left[1 + \cos(n\phi - \phi_0)\right]$
out-of-plane interactions ($E_{oop}$)
• $k\chi^2$, where χ is the out-of-plane angle of a planar centre
[Figure: planar group bearing H, R, R' and O]
nonbond terms ($E_{nonbond}$)
van der Waals ($E_{vdW}$)
$$E_{vdW} = \sum_{i<j}\left[\frac{C_{ij}}{r_{ij}^{12}} - \frac{D_{ij}}{r_{ij}^{6}}\right] = \sum_{i<j} E^0_{ij}\left[\left(\frac{r^0_{ij}}{r_{ij}}\right)^{12} - 2\left(\frac{r^0_{ij}}{r_{ij}}\right)^{6}\right]$$
hydrogen bond ($E_{hbond}$)
$$E_{hbond} = \sum_{i<j}\left[\frac{C_{ij}}{r_{ij}^{12}} - \frac{D_{ij}}{r_{ij}^{10}}\right] = \sum_{i<j} E^0_{ij}\left[5\left(\frac{r^0_{ij}}{r_{ij}}\right)^{12} - 6\left(\frac{r^0_{ij}}{r_{ij}}\right)^{10}\right]$$
electrostatic ($E_{Coulomb}$)
$$E_{Coulomb} = \sum_{i<j}\frac{q_i q_j}{\varepsilon\, r_{ij}}$$
Example: H2O (potential energy)
$$E = K_{OH}\left[(b - b^0_{OH})^2 + (b' - b^0_{OH})^2\right] + K_{HOH}\,(\theta - \theta^0_{HOH})^2$$
• $K_{OH}$, $b^0_{OH}$, $K_{HOH}$ and $\theta^0_{HOH}$ are parameters of the force field
• b is the current length of one O-H bond
• b' is the length of the other O-H bond
• θ is the H-O-H angle
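The force-field terms above can be sketched as plain Python functions:

- the 12-6 van der Waals and 12-10 hydrogen-bond forms, in their E⁰, r⁰ parameterization;
- the Coulomb term;
- the conversion of the 12-6 term to the C/D form;
- the H2O example.

The Coulomb constant 332.0637 converts e²/Å to kcal/mol. All other parameter values in the usage lines are hypothetical, not taken from a published force field, and the function names are illustrative.

```python
import math

COULOMB_K = 332.0637  # kcal*Å/(mol*e^2)

def e_vdw(r, e0, r0):
    """E0 [(r0/r)^12 - 2 (r0/r)^6]; minimum of -E0 at r = r0."""
    s6 = (r0 / r) ** 6
    return e0 * (s6 * s6 - 2.0 * s6)

def e_hbond(r, e0, r0):
    """E0 [5 (r0/r)^12 - 6 (r0/r)^10]; minimum of -E0 at r = r0."""
    return e0 * (5.0 * (r0 / r) ** 12 - 6.0 * (r0 / r) ** 10)

def e_coulomb(r, qi, qj, eps=1.0):
    """q_i q_j / (eps r): charges in e, r in Å, result in kcal/mol."""
    return COULOMB_K * qi * qj / (eps * r)

def vdw_c_d(e0, r0):
    """The same 12-6 term in C/r^12 - D/r^6 form: C = E0 r0^12, D = 2 E0 r0^6."""
    return e0 * r0 ** 12, 2.0 * e0 * r0 ** 6

def e_water(b1, b2, theta_deg, k_oh, b0, k_hoh, theta0_deg):
    """K_OH[(b - b0)^2 + (b' - b0)^2] + K_HOH (theta - theta0)^2, angle difference in radians."""
    dtheta = math.radians(theta_deg - theta0_deg)
    return k_oh * ((b1 - b0) ** 2 + (b2 - b0) ** 2) + k_hoh * dtheta ** 2

# Hypothetical water parameters, for illustration only.
water_params = dict(k_oh=553.0, b0=0.9572, k_hoh=100.0, theta0_deg=104.52)
e_eq = e_water(0.9572, 0.9572, 104.52, **water_params)         # equilibrium geometry
e_stretched = e_water(1.0572, 0.9572, 104.52, **water_params)  # one O-H bond 0.1 Å longer
```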
DOCKING
The objective: searching for the orientations with low interaction energies.
$$E_{int} = \sum_i\sum_j\left[\frac{C_{ij}}{r_{ij}^{12}} - \frac{D_{ij}}{r_{ij}^{6}} + \frac{q_i q_j}{\varepsilon\, r_{ij}}\right]$$
with i running over ligand atoms and j over protein atoms.
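MEP (molecular electrostatic potential)
$$V(p) = \sum_A \frac{Z_A}{|\mathbf{R}_A - \mathbf{r}_p|} - \int \frac{\rho(\mathbf{r})}{|\mathbf{r} - \mathbf{r}_p|}\,d\mathbf{r}$$
(first term: nuclei; second term: electronic density)
Point-charge approximation: $V(p) = \sum_i \frac{q_i}{|\mathbf{r}_i - \mathbf{r}_p|}$
Electronic density from the basis functions $\phi_\mu$: $\rho(\mathbf{r}) = \sum_\mu\sum_\nu P_{\mu\nu}\,\phi_\mu(\mathbf{r})\,\phi_\nu(\mathbf{r})$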
molecular vibration
protein structure
• Most proteins fold spontaneously in water, so the amino acid sequence alone should be enough to determine protein structure
• However, the physics is daunting:
  - 20,000+ protein atoms, plus equal amounts of water
  - Many non-local interactions
  - Folding can take seconds (most chemical reactions take place ~10^12, i.e. 1,000,000,000,000×, faster)
• Empirical determinations of protein structure are advancing rapidly.
protein structure
• Proteins are polymers of amino acids linked by peptide bonds.
• Properties of proteins are determined both by the particular sequence of amino acids and by the conformation (fold) of the protein.
• Flexibility in the bonds around Cα:
  - φ (phi)
  - ψ (psi)
  - χ (side chain)
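The φ, ψ (and χ) angles are dihedral angles defined by four consecutive atoms: C(i-1), N, Cα, C for φ, and N, Cα, C, N(i+1) for ψ. Below is a minimal sketch of the usual atan2-based formula. The helper names are hypothetical.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees (-180, 180] for four points p0-p1-p2-p3."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))  # part of b0 perpendicular to the central bond
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))  # part of b2 perpendicular to the central bond
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))

cis = dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (1, 1, 0))
trans = dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (-1, 1, 0))
```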
protein structure
• Protein structure is described at four levels
  - Primary structure: amino acid sequence
  - Secondary structure: local (in sequence) ordering into
    - α-helices: compressed, corkscrew structures
    - β-strands: extended, nearly straight structures
    - β-sheets: paired strands, reinforced by hydrogen bonds; parallel (same direction) or antiparallel
    - Coils, turns & loops: changes in direction
  - Tertiary structure: global ordering (all angles/atoms)
  - Quaternary structure: multiple, disconnected amino acid chains interacting to form a larger structure
[Figure: α-helices]
[Figure: 2 types of β-sheets: anti-parallel and parallel]
[Figure: turns]
combining secondary structures to make motifs
[Figure: DNA-binding helix-turn-helix; calcium-binding motif; 24 ways to arrange adjacent hairpins]
alpha/beta domains
[Figure: triosephosphate isomerase; dehydrogenase]
Ramachandran plot
[Figure: Ramachandran plot, with a region marked "always glycine"]
protein structure cartoons
protein structure representations
protein structure
• Proteins are synthesized linearly and then assume their tertiary structure by “folding.”
  - The exact mechanism is still unknown
• Proteins assume the lowest-energy structure
  - or sometimes an ensemble of low-energy structures
• Hydrophobic collapse drives the process
• Local (secondary) structure proclivities
• Internal stabilizers:
  - hydrogen bonds, disulphide bonds, salt bridges
CaM Kinase II structure
[Figure labels: serine-threonine protein kinase; calmodulin regulation; multimer formation: 12 subunits with the catalytic domains facing out]
sequence comparison
unc-43   --------------------MQLQQINSGAFSVVRRCVHKTTGLEFAAKIINTKKLSARD
rCaMKII  -------MATITCTRFTEEYQLFEELGKGAFSVVRRCVKVLAGQEYPAKIINTKKLSARD
hCaMKI   MLGAVEGPRWKQAEDIRDIYDFRDVLGTGAFSEVILAEDKRTQKLVAIKCIAKEALEGKE
rCaMKI   MPGAVEGPRWKQAEDIRDIYDFRDVLGTGAFSEVILAEDKRTQKLVAIKCIAKKALEGKE

unc-43   FQKLEREARICRKLQHPNIVRLHDSIQEESFHYLVFDLVTGGELFEDIVAREFYSEADAS
rCaMKII  HQKLEREARICRLLKHPNIVRLHDSISEEGHHYLIFDLVTGGELFEDIVAREYYSEADAS
hCaMKI   GS-MENEIAVLHKIKHPNIVALDDIYESGGHLYLIMQLVSGGELFDRIVEKGFYTERDAS
rCaMKI   GS-MENEIAVLHKIKHPNIVALDDIYESGGHLYLIMQLVSGGELFDRIVEKGFYTERDAS

unc-43   HCIQQILESIAYCHSNGIVHRDLKPENLLLASKAKGAAVKLADFGLAIEVN-DSEAWHGF
rCaMKII  HCIQQILEAVLHCHQMGVVHRDLKPENLLLASKLKGAAVKLADFGLAIEVEGEQQRWFGF
hCaMKI   RLIFQVLDAVKYLHDLGIVHRDLKPENLLYYSLDEDSKIMISDFGLSKMED-PGSVLSTA
rCaMKI   RLIFQVLDAVKYLHDLGIVHRDLKPENLLYYSLDEDSKIMISDFGLSKMED-PGSVLSTA

unc-43   AGTPGYLSPEVLKKDPYSKPVDIWACGVILYILLVGYPPFWDEDQHRLYAQIKAGAYDYP
rCaMKII  AGTPGYLSPEVLRKDPYGKPVDLWACGVILYILLVGYPPFWDEDQHRLYQQIKARAYDFP
hCaMKI   CGTPGYVAPEVLAQKPYSKAVDCWSIGVIAYILLCGYPPFYDENDAKLFEQILKAEYEFD
rCaMKI   CGTPGYVAPEVLAQKPYSKAVDCWSIGVIAYILLCGYPPFYDENDAKLFEQILKAEYEFD

unc-43   SPEWDTVTPEAKSLIDSMLTVNPKKRITADQALKVPWICNRERVASAIHRQDTVDCLKKF
rCaMKII  SPEWDTVTPEAKDLINKMLTINPSKRITAAEALKHPWISHRSTVASCMHRQETVDCLKKF
hCaMKI   SPYWDDISDSAKDFIRHLMEKDPEKRFTCEQALQHPWIAGDTALDKNIH-QSVSEQIKKN
rCaMKI   SPYWDDISDSAKDFIRHLMEKDPEKRFTCEQALQHPWIAGDTALDKNIH-QSVSEQIKKN

unc-43   NARRKLKGAILTTMIATRNLSSKRSYRLTLGAEKLVISMKNIEYWQVLLNKIFATYKIKM
rCaMKII  NARRKLKGAILTTMLATRNFSGG-----------------------------------KS
hCaMKI   FAKSKWKQAFNATAVVRHMR----------------------------------------
rCaMKI   FAKSKWKQAFNATAVVRHMR----------------------------------------
…continued
unc-43   KQCRNLLNKKEQGPPSTIKESSESS-QTIDDNDSEKGGGQLKHENTVVRADGATGIVSSS
rCaMKII  G--G---NKKNDG----VKESSESTNTTIEDED---------------------------

unc-43   NSSTASKSSSTNLSAQKQDIVRVTQTLLDAISCKDFETYTRLCDTSMTCFEPEALGNLIE
rCaMKII  ------------TKVRKQEIIKVTEQLIEAISNGDFESYTKMCDPGMTAFEPEALGNLVE

unc-43   GIEFHRFYFD--GNRKNQ-VHTTMLNPNVHIIGEDAACVAYVKLTQFLDRNGEAHTRQSQ
rCaMKII  GLDFHRFYFENLWSRNSKPVHTTILNPHIHLMGDESACIAYIRITQYLDAGGIPRTAQSE

unc-43   ESRVWSKKQGRWVCVHVHRSTQPSTNTTVSEF
rCaMKII  ETRVWHRRDGKWQIVHFHRSGAPSVLPH----
protein structure basics
• proteins consist mostly of α-helices, β-sheets, and turns.
• the α-helices and β-sheets typically form the framework of the protein.
• the turns and other atypical structures often play important binding and catalytic roles.
• the core of the protein is hydrophobic, whereas the surface is usually polar or charged.
• most turns and kinks contain glycines and prolines
protein structure
[Figure: alpha helix]
[Figure: three-stranded antiparallel β-sheet]
[Figure: three-stranded antiparallel β-sheet, space filled]
[Figure: substrate binding cleft]
rCaMKII  SPEWDTVTPEAKDLINKMLTINPSKRITAAEALKHPWISHRSTVASCMHRQETVDCLKKF
rCaMKI   SPYWDDISDSAKDFIRHLMEKDPEKRFTCEQALQHPWIAGDTALDKNIH-QSVSEQIKKN  297

rCaMKII  NARRKLKGAILTTMLATRN
rCaMKI   FAKSKWKQAFNATAVVRHM  316
sliced protein
red - charged
blue - polar
green - hydrophobic
protein structure
rCaMKII  HQKLEREARICRLLKHPNIVRLHDSISEEGHHYLIFDLVTGGELFEDIVAREYYSEADAS
rCaMKI   GS-MENEIAVLHKIKHPNIVALDDIYESGGHLYLIMQLVSGGELFDRIVEKGFYTERDAS  119

rCaMKII  HCIQQILEAVLHCHQMGVVHRDLKPENLLLASKLKGAAVKLADFGLAIEVEGEQQRWFGF
rCaMKI   RLIFQVLDAVKYLHDLGIVHRDLKPENLLYYSLDEDSKIMISDFGLSKMED-PGSVLSTA  178

rCaMKII  AGTPGYLSPEVLRKDPYGKPVDLWACGVILYILLVGYPPFWDEDQHRLYQQIKARAYDFP
rCaMKI   CGTPGYVAPEVLAQKPYSKAVDCWSIGVIAYILLCGYPPFYDENDAKLFEQILKAEYEFD  238
protein structure prediction
[Figure: protein (Goodsell, PDB) and model]
• the 3-D structure of proteins is used to understand protein function and to design new drugs
• Structural predictions just from the raw protein sequence?
  1. ggcacgaggc acggctgtgc aggcacgcat gcaggccagc ….
  2. atctgcacgt ggttatgctg ccggagtttg ggccgccact….
[Figure: sequence-feature plots for sequences 1 and 2: KD Hydrophobicity, Surface Prob., Flexibility, Antigenic Index, CF Turns, CF Alpha Helices, CF Beta Sheets, GOR Turns, GOR Alpha Helices, GOR Beta Sheets, Glycosylation Sites]
Particular structural features can be recognised in protein sequences
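The KD hydrophobicity track above is a sliding-window average of the Kyte-Doolittle hydropathy scale. In this minimal sketch the window length is an adjustable choice (9 by default here), and the function name is hypothetical.

```python
# Kyte-Doolittle hydropathy scale (one-letter amino acid codes).
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
      "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
      "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}

def kd_profile(seq, window=9):
    """Mean hydropathy over each window of `window` consecutive residues."""
    vals = [KD[aa] for aa in seq]
    return [sum(vals[i:i + window]) / window for i in range(len(vals) - window + 1)]

# Example: the hydrophobic stretch ...VILYILLVGYPPF... from the CaMKII alignment above.
profile = kd_profile("CGVILYILLVGYPPFWDEDQ")
```

Positive peaks suggest buried or membrane-spanning segments, and negative values suggest surface-exposed regions. This is only a rough indicator.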
structure prediction
Comparative modeling
• Modeling the structure of a protein that has a high degree of sequence identity with a protein of known structure
• Must be >30% identity to give a reliable structure
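The >30% criterion refers to percent sequence identity. The sketch below uses one common definition: identical positions divided by aligned columns in which neither sequence has a gap. Other definitions divide by the length of the shorter sequence, so values are not always comparable. The function name is hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """100 * identical positions / columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    same = sum(1 for x, y in pairs if x == y)
    return 100.0 * same / len(pairs)

# Two rows taken from the CaMKII alignment above.
pid = percent_identity("AGTPGYLSPEVLRKDPYGKPVDLWACGVILYILLVGYPPFWDEDQHRLYQQIKARAYDFP",
                       "CGTPGYVAPEVLAQKPYSKAVDCWSIGVIAYILLCGYPPFYDENDAKLFEQILKAEYEFD")
```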
statistical methods
• Residue conformational preferences:
  - helix: Glu, Ala, Leu, Met, Gln, Lys, Arg
  - strand: Val, Ile, Tyr, Cys, Trp, Phe, Thr
  - turn: Gly, Asn, Pro, Ser, Asp
• Chou-Fasman algorithm:
  - Identification of helix and sheet "nuclei"
    - helix: 4 out of 6 residues with high helix propensity
    - sheet: 3 out of 5 residues with high sheet propensity
  - Propagation until termination criteria are met
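The nucleation step of Chou-Fasman can be sketched using the residue classes on this slide. These classes stand in for the numerical propensity tables of the real algorithm, which this sketch does not reproduce. It also omits the propagation and termination rules. The names are hypothetical.

```python
# Residue classes from the slide (one-letter codes); a simplification of the
# numerical Chou-Fasman propensity parameters, for illustration only.
HELIX_FORMERS = set("EALMQKR")
STRAND_FORMERS = set("VIYCWFT")

def find_nuclei(seq, formers, window, needed):
    """Start indices of windows in which at least `needed` of `window` residues are formers."""
    return [i for i in range(len(seq) - window + 1)
            if sum(aa in formers for aa in seq[i:i + window]) >= needed]

def helix_nuclei(seq):
    return find_nuclei(seq, HELIX_FORMERS, window=6, needed=4)   # 4 of 6

def sheet_nuclei(seq):
    return find_nuclei(seq, STRAND_FORMERS, window=5, needed=3)  # 3 of 5
```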
structure prediction
Threading/fold recognition
• Uses known fold structures to predict the fold of a primary sequence.
inverse protein folding
• based on the assumption that there is a limited number of structural protein classes (folds). One attempts to assign a new protein sequence to one of these classes.
fold recognition/threading
...MLDTNMKTQL KAYLEKLT KPVELIATL DDSAKSAEIKELL...
[Figure: structure library]
structure prediction
Ab initio
• Predicting structure from primary sequence data
• Generate as many conformations as possible, and assign an energy score to each one
• When the search terminates (usually when resources run out), the conformation with the lowest energy score is selected
• Usually neither as robust nor as practical; computationally intensive
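The generate-and-score procedure can be sketched as a random search over backbone torsions on a toy energy function. Both the energy function and the sampling scheme are hypothetical. Real ab initio methods use physics-based or knowledge-based energies and much smarter search strategies.

```python
import math
import random

def toy_energy(angles):
    """Hypothetical torsional energy (radians): threefold terms plus a weak neighbour coupling."""
    e = sum(1.0 + math.cos(3.0 * a) for a in angles)
    e += 0.1 * sum(math.cos(a - b) for a, b in zip(angles, angles[1:]))
    return e

def random_search(n_angles, n_samples, seed=0):
    """Generate random conformations, score each one, keep the lowest-energy conformation."""
    rng = random.Random(seed)
    best, best_e = None, math.inf
    for _ in range(n_samples):
        conf = [rng.uniform(-math.pi, math.pi) for _ in range(n_angles)]
        e = toy_energy(conf)
        if e < best_e:
            best, best_e = conf, e
    return best, best_e

best, best_e = random_search(n_angles=4, n_samples=5000)
```

The number of conformations grows exponentially with the number of torsions. This is why the search usually ends when resources run out rather than when the global minimum is found.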
function prediction
• Key problem: predict the function of protein structures based on sequence and structure information
• Function is loosely defined, and can be thought of at many levels
  - Atomic or molecular level
  - Pathway level
  - Network level
  - Etc.
• Currently, relatively little progress has been made in function prediction, particularly for higher-order processes
function prediction
Experimentation
• Experimentally determine the function of proteins and other structures
• The “gold standard” of function determination
• Expensive in terms of time and money
current methods
function prediction
Annotation transfer
• When sequence or structure analysis yields correspondences between structures, the known properties and function of one are used to extrapolate the properties and function of the other
• This method has been extremely successful, but its drawbacks include [Bork et al., 1998]:
  - Similar sequence or structure does not always imply similar function
  - The annotated information about the “known” protein, or its sequence or structure information in the database, may be incomplete or incorrect
  - Generally, only the molecular functions of a protein can be inferred by analogy (i.e. not higher-level functions)
  - From a formal point of view, properties derived in this manner must be verified through experimentation
current methods
simulation-based analysis
• Simulation-based analysis tests hypotheses with in silico experiments, providing predictions to be tested by in vitro and in vivo studies.
  - faster and more economical
  - Example: Folding@Home
Folding@Home
• Simulates protein folding
• The fold dictates the function of the protein
• Christian Anfinsen’s unfolding experiments showed that unfolded proteins can refold spontaneously
• When proteins do not fold properly, this can lead to diseases such as Alzheimer’s disease, mad cow disease and Parkinson’s disease
• If the fold of the protein is known, it can also be unfolded
• Runs on a distributed system
• Runs as a screensaver
• Downloadable at: http://folding.stanford.edu
drug design
structure-based drug design
[Flowchart: target enzyme or receptor; 3-D structure by crystallography, NMR, electron microscopy or homology modeling; docking against 3-D ligand databases (linking or binding); receptor-ligand complex; lead molecule; random screening of compound databases, microbial broths, plant extracts and combinatorial libraries; synthesis; 3-D QSAR; testing; redesign to improve affinity, specificity etc.]
3D QSAR
• quantitative structure-activity relationships to calculate and predict
  - charge distribution, solubility,
  - hydrophobicity, lipophilicity
active sites
[Figure: drug target site: Glutathione-GR]
[Figure: drug target site: DHFR]
multiple alignments of DHFR
CLUSTAL W (1.81) multiple sequence alignment

chabaudi    -----------------------E--KAGCFSNKTFKGLGNEGGLPWKCNSVDMKHFSSV  35
vinckei     -----------AICACCKVLNSNE--KASCFSNKTFKGLGNAGGLPWKCNSVDMKHFVSV  47
berghei     MEDLSETFDIYAICACCKVLNDDE--KVRCFNNKTFKGIGNAGVLPWKCNLIDMKYFSSV  58
yoelii      -----------AICACCKVINNNE--KSGSFNNKTFNGLGNAGMLPWKYNLVDMNYFSSV  47
vivax       MEDLSDVFDIYAICACCKVAPTSEGTKNEPFSPRTFRGLGNKGTLPWKCNSVDMKYFSSV  60
falciparum  -------------------------KKNEVFNNYTFRGLGNKGVLPWKCNSLDMKYFCAV  35

chabaudi    TSYVNETNYMRLKWKRDRYMEK---------NNVKLNTDGIPSVDKLQNIVVMGKASWES  86
vinckei     TSYVNENNYIRLKWKRDKYIKE---------NNVKVNTDGIPSIDKLQNIVVMGKTSWES  98
berghei     TSYINENNYIRLKWKRDKYMEKHNLK-----NNVELNTNIISSTNNLQNIVVMGKKSWES  113
yoelii      TSYVNENNYIRLQWKRDKYMGKNNLK-----NNAELNNGELN--NNLQNVVVMGKRNWDS  100
vivax       TTYVDESKYEKLKWKRERYLRMEASQGGGDNTSGGDNTHGGDNADKLQNVVVMGRSSWES  120
falciparum  TTYVNESKYEKLKYKRCKYLNKET----------VDNVNDMPNSKKLQNVVVMGRTNWES  85

chabaudi    IPSKFKPLQNRINIILSRTLKKEDLAKEYN------NVIIINSVDDLFPILKCIKYYKCF  140
vinckei     IPSKFKPLENRINIILSRTLKKENLAKEYS------NVIIIKSVDELFPILKCIKYYKCF  152
berghei     IPKKFKPLQNRINIILSRTLKKEDIVNENN--NENNNVIIIKSVDDLFPILKCTKYYKCF  171
yoelii      IPPKFKPLQNRINIILSRTLKKEDIANEDNKNNENGTVMIIKSVDDLFPILKAIKYYKCF  160
vivax       IPKQYKPLPNRINVVLSKTLTKEDVK---------EKVFIIDSIDDLLLLLKKLKYYKCF  171
falciparum  IPKKFKPLSNRINVILSRTLKKEDFD---------EDVYIINKVEDLIVLLGKLNYYKCF  136

chabaudi    I-----------------------------------------------------------  141
vinckei     IIGGASVYKEFLDRNLIKKIYFTRINNAYT------------------------------  182
berghei     IIGGSSVYKEFLDRNLIKKIYFTRINNSYNCDVLFPEINENLFKITSISDVYYSNNTTLD  231
yoelii      IIGGSYVYKEFLDRNLIKKIYFTRINNSYN------------------------------  190
vivax       IIGGAQVYRECLSRNLIKQIYFTRINGAYPCDVFFPEFDESQFRVTSVSEVYNSKGTTLD  231
falciparum  I-----------------------------------------------------------  137

berghei     ----------------FIIYSKTKE  240
vivax       --------FLVYSKVGG  240
binding site analysis
• In the absence of a structure of the target-ligand complex, it is not a trivial exercise to locate the binding site!!!
• This is followed by lead optimization.
lead optimisation
[Figure: active site; lead; lead optimization]
drug design
factors affecting the affinity of a small molecule for a target protein
$$\text{LIGAND}\cdot\text{wat}_n + \text{PROTEIN}\cdot\text{wat}_m \rightleftharpoons \text{LIGAND}\cdot\text{PROTEIN}\cdot\text{wat}_p + (n+m-p)\,\text{wat}$$
• HYDROGEN BONDING
• HYDROPHOBIC EFFECT
• ELECTROSTATIC INTERACTIONS
• VAN DER WAALS INTERACTIONS
• STRAIN IN THE LIGAND (BOUND)
• STRAIN IN THE PROTEIN
difference between inhibitor and drug
Extra requirements of a drug compared to an inhibitor:
• Selectivity
• Less Toxicity
• Bioavailability
• Slow Clearance
• Reach The Target
• Ease Of Synthesis
• Low Price
• Slow Or No Development Of Resistance
• Stability Upon Storage As Tablet Or Solution
• Pharmacokinetic Parameters
• No Allergies
thermodynamics of receptor-ligand binding
 Proteins that interact with drugs are typically enzymes or receptors.
 Drugs may be classified as substrates/inhibitors (for enzymes) or agonists/antagonists (for receptors).
 Receptor ligands normally bind non-covalently and reversibly.
 Enzyme inhibitors act through a wide range of modes: non-covalent reversible, covalent reversible or irreversible, and suicide inhibition.
 Enzymes bind transition states (reaction intermediates) most tightly and may not bind substrates optimally, because part of the binding energy is used for catalysis.
 In contrast, inhibitors are designed to bind with high affinity: their affinities often exceed the corresponding substrate affinities by several orders of magnitude.
 Agonists are analogous to enzyme substrates: part of the binding energy may be used for signal transduction, inducing a conformational change or a shift in aggregation state.
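The "several orders of magnitude" comparison above can be put on an energy scale. A minimal sketch, assuming the standard relation dG = RT ln Kd with a 1 M standard state; the function name and the two Kd values (a 100 uM substrate and a 1 nM inhibitor) are hypothetical, chosen only to illustrate a 10^5-fold affinity gap:

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def binding_free_energy(kd_molar: float, temperature_k: float = 298.15) -> float:
    """Standard binding free energy in kcal/mol from a dissociation constant.

    dG = RT ln(Kd / c0), taking the standard concentration c0 = 1 M, so a
    tighter binder (smaller Kd) gives a more negative dG.
    """
    return R_KCAL * temperature_k * math.log(kd_molar)

if __name__ == "__main__":
    # Hypothetical values: a 100 uM substrate and a 1 nM inhibitor.
    for name, kd in [("substrate", 1e-4), ("inhibitor", 1e-9)]:
        print(f"{name}: Kd = {kd:.0e} M, dG = {binding_free_energy(kd):.2f} kcal/mol")
```

Each 10-fold decrease in Kd makes dG more negative by RT ln 10, about 1.36 kcal/mol at 25 °C, so the hypothetical 10^5-fold gap corresponds to roughly 6.8 kcal/mol.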
thermodynamics of receptor-ligand binding
 To understand which forces are responsible for ligand binding to receptors/enzymes, it is worth considering the forces that drive protein folding, since the two share many common features.
 The observed structure of a protein is largely a consequence of the hydrophobic effect.
 Secondary amides form much stronger H-bonds to water than to other secondary amides.
hydrophobic collapse
 Proteins generally bury hydrophobic residues in the core, while exposing hydrophilic residues to the exterior.
Salt bridges inside
 Ligand-binding clefts in proteins often expose hydrophobic residues to solvent and may contain partially desolvated hydrophilic groups that are not paired.
 The desolvation penalty is paid for by favourable (hydrophobic) interactions elsewhere in the structure.
docking methods
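 Docking of ligands to proteins is a formidable problem, since it entails optimization over at least the 6 rigid-body degrees of freedom of the ligand (3 translational and 3 rotational), plus its torsional degrees of freedom if the ligand is flexible.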
 Rigid vs Flexible
 Speed vs Reliability
 Manual Interactive Docking
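To make the 6 rigid-body degrees of freedom concrete: a rigid ligand pose can be written as three translations plus three rotation angles. A minimal sketch in plain Python; the function names, the z-y-x Euler-angle convention and rotating about the ligand centroid are illustrative assumptions (many docking programs parameterize rotations differently, e.g. with quaternions):

```python
import math

Point = tuple[float, float, float]
Matrix = list[list[float]]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    """3x3 matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def rotation(yaw: float, pitch: float, roll: float) -> Matrix:
    """Rz(yaw) @ Ry(pitch) @ Rx(roll), angles in radians (assumed convention)."""
    cz, sz = math.cos(yaw), math.sin(yaw)
    cy, sy = math.cos(pitch), math.sin(pitch)
    cx, sx = math.cos(roll), math.sin(roll)
    rz = [[cz, -sz, 0.0], [sz, cz, 0.0], [0.0, 0.0, 1.0]]
    ry = [[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]]
    rx = [[1.0, 0.0, 0.0], [0.0, cx, -sx], [0.0, sx, cx]]
    return matmul(rz, matmul(ry, rx))

def place_ligand(coords: list[Point], pose: tuple[float, ...]) -> list[Point]:
    """Apply a 6-DOF pose (tx, ty, tz, yaw, pitch, roll) to rigid ligand coordinates.

    The ligand is rotated about its own centroid and then translated.
    """
    tx, ty, tz, yaw, pitch, roll = pose
    r = rotation(yaw, pitch, roll)
    n = len(coords)
    centroid = [sum(p[i] for p in coords) / n for i in range(3)]
    shift = (tx, ty, tz)
    placed = []
    for p in coords:
        d = [p[i] - centroid[i] for i in range(3)]  # offset from centroid
        rotated = [sum(r[i][k] * d[k] for k in range(3)) for i in range(3)]
        placed.append(tuple(rotated[i] + centroid[i] + shift[i] for i in range(3)))
    return placed

if __name__ == "__main__":
    ligand = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]  # hypothetical atoms
    print(place_ligand(ligand, (5.0, 0.0, 0.0, math.pi / 2, 0.0, 0.0)))
```

A docking search then explores this 6-dimensional space (plus one extra dimension per rotatable bond for a flexible ligand), scoring each placed pose against the protein.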
grid-based docking methods
 Grid-based methods
 GRID (Goodford, 1985, J. Med. Chem. 28:849)
 GREEN (Tomioka & Itai, 1994, J. Comput. Aided Mol. Des. 8:347)
 MCSS (Miranker & Karplus, 1991, Proteins 11:29)
 Functional groups are placed at regularly spaced (0.3-0.5 Å) lattice points in the active site, and their interaction energies are evaluated.
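The grid idea can be sketched in a few lines: evaluate the interaction energy of a single probe at every lattice point of a box spanning the active site and keep the most favourable point. This is a hypothetical toy, not the energy function of GRID, GREEN or MCSS: the three "protein atoms", their charges and Lennard-Jones parameters, the probe charge and the distance-independent dielectric are made-up illustrative values, and the 0.4 Å spacing is simply one value inside the range quoted above.

```python
import math
from itertools import product

# Hypothetical protein atoms: (x, y, z, partial charge, LJ epsilon, LJ r_min)
PROTEIN_ATOMS = [
    (0.0, 0.0, 0.0, -0.5, 0.20, 3.5),
    (4.0, 0.0, 0.0, 0.4, 0.20, 3.5),
    (2.0, 3.0, 0.0, -0.3, 0.15, 3.8),
]
COULOMB = 332.06  # kcal*Angstrom/(mol*e^2)

def probe_energy(point, probe_charge=0.4, dielectric=4.0):
    """LJ 12-6 plus Coulomb energy (kcal/mol) of one probe at `point`."""
    energy = 0.0
    for ax, ay, az, q, eps, rmin in PROTEIN_ATOMS:
        r = max(math.dist(point, (ax, ay, az)), 0.5)  # avoid the singularity on an atom
        energy += eps * ((rmin / r) ** 12 - 2.0 * (rmin / r) ** 6)
        energy += COULOMB * q * probe_charge / (dielectric * r)
    return energy

def scan_grid(origin, size, spacing=0.4):
    """Evaluate the probe at every lattice point of a cubic box.

    Returns (best energy, best point) over the whole lattice.
    """
    steps = int(round(size / spacing)) + 1
    best = None
    for i, j, k in product(range(steps), repeat=3):
        p = (origin[0] + i * spacing, origin[1] + j * spacing, origin[2] + k * spacing)
        e = probe_energy(p)
        if best is None or e < best[0]:
            best = (e, p)
    return best

if __name__ == "__main__":
    energy, point = scan_grid(origin=(-1.0, -1.0, -3.0), size=6.0)
    print(f"most favourable probe position {point}: {energy:.2f} kcal/mol")
```

Storing the whole energy map, rather than only the minimum, is what lets grid methods display favourable regions for each functional group.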
automated docking methods
 The basic idea is to fill the active site of the target protein with a set of spheres.
 The sphere centres are then matched, as closely as possible, to the atoms of small molecules of known 3-D structure taken from a database.
 Examples:
 DOCK, CAVEAT, AUTODOCK, LEGEND, ADAM, LINKOR, LUDI.
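The sphere-matching step can be illustrated by comparing internal distances: an assignment of ligand atoms to sphere centres is acceptable when every atom-atom distance agrees with the corresponding centre-centre distance within a tolerance. A brute-force sketch under that assumption; programs such as DOCK use more efficient graph-matching algorithms, and the function name and coordinates below are hypothetical:

```python
import math
from itertools import combinations, permutations

Point = tuple[float, float, float]

def match_spheres(atoms: list[Point], centres: list[Point],
                  tolerance: float = 0.5) -> list[tuple[int, ...]]:
    """Return every mapping m (atom i -> centres[m[i]]) whose pairwise distances agree.

    Brute force over all ordered choices of distinct centres, so it is only
    suitable for a handful of atoms and spheres.
    """
    matches = []
    for m in permutations(range(len(centres)), len(atoms)):
        if all(
            abs(math.dist(atoms[a], atoms[b]) - math.dist(centres[m[a]], centres[m[b]])) <= tolerance
            for a, b in combinations(range(len(atoms)), 2)
        ):
            matches.append(m)
    return matches

if __name__ == "__main__":
    # Hypothetical active-site sphere centres (Angstrom) and a 3-atom ligand fragment.
    centres = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (0.0, 2.0, 0.0), (5.0, 5.0, 5.0)]
    ligand = [(10.0, 10.0, 10.0), (11.5, 10.0, 10.0), (10.0, 12.0, 10.0)]
    print(match_spheres(ligand, centres, tolerance=0.3))
```

A match only says which atom sits on which sphere; the ligand still has to be superimposed onto the matched centres (e.g. by a least-squares fit) and scored against the protein.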
drug binding pocket of L. casei DHFR
prediction & design of new drugs
 Prediction of the 3-D structure of PfDHFR by homology modeling, based on the bacterial DHFR structure.
 Search for compounds with bifunctional basic groups that could form stable, coplanar H-bonds with a carboxyl group.
 Optimize the structures of these small molecules and then dock them into the PfDHFR model.
 Using this approach, Toyoda et al. (1997, BBRC 235:515-519) identified two compounds.
identifying new leads
 These two compounds, a triazinobenzimidazole and a pyridoindole, were found to be active against recombinant wild-type DHFR, with good inhibitory potency (i.e. a low Ki).
 This demonstrates the use of molecular modeling in antimalarial drug design.
physiome project
virtual human
Simulation of complex models of cells, tissues and organs
http://www.physiome.org/
physiome project
 “A worldwide effort to define the physiome by
developing databases and models which will facilitate
the understanding of the integrative functions of cells,
organs and organisms.”
definition
Physiome is the quantitative and integrated description
of the functional behavior of the physiological state of
an individual or species.
physiome project
main objective:
“… to understand and describe the human organism, its physiology and pathophysiology quantitatively, and to use this understanding to improve human health.”
physiome project
Specific Objectives:
1. To develop a database of observations of physiological phenomena and to interpret these in terms of mechanism (reductionism).
2. To integrate experimental information into quantitative
descriptions of the functioning of humans and other
organisms (modern integrative biology glued together
via modeling).
3. To disseminate experimental data and integrative
models for teaching and research.
physiome project
Specific Objectives:
4. To foster collaboration amongst investigators worldwide,
in an effort to speed up the discovery of how biological
systems work.
5. To determine the most effective targets (molecules or
systems) for therapy, either pharmaceutical or genomic.
6. To provide information for the design of tissue-engineered, biocompatible implants.
physiome project
Issues being addressed:
1. Markup languages
-- development of SBML (at Caltech) for representing biochemical networks, and of CellML for electrophysiology, mechanics, energetics and general pathways.
2. Mathematical models
-- development of models that are “anatomically based” and “biophysically based” to link gene, protein, cell, tissue, organ and whole-body systems physiology.
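To give a flavour of what such a markup language looks like, the sketch below writes a minimal SBML skeleton (one compartment, two species, one irreversible reaction S -> P) using Python's standard-library ElementTree. It assumes SBML Level 3 Version 2 core; the model, species and reaction ids are hypothetical, the reaction has no kinetic law or units, and the output has not been checked with an SBML validator (in practice a library such as libSBML would normally be used).

```python
import xml.etree.ElementTree as ET

SBML_NS = "http://www.sbml.org/sbml/level3/version2/core"  # assumed: L3V2 core
ET.register_namespace("", SBML_NS)  # serialize SBML as the default namespace

def q(tag: str) -> str:
    """Qualify a tag name with the SBML core namespace."""
    return f"{{{SBML_NS}}}{tag}"

def build_model(model_id: str) -> ET.Element:
    """Build a toy SBML document: compartment 'cell', species S and P, reaction S -> P."""
    sbml = ET.Element(q("sbml"), {"level": "3", "version": "2"})
    model = ET.SubElement(sbml, q("model"), {"id": model_id})

    compartments = ET.SubElement(model, q("listOfCompartments"))
    ET.SubElement(compartments, q("compartment"),
                  {"id": "cell", "spatialDimensions": "3", "size": "1", "constant": "true"})

    species = ET.SubElement(model, q("listOfSpecies"))
    for sid, conc in [("S", "10"), ("P", "0")]:
        ET.SubElement(species, q("species"), {
            "id": sid, "compartment": "cell", "initialConcentration": conc,
            "hasOnlySubstanceUnits": "false", "boundaryCondition": "false",
            "constant": "false"})

    reactions = ET.SubElement(model, q("listOfReactions"))
    rxn = ET.SubElement(reactions, q("reaction"), {"id": "conversion", "reversible": "false"})
    ET.SubElement(ET.SubElement(rxn, q("listOfReactants")), q("speciesReference"),
                  {"species": "S", "stoichiometry": "1", "constant": "true"})
    ET.SubElement(ET.SubElement(rxn, q("listOfProducts")), q("speciesReference"),
                  {"species": "P", "stoichiometry": "1", "constant": "true"})
    return sbml

if __name__ == "__main__":
    print(ET.tostring(build_model("toy_pathway"), encoding="unicode"))
```

Because the format is plain XML with agreed element names, any SBML-aware simulator can read the same model, which is the point of the markup-language effort.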
physiome project
Issues being addressed:
3. Web-accessible databases
-- Groups at MIT and UCSD are developing standards for easy data exchange.
Example databases: Genomic Databases, Protein
Databases, Material Property Databases, Anatomical
Model Databases, Clinical Databases
4. Development of new instrumentation
5. Development of modeling tools, GUIs and web-accessible tools for visualization of complex models.
physiome project
1. Microcirculation
A functional system common to the organs; it provides an important coupling between cells, tissues and organs.
http://www.bme.jhu.edu/news/microphys
physiome project
2. Musculo-skeletal system
Continues to extend the database of parameterised bone geometry to individual muscles, ligaments and tendons.
(a) Anatomically detailed model of the skeleton.
(b) Rendered finite element mesh for the bones and a subset of the muscles.
http://www.bioeng.auckland.ac.nz/projects/nerf/skeletal.php
physiome project
Computational model of the skull and torso: (a) the layer of skeletal muscle is highlighted; (b) the heart and lungs are shown within the torso.
physiome project
3. Cardiome Project
An attempt to provide an integrated model of the heart, incorporating electrical activation, mechanical contraction, energy supply and utilization, cell signaling and many other biochemical processes.
Heart model with a textured epicardial surface
physiome project
Fibrous-sheet architecture of the heart. Ribbons are drawn in the plane of the myocardial sheets (a) on the epicardial surface of the heart, (b) at midwall and (c) on the endocardial surface. Note the large changes in fibre angle. These fibre-sheet material axes are needed for computation of both myocardial activation and ventricular mechanics.
heart structure
physiome project
The finite element model of the
right and left ventricles of the
heart showing various anatomical
structures. Geometric information
is carried at the nodes of the finite
element mesh and interpolated
with cubic Hermite basis functions.
heart structure
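The cubic Hermite interpolation mentioned in the caption can be shown in one dimension: each node carries both a value and a derivative, so neighbouring elements share slopes as well as positions. A minimal sketch of the four standard 1-D basis functions on the element coordinate xi in [0, 1]; the function names are hypothetical, and real cardiac meshes use tensor products of these in 3-D together with scale factors, which are omitted here.

```python
def hermite_basis(xi: float) -> tuple[float, float, float, float]:
    """The four 1-D cubic Hermite basis functions at element coordinate xi in [0, 1]."""
    psi1 = 1.0 - 3.0 * xi**2 + 2.0 * xi**3  # weights the value at node 1
    psi2 = xi * (xi - 1.0) ** 2              # weights the derivative at node 1
    psi3 = xi**2 * (3.0 - 2.0 * xi)          # weights the value at node 2
    psi4 = xi**2 * (xi - 1.0)                # weights the derivative at node 2
    return psi1, psi2, psi3, psi4

def interpolate(u1: float, du1: float, u2: float, du2: float, xi: float) -> float:
    """Interpolate a nodal quantity (e.g. one coordinate) along a single element.

    u1, u2 are nodal values; du1, du2 are nodal derivatives with respect to xi.
    """
    p1, p2, p3, p4 = hermite_basis(xi)
    return p1 * u1 + p2 * du1 + p3 * u2 + p4 * du2

if __name__ == "__main__":
    # f(xi) = xi**3: f(0) = 0, f'(0) = 0, f(1) = 1, f'(1) = 3
    print(interpolate(0.0, 0.0, 1.0, 3.0, 0.5))
```

Because cubic Hermite interpolation is exact for cubics, interpolating f(xi) = xi^3 from its nodal values and slopes reproduces f(0.5) = 0.125.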
physiome project
Mechanics of the cardiac cycle, computed by large-deformation finite element analysis: (a) zero-pressure state, (b) end-diastole, (c) mid-systole, (d) end-systole. Note the apex-to-base shortening and the twisting about the long axis. Also note the six generations of discretely modeled coronary vessels embedded within the myocardial elements, which are used to compute coronary flow throughout the cardiac cycle.
ventricular mechanics
physiome project
The collagenous structure of the
extra-cellular myocardial tissue
matrix, as revealed by confocal
microscopy. The material axes
used for defining mechanical and
electrical constitutive laws in the
continuum modeling of the
myocardium are based on these
microstructurally defined axes.
ventricular mechanics
physiome project
Activation wave front computed on the finite element model using
finite difference techniques based on grid points which move with the
deforming myocardium. Bi-domain current conservation equations are
solved with trans-membrane ionic currents. The stimulus in this case is a
point on the left ventricular endocardial surface near the apex. The
activation sequence is heavily influenced by the fibrous-sheet
architecture of the myocardium.
myocardial activation
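The strong influence of fibre architecture on the activation sequence can be illustrated with a much cruder model than the bidomain equations described above. The sketch below swaps in a graph-based eikonal approximation: activation spreads from a stimulus through a 2-D grid using Dijkstra's shortest-path algorithm, with a faster conduction velocity along the fibres (assumed to run uniformly along x) than across them. The function name, the velocities (0.6 and 0.2 mm/ms), the 1 mm spacing and the uniform fibre direction are illustrative assumptions, not values from the model described here.

```python
import heapq
import math

def activation_times(n_x: int, n_y: int, stimulus: tuple[int, int],
                     v_fibre: float = 0.6, v_cross: float = 0.2,
                     h: float = 1.0) -> list[list[float]]:
    """Earliest activation time (ms) at each node of an n_x-by-n_y grid.

    Fibres are assumed to run along x. Velocities are in mm/ms and h is the
    grid spacing in mm. Each edge's travel time uses an elliptical
    (anisotropic) metric: sqrt((dx/v_fibre)^2 + (dy/v_cross)^2).
    """
    times = [[math.inf] * n_y for _ in range(n_x)]
    sx, sy = stimulus
    times[sx][sy] = 0.0
    heap = [(0.0, sx, sy)]
    # 8-neighbour stencil so that oblique directions are also represented.
    moves = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]
    while heap:
        t, x, y = heapq.heappop(heap)
        if t > times[x][y]:
            continue  # stale queue entry
        for dx, dy in moves:
            i, j = x + dx, y + dy
            if 0 <= i < n_x and 0 <= j < n_y:
                step = math.hypot(dx * h / v_fibre, dy * h / v_cross)
                if t + step < times[i][j]:
                    times[i][j] = t + step
                    heapq.heappush(heap, (t + step, i, j))
    return times

if __name__ == "__main__":
    t = activation_times(21, 21, stimulus=(10, 10))
    print(f"10 mm along fibres: {t[20][10]:.1f} ms, 10 mm across fibres: {t[10][20]:.1f} ms")
```

With these assumed velocities the wave needs about 16.7 ms to travel 10 mm along the fibres but 50 ms across them; rotating the fibre direction from node to node, as in the real myocardium, would reshape the activation sequence accordingly.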
physiome project
Computed flow in the coronary
vasculature
coronary perfusion
physiome project
Epicardial Fibers – FEM Model
www.ccmb.jhu.edu
ventricular fluid flow
Endocardial Fibers – FEM Model
physiome project
Human torso
A model has been developed that includes the heart, lungs and the layers of skeletal muscle, fat and skin. Current flow from the heart into the torso is computed in order to predict the body-surface potentials arising from activation of the myocardium.
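The torso model computes current flow through a realistic, inhomogeneous torso. As a much simpler stand-in, the textbook forward problem for a single current dipole in an infinite, homogeneous volume conductor gives the potential phi = (p . r) / (4 pi sigma |r|^3). A minimal sketch under those assumptions; the function name, conductivity (0.2 S/m), dipole moment and electrode ring are hypothetical, and torso boundaries as well as the lung, muscle, fat and skin layers are ignored.

```python
import math

Vec = tuple[float, float, float]

def dipole_potential(obs: Vec, source: Vec, moment: Vec, sigma: float = 0.2) -> float:
    """Potential (V) at `obs` due to a current dipole at `source`.

    Infinite homogeneous conductor: phi = (p . r) / (4 pi sigma |r|^3),
    with r = obs - source. Units: metres, A*m for p, S/m for sigma.
    """
    r = [o - s for o, s in zip(obs, source)]
    dist = math.sqrt(sum(c * c for c in r))
    return sum(p * c for p, c in zip(moment, r)) / (4.0 * math.pi * sigma * dist ** 3)

if __name__ == "__main__":
    heart = (0.0, 0.0, 0.0)
    p = (1e-3, 0.0, 0.0)  # hypothetical dipole pointing along +x
    # Eight hypothetical electrodes on a 15 cm ring around the source.
    for k in range(8):
        angle = 2 * math.pi * k / 8
        electrode = (0.15 * math.cos(angle), 0.15 * math.sin(angle), 0.0)
        mv = 1e3 * dipole_potential(electrode, heart, p)
        print(f"{math.degrees(angle):5.0f} deg: {mv:+.3f} mV")
```

The antisymmetric pattern (positive ahead of the dipole, negative behind, zero at right angles) is the basic idea behind interpreting body-surface potentials; the full torso model replaces the infinite medium with the actual geometry and tissue conductivities.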
physiome project
4. Lungs
Development of models of the
integrated function of various
physical processes operating in the
lung.
5. Bladder and Prostate
An anatomically detailed model of the bladder and prostate has been developed.
6. Circulation System
A model of the circulation system is being developed based on the Visible Human Project dataset (http://www.nlm.nih.gov/research/visible)
future
Development of Precision Models
 Simulation requires the integration of multiple hierarchies
of models that have different scales and qualitative
properties
 Some biological processes take place within milliseconds
while others may take hours or days
Example: Protein folding vs. Cell Mitosis
future
Development of Precision Models
 Biological processes can involve the interaction of
different types of processes
(e.g. biochemical networks coupled to protein transport, chromosome dynamics, cell migration or morphological changes in tissues)
future
Development of Precision Models
 Types of modeling:
 Using differential equations and stochastic
simulation
 Many cell biological phenomena require calculation of structural dynamics:
 Deformation of elastic bodies
 Spring-mass models and other physical processes
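As an illustration of the stochastic-simulation side, the sketch below implements Gillespie's stochastic simulation algorithm for a hypothetical birth-death process: 0 -> X at rate k_syn and X -> 0 at rate k_deg * n. The corresponding differential equation, dn/dt = k_syn - k_deg * n, has the steady state n = k_syn / k_deg; the stochastic copy number fluctuates around that value. The function name, rate constants and run counts are illustrative assumptions.

```python
import random

def gillespie_birth_death(k_syn: float, k_deg: float, n0: int, t_end: float,
                          rng: random.Random) -> int:
    """Copy number at t_end for 0 -> X (rate k_syn) and X -> 0 (rate k_deg * n)."""
    t, n = 0.0, n0
    while True:
        a_syn = k_syn            # propensity of synthesis
        a_deg = k_deg * n        # propensity of degradation
        a_total = a_syn + a_deg
        if a_total == 0.0:
            return n             # no reaction can fire any more
        t += rng.expovariate(a_total)  # exponential waiting time to the next event
        if t > t_end:
            return n
        # Choose which reaction fires, with probability proportional to its propensity.
        if rng.random() * a_total < a_syn:
            n += 1
        else:
            n -= 1

if __name__ == "__main__":
    rng = random.Random(1)
    samples = [gillespie_birth_death(10.0, 1.0, 0, 10.0, rng) for _ in range(2000)]
    print("mean copy number at t = 10:", sum(samples) / len(samples))
```

Unlike the ODE, each run gives a different trajectory; averaging many runs recovers the deterministic steady state (here k_syn / k_deg = 10), while the spread between runs shows the molecular noise that the ODE ignores.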
the end