Transcript Powerpoint
Thinking Outside the Box:
Applications Including Finding
Off-targets for Major
Pharmaceuticals
Philip E. Bourne
[email protected]
Agenda
• Overall Theme - Thinking differently
about proteins:
– Spherical harmonics and phylogeny
– The Gaussian Network Model and new
modes of motion
– The Geometric Potential for Describing
Ligand Binding Sites
– SOIPPA for finding off-site targets
The Curse of the Ribbon
The conventional view of a protein (left) has had a remarkable impact on our
understanding of living systems, but it's time for new views. It is not how a
ligand sees a protein, after all.
Limitations
• A local viewpoint does not capture the
global properties of the protein
• Cartesian coordinates do not
necessarily capture the properties of the
protein
• Comparative analysis is limited
Agenda
• Overall Theme - Thinking differently
about proteins:
– Spherical harmonics and phylogeny
– The Gaussian Network Model and new
modes of motion
– The Geometric Potential for Describing
Ligand Binding Sites
– SOIPPA for finding off-site targets
Protein Kinase A – Open Book
View
Scheeff & Bourne 2005 PLoS Comp. Biol. 1(5): e49
Superfamily Members – The
Same But Different
Scheeff & Bourne 2005 PLoS Comp. Biol. 1(5): e49
An Alternative Approach:
Multipolar Representation
• Roots in spherical harmonics
• Parameter space and boundary
conditions can be a variety of properties
• Order of the multipoles defines the
granularity of the descriptors
• Bottom line – interpreted as shape
descriptors
Gramada & Bourne 2006 BMC Bioinformatics 7:242
Geometric Comparison Does
Not Reflect Biological Reality
Gramada & Bourne 2006 BMC Bioinformatics 7:242
Results – Protein Kinase Like Superfamily
Alignment
Clear distinction
between families.
Some clustering
seen inside TPKs
that resemble
various groups,
even though there is
little shape
discrimination at this
level.
Gramada & Bourne 2006 BMC Bioinformatics 7:242
Results – Protein Kinase Like Superfamily
Alignment
Gramada & Bourne 2006 BMC Bioinformatics 7:242
Possibilities – Structure Based Phylogenetic
Analysis
Scheeff & Bourne
Multipoles
Gramada & Bourne 2007 PLoS ONE submitted
Agenda
• Overall Theme - Thinking differently
about proteins:
– Spherical harmonics and phylogeny
– The Gaussian Network Model and new
modes of motion
– The Geometric Potential for Describing
Ligand Binding Sites
– SOIPPA for finding off-site targets
Protein Motion
Ordered
Structures
Disordered
Structures
Structures exist in a spectrum from
order to disorder
Gu, Gribskov & Bourne 2006 PLoS Comp. Biol. 2(7) e90
Obtaining Protein Dynamic Information
Protein Structures Treated as a
3-D Elastic Network
Bahar, I., A.R. Atilgan, and B. Erman
Direct evaluation of thermal fluctuations in proteins using a single-parameter harmonic potential.
Folding & Design, 1997. 2(3): p. 173-181.
Gaussian Network Model
• Each Cα is a node in the network.
• Each node undergoes Gaussian-distributed
fluctuations influenced by neighboring interactions
within a given cutoff distance (7 Å).
• Decompose protein fluctuation into a summation of
different modes.
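A minimal sketch of the elastic network described above, assuming the Cα coordinates are already available as (x, y, z) tuples in Å. The function name `kirchhoff_matrix` and the toy coordinates are hypothetical and not taken from the talk. In the usual GNM formulation the modes come from the eigen-decomposition of this connectivity matrix; that step is left out here to keep the example dependency-free.

```python
import math

def kirchhoff_matrix(ca_coords, cutoff=7.0):
    """Build the GNM connectivity (Kirchhoff) matrix from C-alpha coordinates.

    Off-diagonal entries are -1 for node pairs within the cutoff, 0 otherwise;
    each diagonal entry is the number of contacts of that node.
    """
    n = len(ca_coords)
    gamma = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(ca_coords[i], ca_coords[j]) <= cutoff:
                gamma[i][j] = gamma[j][i] = -1.0
                gamma[i][i] += 1.0
                gamma[j][j] += 1.0
    return gamma

# Hypothetical toy chain: four nodes spaced 3.8 Å apart along the x axis.
toy_chain = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
gamma = kirchhoff_matrix(toy_chain)
for row in gamma:
    print(row)
```

With the 7 Å cutoff only sequence neighbors are in contact in this toy chain, so the end nodes have one contact and the inner nodes have two.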
Functional Flexibility Score
• Utilize correlated movements to help
define regional flexibility with functional
importance.
Functionally Flexible
Score
For each residue:
1. Find Maximum and
Minimum Correlation.
2. Use to scale normalized
fluctuation to determine
functional importance.
Gu, Gribskov & Bourne 2006 PLoS Comp. Biol. 2(7) e90
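The two steps above can be sketched as follows. The slide does not give the exact formula, so the scaling used here (normalized fluctuation multiplied by half the range between a residue's maximum and minimum correlation) is an assumption made only for illustration, as are the function name and the toy numbers.

```python
def functional_flexibility_sketch(norm_fluct, corr):
    """Scale each residue's normalized fluctuation by its correlation range.

    norm_fluct: normalized fluctuation per residue.
    corr: square matrix of residue-residue motion correlations in [-1, 1].
    The combination rule is hypothetical, not the published score.
    """
    scores = []
    for i, fluct in enumerate(norm_fluct):
        # Step 1: maximum and minimum correlation with every other residue.
        others = [c for j, c in enumerate(corr[i]) if j != i]
        c_max, c_min = max(others), min(others)
        # Step 2: use the correlation range to scale the fluctuation.
        scores.append(fluct * (c_max - c_min) / 2.0)
    return scores

# Hypothetical three-residue example.
toy_fluct = [0.2, 1.0, 0.5]
toy_corr = [
    [1.0, 0.5, -0.5],
    [0.5, 1.0, 0.0],
    [-0.5, 0.0, 1.0],
]
toy_scores = functional_flexibility_sketch(toy_fluct, toy_corr)
print(toy_scores)
```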
Identifying FFRs in HIV Protease
Gu, Gribskov & Bourne 2006 PLoS Comp. Biol. 2(7) e90
Other Examples BPTI and Calmodulin
Gu, Gribskov & Bourne 2006 PLoS Comp. Biol. 2(7) e90
Side Note: Gaussian Network
Model vs Molecular Dynamics
• GNM relatively coarse grained
• GNM fast to compute vs MD
–Look over larger time scales
–Suitable for high throughput
Agenda
• Overall Theme - Thinking differently
about proteins:
– Spherical harmonics and phylogeny
– The Gaussian Network Model and new
modes of motion
– The Geometric Potential for Describing
Ligand Binding Sites
– SOIPPA for finding off-site targets
Motivation
• What if we can characterize a protein-ligand binding site from a 3D structure
(primary site) and search for that site on
a proteome wide scale?
• We could perhaps find alternative
binding sites (secondary sites) for
existing pharmaceuticals?
• We could use it for lead optimization
and possible ADME/Tox prediction
Background – PDB Contains Major
Pharmaceuticals Bound to Receptors
Generic Name | Other Name | Treatment | PDBid
Lipitor | Atorvastatin | High cholesterol | 1HWK, 1HW8…
Testosterone | Testosterone | Osteoporosis | 1AFS, 1I9J ..
Taxol | Paclitaxel | Cancer | 1JFF, 2HXF, 2HXH
Viagra | Sildenafil citrate | ED, pulmonary arterial hypertension | 1TBF, 1UDT, 1XOS..
Digoxin | Lanoxin | Congestive heart failure | 1IGJ
Background – Superfamily
(Derived from Structure) Covers
38% of the Human Proteome
http://supfam.mrc-lmb.cam.ac.uk/SUPERFAMILY
Background – Advantage to Using Functional
Site Similarity
Small molecule similarity
• Poor correlation between structure and activity
• Infinite chemical space
Protein sequence/structure similarity
• Not adequately reflecting functional relationship
• Not directly addressing drug design problem
Protein functional site similarity
• Build closer structure-function relationships
• Limit chemical space through co-evolution
Overview of Algorithm
Protein structure is represented with Cα atoms only and is
characterized with a geometric potential
• tolerant to protein flexibility and model uncertainty
Optimum superimposition is achieved with a maximum
weighted sub-graph algorithm with geometric constraints
• sequence order independent to detect cross-fold
relationships
• to identify sub site similarity
Functional site similarity is measured with both
evolutionary correlation and physicochemical similarity
• to distinguish divergent and convergent evolution
Xie and Bourne 2007 BMC Bioinformatics, 8(Suppl 4):S9
Characterization of the Ligand Binding
Site - Conceptual
1. Represent the protein
structure
2. Determine the
environmental
boundary
3. Determine the protein
boundary
4. Computation of the
geometric potential
5. Computation of the
virtual ligand
Xie and Bourne 2007 BMC Bioinformatics, 8(Suppl 4):S9
Characterization of the Ligand Binding
Site - Conceptual
Conceptually similar to hydrophobicity
or electrostatic potential that is
dependent on both global and local
environments
• Initially assign each Cα atom
a value that is the distance
to the environmental
boundary
• Update the value with those
of surrounding Cα atoms
dependent on distances and
orientation – atoms within a
10 Å radius define the neighbors i
$GP = P + \sum_{i \in \mathrm{neighbors}} \frac{P_i}{D_i + 1.0} \times \frac{\cos(\alpha_i) + 1.0}{2.0}$
Xie and Bourne 2007 BMC Bioinformatics, 8(Suppl 4):S9
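A minimal sketch of the update rule on this slide, reading the formula as GP = P + Σ over neighbors of P_i/(D_i + 1.0) × (cos(α_i) + 1.0)/2.0. It assumes that P is the initial boundary distance of the residue, that D_i is the Cα-Cα distance to neighbor i, and that α_i is the angle between per-residue direction vectors. The slide does not define α_i precisely, so the `directions` input, the function names and the toy data are all hypothetical.

```python
import math

def cos_angle(u, v):
    """Cosine of the angle between two 3D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def geometric_potential(index, coords, directions, boundary_dist, radius=10.0):
    """Geometric potential of one residue from its neighbors within `radius` Å."""
    gp = boundary_dist[index]  # initial value P: distance to the boundary
    for i, xyz in enumerate(coords):
        if i == index:
            continue
        dist = math.dist(coords[index], xyz)
        if dist > radius:
            continue  # only atoms within the radius count as neighbors
        cos_a = cos_angle(directions[index], directions[i])
        gp += boundary_dist[i] / (dist + 1.0) * (cos_a + 1.0) / 2.0
    return gp

# Hypothetical toy data: three C-alpha atoms, the third beyond the 10 Å radius.
toy_coords = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
toy_boundary = [2.0, 4.0, 8.0]
same_dirs = [(0.0, 0.0, 1.0)] * 3
opposite_dirs = [(0.0, 0.0, 1.0), (0.0, 0.0, -1.0), (0.0, 0.0, 1.0)]

gp_same = geometric_potential(0, toy_coords, same_dirs, toy_boundary)
gp_opposite = geometric_potential(0, toy_coords, opposite_dirs, toy_boundary)
print(gp_same, gp_opposite)
```

In this reading a neighbor pointing the same way contributes its full distance-damped value, while a neighbor pointing the opposite way contributes nothing.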
Discrimination Power of the Geometric
Potential
• Geometric potential can distinguish binding and non-binding sites
[Figure: distributions for binding sites and non-binding sites plotted against the Geometric Potential Scale]
Boundary Accuracy of Ligand Binding Site
Prediction
[Figure: two histograms, Distribution (%) versus Sensitivity (%) and Distribution (%) versus Specificity (%)]
• ~90% of the binding sites can be identified with sensitivity above 50%
• For ~70% of the binding sites identified, the specificity is above 90%
So Far…
• Geometric potential dependent on local
environment of a residue – relative to other
residues and the environmental boundary
• Geometric potential reasonably good at
discriminating between ligand binding sites
and non-ligand binding sites
• Boundary of the binding site reasonably well
defined
• How to compare sites ???
Agenda
• Overall Theme - Thinking differently
about proteins:
– Spherical harmonics and phylogeny
– The Gaussian Network Model and new
modes of motion
– The Geometric Potential for Describing
Ligand Binding Sites
– SOIPPA for finding off-site targets
Identification of Functional Similarity with
Local Sequence Order Independent Alignment
• Geometric and graph characterization of the
protein structure
• Chemical similarity matrix and evolutionary
relationship with profile-profile comparison
• Optimum alignment with maximum-weight subgraph algorithm
Xie and Bourne 2007 PNAS, Submitted
Similarity Matrix of Alignment
Chemical Similarity
• Amino acid grouping: (LVIMC), (AGSTP), (FYW), and
(EDNQKRH)
• Amino acid chemical similarity matrix
Evolutionary Correlation
• Amino acid substitution matrix such as BLOSUM45
• Similarity score between two sequence profiles
$d = \sum_{i} f_a^i S_b^i + \sum_{i} f_b^i S_a^i$
f_a, f_b are the 20 amino acid target frequencies of profile a
and b, respectively
S_a, S_b are the PSSM of profile a and b, respectively
Xie and Bourne 2007 PNAS, Submitted
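A minimal sketch of the profile-profile score on this slide, read as d = Σ_i f_a^i S_b^i + Σ_i f_b^i S_a^i for one pair of aligned profile positions. The function name and the three-letter toy alphabet are hypothetical; real profiles would have 20 entries per position.

```python
def profile_profile_score(f_a, s_a, f_b, s_b):
    """Score two profile positions from target frequencies (f) and PSSM rows (S).

    Each profile's frequencies are scored against the other profile's PSSM,
    and the two cross terms are added.
    """
    if not (len(f_a) == len(s_a) == len(f_b) == len(s_b)):
        raise ValueError("all four vectors must have the same length")
    a_on_b = sum(fa * sb for fa, sb in zip(f_a, s_b))
    b_on_a = sum(fb * sa for fb, sa in zip(f_b, s_a))
    return a_on_b + b_on_a

# Hypothetical toy position over a three-letter alphabet.
toy_f_a, toy_s_a = [0.5, 0.5, 0.0], [2.0, 1.0, -3.0]
toy_f_b, toy_s_b = [1.0, 0.0, 0.0], [4.0, -1.0, -2.0]
toy_d = profile_profile_score(toy_f_a, toy_s_a, toy_f_b, toy_s_b)
print(toy_d)
```

The score is symmetric in the two profiles by construction, which is convenient when it is used as a node weight in a graph.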
Local Sequence-order Independent Alignment
with Maximum-Weight Sub-Graph Algorithm
[Figure: Structure A and Structure B, each showing the residue fragments LER and VKDL]
• Build an associated graph from the graph representations of two
structures being compared. Each of the nodes is assigned with a
weight from the similarity matrix
• The maximum-weight clique corresponds to the optimum alignment
of the two structures
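The two bullets above can be sketched as follows, assuming each structure is a list of (amino acid, Cα coordinate) pairs. Node weights use the amino acid grouping (LVIMC), (AGSTP), (FYW), (EDNQKRH) with a hypothetical 0/1 score, edges require matching inter-residue distances within a hypothetical tolerance, and the clique search is a plain exhaustive branch-and-bound that is only suitable for tiny graphs. None of this is the authors' implementation.

```python
import math
from itertools import combinations

GROUPS = ("LVIMC", "AGSTP", "FYW", "EDNQKRH")

def group_similarity(a, b):
    """Hypothetical weight: 1.0 if both residues share a group, else 0.0."""
    return 1.0 if any(a in g and b in g for g in GROUPS) else 0.0

def association_graph(struct_a, struct_b, tol=0.5):
    """Nodes pair residue i of A with residue j of B; edges join compatible pairs."""
    nodes = {}
    for i, (aa_i, _) in enumerate(struct_a):
        for j, (aa_j, _) in enumerate(struct_b):
            weight = group_similarity(aa_i, aa_j)
            if weight > 0:
                nodes[(i, j)] = weight
    adj = {node: set() for node in nodes}
    for (i, j), (k, l) in combinations(nodes, 2):
        if i == k or j == l:
            continue  # a residue may be matched only once
        d_a = math.dist(struct_a[i][1], struct_a[k][1])
        d_b = math.dist(struct_b[j][1], struct_b[l][1])
        if abs(d_a - d_b) <= tol:
            adj[(i, j)].add((k, l))
            adj[(k, l)].add((i, j))
    return nodes, adj

def max_weight_clique(nodes, adj):
    """Exhaustive search for the clique with the largest total node weight."""
    best = {"weight": 0.0, "clique": []}

    def extend(clique, weight, candidates):
        if weight > best["weight"]:
            best["weight"], best["clique"] = weight, list(clique)
        if weight + sum(nodes[c] for c in candidates) <= best["weight"]:
            return  # bound: no extension can beat the best clique so far
        for idx, node in enumerate(candidates):
            rest = [c for c in candidates[idx + 1:] if c in adj[node]]
            extend(clique + [node], weight + nodes[node], rest)

    extend([], 0.0, sorted(nodes))
    return sorted(best["clique"]), best["weight"]

# Hypothetical coordinates; B contains A's site shifted and in another order.
struct_a = [("L", (0, 0, 0)), ("E", (4, 0, 0)), ("R", (4, 3, 0))]
struct_b = [("V", (20, 20, 20)), ("K", (5, 4, 1)), ("D", (5, 1, 1)), ("L", (1, 1, 1))]
nodes, adj = association_graph(struct_a, struct_b)
alignment, score = max_weight_clique(nodes, adj)
print(alignment, score)
```

The recovered pairs match L to L, E to D and R to K even though the residues appear in a different order in the second structure, which is the point of a sequence-order independent alignment.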
Efficient Functional Site Comparison with
Evolutionary and Geometric Constraints
• The search space is segmented with the residue
clusters determined from the geometric potential
• The nodes and edges are greatly reduced with the
robust residue boundary orientation and neighbors
The time complexity is almost linearly dependent on
the number of residues
Improved Performance of Alignment Quality
and Search Sensitivity and Specificity
[Figure: RMSD distribution of the aligned common fragments of ligands from
247 test cases showing four scores: amino acid grouping, chemical
similarity, substitution matrix and profile-profile. Axes: Frequency (%) versus RMSD (Angstroms).]
[Figure: False Positive Ratio and True Positive Ratio for the same four scores.]
So What is the Potential of
this Methodology?
Lead Discovery from Fragment
Assembly
• Privileged molecular moieties
in medicinal chemistry
• Structural genomics and high
throughput screening generate
a large number of protein-fragment complexes
• Similar sub-site detection
enhances the application of
fragment assembly strategies
in drug discovery
1HQC: Holliday junction migration motor protein
from Thermus thermophilus
1ZEF: Rio1 atypical serine protein kinase
from A. fulgidus
Lead Optimization from
Conformational Constraints
• Same ligand can bind to
different proteins, but with
different conformations
• By recognizing the
conformational changes in the
binding site, it is possible to
improve the binding specificity
with conformational constraints
placed on the ligand
1ECJ: amido-phosphoribosyltransferase
from E. coli
1H3D: ATP-phosphoribosyltransferase
from E. coli
Finding Secondary Binding Sites
for Major Pharmaceuticals
• Scan known binding sites for major
pharmaceuticals bound to their
receptors against the human proteome
• Try and correlate strong hits with known
data from the literature, databases,
clinical trials etc. to provide molecular
evidence of secondary effects
A Case Study
Selective Estrogen Receptor Modulators
(SERM)
• One of the largest
classes of drugs
• Breast cancer,
osteoporosis, birth
control etc.
• Amine and benzene
moiety
Xie, Wang and Bourne 2007 Nature Biotechnology, Submitted.
Adverse Effects of SERMs
cardiac abnormalities
thromboembolic
disorders
loss of calcium
homeostasis
?????
ocular toxicities
Xie, Wang and Bourne 2007 Nature Biotechnology, Submitted.
Ligand Binding Site Similarity Search
On a Proteome Scale
[Figure: density of similarity scores with SERCA and ERα marked; axes: Density versus Score]
• Searching human proteins covering ~38% of the
druggable genome against SERM binding site
• Matching Sarcoplasmic Reticulum (SR) Ca2+ ion
channel ATPase (SERCA) TG1 inhibitor site
• ERα ranked top with p-value<0.0001 from reversed
search against SERCA
Xie, Wang and Bourne 2007 Nature Biotechnology, Submitted.
Structure and Function of SERCA
• Regulating cytosolic
calcium levels in
cardiac and skeletal
muscle
• Cytosolic and
transmembrane
domains
• Predicted SERM
binding site is located in
the TM domain, inhibiting Ca2+
uptake
Xie, Wang and Bourne 2007 Nature Biotechnology, Submitted.
Binding Poses of SERMs in SERCA from
Docking Studies
• Salt bridge
interaction between
amine group and
GLU
• Aromatic
interactions for both
N-, and C-moiety
6 SERMS A-F (red)
Off-Target of SERMs
cardiac abnormalities
thromboembolic
disorders
loss of calcium
homeostasis
SERCA !
ocular toxicities
in vivo and in vitro Studies
TAM plays a role in regulating calcium uptake activity of cardiac SR
TAM reduces intracellular calcium concentration and release in the
platelets
Cataract results from TG1 inhibited SERCA up-regulations
EDS increases intracellular calcium in lens epithelial cells by
inhibiting SERCA
in silico Studies
Ligand binding site similarity
Binding affinity correlation
Conclusion
• By thinking differently about how to
represent proteins we have seen
potential value in:
– Phylogenetic analysis
– The study of the dynamics of proteins
– Improvements to the drug discovery
process
Acknowledgements
Lei Xie
Jian Yang
Jenny Gu
Protein Motions
Apostol Gramada
Multipole Analysis
Support Open Access
www.pdb.org • [email protected]
Implications on Drug Development
SERM | Affinity (ER Site) | Affinity (SERCA) | Affinity Difference
Bazedoxifene (BAZ) | -9.44 +/- 0.54 | -7.23 +/- 0.13 | 2.21
Lasofoxifene (LAS) | -8.66 +/- 0.40 | -6.54 +/- 0.20 | 2.12
Ormeloxifene (ORM) | -8.67 +/- 0.18 | -5.84 +/- 0.33 | 2.83
Raloxifene (RAL) | -8.08 +/- 0.64 | -5.78 +/- 0.23 | 2.30
4-hydroxytamoxifen (OHT) | -7.67 +/- 0.47 | -5.40 +/- 0.15 | 2.27
Tamoxifen (TAM) | -7.30 +/- 0.28 | -5.64 +/- 0.28 | 1.66
• Taking account of both target and off-target for
lead optimization
• Drug delivery and administration regime
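As a worked check of the affinity table above, the difference column can be recomputed as the SERCA affinity minus the ER-site affinity. The values are copied from the slide; the dictionary layout and function name are hypothetical, and the slide does not state the units.

```python
# (ER site affinity, SERCA affinity, reported difference), copied from the table.
affinities = {
    "BAZ": (-9.44, -7.23, 2.21),
    "LAS": (-8.66, -6.54, 2.12),
    "ORM": (-8.67, -5.84, 2.83),
    "RAL": (-8.08, -5.78, 2.30),
    "OHT": (-7.67, -5.40, 2.27),
    "TAM": (-7.30, -5.64, 1.66),
}

def affinity_difference(er_site, serca):
    """Off-target (SERCA) affinity minus target (ER site) affinity."""
    return serca - er_site

recomputed = {
    name: round(affinity_difference(er_site, serca), 2)
    for name, (er_site, serca, _) in affinities.items()
}

# Compound with the smallest gap between target and off-target affinity.
smallest_gap = min(recomputed, key=recomputed.get)
print(recomputed, smallest_gap)
```

If more negative values mean tighter binding, which is an assumption here, a smaller difference means the off-target site competes more closely with the intended target.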
Swiss-Prot - 20 Year Celebration
A Protein is More than the Union
of its Parts
• Breaking the protein into
parts changes the object
of the comparison
• This is interpreted in
many cases to imply that
the RMSD measure is
inadequate.
• The reality is that it is the
aligning of structure that
breaks the triangle
inequality and not the
measure per se. The
reason for failure is that
we effectively compare
different objects than we
say we do.
From Røgen & Fain (2003), PNAS 100:119-124
New Tricks – Protein Representation
An Alternative Approach:
Multipolar Representation
Roots in Spherical Harmonics
• Parameterization
Charge distribution (i.e. structure) + boundary conditions
→ spatial distribution of a scalar quantity
→ scalar potential
→ multipoles $\{ q_{lm}^{out};\ M_{lm}^{in};\ q_{lm}^{i};\ M_{lm}^{i} \}$
Gramada & Bourne 2006 BMC Bioinformatics 7:242
An Alternative Approach:
Multipolar Representation
• “Out” Multipoles
$q_{lm} = \sum_{i=1}^{N} r_i^{l}\, Y_{lm}^{*}(\theta_i, \phi_i); \quad l = 0, \dots, \infty; \quad m = -l, \dots, l$
For a given rank l, they form a 2l+1 dimensional vector
under 3D rotations
$\mathbf{q}_l = \{ q_{l,m} \}_{m = -l, \dots, l}$
Vector algebra applies => metric properties
Gramada & Bourne 2006 BMC Bioinformatics 7:242
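A minimal sketch of the "out" multipoles, q_lm = Σ_i r_i^l Y*_lm(θ_i, φ_i), restricted to ranks l = 0, 1, 2 so that the spherical harmonics can be written out in Cartesian form without external libraries. It assumes unit weight per point and coordinates taken relative to a chosen origin; the function names and toy coordinates are hypothetical. The norm of each rank-l vector is used as a rotation-invariant descriptor, which follows from the statement that the 2l+1 components transform as a vector under 3D rotations.

```python
import math

def solid_harmonics(l, x, y, z):
    """r^l * Y_lm(theta, phi) in Cartesian form (Condon-Shortley phase), l <= 2."""
    xy = complex(x, y)
    if l == 0:
        return {0: complex(math.sqrt(1 / (4 * math.pi)))}
    if l == 1:
        c = math.sqrt(3 / (8 * math.pi))
        return {
            -1: c * xy.conjugate(),
            0: complex(math.sqrt(3 / (4 * math.pi)) * z),
            1: -c * xy,
        }
    if l == 2:
        c1 = math.sqrt(15 / (8 * math.pi))
        c2 = math.sqrt(15 / (32 * math.pi))
        r2 = x * x + y * y + z * z
        return {
            -2: c2 * xy.conjugate() ** 2,
            -1: c1 * z * xy.conjugate(),
            0: complex(math.sqrt(5 / (16 * math.pi)) * (3 * z * z - r2)),
            1: -c1 * z * xy,
            2: c2 * xy ** 2,
        }
    raise ValueError("this sketch only covers l = 0, 1, 2")

def out_multipoles(coords, l):
    """q_lm for m = -l..l, summing conj(r^l * Y_lm) over all points."""
    q = {m: 0j for m in range(-l, l + 1)}
    for x, y, z in coords:
        for m, value in solid_harmonics(l, x, y, z).items():
            q[m] += value.conjugate()
    return q

def shape_descriptor(coords, l):
    """Norm of the rank-l multipole vector; unchanged by rotations about the origin."""
    return math.sqrt(sum(abs(v) ** 2 for v in out_multipoles(coords, l).values()))

# Hypothetical toy point set.
toy_points = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.5), (-1.5, 0.3, 1.0), (0.2, -0.7, -2.0)]
descriptors = [shape_descriptor(toy_points, l) for l in (0, 1, 2)]
print(descriptors)
```

Comparing such per-rank norms between two structures avoids any superposition step; higher ranks would add finer detail, in line with the granularity point made on the earlier slide.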
An Alternative Approach:
Multipolar Representation
The multipoles can be interpreted as shape descriptors
In principle, from the entire series of multipoles one can
reconstruct the scalar field and therefore the density, i.e.
the entire set of Cartesian coordinates of the
structure with a geometric level of detail
The partitioning of the multipole series according to
various representations of the rotational group allows for a
multi-scale description of the structure
Gramada & Bourne 2006 BMC Bioinformatics 7:242