1
Intersecting different databases to define the inner and
outer limits of the data-supported druggable proteome
http://www.slideshare.net/cdsouthan/update-on-the-druggable-proteome
Christopher Southan, Adam J. Pawson, Joanna L. Sharman, Elena
Faccenda, Simon Harding, Jamie Davis, IUPHAR/BPS Guide to
PHARMACOLOGY, Centre for Integrative Physiology, University of
Edinburgh
ACS Tue, Mar 15, CINF 98: Linking Big Data with Chemistry: Databases Connecting
Genomics, Biological Pathways & Targets to Chemistry
9:30 AM - 11:50 AM, Room 24C, 11:10am - 11:30am
www.guidetopharmacology.org
2
Abstract
(will be skipped for presentation)
Hopkins and Groom coined the term “druggable genome” in 2002 for the extrapolated total of
~10% of the human proteome likely to bind small molecules with lead-like chemical properties
and sufficient binding affinity for activity modulation. Fast-forward to 2015 and the UniProtKB
website now includes four database cross-references in the new Chemistry section. These
provide a more detailed picture, based largely on chemistry-to-protein mapping data curated
from the literature. They are thus evidence-supported statistics rather than homology-based
transitive estimates. These included (Sept 2015) human protein links to 2927 target entries
from ChEMBL, 2191 from BindingDB, 1563 from DrugBank and 1340 from the IUPHAR/BPS
Guide to PHARMACOLOGY (GtoPdb). Statistical comparisons between these will be presented
here, defining different levels of evidence support and following their continued expansion. The
union of all four sets, 3603, encompasses ~ 18% of the proteome. However, the proportion that
would match the most stringently curated of these for chemistry-to-protein mapping, GtoPdb, is
lower, and comparisons indicate that curation strategies and source selections for each database
diverge considerably (PMID 24533037). This is manifest in the relatively high unique content of
1147 (31% of the union) for the sources. However, they converge as a 4-way intersect for 490
proteins (13% of the union). Concordance between at least two independent sources (i.e. the
non-unique proportion) expands to 2456 or 12% of the proteome. This represents the most
precise data-supported druggable proteome snapshot for each UniProtKB release. Orthogonal
comparative analyses of these intersecting sets will be presented, including by Gene Ontology
functional categories, target class content, secreted vs. non-secreted, and disease gene links.
The utility of this druggable proteome assessment is very high in pharmacology and drug
discovery, especially in terms of being able to data mine leads as chemical starting points for
target validation experiments.
3
Outline
• Origins of the druggable genome
• Sources for the druggable proteome
• Comparing coverage
• Inner and outer limits
• Distribution of target attributes
• Selection example
• Future expansion
4
Druggable genome in 2002
Hopkins and Groom, PMID 12209152
5
Druggable proteome: 2016 update
Working definitions for IUPHAR/BPS Guide to PHARMACOLOGY curation
• Protein “has ligand”: data-supported pharmacologically relevant interaction
  (quantitative if possible)
• Drugged target: molecular mechanism of action (MMOA) involves binding of
  drug to primary target
• Drugged proteome (targets of approved drugs):
  • 120 in 2002 (PMID 12209152)
  • 213 in 2006 (PMID 7139284)
  • 312 in 2015 (PMID 26464438)
• Tractable target: assay > documented in vitro activity modulation of target by
  small molecule or other therapeutic modality
• Druggable target: data-supported plausibility of in vivo modulation
• Validated target: in vivo modulation via MMOA > clinical efficacy for disease
6
UniProt curated druggable sources
Select example: “database:(type:guidetopharmacology) AND reviewed:yes AND
organism:"Homo sapiens (Human) [9606]"” = 1379
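(column labels inferred from the counts; the slide showed source logos)
                  BindingDB    ChEMBL   DrugBank   GtoPdb
Proteins all:         31788      6162       1957     1833
Swiss-Prot hum:        2245      2935       1640     1379
PubChem CIDs:        540313   1458720       7426     6293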
7
UniProt query results
8
Human Swiss-Prot intersects and differentials
9
Druggable inner and outer limits
(Swiss-Prot human proteome at 20,198)
Source-unique 1,099
2-way 861
3-way 1,053
4-way 539
All sources (union) 3,568 = 18% of proteome
4-way = 2.7% of proteome
4-way = 15% of the union
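The tiers on this slide can be derived from per-source accession lists. The following is a minimal sketch in Python, assuming each source has already been exported as a set of UniProt accessions. The toy accessions are hypothetical placeholders, and the names tier_counts and pct are illustrative, not from the slides. The percentage lines use only the counts stated on this slide.

```python
from collections import Counter


def tier_counts(sources):
    """Count proteins by how many sources reference them (1 = source-unique)."""
    membership = Counter()
    for accessions in sources.values():
        # set() guards against duplicate accessions within one source
        membership.update(set(accessions))
    tiers = Counter(membership.values())
    return {k: tiers.get(k, 0) for k in range(1, len(sources) + 1)}


def pct(part, whole):
    """Percentage of `whole` represented by `part`."""
    return 100.0 * part / whole


# Toy accession sets (hypothetical placeholders, NOT real query results)
toy = {
    "ChEMBL": {"P1", "P2", "P3", "P4"},
    "BindingDB": {"P1", "P2", "P3"},
    "DrugBank": {"P1", "P2", "P5"},
    "GtoPdb": {"P1", "P6"},
}
print(tier_counts(toy))

# Counts as stated on the slide
PROTEOME = 20198  # Swiss-Prot human proteome
UNION = 3568      # all sources
FOUR_WAY = 539    # present in all four sources

print(f"outer limit: {pct(UNION, PROTEOME):.1f}% of proteome")
print(f"inner limit: {pct(FOUR_WAY, PROTEOME):.1f}% of proteome")
print(f"4-way share: {pct(FOUR_WAY, UNION):.1f}% of the union")
```

Counting membership per protein, rather than computing every pairwise and three-way intersection explicitly, keeps the sketch the same size however many sources are compared.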
10
Intersects by GO-function splits
[Charts: GO-function splits for the Uniques, 3-way, 2-way and 4-way sets]
11
Attribute distributions in the 4-way target set (539)
12
Advanced selection example
From the 4-way set
database:(type:merops)
annotation:(type:signal)
database:(type:pdb)
annotation:(type:"alternative products")
database:(type:hpa)
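The filters above can be combined with a base query for the 4-way set. The following is a sketch only, written in the legacy UniProt website query syntax shown on the slides. Of the four source type names, only "guidetopharmacology" appears in the source; "chembl", "bindingdb" and "drugbank" are assumed by analogy. Applying each filter separately is also an assumption, since the slide does not say whether they were combined. The helper names are hypothetical, and the current UniProt site may use different syntax.

```python
# Cross-reference type names for the four sources.
# Only "guidetopharmacology" is confirmed by the slides; the rest are ASSUMED.
SOURCE_TYPES = ["chembl", "bindingdb", "drugbank", "guidetopharmacology"]

# Restriction to reviewed (Swiss-Prot) human entries, as in the slide 6 example
HUMAN_REVIEWED = ["reviewed:yes", 'organism:"Homo sapiens (Human) [9606]"']

# Filters listed on this slide
SLIDE_FILTERS = [
    "database:(type:merops)",
    "annotation:(type:signal)",
    "database:(type:pdb)",
    'annotation:(type:"alternative products")',
    "database:(type:hpa)",
]


def four_way_query():
    """Query string requiring a cross-reference to all four sources."""
    terms = [f"database:(type:{name})" for name in SOURCE_TYPES]
    return " AND ".join(terms + HUMAN_REVIEWED)


def refine(base_query, extra_filter):
    """Narrow a base query by one additional filter."""
    return f"{base_query} AND {extra_filter}"


for slide_filter in SLIDE_FILTERS:
    print(refine(four_way_query(), slide_filter))
```

Each printed line is a query string that could be pasted into a search box accepting this syntax; the sketch makes no network calls.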
13
Initiatives for expansion
NIH Illuminating the Druggable Genome (IDG)
Program objective is to improve our understanding of
the properties and functions of proteins that are
currently unannotated within the four most commonly
drug-targeted protein families: the G-protein coupled
receptors, nuclear receptors, ion channels, and
protein kinases.
14
Non-active site pockets: broadening druggability
15
Conclusions
• The data-supported druggable proteome is expanding
• UniProt chemistry cross-referencing collates curated sources with
  complementary selectivity
• Sources indicate an outer limit of 18% with an inner limit of 3%
• Advanced “slice-and-dice” options can identify subsets
• Expanding choice of experimental perturbagens for systems pharmacology,
  drug discovery, chemical biology and synthetic biology
• Challenge of the constitutive “loss of function” for disease causality
• It is hoped the druggable expansion will translate into
• novel validated targets
• broader potential therapeutic coverage (including rare diseases)
• new approved medicines
• new combinations and hybrids
• more repurposing via target-hopping
16
Acknowledgements, references and questions
Benoit Bely, UniProt Release Production Project Leader, EBI (for x-refs)
Database teams of BindingDB, ChEMBL and DrugBank
GtoPdb Team Members and funders from title slide