Modelling interactomes

Transcript Modelling interactomes

MODELLING INTERACTOMES
RAM SAMUDRALA
ASSOCIATE PROFESSOR
UNIVERSITY OF WASHINGTON
How does the genome of an organism specify
its behaviour and characteristics?
How can we use this information to improve
human health and quality of life?
MOTIVATION
The functions necessary for life are undertaken by proteins and their
interactions (with other proteins, DNA, RNA, and small molecules).
Protein function is mediated by atomic three dimensional structure.
Knowing protein structure at atomic resolution will therefore enable us to:
Determine and understand molecular function.
Understand substrate and ligand binding interactions.
Devise intelligent mutagenesis and biochemical.
experiments to understand biological function.
Design therapeutics rationally.
Design novel proteins.
Knowing the atomic level structures of all proteins encoded by an organism’s
genome, along with interactions, will enable us to understand the underlying
mechanistic basis or “wiring diagram” that comprises pathways, systems,
and organisms.
Applications in the area of medicine, nanotechnology, and biological
computing.
PROTEOME
~30,000 in human
~60,000 in rice
~4500 in bacteria
countless numbers total
Several thousand
distinct sequence
families
STRUCTURE
A few thousand
distinct structural
folds
FUNCTION
Tens of thousands
of functions
EXPRESSION
Different expression
patterns based on
time and location
INTERACTION
Interaction and
expression are
interdependent
with structure and
function
PROTEIN FOLDING
Gene
…-CTA-AAA-GAA-GGT-GTT-AGC-AAG-GTT-…
Protein sequence
…-L-K-E-G-V-S-K-D-…
One amino acid
Unfolded protein
Spontaneous self-organisation
(~1 second)
Native biologically
relevant state
• Not unique
• Mobile
• Inactive
• Expanded
• Irregular
PROTEIN FOLDING
Gene
…-CTA-AAA-GAA-GGT-GTT-AGC-AAG-GTT-…
Protein sequence
…-L-K-E-G-V-S-K-D-…
One amino acid
Unfolded protein
Spontaneous self-organisation
(~1 second)
Native biologically
relevant state
• Not unique
• Mobile
• Inactive
• Expanded
• Irregular
• Unique shape
• Precisely ordered
• Stable/functional
• Globular/compact
• Helices and sheets
STRUCTURE
One distance constraint
for every six residues
0
2
Experiment
(X-ray, NMR)
One distance constraint
for every ten residues
Cα RMSD
4
ACCURACY
Computation
(de novo)
Computation
(template-based)
Hybrid
(Iterative Bayesian interpretation of noisy NMR data
with structure simulations)
6
STRUCTURE
T0290 – peptidyl-prolyl isomerase from H. sapiens
T0288 – PRKCA-binding from H. sapiens
0.5 Å Cα RMSD for 173 residues (60% identity)
2.2 Å Cα RMSD for 93 residues (25% identity)
T0332 – methyltransferase from H. sapiens
T0364 – hypothetical from P. putida
2.0 Å Cα RMSD for 159 residues (23% identity)
5.3 Å Cα RMSD for 153 residues (11% identity)
Liu/Hong-Hung/Ngan
FUNCTION
Ion binding energy
prediction with a
correlation of 0.7
Calcium ions predicted
to < 0.05 Å RMSD
in 130 cases
Meta-functional signature
accuracy
Meta-functional signature for DXS model
from M. tuberculosis
Wang/Cheng
INTERACTION
Transcription factor bound to DNA promoter
regulog model from S. cerevisiae
Prediction of binding energies of HIV
protease mutants and inhibitors
using docking with dynamics
BtubA/BtubB interolog model from P. dejongeii
(35% identity to eukaryotic tubulins)
McDermott/Wichadakul/Staley/Horst/Manocheewa/Jenwitheesuk/Bernard
SYSTEMS
Example predicted protein interaction network from M. tuberculosis
(107 proteins with 762 unique interactions)
Proteins
PPIs
TRIs
H. sapiens
26,741 17,652 828,807 1,045,622
S. cerevisiae
5,801 5,175 192,505
2,456
O.sativa (6) 125,568 19,810 338,783 439,990
E. coli
4,208
885
1,980
54,619
In sum, we can predict functions for more than 50% of a
proteome, approximately ten million protein-protein and
protein-DNA interactions with an expected accuracy of 50%.
Utility in identifying function, essential proteins, and host pathogen interactions
McDermott/Wichadakul
SYSTEMS
Combining protein-protein and protein-DNA interaction networks to determine regulatory circuits
McDermott/Rashid/Wichadakul
INFRASTRUCTURE
~500,000 molecules over 50+proteomes served using a 1.2 TB PostgreSQL database and a sophisticated AJAX webapplication and XML-RPC API
http://bioverse.compbio.washington.edu
http://protinfo.compbio.washington.edu
Guerquin/Frazier
INFRASTRUCTURE
Guerquin/Frazier
INFRASTRUCTURE
http://bioverse.compbio.washington.edu/integrator
Chang/Rashid
APPLICATION: DRUG DISCOVERY
SINGLE TARGET SCREENING
MULTITARGET SCREENING
Disease
Target identification
Single disease related protein
Compound library
Multiple disease related proteins
Small molecule library
DRUG-LIKE COMPOUNDS
High throughput screening
Experimental verification
Success rate+
Time++++
Cost++++
Computational screening
Computational screening with dynamics
Initial candidates
Initial candidates
Experimental verification
Success rate ++
Time +++
Cost +++
Experimental verification
Success rate +++++
Time +++
Jenwitheesuk
Cost +
APPLICATION: DRUG DISCOVERY
HSV
CMV
KHSV
Jenwitheesuk
APPLICATION: DRUG DISCOVERY
APPLICATION: DRUG DISCOVERY
Computionally predicted broad spectrum human herpesvirus protease inhibitors is effective in vitro
against members from all three classes and is comparable or better than anti-herpes drugs
HSV
KHSV
CMV
Our protease inhibitor acts synergistically with acylovir (a nucleoside analogue that inhibits replication)
and it is less likely to lead to resistant strains compared to acylovir
HSV
HSV
Lagunoff
APPLICATION: DRUG DISCOVERY
Mkytsza
Predicted
inhibitory
constant
10-13
10-12
10-11
10-10
10-9
10-8
10-7
None
Van Voorhis/Rivas/Chong/Weismann
APPLICATION: RICE INTERACTOMICS
Proteome
Number
of
proteins
O. sativa japonica KOME cDNAs
O. sativa indica BGI 9311
O. sativa japonica Syngenta
O. sativa indica IRGSP
O. sativa japonica nrKOME cDNAs
O. sativa indica BGI pa64
Total
Total (unique)
25,875
40,925
38,071
36,658
19,057
37,712
198,298
125,568
Number
annotated
(%)
11,841
22,278
20,874
20,481
7478
15,286
Number
in
protein
network
Number
of
protein
interactions
(44%)
(55%)
(55%)
(56%)
(39%)
(41%)
4705
5849
5911
5835
3047
5780
88,102
95,149
104,640
110,118
38,793
98,779
98,238 (50%)
60,272 (48%)
31,127
19,810
535,581
338,783
http://bioverse.compbio.washington.edu
http://protinfo.compbio.washington.edu
BGI/McDermott/Wichadakul
APPLICATION: RICE INTERACTOMICS
BGI/McDermott
APPLICATION: NANOTECHNOLOGY
Oren/Sarikaya/Tamerler
APPLICATION: AMELOGENIN
Principal protein involved in enamel and hard tissue formation.
Multifunction protein:
– Mineralisation.
– Signaling.
– Adhesion to process matrix.
– Physical protein-protein interactions.
Never been crystallised (irregular/unstable?).
– Most proteins with non-repeating sequence are active in globular form.
– Many proteins fold into globular form upon interaction with substrate /
interactor.
– Assumption of linear and globular forms.
– Goal is to understand protein structure, function, binding to
hydroxyapatite, interaction with other amelogenin molecules, and
formation of enamel and hard tissue formation
APPLICATION: AMELOGENIN
Predicted five models (typical for CASP).
Annotate structure with experimental and simulation evidence to find best
predicted globular structure and infer function.
APPLICATION: AMELOGENIN
Signal Region
Exon 4
MGTWILFACLLGAAFAMPLPPHPGSPGYINLSYEKSHSQAINTDRTALVLTPLKWYQSMIRQPYPSYGYEPMGGWLHHQIIPVLSQQHPPSHTLQPHHHLPVVPAQQPVA
1
10
20
30
40
50
60
70
80
90
100
110
PQQPMMPVPGHHSMTPTQHHQPNIPPSAQQPFQQPFQPQAIPPQSHQPMQPQSPLHPMQPLAPQPPLPPLFSMQPLSPILPELPLEAWPATDKTKREEVD
120
130
140
150
160
170
180
190
200
210
Horst/Oren/Cheng/Wang
FUTURE
+
Structural
genomics
+
Functional
genomics
Computational
biology
MODELLING PROTEIN AND PROTEOME STRUCTURE FUNCTION AT
THE ATOMIC LEVEL IS NECESSARY TO UNDERSTAND THE
RELATIONSHIPS BETWEEN SINGLE MOLECULES, SYSTEMS,
PATHWAYS, CELLS, AND ORGANISMS
ACKNOWLEDGEMENTS
Current group members:
•Baishali Chanda
•Brady Bernard
•Chuck Mader
•Cyrus Hui
•Ersin Emre Oren
•Gong Cheng
•Imran Rashid
•Jeremy Horst
•Juni Lee
•Ling-Hong Hung
•Michal Guerquin
•Shu Feng
•Siriphan Manocheewa
•Somsak Phattarasukol
•Stewart Moughon
•Tianyun Liu
•Weerayuth Kittichotirat
•Zach Frazier
•Renee Ireton, Program Manager
Past group members:
•Aaron Chang
•David Nickle
•Duangdao Wichadukul
•Duncan Milburn
•Ekachai Jenwitheesuk
•Jason McDermott
•Marissa LaMadrid
•Kai Wang
•Kristina Montgomery
•Rob Braiser
•Sarunya Suebtragoon
•Shing-Chung Ngan
•Vanessa Steinhilb
•Vania Wang
•Yi-Ling Cheng
ACKNOWLEDGEMENTS
Collaborators:
Funding agencies:
•BGI
-Gane Wong
-Jun Yu
-Jun Wang
-et al.
•BIOTEC/KMUTT
•MSE
-Mehmet Sarikaya
-Candan Tamerler
-et al.
•UW Microbiology
-James Staley
-John Mittler
-Michael Lagunoff
-Roger Bumgarner
-Wesley Van Voorhis
-et al.
•National Institutes of Health
•National Science Foundation
-DBI
-IIS
•Searle Scholars Program
•Puget Sound Partners in Global Health
•Washington Research Foundation
•UW
-Advanced Technology Initiative
-TGIF
E. coli INTERACTIONS
McDermott
M. tuberculosis INTERACTIONS
McDermott
C. elegans INTERACTIONS
McDermott
H. sapiens INTERACTIONS
McDermott
Network-based annotation for C. elegans
McDermott
KEY PROTEINS IN ANTHRAX
Articulation points
McDermott
HOST PATHOGEN INTERACTIONS
McDermott

Modelling interactomes

Transcript Modelling interactomes

Directory