Transcript Slides

Lecture 13
Structural Aspects of
Protein-Protein Interactions
PHAR 201/Bioinformatics I
Philip E. Bourne
Department of Pharmacology, UCSD
Slides modified from JoLan Chung
Agenda
• Understand the importance of studying
protein-protein interactions at the
structural level
• Classify the various types of interactions
• Look in detail at one structure-based
method for predicting protein-protein
interactions
LINK
Still seems the best review
Types of protein-protein interactions (PPI)
Obligate PPI
Non-obligate PPI
the protomers are not
found as stable
structures on their
own in vivo
Non-obligate homodimer
Sperm lysin
Obligate
homodimer
P22 Arc
repressor
DNA-binding
Obligate
heterodimer
Human cathepsin D
1LYB
Non-obligate heterodimer
RhoA and RhoGAP signaling complex
Types of protein-protein interactions (PPI)
Non-obligate PPI
Obligate PPI
usually
permanent
the protomers are
not found as stable
structures on their
own in vivo
Permanent
Transient
(many enzyme-inhibitor
complexes)
Weak
dissociation constant
Kd=[A][B] / [AB]
(electron
transport
complexes)
10-7 - 10-13 M
Kd mM-M
Intermediate
Non-obligate transient
homodimer, Sperm lysin
(interaction is broken and
formed continuously)
(antibody-antigen, TCR-MHC-peptide,
signal transduction PPI), Kd M-nM
Strong
Obligate
heterodimer
Human cathepsin D
Non-obligate
permanent
heterodimer
(require a molecular
trigger to shift the
oligomeric
equilibrium)
Kd nM-fM
Thrombin and rodniin Bovine G protein dissociates into G and G subunits
inhibitor
upon GTP, but forms a stable trimer upon GDP
Types of protein-protein interactions (PPI)
Non-obligate PPI
Obligate PPI
usually
permanent
the protomers are
not found as stable
structures on their
own in vivo
Permanent
Transient
(many enzyme-inhibitor
complexes)
Weak
dissociation constant
Kd=[A][B] / [AB]
(electron
transport
complexes)
10-7 ÷ 10-13 M
Kd mM-M
Intermediate
Non-obligate transient
homodimer, Sperm lysin
(interaction is broken and
formed continuously)
(antibody-antigen, TCR-MHC-peptide,
signal transduction PPI), Kd M-nM
Strong
Obligate
heterodimer
Human cathepsin D
Non-obligate
permanent
heterodimer
(require a molecular
trigger to shift the
oligomeric
equilibrium)
Kd nM-fM
Thrombin and rodniin Bovine G protein dissociates into G and G subunits
inhibitor
upon GTP, but forms a stable trimer upon GDP
One Approach Using Structural Bioinformatics:
Exploiting Using Sequence and Structure
Homologs to Identify Protein-Protein Binding Sites
Work of Jo-Lan Chung (Former Graduate
Student Chemistry/Biochemistry)
Co-mentored by Wei Wang
Proteins: Structure Function and
Bioinformatics 2006 62:630-640 [PDF]
Current Situation
•
We have a relatively small number of protein
complexes
•
We have a large number of predominantly apo form
structures being determined by structural genomics
and functionally driven structure determination
•
Exploiting the information obtained from these apo
structures to identify functional sites, such as proteinprotein binding sites, is an important question.
General Properties of Protein-Protein Binding Sites
•
More hydrophobic than the rest of the protein surface.
•
Relatively flat (except for enzyme-substrate binding
sites).
•
The binding free energy is not distributed equally
across the protein interfaces: a small subset of
residues at the interfaces forms energy hot spots,
enriched in tyrosine, tryptophan, and arginine.
•
Structurally conserved residues, especially polar
resides, correspond to the energy hot spots.
Previous Methods to Identify Protein-Protein
Binding Sites
•
•
•
•
•
•
•
•
Sequence conservation e.g. Consurf
Docking
Threading and homology modeling
Evolutionary tracing
Correlated mutations
Properties of patches
Hydrophobicity
Neural networks and support vector machines
Characteristics of Previous Methods
• The above mentioned methods use evolutionary
information from sequence alignments, and/or
residue properties, and/or the geometric
information from structures.
• It is difficult to judge the predictive powers of
these various methods due to the differences of
learning sets, the different definitions of binding
sites and the lack of benchmarks.
Structurally Conserved Surface Residues?
• None of the above methods consider the
residues which are spatially conserved on the
surfaces among their structure homologs.
• These residues are reported to have a
correspondence to the energy hot spots on
protein interfaces and can be derived from
multiple structure alignments.
Approach
• Survey the structurally conserved surface
residues in the non-redundant chains of hetero
complexes from the Protein Data Bank (PDB).
• Incorporate the structurally conserved surface
residues with the information derived from
sequence alignments and single structures to
identify protein-protein binding sites.
Dataset
•
All non-redundant hetero-complexes were
collected from the PDB (<30% sequence identity)
•
A pair of chains was then selected if the reduction of
the accessible solvent area (ASA) was at least 450 Å2
upon binding.
The pairs containing chains belong to SCOP class >=
8 (9 and 10 did not exist) or with length less than 80
amino acids were discarded
•
Dataset
•
•
•
•
Retrieve structure homologs at SCOP family level from Astral
database at 40% sequence purge level.
Align each chain with its homologs with CEMC.
A chain was selected if at least 4 members in its alignment were
aligned together over 60% of its length and with Z score > 4.
Discard the chains with less than 20 interfacial contacts or with
constant B-factors.
The final dataset was composed of 274 non-redundant chains of
hetero-complexes. Each of these chains was accompanied with a
structure alignment with at least 4 members.
Derive the Structurally Conserved Residues
• The structural conservation scores were
derived from the multiple structural
alignments.
• Each position in the alignment has a
structural conservation score, which
represents the conservation in 3D space.
• A position has a high conservation score
if the aligned residues are spatially
conserved.
The Structural Conservation Score
•
Raw structural conservation score
where
N N
2
C ( x) 
  L( si ( x), sj ( x))
N ( N  1) i j i
L( si ( x), sj ( x))  exp( d ( si ( x), sj ( x)))  M ( si ( x), sj ( x))
m ( a ,b )  min( m )

 max( m )  min( m )
M ( si ( x), sj ( x))  
0


if a is not gap and b is not gap
otherwise
where N is the total number of aligned structures, si(x) is the amino acid at position x
in the ith structure in the alignment, M is a modified PET substitution matrix calculated
by Valdar et al. d is the distance between Calpha atoms.
The Structural Conservation Score
• The B-factors determined by X-ray crystallographic
experiments provide an indication of the degree of mobility and
disorder of an atom in a protein structure
• Raw structural conservation scores were weighted by the
normalized B-factors (Bnorm, i) to consider the structure flexibility
C ( x)  C ( x) r ( x )
weighted
where
r ( x)  exp( Bnorm, i )
Does the Structural Conservation Score Discriminate
Interface from Non-interface Residues?
60,834 Residues from 274 Chains
60
Interface residues
Non-interface residues
50
% of the residues
40
30
20
10
0
Top 50%
Top 40%
Top 30%
Top 20%
Structural conservation score
Top 10%
Incorporating the Structural Conservation Scores to Predict the
Interface Residues
A surface residue
↓
Sequence profile + ASA + Structural conservation score
in a window of 13 residues
(The residue to be predicted and 12 spatially nearest surface residues)
↓
Support vector machine classifier trained to distinguish surface
non-interface residues vs surface residues
↓
Interface or non-interface residue ?
↓
Clustering of sites
The Performances of the Predictors
80
75
70
65
60
55
Recall (%)
50
45
Predictor 1
Predictor 2
40
35
30
25
20
15
10
5
0
5
10
15
20
25
30
35
40
45
50
55
60
65
1-Precision (%)
Predictor 1: Sequence profile + ASA.
Predictor 2: Sequence profile + ASA + structural conservation score
Clustering of Results
• Clustering was used to remove isolated interface
residues and include non-interface residues
surrounded by several interface residues
• Any 2 residues were clustered if the distance
between the C beta’s was less than 6.5A
• Clusters with only 1or 2 residues were removed
• A non-interface residue was converted to an
interface residue if the C beta of the residue and
at least 3 interface residues were located within
6A
The Performances of the Predictors
Precise prediction: at least 70% interface residues were identified.
Correct prediction: at least 50 % interface residues were identified.
Partial prediction: some but less than 50 % interface residues were identified.
Wrong prediction: no interface residues were identified.
The Predicted Binding Sites (Example 1)
Protein : domain 1 of the human coxsackie and adenovirus receptor
(CAR D1) - yellow
• Mediate adenoviruses and coxsackie virus B infection.
• CAR is an integral membrane protein expressed in a broad range of human and
murine cell type. CAR D1 is one of its two extracellular domains.
Binding partner: knob domain of the adenoviruses serotype 12 (Ad12) - blue
Predictor 1
Predictor 2
The Predicted Binding Sites (example 1)
Each CAR D1 binds at the interface between two adjacent Ad12 knob domain
• Consistent with the observation that most neutralizing antibodies to knob
are directed against the trimer, rather than monomer.
The Predicted Binding Sites (example 2)
Protein : adrendoxin (Adx)
• In mitochondria of the adrenal cortex, the steroid hydroxylating system
requires the transfer of electrons from the membrane-attached flavoprotein
AR via the soluble Adx to the membrane-integrated cytochrome P450 of
the CYP 11 family.
Binding partner: adrenodoxin reductase (AR) - blue
Predictor 1
Predictor 2
The Predicted Binding Sites (example 3)
Protein : fibroblast growth factor receptor 2 (FGFR2) Ser252Trp Mutant (Gray)
• Apert syndrome (AS) is caused by substitution of one of two adjacent
residues, Ser252Trp or Pro253Arg (green).
Binding partner: fibroblast growth factor (FGF2) (blue)
Predictor 1
Predictor 2
The Predicted Binding Sites (example 4)
Protein : Nitrogenase molybdenum-iron protein of Clostridium pasteurianum
ß subunit
Binding partner: Nitrogenase molybdenum-iron protein of Clostridium
pasteurianum  subunit
Predictor 1
Predictor 2
Is the Method Sensitive to the Multiple
Structure Alignment Algorithm?
Hemoglobin  chain
Left: automatic multiple structure alignments (CE-MC)
Right: expert curetted multiple structure alignments (HOMSTRAD)
Is the Method Sensitive to the Multiple
Structure Alignment Algorithm?
Murine t-cell receptor variable domain
Left: automatic multiple structure alignments (CE-MC)
Right: expert curetted multiple structure alignments (HOMSTRAD)
Is the Method Sensitive to the Multiple
Structure Alignment Algorithm?
Adrendoxin
Left: automatic multiple structure alignments (CE-MC)
Right: expert curetted multiple structure alignments (HOMSTRAD)
What Effects the Performance of
Predictor 2?
• Poor structure alignment
eg. More than 30% of the residues of the methane monooxygenase
hydroxylase  subunit were in the gapped positions of its structure
alignment.
• Binding residues located in flexible loops
eg. The deteriorated prediction of the interfaces on VP 1 in
p1/mahoney poliovirus may be caused by its long and flexible loops
contacting the other two coat proteins, VP 2 and VP 3.
Conclusions
1. Analysis of the structurally conserved surface residues
• The conserved residues were measured by the structural
conservation scores derived from the multiple structure alignments
followed by a weighting process with the B-factors.
• Although derived from the alignments of single structures, these
structurally conserved residues did differentiate the protein
interfaces and the rest of the surfaces.
Conclusions
2. Incorporating the structural conservation score improved the
prediction significantly.
• Before clustering, the recall increases about 10% at precision 50%,
increases about 18% at precision 40%.
• After clustering, at precision about 50 %, the number of correctly
predicted binding sites increase close to 16%, the number of
precisely predicted binding sites increase close to 13%.
• 53.0% of the binding sites were precisely predicted, 75.9% of the
binding sites were correctly predicted, and 20.4% of the binding
sites were partially predicted.
Conclusions
3. This study was an initial trial that exploits multiple structure
alignments on a large scale for the prediction of functional
regions.
4. A more suitable scoring function or weighting function could be
developed in the future.
5. This method can be used to guide experiments, such as sitespecific mutagenesis or combined with docking procedures to
limit the search space.
Acknowledgments
Dr. Wei Wang
Dr. Chittibabu Guda
Dr. BVB Reddy
All the members in Bourne group
Dr. Philip E. Bourne
This work was supported by
the Molecular Biophysics Training Grants at UCSD