Identification of protein-protein binding motifs
Felipe Leal Valentim
Aalt-Jan van Dijk
[email protected]
[email protected]
Plant Research International
Applied Bioinformatics
Protein-protein binding interfaces
[Figure: protein regions labeled Surface, Interface, Ligand binding site, DNA-binding site and Core (core structural residues)]
Properties:
Exposed on the protein surface;
Functionally/structurally important residues are more highly conserved;
Changing the specificity of the protein interaction
[van Dijk AD et al., PLoS Comput Biol. 2010] - Sequence Motifs in MADS Transcription Factors Responsible for Specificity and Diversification of Protein-Protein Interaction
Protein-protein binding motifs
Protein binding interfaces are composed of highly conserved residues that are exposed on the surface;
The interface can be represented by short sequence motifs, which are thought to be overrepresented in pairs of interacting proteins.
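As a rough illustration of what "overrepresented in pairs of interacting proteins" could mean in practice, the sketch below compares how often a motif pair occurs among interacting protein pairs with how often it occurs among all protein pairs. The function names, the toy sequences and the simple frequency ratio are assumptions made for this example; the slides do not specify the statistic that was actually used.

```python
from itertools import combinations


def pair_has_motifs(seq_a, seq_b, motif_x, motif_y):
    """True if one sequence contains motif_x and the other contains motif_y."""
    return (motif_x in seq_a and motif_y in seq_b) or (
        motif_y in seq_a and motif_x in seq_b
    )


def enrichment(sequences, interactions, motif_x, motif_y):
    """Frequency of the motif pair among interacting pairs divided by
    its frequency among all possible protein pairs (hypothetical measure)."""
    all_pairs = list(combinations(sorted(sequences), 2))
    interacting = {tuple(sorted(pair)) for pair in interactions}
    hits_int = sum(
        pair_has_motifs(sequences[a], sequences[b], motif_x, motif_y)
        for a, b in interacting
    )
    hits_all = sum(
        pair_has_motifs(sequences[a], sequences[b], motif_x, motif_y)
        for a, b in all_pairs
    )
    freq_int = hits_int / len(interacting)
    freq_all = hits_all / len(all_pairs)
    return freq_int / freq_all if freq_all > 0 else float("nan")


# Toy data, invented for illustration only.
sequences = {
    "P1": "MKLVEQA",
    "P2": "GGLVEQT",
    "P3": "AAAPPPP",
    "P4": "TTTPPPG",
}
interactions = [("P1", "P3"), ("P2", "P4")]
ratio = enrichment(sequences, interactions, "LVEQ", "PPP")
print(ratio)
```

A ratio above 1 would suggest the motif pair is more frequent among interacting pairs than expected from all pairs; a real analysis would also need a significance test.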
Identification of binding interfaces from structures
[Figure: Protein 1 and Protein 2 (unbound) and Complex 1-2, with the binding interface marked on each protein. Example labels: Arabidopsis Histidine Kinase 4, trans-Zeatin]
[Hubbard SJ, Thornton JM] Naccess V2.1.1 - Atomic Solvent Accessible Area Calculations
Structural information available in the PDB
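One common way to derive a binding interface from a complex structure is to flag residues that lose solvent accessible area when the partner is present. The sketch below assumes per-residue accessible areas have already been computed (for example with Naccess) for the isolated protein and for the complex; parsing of the Naccess output is not shown, and the function name, the dictionary layout and the 1.0 Å² cutoff are assumptions for illustration.

```python
def interface_residues(asa_free, asa_complex, min_loss=1.0):
    """Return residues whose accessible surface area drops by more than
    min_loss (in square angstroms) when the partner protein is bound.

    Both arguments map a residue key, here (chain, residue number),
    to an accessible surface area value.
    """
    return sorted(
        res
        for res, free in asa_free.items()
        if free - asa_complex.get(res, free) > min_loss
    )


# Toy values, invented for illustration only.
asa_free = {("A", 10): 80.0, ("A", 11): 45.5, ("A", 12): 0.0}
asa_complex = {("A", 10): 12.0, ("A", 11): 45.5, ("A", 12): 0.0}
interface = interface_residues(asa_free, asa_complex)
print(interface)
```

Residue ("A", 12) is buried in both states, so it counts as core rather than interface; only the residue that becomes buried on binding is reported.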
Sequence- and interactome-based pipeline to locate binding sites
in Arabidopsis proteins
Sequences -> Evolutionary conservation;
Sequences -> Residue surface accessibility;
Interactome -> Overrepresented motifs;
Motifs that are: likely to be exposed on the surface; conserved across species; and overrepresented in pairs of interacting proteins.
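The three criteria can be read as a filter over candidate motifs. The sketch below is one possible way to express that filter; the `Motif` fields, the `select_motifs` helper and every threshold value are hypothetical and are not taken from the slides.

```python
from dataclasses import dataclass


@dataclass
class Motif:
    pattern: str              # short sequence motif
    mean_rsa: float           # mean predicted relative surface accessibility (0..1)
    mean_conservation: float  # mean conservation score across orthologs (0..1)
    enrichment: float         # overrepresentation in interacting pairs (ratio)


def select_motifs(motifs, min_rsa=0.25, min_conservation=0.5, min_enrichment=1.0):
    """Keep motifs that are exposed, conserved and overrepresented.
    All cutoffs are placeholder values chosen for this example."""
    return [
        m
        for m in motifs
        if m.mean_rsa >= min_rsa
        and m.mean_conservation >= min_conservation
        and m.enrichment > min_enrichment
    ]


# Toy candidates, invented for illustration only.
candidates = [
    Motif("LVEQ", mean_rsa=0.40, mean_conservation=0.80, enrichment=2.1),
    Motif("GGAG", mean_rsa=0.05, mean_conservation=0.90, enrichment=3.0),  # buried
    Motif("PPSP", mean_rsa=0.60, mean_conservation=0.20, enrichment=1.8),  # variable
]
selected = select_motifs(candidates)
print([m.pattern for m in selected])
```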
Sequence- and interactome-based pipeline to locate binding sites
in Arabidopsis proteins
[Figure: example interaction network with nodes IAA7, IAA2, IAA11, SHY2, IAA1, IAA16, TPL, IAA18]
Sequence- and interactome-based pipeline to locate binding sites
in Arabidopsis proteins
Input: interaction list (Protein1-Protein2, Protein2-Protein4, ..., ProteinN-ProteinM)
Input: FASTA sequences (>Protein sequence1, >Protein sequence2, ..., >Protein sequenceN)
Find orthologs for each protein sequence: OrthoMCL [1], reciprocal best BLAST hit [2]
Calculate conservation score: AL2CO [3]
Predict residue surface accessibility (RSA): SABLE [4]
Output: Conservation (Conservation Protein 1, Conservation Protein 2, ..., Conservation Protein N)
Output: RSA (RSA Protein 1, RSA Protein 2, ..., RSA Protein N)
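To give a feel for the conservation step, the sketch below scores each column of an ortholog alignment with a normalized Shannon entropy. This is a generic stand-in written for illustration, not the AL2CO implementation, and the function names and the toy alignment are assumptions.

```python
import math
from collections import Counter


def column_conservation(column):
    """Score one alignment column: 1.0 for a fully conserved column,
    lower values for more variable columns. Gaps ('-') are ignored."""
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    # Normalize by the maximum entropy for a 20-letter amino acid alphabet.
    return 1.0 - entropy / math.log2(20)


def conservation_profile(alignment):
    """Per-position conservation for equal-length aligned sequences."""
    return [column_conservation(col) for col in zip(*alignment)]


# Toy alignment of four orthologs, invented for illustration only.
alignment = ["MKV", "MRV", "MKI", "MQV"]
profile = conservation_profile(alignment)
print([round(score, 3) for score in profile])
```

The first column is invariant and scores highest; the second column has three different residues and scores lowest.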
Assessment of the pipeline's performance
Predicted motifs that are interface motifs: True Positives (TP)
Predicted motifs that are non-interface motifs: False Positives (FP)
Precision = TP/(TP + FP)
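The precision formula can be applied directly to sets of motif identifiers. In the sketch below, the function name and the motif labels are invented for illustration.

```python
def precision(predicted, interface):
    """Precision = TP / (TP + FP), where TP are predicted motifs that are
    interface motifs and FP are predicted motifs that are not."""
    predicted = set(predicted)
    tp = len(predicted & set(interface))
    fp = len(predicted) - tp
    return tp / (tp + fp) if predicted else float("nan")


# Toy labels, invented for illustration only.
predicted_motifs = {"m1", "m2", "m3", "m4"}
interface_motifs = {"m1", "m3", "m9"}
value = precision(predicted_motifs, interface_motifs)
print(value)
```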
Assessment of the pipeline's performance
Coverage: up to 42%, 22% and 42% for the human, yeast and Arabidopsis subsets, respectively.
Precision: up to 58%, 96% and 100%.
Locating interaction binding sites in Arabidopsis sequences at a large scale – Overview
Predicted motifs: 1498 interactions among 985 proteins (36% of the proteins in the interactome and ~5.5% of all Arabidopsis proteins)
Validation and bioinformatics analysis
Comparison with single nucleotide polymorphism (SNP) data
[Figure: nsSNPs mapped onto the protein sequence and onto the predicted protein-protein binding sites]
nsSNPs (protein sequence): 2.2% > nsSNPs (binding sites): 1.6%
Functional constraints
Intermolecular coevolution
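The nsSNP comparison can be sketched as a fraction of residue positions that carry an nsSNP, computed once over the whole sequence and once over the predicted binding sites. This assumes the percentages on the slide are position-based fractions, which the slides do not state; the function name and the toy positions are likewise illustrative.

```python
def snp_fraction(positions, snp_positions):
    """Fraction of the given residue positions that carry an nsSNP."""
    positions = set(positions)
    return len(positions & set(snp_positions)) / len(positions)


# Toy data, invented for illustration only (1-based residue positions).
sequence_positions = range(1, 201)     # a 200-residue protein
binding_site_positions = range(40, 120)  # predicted binding-site residues
nssnp_positions = {5, 60, 120, 150, 180}

whole = snp_fraction(sequence_positions, nssnp_positions)
sites = snp_fraction(binding_site_positions, nssnp_positions)
print(whole, sites)
```

A lower fraction inside the binding sites than over the whole sequence is the pattern the slide attributes to functional constraints.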
Comparison with annotation of amino acid mutagenesis
Proteins with a predicted motif (n=985)
Mutagenesis annotation (UniProt) (n=38)
[Figure: mutated amino acids on the protein sequence may fall in protein-protein binding sites, DNA-binding sites or other functionally important sites]
16 cases: predicted motifs overlap the mutated amino acid
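Checking whether a predicted motif overlaps a mutated amino acid reduces to an interval test on sequence positions. The sketch below uses invented helper names, a made-up sequence and made-up mutation positions; retrieval of the UniProt mutagenesis annotation is not shown.

```python
def motif_spans(sequence, motif):
    """Return 1-based inclusive (start, end) spans of every motif occurrence."""
    spans = []
    start = sequence.find(motif)
    while start != -1:
        spans.append((start + 1, start + len(motif)))
        start = sequence.find(motif, start + 1)
    return spans


def overlaps_mutation(spans, mutated_positions):
    """True if any mutated position falls inside any motif span."""
    return any(lo <= pos <= hi for lo, hi in spans for pos in mutated_positions)


# Toy data, invented for illustration only.
sequence = "MGRGKIEIKRIENKT"
spans = motif_spans(sequence, "KRIE")
print(spans)
print(overlaps_mutation(spans, {10}))
print(overlaps_mutation(spans, {3}))
```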
Some interesting cases
Master's Project Proposal: Cross-species analysis of protein-protein binding motifs
Questions?
Practical assignment – Perl scripting for