PPT - Bioinformatics.ca

Protein Interactions
Michel Dumontier, Ph.D.
Carleton University
Lecture 4.1
Outline
Protein interactions
• Discovery
– Experimental
• Storage
Molecular Interactions
[Diagram: two molecular objects, A and B]
• Between two molecular objects
– DNA, RNA, gene, protein, molecular complex, small molecule, photon
– Binding Sites
• Under some Experimental Condition
• With a particular Cellular Location
• Possibly having a Chemical Action
Interaction Discovery
• Databases
– Fully electronic
– Easily computer readable
• Literature
– Increasingly electronic
– Human readable
• Biologist’s brains
– Richest data source
– Limited bandwidth access
• Experiments
– Basis for models
Yeast Two Hybrid Assay
• The two-hybrid system is a molecular genetic tool which facilitates the study of protein-protein interactions.
• If two proteins interact, then a reporter gene is transcriptionally activated.
– e.g. gal1-lacZ - the beta-galactosidase gene
• A colour reaction can be seen on specific media.
• You can use this to
– Study the interaction between two proteins which you expect to interact
– Find proteins (prey) which interact with a protein you have already (bait).
Two-hybrid assay
[Figure: the two-hybrid assay in four steps (1-4). Labels: SNF1, SNF4, A, B, GAL4-DBD, Transcription activation domain, UASG, GAL1. Allows growth on galactose.]
Fields S, Song O. Nature. 1989 Jul 20;340(6230):245-6. PMID: 2547163
Some Two-hybrid caveats
[Figure: two-hybrid steps 1-4 with only protein A shown]
Does the DNA Binding Domain fusion have activity by itself?
Some Two-hybrid caveats
[Figure: two-hybrid steps 1-4 with proteins A, B and C shown]
Is the ‘interaction’ mediated by some other protein?
Some Two-hybrid questions
[Figure: two-hybrid steps 1-4 with proteins A and B shown]
Are the proteins expressed?
Are they over-expressed?
Are they in-frame?
Are the interacting domains defined?
Was the observation reproducible?
Was the strength of interaction significant?
Was another method used to back up the conclusion?
Are the two proteins from the same compartment?
Affinity purification
[Figure: protein of interest (A) carrying a tag modification (e.g. HA/GST/His). A second molecule is labelled: this molecule will bind the ‘tag’.]
Affinity purification
[Figure: protein A inside the cell]
Affinity purification
[Figure: the cell, containing A, a naturally binding protein (B) and lots of other untagged proteins]
Affinity purification
[Figure: ruptured membranes; A and B are now in the cell extract]
Affinity purification
[Figure: complex of A and B]
Untagged proteins go through fastest (flow-through).
Affinity purification
[Figure: complex of A and B]
Tagged complexes are slower and come out later (eluate).
Some affinity purification questions
[Figure: complex of A and B]
Is the bait protein expressed and in frame?
Is the bait protein observed?
Is the bait protein over-expressed?
Are the interacting domains defined?
Was the observation reproducible?
Was the interactor found in the background?
Was the strength of interaction significant?
Was the interaction saturable?
Was the interactor stoichiometric with the bait protein?
Was another method used to back-up the conclusion?
Was tandem-affinity purification (TAP) used?
Was the interaction shown using an extract or a purified protein?
Is the inverse interaction observable?
Are the two proteins from the same compartment?
Are the two proteins known to be involved in the same process?
Is the interactor likely to be physiologically significant?
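One of the questions above, whether the interactor was found in the background, can be checked mechanically when a control purification is available (for example, one run without the tagged bait). The following is only a minimal sketch under that assumption: each purification is reduced to a set of protein names, and the function name and example proteins are hypothetical, not taken from the lecture.

```python
def specific_interactors(bait, eluate, control_eluate):
    """Return proteins seen with the bait but not in the control purification."""
    # Anything also present in the control is treated as background.
    candidates = set(eluate) - set(control_eluate)
    # The bait itself is not counted as its own interactor.
    candidates.discard(bait)
    return sorted(candidates)


# Hypothetical example: B co-purifies with A, while "sticky" appears in every run.
tagged_run = {"A", "B", "sticky"}
control_run = {"sticky"}
print(specific_interactors("A", tagged_run, control_run))
```

A presence/absence filter like this says nothing about stoichiometry or strength of interaction, which the other questions in the list address.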
Some affinity purification caveats
First and most importantly, this is only a representation of the observation.
You can only tell what proteins are in the eluate; you can’t tell how they are connected to one another.
[Figure: A and B]
If there is only one other protein present (B), then it’s likely that A and B are directly interacting.
[Figure: A, B and C]
But what if I told you that two other proteins (B and C) were present along with A…
Complexes with unknown topology
[Figure: three alternative arrangements of A, B and C]
Which of these models is correct?
The complex described by this experimental result is
said to have an Unknown Topology.
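To see why the topology is unknown, it can help to enumerate the candidate models explicitly. The sketch below assumes a simple graph view that is not stated in the slides: each protein is a node, each direct contact is an edge, and any connected graph over the co-purified proteins is a possible model. The function names are illustrative.

```python
from itertools import combinations


def connected(nodes, edges):
    """Check whether every node can be reached from the first one."""
    nodes = list(nodes)
    seen = {nodes[0]}
    frontier = [nodes[0]]
    while frontier:
        current = frontier.pop()
        for a, b in edges:
            if current in (a, b):
                other = b if current == a else a
                if other not in seen:
                    seen.add(other)
                    frontier.append(other)
    return len(seen) == len(nodes)


def candidate_topologies(proteins):
    """List every set of direct contacts that holds all proteins in one complex."""
    pairs = list(combinations(sorted(proteins), 2))
    models = []
    for size in range(1, len(pairs) + 1):
        for edges in combinations(pairs, size):
            if connected(proteins, edges):
                models.append(edges)
    return models


for model in candidate_topologies({"A", "B", "C"}):
    print(model)
```

For A, B and C this prints four models: three chains (each with a different protein in the middle) and one triangle in which all three touch. The eluate alone cannot distinguish between them.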
Complexes with unknown stoichiometry
[Figure: a model containing two copies of A, together with B and C]
Here’s another possibility.
The complex described by this experimental result is
also said to have Unknown Stoichiometry.
High-throughput Mass Spectrometric Protein
Complex Identification (HMS-PCI)
Mike Tyers, SLRI
Ste12
Ho et al. Nature. 2002 Jan 10;415(6868):180-3
Synthetic Genetic Interactions
• Synthetic genetic interactions (lethal, slow growth)
• Mate two mutants without phenotypes to get a
daughter cell with a phenotype
• Synthetic lethal (SL), slow growth
• robotic mating using the yeast deletion library
• Genetic interactions provide functional data on
protein interactions or redundant genes
• About 23% of known SLs (1295 - YPD+MIPS) are
known protein interactions in yeast
Tong et al. Science. 2001 Dec 14;294(5550):2364-8
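The overlap quoted above (about 23% of 1295 synthetic lethal pairs, i.e. roughly 300 pairs) is in essence a set intersection over unordered gene pairs. A minimal sketch of that calculation follows; the function names and the toy gene names are made up for illustration, and this is not the procedure used in the cited paper.

```python
def normalise(pairs):
    """Treat (x, y) and (y, x) as the same unordered pair."""
    return {frozenset(pair) for pair in pairs}


def overlap_fraction(synthetic_lethal, protein_interactions):
    """Fraction of synthetic-lethal pairs that are also protein interactions."""
    sl = normalise(synthetic_lethal)
    ppi = normalise(protein_interactions)
    if not sl:
        return 0.0
    return len(sl & ppi) / len(sl)


# Toy data with made-up gene names.
sl_pairs = [("g1", "g2"), ("g3", "g4"), ("g5", "g6"), ("g7", "g8")]
ppi_pairs = [("g2", "g1"), ("g9", "g10")]
print(overlap_fraction(sl_pairs, ppi_pairs))  # 0.25
```

Normalising to unordered pairs matters because the two data sets may list the same pair in opposite orders.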
Working overtime
Charlie Boone’s Robots
Synthetic Genetic Interactions in Yeast
[Figure legend: Cell Polarity, Cell Wall Maintenance, Cell Structure, Mitosis, Chromosome Structure, DNA Synthesis, DNA Repair, Unknown, Others]
Tong, Boone
SGA Synthetic Genetic Interaction Network 2004
~1000 Genes
~4000 Interactions
132 SGA Screens
Tong, Boone, Science, Feb 2004
A measure of confidence?
• How do you know if the interaction really
exists?
• Each method has its advantages and
disadvantages.
– Be aware of systematic errors (e.g. tag effects)
– Be aware of contaminating proteins.
• Each method observes interactions from a
slightly different experimental condition.
• Support from many different sources is
certainly better than just one.
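The last point can be turned into a crude filter: count how many different methods report each pair and keep those seen by more than one. This is only a sketch of that idea, not a statistical confidence score; the tuple layout, function names and threshold are assumptions made for illustration.

```python
from collections import defaultdict


def methods_per_interaction(observations):
    """Map each unordered protein pair to the set of methods that reported it."""
    support = defaultdict(set)
    for protein_a, protein_b, method in observations:
        support[frozenset((protein_a, protein_b))].add(method)
    return support


def well_supported(observations, min_methods=2):
    """Keep pairs reported by at least `min_methods` different methods."""
    support = methods_per_interaction(observations)
    return {
        tuple(sorted(pair)): sorted(methods)
        for pair, methods in support.items()
        if len(methods) >= min_methods
    }


# Hypothetical observations: (protein, protein, method).
observed = [
    ("A", "B", "two-hybrid"),
    ("B", "A", "affinity purification"),
    ("A", "C", "two-hybrid"),
    ("A", "C", "two-hybrid"),
]
print(well_supported(observed))
```

Here A-B is kept because two different methods support it, while A-C is dropped: the same method observed twice counts as one source.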
Outline
Molecular interactions
• Discovery
• Storage
– Databases
– File Formats
• Data Mining
Interaction/Pathway Databases
• Arguably the most accessible data source, but...
• Varied formats, representation, coverage
• Pathway data extremely difficult to combine and use
Pathway Resource List (http://cbio.mskcc.org/prl/)
http://bind.ca
• A free, open-source database for archiving and exchanging
molecular assembly information. BIND is managed by the
Blueprint Initiative at Mount Sinai Hospital in Toronto.
• The database contains
– Interactions/Reactions
– Molecular complexes
– Pathways
• BIND has an extensive data model, GNU software tools and is
based on the NCBI toolkit; extended recently to XML/Java
• The ~175000 BIND records are curated and validated.
Bader GD, Betel D, Hogue CW. (2003) BIND: the Biomolecular Interaction Network
Database. Nucleic Acids Res. 31(1):248-50 PMID: 12519993
BIND Interaction Types
[Pie chart]
Protein - DNA: 25%
Protein - Not Specified: 12%
Other: 9%
Protein - Protein: 54%
Protein - RNA: 1%
Protein - Small Molecule: 1%
Small Molecule - Gene: 1%
Gene - Gene: 4%
Interaction Experimental Evidence Captured
[Pie chart: Interaction Experimental Evidence in BIND]
Affinity Chromatography: 8%
Cross Linking: 25%
Three Dimensional Structure: 20%
Other: 1%
SGA: 8%
Two Hybrid Test: 38%
[Pie chart: Interaction Experimental Evidence Captured]
Light Scattering: 11%
Remaining: 1%
Immunostaining: 9%
Microarray: 14%
Resonance Energy Transfer: 9%
Not Specified: 6%
Other: 8%
Gel Filtration Chromatography: 14%
Fluorescence Anisotropy: 6%
Elisa: 6%
Equilibrium Dialysis: 16%
Electron Microscopy: 2%
Gel Retardation Assays: 1%
Gradient Sedimentation: 1%
Colocalization: 1%
Competition Binding: 1%
55 Identifier Searches
Supported!
GI Pair - CSV Export
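A GI pair export lends itself to simple downstream processing. The sketch below assumes, since the slides do not show the file layout, that each row holds two comma-separated integer GI numbers; the function names and sample values are invented for illustration and this is not BIND's own tooling.

```python
import csv
import io
from collections import defaultdict


def read_gi_pairs(handle):
    """Read rows of two integer GIs; skip blank or malformed rows."""
    pairs = []
    for row in csv.reader(handle):
        if len(row) < 2:
            continue
        try:
            pairs.append((int(row[0]), int(row[1])))
        except ValueError:
            continue  # e.g. a header row
    return pairs


def build_network(pairs):
    """Build an undirected adjacency map from GI pairs."""
    network = defaultdict(set)
    for gi_a, gi_b in pairs:
        network[gi_a].add(gi_b)
        network[gi_b].add(gi_a)
    return network


# Made-up GI numbers standing in for a real export.
sample = io.StringIO("111,222\n111,333\n")
network = build_network(read_gi_pairs(sample))
print(sorted(network[111]))  # [222, 333]
```

With a real file, `open(path, newline="")` would replace the `io.StringIO` stand-in.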
BIND Record Header
• BIND record identifier
• Description & Division
• Publications that support or dispute interaction
• Export Options
• Network Visualization
BIND Record View
• The Interacting Molecules (A and B)
• Main identifier: GI
• Organism
• Cross-references and aliases
• Gene Ontology terms
• Proteoglyphs
– Graphical representations of domain and protein structure.
• Ontoglyphs
– Graphical representations of molecule function, localization and binding
Gene Ontology
• Functional protein annotation
• http://www.geneontology.org
• Controlled vocabulary for protein function and
localization
• Molecular function e.g. DNA helicase
• Biological process e.g. mitosis
• Cellular Component e.g. nucleus
• Thousands of terms…
Ontoglyph Summary View
Ontoglyph Filtering
Other Interaction Databases
• DIP
– http://dip.doe-mbi.ucla.edu
• MINT
– http://mint.bio.uniroma2.it/mint
• MIPS
– http://mips.gsf.de/proj/yeast/tables/interaction/
• IntAct – EBI’s interaction database
– http://www.ebi.ac.uk/intact/
• Human Protein Interaction Database
– http://www.hpid.org/
• TRANSFAC – transcription factors
– http://www.gene-regulation.com/
Information Exchange
[Figure: information exchange between Software, Database and User. Panel labels: ">100 DBs and tools: Tower of Babel" and "With Data Exchange Format".]
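The case for a shared exchange format can be made with a back-of-the-envelope count. The sketch assumes, purely for illustration, that every database or tool has its own format and that a converter works in one direction only; neither assumption comes from the slides.

```python
def pairwise_converters(n_formats):
    """One-way converters needed if every format maps directly to every other."""
    return n_formats * (n_formats - 1)


def hub_converters(n_formats):
    """One-way converters needed with one shared exchange format (in and out)."""
    return 2 * n_formats


n = 100  # the slide's ">100 DBs and tools", taken here as exactly 100 formats
print(pairwise_converters(n), hub_converters(n))  # 9900 200
```

Direct translation grows quadratically with the number of formats, while a common format grows linearly.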
Data Exchange File Formats
• BIND http://bind.ca
– Peer reviewed but closed process (Spec v3.1)
– ASN.1 or XML DTD/Schema
• PSI-MI http://psidev.sourceforge.net
– Peer reviewed, HUPO community standard
– Widely adopted
• BioPAX http://www.biopax.org
– Community schema (Sloan Kettering, BioPathways Consortium)
– XML Schema, OWL, Protégé and GKB
• SBML
– Widely adopted for representing models of biochemical
reaction networks
BIND
[Figure: a BIND record shown as ASN.1 (text), XML and Flat File]
PSI level 2
PSI Record Format
BioPAX
http://www.biopax.org
• Represent:
– Metabolic pathways
– Signaling pathways
– Protein-protein, molecular interactions
– Gene regulatory pathways
– Genetic interactions
• Accommodate representations used in existing
databases such as BioCyc, BIND, WIT, aMAZE,
KEGG, Reactome, etc.
• Community effort (open meetings)
Conclusion
• Many experimental techniques to generate
interaction data
• Interaction databases like BIND are a great
resource for building up interaction networks
into pathways
• Common standards for file formats imperative
for making use of all this data!