Amsterdam 2004

Bioinformatics and Evolutionary Genomics
High throughput “functional” data / functional
genomics / Omics
High-throughput data on gene function
• What do I mean: omics, microarray, ChIP-on-chip
• Why are people generating these data?
– post-genomic era / systems biology: the challenge to
understand the roles of, e.g., the 6,000 gene products in yeast
and how they interact to create a eukaryotic organism.
– Because they can: apply automation also to other areas of
molecular biology beyond sequencing
– To have “screens” for the research question at hand rather
than having to test one guess at a time
• What about evolutionary genomics?
• Yeast
• Accuracy / noise
HTP data
• What do they mean: experimental knowledge, but still,
what do they mean in terms of e.g. function?
• A deluge
• Bioinformatics is needed for basic data handling; and
has IMHO only scratched the surface in terms of
coming up with biological questions with which we
can probe this data
Microarray data
two conditions often used for “screens”
(Correlated) mRNA expression
• mRNA levels are systematically measured under a variety of
different cellular conditions, and genes are grouped if they show
a similar transcriptional response to these conditions.
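The grouping step described above can be sketched in a few lines. This is only a minimal illustration, assuming that "similar transcriptional response" is measured with the Pearson correlation between expression profiles; the gene names, the toy values, and the 0.9 threshold are made up for the example and do not come from any of the cited datasets.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equally long expression profiles."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a flat profile carries no correlation information
    return cov / (sx * sy)

def coexpressed_pairs(profiles, threshold=0.9):
    """Return gene pairs whose profiles correlate at or above the threshold."""
    genes = sorted(profiles)
    pairs = []
    for i, g1 in enumerate(genes):
        for g2 in genes[i + 1:]:
            r = pearson(profiles[g1], profiles[g2])
            if r >= threshold:
                pairs.append((g1, g2, round(r, 3)))
    return pairs

# Hypothetical log-ratios over four conditions (illustrative values only).
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.1, 5.9, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
}
pairs = coexpressed_pairs(profiles)
print(pairs)
```

Here geneA and geneB rise together and are linked, while geneC is anti-correlated with both and stays unlinked.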
Hughes et al. 2000 Cell
Profile Similarity Identifies Sterol-Pathway Disturbance Resulting from Deletion of
Uncharacterized ORF YER044c (ERG28) and from Dyclonine Treatment
(A) Prominent gene clusters responding to interference with ergosterol
biosynthesis.
(B) Comparison of the transcript profile of an erg28Δ strain to that of an erg3Δ
strain.
(C) Sterol content of wild-type (left) and erg28Δ (right) strains.
Ihmels et al. 2002 Nature Genetics
Conventional hierarchical clustering of co-expression data can fail,
because genes can play a role in multiple cellular processes and their
common regulatory element may only be detectable in a subset of
experiments.
Alternative: detect genes that are co-expressed under a subset of
conditions, giving a comprehensive set of overlapping ‘transcriptional modules’.
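To make the idea of co-expression "under a subset of conditions" concrete, here is a simplified sketch of an iterative gene/condition refinement. It is not the published Ihmels et al. procedure, only an illustration in that spirit: the function name `refine_module`, the cutoff of 1.5, and the toy expression matrix are all assumptions made for this example.

```python
def refine_module(expr, seed_genes, cut=1.5, max_iter=10):
    """Alternate between picking responsive conditions and responsive genes.

    expr maps gene -> list of normalized expression values, one per condition.
    Returns (genes, conditions) once the gene set stops changing.
    """
    genes = set(seed_genes)
    n_cond = len(next(iter(expr.values())))
    conds = set()
    for _ in range(max_iter):
        # Conditions in which the current genes respond strongly on average.
        scores = {c: sum(expr[g][c] for g in genes) / len(genes)
                  for c in range(n_cond)}
        conds = {c for c, s in scores.items() if abs(s) >= cut}
        if not conds:
            return set(), set()
        # Genes that follow the same direction of response in those conditions.
        new_genes = set()
        for g, values in expr.items():
            signed = sum(values[c] * (1 if scores[c] > 0 else -1) for c in conds)
            if signed / len(conds) >= cut:
                new_genes.add(g)
        if not new_genes:
            return set(), set()
        if new_genes == genes:
            break
        genes = new_genes
    return genes, conds

# Hypothetical data: g3 responds in all four conditions, so it can sit in two modules.
expr = {
    "g1": [2.0, 2.1, 0.0, 0.1],
    "g2": [1.9, 2.2, 0.1, -0.1],
    "g3": [1.8, 1.8, 2.0, 2.1],
    "g4": [0.0, 0.1, 2.2, 1.9],
    "g5": [0.1, -0.1, 0.0, 0.1],
}
module_1 = refine_module(expr, {"g1", "g2"})
module_2 = refine_module(expr, {"g4"})
print(module_1, module_2)
```

Starting from different seeds yields two modules that share g3, which is the kind of overlap a single hierarchical tree cannot represent.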
Citric acid cycle? Different activity under different
experimental conditions
Rapid divergence in expression between duplicate genes inferred from
microarray & promoter data
0.1 = 3.2 My
Clustering conditions where the conditions are genes: yet another way to
get to functional “links”
Yeast-2-hybrid
Pairs of proteins to be tested for interaction are expressed as
fusion proteins ('hybrids') in yeast: one protein is fused to a
DNA-binding domain, the other to a transcriptional activator
domain. Any interaction between them is detected by the
formation of a functional transcription factor.
Examples from the original Ito publication:
(A) autophagy
(B) spindle pole body function
(C) vesicular transport
Arrows indicate the orientation of the two-hybrid interaction,
from the bait to the prey.
Accuracy of Y2H and how to improve it
Improving reliability using protein complexes reasoning /
internal consistency
Internal filtering!
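One simple form of internal filtering can be sketched as follows. The assumption here, which is not spelled out on the slide, is that an interaction counts as internally consistent when it is observed more than once, for example in both bait-to-prey orientations or in repeated screens. The function name, the `min_support` parameter, and the protein labels are hypothetical.

```python
from collections import Counter

def filter_by_support(observations, min_support=2):
    """Keep unordered protein pairs seen at least min_support times.

    observations is a list of (bait, prey) tuples; orientation is ignored
    when counting, so A->B and B->A support the same pair.
    """
    support = Counter(
        frozenset((bait, prey)) for bait, prey in observations if bait != prey
    )
    return {pair for pair, n in support.items() if n >= min_support}

# Hypothetical raw two-hybrid hits.
raw_hits = [
    ("A", "B"), ("B", "A"),   # reciprocal: kept
    ("A", "C"),               # seen once: dropped
    ("D", "E"), ("D", "E"),   # seen in two screens: kept
]
filtered = filter_by_support(raw_hits)
print(sorted(sorted(p) for p in filtered))
```

Filtering of this kind trades coverage for accuracy: fewer pairs survive, but each one has independent support within the data set itself.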
Mass spectrometry of purified complexes
• Individual proteins are tagged and used as 'hooks' to
biochemically purify whole protein complexes. These
are then separated and their components identified by mass
spectrometry.
Exosome
Ski
Stages in mRNA degradation
Socio-affinity indices: dotted lines, 5–10; dashed lines, 10–15;
plain lines, >15. Bait proteins are shown in bold, and shaded
circles around groups of proteins indicate cores and modules.
Cellular Function
pdb
Phylogenetic profile
Y2H
Protein interactions: literature databases
• Literature derived, normally manually curated (as opposed to
text mining)
• Biased?
• No new knowledge
• Useful for benchmarking & for the study of the evolution of e.g.
protein complexes
• For example: Munich Information Center for Protein
Sequences (MIPS)
• Databases that contain literature and omics: Database of
Interacting Proteins (DIP), Biomolecular Interaction Network Database
(BIND)
Systematic screening for lethality of knockouts on a rich
medium
• The functions of many open reading frames (ORFs) identified in genome-sequencing
projects are unknown. New, whole-genome approaches are
required to systematically determine their function. A total of
6925 Saccharomyces cerevisiae strains were constructed, by a high-throughput
strategy, each with a precise deletion of one of 2026 ORFs. Of
the deleted ORFs, 17 percent were essential for viability in rich medium.
Winzeler et al. 1999 Science
Genetic interactions (synthetic lethal/sick)
• Two nonessential genes that cause lethality when mutated
at the same time form a synthetic lethal interaction. Such
genes are often functionally associated and their encoded
proteins may also interact physically.
Tong et al. 2001 Science
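The definition of a synthetic lethal interaction translates directly into a small check. The sketch below assumes viability has already been scored as True/False for single and double deletions; the function name and the gene labels are hypothetical and only illustrate the logic.

```python
def synthetic_lethal_pairs(single_viable, double_viable):
    """Return pairs where each single deletion is viable but the double is not.

    single_viable maps gene -> bool; double_viable maps (gene1, gene2) -> bool.
    """
    pairs = []
    for (g1, g2), viable in double_viable.items():
        both_nonessential = single_viable.get(g1, False) and single_viable.get(g2, False)
        if both_nonessential and not viable:
            pairs.append(tuple(sorted((g1, g2))))
    return sorted(pairs)

# Hypothetical screen results.
single_viable = {"geneX": True, "geneY": True, "geneZ": True, "geneE": False}
double_viable = {
    ("geneX", "geneY"): False,  # synthetic lethal
    ("geneX", "geneZ"): True,   # no interaction
    ("geneE", "geneY"): False,  # geneE is essential on its own, so not synthetic
}
sl_pairs = synthetic_lethal_pairs(single_viable, double_viable)
print(sl_pairs)
```

The essential gene is excluded on purpose: its double mutant is dead for a trivial reason and says nothing about a functional link.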
One thing we can do with synthetic lethals
• Ideker: protein interactions
What to do with synthetic lethals?
Kelley and Ideker 2005 Natu
ChIP-on-chip
• Tagged strains (one strain for each regulator).
• Microarray for each strain to see which pieces of DNA
are found in excess if you isolate the regulator plus
bound DNA.
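"Found in excess" can be made quantitative as an enrichment ratio per array probe. The sketch below assumes one intensity for the immunoprecipitated sample and one for a control sample per probe, and uses a log2 ratio with a pseudocount; the threshold, pseudocount, and probe names are assumptions for illustration, and real analyses typically add normalization and an error model.

```python
from math import log2

def enriched_probes(ip_signal, control_signal, min_log2_ratio=1.0, pseudocount=1.0):
    """Return probes whose IP signal exceeds the control by the given log2 ratio."""
    hits = {}
    for probe, ip in ip_signal.items():
        ratio = log2((ip + pseudocount) / (control_signal[probe] + pseudocount))
        if ratio >= min_log2_ratio:
            hits[probe] = ratio
    return hits

# Hypothetical intensities for three intergenic probes.
ip_signal = {"probe1": 799.0, "probe2": 99.0, "probe3": 199.0}
control_signal = {"probe1": 99.0, "probe2": 99.0, "probe3": 99.0}
hits = enriched_probes(ip_signal, control_signal)
print(hits)
```

Probes that pass the threshold mark DNA regions that were pulled down together with the tagged regulator, and so are candidate binding sites.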
GFP localization
• Mating of strains carrying fluorescent protein markers specific
for organelles with strains carrying a fluorescent protein tag
for each gene
Other functional genomics data: the omes
• quantitative proteomics
• Kinome
• PTMome
• (Almost) all of these data are freely and publicly
available
• Take-home message: “wow, this exists!!!”
Bioinformatics for Benchmarking & Integration
Figure: accuracy versus coverage for each data type
• Coverage: fraction of reference set covered by data
• Accuracy: fraction of data confirmed by reference set
• Data types: purified complexes (HMS-PCI), purified complexes (TAP),
genomic context, mRNA co-expression, synthetic lethality, yeast two-hybrid
• Markers: raw data, filtered data, parameter choices
• Combined evidence: two methods, three methods
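The two benchmark quantities, coverage and accuracy, can be computed from a predicted interaction set and a reference set as sketched below. This is a simplified illustration: interactions are treated as unordered pairs, every predicted pair is counted (a real benchmark might only count pairs whose proteins both appear in the reference), and all protein labels are hypothetical.

```python
def normalize(pairs):
    """Treat interactions as unordered pairs so (A, B) equals (B, A)."""
    return {frozenset(p) for p in pairs}

def coverage_and_accuracy(predicted, reference):
    """Coverage: fraction of the reference set found in the data.
    Accuracy: fraction of the data confirmed by the reference set."""
    pred = normalize(predicted)
    ref = normalize(reference)
    confirmed = pred & ref
    coverage = len(confirmed) / len(ref) if ref else 0.0
    accuracy = len(confirmed) / len(pred) if pred else 0.0
    return coverage, accuracy

# Hypothetical reference set and two hypothetical high-throughput data sets.
reference = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]
method_1 = [("A", "B"), ("B", "C"), ("X", "Y"), ("A", "Z")]
method_2 = [("B", "A"), ("C", "D"), ("Q", "R")]

single = coverage_and_accuracy(method_1, reference)
# Combined evidence: keep only pairs reported by both methods.
both = [tuple(p) for p in normalize(method_1) & normalize(method_2)]
combined = coverage_and_accuracy(both, reference)
print(single, combined)
```

In this toy example, requiring support from two methods raises accuracy from 0.5 to 1.0 while coverage drops from 0.5 to 0.25, which is the trade-off the plot is meant to show.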
Advanced integration