Large-Scale High-Resolution Orthology Using Gene Trees

FOG: High-Resolution Fungal
Orthologous Groups
René van der Heijden
Project 5.10: Comparative genomics for the
prediction of protein function and pathways in
Saccharomyces cerevisiae
What is this presentation about?
• What is ‘orthology’?
• Why do we study gene-ancestry/gene-trees
(phylogenies)?
• Why high-resolution orthology?
• Automated high-resolution orthology
detection
• The FOG database and some applications
Orthology
“This gene in that other species …”
• We don’t have chicken genes!
• They mean: the corresponding gene?
• Why that particular gene?
• Sure this actually is the gene?
• Sure that all n orthologs are correct?
Orthologous genes
[Animated figure: gene lineages drawn along a time axis. Labels: orthologs, paralogs, time, current, apparent history.]
A long long time ago, in a land far far away, there is a set of genes in some ancestral species (each line represents a gene)
• A speciation event
• One of the genes gets duplicated, resulting in two paralogous genes within the same species
• Another speciation event, resulting in two species
• But one of the paralogous genes is lost in one of the new species
Duplications, Speciations,
and Orthology
Two genes in two species are orthologous if
they derive from one gene
in their last common ancestor
Orthologous genes are
likely to have the same function
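The definition above can be read directly off a gene tree whose internal nodes are already labelled as speciation or duplication events. The following is a minimal sketch under that assumption; the `Node` class, the gene names and the example tree are hypothetical and not taken from the slides.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """A gene-tree node: leaves carry a gene name, internal nodes an event."""
    event: Optional[str] = None   # "speciation" or "duplication" (internal nodes)
    gene: Optional[str] = None    # gene name (leaves only)
    children: list = field(default_factory=list)


def path_to(node, gene):
    """Return the nodes from `node` down to the leaf named `gene`, or None."""
    if node.gene == gene:
        return [node]
    for child in node.children:
        sub = path_to(child, gene)
        if sub is not None:
            return [node] + sub
    return None


def relation(root, gene_a, gene_b):
    """Classify two different genes by the event at their last common ancestor."""
    lca = None
    for a, b in zip(path_to(root, gene_a), path_to(root, gene_b)):
        if a is not b:
            break
        lca = a  # deepest node shared by both root-to-leaf paths so far
    return "orthologs" if lca.event == "speciation" else "paralogs"


# Hypothetical tree: an ancestral duplication, then a speciation in each copy.
root = Node("duplication", children=[
    Node("speciation", children=[Node(gene="spA_1"), Node(gene="spB_1")]),
    Node("speciation", children=[Node(gene="spA_2"), Node(gene="spB_2")]),
])
```

Here `relation(root, "spA_1", "spB_1")` gives "orthologs", while `relation(root, "spA_1", "spB_2")` gives "paralogs", because those two genes trace back to the duplication rather than to a single gene at the speciation.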
Detecting orthologous genes
• Usual methods are based on BLAST hit quality,
e.g. bi-directional best hit (BBH)
[Figure: genes labelled “ortholog”, linked by BBH]
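As an illustration of the bi-directional best hit idea, here is a minimal sketch that assumes the similarity scores (for example BLAST bit scores) have already been collected into dictionaries. The function names, gene names and scores are made up for the example.

```python
def best_hits(scores):
    """Map each query gene to its highest-scoring subject gene.

    `scores` maps (query, subject) pairs to a similarity score;
    higher means more similar.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Return pairs (a, b) where a's best hit is b and b's best hit is a."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical scores between species A (a1, a2) and species B (b1, b2).
a_vs_b = {("a1", "b1"): 250, ("a1", "b2"): 90, ("a2", "b1"): 200, ("a2", "b2"): 60}
b_vs_a = {("b1", "a1"): 245, ("b1", "a2"): 198, ("b2", "a1"): 88, ("b2", "a2"): 61}
bbh = bidirectional_best_hits(a_vs_b, b_vs_a)
```

In this toy data only `("a1", "b1")` is a BBH. Gene `a2` also hits `b1` best, but `b1` prefers `a1`, so a recently duplicated gene such as `a2` is left without a partner.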
KOG clusters
• Based on triangle of BBH between genes of
three species
• In-paralogs are added
• Triangles are extended by other genes and
other species
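The seeding step, a triangle of BBHs between genes of three species, might be sketched as follows. This covers only the triangle search; adding in-paralogs and extending triangles are not shown. The gene names are hypothetical, and the sketch assumes each input pair is a BBH between genes of two different species.

```python
from itertools import combinations


def bbh_triangles(bbh_pairs):
    """Find gene triples in which every pair is a bi-directional best hit."""
    partners = {}
    for g1, g2 in bbh_pairs:
        partners.setdefault(g1, set()).add(g2)
        partners.setdefault(g2, set()).add(g1)

    triangles = set()
    for gene, linked in partners.items():
        # Two BBH partners of `gene` that are also BBHs of each other close a triangle.
        for x, y in combinations(sorted(linked), 2):
            if y in partners.get(x, set()):
                triangles.add(tuple(sorted((gene, x, y))))
    return sorted(triangles)


# Hypothetical BBH pairs between three species (prefixes sce, spo, ncr).
pairs = [("sce_1", "spo_1"), ("spo_1", "ncr_1"), ("sce_1", "ncr_1"), ("sce_2", "spo_2")]
triangles = bbh_triangles(pairs)
```

Only the first three pairs close a triangle; the lone pair `("sce_2", "spo_2")` does not seed a cluster on its own.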
KOG statistics
Low Resolution:
There must be functional specialization
within these clusters!
These large KOG clusters must have
multiple representatives per species
High-res versus Low-res
• Many,
• Complete, and
• Closely related
genomes
Challenge:
Automatic Orthology assignment
Gene Families
• Use PSI-BLAST to recognize (distant)
homologs
• Split gene set into families of homologous
genes
Challenge:
Promiscuous domains
Multi-domain genes occur very
often in eukaryotic genomes
Gene Families
• Promiscuous domains cause genes to be only
partially homologous:
– Gene A-B is partially homologous to gene A-C,
as is gene B-C
• Merging everything with homologous parts
generates far too large gene families:
– Not possible to obtain proper multiple alignments
• A more advanced technique is needed for separating multi-domain genes into gene families
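The over-merging problem can be made concrete with a small sketch. It assumes each gene is described simply by its set of domains and that two genes are merged whenever they share any domain (single-linkage clustering). The gene and domain names are hypothetical.

```python
def merge_by_shared_domains(genes):
    """Single-linkage clustering: genes sharing any domain end up in one family."""
    parent = {g: g for g in genes}

    def find(g):
        # Follow parent links to the family representative (with path halving).
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    names = list(genes)
    for i, g1 in enumerate(names):
        for g2 in names[i + 1:]:
            if genes[g1] & genes[g2]:          # at least one shared domain
                parent[find(g1)] = find(g2)

    families = {}
    for g in names:
        families.setdefault(find(g), []).append(g)
    return sorted(sorted(f) for f in families.values())


# Domain architectures from the example above, plus one unrelated gene.
genes = {
    "gene_AB": {"A", "B"},
    "gene_AC": {"A", "C"},
    "gene_BC": {"B", "C"},
    "gene_D": {"D"},
}
families = merge_by_shared_domains(genes)
```

All three multi-domain genes land in one family even though no single domain is common to all of them, which is how promiscuous domains inflate families.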
Generating Gene Families
• More advanced technique for the merging of genes
into gene families is not functional yet
• Fall back on ‘known’ gene families using KOG:
– Low resolution orthology assignments for Eukaryotes
– Some inclusive families with many genes per species
Some statistics:
• 15 fungal species with 104,440 genes in total
• Divided into 11,020 KOG clusters (gene families)
• Involving 70,867 genes (= 68%)
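The figures above can be checked with a few lines of arithmetic. The averages are derived here and do not appear on the slide.

```python
# Counts quoted on the slide.
species = 15
total_genes = 104440
kog_clusters = 11020
genes_in_clusters = 70867

# Fraction of all genes that fall into a KOG cluster.
coverage = genes_in_clusters / total_genes

# Average cluster size, overall and per species.
genes_per_cluster = genes_in_clusters / kog_clusters
genes_per_cluster_per_species = genes_per_cluster / species

print(f"coverage: {coverage:.1%}")
print(f"genes per cluster: {genes_per_cluster:.2f}")
print(f"genes per cluster per species: {genes_per_cluster_per_species:.2f}")
```

The coverage rounds to the quoted 68%. The mean of roughly 6.4 genes per cluster over 15 species says nothing about the spread, so it does not contradict the point that some inclusive families hold many genes per species.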
Uncertainty in trees
• Evolutionary noise
– Differing rates of evolution
– Convergent evolution (low complexity, coiled coils)
– Promiscuous domains (recombination, fusion, fission)
• Use of heuristic methods
– Multiple alignment
– Tree making
Reading Gene-Trees
Although genes spec1,1 and spec2,1 are
closer relatives, their distance is larger
than that between spec1,1 and spec3,1
The tree suggests at least 2 gene losses
Analyze trees …
but don’t trust them fully
If this is correct …. this can’t be
• Rigid analysis suggests many duplications and losses
• Presume scp branch is wrongly placed!
Analyze trees …
but don’t trust them fully
• And if we accept wrong placement of
branches …
Three orthologous groups, suggesting 15 gene losses
Considering one wrongly placed gene leaves only 2 gene losses
Automatic Orthology assignment
• LOFT:
Levels of Orthology From Trees
Result
• Collection of genes is split into KOG
families
• KOG families are aligned and phylogenetic
trees are derived
• Phylogenetic trees are analyzed using LOFT
resulting in high-resolution orthology
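The slides do not spell out how the trees are analyzed, so the following is only a sketch of one common heuristic, the species-overlap rule: an internal node is treated as a duplication when the species found in its two branches overlap, and as a speciation otherwise. It is not LOFT itself and does not reproduce its levels of orthology. The tree encoding (nested tuples) and the `species_geneid` naming convention are assumptions made for the example.

```python
def species_of(gene):
    """Assumed naming convention: 'species_geneid'."""
    return gene.split("_")[0]


def ortholog_pairs(tree):
    """List gene pairs whose last common ancestor node looks like a speciation.

    `tree` is a nested tuple: a leaf is a gene name, an internal node is a
    pair (left_subtree, right_subtree).
    """
    pairs = []

    def visit(node):
        if isinstance(node, str):
            return [node]
        left, right = visit(node[0]), visit(node[1])
        left_species = {species_of(g) for g in left}
        right_species = {species_of(g) for g in right}
        if not (left_species & right_species):
            # No shared species: treat as speciation, so genes across it are orthologs.
            pairs.extend((a, b) for a in left for b in right)
        return left + right

    visit(tree)
    return sorted(tuple(sorted(p)) for p in pairs)


# Hypothetical family: a duplication before the sce/spo split, with ncr as outgroup.
tree = ((("sce_1", "spo_1"), ("sce_2", "spo_2")), "ncr_1")
pairs = ortholog_pairs(tree)
```

In this toy tree `sce_1` and `spo_1` come out as orthologs, `sce_1` and `spo_2` do not, and the outgroup gene `ncr_1` is orthologous to all four genes on the other side of the root.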
Result
Can LOFT be trusted?
It seems okay!
Applications
• We now have FOG: a complete set of high-resolution
orthology assignments for fungi
• We ‘know’ which orthologous genes are
present and absent in which species
• Phyletic distribution
[Figure: Phyletic distribution of mitochondrial orthologous groups. Label: Complex I]
[Figure: Phylogenetic tree for mitochondrial carrier proteins]
Orthologous group 24 is an uncharacterized mitochondrial carrier
In yeast this is known as YMC1, unknown function
It is present in all fungi, except in Ashbya gossypii
YMC1: predicted glycine/serine antiporter
• There are three S. cerevisiae genes with the
same phyletic distribution:
– a subunit of glycine decarboxylase
– another subunit of glycine decarboxylase
– a gene with unknown function
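The search for genes with the same phyletic distribution might be sketched as below, assuming each orthologous group is stored with the set of species in which it is present. The species codes, group names and presence data are toy values for illustration, not the FOG data.

```python
def same_phyletic_distribution(profiles, query):
    """Return the groups present in exactly the same species as `query`."""
    target = profiles[query]
    return sorted(
        group for group, present in profiles.items()
        if group != query and present == target
    )


# Toy presence data: four species, with sp4 lacking some groups.
profiles = {
    "carrier_group": frozenset({"sp1", "sp2", "sp3"}),
    "decarboxylase_subunit_1": frozenset({"sp1", "sp2", "sp3"}),
    "decarboxylase_subunit_2": frozenset({"sp1", "sp2", "sp3"}),
    "housekeeping_group": frozenset({"sp1", "sp2", "sp3", "sp4"}),
    "patchy_group": frozenset({"sp1", "sp3"}),
}
matches = same_phyletic_distribution(profiles, "carrier_group")
```

Groups that are gained and lost together across species are candidates for a shared function, which is the reasoning behind the YMC1 prediction above. An exact match is the strictest choice; a real analysis might tolerate a mismatch in one species.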