Finding Orthologous Groups
René van der Heijden
NCMLS / CMBI, October 24th 2006
What is this lecture about?
• What is ‘orthology’?
• Why do we study gene-ancestry/gene-trees
(phylogenies)?
• Several approaches to find orthologous genes
• High-resolution orthology
• Steps involved
• Things to think about (homework)
Homology
Genes are homologous if and only if they
derive from the same ancestral gene
• Sufficient sequence similarity proves homology
• Very dissimilar sequences:
PSI-BLAST, HMM searches
Homologous genes tend to have
similar functions
The usual range
Accurate function prediction
requires something better than
homology
Orthology
Orthology
“This gene in that other species …”
We don’t have chicken genes!
• They mean: the corresponding gene
• Why that particular gene?
• Sure this actually is the gene?
• Sure that all n orthologs are correct?
Duplications, Speciations,
and Orthology
Evolution results in:
• Growing number of genes
– Gene duplications
– Horizontal gene transfer
– De novo generation
• Growing number of species
The fate of gene duplicates:
• Perish
• Find a new functional niche
Tendency for functional expansion
Duplications, Speciations,
and Orthology
Two genes in two species are orthologous if
they derive from one gene
in their last common ancestor
• Orthologous genes are likely to have the
same function
• Much stronger than “tend to have similar
function”
Orthologous genes
[Figure: animated gene tree drawn along a time axis, running from "a long long time ago, in a land far far away" to "current". Recoverable annotations: there is a set of genes in some ancestral species; the line represents a gene; a speciation event, resulting in two species; one of the genes gets duplicated, resulting in two paralogous genes; another speciation event; but one of the paralogous genes is lost in one of the new species. Labels: orthologs, paralogs, apparent history.]
Duplications, Speciations,
and Orthology
[Figure: gene tree running from the primal ancestor to the present genes; axis: evolutionary distance]
Homologs, Orthologs,
and Paralogs
• Homologous: one common ancestral gene
• Orthologous: separated by a speciation event
• Paralogous: separated by a duplication event
The view on orthology and paralogy is relative to a certain speciation
• Orthologs and Paralogs must be Homologs
Are there homologous genes which
are neither orthologous nor paralogous?
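One common reading of the definitions above is to label a pair of genes by the event at their last common ancestor node in the gene tree. The sketch below assumes a toy tree whose internal nodes already carry an event label; the tree, the gene names, and the function names are hypothetical and only serve as illustration.

```python
# Hypothetical toy gene tree: internal nodes are (event, child, child) with
# event "S" (speciation) or "D" (duplication); leaves are gene names.
TREE = ("D",
        ("S", "human_a", "mouse_a"),
        ("S", "human_b", "mouse_b"))


def leaves(node):
    """Return the gene names below a node."""
    if isinstance(node, str):
        return [node]
    _, left, right = node
    return leaves(left) + leaves(right)


def classify_pairs(node, out=None):
    """Label every gene pair by the event at its last common ancestor node."""
    if out is None:
        out = {}
    if isinstance(node, str):
        return out
    event, left, right = node
    label = "orthologs" if event == "S" else "paralogs"
    # Genes on opposite sides of this node have this node as their
    # last common ancestor, so they inherit its label.
    for g1 in leaves(left):
        for g2 in leaves(right):
            out[frozenset((g1, g2))] = label
    classify_pairs(left, out)
    classify_pairs(right, out)
    return out


relations = classify_pairs(TREE)
for pair, label in sorted(relations.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(pair), label)
```

In this toy tree human_a and mouse_a come out as orthologs, while human_a and mouse_b come out as paralogs because the duplication at the root separates them.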
Inparalogs and Outparalogs
• Both In- and Outparalogous genes are separated by a gene duplication event
• For Inparalogs, the duplication event is not followed by speciation(s)
• Outparalogs are separated by a duplication event, followed by speciation(s)
• Inparalogs are recent paralogs
• Outparalogs are more ancient paralogs
Are Inparalogs Orthologs?
Depends on your definition:
Yes: two genes are orthologous if they derive from one gene in the last common ancestor
No: two genes are orthologous if they are only separated by cell division events
Reading Gene-Trees
Although genes spec1,1 and spec2,1 are
closer relatives, their distance is larger
than that between spec1,1 and spec3,1
The tree suggests at least 2 gene losses
In-, and Outparalogs,
Orthologs, and Co-orthologs
www = What, Why, and hoW?
• What:
Orthologous genes are separated by cell
division only
• Why:
Orthologous genes are likely to have the
same function
• How:
Indeed: the “how” forms the remainder of
this lecture
Several approaches
• The COG approach
• InParanoid
• Tree-based methods
COG approach
• Based on BLAST hits
• Establishment and extension of triangles:
COG approach II
Extension of
orthologous groups
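The triangle idea can be sketched as follows: find three genes from three different species that are all best hits of each other, then merge triangles that share a side. This is a simplified illustration, not the actual COG pipeline; the hit table, the "species:gene" naming, and the function names are assumptions, and best hits are treated as symmetric pairs.

```python
from itertools import combinations

# Hypothetical symmetric best-hit pairs; each gene is named "species:gene".
BEST_HITS = {
    frozenset(pair) for pair in [
        ("A:g1", "B:g1"), ("B:g1", "C:g1"), ("A:g1", "C:g1"),  # a triangle
        ("B:g1", "D:g1"), ("C:g1", "D:g1"),                    # second triangle on side B-C
        ("A:g2", "B:g2"),                                      # a pair, no triangle
    ]
}


def species(gene):
    return gene.split(":")[0]


def find_triangles(hits):
    """Return all gene trios from three species that are mutual best hits."""
    genes = sorted({g for pair in hits for g in pair})
    triangles = []
    for trio in combinations(genes, 3):
        if len({species(g) for g in trio}) < 3:
            continue
        if all(frozenset(side) in hits for side in combinations(trio, 2)):
            triangles.append(set(trio))
    return triangles


def merge_triangles(triangles):
    """Merge triangles that share a side (two genes) into groups."""
    groups = []  # each group is a list of triangles
    for tri in triangles:
        touching = [g for g in groups if any(len(tri & t) >= 2 for t in g)]
        rest = [g for g in groups if not any(len(tri & t) >= 2 for t in g)]
        groups = rest + [[tri] + [t for g in touching for t in g]]
    return [set().union(*g) for g in groups]


cogs = merge_triangles(find_triangles(BEST_HITS))
print(cogs)
```

Here the two triangles share the side B:g1 / C:g1, so they merge into one group of four genes; the isolated pair A:g2 / B:g2 never forms a triangle and stays out of any group.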
InParanoid I
• Method denotes
– IN- and OUTparalogs
– For TWO species
• Find all hits from species A on B
• Find all hits from species B on A
• Find all bi-directional best hits (BBH)
– These form putative orthologs
InParanoid II
• Find all hits from A on A
• Find all hits from B on B
• Find all InParalogs
– These are all hits better than the orthologs
– Better => more recently split
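The two InParanoid slides can be sketched in a few lines: find bi-directional best hits between species A and B, then add within-species hits that score better than the ortholog pair. The score tables and function names below are hypothetical, and using the A-versus-B score as the cutoff is a simplifying assumption, not a description of the real program.

```python
# Hypothetical similarity scores (e.g. BLAST bit scores), keyed by (query, subject).
A_VS_B = {("a1", "b1"): 500, ("a1", "b2"): 200, ("a2", "b1"): 450, ("a3", "b3"): 300}
B_VS_A = {("b1", "a1"): 510, ("b1", "a2"): 440, ("b2", "a1"): 210, ("b3", "a3"): 290}
A_VS_A = {("a1", "a2"): 600, ("a1", "a3"): 50}
B_VS_B = {("b1", "b2"): 220}


def best_hits(scores):
    """Map each query to its highest-scoring subject."""
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where a's best hit is b and b's best hit is a."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return [(a, b) for a, b in best_ab.items() if best_ba.get(b) == a]


def inparalogs(seed, cutoff, within_scores):
    """Same-species genes that hit the seed better than the ortholog does."""
    return sorted(s for (q, s), score in within_scores.items()
                  if q == seed and score > cutoff)


groups = []
for a, b in bidirectional_best_hits(A_VS_B, B_VS_A):
    cutoff = A_VS_B[(a, b)]  # assumption: one direction's score as the cutoff
    groups.append(([a] + inparalogs(a, cutoff, A_VS_A),
                   [b] + inparalogs(b, cutoff, B_VS_B)))
print(groups)
```

In this toy data a2 scores 600 against a1, better than the a1/b1 ortholog score of 500, so a2 joins a1 as an inparalog; b2 scores only 220 against b1 and is left out.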
Detecting orthologous genes
• Usual methods are based on BLAST hit quality:
e.g. bi-directional best hit (BBH)
[Figure: labels "ortholog" and "BBH"]
Genes with promiscuous domains
• Gene A may hit on gene B because of a
shared domain X
• Gene B may hit on gene C because of a
shared domain Y
• Promiscuous domains require (manual)
curation
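A small sketch shows why this matters for any hit-based grouping. If groups are built by simply chaining hits (single linkage, chosen here only as an illustration), genes A and C end up together even though they never hit each other directly. The hit list and names are hypothetical.

```python
# Hypothetical hits: A and B share domain X, B and C share domain Y.
HITS = [("A", "B"), ("B", "C")]


def single_linkage_groups(hits):
    """Group genes by chaining hits: any hit joins the groups it touches."""
    groups = []
    for g1, g2 in hits:
        touching = [g for g in groups if g1 in g or g2 in g]
        rest = [g for g in groups if g1 not in g and g2 not in g]
        groups = rest + [set().union({g1, g2}, *touching)]
    return groups


groups = single_linkage_groups(HITS)
direct_hits = {frozenset(h) for h in HITS}

a_c_grouped = any({"A", "C"} <= g for g in groups)
a_c_direct_hit = frozenset(("A", "C")) in direct_hits
print(groups, a_c_grouped, a_c_direct_hit)
```

A and C are grouped without any direct hit between them, which is the kind of case the slide flags for manual curation.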
Tree-based methods
1. Get all homologous genes
2. Make multiple alignments
3. Generate phylogenetic gene trees
4. Analyze trees
• Uncertainty in multiple alignment?
• Different methods for distance calculations
• Superpose a trusted species tree?
• How to assess a level of accuracy?
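For step 4, one simple way to analyze a rooted gene tree without a species tree is the species-overlap rule: a node is called a duplication if the same species occurs on both sides, and a speciation otherwise. This is a heuristic swapped in for illustration, not necessarily the method used in the lecture; the tree and the "species_gene" leaf naming are hypothetical.

```python
# Hypothetical rooted gene tree as nested pairs; leaves are "species_gene".
GENE_TREE = ((("hs_1", "mm_1"), "dm_1"), ("hs_2", "mm_2"))


def species_of(leaf):
    return leaf.split("_")[0]


def label_nodes(node, events):
    """Return (species set, gene list) below node; record each internal node's event."""
    if isinstance(node, str):
        return {species_of(node)}, [node]
    left, right = node
    species_l, genes_l = label_nodes(left, events)
    species_r, genes_r = label_nodes(right, events)
    # Species-overlap rule: shared species on both sides means duplication.
    event = "duplication" if species_l & species_r else "speciation"
    events.append((sorted(genes_l + genes_r), event))
    return species_l | species_r, genes_l + genes_r


events = []
label_nodes(GENE_TREE, events)
for genes, event in events:
    print(event, genes)
```

In the toy tree only the root is a duplication, because hs and mm occur on both sides of it. The absence of a second dm gene below the root would then have to be explained by a gene loss.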
The Phylogenetic Gene-Tree
• Multiple alignment for all genes
• Distance matrix calculation
– Kimura correction
– PAM model
– Categories model
• Large trees: distance-based methods
– Neighbor Joining
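As an illustration of the distance step, the sketch below computes the fraction of differing residues between two aligned protein sequences and applies Kimura's approximate correction for multiple substitutions, d = -ln(1 - p - 0.2 p^2). The sequences and function names are made up for the example; columns with a gap in either sequence are skipped, which is one of several possible conventions.

```python
import math


def p_distance(seq1, seq2):
    """Fraction of differing residues over columns without gaps."""
    pairs = [(a, b) for a, b in zip(seq1, seq2) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(a != b for a, b in pairs) / len(pairs)


def kimura_distance(p):
    """Kimura's approximate correction for protein distances."""
    arg = 1.0 - p - 0.2 * p * p
    if arg <= 0:
        raise ValueError("sequences too divergent for this correction")
    return -math.log(arg)


# Hypothetical aligned sequences differing at 2 of 10 positions.
seq_a = "MKTAYIAKQR"
seq_b = "MKSAYLAKQR"

p = p_distance(seq_a, seq_b)
d = kimura_distance(p)
print(f"p = {p:.3f}, corrected d = {d:.3f}")
```

The corrected distance is always larger than the observed fraction p, since some positions may have changed more than once. A matrix of such pairwise distances is what a distance-based method like Neighbor Joining takes as input.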
Uncertainty in trees
• Evolutionary noise
– Differing rates of evolution
– Convergent evolution (low complexity, coiled coils)
– Promiscuous domains (recombination, fusion, fission)
• Use of heuristic methods
– Multiple alignment
– Tree making
Analyze trees …
but don’t trust them fully
If this is correct … this can’t be
• Rigid analysis suggests many duplications and losses
• Presume scp branch is wrongly placed!
Analyze trees …
but don’t trust them fully
• And if we accept wrong placement of
branches …
• Three orthologous groups, suggesting 15 gene losses
• Considering one wrongly placed gene leaves only 2 gene losses
Horizontal gene-transfer!
Remember … “In-, and Outparalogs,
Orthologs, and Co-orthologs”
Levels of Orthology
High-res versus Low-res
• Many,
• Complete, and
• Closely related
genomes
• Use phylogenetic
trees
Challenge:
Automatic Orthology assignment
Differential gene-loss
Things to think about
(homework)
• Select a partner
• Collect a gene tree (and some copies)
• Carefully deduce which nodes are duplications
and which are speciations
• Denote which genes are orthologous to each other
(orthologous groups)
• Select interesting parts to predict what
– The COG procedure would say
– InParanoid would say
– What would have happened if some genes (or species)
were not involved in the analysis
Homework: also think about …