Cal State LA - Instructional Web Server
Tree Pattern Matching in
Phylogenetic Trees
Automatic Search for Orthologs or Paralogs
in Homologous Gene Sequence Databases
By: Jean-François Dufayard, Laurent Duret,
Simon Penel, Manolo Gouy, François
Rechenmann, and Guy Perrière
Presented by: Jean Yeh
Background Information
The authors have created three databases that
gather genes into homologous families
HOVERGEN – vertebrates
HOBACGEN – prokaryotes
HOGENOM – completely sequenced organisms
Among homologous genes, need to be able to
differentiate orthologs from paralogs
Homologous Sequences
Homologs: Two genes related by descent
from a common ancestral DNA sequence
Orthologs: Two genes in different species;
evolved from a single ancestral gene by
speciation
Paralogs: Two genes related by duplication
within a genome
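A minimal sketch of these definitions, assuming a gene tree whose internal nodes are already labeled as speciation or duplication events: two genes are orthologs if their last common ancestor is a speciation node, and paralogs if it is a duplication node. The tuple layout, the gene names, and the function classify are illustrative assumptions, not taken from the paper.

```python
# Hypothetical labeled gene tree: internal nodes are (event, left, right),
# leaves are gene names. An ancient duplication produced the alpha and beta
# copies, and a later speciation separated human from mouse.
TREE = ("duplication",
        ("speciation", "human_alpha", "mouse_alpha"),
        ("speciation", "human_beta", "mouse_beta"))


def leaves(node):
    """Return the set of gene names below a node."""
    if isinstance(node, str):
        return {node}
    _, left, right = node
    return leaves(left) | leaves(right)


def classify(node, a, b):
    """Classify genes a and b by the event at their last common ancestor."""
    if a == b or not {a, b} <= leaves(node):
        raise ValueError("need two distinct genes from the tree")
    event, left, right = node
    for child in (left, right):
        # Descend while both genes still sit under the same child.
        if {a, b} <= leaves(child):
            return classify(child, a, b)
    return "orthologs" if event == "speciation" else "paralogs"


print(classify(TREE, "human_alpha", "mouse_alpha"))
print(classify(TREE, "human_alpha", "human_beta"))
```

Note that human_alpha and mouse_beta are also paralogs under this rule, because the node that joins them is the duplication.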
Orthologs and Paralogs
http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/orthologs3.gif
Gene Function
Gene function tends to change after gene
duplication
Orthologs are more reliable predictors of
gene function than paralogs
Evolutionary distance also plays a role
Closely related paralogs are probably more similar
in function than distantly related orthologs
Goal
Create algorithms that allow for automatic
searching for orthologs or paralogs in their
databases
One algorithm for tree reconciliation
One algorithm for tree pattern matching
Implement both within the architecture used to query
the databases
Tree Reconciliation
Infers speciation and duplication events
Compares gene tree G with species tree S to
give a reconciled tree R
Algorithm:
R=S
Step through G and R simultaneously
If nodes are incongruent, insert duplication node
in R and annotate gene losses
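The idea of inferring events by comparing G with S can be sketched as follows. This is not the authors' insertion procedure: it swaps in the common LCA-mapping formulation of reconciliation, in which each gene-tree node is mapped to the smallest species-tree clade containing all of its species, and a node is called a duplication when it maps to the same clade as one of its children. Gene losses are not annotated here. The trees, the gene names, and the helpers clades and reconcile are assumptions made for illustration.

```python
def clades(tree):
    """List the leaf set of every species-tree node (a node's own set comes last)."""
    if isinstance(tree, str):
        return [frozenset([tree])]
    below = [clades(child) for child in tree]
    here = frozenset().union(*(sub[-1] for sub in below))
    return [c for sub in below for c in sub] + [here]


def reconcile(gene_tree, species_of, species_clades, events):
    """Map a gene-tree node to a species clade and record the event at each internal node."""
    if isinstance(gene_tree, str):
        return frozenset([species_of[gene_tree]])
    left, right = gene_tree  # this sketch assumes a binary gene tree
    m_left = reconcile(left, species_of, species_clades, events)
    m_right = reconcile(right, species_of, species_clades, events)
    # Smallest species clade that contains everything below this gene node.
    m_here = min((c for c in species_clades if (m_left | m_right) <= c), key=len)
    event = "duplication" if m_here in (m_left, m_right) else "speciation"
    events.append((gene_tree, event))
    return m_here


# Hypothetical example data.
S = (("human", "mouse"), "chicken")
G = ((("h1", "m1"), ("h2", "m2")), "c1")
SPECIES_OF = {"h1": "human", "h2": "human", "m1": "mouse", "m2": "mouse", "c1": "chicken"}

events = []
reconcile(G, SPECIES_OF, clades(S), events)
for node, event in events:
    print(event, node)
```

In this example the node joining (h1, m1) with (h2, m2) is reported as a duplication, since both of its children already span human and mouse.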
Tree Pattern Matching
A tree pattern is a special tree structure with
taxonomic and evolutionary parameters
contained in nodes and leaves
Can be considered a subtree
Want to match to a target tree
E.g. pattern (X, Y, Z) matches ((X, Y), Z),
(X, (Y, Z)), and ((X, Z), Y)
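One way to read the example above is that a pattern with an unresolved node matches any target tree that resolves it. A minimal sketch under that assumption: the target matches if it has the same leaves as the pattern and contains every grouping the pattern requires. Trees are nested tuples of leaf names, and the helper names are illustrative, not from the paper.

```python
def leaf_set(tree):
    """Return the set of leaf names below a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def groupings(tree):
    """Return the leaf sets of all internal nodes, ignoring child order."""
    if isinstance(tree, str):
        return set()
    found = {leaf_set(tree)}
    for child in tree:
        found |= groupings(child)
    return found


def matches(pattern, target):
    """True if target has the pattern's leaves and keeps all of its groupings."""
    return (leaf_set(pattern) == leaf_set(target)
            and groupings(pattern) <= groupings(target))


pattern = ("X", "Y", "Z")
for target in ((("X", "Y"), "Z"), ("X", ("Y", "Z")), (("X", "Z"), "Y")):
    print(target, matches(pattern, target))
```

A fully resolved pattern is stricter: (("X", "Y"), "Z") matches only targets that also group X with Y.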
Tree Pattern Matching
Uses a recursive algorithm that takes into
account different taxonomic levels as well as
the specific branch constraints
Cuts down on run time by checking the
number of leaves in the pattern and the target
tree
Allows users to search for orthologs/paralogs
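A rough sketch of how such a recursive search might look, under simplifying assumptions: trees are binary nested tuples, each pattern leaf names a taxon at any level, and each target leaf is a species that must belong to that taxon. Subtrees with fewer leaves than the pattern are skipped without attempting a match. Branch constraints are not modeled, and the LINEAGE table and function names are hypothetical.

```python
# Hypothetical lineage table: each species with the taxa it belongs to.
LINEAGE = {
    "Homo sapiens": {"Homo sapiens", "Primates", "Mammalia", "Vertebrata"},
    "Mus musculus": {"Mus musculus", "Rodentia", "Mammalia", "Vertebrata"},
    "Gallus gallus": {"Gallus gallus", "Aves", "Vertebrata"},
}


def n_leaves(tree):
    """Count the leaves below a node."""
    return 1 if isinstance(tree, str) else sum(n_leaves(child) for child in tree)


def matches(pattern, target):
    """True if the target subtree has the pattern's shape and compatible taxa."""
    if isinstance(pattern, str) or isinstance(target, str):
        # A pattern leaf only matches a target leaf from the named taxon.
        return (isinstance(pattern, str) and isinstance(target, str)
                and pattern in LINEAGE[target])
    (p1, p2), (t1, t2) = pattern, target
    # Children are unordered, so try both pairings.
    return ((matches(p1, t1) and matches(p2, t2))
            or (matches(p1, t2) and matches(p2, t1)))


def search(pattern, target):
    """Return every subtree of target that matches pattern."""
    if n_leaves(target) < n_leaves(pattern):
        return []  # too small: nothing at or below this node can match
    found = [target] if matches(pattern, target) else []
    if not isinstance(target, str):
        for child in target:
            found += search(pattern, child)
    return found


TARGET = (("Homo sapiens", "Mus musculus"), "Gallus gallus")
print(search(("Primates", "Rodentia"), TARGET))
print(search((("Mammalia", "Mammalia"), "Aves"), TARGET))
```

This version recounts leaves at every node for brevity; caching the counts would avoid the repeated work on large trees.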
FamFetch Interface
User interface to access the databases
Incorporates both algorithms
Pattern editor has two frames: tool and
pattern
Pattern frame – interactive editor to construct,
load, save, and match patterns with a tree
database
Tool frame – tools used in pattern frame
Tree Rooting
For tree reconciliation, the trees must be
rooted
The authors use their reconciliation algorithm to
find the most parsimonious solution: the
one that requires the fewest gene
duplications
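The parsimony criterion can be sketched as follows, assuming the candidate rootings of the gene tree have already been enumerated. Each candidate is scored by its number of inferred duplications, using the common LCA-mapping rule as a stand-in for the authors' reconciliation algorithm, and the rooting with the lowest count is kept. The species clades, gene names, and function names are illustrative assumptions.

```python
# Clades of a hypothetical species tree ((human, mouse), chicken), written by hand.
SPECIES_CLADES = [
    frozenset(["human"]), frozenset(["mouse"]), frozenset(["chicken"]),
    frozenset(["human", "mouse"]),
    frozenset(["human", "mouse", "chicken"]),
]
SPECIES_OF = {"h1": "human", "m1": "mouse", "c1": "chicken"}


def count_duplications(gene_tree):
    """Return (mapped species clade, duplication count) for a rooted binary gene tree."""
    if isinstance(gene_tree, str):
        return frozenset([SPECIES_OF[gene_tree]]), 0
    (m_left, d_left), (m_right, d_right) = (count_duplications(c) for c in gene_tree)
    # Smallest species clade containing all species below this node.
    m_here = min((c for c in SPECIES_CLADES if (m_left | m_right) <= c), key=len)
    is_duplication = m_here in (m_left, m_right)
    return m_here, d_left + d_right + int(is_duplication)


def best_rooting(candidates):
    """Pick the candidate rooting that implies the fewest duplications."""
    return min(candidates, key=lambda tree: count_duplications(tree)[1])


# The three possible rootings of an unrooted tree on h1, m1 and c1.
CANDIDATES = [(("h1", "m1"), "c1"), (("h1", "c1"), "m1"), (("m1", "c1"), "h1")]
for tree in CANDIDATES:
    print(tree, count_duplications(tree)[1])
print("chosen:", best_rooting(CANDIDATES))
```

Only the rooting that agrees with the species tree needs no duplication here; the other two each imply one.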
Reconciliation algorithm relatively fast
Tree Pattern Search
By framing the problem as a tree pattern
search, the authors broadened the range of
queries available to users
Can search for gene duplication or
speciation events, not just orthologs and
paralogs
The algorithm is also relatively fast, though it
loses the flexibility of human pattern matching
Automatic Search for Orthologs
Previously done with pairwise BLAST
searches and reciprocal hits
Requires the complete set of genes, and if the gene
data are wrong, results may be wrong
Classifying genes into clusters of orthologs
depends on evolutionary distance between
species
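For comparison, the reciprocal-hit approach mentioned above can be sketched in a few lines. The scores below are made up for illustration and stand in for pairwise BLAST results between two genomes; no BLAST call is made, and the function names are assumptions.

```python
def best_hits(scores):
    """Map each query gene to its highest-scoring hit in the other genome."""
    return {query: max(hits, key=hits.get) for query, hits in scores.items() if hits}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return gene pairs that are each other's best hit in both directions."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Made-up similarity scores: genome A queried against B, and B against A.
A_VS_B = {"hA": {"mA": 250, "mB": 120}, "hB": {"mA": 130, "mB": 100}}
B_VS_A = {"mA": {"hA": 245, "hB": 125}, "mB": {"hA": 115, "hB": 95}}

print(reciprocal_best_hits(A_VS_B, B_VS_A))
```

Here hB's best hit is mA, but mA's best hit is hA, so only the pair (hA, mA) is reported. A gene missing from either genome simply produces no pair, which is one reason this approach depends on complete and correct gene sets.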
Possible Improvement
Have program estimate reliability of
reconciliation
While it allows for easier comparative
sequence analysis, it was designed solely for
databases the authors had already created
Might be improved if it could be generalized
for more databases