Tree Pattern Matching in Phylogenetic Trees:
Automatic Search for Orthologs or Paralogs in Homologous Gene Sequence Databases
By: Jean-François Dufayard, Laurent Duret, Simon Penel, Manolo Gouy, François Rechenmann, and Guy Perrière
Presented by: Jean Yeh
Background Information
• The authors have created three databases that gather genes into homologous families:
  - HOVERGEN: vertebrates
  - HOBACGEN: prokaryotes
  - HOGENOM: organisms with completely sequenced genomes
• Among homologous genes, users need to be able to differentiate orthologs from paralogs
Homologous Sequences
• Homologs: two genes related by descent from a common ancestral DNA sequence
• Orthologs: two genes in different species that evolved from a single ancestral gene by speciation
• Paralogs: two genes related by duplication within a genome
Orthologs and Paralogs
http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/orthologs3.gif
Gene Function
• Gene function tends to change after gene duplication
• Orthologs are therefore more reliable predictors of gene function than paralogs
• Evolutionary distance also plays a role
• Closely related paralogs are probably more similar than distantly related orthologs
Goal
• Create algorithms that allow automatic searching for orthologs or paralogs in the authors' databases
  - One algorithm for tree reconciliation
  - One algorithm for tree pattern matching
  - Implement both under the architecture used to query the databases
Tree Reconciliation
• Infers speciation and duplication events
• Compares a gene tree G with a species tree S to give a reconciled tree R
• Algorithm:
  - Initialize R = S
  - Step through G and R simultaneously
  - If nodes are incongruent, insert a duplication node in R and annotate gene losses
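To make the idea of inferring duplications concrete, here is a minimal sketch. It does not reproduce the authors' procedure of inserting nodes into R. Instead it uses the common LCA-mapping formulation: each gene-tree node is mapped to the smallest species-tree clade containing all of its species, and a node is called a duplication when it maps to the same clade as one of its children. Gene losses are not annotated. The nested-tuple encoding and all function names are assumptions made for illustration.

```python
# Trees are nested 2-tuples; a leaf is a species name (str). In the gene
# tree, each leaf is labelled with the species the gene comes from.

def clades(tree):
    """Return (leaf set of `tree`, list of the leaf sets of all its nodes)."""
    if isinstance(tree, str):
        leaf = frozenset([tree])
        return leaf, [leaf]
    here, collected = frozenset(), []
    for child in tree:
        child_set, child_all = clades(child)
        here |= child_set
        collected += child_all
    return here, collected + [here]


def lca(species, species_clades):
    """Smallest species-tree clade that contains every species in `species`."""
    return min((c for c in species_clades if species <= c), key=len)


def label_events(gene_tree, species_clades):
    """Return (species set, tree with internal nodes as (event, left, right))."""
    if isinstance(gene_tree, str):
        return frozenset([gene_tree]), gene_tree
    (lset, ltree), (rset, rtree) = (label_events(c, species_clades)
                                    for c in gene_tree)
    here = lset | rset
    mapped = lca(here, species_clades)
    # Duplication: the node maps to the same clade as one of its children.
    is_dup = mapped in (lca(lset, species_clades), lca(rset, species_clades))
    return here, ("D" if is_dup else "S", ltree, rtree)


S = (("human", "mouse"), "chicken")                  # species tree
G = (("human", "mouse"), ("human", "chicken"))       # made-up gene tree

_, species_clades = clades(S)
_, labelled = label_events(G, species_clades)
print(labelled)
# ('D', ('S', 'human', 'mouse'), ('S', 'human', 'chicken'))
```

In this made-up example the root is a duplication, and the right-hand copy lacks a mouse gene, which a full reconciliation would record as a loss.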
Tree Pattern Matching
• A tree pattern is a specialized tree structure with taxonomic and evolutionary parameters contained in its nodes and leaves
• Can be considered a subtree
• The goal is to match it to a target tree
• E.g., the pattern (X, Y, Z) matches ((X, Y), Z), (X, (Y, Z)), and ((X, Z), Y)
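The (X, Y, Z) example can be sketched in code. This is an illustrative reading, not the authors' algorithm: a pattern node with more than two children is treated as unresolved, so it matches any target whose internal edges can be contracted to give the same children, in any order. Taxonomic levels and evolutionary constraints are left out, the pattern must cover the whole target, and the function names are hypothetical.

```python
from itertools import permutations

# Trees are nested tuples; a leaf is a label (str).

def is_leaf(tree):
    return isinstance(tree, str)


def child_lists(target):
    """Yield every tuple of children obtainable from `target` by contracting
    zero or more internal edges below its root."""
    def expand(children):
        if not children:
            yield ()
            return
        first, rest = children[0], children[1:]
        for tail in expand(rest):
            yield (first,) + tail              # keep this child as a subtree
            if not is_leaf(first):
                for sub in child_lists(first):
                    yield sub + tail           # or splice its children in
    yield from expand(tuple(target))


def matches(pattern, target):
    """True if `pattern` matches `target`, ignoring child order."""
    if is_leaf(pattern):
        return is_leaf(target) and pattern == target
    if is_leaf(target):
        return False
    for kids in child_lists(target):
        if len(kids) != len(pattern):
            continue
        for perm in permutations(kids):
            if all(matches(p, t) for p, t in zip(pattern, perm)):
                return True
    return False


pattern = ("X", "Y", "Z")
for target in ((("X", "Y"), "Z"), ("X", ("Y", "Z")), (("X", "Z"), "Y")):
    print(target, matches(pattern, target))    # True for all three
```

A fully resolved pattern is stricter: under this sketch `(("X", "Y"), "Z")` does not match `("X", ("Y", "Z"))`. Trying every permutation is exponential, so this is only suitable for small illustrative trees.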
Tree Pattern Matching
• Uses a recursive algorithm that takes into account different taxonomic levels as well as specific branch constraints
• Cuts down on run time by checking the number of leaves in the pattern and in the target tree
• Allows users to search for orthologs or paralogs
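The leaf-count check can be illustrated with a short pre-filter. The slides do not state the exact rule, so this sketch assumes the simplest one: a pattern cannot match a target tree that has fewer leaves than the pattern, and such targets are skipped before any recursive matching is attempted. The names `leaf_count` and `candidates` are hypothetical.

```python
# Trees are nested tuples; a leaf is a label (str).

def leaf_count(tree):
    """Number of leaves in a nested-tuple tree."""
    if isinstance(tree, str):
        return 1
    return sum(leaf_count(child) for child in tree)


def candidates(pattern, targets):
    """Keep only the targets large enough to possibly contain the pattern."""
    needed = leaf_count(pattern)
    return [t for t in targets if leaf_count(t) >= needed]


pattern = (("X", "Y"), "Z")
database = [
    ("X", "Y"),                    # 2 leaves: skipped
    (("X", "Y"), "Z"),             # 3 leaves: kept
    ((("X", "Y"), "Z"), "W"),      # 4 leaves: kept
]
print(candidates(pattern, database))
```

Counting leaves is linear in the size of a tree, so this filter is cheap compared with the recursive match it avoids.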
FamFetch Interface
• User interface to access the databases
• Incorporates both algorithms
• Pattern editor has two frames: tool and pattern
  - Pattern frame: interactive editor to construct, load, save, and match patterns against a tree database
  - Tool frame: tools used in the pattern frame
FamFetch
Tree Rooting
• For tree reconciliation, the trees must be rooted
• The authors use their reconciliation algorithm to find the most parsimonious solution: the one that requires the fewest gene duplications
• The reconciliation algorithm is relatively fast
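Choosing a root by parsimony can be sketched as follows. The slides do not say how candidate roots are enumerated, so this example simply tries a root on every edge of a binary gene tree and keeps the rooting with the fewest inferred duplications. Duplications are counted with the common LCA-mapping rule as a stand-in for the authors' reconciliation algorithm. All names and the nested-tuple encoding are assumptions.

```python
# Trees are nested 2-tuples; a leaf is a species name (str).

def rootings(tree):
    """Yield every rooted binary tree obtained by placing the root on an edge."""
    left, right = tree
    yield tree
    yield from _move_root(left, right)
    yield from _move_root(right, left)


def _move_root(sub, rest):
    """Move the root onto the edges inside `sub`; `rest` is everything else."""
    if isinstance(sub, str):
        return
    a, b = sub
    for near, far in ((a, b), (b, a)):
        yield (near, (far, rest))
        yield from _move_root(near, (far, rest))


def clade_sets(tree):
    """Return (leaf set of `tree`, list of the leaf sets of all its nodes)."""
    if isinstance(tree, str):
        leaf = frozenset([tree])
        return leaf, [leaf]
    (ls, la), (rs, ra) = clade_sets(tree[0]), clade_sets(tree[1])
    return ls | rs, la + ra + [ls | rs]


def count_duplications(gene_tree, species_tree):
    """Count gene-tree nodes that map to the same species clade as a child."""
    _, species_clades = clade_sets(species_tree)

    def lca(species):
        return min((c for c in species_clades if species <= c), key=len)

    def walk(node):
        if isinstance(node, str):
            return frozenset([node]), 0
        (ls, ld), (rs, rd) = walk(node[0]), walk(node[1])
        here = ls | rs
        is_dup = lca(here) in (lca(ls), lca(rs))
        return here, ld + rd + int(is_dup)

    return walk(gene_tree)[1]


S = (("human", "mouse"), "chicken")       # species tree
G = (("human", "chicken"), "mouse")       # gene tree with an arbitrary root

best = min(rootings(G), key=lambda t: count_duplications(t, S))
print(best, count_duplications(best, S))
# ('chicken', ('human', 'mouse')) 0
```

With the arbitrary root, G needs one duplication; re-rooting on the chicken branch makes it congruent with S and needs none.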
Tree Pattern Search
• By framing their algorithm as a tree pattern search, the authors increased the range of queries available to users
• Can search for gene duplication or speciation events, not just orthologs and paralogs
• The algorithm is also relatively fast, though it loses the flexibility of human pattern matching
Automatic Search for Orthologs
• Previously done with pairwise BLAST searches and reciprocal hits
  - Requires all genes to be known, and if the genes are wrong, the results may be wrong
• Classifying genes into clusters of orthologs depends on the evolutionary distance between species
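For comparison, the reciprocal-hit approach mentioned above can be sketched in a few lines. This is a generic illustration of reciprocal best hits, not any specific published pipeline. The similarity scores are made up, and the dictionaries stand in for parsed output of two pairwise searches (genome A against B, and B against A).

```python
def best_hits(scores):
    """Map each query gene to its highest-scoring subject gene."""
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    forward, backward = best_hits(a_vs_b), best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if backward.get(b) == a)


# Made-up scores keyed by (query, subject).
a_vs_b = {("hA1", "mB1"): 950, ("hA1", "mB2"): 400,
          ("hA2", "mB1"): 310, ("hA2", "mB2"): 300}
b_vs_a = {("mB1", "hA1"): 940, ("mB1", "hA2"): 305,
          ("mB2", "hA1"): 410, ("mB2", "hA2"): 290}

print(reciprocal_best_hits(a_vs_b, b_vs_a))
# [('hA1', 'mB1')]
```

Here `hA2` and `mB2` get no partner at all, which illustrates the caveat on the slide: the result depends entirely on the gene sets that go in.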
Possible Improvement
• Have the program estimate the reliability of the reconciliation
• While it allows for easier comparative sequence analysis, it was designed solely for databases the authors had already created
• Might be improved if it could be generalized to more databases