PhyloPat2 - Department of Computing Science

PHYLOPAT: AN UPDATED VERSION OF THE PHYLOGENETIC PATTERN DATABASE CONTAINS GENE NEIGHBORHOOD
Tim Hulsen et al.,
Nucleic Acids Research, 2009, Vol. 37, Database issue
Presenter: Reihaneh Rabbany
Presented in Bioinformatics Course (CMPUT 606),
Instructed by Prof. Guohui Lin,
Computing Science Department,
University of Alberta,
Winter 2009
INTRODUCTION
Phylogenetic patterns
- Show the presence or absence of certain genes in a set of whole genome sequences
- Can be used to determine sets of genes that occur only in certain evolutionary branches
- Are becoming more common as increasing amounts of orthology data become available
- Phylogenetic pattern search tools are available for querying proteins, but not for querying genes
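
As a minimal sketch of what a phylogenetic pattern is, the snippet below encodes presence or absence as a string of 1s and 0s, one character per species. The six-species list and the function name are illustrative assumptions, not PhyloPat's actual species set or code.

```python
# Hypothetical, shortened species list (assumption for illustration only);
# PhyloPat itself covers 39 Ensembl species in evolutionary order.
SPECIES_ORDER = ["yeast", "worm", "fly", "zebrafish", "mouse", "human"]


def phylogenetic_pattern(present_in, species_order=SPECIES_ORDER):
    """Return '1' for each species that has the gene and '0' otherwise."""
    present = set(present_in)
    return "".join("1" if species in present else "0" for species in species_order)


# A gene found only in the three vertebrates of the toy list.
pattern = phylogenetic_pattern({"zebrafish", "mouse", "human"})
print(pattern)
```

Each position in the string always refers to the same species, so two patterns can be compared character by character.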

PHYLOPAT
PhyloPat is a database that allows querying the Ensembl database with any phylogenetic pattern
Functionalities:
- Gene neighborhood view
- Anticorrelating patterns
- Support for Entrez Gene IDs
- Direct sequence retrieval of members of a phylogenetic lineage

ENSEMBL

Human genome
- 3 billion base pairs
- 35,000 genes
- The genome alone is of little use
  - Locations and relationships of individual genes are needed
  - Manual annotation
  - Ensembl (automated annotation)
Ensembl (freely accessible)
- Sequence data is fed into a software "pipeline"
  - Creates a set of predicted gene locations
  - Saves them in a MySQL database
- Originally focused on human
- Now includes mouse, fruitfly, zebrafish, plants, fungi, …
PHYLOPAT - DATABASE CONTENT


- A set of phylogenetic lineages
- Complete set of orthologies collected
  - All 39 species’ genes in Ensembl
    - 741 species pairs (every pair among the 39 species)
    - 815 452 genes
    - 19 010 478 orthologous relationships
      - 11 446 546 one-to-one
      - 4 588 300 one-to-many
      - 2 975 632 many-to-many
- Ensembl ortholog detection pipeline
  - Similarity values by best reciprocal hits and best score ratio (WU BLASTP)
  - Graph of gene relations and clustering
  - Multiple alignment (MUSCLE)
  - Phylogenetic tree (TreeBeST)
  - Orthologous relationships
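
The first pipeline step above relies on best reciprocal hits. A minimal sketch of that idea alone, assuming similarity scores are already available as dictionaries: the gene names, scores, and function names are made up, and the real Ensembl pipeline additionally uses best score ratios, clustering, alignment and tree building.

```python
def best_hits(scores):
    """Map each query gene to its highest-scoring target gene.

    `scores` maps (query, target) pairs to a similarity score.
    """
    best = {}
    for (query, target), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return pairs (a, b) where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b)
    backward = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if backward.get(b) == a)


# Toy scores (hypothetical genes h1..h3 and m1..m2).
human_vs_mouse = {("h1", "m1"): 250.0, ("h1", "m2"): 90.0,
                  ("h2", "m2"): 180.0, ("h3", "m2"): 120.0}
mouse_vs_human = {("m1", "h1"): 245.0, ("m2", "h2"): 175.0,
                  ("m2", "h3"): 118.0}

pairs = reciprocal_best_hits(human_vs_mouse, mouse_vs_human)
print(pairs)
```

In the toy data h3 also hits m2, but m2's best hit is h2, so h3 is not part of a reciprocal pair.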
PHYLOPAT - DATABASE CONSTRUCTION
Generating phylogenetic lineages
- Determining evolutionary order
  - Using the NCBI Taxonomy
- Phylogenetic tree -> phylogenetic lineages
- For each gene in the first species
  - Look for orthologs in the other species
  - Add all orthologs to the phylogenetic lineage
  - Check the orthologs themselves for further orthologs, until no additional orthologies are found for any of the genes
- Repeat for all genes in all 39 species that are not yet connected to any phylogenetic lineage
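
The procedure above can be sketched as a breadth-first expansion over the orthology relation. This is an illustration under stated assumptions, not the authors' code: the orthology relation is assumed to be symmetric and stored as a dictionary, and the gene names are hypothetical.

```python
from collections import deque


def build_lineages(genes, orthologs):
    """Group genes into lineages by following orthology links until none are new.

    `genes` lists all genes, species by species in evolutionary order.
    `orthologs` maps a gene to the set of its orthologs (assumed symmetric).
    """
    assigned = set()
    lineages = []
    for seed in genes:
        if seed in assigned:
            continue  # already connected to an earlier lineage
        lineage = {seed}
        queue = deque([seed])
        while queue:
            gene = queue.popleft()
            for other in orthologs.get(gene, ()):
                if other not in lineage:
                    lineage.add(other)
                    queue.append(other)  # check this ortholog's own orthologs too
        assigned |= lineage
        lineages.append(lineage)
    return lineages


# Toy data: y* = first species, f* = second, h* = third (hypothetical names).
genes = ["y1", "y2", "f1", "f2", "h1", "h2", "h3"]
orthologs = {
    "y1": {"f1"}, "f1": {"y1", "h1"}, "h1": {"f1"},
    "y2": {"h2"}, "h2": {"y2"},
}
lineages = build_lineages(genes, orthologs)
print(lineages)
```

Here h1 joins the lineage of y1 only through f1, which is why the orthologs themselves have to be checked. Genes without any ortholog end up in a lineage of their own.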
WEB APPLICATION

A web interface
- Queries the PhyloPat MySQL database
  - Phylogenetic lineages
  - Phylogenetic patterns
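
A rough sketch of what such a pattern query could look like. The PhyloPat schema is not given on the slides, so the table and column names below are hypothetical, and Python's built-in sqlite3 module stands in for MySQL, with a user-defined REGEXP function registered to imitate MySQL's REGEXP operator.

```python
import re
import sqlite3


def regexp(pattern, value):
    """Backs SQLite's `value REGEXP pattern` operator (not built in by default)."""
    return re.search(pattern, value) is not None


conn = sqlite3.connect(":memory:")
conn.create_function("REGEXP", 2, regexp)

# Hypothetical table: one row per phylogenetic lineage with its pattern.
conn.execute("CREATE TABLE lineage (lineage_id TEXT PRIMARY KEY, pattern TEXT)")
conn.executemany(
    "INSERT INTO lineage VALUES (?, ?)",
    [("L1", "111111"), ("L2", "000111"), ("L3", "110000")],
)

# Select the lineages whose pattern consists only of 1s.
rows = conn.execute(
    "SELECT lineage_id FROM lineage WHERE pattern REGEXP ? ORDER BY lineage_id",
    ("^1+$",),
).fetchall()
print(rows)
```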

OMNIPRESENT - OLIGOPRESENT - POLYPRESENT GENES

Omnipresent
- Genes present in all 39 species
  - Phylogenetic pattern ‘11111111111111111111111111111111111111’ (or MySQL regular expression ‘^1+$’)
  - 688 omnipresent genes
  - Most likely have important functions, since they are present in all species
Oligopresent
- Genes that exist in only one or two species
  - Show which species are evolutionarily most related
Polypresent
- Genes that are missing in only one or two species
  - A measure of evolutionary relatedness
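
The three categories can be told apart by counting 1s and 0s in a pattern. A minimal sketch, assuming patterns are strings of 0s and 1s long enough that the categories cannot overlap (true for 39 species); the function name and the catch-all label are illustrative.

```python
def classify(pattern):
    """Label a presence/absence pattern by how many species have the gene."""
    present = pattern.count("1")
    absent = len(pattern) - present
    if absent == 0:
        return "omnipresent"   # same as matching the regular expression ^1+$
    if present in (1, 2):
        return "oligopresent"  # found in only one or two species
    if absent in (1, 2):
        return "polypresent"   # missing in only one or two species
    return "other"


n_species = 39
print(classify("1" * n_species))
print(classify("1" + "0" * (n_species - 1)))
print(classify("1" * (n_species - 2) + "00"))
```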
ANTICORRELATING PATTERNS
Patterns that are exactly opposite
- Phylogenetic lineages with anticorrelating patterns can be functionally completely different, but could also be highly similar in function
  - ‘000000000000000010111001111001111110010’
  - ‘111111111111111101000110000110000001101’
- These genes can be analogous, i.e. performing a similar function without being evolutionarily related.
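
Computing the anticorrelating pattern amounts to flipping every character. A small sketch with a made-up seven-species pattern; the function name is an assumption.

```python
def anticorrelating(pattern):
    """Return the exactly opposite pattern: every 0 becomes 1 and every 1 becomes 0."""
    return pattern.translate(str.maketrans("01", "10"))


example = "0010110"
opposite = anticorrelating(example)
print(example, opposite)
```

Flipping twice returns the original pattern, so each pattern has exactly one anticorrelating partner.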
GENE NEIGHBORHOOD

Inferring ‘true’ orthology
- Orthologous conservation of gene neighborhood
- Human gene ENSG00000134398
  - Has two predicted orthologs in chimpanzee:
    - gene ENSPTRG00000007893
    - gene ENSPTRG00000009535
  - Its gene neighborhood corresponds only to that of gene ENSPTRG00000007893, for nine of the nearest neighbors
Inferring functional annotation
- Build hypotheses about the processes or pathways that genes might be involved in
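
One way to picture the neighborhood argument is to count, for each candidate ortholog, how many neighbors of the query gene have an ortholog among the candidate's neighbors. The sketch below uses made-up gene names and a simple count; it is an assumption about how the comparison could be automated, not the scoring PhyloPat uses.

```python
def neighborhood_support(query_neighbors, candidate_neighbors, orthologs):
    """Count query neighbors that have an ortholog among the candidate's neighbors."""
    candidate_set = set(candidate_neighbors)
    return sum(
        1 for neighbor in query_neighbors
        if orthologs.get(neighbor, set()) & candidate_set
    )


def best_supported(query_neighbors, candidates, orthologs):
    """Return the candidate ortholog whose neighborhood is best conserved."""
    return max(
        candidates,
        key=lambda c: neighborhood_support(query_neighbors, candidates[c], orthologs),
    )


# Hypothetical data: neighbors of one human gene and of two candidate orthologs.
human_neighbors = ["hA", "hB", "hC"]
candidates = {"c1": ["cA", "cB", "cC"], "c2": ["cX", "cB"]}
orthologs = {"hA": {"cA"}, "hB": {"cB"}, "hC": {"cC"}}

print(best_supported(human_neighbors, candidates, orthologs))
```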

FASTA-FORMAT SEQUENCE FILES

Both the pattern search output and the gene
neighborhood view contain links to FASTA files
of the peptide sequences
DISCUSSION AND CONCLUSION

PhyloPat is useful in
- Orthology detection
- Evolutionary studies
- Gene annotation
Complex queries
- It is possible to specify
  - A species set that should be included (1)
  - A species set that should be excluded (0)
  - A species set whose presence is indifferent (*)
- Use of regular expression queries
- Easy-to-use web interface
- Relies on only one database (Ensembl)
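
The 1 / 0 / * notation maps naturally onto a regular expression. A minimal sketch of one possible translation, with four-species toy patterns; how PhyloPat itself converts queries is not stated on the slides, so the function names and the mapping of * to [01] are assumptions.

```python
import re


def query_to_regex(query):
    """Translate a 1/0/* query into an anchored regular expression string."""
    parts = []
    for symbol in query:
        if symbol in "01":
            parts.append(symbol)   # species must be present (1) or absent (0)
        elif symbol == "*":
            parts.append("[01]")   # presence in this species is indifferent
        else:
            raise ValueError(f"unexpected symbol in query: {symbol!r}")
    return "^" + "".join(parts) + "$"


def search(query, patterns):
    """Return the patterns that satisfy the query, in their original order."""
    regex = re.compile(query_to_regex(query))
    return [p for p in patterns if regex.match(p)]


toy_patterns = ["0001", "0011", "1001", "0010"]
print(query_to_regex("00*1"))
print(search("00*1", toy_patterns))
```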

DISCUSSION AND CONCLUSION (CONT.)

Gene neighborhood view
- Locating evolutionarily related genomic clusters of genes
- Detecting the ‘true orthologs’ within large sets of predicted orthologs
- Functionally annotating less well-known genes
PhyloPat will be updated with each major Ensembl release to ensure up-to-date and reliable phylogenetic lineages (including newly added species)
LINEAGE INFORMATION OF PP000255
QUESTIONS