Orthology & Paralogy
Alignment & Assembly
Alastair Kerr Ph.D.
[many slides borrowed from various sources]
Overview
Orthology & Paralogy
Definitions and examples
Ways to determine an ortholog
Pre-calculations: resources
Alignment & Assembly
Differences
Key programs for each
Jalview example
Homologs
Have common origins but may or may not have common activity.
Homologous or not? Often decided by an arbitrary threshold on the level of similarity determined by alignment
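The threshold idea can be made concrete with a minimal Python sketch. It assumes the two sequences have already been aligned, skips columns gapped in both sequences, counts a gap opposite a residue as a mismatch (one of several conventions), and uses a hypothetical 30% identity cutoff purely for illustration. As the slide says, any such cutoff is arbitrary.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two sequences that are already aligned (equal length).

    Columns gapped in both sequences are skipped; a gap opposite a residue
    counts as a mismatch (one of several common conventions).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue  # gap in both sequences: not informative
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


def call_homologous(aligned_a: str, aligned_b: str, threshold: float = 30.0) -> bool:
    """Hypothetical homology call using an arbitrary percent-identity cutoff."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

For example, `MKT-LV` against `MKTALI` shares 4 of 6 compared columns (about 66.7% identity), so it passes the illustrative 30% cutoff.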
Homologs
…have common ancestry, but the way they are related can vary
(i.e. the reasons they have diverged into different sequences can vary)
orthologs - Homologs produced by speciation. They tend to have similar function.
paralogs - Homologs produced by gene duplication. They tend to have differing functions.
Orthologous or paralogous homologs
[Figure: an early globin gene undergoes gene duplication into the α-chain and β-chain genes. Mouse α, human α and cattle α are orthologs (α); human β, mouse β and cattle β are orthologs (β); cattle α and cattle β are paralogs (cattle).]
Homologs
Orthologs – diverged after speciation – tend to have similar function
Paralogs – diverged after gene duplication – some functional divergence occurs
Therefore, when linking similar genes between species or performing “annotation transfer”, identify orthologs
True or False?
A1x is the ortholog in species x of A1y?
A1x is a paralog of A2x?
A1x is a paralog of A2y?
Identifying Gene/Protein Relationships from Phylogenetic Trees
orthologs - Homologs produced by speciation. Gene phylogeny matches organismal phylogeny.
paralogs - Homologs produced by gene duplication. Indicated by multiple copies of a homolog in a given species, or by phylogenetic evidence of gene duplication, such as a gene phylogeny that does not match the organismal phylogeny.
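The tree-based definitions can be sketched in Python: label each internal node of a gene tree as a speciation or a duplication, find the last common ancestor of two genes, and call them orthologs or paralogs according to that node's event. The quiz figure is not reproduced here, so the example tree is a hypothetical one in which gene A duplicated into A1 and A2 before species x and y split; the class and function names are also illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Gene-tree node; internal nodes are labelled with the event that created them."""
    name: str = ""
    event: Optional[str] = None  # "speciation" or "duplication" (internal nodes only)
    children: List["Node"] = field(default_factory=list)


def path_to(root: Node, leaf: str) -> Optional[List[Node]]:
    """Nodes from `root` down to the leaf called `leaf`, or None if absent."""
    if not root.children:
        return [root] if root.name == leaf else None
    for child in root.children:
        sub = path_to(child, leaf)
        if sub is not None:
            return [root] + sub
    return None


def last_common_ancestor(root: Node, a: str, b: str) -> Node:
    """Deepest node shared by the root-to-leaf paths of `a` and `b`."""
    path_a, path_b = path_to(root, a), path_to(root, b)
    if path_a is None or path_b is None:
        raise KeyError(f"leaf {a!r} or {b!r} not in tree")
    lca = root
    for x, y in zip(path_a, path_b):
        if x is not y:
            break
        lca = x
    return lca


def relationship(root: Node, a: str, b: str) -> str:
    """Orthologs if the genes' LCA is a speciation node, paralogs if a duplication."""
    event = last_common_ancestor(root, a, b).event
    return {"speciation": "orthologs", "duplication": "paralogs"}[event]


# Hypothetical tree: gene A duplicates into A1 and A2, then species x and y split.
tree = Node(event="duplication", children=[
    Node(event="speciation", children=[Node("A1x"), Node("A1y")]),
    Node(event="speciation", children=[Node("A2x"), Node("A2y")]),
])
```

This only works once the gene tree's nodes have been labelled, which in practice means comparing the gene phylogeny with the organismal phylogeny.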
Gene Orthology: How to detect?
Most common: identify reciprocal best BLAST hits (EGO, COGs,…)
Example Problem:
When comparing human and bovine genes, for example, the bovine gene dataset is still quite incomplete.
Therefore, the current best hit may be a paralog, with the true ortholog not yet sequenced.
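Reciprocal best hits can be sketched in a few lines of Python. The sketch assumes two all-against-all BLAST searches (A vs B and B vs A) saved in the default 12-column tabular format (`-outfmt 6`), where the bit score is the last column; the gene names and scores in the example are hypothetical. It also mimics the problem above: removing the true cattle ortholog leaves a paralog as the one-way best hit.

```python
def parse_outfmt6(lines):
    """Yield (query, subject, bitscore) from BLAST tabular (-outfmt 6) lines.

    Assumes the default 12-column layout, where bitscore is the last column.
    """
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        cols = line.rstrip("\n").split("\t")
        yield cols[0], cols[1], float(cols[11])


def best_hits(hits):
    """Map each query to its highest-scoring subject (first one wins on ties)."""
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (gene_a, gene_b) pairs that are each other's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical example: human_A1/cattle_A1 and human_A2/cattle_A2 are the true ortholog pairs.
human_vs_cattle = [("human_A1", "cattle_A1", 500.0), ("human_A1", "cattle_A2", 300.0),
                   ("human_A2", "cattle_A2", 480.0), ("human_A2", "cattle_A1", 310.0)]
cattle_vs_human = [("cattle_A1", "human_A1", 495.0), ("cattle_A1", "human_A2", 305.0),
                   ("cattle_A2", "human_A2", 470.0), ("cattle_A2", "human_A1", 290.0)]
print(reciprocal_best_hits(human_vs_cattle, cattle_vs_human))

# Simulate an incomplete cattle gene set in which cattle_A1 is not yet sequenced.
partial_h_vs_c = [h for h in human_vs_cattle if h[1] != "cattle_A1"]
partial_c_vs_h = [h for h in cattle_vs_human if h[0] != "cattle_A1"]
print(best_hits(partial_h_vs_c)["human_A1"])  # one-way best hit is now the paralog
print(reciprocal_best_hits(partial_h_vs_c, partial_c_vs_h))
```

In this toy case the reciprocal check stops human_A1 from being paired with the paralog, but it cannot recover the missing ortholog; human_A1 is simply left without a partner.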
[Figure: gene tree of human, mouse and two cattle sequences]
2 Forms in 1 Species
[Figure: species tree marking gene presence (+) on each lineage; one species carries two forms (++)]
Slides from Jonathan Eisen
2 Forms in 1 Species - Gene Loss
[Figure: the gene duplicated in the common ancestor (++); one copy was then lost on two lineages (Loss), so only one species retains both forms (++)]
Unusual Distribution Pattern
[Figure: species tree in which the gene (+) is present in only some species]
Unusual Distribution - Gene Loss
[Figure: gene present in the ancestor (+), then lost on the marked branch ("Gene lost here")]
Unusual Distribution - Evolutionary Rate Variation
[Figure: gene present (+) in some species; in another (-?) a homolog may exist but be too diverged to be found]
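Ortholog guess via synteny
[Figure: two genomes with genes A and C in conserved order; gene B lies between A and C in one genome, with an unknown gene (?) at the corresponding position in the other]
Syntenic blocks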
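The synteny guess can be sketched as: find the nearest neighbours of the gene on each side that already have assigned orthologs (anchors), then look between those anchors in the other genome. This Python sketch assumes each region is given as a simple ordered list of gene names and that anchor assignments already exist; the function and gene names are hypothetical. An empty result corresponds to the "?" case, and any candidate still needs confirming by sequence similarity.

```python
def syntenic_candidates(order_1, order_2, anchors, gene):
    """Candidate orthologs of `gene` (from genome 1) in genome 2, guessed by synteny.

    order_1, order_2: gene names in chromosomal order for one region of each genome.
    anchors: dict mapping genome-1 genes to already-assigned genome-2 orthologs.
    Returns genome-2 genes lying between the orthologs of the nearest anchored
    neighbours of `gene` on either side (order may be inverted in genome 2).
    """
    i = order_1.index(gene)
    left = next((anchors[g] for g in reversed(order_1[:i]) if g in anchors), None)
    right = next((anchors[g] for g in order_1[i + 1:] if g in anchors), None)
    if left is None or right is None:
        return []  # gene is not flanked by anchors on both sides
    lo, hi = sorted((order_2.index(left), order_2.index(right)))
    # Exclude genes that are themselves anchors for other genome-1 genes.
    return [g for g in order_2[lo + 1:hi] if g not in anchors.values()]
```

For example, with genome 1 ordered A1, B1, C1, genome 2 ordered A2, X2, C2, and anchors A1→A2 and C1→C2, the candidate for B1 is X2.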
ensEMBL calculations
http://www.ensembl.org
demo
OMA Browser
http://omabrowser.org
demo
Alignments and Assemblies
Alignment
ALL sequences from SAME region
Therefore can be useless for non-overlapping contigs
Good for
PCR probes/oligos
paralogs/orthologs
Basis for phylogeny
Assembly:
Good for near-identical sequences
Types:
De novo
Guided [reference sequence]
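To illustrate what de novo assembly does with near-identical, overlapping sequences, here is a toy greedy assembler in Python. It is a sketch, not how production assemblers work: it assumes error-free reads from one strand, requires exact overlaps of at least `min_len` bases (a hypothetical parameter), and ignores reads contained inside other reads. Guided assembly would instead place reads against a reference sequence.

```python
from typing import List


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` that exactly matches a prefix of `b`."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads: List[str], min_len: int = 3) -> List[str]:
    """Repeatedly merge the pair of contigs with the longest exact overlap."""
    contigs = list(reads)
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            return contigs  # no overlaps left: remaining contigs do not overlap
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
```

For example, the reads `ATGGCGT`, `GCGTACG` and `TACGGAT` each overlap the next by 4 bases and merge into the single contig `ATGGCGTACGGAT`, while two reads with no overlap are returned unchanged as separate contigs.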
Alignment
Implicit statement:
Each residue in an aligned column is assumed to derive from the same residue in the last common ancestor [LCA]
Where that assumption is doubtful, it is OK to look only at conserved regions or to mask non-conserved regions
Especially for phylogeny
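Masking non-conserved regions can be sketched as a per-column filter. This minimal Python version assumes an already-computed alignment of equal-length strings, scores each column as the fraction of sequences sharing its most common residue (so gaps count against the column), and keeps columns at or above a hypothetical `min_conservation` cutoff. Dedicated trimming tools such as Gblocks or trimAl use more elaborate criteria.

```python
from collections import Counter
from typing import List


def column_conservation(column: str) -> float:
    """Fraction of sequences sharing the most common non-gap residue in a column."""
    residues = [c for c in column.upper() if c != "-"]
    if not residues:
        return 0.0
    most_common_count = Counter(residues).most_common(1)[0][1]
    return most_common_count / len(column)  # gaps are in the denominator


def mask_nonconserved(alignment: List[str], min_conservation: float = 0.5) -> List[str]:
    """Keep only alignment columns whose conservation is at least `min_conservation`."""
    if len({len(s) for s in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    columns = ["".join(col) for col in zip(*alignment)]
    keep = [i for i, col in enumerate(columns) if column_conservation(col) >= min_conservation]
    return ["".join(seq[i] for i in keep) for seq in alignment]
```

With the alignment `MKV-LA`, `MKIQLA`, `MRV-LG`, the default cutoff drops only the mostly-gapped fourth column, leaving `MKVLA`, `MKILA`, `MRVLG` for tree building.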
Alignment Tools
Faster but less accurate (some better with gaps)
Muscle
ClustalW/X
MAFFT
Slow but more accurate
*-Coffee
T-Coffee: original
3D-Coffee: uses PDB structures as a guide (structural)
M-Coffee: combines multiple methods
Probcons
Alignment Edit Tools
NEVER use a word processor or Excel to edit alignments
JalView (Java Alignment Viewer)
Good for editing
DAS capable
[Figure: Jalview overview diagram. Analysis: multiple sequence alignment, secondary structure prediction, consensus, conservation & clustering. Data: sequences, alignments, PDB structures, features (Distributed Annotation System, GFF, Jalview features), Jalview annotation, trees (Newick). 'Standard' formats: FASTA, MSF, CLUSTAL, PILEUP, BLC, PFAM. Output: visualization, clickable HTML, annotation, figure generation (images, line art).]
Jalview DAS Client Functionality
[Figure: Jalview querying DAS annotation servers]
•Select specific sources
•Filtered list
•Add user-defined sources
•Query matches ID to Authority
•Map to local reference frame
•Group features by source
•Type==colour
•Highlight start-end
•Mouse over for feature name, links and scores
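One part of "Map to local reference frame" is converting residue coordinates on a sequence into column positions in the gapped alignment, so that a feature is drawn in the right place. The Python sketch below is a hypothetical illustration, not Jalview's implementation: it assumes the feature's 1-based, inclusive coordinates already refer to the same ungapped sequence shown in the alignment, treats `-` and `.` as gap characters, and does not attempt ID matching against an authority.

```python
from typing import Dict, Optional, Tuple


def residue_to_column(aligned_seq: str) -> Dict[int, int]:
    """Map 1-based residue numbers of the ungapped sequence to 1-based alignment columns."""
    mapping = {}
    residue = 0
    for col, char in enumerate(aligned_seq, start=1):
        if char not in "-.":  # '-' and '.' are treated as gaps
            residue += 1
            mapping[residue] = col
    return mapping


def feature_to_columns(aligned_seq: str, start: int, end: int) -> Optional[Tuple[int, int]]:
    """Project a feature spanning residues start..end (inclusive) onto alignment columns."""
    mapping = residue_to_column(aligned_seq)
    if start not in mapping or end not in mapping:
        return None  # feature extends beyond this sequence
    return mapping[start], mapping[end]
```

For the aligned sequence `MK--TLV-AG`, a feature on residues 2 to 4 (K to L) is highlighted from column 2 to column 6, spanning the gap.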
Assemblers
Many free options
STADEN - staden.sf.net
Original assembler, all platforms
No longer in development
Useless for next-gen sequencing
MAQ and MAQView
Installed on the computers in COIL