Transcript Slides

Journal Club
Jenny Gu
October 24, 2006
Introduction
Defining the subset of Superfamilies in LUCA
Examine adaptability and expansion of particular
superfamilies of LUCA related to function and
genome size.
Challenges Woese’s annealing hypothesis.
Methods
3-D Structural Comparison
Domain Similarity Defined by:
SSAP
Dynamic Programming based Structure Comparison Algorithm
CORA
Comparison to 3D templates for each Superfamily.
Manual Inspection.
Profile based approaches
Detect sequence patterns between relatives
Functional Information
Public resources (COGs, GO, KEGG) and literature
Expert Curators
Methods
Genome Structural Annotation and Occurrence
Profiles
Dataset: 114 complete genomes.
100 Prokaryotic Genomes
85 bacterial and 15 archaeal species
14 Eukaryotic Genomes
Structural Annotation
CATH HMMs -> Gene3D database.
Superfamily Domain Occurrence Profiles
(Prokaryotes)
940/1278 CATH domain superfamilies present in at least one genome.
Annotation Coverage: 50% of genes.
Methods
Ancestral Superfamily Set Selection
Defined by:
Present in at least 90% of species from all kingdoms.
Present in at least 70% of archaeal and eukaryotic species.
Definition avoids selection of superfamilies
overrepresented in Bacteria but poorly represented
in smaller groups.
Allows for false-negative prediction errors of the
sequence-based approach.
Guarantee selection of families in LUCA.
Eliminate error introduced by horizontal gene transfer.
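The selection rule above can be sketched in a few lines. This is only an illustration under stated assumptions: both thresholds must hold together, the 70% threshold is applied to Archaea and Eukaryotes separately, and the names `fraction_present` and `is_ancestral` are hypothetical, not taken from the paper.

```python
from typing import Dict, Set


def fraction_present(species_with_sf: Set[str], group: Set[str]) -> float:
    """Fraction of the species in `group` that contain the superfamily."""
    if not group:
        return 0.0
    return len(species_with_sf & group) / len(group)


def is_ancestral(
    species_with_sf: Set[str],
    kingdoms: Dict[str, Set[str]],
    overall_min: float = 0.90,   # threshold over all species (from the slide)
    minority_min: float = 0.70,  # threshold for the smaller groups (from the slide)
) -> bool:
    """Apply both presence thresholds (assumed to be combined with AND)."""
    all_species = set().union(*kingdoms.values())
    if fraction_present(species_with_sf, all_species) < overall_min:
        return False
    # Assumption: the 70% rule is checked per kingdom, not on the pooled set.
    return all(
        fraction_present(species_with_sf, kingdoms[name]) >= minority_min
        for name in ("Archaea", "Eukaryota")
    )
```

The second threshold is what keeps a superfamily found in nearly every bacterium, but in few archaea, out of the ancestral set.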
Methods
Functional Annotation
Automatic Functional Annotation for 940 structural
superfamilies annotated in 100 prokaryotic species with COG.
Superfamily functionally classified according to statistically most
represented functional COG subcategory.
726/940 superfamilies annotated in COG (5% or more of
species, at least 5 genes)
For ancestral superfamily, further annotation with Pfam and
literature.
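A minimal sketch of the automatic classification step, assuming a simple plurality vote stands in for "statistically most represented" (the paper may use a formal test). The function name `classify_superfamily` and the input layout are hypothetical; the 5% of species and 5 gene cut-offs come from the slide.

```python
from collections import Counter
from typing import List, Optional, Tuple


def classify_superfamily(
    annotations: List[Tuple[str, str]],  # one (species, COG category) pair per annotated gene
    n_species_total: int,
    min_species_frac: float = 0.05,
    min_genes: int = 5,
) -> Optional[str]:
    """Return the most frequent COG category, or None if coverage is too low."""
    if len(annotations) < min_genes:
        return None
    species = {sp for sp, _ in annotations}
    if len(species) / n_species_total < min_species_frac:
        return None
    counts = Counter(category for _, category in annotations)
    # Plurality vote; ties are resolved by first appearance in the input.
    return counts.most_common(1)[0][0]
```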
Methods
Definition of the Superfamily Functional
Groups
COG has six functional groups
Translation
Replication
Metabolism
Cellular Process
Transcription
Poorly Characterized
Not considered
RNA processing and modification
Chromatin structure and dynamics
Results and Discussion
Superfamily Functional Distribution in the
Ancestral Domain Set
140 superfamilies found in all organisms of the three main
kingdoms (Bacteria, Archaea, and Eukaryotes)
15% of Superfamilies, 55% of all domains in bacterial genes,
and 18% of all domains in eukaryotes.
Results and Discussion
Superfamily Functional Distribution in the
Ancestral Domain Set (cont.)
Representatives in all six COG functional groups.
Translation (48 superfamilies) and Metabolic (46
superfamilies) comprise majority of ancestral
domains.
Metabolism (385 superfamilies) has undergone a
higher expansion than translation (90 superfamilies).
Results and Discussion
Analysis of the Cellular Functions of Ancestral
CATH Superfamilies in the LUCA
Two issues in defining ancestry:
Domain ubiquity through all species.
Probable functions such domains could have
performed in LUCA.
Results and Discussion
Analysis of the Cellular Functions of
Ancestral CATH Superfamilies in the
LUCA
Interconversion of sugars and synthesis of
polysaccharides.
Synthesis of ATP and partial equilibrium of
NAD/NADH
Part of the Calvin Cycle
Pentose phosphate pathway
Acetyl-CoA for cholesterol and/or steroids and
synthesis and degradation of fatty acids.
Part of the Krebs Cycle
Results and Discussion
Analysis of the Cellular Functions of
Ancestral CATH Superfamilies in the
LUCA
Nucleotide metabolism incomplete.
Two alternatives for LUCA
Synthesized nucleotides by de novo pathways
Incorporated from surrounding soup.
Enzymes for interconversion of nucleoside
monophosphates are present.
Results and Discussion
Analysis of the Cellular Functions of
Ancestral CATH Superfamilies in the
LUCA
DNA synthesis, repair, ligation, and modification are
represented.
Synthesis of RNA and DNA transcription
represented.
Domains related to the ribosomal particle and protein
synthesis are abundant.
Methyl Transfer Proteins
Results and Discussion
Analysis of the Cellular Functions of
Ancestral CATH Superfamilies in the
LUCA
Membrane and Cell wall biogenesis
Transduction of protein-protein signals and gene
regulation
Protein signal recognition for protein transport
Cell division
Electron transport
And ATP synthase
Methods
Universal Distribution Percentage of
Superfamilies
Universal Distribution Percentages
Superfamily occurrence profiles derived from the
prokaryotic sample (Archaea and Bacteria)
100% = Superfamily present in all species.
0% = Superfamily has highly specific distribution in
just a few species.
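The percentage defined above follows directly from an occurrence profile. A minimal sketch, assuming a profile is a mapping from species name to the number of domains of that superfamily found in its genome (the function name is hypothetical):

```python
from typing import Dict


def universal_distribution_percentage(profile: Dict[str, int]) -> float:
    """Percentage of species in which the superfamily occurs at least once."""
    if not profile:
        raise ValueError("occurrence profile is empty")
    present = sum(1 for count in profile.values() if count > 0)
    return 100.0 * present / len(profile)
```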
Results and Discussion
Ancestry and Evolutionary Temperature
Results and Discussion
Superfamily Duplication Rates and Functional
Diversification
Another measure to gauge
evolutionary temperature.
Number of homologues
within a superfamily.
Observed high correlation
with duplication and
functional diversification.
Results and Discussion
Superfamily Duplication Rates and Functional
Diversification
Superfamilies with high
universality span more
functional subcategories.
Metabolism has a higher
duplication rate and
functional diversification
than translation.
Methods
Genome Size Correlation and the Coefficient
of Interspecies Gene Variation (CIGV) of
Superfamilies
Domain occurrence profiles from 100 prokaryotic
sample.
Correlation coefficients between occurrence and
genome size (compared to a randomly generated null
model).
CIGV calculated by dividing the standard deviation by
the mean over all values of the occurrence profile for
a given superfamily.
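Both quantities on this slide can be sketched with the standard library. Assumptions: CIGV is treated as an ordinary coefficient of variation (standard deviation divided by the mean), the population standard deviation is used, and the correlation is Pearson's r; the slide does not say which variants the paper uses. The null-model comparison is not shown, and the function names are hypothetical.

```python
import math
from statistics import mean, pstdev
from typing import Sequence


def cigv(occurrences: Sequence[float]) -> float:
    """Coefficient of variation of one superfamily's occurrence profile."""
    mu = mean(occurrences)
    if mu == 0:
        raise ValueError("superfamily absent from every genome")
    return pstdev(occurrences) / mu


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation, e.g. between occurrence counts and genome sizes."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long samples with at least 2 points")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation undefined for a constant sample")
    return cov / math.sqrt(vx * vy)
```

Dividing by the mean makes the spread comparable between superfamilies with very different copy numbers.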
Methods
Statistical Analysis of Superfamily
Distributions
Kolmogorov-Smirnov two-sample test, two-tailed version for large samples.
Compared pairs of distributions between different
functional groups.
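A minimal sketch of the two-sample Kolmogorov-Smirnov test: the statistic D is the largest gap between the two empirical distribution functions, and the two-tailed p-value uses the standard large-sample (asymptotic) Kolmogorov series. The function names are hypothetical and no small-sample correction is applied.

```python
import math
from bisect import bisect_right
from typing import Sequence


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Maximum absolute difference between the two empirical CDFs."""
    sa, sb = sorted(a), sorted(b)
    d = 0.0
    for x in sa + sb:
        fa = bisect_right(sa, x) / len(sa)  # empirical CDF of a at x
        fb = bisect_right(sb, x) / len(sb)  # empirical CDF of b at x
        d = max(d, abs(fa - fb))
    return d


def ks_pvalue_asymptotic(d: float, n: int, m: int, terms: int = 100) -> float:
    """Two-tailed p-value from the asymptotic Kolmogorov distribution."""
    lam = d * math.sqrt(n * m / (n + m))
    if lam == 0:
        return 1.0
    series = sum(
        (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
        for k in range(1, terms + 1)
    )
    return max(0.0, min(1.0, 2.0 * series))
```

In this setting the two samples would be, for example, the CIGV values of superfamilies in two different functional groups.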
Results and Discussion
Superfamily Occurrence Profiles and Genome
Size Correlation
Results and Discussion
Superfamily Coefficient of Interspecies Gene
Variation
High CIGV values =
more adaptable.
Hotter evolutionary
temperature
Low CIGV values =
less adaptable.
Results and Discussion
Rates of Superfamily Innovation in the
Functional Groups
High Innovation
Poor Innovation
Conclusions
A more realistic distribution of superfamilies in
distant species.
Life achieved modern cellular status long before the
separation of the three kingdoms.
Woese’s annealing hypothesis called into question.
A function of specific features and adaptabilities versus
time.