Reactive oxygen signaling during normal metabolism and high light


What makes species different?
A study of Unique Genes
Our Definition of genes with unknown function
Gene models and proteomes for Saccharomyces cerevisiae (Sc), Schizosaccharomyces
pombe (Sp), Arabidopsis thaliana (At), Oryza sativa (Os), Drosophila melanogaster
(Dm), Anopheles gambiae (Ag), Caenorhabditis elegans (Ce), Mus musculus (Mm),
Rattus norvegicus (Rn), and Homo sapiens (Hs) were downloaded from the NCBI
website (ftp.ncbi.nlm.nih.gov).
HMMPFAM search against several major signature databases: PFAM, TIGRFAM, SMART, and Superfamily.
• A match to one or more of the models in any one of the databases: “Known” (PDF)
• No match to any of the models in any database: “Unknown” (POF)
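The known/unknown rule above reduces to a set union: a protein is “Known” if any of the four databases matched it. The sketch below assumes the HMMPFAM output has already been parsed into a hypothetical mapping from database name to the set of protein IDs with at least one model match; the function name and input format are illustrative, not the authors' pipeline.

```python
from typing import Dict, Iterable, Set

# The four signature databases named on the slide.
DATABASES = ("PFAM", "TIGRFAM", "SMART", "Superfamily")


def classify_proteins(protein_ids: Iterable[str],
                      hits_by_db: Dict[str, Set[str]]) -> Dict[str, str]:
    """Label each protein "Known" if any database model matched it, else "Unknown".

    `hits_by_db` is a hypothetical pre-parsed form of the HMMPFAM results:
    database name -> set of protein IDs with at least one model match.
    """
    matched: Set[str] = set()
    for db in DATABASES:
        matched |= hits_by_db.get(db, set())  # a database with no hits contributes nothing
    return {pid: ("Known" if pid in matched else "Unknown") for pid in protein_ids}
```

For example, with PFAM matching p1 and SMART matching p3, p2 is the only protein labelled “Unknown”.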
“Unknowns” account for about 25% of each genome
[Figure: percentage of each genome (0 to 100%) classed as Known or Unknown, for Sc, Sp, At, Os, Dm, Ag, Mm, Rn, Hs, and Ce]
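The per-genome share plotted here is the fraction of a proteome labelled “Unknown”. A minimal sketch, assuming the labels come as a hypothetical mapping from protein ID to "Known" or "Unknown":

```python
from collections import Counter
from typing import Dict


def percent_unknown(labels: Dict[str, str]) -> float:
    """Percentage of a proteome labelled "Unknown" (0.0 for an empty proteome)."""
    if not labels:
        return 0.0
    counts = Counter(labels.values())
    return 100.0 * counts["Unknown"] / len(labels)
```

A proteome with three “Known” and one “Unknown” protein gives 25.0.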
“Unknowns” are not as conserved as “knowns”, even between related organisms!
[Figure, Yeast: % hits at each BLAST e-value (1e-80 to 10) for Sc vs Sp known, Sc vs Sp unknown, Sc vs All known, and Sc vs All unknown]
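The conservation plot bins each query gene's best BLAST hit by e-value. The sketch below assumes the bin edges are the x-axis ticks (1e-80 up to 10), that a hit falls in the first bin whose edge is at least its e-value, and that genes with no hit, or only hits weaker than 10, are reported separately; these binning choices are assumptions, not taken from the deck.

```python
from bisect import bisect_left
from typing import Dict, Optional, Sequence, Union

# Upper bin edges read off the figure's x-axis (assumed binning).
EVALUE_EDGES = [1e-80, 1e-35, 1e-27, 1e-18, 1e-12, 1e-9, 1e-6, 1e-3, 0.1, 1.0, 10.0]


def percent_hits_per_bin(best_evalues: Sequence[Optional[float]]) -> Dict[Union[float, str], float]:
    """Percentage of query genes whose best hit falls in each e-value bin.

    `None` marks a gene with no hit at all; hits weaker than the last edge
    are also counted under "no hit".
    """
    counts: Dict[Union[float, str], float] = {edge: 0 for edge in EVALUE_EDGES}
    counts["no hit"] = 0
    for e in best_evalues:
        if e is None:
            counts["no hit"] += 1
            continue
        i = bisect_left(EVALUE_EDGES, e)  # index of the first edge >= e
        if i == len(EVALUE_EDGES):
            counts["no hit"] += 1
        else:
            counts[EVALUE_EDGES[i]] += 1
    total = len(best_evalues)
    if total == 0:
        return counts
    return {k: 100.0 * v / total for k, v in counts.items()}
```

Running it once per comparison (Sc vs Sp and Sc vs All, for known and unknown genes separately) yields one curve per legend entry.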
Relationship tree among the 10 different genomes reveals a high degree of evolutionary divergence among “unknowns” from different species
[Figure: relationship tree of Known and Unknown genes for Sc, Sp, At, Os, Dm, Ag, Rn, Mm, Hs, and Ce; legend includes "Outl."]
Do “unknowns” have a different rate of evolution?
Are “unknowns” new genes?
“Unknowns” are mainly species-specific.
Representation of “unknowns” in the “unique-ome” of different species
[Figure: % unique genes, split into Known and Unknown, for Sc, Sp, At, Os, Dm, Ag, Mm, Rn, Hs, and Ce; bars annotated with gene counts]
The “unique-ome” was defined by a BLAST e-value cutoff of 1e-6 between the 10 different genomes.
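Under the 1e-6 cutoff, a gene belongs to its species' unique-ome when it has no hit at or below that e-value in any of the other genomes. The sketch assumes BLAST results are available as hypothetical (query gene, query species, subject species, e-value) tuples and treats a hit exactly at the cutoff as conserved; both the input format and the inclusive comparison are assumptions.

```python
from typing import Dict, Iterable, Set, Tuple

CUTOFF = 1e-6  # BLAST e-value cutoff stated on the slide

# (query_gene, query_species, subject_species, evalue): hypothetical input format
Hit = Tuple[str, str, str, float]


def unique_genes(genes_by_species: Dict[str, Set[str]],
                 hits: Iterable[Hit],
                 cutoff: float = CUTOFF) -> Dict[str, Set[str]]:
    """Return, per species, the genes with no hit at e-value <= cutoff in another genome."""
    conserved: Set[Tuple[str, str]] = set()
    for gene, q_sp, s_sp, evalue in hits:
        # Hits within the same genome (including self-hits) do not count.
        if s_sp != q_sp and evalue <= cutoff:
            conserved.add((q_sp, gene))
    return {sp: {g for g in genes if (sp, g) not in conserved}
            for sp, genes in genes_by_species.items()}
```

Intersecting each species' unique set with its “Unknown” labels gives the Known/Unknown split within the unique-ome.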
Compared to “knowns”, “unknowns” are more disordered, less hydrophobic and shorter.
[Figure: three panels comparing Known and Unknown proteins for Sc, Sp, At, Os, Dm, Ag, Mm, Rn, Hs, and Ce: Hydro index, Disorder/length, and Avg Seq Length (aa)]
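Two of these properties can be computed directly from sequence. The deck does not define its "Hydro index", so the sketch below substitutes a hypothetical stand-in, the fraction of residues in an assumed hydrophobic set (A, V, I, L, M, F, W, C), alongside average length; disorder would require a dedicated predictor and is not sketched here.

```python
from statistics import mean
from typing import Dict, List

# Hypothetical hydrophobic residue set; not the deck's "Hydro index" definition.
HYDROPHOBIC = set("AVILMFWC")


def hydrophobic_fraction(seq: str) -> float:
    """Fraction of residues in `seq` that belong to HYDROPHOBIC."""
    seq = seq.upper()
    return sum(aa in HYDROPHOBIC for aa in seq) / len(seq) if seq else 0.0


def group_summary(seqs_by_label: Dict[str, List[str]]) -> Dict[str, Dict[str, float]]:
    """Mean length (aa) and mean hydrophobic fraction per label, e.g. "Known"/"Unknown"."""
    out: Dict[str, Dict[str, float]] = {}
    for label, seqs in seqs_by_label.items():
        if not seqs:
            continue  # skip empty groups rather than divide by zero
        out[label] = {
            "avg_length": mean(len(s) for s in seqs),
            "hydro_fraction": mean(hydrophobic_fraction(s) for s in seqs),
        }
    return out
```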
“Unknown” Conclusions
• Unknown genes are typically species-specific and might provide
some of the keys that define species-specific differences.
• Unraveling the function of “unknowns” would improve our
understanding of species-specific functions.
• The functions of disordered proteins are thought to include the
formation and regulation of large multi-molecular assemblies that
participate in important regulatory functions. Disordered protein
regions have been reported to evolve significantly more rapidly than
ordered regions.
• “Unknowns” are likely to be the result of greater
evolutionary divergence among species leading to the
establishment of new, species-specific regulatory
networks.