gene finding: the never ending problem


Searching for selenoproteins
in the genomes of
eukaryotic organisms
Bioinformatics, UPF.
what are selenoproteins?
Selenoproteins are proteins that
incorporate selenocysteine,
the 21st amino acid
selenocysteine
Role of selenium
• Selenium is an essential nutrient for animals,
microorganisms and some other eukaryotes.
• Selenium deficiency may lead to disease
– Keshan disease (the primary symptom is
myocardial necrosis, which leads to weakening of
the heart)
• Named after Keshan County in Heilongjiang Province, China, which has low
levels of selenium. Studies in Jiangsu Province have
indicated that selenium supplements reduce the prevalence of
this disease.
• Excess selenium can be toxic
– General Custer and the “Little Big Horn” massacre
Selenium is found in cells
mostly in selenoproteins
• Mostly redox enzymes
– Possible antioxidant protection capability
• Distributed in the three domains of life
• About 25 known selenoproteins in mammals, but the
number varies for different taxa
– 3 selenoproteins in Drosophila melanogaster
– 1 selenoprotein in Caenorhabditis elegans
Selenoproteins are thought to be essential for metazoan life
• Sometimes the orthologue of a selenoprotein has Cys
instead of Sec.
SelU is a selenoprotein in
fishes, but it is not in humans
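The Sec/Cys relationship between orthologues can be illustrated with a minimal sketch that reports alignment columns where selenocysteine (one-letter code U) in one protein is paired with cysteine (C) in the other. The function name and the two sequence fragments are invented for illustration; they are not real SelU sequences.

```python
def sec_cys_columns(aligned_a, aligned_b):
    """Return (column, residue_a, residue_b) for columns pairing Sec (U) with Cys (C)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = []
    for col, (a, b) in enumerate(zip(aligned_a.upper(), aligned_b.upper())):
        # A Sec/Cys pair is a column holding exactly one U and one C.
        if {a, b} == {"U", "C"}:
            pairs.append((col, a, b))
    return pairs


# Hypothetical aligned fragments, invented for illustration only.
fish_like = "MKLUGAV"
human_like = "MKLCGAV"
print(sec_cys_columns(fish_like, human_like))
```

Columns are 0-based positions in the alignment, so gap characters, if present, count as columns.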
the selenocysteine codon?
the selenocysteine codon: UGA
recoding of
UGA
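As a rough illustration of what recoding means, the sketch below translates an mRNA with the standard genetic code, but reads UGA as selenocysteine (U) when a flag says the transcript carries a SECIS element. This is a deliberate simplification: the `has_secis` flag and the example mRNA are assumptions made for the example, and a real transcript may also end in a genuine UGA stop, which this toy cannot distinguish.

```python
BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(mrna, has_secis=False):
    """Translate from the first base; read UGA as Sec (U) only if has_secis is True."""
    protein = []
    for pos in range(0, len(mrna) - 2, 3):
        codon = mrna[pos:pos + 3].upper()
        if codon == "UGA" and has_secis:
            protein.append("U")  # recoded: selenocysteine instead of stop
            continue
        residue = CODON_TABLE[codon]
        if residue == "*":
            break  # ordinary stop codon
        protein.append(residue)
    return "".join(protein)


toy_mrna = "AUGUGAGGCUAA"  # invented example: AUG UGA GGC UAA
print(translate(toy_mrna, has_secis=True))
print(translate(toy_mrna, has_secis=False))
```

With the flag set the toy transcript reads through the UGA; without it, translation stops there, which is how a standard gene finder would see the gene.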
Selenoprotein
Biosynthesis
Selenoprotein biosynthesis
• Synthesis of Selenocysteine (Sec)
– SPS1
– SPS2
– SLA/LP
– Secp43
• Incorporation of Sec into selenoproteins
– SBP2
– EF-Sec
– tRNA-Sec
• SECIS
Aren’t selenoproteins annotated?
Selenoproteins are usually incorrectly annotated
[Figure labels: Selenocysteine; Glycine; Real STOP; NO SECIS!]
Selenoprotein identification
selenoprotein search:
SECIS search
SECIS elements come in a variety of sequences
SECIS search: PatScan
SECIS search in the
Drosophila genome
• 35,876 potential SECIS elements
• 1,220 thermodynamically stable
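To give a feel for why a pattern search yields so many candidates, here is a toy scan written as a Python regular expression rather than as a PatScan descriptor. It only looks for the commonly described conserved SECIS nucleotides (ATGA on the 5' side, an AA in the apical loop, GA on the 3' side) separated by spacers; the spacer lengths are arbitrary assumptions, no base pairing or folding energy is checked, and only the forward strand is scanned. It is not the pattern used in the searches described here.

```python
import re

# Toy descriptor: ATGA, spacer, AA, spacer, GA. Spacer lengths are
# illustrative assumptions, not a published SECIS pattern.
TOY_SECIS = re.compile(r"(?=(ATGA[ACGT]{10,13}AA[ACGT]{17,30}GA))")


def toy_secis_hits(dna):
    """Return (start, end) of one candidate per start position on the forward strand."""
    dna = dna.upper()
    # The lookahead lets candidates that overlap each other all be reported.
    return [(m.start(1), m.end(1)) for m in TOY_SECIS.finditer(dna)]


# Invented sequence containing exactly one candidate.
toy_dna = "CC" + "ATGA" + "C" * 11 + "AA" + "T" * 20 + "GA" + "CC"
print(toy_secis_hits(toy_dna))
```

A pattern this loose matches often in a whole genome, which is why a subsequent stability filter, such as the thermodynamic assessment mentioned above, is needed to reduce the candidate list.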
selenoprotein search:
codon bias across TGA
selenoprotein search:
SECIS + exon prediction
1. Predict SECIS with PatScan
2. Gene prediction with geneid (allowing TGA-interrupted exons)
Geneid uses dynamic programming to chain input exons into gene
structures maximizing a log-likelihood function. SECIS predictions
and TGA-interrupted exons are now among the input exons.
Chaining rules state that SECIS elements can only be chained if
they terminate genes containing TGA exons, and that genes
containing TGA exons can only be terminated by SECIS predictions.
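The chaining rules can be sketched as a small dynamic program. This is not geneid's implementation: it assembles a single gene, ignores frames, splice sites and strands, and uses invented coordinates and scores. The `Element` type, the three state names and `best_gene` are hypothetical names introduced for the example. It only shows how the two rules restrict which chains are allowed while the total score is maximized.

```python
from typing import NamedTuple


class Element(NamedTuple):
    start: int
    end: int
    score: float
    kind: str  # "exon", "tga_exon" or "secis"


NO_TGA, OPEN_TGA, CLOSED = 0, 1, 2  # chain states


def transition(state, kind):
    """Return the new chain state, or None if the chaining rules forbid the step."""
    if state == CLOSED:
        return None  # a SECIS terminates the gene
    if kind == "exon":
        return state
    if kind == "tga_exon":
        return OPEN_TGA
    if kind == "secis":
        # A SECIS may only terminate a gene that contains a TGA exon.
        return CLOSED if state == OPEN_TGA else None
    raise ValueError(f"unknown element kind: {kind}")


def best_gene(elements):
    """Best-scoring chain of non-overlapping elements that obeys the rules."""
    elems = sorted(elements, key=lambda e: e.end)
    best = [dict() for _ in elems]  # best[i][state] = (score, chain ending at i)
    for i, e in enumerate(elems):
        first = transition(NO_TGA, e.kind)
        if first is not None:
            best[i][first] = (e.score, [e])
        for j in range(i):
            if elems[j].end >= e.start:
                continue  # elements must not overlap
            for state, (score, chain) in best[j].items():
                new_state = transition(state, e.kind)
                if new_state is None:
                    continue
                cand = score + e.score
                if new_state not in best[i] or cand > best[i][new_state][0]:
                    best[i][new_state] = (cand, chain + [e])
    # A gene with a TGA exon is valid only once a SECIS has closed it.
    finals = [v for states in best for s, v in states.items() if s in (NO_TGA, CLOSED)]
    return max(finals, key=lambda v: v[0], default=(0.0, []))


toy = [Element(0, 10, 2.0, "exon"), Element(20, 30, 5.0, "tga_exon"),
       Element(40, 50, 1.0, "exon"), Element(60, 70, 1.5, "secis")]
score, chain = best_gene(toy)
print(score, [e.kind for e in chain])
```

Removing the SECIS from the toy input makes the TGA exon unusable, so the best valid gene falls back to the two ordinary exons.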
selenoprotein search:
[Figure: stepwise diagram along a 5’ to 3’ sequence, ending with a putative selenoprotein]
Independent but coordinated
TGA in-frame gene
and
SECIS prediction
selenoprotein search in
Drosophila
(Castellano et al. EMBO Reports 2:697-702, 2001)
SECIS predicted: 35,876
SECIS after thermodynamic assessment: 1,220
Genes predicted: 12,194
Predicted selenoproteins: 4
Real selenoproteins: 3
dSelK
dSelH
dSelK and dSelH
are ubiquitous selenoproteins
selenoprotein search in
mammalian genomes
• Larger genome. Much more room for
false positive SECIS predictions
• Poorer gene predictions.
conserved SECIS between
human and mouse
characterization
of mammalian
selenoproteins
(Kryukov et al.,
Science 300:1439-1443, 2003)
selenoprotein search in
other vertebrate genomes.
SelH alignment
human vs. fugu
SelU
SelU: a novel
selenoprotein family
SelU: scattered
phylogenetic distribution
Copyright ©2005 by the National Academy of Sciences
Fig. 2. 75Se labeling
Fig. 3. Subcellular localization of SelJ
SelJ and crystallins
Fig. 4. Expression pattern of the SelJ gene during development in zebrafish embryos
Castellano, Sergi et al. (2005) Proc. Natl. Acad. Sci. USA 102, 16188-16193
the eukaryotic selenoproteome
Sergi Castellano
- PhD in 2007
- PostDoc
- Marla Berry,
University of Hawaii
- Andrew G. Clark,
Cornell University
- Sean Eddy,
Janelia Farm
- Group Leader
Max Planck Institute for
Evolutionary Anthropology,
Leipzig
SelH alignment across fly genomes
SelH: alignment at the
DNA level
3-period conservation pattern
increasing the number of sequences
increases the signal
actgtgacattgactcccatgcaggacttgaca
accgtgacattcactcccatgcaggacttgaca
** ******** *********************
increasing the number of sequences
increases the signal
actgtgacattgactcccatgcaggacttgaca
accgtgacattcactcccatgcaggacttgaca
acagtgacattgacacccatgcaggacttgaca
actgtgacatttactccaatacaggacttcaca
** ******** ** ***** ******** ***
increasing the number of sequences
increases the signal
actgtgacattgactcccatgcaggacttgaca
accgtgacattcactcccatgcaggacttgaca
acagtgacattgacacccatgcaggacttgaca
actgtgacatttactccaatacaggacttcaca
actgtaacattgactcccatgcacgacttgaca
** ** ***** ** ***** ** ***** ***
increasing the number of sequences
increases the signal
actgtgacattgactcccatgcaggacttgaca
accgtgacattcactcccatgcaggacttgaca
acagtgacattgacacccatgcaggacttgaca
actgtgacatttactccaatacaggacttcaca
actgtaacattgactcccatgcacgacttgaca
actgtgacattgactcccatgcacgacttgact
** ** ***** ** ***** ** ***** **
increasing the number of sequences
increases the signal
actgtgacattgactcccatgcaggacttgaca
accgtgacattcactcccatgcaggacttgaca
acagtgacattgacacccatgcaggacttgaca
actgtgacatttactccaatacaggacttcaca
actgtaacattgactcccatgcacgacttgaca
actgtgacattgactcccatgcacgacttgact
actgtaactttgactcccatacaggacttgaca
** ** ** ** ** ***** ** ***** **
increasing the number of sequences
increases the signal
actgtgacattgactcccatgcaggacttgaca
accgtgacattcactcccatgcaggacttgaca
acagtgacattgacacccatgcaggacttgaca
actgtgacatttactccaatacaggacttcaca
actgtaacattgactcccatgcacgacttgaca
actgtgacattgactcccatgcacgacttgact
actgtaactttgactcccatacaggacttgaca
accgtgacattgactcccatccaggacttgact
** ** ** ** ** ***** ** ***** **
increasing the number of sequences
increases the signal
actgtgacattgactcccatgcaggacttgaca
accgtgacattcactcccatgcaggacttgaca
acagtgacattgacacccatgcaggacttgaca
actgtgacatttactccaatacaggacttcaca
actgtaacattgactcccatgcacgacttgaca
actgtgacattgactcccatgcacgacttgact
actgtaactttgactcccatacaggacttgaca
accgtgacattgactcccatccaggacttgact
actgtgacgttgactccgatgcaggacttgaca
** ** ** ** ** ** ** ** ***** **
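The 3-period pattern in the nine-sequence alignment above can be checked with a short sketch. It rebuilds the conservation line and counts how many variable columns fall on each of the three positions of the period, taking the first column as position 1. The function names are introduced here for illustration.

```python
alignment = [
    "actgtgacattgactcccatgcaggacttgaca",
    "accgtgacattcactcccatgcaggacttgaca",
    "acagtgacattgacacccatgcaggacttgaca",
    "actgtgacatttactccaatacaggacttcaca",
    "actgtaacattgactcccatgcacgacttgaca",
    "actgtgacattgactcccatgcacgacttgact",
    "actgtaactttgactcccatacaggacttgaca",
    "accgtgacattgactcccatccaggacttgact",
    "actgtgacgttgactccgatgcaggacttgaca",
]


def conservation_line(seqs):
    """Mark fully conserved columns with '*' and all other columns with a space."""
    return "".join("*" if len(set(col)) == 1 else " " for col in zip(*seqs))


def variable_columns_by_phase(seqs, period=3):
    """Count variable columns at each position of the period (index 0 = first column)."""
    counts = [0] * period
    for i, col in enumerate(zip(*seqs)):
        if len(set(col)) > 1:
            counts[i % period] += 1
    return counts


print(conservation_line(alignment))
print(variable_columns_by_phase(alignment))  # [0, 0, 10]
```

All ten variable columns sit on the third position of the period, which is the signature expected when substitutions concentrate at the third position of codons in a coding region.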
scanning the fly multiple alignments for
the 3-period conservation pattern across
conserved TGA codons
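One way such a scan could be organized is sketched below, under stated assumptions. It finds columns where every aligned sequence reads TGA, then asks whether the variable columns on both flanks fall mostly on third codon positions in the frame defined by that TGA. The flank size, the threshold and all function names are hypothetical choices for the example; the scoring actually used in the study is not described in these slides.

```python
def conserved_tga_sites(aln):
    """Start columns where every aligned sequence reads TGA."""
    length = len(aln[0])
    return [i for i in range(length - 2)
            if all(s[i:i + 3].lower() == "tga" for s in aln)]


def wobble_fraction(aln, start, end, anchor):
    """Fraction of variable columns in [start, end) at codon position 3.

    The frame is the one in which a codon starts at column `anchor`.
    Returns None when the window has no variable column.
    """
    variable = [i for i in range(start, end) if len({s[i] for s in aln}) > 1]
    if not variable:
        return None
    wobble = [i for i in variable if (i - anchor) % 3 == 2]
    return len(wobble) / len(variable)


def scan(aln, flank=15, threshold=0.8):
    """Report conserved TGAs with 3-periodic variation on both flanks."""
    hits = []
    length = len(aln[0])
    for site in conserved_tga_sites(aln):
        up = wobble_fraction(aln, max(0, site - flank), site, site)
        down = wobble_fraction(aln, site + 3, min(length, site + 3 + flank), site)
        if up is not None and down is not None and up >= threshold and down >= threshold:
            hits.append((site, up, down))
    return hits


# Invented two-sequence alignment: TGA at columns 6-8, variation at columns 2, 5, 11.
toy_aln = ["gcaacctgaggtcta", "gcgacatgaggccta"]
print(scan(toy_aln, flank=6))
```

Requiring the pattern on both sides of the TGA is the point of the exercise: a true stop codon would be expected to lose the coding-like pattern downstream.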
glucose dehydrogenase (gld)
proposed as a selenoprotein
by Perlaky, S., Merritt, K., and Cavener (1998)
SelH alignment across fly genomes
willistoni lacks all known fly selenoproteins
SelK is a Cys homologue
SPS2 does not exist
willistoni lacks some of the
genes involved in selenoprotein
metabolism
in addition to SPS2,
• tRNA-Sec
• EF-Sec
• SLA/LP
The first animal known to lack
selenoproteins
NHGRI, Press Release
“Scientists Compare Twelve Fruit Fly Genomes”
“In a surprising finding, researchers found that the genes that
produce selenoproteins appear to be absent in the D. willistoni
genome. Selenoproteins are responsible for reducing excess
amounts of the mineral selenium, an antioxidant found in a
variety of food sources. Selenoproteins are present in all
animals, including humans. D. willistoni appears to be the first
animal known to lack these proteins. However, researchers
suggest that D. willistoni may possibly encode selenoproteins in
a different way, opening a new avenue for further research.”
Surprisingly, willistoni maintains
some others
• Secp43 is highly conserved
• SPS1 is highly conserved
Which implies that these proteins are
involved in functions other than
selenoprotein biosynthesis
Alignment of SPS1 across
Drosophila species (including willistoni)
Questions
• What are the functional consequences of the loss of
selenoproteins (if any)?
• What is the evolutionary path leading to
selenoprotein extinction?
– Loss of selenoproteins → Loss of selenoprotein factors
– Loss of selenoprotein factors → Loss of selenoproteins
Survival under oxidative stress
C. Pallares, M. Corominas, F. Serras, Genetics, U. Barcelona
[Figure: % survival by day for D. melanogaster and D. willistoni. Panel titles: Life Span at 20°C (5 mM paraquat); Life Span at 20°C (3% H2O2); Control medium]
Survival under selenium toxicity
A. Punset, M. Corominas, F. Serras, Genetics, UB
[Figure: % survival by day, one panel each for D. melanogaster and D. willistoni. Series: Control medium; 10^-4, 10^-5, 10^-6, 10^-7 and 10^-8 M Selenium; No Selenium]
Evolutionary path leading
to selenoprotein extinction
Search for selenoproteins
in other arthropods
Evolutionary path leading
to selenoprotein extinction
• It cannot be attributed to a single evolutionary event
• A consequence of the relaxation of the selective
constraints acting to maintain selenoproteins in other
metazoans
• Dispensability of selenoproteins in insects, maybe
related to the fundamental differences in the
antioxidant defense systems between these animals
and other metazoans.
Charles Chapple
- PhD in 2009
- PostDoc
- Christine Brune
INSERM, Marseille
UPF Biology. Academic year 2010-11
selenoproteins in protists
ACKNOWLEDGEMENTS
CRG/IMIM/UPF, Barcelona
Charles Chapple
Tyler Alioto
Enrique Blanco
Marco Mariotti
University of Nebraska
Gregory V. Kryukov
Sergey V. Novoselov
Vadim N. Gladyshev
Universitat de Barcelona
Marta Morey
Montserrat Corominas
Florenci Serras
IBMC, Strasbourg
Alain Lescure
Alain Krol
Harvard University, Boston
Nadia Morozova
University of Hawaii
Marla J. Berry
MPI for Informatics, Saarbrücken
Mario Albrecht
Thomas Lengauer
Barcelona/Hawaii/Cornell/Janelia,
Sergi Castellano