Individual eukaryotic genomes


Eukaryotic Genomes:
From Parasites to Primates
(part 2 of 2)
Monday, November 3, 2003
Introduction to Bioinformatics
ME:440.714
J. Pevsner
[email protected]
Copyright notice
Many of the images in this powerpoint presentation
are from Bioinformatics and Functional Genomics
by J Pevsner (ISBN 0-471-21004-8).
Copyright © 2003 by Wiley.
These images and materials may not be used
without permission from the publisher.
Visit http://www.bioinfbook.org
Individual eukaryotic genomes:
Introduction
We will next survey eukaryotic genomes. Basic issues are:
-- description of complete sequence of the chromosomes
-- annotation of the DNA to characterize noncoding DNA
-- annotation to identify protein-coding genes
-- chromosome structure
-- comparative genomics analyses
-- molecular evolution
-- relation of genotype to phenotype
-- disease relevance
Page 567
Individual eukaryotic genomes:
Introduction
We will explore the eukaryotic tree of Baldauf et al. (2000)
moving from the bottom upwards.
Baldauf SL, Roger AJ, Wenk-Siefert I, Doolittle WF (2000).
A kingdom-level phylogeny of eukaryotes based on
combined protein data. Science 290(5493), 972-977.
Page 567
Individual eukaryotic genomes:
Protozoans at the base of the tree
Giardia lamblia is a water-borne parasite
Disease relevance: giardiasis (causes diarrhea)
Distinguishing features: lack of mitochondria, peroxisomes
Genome size: 12 Mb
Chromosomes: 5 (range 0.7 to >3 Mb)
Website: http://www.mbl.edu/Giardia (sequencing in progress)
The genome has just three retrotransposons.
Also, it appears to have a single intron (ferredoxin gene).
Page 570
Individual eukaryotic genomes:
trypanosomes and Leishmania
Trypanosoma brucei causes sleeping sickness (Africa)
Trypanosoma cruzi causes Chagas’ disease (S. America)
Distinguishing features: T. brucei is transmitted by tsetse flies
Genome size: 35 Mb (+/- 25% in various isolates)
Chromosomes: 11 (range 1 to >6 Mb); also has intermediate
chromosomes and 100 linear minichromosomes
Website: http://parsun1.path.cam.ac.uk
Trypanosomes have kinetoplast DNA (circular rings of
mitochondrial DNA)(studied by Paul Englund’s lab here).
Page 571
Individual eukaryotic genomes:
trypanosomes and Leishmania
Leishmania major causes leishmaniasis
Genome size: 34 Mb
Chromosomes: 36 (range 0.3 to 2.5 Mb)
Genes: about 9800
Website: http://www.sanger.ac.uk/Projects/L_major/
Leishmania chromosome 1 has 79 protein-coding genes.
The first 29 (from the left telomere) are all transcribed from
one strand, and the next 50 from the opposite strand.
Page 571
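The strand organization of Leishmania chromosome 1 can be illustrated with a toy model. This is a minimal sketch: the gene counts (29 and 50) come from the slide, while the "+"/"-" labels and the function name strand_switch_points are assumptions made for illustration.

```python
# Toy model of Leishmania major chromosome 1: 79 protein-coding genes,
# the first 29 (from the left telomere) on one strand, the next 50 on the other.
# Which strand is labeled "-" and which "+" is an arbitrary choice here.
strands = ["-"] * 29 + ["+"] * 50


def strand_switch_points(strands):
    """Return each index i where gene i lies on a different strand than gene i-1."""
    return [i for i in range(1, len(strands)) if strands[i] != strands[i - 1]]


switches = strand_switch_points(strands)
print(len(strands))  # 79
print(switches)      # [29]
```

A single switch point means the chromosome is organized as two large blocks of genes, each block transcribed from one strand.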
Individual eukaryotic genomes:
malaria parasite Plasmodium falciparum
Plasmodium falciparum causes malaria, killing 2.7 million
people each year.
Distinguishing features: Four Plasmodium species infect
humans: P. falciparum, P. vivax, P. ovale, P. malariae.
The life cycle is extremely complex.
Genome size: 22.8 Mb
Chromosomes: 14 (range 0.6 to 3.3 Mb)
Genes: 5268 (comparable to S. pombe)(1 gene/4300 bp)
Website: http://www.plasmodb.org
P. falciparum has an adenine+thymine (AT) content of 80.6%.
The P. yoelii yoelii genome was also sequenced
(infects rats).
Page 573
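The gene density and AT content figures on this slide can be reproduced with a few lines of Python. This is a minimal sketch: the genome size and gene count are taken from the slide, and the helper name at_content is hypothetical. Ambiguous bases such as N are ignored when computing the fraction.

```python
def at_content(seq):
    """Fraction of A+T among the unambiguous bases (A, C, G, T) of a DNA string."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return (seq.count("A") + seq.count("T")) / acgt


# Figures from the slide: 22.8 Mb genome, 5268 genes.
genome_bp = 22_800_000
gene_count = 5268
bp_per_gene = genome_bp / gene_count
print(round(bp_per_gene))  # 4328, i.e. about 1 gene per 4300 bp

# at_content works on any DNA string, e.g. a chromosome read from a FASTA file.
print(at_content("ATGC"))  # 0.5
```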
Individual eukaryotic genomes:
malaria parasite Plasmodium falciparum
Bioinformatics approaches to Plasmodium falciparum:
-- The apicoplast (relic plastid; fatty acid, isoprene metabolism)
is a potential drug target. Apicoplast signal sequences found.
-- Comparative genomics defines some gene functions,
identifies genes lacking in closely related species
-- Genes implicated in antigenic variation and immune system
evasion can be identified (e.g. 1000 copies of vir)
-- Proteomics applied to four stages of the life cycle
(sporozoites, merozoites, trophozoites, gametocytes)
-- Atypical metabolic pathways may be exploited, e.g. use of
1-deoxy-D-xylulose 5-phosphate (DOXP) in isoprene
biosynthesis.
Page 573
Individual eukaryotic genomes:
overview of plants
• Plants form a distinct clade in the eukaryotic tree
• All plants are multicellular
• Plants are sessile, and depend on photosynthesis
(Epifagus is an exception)
• Plants originated about 1.5 billion years ago (BYA),
after eukaryotes had acquired a mitochondrion by
endosymbiosis. Plants acquired a plastid (i.e. the
chloroplast) over 1 BYA.
Page 575
After Meyerowitz (2002)
and Wang et al. (1999)
Figure 16.22
Page 575
Individual eukaryotic genomes:
overview of plants
Eudicots (e.g. Arabidopsis) diverged from monocots
(e.g. rice) about 200 million years ago (MYA).
Dicots include rosids (Arabidopsis, Glycine max [soybean],
M. truncatula) and asterids (e.g. Lycopersicon esculentum
[tomato]).
Monocots include cereals (seeds of flowering plants from
the grass family).
Page 578
Figure 16.23
Page 577
Individual eukaryotic genomes:
Arabidopsis thaliana
A. thaliana is a thale cress, sometimes called a weed.
Distinguishing features: Rapid growth rate, extensive genetics.
Member of the Brassicaceae (mustard) family.
A flowering plant (emerged 200 MYA).
Genome size: 125 Mb (very small for a plant genome).
Wheat is 16.5 Gb, barley is 5 Gb.
Chromosomes: 5
Genes: 25,498 (comparable to human)
Website: http://www.arabidopsis.org
--The entire Arabidopsis genome may have duplicated twice.
-- 24 duplicated segments of > 100 kilobases
Page 578
The TAIR web
browser for
Arabidopsis
Fig. 16.25
Page 580
Individual eukaryotic genomes:
rice
Oryza sativa is rice (subspecies indica, japonica).
Distinguishing features: This crop is a staple for half the
world’s population. Four groups generated draft versions.
Genome size: 430 Mb (1/8th of human genome).
One of the smallest grass genomes.
Chromosomes: 12
Genes: about 50,000? (more than human)
Website: http://www.usricegenome.org (and other sites)
--The rice genome displays an unusual gradient in GC
content. The mean is 43%. The 5’ end of most genes has
a higher GC content than the 3’ end (by 25%). GC-rich
regions occur selectively in exons (not introns).
Page 579
Individual eukaryotic genomes:
overview of the metazoans
The metazoans are animals including worms, insects,
and vertebrates (e.g. fish and primates).
Page 582
Individual eukaryotic genomes:
the slime mold Dictyostelium discoideum
Dictyostelium discoideum is a slime mold. This forms an
outgroup to the metazoans.
Distinguishing features: The remarkable life cycle includes
single-cell and multicellular forms.
Genome size: 34 Mb
Chromosomes: 6
Genes: about 11,000
Website: http://dictybase.org
--The Dicty genome has almost 80% AT content (similar
to Plasmodium). Thus a whole-chromosome shotgun
strategy was employed.
Page 582
Individual eukaryotic genomes:
the nematode C. elegans
C. elegans is a free-living soil nematode.
Distinguishing features: Its genome was the first of a multicellular animal to be sequenced (1998).
Genome size: 97 Mb
Chromosomes: 6
Genes: about 19,000 (spanning 27% of genome)
Website: http://www.wormbase.org
--Many worm functional genomics projects have been
performed, such as microarrays at multiple developmental
stages.
Page 584
Individual eukaryotic genomes:
the fruitfly Drosophila
Drosophila’s distinguishing features: Short lifecycle,
varied phenotypes, model organism in genetics.
Genome size: 180 Mb
Chromosomes: 5
Genes: about 13,000 (spanning 27% of genome)
Website: http://www.fruitfly.org
--At the time, this was the largest genome to which
whole-genome shotgun sequencing had been applied.
--Each new release of the genome annotation improves the gene models
Page 585
This is Ann:
the mosquito Anopheles gambiae
A. gambiae was the second insect genome sequenced.
Distinguishing features: It is the malaria parasite vector.
Genome size: 278 Mb (twice the size of Drosophila)
Chromosomes: 3
Genes: about 14,000
Website: http://www.ensembl.org/Anopheles_gambiae/
--Diverged from Drosophila 250 MYA (average amino acid
sequence identity of orthologs is 56%). Compare human
and pufferfish (diverged 400 MYA, 61% identity): insect
proteins diverge at a faster rate.
--High degree of genetic variation
Page 587
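The claim that insect proteins diverge faster can be checked with simple arithmetic on the slide's figures. This is a rough sketch that assumes divergence accumulates linearly with time and ignores multiple substitutions at the same site; the function name divergence_per_my is hypothetical.

```python
def divergence_per_my(percent_identity, divergence_mya):
    """Percent amino acid divergence accumulated per million years (linear assumption)."""
    return (100.0 - percent_identity) / divergence_mya


# Figures from the slide.
insect = divergence_per_my(56, 250)      # Anopheles vs. Drosophila orthologs
vertebrate = divergence_per_my(61, 400)  # human vs. pufferfish orthologs

print(f"{insect:.4f}")               # 0.1760 percent per million years
print(f"{vertebrate:.4f}")           # 0.0975 percent per million years
print(f"{insect / vertebrate:.1f}")  # 1.8
```

Under these assumptions the insect orthologs have diverged nearly twice as fast as the vertebrate ones.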
Individual eukaryotic genomes:
the sea squirt Ciona intestinalis
The chordates include vertebrates (fish, amphibians,
reptiles, birds, mammals) which have a spinal column.
Some chordates are invertebrates, such as the sea squirt.
Genome size: 160 Mb (20 times smaller than human)
Chromosomes: 14
Genes: 15,852
Significant for our understanding of vertebrate evolution.
Page 587
Individual eukaryotic genomes:
the fish Fugu rubripes
Fugu is a pufferfish (also called Takifugu rubripes).
Distinguishing features: Diverged from humans 450 MYA;
has comparable number of genes in a compact genome.
Genome size: 365 Mb (1/10th human genome)
Genes: about 30,000
Website: http://genome.jgi-psf.org/fugu6/fugu6.info.html
--Only 2.7% of genome is interspersed repeats (compare
45% in human), based on RepeatMasker.
--Introns are relatively short. 75% of Fugu introns are <425
base pairs (for human, 75% are <2609 base pairs).
Page 588
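A statement such as "75% of introns are shorter than 425 base pairs" is a cumulative fraction over a list of intron lengths. This is a minimal sketch: the intron lengths are invented toy values (not Fugu data), and the function name fraction_shorter_than is hypothetical.

```python
def fraction_shorter_than(lengths, threshold_bp):
    """Fraction of intron lengths strictly below threshold_bp."""
    if not lengths:
        raise ValueError("no intron lengths given")
    return sum(1 for n in lengths if n < threshold_bp) / len(lengths)


# Hypothetical intron lengths in base pairs, for illustration only.
toy_introns = [80, 95, 120, 200, 310, 400, 900, 5000]
print(fraction_shorter_than(toy_introns, 425))  # 0.75
```

With real data, the lengths would come from the intron coordinates in a genome annotation, and the same function could be applied to both the Fugu and human sets with their respective thresholds.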
Individual eukaryotic genomes:
the mouse Mus musculus
M. musculus is the second mammal to have its genome
sequenced. Mouse diverged from human 75 MYA.
Distinguishing features: only 300 of 30,000 annotated
genes have no human orthologs
Genome size: 2.5 Gb (euchromatic portion)(cf. 2.9 Gb human)
Chromosomes: 19 autosomes plus X and Y
Genes: about 30,000
Website: http://www.informatics.jax.org
--Dozens of mouse-specific expansions occurred, such as
olfactory receptor gene family.
--40% of mouse genome can be aligned to human genome
at the nucleotide level.
Page 589
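The genome sizes quoted across these slides can be compared against the 2.9 Gb human figure given above. This is a minimal sketch using only the sizes stated in the slides; the fractions quoted on individual slides (for example "1/10th" or "1/8th") are approximate, so the computed ratios will not match them exactly.

```python
HUMAN_MB = 2900  # 2.9 Gb, the human figure quoted on the mouse slide

# Genome sizes in Mb as stated on the individual slides.
genome_mb = {"rice": 430, "Ciona": 160, "Fugu": 365, "mouse": 2500}

# How many times larger the human genome is than each of the others.
fold_smaller = {name: HUMAN_MB / mb for name, mb in genome_mb.items()}

for name, fold in fold_smaller.items():
    print(f"{name}: human genome is {fold:.1f}x larger")
# rice: human genome is 6.7x larger
# Ciona: human genome is 18.1x larger
# Fugu: human genome is 7.9x larger
# mouse: human genome is 1.2x larger
```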
Individual eukaryotic genomes:
primates
The phylogenetic tree shows that chimpanzee (Pan
troglodytes) and bonobo (pygmy chimpanzee, Pan
paniscus) are the two species most closely related
to humans. These three species diverged from a
common ancestor about 5.4 million years ago, based
on an analysis of 36 nuclear genes.
Large-scale genome sequencing projects have begun for
the chimpanzee. Other genomes under consideration are
the rhesus macaque monkey (Macaca mulatta) and the
olive baboon (Papio hamadryas anubis).
Page 591
Perspective and pitfalls
One of the broadest goals of biology is to understand the
nature of each species: what are its mechanisms of
development, metabolism, homeostasis, reproduction,
and behavior? Sequencing a genome does not answer
these questions directly. After genome annotation, we
try to interpret the function of the genome’s constituents
in the context of various physiological processes.
The field of bioinformatics needs continued development
of algorithms to find genes, repetitive sequences, genome
duplications and other features, as well as tools to identify
conserved regions. We may then generate and test
hypotheses about genome function.
Page 531