Drosophila melanogaster - School of Life Sciences


Drosophila melanogaster – dark-bellied dew-lover - not really fruit flies,
originally called vinegar or pomace flies. Live on yeast and bacteria in
rotting fruit and other vegetation. There are about 2000 species of
Drosophila, and many more in the Drosophilidae.
True fruit flies are the Tephritidae, and live on and in fruit, causing
economic damage, e.g. the apple maggot fly Rhagoletis pomonella
(below) and the Mediterranean fruit fly or medfly. There are about 100
Rhagoletis species, and about 5000 in the family in 500 genera.
IB404 - Drosophila melanogaster 1 - Feb 15
D. melanogaster has been a premier genetic model organism since
Thomas Hunt Morgan started using it at Columbia Univ. in NY in 1910.
For example, Morgan's group not only isolated many mutants, but also
mapped them, figured out sex-linkage, utilized the larval salivary gland
polytene chromosomes (Calvin Bridges) for mapping, and made interspecific
comparisons (Alfred Sturtevant). Hermann Muller later demonstrated X-ray
mutagenesis, which led to the 1946 Physiology or Medicine (P/M) Nobel and
to carcinogenesis concerns. Christiane Nüsslein-Volhard, Eric Wieschaus,
and Ed Lewis won the P/M Nobel in 1995 for early developmental genes and
the HOX complex.
Bridges, Sturtevant, and Morgan in 1920
The basic features of the genome architecture were already fairly well
known, for example, that it is parsed into compact heterochromatin
around the centromeres (made up largely of satellite sequences, that is,
many tandem repeats of 100-500 bp stretches, and therefore not easily
cloned and sequenced, and also not replicated in the larval salivary gland
polytene chromosomes), with the Y and dot 4th chromosomes almost
entirely heterochromatic. Roughly 60 Mbp is heterochromatic and 120
Mbp is euchromatic (clonable, sequenceable, and containing most
genes). It was also known that roughly 15% of the euchromatin is made
up of transposons, primarily long retroviral-like retrotransposons, while
many more flank, and are in, the centromeric heterochromatin. About
1300 genes had been cloned and sequenced the old-fashioned way, in
lambda phage (it used to be a PhD project to clone and sequence a gene).
Mitotic chromosomes
The idea of sequencing the entire Drosophila genome
had a rough start, with lots of scepticism in the 1980s.
Several different groups made starts, including Ian
Duncan and Dan Hartl at WashU. Eventually, in the
early 1990s, two major groups got going: the
Berkeley Drosophila Genome Project (BDGP), led by Gerry
Rubin (now vice-president of the Howard Hughes
Medical Institute), and the European Drosophila
Genome Project (EDGP), led in part by Michael Ashburner
at Cambridge.
They first sequenced several well-known regions like
the Antennapedia and Bithorax HOX complexes, the
Adh region, and the tip of the X-chromosome. This
whetted the appetite of the community and the BDGP
began a genome-wide BAC clone-by-clone approach,
while the EDGP “walked” along the X chromosome
using cosmids. They generated about 15% of the
genome in a mixture of finished and draft sequence.
Gerry Rubin
Michael Ashburner
In 1998 Craig Venter left TIGR and formed Celera with $300m from
ABI, including 300 96-capillary sequencers, with a plan to sequence the
human genome. As a “demonstration project”, they sequenced D.
melanogaster using their WGS strategy in 3 months (Science in 2000).
The BDGP cleaned up this draft to a near-finished genome in 2003.
Celera’s WGS strategy was to sequence about 2m reads from a 2 kb
insert plasmid library to provide the bulk of the basic sequence (~7X
coverage), about 1.3m reads from a 10 kb insert plasmid library to
bridge the many long transposons during scaffolding (~5X coverage),
and 20,000 reads from a 130 kb insert BAC library (~0.1X coverage) to
provide long-range scaffolding information. Total sequence coverage was
around 13X.
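
The coverage figures above follow from the usual relation coverage =
reads x read length / genome size. A minimal sketch in Python, assuming an
average usable read length of 500 bp (not given in these notes, purely an
illustrative assumption) and the ~120 Mbp euchromatic target, so the
results only approximate the quoted figures:

```python
# Rough fold-coverage estimate: coverage = reads * read_length / genome_size.
# READ_LEN_BP is an ASSUMED average read length; the notes do not state it.
READ_LEN_BP = 500
GENOME_BP = 120_000_000  # ~120 Mbp of euchromatin, as quoted in the notes

# Read counts per library, as quoted in the notes.
libraries = {
    "2 kb plasmid": 2_000_000,
    "10 kb plasmid": 1_300_000,
    "130 kb BAC": 20_000,
}


def coverage(n_reads, read_len=READ_LEN_BP, genome=GENOME_BP):
    """Return the fold coverage contributed by n_reads reads."""
    return n_reads * read_len / genome


per_library = {name: coverage(n) for name, n in libraries.items()}
total = sum(per_library.values())

for name, cov in per_library.items():
    print(f"{name}: {cov:.2f}X")
print(f"total: {total:.1f}X")
```

Under these assumptions the total lands near the quoted ~13X; a shorter
assumed read length lowers every figure proportionally.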
Assembly resulted in around 800 scaffolds totalling 118 Mbp, with about
1600 sequence gaps within the scaffolds and, of course, the 800 clone
gaps between scaffolds. However, the vast majority of the euchromatic
chromosome arm sequence was in a few long scaffolds, with many short
scaffolds near the centromeric heterochromatin (still incomplete today).
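
The gap counts above also imply a contig count, since a scaffold with g
internal sequence gaps is made of g + 1 contigs. A small sketch using
only the round numbers quoted here; treating the 118 Mbp as sequenced
bases only (no gap padding) is an assumption:

```python
# Approximate counts quoted in the notes.
scaffolds = 800
sequence_gaps = 1600       # gaps inside scaffolds
total_bp = 118_000_000     # ASSUMED to count sequenced bases only

# Each scaffold contributes (its internal gaps + 1) contigs, so the
# genome-wide total is scaffolds + sequence_gaps.
contigs = scaffolds + sequence_gaps
mean_contig_bp = total_bp / contigs

print(f"contigs: {contigs}")
print(f"mean contig length: {mean_contig_bp / 1000:.0f} kb")
```

The mean hides a very skewed distribution: as noted, most of the
sequence sits in a few long scaffolds.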
Drosophila genome (2000). Each chromosome arm is depicted: (A)
transposable elements, (B) gene density, (C) scaffolds from the joint
assembly, (D) scaffolds from the WGS-only assembly (clone gaps are bars),
(E) polytene chromosome divisions, and (F) clone-based tiling path.
Red/blue clones were completely/draft sequenced. Each chromosome arm is
oriented left to right; the centromere is located at the right side of X,
2L, and 3L, left side of 2R and 3R.
FlyBase is the home for this genome, and the other Drosophila genomes,
with an enormous body of connected information, including all the
mutants and their phenotypes, nearby transposon insertions, results of
microarray and RNAi experiments, plus the entire Drosophila literature.
Here is white and a nearby gene (CG32795) in the Genome Browser.
Like many genes in Drosophila, white is within an intron of another
gene, kirre, which is an amazing ~400 kb gene with several ~100 kb
introns, containing altogether 23 other annotated gene models, including
a more normal paralog, rst. Note ESTs suggesting additional genes.
But much of the genome looks more like this, with lots of genes, in
either orientation, and remarkably short promoter regions between them.
This is just 40 kb and it contains 13 genes. Overall, we have ~15,000
genes in ~120 Mbp, so roughly 8 kb per gene on average. Notice that
even today most of these genes still have simple CG (originally meaning
computed gene) numbers. Per is period, of circadian rhythm fame. eIF2B
is an initiation factor involved in translation.
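
A quick check of the density arithmetic, using only the round numbers in
the text (both are approximations, so the results are order-of-magnitude
figures):

```python
# Genome-wide average spacing, from the approximate totals in the notes.
genome_kb = 120_000   # ~120 Mbp of euchromatin
genes = 15_000        # ~15,000 genes
genome_kb_per_gene = genome_kb / genes

# The dense 40 kb region shown in the browser view.
region_kb = 40
region_genes = 13
region_kb_per_gene = region_kb / region_genes

print(f"genome-wide: {genome_kb_per_gene:.1f} kb per gene")
print(f"this region: {region_kb_per_gene:.1f} kb per gene")
```

The region shown is therefore well over twice as gene-dense as the
genome-wide average, which is pulled up by giant genes such as kirre.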
Perhaps the most outrageous gene in Drosophila is Dscam (Down
syndrome cell adhesion molecule), which encodes an immunoglobulin
superfamily trans-membrane protein that is involved in both brain
development and the immune system. It has four exons that are spliced in
a cassette fashion, yielding 38,016 possible mRNAs, and that many
slightly different proteins, mediating cell organization in the brain.
This is the highest number of alternative splice forms known for a single
gene in any organism, even compared with Dscam in other insects. There
are none for the human ortholog!
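
The 38,016 figure is a product: one variant is chosen independently from
each of the four alternatively spliced exon positions. A minimal sketch;
the per-position variant counts (12, 48, 33, and 2 for exons 4, 6, 9, and
17) are taken from the published description of Dscam and are not stated
in these notes:

```python
from math import prod

# Number of mutually exclusive variants at each alternatively spliced
# exon position (values from the published Dscam description, not from
# these notes).
variants = {"exon 4": 12, "exon 6": 48, "exon 9": 33, "exon 17": 2}

# One variant is picked per position, so the choices multiply.
isoforms = prod(variants.values())
print(isoforms)
```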
Cropped view of the X annotation - 1st, 8th, & 16th Mbp. First line shows
GC content, then transposons as black marks, then genes on
Numbers of genes, paralogs, and families in H. influenzae,
S. cerevisiae, C. elegans, and D. melanogaster
First row shows the total number of genes predicted in each species.
Second row shows the number of genes in each genome that appear to
have arisen by gene duplication in each lineage (are paralogs of each
other). Third row is the total number of distinct gene families for each
genome. Note that flies have fewer genes than worms, despite seemingly
increased complexity. And that the total number of “distinct” types of
proteins, presumably doing significantly different things, in an animal
approaches 10,000!
Species    H. influenzae   S. cerevisiae   C. elegans   D. melanogaster
Genes      1709            6241            18424        13601
Paralogs   284             1858            8971         5536
Families   1425            4383            9453         8065
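
In this table the Families row equals Genes minus Paralogs for every
species. That is an observation from the numbers, not necessarily how the
original analysis defined a family. A short sketch that checks it and
computes the fraction of each gene set made up of paralogs:

```python
# (genes, paralogs, families) per species, copied from the table.
table = {
    "H. influenzae": (1709, 284, 1425),
    "S. cerevisiae": (6241, 1858, 4383),
    "C. elegans": (18424, 8971, 9453),
    "D. melanogaster": (13601, 5536, 8065),
}

paralog_fraction = {}
for species, (genes, paralogs, families) in table.items():
    # Check the relationship families = genes - paralogs.
    assert genes - paralogs == families, species
    paralog_fraction[species] = paralogs / genes
    print(f"{species}: {paralog_fraction[species]:.1%} paralogs")
```

The paralog fraction is far higher in the two animals than in the
bacterium or yeast, which is why their family counts sit much closer
together than their raw gene counts.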
Proteins involved in about 300 human diseases compared with fly, worm,
and yeast proteins (1/3 here). Light to dark colors indicate increasing
similarity. + indicates likely same function. – indicates not.