gene families

IB404 - 12 - Other arthropods – Feb 27
1. Mosquitoes of the genus Anopheles are the major
vectors of malaria around the world, with about
300m cases per year, while A. gambiae is the primary
vector in Africa, where about 1m deaths, mostly
children, occur each year, plus lots of morbidity.
2. Plasmodium resistance to quinine and Anopheles
resistance to insecticides, together with opposition to DDT spraying, have led to a resurgence of malaria in Africa,
with a particularly virulent and lethal form of
Plasmodium causing considerable concern.
3. Two Plasmodium genomes
and the Anopheles gambiae
genome were completed and
published in 2002. We may
talk about Plasmodium later.
Anopheles was sequenced by
Celera using a 12X WGS
strategy, funded by the NIH National Institute of Allergy
and Infectious Diseases.
4. The genome is about 250 Mbp with about 14,000 genes. Obviously the major comparisons to
be made were with Drosophila melanogaster, and these two major lineages of flies (representing
the two major suborders of Diptera) are thought to have diverged roughly 250 Myr ago (in the
Permian, before the mass extinction that ended the trilobites, and mammals split from reptiles).
5. The first observation is that roughly half the genes, about 6000, are still 1:1 orthologs, simply
diverged in sequence since then, with the average amino acid identity down to around 55%,
although ranging from about 20-100%. Another ~1000 genes encode proteins that have
complicated 2:1, 3:1, 4:1 and
higher relationships, that is, there are
paralogs in one or both species resulting
from gene duplications in either lineage.
6. Another ~2000 genes encode proteins
with weak matches in the other fly and
other insects, while another ~1000 have
matches in non-insect species - these
may be lost from one fly.
7. Finally 2-3000 genes encode proteins
with no matches in other genomes so
are species-specific. These must be
rapidly evolving proteins with
ecologically or sexually relevant roles to
be evolving so fast. They are also
shorter, and may not be well annotated.
8. Within the ~6000 1:1 orthologs it is possible to discern trends in terms of which classes of
proteins are most conserved versus rapidly evolving. The most conserved are structural proteins,
like actins and tubulins, while the most rapidly evolving, even though not duplicated, are
proteins involved in immunity and defense. And, of course, the 50% UNKNOWN category.
9. Anopheles has three chromosomes, and remarkably,
when they mapped where all these 1:1 fly orthologs were
on the chromosome arms, it turns out that despite a lot of
gene movement between arms, the basic identity of the
five chromosome arms can still be recognized
(unfortunately, except for the X, they have different
names). That is, the arms have stayed intact through 250
Myr of evolution in each lineage so there is still a lot of
synteny (shown by colors in diagram). Presumably there
have been innumerable paracentric inversions within the
arms, but very few pericentric inversions that would mix
the arms, and relatively few translocations or
transpositions between different chromosomes. The
autosomal arms themselves have not even been reassociated with each other, e.g. 2L and 2R in Drosophila
are now 3R and 3L in Anopheles, although this may just
be chance, because such re-assortments are known even
within the genus Drosophila.
10. There is also still considerable microsynteny, that is,
genes in the same orientation and order on segments of
chromosomes, but it never extends very far before some
kind of chromosomal rearrangement has reorganized
segments of genes. A few large clusters remain, like the
homeotic genes in the HOX cluster.
11. Several gene families were
identified as considerably larger in the
mosquito, and these commonly
involved local expansions or clusters
of genes, for example, this family of
FBN domain proteins, thought to
mediate binding to bacteria, perhaps in
the gut in association with the
bloodmeal. Notice that there are just
two instances of 1:1 orthologs in this
gene family (near the top of the tree),
the rest are species-specific
expansions, primarily in Anopheles.
And most of these expanded
subfamilies of genes are in tandem
arrays in particular sites in the genome
(indicated by the lines to the
chromosomes on the right).
Other gut enzymes are also somewhat
expanded, e.g. serine proteases and
exo- and endo-nucleases, again
presumably all to do with digesting
the large bloodmeal.
12. Many other rapidly evolving
gene families show similar, if less
uneven, species-specific gene
subfamily expansion. For
example, cytochrome P450s
involved in detoxification of all
sorts of xenobiotics, including
insecticides, and immune system
genes involved in destroying
bacteria. I worked up all the
chemoreceptors involved in
olfaction and taste and found the
same kinds of things. The example
here is the odorant receptors. Most
of the Anopheles receptors in red
form discrete clusters in this tree,
as do most of the black Drosophila
receptors. Only two 1:1 orthologs
remain, for example at the very
bottom of this tree. This protein
happens to be an obligate
heterodimer partner for all of the
others, so it can’t evolve very
quickly (notice short branches).
[Figure labels: Anopheles subfamily expansion; Drosophila subfamily expansion; conserved orthologs]
13. Many transcriptome and proteome studies have now been done with this genome. Here is a
simple one comparing un-fed versus blood-fed females (after 1-3 days when they are converting
the blood meal into eggs). The classes of up- and down-regulated genes match the biology, for
example, muscle and other structural genes are down-regulated because the mosquito is not
actively flying and seeking hosts, while digestive enzymes, lipid transporters, and synthetic
pathways were up-regulated, presumably facilitating conversion of the blood meal into eggs.
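The ortholog categories described in points 5 to 7 above can be sketched in a few lines of Python. This is only an illustration, assuming genes have already been clustered into orthology groups by some upstream tool; the group names and counts are hypothetical, not data from the Anopheles paper.

```python
from collections import Counter


def classify_group(n_anopheles, n_drosophila):
    """Label an orthology group by its gene count in each fly."""
    if n_anopheles == 1 and n_drosophila == 1:
        return "1:1 ortholog"
    if n_anopheles >= 1 and n_drosophila >= 1:
        # 2:1, 3:1, 4:1 and higher: paralogs in one or both lineages
        return "many-to-many"
    return "no counterpart in the other fly"


# Hypothetical groups: (name, Anopheles gene count, Drosophila gene count)
groups = [
    ("actin_like", 1, 1),
    ("fbn_like", 4, 1),
    ("p450_like", 3, 2),
    ("orphan", 1, 0),
]

summary = Counter(classify_group(a, d) for _, a, d in groups)
for label, count in summary.items():
    print(label, count)
```

A real analysis would also have to separate "weak match" from "no match", which depends on alignment score thresholds that are not modeled here.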
Aedes aegypti
Aedes aegypti is the major vector of dengue fever today,
but is commonly called the yellow fever mosquito because
it vectored that before it was largely eliminated by
vaccination (which works poorly for dengue because of
the many viral variants). Its genome turned out to be huge,
around 1.4 Gbp or half the size of ours, and was a real
struggle for WGS sequencing. The assembly therefore has
many more gaps making it harder to work with.
The larger size is mostly the
result of accumulation of a
huge diversity of transposons,
shown on right in pie chart.
Note that only one third of
this genome is single copy
DNA, which is the genic
portion. MITEs are small
versions of DNA transposons,
while SINEs are small
versions of retrotransposons.
The Culex pipiens genome (vector of West Nile virus) is done. 12 more Anopheles are starting.
Bombyx mori
Bombyx mori, the domesticated silk moth from Asia,
was the first lepidopteran genome sequence available
(Japanese and Chinese papers in 2004, and joint in
2009). It is around 500 Mbp, 50% transposons, with
again around 15,000 genes annotated. Without
belaboring all the details, you can imagine their
identification of many genes involved in producing
silk, as well as the usual arrays of odorant receptors,
cuticle proteins, defense genes, etc.
The Beijing Genome Institute, now known simply as
the BGI, then re-sequenced 40 wild and domesticated
strains using Illumina, each to 3X coverage, to
identify SNPs (single nucleotide polymorphisms) and
indels (insertions and deletions) amongst them by lining
these up with the reference sequence (the closest form
of comparative genomics, becoming population
genomics, something now done with humans too). They
find remarkably that the wild strains (green) are only
slightly more polymorphic than the domesticated strains
from all over the world. Somehow domestication was
achieved without a bottleneck?
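To make the idea of "lining these up with the reference sequence" concrete, here is a minimal sketch that counts SNPs between a reference and already-aligned strain sequences. It is a toy under stated assumptions: the sequences and strain names are invented, the strains are assumed to be aligned to the same length as the reference, and real re-sequencing at 3X coverage would call variants statistically from mapped reads rather than by direct comparison.

```python
def count_snps(reference, sample):
    """Count mismatches between two aligned, equal-length sequences.

    Positions with a gap ('-') or an uncalled base ('N') in either
    sequence are skipped, so indels are not counted as SNPs.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    skip = {"-", "N"}
    return sum(
        1
        for r, s in zip(reference, sample)
        if r not in skip and s not in skip and r != s
    )


# Hypothetical toy data, not real Bombyx sequence
reference = "ACGTACGTAC"
strains = {
    "wild_1": "ACGAACGTTC",
    "domestic_1": "ACGTACNTAG",
}

snp_counts = {name: count_snps(reference, seq) for name, seq in strains.items()}
print(snp_counts)
```

Comparing such counts between the wild and domesticated groups is the kind of summary that underlies the observation about similar levels of polymorphism.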
Tribolium castaneum
Tribolium castaneum is called the red flour beetle because it is a pest
of stored grains. It is also the second best developed molecular
genetic model system in insects after Drosophila melanogaster, with
good RNAi, mutants, transformation, and ease of maintenance of
strains. As a result it has become a major system for comparative evo-devo studies, especially of the early developmental and HOX
complex genes. Its genome is also nice and small at just 160 Mbp,
only 30% transposons, and ~16,000 nice small compact genes and
small introns (unlike Aedes and Bombyx genes with long introns).
I’m jumping ahead here, but including the
honey bee Apis mellifera in the analysis,
the authors of this paper show that while
the vast majority of the clear ~6700
orthologous genes that these insects share
with us humans are shared by all three
insect genomes (center of Venn diagram),
each group has some genes uniquely
shared with us, usually around 100-200.
Thus some of these genes cannot be
studied in Drosophila as they have been
lost from flies. We’ve studied several, e.g.
telomerase and a novel opsin lineage, plus
the entire DNA methylation system.
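The Venn diagram logic described above maps directly onto set operations. The sketch below uses made-up gene identifiers (not real ortholog IDs) to show how the shared center and the genes uniquely shared between one insect and humans could be computed, assuming each set already holds the human-orthologous genes found in that insect.

```python
# Hypothetical IDs of genes with human orthologs, per insect (not real data)
fly = {"g1", "g2", "g3", "g4"}
beetle = {"g1", "g2", "g3", "g5", "g6"}
bee = {"g1", "g2", "g5", "g7"}

# Center of the Venn diagram: present in all three insects
shared_by_all = fly & beetle & bee

# Genes shared with humans by only one of the insects
only_fly = fly - beetle - bee
only_beetle = beetle - fly - bee
only_bee = bee - fly - beetle

# Genes that cannot be studied in Drosophila because flies lack them
missing_from_fly = (beetle | bee) - fly

print(sorted(shared_by_all), sorted(missing_from_fly))
```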
Apis mellifera
The honey bee genome is ~250 Mbp and we conservatively annotated
~12,000 genes. We are currently upgrading the genome assembly with
454 and Illumina sequence, plus using deeper 454 sequencing of
cDNAs and comparisons with dwarf honey bees and bumble bees to
identify additional genes. It has many remarkably divergent features,
some of which make sense in light of its radically divergent ecology.
1. Its genome is a mosaic of AT- and GC-rich regions, with most short conserved genes in the AT-rich regions. We have no idea what this is about, essentially the reverse of our genome.
2. It has essentially no retrotransposons, and only 1% of the genome is DNA transposons. Again
we haven’t a clue why. It is not a feature of Hymenoptera or haplo-diploidy because Nasonia
vitripennis, a parasitoid wasp, has lots of retrotransposons, as do the ants we are doing now.
3. 15 of its 16 chromosomes are acrocentric (like the Drosophila X), with the centromere at one
end, while most insects have mostly metacentric chromosomes - no idea why.
4. It has the highest recombination rate of any known organism, ~10X ours, and we don’t know why.
5. Its repertoire of odorant receptors has expanded, to around 170 compared with 60-100 in flies
and moths, presumably mediating perception of floral odors and several pheromone blends.
6. Its repertoire of gustatory receptors and detoxification enzymes like P450s is much smaller
compared with flies and moths and beetles and wasps, apparently because of its mutualistic
relationship with plants (doesn’t need to handle nasty defensive plant secondary compounds) and
inactive larval stage (fed by the workers). Both loss of gene lineages (presumably by deletion)
and failure to amplify others (presumably by failing to retain duplicated genes) have occurred.
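The AT/GC mosaic in point 1 of this list is the kind of pattern usually detected by computing GC content in windows along the sequence. The following is a minimal sketch using a toy sequence and an arbitrary window size; both are assumptions for illustration, and real analyses would use much larger windows on the actual assembly.

```python
def gc_windows(sequence, window):
    """Return the GC fraction of consecutive non-overlapping windows.

    A trailing partial window is ignored.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    seq = sequence.upper()
    fractions = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        fractions.append(gc / window)
    return fractions


# Toy sequence: an AT-rich block, a GC-rich block, then a mostly AT block
toy = "ATATATATAT" + "GCGCGCGCGC" + "ATATGCATAT"
profile = gc_windows(toy, 10)
print(profile)
```

Overlaying gene positions on such a profile is one way to check whether genes sit preferentially in the AT-rich windows.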
Gene Robinson won a $2.5m NIH Pioneer Award in 2009 and is now sequencing 10 more bees.
Acyrthosiphon pisum and Pediculus humanus
These are the first two non-metamorphosing insects to be sequenced. Again they reveal all sorts
of interesting genome biology, especially with regards to their obligate bacterial endosymbionts
that facilitate their remarkable “parasitic” lifestyles, and which were sequenced along with them.
The aphid genome is ~500 Mbp and they annotated ~30,000 genes, but
10,000 of these are only ab initio predictions and 10,000 are aphid-specific.
The most interesting result, perhaps, is that the aphid genome has lost most of
the genes involved in enzymatic pathways producing and inter-converting
amino acids, while its obligate endosymbiont Buchnera aphidicola, has all of
these pathways. Bizarrely enough, the aphid has a few of these enzymes that
the bacterium has lost, making this symbiotic relationship mutually obligate
and allowing aphids to live on plant phloem or sap that is low in amino acids.
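The mutual dependence described here can be pictured as a pathway that is complete only when the gene sets of both partners are combined. The sketch below uses placeholder step names rather than real enzymes, and the assignment of steps to each genome is purely hypothetical.

```python
def missing_steps(steps, *genomes):
    """Return the pathway steps not encoded by any of the given genomes."""
    present = set().union(*genomes)
    return [step for step in steps if step not in present]


# Hypothetical four-step amino acid pathway (placeholder names)
pathway = ["step1", "step2", "step3", "step4"]
aphid = {"step3"}                        # host retains one enzyme
buchnera = {"step1", "step2", "step4"}   # symbiont has lost that enzyme

print(missing_steps(pathway, aphid))            # gaps in the host alone
print(missing_steps(pathway, buchnera))         # gap in the symbiont alone
print(missing_steps(pathway, aphid, buchnera))  # gaps in the combined system
```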
The louse genome is the smallest so far at 110 Mbp, and concordant with
this it has lost a lot of genes, down to ~11,000, presumably resulting from
its obligate parasitic lifestyle. For example, it has only 10 odorant receptors
and only 10 gustatory receptors, as well as far fewer detoxification enzymes
than any other insect. It too has an obligate endosymbiont bacterium, called
Riesia pediculicola, which like Buchnera has a highly reduced genome of
~600 genes, mostly in a single linear chromosome. Remarkably it also has a
large circular plasmid which encodes the enzymatic pathway for vitamin B5
(pantothenic acid), which is deficient in the louse diet of human blood.
This tree of the arthropods targeted for genome sequencing from the bee genome paper in 2006 is
a little out of date (red was published, blue was draft, and green was in progress), but it shows the
major lineages. The Daphnia pulex (crustacean water flea) genome is pretty amazing, encoding at
least 30,000 and maybe 40,000 genes, apparently because of the complex aquatic environments it
inhabits (several “species” in one genome!). The Ixodes scapularis deer tick (vector of Lyme
disease) genome is huge at around 2.2 Gbp, and the Rhodnius prolixus kissing bug (vector of
Chagas disease) is also unwieldy (1 Gbp), hence they are both lagging.
Woodcock courtship displays
The small migratory ground bird known as the American woodcock returns to this area early each
spring and the males perform spectacular display flights at dusk on prairie patches around town.
The easiest place to watch them is Meadowbrook park; dress warmly and go to the west parking
lot on South Race Street, about 100 yards beyond Windsor Rd on the left, just after the Lindsley
Clark retirement home. Displays start around 6 and by 6:30 it is too dark to see anything. It's best
to get there early and walk along the path to the east to a bench next to an old wagon looking
south. Sitting there allows you to see the birds against the still somewhat light sky. You will hear
the males making a loud "neep" call on the ground (reminiscent of the courtship call of the
nighthawks that will soon return as well), and then take off on a high circling twittering flight
before spiraling down again to repeat the performance. Females can sometimes be seen flying by
taking in the shows, and will mate with their chosen male(s) before nesting alone.