Comparative Genome and Proteome Analysis of Anopheles

Comparative Genome and Proteome Analysis of Anopheles gambiae and Drosophila melanogaster
Evgeny M. Zdobnov, Christian von Mering, Ivica Letunic, David Torrents, Mikita Suyama, Richard R. Copley,
George K. Christophides, Dana Thomasova, Robert A. Holt, G. Mani Subramanian, Hans-Michael Mueller,
George Dimopoulos, John H. Law, Michael A. Wells, Ewan Birney, Rosane Charlab, Aaron L. Halpern, Elena
Kokoza, Cheryl L. Kraft, Zhongwu Lai, Suzanna Lewis, Christos Louis, Carolina Barillas-Mury, Deborah
Nusskern, Gerald M. Rubin, Steven L. Salzberg, Granger G. Sutton, Pantelis Topalis, Ron Wides, Patrick
Wincker, Mark Yandell, Frank H. Collins, Jose Ribeiro, William M. Gelbart, Fotis C. Kafatos, Peer Bork
Presented by Leon G Xing
SCIENCE VOL 298 4 OCTOBER 2002
Why Anopheles gambiae?
• It is the principal vector of malaria
• It also transmits other infectious diseases
• Malaria afflicts more than 500 million people
• More than 1 million people die each year from malaria
The Culprit
Why Drosophila melanogaster?
• One of the most intensively studied organisms in biology
• Serves as a model system for investigating many developmental and cellular processes common to higher eukaryotes
• Modest genome size (~180 Mb)
• Its genome was sequenced in 2000
Mosquito vs. Fruit Fly
• They diverged about 250 million years ago
• (By comparison, humans and pufferfish diverged about 450 million years ago)
• They share considerable similarities
• About half of the genes in both genomes are interpreted as orthologs
• The average sequence identity between orthologs is about 56%
Mosquito vs. Fruit Fly
• The Anopheles genome is about twice the size of the Drosophila genome
• The female Anopheles feeds on blood (hematophagy), which is essential for egg development and propagation
• Viruses and parasites use Anopheles as a vehicle for transmission
Orthologs
• Genes in different species that evolved from a common ancestral gene by speciation
• Typically retain the same function over the course of evolution
Paralogs
• Genes related by duplication within a genome that have evolved related but different functions
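One common heuristic for proposing 1:1 ortholog pairs, shown here as an illustration rather than as the exact procedure used in the paper, is the reciprocal best hit: an Anopheles gene and a Drosophila gene are paired when each is the other's highest-scoring match. The sketch assumes precomputed similarity scores (for example BLAST bit scores) stored in a hypothetical dictionary; all gene names are invented.

```python
def reciprocal_best_hits(scores: dict[tuple[str, str], float]) -> set[tuple[str, str]]:
    """Return (ag_gene, dm_gene) pairs that are each other's best-scoring hit.

    `scores` maps (Anopheles gene, Drosophila gene) to a similarity score and
    is assumed symmetric (the same score is used in both search directions).
    Ties are broken by whichever pair is seen first.
    """
    best_for_ag: dict[str, tuple[float, str]] = {}
    best_for_dm: dict[str, tuple[float, str]] = {}
    for (ag, dm), s in scores.items():
        # Keep the highest-scoring partner seen so far for each gene.
        if ag not in best_for_ag or s > best_for_ag[ag][0]:
            best_for_ag[ag] = (s, dm)
        if dm not in best_for_dm or s > best_for_dm[dm][0]:
            best_for_dm[dm] = (s, ag)
    # A pair is kept only if the best hit is reciprocal.
    return {
        (ag, dm)
        for ag, (_, dm) in best_for_ag.items()
        if best_for_dm[dm][1] == ag
    }


# Hypothetical scores: AgA and AgB both match DmA best.
example = {
    ("AgA", "DmA"): 900.0, ("AgA", "DmB"): 300.0,
    ("AgB", "DmA"): 850.0, ("AgB", "DmB"): 400.0,
}
print(reciprocal_best_hits(example))  # {('AgA', 'DmA')}
```

Here `AgB` gets no reciprocal partner, which is the pattern one would expect if `AgA` and `AgB` were paralogs produced by a duplication in the mosquito lineage.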
Predicting the function of a new protein
• A powerful approach is to use bioinformatics and domain database searches to find its characterized orthologs
• Much is known about Drosophila, but little about Anopheles
• Comparing their genomes can therefore reveal a great deal about Anopheles
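Once 1:1 orthologs are known, the simplest form of this reasoning is to copy a characterized Drosophila function onto its Anopheles counterpart. The sketch below is illustrative only: the dictionaries and gene identifiers are hypothetical, and in practice such a prediction would still need domain searches and experimental support.

```python
def transfer_annotations(
    one_to_one: dict[str, str], dm_function: dict[str, str]
) -> dict[str, str]:
    """Predict Anopheles protein functions from characterized Drosophila 1:1 orthologs.

    one_to_one maps an Anopheles gene ID to its Drosophila ortholog ID;
    dm_function maps Drosophila gene IDs to a known function.
    """
    return {
        ag: dm_function[dm]
        for ag, dm in one_to_one.items()
        if dm in dm_function  # no prediction when the fly gene is uncharacterized
    }


orthologs = {"AgX": "DmX", "AgY": "DmY"}  # hypothetical IDs
known = {"DmX": "ATP synthase epsilon subunit"}
print(transfer_annotations(orthologs, known))  # {'AgX': 'ATP synthase epsilon subunit'}
```

Restricting the transfer to 1:1 pairs is a deliberate choice in this sketch: in many-to-many groups, duplicated copies may have diverged in function.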
Drosophila melanogaster Genome
• The assembled and annotated genome sequence of 5 Drosophila melanogaster chromosomes is in GenBank
• It was produced by a collaboration between Celera and the Berkeley Drosophila Genome Project
• Published in the March 24, 2000 issue of Science
Drosophila Genome
Anopheles vs Drosophila
Gene Comparison at Protein Level
• The proteins are classified into 4 categories; the data set consists of:
– 12,981 deduced Anopheles proteins out of 15,189 annotated transcripts
– Transposon-derived and bacteria-like sequences, as well as alternative transcripts, were omitted
Classification of Anopheles proteins
• 1:1 orthologs:
– Anopheles proteins with one clearly identifiable counterpart in Drosophila and vice versa
– 47% of the Anopheles proteins
– 44% of the Drosophila proteins
Classification of Anopheles proteins
• “Many-to-many” orthologs
– Gene duplication has occurred in one or both species after divergence
– Includes 1779 Anopheles proteins
Classification of Anopheles proteins
• The third category:
– Proteins that have homologs in Drosophila and/or other species but no easily discernible orthologous relationships
– 3590 Anopheles predicted proteins
Classification of Anopheles proteins
• The fourth category:
– Proteins with little or no homology in Drosophila that instead have their best matches in other species
– 1283 proteins
Classification of Anopheles proteins
• Remaining proteins:
– No detectable homologs in any other species with a fully sequenced genome
– 1437 in Anopheles
– 2570 in Drosophila
– Might be new or rapidly evolving genes
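The category boundaries above can be written as a short decision procedure. The sketch below is a simplified, hypothetical encoding: it assumes orthologous groups have already been computed, and it reduces the paper's distinction between the third and fourth categories (where the best matches lie) to two yes/no flags. The final line reproduces the 47% figure from the slide numbers.

```python
def classify_anopheles_protein(
    n_ag_in_group: int,
    n_dm_in_group: int,
    has_dm_homolog: bool,
    has_other_homolog: bool,
) -> str:
    """Assign one Anopheles protein to a category (simplified decision rules).

    n_ag_in_group / n_dm_in_group: numbers of Anopheles / Drosophila members
    of the orthologous group containing the protein; both are 0 if the
    protein belongs to no orthologous group.
    """
    if n_ag_in_group == 1 and n_dm_in_group == 1:
        return "1:1 ortholog"
    if n_dm_in_group >= 1:
        # Duplication in one or both lineages after divergence.
        return "many-to-many ortholog"
    if has_dm_homolog:
        return "homolog without clear orthology"
    if has_other_homolog:
        return "best match in another species"
    return "no detectable homolog"


# Share of Anopheles proteins that are 1:1 orthologs, from the slide numbers.
print(round(100 * 6089 / 12981))  # 47
```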
Classification of proteins
Some Notes
• The numbers and derived estimates are approximations
• Genome annotation is an ongoing effort
• Some Anopheles genes have not been sequenced yet
• Genes in highly polymorphic regions or in highly repetitive contexts are prone to annotation errors
• Estimated annotation accuracy > 70%
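The core of conserved proteins
• The 1:1 orthologs (6089 pairs) can be considered the conserved core
• Their average sequence identity is 56%
• Human and pufferfish orthologs share 61% on average, even though those lineages split earlier (about 450 versus about 250 million years ago)
• This indicates that insect proteins diverge at a higher rate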
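Percent identity depends on how gaps are counted; the sketch below uses one common convention (identical residues divided by gap-free aligned columns), which may differ from the paper's. It also adds a deliberately crude rate comparison, using only the identities and divergence times quoted in these slides, to make the "faster divergence" reasoning explicit. The function names are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not columns:
        return 0.0
    identical = sum(a == b for a, b in columns)
    return 100.0 * identical / len(columns)


def differences_per_myr(identity_pct: float, divergence_myr: float) -> float:
    """Crude rate: fraction of differing sites per million years of lineage time.

    Both lineages evolve after the split, so the elapsed time is doubled.
    No correction for multiple substitutions, so true rates are underestimated.
    """
    return (1 - identity_pct / 100) / (2 * divergence_myr)


print(percent_identity("MKT-LV", "MKSALV"))  # 80.0
insects = differences_per_myr(56, 250)      # mosquito vs. fruit fly
vertebrates = differences_per_myr(61, 450)  # human vs. pufferfish
print(insects > vertebrates)  # True
```

Because the uncorrected measure saturates over long times, the vertebrate rate is underestimated more than the insect rate; the comparison is only a rough illustration of the direction of the effect.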
Properties of 1:1 orthologs.
Orthologous proteins constitute a core of conserved functions
• Genes controlling early embryogenesis are conserved between Drosophila and Anopheles
• Of 315 early developmental genes in Drosophila, 251 showed a clear single ortholog in Anopheles
Orthologous proteins
• 85% of the developmental genes have single orthologs
• Compared with 47% for the genome as a whole
Protein family expansions and reductions
• Can reflect adaptations to different environments and life strategies
• Can lead to changes in cellular and phenotypic features
• Expansions imply gene duplications after speciation
Protein family expansions and reductions: example
• Epsilon subunit of the adenosine triphosphate (ATP) synthase complex
• Encoded by two genes in both Anopheles and Drosophila
• The two species might share a single-copy ancestral gene
• That gene was then duplicated independently in each lineage after speciation
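One way to test this scenario is to ask whether each species' copies form their own clade in a gene tree; if they do, the duplications most likely happened separately in each lineage after speciation. The sketch below assumes a rooted tree written as nested Python tuples with hypothetical gene names; real analyses would read trees from a phylogenetics package and account for rooting uncertainty.

```python
from typing import Iterator, Union

Tree = Union[str, tuple]  # a leaf name, or a tuple of subtrees


def leaves(tree: Tree) -> frozenset:
    """Return the set of leaf names below a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clades(tree: Tree) -> Iterator[frozenset]:
    """Yield the leaf set of every node (clade) in a rooted tree."""
    yield leaves(tree)
    if not isinstance(tree, str):
        for child in tree:
            yield from clades(child)


def is_monophyletic(tree: Tree, genes: set) -> bool:
    """True if the given genes form exactly one clade of the tree."""
    return frozenset(genes) in set(clades(tree))


# Hypothetical gene names for the two epsilon-subunit copies in each species.
ag = {"Ag_eps_a", "Ag_eps_b"}
dm = {"Dm_eps_a", "Dm_eps_b"}
independent = (("Ag_eps_a", "Ag_eps_b"), ("Dm_eps_a", "Dm_eps_b"))
ancestral = (("Ag_eps_a", "Dm_eps_a"), ("Ag_eps_b", "Dm_eps_b"))
print(is_monophyletic(independent, ag) and is_monophyletic(independent, dm))  # True
print(is_monophyletic(ancestral, ag))  # False
```

In the `ancestral` topology, each mosquito copy groups with a fly copy, which is what a duplication before speciation would produce instead.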
Expansions of proteins with FBN-like domains in Anopheles
• Fibrinogen (FBN)-like domains are named after human fibrinogen, a blood coagulation protein
• A large, expanded family of mosquito proteins contains a domain resembling the COOH-terminus of the beta and gamma chains of FBN
Expansions of proteins with FBN-like domains in Anopheles
• Phylogenetic tree of 58 Anopheles and 13 Drosophila FBN genes
• They largely belong to two distinct species-specific clades
• Only two 1:1 orthologous relationships were identified
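A simplified tree-based reading of a "1:1 orthologous relationship" is a pair of sister genes, one from each species, at the tips of the tree. The sketch below counts such cross-species sister pairs in a small hypothetical tree written as nested tuples, with the species encoded as a name prefix; it illustrates the idea and is not the procedure used for the 58 + 13 gene tree.

```python
from typing import Callable, List, Tuple, Union

Tree = Union[str, tuple]  # a leaf name, or a tuple of subtrees


def cross_species_sister_pairs(
    tree: Tree, species_of: Callable[[str], str]
) -> List[Tuple[str, str]]:
    """Return sister leaf pairs (cherries) that contain one Ag and one Dm gene."""
    pairs: List[Tuple[str, str]] = []

    def walk(node: Tree) -> None:
        if isinstance(node, str):
            return
        if len(node) == 2 and all(isinstance(child, str) for child in node):
            a, b = node
            if {species_of(a), species_of(b)} == {"Ag", "Dm"}:
                pairs.append((a, b))
            return
        for child in node:
            walk(child)

    walk(tree)
    return pairs


def prefix(gene: str) -> str:
    return gene.split("_")[0]  # hypothetical naming scheme: "Ag_..." / "Dm_..."


tree = ((("Ag_1", "Ag_2"), "Ag_3"), ("Ag_4", "Dm_1"), (("Dm_2", "Dm_3"), "Dm_4"))
print(cross_species_sister_pairs(tree, prefix))  # [('Ag_4', 'Dm_1')]
```

Most leaves in this toy tree sit in single-species clades, mirroring the largely species-specific FBN clades, so only one cross-species sister pair remains.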
The significance of the FBN gene expansion
• The massive expansion of the Anopheles FBN gene family might be associated with particular aspects of mosquito biology
• That is, hematophagy and exposure to Plasmodium
• A blood meal brings challenges associated with the gut microbial flora and with blood coagulation
The implication of FBN gene expansion
• The bacteria-binding properties of FBNs might be important in controlling or aggregating bacteria in the midgut
• These proteins might act as competitive inhibitors of coagulation, i.e. as anticoagulants
• Some mosquito FBN proteins are upregulated by invading malaria parasites
Expansion of FBN-like proteins in Anopheles
Gene losses in insects
• Some genes are absent in both Anopheles and Drosophila but are present in other eukaryotes
• Criterion: a gene must be present in at least one other animal and also in fungi or plants
Gene losses in insects.
Gene genesis and gene loss
• 1437 predicted genes in Anopheles have no detectable homology with genes of other species
• 522 of these have putative paralogs only within Anopheles
• At least 26 of these genes are expressed in the adult female salivary glands
Strategy for identifying gene losses
• Search for genes that are present in only one of the two insects but that do have orthologs in other species
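The loss criteria in these slides (absent from both insects, or from only one) can be written as presence/absence rules. The sketch below is a hypothetical encoding, not the paper's pipeline: for the insect-wide case it applies the stated requirement of presence in another animal and in fungi or plants, while for the single-lineage case it accepts an ortholog in any other species.

```python
from typing import Optional


def infer_loss(
    in_anopheles: bool,
    in_drosophila: bool,
    in_other_animal: bool,
    in_fungi_or_plants: bool,
) -> Optional[str]:
    """Classify an ortholog's presence/absence pattern as a putative gene loss.

    The presence flags are assumed to come from an orthology search. Absence
    may also reflect incomplete sequencing or annotation, so every result is
    only putative.
    """
    if not in_anopheles and not in_drosophila:
        # Insect-wide loss: the gene must be widespread outside insects.
        if in_other_animal and in_fungi_or_plants:
            return "putative insect-specific loss"
        return None
    outgroup = in_other_animal or in_fungi_or_plants
    if in_anopheles and not in_drosophila and outgroup:
        return "putative loss in the Drosophila lineage"
    if in_drosophila and not in_anopheles and outgroup:
        return "putative loss in the Anopheles lineage"
    return None  # present in both insects, or no outgroup evidence


print(infer_loss(False, False, True, True))  # putative insect-specific loss
```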
Gene Losses
• Genes with widespread orthologs that are missing from both Anopheles and Drosophila are putative insect-specific gene losses
• Example:
– Insects are known to be unable to synthesize sterols
– Consistent with this, several enzymes involved in sterol metabolism are absent
Gene Losses example
• Absence of the DNA repair enzyme uracil-DNA glycosylase in insects
• Spontaneous deamination of cytosine produces uracil in DNA, which this enzyme normally removes
• Drosophila has long been known to have no or only very little DNA methylation
Cladogram based on Orthologs
Intron gain and loss
• Drosophila is known to have reduced noncoding regions
• 11,007 of the 20,161 Anopheles introns in 1:1 orthologs are at equivalent positions in Drosophila
• Almost 10,000 introns have either been lost or gained
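Deciding whether two introns occupy "equivalent positions" requires mapping each intron onto a shared protein alignment. The sketch below assumes a hypothetical input format, each intron given as (0-based index of the residue at which it falls, phase 0/1/2), and treats introns as equivalent only when they land on the same alignment column with the same phase; published analyses may tolerate small positional shifts.

```python
def residue_to_column(aligned: str) -> dict[int, int]:
    """Map each residue index (0-based, ungapped) to its alignment column."""
    mapping: dict[int, int] = {}
    residue = 0
    for column, char in enumerate(aligned):
        if char != "-":
            mapping[residue] = column
            residue += 1
    return mapping


def shared_introns(
    aln_ag: str, introns_ag: set[tuple[int, int]],
    aln_dm: str, introns_dm: set[tuple[int, int]],
) -> set[tuple[int, int]]:
    """Return (alignment column, phase) pairs where both species have an intron."""
    cols_ag = residue_to_column(aln_ag)
    cols_dm = residue_to_column(aln_dm)
    placed_ag = {(cols_ag[r], phase) for r, phase in introns_ag}
    placed_dm = {(cols_dm[r], phase) for r, phase in introns_dm}
    return placed_ag & placed_dm


# Hypothetical aligned fragments; the Anopheles sequence has a one-residue gap.
shared = shared_introns("MKT-LVA", {(1, 0), (3, 2)}, "MKSALVA", {(1, 0), (4, 2)})
print(sorted(shared))  # [(1, 0), (4, 2)]
```

Introns that appear in only one species' set would be the candidates for gain or loss.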
The Drosophila Dscam gene
• Can encode up to 38,000 proteins through extensive alternative splicing
• Three different cassettes of duplicated exons can generate a combinatorially large number of splice variants
• The numbers of exons within the cassettes are broadly similar in Anopheles
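The combinatorics are multiplicative: if each cassette contributes exactly one exon to every transcript, the number of possible isoforms is the product of the cassette sizes. The counts below are the ones commonly cited for Drosophila Dscam (12, 48 and 33 alternative exons in the three large cassettes, plus a separate two-way choice of transmembrane exon); treat them as an assumption to check against the primary literature.

```python
from math import prod

# Commonly cited numbers of mutually exclusive alternative exons in Drosophila Dscam
# (assumed values, not taken from these slides).
LARGE_CASSETTES = [12, 48, 33]
TRANSMEMBRANE_CHOICES = 2


def isoform_count(cluster_sizes: list[int]) -> int:
    """Each isoform uses exactly one exon from every mutually exclusive cluster."""
    return prod(cluster_sizes)


print(isoform_count(LARGE_CASSETTES))  # 19008
print(isoform_count(LARGE_CASSETTES + [TRANSMEMBRANE_CHOICES]))  # 38016
```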
Microsynteny
• Over evolutionary time, genome structure may vary greatly, but small regions of conserved gene order can be retained
• Microsynteny studies such localized regions of conserved gene content and order
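A minimal way to find microsynteny blocks is to walk along an Anopheles chromosome arm and extend a block while consecutive genes have 1:1 orthologs that are also close together on a single Drosophila arm. The sketch below is a simplified greedy scan with hypothetical inputs (gene order, plus each ortholog's arm and gene index in Drosophila) and hypothetical parameters `max_gap` and `min_size`; more careful methods typically also consider intervening unrelated genes and whether a block could arise by chance.

```python
def microsynteny_blocks(
    ag_order: list[str],
    dm_position: dict[str, tuple[str, int]],
    max_gap: int = 1,
    min_size: int = 2,
) -> list[list[str]]:
    """Greedy scan for runs of Anopheles genes whose Drosophila orthologs stay close.

    ag_order: Anopheles gene IDs in chromosomal order along one arm.
    dm_position: Anopheles gene ID -> (Drosophila arm, gene index on that arm)
    for its 1:1 ortholog; genes without an entry are skipped.
    """
    blocks: list[list[str]] = []
    current: list[str] = []
    for gene in ag_order:
        pos = dm_position.get(gene)
        if pos is None:
            continue
        if current:
            prev_arm, prev_index = dm_position[current[-1]]
            arm, index = pos
            if arm == prev_arm and 0 < abs(index - prev_index) <= max_gap:
                current.append(gene)  # still adjacent in Drosophila: extend the block
                continue
            if len(current) >= min_size:
                blocks.append(current)
        current = [gene]  # start a new candidate block
    if len(current) >= min_size:
        blocks.append(current)
    return blocks


order = ["ag1", "ag2", "ag3", "ag4", "ag5"]  # hypothetical gene IDs
positions = {"ag1": ("2L", 10), "ag2": ("2L", 11), "ag3": ("2L", 12),
             "ag4": ("X", 5), "ag5": ("3R", 7)}
print(microsynteny_blocks(order, positions))  # [['ag1', 'ag2', 'ag3']]
```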
Microsynteny blocks
Mapping of orthologs and microsynteny blocks to chromosomal arms in Anopheles and Drosophila.
Chromosome mapping
• Both Anopheles and Drosophila have five major chromosomal arms (X, 2L, 2R, 3L and 3R), plus a small chromosome 4 in Drosophila melanogaster
• In Drosophila, recognizable chromosomal arms are reassorted by fission and fusion at the centromeres
Chromosome mapping
• The most conserved pair of chromosomal arms is Dm2L and Ag3R
• 76% of the orthologs and 95% of the microsynteny blocks on Dm2L map to Ag3R
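Percentages such as "76% of the Dm2L orthologs map to Ag3R" come from cross-tabulating, for each ortholog pair, the arm it occupies in each genome. The sketch below shows that tabulation on a small hypothetical list of pairs; the real input would be the 6089 1:1 ortholog pairs with their chromosomal locations.

```python
from collections import Counter, defaultdict


def arm_mapping(pairs: list[tuple[str, str]]) -> dict[str, dict[str, float]]:
    """For each Drosophila arm, return the fraction of its orthologs on each Anopheles arm.

    pairs: one (Drosophila arm, Anopheles arm) tuple per 1:1 ortholog pair.
    """
    counts: dict[str, Counter] = defaultdict(Counter)
    for dm_arm, ag_arm in pairs:
        counts[dm_arm][ag_arm] += 1
    result = {}
    for dm_arm, per_ag in counts.items():
        total = sum(per_ag.values())
        result[dm_arm] = {ag_arm: n / total for ag_arm, n in per_ag.items()}
    return result


toy_pairs = [("2L", "3R")] * 3 + [("2L", "2L")]  # hypothetical pairs
print(arm_mapping(toy_pairs))  # {'2L': {'3R': 0.75, '2L': 0.25}}
```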
Chromosome mapping.
Chromosome mapping surprise
• Significant portions of the Anopheles X chromosome appear to have been derived from segments that are autosomal in Drosophila today
• 11% of Dm3R and 33% of Dm4
Homology of chromosomal arms
Thank you!